Fast.ai - From Model to Production
Warning: These are just my notes from the excellent fast.ai MOOC, which you can find here
To download images with Bing Image Search, sign up at Microsoft Azure for a free account. You will be given a key, which you can copy and enter in a cell as follows (replacing 'XXX' with your key and executing it):
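A minimal sketch of that cell, assuming the key is stored in an environment variable called `AZURE_SEARCH_KEY` (otherwise just paste your key in place of 'XXX'):

```python
import os

# Read the Bing Image Search key from an environment variable if set,
# otherwise fall back to the placeholder 'XXX' (replace with your own key)
key = os.environ.get('AZURE_SEARCH_KEY', 'XXX')
```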
These notes use `search_images_bing`, which needs the key above; fastbook also provides `search_images_ddg`, a DuckDuckGo-based alternative that works without a key.
```python
from fastbook import *               # search_images_bing, download_images, etc.
from fastai.vision.widgets import *  # widgets used for cleaning and the GUI later

# Search Bing for grizzly bear images and collect the image URLs
results = search_images_bing(key, 'grizzly bear')
ims = results.attrgot('contentUrl')
len(ims)
ims

# Download the first result and display a small thumbnail
dest = '../images/grizzly.jpg'
download_url(ims[0], dest)
im = Image.open(dest)
im.to_thumb(128,128)
```
```python
# Download images of each bear type into its own subfolder
bear_types = 'grizzly','black','teddy'
path = Path('bears')

if not path.exists():
    path.mkdir()
    for o in bear_types:
        dest = (path/o)
        dest.mkdir(exist_ok=True)
        results = search_images_bing(key, f'{o} bear')
        download_images(dest, urls=results.attrgot('contentUrl'))
```
```python
# List what was downloaded, then find and delete any broken images
fns = get_image_files(path)
fns
failed = verify_images(fns)
failed
failed.map(Path.unlink);
```
```python
bears = DataBlock(
    blocks=(ImageBlock, CategoryBlock),               # inputs are images, targets are categories
    get_items=get_image_files,                        # how to collect the items
    splitter=RandomSplitter(valid_pct=0.2, seed=42),  # 20% validation set, reproducible split
    get_y=parent_label,                               # label each image with its parent folder name
    item_tfms=Resize(128))                            # resize every item to 128x128

doc(parent_label)
```
```python
# Default Resize crops images to the requested size
dls = bears.dataloaders(path)
dls.valid.show_batch(max_n=12, nrows=2)

# Squish/stretch images to fit 128x128
bears = bears.new(item_tfms=Resize(128, ResizeMethod.Squish))
dls = bears.dataloaders(path)
dls.valid.show_batch(max_n=4, nrows=1)

# Pad images to 128x128 with black borders
bears = bears.new(item_tfms=Resize(128, ResizeMethod.Pad, pad_mode='zeros'))
dls = bears.dataloaders(path)
dls.valid.show_batch(max_n=4, nrows=1)

# Randomly crop a different part of each image every time
bears = bears.new(item_tfms=RandomResizedCrop(128, min_scale=0.3))
dls = bears.dataloaders(path)
dls.train.show_batch(max_n=4, nrows=1, unique=True)

# Data augmentation applied to whole batches after they are formed
bears = bears.new(item_tfms=Resize(128), batch_tfms=aug_transforms(mult=2))
dls = bears.dataloaders(path)
dls.train.show_batch(max_n=8, nrows=2, unique=True)

# Final pipeline: random crops plus standard augmentations, then train
bears = bears.new(
    item_tfms=RandomResizedCrop(224, min_scale=0.5),
    batch_tfms=aug_transforms())
dls = bears.dataloaders(path)

learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)
```
```python
# Confusion matrix and the images the model was most wrong about
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
interp.plot_top_losses(5, nrows=1)

# Widget for relabelling or deleting mislabelled training images
cleaner = ImageClassifierCleaner(learn)
cleaner
```
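The cleaner widget only records your choices; nothing is deleted or moved until you act on them. A sketch of one way to apply the selections afterwards (not part of the original notes):

```python
import shutil

# Delete the images marked for deletion
for idx in cleaner.delete():
    cleaner.fns[idx].unlink()

# Move images whose label was changed into the folder for the new category
for idx, cat in cleaner.change():
    shutil.move(str(cleaner.fns[idx]), path/cat)
```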
```python
# Export the trained Learner (architecture, weights, and DataLoaders definition) to export.pkl
learn.export()

path = Path()
path.ls(file_exts='.pkl')

# Load the exported model and use it for inference
learn_inf = load_learner(path/'export.pkl')
learn_inf.predict('../images/grizzly.jpg')
learn_inf.dls.vocab
```
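Since the order of `probs` follows `learn_inf.dls.vocab`, a quick sketch to pair class names with their probabilities (re-using the grizzly image downloaded earlier):

```python
# predict returns the decoded label, its index in the vocab, and per-class probabilities
pred, pred_idx, probs = learn_inf.predict('../images/grizzly.jpg')

# Pair each class name with its predicted probability
dict(zip(learn_inf.dls.vocab, map(float, probs)))
```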
```python
# File-upload widget from ipywidgets
btn_upload = widgets.FileUpload()
btn_upload

# Turn the most recently uploaded file into a fastai image
img = PILImage.create(btn_upload.data[-1])

# Output widget to display the thumbnail
out_pl = widgets.Output()
out_pl.clear_output()
with out_pl: display(img.to_thumb(128,128))
out_pl

# Run inference and show the result in a label widget
pred,pred_idx,probs = learn_inf.predict(img)
lbl_pred = widgets.Label()
lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'
lbl_pred
```
```python
btn_run = widgets.Button(description='Classify')
btn_run

# Callback: classify the most recently uploaded image when the button is clicked
def on_click_classify(change):
    img = PILImage.create(btn_upload.data[-1])
    out_pl.clear_output()
    with out_pl: display(img.to_thumb(128,128))
    pred,pred_idx,probs = learn_inf.predict(img)
    lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'

btn_run.on_click(on_click_classify)

# Assemble the widgets into a simple GUI
VBox([widgets.Label('Select your bear!'),
      btn_upload, btn_run, out_pl, lbl_pred])
```
- Provide an example of where the bear classification model might work poorly in production, due to structural or style differences in the training data.
- Where do text models currently have a major deficiency?
- What are possible negative societal implications of text generation models?
- In situations where a model might make mistakes, and those mistakes could be harmful, what is a good alternative to automating a process?
- What kind of tabular data is deep learning particularly good at?
- What's a key downside of directly using a deep learning model for recommendation systems?
- What are the steps of the Drivetrain Approach?
- How do the steps of the Drivetrain Approach map to a recommendation system?
- Create an image recognition model using data you curate, and deploy it on the web.
- What is `DataLoaders`?
- What four things do we need to tell fastai to create `DataLoaders`?
- What does the `splitter` parameter to `DataBlock` do?
- How do we ensure a random split always gives the same validation set?
- What letters are often used to signify the independent and dependent variables?
- What's the difference between the crop, pad, and squish resize approaches? When might you choose one over the others?
- What is data augmentation? Why is it needed?
- What is the difference between `item_tfms` and `batch_tfms`?
- What is a confusion matrix?
- What does `export` save?
- What is it called when we use a model for getting predictions, instead of training?
- What are IPython widgets?
- When might you want to use CPU for deployment? When might GPU be better?
- What are the downsides of deploying your app to a server, instead of to a client (or edge) device such as a phone or PC?
- What are three examples of problems that could occur when rolling out a bear warning system in practice?
- What is "out-of-domain data"?
- What is "domain shift"?
- What are the three steps in the deployment process?
- Consider how the Drivetrain Approach maps to a project or problem you're interested in.
- When might it be best to avoid certain types of data augmentation?
- For a project you're interested in applying deep learning to, consider the thought experiment "What would happen if it went really, really well?"
- Start a blog, and write your first blog post. For instance, write about what you think deep learning might be useful for in a domain you're interested in.