Warning: These are just my notes from the excellent fast.ai MOOC, which you can find here

Deep Learning Is for Everyone

Neural Networks: A Brief History

Who We Are

How to Learn Deep Learning

Your Projects and Your Mindset

The Software: PyTorch, fastai, and Jupyter

Your First Model

Getting a GPU Deep Learning Server

Running Your First Notebook

from fastai.vision.all import *
path = untar_data(URLs.PETS)/'images'

# In the Oxford-IIIT Pet dataset, cat images have filenames that start
# with an uppercase letter, so the first character gives us the label.
def is_cat(x): return x[0].isupper()
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

# Fine-tune a pretrained ResNet-34 for one epoch, tracking the error rate
learn = cnn_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)
epoch train_loss valid_loss error_rate time
0 0.161300 0.021319 0.009472 00:33
epoch train_loss valid_loss error_rate time
0 0.069097 0.040982 0.009472 00:43

There are two tables because fine_tune(1) first trains the new head for one epoch, then fine-tunes the whole model for one more epoch.

Sidebar: This Book Was Written in Jupyter Notebooks

1+1
2
img = PILImage.create(image_cat())
img.to_thumb(192)

End sidebar

uploader = widgets.FileUpload()
uploader
img = PILImage.create(uploader.data[0])
is_cat,_,probs = learn.predict(img)
print(f"Is this a cat?: {is_cat}.")
print(f"Probability it's a cat: {probs[1].item():.6f}")
NameError: name 'learn' is not defined — this happens if the cell that trains learn hasn't been run in the current session.

What Is Machine Learning?

gv('''program[shape=box3d width=1 height=0.7]
inputs->program->results''')
gv('''model[shape=box3d width=1 height=0.7]
inputs->model->results; weights->model''')
gv('''ordering=in
model[shape=box3d width=1 height=0.7]
inputs->model->results; weights->model; results->performance
performance->weights[constraint=false label=update]''')
gv('''model[shape=box3d width=1 height=0.7]
inputs->model->results''')
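To make these diagrams concrete, here's a tiny sketch of my own (not from the course): a traditional program produces results from fixed, hand-written logic, while a model's results also depend on weights.

def program(inputs):
    # a traditional program: results come from fixed, hand-coded logic
    return sum(inputs)

def model(inputs, weights):
    # a machine learning model: results depend on adjustable weights
    return sum(w * x for w, x in zip(weights, inputs))

Once its weights have been trained, the model can be used just like an ordinary program, which is what the last diagram shows.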

What Is a Neural Network?

A Bit of Deep Learning Jargon

gv('''ordering=in
model[shape=box3d width=1 height=0.7 label=architecture]
inputs->model->predictions; parameters->model; labels->loss; predictions->loss
loss->parameters[constraint=false label=update]''')
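Here's a minimal training-loop sketch of my own in plain PyTorch (not from the course) that maps each part of the diagram onto code: a toy linear architecture, randomly generated inputs and labels, a loss that compares predictions with labels, and an optimizer that updates the parameters.

import torch

# hypothetical toy data: 100 examples, 3 features each, one target apiece
inputs = torch.randn(100, 3)
labels = torch.randn(100, 1)

model = torch.nn.Linear(3, 1)              # the architecture; its weights are the parameters
loss_func = torch.nn.MSELoss()             # compares predictions with labels
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(20):
    predictions = model(inputs)            # inputs -> architecture -> predictions
    loss = loss_func(predictions, labels)  # predictions + labels -> loss
    loss.backward()                        # gradients of loss w.r.t. parameters
    opt.step()                             # loss -> update -> parameters
    opt.zero_grad()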

Limitations Inherent To Machine Learning

From this picture we can now see some fundamental things about training a deep learning model:

  • A model cannot be created without data.
  • A model can only learn to operate on the patterns seen in the input data used to train it.
  • This learning approach only creates predictions, not recommended actions.
  • It's not enough to just have examples of input data; we need labels for that data too (e.g., pictures of dogs and cats aren't enough to train a model; we need a label for each one, saying which ones are dogs, and which are cats).

Generally speaking, we've seen that most organizations that say they don't have enough data, actually mean they don't have enough labeled data. If any organization is interested in doing something in practice with a model, then presumably they have some inputs they plan to run their model against. And presumably they've been doing that some other way for a while (e.g., manually, or with some heuristic program), so they have data from those processes! For instance, a radiology practice will almost certainly have an archive of medical scans (since they need to be able to check how their patients are progressing over time), but those scans may not have structured labels containing a list of diagnoses or interventions (since radiologists generally create free-text natural language reports, not structured data). We'll be discussing labeling approaches a lot in this book, because it's such an important issue in practice.

Since these kinds of machine learning models can only make predictions (i.e., attempt to replicate labels), this can result in a significant gap between organizational goals and model capabilities. For instance, in this book you'll learn how to create a recommendation system that can predict what products a user might purchase. This is often used in e-commerce, such as to customize products shown on a home page by showing the highest-ranked items. But such a model is generally created by looking at a user and their buying history (inputs) and what they went on to buy or look at (labels), which means that the model is likely to tell you about products the user already has or already knows about, rather than new products that they are most likely to be interested in hearing about. That's very different to what, say, an expert at your local bookseller might do, where they ask questions to figure out your taste, and then tell you about authors or series that you've never heard of before.

How Our Image Recognizer Works

What Our Image Recognizer Learned

Image Recognizers Can Tackle Non-Image Tasks

Jargon Recap

Deep Learning Is Not Just for Image Classification

path = untar_data(URLs.CAMVID_TINY)
dls = SegmentationDataLoaders.from_label_func(
    path, bs=8, fnames = get_image_files(path/"images"),
    # each image's label mask has the same name with '_P' appended to the stem
    label_func = lambda o: path/'labels'/f'{o.stem}_P{o.suffix}',
    codes = np.loadtxt(path/'codes.txt', dtype=str)  # the class names
)

# Segmentation models use a U-Net architecture on a pretrained backbone
learn = unet_learner(dls, resnet34)
learn.fine_tune(8)
epoch train_loss valid_loss time
0 2.987978 2.480352 00:21
epoch train_loss valid_loss time
0 2.158624 1.743689 00:23
1 1.767877 1.964173 00:26
2 1.643144 1.283908 00:25
3 1.472303 1.108046 00:25
4 1.320544 0.973587 00:25
5 1.183503 0.858053 00:26
6 1.073220 0.822954 00:26
7 0.986720 0.817717 00:25
learn.show_results(max_n=6, figsize=(7,8))
from fastai.text.all import *

# IMDB movie reviews; the 'test' folder is used as the validation set
dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test')
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)
epoch train_loss valid_loss accuracy time
0 0.809147 01:06
KeyboardInterrupt — training was interrupted manually partway through the first epoch.

If you hit a "CUDA out of memory" error after running this cell, click on the Kernel menu, then Restart. Instead of executing the cell above, copy and paste the following code into it:

from fastai.text.all import *

dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test', bs=32)
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)

This reduces the batch size to 32 (we will explain this later). If you keep hitting the same error, change 32 to 16.
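If you'd rather not edit the batch size by hand, a sketch like the following (my own helper, not from the course; the function name is made up) falls back to smaller batch sizes automatically when CUDA runs out of memory. In practice a kernel restart may still be needed first, as noted above, since GPU memory can stay fragmented after an OOM.

from fastai.text.all import *

def train_with_fallback(batch_sizes=(64, 32, 16)):
    for bs in batch_sizes:
        try:
            dls = TextDataLoaders.from_folder(untar_data(URLs.IMDB), valid='test', bs=bs)
            learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
            learn.fine_tune(4, 1e-2)
            return learn
        except RuntimeError as e:
            # CUDA OOM surfaces as a RuntimeError mentioning "out of memory"
            if 'out of memory' not in str(e): raise
            torch.cuda.empty_cache()  # release cached memory before retrying
    raise RuntimeError('no batch size small enough to fit in GPU memory')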

learn.predict("I really liked that movie!")
('neg', TensorText(0), TensorText([0.5623, 0.4377]))
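For reference, predict returns three things: the decoded label, the index of that label in the vocab, and the probability of each class. So the tuple above can be unpacked like this (a small sketch assuming the learner trained above):

label, idx, probs = learn.predict("I really liked that movie!")
print(f"Predicted {label} with probability {probs[idx].item():.4f}")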

Sidebar: The Order Matters

End sidebar

from fastai.tabular.all import *
path = untar_data(URLs.ADULT_SAMPLE)

# Categorical and continuous columns are handled differently, so each is
# listed explicitly; procs preprocess the data before training.
dls = TabularDataLoaders.from_csv(path/'adult.csv', path=path, y_names="salary",
    cat_names = ['workclass', 'education', 'marital-status', 'occupation',
                 'relationship', 'race'],
    cont_names = ['age', 'fnlwgt', 'education-num'],
    procs = [Categorify, FillMissing, Normalize])

# There's no pretrained model for this task, so we train from scratch
# with fit_one_cycle rather than fine_tune.
learn = tabular_learner(dls, metrics=accuracy)
learn.fit_one_cycle(3)
epoch train_loss valid_loss accuracy time
0 0.367843 0.357764 0.832924 00:07
1 0.362614 0.354566 0.831849 00:07
2 0.344427 0.349152 0.837838 00:08
from fastai.collab import *
path = untar_data(URLs.ML_SAMPLE)
dls = CollabDataLoaders.from_csv(path/'ratings.csv')
# y_range squashes predictions into the valid range of ratings, 0.5-5.5
learn = collab_learner(dls, y_range=(0.5,5.5))
learn.fine_tune(20)
epoch train_loss valid_loss time
0 1.523609 1.353591 00:00
epoch train_loss valid_loss time
0 1.377288 1.306087 00:00
1 1.314181 1.222488 00:00
2 1.191319 1.065583 00:00
3 1.010563 0.854901 00:00
4 0.818294 0.725766 00:00
5 0.699861 0.688180 00:00
6 0.650566 0.677446 00:00
7 0.629411 0.673032 00:00
8 0.609606 0.668113 00:00
9 0.606750 0.663429 00:00
10 0.594681 0.660590 00:00
11 0.583541 0.656942 00:00
12 0.574227 0.654432 00:00
13 0.567019 0.651633 00:00
14 0.548985 0.650612 00:00
15 0.544749 0.649310 00:00
16 0.542530 0.648421 00:00
17 0.546451 0.648087 00:00
18 0.544443 0.647970 00:00
19 0.539262 0.647940 00:00
learn.show_results()
userId movieId rating rating_pred
0 83.0 64.0 3.5 3.697940
1 27.0 42.0 3.0 3.154352
2 73.0 48.0 4.0 3.848425
3 80.0 31.0 3.5 3.996437
4 12.0 13.0 4.0 3.767331
5 30.0 20.0 3.0 4.216630
6 85.0 58.0 5.0 4.794383
7 27.0 54.0 2.5 4.053595
8 66.0 44.0 4.0 3.077837

Sidebar: Datasets: Food for Models

End sidebar

Validation Sets and Test Sets

Use Judgment in Defining Test Sets

A Choose Your Own Adventure moment

Questionnaire

It can be hard to know, in pages and pages of prose, which key things you really need to focus on and remember. So we've prepared a list of questions and suggested steps to complete at the end of each chapter. All the answers are in the text of the chapter, so if you're not sure about anything here, reread that part of the text and make sure you understand it. Answers to all these questions are also available on the book's website. If you get stuck, you can also visit the forums for help from other folks studying this material.

For more questions, including detailed answers and links to the video timeline, have a look at Radek Osmulski's aiquizzes.

  1. Do you need these for deep learning?

    • Lots of math T / F
    • Lots of data T / F
    • Lots of expensive computers T / F
    • A PhD T / F
  2. Name five areas where deep learning is now the best in the world.

  3. What was the name of the first device that was based on the principle of the artificial neuron?
  4. Based on the book of the same name, what are the requirements for parallel distributed processing (PDP)?
  5. What were the two theoretical misunderstandings that held back the field of neural networks?
  6. What is a GPU?
  7. Open a notebook and execute a cell containing: 1+1. What happens?
  8. Follow through each cell of the stripped version of the notebook for this chapter. Before executing each cell, guess what will happen.
  9. Complete the Jupyter Notebook online appendix.
  10. Why is it hard to use a traditional computer program to recognize images in a photo?
  11. What did Samuel mean by "weight assignment"?
  12. What term do we normally use in deep learning for what Samuel called "weights"?
  13. Draw a picture that summarizes Samuel's view of a machine learning model.
  14. Why is it hard to understand why a deep learning model makes a particular prediction?
  15. What is the name of the theorem that shows that a neural network can solve any mathematical problem to any level of accuracy?
  16. What do you need in order to train a model?
  17. How could a feedback loop impact the rollout of a predictive policing model?
  18. Do we always have to use 224×224-pixel images with the cat recognition model?
  19. What is the difference between classification and regression?
  20. What is a validation set? What is a test set? Why do we need them?
  21. What will fastai do if you don't provide a validation set?
  22. Can we always use a random sample for a validation set? Why or why not?
  23. What is overfitting? Provide an example.
  24. What is a metric? How does it differ from "loss"?
  25. How can pretrained models help?
  26. What is the "head" of a model?
  27. What kinds of features do the early layers of a CNN find? How about the later layers?
  28. Are image models only useful for photos?
  29. What is an "architecture"?
  30. What is segmentation?
  31. What is y_range used for? When do we need it?
  32. What are "hyperparameters"?
  33. What's the best way to avoid failures when using AI in an organization?

Further Research

Each chapter also has a "Further Research" section that poses questions that aren't fully answered in the text, or gives more advanced assignments. Answers to these questions aren't on the book's website; you'll need to do your own research!

  1. Why is a GPU useful for deep learning? How is a CPU different, and why is it less effective for deep learning?
  2. Try to think of three areas where feedback loops might impact the use of machine learning. See if you can find documented examples of that happening in practice.