research/deeplab/g3doc/faq.md from tensorflow/models

research/deeplab/g3doc/faq.md
Summary

Maintainability

Test Coverage

Issues
# FAQ
___
Q1: What if I want to use other network backbones, such as ResNet [1], instead of only those provided ones (e.g., Xception)?

A: The users could modify the provided core/feature_extractor.py to support more network backbones.
___
Q2: What if I want to train the model on other datasets?

A: The users could modify the provided dataset/build_{cityscapes,voc2012}_data.py and dataset/segmentation_dataset.py to build their own dataset.
___
Q3: Where can I download the PASCAL VOC augmented training set?

A: The PASCAL VOC augmented training set is provided by Bharath Hariharan et al. [2] Please refer to their [website](http://home.bharathh.info/pubs/codes/SBD/download.html) for details and consider citing their paper if using the dataset.
___
Q4: Why the implementation does not include DenseCRF [3]?

A: We have not tried this. The interested users could take a look at Philipp Krähenbühl's [website](http://graphics.stanford.edu/projects/densecrf/) and [paper](https://arxiv.org/abs/1210.5644) for details.
___
Q5: What if I want to train the model and fine-tune the batch normalization parameters?

A: If given the limited resource at hand, we would suggest you simply fine-tune
from our provided checkpoint whose batch-norm parameters have been trained (i.e.,
train with a smaller learning rate, set `fine_tune_batch_norm = false`, and
employ longer training iterations since the learning rate is small). If
you really would like to train by yourself, we would suggest

1. Set `output_stride = 16` or maybe even `32` (remember to change the flag
`atrous_rates` accordingly, e.g., `atrous_rates = [3, 6, 9]` for
`output_stride = 32`).

2. Use as many GPUs as possible (change the flag `num_clones` in train.py) and
set `train_batch_size` as large as possible.

3. Adjust the `train_crop_size` in train.py. Maybe set it to be smaller, e.g.,
513x513 (or even 321x321), so that you could use a larger batch size.

4. Use a smaller network backbone, such as MobileNet-v2.

___
Q6: How can I train the model asynchronously?

A: In the train.py, the users could set `num_replicas` (number of machines for training) and `num_ps_tasks` (we usually set `num_ps_tasks` = `num_replicas` / 2). See slim.deployment.model_deploy for more details.
___
Q7: I could not reproduce the performance even with the provided checkpoints.

A: Please try running

```bash
# Run the simple test with Xception_65 as network backbone.
sh local_test.sh
```

or

```bash
# Run the simple test with MobileNet-v2 as network backbone.
sh local_test_mobilenetv2.sh
```

First, make sure you could reproduce the results with our provided setting.
After that, you could start to make a new change one at a time to help debug.
___
Q8: What value of `eval_crop_size` should I use?

A: Our model uses whole-image inference, meaning that we need to set `eval_crop_size` equal to `output_stride` * k + 1, where k is an integer and set k so that the resulting `eval_crop_size` is slightly larger the largest
image dimension in the dataset. For example, we have `eval_crop_size` = 513x513 for PASCAL dataset whose largest image dimension is 512. Similarly, we set `eval_crop_size` = 1025x2049 for Cityscapes images whose
image dimension is all equal to 1024x2048.
___
Q9: Why multi-gpu training is slow?

A: Please try to use more threads to pre-process the inputs. For, example change [num_readers = 4](https://github.com/tensorflow/models/blob/master/research/deeplab/train.py#L457).
___


## References

1. **Deep Residual Learning for Image Recognition**<br />
   Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun<br />
   [[link]](https://arxiv.org/abs/1512.03385), In CVPR, 2016.

2. **Semantic Contours from Inverse Detectors**<br />
   Bharath Hariharan, Pablo Arbelaez, Lubomir Bourdev, Subhransu Maji, Jitendra Malik<br />
   [[link]](http://home.bharathh.info/pubs/codes/SBD/download.html), In ICCV, 2011.

3. **Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials**<br />
   Philipp Krähenbühl, Vladlen Koltun<br />
   [[link]](http://graphics.stanford.edu/projects/densecrf/), In NIPS, 2011.