
Commit 471842e

kalycazai91 authored and committed
Add documentation on GPU performance on Quantization example (apache#13145)

* Add documentation on GPU performance
* Update README.md

1 parent 6f9167e

File tree

2 files changed: +4 −1 lines changed

example/quantization/README.md (+3 −1)

@@ -320,4 +320,6 @@ the console to run model quantization for a specific configuration.
 - `launch_inference.sh` This is a shell script that calculates the accuracies of all the quantized models generated
   by invoking `launch_quantize.sh`.
 
-**NOTE**: This example has only been tested on Linux systems.
+**NOTE**:
+- This example has only been tested on Linux systems.
+- Performance is expected to decrease on GPU; however, the memory footprint of a quantized model is smaller. The purpose of the quantization implementation is to minimize accuracy loss when converting FP32 models to INT8. The MXNet community is working on improving the performance.
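The note above says a quantized model trades some accuracy for a smaller memory footprint when FP32 weights are converted to INT8. As a rough illustration of why, here is a minimal, self-contained sketch of symmetric per-tensor INT8 quantization; the function names are illustrative and are not MXNet APIs:

```python
import numpy as np

def quantize_int8(weights):
    """Map FP32 weights to INT8 using a single symmetric per-tensor scale."""
    scale = np.max(np.abs(weights)) / 127.0     # symmetric range [-127, 127]
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from INT8 codes."""
    return q.astype(np.float32) * scale

w = np.random.randn(1000).astype(np.float32)
q, scale = quantize_int8(w)

print(w.nbytes, q.nbytes)   # 4000 vs 1000 bytes: INT8 storage is 4x smaller
# Rounding to the nearest INT8 code loses at most half a quantization step,
# which is the source of the accuracy loss the README mentions.
max_err = np.max(np.abs(w - dequantize(q, scale)))
print(max_err <= scale / 2 + 1e-6)
```

The 4x storage reduction applies only to the quantized tensors themselves; end-to-end speed additionally depends on whether the hardware has fast INT8 kernels, which is why performance can differ between CPU and GPU.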

example/quantization/imagenet_inference.py (+1 −0)

@@ -93,6 +93,7 @@ def score(sym, arg_params, aux_params, data, devs, label_name, max_num_examples,
     if logger is not None:
         logger.info('Finished inference with %d images' % num)
         logger.info('Finished with %f images per second', speed)
+        logger.warn('Note: GPU performance is expected to be slower than CPU. Please refer to quantization/README.md for details')
     for m in metrics:
         logger.info(m.get())
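An aside for readers adapting this snippet: in Python's standard `logging` module, `Logger.warn` is a deprecated alias for `Logger.warning`, so new code typically uses the latter. A minimal, self-contained sketch (the logger name is illustrative, not part of the example):

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger('quantization-demo')  # illustrative name

num, speed = 500, 123.4
logger.info('Finished inference with %d images', num)
logger.info('Finished with %f images per second', speed)
# logger.warning is the non-deprecated spelling of logger.warn
logger.warning('Note: GPU performance is expected to be slower than CPU. '
               'Please refer to quantization/README.md for details')
```

Passing `num` as a lazy `%`-style argument (rather than pre-formatting with `%`) defers string interpolation until the logger decides the record will actually be emitted.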
