apache · szha · Nov 7, 2018 · Nov 6, 2018 · Nov 7, 2018 · ThomasDelteil
@@ -320,4 +320,6 @@ the console to run model quantization for a specific configuration.
 - `launch_inference.sh` This is a shell script that calculate the accuracies of all the quantized models generated
 by invoking `launch_quantize.sh`.
 
-**NOTE**: This example has only been tested on Linux systems.
+**NOTE**: 
+- This example has only been tested on Linux systems.
+- Performance is expected to decrease on GPU. The purpose of the quantization implementation is to minimize accuracy loss when converting FP32 models to INT8. The MXNet community is working on improving the performance.
@@ -93,6 +93,7 @@ def score(sym, arg_params, aux_params, data, devs, label_name, max_num_examples,
     if logger is not None:
         logger.info('Finished inference with %d images' % num)
         logger.info('Finished with %f images per second', speed)
+        logger.warning('Note: GPU performance is expected to be slower than CPU. Please refer to quantization/README.md for details')
         for m in metrics:
             logger.info(m.get())
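
As written, the hunk above emits the GPU note on every run, including CPU-only ones. A minimal sketch of a gated variant follows, assuming the caller can tell whether any device is a GPU. The helper name `log_inference_summary` and the `on_gpu` flag are hypothetical and not part of this PR. It uses only the standard `logging` module.

```python
import logging


def log_inference_summary(logger, num, speed, on_gpu):
    """Log throughput and, for GPU runs only, the performance caveat.

    `on_gpu` is a hypothetical flag; the PR itself logs the note unconditionally.
    """
    if logger is None:
        return
    # Pass arguments lazily so formatting happens only if the record is emitted.
    logger.info('Finished inference with %d images', num)
    logger.info('Finished with %f images per second', speed)
    if on_gpu:
        # logger.warning is the supported spelling; logger.warn is deprecated.
        logger.warning('Note: GPU performance is expected to be slower than CPU. '
                       'Please refer to quantization/README.md for details')
```

Gating the message keeps CPU logs free of a warning that does not apply to them, at the cost of one extra argument at the call site.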