Batch normalization --training parameter #11

galinator9000 · 2019-02-27T10:06:31Z

Hi, I wanted to use the YOLOv3-tiny model, so I downloaded the cfg and weights from the official website.

With the command below I successfully built the .pb and .meta files.
python main.py --cfg ../yolov3-tiny/yolov3-tiny.cfg --weights ../yolov3-tiny/yolov3-tiny.weights --output ../yolov3-tiny/ --prefix "YOLO/"

With the script below I could load the graph and weights.
But when I tried to get the output of the last layer, convolutional13, I got an array full of nan values:

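import tensorflow as tf
import numpy as np
import cv2

# Rebuild the graph from the .meta file and restore the converted weights (TF 1.x API).
saver = tf.train.import_meta_graph("yolov3-tiny/yolov3-tiny.meta")
sess = tf.Session()
saver.restore(sess, "yolov3-tiny/yolov3-tiny.ckpt")

# Load the image as RGB, scale pixel values to [0, 1], and add a batch dimension.
image = cv2.cvtColor(cv2.imread("sample.jpg"), cv2.COLOR_BGR2RGB) / 255.0
image = np.expand_dims(image, axis=0)

# Fetch the output of the last convolutional layer.
print(
	sess.run("YOLO/convolutional13/BiasAdd:0", feed_dict={"YOLO/net1:0": image})
)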

Outputs:

[[[[nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]
   ...
   [nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]]

  [[nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]
   ...
   [nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]]]]

However, when I tried the same conversion with the --training flag:
python main.py --training --cfg ../yolov3-tiny/yolov3-tiny.cfg --weights ../yolov3-tiny/yolov3-tiny.weights --output ../yolov3-tiny/ --prefix "YOLO/"

The same script outputs:

[[[[-0.5312634   0.23449755 -0.22042923 ... -0.99058443 -0.75764066
     0.05638865]
   [-0.1264087  -0.06148954 -0.13978335 ... -0.57391363 -0.65091616
    -0.34988856]
   [-0.27005857  0.18064664 -0.1842366  ... -0.7720764  -0.63676864
    -0.22235665]
   ...
   [-0.14108022  0.12593661  0.040429   ... -0.51453155 -0.8112872
    -0.2482701 ]
   [-0.14169356  0.05826963  0.04545707 ... -0.36210614 -0.6568373
    -0.17424914]
   [-0.24074644  0.49974358 -0.17072684 ... -1.1237179  -0.8400626
    -0.20994306]]

  [[-0.37883073  0.06569445  0.07646853 ... -0.72665095 -0.5669313
     0.23495841]
   [-0.11390454  0.00512573  0.09839267 ...  0.02260823 -0.31830767
     0.00776402]
   [-0.18927872  0.14090516  0.06336813 ... -0.17192174 -0.3423958
     0.07134365]
   ...
   [-0.5374908   0.17205149  0.30092606 ... -1.299513   -0.50735444
    -0.45372528]
   [-0.44234592  0.17717186  0.11988509 ... -0.9887123  -0.25854525
    -0.40106654]
   [-0.30651295  0.32414198  0.01627261 ... -1.7556211  -0.55981153
    -0.5505434 ]]]]

I believe this is related to batch normalization and the --training parameter. I want to use this model for transfer learning.

Also, when I tried to get the output of earlier layers such as convolutional2 (without the --training parameter), the values looked like this:

[[[[nan -1.4262159e+36 -1.6400952e+36 ... -1.5521092e+36
     1.1826908e+38 -1.1971094e+37]
   [           nan -5.4608188e+36           -inf ... -2.9475174e+35
    -2.9942158e+36           -inf]
   [           nan -5.4608188e+36           -inf ... -2.9475174e+35
    -2.9942158e+36           -inf]
   ...
   [           nan -5.4608188e+36           -inf ... -2.9475174e+35
    -2.9942158e+36           -inf]
   [           nan -5.4608188e+36           -inf ... -2.9475174e+35
    -2.9942158e+36           -inf]
   [           nan -4.9901782e+36 -2.4481979e+36 ...  8.4210530e+36
              -inf -1.1353102e+37]]

  [[           nan -1.3676106e+36            inf ...  1.5158864e+37
               inf -8.5954786e+36]
   [           nan -7.9527132e+36            inf ...  2.1685821e+37
     1.6828479e+37           -inf]
   [           nan -7.9527132e+36            inf ...  2.1685821e+37
     1.6828479e+37           -inf]
   ...
   [           nan -3.1938362e+36            inf ...  1.5331453e+37
     3.3975579e+37 -9.5892951e+36]
   [           nan -3.1938362e+36            inf ...  1.5331453e+37
     3.3975579e+37 -9.5892951e+36]
   [           nan -5.6393693e+36  4.6983167e+37 ...  1.0347686e+37
    -5.8164126e+36 -4.1906564e+36]]]]

Is this a problem with the code, or am I missing something, for example in how I feed the image input?


sjain-stanford · 2019-02-27T22:03:30Z

@fmehmetun Thanks for reporting this. After a little digging, this seems to be due to different weight offsets (16 vs 20) for different major/minor versions. So, yolov2-tiny, yolov3-tiny and yolov3 seem to require an offset of 20 instead of 16. If not set properly, this can corrupt the converted TF weights (ckpt), which likely caused the nans you reported.
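The offset rule described above can be sketched roughly as follows. This is not the code from the fix itself, only an illustration under the assumption that a Darknet weights file starts with three int32 fields (major, minor, revision) followed by a `seen` counter that is 8 bytes wide from version 0.2 on and 4 bytes wide before that. The helper names `darknet_weights_offset` and `read_weights` are hypothetical, and the "file" is a synthetic in-memory blob.

```python
import struct


def darknet_weights_offset(header: bytes) -> int:
    """Return the byte offset where the float32 weights begin."""
    # First 12 bytes: three little-endian int32 values (major, minor, revision).
    major, minor, _revision = struct.unpack("<3i", header[:12])
    # Assumption: from version 0.2 on, the `seen` counter that follows is
    # 8 bytes wide; in older files it is 4 bytes wide.
    is_new_format = (major * 10 + minor) >= 2 and major < 1000 and minor < 1000
    seen_size = 8 if is_new_format else 4
    return 12 + seen_size


def read_weights(blob: bytes, count: int, offset: int) -> tuple:
    """Decode `count` little-endian float32 values starting at `offset`."""
    return struct.unpack_from(f"<{count}f", blob, offset)


# Synthetic file: version 0.2.0 header, 8-byte `seen` = 0, then three weights.
blob = struct.pack("<3iq", 0, 2, 0, 0) + struct.pack("<3f", 1.0, 2.0, 3.0)

offset = darknet_weights_offset(blob)
good = read_weights(blob, 3, offset)  # aligned read with the detected offset
bad = read_weights(blob, 3, 16)       # fixed 16-byte offset on a 20-byte header

print(offset, good, bad)
```

With a fixed offset of 16 on a file whose header is 20 bytes long, every value is read one float32 slot too early, so each parameter ends up in its neighbour's place. In a real model that would mix up biases, scales, and batch-normalization statistics, which is one plausible way to end up with inf and nan activations.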

Fortunately someone fixed this for darkflow in this PR. From a quick test, it seems to resolve your issue. I'll run some more tests and push the fix shortly.

…per thtrieu/darkflow#642

sjain-stanford · 2019-02-27T22:40:03Z

@fmehmetun - give it a try and let me know if you see any other issues.

galinator9000 · 2019-02-28T10:51:00Z

Thanks for the fix. I tried it just now and it works without any problems. After opening this issue I also tried darkflow, and that worked without any problems too. It's good to know I have another option for conversion. Thanks.

sjain-stanford self-assigned this Feb 27, 2019

sjain-stanford closed this as completed in 1a56e04 Feb 27, 2019

sjain-stanford added a commit that referenced this issue Feb 27, 2019

fixes #11: pick correct offset (16/20) based on major/minor versions …

cbb8516

…per thtrieu/darkflow#642
