Batch normalization --training parameter #11

Closed
galinator9000 opened this issue Feb 27, 2019 · 3 comments

@galinator9000

Hi, I wanted to use the YOLOv3-tiny model, so I downloaded the cfg and weights from the official website.

With the command below, I successfully built the .pb and .meta files:
python main.py --cfg ../yolov3-tiny/yolov3-tiny.cfg --weights ../yolov3-tiny/yolov3-tiny.weights --output ../yolov3-tiny/ --prefix "YOLO/"

With the script below, I could load the graph and weights.
When I tried to get the output of the last layer, convolutional13, I got an array full of nan values:

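import tensorflow as tf  # TensorFlow 1.x API (tf.Session, tf.train)
import numpy as np
import cv2

# Rebuild the graph from the .meta file and restore the converted weights
saver = tf.train.import_meta_graph("yolov3-tiny/yolov3-tiny.meta")
sess = tf.Session()
saver.restore(sess, "yolov3-tiny/yolov3-tiny.ckpt")

# Load the image, convert OpenCV's BGR order to RGB, scale to [0, 1], add a batch axis
image = cv2.cvtColor(cv2.imread("sample.jpg"), cv2.COLOR_BGR2RGB) / 255.0
image = np.expand_dims(image, axis=0)
print(
    sess.run("YOLO/convolutional13/BiasAdd:0", feed_dict={"YOLO/net1:0": image})
)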

Output:

[[[[nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]
   ...
   [nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]]

  [[nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]
   ...
   [nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]
   [nan nan nan ... nan nan nan]]]]

However, when I ran the same conversion with the --training flag:
python main.py --training --cfg ../yolov3-tiny/yolov3-tiny.cfg --weights ../yolov3-tiny/yolov3-tiny.weights --output ../yolov3-tiny/ --prefix "YOLO/"

the same script outputs:

[[[[-0.5312634   0.23449755 -0.22042923 ... -0.99058443 -0.75764066
     0.05638865]
   [-0.1264087  -0.06148954 -0.13978335 ... -0.57391363 -0.65091616
    -0.34988856]
   [-0.27005857  0.18064664 -0.1842366  ... -0.7720764  -0.63676864
    -0.22235665]
   ...
   [-0.14108022  0.12593661  0.040429   ... -0.51453155 -0.8112872
    -0.2482701 ]
   [-0.14169356  0.05826963  0.04545707 ... -0.36210614 -0.6568373
    -0.17424914]
   [-0.24074644  0.49974358 -0.17072684 ... -1.1237179  -0.8400626
    -0.20994306]]

  [[-0.37883073  0.06569445  0.07646853 ... -0.72665095 -0.5669313
     0.23495841]
   [-0.11390454  0.00512573  0.09839267 ...  0.02260823 -0.31830767
     0.00776402]
   [-0.18927872  0.14090516  0.06336813 ... -0.17192174 -0.3423958
     0.07134365]
   ...
   [-0.5374908   0.17205149  0.30092606 ... -1.299513   -0.50735444
    -0.45372528]
   [-0.44234592  0.17717186  0.11988509 ... -0.9887123  -0.25854525
    -0.40106654]
   [-0.30651295  0.32414198  0.01627261 ... -1.7556211  -0.55981153
    -0.5505434 ]]]]

I believe this is caused by batch normalization and the --training parameter. I want to use this model for transfer learning.
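To illustrate why a training/inference switch can turn nan output into finite output, here is a minimal NumPy sketch of batch normalization. It assumes (not confirmed in this thread) that the converter maps darknet's per-layer scales, biases, rolling mean and rolling variance onto a standard batch-norm op. In inference mode the stored (moving) statistics are used, so a corrupted, e.g. negative, stored variance produces nan. In training mode the current batch's statistics are used and the stored ones are ignored, which can hide the corruption. The `batch_norm` function and the example values are hypothetical.

```python
import numpy as np

EPS = 1e-5  # assumed epsilon; the converter's actual value may differ


def batch_norm(x, gamma, beta, moving_mean, moving_var, training):
    """Batch norm over an NHWC tensor, normalizing each channel (last axis)."""
    if training:
        # Training mode: statistics of the current batch; stored stats ignored.
        mean = x.mean(axis=(0, 1, 2))
        var = x.var(axis=(0, 1, 2))
    else:
        # Inference mode: the stored (loaded) moving statistics are used.
        mean, var = moving_mean, moving_var
    return gamma * (x - mean) / np.sqrt(var + EPS) + beta


rng = np.random.default_rng(0)
x = rng.standard_normal((1, 4, 4, 3)).astype(np.float32)
gamma = np.ones(3, np.float32)
beta = np.zeros(3, np.float32)
moving_mean = np.zeros(3, np.float32)
# Hypothetical corrupted statistics: a negative "variance" in channel 0,
# as could happen if weight values were read from the wrong file offset.
moving_var = np.array([-0.5, 1.0, 1.0], np.float32)

with np.errstate(invalid="ignore"):  # silence the sqrt-of-negative warning
    infer = batch_norm(x, gamma, beta, moving_mean, moving_var, training=False)
train = batch_norm(x, gamma, beta, moving_mean, moving_var, training=True)

print(np.isnan(infer[..., 0]).all())  # True: sqrt of a negative variance
print(np.isfinite(train).all())       # True: stored statistics are ignored
```

Under this assumption, finite output with --training does not mean the weights are correct; it only means the corrupted stored statistics are bypassed.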

Also, when I tried to get the output of earlier layers such as convolutional2 (without the --training parameter), the values looked like this:

[[[[nan -1.4262159e+36 -1.6400952e+36 ... -1.5521092e+36
     1.1826908e+38 -1.1971094e+37]
   [           nan -5.4608188e+36           -inf ... -2.9475174e+35
    -2.9942158e+36           -inf]
   [           nan -5.4608188e+36           -inf ... -2.9475174e+35
    -2.9942158e+36           -inf]
   ...
   [           nan -5.4608188e+36           -inf ... -2.9475174e+35
    -2.9942158e+36           -inf]
   [           nan -5.4608188e+36           -inf ... -2.9475174e+35
    -2.9942158e+36           -inf]
   [           nan -4.9901782e+36 -2.4481979e+36 ...  8.4210530e+36
              -inf -1.1353102e+37]]

  [[           nan -1.3676106e+36            inf ...  1.5158864e+37
               inf -8.5954786e+36]
   [           nan -7.9527132e+36            inf ...  2.1685821e+37
     1.6828479e+37           -inf]
   [           nan -7.9527132e+36            inf ...  2.1685821e+37
     1.6828479e+37           -inf]
   ...
   [           nan -3.1938362e+36            inf ...  1.5331453e+37
     3.3975579e+37 -9.5892951e+36]
   [           nan -3.1938362e+36            inf ...  1.5331453e+37
     3.3975579e+37 -9.5892951e+36]
   [           nan -5.6393693e+36  4.6983167e+37 ...  1.0347686e+37
    -5.8164126e+36 -4.1906564e+36]]]]
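Checking earlier layers like this is a way to find where non-finite values first appear. A minimal sketch of that check, assuming the layer outputs have already been fetched into an ordered dict from input side to output side; `first_nonfinite_layer` and the example layer names are hypothetical.

```python
import numpy as np


def first_nonfinite_layer(layer_outputs):
    """Return the name of the first layer whose output contains nan/inf, else None.

    `layer_outputs` maps tensor names to arrays, ordered from the input side
    of the network to the output side (dicts keep insertion order in 3.7+).
    """
    for name, value in layer_outputs.items():
        if not np.all(np.isfinite(value)):
            return name
    return None


# Hypothetical outputs: conv2 is the first layer with a nan.
outputs = {
    "conv1": np.ones((1, 2, 2, 3), np.float32),
    "conv2": np.array([[np.nan, 1.0]], np.float32),
    "conv3": np.array([[np.inf, 1.0]], np.float32),
}
print(first_nonfinite_layer(outputs))  # conv2

# Possible usage with the TF 1.x session from the script above. The real
# intermediate tensor names are not known here; list them first with
# [op.name for op in sess.graph.get_operations()] and pick them from that list.
# names = [...]
# values = sess.run(names, feed_dict={"YOLO/net1:0": image})
# print(first_nonfinite_layer(dict(zip(names, values))))
```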

Is this a problem with the code, or am I missing something, for example about the image input?

@sjain-stanford sjain-stanford self-assigned this Feb 27, 2019
@sjain-stanford
Collaborator

sjain-stanford commented Feb 27, 2019

@fmehmetun Thanks for reporting this. After a little digging, this seems to be due to different weight offsets in the .weights file (16 vs. 20 bytes) for different major/minor versions of the file. yolov2-tiny, yolov3-tiny and yolov3 seem to require an offset of 20 instead of 16. If the offset is not set properly, the converted TF weights (ckpt) can be corrupted, which likely caused the nans you reported.
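A minimal sketch of the offset logic, assuming the header layout used by darknet's own weight loader: three little-endian int32 values (major, minor, revision), then a `seen` counter stored as int64 when `major*10 + minor >= 2` (and both are below 1000), otherwise as int32. This is not the converter's actual code; the function name and header values are hypothetical. It also shows why a wrong offset corrupts everything: reading the float32 payload 4 bytes too early shifts every value by one slot.

```python
import struct

import numpy as np


def darknet_weights_offset(header: bytes) -> int:
    """Return the byte offset where layer weights start (assumed layout above)."""
    major, minor, _revision = struct.unpack("<3i", header[:12])
    if major * 10 + minor >= 2 and major < 1000 and minor < 1000:
        return 12 + 8  # 20 bytes: `seen` is a 64-bit counter
    return 12 + 4      # 16 bytes: `seen` is a 32-bit counter


# Hypothetical headers (version 0.2.0 vs. 0.1.0, arbitrary `seen` values).
new_header = struct.pack("<3iq", 0, 2, 0, 1000)
old_header = struct.pack("<4i", 0, 1, 0, 1000)
print(darknet_weights_offset(new_header))  # 20
print(darknet_weights_offset(old_header))  # 16

# A fake payload of five float32 weights after a 20-byte header.
payload = np.arange(1, 6, dtype="<f4").tobytes()
data = new_header + payload
good = np.frombuffer(data, dtype="<f4", offset=20)  # [1. 2. 3. 4. 5.]
bad = np.frombuffer(data, dtype="<f4", offset=16)   # [0. 1. 2. 3. 4. 5.]
print(good, bad)
```

With the shifted read, each layer's biases, scales, rolling statistics and kernels all pick up their neighbor's values, so the damage is not confined to one layer.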

Fortunately someone fixed this for darkflow in this PR. From a quick test, it seems to resolve your issue. I'll run some more tests and push the fix shortly.

@sjain-stanford
Collaborator

@fmehmetun - give it a try and let me know if you see any other issues.

@galinator9000
Author

galinator9000 commented Feb 28, 2019

Thanks for the fix. I just tried it and it's working without problems. After opening the issue I also tried darkflow, and it worked without problems too. It's good to know I have another option for conversion. Thanks.
