Xavier filler and inner product parameters #1575
@denizyuret:
XavierFiller in filler.hpp takes a blob and fills it with U[-scale, scale], where scale = sqrt(Dtype(3) / fan_in) and fan_in is calculated as blob->count() / blob->num(). Now if this is a parameter blob (I am assuming it is, since we are passing it to a filler), blob->num() is always 1 (in fact the GaussianFiller checks for this condition), so we are in effect scaling all weights by sqrt(3 / count) instead of sqrt(3 / fan_in). Isn't the fan_in simply blob->width()?
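To make the arithmetic concrete, here is a minimal standalone sketch of the scale computation described above (not the actual filler.hpp code; Dtype is simplified to float), applied to the two parameter blob shapes discussed in this thread:

```cpp
#include <cmath>
#include <cstdio>

// Sketch of the pre-fix XavierFiller logic:
//   fan_in = blob->count() / blob->num(), weights ~ U[-scale, scale].
float xavier_scale(int count, int num) {
  const int fan_in = count / num;   // blob->count() / blob->num()
  return std::sqrt(3.0f / fan_in);  // scale = sqrt(Dtype(3) / fan_in)
}

int main() {
  // Conv blob 96 x 3 x 11 x 11: num() = 96, fan_in = 3 * 11 * 11 = 363 (correct).
  std::printf("conv scale: %f\n", xavier_scale(96 * 3 * 11 * 11, 96));
  // IP blob 1 x 1 x 1000 x 1024: num() = 1, fan_in = 1024000 (should be 1024).
  std::printf("ip scale:   %f\n", xavier_scale(1 * 1 * 1000 * 1024, 1));
  return 0;
}
```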
@denizyuret:
I (vaguely) understand what you mean for convolutional layers. How about an inner product / fully-connected layer? In the example given below, the layer has 1000 outputs and 1024 inputs, yet we compute fan_in as 1024000. Are we not supposed to use Xavier for these layers? And conversely, are we not supposed to use sparse Gaussian fillers for convolutional layers (there is a CHECK_EQ(blob->num(), 1) in there)? Thanks. From http://caffe.berkeleyvision.org/tutorial/net_layer_blob.html: "Parameter blob dimensions vary according to the type and configuration of the layer. For a convolution layer with 96 filters of 11 x 11 spatial dimension and 3 inputs the blob is 96 x 3 x 11 x 11. For an inner product / fully-connected layer with 1000 output channels and 1024 input channels the parameter blob is 1 x 1 x 1000 x 1024."
It should apply the same to inner_product layers, so the fan-in should be 1024, since the 1000 are outputs, not inputs. However, given the current way the weights are defined for inner_product layers (a 1 x 1 x 1000 x 1024 blob), count() / num() comes out to 1024000 instead.
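To put numbers on the discrepancy: the intended scale would be sqrt(3 / 1024) ≈ 0.054, while count() / num() = 1024000 gives sqrt(3 / 1024000) ≈ 0.0017, so the weights start out roughly sqrt(1000) ≈ 32 times smaller than Xavier initialization intends.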
@Yangqing I think we should redefine the way the weights and bias are defined for inner_product layers.
It was an unfortunate mistake. I propose not to allow InnerProductLayer to have an arbitrary input shape: we always require a FlattenLayer before InnerProductLayer if the shape does not correspond to a vector (which is what I am doing in my refactoring). FlattenLayer will just share data pointers, so no additional cost is incurred.
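As a rough sketch of why the flatten is free (this is not Caffe's FlattenLayer, just the pointer-sharing idea, using a hypothetical simplified Blob struct):

```cpp
#include <memory>
#include <vector>

// Hypothetical, simplified blob: shared storage plus a logical shape.
struct Blob {
  std::shared_ptr<std::vector<float>> data;
  std::vector<int> shape;
};

// Flattening only aliases the bottom's storage and rewrites the shape;
// no data is copied, so the extra layer costs essentially nothing.
void flatten(const Blob& bottom, Blob* top) {
  top->data = bottom.data;  // share the pointer, as described above
  int count = 1;
  for (int d : bottom.shape) count *= d;
  top->shape = {bottom.shape[0], count / bottom.shape[0]};  // N x (C*H*W)
}
```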
Sergio:
I propose to make InnerProductLayer behave as a ConvolutionalLayer, with the weights defined the same way.
Note that #1970 should fix this issue, since the weights for InnerProductLayer are now stored as a 2D blob. With the new 2D convention, the formula used by XavierFiller, count() / num(), yields the correct fan-in.
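Assuming the post-#1970 weight shape is (num_output, fan_in) = (1000, 1024) for the running example, the same formula now lands on the intended value:

```cpp
#include <cassert>
#include <cmath>

int main() {
  // 2D inner-product weight blob: (num_output, input_dim) = (1000, 1024).
  const int count = 1000 * 1024;  // blob->count()
  const int num = 1000;           // blob->num(): leading axis is now num_output
  assert(count / num == 1024);    // fan_in is the true input dimension
  const float scale = std::sqrt(3.0f / (count / num));  // ≈ 0.054, as intended
  (void)scale;                    // silence unused-variable warnings
  return 0;
}
```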
Fixed by #1970. Thanks for the catch and discussion everyone.