Xavier filler and inner product parameters #1575

Closed
denizyuret opened this issue Dec 14, 2014 · 9 comments

Comments

@denizyuret

XavierFiller in filler.hpp takes a blob and fills it with U[-scale, scale], where scale = sqrt(Dtype(3) / fan_in) and fan_in is calculated as blob->count() / blob->num(). Now, if this is a parameter blob (I am assuming it is, since we are passing it to a filler), blob->num() is always 1 (in fact, the GaussianFiller checks for this condition), so we are in effect using a scale of sqrt(3 / count) instead of sqrt(3 / fan_in). Isn't the fan_in simply blob->width()?
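
For reference, here is a minimal standalone sketch of that logic (a paraphrase of what I read in filler.hpp, not the actual code; the xavier_fill name is just for illustration, and std::uniform_real_distribution stands in for caffe_rng_uniform):

    #include <cmath>
    #include <random>
    #include <vector>

    // Paraphrase of the filler: fan_in is taken as count / num, and the blob
    // is filled with U[-scale, scale] where scale = sqrt(3 / fan_in).
    std::vector<float> xavier_fill(int num, int channels, int height, int width) {
      const int count = num * channels * height * width;
      const int fan_in = count / num;  // = channels * height * width
      const float scale = std::sqrt(3.0f / fan_in);
      std::mt19937 rng(0);
      std::uniform_real_distribution<float> uniform(-scale, scale);
      std::vector<float> values(count);
      for (float& v : values) v = uniform(rng);
      return values;
    }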

@sguada
Contributor

sguada commented Dec 16, 2014

@denizyuret blob->num() can be greater than 1. Each blob contains count = num x channels x height x width, and fan_in = channels x height x width, which means that fan_in = count / num.
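
For example, for the 96 x 3 x 11 x 11 convolution parameter blob from the tutorial, count = 96 x 3 x 11 x 11 = 34848 and num = 96, so count / num = 3 x 11 x 11 = 363, which is exactly the fan-in of each filter.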

sguada closed this as completed Dec 16, 2014
@denizyuret
Author

I (vaguely) understand what you mean for convolutional layers. How about an inner product / fully-connected layer? In the example given below, the layer has 1000 outputs and 1024 inputs, and we are computing fan_in as 1024000. Are we not supposed to use Xavier for these layers? And conversely, are we not supposed to use sparse Gaussian fillers for convolutional layers (there is a CHECK_EQ(blob->num(), 1) in there)? --thanks.

From http://caffe.berkeleyvision.org/tutorial/net_layer_blob.html:

"Parameter blob dimensions vary according to the type and configuration of the layer. For a convolution layer with 96 filters of 11 x 11 spatial dimension and 3 inputs the blob is 96 x 3 x 11 x 11. For an inner product / fully-connected layer with 1000 output channels and 1024 input channels the parameter blob is 1 x 1 x 1000 x 1024."

@sguada
Contributor

sguada commented Dec 26, 2014

The same should apply to inner_product layers, so the fan-in should be 1024, since the 1000 are outputs, not inputs. However, given that the weights for inner_product layers are currently defined as

    this->blobs_[0].reset(new Blob<Dtype>(1, 1, N_, K_));

instead of

    this->blobs_[0].reset(new Blob<Dtype>(N_, K_, 1, 1));

there will be a problem with using the "xavier" filler with inner_product layers.
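
Concretely, with the current (1, 1, N_, K_) shape the filler sees count() / num() = (N_ x K_) / 1 = N_ x K_ (the 1024000 above), whereas with (N_, K_, 1, 1) it would see (N_ x K_) / N_ = K_ = 1024, the correct fan-in.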

sguada reopened this Dec 26, 2014
@sguada
Contributor

sguada commented Dec 26, 2014

@Yangqing I think we should redefine the way the weights and bias are defined for inner_product layers; although that could break reading saved models, it could be fixed.

shelhamer changed the title from "bug in xavierFiller" to "Xavier filler and inner product parameters" Jan 16, 2015
@shelhamer
Member

@sguada @Yangqing I agree the current shape of the inner product params is unfortunate. We could switch them as part of the next protobuf definition but it adds a step to the conversion.

@Yangqing
Member

It was an unfortunate mistake. I propose not allowing InnerProductLayer to have an arbitrary input shape - we always require a FlattenLayer before InnerProductLayer if the shape does not correspond to a vector (which is what I am doing in my refactoring). FlattenLayer will just share data pointers, so no additional cost will be incurred.

@sguada
Contributor

sguada commented Feb 23, 2015

I propose to make InnerProductLayer behave as a ConvolutionalLayer with kernel equal to the bottom size.

@seanbell

Note that #1970 should fix this issue, since the weights for InnerProductLayer are updated to be 2D with shape output x input.

With the new 2D convention, the formula used by XavierFiller now works: count() / num() = (output * input) / output = input, as required.
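
For the 1000 x 1024 inner product weights discussed above, for example, count() / num() = (1000 * 1024) / 1000 = 1024, so the filler draws from U[-scale, scale] with scale = sqrt(3 / 1024), as intended.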

@shelhamer
Member

Fixed by #1970. Thanks for the catch and discussion everyone.
