Xavier filler and inner product parameters #1575

Closed
denizyuret opened this issue Dec 14, 2014 · 9 comments

Comments

@denizyuret

XavierFiller in filler.hpp takes a blob and fills it with U[-scale, scale], where scale = sqrt(Dtype(3) / fan_in) and fan_in is calculated as blob->count() / blob->num(). Now, if this is a parameter blob (I am assuming it is, since we are passing it to a filler), blob->num() is always 1 (in fact, the GaussianFiller checks for this condition), so we are in effect using a scale of sqrt(3 / count) instead of sqrt(3 / fan_in). Isn't the fan_in simply blob->width()?
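
For reference, here is a minimal standalone sketch of that logic (a paraphrase of what I read in filler.hpp, not the actual code; the xavier_fill name is just for illustration, and std::uniform_real_distribution stands in for caffe_rng_uniform):

    #include <cmath>
    #include <random>
    #include <vector>

    // Paraphrase of the filler: fan_in is taken as count / num, and the blob
    // is filled with U[-scale, scale] where scale = sqrt(3 / fan_in).
    std::vector<float> xavier_fill(int num, int channels, int height, int width) {
      const int count = num * channels * height * width;
      const int fan_in = count / num;  // = channels * height * width
      const float scale = std::sqrt(3.0f / fan_in);
      std::mt19937 rng(0);
      std::uniform_real_distribution<float> uniform(-scale, scale);
      std::vector<float> values(count);
      for (float& v : values) v = uniform(rng);
      return values;
    }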

@sguada
Contributor

sguada commented Dec 16, 2014

@denizyuret blob->num() can be greater than 1. Each blob contains count = num x channels x height x width, and fan_in = channels x height x width, which means that fan_in = count / num.
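
For example, for the 96 x 3 x 11 x 11 convolution parameter blob from the tutorial, count = 96 x 3 x 11 x 11 = 34848 and num = 96, so count / num = 3 x 11 x 11 = 363, which is exactly the fan-in of each filter.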

sguada closed this as completed Dec 16, 2014
@denizyuret
Author

I (vaguely) understand what you mean for convolutional layers. How about an inner product / fully-connected layer? In the example given below, the layer has 1000 outputs and 1024 inputs, and we are computing fan_in as 1024000. Are we not supposed to use Xavier for these layers? And conversely, are we not supposed to use sparse Gaussian fillers for convolutional layers (there is a CHECK_EQ(blob->num(), 1) in there)? --thanks.

From http://caffe.berkeleyvision.org/tutorial/net_layer_blob.html:

"Parameter blob dimensions vary according to the type and configuration of the layer. For a convolution layer with 96 filters of 11 x 11 spatial dimension and 3 inputs the blob is 96 x 3 x 11 x 11. For an inner product / fully-connected layer with 1000 output channels and 1024 input channels the parameter blob is 1 x 1 x 1000 x 1024."

@sguada
Contributor

sguada commented Dec 26, 2014

The same should apply to inner_product layers, so the fan-in should be 1024, since the 1000 are outputs, not inputs. However, given that the weights for inner_product layers are currently defined as

    this->blobs_[0].reset(new Blob<Dtype>(1, 1, N_, K_));

instead of

    this->blobs_[0].reset(new Blob<Dtype>(N_, K_, 1, 1));

there will be a problem with using the "xavier" filler with inner_product layers.
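
Concretely, with the current (1, 1, N_, K_) shape the filler sees count() / num() = (N_ x K_) / 1 = N_ x K_ (the 1024000 above), whereas with (N_, K_, 1, 1) it would see (N_ x K_) / N_ = K_ = 1024, the correct fan-in.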

sguada reopened this Dec 26, 2014
@sguada
Contributor

sguada commented Dec 26, 2014

@Yangqing I think we should redefine the way the weights and bias are defined for inner_product layers; although that could break reading saved models, it could be fixed.

shelhamer changed the title from "bug in xavierFiller" to "Xavier filler and inner product parameters" Jan 16, 2015
@shelhamer
Member

@sguada @Yangqing I agree the current shape of the inner product params is unfortunate. We could switch them as part of the next protobuf definition but it adds a step to the conversion.

@Yangqing
Member

It was an unfortunate mistake. I propose not allowing InnerProductLayer to have an arbitrary input shape - we always require a FlattenLayer before InnerProductLayer if the shape does not correspond to a vector (which is what I am doing in my refactoring). FlattenLayer will just share data pointers, so no additional cost will be incurred.

@sguada
Contributor

sguada commented Feb 23, 2015

I propose to make InnerProductLayer behave as a ConvolutionalLayer with kernel equal to the bottom size.

@seanbell

Note that #1970 should fix this issue, since the weights for InnerProductLayer are updated to be 2D with shape output x input.

With the new 2D convention, the formula used by XavierFiller now works: count() / num() = (output * input) / output = input, as required.
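
For the 1000 x 1024 inner product weights discussed above, for example, count() / num() = (1000 * 1024) / 1000 = 1024, so the filler draws from U[-scale, scale] with scale = sqrt(3 / 1024), as intended.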

@shelhamer
Member

Fixed by #1970. Thanks for the catch and discussion everyone.
