Failed in fine-tuning inception_v3 #302

JingyunLiang · 2017-10-18T06:55:22Z

I failed in using inception_v3 on my own dataset. (Ubuntu14.04, cuda8.0, python3.6.2)

It outputs a warning when the model is loaded:

/home/ljy/anaconda3/lib/python3.6/site-packages/torchvision-0.1.9-py3.6.egg/torchvision/models/inception.py:65: UserWarning: src is not broadcastable to dst, but they have the same number of elements.  Falling back to deprecated pointwise behavior.

It failed while training:

Traceback (most recent call last):
  File "/home/ljy/pytorch-examples-master/cub_pytorch/main.py", line 382, in <module>
    main()
  File "/home/ljy/pytorch-examples-master/cub_pytorch/main.py", line 213, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "/home/ljy/pytorch-examples-master/cub_pytorch/main.py", line 251, in train
    loss = criterion(output, target_var)
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 482, in forward
    self.ignore_index)
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 746, in cross_entropy
    return nll_loss(log_softmax(input), target, weight, size_average, ignore_index)
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/nn/functional.py", line 537, in log_softmax
    return _functions.thnn.LogSoftmax.apply(input)
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/nn/_functions/thnn/auto.py", line 126, in forward
    ctx._backend = type2backend[type(input)]
  File "/home/ljy/anaconda3/lib/python3.6/site-packages/torch/_thnn/__init__.py", line 15, in __getitem__
    return self.backends[name].load()
KeyError: <class 'tuple'>

alykhantejani · 2017-10-18T09:00:06Z

Hi @MichaelLiang12,

What PyTorch version are you using (found by torch.__version__)? Also, can you provide us with a minimal working example to reproduce this?

Thanks

alykhantejani · 2017-10-18T09:44:44Z

Also the user warning you are getting when loading the model is fixed in master (via #231)

jamiechoi1995 · 2017-10-27T02:35:37Z

Same issue:

(tensorflow) wcai@tdtd-desktop ~/tensorflow/AI_competition/pytorch $ python main.py -a inception_v3 . --pretrained
=> using pre-trained model 'inception_v3'
/home/wcai/tensorflow/lib/python3.5/site-packages/torchvision/models/inception.py:65: UserWarning: src is not broadcastable to dst, but they have the same number of elements. Falling back to deprecated pointwise behavior.
  m.weight.data.copy_(values)
Traceback (most recent call last):
  File "main.py", line 353, in <module>
    main()
  File "main.py", line 176, in main
    train(train_loader, model, criterion, optimizer, epoch)
  File "main.py", line 214, in train
    loss = criterion(output, target_var)
  File "/home/wcai/tensorflow/lib/python3.5/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/wcai/tensorflow/lib/python3.5/site-packages/torch/nn/modules/loss.py", line 482, in forward
    self.ignore_index)
  File "/home/wcai/tensorflow/lib/python3.5/site-packages/torch/nn/functional.py", line 746, in cross_entropy
    return nll_loss(log_softmax(input), target, weight, size_average, ignore_index)
  File "/home/wcai/tensorflow/lib/python3.5/site-packages/torch/nn/functional.py", line 537, in log_softmax
    return _functions.thnn.LogSoftmax.apply(input)
  File "/home/wcai/tensorflow/lib/python3.5/site-packages/torch/nn/_functions/thnn/auto.py", line 126, in forward
    ctx._backend = type2backend[type(input)]
  File "/home/wcai/tensorflow/lib/python3.5/site-packages/torch/_thnn/__init__.py", line 15, in __getitem__
    return self.backends[name].load()
KeyError: <class 'tuple'>

Python: Python 3.5.2

print(torch.__version__)
0.2.0_3

alykhantejani · 2017-10-27T08:12:26Z

Hi @jamiechoi1995,

Can you provide a minimal working example of this failing (i.e. an input that causes this when you pass it to the model)?

From the stack trace it seems like the input to the loss is a tuple, instead of a Variable.

jamiechoi1995 · 2017-10-30T06:02:12Z

Hi @alykhantejani

You can reproduce this problem by using the code in https://github.com/pytorch/examples/tree/master/imagenet

I modified the rescale and crop sizes to 299 for inception v3,
and my training and validation data are jpg files with corresponding json files.

Using the same code with a size of 224 and a resnet model is OK,
but when I switch to inception v3, I get this problem.

Thanks.

TiRune · 2017-11-01T11:01:03Z

Isn't this problem caused by the aux branch in the network? If you remove it, it should work :)

alykhantejani · 2017-11-01T16:41:59Z

@jamiechoi1995 @MichaelLiang12, @TiRune is correct, inception_v3 has an aux branch, and if this is not disabled the forward function will return a tuple (see here), which when passed to the criterion will throw this error.

So you have two choices:

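1. Disable `aux_logits` when the model is created [here](https://github.com/pytorch/examples/blob/master/imagenet/main.py#L75) by also passing `aux_logits=False` to the `inception_v3` function.
2. Edit your `train` function to accept and unpack the returned tuple [here](https://github.com/pytorch/examples/blob/master/imagenet/main.py#L194), to be something like:

output, aux = model(input_var)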
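
As a minimal sketch of the second choice, the helper below accepts either a single output or an `(output, aux)` pair and returns one loss. The function name `inception_loss` and the `aux_weight` default of 0.4 are assumptions for illustration, not part of the imagenet example; the toy criterion works on plain numbers only to show the control flow.

```python
def inception_loss(criterion, model_output, target, aux_weight=0.4):
    """Combine main and auxiliary losses when the model returns a pair.

    aux_weight=0.4 is an assumed default; tune it for your setup.
    """
    if isinstance(model_output, tuple):
        # Training mode with aux_logits enabled: (main logits, aux logits).
        output, aux = model_output
        return criterion(output, target) + aux_weight * criterion(aux, target)
    # aux_logits=False or eval mode: a single output comes back.
    return criterion(model_output, target)


# Stand-in criterion on plain numbers, used only to exercise the helper.
toy_criterion = lambda output, target: abs(output - target)
print(inception_loss(toy_criterion, (1.0, 2.0), 0.0))  # pair output
print(inception_loss(toy_criterion, 1.0, 0.0))         # single output
```

With a real model, `criterion` would be something like `nn.CrossEntropyLoss()` and `model_output` the result of `model(input_var)`.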

rajasekharponakala · 2019-03-23T17:24:24Z

@alykhantejani: Hi, why do we have to disable aux_logits? What are these aux_logits? Do they affect training/validation?

I'm trying to reproduce the accuracy of a model trained with bvlc_googlenet (without pretrained weights). When I turn the aux branch off in PyTorch (googlenet), it works and reports a val_acc of 50%, which is very low compared to Caffe. Are there any other methods to reproduce the same accuracy using PyTorch?
Thanks.

@jamiechoi1995 @MichaelLiang12, @TiRune is correct, inception_v3 has an aux branch, and if this is not disabled the forward function will return a tuple (see here), which when passed to the criterion will throw this error.

So you have two choices:
1. disable `aux_logits` when the model is created [here](https://github.com/pytorch/examples/blob/master/imagenet/main.py#L75) by also passing `aux_logits=False` to the `inception_v3` function.

2. edit your `train` function to accept and unpack the returned tuple [here](https://github.com/pytorch/examples/blob/master/imagenet/main.py#L194) to be something like:
output, aux = model(input_var)

fmassa · 2019-03-24T17:49:30Z

@rajasekharponakala the aux_logits is a separate classifier that is added to help during training, but it is not used during inference.

I'm trying to reproduce the accuracy of a model trained with bvlc_googlenet (without pretrained weights). When I turn the aux branch off in PyTorch (googlenet), it works and reports a val_acc of 50%, which is very low compared to Caffe. Are there any other methods to reproduce the same accuracy using PyTorch?

Both googlenet and inception_v3 use pre-trained weights from TensorFlow, and as far as I know we didn't manage to reproduce accuracies from the paper when training from scratch.

rajasekharponakala · 2019-03-25T15:48:58Z

Hi @fmassa, thanks. Following a post on the PyTorch Discourse forum, I added the lines below to train() in the imagenet example.

output, aux = model(input_var)
loss1 = criterion(output, target)
loss2 = criterion(aux, target)
loss = loss1 + 0.4*loss2

but it ended with an error:

Traceback (most recent call last):
  File "imagenet.py", line 407, in <module>
    main()
  File "imagenet.py", line 114, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "imagenet.py", line 240, in main_worker
    train(train_loader, model, criterion, optimizer, epoch, args)
  File "imagenet.py", line 281, in train
    output, aux = model(input)
ValueError: too many values to unpack (expected 2)

Any idea?

fmassa · 2019-03-25T18:30:32Z

you need to set your model to train() mode, it's probably in eval mode

rajasekharponakala · 2019-03-25T18:49:53Z

Thanks. Yes, I'm following the example/imagenet/main.py script:

def main():
    ...
def main_worker():
    ...
def train():
    ...
    model.train()
    ...
    outputs, aux_outputs = model(inputs)
    loss1 = criterion(outputs, target)
    loss2 = criterion(aux_outputs, target)
    loss = loss1 + 0.4*loss2
def validate():
    ...
    model.eval()
    ...
    outputs = model(inputs)
    loss = criterion(outputs, target)
    ...
def adjust_learning_rate():
    ...
def accuracy():
    ...

I found another method on Discourse:

        output = model(input) 
        loss = None
        # for nets that have multiple outputs such as inception
        if isinstance(output, tuple):
            loss = sum((criterion(o,target) for o in output))
        else:
            loss = criterion(output, target)

This time it throws a different error:

Traceback (most recent call last):
  File "imagenet.py", line 417, in <module>
    main()
  File "imagenet.py", line 114, in main
    main_worker(args.gpu, ngpus_per_node, args)
  File "imagenet.py", line 240, in main_worker
    train(train_loader, model, criterion, optimizer, epoch, args)
  File "imagenet.py", line 298, in train
    acc1, acc5 = accuracy(output, target, topk=(1, 5))
  File "imagenet.py", line 405, in accuracy
    _, pred = output.topk(maxk, 1, True, True)
AttributeError: 'tuple' object has no attribute 'topk'
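
The AttributeError above comes from passing the whole tuple to accuracy(). One possible way around it, sketched below, is to pick out the main classifier output before computing metrics. The helper name `main_logits` is hypothetical, and the position of the main output inside the tuple depends on the model and torchvision version, so it is passed explicitly rather than assumed.

```python
def main_logits(model_output, main_index):
    """Return the main classifier output from a single output or a tuple.

    main_index is the position of the main logits inside the tuple; check
    the model's forward() to find it, since it is not the same everywhere.
    """
    if isinstance(model_output, tuple):
        return model_output[main_index]
    return model_output


# Hypothetical usage inside train(), where -1 assumes the main logits come last:
#   acc1, acc5 = accuracy(main_logits(output, -1), target, topk=(1, 5))
```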

fmassa · 2019-03-25T18:54:51Z

The issue is that both googlenet and inception can return auxiliary classifiers in training mode.
Your code is not taking that into account, or you didn't set aux classifiers. Double-check that and you'll be able to find the issue.

rajasekharponakala · 2019-03-25T19:01:08Z

Yeah. main_worker() is set to:

if args.pretrained:
    print("=> using pre-trained model '{}'".format(args.arch))
    model = models.__dict__[args.arch](pretrained=True)
else:
    print("=> creating model '{}'".format(args.arch))
    model = models.__dict__[args.arch](aux_logits=True)

and also vision/models/googlenet.py has

class GoogLeNet(nn.Module):

    def __init__(self, num_classes=1000, aux_logits=True, transform_input=False, init_weights=True):
        super(GoogLeNet, self).__init__()
        self.aux_logits = aux_logits
        self.transform_input = transform_input
        ...

    def forward(self, x):  # uses self.aux_logits
        ...

TheCodez · 2019-03-25T19:58:35Z

@rajasekharponakala one thing to note here is that GoogLeNet has two aux branches, whereas inception v3 only has one.

So for GoogLeNet you have to use:
aux1, aux2, output = model(inputs)

rajasekharponakala · 2019-03-25T20:25:20Z

@TheCodez: Thanks, it's working now!
The format:

aux1, aux2, output = model(inputs)
loss1 = criterion(output, target)
loss2 = criterion(aux1, target)
loss3 = criterion(aux2, target)
loss = loss1 + 0.4*(loss2+loss3)

TheCodez · 2019-03-25T20:30:58Z

@rajasekharponakala the correct weighting scheme for GoogLeNet is using 0.3:

aux1, aux2, output = model(inputs)
loss1 = criterion(output, target)
loss2 = criterion(aux1, target)
loss3 = criterion(aux2, target)
loss = loss1 + 0.3 * (loss2 + loss3)

rajasekharponakala · 2019-03-25T20:31:36Z

Yeah, thanks.

tejasri19 · 2019-07-10T06:59:54Z

@TheCodez @fmassa @alykhantejani @rajasekharponakala Do we have to enable the auxiliary classifiers in test mode? I get very poor test accuracy when I load my trained model (the auxiliary classifiers are enabled there). I'm using the inception v3 model for my task!

fmassa · 2019-07-10T08:37:51Z

@tejasri19 for inference, don't forget to set your model to eval() mode.

You don't need to use the aux classifiers for inference, only for training
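
To illustrate why eval() matters here, the toy stand-in below mimics the train/eval behavior described in this thread. It is not torchvision code: the class `ToyAuxModel` and its arithmetic are made up for illustration, and only the branching on training mode and `aux_logits` is meant to resemble the real models.

```python
class ToyAuxModel:
    """Stand-in for a network with one auxiliary classifier."""

    def __init__(self, aux_logits=True):
        self.aux_logits = aux_logits
        self.training = True  # like nn.Module, starts in training mode

    def train(self):
        self.training = True
        return self

    def eval(self):
        self.training = False
        return self

    def __call__(self, x):
        main = x * 2  # placeholder for the main classifier
        if self.training and self.aux_logits:
            aux = x * 3  # placeholder for the auxiliary classifier
            return main, aux
        return main


model = ToyAuxModel()
train_out = model.train()(1)  # a pair: (main, aux)
eval_out = model.eval()(1)    # a single value: main only
print(type(train_out).__name__, type(eval_out).__name__)
```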

Holmeyoung · 2019-07-16T01:57:50Z

Hi, I have a question. In https://github.com/pytorch/vision/blob/master/torchvision/models/googlenet.py
the code is:

        if self.training and self.aux_logits:
            return _GoogLeNetOutputs(x, aux2, aux1)
        return x

_GoogLeNetOutputs = namedtuple('GoogLeNetOutputs', ['logits', 'aux_logits2', 'aux_logits1'])

So should it be
output, aux2, aux1 = model(inputs)
rather than
aux1, aux2, output = model(inputs)

Is that right? Thanks.

fmassa · 2019-07-16T08:28:53Z

It should be output, aux2, aux1.
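
Since the return value is a namedtuple, one way to avoid depending on the order at all is to read the fields by name. The sketch below uses a local stand-in, `ToyGoogLeNetOutputs`, with the field names quoted above; the helper `googlenet_loss` and the numeric toy criterion are assumptions for illustration, and the 0.3 weight is the one suggested earlier in this thread.

```python
from collections import namedtuple

# Local stand-in with the same field names as the namedtuple quoted above.
ToyGoogLeNetOutputs = namedtuple(
    "ToyGoogLeNetOutputs", ["logits", "aux_logits2", "aux_logits1"]
)


def googlenet_loss(criterion, outputs, target, aux_weight=0.3):
    """Weighted sum of main and auxiliary losses, using field names only."""
    main = criterion(outputs.logits, target)
    aux = criterion(outputs.aux_logits1, target) + criterion(outputs.aux_logits2, target)
    return main + aux_weight * aux


# Stand-in criterion on plain numbers, used only to exercise the helper.
toy_criterion = lambda output, target: abs(output - target)
outputs = ToyGoogLeNetOutputs(logits=1.0, aux_logits2=2.0, aux_logits1=3.0)
print(googlenet_loss(toy_criterion, outputs, 0.0))
```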

rasha-salim · 2020-05-16T10:21:24Z

Thanks for this thread, it really helped me. But now I'm getting this error when unpacking the model output:
output, aux1 = model(data)
ValueError: too many values to unpack (expected 2)

And even when I added an extra output to unpack:
output, aux2, aux1 = model(data)
I still get the following error:
not enough values to unpack (expected 3, got 2)

rasha-salim · 2020-05-16T10:46:42Z

I solved it by unpacking the outputs separately:
output = model(data).logits
aux1 = model(data).aux_logits
It seems that there are extra outputs, such as counts, that I don't believe we need for training.

TheCodez · 2020-05-16T11:16:05Z

@gamesMum I would advise not to do that, as you are essentially running your model twice.
Instead just use this once:
output = model(data)

and then access using:

output.logits
output.aux_logits
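
A small sketch of the difference, using a made-up `CountingModel` that only counts how often it is called (the class and the `ToyInceptionOutputs` namedtuple are stand-ins, not torchvision code):

```python
from collections import namedtuple

# Stand-in with the two field names used above.
ToyInceptionOutputs = namedtuple("ToyInceptionOutputs", ["logits", "aux_logits"])


class CountingModel:
    """Toy model that records how many forward passes were run."""

    def __init__(self):
        self.calls = 0

    def __call__(self, data):
        self.calls += 1
        return ToyInceptionOutputs(logits=data, aux_logits=data)


model = CountingModel()

# One forward pass, then read both fields from the same result.
output = model(5)
logits, aux = output.logits, output.aux_logits
calls_single = model.calls

# Calling the model once per field runs the forward pass twice.
logits = model(5).logits
aux = model(5).aux_logits
calls_double = model.calls - calls_single

print(calls_single, calls_double)
```

With a real network, the second pattern doubles the forward cost of each training step.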

rasha-salim · 2020-05-16T17:07:42Z

@TheCodez oh dear, how did I miss that!
Thanks for pointing this out

wlj567 · 2021-10-20T01:30:40Z

Traceback (most recent call last):
  File "/home/pxg/DAN/DANet/experiments/segmentation/train.py", line 282, in <module>
    trainer.training(epoch)
  File "/home/pxg/DAN/DANet/experiments/segmentation/train.py", line 214, in training
    loss = self.criterion(outputs, target)
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pxg/DAN/DANet/encoding/parallel.py", line 130, in forward
    return self.module(inputs, *targets[0], **kwargs[0])
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pxg/DAN/DANet/encoding/nn/loss.py", line 68, in forward
    return super(SegmentationLosses, self).forward(*outputs)
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 904, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/functional.py", line 1295, in log_softmax
    ret = input.log_softmax(dim)
AttributeError: 'tuple' object has no attribute 'log_softmax'

Hello, can you help me with my question? Thank you very much.

wlj567 · 2021-10-20T09:04:37Z

Traceback (most recent call last):
  File "/home/pxg/DAN/DANet/experiments/segmentation/train.py", line 283, in <module>
    trainer.training(epoch)
  File "/home/pxg/DAN/DANet/experiments/segmentation/train.py", line 215, in training
    loss = self.criterion(outputs, target)
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pxg/DAN/DANet/encoding/parallel.py", line 130, in forward
    return self.module(inputs, *targets[0], **kwargs[0])
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pxg/DAN/DANet/encoding/nn/loss.py", line 68, in forward
    return super(SegmentationLosses, self).forward(*inputs)
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 904, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/functional.py", line 1970, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/functional.py", line 1295, in log_softmax
    ret = input.log_softmax(dim)
AttributeError: 'tuple' object has no attribute 'log_softmax'

wlj567 · 2021-10-27T10:00:18Z

Traceback (most recent call last):
  File "/home/pxg/DAN/DANet/experiments/segmentation/train.py", line 292, in <module>
    trainer.training(epoch)
  File "/home/pxg/DAN/DANet/experiments/segmentation/train.py", line 214, in training
    aux1, aux2, outputs = self.model(image)
ValueError: not enough values to unpack (expected 3, got 1)

Hello, I modified my code according to the method above and it still reports an error. Can you help me look into it? Thank you very much.

TheCodez · 2021-10-27T11:07:46Z

@gamesMum I would advise not to do that, as you are essentially running your model twice. Instead just use this once: output = model(data)

and then access using:
output.logits
output.aux_logits

@wlj567 see if that works.

wlj567 · 2021-10-28T01:15:37Z

@TheCodez Hello, I don't quite understand how to modify it. Can you have a look?

def training(self, epoch):
    train_loss = 0.0
    self.model.train()
    tbar = tqdm(self.trainloader)
    for i, (image, target) in enumerate(tbar):
        self.scheduler(self.optimizer, i, epoch, self.best_pred)
        self.optimizer.zero_grad()
        outputs = self.model(image)
        loss = self.criterion(outputs, target)
        loss.backward()
        self.optimizer.step()
        train_loss += loss.item()
        tbar.set_description('Train loss: %.3f' % (train_loss / (i + 1)))

TheCodez · 2021-10-29T09:44:40Z

@wlj567 see if that works.

loss = self.criterion(outputs.logits, target)

wlj567 · 2021-10-30T07:09:21Z

  File "/home/pxg/DAN/DANet/experiments/segmentation/train.py", line 223, in training
    loss = self.criterion(outputs.logits, target)
AttributeError: 'tuple' object has no attribute 'logits'

I modified it according to this method, and it still reports an error.

wlj567 · 2021-10-30T07:22:25Z

def forward(self, *inputs):

    preds, target = tuple(inputs)
    inputs = tuple(list(preds) + [target])
    if not self.se_loss and not self.aux:
        return super(SegmentationLosses, self).forward(*inputs)
    elif not self.se_loss:
        pred1, pred2, target = tuple(inputs)
        loss1 = super(SegmentationLosses, self).forward(pred1, target)
        loss2 = super(SegmentationLosses, self).forward(pred2, target)
        return loss1 + self.aux_weight * loss2

I modified this part in loss.py and added

preds, target = tuple(inputs)
inputs = tuple(list(preds) + [target])

Train loss: 4.745: 0%| | 6/5052 [00:12<2:36:10, 1.86s/it]
With this change training starts, but it is very slow, and an error is still reported at the end.

wlj567 · 2021-10-30T07:31:48Z

Traceback (most recent call last):
  File "/home/pxg/DAN/DANet/experiments/segmentation/train.py", line 293, in <module>
    trainer.training(epoch)
  File "/home/pxg/DAN/DANet/experiments/segmentation/train.py", line 224, in training
    loss = self.criterion(outputs, target)
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pxg/DAN/DANet/encoding/parallel.py", line 130, in forward
    return self.module(inputs, *targets[0], **kwargs[0])
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/pxg/DAN/DANet/encoding/nn/loss.py", line 70, in forward
    return super(SegmentationLosses, self).forward(*inputs)
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 942, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/functional.py", line 2056, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/home/pxg/DAN/venv/lib/python3.6/site-packages/torch/nn/functional.py", line 1350, in log_softmax
    ret = input.log_softmax(dim)
AttributeError: 'tuple' object has no attribute 'log_softmax'

This is the original code error.

alykhantejani added the awaiting response label Oct 19, 2017

alykhantejani closed this as completed Nov 1, 2017

alykhantejani removed the awaiting response label Nov 1, 2017

csarofeen mentioned this issue Jul 11, 2018

AttributeError: 'tuple' object has no attribute 'log_softmax' when run inception_v3 NVIDIA/apex#27

Closed
