The loss becomes negative #1917
Comments
Do you really think that's enough information for anyone to be able to answer your question?
The loss is just a scalar that you are trying to minimize. It's not supposed to be positive! For instance, a cosine proximity loss will usually be negative (trying to make the proximity as high as possible by minimizing a negative scalar).
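For illustration, a minimal numpy sketch of that point (the vectors are made up), mirroring the sign convention of Keras' cosine_proximity loss:

```python
import numpy as np

def cosine_proximity(y_true, y_pred):
    # Negative cosine similarity, as in Keras' cosine_proximity loss:
    # minimizing it pushes the value toward -1 (perfect alignment).
    y_true = y_true / np.linalg.norm(y_true)
    y_pred = y_pred / np.linalg.norm(y_pred)
    return -np.sum(y_true * y_pred)

v = np.array([1.0, 2.0, 3.0])
print(cosine_proximity(v, v))    # -1.0, the best (lowest) possible value
print(cosine_proximity(v, -v))   #  1.0, the worst possible value
```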
Hi, thank you for your answers.
What is your training objective?
Hi, thank you for your help.
@FiReTiTi
If the loss cannot be negative, does that mean it overflows the encoding limits and wraps around into negative values?
I am still working on the same data, and here is another weird thing:
@FiReTiTi please give more information about your model if you want help with that. In your last case, your optimizer is likely stuck in a local minimum. That could explain why the loss stays identical over all the following iterations.
Here is the model. The dataset contains 70,000 images of size 31x31.
@FiReTiTi May I ask how you solved the negative loss problem? I ran into the same issue: my loss is a custom loss built from a bunch of MSE terms, so it shouldn't be negative either. It looks like this:
I haven't; it still happens from time to time :-(
@FiReTiTi Thanks for your reply. I found my problem: I use a custom loss and accidentally passed y_pred and y_true in the wrong order to my loss function, so maybe it's not the same cause in your case.
@sunshineatnoon So you were using a non-symmetric loss function like the cross entropy?
Here is my loss function, it's a bunch of squared values.
Cool for you!
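As an aside on the argument-order point above: Keras calls a loss as loss(y_true, y_pred), and for a non-symmetric loss such as crossentropy the two orders give different numbers. A minimal numpy sketch with made-up values:

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Crossentropy is not symmetric in its two arguments.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

y_true = np.array([0.0, 1.0, 1.0])
y_pred = np.array([0.1, 0.8, 0.9])
print(binary_crossentropy(y_true, y_pred))  # ~0.14, the intended value
print(binary_crossentropy(y_pred, y_true))  # ~2.15, silently wrong when swapped
```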
@FiReTiTi In that case, I think it's more likely an overflow. Do you use Theano? Maybe you can try NanGuardMode in Theano to see if it gives you any errors or warnings. I googled a lot last night and found that NaNs or Infs might cause this kind of error, such as this one.
That's also my opinion. Thanks for the tips, I will test them when it occurs again.
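For reference, a hedged sketch of what using Theano's NanGuardMode looks like (plain Theano shown here; per Theano's docs it can also be enabled globally via the flag mode=NanGuardMode, which would apply to a Keras model on the Theano backend):

```python
import numpy as np
import theano
import theano.tensor as T
from theano.compile.nanguardmode import NanGuardMode

x = T.matrix('x')
y = T.log(x)  # log(0) would silently produce -inf

# NanGuardMode raises an error as soon as a NaN, Inf, or very large value
# flows through the graph, instead of letting it silently corrupt the loss.
f = theano.function(
    [x], y,
    mode=NanGuardMode(nan_is_error=True, inf_is_error=True, big_is_error=True))

f(np.zeros((1, 1), dtype=theano.config.floatX))  # raises instead of returning -inf
```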
My loss is negative, what does that mean? I am using the TensorFlow backend.
Epoch 1/10
My code is here for reference:
```
import numpy as np

model = Sequential()
model.add(Convolution2D(3, 3, 32, border_mode='valid', dim_ordering='tf', input_shape=(150, 200, 3)))
model.add(Convolution2D(64, 3, 3, border_mode='valid'))
model.add(Flatten())

train_datagen = ImageDataGenerator(
model.compile(loss='binary_crossentropy',
test_datagen = ImageDataGenerator(rescale=1./255)
train_generator = train_datagen.flow_from_directory(
validation_generator = test_datagen.flow_from_directory(

model.fit_generator(train_generator, samples_per_epoch=2536, nb_epoch=10, validation_data=validation_generator, nb_val_samples=800)
model.save_weights('thesis.h5')
```
@zach-nervana FYI
Hello everyone. As we all know, the KLD loss cannot be negative, yet I am training a regression model and I get negative values.
Model:
```
base_model = VGG16(input_shape=(360, 480, 3), weights='imagenet', include_top=False)
```
Compile:
```
adam = Adam(lr=1e-5, beta_1=0.9, beta_2=0.999, epsilon=1e-8, decay=0.0)
```
The problem is: if I add a softmax layer at the end of the model, the loss is positive, which is fine, but it is around 32, which is really big. If I remove the softmax layer, the loss becomes negative. For the input and output: the inputs are images that I normalize to 0-1, and the labels are also 0-1.
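On the KLD point: KL divergence is only guaranteed to be non-negative when both y_true and y_pred are proper probability distributions, which is what the softmax enforces. A small numpy sketch with made-up numbers, mirroring the formula Keras uses for kullback_leibler_divergence:

```python
import numpy as np

def kld(y_true, y_pred, eps=1e-7):
    # sum(y_true * log(y_true / y_pred)) over the last axis, with clipping,
    # as in Keras' kullback_leibler_divergence.
    y_true = np.clip(y_true, eps, 1.0)
    y_pred = np.clip(y_pred, eps, 1.0)
    return np.sum(y_true * np.log(y_true / y_pred), axis=-1)

p = np.array([0.2, 0.3, 0.5])              # target distribution, sums to 1
print(kld(p, np.array([0.1, 0.4, 0.5])))   # ~0.05, non-negative as expected
print(kld(p, np.array([0.9, 0.9, 0.9])))   # ~-0.92: no softmax, outputs don't sum to 1
```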
@FiReTiTi Did you solve your problem? I had a similar problem. I used Theano as the backend, and the loss function is binary_crossentropy. During training the acc, val_acc, loss, and val_loss never changed in any epoch, and the loss value is very high, about 8. I used 4000 training samples and 1000 validation samples.
```
inputs_x=Input(shape=(1,65,21))
x=Conv2D(32,(5,5),padding='same',data_format='channels_first',activation='relu',use_bias=True)(x)
x=Dropout(0.55)(x)
inputs_y=Input(shape=(1,32,21))
y=Conv2D(32,(4,4),padding='same',data_format='channels_first',activation='relu',use_bias=True)(y)
y=Dropout(0.60)(y)
merged_input=keras.layers.concatenate([x,y],axis=-1)
z=Dense(16,activation='softmax')(merged_input)
outp=Dense(1,activation='softmax')(z)
model=Model(inputs=[inputs_x,inputs_y],outputs=outp)
history=model.fit(x=[train_inputs_x,train_inputs_y],y=train_label,batch_size=32,
```
Any ideas for this problem?
No. It looks like an overflow problem that stopped happening when I reduced the size of my model. Have you tried switching to TensorFlow as the backend? Things seem to be more stable for me since I switched to TensorFlow.
Ok, I will try switching the backend. Thanks.
@FiReTiTi Did you try to normalize your input? Inappropriate normalization of the input may lead to a gradient explosion problem.
@fregocap Yes, the inputs are normalized.
I had the same problem with a negative binary crossentropy loss.
The problem in my case was that the labels given by the generator were not 0 and 1 but several classes (0, 1, 2, ..., 6). The model unexpectedly did not fail but produced a negative loss. The solution is to use Dense(n_classes, activation='softmax').
When the binary cross entropy loss is negative, it is because the true values are not in [0, 1]. In my case I was using [-1, 1]. The model does not fail, but produces negative values.
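A quick numeric check of the last two comments, using the standard binary crossentropy formula with made-up numbers:

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Labels in {0, 1}: the loss is always >= 0.
print(binary_crossentropy(np.array([1.0]), np.array([0.9])))  # ~0.11

# A label outside [0, 1] (a class index of 2, or a -1/+1 encoding) makes the
# (1 - y_true) factor negative, so the loss can drop below zero.
print(binary_crossentropy(np.array([2.0]), np.array([0.9])))  # ~-2.09
```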
Thanks.
I got a negative loss when training an autoencoder on image data, normalizing the images to zero mean and unit std (so half of the data values are negative) and using the binary_crossentropy loss. Later I figured out that this happens because binary_crossentropy only behaves as a regression loss when the inputs are between 0 and 1, but in my case the inputs are also negative.
The answer is easy in my opinion: your data are not between 0 and 1, they are between 0 and 255. Just add a "/ 255" to your ground-truth data and the results will be positive.
Thanks Hamed. You are right.
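A minimal sketch of that fix (the array name and shape are made up): scale uint8 images to [0, 1] instead of standardizing them, so binary_crossentropy targets stay in its valid range:

```python
import numpy as np

# Made-up stand-in for the image data (uint8 values in 0-255).
x_train = np.random.randint(0, 256, size=(16, 31, 31, 1)).astype('float32')

# Zero-mean / unit-std scaling produces negative values, which breaks
# binary_crossentropy. Divide by 255 instead so inputs and targets are in [0, 1]:
x_train /= 255.0

# For an autoencoder, the same scaled array is both input and target:
# model.fit(x_train, x_train, ...)
```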
Quite a long time ago I also hit this issue and fixed it by changing the optimizer from Adam back to the default RMSprop.
I think it can also be the result of a high learning rate in some cases: the weights might become too large for TensorFlow to work properly. Sometimes when I see the loss growing, I try decreasing the learning rate and it works.
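For illustration, a sketch of where the learning rate is set (the tiny model is a made-up placeholder, and 1e-4 is just an example value below Adam's default of 1e-3):

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

# Placeholder model, only to show where the learning rate goes.
model = Sequential([Dense(1, activation='sigmoid', input_shape=(10,))])

# If the loss starts growing or blowing up, try recompiling with a smaller
# learning rate (or switch back to the default 'rmsprop' optimizer).
model.compile(loss='binary_crossentropy', optimizer=Adam(lr=1e-4))
```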
Hi,
I just ran a CNN built with Keras on a big training set, and I get weird loss values at each epoch (see below):
```
 66496/511502 [==>...........................] - ETA: 63s - loss: 8.2800
 66528/511502 [==>...........................] - ETA: 63s - loss: -204433556137039776.0000
345664/511502 [===================>..........] - ETA: 23s - loss: 8.3174
345696/511502 [===================>..........] - ETA: 23s - loss: -39342531075525840.0000
214080/511502 [===========>..................] - ETA: 41s - loss: 8.3406
214112/511502 [===========>..................] - ETA: 41s - loss: -63520753730220536.0000
```
How is that possible? Does the loss suddenly become so big that the value exceeds what the double-precision encoding can represent?
Is there a way to avoid it?
Regards,