Recurrent neural layers support (RNN, LSTM) via Caffe backend (Direct inputs to LSTM) #140
This is a good example to start working on and to reproduce: https://github.com/fchollet/keras/blob/master/examples/imdb_bidirectional_lstm.py
Good link to get started with: http://christopher5106.github.io/deep/learning/2016/06/07/recurrent-neural-net-with-Caffe.html
Hi @beniz,
Sure @kyrs. I'd suggest as a first goal that you take the Keras example and try to get similar results with DD. To do this, you may have to go through a few steps. First, you may want to write the required input code; in other words, once you've got the inputs into the shape Caffe expects, the rest should follow. This is easier said than done of course, and the input format to Caffe for LSTM may not fit exactly the existing DD code, but based on your experiments, we'll fix that up. Let me know how this sounds.
Hi @beniz. I was trying to implement Christopher Bourez's blog ( http://christopher5106.github.io/deep/learning/2016/06/07/recurrent-neural-net-with-Caffe.html ) on implementing LSTM using Caffe. The whole tutorial is based upon https://github.com/christopher5106/last_caffe_with_stn . I guess Caffe has officially added LSTM code in their master branch: BVLC/caffe#4629. Do you think there is a need to integrate a new lib just for LSTM, especially when Caffe already provides such functionality? One more point: I found the literature about the LSTM implementation in Caffe a little incomplete. Do you have any resources to learn more about it? This will be my first time coding with Caffe.
Finally managed to find some examples: https://github.com/jeffdonahue/caffe/tree/recurrent-rebase-cleanup/examples/coco_caption
|
Hi @beniz,
@kyrs OK thanks. Let me know if you need help, and reach out on gitter for instance. We can coordinate there.
@beniz as per our discussion I have modified my prototxt https://gist.github.com/kyrs/e93548079ab9954915122263cf845325 on the basis of the PR in https://github.com/beniz/deepdetect/pull/189 and have merged https://github.com/beniz/deepdetect/pull/174 into my forked branch. But on running the training process, it threw the following error: https://gist.github.com/kyrs/c9dc967bfd49e553cfed10668a4b19e4
|
Caffe documentation and examples are a mess, but the comment that explains the required inputs to the RNN and LSTM recurrent layers is here: BVLC/caffe#2033 (comment) As expected this requires modifying the way DD produces the inputs, in order to fit these requirements. I'll make more comments and post potential code to help with this.
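For context, the comment linked above describes the recurrent layers as taking inputs laid out time-major (T timesteps by N sequences), plus a sequence-continuation indicator that is 0 at the first timestep of each sequence and 1 while the sequence continues. A minimal numpy sketch of building that indicator (the function name and padding convention are hypothetical, not DD or Caffe API):

```python
import numpy as np

def continuation_markers(seq_lengths, T):
    """Build the T x N sequence-continuation blob expected by Caffe's
    recurrent layers: 0 where a new sequence begins, 1 while it
    continues, and 0 again past the end of a padded sequence."""
    N = len(seq_lengths)
    cont = np.zeros((T, N), dtype=np.float32)
    for n, length in enumerate(seq_lengths):
        # timestep 0 starts a new sequence, so its marker stays 0
        cont[1:min(length, T), n] = 1.0
    return cont

# Two sequences of lengths 4 and 2, padded to T=5 timesteps.
markers = continuation_markers([4, 2], T=5)
```

The key point is that the batch dimension is the second axis, not the first, which is the main rearrangement DD's input connectors would need.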
Found this interesting document, which explains LSTM integration with Caffe in more detail: http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-sequences.pdf
Hi @beniz, I am experimenting with DD to run LSTM. I need your suggestion on a few things. In the example given for training LSTM in Caffe, the value of \delta is explicitly created and stored in HDF5 format before training or testing the LSTM network; see https://github.com/BVLC/caffe/pull/2033/files#diff-c912186cd39ea15b5646c3b2f5350a7eR105 and https://github.com/BVLC/caffe/pull/2033/files#diff-3a0266c4b6244affd2fd7505a2452f5fR193. Do you think the user should formulate the values of this \delta based on his data and provide it in the .prototxt before training or testing, or should the binary value of this \delta be filled/created in Caffe during training or testing based on the batch data?
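On the second option raised here: since \delta is fully determined by where each sequence starts and ends within the batch, it can in principle be derived at batch-assembly time rather than precomputed and stored. A pure-numpy sketch of that idea, under the assumption (hypothetical, not how DD necessarily stores data) that padded timesteps are all-zero rows:

```python
import numpy as np

def derive_delta(batch):
    """Derive the T x N delta blob from a T x N x D padded batch:
    delta[t, n] = 0 at t == 0 (a sequence start) and on padded
    (all-zero) timesteps, 1 otherwise."""
    nonzero = np.abs(batch).sum(axis=2) > 0   # T x N mask of real timesteps
    delta = nonzero.astype(np.float32)
    delta[0, :] = 0.0   # first timestep always begins a new sequence
    return delta

batch = np.zeros((4, 2, 3), dtype=np.float32)
batch[:3, 0] = 1.0   # sequence 0 has 3 real timesteps
batch[:2, 1] = 1.0   # sequence 1 has 2 real timesteps
delta = derive_delta(batch)
```

The caveat is that an all-zero row could also be legitimate data, which is one reason the Caffe example precomputes \delta explicitly instead.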
|
Hi @beniz, I have a few doubts regarding the modifications. Finally I have started to get the hang of all these terms. Please correct me if I am going in the wrong direction.
Hi @kyrs! Yes, you can write two LMDB files; they will be synced if you write the entries in the same order. An alternative that DD already uses elsewhere is to put data and deltas into a single Datum and to slice the resulting blob accordingly when running. This requires adding a Slice layer after the data layer. If you look at the multiple-target regression models in DD, they already use the Slice layer. This post https://groups.google.com/forum/m/#!topic/caffe-users/RuT1TgwiRCo can help you with the slicing if you choose this route. Let me know how this goes!
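The single-Datum alternative can be pictured as storing each entry as [deltas | data] and letting a Slice layer with slice_point equal to the delta width separate them again at runtime. A numpy sketch of the same bookkeeping (all widths are hypothetical placeholders):

```python
import numpy as np

delta_width = 1   # one continuation marker per timestep
data_width = 8    # e.g. an 8-dim feature vector per timestep

# One flattened entry, as it would sit inside the Datum: [delta | data]
entry = np.concatenate([np.ones(delta_width, dtype=np.float32),
                        np.arange(data_width, dtype=np.float32)])

# What the Slice layer does, with slice_point = delta_width:
delta, data = np.split(entry, [delta_width])
```

Because both parts come out of one write, the data/delta ordering can never drift apart, which is the main advantage over two separate LMDBs.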
Hi @beniz, if you look into the files for generating
We are padding too, see https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L786 We fill up the whole sequence with zeros, then fill what we can.
Yes. Let me know if this helps.
I think the current method of padding the whole sequence with zeros and filling it with appropriate values doesn't preserve the ordering of the words in a given sentence. For LSTM the ordering of words is also important. If you look into https://github.com/BVLC/caffe/pull/2033/files#diff-3a0266c4b6244affd2fd7505a2452f5fR170 the author has appended words in a sequential manner, which clearly preserves the word ordering. What do you think about this? I guess we have to change the format in which words are being stored.
You can change the format, but you could also use characters instead of words to play with the LSTM.
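Character-level input keeps ordering almost for free: map each character to an index over a fixed alphabet, in reading order, and zero-pad (or truncate) to the fixed sequence length. A minimal sketch of such an encoder (the alphabet, length, and function name are illustrative, not DD's actual scheme):

```python
def encode_chars(text, alphabet="abcdefghijklmnopqrstuvwxyz ", seq_len=16):
    """Map each character to its 1-based index in the alphabet, preserving
    reading order; 0 is reserved for padding and unknown characters."""
    lookup = {c: i + 1 for i, c in enumerate(alphabet)}
    ids = [lookup.get(c, 0) for c in text.lower()[:seq_len]]
    return ids + [0] * (seq_len - len(ids))   # zero-pad to fixed length

seq = encode_chars("hi there")
```

Unlike a bag-of-words representation, reversing the input string here yields a different sequence, which is exactly the property the LSTM needs.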
I have made some changes in the caffeinputconns.h file to integrate it with LSTM. Although I have managed to build it properly, I am still sceptical about my method. What do you think about it? https://gist.github.com/kyrs/a1b1065c7bfd92ea48c56f66607b1d0a
I'm not sure why you are calling
I am following the multi-label classification example https://github.com/beniz/deepdetect/blob/master/src/caffeinputconns.h#L400 to understand the slicing for
Yes, you can change that. Slicing is not difficult: just append the deltas next to the data and slice them back out. The fixed length can be relaxed later on; there's no need to try the most complex setup first. Let me know how it goes!
@beniz is it possible to slice a
You can slice in any dimension you want, even multiple times.
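Concretely, Caffe's Slice layer takes an axis and one or more slice points, so one blob can be cut into several outputs, and a further Slice can cut one of those outputs along a different axis. The numpy equivalent of those two operations (shapes are hypothetical):

```python
import numpy as np

blob = np.zeros((4, 10, 6), dtype=np.float32)  # e.g. N x C x W

# Slice along axis 1 at two points -> three outputs
# (analogous to a Slice layer with slice_point: 2 and 5 on that axis)
a, b, c = np.split(blob, [2, 5], axis=1)

# Slice one of the outputs again, along another axis
left, right = np.split(c, [3], axis=2)
```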
The current padding for characters does preserve order. The one for words does not, since it is a bag-of-words model. But you could build one that has ordered words. To begin with you might want to try LSTM on ordered characters and thus play with only minimal changes to the existing code.
|
Sure, I am making changes in the code for character-based LSTM prediction. Will soon update you with the results.
I have created a PR with small modifications https://github.com/beniz/deepdetect/pull/208. I think these changes will work. What do you say? As the next step I am creating
Hi @kyrs, best is to PR once you know that it works :) Have you tried training on an example? The IMDb dataset would be a good one to use!
Oops, my fault! I just wanted to share the code with you, that's why I created it. If you say so, I will close it for now, until I have tested it completely.
Since you must have pushed it onto your branch, just point me to the branch :) I'll take a look at it tomorrow!
You can see the changes in https://github.com/kyrs/deepdetect/tree/lstm_140/ |
I have made the changes. But when I tried to test them, I got stuck on an issue. Although I am able to launch the service and start the training process with
Run the job with
Hey guys, it's exciting to see whether LSTM support is possible within DeepDetect as well. Did you ever reach a conclusion from your tests in October? Cheers, Hakim
@hakimkhalafi it hasn't been completed yet. Although we have all the pieces lying on the table, don't expect LSTM support within DD for a few months, unless it gets sponsored by one of our customers. Interestingly, the demand for LSTM has been very high. What is the application you are contemplating at the moment, if you can share?
Hi @beniz, we wanted to implement a CNN + LSTM model. Here we have multiple images, each image is fed to the same CNN, and the fixed-size vector output for each image is fed into an LSTM. Do you know of any resources/links that could help in implementing that? Thank you.
Hi @divamgupta, DD does not directly support an input LSTM layer for production, but this should not affect you with images and a CNN as the first layer, though you may need to re-arrange your inputs. If you already have the Caffe network defined, join the gitter chat rather than discussing these details here.
Hi all, how does one actually use the LSTM layer? I keep getting errors saying certain parameters are invalid.
|
Open an issue and report all requested information, and let's start from there. Thanks. |
Hi all, I have installed the latest Caffe version from the master branch. To my knowledge, Caffe now supports LSTM layers. But when I run the solver I get this error. lstm.prototxt is:
|
You should post this on the Caffe issue tracker; you are obviously not using DD.
RNN + LSTM support now merged into Caffe, BVLC/caffe#3948.
This paves the way for robust integration within dd.