# OpenNMT-py: Open-Source Neural Machine Translation

This is a [PyTorch](https://github.com/pytorch/pytorch)
port of [OpenNMT](https://github.com/OpenNMT/OpenNMT),
an open-source (MIT) neural machine translation system. It is designed to be research-friendly, making it easy to try out new ideas in translation, summarization, image-to-text, morphology, and many other domains.


OpenNMT-py is run as a collaborative open-source project. It is currently maintained by [Sasha Rush](http://github.com/srush) (Cambridge, MA), [Ben Peters](http://github.com/bpopeters) (Saarbrücken), and [Jianyu Zhan](http://github.com/jianyuzhan) (Shenzhen). The original code was written by [Adam Lerer](http://github.com/adamlerer) (NYC). The codebase is nearing a stable 0.1 release; we currently recommend forking if you need stable code.

We love contributions. Please consult the Issues page for any posts tagged [Contributions Welcome](https://github.com/OpenNMT/OpenNMT-py/issues?q=is%3Aissue+is%3Aopen+label%3A%22contributions+welcome%22).

<center style="padding: 40px"><img width="70%" src="http://opennmt.github.io/simple-attn.png" /></center>


Table of Contents
=================

 * [Requirements](#requirements)
 * [Features](#features)
 * [Quickstart](#quickstart)
 * [Advanced](#advanced)
 * [Citation](#citation)

## Requirements

```bash
pip install -r requirements.txt
```
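
If you are starting from a fresh machine, the full setup might look like the sketch below. This is only a suggested sequence; it assumes `git` is available and that PyTorch itself is installed separately, following the instructions at https://pytorch.org:

```bash
# Grab the code and install the Python dependencies listed in requirements.txt.
git clone https://github.com/OpenNMT/OpenNMT-py.git
cd OpenNMT-py
pip install -r requirements.txt
```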


## Features

The following OpenNMT features are implemented:

- multi-layer bidirectional RNNs with attention and dropout
- data preprocessing
- saving and loading from checkpoints
- inference (translation) with batching and beam search
- context gate
- multiple source and target RNN (LSTM/GRU) types and attention (dot-product/MLP) types
- TensorBoard/Crayon logging
- source word features

Beta features (committed):

- multi-GPU training
- image-to-text processing
- the Transformer model ("Attention Is All You Need")
- copy and coverage attention
- structured attention
- Conv2Conv (convolutional sequence-to-sequence) model
- SRU ("Training RNNs as Fast as CNNs")
- inference-time loss functions

## Quickstart

### Step 1: Preprocess the data

```bash
python preprocess.py -train_src data/src-train.txt -train_tgt data/tgt-train.txt -valid_src data/src-val.txt -valid_tgt data/tgt-val.txt -save_data data/demo
```

We will be working with some example data in the `data/` folder.

The data consists of parallel source (`src`) and target (`tgt`) files containing one sentence per line, with tokens separated by a space:

* `src-train.txt`
* `tgt-train.txt`
* `src-val.txt`
* `tgt-val.txt`

Validation files are required and are used to evaluate the convergence of the training. They usually contain no more than 5,000 sentences.
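
To get a feel for the expected format, you can peek at the example files; each line of a `src` file should align with the same line of the corresponding `tgt` file. A quick check with standard shell tools:

```bash
# Show the first two source and target training sentences.
head -n 2 data/src-train.txt data/tgt-train.txt

# Parallel files must have the same number of lines.
wc -l data/src-train.txt data/tgt-train.txt
```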


After running the preprocessing, the following files are generated:

* `demo.src.dict`: dictionary of source vocabulary to index mappings.
* `demo.tgt.dict`: dictionary of target vocabulary to index mappings.
* `demo.train.pt`: serialized PyTorch file containing the vocabulary and the training and validation data.


Internally, the system never touches the words themselves, but uses these indices.
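
As a quick sanity check, you can list what preprocessing produced and, assuming the `*.dict` files are plain text with one `word index` pair per line (an assumption about this version of the preprocessing output, not something the recipe above guarantees), peek at the most frequent entries:

```bash
# List the generated artifacts.
ls -lh data/demo*

# Inspect the first few source-vocabulary entries (assumes plain-text .dict files).
head -n 10 data/demo.src.dict
```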

### Step 2: Train the model

```bash
python train.py -data data/demo -save_model demo-model
```

The main training command is quite simple. Minimally, it takes a data file
and a save file. This will run the default model, which consists of a
2-layer LSTM with 500 hidden units for both the encoder and the decoder. You
can also add `-gpuid 1` to use (say) GPU 1.
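
If you want to deviate from the defaults, the sketch below shows the general shape of a customized run. The flag names here (`-layers`, `-rnn_size`, `-epochs`) are assumptions based on common OpenNMT-py options and may differ between versions, so check `python train.py -h` for the options in your checkout:

```bash
# A hypothetical customized run: explicit model size, epoch count, and GPU.
python train.py -data data/demo -save_model demo-model \
    -layers 2 -rnn_size 500 -epochs 13 -gpuid 1
```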

### Step 3: Translate

```bash
python translate.py -model demo-model_epochX_PPL.pt -src data/src-test.txt -output pred.txt -replace_unk -verbose
```

Now you have a model which you can use to predict on new data. We do this by running beam search. This will output predictions into `pred.txt`.

!!! note "Note"
    The predictions are going to be quite terrible, as the demo dataset is small. Try running on some larger datasets! For example, you can download millions of parallel sentences for [translation](http://www.statmt.org/wmt16/translation-task.html) or [summarization](https://github.com/harvardnlp/sent-summary).
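
To eyeball the output, you can line up each test sentence with its prediction using plain shell tools:

```bash
# Show the first five source sentences alongside their predictions (tab-separated).
paste data/src-test.txt pred.txt | head -n 5
```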

## Full Translation Example

The example below uses the Moses tokenizer (http://www.statmt.org/moses/) to prepare the data and the Moses `multi-bleu.perl` script for evaluation. First, download these tools:

```bash
wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/tokenizer/tokenizer.perl
wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.de
wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/share/nonbreaking_prefixes/nonbreaking_prefix.en
sed -i "s/$RealBin\/..\/share\/nonbreaking_prefixes//" tokenizer.perl
wget https://raw.githubusercontent.com/moses-smt/mosesdecoder/master/scripts/generic/multi-bleu.perl
```
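
As an illustration of how the tokenizer is invoked (the same pattern used in the WMT'16 example below), you could tokenize a raw English text file, here called `my-corpus.en` as a placeholder, like this:

```bash
# -a: aggressive hyphen splitting, -no-escape: no HTML escaping, -l en: English, -q: quiet.
# my-corpus.en is a placeholder for your own raw text, one sentence per line.
perl tokenizer.perl -a -no-escape -l en -q < my-corpus.en > my-corpus.en.atok
```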

## WMT'16 Multimodal Translation: Multi30k (de-en)

The following is an example of training for the WMT'16 Multimodal Translation task (http://www.statmt.org/wmt16/multimodal-task.html).

### 0) Download the data.

```bash
mkdir -p data/multi30k
wget http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/training.tar.gz && tar -xf training.tar.gz -C data/multi30k && rm training.tar.gz
wget http://www.quest.dcs.shef.ac.uk/wmt16_files_mmt/validation.tar.gz && tar -xf validation.tar.gz -C data/multi30k && rm validation.tar.gz
wget https://staff.fnwi.uva.nl/d.elliott/wmt16/mmt16_task1_test.tgz && tar -xf mmt16_task1_test.tgz -C data/multi30k && rm mmt16_task1_test.tgz
```

### 1) Preprocess the data.

```bash
# Delete the last line of the validation and training files.
for l in en de; do for f in data/multi30k/*.$l; do if [[ "$f" != *"test"* ]]; then sed -i "$ d" $f; fi; done; done
# Tokenize all files with the Moses tokenizer.
for l in en de; do for f in data/multi30k/*.$l; do perl tokenizer.perl -a -no-escape -l $l -q < $f > $f.atok; done; done
# Build the OpenNMT-py dataset, lowercasing everything (-lower).
python preprocess.py -train_src data/multi30k/train.en.atok -train_tgt data/multi30k/train.de.atok -valid_src data/multi30k/val.en.atok -valid_tgt data/multi30k/val.de.atok -save_data data/multi30k.atok.low -lower
```
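
Before training, it is worth checking that tokenization kept the corpora parallel; each source/target pair of `.atok` files should have the same number of lines:

```bash
# Line counts of the tokenized training and validation files.
wc -l data/multi30k/train.*.atok data/multi30k/val.*.atok
```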

### 2) Train the model.

```bash
python train.py -data data/multi30k.atok.low -save_model multi30k_model -gpuid 0
```

### 3) Translate sentences.

```bash
python translate.py -gpu 0 -model multi30k_model_*_e13.pt -src data/multi30k/test.en.atok -tgt data/multi30k/test.de.atok -replace_unk -verbose -output multi30k.test.pred.atok
```

### 4) Evaluate.

```bash
perl tools/multi-bleu.perl data/multi30k/test.de.atok < multi30k.test.pred.atok
```
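
Because preprocessing used `-lower`, the model's output is lowercased while the reference file is not, which can depress the score. One option (a suggestion, not part of the original recipe) is to compute case-insensitive BLEU with the script's `-lc` flag:

```bash
# Case-insensitive BLEU, to avoid penalizing the lowercased system output.
perl tools/multi-bleu.perl -lc data/multi30k/test.de.atok < multi30k.test.pred.atok
```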

## Pretrained Models

The following pretrained models can be downloaded and used with `translate.py`, as shown in the example below. (These were trained with an older version of the code; they will be updated soon.)

- [onmt_model_en_de_200k](https://drive.google.com/file/d/0B6N7tANPyVeBWE9WazRYaUd2QTg/view?usp=sharing): an English-German translation model based on the 200k-sentence dataset at [OpenNMT/IntegrationTesting](https://github.com/OpenNMT/IntegrationTesting/tree/master/data). Perplexity: 20.
- onmt_model_en_fr_b1M (coming soon): an English-French model trained on benchmark-1M. Perplexity: 4.85.
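
Usage is the same as in Step 3 of the Quickstart. For example, assuming the English-German download unpacks to a checkpoint named `onmt_model_en_de_200k.pt` (the exact filename inside the download may differ), translation would look like:

```bash
# my-input.en is a placeholder for your own tokenized English text, one sentence per line.
python translate.py -model onmt_model_en_de_200k.pt -src my-input.en -output pred.de -replace_unk -verbose
```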


## Citation

[OpenNMT technical report](https://doi.org/10.18653/v1/P17-4012)

```
@inproceedings{opennmt,
  author    = {Guillaume Klein and
               Yoon Kim and
               Yuntian Deng and
               Jean Senellart and
               Alexander M. Rush},
  title     = {OpenNMT: Open-Source Toolkit for Neural Machine Translation},
  booktitle = {Proc. ACL},
  year      = {2017},
  url       = {https://doi.org/10.18653/v1/P17-4012},
  doi       = {10.18653/v1/P17-4012}
}
```