Commit f4aed5b

Merge pull request #369 from heinpa/extend-mt-components
Extend Qanary Python MT components for multiple source and target languages
2 parents 868b113 + 2a8d949 commit f4aed5b

64 files changed, +2465 -778 lines changed
@@ -1,14 +1,21 @@
-FROM python:3.7
+FROM python:3.10
 
 COPY requirements.txt ./
 
 RUN pip install --upgrade pip
-RUN pip install -r requirements.txt; exit 0
-RUN pip install gunicorn
+RUN pip install -r requirements.txt
 
 COPY component component
+COPY utils utils
 COPY run.py boot.sh ./
 
+# to allow preconfigured images
+ARG SOURCE_LANGUAGE
+ARG TARGET_LANGUAGE
+
+ENV SOURCE_LANGUAGE=$SOURCE_LANGUAGE
+ENV TARGET_LANGUAGE=$TARGET_LANGUAGE
+
 RUN chmod +x boot.sh
 
 ENTRYPOINT ["./boot.sh"]

qanary-component-MT-Python-HelsinkiNLP/README.md

+36 -9
@@ -54,8 +54,9 @@ SPRING_BOOT_ADMIN_CLIENT_INSTANCE_SERVICE-BASE-URL=http://public-component-host:
 SPRING_BOOT_ADMIN_USERNAME=admin
 SPRING_BOOT_ADMIN_PASSWORD=admin
 SERVICE_NAME_COMPONENT=MT-Helsinki-NLP
-SERVICE_DESCRIPTION_COMPONENT=Translates question to English
+SERVICE_DESCRIPTION_COMPONENT=Translates questions
 SOURCE_LANGUAGE=de
+TARGET_LANGUAGE=en
 ```
 
 The parameters description:
@@ -68,7 +69,8 @@ The parameters description:
 * `SPRING_BOOT_ADMIN_CLIENT_INSTANCE_SERVICE-BASE-URL` -- the URL of your Qanary component (has to be visible to the Qanary pipeline)
 * `SERVICE_NAME_COMPONENT` -- the name of your Qanary component (for better identification)
 * `SERVICE_DESCRIPTION_COMPONENT` -- the description of your Qanary component
-* `SOURCE_LANGUAGE` -- (optional) the source language of the text (the component will use langdetect if no source language is given)
+* `SOURCE_LANGUAGE` -- (optional) the default source language of the translation
+* `TARGET_LANGUAGE` -- (optional) the default target language of the translation
 
 4. Build the Docker image:
 
@@ -82,18 +84,43 @@ docker-compose build .
 docker-compose up
 ```
 
-After execution, component creates Qanary annotation in the Qanary triplestore:
+After successful execution, the component creates a Qanary annotation in the Qanary triplestore:
 ```
 GRAPH <uuid> {
-?a a qa:AnnotationOfQuestionLanguage .
-?a qa:translationResult "translation result" .
-?a qa:sourceLanguage "ISO_639-1 language code" .
-?a oa:annotatedBy <urn:qanary:app_name> .
-?a oa:annotatedAt ?time .
-}
+?a a qa:AnnotationOfQuestionTranslation .
+?a oa:hasTarget <urn:myQanaryQuestion> .
+?a oa:hasBody "translation_result"@ISO_639-1 language code .
+?a oa:annotatedBy <urn:qanary:app_name> .
+?a oa:annotatedAt ?time .
 }
 ```
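
To make the `oa:hasBody` line in the annotation template more concrete: the translated text is stored as a language-tagged literal, for example `"Where is Berlin?"@en`. The following is only an illustrative sketch of how such a graph pattern could be filled in. The function name `build_translation_annotation` and its parameters are hypothetical and not taken from the component's code, and the `qa:`/`oa:` prefixes are assumed to be declared elsewhere in the query.

```python
import re


def build_translation_annotation(graph_uri, question_uri, translation, language, app_name):
    """Fill the annotation template from the README with concrete values.

    Hypothetical helper for illustration only; it is not part of the component.
    It handles single-line texts and two-letter ISO 639-1 codes.
    """
    if not re.fullmatch(r"[a-z]{2}", language):
        raise ValueError(f"expected an ISO 639-1 code such as 'en', got {language!r}")

    # Escape backslashes first, then double quotes, so the literal stays valid.
    escaped = translation.replace("\\", "\\\\").replace('"', '\\"')

    return (
        f"GRAPH <{graph_uri}> {{\n"
        f"  ?a a qa:AnnotationOfQuestionTranslation .\n"
        f"  ?a oa:hasTarget <{question_uri}> .\n"
        f'  ?a oa:hasBody "{escaped}"@{language} .\n'
        f"  ?a oa:annotatedBy <urn:qanary:{app_name}> .\n"
        f"  ?a oa:annotatedAt ?time .\n"
        f"}}"
    )


# Example values are made up for demonstration.
example = build_translation_annotation(
    "urn:graph:example",
    "urn:myQanaryQuestion",
    "Where is Berlin?",
    "en",
    "MT-Helsinki-NLP",
)
```

The language tag after `@` is what lets later components tell a German question text apart from its English translation.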
 
+### Support for multiple Source and Target Languages
+
+This component relies on the presence of one or more existing annotations that associate a question text with a language.
+This can be in the form of an `AnnotationOfQuestionLanguage`, as created by LD components, or an `AnnotationOfQuestionTranslation` as created by MT components.
+
+It supports multiple combinations of source and target languages.
+You can specify a desired source and target language independently, or simply use all available language pairings.
+
+If a `SOURCE_LANGUAGE` is set, then only texts with this specific language are considered for translation.
+If none is set, then all configured source languages will be used to find candidates for translation.
+
+Similarly, if a `TARGET_LANGUAGE` is set, then texts are only translated into that language.
+If none is set, then the texts are translated into all target languages that are supported for their respective source language.
+
+Note that while configured source languages naturally determine the possible target languages,
+the configured target languages also determine which source languages can be supported!
+
+### Pre-configured Docker Images
+
+You may use the included file `docker-compose-pairs.yml` to build a list of images that are preconfigured for specific language pairs.
+Note that if you intend to use these containers at the same time, you need to assign different `SERVER_PORT` values for each image.
+
+```bash
+docker-compose -f docker-compose-pairs.yml build
+```
+
 ## How To Test This Component
 
 This component uses the [pytest](https://docs.pytest.org/).
@@ -1,16 +1,33 @@
-#!/bin/sh
+#!/bin/bash
+export $(grep -v "^#" < .env)
 
+# check required parameters
+declare -a required_vars=(
+    "SPRING_BOOT_ADMIN_URL"
+    "SERVER_HOST"
+    "SERVER_PORT"
+    "SPRING_BOOT_ADMIN_USERNAME"
+    "SPRING_BOOT_ADMIN_PASSWORD"
+    "SERVICE_NAME_COMPONENT"
+    "SERVICE_DESCRIPTION_COMPONENT"
+)
 
-export $(grep -v '^#' .env | xargs)
+for param in ${required_vars[@]};
+do
+    if [[ -z ${!param} ]]; then
+        echo "Required variable \"$param\" is not set!"
+        echo "The required variables are: ${required_vars[@]}"
+        exit 4
+    fi
+done
 
-echo Downloading the model
-python -c "from transformers.models.marian.modeling_marian import MarianMTModel; from transformers.models.marian.tokenization_marian import MarianTokenizer; supported_langs = ['ru', 'es', 'de', 'fr']; models = {lang: MarianMTModel.from_pretrained('Helsinki-NLP/opus-mt-{lang}-en'.format(lang=lang)) for lang in supported_langs}; tokenizers = {lang: MarianTokenizer.from_pretrained('Helsinki-NLP/opus-mt-{lang}-en'.format(lang=lang)) for lang in supported_langs}"
-echo Downloading the model finished
+echo Downloading the models
+
+python -c "from utils.model_utils import load_models_and_tokenizers; SUPPORTED_LANGS = { 'en': ['de', 'fr', 'ru', 'es'], 'de': ['en', 'fr', 'es'], 'fr': ['en', 'de', 'ru', 'es'], 'ru': ['en', 'fr', 'es'], 'es': ['en', 'de', 'fr', 'es'], }; load_models_and_tokenizers(SUPPORTED_LANGS); "
 
+echo Downloading the model finished
 
 echo The port number is: $SERVER_PORT
+echo The host is: $SERVER_HOST
 echo The Qanary pipeline URL is: $SPRING_BOOT_ADMIN_URL
-if [ -n $SERVER_PORT ]
-then
-    exec gunicorn -b :$SERVER_PORT --access-logfile - --error-logfile - run:app # refer to the gunicorn documentation for more options
-fi
+exec uvicorn run:app --host 0.0.0.0 --port $SERVER_PORT --log-level warning
@@ -1,27 +1,33 @@
-from component.mt_helsinki_nlp import mt_helsinki_nlp_bp
-from flask import Flask
+from component import mt_helsinki_nlp
+from fastapi import FastAPI
+from fastapi.responses import RedirectResponse, Response
 
-version = "0.1.2"
+version = "0.2.0"
 
 # default config file (use -c parameter on command line specify a custom config file)
 configfile = "app.conf"
 
 # endpoint for health information of the service required for Spring Boot Admin server callback
-healthendpoint = "/health"
-
-aboutendpoint = "/about"
+HEALTHENDPOINT = "/health"
+ABOUTENDPOINT = "/about"
+# TODO: add languages endpoint?
 
 # initialize Flask app and add the externalized service information
-app = Flask(__name__)
-app.register_blueprint(mt_helsinki_nlp_bp)
+app = FastAPI(docs_url="/swagger-ui.html")
+app.include_router(mt_helsinki_nlp.router)
+
+
+@app.get("/")
+async def main():
+    return RedirectResponse("/about")
 
 
-@app.route(healthendpoint, methods=['GET'])
+@app.get(HEALTHENDPOINT, description="Shows the status of the component")
 def health():
     """required health endpoint for callback of Spring Boot Admin server"""
-    return "alive"
+    return Response("alive", media_type="text/plain")
 
-@app.route(aboutendpoint, methods=['GET'])
+@app.get(ABOUTENDPOINT, description="Shows a description of the component")
 def about():
     """required about endpoint for callback of Spring Boot Admin server"""
-    return "about"
+    return Response("Translates questions into English", media_type="text/plain")
