This project is Assignment 2 of the Distributed System Programming course at BGU.
The project implements a map-reduce algorithm.
It is implemented in Java, using Amazon Web Services (AWS) and the Hadoop framework.
Instructions Assignment 2
In this assignment you will generate a knowledge base for a Hebrew word-prediction system, based on
the Google 3-Gram Hebrew dataset, using Amazon Elastic Map-Reduce (EMR).
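As a rough illustration of the kind of record a map step might consume, the sketch below parses one 3-gram line in plain Java. The line layout (three space-separated words, then a tab, the year, a tab, and the occurrence count, with any further fields ignored) is an assumption about the dataset format, and the class name `TrigramRecord` is hypothetical and not taken from this project.

```java
/** Hypothetical holder for one parsed 3-gram line; the field layout is an assumption. */
final class TrigramRecord {
    final String w1;
    final String w2;
    final String w3;
    final int year;
    final long matchCount;

    private TrigramRecord(String w1, String w2, String w3, int year, long matchCount) {
        this.w1 = w1;
        this.w2 = w2;
        this.w3 = w3;
        this.year = year;
        this.matchCount = matchCount;
    }

    /**
     * Parses "w1 w2 w3<TAB>year<TAB>match_count[<TAB>...]".
     * Returns null for lines that do not fit the assumed layout,
     * so a caller can simply skip them.
     */
    static TrigramRecord parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 3) {
            return null;
        }
        String[] words = fields[0].split(" ");
        if (words.length != 3) {
            return null;
        }
        try {
            int year = Integer.parseInt(fields[1].trim());
            long count = Long.parseLong(fields[2].trim());
            return new TrigramRecord(words[0], words[1], words[2], year, count);
        } catch (NumberFormatException e) {
            return null; // malformed numeric field
        }
    }

    public static void main(String[] args) {
        TrigramRecord r = parse("a b c\t1999\t12\t10\t5");
        System.out.println(r.w1 + " " + r.w2 + " " + r.w3 + " -> " + r.matchCount);
    }
}
```

Returning null for malformed lines keeps the sketch short; a real mapper might instead count such lines and report them.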
Outputs Examples
- Configure AWS credentials on your machine.
- Create an S3 bucket with the name specified at App line 25.
- Create a jar for each step (5 steps). When creating a JAR file, ensure that the META-INF/MANIFEST.MF file specifies the appropriate main class.
- Using the file system, change the names of the jars to: Step1, Step2 ... (exact name).
- In the S3 bucket, create a jars folder.
- Upload the jars to <bucketName>/jars.
- For Demo: the arbix.txt file is in the <bucketName>. This file is used as an example input.
- Run App.
- Output will be in <bucketName>/outputs/ after a successful run.
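The manifest requirement in the steps above can be illustrated with the standard `java.util.jar` API. This is a minimal sketch that only builds the manifest text. It assumes each step's main class is named after its jar (`Step1` to `Step5`), which the steps do not state, and the class name `ManifestSketch` is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class ManifestSketch {

    /** Builds the text of a META-INF/MANIFEST.MF that names the given main class. */
    static String manifestFor(String mainClass) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be set, otherwise Manifest.write emits no main attributes.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Assumed main class names, one per step jar.
        for (int step = 1; step <= 5; step++) {
            System.out.print(manifestFor("Step" + step));
        }
    }
}
```

In practice the build tool or IDE usually writes this file when the jar is exported; the sketch only shows which entry has to be present.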
Bucket Structure At Start:
Bucket Jars Structure At Start:
Note: make sure that the S3 bucket doesn't include an output or log folder pre-run.
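The pre-run condition in the note can be checked against a listing of the bucket's object keys. The sketch below works on a plain list of key strings and leaves out the S3 listing call itself. The class name `PreRunCheck`, the method name, and the sample keys are hypothetical, and the folder names are passed in as a parameter because the steps mention `outputs/` while the note says "output".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PreRunCheck {

    /** Returns the keys that are, or live under, one of the given top-level folders. */
    static List<String> conflictingKeys(List<String> keys, List<String> folders) {
        List<String> conflicts = new ArrayList<>();
        for (String key : keys) {
            for (String folder : folders) {
                if (key.equals(folder) || key.startsWith(folder + "/")) {
                    conflicts.add(key);
                    break; // one match is enough for this key
                }
            }
        }
        return conflicts;
    }

    public static void main(String[] args) {
        // Sample keys for illustration only, not a real bucket listing.
        List<String> keys = Arrays.asList("jars/Step1", "arbix.txt", "outputs/part-r-00000");
        List<String> folders = Arrays.asList("output", "outputs", "log");
        List<String> conflicts = conflictingKeys(keys, folders);
        if (conflicts.isEmpty()) {
            System.out.println("Bucket is clean, safe to run.");
        } else {
            System.out.println("Remove before running: " + conflicts);
        }
    }
}
```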