DSP Assignment 2 in Java, by Amit and Naveh. Scalable knowledge-base for word prediction using EMR, Hadoop, S3, to handle large input of text.

AmitNG2000/AWS-EMR-Knowledge-Base

README

Background

This project is Assignment 2 of the Distributed System Programming (DSP) course at Ben-Gurion University (BGU).
The project is a map-reduce algorithm.
It is implemented in Java with Amazon Web Services (AWS) and the Hadoop framework.
Assignment 2 instructions

Overview

In this assignment you will generate a knowledge base for a Hebrew word-prediction system, based on the Google 3-Gram Hebrew dataset, using Amazon Elastic MapReduce (EMR).
Output examples

How to run

  1. Configure AWS credentials on your machine.
  2. Create an S3 bucket with the name specified at App line 25.
  3. Create a jar for each step (5 steps). When creating a JAR file, ensure that the META-INF/MANIFEST.MF file specifies the appropriate main class.
  4. Using the file system, rename the jars to: Step1, Step2 ... (exact names).
  5. In the S3 bucket, create a jars folder.
  6. Upload the jars to <bucketName>/jars.
  7. For the demo: the arbix.txt file is in <bucketName>. This file is used as an example input.
  8. Run App.
  9. After a successful run, the output will be in <bucketName>/outputs/.
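
Steps 3 to 6 above imply a fixed set of jar locations in the bucket. The following is a minimal sketch of that layout, not code from this repository: the class name `JarLayout`, the `.jar` extension on the object names, and the placeholder bucket name `my-bucket` are assumptions made for illustration. The real bucket name is the one set in App.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the repository): lists the S3 locations
// where the five step jars are expected after steps 3-6 of "How to run".
class JarLayout {
    // The instructions call for one jar per step, five steps in total.
    static final int STEP_COUNT = 5;

    // Builds s3://<bucketName>/jars/Step<i>.jar for i = 1..5.
    // The ".jar" extension is an assumption; the README only gives
    // the names Step1, Step2, ...
    static List<String> expectedJarUris(String bucketName) {
        List<String> uris = new ArrayList<>();
        for (int i = 1; i <= STEP_COUNT; i++) {
            uris.add("s3://" + bucketName + "/jars/Step" + i + ".jar");
        }
        return uris;
    }

    public static void main(String[] args) {
        // "my-bucket" is a placeholder; pass the real bucket name as an argument.
        String bucket = args.length > 0 ? args[0] : "my-bucket";
        expectedJarUris(bucket).forEach(System.out::println);
    }
}
```

Comparing this list with the actual bucket contents before running App can catch a misnamed or missing jar early.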

Bucket structure at start: [image: BucketStructure]

Bucket jars structure at start: [image: Bucket]

Note: before running, make sure that the S3 bucket does not contain an output or log folder.
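
The pre-run condition in this note can be checked against a listing of the bucket's object keys. The sketch below is an assumption-laden illustration, not the project's code: the class `PreRunCheck`, the `logs/` prefix, and the sample keys are hypothetical. Only the `outputs/` prefix comes from the instructions above. How the key listing is obtained (for example through the AWS console or SDK) is left out.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical pre-run check (not part of the repository): given the object
// keys currently in the bucket, report those left over from an earlier run.
class PreRunCheck {
    // Returns every key that starts with one of the forbidden prefixes.
    static List<String> leftovers(List<String> keys, List<String> forbiddenPrefixes) {
        return keys.stream()
                .filter(key -> forbiddenPrefixes.stream().anyMatch(key::startsWith))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Sample keys are made up for illustration.
        List<String> keys = List.of(
                "arbix.txt",
                "jars/Step1.jar",
                "outputs/step1/part-r-00000");
        // "outputs/" matches the README; "logs/" is an assumed folder name.
        List<String> found = leftovers(keys, List.of("outputs/", "logs/"));
        if (found.isEmpty()) {
            System.out.println("Bucket is clean; safe to run App.");
        } else {
            System.out.println("Remove before running: " + found);
        }
    }
}
```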

Logic Diagram

Diagram PDF
draw.io Diagram

Requested report

Step 5 final output

Final step5 output file
