ez-gimpy-recognizer

Recognizes the word contained in an EZ-Gimpy captcha image.

The images and the files with words can be downloaded from here.

Usage

Note: The python-levenshtein package must be installed in order to run the recognizer or the tests. Install it with pip:

pip install python-levenshtein

If you are having trouble installing the package, you can download the binaries from here. Make sure to download the correct version for the platform you are using. The installation is done in the same way: pass the path to the downloaded binaries as the argument to pip install.

To run the recognizer on a single image, pass the relative path to the image as the argument. For example:

python Recognizer.py Dataset/001.jpg

The calculated words are printed to the terminal.

When run without arguments, the recognizer processes all images in the Dataset folder and writes the results to the out.txt file (make sure to download the Dataset folder first, using the link mentioned at the beginning).

About

The process of calculating the most probable words is divided into three steps:

Image processing - remove the noise and extract the estimated regions that contain letters
Prediction - predict a letter for every region using the K-nearest neighbour algorithm
Find probable words - using the Levenshtein distance, find the words in the word_collection.txt file that best match the predicted letters and their order
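
As a rough illustration of steps two and three, here is a minimal pure-Python sketch. It is not the code in Recognizer.py: the function names, the feature vectors, the value of k and the rule of keeping only the words at the minimum distance are all assumptions made for the example. The edit distance is written out by hand so the sketch runs without extra packages; the python-levenshtein package provides an equivalent Levenshtein.distance function.

```python
from collections import Counter
import math


def knn_predict(sample, training, k=3):
    """Return the majority label among the k training vectors closest to sample.

    training is a list of (feature_vector, label) pairs. How a letter region
    is turned into a feature vector is not specified here (hypothetical).
    """
    nearest = sorted(training, key=lambda item: math.dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


def most_probable_words(letters, vocabulary):
    """Return every vocabulary word at the minimum edit distance from letters.

    Assumption: "most probable" means "tied for the smallest distance".
    vocabulary must not be empty.
    """
    distances = {word: levenshtein(letters, word) for word in vocabulary}
    best = min(distances.values())
    return sorted(word for word, d in distances.items() if d == best)
```

For instance, if the predicted letters were "hcuse" and the vocabulary contained "house", "horse" and "mouse", most_probable_words would keep only "house", which is a single substitution away.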

Testing

All tests are run against the correct_words.txt file, which contains the correct word for every image in the Dataset folder. The data being tested must be in the out.txt file.

The test reports three results:

The number of correctly predicted words - checks whether the correct word is contained in the set of most probable words
The average Levenshtein ratio between each correct word and the word with the highest ratio in the respective set of most probable words
The average Levenshtein ratio between each correct word and all words in the respective set of most probable words
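
The three results above can be sketched as follows. This is not the code in Test.py: the function names and the in-memory inputs are hypothetical, and the layout of out.txt and correct_words.txt is not assumed. The similarity used here is a simple normalized value, 1 - distance / length of the longer word, which may differ from what the python-levenshtein package's Levenshtein.ratio function reports. The third result is computed as the mean of the per-image averages, which is one possible reading of the description.

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free if equal
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; an assumption, not Levenshtein.ratio."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def evaluate(correct_words, candidate_sets):
    """Return (hits, average best ratio, average mean ratio).

    correct_words[i] is the true word for image i and candidate_sets[i]
    is the list of most probable words produced for that image.
    """
    if not correct_words:
        return 0, 0.0, 0.0
    hits = 0
    best_ratios = []
    mean_ratios = []
    for correct, candidates in zip(correct_words, candidate_sets):
        ratios = [similarity(correct, word) for word in candidates]
        if correct in candidates:
            hits += 1
        # An image with no candidates contributes a ratio of 0.
        best_ratios.append(max(ratios, default=0.0))
        mean_ratios.append(sum(ratios) / len(ratios) if ratios else 0.0)
    n = len(correct_words)
    return hits, sum(best_ratios) / n, sum(mean_ratios) / n
```

Calling evaluate(["house", "cat"], [["house", "horse"], ["cot"]]) counts one correctly predicted word, because "house" is in its candidate set while "cat" is not in the second.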
