Aho–Corasick Algorithm

Research of strings in a text

Aho–Corasick Algorithm


In computer science, the Aho–Corasick algorithm is a string searching algorithm invented by Alfred V. Aho and Margaret J. Corasick. It is a kind of dictionary-matching algorithm that locates elements of a finite set of strings (the "dictionary") within an input text. It matches all strings simultaneously. The complexity of the algorithm is linear in the length of the strings plus the length of the searched text plus the number of output matches.

During the courses on graphs (in October 2015), we discussed the trees. So we have seen in which situations they can be used and how they work in general. Searching for strings in a text was one of these different situations.

We were asked, after a complete study of the trees, to carry out the Aho-Corasick algorithm by implementing a tree and performing various operations that could facilitate the search (adding the words to be searched in a field, returning the line where is the word in the text...). The programming language being free, my teammate and I chose to make it in Java.

We returned our work accompanied with a graphical interface (allowing to enter the words to be searched and the text) but also with a Java documentation (which you can find here).

Informations about the project


Title Aho–Corasick Algorithm
Description Implementation of the Aho–Corasick Algorithm with GUI.
Used Language Java
Year of work 2015
Download
(Windows)*
Link
Download
(Mac OS et Linux)*
Link
Access to code Link
Documentation Link

*The implementation of the program requires the Java environment