Aho–Corasick Algorithm
Research of strings in a text
Aho–Corasick Algorithm
In computer science, the Aho–Corasick algorithm is a string searching algorithm invented by Alfred V. Aho and Margaret J. Corasick. It is a kind of dictionary-matching algorithm that locates elements of a finite set of strings (the "dictionary") within an input text. It matches all strings simultaneously. The complexity of the algorithm is linear in the length of the strings plus the length of the searched text plus the number of output matches.
During the courses on graphs (in October 2015), we discussed the trees. So we have seen in which situations they can be used and how they work in general. Searching for strings in a text was one of these different situations.
We were asked, after a complete study of the trees, to carry out the Aho-Corasick algorithm by implementing a tree and performing various operations that could facilitate the search (adding the words to be searched in a field, returning the line where is the word in the text...). The programming language being free, my teammate and I chose to make it in Java.
We returned our work accompanied with a graphical interface (allowing to enter the words to be searched and the text) but also with a Java documentation (which you can find here).
Informations about the project
| Title | Aho–Corasick Algorithm |
| Description | Implementation of the Aho–Corasick Algorithm with GUI. |
| Used Language | Java |
| Year of work | 2015 |
| Download (Windows)* |
Link |
| Download (Mac OS et Linux)* |
Link |
| Access to code | Link |
| Documentation | Link |
*The implementation of the program requires the Java environment