Romain Semler | Aho-Corasick

Aho–Corasick Algorithm

Research of strings in a text

Aho–Corasick Algorithm

In computer science, the Aho–Corasick algorithm is a string searching algorithm invented by Alfred V. Aho and Margaret J. Corasick. It is a kind of dictionary-matching algorithm that locates elements of a finite set of strings (the "dictionary") within an input text. It matches all strings simultaneously. The complexity of the algorithm is linear in the length of the strings plus the length of the searched text plus the number of output matches.

During the courses on graphs (in October 2015), we discussed the trees. So we have seen in which situations they can be used and how they work in general. Searching for strings in a text was one of these different situations.

We were asked, after a complete study of the trees, to carry out the Aho-Corasick algorithm by implementing a tree and performing various operations that could facilitate the search (adding the words to be searched in a field, returning the line where is the word in the text...). The programming language being free, my teammate and I chose to make it in Java.

We returned our work accompanied with a graphical interface (allowing to enter the words to be searched and the text) but also with a Java documentation (which you can find here).

Informations about the project

Title	Aho–Corasick Algorithm
Description	Implementation of the Aho–Corasick Algorithm with GUI.
Used Language	Java
Year of work	2015
Download (Windows)*	Link
Download (Mac OS et Linux)*	Link
Access to code	Link
Documentation	Link

*The implementation of the program requires the Java environment