Utility Mill

Porter_Stemming_Algorithm

Strip endings of words to get to their root e.g., running -> run


Output


Instructions / Discussion

[zasf says:] Can you add support for other languages other than english?

Martin Porter says:

The Porter stemming algorithm (or �Porter stemmer�) is a process for removing the commoner morphological and inflexional endings from words in English. Its main use is as part of a term normalisation process that is usually done when setting up Information Retrieval systems.

Here is what Wikipedia has to say on the subject:

Stemming is the process for reducing inflected (or sometimes derived) words to their stem, base or root form � generally a written word form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not in itself a valid root. The algorithm has been a long-standing problem in computer science; the first paper on the subject was published in 1968. The process of stemming, often called conflation, is useful in search engines for query expansion or indexing and other natural language processing problems. Stemming programs are commonly referred to as stemming algorithms or stemmers.

Coded originally by Vivake Gupta

There is another web based implementation of this algorithm here.

Utility Mill is another wonderful Blended Technologies project.

copyright, owned and operated by Blended Technologies LLC.

Powered by Python and the ineffable Web.py