giugy wrote:
> Does anyone know where I can find source code for a keyword extractor
> written in Java? I mean a program that analyzes a text and extracts its
> keywords, i.e. the most frequent words in the text (for example,
> the word "hello" occurs forty times, the word "thanks" occurs
> thirty times, ...).
> I need to see the program's source code in Java in order to
> understand how it works.
This is very easy to write in Java.

First read a line and extract its words using StringTokenizer. Then
use a Hashtable to check whether you have seen each word before.
If so, increment its counter. If not, add it to the Hashtable with
a count of 1. I store a long[] in the Hashtable as the counter, since
that makes incrementing convenient, but others will do it differently.
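
A minimal sketch of the counting step described above, assuming
whitespace-delimited words and a one-element long[] as the counter.
The class and method names (WordCounter, countLine) are made up for
illustration, and the generics are just for readability.

```java
import java.util.Hashtable;
import java.util.StringTokenizer;

class WordCounter {
    // Hypothetical helper: counts every whitespace-delimited token in one line.
    static void countLine(String line, Hashtable<String, long[]> counts) {
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken();
            long[] counter = counts.get(word);
            if (counter == null) {
                // First occurrence: start a one-element counter at 1.
                counts.put(word, new long[] {1});
            } else {
                // Seen before: bump the counter in place, no second put needed.
                counter[0]++;
            }
        }
    }

    public static void main(String[] args) {
        Hashtable<String, long[]> counts = new Hashtable<>();
        countLine("hello thanks hello", counts);
        System.out.println(counts.get("hello")[0]); // prints 2
    }
}
```

The array is there only because a long stored in the table cannot be
changed in place, while the single slot of a long[] can.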
One trick, though. After you extract a word with StringTokenizer and
find it is not in the table, create a new String from it and store that
reference in the hash table. If you don't, it will take up too much
memory, because the whole line of characters is retained for each word.
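
A sketch of that trick, with a made-up helper name (addNewWord). The
assumption behind it: on JVMs where substring() shares the parent
string's character array, a token keeps the entire line alive. As far
as I know, JDK 7 update 6 and later copy the characters in substring(),
so there the extra copy is harmless but no longer needed.

```java
import java.util.Hashtable;

class CompactKeys {
    // Hypothetical helper: registers a word that is not yet in the table.
    static void addNewWord(Hashtable<String, long[]> counts, String token) {
        // new String(token) makes a separate String object to use as the key,
        // so the table does not hold a reference to the token itself.
        counts.put(new String(token), new long[] {1});
    }

    public static void main(String[] args) {
        Hashtable<String, long[]> counts = new Hashtable<>();
        String line = "hello and a very long rest of the line";
        addNewWord(counts, line.substring(0, 5));
        System.out.println(counts.get("hello")[0]); // prints 1
    }
}
```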
After you finish reading the file, go through the Hashtable,
extract words and counts, and print them out.
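
A sketch of that last step. Sorting by descending count is my own
addition (a Hashtable has no useful iteration order, and the question
asks for the most frequent words); the names CountReport and
formatCounts are hypothetical.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import java.util.Map;

class CountReport {
    // Returns one "word count" line per entry, most frequent word first.
    static List<String> formatCounts(Hashtable<String, long[]> counts) {
        List<Map.Entry<String, long[]>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            // Higher counts first; ties are broken alphabetically by word.
            int byCount = Long.compare(b.getValue()[0], a.getValue()[0]);
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, long[]> e : entries) {
            lines.add(e.getKey() + " " + e.getValue()[0]);
        }
        return lines;
    }

    public static void main(String[] args) {
        Hashtable<String, long[]> counts = new Hashtable<>();
        counts.put("thanks", new long[] {30});
        counts.put("hello", new long[] {40});
        for (String line : formatCounts(counts)) {
            System.out.println(line); // prints "hello 40", then "thanks 30"
        }
    }
}
```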
It should not take long at all to write.
-- glen

