
Re: Keyword extractor's source code....where I can find it???

by "giugy" <matteozatto@[EMAIL PROTECTED] > Jan 16, 2007 at 09:20 AM

Yes, I have found some code like this:

import java.io.*;
import java.util.*;

class Counter implements Comparable {
  private String word;
  private int count;
  public Counter(String word) {
    this.word = word;
    count = 1;
  }
  public void increment() { count++; }
  public String toString() {
    return "\n" + word + " [" + count + "]";
  }
  public boolean equals(Object obj) {
    return obj instanceof Counter &&
      ((Counter)obj).word.equals(word);
  }
  public int hashCode() {
    return word.hashCode();
  }
  public int compareTo(Object o) {
    return word.compareTo(((Counter)o).word);
  }
}

class CounterSet extends AbstractSet {
  private Map set = new TreeMap();
  public void addOrIncrement(String s) {
    Counter c = new Counter(s);
    if (set.containsKey(c))
      ((Counter)set.get(c)).increment();
    else
      set.put(c, c);
  }
  public Iterator iterator() {
    return set.keySet().iterator();
  }
  public int size() {
    return set.size();
  }
  public String toString() {
    return set.keySet().toString();
  }
}

class WordCount {
  private FileReader file;
  private StreamTokenizer st;

  private CounterSet counts = new CounterSet();
  WordCount(String filename)
    throws FileNotFoundException {
    try {
      file = new FileReader(filename);
      st = new StreamTokenizer(
        new BufferedReader(file));
      st.ordinaryChar('.');
      st.ordinaryChar('-');
      st.lowerCaseMode(true);

    } catch(FileNotFoundException e) {
      System.err.println(
        "Could not open " + filename);
      throw e;
    }
  }
  void cleanup() {
    try {
      file.close();
    } catch(IOException e) {
      System.err.println(
        "file.close() unsuccessful");
    }
  }
  void countWords() {
    try {
      while(st.nextToken() !=
        StreamTokenizer.TT_EOF) {
        String s = ""; // stays empty for ignored tokens, so the length check below skips it
        switch(st.ttype) {
          case StreamTokenizer.TT_EOL:
            s = "EOL";
            break;

          case StreamTokenizer.TT_NUMBER:
            // numbers are not counted
            // s = Double.toString(st.nval);
            break;

          case StreamTokenizer.TT_WORD:
            s = st.sval;
            break;
          default: // single character in ttype
            s = String.valueOf((char)st.ttype);
        }

        if(s.length() > 3)
            counts.addOrIncrement(s);
      }
    } catch(IOException e) {
      System.err.println(
        "st.nextToken() unsuccessful");
    }
  }
  public Iterator iterator() {
    return counts.iterator();
  }
  public String toString() {
    return counts.toString();
  }
}

public class KeyWordExtractor {
  public static void main(String[] args)
  throws FileNotFoundException {
    for(int i = 0; i < args.length; i++){
      WordCount wc = new WordCount(args[i]);
      wc.countWords();
      System.out.println("WORD = " + wc);
      wc.cleanup();
    }
  }
}


and it gives me the number of occurrences of every word in the text. For
example, if I give it an input like (a silly example) "java function java
library function java", the output is WORD = [function[2] ,
java[3] , library[1]], that is, the occurrences of each word in the
text. My problem is that I don't need every word of the text in the
output, only the words that appear many times in the text. In this case
that is java, which is the keyword of the text: WORD = [java]

I know there is only a little code left to write, but I don't know Java
well, so I haven't managed to write it!
Please help me. THANKS!
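
The missing step could look something like the sketch below. It is not part of the code I found: the class name TopWords and the method mostFrequent are made up for illustration, and it assumes the counts are already in a Map from word to count instead of in the CounterSet above. It keeps only the words whose count equals the highest count.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the posted code.
class TopWords {
    // Returns every word whose count equals the highest count in the map.
    static List<String> mostFrequent(Map<String, Integer> counts) {
        List<String> result = new ArrayList<>();
        if (counts.isEmpty()) {
            return result;
        }
        int max = Collections.max(counts.values());
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == max) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Count the words of the example sentence from the post.
        Map<String, Integer> counts = new TreeMap<>();
        for (String w : "java function java library function java".split("\\s+")) {
            counts.merge(w, 1, Integer::sum);
        }
        System.out.println("WORD = " + mostFrequent(counts)); // prints WORD = [java]
    }
}
```

If more than one keyword is wanted, the same loop could compare against a threshold instead of the maximum; which threshold makes sense depends on the text.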

glen herrmannsfeldt wrote:

> giugy wrote:
>
> > Does anyone know where I can find the source code of a keyword
> > extractor written in Java? I mean software that analyzes a text and
> > extracts its keywords (the most frequent words in the text; for example,
> > the word "hello" appears forty times, the word "thanks" appears
> > thirty times....).
>
> > I need to see the software's source code written in Java in order to
> > understand how it works....
>
> It is very easy to write in Java.
>
> First read a line and extract words using StringTokenizer.  Then
> use a Hashtable to find out if you have seen that word before.
> If so, increment a counter.  If not, add it to the Hashtable with
> a count of 1.   I store a long[] in the hashtable for convenience
> in incrementing, but others will do something different.
>
> One trick, though.  After you extract words with StringTokenizer and
> find they are not in the table, create a new String to store the
> reference in the hash table.  If you don't it will take up too much
> memory, as the whole line of characters is stored for each word.
>
> After you finish reading the file, go through the Hashtable,
> extract words and counts, and print them out.
> 
> It should not take long at all to write.
> 
> -- glen
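
The steps glen describes can be sketched roughly as follows. This is only an illustration under some assumptions: the class name LineWordCounter and the method count are invented here, the input comes from any Reader, and words are split on whitespace only. It follows the Hashtable and long[] idea from the quoted text.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Hashtable;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical class name; a sketch of the approach described in the quote.
class LineWordCounter {
    // Reads the input line by line and counts each whitespace-separated word.
    static Map<String, long[]> count(Reader in) {
        Map<String, long[]> table = new Hashtable<>();
        try {
            BufferedReader reader = new BufferedReader(in);
            String line;
            while ((line = reader.readLine()) != null) {
                StringTokenizer tokens = new StringTokenizer(line);
                while (tokens.hasMoreTokens()) {
                    String word = tokens.nextToken();
                    long[] counter = table.get(word);
                    if (counter == null) {
                        // The copy is the "trick" from the quote. As far as I know,
                        // current JVMs no longer share the line's characters, so it
                        // is probably unnecessary there.
                        table.put(new String(word), new long[] {1});
                    } else {
                        counter[0]++; // the long[] lets us increment in place
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, long[]> table =
            count(new StringReader("java function java\nlibrary function java"));
        // Hashtable does not keep any particular order, so the lines may vary.
        for (Map.Entry<String, long[]> e : table.entrySet()) {
            System.out.println(e.getKey() + " " + e.getValue()[0]);
        }
    }
}
```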
 



