Spell Correct in Gawk (just 15 lines)

by Tiago Peczenyj, Apr 13, 2008 at 12:26 PM

Based on http://norvig.com/spell-correct.html
and http://raelcunha.com/spell-correct.php,
I wrote this simple gawk script [ http://pastie.caboo.se/pastes/180039 ]


Usage: gawk -v word=something -f thisfile.awk [ big.txt [ big2.txt ... ]]

Example:
$ time gawk -v word=somethink -f spelling.awk
correct(somethink)=> something

real    0m4.862s
user    0m4.702s
sys     0m0.093s

Source Code:

# Usage: gawk -v word=some_word_to_verify -f spelling.awk [ big.txt [ big2.txt ... ]]
# Gawk version with 15 lines -- 04/13/2008
# Author: tiago (dot) peczenyj (at) gmail (dot) com
# Based on : http://norvig.com/spell-correct.html
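# substr(w,1,i) is the first i letters of w; every string one edit away is stored in list
function edits(w,max,candidates,list,        i,j){
       for(i=0;i<  max ;++i) ++list[substr(w,1,i) substr(w,i+2)] # delete one letter
       for(i=0;i< max-1;++i) ++list[substr(w,1,i) substr(w,i+2,1) substr(w,i+1,1) substr(w,i+3)] # swap two adjacent letters
       for(i=0;i<  max ;++i) for(j in alpha) ++list[substr(w,1,i) alpha[j] substr(w,i+2)] # replace one letter
       for(i=0;i<= max ;++i) for(j in alpha) ++list[substr(w,1,i) alpha[j] substr(w,i+1)] # insert one letter
       for(i in list) if(i in NWORDS) candidates[i] = NWORDS[i] } # keep only known words, with their counts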

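# Returns the most frequent known word within one edit (or two, if one edit finds nothing).
# asorti() would sort the candidate words alphabetically, so the counts are compared directly instead.
function correct(word            ,candidates,i,list,max,best,temp){
       edits(word,length(word),candidates,list)
       if (!asort(candidates,temp)) for(i in list) edits(i,length(i),candidates)
       for(i in candidates) if (candidates[i] > max) max = candidates[best = i]; return max ? best : word }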

BEGIN{ if (ARGC == 1) ARGV[ARGC++] = "big.txt" # http://norvig.com/big.txt
       while(++i<=length(x="abcdefghijklmnopqrstuvwxyz")) alpha[i]=substr(x,i,1)
       IGNORECASE=RS="[^"x"]+" }

{      ++NWORDS[tolower($1)]   }

END{   print (word in NWORDS) ? word : "correct("word")=> " correct(tolower(word)) }
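
To see what the four loops in edits() generate, here is a minimal sketch that prints every distinct string one edit away from a word. It is not part of the script above: edits1 is a hypothetical shell helper, and the default two-letter alphabet "ab" is an assumption made only to keep the output short. It takes the first i letters with substr(w,1,i) and uses no gawk extensions, so it should run under any POSIX awk.

```shell
# edits1 WORD [LETTERS] -- hypothetical helper, for illustration only.
# Prints each distinct string that is one edit away from WORD.
edits1() {
  awk -v w="$1" -v letters="${2:-ab}" 'BEGIN {
    n = length(w); m = length(letters)
    for (i = 0; i < n; i++)                 # delete the letter at position i+1
      seen[substr(w, 1, i) substr(w, i + 2)] = 1
    for (i = 0; i < n - 1; i++)             # swap the letters at positions i+1 and i+2
      seen[substr(w, 1, i) substr(w, i + 2, 1) substr(w, i + 1, 1) substr(w, i + 3)] = 1
    for (i = 0; i < n; i++)                 # replace the letter at position i+1
      for (j = 1; j <= m; j++)
        seen[substr(w, 1, i) substr(letters, j, 1) substr(w, i + 2)] = 1
    for (i = 0; i <= n; i++)                # insert a letter after position i
      for (j = 1; j <= m; j++)
        seen[substr(w, 1, i) substr(letters, j, 1) substr(w, i + 1)] = 1
    for (s in seen) print s                 # array keys are already de-duplicated
  }' | LC_ALL=C sort
}
```

Running "edits1 ab" prints ten strings: a, aa, aab, ab, aba, abb, b, ba, bab, bb. The loops produce 2 deletions, 1 transposition, 4 replacements and 6 insertions (13 in total), three of which are duplicates. With the full 26-letter alphabet a word of length n yields 54n+25 strings before duplicates are removed, so a second round over each of those strings is the expensive step.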

Enjoy :)



