Re: Search pattern for non-ASCII alphabetic characters

by Hermann Peifer, Feb 3, 2008 at 09:42 PM

Janis Papanagnou wrote:
> Hermann Peifer wrote:
>> Hi,
>>
>> Occasionally, I'd like to search for non-ASCII alphabetic characters 
>> in UTF-8 encoded text documents.
>>
>> In the absence of an appropriate character class (at least I wouldn't 
>> know of any), I do something like:
>>
>> awk '/[ÀÁÂÃÄÅ ...and so on... ŸŹźŻżŽž]/{ action }'
>>
>> This is perhaps not the smartest solution. Any better idea?
>>
>> TIA. Hermann
> 
> I can't tell if it is a smarter solution but you could use the inverse
> logic based on the existing character classes...
> 
>   LANG=C  awk '/[^[:alnum:][:punct:][:blank:][:cntrl:]]/'
> 
> (Note: there's also the ANSI character class [:ascii:] but my GNU awk
> seems to not support it.)
> 
> Janis
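
To make the inverse-class suggestion above concrete, here is a minimal sketch on three sample lines. The helper names `sample` and `find_non_ascii` are made up for illustration, and `LC_ALL=C` is used instead of `LANG=C` on the assumption that an `LC_ALL` setting in the environment would otherwise take precedence. It also assumes an awk that supports POSIX character classes (gawk does; very old mawk versions do not).

```shell
# Sample UTF-8 input, written with octal escapes so the bytes are unambiguous:
#   "plain ASCII", "Ärger" (Ä = \303\204), "N°1" (° = \302\260)
sample() {
    printf 'plain ASCII\n\303\204rger\nN\302\2601\n'
}

# In the C locale no byte above 127 belongs to any of the four classes,
# so the negated bracket expression matches every line that contains
# at least one non-ASCII byte.
find_non_ascii() {
    LC_ALL=C awk '/[^[:alnum:][:punct:][:blank:][:cntrl:]]/'
}

sample | find_non_ascii
```

With such an awk this should print the second and third sample lines and skip the pure ASCII one.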


Thanks for the hint. This pattern also matches "N°1", which does not
exactly contain a non-ASCII *alphabetic* character. Still, it's better than
the long character list I was using, and I can filter out some false
positives.

Hermann
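
One possible way to filter out false positives such as "N°1" is to delete a short list of known non-alphabetic characters from a copy of the line before testing it. This is only a sketch, not something proposed in the thread: the exclusion list (°, ×, ÷) and the helper names `sample` and `find_non_ascii_letters` are assumptions for illustration, and a real list would need to be much longer.

```shell
# Sample UTF-8 input: "plain ASCII", "Ärger", "N°1", "3 × 4"
sample() {
    printf 'plain ASCII\n\303\204rger\nN\302\2601\n3 \303\227 4\n'
}

find_non_ascii_letters() {
    LC_ALL=C awk '
    {
        line = $0
        # Hypothetical exclusion list, given as UTF-8 byte sequences:
        #   \302\260 = degree sign, \303\227 = multiplication sign,
        #   \303\267 = division sign
        gsub(/\302\260|\303\227|\303\267/, "", line)
    }
    # Test the stripped copy, but print the original line.
    line ~ /[^[:alnum:][:punct:][:blank:][:cntrl:]]/ { print }
    '
}

sample | find_non_ascii_letters
```

On the sample input only the "Ärger" line should survive, since the non-ASCII bytes in the other lines are all on the exclusion list.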




3 Posts in Topic:
Search pattern for non-ASCII alphabetic characters
Hermann Peifer  2008-02-03 18:42:50
Re: Search pattern for non-ASCII alphabetic characters
Janis Papanagnou  2008-02-03 19:19:09
Re: Search pattern for non-ASCII alphabetic characters
Hermann Peifer  2008-02-03 21:42:45

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Sat May 17 3:31:55 CDT 2008.