Re: Search pattern for non-ASCII alphabetic characters
by Janis Papanagnou <Janis_Papanagnou@[EMAIL PROTECTED]>
Feb 3, 2008 at 07:19 PM
Hermann Peifer wrote:
> Hi,
>
> Occasionally, I'd like to search for non-ASCII alphabetic characters in
> UTF-8 encoded text documents.
>
> In the absence of an appropriate character class (at least I wouldn't
> know of any), I do something like:
>
> awk '/[ÀÁÂÃÄÅ ...and so on... ŸŹźŻżŽž]/{ action }'
>
> This is perhaps not the smartest solution. Any better idea?
>
> TIA. Hermann
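I can't say whether it is a smarter solution, but you could invert the
logic and match anything that falls outside the existing character
classes. In the C locale those classes cover only ASCII, so the bytes
of any multibyte UTF-8 character match none of them: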
LANG=C awk '/[^[:alnum:][:punct:][:blank:][:cntrl:]]/'
(Note: some tools also offer an [:ascii:] character class, but it is
not one of the POSIX classes, and my GNU awk does not seem to support it.)
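
To make the idea concrete, here is a minimal sketch that wraps the pattern above in a shell function and runs it on a small sample. The function name non_ascii_lines and the sample text are made up for illustration. It sets LC_ALL rather than LANG, on the assumption that LC_ALL or LC_CTYPE might already be set in the environment, in which case LANG=C alone would not take effect.

```shell
# non_ascii_lines is a hypothetical helper name, not from the original post.
# It reads text on stdin and prints the number of each line that contains
# at least one byte outside the ASCII range.
non_ascii_lines() {
    # In the C locale the named classes cover ASCII only, so the negated
    # bracket expression matches any byte >= 0x80.
    LC_ALL=C awk '/[^[:alnum:][:punct:][:blank:][:cntrl:]]/ { print NR }'
}

# Demo input: the second line holds "é" as the UTF-8 bytes 0xC3 0xA9,
# written as octal escapes so the script itself stays pure ASCII.
printf 'plain ascii\ncaf\303\251\ntab\there\n' | non_ascii_lines
```

Keep in mind that this flags every non-ASCII character, not only alphabetic ones, so lines containing, say, typographic quotes or currency symbols would be reported as well.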
Janis