
Re: OT: Gawk match() and numbers in scientific notation

by Hermann Peifer <peifer@[EMAIL PROTECTED] > May 13, 2008 at 03:41 AM

On May 10, 8:10 pm, Hermann Peifer <pei...@[EMAIL PROTECTED]
> wrote:
>
> $ cat collating_chars.sh
> #!/bin/bash
> # collate sequence of characters which belong
> # to a given character range or class
> # based on Ed's and Steffen's code
> # seems to work for Unicode locales
> #
> # usage:
> # collating_chars.sh [A-Z]
> # collating_chars.sh [[:upper:]]
>
> gawk 'BEGIN {
>
>     for (i=0;i<=32767;i++) {
>
>         num = sprintf("%X", i)
>         l   = length(num)
>
>         # construct a format that is
>         # understood by /usr/bin/printf
>
>         if (i < 16)
>             num = "\\\\x0" num
>         else if (i < 128)
>             num = "\\\\x" num
>         else if (l == 2)
>             num = "\\\\u00" num
>         else if (l == 3)
>             num = "\\\\u0" num
>         else
>             num = "\\\\u" num
>
>         # exclude C1 control chars
>         # as printf doesnt like them
>
>         if (i < 128 || i > 159)
>             print num
>     }
>
> }' |
>
> # Print Unicode chars with /usr/bin/printf
> while read num ; do /usr/bin/printf "$num\n" ; done | sort |
>
> # Collate characters which are matched by the given range or class
> gawk -v re=$1 '$1 ~ re { s = s $1 } END { print s }'
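
A note on the four backslashes in the quoted script: the escape passes through three layers (awk string literal, the shell's `read` without `-r`, then the printf format), and each of the first two halves the backslashes. Below is a minimal sketch of that chain, not part of the original post. It assumes a plain POSIX `awk` and the shell's own `printf`, and it uses an octal escape instead of `\x`/`\u` because octal is the only numeric escape POSIX printf guarantees. The function name `decode_octal` is made up for illustration.

```shell
#!/bin/sh
# decode_octal: hypothetical helper, only to illustrate the escaping layers.
# Takes an octal character code and prints the corresponding character.
decode_octal() {
    # Layer 1: the awk literal "\\\\" is two backslashes; awk prints e.g. \\101
    awk -v code="$1" 'BEGIN { print "\\\\" code }' |
    # Layer 2: read (without -r) turns \\ into \, so f holds \101
    # Layer 3: printf interprets \101 in its format string as one character
    while read f ; do printf "$f\n" ; done
}

decode_octal 101   # prints A
decode_octal 132   # prints Z
```

With only two backslashes in the awk literal, `read` would swallow the single remaining backslash and printf would print the digits literally.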


For those who might be interested: here is another variation of the
script. Reading an additional local file is perhaps not the smartest
solution, but it guarantees that all relevant Unicode code points are
covered. (I promise this will be my last OT posting in this thread ;-)

Hermann

$ cat collate_rechars.sh
#!/bin/bash
# Collate a sorted list of chars that belong to a range or class
# Based on Ed's and Steffen's code, works only for Unicode locales
#
# Usage:
# LC_ALL=en_GB.UTF-8 collate_rechars.sh [A-Z]
# LC_ALL=da_DK.UTF-8 collate_rechars.sh [[:upper:]]
#
# For this script you need to have a local copy of
# http://www.unicode.org/Public/UNIDATA/UnicodeData.txt

gawk -F";" '

# Construct a format for chars in ASCII range
#
NR < 129 { print "\\\\x" substr($1,3,2) ; next }

# Construct a format for other code points in UnicodeData.txt
# Exclude C1 control chars and some other code points
# as /usr/bin/printf reports errors for them
#
NR > 160 && $1 !~ /^(D800|DB7F|DB80|DBFF|DC00|DFFF)$/ {
        printf "\\\\U%08s\n", $1 }' UnicodeData.txt |

# Print chars with /usr/bin/printf and sort them
#
while read f ; do /usr/bin/printf "$f\n" ; done | sort |

# Collate chars for the given character range or class
#
gawk -v re="$1" '$1 ~ re { s = s $1 } END { print re "=" s }'
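
To try the final collating stage on its own, without UnicodeData.txt, here is a small sketch restricted to printable ASCII. It is not from the original post and makes these assumptions: `LC_ALL=C` is forced so that ranges and sort order are plain byte order, a plain POSIX `awk` stands in for gawk, and the function name `collate_ascii` is made up for illustration.

```shell
#!/bin/sh
# collate_ascii: hypothetical helper, ASCII-only variant of the last stage.
# Lists the printable ASCII characters matched by a bracket expression.
collate_ascii() {
    # Emit characters 33..126, one per line (the space is skipped because
    # it would leave $1 empty in the last stage)
    LC_ALL=C awk 'BEGIN { for (i = 33; i <= 126; i++) printf "%c\n", i }' |
    LC_ALL=C sort |
    # Same collating step as in the script above
    LC_ALL=C awk -v re="$1" '$1 ~ re { s = s $1 } END { print re "=" s }'
}

collate_ascii '[A-F]'   # prints [A-F]=ABCDEF
collate_ascii '[0-9]'   # prints [0-9]=0123456789
```

In a UTF-8 locale the same range can match a different set of characters, which is the reason the full script feeds in all Unicode code points.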
 




