Re: [OT] Re: OT: Gawk match() and numbers in scientific notation

by Steffen Schuler <schuler.steffen@[EMAIL PROTECTED] > May 10, 2008 at 09:56 PM

Hermann Peifer wrote:
> pk wrote:
>> On Saturday 10 May 2008 10:58, Hermann Peifer wrote:
>>
>> Agreed. That's why I'm trying to find a way to print character classes
>> and collating sequences, so that when someone comes asking "why doesn't
>> sort work correctly" or "why doesn't grep/awk/sed, etc. match what I
>> expect", they can be instructed to run the (not yet existing) command or
>> script to see for themselves what the programs' idea of what they want
>> to do is.
>>
> 
> I combined some code from Ed's and Steffen's scripts and added 
> /usr/bin/printf in the middle part. Unlike bash's builtin printf or 
> gawk's (s)printf, /usr/bin/printf is able to convert Unicode code point 
> values into chars.
> 
> The script is far from being smart, efficient, or anything like 
> that, but it seems to work with Unicode-aware locales.
> 
> $ LC_ALL=en_GB.UTF-8 ./collating_chars.sh [A-C]
> AáÁàÀăắằẵẳặĂẮẰẴẲẶâấầẫẩậÂẤẦẪẨẬǎǍåǻÅǺäǟÄǞãÃȧǡȦǠąĄāĀảẢȁȀȃȂạẠḁḀẚªæǽǣÆǼǢbBḃḂḅḄḇḆɓƁcC
> 
> 
> $ LC_ALL=da_DK.UTF-8 ./collating_chars.sh [A-C]
> AaÁáÀàĂẮẰẴẲẶăắằẵẳặÂẤẦẪẨẬâấầẫẩậǍǎǺǻǞǟÃãȦǠȧǡĄąĀāẢảȀȁȂȃẠạḀḁẚªBbḂḃḄḅḆḇƁɓC
> 
> The example shows that in the da_DK.UTF-8 locale, the [A-C] range expands
> to fewer characters than in en_GB (and most other) locales. The reason is
> that the A_RING and AE_LIGATURE characters sort after Z, according to
> Danish sorting rules.
> 
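
For anyone who wants to see the locale dependence without the full script, here is a minimal sketch. It only assumes a POSIX sort and tr; the helper name letters is made up for this example, and the en_GB.UTF-8 part is skipped when that locale is not installed. In the C locale sort compares byte values, so all uppercase ASCII letters come before the lowercase ones; a language locale will usually interleave them.

```shell
#!/bin/sh
# Sort the same four letters under different locales to see how
# LC_ALL changes the collating sequence.
# "letters" is a hypothetical helper, not part of collating_chars.sh.
letters() { printf '%s\n' b A a B; }

# C (POSIX) locale: plain byte order, uppercase before lowercase.
c_order=$(letters | LC_ALL=C sort | tr -d '\n')
echo "C: $c_order"

# Only try the UTF-8 locale if it is installed. Assumption: locale -a
# may list it as en_GB.UTF-8 or en_GB.utf8, so both spellings are accepted.
if locale -a 2>/dev/null | grep -qiE '^en_GB\.utf-?8$'; then
    utf8_order=$(letters | LC_ALL=en_GB.UTF-8 sort | tr -d '\n')
    echo "en_GB.UTF-8: $utf8_order"
fi
```

The second line of output, if it appears at all, depends on the locale data shipped with the system, so it is not shown here.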
> Hermann
> 
> 
> $ cat collating_chars.sh
> #!/bin/bash
> # collate sequence of characters which belong
> # to a given character range or class
> # based on Ed's and Steffen's code
> # seems to work for Unicode locales
> #
> # usage:
> # collating_chars.sh [A-Z]
> # collating_chars.sh [[:upper:]]
> 
> gawk 'BEGIN {
> 
>     for (i=0;i<=32767;i++) {
> 
>         num = sprintf("%X", i)
>         l   = length(num)
> 
>         # construct a format that is
>         # understood by /usr/bin/printf
> 
>         if (i < 16)
>                 num = "\\\\x0" num
>         else if (i < 128)
>                 num = "\\\\x" num
>         else if (l == 2)
>                 num = "\\\\u00" num
>         else if (l == 3)
>                 num = "\\\\u0" num
>         else
>                 num = "\\\\u" num
> 
>         # exclude C1 control chars
>         # as printf doesnt like them
> 
>         if (i < 128 || i > 159)
>                 print num
>     }
> }' |
> 
> # Print Unicode chars with /usr/bin/printf
> while read num ; do /usr/bin/printf "$num\n" ; done  | sort |
> 
> # Collate characters which are matched by the given range or class
> gawk -v re=$1 '$1 ~ re { s = s $1 } END { print s }'
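             ^^
better use: re="$1" (unquoted, $1 is subject to word splitting and
filename expansion, so [A-C] could turn into the name of a matching file)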
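
A small sketch of what can go wrong without the quotes. It assumes nothing beyond a POSIX shell and mktemp; the scratch directory and the file name B are made up for the demonstration and have nothing to do with the script itself.

```shell
#!/bin/sh
# Create a scratch directory that contains a single file named "B".
dir=$(mktemp -d) || exit 1
touch "$dir/B"

# Simulate the script being called with [A-C] as its first argument.
set -- '[A-C]'

# Unquoted: the shell treats $1 as a glob pattern and replaces it
# with the matching file name found in the current directory.
unquoted=$(cd "$dir" && printf '%s' $1)

# Quoted: the bracket expression reaches the command unchanged.
quoted=$(cd "$dir" && printf '%s' "$1")

echo "unquoted: $unquoted"   # prints "unquoted: B"
echo "quoted:   $quoted"     # prints "quoted:   [A-C]"

rm -rf "$dir"
```

The same expansion can already happen on the command line, so calling the script as ./collating_chars.sh '[A-C]' is the safer habit as well.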

Nice script. Works fine. I tested it a lot and adapted it to suit my 
own needs.

--
Steffen




