Talk About Network



Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Programming > Awk > [OT] collating ...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 20 of 43 Topic 2231 of 2236
Post > Topic >>

[OT] collating sequences: using glibc

by Steffen Schuler <schuler.steffen@[EMAIL PROTECTED] > May 9, 2008 at 08:51 AM

schuler.steffen@[EMAIL PROTECTED]
 wrote:
> On May 7, 5:25 pm, pk <p...@[EMAIL PROTECTED]
> wrote:
> [snip]
>> Is there a way to explicitly print out that information (or, better,
the
>> entire collating sequence in use)? I've been looking for a method to do
>> that for long time, but I have found no complete answer.
> [snip]
> 
> Here the collating sequence of the alphabetical ([:alpha:]) letters
> using
> gawk and sort for the locales C, en_US, and en_US.UTF-8
> (using Fedora Core 7 and the current gawk from CVS):
[snip]

Perhaps the result of the GLIBC functions iswalpha and wprintf together 
with GNU sort explains better the collating of the alphabetic characters 
for the 3 locales C, en_US, en_US.utf8 (my current system: Debian 
GNU/Linux testing (lenny)):

$ cat collate.c
#include <stdio.h>
#include <wchar.h>
#include <stdlib.h>

int
main(int argc, char **argv)
{
	wint_t i;

	if (argc != 2) {
		fprintf(stderr, "usage: ./collate MAX_CHAR_CODE\n");
		exit(1);
	}

	for (i = 1; i <= atoi(argv[1]); ++i)
		if (iswalpha(i))
			wprintf(L"%c\n", i);
	return 0;
}
$ cc -o collate collate.c
$ export LC_ALL=C; ./collate 255 | sort | tr -d '\n'; echo
ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
$ export LC_ALL=en_US; ./collate 255 | sort | tr -d '\n'; echo
aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
$ # ^^^^^^^...: only Latin letters
$ export LC_ALL=en_US.UTF-8; ./collate 65535 | sort | tr -d '\n'; echo
aAbBcCdDeEfFgGhHiIjJkKlLmMnNoOpPqQrRsStTuUvVwWxXyYzZ
$


There seem to be some inconsistencies between gawk and glibc locale 
usage. Especially in the C-code sample only Latin letters are listed for 
LC_ALL=en_US contrary to the similar gawk sample.

Besides, the collating order may be different from the order of the 
character codes --- see:

    http://tinyurl.com/6gdraw

I'm really interested whether there are any further locale bugs in gawk 
(for trying to fix them).
Perhaps I have more time for the research at the current weekend.

--
Steffen




 43 Posts in Topic:
Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-06 04:16:01 
Re: Gawk match() and numbers in scientific notation
pk <pk@[EMAIL PROTECTE  2008-05-06 13:28:06 
Re: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-07 07:11:38 
Re: Gawk match() and numbers in scientific notation
Ed Morton <morton@[EMA  2008-05-07 09:18:57 
Re: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-07 19:50:11 
Re: Gawk match() and numbers in scientific notation
Ed Morton <morton@[EMA  2008-05-07 13:03:32 
Re: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-07 20:39:44 
Re: Gawk match() and numbers in scientific notation
Ed Morton <morton@[EMA  2008-05-07 21:48:37 
Re: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-08 19:21:58 
Re: Gawk match() and numbers in scientific notation
Janis <janis_papanagno  2008-05-07 07:59:10 
Re: Gawk match() and numbers in scientific notation
Ed Morton <morton@[EMA  2008-05-07 10:20:16 
Re: Gawk match() and numbers in scientific notation
pk <pk@[EMAIL PROTECTE  2008-05-07 17:25:24 
Re: Gawk match() and numbers in scientific notation
Ed Morton <morton@[EMA  2008-05-07 10:37:01 
Re: Gawk match() and numbers in scientific notation
pk <pk@[EMAIL PROTECTE  2008-05-07 18:04:24 
Re: Gawk match() and numbers in scientific notation
schuler.steffen@[EMAIL PR  2008-05-07 11:16:35 
Re: Gawk match() and numbers in scientific notation
pk <pk@[EMAIL PROTECTE  2008-05-07 20:27:53 
Re: Gawk match() and numbers in scientific notation
Ed Morton <morton@[EMA  2008-05-07 21:49:51 
Re: Gawk match() and numbers in scientific notation
schuler.steffen@[EMAIL PR  2008-05-07 13:16:24 
Re: Gawk match() and numbers in scientific notation
pk <pk@[EMAIL PROTECTE  2008-05-08 11:25:06 
[OT] collating sequences: using glibc
Steffen Schuler <schul  2008-05-09 08:51:38 
Re: [OT] collating sequences: using glibc
pk <pk@[EMAIL PROTECTE  2008-05-09 10:32:37 
Re: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-08 06:58:39 
Re: Gawk match() and numbers in scientific notation
pk <pk@[EMAIL PROTECTE  2008-05-08 16:22:59 
OT: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-08 08:46:54 
Re: OT: Gawk match() and numbers in scientific notation
pk <pk@[EMAIL PROTECTE  2008-05-08 18:11:28 
Re: OT: Gawk match() and numbers in scientific notation
Janis Papanagnou <Jani  2008-05-08 22:29:32 
Re: OT: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-08 22:49:38 
Re: OT: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-09 09:44:54 
Re: OT: Gawk match() and numbers in scientific notation
pk <pk@[EMAIL PROTECTE  2008-05-09 10:24:00 
Re: OT: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-08 09:45:28 
[OT] Re: OT: Gawk match() and numbers in scientific notation
Janis <janis_papanagno  2008-05-09 02:08:34 
Re: [OT] Re: OT: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-10 10:58:52 
Re: [OT] Re: OT: Gawk match() and numbers in scientific notation
pk <pk@[EMAIL PROTECTE  2008-05-10 11:52:19 
Re: [OT] Re: OT: Gawk match() and numbers in scientific notation
pk <pk@[EMAIL PROTECTE  2008-05-10 11:55:35 
Re: [OT] Re: OT: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-10 20:10:19 
Re: [OT] Re: OT: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-10 20:31:22 
Re: [OT] Re: OT: Gawk match() and numbers in scientific notation
Steffen Schuler <schul  2008-05-10 21:56:00 
Re: [OT] Re: OT: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-10 23:14:44 
Re: [OT] Re: OT: Gawk match() and numbers in scientific notation
Cesar Rabak <csrabak@[  2008-05-11 10:50:15 
Re: [OT] Re: OT: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-11 17:27:57 
Re: [OT] Re: OT: Gawk match() and numbers in scientific notation
pk <pk@[EMAIL PROTECTE  2008-05-11 11:17:15 
Re: [OT] Re: OT: Gawk match() and numbers in scientific notation
Janis Papanagnou <Jani  2008-05-10 15:07:10 
Re: OT: Gawk match() and numbers in scientific notation
Hermann Peifer <peifer  2008-05-13 03:41:09 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Fri May 16 9:20:03 CDT 2008.