Gawk and multi-byte characters

by Hermann Peifer, Jan 16, 2008 at 07:51 PM

Hi All,

I am working with UTF-8 encoded files and wonder how Gawk treats
multibyte characters. Here is an example:

$ cat AAA.txt
AAA
ÅÃÆ

The length() function counts 3 characters in both records:

$ awk '{print length()}' AAA.txt
3
3

But printf seems to count bytes rather than characters:

$ awk '{printf "%-7s|\n", $0}' AAA.txt
AAA    |
ÅÃÆ |
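
The single space of padding on the second line fits byte counting: each of Å, Ã and Æ takes two bytes in UTF-8, so the record is six bytes wide and %-7s adds only one space. One possible workaround is to build the padding by hand from length(). This is only a sketch: it assumes length() counts characters in the current locale (as it does in the output above), and the helper name pad_chars is made up for this example.

```shell
# pad_chars WIDTH: left-justify each input line in a field of WIDTH,
# measuring the line with length() instead of relying on printf's %-Ns.
# (pad_chars is a hypothetical helper name, not part of awk or gawk.)
pad_chars() {
    awk -v width="$1" '{
        pad = ""
        # Add one space per missing column; no padding if the line is too long.
        for (i = length($0); i < width; i++)
            pad = pad " "
        printf "%s%s|\n", $0, pad
    }'
}

printf 'AAA\nÅÃÆ\n' | pad_chars 7
```

If awk runs in a locale where length() counts bytes, this will pad the same way printf does, so the result still depends on the locale settings.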


What about other functions: index(), substr(), etc.? Is this documented
somewhere?

TIA. Hermann
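
A quick way to see how a given awk treats index() and substr() is to print their results for the same input. The sketch below is only a probe, not a statement of what Gawk does. The function name probe_multibyte and its NEEDLE argument are made up for this example.

```shell
# probe_multibyte NEEDLE: for each input line, report length(), the position
# of NEEDLE according to index(), and the second "unit" according to substr().
# (probe_multibyte is a hypothetical helper name used only for illustration.)
probe_multibyte() {
    awk -v needle="$1" '{
        printf "length=%d index=%d substr=[%s]\n",
               length($0), index($0, needle), substr($0, 2, 1)
    }'
}

printf 'ÅÃÆ\n' | probe_multibyte 'Æ'
```

In UTF-8 the record ÅÃÆ is the six bytes C3 85 C3 83 C3 86, so an index of 3 would point to character counting and an index of 5 to byte counting. With byte counting, substr($0, 2, 1) returns half of a character and may show up as garbage on the terminal.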
 




