
Re: Printing Unicode chars

by Hermann Peifer <peifer@[EMAIL PROTECTED] > Dec 17, 2007 at 05:30 AM

On Dec 16, 11:38 pm, Kees Nuyt <k.n...@[EMAIL PROTECTED]> wrote:
> On Sun, 16 Dec 2007 07:05:18 -0800 (PST), Hermann Peifer
> <pei...@[EMAIL PROTECTED]> wrote:
> >Hi,
>
> >GAWK's printf has a %c format:
> >> This prints a number as an ASCII character;
> >> thus, `printf "%c", 65' outputs the letter `A'.
>
> >I am looking for a somewhat similar format that would print a Unicode
> >character based on a hex value.  Any idea?
>
> >Thanks in advance, Hermann
>
> Primitive but it works.
> I'm sure someone will try to optimize this.
> Advantage: the algorithm is clear.
>
> BEGIN{
>         hexchar = "0123456789ABCDEF"
>         binchar = "0000000100100011010001010110011110001001101010111100110111101111"
>         # map each 4-bit string to its value (0..15)
>         i = 0
>         for (g = 1;g < length(binchar);g+=4){
>                 bin2ord[substr(binchar,g,4)] = i
>                 i++
>         }
>         # lookup tables: byte value -> character, character -> 2 hex digits
>         # (in a multibyte locale, gawk may print %c values >= 128 as
>         #  multibyte characters; LC_ALL=C keeps each entry a single byte)
>         for (i=0;i<16;i++){
>                 for (j=0;j<16;j++){
>                         entry = ((16*i) + j)
>                         ord2chr[entry] = sprintf("%c",entry)
>                         chr2hex[sprintf("%c",entry)] = \
>                                 substr(hexchar,i + 1,1) substr(hexchar,j + 1,1)
>                 }
>         }
>         printf("\xEF\xBB\xBF")        # BOM
> }
>
> #
> # x = string of 4 hex digits
> #      ISO character code HHHH
> #     (usually defined as \uHHHH or 0xHHHH)
> # returns UTF-8
> #
> function hex2ut8(x  ,b,q,r,s,t,u){
>         # expand each hex digit into its 4-bit string
>         b = ""
>         for (q=1;q <= length(x);q++){
>                 b = b substr(binchar,1 + 4 * (index(hexchar,substr(x,q,1)) - 1),4)
>         }
> #  calculate nr of significant bits
>         q = 17 - index(b,"1")
>         if (q <= 7){
>                 t = substr(b,9,8)
>         } else if (q <= 11){
>                 t = "110" substr(b,6,5)  "10" substr(b,11,6)
>         } else if (q <= 16){
>                 t = "1110" substr(b,1,4) "10" substr(b,5,6) \
>                         "10" substr(b,11,6)
>         } else {
>                 # could be extended quite easily.
>                 # (awk has no built-in abort(), so report the error and stop)
>                 print "UTF-8 support is limited to 16 bits." > "/dev/stderr"
>                 exit 1
>         }
>         # pack each group of 8 bits into one output byte
>         r = ""
>         for (u = 1;u < length(t);u+=8){
>                 r = r ord2chr[(16 * bin2ord[substr(t,u,4)]) \
>                         + bin2ord[substr(t,(u + 4),4)]]
>         }
>         return r
> }
>

Thanks for this solution.

To be honest, I was hoping that something simpler would be possible,
perhaps along the lines of how Unicode characters can be printed with
GNU printf, for example the euro sign, as mentioned by Jürgen:

$ /usr/bin/printf "\u20AC\n"
€

But obviously, this is not possible with GAWK printf.
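
Staying within awk, the quoted function could probably be shortened by
doing the bit manipulation with integer arithmetic instead of bit
strings. What follows is only a rough sketch, not Kees' code: the names
u2utf8, hex2num and utf8 are made up for illustration, and it assumes an
awk whose %c emits exactly one byte per value under LC_ALL=C (gawk and
mawk behave this way). It also goes beyond 16 bits, up to U+10FFFF, but
does not reject surrogate code points.

```shell
# u2utf8 HEX...: print the UTF-8 encoding of each hexadecimal code point.
# Hypothetical helper; LC_ALL=C is assumed to make %c emit single bytes.
u2utf8() {
    LC_ALL=C awk '
    # Convert a string of hex digits to a number (portable, no strtonum()).
    function hex2num(h,    i, n) {
        h = toupper(h)
        n = 0
        for (i = 1; i <= length(h); i++)
            n = n * 16 + index("0123456789ABCDEF", substr(h, i, 1)) - 1
        return n
    }
    # Return the UTF-8 byte sequence for code point n (0 .. 0x10FFFF).
    function utf8(n) {
        if (n < 128)                    # 1 byte:  0xxxxxxx
            return sprintf("%c", n)
        if (n < 2048)                   # 2 bytes: 110xxxxx 10xxxxxx
            return sprintf("%c%c", 192 + int(n / 64), 128 + n % 64)
        if (n < 65536)                  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return sprintf("%c%c%c", 224 + int(n / 4096),
                           128 + int(n / 64) % 64, 128 + n % 64)
        if (n < 1114112)                # 4 bytes: 11110xxx 10xxxxxx (x3)
            return sprintf("%c%c%c%c", 240 + int(n / 262144),
                           128 + int(n / 4096) % 64,
                           128 + int(n / 64) % 64, 128 + n % 64)
        return ""                       # out of range: emit nothing
    }
    BEGIN {
        for (i = 1; i < ARGC; i++)
            printf "%s", utf8(hex2num(ARGV[i]))
        printf "\n"
    }' "$@"
}

u2utf8 20AC
```

On a UTF-8 terminal the final call should print the euro sign, and
`u2utf8 41 FC` should give "Aü".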

Hermann
 




