On Dec 16, 11:38 pm, Kees Nuyt <k.n...@[EMAIL PROTECTED]> wrote:
> On Sun, 16 Dec 2007 07:05:18 -0800 (PST), Hermann Peifer
> <pei...@[EMAIL PROTECTED]> wrote:
> >Hi,
>
> >GAWK's printf has a %c format:
> >> This prints a number as an ASCII character;
> >> thus, `printf "%c", 65' outputs the letter `A'.
>
> >I am looking for a somewhat similar format that would print a Unicode
> >character based on a hex value. Any idea?
>
> >Thanks in advance, Hermann
>
> Primitive but it works.
> I'm sure someone will try to optimize this.
> Advantage: the algorithm is clear.
>
> BEGIN{
>     hexchar = "0123456789ABCDEF"
>     binchar = \
>       "0000000100100011010001010110011110001001101010111100110111101111"
>     # map each 4-bit pattern ("0000".."1111") to its value 0..15
>     i = 0
>     for (g = 1; g < length(binchar); g += 4){
>         bin2ord[substr(binchar,g,4)] = i
>         i++
>     }
>     # byte value -> character, and character -> two hex digits
>     for (i = 0; i < 16; i++){
>         for (j = 0; j < 16; j++){
>             entry = ((16*i) + j)
>             ord2chr[entry] = sprintf("%c",entry)
>             chr2hex[sprintf("%c",entry)] = \
>                 substr(hexchar,i + 1,1) substr(hexchar,j + 1,1)
>         }
>     }
>     printf("\xEF\xBB\xBF") # BOM
> }
>
> #
> # x = string of 4 hex digits
> # ISO character code HHHH
> # (usually defined as \uHHHH or 0xHHHH)
> # returns UTF-8
> #
> function hex2ut8(x ,b,q,r,s,t,u){
>     # expand each hex digit of x into its 4-bit pattern
>     b = "";
>     for (q = 1; q <= length(x); q++){
>         b = b substr(binchar, 1 + 4 * (index(hexchar,substr(x,q,1)) - 1), 4)
>     }
>     # calculate nr of significant bits (0 when x is "0000")
>     q = index(b,"1")
>     q = (q > 0) ? 17 - q : 0
>     if (q <= 7){
>         t = substr(b,9,8)
>     } else if (q <= 11){
>         t = "110" substr(b,6,5) "10" substr(b,11,6)
>     } else if (q <= 16){
>         t = "1110" substr(b,1,4) "10" substr(b,5,6) \
>             "10" substr(b,11,6)
>     } else {
>         # awk has no abort(); report on stderr and stop instead
>         print "UTF-8 support is limited to 16 bits." > "/dev/stderr"
>         exit 1
>         # could be extended quite easily.
>     }
>     # pack each group of 8 bits into one output byte
>     r = ""
>     for (u = 1; u < length(t); u += 8){
>         r = r ord2chr[(16 * bin2ord[substr(t,u,4)]) \
>             + bin2ord[substr(t,(u + 4),4)]]
>     }
>     return r
> }
>
Thanks for this solution.
To be honest, I was hoping that something simpler would be possible,
perhaps along the lines of how Unicode characters can be printed
with GNU printf, say for the euro currency sign, as mentioned by
Jürgen:
$ /usr/bin/printf "\u20AC\n"
€
But obviously, this is not possible with GAWK printf.
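
For reference, the same conversion can also be written with plain
arithmetic instead of bit strings. This is only a minimal sketch, not a
drop-in replacement for the code quoted above: the names cp2utf8,
hex2dec and utf8 are made up for illustration, and it assumes that
awk's %c emits single bytes when run with LC_ALL=C (as far as I know
this holds for gawk and mawk; an awk that always writes UTF-8 for %c
would double-encode the result).

```shell
#!/bin/sh
# cp2utf8 HEX: write the UTF-8 encoding of code point U+HEX to stdout.
# (Hypothetical helper name; LC_ALL=C is assumed to make %c emit raw bytes.)
cp2utf8() {
    LC_ALL=C awk -v hex="$1" '
    # Convert a hex string to a number without the gawk-only strtonum().
    # Assumes h contains only hex digits.
    function hex2dec(h,    i, n) {
        h = toupper(h)
        n = 0
        for (i = 1; i <= length(h); i++)
            n = n * 16 + index("0123456789ABCDEF", substr(h, i, 1)) - 1
        return n
    }
    # Encode code point c (up to 0x10FFFF) as 1 to 4 UTF-8 bytes.
    function utf8(c) {
        if (c < 128)
            return sprintf("%c", c)
        if (c < 2048)
            return sprintf("%c%c", 192 + int(c / 64), 128 + c % 64)
        if (c < 65536)
            return sprintf("%c%c%c", 224 + int(c / 4096), \
                           128 + int(c / 64) % 64, 128 + c % 64)
        return sprintf("%c%c%c%c", 240 + int(c / 262144), \
                       128 + int(c / 4096) % 64, \
                       128 + int(c / 64) % 64, 128 + c % 64)
    }
    BEGIN { printf "%s", utf8(hex2dec(hex)) }'
}
```

Under those assumptions, `cp2utf8 20AC` should write the three bytes
E2 82 AC, which a UTF-8 terminal displays as the euro sign. Unlike the
table-driven version, this one is not limited to 16 bits.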
Hermann

