On 5/7/2008 9:59 AM, Janis wrote:
> On 7 Mai, 16:18, Ed Morton <mor...@[EMAIL PROTECTED]
> wrote:
>
>>On 5/6/2008 6:16 AM, Hermann Peifer wrote:
>>
>>
>>>Hi,
>>
>>>I am somewhat puzzled with match() results for numbers in scientific
>>>notation. See below.
>>
>>>$ cat testdata
>>>100
>>>100e-3
>>>100E3
>>
>>>I am wondering what kind of uppercase character is matched in record
>>>2:
>>
>>>$ gawk '{print $1,match($1,/[A-Z]/)}' testdata
>>
>>There may not be an uppercase character matching. [A-Z] represents the list of
>>characters in between the character A and the character Z in your locale
>
>
> Can you provide some reference for that definition?
From the GNU awk user guide
(http://www.gnu.org/software/gawk/manual/gawk.html#Character-Lists):
.... For example, in the default C locale, `[a-dx-z]' is equivalent to
`[abcdxyz]'. Many locales sort characters in dictionary order, and in these
locales, `[a-dx-z]' is typically not equivalent to `[abcdxyz]'; instead it might
be equivalent to `[aBbCcDdxXyYz]', for example.
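As a minimal sketch of the manual's point, assuming a POSIX shell and any awk
on the PATH (plain awk stands in for gawk here): forcing the C locale for a
single command makes [A-Z] mean exactly the 26 ASCII uppercase letters. The
helper name first_upper_c is hypothetical and not part of the thread.

```shell
# Hypothetical helper: for each argument, print it together with the
# position of the first [A-Z] match, evaluated in the C locale.
first_upper_c() {
    # LC_ALL=C applies only to this awk invocation, so the range [A-Z]
    # is interpreted by byte value instead of dictionary order.
    printf '%s\n' "$@" | LC_ALL=C awk '{ print $1, match($1, /[A-Z]/) }'
}

first_upper_c 100 100e-3 100E3
# 100 0
# 100e-3 0
# 100E3 4
```

In a dictionary-order locale the second record may report a non-zero position
instead, as in the original question; the exact result depends on the locale
and on the awk version in use.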
> I thought that ranges like [A-Z] depend on the _coding_ of the
> character set (IOW, on the code values of the characters), and
> not depending on the locale.
>
> So that in case you have a different code set than ISO Latin 1,
> ASCII, or similar, e.g. like EBCDIC (where there may be other
> characters spread in between the letter code positions) you'd
> get unexpected results.
>
> So, because of that, your suggestion below is valid anyway, but
> I'd like to be sure about what you wrote if that is in fact true.
Sounds to me like it's dependent on locale, but if what you're describing above
is something different from locale, then you may be right. Whatever the "thingy"
is that causes the difference, using character classes is the right approach.
Ed.
> Janis
>
>
>>- that
>>does NOT mean it has to be upper case characters. For example, your locale might
>>consider characters ordered as:
>>
>> aAbBcCdDeEfF....zZ
>>
>>so "e" would sit between "A" and "Z". That's why you should use character
>>classes instead of specific ranges, e.g.:
>>
>> gawk '{print $1,match($1,/[[:upper:]]/)}' testdata
>>
>>Regards,
>>
>> Ed.
>
>

