
Re: Website Domain Rule Change

by The Ghost In The Machine <ewill@[EMAIL PROTECTED] > Dec 29, 2007 at 05:34 PM

In comp.lang.java.advocacy, Roedy Green <see_website@[EMAIL PROTECTED]> wrote
on Fri, 28 Dec 2007 10:42:09 GMT <dgk9n3lnp1frrkndkk46dpot72f67kgse8@[EMAIL PROTECTED]>:
> I think we should change the rules on website names.
>
> If you own a name, you should also have exclusive rights to all
> variants with dashes inserted.

Dashes are only one of many "decorative" Unicode [*]
characters.  Presumably, the actual active name should
be alphanumeric only, with alpha including such things
as Chinese glyphs and a few letter/numberlike symbols;
the latter might morph into their corresponding letter or
number during lookup, if they can.

For example, U+210E might either be construed as is (it's
the Unicode for Planck's Constant), or mapped to the letter
'h' during lookup; U+2460 through U+2468 might be mapped
to their corresponding digits (they are the digits 1-9
enclosed in a circle).  U+2469 through U+2473 might map
into *two* digits.  U+24B6 through U+24E9 might similarly
map into their corresponding letters.
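
One way such a lookup-time mapping could work is Unicode compatibility
normalization (NFKC).  The following is only a sketch: nothing in the
proposal says NFKC is the rule a registry would use, and the helper name
fold_compat is made up for illustration.

```python
import unicodedata

def fold_compat(label: str) -> str:
    """Collapse 'decorated' characters to their plain forms via NFKC."""
    return unicodedata.normalize("NFKC", label)

# Planck constant, circled 1, circled 10, circled A, circled z
samples = ["\u210e", "\u2460", "\u2469", "\u24b6", "\u24e9"]
for ch in samples:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {fold_compat(ch)!r}")
```

Note that the circled numbers from ten upward really do come out as two
digits, so a folded name can be longer than the one that was typed.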

It gets worse. The character U+0391 (capital Greek Alpha)
looks almost exactly like the character U+0041 (capital
Latin A), which could lead to some interesting issues,
especially during printout.  Of course everyone is (or
should be!) familiar with the issues regarding U+004F
(capital latin "O"), U+006F (small latin "o"), and U+0030
(zero or "0"); ditto for U+004C (capital latin "L"), U+006C
(lower latin "l"), and U+0031 (one or "1").  There is the
mild but real possibility of confusion of "S" with "5",
"Z" with "2", and lowercase "g" with "9" in some fonts.
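
The look-alike groups above could be folded into a single "skeleton"
before two names are compared.  This is a minimal sketch with a
hand-written table taken only from the pairs just listed; the table, the
function name and the example.com names are illustrative assumptions, not
any registry's actual rule.

```python
# Hypothetical fold table: each digit collapses onto the letter it is
# most easily mistaken for.  A real table would be far larger.
DIGIT_LOOKALIKES = str.maketrans({
    "0": "o",
    "1": "l",
    "5": "s",
    "2": "z",
    "9": "g",
})

def skeleton(name: str) -> str:
    """Lowercase the name, then replace look-alike digits with letters."""
    return name.lower().translate(DIGIT_LOOKALIKES)

def confusable(a: str, b: str) -> bool:
    """True if two names collapse to the same skeleton."""
    return skeleton(a) == skeleton(b)

print(confusable("EXAMP1E.com", "example.com"))
```

The obvious cost is that legitimate names containing digits collide too:
under this table "web2.example" and "webz.example" become the same name.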

Some of the Arabic symbols look a bit like a very stylized
"J", "i", or "U" form; presumably someone who is astute
enough might notice the difference but naive readers may
not.  (One difference: Arabic symbols go right-to-left.)
The character U+4E2B (a CJK Unified Ideograph) looks a bit
like the letter Y (U+0059).  The Cyrillic character U+041D
looks like U+0048 (capital Latin "H") but is pronounced more
like U+004E (capital letter "N").  U+041E is the Cyrillic
capital letter O, and could be added to the other zerolike
or O-like characters.  U+0414 looks a bit like the capital
letter "A", but is pronounced "de".
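
A cheap first defence against cross-script look-alikes is to flag labels
that mix scripts.  The sketch below guesses the script from the first
word of each character's Unicode name, which is a rough heuristic and not
the real Unicode Script property; the function names are made up.

```python
import unicodedata

def rough_script(ch: str) -> str:
    """First word of the character name, e.g. LATIN, GREEK, CYRILLIC."""
    return unicodedata.name(ch, "UNKNOWN").split()[0]

def scripts_in(label: str) -> set:
    """Rough scripts of all alphabetic characters in the label."""
    return {rough_script(ch) for ch in label if ch.isalpha()}

def is_mixed(label: str) -> bool:
    """True if the label draws letters from more than one script."""
    return len(scripts_in(label)) > 1

spoof = "\u041dOME"   # Cyrillic EN followed by Latin O, M, E
print(sorted(scripts_in(spoof)), is_mixed(spoof))
print(sorted(scripts_in("HOME")), is_mixed("HOME"))
```

This does nothing against a name written entirely in one foreign script
that happens to look Latin, so it narrows the problem rather than solving
it.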

I could go on, but you probably get the idea.

A number of the Unicode symbols are Latin letters with
modifiers.  For example, U+00CA is Latin Capital Letter E
with circumflex.  This is transmitted as either 0xCA or as
0xC3 0x8A, depending on the encoding (the former is
ISO-8859-1, the latter UTF-8); the encoding further
complicates issues.
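
The byte values are easy to check.  The sketch below also shows a third
wrinkle that is an addition here, not something claimed above: the same
letter can be stored decomposed, as a plain E followed by a combining
circumflex, which gives yet another byte sequence.

```python
import unicodedata

e_circ = "\u00ca"  # LATIN CAPITAL LETTER E WITH CIRCUMFLEX

latin1_bytes = e_circ.encode("latin-1")   # one byte
utf8_bytes = e_circ.encode("utf-8")       # two bytes
print(latin1_bytes, utf8_bytes)

# Canonical decomposition: base letter plus combining mark (U+0302).
decomposed = unicodedata.normalize("NFD", e_circ)
print([f"U+{ord(c):04X}" for c in decomposed], decomposed.encode("utf-8"))

# Recomposing restores the single precomposed code point.
print(unicodedata.normalize("NFC", decomposed) == e_circ)
```

Any lookup rule therefore has to pick both an encoding and a
normalization form before comparing names byte for byte.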

All this would of course have to be documented somewhere.

Since ICANN is being split up into multiple jurisdictions,
it may also depend on the jurisdiction involved.
For example, .com might be under the UN somewhere; those
companies purely in the US would hopefully mutate from .com
into .co.us, and be subject to US conditions.  However,
.co.uk would be enforced by the British, and .co.de
by the Germans, who have their own ideas regarding letters
(U+00DF is the Latin Small Letter Sharp S, called
"Eszett" in German).
Then there's the Chinese, Iraqis, Ethiopians, etc., etc. ...
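
The sharp S is a good test case for jurisdiction-specific letter rules,
because Unicode's own caseless matching already treats it as "ss".  A
minimal sketch, assuming a registry compared names with Python's
str.casefold (the helper name and the .example hostnames are invented):

```python
def same_name(a: str, b: str) -> bool:
    """Compare two names using Unicode full case folding."""
    return a.casefold() == b.casefold()

print("stra\u00dfe".casefold())                              # folded form
print(same_name("stra\u00dfe.example", "STRASSE.example"))   # same name?
print(same_name("stra\u00dfe.example", "strase.example"))    # single s
```

Whether the two spellings should count as one name or two is exactly the
kind of thing different jurisdictions could decide differently.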

Even in the US one might have some interesting issues.
The Cherokee syllabary is encoded in Unicode; the
character U+13CB in particular, which is "quv"
(I have no idea how one would pronounce this), looks a
bit like a stylized letter "E"; the U+13D9 "do" character
looks like a "V", and U+13DA ("du") like an "S".  Since
the Cherokee, along with all other Native American tribes,
are ideally independent entities, they would have their
own domains as well, though I'm not sure how faithfully
the US Government will go along with that idea, or the
precise implementation thereof.

And one might even have deltas depending on which ISP
implements the website, and which ISP is doing the lookup,
if the specifications are sufficiently lax.

In short...good luck achieving consensus in this area.

>
> Alternatively dashes in URLs should be automatically removed before
> lookup.
>
> So often I get taken to bogus squatter sites when I copy down an URL,
> or try to remember it, and don't get the dashes exactly right.
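
The dash-stripping rule in the quoted text could be sketched roughly as
follows.  This is an illustration under assumptions, not a proposed
standard: the function names and the .example hostnames are invented, and
labels starting with "xn--" are left alone because that prefix marks
Punycode-encoded internationalized names, where the dashes carry meaning.

```python
ACE_PREFIX = "xn--"  # marks a Punycode-encoded (internationalized) label

def strip_dashes(hostname: str) -> str:
    """Lowercase the hostname and drop hyphens from each ordinary label."""
    labels = hostname.lower().split(".")
    cleaned = [
        label if label.startswith(ACE_PREFIX) else label.replace("-", "")
        for label in labels
    ]
    return ".".join(cleaned)

def same_site(a: str, b: str) -> bool:
    """True if two hostnames differ only by case and hyphens."""
    return strip_dashes(a) == strip_dashes(b)

print(strip_dashes("Big-Widgets.example"))
print(same_site("big-widgets.example", "bigwidgets.example"))
```

Even this small rule needs an exception on day one, which hints at how
quickly the special cases pile up.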

The domain "yaoho.com" was at one point registered to a
porno site (it stood for something along the lines of Young
Adults Orgasming, Heaving, and Osculating, or something
equally salacious).  Thankfully, that particular one was
noticed rather famously by Time Magazine (time.com) and
removed -- but others are still out there.

There's a controversy regarding "micr0soft.com", which
apparently now doesn't resolve to anything useful, and
"micros0ft.com", which resolves to a working website that
returns "f00"; both, of course, play on Microsoft's name.
Various squatters and plagiarizers also exist; occasionally
the big concerns go after them to reclaim their good
name -- which can lead to controversy, as in the case of
Mike Rowe and his MikeRoweSoft.com.

His site mutated into MichaelRowe.com, presumably, which
is what Google is suggesting now.  Microsoft bribed him
with an Xbox, and got some rather bad publicity out of
it all: http://www.news.com/2100-1014_3-5143614.html

(Of course, not doing anything has its own risks; for
example, "Kleenex", "Frisbee", "Hoover", and "Xerox" [+]
are in very real danger of becoming generic words in the
English language, despite being registered trademarks
for facial tissues, flying discs, vacuum cleaners, and
photoduplicators.)

>
> If there are other decorative letters in URLs, they should be treated
> the same way.
>

Again, good luck; see the examples above for some of the
complexities involved.

[*] If one has Linux one might be able to use the utility
    'gucharmap', which is part of GNOME.  I don't know
    the KDE equivalent.  Shareware Windows utilities
    should be available, or one can simply go to
    http://unicode.org/

[+] Kleenex: Kimberly-Clark
    Frisbee: Wham-O
    Hoover: Hoover
    Xerox: Xerox

    Some of these may already have been lost; Wikipedia
    suggests Frisbee in particular is now a genericized
    trademark (and Frisbee.com is held by a squatter!), but
    that Kleenex is still associated with Kimberly-Clark.

-- 
#191, ewill3@[EMAIL PROTECTED]
 memory has to be one of the most UNconventional
architectures I've seen in a computer system.
