33 Years of the Digest ... founded August 21, 1981
Copyright © 2014 E. William Horne. All Rights Reserved.
The Telecom Digest for Sep 20, 2014
Messages in this Issue:
  Re: Is it time for a new charset in the Digest? (Gordon Burditt)
  Re: Is it time for a new charset in the Digest? (Garrett Wollman)
  Re: Is it time for a new charset in the Digest? (Michael Moroney)
  Q.: at&t to sell SNET holdings to Frontier? (tlvp)
I like the dreams of the future better than the history of the past. - Thomas Jefferson
See the bottom of this issue for subscription and archive details.
Date: Fri, 19 Sep 2014 00:52:37 -0500
From: email@example.com (Gordon Burditt)
Subject: Re: Is it time for a new charset in the Digest?
> I've been using the ISO-8859-1 "Latin1" character set in the Digest
> for a few years now: we adopted it as the standard after a reader made
> me aware that there are no accented characters in ASCII, so I figured
> that I'd implement a way for him to spell his name properly, and also
> be able to add "Internationalization" to my résumé.
> I'm wondering if it's time for another change, either to one of the
> "transitional" Unicode formats, such as UTF-8, or perhaps to a
> permanent solution such as UCS-16.
There is no UCS-16. There are UCS-2 and UTF-16. The "TF" in "UTF"
stands for "Transformation Format", not "Transitional Format".
Another thing that uses the term "transitional" is HTML, which
is not related to character sets.
I recommend that you go to UTF-8, or stick with ISO-8859-1 (or
Windows-1252, which is a superset of ISO-8859-1). I don't think
the other choices are reasonable. Trying to go with ISO-8859-*,
where 15 different charsets with lots of overlap are distinguished
only by charset tags, is going to cause problems when someone using
ISO-8859-X quotes someone using ISO-8859-Y, where X != Y, and
characters outside the common subset are used.
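To make the ISO-8859-X vs. ISO-8859-Y problem concrete, here is a small Python sketch (the byte 0xE6 is an arbitrary choice). It decodes one byte under several ISO-8859 parts, then checks the claim that Windows-1252 agrees with ISO-8859-1 everywhere except the 0x80-0x9F range, where ISO-8859-1 has only control codes.

```python
# One byte, several ISO-8859 parts: only the charset tag tells them apart.
RAW = b"\xe6"
for charset in ("iso-8859-1", "iso-8859-5", "iso-8859-7", "iso-8859-15"):
    print(charset, RAW.decode(charset))

def cp1252_matches_latin1_outside_c1():
    """True if Windows-1252 and ISO-8859-1 decode every byte outside 0x80-0x9F identically."""
    for value in range(256):
        if 0x80 <= value <= 0x9F:
            continue  # Windows-1252 puts printable characters here; ISO-8859-1 has C1 controls
        byte = bytes([value])
        if byte.decode("cp1252") != byte.decode("iso-8859-1"):
            return False
    return True

print(cp1252_matches_latin1_outside_c1())  # True

# Inside that range they differ: 0x93 is a curly opening quote in Windows-1252
# but an invisible control character in ISO-8859-1.
print(ascii(b"\x93".decode("cp1252")), ascii(b"\x93".decode("iso-8859-1")))  # '\u201c' '\x93'
```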
I like UTF-8. I hope it becomes permanent for things like the web
and email. It has the advantage that no byte sequence for any
character is a subset of the byte sequence for any other character,
so a pattern-search designed for ASCII still works. Actually, a
lot of things "just work" with UTF-8 for programs expecting ASCII.
That won't happen for UTF-16.
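The ASCII-compatibility property can be checked directly. This Python sketch (standard library only; U+4100 is just an arbitrary CJK ideograph) tests every non-ASCII code point for bytes below 0x80 in its UTF-8 form, then shows how a raw byte search for "A" goes wrong on UTF-16 but not on UTF-8. The exhaustive loop takes a second or so.

```python
# Sketch: byte-level ASCII searches are safe on UTF-8 but not on UTF-16.

def utf8_multibyte_uses_ascii_bytes():
    """Return True if any non-ASCII code point's UTF-8 form contains a byte below 0x80."""
    for cp in range(0x80, 0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # surrogate code points cannot be encoded as UTF-8
        if any(b < 0x80 for b in chr(cp).encode("utf-8")):
            return True
    return False

# Pure ASCII text is byte-for-byte identical in UTF-8.
plain = "Subject: Re: charsets"
print(plain.encode("utf-8") == plain.encode("ascii"))  # True

# U+4100 is a CJK ideograph; it contains no letter "A".
ideograph = "\u4100"
print(ideograph.encode("utf-8"))      # b'\xe4\x84\x80'
print(ideograph.encode("utf-16-le"))  # b'\x00A'

# An ASCII-minded program searching the raw bytes for "A":
print(b"A" in ideograph.encode("utf-8"))      # False
print(b"A" in ideograph.encode("utf-16-le"))  # True, a false match

# UTF-16 also puts NUL bytes into plain English, which C strings treat as the end.
print("A".encode("utf-16-le"))  # b'A\x00'

print(utf8_multibyte_uses_ascii_bytes())  # False
```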
I hope UTF-16 and UCS-2 die out. They encourage a halfway solution
in which characters with codes that won't fit in 16 bits aren't
supported. They also have the byte-order abomination. They do
NOT solve the issue of variable-width characters. Even UCS-4 or
UTF-32 does not do that, due to the existence of "combining
characters". The byte order mark of UTF-16 is a problem for mail
and news articles. Where do you put it? If it's before the headers,
then nearly every mail and news server currently running will interpret
it as part of the headers, mangling one of them, or, worse, interpret
it as the division between (no) headers and the body of the message,
resulting in a lot of rejected mail due to "missing" headers like
From:, Subject:, or Newsgroups:. If you put it at the start of the
body, well, I can imagine the mess you end up with in replies to
articles that quote other articles, even if everyone is using UTF-16. No
BOM. Multiple conflicting BOMs. BOMs in the middle of text where
they aren't looked at.
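A standard-library Python sketch of the points above. Python's `email` parser stands in for a mail or news server here, which is an assumption: real servers differ in detail, but the parser shows the "no headers" failure mode. The addresses and message text are made up.

```python
import codecs
from email import message_from_string

# 1. Wider code units do not make characters fixed-width.
emoji = "\U0001F600"                      # needs more than 16 bits
print(len(emoji.encode("utf-16-le")))     # 4: a surrogate pair of two 16-bit units
accented = "e\u0301"                      # "e" + COMBINING ACUTE ACCENT, shown as one letter
print(len(accented.encode("utf-32-le")))  # 8: still two 32-bit units

# 2. A BOM placed before the headers.
message = "From: gordon@example.com\nSubject: test\n\nbody\n"
clean = message_from_string(message)
with_bom = message_from_string("\ufeff" + message)
print(clean["From"])     # gordon@example.com
print(with_bom["From"])  # None: the first line no longer parses as a header

# 3. A reply that quotes another UTF-16 text carries two BOMs.
quoted = "> original text\n".encode("utf-16")   # BOM + native byte order
reply = "my reply\n".encode("utf-16")            # a second BOM
print(ascii((quoted + reply).decode("utf-16")))  # a stray \ufeff before "my reply"

# Worse: the quoted text was big-endian and the reply little-endian.
mixed = (codecs.BOM_UTF16_BE + "Hi".encode("utf-16-be")
         + codecs.BOM_UTF16_LE + "yo".encode("utf-16-le"))
print(ascii(mixed.decode("utf-16")))  # 'Hi\ufffe\u7900\u6f00': "yo" is gone
```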
How often have you needed to translate something to be posted from
whatever character set it was in to ISO-8859-1, and ended up with
untranslatable characters? If the answer is "never", there's
probably no pressing need to change. If your only concern is
people's names, there may be no need to change, unless you get a
lot of contributors with Japanese, Chinese, Korean, or Vietnamese
names who still write in English. But if you are going to change,
please choose UTF-8, not UTF-16.
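To answer the "how often" question with data, a check like this Python sketch could be run on each submission before posting. The function name `untranslatable` and the sample names are hypothetical; it simply tries to encode each character in the target charset.

```python
def untranslatable(text, charset="iso-8859-1"):
    """Return the distinct characters of text that charset cannot encode, in order of appearance."""
    missing = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing

latin_name = "Jos\u00e9 M\u00fcller"                 # José Müller
vietnamese_name = "Nguy\u1ec5n V\u0103n Th\u1eafng"  # Nguyễn Văn Thắng

print(untranslatable(latin_name))                # []
print(untranslatable(vietnamese_name))           # ['ễ', 'ă', 'ắ']
print(untranslatable(vietnamese_name, "utf-8"))  # []
```

An empty list means the text can be posted in that charset as-is.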
One problem that often arises from using multiple charsets in a
newsgroup or mailing list is that quoted text with charset A included
in a post with charset B often results in a mess on the screens of
readers. Using UTF-8 won't solve this, but it will reduce it. It's
even worse when characters in charset A used in the quoted post
have no equivalent in charset B (possible with, for example,
ISO-8859-1 vs. ISO-8859-5). At least if charset B includes all the
characters, translation is possible. Unless you try putting your
foot down and claiming that all submissions must be in UTF-8,
you'll probably still have to translate parts of some submissions.
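Both failure modes can be reproduced in a few lines of Python. In this sketch `requote` is a hypothetical helper, not part of any library: it decodes quoted bytes using the quoted post's charset and re-encodes them in the reply's charset, failing loudly when charset B lacks a character.

```python
def requote(fragment, source_charset, target_charset):
    """Hypothetical helper: move quoted bytes from the quoted post's charset to the reply's.

    Raises UnicodeEncodeError if target_charset has no equivalent for a character.
    """
    return fragment.decode(source_charset).encode(target_charset)

quoted = "r\u00e9sum\u00e9".encode("utf-8")  # "résumé" from a UTF-8 post

# Pasting the bytes into an ISO-8859-1 post without translating them:
print(quoted.decode("iso-8859-1"))             # rÃ©sumÃ©
# Translating first works, because ISO-8859-1 has é:
print(requote(quoted, "utf-8", "iso-8859-1"))  # b'r\xe9sum\xe9'

# Cyrillic from an ISO-8859-5 post has no ISO-8859-1 equivalent at all.
cyrillic = "\u0414\u0430".encode("iso-8859-5")  # "Да"
print(cyrillic.decode("iso-8859-1"))            # ´Ð  (garbage)
try:
    requote(cyrillic, "iso-8859-5", "iso-8859-1")
except UnicodeEncodeError:
    print("cannot translate into ISO-8859-1")
print(requote(cyrillic, "iso-8859-5", "utf-8").decode("utf-8"))  # Да
```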
You should check out browser and mail reader support for various
charsets. I believe the only required charsets for browsers are:
ASCII, ISO-8859-1 ("Latin1"), Windows-1252 (a superset of ISO-8859-1),
and UTF-8. I may be wrong about "required"; it may just mean
"essential for the success of the program". In any case, a browser
that does not support UTF-8 is going to miss out on a lot of the
In a survey of character sets used on the web in August 2014,
these are some of the results (a web site may use more than one character
set, so results may add up to more than 100%, but not by much):
Date: Sat, 20 Sep 2014 04:47:01 +0000 (UTC)
From: firstname.lastname@example.org (Garrett Wollman)
Subject: Re: Is it time for a new charset in the Digest?
In article <xcGdnSFVKt04WYbJnZ2dnUVZ_tmdnZ2d@posted.internetamerica>,
Gordon Burditt <email@example.com> wrote:
>I like UTF-8. I hope it becomes permanent for things like the web
>and email. It has the advantage that no byte sequence for any
>character is a subset of the byte sequence for any other character,
>so a pattern-search designed for ASCII still works. Actually, a
>lot of things "just work" with UTF-8 for programs expecting ASCII.
Because it was designed specifically to have that property, of course.
(UTF-8 is intellectually descended from FSS-UTF -- "file system safe"
-- which was invented by the Plan 9 people at Bell Labs, at a time
when the Unicode Consortium was dead set on 16-bit characters. The
actual encoding is slightly different.)
Other than that, I agree with pretty much everything that Gordon
says. (And I say that as someone whose universe is pretty much all
still ISO 8859-1.) Because UTF-8 degrades gracefully to ASCII (erm,
ISO 646), for most purposes, in English-language documents, there is
no penalty to using it.
Date: Fri, 19 Sep 2014 23:46:25 +0000 (UTC)
From: firstname.lastname@example.org (Michael Moroney)
To: email@example.com.
Subject: Re: Is it time for a new charset in the Digest?
Message-ID: <firstname.lastname@example.org>
If you do decide to stick with ISO-8859-1, consider ISO-8859-15
instead. It is nearly the same as ISO-8859-1 except for a few minor
differences that allow more languages. But the big difference is that
ISO-8859-15 has the Euro character.
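A short Python sketch (the helper name is hypothetical) compares the two charsets byte by byte to show how small the difference is. It reports eight positions, with the euro sign at 0xA4 replacing ISO-8859-1's generic currency sign.

```python
def latin1_latin9_differences():
    """Hypothetical helper: list (byte, ISO-8859-1 char, ISO-8859-15 char) where they disagree."""
    diffs = []
    for value in range(256):
        byte = bytes([value])
        old, new = byte.decode("iso-8859-1"), byte.decode("iso-8859-15")
        if old != new:
            diffs.append((hex(value), old, new))
    return diffs

for row in latin1_latin9_differences():
    print(row)

euro = "\u20ac"
print(euro.encode("iso-8859-15"))  # b'\xa4'
try:
    euro.encode("iso-8859-1")
except UnicodeEncodeError:
    print("ISO-8859-1 has no euro sign")
```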
Date: Fri, 19 Sep 2014 21:13:45 -0400
From: tlvp <mPiOsUcB.EtLlLvEp@att.net>
To: email@example.com.
Subject: Q.: at&t to sell SNET holdings to Frontier?
Message-ID: <firstname.lastname@example.org>
From a presorted First-Class U.S. Mail post card received today:
> Dear Valued Customer,
>
> Pending regulatory approval, Frontier Communications Corporation will
> assume ownership of the Southern New England Telephone Company (SNET)
> and SNET America, Inc. (SAI) as soon as late October 2014.
There's more, but it's just customer assuagement talk. And local loop
at&t Customer Service reps seem not to have had much of a briefing yet
on how to respond to customer inquiries about this, other than to
reassure folks that Frontier's corporate headquarters is (grammar:
are?) in Stamford, CT.
Thus far, only the marketing arm of Comcast seems to have had wind of
this impending change, using it as an anti-carrot with which to lure
internet clients away from at&t/Yahoo! HSI DSL services to Comcast
cable.
What repercussions might this change have on local loop service, DSL
(or other high-speed ISP) service, the continued existence of the
email domains sbcglobal.net, att.net, snet.net, etc., pricing, and
billing?
Cheers, & thanks in advance,
--
tlvp (from deep in the heart of SNET-land)
--
Before replying, take out the trash, please.
TELECOM Digest is an electronic journal devoted mostly to telecommunications topics. It is circulated anywhere there is email, in addition to Usenet, where it appears as the moderated newsgroup 'comp.dcom.telecom'.
TELECOM Digest is a not-for-profit educational service offered to the Internet by Bill Horne.
The Telecom Digest is moderated by Bill Horne.
43 Deerfield Road
Sharon MA 02067-2301
bill at horne dot net
This Digest is the oldest continuing e-journal about telecommunications on the Internet, having been founded in August, 1981 and published continuously since then. Our archives are available for your review/research. We believe we are the oldest e-zine/mailing list on the internet in any category! URL information: http://telecom-digest.org
Finally, the Digest is funded by gifts from generous readers such as yourself. Thank you!
All opinions expressed herein are deemed to be those of the author. Any organizations listed are for identification purposes only and messages should not be considered any official expression by the organization.