[ietf-charsets] [art] US-ASCII and its various names

Martin J. Dürst duerst at it.aoyama.ac.jp
Tue Dec 19 03:07:43 UTC 2023


Hello John, others,

[Removing iana at iana.org, because this doesn't contain any requests for 
them.]

On 2023-12-18 12:57, John C Klensin wrote:
> 
> 
> --On Monday, 18 December, 2023 09:21 +0900 "Martin J. Dürst"
> <duerst at it.aoyama.ac.jp> wrote:
> 
>> Hello Steffen,
>>
>> On 2023-12-16 04:06, Steffen Nurpmeso wrote:
>>
>>> To add that for backward compatibility the plain ASCII alias
>>> cannot go away,
>>
>> I seem to remember too that ASCII was listed as an alias, and
>> have confirmed this with
>> https://web.archive.org/web/20051229042158/http://www.iana.org/assignments/character-sets
> 
> That article appears to be discussing MIBs, not charset
> parameters for what are now called Media Types, particularly
> text and its subtypes.

Sorry, wrong. Yes, it *also* talks about MIBs, but that's just because 
the MIB stuff is integrated in the registry.

As for the many contributions of historical background, they are very much 
appreciated. I definitely wasn't working on charsets when RFC 20 was 
created, although my experience with charsets goes back quite a bit 
further than my role as expert reviewer.

My summary of the history as relevant to the problem at hand (probably 
a clerical error) is as follows:

- Around 1969 (RFC 20, "ASCII format for Network Interchange", Oct 16),
   the issue was more about ASCII vs. EBCDIC (IBM) vs. encodings from
   other vendors that nobody remembers these days. The ARPANET was
   just being started (the first ARPANET communication is dated
   22:30 hours on October 29, 1969, California time). This was a purely
   US undertaking, and nobody involved in it was worrying
   too much (if at all) about encodings for languages other than
   English.
   [RFC 20 is STD 80 now, but that designation must have happened
    in or close to 2014, because STD 79/RFC 7296 dates from Oct. 2014.]

- For quite some time before and around 2000 (RFC 2978, "IANA Charset
   Registration Procedures"; RFC 2278, same name, Jan 1998 (*)), the
   name "ASCII" was used in various contexts with somewhat different
   meanings, to the extent that the people involved found it very prudent
   to insist strongly on using the label "US-ASCII" for "the real thing",
   while still listing the label "ASCII" as an alias.

- Somewhere around 2013, the "ASCII" alias got dropped from the
   registry, probably as a result of a clerical error (and not
   related to the fact that "ASCII" had essentially been deprecated
   for a long time, although 'deprecated' isn't used in the registry).

- In this day and age, most of the 'ASCII-like' character encodings
   are virtually out of use. "ASCII" as a label is also used extremely
   rarely in contexts where the IANA charset registry is relevant
   (email and the Web definitely count; OSes and programming languages
   don't). UTF-8 is widely recommended and used, and elegantly subsumes
   (US-)ASCII.
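For illustration only (a small sketch using Python's built-in codecs, not anything taken from the registry or an RFC), this is what "subsumes" means in practice: every US-ASCII byte 0x00-0x7F decodes to the same character under UTF-8, and ASCII-only text encodes to identical bytes. The sample string is a made-up example.

```python
# Every byte in the US-ASCII range (0x00-0x7F).
all_ascii = bytes(range(0x80))

# Decoding those bytes as US-ASCII or as UTF-8 yields the same text.
assert all_ascii.decode("ascii") == all_ascii.decode("utf-8")

# Encoding ASCII-only text (hypothetical sample) gives identical bytes either way.
text = "Hello, IANA charset registry!"
assert text.encode("ascii") == text.encode("utf-8")

# Bytes 0x80 and above are outside US-ASCII; in UTF-8 they only appear
# as parts of multi-byte sequences.
try:
    bytes([0xE9]).decode("ascii")
except UnicodeDecodeError:
    print("0xE9 is not valid US-ASCII")
```

So a label of "US-ASCII" on pure 7-bit content can be read as UTF-8 without any change in meaning.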


>> Steffen, maybe you can do a bisection to find out where this
>> alias disappeared.
> 
> Oh, at the risk of repeating parts of the note I sent some days
> ago, let me tell you what I remember before I co-wrote the
> document that created the charset review role:

(*) Can you tell us about that document? RFC 2048 (MIME Registration 
Procedures, which you co-wrote) explicitly says "Registration of 
character sets for use in MIME is covered elsewhere and is no longer 
addressed by this document."

I didn't find anything about registration in the RFC 2045-2049 series, nor 
in RFC 1522 or RFC 1590, but maybe I didn't look hard enough.


> While it seems odd to not have it among the collection of
> synonyms, it isn't there (and was probably dropped from various
> earlier specs) because of the potential for ambiguity.
> 
> If you are going to add it to the registry, that should be done,
> not as a synonym for "US-ASCII", but as a separate item with an
> explanation of the ambiguity problem.

It was there as a synonym, and I'm not going to ask IANA to add it 
separately.


>> <charset reviewer hat on>
>> I don't remember ever having dealt with a request to remove
>> this ALIAS, and I strongly doubt that Ned ever did that.
>> </charset reviewer hat on>
> 
> I don't believe it was ever listed as a valid charset value,
> regardless of other uses.  As I said, not having it there was a
> very explicit decision.

It was there up to around 2013, and that must have been an explicit 
decision, the same way that designating US-ASCII as the preferred MIME 
label was a very explicit decision.


Regards,   Martin.


More information about the ietf-charsets mailing list