[ietf-charsets] Update the information for GB18030 in IANA?

Martin J. Dürst duerst at it.aoyama.ac.jp
Wed Nov 16 08:29:05 UTC 2022

Dear Eiso, others,

Many thanks for your request.

I'm the expert reviewer for character set registrations (Ned Freed was 
the primary reviewer, but unfortunately is no longer with us).

I'm sorry I missed your mail for over a month.

On 2022-10-14 12:02, 陈永聪 wrote:
> Dear Anthony,
> (cc Mr. Chen Zhuang and Ken)
> I find your email in the GB18030 page in IANA (https://www.iana.org/assignments/charset-reg/GB18030), which the page is compiled by you.
> The newest version of GB 18030 has been published as GB 18030-2022, please see https://std.samr.gov.cn/gb/search/gbDetailed?id=E4A2A4C875726A5DE05397BE0A0A61E8 and https://openstd.samr.gov.cn/bzgk/gb/newGbInfo?hcno=A1931A578FE14957104988029B0833D3 . However the IANA information has still kept for the original 2000 version, which is not better for the users.

Many thanks for this information. What we need to decide is whether to 
update the registration for charset "GB18030" to this new standard or 
whether we should define a new charset label (e.g. "GB18030-2022") to 
distinguish it from the old (2000) version of the standard. This 
depends on how many changes there are in the new version, and on how 
implementers are expected to deal with the new version.

Given that as far as I understand, GB 18030 is an encoding of 
Unicode/ISO 10646, and we do not distinguish versions in charset labels 
for Unicode/ISO 10646, there is a good argument for not introducing a 
new label for this new version.
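
To illustrate what "an encoding of Unicode/ISO 10646" means in practice, 
here is a minimal Python sketch that round-trips a few code points 
through Python's built-in "gb18030" codec. The sample code points and 
the helper name roundtrip are my own choices for illustration; which 
edition of GB 18030 the codec follows depends on the Python version, so 
this says nothing about the 2022 mappings specifically.

```python
# Sketch: round-trip a few code points through Python's built-in
# "gb18030" codec. The edition of GB 18030 that the codec implements
# depends on the Python version (an assumption to keep in mind).

def roundtrip(cp: int) -> bytes:
    """Encode one Unicode code point as GB 18030 and check it decodes back."""
    ch = chr(cp)
    encoded = ch.encode("gb18030")
    assert encoded.decode("gb18030") == ch
    return encoded

# Illustrative samples: ASCII, a CJK ideograph, and two supplementary
# code points (surrogates are excluded because they are not characters).
SAMPLES = [0x41, 0x4E2D, 0x10000, 0x1F600]

for cp in SAMPLES:
    data = roundtrip(cp)
    print(f"U+{cp:04X} -> {data.hex().upper()} ({len(data)} bytes)")
```

The point is only that one-, two- and four-byte sequences together reach 
the whole Unicode code space, which is why a version-independent label 
is at least plausible.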

On the other hand, if there are new structural features in the 2022 
version that are not present in the 2000 version, that might indicate 
the need for a new charset label.

So any information on structural changes (or the absence thereof), as 
well as on what is expected of industry and on industry's own plans and 
needs, would be greatly appreciated.

From reading https://en.wikipedia.org/wiki/GB_18030, my understanding 
is that there are no structural changes, and that the main change is 
that mappings to the PUA have been completely eliminated. That means 
that there are some mapping differences between the 2000/2005 version 
and the 2022 version, but we might characterize them as minor (80 or so 
codepoints) and decide to keep the same label.
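
One way to check the PUA claim against an actual mapping table would be 
something like the following Python sketch. The function names and the 
shape of the mapping (a dict from byte sequence to code point) are 
assumptions made for illustration; the PUA ranges themselves are the 
ones defined by Unicode.

```python
from typing import Dict

# The three Private Use Areas defined by Unicode: the BMP block and
# planes 15 and 16 (minus the two noncharacters at the end of each plane).
PUA_RANGES = [(0xE000, 0xF8FF), (0xF0000, 0xFFFFD), (0x100000, 0x10FFFD)]

def is_pua(cp: int) -> bool:
    """Return True if the code point lies in a Private Use Area."""
    return any(lo <= cp <= hi for lo, hi in PUA_RANGES)

def pua_entries(mapping: Dict[bytes, int]) -> Dict[bytes, int]:
    """Select the entries of a (hypothetical) byte-sequence-to-code-point
    mapping whose target is a PUA code point."""
    return {seq: cp for seq, cp in mapping.items() if is_pua(cp)}
```

If the 2022 table really has no PUA mappings left, pua_entries should 
return an empty dict for it and a small, non-empty one for the older 
table.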

> CESI also kindly released the mapping table at http://www.nits.org.cn/getIndex.req?action=findAllNews&req=modulenvpromote&type=0&moduleId=455&sid=4 . If it is not convenient to download the file from the website, you can find the attachment.

Ideally, I'd prefer it if you didn't send such large files to the mailing 
list. But I have been unable to get any response from the above URI in a 
browser. What's interesting is that a ping to www.nits.org.cn works 
without problems, with a bit over 100 ms round trip time.

If such data is available somewhere, it might be better to get a diff 
between the old and the new mapping tables. That should be much shorter.
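
As a rough sketch of what I mean by a diff, assuming both tables can be 
brought into a simple text form with one "byte sequence in hex, code 
point in hex" pair per line (I don't know the actual format of the CESI 
file, so parse_table is hypothetical and would need adapting):

```python
from typing import Dict, Iterable, Optional, Tuple

def parse_table(lines: Iterable[str]) -> Dict[bytes, int]:
    """Parse lines of the assumed form '<bytes-hex> <codepoint-hex>'.
    Text after '#' and blank lines are ignored."""
    mapping: Dict[bytes, int] = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        byte_hex, cp_hex = line.split()[:2]
        mapping[bytes.fromhex(byte_hex)] = int(cp_hex, 16)
    return mapping

def diff_tables(
    old: Dict[bytes, int], new: Dict[bytes, int]
) -> Dict[bytes, Tuple[Optional[int], Optional[int]]]:
    """Return only the byte sequences whose mapping differs, as
    (old code point, new code point); None means 'not present'."""
    changes = {}
    for seq in sorted(old.keys() | new.keys()):
        before, after = old.get(seq), new.get(seq)
        if before != after:
            changes[seq] = (before, after)
    return changes
```

Run over the old and the new table, this would list just the changed 
entries, which is what would be useful to post to the list.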

> I think it is better to update the information in corresponding IANA  page. So, do you know how shall we need to do?

Once we know exactly what/how we want to update the information, I'll 
ask IANA to do so.

My understanding is that this is not immediately urgent, because the 
standard's page says that the "implementation date" (实施日期, via Google 
Translate) is August 1st, 2023.

Looking forward to getting additional information from anybody who has some.

Regards,   Martin.

> Eiso
