[tz] [PROPOSED 1/4] Allow “§” etc. in commentary
Brian Inglis
Brian.Inglis at Shaw.ca
Tue Jan 24 02:22:46 UTC 2023
On 2023-01-23 15:32, John Sauter via tz wrote:
> On Mon, 2023-01-23 at 15:28 -0700, Paul Gilmartin via tz wrote:
>> On 1/23/23 13:48:02, Paul Eggert via tz wrote:
>>> * Makefile (UNUSUAL_OK_LATIN_1): Allow all non-alphabetic,
>>> non-ASCII printable characters that are Latin-1. This is
>>> primarily for “§” and we might as well allow them all
>>> since even XEmacs 21 supports them all.
>>> +UNUSUAL_OK_LATIN_1 = ¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿×÷
>> Ouch! UTF-8 is too pervasive on desktops and WWW for that to be
>> comfortable.
>> And on a UTF-8 desktop, GNU sed strangles on non-UTF-8 strings:
>> 1250 $ printf 'a\xa7b\n' | sed -E 's/(.)(.)(.)/1 \1 2 \2 3 \3/'
>> sed: RE error: illegal byte sequence
>> 1251 $
> I think the intent is to allow non-ASCII characters that are in Latin-
> 1, even though the file is coded in UTF-8. That is, not all Unicode
> characters are allowed, only those that appear in Latin-1.
Nitpick: the ordinal indicators are category Lo (Letter, other), like letters in
non-Latin scripts, and the micro sign is Ll (Letter, lowercase), like Western
script letters, so all three match [[:alpha:]], not [[:punct:]]:
$ man iso-8859-1 | grep '\s[[:alpha:]]\s' | head -3
252 170 AA ª FEMININE ORDINAL INDICATOR
265 181 B5 µ MICRO SIGN
272 186 BA º MASCULINE ORDINAL INDICATOR
$ grep -ah 'ORDINAL\|MICRO SIGN' unicode-symbols.txt \
unicode/15.0.0/ucd/UnicodeData.txt
ª U+00AA FEMININE ORDINAL INDICATOR
µ U+00B5 MICRO SIGN
º U+00BA MASCULINE ORDINAL INDICATOR
00AA;FEMININE ORDINAL INDICATOR;Lo;0;L;<super> 0061;;;;N;;;;;
00B5;MICRO SIGN;Ll;0;L;<compat> 03BC;;;;N;;;039C;;039C
00BA;MASCULINE ORDINAL INDICATOR;Lo;0;L;<super> 006F;;;;N;;;;;
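
The same check can be made without depending on the local man pages or locale.
A minimal sketch in Python, assuming the character list is copied verbatim from
the proposed UNUSUAL_OK_LATIN_1 line; the helper name latin1_letters is
hypothetical and not part of the tz Makefile:

```python
import unicodedata

# Copied from the proposed Makefile line quoted above.
UNUSUAL_OK_LATIN_1 = "¡¢£¤¥¦§¨©ª«¬®¯°±²³´µ¶·¸¹º»¼½¾¿×÷"


def latin1_letters(chars):
    """Return the characters whose Unicode General_Category is a letter (L*).

    Hypothetical helper for illustration only.
    """
    return [c for c in chars if unicodedata.category(c).startswith("L")]


if __name__ == "__main__":
    for c in latin1_letters(UNUSUAL_OK_LATIN_1):
        # Show code point, General_Category, and character name.
        print(f"U+{ord(c):04X} {unicodedata.category(c)} {unicodedata.name(c)}")
```

This only reports the General_Category from Python's bundled Unicode data; what
[[:alpha:]] matches in grep or sed still depends on the C library and locale.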
--
Take care. Thanks, Brian Inglis Calgary, Alberta, Canada
La perfection est atteinte Perfection is achieved
non pas lorsqu'il n'y a plus rien à ajouter not when there is no more to add
mais lorsqu'il n'y a plus rien à retirer but when there is no more to cut
-- Antoine de Saint-Exupéry