[tz] [PROPOSED] Allow some IPA symbols in commentary
Paul Eggert
eggert at cs.ucla.edu
Thu Oct 4 18:54:53 UTC 2018
On 10/4/18 9:23 AM, Paul.Koning at dell.com wrote:
> I wonder if the Unicode standard says so, or if this is a Linux bug.
I looked into it, and it's arguably a bug in the glibc regular
expression matcher: it does not recognize u̯ as alphabetic because u̯ is
the two-code-point sequence U+0075 U+032F, and the latter, a combining
mark, is not alphabetic. Let's add a comment to that effect and remove ɪ
from the macro, since ɪ works OK without being listed there.
Proposed further patch attached.
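
For anyone who wants to check the code points involved, here is a minimal
sketch in Python. It only inspects the Unicode general category of each code
point and is not a reproduction of glibc's matcher; the helper names
describe and all_alphabetic are made up for illustration, and Python's
str.isalpha is assumed here to be a reasonable stand-in for "alphabetic".

```python
import unicodedata


def describe(text):
    """Return (code point, name, general category, isalpha) per code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch), ch.isalpha())
        for ch in text
    ]


def all_alphabetic(text):
    """True only if every individual code point is alphabetic per str.isalpha."""
    return all(ch.isalpha() for ch in text)


if __name__ == "__main__":
    # U+026A is the single-code-point IPA symbol; "u" + U+032F is the
    # two-code-point sequence discussed in this thread.
    for sample in ("\u026A", "u\u032F"):
        for row in describe(sample):
            print(*row)
        print("all alphabetic:", all_alphabetic(sample))
```

A per-code-point check like this treats ɪ as a letter but rejects the
combining mark in u̯, which is consistent with the behavior described above.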
-------------- next part --------------
From b9e47bcf13fa6fe46f340e1265fde5053ead4282 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert at cs.ucla.edu>
Date: Thu, 4 Oct 2018 11:47:44 -0700
Subject: [PROPOSED] * Makefile (UNUSUAL_OK_IPA): Omit ‘ɪ’ as unnecessary.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
---
Makefile | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/Makefile b/Makefile
index 39da17e..d56a198 100644
--- a/Makefile
+++ b/Makefile
@@ -421,8 +421,10 @@ SAFE_CHAR= '[]'$(SAFE_CHARSET)'-]'
# These characters are Latin-1, and so are likely to be displayable
# even in editors with limited character sets.
UNUSUAL_OK_LATIN_1 = «°±»½¾×
-# IPA symbols are OK in commentary despite being non-alphabetic.
-UNUSUAL_OK_IPA = ɪu̯
+# This IPA symbol is represented in Unicode as the composition of
+# U+0075 and U+032F, and U+032F is not considered alphabetic by some
+# grep implementations that do not grok composition.
+UNUSUAL_OK_IPA = u̯
# Non-ASCII non-letters that OK_CHAR allows, as these characters are
# useful in commentary.
UNUSUAL_OK_CHARSET= $(UNUSUAL_OK_LATIN_1)$(UNUSUAL_OK_IPA)
--
2.17.1