[tz] [PROPOSED] Allow some IPA symbols in commentary

Paul Eggert eggert at cs.ucla.edu
Thu Oct 4 18:54:53 UTC 2018


On 10/4/18 9:23 AM, Paul.Koning at dell.com wrote:
> I wonder if the Unicode standard says so, or if this is a Linux bug.

I looked into it, and it's arguably a bug in the glibc regular 
expression matcher: it does not recognize u̯ as alphabetic because the 
symbol is the sequence U+0075 U+032F, and U+032F, a combining mark, is 
not alphabetic. Let's add a comment to that effect and remove ɪ from the 
macro, since ɪ already works without being listed. Proposed further 
patch attached.
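
For anyone who wants to check the code points themselves, here is a
minimal sketch in Python. It is not part of the patch and does not
exercise glibc's matcher; it only inspects the Unicode data that ships
with Python, and str.isalpha() is only an approximation of what a regex
engine's [[:alpha:]] does. The names describe, U_NONSYLLABIC and
SMALL_CAPITAL_I are made up for this illustration.

```python
import unicodedata


def describe(text):
    """Return (code point, name, general category, isalpha) for each code point."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch), ch.isalpha())
        for ch in text
    ]


# Hypothetical names for the two symbols discussed in this thread.
U_NONSYLLABIC = "u\u032f"   # the symbol that stays in UNUSUAL_OK_IPA
SMALL_CAPITAL_I = "\u026a"  # the symbol the patch drops from the macro

# Print one row per code point, so the base letter and the combining
# mark show up separately.
for row in describe(U_NONSYLLABIC + SMALL_CAPITAL_I):
    print(*row)

# NFC normalization leaves the two-code-point sequence alone, so a matcher
# that works one code point at a time still meets the combining mark.
print(len(unicodedata.normalize("NFC", U_NONSYLLABIC)))
```

The rows should show the combining mark in a mark category rather than
a letter category, which matches the behavior described above for
matchers that test each code point separately.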

-------------- next part --------------
From b9e47bcf13fa6fe46f340e1265fde5053ead4282 Mon Sep 17 00:00:00 2001
From: Paul Eggert <eggert at cs.ucla.edu>
Date: Thu, 4 Oct 2018 11:47:44 -0700
Subject: [PROPOSED] =?UTF-8?q?*=20Makefile=20(UNUSUAL=5FOK=5FIPA):=20Omit?=
 =?UTF-8?q?=20=E2=80=98=C9=AA=E2=80=99=20as=20unnecessary.?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 Makefile | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 39da17e..d56a198 100644
--- a/Makefile
+++ b/Makefile
@@ -421,8 +421,10 @@ SAFE_CHAR=	'[]'$(SAFE_CHARSET)'-]'
 # These characters are Latin-1, and so are likely to be displayable
 # even in editors with limited character sets.
 UNUSUAL_OK_LATIN_1 = «°±»½¾×
-# IPA symbols are OK in commentary despite being non-alphabetic.
-UNUSUAL_OK_IPA = ɪu̯
+# This IPA symbol is represented in Unicode as the composition of
+# U+0075 and U+032F, and U+032F is not considered alphabetic by some
+# grep implementations that do not grok composition.
+UNUSUAL_OK_IPA = u̯
 # Non-ASCII non-letters that OK_CHAR allows, as these characters are
 # useful in commentary.
 UNUSUAL_OK_CHARSET= $(UNUSUAL_OK_LATIN_1)$(UNUSUAL_OK_IPA)
-- 
2.17.1


