[tz] Corrupt output (was: Re: [tz-announce] 2020e release of tz code and data available)
Guy Harris
gharris at sonic.net
Wed Dec 23 08:26:53 UTC 2020
On Dec 22, 2020, at 10:59 PM, Deborah Goldsmith via tz <tz at iana.org> wrote:
> OK, I think I (mostly) figured it out. On Darwin (macOS) the default value of FS is “ “ (space).
On any Single UNIX Specification-compatible system, the default value of FS is space.
To quote the awk page in The Open Group Base Specifications Issue 7, 2018 edition:
https://pubs.opengroup.org/onlinepubs/9699919799/utilities/awk.html
"FS
Input field separator regular expression; a <space> by default."
Apple doesn't claim conformance to that (I've seen it referred to as "V7", which is more than a bit amusing...), but they do claim conformance to UNIX 03, and the UNIX 03 awk page:
https://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html
says the same thing.
That probably goes back to earlier versions - all the way back to V7 (Seventh Edition UNIX, not Issue 7 of the Single UNIX Specification), I'd bet.
The GNU Awk manual's section on default field splitting:
https://www.gnu.org/software/gawk/manual/gawk.html#Default-Field-Splitting
says
Fields are normally separated by whitespace sequences (spaces, TABs, and newlines), not by single spaces. Two spaces in a row do not delimit an empty field. The default value of the field separator FS is a string containing a single space, " ". If awk interpreted this value in the usual way, each space character would separate fields, so two spaces in a row would make an empty field between them. The reason this does not happen is that a single space as the value of FS is a special case—it is taken to specify the default manner of delimiting fields.
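As a quick illustration of that special case, here is a minimal sketch; the sample line and the variable names are made up for the example. It compares the default FS with a bracket expression that matches exactly one space, which should be treated as an ordinary regular expression rather than as the special case:

```shell
# Sample input (hypothetical): two spaces between "a" and "b".
line='a  b'

# Default FS (" "): the run of spaces counts as a single separator.
default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')

# FS as the regular expression "[ ]": each space separates fields,
# so the two adjacent spaces enclose an empty field.
literal_nf=$(printf '%s\n' "$line" | awk -F'[ ]' '{ print NF }')

echo "default FS: $default_nf fields"
echo "FS='[ ]':   $literal_nf fields"
```

With the default FS the line has two fields; with `[ ]` it has three, the middle one empty.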
And the Single UNIX Specification awk page says:
An extended regular expression can be used to separate fields by using the -F ERE option or by assigning a string containing the expression to the built-in variable FS. The default value of the FS variable shall be a single <space>. The following describes FS behavior:
* If FS is a null string, the behavior is unspecified.
* If FS is a single character:
  * If FS is <space>, skip leading and trailing <blank>s; fields shall be delimited by sets of one or more <blank>s.
  * Otherwise, if FS is any other character c, fields shall be delimited by each single occurrence of c.
* Otherwise, the string value of FS shall be considered to be an extended regular expression. Each occurrence of a sequence matching the extended regular expression shall delimit fields.
so, again, FS = " " is a special case, meaning "one or more blanks separate fields".
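The cases in that list can be sketched as follows. This is only an illustration, with made-up input strings and variable names, and it assumes an awk that follows the rules quoted above:

```shell
# FS is <space> (the default): leading and trailing blanks are skipped,
# and runs of spaces and tabs delimit fields.
space_out=$(printf '  x \t y  \n' | awk '{ print NF ":" $1 ":" $2 }')

# FS is some other single character: each occurrence delimits a field,
# so "::" produces an empty field in the middle.
colon_nf=$(printf 'x::y\n' | awk -F: '{ print NF }')

# FS is longer than one character: it is taken as an extended regular
# expression; here "one or more colons" acts as a single separator.
ere_nf=$(printf 'x::y\n' | awk -F':+' '{ print NF }')

echo "space: $space_out"
echo "colon: $colon_nf fields"
echo "ERE:   $ere_nf fields"
```

The first command reports two fields, x and y; the second reports three fields; the third reports two.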
The awk.h in Apple's awk is copyright by Lucent Technologies, which indicates that it's presumably an AT&T version that got open-sourced, probably the One True AWK.
I'm not sure which versions of AWK that leaves out, so "On Darwin (macOS) the default value of FS is “ “ (space)." can probably be replaced by "in any version of AWK worthy of the name, the default value of FS is " " (space)." In other words, that's not a difference between macOS and other OSes.
> I suspect that these failures will occur on any system, not just Darwin,
Probably, as per the above.
> but I don’t have access to a non-Darwin system with a working awk at the moment.
I have a large pile of VMs running Linux, Solaris 11, and various *BSDs (as well as macOS going back to Leopard!), so I can give it a try on several of them (all of them would be a bit tedious:
$ ls -d ~/Documents/Virtual\ Machines/*.vmwarevm | wc -l
55
but trying it on the most recent version of each major group of OSes, dumping both Ubuntu and Fedora into the "Linux" group, wouldn't be too bad).