[tz] strftime %s
Robert Elz
kre at munnari.OZ.AU
Sun Jan 14 11:14:37 UTC 2024
Date: Sat, 13 Jan 2024 22:51:14 -0800
From: Paul Eggert <eggert at cs.ucla.edu>
Message-ID: <99ee73a9-6336-4f3d-a8b3-e57b2e1817dd at cs.ucla.edu>
| It's not necessarily undefined behavior because we're talking about a
| standard library function, and (as you mentioned elsewhere) such
| functions need not be implemented in standard C.
It might not necessarily actually result in something bad happening,
but the application must assume that it might.
| More important, the draft POSIX standard is simply wrong if it says
| strftime can't look at (for example) tm_gmtoff when calculating %z,
The implementation can look at whatever it likes. There's no problem
there, but it cannot assume that the application has placed any meaningful
data in the fields that the standard doesn't say that strftime() is likely
to examine.
| In POSIX 2017 strftime must also look at TZ for %Z and %z,
I'm not sure must is correct, I suspect that the intended
implementation for %Z is
	if (_tz_inited)
		strcpy(result, tzname[tm->tm_isdst > 0]);
	else
		strcpy(result, "");
(ignoring buffer overflows and stuff like that), and similarly for
%z except using sprintf to generate a string from "timezone" in the
case that a value has earlier been placed there - still "" otherwise.
No examination of TZ (by strftime) required for those.
Of course, the implementation can do it some other way if it likes,
but it cannot assume that tm->tm_gmtoff or tm->tm_zone has been
set to anything interesting (tm_zone might be a pointer to somewhere
which generates SIGSEGV if referenced).
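
A minimal sketch of the %z half of that, assuming the offset comes from
a POSIX-style "timezone" value (seconds west of UTC). The helper name
format_utc_offset() is hypothetical, it is not taken from any real
implementation, and it ignores daylight saving entirely:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: turn a "timezone"-style value (seconds WEST of
 * UTC) into the +hhmm / -hhmm text that %z produces.  Returns what
 * snprintf() returns. */
static int format_utc_offset(long seconds_west, char *buf, size_t size)
{
    long east = -seconds_west;          /* %z is positive east of UTC */
    char sign = east < 0 ? '-' : '+';
    long mins = labs(east) / 60;        /* whole minutes of offset */

    return snprintf(buf, size, "%c%02ld%02ld", sign, mins / 60, mins % 60);
}
```

Nothing in it touches TZ, tm_gmtoff or tm_zone; it only needs a value
that an earlier tzset() (or equivalent) would have left behind.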
| That's another place where the POSIX draft gets things wrong.
| I'm talking about where draft POSIX
| mistakenly says strftime needs mktime (or equivalent) to implement %s.
It says nothing of the kind. What it says is:
	Replaced by the number of seconds since the Epoch as a decimal number,
	calculated as described for mktime().
	[tm_year, tm_mon, tm_mday, tm_hour, tm_min, tm_sec, tm_isdst]
That doesn't say that mktime() needs to be used, just that the value
you get from
	strftime(buf, sizeof buf, "%s", &tm);
needs to be the exact same thing you'd get from
	snprintf(buf, sizeof buf, "%jd", (intmax_t)mktime(&tm));
Note that in the strftime case though, the tm struct is not altered
by the call, which it would be (if required) by the mktime() variant.
Regardless of how those 7 fields of the struct tm got filled in, and
regardless of what (if anything at all) is in the other fields of the
struct (the ones that must exist, and any others the implementation
might have added), if those two ways of generating a string representation
of an integer in buf don't do the same thing (assuming the mktime()
variant doesn't generate an error and return (time_t)-1 - and perhaps
even then) then the implementation is broken.
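
That equivalence can be checked with something like the following
sketch. The function name percent_s_matches_mktime() is made up for
illustration, and it assumes the C library supports %s at all (which
ISO C does not require). mktime() is run on a copy, so the caller's
struct tm is left alone, as it is in the strftime() case:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical check: returns 1 if strftime("%s") and mktime() produce
 * the same decimal string for *tm, 0 if they differ, and -1 if mktime()
 * returns (time_t)-1 or strftime() produces nothing. */
static int percent_s_matches_mktime(const struct tm *tm)
{
    char via_strftime[32], via_mktime[32];
    struct tm copy = *tm;               /* mktime() may normalize its argument */
    time_t t = mktime(&copy);

    if (t == (time_t)-1)
        return -1;
    if (strftime(via_strftime, sizeof via_strftime, "%s", tm) == 0)
        return -1;
    snprintf(via_mktime, sizeof via_mktime, "%jd", (intmax_t)t);
    return strcmp(via_strftime, via_mktime) == 0;
}
```

A result of 0 for any struct tm whose 7 listed fields are sane would
indicate the kind of breakage described above.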
| First, an implementation of POSIX 202x/D3 strftime doesn't need to go
| near TZ to implement any conversion, not even %s.
Agreed.
| Second, the POSIX draft requires strftime to act as if it calls tzset,
No, what it says is:
	Local timezone information shall be set as though strftime( )
	called tzset( ).
So if you want %z or %Z then things act as if tzset() was called (whether
it actually is or not), because there's no other way to get the data
that would allow those to be possible. Similarly, with %s, since we
need to act as if mktime() was called (or at least generate the same
result) then we need to know the local timezone, so mktime() is defined
to act as if tzset() were called, and consequently, so does strftime()
when using %s. For other conversions, there's no reference to local
time at all, strftime() simply formats whatever is in the tm handed to
it (or probably nothing, if you ask for the name of the 13th day of the week,
or the 19th month, or something similarly stupid - that's actually unspecified).
| Where? I don't see LC_CTYPE mentioned anywhere in the strftime section
| (202x/D3 lines 69411-69832). Obviously LC_CTYPE is essential but it's
| not explicitly called out in strftime's description.
If it were needed to reference the LC_ vars in every function which uses
them in some way, the standard would be even bigger than it is. They
are listed for utilities, but for the system interfaces (ie: functions)
see XBD 7.1 where it says:
	The behavior of some of the C-language functions defined in the
	System Interfaces volume of POSIX.1-202x shall also be modified
	based on a locale selection. The locale to be used by these functions
	can be selected in the following ways:
	[the first two mechanisms that different functions might use omitted here]
	3. Some functions, such as catopen( ) and those related to text
	   domains, may reference various environment variables and a locale
	   category of a specific locale to access files they need to use.
And on it goes. This is from draft 4, but I don't believe that any of
this changed after D3.
| I mentioned LC_CTYPE only because it's another example of something that
| strftime can use to calculate conversions,
I assume it might want LC_NUMERIC as well, but that's not mentioned either.
LC_TIME is, because that one is particularly relevant to some of the
conversions.
| something that is not
| explicitly mentioned in the spec; and this demonstrates that the set of
| things that strftime can use is not exhaustive.
Of course not, strftime() can look at whatever it likes. What it can't
do is expect the application to have provided any data that the standard
does not require of it. Locales are easy (in this regard), as everything
defaults to "C" (aka "POSIX") if the application has done nothing. So
any of the system interfaces can access any locale information it needs,
it is always defined (somehow - how is up to the implementation).
| This interpretation is too strict,
Notice I said "strictly C conforming" not POSIX conforming.
In that environment, this line:
| tm.tm_gmtoff = gmtoff;
is likely to generate a compilation error, so the application cannot
include it. That one must be removed to be strictly C conforming.
Given that one is not there, how is it that the strftime() function
would be expected to use the parameter to this function ?
| If the abovementioned interpretation were correct, then to conform to
| POSIX-2017, strftime %z could look *only* at tm_isdst and could not look
| at tm_gmtoff, and the Ubuntu behavior would therefore be incorrect
| because its strftime is clearly looking at tm_gmtoff.
It is fine, provided it continues working when the application hasn't
provided a value for tm_gmtoff, and provided it can somehow tell the
difference between that field (which will still be in the struct of
course) never having been set and just containing garbage, and it
having been set.
That's no longer the case in the forthcoming standard, as %z is allowed
to use the value in tm_gmtoff, and so conforming applications will need to
set it.
| But the Ubuntu behavior *is* correct: it's good behavior, it's the
| behavior most people would expect, and it's common on many
| implementations. If an interpretation of the C and/or POSIX standards
| says that Ubuntu doesn't conform,
For C, clearly not, as tm_gmtoff doesn't exist there (which doesn't mean
that an implementation cannot add it, but no conforming application can
assume that it has been - so it can never be init'd, except perhaps to
0 by a memset() (or equiv) of the entire struct).
For POSIX, now that tm_gmtoff has been added, the standard has been
amended.
| It intends to fix the bug reported by Dag-Erling Smørgrav here:
| https://mm.icann.org/pipermail/tz/2024-January/033488.html
But there is no bug described there, the answers that the example
produced are what is intended to happen. That's because of the
"same result as mktime() would produce" requirement. And mktime()
is defined to be the inverse of localtime() not of gmtime().
(C23 or something is supposedly adding timegm() - which is sorely
lacking, and POSIX will then add it in issue 9, in a decade or two,
perhaps, just perhaps, but unlikely, in some earlier TC update).
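
The localtime()/gmtime() distinction can be seen with a sketch like this
one (both function names are hypothetical, and error handling for a NULL
return from localtime() or gmtime() is omitted):

```c
#include <time.h>

/* Break t down with localtime(), then rebuild it with mktime().
 * mktime() inverts localtime(), so this is expected to return t. */
static time_t roundtrip_local(time_t t)
{
    struct tm tm = *localtime(&t);
    return mktime(&tm);
}

/* Break t down with gmtime(), then rebuild it with mktime().
 * mktime() still treats the fields as local time, so the result is
 * offset from t by the local zone's distance from UTC. */
static time_t roundtrip_utc(time_t t)
{
    struct tm tm = *gmtime(&t);
    return mktime(&tm);
}
```

With TZ set to a fixed zone 8 hours west of UTC (say TZ=XXX8),
roundtrip_utc(t) should come back as t + 28800, and "%s" applied to a
struct tm filled in by gmtime() is required to show that same shifted
value.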
| Without the patch, the bug occurs even on systems with tm_gmtoff.
Once again, there is no bug (or at least, there was none; if you have
changed how it works in anything like the way that was requested, there
will be a bug now).
It might not have met his expectations, in which case his expectations
were incorrect. This is really all very simple stuff.
kre