[tz] strftime %s

Mon Jan 15 04:22:33 UTC 2024

On 2024-01-14 03:14, Robert Elz wrote:
>      Date:        Sat, 13 Jan 2024 22:51:14 -0800
>      From:        Paul Eggert <eggert at cs.ucla.edu>
>      Message-ID:  <99ee73a9-6336-4f3d-a8b3-e57b2e1817dd at cs.ucla.edu>
> 
>    | It's not necessarily undefined behavior because we're talking about a
>    | standard library function, and (as you mentioned elsewhere) such
>    | functions need not be implemented in standard C.
> 
> It might not necessarily actually result in something bad happening,
> but the application must assume that it might.

There are two things going on here. (A) Does the POSIX strftime spec 
require the caller to set a struct tm component when this requirement 
should be unnecessary? And (B) does the POSIX spec fail to require the 
caller to set a struct tm component when the requirement should be 
necessary?

(A) is of lesser importance in practice. Although it's overkill for the 
POSIX strftime spec to require struct tm components to be set when they 
don't need to be set, and this overkill can cause useless work by 
portable applications, it's not that big a deal. In practice nearly 
every app calls strftime on the result of localtime etc. and so the 
components are set anyway. Our thread, if I understand things correctly, 
is mostly about (B), not (A). More on this below.

>    | More important, the draft POSIX standard is simply wrong if it says
>    | strftime can't look at (for example) tm_gmtoff when calculating %z,
> 
> The implementation can look at whatever it likes.

That's good news. In that case we're in agreement.

>    | In POSIX 2017 strftime must also look at TZ for %Z and %z,
> 
> I'm not sure must is correct, I suspect that the intended
> implementation for %Z is
> 
> 	if (_tz_inited)
> 		strcpy(result, tzname[tm->tm_isdst == 0]);
> 	else
> 		strcpy(result, "");

Oh, by "look at TZ" I meant look at data generated from TZ's value. 
tzname is part of that data, so the code you give is "looking at TZ" in 
the sense I meant.

If by "_tz_inited" you meant "after strftime did its mandatory call to 
tzset-or-equivalent, that call successfully determined the current 
timezone", I agree the code you gave reflects the intent for POSIX-2017. 
(If you meant something else then it'd be helpful to know what it was.) 
However, it's not at all clear that the code reflects the intent, or 
should reflect the intent, for POSIX-202x/D4.

Similarly for %z.

> the implementation can do it some other way if it likes,
> but it cannot assume that tm->tm_gmtoff or tm->tm_zone has been
> set to anything interesting (tm_zone might be a pointer to somewhere
> which generates SIGSEGV if referenced).

That's true for POSIX-2017, but not true for POSIX 202x/D4. It's OK for 
the implementation to examine tm_zone when processing %Z. See POSIX 
202x/D4 line 69872.

>    | I'm talking about where draft POSIX
>    | mistakenly says strftime needs mktime (or equivalent) to implement %s.
> 
> It says nothing of the kind.   What it says is:
> 
> 	Replaced by the number of seconds since the Epoch as a decimal number,
> 	calculated as described for mktime().
>          [tm_year, tm_mon, tm_mday, tm_hour, tm_min, tm_sec, tm_isdst]
> 
> That doesn't say that mktime() needs to be used

That's why I wrote "mktime (or equivalent)", not "mktime". The draft is 
imprecisely worded here, and it's easy to misread it as saying this:

> 	strftime(buf, sizeof buf, "%s", &tm);
> needs to be the exact same thing you'd get from
> 	snprintf(buf, sizeof buf, "%jd", (intmax_t)mktime(&tm));

but this reading isn't quite right. All that's needed is for strftime to 
compute seconds since the Epoch in the usual way (i.e., using the 
Gregorian calendar and ignoring leap seconds), and while doing that to 
infer the UTC offset following the constraints described in the mktime 
section. Those constraints do not uniquely determine a result in every 
case, and this gives strftime wiggle room. That is, strftime need not 
use the same code that mktime does to make its inferences, and on a 
particular struct tm an implementation's strftime %s could infer a 
different UTC offset than the same implementation's mktime on the same 
struct tm.
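
To make "the usual way" concrete, here is a minimal sketch of the 
arithmetic. It is not tzcode's implementation and not text from the 
draft. The names days_from_civil and epoch_seconds are hypothetical, 
the UTC offset is passed in by the caller because the sketch takes no 
position on how strftime infers it, and the struct tm members are 
assumed to be in their normal ranges.

```c
#include <stdint.h>
#include <time.h>

/* Days from 1970-01-01 to the given proleptic Gregorian date.
   YEAR is the full year (e.g., 2024), MON is 1..12, MDAY is 1..31.
   Hypothetical helper, for illustration only.  */
static int64_t
days_from_civil (int64_t year, int mon, int mday)
{
  /* Count years from March so that a leap day falls at year end.  */
  int64_t y = year - (mon <= 2);
  int64_t era = (y >= 0 ? y : y - 399) / 400;   /* 400-year cycle */
  int64_t yoe = y - era * 400;                  /* 0..399 */
  int64_t doy = (153 * (mon + (mon > 2 ? -3 : 9)) + 2) / 5 + mday - 1;
  int64_t doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
  return era * 146097 + doe - 719468;
}

/* Seconds since the Epoch for the broken-down time *TM, ignoring leap
   seconds.  UTOFF is the UTC offset in seconds east of Greenwich that
   the caller has inferred for *TM (an assumption of this sketch).
   *TM is read but never modified, unlike with mktime.  */
static int64_t
epoch_seconds (struct tm const *tm, long utoff)
{
  int64_t days = days_from_civil ((int64_t) tm->tm_year + 1900,
                                  tm->tm_mon + 1, tm->tm_mday);
  return ((days * 24 + tm->tm_hour) * 60 + tm->tm_min) * 60
         + tm->tm_sec - utoff;
}
```

The only place where strftime and mktime could disagree is the value 
passed as utoff; everything else is fixed arithmetic.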

> Note that in the strftime case though, the tm struct is not altered
> by the call, which it would be (if required) by the mktime() variant.

Yes, of course.

>    | First, an implementation of POSIX 202x/D3 strftime doesn't need to go
>    | near TZ to implement any conversion, not even %s.
> 
> Agreed.

That's good.

> 
>    | Second, the POSIX draft requires strftime to act as if it calls tzset,
> 
> No, what it says is:
> 
> 	Local timezone information shall be set as though strftime( )
> 	called tzset( ).

I see that as the same idea, just using different words. There should be 
no practical difference.

> So if you want %z or %Z then things act as if tzset() was called (whether
> it actually is or not), because there's no other way to get the data
> that would allow those to be possible.

But there is another way. With %z, strftime can use tm_gmtoff; see POSIX 
202x/D4 line 69870. And with %Z, strftime can use tm_zone; see line 69872.
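
As a rough illustration of the %z case, here is a sketch that formats 
a UTC offset without consulting tzset's output at all. The name 
format_utoff is hypothetical. Inside strftime the offset would come 
from tm->tm_gmtoff; the sketch takes it as a parameter so that it also 
compiles where struct tm lacks that member, and it assumes the offset 
is well within the range of long.

```c
#include <stdio.h>
#include <stdlib.h>

/* Format UTOFF (seconds east of Greenwich) as +hhmm or -hhmm into BUF,
   which has room for SIZE bytes.  Return what snprintf returns.
   Hypothetical helper, for illustration only; seconds within the
   minute are discarded.  */
static int
format_utoff (char *buf, size_t size, long utoff)
{
  char sign = utoff < 0 ? '-' : '+';
  long minutes = labs (utoff) / 60;
  return snprintf (buf, size, "%c%02ld%02ld",
                   sign, minutes / 60, minutes % 60);
}
```

Calling format_utoff (buf, sizeof buf, -28800) yields "-0800", which 
is the string one would expect for America/Los_Angeles in January.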

The same argument applies to %s, if we fix the error on POSIX 
202x/D4 line 69837 where tm_gmtoff is mistakenly omitted. There's no 
need to look at tzset's output if strftime simply uses tm_gmtoff and the 
other struct tm members listed there.

>    | Where? I don't see LC_CTYPE mentioned anywhere in the strftime section
>    | (202x/D3 lines 69411-69832). Obviously LC_CTYPE is essential but it's
>    | not explicitly called out in strftime's description.
> 
> If it were needed to reference the LC_ vars in every function which uses
> them in some way, the standard would be even bigger than it is.

That's fine, and I'm not objecting to that. All I'm saying is that the 
standard does not list every source of information that strftime can use 
to process conversion specs. For example, when lines 69871-69872 say:

   Z  Replaced by the timezone name or abbreviation,
      or by no bytes if no timezone information exists.
      [tm_isdst, tm_zone]

This does not mean that %Z's replacement is completely determined by 
tm_isdst and tm_zone; all it means is that tm_isdst and tm_zone must be 
set by the caller and must be in the normal range so that strftime can 
use them (among other things) to determine %Z's replacement.

>    | something that is not
>    | explicitly mentioned in the spec; and this demonstrates that the set of
>    | things that strftime can use is not exhaustive.
> 
> Of course not, strftime() can look at whatever it likes.

Good, and this matches what I just wrote above. (I hope we're in violent 
agreement. :-)

> Notice I said "strictly C conforming" not POSIX conforming.
> 
> In that environment, this line:
> 
>    |      tm.tm_gmtoff = gmtoff;
> 
> is likely to generate a compilation error, so the application cannot
> include it.  That one must be removed to be strictly C conforming.

Oh, good point. So let's use similar code but without tm_gmtoff:

   #include <stdio.h>
   #include <time.h>

   static void
   f (char const *fn, struct tm *tm)
   {
     char buf[100];
     tm->tm_isdst = 0;
     strftime (buf, sizeof buf, "%z", tm);
     printf ("after %s, %%z formats as '%s' with tm_isdst=%d\n",
	    fn, buf, tm->tm_isdst);
   }

   int
   main ()
   {
     time_t t = 0;
     f ("   gmtime",    gmtime (&t));
     f ("localtime", localtime (&t));
   }

On Ubuntu 23.10 with TZ="America/Los_Angeles" in the environment, this 
outputs:

   after    gmtime, %z formats as '+0000' with tm_isdst=0
   after localtime, %z formats as '-0800' with tm_isdst=0

which is fine and is the sort of behavior that Dag-Erling Smørgrav 
expected, even though strftime obviously must be using information other 
than what's in tm_isdst (or even in TZ) to compute the differing strings 
"+0000" and "-0800".

Because strftime can look at whatever it likes, this behavior is OK.

>    | It intends to fix the bug reported by Dag-Erling Smørgrav here:
>    | https://mm.icann.org/pipermail/tz/2024-January/033488.html
> 
> But there is no bug described there, the answers that the example
> produced are what is intended to happen.   That's because of the
> "same result as mktime() would produce" requirement.

As mentioned above, since mktime's behavior isn't completely determined 
by POSIX 202x/D4, there's wiggle room in how strftime can behave on 
Dag-Erling's example.

One possibility is that tzcode conforms to POSIX 202x/D4 both before and 
after the recently-installed tzcode patch 
<https://mm.icann.org/pipermail/tz/2024-January/033524.html>. (This 
patch implements Dag-Erling's suggestion, albeit in a different way that 
avoids some rare overflow issues.) That is, it's possible that the patch 
didn't fix a POSIX-conformance bug, but merely a user-expectation bug in 
an area where POSIX allows different behaviors.

If this possibility is correct, I guess I can live with it, though I'm a 
bit disappointed that POSIX allows the confusing behavior that 
Dag-Erling described. But anyway, this would mean the recently-installed 
patch is OK as far as POSIX 202x/D4 is concerned.

> This is from draft 4, but I don't believe that any of
> this changed after D3.

Thanks, I didn't know that draft 4 was out. I got a copy and am now 
referring to it in my comments. I too haven't noticed any 
changes in this area.