[tz] strftime %s

Robert Elz kre at munnari.OZ.AU
Mon Jan 15 09:38:43 UTC 2024


    Date:        Sun, 14 Jan 2024 20:22:33 -0800
    From:        Paul Eggert <eggert at cs.ucla.edu>
    Message-ID:  <24326f8e-ec8d-4a70-bf82-d62a5790ae7f at cs.ucla.edu>

  | (A) is of lesser importance in practice.

Agreed.

  | Although it's overkill for the POSIX strftime spec to require
  | struct tm components to be set when they don't need to be set,

You're missing the point of it all.  The components that it lists
are what some correct implementations need to be set to function.
That some other implementation might not is irrelevant - the goal
is for the user to be able to write code that will work with any
conforming implementation, not just the one that they happen to be
using when they write the code.

  | and this overkill can cause useless work by portable applications,

Not useless, just perhaps not needed for a particular implementation
but if the application starts caring about that, it might as well
delve into any local variation, and cease pretending to be portable.

  | it's not that big a deal. In practice nearly 
  | every app calls strftime on the result of localtime etc.

Is there some evidence to support that?   And even if true,
why would you penalise those other apps which don't?

  | Our thread, if I understand things correctly, 
  | is mostly about (B), not (A).

Yes.

  | > The implementation can look at whatever it likes.
  | That's good news. In that case we're in agreement.

Yes, on that we are.   Of course, the implementation still needs
to implement the result that is actually specified to be produced
and not some other result it thinks might be better, and it needs
some way to determine whether the other information has actually
been provided by anyone or not (otherwise there's nothing there
to use).   For things like locale info, that's easy, as the rules
specify that always exists.   For TZ related info, also easy in
specific cases, as there the rules specify "as if tzset() were
called" and that allows access to TZ and all that it happens to
provide.   But only when the spec says that, not just arbitrarily.

  | Oh, by "look at TZ" I meant look at data generated from TZ's value. 

Oh - that's not how I interpreted it, but OK.

  | If by "_tz_inited" you meant "after strftime did its mandatory call to 
  | tzset-or-equivalent, that call successfully determined the current 
  | timezone", I agree the code you gave reflects the intent for POSIX-2017. 

Yes.  And that is what I meant.

  | (If you meant something else then it'd be helpful to know what it was.) 
  | However, it's not at all clear that the code reflects the intent, or 
  | should reflect the intent, for POSIX-202x/D4.

I was replying to a comment of yours which started:

	In POSIX 2017 strftime must ...

so I am not sure that what the current drafts say is relevant.

But yes, once the next POSIX is published, then the tm_gmtoff field
will be available to %z and tm_zone to %Z, and simply using those
will be easy to do.   Of course, if you do it that way, you're
breaking any existing applications which were written to either
conform to the C standards (any of them) or versions up to and
including the current published POSIX standard (as who knows, when
it comes to ISO and IEEE balloting, the current drafts might be
rejected, and be sent back to be "fixed".)   But since that bridge
was crossed a long long time ago, there are unlikely to be many.

  | That's true for POSIX-2017, but not true for POSIX 202x/D4.

Again, see just above for the context for my comments.  It was
your restriction that provoked that response.

  | It's OK for the implementation to examine tm_zone when processing %Z.
  | See POSIX 202x/D4 line 69872.

Oh, I know that's there, it was my defect report that got those all added
properly.  It just wasn't relevant to your precondition.

  | That's why I wrote "mktime (or equivalent)", not "mktime".

OK.

  | The draft is imprecisely worded here, and it's easy to misread it
  | as saying this:
  |
  | > 	strftime(buf, sizeof buf, "%s", &tm);
  | > needs to be the exact same thing you'd get from
  | > 	snprintf(buf, sizeof buf, "%jd", (intmax_t)mktime(&tm));

That is what it (at least) intends to be saying.   If the wording
needs fixing, now is the time to make that happen, if you can make
anyone believe that the words can reasonably be read any other way
(within the context of everything that is in the HUGE standard doc).

  | but this reading isn't quite right. All that's needed is for strftime
  | to compute seconds since the Epoch in the usual way (i.e., using the 
  | Gregorian calendar and ignoring leap seconds),

But aside from correcting for out of range values, which strftime is
not required to do, that's exactly what mktime() is specified to do.
Referencing mktime() simply avoids saying all of that in two different
places (which could lead to eventual contradictions, if one of them is
altered and the other is not).

  | infer the UTC offset following the constraints described in the mktime 
  | section. 

That UTC offset *must* come from the TZ value, such that if TZ is
altered to refer to some other offset, then the result from mktime()
(and hence from strftime(%s)) must change.   The contents of the
struct tm are not allowed to alter that.   The example code we were
shown where TZ is purposely altered and then the results of the
two calls subtracted from each other to show the time offset between
two different timezones (the subtraction only works with a POSIX
time_t but for most of us, that's all that matters) is an example
of code that must not be broken.

I think that was using mktime(), but believe me, mktime() and
strftime("%s") are required to produce the exact same number.
Always (assuming in range values in the tm).
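
A minimal sketch of that requirement, not the code from the earlier
message.  The helper names (epoch_in_zone, percent_s_matches_mktime,
sample_tm) are invented for illustration, the POSIX-style TZ strings
"UTC0" and "EST5" are chosen only because they need no zone database,
and it assumes the local strftime() supports %s (tzcode and glibc do).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: treat *in as local time in zone tz and return
   what mktime() says.  Works on a copy, as mktime() may normalise
   the struct it is handed. */
static time_t
epoch_in_zone(const char *tz, const struct tm *in)
{
	struct tm copy;

	assert(tz != NULL && in != NULL);
	copy = *in;
	setenv("TZ", tz, 1);
	tzset();
	return mktime(&copy);
}

/* Hypothetical helper: 1 if strftime("%s") and mktime() give the
   same decimal number for *in in zone tz, 0 otherwise. */
static int
percent_s_matches_mktime(const char *tz, const struct tm *in)
{
	struct tm copy;
	char from_strftime[64], from_mktime[64];

	assert(tz != NULL && in != NULL);
	copy = *in;
	setenv("TZ", tz, 1);
	tzset();
	if (strftime(from_strftime, sizeof from_strftime, "%s", in) == 0)
		return 0;
	snprintf(from_mktime, sizeof from_mktime, "%jd",
	    (intmax_t)mktime(&copy));
	return strcmp(from_strftime, from_mktime) == 0;
}

/* 2024-01-15 12:00:00, standard time, with every other field zeroed. */
static struct tm
sample_tm(void)
{
	struct tm tm;

	memset(&tm, 0, sizeof tm);
	tm.tm_year = 2024 - 1900;
	tm.tm_mon = 0;
	tm.tm_mday = 15;
	tm.tm_hour = 12;
	return tm;
}
```

With a POSIX time_t, epoch_in_zone("EST5", &tm) minus
epoch_in_zone("UTC0", &tm) for tm = sample_tm() is 18000: the offset
comes from TZ alone, and nothing stored in the struct tm supplies it.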

  | Those constraints do not uniquely determine a result in every case,

Only in summer time local time warps, and if you believe the POSIX
people, not even then unless tm_isdst is set to -1.

  | and this gives strftime wiggle room. That is, strftime need not 
  | use the same code that mktime does to make its inferences,

No, it doesn't need to use the same code, but providing the 7
struct tm values it has to work with are within range, it must
produce the same answer as mktime() would, however the code is
written.

  | and on a particular struct tm an implementation's strftime %s
  | could infer a different UTC offset than the same implementation's
  | mktime on the same struct tm.

That would be broken.   It must generate the same result, not something
different.   There is no "wriggle" room to allow anything different.

  | But there is another way. With %z, strftime can use tm_gmtoff;
  | see POSIX 202x/D4 line 69870. And with %Z, strftime can use tm_zone;
  | see line 69872.

Yes, those can, because that's what common implementations where those
fields exist (which is most of them) actually do, and so applications
tend to accommodate that already.   That's what is needed for the
standard to get updated - it is specifying what actually exists and
works (except where known bugs exist).   It is not a legislature.
Where there is no common ground, the standard just ends up saying
that something is unspecified, which is a big red flag for applications
to avoid stepping into that pothole.

  | The same argument applies to %s,

No it doesn't, because that's not what the implementations have done.
Not even yours, until a week or so ago.  No working application code
expects that.   Mistaken users might, and they might send in incorrect
bug reports complaining that their code doesn't work because of it,
but it is their mistaken belief that needs to be corrected.

  | if we fix the error on line POSIX 202x/D4 line 69837 where tm_gmtoff
  | is mistakenly omitted.

Good luck with that.   And I can assure you, that was not an accident.
But by all means, submit a defect report, and see how far that gets
you.

  | This does not mean that %Z's replacement is completely determined by 
  | tm_isdst and tm_zone; all it means is that tm_isdst and tm_zone must be 
  | set by the caller and must be in the normal range so that strftime can 
  | use them (among other things) to determine %Z's replacement.

Yes.   But the implementation needs to know that whatever other data
it wants to look at is in fact actual pertinent data, and not just
random bits, or it won't be producing the correct result.  So if it
wants to use tzname[] (as an example, given the %Z assumed) it needs
to know (or arrange) for tzname[] to have been correctly set.
And for tzname[] that would mean (explicitly or via some equivalent)
calling tzset().   Not copying tm_zone into tzname[] and then using
that - that would be broken.   It can just return what is in tm_zone
though (or will be able to in the next POSIX - which is there because
that's what is currently actually done) and forget about the "among
other things" (and ignore tm_isdst completely).
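
As a rough illustration of the tzname[] point: the sketch below only
reads tzname[] after an explicit tzset(), so the array is known to
reflect TZ.  The helper name zone_name_from_tzname is hypothetical,
and the TZ strings used with it are just examples.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: the standard (isdst == 0) or summer time
   (isdst > 0) abbreviation for zone tz, taken from tzname[].
   The returned pointer is only good until TZ is changed and
   tzset() runs again. */
static const char *
zone_name_from_tzname(const char *tz, int isdst)
{
	assert(tz != NULL);
	setenv("TZ", tz, 1);
	tzset();	/* only now is tzname[] known to match TZ */
	return tzname[isdst > 0];
}
```

For example, with TZ set to "EST5EDT,M3.2.0,M11.1.0" this yields "EST"
for isdst 0 and "EDT" for isdst 1.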

  | > Of course not, strftime() can look at whatever it likes.
  | Good, and this matches what I just wrote above.
  | (I hope we're in violent agreement. :-)

We are, providing you're not proposing to use data that isn't
guaranteed to be valid.  I'm in violent opposition to that.

  | On Ubuntu 23.10 with TZ="America/Los_Angeles" in the environment, this 
  | outputs:
  |
  |    after    gmtime, %z formats as '+0000' with tm_isdst=0
  |    after localtime, %z formats as '-0800' with tm_isdst=0

which is because it is using the tm_gmtoff extension, that C does
not guarantee exists, but Ubuntu is more POSIX-like, and is using
that, which is how it got added to the forthcoming POSIX spec
(because that's how the POSIX world actually works, regardless of
what the old spec said).

  | which is fine and is the sort of behavior that Dag-Erling Smørgrav 
  | expected,

Yes, but note, that was a complaint that it didn't work, because
that's not how the implementations work.   And hence, not how the
function is specified to behave, now, or in the next POSIX.

If you want to go into the vanguard, and make changes arbitrarily,
knowingly violating POSIX, that's fine (I do that kind of thing in
other areas where the standard is stupid, even when the reasons it
is stupid were once valid).  But expect to get (perhaps
many) bug reports over the next decade or two until there's any
chance of POSIX being updated to match your implementation, with
users pointing to the standard and asking why you're not doing what
other conforming implementations do, and requesting you to fix it.

After all, the tzcode strftime() implementation has been how it
was for how many decades now, and just how many complaints like
that one on this issue have been received in all that time?

  | Because strftime can look at whatever it likes, this behavior is OK.

As long as producing garbage answers is OK to you, then, fine.

See my (quite) recent reply to Steve's message to see an example
of code that should work, and you'd be breaking by doing this.

More likely code would be something like

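    /* Needs <stdio.h> and <time.h>. */
    void
    convfile(FILE * ifd, FILE * ofd)
    {
	struct tm T;
	char buf[1024];
	char sbuf[128];
	int line = 0;

	/* Stop at EOF, or at the first line that does not parse fully. */
	while (fscanf(ifd, "%d-%d-%d %d:%d:%d %1000s",
		&T.tm_year, &T.tm_mon, &T.tm_mday,
		&T.tm_hour, &T.tm_min, &T.tm_sec, buf) == 7) {

	    line++;

	    T.tm_year -= 1900;
	    T.tm_mon -= 1;
	    T.tm_isdst = -1;	/* not in the input, let the implementation decide */
	    /* tm_gmtoff and tm_zone (where they exist) are never set */

	    if (invalid_tm_ranges(&T)) {   /* function not supplied here */
		fprintf(stderr, "Line %d, Whatever...\n", line /*, ... */ );
		continue;
	    }

	    /* leading space: a 0 return can then only mean failure */
	    if (strftime(sbuf, sizeof sbuf, " %s", &T) == 0) {
		fprintf(stderr, "Line %d, cannot convert time\n", line);
		continue;
	    }

	    fprintf(ofd, "%s %s\n", sbuf+1, buf);
	}
    }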

That needs to work (given the limitations on what I had time
to write here, and as with my example in the previous message, not
even compile tested, so there may be some stupid bugs, and of
course, the internal func that's called needs writing; that's just
simple comparisons of the 6 values in the arg struct tm* against
the ranges specified - 5 really, as anything goes for tm_year).
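
One possible shape for that helper, as a sketch only.  The name
invalid_tm_ranges comes from the code above, but this body is an
assumption about what was meant, using the ranges the C standard
gives for each struct tm member.

```c
#include <assert.h>
#include <time.h>

/* Sketch: non-zero if any of the five range-limited fields is out
   of range, zero otherwise.  tm_year is not checked (anything goes),
   and tm_mday is not checked against the length of the month. */
int
invalid_tm_ranges(const struct tm *tm)
{
	assert(tm != NULL);

	if (tm->tm_sec < 0 || tm->tm_sec > 60)	/* 60 allows a leap second */
		return 1;
	if (tm->tm_min < 0 || tm->tm_min > 59)
		return 1;
	if (tm->tm_hour < 0 || tm->tm_hour > 23)
		return 1;
	if (tm->tm_mday < 1 || tm->tm_mday > 31)
		return 1;
	if (tm->tm_mon < 0 || tm->tm_mon > 11)	/* months are 0 based */
		return 1;
	return 0;
}
```

Because the day is only compared against 1..31, a date such as
31 February passes this check.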

  | As mentioned above, since mktime's behavior isn't completely determined 
  | by POSIX 202x/D4,

That's not what the POSIX writers want you to think, at least when
tm_isdst is not -1 (you made it be 0, which is fine) - they're wanting
mktime() (and hence strftime("%s") because it is defined by reference)
to be usable for arithmetic on the struct tm, as that's the only way
a conforming C application can modify a time_t (other than calling one
of the functions, like time() or stat() which returns one or more).
And supporting conforming C applications is one of the goals.

  | there's wiggle room in how strftime can behave on Dag-Erling's example.

No, there really isn't.

  | That is, it's possible that the patch didn't fix a POSIX-conformance
  | bug, 

No, it certainly did not do that, rather it introduced one.

  | but merely a user-expectation bug in 
  | an area where POSIX allows different behaviors.

Try asking that on the Austin Group list, and see how far it
gets you.

  | If this possibility is correct, I guess I can live with it, though
  | I'm a bit disappointed that POSIX allows the confusing behavior that 
  | Dag-Erling described.

Not allows.   Requires.  There is no assumption that tm_gmtoff is
set to anything at all when mktime() or strftime("%s") are called.
Using it for them is a bug.   Your implementation needs to work for
the code in my previous message, and this one (at least if any
idiotic typos/thinkos are fixed, and they are fleshed out to be
complete programs, with #include added, and all that stuff).

The issue is all based upon the mistaken belief that a struct tm
has an underlying time_t upon which it is based, and that's what
%s should produce (and since that's defined as the same as mktime()
produces, then that can, and would have to be, extended to mktime()
as well).
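
A small sketch of why that belief fails.  The helper name
utc_fields_read_as_local is made up for this illustration, and "EST5"
is just a convenient fixed-offset TZ string: a struct tm filled in by
gmtime_r() is handed to mktime(), which reads the fields as local time
in whatever zone TZ names, not as the time_t they were derived from.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: break t down as UTC, then give those same
   fields to mktime() with TZ set to tz.  Returns (time_t)-1 if the
   breakdown fails. */
static time_t
utc_fields_read_as_local(time_t t, const char *tz)
{
	struct tm tm;

	assert(tz != NULL);
	if (gmtime_r(&t, &tm) == NULL)
		return (time_t)-1;
	setenv("TZ", tz, 1);
	tzset();
	return mktime(&tm);	/* fields are interpreted in zone tz */
}
```

With tz "EST5" the result is the original t plus 18000, and only with
tz "UTC0" does t come back unchanged.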

  | But anyway, this would mean the recently-installed 
  | patch is OK as far as POSIX 202x/D4 is concerned.

It isn't.   But by all means, there's no need to trust my
interpretation, ask on the Austin Group list, or submit a
defect report via mantis, and see what happens (I know you
know how to do both of those).

kre

