Proposal for new ISO C 9x time API
Markus.Kuhn at cl.cam.ac.uk
Fri Oct 2 10:24:54 UTC 1998
Nathan Myers wrote on 1998-09-16 19:10 UTC:
> While this new proposal is much, much better than tmx, it still has
> some problems. They look fixable to me.
> First, it doesn't entirely solve the re-entrancy problem. If an error
> state and error message are to be carried around in the timezone_t
> object, then a "bad" timezone_t cannot be shared across threads which
> might have different locales. This part of the interface needs some
> rework. Given a bad timezone_t value, I don't see how strfxtime should
> indicate failure for those formats which use the time zone. An
> alternative interface, in place of tz_prep and tz_error, might be:
> timezone_t* tz_construct(const char *restrict tzstring,
> char *msg, int maxsize)
> which returns 0 for failure, and then if msg is non-null, stores a
> message into it up to maxsize characters in length. This way
> there is never a bad timezone value to handle. (A null timezone
> is already specified to be treated like UTC.)
I expect tz_error usually to be called immediately after a tz_prep has
signalled a problem. If programmers are sharing a bad timezone_t
across threads, then honestly, that is their problem. Your concern
sounds a bit far-fetched to me, and there are many obvious programming
techniques to avoid the problem. There is no way in C for an API
designer to enforce multi-threading safety; all we can do is provide
an API that enables multi-threading-safe use of the functions, and that
is IMHO good enough.
I have indeed thought about a user-provided finite buffer, as well as
about tz_error doing a malloc. The main reason why I like neither
approach is my recently gained experience with writing bindings from C
libraries to other languages. Let's take Ada as an example: all this mess
with C returning variable-length strings in a multi-threading-safe way
is a non-problem in Ada. Ada allows functions to return variable-length
arrays. Most compilers (e.g., GNU Ada) do this as follows:
There is a secondary stack managed by the run-time library. Before an
expression that returns variable-length arrays is evaluated, the
secondary stack pointer is saved. The space for the variable array to be
returned is allocated on the secondary stack and can be used from there
by other functions in the expression that use the returned result. The
secondary stack pointer is restored after the expression has been fully
evaluated. If the returned variable-length array has to be preserved for
further use beyond the expression, it usually has to be copied (or
relinked, if the secondary stack uses the same storage pool with
reference counters as the normal variable-length string library). If I
have an Ada function that returns the tz_error value, then I first have
to call the C tz_error, then I have to find out how long the resulting
string is (e.g., with strlen(); perhaps the length should also be
returned), then I copy the string onto the secondary stack of the Ada
run-time library and return from the function. With your proposal, I
would have to introduce arbitrary limits for the length of the returnable
string, which are difficult to justify to users of the API in other
programming languages where such restrictions have no justification. Or
I would have to call the function that accepts a user-provided buffer
repeatedly, to find out how large this buffer has to be. In the end,
considering interfacing to programming languages like Ada or Python,
which can comfortably return strings, I consider my approach conceptually
preferable. In practice, error messages will hardly ever be longer
than 80 characters, so a 256-character limit for the maximum length should never
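The user-provided-buffer alternative and the repeated calls it forces on a binding can be sketched in C as follows. demo_tz_message() is a hypothetical stand-in (not part of either proposal) for a function that, like snprintf(), writes into a caller-supplied buffer and reports the full message length; fetch_message() is the hypothetical caller-side loop a language binding would need to avoid imposing an arbitrary length limit.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL stand-in, not part of either proposal: writes an error
 * message into a caller-supplied buffer and, like snprintf(), returns
 * the full message length so that the caller can detect truncation. */
static size_t demo_tz_message(const char *tzstring, char *buf, size_t size)
{
    int n = snprintf(buf, size, "cannot parse time zone string \"%s\"",
                     tzstring);
    if (n < 0) {                 /* encoding error: hand back "" */
        if (size > 0)
            buf[0] = '\0';
        return 0;
    }
    return (size_t)n;
}

/* Caller side: grow the buffer until the whole message fits, so that
 * no arbitrary length limit is imposed. Returns a malloc'ed string that
 * the caller must free(), or NULL if memory runs out. */
static char *fetch_message(const char *tzstring)
{
    size_t size = 16;            /* deliberately small first guess */
    assert(tzstring != NULL);
    for (;;) {
        char *buf = malloc(size);
        if (buf == NULL)
            return NULL;
        size_t needed = demo_tz_message(tzstring, buf, size);
        if (needed < size)
            return buf;          /* message and terminating '\0' fit */
        free(buf);
        size = needed + 1;       /* the next call has exactly enough room */
    }
}
```

Each retry costs an extra call and an extra allocation; returning a string owned by the library, as tz_error does, avoids this loop at the price of tying the message's lifetime to the timezone_t object.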
> Another problem I foresee is that there is no way, given a timezone_t
> object, to retrieve the string used to construct it. This might best
> be another strfxtime directive.
That would mean that we have to force the implementor to store the
original string. Is this really necessary? The user has provided the
string himself, so why would he depend on getting it back later? We
certainly could easily add another strfxtime conversion specifier, but I
wonder whether this is necessary at all.
> I don't like to see the %H, %M, and %S formats restricted in their
> format to only '.' and ',' decimal separators. Separate directives
> (perhaps %.nH et al) that format only the fractional part would
> allow users to supply any decimal separator they choose, as in
> "%H:%M!%.3M" for "03:20!666". (I have seen satellite navigation
> systems with stranger choices of syntax.)
Thanks, that is a very good suggestion. I have also frequently seen h,
m, or s used as the separator instead of dot and comma in astronomical
software, so the decimal separator should be user-provided.
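A fraction-only directive such as the suggested %.nM could be implemented roughly as below. format_fraction() is a hypothetical helper, not part of the proposal: it emits the first n digits of a fraction by long division, truncating rather than rounding (which matches the "03:20!666" example for 03:20:40, since 40/60 = 0.666...), and leaves the separator entirely to the caller's format string.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* HYPOTHETICAL helper for a fraction-only directive such as "%.nM":
 * writes the first `digits` decimal digits of num/den (0 <= num < den)
 * into out, truncating rather than rounding, and NUL-terminates it.
 * out must have room for digits + 1 characters, and den must be small
 * enough that 10 * den does not overflow. No decimal separator is
 * produced here; it comes from the surrounding format string. */
static void format_fraction(char *out, long long num, long long den,
                            int digits)
{
    assert(den > 0 && num >= 0 && num < den && digits >= 0);
    for (int i = 0; i < digits; i++) {
        num *= 10;                          /* shift in the next digit */
        out[i] = (char)('0' + num / den);   /* quotient is 0..9 */
        num %= den;                         /* keep the remainder */
    }
    out[digits] = '\0';
}
```

For example, format_fraction(frac, 40, 60, 3) yields "666", which the caller can then place after any separator, such as the '!' in the quoted example.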
A related problem: what are the semantics of decimal fractions of
minutes and hours during a leap second? These are obviously ill-defined,
and a neat solution is not possible (applications expecting leap seconds
should never use decimal fractions of minutes, hours, and days). The
best semantics I can think of is to use min(nsec, 999_999_999) instead
of nsec directly when calculating these decimal fractions. For decimal
fractions of seconds, there is no problem, as long as the leap second
always follows second 59 (which should be guaranteed unless someone
introduces a UTC offset that is not an integral multiple of minutes).
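The clamping rule for decimal fractions of minutes during a leap second can be sketched as below. This assumes that an inserted leap second is represented by nsec values from 1_000_000_000 to 1_999_999_999 attached to second 59 of the minute; minute_fraction() is a hypothetical name, not part of the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Fraction of the current minute, in units of 10^-digits, truncated.
 * ASSUMPTION: sec is the second of the minute (0..59), and an inserted
 * leap second is represented by nsec in 1_000_000_000..1_999_999_999
 * on second 59. minute_fraction() is a HYPOTHETICAL name. */
static int64_t minute_fraction(int sec, int64_t nsec, int digits)
{
    assert(sec >= 0 && sec <= 59);
    assert(nsec >= 0 && nsec <= INT64_C(1999999999));
    assert(digits >= 0 && digits <= 9);

    if (nsec > INT64_C(999999999))      /* clamp: min(nsec, 999_999_999) */
        nsec = INT64_C(999999999);

    int64_t ns_into_minute = sec * INT64_C(1000000000) + nsec;

    /* 60e9 ns per minute is divisible by 10^digits for digits <= 9, so
     * dividing by this unit equals floor(ns * 10^digits / 60e9) without
     * risking overflow in the product. */
    int64_t unit = INT64_C(60000000000);
    for (int i = 0; i < digits; i++)
        unit /= 10;
    return ns_into_minute / unit;
}
```

Without the clamp, nsec = 1_500_000_000 on second 59 would mean 60.5 s into the minute, a fraction of about 1.008 that no longer belongs to the minute being displayed; with it, the fraction stays at .999... for the whole leap second.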
> Given that the format already specifies 64-bit operations on the more
> commonly-used component of the time, is there any reason to restrict
> the resolution of the fractional part to nanoseconds? Clock speeds
> greater than 1e9 Hz will be common before this interface comes into
> wide use. It may as well use (say) attoseconds, as in Bernstein's
I think attoseconds are horrible overkill.
The best atomic clock on this planet (CS1 by PTB in Braunschweig) can
barely realize UTC with a real-time precision of one nanosecond. GPS
provides civilian users with UTC at around 340 ns root-mean-square
error; military users get down to perhaps tens of nanoseconds. Radio
clocks like WWV or DCF77 provide UTC with around a millisecond precision,
and atmospheric path delays are often worse. NTP also works with
millisecond precision under good conditions. So considering phase
precision, nanoseconds are already many orders of magnitude finer than
what is practically required.
Considering resolution and uniqueness of local timestamps:
It is hard to imagine that mass market silicon microprocessors will
leave UHF and break through the 10 GHz barrier for internal clock speed
during my lifetime. Reading out an internal monotonic clock counter,
converting it to a portable UTC representation, and returning it via a
system call interface will certainly take much longer than a few tens of
instruction cycles (unless we see full hardware implementations of
xtime_get(), for which I see no market justification), therefore
processors that can do more than 10**9 calls to xtime_get() per second
sound to me very much like science fiction at the moment. Nanoseconds sound to me
quite sufficient to guarantee unique timestamps with a comfortable
Considering frequency resolution:
A nanosecond is also a nice unit for the phase, and nanoseconds
per second are a nice unit for the frequency of a kernel clock.
If you add an adjustable amount (nominally one second) to your phase
base every second, then you can adjust the rate at which your phase base
progresses in nanoseconds per second, also known as parts per billion.
This resolution is already significantly finer than the frequency change
of your PC's clock when you open the window and the temperature in the
room drops by a few kelvin. Between these per-second adjustments, you do
a linear extrapolation using a bus cycle counter and precalculated
compensation factors. See the Linux kernel clock PLL for an
implementation example, or Mills's papers that I cited on my page.
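The per-second frequency adjustment with linear interpolation in between can be sketched as follows. This is a heavily simplified illustration, not the Linux kernel PLL or Mills's algorithms; the soft_clock type and its functions are hypothetical, the update is assumed to run exactly once per nominal second of the counter, and the counter rate is assumed to be at most a few GHz so that the 64-bit intermediate product cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL software clock: a phase base in nanoseconds advanced once
 * per second by one nominal second plus a frequency correction in ppb,
 * with linear interpolation in between using a free-running counter. */
typedef struct {
    int64_t  base_ns;        /* phase at the last once-per-second update */
    int64_t  freq_ppb;       /* correction in ns per second (= ppb) */
    uint64_t tick_cycles;    /* counter value at the last update */
    uint64_t cycles_per_sec; /* nominal counter rate (a few GHz at most) */
} soft_clock;

/* Called once per nominal second, i.e. when the counter has advanced by
 * cycles_per_sec since the previous update. */
static void soft_clock_tick(soft_clock *c)
{
    c->base_ns += 1000000000 + c->freq_ppb;
    c->tick_cycles += c->cycles_per_sec;
}

/* Interpolate between updates; now_cycles must lie within one second
 * of the last update, which keeps the product below within 64 bits. */
static int64_t soft_clock_read(const soft_clock *c, uint64_t now_cycles)
{
    uint64_t elapsed = now_cycles - c->tick_cycles;
    assert(elapsed <= c->cycles_per_sec);
    assert(c->freq_ppb > -1000000000 && c->freq_ppb < 1000000000);
    uint64_t ns_per_sec = (uint64_t)(1000000000 + c->freq_ppb);
    return c->base_ns + (int64_t)(elapsed * ns_per_sec / c->cycles_per_sec);
}
```

With a correction of +500 ppb, every second of counter time advances the clock by 1_000_000_500 ns, and readings just before and just after an update agree, so the interpolated time stays continuous.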
> The library might also define constants corresponding to
> one nanosecond, microsecond, and millisecond in whatever unit is used
> for the fractional part, to minimize user errors.
This is one solution. I would prefer another, more general solution to
minimize user error here: allow underscores in numeric literals, as
Ada does. I think 1_000_000_000 is much more readable than 1000000000
(is it really one billion?).
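Since C does not accept underscores in numeric literals, the named-constant approach from the quoted suggestion is one way to get some of the same readability today. The XT_NSEC_PER_* names and msec_to_nsec() below are hypothetical, not part of the proposal; building each value as a product of small factors keeps its magnitude obvious at a glance.

```c
#include <assert.h>

/* HYPOTHETICAL constant names, not from the proposal. Each value is a
 * product of small factors, so its magnitude is easy to verify. */
#define XT_NSEC_PER_USEC  1000L
#define XT_NSEC_PER_MSEC  (1000L * XT_NSEC_PER_USEC)
#define XT_NSEC_PER_SEC   (1000L * XT_NSEC_PER_MSEC)

/* Example use: convert whole milliseconds (0..999) into a value for a
 * nanosecond field; long is at least 32 bits, so 999_000_000 fits. */
static long msec_to_nsec(long msec)
{
    assert(msec >= 0 && msec < 1000);
    return msec * XT_NSEC_PER_MSEC;
}
```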
> Typos: I believe the first paragraph describing representation of leap
> seconds refers to the member "sec" in one place where it should say
I couldn't find this.
> Also, the "note" text shows up in my browser in a microscopic font.
I only used the HTML <SMALL> tag and did not specify a particular font
size. It is the responsibility of your browser and its local
configuration to select an adequate font size. If you use Netscape under
X11, I can probably tell you how to fully configure all font sizes
(look into Netscape.ad).
> I'm interested in what can be done to improve its suitability for
> incorporation into a future C++ standard. If the C and C++ bindings
> could be described simultaneously this would save a lot of trouble
> in the future.
A C++ binding (and also an Ada95 binding) could use exceptions in order
to signal the unavailability of a clock in xtime_get. This would leave
the return value for the actual clock value. xtime would become a class
under C++ and a private record under Ada, and the arithmetic functions
and conversion functions to other existing time types would be
appropriately overloaded. As I pointed out above, the tz_error message
can be returned directly as an array under Ada (I don't think C++ has a
comparable allocator-free mechanism).
I don't see any immediate improvements that I could make to the C API to
make it more suitable for bindings to more modern programming languages.
I think though that this is a general design criterion, as the run-time
libraries of most modern languages are sitting on top of some underlying
C API. Therefore the underlying C API has to be as robust and flexible
as possible. In order to not hinder proper implementations of higher
Markus G. Kuhn, Security Group, Computer Lab, Cambridge University, UK
email: mkuhn at acm.org, home page: <http://www.cl.cam.ac.uk/~mgk25/>