Position of rules definitions
Brian S O'Neill
bronee at gmail.com
Thu May 3 14:40:40 UTC 2007
The current tz parsers seem to go quite fast, so it doesn't seem worth
the trouble to optimize the format. Also keep in mind that the O(log N)
searches don't need to exist: hashtables are often used instead to
manage symbols. This optimization applies to both one-pass and two-pass
parsers.
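As a rough sketch of the hashtable point (the rule names and structure here are invented for illustration, not taken from any actual tz parser): a dict gives average O(1) name lookup, so no sorted set or binary search is needed.

```python
# Hypothetical symbol table for rule names, kept in a hash table (dict)
# instead of a sorted list. Average lookup cost is O(1), not O(log N).
rules = {
    "US": ["placeholder rule body 1"],       # made-up rule definitions
    "Chicago": ["placeholder rule body 2"],
}

def lookup(name):
    # Plain dict lookup; returns None when the rule is undefined.
    return rules.get(name)

print(lookup("US") is not None)   # True
print(lookup("Nowhere"))          # None
```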
Since the full set of tz rules is fairly small, the parser can read all
the rules into memory and process them there. The biggest cost is the
disk I/O of the first pass; the second pass runs entirely in memory and
costs nothing in comparison.
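A minimal sketch of that two-pass shape (the toy input format and all names below are invented, not the real tz file syntax): pass one does all the reading and indexes rules in a dict; pass two resolves zone-to-rule references purely in memory.

```python
# Hypothetical two-pass parse: pass 1 does all the I/O and builds
# in-memory tables; pass 2 resolves references without touching disk.
SAMPLE = """\
Rule US standard
Rule EU standard
Zone America/Chicago US
Zone Europe/Paris EU
"""

def parse(text):
    # Pass 1: read every line into memory, indexing rules by name.
    rules, zones = {}, []
    for line in text.splitlines():
        kind, name, ref = line.split()
        if kind == "Rule":
            rules[name] = ref
        else:
            zones.append((name, ref))
    # Pass 2: resolve each zone's rule reference, entirely in memory.
    return {zone: rules[ref] for zone, ref in zones}

resolved = parse(SAMPLE)
print(resolved["America/Chicago"])  # prints "standard"
```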
I'm not too sure how zic works, but the tz parser I wrote does
everything I described above.
Liviu Nicoara wrote:
> Allow me to offer a performance reason which would probably show where
> I am heading; a two-step parser would have to deal with:
> 1. Failure to resolve an undefined reference, at a cost of O(log N)
> per search in a set of N rules.
> 2. Creating a set of unresolved references, at a total cost of O(M)
> for M unresolved references if the set is unordered (plus a constant
> cost per reference if the dangling reference needs extra adornment,
> etc.).
> 3. Searching the [smaller] set of unresolved references and, for each
> of the M entries, performing a lookup at a minimum cost of O(log N) in
> the set of N defined rules.
> (The above assumes fairly basic data structures and algorithms;
> further optimization may be possible.)
> A one-step parser would only incur the penalty of step 1 because it
> would always find the [already] defined rule.
> All of the above may seem academic, and indeed the sets are small
> enough that we could ignore, e.g., the difference between linear and
> binary searches, but it just does not seem right. If the only argument
> for it is convenience, then I believe it is worth a second look
> regardless of the capabilities of existing parsers.
> I hope I have not offended anyone by insisting on this relatively
> unimportant issue.
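The bookkeeping in the quoted two-step scheme can be sketched roughly as follows (the input tuples and names are hypothetical): a lookup that fails is step 1, queuing the dangling reference is step 2, and resolving the queue once all rules are known is step 3.

```python
# Hypothetical sketch of the quoted two-step scheme: forward references
# fail their first lookup, get queued, and are resolved in a final pass.
lines = [
    ("Zone", "Asia/Tokyo", "JP"),   # forward reference: JP not yet defined
    ("Rule", "JP", "standard"),
]

rules = {}
resolved = {}
unresolved = []                     # the set of dangling references

for kind, name, ref in lines:
    if kind == "Rule":
        rules[name] = ref
    elif ref in rules:              # step 1: try to resolve immediately
        resolved[name] = rules[ref]
    else:                           # step 2: remember the dangling ref
        unresolved.append((name, ref))

for name, ref in unresolved:        # step 3: resolve against the full set
    resolved[name] = rules[ref]

print(resolved)                     # prints {'Asia/Tokyo': 'standard'}
```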