[Latingp] Work Product Quality

Mon Feb 25 18:56:26 UTC 2019

Dear colleagues,

To chime in on this discussion: I tend to agree with Bill that the data is always more important than the theory. Our methodologies are based on certain hypotheses. If these turn out to be insufficient, the solution is not to ignore the data but to amend the methodology and change the hypothesis. Those are very basic principles of scientific analysis and method. Nonetheless, I find this last exchange of emails a bit unspecific and would suggest focusing on the actual data rather than having abstract methodological debates:

I assume that Bill’s email refers to the discussion we had during our last teleconference regarding the cases of double diaeresis vs. double acute. If I remember correctly, the gist of Dennis’s argument was that we could not consider them variants, because our reasons were based on visual similarity while we were discussing non-visual similarity at that time.

The data was as follows. We had votes from two users of languages that make use of the diaeresis, namely Michael Bauland and me, both of whom judged them confusable. We had several examples from fonts in which they were near-homoglyphs, if not homoglyphs, to use established terminology (without implying a categorial notion of visual vs. non-visual similarity). Dennis then argued that if we made them variants because they were near-homoglyphs or homoglyphs, we would have to restrict ourselves to the rendering in the same three fonts we had opted for earlier in our analysis of cross-script visual variants.

In this context, I would like to remind the group of two facts:

1) We have not yet conducted an analysis of in-script variants on strictly visual grounds. With cross-script variants we started with a visual analysis, and then during that work discovered that we had to consider further non-visual criteria. This led us to develop the current theories and framework for identifying non-visual criteria. We went on to amend our visual analysis in the case of cross-script variants and then proceeded with an analysis of in-script variants.

Accordingly, we haven’t had an analysis of in-script variants on strictly visual grounds as of yet. So, either that task remains, or we do a combined pass analyzing both aspects together, and the latter seems the obvious choice to me.

2) As I tried to explain in the proposal itself, there is no systematic, non-arbitrary boundary between visual and non-visual similarity: each conditions the other. Therefore, in my opinion, we should not try to enforce such a boundary, either in our analysis or in our methodology.

We started with a visual analysis of cross-script variant candidates because that is what was expected from us, as clarified by the Integration Panel on various occasions. However, we cannot abuse these statements to deduce that, as a consequence, other things are out of scope, and I’m confident that an incomplete analysis will not be accepted by the Integration Panel.

Therefore, the relevant question in our discussion will be whether there is a distinct risk for stability, rather than which type of variant relationship a particular case falls into and whether we applied the correct methodology.

In the case under discussion, I interpret the data as substantiating such a risk for stability, simply because users would confuse them, for whatever reason. We should, however, stick to discussing the data rather than arguing over methodology, because it is such arguments that keep us from making progress, not the quality or quantity of the data.

I hope this summary will speed up our discussion during our next teleconference. Even if it does not settle the discussion, it is important to note that we *cannot* rule out any analysis because of a methodology developed before analyzing the data.

Best,

Meikal
On 23 Feb 2019, 18:44 +0100, Bill Jouris <bill.jouris at insidethestack.com> wrote:
> Dennis,
>
> I'm not saying that the methodologies we have developed are flawed.  I am saying that they are limited.
>
> There is some set of code points which are variants.  Each methodology we have can identify some subset of those variants -- but not all of them.  That is, after all, why we have developed multiple methodologies.  Our task, I submit, is not to process our methodologies; our task is to identify variants.  The methodologies are a means to that end; they are not the end itself.
>
> I believe we need to bear in mind that the methodologies which we have at this point may still not be sufficient to identify all of the variants which exist.  We need to accept that, if we find code points which we generally agree are variants, but which are not identified by our existing methodologies, that is not a bad thing.  It would be good to then identify why they are variants, in the interests of then finding others of the same type; to develop an additional methodology.  Good, but not critical.
>
> The quality of our work product is determined by how successfully we identify the variants which exist.  If we find variants which are not generated by our methodologies, that improves the quality of our work.  It does not, as you appeared to suggest, diminish it.
>
> What I suggest the panel do is remain open to the possibility (I would say the certainty) that we have not yet created methodologies which will identify all variants.  At some point, of course, we do have to stop.  But that doesn't justify just saying "These code points do not fit the existing methodologies, so we cannot even consider them."   Again, the methodologies are a tool, not an end.
>
>
> Bill Jouris
> Inside Products
> bill.jouris at insidethestack.com
> 831-659-8360
> 925-855-9512 (direct)
>
>
> On Friday, February 22, 2019, 10:23:19 AM PST, Tan Tanaka, Dennis <dtantanaka at verisign.com> wrote:
>
>
> Bill,
>
> I don’t see how the methods this panel has developed and signed off on (some of them over a year ago) misalign with our goal.
>
> What exactly are you suggesting this panel do?
>
> -Dennis
>
> From: Latingp <latingp-bounces at icann.org> on behalf of Bill Jouris <bill.jouris at insidethestack.com>
> Reply-To: Bill Jouris <bill.jouris at insidethestack.com>
> Date: Thursday, February 21, 2019 at 5:36 PM
> To: Latin GP <latingp at icann.org>
> Subject: [EXTERNAL] [Latingp] Work Product Quality
>
> Dear colleagues,
>
> I want to strongly disagree with the thesis put forward at this morning's meeting that the quality of our work product depends on our following some very small number of discrete methodologies for identifying variants.  It does not.
>
> A methodology, any methodology, is merely a tool.  Its purpose is to help us get to our goal -- identifying variants.  If we use multiple methodologies, each of which allows us to identify some (quite possibly overlapping) subset of the universe of variants, that's OK.  If we identify a few additional cases that we believe to be variants, while going outside those methodologies, that's OK too.   It is desirable to have methodologies which identify large subsets, but only because that expedites our work.  The methodologies are a means to an end, not the end in themselves.
>
> The quality of our work product is based on how successfully we identify the members of the underlying universe of variants.  How we get there is really irrelevant to the quality of what we produce.
>
>
> Bill Jouris
>