Minutes:Character Repertoires working group meeting, March 31, 2003

Minutes of the March 31, 2003 meeting of the PWG Character Repertoires Committee

Elliott Bradshaw, 4/1/03

Attendees

Lee Farrell, Canon
Ira McDonald, High North
Alan Berkema, HP
Harry Lewis, IBM
Dennis Carney, IBM
Jerry Thrasher, Lexmark
Bill Wagner, NetSilicon
Elliott Bradshaw, Oak Technology

General Discussion

Our main agenda item was the review of the March 17 version of the Character Repertoire Interoperability document.

It is clear that the overall intent and scope of the document are still difficult for a first-time reader to digest. This may suggest we need better explanations, or it may suggest the goals and scope are flawed.

There was consensus that it is useful to agree on a syntax for referring to existing character repertoires, as discussed in section 3.1, and that it is useful to add this syntax to the Semantic Model (SM) and IPP.

There was more discussion of, and perhaps discomfort with, the material that attempts to prescribe required advertisement and support under various circumstances, i.e. the concept of a Basic Repertoire. For one thing, the circumstances (e.g. "supporting similar repertoires") are hard to lock down in a definitive way. For another, it is hard to pick a list of Basic Repertoires and manage its growth in a productive way. For example, we would like to expand the scheme to other countries without requiring a major document revision cycle.

One possible approach is to move some of this material into a "Best Practices" category, in which there would be no firm requirement but general guidelines for how to proceed with as much interoperability as possible.

These discussion ideas are captured in the Issues, below.

Decisions

We made the following decisions regarding changes to the document:

Emphasize that support for a character does not imply WYSIWYG; any mapped version will do.
State that support for a character means that the printer knows how to render it, through any combination of local and downloaded-as-needed font data.
Use Semantic Model naming conventions and syntax throughout. E.g. "repertoires-supported" should be "RepertoireSupported" (singular, not plural, for consistency with other SM Supported elements).
Remove "repertoires-ready", as the group didn't feel there was a strong use case for this.
In section 3, separate out the SM-specific material into an SM mapping section near the end; this will be short and descriptive (and, I suggest, informative). The description of values should be independent of SM (while freely borrowing SM syntax conventions where needed).
Put a similar IPP mapping section near the end of section 3; remove appendices for SM and IPP mapping.
In section 3.1 clarify that we are defining a "textual namespace" with our prefix conventions.
The mapping rules were approved as written, and there is no need for a canonical form.
The conformance section should be a summary of previous material. Make sure each item (e.g. requirement for euro) is addressed separately.
Amend the Unicode references to lock down the current version (3.2).
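
As an illustration of the prefix convention mentioned in the section 3.1 decision, the sketch below splits a repertoire name such as "Unicode: Greek" into its namespace prefix and the name within that namespace. The helper names and the comparison rule used here (ignore case and surrounding whitespace) are assumptions made for illustration only; the approved rules are those in the document under review and are not reproduced in these minutes.

```python
from typing import NamedTuple


class RepertoireName(NamedTuple):
    prefix: str  # namespace prefix, e.g. "Unicode", "IANA", "Unihan"
    name: str    # name within that namespace, e.g. "Greek"


def parse_repertoire(text: str) -> RepertoireName:
    """Split 'Prefix: Name' at the first colon (hypothetical helper)."""
    prefix, sep, name = text.partition(":")
    if not sep or not prefix.strip() or not name.strip():
        raise ValueError(f"not a prefixed repertoire name: {text!r}")
    return RepertoireName(prefix.strip(), name.strip())


def same_repertoire(a: str, b: str) -> bool:
    """Assumed comparison: case-insensitive, whitespace-trimmed.

    This is NOT the rule approved by the group; it only shows where
    such a rule would be applied.
    """
    pa, pb = parse_repertoire(a), parse_repertoire(b)
    return (pa.prefix.casefold(), pa.name.casefold()) == (
        pb.prefix.casefold(),
        pb.name.casefold(),
    )
```

Splitting at the first colon only keeps names that themselves contain a colon, such as "Unihan: KS X 1001:1992", intact.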

Issues

1. Should we add diagrams, use cases, or other explanatory material to the Introduction?
2. Should the rules for Basic Repertoires be normative, or a best practice? Should they be in a separate section or document?
3. Specifically, when a printer advertises a repertoire, must it guarantee to render every character, or is it best effort?
4. It was suggested that in addition to RepertoireSupported, we allow Repertoire (or Repertoires?) to be submitted with a job. This would warn the printer that the job needed characters from those repertoires; if the printer had specific knowledge that it could not handle them, then it could fail the job. (But see issue #2.)
5. Currently the list of basic repertoires uses certain Unicode charts. Alternatively, should it simply include all Unicode charts? This would provide minimal interoperability for a wider range of circumstances, but it does extend the obligations of a printer, since it must support every character in each of the repertoires it advertises.

Stake in the ground: Best Practices

For discussion, here is a possible approach.

We create a new section called Best Practices, which I suppose is marked Informative. The purpose of the Best Practices is to improve interoperability, including with lightweight clients.
The syntax for repertoire names (i.e. the name space prefix notation) remains normative. For normative purposes, support of a repertoire means the printer can render most of its characters most of the time.
The matching rules are normative, as this defines what "equal" means for repertoire names.
Basic repertoires are:
- All code charts of the form Unicode: chart
- Unihan: GB 2312
- Unihan: JIS X 0208
- Unihan: KS X 1001:1992
- Unihan: Big5
Best Practices include the following:
- For each supported repertoire, be able to render all of its characters regardless of which font is current. [Aggressive substitution and non-WYSIWYG is fine.]
- Always support and advertise Unicode: Basic Latin and Unicode: Latin-1 Supplement.
- Always support the euro character.
- Always support characters called for by a particular formatting language, e.g. predefined character entities in XHTML.
- [This is the fuzzy part] Support and advertise a basic repertoire whenever supporting a similar one. E.g. when supporting IANA: ISO-8859-7, also support Unicode: Greek.
- When a client submits a job with the Repertoire element specified, examine the values in the element. If any value names a repertoire that the printer a) recognizes and b) knows it does not support for most of its characters, then fail the job as defined by the protocol in use.
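
For discussion, the job-submission practice in the last item can be sketched as follows. This is a minimal illustration under stated assumptions, not proposed text: the `Support` enum, the `should_fail_job` function, and the idea of a lookup table of recognized repertoire names are all hypothetical. The one point taken from the practice itself is that an unrecognized value never causes a failure.

```python
from enum import Enum
from typing import Iterable, Mapping


class Support(Enum):
    SUPPORTED = "supported"      # printer can render most of the characters
    UNSUPPORTED = "unsupported"  # printer knows it cannot render most of them


def should_fail_job(
    job_repertoires: Iterable[str],
    known: Mapping[str, Support],
) -> bool:
    """Return True if any requested repertoire is recognized AND known unsupported.

    `known` maps the repertoire names this printer recognizes to a Support
    value. Names absent from `known` are unrecognized and are ignored.
    """
    return any(
        known.get(name) is Support.UNSUPPORTED for name in job_repertoires
    )
```

With a table that marks "Unicode: Basic Latin" as supported and "Unicode: Greek" as unsupported, a job listing only Basic Latin is accepted, a job listing Greek is failed, and a job listing a name the printer has never heard of is accepted.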

Update from PWG Plenary, 4/2/2003

At the Plenary these decisions were made:

CR should proceed to a formal charter.
CR should follow the proposed new PWG process.