[With apologies for cross-posting to IPP and Character Repertoires
mailing lists.]
Background - the IEEE/ISTO PWG Character Repertoires standard (a
standard for NAMES, not a standard requiring support of particular
character repertoires) is nearly complete and is expected to be in
PWG 'last call' during the June 2003 face-to-face PWG meeting.
The latest CR working draft is at:
ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/wd-pcr10-20030317.html
------------------------------------------------------------------------
Hi folks, Sunday (18 May 2003)
Below is a draft version of the IPP "repertoire-supported" attribute,
for inclusion in Appendix B 'Bindings to IPP' of the next working draft
of the PWG Character Repertoires standard.
First, some background. When I started to write up this attribute, I
realized that our (currently proposed) syntax for CR labels now uses
characters that are not allowed in the IPP "keyword" datatype.
We _could_ add a new datatype (similar to "charset") to IPP called
"repertoire". Tom Hastings has convinced me that a new "repertoire"
datatype is a _very_ bad idea. Most importantly, it would break all
existing IPP parsers.
Instead, Tom and I agree that we should alter our CR labels to achieve
strict conformance to the IPP "keyword" syntax. Then IANA can register
our small set of well-known CR/1.0 labels in the IANA IPP registry,
along with the new IPP "repertoire-supported" attribute itself.
Cheers,
- Ira McDonald
High North Inc
ISSUE: The Unihan names (based on the source legacy CJK charset) are
_not_ disjoint (i.e., they DO overlap). Should we abandon their use in
favor of IANA Charset Registry names? What value do these Unihan names
add? (hint - read the attribute description below before commenting)
Describing the Unicode HAN character assignments based on Unicode code
chart titles (from http://www.unicode.org/charts/) _does_ provide unique
non-overlapping labels (e.g., 'unicode_cjk-radicals-supplement', derived
from the title of the Unicode character block starting at U+2E80).
------------------------------------------------------------------------
repertoire-supported (1setOf (keyword | name))
This REQUIRED IPP Printer Description attribute identifies some or all
of the character repertoires that the IPP Printer object and contained
IPP Job objects support for rendering of document data content. At
least the value 'unicode_basic-latin' MUST always be present, since
conforming IPP Printers MUST support at least the character repertoire
defined in the Unicode/4.0 'Basic Latin' code chart (and character
block).
A character repertoire is defined as a named subset of the characters
defined in a given character set standard (e.g., Unicode/4.0) that are
supported for output rendering of document data. The character set of
the document data (e.g., the value of "document-charset" in the IPP
Document object) constrains the relevant character repertoires (e.g.,
since ISO 8859-1 does not assign a code point to GREEK TONOS U+0384, that
character _cannot_ appear in document data encoded in ISO 8859-1).
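As a quick illustration of how the document charset constrains the usable repertoire, Python's standard codecs can test whether a character has a code point in a given charset. This is only a sketch: Python codec names stand in for IPP "document-charset" values, and `is_representable` is a hypothetical helper, not part of IPP or the CR standard.

```python
def is_representable(ch: str, charset: str) -> bool:
    """Return True if `ch` can be encoded in `charset` (a Python codec name).

    The codec name is used here as a stand-in for an IPP
    "document-charset" value such as 'iso-8859-1'.
    """
    try:
        ch.encode(charset)
    except UnicodeEncodeError:
        return False
    return True


GREEK_TONOS = "\u0384"

print(is_representable(GREEK_TONOS, "iso-8859-1"))  # False: no code point in ISO 8859-1
print(is_representable(GREEK_TONOS, "iso-8859-7"))  # True: ISO 8859-7 assigns 0xB4
print(is_representable("A", "iso-8859-1"))          # True
```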
Character repertoires of legacy character sets (e.g., ISO 8859-1 and
ISO 8859-2) often overlap. However, character repertoires identified
by the Unicode/4.0 code chart titles do _not_ overlap (i.e., they are
disjoint). Therefore, a conforming IPP Printer SHOULD advertise
"repertoire-supported" values based on the Unicode/4.0 code chart
titles, to avoid ambiguity.
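To see why chart-based labels avoid ambiguity: block ranges never overlap, so a lookup over them returns at most one label per code point. The sketch below hard-codes a small, hypothetical excerpt of the Unicode/4.0 block list (a real Printer would load the full list from the Unicode Character Database) and derives each label by lowercasing the chart title and replacing spaces with hyphens. All names in it are illustrative assumptions.

```python
from typing import List, Optional, Tuple

# Hypothetical excerpt of Unicode/4.0 blocks: (first code point, last code point, title).
BLOCKS: List[Tuple[int, int, str]] = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x2E80, 0x2EFF, "CJK Radicals Supplement"),
]


def title_to_label(title: str) -> str:
    """Build a 'unicode_' label: lowercase the title, spaces become hyphens."""
    return "unicode_" + title.lower().replace(" ", "-")


def repertoire_of(ch: str) -> Optional[str]:
    """Return the one label whose block contains `ch`, or None if not in the excerpt."""
    cp = ord(ch)
    for start, end, title in BLOCKS:
        if start <= cp <= end:
            return title_to_label(title)
    return None


def blocks_are_disjoint(blocks: List[Tuple[int, int, str]]) -> bool:
    """Check that no two ranges overlap (each ends before the next begins)."""
    ordered = sorted(blocks)
    return all(prev[1] < nxt[0] for prev, nxt in zip(ordered, ordered[1:]))


print(repertoire_of("A"))          # unicode_basic-latin
print(repertoire_of("\u0384"))     # unicode_greek-and-coptic
print(blocks_are_disjoint(BLOCKS))  # True
```

By contrast, legacy charsets such as ISO 8859-1 and ISO 8859-2 share all of Basic Latin, so the same kind of lookup over them could return more than one label for a single character.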
The ABNF [RFC2234] for legal values of "repertoire-supported" is:
repertoire = rep-prefix "_" rep-name
rep-prefix = "unicode" / ; from Code Chart titles
; of Unicode/4.0 char database
"unihan" / ; from Code Chart titles
; of Unicode/4.0 Unihan database
"iana" / ; from Name or Alias fields in
; IANA Charset Registry
"vendor" ; from vendor-specific
; repertoire names
rep-name = rep-alpha *(rep-char)
rep-char = rep-alpha / rep-digit / ; alphanumeric or
"-" / "." / "_" ; limited punctuation chars
rep-alpha = %x61-7A ; lowercase a-z
rep-digit = %x30-39 ; decimal 0-9
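The grammar is simple enough to check with a single regular expression. The following Python sketch is illustrative only: `REPERTOIRE_RE` and `is_valid_repertoire` are hypothetical names, and the pattern accepts only the lowercase prefixes, in line with the IPP 'keyword' syntax (quoted strings in RFC 2234 ABNF are formally case-insensitive).

```python
import re

# rep-prefix "_" rep-name, where rep-name = rep-alpha *(rep-char).
# Only lowercase prefixes are accepted, to stay within IPP 'keyword' syntax.
REPERTOIRE_RE = re.compile(r"(?:unicode|unihan|iana|vendor)_[a-z][a-z0-9._-]*")


def is_valid_repertoire(value: str) -> bool:
    """Return True if `value` is a legal "repertoire-supported" value."""
    return REPERTOIRE_RE.fullmatch(value) is not None


for v in ["unicode_basic-latin", "iana_iso-8859-1", "unicode_Basic-Latin", "iana_8859"]:
    print(v, is_valid_repertoire(v))
```

Here the first two values pass; the last two fail because of an uppercase letter and a leading digit in rep-name.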
Mapping Rule 1: If a source standard repertoire name (e.g., a value in
the IANA Charset Registry [IANA-Charsets]) contains any uppercase alpha
characters, those characters MUST be mapped to the IPP 'keyword' syntax
by converting each of them to the corresponding lowercase alpha
character.
Mapping Rule 2: If a source standard repertoire name (e.g., a value in
the IANA Charset Registry [IANA-Charsets]) contains any other
non-keyword characters, those characters MUST be mapped to the IPP
'keyword' syntax by converting each of them (including space) to a
hyphen "-" character.
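Applied in order, the two mapping rules amount to a small normalization function. The Python sketch below is hypothetical (the function name and its error handling are assumptions). In particular, the rules as written do not say what to do with a name that still begins with a digit or punctuation after mapping, which the ABNF forbids; this sketch simply rejects such names.

```python
import re

PREFIXES = {"unicode", "unihan", "iana", "vendor"}

# Anything outside the IPP 'keyword' character set (after lowercasing).
_NON_KEYWORD = re.compile(r"[^a-z0-9._-]")


def map_to_keyword(prefix: str, source_name: str) -> str:
    """Apply Mapping Rules 1 and 2 and prepend the rep-prefix."""
    if prefix not in PREFIXES:
        raise ValueError(f"unknown rep-prefix {prefix!r}")
    name = source_name.lower()          # Rule 1: uppercase alpha -> lowercase
    name = _NON_KEYWORD.sub("-", name)  # Rule 2: other non-keyword chars (incl. space) -> "-"
    if not name or not ("a" <= name[0] <= "z"):
        # Not covered by the rules; rep-name must start with a-z.
        raise ValueError(f"cannot map {source_name!r}: rep-name must start with a-z")
    return f"{prefix}_{name}"


print(map_to_keyword("iana", "ISO_8859-1:1987"))             # iana_iso_8859-1-1987
print(map_to_keyword("unicode", "CJK Radicals Supplement"))  # unicode_cjk-radicals-supplement
```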
This archive was generated by hypermail 2b29 : Sun May 18 2003 - 13:32:22 EDT