Character Repertoires Mail Archive

RE: CR> Draft of IPP "repertoire-supported" Printer attribute

From: McDonald, Ira (imcdonald@sharplabs.com)
Date: Mon May 19 2003 - 15:38:55 EDT

    Hi Tom,

    We allow "_" in rep-char because "_" has already been used in the
    Name fields in the IANA Charset Registry, and we don't want to
    overly alter those names in the 'iana_' namespace for repertoires.

    Cheers,
    - Ira

    PS - The "_" after the namespace prefix MUST permanently remain the
    only 'field separator'.

    PPS - Folks should ignore this discussion for a day or two.
    Elliot and I are working offline to refine the proposal and
    figure out the impacts on the main CR spec - thanks!

    -----Original Message-----
    From: Hastings, Tom N [mailto:hastings@cp10.es.xerox.com]
    Sent: Monday, May 19, 2003 3:29 PM
    To: McDonald, Ira
    Cc: 'cr@pwg.org'; 'ipp@pwg.org'
    Subject: RE: CR> Draft of IPP "repertoire-supported" Printer attribute

    Minor comment:

    Why allow the "_" in the rep-char (repertoire character names), since space
    characters are to be mapped to "-", not "_"? The advantage of not allowing
    "_" is that a future field could be added using "_" as a field separator,
    but not if "_" could be in rep-char.

    rep-char = rep-alpha / rep-digit /  ; alphanumeric or
               "-" / "." / "_"          ; limited punctuation chars

    Tom

    -----Original Message-----
    From: McDonald, Ira [mailto:imcdonald@sharplabs.com]
    Sent: Sunday, May 18, 2003 10:31
    To: 'cr@pwg.org'; 'ipp@pwg.org'
    Subject: CR> Draft of IPP "repertoire-supported" Printer attribute

    [With apologies for cross-posting to IPP and Character Repertoires
    mailing lists.]

    Background - the IEEE/ISTO PWG Character Repertoires standard (a
    standard for NAMES, not a standard requiring support of particular
    character repertoires) is nearly complete and is expected to be in
    PWG 'last call' during the June 2003 face-to-face PWG meeting.

    The latest CR working draft is at:

    ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/wd-pcr10-20030317.html

    ------------------------------------------------------------------------
    Hi folks, Sunday (18 May 2003)

    Below is a draft version of the IPP "repertoire-supported" attribute,
    for inclusion in Appendix B 'Bindings to IPP' of the next working draft
    of the PWG Character Repertoires standard.

    First, some background. When I started to write up this attribute, I
    realized that our (currently proposed) syntax for CR labels now uses
    characters that are not allowed in the IPP "keyword" datatype.

    We _could_ add a new datatype (similar to "charset") to IPP called
    "repertoire". Tom Hastings has convinced me that a new "repertoire"
    datatype is a _very_ bad idea. Most importantly, it would break all
    existing IPP parsers.

    Instead, Tom and I agree that we should alter our CR labels to achieve
    strict conformance to the IPP "keyword" syntax. Then IANA can register
    our small set of well-known CR/1.0 labels in the IANA IPP registry,
    along with the new IPP "repertoire-supported" attribute itself.

    Cheers,
    - Ira McDonald
      High North Inc

    ISSUE: The Unihan names (based on the source legacy CJK charset) are
    _not_ disjoint (i.e., they DO overlap). Should we abandon their use in
    favor of IANA Charset Registry names? What value do these Unihan names
    add? (hint - read the attribute description below before commenting)

    Describing the Unicode HAN character assignments based on Unicode code
    chart titles (from http://www.unicode.org/charts/) _does_ provide unique
    non-overlapping labels (e.g., 'unicode_cjk-radicals-supplement' which is
    the title for the Unicode character block starting at 'U+2E80').

    ------------------------------------------------------------------------

    repertoire-supported (1setOf (keyword | name))

    This REQUIRED IPP Printer Description attribute identifies some or all
    of the character repertoires that the IPP Printer object and contained
    IPP Job objects support for rendering of document data content. At
    least the value 'unicode_basic-latin' MUST always be present, since
    conforming IPP Printers MUST support at least the character repertoire
    defined in the Unicode/4.0 'Basic Latin' code chart (and character
    block).

    A character repertoire is defined as a named subset of the characters
    defined in a given character set standard (e.g., Unicode/4.0) that are
    supported for output rendering of document data. The character set of
    the document data (e.g., the value of "document-charset" in the IPP
    Document object) constrains the relevant character repertoires (e.g.,
    since ISO 8859-1 does not assign a codepoint to GREEK TONOS U+0384, that
    character _cannot_ be represented in the ISO 8859-1 character set).
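
    The GREEK TONOS example above can be checked with a short Python
    sketch. It assumes Python's built-in codecs ("iso8859_1",
    "iso8859_7") are faithful stand-ins for the charsets named here; the
    helper name representable() is hypothetical and not part of the draft.

```python
# Illustrative only: uses Python's built-in codecs, not anything defined by IPP.
def representable(char: str, charset: str) -> bool:
    """Return True if `char` can be encoded in `charset`."""
    try:
        char.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


GREEK_TONOS = "\u0384"

print(representable(GREEK_TONOS, "iso8859_1"))  # False: no codepoint in ISO 8859-1
print(representable(GREEK_TONOS, "iso8859_7"))  # True: ISO 8859-7 (Greek) assigns one
```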

    Character repertoires of legacy character sets (e.g., ISO 8859-1 and
    ISO 8859-2) often overlap. However, character repertoires identified
    by the Unicode/4.0 code chart titles do _not_ overlap (i.e., they are
    disjoint). Therefore, a conforming IPP Printer SHOULD advertise
    "repertoire-supported" values based on the Unicode/4.0 code chart
    titles, to avoid ambiguity.
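
    A rough Python sketch of the overlap argument follows. It assumes
    Python's "iso8859_1" and "iso8859_2" codecs as stand-ins for the
    legacy charsets. The block ranges are the Unicode code chart ranges;
    the keyword spellings other than 'unicode_basic-latin' are
    illustrative guesses at how the titles would map, not registered
    values.

```python
def charset_repertoire(charset: str) -> set:
    """Collect every character that a single-byte charset can decode."""
    chars = set()
    for byte in range(256):
        try:
            chars.add(bytes([byte]).decode(charset))
        except UnicodeDecodeError:
            pass  # byte is unassigned in this charset
    return chars


latin1 = charset_repertoire("iso8859_1")
latin2 = charset_repertoire("iso8859_2")
shared = latin1 & latin2  # non-empty: the legacy repertoires overlap

# Unicode code chart blocks are disjoint ranges, so each character
# falls in at most one of them.  (Keyword spellings are assumptions.)
BLOCKS = {
    "unicode_basic-latin": range(0x0000, 0x0080),
    "unicode_latin-1-supplement": range(0x0080, 0x0100),
    "unicode_latin-extended-a": range(0x0100, 0x0180),
}


def blocks_of(char: str) -> list:
    """Return the labels of every listed block that contains `char`."""
    return [label for label, rng in BLOCKS.items() if ord(char) in rng]


print("\u00e9" in shared)   # True: e-acute is in both ISO 8859-1 and 8859-2
print(blocks_of("\u00e9"))  # ['unicode_latin-1-supplement']
```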

    The ABNF [RFC2234] for legal values of "repertoire-supported" is:

    repertoire = rep-prefix "_" rep-name
    rep-prefix = "unicode" /            ; from Code Chart titles
                                        ; of Unicode/4.0 char database
                 "unihan" /             ; from Code Chart titles
                                        ; of Unicode/4.0 Unihan database
                 "iana" /               ; from Name or Alias fields in
                                        ; IANA Charset Registry
                 "vendor"               ; from vendor-specific
                                        ; repertoire names
    rep-name   = rep-alpha *(rep-char)
    rep-char   = rep-alpha / rep-digit / ; alphanumeric or
                 "-" / "." / "_"        ; limited punctuation chars
    rep-alpha  = %x61-7A                ; lowercase a-z
    rep-digit  = %x30-39                ; decimal 0-9
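
    As a minimal sketch, the grammar above can be approximated with a
    regular expression. The names REPERTOIRE_RE and is_valid_repertoire()
    are hypothetical; this checks syntax only and says nothing about
    whether a value is actually registered.

```python
import re

# rep-prefix "_" rep-name, where rep-name = rep-alpha *(rep-char)
REPERTOIRE_RE = re.compile(r"(unicode|unihan|iana|vendor)_[a-z][a-z0-9._-]*")


def is_valid_repertoire(value: str) -> bool:
    """Return True if `value` matches the draft ABNF for a repertoire label."""
    return REPERTOIRE_RE.fullmatch(value) is not None


print(is_valid_repertoire("unicode_basic-latin"))  # True
print(is_valid_repertoire("Unicode_Basic-Latin"))  # False: uppercase not allowed
print(is_valid_repertoire("basic-latin"))          # False: missing rep-prefix
```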

    Mapping Rule 1: If a source standard repertoire name (e.g., a value in
    the IANA Charset Registry [IANA-Charsets]) contains any uppercase alpha
    characters, those characters MUST be mapped to the IPP 'keyword' syntax
    by converting each of them to its corresponding lowercase alpha
    character.

    Mapping Rule 2: If a source standard repertoire name (e.g., a value in
    the IANA Charset Registry [IANA-Charsets]) contains any other
    non-keyword characters, those characters MUST be mapped to the IPP
    'keyword' syntax by converting each of them (including space) to a
    hyphen "-" character.
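
    The two mapping rules might be sketched in Python as follows. The
    function name to_repertoire_keyword() is hypothetical, and the sketch
    assumes ASCII source names; it does not handle a source name whose
    first character is not a letter, which rep-name would reject.

```python
import re


def to_repertoire_keyword(prefix: str, source_name: str) -> str:
    """Map a source repertoire name to a prefixed keyword-style label."""
    lowered = source_name.lower()                   # Mapping Rule 1
    mapped = re.sub(r"[^a-z0-9._-]", "-", lowered)  # Mapping Rule 2
    return f"{prefix}_{mapped}"


print(to_repertoire_keyword("unicode", "Basic Latin"))
# unicode_basic-latin
print(to_repertoire_keyword("unicode", "CJK Radicals Supplement"))
# unicode_cjk-radicals-supplement
print(to_repertoire_keyword("iana", "ISO_8859-1:1987"))
# iana_iso_8859-1-1987
```

    Note that the "_" already present in the IANA name survives the
    mapping, so only the first "_" in a label can be treated as the
    field separator.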


