Character Repertories Mail Archive: RE: CR> W3C Character Mo

RE: CR> W3C Character Model and Early Uniform Normalization

From: McDonald, Ira (imcdonald@sharplabs.com)
Date: Wed Sep 24 2003 - 10:21:49 EDT

    Hi Jim,

    To reduce the implementation burden, I suggest that XHTML-Print
    state that a conforming Printer SHOULD normalize the document
    data to NFC (citing UAX-15 as the authoritative source).
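
    As an illustration only, NFC normalization as defined by UAX-15 can be
    sketched with Python's standard unicodedata module. The helper name
    to_nfc and the sample string are assumptions made for this example;
    nothing here is taken from XHTML-Print or from any printer firmware.

```python
import unicodedata


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", text)


# 'e' followed by U+0301 COMBINING ACUTE ACCENT (decomposed form).
decomposed = "Cafe\u0301"

# NFC composes the pair into the single code point U+00E9.
composed = to_nfc(decomposed)
```

    After normalization the base letter and its combining mark occupy one
    code point, so the string is one code point shorter.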

    Since W3C Charmod is still a working draft, XHTML-Print should
    NOT have a Normative reference to W3C Charmod (which would
    prevent publication of XHTML-Print as PWG Candidate Standard).

    Because normalization is a fairly costly activity on large
    volumes of data (I wrote the normalization library for the
    forthcoming CUPS 1.2 release), I suggest that the XHTML-Print
    conformance be SHOULD rather than MUST.
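
    One common way to limit that cost, sketched below under stated
    assumptions, is to test cheaply whether the data is already in NFC and
    normalize only when it is not. This is not a description of the CUPS
    library; normalize_if_needed is a hypothetical name, and
    unicodedata.is_normalized requires Python 3.8 or later.

```python
import unicodedata


def normalize_if_needed(text: str) -> str:
    """Return text in NFC, skipping the full pass when it is not needed."""
    # Pure ASCII text is always already in NFC, so return it unchanged.
    if text.isascii():
        return text
    # Quick check: avoids building a new string for data already in NFC.
    if unicodedata.is_normalized("NFC", text):
        return text
    # Fall back to full normalization only for data that needs it.
    return unicodedata.normalize("NFC", text)
```

    For example, normalize_if_needed("e\u0301") returns the single code
    point "\u00e9", while an ASCII-only string is returned untouched.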

    Cheers,
    - Ira McDonald
      High North Inc

    -----Original Message-----
    From: BIGELOW,JIM (HP-Boise,ex1) [mailto:jim.bigelow@hp.com]
    Sent: Monday, September 22, 2003 6:52 PM
    To: 'cr@pwg.org'
    Subject: RE: CR> W3C Character Model and Early Uniform Normalization

    Ira wrote:
    >
    > (2) [answering Jim]
    > No - a printer should _never_ throw away any document data
    > that happens not to be normalized ...

    I agree. However, the XHTML-Print specs [1, 2, 3] state in their Printer
    Conformance sections that a printer may "flush or otherwise reject a
    non-conforming XHTML-Print document." This is the source of my worry that a
    printer could reject a document that is not normalized.
    >
    > (3) [answering Jim]
    > No - a printer should _never_ trust the sender/generator
    > to have properly normalized Unicode data.

    If a very low cost printer assumed that an XHTML-Print document's content is
    normalized and it is not, the very worst that could happen is that word
    breaks occur in the wrong place, e.g., between a letter and its non-spacing
    mark, or class/id selectors don't match the value of the class/id attribute
    -- causing the misapplication of style sheet rules.
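
    Both failure modes can be illustrated with a small sketch. The sample
    strings and the helper safe_break are hypothetical and chosen only for
    this example; real style matching and line breaking are considerably
    more involved.

```python
import unicodedata

# The same visible word, stored in two different forms.
class_attr = "r\u00e9sum\u00e9"      # precomposed letters (already NFC)
selector = "re\u0301sume\u0301"      # 'e' + U+0301 COMBINING ACUTE ACCENT

# A code-point comparison of the raw strings fails to match.
raw_match = class_attr == selector

# Normalizing both sides to NFC first makes the comparison succeed.
nfc_match = (unicodedata.normalize("NFC", class_attr)
             == unicodedata.normalize("NFC", selector))


def safe_break(text: str, index: int) -> bool:
    """Return True if breaking before text[index] keeps marks attached.

    A nonzero canonical combining class means text[index] is a combining
    mark, so a break there would split it from its base letter.
    """
    return unicodedata.combining(text[index]) == 0
```

    Here raw_match is False and nfc_match is True, and safe_break(selector, 2)
    is False because index 2 holds the combining accent.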

    I think that a printer should normalize and therefore correctly handle
    combining characters. I'm just wondering if other printer people think such
    normalization should be mandated for all printers.

    Jim

    [1] ftp://ftp.pwg.org/pub/pwg/xhtml-print/drafts/xhtml-print-draft-095.pdf
    [2] http://www.pwg.org/xhtml-print/HTML-Version/XHTML-Print.html
    [3]


