Character Repertoires Mail Archive

RE: CR> FW: GB 18030 Information Required

From: McDonald, Ira (imcdonald@sharplabs.com)
Date: Mon Mar 03 2003 - 16:38:58 EST


    Hi Elliott,

    Yes - GB18030 is a mapping to EVERY codepoint in Unicode (not just
    the assigned ones, but all 1.1 million possible Unicode codepoints).
    But it is a multi-byte, variable-length encoding: each codepoint
    takes one, two, or four bytes in GB18030 (there is no three-byte form).
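The variable-length point can be seen directly with Python's built-in `gb18030` codec. This is a minimal sketch that assumes a standard CPython build, which ships the CJK codecs. The sample characters and the helper name `gb18030_length` are illustrative only.

```python
# Sketch: how many bytes GB 18030 uses for single characters,
# using Python's standard 'gb18030' codec.
samples = {
    "A": "ASCII letter",
    "\u4e2d": "common Han character (zhong)",
    "\U0001F600": "supplementary-plane emoji",
}

def gb18030_length(ch: str) -> int:
    """Return the number of bytes GB 18030 uses to encode one character."""
    return len(ch.encode("gb18030"))

for ch, label in samples.items():
    print(f"U+{ord(ch):04X} {label}: {gb18030_length(ch)} byte(s)")

# Prints:
# U+0041 ASCII letter: 1 byte(s)
# U+4E2D common Han character (zhong): 2 byte(s)
# U+1F600 supplementary-plane emoji: 4 byte(s)
```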

    As Markus Scherer says, it is best thought of as a Chinese-market
    UTF (Unicode Transformation Format), like UTF-8, UTF-16, and UTF-32.
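The "UTF" comparison fits especially well for the supplementary planes (U+10000 to U+10FFFF). There, GB 18030 assigns its four-byte codes purely by arithmetic, much as UTF-8 does, with no lookup table involved. The BMP portion, by contrast, is largely table-driven.

The sketch below computes that four-byte form by hand. The function name `gb18030_supplementary` is hypothetical and is not taken from any library. The cross-check assumes CPython's `gb18030` codec follows the standard for this range.

```python
def gb18030_supplementary(cp: int) -> bytes:
    """Encode a supplementary-plane code point (U+10000..U+10FFFF) as GB 18030.

    The four bytes b1 b2 b3 b4 range over 0x90-0xE3, 0x30-0x39, 0x81-0xFE
    and 0x30-0x39. Read as a mixed-radix number (radices 10, 126, 10 for
    the lower three positions), they equal cp - 0x10000.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError(f"U+{cp:04X} is not a supplementary code point")
    offset = cp - 0x10000
    offset, b4 = divmod(offset, 10)
    offset, b3 = divmod(offset, 126)
    b1, b2 = divmod(offset, 10)
    return bytes([0x90 + b1, 0x30 + b2, 0x81 + b3, 0x30 + b4])


# Compare the hand-computed bytes with Python's built-in codec.
for cp in (0x10000, 0x1F600, 0x10FFFF):
    ours = gb18030_supplementary(cp)
    assert ours == chr(cp).encode("gb18030")
    print(f"U+{cp:04X} -> {ours.hex(' ').upper()}")

# Prints:
# U+10000 -> 90 30 81 30
# U+1F600 -> 94 39 FC 36
# U+10FFFF -> E3 32 9A 35
```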

    I therefore agree with you that PWG CR should view GB18030 as a
    valid 'charset' (which can be tagged) but NOT as a unique
    'repertoire' (because it is just another encoding of Unicode).
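The charset/repertoire distinction can be shown in a few lines. This is a sketch relying only on Python's standard codecs, with an arbitrary sample string.

The same text encoded as UTF-8, UTF-16, UTF-32 and GB 18030 yields different bytes. Yet every encoding decodes back to the identical sequence of characters. The repertoire is therefore unchanged, and only the encoding (the thing a charset tag names) differs.

```python
text = "GB 18030 \u4e2d\u6587 \u00e9 \U0001F600"

# One repertoire (Unicode characters), several charsets (byte encodings).
# The -le variants are used so no byte-order mark is prepended.
encodings = ["utf-8", "utf-16-le", "utf-32-le", "gb18030"]
encoded = {name: text.encode(name) for name in encodings}

for name, data in encoded.items():
    print(f"{name:>10}: {len(data):3d} bytes")

# The bytes differ between charsets...
assert encoded["utf-8"] != encoded["gb18030"]
# ...but every charset round-trips to exactly the same characters.
assert all(data.decode(name) == text for name, data in encoded.items())
```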

    Cheers,
    - Ira McDonald
      High North Inc

    -----Original Message-----
    From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com]
    Sent: Monday, March 03, 2003 11:32 AM
    To: McDonald, Ira
    Cc: 'cr@pwg.org'; owner-cr@pwg.org
    Subject: Re: CR> FW: GB 18030 Information Required

    Interesting.

    If I read this correctly, then 18030 is a mapping to ALL of Unicode. This
    would make it an encoding, but not a subset.
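The "encoding, not a subset" reading can be checked against Python's standard codecs. This assumes CPython's `gb2312`, `gbk` and `gb18030` codecs track the respective standards. The helper `encodable` and the sample characters are illustrative.

The older GB 2312 and GBK charsets cover only part of Unicode and reject characters outside it. GB 18030, by contrast, encodes them all.

```python
def encodable(ch: str, charset: str) -> bool:
    """Return True if `charset` can represent the single character `ch`."""
    try:
        ch.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

# Simplified Han (zhong), traditional Han (guo), supplementary-plane emoji.
samples = ["\u4e2d", "\u570b", "\U0001F600"]
for charset in ("gb2312", "gbk", "gb18030"):
    covered = [f"U+{ord(ch):04X}" for ch in samples if encodable(ch, charset)]
    print(f"{charset:>8}: {', '.join(covered)}")
```

Each older charset in the sequence accepts more of the samples than the one before, and only GB 18030 accepts the emoji. That matches treating GB 2312 and GBK as subset repertoires, but GB 18030 as a full encoding of Unicode.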

    If that's right, then we would treat it as a kind of charset, but not as a
    repertoire.

    Your thoughts?

      E.

    ------------------------------------------
    Elliott Bradshaw
    Director, Software Engineering
    Oak Technology Imaging Group
    781 638-7534

    "McDonald, Ira" <imcdonald@sharplabs.com>
    Sent by: owner-cr@pwg.org
    03/03/2003 11:42 AM

    To: "'cr@pwg.org'" <cr@pwg.org>
    cc:
    Subject: CR> FW: GB 18030 Information Required

    Hi folks,

    Elliott, the first two white papers (linked below) look highly
    useful. Markus Scherer is a Unicode and charsets heavyweight at IBM.

    Cheers,
    - Ira McDonald
      High North Inc

    -----Original Message-----
    From: Markus Scherer [mailto:markus.scherer@jtcsv.com]
    Sent: Monday, March 03, 2003 10:26 AM
    To: vinay.aggarwal@rebus.co.in; charsets
    Subject: Re: GB 18030 Information Required

    vinay.aggarwal@rebus.co.in wrote:
    > Could you please let me know whether the following support GB18030?
    > - Any web-based application
    > - Browser-based (Internet Explorer/Netscape) application

    Yes and no. Generally, web-based applications, browsers, and related
    protocols do support GB 18030, Unicode, and various other charsets.
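In web protocols the charset travels as a label, for example the `charset` parameter of an HTTP or MIME `Content-Type` header. Below is a minimal Python sketch under these assumptions:

- The body arrives as raw bytes.
- The server labels it `GB18030`.
- The helper name `decode_body` is hypothetical.

Real browsers add sniffing and fallback rules that this sketch ignores.

```python
from email.message import Message

def decode_body(content_type: str, body: bytes, default: str = "utf-8") -> str:
    """Decode an HTTP/MIME body using the charset named in its Content-Type."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter lower-cased,
    # or `default` when the header carries no charset parameter.
    charset = msg.get_content_charset(default)
    return body.decode(charset)

body = "\u4e2d\u6587 \U0001F600".encode("gb18030")
print(decode_body("text/html; charset=GB18030", body) == "\u4e2d\u6587 \U0001F600")
# prints True
```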

    Specifically, you need to read about
    - charsets, e.g., http://oss.software.ibm.com/icu/docs/papers/codepages_and_unicode.html
    - GB 18030, e.g., http://oss.software.ibm.com/icu/docs/papers/gb18030.html
    - Unicode, e.g., http://www.unicode.org/standard/WhatIsUnicode.html

    and about the particular applications (and versions of them) that you
    intend to use.

    markus



    This archive was generated by hypermail 2b29 : Mon Mar 03 2003 - 16:39:26 EST