PMP> URGENT: SYNTHESIS proposal on definition of OCTET STRING to

Wed Jul 23 19:46:51 EDT 1997

If there really is a broad interest in "fixing" the localization
problem, I would suggest an alternative to Tom's proposal -- switch from
ASCII to UTF-8 for OCTET STRING objects where representation of
multilingual text is appropriate. 

Summary of arguments in favor: no new objects, consistent with existing
conforming implementations (ASCII is a subset of UTF-8), doesn't
introduce the complexity of multiple character sets for affected
objects, doesn't introduce the complexity of changeable character sets
for affected objects, and seems to be consistent with the direction of
the IETF generally and SNMP in particular. 
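
To make the "ASCII is a subset of UTF-8" point concrete, here is a
minimal sketch in Python.  The helper name and the sample printer name
are hypothetical and chosen only for illustration; the point is that a
conforming 7-bit US-ASCII value produces the same octets under either
encoding, so an existing agent would not need to change what it sends.

```python
def same_octets_in_ascii_and_utf8(text: str) -> bool:
    """Return True if text encodes to identical octets in US-ASCII and UTF-8.

    Hypothetical helper for illustration; returns False when the text
    contains characters outside 7-bit US-ASCII.
    """
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # Not representable in US-ASCII at all.
        return False


# Hypothetical sample value for an object such as prtGeneralPrinterName.
sample_name = "Printer-3rd-Floor"
print(same_octets_in_ascii_and_utf8(sample_name))
```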

Problems I see are, briefly: forces implementations to deal with UTF-8,
and it conflicts with existing implementations that allow non-ASCII
characters in the strings.  How serious these are depends, in part, on
whether you believe other MIB work is going to force UTF-8 anyway, and
how much weight you want to give to existing practice that deviates from
the existing standard. 
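
The conflict with existing non-ASCII practice can be sketched the same
way.  This assumes, purely for illustration, an agent that has been
putting ISO 8859-1 (Latin-1) octets into these strings; the helper name
and sample value are hypothetical.  Such octets are generally not
well-formed UTF-8, so a manager that starts interpreting the value as
UTF-8 would reject or misrender it.

```python
def is_valid_utf8(octets: bytes) -> bool:
    """Return True if the octets form a well-formed UTF-8 sequence."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical value as sent by an agent that uses Latin-1 today.
latin1_value = "B\u00fcro 3".encode("latin-1")   # octets: 42 FC 72 6F 20 33
# The same text as a UTF-8 agent would send it.
utf8_value = "B\u00fcro 3".encode("utf-8")       # octets: 42 C3 BC 72 6F 20 33

print(is_valid_utf8(latin1_value))
print(is_valid_utf8(utf8_value))
```

Note that a validity check like this is only a heuristic for detecting
legacy data: some 8-bit sequences happen to be well-formed UTF-8 as
well.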

Supporting material: 
 1. See the note from Randy Presuhn that Chris forwarded to the mailing
    list.  He suggests this approach, has obviously given the topic a
    lot of thought, and discusses it in some detail.  He also asserts
    that the SNMPv3 effort is headed toward use of UTF-8 for all
    human-readable strings. 
 2. I read Harald Alvestrand's message differently than Tom.  I think it
    says to specify the character set (a single one) and recommends
    UTF-8; not to allow multiple character sets, chosen at the
    discretion of the agent or application. 
 3. I also read RFC 2130 (The Character Set Workshop Report) differently
    than Tom.  It covers a lot of ground, trying to address migration of
    existing protocols as well as new work.  For new protocols in
    particular, it says in part: 
        New protocols do not suffer from the need to be compatible with
        old 7-bit pipes.  New protocol specifications SHOULD use ISO
        10646 as the base charset unless there is an overriding need to
        use a different base character set. 

Here are the details of the changes to the document:

 1. Copy the Utf8String TC from the sysAppl draft:

    Utf8String ::= TEXTUAL-CONVENTION
         DISPLAY-HINT "255a"
         STATUS  current
         DESCRIPTION
                 "To facilitate internationalization, this TC
                  represents information taken from the ISO/IEC IS
                  10646-1 character set, encoded as an octet string
                  using the UTF-8 character encoding scheme described
                  in RFC 2044 [**].  For strings in 7-bit US-ASCII,
                  there is no impact since the UTF-8 representation
                  is identical to the US-ASCII encoding."
         SYNTAX  OCTET STRING (SIZE (0..255))

    Stylistically, you might want to introduce a ShortUtf8String with
    SIZE (0..63) -- it would simplify many of the SYNTAX clauses (see
    below). 

 2. Change the SYNTAX for the following objects from OCTET STRING:

    prtGeneralCurrentOperator   Utf8String (SIZE(0..127))
    prtGeneralServicePerson     Utf8String (SIZE(0..127))
    prtGeneralSerialNumber      Utf8String
    prtGeneralPrinterName       Utf8String

    prtInputMediaName           Utf8String (SIZE(0..63))
    prtInputName                Utf8String (SIZE(0..63))
    prtInputVendorName          Utf8String (SIZE(0..63))
    prtInputModel               Utf8String (SIZE(0..63))
    prtInputVersion             Utf8String (SIZE(0..63))
    prtInputSerialNumber        Utf8String (SIZE(0..32))

    prtInputMediaType           Utf8String (SIZE(0..63))
    prtInputMediaColor          Utf8String (SIZE(0..63))

    prtOutputName               Utf8String (SIZE(0..63))
    prtOutputVendorName         Utf8String (SIZE(0..63))
    prtOutputModel              Utf8String (SIZE(0..63))
    prtOutputVersion            Utf8String (SIZE(0..63))
    prtOutputSerialNumber       Utf8String (SIZE(0..63))

    prtMarkerColorantValue      Utf8String

    prtChannelProtocolVersion   Utf8String (SIZE(0..63))

    prtInterpreterLangLevel     Utf8String (SIZE(0..31))
    prtInterpreterLangVersion   Utf8String (SIZE(0..31))
    prtInterpreterVersion       Utf8String (SIZE(0..31))

 3. Add the reference to RFC 2044 to the bibliography: 

    [**] F. Yergeau, "UTF-8, a transformation format of Unicode
         and ISO 10646", RFC 2044, October 1996.
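
One implementation consequence of the SIZE clauses above: the limits
count octets, not characters, and a non-ASCII character takes more than
one octet in UTF-8.  An agent that has to shorten a value to fit should
therefore cut on a character boundary.  The following is a rough sketch
of one way to do that in Python; the function name is hypothetical and
this is not taken from any of the drafts.

```python
def truncate_utf8(text: str, max_octets: int) -> bytes:
    """Encode text as UTF-8, shortened to at most max_octets octets.

    Hypothetical helper: never leaves a partial multi-octet character
    at the end of the result.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_octets:
        return encoded
    # Cut at the octet limit, then drop any incomplete trailing
    # character by decoding with errors="ignore" and re-encoding.
    clipped = encoded[:max_octets].decode("utf-8", errors="ignore")
    return clipped.encode("utf-8")


# Three two-octet characters (6 octets) cut to a 5-octet limit:
# only two whole characters (4 octets) survive.
print(len(truncate_utf8("\u00e9\u00e9\u00e9", 5)))
```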

That's it. 

::  David Kellerman         Northlake Software      503-228-3383
::  david_kellerman at nls.com Portland, Oregon        fax 503-228-5662