If there really is broad interest in "fixing" the localization
problem, I would suggest an alternative to Tom's proposal: switch from
ASCII to UTF-8 for the OCTET STRING objects where representation of
multilingual text is appropriate.
Summary of arguments in favor: no new objects; consistent with
existing conforming implementations (ASCII is a subset of UTF-8);
doesn't introduce the complexity of multiple character sets for the
affected objects; doesn't introduce the complexity of changeable
character sets for the affected objects; seems consistent with the
direction of the IETF generally and SNMP in particular.
The problems I see, briefly: it forces implementations to deal with
UTF-8, and it conflicts with existing implementations that allow
non-ASCII characters in these strings. How serious these are depends,
in part, on whether you believe other MIB work is going to force UTF-8
anyway, and on how much weight you want to give to existing practice
that deviates from the current standard.
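To make the compatibility point concrete: an ASCII value written by a
conforming implementation is byte-for-byte valid UTF-8, whereas a
value stored as ISO 8859-1 (Latin-1) by a nonconforming implementation
will often fail to decode as UTF-8. A minimal Python sketch; the
helper name and sample strings are illustrative only:

```python
def decodes_as_utf8(octets: bytes) -> bool:
    """Return True if the octets form a well-formed UTF-8 sequence."""
    try:
        octets.decode("utf-8")  # strict error handling is the default
        return True
    except UnicodeDecodeError:
        return False


# A value written by a conforming, ASCII-only implementation.
ascii_value = "Front Desk".encode("ascii")
# The ASCII bytes are already the UTF-8 encoding of the same text.
assert ascii_value == "Front Desk".encode("utf-8")

# Hypothetical value from an implementation that stored Latin-1 text.
latin1_value = "Müller".encode("latin-1")  # b'M\xfcller'

print(decodes_as_utf8(ascii_value))   # True
print(decodes_as_utf8(latin1_value))  # False: 0xFC is not valid UTF-8
```

Some Latin-1 byte sequences happen to be valid UTF-8 by accident, so a
decode check can flag many, but not all, legacy values.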
Supporting material:
1. See the note from Randy Presuhn that Chris forwarded to the mailing
list. He suggests this approach, has obviously given the topic a
lot of thought, and discusses it in some detail. He also asserts
that the SNMPv3 effort is headed toward use of UTF-8 for all
human-readable strings.
2. I read Harald Alvestrand's message differently than Tom does. I
think it says to specify a single character set and recommends
UTF-8, rather than allowing multiple character sets chosen at the
discretion of the agent or application.
3. I also read RFC 2130 (The Character Set Workshop Report)
differently than Tom does. It covers a lot of ground, addressing
migration of existing protocols as well as new work. For new
protocols in particular, it says in part:
    New protocols do not suffer from the need to be compatible with
    old 7-bit pipes. New protocol specifications SHOULD use ISO
    10646 as the base charset unless there is an overriding need to
    use a different base character set.
Here are the details of the changes to the document:
1. Copy the Utf8String TC from the sysAppl draft:
   Utf8String ::= TEXTUAL-CONVENTION
       DISPLAY-HINT "255a"
       STATUS       current
       DESCRIPTION
               "To facilitate internationalization, this TC
               represents information taken from the ISO/IEC IS
               10646-1 character set, encoded as an octet string
               using the UTF-8 character encoding scheme described
               in RFC 2044 [**]. For strings in 7-bit US-ASCII,
               there is no impact since the UTF-8 representation
               is identical to the US-ASCII encoding."
       SYNTAX       OCTET STRING (SIZE (0..255))
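As a rough illustration of what the Utf8String TC asks of an
implementation, the following Python sketch checks a candidate value
for well-formed UTF-8 and for the SIZE bound. The function name and
the max_octets parameter are illustrative assumptions, not part of the
TC or of any agent toolkit. Note that SIZE counts octets, not
characters, so non-ASCII text reaches the limit sooner:

```python
def is_valid_utf8string(octets: bytes, max_octets: int = 255) -> bool:
    """Check a value against the Utf8String TC's constraints.

    max_octets defaults to the TC's SIZE (0..255); a refined object
    such as one with SIZE(0..63) would pass 63 instead.
    """
    if len(octets) > max_octets:  # SIZE limits octets, not characters
        return False
    try:
        octets.decode("utf-8")  # rejects malformed sequences
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8string(b"Tray 1", max_octets=63))  # True
# Each "ü" takes two octets in UTF-8, so 32 of them need 64 octets.
print(is_valid_utf8string(("ü" * 32).encode("utf-8"), max_octets=63))  # False
```

Python's decoder follows the later, stricter definition of UTF-8
(nothing beyond U+10FFFF), which is narrower than the five- and
six-octet forms RFC 2044 still permits.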
Stylistically, you might want to introduce a ShortUtf8String with
SIZE (0..63); it would simplify many of the SYNTAX clauses listed
below.
2. Change the SYNTAX for the following objects from OCTET STRING:
   prtGeneralCurrentOperator    Utf8String (SIZE(0..127))
   prtGeneralServicePerson      Utf8String (SIZE(0..127))
   prtGeneralSerialNumber       Utf8String
   prtGeneralPrinterName        Utf8String
   prtInputMediaName            Utf8String (SIZE(0..63))
   prtInputName                 Utf8String (SIZE(0..63))
   prtInputVendorName           Utf8String (SIZE(0..63))
   prtInputModel                Utf8String (SIZE(0..63))
   prtInputVersion              Utf8String (SIZE(0..63))
   prtInputSerialNumber         Utf8String (SIZE(0..32))
   prtInputMediaType            Utf8String (SIZE(0..63))
   prtInputMediaColor           Utf8String (SIZE(0..63))
   prtOutputName                Utf8String (SIZE(0..63))
   prtOutputVendorName          Utf8String (SIZE(0..63))
   prtOutputModel               Utf8String (SIZE(0..63))
   prtOutputVersion             Utf8String (SIZE(0..63))
   prtOutputSerialNumber        Utf8String (SIZE(0..63))
   prtMarkerColorantValue       Utf8String
   prtChannelProtocolVersion    Utf8String (SIZE(0..63))
   prtInterpreterLangLevel      Utf8String (SIZE(0..31))
   prtInterpreterLangVersion    Utf8String (SIZE(0..31))
   prtInterpreterVersion        Utf8String (SIZE(0..31))
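Because every SIZE bound in the list above counts octets rather than
characters, an agent that maps a longer internal string into, say,
prtInputName (SIZE(0..63)) would need to truncate without cutting a
multibyte UTF-8 sequence in half. The proposal doesn't specify any
truncation behavior; this is a minimal Python sketch of one approach,
assuming the agent holds the value as a Unicode string. The function
name is hypothetical:

```python
def truncate_utf8(text: str, max_octets: int) -> bytes:
    """Encode text as UTF-8, keeping at most max_octets octets.

    Cuts only at character boundaries, so the result stays valid UTF-8.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_octets:
        return encoded
    cut = max_octets
    # encoded[cut] is the first octet to be dropped. If it is a
    # continuation octet (bit pattern 10xxxxxx), the cut would split a
    # character, so back up to the start of that character.
    while cut > 0 and (encoded[cut] & 0xC0) == 0x80:
        cut -= 1
    return encoded[:cut]


# 40 two-octet characters (80 octets) trimmed for a SIZE(0..63) object:
print(len(truncate_utf8("ü" * 40, 63)))  # 62, i.e. 31 whole characters
```

Backing up to a character boundary means the result can be a few
octets shorter than the limit, which is the price of never emitting a
malformed value.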
3. Add the reference to RFC 2044 to the bibliography:
[**] F. Yergeau, "UTF-8, a transformation format of Unicode
and ISO 10646", RFC 2044, October 1996.
That's it.
:: David Kellerman Northlake Software 503-228-3383
::david_kellerman at nls.com Portland, Oregon fax 503-228-5662