It's a hack that attempts to sanction existing non-conforming
implementations. (Come on, it's a convenient fig leaf to say that
implementors didn't know what ASCII meant -- they knew; it was just
expedient to allow the extended local character sets.) It imposes a
continuing burden of multiple code sets on applications. And the
introduction of an open-ended choice of code sets can only complicate
interoperability.

Your proposal goes on for pages and pages of dense text. And every time
you attempt to explain it to people, you end up with pages of
explanation. This should be a clue that it's not simple.

The SYNTHESIS proposal is tricky. And I only started to appreciate some
of the implications yesterday as I was putting together the Utf8String
proposal. To give just one example, all the objects affected by the
existing prtGeneralCurrentLocalization and prtConsoleLocalization are
(with one carefully documented exception) read-only, as is the
localization table (so the agent completely controls the localization).
The SYNTHESIS proposal, in contrast, affects a mix of read-only and
writable objects, and the character set selection may be writable. This
breaks new ground for the Printer MIB. What are the implications for
agent and application, and how many pages of explanation are required to
cover them?

Now I'm not asking you for an explanation of the issue above. In fact,
my point is really that an explanation isn't too useful. I didn't start
to see how the machinery fit together until I started working with it,
started trying to see the implications for an application
implementation. In effect, getting my hands dirty.

I think the issue needs this sort of hands-on consideration from others,
particularly applications implementors concerned with interoperability,
in order to build confidence that we understand the implications. The
floods of "urgent, reply by yesterday" e-mail, by contrast, quickly
start to blur into a muddle.

:: David Kellerman Northlake Software 503-228-3383
:: david_kellerman@nls.com Portland, Oregon fax 503-228-5662
------------------------------------------------------------------------
Date: Thu, 24 Jul 1997 11:34:50 PDT
To: David_Kellerman@nls.com
From: Tom Hastings <hastings@cp10.es.xerox.com>
Subject: Re: PMP> URGENT: SYNTHESIS proposal on definition of OCTET STRING to
allow superset of ASCII
CC: pmp@pwg.org

David,

If this were the fall of 1994 when the PWG finished the Printer MIB
and forwarded it to the IESG (and it got published in March 1995 as
RFC 1759), I would be in favor of your proposal to use UTF-8 only.
It is unambiguous and doesn't require a new object and covers the world.
The Printer MIB was a "new protocol" at that time. Two and a half years
later and with lots of vendors' products in the market, the Printer MIB
is no longer a "new protocol".

However, even if the Printer MIB were a "new" protocol, the Asian vendors
are split on using ISO 10646/Unicode/UTF-8 versus their long-established
national sets (JIS X0208:1990 for Japanese and GB2312:1980 for Chinese).
So if there was real Asian representation in this discussion, it is not
clear that they would favor UTF-8. (The SYNTHESIS proposal works with
these Asian national sets, because code positions 32 to 127 are US-ASCII).
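
The compatibility claim is easy to check by hand. A minimal sketch, assuming
Python's "euc_jp" and "gb2312" codecs are reasonable stand-ins for the
US-ASCII/JIS X0208 and US-ASCII/GB2312 combinations (the sample values are
invented, not taken from any real printer):

```python
# Sketch: octets in the US-ASCII range read the same under each code set.
# Assumption: "euc_jp" and "gb2312" approximate the US-ASCII/JIS X0208 and
# US-ASCII/GB2312 combinations discussed in this thread.
CANDIDATE_CODECS = ["ascii", "utf-8", "euc_jp", "gb2312"]

def decodings(octets: bytes) -> dict:
    """Decode the same octets under every candidate codec."""
    return {name: octets.decode(name) for name in CANDIDATE_CODECS}

def is_code_set_neutral(octets: bytes) -> bool:
    """True if the octets decode to identical text under all candidates."""
    try:
        return len(set(decodings(octets).values())) == 1
    except UnicodeDecodeError:
        # At least one code set cannot interpret these octets at all.
        return False

ascii_sample = b"Tray 1 (Letter)"                  # hypothetical prtInputName
kanji_sample = "\u7d19".encode("euc_jp")           # one JIS X0208 character

print(is_code_set_neutral(ascii_sample))
print(is_code_set_neutral(kanji_sample))
```

Only the second value depends on knowing which code set is in force; pure
US-ASCII strings are unaffected by the choice.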
Also, RFC 2130 does address the case of existing protocols, such as HTTP,
which use ISO 8859 (Latin1). So our MIB is NOT being required to use
UTF-8, since the Printer MIB is not a NEW protocol.

My SYNTHESIS proposal allows using UTF-8 (and encourages it as the default),
but does NOT require it. The simple scenario of how the new object
prtGeneralStaticCodeSet is used (as a read-only object) is that the vendor
ships a floppy with his printer. The System Administrator runs an install
application that allows him to select which representation for the
vendor supplied information to include and the install application puts that
information into the flash memory of the printer. The System Administrator
also decides at the same time on values for the site-settable objects, such as
prtGeneralPrinterName, prtGeneralCurrentOperator, prtGeneralServicePerson,
etc., and stores that information in the flash memory of the printer as well.
All these objects can be implemented as READ-ONLY in the MIB.
Only if there is some sort of security mechanism in place should an implementor
(or the system administrator) consider making these objects READ-WRITE.

The SYNTHESIS proposal is simple. The SA chooses one char set for all the
information, whether it comes from the vendor or is site-dependent.
Different printer implementations could support some or all of the following
character sets:

Market                 Coded Character Set
US                     US-ASCII
Western Hemisphere/    ISO 8859-1 (Latin1), HP Roman8, Code page 850
  Western Europe
World                  UTF-8, US-ASCII/JIS X0208, US-ASCII/GB2312

Also the vendor might choose to only put English on his floppy, or could
have different versions for each language on the floppy. But once in the
MIB, there is only one coded character set as selected by the System
Administrator (hopefully in some user-friendly way, such as the SA
choosing his environment, rather than choosing an actual coded character
set).

The point is that any one of the above character sets covers multiple
languages for a significant region of the world, so it is possible
for a System Administrator to choose one of them at install time of the
printer.

Applications that are "localized" are encouraged to be character set
independent. The application passes the data to the platform to display
and the platform should have the same character set as the SA set for
the printer.
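
For an application that does need to interpret the text itself, the work
amounts to one lookup and one decode per string. A rough sketch follows; the
code-set labels and the mapping table are hypothetical (this message does not
define the values of prtGeneralStaticCodeSet), and "euc_jp" and "gb2312" are
assumed stand-ins for the two Asian combinations:

```python
# Sketch: decode printer strings using the single code set the SA selected.
# Assumption: these labels are illustrative names, not values defined for
# prtGeneralStaticCodeSet by the proposal.
CODEC_FOR_CODE_SET = {
    "US-ASCII": "ascii",
    "ISO 8859-1": "latin-1",
    "HP Roman8": "hp_roman8",
    "Code page 850": "cp850",
    "UTF-8": "utf-8",
    "US-ASCII/JIS X0208": "euc_jp",
    "US-ASCII/GB2312": "gb2312",
}

def decode_printer_string(octets: bytes, code_set: str) -> str:
    """Decode an OCTET STRING value using the printer's selected code set."""
    codec = CODEC_FOR_CODE_SET.get(code_set)
    if codec is None:
        raise ValueError(f"unsupported code set: {code_set!r}")
    # Substitute U+FFFD for undecodable octets instead of failing outright.
    return octets.decode(codec, errors="replace")

# Made-up prtGeneralCurrentOperator value stored as Latin1 octets.
operator = "Jos\u00e9".encode("latin-1")

print(decode_printer_string(operator, "ISO 8859-1"))
print(decode_printer_string(operator, "UTF-8"))
```

The second call decodes the same octets under the wrong code set and yields a
replacement character, which is the mismatch that arises when the platform
and the printer disagree.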
Tom

At 16:46 07/23/97 PDT, David_Kellerman@nls.com wrote:
>If there really is a broad interest in "fixing" the localization
>problem, I would suggest an alternative to Tom's proposal -- switch from
>ASCII to UTF-8 for OCTET STRING objects where representation of
>multilingual text is appropriate.
>
>Summary of arguments in favor: no new objects, consistent with existing
>conforming implementations (ASCII is subset of UTF-8), doesn't introduce
>the complexity of multiple character sets for affected objects, doesn't
>introduce the complexity of changeable character sets for affected
>objects, seems to be consistent with direction of IETF generally and
>SNMP in particular.
>
>Problems I see are, briefly: forces implementations to deal with UTF-8,
>and it conflicts with existing implementations that allow non-ASCII
>characters in the strings. How serious these are depends, in part, on
>whether you believe other MIB work is going to force UTF-8 anyway, and
>how much weight you want to give to existing practice that deviates from
>the existing standard.
>
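
Both sides of this trade-off can be shown in a few lines. A minimal sketch
with invented sample values: conforming ASCII strings are already well-formed
UTF-8, while strings holding Latin1 octets above 127 generally are not.

```python
# Sketch: which existing string values would still be valid as Utf8String?
def valid_utf8(octets: bytes) -> bool:
    """True if the octets are well-formed UTF-8."""
    try:
        octets.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# A conforming ASCII value (hypothetical).
ascii_value = b"Front Panel"
# A non-ASCII value as an existing Latin1 implementation might store it.
latin1_value = "Bedienfeld f\u00fcr Ger\u00e4t".encode("latin-1")

# ASCII octets are unchanged under UTF-8, so conforming agents need no change.
print(valid_utf8(ascii_value))
# The Latin1 octets 0xFC and 0xE4 here are not well-formed UTF-8.
print(valid_utf8(latin1_value))
```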
>Supporting material:
> 1. See the note from Randy Presuhn that Chris forwarded to the mailing
> list. He suggests this approach, has obviously given the topic a
> lot of thought, and discusses it in some detail. He also asserts
> that the SNMPv3 effort is headed toward use of UTF-8 for all
> human-readable strings.
> 2. I read Harald Alvestrand's message differently than Tom. I think it
> says to specify the character set (a single one) and recommends
> UTF-8; not to allow multiple character sets, chosen at the
> discretion of the agent or application.
> 3. I also read RFC 2130 (The Character Set Workshop Report) differently
> than Tom. It covers a lot of ground, trying to address migration of
> existing protocols as well as new work. For new protocols in
> particular, it says in part:
>
>      New protocols do not suffer from the need to be compatible with
>      old 7-bit pipes. New protocol specifications SHOULD use ISO
>      10646 as the base charset unless there is an overriding need to
>      use a different base character set.
>
>Here are the details of the changes to the document:
>
> 1. Copy the Utf8String TC from the sysAppl draft:
>
>     Utf8String ::= TEXTUAL-CONVENTION
>         DISPLAY-HINT "255a"
>         STATUS       current
>         DESCRIPTION
>             "To facilitate internationalization, this TC
>             represents information taken from the ISO/IEC IS
>             10646-1 character set, encoded as an octet string
>             using the UTF-8 character encoding scheme described
>             in RFC 2044 [**]. For strings in 7-bit US-ASCII,
>             there is no impact since the UTF-8 representation
>             is identical to the US-ASCII encoding."
>         SYNTAX       OCTET STRING (SIZE (0..255))
>
> Stylistically, you might want to introduce a ShortUtf8String with
> SIZE (0..63) -- it would simplify many of the SYNTAX clauses (see
> below).
>
> 2. Change the SYNTAX for the following objects from OCTET STRING:
>
>     prtGeneralCurrentOperator   Utf8String (SIZE(0..127))
>     prtGeneralServicePerson     Utf8String (SIZE(0..127))
>     prtGeneralSerialNumber      Utf8String
>     prtGeneralPrinterName       Utf8String
>
>     prtInputMediaName           Utf8String (SIZE(0..63))
>     prtInputName                Utf8String (SIZE(0..63))
>     prtInputVendorName          Utf8String (SIZE(0..63))
>     prtInputModel               Utf8String (SIZE(0..63))
>     prtInputVersion             Utf8String (SIZE(0..63))
>     prtInputSerialNumber        Utf8String (SIZE(0..32))
>
>     prtInputMediaType           Utf8String (SIZE(0..63))
>     prtInputMediaColor          Utf8String (SIZE(0..63))
>
>     prtOutputName               Utf8String (SIZE(0..63))
>     prtOutputVendorName         Utf8String (SIZE(0..63))
>     prtOutputModel              Utf8String (SIZE(0..63))
>     prtOutputVersion            Utf8String (SIZE(0..63))
>     prtOutputSerialNumber       Utf8String (SIZE(0..63))
>
>     prtMarkerColorantValue      Utf8String
>
>     prtChannelProtocolVersion   Utf8String (SIZE(0..63))
>
>     prtInterpreterLangLevel     Utf8String (SIZE(0..31))
>     prtInterpreterLangVersion   Utf8String (SIZE(0..31))
>     prtInterpreterVersion       Utf8String (SIZE(0..31))
>
> 3. Add the reference to RFC 2044 to the bibliography:
>
> [**] F. Yergeau, "UTF-8, a transformation format of Unicode
> and ISO 10646", RFC 2044, October 1996.
>
>That's it.
>
>:: David Kellerman Northlake Software 503-228-3383
>:: david_kellerman@nls.com Portland, Oregon fax 503-228-5662
>
>
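
One practical consequence of the SIZE limits in the quoted list: they count
octets, not characters, so a multi-byte UTF-8 value can hit the limit sooner
than its character count suggests. A minimal sketch of fitting a value to a
limit without splitting a character, assuming that silent truncation is
acceptable for the object in question (the function name is made up for
illustration):

```python
# Sketch: fit a string into an octet-limited Utf8String value.
def fit_utf8(text: str, max_octets: int) -> bytes:
    """Encode text as UTF-8, keeping only whole characters within the limit."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_octets:
        return encoded
    # Cut at the octet limit, then drop any partial trailing character.
    trimmed = encoded[:max_octets].decode("utf-8", errors="ignore")
    return trimmed.encode("utf-8")

# Forty two-octet characters do not fit in SIZE(0..63).
name = "\u00e9" * 40
value = fit_utf8(name, 63)
print(len(name), len(value))
```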