Makes sense.
>Perhaps this was an unfortunate choice in the ISO DPA standard.
Indeed!
>I recently read the MIME media-type registration for UTF-8 and
>it seemed somewhat ambiguous about which level of ISO 10646
>is mandatory.
The charset label (not media-type) registration for UTF-8 does not mandate
transmission of one level or another. The charset label is only an
announcement mechanism that tells a receiving application how to turn the
received bytes into characters.
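
To illustrate the point about the label being only a decoding announcement, here is a minimal sketch in Python. The function name `decode_with_label` and the sample payload are made up for illustration; the only thing assumed is that the receiver passes the announced label to its decoder.

```python
def decode_with_label(data: bytes, charset_label: str) -> str:
    """Turn received bytes into characters according to the announced label."""
    return data.decode(charset_label)


# The same bytes on the wire, interpreted under two different labels.
payload = b"Montr\xc3\xa9al"

print(decode_with_label(payload, "utf-8"))       # label matches the encoding
print(decode_with_label(payload, "iso-8859-1"))  # wrong label: different characters
```

The label changes how the bytes are read and nothing else. It says nothing about which characters the sender is allowed to use.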
As such, there is nothing to stop a given protocol, say IPP, from mandating
that all strings be encoded at level 2. The "UTF-8" label is then
perfectly adequate -- the receiving end knows how to decode -- and the
(overstated, IMHO) burden of level 3 does not come to bear on that protocol.
The rationale for not having level-specific labels is that it does not seem
to help anybody. In your case, if the transmitters are restricted to level
2 by the protocol definition, the receiving end gains nothing from a label
that says level 2 -- they already know that. In general, a receiver that
is prepared to receive level 3 will gain nothing from a level 2 label. In
the opposite case (level 2 receiver, level 3 label) the only gain is a
slightly earlier warning that the receiver may fail; even that gain may be
illusory, as a level 3 label does not guarantee that level 3 combinations
actually occur -- the level 2 receiver may well succeed if it tries.
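
A rough sketch of the "try it and see" receiver, in Python. It is an approximation, not an implementation of the ISO 10646 levels: it treats any character in a Unicode "Mark" general category as a combining character, whereas level 2 as defined by ISO permits a specific list of combining characters that is not reproduced here. The names `find_combining` and `restricted_receiver_accepts` are hypothetical.

```python
import unicodedata


def find_combining(text):
    """Return (index, code point) for every character in a Mark category.

    Assumption: Mark categories (Mn, Mc, Me) serve as a crude stand-in for
    "combining character"; the real level definitions are more selective.
    """
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if unicodedata.category(ch).startswith("M")
    ]


def restricted_receiver_accepts(data: bytes) -> bool:
    """Decode as UTF-8, then accept only if no combining characters occur."""
    text = data.decode("utf-8")
    return not find_combining(text)


# Both payloads would carry the same "UTF-8" label.
precomposed = "Montr\u00e9al".encode("utf-8")   # e-acute as one character
decomposed = "Montre\u0301al".encode("utf-8")   # e followed by a combining acute

print(restricted_receiver_accepts(precomposed))
print(restricted_receiver_accepts(decomposed))
print(find_combining("Montre\u0301al"))
```

The receiver learns whether it can handle the data from the data itself, which is why a level-specific label adds so little.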
There has been ample discussion of this (and the related versioning) issue
on various IETF, ISO and Unicode lists since early 1996, and the consensus
is that this meager gain in one case is too small to bother with, especially when
faced with a potential proliferation of labels for all version-level
combinations, and the possibility that it leads to less interoperability in
some cases.
>The IPP working group has approximately three (3) weeks left
>to finish their Model and Semantics spec and Protocol spec
>and advance them to the IESG (per a recent note from Keith
>Moore, one of our IETF Applications Area Directors), if the
>IPP/1.0 specs are to be adopted within calendar 1997. There
>is a very real urgency about completing the IPP/1.0 specs soon.
>Otherwise, it is probable that single-vendor or small
>consortium implementations of something "similar" to IPP,
>but NOT interoperable, will appear. And the "window of opportunity"
>to solve the problem of a robust replacement for RFC 1179
>will have passed.
>
>Regards,
>- Ira McDonald (outside consultant at Xerox)
>------------------------------------------------------
>Message-Id: <3.0.1.32.19971014095159.00b88790@genstar.alis.ca>
>X-Sender: yergeau@genstar.alis.ca
>X-Mailer: Windows Eudora Pro Version 3.0.1 (32)
>Date: Tue, 14 Oct 1997 06:51:59 PDT
>To: Tom Hastings <hastings@cp10.es.xerox.com>
>From: Francois Yergeau <yergeau@alis.com>
>Subject: IPP> Re: MOD - Comment on RFC 2044: need to specify the ISO 10646
> conformance level
>Cc: Harald.T.Alvestrand@uninett.no, Keith Moore <moore@cs.utk.edu>,
> ipp@pwg.org
>In-Reply-To: <3.0.1.32.19971008215215.00f613b0@garfield>
>
>At 21:52 08/10/97 PDT, Tom Hastings wrote:
>>For the purposes of Internet Protocols, we suspect that level 2 is
>>sufficient.
>>But the UTF-8 definition in RFC 2044 is silent on this matter. We suggest
>>that
>>a revision to RFC 2044 be issued that indicates that utf-8 means just
>>level 2.
>
>A restriction to level 2, which excludes most combining characters, would
>severely restrict the expressive power of ISO 10646, and in consequence the
>ability of the protocols that use it (in the UTF-8 form) to represent the
>textual content that they need to represent for truly world-wide operation.
>IMHO, such a restriction is too serious to be entertained solely on the
>basis of "we suspect that level 2 is sufficient." ISO has not found it
>sufficient, it has level 3; Unicode has *only* level 3. And I think the
>purposes of Internet protocols (at least those that carry text) are the
>same as the purposes underlying ISO 10646 and Unicode: to enable
>communication in all the world's languages.
>
>>Alternatively, register a new value, say, 'utf-8-level-2'.
>
>There is actually a revision of RFC 2044 underway. The latest draft
>(ftp://ds.internic.net/internet-drafts/draft-yergeau-utf8-rev-01.txt) has a
>discussion of version-specific labels, which you may find relevant to your
>proposal of a level-specific label. Please take a look, and come back with
>your proposal -- preferably to the ietf-charset list, as suggested by
>Harald -- if you still think it is appropriate.
>
>Regards,
>
>
>--
>François Yergeau <yergeau@alis.com>
>Alis Technologies inc., Montréal
>Tel: +1 (514) 747-2547
>Fax: +1 (514) 747-2561
>
>
>
--
François Yergeau <yergeau@alis.com>
Alis Technologies inc., Montréal
Tel: +1 (514) 747-2547
Fax: +1 (514) 747-2561