According to 10646-1:1993, the conformance levels are a bit different from
what you say: level 1 says to avoid combining characters and the HANGUL JAMO
block, level 2 says to avoid a specific, short list of 30 combining
characters, including things like the IDEOGRAPHIC LEVEL TONE MARK (don't ask
me!) and the COMBINING CYRILLIC PALATALIZATION, while level 3 says that you
have to do all of it.
(My sneaking suspicion is that these 30 are marks that are commonly used
together with other marks on the same character, but the standard is silent
on this point; Greek, for instance, commonly uses more than one accent on
letters, and the standard doesn't say "no double diacritics". Would you believe
the (precomposed) Greek letter U+0390, GREEK SMALL LETTER IOTA WITH DIALYTIKA
AND TONOS?)
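
To make the double-diacritic point concrete, here is a small sketch that
decomposes U+0390 into its base letter plus marks. It assumes the Unicode
character database shipped with a current Python (the unicodedata module),
not the tables of 10646-1:1993 or Unicode 2.0, so treat it as an
illustration only.

```python
import unicodedata

ch = "\u0390"  # the precomposed letter mentioned above

# Canonical decomposition: base letter followed by its combining marks.
decomposed = unicodedata.normalize("NFD", ch)

# Keep only the combining marks (general category M*).
marks = [c for c in decomposed if unicodedata.category(c).startswith("M")]

for c in decomposed:
    print(f"U+{ord(c):04X} {unicodedata.name(c)}")
# U+03B9 GREEK SMALL LETTER IOTA
# U+0308 COMBINING DIAERESIS
# U+0301 COMBINING ACUTE ACCENT
```

So a single Greek letter already carries two combining marks once it is
written in decomposed form.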
The Unicode standard, version 2.0, seems to say nothing about levels.
All of these levels are requirements on the sender: Don't send certain
characters.
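
As a rough sketch of what such a sender-side check could look like for
level 1: the function name level1_violations is made up for this example,
and "combining character" is approximated by Unicode general category M*,
which is an assumption and may not match the standard's own list exactly.
The Hangul Jamo block is taken as U+1100..U+11FF.

```python
import unicodedata

# Assumption: the Hangul Jamo block is U+1100..U+11FF.
HANGUL_JAMO = range(0x1100, 0x1200)

def level1_violations(text):
    """Return the characters a level 1 sender should not have sent.

    Hypothetical helper: approximates "combining character" with
    general category M*, not the standard's own enumeration.
    """
    return [
        c for c in text
        if unicodedata.category(c).startswith("M") or ord(c) in HANGUL_JAMO
    ]
```

A precomposed letter such as U+0390 passes this check, while the same
letter sent as base plus combining marks does not.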
I, like you, believe that a 10646 usage that uses NO combining characters
is simply Not The Right Thing.
And wrt the 30 characters not allowed in level 2:
The recipient is already allowed to do "magic" whenever unrecognized
characters are encountered; telling him to include the code for ignoring
unrecognized diacritics is not a big deal, and allowing him to skip the
code for handling 30 more "unrecognized" diacritics is simply not worth it.
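
To show how little code "ignoring unrecognized diacritics" needs, here is
a minimal sketch of a recipient that drops any combining mark it does not
know how to render. The name drop_unknown_marks and the known_marks
parameter are invented for this example, and combining marks are again
approximated by general category M* from Python's unicodedata module.

```python
import unicodedata

def drop_unknown_marks(text, known_marks):
    """Drop combining marks the recipient cannot handle.

    Hypothetical helper: known_marks is the set of combining
    characters this recipient supports; everything else in
    general category M* is silently ignored.
    """
    return "".join(
        c for c in text
        if not unicodedata.category(c).startswith("M") or c in known_marks
    )

# A recipient that only knows COMBINING ACUTE ACCENT (U+0301):
supported = {"\u0301"}
cleaned = drop_unknown_marks("e\u0301\u0484", supported)
# cleaned is "e" followed by U+0301; the U+0484 mark was dropped
```

The same loop covers the 30 level 2 characters and any other mark the
recipient has never heard of, which is why special-casing them buys so
little.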
The coding of comparator functions is another, and complex, matter; it is
with good reason that I'm far from explicit about this in the policy
document. And comparison is not a tagging function, but an end-system
operation function; that's why I thought I could get away with it.
(Of course, protocols that require comparison of values and don't
address this problem are deficient - but we're not at a point where
we can make policy for those problems. See the ACAP protocol for one
example of language trying to address the issue.)
BTW, the mailing list for discussion of the character set policy is
ietf-charsets@innosoft.com; mail to ietf-charsets-request@innosoft.com
to be added.
Harald A