ISO 10646 has three conformance levels:
1. No composed characters.
2. Several code points representing a single character in certain Asian
languages where such composition is part of the language.
3. Composed characters using non-spacing accents. Accented letters thus
have at least two different encodings: (1) a single code point, and (2) a
sequence of two code points, one of which is a non-spacing accent. A
character with two accents is represented by a sequence of three code
points. At level 3, a recipient has to fold the alternate forms into a
single canonical form before comparing them. One advantage of level 3 is
that new accented letters can be invented without adding new code points
to the standard. A disadvantage, besides the need for folding, is that
characters are not fixed length.
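To make the level 3 folding problem concrete, here is a minimal sketch in Python. It assumes Unicode Normalization Form C (via the standard `unicodedata` module) as the canonical form; ISO 10646 itself does not prescribe this algorithm, so NFC is a swapped-in, later Unicode mechanism used only for illustration. The helper names `fold` and `same_text` are hypothetical.

```python
import unicodedata

# Two encodings of the same accented letter "e with acute":
precomposed = "\u00e9"   # a single code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"   # 'e' followed by a non-spacing COMBINING ACUTE ACCENT


def fold(s: str) -> str:
    """Hypothetical helper: fold alternate forms into one canonical form.

    Assumption: Unicode NFC stands in for the 'canonical form' a level 3
    recipient would need.
    """
    return unicodedata.normalize("NFC", s)


def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after folding both."""
    return fold(a) == fold(b)


print(precomposed == decomposed)           # False: the code points differ
print(same_text(precomposed, decomposed))  # True once both are folded

# Characters are not fixed length, in code points or in UTF-8 bytes.
print(len(precomposed), len(decomposed))   # 1 2
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 2 3

# Two accents (circumflex and dot below): three code points when decomposed,
# and the two accents may arrive in either order.
print(len(unicodedata.normalize("NFD", "\u1ec7")))  # 3
print(same_text("\u1ec7", "e\u0302\u0323"))         # True
```

The last comparison shows why folding is more than a table lookup: the recipient must also put the accents into a canonical order before the strings compare equal.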
For the purposes of Internet protocols, we suspect that level 2 is
sufficient. But the UTF-8 definition in RFC 2044 is silent on this matter.
We suggest that a revision to RFC 2044 be issued stating that "UTF-8"
means level 2 only. Alternatively, register a new charset value, say,
'utf-8-level-2'.
Tom