This is going to be a bit repetitive (a no-no in e-mail, I know), but
this issue seems to create a lot of confusion.
To my mind, what Tom is proposing is very different from:
1. Randy Presuhn's e-mail and the SYSAPPL MIB approach to character sets
2. Harald Alvestrand's e-mail
3. The RFC 2130 (Character Set Workshop Report) recommendations
The way I read all these sources, they essentially say to use ISO 10646
(roughly Unicode worked over by ISO, for those of you still getting your
bearings) as the base character set and UTF-8 as the character encoding
scheme (again roughly speaking, UTF-8 encodes ISO 10646 codes as
multi-byte sequences, and its seven-bit single-byte codes happen to match
ASCII).
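For anyone who wants to see the UTF-8 point concretely, here is a small
sketch using Python's built-in "utf-8" codec (Python is just an
illustration here, nothing in the MIB discussion depends on it). The
sample characters are arbitrary picks. It shows that a seven-bit ASCII
character stays a single identical byte, while other ISO 10646 characters
turn into two- or three-byte sequences.

```python
# Encode a few arbitrary sample characters with Python's UTF-8 codec
# and report how many bytes each one needs.
samples = ["A", "\u00e9", "\u20ac", "\u4e2d"]  # A, e-acute, euro sign, a CJK ideograph

for ch in samples:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X}: {encoded.hex(' ')} ({len(encoded)} byte(s))")

# A pure seven-bit ASCII string encodes to exactly the same bytes
# under UTF-8 as under ASCII.
text = "printer-1"
assert text.encode("utf-8") == text.encode("ascii")
```

The last assertion is the property that matters for existing MIB strings:
anything that is legal ASCII today is already legal UTF-8.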
Tom's approach, like the approach taken with the
prtLocalizationCharacterSet MIB object, allows multiple character sets
and encodings. You need to know the encoding to interpret the codes,
because the same code can represent different characters in different
encodings. In Tom's proposal, the encoding is determined outside the
MIB.
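To illustrate "the same code, different characters", here is a sketch
that decodes one byte, 0xA4, under three eight-bit character sets using
Python's standard codecs. The three encodings are my own arbitrary picks,
not ones named in Tom's proposal or the Printer MIB. A manager that reads
this byte out of a MIB string cannot tell which character was meant
unless it learns the encoding from somewhere else.

```python
# One byte value, interpreted under three different eight-bit
# character sets (chosen only for illustration).
raw = bytes([0xA4])

decoded = {enc: raw.decode(enc) for enc in ("latin-1", "iso8859_5", "cp437")}

for enc, ch in decoded.items():
    print(f"{enc:>10}: U+{ord(ch):04X} {ch}")
```

Under latin-1 the byte is the generic currency sign, under ISO 8859-5
it is a Cyrillic letter, and under code page 437 it is n-tilde.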
Now these two approaches are not the same, by a long shot. And it's my
understanding that in other places, proponents of opposing sides line up
with armor and broadsword to debate the issue. Being an applications
software person, I happen to prefer the UTF-8 approach. Now I'm not a
licensed character set professional, I've misplaced my broadsword, and
my armor doesn't fit anymore, so I'm feeling a little handicapped in the
debate.
So, Chris, I know you'd like to find a "satisfactory compromise" here,
but I don't see where you've got convergence of positions. (Between
your advisors and Tom's proposal, in particular.) Perhaps Tom
would like to propose that all the strings now constrained as ASCII be
allowed to contain UTF-8 codes?
:: David Kellerman Northlake Software 503-228-3383
:: david_kellerman@nls.com Portland, Oregon fax 503-228-5662