Hi,
Inline below...
Cheers,
- Ira McDonald
High North Inc
PS - Note that the European standard CEN CWA 13873:2000 standardized
Multilingual European Subsets of ISO 10646/Unicode called MES-1,
MES-2, MES-3A, and MES-3B (see very end of this note below).
-----Original Message-----
From: Michael Sweet [mailto:mike at easysw.com]
Sent: Monday, October 21, 2002 2:19 PM
To: ElliottBradshaw at oaktech.com
Cc: pwg-announce at pwg.org
Subject: Re: PWG-ANNOUNCE> Character repertoires in printers
ElliottBradshaw at oaktech.com wrote:
> ...
> 1. Is this a problem worth solving? (vs. vendor-specific solutions)
Yes.
> 2. Should it be treated as part of XHTML-Print, UPnP, or some other
> group? (as opposed to a separate working group)
Probably as part of an existing group.
> 3. Who is interested in participating, as author or reviewer?
I'd be interested, at least in the reviewer/back-seat-driver role. :)
....
Some immediate thoughts based on my own experiences, and without
looking at the Bluetooth docos.
<ira> Bluetooth, rather cleverly, enumerated character repertoires
of Unicode (subsets) by REFERENCE to existing legacy character sets
(see the excerpt from BPP below).
</ira>
1. Aside from the Euro, all printers seem to provide the basic
Latin characters needed for English and most Western European
languages. If you do a language/country-based scheme, it should
address the presence/absence of the Euro symbol as a separate
entity. [this doesn't quite sound right to me, but in the context
of ISO Latin 1 the Euro is a major pain WRT support in printers;
do with it what you will...]
<ira> Good point about the Euro - ambiguous when you say ISO-8859-1
(Latin-1), because you might well mean that the Euro IS defined at
the canonical location of 0xA4 assigned in ISO-8859-16, rather
than the original (non-specific) CURRENCY SIGN at 0xA4 in Latin-1.
</ira>
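
A minimal sketch of that ambiguity, assuming Python's standard codecs
(the language choice and the helper name decode_a4 are illustrative
only). The same byte 0xA4 decodes to a different character depending on
which ISO-8859 part is assumed:

```python
# Decode the single byte 0xA4 under three ISO-8859 parts.
BYTE = b"\xa4"


def decode_a4(charset):
    """Return the character that byte 0xA4 maps to in the given charset."""
    return BYTE.decode(charset)


latin1 = decode_a4("latin_1")      # ISO-8859-1:  U+00A4 CURRENCY SIGN
latin9 = decode_a4("iso8859_15")   # ISO-8859-15: U+20AC EURO SIGN
latin10 = decode_a4("iso8859_16")  # ISO-8859-16: U+20AC EURO SIGN

for name, ch in (("ISO-8859-1", latin1),
                 ("ISO-8859-15", latin9),
                 ("ISO-8859-16", latin10)):
    print(f"{name}: 0xA4 -> U+{ord(ch):04X}")
```

A printer that advertises only "Latin-1" therefore gives the client no
way to tell which of these glyphs will come out at 0xA4.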
2. Providing a list of Unicode ranges may be the simplest way
of reporting what the device supports, and the client can use
this to choose embedding/exclusion/error display when the
user prints something. This needs to be a per-font resource.
<ira> Since Unicode ranges are assigned well-known names according
to the language/script repertoire, I'd rather we used those names
(and standardized that they reference exactly the ranges assigned
in perpetuity in Unicode 3.2 (current) and above). For example,
the range U+1000 to U+109F is Myanmar (the script used to write
Burmese).
</ira>
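
As a rough sketch of how a client might use named ranges, assume the
printer reports a set of block names for a font. The BLOCKS table below
is a hand-picked subset of Unicode block ranges, and the helper names
block_of and missing_characters are hypothetical, not taken from any
existing specification:

```python
# Hand-picked subset of Unicode block names and their code point ranges.
BLOCKS = {
    "Basic Latin": (0x0000, 0x007F),
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "Cyrillic": (0x0400, 0x04FF),
    "Thai": (0x0E00, 0x0E7F),
    "Myanmar": (0x1000, 0x109F),
}


def block_of(ch):
    """Return the name of the block containing ch, or None if unknown."""
    cp = ord(ch)
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None


def missing_characters(text, supported):
    """Return the sorted characters of text outside the supported blocks.

    Characters in blocks this table does not know about are treated as
    unsupported.
    """
    return sorted({ch for ch in text if block_of(ch) not in supported})


# Example: a font that reports only the two Latin blocks.
supported = {"Basic Latin", "Latin-1 Supplement"}
print(missing_characters("caf\u00e9 \u1000", supported))
```

The client can then decide, per font, whether to embed glyphs, drop the
characters, or warn the user before sending the job.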
3. In addition to or instead of #2, you could define a CSS
attribute that determines what the device does for characters
it does not have: exclude (blanks or squares), substitute (from
another font with the required characters), or error out.
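
The three behaviours could be sketched roughly as follows. This is only
an illustration: the policy names "exclude", "substitute", and "error",
the function apply_policy, and the use of U+25A1 WHITE SQUARE as the
placeholder are all assumptions, and fonts are modelled as plain sets of
characters:

```python
PLACEHOLDER = "\u25a1"  # WHITE SQUARE, assumed stand-in for a missing glyph
POLICIES = ("exclude", "substitute", "error")


class MissingGlyphError(Exception):
    """Raised under the "error" policy when a character has no glyph."""


def apply_policy(text, primary, fallback, policy):
    """Map each character to a (character, source) pair.

    primary and fallback are sets of characters each font can render.
    """
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy!r}")
    out = []
    for ch in text:
        if ch in primary:
            out.append((ch, "primary"))
        elif policy == "substitute" and ch in fallback:
            out.append((ch, "fallback"))
        elif policy == "error":
            raise MissingGlyphError(f"no glyph for U+{ord(ch):04X}")
        else:
            # "exclude", or "substitute" when the fallback lacks it too.
            out.append((PLACEHOLDER, "placeholder"))
    return out


primary = set("abc")
fallback = {"\u20ac"}
print(apply_policy("a\u20ac", primary, fallback, "substitute"))
```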
--
______________________________________________________________________
Michael Sweet, Easy Software Products mike at easysw.com
Printing Software for UNIX http://www.easysw.com
----------------------------------------------------------------------
[excerpt from draft Bluetooth Basic Printing Profile v0.95a (5 Oct 2001)]
BLUETOOTH SPECIFICATION
Basic Printing Profile Page 118 of 131
The most current version of the bit assignments for the Character
Repertoires Supported field may be found in the Host Operating
Environment Identifiers section of the Bluetooth Assigned Numbers
Document [16]. Unassigned bits will be assigned by the maintainer of
[16] according to procedures described by the Bluetooth SIG. The
general guideline is that each bit should indicate a subset of the
4-byte Unicode space of use to providers of Senders, Printers, fonts,
and Internet content, with appropriate support from national standards
groups. It is strongly recommended that new character repertoires also
be filed with IANA (see [36]).

The capability to print 7-bit US-ASCII characters is not listed as part
of the following table; however, that capability is mandatory for all
Printers supporting any part of this Profile.

Bit Number   Character Repertoire   Description
Bit0         ISO-8859-1             Latin alphabet No. 1
Bit1         ISO-8859-2             Latin alphabet No. 2
Bit2         ISO-8859-3             Latin alphabet No. 3
Bit3         ISO-8859-4             Latin alphabet No. 4
Bit4         ISO-8859-5             Latin/Cyrillic alphabet
Bit5         ISO-8859-6             Latin/Arabic alphabet
Bit6         ISO-8859-7             Latin/Greek alphabet
Bit7         ISO-8859-8             Latin/Hebrew alphabet
Bit8         ISO-8859-9             Latin alphabet No. 5
Bit9         ISO-8859-10            Latin alphabet No. 6
Bit10        ISO-8859-13            Latin alphabet No. 7
Bit11        ISO-8859-14            Latin alphabet No. 8
Bit12        ISO-8859-15            Latin alphabet No. 9
Bit13        GB_2312-80             Chinese (People's Republic of China)
Bit14        Shift_JIS              Japanese
Bit15        KS_C_5601-1987         Korean
Bit16        Big5                   Chinese (Taiwan)
Bit17        TIS-620                Thai
Bits18-127   Reserved               (These bits will be allocated by the
                                    Bluetooth SIG. The Printer should set
                                    them to zero if not yet allocated, or
                                    if relevant character repertoire is
                                    not supported.)

Table 43: Character Repertoires Supported
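
A short sketch of how a client might decode this field, assuming it is
handled as a 128-bit unsigned integer in which Bit0 is the
least-significant bit (the excerpt above does not state the bit order,
so that is an assumption, as are the function names):

```python
# Repertoire names from Table 43; the list index is the bit number.
REPERTOIRES = [
    "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4",
    "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8",
    "ISO-8859-9", "ISO-8859-10", "ISO-8859-13", "ISO-8859-14",
    "ISO-8859-15", "GB_2312-80", "Shift_JIS", "KS_C_5601-1987",
    "Big5", "TIS-620",
]
FIELD_BITS = 128


def decode_repertoires(field):
    """Return the names of the repertoires whose bits are set.

    Reserved bits (18-127) are ignored.
    """
    if field < 0 or field >> FIELD_BITS:
        raise ValueError("field must fit in 128 bits")
    return [name for bit, name in enumerate(REPERTOIRES)
            if field & (1 << bit)]


def encode_repertoires(names):
    """Build the bit field for the given repertoire names."""
    field = 0
    for name in names:
        field |= 1 << REPERTOIRES.index(name)
    return field


field = encode_repertoires(["ISO-8859-1", "Shift_JIS"])
print(hex(field), decode_repertoires(field))
```

US-ASCII has no bit of its own because, per the excerpt, support for it
is mandatory.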
-------------------------------------------------------
[excerpt from CEN CWA 13873:2000]
Annex B. List of languages covered by MES-1 (Informative)
The Multilingual European Subset No 1 is believed to cover at least
the languages listed here:
Afrikaans
Albanian
Basque
Breton
Catalan
Croatian
Czech
Danish
Dutch
English
Esperanto
Estonian
Faroese
Finnish
French
Frisian
Galician
German
Greenlandic
Hungarian
Icelandic
Irish Gaelic (new orthography)
Italian
Latvian
Lithuanian
Luxemburgish
Maltese
Manx Gaelic
Moldavian (new orthography,
with restrictions; has Ş ş Ţ ţ
though Ș ș Ț ț are preferred)
Northern Sámi
Norwegian
Occitan
Polish
Portuguese
Rhaeto-Romanic
Romanian (with restrictions;
has Ş ş Ţ ţ though Ș ș Ț ț are
preferred)
Scottish Gaelic
Slovak
Slovenian
Lower Sorbian
Upper Sorbian
Spanish
Swedish
Turkish
Welsh (with restrictions;
only Ŵ ŵ Ý ý Ŷ ŷ Ÿ and ÿ)
Annex C. List of languages covered by MES-2 (Informative)
In addition to the languages listed in annex B, the Multilingual
European Subset No. 2 is believed to cover at least the languages
listed in C.1-C.3.
C.1 Latin script
Arumanian
Asturian
Azerbaijani (new orthography)
Cornish
Friulian
Inari Sámi
Irish Gaelic (old and new
orthographies)
Istro-Romanian
Karelian
Kashubian
Ladin
Latin
Lule Sámi
Megleno-Romanian
Northern Sámi
Romani
Romanian
Skolt Sámi
Southern Sámi
Vepsian
Votic
Welsh
C.2 Greek script
Greek
C.3 Cyrillic script
Abaza
Abkhaz
Adyge
Altai
Avar
Azerbaijani (old orthography)
Balkar
Bashkir
Belarussian
Bulgarian
Buryat
Chechen
Chukchi
Chuvash
Crimean Tatar
Dargwa
Dungan
Even
Evenki
Gagauz
Hill Mari
Ingush
Kabardian
Kalmuk
Kalmyk
Karaim
Karakalpak
Kazakh
Khakas
Khanty
Komi
Komi-Permyak
Koryak
Kumyk
Kyrgyz
Lak
Lezgian
Mansi
Meadow Mari
Moksha
Moldavian (old orthography)
Nanai
Nenets
Nogai
Ossetian
Romani
Russian
Rutul
Serbian
Siberian Yupik
Slavic Macedonian
Tabasaran
Tajik
Tatar
Tati
Türkmen
Tuva
Udmurt
Uighur
Ukrainian
Uzbek
Yakut