Hi,
In some places POSIX uses the "collection of characters" phrasing.
In others (especially in the revised POSIX:2000 spec) it uses the
"subset of characters defined in a larger character set..."
phrasing. I think it's important to ALSO list the classic definition (1)
definition in our spec. It makes clear where definition (2)
came from.
The ISO 10646 folks have been developing named formal ISO Profiles
(a kind of ISO derived standard) that define "character repertoires"
that are subsets of ISO 10646/Unicode (not the subsets we want, by
the way, but more generic ones like Western European coverage).
Cheers,
- Ira
-----Original Message-----
From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com]
Sent: Thursday, January 09, 2003 10:11 AM
To: McDonald, Ira
Cc: cr@pwg.org
Subject: RE: CR> CR teleconference and Implementor's Guide
Ira,
I think definition #2 covers exactly what we are trying to do. Is this
form in prior use?
-Bluetooth BPP: yes
-Unicode: I couldn't get this meaning out of the Unicode glossary
-Posix: ???
At the call yesterday there was some interest in the term "character
collection" as an alternative to "repertoire". Have you encountered this?
Group: I am going to use Ira's definitions in the next version of the
Guide.
------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Oak Technology Imaging Group
781 638-7534
From: "McDonald, Ira" <imcdonald@sharplabs.com>
Sent: 01/08/2003 05:39 PM
To: "'ElliottBradshaw@oaktech.com'" <ElliottBradshaw@oaktech.com>, Jun Fujisawa <fujisawa.jun@canon.co.jp>
cc: cr@pwg.org, owner-cr@pwg.org
Subject: RE: CR> CR teleconference and Implementor's Guide
Hi folks,
Sorry I missed the telecon earlier today. I failed to
note the earlier time (3pm EST rather than 5pm EST).
I wrote the following definition (for CUPS documentation),
drawing on POSIX.1 (ISO 9945-1) and Unicode 3.2 glossaries:
Character Repertoire:
(1) The complete set of characters defined in a given named
character set, such as ISO 8859-1.
(2) The subset of characters defined in a large character
set, such as Unicode 3.2, that are needed for an exact
mapping to a smaller character set, such as ISO 8859-1.
For PWG CR, we could refine (2) above to fix Unicode 3.2
(or later) as the "large character set".
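To make definition (2) concrete, here is a minimal Python sketch of a repertoire as a plain set of Unicode characters, independent of any encoding. The names `repertoire_of` and `in_repertoire` are hypothetical helpers for illustration, not taken from POSIX, Unicode, or any PWG draft. It assumes Python's built-in "iso-8859-1" codec, which also maps the C0/C1 control codes, so the resulting set is somewhat larger than the graphic characters alone.

```python
# Sketch only: helper names are hypothetical, not from any specification.

def repertoire_of(charset):
    """Collect the Unicode characters reachable from a single-byte charset."""
    chars = set()
    for byte in range(256):
        try:
            chars.add(bytes([byte]).decode(charset))
        except UnicodeDecodeError:
            # This byte value is unassigned in the charset; skip it.
            pass
    return frozenset(chars)


def in_repertoire(text, repertoire):
    """Return True if every character of `text` belongs to the repertoire."""
    return all(ch in repertoire for ch in text)


# The subset of Unicode needed for an exact mapping to ISO 8859-1.
LATIN1 = repertoire_of("iso-8859-1")

print(in_repertoire("café", LATIN1))  # all four characters are in ISO 8859-1
print(in_repertoire("€5", LATIN1))    # the euro sign is not in ISO 8859-1
```

The membership test works on characters rather than bytes, which is the "without regard to encoding" property we have been asking for.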
Cheers,
- Ira McDonald.
-----Original Message-----
From: ElliottBradshaw@oaktech.com [mailto:ElliottBradshaw@oaktech.com]
Sent: Wednesday, January 08, 2003 10:41 AM
To: Jun Fujisawa
Cc: cr@pwg.org; owner-cr@pwg.org
Subject: Re: CR> CR teleconference and Implementor's Guide
Hello Fujisawa-san,
Thanks for the useful information. I think we can get a lot of what we
need from the Japanese Profile document.
I am not entirely satisfied by the term "repertoire", and would like to
have some discussion in the group. We are looking for a term that means
"named subset of Unicode characters, without regard to encoding."
Bluetooth uses "repertoire" in this way. Some other ideas:
-character complement
-Unicode Subset
-CCSS (Coded Character SubSet)
I'd like proposals for the term, as well as how we will actually define it.
With regard to Shift-JIS, I now understand that there is no universal
mapping from it to Unicode. And, many Japanese web pages still use
Shift-JIS. So, we may want to recommend that a Japanese-capable printer
support Shift-JIS as well as UTF-8, and that a Japanese-capable client use
Shift-JIS if it is available. Otherwise the client must map to Unicode,
and deal with the ambiguities of the different available mappings. I
wonder how strongly we should follow Microsoft's lead in this area...
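As an illustration of the mapping ambiguity, the following Python sketch decodes the same Shift-JIS byte sequences with two codecs that ship with Python: "shift_jis" and "cp932" (Microsoft's variant). The sample byte list and the `compare_mappings` helper are assumptions chosen for the example, and the result reflects only how Python's codecs behave, not a normative statement about either mapping.

```python
# Sketch: compare two Shift-JIS-family codecs bundled with Python.
# SAMPLE_BYTES and compare_mappings are illustrative, not from any spec.

SAMPLE_BYTES = [b"\x81\x60", b"\x81\x61", b"\x81\x7c", b"\x82\xa0"]


def compare_mappings(samples, codec_a="shift_jis", codec_b="cp932"):
    """Return (bytes, char_a, char_b) for each sample that decodes differently."""
    differences = []
    for raw in samples:
        a = raw.decode(codec_a)
        b = raw.decode(codec_b)
        if a != b:
            differences.append((raw, a, b))
    return differences


for raw, a, b in compare_mappings(SAMPLE_BYTES):
    # Print the byte sequence and the code point each codec produced.
    print(raw.hex(), f"U+{ord(a):04X}", f"U+{ord(b):04X}")
```

A client that converts such text to UTF-8 has to pick one of these mappings, and the printer cannot tell afterwards which one was used.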
------------------------------------------
Elliott Bradshaw
Director, Software Engineering
Oak Technology Imaging Group
781 638-7534
From: Jun Fujisawa <fujisawa.jun@canon.co.jp>
Sent by: owner-cr@pwg.org
Sent: 01/06/2003 05:43 AM
To: ElliottBradshaw@oaktech.com
cc: cr@pwg.org
Subject: Re: CR> CR teleconference and Implementor's Guide
Hello Elliott,
At 2:16 PM -0500 03.1.3, ElliottBradshaw@oaktech.com wrote:
>As our main topic I would like to go through the draft Implementor's Guide,
>which I have placed at:
>ftp://ftp.pwg.org/pub/pwg/Character-Repertoires/CRimplementorsGuide.htm.
I would like to point out that the terms "repertoire" and "character set" as
defined in the Terminology section do not seem to be consistent with their usage
in the W3C Character Model.
For example, the use of the term "character set" is discouraged in Section 3.6.2
of Character Model for the World Wide Web 1.0.
- Character Model for the World Wide Web 1.0
<http://www.w3.org/TR/charmod/>
>As before, my biggest challenge is finding online, normative material for
>the details of the Asian character sets (except Korean, which is covered in
>an RFC).
Unfortunately, the only normative materials for the definitions of Japanese
coded character sets (CCS) are the Japanese national standards.
- JIS X 0201
Japanese Industrial Standards Committee. 7-bit and 8-bit coded character
sets for information interchange, JIS X 0201:1997, Japanese Standards
Association, 1997.
- JIS X 0208
Japanese Industrial Standards Committee. 7-bit and 8-bit double byte coded
KANJI sets for information interchange, JIS X 0208:1997, Japanese
Standards Association, 1997.
- JIS X 0212
Japanese Industrial Standards Committee. Code of the supplementary Japanese
graphic character set for information interchange, JIS X 0212:1990,
Japanese Standards Association, 1990.
- JIS X 0221
Japanese Industrial Standards Committee. Universal Multiple-Octet Coded
Character Set (UCS) -- Part 1: Architecture and Basic
Also, I suggest consulting the following W3C Note for detailed information
on some Japanese character encoding schemes (CES) and their mappings to
Unicode.
- XML Japanese Profile
<http://www.w3.org/TR/japanese-xml/>
-- Jun Fujisawa <mailto:fujisawa.jun@canon.co.jp>