IPP Mail Archive: Re: IPP> MOD - Separate 'document-format' and 'document-language'

Ira Mcdonald x10962 (imcdonal@eso.mc.xerox.com)
Tue, 30 Sep 1997 17:02:24 PDT

Hi Ned,

Thanks for the quick response. Actually I did re-read that section
of RFC 2046, and the short list (US-ASCII and ISO-8859-X) was
termed the 'Internet standard character sets'. It also says a
little later, 'this standard does NOT endorse the use of any
character set other than US-ASCII'.

I'm aware that UTF-8 is registered in the IANA character set
registry. What I couldn't find was an unambiguous statement
in RFC 2046 (or an updating RFC) that ANY IANA registered
character set MAY be specified in a 'charset' parameter of
a MIME 'media-type'. Can you point at such a statement, to
help us all out?

Cheers,
- Ira McDonald
------------------------- Ned's note --------------------------------
Date: Tue, 30 Sep 1997 09:35:50 PDT
From: Ned Freed <Ned.Freed@innosoft.com>
Subject: Re: IPP> MOD - Separate 'document-format' and 'document-language'
In-Reply-To: "Your message dated Tue, 30 Sep 1997 06:50:29 -0700 (PDT)"
<9709301350.AA14192@snorkel.eso.mc.xerox.com>
To: imcdonal@eso.mc.xerox.com
Cc: ipp@pwg.org
Message-Id: <01IO902DAVIO94GL1L@INNOSOFT.COM>
Mime-Version: 1.0
Content-Type: TEXT/PLAIN; CHARSET=US-ASCII

> After talking with Larry Masinter yesterday, I WITHDRAW my suggestion
> that IPP's 'document-format' attribute be an extended form of a MIME
> 'media-type' (used in 'Content-Type' headers), with an added 'language'
> parameter.

> Larry argues that this fosters incoherence (in IETF standard protocols)
> and forces an IPP Printer (i.e., server application) to sometimes PARSE
> 'document-format', in order to construct MIME headers for 'Content-Type'
> and 'Content-Language' (thus 'document-format' would NOT be opaque to
> the IPP server application - this is not good).

> Instead, I suggest we have two MANDATORY attributes for job operations
> (and the Job Monitoring MIB):

> 1) 'document-format'
> - value is 'media-type' (with 'charset' for 'text/*' types)
> - maps one-to-one to MIME 'Content-Type' header

> 2) 'document-language'
> - value is an RFC 1766 compliant language tag
> - maps one-to-one to MIME 'Content-Language' header
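
The one-to-one mapping proposed in the two items above can be sketched in a few lines of Python. This is only an illustration, not anything defined by IPP: the function name build_mime_headers and the sample values are hypothetical, and the sketch assumes the server simply copies each attribute value into its MIME header without parsing it.

```python
from email.message import Message


def build_mime_headers(document_format: str, document_language: str) -> Message:
    """Copy two IPP attribute values verbatim into MIME headers.

    Hypothetical helper: neither value is parsed or rewritten, so
    'document-format' stays opaque to the caller.
    """
    msg = Message()
    msg["Content-Type"] = document_format        # 'document-format'
    msg["Content-Language"] = document_language  # 'document-language'
    return msg


# Sample values chosen for illustration only.
headers = build_mime_headers("text/plain; charset=utf-8", "en-US")
print(headers["Content-Type"])
print(headers["Content-Language"])
```

Because the language travels in its own attribute, nothing has to be split out of the 'document-format' value to build the two headers.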

> There remains one apparent problem with using MIME 'media-types' (see
> RFC 2046) for IPP 'document-format' - their possible limitation (see
> RFC 2046, section 4.1.2 'Charset Parameter', page 7) to the use of ONLY
> US-ASCII (7-bit) or ISO-8859-X (8-bit) character sets.

Such a restriction only exists in your imagination, I'm afraid. You need to
reread the section you cited. In particular, you should note that the list of
charsets it specifies is an *initial* list. Many other charsets can be, and have
been, registered. Over 200 of them, as a matter of fact.

Now, there are a fair number of problems with our current charset registration
procedures, but lack of registered charsets definitely isn't one of them.

> Support for UTF-8 (RFC 2044, IANA registered character set type for ISO
> 10646 folded into a multi-octet 8-bit superset of US-ASCII) is critical
> for IPP documents.

UTF-8 is already registered and hence it is entirely legal to use it
in MIME. See:

ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
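
As a rough illustration of a 'charset' parameter other than US-ASCII on a media type, here is a small Python sketch. The helper names charset_of and is_known_locally are hypothetical. Note the assumption: Python's codec table is used only as a local stand-in, and it is NOT the IANA registry, so a failed lookup says nothing about whether a charset is registered.

```python
import codecs
from email.message import Message


def charset_of(content_type: str):
    """Return the lowercased 'charset' parameter of a media type, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def is_known_locally(charset) -> bool:
    """Check Python's own codec table (an assumption, not the IANA registry)."""
    if charset is None:
        return False
    try:
        codecs.lookup(charset)
        return True
    except LookupError:
        return False


for value in ("text/plain; charset=UTF-8", "TEXT/PLAIN; CHARSET=US-ASCII"):
    name = charset_of(value)
    print(value, "->", name, is_known_locally(name))
```

A real implementation would consult the registry file cited above rather than whatever codecs happen to be installed.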

> Support for ALL of the IANA registered character set
> types is highly desirable (and coherent with the revised ABNF for MIME
> parameter VALUES specified in RFC 2184).

This issue is being addressed in the new charset registration procedures. See
draft-freed-charset-reg-03.txt (soon to be -04) for specifics.

Ned