Ira,
There appears to be a serious issue with how the IETF is tracking RFC 1766. The following page lists it as obsoleted by both RFC 3066 AND RFC 3282:
http://tools.ietf.org/html/rfc1766
RFC 3282 has not been updated.
RFC 3066 is obsoleted by RFC 4646 and RFC 4647, and RFC 4646 is in turn obsoleted by RFC 5646.
So the currently-in-effect RFCs (replacing 1766) are:
RFC 3282: Content Language Headers
RFC 4647: Matching of Language Tags
RFC 5646: Tags for Identifying Languages
I am not going to rant about what the ISO and IETF have done to language tags. However, I do think it would be useful to add some text to IPP Everywhere to talk about the current state of affairs and some general guidelines, namely:
1. Use the rules in RFC 4647 for matching language tags.
2. Validate language tag values against the supported characters: lowercase letters, digits, and "-".
3. Do not support extended ISO 639 language codes, since technically they violate RFC 5646 (which requires that the shortest ISO 639 code be used); for example, "english" is not OK but "en", "en-us", "en-gb", etc. are fine.
4. Expect Clients to make requests with language codes that do not exactly match any generated-natural-language-supported value, and map those requests to supported values using the matching rules in #1.
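Guidelines #1 and #2 could be implemented along these lines. This is a minimal Python sketch, not CUPS or libcups code: `is_valid_ipp_language` applies the character rule from #2 plus the 63-octet 'naturalLanguage' limit from RFC 2911, and `lookup` follows the "Lookup" scheme of RFC 4647 section 3.4, which truncates a requested tag from the right until it equals a supported value. Both function names, and passing generated-natural-language-supported as a plain list, are assumptions for illustration. The character check alone cannot enforce #3, since rejecting "english" needs the IANA subtag registry.

```python
import re

# Guideline #2: only lowercase letters, digits and "-".  Each subtag is
# limited to 1-8 characters (RFC 5646 ABNF), and RFC 2911 caps IPP
# 'naturalLanguage' values at 63 octets (one octet per ASCII character).
_IPP_LANG_RE = re.compile(r"[a-z0-9]{1,8}(?:-[a-z0-9]{1,8})*")


def is_valid_ipp_language(tag: str) -> bool:
    """Syntactic check only; it does not consult the IANA subtag registry,
    so it cannot reject values such as "english" (guideline #3)."""
    return len(tag) <= 63 and _IPP_LANG_RE.fullmatch(tag) is not None


def lookup(requested: list[str], supported: list[str], default: str) -> str:
    """RFC 4647 section 3.4 "Lookup" over basic language ranges.

    requested -- the Client's language ranges, most preferred first
    supported -- e.g. the Printer's generated-natural-language-supported values
    default   -- returned when no range matches
    """
    available = {s.lower() for s in supported}
    for lang_range in requested:
        subtags = lang_range.lower().split("-")
        if subtags == ["*"]:
            continue  # Lookup ignores the bare "*" wildcard range
        while subtags:
            candidate = "-".join(subtags)
            if candidate in available:
                return candidate
            subtags.pop()  # drop the rightmost subtag, and also
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()  # drop a singleton such as "x" left at the end
    return default


if __name__ == "__main__":
    print(lookup(["zh-hant-tw", "en"], ["en", "fr", "zh-hant"], "en"))  # zh-hant
```

Because Lookup returns a single value, it fits IPP's need to pick one natural language for a response rather than a set of acceptable ones.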
For #4, the most common issues I've dealt with in CUPS are:
- "no" (formerly "Norwegian Bokmål") is now "nb" for clarity; the other, less common language in Norway is "Norwegian Nynorsk" ("nn").
- "zh-cn" is now "zh-hans", "zh-hans-cn", or "zh-hant-cn" (based on usage on Linux/UNIX)
- "zh-tw" is now "zh-hant", "zh-hans-tw", or "zh-hant-tw" (based on usage on Linux/UNIX)
The Chinese tags cause the greatest interoperability problems, and they are behind many of the bugs I have fielded in CUPS. In some future release, libcups will have a "secret decoder ring" API that does matching and substitution when two different language codes mean the same thing.
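The substitutions listed above could be handled with a small alias table consulted before normal matching. This is a minimal Python sketch of that idea, not the planned libcups API: `LEGACY_ALIASES`, `equivalents`, and `substitute` are hypothetical names, and the preference order within each alias list (script plus region first, then script only, then the other script) is an assumption, since the CUPS notes above do not rank the alternatives.

```python
from typing import Optional

# Hypothetical alias table taken from the CUPS cases above; NOT a libcups
# API.  Legacy tag -> newer spellings, in an assumed preference order.
# "nn" (Norwegian Nynorsk) is a different language, so it is not an alias.
LEGACY_ALIASES: dict[str, list[str]] = {
    "no": ["nb"],
    "zh-cn": ["zh-hans-cn", "zh-hans", "zh-hant-cn"],
    "zh-tw": ["zh-hant-tw", "zh-hant", "zh-hans-tw"],
}

# Reverse map, so a Printer that still advertises a legacy tag can serve
# a Client that sends one of the newer spellings.
MODERN_TO_LEGACY: dict[str, list[str]] = {}
for _legacy, _modern_tags in LEGACY_ALIASES.items():
    for _modern in _modern_tags:
        MODERN_TO_LEGACY.setdefault(_modern, []).append(_legacy)


def equivalents(tag: str) -> list[str]:
    """The tag itself followed by any known equivalent spellings."""
    tag = tag.lower()
    return [tag] + LEGACY_ALIASES.get(tag, []) + MODERN_TO_LEGACY.get(tag, [])


def substitute(requested: list[str], supported: list[str]) -> Optional[str]:
    """First supported value equal to a requested tag or one of its aliases.

    Returns None when nothing matches, leaving the caller free to fall back
    to the RFC 4647 matching rules (guideline #1).
    """
    available = {s.lower() for s in supported}
    for tag in requested:
        for candidate in equivalents(tag):
            if candidate in available:
                return candidate
    return None


if __name__ == "__main__":
    print(substitute(["zh-cn"], ["en", "zh-hans"]))  # zh-hans
```

Keeping the table as data rather than code makes it easy to extend as new field reports come in.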
Thoughts?
On May 1, 2012, at 8:25 AM, Ira McDonald wrote:
> Hi,
>
> Per the IETF RFC Index, the latest successors to RFC 1766 are:
>
> 5646 Tags for Identifying Languages. A. Phillips, Ed., M. Davis, Ed..
> September 2009. (Format: TXT=208592 bytes) (Obsoletes RFC4646) (Also
> BCP0047) (Status: BEST CURRENT PRACTICE)
>
> 4647 Matching of Language Tags. A. Phillips, M. Davis. September 2006.
> (Format: TXT=45595 bytes) (Obsoletes RFC3066) (Also BCP0047) (Status:
> BEST CURRENT PRACTICE)
>
> Language tags can now include embedded tags for scripts,
> geographic regions (not just countries), variants, and private
> use tags.
>
> Parsing language tags is no longer simple. Beware that they
> embedded the script tag (if present) BEFORE the region tag
> (e.g., 'US'). Naive parsers will fail badly. See the ABNF on
> pages 5-6 of RFC 5646. Reference libraries for parsing do
> exist.
>
> Both simple language tags (e.g., 'en') and simple region tags
> (e.g., 'US') can now be EITHER 2 or 3 characters. Script
> tags must be exactly 4 characters. Variant tags can be 5-8
> characters. See the examples right after the ABNF.
>
> Of course the fixed language and country tags of 2 characters
> in the Printer MIB v2 are broken - they were broken when it
> was published, but I failed to convince the other editors to let
> me fix it (many naive clients would have been broken).
>
> BEWARE: 3-character language tags are in common use
> and are in fact preferred where an older 2-character tag was
> already registered (I can't explain why - never understood it).
>
> Cheers,
> - Ira
>
> Ira McDonald (Musician / Software Architect)
> Chair - Linux Foundation Open Printing WG
> Secretary - IEEE-ISTO Printer Working Group
> Co-Chair - IEEE-ISTO PWG IPP WG
> Co-Chair - TCG Trusted Mobility Solutions WG
> Chair - TCG Embedded Systems Hardcopy SG
> IETF Designated Expert - IPP & Printer MIB
> Blue Roof Music/High North Inc
> http://sites.google.com/site/blueroofmusic
> http://sites.google.com/site/highnorthinc
> mailto:blueroofmusic at gmail.com
> Winter 579 Park Place Saline, MI 48176 734-944-0094
> Summer PO Box 221 Grand Marais, MI 49839 906-494-2434
>
> On Mon, Apr 30, 2012 at 8:04 PM, Michael Sweet <msweet at apple.com> wrote:
> Glen,
>
> On Apr 30, 2012, at 3:59 PM, Petrie, Glen wrote:
>
> Can someone provide me the reference for the natural-language registry that is used by IPP?
>
> From RFC 2911:
>
> 4.1.8 'naturalLanguage'
>
> The 'naturalLanguage' attribute syntax is a standard identifier for a
> natural language and optionally a country. The values for this
> syntax type are defined by RFC 1766 [RFC1766]. Though RFC 1766
> requires that the values be case-insensitive US-ASCII [ASCII], IPP
> requires all lower case to simplify comparing by IPP clients and
> Printer objects. Examples include:
>
> 'en': for English
> 'en-us': for US English
> 'fr': for French
> 'de': for German
>
> The maximum length of 'naturalLanguage' values used to represent IPP
> attribute values is 63 octets.
>
> which leads to RFC 3282 (the replacement), which references ISO 639, 639-2, 3166, and 15924.
>
> _________________________________________________________
> Michael Sweet, Senior Printing System Engineer, PWG Chair
>
> --
> This message has been scanned for viruses and
> dangerous content by MailScanner, and is
> believed to be clean.
>
> _______________________________________________
> ipp mailing list
> ipp at pwg.org
> https://www.pwg.org/mailman/listinfo/ipp
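Ira's warning about parsing is easy to demonstrate: in "zh-hant-tw" the second subtag is a script, not a country, so code that assumes a language-country pair misreads it. Below is a minimal Python sketch covering only the language, script, region, and variant subtags of the RFC 5646 'langtag' production, using the subtag lengths Ira lists (region codes are 2 letters or 3 UN M.49 digits). Extended language subtags, extensions, private-use, and grandfathered tags are deliberately left out, and `parse_langtag` is a hypothetical name; one of the reference parsing libraries Ira mentions would be the safer choice in production.

```python
import re
from typing import Optional

# Simplified subset of the RFC 5646 "langtag" ABNF: extlang, extension,
# private-use and grandfathered tags are intentionally not handled.
_LANGTAG_RE = re.compile(
    r"""
    (?P<language>[a-z]{2,3})                 # ISO 639 code: 2 or 3 letters
    (?:-(?P<script>[a-z]{4}))?               # ISO 15924 script: exactly 4 letters
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?      # ISO 3166 letters or UN M.49 digits
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)  # 5-8 chars, or digit + 3
    """,
    re.VERBOSE,
)


def parse_langtag(tag: str) -> Optional[dict]:
    """Split a tag into language/script/region/variants, or return None."""
    m = _LANGTAG_RE.fullmatch(tag.lower())
    if m is None:
        return None
    return {
        "language": m.group("language"),
        "script": m.group("script"),  # comes BEFORE the region when present
        "region": m.group("region"),
        "variants": [v for v in m.group("variants").split("-") if v],
    }


if __name__ == "__main__":
    tag = "zh-hant-tw"
    print(tag.split("-")[1])  # naive "country" guess: hant
    print(parse_langtag(tag))
    # {'language': 'zh', 'script': 'hant', 'region': 'tw', 'variants': []}
```

Matching the whole tag with `fullmatch` means anything outside this subset, such as "english" or "en_us", is rejected rather than partially parsed.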
_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair