[IPP] What natural-language registry is used by IPP?

Tue May 1 23:00:40 UTC 2012

Ira,

There appears to be a serious issue with how the IETF is tracking RFC 1766. The following page shows it as obsoleted by both RFC 3066 AND RFC 3282:

    http://tools.ietf.org/html/rfc1766

RFC 3282 has not been updated.

RFC 3066 was obsoleted by RFC 4646 and RFC 4647, and RFC 4646 was in turn obsoleted by RFC 5646.

So the RFCs currently in effect (replacing RFC 1766) are:

    RFC 3282: Content Language Headers
    RFC 4647: Matching of Language Tags
    RFC 5646: Tags for Identifying Languages

I am not going to rant about what the ISO and IETF have done to language tags. However, I do think it would be useful to add some text to IPP Everywhere describing the current state of affairs, along with some general guidelines, namely:

1. Use the rules in RFC 4647 for matching language tags.

2. Validate language tag values by checking that they contain only the supported characters: lowercase letters, digits, and "-".

3. Do not support extended ISO 639 language codes, since technically they violate RFC 5646 (which requires that the shortest ISO 639 code be used); for example, "english" is not OK but "en", "en-us", "en-gb", etc. are fine.

4. Expect Clients to make requests with language codes that do not exactly match the generated-natural-language-supported values but that map to them using the matching rules in #1.
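Guidelines #1 and #4 come down to the "Lookup" scheme in RFC 4647 section 3.4: try each requested language range against the supported values and, on a miss, progressively truncate it from the right. The Python below is a minimal sketch of that scheme, assuming the Client's request is an ordered list of ranges and that the `supported` list stands in for generated-natural-language-supported. The function name `lookup` and its `default` parameter are illustrative only, not part of any IPP or CUPS API, and the sketch simply skips the wildcard range "*" instead of implementing the full wildcard rules.

```python
def lookup(requested_ranges, supported, default=None):
    """Minimal sketch of RFC 4647 section 3.4 "Lookup".

    requested_ranges: language ranges in the Client's order of preference.
    supported: stand-in for the Printer's generated-natural-language-supported.
    Returns the first supported tag found, else `default`.
    """
    # Matching is case-insensitive; remember the Printer's own spelling.
    by_lower = {tag.lower(): tag for tag in supported}
    for language_range in requested_ranges:
        if language_range == "*":
            continue  # this sketch skips the wildcard range
        subtags = language_range.lower().split("-")
        while subtags:
            candidate = "-".join(subtags)
            if candidate in by_lower:
                return by_lower[candidate]
            subtags.pop()  # drop the rightmost subtag
            # Also drop a singleton (e.g. the "x" of "x-private") left at the end.
            if subtags and len(subtags[-1]) == 1:
                subtags.pop()
    return default


print(lookup(["de-ch", "en-us"], ["en", "fr"]))  # en
```

Lookup on its own never relates "zh-cn" to "zh-hans" (it just truncates "zh-cn" to "zh"), which is exactly the gap the substitution cases for #4 are about.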

For #4 the common issues I've dealt with in CUPS are:

- "no" (formerly used for "Norwegian Bokmål") is now "nb" for clarity; the other, less common written language in Norway is "Norwegian Nynorsk" ("nn").
- "zh-cn" is now "zh-hans", "zh-hans-cn", or "zh-hant-cn" (based on usage on Linux/UNIX)
- "zh-tw" is now "zh-hant", "zh-hans-tw", or "zh-hant-tw" (based on usage on Linux/UNIX)

The Chinese tags cause the greatest interoperability issues, and I have fielded many CUPS bugs about them. In some future release, libcups will have a "secret decoder ring" API that does matching and substitution when two different language codes mean the same thing.
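A decoder ring of this kind could start as little more than an alias table. The sketch below is hypothetical: the `ALIASES` table is built only from the cases listed above (it is not the libcups data), and `resolve_language` is an illustrative name, not a libcups API. It tries the requested tag as-is, then its known equivalents in preference order, then the bare primary language subtag.

```python
# Hypothetical alias table built from the cases listed above; not the libcups data.
# Each key maps to equivalent tags, most specific first.
ALIASES = {
    "no": ["nb"],
    "nb": ["no"],
    "zh-cn": ["zh-hans-cn", "zh-hans"],
    "zh-hans-cn": ["zh-cn", "zh-hans"],
    "zh-tw": ["zh-hant-tw", "zh-hant"],
    "zh-hant-tw": ["zh-tw", "zh-hant"],
}


def resolve_language(requested, supported):
    """Return the supported value that best stands in for `requested`, or None."""
    by_lower = {tag.lower(): tag for tag in supported}
    tag = requested.lower()
    # 1. exact match, 2. known equivalents, 3. primary language subtag alone.
    candidates = [tag] + ALIASES.get(tag, []) + [tag.split("-")[0]]
    for candidate in candidates:
        if candidate in by_lower:
            return by_lower[candidate]
    return None


print(resolve_language("zh-cn", ["en", "zh-hans", "zh-hant"]))  # zh-hans
```

The final primary-subtag fallback is deliberately crude: for "zh-tw" it would accept a plain "zh", which may well be Simplified text, so a real implementation would need a script-aware rule there.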

Thoughts?

On May 1, 2012, at 8:25 AM, Ira McDonald wrote:

> Hi,
> 
> Per the IETF RFC Index, the latest successors to RFC 1766 are:
> 
> 5646 Tags for Identifying Languages. A. Phillips, Ed., M. Davis, Ed..
>      September 2009. (Format: TXT=208592 bytes) (Obsoletes RFC4646) (Also
>      BCP0047) (Status: BEST CURRENT PRACTICE)
> 
> 4647 Matching of Language Tags. A. Phillips, M. Davis. September 2006.
>      (Format: TXT=45595 bytes) (Obsoletes RFC3066) (Also BCP0047) (Status:
>      BEST CURRENT PRACTICE)
> 
> Language tags can now include embedded tags for scripts, 
> geographic regions (not just countries), variants, and private 
> use tags.
> 
> Parsing language tags is no longer simple.  Beware that the script
> tag (if present) is embedded BEFORE the region tag (e.g., 'US').
> Naive parsers will fail badly.  See the ABNF on pages 5-6 of
> RFC 5646.  Reference libraries for parsing do exist.
> 
> Both simple language tags (e.g., 'en') and simple region tags
> (e.g., 'US') can now be EITHER 2 or 3 characters (a region tag
> is 2 letters or 3 digits, e.g., '419').  Script tags must be
> exactly 4 characters.  Variant tags can be 5-8 characters (or 4
> characters starting with a digit).  See the examples right after
> the ABNF.
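To make the ordering and length rules concrete, here is a simplified parser sketch for the core of the RFC 5646 "langtag" production (language with optional extlang, then script, region, and variants). It deliberately omits extensions, private-use tags, and grandfathered tags, and it checks syntax only, not the IANA subtag registry; `parse_langtag` is an illustrative name.

```python
import re

# Simplified subset of the RFC 5646 "langtag" ABNF: no extensions, private use,
# or grandfathered tags, and no check against the IANA subtag registry.
LANGTAG = re.compile(
    r"""
    (?P<language>[a-z]{2,3}(?:-[a-z]{3}){0,3}   # 2-3 letters + optional extlang
                |[a-z]{4}|[a-z]{5,8})           # reserved / registered lengths
    (?:-(?P<script>[a-z]{4}))?                  # script: exactly 4 letters
    (?:-(?P<region>[a-z]{2}|[0-9]{3}))?         # region: 2 letters or 3 digits
    (?P<variants>(?:-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*)
    """,
    re.IGNORECASE | re.VERBOSE | re.ASCII,  # ASCII keeps [a-z] from matching e.g. "K"
)


def parse_langtag(tag):
    """Split a tag into its parts, or return None if it is not well-formed."""
    match = LANGTAG.fullmatch(tag)
    if match is None:
        return None
    script, region = match.group("script"), match.group("region")
    return {
        "language": match.group("language").lower(),
        "script": script.lower() if script else None,
        "region": region.lower() if region else None,
        "variants": [v.lower() for v in match.group("variants").split("-") if v],
    }


# A naive "language-region" split misreads the script subtag as the region:
print("zh-hant-tw".split("-")[1])             # hant
print(parse_langtag("zh-hant-tw")["region"])  # tw
```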
> 
> Of course, the fixed 2-character language and country tags
> in the Printer MIB v2 are broken; they were already broken when
> it was published, but I failed to convince the other editors to
> let me fix it (many naive clients would have been broken).
> 
> BEWARE:  3-character language tags are in common use
> and are in fact preferred where an older 2-character tag was
> already registered (I can't explain why - never understood it).
> 
> Cheers,
> - Ira
> 
> 
> Ira McDonald (Musician / Software Architect)
> Chair - Linux Foundation Open Printing WG
> Secretary - IEEE-ISTO Printer Working Group
> Co-Chair - IEEE-ISTO PWG IPP WG
> Co-Chair - TCG Trusted Mobility Solutions WG
> Chair - TCG Embedded Systems Hardcopy SG
> IETF Designated Expert - IPP & Printer MIB
> Blue Roof Music/High North Inc
> http://sites.google.com/site/blueroofmusic
> http://sites.google.com/site/highnorthinc
> mailto:blueroofmusic at gmail.com
> Winter  579 Park Place  Saline, MI  48176  734-944-0094
> Summer  PO Box 221  Grand Marais, MI 49839  906-494-2434
> 
> 
> 
> On Mon, Apr 30, 2012 at 8:04 PM, Michael Sweet <msweet at apple.com> wrote:
> Glen,
> 
> On Apr 30, 2012, at 3:59 PM, Petrie, Glen wrote:
>> Can someone provide me with the reference for the natural-language registry used by IPP?
>
> From RFC 2911:
>
> 4.1.8 'naturalLanguage'
> 
>    The 'naturalLanguage' attribute syntax is a standard identifier for a
>    natural language and optionally a country.  The values for this
>    syntax type are defined by RFC 1766 [RFC1766].  Though RFC 1766
>    requires that the values be case-insensitive US-ASCII [ASCII], IPP
>    requires all lower case to simplify comparing by IPP clients and
>    Printer objects.  Examples include:
> 
>       'en':  for English
>       'en-us': for US English
>       'fr': for French
>       'de':  for German
> 
>    The maximum length of 'naturalLanguage' values used to represent IPP
>    attribute values is 63 octets.
> 
> which leads to RFC 3282 (the replacement), which references ISO 639, 639-2, 3166, and 15924.
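The RFC 2911 rules quoted above (all lowercase, at most 63 octets), combined with Mike's guideline #2 (lowercase letters, digits, and "-"), reduce to a short syntax check. This Python sketch assumes those are the only constraints worth enforcing at this layer, plus the 1-8 character subtag limit that RFC 1766 and RFC 5646 share; it does not consult any registry, and the function names are hypothetical.

```python
import string

MAX_NATURAL_LANGUAGE_OCTETS = 63  # RFC 2911 section 4.1.8
ALLOWED = frozenset(string.ascii_lowercase + string.digits + "-")


def is_valid_natural_language(value):
    """Syntax check for an IPP 'naturalLanguage' value (not a registry lookup)."""
    if not value or any(ch not in ALLOWED for ch in value):
        return False  # empty, uppercase, or a character outside a-z, 0-9, "-"
    # Only ASCII remains at this point, so characters == octets.
    if len(value) > MAX_NATURAL_LANGUAGE_OCTETS:
        return False
    # Every subtag must be 1-8 characters, so "--", "en-" and "-en" are rejected.
    return all(1 <= len(subtag) <= 8 for subtag in value.split("-"))


def to_natural_language(tag):
    """Lowercase a case-insensitive language tag for use as an IPP value."""
    value = tag.strip().lower()
    if not is_valid_natural_language(value):
        raise ValueError(f"not a valid naturalLanguage value: {tag!r}")
    return value


print(to_natural_language("zh-Hant-TW"))  # zh-hant-tw
```

Lowercasing on the way in keeps Clients and Printers comparing values with a plain string comparison, which is the simplification RFC 2911 was after.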
> 
> _________________________________________________________
> Michael Sweet, Senior Printing System Engineer, PWG Chair
> 
> 
> -- 
> This message has been scanned for viruses and 
> dangerous content by MailScanner, and is 
> believed to be clean.
> 
> _______________________________________________
> ipp mailing list
> ipp at pwg.org
> https://www.pwg.org/mailman/listinfo/ipp
> 
> 

_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair
