Hi,
I was in Palo Alto yesterday and managed to corner Larry Masinter on the
form-data document. He put together this version on the fly and asked us
to send back any comments on its scope and content within the next
couple of days, that is, by Friday this week. The document will initially
be published as a personal Internet-Draft from Larry, but according to
Larry it can then proceed directly from there to the standards process,
without the need to go through any WG.
Larry will send the document to the IETF before the deadline for Memphis.
Regards,
Carl-Uno
--
Internet Draft Larry Masinter
March 18, 1997
Expires in 6 months
multipart/form-data: a format
for returning the values obtained
from filling out a form
Status of this Memo
Internet draft boilerplate
This memo defines an Experimental Protocol for the Internet
community. This memo does not specify an Internet standard of any
kind. Discussion and suggestions for improvement are requested.
Distribution of this memo is unlimited.
1. Abstract
This specification defines an Internet Media Type, multipart/form-data,
which can be used by a wide variety of applications and transported
by a wide variety of protocols as a way of returning a set of values
as the result of a user filling out a form. Typical applications
include form values generated by HTML forms and submitted by
HTTP post or by electronic mail, but the format is independent
of those contexts. This data type is unchanged from its original
description as part of RFC 1867.
2. Use of multipart/form-data
The definition of multipart/form-data is given in section 3. A
boundary is selected that does not occur in any of the data. (This
selection is sometimes done probabilistically.) Each field of the form
is sent, in the order in which it occurs in the form, as a part of
the multipart stream. Each part identifies the INPUT name within the
original form. Each part should be labelled with an appropriate
content-type if the media type is known (e.g., inferred from the file
extension or operating system typing information), or otherwise as
application/octet-stream.
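As a rough illustration of these steps, the Python sketch below picks a random boundary, redraws it if it happens to occur in any value, and emits one part per field in form order. The function names and the "form-boundary-" prefix are hypothetical. Field names are assumed to be plain US-ASCII without embedded quotes, and file parts are simply passed in with an explicit content type.

```python
import secrets
from typing import List, Optional, Tuple

CRLF = b"\r\n"  # MIME line separator

def choose_boundary(payloads: List[bytes]) -> str:
    """Draw random boundaries until one occurs in none of the payloads."""
    while True:
        boundary = "form-boundary-" + secrets.token_hex(16)
        if not any(boundary.encode("ascii") in p for p in payloads):
            return boundary

def encode_form_data(
    fields: List[Tuple[str, bytes, Optional[str]]]
) -> Tuple[str, bytes]:
    """Encode (name, value, content_type) triples, in form order.

    A content_type of None omits the Content-Type header, so the part
    falls back to the MIME default of text/plain.
    Returns (Content-Type header value, body bytes).
    """
    boundary = choose_boundary([value for _, value, _ in fields])
    delim = b"--" + boundary.encode("ascii")
    out = bytearray()
    for name, value, ctype in fields:
        out += delim + CRLF
        out += f'Content-Disposition: form-data; name="{name}"'.encode("ascii") + CRLF
        if ctype is not None:
            out += f"Content-Type: {ctype}".encode("ascii") + CRLF
        out += CRLF + value + CRLF  # blank line, then the raw value
    out += delim + b"--" + CRLF  # close-delimiter ends the multipart body
    return f"multipart/form-data; boundary={boundary}", bytes(out)
```

For example, encode_form_data([("user", b"carl", None), ("upload", data, "application/octet-stream")]) yields the header value to send as Content-Type together with the body. Checking for the boundary anywhere in a value is stricter than necessary, but it keeps the sketch simple.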
If multiple files are selected, they should be transferred together
using the multipart/mixed format.
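A minimal sketch of one form-data part that carries several files as an embedded multipart/mixed, assuming ASCII file names. The helper name is hypothetical; the 'content-disposition: file' header on each subpart follows the wording of this draft. The caller is assumed to frame the result with the outer multipart/form-data delimiters and to pick an outer boundary that does not occur in it.

```python
import secrets
from typing import List, Tuple

CRLF = b"\r\n"

def mixed_files_part(field_name: str, files: List[Tuple[str, bytes, str]]) -> bytes:
    """Return headers and body of one form-data part holding several files.

    files holds (filename, data, content_type) triples.
    """
    inner = "mixed-" + secrets.token_hex(16)
    while any(inner.encode("ascii") in data for _, data, _ in files):
        inner = "mixed-" + secrets.token_hex(16)  # redraw on a collision
    delim = b"--" + inner.encode("ascii")
    out = bytearray()
    out += f'Content-Disposition: form-data; name="{field_name}"'.encode("ascii") + CRLF
    out += f"Content-Type: multipart/mixed; boundary={inner}".encode("ascii") + CRLF
    out += CRLF  # end of the outer part's headers
    for filename, data, ctype in files:
        out += delim + CRLF
        out += f'Content-Disposition: file; filename="{filename}"'.encode("ascii") + CRLF
        out += f"Content-Type: {ctype}".encode("ascii") + CRLF
        out += CRLF + data + CRLF
    out += delim + b"--"  # inner close-delimiter
    return bytes(out)
```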
While the HTTP protocol can transport arbitrary BINARY data, the
default for mail transport (e.g., if the ACTION is a "mailto:" URL)
is the 7BIT encoding. The value supplied for a part may need to be
encoded and the "content-transfer-encoding" header supplied if the
value does not conform to the default encoding. [See section 5 of
RFC 1521 for more details.]
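One way a client might decide whether a value needs a content-transfer-encoding for mail transport is sketched below. The 7bit test is a rough reading of RFC 1521 (US-ASCII only, no NUL, CR and LF only as CRLF pairs, lines of at most 998 octets), and base64 is chosen here only for simplicity; quoted-printable would also be valid. Function names are hypothetical.

```python
import base64
from typing import List, Tuple

def conforms_to_7bit(value: bytes) -> bool:
    """Rough 7bit test: US-ASCII, no NUL, CR/LF only as CRLF, short lines."""
    if any(b == 0 or b > 127 for b in value):
        return False
    without_crlf = value.replace(b"\r\n", b"")
    if b"\r" in without_crlf or b"\n" in without_crlf:
        return False
    return all(len(line) <= 998 for line in value.split(b"\r\n"))

def prepare_for_mail(value: bytes) -> Tuple[List[str], bytes]:
    """Return (extra part headers, possibly encoded value)."""
    if conforms_to_7bit(value):
        return [], value  # 7bit is the default; no header needed
    # encodebytes wraps base64 output at 76 characters using LF;
    # convert to the CRLF line separator MIME requires.
    encoded = base64.encodebytes(value).replace(b"\n", b"\r\n")
    return ["Content-Transfer-Encoding: base64"], encoded
```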
The original local file name may be supplied as well, either as a
'filename' parameter of the 'content-disposition: form-data' header
or, in the case of multiple files, in a 'content-disposition: file'
header of the subpart. The client application should make a best
effort to supply the file name; if the file name in the client's
operating system is not in US-ASCII, the file name might be
approximated or encoded using the method of RFC 1522. Supplying the
name is a convenience for cases where, for example, the uploaded files
might contain references to each other, e.g., a TeX file and its .sty
auxiliary style description.
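A minimal sketch of the RFC 1522 "B" encoding applied to a non-ASCII file name. The default charset of utf-8 is an assumption (the draft does not name one), and the function name is hypothetical. The sketch does not split long names, although RFC 1522 limits a single encoded-word to 75 characters.

```python
import base64

def encode_filename(name: str, charset: str = "utf-8") -> str:
    """Return name unchanged if US-ASCII, else an RFC 1522 "B" encoded-word."""
    if all(ord(ch) < 128 for ch in name):
        return name
    payload = base64.b64encode(name.encode(charset)).decode("ascii")
    # Sketch only: names longer than one 75-character encoded-word
    # would need to be split into several encoded-words.
    return f"=?{charset}?B?{payload}?="
```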
On the server end, the ACTION might point to an HTTP URL that
implements the form's action via CGI. In such a case, the CGI program
would note that the content-type is multipart/form-data and parse the
various fields (checking them for validity, writing the file data to
local files for subsequent processing, etc.).
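The parsing step such a CGI program performs could look roughly like the following Python sketch, which splits a body into (headers, content) pairs given the boundary from the Content-Type header. The function name is hypothetical, header lines are assumed to be ASCII, and folded header lines and nested multipart/mixed parts are not handled.

```python
from typing import Dict, List, Tuple

def split_parts(body: bytes, boundary: str) -> List[Tuple[Dict[str, str], bytes]]:
    """Split a multipart body into (headers, content) pairs.

    Header names are lower-cased in the returned dictionaries.
    """
    # Each delimiter is CRLF "--" boundary; prefixing CRLF lets a delimiter
    # at the very start of the body match the same pattern.
    delimiter = b"\r\n--" + boundary.encode("ascii")
    chunks = (b"\r\n" + body).split(delimiter)
    parts = []
    for chunk in chunks[1:]:  # chunks[0] is the preamble
        if chunk.startswith(b"--"):
            break  # close-delimiter; anything after it is epilogue
        _, _, chunk = chunk.partition(b"\r\n")  # drop rest of delimiter line
        if chunk.startswith(b"\r\n"):
            head, content = b"", chunk[2:]  # part with no headers
        else:
            head, sep, content = chunk.partition(b"\r\n\r\n")
            if not sep:
                raise ValueError("part is missing the blank line after its headers")
        headers = {}
        for line in head.split(b"\r\n"):
            if line:
                name, _, value = line.decode("ascii").partition(":")
                headers[name.strip().lower()] = value.strip()
        parts.append((headers, content))
    return parts
```

Because the boundary was chosen so that it does not occur in any of the data, a plain split on the delimiter is enough to separate the parts.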
3. Definition of multipart/form-data
The media-type multipart/form-data follows the rules of all multipart
MIME data streams as outlined in RFC 1521. It is intended for use in
returning the data that results from filling out a form. A form (in
HTML, although other applications may also use forms) contains a
series of fields to be supplied by the user who fills out the form.
Each field has a name, and within a given form the names are unique.
multipart/form-data contains a series of parts. Each part is expected
to contain a content-disposition header whose value is "form-data"
and whose name parameter specifies the field name within the form,
e.g., 'content-disposition: form-data; name="xxxxx"', where xxxxx is
the name of the corresponding field. Field names originally in
non-ASCII character sets may be encoded using the method outlined in
RFC 1522.
As with all multipart MIME types, each part has an optional
Content-Type, which defaults to text/plain. If the contents of a file
are returned as part of a filled-out form, the file input is
identified as application/octet-stream or, if known, as the
appropriate media type. If multiple files are to be returned as the
result of a single form entry, they can be returned as
multipart/mixed embedded within the multipart/form-data.
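On the receiving side, the rules above (a form-data disposition with a name parameter, an optional filename, and text/plain when Content-Type is absent) can be read from a part's headers as in this sketch. It assumes the headers have already been collected into a dictionary, and it borrows Python's standard email.message.Message purely as a header-parameter parser; that choice, and the function name, are not part of the draft.

```python
from email.message import Message
from email.utils import collapse_rfc2231_value
from typing import Dict, Optional, Tuple

def describe_part(headers: Dict[str, str]) -> Tuple[str, Optional[str], str]:
    """Return (field name, filename or None, media type) for one part."""
    msg = Message()  # used only as a MIME header parser
    for name, value in headers.items():
        msg[name] = value
    if msg.get_content_disposition() != "form-data":
        raise ValueError("part is not content-disposition: form-data")
    field = msg.get_param("name", header="content-disposition")
    if field is None:
        raise ValueError("form-data part has no name parameter")
    # get_content_type() falls back to text/plain when no Content-Type
    # header is present, matching the multipart default.
    return collapse_rfc2231_value(field), msg.get_filename(), msg.get_content_type()
```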
Each part may be encoded and the "content-transfer-encoding" header
supplied if the value of that part does not conform to the default
encoding.
File inputs may also identify the file name. The file name may be
described using the 'filename' parameter of the "content-disposition"
header. This is not required, but is strongly recommended wherever
the original file name is known, since the name is useful, or even
necessary, in many applications.
4. Other considerations
4.1 Compression, encryption
Some of the data in forms may be compressed or encrypted, using
other MIME mechanisms.
4.2 Transmitting long files in form-data
<discussion of how this can work>
In some situations, it might be advisable to have the server validate
various elements of the form data (user name, account, etc.) before
actually preparing to receive the data. However, after some
consideration, it seemed best to require that servers wishing to do
this implement it as a series of forms, either sending previously
validated data elements back to the client as 'hidden' fields, or
arranging the form so that the elements that need validation occur
first. This puts the onus of maintaining the state of a transaction
only on those servers that wish to build a complex application, while
allowing applications with simple input needs to be built simply.
The HTTP protocol may require a content-length for the overall
transmission. Even where it does not, HTTP clients are encouraged to
supply a content-length for the overall file input, so that a busy
server can detect whether the proposed file data is too large to be
processed reasonably and, if so, return an error code and close the
connection without waiting to process all of the incoming data. Some
current implementations of CGI require a content-length in all POST
transactions.
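A server-side sketch of that check, assuming lower-cased header names and a readable binary stream for the request body. The 10 MiB limit, the exception class, and the function names are hypothetical; sending the actual error status and closing the connection are left to the caller.

```python
import io
from typing import BinaryIO, List, Mapping

MAX_UPLOAD = 10 * 1024 * 1024  # hypothetical per-request limit (10 MiB)

class UploadTooLarge(Exception):
    """Raised so the caller can send an error status and close the connection."""

def _read_at_most(stream: BinaryIO, n: int) -> bytes:
    chunks: List[bytes] = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(min(remaining, 64 * 1024))
        if not chunk:
            break  # EOF
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

def read_upload(headers: Mapping[str, str], stream: BinaryIO, limit: int = MAX_UPLOAD) -> bytes:
    length = headers.get("content-length")
    if length is not None:
        declared = int(length)
        if declared < 0:
            raise ValueError("negative content-length")
        if declared > limit:
            # Reject before reading any of the body.
            raise UploadTooLarge(f"declared {declared} bytes, limit is {limit}")
        data = _read_at_most(stream, declared)
        if len(data) < declared:
            raise ValueError("connection closed before content-length bytes arrived")
        return data
    # No content-length: read one byte past the limit and abort if it arrives.
    data = _read_at_most(stream, limit + 1)
    if len(data) > limit:
        raise UploadTooLarge(f"body exceeds limit of {limit} bytes")
    return data

if __name__ == "__main__":
    print(read_upload({"content-length": "5"}, io.BytesIO(b"hello")))  # b'hello'
```

With a declared content-length the rejection happens before any data is read; without one, the server can only stop partway through, which corresponds to aborting the upload in the middle of the transaction.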
In any case, an HTTP server may abort a file upload in the middle of
the transaction if the file being received is too large.
4.3 Other choices for return transmission of binary data
Various people have suggested using a new MIME top-level type
"aggregate" (e.g., aggregate/mixed), or a content-transfer-encoding of
"packet", to express indeterminate-length binary data rather than
relying on the multipart-style boundaries. While we are not opposed
to doing so, this would require additional design and standardization
work to get "aggregate" accepted. On the other hand, the
'multipart' mechanisms are well established, simple to implement on
both the sending client and receiving server, and as efficient as
other methods of dealing with multiple combinations of binary data.
4.5 Transmitting form-data via mail
Some forms will allow the results to be mailed, e.g., by supplying
a "mailto" URL as the form's action. In this case, a mail-appropriate
choice of encoding must be made for the form and its data.
4.6 Remote files with third-party transfer
In some scenarios, the user operating the client software might want
to specify a URL for remote data rather than a local file. In this
case, is there a way to allow the browser to send to the server a
pointer to the external data rather than the entire contents? This
capability could be implemented, for example, by having the client
send to the server data of type "message/external-body" with
"access-type" set to, say, "uri", and the URL of the remote data in
the body of the message.
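A speculative sketch of such a part, following the suggestion above literally. "uri" is the draft's illustrative access-type and is not one of the access-types defined in RFC 1521. Placing an encapsulated Content-Type for the remote data before the URL, and the function name, are assumptions made here for illustration.

```python
def external_body_part(field_name: str, url: str,
                       remote_type: str = "application/octet-stream") -> bytes:
    """Build a form-data part that points at remote data instead of carrying it."""
    lines = [
        f'Content-Disposition: form-data; name="{field_name}"',
        # "uri" is the draft's illustrative access-type, not one defined in RFC 1521.
        "Content-Type: message/external-body; access-type=uri",
        "",
        # Encapsulated header describing the remote data (an assumption here).
        f"Content-Type: {remote_type}",
        "",
        url,  # body of the message: the URL of the remote data
    ]
    return "\r\n".join(lines).encode("ascii")
```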
4.7 CRLF used as line separator
As with all MIME transmissions, CRLF is used as the separator for
lines in a POST of the data in multipart/form-data.
4.8 Relationship to multipart/related
The MIMESGML group is proposing a new type called multipart/related.
While it contains similar features to multipart/form-data, the use
and application of form-data is different enough that form-data is
being described separately.
It might be possible at some point to encode the result of HTML forms
(including files) in a multipart/related body part; this is not
incompatible with this proposal.
4.9 Non-ASCII field names
Note that MIME headers are generally required to consist only of
7-bit data in the US-ASCII character set. Hence field names should be
encoded according to the prescriptions of RFC 1522 if they contain
characters outside of that set. In HTML 2.0 the default character
set is ISO-8859-1, but non-ASCII characters in field names should
still be encoded.
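On the server side, a field name encoded this way can be turned back into text with the decode_header and make_header functions from Python's standard email.header module, as in this minimal sketch (the wrapper name is hypothetical). Names that contain no encoded-words pass through unchanged.

```python
from email.header import decode_header, make_header

def decode_field_name(raw: str) -> str:
    """Decode any RFC 1522 encoded-words in a field name."""
    return str(make_header(decode_header(raw)))
```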
5. Security Considerations
TBD
6. Author's Addresses
Larry Masinter
Xerox Palo Alto Research Center
3333 Coyote Hill Road
Palo Alto, CA 94304
Phone: (415) 812-4365
Fax: (415) 812-4333
EMail: masinter at parc.xerox.com
Carl-Uno Manros
Principal Engineer - Advanced Printing Standards - Xerox Corporation
701 S. Aviation Blvd., El Segundo, CA, M/S: ESAE-231
Phone +1-310-333 8273, Fax +1-310-333 5514
Email: manros at cp10.es.xerox.com