It isn't hard to find the boundary string in binary data: look
for CRLF, followed by the boundary string, followed by CRLF. It
doesn't matter that a CRLF in the midst of binary data might
actually be part of that data; the robustness of multipart/* is
based on the fact that the boundary doesn't appear in the body,
not on the parsing of CRLFs.
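To make that concrete, here's a rough Python sketch (the
function name is mine, and it glosses over transport padding and
other details a real parser would need to handle):

    def split_multipart(body, boundary):
        """Return the parts (headers + content) of a multipart body."""
        delimiter = b"\r\n--" + boundary
        # The first delimiter need not be preceded by a CRLF when
        # there is no preamble, so prepend one before splitting.
        chunks = (b"\r\n" + body).split(delimiter)
        parts = []
        for chunk in chunks[1:]:          # chunks[0] is the preamble
            if chunk.startswith(b"--"):   # closing delimiter --boundary--
                break
            # Drop the remainder of the delimiter line; everything
            # after its CRLF is the part itself.
            _, _, part = chunk.partition(b"\r\n")
            parts.append(part)
        return parts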
> Besides, searching for the boundary marker was very expensive
> computationally; every byte had to be examined.
I think 'very' is pretty subjective, and using string-matching
algorithms like Boyer-Moore means that the number of comparisons
is reduced for longer boundary strings.
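For illustration, a Horspool-style sketch (a simplified relative
of Boyer-Moore) of why longer boundaries mean fewer bytes get
looked at; the code is mine, not from any spec:

    def horspool_find(data, pattern):
        """Return the first index of pattern in data, or -1."""
        m = len(pattern)
        # Bad-character table: how far we may safely shift given the
        # byte aligned with the last pattern position.
        shift = {b: m - i - 1 for i, b in enumerate(pattern[:-1])}
        i = 0
        while i <= len(data) - m:
            if data[i:i + m] == pattern:
                return i
            # Skip the full pattern length when that byte does not
            # occur in the pattern at all.
            i += shift.get(data[i + m - 1], m)
        return -1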
> As I understand it, the selection of a boundary string in MIME is
> already 'probabilistic'; the sender is responsible for choosing a
> string that 'probably' won't appear in the body (I do not claim to
> be an authority on MIME).
The standard says only that the boundary string DOES NOT appear
in the body. It happens that if you know nothing about the body
at all, then you can implement this in a probabilistic way, e.g.,
the likelihood that a randomly generated boundary string would
appear in arbitrary data could be made arbitrarily small ("less
than the probability that the computer would spontaneously explode").
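To put a (hypothetical) number on it, a back-of-the-envelope
sketch; the alphabet and lengths are just examples:

    import secrets
    import string

    # All of these characters are legal in a MIME boundary.
    ALPHABET = string.ascii_letters + string.digits     # 62 symbols

    def random_boundary(k=24):
        """Pick a k-character boundary at random."""
        return "".join(secrets.choice(ALPHABET) for _ in range(k))

    def collision_bound(body_len, k=24):
        """Union bound on P(a random boundary appears in the body)."""
        return body_len * len(ALPHABET) ** -k

    # For a 1 GB body and a 24-character boundary the bound is about
    # 2**30 * 62**-24, on the order of 1e-34.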
In any case, I don't think we're going to change HTTP to suddenly
require content-length on multipart bodies; there may be some
clarification needed to identify what's necessary for an interoperable
implementation.
> I was keen to keep the HTTP spec clean ...
We lost this particular battle a long time ago. I just want
to keep the HTTP spec unambiguous and functional.
I think we could do chunked multipart if we need to, but I don't think
we're going to be able to require senders to generate it, so it
may be that this whole discussion is just 'wishful thinking', or,
to put it another way, part of the requirements setting for
HTTP-NG.
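For what it's worth, a sketch of what a sender generating
'chunked multipart' might look like; just an illustration under
my own naming, not a proposal for the spec:

    def chunked(stream):
        """Wrap an iterable of byte strings in HTTP/1.1 chunked framing."""
        for piece in stream:
            if piece:
                yield b"%X\r\n" % len(piece) + piece + b"\r\n"
        yield b"0\r\n\r\n"                # last-chunk, no trailers

    def multipart(parts, boundary=b"gc0p4Jq0M2Yt08jU534c0p"):
        """Yield a multipart body one delimiter and part at a time."""
        for headers, content in parts:
            yield (b"--" + boundary + b"\r\n" +
                   headers + b"\r\n\r\n" + content + b"\r\n")
        yield b"--" + boundary + b"--\r\n"

    # wire = b"".join(
    #     chunked(multipart([(b"Content-Type: text/plain", b"hello")])))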