Bill,
My responses are in-line below.
Pete
Peter Zehler
PARC, A Xerox Company
800 Phillips Rd, 128-27E
Webster NY, 14580-9701
Email: Peter.Zehler at Xerox.com
Office: +1 (585) 265-8755
Mobile: +1 (585) 329-9508
FAX: +1 (585) 265-7441
From: William A Wagner [mailto:wamwagner at comcast.net]
Sent: Friday, September 26, 2014 5:19 PM
To: Zehler, Peter
Cc: ipp at pwg.org
Subject: RE: IPP Scan question.
Pete,
Thanks for the correction. A few items for clarification to see if I got it right. I would very much appreciate either agreement with or correction of my interpretation.
1. Hardcopy document refers just to the material that is being scanned and has no other correlation to Digital Document. For example, conceivably, a set of pages from multiple books could be scanned with the image data ultimately appearing in a single document. Alternatively, multiple pages for a single hardcopy document could be scanned and appear as multiple Digital Documents.
<PZ>Correct</PZ>
2. Image refers to the content in a specified scan region of a hardcopy document. Unless the scanner is set up to combine the data from multiple scanned images, an image cannot refer to more than the content of one sheet side. Therefore, for example, a scan job concerned with multiple pages of a hardcopy document will contain multiple images.
<PZ>Correct</PZ>
3. There is a 1:1 relationship between Digital Documents and files, in that each Digital Document is formatted and stored as a separate file.
<PZ>That is true of the IPP binding for Network Scanning. The MFD Network Scan model permitted more than one file to be associated with a Digital Document. We never discussed the packaging of a multi-file Digital Document. We could add a "run list" that referenced all the files or we could, as I did in an early prototype, collect all the image files for a Digital Document into a folder and the Document Object would reference the folder. It will be up to the SM 3 group to determine what will be done with the SM3 Network Scan specification. The SM3 model can be simplified or there could be an "IPP Production Scanning Set 1" drafted. :)</PZ>
4. In general, the relationship of images to digital documents is format (and perhaps implementation) specific. If the 'document-format-accepted' is a document format such as PDF, there may be multiple images per document. If 'document-format-accepted' is an image format such as JPEG or GIF, each image is more likely a separate digital document.
<PZ>That is correct, I think. As far as I know there is no multipage JPEG format.</PZ>
5. In IPP Scan Push mode, all Digital Documents produced in a Job are scanned and formatted in the same way and stored to the same destination(s), as specified in the CreateJob.
<PZ>Not necessarily, the client could specify multiple acceptable formats. I see no limitation preventing a smart scan service storing content that contains text in a pdf document and photos in jpeg files. The usual case would be to consistently store all the Digital Documents produced in the Job.</PZ>
a. If the job produces multiple digital documents, the destination is a directory with each Digital Document being a separate file in that directory.
<PZ>Correct, it would be an error to specify a file as a destination for a multi-paged Digital Document. The error would be something like "conflicting-attributes" or "document-access-error".</PZ>
b. If the job produces a single document, the specified destination is that of the file.
<PZ>If it is a file, that is where the Digital Document is stored. If it is a directory, a file would be created in that directory to hold the Digital Document.</PZ>
6. In IPP Scan Push mode, the client may specify multiple destinations.
<PZ>Yes.</PZ>
7. In IPP Scan Pull mode, each digital document produced by a job is sent back to the client in response to a GetNextDocumentImage request from the Client. Each document may contain data from a single image or from multiple images. There may be multiple documents as part of a single job.
<PZ>Yes</PZ>
a. Unlike in Push mode, the Compression Accepted and Document Format Accepted may be separately specified in each GetNextDocumentImage request. (I find this rather odd - especially since each such request does not necessarily correspond to either a Document or an image. )
<PZ>While this is true I would not expect the client to change the acceptable format/compression throughout the exchange. I believe it is the Scan Service that is in control of the format/compression as the Digital Document is being delivered. The acceptable format/compression operational attributes for the "Get-Next-Document-Images" Request could be deleted from the specification without any problems. It would probably remove some confusion. The format/compression attributes in the response and in the "Create-Job" request</PZ>
b. The mode of operation of GetNextDocumentImage depends upon whether Wait Mode is agreed upon. In Wait mode, data is sent as it becomes available and can be accepted. If not in Wait mode (or if Wait mode is interrupted or timed out), the client must issue a GetNextDocumentImage for each buffer's worth of data.
<PZ>Yes, it is a choice between synchronous and asynchronous network operations.</PZ>
8. This mode of transfer suggests that GetNextDocumentImage does not refer either to getting an Image or getting a Document, it just pulls data in a mode determined by the Wait mode. That data may be formatted into one or more Digital Documents, depending on format and contents.
<PZ>While that is true, we had to call it something. We went through a couple of name changes. In IPP it's just "0x004A". It is pulling the data for one or more images within a document. Subsequent responses may pull the data for one or more images from within another document. I'd prefer not to entertain cosmetic changes at this time.</PZ>
I also understand that you suggest that the Scan Service in SM3 be changed to agree with IPP Scan.
<PZ>As previously stated, it will be up to the SM 3 group to determine what will be done with the SM3 Network Scan specification. The SM3 model can be simplified or there could be an "IPP Production Scanning Set 1" drafted. :)</PZ>
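The destination rules discussed in items 5a and 5b above can be sketched roughly as follows. This is only an illustration of the behavior described in the exchange, not text from the IPP Scan specification: the function name `plan_destinations`, the `is_directory` flag, and the `document-N` file naming are all assumptions made for the example, and the error string is one of the two candidates Pete mentions.

```python
import posixpath
from typing import List


def plan_destinations(destination: str, is_directory: bool,
                      doc_count: int, extension: str) -> List[str]:
    """Return the file path for each Digital Document of a push scan job.

    Hypothetical helper: the names and the file naming scheme are
    assumptions for illustration only.
    """
    if doc_count < 1:
        raise ValueError("a job must produce at least one Digital Document")
    if not is_directory:
        # A file destination can hold only a single Digital Document.
        if doc_count > 1:
            # The thread suggests an error along the lines of
            # "conflicting-attributes" or "document-access-error".
            raise ValueError("conflicting-attributes")
        return [destination]
    # Directory destination: one file per Digital Document (1:1 in IPP Scan).
    return [posixpath.join(destination, f"document-{n}.{extension}")
            for n in range(1, doc_count + 1)]
```

For example, a directory destination with three JPEG documents would yield three separate files, while a file destination is only usable when the job produces exactly one Digital Document.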
Many thanks,
Bill Wagner
-----Original Message-----
From: Zehler, Peter [mailto:Peter.Zehler at xerox.com]
Sent: Friday, September 26, 2014 9:20 AM
To: William A Wagner; 'Michael Sweet'
Cc: ipp at pwg.org; cloud at pwg.org
Subject: RE: IPP Scan question.
All,
IPP Scan can support multiple document jobs. There are attributes that allow the printer to declare that capability ("multiple-document-jobs-supported") as well as operational attributes ("document-number", "last-document") to segment the data pulled from the scan service into multiple files (i.e., one file per document; the number of images in a file is format and implementation specific). During the prototype I used a scanner that emitted JPG or PDF. When loading a stack of media into the ADF, each image acquisition resulted in an image. The number of document objects generated was dictated by the output file type. In the IPP binding I limited the file to document object association to 1 to 1. I did not want to deal with the complexities of associating multiple files with a single document object. The abstract MFD Scan model did allow multiple files per document.
Running a stack of paper using JPG as the "document-format-accepted" resulted in multiple files, each of which was associated with a single document. Running that same stack of paper using PDF as the "document-format-accepted" resulted in a single multipage file associated with a single document. From the client perspective, using Get-Next-Document-Images behaved a bit differently for each job. With the JPG output, the responses had a document number that changed throughout the scan job retrieval. The number of responses with the same document number varied based on the complexity of the image. Each time the document number changed, the output file was closed and a new one was opened. The last Get-Next-Document-Images for the last document in the job set the "last-document" to true. In a push job version of this scan job, the same number of files is created at the destination. With the PDF output, the responses had a document number that remained the same throughout the scan job retrieval. When the last Get-Next-Document-Images for the job had the "last-document" set to true, the output file was closed. In a push job version of this scan job, one file was created at the destination.
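The client-side behavior described above can be sketched as a small loop. This is a simplified illustration under stated assumptions, not the prototype's code: `ImagesResponse` is a hypothetical, reduced view of one Get-Next-Document-Images response carrying only "document-number", "last-document", and the returned data, and output files are modeled as in-memory byte buffers.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ImagesResponse:
    """Hypothetical, simplified Get-Next-Document-Images response."""
    document_number: int   # "document-number" operation attribute
    last_document: bool    # "last-document" operation attribute
    data: bytes            # image data carried by this response


def segment_into_files(responses: Iterable[ImagesResponse]) -> Dict[int, bytes]:
    """Group pulled data into one output buffer per document number."""
    files: Dict[int, bytearray] = {}
    for resp in responses:
        # A new document number means a new output file is started;
        # the same number means the data is appended to the current one.
        files.setdefault(resp.document_number, bytearray()).extend(resp.data)
        if resp.last_document:
            # The final response of the job: nothing more to pull.
            break
    return {number: bytes(buf) for number, buf in files.items()}
```

With JPG output the document number changes during retrieval, so the result holds several entries; with PDF output the number stays constant and the result holds a single entry.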
The MFD Scan model was created with the idea that the same protocol would be used locally or remotely. Therefore there was considerably more control over the behavior of the scanner itself. The IPP Scan service simplified a number of aspects to address the 98% needs for network scanning in a mobile environment. I expect the MFD Scan service would be adjusted to better reflect implementation experience within the PWG (i.e., IPP Scan) and in the industry (e.g., WS-Scan, UPnP Scan, vendor specific scan).
Peter Zehler
-----Original Message-----
From: William A Wagner [mailto:wamwagner at comcast.net]
Sent: Thursday, September 25, 2014 2:15 PM
To: 'Michael Sweet'
Cc: Zehler, Peter; ipp at pwg.org; cloud at pwg.org
Subject: RE: IPP Scan question.
Michael,
Thank you for your response.
1. I agree that Figure 3 of the MFD Scan spec definitely indicates that there can be multiple images in one scan document; I do not see where it indicates that there cannot be multiple documents in a job. Furthermore, Figure 4 of that same document (with the associated text) definitely states that, for a multi-document Job, "Job object contains multiple Document objects. Each Document can have a different set of processing parameters."
And further that the Scan Service semantic model may allow the End User to specify a multi-document Job as a service output. If we have intentionally decided to not consider multi-document jobs in IPP, that should be made clear. I think it is to be determined if we decide to eliminate them from the SM3. (Incidentally, I do not see a compelling Use Case for multi-document Scan Jobs, although some may exist.)
2. I get your explanation that Get-Next-Document-Images refers to multiple images of a document, and that "last-document" refers to the last image of a document. But these names are misleading. Do we use 'Images' to refer to anything other than 'Document Images'?
I apologize for not commenting on the IPP Scan document earlier, but I think the one document per job characteristic, despite what one might expect from the names, should be made more clear. Also, as you suggest, it should be clarified that, for Pull Scan, GetNextDocumentImages can redefine Compression Accepted and Document Format Accepted for each image of a potentially multiple-image document.
Thanks,
Bill Wagner
-----Original Message-----
From: Michael Sweet [mailto:msweet at apple.com]
Sent: Thursday, September 25, 2014 9:12 AM
To: William A Wagner
Cc: Zehler, Peter; ipp at pwg.org; cloud at pwg.org
Subject: Re: IPP Scan question.
Bill,
> On Sep 21, 2014, at 9:50 AM, William A Wagner <wamwagner at comcast.net> wrote:
> ...
> It is also clear from the IPP Scan specification GetNextDocumentImages
> operation that a scan job can have multiple documents.
I don't think these are multiple document objects, however.
Get-Next-Document-Images is a convenient way to pull one or more images/pages from the scanner, but from the point of view of the model they are part of one document object and would be delivered (in the case of push scan) as a single file.
>
> The Cloud conference call comment is that FetchJob (corresponding to
> Destination, DestinationAccesses, and InputElements for Scan with no
> need to have a FetchDocument operation. This suggests that there is but one document (possibly with multiple destinations) in a Scan Job.
> Alternatively, it may be that the Input Parameters and Destinations for each one of multiple documents are defined in the CreateJob. This seems inconsistent with the general Imaging Service model.
In the case of Scan, the CreateScanJob operation is instantiating a single scan job containing a single document object that may have multiple digital representations (e.g. PDF, TIFF, etc.) of the same images. Figure 3 on page 22 of the MFD Scan spec seems pretty clear on that point. This is similar to how the Copy and FaxIn services work (single document jobs).
Print, FaxOut, and Transform can support multiple digital document inputs (and thus multiple document objects).
I think the only inconsistency here is that some job services support multiple document objects and some don't. But I don't think that hurts the overall model - just something worth pointing out.
(and perhaps as well worth considering/mentioning that most Print and FaxOut service implementations only support single document jobs...)
> The IPP Scan specification definitely refers to multiple documents in
> one scan job. However, Figure 1 can be interpreted to mean that the only operation necessary for Scan is a CreateJob, with GetNextDocumentImages necessary if it is a Pull Scan Job. Indeed, InputAttributes is defined to be in the CreateJob request as well as are the Job Template attributes defining destination; but it does not appear that different InputAttributes and/or destinations can be specified for different documents.
I think the choice of reusing the "last-document" operation attribute in the response of Get-Next-Document-Images operation is causing confusion here. It really is (semantically) "last-document-image".
Pete, do you think this is worth an editorial change before publication, either the attribute name or the description ("indicating that the last document IMAGE has been reached")?
> [Also, Compression Accepted and Document Format Accepted are defined
> in CreateJob, but also in GetNextDocumentImages for Pull Scans. Can it
> be assumed that requests in GetNextDocumentImages takes precedence?]
I think this needs some clarification - you put those in Create-Job for a Push Scan and in Get-Next-Document-Images for a Pull Scan.
> Do I correctly understand that, although there may be multiple
> documents in a scan job, they must all have the same InputAttributes and the same destination(s)? An alternate approach might have been to send a SetDocumentAttributes for each document to be scanned, containing the input parameters and destination for each specific document/image file; that would have been consistent with the Model.
Currently you scan whatever is at the input source and send it to the destination(s) or pull the images with Get-Next-Document-Images. The only way to break things up is to create multiple jobs and specify the number of images for each job in the "input-images-to-transfer" member attribute.
> For Cloud, we need to decide whether we should reflect the Semantic
> Model (with which we should best be consistent) or the IPP Scan Binding. Or do we need to change the semantic model?
The intent is that IPP Scan would update the SM definition of SM Scan, since SM Scan doesn't deal with Pull Scan.
> Also, a few minor editorial comments/questions I had while looking up
> stuff.
>
> 1. Table 1 lists Get-Next-Document-Images and
> refers to PWG 5100.SCAN. I take it that this means to have the specification refer to itself, but it is confusing even if the proper number is inserted. Better to refer to the internal paragraph.
Agreed.
> 2. Figure 1 refers to the operation as
> GetNextDocumentImage rather than GetNextDocumentImages
>
> 3. In para 7.1.1, under Group 2: Job Template
> Attributes is a reference to section 8.28.1.7.2. There is no such section (should it be 8.2?)
>
> 4. Although the text makes a distinction between
> Print Jobs and Scan Jobs, section 8.2.1.1 refers to a Print Job.
Thanks for catching these!
_________________________________________________________
Michael Sweet, Senior Printing System Engineer, PWG Chair