A METHOD FOR SELECTIVE IMAGE ACQUISITION AND TRANSMISSION
Introduction:
The invention is based on the observation that the sequential transmission of images stored on a client system can include redundant information. In order to minimize transmission time and volume, a system is proposed in which a portable imaging device, connected to a remote server by either a wired or wireless data connection, is instructed by the remote server which images or video sequences to transmit, and at what resolution each transmission should be made.
The invention operates in this manner: The portable device sends one or more low quality transmissions, sufficiently detailed to allow the remote server to identify the image but not detailed enough to require a full transmission of all data. On the basis of this preliminary transmission, the server determines the required data, computes the next sequence of transmissions needed to convey this data, and commands the portable device to perform these transmissions. The portable device transmits the required data, and the remote server then constructs the final information required on the basis of the second sequence of transmissions. This is not an image compression scheme in the sense of replacing data blocks with tokens; rather, it is a method for sending only the data necessary for task completion.
The data is not necessarily stored in a format that requires less space than usual. The primary data that is sent is not sufficient for extraction of the entire required data, but is sufficient for determination of the area containing the data of interest. Since the area of interest is expected to be a small portion of the entire image, the data of the area of interest is then transmitted in a detailed format.
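By way of illustration only, the following Python sketch simulates the selective-transmission flow described above. The subsampling factor, the brightest-block region-of-interest heuristic, and all function names are assumptions made for the example, not part of the invention's definition.

```python
import numpy as np

def preview(image, factor=8):
    """Coarse preview by simple subsampling: a stand-in for the
    low-quality first transmission."""
    return image[::factor, ::factor]

def server_select_roi(preview_img):
    """Hypothetical server-side analysis: here it merely picks a block
    around the brightest preview pixel as the 'area of interest'; a real
    server would run the task-specific algorithms described later."""
    h, w = preview_img.shape
    y, x = np.unravel_index(np.argmax(preview_img), preview_img.shape)
    return x, y, max(1, w // 4), max(1, h // 4)  # x, y, width, height (preview coords)

def selective_transmission(full_image, factor=8):
    p = preview(full_image, factor)               # stage 1: low-res upload
    x, y, w, h = server_select_roi(p)             # stage 2: server picks ROI
    # Stage 3: only the ROI is sent at full resolution (coords scaled back).
    roi = full_image[y*factor:(y+h)*factor, x*factor:(x+w)*factor]
    return p, roi

if __name__ == "__main__":
    img = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
    low, detail = selective_transmission(img)
    print(low.nbytes, "bytes sent first;", detail.nbytes, "bytes of detail")
```

In this toy run, the preview plus the detailed region together amount to a small fraction of the full image, which is the effect the invention seeks.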
Title of invention:
Selective transmission method for images and video.
Field of invention:
The invention relates to an image acquisition and communication device and a client-server system.
What's the problem?
Current situation of portable imaging devices with a data connection: such devices are typically composed of a camera; a storage medium, either nonvolatile (e.g. flash memory, hard disk) or volatile (RAM); some image processing capabilities (e.g. compression, color interpolation); and data transmission capabilities (e.g. a modem, a data connection to a cellular phone, etc.). Some of these devices support image compression to reduce data transmission time and cost. We mention here Photovity from Flashpoint, the new Polaroid camera with a modem (PDC-640M), the Lightsurf solution, etc. These devices are designed for the transmission of images taken by a digital camera using a data connection. They transmit the images previously taken by the camera either sequentially (until all the images have been uploaded) or based on user selection (that is, the user chooses which pictures to upload).
For certain applications, such as document imaging, panoramic imaging, product imaging, or the imaging of a symbol to be decoded (e.g. a bar code, a piece of text), the imaging device may take many more images than need to be transmitted. Since data transmission is costly and consumes time and device battery power, there is a need to minimize the amount of data transmitted.
There are two options for reduction of the transmitted data. System 1: The imaging process is not controlled. Redundant information is acquired and stored, but only the necessary information is transmitted. A portion of the data (e.g. a small part of the images taken) is analyzed. The analysis can be made in the imaging device or in a remote server. The result of either analysis is the selection of the necessary information to be transmitted. System 2: Control of the imaging process. The imaging process is changed over time according to analysis of the first images taken.
Description of Samples of Prior Art:
References 1 and 2, attached to this application, demonstrate some features of prior art, and the differences between the present invention and the prior art.
The concept of combining a series of images into a single mosaic or panorama that covers a wider field of view, has better resolution or signal-to-noise ratio, and/or better dynamic range, is not new in itself. Reference 1 is an article describing some of the mathematical techniques for accomplishing such a "stitching" method. Reference 2 is an article describing some commercial products which enable taking panoramic images and/or performing the image "stitching".
Some of the novel aspects of the invention, in comparison to this prior art, are:
1) The concept of performing the imaging operation in a small portable device that is not necessarily optimized for the imaging operation described herein. This is not a trivial extension of the prior art, since such small portable devices generally have limited storage and limited data transmission capabilities. There are many assumptions inherent in prior art stitching processes, but these assumptions do not apply to and do not limit the current invention. For example, assumptions that are not required by the current invention but are required by prior art methods include:
a. The use of high resolution images. (See the images on pages 1282 and 1283 of Reference 1.)
b. The use of a high quality lens system with little optical distortion. (See the pictures on pages 1, 6 bottom, and 7, of Reference 2.)
c. Having the user perform a linear scanning motion with accurate image alignment, due to the use of a view-finder on the device and due to the device being a camera rather than a cellular phone or a PDA.
d. The imaging being performed on objects that are relatively distant (a few meters, for example) from the imaging unit, thereby avoiding the very complex problem of finite distance image registration. (This assumption is mentioned as the "no parallax" issue in the Abstract on the first page of Reference 1, and also in several pages of the text of Reference 1.)
Since the current invention is not bound by these assumptions, it takes the concept of imaging and image stitching significantly beyond the prior art.
2) Currently, the computers and systems that perform image stitching and registration require the availability of the full set of images in order to perform the operation. For example, Reference 1 specifically describes a method for performing various kinds of correlation calculations on the whole images in the stitching process. Hence, if the stitching or registration is to be done at a later time (requiring intermediate storage of data), or in a different location (requiring transmission of the images), the full set of images must be stored and/or transferred. This requirement greatly burdens the storage capacity of the portable device, or the data link capacity, or both. The current invention recognizes and makes use of the insight that not all of the information must be sent; rather, only portions of the data (with potentially reduced resolution, or other acceptable degradation) need be stored or sent. Further according to the current invention, only the critical images or critical image sections need to be retrieved in full resolution. This implementation is a major deviation from the method of operation according to the prior art.
Description of the invention: Figure 1 describes the prior art. Figures 2-4 inclusive describe the first embodiment of the invention, according to System 1 noted above. Figure 5 describes the second embodiment of the invention, according to System 1 noted above. Figure 6 describes the third embodiment of the invention, according to System 2 noted above.
Figure 1 (Prior Art): Possible known paths of communication between the system's elements
1. Element (1) is an imaging device. Element (2) is a device capable of saving the data. It can be embedded in element (1), or separate such as a portable device or a PC. Element (1) is connected to element (2) either through a wired data connection or wirelessly (a).
2. Element (3) is a server. Element (2) can be connected to element (3) in several ways:
* Element (2) is connected to a cell operator (b). The cell operator is connected to the Internet (c). The Internet is connected to element (3) (d).
* Element (2) is connected to the Internet (e). The Internet is connected to element (3) (d).
* Element (2) is connected directly to element (3) (f).
Figures 2-4: Embodiment 1 (System 1): There will now follow a general description of Embodiment 1, then a detailed description of how the invention works with reference to the relevant Figures.
GENERAL DESCRIPTION OF EMBODIMENT 1: Embodiment 1 is the extraction of an area from a single picture without prior knowledge of the object photographed. It involves the following stages of operation:
First, the entire first picture is sent in low resolution.
Second, the type of the object is determined (document, headline in a newspaper, barcode, etc.).
Third, the relevant algorithm is performed. The algorithm determines what part of the picture is relevant and the minimal resolution necessary. For example, only the part containing the digits is extracted out of an image of a barcode. The extraction can be done using different methods, one of them being the following: The image is first scanned for the existence of lines. Some candidate areas having a line characteristic (e.g. rapid changes in a certain direction and minimal changes in the perpendicular direction) are selected. These candidates are examined further by, for example, computing moments at varying angles, the area size, etc. The most suitable candidate is chosen. The lines' direction is then determined with greater accuracy. The two possible locations of the digits relative to the lines (in the upper or lower part of the image) are examined and the correct one is determined. A simple OCR or ICR algorithm is used for recognition of the locations of the digits (without determination of the specific digit in each location). A line is matched to the locations of the centers of the digits. The location of each center is corrected based on the rules of digit placement inside a barcode (e.g. equal distances between digits, number of digits in a barcode, etc.). A sketch of the candidate-selection step appears below.
Fourth, the command flows back to element (2) and the required data is transmitted to the server (3).
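The candidate-selection step described above (rapid changes in one direction, minimal changes in the perpendicular direction) can be sketched as follows. The block size and ratio threshold are illustrative assumptions, not prescribed values.

```python
import numpy as np

def barcode_candidates(gray, block=16, ratio_threshold=4.0):
    """Scan a low-resolution grayscale image for blocks showing a strong
    one-directional line pattern: high gradient energy along one axis and
    low energy along the perpendicular axis."""
    gy, gx = np.gradient(gray.astype(float))
    candidates = []
    for y in range(0, gray.shape[0] - block, block):
        for x in range(0, gray.shape[1] - block, block):
            ex = np.abs(gx[y:y+block, x:x+block]).mean()  # horizontal activity
            ey = np.abs(gy[y:y+block, x:x+block]).mean()  # vertical activity
            # A vertical-bar pattern has much more horizontal than vertical change.
            if ex > ratio_threshold * max(ey, 1e-6):
                candidates.append((x, y, block, block))
    return candidates
```

The surviving candidates would then be ranked by the further tests named above (moments at varying angles, area size, etc.).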
DETAILED DESCRIPTION OF EMBODIMENT 1 WITH REFERENCE TO THE RELEVANT FIGURES:
Figure 2: Determination of the relevant algorithm for a given image for extraction of an area from it
Element (a): An image is acquired. The image can contain a barcode, headlines from a newspaper, a text, etc.
Element (b): The image is sent in low resolution.
Element (c): First the type of the required service is recognized. The basic algorithm differentiates between services such as document imaging, panoramic imaging and product imaging. The identification can be made by searching any given picture for several characterizing patterns of each of the supported services. For example, any picture will be screened for the lines of a bar code, headline-format letters, and the pattern of a text document.
Element (d): According to the match results of the different patterns, the relevant algorithm is chosen and performed. For example, if the format letters of a newspaper were detected, the algorithm for headline identification will be executed.
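A minimal sketch of this screening-and-dispatch step follows. The three scoring functions are crude illustrative placeholders for the pattern searches described above, not the actual detectors.

```python
import numpy as np

def score_barcode_lines(img):
    # Bar-code proxy: strong horizontal change relative to vertical change.
    gy, gx = np.gradient(img.astype(float))
    return float(np.abs(gx).mean() / (np.abs(gy).mean() + 1e-6))

def score_headline_letters(img):
    # Headline proxy: strong overall contrast from large-format letters.
    return float(img.std() / 64.0)

def score_text_pattern(img):
    # Text-document proxy: alternating dark/light rows of body text.
    rows = img.mean(axis=1)
    return float(np.abs(np.diff(rows)).mean() / 8.0)

def choose_algorithm(low_res_image):
    scores = {"barcode": score_barcode_lines(low_res_image),
              "headline": score_headline_letters(low_res_image),
              "document": score_text_pattern(low_res_image)}
    return max(scores, key=scores.get)  # best-matching pattern wins
```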
Figure 3: Determination of the first data transmission according to a priori knowledge of the object photographed
a. A notification about the photograph action is sent to element (3). (No part of the image which was collected is sent so far.)
b. The server (3) determines the type of the object. The decision can be based on: a) the location of the user; b) the time; c) a previous configuration made by the user; d) previous use by the user. For example, if the user's location is identified as a shop, product imaging is the default application.
c. The following parameters are determined according to the object: a) resolution; b) the part of the picture to be sent (for example, cutting 10% off the edges); c) the number of pictures; d) the most suitable pictures (for example, the second, fourth and last pictures are chosen rather than the first three). A sketch of such a parameter table appears below.
d. The image is transmitted according to the transmission parameters determined in (c).
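The parameter table referenced in step (c) might, purely for illustration, look as follows; the contexts and all numeric values are assumptions.

```python
# Hypothetical server-side transmission defaults, keyed by the inferred
# object type. All values are example assumptions, not prescribed settings.
DEFAULTS = {
    "product":  {"resolution": 0.25, "edge_crop": 0.10, "num_pictures": 3},
    "document": {"resolution": 0.50, "edge_crop": 0.05, "num_pictures": 5},
}

def transmission_parameters(user_location):
    # Per the example above: a shop location makes product imaging the default.
    context = "product" if user_location == "shop" else "document"
    return DEFAULTS[context]

print(transmission_parameters("shop"))  # -> product-imaging defaults
```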
Figures 4 and 5: Examples of the operation of algorithms for extraction of an area from a single picture.
Figure 4: An algorithm for extraction of a headline area from a single picture of a newspaper.
If headline-format letters were identified, this algorithm will be executed.
Element (a): The image is sent in low resolution. The entire area of the image is sent, but in a limited information format: low resolution, black and white instead of color, etc.
Element (b): The algorithm determines the locations of all the candidate headlines in the image, and identifies the candidate the user intended to photograph, for example according to its size and location in the image. The location of the part of interest in the original image is sent back.
Element (c): The headline is sent to the server in a higher resolution than in (a).
Figure 5: An algorithm for extraction of a barcode's digits area from a single picture of a product
Element (a): The image is sent in low resolution. The entire area of the image is sent, but in a limited information format: low resolution, black and white instead of color, etc.
Element (b): The algorithm determines the barcode's location, angle and direction.
The location of the digits relative to the barcode is determined. The location of the digits in the original image is sent back.
Element (c). The digits are sent to the server in a higher resolution than in (a).
Figure 6: Embodiment 2 (System 1): There will now follow a general description of Embodiment 2, then a detailed description of how the invention works with reference to the relevant Figures.
GENERAL DESCRIPTION OF EMBODIMENT 2: Embodiment 2 is the extraction of non-redundant data from multiple pictures without a priori knowledge of the object photographed. It involves the following stages of operation:
First, a small portion of the pictures is sent in low resolution.
Second, the algorithm determines what the overlapping parts are and whether more pictures are required for stitching the entire document.
Third, the additional pictures are sent in low resolution (if necessary).
Fourth, a stitching method is determined for the entire picture.
Fifth, the non-redundant image data is transmitted in a higher resolution and stitched to create the entire image.
DETAILED DESCRIPTION OF EMBODIMENT 2 WITH REFERENCE TO THE RELEVANT FIGURES:
Figure 6: A stitching method using selective transmission
Element (a): The original object being photographed. The original images taken by element (1) from Figure 1, and stored in element (2) from Figure 1, can contain redundant information, as shown in element (a).
Element (b): The original images are sent to element (3) in Figure 1, in low resolution.
Element (c): The redundancy between the images is determined.
Element (d): The pictures of interest and the areas of interest inside these pictures are determined, and the location data is transmitted back to element (2) in Figure 1. The relative location of each area, compared to the other parts needed for reconstruction of the original image (i.e. the stitching method), is determined and saved in element (3) of Figure 1.
Element (e): The areas of interest are sent to element (3) in Figure 1, in a higher resolution than in sub-section (b) of this paragraph. These parts are stitched together according to the stitching method determined in sub-section (d) of this paragraph.
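One way to sketch the redundancy determination of sub-sections (c) and (d) is phase correlation between two low-resolution frames, as below. This is a minimal illustration assuming pure translation; the invention is not limited to this method, and real stitching must also handle rotation and lens distortion.

```python
import numpy as np

def estimate_overlap(frame_a, frame_b):
    """Estimate the translation between two equally sized low-resolution
    frames by phase correlation; the overlapping (redundant) area then
    follows directly from the shift."""
    fa = np.fft.fft2(frame_a.astype(float))
    fb = np.fft.fft2(frame_b.astype(float))
    cross = fa * np.conj(fb)
    cross /= np.maximum(np.abs(cross), 1e-9)          # normalized cross-power
    response = np.abs(np.fft.ifft2(cross))
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    # Wrap shifts larger than half the frame back to negative values.
    if dy > frame_a.shape[0] // 2:
        dy -= frame_a.shape[0]
    if dx > frame_a.shape[1] // 2:
        dx -= frame_a.shape[1]
    return int(dx), int(dy)
```

Given the shift, the server keeps the non-overlapping strip of each frame as the area of interest and requests only those strips in higher resolution.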
Since embodiment 2 of the invention by its definition and nature requires the processing of multiple images, any application that requires the processing of multiple images will also be within the purview of the invention. For example, if multiple images are to be made of the same target object, but at different times or from different angles of view, these may be stitched together, according to the invention, in order to achieve the desired result. Similarly, if images are to be made of different target objects, whether there is one image for each target or multiple images of each target, these images may be taken and processed in accordance with the invention. Further, video is simply a sequence of multiple images, processed at a certain rate. Thus, video imaging is also an application within the purview of the current invention. It will be appreciated that any application or usage that requires imaging of objects can be a subject of the current invention, particularly where the images must be transmitted over a communication channel of limited bandwidth.
Also, it is possible to take one image from a video, or other multiple image application, and improve upon that image by application of embodiment 1 of the invention. That is, the server can select one image that will be processed, then specify new values for imaging parameters for that one image, send these new values to the client, where the client will make a new image of the object and send that new image to the server. Other combinations are possible also.
Figure 7: Embodiment 3 (System 2): There will now follow a general description of Embodiment 3, then a detailed description of how the invention works with reference to the relevant Figures.
GENERAL DESCRIPTION OF EMBODIMENT 3: Embodiment 3 is the control of the imaging process. It involves the following stages of operation:
First, the first images are taken according to default parameters (such as, for example, exposure time, gamma factor, photographic frequency, total number of photos, storage format, etc.). These images are sent to the server.
Second, the algorithm determines new values for the parameters of the imaging process. The parameters may include, for example, the number of pictures to be taken, the time differences between the next pictures, gamma correction, focus, etc. The algorithm also determines new values for the parameters of data storage. These parameters may include, for example, the format for storage, what parts of the images should be stored, what the resolution of the stored image shall be, etc.
Third, the next set of images is taken according to the new parameters.
Fourth, this process may be terminated after a predetermined number of sets of images have been taken by the client, transmitted by the client, and received by the server. (For example, the user may specify that there shall be only two rounds of images, or three rounds of images, or some other number.) Alternatively, the process may be repeated in an iterative manner until all of the necessary data has been received at the server, without reference to a fixed number of rounds of transmissions. The "necessary" data is the amount and nature of data required to reconstruct the images at the required quality. In essence, the user determines the required quality, but does not limit the number of rounds of transmissions, or the amount of data to be transmitted or processed. The manner in which the process is implemented, by number of rounds, amount of data, required quality, etc., may be varied by each application. A schematic of this loop appears below.
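The loop can be summarized schematically as follows. The Device and Server classes are hypothetical stand-ins whose methods merely simulate capture, quality estimation and parameter refinement; they do not represent a prescribed interface.

```python
# Schematic of the iterative control loop of Embodiment 3 (toy simulation).

class Device:
    def capture(self, params):
        return {"exposure": params["exposure"], "data": b"..."}  # stand-in frame

class Server:
    def __init__(self):
        self.quality = 0.0
    def receive(self, images):
        self.quality += 0.3                      # pretend each round improves quality
    def refine(self, params):
        return {**params, "exposure": params["exposure"] * 0.8}  # new exposure, etc.

def imaging_session(device, server, quality_target=0.9, max_rounds=5):
    params = {"exposure": 1.0}                   # default imaging parameters
    for round_no in range(1, max_rounds + 1):
        server.receive(device.capture(params))   # take and transmit a set of images
        if server.quality >= quality_target:
            break                                # required quality reached
        params = server.refine(params)           # server commands new parameters
    return round_no, server.quality

print(imaging_session(Device(), Server()))
```

Either termination criterion from the text maps directly onto this loop: a fixed `max_rounds`, or a `quality_target` with the round count left open.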
DETAILED DESCRIPTION OF EMBODIMENT 3 WITH REFERENCE TO THE RELEVANT FIGURES:
Figure 7: The control of the imaging process.
Element (a): The first images are taken according to default parameters (such as, for example, exposure time, gamma factor, photographic frequency, total number of photos, storage format, etc.). The images are sent to the server in a limited information format, such as low resolution, white characters on a black background, etc.
Element (b): The algorithm determines, first, the parameters for the imaging process, such as, for example, the number of pictures to be taken, the time differences between the next pictures, gamma correction, focus, etc.; and second, the parameters for storage of the data. The server then sends a message to the client, with specific parameters for the next set of images to be taken and transmitted to the server. In Figure 7, the example is that there are changes in the time for image exposure, the compression ratio, and the gamma factor.
The next images are taken according to the new parameters. The next set of images is then sent to the server. It will be appreciated that this is an iterative process, with multiple rounds of images, refinements of the parameters, and transmission of more images. The entire process allows the server to capture only the data required for the focus and quality required, while at the same time minimizing the total amount of data transmitted.
Embodiment 4: Combination of embodiments 1 and 3:
An additional embodiment 4 is the combination of embodiment 1 and embodiment 3 above. In this new embodiment 4, an image is taken at the client according to predetermined criteria (in accordance with embodiment 3), and this image is sent to the server. The server then determines new values for the parameters, and sends these values to the client. The client takes a new image on the basis of the new values, and sends this image to the server. This process of imaging, transmission, determination of new values, etc., may be continued according to predefined criteria such as the number of rounds of images, the quality of the picture desired, etc.
Embodiment 5: Combination of embodiment 2 and 3:
Embodiment 5 operates similarly to embodiment 4, except that with embodiment 5 there are multiple images taken per round of imaging, rather than one image only. For example, a user may want to create a panoramic image. The first images will be taken without a priori knowledge about the user's action, according to default parameters. The first images are sent in low resolution. The redundancy between the pictures is determined. According to the degree of redundancy, values such as the number of images, and the time lag between images, may be changed. The redundancy also determines which pictures and what part of the pictures will be used for the creation of the panoramic image.
Advantage of the invention over prior art:
The current invention reduces data transmission time and cost. Instead of sequential or user-selection-based transmission of the image data, selective transmission enables the transmission of the minimal amount of data required. A relevant area can be determined from a low-resolution image, and then extracted from a higher-resolution image. Alternatively, the location of non-redundant data can be determined using multiple low-resolution images; then only the non-redundant data is sent in higher resolution and stitched. The method can be combined with existing methods for data compression for minimization of transmission time and cost.
Innovative steps:
The novel items in the invention include:
1. A system where the imaging device transmits lower-grade partial images to a server to facilitate image identification, and the server requests further image information (e.g. higher-resolution portions, etc.) to facilitate the desired action, such as image stitching, OCR, etc.
2. A system where, instead of the server as described immediately above, a special algorithm running on the imaging device's processor performs the identification, and the decision is not which parts of the image(s) to transmit and how to transmit them, but rather how to store the images for future usage/transmission.
3. A method where the server instructs the imaging device on camera and/or imaging-specific parameters such as exposure time, camera AGC, camera gamma factor, etc.
4. A method where the server provides the imaging device with information on the quality of the received picture and hence updates/controls the compression characteristics/algorithm parameters used in the compression algorithm on the imaging and data transmission device.
5. A system where the feedback about the imaging operation for the user (e.g. camera scan speed, camera distance from the object, image brightness, existence of letters/numerals/bar-codes in the image, object angle, etc.) is computed in the server and sent back to the imaging device to assist the user in the imaging operation.
6. A method for transmitting (or storing) only the part of the image that is critical for accomplishing the image recognition task - e.g. the headline in a newspaper, the numerals or bar-code in a UPC/EAN or other bar-code symbol, the new part of the picture revealed in the new picture etc.
7. A method for reconstructing an image, from storage on the device or from the transmissions received at the server, in such a way that the proper image identification/image sending/image display/image printing operation will be of sufficient quality. For example, for faxing a document the server may stitch together the relevant transmitted image portions, and for this stitching an 8-bit per pixel color depth may be necessary. For performing OCR on the same image, a 1-bit pixel depth (and stronger compression) may be optimal. The novel principle is that there is no "one image" of a given resolution, size, color depth and compression method. Rather, the image, as residing in the imaging device's volatile and/or non-volatile memory, is extracted and sent to the server with parameters reflecting the desired application, controlled by special software in the imaging device or the server.
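The principle can be illustrated with a toy per-application extraction table. The profile values follow the fax/OCR example above; the quantization scheme itself is an assumption made for the sketch.

```python
import numpy as np

# Illustration of the "no one image" principle: the same stored pixels are
# exported with different parameters per application.
PROFILES = {
    "fax": {"bits_per_pixel": 8},  # stitching benefits from full depth
    "ocr": {"bits_per_pixel": 1},  # a binarized image suffices for OCR
}

def extract_for(application, image_u8):
    bpp = PROFILES[application]["bits_per_pixel"]
    levels = 2 ** bpp
    # Quantize to the requested depth; stronger compression would follow.
    return (image_u8.astype(int) * levels // 256) * (255 // (levels - 1))

img = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
print(np.unique(extract_for("ocr", img)))  # two levels only: 0 and 255
```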
Supplementary questions and answers about the invention:
1) Who are the inventors? Tsvi Lev and Ofer Bar-Or.
2) When was the invention conceived? In November, 2000 at the offices of UCnGo in Ramat-Gan, Israel.
3) What is the current stage of the invention, development, testing, marketing, etc.? The invention is currently under implementation/development. (See the Fax SOW document, below.)
4) Have any revelations been made about this invention to anyone outside the company? If so, what are the details of such revelations? The invention was not revealed to anyone but company employees, existing investors, and our Special Counsel for Intellectual Property, Ariel Goldstein.
References:
1. Steven Mann and Rosalind W. Picard, "Video Orbits of the Projective Group: A Simple Approach to Featureless Estimation of Parameters," IEEE Transactions on Image Processing, vol. 6, no. 9, September 1997.
2. The Future Image Report, November 2000, vol. 8, issue 5.
The two references listed above are incorporated herein by reference.
Appendix: The following appendix is an internal engineering document of UCnGo, the employer of the applicants. This document indicates parameters for implementation of the invention. It will be appreciated that this document is suggestive only. The invention is not limited to the criteria, the numbers, or the applications, stated herein. Nevertheless, the appendix suggests technical criteria and parameters that are part of the invention.
Fax Application Statement of Work (draft)
Application requirements:
General requirements
The fax application is designed to run on a portable platform connected to a remote server by a modem.
The application is designed to acquire a monochrome text image from A4-sized paper using a digital camera based on some embedded platform and reconstruct it as a readable binary or 4 gray-level image on the remote platform.
The amount of calculations to be performed on the portable platform is minimal, under the constraints of:
1. Minimal acquisition speed
2. Communication speed
3. User feedback
Based on the requirement of approximately 100 frames per A4-sized page, the minimal acquisition speed should be between 0.25 and 0.75 frames per second.
Communication speed dictates a compressed image size of approximately 3 KByte per frame.
User feedback implies a visual or auditory response (TBD).
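For orientation, the budget implied by these figures can be checked with a few lines of arithmetic. The raw frame size is taken from the 320x240, 8 bits per pixel camera in the hardware requirements below, and the link speed is assumed to be in KByte/s.

```python
# Back-of-the-envelope budget implied by the figures in this document.
raw_frame_kb  = 320 * 240 * 8 / 8 / 1024  # ~75 KByte uncompressed per frame
target_kb     = 3                         # compressed size per frame
frames_per_a4 = 100                       # frames per A4 page
link_kbps     = 20                        # embedded link, assumed KByte/s

# ~25:1 here; the SOW's "approximately 72 KByte" figure gives ~24:1.
print(f"compression ratio ~ {raw_frame_kb / target_kb:.0f}:1")
print(f"data per page     ~ {frames_per_a4 * target_kb} KByte")           # 300 KByte
print(f"transfer time     ~ {frames_per_a4 * target_kb / link_kbps:.0f} s")  # ~15 s
```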
Mode of operation
The fax acquisition operation is performed by an ordinary user after short training.
The user performs acquisition as smoothly as possible without any additional hardware.
The envelope of operation is:
1. Distance between camera and paper: 7-15 cm.
2. Maximal peak-to-peak variation of the distance over the entire scan: 2 cm.
3. Maximal distance between two consecutive frames: 3 cm.
4. The first 4 frames of the acquisition sequence will be used for extrinsic camera calibration, and the distance between these frames should be between 0.25 and 0.75 cm.
5. The acquisition is performed in overlapping strips, so that there are 3-4 strips of 15-20 frames per A4-sized page.
6. The transition between strips is smooth, so that 4-5 frames are required per transition.
7. Maximal pitch and roll angle: 7 degrees.
8. Maximal camera rotation: 30 degrees peak to peak for the entire acquisition process and 10 degrees between 2 consecutive frames.
Prior to acquisition, intrinsic camera calibration is performed using a predefined (checkerboard) target. For fixed-focus cameras the calibration process is performed only once.
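A standard way to perform such a one-time checkerboard calibration is sketched below with OpenCV. The 9x6 inner-corner pattern and 10 mm square size are assumptions for the example; the SOW does not fix them.

```python
import cv2
import numpy as np

def calibrate_intrinsics(image_paths, pattern=(9, 6), square_mm=10.0):
    """One-time intrinsic calibration from images of a checkerboard target.
    Assumes at least one image yields a successful corner detection."""
    # 3-D coordinates of the checkerboard corners in the target's own plane.
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm
    obj_points, img_points, image_size = [], [], None
    for path in image_paths:
        gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            obj_points.append(objp)
            img_points.append(corners)
            image_size = gray.shape[::-1]
    _, camera_matrix, dist_coeffs, _, _ = cv2.calibrateCamera(
        obj_points, img_points, image_size, None, None)
    return camera_matrix, dist_coeffs
```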
Embedded hardware requirements
Programmable DSP processor: TBD ops
Digital camera with minimal resolution 320x240 pixels, TBD bits per pixel.
Minimal communication speed: 20 KBps.
Remote server requirements
Minimal communication speed: 64 KBps.
Maximal latency between 2 consecutive frames: 0.25 sec.
Maximal total processing time: 200 sec.
Hardware: TBD
Processing power requirements: TBD
(For the on-line processing, a quad Pentium 1 GHz computer with at least 256 MByte of memory is recommended. For the off-line processing, 16 computers or 4 quads in a highly parallel structure of multiple servers, with 1 GByte of memory and fast connections, are recommended.)
Operational block-diagram
Processing stages
Total processing can be divided into 3 almost independent stages:
1. Image preprocessing and compression is performed on the embedded unit
2. On-line camera trajectory estimation on the remote server.
3. Off-line final image reconstruction on the remote server.
[Block diagram: Image preprocessing and compression (embedded) → Camera trajectory estimation (server, on-line) → Final image reconstruction (server, off-line).]
Note that different hardware is required for each processing stage.
Preprocessing and compression
Since each frame size is 3 KByte, and a 320x240 image at 8 bits per pixel takes approximately 72 KByte, some dedicated preprocessing and compression are required in the embedded unit. The a priori monochrome properties of the image can be used to minimize the compression artifacts, quantization effects and computational requirements in the following processing stages. Therefore the following operations are performed in the embedded unit:
The most time-consuming process in this flowchart is image homogenization, for which a floating point processor is required. Smart casting significantly decreases the number of floating point operations required for taking the 0.25 power of each pixel's statistic. A sketch of one possible table-based implementation appears below.
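One possible reading of this optimization, for 8-bit pixel statistics, is to replace the per-pixel floating-point power with a precomputed lookup table, as in the following sketch. This interpretation is an assumption, not a statement of the actual implementation.

```python
import numpy as np

def homogenize(gray_u8):
    """Sketch of the power-0.25 homogenization step. For 8-bit input the
    floating-point pow can be replaced by a 256-entry lookup table, one
    possible reading of the "smart casting" optimization above."""
    lut = np.arange(256, dtype=np.float32) ** 0.25
    lut = np.round(255.0 * lut / lut[-1]).astype(np.uint8)  # rescale to 0..255
    return lut[gray_u8]  # table lookup instead of per-pixel pow
```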
Camera trajectory estimation
The estimation of the relative positions of frames is crucial for the image reconstruction process. The user should receive feedback regarding the camera movement in real time, so that he can correct his mistakes. All the frames, their sanity scores and camera positions are saved in a database for final image reconstruction.
[Block diagram: camera trajectory estimation. Compressed frames (3 bits per pixel) undergo frame restoration and are stored in a restored-frames database. A multiscale image representation feeds coarse camera position and rotation correction, using smart correlation at coarse scale, a priori probabilities, and the previous camera position. Fine camera position correction uses smart correlation at fine scale with a priori weights and the current camera position. Camera trajectory calculation and a sanity check follow, with second-iteration feedback; user feedback is provided in real time. Frame weights and pixel weight assignment feed image synthesis, which produces a temporary synthetic image.]
The pair-wise processing of the frames is based on a smart correlation procedure, which is performed in a multi-scale setting for fast implementation. The relative positions of the frames can be translated into a camera trajectory, and various deformations of the image provide an estimator for the camera position. A sanity check and various weight assignments allow the correction of distortions caused by errors in camera position estimation. A temporary synthetic image is constructed to improve trajectory estimation.
Multiple feedbacks between the various processes allow fast adaptation and on-line problem correction.
Real-time (0.25 sec delay) feedback supplied to the user allows the correction of problems caused by improper operation.
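The multi-scale (coarse-to-fine) correlation scheme named in the diagram can be sketched as follows, reusing the estimate_overlap() function from the Embodiment 2 sketch above. This is an illustrative, translation-only version, not the actual smart correlation algorithm.

```python
import numpy as np

def coarse_to_fine_shift(frame_a, frame_b, levels=3):
    """Multi-scale pairwise registration sketch: estimate the translation
    on heavily downsampled frames first, then refine it on finer scales.
    Relies on estimate_overlap() (phase correlation) defined earlier."""
    dx_total, dy_total = 0, 0
    for level in reversed(range(levels)):  # coarse -> fine
        s = 2 ** level
        a = frame_a[::s, ::s].astype(float)
        # Undo the shift found so far before measuring the residual.
        shifted_b = np.roll(frame_b, (-dy_total, -dx_total), axis=(0, 1))
        b = shifted_b[::s, ::s].astype(float)
        dx, dy = estimate_overlap(a, b)    # residual shift at this scale
        dx_total += dx * s
        dy_total += dy * s
    return dx_total, dy_total
```

Running the correlation on small coarse-scale images first keeps the expensive fine-scale step confined to a small residual search, which is the point of the multi-scale setting.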
Final image reconstruction
The final image reconstruction is the most time-consuming stage of the process. It is based on clustering the acquisition sequence into strips, followed by recursive merging of the detected strips. This time-consuming process allows the correction of problems caused by sequential frame acquisition and eliminates 'bad' frames.
The reconstructed image undergoes various resolution improvement procedures and a final fax-like image is created. The user is informed of the success of the operation.
Development process
Parallel activities
The development process can be divided into various parallel activities:
1) Low-level image processing: a) Homogenization, b) Quantization and binarization, c) Compression and restoration, d) Image sharpening and denoising.
2) Smart multistage correlation computation with weight assignment.
3) Camera modeling: a) Camera trajectory model, b) Camera rotation correction, c) Intrinsic camera calibration, d) Extrinsic camera calibration, e) Sanity check and trajectory analysis, f) Abnormal frame detection.
4) Image synthesis: a) On-line image synthesis, b) Frame clustering, c) Strip reconstruction, d) Recursive strip merge.
5) Integration