WO2014184372A1 - Image capture using a client device - Google Patents

Image capture using a client device

Info

Publication number
WO2014184372A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
frame
focus
card
adaptive threshold
Prior art date
Application number
PCT/EP2014/060154
Other languages
English (en)
Inventor
Liu ZIZHOU
Warren BLUMENOW
Daniel HEGARTY
Original Assignee
Wonga Technology Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wonga Technology Limited filed Critical Wonga Technology Limited
Publication of WO2014184372A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/0035User-machine interface; Control console
    • H04N1/00405Output means
    • H04N1/00477Indicating status, e.g. of a job
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7837Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using objects detected or recognised in the video content
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B7/00Mountings, adjusting means, or light-tight connections, for optical elements
    • G02B7/28Systems for automatic generation of focusing signals
    • G02B7/36Systems for automatic generation of focusing signals using image sharpness techniques, e.g. image processing techniques for generating autofocus signals
    • GPHYSICS
    • G02OPTICS
    • G02BOPTICAL ELEMENTS, SYSTEMS OR APPARATUS
    • G02B7/00Mountings, adjusting means, or light-tight connections, for optical elements
    • G02B7/28Systems for automatic generation of focusing signals
    • G02B7/36Systems for automatic generation of focusing signals using image sharpness techniques, e.g. image processing techniques for generating autofocus signals
    • G02B7/365Systems for automatic generation of focusing signals using image sharpness techniques, e.g. image processing techniques for generating autofocus signals by analysis of the spatial frequency components of the image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/70Denoising; Smoothing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/73Deblurring; Sharpening
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/14Image acquisition
    • G06V30/142Image acquisition using hand-held instruments; Constructional details of the instruments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/0035User-machine interface; Control console
    • H04N1/00405Output means
    • H04N1/0049Output means providing a visual indication to the user, e.g. using a lamp
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/40Picture signal circuits
    • H04N1/409Edge or detail enhancement; Noise or error suppression
    • H04N1/4092Edge or detail enhancement
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81Monomedia components thereof
    • H04N21/8146Monomedia components thereof involving graphical data, e.g. 3D object, 2D graphics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/67Focus control based on electronic image sensor signals
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/70Circuitry for compensating brightness variation in the scene
    • H04N23/743Bracketing, i.e. taking a series of images with varying exposure conditions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20036Morphological image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20048Transform domain processing
    • G06T2207/20061Hough transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20192Edge enhancement; Edge preservation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30176Document
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/60Rotation of whole images or parts thereof
    • G06T3/608Rotation of whole images or parts thereof by skew deformation, e.g. two-pass or three-pass rotation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/17Image acquisition using hand-held instruments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N1/00Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N1/00127Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture
    • H04N1/00204Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture with a digital computer or a digital computer system, e.g. an internet server
    • H04N1/00244Connection or combination of a still picture apparatus with another apparatus, e.g. for storage, processing or transmission of still picture signals or of information associated with a still picture with a digital computer or a digital computer system, e.g. an internet server with a server, e.g. an internet server
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23418Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/27Server based end-user applications
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/4104Peripherals receiving signals from specially adapted client devices
    • H04N21/4126The peripheral being portable, e.g. PDAs or mobile phones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41Structure of client; Structure of client peripherals
    • H04N21/414Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
    • H04N21/41407Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2201/00Indexing scheme relating to scanning, transmission or reproduction of documents or the like, and to details thereof
    • H04N2201/0077Types of the still picture apparatus
    • H04N2201/0084Digital still camera
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N2201/00Indexing scheme relating to scanning, transmission or reproduction of documents or the like, and to details thereof
    • H04N2201/0096Portable devices

Definitions

  • This invention relates to methods and systems for the efficient capture of images of documents, such as cards and the like, using mobile devices.
  • Embodiments of the invention relate to capturing images for optical character recognition of data on a card using a mobile device such as a smart phone.
  • Optical character recognition techniques are known for the automated reading of characters. For example, scanners for the automated reading of text on A4 pages and for scanning text on business cards and the like are known. However, such devices and techniques typically operate in controlled lighting conditions and capture plain, non-reflective surfaces.
  • An embodiment of the invention provides a new approach to detecting the edge of a card in an image.
  • A new focus detection process is provided.
  • A new card framing process is provided.
  • FIG. 1 is a flow diagram showing the main process for capturing a card image
  • FIG. 6 shows the card image of Figure 4 after filtering and channel processing according to an embodiment of the invention;
  • FIG. 13 shows a final image resulting from the card framing process; and FIG. 14 shows an image uploading arrangement.
  • The invention may be embodied in methods of operating client devices, methods of using a system involving a client device, client devices, modules within client devices and computer instructions for controlling operation of client devices.
  • Client devices include personal computers, smart phones, tablet devices and other devices useable to access remote services.
  • A client device embodying the invention is arranged to capture an image of a document such as a card.
  • A card may be a credit card, debit card, store card, driving licence, ID card or any of a number of credit card sized items on which text and other details are printed.
  • Such items will simply be referred to hereafter as "cards", and include printed and embossed cards, with or without a background image.
  • Other objects with which the embodying device and methods may be used include cheques, printed forms and other such documents.
  • The embodying device and processes are arranged for the capture of images of rectangular documents, in particular cards, which are one type of document.
  • A system embodying the invention is shown in Figure 1.
  • The system shown in Figure 1 comprises a mobile client device 2, such as a smart phone or tablet device, and a server system 20 to which the client device connects via any known wired or wireless network, such as the Internet.
  • The client device 2 comprises a processor, memory, battery source, screen, camera and input devices such as a keyboard or touch screen.
  • Such hardware items are known and will not be described further.
  • The device is arranged to have a number of separate functional modules, each of which may be operable under the command of executable code. As such, the functional components may be considered as either hardware modules or as software components.
  • A video capture module 10 is arranged to produce a video stream of images comprising a sequence of frames.
  • The video capture module 10 will therefore include imaging optics, sensors, executable code and memory for producing a video stream.
  • The video capture module provides the sequence of frames to a card detection module 12 and a focus detection module 14.
  • The card detection module 12 provides the functionality for determining the edges of a card and then determining if the card is properly positioned.
  • This module provides an edge detection algorithm and a Hough transform based card detection algorithm. The latter consumes the edge images, which are generated by the former, and determines whether the card is properly positioned in each frame of the video stream.
  • The focus detection module 14 is arranged to determine which frames of a sequence of frames are in focus. One reason for providing such focus detection is that many smart phones do not allow applications to control the actual focus of the camera system, and so the card detection arrangement is reliant upon the camera autofocus.
  • This module features an adaptive threshold algorithm, which has been developed to determine the focus status of the card in each frame of the video stream.
  • A card framing module 16 is arranged to produce a final properly framed image of a card. This module combines a card detection process and a card framing algorithm and produces a properly framed card image from a high-resolution still image.
  • An image upload module 18 is arranged to upload the card image to the server 20.
  • A front end client application on the client device 2, comprising the modules described, produces a live video stream of the user's card using the user device's camera while the user positions the card in a specific region indicated by the application (referred to as the "card alignment box" shown in Figure 2).
  • The functional modules then operate as quickly as possible to produce a properly framed, in-focus image of the card.
  • The main modules operate as follows.
  • The card detection module 12 analyses live video frames to determine whether the card is properly positioned.
  • The focus detection module 14 processes live video frames and decides whether the camera is properly focused on the card. Once the card is properly positioned and in focus, the application causes the user device to take a new still image automatically.
  • The card framing module 16 consumes the still image and produces a properly framed card image for upload by an image upload module 18.
  • The properly framed card image is then uploaded to a backend server 20 for Optical Character Recognition (OCR).
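  • By way of illustration, the trigger logic described above might be sketched as follows in Python; all of the names here (detector, focus, camera and their methods) are illustrative placeholders, not part of the original disclosure:

        def process_frame(frame, detector, focus, camera):
            # Per-frame pipeline sketch: a high-resolution still image is
            # captured only when the card is both properly positioned and
            # in focus in the current video frame.
            positioned = detector.card_positioned(frame)   # card detection module
            focused = focus.is_focused(frame)              # focus detection module
            if positioned and focused:
                return camera.take_still_image()           # trigger still capture
            return None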
  • The high resolution images captured are also uploaded to remote storage of the server in the background via a queuing mechanism, shown here as the image upload module, designed to minimise the effect on the user experience of the application.
  • The output of the process is a properly framed card image in the sense that all background details are removed from the original image, only the card region is extracted, and the final card image has no perspective distortion, as shown in Figure 13 and described later.
  • Each module may be provided by dedicated hardware, but the preferred embodiment is for each module to be provided as program code executable by a client device.
  • A card detection process embodying the invention will now be described with reference to Figures 3 to 6.
  • The purpose of the card detection process is to assist a user to position the card correctly, and to decide whether the card is properly aligned within the frame and in focus.
  • The output is a captured high-resolution image and card position metrics.
  • The card detection module 12 operates a process as shown in Figure 3.
  • The process shown in Figure 3 operates on an incoming video stream. For each incoming video frame, the process performs the following steps:
  • The process extracts from the original frame a sub-image that potentially contains the card (as shown in Figure 4) and optionally downsamples the image to speed up subsequent processing.
  • Figure 5 shows an output of processing the image of Figure 4 using a known edge detection algorithm. As can be seen, many unwanted "edges" have been detected in addition to the actual card edges.
  • The process produces a binary image of edge segments (as shown in Figure 6) by applying a new edge detection algorithm to the downsampled sub-image.
  • The edge detection algorithm takes into account the nature of the image being analysed (a card) and uses techniques to improve upon more general known algorithms.
  • The edge detection is defined by steps 1 to 4 below.
  • The edge detection algorithm operates as follows:
  • Step 1 provides directional blurring: On the original image, in the top and bottom edge areas, use a one-dimensional horizontal blurring kernel; in the left and right edge areas use the corresponding vertical kernel (the transpose of the horizontal kernel). This operation removes some unwanted noise and intensifies the card edges.
  • The directional blurring may comprise one or more of a variety of processes that operate to reduce the rate of change of an image in a particular direction. Such processes include smoothing or blurring algorithms such as a Gaussian. In the preferred arrangement, the directional blurring operates in one dimension at a time on a line-by-line basis.
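  • A minimal sketch of such directional blurring, assuming OpenCV; the kernel length of three and the size of the edge areas are illustrative assumptions, since the original kernel values are not preserved in the source:

        import cv2
        import numpy as np

        def directional_blur(frame):
            # Blur horizontally in the top/bottom edge areas and vertically in
            # the left/right edge areas, suppressing noise across the expected
            # card edge direction while preserving the edges themselves.
            h, w = frame.shape[:2]
            band_h, band_w = h // 4, w // 4            # assumed edge-area size
            k = np.ones((1, 3), np.float32) / 3.0      # horizontal averaging kernel
            out = frame.copy()
            out[:band_h] = cv2.filter2D(frame[:band_h], -1, k)      # top area
            out[-band_h:] = cv2.filter2D(frame[-band_h:], -1, k)    # bottom area
            left = np.ascontiguousarray(frame[:, :band_w])
            right = np.ascontiguousarray(frame[:, -band_w:])
            out[:, :band_w] = cv2.filter2D(left, -1, k.T)           # left area
            out[:, -band_w:] = cv2.filter2D(right, -1, k.T)         # right area
            return out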
  • Step 2 uses a directional filter: A Sobel edge detector is preferably used to operate on the edge areas and output derivatives of the gradient changes. From the derivatives produced, the magnitudes and directions of the gradient changes are calculated.
  • A fixed threshold is applied to the outputted magnitudes to select pixels of strong gradient changes (usually edges in the image are preserved), further filtering the pixels based on the directions of gradient changes. For the top and bottom areas, horizontal edges (where gradient changes are nearly vertical) are preserved; for the left and right areas, vertical edges (where gradient changes are nearly horizontal) are preserved. Finally, a binary image is outputted, which contains only promising edge pixels.
  • The directional filter may be any of a number of different filters, all of which have in common that they produce an output giving the magnitude and direction of gradient changes in the image.
  • The top, bottom, left and right areas may be as defined in relation to step 1.
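  • A sketch of this directional filtering, assuming a Sobel operator; the magnitude threshold and the angular tolerance for "nearly vertical/horizontal" are illustrative values not specified in the source:

        import cv2
        import numpy as np

        def directional_edges(gray, keep_horizontal, mag_thresh=40.0, tol=20.0):
            # Binary image of promising edge pixels: strong gradients whose
            # direction suits the edge area (near-vertical gradients, i.e.
            # horizontal edges, for the top/bottom areas; near-horizontal
            # gradients, i.e. vertical edges, for the left/right areas).
            gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
            gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
            mag = cv2.magnitude(gx, gy)
            ang = np.degrees(np.arctan2(gy, gx)) % 180.0   # gradient direction
            strong = mag > mag_thresh                      # fixed magnitude threshold
            if keep_horizontal:
                direction_ok = np.abs(ang - 90.0) < tol    # gradient nearly vertical
            else:
                direction_ok = np.minimum(ang, 180.0 - ang) < tol
            return np.where(strong & direction_ok, 255, 0).astype(np.uint8)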
  • Step 3 Multi-channel processing: A directional filter such as the Sobel edge detector operates separately on the R, G and B channels, and the final derivatives of gradient changes are aggregated from all channels by taking the maximum value from the outputs of all channels at each pixel location.
  • Multi-channel processing increases the sensitivity of the card detection algorithm in cases where the environment in which the card image was captured is such that luminance contrast between the card and the background is low but chroma contrast is high.
  • The multi-channel processing may be in any colour space, or could be omitted entirely. The choice of the R, G, B colour space is preferred, but alternatives such as CMYK are also possible.
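  • A sketch of the multi-channel aggregation of step 3, assuming OpenCV's BGR channel ordering:

        import cv2
        import numpy as np

        def multichannel_gradient_magnitude(bgr):
            # Compute the gradient magnitude separately on each colour channel
            # and keep the per-pixel maximum over all channels, so that
            # chroma-only contrast still produces a strong edge response.
            mags = []
            for ch in cv2.split(bgr):                  # B, G and R channels
                gx = cv2.Sobel(ch, cv2.CV_32F, 1, 0, ksize=3)
                gy = cv2.Sobel(ch, cv2.CV_32F, 0, 1, ksize=3)
                mags.append(cv2.magnitude(gx, gy))
            return np.maximum.reduce(mags)             # per-pixel maximum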
  • Step 4 Directional Morphological Operation: On the filtered edge image, in the top and bottom edge areas, erode with the horizontal mask [1, 1, 1, 1, 1, 1] to remove false edges; in the left and right edge areas, erode with its transpose [1, 1, 1, 1, 1, 1]^T (a vertical mask). After erosion, apply dilation with the same masks in the corresponding edge areas. This operation removes some false edges and intensifies card edges.
  • The final image looks like Figure 6.
  • The morphological operations improve the output image by removing pixels that appear to be small "edges" (as shown by the edge clutter in Figure 5).
  • The erosion operation computes a local minimum over the specified kernel, and so reduces the visibility of vertical structures in the top and bottom areas and of horizontal structures in the left and right areas.
  • The dilation takes the maximum over the specified kernel, and so emphasises horizontal structures in the top and bottom areas and vertical structures in the side areas.
  • Erosion precedes dilation.
  • The purpose of the erosion step is to remove the remaining false edge segments in the binary image.
  • A dilation operation is then used to fill small gaps between the edge segments and to compensate for the effect of erosion on the genuine edge segments.
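  • A sketch of this directional opening (erosion followed by dilation) with the one-dimensional mask length described in the text:

        import cv2
        import numpy as np

        def directional_open(edges, horizontal):
            # Erode then dilate with a 1-D structuring element: a horizontal
            # mask for the top/bottom areas, its transpose for the left/right
            # areas. Short, misoriented edge fragments are removed and small
            # gaps in genuine card edges are closed again by the dilation.
            mask = np.ones((1, 6), np.uint8) if horizontal else np.ones((6, 1), np.uint8)
            eroded = cv2.erode(edges, mask)     # local minimum: drop false edges
            return cv2.dilate(eroded, mask)     # local maximum: restore true edges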
  • The process detects the card edges by using the Probabilistic Hough Transform for line detection on the binary image of edge segments. For each detected edge line that matches the specified conditions for a card edge (minimum length of the edge line, angle of the edge line, prediction error of the edge line), the process calculates, at step 38, line metrics (line function and line end points) for the detected edge line.
  • The Hough Transform provides extra information about the lines used in the process by which the edges within the image of Figure 6 are detected.
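  • A sketch of this line detection stage, assuming OpenCV's Probabilistic Hough Transform; the thresholds below are illustrative, not the values used in the embodiment:

        import cv2
        import numpy as np

        def detect_edge_lines(binary_edges, min_len=100):
            # Detect candidate card-edge lines on the 8-bit binary edge image
            # and return simple line metrics (end points, length, angle) for
            # each detected segment.
            lines = cv2.HoughLinesP(binary_edges, rho=1, theta=np.pi / 180,
                                    threshold=50, minLineLength=min_len,
                                    maxLineGap=10)
            metrics = []
            if lines is not None:
                for x1, y1, x2, y2 in lines[:, 0]:
                    metrics.append({
                        "ends": ((x1, y1), (x2, y2)),
                        "length": float(np.hypot(x2 - x1, y2 - y1)),
                        "angle": float(np.degrees(np.arctan2(y2 - y1, x2 - x1))),
                    })
            return metrics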
  • When all four edge lines have been detected, the card is considered to be properly positioned. If, at step 42, the card is also in focus, the application takes a high-resolution image at step 44. Otherwise, the process is repeated for the next frame.
  • The arrangement could use the video stream to provide the still image. However, devices tend to use lower resolutions for video streams, and so the step of capturing a single still image using the camera functionality of the user device is preferred.
  • The client application displays images to the user as follows. For each frame, those edges that have been detected are highlighted on the application screen by lighting up the corresponding edges of the displayed frame; such highlighting is turned off for edges that failed to be detected. (In Figure 2, all edges have been detected so all four edges are highlighted.)
  • The user interface by which the user is shown that they have correctly positioned the card within the boundary area is best understood with reference to Figure 2.
  • The user positions the card in front of the imaging optics of their smart phone or other user device and can view that card on the screen of the display of their device.
  • A boundary rectangle is shown giving the area within which the card should be positioned.
  • The algorithm described above operates on the video stream of the card image and, as each of the left, right, top and bottom edges is detected using the technique described above, those edges are indicated by highlighting on the display or changing the colour of the edge of the frame, so as to indicate to the user that they have correctly positioned the card within the appropriate area.
  • The calculation of line metrics at step 38 above may be provided by a known edge detector.
  • Edge detection algorithms such as the Canny edge detector and the Sobel edge detector are generalised edge detectors, which detect not only the edges of cards but also noise edges from a cluttered background, as shown in Figure 5. Accordingly, we have provided a new robust edge detection algorithm, described below, that filters out the noise edges and preserves the card edges accurately.
  • Focus Detection Process
  • The focus detection process uses underlying algorithms for focus metric calculation and focus discrimination. Focus metrics are calculated values that are highly correlated with the actual focus of the image. Focus discrimination is achieved by applying an adaptive threshold algorithm on the calculated focus metrics.
  • The focus detection aspects of an embodiment of the invention are shown in Figures 7a, 7b, 8 and 9.
  • The arrangement determines a focus metric for each frame of a video stream, using one or more algorithms operating on each frame, and then determines whether the current frame is in focus by determining whether the focus metric for that frame is above or below a threshold.
  • The threshold is an adaptive threshold in the sense that it varies adaptively depending upon the focus metrics for previous frames. In this way, as the focus metric of each frame in turn is determined, when the focus metric of a given frame is above the adaptive threshold, which varies based on the focus metrics of previous frames, the system determines that the focus of the current frame is sufficient. The fact that the focus is sufficient can then be used as part of the triggering of the capture of a still image of the card.
  • The choice of focus metrics used will first be discussed, followed by the manner in which the adaptive threshold is determined.
  • The embodiment uses five distinct focus metric calculation algorithms. Each of these algorithms produces a focus metric value for each sampled frame in a video stream. As shown in Figure 7a, the higher the focus metric value, the better the actual focus of the image, as can be seen intuitively from the example frames of Figure 7b. Only one focus metric is required for the focus detection algorithm; alternative algorithms may be used depending upon the metric that performs best.
  • A number of focus metrics may be used in an embodiment, but the preferred approach is to use a Discrete Cosine Transform (DCT).
  • A discrete cosine transform expresses a finite sequence of data points in terms of a sum of cosine functions oscillating at different frequencies.
  • Focus values are calculated block by block (a 4x4 block size is used in the preferred implementation, as shown by the sample areas in a region of interest in Figure 8).
  • A DCT transformation is applied to each image block, producing a representation of the block in the frequency domain.
  • The result contains a number of frequency components.
  • One of these components is the DC component, which represents the baseline of the image frequency.
  • The other components are considered to be high-frequency components.
  • The sum of all the quotients of the high-frequency components divided by the DC component is considered to be the focus value of the block.
  • The focus value of the image can be calculated by aggregating the focus values for all blocks.
  • The process for producing the preferred focus metric may therefore be summarised by the following steps: 1. For each 4x4 pixel block of the image, apply a 2D DCT operation and obtain a 4x4 DCT frequency map. 2. For each block, divide each high-frequency component by the DC component and sum the quotients to give the focus value of the block. 3. Aggregate the focus values of all blocks to give the focus value of the image.
  • The focus metric can be used on sub-images of the original image. Focus values can be calculated in regions of interest of the original image. This feature gives the application the ability to specify the region to focus on, as shown in Figure 8. By calculating a focus metric for small sub-regions of the original image, the CPU consumption is also reduced.
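  • A sketch of the block-wise DCT focus metric over an optional region of interest, assuming OpenCV; the treatment of coefficient signs and the guard against a zero DC component are implementation assumptions:

        import cv2
        import numpy as np

        def dct_focus_value(gray, roi=None):
            # Sum, over all 4x4 blocks of the region of interest, the
            # high-frequency DCT energy divided by the block's DC component.
            if roi is not None:
                x, y, w, h = roi
                gray = gray[y:y + h, x:x + w]
            f = gray[:gray.shape[0] - gray.shape[0] % 4,
                     :gray.shape[1] - gray.shape[1] % 4].astype(np.float32)
            total = 0.0
            for by in range(0, f.shape[0], 4):
                for bx in range(0, f.shape[1], 4):
                    block = np.ascontiguousarray(f[by:by + 4, bx:bx + 4])
                    coeffs = cv2.dct(block)            # 4x4 frequency map
                    dc = abs(coeffs[0, 0]) + 1e-6      # baseline (DC) component
                    hf = np.abs(coeffs).sum() - abs(coeffs[0, 0])
                    total += hf / dc                   # block focus value
            return total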
  • The system must cope with a wide variety of different devices under different lighting conditions and card surfaces. When one of these conditions changes, the focus metrics can output values with significantly different ranges. A fixed threshold cannot discriminate focused images for all variations of image capture conditions. In order to provide accurate focus discrimination, an adaptive threshold algorithm has been created which can automatically adjust threshold values according to the focus values of historical sampled frames.
  • The adaptive threshold algorithm uses the following features:
  • Sliding window: The algorithm retains the focus values of recently sampled frames within a sliding window.
  • The window moves with the live video stream, thereby retaining the focus values for a specified number of frames.
  • The window moves concurrently with the video stream, with newly sampled focus values added in from the right side of the window and old focus values dropped out from the left side of the window, as shown in Figure 9.
  • The adaptive algorithm then operates as follows in relation to the sliding window. For each newly sampled frame, the focus metric is calculated and the sliding window moved.
  • The adaptive threshold is recalculated based on an un-focused baseline, a focused baseline and a discrimination threshold for the focus values within the sliding window.
  • The focus value for the current frame is then compared to the adaptive threshold and the discrimination threshold and, if the focus value is above both, the frame is deemed to be in focus.
  • The values used within the focus detection process are as follows:
  • Minimum window size: The minimum number of sampled frames that must be present in the sliding window before the adaptive threshold algorithm is applied.
  • Maximum window size: The maximum number of sampled frames in the sliding window.
  • Adaptive threshold: This threshold value roughly separates focused frames from non-focused frames. It adapts itself according to the values in the sliding window. If there is no value above the adaptive threshold in the sliding window, the adaptive threshold shrinks; if there is no value below the adaptive threshold in the sliding window, the adaptive threshold grows. The adaptive threshold is adjusted whenever a new frame is sampled.
  • Adaptive threshold higher limit: The limit to which the adaptive threshold can grow.
  • Adaptive threshold lower limit: The limit to which the adaptive threshold can shrink.
  • Adaptive threshold growing speed: The speed at which the adaptive threshold grows.
  • Adaptive threshold shrinking speed: The speed at which the adaptive threshold shrinks.
  • Un-focused baseline: The mean of the focus values lower than the adaptive threshold in the sliding window.
  • Focused baseline: The larger of the mean of the focus values higher than the discrimination threshold in the sliding window, and the current adaptive threshold value.
  • Discrimination threshold: The threshold used for discriminating focused frames from unfocused frames. It is the largest value among: the adaptive threshold, double the un-focused baseline and 80% of the focused baseline. These numbers may change after parameter optimisation.
  • Using the combination of determining a focus metric for each frame and varying the adaptive threshold for that focus metric based on the focus metrics of a certain number of previous frames, as defined by the sliding window, an accurate determination of the focus of an image may be made within the user device.
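  • A simplified sketch of this adaptive threshold algorithm in Python. The window sizes, limits and growing/shrinking speeds are assumptions (the text leaves them as tunable parameters), and the focused baseline is computed against the adaptive threshold rather than the discrimination threshold to avoid the circular definition; the factor of two and the 80% figure follow the text:

        from collections import deque

        class AdaptiveFocusThreshold:
            def __init__(self, min_win=5, max_win=30,
                         lower=10.0, upper=1000.0, grow=1.05, shrink=0.95):
                self.window = deque(maxlen=max_win)  # sliding window of focus values
                self.min_win, self.lower, self.upper = min_win, lower, upper
                self.grow, self.shrink = grow, shrink
                self.adaptive = lower                # current adaptive threshold

            def is_focused(self, value):
                self.window.append(value)            # slide the window along
                if len(self.window) < self.min_win:
                    return False                     # not enough history yet
                if all(v <= self.adaptive for v in self.window):
                    # No value above the threshold: shrink towards the lower limit.
                    self.adaptive = max(self.lower, self.adaptive * self.shrink)
                elif all(v >= self.adaptive for v in self.window):
                    # No value below the threshold: grow towards the upper limit.
                    self.adaptive = min(self.upper, self.adaptive * self.grow)
                below = [v for v in self.window if v < self.adaptive]
                above = [v for v in self.window if v > self.adaptive]
                unfocused_base = sum(below) / len(below) if below else 0.0
                focused_base = max(sum(above) / len(above) if above else 0.0,
                                   self.adaptive)
                discrimination = max(self.adaptive,
                                     2.0 * unfocused_base,  # double un-focused baseline
                                     0.8 * focused_base)    # 80% of focused baseline
                return value > self.adaptive and value > discrimination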
  • An advantage of this is that the technique may be used across many different types of device using a process within a downloadable application and without direct control of the imaging optics (which is not available to applications for many user devices).
  • Some control of the imaging optics may be included. For example, some devices allow a focus request to be transmitted from a downloadable application to the imaging optics of the device, prompting the imaging optics to attempt to obtain focus by varying the lens focus. Although the device will do its best to focus on the object, it is not guaranteed to get a perfectly focused image using this autofocus function.
  • The imaging optics will then attempt to hunt for the correct focus position and, in doing so, the focus metric will vary for a period of time.
  • The process described above is then operable to determine when an appropriate focus has been achieved based on the variation of the focus metric during the period of time that the imaging optics hunts for the correct focus.
  • The card detection process is only re-run if needed, for example if the final image being processed is a freshly captured still image. If the image being used is, in fact, one of the frames of the video stream already analysed, the card edges may already be available from the earlier card detection process. If the algorithm fails to detect any of the four edge lines, the line metrics produced by the card detection process are used as the edge line metrics for the high-resolution image.
  • The next step is to extract the card region from the high-resolution image and resize it to 1200x752 pixels.
  • The arrangement has now produced a high resolution image of just the card, but the perspective may still require some correction if the card was not held perfectly parallel to the imaging sensor of the client device. For this reason a process is operated to identify the "corners" of the rectangular shape and then to apply perspective correction such that the corners are truly rectangular in position.
  • The next step is to extract the corner regions (for example 195x195 patches from the 1200x752 card region).
  • The process then "folds" the corner regions so that all the corners point to the northwest and can thus be treated the same way.
  • The folding process is known to the skilled person and involves translating and/or rotating the images.
  • The next step is to split each corner region into channels.
  • The process produces an edge image (for example using a Gaussian filter and Canny Edge Detector).
  • The separate processing of each channel is preferred, as this improves the quality, but a single channel could be used.
  • The next process step is to merge the edge images from all channels (for example using a max operator). This produces a single edge image that results from the combined edge image of each channel.
  • The edge image processing steps so far produce an edge image of each corner, as shown in Figure 10.
  • The process next identifies the exact corner points of the rectangular image.
  • The process draws the corresponding candidate edge line (produced in the first step) on each corner edge image, as shown in Figure 10. Then a template matching method is used to find the potential corner coordinates on the corner edge image.
  • Template matching techniques are known to the skilled person and involve comparing a template image to an image by sliding one with respect to the other. A template as shown in Figure 11 is used for this process.
  • The result matrix of the template matching method is shown in Figure 12. The brightest locations indicate the highest matches. In the result matrix, the brightest location is taken as the potential edge corner.
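  • A sketch of this corner localisation via template matching, assuming OpenCV; the particular matching method and the use of the match centre are assumptions:

        import cv2

        def locate_corner(corner_edges, template):
            # Slide the corner template over the folded corner edge image;
            # the brightest location of the result matrix is taken as the
            # candidate corner position.
            result = cv2.matchTemplate(corner_edges, template, cv2.TM_CCORR_NORMED)
            _, _, _, max_loc = cv2.minMaxLoc(result)   # brightest location
            tx, ty = max_loc                           # top-left of best match
            th, tw = template.shape[:2]
            return (tx + tw // 2, ty + th // 2)        # centre of the matched patch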
  • The corners are then unfolded to obtain the corner coordinates.
  • The process then perspectively corrects the card region specified by the corner coordinates and generates the final card image (this can be either a colour image or a grayscale image).
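  • A sketch of this perspective correction, assuming OpenCV and the 1200x752 card size used elsewhere in the text; corners are assumed to be ordered top-left, top-right, bottom-right, bottom-left:

        import cv2
        import numpy as np

        def correct_perspective(image, corners, size=(1200, 752)):
            # Warp the quadrilateral given by the four detected corner
            # coordinates onto an upright card rectangle, removing any
            # perspective distortion.
            w, h = size
            src = np.float32(corners)
            dst = np.float32([[0, 0], [w - 1, 0], [w - 1, h - 1], [0, h - 1]])
            m = cv2.getPerspectiveTransform(src, dst)
            return cv2.warpPerspective(image, m, (w, h))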
  • An example is shown in Figure 13. When complete, the device transmits the properly framed card image to the server.
  • Image Upload
  • The properly framed card image produced by the card framing process is immediately uploaded to the back-end OCR service for processing. Before uploading, the card image is resized to a size suitable for transmission (1200x752 is used in the current application).
  • The application can upload grayscale or colour images.
  • The final image uses JPEG compression and the degree of compression can be specified.
  • The original high resolution image captured is uploaded to a remote server or Cloud storage for further processing, such as fraud detection or face recognition based ID verification.
  • Image serialisation queue: A first-in-first-out (FIFO) queue maintaining the images to be serialised to the file system.
  • Image upload queue: A FIFO queue maintaining the path information of image files to be uploaded to remote storage.
  • Serialisation background thread: This serialises the images in the image serialisation queue from memory to the file system in the background.
  • Upload background thread: This uploads the images referenced by the path information in the image upload queue from the client's file system to a remote server or Cloud storage in the background.
  • Background upload process: After an image has been captured, the image is stored in memory on the client. The captured images are put in the image serialisation queue. The images in the queue are serialised to the client's file system one by one by the serialisation background thread. After serialisation, the image is removed from the image serialisation queue and the storage path information of the image file (not the image file itself) is put in the image upload queue. The upload background thread uploads the images referenced by the storage path information in the image upload queue one by one to remote storage. Once an image has been uploaded successfully, it is removed from file storage and its storage path information is also removed from the image upload queue. The image upload queue is also backed up on the file system, so the client can resume the image upload task if the client is restarted.
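  • A minimal sketch of this background serialisation and upload mechanism using Python's standard library; the storage directory, file naming and the send_to_server callback are illustrative placeholders:

        import os
        import queue
        import threading

        serialisation_q = queue.Queue()   # captured images held in memory (FIFO)
        upload_q = queue.Queue()          # paths of serialised files (FIFO)

        def serialisation_worker(storage_dir):
            # Write each captured image from memory to the file system, then
            # queue its path (not the image itself) for upload.
            n = 0
            while True:
                image_bytes = serialisation_q.get()    # blocks until an image arrives
                path = os.path.join(storage_dir, "capture_%d.jpg" % n)
                with open(path, "wb") as f:
                    f.write(image_bytes)
                upload_q.put(path)
                n += 1

        def upload_worker(send_to_server):
            # Upload serialised images one by one; delete each file once it
            # has been sent so the upload task can resume after a restart.
            while True:
                path = upload_q.get()
                with open(path, "rb") as f:
                    send_to_server(f.read())           # hypothetical transport callback
                os.remove(path)

        # Usage sketch: run both workers as background (daemon) threads.
        # threading.Thread(target=serialisation_worker, args=("/data",), daemon=True).start()
        # threading.Thread(target=upload_worker, args=(post_image,), daemon=True).start()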

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Library & Information Science (AREA)
  • Optics & Photonics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Graphics (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention concerns methods operable in client devices for capturing an image of a document. The methods comprise automatically capturing a still image when an edge detection algorithm determines that the document edges have been located. The detection algorithm involves processing to increase the prominence of horizontal structures in the upper and lower regions of a frame, and to increase the prominence of vertical structures in the left and right regions, so as to improve the likelihood of capturing a document. The invention also provides varying focus metrics allowing a document to be captured when the focus metrics exceed an adaptive threshold. The adaptive threshold takes into account a history of the variations of the focus metrics applied to a sequence of frames. The invention also provides processing for correcting the perspective of the image corners of a document so that the corners properly fit a rectangle.
PCT/EP2014/060154 2013-05-17 2014-05-16 Image capture using a client device WO2014184372A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB1308954.5A GB2517674A (en) 2013-05-17 2013-05-17 Image capture using client device
GB1308954.5 2013-05-17

Publications (1)

Publication Number Publication Date
WO2014184372A1 (fr) 2014-11-20

Family

ID=48746949

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2014/060154 WO2014184372A1 (fr) 2013-05-17 2014-05-16 Image capture using a client device

Country Status (2)

Country Link
GB (1) GB2517674A (fr)
WO (1) WO2014184372A1 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820968A (zh) * 2015-04-22 2015-08-05 University of Shanghai for Science and Technology Method for correcting the angle of a text document
CN108229368A (zh) * 2017-12-28 2018-06-29 Zhejiang Dahua Technology Co., Ltd. Video display method and device
CN108304839A (zh) * 2017-08-31 2018-07-20 Tencent Technology (Shenzhen) Co., Ltd. Image data processing method and device
CN108429877A (zh) * 2017-02-15 2018-08-21 Tencent Technology (Shenzhen) Co., Ltd. Image acquisition method and mobile terminal
US10341418B2 2015-11-06 2019-07-02 Microsoft Technology Licensing, Llc Reducing network bandwidth utilization during file transfer
CN112183517A (zh) * 2020-09-22 2021-01-05 Ping An Technology (Shenzhen) Co., Ltd. Card edge detection method, device and storage medium
US20230153697A1 (en) * 2016-06-23 2023-05-18 Capital One Services, Llc Systems and methods for automated object recognition

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105611147A (zh) * 2015-10-30 2016-05-25 Beijing Kuangshi Technology Co., Ltd. Photographing method and device
CN105512658B (zh) * 2015-12-03 2019-03-15 Xiaomi Inc. Image recognition method and device for a rectangular object

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7986806B2 (en) * 1994-11-16 2011-07-26 Digimarc Corporation Paper products and physical objects as means to access and control a computer or to navigate over or act as a portal on a network
US7301564B2 (en) * 2002-07-17 2007-11-27 Hewlett-Packard Development Company, L.P. Systems and methods for processing a digital captured image
JP4363151B2 (ja) * 2003-10-14 2009-11-11 Casio Computer Co., Ltd. Imaging apparatus, image processing method therefor, and program
JP4033198B2 (ja) * 2004-02-27 2008-01-16 Casio Computer Co., Ltd. Image processing apparatus, image projection apparatus, image processing method, and program
JP3874761B2 (ja) * 2004-03-18 2007-01-31 Sony Computer Entertainment Inc. Entertainment apparatus, entertainment method, and program
US7593595B2 (en) * 2004-08-26 2009-09-22 Compulink Management Center, Inc. Photographic document imaging system
US7729602B2 (en) * 2007-03-09 2010-06-01 Eastman Kodak Company Camera using multiple lenses and image sensors operable in a default imaging mode
CN101681432B (zh) * 2007-05-01 2013-11-06 Compulink Management Center, Inc. Picture document segmentation method and system
US7780084B2 (en) * 2007-06-29 2010-08-24 Microsoft Corporation 2-D barcode recognition
EP2166408B1 (fr) * 2008-09-17 2014-03-12 Ricoh Company, Ltd. Imaging device and imaging method using the same
CN102273190A (zh) * 2008-10-31 2011-12-07 Hewlett-Packard Development Company Method and digital imaging device adapted to select a focus setting
WO2010140159A2 (fr) * 2009-06-05 2010-12-09 Hewlett-Packard Development Company, L.P. Edge detection
US8630504B2 (en) * 2012-01-16 2014-01-14 Hiok Nam Tay Auto-focus image system

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5155775A (en) * 1988-10-13 1992-10-13 Brown C David Structured illumination autonomous machine vision system
WO2000014968A1 (fr) * 1998-09-10 2000-03-16 Wisconsin Alumni Research Foundation Reduction of ringing in decompressed images by a posteriori morphological filtering, and apparatus therefor
US20030095709A1 (en) * 2001-11-09 2003-05-22 Lingxiang Zhou Multiple image area detection in a digital image
US20050249430A1 (en) * 2004-05-07 2005-11-10 Samsung Electronics Co., Ltd. Image quality improving apparatus and method
US20060020203A1 (en) * 2004-07-09 2006-01-26 Aloka Co. Ltd. Method and apparatus of image processing to detect and enhance edges
US20070262148A1 (en) * 2006-05-11 2007-11-15 Samsung Electronics Co., Ltd. Apparatus and method for photographing a business card in portable terminal
US20110274353A1 (en) * 2010-05-07 2011-11-10 Hailong Yu Screen area detection method and screen area detection system
US20120163728A1 (en) * 2010-12-22 2012-06-28 Wei-Ning Sun Image rectification method
US20130051671A1 (en) * 2011-08-25 2013-02-28 Mark A. Barton Method for segmenting a composite image
US20130071033A1 (en) * 2011-09-21 2013-03-21 Tandent Vision Science, Inc. Classifier for use in generating a diffuse image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JUNG C R ET AL: "Rectangle detection based on a windowed Hough transform", COMPUTER GRAPHICS AND IMAGE PROCESSING, 2004. PROCEEDINGS. 17TH BRAZILIAN SYMPOSIUM ON CURITIBA, PR, BRAZIL 17-20 OCT. 2004, PISCATAWAY, NJ, USA, IEEE, 17 October 2004 (2004-10-17), pages 113 - 120, XP010737732, ISBN: 978-0-7695-2227-2, DOI: 10.1109/SIBGRA.2004.1352951 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104820968A (zh) * 2015-04-22 2015-08-05 University of Shanghai for Science and Technology Method for correcting the angle of a text document
US10341418B2 2015-11-06 2019-07-02 Microsoft Technology Licensing, Llc Reducing network bandwidth utilization during file transfer
US20230153697A1 * 2016-06-23 2023-05-18 Capital One Services, Llc Systems and methods for automated object recognition
US11783234B2 * 2016-06-23 2023-10-10 Capital One Services, Llc Systems and methods for automated object recognition
CN108429877A (zh) * 2017-02-15 2018-08-21 Tencent Technology (Shenzhen) Co., Ltd. Image acquisition method and mobile terminal
CN108304839A (zh) * 2017-08-31 2018-07-20 Tencent Technology (Shenzhen) Co., Ltd. Image data processing method and device
CN108304839B (zh) * 2017-08-31 2021-12-17 Tencent Technology (Shenzhen) Co., Ltd. Image data processing method and device
CN108229368A (zh) * 2017-12-28 2018-06-29 Zhejiang Dahua Technology Co., Ltd. Video display method and device
CN108229368B (zh) * 2017-12-28 2020-05-26 Zhejiang Dahua Technology Co., Ltd. Video display method and device
CN112183517A (zh) * 2020-09-22 2021-01-05 Ping An Technology (Shenzhen) Co., Ltd. Card edge detection method, device and storage medium
CN112183517B (zh) * 2020-09-22 2023-08-11 Ping An Technology (Shenzhen) Co., Ltd. Card edge detection method, device and storage medium

Also Published As

Publication number Publication date
GB201308954D0 (en) 2013-07-03
GB2517674A (en) 2015-03-04

Similar Documents

Publication Publication Date Title
US11967164B2 (en) Object detection and image cropping using a multi-detector approach
WO2014184372A1 (fr) Image capture using a client device
JP6255486B2 (ja) Method and system for information recognition
US9996741B2 (en) Systems and methods for classifying objects in digital images captured using mobile devices
US8457403B2 (en) Method of detecting and correcting digital images of books in the book spine area
RU2631765C1 (ru) Method and system for correcting perspective distortions in images occupying a double-page spread
EP2973226A1 (fr) Classifying objects in digital images captured using mobile devices
KR20130066819A (ko) Apparatus and method for character recognition based on a captured image
CN114255337A (zh) Document image correction method and apparatus, electronic device and storage medium
US20150112853A1 (en) Online loan application using image capture at a client device
US8306335B2 (en) Method of analyzing digital document images
CN107085699B (zh) Information processing apparatus, control method of information processing apparatus, and storage medium
US10373329B2 (en) Information processing apparatus, information processing method and storage medium for determining an image to be subjected to a character recognition processing
US10275888B2 (en) Algorithmic method for detection of documents in images
CN108304840B (zh) Image data processing method and apparatus
JP2017120455A (ja) Information processing apparatus, program and control method
KR102071975B1 (ko) Card payment apparatus and method using optical character recognition
JP6077873B2 (ja) Image processing apparatus and image processing method
KR101349672B1 (ko) Method for high-speed extraction of image features and apparatus supporting the same
Ettl et al. Text and image area classification in mobile scanned digitised documents
CN116401484A (zh) Processing method and apparatus, terminal and storage medium for digitising paper materials

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14724470

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14724470

Country of ref document: EP

Kind code of ref document: A1