WO2021262187A1 - Document image relighting

Info

Publication number: WO2021262187A1
Application number: PCT/US2020/039758
Authority: WO (WIPO/PCT)
Prior art keywords: image, document, varying, lighting surface, neural network
Other languages: French (fr)
Inventors: Lucas Nedel KIRSTEN, Ricardo RIBANI
Original Assignee: Hewlett-Packard Development Company, L.P.
Priority date: 2020-06-26
Filing date: 2020-06-26
Application filed by Hewlett-Packard Development Company, L.P.
Priority to PCT/US2020/039758
Publication of WO2021262187A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/90 Dynamic range modification of images or parts thereof
    • G06T 5/94 Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/60 Image enhancement or restoration using machine learning, e.g. neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition
    • G06V 30/16 Image preprocessing
    • G06V 30/164 Noise filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10141 Special mode during image acquisition
    • G06T 2207/10152 Varying illumination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N 1/387 Composing, repositioning or otherwise geometrically modifying originals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 1/00 Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N 1/46 Colour picture communication systems
    • H04N 1/56 Processing of colour picture signals
    • H04N 1/60 Colour correction or control
    • H04N 1/6083 Colour correction or control controlled by factors external to the apparatus
    • H04N 1/6086 Colour correction or control controlled by factors external to the apparatus by scene illuminant, i.e. conditions at the time of picture capture, e.g. flash, optical filter used, evening, cloud, daylight, artificial lighting, white point measurement, colour temperature


Abstract

A processor receives an image of a document having a varying lighting surface; the image may have been captured under varying environmental lighting conditions. The processor removes the varying lighting surface from the image of the document, and relights the image of the document with a non-varying lighting surface. The image of the document is thus relighted as if it had been captured under non-varying environmental lighting conditions.

Description

DOCUMENT IMAGE RELIGHTING
BACKGROUND
[0001] While information is increasingly communicated in electronic form with the advent of modern computing and networking technologies, physical documents, such as printed and handwritten sheets of paper and other physical media, are still often exchanged. Such documents can be converted to electronic form by a process known as optical scanning. Once a document has been scanned as a digital image, the resulting image may be archived, or may undergo further processing to extract information contained within the document image so that the information is more usable. For example, the document image may undergo optical character recognition (OCR), which converts the image into text that can be edited, searched, and stored more compactly than the image itself.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] FIG. 1 is a diagram of an example neural network for removing artifacts within a document image resulting from varying environmental conditions under which the image was captured.
[0003] FIG. 2 is a diagram of the example neural network of FIG. 1 in more detail, as a convolutional neural network.
[0004] FIG. 3 is a diagram of an example non-transitory computer-readable data storage medium for removing artifacts within a document image resulting from varying environmental conditions under which the image was captured.
[0005] FIG. 4 is a block diagram of an example computing device that can capture an image of a document under varying environmental conditions and that can relight the document image as if it had been captured under non-varying environmental lighting conditions.
[0006] FIG. 5 is a flowchart of an example method for removing artifacts within a document image resulting from varying environmental conditions under which the image was captured by relighting the image with a non-varying lighting surface.
DETAILED DESCRIPTION
[0007] As noted in the background, a physical document can be scanned as a digital image to convert the document to electronic form. Traditionally, dedicated scanning devices have been used to scan documents to generate images of the documents. Such dedicated scanning devices include sheetfed scanning devices, flatbed scanning devices, and document camera scanning devices. A dedicated scanning device can optimally light a document during scanning, so that the resulting scanned image is largely, if not completely, free of artifacts that may otherwise result from non-optimal lighting conditions. This is because the scanning device is able to control the lighting conditions under which the image is scanned.
[0008] However, with the near ubiquity of smartphones and other usually mobile computing devices that include cameras and other types of image sensors, documents are often scanned with such non-dedicated scanning devices. A difficulty with scanning documents using a non-dedicated scanning device is that the document images are generally captured under non-optimal lighting conditions. Stated another way, a non-dedicated scanning device may capture an image of a document under varying environmental lighting conditions due to a variety of different factors.
[0009] For example, varying environmental lighting conditions may result from the external light incident to the document varying over the document surface, because of a light source being off-axis from the document, or because of other physical objects casting shadows on the document. The physical properties of the document itself can contribute to varying environmental lighting conditions, such as when the document has folds, creases, or is otherwise not perfectly flat. The angle at which the non-dedicated scanning device is positioned relative to the document during image capture can also contribute to varying environmental lighting conditions.
[0010] Capturing an image of a document under varying environmental lighting conditions can imbue the captured image with undesirable artifacts. For example, such artifacts can include darkened areas within the image corresponding to shadows discernibly or indiscernibly cast during image capture. Existing approaches for enhancing document images captured by non-dedicated scanning devices to remove artifacts from the scanned images are usually general purpose, and do not focus on artifacts resulting from varying environmental lighting conditions. The approaches thus may remove such artifacts with less than satisfactory results.
[0011] Techniques described herein can remove artifacts within a captured image of a document that result from varying environmental lighting conditions. The image of the document can be relighted as if it had been captured under non-varying environmental lighting conditions, and thus under near-optimal if not optimal lighting conditions. A document image may have a varying lighting surface corresponding to the varying environmental lighting conditions under which the image was captured. The varying lighting surface can be removed from the document image prior to relighting the image with a non-varying lighting surface corresponding to non-varying environmental lighting conditions akin to those under which dedicated scanning devices scan documents.
[0012] FIG. 1 shows an example neural network 100 for removing artifacts within a document image resulting from varying environmental conditions under which the image was captured. The neural network 100 is more generally a machine learning model. The neural network 100 can include a varying lighting surface removal encoder network 102 and a relighting decoder network 104. The encoder and decoder networks 102 and 104 may thus be corresponding parts of the same neural network 100 (and thus parts of the same overall machine learning model), and may themselves each be considered a neural network.
[0013] In one implementation, the neural network 100 may be implemented similarly to the technique described in the technical journal article T. Sun et al., "Single Image Portrait Relighting," ACM Transactions on Graphics, vol. 38, no. 4, Article 79, July 2019. This article describes a technique for relighting a portrait photograph with an input target light. The neural network 100 of FIG. 1 differs from the article's described neural network in that the neural network 100 acts upon an image of a document, which the article does not describe. Unlike the article's neural network, the neural network 100 removes artifacts introduced into an image due to the varying environmental lighting conditions under which the image was captured, which the article does not contemplate.
[0014] A captured image 108 of a document with a varying lighting surface is input into the neural network 100, and the neural network 100 correspondingly outputs a relighted image 112 of the document with a non-varying lighting surface. The image 108 may have been captured by an image sensor of a computing device like a smartphone, under varying environmental lighting conditions. The lighting surface of the image 108 is the data integrated within the image 108 that results from the image 108 of the document having been captured under varying environmental lighting conditions; it is thus a varying lighting surface corresponding to these varying environmental lighting conditions.
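For illustration only, the following minimal Python (PyTorch) sketch models a lighting surface multiplicatively: the captured image is treated as document content modulated by a spatially varying brightness field. This multiplicative shading model, and every identifier in the sketch, is an assumption made for the example rather than a formula given in this disclosure.

```python
import torch

# Stand-in for document content under ideal, uniform lighting.
content = torch.rand(3, 256, 256)

# A varying lighting surface: brightness that falls off across the page,
# as if a light source were off-axis from the document.
ramp = torch.linspace(0.4, 1.0, 256)
varying_surface = ramp.view(1, 1, 256).expand(3, 256, 256)

# Captured image 108: the content with the varying surface baked in.
captured = content * varying_surface

# Relighting under the multiplicative assumption: divide out the varying
# surface, then apply a non-varying (all-white) surface in its place.
non_varying_surface = torch.ones_like(captured)
relighted = (captured / varying_surface.clamp(min=1e-3)) * non_varying_surface
```

In the neural network 100 the separation is learned rather than computed in closed form as above, but the end state is the same: the varying lighting surface is removed and a non-varying one is applied.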
[0015] The captured image 108 is specifically input into the encoder network 102 of the neural network 100. The encoder network 102 encodes a representation 109 of the image features of the image 108 of the document that does not include the varying lighting surface, and which is passed to the decoder network 104. The representation 109 of the image 108 can therefore be considered a representation of the image features of the image 108 of the document, such as a vector of image descriptions of the image 108. The encoder network 102 thus in effect extracts the varying lighting surface 110 from the captured image 108, which can be considered as being output by the encoder network 102. However, the extracted varying lighting surface 110 of the image 108 is not subsequently used within the neural network 100, and therefore can be discarded. The extracted varying lighting surface 110 may thus be employed just during the training of the neural network 100.
[0016] The decoder network 104 decodes the encoded representation of the image 108 passed by the encoder network 102 to regenerate the image 108 of the document as the image 112. In generating the image 112 of the document, the decoder network 104 relights the image 108 with a non-varying lighting surface 106. Therefore, the image 112 is a relighted image, corresponding to the captured image 108 with the varying lighting surface 110 removed and with the non-varying lighting surface 106 added. The non-varying lighting surface 106 corresponds to non-varying lighting conditions, such as those under which a dedicated scanning device may scan images of documents.
[0017] The non-varying lighting surface 106 may not be separately input into the decoder network 104, which is another way by which the neural network 100 differs from the neural network of the article referenced above. The non-varying lighting surface 106 can instead be integrated within the decoder network 104 itself. The non-varying lighting surface 106 may be constructively represented as a reference blank one-color image. Such a reference blank one-color image may be conceptualized as an ideal blank sheet of paper of the same size as the document captured as the image 108, with white pixels at non-varying maximum brightness and contrast.
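Constructing such a reference blank one-color image is straightforward; the short sketch below does so in PyTorch, assuming RGB values in [0, 1] and an arbitrary illustrative resolution (in practice the size would match the image 108).

```python
import torch

# Assumed capture resolution for illustration.
height, width = 1024, 768

# Reference blank one-color image: an ideal blank sheet of paper with
# white pixels at non-varying maximum brightness.
reference_white = torch.ones(1, 3, height, width)
```

As described above, this surface need not be fed to the decoder network 104 at inference; it is instead folded into the decoder layers themselves as constants, as FIG. 2 details.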
[0018] FIG. 2 shows the neural network 100 in more detail, as a convolutional neural network. A convolutional neural network is a type of deep neural network, which can be employed in the context of image analysis and processing. The encoder and decoder networks 102 and 104 may thus also be considered convolutional neural networks. The encoder network 102 has cascading encoder layers 202A, 202B, . . ., 202N of decreasing spatial resolution, which are collectively referenced as the layers 202. The decoder network 104 similarly has cascading decoder layers 204A, . . ., 204M, 204N of increasing spatial resolution, which are collectively referenced as the layers 204. The layers 202 and 204 can include convolutional layers, batch normalization layers, and activation layers, for instance.
[0019] The captured image 108 is input into the encoder layer 202A corresponding to the highest spatial resolution. The encoder layers 202 sequentially process the captured image 108 in cascading fashion, with the image 108 downsampled in spatial resolution from one layer 202 to the next layer 202 as indicated by arrow 206. Each encoder layer 202 encodes the representation 109 of the features of the image 108 in correspondence with its spatial resolution. Each encoder layer 202 further passes the image 108 as downsampled to the next layer 202. Processing by the encoder network 102 therefore occurs from the layer 202A at maximum resolution to the layer 202N at minimum resolution, at which point the extracted varying lighting surface 110 (at all resolutions) may be output and discarded.
[0020] The representation 109 of the image features is distributively input into the decoder network 104 over its decoder layers 204. Each decoder layer 204 is input the representation 109 from the encoder layer 202 at the same spatial resolution. The decoder layers 204A, . . ., 204M, 204N therefore respectively correspond to the encoder layers 202N, . . ., 202B, 202A in spatial resolution. The decoder layers 204 regenerate the captured image 108 from which the varying lighting surface 110 has been removed and to which a non-varying lighting surface has been added, as the relighted image 112. The decoder layers 204 sequentially generate the relighted image 112 in cascading fashion, with the image 112 upsampled in spatial resolution from one layer 204 to the next layer 204 as indicated by arrow 210.
[0021] The non-varying lighting surface is in effect representatively integrated within the decoder layers 204 as respective constants 208A, . . ., 208M, 208N, collectively referenced as the constants 208. In other words, each decoder layer 204 is hardcoded to generate the relighted image 112 in correspondence with its spatial resolution such that the captured image 108 is relighted by a non-varying lighting surface. Each decoder layer 204 decodes the representation 109 of the image features, and passes the relighted image 112 as upsampled to the next layer 204. Processing by the decoder network 104 occurs from the layer 204A at minimum resolution to the layer 204N at maximum resolution, with the generated relighted image 112, having the same resolution as the captured image 108, output at the layer 204N.
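The cascading structure described in FIG. 2 resembles a U-Net-style convolutional encoder-decoder. The following PyTorch sketch is one hypothetical realization, assuming three resolution levels, the convolution, batch normalization, and activation layers mentioned above, and a learned per-layer constant standing in for the constants 208; the layer widths, the pooling and upsampling choices, and all identifiers are illustrative assumptions rather than details given in this disclosure.

```python
import torch
import torch.nn as nn

class EncoderLayer(nn.Module):
    """One cascading encoder layer (a layer 202): convolution, batch
    normalization, and activation, then downsampling (arrow 206)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.down = nn.MaxPool2d(2)  # halve the spatial resolution

    def forward(self, x):
        feats = self.features(x)     # representation 109 at this resolution
        return feats, self.down(feats)

class DecoderLayer(nn.Module):
    """One cascading decoder layer (a layer 204): upsampling (arrow 210),
    merging the representation 109 from the encoder layer at the same
    resolution, then convolution, batch normalization, and activation.
    A learned constant per layer stands in for a constant 208."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear",
                              align_corners=False)
        self.features = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.light_const = nn.Parameter(torch.ones(1, out_ch, 1, 1))

    def forward(self, x, skip):
        x = self.up(x)
        x = self.features(torch.cat([x, skip], dim=1))
        return x + self.light_const  # inject the non-varying lighting term

class DocumentRelightingNet(nn.Module):
    """Hypothetical encoder-decoder in the spirit of FIG. 2: the encoder
    strips the varying lighting surface, and the decoder regenerates the
    image relighted with a non-varying lighting surface."""
    def __init__(self):
        super().__init__()
        self.enc = nn.ModuleList([
            EncoderLayer(3, 16),    # layer 202A, highest resolution
            EncoderLayer(16, 32),   # layer 202B
            EncoderLayer(32, 64),   # layer 202N, lowest resolution
        ])
        self.dec = nn.ModuleList([
            DecoderLayer(64, 64, 32),  # layer 204A, lowest resolution
            DecoderLayer(32, 32, 16),  # layer 204M
            DecoderLayer(16, 16, 16),  # layer 204N, highest resolution
        ])
        self.head = nn.Sequential(nn.Conv2d(16, 3, kernel_size=1),
                                  nn.Sigmoid())

    def forward(self, image):  # image: (B, 3, H, W), H and W divisible by 8
        skips, x = [], image
        for enc in self.enc:
            feats, x = enc(x)
            skips.append(feats)
        # The varying lighting surface is simply never reconstructed here,
        # mirroring its being discarded at inference.
        for dec, skip in zip(self.dec, reversed(skips)):
            x = dec(x, skip)
        return self.head(x)    # relighted image 112, same size as the input
```

The skip connections deliver the representation 109 from each encoder layer to the decoder layer at the matching spatial resolution, as described for FIG. 2; adding a learned constant within each decoder layer is one plausible reading of the constants 208 being representatively integrated within the decoder.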
[0022] FIG. 3 shows an example non-transitory computer-readable data storage medium 300. The computer-readable data storage medium stores program code 302 executable by a computing device, such as a smartphone or other mobile computing device, to perform processing. The processing includes causing an image sensor to capture an image of a document that is under varying environmental lighting conditions (304). The image sensor may be part of the same computing device that is executing the program code 302. The image sensor may capture the document image as a whole, as opposed to on a line-by-line basis as sheetfed and flatbed dedicated scanning devices do.
[0023] The processing includes removing artifacts within the document image that result from the varying environmental lighting conditions under which the image was captured (306), by relighting the image as if it had been captured under non-varying environmental lighting conditions. The varying lighting surface of the document image corresponding to the varying environmental lighting conditions under which the image was captured may be removed from the image, and the image relighted with a non-varying lighting surface corresponding to the non-varying environmental lighting conditions. For instance, a machine-learning model like a neural network, such as a convolutional neural network, may be employed, as has been described.
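As an illustration of such program code, the sketch below loads a captured document image, runs it through the DocumentRelightingNet sketched earlier, and saves the relighted result. Trained weights would be required for a useful output (none are provided here), and the file names, padding strategy, and use of torchvision are assumptions made for the example.

```python
import torch
import torch.nn.functional as F
import torchvision.transforms.functional as TF
from PIL import Image

model = DocumentRelightingNet()  # from the earlier sketch; untrained here
model.eval()

img = Image.open("captured_document.jpg").convert("RGB")
x = TF.to_tensor(img).unsqueeze(0)     # (1, 3, H, W), values in [0, 1]

# Pad so height and width are divisible by 8 (three downsampling steps).
_, _, h, w = x.shape
x = F.pad(x, (0, (-w) % 8, 0, (-h) % 8), mode="replicate")

with torch.no_grad():
    relighted = model(x)[..., :h, :w]  # crop back to the original size

TF.to_pil_image(relighted.squeeze(0)).save("relighted_document.png")
```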
[0024] FIG. 4 shows an example computing device 400. The computing device 400 may be a smartphone or other mobile computing device, for instance. The computing device 400 includes an image sensor 402 and image enhance hardware 404. The image enhance hardware 404 may include a processor and a non-transitory computer-readable data storage medium storing program code that the processor executes. The processor may be a general-purpose processor separate from the data storage medium. The processor may instead be a special-purpose processor integrated with the data storage medium, as is the case with an application-specific integrated circuit (ASIC), as one example.
[0025] The image sensor 402 captures an image of a document under varying environmental conditions. The image sensor 402 may capture the document image as a whole, as opposed to on a line-by-line basis as sheetfed and flatbed dedicated scanning devices do. The image enhance hardware 404 relights the captured image of the document as if the image had been captured under non-varying environmental lighting conditions. The image enhance hardware 404 may employ a machine-learning model like a neural network, such as a convolutional neural network, as has been described. The image enhance hardware 404 may thus remove the varying lighting surface of the document image corresponding to the varying environmental lighting conditions under which the image was captured, and then add a non-varying lighting surface corresponding to the non-varying environmental lighting conditions.
[0026] FIG. 5 shows an example method 500. The method 500 can be performed by a processor, such as the (general- or special-purpose) processor of a computing device like a smartphone or other mobile computing device. The method 500 includes receiving an image of a document having a varying lighting surface (502). The document image may have been captured under varying environmental lighting conditions to which the varying lighting surface corresponds. The method 500 includes removing the varying lighting surface from the document image (504), such as by using an encoder neural network like an encoder convolutional neural network as has been described. The method 500 includes relighting the document image from which the varying lighting surface has been removed with a non-varying lighting surface (506), such as by using a decoder neural network like a decoder convolutional neural network as has been described.
[0027] Techniques have been described herein for removing artifacts within an image of a document captured under varying environmental lighting conditions. The artifacts that are removed are those resulting from these varying environmental lighting conditions. The described techniques have been demonstrated to better enhance such a scanned document in this respect, as compared to other scanned document artifact removal techniques that do not specifically consider the environmental lighting conditions under which an image of the document was captured.

Claims

We claim:
1. A non-transitory computer-readable data storage medium comprising program code executable by a processor of a computing device to perform processing comprising: causing an image sensor of the computing device to capture an image of a document under varying environmental lighting conditions; and removing artifacts within the image of the document resulting from the varying environmental lighting conditions by relighting the image of the document as if the image of the document had been captured under non-varying environmental lighting conditions.
2. The non-transitory computer-readable data storage medium of claim 1, wherein relighting the image of the document as if the image of the document had been captured under non-varying environmental lighting conditions comprises: removing a varying lighting surface from the image of the document, the varying lighting surface corresponding to the varying environmental lighting conditions under which the image of the document was captured by the image sensor.
3. The non-transitory computer-readable data storage medium of claim 2, wherein relighting the image of the document as if the image of the document had been captured under non-varying environmental lighting conditions further comprises: relighting the image of the document, from which the varying lighting surface has been removed, with a non-varying lighting surface.
4. The non-transitory computer-readable data storage medium of claim 3, wherein the non-varying lighting surface is constructively represented as a reference blank one-color image during relighting of the image of the document with the non-varying lighting surface.
5. The non-transitory computer-readable data storage medium of claim 3, wherein removing the varying lighting surface from the image of the document comprises using an encoder convolutional neural network having a plurality of cascading encoder layers of decreasing spatial resolution, and wherein relighting the image of the document with the non-varying lighting surface comprises using a decoder convolutional neural network having a plurality of cascading decoder layers of increasing spatial resolution.
6. The non-transitory computer-readable data storage medium of claim 5, wherein the non-varying lighting surface is representatively integrated within the decoder convolutional neural network as constants within the cascading decoder layers and is not separately input into the encoder convolutional neural network.
7. A computing device comprising: an image sensor to capture an image of a document under varying environmental lighting conditions; and image enhance hardware to relight the image of the document as if the image of the document had been captured under non-varying environmental lighting conditions.
8. A method comprising: receiving, by a processor, an image of a document having a varying lighting surface; removing, by the processor, the varying lighting surface from the image of the document; and relighting, by the processor, the image of the document with a non-varying lighting surface.
9. The method of claim 8, wherein removing the varying lighting surface from the image of the document comprises using an encoder convolutional neural network having a plurality of cascading encoder layers of decreasing spatial resolution, the image of the document being input into a first cascading encoder layer having a highest spatial resolution, and wherein at each cascading encoder convolutional layer, the encoder convolutional neural network outputs a representation of the image of the document from which the varying lighting surface has been removed.
10. The method of claim 9, wherein the encoder convolutional neural network outputs the varying lighting surface at a last cascading encoder layer having a lowest spatial resolution.
11. The method of claim 9, wherein relighting the image of the document with the non-varying lighting surface comprises using a decoder convolutional neural network having a plurality of cascading decoder layers of increasing spatial resolution, and wherein the representation of the image of the document output at each cascading encoder layer is input into the cascading decoder convolutional layer having a same resolution.
12. The method of claim 11, wherein the decoder convolutional neural network outputs the image of the document with the non-varying lighting surface from a last cascading decoder layer having a highest spatial resolution.
13. The method of claim 11, wherein the non-varying lighting surface is representatively integrated within the decoder convolutional neural network as constants within the cascading decoder layers and is not separately input into the encoder convolutional neural network.
14. The method of claim 11, wherein the decoder and encoder convolutional neural networks correspond to one another as different parts of an overall machine learning model.
15. The method of claim 8, wherein the non-varying lighting surface is constructively represented during relighting of the image of the document as a reference blank one-color image.
PCT/US2020/039758 (priority date 2020-06-26, filing date 2020-06-26): Document image relighting, WO2021262187A1 (en)

Priority Applications (1)

Application: PCT/US2020/039758 (WO2021262187A1, en)
Priority date: 2020-06-26
Filing date: 2020-06-26
Title: Document image relighting

Applications Claiming Priority (1)

Application: PCT/US2020/039758 (WO2021262187A1, en)
Priority date: 2020-06-26
Filing date: 2020-06-26
Title: Document image relighting

Publications (1)

Publication number: WO2021262187A1 (en)

Family

Family ID: 79281684

Family Applications (1)

Application: PCT/US2020/039758 (WO2021262187A1, en)
Priority date: 2020-06-26
Filing date: 2020-06-26
Title: Document image relighting

Country Status (1)

Country Link
WO (1) WO2021262187A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018002533A1 (en) * 2016-06-30 2018-01-04 Fittingbox Method for concealing an object in an image or a video and associated augmented reality method
GB2572435A (en) * 2018-03-29 2019-10-02 Samsung Electronics Co Ltd Manipulating a face in an image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIANCHENG SUN ET AL.: "Single Image Portrait Relighting", ACM Transactions on Graphics, vol. 38, no. 4, Article 79, July 2019 (2019-07-01). XP058452114. DOI: 10.1145/3306346.3323008. Retrieved from the Internet: <URL:https://arxiv.org/abs/1905.00824> *


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20941906

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20941906

Country of ref document: EP

Kind code of ref document: A1