WO2021262187A1 - Document image relighting - Google Patents
- Publication number: WO2021262187A1
- Application number: PCT/US2020/039758
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T5/94—Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/60—Image enhancement or restoration using machine learning, e.g. neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/16—Image preprocessing
- G06V30/164—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10141—Special mode during image acquisition
- G06T2207/10152—Varying illumination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N1/00—Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
- H04N1/387—Composing, repositioning or otherwise geometrically modifying originals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N1/00—Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
- H04N1/46—Colour picture communication systems
- H04N1/56—Processing of colour picture signals
- H04N1/60—Colour correction or control
- H04N1/6083—Colour correction or control controlled by factors external to the apparatus
- H04N1/6086—Colour correction or control controlled by factors external to the apparatus by scene illuminant, i.e. conditions at the time of picture capture, e.g. flash, optical filter used, evening, cloud, daylight, artificial lighting, white point measurement, colour temperature
Definitions
- The processing includes removing artifacts within the document image that result from the varying environmental lighting conditions under which the image was captured (306), by relighting the image as if it had been captured under non-varying environmental lighting conditions.
- The varying lighting surface of the document image corresponding to the varying environmental lighting conditions under which the image was captured may be removed from the image, and the image relighted with a non-varying lighting surface corresponding to the non-varying environmental lighting conditions.
- A machine-learning model like a neural network, such as a convolutional neural network, may be employed, as has been described.
- FIG. 4 shows an example computing device 400.
- The computing device 400 may be a smartphone or other mobile computing device, for instance.
- The computing device 400 includes an image sensor 402 and image enhance hardware 404.
- The image enhance hardware 404 may include a processor and a non-transitory computer-readable data storage medium storing program code that the processor executes.
- The processor may be a general-purpose processor separate from the data storage medium.
- The processor may instead be a special-purpose processor integrated with the data storage medium, as is the case with an application-specific integrated circuit (ASIC), as one example.
- The image sensor 402 captures an image of a document under varying environmental lighting conditions.
- The image sensor 402 may capture the document image as a whole, as opposed to on a line-by-line basis as sheetfed and flatbed dedicated scanning devices do.
- The image enhance hardware 404 relights the captured image of the document as if the image had been captured under non-varying environmental lighting conditions.
- The image enhance hardware 404 may employ a machine-learning model like a neural network, such as a convolutional neural network, as has been described. The image enhance hardware 404 may thus remove the varying lighting surface of the document image corresponding to the varying environmental lighting conditions under which the image was captured, and then add a non-varying lighting surface corresponding to the non-varying environmental lighting conditions.
- FIG. 5 shows an example method 500.
- The method 500 can be performed by a processor, such as the (general- or special-purpose) processor of a computing device like a smartphone or other mobile computing device.
- The method 500 includes receiving an image of a document having a varying lighting surface (502).
- The document image may have been captured under varying environmental lighting conditions to which the varying lighting surface corresponds.
- The method 500 includes removing the varying lighting surface from the document image (504), such as by using an encoder neural network like an encoder convolutional neural network as has been described.
- The method 500 includes relighting the document image from which the varying lighting surface has been removed with a non-varying lighting surface (506), such as by using a decoder neural network like a decoder convolutional neural network as has been described.
Abstract
A processor receives an image of a document that has a varying lighting surface, the image having been captured under varying environmental lighting conditions. The processor removes the varying lighting surface from the image of the document. The processor then relights the image of the document with a non-varying lighting surface. The image of the document may thus be relighted as if it had been captured under non-varying environmental lighting conditions.
Description
DOCUMENT IMAGE RELIGHTING
BACKGROUND
[0001] While information is increasingly communicated in electronic form with the advent of modern computing and networking technologies, physical documents, such as printed and handwritten sheets of paper and other physical media, are still often exchanged. Such documents can be converted to electronic form by a process known as optical scanning. Once a document has been scanned as a digital image, the resulting image may be archived, or may undergo further processing to extract information contained within the document image so that the information is more usable. For example, the document image may undergo optical character recognition (OCR), which converts the image into text that can be edited, searched, and stored more compactly than the image itself.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] FIG. 1 is a diagram of an example neural network for removing artifacts within a document image resulting from varying environmental conditions under which the image was captured.
[0003] FIG. 2 is a diagram of the example neural network of FIG. 1 in more detail, as a convolutional neural network.
[0004] FIG. 3 is a diagram of an example non-transitory computer-readable data storage medium for removing artifacts within a document image resulting from varying environmental conditions under which the image was captured.
[0005] FIG. 4 is a block diagram of an example computing device that can capture an image of a document under varying environmental conditions and that can relight the document image as if it had been captured under non-varying environmental lighting conditions.
[0006] FIG. 5 is a flowchart of an example method for removing artifacts within a document image resulting from varying environmental conditions under which the image was captured, by relighting the image with a non-varying lighting surface.
DETAILED DESCRIPTION
[0007] As noted in the background, a physical document can be scanned as a digital image to convert the document to electronic form. Traditionally, dedicated scanning devices have been used to scan documents to generate images of the documents. Such dedicated scanning devices include sheetfed scanning devices, flatbed scanning devices, and document camera scanning devices. A dedicated scanning device can optimally light a document during scanning, so that the resulting scanned image is largely, if not completely, free from artifacts that may otherwise result from non-optimal lighting conditions. This is because the scanning device is able to control the lighting conditions under which the image is scanned.
[0008] However, with the near-ubiquity of smartphones and other usually mobile computing devices that include cameras and other types of image sensors, documents are often scanned with such non-dedicated scanning devices. A difficulty with scanning documents using a non-dedicated scanning device is that the document images are generally captured under non-optimal lighting conditions. Stated another way, a non-dedicated scanning device may capture an image of a document under varying environmental lighting conditions due to a variety of different factors.
[0009] For example, varying environmental lighting conditions may result from the external light incident to the document varying over the document surface, because of a light source being off-axis from the document, or because of other physical objects casting shadows on the document. The physical properties of the document itself can contribute to varying environmental lighting conditions, such as when the document has folds, creases, or is otherwise not perfectly flat. The angle at which the non-dedicated scanning device is positioned relative to the document during image capture can also contribute to varying environmental lighting conditions.
[0010] Capturing an image of a document under varying environmental lighting conditions can imbue the captured image with undesirable artifacts. For example, such artifacts can include darkened areas within the image in correspondence with shadows discernibly or indiscernibly cast during image capture. Existing approaches for enhancing document images captured by non-dedicated scanning devices to remove artifacts from the scanned images are usually general purpose, and do not focus on artifacts resulting from varying environmental lighting conditions. The approaches thus may remove such artifacts with less than satisfactory results.
[0011] Techniques described herein can remove artifacts within a captured image of a document that result from varying environmental lighting conditions. The image of the document can be relighted as if it had been captured under non-varying environmental lighting conditions, and thus under near-optimal if not optimal lighting conditions. A document image may have a
varying lighting surface corresponding to the varying environmental lighting conditions under which the image was captured. The varying lighting surface can be removed from the document image prior to relighting the image with a non-varying lighting surface corresponding to non-varying environmental lighting conditions akin to those under which dedicated scanning devices scan documents.
[0012] FIG. 1 shows an example neural network 100 for removing artifacts within a document image resulting from varying environmental conditions under which the image was captured. The neural network 100 is more generally a machine learning model. The neural network 100 can include a varying lighting surface removal encoder network 102 and a relighting decoder network 104. The encoder and decoder networks 102 and 104 may thus be corresponding parts of the same neural network 100 (and thus parts of the same overall machine learning model), and may themselves each be considered a neural network.
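The patent gives no code for the encoder/decoder split of FIG. 1, but the pipeline it describes (extract and remove the varying lighting surface, then relight with a non-varying one) can be illustrated without any learned model. In the sketch below, all names are illustrative, and a heavy box blur stands in for the trained encoder network 102's estimate of the low-frequency lighting surface; the actual patent uses a neural network, not a blur.

```python
import numpy as np

def box_blur(img, k):
    # Mean filter via an integral image; a crude, non-learned stand-in
    # for the encoder's estimate of the low-frequency lighting surface.
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode="edge")
    ii = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    ii[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    return (ii[k:k + h, k:k + w] - ii[:h, k:k + w]
            - ii[k:k + h, :w] + ii[:h, :w]) / (k * k)

def relight(image, k=15, white_level=1.0):
    # Remove the varying lighting surface, then apply a non-varying one.
    surface = box_blur(image, k)                     # varying lighting surface (110)
    reflectance = image / np.maximum(surface, 1e-6)  # surface removed
    return np.clip(reflectance * white_level, 0.0, 1.0)  # flat surface (106)
```

On a synthetic page whose illumination ramps from dark to bright, the relighted background flattens toward uniform white while dark text pixels stay dark, which is the qualitative behavior the patent attributes to the neural network 100.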
[0013] The neural network 100 may, in one implementation, be implemented similarly to the technique described in the technical journal article T. Sun et al., “Single Image Portrait Relighting,” ACM Transactions on Graphics, vol. 38, no. 4, Article 79, published in July 2019. This article describes a technique for relighting a portrait photograph with an input target light. The neural network 100 of FIG. 1 differs from the article’s described neural network in that the neural network 100 acts upon an image of a document, which the article does not describe. Unlike the article’s neural network, the neural network 100 removes artifacts introduced into an image due to the varying environmental lighting conditions under which the image was captured, which the article does not contemplate.
[0014] A captured image 108 of a document with a varying lighting surface is input into the neural network 100, and the neural network 100 correspondingly outputs a relighted image 112 of the document with a non-varying lighting surface. The image 108 may have been captured by an image sensor of a computing device like a smartphone, under varying environmental lighting conditions. The lighting surface of the image 108 is the data integrated within the image 108 that results from the image 108 of the document having been captured under varying environmental lighting conditions, and thus is a varying lighting surface corresponding to these varying environmental lighting conditions.
[0015] The captured image 108 is specifically input into the encoder network 102 of the neural network 100. The encoder network 102 encodes a representation 109 of the image features of the image 108 of the document that does not include the varying lighting surface, and which is passed to the decoder network 104. The representation 109 of the image 108 can therefore be considered a representation of the image features of the image 108 of the document, such as a vector of image descriptions of the image 108. The encoder network 102 thus in effect extracts the varying lighting surface 110 from the captured image 108, which can be considered as being output by the encoder network 102. However, the extracted varying lighting surface 110 of the image 108 is not subsequently used within the neural network 100, and therefore can be discarded. The extracted varying lighting surface 110 may thus be employed just during the training of the neural network 100.
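Paragraph [0015] notes that the extracted varying lighting surface 110 is used only during training. The patent does not specify a training objective; one hypothetical way such a surface could be used at train time is as an auxiliary supervision term alongside the reconstruction loss on the relighted image. Both the squared-error form and the weighting below are assumptions, not taken from the patent.

```python
import numpy as np

def training_loss(pred_image, target_image, pred_surface, target_surface, w=0.5):
    # Hypothetical objective: reconstruction error on the relighted image,
    # plus supervision on the extracted varying lighting surface. At
    # inference the surface term is dropped and the surface discarded.
    recon = float(np.mean((pred_image - target_image) ** 2))
    surface = float(np.mean((pred_surface - target_surface) ** 2))
    return recon + w * surface
```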
[0016] The decoder network 104 decodes the encoded representation of the image 108 passed by the encoder network 102 to regenerate the image 108 of the document as the image 112. In generating the image 112 of the document, the decoder network 104 relights the image 108 with a non-varying lighting surface 106. Therefore, the image 112 is a relighted image, corresponding to the captured image 108 with the varying lighting surface 110 removed and with the non-varying lighting surface 106 added. The non-varying lighting surface 106 corresponds to non-varying lighting conditions, such as those under which a dedicated scanning device may scan images of documents.
[0017] The non-varying lighting surface 106 may not be separately input into the decoder network 104, which is another way by which the neural network 100 differs from the neural network of the article referenced above. The non-varying lighting surface 106 can instead be integrated within the decoder network 104 itself. The non-varying lighting surface 106 may be constructively represented as a reference blank one-color image. Such a reference blank one-color image may be conceptualized as an ideal blank sheet of paper of the same size as the document captured as the image 108, with white pixels at non-varying maximum brightness and contrast.
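The reference blank one-color image of [0017] is simple enough to sketch directly: a constant array at uniform maximum brightness, sized to match the document image. The function name and the [0, 1] intensity convention are illustrative choices, not the patent's.

```python
import numpy as np

def reference_white_surface(height, width, level=1.0):
    # Constant one-color lighting surface: an ideal blank sheet of the
    # same size as the document image, with every pixel at the same
    # non-varying maximum brightness.
    return np.full((height, width), level, dtype=np.float32)
```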
[0018] FIG. 2 shows the neural network 100 in more detail, as a convolutional neural network. A convolutional neural network is a type of deep neural network, which can be employed in the context of image analysis and processing. The encoder and decoder networks 102 and 104 may thus also be considered convolutional neural networks. The encoder network 102 has cascading encoder layers 202A, 202B, . . ., 202N of decreasing spatial resolution, which are collectively referenced as the layers 202. The decoder network 104 similarly has cascading decoder layers 204A, . . ., 204M, 204N of increasing spatial resolution, which are collectively referenced as the layers 204. The layers 202 and 204 can include convolutional layers, batch normalization layers, and activation layers, for instance.
[0019] The captured image 108 is input into the encoder layer 202A corresponding to the highest spatial resolution. The encoder layers 202 sequentially process the captured image 108 in cascading fashion, with the image 108 downsampled in spatial resolution from one layer 202 to the next layer 202 as indicated by arrow 206. Each encoder layer 202 encodes the representation 109 of the features of the image 108 in correspondence with its spatial resolution. Each encoder layer 202 further passes the image 108 as downsampled to the next layer 202. Processing by the encoder network 102 therefore occurs from the layer 202A at maximum resolution to the layer 202N at minimum resolution, at which point the extracted varying lighting surface 110 (at all resolutions) may be output and discarded.
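The cascade of [0019], in which each encoder layer emits a representation at its own spatial resolution and passes a downsampled image onward, can be sketched with 2x2 average pooling standing in for the learned convolutional layers. This is an illustration of the data flow only; the names and the pooling operation are assumptions, not the patent's layers.

```python
import numpy as np

def encoder_cascade(image, levels=3):
    # Each stage records a representation at its spatial resolution
    # (representation 109) and halves the resolution for the next
    # stage (arrow 206); 2x2 average pooling stands in for the
    # learned convolutional downsampling.
    features, x = [], image.astype(float)
    for _ in range(levels):
        features.append(x)
        h, w = x.shape
        x = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return features, x  # per-resolution features, bottleneck
```

For an 8x8 input and three levels, this yields representations at 8x8, 4x4, and 2x2 and a 1x1 bottleneck, mirroring the progression from layer 202A at maximum resolution to layer 202N at minimum resolution.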
[0020] The representation 109 of the image features is distributively input into the decoder network 104 over its decoder layers 204. Each decoder layer 204 is input the representation 109 from the encoder layer 202 at the same spatial resolution. The decoder layers 204A, . . ., 204M, 204N therefore respectively correspond to the encoder layers 202N, . . ., 202B,
202A in spatial resolution. The decoder layers 204 regenerate the captured image 108 from which the varying lighting surface 110 has been removed and to which a non-varying lighting surface has been added, as the relighted image 112. The decoder layers 204 sequentially generate the relighted image
112 in cascading fashion, with the image 112 upsampled in spatial resolution from one layer 204 to the next layer 204 as indicated by arrow 210.
[0021] The non-varying lighting surface is in effect representatively integrated within the decoder layers 204 as respective constants 208A, . . .,
208M, 208N, collectively referenced as the constants 208. In other words, each decoder layer 204 is hardcoded to generate the relighted image 112 in correspondence with its spatial resolution such that the captured image 108 is relighted by a non-varying lighting surface. Each decoder layer 204 decodes the representation 109 of the image features, and passes the relighted image 112 as upsampled to the next layer 204. Processing by the decoder network 104 occurs from the layer 204A at minimum resolution to the layer 204N at maximum resolution, with the generated relighted image 112 having the same resolution as the captured image 108 output at the layer 204N.
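The decoder side of FIG. 2 can likewise be sketched in shape terms. In the toy code below (an illustrative assumption, not the patent's trained network), nearest-neighbor doubling stands in for learned upsampling, each decoder stage consumes the encoder representation at its own resolution, and a uniform constant stands in for the hardcoded constants 208:

```python
import numpy as np

def upsample(img):
    # Nearest-neighbor doubling: a stand-in for a learned transposed convolution.
    return img.repeat(2, axis=0).repeat(2, axis=1)

def decode(representations, constant=255.0):
    # representations[0] is the finest scale (encoder layer 202A) and
    # representations[-1] the coarsest (202N); decoding runs coarse -> fine,
    # mirroring decoder layers 204A..204N (arrow 210). The uniform
    # `constant` stands in for the hardcoded constants 208.
    out = np.zeros_like(representations[-1])
    for rep in reversed(representations):
        if out.shape != rep.shape:
            out = upsample(out)  # increase spatial resolution between layers
        out = 0.5 * (out + rep) * (constant / 255.0)  # illustrative mixing only
    return out

reps = [np.full((16, 16), 255.0), np.full((8, 8), 255.0), np.full((4, 4), 255.0)]
relit = decode(reps)
```

The output emerges at the finest resolution, matching the captured image's resolution at layer 204N.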
[0022] FIG. 3 shows an example non-transitory computer-readable data storage medium 300. The computer-readable data storage medium stores program code 302 executable by a computing device, such as a smartphone or other mobile computing device, to perform processing. The processing includes causing an image sensor to capture an image of a document that is under varying environmental lighting conditions (304). The image sensor may be part of the same computing device that is executing the program code 302. The image sensor may capture the document image as a whole, as opposed to on a line-by-line basis as sheetfed and flatbed dedicated scanning devices do.
[0023] The processing includes removing artifacts within the document image that result from the varying environmental lighting conditions under which
the image was captured (306), by relighting the image as if it had been captured under non-varying environmental lighting conditions. The varying lighting surface of the document image corresponding to the varying environmental lighting conditions under which the image was captured may be removed from the image, and the image relighted with a non-varying lighting surface corresponding to the non-varying environmental lighting conditions. For instance, a machine-learning model like a neural network, such as a convolutional neural network, may be employed, as has been described.
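The relighting operation of paragraph [0023] has a classical, non-learned analogue: flat-field correction, in which the varying lighting surface is divided out of the image and the result is rescaled to a uniform non-varying level. The sketch below is that analogue (with a known shading surface supplied directly), not the patent's learned encoder-decoder:

```python
import numpy as np

def relight(image, varying_surface, nonvarying_level=255.0):
    # Classical flat-field correction: divide out the varying lighting
    # surface, then rescale to the uniform target level. A hand-written
    # stand-in for the learned model described in the patent.
    eps = 1e-6  # guard against division by zero
    reflectance = image / (varying_surface + eps)
    return np.clip(reflectance * nonvarying_level, 0.0, 255.0)

# A blank "document" whose right half sits in shadow:
shading = np.concatenate([np.full((4, 4), 255.0), np.full((4, 4), 128.0)], axis=1)
page = np.full((4, 8), 1.0) * shading  # blank page modulated by the shading
relit = relight(page, shading)
```

After relighting, the shadowed and unshadowed halves have the same uniform brightness, as if the page had been captured under non-varying lighting.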
[0024] FIG. 4 shows an example computing device 400. The computing device 400 may be a smartphone or other mobile computing device, for instance. The computing device 400 includes an image sensor 402 and image enhance hardware 404. The image enhance hardware 404 may include a processor and a non-transitory computer-readable data storage medium storing program code that the processor executes. The processor may be a general-purpose processor separate from the data storage medium. The processor may instead be a special-purpose processor integrated with the data storage medium, as is the case with an application-specific integrated circuit (ASIC), as one example.
[0025] The image sensor 402 captures an image of a document under varying environmental lighting conditions. The image sensor 402 may capture the document image as a whole, as opposed to on a line-by-line basis as sheetfed and flatbed dedicated scanning devices do. The image enhance hardware 404 relights the captured image of the document as if the image had been captured under non-varying environmental lighting conditions. The
image enhance hardware 404 may employ a machine-learning model like a neural network, such as a convolutional neural network, as has been described. The image enhance hardware 404 may thus remove the varying lighting surface of the document image corresponding to the varying environmental lighting conditions under which the image was captured, and then add a non-varying lighting surface corresponding to the non-varying environmental lighting conditions.
[0026] FIG. 5 shows an example method 500. The method 500 can be performed by a processor, such as the (general- or special-purpose) processor of a computing device like a smartphone or other mobile computing device. The method 500 includes receiving an image of a document having a varying lighting surface (502). The document image may have been captured under varying environmental lighting conditions to which the varying lighting surface corresponds. The method 500 includes removing the varying lighting surface from the document image (504), such as by using an encoder neural network like an encoder convolutional neural network as has been described. The method 500 includes relighting the image of the document from which the varying lighting surface has been removed with a non-varying lighting surface (506), such as by using a decoder neural network like a decoder convolutional neural network as has been described.
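The three steps of method 500 can be sketched end to end. In the illustrative code below, a large local-mean blur stands in for step 504's encoder network as the estimator of the smooth varying lighting surface; this estimator is an assumption for illustration, not the patent's technique:

```python
import numpy as np

def estimate_varying_surface(image, k=7):
    # Step 504 stand-in: a large local mean approximates the smooth varying
    # lighting surface; the patent instead uses an encoder neural network.
    pad = k // 2
    padded = np.pad(image.astype(float), pad, mode='edge')
    h, w = image.shape
    surface = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            surface[i, j] = padded[i:i + k, j:j + k].mean()
    return surface

def method_500(image, nonvarying_level=255.0):
    surface = estimate_varying_surface(image)            # 504: remove varying surface
    relit = image / (surface + 1e-6) * nonvarying_level  # 506: relight uniformly
    return np.clip(relit, 0.0, 255.0)

# A blank page under a left-to-right shadow gradient (step 502's input):
shadow = np.linspace(255.0, 96.0, 12)
page = np.tile(shadow, (12, 1))
relit = method_500(page)
```

The relighted page is substantially more uniform than the shadowed input, away from minor boundary effects of the blur-based estimate.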
[0027] Techniques have been described herein for removing artifacts within an image of a document captured under varying environmental lighting conditions. The artifacts that are removed are those resulting from these varying environmental lighting conditions. The described techniques have been demonstrated to better enhance such a scanned document in this
respect, as compared to other scanned document artifact removal techniques that do not specifically consider the environmental lighting conditions under which an image of the document was captured.
Claims
1. A non-transitory computer-readable data storage medium comprising program code executable by a processor of a computing device to perform processing comprising: causing an image sensor of the computing device to capture an image of a document under varying environmental lighting conditions; and removing artifacts within the image of the document resulting from the varying environmental lighting conditions by relighting the image of the document as if the image of the document had been captured under non-varying environmental lighting conditions.
2. The non-transitory computer-readable data storage medium of claim 1, wherein relighting the image of the document as if the image of the document had been captured under non-varying environmental lighting conditions comprises: removing a varying lighting surface from the image of the document, the varying lighting surface corresponding to the varying environmental lighting conditions under which the image of the document was captured by the image sensor.
3. The non-transitory computer-readable data storage medium of claim 2, wherein relighting the image of the document as if the image of the document had been captured under non-varying environmental lighting conditions further comprises:
relighting the image of the document, from which the varying lighting surface has been removed, with a non-varying lighting surface.
4. The non-transitory computer-readable data storage medium of claim 3, wherein the non-varying lighting surface is constructively represented as a reference blank one-color image during relighting of the image of the document with the non-varying lighting surface.
5. The non-transitory computer-readable data storage medium of claim 3, wherein removing the varying lighting surface from the image of the document comprises using an encoder convolutional neural network having a plurality of cascading encoder layers of decreasing spatial resolution, and wherein relighting the image of the document with the non-varying lighting surface comprises using a decoder convolutional neural network having a plurality of cascading decoder layers of increasing spatial resolution.
6. The non-transitory computer-readable data storage medium of claim 5, wherein the non-varying lighting surface is representatively integrated within the decoder convolutional neural network as constants within the cascading decoder layers and is not separately input into the encoder convolutional neural network.
7. A computing device comprising: an image sensor to capture an image of a document under varying environmental lighting conditions; and image enhance hardware to relight the image of the document as if the
image of the document had been captured under non-varying environmental lighting conditions.
8. A method comprising: receiving, by a processor, an image of a document having a varying lighting surface; removing, by the processor, the varying lighting surface from the image of the document; and relighting, by the processor, the image of the document with a non-varying lighting surface.
9. The method of claim 8, wherein removing the varying lighting surface from the image of the document comprises using an encoder convolutional neural network having a plurality of cascading encoder layers of decreasing spatial resolution, the image of the document being input into a first cascading encoder layer having a highest spatial resolution, and wherein at each cascading encoder convolutional layer, the encoder convolutional neural network outputs a representation of the image of the document from which the varying lighting surface has been removed.
10. The method of claim 9, wherein the encoder convolutional neural network outputs the varying lighting surface at a last cascading encoder layer having a lowest spatial resolution.
11. The method of claim 9, wherein relighting the image of the document with the non-varying lighting surface comprises using a decoder convolutional
neural network having a plurality of cascading decoder layers of increasing spatial resolution, and wherein the representation of the image of the document output at each cascading encoder layer is input into the cascading decoder convolutional layer having a same resolution.
12. The method of claim 11, wherein the decoder convolutional neural network outputs the image of the document with the non-varying lighting surface from a last cascading decoder layer having a highest spatial resolution.
13. The method of claim 11, wherein the non-varying lighting surface is representatively integrated within the decoder convolutional neural network as constants within the cascading decoder layers and is not separately input into the encoder convolutional neural network.
14. The method of claim 11, wherein the decoder and encoder convolutional neural networks correspond to one another as different parts of an overall machine learning model.
15. The method of claim 8, wherein the non-varying lighting surface is constructively represented during relighting of the image of the document as a reference blank one-color image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2020/039758 WO2021262187A1 (en) | 2020-06-26 | 2020-06-26 | Document image relighting |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021262187A1 true WO2021262187A1 (en) | 2021-12-30 |
Family
ID=79281684
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021262187A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018002533A1 (en) * | 2016-06-30 | 2018-01-04 | Fittingbox | Method for concealing an object in an image or a video and associated augmented reality method |
GB2572435A (en) * | 2018-03-29 | 2019-10-02 | Samsung Electronics Co Ltd | Manipulating a face in an image |
Non-Patent Citations (1)
Title |
---|
TIANCHENG SUN ET AL.: "Single Image Portrait Relighting", ACM TRANS. GRAPH., vol. 38, no. 4, July 2019 (2019-07-01), XP058452114, Retrieved from the Internet <URL:https://arxiv.org/abs/1905.00824> DOI: 10.1145/3306346.3323008 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20941906 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 20941906 Country of ref document: EP Kind code of ref document: A1 |