WO2021262187A1 - Document image relighting - Google Patents
- Publication number: WO2021262187A1
- Application number: PCT/US2020/039758
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
- G06T5/94—Dynamic range modification of images or parts thereof based on local image properties, e.g. for local contrast enhancement
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/60—Image enhancement or restoration using machine learning, e.g. neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/16—Image preprocessing
- G06V30/164—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10141—Special mode during image acquisition
- G06T2207/10152—Varying illumination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N1/00—Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
- H04N1/387—Composing, repositioning or otherwise geometrically modifying originals
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N1/00—Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
- H04N1/46—Colour picture communication systems
- H04N1/56—Processing of colour picture signals
- H04N1/60—Colour correction or control
- H04N1/6083—Colour correction or control controlled by factors external to the apparatus
- H04N1/6086—Colour correction or control controlled by factors external to the apparatus by scene illuminant, i.e. conditions at the time of picture capture, e.g. flash, optical filter used, evening, cloud, daylight, artificial lighting, white point measurement, colour temperature
Definitions
- The processing includes removing artifacts within the document image that result from the varying environmental lighting conditions under which the image was captured (306), by relighting the image as if it had been captured under non-varying environmental lighting conditions.
- The varying lighting surface of the document image corresponding to the varying environmental lighting conditions under which the image was captured may be removed from the image, and the image relighted with a non-varying lighting surface corresponding to the non-varying environmental lighting conditions.
- A machine-learning model like a neural network, such as a convolutional neural network, may be employed, as has been described.
- FIG. 4 shows an example computing device 400.
- The computing device 400 may be a smartphone or other mobile computing device, for instance.
- The computing device 400 includes an image sensor 402 and image enhance hardware 404.
- The image enhance hardware 404 may include a processor and a non-transitory computer-readable data storage medium storing program code that the processor executes.
- The processor may be a general-purpose processor separate from the data storage medium.
- The processor may instead be a special-purpose processor integrated with the data storage medium, as is the case with an application-specific integrated circuit (ASIC), as one example.
- The image sensor 402 captures an image of a document under varying environmental lighting conditions.
- The image sensor 402 may capture the document image as a whole, as opposed to on a line-by-line basis as sheetfed and flatbed dedicated scanning devices do.
- The image enhance hardware 404 relights the captured image of the document as if the image had been captured under non-varying environmental lighting conditions.
- The image enhance hardware 404 may employ a machine-learning model like a neural network, such as a convolutional neural network, as has been described. The image enhance hardware 404 may thus remove the varying lighting surface of the document image corresponding to the varying environmental lighting conditions under which the image was captured, and then add a non-varying lighting surface corresponding to the non-varying environmental lighting conditions.
- FIG. 5 shows an example method 500.
- The method 500 can be performed by a processor, such as the (general- or special-purpose) processor of a computing device like a smartphone or other mobile computing device.
- The method 500 includes receiving an image of a document having a varying lighting surface (502).
- The document image may have been captured under varying environmental lighting conditions to which the varying lighting surface corresponds.
- The method 500 includes removing the varying lighting surface from the document image (504), such as by using an encoder neural network like an encoder convolutional neural network as has been described.
- The method 500 includes relighting the document image from which the varying lighting surface has been removed with a non-varying lighting surface (506), such as by using a decoder neural network like a decoder convolutional neural network as has been described.
Abstract
A processor receives an image of a document that has a varying lighting surface, the image having been captured under varying environmental lighting conditions. The processor removes the varying lighting surface from the image of the document. The processor then relights the image of the document with a non-varying lighting surface. The image of the document may thus be relighted as if it had been captured under non-varying environmental lighting conditions.
Description
DOCUMENT IMAGE RELIGHTING
BACKGROUND
[0001] While information is increasingly communicated in electronic form with the advent of modern computing and networking technologies, physical documents, such as printed and handwritten sheets of paper and other physical media, are still often exchanged. Such documents can be converted to electronic form by a process known as optical scanning. Once a document has been scanned as a digital image, the resulting image may be archived, or may undergo further processing to extract information contained within the document image so that the information is more usable. For example, the document image may undergo optical character recognition (OCR), which converts the image into text that can be edited, searched, and stored more compactly than the image itself.
BRIEF DESCRIPTION OF THE DRAWINGS
[0002] FIG. 1 is a diagram of an example neural network for removing artifacts within a document image resulting from varying environmental conditions under which the image was captured.
[0003] FIG. 2 is a diagram of the example neural network of FIG. 1 in more detail, as a convolutional neural network.
[0004] FIG. 3 is a diagram of an example non-transitory computer-readable data storage medium for removing artifacts within a document image resulting from varying environmental conditions under which the image was captured.
[0005] FIG. 4 is a block diagram of an example computing device that can capture an image of a document under varying environmental conditions and that can relight the document image as if it had been captured under non-varying environmental lighting conditions.
[0006] FIG. 5 is a flowchart of an example method for removing artifacts within a document image resulting from varying environmental conditions under which the image was captured, by relighting the image with a non-varying lighting surface.
DETAILED DESCRIPTION
[0007] As noted in the background, a physical document can be scanned as a digital image to convert the document to electronic form. Traditionally, dedicated scanning devices have been used to scan documents to generate images of the documents. Such dedicated scanning devices include sheetfed scanning devices, flatbed scanning devices, and document camera scanning devices. A dedicated scanning device can optimally light a document during scanning, so that the resulting scanned image is largely, if not completely, free from artifacts that may otherwise result from non-optimal lighting conditions. This is because the scanning device is able to control the lighting conditions under which the image is scanned.
[0008] However, with the near-ubiquity of smartphones and other usually mobile computing devices that include cameras and other types of image sensors, documents are often scanned with such non-dedicated scanning devices. A difficulty with scanning documents using a non-dedicated scanning device is that the document images are generally captured under non-optimal lighting conditions. Stated another way, a non-dedicated scanning device may capture an image of a document under varying environmental lighting conditions due to a variety of different factors.
[0009] For example, varying environmental lighting conditions may result from the external light incident to the document varying over the document surface, because of a light source being off-axis from the document, or because of other physical objects casting shadows on the document. The physical properties of the document itself can contribute to varying environmental lighting conditions, such as when the document has folds, creases, or is otherwise not perfectly flat. The angle at which the non-dedicated scanning device is positioned relative to the document during image capture can also contribute to varying environmental lighting conditions.
[0010] Capturing an image of a document under varying environmental lighting conditions can imbue the captured image with undesirable artifacts. For example, such artifacts can include darkened areas within the image in correspondence with shadows discernibly or indiscernibly cast during image capture. Existing approaches for enhancing document images captured by non-dedicated scanning devices to remove artifacts from the scanned images are usually general purpose, and do not focus on artifacts resulting from varying environmental lighting conditions. The approaches thus may remove such artifacts with less than satisfactory results.
[0011] Techniques described herein can remove artifacts within a captured image of a document that result from varying environmental lighting conditions. The image of the document can be relighted as if it had been captured under non-varying environmental lighting conditions, and thus under near-optimal if not optimal lighting conditions. A document image may have a
varying lighting surface corresponding to the varying environmental lighting conditions under which the image was captured. The varying lighting surface can be removed from the document image prior to relighting the image with a non-varying lighting surface corresponding to non-varying environmental lighting conditions akin to those under which dedicated scanning devices scan documents.
[0012] FIG. 1 shows an example neural network 100 for removing artifacts within a document image resulting from varying environmental conditions under which the image was captured. The neural network 100 is more generally a machine learning model. The neural network 100 can include a varying lighting surface removal encoder network 102 and a relighting decoder network 104. The encoder and decoder networks 102 and 104 may thus be corresponding parts of the same neural network 100 (and thus parts of the same overall machine learning model), and may themselves each be considered a neural network.
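The patent gives no code for the encoder/decoder split of FIG. 1, but the pipeline it describes (extract and remove the varying lighting surface, then relight with a non-varying one) can be illustrated without any learned model. In the sketch below, all names are illustrative, and a heavy box blur stands in for the trained encoder network 102's estimate of the low-frequency lighting surface; the actual patent uses a neural network, not a blur.

```python
import numpy as np

def box_blur(img, k):
    # Mean filter via an integral image; a crude, non-learned stand-in
    # for the encoder's estimate of the low-frequency lighting surface.
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode="edge")
    ii = np.zeros((p.shape[0] + 1, p.shape[1] + 1))
    ii[1:, 1:] = p.cumsum(axis=0).cumsum(axis=1)
    h, w = img.shape
    return (ii[k:k + h, k:k + w] - ii[:h, k:k + w]
            - ii[k:k + h, :w] + ii[:h, :w]) / (k * k)

def relight(image, k=15, white_level=1.0):
    # Remove the varying lighting surface, then apply a non-varying one.
    surface = box_blur(image, k)                     # varying lighting surface (110)
    reflectance = image / np.maximum(surface, 1e-6)  # surface removed
    return np.clip(reflectance * white_level, 0.0, 1.0)  # flat surface (106)
```

On a synthetic page whose illumination ramps from dark to bright, the relighted background flattens toward uniform white while dark text pixels stay dark, which is the qualitative behavior the patent attributes to the neural network 100.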
[0013] The neural network 100 may, in one implementation, be implemented similarly to the technique described in the technical journal article T. Sun et al., “Single Image Portrait Relighting,” ACM Transactions on Graphics, vol. 38, no. 4, Article 79, published in July 2019. This article describes a technique for relighting a portrait photograph with an input target light. The neural network 100 of FIG. 1 differs from the article’s described neural network in that the neural network 100 acts upon an image of a document, which the article does not describe. Unlike the article’s neural network, the neural network 100 removes artifacts introduced into an image due to the varying environmental lighting conditions under which the image was captured, which the article does not contemplate.
[0014] A captured image 108 of a document with a varying lighting surface is input into the neural network 100, and the neural network 100 correspondingly outputs a relighted image 112 of the document with a non-varying lighting surface. The image 108 may have been captured by an image sensor of a computing device like a smartphone, under varying environmental lighting conditions. The lighting surface of the image 108 is the data integrated within the image 108 that results from the image 108 of the document having been captured under varying environmental lighting conditions, and thus is a varying lighting surface corresponding to these varying environmental lighting conditions.
[0015] The captured image 108 is specifically input into the encoder network 102 of the neural network 100. The encoder network 102 encodes a representation 109 of the image features of the image 108 of the document that does not include the varying lighting surface, and which is passed to the decoder network 104. The representation 109 of the image 108 can therefore be considered a representation of the image features of the image 108 of the document, such as a vector of image descriptions of the image 108. The encoder network 102 thus in effect extracts the varying lighting surface 110 from the captured image 108, which can be considered as being output by the encoder network 102. However, the extracted varying lighting surface 110 of the image 108 is not subsequently used within the neural network 100, and therefore can be discarded. The extracted varying lighting surface 110 may thus be employed just during the training of the neural network 100.
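Paragraph [0015] notes that the extracted varying lighting surface 110 is used only during training. The patent does not specify a training objective; one hypothetical way such a surface could be used at train time is as an auxiliary supervision term alongside the reconstruction loss on the relighted image. Both the squared-error form and the weighting below are assumptions, not taken from the patent.

```python
import numpy as np

def training_loss(pred_image, target_image, pred_surface, target_surface, w=0.5):
    # Hypothetical objective: reconstruction error on the relighted image,
    # plus supervision on the extracted varying lighting surface. At
    # inference the surface term is dropped and the surface discarded.
    recon = float(np.mean((pred_image - target_image) ** 2))
    surface = float(np.mean((pred_surface - target_surface) ** 2))
    return recon + w * surface
```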
[0016] The decoder network 104 decodes the encoded representation of the image 108 passed by the encoder network 102 to regenerate the image 108 of the document as the image 112. In generating the image 112 of the document, the decoder network 104 relights the image 108 with a non-varying lighting surface 106. Therefore, the image 112 is a relighted image, corresponding to the captured image 108 with the varying lighting surface 110 removed and with the non-varying lighting surface 106 added. The non-varying lighting surface 106 corresponds to non-varying lighting conditions, such as those under which a dedicated scanning device may scan images of documents.
[0017] The non-varying lighting surface 106 may not be separately input into the decoder network 104, which is another way by which the neural network 100 differs from the neural network of the article referenced above. The non-varying lighting surface 106 can instead be integrated within the decoder network 104 itself. The non-varying lighting surface 106 may be constructively represented as a reference blank one-color image. Such a reference blank one-color image may be conceptualized as an ideal blank sheet of paper of the same size as the document captured as the image 108, with white pixels at non-varying maximum brightness and contrast.
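The reference blank one-color image of [0017] is simple enough to sketch directly: a constant array at uniform maximum brightness, sized to match the document image. The function name and the [0, 1] intensity convention are illustrative choices, not the patent's.

```python
import numpy as np

def reference_white_surface(height, width, level=1.0):
    # Constant one-color lighting surface: an ideal blank sheet of the
    # same size as the document image, with every pixel at the same
    # non-varying maximum brightness.
    return np.full((height, width), level, dtype=np.float32)
```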
[0018] FIG. 2 shows the neural network 100 in more detail, as a convolutional neural network. A convolutional neural network is a type of deep neural network, which can be employed in the context of image analysis and processing. The encoder and decoder networks 102 and 104 may thus also be considered convolutional neural networks. The encoder network 102 has cascading encoder layers 202A, 202B, . . ., 202N of decreasing spatial resolution, which are collectively referenced as the layers 202. The decoder network 104 similarly has cascading decoder layers 204A, . . ., 204M, 204N of increasing spatial resolution, which are collectively referenced as the layers 204. The layers 202 and 204 can include convolutional layers, batch normalization layers, and activation layers, for instance.
[0019] The captured image 108 is input into the encoder layer 202A corresponding to the highest spatial resolution. The encoder layers 202 sequentially process the captured image 108 in cascading fashion, with the image 108 downsampled in spatial resolution from one layer 202 to the next layer 202 as indicated by arrow 206. Each encoder layer 202 encodes the representation 109 of the features of the image 108 in correspondence with its spatial resolution. Each encoder layer 202 further passes the image 108 as downsampled to the next layer 202. Processing by the encoder network 102 therefore occurs from the layer 202A at maximum resolution to the layer 202N at minimum resolution, at which point the extracted varying lighting surface 110 (at all resolutions) may be output and discarded.
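The cascade of [0019], in which each encoder layer emits a representation at its own spatial resolution and passes a downsampled image onward, can be sketched with 2x2 average pooling standing in for the learned convolutional layers. This is an illustration of the data flow only; the names and the pooling operation are assumptions, not the patent's layers.

```python
import numpy as np

def encoder_cascade(image, levels=3):
    # Each stage records a representation at its spatial resolution
    # (representation 109) and halves the resolution for the next
    # stage (arrow 206); 2x2 average pooling stands in for the
    # learned convolutional downsampling.
    features, x = [], image.astype(float)
    for _ in range(levels):
        features.append(x)
        h, w = x.shape
        x = x[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return features, x  # per-resolution features, bottleneck
```

For an 8x8 input and three levels, this yields representations at 8x8, 4x4, and 2x2 and a 1x1 bottleneck, mirroring the progression from layer 202A at maximum resolution to layer 202N at minimum resolution.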
[0020] The representation 109 of the image features is distributively input into the decoder network 104 over its decoder layers 204. Each decoder layer 204 is input the representation 109 from the encoder layer 202 at the same spatial resolution. The decoder layers 204A, . . ., 204M, 204N therefore respectively correspond to the encoder layers 202N, . . ., 202B,
202A in spatial resolution. The decoder layers 204 regenerate the captured image 108 from which the varying lighting surface 110 has been removed and to which a non-varying lighting surface has been added, as the relighted image 112. The decoder layers 204 sequentially generate the relighted image
112 in cascading fashion, with the image 112 upsampled in spatial resolution from one layer 204 to the next layer 204 as indicated by arrow 210.
[0021] The non-varying lighting surface is in effect representatively integrated within the decoder layers 204 as respective constants 208A, . . .,
208M, 208N, collectively referenced as the constants 208. In other words, each decoder layer 204 is hardcoded to generate the relighted image 112 in correspondence with its spatial resolution such that the captured image 108 is relighted by a non-varying lighting surface. Each decoder layer 204 decodes the representation 109 of the image features, and passes the relighted image 112 as upsampled to the next layer 204. Processing by the decoder network 104 occurs from the layer 204A at minimum resolution to the layer 204N at maximum resolution, with the generated relighted image 112 having the same resolution as the captured image 108 output at the layer 204N.
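The decoder side of FIG. 2 can likewise be sketched in shape terms. In the toy code below (an illustrative assumption, not the patent's trained network), nearest-neighbor doubling stands in for learned upsampling, each decoder stage consumes the encoder representation at its own resolution, and a uniform constant stands in for the hardcoded constants 208:

```python
import numpy as np

def upsample(img):
    # Nearest-neighbor doubling: a stand-in for a learned transposed convolution.
    return img.repeat(2, axis=0).repeat(2, axis=1)

def decode(representations, constant=255.0):
    # representations[0] is the finest scale (encoder layer 202A) and
    # representations[-1] the coarsest (202N); decoding runs coarse -> fine,
    # mirroring decoder layers 204A..204N (arrow 210). The uniform
    # `constant` stands in for the hardcoded constants 208.
    out = np.zeros_like(representations[-1])
    for rep in reversed(representations):
        if out.shape != rep.shape:
            out = upsample(out)  # increase spatial resolution between layers
        out = 0.5 * (out + rep) * (constant / 255.0)  # illustrative mixing only
    return out

reps = [np.full((16, 16), 255.0), np.full((8, 8), 255.0), np.full((4, 4), 255.0)]
relit = decode(reps)
```

The output emerges at the finest resolution, matching the captured image's resolution at layer 204N.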
[0022] FIG. 3 shows an example non-transitory computer-readable data storage medium 300. The computer-readable data storage medium stores program code 302 executable by a computing device, such as a smartphone or other mobile computing device, to perform processing. The processing includes causing an image sensor to capture an image of a document that is under varying environmental lighting conditions (304). The image sensor may be part of the same computing device that is executing the program code 302. The image sensor may capture the document image as a whole, as opposed to on a line-by-line basis as sheetfed and flatbed dedicated scanning devices do.
[0023] The processing includes removing artifacts within the document image that result from the varying environmental lighting conditions under which
the image was captured (306), by relighting the image as if it had been captured under non-varying environmental lighting conditions. The varying lighting surface of the document image corresponding to the varying environmental lighting conditions under which the image was captured may be removed from the image, and the image relighted with a non-varying lighting surface corresponding to the non-varying environmental lighting conditions. For instance, a machine-learning model like a neural network, such as a convolutional neural network, may be employed, as has been described.
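The relighting operation of paragraph [0023] has a classical, non-learned analogue: flat-field correction, in which the varying lighting surface is divided out of the image and the result is rescaled to a uniform non-varying level. The sketch below is that analogue (with a known shading surface supplied directly), not the patent's learned encoder-decoder:

```python
import numpy as np

def relight(image, varying_surface, nonvarying_level=255.0):
    # Classical flat-field correction: divide out the varying lighting
    # surface, then rescale to the uniform target level. A hand-written
    # stand-in for the learned model described in the patent.
    eps = 1e-6  # guard against division by zero
    reflectance = image / (varying_surface + eps)
    return np.clip(reflectance * nonvarying_level, 0.0, 255.0)

# A blank "document" whose right half sits in shadow:
shading = np.concatenate([np.full((4, 4), 255.0), np.full((4, 4), 128.0)], axis=1)
page = np.full((4, 8), 1.0) * shading  # blank page modulated by the shading
relit = relight(page, shading)
```

After relighting, the shadowed and unshadowed halves have the same uniform brightness, as if the page had been captured under non-varying lighting.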
[0024] FIG. 4 shows an example computing device 400. The computing device 400 may be a smartphone or other mobile computing device, for instance. The computing device 400 includes an image sensor 402 and image enhance hardware 404. The image enhance hardware 404 may include a processor and a non-transitory computer-readable data storage medium storing program code that the processor executes. The processor may be a general-purpose processor separate from the data storage medium. The processor may instead be a special-purpose processor integrated with the data storage medium, as is the case with an application-specific integrated circuit (ASIC), as one example.
[0025] The image sensor 402 captures an image of a document under varying environmental lighting conditions. The image sensor 402 may capture the document image as a whole, as opposed to on a line-by-line basis as sheetfed and flatbed dedicated scanning devices do. The image enhance hardware 404 relights the captured image of the document as if the image had been captured under non-varying environmental lighting conditions. The
image enhance hardware 404 may employ a machine-learning model like a neural network, such as a convolutional neural network, as has been described. The image enhance hardware 404 may thus remove the varying lighting surface of the document image corresponding to the varying environmental lighting conditions under which the image was captured, and then add a non-varying lighting surface corresponding to the non-varying environmental lighting conditions.
[0026] FIG. 5 shows an example method 500. The method 500 can be performed by a processor, such as the (general- or special-purpose) processor of a computing device like a smartphone or other mobile computing device. The method 500 includes receiving an image of a document having a varying lighting surface (502). The document image may have been captured under varying environmental lighting conditions to which the varying lighting surface corresponds. The method 500 includes removing the varying lighting surface from the document image (504), such as by using an encoder neural network like an encoder convolutional neural network as has been described. The method 500 includes relighting the image of the document from which the varying lighting surface has been removed with a non-varying lighting surface (506), such as by using a decoder neural network like a decoder convolutional neural network as has been described.
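The three steps of method 500 can be sketched end to end. In the illustrative code below, a large local-mean blur stands in for step 504's encoder network as the estimator of the smooth varying lighting surface; this estimator is an assumption for illustration, not the patent's technique:

```python
import numpy as np

def estimate_varying_surface(image, k=7):
    # Step 504 stand-in: a large local mean approximates the smooth varying
    # lighting surface; the patent instead uses an encoder neural network.
    pad = k // 2
    padded = np.pad(image.astype(float), pad, mode='edge')
    h, w = image.shape
    surface = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            surface[i, j] = padded[i:i + k, j:j + k].mean()
    return surface

def method_500(image, nonvarying_level=255.0):
    surface = estimate_varying_surface(image)            # 504: remove varying surface
    relit = image / (surface + 1e-6) * nonvarying_level  # 506: relight uniformly
    return np.clip(relit, 0.0, 255.0)

# A blank page under a left-to-right shadow gradient (step 502's input):
shadow = np.linspace(255.0, 96.0, 12)
page = np.tile(shadow, (12, 1))
relit = method_500(page)
```

The relighted page is substantially more uniform than the shadowed input, away from minor boundary effects of the blur-based estimate.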
[0027] Techniques have been described herein for removing artifacts within an image of a document captured under varying environmental lighting conditions. The artifacts that are removed are those resulting from these varying environmental lighting conditions. The described techniques have been demonstrated to better enhance such a scanned document in this
respect, as compared to other scanned document artifact removal techniques that do not specifically consider the environmental lighting conditions under which an image of the document was captured.
Claims
1. A non-transitory computer-readable data storage medium comprising program code executable by a processor of a computing device to perform processing comprising: causing an image sensor of the computing device to capture an image of a document under varying environmental lighting conditions; and removing artifacts within the image of the document resulting from the varying environmental lighting conditions by relighting the image of the document as if the image of the document had been captured under non-varying environmental lighting conditions.
2. The non-transitory computer-readable data storage medium of claim 1, wherein relighting the image of the document as if the image of the document had been captured under non-varying environmental lighting conditions comprises: removing a varying lighting surface from the image of the document, the varying lighting surface corresponding to the varying environmental lighting conditions under which the image of the document was captured by the image sensor.
3. The non-transitory computer-readable data storage medium of claim 2, wherein relighting the image of the document as if the image of the document had been captured under non-varying environmental lighting conditions further comprises:
relighting the image of the document, from which the varying lighting surface has been removed, with a non-varying lighting surface.
4. The non-transitory computer-readable data storage medium of claim 3, wherein the non-varying lighting surface is constructively represented as a reference blank one-color image during relighting of the image of the document with the non-varying lighting surface.
5. The non-transitory computer-readable data storage medium of claim 3, wherein removing the varying lighting surface from the image of the document comprises using an encoder convolutional neural network having a plurality of cascading encoder layers of decreasing spatial resolution, and wherein relighting the image of the document with the non-varying lighting surface comprises using a decoder convolutional neural network having a plurality of cascading decoder layers of increasing spatial resolution.
6. The non-transitory computer-readable data storage medium of claim 5, wherein the non-varying lighting surface is representatively integrated within the decoder convolutional neural network as constants within the cascading decoder layers and is not separately input into the encoder convolutional neural network.
7. A computing device comprising: an image sensor to capture an image of a document under varying environmental lighting conditions; and image enhance hardware to relight the image of the document as if the
image of the document had been captured under non-varying environmental lighting conditions.
8. A method comprising: receiving, by a processor, an image of a document having a varying lighting surface; removing, by the processor, the varying lighting surface from the image of the document; and relighting, by the processor, the image of the document with a non-varying lighting surface.
9. The method of claim 8, wherein removing the varying lighting surface from the image of the document comprises using an encoder convolutional neural network having a plurality of cascading encoder layers of decreasing spatial resolution, the image of the document being input into a first cascading encoder layer having a highest spatial resolution, and wherein at each cascading encoder convolutional layer, the encoder convolutional neural network outputs a representation of the image of the document from which the varying lighting surface has been removed.
10. The method of claim 9, wherein the encoder convolutional neural network outputs the varying lighting surface at a last cascading encoder layer having a lowest spatial resolution.
11. The method of claim 9, wherein relighting the image of the document with the non-varying lighting surface comprises using a decoder convolutional
neural network having a plurality of cascading decoder layers of increasing spatial resolution, and wherein the representation of the image of the document output at each cascading encoder layer is input into the cascading decoder convolutional layer having a same resolution.
12. The method of claim 11, wherein the decoder convolutional neural network outputs the image of the document with the non-varying lighting surface from a last cascading decoder layer having a highest spatial resolution.
13. The method of claim 11, wherein the non-varying lighting surface is representatively integrated within the decoder convolutional neural network as constants within the cascading decoder layers and is not separately input into the encoder convolutional neural network.
14. The method of claim 11, wherein the decoder and encoder convolutional neural networks correspond to one another as different parts of an overall machine learning model.
15. The method of claim 8, wherein the non-varying lighting surface is constructively represented during relighting of the image of the document as a reference blank one-color image.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2020/039758 WO2021262187A1 (en) | 2020-06-26 | 2020-06-26 | Document image relighting |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021262187A1 true WO2021262187A1 (en) | 2021-12-30 |
Family
ID=79281684
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2021262187A1 (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2018002533A1 (en) * | 2016-06-30 | 2018-01-04 | Fittingbox | Method for concealing an object in an image or a video and associated augmented reality method |
GB2572435A (en) * | 2018-03-29 | 2019-10-02 | Samsung Electronics Co Ltd | Manipulating a face in an image |
Non-Patent Citations (1)
Title |
---|
TIANCHENG SUN ET AL.: "Single Image Portrait Relighting", ACM TRANS. GRAPH., vol. 38, no. 4, July 2019 (2019-07-01), XP058452114, Retrieved from the Internet <URL:https://arxiv.org/abs/1905.00824> DOI: 10.1145/3306346.3323008 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20941906 Country of ref document: EP Kind code of ref document: A1 |
NENP | Non-entry into the national phase |
Ref country code: DE |
122 | Ep: pct application non-entry in european phase |
Ref document number: 20941906 Country of ref document: EP Kind code of ref document: A1 |