WO2022082431A1

WO2022082431A1 - Systems and methods for extracting information from paper media based on depth information

Info

Publication number: WO2022082431A1
Application number: PCT/CN2020/122199
Authority: WO
Inventors: Runsheng ZHU; Yusheng Ye; Zheng Han
Original assignee: Beijing Tripmonkey Technology Limited
Priority date: 2020-10-20
Filing date: 2020-10-20
Publication date: 2022-04-28

Abstract

Systems and methods for information extraction from a paper receipt are provided. An exemplary system may include an image acquisition device configured to capture an image of the paper receipt, a memory storing computer-readable instructions, and at least one processor communicatively coupled to the memory and the image acquisition device. The computer-readable instructions, when executed by the at least one processor, may cause the at least one processor to perform operations. The operations may include obtaining depth information of the paper receipt and flattening the image of the paper receipt based on the depth information. The operations may also include classifying the paper receipt into one of a plurality of categories based on the depth information and extracting information from the flattened image of the paper receipt based on the category of the paper receipt.

Description

SYSTEMS AND METHODS FOR EXTRACTING INFORMATION FROM PAPER MEDIA BASED ON DEPTH INFORMATION

TECHNICAL FIELD

The present disclosure relates to extracting information from paper media, and more particularly, to systems and methods for extracting information from paper media based on depth information.

BACKGROUND

Advances in office automation and cloud computing lead to digitalization of traditional paper-based tasks such as spending recordation and reimbursement. An important step to facilitate digital management of spending and reimbursement information is to extract such information accurately from paper media, such as paper receipts. On one hand, the proliferation of camera-equippedmobile devices enables convenient capture of digital images of such paper receipts. On the other hand, however, due to the vast varieties of paper receipts and unpredictable capturing conditions, it is challenging to accurately extract information from the digital images of paper receipts, limiting the ability to integrate receipt management into a larger framework of office automation.

Embodiments of the present disclosure provide systems and methods for information extraction from paper receipts using depth information of the paper receipts, thereby improving the accuracy of the extracted information.

SUMMARY

In one example, a system for information extraction from a paper receipt may include an image acquisition device configured to capture an image of the paper receipt, a memory storing computer-readable instructions, and at least one processor communicatively coupled to the memory and the image acquisition device. The computer-readable instructions, when executed by the at least one processor, may cause the at least one processor to perform operations. The operations may include obtaining depth information of the paper receipt and flattening the image of the paper receipt based on the depth information. The operations may also include classifying the paper receipt into one of a plurality of categories based on the depth information. The operations may further include extracting information from the flattened image of the paper receipt based on the category of the paper receipt.

In another example, a method of information extraction from a paper receipt may include receiving an image of the paper receipt and receiving depth information of the paper receipt. The method may also include flattening the image of the paper receipt based on the depth information. The method may further include classifying the paper receipt into one of a plurality of categories based on the depth information. In addition, the method may include extracting information from the flattened image of the paper receipt based on the category of the paper receipt.

In a further example, a non-transitory computer-readable medium may store instructions that, when executed by at least one processor, cause the at least one processor to perform a method of information extraction from a paper receipt. The method may include receiving an image of the paper receipt and receiving depth information of the paper receipt. The method may also include flattening the image of the paper receipt based on the depth information. The method may further include classifying the paper receipt into one of a plurality of categories based on the depth information. In addition, the method may include extracting information from the flattened image of the paper receipt based on the category of the paper receipt.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an exemplary system for information extraction, according to embodiments of the disclosure.

FIG. 2 illustrates an exemplary computer system configured to implement certain components of the system shown in FIG. 1, according to embodiments of the disclosure.

FIG. 3 is a flowchart of an exemplary method of information extraction, according to embodiments of the disclosure.

FIG. 4 illustrates an exemplary preprocessing process, according to embodiments of the disclosure.

FIG. 5 illustrates another exemplary preprocessing process, according to embodiments of the disclosure.

FIG. 6 illustrates yet another exemplary preprocessing process, according to embodiments of the disclosure.

FIG. 7 illustrates an exemplary method of performing image flattening, according to embodiments of the disclosure.

DETAILED DESCRIPTION

The trend of office automation has transformed many traditional paper-based tasks into paperless processing, essentially digitizing information printed on paper media such that the information can be processed by computers. Accurately extracting printed information from paper media is thus important to facilitate downstream office task automations.

Current practice of information extraction from paper media (e.g., paper receipts) relies heavily on optical character recognition (OCR) . For example, a digital image of a paper receipt can be obtained by scanning the paper receipt using a scanner or taking a snapshot of the paper receipt using a digital camera or a camera-equipped smart phone. The digital image can then be OCRed to extract textual information originally printed on the paper receipt. The performance of such conventional methods depends largely on the quality of the digital image. In cases where the digital image is obtained from casual photo capturing with mobile devices, the performance of information extraction often suffers from distortions in the digital image due to uncontrollable factors such as physical deformation of the paper receipt, varying camera positions, and unconstrained illumination conditions. Therefore, extracting information accurately and reliably from digital images captured by mobile devices is often difficult.

Embodiments of the present disclosure provide systems and methods for improving the performance of information extraction from paper media, such as paper receipts, utilizing depth information. As used herein, depth information refers to spatial information of a paper medium. For example, depth information may include a point cloud in a 3D spatial system, where each point represents a 3D position on the surface of the paper medium. Thus, depth information of a paper medium may also be referred to as 3D information or a 3D image of the paper medium. In another example, depth information may include distance information of a plurality of points on the surface of the paper medium relative to a reference point or a reference surface. Therefore, depth information may also be referred to as range information. Depth information may also be in other forms, such as a network of polygons representing a surface profile, a color map representing distance values from a reference point or surface, etc. Systems and methods disclosed herein are not limited to any particular form of depth information.

Depth information can be used to enhance the performance of information extraction in many ways. For example, depth information can be used to correct distortions in the image of the paper medium (e.g., to flatten the image) , thereby improving the recognition accuracy. Depth information can also be used to classify the paper medium into one of a plurality of categories to apply a category-specific or category-optimized recognition process to improve the recognition accuracy.

In this disclosure, various embodiments are describedin the context of information extraction from paper receipts, including purchase receipts, official receipts, tickets, invoices, bills, orders, confirmations, product certificates, manuals, user guides, instructions, etc. It is noted that methods and systems disclosed herein can also be used to extract information from other types of paper media, such as books, magazines, newspapers, photos, labels, printed publications, etc. Information extracted from paper media may include characters, graphs, numbers (e.g., serial numbers, numerical codes, identification numbers, etc. ) , bar codes, 2D codes (e.g., QR codes) , or other types of information.

FIG. 1 illustrates an exemplary system 100 for information extraction, according to embodiments of the disclosure. As shown in FIG. 1, system 100 may include several components, such as a depth sensor 102, an image acquisition device 104, a depth information extraction unit 106, a preprocessing unit 108, an image flattening unit 110, a classification unit 112, a recognition unit 130, and an extraction unit 150. Some components may be implemented primarily by hardware, such as depth sensor 102 and image acquisition device 104. Some components may be implemented primarily by software programs running on data processing devices, such as depth information extraction unit 106, preprocessing unit 108, image flattening unit 110, classification unit 112, recognition unit 130, and extraction unit 150. However, system 100 is not limited by the specific manner its components are implemented, so long as the components are configured or programmed to perform their respective functions as disclosed herein. For example, some or all of the software-implemented components may also be implemented by dedicated hardware or a combination of hardware and software. FIG. 1 also illustrates certain auxiliary systems that can be either part of or interfaced with system 100, such as a merchant database 160 and an office automation interface 180. Office automation interface 180 may establish communications between system 100 and other digital office information management systems, such as a spending management system, a reimbursement management system, a property management system, or the like.

FIG. 2 illustrates an exemplary computer system 200 configured to implement certain components of system 100, according to embodiments of the disclosure. As shown in FIG. 2, computer system 200 may include a processor 210, a memory 220, and a communication interface 230. Computer system 200 may be configured or programmed to implement certain components of system 100, such as depth information extraction unit 106, preprocessing unit 108, image flattening unit 110, classification unit 112, recognition unit 130, and extraction unit 150. For example, processor 210 may execute instructions of one or more software modules stored in memory 220 to implement the functions of the corresponding components of system 100. For simplicity, the software modules are denoted using the same names and reference numbers as the corresponding components shown in FIG. 1.

FIG. 2 also illustrates exemplary interconnections among components of computer system 200, and between computer system 200 and other components of system 100 or other auxiliary systems, such as depth sensor 102, image acquisition device 104, merchant database 160, and office automation interface 180. For example, processor 210 may be communicatively coupled to memory 220 and communication interface 230. As used herein, communicative coupling may include any suitable form of connections (e.g., electrical connections, magnetic coupling, optical coupling, etc. ) that allow information communications between the coupled components. In some embodiments, memory 220 may be communicative coupled to communication interface 230 directly. As shown in FIG. 2, computer system 200 may be communicatively coupled to depth sensor 102, image acquisition device 104, merchant database 160, and office automation interface 180 via communication interface 230.

Processor 210 may include any suitable data processing devices such as a microprocessor, a central processing unit (CPU) , a graphics processing unit (GPU) , or the like. Processor 210 may be implemented in a centralized or distributed manner, depending on particular applications. Processor 210 may execute computer-readable instructions, such as software codes, to perform various operations disclosed herein. As described above, processor 210 may be communicatively coupled to memory 220 and communication interface 230 via data transmission channels such as data buses.

Communication interface 230 may include any suitable software, middleware, firmware, and/or hardware that are configured to establish communication links between computer system 200 and an external device and/or to facilitate input/output of information. For example, communication interface 230 may include wired connection devices such as an Ethernet adapter, a modem, a coaxial cable adaptor, a fiber optical adapter, or the like. In another example, communication interface 230 may include wireless connection devices such as a wireless network adapter, a telecommunication modem, a satellite communication modem, a short-range communication adapter, or the like. In yet another example, communication interface 230 may include I/O devices such as a display, a keyboard, a mouse, a printer, a touch screen, a speaker, or the like.

Memory 220 may include any suitable memory devices and/or storage media, such as a read only memory (ROM) , a flash memory, a random access memory (RAM) , a static memory, a hard drive, a semiconductor-based memory, etc., on which computer-readable instructions are stored in any suitable format. Memory 220 may store computer-readable instructions of one or more software programs (also referred to as software modules or software units) , which can be executed by processor 210 to perform various operations and functions. As shown in FIG. 2, memory 220 may store computer-readable instructions of various software units for performing respective functions of depth information extraction unit 106, preprocessing unit 108, image flattening unit 110, classification unit 112, recognition unit 130, and extraction unit 150. In some embodiments, two or more of the software units shown in FIG. 2 may be combined. In some embodiments, additional software units for information extraction may be stored in memory 220. In some embodiments, one or more of the software units shown in FIG. 2 may be omitted.

FIG. 3 is a flowchart of an exemplary method 300 of information extraction, according to embodiments of the disclosure. Method 300 includes several steps, which may be performed by components of system 100. For example, certain steps of method 300 may be performed by processor 210 by executing corresponding software module (s) stored in memory 220. Some of the steps of method 300 may be omitted. In addition, the steps may be performed in a different order than the one shown in FIG. 3. One or more steps may also be performed simultaneously. In the following, FIGs. 1-3 will be described together.

Referring to FIG. 3, method 300 may start from step 310, in which image acquisition device 104 (FIG. 1) may capture an image of a paper receipt. Turning to FIG. 1, image acquisition device 104 may include a camera equipped on a mobile device, such as a camera of a smart phone, a tablet, a laptop, or the like. A user may use image acquisition device 104 to capture a digital image (hereinafter referred to as an “image” ) of the paper receipt. In many cases, the captured image may have various distortions. Some distortions may be caused by the physical conditions of the paper receipt. For example, the paper receipt may have been folded and therefor have folding marks. The surface of the paper receipt may not be flat due to folding, holding position, and/or its natural curvature. The image may have perspective distortions due to the shooting angle or the position of the camera relative to the paper receipt. These and other similar distortions may degrade the quality of the image, and ultimately adversely affect the performance of information extraction from the image of the paper receipt.

Embodiments of the present disclosure can improve the performance of information extraction by utilizing depth information of the paper receipt. The depth information may be obtained in step 320 (FIG. 3) . There are different ways of obtaining the depth information. For example, some mobile devices may be equipped with depth sensor 102. Depth sensor 102 may include, for example, a structured-light 3D scanner, a time-of-flight sensor/camera, etc. In these cases, depth information may be obtained by depth sensor 102. As discussed above, the depth information may be in the form of a point cloud, 3D coordinates, distance information, a network of polygons, a color map representing distance values, etc. When depth sensor 102 is not available or not used, depth information may be obtained from the image of the paper receipt using depth information extraction unit 106. For example, the image of the paper receipt may be input to depth information extraction unit 106, which may extract depth information from the image using a learning network.

In some embodiments, the learning network (also referred to as a “shape network” ) may include a deep learning neural network configured to regress the 3D shape of the paper receipt based on the input image. The regression task can be formulated as an image-to-image translation problem: given an input image I, the network translates each pixel of I into a 3D coordinate of a 3D map C. A convolutional neural network (CNN) style encoder-decoder architecture with skip connections may be used to implement the shape network. The shape network may be trained with publicly available data set, historically collected data set (e.g., image-point cloud pairs obtained by depth sensor-equipped devices) , and/or synthetic data set generated by computer simulations.

Referring to FIG. 3, after the depth information is obtained (e.g., either from depth sensor 102 or from depth information extraction unit 106) , method 300 may proceed to step 330, in which preprocessing unit 108 may preprocess the image of the paper receipt and/or the depth information. Returning to FIG. 1, preprocessing unit 108 may take the depth information and the image as input, either in individual signal channels or in a combined signal channel, and condition the depth information, the image, or both, for downstream operations.

In some embodiments, preprocessing unit 108 may segment a profile of the paper receipt from the image of the paper receipt. A profile may include an area in the image corresponding to the paper receipt in that image or substantially overlapping with the paper receipt in that image. For example, the profile may include a portion of the image enclosed by outer boundaries of the paper receipt in the image. In this case, segmenting the profile may include finding the outer boundaries. In another example, the profile may include a portion of the image that is not in the background. In this case, segmenting the profile may include differentiating foreground and background of the image.

FIG. 4 illustrates a diagram indicating an exemplary image 410 of a paper receipt. Image 410 includes a portion 412 corresponding to the paper receipt capture in the image (hereinafter referred to as “paper receipt 412” for simplicity, not to be confused with the actual paper medium receipt) , as well as background information 414 that is not part of paper receipt 412. Paper receipt 412 may include contents, such as character (s) , image (s) , logo (s) , pattern (s) , color (s) , or other visual elements on its surface. In addition, due to its curvature, paper receipt 412 exhibits some distortions, resembling a slightly twisted rectangular shape. Preprocessing unit 108 may segment a profile 422 of paper receipt 412 in a new, preprocessed image 420. For example, preprocessing unit 108 may remove background information 414 from image 410, as background information 414 does not contribute to information extraction and is therefore considered as noise. Preprocessing unit 108 may also remove contents of paper receipt 412. During the segmentation process, preprocessing unit 108 may utilize information relating to the boundaries of paper receipt 412, such as edges, contract/color gradients, etc. Contents within paper receipt 412 may be of less importance. After background information 414 and/or the contents of paper receipt 412 are removed from image 410, preprocessing unit 108 may segment profile 422 from image 410. The segmented profile 422 may be stored or otherwise associated with image 410 or may be provided in a new image 420.

In some embodiments, preprocessing unit 108 may preprocess the depth information input from depth sensor 102 or depth information extraction unit 106. For example, preprocessing unit 108 may segment a depth profile of the paper receipt from the depth information based on the profile of the paper receipt segmented from the image of the paper receipt. Referring again to FIG. 4, after profile 422 is segmented, preprocessing unit 108 may segment a depth profile 432 from the depth information input to preprocessing unit 108 to generate, for example, preprocessed depth information 430 (also referred to as a “preprocessed depth image” ) . For example, preprocessing unit 108 may map profile 422 to the input depth information to identify the spatial points that fall within the boundaries of profile 422 and retain their values. The collection of these spatial points may be denoted as depth profile 432. On the other hand, preprocessor unit 108 may set the values of other spatial points falling outside of profile 422 to a predetermined value (e.g., zero or certain preset value (s) that distinguish from those values of spatial points within depth profile 432) . In this way, noises in the depth information not contributing to information extraction can be reduced.

FIG. 5 illustrates another example of a preprocessing process, according to embodiments of the disclosure. An image 510 may be captured by captured by image acquisition device 104, containing a portion 512 corresponding to a paper receipt (hereinafter portion 512 is referred to as “paper receipt” 512 for simplicity) and background information 514. Depth information (also referred to as a “depth image” ) 530 may be obtained using depth sensor 102 or extracted from image 510 by depth information extraction unit 106. Image 510 may be input to preprocessing unit 108, which may crop image 510 into a smaller image 520 to remove background information 514, remove contents from paper receipt 512, and segment a profile 522 of the paper receipt. Profile 522 may then be mapped to depth information 530 to segment the corresponding depth profile 542. For example, the original depth image 530 may be cropped into a smaller depth image 540 based on image 520. The preprocessed depth image 540 may then be used in downstream processing to extract information from paper receipt 512. In some embodiments, depth profile 542 may be segmented from depth image 540 by mapping spatial points in depth image 540 to profile 522, or vice versa.

FIG. 6 illustrates yet another example of a preprocessing process, according to embodiments of the disclosure. Referring to FIG. 6, an image 610 may be captured by image acquisition device 104. Image 610 may contain a portion 612 corresponding to a paper receipt (hereinafter portion 612 is referred to as “paper receipt” 612 for simplicity) and background information 614. Image 610 may be input to preprocessing unit 108, which may crop image 610 to remove background information 614 to generate a cropped image 620. Contents of paper receipt 612 may also be removed to reduce noise, as described above. Preprocessing unit 108 may segment profile 622 from cropped image 620. Cropped image 620 or profile 622 may be used to segment a depth profile 632 in a preprocessed depth image 630, similar to the segmentation process shown in FIG. 5.

After preprocessing unit 108 segments a profile and/or a depth profile, the profile/depth profile may be used to determine dimensions of the paper receipt. In some embodiments, preprocessing unit 108 may determine the size (e.g., length, width, or both) of the profile based on the number of pixels, spatial coordinates, reference object (s) , or other measures. In some embodiments, preprocessing unit 108 may also determine the relative size such as length-to-width ratio based on, for example, the numbers of pixels along the edges of the profile. The dimensions of the paper receipt may be input to classification unit 112 to classify the paper receipt into one of a plurality of categories.

Returning to FIG. 3, method 300 proceeds from step 330 to step 340, in which image flattening unit 110 may flatten the image of the paper receipt based on depth information to reduce noises caused by various distortions. As shown in FIG. 1, image flattening unit 110 may receive preprocessed image and preprocessed depth information from preprocessing unit 108, and flatten the image of the paper receipt in the preprocessed image based on the preprocessed depth information. It is noted that in some embodiments, preprocessing unit 108 may not preprocess both the image and the depth information. For example, preprocessing unit 108 may preprocess only the image or preprocess only the depth information. In either case, image flattening unit 110 may flatten the image of the paper receipt from either the preprocessed image or the original image (e.g., the image captured by image acquisition device 104) based on either the preprocessed depth information or the original depth information (e.g., the depth information captured by depth sensor 102 or extracted by depth information extraction unit 106) . In some embodiments, preprocessing unit 108 may be omitted from system 100 all together, and image flattening unit 110 may flatten the original image based on the original depth information.

As used herein, “flattening” an image refers to an image processing operation in which a distorted image of a paper receipt is restored to a state similar to what the image would be if the paper receipt is scanned using a flatbed scanner and free from folding or other physical deformations. In other words, the flattening operation aims to reduce distortions such as perspective distortions, folding distortions, curvature distortions, etc. that are common to casual photo capturing.

Image flattening unit 110 may performing the image flattening operation based on the depth information or the preprocessed depth information. When preprocessed depth information is used, the performance of the flattening may be enhanced due to the reduced noise in the preprocessed depth information. FIG. 7 shows an exemplary method of performing image flattening (step 340) , according to some embodiments. Referring to FIG. 7, image 510 may be preprocessed by preprocessing unit 108 to remove background information 514 by, for example, cropping, to generate a preprocessed image 710 containing paper receipt 512. A corresponding depth image 720 may be generated by, for example, similarly cropping the original depth information obtained by depth sensor 102 or depth information extraction unit 106. Paper receipt 512 may be distorted due to, for example, folding, as illustrated in FIG. 7. Image flattening unit 110 may receive image 710 and depth image 720 as inputs, and flatten paper receipt 512 based on depth image 720 to generate a flattened image 730, in which a flattened paper receipt 732 exhibits improved image quality with much reduced distortions.

In some embodiments, image flattening unit 110 may correct distortions in the image of the paper receipt using a deep neural network. For example, the deep neural network may be implemented by a CNN style multi-layer network, which can be trained using depth data of paper receipts having various distortions to establish relationship between features in the depth information and distortions in the paper receipts or the corrections thereof. The training data may be drawn from publicly available data set, historically collected data set, and/or synthetic data set. Depth information can be particularly useful in correcting perspective distortion and folding distortion because of the distinctive features in the spatial shape of the paper receipt associated with these types of distortions. For example, perspective distortions occur when the image of a paper receipt is captured from a direction non-perpendicular to the surface of the paper receipt or the paper receipt is off center in the field of view (FOV) of image acquisition device 104. In either case, the corresponding depth information can provide a pattern of the spatial points that reveal the relative positions between the paper receipt and image acquisition device 104, thereby aiding the correction of such distortions. In another example, the depth information of a non-flat paper receipt due to prior folding can reveal the spatial variations on the surface of the paper receipt. As a result, the effect of the folding can be accounted for to correct such distortions in the image of the paper receipt. It is noted that a properly trained deep neural network may include model parameters that are automatically tuned to provide the optimized distortion correction results from the input images based on the corresponding depth information.

Returning to FIG. 3, method 300 proceeds to step 350, in which classification unit 112 may classify the paper receipt into one of a plurality of categories based on the depth information. As shown in FIG. 1, classification unit 112 may receive the preprocessed depth information (or the original depth information if preprocessing unit 108 is omitted) as input, and classify the paper receipt into one of a plurality of categories (e.g., categories A, B, ..., N shown in a collection of categories 120) based on the depth information. The categories may be based on the general type of the paper receipt, for example, one category may be point-of-sale (POS) receipts, and another category may be official value-added tax (VAT) receipts, and yet another category may be standard sales receipts. In another example, the categories may be based on the grammage of the paper receipt: one category may be high-grammage receipts, and another category may be low grammage receipts. In a further example, the categories may be set based on the specific type of the paper receipt: one category may be train tickets, another category may be boarding passes, a further category may be hotel invoices, etc. Categories may also be combined. For example, an exemplary set of categories may include POS receipts, official VAT receipts, standard sales receipts, low grammage receipts, and train tickets. The number and characteristic of the categories may be determined based on particular applications.

In some embodiments, classification unit 112 may classify the paper receipt based on the dimensions of the paper receipt. As described above, the dimensions of the paper receipt may be determined based on the profile or the depth profile of the paper receipt. The dimensions may then be used to determine to which category the paper receipt belongs. For example, certain paper receipts may have a standard size, such as train tickets, boarding passes, official VAT receipts, standard sales receipts, etc. Certain paper receipts may have a standard width, such as POS receipts printed on standard receipt paper rolls. Certain paper receipts may have a fixed length-to-width ratio, such as invoices printed on standard letter size or A4 size paper. Thus, dimension information may be used to classify the paper receipt into an appropriate category. In some embodiments, classification unit 112 may compare the dimensions of the paper receipt with a set of reference dimensions. If the dimensions of the paper receipt match any reference dimensions, classification unit 112 may classify the paper receipt into the category corresponding to the matched reference dimensions. When multiple criteria are used in the classification operation, classification unit 112 may increase the weight of the category corresponding to the matched reference dimensions in determining the final category into which the paper receipt is classified.

In some embodiments, one or more physical features of the paper receipt may be extracted from the depth information by classification unit 112 to aid the classification operation. The physical features may include, for example, a natural curvature, a holding curvature, a folding pattern, and a folding depth. The natural curvature refers to the curvature exhibited by the paper receipt due to its natural property without external force or disturbance. For example, certain paper materials such as train tickets may exhibit a cylindrical-like shape and the curvature may be relatively uniform throughout the tickets; certain POS receipts may tend to return to their original rolled state and may exhibit larger curvature toward the top and bottom parts of the receipts. FIG. 4 shows another example of nature curvature where two opposing corners curve up while the other two corners stay relatively flat. The holding curvature refers to the curvature caused by holding the paper receipt (for example, for capturing the photo of the paper receipt) . Typical holding curvatures include a diagonally extending trench (e.g., caused by holding a relatively hard paper receipt at a corner) , a linear ridge or trench extending across the surface of the paper receipt (e.g., caused by holding the upper and lower edges, or left and right edges of a relatively hard paper receipt) , a fan-like ridge around a corner (e.g., caused by holding a relatively soft paper receipt at the corner) , etc. The folding pattern refers to the way the paper receipt is folded (and then unfolded) before an image of the paper receipt is captured. For example, FIG. 7 illustrates an exemplary folding pattern in which paper receipt 512 is folded first along the center line, and then folded again along the new center line. FIG. 6 illustrates another folding pattern in which paper receipt 612 is folded twice, along the one-third and then two-third lines. The folding depth refers to the depth of the folding marks. For example, it is more common to find deep folding marks on official VAT receipts due to their relatively large size (and therefore inconvenient to carry if not folded) and their relatively soft and light material (and therefore easy to fold) , while it is rare to find deep folding marks on train tickets due to their relatively small size and relatively hard material.

Classification unit 112 may extract these and similar physical features from the depth information and classify the paper receipt based on one or more of the physical features. For example, classification unit 112 may determine the spatial variations on the surface of the paper receipt based on the depth information, and further determine the curvature, folding pattern, and/or folding depth from the spatial variations. As described above, certain natural curvatures and holding curvatures may be related to the material, type, or other characteristics of the paper receipt. Similarly, the folding patterns and folding depth may also be related to the material, type, or other characteristics of the paper receipt. Therefore, classification unit 112 may map one or more physical features of the paper receipt to the category of the paper receipt using methods such as direct mapping, weighted mapping, and/or a machine learning network such as a CNN.

In some embodiments, classification unit 112 may determine a material of the paper receipt based on the depth information. As discussed above, the material of the paper receipt may affect the spatial shape of the paper receipt, which is contained in the depth information and can be used to derive information about the material of the paper receipt. For example, high-grammage receipts are usually relatively hard and tend to retain their original form. The surface of a high-grammage receipt is usually smoother than a low-grammage counterpart, with relatively few abrupt variations in a local area (e.g., local maxima or local minima) . Instead, spatial variations exhibiting on a high-grammage receipt tend to extend along a relatively large area (e.g., curvatures or folding marks across the entire receipt) . These and other distinctive features contained in the depth information may be used to predict the material of the paper receipt. For example, a machine learning network may be trained to relate depth information with various paper receipt materials, and may then be used to determine the material of a paper receipt based on its corresponding depth information.

After classification unit 112 determines the material of the paper receipt, classification unit 112 may classify the paper receipt based on the material. For example, when the categories are material-based, such as high-grammage receipts, low-grammage receipts, etc., classification unit 112 may classify the paper receipt directly into the appropriate category corresponding to the determined material. In another example, the material of the paper receipt may be used to classify the paper receipt into a category that is highly corelated to the material (e.g., low-grammage material to POS receipts) . In yet another example, the material information may be used along with other characteristics in the classification process, e.g., serving as one of the inputs to a leaning network-based classifier. For instance, one or more physical features may be used along with the material information to classify the paper receipt.

In some embodiments, classification unit 112 may determine the probability that a paper receipt belongs to each of a plurality of categories, and classify the paper receipt to the category that has the highest probability. For example, classification unit 112 may determine that the probability that a paper receipt is an official VAT receipt is 90%, the probability that the paper receipt is a standard sales receipt is 30%, and the probability that the paper receipt is a POS receipt is 5%. In this case, classification unit 112 may classify the paper receipt into the category of official VAT receipts, as this category has the highest probability.

Returning to FIG. 3, method 300 may proceed to step 360, in which information is extracted from the flattened image of the paper receipt based on the category of the paper receipt. Referring to FIG. 1, step 360 may by performed jointly by recognition unit 130 and extraction unit 150. Recognition unit 130 may receive the flattened image from image flattening unit 110 and the category into which the paper receipt is classified by classification unit 112. Based on the category, recognition unit 130 may recognition information from the flattened image. As shown in FIG. 1, recognition unit 130 may include an image/pattern recognition unit 132 and a text/semantic recognition unit 134. While

units

132 and 134 are illustrated as separate units in FIG. 1, in some embodiments they may be combined as a single recognition unit.

In some embodiments, recognition unit 130 may recognize at least one characteristic from the flattened image of the paper receipt based on the category of the paper receipt. The at least one characteristic may include, for example, a receipt pattern 142, a merchant feature 144, textual information 146, etc. Receipt pattern 142 may include, for example, a style of the paper receipt such as a set of lines, figures, colors, or patterns; a layout of the paper receipt; a content composition of the paper receipt such as a specific font, size, indentation, or spacing. Merchant feature 144 may include information identifying a merchant, such as a merchant’s logo, seal, identifier, slogan, name, address, phone number, website address, email, bar code, 2D code, etc. Textual information 146 may include text printed on the receipt, including characters or words in one or more languages, numbers, symbols, etc.

Embodiments of the disclosure can utilize category information to improve the accuracy of recognizing information from an image. In some embodiments, recognition unit 130 may identify one or more category-specific features associated with the paper receipt based on the category of the paper receipt, and recognize characteristic (s) from the flattened image of the paper receipt based on the category-specific feature (s) . Exemplary category-specific features may include an area on the paper receipt, a location on the paper receipt, a size of the paper receipt, etc. For example, a paper receipt may be classified as an official VAT receipt. Recognition unit 130 may identify that official VAT receipts have a standard size and layout, where the amount of the transaction is printed on a specific location of the receipt, and the spender’s name is printed in a 2 cm by 8 cm block located on the upper left corner. Such category-specific features may provide guidance in the actual recognition process, in which the recognition engine (e.g., 132 or 134) may recognize specific type of contents in specific location or area of the paper receipt image, thereby improving the accuracy of the information extraction result.

As shown in FIG. 1, image/pattern recognition unit 132 may be used to recognize receipt pattern 142 and merchant feature 144 from the flattened image of the paper receipt based on category-specific features. Similarly, text/semantic recognition unit 134 may recognize textual information 146 from the flattened image of the paper receipt based on category-specific features. In some embodiments, selection between

units

132 and 134 may be determined based on the category-specific features. For example, when the category-specific features specify only textual information, unit 134 may be employed to perform the recognition operation. When the category-specific features specify non-textual information, unit 132 may be employed, in place of or in addition to unit 134, to recognize, e.g., receipt pattern 142 and/or merchant feature 144.

Characteristics recognized by recognition unit 130 may be input to extraction unit 150 to extract structured information. Exemplary structured information may include a set of information that is generally carried by a paper receipt, such as the date and time of the transaction, merchant information, goods or services, transaction amount, purchaser information, etc. As shown in FIG. 1, extraction unit 150 may include a matching unit 152, which may match the recognized characteristic (s) with records in merchant database 160 to generate a matching result. For example, recognition unit 130 may recognize a merchant’s logo, which may be compared by matching unit 152 with merchant logo’s stored in merchant database 160. If a matching logo is found, the corresponding merchant information may be fetched from merchant database 160 to aid the extraction of structured information from the paper receipt. For example, the merchant’s name, address, tax ID, or other information fetched from merchant database 160 may be used by a structure information extraction unit 154 to verify corresponding information recognized from the image of the paper receipt. In some embodiments, the verification result and/or the matching result may be fed back to recognition unit 130 to refine the recognition of characteristics from the image of the paper receipt. In this way, the added information from merchant database can improve the accuracy and efficiency of information extraction from the image of the paper receipt. In another example, after a matching merchant is found by matching unit 152 in merchant database 160, a receipt template may be fetched from merchant database 160 to aid information extraction. The receipt template may include information about the specific layout of the receipt issued by that merchant, including the location of merchant-specific information such as the transaction amount, purchase information, goods/service list, etc. Such merchant-specific information may be used to verify or improve the characteristics already recognized, or be fed back to recognition unit 130 to refine the recognition operation. In any case, individual pieces of information recognized by recognition unit 130 may be verified, corrected, and/or enriched, before assembled into structured information 170 by structured information extraction unit 154. Structure information 170 may then be provided to downstream office automation systems through office automation interface 180, which may be implemented by, for example, application program interfaces (APIs) to interface with various electronic office management systems.

In some embodiments, not all of the components illustrated in FIG. 1 are present in system 100, and one or more components may be omitted. For example, depth sensor 102 may not be available in some embodiments, and depth information may be provided by depth information extraction unit 106 based on image (s) captured by image acquisition device 104. On the other hand, in embodiments where depth sensor 102 is present, depth information extraction unit 106 may be omitted. In another example, preprocessing unit 108 may be omitted with respective to depth information, image information, or both. As a result, depth information and/or image information not undergoing preprocessing may be used by classification unit 112 and/or image flattening unit 110. In yet another example, only one of the image/pattern recognition unit 132 and text/semantic recognition unit 134 may be present in recognition unit 130, and matching unit 152 may take a subset of receipt pattern 142, merchant feature 144, and text information 146 as input to match with records in merchant database 160.

A further aspect of the disclosure is directed to a non-transitory computer-readable medium storing instructions which, when executed, cause one or more processors to perform the methods disclosed herein. The computer-readable medium may be volatile or non-volatile, magnetic, semiconductor-based, tape-based, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices. For example, the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed. In some embodiments, the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.

It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed system and related methods. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed system and related methods.

It is intended that the specification and examples be considered as exemplary only, with a true scope being indicated by the following claims and their equivalents.

Claims

A system for information extraction from a paper receipt, the system comprising:

an image acquisition device configured to capture an image of the paper receipt;

a memory storing computer-readable instructions; and

at least one processor communicatively coupled to the memory and the image acquisition device, wherein the computer-readable instructions, when executed by the at least one processor, cause the at least one processor to perform operations comprising:

obtaining depth information of the paper receipt;

flattening the image of the paper receipt based on the depth information;

classifying the paper receipt into one of a plurality of categories based on the depth information; and

extracting information from the flattened image of the paper receipt based on the category of the paper receipt.
The system of claim 1, comprising a depth sensor, wherein obtaining the depth information comprises obtaining the depth information using the depth sensor.
The system of claim 1, wherein obtaining the depth information comprises:

extracting the depth information from the image of the paper receipt using a learning network.
The system of claim 1, wherein:

the operations comprise preprocessing the image of the paper receipt,

flattening the image of the paper receipt comprises flattening the preprocessed image of the paper receipt based on the depth information.
The system of claim 4, wherein preprocessing the image of the paper receipt comprises segmenting a profile of the paper receipt from the image of the paper receipt.
The system of claim 5, wherein segmenting the profile of the paper receipt comprises:

removing background information from the image of the paper receipt;

removing contents of the paper receipt from the image of the paper receipt; and

segmenting the profile of the paper receipt from the image of the paper receipt with the background information and the contents of the paper receipt removed.
The system of claim 5, wherein:

the operations comprise preprocessing the depth information; and

flattening the image of the paper receipt comprises flattening the image of the paper receipt based on the preprocessed depth information.
The system of claim 7, wherein preprocessing the depth information comprises segmenting a depth profile of the paper receipt from the depth information based on the profile of the paper receipt segmented from the image of the paper receipt.
The system of claim 8, wherein the operations comprise determining dimensions of the paper receipt based on at least one of the depth profile or the profile of the paper receipt.
The system of claim 9, wherein classifying the paper receipt into one of a plurality of categories comprises classifying the paper receipt into one of a plurality of categories based on the dimensions of the paper receipt.
The system of claim 1, wherein the operations comprise extracting at least one physical feature of the paper receipt from the depth information, the at least one physical feature comprising at least one of a natural curvature, a holding curvature, a folding pattern, or a folding depth.
The system of claim 11, wherein classifying the paper receipt into one of a plurality of categories comprises classifying the paper receipt into one of a plurality of categories based on the extracted at least one physical feature of the paper receipt.
The system of claim 1, wherein the operations comprise determining a material of the paper receipt based on the depth information.
The system of claim 13, wherein classifying the paper receipt into one of a plurality of categories comprises classifying the paper receipt into one of a plurality of categories based on the material of the paper receipt.
The system of claim 1, wherein the plurality of categories comprise at least two of point-of-sale (POS) receipts, official value-added tax (VAT) receipts, standard sales receipts, high-grammage receipts, or low-grammage receipts.
The system of claim 1, wherein flattening the image of the paper receipt based on the depth information comprises:

correcting, using a deep neural network, perspective distortion or foldingdistortion in the image of the paper receipt based on the depth information.
The system of claim 1, wherein the operations comprise:

recognizing at least one characteristic from the flattened image of the paper receipt based on the category of the paper receipt; and

matching the recognized at least one characteristic with records in a merchant database to generate a matching result,

wherein extracting the information from the flattened image of the paper receipt comprises extracting structured information from the flattened image of the paper receipt based on the matching result.
The system of claim 17, wherein the operations comprise:

identifying at least one category-specific feature associated with the paper receipt based on the category of the paper receipt,

wherein recognizing at least one characteristic from the flattened image of the paper receipt comprises recognizing the at least one characteristic from the flattened image of the paper receipt based on the at least one category-specific feature.
The system of claim 18, wherein the at least one category-specific feature comprises an area on the paper receipt, a location on the paper receipt, or a size of the paper receipt.
The system of claim 17, wherein the at least one characteristic comprises at least one of a receipt pattern, a merchant feature, or textual information.
The system of claim 20, wherein the receipt pattern comprises at least one of a style of the paper receipt, a layout of the paper receipt, or a content composition of the paper receipt.
The system of claim 20, wherein the merchant feature comprises at least one of a merchant logo, a merchant seal, or a merchant identifier.
A method of information extraction from a paper receipt, the method comprising:

receiving an image of the paper receipt;

receiving depth information of the paper receipt;

flattening the image of the paper receipt based on the depth information;

classifying the paper receipt into one of a plurality of categories based on the depth information; and

extracting information from the flattened image of the paper receipt based on the category of the paper receipt.
The method of claim 23, comprising:

obtaining the depth information from a depth sensor.
The method of claim 23, comprising:

extracting the depth information from the image of the paper receipt using a learning network.
The method of claim 23, comprising:

preprocessing the image of the paper receipt,

wherein flattening the image of the paper receipt comprises flattening the preprocessed image of the paper receipt based on the depth information.
The method of claim 26, wherein preprocessing the image of the paper receipt comprises segmenting a profile of the paper receipt from the image of the paper receipt.
The method of claim 27, wherein segmenting the profile of the paper receipt comprises:

removing background information from the image of the paper receipt;

removing contents of the paper receipt from the image of the paper receipt; and

segmenting the profile of the paper receipt from the image of the paper receipt with the background information and the contents of the paper receipt removed.
The method of claim 27, comprising:

preprocessing the depth information,

wherein flattening the image of the paper receipt comprises flattening the image of the paper receipt based on the preprocessed depth information.
The method of claim 29, wherein preprocessing the depth information comprises segmenting a depth profile of the paper receipt from the depth information based on the profile of the paper receipt segmented from the image of the paper receipt.
The method of claim 30, comprising determining dimensions of the paper receipt based on at least one of the depth profile or the profile of the paper receipt.
The method of claim 31, wherein classifying the paper receipt into one of a plurality of categories comprises classifying the paper receipt into one of a plurality of categories based on the dimensions of the paper receipt.
The method of claim 23, comprising extracting at least one physical feature of the paper receipt from the depth information, the at least one physical feature comprising at least one of a natural curvature, a holding curvature, a folding pattern, or a folding depth.
The method of claim 33, wherein classifying the paper receipt into one of a plurality of categories comprises classifying the paper receipt into one of a plurality of categories based on the extracted at least one physical feature of the paper receipt.
The method of claim 23, comprising determining a material of the paper receipt based on the depth information.
The method of claim 35, wherein classifying the paper receipt into one of a plurality of categories comprises classifying the paper receipt into one of a plurality of categories based on the material of the paper receipt.
The method of claim 23, wherein the plurality of categories comprise at least two of point-of-sale (POS) receipts, official value-added tax (VAT) receipts, standard sales receipts, high-grammage receipts, or low-grammage receipts.
The method of claim 23, wherein flattening the image of the paper receipt based on the depth information comprises:

correcting, using a deep neural network, perspective distortion or foldingdistortion in the image of the paper receipt based on the depth information.
The method of claim 23, comprising:

recognizing at least one characteristic from the flattened image of the paper receipt based on the category of the paper receipt; and

matching the recognized at least one characteristic with records in a merchant database to generate a matching result,

wherein extracting the information from the flattened image of the paper receipt comprises extracting structured information from the flattened image of the paper receipt based on the matching result.
The method of claim 39, comprising:

identifying at least one category-specific feature associated with the paper receipt based on the category of the paper receipt,

wherein recognizing at least one characteristic from the flattened image of the paper receipt comprises recognizing the at least one characteristic from the flattened image of the paper receipt based on the at least one category-specific feature.
The method of claim 40, wherein the at least one category-specific feature comprises an area on the paper receipt, a location on the paper receipt, or a size of the paper receipt.
The method of claim 39, wherein the at least one characteristic comprises at least one of a receipt pattern, a merchant feature, or textual information.
The method of claim 42, wherein the receipt pattern comprises at least one of a style of the paper receipt, a layout of the paper receipt, or a content composition of the paper receipt.
The method of claim 42, wherein the merchant feature comprises at least one of a merchant logo, a merchant seal, or a merchant identifier.
A non-transitory computer-readable medium storing instructions that, when executed by at least one processor, cause the at least one processor to perform a method of information extraction from a paper receipt, the method comprising:

receiving an image of the paper receipt;

receiving depth information of the paper receipt;

flattening the image of the paper receipt based on the depth information;

classifying the paper receipt into one of a plurality of categories based on the depth information; and

extracting information from the flattened image of the paper receipt based on the category of the paper receipt.