CN112036295B

CN112036295B - Bill image processing method and device, storage medium and electronic equipment

Info

Publication number: CN112036295B
Application number: CN202010884649.XA
Authority: CN
Inventors: 刘岩
Original assignee: Taikang Insurance Group Co Ltd
Current assignee: Taikang Insurance Group Co Ltd
Priority date: 2020-08-28
Filing date: 2020-08-28
Publication date: 2023-12-08
Anticipated expiration: 2040-08-28
Also published as: CN112036295A

Abstract

The embodiments of the disclosure provide a bill image processing method and device, a storage medium and electronic equipment. The method comprises: acquiring sub-images of a bill image, and acquiring the text information, the pixel information and the coordinate information in the bill image of each sub-image; determining, according to the text information, pixel information and coordinate information of each sub-image in the bill image, the key-value pairs formed by the sub-images in the bill image and the structure information of the key-value pairs; correcting the key-value pairs based on the error correction information corresponding to the structure information, to obtain corrected key-value pairs; and outputting the corrected key-value pairs in structured form according to their structure information. Bill image information can thus be recognized and collected automatically and output in a structured manner.

Description

Bill image processing method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to image processing technology and computer technology, and in particular to a bill image processing method and apparatus, a storage medium, and an electronic device.

Background

Common medical bills include inpatient invoices, outpatient invoices, fee lists, statements of account, inpatient medical records, laboratory reports and the like. Because the information systems of medical institutions lack a unified data standard, and the professional terms used for drugs, medical consumables and the like are not standardized, the bills issued by different medical institutions vary widely. Bill information is very valuable to insurance companies, for example for calculating claim amounts and assessing the health of customers, yet at present it is mainly entered by hand, which requires a large investment of labor, time and money.

Therefore, a new bill image processing method, device, storage medium and electronic device are needed that can identify and collect information from general bills and output that information in structured form.

It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.

Disclosure of Invention

The embodiments of the disclosure provide a bill image processing method and device, a storage medium and electronic equipment, which can identify and collect information from general bills and output it in structured form.

Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.

According to one aspect of the embodiments of the present disclosure, a bill image processing method is provided, the method including: acquiring sub-images of a bill image, and acquiring the text information, the pixel information and the coordinate information in the bill image of each sub-image; determining, according to the text information, pixel information and coordinate information of each sub-image in the bill image, the key-value pairs formed by the sub-images in the bill image and the structure information of the key-value pairs; correcting the key-value pairs based on the error correction information corresponding to the structure information, to obtain corrected key-value pairs; and outputting the corrected key-value pairs in structured form according to their structure information.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, determining the key-value pairs formed by sub-images in the bill image and the structure information of the key-value pairs according to the text information, pixel information and coordinate information of each sub-image in the bill image includes: inputting the text information, the pixel information and the coordinate information of each sub-image into a pre-trained relation matching model to obtain an association relation matrix between each sub-image and the other sub-images, together with the structure information of each sub-image; determining, according to the association relation matrix, the key-value pairs formed by the sub-images in the bill image; and determining the structure information of each key-value pair according to the structure information of the sub-images corresponding to that key-value pair.

In some exemplary embodiments of the present disclosure, based on the foregoing solution, determining, according to the association relation matrix, the key-value pairs formed by the sub-images in the bill image includes: extracting, from the association relation matrix, the sub-images whose association value with a given sub-image exceeds a threshold, and forming a key-value pair from that sub-image and the extracted sub-images.

In some exemplary embodiments of the disclosure, based on the foregoing scheme, the structure information of a key-value pair is either a conventional key-value pair or a table key-value pair; the structure information of a sub-image is either key information or value information, where the key information is a conventional key or a table key, and the value information is a conventional value or a table value; and determining the structure information of the key-value pair according to the structure information of the corresponding sub-images includes: determining the structure information of the key-value pair according to the key information of the sub-image corresponding to the key-value pair.

In some exemplary embodiments of the present disclosure, based on the foregoing solution, correcting the key-value pairs based on the error correction information corresponding to the structure information, to obtain the corrected key-value pairs, includes: arranging the sub-images of each key-value pair according to their coordinate information in the bill image; and correcting each key-value pair according to the error correction information corresponding to its structure information and its arranged sub-images, to obtain the corrected key-value pair.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, outputting the corrected key-value pairs in structured form according to their structure information includes: setting a marking pattern corresponding to each kind of key-value pair structure information; and marking the corrected key-value pairs in the bill image according to the marking patterns.

In some exemplary embodiments of the present disclosure, based on the foregoing scheme, outputting the corrected key-value pairs in structured form according to their structure information includes: dividing the corrected key-value pairs into categories according to their structure information; and automatically generating the key part and the value part of each category based on the structure information of the sub-images in the key-value pairs.

According to one aspect of the embodiments of the present disclosure, a bill image processing apparatus is provided, the apparatus including: an acquisition module configured to acquire sub-images of a bill image and to acquire the text information, pixel information and coordinate information in the bill image of each sub-image; a determining module configured to determine, according to the text information, pixel information and coordinate information of each sub-image in the bill image, the key-value pairs formed by the sub-images in the bill image and the structure information of the key-value pairs; an error correction module configured to correct the key-value pairs based on the error correction information corresponding to the structure information, to obtain corrected key-value pairs; and an output module configured to output the corrected key-value pairs in structured form according to their structure information.

In some exemplary embodiments of the disclosure, based on the foregoing solution, the determining module includes: an obtaining unit configured to input the text information, the pixel information and the coordinate information of each sub-image into a pre-trained relation matching model to obtain an association relation matrix between each sub-image and the other sub-images, together with the structure information of each sub-image; a first determining unit configured to determine, according to the association relation matrix, the key-value pairs formed by sub-images in the bill image; and a second determining unit configured to determine the structure information of each key-value pair according to the structure information of its corresponding sub-images.

In some exemplary embodiments of the present disclosure, based on the foregoing aspect, the first determining unit is configured to extract, from the association relation matrix, the sub-images whose association value with a given sub-image exceeds a threshold, and to form a key-value pair from that sub-image and the extracted sub-images.

In some exemplary embodiments of the disclosure, based on the foregoing scheme, the structure information of a key-value pair is either a conventional key-value pair or a table key-value pair; the structure information of a sub-image is either key information or value information, where the key information is a conventional key or a table key, and the value information is a conventional value or a table value; and the second determining unit is configured to determine the structure information of the key-value pair according to the key information of the sub-image corresponding to the key-value pair.

In some exemplary embodiments of the disclosure, based on the foregoing solution, the error correction module is configured to arrange the sub-images of each key-value pair according to their coordinate information in the bill image, and to correct each key-value pair according to the error correction information corresponding to its structure information and its arranged sub-images, to obtain the corrected key-value pair.

In some exemplary embodiments of the disclosure, based on the foregoing solution, the output module is configured to set a marking pattern corresponding to each kind of key-value pair structure information, and to mark the corrected key-value pairs in the bill image according to the marking patterns.

In some exemplary embodiments of the disclosure, based on the foregoing scheme, the output module is configured to divide the corrected key-value pairs into categories according to their structure information, and to automatically generate the key part and the value part of each category based on the structure information of the sub-images in the key-value pairs.

According to an aspect of the disclosed embodiments, there is provided a computer readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the method as described in the above embodiments.

According to an aspect of an embodiment of the present disclosure, there is provided an electronic device including: one or more processors; and storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method as described in the above embodiments.

In the embodiments of the invention, sub-images of the bill image are acquired, together with the text information, pixel information and coordinate information in the bill image of each sub-image; the key-value pairs formed by the sub-images in the bill image and the structure information of the key-value pairs are determined from this information; the key-value pairs are corrected based on the error correction information corresponding to the structure information, to obtain corrected key-value pairs; and the corrected key-value pairs are output in structured form according to their structure information. Bill image information can thus be recognized and collected automatically and output in a structured manner.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort. In the drawings:

FIG. 1 schematically illustrates a flow chart of a bill image processing method according to one embodiment of the present disclosure;

FIG. 2 schematically illustrates a structured presentation of a bill image according to one embodiment of the present disclosure;

FIG. 3 schematically illustrates a structured presentation of a bill image according to another embodiment of the present disclosure;

FIG. 4 schematically illustrates a method of determining the key-value pairs formed by sub-images in a bill image and the structure information of the key-value pairs, according to one embodiment of the present disclosure;

FIG. 5 schematically illustrates an association relation matrix according to one embodiment of the present disclosure;

FIG. 6 schematically illustrates data processing based on a relation matching model according to one embodiment of the present disclosure;

FIG. 7 schematically illustrates a flow chart of a bill image processing method according to another embodiment of the present disclosure;

FIG. 8 schematically illustrates a block diagram of a bill image processing apparatus according to an embodiment of the present disclosure;

FIG. 9 shows a schematic diagram of a computer system suitable for implementing embodiments of the present disclosure.

Detailed Description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.

Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the disclosed aspects may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.

The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.

The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.

FIG. 1 schematically illustrates a flow chart of a bill image processing method according to one embodiment of the present disclosure. The method provided in the embodiments of the present disclosure may be executed by any electronic device with computing capability, for example a server or a terminal device. In the following embodiments a terminal is taken as the execution body for illustration, but the present disclosure is not limited thereto.

As shown in FIG. 1, the bill image processing method provided by the embodiment of the present disclosure may include the following steps:

In step S110, the category and the sub-images of the bill image are acquired, and the text information, pixel information and coordinate information in the bill image of each sub-image are acquired.

In the embodiment of the disclosure, the terminal may acquire a bill image photographed by a user and process it in a series of steps to obtain the sub-images of the bill image and the text information, pixel information and coordinate information in the bill image of each sub-image.

In the embodiment of the disclosure, this processing can be carried out by a chain of modules, which may include:

(1) Image classification module: judges the bill category, the image quality and the clarity of the text.

The module receives the photographed bill images uploaded by the user. These may include inpatient invoices, outpatient invoices, laboratory reports, fee lists, inpatient medical records and the like, about 34 types in all. Because there is currently no data standard for medical bills, bills issued by different hospitals differ in paper and seal color, content layout, wording and so on, and users photograph them with mobile phones under varying conditions. The quality of the collected bill images is therefore uneven, which strongly affects recognition and structuring. To control data quality, the image classification module applies a three-level classification quality inspection to the bill images uploaded by the user.

First, each bill image is classified into one of 34 manually defined categories by a trained classification model. A bill image whose category is recognized is automatically labeled with that category and passes to the automatic recognition stage. A bill image that does not belong to any of the 34 categories does not enter automatic recognition, and the user is prompted to upload it again; if the user does not re-upload, the image is retained automatically and its data can be checked and entered manually in a later stage.

Second, a separate quality-detection classification model is trained for each bill category, with two classes: qualified and unqualified. If a bill image is judged unqualified, the user is prompted to upload it again. If the user does not re-upload, the bill image data is saved automatically and checked and entered manually in a later stage.

Finally, bill images that pass the quality inspection enter a text clarity check, which classifies the clarity of the characters as qualified or unqualified. If this check fails, the user is prompted to upload again; if the user does not re-upload, the bill image data is saved automatically and checked and entered manually in a later stage. Bill images that pass inspection are stored, and the user is shown a prompt containing the number of bill images, the number that passed inspection, the number that failed, and whether the upload succeeded. The inspection result does not affect whether the upload succeeds; it only affects how quickly the business is processed.
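As an illustration only, the three-level inspection can be sketched as a cascade that stops at the first failing level. This is a minimal sketch under stated assumptions: the trained models are replaced by plain Python callables, the names `inspect`, `summarize` and `InspectionResult` are hypothetical, and a single text-clarity model shared by all categories is assumed (the description only states that the image-quality model is trained per category).

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence

# Hypothetical stand-ins for the trained models described in the text.
CategoryModel = Callable[[bytes], Optional[str]]  # one of the 34 categories, or None
BinaryModel = Callable[[bytes], bool]             # True means "qualified"


@dataclass
class InspectionResult:
    category: Optional[str]
    passed: bool
    failed_stage: Optional[str]  # "category", "image_quality", "text_clarity" or None


def inspect(image: bytes,
            classify: CategoryModel,
            quality_models: Dict[str, BinaryModel],
            clarity_model: BinaryModel) -> InspectionResult:
    """Run the three inspection levels in order and stop at the first failure."""
    category = classify(image)
    if category is None:
        # Level 1: unknown category -> prompt re-upload, keep for manual entry.
        return InspectionResult(None, False, "category")
    if not quality_models[category](image):
        # Level 2: the per-category image-quality model says "unqualified".
        return InspectionResult(category, False, "image_quality")
    if not clarity_model(image):
        # Level 3: text clarity is "unqualified".
        return InspectionResult(category, False, "text_clarity")
    return InspectionResult(category, True, None)


def summarize(results: Sequence[InspectionResult], upload_ok: bool) -> dict:
    """Build the user prompt; the inspection outcome does not decide upload success."""
    passed = sum(1 for r in results if r.passed)
    return {"total": len(results), "passed": passed,
            "failed": len(results) - passed, "upload_succeeded": upload_ok}
```

Keeping the upload flag as a separate argument mirrors the statement that inspection only affects processing timeliness, not whether the upload succeeds.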

In the embodiment of the invention, bill images may be supplied in three file formats: TIFF, PDF and JPEG. A TIFF file may package several images, a JPEG file holds a single compressed image, and a PDF file is a document format. For these different formats, the image classification module first determines the file format. For TIFF, all pictures in the package are extracted as individual JPEG pictures; for PDF, each document page is split off and converted into a single JPEG picture. After conversion, the category of each picture is analyzed.
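One common way to determine the file format is to inspect the file signature (the leading "magic bytes"). The sketch below assumes that approach; the function names are hypothetical, and the actual TIFF-frame extraction and PDF-page rendering are passed in as callables because the description does not name the libraries used.

```python
from typing import Callable, List

# File signatures: TIFF little-/big-endian headers, PDF header, JPEG SOI marker.
_TIFF_SIGS = (b"II*\x00", b"MM\x00*")
_PDF_SIG = b"%PDF-"
_JPEG_SIG = b"\xff\xd8\xff"


def detect_format(data: bytes) -> str:
    """Return "TIFF", "PDF", "JPEG" or "UNKNOWN" from the leading bytes."""
    if data[:4] in _TIFF_SIGS:
        return "TIFF"
    if data.startswith(_PDF_SIG):
        return "PDF"
    if data.startswith(_JPEG_SIG):
        return "JPEG"
    return "UNKNOWN"


def to_jpeg_pages(data: bytes,
                  split_tiff: Callable[[bytes], List[bytes]],
                  render_pdf: Callable[[bytes], List[bytes]]) -> List[bytes]:
    """Normalize one upload into a list of single-page JPEG images."""
    fmt = detect_format(data)
    if fmt == "JPEG":
        return [data]            # already a single compressed image
    if fmt == "TIFF":
        return split_tiff(data)  # one JPEG per frame in the package
    if fmt == "PDF":
        return render_pdf(data)  # one JPEG per document page
    raise ValueError("unsupported bill file format")
```

Checking the signature rather than the file extension avoids misrouting files that were renamed before upload.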

(2) Text detection module: detects and segments the text blocks in the image.

In the embodiment of the invention, the EAST (Efficient and Accurate Scene Text) detection model is used to detect, join and segment the text blocks in each bill image, so that each bill image is split into a set of small text-block slices, i.e. the sub-images, and the pixel information of each sub-image and its coordinates in the original image are obtained.

(3) Optical character recognition (OCR) module: recognizes the characters in the bill image.

OCR text recognition is performed on the sub-images of each bill image to obtain the text on each sub-image.

Through the processing of the above 3 modules, the following information of each bill image can be obtained:

a. the classification category of the bill image (from the image classification module);

b. the sub-images of the bill image, the pixel information of each sub-image and its position coordinates in the bill image (from the text detection module);

c. the text on each sub-image (from the OCR module).
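The per-bill information a. to c. can be held in a simple record. As a sketch only: the class and field names are hypothetical, and because EAST can output rotated quadrilaterals, the helper below assumes each detected quadrilateral is reduced to its axis-aligned bounding box in original-image pixel coordinates.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in the original bill image


@dataclass
class SubImage:
    bbox: Box                 # coordinate information (from text detection)
    text: str                 # text information (from OCR)
    pixels: bytes = b""       # pixel information: the cropped slice, left opaque here


@dataclass
class BillRecord:
    category: str                                   # from the image classification module
    sub_images: List[SubImage] = field(default_factory=list)


def bbox_from_quad(quad: Sequence[Tuple[float, float]]) -> Box:
    """Reduce a 4-point text quadrilateral to its axis-aligned bounding box."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (min(xs), min(ys), max(xs), max(ys))
```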

In step S120, the key-value pairs formed by the sub-images in the bill image and the structure information of the key-value pairs are determined according to the text information, pixel information and coordinate information of each sub-image in the bill image.

In the embodiment of the invention, the text information, pixel information and coordinate information of each sub-image can be input into a pre-trained relation matching model to obtain an association relation matrix between each sub-image and the other sub-images, together with the structure information of each sub-image. The key-value pairs formed by the sub-images in the bill image are then determined from the association relation matrix, and the structure information of each key-value pair is determined from the structure information of its sub-images.

In the embodiment of the invention, for each sub-image, the sub-images whose association value with it exceeds a threshold can be extracted from the association relation matrix, and a key-value pair is formed from that sub-image and the extracted sub-images.
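The threshold step can be illustrated as follows. This is a sketch under assumptions not fixed by the description: the association relation matrix is taken as an n-by-n list of scores, each sub-image's key/value role comes from the structure information output by the relation matching model, pairs are read only from the rows of key sub-images (so that each pair is not listed twice), and the default threshold of 0.5 is an arbitrary placeholder. `pairs_from_matrix` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def pairs_from_matrix(matrix: Sequence[Sequence[float]],
                      roles: Sequence[str],
                      threshold: float = 0.5) -> Dict[int, List[int]]:
    """Map each key sub-image index to the value sub-images associated with it.

    matrix[i][j] is the association score between sub-images i and j;
    roles[i] is "key" or "value".
    """
    pairs: Dict[int, List[int]] = {}
    for i, row in enumerate(matrix):
        if roles[i] != "key":
            continue
        # Keep value sub-images whose association with key i exceeds the threshold.
        values = [j for j, score in enumerate(row)
                  if j != i and roles[j] == "value" and score > threshold]
        if values:
            pairs[i] = values
    return pairs
```

A key with one associated value gives a conventional pair; a key with several gives a candidate table pair, which the structure information then confirms.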

In the embodiment of the present invention, a key-value pair (Key-Value) is an associated combination in which one sub-image of a bill image serves as the key (Key) and at least one other sub-image of the same bill image serves as the value (Value). The structure information of a key-value pair may be a conventional key-value pair (normal Key-Value) or a table key-value pair (table Key-Value), but the present invention is not limited to these; for example, it may also be a long-text key-value pair. A conventional key-value pair combines one sub-image as the key with one sub-image as the value, and usually corresponds to a field in the bill image and its single piece of content. A table key-value pair combines one sub-image as the key with several sub-images as values, and is usually laid out in the bill image as a table or a table-like structure (a table without ruled lines), for example a field together with several entries under it.

In the embodiment of the invention, the structure information of a sub-image is key information or value information; the key information may be a conventional key or a table key, and the value information may be a conventional value or a table value. The structure information of a key-value pair is determined from the key information of its sub-images.

For example, suppose the relation matching model determines that sub-images N1 and N2 form a key-value pair, where the structure information of N1 is key information, specifically a conventional key, and the structure information of N2 is value information, specifically a conventional value. The structure information of the pair is then derived from the key information of its key sub-image N1 (a conventional key), so the pair formed by N1 and N2 is a conventional key-value pair.
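The N1/N2 example reduces to a lookup on the key's label. The sketch below assumes string labels (hypothetical names such as "conventional_key") and, in addition, reports whether the values agree with the key's kind, which is the consistency that rule 6) in step S130 enforces. `pair_structure` is a hypothetical name.

```python
from typing import Sequence, Tuple

# Hypothetical label names for sub-image structure information.
_PAIR_TYPE = {"conventional_key": "conventional", "table_key": "table"}
_EXPECTED_VALUE = {"conventional": "conventional_value", "table": "table_value"}


def pair_structure(key_label: str, value_labels: Sequence[str]) -> Tuple[str, bool]:
    """Derive a key-value pair's structure information from its key sub-image.

    Returns (pair_type, consistent), where consistent is False if any value
    sub-image carries the other kind of value label (a cross match).
    """
    pair_type = _PAIR_TYPE[key_label]
    expected = _EXPECTED_VALUE[pair_type]
    return pair_type, all(v == expected for v in value_labels)
```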

In step S130, the key-value pairs are corrected based on the error correction information corresponding to the structure information, to obtain the corrected key-value pairs.

In the embodiment of the invention, the sub-images of each key-value pair can be arranged according to their coordinate information in the bill image, and each key-value pair is corrected according to the error correction information corresponding to its structure information and its arranged sub-images, to obtain the corrected key-value pair.

In the embodiment of the invention, error correction rules for key-value pairs of each kind of structure information are preset and can be combined into one rule set. After the key-value pairs and their structure information are obtained, the sub-images of all key-value pairs are arranged according to their coordinates in the original bill image, giving a bill image in which the key-value pairs are laid out in position. This arrangement is then corrected with the combined rules corresponding to the structure information of each key-value pair, yielding the corrected key-value pairs.
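Arranging the sub-images in reading order needs a working definition of "same row". One possible heuristic, assumed here and not specified by the description, groups boxes whose vertical centres differ by less than half the smaller box height, then sorts each row from left to right. `arrange_rows` and `tol` are hypothetical names.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in bill-image pixels


def _centre_and_height(b: Box) -> Tuple[float, float]:
    return (b[1] + b[3]) / 2, b[3] - b[1]


def arrange_rows(boxes: List[Box], tol: float = 0.5) -> List[List[int]]:
    """Group box indices into rows (top to bottom), each sorted left to right.

    A box joins the current row when its vertical centre lies within
    tol * (smaller height) of the centre of the row's first box (assumed heuristic).
    """
    order = sorted(range(len(boxes)), key=lambda i: _centre_and_height(boxes[i])[0])
    rows: List[List[int]] = []
    for i in order:
        cy, h = _centre_and_height(boxes[i])
        if rows:
            ref_cy, ref_h = _centre_and_height(boxes[rows[-1][0]])
            if abs(cy - ref_cy) < tol * min(h, ref_h):
                rows[-1].append(i)
                continue
        rows.append([i])
    return [sorted(row, key=lambda i: boxes[i][0]) for row in rows]
```

Comparing against the row's first box, rather than the last one added, keeps a slowly drifting line of slightly skewed boxes from chaining into the next row.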

In this embodiment, the arranged key-value pairs can be corrected with the following rule set, processing the arrangement from top to bottom and from left to right (below, a key is a sub-image whose structure information is key information, and a value is a sub-image whose structure information is value information):

1) Values located between two keys in the same row are merged.

2) When a key is immediately followed by another key, the first key is discarded if the pixel distance between the two keys is greater than the length of the shorter key; otherwise the two keys are merged.

3) When a value is immediately followed by another value, the values are merged directly until the next key in the same row is reached.

4) If several values in different rows correspond to the same key, the key is marked as a table key, and all values below it are table values of that table key, until a row is reached in which the next key and its value lie in the same row.

5) For a field whose value length approaches or exceeds the width of the original bill image, the value is marked as the value of a long-text key-value pair, and the key in the row immediately above is marked as the key of that long-text key-value pair.

6) A table key may form key-value pairs only with table values, and a conventional key only with conventional values. If a cross match exists, the pairing must be split, and processing restarts from rule 1), repeating until no further split is possible.

It should be noted that, among the above error correction rules, 1), 2) and 3) apply to conventional key-value pairs, 2) and 4) apply to table key-value pairs, 5) applies to long-text key-value pairs (it is also the rule that identifies them), and 6) applies to all key-value pairs.
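Rules 1) to 3) operate within one arranged row and can be sketched as a single left-to-right pass. Assumptions: each row is already sorted left to right, the key "length" in rule 2) is read as pixel width, and merged texts are concatenated without a separator (which suits Chinese text). `Token` and `correct_row` are hypothetical names; the cross-row rules 4) to 6) are not covered by this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    role: str   # "key" or "value"
    text: str
    x0: float   # left edge in pixels
    x1: float   # right edge in pixels


def _merge(a: Token, b: Token) -> Token:
    """Join two adjacent sub-images of the same role into one."""
    return Token(a.role, a.text + b.text, a.x0, b.x1)


def correct_row(row: List[Token]) -> List[Token]:
    """Apply rules 1)-3) to one row that is already sorted left to right."""
    out: List[Token] = []
    for tok in row:
        prev = out[-1] if out else None
        if prev is not None and prev.role == tok.role == "value":
            # Rules 1) and 3): consecutive values up to the next key become one value.
            out[-1] = _merge(prev, tok)
        elif prev is not None and prev.role == tok.role == "key":
            gap = tok.x0 - prev.x1
            shorter = min(prev.x1 - prev.x0, tok.x1 - tok.x0)
            if gap > shorter:
                out[-1] = tok                # Rule 2): keys far apart, drop the first key
            else:
                out[-1] = _merge(prev, tok)  # Rule 2): keys close together, merge them
        else:
            out.append(tok)
    return out
```

For example, a key split by OCR into two adjacent slices is rejoined, and a value split into two slices is likewise rejoined before it is paired with its key.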

Applying this rule set corrects each key-value pair in the arranged bill image, yielding the corrected key-value pairs.
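Rules 4) and 5) look across rows. A rough sketch, under assumptions of our own: a "header" row consists only of keys, each value is assigned to the header key whose horizontal span overlaps it most (column alignment), at least two value-only rows are required before the keys become table keys, and "approaches" in rule 5) is read as at least 90% of the image width. Only the detection half of rule 5) is shown. `Cell`, `find_tables` and `is_long_value` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cell:
    role: str   # "key" or "value"
    text: str
    x0: float
    x1: float


def _overlap(a: Cell, b: Cell) -> float:
    """Horizontal overlap in pixels between two cells (0 if disjoint)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def find_tables(rows: List[List[Cell]]) -> Dict[str, List[str]]:
    """Rule 4): return {table key text: [table value texts]}.

    A key-only header row followed by two or more value-only rows forms a table;
    the first row that is not value-only ends it. Duplicate key texts in one
    header would collide in the dict (not handled in this sketch).
    """
    tables: Dict[str, List[str]] = {}
    i = 0
    while i < len(rows):
        header = rows[i]
        if header and all(c.role == "key" for c in header):
            j = i + 1
            while j < len(rows) and rows[j] and all(c.role == "value" for c in rows[j]):
                j += 1
            if j - i - 1 >= 2:  # values from several rows share these keys
                columns: Dict[str, List[str]] = {k.text: [] for k in header}
                for row in rows[i + 1:j]:
                    for v in row:
                        best = max(header, key=lambda k: _overlap(k, v))
                        if _overlap(best, v) > 0:  # values outside every column are skipped
                            columns[best.text].append(v.text)
                tables.update(columns)
                i = j
                continue
        i += 1
    return tables


def is_long_value(cell: Cell, image_width: float, ratio: float = 0.9) -> bool:
    """Rule 5) detection: a value spanning (nearly) the full image width is long text."""
    return cell.role == "value" and (cell.x1 - cell.x0) >= ratio * image_width
```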

In step S140, the corrected key-value pairs are output in structured form according to their structure information.

In the embodiment of the invention, the key-value pairs of a bill image fall into three main categories: conventional key-value pairs, table key-value pairs and long-text key-value pairs. In a table key-value pair, several values typically share one key; the value of a long-text key-value pair often spans several rows or even paragraphs. Based on these characteristics, the structured result of each key-value pair is classified and output with the help of a knowledge base of conventional key-value pairs (such as name, age, gender and amount), a knowledge base of table key-value pairs (such as item name, dosage and measurement) and a knowledge base of long-text key-value pairs (such as admission examination, past medical history, diagnosis stage and course of diagnosis and treatment). The output can be stored in JSON format.
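As an illustration of the JSON output, the sketch below groups corrected pairs by their structure information and uses small stand-in knowledge bases, filled only with example fields named above, to route keys that are not found in the matching knowledge base to an "unmatched" group. How the real knowledge bases are applied is not specified; the grouping scheme, `KNOWLEDGE_BASES` and `to_json` are assumptions.

```python
import json
from typing import Dict, List, Tuple

# Stand-in knowledge bases built only from example fields in the text.
KNOWLEDGE_BASES = {
    "conventional": {"name", "age", "gender", "amount"},
    "table": {"item name", "dosage"},
    "long_text": {"admission examination", "past medical history"},
}


def to_json(pairs: List[Tuple[str, str, object]]) -> str:
    """Serialize corrected pairs given as (pair_type, key_text, value)."""
    out: Dict[str, Dict[str, object]] = {t: {} for t in KNOWLEDGE_BASES}
    out["unmatched"] = {}
    for pair_type, key, value in pairs:
        # A key counts as recognized only if it is in the knowledge base of its own type.
        known = key.strip().lower() in KNOWLEDGE_BASES.get(pair_type, set())
        out[pair_type if known else "unmatched"][key] = value
    # ensure_ascii=False keeps Chinese field names readable in the file.
    return json.dumps(out, ensure_ascii=False, indent=2)
```

Table values are kept as lists so that the several values sharing one table key stay together in the output.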

In the embodiment of the invention, a marking pattern can be set for each kind of key-value pair structure information, and the corrected key-value pairs are marked in the bill image with these patterns. A marking pattern may be a shape and/or a color. For example, the sub-image of a key in a conventional key-value pair is circled with a solid line, the sub-image of a value in a conventional key-value pair is circled with a dashed line, the sub-image of a key in a table key-value pair is boxed with a solid-line square, and the sub-image of a value in a table key-value pair is boxed with a dashed-line square. As another example, sub-images with different structure information in each key-value pair may be given different colors.
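The marking patterns in this example can be expressed as a lookup table. The sketch below renders one marked sub-image as an SVG element; SVG output, the red stroke, the dash pattern and the name `svg_mark` are illustrative choices not taken from the description, and a "circle" is drawn as an ellipse fitted to the text box.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)

# (pair structure, sub-image role) -> (shape, line style), per the example above.
MARK_STYLES = {
    ("conventional", "key"): ("circle", "solid"),
    ("conventional", "value"): ("circle", "dashed"),
    ("table", "key"): ("square", "solid"),
    ("table", "value"): ("square", "dashed"),
}


def svg_mark(bbox: Box, pair_type: str, role: str, color: str = "red") -> str:
    """Return an SVG outline for one sub-image in the given marking pattern."""
    shape, line = MARK_STYLES[(pair_type, role)]
    x0, y0, x1, y1 = bbox
    dash = ' stroke-dasharray="6,4"' if line == "dashed" else ""
    style = f'fill="none" stroke="{color}"{dash}'
    if shape == "square":
        return f'<rect x="{x0}" y="{y0}" width="{x1 - x0}" height="{y1 - y0}" {style}/>'
    # "circle": an ellipse inscribed in the text box.
    return (f'<ellipse cx="{(x0 + x1) / 2}" cy="{(y0 + y1) / 2}" '
            f'rx="{(x1 - x0) / 2}" ry="{(y1 - y0) / 2}" {style}/>')
```

Colors could be varied per structure type through the `color` argument, matching the alternative color-based marking mentioned above.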

Fig. 2 schematically illustrates a structured presentation of a bill image according to one embodiment of the present disclosure, namely the structure of a hospital charge bill. In Fig. 2, the sub-images of keys in conventional key-value pairs are marked with solid circles, the sub-images of values in conventional key-value pairs with dashed circles, the sub-images of keys in table key-value pairs with solid squares, and the sub-images of values in table key-value pairs with dashed squares.

In the embodiment of the invention, the corrected key-value pairs can be divided into categories according to their structure information, and the key part and value part of each category can be generated automatically from the structure information of the sub-images in the key-value pairs.

Within each corrected key-value pair, the key corresponds to its value. Given preset categories, each consisting of a key part and a value part, the key of a key-value pair is assigned to the key part of its category, and the value of the pair is assigned to the value part of that category.
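The assignment to preset categories can be sketched as follows. The category names, their key lists and the function name `fill_categories` are hypothetical, and matching keys by exact text is a simplifying assumption; a production system would more likely match against the knowledge bases or tolerate OCR variants:

```python
# Hypothetical preset categories; each lists the key fields (key part) it contains.
PRESET_CATEGORIES: dict[str, list[str]] = {
    "Patient information": ["Name", "Age", "Gender"],
    "Charges": ["Total amount"],
}


def fill_categories(
    pairs: list[tuple[str, str]],
    categories: dict[str, list[str]] = PRESET_CATEGORIES,
) -> tuple[dict[str, dict[str, str]], list[tuple[str, str]]]:
    """Assign each corrected (key, value) pair to its category.

    The pair's key goes to the category's key part and its value to the
    matching value part. Keys that match no category are returned separately.
    """
    filled: dict[str, dict[str, str]] = {name: {} for name in categories}
    unmatched: list[tuple[str, str]] = []
    for key, value in pairs:
        for name, keys in categories.items():
            if key in keys:
                filled[name][key] = value
                break
        else:  # no category claimed this key
            unmatched.append((key, value))
    return filled, unmatched
```

Returning the unmatched pairs separately gives the manual review step a natural place to pick up data that could not be classified.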

In the embodiment of the invention, the structured display may be built on the bill image in which the corrected key-value pairs have been marked with the marking patterns, but the invention is not limited to this; for example, the structured display may also be built on the original bill image.

Fig. 3 schematically illustrates a structured presentation of a bill image according to another embodiment of the present disclosure. As shown in Fig. 3, the right side of the interface may present the structured bill image of Fig. 2, and the left side may be filled in automatically from the bill image on the right according to the category given by the structure information of each key-value pair: the field corresponding to the key part of a key-value pair is entered into the key part of the category, and the field corresponding to the value part is entered into the value part of the category.

During structured display, the structured result can be edited. For example, the user may set the categories and their key parts, and the corresponding value parts are then filled in automatically according to the key parts set by the user.

After the error-corrected key-value pairs have been output in structured form, the output results, the structured data, image data marked as unclear, and image data that cannot be classified can be checked manually, and missing data can be completed by hand. For example, a modification button may be provided for the output shown in Fig. 2 or Fig. 3; by clicking it, the user can modify the output.

The method proposed in the embodiment of the present invention for determining the key-value pairs formed by sub-images in the bill image, and the structure information of those key-value pairs, is described in detail below with reference to specific embodiments.

Fig. 4 schematically illustrates a method of determining the key-value pairs formed by sub-images in a bill image, and the structure information of those key-value pairs, according to one embodiment of the present disclosure. As shown in Fig. 4, the method may include the following flow:

In step S410, the text information, pixel information and coordinate information of each sub-image are input into a pre-trained relation matching model to obtain an association matrix between each sub-image and the other sub-images, as well as the structure information of each sub-image.

In the embodiment of the invention, the relation matching model can be constructed in advance and trained on samples. A sample is the set of sub-images of a bill image of a given category. The category of the bill image, the text information and pixel information of each sub-image, and the coordinate information of each sub-image in the bill image to which it belongs are input into the relation matching model, and the model is trained by comparing the ground-truth key-value relations among the sample sub-images with the model's predictions. The result is a relation matching model that can determine the association matrix and the key-value relations among sub-images.
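The source states only that training compares the ground-truth key-value relations with the model's predictions. It does not name a loss function. Because both outputs described below (the association values between sub-images and the four key/value type scores of each sub-image) lie between 0 and 1, binary cross-entropy is one plausible choice. The plain-Python sketch below assumes that choice; it is not the patent's stated training objective, and the function names are hypothetical:

```python
import math


def bce(pred: float, target: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for a single prediction in [0, 1]."""
    p = min(max(pred, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def mean_bce(pred: list[list[float]], true: list[list[float]]) -> float:
    """Average BCE over all elements of two equally shaped nested lists."""
    terms = [bce(p, t) for pr, tr in zip(pred, true) for p, t in zip(pr, tr)]
    return sum(terms) / len(terms)


def relation_loss(pred_m, true_m, pred_t, true_t) -> float:
    """Loss on the N x N association matrix plus loss on the N x 4 structure vectors."""
    return mean_bce(pred_m, true_m) + mean_bce(pred_t, true_t)
```

A perfect prediction drives the loss towards zero, while predicting 0.5 everywhere costs ln 2 per term. A real implementation would compute this inside a deep learning framework so that gradients flow back into the graph convolutional network.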

It should be noted that, in the embodiment of the present invention, one relation matching model may be shared by all bill image categories, so that for a bill image of any category it can determine the association matrix between each sub-image and the other sub-images, as well as the structure information of each sub-image.

It should be noted that, when determining the association matrix and the key-value relations among the sub-images of a bill image of a given category, the relation matching model must jointly use the category of the bill image, the pixel information and text information of the sub-images, and their coordinate information in the bill image. The key-value relation of a sub-image refers to its structure information: whether the sub-image is a Key or a Value; if it is a Key, whether it is a conventional key (Key_normal) or a table key (Key_table); and if it is a Value, whether it is a conventional value (Value_normal) or a table value (Value_table).

The association matrix between the sub-images is an N × N matrix, where N is the number of sub-images: if N sub-images are obtained from a bill image, an N × N matrix M is obtained. The relation between any sub-image A and all sub-images (including A itself) is represented by an N-dimensional vector, each element of which is the association value between A and one sub-image. Stacking the N vectors of the N sub-images forms the N × N matrix M.

Fig. 5 schematically illustrates an association matrix according to one embodiment of the present disclosure. A, B, C, ... denote the sub-images, and the rows and columns each correspond to the N sub-images. The value of each element is the association value between the corresponding pair of sub-images, so the matrix is symmetric.
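A minimal sketch of how the per-sub-image relation vectors stack into M, with checks for the three properties stated in the text: the matrix is square, its values lie in [0, 1], and it is symmetric. The function name and the tolerance `tol` are hypothetical:

```python
def association_matrix(rows: list[list[float]], tol: float = 1e-6) -> list[list[float]]:
    """Stack the N-dimensional relation vectors of N sub-images into an N x N matrix M.

    Raises ValueError if M is not square, has values outside [0, 1], or is not
    symmetric within tol.
    """
    n = len(rows)
    if any(len(r) != n for r in rows):
        raise ValueError("each sub-image needs an N-dimensional relation vector")
    for i in range(n):
        for j in range(n):
            if not 0.0 <= rows[i][j] <= 1.0:
                raise ValueError(f"M[{i}][{j}] is outside [0, 1]")
            if abs(rows[i][j] - rows[j][i]) > tol:
                raise ValueError(f"M is not symmetric at ({i}, {j})")
    return [list(r) for r in rows]  # return a copy so callers cannot alias the input
```

Row i of the result is the N-dimensional relation vector of the i-th sub-image, and by symmetry it equals column i.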

For example, the relation matching model is trained in advance and may be based on a graph convolutional neural network. The input information I of the model may include:

I = {Text, Image, Coordinate};

where Text is the text information of the sub-image, Image is the pixel information of the sub-image, and Coordinate is the coordinate information of the sub-image in the original bill image, represented by two points: {{x1, y1}, {x2, y2}}.

The expected model output is the association matrix M and the key-value relations of the sub-images: for any sub-image A, the model yields an N-dimensional vector (its row of the association matrix M) and a 4-dimensional key-value relation vector T. Every element of M and T is a value between 0 and 1:

T = {Key_normal, Key_table, Value_normal, Value_table}

where Key_normal denotes a conventional key, Key_table a table key, Value_normal a conventional value, and Value_table a table value.

For example, if sub-image A is Key_table, then T = {0, 1, 0, 0}.
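The example vector for a table key is one-hot, whereas real model outputs are scores between 0 and 1. The sketch below assumes, without confirmation from the source, that the predicted structure label is simply the component with the highest score (argmax). The component order follows the definition of T; the function name is hypothetical:

```python
# Order of the components of T as defined in the text.
T_LABELS = ("Key_normal", "Key_table", "Value_normal", "Value_table")


def structure_label(t: list[float]) -> str:
    """Return the structure label whose score is highest in a 4-dimensional T vector."""
    if len(t) != len(T_LABELS):
        raise ValueError("T must have exactly 4 components")
    best = max(range(len(t)), key=lambda i: t[i])  # index of the largest score
    return T_LABELS[best]
```

With this rule, a vector such as {0.1, 0.2, 0.7, 0.05} is read as a conventional value.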

Fig. 6 schematically illustrates data processing based on the relation matching model according to an embodiment of the present disclosure. As shown in Fig. 6, sub-images are obtained from the original bill image, and features are extracted from the text information, the pixel information and the coordinate information in the bill image of each sub-image. After feature normalization and feature concatenation, these features are input into the trained relation matching model (which may be a graph convolutional neural network) to obtain the key-value relation of each sub-image and the association matrix.
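The normalization and concatenation step can be sketched as building one node feature vector per sub-image for the graph network. The text and pixel feature vectors are assumed to come from encoders that the source does not specify. L2 normalization of the embeddings and scaling coordinates by the page size are illustrative choices, not the patent's stated method, and `SubImageInput` and `node_feature` are hypothetical names:

```python
import math
from dataclasses import dataclass


@dataclass
class SubImageInput:
    """One element of the model input I = {Text, Image, Coordinate}.

    text_feat and image_feat stand in for embeddings produced by (unspecified)
    text and pixel encoders.
    """
    text_feat: list[float]
    image_feat: list[float]
    coordinate: tuple[tuple[float, float], tuple[float, float]]  # {{x1, y1}, {x2, y2}}


def l2_normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def node_feature(s: SubImageInput, page_w: float, page_h: float) -> list[float]:
    """Normalize each feature group, then concatenate ("stitch") them into one vector."""
    (x1, y1), (x2, y2) = s.coordinate
    coord = [x1 / page_w, y1 / page_h, x2 / page_w, y2 / page_h]  # scaled to [0, 1]
    return l2_normalize(s.text_feat) + l2_normalize(s.image_feat) + coord
```

Normalizing each group separately keeps one feature type (for example, raw pixel coordinates in the hundreds) from dominating the others after concatenation.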

In step S420, key value pairs composed of sub-images in the ticket image are determined according to the association relation matrix.

For example, for the matrix shown in Fig. 5, the threshold is set to 0.35 (an empirically chosen value). Scanning the rows from top to bottom, the first row of the matrix contains one association value greater than 0.35; its row and column correspond to sub-images A and B respectively, which indicates that these two sub-images may form a key-value pair.

It should be noted that, scanning the columns of each row from left to right, the sub-images whose association value exceeds the threshold can be extracted, and key-value pairs can be formed from each sub-image and the sub-images whose association values with it exceed the threshold.
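A minimal sketch of the threshold scan, using the 0.35 value from the example as the default. Two details are assumptions rather than statements from the source: the diagonal (a sub-image's association with itself) is skipped, and because M is symmetric only the upper triangle is visited so that each pair is reported once. Which member of a pair is the key is decided afterwards from the structure information. The function name is hypothetical:

```python
def extract_pairs(
    m: list[list[float]], names: list[str], threshold: float = 0.35
) -> list[tuple[str, str]]:
    """Scan M top to bottom and left to right; return candidate key-value pairs.

    Only entries above the diagonal (j > i) are visited, since M is symmetric
    and a sub-image's association with itself is not a pair.
    """
    pairs = []
    for i in range(len(m)):
        for j in range(i + 1, len(m)):
            if m[i][j] > threshold:
                pairs.append((names[i], names[j]))
    return pairs
```

A table key can exceed the threshold with several value sub-images, in which case it appears in several candidate pairs that share the same key.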

In step S430, the structural information of the key value pair is determined according to the structural information of the sub-image corresponding to the key value pair.

In the embodiment of the invention, the structure information of each sub-image is obtained from the pre-trained relation matching model. Once a key-value pair has been obtained, its structure information can be determined from the key information in the structure information of its sub-images.

For example, in the matrix of Fig. 5, sub-images A and B form a key-value pair, and the structure information of this pair can be obtained from the key information of its sub-images: if sub-image A is a conventional key and sub-image B is a conventional value, the pair is a conventional key-value pair; if sub-image A is a table key and sub-image B is a table value, the pair is a table key-value pair.
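A minimal sketch of deriving a pair's structure information from its sub-images' labels. It covers only the conventional and table cases named in the example and returns "mismatch" for inconsistent label combinations, which would then be handed to error correction. The function name and the "mismatch" marker are hypothetical:

```python
def classify_pair(a: str, b: str, labels: dict[str, str]) -> tuple[str, str, str]:
    """Return (key sub-image, value sub-image, pair type) for an associated pair.

    labels maps sub-image names to Key_normal / Key_table / Value_normal /
    Value_table. The pair type is "normal", "table" or "mismatch".
    """
    la, lb = labels[a], labels[b]
    # Put the key first, whichever order the matrix scan produced.
    if la.startswith("Value") and lb.startswith("Key"):
        a, b, la, lb = b, a, lb, la
    if la == "Key_normal" and lb == "Value_normal":
        return a, b, "normal"
    if la == "Key_table" and lb == "Value_table":
        return a, b, "table"
    return a, b, "mismatch"  # e.g. Key_normal with Value_table: left to error correction
```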

It should be noted that the key and value types of the sub-images in a key-value pair may not match; for example, sub-image A may be a conventional key while sub-image B is a value of the other type (such as a table value). Such pairs are corrected in the subsequent error-correction process according to error correction rule 6).

The overall flow of the bill image processing method provided by the embodiment of the invention is described in detail below. Fig. 7 schematically illustrates the flow of a bill image processing method according to another embodiment of the present disclosure. As shown in Fig. 7, the method may include the following flow:

In S701, a bill image is acquired.

It should be noted that a customer may upload a captured bill image through a client system. The client may be a mini program embedded in an application or an existing mini program, such as a WeChat mini program.

In S702, the ticket image is added to a task queue.

In S703, an image classification module determines whether the bill image belongs to an existing category.

In the embodiment of the invention, only bill images of existing categories are structured.

If yes, S704 is executed, otherwise S701 is executed.

In S704, text detection is performed based on the text detection module.

In S705, text recognition is performed based on the OCR module.

In S706, a structured result is determined based on the relationship-matching model.

In S707, the structured result is error-corrected.

In S708, the error correction result is output in a structured manner.

In S709, the manually audited structured result is stored in the database.
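The flow S701 to S709 can be sketched as a queue-driven loop. The stage functions are injected as plain callables because the source does not define their interfaces. The `run_pipeline` signature, the string categories and the stand-in stages in the usage example are hypothetical, and the manual audit and storage of S709 are left outside the loop:

```python
from collections import deque
from typing import Any, Callable, Iterable


def run_pipeline(
    images: Iterable[Any],
    classify: Callable[[Any], str],
    stages: list[Callable[[Any], Any]],
    known_categories: set[str],
) -> tuple[list[Any], list[Any]]:
    """Process queued bill images; return (structured results, rejected images)."""
    queue = deque(images)  # S702: add bill images to a task queue
    results, rejected = [], []
    while queue:
        image = queue.popleft()
        if classify(image) not in known_categories:  # S703: existing category?
            rejected.append(image)  # "no": back to S701, a new image is needed
            continue
        data = image
        for stage in stages:  # S704-S708: detection, OCR, matching, correction, output
            data = stage(data)
        results.append(data)  # S709 (manual audit, storage) happens afterwards
    return results, rejected
```

For example, `run_pipeline(["inpatient_1", "unknown_2"], classify=lambda s: s.split("_")[0], stages=[str.upper], known_categories={"inpatient"})` processes only the first image and rejects the second.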

The following describes apparatus embodiments of the present disclosure that may be used to perform the above-described methods of the present disclosure. For details not disclosed in the embodiments of the apparatus of the present disclosure, please refer to the embodiments of the bill image processing method described above in the present disclosure.

Fig. 8 schematically illustrates a block diagram of a ticket image processing apparatus according to an embodiment of the present disclosure. Referring to fig. 8, a ticket image processing apparatus 800 of an embodiment of the present disclosure may include: acquisition module 810, determination module 820, error correction module 830, and output module 840.

The acquiring module 810 is configured to acquire sub-images of the bill image, and acquire text information, pixel information and coordinate information of each sub-image in the bill image.

The determining module 820 is configured to determine the key-value pairs formed by the sub-images in the bill image and the structure information of the key-value pairs according to the text information, the pixel information and the coordinate information of each sub-image in the bill image.

The error correction module 830 is configured to correct the key-value pairs based on the error correction information corresponding to their structure information, to obtain the corrected key-value pairs.

The output module 840 is configured to output the corrected key-value pairs in structured form according to their structure information.

Fig. 9 shows a schematic diagram of a computer system suitable for use in implementing embodiments of the present disclosure. It should be noted that, the computer system 900 of the electronic device shown in fig. 9 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present disclosure.

As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901, which can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for system operation are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.

The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 910 so that a computer program read out therefrom is installed into the storage section 908 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from the network via the communication section 909 and/or installed from the removable medium 911. When the computer program is executed by the Central Processing Unit (CPU) 901, the various functions defined in the system of the present application are performed.

It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. 
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules and/or units involved in the embodiments of the present disclosure may be implemented in software, or may be implemented in hardware, and the described modules and/or units may also be disposed in a processor. Wherein the names of the modules and/or units do not in some cases constitute limitations on the modules and/or units themselves.

As another aspect, the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the methods described in the above embodiments.

It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.

From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive or a removable hard disk) or on a network, and includes several instructions that cause a computing device (such as a personal computer, a server, a touch terminal or a network device) to perform the method according to the embodiments of the present disclosure.

Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow, in general, the principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A ticket image processing method, characterized in that the method comprises:

acquiring sub-images of the bill image, and acquiring text information and pixel information of each sub-image and coordinate information of each sub-image in the bill image;

according to the text information, the pixel information and the coordinate information of each sub-image in the bill image, determining a first key value pair formed by the sub-images in the bill image and the structure information of the first key value pair, wherein the method specifically comprises the following steps: inputting the text information, the pixel information and the coordinate information of each sub-image in the bill image into a pre-trained relation matching model to obtain an association relation matrix of each sub-image and other sub-images and the structural information of each sub-image; determining a first key value pair formed by sub-images in the bill image according to the incidence relation matrix; determining the structural information of the first key value pair according to the structural information of the sub-image corresponding to the first key value pair;

The relation matching model is trained in advance through a sample, the sample is a sub-image set of bill images of the same category, the category of the bill images, the text information and the pixel information of each sub-image and the coordinate information of each sub-image in the bill images are input into the relation matching model, and the relation matching model is trained based on the true value of the key value relation among the sub-images and the predicted value of the relation matching model;

the key value pair is to take one sub-image in the bill image as a key and take at least one other sub-image in the bill image as a value;

the structure information of the key value pair includes: conventional key value pairs, form key value pairs or long text key value pairs;

the structural information of the sub-image includes: key information or value information;

the determining the structure information of the first key value pair according to the structure information of the sub-image corresponding to the first key value pair includes: determining the structural information of the first key value pair according to the key information in the structural information of the sub-image corresponding to the first key value pair;

correcting the first key value pair based on the error correction information corresponding to the structure information of the first key value pair to obtain a corrected second key value pair;

and carrying out structured output on the error-corrected second key value pair according to the structural information of the error-corrected second key value pair.

2. The method of claim 1, wherein determining a first key-value pair of sub-image components in the ticket image from the association matrix comprises:

and extracting sub-images with association relation values exceeding a threshold value from the association relation matrix to form a first key value pair.

3. The method of claim 1, wherein the key information comprises: a regular key or a form key, the value information including: conventional values or table values.

4. The method of claim 1, wherein correcting the first key value pair based on correction information corresponding to the structure information of the first key value pair, to obtain a corrected second key value pair, comprises:

arranging the sub-images corresponding to the first key value pair according to the coordinate information of the sub-images in the first key value pair in the bill image;

and correcting the first key value pair according to the error correction information corresponding to the structure information corresponding to the first key value pair and the arranged sub-image corresponding to the first key value pair, and obtaining a corrected second key value pair.

5. The method of claim 1, wherein the structured outputting of the error corrected second key value pairs according to the structure information of the error corrected second key value pairs comprises:

setting a marking pattern corresponding to the structure information of each second key value pair;

marking a second key value pair after error correction in the bill image according to the marking pattern.

6. The method of claim 1 or 5, wherein the structured outputting of the error corrected second key value pair according to the structure information of the error corrected second key value pair comprises:

dividing the corrected second key value pair into categories according to the structural information of the corrected second key value pair;

and automatically generating keys and values of the categories based on the structural information of the sub-images in the second key value pair.

7. A ticket image processing apparatus, characterized in that the apparatus comprises:

the acquisition module is configured to acquire sub-images of the bill image, and acquire text information and pixel information of each sub-image and coordinate information of each sub-image in the bill image;

the determining module is configured to determine a first key value pair formed by the sub-images in the bill image and structural information of the first key value pair according to text information and pixel information of each sub-image and coordinate information of each sub-image in the bill image, and is specifically used for: inputting the text information, the pixel information and the coordinate information of each sub-image in the bill image into a pre-trained relation matching model to obtain an association relation matrix of each sub-image and other sub-images and the structural information of each sub-image; determining a first key value pair formed by sub-images in the bill image according to the incidence relation matrix; determining the structural information of the first key value pair according to the structural information of the sub-image corresponding to the first key value pair;

the determining module is further configured to: determining the structural information of the first key value pair according to the key information in the structural information of the sub-image corresponding to the first key value pair;

the error correction module is configured to correct the first key value pair based on error correction information corresponding to the structure information of the first key value pair, and obtain a corrected second key value pair;

and the output module is configured to perform structural output on the corrected second key value pair according to the structural information of the corrected second key value pair.

8. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 6.

9. An electronic device, comprising:

one or more processors;

a storage means for storing one or more programs;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1 to 6.