CN114092948B - Bill identification method, device, equipment and storage medium

Bill identification method, device, equipment and storage medium

Info

Publication number
CN114092948B
Authority
CN
China
Legal status
Active
Application number
CN202111404281.3A
Other languages
Chinese (zh)
Other versions
CN114092948A (en)
Inventor
秦铎浩
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111404281.3A
Publication of CN114092948A
Priority to PCT/CN2022/099787 (published as WO2023093014A1)
Application granted
Publication of CN114092948B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables


Abstract

The disclosure provides a bill identification method, device, equipment and storage medium, relating to the technical field of image processing and in particular to the field of intelligent search. The specific implementation scheme is as follows: acquire a bill picture; divide the bill picture to obtain sub-regions in the bill picture; for each sub-region, acquire identification information of the sub-region; and integrate the identification information of the sub-regions to obtain the identification result of the bill picture. The method can identify bills of different styles without distinguishing bill types, and thus provides a universal bill identification method.

Description

Bill identification method, device, equipment and storage medium
Technical Field
The disclosure relates to the technical field of image processing, in particular to the field of intelligent search, and specifically to a bill identification method, device, equipment and storage medium.
Background
In real life there are many kinds of bills, such as bank deposit slips, income certificates and shopping receipts. In daily applications these bills need to be digitally archived and retrieved; pure image data is difficult to retrieve, so the bills usually need to be identified before retrieval.
Disclosure of Invention
The disclosure provides a bill identification method, a bill identification device, bill identification equipment and a storage medium.
According to a first aspect of the present disclosure, there is provided a bill identification method, comprising:
acquiring a bill picture;
dividing the bill picture to obtain sub-regions in the bill picture, wherein each sub-region is one of a set of general structures, a general structure being a structure that bills are statistically found to contain;
for each sub-region, obtaining identification information of the sub-region;
and integrating the identification information of each subarea to obtain the identification result of the bill picture.
According to a second aspect of the present disclosure, there is provided a bill identification device comprising:
the acquisition module is used for acquiring the bill pictures;
the sub-region determining module is used for dividing the bill picture to obtain sub-regions in the bill picture, wherein each sub-region is one of a set of general structures, a general structure being a structure that bills are statistically found to contain;
the acquisition module is used for acquiring, for each sub-region, identification information of the sub-region;
and the integration module is used for integrating the identification information of each subarea to obtain the identification result of the bill picture.
According to a third aspect of the present disclosure, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method according to the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to the first aspect.
The bill identification method provided by the disclosure can identify bills of different styles without distinguishing bill types, namely, a universal bill identification mode is realized.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of a bill identification method provided by an embodiment of the present disclosure;
FIG. 2A is a schematic diagram of a key-value (KV) region in an embodiment of the present disclosure;
FIG. 2B is a schematic diagram of a paragraph region in an embodiment of the present disclosure;
FIG. 2C is a schematic diagram of a table region in an embodiment of the present disclosure;
FIG. 2D is a schematic diagram of a combination of a KV region and a table region in an embodiment of the present disclosure;
FIG. 2E is a schematic diagram of a paragraph region and a table region in an embodiment of the present disclosure;
FIG. 2F is a schematic diagram of a KV region, a paragraph region and a table region in an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a bill identifying method provided by an embodiment of the present disclosure;
FIG. 4 is a schematic illustration of a sub-region in an embodiment of the present disclosure;
fig. 5 is a schematic structural view of a bill identifying device provided in an embodiment of the present disclosure;
fig. 6 is a block diagram of an electronic device for implementing a bill identification method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Conventional optical character recognition (Optical Character Recognition, OCR) directly recognizes the text on a picture. For a bill, however, not only must the text on the bill picture be recognized, but the corresponding keys (keywords) and values must also be identified according to the structure of the bill, and the mapping relation between keys and values established. For example, a result may be that the value corresponding to the key [age] is [20].
When bill identification is performed in the related art, bills are first classified, and identification is then implemented separately for each bill type. In this implementation, text information is obtained through OCR on the bill text, and the structured fields are then extracted through hand-written rule policies. That is, the bills must first be classified, a corresponding rule policy must be determined for each bill type, and the rule extraction relies only on the text information; with an extremely large number of bill formats and extraction fields, writing the rule policies becomes very cumbersome. In general, bill identification in the related art requires classification: the type of bill must be determined first, and a corresponding rule policy must be determined for each type of bill, so the overall bill identification is complex.
The bill identification method provided by the embodiments of the present disclosure can identify bills of different types without distinguishing the bill types; that is, it provides a general bill identification method and can realize multi-modal bill identification. In the embodiments of the present disclosure, for any bill, the bill picture can be divided to obtain sub-regions in the bill picture; for each sub-region, identification information of the sub-region is acquired; and the identification information of the sub-regions is integrated to obtain the identification result of the bill picture. Since the embodiments do not need to distinguish bill types or determine a corresponding rule policy for each bill type, the complexity of bill identification can be reduced.
The embodiment of the disclosure provides a bill identification method, which can comprise the following steps:
acquiring a bill picture;
dividing the bill picture to obtain sub-regions in the bill picture, wherein each sub-region is one of a set of general structures, a general structure being a structure that bills are statistically found to contain;
for each sub-region, acquiring identification information of the sub-region;
and integrating the identification information of each subarea to obtain the identification result of the bill picture.
In the embodiments of the present disclosure, the bill picture is divided to obtain sub-regions in the bill picture, identification information is then acquired for each sub-region, and the identification information of the sub-regions is integrated to obtain the identification result of the bill picture. General bill identification is thus realized without distinguishing the types of bill pictures: bill types do not need to be divided and corresponding rule policies do not need to be determined for the various bill pictures, so the complexity of bill identification can be reduced.
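The steps above can be sketched as a minimal pipeline. The function names below, and the stand-in layout and recognition steps, are illustrative assumptions, not part of the disclosure; `segment` and `recognize` stand in for the deep learning models described later in the description.

```python
# Minimal sketch of the disclosed flow: segment the bill picture into
# sub-regions, recognize each sub-region, then integrate the results.
# segment() and recognize() are hypothetical stand-ins.

def segment(bill_picture):
    """Stand-in layout analysis: return (region_category, crop) pairs."""
    return [("kv", "kv-crop"), ("paragraph", "para-crop"), ("table", "table-crop")]

def recognize(category, crop):
    """Stand-in per-category recognition returning structured fields
    (demo values taken from the example later in the description)."""
    demo = {
        "kv": {"name": "XX"},
        "paragraph": {"ID card number": "xxxxxx"},
        "table": {"total amount": "100"},
    }
    return demo[category]

def identify_bill(bill_picture):
    result = {}
    for category, crop in segment(bill_picture):
        # integrate the identification information of each sub-region
        result.update(recognize(category, crop))
    return result
```

The point of the sketch is the shape of the flow: segmentation produces category-labelled crops, recognition is dispatched per category, and the per-region dictionaries are merged into one identification result.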
Fig. 1 is a flowchart of a bill identification method provided in an embodiment of the present disclosure. Referring to fig. 1, the bill identification method provided by the embodiment of the present disclosure may include the following steps:
s101, acquiring a bill picture.
The bill picture is the picture of the bill to be identified.
Image acquisition can be carried out on the bill to be identified, and a bill picture is obtained. For example, a picture of the bill to be recognized is obtained by photographing, scanning, or the like.
S102, dividing the bill picture to obtain subareas in the bill picture.
Each sub-region is one of a set of general structures.
A general structure is a structure that bills are statistically found to contain.
For example, a large number of sample bill pictures can be acquired and statistically analyzed in advance; a structure contained in most of the sample bill pictures can be understood as a structure contained in bills obtained through statistics, that is, a general structure.
A preset number threshold may be set; when the number of sample bill pictures containing a certain structure is not less than the preset number threshold, the structure is taken as a structure contained in bills obtained through statistics, that is, a structure contained in most sample bill pictures. For example, suppose the preset number threshold is 70 and 100 sample bill pictures are acquired, of which 90 contain a first structure, 85 contain a second structure, 80 contain a third structure, 30 contain a fourth structure, and 1 contains a fifth structure. The first, second and third structures are then taken as the first, second and third general structures respectively, the general structures in this example comprise these three structures, and a general bill is composed of one or more of them.
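The statistical selection of general structures described above amounts to a threshold count over the sample bill pictures; a minimal sketch, with an illustrative helper name, is:

```python
from collections import Counter

def general_structures(samples, threshold):
    """Each sample is the collection of structures one sample bill
    picture contains; keep every structure that appears in at least
    `threshold` samples."""
    counts = Counter()
    for structures in samples:
        counts.update(set(structures))
    return {s for s, n in counts.items() if n >= threshold}

# Rebuild the example from the text: 100 samples, threshold 70,
# structures contained by 90, 85, 80, 30 and 1 samples respectively.
samples = []
for i in range(100):
    samples.append([name for name, count in
                    [("first", 90), ("second", 85), ("third", 80),
                     ("fourth", 30), ("fifth", 1)]
                    if i < count])
```

Running `general_structures(samples, 70)` on these samples keeps exactly the first, second and third structures, matching the example.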
In the embodiments of the present disclosure, layout analysis can be performed on a plurality of sample bill pictures in advance, that is, the sample bill pictures are divided so as to obtain the general structures appearing in them. Then, when a bill picture is to be identified, the general structures in the bill picture can be identified first; put simply, the bill picture is first recognized at a coarse scale.
In one implementation, analysis in the embodiments of the present disclosure shows that most bills include at least one of three general structures: a KV (key-value) region, a paragraph region and a table region. Put simply, most bills consist of one or more of KV regions, paragraph regions and table regions. The possible cases are: case 1, the bill comprises a KV region; case 2, the bill comprises a paragraph region; case 3, the bill comprises a table region; case 4, the bill comprises a KV region and a paragraph region; case 5, the bill comprises a paragraph region and a table region; case 6, the bill comprises a KV region and a table region; case 7, the bill comprises a KV region, a paragraph region and a table region.
The general structures in embodiments of the present disclosure may include one or more of the following: a key-value (KV) region, a paragraph region and a table region.
The KV region is a region in the bill picture that contains key-value pairs, in which the keys and their corresponding values are presented according to a preset rule. The preset rule may include that a plurality of keys and their corresponding values are distributed in rows and columns, such as k1:v1, k2:v2, k3:v3 and k4:v4 in FIG. 2A.
The table region is a region in the bill picture that contains a table.
The paragraph region is a region in the bill picture that is determined to contain only text.
For example, the sub-regions of a bill may form one of the following combinations: a KV region, as shown in FIG. 2A; a paragraph region, as shown in FIG. 2B; a table region, as shown in FIG. 2C; a KV region and a paragraph region; a KV region and a table region, as in FIG. 2D; a paragraph region and a table region, as in FIG. 2E; or a KV region, a paragraph region and a table region, as in FIG. 2F.
The numbers of KV regions, paragraph regions and table regions included among the sub-regions can each be one or more.
In one implementation, S102 may include:
and responding to the region which contains the key value pairs and is presented by the preset rule by the key value pair keys and the values corresponding to the keys in the bill picture, and taking the region which contains the key value pairs and is presented by the preset rule by the key value pair keys and the values corresponding to the keys in the bill picture as a KV region.
And in response to the region containing the table in the bill picture, taking the region containing the table in the bill picture as a table region.
And responding to the region which only contains the text in the bill picture, and taking the region which only contains the text in the bill picture as a paragraph region.
The KV region, the paragraph region and the table region are general structures in bills, and a bill picture can be divided into these general structures, so bill types do not need to be distinguished and any type of bill can be identified.
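The three conditions above map detected region properties to the three categories. A minimal sketch of that mapping follows; the boolean flags are assumed to come from an upstream detector, and the priority order is an illustrative choice, not fixed by the disclosure.

```python
def classify_region(has_key_value_pairs, has_table, text_only):
    """Map the three detection conditions to a region category.
    Flags are assumed outputs of an upstream detector; the priority
    order among them is an illustrative assumption."""
    if has_key_value_pairs:
        return "KV region"
    if has_table:
        return "table region"
    if text_only:
        return "paragraph region"
    return "unknown"
```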
In an alternative embodiment, the bill picture can be input into a pre-trained deep learning model, and position information of each sub-region in the bill picture is output by the deep learning model; each sub-region is then extracted from the bill picture based on its position information.
A deep learning model may be pre-trained that is used to determine sub-regions in the ticket picture.
Specifically, the training in advance to obtain the deep learning model includes:
and acquiring a plurality of sample bill pictures.
For each sample bill picture, the sub-regions in the sample bill picture and the region category of each sub-region are marked, for example the KV regions, paragraph regions and/or table regions in the sample bill picture, together with the position information of each sub-region, such as its vertex coordinates.
A sample ticket picture and a sub-region of the sample ticket picture are taken as a sample pair, and the sub-region of the sample ticket picture can be understood as a true value.
Each sample pair is input into an initial model. For each sample pair, the model output obtained by inputting the sample pair into the initial model is compared with the true value of the sample pair, that is, the sub-regions of the sample bill picture in the sample pair, and the model parameters are adjusted so that the difference between the model output and the true value becomes smaller than a preset value; the preset value can be determined according to practical requirements, for example 0.01 or 0.001. Training ends either when the difference between the model output and the true value is smaller than the preset value for every sample pair, or, taking one comparison of model output and true value as one iteration, when the number of iterations reaches a preset number, yielding the trained deep learning model.
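The training procedure described above, adjusting parameters until the output/true-value difference falls below a preset value or the iteration count reaches a preset number, can be sketched as a generic loop. Here `model_step` is an assumed stand-in for one round of parameter adjustment on one sample pair, returning the remaining difference.

```python
def train(model_step, sample_pairs, preset_value=0.01, preset_iterations=1000):
    """Run training iterations over the sample pairs. Stop when every
    pair's difference is below preset_value, or when the iteration
    count reaches preset_iterations. Returns iterations used."""
    for iteration in range(preset_iterations):
        worst = max(model_step(x, truth) for x, truth in sample_pairs)
        if worst < preset_value:
            return iteration + 1
    return preset_iterations

# Toy "model" whose difference shrinks by 10x per step, for illustration.
state = {"diff": 0.5}
def toy_step(sample, truth):
    state["diff"] *= 0.1
    return state["diff"]
```

With the toy model the loop stops after the second iteration, once the difference (0.005) drops below the preset value 0.01.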
The input of the trained deep learning model for identifying the subareas of the bill picture is the bill picture, and the output is the position information of each subarea in the bill picture, such as the vertex coordinates of each subarea. At the same time, the region category of the sub-region may also be output.
Therefore, the bill picture can be input into a pre-trained deep learning model, the position information of each subarea in the bill picture is output through the deep learning model, and each subarea is extracted from the bill picture based on the position information of each subarea.
For example, a bill picture is input into the deep learning model, which outputs the four corner coordinates ((x1, y1), (x2, y2), (x3, y3), (x4, y4)) of each region together with the category of that region (KV region, table region or paragraph region). Based on the four corner coordinates of a region, the sub-region can then be extracted from the bill picture; it can be understood as a sub-picture of the bill picture.
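Extracting the sub-picture from the four corner coordinates amounts to an axis-aligned crop. A dependency-free sketch follows; a real system would crop with an imaging library, and the nested-list image representation here is an assumption for illustration.

```python
def crop_subregion(image, corners):
    """image: list of pixel rows; corners: the four (x, y) corner
    coordinates output for a region. Returns the axis-aligned crop
    bounded by the corners, i.e. the sub-picture of the bill picture."""
    xs = [x for x, y in corners]
    ys = [y for x, y in corners]
    return [row[min(xs):max(xs) + 1] for row in image[min(ys):max(ys) + 1]]

# Demo on a 5x5 "image" whose pixels record their own coordinates.
demo_image = [[(x, y) for x in range(5)] for y in range(5)]
demo_crop = crop_subregion(demo_image, [(1, 1), (3, 1), (1, 2), (3, 2)])
```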
Through a pre-trained deep learning model, the sub-regions can be rapidly and accurately divided for the bill pictures.
S103, obtaining identification information of each sub-region.
And respectively identifying each subarea to obtain identification information of each subarea.
Different categories of sub-regions differ in structure, so each sub-region can be identified in a targeted manner based on its region category: the region category of the sub-region is determined first, and the sub-region is then identified based on that category to obtain the identification information of the region.
The KV region undergoes structured identification and can be identified using a visual recognition algorithm.
For example, structuring of a KV region proceeds as follows. Character blocks in the KV region are recognized through OCR. The character blocks of specific keys are then identified through a predefined key dictionary, which contains the character blocks of a plurality of keys: each recognized character block is compared with the key character blocks in the dictionary, and if a character block matches one included in the dictionary, it is a key character block of the KV region. The values are then extracted. Values with a fixed positional relation can be extracted by configuring a corresponding search strategy, for example taking the first text block found after the key as its value. Values whose relation is uncertain can be extracted by a classification model, whose input includes the position information of a text block, its content and its relative position to the surrounding text blocks, and whose output is a classification result indicating which key the text block is the value of.
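The fixed-position search strategy just described (match blocks against a predefined key dictionary, then take the first following text block as the value) can be sketched as follows; the dictionary contents and helper name are illustrative assumptions.

```python
# Predefined key dictionary as described above; entries are illustrative.
KEY_DICTIONARY = {"name", "age", "billing date"}

def extract_kv(text_blocks):
    """text_blocks: OCR text blocks of a KV region in reading order.
    A block matching the key dictionary is a key; the fixed-position
    strategy takes the next text block as that key's value."""
    result = {}
    for i, block in enumerate(text_blocks):
        key = block.rstrip(":")  # tolerate a trailing colon after the key
        if key in KEY_DICTIONARY and i + 1 < len(text_blocks):
            result[key] = text_blocks[i + 1]
    return result
```

Blocks whose text is not in the key dictionary are simply skipped, which is why the classification model is needed for values without a fixed positional relation.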
In the KV structuring process, a deep learning model such as the above classification model not only uses the text information obtained after OCR but also constructs vectorized features from the image information and from spatial information such as character positions, and thereby identifies the mapping relation between keys and values.
The paragraph region contains only text, so the emphasis is on element extraction; for the paragraph region, element extraction within paragraphs can be implemented based on a very large-scale pre-trained language model. Extraction may be performed, for example, by named entity recognition from natural language processing (NLP). For example, paragraph element extraction applies a deep-learning-based named entity recognition algorithm to the OCR results: on top of an NLP pre-trained language model, a network structure combining a bidirectional Long Short-Term Memory (LSTM) and a conditional random field (CRF) sequence labeling algorithm is added to realize the extraction of key elements.
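A sequence-labeling recognizer of the LSTM-CRF kind emits one tag per token; turning those tags into extracted elements is commonly done with a BIO decoding step. The BIO scheme itself is an assumption here (the disclosure does not fix a tagging scheme), and the sketch below only shows the decoding, not the model.

```python
def decode_bio(tokens, tags):
    """Group B-X / I-X tag runs into (entity_type, text) elements;
    tokens in a run are concatenated, which suits character-level
    tokens. 'O' tags close any open entity."""
    entities, current = [], None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(current)
            current = (tag[2:], token)
        elif tag.startswith("I-") and current and current[0] == tag[2:]:
            current = (current[0], current[1] + token)
        else:
            if current:
                entities.append(current)
            current = None
    if current:
        entities.append(current)
    return entities
```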
The table region focuses on table parsing, for example splitting table cells and identifying table grid lines. Alternatively, structured parsing of a table can be implemented using a table-parsing model such as TableNet (a deep learning model for end-to-end table detection and table data extraction from scanned document images).
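Once the cells have been split and recognized, the structured result that table parsing aims at can be sketched as header-keyed records; this helper is illustrative and would sit downstream of a model such as TableNet.

```python
def parse_table(cells):
    """cells: recognized table content as rows of cell strings, with
    the first row taken as the header. Returns one {header: value}
    record per data row, the structured output of table parsing."""
    header, *rows = cells
    return [dict(zip(header, row)) for row in rows]
```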
In one implementation, the recognition model corresponding to the region class may be pre-trained for different region classes. In particular, model training may refer to the training process of deep learning models in the related art.
For KV regions, a recognition model comprising OCR recognition and a classification structure may be trained for structured recognition of KV regions. For example, a plurality of first training samples, which may be bill pictures containing KV regions, are acquired in advance and their recognition results are marked; for each first training sample, the sample and its corresponding recognition result are taken as a sample pair, the recognition result being understood as the true value of the pair. Each sample pair is input into a first model, which may include an OCR recognition module and a classification module; the model output is compared with the true value of the sample pair, that is, the recognition result of the first training sample, and the model parameters are adjusted so that the difference between the model output and the true value becomes smaller than a first value, which can be determined according to the actual situation, for example 0.01 or 0.001. Training ends either when the difference is smaller than the first value for every sample pair, or when the number of iterations reaches a preset number, where one comparison of model output and true value counts as one iteration, yielding the recognition model for KV region recognition.
For a paragraph region, a recognition model containing an NLP structure may be pre-trained for element extraction. For example, a plurality of second training samples, which may be bill pictures containing paragraph regions, are acquired in advance and their recognition results are marked; for each second training sample, the sample and its corresponding recognition result are taken as a sample pair, the recognition result being understood as the true value of the pair. Each sample pair is input into a second model, which may include an NLP structure, or an NLP structure together with LSTM and CRF; the model output is compared with the true value of the sample pair, that is, the recognition result of the second training sample, and the model parameters are adjusted so that the difference between the model output and the true value becomes smaller than a second value, which can be determined according to the actual situation, for example 0.01 or 0.001. Training ends either when the difference is smaller than the second value for every sample pair, or when the number of iterations reaches a preset number, where one comparison counts as one iteration, yielding the recognition model for paragraph region identification.
For the table region, TableNet may be trained in advance; the TableNet training process may refer to TableNet in the related art and is not described here.
As described above, the deep learning model that identifies the sub-regions of the bill picture can output both the position information of each sub-region, such as its vertex coordinates, and the region category of each sub-region. Based on this, a corresponding recognition model can be selected for each sub-region using its region category.
Inputting the bill picture into a pre-trained deep learning model, outputting the vertex coordinates of each subarea in the bill picture through the deep learning model, and simultaneously outputting the area category of each subarea through the deep learning model.
For each sub-region, the region category is used to indicate that the sub-region is a KV region, a paragraph region, or a table region.
S103 may include:
selecting an identification model corresponding to each subarea based on the area category of the subarea; and obtaining the identification information corresponding to the subarea by using the identification model.
The recognition models corresponding to the different region categories are trained in advance, so during bill identification the corresponding recognition model can be selected directly based on the region category. The extraction work for each sub-region is thus realized through a deep learning model, and using models reduces the writing of rule policies, further reducing the complexity of bill identification. Moreover, selecting a recognition model suited to the region category of each sub-region allows targeted identification and can improve recognition accuracy.
And S104, integrating the identification information of each sub-region to obtain the identification result of the bill picture.
After the identification information of each sub-region has been obtained, it can be combined to obtain the identification result of the bill picture.
In a specific example, as shown in fig. 3, layout analysis is performed on a bill picture to obtain a plurality of sub-regions; each sub-region may be a KV region, a paragraph region or a table region. Structured extraction is performed on the KV region, element extraction on the paragraph region, and table parsing on the table region, and finally the identification information obtained from each sub-region is summarized, that is, the results are aggregated. For example, the bill picture includes 3 sub-regions: 1 KV region, 1 paragraph region and 1 table region.
The result returned by the KV region, that is, the KV identification information, is { "name": "XX" }; the result returned by the paragraph region is { "ID card number": "xxxxxx" }; the result returned by the table region is { "total amount": "100" }. Combining the identification information of the 3 sub-regions, the final summarized result, that is, the identification result of the bill picture, is { "name": "XX", "ID card number": "xxxxxx", "total amount": "100" }.
For example, in one case the bill picture is as shown in fig. 4. The bill picture is divided to obtain its sub-regions, comprising 1 KV region, shown by dotted box 401 in fig. 4, and 1 table region, shown by dotted box 402 in fig. 4. Identification information is acquired for each sub-region, for example { "billing date": "2012.10.10" } and { "freight": "USD50.00" } are extracted, and the results are then summarized: the identification result of the bill picture shown in fig. 4 includes { "billing date": "2012.10.10", "freight": "USD50.00" }. Only the process of summarizing the identification result is illustrated here; fig. 4 may also contain information not covered by the example.
The embodiments of the disclosure thus realize universal bill identification: after a bill picture is acquired, it is divided to obtain its sub-regions; identification information is obtained for each sub-region; and the identification information of all sub-regions is integrated to obtain the identification result of the bill picture, which can be understood as end-to-end bill identification. In general, the disclosed embodiments enable universal, end-to-end bill identification. They also support structuring different bill formats during identification: the bill picture is split into a limited set of sub-structures, and the extraction work for each sub-structure is realized through a deep learning model. This greatly reduces the difficulty caused by the large number of bill format types, and at the same time reduces the amount of hand-written rule strategies by relying on models, simplifying the overall implementation.
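The end-to-end flow described above can be sketched as a small pipeline. In this sketch the detector and the per-category recognizers are passed in as plain functions standing in for the deep learning models; the function names and the category labels ("kv", "paragraph", "table") are illustrative assumptions, not identifiers from the patent:

```python
# End-to-end sketch: divide the picture, recognize each sub-region with the
# recognizer matching its category, and merge the results.

def recognize_bill(picture, detect_subregions, recognizers):
    """detect_subregions: picture -> iterable of (category, region) pairs.
    recognizers: dict mapping a category name to a function region -> dict."""
    result = {}
    for category, region in detect_subregions(picture):
        result.update(recognizers[category](region))
    return result

# Toy stand-ins for the deep learning models:
detect = lambda pic: [("kv", "region-1"), ("table", "region-2")]
models = {
    "kv": lambda r: {"name": "XX"},
    "table": lambda r: {"total amount": "100"},
}
print(recognize_bill("bill.png", detect, models))
```

Because the recognizers are keyed by category, supporting a new sub-structure only requires registering one more recognizer, which mirrors the "limited set of sub-structures" idea above.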
The embodiment of the disclosure also provides a bill identifying device, as shown in fig. 5, which may include:
an obtaining module 501, configured to obtain a ticket picture;
the sub-region determining module 502 is configured to divide the bill picture to obtain sub-regions in the bill picture, where each sub-region is one of the general structures, and a general structure is a structure contained in bills as obtained through statistics;
an obtaining module 503, configured to obtain, for each sub-region, identification information of the sub-region;
and the integrating module 504 is configured to integrate the identification information of each sub-region to obtain the identification result of the bill picture.
Optionally, the general structure includes one or more of the following: a key-value-pair KV region, a paragraph region, and a table region;
the sub-region determining module 502 is further configured to: in response to the bill picture including a region that contains key value pairs whose keys and corresponding values are presented according to a preset rule, take that region as a KV region; in response to the bill picture including a region that contains a table, take that region as a table region; and in response to the bill picture including a region that contains only text, take that region as a paragraph region.
Optionally, the sub-region determining module 502 is further configured to: input the bill picture into a pre-trained deep learning model, and output the position information of each sub-region in the bill picture through the deep learning model; and extract each sub-region from the bill picture based on the position information of that sub-region.
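Extracting each sub-region from the predicted positions is essentially a cropping step. A minimal sketch, assuming the picture is a 2-D array of pixel values and each position has been reduced to an axis-aligned box (top, left, bottom, right); this box format is an assumption, since the model in the text outputs vertex coordinates:

```python
# Crop one sub-image per predicted box, in the order the boxes are given.

def extract_subregions(picture, boxes):
    """picture: 2-D list of pixels; boxes: (top, left, bottom, right) tuples."""
    crops = []
    for top, left, bottom, right in boxes:
        crops.append([row[left:right] for row in picture[top:bottom]])
    return crops

# A 4x4 toy picture whose pixel value encodes its (row, column):
picture = [[r * 10 + c for c in range(4)] for r in range(4)]
print(extract_subregions(picture, [(0, 0, 2, 2)]))
```

A real implementation would slice an image array (or use an image library's crop routine) the same way, after converting the model's vertex coordinates to bounding boxes.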
Optionally, the obtaining module 503 is further configured to: select, for each sub-region, an identification model corresponding to the sub-region based on the region category of the sub-region; and obtain the identification information corresponding to the sub-region by using that identification model. The region category of each sub-region is output by the deep learning model at the same time as the vertex coordinates of each sub-region when the bill picture is input into the pre-trained deep learning model; for each sub-region, the region category indicates whether the sub-region is a KV region, a paragraph region, or a table region.
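Consuming that detector output can be sketched as follows: for each sub-region the model yields coordinates together with a region category, and the category is then used to pick the matching recognizer. The numeric class ids and the raw output format here are assumptions for illustration:

```python
# Map the detector's raw (class_id, box) output to (category, box) pairs,
# so each sub-region can be routed to the recognizer for its category.

CATEGORY_NAMES = {0: "kv", 1: "paragraph", 2: "table"}  # assumed id mapping

def parse_detections(raw):
    """raw: iterable of (class_id, box) pairs -> list of (category, box)."""
    return [(CATEGORY_NAMES[class_id], box) for class_id, box in raw]

detections = parse_detections([(0, (0, 0, 2, 2)), (2, (2, 0, 4, 4))])
print(detections)
```

The resulting (category, box) pairs are exactly what a category-keyed recognizer registry needs as input.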
In the technical scheme of the disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the user's personal information comply with the provisions of relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the apparatus 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 602 or a computer program loaded from a storage unit 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data required for the operation of the device 600 may also be stored. The computing unit 601, ROM 602, and RAM 603 are connected to each other by a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 601 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 601 performs the respective methods and processes described above, such as a ticket recognition method. For example, in some embodiments, the ticket identification method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When a computer program is loaded into RAM 603 and executed by computing unit 601, one or more steps of the ticket identification method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the ticket identification method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, that may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. This program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (6)

1. A ticket identification method comprising:
acquiring a bill picture;
dividing the bill picture to obtain sub-regions in the bill picture and a region category of each sub-region; for each sub-region, the region category of the sub-region is one of general structures; a general structure is a structure contained in bills as obtained through statistics;
for each sub-region, obtaining identification information of the sub-region;
integrating the identification information of each subarea to obtain the identification result of the bill picture;
the dividing the bill picture to obtain the subareas in the bill picture and the area category of each subarea comprises the following steps:
inputting the bill picture into a pre-trained deep learning model, and outputting the position information of each subarea in the bill picture through the deep learning model;
extracting each subarea from the bill picture based on the position information of each subarea respectively;
wherein, while the bill picture is input into the pre-trained deep learning model and the position information of each sub-region in the bill picture is output through the deep learning model, the deep learning model simultaneously outputs the region category of each sub-region, the region category indicating, for each sub-region, whether the sub-region is a key-value-pair KV region, a paragraph region, or a table region;
the obtaining, for each sub-region, identification information of the sub-region includes:
selecting an identification model corresponding to each subarea based on the area category of the subarea;
and obtaining the identification information corresponding to the subarea by using the identification model.
2. The method of claim 1, wherein the general structure comprises one or more of the following structures: a key-value-pair KV region, a paragraph region, and a table region;
wherein dividing the bill picture to obtain sub-regions in the bill picture comprises:
in response to the bill picture including a region that contains key value pairs whose keys and corresponding values are presented according to a preset rule, taking that region of the bill picture as a key-value-pair KV region;
in response to the bill picture including a region that contains a table, taking that region of the bill picture as a table region;
and in response to the bill picture including a region that contains only text, taking that region of the bill picture as a paragraph region.
3. A ticket identification apparatus comprising:
the acquisition module is used for acquiring the bill pictures;
the sub-region determining module is configured to divide the bill picture to obtain sub-regions in the bill picture and a region category of each sub-region; for each sub-region, the region category of the sub-region is one of general structures; a general structure is a structure contained in bills as obtained through statistics;
the obtaining module is configured to obtain, for each sub-region, identification information of the sub-region;
the integration module is used for integrating the identification information of each subarea to obtain the identification result of the bill picture;
wherein the sub-region determining module is further configured to: input the bill picture into a pre-trained deep learning model, and output the position information of each sub-region in the bill picture through the deep learning model; and extract each sub-region from the bill picture based on the position information of that sub-region, wherein the region category of each sub-region is output by the deep learning model at the same time as the vertex coordinates of each sub-region in the bill picture, the region category indicating, for each sub-region, whether the sub-region is a key-value-pair KV region, a paragraph region, or a table region;
the obtaining module is further configured to: selecting an identification model corresponding to each subarea based on the area category of the subarea; and obtaining the identification information corresponding to the subarea by using the identification model.
4. The apparatus of claim 3, wherein the general structure comprises one or more of the following structures: a key-value-pair KV region, a paragraph region, and a table region;
the sub-region determining module is further configured to: responding to the bill picture including the area which contains the key value pair and is presented by the preset rule by the key value pair key and the value corresponding to the key, and taking the area which contains the key value pair and is presented by the key value pair key and the value corresponding to the key by the preset rule in the bill picture as a key value pair KV area; responding to the bill picture including the area containing the table, and taking the area containing the table in the bill picture as a table area; and responding to the region which only contains the text in the bill picture, and taking the region which only contains the text in the bill picture as a paragraph region.
5. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-2.
6. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-2.
CN202111404281.3A 2021-11-24 2021-11-24 Bill identification method, device, equipment and storage medium Active CN114092948B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202111404281.3A CN114092948B (en) 2021-11-24 2021-11-24 Bill identification method, device, equipment and storage medium
PCT/CN2022/099787 WO2023093014A1 (en) 2021-11-24 2022-06-20 Bill recognition method and apparatus, and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111404281.3A CN114092948B (en) 2021-11-24 2021-11-24 Bill identification method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN114092948A CN114092948A (en) 2022-02-25
CN114092948B true CN114092948B (en) 2023-09-22

Family

ID=80304214

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111404281.3A Active CN114092948B (en) 2021-11-24 2021-11-24 Bill identification method, device, equipment and storage medium

Country Status (2)

Country Link
CN (1) CN114092948B (en)
WO (1) WO2023093014A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114092948B (en) * 2021-11-24 2023-09-22 北京百度网讯科技有限公司 Bill identification method, device, equipment and storage medium
CN117593752B (en) * 2024-01-18 2024-04-09 星云海数字科技股份有限公司 PDF document input method, PDF document input system, storage medium and electronic equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109117814A (en) * 2018-08-27 2019-01-01 北京京东金融科技控股有限公司 Image processing method, device, electronic equipment and medium
WO2019071662A1 (en) * 2017-10-09 2019-04-18 平安科技(深圳)有限公司 Electronic device, bill information identification method, and computer readable storage medium
CN110288755A (en) * 2019-05-21 2019-09-27 平安银行股份有限公司 The invoice method of inspection, server and storage medium based on text identification
CN111428599A (en) * 2020-03-17 2020-07-17 北京公瑾科技有限公司 Bill identification method, device and equipment
CN111582085A (en) * 2020-04-26 2020-08-25 中国工商银行股份有限公司 Document shooting image identification method and device
CN112434555A (en) * 2020-10-16 2021-03-02 泰康保险集团股份有限公司 Key value pair region identification method and device, storage medium and electronic equipment
CN112528863A (en) * 2020-12-14 2021-03-19 中国平安人寿保险股份有限公司 Identification method and device of table structure, electronic equipment and storage medium
CN112669515A (en) * 2020-12-28 2021-04-16 上海斑马来拉物流科技有限公司 Bill image recognition method and device, electronic equipment and storage medium
CN113011246A (en) * 2021-01-29 2021-06-22 招商银行股份有限公司 Bill classification method, device, equipment and storage medium
CN113569998A (en) * 2021-08-31 2021-10-29 平安医疗健康管理股份有限公司 Automatic bill identification method and device, computer equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7263847B2 (en) * 2019-03-01 2023-04-25 富士フイルムビジネスイノベーション株式会社 Information processing system, image processing device and image processing program
CN112036295B (en) * 2020-08-28 2023-12-08 泰康保险集团股份有限公司 Bill image processing method and device, storage medium and electronic equipment
CN112560754A (en) * 2020-12-23 2021-03-26 北京百度网讯科技有限公司 Bill information acquisition method, device, equipment and storage medium
CN114092948B (en) * 2021-11-24 2023-09-22 北京百度网讯科技有限公司 Bill identification method, device, equipment and storage medium


Also Published As

Publication number Publication date
CN114092948A (en) 2022-02-25
WO2023093014A1 (en) 2023-06-01

Similar Documents

Publication Publication Date Title
US20220253631A1 (en) Image processing method, electronic device and storage medium
CN114821622B (en) Text extraction method, text extraction model training method, device and equipment
CN114092948B (en) Bill identification method, device, equipment and storage medium
CN114549874A (en) Training method of multi-target image-text matching model, image-text retrieval method and device
CN112990035B (en) Text recognition method, device, equipment and storage medium
CN113656582A (en) Training method of neural network model, image retrieval method, device and medium
CN113360699A (en) Model training method and device, image question answering method and device
CN114490998B (en) Text information extraction method and device, electronic equipment and storage medium
CN113836314B (en) Knowledge graph construction method, device, equipment and storage medium
CN115982376A (en) Method and apparatus for training models based on text, multimodal data and knowledge
CN113657395A (en) Text recognition method, and training method and device of visual feature extraction model
CN112989235A (en) Knowledge base-based internal link construction method, device, equipment and storage medium
US20220148324A1 (en) Method and apparatus for extracting information about a negotiable instrument, electronic device and storage medium
CN115168537A (en) Training method and device of semantic retrieval model, electronic equipment and storage medium
CN114495113A (en) Text classification method and training method and device of text classification model
CN113360685A (en) Method, device, equipment and medium for processing note content
CN115248890A (en) User interest portrait generation method and device, electronic equipment and storage medium
CN114398434A (en) Structured information extraction method and device, electronic equipment and storage medium
CN115292506A (en) Knowledge graph ontology construction method and device applied to office field
CN113971810A (en) Document generation method, device, platform, electronic equipment and storage medium
CN114444514A (en) Semantic matching model training method, semantic matching method and related device
CN114445833A (en) Text recognition method and device, electronic equipment and storage medium
CN113627350B (en) Table detection method, device, equipment and storage medium
CN114861062B (en) Information filtering method and device
CN114911963A (en) Template picture classification method, device, equipment, storage medium and product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant