CN117218673A - Bill identification method and device, computer readable storage medium and electronic equipment - Google Patents


Info

Publication number
CN117218673A
CN117218673A (application CN202311170652.5A)
Authority
CN
China
Prior art keywords
text
image
information
bill
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311170652.5A
Other languages
Chinese (zh)
Inventor
吴春彪
钟玉兴
叶瑛锋
聂雪琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC filed Critical Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202311170652.5A priority Critical patent/CN117218673A/en
Publication of CN117218673A publication Critical patent/CN117218673A/en
Pending legal-status Critical Current


Landscapes

  • Character Discrimination (AREA)

Abstract

The invention discloses a bill identification method and device, a computer readable storage medium, and electronic equipment, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring a bill image to be identified, wherein the bill image to be identified comprises handwritten text and printed text; performing text recognition on the handwritten text in the bill image to be identified by using a first recognition model to obtain first text information, and performing text recognition on the printed text in the bill image to be identified by using a second recognition model to obtain second text information, wherein the first recognition model is trained on a plurality of sample bill images containing handwritten text and the second recognition model is trained on a plurality of sample bill images containing printed text; and determining the text information in the bill image to be identified according to the first text information and the second text information. The invention solves the technical problem in the related art of poor accuracy when recognizing text information in bills.

Description

Bill identification method and device, computer readable storage medium and electronic equipment
Technical Field
The invention relates to the field of artificial intelligence, in particular to a bill identification method, a bill identification device, a computer readable storage medium and electronic equipment.
Background
When transacting financial business, the paper bill information submitted by customers often needs to be entered into a system to form structured data for processing. Manual entry requires a large amount of manpower and is inefficient. A deep learning method can achieve automatic identification, but because bill images are of uneven quality, bill types are numerous, and bills carry a mixture of printed and handwritten content, the accuracy of recognizing the text information in bills tends to be poor.
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides a bill identification method, a bill identification device, a computer-readable storage medium and electronic equipment, which at least solve the technical problem of poor identification accuracy when character information in a bill is identified in the related technology.
According to an aspect of an embodiment of the present invention, there is provided a bill identification method, including: acquiring a bill image to be identified, wherein the bill image to be identified comprises a handwritten text and a printed text; text recognition is carried out on handwritten texts in the bill images to be recognized by using a first recognition model to obtain first text information, text recognition is carried out on printed texts in the bill images to be recognized by using a second recognition model to obtain second text information, wherein the first recognition model is obtained by training according to a plurality of sample bill images containing the handwritten texts, and the second recognition model is obtained by training according to a plurality of sample bill images containing the printed texts; and determining text information in the bill image to be identified according to the first text information and the second text information.
Further, the bill identifying method further comprises the following steps: dividing the image content of the bill image to be identified by using an image dividing model to obtain a first image and a second image, wherein the bill image to be identified, the first image and the second image have the same size, the first image comprises a handwritten text, and the second image comprises a printed text; performing text recognition on the handwritten text in the first image by using the first recognition model to obtain first text information; and carrying out text recognition on the printed text in the second image by using the second recognition model to obtain second text information.
Further, the bill identifying method further comprises the following steps: performing text region detection on the handwritten text in the first image by using a text detection model to obtain a plurality of first image block sets, wherein each first image block set comprises a plurality of first image blocks, each first image block comprises at least one text character, and the text characters in the plurality of first image blocks in the first image block set belong to the same sentence or the same word; for each first image block set, text recognition is carried out on text characters in a plurality of first image blocks in the first image block set by using a first recognition model, and first text sub-information matched with the first image block set is obtained; the first text information is composed of the first text sub-information matched by all the first image block sets.
Further, the bill identifying method further comprises the following steps: performing text region detection on the printed text in the second image by using a text detection model to obtain a plurality of second image block sets, wherein each second image block set comprises a plurality of second image blocks, each second image block comprises at least one text character, and the text characters in the plurality of second image blocks in the second image block set belong to the same sentence or the same word; for each second image block set, text recognition is carried out on text characters in a plurality of second image blocks in the second image block set by using a second recognition model, and second text sub-information matched with the second image block set is obtained; and forming second text information by the second text sub-information matched by all the second image block sets.
Further, the bill identifying method further comprises the following steps: for each first text sub-information in the first text information, determining first text content matched with the first text sub-information according to first position information of a first image block matched with the first text sub-information, and determining position information of the first text content; for each second text sub-information in the second text information, determining second text content matched with the second text sub-information according to second position information of a second image block matched with the second text sub-information, and determining position information of the second text content; matching the first text content with the second text content according to the position information of each first text content and the position information of each second text content to obtain a plurality of matched text contents; and forming text information in the bill image to be identified by a plurality of matched text contents.
Further, the bill identifying method further comprises the following steps: for each second text content, determining the center point coordinates of the second text content according to the position information of the second text content; for each first text content, determining the center point coordinates of the first text content according to the position information of the first text content; calculating Euclidean distance between the center point coordinates of the first text content and the center point coordinates of each second text content to obtain Euclidean distance values matched with each second text content; under the condition that a candidate Euclidean distance value exists in the multiple Euclidean distance values, screening the candidate Euclidean distance value from the multiple Euclidean distance values, wherein the candidate Euclidean distance value is smaller than a first preset threshold value; and determining a first target Euclidean distance value from the candidate Euclidean distance values, and matching the second text content matched with the first target Euclidean distance value with the first text content to obtain matched text content.
Further, the bill identifying method further comprises the following steps: for each candidate Euclidean distance value, calculating the vertical distance between the center point coordinate of the second text content matched with the candidate Euclidean distance value and the center point coordinate of the first text content to obtain a vertical distance value; judging whether a second target Euclidean distance value exists in the candidate Euclidean distance values according to the vertical distance values matched with the candidate Euclidean distance values, wherein the vertical distance value matched with the second target Euclidean distance value is smaller than a second preset threshold value; and determining the candidate Euclidean distance value matched with the second target Euclidean distance value as the first target Euclidean distance value under the condition that the second target Euclidean distance value exists.
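The matching procedure described in the two paragraphs above can be sketched as follows. This is a minimal illustration assuming each text content carries center-point coordinates; the threshold values and the function name are illustrative choices, not taken from the patent.

```python
import math

def match_value_to_key(first_center, second_centers,
                       dist_thresh=100.0, vert_thresh=20.0):
    """Match one handwritten (first) text content to a printed (second)
    text content: keep candidates whose Euclidean distance to the
    handwritten center is below dist_thresh, prefer candidates whose
    vertical offset is below vert_thresh (same line), and return the
    index of the closest remaining candidate, or None if no candidate
    passes the Euclidean threshold."""
    candidates = []
    for idx, center in enumerate(second_centers):
        d = math.dist(first_center, center)          # Euclidean distance
        vertical = abs(center[1] - first_center[1])  # vertical distance
        if d < dist_thresh:
            candidates.append((d, vertical, idx))
    if not candidates:
        return None
    # Prefer candidates on roughly the same line (small vertical distance).
    same_line = [c for c in candidates if c[1] < vert_thresh]
    pool = same_line if same_line else candidates
    return min(pool)[2]

key_centers = [(10.0, 10.0), (10.0, 60.0), (300.0, 10.0)]
value_center = (50.0, 12.0)
print(match_value_to_key(value_center, key_centers))  # 0
```

The nearest key both overall and on the same line is index 0, so the handwritten value is paired with that printed key.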
According to another aspect of the embodiment of the present invention, there is also provided a bill identifying apparatus, including: the acquisition module is used for acquiring a bill image to be identified, wherein the bill image to be identified comprises a handwritten text and a printed text; the recognition module is used for carrying out text recognition on the handwritten text in the bill image to be recognized by utilizing a first recognition model to obtain first text information, and carrying out text recognition on the printed text in the bill image to be recognized by utilizing a second recognition model to obtain second text information, wherein the first recognition model is obtained by training according to a plurality of sample bill images containing the handwritten text, and the second recognition model is obtained by training according to a plurality of sample bill images containing the printed text; and the determining module is used for determining the text information in the bill image to be identified according to the first text information and the second text information.
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to execute the bill identification method described above when run.
According to another aspect of an embodiment of the present invention, there is also provided an electronic device, including one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the bill identification method described above.
In the embodiment of the invention, a mode of identifying text information in a bill image to be identified based on a plurality of models is adopted, the first text information is obtained by acquiring the bill image to be identified and then identifying text of a handwritten text in the bill image to be identified by utilizing a first identification model, and the second text information is obtained by identifying text of a printed text in the bill image to be identified by utilizing a second identification model, so that the text information in the bill image to be identified is determined according to the first text information and the second text information. The bill images to be identified comprise handwritten texts and printed texts, the first identification model is trained according to a plurality of sample bill images containing the handwritten texts, and the second identification model is trained according to a plurality of sample bill images containing the printed texts.
It is easy to notice that in the above process, since the fonts of the printed text and the handwritten text in the bill image to be recognized have certain differences, text recognition is performed on the handwritten text in the bill image to be recognized by using the first recognition model to obtain first text information, text recognition is performed on the printed text in the bill image to be recognized by using the second recognition model to obtain second text information, and recognition of the texts of different fonts in the bill image to be recognized by using different recognition models is realized, so that the text of each font is recognized in a targeted manner, and the accuracy of text information recognition is improved.
Therefore, the scheme provided by the application achieves the aim of identifying the text information in the bill image to be identified based on a plurality of models, thereby realizing the technical effect of improving the accuracy of text information identification, and further solving the technical problem of poor identification accuracy when the text information in the bill is identified in the related technology.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of an alternative bill identification method according to an embodiment of the application;
FIG. 2 is a schematic illustration of an alternative bill image to be identified according to an embodiment of the application;
FIG. 3 is a schematic diagram of an alternative bill identification method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of the operation of an alternative image segmentation model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of an alternative bill identification apparatus according to an embodiment of the present application;
fig. 6 is a schematic diagram of an alternative electronic device according to an embodiment of the application.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Example 1
According to an embodiment of the present application, there is provided an embodiment of a bill identification method. It should be noted that the steps shown in the flowcharts of the figures may be performed in a computer system, such as by a set of computer-executable instructions, and that, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from that described herein.
FIG. 1 is a flow chart of an alternative bill identification method according to an embodiment of the application; as shown in FIG. 1, the method comprises the following steps:
step S101, acquiring a bill image to be identified, wherein the bill image to be identified comprises a handwritten text and a printed text.
Alternatively, an application system, an electronic device, a server, or the like may serve as the execution subject of the present application; in this embodiment, a target recognition system serves as the execution subject to acquire the bill image to be identified. For example, FIG. 2 is a schematic diagram of an alternative bill image to be identified according to an embodiment of the present application; as shown in FIG. 2, the English content in FIG. 2 is printed text and the numeric content is handwritten text.
Step S102, text recognition is carried out on handwritten texts in the bill images to be recognized by using a first recognition model to obtain first text information, text recognition is carried out on printed texts in the bill images to be recognized by using a second recognition model to obtain second text information, wherein the first recognition model is trained according to a plurality of sample bill images containing the handwritten texts, and the second recognition model is trained according to a plurality of sample bill images containing the printed texts.
Optionally, the target recognition system may directly input the bill image to be recognized to the first recognition model to obtain the first text information output by the first recognition model, and input the bill image to be recognized to the second recognition model to obtain the second text information output by the second recognition model. Optionally, the target recognition system may divide the bill image to be recognized by using the image division model to obtain an image area containing the handwritten text and an image area containing the printed text, so that the image area containing the handwritten text is input into the first recognition model to obtain the first text information output by the first recognition model, and the image area containing the printed text is input into the second recognition model to obtain the second text information output by the second recognition model.
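The two optional flows above (feeding the whole image to both recognizers, or segmenting first) can be sketched as follows. This is a minimal stand-in sketch: `segment_image`, `recognize_handwritten`, and `recognize_printed` are hypothetical stubs for the image segmentation model, the first recognition model, and the second recognition model; a real system would back them with trained networks.

```python
# Sketch of the two-model recognition pipeline. All three model calls
# are stubs; here a "bill image" is just a dict of text regions so the
# control flow is runnable on its own.

def segment_image(bill_image):
    """Stand-in for the image segmentation model: split the bill image
    into a handwritten-only part and a printed-only part."""
    return bill_image["handwritten"], bill_image["printed"]

def recognize_handwritten(image):
    """Stand-in for the first recognition model (handwritten text)."""
    return list(image)

def recognize_printed(image):
    """Stand-in for the second recognition model (printed text)."""
    return list(image)

def recognize_bill(bill_image):
    first_image, second_image = segment_image(bill_image)
    first_text = recognize_handwritten(first_image)    # first text information
    second_text = recognize_printed(second_image)      # second text information
    # Determine the final text information from both results.
    return first_text + second_text

bill = {"handwritten": ["500.00"], "printed": ["Amount"]}
print(recognize_bill(bill))  # ['500.00', 'Amount']
```

The point of the sketch is only the routing: handwritten regions go to one model, printed regions to the other, and the outputs are combined afterwards.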
The first recognition model and the second recognition model may be models with the same structure but different parameters; that is, they may be obtained by training the same initial recognition model with different training sample sets. The first recognition model and the second recognition model may each be a CRNN (Convolutional Recurrent Neural Network) model. A CRNN is an image-to-text recognition model capable of recognizing long text sequences; it comprises a CNN (Convolutional Neural Network) feature extraction layer and a BLSTM (bidirectional long short-term memory network) sequence feature extraction layer, and supports end-to-end joint training. In the prediction process, the CRNN model uses the CNN to extract features from the text image, the BLSTM fuses the feature vectors to extract the context features of the character sequence, a probability distribution is then obtained for each column of features, and finally the transcription layer predicts the text sequence, thereby obtaining the text information.
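As a concrete illustration of the transcription step: a CRNN transcription layer is commonly trained with CTC (the text above does not name the loss, so this is an assumption), and CTC greedy decoding turns the per-column predictions into a character sequence by collapsing repeated labels and dropping blanks:

```python
def ctc_greedy_decode(column_labels, blank=0):
    """Standard CTC greedy decoding of per-column argmax labels:
    collapse consecutive repeats, then remove the blank label."""
    out = []
    prev = None
    for lab in column_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return out

# Per-column argmax labels over the probability distributions
# (0 is the CTC blank); a real CRNN would map labels to characters.
labels = [0, 3, 3, 0, 3, 7, 7, 0]
print(ctc_greedy_decode(labels))  # [3, 3, 7]
```

Note how the blank between the two 3s preserves the doubled character, while the repeated 7s collapse to one.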
The training sample set of the first recognition model consists of a plurality of sample bill images containing handwritten texts and real text information corresponding to each sample bill image, and the training sample set of the second recognition model consists of a plurality of sample bill images containing printed texts and real text information corresponding to each sample bill image. Preferably, the sample bill images in the training sample set of the first recognition model only contain handwritten text, and the sample bill images in the training sample set of the second recognition model only contain printed text.
And step S103, determining text information in the bill image to be identified according to the first text information and the second text information.
In step S103, the target recognition system may directly aggregate the content of the first text information and the content of the second text information to obtain the text information in the bill image to be identified. Optionally, since the bill image to be identified includes a plurality of attributes and attribute values, that is, a plurality of items of key information and the value information corresponding to each (for example, the key information is "Name" and the value information is "Zhang San"), the target recognition system may also match the key information and the value information between the first text information and the second text information according to their text content, and determine the matched information as the text information in the bill image to be identified.
Based on the scheme defined in the steps S101 to S103, it can be known that in the embodiment of the present invention, a manner of identifying text information in a bill image to be identified based on a plurality of models is adopted, by acquiring the bill image to be identified, then text identifying is performed on a handwritten text in the bill image to be identified by using a first identification model, so as to obtain first text information, and text identifying is performed on a printed text in the bill image to be identified by using a second identification model, so as to obtain second text information, so that text information in the bill image to be identified is determined according to the first text information and the second text information. The bill images to be identified comprise handwritten texts and printed texts, the first identification model is trained according to a plurality of sample bill images containing the handwritten texts, and the second identification model is trained according to a plurality of sample bill images containing the printed texts.
It is easy to notice that in the above process, since the fonts of the printed text and the handwritten text in the bill image to be recognized have certain differences, text recognition is performed on the handwritten text in the bill image to be recognized by using the first recognition model to obtain first text information, text recognition is performed on the printed text in the bill image to be recognized by using the second recognition model to obtain second text information, and recognition of the texts of different fonts in the bill image to be recognized by using different recognition models is realized, so that the text of each font is recognized in a targeted manner, and the accuracy of text information recognition is improved.
Therefore, the scheme provided by the application achieves the aim of identifying the text information in the bill image to be identified based on a plurality of models, thereby realizing the technical effect of improving the accuracy of text information identification, and further solving the technical problem of poor identification accuracy when the text information in the bill is identified in the related technology.
In an alternative embodiment, in the process of performing text recognition on a handwritten text in a bill image to be recognized by using a first recognition model to obtain first text information and performing text recognition on a printed text in the bill image to be recognized by using a second recognition model to obtain second text information, the target recognition system may divide the image content of the bill image to be recognized by using an image division model to obtain a first image and a second image, then perform text recognition on the handwritten text in the first image by using the first recognition model to obtain first text information and perform text recognition on the printed text in the second image by using the second recognition model to obtain second text information, wherein the bill image to be recognized, the first image and the second image have the same size, the first image includes the handwritten text, and the second image includes the printed text.
Fig. 3 is a schematic diagram of an alternative bill identification method according to an embodiment of the present invention. In a bill image, the key information is usually printed text and the value information is usually handwritten text, because the key information is fixed while the value information depends on the user who fills out the bill. Therefore, before text recognition is performed by using the first recognition model and the second recognition model, as shown in fig. 3, the target recognition system may first use the image segmentation model to perform key-value image separation on the bill image to be identified, i.e., separate the bill image to be identified into a first image containing only handwritten text and a second image containing only printed text. The bill image to be identified, the first image, and the second image are all the same size; that is, the position of the handwritten text in the first image is the same as its position in the bill image to be identified, and the position of the printed text in the second image is the same as its position in the bill image to be identified.
Optionally, the image segmentation model adopts a U-Net segmentation network, and specifically may adopt a 2D U-Net. The 2D U-Net is a classical image segmentation network of the FCN (fully convolutional network) family that outputs images end to end; that is, the output images (the first image and the second image) keep the same size as the input image (the bill image to be identified). The 2D U-Net uses an encoder-decoder structure: the encoder learns and generalizes the features of the input image, and the decoder recovers the details of the output image. Fig. 4 is a schematic diagram of the operation of an alternative image segmentation model according to an embodiment of the present invention. As shown in fig. 4, because both the first image and the second image need to be obtained, a single encoder is used in the encoding stage and two decoders are used to decode the first image and the second image, respectively. Specifically, assuming that the bill image to be identified is X, the encoder is θ, and the intermediate-layer feature image is X_m, the relationship may be expressed as formula (1):
X_m = θ(X)    (1)
As shown in fig. 4, in the decoding stage the image segmentation model has two decoders, denoted φ_a and φ_b. The two decoders obtain the first image X̂_a and the second image X̂_b by decoding the intermediate-layer feature image X_m. The relationship may be expressed as formula (2) and formula (3):

X̂_a = φ_a(X_m)    (2)
X̂_b = φ_b(X_m)    (3)
Further, after determining the first image and the second image, the target recognition system may input the first image to the first recognition model and the second image to the second recognition model, thereby obtaining the first text information and the second text information. Optionally, the target recognition system may further process the first image and the second image, then input the processed first image to the first recognition model and the processed second image to the second recognition model, to obtain the first text information and the second text information.
Alternatively, in the training stage of the image segmentation model, a training sample set of the image segmentation model may be constructed as follows: first, acquire an image containing only handwritten text, denoted X_a, and acquire a bill image containing only printed text, denoted X_b, where X_a and X_b are consistent in size. A target sample bill image for the image segmentation model can then be obtained through the construction of formula (4):

X[i] = X_a[i] if X_a[i] ≠ 0, otherwise X[i] = X_b[i]    (4)
wherein X[i] represents the pixel value of the ith pixel in the target sample bill image, X_a[i] represents the pixel value of the ith pixel in the image X_a containing only handwritten text, and X_b[i] represents the pixel value of the ith pixel in the bill image X_b containing only printed text. Formula (4) expresses that if the pixel value of the ith pixel in the image X_a containing only handwritten text is not 0, the pixel value of the ith pixel in the target sample bill image is taken from X_a; otherwise it is taken from the bill image X_b containing only printed text. Colloquially, the content of the image X_a containing only handwritten text, whose blank pixels have value zero, is overlaid on the bill image X_b containing only printed text, thereby achieving image fusion.
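The pixel-wise fusion of formula (4) can be sketched directly, treating each image as a flat list of pixel values in which 0 means blank:

```python
def fuse_images(handwritten, printed):
    """Build the target sample bill image per formula (4): take the
    handwritten pixel where it is non-zero, otherwise fall back to
    the printed pixel at the same position."""
    assert len(handwritten) == len(printed), "images must be the same size"
    return [a if a != 0 else b for a, b in zip(handwritten, printed)]

x_a = [0, 0, 7, 0, 9]   # image containing only handwritten strokes
x_b = [1, 2, 3, 4, 5]   # bill image containing only printed content
print(fuse_images(x_a, x_b))  # [1, 2, 7, 4, 9]
```

The handwritten strokes (7 and 9) overwrite the printed pixels at their positions, and the printed content shows through everywhere X_a is blank.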
Further, equation (5) may be used as a loss function used in the image segmentation model training process:
wherein Loss represents the loss value, Y_a corresponds to the X_a content in the target sample bill image, and Y_b corresponds to the X_b content in the target sample bill image.
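The text does not specify the per-branch loss, so the sketch below uses mean squared error purely as a placeholder to show the two-term structure of formula (5): one term per decoder, summed.

```python
def mse(pred, target):
    """Mean squared error over flat pixel lists."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)

def segmentation_loss(pred_a, y_a, pred_b, y_b):
    """Two-term training loss over the two decoder outputs. MSE is a
    stand-in here; a real implementation might use cross-entropy or
    Dice loss per branch."""
    return mse(pred_a, y_a) + mse(pred_b, y_b)

# Branch a is exact (loss 0); branch b contributes the whole loss.
print(segmentation_loss([1.0, 0.0], [1.0, 0.0], [0.5, 0.5], [1.0, 0.0]))  # 0.25
```

Summing the branch losses trains the shared encoder jointly, since gradients from both decoders flow back through it.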
Through the above process, the first recognition model and the second recognition model can better recognize text information, so that the recognition accuracy can be effectively improved.
In an alternative embodiment, in the process of performing text recognition on the handwritten text in the first image by using the first recognition model to obtain the first text information, the target recognition system may perform text region detection on the handwritten text in the first image by using the text detection model to obtain a plurality of first image block sets, and then perform text recognition on text characters in a plurality of first image blocks in the first image block set by using the first recognition model for each first image block set to obtain first text sub-information matched with the first image block set, so that the first text information is composed of the first text sub-information matched with all the first image block sets. Each first image block set comprises a plurality of first image blocks, each first image block comprises at least one text character, and the text characters in the plurality of first image blocks in the first image block set belong to the same sentence or the same word.
Alternatively, the text detection model may be a CTPN (Connectionist Text Proposal Network) model. The CTPN model is a text detection model based on a target detection method, which converts the text detection task into the detection of a series of small-scale text boxes. The CTPN model adopts a combination of a CNN and a BLSTM: it extracts image features through the CNN and extracts sequential context features through the BLSTM, so as to improve the boundary prediction accuracy of the text boxes.
Alternatively, as shown in FIG. 3, the target recognition system may input the first image into the text detection model, resulting in a plurality of first image block sets. For example, if the first image includes "date" and "XXX mechanism payment ticket", the text detection model may output two first image block sets, where one first image block set includes two first image blocks containing "day" and "period" respectively, and the other first image block set includes three first image blocks containing "XXX mechanism", "payment", and "ticket" respectively. In addition, the text detection model also outputs first position information of each first image block, where the first position information may be the center coordinates of the first image block, the center coordinates of each character in the first image block, or the corner coordinates of the bounding box of each character in the first image block.
Further, for each first image block set, as shown in fig. 3, the target recognition system may input the first image blocks in the first image block set into the first recognition model, and then perform text recognition on text characters in each first image block by using the first recognition model, so as to obtain first text sub-information matched with the first image block set. The first text sub-information comprises text sub-information corresponding to each first image block.
And then, forming first text information by the first text sub-information matched with all the first image block sets.
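The detect-then-recognize flow described above can be sketched as follows; `detect_blocks` and `recognize_block` are hypothetical stand-ins for the text detection model and the first recognition model, and the joining strategy is an illustrative assumption.

```python
from typing import Callable, List

def recognize_handwritten_text(
    first_image: object,
    detect_blocks: Callable[[object], List[List[object]]],
    recognize_block: Callable[[object], str],
) -> str:
    """Text-region detection yields first image block sets (one set per
    sentence or word); each block in a set is recognized and joined into
    that set's first text sub-information, and all sub-information is
    combined into the first text information."""
    sub_infos = []
    for block_set in detect_blocks(first_image):
        sub_infos.append("".join(recognize_block(b) for b in block_set))
    return " ".join(sub_infos)
```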
It should be noted that, by using the text detection model to detect the text regions of the handwritten text in the first image to obtain a plurality of first image block sets, and then using the first recognition model to perform text recognition on the image blocks in the first image block sets, the accuracy of text recognition can be further improved. This avoids the problem that, when the first recognition model performs text recognition on an image containing a large amount of text, the character range of a single character cannot be accurately determined, which affects the character recognition effect.
In an alternative embodiment, in the process of performing text recognition on the printed text in the second image by using the second recognition model to obtain the second text information, the target recognition system may perform text region detection on the printed text in the second image by using the text detection model to obtain a plurality of second image block sets, and then perform text recognition on text characters in a plurality of second image blocks in the second image block set by using the second recognition model for each second image block set to obtain second text sub-information matched with the second image block set, so that the second text information is composed of the second text sub-information matched with all the second image block sets. Each second image block set comprises a plurality of second image blocks, each second image block comprises at least one text character, and the text characters in the second image blocks in the second image block set belong to the same sentence or the same word.
Alternatively, as shown in FIG. 3, the target recognition system may input the second image into the text detection model, resulting in a plurality of second image block sets. For example, if the second image includes the date "2021, 5, 8" and "confirm payment", the text detection model may output two second image block sets, where one second image block set includes three second image blocks containing "2021", "5", and "8" respectively, and the other second image block set includes two second image blocks containing "confirm" and "pay" respectively. In addition, the text detection model also outputs second position information of each second image block, where the second position information may be the center coordinates of the second image block, the center coordinates of each character in the second image block, or the corner coordinates of the bounding box of each character in the second image block.
Further, as shown in fig. 3, for each second image block set, the target recognition system may input the second image blocks in the second image block set into the second recognition model, and then perform text recognition on text characters in each second image block by using the second recognition model, so as to obtain second text sub-information matched by the second image block set. The second text sub-information comprises text sub-information corresponding to each second image block.
And then, forming second text information by the second text sub-information matched by all the second image block sets.
By the above process, the accuracy of text recognition can be further improved, thereby improving the recognition accuracy.
In an alternative embodiment, the text detection model further outputs first position information of each first image block, and the text detection model further outputs second position information of each second image block. In the process of determining the text information in the ticket image to be identified according to the first text information and the second text information, the target recognition system may determine, for each first text sub-information in the first text information, the first text content matched with the first text sub-information according to the first position information of the first image blocks matched with the first text sub-information, and determine the position information of the first text content; and may determine, for each second text sub-information in the second text information, the second text content matched with the second text sub-information according to the second position information of the second image blocks matched with the second text sub-information, and determine the position information of the second text content. The target recognition system may then match the first text contents with the second text contents according to the position information of each first text content and the position information of each second text content, so as to obtain a plurality of matched text contents, and compose the text information in the ticket image to be identified from the plurality of matched text contents.
The first position information of a first image block refers to its position in the first image, and the second position information of a second image block refers to its position in the second image. Since the ticket image to be identified, the first image, and the second image are the same size, the first position information of a first image block can also be understood as its position in the ticket image to be identified, and likewise the second position information of a second image block can be understood as its position in the ticket image to be identified. The first position information and the second position information share the same reference coordinate system: for example, the top-left vertex of the ticket image to be identified is taken as the (0, 0) origin, the rightward direction is the positive half-axis direction of the x-axis, and the vertically downward direction is the positive half-axis direction of the y-axis.
Because the first text sub-information includes the text sub-information corresponding to each first image block, for each first text sub-information, the target recognition system may sort the text sub-information corresponding to each first image block according to the first position information of each first image block, and combine the sorted text sub-information to obtain the first text content. For example, if the first text sub-information includes "day" and "period", the first position information of the first image block containing "day" is: the center coordinates of the first image block are (1, 1), and the first position information of the first image block containing "period" is: the center coordinates of the first image block are (1, 2). The sorted text sub-information is then "day", "period", so the first text content can be determined to be "date".
Further, the target recognition system may average the first position information of all the first image blocks matched with the first text sub-information in the x-axis direction and the y-axis direction, so as to obtain the position information of the corresponding first text content, that is, for the above first text sub-information including "day" and "period", the position information of the first text content is: the center coordinates of the first text content are (1, 1.5).
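The sorting-and-averaging steps above can be sketched as follows; `assemble_text_content` is a hypothetical helper, assuming each first image block is given as a (text, center-coordinates) pair.

```python
from typing import List, Tuple

def assemble_text_content(
    blocks: List[Tuple[str, Tuple[float, float]]],
) -> Tuple[str, Tuple[float, float]]:
    """blocks: (text, (cx, cy)) pairs for one first image block set.
    Sort the blocks by their center coordinates, join their text, and
    average the centers to obtain the text content's position."""
    ordered = sorted(blocks, key=lambda b: b[1])
    text = "".join(t for t, _ in ordered)
    cx = sum(c[0] for _, c in blocks) / len(blocks)
    cy = sum(c[1] for _, c in blocks) / len(blocks)
    return text, (cx, cy)
```

With the "day"/"period" example above (centers (1, 1) and (1, 2)), the averaged position is (1, 1.5), matching the description.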
Further, the target recognition system may determine the second text content and the location information of the second text content matched with each of the second text sub-information in the above manner, so that the description thereof is omitted herein.
Further, as shown in fig. 3, since key information and its corresponding value information are often distributed in adjacent areas of a ticket image, for each first text content, the target recognition system may adopt a position matching manner and screen out, from all the second text contents, the second text content closest to the first text content according to the position information of the first text content and the position information of each second text content, so as to determine that the first text content and that second text content are matched key-value information, thereby obtaining a plurality of matched text contents. For example, the matched text content may be "name-Zhang San", where "name" is the second text content and "Zhang San" is the first text content.
And then, the target recognition system can form structured data according to the matched text contents, wherein the structured data comprises a plurality of key information and value information matched with each key information, and the structured data is the text information in the bill image to be recognized.
By the aid of the process, information in the bill image to be identified is effectively determined, and accordingly accuracy of text identification is improved.
In an alternative embodiment, in the process of matching the first text content with the second text content according to the position information of each first text content and the position information of each second text content to obtain a plurality of matched text contents, the target recognition system may determine, for each second text content, the center point coordinates of the second text content according to its position information, and determine, for each first text content, the center point coordinates of the first text content according to its position information. The target recognition system may then calculate the Euclidean distance between the center point coordinates of the first text content and the center point coordinates of each second text content, thereby obtaining the Euclidean distance value matched with each second text content. Then, in the case that candidate Euclidean distance values exist among the plurality of Euclidean distance values, the target recognition system may screen the candidate Euclidean distance values from the plurality of Euclidean distance values, determine a first target Euclidean distance value from the candidate Euclidean distance values, and match the second text content matched with the first target Euclidean distance value with the first text content to obtain the matched text content. A candidate Euclidean distance value is smaller than a first preset threshold value.
Optionally, if the position information of the first text content (or the second text content) refers to the center point coordinate of the first text content (or the second text content), the target recognition system may directly extract the center point coordinate from the position information, and if the position information of the first text content (or the second text content) refers to the four vertex coordinates of the frame of the first text content (or the second text content), the target recognition system may calculate the center point coordinate.
Further, for each first text content, the target recognition system may calculate the Euclidean distance between the center point coordinates of the first text content and the center point coordinates of each second text content, thereby obtaining the Euclidean distance value matched with each second text content.
Then, the target recognition system may determine whether candidate Euclidean distance values exist among the Euclidean distance values, determine the candidate Euclidean distance values if they exist, and then determine the smallest candidate Euclidean distance value as the first target Euclidean distance value, so as to determine the matched text content. Otherwise, if no candidate Euclidean distance value exists, it is determined that there is no second text content matching the first text content.
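A minimal sketch of this candidate screening, assuming each text content is reduced to its center point coordinates; the function name and threshold value are illustrative, not from the patent.

```python
import math
from typing import List, Optional, Tuple

def match_value_to_key(
    first_center: Tuple[float, float],
    second_centers: List[Tuple[float, float]],
    first_threshold: float,
) -> Optional[int]:
    """For one first text content, compute the Euclidean distance to each
    second text content's center, keep candidates below the first preset
    threshold, and return the index of the nearest one (None if no
    candidate exists, i.e. no matching second text content)."""
    best = None
    for i, center in enumerate(second_centers):
        d = math.dist(first_center, center)
        if d < first_threshold and (best is None or d < best[0]):
            best = (d, i)
    return None if best is None else best[1]
```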
By the aid of the process, effective matching between the second text content and the first text content is achieved, and therefore recognition accuracy of text information in the bill image is improved.
In an alternative embodiment, in determining the first target Euclidean distance value from the candidate Euclidean distance values, the target recognition system may calculate, for each candidate Euclidean distance value, the vertical distance between the center point coordinates of the second text content matched with that candidate Euclidean distance value and the center point coordinates of the first text content, to obtain a vertical distance value. The target recognition system may then determine, according to the vertical distance values matched with the candidate Euclidean distance values, whether a second target Euclidean distance value exists among the candidate Euclidean distance values, and in the case that the second target Euclidean distance value exists, determine the candidate Euclidean distance value matched with the second target Euclidean distance value as the first target Euclidean distance value. The vertical distance value matched with the second target Euclidean distance value is smaller than a second preset threshold value.
Optionally, since the texts in the bill image are mostly distributed transversely, for each candidate euclidean distance value, the target recognition system may calculate a vertical distance between the center point coordinate of the second text content matched by the candidate euclidean distance value and the center point coordinate of the first text content, that is, calculate a difference value of the y-axis coordinates, so as to obtain a vertical distance value.
Then, the target recognition system may determine whether a second target Euclidean distance value exists among the candidate Euclidean distance values, that is, determine whether there is a second text content in the same line as the first text content, so that if the second target Euclidean distance value exists, the candidate Euclidean distance value matched with the second target Euclidean distance value is determined as the first target Euclidean distance value. Otherwise, if no second target Euclidean distance value exists, it is determined that there is no second text content matching the first text content.
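The vertical-distance refinement can be sketched as follows, under the stated assumption that ticket text runs mostly horizontally; the function name, thresholds, and the choice of the nearest remaining candidate are illustrative assumptions.

```python
import math
from typing import List, Optional, Tuple

def match_same_row(
    first_center: Tuple[float, float],
    second_centers: List[Tuple[float, float]],
    dist_threshold: float,
    row_threshold: float,
) -> Optional[int]:
    """Among Euclidean-distance candidates (distance < dist_threshold),
    keep only those whose vertical offset |dy| from the first text content
    is below the second preset threshold (i.e. in the same row), then
    return the index of the nearest remaining candidate, or None."""
    candidates = []
    for i, (x, y) in enumerate(second_centers):
        d = math.dist(first_center, (x, y))
        if d < dist_threshold and abs(y - first_center[1]) < row_threshold:
            candidates.append((d, i))
    return min(candidates)[1] if candidates else None
```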
By the above process, more accurate matching between the first text content and the second text content is realized, so that the recognition accuracy of the text information can be improved.
Therefore, the scheme provided by the application achieves the aim of identifying the text information in the bill image to be identified based on a plurality of models, thereby realizing the technical effect of improving the accuracy of text information identification, and further solving the technical problem of poor identification accuracy when the text information in the bill is identified in the related technology.
Example 2
According to an embodiment of the present application, there is provided an embodiment of a bill identification apparatus, wherein fig. 5 is a schematic diagram of an alternative bill identification apparatus according to an embodiment of the present application. As shown in fig. 5, the apparatus includes:
The obtaining module 501 is configured to obtain a ticket image to be identified, where the ticket image to be identified includes a handwritten text and a printed text;
the recognition module 502 is configured to perform text recognition on a handwritten text in a bill image to be recognized by using a first recognition model to obtain first text information, and perform text recognition on a printed text in the bill image to be recognized by using a second recognition model to obtain second text information, where the first recognition model is obtained by training according to a plurality of sample bill images including the handwritten text, and the second recognition model is obtained by training according to a plurality of sample bill images including the printed text;
a determining module 503, configured to determine text information in the bill image to be identified according to the first text information and the second text information.
It should be noted that the above-mentioned obtaining module 501, identification module 502, and determining module 503 correspond to steps S101 to S103 in the above-mentioned embodiment. The examples and application scenarios implemented by the three modules are the same as those of the corresponding steps, but are not limited to those disclosed in embodiment 1 above.
Optionally, the identification module 502 further includes: the segmentation sub-module is used for segmenting the image content of the bill image to be identified by utilizing the image segmentation model to obtain a first image and a second image, wherein the bill image to be identified, the first image and the second image have the same size, the first image comprises a handwritten text, and the second image comprises a printed text; the first recognition sub-module is used for carrying out text recognition on the handwritten text in the first image by utilizing the first recognition model to obtain first text information; and the second recognition sub-module is used for carrying out text recognition on the printed text in the second image by utilizing the second recognition model to obtain second text information.
Optionally, the first identification sub-module further includes: the first detection unit is used for detecting text areas of handwritten texts in the first image by using a text detection model to obtain a plurality of first image block sets, wherein each first image block set comprises a plurality of first image blocks, each first image block comprises at least one text character, and the text characters in the plurality of first image blocks in the first image block set belong to the same sentence or the same word; the first recognition unit is used for carrying out text recognition on text characters in a plurality of first image blocks in the first image block set by using a first recognition model for each first image block set to obtain first text sub-information matched with the first image block set; and the first processing unit is used for forming first text information by the first text sub-information matched with all the first image block sets.
Optionally, the second identifying sub-module further includes: the second detection unit is used for detecting text areas of the printed text in the second image by using the text detection model to obtain a plurality of second image block sets, wherein each second image block set comprises a plurality of second image blocks, each second image block comprises at least one text character, and the text characters in the plurality of second image blocks in the second image block set belong to the same sentence or the same word; the second recognition unit is used for carrying out text recognition on text characters in a plurality of second image blocks in the second image block set by using a second recognition model for each second image block set to obtain second text sub-information matched with the second image block set; and the second processing unit is used for forming second text information by the second text sub-information matched with all the second image block sets.
Optionally, the determining module 503 further includes: the first determining sub-module is used for determining first text contents matched with the first text sub-information according to the first position information of the first image block matched with the first text sub-information for each first text sub-information in the first text information, and determining the position information of the first text contents; the second determining sub-module is used for determining second text contents matched with the second text sub-information according to the second position information of the second image block matched with the second text sub-information for each piece of second text sub-information in the second text information, and determining the position information of the second text contents; the matching sub-module is used for matching the first text content with the second text content according to the position information of each first text content and the position information of each second text content to obtain a plurality of matched text contents; and the processing sub-module is used for forming text information in the bill image to be identified by the matched text contents.
Optionally, the matching submodule further includes: a first determining unit configured to determine, for each of the second text contents, a center point coordinate of the second text content based on the position information of the second text content; a second determining unit configured to determine, for each first text content, center point coordinates of the first text content based on the position information of the first text content; the computing unit is used for computing Euclidean distance between the center point coordinates of the first text content and the center point coordinates of each second text content to obtain Euclidean distance values matched with each second text content; the screening unit is used for screening the candidate Euclidean distance value from the multiple Euclidean distance values under the condition that the candidate Euclidean distance value exists in the multiple Euclidean distance values, and the candidate Euclidean distance value is smaller than a first preset threshold value; and the third determining unit is used for determining a first target Euclidean distance value from the candidate Euclidean distance values, and matching the second text content matched with the first target Euclidean distance value with the first text content to obtain matched text content.
Optionally, the third determining unit further includes: the calculating subunit is used for calculating the vertical distance between the center point coordinate of the second text content matched with the candidate Euclidean distance value and the center point coordinate of the first text content for each candidate Euclidean distance value to obtain a vertical distance value; the judging subunit is used for judging whether a second target Euclidean distance value exists in the candidate Euclidean distance values according to the vertical distance values matched with the candidate Euclidean distance values, wherein the vertical distance value matched with the second target Euclidean distance value is smaller than a second preset threshold value; and the determining subunit is used for determining the candidate Euclidean distance value matched with the second target Euclidean distance value as the first target Euclidean distance value under the condition that the second target Euclidean distance value exists.
Example 3
According to another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the ticket identification method described above when run.
Example 4
According to another aspect of an embodiment of the present invention, there is also provided an electronic device, wherein fig. 6 is a schematic diagram of an alternative electronic device according to an embodiment of the present invention. As shown in fig. 6, the electronic device includes one or more processors; and a memory for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to perform the ticket identification method described above.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary, and the division of units may be a logic function division, and there may be another division manner in actual implementation, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interfaces, units or modules, or may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which are intended to be comprehended within the scope of the present invention.

Claims (10)

1. A ticket identification method, comprising:
acquiring a bill image to be identified, wherein the bill image to be identified comprises a handwritten text and a printed text;
performing text recognition on the handwritten text in the bill image to be recognized by using a first recognition model to obtain first text information, and performing text recognition on the printed text in the bill image to be recognized by using a second recognition model to obtain second text information, wherein the first recognition model is obtained by training according to a plurality of sample bill images containing the handwritten text, and the second recognition model is obtained by training according to a plurality of sample bill images containing the printed text;
and determining text information in the bill image to be identified according to the first text information and the second text information.
2. The method of claim 1, wherein performing text recognition on the handwritten text in the ticket image to be recognized using a first recognition model to obtain first text information, and performing text recognition on the printed text in the ticket image to be recognized using a second recognition model to obtain second text information, comprises:
dividing the image content of the bill image to be identified by utilizing an image division model to obtain a first image and a second image, wherein the bill image to be identified, the first image and the second image have the same size, the first image comprises a handwritten text, and the second image comprises a printed text;
performing text recognition on the handwritten text in the first image by using the first recognition model to obtain the first text information;
and carrying out text recognition on the printed text in the second image by using the second recognition model to obtain the second text information.
3. The method of claim 2, wherein performing text recognition on the handwritten text in the first image using the first recognition model to obtain the first text information comprises:
performing text region detection on the handwritten text in the first image using a text detection model to obtain a plurality of first image block sets, wherein each first image block set comprises a plurality of first image blocks, each first image block comprises at least one text character, and the text characters in the plurality of first image blocks in a first image block set belong to the same sentence or the same word;
for each first image block set, performing text recognition on the text characters in the plurality of first image blocks in the first image block set using the first recognition model to obtain first text sub-information matched with the first image block set;
and forming the first text information from the first text sub-information matched with all the first image block sets.
4. A method according to claim 3, wherein performing text recognition on the printed text in the second image using the second recognition model to obtain the second text information comprises:
performing text region detection on the printed text in the second image using the text detection model to obtain a plurality of second image block sets, wherein each second image block set comprises a plurality of second image blocks, each second image block comprises at least one text character, and the text characters in the plurality of second image blocks in a second image block set belong to the same sentence or the same word;
for each second image block set, performing text recognition on the text characters in the plurality of second image blocks in the second image block set using the second recognition model to obtain second text sub-information matched with the second image block set;
and forming the second text information from the second text sub-information matched with all the second image block sets.
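Claims 3 and 4 apply the same detect-then-recognize scheme to each layer; a hedged sketch, with `recognize_block` standing in for whichever recognition model applies to the layer at hand:

```python
def recognize_block_sets(block_sets, recognize_block):
    """One block set holds the character patches of one word or sentence;
    recognizing each patch and concatenating the results yields that
    set's text sub-information."""
    text_info = []
    for block_set in block_sets:
        sub_info = "".join(recognize_block(block) for block in block_set)
        text_info.append(sub_info)
    return text_info
```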
5. The method of claim 4, wherein the text detection model further outputs first position information for each first image block and second position information for each second image block, and wherein determining the text information in the bill image to be identified according to the first text information and the second text information comprises:
for each item of first text sub-information in the first text information, determining the first text content matched with that first text sub-information, and the position information of that first text content, according to the first position information of the first image blocks matched with that first text sub-information;
for each item of second text sub-information in the second text information, determining the second text content matched with that second text sub-information, and the position information of that second text content, according to the second position information of the second image blocks matched with that second text sub-information;
matching the first text contents with the second text contents according to the position information of each first text content and the position information of each second text content to obtain a plurality of matched text contents;
and forming the text information in the bill image to be identified from the plurality of matched text contents.
6. The method of claim 5, wherein matching the first text contents with the second text contents according to the position information of each first text content and the position information of each second text content to obtain a plurality of matched text contents comprises:
for each second text content, determining the center point coordinates of the second text content according to the position information of the second text content;
for each first text content, determining the center point coordinates of the first text content according to the position information of the first text content;
calculating the Euclidean distance between the center point coordinates of the first text content and the center point coordinates of each second text content to obtain a Euclidean distance value matched with each second text content;
in the case that candidate Euclidean distance values exist among the plurality of Euclidean distance values, screening out the candidate Euclidean distance values, wherein a candidate Euclidean distance value is smaller than a first preset threshold;
and determining a first target Euclidean distance value from the candidate Euclidean distance values, and matching the second text content matched with the first target Euclidean distance value with the first text content to obtain a matched text content.
7. The method of claim 6, wherein determining the first target Euclidean distance value from the candidate Euclidean distance values comprises:
for each candidate Euclidean distance value, calculating the vertical distance between the center point coordinates of the second text content matched with that candidate Euclidean distance value and the center point coordinates of the first text content to obtain a vertical distance value;
judging, according to the vertical distance value matched with each candidate Euclidean distance value, whether a second target Euclidean distance value exists among the candidate Euclidean distance values, wherein the vertical distance value matched with the second target Euclidean distance value is smaller than a second preset threshold;
and in the case that the second target Euclidean distance value exists, determining the candidate Euclidean distance value matched with the second target Euclidean distance value as the first target Euclidean distance value.
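The matching rule of claims 5 to 7 can be worked through numerically. The sketch below pairs one handwritten item (a first text content) against candidate printed items (second text contents) by center-point Euclidean distance, then applies the vertical-distance filter. The two threshold values are illustrative, since the claims leave them unspecified, as is the tie-break of taking the smallest remaining distance when several candidates pass both tests.

```python
import math

EUCLIDEAN_THRESHOLD = 50.0  # first preset threshold (illustrative)
VERTICAL_THRESHOLD = 10.0   # second preset threshold (illustrative)

def match_first_content(first_center, second_centers):
    """Return the index of the matched second text content, or None."""
    candidates = []
    for idx, (x, y) in enumerate(second_centers):
        d = math.hypot(x - first_center[0], y - first_center[1])
        if d < EUCLIDEAN_THRESHOLD:  # claim 6: screen candidate distances
            candidates.append((d, abs(y - first_center[1]), idx))
    # Claim 7: keep only candidates whose vertical offset is small enough.
    targets = [(d, idx) for d, dy, idx in candidates if dy < VERTICAL_THRESHOLD]
    return min(targets)[1] if targets else None  # smallest distance wins
```

The vertical filter reflects how filled-in bills are laid out: a handwritten value normally sits on the same line as its printed field label, so a nearby label on a different line should not be matched.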
8. A bill identification device, comprising:
an acquisition module configured to acquire a bill image to be identified, wherein the bill image to be identified comprises handwritten text and printed text;
a recognition module configured to perform text recognition on the handwritten text in the bill image to be identified using a first recognition model to obtain first text information, and to perform text recognition on the printed text in the bill image to be identified using a second recognition model to obtain second text information, wherein the first recognition model is trained on a plurality of sample bill images containing handwritten text, and the second recognition model is trained on a plurality of sample bill images containing printed text;
and a determining module configured to determine the text information in the bill image to be identified according to the first text information and the second text information.
9. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is arranged to execute the bill identification method of any one of claims 1 to 7 when run.
10. An electronic device, comprising: one or more processors; and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the bill identification method of any one of claims 1 to 7.
CN202311170652.5A 2023-09-11 2023-09-11 Bill identification method and device, computer readable storage medium and electronic equipment Pending CN117218673A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311170652.5A CN117218673A (en) 2023-09-11 2023-09-11 Bill identification method and device, computer readable storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN117218673A true CN117218673A (en) 2023-12-12

Family

ID=89040043

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311170652.5A Pending CN117218673A (en) 2023-09-11 2023-09-11 Bill identification method and device, computer readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN117218673A (en)

Similar Documents

Publication Publication Date Title
CN111931664B (en) Mixed-pasting bill image processing method and device, computer equipment and storage medium
CN110766014B (en) Bill information positioning method, system and computer readable storage medium
CN109902622B (en) Character detection and identification method for boarding check information verification
CN111046784A (en) Document layout analysis and identification method and device, electronic equipment and storage medium
EP2785058A1 (en) Video advertisement broadcasting method, device and system
CN109740515B (en) Evaluation method and device
CN113779308B (en) Short video detection and multi-classification method, device and storage medium
CN108681735A (en) Optical character recognition method based on convolutional neural networks deep learning model
CN102177520A (en) Segmenting printed media pages into articles
CN109389115B (en) Text recognition method, device, storage medium and computer equipment
CN112395996A (en) Financial bill OCR recognition and image processing method, system and readable storage medium
CN114155527A (en) Scene text recognition method and device
CN112052845A (en) Image recognition method, device, equipment and storage medium
CN115186303B (en) Financial signature safety management method and system based on big data cloud platform
CN112446259A (en) Image processing method, device, terminal and computer readable storage medium
CN112395995A (en) Method and system for automatically filling and checking bill according to mobile financial bill
CN111178146A (en) Method and device for identifying anchor based on face features
CN112926379A (en) Method and device for constructing face recognition model
CN112241727A (en) Multi-ticket identification method and system and readable storage medium
CN112232336A (en) Certificate identification method, device, equipment and storage medium
CN109508716B (en) Image character positioning method and device
CN110291527B (en) Information processing method, system, cloud processing device and computer program product
CN111414889B (en) Financial statement identification method and device based on character identification
CN112686263A (en) Character recognition method and device, electronic equipment and storage medium
CN107240185A (en) A kind of crown word number identification method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination