CN115147846A - Multi-language bill identification method, device, equipment and storage medium - Google Patents

Multi-language bill identification method, device, equipment and storage medium

Info

Publication number
CN115147846A
CN115147846A
Authority
CN
China
Prior art keywords
identification
recognition
picture
bill
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210832078.4A
Other languages
Chinese (zh)
Inventor
Wang Shuaifeng (王帅峰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210832078.4A priority Critical patent/CN115147846A/en
Publication of CN115147846A publication Critical patent/CN115147846A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/148: Segmentation of character regions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/14: Image acquisition
    • G06V30/1444: Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to the technical field of image processing, and in particular to a multi-language bill identification method, device, equipment and storage medium. The method performs text recognition on a picture to be predicted through a preset recognition model and distinguishes the text information of the picture so as to locate target recognition elements; the picture to be predicted is then segmented according to the target recognition elements, separating the target information to be recognized into a plurality of target sub-pictures; according to the text information of each target sub-picture, a different recognition sub-model is matched to each target sub-picture; and the recognition sub-models are invoked concurrently to recognize the information of each target recognition element, which improves recognition efficiency. Because the different types of text information in the bill are recognized separately, recognition interference between different types of text is avoided and recognition precision is improved, solving the technical problem that existing bill recognition models have low recognition precision.

Description

Multi-language bill identification method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a multi-language bill identification method, device, equipment and storage medium.
Background
In the prior art, automatic recognition of bills, automatic entry of text and automatic verification of content can be realized through OCR technology. However, bill content mixes many kinds of text, such as multiple languages, printed and handwritten forms, and various symbols, and for the detection and recognition of such complex bill content the existing recognition models suffer from complex processing, low recognition efficiency, poor recognition results and difficult optimization.
Therefore, how to solve the technical problem of low recognition precision in existing bill recognition models has become an urgent issue.
Disclosure of Invention
The invention mainly aims to provide a multi-language bill identification method, device, equipment and storage medium, so as to solve the technical problem of low recognition precision in existing bill recognition models.
In order to achieve the above object, the present invention provides a multi-language bill recognition method, including: acquiring at least one picture to be predicted, performing text recognition on the picture to be predicted based on a preset recognition model, and positioning at least one target recognition element; based on the target identification element, dividing the picture to be predicted to obtain at least one target sub-picture; and concurrently calling an identification submodel corresponding to the target sub-picture, and carrying out information identification on the target sub-picture to obtain target information.
In addition, to achieve the above object, the present invention provides a multi-language bill recognition apparatus, including: the identification element positioning module is used for acquiring at least one picture to be predicted, performing text identification on the picture to be predicted based on a preset identification model, and positioning at least one target identification element; the sub-picture extraction module is used for segmenting the picture to be predicted based on the target identification element to obtain at least one target sub-picture; and the information identification module is used for concurrently calling an identification submodel corresponding to the target sub-picture, and performing information identification on the target sub-picture to obtain target information.
In addition, to achieve the above object, the present invention also provides a multi-language bill recognition apparatus including a processor, a memory, and a multi-language bill recognition program stored on the memory and executable by the processor, wherein the multi-language bill recognition program, when executed by the processor, implements the steps of the multi-language bill recognition method as described above.
In addition, to achieve the above object, the present invention also provides a computer readable storage medium having a multi-language bill recognition program stored thereon, wherein the multi-language bill recognition program, when executed by a processor, implements the steps of the multi-language bill recognition method as described above.
The invention provides a multi-language bill identification method, which comprises: acquiring at least one picture to be predicted, performing text recognition on the picture to be predicted based on a preset recognition model, and locating at least one target recognition element; segmenting the picture to be predicted based on the target recognition element to obtain at least one target sub-picture; and concurrently invoking the recognition sub-model corresponding to each target sub-picture to perform information recognition on it and obtain the target information. In this way, the preset recognition model performs text recognition on the picture to be predicted and distinguishes its text information so as to locate the target recognition elements; the picture is segmented according to the target recognition elements, separating the target information to be recognized into a plurality of target sub-pictures; each target sub-picture corresponds to one recognition sub-model, and the recognition sub-models are invoked concurrently to recognize the information of each target recognition element, which improves recognition efficiency; and because the different types of text information in the bill are recognized separately, recognition interference between different types of text is avoided, recognition precision is improved, and the technical problem of low recognition precision in existing bill recognition models is solved.
Drawings
FIG. 1 is a schematic diagram of a hardware structure of a multi-language bill identifying apparatus according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a multi-lingual ticket identification method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a multi-lingual ticket identification method according to a second embodiment of the present invention;
FIG. 4 is a functional block diagram of a multi-language bill recognition apparatus according to a first embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The multi-language bill identification method in the embodiments of the invention is mainly applied to a multi-language bill identification device, which can be any device with display and processing functions, such as a PC (personal computer), a portable computer, or a mobile terminal.
Referring to fig. 1, fig. 1 is a schematic diagram of a hardware structure of a multi-language bill recognition apparatus according to an embodiment of the present invention. In an embodiment of the present invention, the multi-language ticket recognition device may include a processor 1001 (e.g., CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used for realizing connection communication among the components; the user interface 1003 may include a Display screen (Display), an input unit such as a Keyboard (Keyboard); the network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface); the memory 1005 may be a high-speed RAM memory, or may be a non-volatile memory (e.g., a magnetic disk memory), and optionally, the memory 1005 may be a storage device independent of the processor 1001.
Those skilled in the art will appreciate that the hardware configuration shown in FIG. 1 does not constitute a limitation of the multi-lingual bill identification device and may include more or fewer components than those shown, or some components in combination, or a different arrangement of components.
With continued reference to FIG. 1, the memory 1005 of FIG. 1, which is a computer-readable storage medium, may include an operating system, a network communication module, and a multi-language ticket identification program.
In fig. 1, the network communication module is mainly used for connecting to a server and performing data communication with the server; the processor 1001 may call the multi-language bill recognition program stored in the memory 1005 and execute the multi-language bill recognition method according to the embodiment of the present invention.
The embodiment of the invention provides a multi-language bill identification method.
Referring to fig. 2, fig. 2 is a flowchart illustrating a multilingual ticket-recognition method according to a first embodiment of the present invention.
In this embodiment, the multilingual ticket recognition method includes the following steps:
s10, acquiring at least one picture to be predicted, performing text recognition on the picture to be predicted based on a preset recognition model, and positioning at least one target recognition element;
in this embodiment, the preset recognition model may include a text detection model, a direction classification model, and a general recognition model.
Specifically, the text detection model is mainly used for locating the regions where characters appear and eliminating interference from the background and other clutter in the picture. The text detection model can also serve as a semi-automatic labeling tool for the picture to be predicted, so that text regions in the picture to be predicted are identified and extracted.
It is understood that the text detection model described in this embodiment may be any model used for text detection in the prior art. For example, the text detection model may be the CTPN model or the EAST model.
Specifically, the CTPN model extracts context features around each character with a BLSTM module and then concatenates small text blocks into complete text boxes using a text-line construction algorithm. For every detected small text block, the construction algorithm searches, in both the forward and the reverse direction, for paired text blocks whose horizontal distance is smaller than 50 pixels; once all text blocks have been searched, they naturally form a connection graph from which the text detection boxes are read off.
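As an illustration of this text-line construction step, here is a minimal Python sketch (not the patent's code) that chains detected text-block boxes whose horizontal gap is under 50 pixels into complete line boxes; the 0.7 vertical-overlap threshold is an assumption added for the example:

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def vertical_overlap(a: Box, b: Box) -> float:
    """Fraction of vertical overlap relative to the shorter box."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    shorter = min(a[3] - a[1], b[3] - b[1])
    return max(0.0, inter) / max(1, shorter)

def build_text_lines(blocks: List[Box], max_gap: int = 50) -> List[Box]:
    """Greedily chain neighbouring text blocks into full text-line boxes."""
    blocks = sorted(blocks, key=lambda b: b[0])          # left-to-right
    lines: List[List[Box]] = []
    for blk in blocks:
        for line in lines:
            last = line[-1]
            gap = blk[0] - last[2]                       # horizontal distance
            if 0 <= gap < max_gap and vertical_overlap(last, blk) > 0.7:
                line.append(blk)
                break
        else:
            lines.append([blk])                          # start a new line
    # merge each chain of blocks into one bounding box
    return [(min(b[0] for b in l), min(b[1] for b in l),
             max(b[2] for b in l), max(b[3] for b in l)) for l in lines]
```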
Specifically, the EAST model provides a simple yet powerful pipeline that detects text quickly and accurately in natural scenes. The pipeline uses a single neural network to directly predict arbitrarily oriented quadrilateral words or text lines in the full image.
Specifically, to improve recognition of the characters inside a detected text box and keep the characters consistent, the text box to be recognized should generally be upright and horizontal. The direction classification model corrects the orientation of the image produced by the text detection model: the correction can be performed in the frequency domain via an FFT (fast Fourier transform) followed by an inverse transform, or directly by finding the rectangle that contains the character region and rotating it. For example, an image in the vertical direction is rotated by 90 degrees into the horizontal direction; if the text box turns out to be upside down after this rotation, the picture is flipped and recognized again.
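The second variant (find the rectangle containing the character region and rotate it) can be sketched with OpenCV as below; the FFT-based variant is omitted, and the angle normalization assumes the older OpenCV convention in which minAreaRect returns angles in [-90, 0):

```python
import cv2
import numpy as np

def deskew(image: np.ndarray) -> np.ndarray:
    """Rotate an image so its text region becomes horizontal."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # invert and binarize so text pixels become foreground
    _, bw = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)
    coords = np.column_stack(np.where(bw > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]          # angle of the text rectangle
    if angle < -45:                              # normalise to (-45, 45]
        angle = -(90 + angle)
    else:
        angle = -angle
    h, w = image.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(image, m, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)
```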
Specifically, the general recognition model is a text recognition model for recognizing the text of the bill image. The general recognition model recognizes the text boxes selected by the text detection model and extracts the text information inside them, including the language type (such as Chinese characters, English letters, or digits), the character type (such as handwritten or printed), and the character content.
Step S20, based on the target identification element, dividing the picture to be predicted to obtain at least one target sub-picture;
In this embodiment, the picture to be predicted is segmented according to the coordinate position of each target recognition element in the picture and the size of its text box; a plurality of target recognition elements are thus split into a plurality of corresponding target sub-pictures.
Specifically, based on the target recognition elements located by the preset recognition model, the text box of the line containing each target recognition element is determined, and the single line of text is then cut into single characters. The text can be cut either with a uniform cutting method or with a statistical cutting method.
Specifically, uniform cutting directly cuts the single-line text into square pictures according to its height, without any further judgment. This method is mainly used for cutting square Chinese characters.
Specifically, statistical cutting sums the pixels of the single-line text picture along the vertical direction; any column whose sum is 0 is a cut column. This method can be used for cutting English letters and digits.
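The statistical cutting just described can be sketched as a vertical projection over a binarized single-line image, where columns that sum to zero mark the cut points (a simplified illustration that ignores touching characters):

```python
import numpy as np

def cut_by_projection(line_img: np.ndarray) -> list:
    """line_img: 2-D binary array with text pixels == 1. Returns character crops."""
    col_sums = line_img.sum(axis=0)              # vertical projection profile
    chars, start = [], None
    for x, s in enumerate(col_sums):
        if s > 0 and start is None:
            start = x                            # entering a character run
        elif s == 0 and start is not None:
            chars.append(line_img[:, start:x])   # zero column -> cut here
            start = None
    if start is not None:
        chars.append(line_img[:, start:])       # flush the last character
    return chars
```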
And S30, concurrently calling an identification submodel corresponding to the target sub-picture, and performing information identification on the target sub-picture to obtain target information.
In this embodiment, the corresponding recognition sub-model is matched according to the text information of each target recognition element (such as its language attribute and whether it is printed or handwritten); after matching, the recognition sub-models are invoked concurrently to recognize the target sub-pictures, and the text of each target sub-picture is read and parsed to obtain the target information. Invoking the recognition sub-models concurrently improves recognition efficiency, and because each sub-model operates independently on its own target sub-picture, mutual interference is avoided.
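A minimal sketch of this concurrent dispatch, assuming each recognition sub-model exposes a hypothetical `recognize(image)` method and each sub-picture is tagged with its text type (the names are illustrative, not from the patent):

```python
from concurrent.futures import ThreadPoolExecutor

def recognize_all(sub_pictures, submodels):
    """sub_pictures: list of (text_type, image) pairs;
    submodels: dict mapping text_type -> recognition sub-model."""
    def run(item):
        text_type, image = item
        # each sub-model works only on its own sub-picture
        return text_type, submodels[text_type].recognize(image)
    with ThreadPoolExecutor(max_workers=max(1, len(sub_pictures))) as pool:
        return list(pool.map(run, sub_pictures))

# e.g. submodels = {"zh_print": hanzi_model, "en_hand": latin_model, ...}
```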
This embodiment provides a multi-language bill identification method: at least one picture to be predicted is acquired, text recognition is performed on it based on a preset recognition model, and at least one target recognition element is located; the picture to be predicted is segmented based on the target recognition elements to obtain at least one target sub-picture; and the recognition sub-model corresponding to each target sub-picture is invoked concurrently to perform information recognition and obtain the target information. In this way, the preset recognition model performs text recognition on the picture to be predicted and distinguishes its text information, so that the target recognition elements are located; the picture is segmented according to the target recognition elements, separating the target information to be recognized into a plurality of target sub-pictures; each target sub-picture corresponds to one recognition sub-model, and the recognition sub-models are invoked concurrently to recognize the information of each target recognition element, which improves recognition efficiency; and recognizing the different types of text information in the bill separately avoids recognition interference between different types of text, thereby improving recognition precision and solving the technical problem of low recognition precision in existing bill recognition models.
Referring to fig. 3, fig. 3 is a flowchart illustrating a multilingual ticket-recognition method according to a second embodiment of the present invention.
Based on the foregoing embodiment shown in fig. 2, in this embodiment, before the step S10, the method further includes:
step S01, acquiring a bill training set, and acquiring a marked bill based on the preset recognition model and the bill training set;
s02, identifying the marked bill based on a text identification model, and positioning identification elements;
and S03, extracting element sub-pictures based on the identification elements, and constructing the identification sub-models based on the element sub-pictures.
In this embodiment, the preset recognition model is first trained on the acquired bill training set, realizing semi-automatic manual labeling and producing labeled bills; a text recognition model then performs a first pass of text recognition on the labeled bills; the recognition elements of the recognition task are located by keywords; and based on the coordinates of all located elements, the bill is segmented into element sub-pictures, ignoring the recognition interference of unrelated characters and modules. A recognition sub-model based on the CRNN + CTC structure is built for each kind of bill element sub-picture, which greatly improves the accuracy of a single model on its specific element; raising the recognition accuracy of the sub-models raises the final recognition result at the same time.
Specifically, CRNN (convolutional recurrent neural network) is a combination of CNN and RNN: it has the strong feature extraction capability of a CNN and, like an RNN, can produce a sequence of serialized labels. The whole CRNN is divided into three parts: a convolutional layer, a recurrent layer, and a transcription layer.
The convolutional layer is a CNN that extracts features (example input 32 × 256 × 1); the recurrent layer uses deep bidirectional RNNs to predict the label (ground-truth) distribution (64 × 512) of the feature sequence obtained from the convolutional layer; and the transcription layer is trained with CTC.
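A condensed PyTorch sketch of such a CRNN is shown below, with a 32-pixel-high grayscale input as in the description; the framework choice and the exact layer sizes are assumptions for illustration, not the patent's configuration:

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        self.cnn = nn.Sequential(                     # convolutional layer
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),             # keep width resolution
            nn.Conv2d(256, 512, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((4, 1), (4, 1)),             # height collapses to 1
        )
        self.rnn = nn.LSTM(512, hidden, num_layers=2,  # recurrent layer
                           bidirectional=True, batch_first=False)
        self.fc = nn.Linear(2 * hidden, num_classes)   # classes incl. CTC blank

    def forward(self, x):                              # x: (B, 1, 32, W)
        f = self.cnn(x)                                # (B, 512, 1, W/4)
        f = f.squeeze(2).permute(2, 0, 1)              # (T, B, 512)
        out, _ = self.rnn(f)
        return self.fc(out)                            # (T, B, num_classes)
```

Training would pair the (T, B, num_classes) output (as log-probabilities) with nn.CTCLoss, matching the CTC-trained transcription layer described above.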
Specifically, the CRNN algorithm takes an input image normalized to a fixed height, extracts a feature map with the CNN, and splits the feature map column by column. During training, the CTC loss function guides an approximate soft alignment between character positions and class labels.
Specifically, what CTC training maximizes is the sum of the probabilities of all paths that map to the label (after de-duplication and removal of blanks); since CTC assumes the outputs of different time slices are independent of each other, the posterior probability of a path is the product of the per-time-slice probabilities. When decoding by outputting the highest-probability path for a given input, the path that maps to the correct result is therefore more likely to be found; and because the many-to-one mapping is taken into account during the search, the chance of decoding the correct result is further increased.
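The de-duplicate-and-drop-blanks mapping can be illustrated with a greedy best-path decoder (a sketch; a beam search would exploit the many-to-one mapping mentioned above more fully):

```python
import torch

def ctc_greedy_decode(log_probs: torch.Tensor, blank: int = 0) -> list:
    """log_probs: (T, num_classes) for one sample. Returns label indices."""
    path = log_probs.argmax(dim=-1).tolist()   # highest-probability path
    decoded, prev = [], None
    for label in path:
        if label != prev and label != blank:   # de-duplicate, then drop blanks
            decoded.append(label)
        prev = label
    return decoded

# e.g. the path [a, a, blank, a, b] maps to "aab":
# repeats collapse only within a run, so the blank separates the two a's.
```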
Further, before step S02, the method further includes:
determining the character type and the language type of the marking bill based on the marking information of the marking bill;
and constructing a multi-language text detection model as the text recognition model based on the character type and the language type of the labeling bill.
In this embodiment, to adapt to multi-language bill recognition, the character types and language types in the bill are recognized, distinguished and classified according to the labeling information of the labeled bills. For example, handwriting and print in the bill are distinguished, and different languages such as Chinese characters, English letters and digits are distinguished; the different types of text information are classified and summarized, and a corresponding recognition lexicon is established for each type of text.
Specifically, based on the recognition lexicons of the different types, this embodiment can either build a single text detection model covering multiple languages or build several text detection models, one per language, so that the text recognition model can recognize different types of text information and support multi-language text detection at the same time, improving its applicability.
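A minimal sketch of how such per-type recognition lexicons might be organized; the category keys and file paths are purely illustrative assumptions, not the patent's layout:

```python
# each (language, style) class gets its own lexicon, later bound to a sub-model
RECOGNITION_LEXICONS = {
    ("zh", "print"):    "lexicons/zh_print.txt",
    ("zh", "hand"):     "lexicons/zh_hand.txt",
    ("en", "print"):    "lexicons/en_print.txt",
    ("digit", "print"): "lexicons/digit.txt",
}

def lexicon_for(language: str, style: str) -> str:
    """Look up the lexicon file for a detected text type."""
    return RECOGNITION_LEXICONS[(language, style)]
```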
Further, step S02 specifically includes:
identifying different types of text information of the marked bill based on the text identification model;
extracting keywords corresponding to the different types of text information based on the different types of text information;
and positioning the identification elements corresponding to the keywords based on the keywords and a preset threshold interval.
In this embodiment, a single labeled bill contains a plurality of recognition elements; for each recognition element, its type and position can be determined from the extracted keywords.
Specifically, within the text information, keywords are used to locate the recognition elements of the recognition task. To make this location accurate, other keywords lying within a certain threshold distance above, below, to the left and to the right of the key element are used to confirm and lock the current recognition element, improving the accuracy of element positioning. The keywords may be "date", "amount", "signature", "seal", and so on.
Specifically, the extraction of the keywords may adopt an unsupervised keyword extraction method, or may adopt a supervised keyword extraction method.
Specifically, the unsupervised keyword extraction methods fall into three families: extraction based on statistical features (TF, TF-IDF); extraction based on word-graph models (PageRank, TextRank); and extraction based on topic models (LDA). Statistical-feature extraction uses the statistics of the words in a document to pick out its keywords. Word-graph extraction first builds a language network graph of the document and then analyses it, looking for the words or phrases that play an important role in the graph; those phrases are the document's keywords. Topic-model extraction mainly exploits the properties of the topic distribution in a topic model to extract keywords.
In a specific embodiment, unsupervised keyword extraction treats the more important words of a text as its keywords: candidate words are extracted, each candidate is scored, and the top-K highest-scoring candidates are output as the keywords. Different scoring strategies give different algorithms, such as TF-IDF, TextRank and LDA. Taking TextRank as an example: TextRank is a graph-based ranking algorithm that splits a text into component units (words, sentences), builds a graph model, and ranks the important components of the text with a voting mechanism, so keyword extraction needs only the information of the bill itself. When using TextRank to extract keywords, the graph must be constructed first; the edges can follow the n-gram idea that a word is related only to the n words near it, i.e., an undirected edge connects a word's node to the nodes of its n neighbouring words.
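A compact sketch of TextRank as just described (window-based undirected edges plus iterative voting); the whitespace tokenization, window size and damping factor 0.85 are standard assumptions rather than values from the patent:

```python
from collections import defaultdict

def textrank_keywords(words, window=3, top_k=5, iters=20, d=0.85):
    """words: tokenized text. Returns the top_k highest-ranked words."""
    # build the undirected co-occurrence graph over an n-gram window
    graph = defaultdict(set)
    for i in range(len(words)):
        for j in range(i + 1, min(i + window, len(words))):
            if words[i] != words[j]:
                graph[words[i]].add(words[j])
                graph[words[j]].add(words[i])
    score = {w: 1.0 for w in graph}
    for _ in range(iters):                     # voting / power iteration
        score = {w: (1 - d) + d * sum(score[n] / len(graph[n])
                                      for n in graph[w])
                 for w in graph}
    return sorted(score, key=score.get, reverse=True)[:top_k]
```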
Specifically, the supervised keyword extraction method treats keyword extraction as a binary classification problem: candidate words are extracted, each candidate is given a label indicating whether or not it is a keyword, and a keyword extraction classifier is trained. When a new bill is obtained, all its candidate words are extracted and classified by the trained classifier, and the candidates labeled as keywords are kept.
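The supervised variant can be sketched as a tiny scikit-learn binary classifier over candidate words; the features, candidates and labels below are placeholders, not training data from the patent:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# toy training data: 1 = keyword, 0 = not a keyword (purely illustrative)
candidates = ["date", "amount", "signature", "seal", "remark", "page"]
labels = [1, 1, 1, 1, 0, 0]

vec = CountVectorizer(analyzer="char", ngram_range=(1, 2))
X = vec.fit_transform(candidates)                 # character n-gram features
clf = LogisticRegression(max_iter=1000).fit(X, labels)

def extract_keywords(new_candidates):
    """Classify each candidate of a new bill; keep those labeled keyword."""
    preds = clf.predict(vec.transform(new_candidates))
    return [w for w, p in zip(new_candidates, preds) if p == 1]
```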
Further, step S03 specifically includes:
positioning the position coordinates of the current keywords in the marked bill as the position coordinates of the current identification elements corresponding to the current keywords;
determining the position coordinates of the adjacent identification elements of the current identification element based on the preset threshold distance and the position coordinates of the current identification element;
and based on the position coordinates, segmenting the labeling bill and extracting the element sub-picture.
In this embodiment, locating the current keyword mainly uses proximity search and text cutting. Proximity search circles out a single line of text, and text cutting cuts that single line into single characters.
Specifically, a connected-region search is performed on the feature text boxes extracted by text recognition, and each connected region found is treated as one character. The proximity search algorithm then merges the regions that can form characters into single-line text regions; this avoids splitting characters whose strokes are not connected, such as the Chinese characters 八 ("eight") and 元 ("yuan"). The purpose of the proximity search is to expand regions so that regions likely to belong together are "glued" together. If expansion were performed without searching, it would proceed in all directions at once and might bond the lines above and below. Therefore a region must be allowed to expand in only a single direction.
In a specific embodiment, the expansion direction (up, down, left or right) is determined by searching for the nearest neighbouring region: starting from a connected region, its horizontal circumscribed rectangle is found and the region is expanded to fill that rectangle. When the distance between the region and its nearest neighbour falls below a certain range, the rectangle is allowed to expand, and the direction of expansion is the direction of that nearest neighbour.
It can be understood that, since the nearest neighbour may lie above, below, to the left or to the right, the proximity search could bind together characters within the preset threshold distance of the current keyword in any of those directions; the expansion direction therefore needs to be constrained, for example by performing the expansion operation if and only if the direction is horizontal. The region is thus restricted to lateral expansion only.
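This horizontal-only expansion can be sketched with a dilation kernel that is wide but only one pixel tall, so regions glue laterally without bonding the lines above and below (the 20-pixel gap is an assumed threshold, not a value from the patent):

```python
import cv2
import numpy as np

def merge_line_regions(binary: np.ndarray, max_gap: int = 20):
    """binary: uint8 image with text pixels == 255.
    Returns (x, y, w, h) bounding boxes of the merged line regions."""
    kernel = np.ones((1, max_gap), np.uint8)      # lateral expansion only
    glued = cv2.dilate(binary, kernel, iterations=1)
    n, _, stats, _ = cv2.connectedComponentsWithStats(glued, connectivity=8)
    # stats rows hold (x, y, w, h, area); row 0 is the background
    return [tuple(stats[i][:4]) for i in range(1, n)]
```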
Specifically, the position coordinates of the current keyword in the picture to be predicted are determined, the text region of the current recognition element is determined by proximity search, and the position coordinates of that text region can then be computed from the preset threshold distance and the coordinates of the current keyword. Cutting out these text regions yields the element sub-pictures.
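A sketch of deriving an element sub-picture from the keyword coordinates, assuming the value region lies to the right of the keyword within the preset threshold distance (the padding and distance values are illustrative assumptions):

```python
import numpy as np

def crop_element(img: np.ndarray, keyword_box, max_dist=300, pad=5):
    """keyword_box: (x1, y1, x2, y2) of the located keyword."""
    x1, y1, x2, y2 = keyword_box
    ex1 = min(x2 + pad, img.shape[1])         # start just after the keyword
    ex2 = min(x2 + max_dist, img.shape[1])    # bounded by the threshold distance
    ey1 = max(y1 - pad, 0)
    ey2 = min(y2 + pad, img.shape[0])
    return img[ey1:ey2, ex1:ex2]              # the element sub-picture
```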
Further, based on the embodiment shown in fig. 2, in this embodiment, the step S30 specifically includes:
determining the recognition submodel corresponding to the target recognition element based on the target recognition element;
and based on the recognition word bank corresponding to the recognition sub-model, concurrently calling the recognition sub-model, and performing information recognition on the target sub-picture to obtain the target information.
In this embodiment, the text-information type of each target recognition element is determined from its text information, and the corresponding recognition sub-model is then invoked for each target recognition element. For example, for printed Chinese characters the Chinese-character recognition sub-model is invoked, and the printed characters are looked up in the Chinese-character recognition lexicon attached to that sub-model, thereby parsing the text information of the printed Chinese characters.
Specifically, the recognition sub-models matched to the target recognition elements are all invoked simultaneously, and all target recognition elements are recognized in parallel, so that the corresponding target information is read out. Invoking the recognition sub-models concurrently improves recognition efficiency, avoids recognition interference between different elements, and improves recognition precision.
In addition, the embodiment of the invention also provides a multi-language bill recognition device.
Referring to fig. 4, fig. 4 is a functional block diagram of a multi-language bill recognition apparatus according to a first embodiment of the present invention.
In this embodiment, the multi-language bill recognition apparatus includes:
the identification element positioning module 10 is used for acquiring at least one picture to be predicted, performing text identification on the picture to be predicted based on a preset identification model, and positioning at least one target identification element;
a sub-picture extracting module 20, configured to segment the picture to be predicted based on the target identification element, so as to obtain at least one target sub-picture;
and the information identification module 30 is configured to concurrently call an identification submodel corresponding to the target sub-picture, perform information identification on the target sub-picture, and obtain target information.
Further, the multi-language bill recognition apparatus includes a recognition model training module, and the recognition model training module specifically includes:
the marked bill obtaining unit is used for obtaining a bill training set and obtaining a marked bill based on the preset recognition model and the bill training set;
the identification element positioning unit is used for identifying the marked bill based on the text identification model and positioning the identification element;
and the identifier sub-model building unit is used for extracting element sub-pictures based on the identified elements and building the identifier sub-model based on the element sub-pictures.
Further, the multi-language bill recognition device further includes a text recognition model construction module, and the text recognition model construction module specifically includes:
the text information determining unit is used for determining the character type and the language type of the marking bill based on the marking information of the marking bill;
and the text recognition model construction unit is used for constructing a multi-language text detection model as the text recognition model based on the character type and the language type of the marked bill.
Further, the identification element positioning unit specifically includes:
the text information identification subunit is used for identifying different types of text information of the marked bill based on the text identification model;
the keyword extraction subunit is used for extracting keywords corresponding to the different types of text information based on the different types of text information;
and the identification element positioning subunit is used for positioning the identification element corresponding to the keyword based on the keyword and a preset threshold interval.
Further, the identification submodel building unit specifically includes:
the element coordinate positioning subunit is used for positioning the position coordinate of the current keyword in the marked bill as the position coordinate of the current identification element corresponding to the current keyword;
an adjacent element coordinate determination subunit, configured to determine, based on the preset threshold distance and the position coordinate of the current identified element, a position coordinate of an adjacent identified element of the current identified element;
and the element sub-picture extracting sub-unit is used for segmenting the labeling bill based on the position coordinates and extracting the element sub-picture.
Further, the multi-language bill recognition device further comprises a word bank recognition docking module, wherein the word bank recognition docking module specifically comprises:
the service interface establishing unit is used for establishing an element identification service interface based on the identification submodel;
the recognition word stock determining unit is used for determining a recognition word stock corresponding to the element sub-picture based on the text information in the element sub-picture;
and the recognition word stock connecting unit is used for establishing connection between the element recognition service interface and the recognition word stock based on the recognition submodel.
Further, the information identification module 30 specifically includes:
an identification submodel determination unit configured to determine, based on the target identification element, the identification submodel corresponding to the target identification element;
and the target information obtaining unit is used for concurrently calling the recognition sub-model based on the recognition word bank corresponding to the recognition sub-model, performing information recognition on the target sub-picture and obtaining the target information.
Each module in the multi-language bill recognition device corresponds to each step in the multi-language bill recognition method embodiment, and the functions and implementation processes thereof are not described in detail herein.
In addition, the embodiment of the invention also provides a computer readable storage medium.
The present computer readable storage medium has stored thereon a multi-language bill identification program, wherein the multi-language bill identification program, when executed by a processor, implements the steps of the multi-language bill identification method as described above.
The method implemented by the multi-language bill recognition program when executed can refer to the embodiments of the multi-language bill recognition method of the present invention, and will not be described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element introduced by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A multi-language bill recognition method, comprising the steps of:
acquiring at least one picture to be predicted, performing text recognition on the picture to be predicted based on a preset recognition model, and positioning at least one target recognition element;
based on the target identification element, dividing the picture to be predicted to obtain at least one target sub-picture;
and concurrently calling an identification submodel corresponding to the target sub-picture, and carrying out information identification on the target sub-picture to obtain target information.
2. The method for multi-language bill recognition according to claim 1, wherein before performing text recognition on the picture to be predicted based on a preset recognition model and locating at least one target recognition element, the method further comprises:
acquiring a bill training set, and acquiring a marked bill based on the preset recognition model and the bill training set;
identifying the marked bill based on a text identification model, and positioning identification elements;
and extracting element sub-pictures based on the identification elements, and constructing the identification sub-models based on the element sub-pictures.
3. The multi-lingual ticket identification method of claim 2, wherein identifying the annotated ticket based on the text recognition model further comprises, prior to locating the identification element:
determining the character type and the language type of the annotation bill based on the annotation information of the annotation bill;
and constructing a multi-language text detection model as the text recognition model based on the character type and the language type of the labeling bill.
4. The multi-lingual ticket identification method of claim 3, wherein said identifying the labeled ticket based on the text recognition model, locating an identification element, comprises:
identifying different types of text information of the marked bill based on the text identification model;
extracting keywords corresponding to the different types of text information based on the different types of text information;
and positioning the identification elements corresponding to the keywords based on the keywords and a preset threshold interval.
5. The multi-lingual ticket identification method of claim 2, wherein said extracting an element sub-picture based on said identification element comprises:
positioning the position coordinates of the current keywords in the marked bill as the position coordinates of the current identification elements corresponding to the current keywords;
determining the position coordinates of the adjacent identification elements of the current identification element based on the preset threshold interval and the position coordinates of the current identification element;
and based on the position coordinates, segmenting the labeling bill and extracting the element sub-picture.
6. The multi-lingual bill identification method of claim 2, wherein after the element sub-picture is constructed into the identification sub-model, the method further comprises:
establishing an element identification service interface based on the identification submodel;
determining an identification word bank corresponding to the element sub-picture based on the text information in the element sub-picture;
and establishing connection between the element recognition service interface and the recognition word stock based on the recognition submodel.
7. The multi-lingual bill identification method according to any one of claims 1 to 6, wherein said concurrently invoking an identification submodel corresponding to each identification element to perform information identification on the target sub-picture to obtain target information comprises:
determining the recognition submodel corresponding to the target recognition element based on the target recognition element;
and based on the recognition word bank corresponding to the recognition sub-model, concurrently calling the recognition sub-model, and performing information recognition on the target sub-picture to obtain the target information.
8. A multi-language bill recognition apparatus, comprising:
the identification element positioning module is used for acquiring at least one picture to be predicted, performing text identification on the picture to be predicted based on a preset identification model, and positioning at least one target identification element;
the sub-picture extraction module is used for segmenting the picture to be predicted based on the target identification element to obtain at least one target sub-picture;
and the information identification module is used for concurrently calling an identification submodel corresponding to the target sub-picture, and performing information identification on the target sub-picture to obtain target information.
9. A multi-language bill recognition apparatus comprising a processor, a memory, and a multi-language bill recognition program stored on the memory and executable by the processor, wherein the multi-language bill recognition program when executed by the processor implements the steps of the multi-language bill recognition method of any one of claims 1 to 7.
10. A computer-readable storage medium, having a multi-language ticket recognition program stored thereon, wherein the multi-language ticket recognition program, when executed by a processor, performs the steps of the multi-language ticket recognition method of any of claims 1-7.
CN202210832078.4A 2022-07-15 2022-07-15 Multi-language bill identification method, device, equipment and storage medium Pending CN115147846A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210832078.4A CN115147846A (en) 2022-07-15 2022-07-15 Multi-language bill identification method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210832078.4A CN115147846A (en) 2022-07-15 2022-07-15 Multi-language bill identification method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115147846A

Family

ID=83411778

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210832078.4A Pending CN115147846A (en) 2022-07-15 2022-07-15 Multi-language bill identification method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115147846A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401221A (en) * 2020-03-12 2020-07-10 重庆农村商业银行股份有限公司 Card ticket identification method, device, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination