CN111582273B - Image text recognition method and device

Image text recognition method and device

Info

Publication number
CN111582273B
CN111582273B
Authority
CN
China
Prior art keywords
text
image
target
handwriting
model
Prior art date
Legal status
Active
Application number
CN202010385297.3A
Other languages
Chinese (zh)
Other versions
CN111582273A (en)
Inventor
邓小远
姜璐
钟华
高天宁
Current Assignee
Industrial and Commercial Bank of China Ltd (ICBC)
Original Assignee
Industrial and Commercial Bank of China Ltd (ICBC)
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202010385297.3A
Publication of CN111582273A
Application granted
Publication of CN111582273B
Status: Active

Classifications

    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G06F 18/24 Classification techniques
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06V 10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 30/10 Character recognition
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides an image text recognition method and device. The method includes: receiving a target text image and obtaining the target cutting fragments produced by cutting the target text image; determining the image text type corresponding to the target text image by applying a preset image classification model to the target cutting fragments; inputting the target cutting fragments into a preset positioning model corresponding to that image text type, and taking the output of the positioning model as the text region image to be recognized of the target text image; and inputting the text region image to be recognized into a preset optical character text recognition model corresponding to the image text type, taking the output of that model as the optical character recognition text of the target text image.

Description

Image text recognition method and device
Technical Field
The present application relates to the field of text recognition technologies, and in particular, to a method and an apparatus for recognizing image text.
Background
Currently, methods for improving the recognition rate of optical character recognition (OCR) text fall mainly into two categories: improving at least one of the image classification model, the text positioning model and the text recognition model with deep learning algorithms, and improving image quality with image processing techniques. These two approaches have the following drawbacks:
(1) Algorithm development is costly and slow. Moreover, the models in an OCR text recognition system are generally already trained with the best-performing, most scene-appropriate deep learning algorithms available, and the optimal parameters of those algorithms have already been determined, so trying to raise the OCR text recognition rate through further algorithm optimization is impractical in the short term.
(2) Improving image quality through preprocessing works for images with a single layout, relatively few parameter dimensions and fairly distinct definition, and is especially suited to color recognition, but it performs poorly when text images come in many layouts or contain complex, varied characters or handwriting.
Disclosure of Invention
To address these problems in the prior art, the application provides an image text recognition method and apparatus that improve the accuracy and efficiency of recognizing optical character recognition text from a text image and broaden the range of application scenarios the method can serve.
To that end, the application provides the following technical solutions:
In a first aspect, the present application provides an image text recognition method, including:
receiving a target text image and acquiring target cutting fragments obtained after cutting the target text image;
determining an image text type corresponding to the target text image by applying a preset image classification model and the target cutting fragments;
inputting the target cutting fragments into a preset positioning model corresponding to the image text type, and taking the output result of the positioning model as the text region image to be recognized of the target text image;
and inputting the text region image to be recognized into a preset optical character text recognition model corresponding to the image text type, and taking the output result of the optical character text recognition model as the optical character recognition text of the target text image.
Further, the preset positioning model includes a handwriting image positioning model and a print image positioning model. Correspondingly, inputting the target cutting fragments into the preset positioning model corresponding to the image text type includes: if the image text type is the handwriting image text type, inputting the target cutting fragments into the preset handwriting image positioning model; and if the image text type is the print image text type, inputting the target cutting fragments into the preset print image positioning model.
Further, the preset optical character text recognition model includes a handwriting optical character text recognition model and a print optical character text recognition model. Correspondingly, inputting the text region image to be recognized into the preset optical character text recognition model corresponding to the image text type includes: if the image text type is the handwriting image text type, inputting the text region image to be recognized into the preset handwriting optical character text recognition model; and if the image text type is the print image text type, inputting the text region image to be recognized into the preset print optical character text recognition model.
Further, obtaining the target cutting fragments obtained after the target text image is cut includes: outputting the target text image and receiving the target cutting fragments obtained after the target text image is manually cut.
Further, obtaining the target cutting fragments obtained after the target text image is cut includes: inputting the target text image into a preset primary domain positioning model and taking the output result of the preset primary domain positioning model as the target cutting fragments.
Further, the image classification model is a deep learning model based on a ResNet18 algorithm.
Further, the handwriting optical character text recognition model and the print optical character text recognition model are both convolutional recurrent neural network models.
Further, before the text region image to be recognized is input into the preset optical character text recognition model corresponding to the image text type, the method further includes: acquiring a sample set of historical handwriting text region images to be recognized, together with the target actual text information corresponding to each sample in the set; and training the handwriting optical character text recognition model with the sample set and the target actual text information. The samples in the set are historical handwriting text region images to be recognized, obtained by inputting labeled historical handwriting cutting fragment samples into the corresponding preset positioning model.
Further, before the text region image to be recognized is input into the preset optical character text recognition model corresponding to the image text type, the method further includes: acquiring a sample set of historical print text region images to be recognized, together with the target actual text information corresponding to each sample in the set; and training the print optical character text recognition model with the sample set and the target actual text information. The samples in the set are historical print text region images to be recognized, obtained by inputting labeled historical print cutting fragment samples into the corresponding preset positioning model.
In a second aspect, the present application provides an image text recognition apparatus comprising:
the target cutting fragment obtaining module is used for receiving the target text image and obtaining target cutting fragments obtained after the target text image is cut;
the image text type determining module is used for determining the image text type corresponding to the target text image by applying a preset image classification model and the target cutting fragments;
the to-be-recognized text region image obtaining module, used for inputting the target cutting fragments into a preset positioning model corresponding to the image text type and taking the output result of the positioning model as the text region image to be recognized of the target text image;
and the image text recognition module is used for inputting the text region image to be recognized into a preset optical character text recognition model corresponding to the image text type, and taking the output result of the optical character text recognition model as the optical character recognition text of the target text image.
Further, the preset positioning model includes a handwriting image positioning model and a print image positioning model. Correspondingly, the to-be-recognized text region image obtaining module includes: a handwriting image positioning unit, used for inputting the target cutting fragments into a preset handwriting image positioning model if the image text type is the handwriting image text type; and a print image positioning unit, used for inputting the target cutting fragments into a preset print image positioning model if the image text type is the print image text type.
Further, the preset optical character text recognition model includes a handwriting optical character text recognition model and a print optical character text recognition model. Correspondingly, the image text recognition module includes: an image handwriting text recognition unit, used for inputting the text region image to be recognized into a preset handwriting optical character text recognition model if the image text type is the handwriting image text type; and an image print text recognition unit, used for inputting the text region image to be recognized into a preset print optical character text recognition model if the image text type is the print image text type.
In a third aspect, the present application provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the image text recognition method when executing the program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon computer instructions which, when executed, implement the image text recognition method.
According to the technical solutions above, the application provides an image text recognition method and device. The method includes: receiving a target text image and obtaining the target cutting fragments produced by cutting it; determining the image text type of the target text image by applying a preset image classification model to the target cutting fragments; inputting the target cutting fragments into a preset positioning model corresponding to that image text type and taking its output as the text region image to be recognized; and inputting that image into a preset optical character text recognition model corresponding to the image text type and taking its output as the optical character recognition text of the target text image. This improves the accuracy and efficiency of recognizing optical character recognition text from text images and broadens the application scenarios the method can serve. Specifically, it raises the OCR recognition rate for varied characters and complex handwriting while preserving generality and applicability in complex scenes, and keeps the image text recognition process efficient and reliable. It further simplifies the image cutting process and improves the efficiency and accuracy of text image cutting; processing cutting fragments with large text feature differences independently improves the ability to cope with complex scenes; the diversity of training sample data sources improves model accuracy and efficiency; and the cost of image text recognition is reduced.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings required in the embodiments are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the application, and other drawings may be obtained from them without inventive effort by a person skilled in the art.
FIG. 1 is a flow chart of an image text recognition method according to an embodiment of the application;
FIG. 2 is a flowchart of steps S301 and S302 of the image text recognition method in an embodiment of the present application;
FIG. 3 is a flowchart of steps S401 and S402 of the image text recognition method in an embodiment of the present application;
FIG. 4 is a flowchart of an image text recognition method including step S112 according to an embodiment of the present application;
FIG. 5 is a flowchart of steps S411 and S412 of the image text recognition method in an embodiment of the present application;
FIG. 6 is a flowchart of steps S421 and S422 of the image text recognition method in an embodiment of the present application;
FIG. 7 is a schematic diagram of an image text recognition device according to an embodiment of the present application;
FIG. 8 is a schematic flow chart of a system preparation phase in an embodiment of the application;
FIG. 9A is a schematic diagram of part of the information of an overseas handwriting voucher in an embodiment of the application;
FIG. 9B is a schematic diagram of part of the information of an overseas print voucher in an embodiment of the application;
FIG. 10A is a schematic diagram of the target cutting fragment corresponding to the overseas handwriting voucher in an embodiment of the application;
FIG. 10B is a schematic diagram of the target cutting fragment corresponding to the overseas print voucher in an embodiment of the application;
FIG. 11A is a schematic diagram of part of the information of another overseas handwriting voucher in a specific application example of the application;
FIG. 11B is a schematic diagram of part of the information of another overseas print voucher in a specific application example of the application;
FIG. 12A is a schematic diagram of the target cutting fragment corresponding to another overseas handwriting voucher in a specific application example of the application;
FIG. 12B is a schematic diagram of the target cutting fragment corresponding to another overseas print voucher in a specific application example of the application;
FIG. 13A is a schematic diagram of the text region image to be recognized corresponding to another overseas handwriting voucher in a specific application example of the application;
FIG. 13B is a schematic diagram of the text region image to be recognized corresponding to another overseas print voucher in a specific application example of the application;
FIG. 14 is a schematic flow chart of the system setup stage in a specific application example of the application;
FIG. 15 is a schematic block diagram of the system configuration of an electronic device 9600 according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions in this specification better understood by those skilled in the art, the technical solutions in the embodiments of the application are described below clearly and completely with reference to the drawings. It is obvious that the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art from these embodiments without inventive effort fall within the scope of the application.
To solve the problems in existing OCR text recognition, the application provides an image text recognition method and device. The method is a general approach to improving the OCR text recognition rate: first, the scene image is cut by layout; the resulting target cutting fragments are fed into an image classification model, which isolates cutting fragments with large feature differences from one another; the isolated cutting fragments are then fed into the corresponding positioning model to obtain the text region image to be recognized; finally, the fragments to be recognized are appropriately padded and resized and sent to the corresponding optical character text recognition model for text recognition.
Based on this, in order to improve the accuracy and efficiency of recognizing optical character recognition text from a text image and broaden the application scenarios of image text recognition, an embodiment of the present application provides an image text recognition apparatus, which may be a server or a client device. The client device may include a smartphone, a tablet, a network set-top box, a portable computer, a desktop computer, a personal digital assistant (PDA), a vehicle-mounted device, a smart wearable device, and the like. Smart wearable devices may include smart glasses, smart watches, smart bracelets, and so on.
In practical applications, part of the image text recognition may be performed on the server side as described above, or all operations may be completed in the client device, depending on the processing capability of the client device and restrictions of the user's usage scenario. The application is not limited in this regard. If all operations are completed in the client device, the client device may further include a processor.
The client device may have a communication module (i.e. a communication unit) and may be connected to a remote server in a communication manner, so as to implement data transmission with the server. The server may include a server on the side of the task scheduling center, and in other implementations may include a server of an intermediate platform, such as a server of a third party server platform having a communication link with the task scheduling center server. The server may include a single computer device, a server cluster formed by a plurality of servers, or a server structure of a distributed device.
Any suitable network protocol may be used for communication between the server and the client device, including protocols not yet developed at the filing date of the application. The network protocol may include, for example, TCP/IP, UDP/IP, HTTP and HTTPS. Of course, it may also include, for example, RPC (Remote Procedure Call) and REST (Representational State Transfer) protocols used on top of the above protocols.
The following examples are presented in detail.
As shown in FIG. 1, in order to improve the accuracy and efficiency of recognizing optical character recognition text from a text image and broaden the application scenarios of image text recognition, this embodiment provides an image text recognition method whose execution subject is an image text recognition apparatus, specifically including the following contents:
S100: receiving the target text image and acquiring the target cutting fragments obtained after cutting the target text image.
Specifically, the target cutting fragments are text images that are cut from the target text image and contain preset text distinguishing information. The text distinguishing information can be set according to the actual situation; for example, in an overseas voucher image scene, text such as "MOP", "HK" and "or ticket holder", which is clearly distinctive and absent from other areas of the image, appears near the amount, so if the text to be recognized in the target text image is the amount, the text distinguishing information may be at least one of "MOP", "HK" and "or ticket holder".
S200: determining the image text type corresponding to the target text image by applying a preset image classification model and the target cutting fragments.
Specifically, the image text type includes a print text type and a handwriting text type. The image classification model may be a deep learning model based on the ResNet18 algorithm. The image text type corresponding to the target text image may be the image text type of the text region to be recognized within the target text image, or the image text type of the target text image as a whole.
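The patent does not publish model code; the following minimal PyTorch sketch shows one plausible shape for such a ResNet18-based classifier with a two-class head (handwriting vs. print). The preprocessing, input size and all names are illustrative assumptions, not the patent's implementation:

```python
# Hypothetical sketch of the ResNet18-based image classification model
# described above (handwriting vs. print); not the patent's actual code.
import torch
import torch.nn as nn
from torchvision import models, transforms

def build_fragment_classifier() -> nn.Module:
    """ResNet18 backbone with a 2-class head: 0 = print, 1 = handwriting."""
    model = models.resnet18(weights=None)          # train from scratch or load weights
    model.fc = nn.Linear(model.fc.in_features, 2)  # replace the 1000-class ImageNet head
    return model

# Preprocessing assumed for cut fragments; the patent only says the image
# size is set as a model parameter (the size chosen here is an assumption).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def classify_fragment(model: nn.Module, fragment_img) -> str:
    """fragment_img: a PIL image of one cutting fragment."""
    model.eval()
    with torch.no_grad():
        logits = model(preprocess(fragment_img).unsqueeze(0))
    return "handwriting" if logits.argmax(dim=1).item() == 1 else "print"
```

In this sketch the only change to the stock ResNet18 is replacing the 1000-class head with a two-class one, matching the binary print/handwriting decision described above.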
S300: and inputting the target cutting fragments into a preset positioning model corresponding to the image text type, and taking an output result of the positioning model as a text region image to be identified of the target text image.
It can be understood that the text region to be recognized is the region of the target text image where the text to be recognized is located. In addition, determining the corresponding preset positioning model according to the image text type separates cutting fragments with large text feature differences when the text region image to be recognized is acquired, which improves the accuracy of that acquisition; concretely, a large text feature difference means that print text features and handwriting text features differ greatly.
S400: and inputting the text region image to be recognized into a preset optical character text recognition model corresponding to the image text type, and taking the output result of the optical character text recognition model as the optical character recognition text of the target text image.
Specifically, determining the corresponding preset optical character text recognition model according to the image text type means that text region images with large text feature differences are recognized separately, which further improves the accuracy of text recognition.
In order to improve the accuracy and efficiency of obtaining the text region image to be recognized, and thereby the accuracy and efficiency of recognizing text from it, further broadening the application scenarios of the method, referring to FIG. 2, in one embodiment of the application the preset positioning model includes a handwriting image positioning model and a print image positioning model. Correspondingly, in step S300, inputting the target cutting fragments into the preset positioning model corresponding to the image text type includes:
S301: if the image text type is the handwriting image text type, inputting the target cutting fragments into a preset handwriting image positioning model.
S302: if the image text type is the print image text type, inputting the target cutting fragments into a preset print image positioning model.
Specifically, the handwriting image positioning model and the print image positioning model may both be Faster-RCNN models.
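A minimal sketch of such a positioning model follows, assuming torchvision's ResNet50-FPN Faster R-CNN variant (the backbone is an assumption; the patent only names Faster-RCNN) with background plus a single "text region" class. The handwriting and print positioning models would be two separately trained instances of the same shape:

```python
# Hypothetical sketch of a Faster-RCNN positioning model as described above;
# class count, score threshold and function names are illustrative assumptions.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

def build_positioning_model() -> torch.nn.Module:
    """Faster R-CNN with 2 classes: background + 'text region to be recognized'."""
    model = fasterrcnn_resnet50_fpn(weights=None)
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes=2)
    return model

def locate_text_region(model, fragment_tensor, score_thresh=0.5):
    """Return the highest-scoring box (x1, y1, x2, y2) on a cutting fragment;
    fragment_tensor is a CxHxW float tensor in [0, 1]."""
    model.eval()
    with torch.no_grad():
        pred = model([fragment_tensor])[0]   # boxes come sorted by score
    keep = pred["scores"] >= score_thresh
    boxes = pred["boxes"][keep]
    return boxes[0].tolist() if len(boxes) else None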
In order to recognize the image text according to its image text type, further improving the accuracy and efficiency of optical character recognition of the text image and broadening the application scenarios of the method, referring to FIG. 3, in one embodiment of the application the preset optical character text recognition model includes a handwriting optical character text recognition model and a print optical character text recognition model. Correspondingly, in step S400, inputting the text region image to be recognized into the preset optical character text recognition model corresponding to the image text type includes:
S401: if the image text type is the handwriting image text type, inputting the text region image to be recognized into a preset handwriting optical character text recognition model.
S402: if the image text type is the print image text type, inputting the text region image to be recognized into a preset print optical character text recognition model.
In particular, the handwriting optical character text recognition model and the print optical character text recognition model may each be a convolutional recurrent neural network (CRNN) model.
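A minimal CRNN sketch consistent with this description follows; the layer sizes, pooling scheme and character-set size are illustrative assumptions rather than the patent's architecture:

```python
# Minimal CRNN sketch for the recognition models described above; all
# dimensions are illustrative assumptions, not the patent's architecture.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes: int, img_height: int = 32):
        super().__init__()
        # Convolutional feature extractor: reduces height 8x, keeps enough
        # width resolution to serve as the time axis for the recurrent part.
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),           # halve height only
        )
        feat_h = img_height // 8
        self.rnn = nn.LSTM(256 * feat_h, 256, num_layers=2,
                           bidirectional=True, batch_first=False)
        self.fc = nn.Linear(512, num_classes)       # num_classes includes the CTC blank

    def forward(self, x):                           # x: (B, 1, H, W)
        f = self.cnn(x)                             # (B, C, H', W')
        b, c, h, w = f.shape
        f = f.permute(3, 0, 1, 2).reshape(w, b, c * h)  # (T, B, C*H'), time-major
        out, _ = self.rnn(f)
        return self.fc(out)                         # (T, B, num_classes) for CTC
```

The (T, B, num_classes) output is the layout expected by CTC-style training and decoding, which is the usual pairing for CRNN text recognizers.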
In order to flexibly determine how the target text image is cut, improve the generality of the text recognition process and simplify image cutting, thereby further broadening the application scenarios of the method, in one embodiment of the application step S100 includes:
S111: outputting the target text image and receiving the target cutting fragments obtained after the target text image is manually cut.
Specifically, in this embodiment the target text image corresponds to a text style type, for example a transfer certificate type; if the text regions to be recognized occupy the same positions in all text images of that style type, the same cutting layout can be used for the manual cutting.
Similarly, in order to flexibly determine how the target text image is cut while keeping the recognition process general and the image cutting simple, referring to FIG. 4, in one embodiment of the application step S100 includes:
S112: inputting the target text image into a preset primary domain positioning model and taking the output result of the preset primary domain positioning model as the target cutting fragments.
In particular, the primary domain positioning model may be a Faster-RCNN model.
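Building on the hypothetical Faster-RCNN sketch above, the primary domain positioning model could be applied to the full scene image to produce the target cutting fragments, for example:

```python
# Hypothetical use of a trained primary domain positioning model (same
# Faster-RCNN shape as sketched earlier) to obtain target cutting fragments
# directly from the full scene image; names and threshold are assumptions.
import torch

def cut_fragments_by_domain_positioning(model, image_tensor, score_thresh=0.5):
    """Crop every confidently detected region from the scene image (a CxHxW
    float tensor in [0, 1]) and return the crops as target cutting fragments."""
    model.eval()
    with torch.no_grad():
        pred = model([image_tensor])[0]
    fragments = []
    for box, score in zip(pred["boxes"], pred["scores"]):
        if score < score_thresh:
            continue
        x1, y1, x2, y2 = (int(v) for v in box.tolist())
        fragments.append(image_tensor[:, y1:y2, x1:x2])
    return fragments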
In order to obtain an efficient and reliable optical character text recognition model, and by applying it to further improve the accuracy and efficiency of recognizing optical character recognition text from a text image, referring to FIG. 5, in one embodiment of the application, before step S400 the method further includes:
S411: acquiring a sample set of historical handwriting text region images to be recognized, and the target actual text information corresponding to each sample in the set.
The target actual text information is the actual text information corresponding to the output result of the handwriting optical character text recognition model.
S412: training the handwriting optical character text recognition model with the sample set of historical handwriting text region images to be recognized and the target actual text information. The samples in the set are historical handwriting text region images to be recognized, obtained by inputting labeled historical handwriting cutting fragment samples into the corresponding preset positioning model.
It can be understood that training the handwriting optical character text recognition model on both the labeled historical handwriting cutting fragment samples and the region images obtained by inputting unlabeled historical handwriting cutting fragment samples into the corresponding preset positioning model improves the reliability of the training data, and thereby the training accuracy of the handwriting optical character text recognition model.
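A hedged sketch of one such training step, assuming the CRNN sketch above and PyTorch's CTC loss (the loss choice and batch layout are assumptions of this description, not stated in the patent), might look as follows:

```python
# Hypothetical training step for the handwriting recognition model using CTC
# loss, consistent with the CRNN recognition model sketched earlier.
import torch
import torch.nn as nn

def train_step(model, optimizer, images, targets, target_lengths):
    """images: (B, 1, H, W) padded batch of text region images to be recognized;
    targets: 1-D long tensor of label indices concatenated across the batch;
    target_lengths: (B,) long tensor with each sample's label length."""
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    model.train()
    logits = model(images)                          # (T, B, num_classes)
    log_probs = logits.log_softmax(dim=2)
    T, B, _ = log_probs.shape
    input_lengths = torch.full((B,), T, dtype=torch.long)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()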
Likewise, in order to obtain an efficient and reliable print optical character text recognition model, referring to FIG. 6, in one embodiment of the application, before step S400 the method further includes:
S421: acquiring a sample set of historical print text region images to be recognized, and the target actual text information corresponding to each sample in the set.
S422: training the print optical character text recognition model with the sample set of historical print text region images to be recognized and the target actual text information. The samples in the set are historical print text region images to be recognized, obtained by inputting labeled historical print cutting fragment samples into the corresponding preset positioning model.
It can be understood that training the print optical character text recognition model on both the labeled historical print cutting fragment samples and the region images obtained by inputting unlabeled historical print cutting fragment samples into the corresponding preset positioning model improves the reliability of the training data, and thereby the training accuracy of the print optical character text recognition model.
In one example, step S411 or S421 includes:
S401: acquiring a historical text image sample set and cutting each historical text image sample in the set to obtain a historical cutting fragment sample set.
S402: determining the image text type of each historical cutting fragment sample in the historical cutting fragment sample set by applying the preset image classification model.
S403: obtaining the sample set of historical handwriting text region images to be recognized, and the sample set of historical print text region images to be recognized, based on the image text types, the historical cutting fragment sample set and the corresponding preset positioning models. A sketch of this preparation flow follows.
In order to obtain an efficient and reliable primary domain positioning model, and by applying it to further improve the accuracy and efficiency of recognizing optical character recognition text from a text image, in one example the method further includes: acquiring a labeled historical text image sample set and training the primary domain positioning model with the labeled historical text image sample set.
In order to obtain an efficient and reliable image classification model, and by applying it to further improve the accuracy and efficiency of recognizing optical character recognition text from a text image, in one example, after step S401 the method further includes: determining the image text type information of each historical cutting fragment sample in the historical cutting fragment sample set, and training the image classification model with the historical cutting fragment sample set and the image text type information.
In order to obtain an efficient and reliable positioning model, and by applying it to further improve the accuracy and efficiency of recognizing optical character recognition text from a text image, in one example, after step S402 the method further includes:
labeling the data of each historical cutting fragment sample according to its image sample type, and training the corresponding positioning model with the labeled historical cutting fragment samples. In particular, data labeling must not correct misclassified fragments: for example, if the image classification model mistakenly identifies a handwriting cutting fragment sample as a print cutting fragment sample, that sample should still be labeled as a print cutting fragment sample.
In order to improve the accuracy and efficiency of recognizing optical character recognition text from a text image and broaden the application scenarios of the method, the application provides an embodiment of an image text recognition apparatus implementing all or part of the image text recognition method. Referring to FIG. 7, the apparatus specifically includes the following contents:
The target cutting fragment obtaining module 10 is configured to receive the target text image and obtain the target cutting fragments produced by cutting the target text image.
The image text type determining module 20 is configured to determine the image text type corresponding to the target text image by applying a preset image classification model to the target cutting fragments.
The to-be-recognized text region image obtaining module 30 is configured to input the target cutting fragments into a preset positioning model corresponding to the image text type and take the output result of the positioning model as the text region image to be recognized of the target text image.
The image text recognition module 40 is configured to input the text region image to be recognized into a preset optical character text recognition model corresponding to the image text type and take the output result of that model as the optical character recognition text of the target text image.
In one embodiment of the application, the preset positioning model includes a handwriting image positioning model and a print image positioning model; correspondingly, the to-be-recognized text region image obtaining module includes:
a handwriting image positioning unit, used for inputting the target cutting fragments into a preset handwriting image positioning model if the image text type is the handwriting image text type; and
a print image positioning unit, used for inputting the target cutting fragments into a preset print image positioning model if the image text type is the print image text type.
In one embodiment of the application, the preset optical character text recognition model includes a handwriting optical character text recognition model and a print optical character text recognition model; correspondingly, the image text recognition module includes:
an image handwriting text recognition unit, used for inputting the text region image to be recognized into a preset handwriting optical character text recognition model if the image text type is the handwriting image text type; and
an image print text recognition unit, used for inputting the text region image to be recognized into a preset print optical character text recognition model if the image text type is the print image text type.
The embodiments of the image text recognition apparatus provided in the present disclosure may be specifically used to execute the processing flow of the embodiments of the image text recognition method, and the functions thereof are not described herein again, and may refer to the detailed description of the embodiments of the image text recognition method.
To further explain the scheme, the application also provides a specific application example of the image text recognition method. In this example the method mainly involves layout cutting, image preprocessing, training sample preparation, image classification, text positioning, handwriting amount recognition and print amount recognition, and can be divided into a system preparation part and a system construction part, described as follows:
(I) FIG. 8 is a flow chart of the system preparation stage, which can be divided into three parts: layout cutting, business logic processing and model training. The specific steps are as follows:
Step 101: scene image style analysis. Specifically, step 101 includes step 1011: judging whether a single layout can realize the layout cutting; if not, executing step 1013: training the primary domain positioning model, with step 1012, sample preparation and data labeling, executed before step 1013.
Three main points are considered: whether one layout can complete the layout cutting of all scene images; if multiple layouts exist, whether characters or boundaries that differ obviously from other areas exist around the text region to be recognized in the image, in which case primary domain positioning is adopted for the layout cutting; otherwise, different cutting positions are designed for the different layouts. In actual scenarios the first two cases dominate, so this application example only considers layout cutting realized with one layout or with a primary domain positioning model.
Step 102: the layout cutting process, specifically as follows:
(1) FIG. 9A shows part of the information of an overseas handwriting voucher and its cut area 1; FIG. 9B shows part of the information of an overseas print voucher and its cut area 2. To recognize the amount on the voucher, the scene image style is analyzed first, and as FIGS. 9A and 9B show, the amount to be recognized occupies the same position proportion of the original image in all the images. Therefore an image of either style is selected and the overseas handwriting voucher scene image and overseas print voucher scene image are cut manually, giving the results shown in FIGS. 10A and 10B, where FIG. 10A shows the target cutting fragment corresponding to cut area 1 and FIG. 10B the target cutting fragment corresponding to cut area 2. The position proportion of the target cutting fragment within the image is then calculated and applied as a constant to cut the other images, so one layout suffices to cut these voucher scene images (see the sketch after case (2)).
(2) FIG. 11A shows part of the information of another overseas handwriting voucher and its cut area 3; FIG. 11B shows part of the information of another overseas print voucher and its cut area 4. To recognize the amount on these vouchers, analysis of the scene images shows that the amounts to be recognized occupy different position proportions of the original images. Further analysis shows that the distinctive characters "MOP", "HK" and "or ticket holder", absent from other areas of the images, appear near the amount, so primary domain positioning is adopted in place of multiple layout cuts. Primary domain positioning mainly involves sample preparation and data labeling, after which the primary domain positioning model is trained on the labeled samples; the original picture is then sent to the trained domain positioning model to realize the layout cutting, with the results shown in FIGS. 12A and 12B, where FIG. 12A shows the target cutting fragment corresponding to cut area 3 with its text region A to be recognized, and FIG. 12B the target cutting fragment corresponding to cut area 4 with its text region B to be recognized.
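For case (1), a minimal sketch of cutting by a stored position proportion constant might look as follows; the proportion values, names and the PIL-based implementation are illustrative assumptions of this description:

```python
# Hypothetical sketch of single-layout cutting: the position proportion of a
# manually cut fragment is stored as a constant and reapplied to other images.
from PIL import Image

# Proportions (left, top, right, bottom) in [0, 1], measured once from a
# manually cut sample image; these particular values are illustrative only.
CUT_PROPORTION = (0.05, 0.60, 0.95, 0.80)

def cut_by_proportion(image: Image.Image, prop=CUT_PROPORTION) -> Image.Image:
    """Crop the target cutting fragment at a fixed proportion of the image."""
    w, h = image.size
    box = (int(prop[0] * w), int(prop[1] * h),
           int(prop[2] * w), int(prop[3] * h))
    return image.crop(box)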
Step 103: classification model training. Before step 103, step 1031 is executed: sample preparation, algorithm selection and resolution setting.
Specifically, to classify cutting fragments into handwriting and print, handwriting and print fragment samples are prepared; a suitable model, such as a ResNet18 classification algorithm model, is selected and its parameters, including the image size, are set; finally the handwriting/print classification model is trained.
Step 104: training the handwriting positioning model and the print positioning model. Before step 104, step 1041, obtaining print fragments and handwriting fragments, and step 1042, data labeling, are executed. The specific process is as follows:
(1) The cutting fragments obtained by the layout cutting of step 102 are sent to step 103 for handwriting/print classification.
(2) Data labeling is performed separately on the classified handwriting cutting fragments and print cutting fragments. The emphasis is that data labeling must not artificially correct misclassified fragments: if the classification model mistakenly identifies a handwriting cutting fragment as a print cutting fragment, it must still be labeled as a print cutting fragment.
(3) The labeled handwriting samples and print samples are then used to train the corresponding positioning models. FIG. 13A shows the text region image to be recognized corresponding to text region A, and FIG. 13B the one corresponding to text region B; as shown in FIGS. 13A and 13B, the trained positioning models are applied to obtain the text region image to be recognized corresponding to each sample.
Step 105: training the handwriting recognition model and the print recognition model. Before step 105, step 1051 is executed: sample preparation, application of the IOU (Intersection over Union) performance evaluation index, sample padding and sample resizing. The specific process is as follows (a sketch of processes (2) to (4) follows this list):
(1) Sample preparation: the samples for training a recognition model consist of upstream labeled positioning-model training samples (for example 30%) and output fragments of the trained positioning model (for example 70%).
(2) Evaluation of the positioning effect on fragments to be recognized: the positioning effect on handwriting samples and print samples can be measured with the IOU performance evaluation index; because print characters are quite standard while handwriting characters are quite complex, the positioning effects differ.
(3) Since positioning may lose or truncate characters to be recognized, the located handwriting cutting fragments and print cutting fragments are each padded appropriately according to the IOU evaluation index; the upstream labeled data are processed in the same way.
(4) After the cutting fragments are padded as in process (3), the image is further resized, typically to an integer multiple of the original size that is also an integer multiple of 8.
(5) The fragments processed in process (4) serve as the training samples of the recognition model, with the real characters corresponding to the fragments as the training labels; the recognition algorithm is the CRNN algorithm, and the recognition models are then trained.
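For illustration, a hedged sketch of processes (2) to (4) follows, with the IOU index computed over (x1, y1, x2, y2) boxes and the padding ratio and scale factor as assumed example values:

```python
# Hypothetical sketch of processes (2)-(4): IOU evaluation of located boxes,
# padding against truncated characters, and resizing to a multiple of 8.
from PIL import Image

def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def pad_and_resize(region: Image.Image, pad_ratio=0.05, scale=2) -> Image.Image:
    """Pad the located region by pad_ratio on each side (the ratio is an
    assumed value tuned from the IOU evaluation), then resize so both sides
    are integer multiples of 8; a white background is assumed."""
    w, h = region.size
    padded = Image.new(region.mode, (int(w * (1 + 2 * pad_ratio)),
                                     int(h * (1 + 2 * pad_ratio))), "white")
    padded.paste(region, (int(w * pad_ratio), int(h * pad_ratio)))
    tw = max(8, (padded.width * scale) // 8 * 8)
    th = max(8, (padded.height * scale) // 8 * 8)
    return padded.resize((tw, th))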
(II) FIG. 14 is a schematic flow diagram of the system setup stage, which can be divided into three parts: layout cutting, business logic processing and model calling. The specific steps are as follows:
step 106: and inputting a scene image.
Step 107: and (5) layout analysis.
Step 108: and judging whether a format is applied to realize the format cutting of the scene image according to the format analysis result.
If yes, executing the format cutting of the step 109 by using a format, and if not, executing the format cutting of the step 109 by using a primary domain positioning model.
Step 110: a resize; and carrying out image size transformation processing on the cutting fragments obtained after the format cutting.
Step 111: inputting the cut fragments after the resize treatment into a classification model, and classifying the cut fragments into handwriting and printing bodies.
Step 112: judging whether the fragment is handwriting; if yes, executing step 113; if not, executing step 116.
Step 113: inputting the cutting fragments obtained in step 108 or step 109 into the preset handwriting positioning model.
Step 114: padding the text region to be recognized output by the preset handwriting positioning model by the preset padding proportion and performing resize scaling.
Step 115: inputting the fragments processed in step 114 into the handwriting recognition model for text recognition and obtaining the text recognition result, then executing step 119.
Step 116: inputting the cutting fragments obtained in step 109 into the preset print positioning model.
Step 117: padding the text region to be recognized output by the preset print positioning model by the preset padding proportion and performing resize scaling.
Step 118: inputting the fragments processed in step 117 into the print recognition model for text recognition and obtaining the text recognition result.
Step 119: outputting the text.
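Putting the stages together, the following schematic sketch wires the earlier hypothetical components into the setup flow of steps 106 to 119; the models dictionary, locate_and_crop and ctc_decode are illustrative assumptions, and type conversions between PIL images and tensors are omitted for brevity:

```python
# Hypothetical end-to-end sketch of the system setup flow (steps 106-119),
# combining the components sketched earlier; helpers are assumptions only.
def recognize_scene_image(image, models):
    # Steps 107-109: layout analysis, then layout cutting either with the
    # single-layout proportion constant or the primary domain positioning model.
    if models["single_layout"]:
        fragments = [cut_by_proportion(image)]
    else:
        fragments = cut_fragments_by_domain_positioning(models["domain_locator"], image)

    texts = []
    for fragment in fragments:
        # Steps 110-112: resize the fragment and classify it.
        kind = classify_fragment(models["classifier"], fragment)
        # Steps 113-118: route to the positioning and recognition models
        # matching the classified image text type.
        if kind == "handwriting":
            locator, recognizer = models["hw_locator"], models["hw_recognizer"]
        else:
            locator, recognizer = models["print_locator"], models["print_recognizer"]
        region = locate_and_crop(locator, fragment)       # hypothetical helper
        if region is None:
            continue
        region = pad_and_resize(region)                   # padding + resize, as above
        texts.append(ctc_decode(recognizer, region))      # hypothetical CTC decoder
    return texts                                          # step 119: output text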
As can be seen from the above description, in the image text recognition method and apparatus provided by the application, handwriting text recognition and print text recognition are completely independent. In particular, a primary domain positioning model is introduced to replace multiple layout cuts, so that the layout cutting of handwriting text and of print text, whose text features differ greatly, is completely independent, and the handwriting cutting fragments and print cutting fragments each have their own positioning model and recognition model. This further improves the accuracy of text positioning and text recognition and raises the text recognition rate overall.
In order to improve the accuracy and efficiency of recognizing optical character recognition text from a text image and broaden the application scenarios of the method, from a hardware aspect the application provides an embodiment of an electronic device implementing all or part of the image text recognition method. The electronic device specifically includes the following contents:
a processor, a memory, a communication interface and a bus, where the processor, the memory and the communication interface communicate with one another through the bus, and the communication interface is used for information transmission between the image text recognition apparatus and related equipment such as a user terminal. The electronic device may be a desktop computer, a tablet computer, a mobile terminal or the like; this embodiment is not limited thereto. The electronic device may be implemented with reference to the embodiments of the image text recognition method and apparatus, whose contents are incorporated herein and not repeated.
Fig. 15 is a schematic block diagram of a system configuration of an electronic device 9600 according to an embodiment of the present application. As shown in fig. 15, the electronic device 9600 may include a central processor 9100 and a memory 9140; the memory 9140 is coupled to the central processor 9100. Notably, this fig. 15 is exemplary; other types of structures may also be used in addition to or in place of the structures to implement telecommunications functions or other functions.
In one or more embodiments of the application, the image text recognition function may be integrated into the central processor 9100, which may be configured to perform the following control:
S100: receiving the target text image and acquiring the target cutting fragments obtained after cutting the target text image.
S200: determining the image text type corresponding to the target text image by applying a preset image classification model and the target cutting fragments.
S300: inputting the target cutting fragments into a preset positioning model corresponding to the image text type, and taking the output result of the positioning model as the text region image to be recognized of the target text image.
S400: inputting the text region image to be recognized into a preset optical character text recognition model corresponding to the image text type, and taking the output result of the optical character text recognition model as the optical character recognition text of the target text image.
As can be seen from the above description, the electronic device provided by the embodiment of the application can improve the accuracy and efficiency of recognizing the text by the optical character of the text image, and can improve the universality of the application scene of the image text recognition method.
In another embodiment, the image text recognition apparatus may be configured separately from the central processor 9100, for example, the image text recognition apparatus may be configured as a chip connected to the central processor 9100, and the image text recognition function is implemented by control of the central processor.
As shown in fig. 15, the electronic device 9600 may further include: a communication module 9110, an input unit 9120, an audio processor 9130, a display 9160, and a power supply 9170. It is noted that the electronic device 9600 need not include all of the components shown in fig. 15; in addition, the electronic device 9600 may further include components not shown in fig. 15, and reference may be made to the related art.
As shown in fig. 15, the central processor 9100, sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, which central processor 9100 receives inputs and controls the operation of the various components of the electronic device 9600.
The memory 9140 may be, for example, one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a non-volatile memory or other suitable device. It may store relevant information as well as the programs for processing that information, and the central processor 9100 can execute the programs stored in the memory 9140 to realize information storage or processing.
The input unit 9120 provides input to the central processor 9100. The input unit 9120 is, for example, a key or a touch input device. The power supply 9170 is used to provide power to the electronic device 9600. The display 9160 is used for displaying display objects such as images and characters. The display may be, for example, but not limited to, an LCD display.
The memory 9140 may be a solid-state memory such as a read-only memory (ROM), a random-access memory (RAM), or a SIM card. It may also be a memory that retains information even when powered down and that can be selectively erased and rewritten with new data, an example of which is sometimes referred to as an EPROM. The memory 9140 may also be some other type of device. The memory 9140 includes a buffer memory 9141 (sometimes referred to as a buffer) and an application/function storage portion 9142, which stores the application programs and function programs, or the procedures by which the central processor 9100 operates the electronic device 9600.
The memory 9140 may also include a data store 9143 for storing data such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. A driver storage portion 9144 of the memory 9140 may include the various drivers the electronic device needs for its communication functions and/or for performing its other functions (e.g., messaging applications, address book applications, etc.).
The communication module 9110 is a transmitter/receiver that transmits and receives signals via an antenna 9111. The communication module (transmitter/receiver) 9110 is coupled to the central processor 9100 to provide input signals to and receive output signals from it, as in a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 9110, such as a cellular network module, a Bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 9110 is also coupled to a speaker 9131 and a microphone 9132 via an audio processor 9130 to provide audio output via the speaker 9131 and to receive audio input from the microphone 9132, thereby implementing ordinary telecommunications functions. The audio processor 9130 may include any suitable buffers, decoders, amplifiers, and so forth. In addition, the audio processor 9130 is coupled to the central processor 9100, so that sound can be recorded locally through the microphone 9132 and locally stored sound can be played through the speaker 9131.
As can be seen from the above description, the electronic device provided by this embodiment of the application can improve the accuracy and efficiency of optical character text recognition for text images, and broadens the application scenarios of the image text recognition method.
An embodiment of the present application also provides a computer-readable storage medium capable of implementing all the steps of the image text recognition method in the above embodiments. The computer-readable storage medium stores a computer program which, when executed by a processor, implements all of those steps; for example, the processor implements the following steps:
s100: and receiving the target text image and acquiring target cutting fragments obtained after cutting the target text image.
S200: and determining the image text type corresponding to the target text image by applying a preset image classification model and the target cutting fragments.
S300: and inputting the target cutting fragments into a preset positioning model corresponding to the image text type, and taking an output result of the positioning model as a text region image to be identified of the target text image.
S400: and inputting the text region image to be recognized into a preset optical character text recognition model corresponding to the image text type, and taking the output result of the optical character text recognition model as the optical character recognition text of the target text image.
As can be seen from the above description, the computer-readable storage medium provided by the embodiments of the present application can improve the accuracy and efficiency of optical character text recognition for text images, and broadens the application scenarios of the image text recognition method.
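Step S200 above depends on the preset image classification model, which claim 5 below specifies as a deep learning model based on the ResNet18 algorithm. The following is a minimal sketch, assuming PyTorch/torchvision are available, of a two-class (handwriting vs. print) ResNet18 classifier; the two-way head, input size, and class index assignment are illustrative assumptions, not details taken from this disclosure.

    import torch
    import torch.nn as nn
    from torchvision import models

    # The disclosure distinguishes two image text types: handwriting and print.
    NUM_CLASSES = 2

    def build_image_type_classifier():
        # Standard ResNet18 backbone with its final fully connected layer
        # replaced by a two-way handwriting/print head.
        model = models.resnet18(weights=None)
        model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)
        return model

    # Usage sketch: classify a (dummy) batch of cutting-fragment images.
    classifier = build_image_type_classifier()
    fragments = torch.randn(8, 3, 224, 224)   # 8 fragments resized to 224x224
    logits = classifier(fragments)            # shape (8, 2)
    image_text_type = logits.argmax(dim=1)    # 0 = handwriting, 1 = print (assumed)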
The embodiments in this specification are described in a progressive manner; the same or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. For related details, see the description of the method embodiments.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present application have been described in detail with reference to specific examples, which are provided only to facilitate understanding of the method and core ideas of the present application. Meanwhile, since those skilled in the art may make variations to the specific embodiments and the scope of application in accordance with the ideas of the present application, this description should not be construed as limiting the present application.

Claims (11)

1. An image text recognition method, comprising:
receiving a target text image and acquiring target cutting fragments obtained after cutting the target text image;
determining an image text type corresponding to the target text image by applying a preset image classification model to the target cutting fragments;
inputting the target cutting fragments into a preset positioning model corresponding to the image text type, and taking the output of the positioning model as the to-be-recognized text region image of the target text image;
inputting the to-be-recognized text region image into a preset optical character text recognition model corresponding to the image text type, and taking the output of the optical character text recognition model as the optical character recognition text of the target text image;
wherein the preset optical character text recognition model comprises: a handwriting optical character text recognition model and a print optical character text recognition model; correspondingly, the inputting the to-be-recognized text region image into the preset optical character text recognition model corresponding to the image text type comprises: if the image text type is the handwriting image text type, inputting the to-be-recognized text region image into the preset handwriting optical character text recognition model; if the image text type is the print image text type, inputting the to-be-recognized text region image into the preset print optical character text recognition model;
and wherein, before the to-be-recognized text region image is input into the preset optical character text recognition model corresponding to the image text type, the method further comprises: acquiring a historical handwriting to-be-recognized text region image sample set and the target actual text information corresponding to each sample in the set; and training the handwriting optical character text recognition model by applying the historical handwriting to-be-recognized text region image sample set and the target actual text information; the historical handwriting to-be-recognized text region image sample set comprising historical handwriting to-be-recognized text region image samples obtained by inputting labeled historical handwriting cutting fragment samples into the corresponding preset positioning model.
2. The image text recognition method of claim 1, wherein the preset positioning model comprises: a handwriting image positioning model and a print image positioning model;
correspondingly, the inputting the target cutting fragments into the preset positioning model corresponding to the image text type comprises the following steps:
if the image text type is a handwriting image text type, inputting the target cutting fragments into a preset handwriting image positioning model;
and if the image text type is the print image text type, inputting the target cutting fragments into a preset print image positioning model.
3. The image text recognition method according to claim 1, wherein the acquiring the target cutting fragments obtained after cutting the target text image comprises:
outputting the target text image and receiving the target cutting fragments obtained after the target text image is manually cut.
4. The image text recognition method according to claim 1, wherein the acquiring the target cutting fragments obtained after cutting the target text image comprises:
inputting the target text image into a preset primary domain positioning model, and taking the output of the preset primary domain positioning model as the target cutting fragments.
5. The image text recognition method of claim 1, wherein the image classification model is a deep learning model based on the ResNet18 algorithm.
6. The image text recognition method of claim 1, wherein the handwriting optical character text recognition model and the print optical character text recognition model are both convolutional recurrent neural network models.
7. The image text recognition method according to claim 1, further comprising, before the inputting the to-be-recognized text region image into the preset optical character text recognition model corresponding to the image text type:
acquiring a historical print to-be-recognized text region image sample set and the target actual text information corresponding to each sample in the set;
training the print optical character text recognition model by applying the historical print to-be-recognized text region image sample set and the target actual text information;
wherein the historical print to-be-recognized text region image sample set comprises historical print to-be-recognized text region image samples obtained by inputting labeled historical print cutting fragment samples into the corresponding preset positioning model.
8. An image text recognition apparatus, comprising:
the target cutting fragment obtaining module is used for receiving the target text image and obtaining target cutting fragments obtained after the target text image is cut;
the image text type determining module is used for determining the image text type corresponding to the target text image by applying a preset image classification model to the target cutting fragments;
the to-be-recognized text region image obtaining module is used for inputting the target cutting fragments into a preset positioning model corresponding to the image text type, and taking the output of the positioning model as the to-be-recognized text region image of the target text image;
the image text recognition module is used for inputting the to-be-recognized text region image into a preset optical character text recognition model corresponding to the image text type, and taking the output of the optical character text recognition model as the optical character recognition text of the target text image;
the preset optical character text recognition model comprises: a handwriting optical character text recognition model and a print optical character text recognition model; correspondingly, the image text recognition module comprises:
the image handwriting text recognition unit is used for inputting the to-be-recognized text region image into the preset handwriting optical character text recognition model if the image text type is the handwriting image text type;
the image print text recognition unit is used for inputting the to-be-recognized text region image into the preset print optical character text recognition model if the image text type is the print image text type;
and before the to-be-recognized text region image is input into the preset optical character text recognition model corresponding to the image text type, the following is further included: acquiring a historical handwriting to-be-recognized text region image sample set and the target actual text information corresponding to each sample in the set; training the handwriting optical character text recognition model by applying the historical handwriting to-be-recognized text region image sample set and the target actual text information; the historical handwriting to-be-recognized text region image sample set comprising historical handwriting to-be-recognized text region image samples obtained by inputting labeled historical handwriting cutting fragment samples into the corresponding preset positioning model.
9. The image text recognition device of claim 8, wherein the preset positioning model comprises: a handwriting image positioning model and a print image positioning model;
correspondingly, the to-be-recognized text region image obtaining module comprises:
the handwriting image positioning unit is used for inputting the target cutting fragments into a preset handwriting image positioning model if the image text type is a handwriting image text type;
and the print image positioning unit is used for inputting the target cutting fragments into a preset print image positioning model if the image text type is the print image text type.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the image text recognition method of any of claims 1 to 7 when executing the program.
11. A computer readable storage medium having stored thereon computer instructions, which when executed implement the image text recognition method of any of claims 1 to 7.
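As technical context for claims 1, 6, and 7 above: claim 6 specifies both OCR models as convolutional recurrent neural network (CRNN) models, and claims 1 and 7 train them on historical to-be-recognized text region image samples paired with their target actual text. The following is a minimal PyTorch sketch of such a CRNN trained with CTC loss; the layer sizes, character set size, region image dimensions, and training loop are illustrative assumptions, not details from this patent.

    import torch
    import torch.nn as nn

    class CRNN(nn.Module):
        # Convolutional recurrent neural network: a CNN feature extractor,
        # a bidirectional LSTM over the width axis, and per-timestep
        # character logits (plus one CTC blank class).
        def __init__(self, num_chars, img_height=32):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            )
            feat_height = img_height // 4
            self.rnn = nn.LSTM(128 * feat_height, 256,
                               bidirectional=True, batch_first=True)
            self.fc = nn.Linear(512, num_chars + 1)  # +1 for the CTC blank

        def forward(self, x):                         # x: (B, 1, H, W)
            f = self.cnn(x)                           # (B, 128, H/4, W/4)
            b, c, h, w = f.shape
            f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # (B, T=W/4, C*H)
            out, _ = self.rnn(f)
            return self.fc(out)                       # (B, T, num_chars + 1)

    # Training sketch on a (dummy) historical sample set: `images` stand in
    # for located to-be-recognized text region images, `targets` for the
    # encoded target actual text information.
    model = CRNN(num_chars=100)
    ctc_loss = nn.CTCLoss(blank=100, zero_infinity=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    images = torch.randn(4, 1, 32, 128)
    targets = torch.randint(0, 100, (4, 10))            # 10 characters per sample
    logits = model(images)                              # (4, 32, 101)
    log_probs = logits.log_softmax(2).permute(1, 0, 2)  # (T, B, C) for CTCLoss
    input_lengths = torch.full((4,), log_probs.size(0), dtype=torch.long)
    target_lengths = torch.full((4,), 10, dtype=torch.long)

    optimizer.zero_grad()
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    loss.backward()
    optimizer.step()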
CN202010385297.3A 2020-05-09 2020-05-09 Image text recognition method and device Active CN111582273B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010385297.3A CN111582273B (en) 2020-05-09 2020-05-09 Image text recognition method and device

Publications (2)

Publication Number Publication Date
CN111582273A CN111582273A (en) 2020-08-25
CN111582273B true CN111582273B (en) 2023-10-10

Family

ID=72115291

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010385297.3A Active CN111582273B (en) 2020-05-09 2020-05-09 Image text recognition method and device

Country Status (1)

Country Link
CN (1) CN111582273B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112163578A (en) * 2020-09-25 2021-01-01 深兰人工智能芯片研究院(江苏)有限公司 Method and system for improving OCR recognition rate
CN112464931B (en) * 2020-11-06 2021-07-30 马上消费金融股份有限公司 Text detection method, model training method and related equipment
CN112508039B (en) * 2020-12-08 2024-04-02 中国银联股份有限公司 Image detection method and device
CN112668583A (en) * 2021-01-07 2021-04-16 浙江星汉信息技术股份有限公司 Image recognition method and device and electronic equipment
CN112766255A (en) * 2021-01-19 2021-05-07 上海微盟企业发展有限公司 Optical character recognition method, device, equipment and storage medium
CN113570001B (en) * 2021-09-22 2022-02-15 深圳市信润富联数字科技有限公司 Classification identification positioning method, device, equipment and computer readable storage medium
CN114419636A (en) * 2022-01-10 2022-04-29 北京百度网讯科技有限公司 Text recognition method, device, equipment and storage medium
CN117292370A (en) * 2023-11-23 2023-12-26 合肥天帷信息安全技术有限公司 Icon character recognition method and device

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886065A (en) * 2017-11-06 2018-04-06 哈尔滨工程大学 A kind of Serial No. recognition methods of mixing script
CN109086756A (en) * 2018-06-15 2018-12-25 众安信息技术服务有限公司 A kind of text detection analysis method, device and equipment based on deep neural network
CN108932508A (en) * 2018-08-13 2018-12-04 杭州大拿科技股份有限公司 A kind of topic intelligent recognition, the method and system corrected
CN109726628A (en) * 2018-11-05 2019-05-07 东北大学 A kind of recognition methods and system of form image
CN109919014A (en) * 2019-01-28 2019-06-21 平安科技(深圳)有限公司 OCR recognition methods and its electronic equipment
CN110399851A (en) * 2019-07-30 2019-11-01 广东工业大学 A kind of image processing apparatus, method, equipment and readable storage medium storing program for executing
CN110866495A (en) * 2019-11-14 2020-03-06 杭州睿琪软件有限公司 Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium
CN110956167A (en) * 2019-12-09 2020-04-03 南京红松信息技术有限公司 Classification discrimination and strengthened separation method based on positioning characters

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Xuanzhong: "Research on preprocessing for character recognition of bank bills", Journal of Fujian University of Technology, 2006, No. 3. *

Also Published As

Publication number Publication date
CN111582273A (en) 2020-08-25

Similar Documents

Publication Publication Date Title
CN111582273B (en) Image text recognition method and device
CN111652093B (en) Text image processing method and device
CN109065053B (en) Method and apparatus for processing information
CN113344901B (en) Glue spreading defect detection method and device, storage medium and electronic equipment
CN111680694A (en) Method and device for filtering colored seal in character image
CN110956956A (en) Voice recognition method and device based on policy rules
CN112100431B (en) Evaluation method, device and equipment of OCR system and readable storage medium
CN113537819A (en) Automatic service message checking method and device
CN111931835A (en) Image identification method, device and system
CN111312283A (en) Cross-channel voiceprint processing method and device
CN105357178A (en) Login method for vehicle-mounted terminal and vehicle-mounted terminal
WO2021110090A1 (en) Method, apparatus, device and storage medium for detecting card surface picture
CN114723646A (en) Image data generation method with label, device, storage medium and electronic equipment
CN111679811A (en) Web service construction method and device
CN115984853A (en) Character recognition method and device
CN110377167B (en) Font generating method and font generating device
CN116310875A (en) Target detection method and device for satellite remote sensing image
CN111046223A (en) Voice assisting method, terminal, server and system for visually impaired
CN115798458A (en) Classified language identification method and device
CN110399615B (en) Transaction risk monitoring method and device
CN111368107B (en) Gallery searching method, terminal and computer storage medium
CN114429464A (en) Screen-breaking identification method of terminal and related equipment
CN113127058A (en) Data annotation method, related device and computer program product
CN110931014A (en) Speech recognition method and device based on regular matching rule
CN116912833A (en) Text recognition method and device based on text error correction model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant