CN115171129A - Character recognition error correction method and device, terminal equipment and storage medium

Info

Publication number: CN115171129A
Application number: CN202211081040.4A
Authority: CN
Inventors: 蓝建敏; 李思伟; 申鑫; 池沐霖; 张旭君
Original assignee: Excellence Information Technology Co ltd
Current assignee: Excellence Information Technology Co ltd
Priority date: 2022-09-06
Filing date: 2022-09-06
Publication date: 2022-10-11

Abstract

The invention discloses a character recognition error correction method and device, terminal equipment and a storage medium. The method comprises the following steps: acquiring text information from a text image; inputting the text information into a matching model to obtain a text matching result output by the matching model; acquiring a character structure and character components of the text information; inputting the character structure and the character components into a recognition model to obtain a character recognition result output by the recognition model; and calculating an error correction result by weighted scoring from the matching result and the recognition result. The text information is recognized and corrected through semantic matching, the character structure is matched and corrected through glyph matching, and the results of the two methods are fused by weighted scoring, which simplifies the algorithm flow of character recognition error correction and improves its accuracy.

Description

Character recognition error correction method and device, terminal equipment and storage medium

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a character recognition error correction method, a character recognition error correction device, terminal equipment and a storage medium.

Background

Optical Character Recognition (OCR) is generally used to automatically match and recognize the content of a text image captured by an optical device and to perform error correction. However, conventional OCR algorithms often misrecognize characters, and existing OCR error correction algorithms perform poorly, especially in the recognition and error correction of Chinese characters.

A character recognition error correction method is therefore needed to improve the accuracy of recognition and error correction for Chinese characters.

Disclosure of Invention

The purpose of the invention is to provide a character recognition error correction method, a character recognition error correction device, computer terminal equipment and a computer-readable storage medium that can solve the problem of low optical character recognition accuracy.

In order to achieve the above object, the present invention provides a method for character recognition and error correction, comprising:

acquiring character information in a text image according to the text image;

inputting the character information into a matching model to obtain a character matching result output by the matching model; the matching model is a Keyword-BERT network model, based on Bidirectional Encoder Representations from Transformers (BERT), trained on an existing corpus data set;

acquiring a character structure and character components of the text information according to the text information;

inputting the character structure and the character components into a recognition model to obtain a character recognition result output by the recognition model; the recognition model is a Convolutional Neural Network (CNN) model trained on an existing character data set;

and calculating an error correction result by weighted scoring according to the matching result and the recognition result.

In one embodiment, the CNN model includes a target positioning module and a content recognition module;

the target positioning module is used for generating a positioning area in the form of a bounding box or a pixel-level mask, so as to determine the positions of the character structure and the character components within the positioning area and obtain a positioning result; and the content recognition module is used for recognizing the content represented by the characters in the positioning area according to the positioning result.

In one embodiment, before the obtaining of the text information in the text image according to the text image, the method further includes:

according to the text image, image preprocessing is carried out to obtain a preprocessed text image;

wherein the image pre-processing comprises at least one of: noise elimination, edge detection, histogram equalization, morphological processing and binarization.

In one embodiment, the existing character data set is in the form of images, and before the CNN model is trained, the method further includes: performing data enhancement on the existing character data set to obtain a data-enhanced character data set;

wherein the data enhancement comprises at least one of: random noise addition, image flipping, image shifting, image rotation, image cropping, and image scaling.

The embodiment of the invention also provides a character recognition error correction device, which applies the character recognition error correction method of any of the above embodiments and comprises:

a text information acquisition unit, used for acquiring text information in the text image according to the text image;

a text information matching unit, used for inputting the text information into a matching model to obtain a text matching result output by the matching model; the matching model is a Keyword-BERT network model, based on Bidirectional Encoder Representations from Transformers (BERT), trained on an existing corpus data set;

a character information acquisition unit, used for acquiring a character structure and character components of the text information according to the text information;

a character information matching unit, used for inputting the character structure and the character components into a recognition model and obtaining a character recognition result output by the recognition model; the recognition model is a Convolutional Neural Network (CNN) model trained on an existing character data set;

and an error correction result processing unit, used for calculating an error correction result by weighted scoring according to the matching result and the recognition result.

In one embodiment, the apparatus further comprises:

an image preprocessing unit, used for performing image preprocessing on the text image to obtain a preprocessed text image.

In one embodiment, the existing character data set is in the form of images, and the character information matching unit is further configured to perform data enhancement on the existing character data set to obtain a data-enhanced character data set.

The embodiment of the invention also provides computer terminal equipment, which comprises one or more processors and a memory. The memory is coupled to the processors and stores one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the character recognition error correction method of any one of the embodiments described above.

The embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the character recognition error correction method of any of the above embodiments is implemented.

The embodiment of the invention discloses a character recognition error correction method, a character recognition error correction device, computer terminal equipment and a computer-readable storage medium. Compared with the prior art, the beneficial effects are as follows: the text information is recognized and corrected through semantic matching, the character structure is matched and corrected through glyph matching, and the results of the two methods are fused by weighted scoring, which simplifies the algorithm flow of character recognition error correction and improves its accuracy.

Drawings

In order to more clearly illustrate the technical solution of the present invention, the drawings required to be used in the embodiments will be briefly described below, and obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a schematic flow chart of a character recognition error correction method according to a first embodiment of the present invention;

Fig. 2 is a schematic view of an application scenario of a character recognition error correction method according to an embodiment of the present invention;

Fig. 3 is a schematic flow chart of a character recognition error correction method according to a second embodiment of the present invention;

Fig. 4 is a schematic flow chart of a character recognition error correction method according to a third embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a character recognition error correction apparatus according to a fourth embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a computer terminal device according to a fifth embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be understood that the step numbers used herein are for convenience of description only and are not used as limitations on the order in which the steps are performed.

It is to be understood that the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.

The terms "comprises" and "comprising" indicate the presence of the described features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

The term "and/or" refers to any and all possible combinations of one or more of the associated listed items and includes such combinations.

OCR recognition and error correction is an important task in the field of artificial intelligence: it matches the text content in an optical image and corrects errors in the matched words and phrases. However, current OCR recognition and error correction has low accuracy and mostly targets languages such as English; there are few solutions for Chinese and other languages. One possible solution recognizes and corrects errors using the structure of the characters, such as radicals, and the features of the glyphs; algorithms of this type generally treat a character as an image and decompose and recognize each of its structures. Another feasible solution performs context matching and recognition through word vectors based on the meaning of the words, thereby completing error correction. Word vector models that blend glyph features into the representation already exist, but their error correction effect is poor, and most other algorithms approach recognition and error correction from a single angle. The technical idea of the present application is to use the image meaning and the semantic meaning of the characters at the same time, that is, to combine the information from glyph recognition and analysis with that from semantic recognition and analysis. Compared with the prior art, this simplifies the implementation of the algorithm and improves recognition accuracy.

Example one

Fig. 1 is a schematic flow chart of a text recognition error correction method according to an embodiment of the present invention, and referring to fig. 1, the method includes:

S101, acquiring character information in a text image according to the text image;

S102, inputting the character information into a matching model to obtain a character matching result output by the matching model;

S103, acquiring a character structure and character components of the text information according to the text information;

S104, inputting the character structure and the character components into a recognition model to obtain a character recognition result output by the recognition model;

S105, calculating an error correction result by weighted scoring according to the matching result and the recognition result.
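By way of non-limiting illustration, the data flow of steps S101 to S105 can be sketched as follows. This is a minimal sketch only: the function name `correct_text`, the `Scores` type, and the five callables are hypothetical placeholders for whatever OCR front end, matching model, decomposition routine, recognition model and fusion rule a concrete embodiment uses; none of them is specified in the description.

```python
from typing import Callable, Dict

# Hypothetical type: candidate text -> normalized score in [0, 1].
Scores = Dict[str, float]


def correct_text(
    image: object,
    extract_text: Callable[[object], str],        # S101: OCR front end (assumed)
    semantic_match: Callable[[str], Scores],      # S102: matching model wrapper (assumed)
    decompose: Callable[[str], object],           # S103: structure + components (assumed)
    glyph_recognize: Callable[[object], Scores],  # S104: recognition model wrapper (assumed)
    fuse: Callable[[Scores, Scores], str],        # S105: weighted scoring (assumed)
) -> str:
    """Run the five steps in order and return the fused error correction result."""
    text = extract_text(image)                     # S101
    matching_result = semantic_match(text)         # S102
    glyph_features = decompose(text)               # S103
    recognition_result = glyph_recognize(glyph_features)  # S104
    return fuse(matching_result, recognition_result)      # S105
```

Passing the models in as callables keeps the sketch independent of any particular network; the step numbers are labels only and, as stated above, do not limit the order of execution in other embodiments.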

The present embodiment is exemplarily described with reference to specific application scenarios: for the task of character recognition and error correction of the optical image, firstly, character information in the text image is acquired according to the text image, and then corresponding features are extracted from the character information according to a specific algorithm for processing.

After the text information has been extracted, semantic matching can be performed. Specifically, the text information is input into a matching model, and the text matching result output by the matching model is obtained; the matching model is a Keyword-BERT network model, based on Bidirectional Encoder Representations from Transformers (BERT), trained on an existing corpus data set. It should be noted that the Keyword-BERT network model pays more attention to the keywords in a text segment; that is, during matching and error correction the algorithm places more emphasis on relatively important words, which also improves the practicability of the method. The network model can be pre-trained on large public corpus data sets, and in a use scenario the text is input directly into the trained model. It should also be noted that the matching model is not limited to the Keyword-BERT network model: any other natural language processing model is a possible embodiment as long as it can perform semantic recognition and matching, although the preferred embodiment given here performs better than the alternatives.

After semantic matching is completed, glyph matching is performed, and the recognition results from the two aspects are then fused to calculate the final result of the method. Specifically, glyph matching first analyzes and splits the characters. For alphabetic languages such as English, glyph recognition is simpler, and the amount of data in existing alphabetic training sets is far smaller than that for Chinese characters; this step is therefore mostly used in the context of Chinese characters. A character structure and character components of the text information are acquired from the text information; the character structure and the character components are then input into a recognition model to obtain a character recognition result output by the recognition model; the recognition model is a Convolutional Neural Network (CNN) model trained on an existing character data set. Similarly, the network model can be pre-trained on large public character data sets, and in a use scenario the glyphs are input directly into the trained model for recognition. The glyph features include the character structure and the character components. Since this step is mostly used for Chinese characters, the character structure is usually a top-bottom structure, a left-right structure, an independent structure, and so on, and the character components, i.e., the basic elements that make up a Chinese character, need to be split from the component or radical level down to the stroke level. If this step is used for other languages, the corresponding letters in the text information are input directly into the recognition model without splitting the character structure and the character components. A CNN is a general image recognition network that extracts features from an image through a series of convolution and pooling layers.
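As a non-limiting illustration of what "character structure" and "character components" might look like as data, the following sketch decomposes a string character by character. The `Glyph` class, the `decompose` function and the three-entry `DECOMPOSITION` table are hypothetical and hand-written for this example; a real embodiment would need a full decomposition resource, possibly down to stroke level, which the description does not specify.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Glyph:
    structure: str               # e.g. "left-right", "top-bottom", "independent"
    components: Tuple[str, ...]  # basic elements that make up the character


# Hypothetical, hand-written table for illustration only.
DECOMPOSITION: Dict[str, Glyph] = {
    "好": Glyph("left-right", ("女", "子")),
    "李": Glyph("top-bottom", ("木", "子")),
    "人": Glyph("independent", ("人",)),
}


def decompose(text: str) -> List[Glyph]:
    """Return structure and components for each character.

    Characters missing from the table (for example Latin letters) are
    passed through unsplit, mirroring the remark that letters of other
    languages are input directly without splitting.
    """
    return [DECOMPOSITION.get(ch, Glyph("independent", (ch,))) for ch in text]
```

For instance, `decompose("好a")` yields a left-right glyph with components 女 and 子, followed by the letter `a` passed through as a single unsplit component.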

In one example, the CNN model includes a target positioning module and a content recognition module. The target positioning module is used for generating a positioning area in the form of a bounding box or a pixel-level mask, so as to determine the positions of the character structure and the character components within the positioning area and obtain a positioning result; the content recognition module is used for recognizing the content represented by the characters in the positioning area according to the positioning result. In this embodiment, the preferred CNN model is therefore a model with two parts, a target positioning module and a content recognition module. Among existing algorithms, Fast R-CNN and Mask R-CNN are two representative CNN models: the target positioning module of the former generates a positioning box, and that of the latter generates a positioning area in the form of a pixel-level mask. The target positioning module determines the area to be recognized, and the content recognition module recognizes the text content in that area, which improves the accuracy of glyph recognition.
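The two forms of positioning area mentioned above, a box and a pixel-level mask, are related in a simple way, which the following sketch illustrates. It is not a CNN and not the positioning module itself: it only assumes that some positioning module has already produced a binary mask, and shows how a box can be derived from it and used to crop the region handed to content recognition. The names `mask_to_box` and `crop` are hypothetical.

```python
from typing import List, Optional, Tuple

Mask = List[List[int]]           # 1 = pixel inside the positioning area, 0 = background
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), all inclusive


def mask_to_box(mask: Mask) -> Optional[Box]:
    """Return the tightest box enclosing all nonzero mask pixels, or None if empty."""
    if not mask:
        return None
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows or not cols:
        return None
    return rows[0], cols[0], rows[-1], cols[-1]


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the boxed region out of the image for the content recognition step."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in image[top:bottom + 1]]
```

A mask carries strictly more information than the box derived from it, which is one reason a mask-based positioning area can localize irregular components more precisely, at the cost of a heavier model.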

After the recognition of the semantics and the glyphs is completed, an error correction result can be calculated by weighted scoring from the matching result and the recognition result. Weighted scoring is a general and simple data fusion method; a more accurate fusion result could also be obtained through a neural network or another more complex algorithm to further improve the accuracy of recognition and error correction, but the complexity of the algorithm would increase accordingly. For example, Fig. 2 is a schematic view of an application scenario of a character recognition error correction method according to an embodiment of the present invention, and shows an application case of fusing semantic and glyph recognition. The example sentence shown in the figure is "how to scan the code and add WeChat", but after the text is extracted by the optical device, the word "WeChat" is erroneously recognized as "badge", so recognition and error correction are required. In the figure, the upper list contains several candidate semantic recognition results with their normalized scores, the lower list contains several candidate glyph recognition results with their normalized scores, and the ellipses stand for other results whose scores are low enough to be ignored. In this scenario, the semantic and glyph features can be fused through a weighted score. When the relative influence of the two is unknown, a common choice is a weight of 0.5 for each, which gives "WeChat" a score of 0.98, "badge" a score of 0.42, "WeChat group" a score of 0.37, "short message" a score of 0.35, and "information" a score of 0.06. It should be noted that in the scenario shown in Fig. 2, "WeChat" is the matching result finally used for error correction regardless of how the weights are assigned, but in other scenarios the weights may affect the correctness of the matching result. The distribution of the weights is difficult to model and calculate with a specific mathematical formula. However, semantics generally cover the judgment of homophones, synonyms and homonyms, especially with the Keyword-BERT network model described in the present application, so the semantic weight should be the heavier one, i.e., greater than 0.5, although its specific value is difficult to determine directly. In that case, a series of recognition results can be input into a neural network, assisted by manual labeling, to train a converged network model; the weights of the semantics and the glyphs are then derived from the trained model and used as the weight distribution scheme, which improves the accuracy of the method.
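The weighted scoring described above can be sketched as follows, purely as an illustration. The function `fuse_scores` is a hypothetical name, and the per-model scores in `semantic` and `glyph` are assumptions: the text only quotes the fused values, so the inputs below were chosen so that equal weights of 0.5 reproduce those fused values. The candidate labels are English glosses of the candidates in Fig. 2.

```python
from typing import Dict


def fuse_scores(
    semantic: Dict[str, float],
    glyph: Dict[str, float],
    semantic_weight: float = 0.5,
) -> Dict[str, float]:
    """Weighted sum of two normalized score tables; missing candidates count as 0."""
    if not 0.0 <= semantic_weight <= 1.0:
        raise ValueError("semantic_weight must lie in [0, 1]")
    glyph_weight = 1.0 - semantic_weight
    candidates = set(semantic) | set(glyph)
    return {
        c: semantic_weight * semantic.get(c, 0.0) + glyph_weight * glyph.get(c, 0.0)
        for c in candidates
    }


# Assumed per-model scores (not taken from the figure).
semantic = {"WeChat": 0.98, "WeChat group": 0.74, "short message": 0.70, "information": 0.12}
glyph = {"WeChat": 0.98, "badge": 0.84}

fused = fuse_scores(semantic, glyph)   # equal weights of 0.5
best = max(fused, key=fused.get)       # candidate used for error correction
```

Under these assumed inputs, "WeChat" wins for every weight between 0 and 1, consistent with the remark that the weight assignment does not change the outcome in this scenario; with closer scores the choice of `semantic_weight` would matter.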

This embodiment provides a character recognition error correction method, which comprises the following steps: acquiring text information in a text image according to the text image; inputting the text information into a matching model to obtain a text matching result output by the matching model, the matching model being a Keyword-BERT network model, based on Bidirectional Encoder Representations from Transformers (BERT), trained on an existing corpus data set; acquiring a character structure and character components of the text information according to the text information; inputting the character structure and the character components into a recognition model to obtain a character recognition result output by the recognition model, the recognition model being a Convolutional Neural Network (CNN) model trained on an existing character data set; and calculating an error correction result by weighted scoring according to the matching result and the recognition result. The text information is recognized and corrected through semantic matching, the character structure is matched and corrected through glyph matching, and the results of the two methods are fused by weighted scoring, which simplifies the algorithm flow of character recognition error correction and improves its accuracy.

Example two

Fig. 3 is a schematic flow chart of a character recognition error correction method according to a second embodiment of the present invention. Referring to Fig. 3, before S101, the method further includes:

S201, performing image preprocessing on the text image to obtain a preprocessed text image.

The present embodiment is exemplarily explained with reference to a specific application scenario: before the text information in the text image is acquired, image preprocessing is performed on the text image to obtain a preprocessed text image; wherein the image preprocessing comprises at least one of: noise elimination, edge detection, histogram equalization, morphological processing and binarization. Preprocessing the input image highlights the text in the image, which makes feature extraction easier and improves the accuracy of the method. Traditional image processing is also a common approach to OCR recognition and error correction in its own right: image processing improves image quality and highlights the text content, feature engineering extracts and combines features such as color, shape and texture, and statistical machine learning models then complete the recognition and error correction. It should be noted that this traditional approach can also be used in the present application in place of extracting the character structure and character components. Compared with the scheme provided in the present application, however, it involves more complicated steps, a large amount of manual work and lower accuracy. It remains an alternative when the existing character data set is insufficient, in which case image preprocessing becomes even more important for improving image quality.

The embodiment provides a method for character recognition and error correction, before acquiring character information in a text image according to the text image, the method further includes: performing image preprocessing according to the text image to obtain a preprocessed text image; wherein the image pre-processing comprises at least one of: noise elimination, edge detection, histogram equalization, morphological processing and binarization. By preprocessing the text image before application, the quality of the text image is improved, and therefore the accuracy of character recognition and error correction is improved.
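Two of the listed preprocessing operations, noise elimination and binarization, can be illustrated with the following sketch on a small grayscale image stored as nested lists. The choice of a 3x3 median filter for noise elimination and of a fixed threshold of 128 for binarization are assumptions made for this example; the description does not prescribe particular algorithms or parameters.

```python
from statistics import median
from typing import List

Image = List[List[int]]  # grayscale pixel values in 0..255


def median_filter(img: Image) -> Image:
    """Noise elimination: replace each pixel by the median of its 3x3
    neighbourhood (clipped at the image border)."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            ]
            row.append(int(median(window)))
        out.append(row)
    return out


def binarize(img: Image, threshold: int = 128) -> Image:
    """Binarization: dark pixels (assumed to be text) become 1, light pixels 0."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def preprocess(img: Image) -> Image:
    """Apply noise elimination followed by binarization."""
    return binarize(median_filter(img))
```

An isolated dark speck on a light background is removed by the median filter before binarization, so it does not survive as a spurious text pixel.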

Example three

Fig. 4 is a schematic flow chart of a character recognition error correction method according to a third embodiment of the present invention. Referring to Fig. 4, S104 specifically includes:

S301, performing data enhancement on the existing character data set to obtain a data-enhanced character data set;

S302, training the CNN model on the data-enhanced character data set;

S303, inputting the character structure and the character components into the CNN model to obtain a character recognition result output by the CNN model.

The present embodiment is exemplarily described with reference to specific application scenarios: for the existing character data set in the form of image, the data amount in the data set is not enough to train a general, stable and excellent model, so that the data set can be expanded from various angles through data enhancement to solve the problems of stability and universality of the model. Specifically, data enhancement is carried out according to the existing character data set to obtain a character data set after data enhancement; training the CNN model according to the character data set after the data enhancement; and inputting the character structure and the character component into the CNN model to obtain a character recognition result output by the CNN model. Wherein the data enhancement comprises at least one of: random noise addition, image flipping, image shifting, image rotation, image cropping, and image scaling.

The present embodiment provides a character recognition error correction method in which the existing character data set is in the form of images and, before the CNN model is trained, the method further includes: performing data enhancement on the existing character data set to obtain a data-enhanced character data set; wherein the data enhancement comprises at least one of: random noise addition, image flipping, image shifting, image rotation, image cropping, and image scaling. Enhancing the existing character data set before training improves the stability and repeatability of the model, and thereby the accuracy of character recognition and error correction.
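Three of the listed data enhancement operations, image flipping, image shifting and random noise addition, can be sketched as follows on grayscale images stored as nested lists. All function names, the one-pixel shift, and the noise amplitude of 10 are assumptions for this illustration; the description does not fix which operations are combined or with what parameters.

```python
import random
from typing import List

Image = List[List[int]]  # grayscale pixel values in 0..255


def flip_horizontal(img: Image) -> Image:
    """Image flipping: mirror each row left to right."""
    return [row[::-1] for row in img]


def shift_right(img: Image, pixels: int, fill: int = 0) -> Image:
    """Image shifting: move content right, padding the left edge with `fill`."""
    width = len(img[0])
    if not 0 <= pixels <= width:
        raise ValueError("pixels must lie between 0 and the image width")
    return [[fill] * pixels + row[:width - pixels] for row in img]


def add_noise(img: Image, amount: int, rng: random.Random) -> Image:
    """Random noise addition: perturb each pixel by at most `amount`, clamped to 0..255."""
    return [
        [min(255, max(0, px + rng.randint(-amount, amount))) for px in row]
        for row in img
    ]


def augment(dataset: List[Image], rng: random.Random) -> List[Image]:
    """Return each original image together with three enhanced variants."""
    out: List[Image] = []
    for img in dataset:
        out.extend([img, flip_horizontal(img), shift_right(img, 1), add_noise(img, 10, rng)])
    return out
```

Passing a seeded `random.Random` instance makes the enhanced data set reproducible across training runs; the enhanced set would then be used to train the CNN model as in S302.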

Example four

Fig. 5 is a schematic structural diagram of a character recognition error correction apparatus according to a fourth embodiment of the present invention, and referring to fig. 5, a character recognition error correction apparatus according to a fourth embodiment of the present invention is applied to a character recognition error correction method according to any one of the embodiments. It should be noted that fig. 5 is only one of the most basic embodiments, and other units may be added according to actual requirements. The device comprises:

a text information obtaining unit 41, configured to obtain text information in a text image according to the text image;

a text information matching unit 42, configured to input the text information into a matching model and obtain a text matching result output by the matching model; the matching model is a Keyword-BERT network model, based on Bidirectional Encoder Representations from Transformers (BERT), trained on an existing corpus data set;

a character information obtaining unit 43, configured to obtain a character structure and character components of the text information according to the text information;

a character information matching unit 44, configured to input the character structure and the character components into a recognition model and obtain a character recognition result output by the recognition model; the recognition model is a Convolutional Neural Network (CNN) model trained on an existing character data set;

and an error correction result processing unit 45, configured to calculate an error correction result in a weighted scoring manner according to the matching result and the recognition result.

In one example, the CNN model includes a target positioning module and a content recognition module.

The target positioning module is used for judging the area to be identified, and the content identification module is used for identifying the text content in the area to be identified, so that the accuracy of character pattern identification is improved.

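In one example, the apparatus further comprises an image preprocessing unit, configured to perform image preprocessing on the text image to obtain a preprocessed text image.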

By preprocessing the text image before application, the quality of the text image is improved, and therefore the accuracy of character recognition and error correction is improved.

In one example, the existing character data set is in the form of images, and the character information matching unit is further configured to perform data enhancement on the existing character data set to obtain a data-enhanced character data set.

By enhancing the data of the existing character data set during training, the stability and the repeatability of the model are improved, and therefore the accuracy of character recognition and error correction is improved.

For the specific limitation of the character recognition and error correction device, reference may be made to the above limitation on the character recognition and error correction method, which is not described herein again. All or part of each module in the character recognition and error correction device can be realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent of a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.

The present embodiment provides a character recognition error correction apparatus, including: a text information acquisition unit, used for acquiring text information in the text image according to the text image; a text information matching unit, used for inputting the text information into a matching model to obtain a text matching result output by the matching model, the matching model being a Keyword-BERT network model, based on Bidirectional Encoder Representations from Transformers (BERT), trained on an existing corpus data set; a character information acquisition unit, used for acquiring a character structure and character components of the text information according to the text information; a character information matching unit, used for inputting the character structure and the character components into a recognition model and obtaining a character recognition result output by the recognition model, the recognition model being a Convolutional Neural Network (CNN) model trained on an existing character data set; and an error correction result processing unit, used for calculating an error correction result by weighted scoring according to the matching result and the recognition result. The text information is recognized and corrected through semantic matching, the character structure is matched and corrected through glyph matching, and the results of the two methods are fused by weighted scoring, which simplifies the algorithm flow of character recognition error correction and improves its accuracy.

EXAMPLE five

Fig. 6 is a schematic structural diagram of a computer terminal device according to the fifth embodiment of the present invention. Referring to Fig. 6, the fifth embodiment provides a computer terminal device including one or more processors and a memory. The memory is coupled to the processor and configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the character recognition error correction method of any of the above embodiments.

The processor is used for controlling the overall operation of the computer terminal device so as to complete all or part of the steps of the character recognition error correction method. The memory is used to store various types of data to support operation of the computer terminal device; such data may include, for example, instructions for any application or method operating on the computer terminal device, as well as application-related data. The memory may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.

In an exemplary embodiment, the computer terminal device may be implemented by one or more Application Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Digital Signal Processing Devices (DSPD), Programmable Logic Devices (PLD), Field Programmable Gate Arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, and is configured to perform the above character recognition error correction method and achieve technical effects consistent with that method.

In another exemplary embodiment, a computer-readable storage medium is further provided, in which a computer program is stored; when executed by a processor, the computer program implements the steps of the character recognition error correction method of any one of the above embodiments. For example, the computer-readable storage medium may be the above memory storing the computer program, and the computer program may be executed by a processor of the computer terminal device to perform the character recognition error correction method and achieve technical effects consistent with that method.

The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and substitutions without departing from the technical principle of the present invention, and these modifications and substitutions should also be regarded as falling within the protection scope of the present invention.

Claims

1. A character recognition error correction method, characterized by comprising the following steps:

acquiring character information in a text image according to the text image;

2. The method of claim 1, wherein the CNN model comprises a target positioning module and a content identification module;

3. The method according to claim 1, wherein, before the acquiring of the character information in the text image according to the text image, the method further comprises:

4. The method of any of claims 1-3, wherein the existing character data set is in the form of images, the method further comprising, prior to training of the CNN model: performing data enhancement on the existing character data set to obtain a data-enhanced character data set;

5. A character recognition error correction apparatus, comprising:

6. The apparatus of claim 5, wherein the CNN model comprises a target positioning module and a content identification module;

7. The apparatus of claim 5, further comprising:

8. The apparatus according to any one of claims 5 to 7, wherein the existing character data set is in the form of images, and the character information matching unit is further configured to: perform data enhancement on the existing character data set to obtain a data-enhanced character data set;

9. A computer terminal device, comprising:

one or more processors;

a memory coupled to the processor for storing one or more programs;

the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the character recognition error correction method according to any one of claims 1-4.

10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the character recognition error correction method according to any one of claims 1 to 4.