CN116563853A - Method and device suitable for text recognition and error correction

Info

Publication number: CN116563853A
Application number: CN202310560184.6A
Authority: CN
Inventors: 戴菀庭; 罗奕康; 聂砂; 王伊妍; 丁苏苏; 郑江
Original assignee: China Construction Bank Corp; CCB Finetech Co Ltd
Current assignee: China Construction Bank Corp; CCB Finetech Co Ltd
Priority date: 2023-05-18
Filing date: 2023-05-18
Publication date: 2023-08-08

Abstract

The invention relates to a method and a device for text recognition error correction. The method comprises: obtaining a first text list, which is a non-full text list; inputting a target text to be recognized into a text detection model to obtain a recognition result of the target text; supplementing first text content, whose confidence in the recognition result is higher than a preset threshold, to the first text list to obtain a second text list; inputting the recognition result into a training font similarity model to obtain a font similarity function; and performing error correction on second text content whose confidence in the recognition result is lower than the preset threshold. Adding the recognition results whose confidence is higher than the threshold to the existing non-full text list yields a full text list, so that errors generated by text recognition can be corrected accurately.

Description

Method and device suitable for text recognition and error correction

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for text recognition and error correction.

Background

In the prior art, errors produced when OCR recognizes text in pictures are corrected with semantic and language models. Applying a traditional edit distance model directly may alter characters that were recognized correctly, or corrects wrongly recognized characters with low accuracy. This is a particular problem for fields such as commodity list categories, nationalities, address counties, and customs declaration forms: when the full set of standard values cannot be obtained, or the latest full set cannot be obtained in time, the prior art cannot accurately correct errors generated during OCR text recognition.

Disclosure of Invention

In order to solve at least one of the problems mentioned in the background art, the present invention proposes a method and an apparatus for text recognition error correction, which are intended to implement accurate correction of errors generated during text recognition.

In order to achieve the above object, the present invention adopts the technical scheme that:

a method for text recognition error correction, comprising:

acquiring a first text list, wherein the first text list is a non-full text list;

inputting a target text to be identified into a text detection model, and obtaining an identification result of the target text;

supplementing a first text content with the confidence coefficient higher than a preset threshold value in the target text recognition result to the first text list to obtain a second text list, wherein the second text list is a full text list;

inputting the target text recognition result into a training font similarity model to obtain a font similarity function;

and carrying out error correction processing on the second text content with the confidence coefficient lower than a preset threshold value in the target text recognition result.

Further, the inputting the target text recognition result into a training font similarity model, and obtaining the font similarity function includes:

all characters of the target text recognition result are respectively converted into images and input into the training font similarity model to obtain a distance matrix, and the font similarity function between each character image and the other character images is obtained by calculation from the distance matrix.

Further, the error correction processing of the second text content with the confidence coefficient lower than the preset threshold value in the target text recognition result includes:

when the target text recognition result contains the confidence of each character, performing error correction processing on the characters whose confidence is lower than a preset threshold;

and when the target text recognition result contains the confidence of each word, performing error correction processing on the words whose confidence is lower than a preset threshold.

Further, the error correction processing of the characters whose confidence is lower than the preset threshold comprises the following steps:

setting parameters of an editing distance model to obtain a first editing distance model;

determining, in the target word, the characters whose confidence is lower than a preset threshold and the characters whose confidence is higher than the preset threshold, together with their positions;

determining a potential word set in the second text list that matches the character order of the target word;

calculating the sum of the editing distances of each potential word in the potential word set compared with the target word, and sorting;

and determining the potential word with the smallest editing distance compared with the target word in the potential word set as a first word, and replacing the target word with the first word.

Further, the error correction processing for the words with the confidence level lower than the preset threshold value comprises the following steps:

setting parameters of the editing distance model to obtain a second editing distance model;

determining that the word with the confidence coefficient lower than a preset threshold value is a target word;

determining a set of potential words in the second text list that meet the target word similarity threshold;

calculating the sum of the editing distance of each potential word in the potential word set compared with the target word;

comprehensively sorting the potential words by combining the length of the target word, the number of characters to be replaced in the target word, and the sum of the edit distances between each potential word and the target word;

and determining the potential word ranked first in the comprehensive sorting of the potential word set as a second word, and replacing the target word with the second word.

Further, the method further comprises replacing the target word with the second word when the sum of the edit distances of the second word is smaller than [formula not reproduced].

The invention also relates to a device suitable for text recognition and error correction, comprising:

the first acquisition module is used for acquiring a first text list, wherein the first text list is a non-full text list;

the text recognition module is used for inputting a target text to be recognized into the text detection model and obtaining a recognition result of the target text;

the second acquisition module is used for supplementing the first text content with the confidence coefficient higher than a preset threshold value in the target text recognition result to the first text list to obtain a second text list, wherein the second text list is a full text list;

the training font similarity module is used for inputting the target text recognition result into a training font similarity model to obtain a font similarity function;

and the correction module is used for carrying out error correction processing on the second text content with the confidence coefficient lower than a preset threshold value in the target text recognition result.

The invention also relates to a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, implements the above-mentioned method.

The invention also relates to an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, which processor implements the above-mentioned method when executing the computer program.

The invention also relates to a computer program product comprising a computer program and/or instructions, characterized in that the computer program and/or instructions, when executed by a processor, implement the steps of the above-mentioned method.

The beneficial effects of the invention are as follows:

With the method and the device for text recognition error correction, a first text list is obtained; a target text to be recognized is input into a text detection model to obtain a recognition result of the target text; first text content whose confidence in the recognition result is higher than a preset threshold is supplemented to the first text list to obtain a second text list; the recognition result is input into a training font similarity model to obtain a font similarity function; and error correction is performed on second text content whose confidence in the recognition result is lower than the preset threshold. Adding the recognition results whose confidence is higher than the threshold to the existing non-full text list yields a full text list, so that errors produced by the text detection model can be corrected within a certain range. Different custom edit distance models are set according to the format of the results output by the text detection model, so that errors generated by text recognition are corrected accurately in a way that better fits the usage scenario.

Drawings

Fig. 1 is a flowchart of a method for text recognition error correction according to an embodiment of the present invention.

Fig. 2 is a schematic structural diagram of an apparatus suitable for text recognition error correction according to an embodiment of the present invention.

Detailed Description

In order to more clearly illustrate the technical solutions of the embodiments of the present specification, the drawings that are required to be used in the description of the embodiments will be briefly described below. It is apparent that the drawings in the following description are only some examples or embodiments of the present specification, and it is possible for those of ordinary skill in the art to apply the present specification to other similar situations according to the drawings without inventive effort. Unless otherwise apparent from the context of the language or otherwise specified, like reference numerals in the figures refer to like structures or operations.

It will be appreciated that "system," "apparatus," "unit" and/or "module" as used herein is one method for distinguishing between different components, elements, parts, portions or assemblies at different levels. However, if other words can achieve the same purpose, the words can be replaced by other expressions.

As used in this specification and the claims, the terms "a," "an," and/or "the" are not specific to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, as other steps or elements may be included in a method or apparatus.

A flowchart is used in this specification to describe the operations performed by the system according to embodiments of the present specification. It should be appreciated that the preceding or following operations are not necessarily performed in the exact order shown. Rather, the steps may be processed in reverse order or simultaneously. Also, other operations may be added to or removed from these processes.

Definition of terms:

(1) OCR (Optical Character Recognition): an electronic device examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text by a character recognition method.

(2) Edit distance: the Levenshtein distance is one type of edit distance. It is the minimum number of editing operations required to convert one of two strings into the other. The allowed editing operations are replacing one character with another, inserting one character, and deleting one character.
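The definition of edit distance can be illustrated with a minimal Python sketch. The function name `levenshtein` is chosen here for illustration only and is not taken from the specification; the code is the standard dynamic-programming formulation with a unit cost for each of the three operations.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    # Before processing the i-th source character, previous[j] is the
    # distance between source[:i-1] and target[:j].
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into the empty string takes i deletions
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,         # delete s_char
                current[j - 1] + 1,      # insert t_char
                previous[j - 1] + cost,  # substitute, or keep a matching character
            ))
        previous = current
    return previous[-1]
```

Because the table is filled row by row, only two rows need to be kept in memory at any time.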

The first aspect of the present invention relates to a method for text recognition error correction which, as shown in Fig. 1, comprises the following steps:

Step S1: a first text list is obtained, wherein the first text list is a non-full text list.

Illustratively, the non-full text list is a list of commodity names: [ apple, banana, peach, salted fish, sheep leg, fresh beef, fresh burdock, fresh dried celery, chafing dish ].

Step S2: the target text to be recognized is input into a text detection model to obtain the recognition result of the target text.

In this embodiment, preferably, the target text to be recognized is a picture, the text detection model is an OCR model, and characters on the picture can be accurately recognized through the OCR model and an OCR result set is obtained.

Preferably, the OCR result set includes a commodity name and a confidence.

Illustratively, the OCR result set is: [ spanner: 0.99, banana: 0.98, iron alloy: 0.2, anti-theft pin: 0.4, … ].

Step S3: the first text content whose confidence in the target text recognition result is higher than a preset threshold is supplemented to the first text list to obtain a second text list, wherein the second text list is a full text list.

Preferably, in this embodiment, the preset confidence threshold is 0.95. The commodities "banana" and "spanner", whose confidence in the OCR result set is greater than 0.95, are supplemented to the first text list, and duplicates are removed to obtain the second text list, namely the commodity names of the full text list: [ apple, banana, peach, salted fish, sheep leg, fresh beef, fresh burdock, fresh dried celery, chafing dish, spanner ].
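Step S3 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `supplement_list` and the representation of the OCR result set as (name, confidence) pairs are hypothetical, while the list contents and the 0.95 threshold are the example values given in this embodiment.

```python
def supplement_list(first_list, ocr_results, threshold=0.95):
    """Append high-confidence OCR names to the list, skipping duplicates."""
    second_list = list(first_list)
    seen = set(first_list)
    for name, confidence in ocr_results:
        if confidence > threshold and name not in seen:
            second_list.append(name)
            seen.add(name)
    return second_list


first_list = ["apple", "banana", "peach", "salted fish", "sheep leg",
              "fresh beef", "fresh burdock", "fresh dried celery", "chafing dish"]
ocr_results = [("spanner", 0.99), ("banana", 0.98),
               ("iron alloy", 0.2), ("anti-theft pin", 0.4)]

# "banana" is already present, so only "spanner" is appended.
second_list = supplement_list(first_list, ocr_results)
```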

Step S4: the target text recognition result is input into a training font similarity model to obtain a font similarity function.

Preferably, all the characters of the target text recognition result are respectively converted into images and input into the training font similarity model to obtain a distance matrix, and the font similarity function between each character image and the other character images is obtained by calculation from the distance matrix.

For example, assuming that there are 30,000 characters and each character corresponds to a vector, the cosine similarity between two vectors is taken as the similarity of the two characters, so a 30,000 × 30,000 cosine similarity matrix can be obtained. For example, the similarity of the pair "female"-"Ru" is higher than that of the pair "male"-"female". In this way, the font similarity function sim between each character image and the other character images can be calculated.
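The construction of the similarity function from per-character vectors can be sketched as follows. The specification does not describe the similarity model itself, so the three-dimensional vectors and the labels "A", "B", "C" below are made-up placeholders, and the helper names `cosine_similarity` and `similarity_table` are hypothetical. The sketch only shows how a pairwise lookup sim[a][b] follows from the vectors, assuming none of them is the zero vector.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def similarity_table(embeddings):
    """Build sim[a][b] for every ordered pair of characters in `embeddings`."""
    return {a: {b: cosine_similarity(u, v) for b, v in embeddings.items()}
            for a, u in embeddings.items()}


# Placeholder vectors: "A" and "B" look alike, "C" looks different.
embeddings = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.9, 0.1, 0.0],
    "C": [0.0, 1.0, 0.0],
}
sim = similarity_table(embeddings)
```

With 30,000 characters the full table has 900 million entries, so a real system would more likely store it as a dense numeric matrix or compute entries on demand; the nested dictionary is used here only for readability.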

Step S5: error correction processing is performed on the second text content whose confidence in the target text recognition result is lower than a preset threshold.

Specifically, different steps are adopted for error correction processing depending on the format of the OCR output result.

In this embodiment, preferably, when the target text recognition result includes the confidence of each character, the characters whose confidence is lower than a preset threshold are subjected to error correction processing, which specifically includes the following steps:

determining characters and positions of which the confidence coefficient is lower than a preset threshold value in the target words and characters and positions of which the confidence coefficient is higher than the preset threshold value;

Assume that the first edit distance model is edit distance model A. Illustratively, in model A the edit distances of insertion ("add") and deletion ("delete") are each set to 1, and the edit distance of substitution ("change") is set to 1 - sim[a][b], where sim[a][b] represents the similarity between the images of characters a and b.
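Edit distance model A can be sketched in Python as below. The function name `weighted_edit_distance`, the nested-dictionary form of `sim`, and the choice to treat pairs missing from `sim` as having similarity 0 are assumptions made for this illustration; only the three operation costs come from the description above.

```python
def weighted_edit_distance(source, target, sim):
    """Insert/delete cost 1; substituting a -> b costs 1 - sim[a][b]."""
    def substitution_cost(a, b):
        if a == b:
            return 0.0
        # Assumption: pairs missing from `sim` are completely dissimilar.
        return 1.0 - sim.get(a, {}).get(b, 0.0)

    previous = [float(j) for j in range(len(target) + 1)]
    for i, a in enumerate(source, start=1):
        current = [float(i)]
        for j, b in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1.0,                          # delete a
                current[j - 1] + 1.0,                       # insert b
                previous[j - 1] + substitution_cost(a, b),  # change a into b
            ))
        previous = current
    return previous[-1]


# Hypothetical similarity: the glyphs "O" and "0" are 90% alike.
sim = {"O": {"0": 0.9}}
distance = weighted_edit_distance("O1", "01", sim)  # about 0.1
```

Compared with the plain Levenshtein distance, confusing two look-alike glyphs is cheap, so candidates that differ from the OCR output only in visually similar characters rank first.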

For example, if the OCR output is "fresh ox side" with per-character confidences [0.99, 0.99, 0.35] and the preset confidence threshold is 0.95, only the character "side", whose confidence is lower than 0.95, needs to be corrected.

For example, suppose the OCR output is the eight-character string glossed as "the hot pot is made to be the lunar fresh cow beside", with per-character confidences [0.99, 0.99, 0.35, 0.15, 0.12, 0.98, 0.99, 0.56], and the preset confidence threshold is 0.95. First, the characters "fire", "pot", "fresh", and "cow", whose confidence is higher than 0.95, are selected. Then the second text list is traversed to screen a potential word set A whose members match the character order "hot pot abc fresh cow d", where a, b, c, and d stand for the low-confidence positions. The sum of the edit distances between each potential word in set A and the OCR output is calculated, and the potential words are sorted. The potential word in set A with the smallest sum of edit distances is determined to be the first word, and the OCR output is replaced with the first word.
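The character-level procedure can be sketched as follows. Several points are assumptions of this illustration rather than statements of the specification: Latin letters stand in for the original characters, "matching the character order" is read as "the high-confidence characters occur in the candidate in the same order", and the OCR output is returned unchanged when no candidate matches. All function names and the `sim` values are hypothetical.

```python
def edit_distance(source, target, sim):
    """Insert/delete cost 1; substituting a -> b costs 1 - sim[a][b]."""
    previous = [float(j) for j in range(len(target) + 1)]
    for i, a in enumerate(source, start=1):
        current = [float(i)]
        for j, b in enumerate(target, start=1):
            # Pairs missing from `sim` are treated as completely dissimilar.
            change = 0.0 if a == b else 1.0 - sim.get(a, {}).get(b, 0.0)
            current.append(min(previous[j] + 1.0,
                               current[j - 1] + 1.0,
                               previous[j - 1] + change))
        previous = current
    return previous[-1]


def keeps_order(anchors, candidate):
    """True if every anchor character occurs in `candidate`, in the same order."""
    remaining = iter(candidate)
    return all(ch in remaining for ch in anchors)


def correct_by_character(text, confidences, word_list, sim, threshold=0.95):
    # High-confidence characters are trusted and act as fixed anchors.
    anchors = [ch for ch, conf in zip(text, confidences) if conf > threshold]
    candidates = [word for word in word_list if keeps_order(anchors, word)]
    if not candidates:
        return text  # assumption: leave the OCR output unchanged
    return min(candidates, key=lambda word: edit_distance(text, word, sim))


text = "HPxyzFCq"  # stand-in for the eight-character OCR output
confidences = [0.99, 0.99, 0.35, 0.15, 0.12, 0.98, 0.99, 0.56]
word_list = ["HPabcFCd", "HPmnoFCd", "APxyzFCq", "FCHPxyz"]
sim = {"x": {"a": 0.9}, "y": {"b": 0.9}, "z": {"c": 0.9}, "q": {"d": 0.8}}
corrected = correct_by_character(text, confidences, word_list, sim)
```

In this toy data "APxyzFCq" differs from the OCR output in a single character, yet it is never considered because it lacks the trusted anchor "H"; the anchors therefore prevent correctly recognized characters from being altered.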

When the target text recognition result contains the confidence coefficient of each word, carrying out error correction processing on the words with the confidence coefficient lower than a preset threshold value, wherein the specific steps are as follows:

Assume that the second edit distance model is edit distance model Z. Illustratively, in model Z the edit distances of insertion ("add") and deletion ("delete") are both set to 1, and the edit distance of substitution ("change") is set to max(1 - sim[a][b], 0.49), where sim[a][b] represents the similarity between the images of characters a and b, and 0.49 is a set threshold that may be adjusted according to actual needs.

Illustratively, the commodity name is "fresh burdock" and the OCR output is "fresh ox side". Another commodity is named "fresh dry celery". Suppose the edit distance of "ox" -> "dry" is calculated as 0.1, that of "side" -> "celery" as 0.29, and that of "side" -> "burdock" as 0.4. The model then finds that changing "ox" and "side" to "dry" and "celery" together costs a sum of 0.39, which is lower than the 0.4 cost of changing only "side" to "burdock", and would therefore correct the output to "fresh dry celery". However, empirical study of optical text recognition results shows that, when it is uncertain which character was most likely misrecognized, correcting fewer characters works better than correcting more. In this embodiment, therefore, weights are set that combine the length of the target word, the number of characters to be replaced in the target word, and the sum of the edit distances between the potential word and the target word, and a comprehensive analysis yields the correction result closest to the target text. The threshold of 0.49 keeps the edit distance relatively small when similar characters are corrected, while keeping the number of characters the model corrects as small as possible.
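The effect of the 0.49 floor on this example can be checked with a few lines of arithmetic. The sketch below is one reading of the passage, not a statement of the patented weighting: it assumes the quoted values 0.1, 0.29, and 0.4 are the raw substitution costs 1 - sim[a][b], and it only compares sums of substitution costs. The helper name `substitution_sum` is hypothetical.

```python
def substitution_sum(raw_costs, floor=0.0):
    """Sum per-character substitution costs, each raised to at least `floor`."""
    return sum(max(cost, floor) for cost in raw_costs)


# Raw costs quoted in the example.
to_fresh_dry_celery = [0.1, 0.29]  # ox -> dry, side -> celery (two characters)
to_fresh_burdock = [0.4]           # side -> burdock (one character)

# Without a floor, the two-character change looks cheaper: about 0.39 vs 0.4.
no_floor_two = substitution_sum(to_fresh_dry_celery)
no_floor_one = substitution_sum(to_fresh_burdock)

# With the 0.49 floor, every substitution costs at least 0.49,
# so the one-character change wins: 0.49 vs about 0.98.
floor_two = substitution_sum(to_fresh_dry_celery, floor=0.49)
floor_one = substitution_sum(to_fresh_burdock, floor=0.49)
```

Since 0.49 is just under one half, two substitutions always cost more than any single substitution below 0.98, which is one way to see why the floor favours corrections that touch fewer characters.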

In the preferred embodiment, the target word is replaced with the second word when the sum of the edit distances of the second word is smaller than [formula not reproduced].

When the whole word list is searched by edit distance for a word whose confidence is lower than the preset threshold, the replacement is performed only when the minimum sum of edit distances is smaller than [formula not reproduced]. This is because the longer the recognized word, the higher the probability that it contains an error, and thus the larger the sum of edit distances for which correction is allowed.

By adopting the method for text recognition error correction, a first text list is obtained; a target text to be recognized is input into a text detection model to obtain a recognition result of the target text; first text content whose confidence in the recognition result is higher than a preset threshold is supplemented to the first text list to obtain a second text list; the recognition result is input into a training font similarity model to obtain a font similarity function; and error correction is performed on second text content whose confidence in the recognition result is lower than the preset threshold. Adding the recognition results whose confidence is higher than the threshold to the existing non-full text list yields a full text list, so that errors produced by the text detection model can be corrected within a certain range. Different custom edit distance models are set according to the format of the results output by the text detection model, so that errors generated by text recognition are corrected accurately in a way that better fits the usage scenario.

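Another aspect of the present invention relates to a device for text recognition and error correction, whose structure is shown in Fig. 2. It comprises the first acquisition module, the text recognition module, the second acquisition module, the training font similarity module, and the correction module described above.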

By using the device, the above-mentioned operation processing method can be executed and the corresponding technical effects can be achieved.

The embodiment of the present invention also provides a computer-readable storage medium capable of implementing all the steps in the method for text recognition error correction in the above embodiment, the computer-readable storage medium storing a computer program thereon, the computer program implementing all the steps in the method for text recognition error correction in the above embodiment when executed by a processor.

The embodiment of the invention also provides an electronic device for executing the method, which is used as an implementation device of the method, and at least comprises a processor and a memory, wherein the memory is particularly used for storing data and related computer programs and the like required by executing the method, and all the steps of the implementation method are executed by calling the data and the programs in the memory by the processor, so that corresponding technical effects are obtained.

Preferably, the electronic device may comprise a bus architecture, and the bus may comprise any number of interconnected buses and bridges, the buses linking together various circuits, including the one or more processors and memory. The bus may also link together various other circuits such as peripheral devices, voltage regulators, power management circuits, etc., as are well known in the art and, therefore, will not be further described herein. The bus interface provides an interface between the bus and the receiver and transmitter. The receiver and the transmitter may be the same element, i.e. a transceiver, providing a unit for communicating with various other systems over a transmission medium. The processor is responsible for managing the bus and general processing, while the memory may be used to store data used by the processor in performing operations.

Additionally, the electronic device may further include a communication module, an input unit, an audio processor, a display, a power supply, and the like. The processor (or controllers, operational controls) employed may comprise a microprocessor or other processor device and/or logic devices that receives inputs and controls the operation of the various components of the electronic device; the memory may be one or more of a buffer, a flash memory, a hard drive, a removable medium, a volatile memory, a nonvolatile memory, or other suitable means, may store the above-mentioned related data information, may further store a program for executing the related information, and the processor may execute the program stored in the memory to realize information storage or processing, etc.; the input unit is used for providing input to the processor, and can be a key or a touch input device; the power supply is used for providing power for the electronic equipment; the display is used for displaying display objects such as images and characters, and may be, for example, an LCD display. The communication module is a transmitter/receiver that transmits and receives signals via an antenna. The communication module (transmitter/receiver) is coupled to the processor to provide an input signal and to receive an output signal, which may be the same as in the case of a conventional mobile communication terminal. Based on different communication technologies, a plurality of communication modules, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, etc., may be provided in the same electronic device. The communication module (transmitter/receiver) is also coupled to the speaker and microphone via the audio processor to provide audio output via the speaker and to receive audio input from the microphone to implement the usual telecommunications functions. 
The audio processor may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor is also coupled to the central processor so that sound can be recorded on the host through the microphone and sound stored on the host can be played through the speaker.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a system for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

The foregoing is only a preferred embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims

1. A method for text recognition error correction, comprising:

2. The method of claim 1, wherein the inputting the target text recognition result into a training glyph similarity model to obtain a glyph similarity function comprises:

3. The method according to any one of claims 1 or 2, wherein performing error correction processing on the second text content with the confidence level lower than the preset threshold in the target text recognition result includes:

4. The method of claim 3, wherein the error correction processing of text having a confidence level below a predetermined threshold comprises:

5. The method of claim 3, wherein performing error correction processing on words having a confidence level below a preset threshold comprises:

6. The method of claim 5, further comprising replacing the target word with the second word when the sum of the edit distances of the second word is less than [formula not reproduced].

7. An apparatus for text recognition error correction, comprising:

8. A computer readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1 to 6.

9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method of any of claims 1 to 6 when executing the computer program.

10. A computer program product comprising a computer program and/or instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 6.