WO2022194130A1 - Character position correction method and apparatus, electronic device and storage medium

Info

Publication number: WO2022194130A1
Application number: PCT/CN2022/080874
Authority: WO
Inventors: 蔡悦; 张宇轩; 庄妮; 黄灿; 王长虎
Original assignee: 北京有竹居网络技术有限公司
Priority date: 2021-03-16
Filing date: 2022-03-15
Publication date: 2022-09-22
Also published as: CN113033377A

Abstract

The present disclosure provides a character position correction method and apparatus, an electronic device and a storage medium. By means of correcting a single character position in a text line recognition result by using a single character position in a single character detection and recognition result, the text recognition performance and the single character position accuracy can be both taken into consideration.

Description

Character position correction method, device, electronic device and storage medium

This application is based on and claims priority to CN application No. 202110304878.4, filed on March 16, 2021, the disclosure of which is hereby incorporated into this application in its entirety.

Technical field

Embodiments of the present disclosure relate to the technical field of character recognition, and in particular, to a character position correction method, apparatus, electronic device, and storage medium.

Background

Optical Character Recognition (OCR) is a technology that recognizes characters in images. At present, OCR mainly operates at two granularities: "line" and "single character". Text line OCR is the mainstream: it can recognize the characters appearing in a text line image more accurately, and it can use its own characteristics to roughly estimate the position of each single character.

Summary of the invention

Embodiments of the present disclosure propose a character position correction method, apparatus, electronic device, and storage medium.

In a first aspect, an embodiment of the present disclosure provides a character position correction method. The method includes: acquiring a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image, wherein each text line recognition result includes a first character and a first character bounding box position, and each single-character detection and recognition result includes a second character and a second character bounding box position; and, for each single-character detection and recognition result in the single-character detection and recognition result sequence, performing a position update operation.

In some embodiments, the position update operation includes: searching the text line recognition result sequence for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result; in response to finding at least one text line recognition result, determining, among the found text line recognition results, the text line recognition result that is closest to the single-character detection and recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single-character detection and recognition result.

In some embodiments, the first character bounding box position includes an inline start position and an inline end position; and the method further includes:

calculating the average in-line character gap of the text line recognition result sequence, wherein the average in-line character gap is, among the text line recognition results in the sequence whose first character bounding box positions have been updated, the average of the distance between the in-line end position of the former and the in-line start position of the latter of each two adjacent text line recognition results;

for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, updating the in-line start position and in-line end position in that text line recognition result according to the average in-line character gap, such that the distance between the updated in-line start position of that text line recognition result and the in-line end position of the text line recognition result preceding it in the sequence equals the average in-line character gap, and/or the distance between the updated in-line end position of that text line recognition result and the in-line start position of the text line recognition result following it in the sequence equals the average in-line character gap.
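The gap-based update above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the `LineResult` class, the `updated` flag, and the function names are hypothetical, positions are assumed to be coordinates along the reading direction, and the sketch assumes that only pairs adjacent in the sequence with both positions updated contribute to the average and that only updated neighbours are used as anchors.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LineResult:
    char: str
    start: float            # in-line start position (assumed: coordinate along the line)
    end: float              # in-line end position
    updated: bool = False   # True if the position update operation replaced this box


def average_gap(results: List[LineResult]) -> Optional[float]:
    """Average distance between the end of one updated result and the
    start of the next, over adjacent pairs that were both updated."""
    gaps = [b.start - a.end
            for a, b in zip(results, results[1:])
            if a.updated and b.updated]
    return sum(gaps) / len(gaps) if gaps else None


def fill_unupdated(results: List[LineResult]) -> None:
    """Place each non-updated result one average gap away from its
    updated neighbours (previous and/or next)."""
    gap = average_gap(results)
    if gap is None:
        return  # no updated adjacent pair, nothing to estimate from
    for i, r in enumerate(results):
        if r.updated:
            continue
        if i > 0 and results[i - 1].updated:
            r.start = results[i - 1].end + gap
        if i + 1 < len(results) and results[i + 1].updated:
            r.end = results[i + 1].start - gap
```

For a sequence in which only the third of four results was not updated, its start becomes the previous end plus the average gap and its end becomes the next start minus the average gap, which corresponds to the "and/or" wording above.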

In some embodiments, the first character bounding box position includes a line start position and a line end position; and

The method also includes:

determining, among the text line recognition results in the text line recognition result sequence whose line start position and line end position have been updated, the minimum of the line start positions and the maximum of the line end positions as the text line start position and the text line end position, respectively;

for each text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, updating the line start position and the line end position in that text line recognition result to the text line start position and the text line end position, respectively.
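A minimal sketch of this minimum/maximum update follows. The `LineResult` class, the `updated` flag, and the function name are hypothetical. Reading the line start and line end positions as the top and bottom edges of a horizontal text line is an assumption made here for illustration; this section does not define them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineResult:
    char: str
    line_start: float       # assumed: top edge of the character box
    line_end: float         # assumed: bottom edge of the character box
    updated: bool = False   # True if the position update operation replaced this box


def apply_text_line_extent(results: List[LineResult]) -> None:
    """Give every non-updated result the extent spanned by the updated ones."""
    updated = [r for r in results if r.updated]
    if not updated:
        return  # nothing reliable to derive the extent from
    text_line_start = min(r.line_start for r in updated)
    text_line_end = max(r.line_end for r in updated)
    for r in results:
        if not r.updated:
            r.line_start = text_line_start
            r.line_end = text_line_end
```

Only the non-updated results are overwritten; results corrected by the position update operation keep their own positions.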

In some embodiments, the text line recognition result sequence corresponding to the text line image is obtained in the following manner:

inputting the text line image into a pre-trained text line recognition model to obtain the text line recognition result sequence corresponding to the text line image, wherein the text line recognition model is used to represent the correspondence between an image to be recognized and a text line recognition result sequence.

In some embodiments, the text line recognition model includes a convolutional neural network, a recurrent neural network, and Connectionist Temporal Classification (CTC), arranged in sequence.

In some embodiments, the text line recognition model includes a convolutional neural network and an attention-based recurrent neural network, arranged in sequence.

In some embodiments, the sequence of word detection and recognition results corresponding to the text line image is obtained in the following manner:

performing single-character detection on the text line image using a target detection algorithm to obtain at least one character bounding box position;

cropping a character image from the text line image according to each detected character bounding box position, and inputting the cropped character image into a single-character recognition model to obtain the corresponding character recognition result;

for each detected character bounding box, generating a single-character detection and recognition result from the character recognition result corresponding to that character bounding box and the position of that character bounding box, and generating the single-character detection and recognition result sequence from the generated single-character detection and recognition results, ordered according to the order in the text line image of the characters corresponding to the character bounding box positions.

In a second aspect, an embodiment of the present disclosure provides a character position correction apparatus. The apparatus includes: an acquisition unit configured to acquire a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image, wherein each text line recognition result includes a first character and a first character bounding box position, and each single-character detection and recognition result includes a second character and a second character bounding box position; and a first updating unit configured to perform a position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence.

In some embodiments, the first character bounding box position includes an in-line start position and an in-line end position; and the apparatus further includes:

an average value calculation unit configured to calculate the average in-line character gap of the text line recognition result sequence, wherein the average in-line character gap is, among the text line recognition results in the sequence whose first character bounding box positions have been updated, the average of the distance between the in-line end position of the former and the in-line start position of the latter of each two adjacent text line recognition results;

a second updating unit configured to, for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, update the in-line start position and in-line end position in that text line recognition result according to the average in-line character gap, such that the distance between the updated in-line start position of that text line recognition result and the in-line end position of the text line recognition result preceding it in the sequence equals the average in-line character gap, and/or the distance between the updated in-line end position of that text line recognition result and the in-line start position of the text line recognition result following it in the sequence equals the average in-line character gap.

The device also includes:

a determining unit configured to determine, among the text line recognition results in the text line recognition result sequence whose line start position and line end position have been updated, the minimum of the line start positions and the maximum of the line end positions as the text line start position and the text line end position, respectively;

a third updating unit configured to, for each text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, update the line start position and the line end position in that text line recognition result to the text line start position and the text line end position, respectively.

In a third aspect, embodiments of the present disclosure provide an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.

In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by one or more processors, implements the method described in any implementation of the first aspect.

Description of drawings

Other features, objects and advantages of the present disclosure will become more apparent upon reading the detailed description of non-limiting embodiments taken with reference to the following drawings. The drawings are for illustrative purposes only and are not to be considered limiting of the present disclosure. In the drawings:

FIG. 1 is an exemplary system architecture diagram to which some embodiments of the character position correction method of the present disclosure may be applied;

FIG. 2 is a flowchart of some embodiments of a character position correction method according to the present disclosure;

FIGS. 3A-3C are schematic diagrams of an application scenario of the character position correction method according to the present disclosure;

FIG. 4 is a flowchart of further embodiments of a character position correction method according to the present disclosure;

FIG. 5 is a schematic structural diagram of some embodiments of a character position correction apparatus according to the present disclosure;

FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the present disclosure.

Detailed description

The present disclosure will be further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the related invention, but not to limit the invention. In addition, it should be noted that, for the convenience of description, only the parts related to the related invention are shown in the drawings.

It should be noted that the embodiments of the present disclosure and the features of the embodiments may be combined with each other under the condition of no conflict. The present disclosure will be described in detail below with reference to the accompanying drawings and in conjunction with embodiments.

In the related art, although text line OCR can accurately recognize the characters appearing in a text line image, it provides only a rough estimate of the position of each single character. For example, a CRNN (Convolutional Recurrent Neural Network) mainly infers single-character positions backward from CTC (Connectionist Temporal Classification) information, while a Transformer mainly relies on the attention mechanism. The single-character positions obtained in these ways have low accuracy and cannot be used in scenarios with strict requirements on single-character position. For example, when comparing the differences between two documents (for example, contract documents), the single-character positions must be highly accurate.

To balance character recognition performance and single-character position accuracy, the applicant has found through practice that detection and recognition at single-character granularity yields low character recognition accuracy but high single-character position accuracy. Therefore, the character position correction method, apparatus, electronic device, and storage medium provided by the embodiments of the present disclosure use the single-character positions in the single-character detection and recognition results to correct the single-character positions in the text line recognition results, and can thereby take both character recognition performance and single-character position accuracy into account. This is implemented as follows. First, a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image are acquired, wherein each text line recognition result includes a first character and a first character bounding box position, and each single-character detection and recognition result includes a second character and a second character bounding box position. Then, for each single-character detection and recognition result in the single-character detection and recognition result sequence, a position update operation is performed: the text line recognition result sequence is searched for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result; in response to finding at least one text line recognition result, the text line recognition result closest to the single-character detection and recognition result is determined among the found text line recognition results; and the first character bounding box position in the determined text line recognition result is updated to the second character bounding box position in the single-character detection and recognition result. Updating the character bounding box positions in the text line recognition results in this way improves the accuracy of the character positions in the text line recognition results.

FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the character position correction method, apparatus, electronic device, and storage medium of the present disclosure may be applied.

According to some embodiments of the present disclosure, as shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is a medium used to provide a communication link between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

The user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various communication client applications can be installed on the terminal devices 101, 102 and 103, such as character recognition applications, text processing applications, speech recognition applications, short video social applications, web conferencing applications, web browser applications, shopping applications, search applications, instant messaging tools, email clients, social platform software, etc.

The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, and 103 are hardware, they can be various electronic devices with video capture devices (such as cameras), handwriting pads and display screens, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, etc. When the terminal devices 101, 102 and 103 are software, they can be installed in the electronic devices listed above. They can be implemented as a plurality of software or software modules (for example, to provide character position correction services), or as a single software or software module. There is no specific limitation here.

In some cases, the character position correction method provided by the present disclosure may be performed by the terminal devices 101, 102, and 103, and correspondingly, the character position correction apparatus may be provided in the terminal devices 101, 102, and 103. In this case, the system architecture 100 may not include the server 105.

In some cases, the character position correction method provided by the present disclosure may be executed jointly by the terminal devices 101, 102, 103 and the server 105. For example, the step of "acquiring the text line recognition result sequence and the single-character detection and recognition result sequence corresponding to the text line image" may be performed by the terminal devices 101, 102, and 103, while the step of "performing the position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence" and so on may be performed by the server 105. The present disclosure does not limit this. Correspondingly, the character position correction apparatus may also be provided in the terminal devices 101, 102, 103 and the server 105 respectively.

In some cases, the character position correction method provided by the present disclosure may be executed by the server 105. Correspondingly, the character position correction apparatus may also be provided in the server 105. In this case, the system architecture 100 may not include the terminal devices 101, 102, and 103.

It should be noted that the server 105 may be hardware or software. When the server 105 is hardware, it can be implemented as a distributed server cluster composed of multiple servers, or can be implemented as a single server. When the server 105 is software, it can be implemented as a plurality of software or software modules (for example, for providing distributed services), or can be implemented as a single software or software module. There is no specific limitation here.

It should be understood that the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.

Continuing to refer to FIG. 2, there is shown a flow 200 of some embodiments of a character position correction method according to the present disclosure, the character position correction method comprising the following steps:

Step 201: Obtain a text line recognition result sequence and a word detection and recognition result sequence corresponding to the text line image.

In this embodiment, the execution body of the character position correction method (for example, the terminal devices 101, 102, and 103 shown in FIG. 1) may acquire the text line recognition result sequence and the single-character detection and recognition result sequence corresponding to the text line image locally, or remotely from another electronic device connected to the execution body over a network (for example, the server 105 shown in FIG. 1).

Here, the text line image may be an image including a text line object. Here, each character in the text line object may have the same size or may have different sizes. Each character in the text line object may be composed of characters of the same language, or may be composed of characters of more than one language, which is not specifically limited in the present disclosure.

The text line recognition result sequence corresponding to the text line image may be obtained by recognizing the text line image using text line OCR technology. The text line recognition result sequence is a sequence of text line recognition results, and each text line recognition result may include a first character and a first character bounding box position. Here, the first character bounding box position indicates the position range of the first character in the text line image. The text line recognition results in the sequence may be arranged according to the order in which their first characters appear in the text line object in the text line image. In practice, the circumscribed rectangle of a character in the text line image is usually used as the bounding box of the character. Accordingly, the first character bounding box position can represent the circumscribed rectangle of the first character in the text line image in various ways. For example, the first character bounding box position may include the coordinates of the four vertices of the circumscribed rectangle of the first character in the text line image; as another example, it may include the coordinates of the top-left vertex of the circumscribed rectangle of the first character in the text line image, together with the lengths of the long and short sides of the circumscribed rectangle.

The single-character detection and recognition result sequence corresponding to the text line image may be obtained by recognizing the text line image using OCR technology based on single-character detection and recognition. The single-character detection and recognition result sequence is a sequence of single-character detection and recognition results, and each single-character detection and recognition result may include a second character and a second character bounding box position. Here, the second character bounding box position indicates the position range of the second character in the text line image. The single-character detection and recognition results in the sequence may be arranged according to the order in which their second characters appear in the text line object in the text line image. As with the first character bounding box, the second character bounding box position may represent the circumscribed rectangle of the second character in the text line image in various ways, which will not be repeated here.
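The two bounding box representations mentioned for the first character can be converted into each other, as the following sketch illustrates. It assumes axis-aligned rectangles and image coordinates with the origin at the top-left, and it uses width and height in place of "long and short sides"; the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def corners_to_xywh(corners: List[Point]) -> Tuple[float, float, float, float]:
    """Four vertices -> (top-left x, top-left y, width, height)."""
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    x, y = min(xs), min(ys)
    return (x, y, max(xs) - x, max(ys) - y)


def xywh_to_corners(x: float, y: float, w: float, h: float) -> List[Point]:
    """(top-left x, top-left y, width, height) -> four vertices, clockwise
    starting from the top-left vertex."""
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
```

Either form carries the same information for an axis-aligned box, so the position update operation can work with whichever representation the recognizers produce.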

In some embodiments, the text line recognition result sequence corresponding to the text line image may be obtained in the following manner:

Input the text line image into the pre-trained text line recognition model, and obtain the text line recognition result sequence corresponding to the text line image. Here, the text line recognition model can be used to characterize the correspondence between the image to be recognized and the text line recognition result sequence. For example, a text line recognition model can be obtained by training a machine learning model on a large number of training samples. The training sample may include a sample text line image and corresponding label information, and the label information may include specific characters and corresponding character positions of each character in the sample text line image.

In some embodiments, the text line recognition model may include a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN) and Connectionist Temporal Classification (CTC), arranged in sequence. Specifically, the text line image can be input into the CNN to obtain a feature image; the feature image can then be input into the RNN (which may specifically be a deep bidirectional LSTM network) to extract text sequence features on the basis of the convolutional features; and the text line recognition result sequence is then derived from the text sequence features via CTC.

In some embodiments, the text line recognition model may also include a convolutional neural network and an attention-based recurrent neural network, arranged in sequence.
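This section does not say how character positions are derived from the CTC output. As a hedged illustration of why such positions tend to be rough, the sketch below performs greedy CTC decoding (merge repeated labels, drop blanks) on per-frame labels and maps each character's frame span to a horizontal range, assuming every frame covers an equal slice of the image width. The function name, the blank index 0, and the equal-width mapping are all assumptions.

```python
from typing import List, Tuple

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode_with_spans(
    frame_labels: List[int], alphabet: List[str], image_width: float
) -> List[Tuple[str, float, float]]:
    """Collapse per-frame argmax labels into characters and estimate an
    (x_start, x_end) range for each character from its frame span."""
    if not frame_labels:
        return []
    frame_w = image_width / len(frame_labels)
    spans: List[List] = []  # each entry: [char, first_frame, last_frame]
    prev = BLANK
    for t, label in enumerate(frame_labels):
        if label != BLANK:
            if label != prev:
                spans.append([alphabet[label], t, t])  # a new character starts
            else:
                spans[-1][2] = t                       # same character continues
        prev = label
    return [(c, s * frame_w, (e + 1) * frame_w) for c, s, e in spans]
```

Because the estimated range depends on which frames happen to emit the label rather than on the glyph's actual extent, positions obtained this way are only approximate, which is the limitation the position update operation is meant to address.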

In some embodiments, the sequence of word detection and recognition results corresponding to the text line image may be obtained in the following manner:

First, single-word detection is performed on the text line image by using the target detection algorithm to obtain the position of at least one character bounding box.

Secondly, a character image is cropped from the text line image according to each detected character bounding box position, and the cropped character image is input into the single-character recognition model to obtain the corresponding character recognition result. For example, the single-character recognition model can be obtained by training a machine learning model on a large number of single-character training samples. A single-character training sample may include a sample single-character image and the corresponding character.

Finally, for each detected character bounding box, a single-character detection and recognition result is generated from the character recognition result corresponding to that character bounding box and the position of that character bounding box, and the single-character detection and recognition result sequence is generated from the generated single-character detection and recognition results, ordered according to the order in the text line image of the characters corresponding to the character bounding box positions.
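The three steps above can be sketched as a small pipeline. The `detect` and `recognize` callables stand in for the target detection algorithm and the single-character recognition model, which are not specified here; the image is modelled as rows of pixel values, boxes as `(x, y, width, height)`, and the ordering assumes horizontal left-to-right text. All of these are assumptions made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # (x, y, width, height), assumed layout
Image = Sequence[Sequence[int]]      # rows of pixel values


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the character image out of the text line image."""
    x, y, w, h = box
    return [list(row[x:x + w]) for row in image[y:y + h]]


def build_detection_sequence(
    image: Image,
    detect: Callable[[Image], List[Box]],
    recognize: Callable[[List[List[int]]], str],
) -> List[Tuple[str, Box]]:
    """Detect boxes, recognize each cropped character, and order the
    results by the horizontal position of their boxes."""
    boxes = detect(image)
    results = [(recognize(crop(image, b)), b) for b in boxes]
    results.sort(key=lambda r: r[1][0])  # left-to-right order (assumed)
    return results
```

Sorting by the left edge is only one way to recover character order; vertical or right-to-left text would need a different key.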

Step 202, for each word detection and recognition result in the sequence of word detection and recognition results, perform a position update operation.

In this embodiment, the above-mentioned execution subject may perform a position update operation on each word detection and recognition result in the sequence of word detection and recognition results obtained in step 201 . In some embodiments, the location update operation may specifically include the following sub-steps 2021 to 2023:

Sub-step 2021, search for a text line recognition result in which the first character is the same as the second character in the word detection and recognition result in the text line recognition result sequence.

Sub-step 2022, in response to finding at least one text line recognition result, among the found text line recognition results, determine the text line recognition result that is closest to the word detection and recognition result.

Here, various implementation manners can be used to determine the distance between the text line recognition result and the word detection recognition result.

For example, the difference between the index of the text line recognition result in the text line recognition result sequence and the index of the single-character detection and recognition result in the single-character detection and recognition result sequence can be used as the distance between the text line recognition result and the single-character detection and recognition result.

For another example, the distance between the position of the first character bounding box in the text line recognition result and the position of the second character bounding box in the word detection and recognition result may be used as the distance between the text line recognition result and the word detection and recognition result.

Sub-step 2023: update the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single-character detection and recognition result.
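Sub-steps 2021 to 2023 can be sketched as follows, using the index difference between the two sequences as the distance. The `Result` class, the box layout, and the returned list of flags are hypothetical; the sketch does not prevent one text line recognition result from being chosen twice, since this section does not state such a rule.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x, y, width, height)


@dataclass
class Result:
    char: str
    box: Box


def update_positions(line_results: List[Result], det_results: List[Result]) -> List[bool]:
    """Correct boxes in line_results in place; return which entries were updated."""
    updated = [False] * len(line_results)
    for j, det in enumerate(det_results):
        # Sub-step 2021: find text line results with the same character.
        candidates = [i for i, r in enumerate(line_results) if r.char == det.char]
        if not candidates:
            continue  # no match: sub-steps 2022 and 2023 are skipped
        # Sub-step 2022: pick the closest one by index difference.
        nearest = min(candidates, key=lambda i: abs(i - j))
        # Sub-step 2023: take over the more accurate detection box.
        line_results[nearest].box = det.box
        updated[nearest] = True
    return updated
```

The other distance mentioned above, based on the bounding box positions themselves, could be substituted by changing the `key` function, for example to the absolute difference between box centers.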

The above location update operation is described below with specific examples:

FIG. 3A shows the text line recognition result sequence 302 and the single-character detection and recognition result sequence 303 corresponding to the text line image 301. As can be seen from the figure, the first characters in the text line recognition result sequence 302 are recognized correctly, but the first character bounding box positions are rough. By contrast, the second character bounding box positions in the single-character detection and recognition result sequence 303 are relatively accurate, but some second characters are recognized incorrectly.

Executing step 202 based on the above-mentioned text line recognition result sequence 302 and word detection and recognition result sequence 303 may be:

For each word detection and recognition result in the single-word detection and recognition result sequence 303, a position update operation is performed.

For the second characters "head" and "area" in the single-character detection and recognition result sequence 303, executing sub-step 2021 finds no identical first character in the text line recognition result sequence 302, so sub-steps 2022 and 2023 are not executed.

For the "day" that appears first in the second character in the word detection and recognition result sequence 303, when sub-step 2021 is executed, it is found that there are two first characters corresponding to the same "day" in the text line recognition result. When performing sub-step 2022, it is determined that the distance between the word detection and recognition results corresponding to the "day" that appears first in the second character in the single-word detection and recognition result sequence 303 is the closest in the text line recognition results. The text line recognition result corresponding to the first first character "day" of The position of the bounding box of the first character is updated to the position of the second bounding box corresponding to the "day" that appears first in the second character in the single-word detection and recognition result sequence 303 .

In the same way, the process is similar for the second occurrence of the second character "tian" ("day") in the single character detection and recognition result sequence 303. Finally, the first character bounding box position in the text line recognition result corresponding to the second occurrence of the first character "tian" in the text line recognition result sequence 302 is updated to the second character bounding box position corresponding to the second occurrence of "tian" in the single character detection and recognition result sequence 303.

For the second characters "de" (的), "qi", and "bao" ("report") in the single character detection and recognition result sequence 303, when sub-step 2021 is executed, a single text line recognition result whose first character is the same is found for each of "de", "qi", and "bao". Therefore, when sub-step 2022 is executed, the text line recognition results whose first characters are "de", "qi", and "bao" can be directly determined as the corresponding closest text line recognition results. Then, when sub-step 2023 is executed, the first character bounding box positions in the text line recognition results whose first characters are "de", "qi", and "bao" in the text line recognition result sequence 302 are updated to the second character bounding box positions in the single character detection and recognition results whose second characters are "de", "qi", and "bao", respectively.

After step 202, the text line recognition result sequence 302 is as shown in the lower text line recognition result sequence 302 in FIG. 3A. It can be seen that, except for the first character bounding box positions corresponding to "jin" ("now") and "pre", whose second characters were recognized incorrectly in the single character detection and recognition results, the remaining first character bounding box positions are more accurate, which makes the result more suitable for scenarios with higher requirements on character position.

The character position correction method provided by the above embodiments of the present disclosure relies on two observations: the character bounding box positions in the single character detection and recognition results corresponding to the text line image are more accurate than those in the text line recognition results, while the characters in the text line recognition results are more accurate than those in the single character detection and recognition results. The character bounding box positions in the text line recognition results are therefore corrected with the character bounding box positions of the correctly recognized characters in the single character detection and recognition results. This improves the accuracy of the character bounding box positions in the text line recognition results and makes them more suitable for scenarios with high requirements on character position.
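The position update operation described above (sub-steps 2021 to 2023) can be sketched as follows. This is only a minimal illustration, not the disclosed implementation: the `CharResult` type, the `(x_min, y_min, x_max, y_max)` box layout, the `updated` flag, and the use of the horizontal distance between box centers as the "closest" criterion are all assumptions introduced here, since the passage above does not fix a particular distance measure.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max), origin at the top-left corner.
Box = Tuple[float, float, float, float]


@dataclass
class CharResult:
    """One recognition result: a character and its bounding box (hypothetical type)."""
    char: str
    box: Box
    updated: bool = False  # hypothetical flag: set once the box has been corrected


def center_x(box: Box) -> float:
    """Horizontal center of a box; used here as an assumed distance reference."""
    return (box[0] + box[2]) / 2.0


def position_update(line_results: List[CharResult],
                    char_results: List[CharResult]) -> None:
    """Correct boxes in line_results in place using boxes from char_results."""
    for detected in char_results:
        # Sub-step 2021: look for line results with the same character.
        candidates = [r for r in line_results if r.char == detected.char]
        if not candidates:
            continue  # no match: sub-steps 2022 and 2023 are skipped
        # Sub-step 2022: pick the candidate closest to the detected character.
        nearest = min(
            candidates,
            key=lambda r: abs(center_x(r.box) - center_x(detected.box)),
        )
        # Sub-step 2023: overwrite the rough box with the detected box.
        nearest.box = detected.box
        nearest.updated = True
```

With the example of FIG. 3A, the two "tian" detections would each be matched to the nearer of the two "tian" line results, while the misrecognized "head" and "area" detections would leave "jin" and "pre" untouched.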

Continuing to refer to FIG. 4 , a flow 400 of further embodiments of the character position correction method according to the present disclosure is shown. The character position correction method includes the following steps:

Step 401: Obtain a text line recognition result sequence and a word detection and recognition result sequence corresponding to the text line image.

Step 402, for each word detection and recognition result in the word detection and recognition result sequence, perform a position update operation.

In this embodiment, the specific operations and technical effects of steps 401 and 402 are basically the same as those of steps 201 and 202 in the embodiment shown in FIG. 2, and will not be repeated here.

Step 403: Calculate the average in-line character gap of the text line recognition result sequence.

In this embodiment, the position of the bounding box of the first character in each text line recognition result in the text line recognition result sequence may include an in-line start position and an in-line end position. Here, the in-line start position and in-line end position in the bounding box position of the first character are respectively used to represent the minimum coordinate value and the maximum coordinate value of the circumscribed rectangle of the first character in the text line image in the direction parallel to the text line.

For example, when the text line in the text line image is in the horizontal direction, the coordinate origin of the text line image is the upper left vertex of the text line image, and the characters in the text line are arranged horizontally from left to right, the in-line start position in the first character bounding box position may be the abscissa of the upper left or lower left vertex of the circumscribed rectangle of the first character, and the in-line end position may be the abscissa of the upper right or lower right vertex of the circumscribed rectangle of the first character.

For another example, when the text line in the text line image is in the vertical direction, the coordinate origin of the text line image is the upper left vertex of the text line image, and the characters in the text line are arranged vertically from top to bottom, the in-line start position in the first character bounding box position may be the ordinate of the upper left or upper right vertex of the circumscribed rectangle of the first character, and the in-line end position may be the ordinate of the lower left or lower right vertex of the circumscribed rectangle of the first character.

Here, the text line in the text line image is not limited to a specific direction. For example, a text line can be arranged horizontally from left to right, vertically from top to bottom, or from top left to bottom right.
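The two examples above can be illustrated with a small helper that reads the in-line start and end positions from a circumscribed rectangle. It is a sketch under stated assumptions: the function name `inline_extent`, the `(x_min, y_min, x_max, y_max)` rectangle layout, and the `horizontal` switch are hypothetical, and only axis-aligned horizontal and vertical text lines are covered (diagonal text lines would need a projection onto the line direction, which is not shown).

```python
from typing import Tuple

# Assumed rectangle layout: (x_min, y_min, x_max, y_max), origin at the top-left corner.
Rect = Tuple[float, float, float, float]


def inline_extent(rect: Rect, horizontal: bool = True) -> Tuple[float, float]:
    """Return (in-line start, in-line end) of a character's circumscribed rectangle.

    For a horizontal text line the extent is taken along x (abscissa);
    for a vertical text line it is taken along y (ordinate).
    """
    x_min, y_min, x_max, y_max = rect
    return (x_min, x_max) if horizontal else (y_min, y_max)
```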

In this embodiment, the average in-line character gap of the text line recognition result sequence is computed over the text line recognition results whose first character bounding box position has been updated. For every two adjacent text line recognition results among them, the distance between the in-line end position in the preceding text line recognition result and the in-line start position in the following text line recognition result is taken, and these distances are averaged.

For details, refer to FIG. 3B. The upper part 302 in FIG. 3B is an enlarged view of the first character bounding box positions in the text line recognition result sequence after the first character bounding box positions have been corrected in step 202, as shown in FIG. 3A.

As can be seen from FIG. 3B, the text line recognition results in the text line recognition result sequence whose first character bounding box position has been updated are those whose first characters are "tian" (天), "de" (的), "tian", "qi", and "bao" ("report"), respectively. The character gap between the adjacent first characters "tian" and "de" is d1, the character gap between the adjacent first characters "de" and "tian" is d2, and the character gap between the adjacent first characters "tian" and "qi" is d3. The average in-line character gap of the text line recognition result sequence is the average of d1, d2, and d3, that is, d0=(d1+d2+d3)/3.
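The calculation of d0 can be sketched as below. This is an illustrative sketch only: the `Span` tuple of `(in-line start, in-line end, updated flag)` and the function name `average_inline_gap` are assumptions made here, and returning `None` when no pair of adjacent updated results exists is a choice of this sketch rather than something the description specifies.

```python
from typing import List, Optional, Tuple

# Assumed representation: (in-line start, in-line end, box-was-updated flag).
Span = Tuple[float, float, bool]


def average_inline_gap(spans: List[Span]) -> Optional[float]:
    """Average gap between adjacent results that have both been updated."""
    gaps = [
        nxt[0] - prev[1]              # next start minus previous end
        for prev, nxt in zip(spans, spans[1:])
        if prev[2] and nxt[2]         # only pairs where both boxes were updated
    ]
    return sum(gaps) / len(gaps) if gaps else None
```

In the FIG. 3B example, only the pairs "tian"/"de", "de"/"tian", and "tian"/"qi" qualify, because "qi" and "bao" are separated by the not-yet-updated "pre".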

Step 404: For a text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, update the in-line start position and in-line end position in the text line recognition result.

In this embodiment, for a text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, the above-mentioned execution body may update the in-line start position and in-line end position in that text line recognition result according to the average in-line character gap calculated in step 403. After the update, the distance between the in-line start position of the text line recognition result and the in-line end position of the preceding text line recognition result in the text line recognition result sequence is the calculated average in-line character gap, and/or the distance between the in-line end position of the text line recognition result and the in-line start position of the following text line recognition result in the text line recognition result sequence is the calculated average in-line character gap.

Here, the text line recognition result sequence 302 shown in FIG. 3B is again taken as an example. It can be seen from the relevant description of the embodiment of FIG. 2 that, after step 202 or step 402, the text line recognition results in the text line recognition result sequence 302 whose in-line start position and in-line end position have not been updated are those corresponding to "jin" ("now") and "pre", whose second characters were recognized incorrectly in the single character detection and recognition results. In step 404, the in-line start position and in-line end position in the first character bounding box positions of the text line recognition results whose first characters are "jin" and "pre" can be updated according to the calculated average in-line character gap d0, such that: the distance between the updated in-line end position corresponding to the first character "jin" and the in-line start position corresponding to the first character "tian" (天) is d0; and the distance between the updated in-line start position corresponding to the first character "pre" and the in-line end position corresponding to the first character "qi" is d0, and/or the distance between the updated in-line end position corresponding to the first character "pre" and the in-line start position corresponding to the first character "bao" ("report") is d0. The updated text line recognition result sequence may be as shown in the lower part 302 in FIG. 3B.

It can be seen intuitively from FIG. 3B that the first character bounding box positions in the text line recognition result sequence 302 shown in the lower part of FIG. 3B are more accurate than those in the text line recognition result sequence 302 shown in the upper part of FIG. 3B. Specifically, the spacing between characters in the text line is more uniform.

It can be seen from the above description that, for a text line recognition result in the text line recognition result sequence whose first character bounding box position was not updated in step 402, the in-line start position and in-line end position are further corrected by steps 403 and 404. Compared with before the update, the updated text line recognition result is closer to the character gap of the entire text line recognition result sequence. In practice, the characters in a text line tend to be equally spaced in most cases. Therefore, after the operations of steps 403 and 404, the accuracy of the first character bounding box positions in the text line recognition result sequence is further improved.
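Step 404 can be sketched as follows, under several assumptions that are not stated in the description: the `Span` type and the function name `fill_inline_positions` are hypothetical, only neighbors whose positions have already been updated are used as references, a side with no such neighbor is left unchanged, and the `updated` flag is deliberately not modified so that later steps can still tell which results were corrected from detection boxes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """In-line extent of one text line recognition result (hypothetical type)."""
    start: float
    end: float
    updated: bool  # True if the box was corrected in the position update operation


def fill_inline_positions(spans: List[Span], d0: float) -> None:
    """Re-position not-yet-updated spans so they sit d0 away from updated neighbors."""
    for i, span in enumerate(spans):
        if span.updated:
            continue
        prev = spans[i - 1] if i > 0 else None
        nxt = spans[i + 1] if i + 1 < len(spans) else None
        # Start is placed d0 after the end of the preceding updated result.
        if prev is not None and prev.updated:
            span.start = prev.end + d0
        # End is placed d0 before the start of the following updated result.
        if nxt is not None and nxt.updated:
            span.end = nxt.start - d0
```

For the FIG. 3B example, "jin" has no preceding result, so only its in-line end position moves, while "pre" has updated neighbors on both sides and both of its in-line positions are adjusted.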

In some embodiments, the above-mentioned execution body may also perform the following steps 405 and 406 after performing step 402, or after performing step 404:

Step 405: Determine the minimum value among the line start positions and the maximum value among the line end positions of the text line recognition results in the text line recognition result sequence whose line start position and line end position have been updated as the text line start position and the text line end position, respectively.

Here, the first character bounding box position in each text line recognition result in the text line recognition result sequence may further include a line start position and a line end position.

Here, the line start position and the line end position in the bounding box of the first character are respectively used to represent the minimum coordinate value and the maximum coordinate value of the circumscribed rectangle of the first character in the text line image in the direction perpendicular to the text line.

For example, when the text line in the text line image is in the horizontal direction, the coordinate origin of the text line image is the upper left vertex of the text line image, and the characters in the text line are arranged horizontally from left to right, the line start position in the first character bounding box position may be the ordinate of the upper left or upper right vertex of the circumscribed rectangle of the first character, and the line end position may be the ordinate of the lower left or lower right vertex of the circumscribed rectangle of the first character.

For another example, when the text line in the text line image is in the vertical direction, the coordinate origin of the text line image is the upper left vertex of the text line image, and the characters in the text line are arranged vertically from top to bottom, the line start position in the first character bounding box position may be the abscissa of the upper left or lower left vertex of the circumscribed rectangle of the first character, and the line end position may be the abscissa of the upper right or lower right vertex of the circumscribed rectangle of the first character.

Specifically, reference may be made to FIG. 3C. The upper part 302 in FIG. 3C shows the text line recognition result sequence 302 shown in the lower part of FIG. 3B. In the upper text line recognition result sequence 302 in FIG. 3C, the text line recognition results whose line start position and line end position have been updated are those whose first characters are "tian" (天), "de" (的), "tian", "qi", and "bao" ("report"). The minimum value among the line start positions of these text line recognition results is the line start position y1 corresponding to the first character "tian", that is, the ordinate of the upper side of the circumscribed rectangle of the first character "tian". The maximum value among the line end positions of these text line recognition results is the line end position y2 corresponding to the first character "bao", that is, the ordinate of the lower side of the circumscribed rectangle of the first character "bao". Therefore, y1 and y2 can be determined as the text line start position and the text line end position, respectively.

Step 406: For a text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, update the line start position and line end position in the text line recognition result with the text line start position and the text line end position, respectively.

For example, for the text line recognition results in the upper text line recognition result sequence 302 in FIG. 3C whose line start position and line end position have not been updated, namely those whose first characters are "jin" ("now") and "pre", the line start position and line end position in the first character bounding box position are updated to y1 and y2, respectively. The updated text line recognition result sequence 302 is shown in the lower part of FIG. 3C.
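Steps 405 and 406 can be sketched as follows. The `LineExtent` type and the function name `align_line_extents` are hypothetical names introduced for illustration, and doing nothing when no result has been updated is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineExtent:
    """Extent of one result perpendicular to the text line (hypothetical type)."""
    line_start: float
    line_end: float
    updated: bool  # True if the box was corrected in the position update operation


def align_line_extents(extents: List[LineExtent]) -> None:
    """Give not-yet-updated results the overall line start/end of the updated ones."""
    updated = [e for e in extents if e.updated]
    if not updated:
        return  # nothing reliable to align to (assumed behavior)
    # Step 405: text line start = minimum line start, text line end = maximum line end.
    text_line_start = min(e.line_start for e in updated)
    text_line_end = max(e.line_end for e in updated)
    # Step 406: apply them to every result that was not updated.
    for e in extents:
        if not e.updated:
            e.line_start = text_line_start
            e.line_end = text_line_end
```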

It can be seen intuitively from FIG. 3C that the first character bounding box positions in the text line recognition result sequence 302 shown in the lower part of FIG. 3C are more accurate than those in the text line recognition result sequence 302 shown in the upper part of FIG. 3C. Specifically, the line height of each character is closer to the real situation.

It can be seen from the above description that, for a text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, the line start position and line end position are further corrected by steps 405 and 406. Compared with before the update, the updated text line recognition result is closer to the overall line height range of the text line recognition result sequence. In practice, in most cases the characters in a text line lie in the same line and within the corresponding line height range, that is, between a certain line start position and line end position. Therefore, after the operations of steps 405 and 406, the accuracy of the first character bounding box positions in the text line recognition result sequence is further improved.

As can be seen from FIG. 4, compared with the embodiment corresponding to FIG. 2, the flow 400 of the character position correction method in this embodiment additionally updates, according to the average in-line character gap, the in-line start position and in-line end position of the text line recognition results whose in-line start position and in-line end position have not been updated. In some embodiments, it also updates the line start position and line end position of the text line recognition results whose line start position and line end position have not been updated. Therefore, the solution described in this embodiment can further improve the accuracy of the first character bounding box positions in the text line recognition result sequence.

Further referring to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of a character position correction apparatus. The apparatus embodiments correspond to the method embodiments shown in FIG. 2, and the apparatus can be specifically applied to various electronic devices.

As shown in FIG. 5, the character position correction apparatus 500 in this embodiment includes an acquisition unit 501 and a first update unit 502. The acquisition unit 501 is configured to acquire a text line recognition result sequence and a single character detection and recognition result sequence corresponding to a text line image, wherein the text line recognition result includes a first character and a first character bounding box position, and the single character detection and recognition result includes a second character and a second character bounding box position. The first update unit 502 is configured to perform the following position update operation for each single character detection and recognition result in the single character detection and recognition result sequence: searching the text line recognition result sequence for a text line recognition result whose first character is the same as the second character in the single character detection and recognition result; in response to finding at least one text line recognition result, determining, among the found text line recognition results, the text line recognition result closest to the single character detection and recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single character detection and recognition result.

In this embodiment, for the specific processing of the acquisition unit 501 and the first update unit 502 of the character position correction apparatus 500 and the technical effects brought about by them, refer to the relevant descriptions of steps 201 and 202 in the embodiment corresponding to FIG. 2, respectively. They are not repeated here.

In some embodiments, the first character bounding box position may include an in-line start position and an in-line end position; and the apparatus 500 may further include:

The average value calculation unit 503 is configured to calculate the average in-line character gap of the text line recognition result sequence, wherein the average in-line character gap of the text line recognition result sequence is the average, over every two adjacent text line recognition results among the text line recognition results whose first character bounding box position has been updated, of the distance between the in-line end position in the preceding text line recognition result and the in-line start position in the following text line recognition result;

The second update unit 504 is configured to, for a text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, update the in-line start position and in-line end position in the text line recognition result according to the average in-line character gap, wherein the distance between the updated in-line start position of the text line recognition result and the in-line end position of the preceding text line recognition result in the text line recognition result sequence is the average in-line character gap, and/or the distance between the updated in-line end position of the text line recognition result and the in-line start position of the following text line recognition result in the text line recognition result sequence is the average in-line character gap.

In some embodiments, the first character bounding box position may include a line start position and a line end position; and the apparatus 500 may further include:

The determining unit 505 is configured to determine the minimum value among the line start positions and the maximum value among the line end positions of the text line recognition results in the text line recognition result sequence whose line start position and line end position have been updated as the text line start position and the text line end position, respectively;

The third update unit 506 is configured to, for a text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, update the line start position and line end position in the text line recognition result with the text line start position and the text line end position, respectively.

In some embodiments, the text line recognition model may include a convolutional neural network, a recurrent neural network, and connectionist temporal classification (CTC), connected in sequence.

In some embodiments, the text line recognition model may include a convolutional neural network and an attention-based recurrent neural network, connected in sequence.

In some embodiments, the word detection and recognition result sequence corresponding to the text line image can be obtained in the following manner:

It should be noted that, for the implementation details and technical effects of each unit in the character position correction device provided by the embodiments of the present disclosure, reference may be made to the descriptions of other embodiments in the present disclosure, and details are not repeated here.

Referring now to FIG. 6 , a schematic diagram of the structure of a computer system 600 suitable for implementing the electronic device of the present disclosure is shown. The computer system 600 shown in FIG. 6 is merely an example, and should not impose any limitations on the functionality and scope of use of the embodiments of the present disclosure.

As shown in FIG. 6, the computer system 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the computer system 600. The processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.

Generally, the following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, etc.; output devices 607 including, for example, a liquid crystal display (LCD), speakers, vibrators, etc.; storage devices 608 including, for example, magnetic tapes, hard disks, etc.; and a communication device 609. The communication device 609 may allow the computer system 600 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 6 illustrates a computer system 600 of an electronic device having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.

In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication device 609, or from the storage device 608, or from the ROM 602. When the computer program is executed by the processing apparatus 601, the above-described functions defined in the methods of the embodiments of the present disclosure are executed.

It should be noted that the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. The computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the above. More specific examples of computer-readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fibers, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave, with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.

The above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.

The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to implement the character position correction method shown in the embodiment of FIG. 2 and its implementations, and/or the character position correction method shown in the embodiment of FIG. 4 and its implementations.

Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented in dedicated hardware-based systems that perform the specified functions or operations, or can be implemented in a combination of dedicated hardware and computer instructions.

The units involved in the embodiments of the present disclosure may be implemented in software or hardware. The name of a unit does not, under certain circumstances, constitute a limitation of the unit itself. For example, the acquisition unit can also be described as "a unit that acquires the text line recognition result sequence and the single character detection and recognition result sequence corresponding to the text line image".

The above description is merely a preferred embodiment of the present disclosure and an illustration of the technical principles employed. Those skilled in the art should understand that the scope of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover, without departing from the above disclosed concept, other technical solutions formed by any combination of the above technical features or their equivalents. An example is a technical solution formed by replacing the above features with technical features having similar functions that are disclosed in (but not limited to) the present disclosure.

Claims

1. A character position correction method, comprising:

acquiring a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image, wherein each text line recognition result includes a first character and a first character bounding box position, and each single-character detection and recognition result includes a second character and a second character bounding box position; and

performing a position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence.
2. The method according to claim 1, wherein the position update operation comprises: searching, in the text line recognition result sequence, for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result; in response to finding at least one text line recognition result, determining, among the found text line recognition results, the text line recognition result that is closest to the single-character detection and recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single-character detection and recognition result.
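By way of non-limiting illustration, the position update operation recited above might be sketched as follows. The claim does not say how "closest" is measured; this sketch assumes the Euclidean distance between bounding box centers. The `Result` type, the `(x0, y0, x1, y1)` box layout, the `updated` flag and the function names are hypothetical and are introduced only for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed box layout: (x0, y0, x1, y1). The claims do not fix a representation.
Box = Tuple[float, float, float, float]


@dataclass
class Result:
    char: str              # recognized character
    box: Box               # character bounding box position
    updated: bool = False  # hypothetical flag: box was replaced by a detected box


def _center_distance(a: Box, b: Box) -> float:
    """Euclidean distance between box centers (an assumed notion of 'closest')."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5


def update_positions(line_results: List[Result], char_results: List[Result]) -> None:
    """For each detected character, overwrite the box of the nearest
    text line recognition result that carries the same character."""
    for det in char_results:
        matches = [r for r in line_results if r.char == det.char]
        if not matches:
            continue  # no text line result with this character: nothing to update
        nearest = min(matches, key=lambda r: _center_distance(r.box, det.box))
        nearest.box = det.box
        nearest.updated = True
```

In this sketch a detected character that has no same-character counterpart in the text line recognition result sequence is simply skipped, so the characters output by the text line recognizer are never changed; only their positions are.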
3. The method of claim 1, wherein the first character bounding box position includes an in-line start position and an in-line end position; and

the method further comprises:

calculating an average in-line character gap of the text line recognition result sequence, wherein the average in-line character gap is the average, over each pair of adjacent text line recognition results among the text line recognition results in the sequence whose first character bounding box position has been updated, of the distance between the in-line end position in the preceding text line recognition result and the in-line start position in the following text line recognition result; and

for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, updating the in-line start position and the in-line end position in the text line recognition result according to the average in-line character gap, such that at least one of the following holds: the distance between the updated in-line start position of the text line recognition result and the in-line end position of the text line recognition result preceding it in the sequence is the average in-line character gap; or the distance between the updated in-line end position of the text line recognition result and the in-line start position of the text line recognition result following it in the sequence is the average in-line character gap.
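By way of non-limiting illustration, the gap-based correction recited above might be sketched as follows. The sketch assumes that "adjacent" means consecutive in the sequence with both results already updated, and that in-line positions are scalars increasing along the reading direction. The `Result` type and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    char: str
    start: float           # in-line start position
    end: float             # in-line end position
    updated: bool = False  # True if the position came from single-character detection


def average_gap(results: List[Result]) -> Optional[float]:
    """Mean distance between the end of one updated result and the start of the
    next, over consecutive pairs in which both results were updated."""
    gaps = [b.start - a.end
            for a, b in zip(results, results[1:])
            if a.updated and b.updated]
    return sum(gaps) / len(gaps) if gaps else None


def fill_unupdated(results: List[Result]) -> None:
    """Place each non-updated result one average gap away from its neighbours."""
    gap = average_gap(results)
    if gap is None:
        return  # no pair of adjacent updated results, so no gap estimate
    for i, r in enumerate(results):
        if r.updated:
            continue
        if i > 0:
            r.start = results[i - 1].end + gap
        if i + 1 < len(results):
            r.end = results[i + 1].start - gap
```

Neighbours are used as they stand at the time a result is processed, whether or not they were themselves updated; an implementation could equally restrict itself to updated neighbours.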
4. The method of any one of claims 1-3, wherein the first character bounding box position includes a line start position and a line end position; and

the method further comprises:

determining the minimum of the line start positions and the maximum of the line end positions of the text line recognition results in the text line recognition result sequence whose line start position and line end position have been updated as a text line start position and a text line end position, respectively; and

for each text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, updating the line start position and the line end position in the text line recognition result to the text line start position and the text line end position, respectively.
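By way of non-limiting illustration, the line start and line end correction recited above might be sketched as follows. The sketch treats the line start position and the line end position as plain scalars and does not assume which image axis they lie on. The `Result` type and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Result:
    char: str
    line_start: float      # line start position of the character box
    line_end: float        # line end position of the character box
    updated: bool = False  # True if the position came from single-character detection


def align_line_extent(results: List[Result]) -> None:
    """Give every non-updated result the overall extent spanned by the updated ones."""
    trusted = [r for r in results if r.updated]
    if not trusted:
        return  # nothing to derive the text line extent from
    text_line_start = min(r.line_start for r in trusted)
    text_line_end = max(r.line_end for r in trusted)
    for r in results:
        if not r.updated:
            r.line_start = text_line_start
            r.line_end = text_line_end
```

Results that were already updated keep their own extents; only the results that single-character detection did not reach are set to the shared text line start and end positions.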
5. The method according to claim 1, wherein the text line recognition result sequence corresponding to the text line image is obtained in the following manner:

inputting the text line image into a pre-trained text line recognition model to obtain the text line recognition result sequence corresponding to the text line image, wherein the text line recognition model is used to represent a correspondence between an image to be recognized and a text line recognition result sequence.
6. The method of claim 5, wherein the text line recognition model comprises, in sequence, a convolutional neural network, a recurrent neural network, and connectionist temporal classification (CTC).
7. The method of claim 5, wherein the text line recognition model comprises, in sequence, a convolutional neural network and an attention-based recurrent neural network.
8. The method according to claim 1, wherein the single-character detection and recognition result sequence corresponding to the text line image is obtained in the following manner:

performing single-character detection on the text line image using a target detection algorithm to obtain at least one character bounding box position;

cropping a character image from the text line image according to each detected character bounding box position, and inputting the cropped character image into a single-character recognition model to obtain a corresponding character recognition result; and

for each detected character bounding box, generating a single-character detection and recognition result from the character recognition result corresponding to the character bounding box and the position of the character bounding box, and generating the single-character detection and recognition result sequence from the generated single-character detection and recognition results, ordered according to the order, in the text line image, of the characters corresponding to the character bounding box positions.
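By way of non-limiting illustration, the detect, crop, recognize and order steps recited above might be sketched as follows. The target detection algorithm and the single-character recognition model are passed in as callables (`detect` and `recognize`, both hypothetical), the image is assumed to be a row-major grid of pixel values, and the character order is assumed to be left to right by the box's left edge, as for a horizontal text line.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]   # assumed layout: (x0, y0, x1, y1), x1/y1 exclusive
Image = Sequence[Sequence[int]]   # assumed: row-major grid of pixel values


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the character image for one bounding box out of the text line image."""
    x0, y0, x1, y1 = box
    return [list(row[x0:x1]) for row in image[y0:y1]]


def detect_and_recognize(
    image: Image,
    detect: Callable[[Image], List[Box]],         # stand-in for a target detection algorithm
    recognize: Callable[[List[List[int]]], str],  # stand-in for a single-character recognition model
) -> List[Tuple[str, Box]]:
    """Return (character, box) pairs ordered as the characters appear in the line."""
    results = [(recognize(crop(image, box)), box) for box in detect(image)]
    # Assumed ordering rule: left to right by the box's left edge.
    results.sort(key=lambda item: item[1][0])
    return results
```

For vertical text, or for detectors that already return boxes in reading order, the sort key would need to change or could be dropped.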
9. A character position correction apparatus, comprising:

an acquisition unit, configured to acquire a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image, wherein each text line recognition result includes a first character and a first character bounding box position, and each single-character detection and recognition result includes a second character and a second character bounding box position; and

a first update unit, configured to perform a position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence.
10. The apparatus according to claim 9, wherein the position update operation comprises: searching, in the text line recognition result sequence, for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result; in response to finding at least one text line recognition result, determining, among the found text line recognition results, the text line recognition result that is closest to the single-character detection and recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single-character detection and recognition result.
11. An electronic device, comprising:

one or more processors; and

a storage device on which one or more programs are stored,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-8.
12. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by one or more processors, implements the method of any one of claims 1-8.