WO2022194130A1 - Character position correction method and apparatus, electronic device and storage medium - Google Patents

Character position correction method and apparatus, electronic device and storage medium

Info

Publication number
WO2022194130A1
WO2022194130A1 PCT/CN2022/080874 CN2022080874W WO2022194130A1 WO 2022194130 A1 WO2022194130 A1 WO 2022194130A1 CN 2022080874 W CN2022080874 W CN 2022080874W WO 2022194130 A1 WO2022194130 A1 WO 2022194130A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
recognition result
text line
line
text
Prior art date
Application number
PCT/CN2022/080874
Other languages
French (fr)
Chinese (zh)
Inventor
蔡悦
张宇轩
庄妮
黄灿
王长虎
Original Assignee
北京有竹居网络技术有限公司
Priority date
Filing date
Publication date
Application filed by 北京有竹居网络技术有限公司
Publication of WO2022194130A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Definitions

  • Embodiments of the present disclosure relate to the technical field of character recognition, and in particular, to a character position correction method, apparatus, electronic device, and storage medium.
  • OCR: Optical Character Recognition
  • Embodiments of the present disclosure propose a character position correction method, apparatus, electronic device, and storage medium.
  • an embodiment of the present disclosure provides a character position correction method, the method including: acquiring a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image, wherein each text line recognition result includes a first character and a first character bounding box position, and each single-character detection and recognition result includes a second character and a second character bounding box position; and performing a position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence.
  • the position update operation includes: searching the text line recognition result sequence for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result; in response to finding at least one text line recognition result, determining, among the found text line recognition results, the one closest to the single-character detection and recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single-character detection and recognition result.
  • the first character bounding box position includes an inline start position and an inline end position; and the method further includes:
  • the average in-line character gap of the text line recognition result sequence is the average of the gaps between each two adjacent text line recognition results whose first character bounding box positions have been updated.
  • the first character bounding box position includes a line start position and a line end position
  • the method also includes:
  • the text line recognition result sequence corresponding to the text line image is obtained in the following manner:
  • the text line recognition model includes a convolutional neural network, a recurrent neural network, and connectionist temporal classification (CTC), arranged in sequence.
  • the text line recognition model includes a convolutional neural network and an attention-based recurrent neural network, arranged in sequence.
  • the sequence of word detection and recognition results corresponding to the text line image is obtained in the following manner:
  • the character image is intercepted from the text line image, and the intercepted character image is input into the single-character recognition model to obtain the corresponding character recognition result;
  • the sequence of single-character detection and recognition results is generated from the generated single-character detection and recognition results.
  • an embodiment of the present disclosure provides a character position correction device, the device including: an acquisition unit configured to acquire a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image, wherein each text line recognition result includes a first character and a first character bounding box position, and each single-character detection and recognition result includes a second character and a second character bounding box position; and a first updating unit configured to perform a position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence.
  • the position update operation includes: searching the text line recognition result sequence for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result; in response to finding at least one text line recognition result, determining, among the found text line recognition results, the one closest to the single-character detection and recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single-character detection and recognition result.
  • the first character bounding box position includes an in-line start position and an in-line end position; and the apparatus further includes:
  • an average value calculation unit configured to calculate the average in-line character gap of the text line recognition result sequence, wherein the average in-line character gap is the average of the gaps between each two adjacent text line recognition results whose first character bounding box positions have been updated;
  • a second updating unit configured to, for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, update the in-line start position and in-line end position in that text line recognition result according to the average in-line character gap, such that the distance between the updated in-line start position and the in-line end position of the preceding text line recognition result in the sequence equals the average in-line character gap, and/or the distance between the updated in-line end position and the in-line start position of the following text line recognition result in the sequence equals the average in-line character gap.
  • the first character bounding box position includes a line start position and a line end position
  • the device also includes:
  • a determining unit configured to determine, among the text line recognition results in the text line recognition result sequence whose line start position and line end position have been updated, the minimum line start position as the text line start position and the maximum line end position as the text line end position;
  • a third updating unit configured to, for each text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, update the line start position and line end position in that text line recognition result to the text line start position and the text line end position, respectively.
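The determining unit and third updating unit described above can be sketched as follows. This is only an illustrative sketch: the function name `fill_line_positions`, the `(line_start, line_end)` tuple layout, and the parallel `updated` flag list are assumptions, not names from the disclosure.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (line_start, line_end), an assumed layout


def fill_line_positions(line_spans: List[Span], updated: List[bool]) -> List[Span]:
    """Give un-updated results the overall text line start/end positions.

    The text line start is the minimum line start and the text line end is the
    maximum line end over the results that were already updated.
    """
    starts = [span[0] for span, done in zip(line_spans, updated) if done]
    ends = [span[1] for span, done in zip(line_spans, updated) if done]
    if not starts:
        # Nothing was updated, so there is no reference to copy from.
        return list(line_spans)
    text_line_start, text_line_end = min(starts), max(ends)
    return [
        span if done else (text_line_start, text_line_end)
        for span, done in zip(line_spans, updated)
    ]
```

Results that were already corrected keep their own positions; only the un-updated ones are overwritten.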
  • the text line recognition result sequence corresponding to the text line image is obtained in the following manner:
  • the text line recognition model includes a convolutional neural network, a recurrent neural network, and connectionist temporal classification (CTC), arranged in sequence.
  • the text line recognition model includes a convolutional neural network and an attention-based recurrent neural network, arranged in sequence.
  • the sequence of word detection and recognition results corresponding to the text line image is obtained in the following manner:
  • the character image is intercepted from the text line image, and the intercepted character image is input into the single-character recognition model to obtain the corresponding character recognition result;
  • the sequence of single-character detection and recognition results is generated from the generated single-character detection and recognition results.
  • embodiments of the present disclosure provide an electronic device, including: one or more processors; and a storage device on which one or more programs are stored, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect.
  • embodiments of the present disclosure provide a computer-readable storage medium on which a computer program is stored, wherein, when the computer program is executed by one or more processors, the method described in any implementation of the first aspect is implemented.
  • FIG. 1 is an exemplary system architecture diagram to which some embodiments of the character position correction method of the present disclosure may be applied;
  • FIG. 2 is a flowchart of some embodiments of a character position correction method according to the present disclosure;
  • FIGS. 3A-3C are schematic diagrams of an application scenario of the character position correction method according to the present disclosure;
  • FIG. 4 is a flow chart of further embodiments of a character position correction method according to the present disclosure.
  • FIG. 5 is a schematic structural diagram of some embodiments of a character position correction device according to the present disclosure.
  • FIG. 6 is a schematic structural diagram of a computer system suitable for implementing an electronic device of an embodiment of the present disclosure.
  • the text line OCR can accurately identify the characters appearing in the text line image
  • the text line OCR only uses a rough estimation of the position of a single character.
  • CRNN: Convolutional Recurrent Neural Network
  • CTC: Connectionist Temporal Classification
  • Transformer is mainly based on attention mechanism.
  • the single-character positions obtained by these methods have low accuracy and cannot be used in scenarios with high requirements on single-character positions. For example, comparing the differences between two documents (for example, contract documents) requires highly accurate character positions.
  • the applicant has found through practice that single-character detection and recognition yields low character recognition accuracy but high single-character position accuracy.
  • the character position correction method, device, electronic device, and storage medium provided by the embodiments of the present disclosure use the single-character positions in the single-character detection and recognition results to correct the single-character positions in the text line recognition results, and can thereby achieve both good character recognition performance and accurate single-character positions.
  • this is specifically implemented as follows: first, a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to the text line image are obtained, wherein each text line recognition result includes a first character and a first character bounding box position, and each single-character detection and recognition result includes a second character and a second character bounding box position; then, for each single-character detection and recognition result in the single-character detection and recognition result sequence, a position update operation is performed: the text line recognition result sequence is searched for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result; in response to finding at least one text line recognition result, the one closest to the single-character detection and recognition result is determined among those found; and the first character bounding box position in the determined text line recognition result is updated to the second character bounding box position in the single-character detection and recognition result. Updating the character bounding box positions in the text line recognition results in this way improves the accuracy of the character positions in the text line recognition results.
  • FIG. 1 illustrates an exemplary system architecture 100 to which embodiments of the character position correction method, apparatus, electronic device, and storage medium of the present disclosure may be applied.
  • the system architecture 100 may include terminal devices 101 , 102 , 103 , a network 104 and a server 105 .
  • the network 104 is a medium used to provide a communication link between the terminal devices 101 , 102 , 103 and the server 105 .
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
  • the user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • various communication client applications can be installed on the terminal devices 101, 102 and 103, such as character recognition applications, text processing applications, speech recognition applications, short video social applications, web conferencing applications, web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
  • the terminal devices 101, 102, and 103 may be hardware or software.
  • when the terminal devices 101, 102, and 103 are hardware, they can be various electronic devices with video capture devices (such as cameras), handwriting pads and display screens, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers.
  • when the terminal devices 101, 102 and 103 are software, they can be installed in the terminal devices listed above. They can be implemented as a plurality of software programs or software modules (for example, to provide character position correction services), or as a single software program or software module. There is no specific limitation here.
  • the character position correction method provided by the present disclosure may be performed by the terminal devices 101 , 102 , and 103 , and correspondingly, the character position correction apparatus may be provided in the terminal devices 101 , 102 , and 103 .
  • the system architecture 100 may not include the server 105 .
  • the character position correction method provided by the present disclosure may also be executed jointly by the terminal devices 101, 102, 103 and the server 105; for example, the step "obtain the text line recognition result sequence and the single-character detection and recognition result sequence corresponding to the text line image" may be performed by the terminal devices 101, 102, and 103, and the step "perform the position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence" may be performed by the server 105.
  • the character position correction device may also be provided in the terminal devices 101 , 102 , 103 and the server 105 respectively.
  • the character position correction method provided by the present disclosure may be executed by the server 105.
  • the character position correction device may also be provided in the server 105.
  • in this case, the system architecture 100 may not include the terminal devices 101, 102, and 103.
  • the server 105 may be hardware or software.
  • the server 105 can be implemented as a distributed server cluster composed of multiple servers, or can be implemented as a single server.
  • when the server 105 is software, it can be implemented as a plurality of software programs or software modules (for example, for providing distributed services), or as a single software program or software module. There is no specific limitation here.
  • the numbers of terminal devices, networks and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks and servers according to implementation needs.
  • with reference to FIG. 2, a flow 200 of some embodiments of a character position correction method according to the present disclosure is shown. The character position correction method includes the following steps:
  • Step 201 Obtain a text line recognition result sequence and a word detection and recognition result sequence corresponding to the text line image.
  • the execution body of the character position correction method (for example, the terminal devices 101, 102, and 103 shown in FIG. 1) may obtain the text line recognition result sequence and the single-character detection and recognition result sequence corresponding to the text line image either locally or remotely from another electronic device connected to the execution body over a network (for example, the server 105 shown in FIG. 1).
  • the text line image may be an image including a text line object.
  • each character in the text line object may have the same size or may have different sizes.
  • Each character in the text line object may be composed of characters of the same language, or may be composed of characters of more than one language, which is not specifically limited in the present disclosure.
  • the text line recognition result sequence corresponding to the text line image may be obtained by using the text line OCR technology to recognize the text line image.
  • the text line recognition result sequence may be a sequence of text line recognition results, and each text line recognition result may include a first character and a first character bounding box position. The first character bounding box position indicates the position range that the first character occupies in the text line image.
  • the text line recognition results in the text line recognition result sequence may be arranged according to the order of the first character in the text line recognition result in the text line object in the text line image. In practice, the circumscribed rectangle of the character in the text line image is usually used as the bounding box of the character.
  • the position of the bounding box of the first character can use various implementations to represent the circumscribed rectangle of the first character in the text line image.
  • for example, the first character bounding box position may include the coordinates of the four vertices of the circumscribed rectangle of the first character in the text line image; as another example, it may include the coordinates of the top-left vertex of that circumscribed rectangle together with the lengths of its long and short sides.
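The two bounding box representations mentioned above carry the same information for an axis-aligned rectangle and can be converted into each other. The sketch below assumes an axis-aligned box with the origin at the top-left of the image and uses width and height rather than "long" and "short" sides; the type `RectBox` and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class RectBox:
    """Top-left vertex plus side lengths (hypothetical type)."""
    x: float
    y: float
    width: float
    height: float


def vertices_to_rect(vertices: List[Point]) -> RectBox:
    """Convert four vertices into the top-left plus side-length form."""
    xs = [p[0] for p in vertices]
    ys = [p[1] for p in vertices]
    return RectBox(min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))


def rect_to_vertices(box: RectBox) -> List[Point]:
    """Return vertices clockwise, starting from the top-left vertex."""
    right, bottom = box.x + box.width, box.y + box.height
    return [(box.x, box.y), (right, box.y), (right, bottom), (box.x, bottom)]
```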
  • the sequence of single-word detection and recognition results corresponding to the text-line image may be obtained by recognizing the text-line image by using the OCR technology based on single-word detection and recognition.
  • the single-character detection and recognition result sequence may be a sequence of single-character detection and recognition results, and each single-character detection and recognition result may include a second character and a second character bounding box position. The second character bounding box position indicates the position range that the second character occupies in the text line image.
  • the single-character detection and recognition results in the sequence may be arranged in the order in which their second characters appear in the text line object in the text line image.
  • the second character bounding box position may likewise represent the circumscribed rectangle of the second character in the text line image in various ways, which will not be repeated here.
  • the text line recognition result sequence corresponding to the text line image may be obtained in the following manner:
  • the text line recognition model can be used to characterize the correspondence between the image to be recognized and the text line recognition result sequence.
  • a text line recognition model can be obtained by training a machine learning model on a large number of training samples.
  • the training sample may include a sample text line image and corresponding label information, and the label information may include specific characters and corresponding character positions of each character in the sample text line image.
  • the text line recognition model here may include a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), and Connectionist Temporal Classification (CTC), arranged in sequence.
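To illustrate why a CTC-based text line recognizer tends to give only rough character positions, the sketch below performs standard greedy CTC decoding (collapse repeated labels, then drop blanks) and derives a coarse horizontal span for each character from the time steps it occupied. The function name `ctc_greedy_decode`, the blank symbol `"-"`, and the fixed `frame_width` mapping from time step to pixels are assumptions for illustration, not details taken from the disclosure.

```python
from typing import List, Sequence, Tuple


def ctc_greedy_decode(
    frame_labels: Sequence[str], blank: str = "-", frame_width: float = 4.0
) -> List[Tuple[str, float, float]]:
    """Decode per-frame best labels into (char, rough_start_x, rough_end_x).

    frame_labels holds the most likely label at each time step. A run of the
    same non-blank label is one character; a blank separates repeated characters.
    """
    runs: List[List] = []  # each entry is [char, first_frame, last_frame]
    prev = blank
    for t, label in enumerate(frame_labels):
        if label != blank:
            if label != prev:
                runs.append([label, t, t])  # a new character starts here
            else:
                runs[-1][2] = t  # the current character continues
        prev = label
    # Positions are only as precise as the frame grid, hence "rough".
    return [(ch, s * frame_width, (e + 1) * frame_width) for ch, s, e in runs]
```

With this kind of decoding, a character's span depends on which frames happened to fire for it, which is one plausible reason the positions need correcting.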
  • the text line recognition model here may also include a convolutional neural network and an attention-based recurrent neural network, arranged in sequence.
  • sequence of word detection and recognition results corresponding to the text line image may be obtained in the following manner:
  • single-word detection is performed on the text line image by using the target detection algorithm to obtain the position of at least one character bounding box.
  • a single-word recognition model can be obtained by training a machine learning model based on a large number of single-word training samples.
  • the single-word training samples may include sample single-word images and corresponding characters.
  • each single-character detection and recognition result is generated from the position of a character bounding box and the character recognition result corresponding to that bounding box; the generated single-character detection and recognition results are then arranged in the order in which the characters at the corresponding bounding box positions appear in the text line image, yielding the sequence of single-character detection and recognition results.
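The detect, crop, recognize, and order steps above can be sketched as a small pipeline. The detector, cropping function, and recognizer are passed in as callables because the disclosure does not name specific models; `detect_and_recognize`, its parameters, and the `(x_min, y_min, x_max, y_max)` box layout are assumptions for illustration.

```python
from typing import Any, Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed


def detect_and_recognize(
    image: Any,
    detect_boxes: Callable[[Any], List[Box]],
    crop: Callable[[Any, Box], Any],
    recognize_char: Callable[[Any], str],
    horizontal: bool = True,
) -> List[Tuple[str, Box]]:
    """Build the single-character detection and recognition result sequence."""
    results = []
    for box in detect_boxes(image):
        # Cut the character image out of the text line image and recognize it.
        char = recognize_char(crop(image, box))
        results.append((char, box))
    # Order results by where the character sits along the text line.
    axis = 0 if horizontal else 1
    results.sort(key=lambda item: item[1][axis])
    return results
```

In a real system `detect_boxes` would wrap an object detection model and `recognize_char` a single-character classifier; here they are left abstract.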
  • Step 202 for each word detection and recognition result in the sequence of word detection and recognition results, perform a position update operation.
  • the above-mentioned execution subject may perform a position update operation on each word detection and recognition result in the sequence of word detection and recognition results obtained in step 201 .
  • the location update operation may specifically include the following sub-steps 2021 to 2023:
  • Sub-step 2021 search for a text line recognition result in which the first character is the same as the second character in the word detection and recognition result in the text line recognition result sequence.
  • Sub-step 2022 in response to finding at least one text line recognition result, among the found text line recognition results, determine the text line recognition result that is closest to the word detection and recognition result.
  • as one option, the difference between the index of the text line recognition result in the text line recognition result sequence and the index of the single-character detection and recognition result in the single-character detection and recognition result sequence can be used as the distance between the two results.
  • as another option, the distance between the first character bounding box position in the text line recognition result and the second character bounding box position in the single-character detection and recognition result may be used as the distance between the two results.
  • Sub-step 2023 Update the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single-character detection and recognition result.
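Sub-steps 2021 to 2023 can be sketched as follows, with both distance options described above. The class `CharResult`, the function `correct_positions`, the `updated` flag, and the use of box-center distance for the second option are assumptions made for this sketch; the disclosure does not prescribe a particular data structure or box distance.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed


@dataclass
class CharResult:
    char: str
    box: Box
    updated: bool = False  # whether the box was corrected (assumed bookkeeping)


def _center(box: Box) -> Tuple[float, float]:
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def correct_positions(
    line_results: List[CharResult],
    char_results: List[CharResult],
    use_index_distance: bool = True,
) -> List[CharResult]:
    """Overwrite rough line-OCR boxes with boxes from single-character detection."""
    for j, det in enumerate(char_results):
        # Sub-step 2021: find line results carrying the same character.
        candidates = [i for i, r in enumerate(line_results) if r.char == det.char]
        if not candidates:
            continue  # e.g. the detector misrecognized this character
        # Sub-step 2022: pick the closest candidate.
        if use_index_distance:
            best = min(candidates, key=lambda i: abs(i - j))
        else:
            best = min(
                candidates,
                key=lambda i: math.dist(_center(line_results[i].box), _center(det.box)),
            )
        # Sub-step 2023: take over the more accurate box.
        line_results[best].box = det.box
        line_results[best].updated = True
    return line_results
```

Line results whose character was misrecognized by the single-character recognizer keep `updated == False`, which marks them for the later gap-based correction.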
  • the text line recognition result sequence 302 and the single character detection and recognition result sequence 303 corresponding to the text line image 301 are shown.
  • the first characters in the text line recognition result sequence 302 are recognized correctly, but the first character bounding box positions are rough.
  • the second character bounding box positions in the single-character detection and recognition result sequence 303 are relatively accurate, but some second characters are misrecognized.
  • Executing step 202 based on the above-mentioned text line recognition result sequence 302 and word detection and recognition result sequence 303 may be:
  • for the text line recognition results in the text line recognition result sequence 302 whose first characters are "⁇", "qi", and "bao" respectively, the first character bounding box positions are updated to the second character bounding box positions in the single-character detection and recognition results in the sequence 303 whose second characters are "⁇", "qi", and "bao" respectively.
  • after step 202, the text line recognition result sequence 302 is as shown in the lower part of FIG. 3A. It can be seen that, except for the first character bounding box positions corresponding to “now” and “pre”, whose second characters were misrecognized in the single-character detection and recognition results, the first character bounding box positions are now more accurate and better suited to scenarios with high requirements on character position.
  • the character position correction method provided by the above embodiments of the present disclosure exploits two observations: the character bounding box positions in the single-character detection and recognition results corresponding to the text line image are more accurate than those in the text line recognition results, while the characters in the text line recognition results are more accurate than those in the single-character detection and recognition results.
  • the character bounding box positions in the text line recognition results are therefore corrected with the bounding box positions of the correctly recognized characters in the single-character detection and recognition results, which improves the accuracy of the character bounding box positions in the text line recognition results and makes them better suited to scenarios with high requirements on character position.
  • the character position correction method includes the following steps:
  • Step 401 Obtain a text line recognition result sequence and a word detection and recognition result sequence corresponding to the text line image.
  • Step 402 for each word detection and recognition result in the word detection and recognition result sequence, perform a position update operation.
  • the operations and effects of steps 401 and 402 are basically the same as those of steps 201 and 202 in the embodiment shown in FIG. 2, and will not be repeated here.
  • Step 403 Calculate the average in-line character gap of the text line recognition result sequence.
  • the position of the bounding box of the first character in each text line recognition result in the text line recognition result sequence may include an in-line start position and an in-line end position.
  • the in-line start position and in-line end position in the bounding box position of the first character are respectively used to represent the minimum coordinate value and the maximum coordinate value of the circumscribed rectangle of the first character in the text line image in the direction parallel to the text line.
  • for example, when the coordinate origin of the text line image is its top-left vertex and the characters in the text line are arranged horizontally from left to right, the in-line start position in the first character bounding box position may be the abscissa of the top-left or bottom-left vertex of the circumscribed rectangle of the first character, and the in-line end position may be the abscissa of its top-right or bottom-right vertex.
  • as another example, when the coordinate origin of the text line image is its top-left vertex and the characters in the text line are arranged vertically from top to bottom, the in-line start position in the first character bounding box position may be the ordinate of the top-left or top-right vertex of the circumscribed rectangle of the first character, and the in-line end position may be the ordinate of its bottom-left or bottom-right vertex.
  • text lines in the text line image does not necessarily define a specific direction.
  • text lines can be arranged horizontally from left to right, and text lines can also be arranged vertically from top to bottom. Text lines can also be arranged from top left to bottom right.
  • the average in-line character gap of the text line recognition result sequence is the average, over each pair of adjacent text line recognition results whose first character bounding box positions have been updated, of the distance between the in-line end position of the preceding text line recognition result and the in-line start position of the following text line recognition result.
  • the upper part 302 in FIG. 3B shows the position of each first character bounding box in the text line recognition result sequence corresponding to the revision of the first character bounding box position in step 202 shown in FIG. 3A .
  • the text line recognition results whose position of the bounding box of the first character has been updated each include the first characters as “ ⁇ ”, “ ⁇ ”, “ ⁇ ”, and “Qi” respectively.
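The gap average of step 403 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `LineResult` structure, its field names, and the `updated` flag are hypothetical, and "adjacent" is assumed to mean neighbouring entries of the sequence that were both updated in step 402.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LineResult:
    """One text line recognition result (hypothetical structure)."""
    char: str
    inline_start: float
    inline_end: float
    updated: bool = False  # assumed flag: True if step 402 replaced this box


def average_inline_gap(results: List[LineResult]) -> Optional[float]:
    """Mean gap between neighbouring results that were both updated.

    Returns None when no such neighbouring pair exists.
    """
    gaps = [
        nxt.inline_start - prev.inline_end
        for prev, nxt in zip(results, results[1:])
        if prev.updated and nxt.updated
    ]
    return sum(gaps) / len(gaps) if gaps else None


sequence = [
    LineResult("a", 0.0, 10.0, updated=True),
    LineResult("b", 12.0, 20.0, updated=True),   # gap to "a" is 2.0
    LineResult("c", 25.0, 30.0, updated=False),  # not updated, so its gaps are skipped
    LineResult("d", 34.0, 40.0, updated=True),
    LineResult("e", 43.0, 50.0, updated=True),   # gap to "d" is 3.0
]
d0 = average_inline_gap(sequence)
```

Returning `None` when no pair qualifies is a design choice of this sketch; the source does not say what happens in that case.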
  • Step 404: For each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, update the in-line start position and in-line end position in that text line recognition result.
  • For such a text line recognition result, the above-mentioned execution body may update the in-line start position and in-line end position according to the average in-line character gap of the text line recognition result sequence calculated in step 403.
  • After the update, the distance between the in-line start position of the text line recognition result and the in-line end position of the preceding text line recognition result in the sequence is the average in-line character gap calculated above, and/or the distance between the in-line end position of the text line recognition result and the in-line start position of the following text line recognition result in the sequence is the average in-line character gap calculated above.
  • For example, in step 404 the in-line start position and in-line end position of the text line recognition results whose first characters are "jin" and "pre" can be updated according to the average in-line character gap d0 calculated above, such that after the update the distance between the in-line end position corresponding to the first character "jin" and the in-line start position corresponding to the first character "⁇" is d0, the distance between the in-line start position corresponding to the first character "pre" and the in-line end position corresponding to the first character "qi" is d0, and/or the distance between the in-line end position corresponding to the first character "pre" and the in-line start position corresponding to the first character "bao" is d0.
  • the updated text line recognition result sequence may be as shown in the lower part 302 in FIG. 3B.
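One way to read step 404 is sketched below. The structure and names are hypothetical, and the sketch assumes that only neighbours updated in step 402 serve as anchors, and that a side with no updated neighbour keeps its original coordinate.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class LineResult:
    """One text line recognition result (hypothetical structure)."""
    char: str
    inline_start: float
    inline_end: float
    updated: bool = False  # assumed flag: True if step 402 replaced this box


def fill_inline_positions(results: List[LineResult], d0: float) -> List[LineResult]:
    """Place each non-updated result d0 away from its updated neighbours."""
    out = []
    for i, cur in enumerate(results):
        if cur.updated:
            out.append(cur)
            continue
        start, end = cur.inline_start, cur.inline_end
        # Start sits d0 after the end of an updated predecessor, if there is one.
        if i > 0 and results[i - 1].updated:
            start = results[i - 1].inline_end + d0
        # End sits d0 before the start of an updated successor, if there is one.
        if i + 1 < len(results) and results[i + 1].updated:
            end = results[i + 1].inline_start - d0
        out.append(replace(cur, inline_start=start, inline_end=end))
    return out


sequence = [
    LineResult("jin", 0.0, 9.0),
    LineResult("x", 12.0, 20.0, updated=True),
    LineResult("qi", 22.0, 30.0, updated=True),
    LineResult("pre", 31.0, 41.0),
    LineResult("bao", 44.0, 52.0, updated=True),
]
fixed = fill_inline_positions(sequence, d0=2.0)
```

Neighbours are read from the original list rather than from `out`, so a freshly estimated position is never used as an anchor for the next estimate.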
  • the above-mentioned execution subject may also perform the following steps 405 and 406 after performing step 402, or after performing step 404:
  • Step 405: Among the text line recognition results in the text line recognition result sequence whose line start position and line end position have been updated, determine the minimum line start position and the maximum line end position as the text line start position and the text line end position, respectively.
  • the first character bounding box position in each text line recognition result in the text line recognition result sequence may further include a line start position and a line end position.
  • the line start position and the line end position in the bounding box of the first character are respectively used to represent the minimum coordinate value and the maximum coordinate value of the circumscribed rectangle of the first character in the text line image in the direction perpendicular to the text line.
  • For example, when the coordinate origin of the text line image is the upper left vertex of the text line image and the characters in the text line are arranged horizontally from left to right, the line start position in the first character bounding box position may be the ordinate of the upper left or upper right vertex of the circumscribed rectangle of the first character, and the line end position may be the ordinate of the lower left or lower right vertex of that rectangle.
  • As another example, when the coordinate origin of the text line image is the upper left vertex of the text line image and the characters in the text line are arranged vertically from top to bottom, the line start position in the first character bounding box position may be the abscissa of the upper left or lower left vertex of the circumscribed rectangle of the first character, and the line end position may be the abscissa of the upper right or lower right vertex of that rectangle.
  • The text lines in the text line image are not limited to a specific direction: they can be arranged horizontally from left to right, vertically from top to bottom, or diagonally from top left to bottom right.
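The mapping from a circumscribed rectangle to the four positions described above can be sketched as follows for the horizontal and vertical cases. The `Rect` type and the function name are hypothetical, and the origin is assumed to be the upper left vertex of the text line image; the diagonal case is not covered.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Rect:
    """Axis-aligned circumscribed rectangle, origin at the image's upper left."""
    left: float
    top: float
    right: float
    bottom: float


def extents(rect: Rect, horizontal: bool = True) -> Tuple[float, float, float, float]:
    """Return (inline_start, inline_end, line_start, line_end).

    In-line positions run parallel to the text line; line positions run
    perpendicular to it.
    """
    if horizontal:
        # Left-to-right text: in-line axis is x, perpendicular axis is y.
        return rect.left, rect.right, rect.top, rect.bottom
    # Top-to-bottom text: in-line axis is y, perpendicular axis is x.
    return rect.top, rect.bottom, rect.left, rect.right


box = Rect(left=10.0, top=5.0, right=30.0, bottom=25.0)
horizontal_extents = extents(box, horizontal=True)
vertical_extents = extents(box, horizontal=False)
```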
  • The upper part of FIG. 3C shows the text line recognition result sequence 302 from the lower part of FIG. 3B.
  • The text line recognition results whose line start position and line end position have been updated are those corresponding to the first characters "⁇", "⁇", "⁇", "qi", and "bao".
  • The minimum of the line start positions of these text line recognition results is the line start position y1 corresponding to the first character "⁇", that is, the ordinate of the upper side of the circumscribed rectangle of the first character "⁇"; the maximum of the line end positions of these text line recognition results is the line end position y2 corresponding to the first character "bao", that is, the ordinate of the lower side of the circumscribed rectangle of the first character "bao". Therefore, y1 and y2 can be determined as the text line start position and the text line end position, respectively.
  • Step 406: For each text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, update the line start position and line end position in that text line recognition result with the text line start position and the text line end position, respectively.
  • In the text line recognition result sequence 302 in the upper part of FIG. 3C, the first characters whose line start position and line end position have not been updated are "jin" and "pre"; the line start position and line end position in their first character bounding box positions are updated to y1 and y2, and the updated text line recognition result sequence 302 is shown in the lower part of FIG. 3C.
  • Compared with the first character bounding box positions in the text line recognition result sequence 302 shown in the upper part of FIG. 3C, those shown in the lower part of FIG. 3C are more accurate: the line height of each character is closer to the real situation.
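Steps 405 and 406 can be sketched together as below. The `LineResult` structure and the `updated` flag are hypothetical, and returning the sequence unchanged when no result was updated is an assumption of this sketch.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class LineResult:
    """One text line recognition result (hypothetical structure)."""
    char: str
    line_start: float  # minimum coordinate perpendicular to the text line
    line_end: float    # maximum coordinate perpendicular to the text line
    updated: bool = False  # assumed flag: True if step 402 replaced this box


def fill_line_extent(results: List[LineResult]) -> List[LineResult]:
    """Give non-updated results the extent spanned by the updated ones."""
    anchors = [r for r in results if r.updated]
    if not anchors:
        return list(results)  # nothing reliable to copy from
    y1 = min(r.line_start for r in anchors)  # step 405: text line start position
    y2 = max(r.line_end for r in anchors)    # step 405: text line end position
    # Step 406: overwrite only the results that were never updated.
    return [r if r.updated else replace(r, line_start=y1, line_end=y2) for r in results]


sequence = [
    LineResult("jin", 0.0, 40.0),
    LineResult("x", 4.0, 30.0, updated=True),
    LineResult("qi", 6.0, 31.0, updated=True),
    LineResult("pre", 0.0, 40.0),
    LineResult("bao", 5.0, 33.0, updated=True),
]
fixed = fill_line_extent(sequence)
```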
  • Compared with the embodiment shown in FIG. 2, the process 400 of the character position correction method in this embodiment additionally updates, according to the average in-line character gap, the in-line start position and in-line end position of the text line recognition results whose in-line start position and in-line end position have not been updated, and in some embodiments also updates the line start position and line end position of the text line recognition results whose line start position and line end position have not been updated. Therefore, the solution described in this embodiment can further improve the accuracy of the first character bounding box positions in the text line recognition result sequence.
  • the present disclosure provides some embodiments of a character position correction device. The device embodiments correspond to the method embodiments shown in FIG. 2, and the device can be applied to various electronic devices.
  • the character position correction device 500 in this embodiment includes: an acquisition unit 501 and a first update unit 502 .
  • The acquisition unit 501 is configured to obtain a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to the text line image, wherein the text line recognition result includes a first character and a first character bounding box position, and the single-character detection and recognition result includes a second character and a second character bounding box position.
  • The first update unit 502 is configured to perform the following position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence: in the text line recognition result sequence, search for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result; in response to finding at least one text line recognition result, determine, among the text line recognition results found, the one closest to the single-character detection and recognition result; and update the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single-character detection and recognition result.
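The position update operation can be sketched as follows. The data structure is hypothetical, and since this passage does not define how "closest" is measured, the sketch assumes the squared Euclidean distance between box centers.

```python
from dataclasses import dataclass
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass
class CharResult:
    """A character with its bounding box (hypothetical structure)."""
    char: str
    box: Box
    updated: bool = False


def center(box: Box) -> Tuple[float, float]:
    left, top, right, bottom = box
    return ((left + right) / 2.0, (top + bottom) / 2.0)


def squared_distance(a: Box, b: Box) -> float:
    """Assumed distance measure: squared distance between box centers."""
    (ax, ay), (bx, by) = center(a), center(b)
    return (ax - bx) ** 2 + (ay - by) ** 2


def apply_position_updates(line_results: List[CharResult],
                           char_results: List[CharResult]) -> None:
    """Overwrite rough line-level boxes with detected boxes, in place."""
    for detected in char_results:
        candidates = [r for r in line_results if r.char == detected.char]
        if not candidates:
            continue  # no matching character: nothing is updated
        nearest = min(candidates, key=lambda r: squared_distance(r.box, detected.box))
        nearest.box = detected.box
        nearest.updated = True


line = [
    CharResult("A", (0.0, 0.0, 10.0, 10.0)),
    CharResult("B", (10.0, 0.0, 20.0, 10.0)),
    CharResult("A", (20.0, 0.0, 30.0, 10.0)),
]
detected_chars = [
    CharResult("A", (21.0, 1.0, 29.0, 9.0)),   # matches the second "A"
    CharResult("C", (40.0, 0.0, 50.0, 10.0)),  # no match in the line results
]
apply_position_updates(line, detected_chars)
```

The `updated` flag is an addition of this sketch; it records which results can later serve as anchors for the gap and line-extent steps.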
  • the first character bounding box position may include an in-line start position and an in-line end position; and the apparatus 500 may further include:
  • The average value calculation unit 503 is configured to calculate the average in-line character gap of the text line recognition result sequence, wherein the average in-line character gap is the average, over each pair of adjacent text line recognition results whose first character bounding box positions have been updated, of the distance between the in-line end position of the preceding text line recognition result and the in-line start position of the following text line recognition result.
  • The second updating unit 504 is configured to, for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, update the in-line start position and in-line end position in that text line recognition result according to the average in-line character gap, wherein, after the update, the distance between the in-line start position of the text line recognition result and the in-line end position of the preceding text line recognition result in the sequence is the average in-line character gap, and/or the distance between the in-line end position of the text line recognition result and the in-line start position of the following text line recognition result in the sequence is the average in-line character gap.
  • the first character bounding box position may include a line start position and a line end position; and the apparatus 500 may further include:
  • The determining unit 505 is configured to determine, among the text line recognition results in the text line recognition result sequence whose line start position and line end position have been updated, the minimum line start position and the maximum line end position as the text line start position and the text line end position, respectively;
  • The third updating unit 506 is configured to, for each text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, update the line start position and line end position in that text line recognition result with the text line start position and the text line end position, respectively.
  • the text line recognition result sequence corresponding to the text line image may be obtained in the following manner:
  • the text line recognition model may include a convolutional neural network, a recurrent neural network, and connectionist temporal classification (CTC), arranged in sequence.
  • the text line recognition model may include a convolutional neural network and an attention-based recurrent neural network, arranged in sequence.
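As background on the CTC variant, the sketch below shows standard greedy CTC decoding over per-frame label indices. It is not taken from this document: the blank index, the input format, and the idea of keeping the first frame index of each emitted character as a rough position cue are all assumptions made for illustration.

```python
from typing import List, Tuple


def ctc_greedy_decode(frame_labels: List[int], blank: int = 0) -> List[Tuple[int, int]]:
    """Collapse repeated labels, drop blanks, and keep the emitting frame.

    frame_labels holds the most likely label index for each time step.
    Returns (label, first_frame_index) pairs; the frame index only gives a
    coarse hint of where the character lies along the text line.
    """
    decoded = []
    previous = blank
    for t, label in enumerate(frame_labels):
        # A label is emitted when it is not blank and differs from the previous frame.
        if label != blank and label != previous:
            decoded.append((label, t))
        previous = label
    return decoded


frames = [1, 1, 0, 1, 2, 2, 0]
decoded = ctc_greedy_decode(frames)
```

Because each frame covers a strip of the image several pixels wide, positions derived this way are coarse, which is in line with the rough single-character positions that the text line results start from.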
  • the single-character detection and recognition result sequence corresponding to the text line image can be obtained in the following manner:
  • a character image is cropped from the text line image according to each detected character bounding box position, and the cropped character image is input into the single-character recognition model to obtain the corresponding character recognition result;
  • the single-character detection and recognition result sequence is generated from the generated single-character detection and recognition results, ordered by the position of the corresponding character in the text line image.
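The detection, cropping, recognition, and ordering steps can be sketched as a small pipeline. The detector, cropper, and recognizer are passed in as hypothetical callables because this passage does not name concrete models, and ordering by the box's leading coordinate is an assumption.

```python
from typing import Any, Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def build_char_results(
    image: Any,
    detect: Callable[[Any], List[Box]],   # hypothetical character detector
    crop: Callable[[Any, Box], Any],      # hypothetical image cropper
    recognize: Callable[[Any], str],      # hypothetical single-character recognizer
    horizontal: bool = True,
) -> List[Tuple[str, Box]]:
    """Return (character, box) pairs in reading order along the text line."""
    boxes = detect(image)
    # Order by left edge for horizontal text, by top edge for vertical text.
    boxes = sorted(boxes, key=(lambda b: b[0]) if horizontal else (lambda b: b[1]))
    return [(recognize(crop(image, box)), box) for box in boxes]


# Toy stand-ins: the "image" is a string and each box selects one character.
toy_image = "abc"
toy_results = build_char_results(
    toy_image,
    detect=lambda img: [(2, 0, 3, 1), (0, 0, 1, 1), (1, 0, 2, 1)],
    crop=lambda img, box: img[box[0]:box[2]],
    recognize=lambda patch: patch.upper(),
)
```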
  • FIG. 6 shows a schematic diagram of the structure of a computer system 600 suitable for implementing the electronic device of the present disclosure.
  • the computer system 600 shown in FIG. 6 is merely an example, and should not impose any limitations on the functionality and scope of use of the embodiments of the present disclosure.
  • The computer system 600 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603.
  • The RAM 603 also stores various programs and data necessary for the operation of the computer system 600.
  • the processing device 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • The following devices can be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, etc.; output devices 607 including, for example, a liquid crystal display (LCD), speakers, vibrators, etc.; storage devices 608 including, for example, magnetic tapes, hard disks, etc.; and communication devices 609.
  • The communication device 609 may allow the computer system 600 to communicate wirelessly or by wire with other devices to exchange data. While FIG. 6 illustrates a computer system 600 of an electronic device having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication device 609, or from the storage device 608, or from the ROM 602.
  • When the computer program is executed by the processing device 601, the above-described functions defined in the methods of the embodiments of the present disclosure are executed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or a combination of any of the above. More specific examples of computer readable storage media may include, but are not limited to, electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable Programmable read only memory (EPROM or flash memory), fiber optics, portable compact disk read only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with computer-readable program code embodied thereon. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
  • The above-mentioned computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device implements the character position correction method shown in the embodiment of FIG. 2 and its implementations, and/or the character position correction method shown in the embodiment of FIG. 4 and its implementations.
  • Computer program code for carrying out operations of the present disclosure may be written in one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • The remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g., through the Internet using an Internet service provider).
  • Each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units involved in the embodiments of the present disclosure may be implemented in software or hardware.
  • the name of the unit does not constitute a limitation of the unit itself under certain circumstances.
  • For example, the acquisition unit can also be described as "a unit that acquires the text line recognition result sequence and the single-character detection and recognition result sequence corresponding to the text line image".

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)

Abstract

The present disclosure provides a character position correction method and apparatus, an electronic device and a storage medium. By means of correcting a single character position in a text line recognition result by using a single character position in a single character detection and recognition result, the text recognition performance and the single character position accuracy can be both taken into consideration.

Description

Character position correction method, apparatus, electronic device and storage medium
This application is based on, and claims priority to, CN application No. 202110304878.4, filed on March 16, 2021, the disclosure of which is hereby incorporated into this application in its entirety.
Technical Field
Embodiments of the present disclosure relate to the technical field of character recognition, and in particular to a character position correction method, apparatus, electronic device, and storage medium.
Background
Optical Character Recognition (OCR) is a technology for recognizing characters in images. Current OCR mainly works at two granularities: "line" and "single character". Text line OCR is the mainstream; it can recognize the characters appearing in a text line image fairly accurately and uses its own characteristics to roughly estimate the position of each single character.
Summary of the Invention
Embodiments of the present disclosure propose a character position correction method, apparatus, electronic device, and storage medium.
In a first aspect, an embodiment of the present disclosure provides a character position correction method, the method including: obtaining a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image, wherein the text line recognition result includes a first character and a first character bounding box position, and the single-character detection and recognition result includes a second character and a second character bounding box position; and, for each single-character detection and recognition result in the single-character detection and recognition result sequence, performing a position update operation.
In some embodiments, the position update operation includes: searching the text line recognition result sequence for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result; in response to finding at least one text line recognition result, determining, among the text line recognition results found, the one closest to the single-character detection and recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single-character detection and recognition result.
In some embodiments, the first character bounding box position includes an in-line start position and an in-line end position; and the method further includes:
calculating the average in-line character gap of the text line recognition result sequence, wherein the average in-line character gap is the average, over each pair of adjacent text line recognition results whose first character bounding box positions have been updated, of the distance between the in-line end position of the preceding text line recognition result and the in-line start position of the following text line recognition result;
for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, updating the in-line start position and in-line end position in that text line recognition result according to the average in-line character gap, wherein, after the update, the distance between the in-line start position of the text line recognition result and the in-line end position of the preceding text line recognition result in the sequence is the average in-line character gap, and/or the distance between the in-line end position of the text line recognition result and the in-line start position of the following text line recognition result in the sequence is the average in-line character gap.
In some embodiments, the first character bounding box position includes a line start position and a line end position; and
the method further includes:
determining, among the text line recognition results in the text line recognition result sequence whose line start position and line end position have been updated, the minimum line start position and the maximum line end position as the text line start position and the text line end position, respectively;
for each text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, updating the line start position and line end position in that text line recognition result with the text line start position and the text line end position, respectively.
In some embodiments, the text line recognition result sequence corresponding to the text line image is obtained in the following manner:
inputting the text line image into a pre-trained text line recognition model to obtain the text line recognition result sequence corresponding to the text line image, wherein the text line recognition model is used to represent the correspondence between an image to be recognized and a text line recognition result sequence.
In some embodiments, the text line recognition model includes a convolutional neural network, a recurrent neural network, and connectionist temporal classification (CTC), arranged in sequence.
In some embodiments, the text line recognition model includes a convolutional neural network and an attention-based recurrent neural network, arranged in sequence.
In some embodiments, the single-character detection and recognition result sequence corresponding to the text line image is obtained in the following manner:
performing single-character detection on the text line image using a target detection algorithm to obtain at least one character bounding box position;
cropping a character image from the text line image according to each detected character bounding box position, and inputting the cropped character image into a single-character recognition model to obtain the corresponding character recognition result;
for each detected character bounding box, generating a single-character detection and recognition result from the character recognition result corresponding to that character bounding box and the position of that character bounding box, and generating the single-character detection and recognition result sequence from the generated single-character detection and recognition results, ordered by the position in the text line image of the character corresponding to each character bounding box position.
In a second aspect, an embodiment of the present disclosure provides a character position correction apparatus, the apparatus including: an acquisition unit configured to obtain a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image, wherein the text line recognition result includes a first character and a first character bounding box position, and the single-character detection and recognition result includes a second character and a second character bounding box position; and a first update unit configured to perform a position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence.
In some embodiments, the position update operation includes: searching the text line recognition result sequence for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result; in response to finding at least one text line recognition result, determining, among the text line recognition results found, the one closest to the single-character detection and recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single-character detection and recognition result.
In some implementations, the first character bounding box position includes an in-line start position and an in-line end position; and the apparatus further includes:
an average value calculation unit configured to calculate an average in-line character gap of the text line recognition result sequence, wherein the average in-line character gap is the average, over each pair of adjacent text line recognition results in the sequence whose first character bounding box positions have both been updated, of the distance between the in-line end position of the preceding text line recognition result and the in-line start position of the following text line recognition result;
a second updating unit configured to, for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, update the in-line start position and in-line end position of that text line recognition result according to the average in-line character gap, wherein the distance between the updated in-line start position of that text line recognition result and the in-line end position of the preceding text line recognition result in the sequence equals the average in-line character gap, and/or the distance between the updated in-line end position of that text line recognition result and the in-line start position of the following text line recognition result in the sequence equals the average in-line character gap.
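The gap-based update described above can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the `LineResult` class, its field names, the left-to-right processing order, and the fallback gap of 0.0 when no adjacent pair has been updated are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LineResult:
    """One text line recognition result (hypothetical structure)."""
    char: str
    start: float           # in-line start position, e.g. left x coordinate
    end: float             # in-line end position, e.g. right x coordinate
    updated: bool = False  # True if the position update operation replaced the box


def average_gap(results: List[LineResult]) -> float:
    """Mean distance between adjacent results whose boxes were both updated."""
    gaps = [
        nxt.start - prev.end
        for prev, nxt in zip(results, results[1:])
        if prev.updated and nxt.updated
    ]
    # Assumption: fall back to 0.0 when no adjacent updated pair exists.
    return sum(gaps) / len(gaps) if gaps else 0.0


def fill_unupdated(results: List[LineResult]) -> float:
    """Reposition results that were not updated, using the average gap."""
    gap = average_gap(results)
    for i, result in enumerate(results):
        if result.updated:
            continue
        if i > 0:
            # New start sits one average gap after the previous result's end.
            result.start = results[i - 1].end + gap
        if i + 1 < len(results):
            # New end sits one average gap before the next result's start.
            result.end = results[i + 1].start - gap
    return gap
```

A result at either end of the line has only one neighbour, so only one of its two positions is adjusted here, which corresponds to the "and/or" wording above.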
In some implementations, the first character bounding box position includes a line start position and a line end position; and
the apparatus further includes:
a determining unit configured to determine, as the text line start position and the text line end position respectively, the minimum of the line start positions and the maximum of the line end positions of the text line recognition results in the text line recognition result sequence whose line start position and line end position have been updated;
a third updating unit configured to, for each text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, update the line start position and line end position of that text line recognition result with the text line start position and the text line end position, respectively.
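A minimal sketch of the determining unit and third updating unit follows. It assumes that the line start and line end positions are the top and bottom coordinates of a horizontal text line; the source does not state this, and the `CharResult` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class CharResult:
    """One text line recognition result (hypothetical structure)."""
    char: str
    line_start: float      # assumed: top y coordinate of the character box
    line_end: float        # assumed: bottom y coordinate of the character box
    updated: bool = False  # True if the position update operation replaced the box


def fill_line_extent(results: List[CharResult]) -> Optional[Tuple[float, float]]:
    """Give un-updated results the overall extent of the updated ones."""
    updated = [r for r in results if r.updated]
    if not updated:
        return None  # nothing reliable to derive the extent from
    text_line_start = min(r.line_start for r in updated)
    text_line_end = max(r.line_end for r in updated)
    for r in results:
        if not r.updated:
            r.line_start = text_line_start
            r.line_end = text_line_end
    return text_line_start, text_line_end
```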
In some implementations, the text line recognition result sequence corresponding to the text line image is obtained as follows:
inputting the text line image into a pre-trained text line recognition model to obtain the text line recognition result sequence corresponding to the text line image, wherein the text line recognition model characterizes the correspondence between an image to be recognized and a text line recognition result sequence.
In some implementations, the text line recognition model includes, arranged in sequence, a convolutional neural network, a recurrent neural network, and Connectionist Temporal Classification (CTC).
In some implementations, the text line recognition model includes, arranged in sequence, a convolutional neural network and an attention-based recurrent neural network.
In some implementations, the single-character detection and recognition result sequence corresponding to the text line image is obtained as follows:
performing single-character detection on the text line image using an object detection algorithm to obtain at least one character bounding box position;
cropping character images from the text line image according to each detected character bounding box position, and inputting the cropped character images into a single-character recognition model to obtain the corresponding character recognition results;
for each detected character bounding box, generating a single-character detection and recognition result from the character recognition result corresponding to that character bounding box and the position of that character bounding box, and generating the single-character detection and recognition result sequence from the generated single-character detection and recognition results, ordered according to where the character corresponding to each character bounding box position appears in the text line image.
In a third aspect, embodiments of the present disclosure provide an electronic device, comprising: one or more processors; and a storage device storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium storing a computer program which, when executed by one or more processors, implements the method described in any implementation of the first aspect.
Brief Description of the Drawings
Other features, objects, and advantages of the present disclosure will become more apparent upon reading the following detailed description of non-limiting embodiments with reference to the accompanying drawings. The drawings serve only to illustrate specific embodiments and are not to be construed as limiting the present disclosure. In the drawings:
FIG. 1 is an exemplary system architecture diagram to which some embodiments of the character position correction method of the present disclosure may be applied;
FIG. 2 is a flowchart of some embodiments of the character position correction method according to the present disclosure;
FIGS. 3A-3C are schematic diagrams of an application scenario of the character position correction method according to the present disclosure;
FIG. 4 is a flowchart of further embodiments of the character position correction method according to the present disclosure;
FIG. 5 is a schematic structural diagram of some embodiments of the character position correction apparatus according to the present disclosure;
FIG. 6 is a schematic structural diagram of a computer system of an electronic device suitable for implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein serve only to explain the related invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention.
It should be noted that the embodiments of the present disclosure and the features of those embodiments may be combined with one another where no conflict arises. The present disclosure is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
In the related art, although text line OCR can accurately recognize the characters appearing in a text line image, it only roughly estimates the position of each individual character. For example, CRNN (Convolutional Recurrent Neural Network) mainly relies on backward inference from CTC (Connectionist Temporal Classification) information, while the Transformer mainly relies on the attention mechanism. The single-character positions obtained in these ways have low accuracy and are unsuitable for scenarios that demand precise single-character positions. For example, comparing the differences between two documents (e.g., contract documents) requires high single-character position accuracy.
The applicant found in practice that text detection and recognition at single-character granularity yields lower character recognition accuracy but higher single-character position accuracy. Therefore, to achieve both good text recognition performance and accurate single-character positions, the character position correction method, apparatus, electronic device, and storage medium provided by embodiments of the present disclosure use the single-character positions in the single-character detection and recognition results to correct the single-character positions in the text line recognition results. Specifically: first, a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image are acquired, where each text line recognition result includes a first character and a first character bounding box position, and each single-character detection and recognition result includes a second character and a second character bounding box position; then, for each single-character detection and recognition result in the single-character detection and recognition result sequence, a position update operation is performed, namely: searching the text line recognition result sequence for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result; in response to finding at least one such text line recognition result, determining, among the found text line recognition results, the one closest to the single-character detection and recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character bounding box position in the single-character detection and recognition result. Updating the character bounding box positions in the text line recognition results in this way improves the accuracy of the character positions in those results.
FIG. 1 shows an exemplary system architecture 100 to which embodiments of the character position correction method, apparatus, electronic device, and storage medium of the present disclosure may be applied.
According to some embodiments of the present disclosure, as shown in FIG. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium that provides communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104, for example to receive or send messages. Various communication client applications may be installed on the terminal devices 101, 102, 103, such as character recognition applications, text processing applications, speech recognition applications, short-video social applications, web conferencing applications, web browser applications, shopping applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 101, 102, 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices having a video capture device (such as a camera), a handwriting pad, and a display screen, including but not limited to smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, and desktop computers. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, to provide a character position correction service) or as a single piece of software or software module. No specific limitation is imposed here.
In some cases, the character position correction method provided by the present disclosure may be executed by the terminal devices 101, 102, 103, and correspondingly the character position correction apparatus may be provided in the terminal devices 101, 102, 103. In this case, the system architecture 100 need not include the server 105.
In some cases, the character position correction method provided by the present disclosure may be executed jointly by the terminal devices 101, 102, 103 and the server 105. For example, the step of "acquiring the text line recognition result sequence and the single-character detection and recognition result sequence corresponding to the text line image" may be executed by the terminal devices 101, 102, 103, and steps such as "performing a position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence" may be executed by the server 105. The present disclosure imposes no limitation in this respect. Correspondingly, the character position correction apparatus may be provided separately in the terminal devices 101, 102, 103 and in the server 105.
In some cases, the character position correction method provided by the present disclosure may be executed by the server 105, and correspondingly the character position correction apparatus may be provided in the server 105. In this case, the system architecture 100 need not include the terminal devices 101, 102, 103.
It should be noted that the server 105 may be hardware or software. When the server 105 is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server 105 is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There may be any number of terminal devices, networks, and servers, as the implementation requires.
With continued reference to FIG. 2, a flow 200 of some embodiments of the character position correction method according to the present disclosure is shown. The character position correction method includes the following steps:
Step 201: acquire a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image.
In this embodiment, the execution body of the character position correction method (for example, the terminal devices 101, 102, 103 shown in FIG. 1) may acquire the text line recognition result sequence and the single-character detection and recognition result sequence corresponding to the text line image either locally or remotely from another electronic device connected to the execution body over a network (for example, the server 105 shown in FIG. 1).
Here, the text line image may be an image that includes a text line object. The characters in the text line object may be of the same size or of different sizes, and may belong to a single language or to more than one language; the present disclosure imposes no specific limitation in this respect.
The text line recognition result sequence corresponding to the text line image may be obtained by recognizing the text line image with text line OCR technology. The text line recognition result sequence is a sequence of text line recognition results, and each text line recognition result may include a first character and a first character bounding box position, where the first character bounding box position characterizes the range of positions the first character occupies in the text line image. The text line recognition results in the sequence may be arranged in the order in which their first characters appear in the text line object in the text line image. In practice, the circumscribed rectangle of a character in the text line image is usually used as the character's bounding box; accordingly, the first character bounding box position may represent the circumscribed rectangle of the first character in the text line image in various ways. For example, the first character bounding box position may include the coordinates of the four vertices of the circumscribed rectangle of the first character in the text line image; as another example, it may include the coordinates of the top-left vertex of that circumscribed rectangle together with the lengths of its long and short sides.
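The two bounding box representations mentioned above carry the same information when the rectangle is axis-aligned. The sketch below converts between them; it assumes image coordinates with the origin at the top-left corner, and it expresses the side lengths as width and height rather than long and short side. The function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]
Box = Tuple[int, int, int, int]  # (left, top, width, height)


def vertices_to_box(vertices: List[Point]) -> Box:
    """Convert four vertices of an axis-aligned rectangle to (left, top, w, h)."""
    xs = [x for x, _ in vertices]
    ys = [y for _, y in vertices]
    left, top = min(xs), min(ys)
    return (left, top, max(xs) - left, max(ys) - top)


def box_to_vertices(box: Box) -> List[Point]:
    """Convert (left, top, w, h) to vertices in clockwise order from top-left."""
    left, top, width, height = box
    return [
        (left, top),
        (left + width, top),
        (left + width, top + height),
        (left, top + height),
    ]
```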
The single-character detection and recognition result sequence corresponding to the text line image may be obtained by recognizing the text line image with OCR technology based on single-character detection and recognition. The single-character detection and recognition result sequence is a sequence of single-character detection and recognition results, and each single-character detection and recognition result may include a second character and a second character bounding box position, where the second character bounding box position characterizes the range of positions the second character occupies in the text line image. The single-character detection and recognition results in the sequence may be arranged in the order in which their second characters appear in the text line object in the text line image. As with the first character bounding box, the second character bounding box position may represent the circumscribed rectangle of the second character in the text line image in various ways, which are not repeated here.
In some implementations, the text line recognition result sequence corresponding to the text line image may be obtained as follows:
The text line image is input into a pre-trained text line recognition model to obtain the text line recognition result sequence corresponding to the text line image. Here, the text line recognition model characterizes the correspondence between an image to be recognized and a text line recognition result sequence. For example, the text line recognition model may be obtained by training a machine learning model on a large number of training samples. A training sample may include a sample text line image and corresponding annotation information, and the annotation information may include each character in the sample text line image and its position.
In some implementations, the text line recognition model may include, arranged in sequence, a convolutional neural network (CNN), a recurrent neural network (RNN), and CTC (Connectionist Temporal Classification). Specifically, the text line image may be input into the CNN to obtain a feature map; the feature map is then input into the RNN (which may be a deep bidirectional LSTM network), which extracts text sequence features from the convolutional feature map; finally, backward inference from the CTC information is applied to the text sequence features to obtain the text line recognition result sequence.
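To illustrate why positions inferred back from CTC output are only approximate, the sketch below applies standard greedy CTC decoding (collapse repeated labels, drop blanks) to per-frame labels and maps each character to the span of frames that emitted it. The source does not specify how positions are derived from CTC, so this mapping, the fixed `frame_width`, and the blank index 0 are assumptions for the example.

```python
from typing import Dict, List, Tuple

BLANK = 0  # assumed index of the CTC blank label


def ctc_greedy_decode(
    frame_labels: List[int],
    alphabet: Dict[int, str],
    frame_width: int,
) -> List[Tuple[str, int, int]]:
    """Return (character, start_x, end_x) for each decoded character.

    frame_labels holds the most probable label index for each time step;
    frame_width is the number of image columns covered by one time step.
    """
    results: List[Tuple[str, int, int]] = []
    previous = BLANK
    for t, label in enumerate(frame_labels):
        if label != BLANK:
            if label == previous:
                # Same character continues: extend its end position.
                char, start_x, _ = results[-1]
                results[-1] = (char, start_x, (t + 1) * frame_width)
            else:
                # A new character starts at this frame.
                results.append(
                    (alphabet[label], t * frame_width, (t + 1) * frame_width)
                )
        previous = label
    return results
```

Because a character's span depends on which frames happen to emit its label rather than on its actual ink, the resulting start and end coordinates can differ noticeably from the true bounding box.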
In some implementations, the text line recognition model may instead include, arranged in sequence, a convolutional neural network and an attention-based recurrent neural network.
In some implementations, the single-character detection and recognition result sequence corresponding to the text line image may be obtained as follows:
First, single-character detection is performed on the text line image using an object detection algorithm to obtain at least one character bounding box position.
Second, character images are cropped from the text line image according to each detected character bounding box position, and the cropped character images are input into a single-character recognition model to obtain the corresponding character recognition results. For example, the single-character recognition model may be obtained by training a machine learning model on a large number of single-character training samples. A single-character training sample may include a sample single-character image and the corresponding character.
Finally, for each detected character bounding box, a single-character detection and recognition result is generated from the character recognition result corresponding to that character bounding box and the position of that character bounding box, and the single-character detection and recognition result sequence is generated from the generated single-character detection and recognition results, ordered according to where the character corresponding to each character bounding box position appears in the text line image.
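The three steps above can be sketched as a small pipeline. The detector and recognizer are passed in as callables because the source names no specific models; `detect_chars` and `recognize_char` are hypothetical stand-ins. The image is modelled as a list of pixel rows, boxes as `(left, top, width, height)`, and ordering by left edge assumes horizontal left-to-right text.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (left, top, width, height)
Image = List[Sequence]           # list of pixel rows


def detect_and_recognize(
    image: Image,
    detect_chars: Callable[[Image], List[Box]],
    recognize_char: Callable[[Image], str],
) -> List[Dict]:
    """Build the single-character detection and recognition result sequence."""
    results = []
    for left, top, width, height in detect_chars(image):
        # Crop the character image from the text line image.
        crop = [row[left:left + width] for row in image[top:top + height]]
        results.append(
            {"char": recognize_char(crop), "box": (left, top, width, height)}
        )
    # Order results by where the character appears in the line
    # (assumption: horizontal text read left to right).
    results.sort(key=lambda r: r["box"][0])
    return results
```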
Step 202: for each single-character detection and recognition result in the single-character detection and recognition result sequence, perform a position update operation.
In this embodiment, the execution body may perform a position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence acquired in step 201. In some implementations, the position update operation may include the following sub-steps 2021 to 2023:
Sub-step 2021: search the text line recognition result sequence for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result.
Sub-step 2022: in response to finding at least one such text line recognition result, determine, among the found text line recognition results, the one closest to the single-character detection and recognition result.
Here, the distance between a text line recognition result and a single-character detection and recognition result may be determined in various ways.
For example, the difference between the position of the text line recognition result in the text line recognition result sequence and the position of the single-character detection and recognition result in the single-character detection and recognition result sequence may be used as the distance between the two.
As another example, the distance between the first character bounding box position in the text line recognition result and the second character bounding box position in the single-character detection and recognition result may be used as the distance between the two.
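The two distance definitions can be sketched as follows. Taking the absolute value of the index difference, and measuring box distance as the Euclidean distance between box centers, are assumptions; the source says only that the difference in order or the distance between bounding box positions may be used. Boxes are assumed to be `(left, top, width, height)`.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (left, top, width, height)


def index_distance(line_index: int, char_index: int) -> int:
    """Distance as the difference between the two sequence positions."""
    return abs(line_index - char_index)


def box_distance(first_box: Box, second_box: Box) -> float:
    """Distance between the centers of two bounding boxes."""
    ax = first_box[0] + first_box[2] / 2
    ay = first_box[1] + first_box[3] / 2
    bx = second_box[0] + second_box[2] / 2
    by = second_box[1] + second_box[3] / 2
    return math.hypot(bx - ax, by - ay)
```

The index-based distance needs no coordinates but can drift if the two sequences differ in length; the box-based distance avoids that at the cost of relying on the rough first character boxes.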
子步骤2023,将所确定的文本行识别结果中的第一字符包围盒位置更新为该单字检测识别结果中的第二字符位置。Sub-step 2023: Update the position of the bounding box of the first character in the determined text line recognition result to the position of the second character in the single-word detection and recognition result.
下面结合具体例子来说明上述位置更新操作:The above location update operation is described below with specific examples:
如图3A所示,其中示出了文本行图像301对应的文本行识别结果序列302和单字检测识别结果序列303,从图中可看出,文本行识别结果序列302中的第一字符均识别正确,但第一字符包围盒位置较为粗略。而单字检测识别结果序列303中第二字符包围盒位置较为精确,但第二字符却存在识别错误,例如其中将“今”错误识别成了“头”,以及将“预”错误识别成了“顷”。As shown in FIG. 3A , the text line recognition result sequence 302 and the single character detection and recognition result sequence 303 corresponding to the text line image 301 are shown. As can be seen from the figure, the first character in the text line recognition result sequence 302 is recognized Correct, but the first character bounding box location is rough. However, the position of the bounding box of the second character in the single-character detection and recognition result sequence 303 is relatively accurate, but the second character has a recognition error. an hour".
基于上述文本行识别结果序列302和单字检测识别结果序列303执行步骤202可以是:Executing step 202 based on the above-mentioned text line recognition result sequence 302 and word detection and recognition result sequence 303 may be:
对于单字检测识别结果序列303中的每个单字检测识别结果序列 中每个单字检测识别结果,执行位置更新操作。For each word detection and recognition result in the single-word detection and recognition result sequence 303, a position update operation is performed.
其中,对单字检测识别结果序列303中的第二字符“头”和“顷”,在执行子步骤2021时,由于文本行识别结果序列302中不存在同样的第一字符,因此不再执行子步骤2022和2023。Among them, for the second characters "head" and "area" in the single-word detection and recognition result sequence 303, when sub-step 2021 is executed, since the same first character does not exist in the text line recognition result sequence 302, the sub-step 2021 is not executed any more. Steps 2022 and 2023.
For the first occurrence of “天” among the second characters in sequence 303, sub-step 2021 finds two text line recognition results whose first character is “天”. Sub-step 2022 determines that, of these two, the text line recognition result corresponding to the first “天” is closest to the single-character detection and recognition result for that first occurrence. Sub-step 2023 then updates the first-character bounding box position in the text line recognition result corresponding to the first “天” in sequence 302 to the second-character bounding box position of the first occurrence of “天” in sequence 303.
The second occurrence of “天” among the second characters in sequence 303 is handled in the same way: the first-character bounding box position in the text line recognition result corresponding to the second “天” in sequence 302 is ultimately updated to the second-character bounding box position of the second occurrence of “天” in sequence 303.
For the second characters “的”, “气” and “报” in sequence 303, sub-step 2021 finds exactly one text line recognition result whose first character matches each of them. In sub-step 2022, the text line recognition results whose first characters are “的”, “气” and “报” can therefore be taken directly as the respective closest text line recognition results. Sub-step 2023 then updates the first-character bounding box positions in those text line recognition results of sequence 302 to the second-character bounding box positions in the single-character detection and recognition results of sequence 303 whose second characters are “的”, “气” and “报”, respectively.
After step 202, the text line recognition result sequence 302 is as shown in the lower part of FIG. 3A. Except for the first-character bounding box positions of “今” and “预”, whose second characters were misrecognized in the single-character detection and recognition results, all first-character bounding box positions have been updated with the second-character bounding box positions from the single-character detection and recognition results. Compared with before the update, the first-character bounding box positions in the text line recognition result sequence are more accurate and better suited to scenarios that demand high character position accuracy.
The character position correction method provided by the above embodiments of the present disclosure exploits two complementary properties: for a given text line image, the character bounding box positions in the single-character detection and recognition results are more accurate than those in the text line recognition results, whereas the characters in the text line recognition results are more accurate than those in the single-character detection and recognition results. The character bounding box positions in the text line recognition results are therefore corrected with the bounding box positions of the correctly recognized characters in the single-character detection and recognition results. This improves the accuracy of the character bounding box positions in the text line recognition results and makes them better suited to scenarios that demand high character position accuracy.
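The position update operation (sub-steps 2021 to 2023) can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the `CharResult` type, the `(x_min, y_min, x_max, y_max)` box layout, the centre-to-centre distance, and the choice to measure distance against the original (coarse) boxes are all assumptions, and the coordinates are made-up values that mimic FIG. 3A.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x_min, y_min, x_max, y_max)


@dataclass
class CharResult:
    """Hypothetical container for one recognition result."""
    char: str
    box: Box
    updated: bool = False


def center_distance(a: Box, b: Box) -> float:
    """Euclidean distance between box centres (one possible distance measure)."""
    return hypot((a[0] + a[2]) / 2 - (b[0] + b[2]) / 2,
                 (a[1] + a[3]) / 2 - (b[1] + b[3]) / 2)


def update_positions(line_results: List[CharResult],
                     char_results: List[CharResult]) -> List[CharResult]:
    """Return a copy of line_results with boxes corrected from char_results."""
    corrected = [CharResult(r.char, r.box) for r in line_results]
    for det in char_results:
        # Sub-step 2021: find text line results whose character matches.
        candidates = [i for i, r in enumerate(line_results) if r.char == det.char]
        if not candidates:
            continue  # misrecognized single character: nothing to update
        # Sub-step 2022: pick the closest candidate (assumption: distance is
        # measured against the original, coarse boxes).
        nearest = min(candidates,
                      key=lambda i: center_distance(line_results[i].box, det.box))
        # Sub-step 2023: take over the more precise box, keep the character.
        corrected[nearest] = CharResult(line_results[nearest].char, det.box, True)
    return corrected


# Made-up data: coarse boxes from line recognition, tighter boxes from
# single-character detection, with "今" and "预" misrecognized.
line_seq = [CharResult(c, (12.0 * i, 0.0, 12.0 * i + 12.0, 20.0))
            for i, c in enumerate("今天的天气预报")]
char_seq = [CharResult(c, (12.0 * i + 1.0, 2.0, 12.0 * i + 11.0, 18.0))
            for i, c in enumerate("头天的天气顷报")]
corrected = update_positions(line_seq, char_seq)
print([(r.char, r.updated) for r in corrected])
```

In this sketch the entries for “今” and “预” keep their coarse boxes, which is the situation the later steps of flow 400 address.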
With continued reference to FIG. 4, a flow 400 of further embodiments of the character position correction method according to the present disclosure is shown. The character position correction method includes the following steps:
Step 401: obtain a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image.
Step 402: for each single-character detection and recognition result in the single-character detection and recognition result sequence, perform the position update operation.
In this embodiment, the specific operations of steps 401 and 402 and their technical effects are substantially the same as those of steps 201 and 202 in the embodiment shown in FIG. 2, and are not repeated here.
Step 403: calculate the average in-line character gap of the text line recognition result sequence.
In this embodiment, the first-character bounding box position in each text line recognition result of the text line recognition result sequence may include an in-line start position and an in-line end position. Here, the in-line start position and the in-line end position represent, respectively, the minimum and maximum coordinate values, along the direction parallel to the text line, of the bounding rectangle of the first character in the text line image.
For example, suppose the text line in the text line image is horizontal, the coordinate origin is the top-left vertex of the text line image, and the characters run from left to right. The in-line start position in the first-character bounding box position may then be the abscissa of the top-left or bottom-left vertex of the first character's bounding rectangle, and the in-line end position may be the abscissa of its top-right or bottom-right vertex.
As another example, suppose the text line in the text line image is vertical, the coordinate origin is the top-left vertex of the text line image, and the characters run from top to bottom. The in-line start position in the first-character bounding box position may then be the ordinate of the top-left or top-right vertex of the first character's bounding rectangle, and the in-line end position may be the ordinate of its bottom-left or bottom-right vertex.
The text line in the text line image is not limited to any particular direction. For example, a text line may run horizontally from left to right, vertically from top to bottom, or diagonally from the upper left to the lower right.
In this embodiment, the average in-line character gap of the text line recognition result sequence is the average, taken over every pair of adjacent text line recognition results whose first-character bounding box positions have both been updated, of the distance between the in-line end position of the preceding text line recognition result and the in-line start position of the following text line recognition result.
For details, see FIG. 3B. The upper part 302 of FIG. 3B is an enlarged view of the first-character bounding box positions in the text line recognition result sequence of FIG. 3A after step 202 has revised the first-character bounding box positions.
As FIG. 3B shows, the text line recognition results whose first-character bounding box positions have been updated are those whose first characters are “天”, “的”, “天”, “气” and “报”. The character gap between the adjacent first characters “天” and “的” is d1, the gap between the adjacent first characters “的” and “天” is d2, and the gap between the adjacent first characters “天” and “气” is d3. The average in-line character gap of the text line recognition result sequence is the average of d1, d2 and d3: d0 = (d1 + d2 + d3)/3.
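The calculation of step 403 can be sketched as below. The `LineChar` type and the numeric coordinates are illustrative assumptions. The sketch reads "adjacent" as neighbouring positions in the sequence where both results were updated, which matches the example above: “气” and “报” contribute no gap because the un-updated “预” lies between them.

```python
from typing import List, NamedTuple, Optional


class LineChar(NamedTuple):
    """Hypothetical record: one text line recognition result along the line axis."""
    char: str
    start: float   # in-line start position
    end: float     # in-line end position
    updated: bool  # True if the box was corrected by the position update operation


def mean_inline_gap(seq: List[LineChar]) -> Optional[float]:
    """Average gap between neighbouring results that were both updated."""
    gaps = [nxt.start - prev.end
            for prev, nxt in zip(seq, seq[1:])
            if prev.updated and nxt.updated]
    return sum(gaps) / len(gaps) if gaps else None


# Made-up positions: d1 = 2, d2 = 3, d3 = 4.
seq = [
    LineChar("今", 0.0, 12.0, False),
    LineChar("天", 13.0, 23.0, True),
    LineChar("的", 25.0, 35.0, True),
    LineChar("天", 38.0, 48.0, True),
    LineChar("气", 52.0, 62.0, True),
    LineChar("预", 60.0, 72.0, False),
    LineChar("报", 73.0, 83.0, True),
]
d0 = mean_inline_gap(seq)
print(d0)
```

Returning `None` when no pair qualifies is a design choice of this sketch; the source does not say how that case is handled.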
Step 404: for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, update the in-line start position and in-line end position in that text line recognition result.
In this embodiment, for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, the execution body may update those positions according to the average in-line character gap calculated in step 403. After the update, the distance between the in-line start position of this text line recognition result and the in-line end position of the preceding text line recognition result in the sequence equals the calculated average in-line character gap; or the distance between the in-line end position of this text line recognition result and the in-line start position of the following text line recognition result in the sequence equals the calculated average in-line character gap.
The text line recognition result sequence 302 shown in FIG. 3B is again used as an example. As the description of the embodiment of FIG. 2 shows, after step 202 or step 402, the text line recognition results in sequence 302 whose in-line start position and in-line end position have not been updated are those corresponding to “今” and “预”, whose second characters were misrecognized in the single-character detection and recognition results. In step 404, the in-line start position and in-line end position in the first-character bounding box positions of the text line recognition results whose first characters are “今” and “预” may be updated according to the calculated average in-line character gap d0, such that after the update the distance between the in-line end position of the first character “今” and the in-line start position of the first character “天” is d0, and such that the distance between the in-line start position of the first character “预” and the in-line end position of the first character “气” is d0, and/or the distance between the in-line end position of the first character “预” and the in-line start position of the first character “报” is d0. The updated text line recognition result sequence may be as shown in the lower part 302 of FIG. 3B.
FIG. 3B shows intuitively that the first-character bounding box positions in the text line recognition result sequence 302 in the lower part of FIG. 3B are more accurate than those in the sequence 302 in the upper part of FIG. 3B. Specifically, the spacing between characters within the text line is more uniform.
As described above, for a text line recognition result whose first-character bounding box position was not updated in step 402, steps 403 and 404 further correct its in-line start position and in-line end position, so that after the update it conforms more closely to the overall inter-character gap of the text line recognition result sequence. In practice, the characters in a text line tend to be evenly spaced in most cases, so steps 403 and 404 further improve the accuracy of the first-character bounding box positions in the text line recognition result sequence.
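Step 404 can be sketched as follows, under stated assumptions: the `LineChar` type and the coordinates are hypothetical, the average gap d0 is passed in as a precomputed value, and the sketch adjusts both edges whenever a neighbour exists, which is one way to satisfy the "and/or" condition described above.

```python
from typing import List, NamedTuple


class LineChar(NamedTuple):
    """Hypothetical record: one text line recognition result along the line axis."""
    char: str
    start: float   # in-line start position
    end: float     # in-line end position
    updated: bool  # True if corrected by the position update operation


def fill_inline_positions(seq: List[LineChar], d0: float) -> List[LineChar]:
    """Re-place the un-updated results so that they sit d0 away from neighbours."""
    out = list(seq)
    for i, cur in enumerate(out):
        if cur.updated:
            continue
        start, end = cur.start, cur.end
        if i > 0:
            # The start lies one average gap after the previous result's end.
            start = out[i - 1].end + d0
        if i + 1 < len(out):
            # The end lies one average gap before the next result's start.
            end = out[i + 1].start - d0
        out[i] = cur._replace(start=start, end=end)
    return out


# Made-up positions in which the updated characters are spaced 3 units apart.
seq = [
    LineChar("今", 0.0, 12.0, False),
    LineChar("天", 13.0, 23.0, True),
    LineChar("的", 26.0, 36.0, True),
    LineChar("天", 39.0, 49.0, True),
    LineChar("气", 52.0, 62.0, True),
    LineChar("预", 60.0, 76.0, False),
    LineChar("报", 78.0, 88.0, True),
]
filled = fill_inline_positions(seq, d0=3.0)
for item in filled:
    print(item.char, item.start, item.end)
```

Here “今” has no preceding result, so only its in-line end position moves; “预” has neighbours on both sides, so both of its edges are re-placed.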
In some implementations, the execution body may further perform the following steps 405 and 406 after step 402 or after step 404:
Step 405: determine the minimum of the line start positions and the maximum of the line end positions of the text line recognition results in the text line recognition result sequence whose line start position and line end position have been updated as the text line start position and the text line end position, respectively.
Here, the first-character bounding box position in each text line recognition result of the text line recognition result sequence may further include a line start position and a line end position.
The line start position and the line end position in the first-character bounding box position represent, respectively, the minimum and maximum coordinate values, along the direction perpendicular to the text line, of the bounding rectangle of the first character in the text line image.
For example, suppose the text line in the text line image is horizontal, the coordinate origin is the top-left vertex of the text line image, and the characters run from left to right. The line start position in the first-character bounding box position may then be the ordinate of the top-left or top-right vertex of the first character's bounding rectangle, and the line end position may be the ordinate of its bottom-left or bottom-right vertex.
As another example, suppose the text line in the text line image is vertical, the coordinate origin is the top-left vertex of the text line image, and the characters run from top to bottom. The line start position in the first-character bounding box position may then be the abscissa of the top-left or bottom-left vertex of the first character's bounding rectangle, and the line end position may be the abscissa of its top-right or bottom-right vertex.
Again, the text line in the text line image is not limited to any particular direction. For example, a text line may run horizontally from left to right, vertically from top to bottom, or diagonally from the upper left to the lower right.
For details, see FIG. 3C. The upper part 302 of FIG. 3C shows the text line recognition result sequence 302 from the lower part of FIG. 3B. In this sequence, the text line recognition results whose line start position and line end position have been updated are those corresponding to the first characters “天”, “的”, “天”, “气” and “报”. The minimum of their line start positions is the line start position y1 of the first character “天”, that is, the ordinate of the upper edge of the bounding rectangle of “天”. The maximum of their line end positions is the line end position y2 of the first character “报”, that is, the ordinate of the lower edge of the bounding rectangle of “报”. Accordingly, y1 and y2 may be determined as the text line start position and the text line end position, respectively.
Step 406: for each text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, update the line start position and line end position in that text line recognition result with the text line start position and the text line end position, respectively.
For example, for the text line recognition results in the sequence 302 in the upper part of FIG. 3C whose first characters are “今” and “预” and whose line start position and line end position have not been updated, the line start position and line end position in their first-character bounding box positions are updated to y1 and y2. The updated text line recognition result sequence 302 is shown in the lower part of FIG. 3C.
FIG. 3C shows intuitively that the first-character bounding box positions in the text line recognition result sequence 302 in the lower part of FIG. 3C are more accurate than those in the sequence 302 in the upper part of FIG. 3C. Specifically, in terms of line height, the vertical extent of each character is closer to the actual situation.
As described above, for a text line recognition result whose line start position and line end position have not been updated, steps 405 and 406 further correct its line start position and line end position, so that after the update it conforms more closely to the overall line height range of the text line recognition result sequence. In practice, the characters in a text line usually lie on the same line and therefore within the same line height range, that is, between a certain line start position and a certain line end position. Steps 405 and 406 therefore further improve the accuracy of the first-character bounding box positions in the text line recognition result sequence.
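Steps 405 and 406 can be sketched as follows for a horizontal text line. The `LineChar` type, the field names `top` and `bottom` (standing for the line start position and line end position), and the coordinates are assumptions chosen so that, as in the FIG. 3C example, y1 comes from “天” and y2 from “报”.

```python
from typing import List, NamedTuple


class LineChar(NamedTuple):
    """Hypothetical record: one text line recognition result across the line axis."""
    char: str
    top: float     # line start position (perpendicular to the text line)
    bottom: float  # line end position
    updated: bool  # True if corrected by the position update operation


def unify_line_extent(seq: List[LineChar]) -> List[LineChar]:
    """Give un-updated results the overall extent of the updated ones."""
    reference = [c for c in seq if c.updated]
    if not reference:
        return list(seq)  # nothing reliable to copy from
    # Step 405: text line start and end positions.
    y1 = min(c.top for c in reference)
    y2 = max(c.bottom for c in reference)
    # Step 406: overwrite only the results that were never updated.
    return [c if c.updated else c._replace(top=y1, bottom=y2) for c in seq]


seq = [
    LineChar("今", 0.0, 24.0, False),
    LineChar("天", 2.0, 18.0, True),   # smallest top    -> y1 = 2
    LineChar("的", 3.0, 19.0, True),
    LineChar("天", 3.0, 18.0, True),
    LineChar("气", 4.0, 19.0, True),
    LineChar("预", 0.0, 24.0, False),
    LineChar("报", 3.0, 20.0, True),   # largest bottom  -> y2 = 20
]
unified = unify_line_extent(seq)
for item in unified:
    print(item.char, item.top, item.bottom)
```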
As FIG. 4 shows, compared with the embodiment corresponding to FIG. 2, the flow 400 of the character position correction method in this embodiment additionally updates, according to the average in-line character gap, the in-line start position and in-line end position of the text line recognition results whose in-line start position and in-line end position have not been updated. In some implementations, it may further update the line start position and line end position of the text line recognition results whose line start position and line end position have not been updated. The solution described in this embodiment can therefore further improve the accuracy of the first-character bounding box positions in the text line recognition result sequence.
With further reference to FIG. 5, as an implementation of the methods shown in the above figures, the present disclosure provides some embodiments of a character position correction apparatus. These apparatus embodiments correspond to the method embodiments shown in FIG. 2, and the apparatus may be applied in various electronic devices.
As shown in FIG. 5, the character position correction apparatus 500 of this embodiment includes an acquisition unit 501 and a first update unit 502. The acquisition unit 501 is configured to obtain a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image, where each text line recognition result includes a first character and a first-character bounding box position, and each single-character detection and recognition result includes a second character and a second-character bounding box position. The first update unit 502 is configured to perform the following position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence: search the text line recognition result sequence for text line recognition results whose first character is the same as the second character in this single-character detection and recognition result; in response to finding at least one text line recognition result, determine, among the text line recognition results found, the one closest to this single-character detection and recognition result; and update the first-character bounding box position in the determined text line recognition result to the second-character bounding box position in this single-character detection and recognition result.
In this embodiment, for the specific processing of the acquisition unit 501 and the first update unit 502 of the character position correction apparatus 500 and the technical effects they bring, reference may be made to the descriptions of steps 201 and 202, respectively, in the embodiment corresponding to FIG. 2; these are not repeated here.
In some implementations, the first-character bounding box position may include an in-line start position and an in-line end position, and the apparatus 500 may further include:
an average value calculation unit 503, configured to calculate the average in-line character gap of the text line recognition result sequence, where the average in-line character gap is the average, taken over every pair of adjacent text line recognition results whose first-character bounding box positions have both been updated, of the distance between the in-line end position of the preceding text line recognition result and the in-line start position of the following text line recognition result;
a second update unit 504, configured to update, for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, the in-line start position and in-line end position in that text line recognition result according to the average in-line character gap, where, after the update, the distance between the in-line start position of that text line recognition result and the in-line end position of the preceding text line recognition result in the sequence equals the average in-line character gap, and/or the distance between the in-line end position of that text line recognition result and the in-line start position of the following text line recognition result in the sequence equals the average in-line character gap.
In some implementations, the first-character bounding box position may include a line start position and a line end position, and the apparatus 500 may further include:
a determination unit 505, configured to determine the minimum of the line start positions and the maximum of the line end positions of the text line recognition results in the text line recognition result sequence whose line start position and line end position have been updated as the text line start position and the text line end position, respectively;
a third update unit 506, configured to update, for each text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, the line start position and line end position in that text line recognition result with the text line start position and the text line end position, respectively.
In some implementations, the text line recognition result sequence corresponding to the text line image may be obtained as follows:
inputting the text line image into a pre-trained text line recognition model to obtain the text line recognition result sequence corresponding to the text line image, where the text line recognition model represents the correspondence between an image to be recognized and a text line recognition result sequence.
In some implementations, the text line recognition model may include, arranged in sequence, a convolutional neural network, a recurrent neural network, and connectionist temporal classification (CTC).
In some implementations, the text line recognition model may include, arranged in sequence, a convolutional neural network and an attention-based recurrent neural network.
In some embodiments, the single-character detection and recognition result sequence corresponding to the text line image may be obtained as follows:
performing single-character detection on the text line image using a target detection algorithm to obtain at least one character bounding box position;
cropping a character image from the text line image according to each detected character bounding box position, and inputting the cropped character image into a single-character recognition model to obtain a corresponding character recognition result; and
for each detected character bounding box, generating a single-character detection and recognition result from the character recognition result corresponding to that character bounding box and the position of that character bounding box, and generating the single-character detection and recognition result sequence from the generated single-character detection and recognition results, ordered according to where the characters corresponding to the character bounding box positions appear in the text line image.
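The three steps above can be sketched as follows. The detector and the single-character recognizer are passed in as callables because the disclosure does not name specific models; `detect_boxes`, `recognize_char`, the `(left, top, right, bottom)` box layout, and the sort by left edge (which assumes horizontal, left-to-right text) are all assumptions made for this illustration.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]      # (left, top, right, bottom), assumed layout
Image = Sequence[Sequence[int]]      # rows of pixel values


def crop(image: Image, box: Box) -> List[List[int]]:
    """Cut the character image described by `box` out of the text line image."""
    left, top, right, bottom = box
    return [list(row[left:right]) for row in image[top:bottom]]


def build_char_results(
    image: Image,
    detect_boxes: Callable[[Image], List[Box]],
    recognize_char: Callable[[List[List[int]]], str],
) -> List[Tuple[str, Box]]:
    """Detect boxes, recognize each crop, and order results along the line."""
    boxes = detect_boxes(image)
    results = [(recognize_char(crop(image, box)), box) for box in boxes]
    # Assumption: horizontal left-to-right text, so order by the left edge.
    results.sort(key=lambda item: item[1][0])
    return results
```

Each element of the returned list pairs a recognized character with its bounding box, which corresponds to one single-character detection and recognition result.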
It should be noted that, for the implementation details and technical effects of the units in the character position correction apparatus provided by the embodiments of the present disclosure, reference may be made to the descriptions of other embodiments of the present disclosure; they are not repeated here.
Reference is now made to FIG. 6, which shows a schematic structural diagram of a computer system 600 suitable for implementing the electronic device of the present disclosure. The computer system 600 shown in FIG. 6 is merely an example and should not impose any limitation on the functionality or scope of use of the embodiments of the present disclosure.
As shown in FIG. 6, the computer system 600 may include a processing device 601 (for example, a central processing unit or a graphics processing unit), which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the computer system 600. The processing device 601, the ROM 602, and the RAM 603 are connected to one another through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
Generally, the following devices can be connected to the I/O interface 605: an input device 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, and a microphone; an output device 607 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage device 608 including, for example, a magnetic tape and a hard disk; and a communication device 609. The communication device 609 may allow the computer system 600 to communicate with other devices, wirelessly or by wire, to exchange data. Although FIG. 6 shows the computer system 600 of an electronic device having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods illustrated in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 609, installed from the storage device 608, or installed from the ROM 602. When the computer program is executed by the processing device 601, the above-described functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In the present disclosure, a computer-readable signal medium, by contrast, may include a data signal propagated in baseband or as part of a carrier wave and carrying computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to: an electrical wire, an optical fiber cable, RF (radio frequency), or any suitable combination of the above.
The above computer-readable medium may be included in the above electronic device, or it may exist separately without being assembled into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the character position correction method shown in the embodiment of FIG. 2 and its implementations, and/or the character position correction method shown in the embodiment of FIG. 4 and its implementations.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a unit does not limit the unit itself. For example, the acquisition unit may also be described as "a unit that acquires a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image".
The above description is merely of preferred embodiments of the present disclosure and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, a technical solution formed by replacing the above features with technical features having similar functions that are disclosed in (but not limited to) the present disclosure.

Claims (12)

  1. A character position correction method, comprising:
    acquiring a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image, wherein each text line recognition result includes a first character and a first character bounding box position, and each single-character detection and recognition result includes a second character and a second character bounding box position; and
    performing a position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence.
  2. The method according to claim 1, wherein the position update operation comprises: searching the text line recognition result sequence for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result; in response to finding at least one text line recognition result, determining, among the found text line recognition results, the text line recognition result closest to the single-character detection and recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character position in the single-character detection and recognition result.
  3. The method according to claim 1, wherein the first character bounding box position includes an in-line start position and an in-line end position; and
    the method further comprises:
    calculating an average in-line character gap of the text line recognition result sequence, wherein the average in-line character gap is the average, over pairs of adjacent text line recognition results among the text line recognition results in the sequence whose first character bounding box position has been updated, of the distance between the in-line end position of the preceding text line recognition result and the in-line start position of the following text line recognition result; and
    for each text line recognition result in the text line recognition result sequence whose in-line start position and in-line end position have not been updated, updating the in-line start position and the in-line end position in that text line recognition result according to the average in-line character gap, such that at least one of the following holds: the distance between the updated in-line start position of that text line recognition result and the in-line end position of the preceding text line recognition result in the sequence equals the average in-line character gap; or the distance between the updated in-line end position of that text line recognition result and the in-line start position of the following text line recognition result in the sequence equals the average in-line character gap.
  4. The method according to any one of claims 1-3, wherein the first character bounding box position includes a line start position and a line end position; and
    the method further comprises:
    determining, among the text line recognition results in the text line recognition result sequence whose line start position and line end position have been updated, the minimum of the line start positions as a text line start position and the maximum of the line end positions as a text line end position; and
    for each text line recognition result in the text line recognition result sequence whose line start position and line end position have not been updated, updating the line start position and the line end position in that text line recognition result with the text line start position and the text line end position, respectively.
  5. The method according to claim 1, wherein the text line recognition result sequence corresponding to the text line image is obtained by:
    inputting the text line image into a pre-trained text line recognition model to obtain the text line recognition result sequence corresponding to the text line image, wherein the text line recognition model characterizes the correspondence between an image to be recognized and a text line recognition result sequence.
  6. The method according to claim 5, wherein the text line recognition model comprises a convolutional neural network, a recurrent neural network, and a connectionist temporal classification (CTC) layer arranged in sequence.
  7. The method according to claim 5, wherein the text line recognition model comprises a convolutional neural network and an attention-based recurrent neural network arranged in sequence.
  8. The method according to claim 1, wherein the single-character detection and recognition result sequence corresponding to the text line image is obtained by:
    performing single-character detection on the text line image using a target detection algorithm to obtain at least one character bounding box position;
    cropping a character image from the text line image according to each detected character bounding box position, and inputting the cropped character image into a single-character recognition model to obtain a corresponding character recognition result; and
    for each detected character bounding box, generating a single-character detection and recognition result from the character recognition result corresponding to that character bounding box and the position of that character bounding box, and generating the single-character detection and recognition result sequence from the generated single-character detection and recognition results, ordered according to where the characters corresponding to the character bounding box positions appear in the text line image.
  9. A character position correction apparatus, comprising:
    an acquisition unit, configured to acquire a text line recognition result sequence and a single-character detection and recognition result sequence corresponding to a text line image, wherein each text line recognition result includes a first character and a first character bounding box position, and each single-character detection and recognition result includes a second character and a second character bounding box position; and
    a first updating unit, configured to perform a position update operation for each single-character detection and recognition result in the single-character detection and recognition result sequence.
  10. The apparatus according to claim 9, wherein the position update operation comprises: searching the text line recognition result sequence for text line recognition results whose first character is the same as the second character in the single-character detection and recognition result; in response to finding at least one text line recognition result, determining, among the found text line recognition results, the text line recognition result closest to the single-character detection and recognition result; and updating the first character bounding box position in the determined text line recognition result to the second character position in the single-character detection and recognition result.
  11. An electronic device, comprising:
    one or more processors; and
    a storage device storing one or more programs,
    wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-8.
  12. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by one or more processors, implements the method according to any one of claims 1-8.
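For illustration only, the position update operation of claim 2 and the gap-based fill-in of claim 3 can be sketched as below. This is a sketch under stated assumptions, not the claimed implementation: positions are reduced to one in-line axis, "closest" is taken to mean the smallest distance between box centers, the average gap is computed over pairs that are adjacent in the sequence and both updated, and the fill-in only uses neighbors that were themselves updated. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LineChar:
    """Hypothetical text line recognition result (in-line axis only)."""
    char: str
    start: float
    end: float
    updated: bool = False


@dataclass
class DetChar:
    """Hypothetical single-character detection and recognition result."""
    char: str
    start: float
    end: float


def center(r) -> float:
    return (r.start + r.end) / 2.0


def update_positions(line_seq: List[LineChar], det_seq: List[DetChar]) -> None:
    """Claim 2 sketch: copy each detected position onto the nearest same-character result."""
    for det in det_seq:
        candidates = [r for r in line_seq if r.char == det.char]
        if not candidates:
            continue  # no text line result carries this character
        # Assumption: distance is measured between box centers.
        nearest = min(candidates, key=lambda r: abs(center(r) - center(det)))
        nearest.start, nearest.end, nearest.updated = det.start, det.end, True


def average_gap(line_seq: List[LineChar]) -> Optional[float]:
    """Claim 3 sketch: mean gap between adjacent results that were both updated."""
    gaps = [b.start - a.end
            for a, b in zip(line_seq, line_seq[1:])
            if a.updated and b.updated]
    return sum(gaps) / len(gaps) if gaps else None


def fill_from_neighbors(line_seq: List[LineChar], gap: float) -> None:
    """Claim 3 sketch: place non-updated results one average gap from updated neighbors."""
    for i, r in enumerate(line_seq):
        if r.updated:
            continue
        if i > 0 and line_seq[i - 1].updated:
            r.start = line_seq[i - 1].end + gap
        if i + 1 < len(line_seq) and line_seq[i + 1].updated:
            r.end = line_seq[i + 1].start - gap
```

A result with no updated neighbor is left unchanged by this sketch; how such cases are handled is not specified in the claims.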
PCT/CN2022/080874 2021-03-16 2022-03-15 Character position correction method and apparatus, electronic device and storage medium WO2022194130A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110304878.4 2021-03-16
CN202110304878.4A CN113033377A (en) 2021-03-16 2021-03-16 Character position correction method, character position correction device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2022194130A1 WO2022194130A1 (en) 2022-09-22

Family

ID=76472557

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/080874 WO2022194130A1 (en) 2021-03-16 2022-03-15 Character position correction method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN113033377A (en)
WO (1) WO2022194130A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033377A (en) * 2021-03-16 2021-06-25 北京有竹居网络技术有限公司 Character position correction method, character position correction device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN110147786A (en) * 2019-04-11 2019-08-20 北京百度网讯科技有限公司 For text filed method, apparatus, equipment and the medium in detection image
CN111860348A (en) * 2020-07-21 2020-10-30 国网山东省电力公司青岛供电公司 Deep learning-based weak supervision power drawing OCR recognition method
CN111950555A (en) * 2020-08-17 2020-11-17 北京字节跳动网络技术有限公司 Text recognition method and device, readable medium and electronic equipment
CN112232341A (en) * 2020-12-10 2021-01-15 北京易真学思教育科技有限公司 Text detection method, electronic device and computer readable medium
CN113033377A (en) * 2021-03-16 2021-06-25 北京有竹居网络技术有限公司 Character position correction method, character position correction device, electronic equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020010547A1 (en) * 2018-07-11 2020-01-16 深圳前海达闼云端智能科技有限公司 Character identification method and apparatus, and storage medium and electronic device
CN109711412A (en) * 2018-12-27 2019-05-03 信雅达系统工程股份有限公司 A kind of optical character identification error correction method based on dictionary
CN110866529A (en) * 2019-10-29 2020-03-06 腾讯科技(深圳)有限公司 Character recognition method, character recognition device, electronic equipment and storage medium
CN111680684B (en) * 2020-03-16 2023-09-05 广东技术师范大学 Spine text recognition method, device and storage medium based on deep learning
CN112396049A (en) * 2020-11-19 2021-02-23 平安普惠企业管理有限公司 Text error correction method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113033377A (en) 2021-06-25

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22770483

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22770483

Country of ref document: EP

Kind code of ref document: A1