CN114764916A - Text recognition processing method and device and related equipment - Google Patents

Info

Publication number
CN114764916A
Authority
CN
China
Prior art keywords
target
bounding box
box
sequence
boundary
Prior art date
Legal status
Pending
Application number
CN202110001485.6A
Other languages
Chinese (zh)
Inventor
李一龙
黄文辉
曹俊峰
王斌
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute
Priority to CN202110001485.6A
Publication of CN114764916A
Legal status: Pending

Landscapes

  • Character Input (AREA)

Abstract

The invention provides a text recognition processing method, a text recognition processing device and related equipment. The method comprises the following steps: performing text position detection on a target picture to obtain a first bounding box sequence, wherein the first bounding box sequence comprises a plurality of bounding boxes, and each bounding box comprises coordinate information and a confidence; sorting the bounding boxes in the first bounding box sequence according to a preset arrangement order to obtain a second bounding box sequence; based on the second bounding box sequence, sequentially executing target operations in descending order of confidence to obtain a third bounding box sequence; and performing text recognition on the target picture based on the third bounding box sequence; wherein the target operation comprises: deleting the second target bounding box from the second bounding box sequence if the overlap ratio between the first target bounding box and the second target bounding box is greater than or equal to a first threshold. The embodiment of the invention improves the recognition accuracy of the bounding boxes of the text.

Description

Text recognition processing method and device and related equipment
Technical Field
The embodiment of the invention relates to the technical field of communication, in particular to a text recognition processing method, a text recognition processing device and related equipment.
Background
It is well known that non-maximum suppression is a common algorithm in object detection. Because the number of targets to be detected in a picture is not fixed, current object detection algorithms first predict a large number of bounding boxes and then select the appropriate boxes from them through a non-maximum suppression algorithm. The conventional non-maximum suppression algorithm usually compares the intersection-over-union of two bounding boxes. During text detection, however, texts generally do not overlap, so determining the bounding boxes of texts by the intersection-over-union approach leads to poor recognition accuracy of the text bounding boxes.
Disclosure of Invention
The embodiment of the invention provides a text recognition processing method, a text recognition processing device and related equipment, and aims to solve the problem of poor accuracy of text boundary box recognition.
In order to solve the problems, the invention is realized as follows:
in a first aspect, an embodiment of the present invention provides a text recognition processing method, where the method includes:
performing text position detection on a target picture to obtain a first bounding box sequence, wherein the first bounding box sequence comprises a plurality of bounding boxes, and each bounding box comprises coordinate information and confidence;
sequencing each bounding box in the first bounding box sequence according to a preset arrangement sequence to obtain a second bounding box sequence;
based on the second bounding box sequence, sequentially executing target operations according to the sequence of the confidence degrees from large to small to obtain a third bounding box sequence;
performing text recognition on the target picture based on the third bounding box sequence;
wherein the target operation comprises:
deleting a second target boundary box in the second boundary box sequence under the condition that the overlapping proportion between the first target boundary box and the second target boundary box is larger than or equal to a first threshold, wherein when the preset arrangement sequence is the arrangement sequence with the confidence coefficient from large to small, the first target boundary box is the ith boundary box in the target boundary box sequence, i is a positive integer, and the second target boundary box is a boundary box before the first target boundary box; when the preset arrangement sequence is an arrangement sequence with confidence coefficient from small to large, the first target bounding box is the ith last bounding box in the target bounding box sequence, and the second target bounding box is any bounding box before the first target bounding box; the target bounding box sequence is a bounding box sequence obtained after the bounding box in the second bounding box sequence is adjusted by executing the target operation for the (i-1) th time.
In a second aspect, an embodiment of the present invention provides a text recognition processing apparatus, including:
the detection module is used for detecting the text position of the target picture to obtain a first bounding box sequence, wherein the first bounding box sequence comprises a plurality of bounding boxes, and each bounding box comprises coordinate information and confidence;
the sorting module is used for sorting the bounding boxes in the first bounding box sequence according to a preset sorting sequence to obtain a second bounding box sequence;
the execution module is used for sequentially executing target operations in descending order of confidence based on the second bounding box sequence to obtain a third bounding box sequence;
the identification module is used for performing text identification on the target picture based on the third bounding box sequence;
wherein the target operation comprises: deleting a second target boundary box from the second boundary box sequence under the condition that the overlapping proportion between the first target boundary box and the second target boundary box is greater than or equal to a first threshold, wherein the first target boundary box is the ith boundary box in the target boundary box sequence when the preset arrangement sequence is the arrangement sequence with the confidence coefficient from large to small, i is a positive integer, and the second target boundary box is the boundary box before the first target boundary box; when the preset arrangement sequence is an arrangement sequence with confidence coefficient from small to large, the first target boundary box is the ith last boundary box in the target boundary box sequence, and the second target boundary box is any boundary box before the first target boundary box; the target bounding box sequence is a bounding box sequence obtained after the target operation is executed for the (i-1) th time to adjust the bounding box in the second bounding box sequence.
In a third aspect, an embodiment of the present invention further provides an electronic device, including: a memory, a processor, and a program stored on the memory and executable on the processor; the processor is configured to read the program in the memory to implement the steps of the method according to the first aspect.
In a fourth aspect, the embodiment of the present invention further provides a readable storage medium for storing a program, where the program, when executed by a processor, implements the steps in the method according to the foregoing first aspect.
The method comprises the steps of detecting the text position of a target picture to obtain a first bounding box sequence, wherein the first bounding box sequence comprises a plurality of bounding boxes, and each bounding box comprises coordinate information and a confidence; sorting each bounding box in the first bounding box sequence according to a preset arrangement order to obtain a second bounding box sequence; based on the second bounding box sequence, sequentially executing target operations in descending order of confidence to obtain a third bounding box sequence; performing text recognition on the target picture based on the third bounding box sequence; wherein the target operation comprises: deleting a second target bounding box from the second bounding box sequence under the condition that the overlap ratio between the first target bounding box and the second target bounding box is greater than or equal to a first threshold, wherein, when the preset arrangement order is the order of confidence from large to small, the first target bounding box is the ith bounding box in the target bounding box sequence, i is a positive integer, and the second target bounding box is a bounding box before the first target bounding box; when the preset arrangement order is the order of confidence from small to large, the first target bounding box is the ith last bounding box in the target bounding box sequence, and the second target bounding box is any bounding box before the first target bounding box; the target bounding box sequence is the bounding box sequence obtained after the target operation has been executed for the (i-1)th time to adjust the bounding boxes in the second bounding box sequence. Because the bounding boxes of the text are determined according to the overlap ratio between bounding boxes, based on the non-overlapping characteristic of text, the recognition accuracy of the bounding boxes of the text is improved, and the accuracy of subsequent text recognition can be improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is a schematic flowchart of a text recognition processing method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a text recognition processing apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," and the like in the embodiments of the present invention are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Further, as used herein, "and/or" means at least one of the connected objects, e.g., a and/or B and/or C, means 7 cases including a alone, B alone, C alone, and both a and B present, B and C present, a and C present, and A, B and C present.
Referring to fig. 1, fig. 1 is a schematic flowchart of a text recognition processing method according to an embodiment of the present invention. As shown in fig. 1, the text recognition processing method may include the steps of:
101, performing text position detection on a target picture to obtain a first bounding box sequence, wherein the first bounding box sequence comprises a plurality of bounding boxes, and each bounding box comprises coordinate information and confidence;
in the embodiment of the present application, the above bounding box may be understood as a vector containing five elements, which may be denoted as bbox, for example. Four of the five elements are coordinate information, and the fifth element is the confidence. The coordinate information may indicate the region of the bounding box; specifically, the coordinate information may include a maximum value xmax in the X-axis direction, a minimum value xmin in the X-axis direction, a maximum value ymax in the Y-axis direction, and a minimum value ymin in the Y-axis direction, where xmax and ymin represent the coordinates of the lower right corner of the bounding box, and xmin and ymax represent the coordinates of the upper left corner of the bounding box. The confidence may be understood as the predicted probability value of the bounding box, which may be denoted as score, for example.
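For illustration, the following is a minimal sketch of such a five-element bounding-box record in Python; the class name BBox and its field names are assumptions of this example, not identifiers from the original disclosure.

```python
from dataclasses import dataclass

@dataclass
class BBox:
    xmin: float   # minimum value in the X-axis direction
    xmax: float   # maximum value in the X-axis direction
    ymin: float   # minimum value in the Y-axis direction
    ymax: float   # maximum value in the Y-axis direction
    score: float  # confidence (predicted probability value of the box)

    def area(self) -> float:
        # Area spanned by the four coordinates; clamped at zero for degenerate boxes.
        return max(0.0, self.xmax - self.xmin) * max(0.0, self.ymax - self.ymin)
```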
102, sequencing all the bounding boxes in the first bounding box sequence according to a preset arrangement sequence to obtain a second bounding box sequence;
in the embodiment of the present application, the preset arrangement order may be an order in which the confidences are arranged from large to small, or an order in which the confidences are arranged from small to large. For example, if the first bounding box sequence includes five bounding boxes with confidences 0.90, 0.10, 0.93, 0.80, and 0.73, the second bounding box sequence obtained by arranging the first bounding box sequence in descending order of confidence is {bbox1, bbox2, bbox3, bbox4, bbox5}, where bbox1 has a confidence of 0.93, bbox2 a confidence of 0.90, bbox3 a confidence of 0.80, bbox4 a confidence of 0.73, and bbox5 a confidence of 0.10.
It should be understood that when the confidence of the L bounding boxes (L is an integer greater than 1) is the same, the arrangement order of the L bounding boxes may be arbitrarily arranged, and is not further limited herein. For example, in some optional embodiments, it may be ensured that the relative positional relationship of the L bounding boxes in the second bounding box sequence is unchanged from the relative positional relationship of the L bounding boxes in the first bounding box sequence.
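A minimal sketch of this sorting step, reusing the assumed BBox record above; Python's sorted is stable, so boxes with equal confidence keep their relative order from the first sequence, matching the optional embodiment just described.

```python
def sort_by_confidence(boxes, descending=True):
    # Stable sort: boxes with equal scores keep their original relative order.
    return sorted(boxes, key=lambda b: b.score, reverse=descending)

# Example with the confidences 0.90, 0.10, 0.93, 0.80, 0.73 from the text:
# sorting in descending order yields boxes with scores 0.93, 0.90, 0.80, 0.73, 0.10.
```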
103, sequentially executing target operations according to the sequence of the confidence degrees from large to small on the basis of the second bounding box sequence to obtain a third bounding box sequence;
in the embodiment of the present application, sequentially executing the target operations according to the order from the greater confidence to the smaller confidence may be understood as executing the target operations for a plurality of times with respect to the second bounding box sequence, and finally obtaining the third bounding box sequence.
Taking the case where the preset arrangement order is descending order of confidence as an example, each execution of the target operation may be understood as follows: among all bounding boxes in the second bounding box sequence for which the target operation has not yet been executed, the target operation is executed for the first-ranked bounding box, the overlap ratio between this bounding box and the remaining bounding boxes is determined, and the bounding boxes with an overly large overlap ratio are deleted. This is repeated until the target operation has been executed for all bounding boxes, or only the last bounding box remains for which the target operation has not been executed. In other words, in the embodiment of the present application, the target operation includes:
deleting a second target boundary box in the second boundary box sequence under the condition that the overlapping proportion between the first target boundary box and the second target boundary box is larger than or equal to a first threshold, wherein when the preset arrangement sequence is the arrangement sequence with the confidence coefficient from large to small, the first target boundary box is the ith boundary box in the target boundary box sequence, i is a positive integer, and the second target boundary box is a boundary box before the first target boundary box; when the preset arrangement sequence is an arrangement sequence with confidence coefficient from small to large, the first target boundary box is the ith last boundary box in the target boundary box sequence, and the second target boundary box is any boundary box before the first target boundary box; the target bounding box sequence is a bounding box sequence obtained after the target operation is executed for the (i-1) th time to adjust the bounding box in the second bounding box sequence.
It should be noted that the calculation manner of the overlap ratio between the first target bounding box and the second target bounding box may be set according to actual needs. For example, in some embodiments, the overlap ratio is determined as a cover score (cover_score), and the cover_score satisfies:

cover_score = |bbox_s1 ∩ bbox_s2| / |bbox_s1|

wherein bbox_s1 represents the first target bounding box, bbox_s2 represents the second target bounding box, |bbox_s1 ∩ bbox_s2| represents the overlapping area of the first target bounding box and the second target bounding box, and |bbox_s1| represents the area of the first target bounding box.
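A minimal sketch of the cover_score computation under the formula above, again using the assumed BBox record; note the denominator is the area of the first target bounding box rather than the union area used by plain IoU.

```python
def cover_score(b1: 'BBox', b2: 'BBox') -> float:
    # Overlapping area of the two boxes divided by the area of the first box.
    ix = max(0.0, min(b1.xmax, b2.xmax) - max(b1.xmin, b2.xmin))
    iy = max(0.0, min(b1.ymax, b2.ymax) - max(b1.ymin, b2.ymin))
    inter = ix * iy
    return inter / b1.area() if b1.area() > 0 else 0.0
```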
In a text recognition scenario, a small bounding box contained within a large bounding box does not occur among the bounding boxes of characters. Therefore, deleting the bounding box with the smaller confidence according to the overlap ratio between bounding boxes can improve the accuracy of bounding box recognition and thus the accuracy of subsequent text recognition.
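The deletion case of the target operation can be sketched as the following greedy pass over the confidence-sorted sequence, reusing cover_score from the sketch above. This is one reading of the operation, in which the later (lower-confidence) box of an overlapping pair is discarded, consistent with the rationale just given; the threshold value 0.7 is an assumption for illustration only.

```python
def suppress_overlaps(sorted_boxes, first_threshold=0.7):
    # sorted_boxes: the second bounding-box sequence, in descending confidence order.
    kept = []
    for box in sorted_boxes:
        # Keep the box only if its cover_score with every already-kept box
        # stays below the first threshold.
        if all(cover_score(box, k) < first_threshold for k in kept):
            kept.append(box)
    return kept  # deletion case only; merging and Y-adjustment are sketched later
```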
And 104, performing text recognition on the target picture based on the third bounding box sequence.
In the embodiment of the present application, the manner of performing text recognition on the target picture based on the third bounding box sequence may refer to the related art. A specific process is as follows: pixel extraction is performed on the target picture based on the bounding boxes in the third bounding box sequence to obtain a sub-image corresponding to each bounding box; text recognition is performed on the sub-image of each bounding box; and finally the recognition results are spliced according to the position information of the bounding boxes to obtain the final output target text, or the text in a specified bounding box is recognized to obtain the final output target text.
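A hedged sketch of this recognition step; recognize_text stands in for any off-the-shelf text recognizer and, together with the row/column indexing convention, is an assumption of this example rather than part of the original disclosure.

```python
import numpy as np

def recognize_from_boxes(image: np.ndarray, boxes, recognize_text):
    texts = []
    for b in boxes:
        # Pixel extraction: crop the sub-image for this box
        # (assuming ymin/ymax index rows and xmin/xmax index columns).
        sub = image[int(b.ymin):int(b.ymax), int(b.xmin):int(b.xmax)]
        texts.append(recognize_text(sub))
    # Concatenate the per-box results into the final output target text.
    return " ".join(texts)
```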
The method comprises the steps of detecting the text position of a target picture to obtain a first boundary box sequence, wherein the first boundary box sequence comprises a plurality of boundary boxes, and each boundary box comprises coordinate information and confidence; sequencing each bounding box in the first bounding box sequence according to a preset arrangement sequence to obtain a second bounding box sequence; based on the second bounding box sequence, sequentially executing target operations according to the sequence of the confidence degrees from large to small to obtain a third bounding box sequence; performing text recognition on the target picture based on the third bounding box sequence; wherein the target operation comprises: deleting a second target boundary box in the second boundary box sequence under the condition that the overlapping proportion between the first target boundary box and the second target boundary box is larger than or equal to a first threshold, wherein when the preset arrangement sequence is the arrangement sequence with the confidence coefficient from large to small, the first target boundary box is the ith boundary box in the target boundary box sequence, i is a positive integer, and the second target boundary box is a boundary box before the first target boundary box; when the preset arrangement sequence is an arrangement sequence with confidence coefficient from small to large, the first target boundary box is the ith last boundary box in the target boundary box sequence, and the second target boundary box is any boundary box before the first target boundary box; the target bounding box sequence is a bounding box sequence obtained after the target operation is executed for the (i-1) th time to adjust the bounding box in the second bounding box sequence. Because the boundary box of the text is determined according to the overlapping proportion of the boundary box based on the non-overlapping characteristic of the text, the recognition accuracy of the boundary box of the text is improved, and the recognition accuracy of the subsequent text can be improved.
Optionally, in some embodiments, the step of sorting the bounding boxes in the first bounding box sequence according to a preset sorting order to obtain a second bounding box sequence includes:
sequencing each bounding box in the first bounding box sequence according to the preset sequencing order to obtain a middle bounding box sequence;
and deleting the boundary box with the confidence coefficient smaller than a second threshold value in the intermediate boundary box sequence to obtain the second boundary box sequence.
In the embodiment of the present application, the size of the second threshold may be set according to actual needs; a bounding box whose confidence is smaller than the second threshold can be regarded as an unreasonable bounding box. For example, in some embodiments, the second threshold may be 0.5. That is, after the bounding boxes in the first bounding box sequence are arranged according to the confidence from high to low, the bounding boxes with a confidence lower than 0.5 are deleted, so that the second bounding box sequence can be obtained.
For example, the first bounding box sequence includes 5 bounding boxes with corresponding confidences of 0.90, 0.10, 0.93, 0.80, and 0.73, respectively. After arranging them according to the confidences from high to low and deleting the bounding boxes with confidences smaller than 0.5, a second bounding box sequence comprising four bounding boxes is obtained, with corresponding confidences of 0.93, 0.90, 0.80, and 0.73, respectively.
In the embodiment of the application, the bounding box with the confidence coefficient smaller than the second threshold value is deleted, so that the calculation amount of the repetition proportion during target operation can be reduced, the speed of determining the bounding box can be increased, and the speed of text recognition can be increased.
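A minimal sketch of this filtering step, with 0.5 as the example second threshold from the text.

```python
def filter_by_confidence(sorted_boxes, second_threshold=0.5):
    # Drop bounding boxes whose confidence is below the second threshold
    # before any overlap computation is performed.
    return [b for b in sorted_boxes if b.score >= second_threshold]
```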
Optionally, in some embodiments, the target operation further comprises:
under the condition that the overlapping proportion between a first target boundary box and a second target boundary box is smaller than the first threshold and larger than 0, calculating a first intersection ratio of the first target boundary box and the second target boundary box in the Y-axis direction;
calculating a merged bounding box of the first target bounding box and a second target bounding box when the first intersection ratio is greater than or equal to a third threshold;
and updating the first target bounding box into the combined bounding box, and deleting the second target bounding box.
In the embodiment of the present application, the first intersection ratio may be denoted as yIoU. Since cover_score > 0, the two bounding boxes necessarily intersect, so yIoU > 0. If yIoU is greater than the third threshold, it indicates that the two bounding boxes repeatedly cover part of the text content of the same line of text. For example, the first target bounding box covers the content "ABCDE" of a line of text and the second target bounding box covers the content "EHIJK" of the same line; in this case the first target bounding box and the second target bounding box satisfy yIoU greater than the third threshold, so the two boxes are merged: the first target bounding box is updated to the merged bounding box and the second target bounding box is deleted. This ensures that text in the same line is detected as the same bounding box. The embodiment of the present application can thus avoid errors in subsequent text recognition caused by repeatedly recognizing the repeatedly covered characters, thereby further improving the accuracy of text recognition.
In addition, in the embodiment of the present application, when the first target bounding box is updated, only the coordinate information of the first target bounding box may be updated, and the confidence level is not updated. Of course in other embodiments, the confidence level may also be updated, such as updating the confidence level to be the average of the confidence levels of the two bounding boxes.
Optionally, the calculation method of the yIoU may be:
yIoU = (min{ymax_s1, ymax_s2} − max{ymin_s1, ymin_s2}) / (max{ymax_s1, ymax_s2} − min{ymin_s1, ymin_s2})

wherein ymax_s1 represents the maximum coordinate value of the first target bounding box in the Y-axis direction, ymin_s1 represents the minimum coordinate value of the first target bounding box in the Y-axis direction, xmax_s1 represents the maximum coordinate value of the first target bounding box in the X-axis direction, and xmin_s1 represents the minimum coordinate value of the first target bounding box in the X-axis direction; ymax_s2 represents the maximum coordinate value of the second target bounding box in the Y-axis direction, ymin_s2 represents the minimum coordinate value of the second target bounding box in the Y-axis direction, xmax_s2 represents the maximum coordinate value of the second target bounding box in the X-axis direction, and xmin_s2 represents the minimum coordinate value of the second target bounding box in the X-axis direction.
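A sketch of the first intersection ratio, reading yIoU as a one-dimensional IoU along the Y axis as in the formula above; that reading, like the BBox field names, is an assumption of this sketch rather than part of the original disclosure.

```python
def y_iou(b1: 'BBox', b2: 'BBox') -> float:
    # Intersection over union of the two boxes' extents along the Y axis.
    inter = min(b1.ymax, b2.ymax) - max(b1.ymin, b2.ymin)
    union = max(b1.ymax, b2.ymax) - min(b1.ymin, b2.ymin)
    return max(0.0, inter) / union if union > 0 else 0.0
```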
For example, in some embodiments, the updated first target bounding box satisfies:
the minimum coordinate value in the X-axis direction is min { xmins1,xmins2};
The maximum coordinate value in the X-axis direction is max { xmaxs1,xmaxs2};
The minimum coordinate value in the Y-axis direction is min { ymins1,ymins2};
The maximum coordinate value in the Y-axis direction is max { ymaxs1,ymaxs2};
Confidence of (score)s1+scores2) /2, wherein, scores1Score for confidence of the first target bounding Box before updates1Is the confidence of the second target bounding box.
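A minimal sketch of the merged bounding box described by the conditions above, averaging the confidences as in this optional embodiment.

```python
def merge_boxes(b1: 'BBox', b2: 'BBox') -> 'BBox':
    # The merged box spans both inputs; it replaces the first target box,
    # and the second target box is then deleted from the sequence.
    return BBox(
        xmin=min(b1.xmin, b2.xmin),
        xmax=max(b1.xmax, b2.xmax),
        ymin=min(b1.ymin, b2.ymin),
        ymax=max(b1.ymax, b2.ymax),
        score=(b1.score + b2.score) / 2,
    )
```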
Optionally, in some embodiments, the target operation further comprises:
under the condition that the overlapping proportion between a first target boundary box and a second target boundary box is smaller than the first threshold and larger than 0 and the first intersection ratio is smaller than a third threshold, calculating a second intersection ratio of the first target boundary box and the second target boundary box in the X-axis direction;
determining a target adjustment amount Δy if the second intersection ratio is greater than or equal to a fourth threshold;
adjusting the Y-axis coordinate value of the first target boundary box and the Y-axis coordinate value of the second target boundary box according to the target adjustment amount;
wherein Δy = λ·max{ymax_s1 − ymin_s2, ymax_s2 − ymin_s1}; ymax_s1 represents the Y-axis maximum coordinate value of the first target bounding box, ymin_s1 represents the Y-axis minimum coordinate value of the first target bounding box, ymax_s2 represents the Y-axis maximum coordinate value of the second target bounding box, and ymin_s2 represents the Y-axis minimum coordinate value of the second target bounding box; λ is a weighting coefficient determined by score_s1 and score_s2, where score_s1 represents the confidence of the first target bounding box and score_s2 represents the confidence of the second target bounding box.
In the embodiment of the present application, the second intersection ratio may be denoted as xIoU. If xIoU is greater than the fourth threshold, it may indicate that multiple lines of text are covered by the same bounding box, i.e. multiple lines of text are detected in the same bounding box. For example, the first target bounding box covers the first line of text and also covers the portion of the second line of text adjacent to the first line, while the second target bounding box covers the second line of text and also covers the portion of the first line of text adjacent to the second line. In this case, the Y-axis coordinate values of the first target bounding box and the second target bounding box are adjusted so that each line of text is covered by only one bounding box, thereby ensuring the accuracy of subsequent text recognition.
Optionally, xIoU may be calculated as:

xIoU = (min{xmax_s1, xmax_s2} − max{xmin_s1, xmin_s2}) / (max{xmax_s1, xmax_s2} − min{xmin_s1, xmin_s2})

wherein ymax_s1 represents the maximum coordinate value of the first target bounding box in the Y-axis direction, ymin_s1 represents the minimum coordinate value of the first target bounding box in the Y-axis direction, xmax_s1 represents the maximum coordinate value of the first target bounding box in the X-axis direction, and xmin_s1 represents the minimum coordinate value of the first target bounding box in the X-axis direction; ymax_s2 represents the maximum coordinate value of the second target bounding box in the Y-axis direction, ymin_s2 represents the minimum coordinate value of the second target bounding box in the Y-axis direction, xmax_s2 represents the maximum coordinate value of the second target bounding box in the X-axis direction, and xmin_s2 represents the minimum coordinate value of the second target bounding box in the X-axis direction.
It should be noted that, the manner of adjusting the Y-axis coordinate values of the first target boundary box and the second target boundary box may be set according to actual needs, for example, in some embodiments, the adjusting the Y-axis coordinate values of the first target boundary box and the second target boundary box according to the target adjustment amount includes:
when ymin_s1 > ymin_s2, the Y-axis minimum coordinate value of the first target bounding box and the Y-axis maximum coordinate value of the second target bounding box are both adjusted to ymin_s1 + Δy;
when ymin_s1 ≤ ymin_s2, the Y-axis maximum coordinate value of the first target bounding box and the Y-axis minimum coordinate value of the second target bounding box are both adjusted to ymin_s2 + Δy.
In the embodiments of the present application, ymin_s1 > ymin_s2 may indicate that the first target bounding box is located above the second target bounding box. Adjusting both the Y-axis minimum coordinate value of the first target bounding box and the Y-axis maximum coordinate value of the second target bounding box to ymin_s1 + Δy ensures that the first target bounding box and the second target bounding box have no overlapping coverage area. ymin_s1 ≤ ymin_s2 may indicate that the first target bounding box is located below the second target bounding box; adjusting both the Y-axis maximum coordinate value of the first target bounding box and the Y-axis minimum coordinate value of the second target bounding box to ymin_s2 + Δy likewise ensures that the two boxes have no overlapping coverage area. Therefore, in the embodiment of the present application, after the Y-axis coordinate values of the first and second target bounding boxes are adjusted, the region covered by the two boxes as a whole is not reduced, so that no recognition area is lost and the accuracy of subsequent text recognition is ensured.
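A hedged sketch of the multi-line adjustment just described: xIoU is read as a one-dimensional IoU along the X axis, and lam stands in for the λ weighting factor, whose exact formula is not reproduced here; these readings and the helper names are assumptions of this example.

```python
def x_iou(b1: 'BBox', b2: 'BBox') -> float:
    # Intersection over union of the two boxes' extents along the X axis.
    inter = min(b1.xmax, b2.xmax) - max(b1.xmin, b2.xmin)
    union = max(b1.xmax, b2.xmax) - min(b1.xmin, b2.xmin)
    return max(0.0, inter) / union if union > 0 else 0.0

def split_rows(b1: 'BBox', b2: 'BBox', lam: float) -> None:
    # Target adjustment amount: dy = lam * max{ymax_s1 - ymin_s2, ymax_s2 - ymin_s1}.
    dy = lam * max(b1.ymax - b2.ymin, b2.ymax - b1.ymin)
    if b1.ymin > b2.ymin:
        # First box lies above the second: move both facing edges to ymin_s1 + dy.
        edge = b1.ymin + dy
        b1.ymin = edge
        b2.ymax = edge
    else:
        # First box lies below the second: move both facing edges to ymin_s2 + dy.
        edge = b2.ymin + dy
        b1.ymax = edge
        b2.ymin = edge
```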
It should be noted that the second bounding box sequence may be re-sorted after each execution of the target operation, so as to avoid null elements in the second bounding box sequence. For example, after a bounding box is deleted, the subsequent bounding boxes may be shifted forward to fill the gap.
Referring to fig. 2, fig. 2 is a structural diagram of a text recognition processing apparatus according to an embodiment of the present invention. As shown in fig. 2, the text recognition processing device 200 includes:
the detection module 201 is configured to perform text position detection on a target picture to obtain a first bounding box sequence, where the first bounding box sequence includes a plurality of bounding boxes, and each bounding box includes coordinate information and a confidence level;
a sorting module 202, configured to sort, according to a preset sorting order, each bounding box in the first bounding box sequence, to obtain a second bounding box sequence;
the execution module 203 is configured to sequentially execute target operations according to the order from the maximum confidence degree to the minimum confidence degree based on the second bounding box sequence to obtain a third bounding box sequence;
an identifying module 204, configured to perform text identification on the target picture based on the third bounding box sequence;
wherein the target operation comprises: deleting a second target boundary box from the second boundary box sequence under the condition that the overlapping proportion between the first target boundary box and the second target boundary box is greater than or equal to a first threshold, wherein the first target boundary box is the ith boundary box in the target boundary box sequence when the preset arrangement sequence is the arrangement sequence with the confidence coefficient from large to small, i is a positive integer, and the second target boundary box is the boundary box before the first target boundary box; when the preset arrangement sequence is an arrangement sequence with confidence coefficient from small to large, the first target bounding box is the ith last bounding box in the target bounding box sequence, and the second target bounding box is any bounding box before the first target bounding box; the target bounding box sequence is a bounding box sequence obtained after the target operation is executed for the (i-1) th time to adjust the bounding box in the second bounding box sequence.
Optionally, the sorting module 202 includes:
the sorting unit is used for sorting the bounding boxes in the first bounding box sequence according to the preset sorting order to obtain a middle bounding box sequence;
and the processing unit is used for deleting the boundary box with the confidence coefficient smaller than a second threshold value in the intermediate boundary box sequence to obtain the second boundary box sequence.
Optionally, the target operation further comprises:
under the condition that the overlapping proportion between a first target bounding box and a second target bounding box is smaller than the first threshold value and larger than 0, calculating a first intersection ratio of the first target bounding box and the second target bounding box in the Y-axis direction;
calculating a merged bounding box of the first target bounding box and a second target bounding box when the first intersection ratio is greater than or equal to a third threshold;
and updating the first target bounding box into the combined bounding box, and deleting the second target bounding box.
Optionally, the target operation further comprises:
under the condition that the overlapping proportion between a first target boundary box and a second target boundary box is smaller than the first threshold and larger than 0 and the first intersection ratio is smaller than a third threshold, calculating a second intersection ratio of the first target boundary box and the second target boundary box in the X-axis direction;
determining a target adjustment amount Δ y in the case that the second intersection ratio is greater than or equal to a fourth threshold;
adjusting the Y-axis coordinate value of the first target boundary box and the Y-axis coordinate value of the second target boundary box according to the target adjustment amount;
wherein Δy = λ·max{ymax_s1 − ymin_s2, ymax_s2 − ymin_s1}; ymax_s1 represents the Y-axis maximum coordinate value of the first target bounding box, ymin_s1 represents the Y-axis minimum coordinate value of the first target bounding box, ymax_s2 represents the Y-axis maximum coordinate value of the second target bounding box, and ymin_s2 represents the Y-axis minimum coordinate value of the second target bounding box; λ is a weighting coefficient determined by score_s1 and score_s2, where score_s1 represents the confidence of the first target bounding box and score_s2 represents the confidence of the second target bounding box.
Optionally, the execution module 203 is specifically configured to: when ymin_s1 > ymin_s2, adjust both the Y-axis minimum coordinate value of the first target bounding box and the Y-axis maximum coordinate value of the second target bounding box to ymin_s1 + Δy; when ymin_s1 ≤ ymin_s2, adjust both the Y-axis maximum coordinate value of the first target bounding box and the Y-axis minimum coordinate value of the second target bounding box to ymin_s2 + Δy.
The text recognition processing apparatus 200 can implement each process of the method embodiment in fig. 1 in the embodiment of the present invention, and achieve the same beneficial effects, and for avoiding repetition, the details are not described here again.
The embodiment of the invention also provides the electronic equipment. Referring to fig. 3, the electronic device may include a processor 301, a memory 302, and a program 3021 stored on the memory 302 and operable on the processor 301.
When executed by the processor 301, the program 3021 may implement any of the steps of the method embodiment shown in fig. 1 and achieve the same advantages, and thus, the description thereof is omitted here.
Those skilled in the art will appreciate that all or part of the steps of the method according to the above embodiments may be implemented by hardware associated with program instructions, and the program may be stored in a readable medium. An embodiment of the present invention further provides a readable storage medium, where a computer program is stored on the readable storage medium, and when the computer program is executed by a processor, any step in the method embodiment corresponding to fig. 1 may be implemented, and the same technical effect may be achieved, and in order to avoid repetition, details are not repeated here.
The storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
While the foregoing is directed to the preferred embodiment of the present invention, it will be appreciated by those skilled in the art that various changes and modifications may be made therein without departing from the principles of the invention as set forth in the appended claims.

Claims (12)

1. A text recognition processing method, the method comprising:
performing text position detection on a target picture to obtain a first bounding box sequence, wherein the first bounding box sequence comprises a plurality of bounding boxes, and each bounding box comprises coordinate information and confidence;
sequencing all the bounding boxes in the first bounding box sequence according to a preset sequencing order to obtain a second bounding box sequence;
based on the second bounding box sequence, sequentially executing target operations according to the sequence of the confidence degrees from large to small to obtain a third bounding box sequence;
performing text recognition on the target picture based on the third bounding box sequence;
wherein the target operation comprises:
deleting a second target boundary box in the second boundary box sequence under the condition that the overlapping proportion between the first target boundary box and the second target boundary box is larger than or equal to a first threshold, wherein when the preset arrangement sequence is the arrangement sequence with the confidence coefficient from large to small, the first target boundary box is the ith boundary box in the target boundary box sequence, i is a positive integer, and the second target boundary box is a boundary box before the first target boundary box; when the preset arrangement sequence is an arrangement sequence with confidence coefficient from small to large, the first target bounding box is the ith last bounding box in the target bounding box sequence, and the second target bounding box is any bounding box before the first target bounding box; the target bounding box sequence is a bounding box sequence obtained after the target operation is executed for the (i-1) th time to adjust the bounding box in the second bounding box sequence.
2. The method according to claim 1, wherein the step of sorting the bounding boxes in the first bounding box sequence according to a preset sorting order to obtain a second bounding box sequence comprises:
sequencing each bounding box in the first bounding box sequence according to the preset sequencing order to obtain a middle bounding box sequence;
and deleting the boundary box with the confidence coefficient smaller than a second threshold value in the intermediate boundary box sequence to obtain the second boundary box sequence.
3. The method of claim 1, wherein the target operation further comprises:
under the condition that the overlapping proportion between a first target boundary box and a second target boundary box is smaller than the first threshold and larger than 0, calculating a first intersection ratio of the first target boundary box and the second target boundary box in the Y-axis direction;
calculating a merged bounding box of the first target bounding box and a second target bounding box when the first intersection ratio is greater than or equal to a third threshold;
and updating the first target bounding box into the combined bounding box, and deleting the second target bounding box.
4. The method of claim 3, wherein the target operation further comprises:
under the condition that the overlapping proportion between a first target boundary box and a second target boundary box is smaller than the first threshold and larger than 0 and the first intersection ratio is smaller than a third threshold, calculating a second intersection ratio of the first target boundary box and the second target boundary box in the X-axis direction;
determining a target adjustment amount Δ y in the case that the second intersection ratio is greater than or equal to a fourth threshold;
adjusting the Y-axis coordinate value of the first target boundary box and the Y-axis coordinate value of the second target boundary box according to the target adjustment amount;
wherein Δy = λ·max{ymax_s1 − ymin_s2, ymax_s2 − ymin_s1}; ymax_s1 represents the Y-axis maximum coordinate value of the first target bounding box, ymin_s1 represents the Y-axis minimum coordinate value of the first target bounding box, ymax_s2 represents the Y-axis maximum coordinate value of the second target bounding box, and ymin_s2 represents the Y-axis minimum coordinate value of the second target bounding box; λ is a weighting coefficient determined by score_s1 and score_s2, where score_s1 represents the confidence of the first target bounding box and score_s2 represents the confidence of the second target bounding box.
5. The method of claim 4, wherein said adjusting the Y-axis coordinate values of the first target bounding box and the Y-axis coordinate values of the second target bounding box by the target adjustment amount comprises:
when ymin_s1 > ymin_s2, adjusting both the Y-axis minimum coordinate value of the first target bounding box and the Y-axis maximum coordinate value of the second target bounding box to ymin_s1 + Δy;
when ymin_s1 ≤ ymin_s2, adjusting both the Y-axis maximum coordinate value of the first target bounding box and the Y-axis minimum coordinate value of the second target bounding box to ymin_s2 + Δy.
6. A text recognition processing apparatus characterized by comprising:
the detection module is used for detecting the text position of a target picture to obtain a first boundary box sequence, wherein the first boundary box sequence comprises a plurality of boundary boxes, and each boundary box comprises coordinate information and confidence;
the sorting module is used for sorting the bounding boxes in the first bounding box sequence according to a preset sorting sequence to obtain a second bounding box sequence;
the execution module is used for sequentially executing target operations in descending order of confidence based on the second bounding box sequence to obtain a third bounding box sequence;
the identification module is used for performing text identification on the target picture based on the third bounding box sequence;
wherein the target operation comprises: deleting a second target boundary box in the second boundary box sequence under the condition that the overlapping proportion between the first target boundary box and the second target boundary box is larger than or equal to a first threshold, wherein when the preset arrangement sequence is the arrangement sequence with the confidence coefficient from large to small, the first target boundary box is the ith boundary box in the target boundary box sequence, i is a positive integer, and the second target boundary box is a boundary box before the first target boundary box; when the preset arrangement sequence is an arrangement sequence with confidence coefficient from small to large, the first target boundary box is the ith last boundary box in the target boundary box sequence, and the second target boundary box is any boundary box before the first target boundary box; the target bounding box sequence is a bounding box sequence obtained after the target operation is executed for the (i-1) th time to adjust the bounding box in the second bounding box sequence.
7. The apparatus of claim 6, wherein the ordering module comprises:
the sorting unit is used for sorting the bounding boxes in the first bounding box sequence according to the preset sorting order to obtain a middle bounding box sequence;
and the processing unit is used for deleting the boundary box with the confidence coefficient smaller than a second threshold value in the intermediate boundary box sequence to obtain the second boundary box sequence.
8. The apparatus of claim 6, wherein the target operation further comprises:
under the condition that the overlapping proportion between a first target boundary box and a second target boundary box is smaller than the first threshold and larger than 0, calculating a first intersection ratio of the first target boundary box and the second target boundary box in the Y-axis direction;
calculating a merged bounding box of the first target bounding box and a second target bounding box when the first intersection ratio is greater than or equal to a third threshold;
and updating the first target bounding box into the combined bounding box, and deleting the second target bounding box.
9. The apparatus of claim 8, wherein the target operation further comprises:
under the condition that the overlapping proportion between a first target boundary box and a second target boundary box is smaller than the first threshold and larger than 0 and the first intersection ratio is smaller than a third threshold, calculating a second intersection ratio of the first target boundary box and the second target boundary box in the X-axis direction;
determining a target adjustment amount Δ y in the case that the second intersection ratio is greater than or equal to a fourth threshold;
adjusting the Y-axis coordinate value of the first target boundary box and the Y-axis coordinate value of the second target boundary box according to the target adjustment amount;
wherein Δy = λ·max{ymax_s1 − ymin_s2, ymax_s2 − ymin_s1}; ymax_s1 represents the Y-axis maximum coordinate value of the first target bounding box, ymin_s1 represents the Y-axis minimum coordinate value of the first target bounding box, ymax_s2 represents the Y-axis maximum coordinate value of the second target bounding box, and ymin_s2 represents the Y-axis minimum coordinate value of the second target bounding box; λ is a weighting coefficient determined by score_s1 and score_s2, where score_s1 represents the confidence of the first target bounding box and score_s2 represents the confidence of the second target bounding box.
10. The apparatus of claim 9, wherein the execution module is specifically configured to: when ymin_s1 > ymin_s2, adjust both the Y-axis minimum coordinate value of the first target bounding box and the Y-axis maximum coordinate value of the second target bounding box to ymin_s1 + Δy; when ymin_s1 ≤ ymin_s2, adjust both the Y-axis maximum coordinate value of the first target bounding box and the Y-axis minimum coordinate value of the second target bounding box to ymin_s2 + Δy.
11. An electronic device, comprising: a memory, a processor, and a program stored on the memory and executable on the processor; the processor is configured to read a program in the memory to implement the steps in the text recognition processing method according to any one of claims 1 to 5.
12. A readable storage medium storing a program, wherein the program realizes the steps in the text recognition processing method according to any one of claims 1 to 5 when executed by a processor.
CN202110001485.6A 2021-01-04 2021-01-04 Text recognition processing method and device and related equipment Pending CN114764916A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110001485.6A CN114764916A (en) 2021-01-04 2021-01-04 Text recognition processing method and device and related equipment

Publications (1)

Publication Number Publication Date
CN114764916A (en) 2022-07-19

Family

ID=82364299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110001485.6A Pending CN114764916A (en) 2021-01-04 2021-01-04 Text recognition processing method and device and related equipment

Country Status (1)

Country Link
CN (1) CN114764916A (en)

Similar Documents

Publication Publication Date Title
CN108121986B (en) Object detection method and device, computer device and computer readable storage medium
CN107358149B (en) Human body posture detection method and device
CN111814794B (en) Text detection method and device, electronic equipment and storage medium
JP5713790B2 (en) Image processing apparatus, image processing method, and program
CN108960229B (en) Multidirectional character detection method and device
CN112418278A (en) Multi-class object detection method, terminal device and storage medium
CN113313083B (en) Text detection method and device
CN110135446B (en) Text detection method and computer storage medium
KR20220093187A (en) Positioning method and apparatus, electronic device, computer readable storage medium
CN112183372A (en) Text recognition method, device and equipment and readable storage medium
CN111461070B (en) Text recognition method, device, electronic equipment and storage medium
CN111461100A (en) Bill identification method and device, electronic equipment and storage medium
CN113159026A (en) Image processing method, image processing apparatus, electronic device, and medium
CN108960247B (en) Image significance detection method and device and electronic equipment
CN111652140A (en) Method, device, equipment and medium for accurately segmenting questions based on deep learning
CN114764916A (en) Text recognition processing method and device and related equipment
CN114511862B (en) Form identification method and device and electronic equipment
CN115223173A (en) Object identification method and device, electronic equipment and storage medium
CN114550062A (en) Method and device for determining moving object in image, electronic equipment and storage medium
CN113840169B (en) Video processing method, device, computing equipment and storage medium
CN114399657A (en) Vehicle detection model training method and device, vehicle detection method and electronic equipment
CN111291756B (en) Method and device for detecting text region in image, computer equipment and computer storage medium
CN113657369A (en) Character recognition method and related equipment thereof
CN113011409A (en) Image identification method and device, electronic equipment and storage medium
CN113657370A (en) Character recognition method and related equipment thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination