CN112766266A

CN112766266A - Text direction correction method, system and device based on staged probability statistics

Info

Publication number: CN112766266A
Application number: CN202110128262.6A
Authority: CN
Inventors: 李源; 杨曦露
Original assignee: Yuncong Technology Group Co Ltd
Current assignee: Yuncong Technology Group Co Ltd
Priority date: 2021-01-29
Filing date: 2021-01-29
Publication date: 2021-05-07
Anticipated expiration: 2041-01-29
Also published as: CN112766266B

Abstract

The invention relates to the technical field of text direction correction, and in particular provides a text direction correction method, system and device based on staged probability statistics, aiming to solve the technical problem of how to correct arbitrary printed text of different formats to the correct reading direction. To this end, the method of the invention comprises: detecting a text image to obtain all text lines; determining the direction of each text line and determining the main direction of all text lines based on the one or more directions having the highest probability of occurrence; correcting the main direction of all the text lines to the horizontal direction; and slicing the corrected text lines, counting the positive and negative directions of at least some of the slices, and performing a final correction based on the slice direction with the highest probability of occurrence, so that the direction of the text image conforms to a preset direction. The method does not need to attend to local features of text in any specific format, has strong generalization capability, can correct the text direction merely by training a model through machine learning, and can ensure the accuracy and correctness of the overall optical character recognition.

Description

Text direction correction method, system and device based on staged probability statistics

Technical Field

The invention relates to the technical field of text direction correction, in particular to a text direction correction method, a text direction correction system and a text direction correction device based on staged probability statistics.

Background

In most optical character recognition (OCR) tasks, the direction of the text must first be corrected. Because text backgrounds are complex and the size and aspect ratio of text vary widely, traditional methods tend to be sensitive to the color, brightness, background texture and format of the text and generalize poorly. When they are used to correct arbitrary printed text of different specific formats to the correct reading direction, an ideal correction effect is difficult to achieve. As shown in FIG. 1, common defects such as an inaccurate text angle or a reversed direction result, which can in turn interrupt the optical character recognition task.

Therefore, a text direction correction scheme based on staged probability statistics is urgently needed, one that does not need to attend to local features of text in a specific format, has strong generalization capability, and can achieve very high accuracy merely by training a text detection model.

Disclosure of Invention

In order to overcome the defects, the invention provides a method, a system and a device for correcting the text direction based on staged probability statistics, which aims to solve or at least partially solve the technical problems of correcting any print text with different colors, brightness and background textures into a correct reading direction and ensuring the correctness and accuracy of the whole optical character recognition.

In a first aspect, a method for correcting text direction based on staged probability statistics is provided, the method including:

detecting a text image to obtain all text lines;

determining a direction of each text line and determining a principal direction of all text lines based on the one or more directions having the highest probability of occurrence;

correcting the main direction of all the text lines into a horizontal direction;

slicing the corrected text line, counting the positive and negative directions of at least part of slices, and performing final correction based on the slice direction with the highest occurrence probability to enable the direction of the text image to accord with a preset direction; wherein the positive and negative directions of the slice are determined by the positive and negative directions of the characters in the slice.

The step of detecting the text image to obtain all text lines specifically includes: detecting the text image in a fixed anchor mode to obtain all text lines; and/or the method further comprises: after all text lines are obtained, text lines with aspect ratios smaller than a set threshold are removed, and only the direction of each remaining text line is determined.

The step of determining the main directions of all text lines based on the one or more directions with the highest probability of occurrence specifically includes: and taking the direction of the average value of the angles of the text lines with the largest occurrence number relative to the horizontal direction as the main direction.

Wherein the method further comprises:

before counting the positive and negative directions of at least part of slices, carrying out character classification on characters in at least part of slices, wherein the character classification result at least comprises characters with similar shapes in the positive and negative directions and characters with dissimilar shapes in the positive and negative directions;

the step of counting the positive and negative directions of at least part of the slices specifically comprises the following steps:

and counting positive and negative directions only on the slices with dissimilar positive and negative character shapes.

The step of "performing final correction based on the slice direction with the highest occurrence probability" specifically includes:

inputting the image of the slice into a convolutional neural network, and calculating the prediction category of each character on the slice; if the number of positive direction characters in the characters with the dissimilar positive and negative direction shapes is larger than that of the negative direction characters, the slicing direction is positive; otherwise, the slice direction is negative;

if the direction of most slices is positive, keeping the current direction unchanged; otherwise, all text is rotated by 180 degrees.

In a second aspect, a system for correcting text direction based on staged probability statistics is provided, which includes:

the text line acquisition module is used for detecting a text image to obtain all text lines;

a main direction determination module for determining the direction of each text line and determining the main direction of all text lines based on the one or more directions with the highest probability of occurrence;

a horizontal direction rectifying module for rectifying the main direction of all the text lines into a horizontal direction;

the final correction module is used for slicing the corrected text line, counting the positive and negative directions of at least part of slices and performing final correction based on the slice direction with the highest occurrence probability to enable the direction of the text image to accord with the preset direction; wherein the positive and negative directions of the slice are determined by the positive and negative directions of the characters in the slice.

The operation executed by the text line acquisition module specifically comprises: detecting the text image in a fixed anchor mode to obtain all text lines; and/or further comprising: after all text lines are obtained, text lines with aspect ratios smaller than a set threshold are removed, and only the direction of each remaining text line is determined.

Wherein, when determining the main direction of all text lines based on one or more directions with the highest occurrence probability, the main direction determination module takes a direction of an average value of angles of the text lines with the highest number of occurrences with respect to the horizontal direction as the main direction.

Before counting the positive and negative directions of at least part of the slices, the final correction module classifies characters in at least part of the slices, wherein the character classification result at least comprises characters with similar shapes in the positive and negative directions and characters with dissimilar shapes in the positive and negative directions;

when the positive and negative directions of at least part of the slices are counted, the final correction module only counts the positive and negative directions of the slices with dissimilar characters in the positive and negative directions.

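When final correction is performed based on the slice direction with the highest occurrence probability, the operation performed by the final correction module specifically includes:

inputting the image of the slice into a convolutional neural network, and calculating the prediction category of each character on the slice; if, among the characters whose shapes are dissimilar in the positive and negative directions, the number of positive direction characters is larger than that of negative direction characters, the slice direction is positive; otherwise, the slice direction is negative;

if the direction of most slices is positive, keeping the current direction unchanged; otherwise, all text is rotated by 180 degrees.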

In a third aspect, a computer readable storage medium is provided, having stored thereon a plurality of program codes adapted to be loaded and executed by a processor to perform the method of any of the preceding claims.

In a fourth aspect, there is provided a control apparatus comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, the program codes being adapted to be loaded and run by the processor to perform the method of any of the preceding claims.

One or more technical schemes of the invention have at least one or more of the following beneficial effects: a text image is detected to obtain all text lines; the direction of each text line is determined, and the main direction of all text lines is determined based on the one or more directions having the highest probability of occurrence; the main direction of all the text lines is corrected to the horizontal direction; the corrected text lines are sliced, the positive and negative directions of at least some of the slices are counted, and a final correction is performed based on the slice direction with the highest occurrence probability so that the direction of the text image conforms to a preset direction, wherein the positive and negative directions of a slice are determined by the positive and negative directions of the characters in the slice. The method does not need to attend to local features of text in any specific format, has strong generalization capability, can correct the text direction merely by training a text detection model through machine learning, and can ensure the correctness and accuracy of the overall optical character recognition.

Drawings

Embodiments of the invention are described below with reference to the accompanying drawings, in which:

FIG. 1 is a schematic diagram of an embodiment of correcting an arbitrary print text by a conventional method, which is difficult to achieve a desired correction effect;

FIG. 2 is a main flow diagram of one embodiment of a text orientation correction method based on staged probability statistics in accordance with the present invention;

FIG. 3 is a schematic diagram of an embodiment of counting text lines in an interval of 5 degrees to obtain a main direction with the largest occurrence frequency according to the scheme of the present invention;

FIG. 4 is a schematic diagram of one embodiment of rectifying a primary direction of a line of text to a horizontal direction in accordance with aspects of the present invention;

FIG. 5 is a schematic diagram of one embodiment of positive and negative directions of a text line slice in accordance with aspects of the present invention;

FIG. 6 is a block diagram of predicting text line direction based on text line slicing direction determination, according to an aspect of the present invention;

FIG. 7 is a schematic diagram of an embodiment of selecting a slice for vote detection and determining a positive and negative direction for implementing final correction of a text according to the solution of the present invention;

FIG. 8 is a block diagram illustrating an embodiment of a system for correcting the direction of a text based on staged probability statistics according to the present invention.

Detailed Description

Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.

In the description of the present invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, memory, may comprise software components such as program code, or may be a combination of software and hardware. The processor may be a central processing unit, microprocessor, image processor, digital signal processor, or any other suitable processor. The processor has data and/or signal processing functionality. The processor may be implemented in software, hardware, or a combination thereof. Non-transitory computer readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random-access memory, and the like. The term "a and/or B" denotes all possible combinations of a and B, such as a alone, B alone or a and B. The term "at least one A or B" or "at least one of A and B" means similar to "A and/or B" and may include only A, only B, or both A and B. The singular forms "a", "an" and "the" may include the plural forms as well.

Some terms to which the invention relates are explained here:

Optical character recognition (OCR): refers to a process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text using a character recognition method.

Intersection over union (IOU): the ratio of the intersection to the union of the areas of two rectangular boxes.

Non-maximum suppression (NMS): used to select the anchor with the highest score or the highest probability in a local neighborhood and to suppress the anchors with lower scores.

In the prior art, most optical character recognition (OCR) tasks must first correct the direction of the text. Because text backgrounds are complex and the size and aspect ratio of text vary widely, traditional methods tend to be sensitive to the color, brightness, background texture and format of the text and generalize poorly. When arbitrary printed text of different specific formats is corrected to the correct reading direction, an ideal correction effect is difficult to achieve; common defects such as an inaccurate text angle or a reversed direction usually result, which in turn interrupts the optical character recognition task.

One embodiment of the text direction correction scheme based on staged probability statistics of the present invention is as follows: text direction correction and recognition are performed on a shopping receipt from a certain market. First, when the text detection model is trained, dense rectangular boxes of fixed size are preset on the image of the shopping receipt as fixed anchors, and text labels are annotated on the text lines of the image. Suppose the area of fixed anchor A is 5 square centimeters, the area of text label G is 6 square centimeters, the area of their intersection is 5 square centimeters, and the preset threshold is 0.5. The intersection over union (IOU) of fixed anchor A and text label G is then calculated by the formula as IOU = 5/(11-5) = 5/6 = 0.83, which is greater than the preset threshold of 0.5, so anchor A is a positive sample. The four point coordinates of anchor A are [(1, 1), (2, 1), (1, 6), (2, 6)], the four point coordinates of text label G are [(1, 1), (2, 1), (1, 7), (2, 7)], the width A_w of anchor A is 1, and its height A_h is 5. The regression offset of text label G relative to anchor A is therefore t_x = (0, 0, 0, 0) in the X direction and t_y = (0, 0, 0.2, 0.2) in the Y direction. Finally, the fixed anchors and the offsets are used to train the text detection model.

When a text line is predicted, let the predicted anchor be B and the predicted text box be F. Suppose the probability output by the text detection model that the anchor is text is 0.75 and the preset threshold is 0.5; the probability is greater than the preset threshold, so the coordinates of predicted anchor B are obtained as [(1, 1), (2, 1), (1, 7), (2, 7)], giving a width B_w of 1 and a height B_h of 6. The regression offset of predicted text box F relative to anchor B provided by the text detection model is T_x = (1, 1, 1, 1) in the X direction and T_y = (1, 1, 1, 1) in the Y direction, so the coordinates of predicted text box F are [(2, 7), (3, 7), (2, 13), (3, 13)]. If the probability output by the text detection model is highest for predicted text box E, the IOU of each of predicted text boxes C and D with predicted text box E is calculated and compared with a preset threshold; if it is greater than the threshold, predicted text boxes C and D are deleted, and predicted text box E is retained as the detected text line.

After all text lines are detected, text lines with a small aspect ratio are removed, that is, text lines satisfying 1/3 < h/w < 3, where h is the length and w is the width of the rectangular text line. A minimum bounding rectangle is fitted to each remaining text line to form a text box, the vector direction of the text box is calculated, and the direction angle of each remaining text line is obtained from that vector direction. The number of text lines in each interval is counted using intervals of 5 degrees, the interval containing the most text lines is found, and the direction given by the average angle of the text lines in that interval is taken as the main direction of the text. The direction angles of the remaining text lines range from 0 degrees to 90 degrees; with 5 degrees per interval this range is divided into 18 intervals, and the angle range of the nth interval is [(n-1)×5, n×5], where 1 ≤ n ≤ 18 and n is an integer. For example, suppose the angles of all text lines obtained from the text detection model are 30 degrees, 35 degrees, 45 degrees, 70 degrees and 45 degrees. With 5-degree intervals the text line angles fall into 3 groups, [30, 35], [45, 45] and [70], of which the second group contains the most text lines and has an average angle of 45 degrees. The main direction of the text is therefore 45 degrees, and finally the main direction of the text is corrected to the horizontal direction.

The text lines corrected to the horizontal direction are cut into slices, and character classification and positive/negative direction prediction voting (probability statistics) are performed simultaneously on each character in the slices of multiple text lines. Characters whose shapes are similar in the positive and negative directions are first grouped into one class and removed, that is, characters that look nearly the same upright and rotated by 180 degrees, such as "0", "一" (one), "H", "田" (field) and "日" (day), are set aside as one class; every other character is positive at 0 degrees and negative at 180 degrees. For example, suppose 20 characters in total are predicted in a slice, of which 18 are predicted to be positive and 2 are predicted to be negative. The number of characters predicted to be positive in the slice is larger than the number predicted to be negative, so the direction of the slice is positive. If the directions of most slices are positive, the text line containing the slices is judged to be in the positive direction, and the direction of the shopping receipt is kept unchanged. Conversely, if the number of characters predicted to be negative in a slice is larger than the number predicted to be positive, the direction of the slice is negative, and if the directions of most slices are negative, the shopping receipt needs to be rotated by 180 degrees, which completes the final correction of the shopping receipt. In actual applications, multiple slices (e.g., 3 or 5) may be selected to vote on the prediction together.

The implementation of the present invention will be described with reference to the main flowchart of fig. 2, which shows an embodiment of a text direction correction method based on staged probability statistics.

Step S101, detecting a text image to obtain all text lines;

In one embodiment, all text lines in a text image may be obtained by using a text detection model based on fixed anchors. The text detection model presets dense rectangular boxes of fixed size on the text image as anchors, extracts the features of the text image, classifies and regresses the fixed anchors, and obtains all text lines of the text through non-maximum suppression.

When the text detection model is trained, dense rectangular boxes of fixed size are preset on a text image as fixed anchors, and text labels are annotated on the text lines of the text image. The intersection over union of each fixed anchor and the text label is calculated; if it is greater than a preset threshold, the fixed anchor is a positive sample, otherwise the fixed anchor is a negative sample. The offset of the text label relative to the fixed anchor is calculated from the difference between the coordinates of the text label and the fixed anchor, and finally the positive and negative samples of the fixed anchors are input to train the text detection model.

Further, when the text detection model is trained, if the anchor is A and the text label is G, the intersection over union of the anchor A and the text label G is calculated as follows:

IOU = area(A∩G) / (area(A) + area(G) - area(A∩G))

in the formula:

IOU represents the intersection over union, area(x) represents the area of x, and A∩G represents the intersection of A and G;

if the IOU is larger than a preset threshold, the anchor A is a positive sample; otherwise, the anchor A is a negative sample;

for example, if the area of the anchor A is 5 square centimeters, the area of the text label G is 6 square centimeters, the area of the intersection of the anchor A and the text label G is 5 square centimeters, and the preset threshold is 0.5, the formula gives IOU = 5/(11-5) = 5/6 = 0.83, which is greater than the preset threshold of 0.5, so the anchor A is a positive sample.
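The IOU test for labeling an anchor can be sketched in a few lines of Python. This is only an illustration under stated assumptions: boxes are taken to be axis-aligned and written as (x_min, y_min, x_max, y_max), and the function names `box_area`, `iou` and `is_positive_sample` are hypothetical, not part of the described model.

```python
def box_area(box):
    """Area of an axis-aligned box (x_min, y_min, x_max, y_max); 0 if empty."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def iou(a, g):
    """IOU = area(A∩G) / (area(A) + area(G) - area(A∩G))."""
    inter_box = (max(a[0], g[0]), max(a[1], g[1]),
                 min(a[2], g[2]), min(a[3], g[3]))
    inter = box_area(inter_box)
    union = box_area(a) + box_area(g) - inter
    return inter / union if union > 0 else 0.0


def is_positive_sample(anchor, label, threshold=0.5):
    """An anchor is a positive sample when its IOU exceeds the threshold."""
    return iou(anchor, label) > threshold


anchor_a = (1, 1, 2, 6)  # area 5 (assumed layout matching the stated areas)
label_g = (1, 1, 2, 7)   # area 6, intersection with anchor_a is 5
print(round(iou(anchor_a, label_g), 2))       # 0.83
print(is_positive_sample(anchor_a, label_g))  # True
```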

When the text detection model is trained, the regression offset of the text label G relative to the fixed anchor A is calculated as follows:

t_xi = (X_Gi - X_Ai) / A_w, t_yi = (Y_Gi - Y_Ai) / A_h

in the formula:

t_xi is the offset of the ith point in the X direction;

t_yi is the offset of the ith point in the Y direction;

i indexes the four points of the text label and the fixed anchor, and takes the values 1, 2, 3 and 4;

X_Gi is the X coordinate of the ith point of the text label G;

X_Ai is the X coordinate of the ith point of the fixed anchor A;

A_w is the width of the fixed anchor A;

Y_Gi is the Y coordinate of the ith point of the text label G;

Y_Ai is the Y coordinate of the ith point of the fixed anchor A;

A_h is the height of the fixed anchor A.

For another example, the four point coordinates of the anchor A are [(1, 1), (2, 1), (1, 6), (2, 6)], the four point coordinates of the text label G are [(1, 1), (2, 1), (1, 7), (2, 7)], the width A_w of the anchor A is 1, and its height A_h is 5. The regression offset of the text label G relative to the anchor A is therefore t_x = (0, 0, 0, 0) in the X direction and t_y = (0, 0, 0.2, 0.2) in the Y direction.
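As a minimal sketch, the offset formula can be applied point by point to the coordinates of anchor A and text label G. The function name `regression_offsets` is hypothetical, and deriving A_w and A_h from the extent of the anchor's corner points is an assumption made for illustration.

```python
def regression_offsets(anchor_pts, label_pts):
    """Per-point offsets t_xi = (X_Gi - X_Ai) / A_w and t_yi = (Y_Gi - Y_Ai) / A_h."""
    xs = [x for x, _ in anchor_pts]
    ys = [y for _, y in anchor_pts]
    a_w = max(xs) - min(xs)  # anchor width (assumed: extent of the corner points)
    a_h = max(ys) - min(ys)  # anchor height
    t_x = [(xg - xa) / a_w for (xa, _), (xg, _) in zip(anchor_pts, label_pts)]
    t_y = [(yg - ya) / a_h for (_, ya), (_, yg) in zip(anchor_pts, label_pts)]
    return t_x, t_y


anchor_a = [(1, 1), (2, 1), (1, 6), (2, 6)]  # A_w = 1, A_h = 5
label_g = [(1, 1), (2, 1), (1, 7), (2, 7)]
t_x, t_y = regression_offsets(anchor_a, label_g)
print(t_x)
print(t_y)
```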

When the trained text detection model is applied, the text detection model outputs the probability that each anchor is text. If the probability is greater than a preset threshold, the offset of the predicted text box relative to the predicted anchor, as output by the text detection model, is applied to the coordinates of the predicted anchor to obtain an initial predicted text box. Finally, all detected text lines are obtained from the initial predicted text boxes through non-maximum suppression.

Further, when a text detection model is applied, the text detection model outputs the probability of whether each anchor is a text, if the probability is greater than a preset threshold, the predicted coordinates of the anchor are obtained, and the predicted coordinates of the text box are obtained according to the regression offset T of the text box provided by the text detection model relative to the predicted anchor;

when predicting text lines, the predicted anchor is B, the predicted text box is F, and the coordinates of the predicted text box are calculated as follows:

X_Fi = T_xi × B_w + X_Bi

Y_Fi = T_yi × B_h + Y_Bi

in the formula:

X_Fi is the X coordinate of the ith point of the predicted text box;

T_xi is the X-direction offset of the ith point in the regression offset of the predicted text box relative to the predicted anchor, as provided by the text detection model;

B_w is the width of the predicted anchor;

X_Bi is the X coordinate of the ith point of the predicted anchor;

Y_Fi is the Y coordinate of the ith point of the predicted text box;

T_yi is the Y-direction offset of the ith point in the regression offset of the predicted text box relative to the predicted anchor, as provided by the text detection model;

B_h is the height of the predicted anchor;

Y_Bi is the Y coordinate of the ith point of the predicted anchor.

For example, if the probability output by the text detection model that an anchor is text is 0.75 and the preset threshold is 0.5, the probability is greater than the preset threshold, and the coordinates of the predicted anchor are obtained as [(1, 1), (2, 1), (1, 7), (2, 7)], so that the anchor width B_w is 1 and the anchor height B_h is 6. If the regression offset of the predicted text box relative to the anchor provided by the text detection model is T_x = (1, 1, 1, 1) in the X direction and T_y = (1, 1, 1, 1) in the Y direction, the coordinates of the predicted text box are [(2, 7), (3, 7), (2, 13), (3, 13)].
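The decoding step is the inverse of the training offset and can be sketched as follows. The function name `decode_text_box` is hypothetical, and B_w and B_h are assumed to be the extents of the predicted anchor's corner points.

```python
def decode_text_box(anchor_pts, t_x, t_y):
    """Apply X_Fi = T_xi * B_w + X_Bi and Y_Fi = T_yi * B_h + Y_Bi to each point."""
    xs = [x for x, _ in anchor_pts]
    ys = [y for _, y in anchor_pts]
    b_w = max(xs) - min(xs)  # predicted anchor width
    b_h = max(ys) - min(ys)  # predicted anchor height
    return [(tx * b_w + xb, ty * b_h + yb)
            for (xb, yb), tx, ty in zip(anchor_pts, t_x, t_y)]


anchor_b = [(1, 1), (2, 1), (1, 7), (2, 7)]  # B_w = 1, B_h = 6
box_f = decode_text_box(anchor_b, t_x=(1, 1, 1, 1), t_y=(1, 1, 1, 1))
print(box_f)
```

Because the offsets are normalized by the anchor size, a Y offset of 1 moves a point by one full anchor height (6 here), not by one coordinate unit.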

If the probability output by the text detection model is highest for predicted text box E, the IOU of each of predicted text boxes C and D with predicted text box E is calculated and compared with a preset threshold; if it is greater than the threshold, predicted text boxes C and D are deleted, and predicted text box E is kept as the detected text line.
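The suppression of boxes C and D by the higher-scoring box E is the standard greedy form of non-maximum suppression. A minimal sketch follows, assuming axis-aligned boxes written as (x_min, y_min, x_max, y_max); the box coordinates and scores in the example are made up for illustration, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def non_maximum_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, visiting boxes from highest to lowest score."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Illustrative values only: C and D overlap E heavily, the last box is far away.
boxes = [(2, 7, 3, 12), (2, 8, 3, 13), (2, 7, 3, 13), (10, 10, 11, 16)]
scores = [0.8, 0.7, 0.9, 0.6]  # C, D, E, other
print(non_maximum_suppression(boxes, scores))  # [2, 3]
```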

Step S102, determining the direction of each text line and determining the main direction of all the text lines based on one or more directions with the highest occurrence probability;

In one embodiment, after all text lines are detected, text lines with a small aspect ratio are removed, a minimum bounding rectangle is fitted to each remaining text line to form a text box, the vector direction of the text box is calculated, and the direction angle of each remaining text line is obtained from the vector direction of its text box. The number of text lines in each interval is counted using intervals of 5 degrees, the interval with the largest number of text lines is found, and the direction given by the average angle of the text lines in that interval is taken as the main direction of the text.

When the main direction of the text is calculated, text lines satisfying 1/3 < h/w < 3 are removed, where h is the length and w is the width of the rectangular text line;

the direction angles of the remaining text lines range from 0 degrees to 90 degrees; with 5 degrees per interval this range is divided into 18 intervals, and the angle range of the nth interval is [(n-1)×5, n×5], where 1 ≤ n ≤ 18 and n is an integer.

For example, FIG. 3 is a schematic diagram of an embodiment of counting text lines in intervals of 5 degrees to obtain the main direction with the largest occurrence frequency according to the scheme of the present invention. After all text lines are detected, text lines satisfying 1/3 < h/w < 3 are removed, where h is the length and w is the width of the rectangular text line; a minimum bounding rectangle is fitted to each remaining text line to form a text box, the vector direction of the text box is calculated, and the direction angle of each remaining text line is obtained from the vector direction of its text box. Suppose the angles of all the text lines are 30 degrees, 35 degrees, 45 degrees, 70 degrees and 45 degrees. With 5-degree intervals the text line angles fall into 3 groups, [30, 35], [45, 45] and [70], of which the second group contains the most text lines and has an average angle of 45 degrees, so the main direction of the text is 45 degrees.
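The first-stage statistics (aspect-ratio filtering followed by a 5-degree histogram vote) can be sketched as below. This is an illustration under assumptions: intervals are treated as half-open, [(n-1)×5, n×5), so that every angle falls into exactly one interval, an angle of exactly 90 degrees is placed in the last interval, and ties between equally full intervals go to the one encountered first. The function names are hypothetical.

```python
from collections import defaultdict


def is_elongated(h, w):
    """Keep a text line only if it lies outside the removed band 1/3 < h/w < 3."""
    return not (1 / 3 < h / w < 3)


def main_direction(angles, bin_width=5, max_angle=90):
    """Average angle of the fullest bin_width-degree interval."""
    n_bins = max_angle // bin_width  # 18 intervals for 0 to 90 degrees
    bins = defaultdict(list)
    for angle in angles:
        index = min(int(angle // bin_width), n_bins - 1)  # clamp 90 into last bin
        bins[index].append(angle)
    fullest = max(bins.values(), key=len)  # first-encountered bin wins a tie
    return sum(fullest) / len(fullest)


print(is_elongated(h=1, w=10))  # True: a long, thin text line is kept
print(is_elongated(h=2, w=3))   # False: a near-square box is removed
print(main_direction([30, 35, 45, 70, 45]))  # 45.0
```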

Step S103, correcting the main directions of all the text lines into the horizontal direction;

correcting all text lines into horizontal directions according to the determined main directions of all the text lines; fig. 4 is a schematic diagram of an embodiment of the scheme according to the invention for correcting the main direction of the text line to the horizontal direction.
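Geometrically, correcting the main direction to horizontal amounts to rotating the content by the negative of the main direction angle. The sketch below rotates points rather than image pixels, and assumes mathematical axes (Y pointing up, counter-clockwise angles positive); in image coordinates with Y pointing down the sign convention would have to be checked. The function names are hypothetical.

```python
import math


def rotate_point(point, center, degrees):
    """Rotate a point counter-clockwise about center by the given angle."""
    theta = math.radians(degrees)
    dx, dy = point[0] - center[0], point[1] - center[1]
    return (center[0] + dx * math.cos(theta) - dy * math.sin(theta),
            center[1] + dx * math.sin(theta) + dy * math.cos(theta))


def level_points(points, main_direction_deg, center=(0.0, 0.0)):
    """Rotate by minus the main direction so that direction becomes horizontal."""
    return [rotate_point(p, center, -main_direction_deg) for p in points]


# A text line running at 45 degrees from (0, 0) to (1, 1) becomes horizontal.
start, end = level_points([(0, 0), (1, 1)], main_direction_deg=45)
print(start, end)
```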

Step S104, slicing the corrected text line, counting the positive and negative directions of at least part of slices, and finally correcting the text line based on the slice direction with the highest occurrence probability to ensure that the direction of the text image accords with a preset direction; wherein the positive and negative directions of the slice are determined by the positive and negative directions of the characters in the slice.

In one embodiment, FIG. 5 is a schematic diagram of one embodiment of the positive and negative directions of a text line slice according to the scheme of the present invention. The text lines corrected to the horizontal direction are cut into slices, and character classification and positive/negative direction prediction voting are performed simultaneously on each character in the slices of multiple text lines. FIG. 6 illustrates predicting the text line direction based on the determined directions of the text line slices according to the scheme of the present invention. Before the positive and negative directions of at least some of the slices are counted, the characters in those slices are first classified; the character classification result at least comprises characters whose shapes are similar in the positive and negative directions and characters whose shapes are dissimilar in the positive and negative directions, and the positive or negative direction is judged only for slices containing characters whose shapes are dissimilar in the two directions. After the characters with similar shapes in the positive and negative directions are removed, if the number of positive direction characters in a slice is larger than the number of negative direction characters, the direction of the slice is judged to be positive; if the directions of most slices are positive, the text line containing the slices is judged to be in the positive direction, and the direction of the text is kept unchanged; otherwise, the text is rotated by 180 degrees to complete the final correction of the text. FIG. 7 shows an embodiment of the scheme of the invention in which slices are selected for voting detection and the positive or negative direction is judged by single-character voting to complete the final correction of the text.

In one embodiment, because the direction is determined by a vote over individual characters, the accuracy requirement on the recognition model is not high, so the model is made lightweight by removing the RNN layer. The slice image is first input into a convolutional neural network, which outputs a character sequence prediction probability matrix P of shape (m, c), where m is the length of the character sequence and c is the number of character classification categories. When character recognition is performed on the slice, the prediction category of each character in the sequence is computed from P: if the index of the maximum value of the prediction probability vector P[i] of the i-th character is j = argmax(P[i]), the prediction category of the i-th character is j.
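The per-character category selection j = argmax(P[i]) can be illustrated with a minimal sketch. The function name `predict_classes` and the probability values are hypothetical, and plain Python lists stand in for the network output; this is not the patent's implementation.

```python
def predict_classes(P):
    """Return j = argmax(P[i]) for every row i of the (m, c) matrix P."""
    # max() over the column indices, keyed by the probability in that column.
    # On ties, the lowest index wins (an arbitrary choice made for this sketch).
    return [max(range(len(row)), key=row.__getitem__) for row in P]


# Hypothetical probability matrix with m = 4 characters and c = 3 categories.
P = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
]
classes = predict_classes(P)
print(classes)  # [0, 1, 1, 2]
```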

For example, in the present embodiment the number c of character classification categories is 3. A preset prediction category of j = 0 for the i-th character indicates that the character looks similar when viewed in the positive direction and when rotated by 180 degrees, that is, it is a character with similar shapes in the positive and negative directions, such as "0", "一" (one), "H", "田" (field), "日" (day), etc. A preset prediction category of j = 1 indicates that the i-th character can be read and recognized normally when viewed in the positive direction, that is, it is a character in the positive direction. A preset prediction category of j = 2 indicates that the i-th character can be read and recognized normally only after being rotated by 180 degrees, that is, it is a character in the negative direction.

For another example, the text line corrected to the horizontal direction is cut into slices, and character classification and positive/negative direction vote prediction are performed simultaneously on each character in the slices of a plurality of text lines. Characters with similar shapes in the positive and negative directions, such as "0", "一" (one), "H", "田" (field), "日" (day), etc., are first grouped into one class and eliminated, and vote statistics are collected only for characters in the positive direction (0 degrees) and characters in the negative direction (180 degrees). Suppose 20 characters in total are predicted in a slice, of which 18 are predicted to be in the positive direction and 2 in the negative direction; because the positive-direction characters outnumber the negative-direction characters, the slice is in the positive direction. If most slices are in the positive direction, the text line containing the slices is judged to be in the positive direction, and the direction of the text is kept unchanged. If the number of characters predicted to be in the negative direction in a slice is larger than the number in the positive direction, the slice is in the negative direction, and if most slices are in the negative direction, the text is rotated by 180 degrees to complete the final correction. In practical applications, multiple (e.g., 3 or 5) slices may be selected to vote on the prediction together.
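The two-level vote described above (characters decide the slice, slices decide the text line) can be sketched as follows. The category indices 0, 1 and 2 follow the embodiment; the function names and the handling of ties (a tied slice abstains, a tied line is left unrotated) are assumptions made only for this illustration.

```python
SIMILAR, POSITIVE, NEGATIVE = 0, 1, 2  # category indices j from the embodiment


def slice_direction(char_classes):
    """Vote over the characters of one slice, ignoring SIMILAR characters."""
    pos = char_classes.count(POSITIVE)
    neg = char_classes.count(NEGATIVE)
    if pos == neg:
        return None  # assumption: a tied or all-similar slice abstains
    return POSITIVE if pos > neg else NEGATIVE


def needs_rotation(slices):
    """Return True when most voting slices are negative (rotate by 180 degrees)."""
    votes = [slice_direction(s) for s in slices]
    # assumption: on a tie between slices the text is left unchanged
    return votes.count(NEGATIVE) > votes.count(POSITIVE)


# The slice from the text: 20 characters, 18 positive and 2 negative.
slice_a = [POSITIVE] * 18 + [NEGATIVE] * 2
slices = [
    slice_a,
    [SIMILAR, POSITIVE, POSITIVE, NEGATIVE],
    [NEGATIVE, NEGATIVE, SIMILAR],
]
print(slice_direction(slice_a) == POSITIVE)  # True
print(needs_rotation(slices))  # False: two of three slices are positive
```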

Referring to FIG. 8, a block diagram of an embodiment of a text direction correction system based on staged probability statistics according to the present invention is shown, and the implementation of the present invention is further explained. The system comprises at least:

a text line acquisition module 801 for detecting a text image to obtain all text lines;

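$$\mathrm{IOU} = \frac{\mathrm{area}(A \cap G)}{\mathrm{area}(A) + \mathrm{area}(G) - \mathrm{area}(A \cap G)}$$

in the formula, A is the fixed anchor and G is the text label.

$$t_{xi} = \frac{X_{Gi} - X_{Ai}}{A_w}, \qquad t_{yi} = \frac{Y_{Gi} - Y_{Ai}}{A_h}$$

in the formula:

$t_{xi}$ is the displacement of the i-th point in the X direction;

$t_{yi}$ is the displacement of the i-th point in the Y direction;

$X_{Gi}$ is the X coordinate of the i-th point of the text label G;

$X_{Ai}$ is the X coordinate of the i-th point of the fixed anchor A;

$A_w$ is the width of the fixed anchor A;

$Y_{Gi}$ is the Y coordinate of the i-th point of the text label G;

$Y_{Ai}$ is the Y coordinate of the i-th point of the fixed anchor A;

$A_h$ is the height of the fixed anchor A.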

For another example, the four corner coordinates of the anchor A are [(1, 1), (2, 1), (1, 6), (2, 6)], the four corner coordinates of the text label G are [(1, 1), (2, 1), (1, 7), (2, 7)], the width $A_w$ of the anchor A is 1, and its height $A_h$ is 5; thus the regression offset of the text label G relative to the anchor A is $t_x$ = (0, 0, 0, 0) in the X direction and $t_y$ = (0, 0, 0.2, 0.2) in the Y direction.
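The offset formulas can be checked against this example with a short sketch. The function name `regression_offsets` is hypothetical, and deriving the anchor width and height from the extent of its corner points assumes an axis-aligned anchor.

```python
def regression_offsets(anchor, label):
    """Compute (t_x, t_y) of label G relative to fixed anchor A, point by point."""
    xs = [x for x, _ in anchor]
    ys = [y for _, y in anchor]
    a_w = max(xs) - min(xs)  # anchor width A_w (assumes an axis-aligned anchor)
    a_h = max(ys) - min(ys)  # anchor height A_h
    t_x = [(gx - ax) / a_w for (ax, _), (gx, _) in zip(anchor, label)]
    t_y = [(gy - ay) / a_h for (_, ay), (_, gy) in zip(anchor, label)]
    return t_x, t_y


anchor_a = [(1, 1), (2, 1), (1, 6), (2, 6)]
label_g = [(1, 1), (2, 1), (1, 7), (2, 7)]
t_x, t_y = regression_offsets(anchor_a, label_g)
print(t_x)  # [0.0, 0.0, 0.0, 0.0]
print(t_y)  # [0.0, 0.0, 0.2, 0.2]
```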

$$X_{Fi} = T_{xi} \times B_w + X_{Bi}$$

$$Y_{Fi} = T_{yi} \times B_h + Y_{Bi}$$

in the formula:

$X_{Fi}$ is the X coordinate of the i-th point of the predicted text box;

$T_{xi}$ is the X-direction offset of the i-th point of the predicted text box relative to the predicted anchor, as given by the text detection model;

$B_w$ is the width of the predicted anchor;

$X_{Bi}$ is the X coordinate of the i-th point of the predicted fixed anchor;

$Y_{Fi}$ is the Y coordinate of the i-th point of the predicted text box;

$T_{yi}$ is the Y-direction offset of the i-th point of the predicted text box relative to the predicted anchor, as given by the text detection model;

$B_h$ is the height of the predicted anchor;

$Y_{Bi}$ is the Y coordinate of the i-th point of the predicted anchor.
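The decoding step defined by these two formulas can be sketched as below. The function name `decode_box` and the input values are hypothetical; the anchor is assumed to be axis-aligned so that its width and height can be read from its corner points.

```python
def decode_box(anchor, t_x, t_y):
    """Apply X_F = T_x * B_w + X_B and Y_F = T_y * B_h + Y_B to each corner point."""
    xs = [x for x, _ in anchor]
    ys = [y for _, y in anchor]
    b_w = max(xs) - min(xs)  # predicted anchor width B_w
    b_h = max(ys) - min(ys)  # predicted anchor height B_h
    return [
        (tx * b_w + bx, ty * b_h + by)
        for (bx, by), tx, ty in zip(anchor, t_x, t_y)
    ]


# Hypothetical inputs: an anchor of width 1 and height 5 whose lower edge
# is pushed down by 0.2 anchor heights, i.e. by one unit.
anchor_b = [(1, 1), (2, 1), (1, 6), (2, 6)]
box_f = decode_box(anchor_b, [0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.2, 0.2])
print(box_f)
```

Decoding is the inverse of the training-time offset computation, so feeding the offsets of a label back through `decode_box` recovers the label's corner points up to floating-point rounding.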

A main direction determination module 802 for determining a direction of each text line and determining a main direction of all text lines based on the one or more directions with the highest probability of occurrence;

A horizontal direction rectifying module 803 for rectifying the main direction of all the text lines into a horizontal direction;

A final correction module 804, configured to slice the corrected text line, count positive and negative directions of at least part of the slices, and perform final correction based on a slice direction with the highest occurrence probability, so that the direction of the text image conforms to a preset direction; wherein the positive and negative directions of the slice are determined by the positive and negative directions of the characters in the slice.
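The main direction determination performed by module 802 can be sketched as follows. This is only an illustration under stated assumptions: grouping the line angles into bins of 10 degrees (`bin_width`) and the function name `main_direction` are not specified in the text, and angle wrap-around is ignored. The averaging of the most frequent group follows the approach of taking the average angle of the most frequently occurring text line direction.

```python
from collections import defaultdict


def main_direction(angles_deg, bin_width=10.0):
    """Average the angles that fall into the most populated angle bin."""
    bins = defaultdict(list)
    for angle in angles_deg:
        bins[int(angle // bin_width)].append(angle)
    # The bin holding the most text lines is treated as the most probable direction.
    largest = max(bins.values(), key=len)
    return sum(largest) / len(largest)


# Hypothetical angles (degrees from horizontal) of six detected text lines.
angles = [31.0, 33.0, 35.0, 32.0, 3.0, 88.0]
print(main_direction(angles))  # 32.75
```

Rotating the image by the negative of the returned angle would then bring the main direction to the horizontal, which is the task of module 803.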

An example application scenario of the technical solution is described below to further illustrate the implementation of the present invention: text direction correction and recognition are performed on shopping receipts from a shopping mall. First, when the text detection model is trained, dense rectangular boxes of fixed size are preset on the shopping receipt image as fixed anchors, and the text lines of the image are annotated with text labels. The area of the fixed anchor A is 5 square centimeters, the area of the text label G is 6 square centimeters, their intersection area is 5 square centimeters, and the preset threshold is 0.5. The overlap ratio of the fixed anchor A and the text label G is obtained from the formula: IOU = 5/(11 - 5) = 5/6 ≈ 0.83, which is greater than the preset threshold of 0.5, so the anchor A is a positive sample. The four corner coordinates of the anchor A are [(1, 1), (2, 1), (1, 6), (2, 6)], the four corner coordinates of the text label G are [(1, 1), (2, 1), (1, 7), (2, 7)], the width $A_w$ of the anchor A is 1, and its height $A_h$ is 5; therefore, the regression offset of the text label G relative to the anchor A is calculated as $t_x$ = (0, 0, 0, 0) in the X direction and $t_y$ = (0, 0, 0.2, 0.2) in the Y direction. Finally, the fixed anchors and the offsets are used to train the text detection model.
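The positive-sample test in this example can be reproduced with a small sketch. The function name `rect_iou` and the (x_min, y_min, x_max, y_max) box representation are assumptions for illustration; the coordinates are the anchor A and text label G from the example, whose areas are 5 and 6 with an intersection of 5.

```python
def rect_iou(a, g):
    """IOU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], g[2]) - max(a[0], g[0]))
    inter_h = max(0.0, min(a[3], g[3]) - max(a[1], g[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_g = (g[2] - g[0]) * (g[3] - g[1])
    union = area_a + area_g - inter
    return inter / union if union > 0 else 0.0


anchor_a = (1, 1, 2, 6)  # area 5
label_g = (1, 1, 2, 7)   # area 6, intersection with the anchor is 5
iou = rect_iou(anchor_a, label_g)
is_positive = iou > 0.5  # preset threshold from the example
print(round(iou, 2), is_positive)  # 0.83 True
```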

When text lines are predicted, let the predicted fixed anchor be B and the predicted text box be F. The text detection model outputs a probability of 0.75 that the fixed anchor is text, which is greater than the preset threshold of 0.5, so the anchor is selected. The coordinates of the predicted anchor B are [(1, 1), (2, 1), (1, 7), (2, 7)], so its width $B_w$ is 1 and its height $B_h$ is 6. The text detection model gives the regression offsets of the predicted text box F relative to the anchor B as $T_x$ = (1, 1, 1, 1) in the X direction and $T_y$ = (1, 1, 1, 1) in the Y direction, so by the formulas above the coordinates of the predicted text box F are [(2, 7), (3, 7), (2, 13), (3, 13)]. If, among the predicted text boxes C, D and E, the text detection model outputs the highest probability for the predicted text box E, the intersection-over-union IOU of each of the predicted text boxes C and D with the predicted text box E is calculated and compared with a preset threshold; if the IOU is greater than the threshold, the predicted text box C or D is deleted, and the predicted text box E is kept as the detected text line.
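The probability filtering and the deletion of overlapping boxes described above amount to a form of non-maximum suppression, which can be sketched as follows. The function names, the (x_min, y_min, x_max, y_max) box format, and all box coordinates and scores are hypothetical; the two thresholds of 0.5 mirror the preset thresholds in the example.

```python
def rect_iou(a, b):
    """IOU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def suppress(boxes, scores, score_threshold=0.5, iou_threshold=0.5):
    """Keep the highest-scoring boxes and drop boxes that overlap a kept box."""
    candidates = [i for i, s in enumerate(scores) if s > score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        if all(rect_iou(boxes[i], boxes[k]) <= iou_threshold for k in kept):
            kept.append(i)
    return kept


# Hypothetical boxes: C and D overlap E heavily; the fourth box is a separate
# text line; the fifth falls below the probability threshold.
boxes = [(2, 2, 9, 4), (3, 2, 10, 4), (2, 2, 10, 4), (2, 6, 10, 8), (0, 0, 1, 1)]
scores = [0.75, 0.8, 0.9, 0.6, 0.3]
print(suppress(boxes, scores))  # [2, 3]
```

In this run box E (index 2) suppresses C and D because each has an IOU of 0.875 with it, while the non-overlapping line at index 3 survives.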

The text line corrected to the horizontal direction is then cut into slices, and character classification and positive/negative direction vote prediction are performed simultaneously on each character in the slices of a plurality of text lines. Characters with similar shapes in the positive and negative directions, such as "0", "一" (one), "H", "田" (field), "日" (day), etc., are first grouped into one class and eliminated, and vote statistics are collected only for characters in the positive direction (0 degrees) and characters in the negative direction (180 degrees). Suppose 20 characters in total are predicted in a slice, of which 18 are predicted to be in the positive direction and 2 in the negative direction; because the positive-direction characters outnumber the negative-direction characters, the slice is in the positive direction. If most slices are in the positive direction, the text line containing the slices is judged to be in the positive direction, and the direction of the text is kept unchanged. If the number of characters predicted to be in the negative direction in a slice is larger than the number in the positive direction, the slice is in the negative direction, and if most slices are in the negative direction, the text is rotated by 180 degrees to complete the final correction. In practical applications, multiple (e.g., 3 or 5) slices may be selected to vote on the prediction together.

It will be understood by those skilled in the art that all or part of the flow of the method according to the above-described embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the above-described method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a medium, a USB disk, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory, a random access memory, an electrical carrier signal, a telecommunication signal, a software distribution medium, etc. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, in accordance with legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

Further, in one embodiment of a computer-readable storage medium of the present invention, the storage medium stores a plurality of program codes adapted to be loaded and executed by a processor to perform the text direction correction method described above.

Further, in an embodiment of a control device of the present invention, the control device comprises a processor and a storage device, the storage device being adapted to store a plurality of program codes, and the program codes being adapted to be loaded and run by the processor to perform the text direction correction method described above.

Further, it should be understood that, since the modules are only configured to illustrate the functional units of the system of the present invention, the corresponding physical devices of the modules may be the processor itself, or a part of software, a part of hardware, or a part of a combination of software and hardware in the processor. Thus, the number of individual modules in the figures is merely illustrative.

Those skilled in the art will appreciate that the various modules in the system may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solutions to deviate from the principle of the present invention, and therefore, the technical solutions after splitting or combining will fall within the protection scope of the present invention.

So far, the technical solution of the present invention has been described with reference to one embodiment shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims

1. A text direction rectification method based on staged probability statistics is characterized by comprising the following steps:

detecting a text image to obtain all text lines;

slicing the corrected text line, counting the positive and negative directions of at least part of slices, and performing final correction based on the slice direction with the highest occurrence probability to enable the direction of the text image to accord with a preset direction;

wherein the positive and negative directions of the slice are determined by the positive and negative directions of the characters in the slice.

2. The method according to claim 1, wherein the step of "detecting the text image to obtain all text lines" comprises in particular: detecting the text image in a fixed anchor mode to obtain all text lines; and/or

The method further comprises the following steps: after all text lines are obtained, text lines with aspect ratios smaller than a set threshold are removed, and only the direction of each remaining text line is determined.

3. The method according to claim 1, wherein the step of determining the main direction of all text lines based on the one or more directions with the highest probability of occurrence specifically comprises: taking, as the main direction, the direction of the average value of the angles, relative to the horizontal direction, of the text lines that occur most frequently.

4. The method of claim 1, further comprising:

5. The method according to claim 1, wherein the step of performing final rectification based on the slice direction with the highest probability of occurrence specifically comprises:

6. A system for correcting text orientation based on staged probability statistics, comprising:

7. The system of claim 6, wherein the text line obtaining module performs operations specifically comprising: detecting the text image in a fixed anchor mode to obtain all text lines; and/or

Further comprising: after all text lines are obtained, text lines with aspect ratios smaller than a set threshold are removed, and only the direction of each remaining text line is determined.

8. The system according to claim 6, wherein the main direction determination module takes, as the main direction, a direction of an average value of angles of the text line that appears most frequently with respect to the horizontal direction when determining the main direction of all the text lines based on one or more directions in which the probability of occurrence is highest.

9. The system of claim 6,

before counting the positive and negative directions of at least part of the slices, the final correction module carries out character classification on characters in at least part of the slices, wherein the character classification result at least comprises characters with similar shapes in the positive and negative directions and characters with dissimilar shapes in the positive and negative directions;

10. The system of claim 6, wherein, in performing the final correction based on the slice direction with the highest probability of occurrence, the final correction module performs operations comprising:

11. A computer-readable storage medium, characterized in that a plurality of program codes are stored in the storage medium, which program codes are adapted to be loaded and executed by a processor to perform the method according to any of claims 1 to 5.

12. A control apparatus comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, wherein the program codes are adapted to be loaded and run by the processor to perform the method of any of claims 1 to 5.