CN112766266A - Text direction correction method, system and device based on staged probability statistics - Google Patents

Text direction correction method, system and device based on staged probability statistics

Info

Publication number
CN112766266A
Authority
CN
China
Prior art keywords
text
positive
characters
slice
negative
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110128262.6A
Other languages
Chinese (zh)
Other versions
CN112766266B (en
Inventor
李源
杨曦露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yuncong Technology Group Co Ltd
Original Assignee
Yuncong Technology Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yuncong Technology Group Co Ltd
Priority to CN202110128262.6A
Publication of CN112766266A
Application granted
Publication of CN112766266B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/146 Aligning or centring of the image pick-up or image-field
    • G06V30/1475 Inclination or skew detection or correction of characters or of image to be recognised
    • G06V30/1478 Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention relates to the technical field of text direction correction and provides a text direction correction method, system and device based on staged probability statistics, aiming to solve the technical problem of how to correct arbitrary printed texts of different kinds into the correct reading direction. To this end, the method of the invention comprises: detecting a text image to obtain all text lines; determining the direction of each text line and determining the main direction of all text lines based on the one or more directions having the highest probability of occurrence; correcting the main direction of all the text lines to the horizontal direction; and slicing the corrected text lines, counting the positive and negative directions of at least some of the slices, and performing a final correction based on the slice direction with the highest occurrence probability so that the direction of the text image matches the preset direction. The method attends to local features of the text rather than targeting a specific format, has strong generalization capability, can correct the direction of the text merely by training a model through machine learning, and ensures the accuracy and correctness of the overall optical character recognition.

Description

Text direction correction method, system and device based on staged probability statistics
Technical Field
The invention relates to the technical field of text direction correction, in particular to a text direction correction method, a text direction correction system and a text direction correction device based on staged probability statistics.
Background
In most optical character recognition (OCR) tasks, the direction of the text must first be corrected. Because text backgrounds are complex and the size and aspect ratio of text vary widely, traditional methods tend to be sensitive to the color, brightness, background texture and format of the text, and they generalize poorly. When arbitrary printed text of different specific formats is corrected to the correct reading direction, an ideal correction effect is difficult to achieve. As shown in FIG. 1, common defects such as an inaccurate text angle and a reversed direction can occur, which can in turn interrupt the optical character recognition task.
Therefore, there is an urgent need for a text direction correction scheme based on staged probability statistics that attends to local features of the text without targeting a specific format, generalizes well, and achieves very high accuracy merely by training a text detection model.
Disclosure of Invention
In order to overcome the above defects, the invention provides a method, a system and a device for correcting text direction based on staged probability statistics, which solve or at least partially solve the technical problem of correcting arbitrary printed text with different colors, brightness and background textures into the correct reading direction, thereby ensuring the correctness and accuracy of the overall optical character recognition.
In a first aspect, a method for correcting text direction based on staged probability statistics is provided, the method including:
detecting a text image to obtain all text lines;
determining a direction of each text line and determining a principal direction of all text lines based on the one or more directions having the highest probability of occurrence;
correcting the main direction of all the text lines into a horizontal direction;
slicing the corrected text line, counting the positive and negative directions of at least part of slices, and performing final correction based on the slice direction with the highest occurrence probability to enable the direction of the text image to accord with a preset direction; wherein the positive and negative directions of the slice are determined by the positive and negative directions of the characters in the slice.
The step of detecting the text image to obtain all text lines specifically includes: detecting the text image in a fixed anchor mode to obtain all text lines; and/or the method further comprises: after all text lines are obtained, text lines with aspect ratios smaller than a set threshold are removed, and only the direction of each remaining text line is determined.
The step of "determining the main direction of all text lines based on the one or more directions with the highest probability of occurrence" specifically includes: taking the direction given by the average angle, relative to the horizontal direction, of the most frequently occurring text lines as the main direction.
Wherein the method further comprises:
before counting the positive and negative directions of at least part of slices, carrying out character classification on characters in at least part of slices, wherein the character classification result at least comprises characters with similar shapes in the positive and negative directions and characters with dissimilar shapes in the positive and negative directions;
the step of counting the positive and negative directions of at least part of the slices specifically comprises the following steps:
counting positive and negative directions only on the slices containing characters whose shapes are dissimilar in the positive and negative directions.
The step of "performing final correction based on the slice direction with the highest occurrence probability" specifically includes:
inputting the image of the slice into a convolutional neural network, and calculating the prediction category of each character on the slice; if, among the characters whose shapes are dissimilar in the positive and negative directions, the number of positive-direction characters is larger than the number of negative-direction characters, the slice direction is positive; otherwise, the slice direction is negative;
if the direction of most slices is positive, keeping the current direction unchanged; otherwise, all text is rotated by 180 degrees.
In a second aspect, a system for correcting text direction based on staged probability statistics is provided, which includes:
the text line acquisition module is used for detecting a text image to obtain all text lines;
a main direction determination module for determining the direction of each text line and determining the main direction of all text lines based on the one or more directions with the highest probability of occurrence;
a horizontal direction rectifying module for rectifying the main direction of all the text lines into a horizontal direction;
the final correction module is used for slicing the corrected text line, counting the positive and negative directions of at least part of slices and performing final correction based on the slice direction with the highest occurrence probability to enable the direction of the text image to accord with the preset direction; wherein the positive and negative directions of the slice are determined by the positive and negative directions of the characters in the slice.
The operation executed by the text line acquisition module specifically comprises: detecting the text image in a fixed anchor mode to obtain all text lines; and/or further comprising: after all text lines are obtained, text lines with aspect ratios smaller than a set threshold are removed, and only the direction of each remaining text line is determined.
Wherein, when determining the main direction of all text lines based on one or more directions with the highest occurrence probability, the main direction determination module takes a direction of an average value of angles of the text lines with the highest number of occurrences with respect to the horizontal direction as the main direction.
Before counting the positive and negative directions of at least part of the slices, the final correction module classifies characters in at least part of the slices, wherein the character classification result at least comprises characters with similar shapes in the positive and negative directions and characters with dissimilar shapes in the positive and negative directions;
when the positive and negative directions of at least part of the slices are counted, the final correction module only counts the positive and negative directions of the slices with dissimilar characters in the positive and negative directions.
When final correction is performed based on the slice direction with the highest occurrence probability, the operation performed by the final correction module specifically includes:
inputting the image of the slice into a convolutional neural network, and calculating the prediction category of each character on the slice; if, among the characters whose shapes are dissimilar in the positive and negative directions, the number of positive-direction characters is larger than the number of negative-direction characters, the slice direction is positive; otherwise, the slice direction is negative;
if the direction of most slices is positive, keeping the current direction unchanged; otherwise, all text is rotated by 180 degrees.
In a third aspect, a computer readable storage medium is provided, having stored thereon a plurality of program codes adapted to be loaded and executed by a processor to perform the method of any of the preceding claims.
In a fourth aspect, there is provided a control apparatus comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, the program codes being adapted to be loaded and run by the processor to perform the method of any of the preceding claims.
One or more technical schemes of the invention have at least one or more of the following beneficial effects: a text image is detected to obtain all text lines; the direction of each text line is determined and the main direction of all text lines is determined based on the one or more directions with the highest probability of occurrence; the main direction of all the text lines is corrected to the horizontal direction; the corrected text lines are sliced, the positive and negative directions of at least some of the slices are counted, and a final correction is performed based on the slice direction with the highest occurrence probability so that the direction of the text image matches a preset direction, where the positive or negative direction of a slice is determined by the positive and negative directions of the characters in the slice. The method attends to local features of the text without targeting a specific format, generalizes well, can correct the direction of the text merely by training a text detection model through machine learning, and ensures the correctness and accuracy of the overall optical character recognition.
Drawings
Embodiments of the invention are described below with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of an embodiment in which an arbitrary printed text is corrected by a conventional method and the desired correction effect is not achieved;
FIG. 2 is a main flow diagram of one embodiment of a text orientation correction method based on staged probability statistics in accordance with the present invention;
FIG. 3 is a schematic diagram of an embodiment of counting text lines in an interval of 5 degrees to obtain a main direction with the largest occurrence frequency according to the scheme of the present invention;
FIG. 4 is a schematic diagram of one embodiment of rectifying a primary direction of a line of text to a horizontal direction in accordance with aspects of the present invention;
FIG. 5 is a schematic diagram of one embodiment of positive and negative directions of a text line slice in accordance with aspects of the present invention;
FIG. 6 is a block diagram of predicting text line direction based on text line slicing direction determination, according to an aspect of the present invention;
FIG. 7 is a schematic diagram of an embodiment of selecting a slice for vote detection and determining a positive and negative direction for implementing final correction of a text according to the solution of the present invention;
FIG. 8 is a block diagram illustrating an embodiment of a system for correcting text direction based on staged probability statistics according to the present invention.
Detailed Description
Some embodiments of the invention are described below with reference to the accompanying drawings. It should be understood by those skilled in the art that these embodiments are only for explaining the technical principle of the present invention, and are not intended to limit the scope of the present invention.
In the description of the present invention, a "module" or "processor" may include hardware, software, or a combination of both. A module may comprise hardware circuitry, various suitable sensors, communication ports, memory, may comprise software components such as program code, or may be a combination of software and hardware. The processor may be a central processing unit, microprocessor, image processor, digital signal processor, or any other suitable processor. The processor has data and/or signal processing functionality. The processor may be implemented in software, hardware, or a combination thereof. Non-transitory computer readable storage media include any suitable medium that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random-access memory, and the like. The term "A and/or B" denotes all possible combinations of A and B, such as A alone, B alone, or A and B. The term "at least one A or B" or "at least one of A and B" has a meaning similar to "A and/or B" and may include only A, only B, or both A and B. The singular forms "a", "an" and "the" may include the plural forms as well.
Some terms to which the invention relates are explained here:
Optical character recognition (OCR): the process in which an electronic device (e.g., a scanner or a digital camera) examines characters printed on paper, determines their shapes by detecting patterns of dark and light, and then translates the shapes into computer text using a character recognition method.
Intersection over union (IOU): the ratio of the intersection to the union of the areas of two rectangular boxes.
Non-maximum suppression (NMS): used to select the anchor with the highest score or the highest probability in a local neighborhood and to suppress anchors with lower scores.
In the prior art, most optical character recognition (OCR) tasks first require the direction of the text to be corrected. Because text backgrounds are complex and the size and aspect ratio of text vary widely, traditional methods tend to be sensitive to the color, brightness, background texture and format of the text, and they generalize poorly. When arbitrary printed text of different specific formats is corrected to the correct reading direction, an ideal correction effect is difficult to achieve; common defects such as an inaccurate text angle and a reversed direction typically result, which in turn interrupts the optical character recognition task.
One embodiment of the text direction correction scheme based on staged probability statistics of the present invention is as follows. Text direction correction and recognition are performed on a shopping receipt from a market. First, when the text detection model is trained, dense rectangular boxes of fixed size are preset on the image of the shopping receipt as fixed anchors, and text labels are annotated on the text lines of the image. The area of fixed anchor A is 5 square centimeters, the area of text label G is 6 square centimeters, the area of their intersection is 5 square centimeters, and the preset threshold is 0.5. The intersection over union of fixed anchor A and text label G is calculated by the formula as IOU = 5/(11 − 5) = 5/6 ≈ 0.83, which is greater than the preset threshold of 0.5, so anchor A is a positive sample. The four corner coordinates of anchor A are (1, 1), (2, 1), (1, 6) and (2, 6), the four corner coordinates of text label G are (1, 1), (2, 1), (1, 7) and (2, 7), the width $A_w$ of anchor A is 1 and its height $A_h$ is 5. Each offset is the coordinate difference divided by the anchor width or height, so the regression offset of text label G relative to anchor A is $t_x$ = (0, 0, 0, 0) in the X direction and $t_y$ = (0, 0, 0.2, 0.2) in the Y direction, since (7 − 6)/5 = 0.2. Finally, the fixed anchors and the offsets are used to train the text detection model.
When a text line is predicted, let the predicted anchor be B and the predicted text box be F. The probability output by the text detection model that an anchor is text is 0.75 and the preset threshold is 0.5, so the probability is greater than the preset threshold. The coordinates of the predicted anchor B are (1, 1), (2, 1), (1, 7) and (2, 7), so its width $B_w$ is 1 and its height $B_h$ is 6. The regression offset of the predicted text box F relative to anchor B provided by the text detection model is $T_x$ = (1, 1, 1, 1) in the X direction and $T_y$ = (1, 1, 1, 1) in the Y direction. Scaling each offset by the anchor width or height and adding the anchor coordinate gives the coordinates of the predicted text box F as (2, 7), (3, 7), (2, 13) and (3, 13). If the predicted text box E has the highest probability output by the text detection model, the intersection over union (IOU) of each of the predicted text boxes C and D with the predicted text box E is calculated and compared with a preset threshold; if it is greater than the threshold, the predicted text boxes C and D are deleted and the predicted text box E is retained as the detected text line.
After all text lines have been detected, text lines with a small aspect ratio are removed: with h the length and w the width of a rectangular text line, text lines with 1/3 < h/w < 3 are removed. A minimum bounding rectangle is fitted to each remaining text line to form a text box, the vector direction of the text box is calculated, and the direction angle of each remaining text line is obtained from that vector direction. The number of text lines in each interval is counted using intervals of 5 degrees, the interval containing the most text lines is found, and the direction given by the average angle of the text lines in that interval is taken as the main direction of the text. The direction angles of the remaining text lines range from 0 to 90 degrees and are divided into 18 intervals of 5 degrees each; the angle range of the n-th interval is [(n − 1) × 5, n × 5], where 1 ≤ n ≤ 18 and n is an integer. For example, the angles of all text lines calculated by the text detection model are 30, 35, 45, 70 and 45 degrees, and at 5-degree granularity they fall into 3 sections, [30, 35], [45, 45] and [75]. The second section contains the most text lines and its average angle is 45 degrees, so the main direction of the text is 45 degrees. Finally, the main direction of the text is corrected to the horizontal direction.
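The last step of this stage, rotating the text so that the main direction becomes horizontal, can be sketched as a plain 2D rotation of point coordinates. This is only an illustrative sketch: the function names `rotate_points` and `correct_to_horizontal`, the rotation about a chosen center, and the counter-clockwise-positive angle convention are assumptions and are not specified in the text.

```python
import math

def rotate_points(points, angle_deg, center=(0.0, 0.0)):
    """Rotate (x, y) points counter-clockwise by angle_deg about center."""
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    cx, cy = center
    rotated = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        rotated.append((cx + dx * cos_t - dy * sin_t,
                        cy + dx * sin_t + dy * cos_t))
    return rotated

def correct_to_horizontal(points, main_direction_deg, center=(0.0, 0.0)):
    """Undo the main direction by rotating by its negative (assumed convention)."""
    return rotate_points(points, -main_direction_deg, center)

# Hypothetical baseline of a text line running along the 45-degree main direction.
baseline = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
leveled = correct_to_horizontal(baseline, 45.0)
```

After the call, the points of `leveled` lie on the X axis up to floating-point error. In practice the same angle would be applied to the whole image rather than to a few points.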
The text lines corrected to the horizontal direction are cut into slices, and character classification and positive/negative direction prediction voting (probability statistics) are performed on every character in the slices of several text lines at the same time. In character classification, characters whose shapes are similar in the positive and negative directions are first grouped into one class and removed; that is, characters that look nearly the same upright and rotated by 180 degrees, such as '0', '一' (one), 'H', '田' (field) and '日' (day), are set aside as one class. For all other characters, 0 degrees is positive and 180 degrees is negative. For example, a total of 20 characters are predicted in a slice, of which 18 are predicted to be positive and 2 negative. Since more characters in the slice are predicted positive than negative, the direction of the slice is positive. If most slices are positive, the text lines are judged to be in the positive direction and the direction of the shopping receipt is kept unchanged. If more characters in a slice are predicted negative than positive, the direction of the slice is negative, and if most slices are negative, the shopping receipt is rotated by 180 degrees, which completes the final correction of the shopping receipt. In practical application, multiple (e.g., 3 or 5) slices may be selected to vote together.
The implementation of the present invention will be described with reference to the main flowchart of FIG. 2, which shows an embodiment of a text direction correction method based on staged probability statistics.
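The two-level vote described above (characters within a slice, then slices within the image) can be sketched as follows. This is a minimal sketch under stated assumptions: the per-character predictions are taken as given (the convolutional neural network is not modeled), the set `AMBIGUOUS_CHARS` is a small illustrative subset based on the examples in the text, and the function names and the handling of an empty vote are hypothetical.

```python
# Characters that look nearly the same after a 180-degree rotation.
# Illustrative subset only; a real system would use a larger set.
AMBIGUOUS_CHARS = {"0", "H", "一", "田", "日"}

def slice_direction(predictions):
    """predictions: list of (char, direction), direction is 'pos' or 'neg'.

    Returns 'pos' or 'neg', or None if the slice has no informative character.
    """
    informative = [d for ch, d in predictions if ch not in AMBIGUOUS_CHARS]
    if not informative:
        return None
    pos = informative.count("pos")
    neg = len(informative) - pos
    # Positive only if positive characters strictly outnumber negative ones.
    return "pos" if pos > neg else "neg"

def final_rotation(slices):
    """Return the rotation in degrees (0 or 180) decided by the slice votes."""
    votes = [d for d in map(slice_direction, slices) if d is not None]
    if not votes:
        return 0  # assumption: leave the image unchanged when nothing can vote
    pos = votes.count("pos")
    return 0 if pos > len(votes) - pos else 180

# A slice with 20 characters, 18 predicted positive and 2 predicted negative.
upright_slice = [("A", "pos")] * 18 + [("A", "neg")] * 2
flipped_slice = [("A", "neg")] * 18 + [("A", "pos")] * 2
rotation = final_rotation([upright_slice, upright_slice, flipped_slice])
```

Voting over an odd number of slices, such as the 3 or 5 mentioned in the text, avoids ties at the slice level.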
Step S101, detecting a text image to obtain all text lines;
In one embodiment, all text lines in a text image may be obtained using a text detection model based on fixed anchors. The model presets dense rectangular boxes of fixed size on the text image as anchors, extracts features of the text image, classifies and regresses the fixed anchors, and obtains all text lines of the text through non-maximum suppression.
When the text detection model is trained, dense rectangular boxes of fixed size are preset on a text image as fixed anchors, and text labels are annotated on the text lines of the text image. The intersection over union of each fixed anchor and the text label is calculated; if it is greater than a preset threshold, the fixed anchor is a positive sample, otherwise it is a negative sample. The offset of the text label relative to the fixed anchor is calculated from the difference between their coordinates, and finally the positive and negative samples of the fixed anchors are input to train the text detection model.
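Presetting dense fixed-size anchors can be pictured as tiling the image with identical rectangles. The sketch below is an assumption-laden illustration: the text does not give the anchor size, spacing, or box representation, so `stride`, the `(x1, y1, x2, y2)` tuple layout, and the function name `make_fixed_anchors` are all hypothetical.

```python
def make_fixed_anchors(image_w, image_h, anchor_w, anchor_h, stride):
    """Tile fixed-size anchors over the image.

    Each anchor is an axis-aligned box (x1, y1, x2, y2) that lies fully
    inside the image; neighbouring anchors are `stride` pixels apart.
    """
    anchors = []
    for y in range(0, image_h - anchor_h + 1, stride):
        for x in range(0, image_w - anchor_w + 1, stride):
            anchors.append((x, y, x + anchor_w, y + anchor_h))
    return anchors

# Hypothetical 8 x 8 image tiled with anchors of width 1 and height 5.
anchors = make_fixed_anchors(image_w=8, image_h=8,
                             anchor_w=1, anchor_h=5, stride=1)
```

A smaller stride gives a denser set of anchors at the cost of more boxes to classify and regress.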
Further, when the text detection model is trained, let the anchor be A and the text label be G. The intersection over union of anchor A and text label G is calculated as:
IOU = area(A ∩ G) / (area(A) + area(G) − area(A ∩ G))
In the formula:
IOU is the intersection over union, area(x) is the area of x, and A ∩ G is the intersection of A and G.
If the IOU is greater than a preset threshold, anchor A is a positive sample; otherwise, anchor A is a negative sample.
For example, the area of anchor A is 5 square centimeters, the area of text label G is 6 square centimeters, the area of the intersection of anchor A and text label G is 5 square centimeters, and the preset threshold is 0.5. The intersection over union of anchor A and text label G is calculated by the formula as IOU = 5/(11 − 5) = 5/6 ≈ 0.83, which is greater than the preset threshold of 0.5, so anchor A is a positive sample.
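The intersection-over-union test above can be sketched for axis-aligned boxes as follows. This is a minimal sketch, not the patent's implementation: the `(x1, y1, x2, y2)` box layout and the function names are assumptions, and the two example boxes are chosen only so that their areas (5 and 6) and their intersection (5) match the worked example.

```python
def box_area(box):
    """Area of an axis-aligned box (x1, y1, x2, y2); empty boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]),
                      min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0

def label_anchor(anchor, text_label, threshold=0.5):
    """Positive sample if the IOU exceeds the threshold, otherwise negative."""
    return "positive" if iou(anchor, text_label) > threshold else "negative"

anchor_a = (1, 1, 2, 6)   # area 5
label_g = (1, 1, 2, 7)    # area 6, intersection with anchor_a is 5
sample = label_anchor(anchor_a, label_g)
```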
When the text detection model is trained, the regression offset of the text label G relative to the fixed anchor A is calculated as:
$t_{x_i} = (X_{G_i} - X_{A_i}) / A_w, \quad t_{y_i} = (Y_{G_i} - Y_{A_i}) / A_h$
In the formula:
$t_{x_i}$ is the offset of the i-th point in the X direction;
$t_{y_i}$ is the offset of the i-th point in the Y direction;
i indexes the four points of the text label and of the fixed anchor, and takes the values 1, 2, 3 and 4;
$X_{G_i}$ is the X coordinate of the i-th point of the text label G;
$X_{A_i}$ is the X coordinate of the i-th point of the fixed anchor A;
$A_w$ is the width of the fixed anchor A;
$Y_{G_i}$ is the Y coordinate of the i-th point of the text label G;
$Y_{A_i}$ is the Y coordinate of the i-th point of the fixed anchor A;
$A_h$ is the height of the fixed anchor A.
For another example, the four corner coordinates of anchor A are (1, 1), (2, 1), (1, 6) and (2, 6), the four corner coordinates of text label G are (1, 1), (2, 1), (1, 7) and (2, 7), the width $A_w$ of anchor A is 1 and its height $A_h$ is 5. The regression offset of text label G relative to anchor A is therefore $t_x$ = (0, 0, 0, 0) in the X direction and $t_y$ = (0, 0, 0.2, 0.2) in the Y direction, since (7 − 6)/5 = 0.2.
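The offset formula can be applied point by point as in the sketch below. The function name `encode_offsets` and the way the anchor width and height are derived from the corner extents are assumptions made for illustration; the corner coordinates are those of anchor A and text label G from the example.

```python
def encode_offsets(label_pts, anchor_pts):
    """Per-point regression offsets of a text label relative to an anchor.

    Both arguments are lists of four (x, y) corners in the same order.
    Assumes a non-degenerate anchor (width and height greater than zero).
    """
    xs = [x for x, _ in anchor_pts]
    ys = [y for _, y in anchor_pts]
    a_w = max(xs) - min(xs)  # anchor width A_w
    a_h = max(ys) - min(ys)  # anchor height A_h
    t_x = [(gx - ax) / a_w for (gx, _), (ax, _) in zip(label_pts, anchor_pts)]
    t_y = [(gy - ay) / a_h for (_, gy), (_, ay) in zip(label_pts, anchor_pts)]
    return t_x, t_y

anchor_a = [(1, 1), (2, 1), (1, 6), (2, 6)]
label_g = [(1, 1), (2, 1), (1, 7), (2, 7)]
t_x, t_y = encode_offsets(label_g, anchor_a)
```

Dividing by the anchor size makes the offsets independent of the absolute scale of the anchor, which is a common design choice for box regression targets.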
When the trained text detection model is applied, it outputs the probability that each anchor is text. If the probability is greater than a preset threshold, the regression offset of the predicted text box relative to the predicted anchor, as output by the text detection model, is applied to the coordinates of the predicted anchor to obtain an initial predicted text box. Finally, all detected text lines are obtained from the initial predicted text boxes through non-maximum suppression.
Further, when the text detection model is applied and the probability that an anchor is text is greater than the preset threshold, the coordinates of the predicted anchor are obtained, and the coordinates of the predicted text box are obtained from the regression offset T of the text box relative to the predicted anchor provided by the text detection model.
When text lines are predicted, the predicted anchor is B and the predicted text box is F; the coordinates of the predicted text box are calculated as:
$X_{F_i} = T_{x_i} \times B_w + X_{B_i}$
$Y_{F_i} = T_{y_i} \times B_h + Y_{B_i}$
In the formula:
$X_{F_i}$ is the X coordinate of the i-th point of the predicted text box;
$T_{x_i}$ is the X-direction offset of the i-th point in the regression offset of the predicted text box relative to the predicted anchor, as provided by the text detection model;
$B_w$ is the width of the predicted anchor;
$X_{B_i}$ is the X coordinate of the i-th point of the predicted anchor;
$Y_{F_i}$ is the Y coordinate of the i-th point of the predicted text box;
$T_{y_i}$ is the Y-direction offset of the i-th point in the regression offset of the predicted text box relative to the predicted anchor, as provided by the text detection model;
$B_h$ is the height of the predicted anchor;
$Y_{B_i}$ is the Y coordinate of the i-th point of the predicted anchor.
For example, the probability output by the text detection model that an anchor is text is 0.75 and the preset threshold is 0.5, so the probability is greater than the preset threshold. The coordinates of the predicted anchor are (1, 1), (2, 1), (1, 7) and (2, 7), so its width $B_w$ is 1 and its height $B_h$ is 6. The regression offset of the predicted text box relative to the anchor provided by the text detection model is $T_x$ = (1, 1, 1, 1) in the X direction and $T_y$ = (1, 1, 1, 1) in the Y direction. Applying the formulas above, the coordinates of the predicted text box are (2, 7), (3, 7), (2, 13) and (3, 13).
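The decoding formulas $X_{F_i} = T_{x_i} \times B_w + X_{B_i}$ and $Y_{F_i} = T_{y_i} \times B_h + Y_{B_i}$ can be sketched together with the probability check as follows. The function name `decode_text_box`, the `None` return for rejected anchors, and deriving the anchor size from its corner extents are assumptions for illustration.

```python
def decode_text_box(anchor_pts, t_x, t_y, prob, threshold=0.5):
    """Return the predicted text box corners, or None if the anchor is rejected.

    anchor_pts: four (x, y) corners of the predicted anchor B.
    t_x, t_y:   per-point regression offsets output by the model.
    prob:       model probability that the anchor is text.
    """
    if prob <= threshold:
        return None
    xs = [x for x, _ in anchor_pts]
    ys = [y for _, y in anchor_pts]
    b_w = max(xs) - min(xs)  # predicted anchor width B_w
    b_h = max(ys) - min(ys)  # predicted anchor height B_h
    return [(tx * b_w + bx, ty * b_h + by)
            for (bx, by), tx, ty in zip(anchor_pts, t_x, t_y)]

anchor_b = [(1, 1), (2, 1), (1, 7), (2, 7)]
box_f = decode_text_box(anchor_b, (1, 1, 1, 1), (1, 1, 1, 1), prob=0.75)
```

Decoding is the inverse of the training-time offset calculation: offsets are multiplied by the anchor size and added back to the anchor coordinates.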
If the predicted text box E has the highest probability output by the text detection model, the intersection over union (IOU) of each of the predicted text boxes C and D with the predicted text box E is calculated and compared with a preset threshold. If it is greater than the threshold, the predicted text boxes C and D are deleted and the predicted text box E is retained as the detected text line.
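The suppression step described above corresponds to the usual greedy form of non-maximum suppression, sketched below. This is a generic sketch rather than the patent's exact procedure: the `(x1, y1, x2, y2)` box layout, the function names, and the example boxes and scores are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it too much.

    Returns the indices of the kept boxes, highest score first.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep

# Hypothetical boxes: C, D and E cover the same text line, the last one does not.
boxes = [(0, 0, 10, 2), (0.5, 0, 10.5, 2), (0.2, 0, 10.2, 2), (0, 10, 10, 12)]
scores = [0.7, 0.6, 0.9, 0.8]
kept = nms(boxes, scores)
```

Here box E (index 2) has the highest score, so C and D are suppressed, while the non-overlapping box at index 3 is kept as a separate text line.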
Step S102, determining the direction of each text line and determining the main direction of all the text lines based on one or more directions with the highest occurrence probability;
in one embodiment, after all text lines are detected, text lines with a small length-width ratio (that is, nearly square text lines) are removed, a minimum bounding rectangle is fitted to each remaining text line to form a text box, the vector direction of the text box is calculated, and the direction angle of each remaining text line is obtained from the vector direction of its text box. The number of text lines in each 5-degree interval is counted, the interval containing the most text lines is found, and the average of the angles of the text lines in that interval is taken as the main direction of the text.
When the main direction of the text is calculated, text lines whose length-width ratio satisfies 1/3 < h/w < 3 are removed, where h is the length and w is the width of the rectangular text line;
the direction angles of the remaining text lines lie between 0 degrees and 90 degrees; this range is divided into 18 intervals of 5 degrees each, the angle range of the nth interval being [(n-1) × 5, n × 5], where 1 ≤ n ≤ 18 and n is an integer.
For example, fig. 3 is a schematic diagram of an embodiment of counting text lines in intervals of 5 degrees to obtain the most frequently occurring main direction according to the scheme of the present invention. After all text lines are detected, text lines whose length-width ratio satisfies 1/3 < h/w < 3 are removed, where h is the length and w is the width of the rectangular text line. A minimum bounding rectangle is fitted to each remaining text line to form a text box, the vector direction of the text box is calculated, and the direction angle of each remaining text line is obtained from it. If the angles of the text lines are 30 degrees, 35 degrees, 45 degrees, 70 degrees and 45 degrees, the angles can be grouped in 5-degree steps into 3 sections, [30, 35], [45, 45] and [70]; the second section is taken as the section with the most text lines, the average of its angles is 45 degrees, and the main direction of the text is therefore 45 degrees.
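The interval statistics can be sketched as follows. This is an illustrative reading only: the function name `main_direction`, the use of half-open bins [5k, 5k+5) so that each angle falls in exactly one interval, the tie-breaking rule (lowest interval wins) and the sample line sizes are assumptions, not details given in the text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def main_direction(lines: List[Tuple[float, float, float]], bin_width: float = 5.0) -> float:
    """lines holds (h, w, angle_in_degrees); returns the mean angle of the fullest bin."""
    n_bins = int(90 / bin_width)
    bins: Dict[int, List[float]] = defaultdict(list)
    for h, w, angle in lines:
        if 1 / 3 < h / w < 3:
            continue  # nearly square line: its direction is unreliable, so drop it
        idx = min(int(angle // bin_width), n_bins - 1)  # 90 degrees goes into the last bin
        bins[idx].append(angle)
    if not bins:
        raise ValueError("no text line left after aspect-ratio filtering")
    # Ties are resolved in favour of the lowest interval (an assumption).
    best = max(sorted(bins), key=lambda k: len(bins[k]))
    return sum(bins[best]) / len(bins[best])


# Angles from the example above, attached to hypothetical elongated lines (h=1, w=10),
# plus one nearly square line that is filtered out.
lines = [(1, 10, 30), (1, 10, 35), (1, 10, 45), (1, 10, 70), (1, 10, 45), (2, 3, 10)]
print(main_direction(lines))  # 45.0
```

Taking the mean inside the winning interval, rather than its midpoint, keeps the estimate from being quantized to 5-degree steps.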
Step S103, correcting the main directions of all the text lines into the horizontal direction;
all text lines are corrected to the horizontal direction according to the determined main direction of all the text lines; fig. 4 is a schematic diagram of an embodiment of correcting the main direction of the text lines to the horizontal direction according to the scheme of the present invention.
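One way to picture this correction is as a rotation of every point by the negative of the main direction angle. The sketch below is only illustrative: it assumes a mathematical coordinate system with the y axis pointing up (in image coordinates with y pointing down the sign of the angle would flip), and the function names and sample baseline are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def rotate_point(p: Point, center: Point, angle_deg: float) -> Point:
    """Rotate p about center by angle_deg, counter-clockwise with the y axis pointing up."""
    t = math.radians(angle_deg)
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (center[0] + dx * math.cos(t) - dy * math.sin(t),
            center[1] + dx * math.sin(t) + dy * math.cos(t))


def correct_to_horizontal(points: List[Point], main_angle_deg: float,
                          center: Point = (0.0, 0.0)) -> List[Point]:
    """Rotate every point by the negative of the main direction angle."""
    return [rotate_point(p, center, -main_angle_deg) for p in points]


# A hypothetical text line baseline running at 45 degrees.
baseline = [(0.0, 0.0), (1.0, 1.0)]
start, end = correct_to_horizontal(baseline, 45.0)
print(abs(end[1] - start[1]) < 1e-9)  # True: the baseline is now horizontal
```

After this step the text is horizontal but may still be upside down, which is what the slice voting in step S104 resolves.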
Step S104, slicing the corrected text line, counting the positive and negative directions of at least part of slices, and finally correcting the text line based on the slice direction with the highest occurrence probability to ensure that the direction of the text image accords with a preset direction; wherein the positive and negative directions of the slice are determined by the positive and negative directions of the characters in the slice.
In one embodiment, fig. 5 is a schematic diagram of an embodiment of the positive and negative directions of a text slice according to the scheme of the present invention. The text line corrected to the horizontal direction is cut into slices, and character classification and positive/negative direction voting are performed simultaneously on each character in the slices of a plurality of text lines. Fig. 6 illustrates determining the direction of a text line from the directions of its slices according to the scheme of the present invention. Before the positive and negative directions of at least part of the slices are counted, the characters in those slices are first classified; the classification result at least distinguishes characters whose shapes are similar in the positive and negative directions from characters whose shapes are dissimilar in the two directions, and only the characters with dissimilar shapes are used to judge the direction of a slice. After the characters with similar shapes in the two directions are removed, if the number of positive-direction characters in a slice is greater than the number of negative-direction characters, the slice is judged to be in the positive direction; if most slices are in the positive direction, the text line containing the slices is judged to be in the positive direction and the direction of the text is kept unchanged; otherwise, the text is rotated by 180 degrees to achieve the final correction. Fig. 7 shows an embodiment of the scheme of the present invention in which slices are selected for voting detection, the positive and negative directions are judged by single-character voting, and the final correction of the text is achieved.
In one embodiment, because the direction is determined by the votes of individual characters, the accuracy required of the recognition model is not high, so the model is made lightweight by removing the RNN layer. The slice image is first input into a convolutional neural network, which outputs a character sequence prediction probability matrix P of shape (m, c), where m is the length of the character sequence and c is the number of character classification categories. During character recognition of the slice, the prediction category of each character in the character sequence is calculated from the matrix P: if the index of the maximum value of the prediction probability vector P[i] of the ith character is j = argmax(P[i]), the prediction category of the ith character is j.
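The per-character argmax over the matrix P can be sketched as below. The matrix values are hypothetical and stand in for the output of the convolutional network, which is not reproduced here; the function name `predict_categories` is likewise an assumption.

```python
from typing import List


def predict_categories(P: List[List[float]]) -> List[int]:
    """For each row P[i] of the (m, c) matrix, return j = argmax(P[i])."""
    return [max(range(len(row)), key=row.__getitem__) for row in P]


# Hypothetical probability matrix with m = 3 characters and c = 3 categories.
P = [[0.1, 0.8, 0.1],
     [0.7, 0.2, 0.1],
     [0.2, 0.3, 0.5]]
print(predict_categories(P))  # [1, 0, 2]
```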
For example, the number c of character classification categories in the present embodiment is 3. It is preset that when the prediction category of the ith character is j = 0, the ith character has a similar shape when viewed in the positive direction and after rotation by 180 degrees, that is, it is a character with similar shapes in the positive and negative directions, such as the characters "0", "一" (one), "H", "田" (field) and "日" (day); when the prediction category of the ith character is j = 1, the ith character can be read and recognized normally when viewed in the positive direction, that is, it is a positive-direction character; and when the prediction category of the ith character is j = 2, the ith character can be read and recognized normally only after being rotated by 180 degrees, that is, it is a negative-direction character.
For another example, the text line corrected to the horizontal direction is cut into slices, and character classification and positive/negative direction vote prediction are performed simultaneously on each character in the slices of a plurality of text lines. Characters with similar shapes in the positive and negative directions, such as "0", "一" (one), "H", "田" (field) and "日" (day), are first grouped into one class and removed, and positive/negative direction votes are counted only for positive-direction characters (0 degrees) and negative-direction characters (180 degrees). If a total of 20 characters are predicted in a slice, of which 18 are predicted to be in the positive direction and 2 in the negative direction, the number of positive-direction characters is greater than the number of negative-direction characters, so the slice is in the positive direction; if most slices are in the positive direction, the text line containing the slices is judged to be in the positive direction and the direction of the text is kept unchanged. If the number of characters predicted to be in the negative direction in a slice is greater than the number in the positive direction, the slice is in the negative direction, and if most slices are in the negative direction, the text needs to be rotated by 180 degrees to achieve the final correction. In practical applications, multiple (e.g., 3 or 5) slices may be selected to vote together.
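The two-level vote (characters within a slice, then slices within the text) can be sketched as follows. The category codes follow the example above (0 similar, 1 positive, 2 negative); the function names and the handling of ties (a tied slice abstains, a tied text is left unrotated) are assumptions, since the text does not specify them.

```python
from typing import List, Optional

SIMILAR, POSITIVE, NEGATIVE = 0, 1, 2


def slice_direction(categories: List[int]) -> Optional[int]:
    """Vote over one slice, ignoring characters that look alike in both directions."""
    pos = categories.count(POSITIVE)
    neg = categories.count(NEGATIVE)
    if pos == neg:
        return None  # tie: this slice abstains (an assumption)
    return POSITIVE if pos > neg else NEGATIVE


def needs_180_rotation(slices: List[List[int]]) -> bool:
    """Rotate only when negative slices outnumber positive ones (ties: no rotation)."""
    votes = [slice_direction(s) for s in slices]
    return votes.count(NEGATIVE) > votes.count(POSITIVE)


# Three hypothetical slices; the first mirrors the 18-versus-2 example above.
slices = [
    [POSITIVE] * 18 + [NEGATIVE] * 2,
    [POSITIVE] * 5 + [SIMILAR] * 3 + [NEGATIVE],
    [NEGATIVE] * 4 + [POSITIVE],
]
print([slice_direction(s) for s in slices])  # [1, 1, 2]
print(needs_180_rotation(slices))            # False
```

Using an odd number of slices, as the text suggests with 3 or 5, reduces the chance of a tie at the slice level.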
Referring to fig. 8, a block diagram of an embodiment of a text direction correction system based on staged probability statistics according to the present invention is shown; the implementation of the present invention is explained. The system at least comprises:
a text line acquisition module 801 for detecting a text image to obtain all text lines;
in one embodiment, all text lines in a text image may be obtained by using a text detection model based on fixed anchors: the text detection model presets dense rectangular boxes of fixed size on the text image as fixed anchors, extracts features of the text image, classifies and regresses the fixed anchors, and obtains all text lines of the text through non-maximum suppression.
When the text detection model is trained, dense rectangular boxes of fixed size are preset on the text image as fixed anchors, and the text lines of the text image are annotated with text labels. The intersection-over-union ratio of each fixed anchor with the text label is calculated; if it is greater than a preset threshold, the fixed anchor is a positive sample, otherwise it is a negative sample. The offset of the text label relative to the fixed anchor is calculated from the difference between the coordinates of the text label and the fixed anchor, and finally the positive and negative samples of the fixed anchors are input to train the text detection model.
Further, when the text detection model is trained, if the fixed anchor is A and the text label is G, the intersection-over-union ratio (IOU) of the anchor A and the text label G is calculated as follows:
$$\mathrm{IOU} = \frac{\mathrm{area}(A \cap G)}{\mathrm{area}(A) + \mathrm{area}(G) - \mathrm{area}(A \cap G)}$$
in the formula:
IOU represents the intersection-over-union ratio, area(x) represents the area of x, and $A \cap G$ represents the intersection of A and G;
if the IOU is greater than a preset threshold, the anchor A is a positive sample, otherwise the anchor A is a negative sample;
for example, the area of the anchor A is 5 square centimeters, the area of the text label G is 6 square centimeters, the area of the intersection of the anchor A and the text label G is 5 square centimeters, and the preset threshold is 0.5. The intersection-over-union ratio of the anchor A and the text label G is obtained from the formula as IOU = 5/(11 - 5) = 5/6 ≈ 0.83, which is greater than the preset threshold of 0.5, so the anchor A is a positive sample.
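The sample-labeling rule can be checked numerically with a short sketch. It assumes axis-aligned boxes given as (x_min, y_min, x_max, y_max); the function names are hypothetical, and the two boxes are chosen so that their areas (5, 6, and an intersection of 5) match the example above.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(box: Box) -> float:
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, g: Box) -> float:
    """IOU = area(A ∩ G) / (area(A) + area(G) - area(A ∩ G))."""
    inter = area((max(a[0], g[0]), max(a[1], g[1]), min(a[2], g[2]), min(a[3], g[3])))
    union = area(a) + area(g) - inter
    return inter / union if union > 0 else 0.0


def is_positive_sample(anchor: Box, label: Box, threshold: float = 0.5) -> bool:
    """An anchor is a positive sample when its IOU with the label exceeds the threshold."""
    return iou(anchor, label) > threshold


anchor_a = (1.0, 1.0, 2.0, 6.0)  # area 5
label_g = (1.0, 1.0, 2.0, 7.0)   # area 6, intersection with A is 5
print(round(iou(anchor_a, label_g), 2))       # 0.83
print(is_positive_sample(anchor_a, label_g))  # True
```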
When the text detection model is trained, the regression offset of the text label G relative to the fixed anchor A is calculated as follows:
$$t_{x_i} = \frac{X_{G_i} - X_{A_i}}{A_w}, \qquad t_{y_i} = \frac{Y_{G_i} - Y_{A_i}}{A_h}$$
in the formula:
$t_{x_i}$ is the X-direction offset of the ith point;
$t_{y_i}$ is the Y-direction offset of the ith point;
i indexes the four points of the text label and of the fixed anchor, and takes the values 1, 2, 3 and 4;
$X_{G_i}$ is the X coordinate of the ith point of the text label G;
$X_{A_i}$ is the X coordinate of the ith point of the fixed anchor A;
$A_w$ is the width of the fixed anchor A;
$Y_{G_i}$ is the Y coordinate of the ith point of the text label G;
$Y_{A_i}$ is the Y coordinate of the ith point of the fixed anchor A;
$A_h$ is the height of the fixed anchor A.
For another example, the four point coordinates of the anchor A are [(1, 1), (2, 1), (1, 6), (2, 6)] and the four point coordinates of the text label G are [(1, 1), (2, 1), (1, 7), (2, 7)]; the width $A_w$ of the anchor A is 1 and its height $A_h$ is 5. The regression offset of the text label G relative to the anchor A is therefore $t_x$ = (0, 0, 0, 0) in the X direction and $t_y$ = (0, 0, 0.2, 0.2) in the Y direction, since (7 - 6)/5 = 0.2.
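The training-time offset encoding can be sketched in the same spirit. The function name `encode_offsets`, the derivation of the anchor width and height from the corner extents (which assumes an axis-aligned anchor) and the sample coordinates are assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def encode_offsets(anchor: List[Point], label: List[Point]) -> Tuple[List[float], List[float]]:
    """Compute t_xi = (X_Gi - X_Ai) / A_w and t_yi = (Y_Gi - Y_Ai) / A_h per corner."""
    xs = [p[0] for p in anchor]
    ys = [p[1] for p in anchor]
    a_w = max(xs) - min(xs)  # anchor width A_w (assumes an axis-aligned anchor)
    a_h = max(ys) - min(ys)  # anchor height A_h
    tx = [(g[0] - a[0]) / a_w for a, g in zip(anchor, label)]
    ty = [(g[1] - a[1]) / a_h for a, g in zip(anchor, label)]
    return tx, ty


# Hypothetical anchor (width 2, height 4) and text label.
anchor_a = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0), (2.0, 4.0)]
label_g = [(1.0, 0.0), (3.0, 0.0), (1.0, 5.0), (3.0, 5.0)]
tx, ty = encode_offsets(anchor_a, label_g)
print(tx)  # [0.5, 0.5, 0.5, 0.5]
print(ty)  # [0.0, 0.0, 0.25, 0.25]
```

Normalizing by the anchor width and height makes the regression targets independent of the anchor's absolute size.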
When the trained text detection model is applied, it outputs the probability that each anchor is text. If the probability is greater than a preset threshold, the offset of the predicted text box relative to the predicted anchor, as output by the text detection model, is applied to the coordinates of the predicted anchor to obtain an initial predicted text box; finally, all detected text lines are obtained from the initial predicted text boxes through non-maximum suppression.
Further, when the text detection model is applied, it outputs the probability that each anchor is text; if the probability is greater than a preset threshold, the coordinates of the predicted anchor are obtained, and the coordinates of the predicted text box are obtained from the regression offset T of the text box relative to the predicted anchor provided by the text detection model;
when predicting text lines, the predicted anchor is B, the predicted text box is F, and the coordinates of the predicted text box are calculated as follows:
$$X_{F_i} = T_{x_i} \times B_w + X_{B_i}$$
$$Y_{F_i} = T_{y_i} \times B_h + Y_{B_i}$$
in the formula:
$X_{F_i}$ is the X coordinate of the ith point of the predicted text box;
$T_{x_i}$ is the X-direction offset of the ith point in the regression offset of the predicted text box relative to the predicted anchor, as provided by the text detection model;
$B_w$ is the width of the predicted anchor;
$X_{B_i}$ is the X coordinate of the ith point of the predicted anchor;
$Y_{F_i}$ is the Y coordinate of the ith point of the predicted text box;
$T_{y_i}$ is the Y-direction offset of the ith point in the regression offset of the predicted text box relative to the predicted anchor, as provided by the text detection model;
$B_h$ is the height of the predicted anchor;
$Y_{B_i}$ is the Y coordinate of the ith point of the predicted anchor.
For example, if the text detection model outputs a probability of 0.75 that an anchor is text and the preset threshold is 0.5, the probability is greater than the preset threshold. The coordinates of the predicted anchor are obtained as [(1, 1), (2, 1), (1, 7), (2, 7)], so that the anchor width $B_w$ is 1 and the anchor height $B_h$ is 6. If the regression offset of the predicted text box relative to the anchor provided by the text detection model is $T_x$ = (1, 1, 1, 1) in the X direction and $T_y$ = (1, 1, 1, 1) in the Y direction, the coordinates of the predicted text box obtained from the above formulas are [(2, 7), (3, 7), (2, 13), (3, 13)].
For predicted text boxes C, D and E, if the probability output by the text detection model is highest for the predicted text box E, the intersection-over-union ratio (IOU) of each of the predicted text boxes C and D with the predicted text box E is calculated and compared with a preset threshold; if it is greater than the preset threshold, the predicted text boxes C and D are deleted and the predicted text box E is retained as the detected text line.
A main direction determination module 802 for determining a direction of each text line and determining a main direction of all text lines based on the one or more directions with the highest probability of occurrence;
in one embodiment, after all text lines are detected, text lines with a small length-width ratio (that is, nearly square text lines) are removed, a minimum bounding rectangle is fitted to each remaining text line to form a text box, the vector direction of the text box is calculated, and the direction angle of each remaining text line is obtained from the vector direction of its text box. The number of text lines in each 5-degree interval is counted, the interval containing the most text lines is found, and the average of the angles of the text lines in that interval is taken as the main direction of the text.
When the main direction of the text is calculated, text lines whose length-width ratio satisfies 1/3 < h/w < 3 are removed, where h is the length and w is the width of the rectangular text line;
the direction angles of the remaining text lines lie between 0 degrees and 90 degrees; this range is divided into 18 intervals of 5 degrees each, the angle range of the nth interval being [(n-1) × 5, n × 5], where 1 ≤ n ≤ 18 and n is an integer.
For example, fig. 3 is a schematic diagram of an embodiment of counting text lines in intervals of 5 degrees to obtain the most frequently occurring main direction according to the scheme of the present invention. After all text lines are detected, text lines whose length-width ratio satisfies 1/3 < h/w < 3 are removed, where h is the length and w is the width of the rectangular text line. A minimum bounding rectangle is fitted to each remaining text line to form a text box, the vector direction of the text box is calculated, and the direction angle of each remaining text line is obtained from it. If the angles of the text lines are 30 degrees, 35 degrees, 45 degrees, 70 degrees and 45 degrees, the angles can be grouped in 5-degree steps into 3 sections, [30, 35], [45, 45] and [70]; the second section is taken as the section with the most text lines, the average of its angles is 45 degrees, and the main direction of the text is therefore 45 degrees.
A horizontal direction rectifying module 803 for rectifying the main direction of all the text lines into a horizontal direction;
all text lines are corrected to the horizontal direction according to the determined main direction of all the text lines; fig. 4 is a schematic diagram of an embodiment of correcting the main direction of the text lines to the horizontal direction according to the scheme of the present invention.
A final correction module 804, configured to slice the corrected text line, count positive and negative directions of at least part of the slices, and perform final correction based on a slice direction with the highest occurrence probability, so that the direction of the text image conforms to a preset direction; wherein the positive and negative directions of the slice are determined by the positive and negative directions of the characters in the slice.
In one embodiment, fig. 5 is a schematic diagram of an embodiment of the positive and negative directions of a text slice according to the scheme of the present invention. The text line corrected to the horizontal direction is cut into slices, and character classification and positive/negative direction voting are performed simultaneously on each character in the slices of a plurality of text lines. Fig. 6 illustrates determining the direction of a text line from the directions of its slices according to the scheme of the present invention. Before the positive and negative directions of at least part of the slices are counted, the characters in those slices are first classified; the classification result at least distinguishes characters whose shapes are similar in the positive and negative directions from characters whose shapes are dissimilar in the two directions, and only the characters with dissimilar shapes are used to judge the direction of a slice. After the characters with similar shapes in the two directions are removed, if the number of positive-direction characters in a slice is greater than the number of negative-direction characters, the slice is judged to be in the positive direction; if most slices are in the positive direction, the text line containing the slices is judged to be in the positive direction and the direction of the text is kept unchanged; otherwise, the text is rotated by 180 degrees to achieve the final correction. Fig. 7 shows an embodiment of the scheme of the present invention in which slices are selected for voting detection, the positive and negative directions are judged by single-character voting, and the final correction of the text is achieved.
In one embodiment, because the direction is determined by the votes of individual characters, the accuracy required of the recognition model is not high, so the model is made lightweight by removing the RNN layer. The slice image is first input into a convolutional neural network, which outputs a character sequence prediction probability matrix P of shape (m, c), where m is the length of the character sequence and c is the number of character classification categories. During character recognition of the slice, the prediction category of each character in the character sequence is calculated from the matrix P: if the index of the maximum value of the prediction probability vector P[i] of the ith character is j = argmax(P[i]), the prediction category of the ith character is j.
For example, the number c of character classification categories in the present embodiment is 3. It is preset that when the prediction category of the ith character is j = 0, the ith character has a similar shape when viewed in the positive direction and after rotation by 180 degrees, that is, it is a character with similar shapes in the positive and negative directions, such as the characters "0", "一" (one), "H", "田" (field) and "日" (day); when the prediction category of the ith character is j = 1, the ith character can be read and recognized normally when viewed in the positive direction, that is, it is a positive-direction character; and when the prediction category of the ith character is j = 2, the ith character can be read and recognized normally only after being rotated by 180 degrees, that is, it is a negative-direction character.
For another example, the text line corrected to the horizontal direction is cut into slices, and character classification and positive/negative direction vote prediction are performed simultaneously on each character in the slices of a plurality of text lines. Characters with similar shapes in the positive and negative directions, such as "0", "一" (one), "H", "田" (field) and "日" (day), are first grouped into one class and removed, and positive/negative direction votes are counted only for positive-direction characters (0 degrees) and negative-direction characters (180 degrees). If a total of 20 characters are predicted in a slice, of which 18 are predicted to be in the positive direction and 2 in the negative direction, the number of positive-direction characters is greater than the number of negative-direction characters, so the slice is in the positive direction; if most slices are in the positive direction, the text line containing the slices is judged to be in the positive direction and the direction of the text is kept unchanged. If the number of characters predicted to be in the negative direction in a slice is greater than the number in the positive direction, the slice is in the negative direction, and if most slices are in the negative direction, the text needs to be rotated by 180 degrees to achieve the final correction. In practical applications, multiple (e.g., 3 or 5) slices may be selected to vote together.
An example of an application scenario of the technical solution of the present invention is described below to further illustrate the implementation of the present invention: text direction correction and recognition are carried out on shopping receipts of a shopping mall. First, when the text detection model is trained, dense rectangular boxes of fixed size are preset on the shopping receipt image as fixed anchors, and the text lines of the shopping receipt image are annotated with text labels. The area of the fixed anchor A is 5 square centimeters, the area of the text label G is 6 square centimeters, the area of the intersection of the fixed anchor A and the text label G is 5 square centimeters, and the preset threshold is 0.5; the intersection-over-union ratio of the fixed anchor A and the text label G is obtained from the formula as IOU = 5/(11 - 5) = 5/6 ≈ 0.83, which is greater than the preset threshold of 0.5, so the anchor A is a positive sample. The four point coordinates of the anchor A are [(1, 1), (2, 1), (1, 6), (2, 6)] and the four point coordinates of the text label G are [(1, 1), (2, 1), (1, 7), (2, 7)]; the width $A_w$ of the anchor A is 1 and its height $A_h$ is 5, so the regression offset of the text label G relative to the anchor A is calculated as $t_x$ = (0, 0, 0, 0) in the X direction and $t_y$ = (0, 0, 0.2, 0.2) in the Y direction, since (7 - 6)/5 = 0.2. Finally, the fixed anchors and the offsets are used to train the text detection model.
When text lines are predicted, the predicted fixed anchor is B and the predicted text box is F. The text detection model outputs a probability of 0.75 that the fixed anchor is text, and the preset threshold is 0.5, so the probability is greater than the preset threshold. The coordinates of the predicted anchor B are obtained as [(1, 1), (2, 1), (1, 7), (2, 7)], so that the anchor width $B_w$ is 1 and the anchor height $B_h$ is 6. If the regression offset of the predicted text box F relative to the anchor B provided by the text detection model is $T_x$ = (1, 1, 1, 1) in the X direction and $T_y$ = (1, 1, 1, 1) in the Y direction, the coordinates of the predicted text box F obtained from the prediction formulas are [(2, 7), (3, 7), (2, 13), (3, 13)]. For predicted text boxes C, D and E, if the probability output by the text detection model is highest for the predicted text box E, the intersection-over-union ratio (IOU) of each of the predicted text boxes C and D with the predicted text box E is calculated and compared with a preset threshold; if it is greater than the preset threshold, the predicted text boxes C and D are deleted and the predicted text box E is retained as the detected text line.
After all text lines are detected, text lines with a small length-width ratio are removed, namely text lines whose length-width ratio satisfies 1/3 < h/w < 3, where h is the length and w is the width of the rectangular text line. A minimum bounding rectangle is fitted to each remaining text line to form a text box, the vector direction of the text box is calculated, and the direction angle of each remaining text line is obtained from the vector direction of its text box. The number of text lines in each 5-degree interval is counted, the interval containing the most text lines is found, and the average of the angles of the text lines in that interval is taken as the main direction of the text. The direction angles of the remaining text lines lie between 0 degrees and 90 degrees; this range is divided into 18 intervals of 5 degrees each, the angle range of the nth interval being [(n-1) × 5, n × 5], where 1 ≤ n ≤ 18 and n is an integer. The angles of the text lines obtained are 30 degrees, 35 degrees, 45 degrees, 70 degrees and 45 degrees, which can be grouped in 5-degree steps into 3 sections, [30, 35], [45, 45] and [70]; the second section is taken as the section with the most text lines, the average of its angles is 45 degrees, the main direction of the text is therefore 45 degrees, and finally the main direction of the text is corrected to the horizontal direction.
The text line corrected to the horizontal direction is cut into slices, and character classification and positive/negative direction vote prediction are performed simultaneously on each character in the slices of a plurality of text lines. Characters with similar shapes in the positive and negative directions, such as "0", "一" (one), "H", "田" (field) and "日" (day), are first grouped into one class and removed, and positive/negative direction votes are counted only for positive-direction characters (0 degrees) and negative-direction characters (180 degrees). If a total of 20 characters are predicted in a slice, of which 18 are predicted to be in the positive direction and 2 in the negative direction, the number of positive-direction characters is greater than the number of negative-direction characters, so the slice is in the positive direction; if most slices are in the positive direction, the text line containing the slices is judged to be in the positive direction and the direction of the text is kept unchanged. If the number of characters predicted to be in the negative direction in a slice is greater than the number in the positive direction, the slice is in the negative direction, and if most slices are in the negative direction, the text needs to be rotated by 180 degrees to achieve the final correction. In practical applications, multiple (e.g., 3 or 5) slices may be selected to vote together.
It will be understood by those skilled in the art that all or part of the flow of the method according to the above-described embodiment may be implemented by a computer program, which may be stored in a computer-readable storage medium and which implements the steps of the above method embodiments when executed by a processor. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying said computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, computer memory, read-only memory, random access memory, electrical carrier signals, telecommunication signals, software distribution media, and the like. It should be noted that the content of the computer-readable medium may be appropriately increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
Further, in one embodiment of a computer-readable storage medium of the present invention, the storage medium stores a plurality of program codes adapted to be loaded and executed by a processor to perform the method of any of the foregoing.
Further, in one embodiment of a control device of the present invention, the control device comprises a processor and a memory, the memory being adapted to store a plurality of program codes, and the program codes being adapted to be loaded and run by the processor to perform the method of any of the foregoing.
Further, it should be understood that, since the modules are only configured to illustrate the functional units of the system of the present invention, the corresponding physical devices of the modules may be the processor itself, or a part of software, a part of hardware, or a part of a combination of software and hardware in the processor. Thus, the number of individual modules in the figures is merely illustrative.
Those skilled in the art will appreciate that the various modules in the system may be adaptively split or combined. Such splitting or combining of specific modules does not cause the technical solutions to deviate from the principle of the present invention, and therefore, the technical solutions after splitting or combining will fall within the protection scope of the present invention.
So far, the technical solution of the present invention has been described with reference to one embodiment shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (12)

1. A text direction rectification method based on staged probability statistics is characterized by comprising the following steps:
detecting a text image to obtain all text lines;
determining a direction of each text line and determining a principal direction of all text lines based on the one or more directions having the highest probability of occurrence;
correcting the main direction of all the text lines into a horizontal direction;
slicing the corrected text line, counting the positive and negative directions of at least part of slices, and performing final correction based on the slice direction with the highest occurrence probability to enable the direction of the text image to accord with a preset direction;
wherein the positive and negative directions of the slice are determined by the positive and negative directions of the characters in the slice.
2. The method according to claim 1, wherein the step of "detecting the text image to obtain all text lines" specifically comprises: detecting the text image in a fixed anchor mode to obtain all text lines; and/or
the method further comprises: after all text lines are obtained, removing text lines whose aspect ratios are smaller than a set threshold, and determining the direction of only each remaining text line.
3. The method according to claim 1, wherein the step of determining the main direction of all text lines based on the one or more directions with the highest probability of occurrence specifically comprises: taking the direction of the average value of the angles of the text lines with the largest occurrence number relative to the horizontal direction as the main direction.
4. The method of claim 1, further comprising:
before counting the positive and negative directions of at least part of slices, carrying out character classification on characters in at least part of slices, wherein the character classification result at least comprises characters with similar shapes in the positive and negative directions and characters with dissimilar shapes in the positive and negative directions;
the step of counting the positive and negative directions of at least part of the slices specifically comprises the following steps:
counting positive and negative directions only for slices containing characters whose shapes are dissimilar in the positive and negative directions.
5. The method according to claim 1, wherein the step of performing final rectification based on the slice direction with the highest probability of occurrence specifically comprises:
inputting the image of the slice into a convolutional neural network, and calculating the prediction category of each character on the slice; if, among the characters whose shapes are dissimilar in the positive and negative directions, the number of positive-direction characters is larger than the number of negative-direction characters, the slice direction is positive; otherwise, the slice direction is negative;
if the direction of most slices is positive, keeping the current direction unchanged; otherwise, all text is rotated by 180 degrees.
6. A system for correcting text orientation based on staged probability statistics, comprising:
the text line acquisition module is used for detecting a text image to obtain all text lines;
a main direction determination module for determining the direction of each text line and determining the main direction of all text lines based on the one or more directions with the highest probability of occurrence;
a horizontal direction rectifying module for rectifying the main direction of all the text lines into a horizontal direction;
the final correction module is used for slicing the corrected text line, counting the positive and negative directions of at least part of slices and performing final correction based on the slice direction with the highest occurrence probability to enable the direction of the text image to accord with the preset direction; wherein the positive and negative directions of the slice are determined by the positive and negative directions of the characters in the slice.
7. The system of claim 6, wherein the text line acquisition module performs operations specifically comprising: detecting the text image in a fixed anchor mode to obtain all text lines; and/or
the operations further comprise: after all text lines are obtained, removing text lines whose aspect ratios are smaller than a set threshold, so that the direction is determined only for each remaining text line.
8. The system according to claim 6, wherein the main direction determination module takes, as the main direction, a direction of an average value of angles of the text line that appears most frequently with respect to the horizontal direction when determining the main direction of all the text lines based on one or more directions in which the probability of occurrence is highest.
9. The system of claim 6,
before counting the positive and negative directions of at least part of the slices, the final correction module carries out character classification on characters in at least part of the slices, wherein the character classification result at least comprises characters with similar shapes in the positive and negative directions and characters with dissimilar shapes in the positive and negative directions;
when the positive and negative directions of at least part of the slices are counted, the final correction module only counts the positive and negative directions of the slices with dissimilar characters in the positive and negative directions.
10. The system of claim 6, wherein, in performing the final correction based on the slice direction with the highest probability of occurrence, the final correction module performs operations comprising:
inputting the image of the slice into a convolutional neural network, and calculating the prediction category of each character on the slice; if, among the characters whose shapes are dissimilar in the positive and negative directions, the number of positive-direction characters is larger than the number of negative-direction characters, the slice direction is positive; otherwise, the slice direction is negative;
if the direction of most slices is positive, keeping the current direction unchanged; otherwise, all text is rotated by 180 degrees.
11. A computer-readable storage medium, characterized in that a plurality of program codes are stored in the storage medium, which program codes are adapted to be loaded and executed by a processor to perform the method according to any of claims 1 to 5.
12. A control apparatus comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, wherein the program codes are adapted to be loaded and run by the processor to perform the method of any of claims 1 to 5.
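
As an illustration of the first stage recited in claims 2 and 3 (removing text lines with a small aspect ratio, then averaging the angles of the most frequent direction), the following is a rough sketch only. The claims do not state how angles are grouped into directions, so the fixed-width binning, the `bin_width` and `min_aspect_ratio` values, the dictionary layout of a text line, and both function names are assumptions made for this example; angle wraparound is ignored.

```python
from collections import defaultdict


def filter_lines(lines, min_aspect_ratio=2.0):
    """Drop text lines whose width/height ratio is below the threshold.

    Each line is assumed to be a dict with positive "width" and "height";
    the threshold value is a placeholder, not a value from the patent.
    """
    return [ln for ln in lines if ln["width"] / ln["height"] >= min_aspect_ratio]


def main_direction(angles_deg, bin_width=10.0):
    """Average the angles in the most populated angle bin.

    Grouping angles into fixed-width bins is an assumption of this sketch.
    """
    if not angles_deg:
        raise ValueError("at least one text line angle is required")
    bins = defaultdict(list)
    for angle in angles_deg:
        bins[int(angle // bin_width)].append(angle)
    largest = max(bins.values(), key=len)  # the direction that occurs most often
    return sum(largest) / len(largest)


lines = [{"width": 100, "height": 20}, {"width": 10, "height": 20}]
print(len(filter_lines(lines)))
print(main_direction([31.0, 33.0, 35.0, 88.0, -2.0]))
```

Rotating the image by the negative of the returned angle would then bring the main direction to the horizontal, after which the slice vote resolves the remaining 0 versus 180 degree ambiguity.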
CN202110128262.6A 2021-01-29 2021-01-29 Text direction correction method, system and device based on staged probability statistics Active CN112766266B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110128262.6A CN112766266B (en) 2021-01-29 2021-01-29 Text direction correction method, system and device based on staged probability statistics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110128262.6A CN112766266B (en) 2021-01-29 2021-01-29 Text direction correction method, system and device based on staged probability statistics

Publications (2)

Publication Number Publication Date
CN112766266A (en) 2021-05-07
CN112766266B (en) 2021-12-10

Family

ID=75703754

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110128262.6A Active CN112766266B (en) 2021-01-29 2021-01-29 Text direction correction method, system and device based on staged probability statistics

Country Status (1)

Country Link
CN (1) CN112766266B (en)

Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060036608A1 (en) * 2004-08-11 2006-02-16 Adknowledge, Inc. Method and system for generating and distributing electronic communications
US20080317341A1 (en) * 2007-06-21 2008-12-25 Speigle Jon M Methods and Systems for Identifying Text Orientation in a Digital Image
US20100014782A1 (en) * 2008-07-15 2010-01-21 Nuance Communications, Inc. Automatic Correction of Digital Image Distortion
WO2016069005A1 (en) * 2014-10-31 2016-05-06 Hewlett-Packard Development Company, L.P. Text line detection
CN106845475A (en) * 2016-12-15 2017-06-13 西安电子科技大学 Natural scene character detecting method based on connected domain
CN108427950A (en) * 2018-02-01 2018-08-21 北京捷通华声科技股份有限公司 A kind of literal line detection method and device
CN108596066A (en) * 2018-04-13 2018-09-28 武汉大学 A kind of character identifying method based on convolutional neural networks
CN108681729A (en) * 2018-05-08 2018-10-19 腾讯科技(深圳)有限公司 Text image antidote, device, storage medium and equipment
CN109299717A (en) * 2018-09-13 2019-02-01 网易(杭州)网络有限公司 Text region model foundation and character recognition method, device, medium and equipment
CN109492630A (en) * 2018-10-26 2019-03-19 信雅达系统工程股份有限公司 A method of the word area detection positioning in the financial industry image based on deep learning
CN109919147A (en) * 2019-03-04 2019-06-21 上海宝尊电子商务有限公司 The method of text identification in drop for clothing image
CN110363252A (en) * 2019-07-24 2019-10-22 山东大学 It is intended to scene text detection end to end and recognition methods and system
CN110705515A (en) * 2019-10-18 2020-01-17 山东健康医疗大数据有限公司 Hospital paper archive filing method and system based on OCR character recognition
CN110866495A (en) * 2019-11-14 2020-03-06 杭州睿琪软件有限公司 Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium
CN111260569A (en) * 2020-01-10 2020-06-09 百度在线网络技术(北京)有限公司 Method and device for correcting image inclination, electronic equipment and storage medium
CN111428723A (en) * 2020-04-02 2020-07-17 苏州杰锐思智能科技股份有限公司 Character recognition method and device, electronic equipment and storage medium
CN111444918A (en) * 2020-04-01 2020-07-24 中移雄安信息通信科技有限公司 Image inclined text line detection model training and image inclined text line detection method
CN111539309A (en) * 2020-04-21 2020-08-14 广州云从鼎望科技有限公司 Data processing method, system, platform, equipment and medium based on OCR
CN111553347A (en) * 2020-04-26 2020-08-18 佛山市南海区广工大数控装备协同创新研究院 Scene text detection method oriented to any angle
CN111985469A (en) * 2019-05-22 2020-11-24 珠海金山办公软件有限公司 Method and device for recognizing characters in image and electronic equipment
CN112016341A (en) * 2019-05-28 2020-12-01 珠海金山办公软件有限公司 Text picture correction method and device
CN112101317A (en) * 2020-11-17 2020-12-18 深圳壹账通智能科技有限公司 Page direction identification method, device, equipment and computer readable storage medium

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
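M. SHANG et al.: "Character Region Awareness Network For Scene Text Recognition", 《2020 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO》 *
RUOCHEN WANG et al.: "Offset Neural Network for Document Orientation Identification", 《2018 13TH IAPR INTERNATIONAL WORKSHOP ON DOCUMENT ANALYSIS SYSTEMS》 *
Y. JIANG et al.: "R2CNN: Rotational Region CNN for Arbitrarily-Oriented Scene Text Detection", 《2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION》 *
冯子勇: "Research and Application of Image Feature Learning and Classification Methods Based on Deep Learning", 《China Doctoral and Master's Dissertations Full-text Database (Doctoral), Information Science and Technology》 *
刘荟悦: "Printed Character Recognition Based on Deep Neural Networks", 《China Master's Theses Full-text Database, Information Science and Technology》 *
王阳 et al.: "Research on the Application of Deep-Learning-Based OCR Text Recognition in the Banking Industry", 《Application Research of Computers》 *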

Also Published As

Publication number Publication date
CN112766266B (en) 2021-12-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant