CN112784836A - Text and graphic offset angle prediction and correction method thereof - Google Patents
- Publication number
- CN112784836A (application number CN202110090669.4A)
- Authority
- CN
- China
- Prior art keywords
- text
- image
- model
- standard
- coordinates
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/243—Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses a text and graphic offset angle prediction and correction method comprising four steps: (1) acquire a text image to be recognized; (2) input the text image into a deep-learning text direction classification model, and obtain the coordinates of a text detection box and the text offset angle through the model's regression prediction, i.e. detect the text position in a bill; (3) compare the coordinate direction of the text detection box with that of the standard text box and judge whether the detection box is in the standard direction; (4) if the direction of the text image is not the standard direction, convert it to the standard direction. The method can classify text images at any angle in batches and return a predicted angle deviation, which is compared with the standard angle so that text images at non-standard angles are automatically converted to the standard angle. The model is only 3 MB, can be deployed on a PC or a mobile terminal, and achieves high accuracy.
Description
Technical Field
The invention relates to the technical field of text image correction, in particular to a text image offset angle prediction method and a text image offset angle correction method.
Background
With the development of the mobile internet, big data and artificial intelligence, the internet has entered a new era that is rapidly changing human society and has had a great impact on the traditional banking industry, especially the traditional bank service mode centered on branch tellers and account managers. Artificial intelligence is applied ever more widely; AI technologies such as text direction classification are beginning to be used in business fields such as banking and enterprise services, where bank bills must be accurately classified according to text direction.
Firstly, however, when processing bills in batches, most existing banks still adjust the angle of each bill manually and cannot automatically correct offset uploaded images in batches;
some text direction classifiers can only distinguish 0 and 180 degrees and cannot classify text images at arbitrary angles;
after text angle classification, the image cannot be automatically corrected accordingly;
secondly, some current algorithm models are heavyweight, cannot be deployed on mobile terminals, and their calling speed is too slow to meet deployment requirements.
Disclosure of Invention
The invention aims to: in order to solve the technical problems mentioned in the background art, a text graphic offset angle prediction and a correction method thereof are provided.
In order to achieve the purpose, the invention adopts the following technical scheme:
a text graphic offset angle prediction and correction method thereof includes the following steps:
(1) acquiring a text image to be recognized;
(2) inputting a text image to be recognized into a text direction classification model for deep learning, and obtaining coordinates and a text offset angle of a text detection box through regression prediction of the text direction classification model, namely detecting a text position in a bill;
(3) comparing the coordinate direction of the text detection box with the coordinate direction of the standard text box according to the coordinate of the text detection box, and judging whether the text detection box is in the standard direction;
(4) if the direction of the text image to be recognized is not the standard direction, converting the direction into the standard direction;
a MobileNet V3 model is used in the text direction classification model, and the MobileNet V3 model comprises a depth separable convolutional neural network for extracting the characteristics of candidate text regions and a non-maximum suppression algorithm for dividing the texts in the candidate regions;
the depth-separable convolutional neural network comprises a convolution layer of 3 x 3 and an average pooling layer, the convolution layer is used for extracting text features, the average pooling layer is composed of a plurality of convolution layers, the average pooling layer is connected with an output layer through two convolution layers of 1 x1, and an h-swish function is adopted as an activation function of the depth-separable convolutional neural network; the depth-separable convolutional neural network optimizes the model by adjusting model basic parameters in a training process; the depth separable convolutional neural network adopts a cross entropy loss function during training; an image enhancement processing means is adopted in the generation process of the training set of the MobileNet V3 model; the step (4) converts the text region in the image into a standard orientation by rotating the text region by a corresponding angle in an instant direction using a coordinate rotation formula and an OPENCV technique, respectively.
As a further description of the above technical solution:
the flow of the non-maximum suppression algorithm includes the steps of:
(1) dividing the coordinates of the text detection boxes obtained by the text direction classification model's regression according to category;
(2) sorting the bounding boxes (B_BOX) within each object class in descending order of classification confidence;
(3) within a class, selecting the bounding box B_BOX1 with the highest confidence, removing B_BOX1 from the input list and adding it to the output list;
(4) calculating the intersection-over-union IoU of B_BOX1 with each remaining box B_BOX2 one by one, and if IoU(B_BOX1, B_BOX2) > threshold TH, removing B_BOX2 from the input list;
(5) repeating steps 3-4 until the input list is empty, completing the traversal of one object class;
(6) repeating steps 2-5 until the non-maximum suppression processing of all object classes is completed;
(7) outputting the list; the algorithm is finished.
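The steps above are standard non-maximum suppression and can be sketched for a single object class as follows (an illustrative snippet with axis-aligned `(x1, y1, x2, y2)` boxes; the box format and threshold value are assumptions, not specified by the patent):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, th=0.5):
    """Steps (2)-(5) for one class: sort by confidence, repeatedly keep the
    best remaining box and drop every box whose IoU with it exceeds TH.
    Returns the indices of the kept boxes. Run once per object class (step 6)."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:                     # step (5): until the input list is empty
        best = order.pop(0)          # step (3): highest-confidence box
        keep.append(best)            # ...moved to the output list
        order = [i for i in order    # step (4): suppress overlapping boxes
                 if iou(boxes[best], boxes[i]) <= th]
    return keep
```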
As a further description of the above technical solution:
ReLU is an activation function;
the formula for ReLU is:
ReLU(x) = max(0, x)
where x is the input feature value.
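The two activation functions used by the model — ReLU and the h-swish function h-swish(x) = x · ReLU6(x + 3) / 6 from MobileNetV3 — can be written out directly (a minimal scalar sketch; the patent does not show code):

```python
def relu(x):
    """ReLU activation: max(0, x)."""
    return max(0.0, x)

def relu6(x):
    """ReLU capped at 6, the piecewise-linear building block of h-swish."""
    return min(max(0.0, x), 6.0)

def h_swish(x):
    """h-swish activation used by MobileNetV3: x * ReLU6(x + 3) / 6.
    It approximates swish (x * sigmoid(x)) without computing an exponential."""
    return x * relu6(x + 3.0) / 6.0
```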
As a further description of the above technical solution:
the training process for obtaining the text direction classification model comprises the following steps:
(1) acquiring a text image;
(2) carrying out image enhancement processing on the acquired text image;
(3) taking the image after image enhancement as a training set, and training the original depth separable convolution neural network;
(4) and adjusting model basic parameters of the original depth separable convolutional neural network in a training process, and evaluating the model by using a cross entropy loss function.
As a further description of the above technical solution:
the image enhancement processing includes the steps of:
(1) performing image rotation on the acquired text image;
(2) and carrying out perspective transformation on the text image subjected to image rotation.
As a further description of the above technical solution:
the formula adopted for image rotation is as follows:
x′ = (x0 − xcenter)·cosθ − (y0 − ycenter)·sinθ + xcenter;
y′ = (x0 − xcenter)·sinθ + (y0 − ycenter)·cosθ + ycenter
(left, top) represents the top left corner coordinates of the image;
(right, bottom) represents the lower right corner coordinates of the image;
(x0, y0) represents the coordinates of an arbitrary point on the image;
(xcenter, ycenter) represents the coordinates of the center point of the image;
(x′, y′) represents the new coordinate position.
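The rotation formula above translates directly into code (an illustrative sketch; in image coordinates with the y axis pointing down, a positive θ is a clockwise rotation):

```python
import math

def rotate_point(x0, y0, xcenter, ycenter, theta_deg):
    """Rotate (x0, y0) about (xcenter, ycenter) by theta_deg degrees,
    following the coordinate rotation formula:
        x' = (x0 - xc)*cos(t) - (y0 - yc)*sin(t) + xc
        y' = (x0 - xc)*sin(t) + (y0 - yc)*cos(t) + yc
    """
    t = math.radians(theta_deg)
    x_new = (x0 - xcenter) * math.cos(t) - (y0 - ycenter) * math.sin(t) + xcenter
    y_new = (x0 - xcenter) * math.sin(t) + (y0 - ycenter) * math.cos(t) + ycenter
    return x_new, y_new
```

Applying this to the four corners of a text detection box gives the rotated box; OpenCV's `cv2.getRotationMatrix2D` builds the equivalent 2x3 matrix for rotating whole images.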
As a further description of the above technical solution:
the general transformation formula of the perspective transformation is as follows:
(u, v) are original image pixel coordinates),for the transformed image pixel coordinates, the perspective transformation matrix is as follows:
T2=[a13,a23]Tfor generating a perspective transformation of the image;
T3=[a31,a32]representing image translation.
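Applying the general transformation formula to one pixel can be sketched as follows (an illustrative snippet using the row-vector convention [x′, y′, w′] = [u, v, 1] · A shown above, with division by w′ to return to pixel coordinates):

```python
def perspective_transform(u, v, A):
    """Apply a 3x3 perspective transformation matrix A (row-vector
    convention, [x', y', w'] = [u, v, 1] . A) to one pixel (u, v),
    then normalize by w' to obtain the transformed pixel coordinates."""
    xp = u * A[0][0] + v * A[1][0] + A[2][0]   # contribution of T1 and T3
    yp = u * A[0][1] + v * A[1][1] + A[2][1]
    wp = u * A[0][2] + v * A[1][2] + A[2][2]   # T2 = [a13, a23]^T drives w'
    return xp / wp, yp / wp
```

With T2 = [0, 0]T and a33 = 1 the transform degenerates to an affine map, so the same function also covers pure rotation, scaling and translation; OpenCV computes such a matrix from four point pairs via `cv2.getPerspectiveTransform`.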
As a further description of the above technical solution:
the cross entropy loss function is:
w is a weight;
xi, yi are eigenvalues;
b is an offset value.
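The loss above can be computed as follows. This is a one-dimensional sketch under the stated form ŷi = σ(w·xi + b); the patent does not spell out the exact per-layer expression, so the scalar w and b are simplifying assumptions.

```python
import math

def sigmoid(z):
    """Logistic function mapping a score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(w, b, xs, ys):
    """Binary cross entropy averaged over N samples, with the prediction
    y_hat_i = sigmoid(w * x_i + b). w is a weight, b an offset value."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))
    return total / len(xs)
```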
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
1. The method can classify text images at any angle in batches: after model prediction, an angle deviation value is returned and compared with the standard angle, and text images at non-standard angles are automatically converted to the standard angle. The system provides very convenient technical support for the preprocessing stage of offset bill images, automatically aligning them, which facilitates subsequent image detection and recognition and reduces manual labor cost and time.
2. The invention provides a lightweight algorithm model only 3 MB in size, which can be deployed on a PC or a mobile terminal with high accuracy.
3. While reading the parameter file required for training and during training itself, the system modifies the learning rate with an automatic learning-rate adjustment method. This is used because optimizing a deep neural network is largely an empirical process requiring manual tuning of several parameters such as the learning rate, weight decay and dropout rate, of which the learning rate is the most important. Compared with a fixed learning rate, a variable learning-rate schedule provides faster convergence, improves the training speed of the model and enhances its generalization ability. Images of bank bills are processed with corresponding image enhancement techniques, such as direction transformations (image rotation and perspective transformation, together with image annotation) as well as motion blur and Gaussian noise, to increase the number of training images, improve the generalization ability of the model and avoid overfitting. After multiple rounds of ablation training, the system achieves a good classification effect and can perform automatic correction according to the recognized classification angle. The output of the system includes a predicted angle value and a confidence; the system automatically corrects an image to the standard position using the predicted angle and raises an early warning according to the confidence. If the confidence is too low, the corrected image may require manual intervention, which improves the robustness of the whole system.
Drawings
FIG. 1 is a schematic diagram illustrating a text image direction classification algorithm flow of a text image offset angle prediction and a correction method thereof according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating an application of a text image direction classification algorithm to a text image offset angle prediction and correction method thereof according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart illustrating the process of determining the orientation correction of the image to be recognized according to the text graphic offset angle prediction and the correction method thereof provided by the embodiment of the invention;
FIG. 4 is a schematic diagram illustrating 180-degree rotation correction of an image to be corrected according to the text graphic offset angle prediction and the correction method thereof provided by the embodiment of the invention;
FIG. 5 is a schematic diagram illustrating 15-degree rotation correction of an image to be corrected according to the text graphic offset angle prediction and the correction method thereof provided by the embodiment of the invention;
FIG. 6 is a schematic diagram illustrating 90-degree rotation correction of an image to be corrected according to the text graphic offset angle prediction and the correction method thereof provided by the embodiment of the invention;
FIG. 7 is a schematic diagram of a training system based on a MobileNet V3 model for predicting the offset angle of a text graphic and a correction method thereof according to an embodiment of the present invention;
fig. 8 shows a structure diagram of a MobileNetV3 model for text graphic offset angle prediction and correction method thereof according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1-8, the present invention provides a technical solution: a text graphic offset angle prediction and correction method thereof includes the following steps:
(1) acquiring a text image to be recognized;
(2) inputting a text image to be recognized into a text direction classification model for deep learning, and obtaining coordinates and a text offset angle of a text detection box through regression prediction of the text direction classification model, namely detecting a text position in a bill;
(3) comparing the coordinate direction of the text detection box with the coordinate direction of the standard text box according to the coordinate of the text detection box, and judging whether the text detection box is in the standard direction;
(4) if the direction of the text image to be recognized is not the standard direction, converting the direction into the standard direction;
a MobileNet V3 model is used in the text direction classification model, and the MobileNet V3 model comprises a depth separable convolutional neural network for extracting the characteristics of candidate text regions and a non-maximum suppression algorithm for dividing the texts in the candidate regions;
the depth-separable convolutional neural network comprises a 3 x 3 convolutional layer and an average pooling layer for extracting text features, wherein the average pooling layer is composed of a plurality of convolutional layers, the average pooling layer is connected with an output layer through two 1 x1 convolutional layers, after the four steps, the text images can be subjected to angle classification of any angle in batch, after the images are subjected to model prediction, an angle deviation predicted value is returned, the value is compared with a standard angle, the text images of non-standard angles are automatically converted into text images of standard angles, the steps provide a lightweight algorithm model, the size of the model is only 3M, the model can be deployed at a PC or a mobile terminal, and meanwhile, the accuracy is high, wherein in the step (4), a text area in the images is respectively rotated by corresponding angles in an instant needle through coordinate rotation formula and OPENCV technology, converting it to a standard orientation.
Referring to FIG. 1, the calculation formula of the h-swish activation function of the MobileNetV3 model is:
h-swish(x) = x · ReLU6(x + 3) / 6,  where ReLU6(x) = min(max(0, x), 6);
ReLU is an activation function;
the formula for ReLU is:
ReLU(x) = max(0, x)
where x is the input feature value;
the training process for obtaining the text direction classification model comprises the following steps:
(1) acquiring a text image;
(2) carrying out image enhancement processing on the acquired text image;
(3) taking the image after image enhancement as a training set, and training the original depth separable convolution neural network;
(4) adjusting model basic parameters of the original depth separable convolutional neural network in a training process, and simultaneously performing model evaluation by using a cross entropy loss function;
the image enhancement processing includes the steps of:
(1) performing image rotation on the acquired text image;
(2) carrying out perspective transformation on the text image subjected to image rotation;
the formula adopted for image rotation is:
x′ = (x0 − xcenter)·cosθ − (y0 − ycenter)·sinθ + xcenter;
y′ = (x0 − xcenter)·sinθ + (y0 − ycenter)·cosθ + ycenter
(left, top) represents the top left corner coordinates of the image;
(right, bottom) represents the lower right corner coordinates of the image;
(x0, y0) represents the coordinates of an arbitrary point on the image;
(xcenter, ycenter) represents the coordinates of the center point of the image;
(x′, y′) represents the new coordinate position;
the general transformation formula for the perspective transformation is:
[x′, y′, w′] = [u, v, 1] · A
where (u, v) are the original image pixel coordinates and (x′/w′, y′/w′) are the transformed image pixel coordinates; the perspective transformation matrix is as follows:
A = | a11 a12 a13 |
    | a21 a22 a23 |
    | a31 a32 a33 |
T2 = [a13, a23]T is used for generating the perspective transformation of the image;
T3 = [a31, a32] represents image translation;
the cross entropy loss function is:
w is a weight;
xi, yi are eigenvalues;
b is an offset value, in which the h-Swish function is approximated by a Swish functionThe method has the advantages that sigmoid is replaced, the number of filters can be reduced from 32 to 16 under the condition of not losing precision through experiments due to the benefit of h-swish design, the time can be saved by 3ms through the improvement, 1000 ten thousand times of multiply-add operation is realized, the basic parameters of the model are adjusted, the model does not need to be reconstructed, the efficiency is greatly improved, meanwhile, under the condition that the data volume is not large, the characteristics learned by the model can be more robust through fine adjustment, the cross entropy loss function is selected for training, multiple parameter adjustment is carried out for training, the generalization capability of the model is improved, and the coordinate rotation formula for correcting the picture in the MobileNet V3 model is optimized through image enhancement processing.
Referring to fig. 1, the flow of the non-maximum suppression algorithm includes the following steps:
(1) dividing the coordinates of the text detection boxes obtained by the text direction classification model's regression according to category, and removing the background of the text image;
(2) sorting the bounding boxes (B_BOX) within each object class in descending order of classification confidence;
(3) within a class, selecting the bounding box B_BOX1 with the highest confidence, removing B_BOX1 from the input list and adding it to the output list;
(4) calculating the intersection-over-union IoU of B_BOX1 with each remaining box B_BOX2 one by one, and if IoU(B_BOX1, B_BOX2) > threshold TH, removing B_BOX2 from the input list;
(5) repeating steps 3-4 until the input list is empty, completing the traversal of one object class;
(6) repeating steps 2-5 until the non-maximum suppression processing of all object classes is completed;
(7) outputting the list; the algorithm is finished. Through the non-maximum suppression algorithm, the target detection boxes of the text are obtained, i.e. the specific position of the text in the image, so that the coordinates of each detection box can conveniently be stored.
The working principle is as follows: when the method is used, a MobileNetV3 model must first be trained. To generate a training set with offset angles, the images are enhanced; applying random offset angles to some images through image enhancement processing optimizes the coordinate rotation formula used to correct images in the MobileNetV3 model. During training, the basic parameters of the model are adjusted rather than rebuilding the model, which greatly improves efficiency; meanwhile, when the amount of data is not large, fine-tuning makes the features learned by the model more robust. The cross entropy loss function is selected for training, and parameters are tuned over multiple runs to improve the generalization ability of the model. The trained model must be evaluated during training: accuracy, recall rate and F1 score are used to evaluate the performance of the model, and the evaluated accuracy can reach more than 95%. After training, the model makes rapid predictions through a prediction script; single images or batches can be predicted by specifying an image or folder path, and the system outputs the predicted angle and confidence of each image, for example (30, 0.99876). The system automatically corrects the image to the standard position using the predicted angle and raises an early warning according to the confidence; if the confidence is too low, the corrected image may require manual intervention, which improves the robustness of the whole system. Furthermore, once the model is trained, text images at any angle can be classified in batches: after model prediction, an angle deviation value is returned and compared with the standard angle, and text images at non-standard angles are automatically converted to the standard angle. These steps provide a lightweight algorithm model only 3 MB in size that can be deployed on a PC or a mobile terminal with high accuracy. The target detection boxes of the text are obtained through the non-maximum suppression algorithm, i.e. the specific position of the text in the image, so that the coordinates of each detection box can conveniently be stored.
The above description covers only preferred embodiments of the present invention, but the scope of the invention is not limited thereto. Any equivalent substitution or change of the technical solutions and inventive concepts of the present invention that a person skilled in the art could conceive within the technical scope disclosed herein shall fall within the protection scope of the present invention.
Claims (8)
1. A text graphic offset angle prediction and correction method thereof is characterized by comprising the following steps:
(1) acquiring a text image to be recognized;
(2) inputting a text image to be recognized into a text direction classification model for deep learning, and obtaining coordinates and a text offset angle of a text detection box through regression prediction of the text direction classification model, namely detecting a text position in a bill;
(3) comparing the coordinate direction of the text detection box with the coordinate direction of the standard text box according to the coordinate of the text detection box, and judging whether the text detection box is in the standard direction;
(4) if the direction of the text image to be recognized is not the standard direction, converting the direction into the standard direction;
a MobileNet V3 model is used in the text direction classification model, and the MobileNet V3 model comprises a depth separable convolutional neural network for extracting the characteristics of candidate text regions and a non-maximum suppression algorithm for dividing the texts in the candidate regions;
the depth-separable convolutional neural network comprises a convolution layer of 3 x 3 and an average pooling layer, the convolution layer is used for extracting text features, the average pooling layer is composed of a plurality of convolution layers, the average pooling layer is connected with an output layer through two convolution layers of 1 x1, and an h-swish function is adopted as an activation function of the depth-separable convolutional neural network;
the depth-separable convolutional neural network optimizes the model by adjusting model basic parameters in a training process;
the depth separable convolutional neural network adopts a cross entropy loss function during training;
an image enhancement processing means is adopted in the generation process of the training set of the MobileNet V3 model;
the step (4) rotates the text region in the image clockwise by the corresponding angle using a coordinate rotation formula and the OpenCV technique respectively, converting it to the standard orientation.
2. The method of claim 1, wherein the flow of the non-maximum suppression algorithm comprises the steps of:
(1) dividing coordinates of a text detection box obtained by text direction classification model regression according to categories;
(2) sorting the bounding boxes (B_BOX) within each object class in descending order of classification confidence;
(3) within a class, selecting the bounding box B_BOX1 with the highest confidence, removing B_BOX1 from the input list and adding it to the output list;
(4) calculating the intersection-over-union IoU of B_BOX1 with each remaining box B_BOX2 one by one, and if IoU(B_BOX1, B_BOX2) > threshold TH, removing B_BOX2 from the input list;
(5) repeating steps 3-4 until the input list is empty, completing the traversal of one object class;
(6) repeating steps 2-5 until the non-maximum suppression processing of all object classes is completed;
(7) and outputting the list and finishing the algorithm.
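The seven steps of claim 2 amount to greedy per-class non-maximum suppression. A minimal Python sketch for a single class (an illustration, not the patent's implementation; the `nms` helper name and its threshold default are assumptions):

```python
def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression over one object class.

    boxes: list of (x1, y1, x2, y2) corner coordinates;
    scores: per-box classification confidences.
    Returns indices of the kept boxes, highest confidence first.
    """
    def iou(a, b):
        # Intersection-over-union of two axis-aligned boxes.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / (area_a + area_b - inter) if inter > 0 else 0.0

    # Step 2: sort candidates by confidence, descending.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        # Step 3: move the best remaining box to the output list.
        best = order.pop(0)
        keep.append(best)
        # Step 4: drop every remaining box that overlaps it too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep
```

Running this once per class, as in step 6, completes the multi-class procedure.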
4. The method as claimed in claim 1, wherein the training process for obtaining the text direction classification model comprises the following steps:
(1) acquiring a text image;
(2) carrying out image enhancement processing on the acquired text image;
(3) taking the enhanced images as the training set and training the original depthwise separable convolutional neural network;
(4) adjusting the basic model parameters of the original depthwise separable convolutional neural network during training, and evaluating the model with the cross-entropy loss function.
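Claim 4 evaluates the model with a cross-entropy loss. A per-sample sketch (illustrative only; the `cross_entropy` name and the epsilon guard are assumptions, not the patent's code):

```python
import math

def cross_entropy(predicted_probs, target_index):
    """Cross-entropy loss for one sample: -log p(correct class).

    predicted_probs: the classifier's softmax output (sums to 1);
    target_index: index of the true orientation class.
    """
    eps = 1e-12  # guard against log(0) for a zero-probability target
    return -math.log(predicted_probs[target_index] + eps)
```

The loss is near zero when the network assigns the true class probability close to 1, and grows without bound as that probability approaches 0, which is why it is a common training criterion for direction classification.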
5. The method of claim 4, wherein the image enhancement process comprises the following steps:
(1) performing image rotation on the acquired text image;
(2) applying perspective transformation to the rotated text image.
6. The method as claimed in claim 5, wherein the image rotation is performed by the following formula:
x′ = (x0 − xcenter)cosθ − (y0 − ycenter)sinθ + xcenter;
y′ = (x0 − xcenter)sinθ + (y0 − ycenter)cosθ + ycenter;
where:
(left, top) represents the top-left corner coordinates of the image;
(right, bottom) represents the bottom-right corner coordinates of the image;
(x0, y0) represents the coordinates of an arbitrary point on the image;
(xcenter, ycenter) represents the coordinates of the center point of the image;
(x′, y′) represents the new coordinate position.
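The rotation formula of claim 6 can be sketched as a small helper (illustrative; `rotate_point` is a hypothetical name, and note that the standard rotation about a point uses +(y0 − ycenter)cosθ in the second coordinate):

```python
import math

def rotate_point(x0, y0, xc, yc, theta):
    """Rotate the point (x0, y0) about the image centre (xc, yc)
    by angle theta (radians), per the claim's coordinate rotation
    formula: translate to the centre, rotate, translate back."""
    x_new = (x0 - xc) * math.cos(theta) - (y0 - yc) * math.sin(theta) + xc
    y_new = (x0 - xc) * math.sin(theta) + (y0 - yc) * math.cos(theta) + yc
    return x_new, y_new
```

Applying this to each corner of a text box gives the rotated box; in practice OpenCV's getRotationMatrix2D and warpAffine perform the equivalent transformation on the whole image.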
7. The method according to claim 5, wherein the general formula of the perspective transformation is:
[x, y, w] = [u, v, 1] · A
where (u, v) are the original image pixel coordinates and (x/w, y/w) are the transformed image pixel coordinates; the perspective transformation matrix A is:
A = [[a11, a12, a13], [a21, a22, a23], [a31, a32, a33]];
T2 = [a13, a23]^T generates the perspective transformation of the image;
T3 = [a31, a32] represents the image translation.
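Under the row-vector convention of claim 7, [x, y, w] = [u, v, 1] · A, the mapping can be sketched as below (illustrative; `apply_perspective` is a hypothetical helper, not the patent's code). The division by w is what distinguishes a perspective warp from a purely affine one.

```python
def apply_perspective(u, v, A):
    """Map a source pixel (u, v) through a 3x3 perspective matrix A
    using the row-vector convention [x, y, w] = [u, v, 1] . A,
    then divide by w to recover pixel coordinates."""
    x = u * A[0][0] + v * A[1][0] + A[2][0]
    y = u * A[0][1] + v * A[1][1] + A[2][1]
    w = u * A[0][2] + v * A[1][2] + A[2][2]
    return x / w, y / w
```

In practice, OpenCV's getPerspectiveTransform estimates A from four point correspondences and warpPerspective applies it to the whole image.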
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110090669.4A CN112784836A (en) | 2021-01-22 | 2021-01-22 | Text and graphic offset angle prediction and correction method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112784836A true CN112784836A (en) | 2021-05-11 |
Family
ID=75758695
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110090669.4A Pending CN112784836A (en) | 2021-01-22 | 2021-01-22 | Text and graphic offset angle prediction and correction method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112784836A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113420741A (en) * | 2021-08-24 | 2021-09-21 | 深圳市中科鼎创科技股份有限公司 | Method and system for intelligently detecting file modification |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110427939A (en) * | 2019-08-02 | 2019-11-08 | 泰康保险集团股份有限公司 | Method, apparatus, medium and the electronic equipment of correction inclination text image |
CN110490198A (en) * | 2019-08-12 | 2019-11-22 | 上海眼控科技股份有限公司 | Text orientation bearing calibration, device, computer equipment and storage medium |
CN111325203A (en) * | 2020-01-21 | 2020-06-23 | 福州大学 | American license plate recognition method and system based on image correction |
CN112001385A (en) * | 2020-08-20 | 2020-11-27 | 长安大学 | Target cross-domain detection and understanding method, system, equipment and storage medium |
- 2021-01-22: CN CN202110090669.4A patent CN112784836A filed; status: Pending
Non-Patent Citations (2)
Title |
---|
ANDREW HOWARD et al.: "Searching for MobileNetV3", 2019 IEEE/CVF International Conference on Computer Vision (ICCV) * |
YUNING DU et al.: "PP-OCR: A Practical Ultra Lightweight OCR System", https://arxiv.org/pdf/2009.09941.pdf * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111401372B (en) | Method for extracting and identifying image-text information of scanned document | |
CN112329760B (en) | Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network | |
Kadam et al. | Detection and localization of multiple image splicing using MobileNet V1 | |
CN107220641B (en) | Multi-language text classification method based on deep learning | |
CN107784288A (en) | A kind of iteration positioning formula method for detecting human face based on deep neural network | |
CN112052899A (en) | Single ship target SAR image generation method based on generation countermeasure network | |
CN112069900A (en) | Bill character recognition method and system based on convolutional neural network | |
CN111553438A (en) | Image identification method based on convolutional neural network | |
CN110807362A (en) | Image detection method and device and computer readable storage medium | |
CN113221956B (en) | Target identification method and device based on improved multi-scale depth model | |
CN113095156B (en) | Double-current network signature identification method and device based on inverse gray scale mode | |
Bawane et al. | Object and character recognition using spiking neural network | |
CN110135446A (en) | Method for text detection and computer storage medium | |
CN112036522A (en) | Calligraphy individual character evaluation method, system and terminal based on machine learning | |
CN111783819A (en) | Improved target detection method based on region-of-interest training on small-scale data set | |
Rizvi et al. | Optical character recognition based intelligent database management system for examination process control | |
CN117079098A (en) | Space small target detection method based on position coding | |
CN115880495A (en) | Ship image target detection method and system under complex environment | |
Kancharla et al. | Handwritten signature recognition: a convolutional neural network approach | |
CN114283431B (en) | Text detection method based on differentiable binarization | |
CN112883931A (en) | Real-time true and false motion judgment method based on long and short term memory network | |
CN116229528A (en) | Living body palm vein detection method, device, equipment and storage medium | |
CN116883933A (en) | Security inspection contraband detection method based on multi-scale attention and data enhancement | |
CN112784836A (en) | Text and graphic offset angle prediction and correction method thereof | |
CN114359917A (en) | Handwritten Chinese character detection and recognition and font evaluation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 2021-05-11