CN112784836A - Text and graphic offset angle prediction and correction method thereof

Text and graphic offset angle prediction and correction method thereof

Info

Publication number
CN112784836A
Authority
CN
China
Prior art keywords
text
image
model
standard
coordinates
Prior art date
Legal status
Pending
Application number
CN202110090669.4A
Other languages
Chinese (zh)
Inventor
励建科
陈再蝶
朱晓秋
邓明明
樊伟东
周杰
Current Assignee
Zhejiang Kangxu Technology Co ltd
Original Assignee
Zhejiang Kangxu Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhejiang Kangxu Technology Co ltd
Priority to CN202110090669.4A
Publication of CN112784836A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/243 Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method for predicting and correcting the offset angle of text images, which comprises four steps: (1) acquire the text image to be recognized; (2) input the text image into a deep-learning text direction classification model, which regresses the coordinates of the text detection box and the text offset angle, i.e., detects the position of the text in a bill; (3) compare the coordinate direction of the text detection box with that of the standard text box and judge whether the image is in the standard direction; (4) if the text image to be recognized is not in the standard direction, convert it to the standard direction. The method classifies text images at arbitrary angles in batches and returns a predicted angle offset, which is compared with the standard angle so that text images at non-standard angles are automatically converted into text images at the standard angle. The model is only 3 MB in size, can be deployed on a PC or a mobile terminal, and achieves high accuracy.

Description

Text and graphic offset angle prediction and correction method thereof
Technical Field
The invention relates to the technical field of text image correction, and in particular to a method for predicting and correcting the offset angle of text images.
Background
With the development of the mobile internet, big data, and artificial intelligence, the internet has entered a new era that is rapidly reshaping human society and strongly impacting the traditional banking industry, especially the traditional bank service model centered on branch tellers and account managers. Artificial intelligence technologies such as text direction classification are now widely applied in banks, enterprises, and other business fields, for example to accurately classify bank bills according to text direction.
However, the existing approaches have several shortcomings. First, when processing bills in batches, most banks still adjust the bill angle manually and cannot automatically correct skewed uploaded images in batches.
Second, some text direction classifiers can only distinguish 0 and 180 degrees and cannot classify text images at arbitrary angles.
Third, after the text angle is classified, the image is not automatically corrected accordingly.
Finally, some current algorithm models are heavyweight, cannot be deployed on mobile terminals, and are too slow to call to meet production requirements.
Disclosure of Invention
The invention aims to solve the technical problems mentioned in the background art by providing a text and graphic offset angle prediction and correction method.
To achieve this purpose, the invention adopts the following technical scheme:
A text and graphic offset angle prediction and correction method includes the following steps:
(1) acquiring a text image to be recognized;
(2) inputting the text image to be recognized into a deep-learning text direction classification model, and obtaining the coordinates of the text detection box and the text offset angle through regression prediction of the model, i.e., detecting the text position in the bill;
(3) comparing the coordinate direction of the text detection box with that of the standard text box according to the coordinates of the text detection box, and judging whether the text detection box is in the standard direction;
(4) if the text image to be recognized is not in the standard direction, converting it to the standard direction;
A MobileNetV3 model is used as the text direction classification model; the MobileNetV3 model comprises a depth-separable convolutional neural network for extracting the features of candidate text regions and a non-maximum suppression algorithm for segmenting the text in the candidate regions.
The depth-separable convolutional neural network comprises a 3 x 3 convolution layer for extracting text features and an average pooling layer composed of a plurality of convolution layers; the average pooling layer is connected to the output layer through two 1 x 1 convolution layers, and the h-swish function is adopted as the activation function of the network. The network is optimized by adjusting basic model parameters during training and adopts a cross-entropy loss function. Image enhancement processing is applied when generating the training set of the MobileNetV3 model. In step (4), the text region in the image is rotated clockwise by the corresponding angle, using the coordinate rotation formula and the OpenCV technique respectively, to convert it to the standard orientation.
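For illustration, the four steps above can be sketched in Python as follows. This is a minimal sketch, not the patent's actual implementation: the classifier object, its predict API, and the tolerance threshold are assumptions.

    import cv2

    def correct_text_image(image, classifier, tolerance=1.0):
        """Steps (2)-(4): predict the text offset angle and rotate back if needed."""
        # Step (2): the direction classification model regresses the detection-box
        # coordinates and the text offset angle (classifier.predict is hypothetical)
        box_coords, offset_angle = classifier.predict(image)
        # Step (3): treat the image as already standard if the offset is near zero
        if abs(offset_angle) <= tolerance:
            return image
        # Step (4): rotate clockwise by the predicted angle; a negative angle is
        # clockwise in OpenCV's convention
        h, w = image.shape[:2]
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), -offset_angle, 1.0)
        return cv2.warpAffine(image, M, (w, h))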
As a further description of the above technical solution:
the flow of the non-maximum suppression algorithm includes the steps of:
(1) grouping the coordinates of the text detection boxes regressed by the text direction classification model according to category;
(2) sorting the bounding boxes (B_BOX) in each object class in descending order of classification confidence;
(3) within a class, selecting the bounding box B_BOX1 with the highest confidence, removing B_BOX1 from the input list, and adding it to the output list;
(4) calculating the intersection-over-union IoU of B_BOX1 with each remaining box B_BOX2 one by one, and removing B_BOX2 from the input list if IoU(B_BOX1, B_BOX2) > threshold TH;
(5) repeating steps 3-4 until the input list is empty, completing the traversal of one object class;
(6) repeating steps 2-5 until non-maximum suppression has been applied to all object classes;
(7) outputting the list and ending the algorithm.
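As an illustration only, the above flow can be sketched in Python roughly as follows; the helper names and the (x1, y1, x2, y2) box format are assumptions, not taken from the patent.

    import numpy as np

    def iou(box1, box2):
        # Intersection-over-union of two boxes given as (x1, y1, x2, y2)
        x1, y1 = max(box1[0], box2[0]), max(box1[1], box2[1])
        x2, y2 = min(box1[2], box2[2]), min(box1[3], box2[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
        area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
        return inter / (area1 + area2 - inter + 1e-9)

    def nms_per_class(boxes, scores, th=0.5):
        # Steps (2)-(5): sort by confidence, keep the best box, drop overlaps
        order = np.argsort(scores)[::-1].tolist()   # descending confidence
        keep = []
        while order:                                # until the input list is empty
            best = order.pop(0)                     # B_BOX1, highest confidence
            keep.append(best)
            order = [i for i in order
                     if iou(boxes[best], boxes[i]) <= th]  # remove if IoU > TH
        return keep

Running nms_per_class once per object class, as in steps (1) and (6), yields the final output list.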
As a further description of the above technical solution:
the calculation formula of the h-swish activation function of the MobileNet V3 model is
Figure BDA0002912357990000031
ReLU is an activation function;
the formula for ReLU is:
Figure BDA0002912357990000032
and x is the input characteristic value.
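As an illustrative sketch of these activations in NumPy (assuming, as in MobileNetV3, the capped variant ReLU6):

    import numpy as np

    def relu6(x):
        # ReLU capped at 6: min(max(0, x), 6)
        return np.minimum(np.maximum(0.0, x), 6.0)

    def h_swish(x):
        # h-swish(x) = x * ReLU6(x + 3) / 6
        return x * relu6(x + 3.0) / 6.0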
As a further description of the above technical solution:
the training process for obtaining the text direction classification model comprises the following steps:
(1) acquiring a text image;
(2) carrying out image enhancement processing on the acquired text image;
(3) taking the images after image enhancement as the training set and training the original depth-separable convolutional neural network;
(4) adjusting the basic model parameters of the original depth-separable convolutional neural network during training and evaluating the model with the cross-entropy loss function.
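A condensed PyTorch-style sketch of this training loop follows; the optimizer choice, learning rate, and data pipeline are illustrative assumptions, not taken from the patent.

    import torch
    import torch.nn as nn

    def train(model, loader, epochs=10, lr=1e-3):
        criterion = nn.CrossEntropyLoss()  # cross-entropy loss, as in step (4)
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        for epoch in range(epochs):
            for images, angle_labels in loader:  # augmented images from steps (2)-(3)
                optimizer.zero_grad()
                loss = criterion(model(images), angle_labels)
                loss.backward()
                optimizer.step()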
As a further description of the above technical solution:
the image enhancement processing includes the steps of:
(1) performing image rotation on the acquired text image;
(2) carrying out perspective transformation on the rotated text image.
As a further description of the above technical solution:
the formula adopted for image rotation is as follows:
Figure BDA0002912357990000041
Figure BDA0002912357990000042
x′=(x0-xcenter)cosθ-(y0-ycenter)sinθ+xcenter;
y′=(x0-xcenter)sinθ-(y0-ycenter)cosθ+ycenter
(left, top) represents the top left corner coordinates of the image;
(right, bottom) represents the lower right corner coordinates of the image;
(x0,y0) Representing coordinates of an arbitrary point on the image;
(xcenter, ycenter) represents the coordinates of the center point of the image;
(x ', y') represents the new coordinate position.
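For illustration, the rotation formulas above translate directly into the following Python sketch (function and variable names are illustrative):

    import math

    def rotate_point(x0, y0, left, top, right, bottom, theta):
        # Rotate (x0, y0) by angle theta (radians) about the image center
        xcenter = (left + right) / 2.0
        ycenter = (top + bottom) / 2.0
        x_new = (x0 - xcenter) * math.cos(theta) - (y0 - ycenter) * math.sin(theta) + xcenter
        y_new = (x0 - xcenter) * math.sin(theta) + (y0 - ycenter) * math.cos(theta) + ycenter
        return x_new, y_new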
As a further description of the above technical solution:
the general transformation formula of the perspective transformation is as follows:
Figure BDA0002912357990000043
(u, v) are original image pixel coordinates),
Figure BDA0002912357990000044
for the transformed image pixel coordinates, the perspective transformation matrix is as follows:
Figure BDA0002912357990000051
Figure BDA0002912357990000052
representing a linear transformation of the image;
T2=[a13,a23]Tfor generating a perspective transformation of the image;
T3=[a31,a32]representing image translation.
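In practice such a perspective transformation can be applied with OpenCV; a minimal sketch, in which the four corner-point pairs are placeholders supplied by the caller:

    import cv2
    import numpy as np

    def perspective_warp(image, src_pts, dst_pts):
        # Estimate the 3 x 3 matrix A from four point pairs, then warp the image
        M = cv2.getPerspectiveTransform(np.float32(src_pts), np.float32(dst_pts))
        h, w = image.shape[:2]
        return cv2.warpPerspective(image, M, (w, h))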
As a further description of the above technical solution:
the cross entropy loss function is:
Figure BDA0002912357990000053
w is a weight;
xi, yi are eigenvalues;
b is an offset value.
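A NumPy sketch of this loss, assuming the sigmoid form reconstructed above:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def cross_entropy(w, b, x, y):
        # Mean cross-entropy over features x (N x d) and labels y in {0, 1}
        p = sigmoid(x @ w + b)
        return -np.mean(y * np.log(p + 1e-9) + (1 - y) * np.log(1 - p + 1e-9))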
In summary, by adopting the above technical scheme, the invention has the following beneficial effects:
1. The method can classify text images at arbitrary angles in batches. After model prediction, an angle offset prediction is returned and compared with the standard angle, and text images at non-standard angles are automatically converted into text images at the standard angle. The system provides convenient technical support for preprocessing skewed bill images: images are aligned automatically, which facilitates subsequent image detection and recognition and reduces manual labor cost and time.
2. The invention provides a lightweight algorithm model: the model is only 3 MB in size, can be deployed on a PC or a mobile terminal, and achieves high accuracy.
3. While reading the parameter file required for training and during training itself, the system adjusts the learning rate automatically. Optimizing a deep neural network is largely an empirical process that requires manually tuning several parameters such as the learning rate, weight decay, and dropout rate, of which the learning rate is arguably the most important. Compared with a fixed learning rate, a variable learning-rate schedule converges faster, speeds up model training, and improves the generalization ability of the model. Bank bill images are augmented with corresponding image enhancement techniques, such as image rotation and perspective transformation (with the rotation angles recorded as labels), as well as motion blur and Gaussian noise; these operations enlarge the training set, improve the generalization ability of the model, and avoid overfitting. After several rounds of ablation training, the system achieves a good classification effect and can automatically correct images according to the recognized angle. The output consists of a predicted angle value and a confidence: the system automatically corrects the image to the standard position using the predicted angle and raises an early warning based on the confidence. If the confidence is too low, the corrected image may require manual intervention, which improves the robustness of the whole system.
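The patent does not name a specific learning-rate schedule; as one common illustrative choice, PyTorch's ReduceLROnPlateau adjusts the rate automatically when the validation loss stops improving:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 4)  # placeholder network standing in for MobileNetV3
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Halve the learning rate after the validation loss plateaus for 2 epochs
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode='min', factor=0.5, patience=2)

    for epoch in range(20):
        val_loss = torch.rand(1).item()  # stand-in for a real validation pass
        scheduler.step(val_loss)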
Drawings
FIG. 1 is a schematic diagram of the text image direction classification algorithm flow of the text and graphic offset angle prediction and correction method provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of an application of the text image direction classification algorithm of the method provided by an embodiment of the present invention;
FIG. 3 is a schematic flow chart of judging and correcting the orientation of the image to be recognized according to the method provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of the 180-degree rotation correction of an image to be corrected according to the method provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of the 15-degree rotation correction of an image to be corrected according to the method provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of the 90-degree rotation correction of an image to be corrected according to the method provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of the training system based on the MobileNetV3 model according to the method provided by an embodiment of the present invention;
FIG. 8 is a structure diagram of the MobileNetV3 model of the method provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Example one
Referring to FIGS. 1-8, the present invention provides a technical solution: a text and graphic offset angle prediction and correction method, which includes the following steps:
(1) acquiring a text image to be recognized;
(2) inputting the text image to be recognized into a deep-learning text direction classification model, and obtaining the coordinates of the text detection box and the text offset angle through regression prediction of the model, i.e., detecting the text position in the bill;
(3) comparing the coordinate direction of the text detection box with that of the standard text box according to the coordinates of the text detection box, and judging whether the text detection box is in the standard direction;
(4) if the text image to be recognized is not in the standard direction, converting it to the standard direction;
A MobileNetV3 model is used as the text direction classification model; the MobileNetV3 model comprises a depth-separable convolutional neural network for extracting the features of candidate text regions and a non-maximum suppression algorithm for segmenting the text in the candidate regions.
The depth-separable convolutional neural network comprises a 3 x 3 convolution layer for extracting text features and an average pooling layer composed of a plurality of convolution layers; the average pooling layer is connected to the output layer through two 1 x 1 convolution layers. After the above four steps, text images at arbitrary angles can be classified in batches: once an image has passed through model prediction, an angle offset prediction is returned, compared with the standard angle, and text images at non-standard angles are automatically converted into text images at the standard angle. These steps yield a lightweight algorithm model, only 3 MB in size, that can be deployed on a PC or a mobile terminal while maintaining high accuracy. In step (4), the text region in the image is rotated clockwise by the corresponding angle, using the coordinate rotation formula and the OpenCV technique respectively, to convert it to the standard orientation.
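A minimal OpenCV sketch of the clockwise correction in step (4); the function name and the sign-convention note are illustrative:

    import cv2

    def correct_image(image, predicted_angle):
        # Rotate the image clockwise by the predicted offset angle (degrees);
        # OpenCV treats positive angles as counter-clockwise, hence the negation
        h, w = image.shape[:2]
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), -predicted_angle, 1.0)
        return cv2.warpAffine(image, M, (w, h))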
Referring to FIG. 1, the calculation formula of the h-swish activation function of the MobileNetV3 model is
h-swish(x) = x · ReLU6(x + 3) / 6
where ReLU6 is the ReLU activation function capped at 6:
ReLU6(x) = min(max(0, x), 6)
and x is the input feature value;
the training process for obtaining the text direction classification model comprises the following steps:
(1) acquiring a text image;
(2) carrying out image enhancement processing on the acquired text image;
(3) taking the images after image enhancement as the training set and training the original depth-separable convolutional neural network;
(4) adjusting the basic model parameters of the original depth-separable convolutional neural network during training while evaluating the model with the cross-entropy loss function;
the image enhancement processing includes the steps of:
(1) performing image rotation on the acquired text image;
(2) carrying out perspective transformation on the rotated text image;
the formula adopted for image rotation is:
Figure BDA0002912357990000091
Figure BDA0002912357990000092
x′=(x0-xcenter)cosθ-(y0-ycenter)sinθ+xcenter;
y′=(x0-xcenter)sinθ-(y0-ycenter)cosθ+ycenter
(left, top) represents the top left corner coordinates of the image;
(right, bottom) represents the lower right corner coordinates of the image;
(x0,y0) Representing coordinates of an arbitrary point on the image;
(xcenter, ycenter) represents the coordinates of the center point of the image;
(x ', y') represents the new coordinate position;
the general transformation formula for the perspective transformation is:
Figure BDA0002912357990000093
(u, v) are original image pixel coordinates),
Figure BDA0002912357990000094
for the transformed image pixel coordinates, the perspective transformation matrix is as follows:
Figure BDA0002912357990000095
Figure BDA0002912357990000101
representing a linear transformation of the image;
T2=[a13,a23]Tfor generating a perspective transformation of the image;
T3=[a31,a32]representing image translation;
the cross entropy loss function is:
Figure BDA0002912357990000102
w is a weight;
xi, yi are eigenvalues;
b is an offset value, in which the h-Swish function is approximated by a Swish function
Figure BDA0002912357990000103
The method has the advantages that sigmoid is replaced, the number of filters can be reduced from 32 to 16 under the condition of not losing precision through experiments due to the benefit of h-swish design, the time can be saved by 3ms through the improvement, 1000 ten thousand times of multiply-add operation is realized, the basic parameters of the model are adjusted, the model does not need to be reconstructed, the efficiency is greatly improved, meanwhile, under the condition that the data volume is not large, the characteristics learned by the model can be more robust through fine adjustment, the cross entropy loss function is selected for training, multiple parameter adjustment is carried out for training, the generalization capability of the model is improved, and the coordinate rotation formula for correcting the picture in the MobileNet V3 model is optimized through image enhancement processing.
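A tiny NumPy sketch comparing swish with its h-swish approximation (illustrative only):

    import numpy as np

    def swish(x):
        return x / (1.0 + np.exp(-x))  # x * sigmoid(x)

    def h_swish(x):
        return x * np.minimum(np.maximum(x + 3.0, 0.0), 6.0) / 6.0

    xs = np.linspace(-6.0, 6.0, 13)
    print(np.abs(swish(xs) - h_swish(xs)).max())  # small approximation error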
Referring to FIG. 1, the flow of the non-maximum suppression algorithm includes the following steps:
(1) grouping the coordinates of the text detection boxes regressed by the text direction classification model according to category and removing the background of the text image;
(2) sorting the bounding boxes (B_BOX) in each object class in descending order of classification confidence;
(3) within a class, selecting the bounding box B_BOX1 with the highest confidence, removing B_BOX1 from the input list, and adding it to the output list;
(4) calculating the intersection-over-union IoU of B_BOX1 with each remaining box B_BOX2 one by one, and removing B_BOX2 from the input list if IoU(B_BOX1, B_BOX2) > threshold TH;
(5) repeating steps 3-4 until the input list is empty, completing the traversal of one object class;
(6) repeating steps 2-5 until non-maximum suppression has been applied to all object classes;
(7) outputting the list and ending the algorithm. The target detection boxes of the text are obtained through the non-maximum suppression algorithm, i.e., the specific position of the text in the image, which makes it convenient to store the coordinates of each detection box.
The working principle is as follows: in use, the MobileNetV3 model must first be trained. To generate a training set with offset angles, the images are augmented; applying image enhancement to randomly offset images optimizes the coordinate rotation formula used by the MobileNetV3 model to correct images. During training, the basic model parameters are adjusted rather than rebuilding the model, which greatly improves efficiency; meanwhile, when the amount of data is small, fine-tuning makes the features learned by the model more robust. The cross-entropy loss function is selected for training, and the parameters are tuned over multiple runs to improve the generalization ability of the model. The trained model is evaluated during training using the precision
Precision = TP / (TP + FP),
the recall
Recall = TP / (TP + FN),
and the F1 score
F1 = 2 · Precision · Recall / (Precision + Recall).
The evaluated accuracy of the model can exceed 95%. After training, the model performs fast prediction through a prediction script: single images or batches can be predicted by specifying an image or folder path, and the system outputs the predicted angle and confidence of each image, for example (30, 0.99876). The system automatically corrects the image to the standard position using the predicted angle and raises an early warning based on the confidence; if the confidence is too low, the corrected image may require manual intervention, which improves the robustness of the whole system. Once trained, the model can classify text images at arbitrary angles in batches: after model prediction, an angle offset prediction is returned, compared with the standard angle, and text images at non-standard angles are automatically converted into text images at the standard angle. These steps yield a lightweight algorithm model, only 3 MB in size, that can be deployed on a PC or a mobile terminal while maintaining high accuracy. The target detection boxes of the text are obtained through the non-maximum suppression algorithm, i.e., the specific position of the text in the image, which makes it convenient to store the coordinates of each detection box.
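As an illustration of this batch prediction and early-warning logic, a hedged Python sketch; the model's predict API, the folder layout, and the confidence threshold are assumptions, and correct_image refers to the sketch shown earlier:

    import glob
    import cv2

    CONFIDENCE_TH = 0.8  # below this, flag the image for manual review

    def predict_folder(model, folder):
        results = {}
        for path in glob.glob(folder + "/*.jpg"):
            image = cv2.imread(path)
            angle, confidence = model.predict(image)  # e.g. (30, 0.99876)
            results[path] = (angle, confidence, correct_image(image, angle))
            if confidence < CONFIDENCE_TH:
                print("warning: low confidence for " + path + ", manual check advised")
        return results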
The above description covers only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or modification of the technical solutions and inventive concept of the present invention that a person skilled in the art could conceive within the technical scope disclosed herein shall fall within the protection scope of the present invention.

Claims (8)

1. A text and graphic offset angle prediction and correction method, characterized by comprising the following steps:
(1) acquiring a text image to be recognized;
(2) inputting the text image to be recognized into a deep-learning text direction classification model, and obtaining the coordinates of the text detection box and the text offset angle through regression prediction of the model, i.e., detecting the text position in the bill;
(3) comparing the coordinate direction of the text detection box with that of the standard text box according to the coordinates of the text detection box, and judging whether the text detection box is in the standard direction;
(4) if the text image to be recognized is not in the standard direction, converting it to the standard direction;
a MobileNetV3 model is used as the text direction classification model, and the MobileNetV3 model comprises a depth-separable convolutional neural network for extracting the features of candidate text regions and a non-maximum suppression algorithm for segmenting the text in the candidate regions;
the depth-separable convolutional neural network comprises a 3 x 3 convolution layer for extracting text features and an average pooling layer composed of a plurality of convolution layers; the average pooling layer is connected to the output layer through two 1 x 1 convolution layers, and the h-swish function is adopted as the activation function of the depth-separable convolutional neural network;
the depth-separable convolutional neural network is optimized by adjusting basic model parameters during training;
the depth-separable convolutional neural network adopts a cross-entropy loss function during training;
image enhancement processing is adopted in the generation of the training set of the MobileNetV3 model;
in step (4), the text region in the image is rotated clockwise by the corresponding angle, using the coordinate rotation formula and the OpenCV technique respectively, to convert it to the standard orientation.
2. The method of claim 1, wherein the flow of the non-maximum suppression algorithm comprises the steps of:
(1) grouping the coordinates of the text detection boxes regressed by the text direction classification model according to category;
(2) sorting the bounding boxes (B_BOX) in each object class in descending order of classification confidence;
(3) within a class, selecting the bounding box B_BOX1 with the highest confidence, removing B_BOX1 from the input list, and adding it to the output list;
(4) calculating the intersection-over-union IoU of B_BOX1 with each remaining box B_BOX2 one by one, and removing B_BOX2 from the input list if IoU(B_BOX1, B_BOX2) > threshold TH;
(5) repeating steps 3-4 until the input list is empty, completing the traversal of one object class;
(6) repeating steps 2-5 until non-maximum suppression has been applied to all object classes;
(7) outputting the list and ending the algorithm.
3. The method as claimed in claim 1, wherein the calculation formula of the h-swish activation function of the MobileNetV3 model is
h-swish(x) = x · ReLU6(x + 3) / 6
where ReLU6 is the ReLU activation function capped at 6:
ReLU6(x) = min(max(0, x), 6)
and x is the input feature value.
4. The method as claimed in claim 1, wherein the training process for obtaining the text direction classification model comprises the following steps:
(1) acquiring a text image;
(2) carrying out image enhancement processing on the acquired text image;
(3) taking the images after image enhancement as the training set and training the original depth-separable convolutional neural network;
(4) adjusting the basic model parameters of the original depth-separable convolutional neural network during training and evaluating the model with the cross-entropy loss function.
5. The method of claim 4, wherein the image enhancement process comprises the following steps:
(1) performing image rotation on the acquired text image;
(2) carrying out perspective transformation on the rotated text image.
6. The method as claimed in claim 5, wherein the image rotation is performed by the following formulas:
xcenter = (left + right) / 2;
ycenter = (top + bottom) / 2;
x′ = (x0 − xcenter)·cosθ − (y0 − ycenter)·sinθ + xcenter;
y′ = (x0 − xcenter)·sinθ + (y0 − ycenter)·cosθ + ycenter;
where (left, top) are the top-left corner coordinates of the image;
(right, bottom) are the bottom-right corner coordinates of the image;
(x0, y0) are the coordinates of an arbitrary point on the image;
(xcenter, ycenter) are the coordinates of the center point of the image;
(x′, y′) is the new coordinate position.
7. The method according to claim 5, wherein the general transformation formula of the perspective transformation is:
[x′, y′, w′] = [u, v, w] · A
where (u, v) are the original image pixel coordinates and (x, y) = (x′/w′, y′/w′) are the transformed image pixel coordinates; the perspective transformation matrix is:
A = | a11 a12 a13 |
    | a21 a22 a23 |
    | a31 a32 a33 |
T1 = | a11 a12 |
     | a21 a22 |
represents a linear transformation of the image;
T2 = [a13, a23]^T is used to generate the perspective transformation of the image;
T3 = [a31, a32] represents the image translation.
8. The method of claim 4, wherein the cross-entropy loss function is:
L(w, b) = −(1/N) · Σi [ yi · log σ(w·xi + b) + (1 − yi) · log(1 − σ(w·xi + b)) ]
where w is the weight;
xi is the input feature value and yi is the corresponding label;
b is the bias (offset) value;
σ denotes the sigmoid function.
CN202110090669.4A 2021-01-22 2021-01-22 Text and graphic offset angle prediction and correction method thereof Pending CN112784836A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110090669.4A CN112784836A (en) 2021-01-22 2021-01-22 Text and graphic offset angle prediction and correction method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110090669.4A CN112784836A (en) 2021-01-22 2021-01-22 Text and graphic offset angle prediction and correction method thereof

Publications (1)

Publication Number Publication Date
CN112784836A true CN112784836A (en) 2021-05-11

Family

ID=75758695

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110090669.4A Pending CN112784836A (en) 2021-01-22 2021-01-22 Text and graphic offset angle prediction and correction method thereof

Country Status (1)

Country Link
CN (1) CN112784836A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110427939A (en) * 2019-08-02 2019-11-08 泰康保险集团股份有限公司 Method, apparatus, medium and the electronic equipment of correction inclination text image
CN110490198A (en) * 2019-08-12 2019-11-22 上海眼控科技股份有限公司 Text orientation bearing calibration, device, computer equipment and storage medium
CN111325203A (en) * 2020-01-21 2020-06-23 福州大学 American license plate recognition method and system based on image correction
CN112001385A (en) * 2020-08-20 2020-11-27 长安大学 Target cross-domain detection and understanding method, system, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANDREW HOWARD et al.: "Searching for MobileNetV3", 2019 IEEE/CVF International Conference on Computer Vision (ICCV) *
YUNING DU et al.: "PP-OCR: A Practical Ultra Lightweight OCR System", https://arxiv.org/pdf/2009.09941.pdf *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420741A (en) * 2021-08-24 2021-09-21 深圳市中科鼎创科技股份有限公司 Method and system for intelligently detecting file modification

Similar Documents

Publication Publication Date Title
CN111401372B (en) Method for extracting and identifying image-text information of scanned document
CN112329760B (en) Method for recognizing and translating Mongolian in printed form from end to end based on space transformation network
Kadam et al. Detection and localization of multiple image splicing using MobileNet V1
CN107220641B (en) Multi-language text classification method based on deep learning
CN107784288A (en) A kind of iteration positioning formula method for detecting human face based on deep neural network
CN112052899A (en) Single ship target SAR image generation method based on generation countermeasure network
CN112069900A (en) Bill character recognition method and system based on convolutional neural network
CN111553438A (en) Image identification method based on convolutional neural network
CN110807362A (en) Image detection method and device and computer readable storage medium
CN113221956B (en) Target identification method and device based on improved multi-scale depth model
CN113095156B (en) Double-current network signature identification method and device based on inverse gray scale mode
Bawane et al. Object and character recognition using spiking neural network
CN110135446A (en) Method for text detection and computer storage medium
CN112036522A (en) Calligraphy individual character evaluation method, system and terminal based on machine learning
CN111783819A (en) Improved target detection method based on region-of-interest training on small-scale data set
Rizvi et al. Optical character recognition based intelligent database management system for examination process control
CN117079098A (en) Space small target detection method based on position coding
CN115880495A (en) Ship image target detection method and system under complex environment
Kancharla et al. Handwritten signature recognition: a convolutional neural network approach
CN114283431B (en) Text detection method based on differentiable binarization
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
CN116229528A (en) Living body palm vein detection method, device, equipment and storage medium
CN116883933A (en) Security inspection contraband detection method based on multi-scale attention and data enhancement
CN112784836A (en) Text and graphic offset angle prediction and correction method thereof
CN114359917A (en) Handwritten Chinese character detection and recognition and font evaluation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210511