CN110929665A - Natural scene curve text detection method - Google Patents

Natural scene curve text detection method

Info

Publication number
CN110929665A
CN110929665A (application CN201911199614.6A; granted as CN110929665B)
Authority
CN
China
Prior art keywords
text
loss
bounding box
detection method
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911199614.6A
Other languages
Chinese (zh)
Other versions
CN110929665B (en)
Inventor
王敏
蔡鑫鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Original Assignee
Hohai University HHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU filed Critical Hohai University HHU
Priority to CN201911199614.6A priority Critical patent/CN110929665B/en
Publication of CN110929665A publication Critical patent/CN110929665A/en
Application granted granted Critical
Publication of CN110929665B publication Critical patent/CN110929665B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a natural scene curve text detection method, which comprises the following steps: (1) acquiring a plurality of image data sets for training scene curve text detection; (2) performing feature learning on the image data set obtained in step (1) by using a convolutional neural network (CNN), and generating text proposals for an input image by using the dimension-decomposition region proposal network DeRPN; (3) validating and refining the text proposals of step (2) using a refinement network, including text/non-text classification, bounding-box regression, and arbitrary-shape text region representation; (4) carrying out supervised training of the network built in step (3) to obtain a detector model; (5) detecting the picture to be detected by using the detector model of step (4), and outputting the polygonal text areas to obtain the final detection result. The method can localize curved text more tightly and robustly and improves detection performance.

Description

Natural scene curve text detection method
Technical Field
The invention relates to the technical field of image processing, in particular to a natural scene curve text detection method.
Background
Text is the most basic medium for conveying semantic information and is ubiquitous in everyday life: road signs, shop signs, product packaging, restaurant menus, etc. Text in such natural environments is referred to as scene text. Automatically detecting and recognizing scene text is beneficial, with applications in real-time text translation, assistance for the blind, shopping, robotics, smart cars, and education. An end-to-end text recognition system typically comprises two steps: text detection and text recognition. In text detection, text regions are detected and marked with their bounding boxes; in text recognition, text information is retrieved from the detected text regions. Text detection is an essential step toward end-to-end text recognition, without which text cannot be recognized from a scene image. Therefore, scene text detection has attracted much attention in recent years.
Traditional OCR techniques can only process text on printed documents or business cards, whereas scene text detection attempts to detect various text in complex scenes. Scene text detection is a very challenging task due to the complexity of backgrounds and variations in font, size, color, language, lighting conditions, orientation, etc. Before deep learning methods became popular, approaches using manually designed features and traditional classifiers performed poorly. In recent years, however, detection performance has improved greatly thanks to the development of deep learning techniques. Meanwhile, the research focus of scene text detection has shifted from horizontal scene text to multi-oriented scene text, and further to the more challenging curved or arbitrarily shaped scene text.
The main challenges of curved text detection come from irregular shapes and highly variable orientations. Conventional bounding boxes do not generalize well to curved scene text. Because text may appear in various shapes, a traditional quadrilateral bounding box cannot avoid large redundant overlaps, may contain multiple lines of text, and is affected by background noise, so curved scene text is difficult to localize compactly and robustly. Most current object detection methods use the region proposal network (RPN). Although the RPN has proven to be an effective method for generating region proposals, the anchor boxes employed in the RPN are very sensitive, limiting the ability to adapt to different targets: as soon as the anchor boxes deviate significantly from the ground truth in a dataset, performance drops considerably.
Disclosure of Invention
The invention aims to provide a natural scene curve text detection method which can localize curved text tightly and robustly and improve detection performance.
In order to solve the technical problem, the invention provides a natural scene curve text detection method, which comprises the following steps:
(1) acquiring a plurality of image data sets for training scene curve text detection;
(2) performing feature learning on the image data set obtained in the step (1) by using a Convolutional Neural Network (CNN), and generating a text proposal of an input image by using a dimension decomposition area proposal network DeRPN;
(3) validating and refining the text proposal in step (2) using a refining network, including text/non-text classification, bounding box regression, and arbitrary shape text region representation;
(4) carrying out supervision training on the network built in the step (3) to obtain a detector model;
(5) detecting the picture to be detected by using the detector model of step (4), and outputting the polygonal text areas to obtain the final detection result.
Preferably, in step (1), the image dataset is an existing public scene curved-text image dataset, or a curved-text image dataset newly collected from real scenes; the image dataset comprises N training pictures, each training picture contains at least one curved text region, and each picture has an annotation file describing the position information of all text regions in the picture by the vertex coordinates of rectangles or polygons; the annotation file is called a label.
Preferably, in step (2), the features (x) extracted by the convolutional neural network CNN are input to a regression layer and a classification layer. The regression layer, realized by a convolutional layer or a fully-connected layer, is a linear operation that predicts parameterized coordinates (t), and the parameterized coordinates are decoded with respect to the anchors in order to obtain a predicted bounding box (B); the classification layer applies an activation function (e.g., Sigmoid or Softmax, denoted σ) to the predicted values to generate the probability (P_B) of the bounding box. Using VGG16 as the backbone network, the DeRPN is attached to its conv5 layer; through a dimension-decomposition mechanism, the DeRPN introduces anchor strings (a_w, a_h) as independent regression references for object width and height, and simultaneously predicts independent segments (S_w(x,w), S_h(y,h)) and the corresponding probabilities (p_w, p_h) rather than a complete bounding box. The mathematical description of this process is as follows:

(t_w, t_h) = W_r·x + b_r

(S_w(x,w), S_h(y,h)) = decode((t_w, t_h), (a_w, a_h))

(p_w, p_h) = σ(W_c·x + b_c)

where W_r, b_r denote the weight and bias of the regression layer, and W_c, b_c denote the weight and bias of the classification layer; x, y, w, h are the coordinates of the bounding box and x_a, y_a, w_a, h_a are the corresponding anchor-string coordinates; t_w, t_h are the parameterized coordinates of the predicted width and height; S_w(x,w), S_h(y,h) denote the predicted independent width and height segments; a_w, a_h denote the independent regression references for object width and height; and p_w, p_h denote the corresponding width and height probabilities;
since the detection result requires a two-dimensional bounding box, the predicted segments need to be reasonably combined to recover bounding boxes; the combination process is described mathematically as follows:

B(x,y,w,h) = f(S_w(x,w), S_h(y,h))

P_B = g(p_w, p_h)

where f denotes a rule or algorithm for combining predicted segments, g is a function (e.g., arithmetic mean, harmonic mean) that evaluates the probability of a combined bounding box, P_B denotes the probability of the generated bounding box, and B(x,y,w,h) denotes the combined bounding box.
Preferably, in step (3), the geometric properties of the text (the text region, the text center line, and the bounding-box offsets) are utilized to accurately represent the shape of the text bounding box of step (2), wherein the text center line is obtained by shrinking the text bounding box, and the boundary offsets are four channel maps whose values are present only at positions corresponding to positive responses of the center-line feature map; n points are sampled at equal intervals from left to right on the predicted text center line, and at each point a normal perpendicular to the local tangent is drawn and intersected with the upper and lower boundary lines to obtain two boundary points; for each sampled center-line point, four boundary offsets are obtained by calculating the distances from the point to its two associated boundary points; by connecting all boundary points clockwise, a complete text polygon representation can be obtained.
Preferably, in step (4), in constructing a detector model of the natural scene curve text detection method, the following loss function is used to calculate the loss:
L = L_a + λ·L_b

where L, L_a and L_b are the total loss, the first-stage loss and the second-stage loss, respectively, and λ is a weight coefficient balancing the first-stage and second-stage losses.
Preferably, the first stage losses are defined as follows:
L_a = Σ_{j=1}^{N} (1/|R_j|) Σ_{i∈R_j} L_cls(P_i, P_i*) + λ_1·Σ_{j=1}^{N} (1/|G_j|) Σ_{i∈G_j} P_i*·L_reg(t_i, t_i*)

where R_j = {k | S_k = a_j, k = 1, 2, …, M} and G_j = {k | S_k = a_j, k ∈ A}. Here the anchor strings are set to a geometric series {a_n} = (16, 32, 64, 128, 256, 512, 1024); N is the number of items in {a_n}; M is the batch size; S_k denotes the anchor string of the k-th sample; and P_i denotes the predicted probability of the i-th anchor string in the mini-batch. If the anchor string is positive, the ground-truth label P_i* is set to 1; otherwise P_i* is 0. t_i denotes the prediction vector of parameterized coordinates and t_i* is the corresponding ground-truth vector; A is the set of matched anchor strings; R_j denotes the set of indices of anchor strings sharing the same scale, where j indexes the item a_j of {a_n}; similarly, G_j is the set of matched anchor-string indices sharing the same scale. The classification loss L_cls is the cross-entropy loss, the regression loss L_reg is the smooth L1 loss, and λ_1 is the balance parameter between L_cls and L_reg.
Preferably, the second stage losses are defined as follows:
L_b = L_1 + λ_2·L_2 + λ_3·L_3

where L_1, L_2 and L_3 are the text/non-text classification loss, the bounding-box regression loss and the arbitrary-shape representation loss, respectively, and λ_2, λ_3 are balance parameters.
Preferably, the text/non-text classification loss is a binary classification loss, L_1 = L_cls(P, t) = −log P_t, where t is the class label (t = 1 indicates text, t = 0 indicates non-text), and the parameter P = (P_0, P_1) is the confidence of non-text and text after the softmax calculation.
Preferably, the bounding-box regression loss uses the smooth L1 loss:

L_2 = L_reg(v, v*) = Σ_{i∈{x,y,w,h}} smooth_L1(v_i − v_i*)

where v = (v_x, v_y, v_w, v_h) is the target of the bounding-box regression, comprising the center-point coordinates, the width and the height, and v* = (v_x*, v_y*, v_w*, v_h*) is the predicted tuple for each text proposal; v and v* use the parameterization given in Faster R-CNN, which specifies a scale-invariant translation and a log-space height/width shift relative to a target proposal.
Preferably, the arbitrary-shape representation loss is L_3 = μ_1·L_tr + μ_2·L_tcl + μ_3·L_border, where L_tr and L_tcl are Dice-coefficient losses for the text region and the text center line, L_border is calculated with the smooth L1 loss, and μ_1, μ_2, μ_3 are balance parameters.
The invention has the following beneficial effects: the invention uses a new region proposal network, DeRPN, which has strong adaptability; without any hyper-parameter modification, the DeRPN can be used directly for different models, tasks or datasets, and it matches objects with the best regression references, so the network trains more smoothly and yields more accurate region proposals; meanwhile, text of arbitrary shape is represented by the text center line and corresponding offsets, so curved text can be localized tightly and robustly, improving detection performance.
Drawings
FIG. 1 is a schematic flow chart of the method of the present invention.
FIG. 2 is a schematic diagram of a model architecture according to the present invention.
Detailed Description
As shown in fig. 1, a natural scene curve text detection method includes the following steps:
step 1: acquiring a plurality of image data sets for training scene curve text detection;
the image data set is an existing public scene curve text image data set or a curve text image data set in a temporary collected scene, the image data set comprises N training pictures, each training picture has at least one curve text region, and a labeling file which describes position information of all the text regions in the picture by using vertex coordinates of a rectangle or a polygon is provided, and the labeling file is called a label.
Step 2: performing feature learning on the image data set obtained in the step 1 by using a Convolutional Neural Network (CNN), and generating a text proposal of an input image by using a dimension decomposition area proposal network DeRPN;
Features (x) extracted by the convolutional neural network CNN are input to a regression layer and a classification layer. The regression layer, realized by a convolutional layer or a fully-connected layer, is a linear operation that predicts parameterized coordinates (t), which are decoded with respect to the anchors in order to obtain a predicted bounding box (B); the classification layer applies an activation function (e.g., Sigmoid or Softmax, denoted σ) to the predicted values to generate the probability (P_B) of the bounding box. Step 2 uses VGG16 as the backbone network and attaches the DeRPN to its conv5 layer; through a dimension-decomposition mechanism, the DeRPN introduces anchor strings (a_w, a_h) as independent regression references for object width and height, and simultaneously predicts independent segments (S_w(x,w), S_h(y,h)) and the corresponding probabilities (p_w, p_h) rather than a complete bounding box. The mathematical description of this process is as follows:

(t_w, t_h) = W_r·x + b_r

(S_w(x,w), S_h(y,h)) = decode((t_w, t_h), (a_w, a_h))

(p_w, p_h) = σ(W_c·x + b_c)

where W_r, b_r denote the weight and bias of the regression layer, and W_c, b_c denote the weight and bias of the classification layer; x, y, w, h are the coordinates of the bounding box and x_a, y_a, w_a, h_a are the corresponding anchor-string coordinates; t_w, t_h are the parameterized coordinates of the predicted width and height; S_w(x,w), S_h(y,h) denote the predicted independent width and height segments; a_w, a_h denote the independent regression references for object width and height; and p_w, p_h denote the corresponding width and height probabilities.
Since the detection result requires a two-dimensional bounding box, the predicted segments need to be reasonably combined to recover bounding boxes; the combination process is described mathematically as follows:

B(x,y,w,h) = f(S_w(x,w), S_h(y,h))

P_B = g(p_w, p_h)

where f denotes a rule or algorithm for combining predicted segments, g is a function (e.g., arithmetic mean, harmonic mean) that evaluates the probability of a combined bounding box, P_B denotes the probability of the generated bounding box, and B(x,y,w,h) denotes the combined bounding box.
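By way of illustration (not part of the original disclosure), the following Python sketch shows the dimension-decomposition prediction at a single feature-map position; the channel count, random initialization, sigmoid activation and array layout are illustrative assumptions.

```python
import numpy as np

# Minimal sketch of DeRPN prediction at one feature-map position, assuming
# N anchor strings per dimension; weights here are random placeholders.
rng = np.random.default_rng(0)
C, N = 512, 7                                  # conv5 channels; items in {a_n}
x = rng.standard_normal(C)                     # feature vector at one position

W_r = rng.standard_normal((4 * N, C)) * 0.01   # regression: (t_x,t_w) and (t_y,t_h)
b_r = np.zeros(4 * N)
W_c = rng.standard_normal((2 * N, C)) * 0.01   # classification: one score per segment
b_c = np.zeros(2 * N)

t = W_r @ x + b_r                              # parameterized coordinates
p = 1.0 / (1.0 + np.exp(-(W_c @ x + b_c)))     # sigma -> segment probabilities
t_xw, t_yh = t[:2 * N].reshape(N, 2), t[2 * N:].reshape(N, 2)
p_w, p_h = p[:N], p[N:]                        # width / height probabilities
```

Note that width and height segments are scored independently of one another; a full bounding box and its probability P_B = g(p_w, p_h) are only assembled afterwards by the combination rule f.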
The DeRPN reasonably matches objects to anchor strings according to length, rather than according to IoU as in the RPN; the best-matching anchor strings are obtained by the following formula:

M_j = { i | i = argmin_k |ln e_j − ln a_k| } ∪ { i+1 | e_j ∈ T_i }

where M_j denotes the index set of anchor strings matching the j-th object, e_j is an object edge (width or height), N and q denote the number of items and the common ratio of the geometric series {a_n}, respectively, and a_i is the i-th anchor string in {a_n}. The first term in the equation selects the anchor string closest to the edge, and the second term describes a transition interval T_i around the geometric midpoint of adjacent anchor strings; transition intervals are used to reduce ambiguity caused by image noise and ground-truth bias. If e_j falls in the transition interval, both i and i+1 are selected as matching indices.
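A hedged Python sketch of this matching rule follows (not part of the original disclosure); the exact bounds of the transition interval are not recoverable from the text, so it is modelled here as a band of assumed tolerance tau around the geometric midpoint of adjacent anchor strings.

```python
import numpy as np

# Scale matching between an object edge e_j and the anchor strings {a_n}.
# The transition-interval tolerance `tau` is an assumed hyperparameter.
ANCHOR_STRINGS = np.array([16, 32, 64, 128, 256, 512, 1024], dtype=float)

def match_anchor_strings(e_j: float, tau: float = 1.1) -> list:
    logs = np.log(ANCHOR_STRINGS)
    c = int(np.argmin(np.abs(np.log(e_j) - logs)))    # closest string (first term)
    matched = {c}
    # Second term: if e_j lies in the transition interval between adjacent
    # strings a_i and a_{i+1}, both neighbours are matched.
    for i in range(len(ANCHOR_STRINGS) - 1):
        mid = np.sqrt(ANCHOR_STRINGS[i] * ANCHOR_STRINGS[i + 1])
        if mid / tau <= e_j <= mid * tau:
            matched |= {i, i + 1}
    return sorted(matched)
```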
The DeRPN uses a pixel-level combination algorithm that first decodes the predicted width and height segments according to the following four equations:

x = x_a + w_a × t_x

y = y_a + h_a × t_y

w = w_a × exp(t_w)

h = h_a × exp(t_h)

where (x, w) and (y, h) constitute the predicted width and height segments, x_a, y_a, w_a, h_a are the corresponding anchor-string coordinates, and t_x, t_y, t_w, t_h are the predicted parameterized coordinates.
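For clarity, a direct Python transcription of the four decoding equations (an illustrative sketch; the function name decode_segments is an assumption):

```python
import numpy as np

# Direct transcription of the four decoding equations above; all arguments
# may be NumPy arrays of per-pixel values.
def decode_segments(t_x, t_y, t_w, t_h, x_a, y_a, w_a, h_a):
    x = x_a + w_a * t_x        # width-segment centre
    y = y_a + h_a * t_y        # height-segment centre
    w = w_a * np.exp(t_w)      # decoded width
    h = h_a * np.exp(t_h)      # decoded height
    return x, y, w, h
```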
Then, considering the whole set of width segments (denoted W), the width segments are screened by probability and the top N are selected (W_N). For each width segment in W_N, the top k height segments (y^(k), h^(k)) are selected at the corresponding pixel; these width/height segment pairs define a series of specific bounding boxes {(x, y^(k), w, h^(k))}, denoted B_w, whose combined bounding-box probabilities are

P_B = g(p_w, p_h^(k))

Similarly, the above steps are repeated for the height segments to obtain B_h = {(x^(k), y, w^(k), h)}; non-maximum suppression (NMS) with an IoU threshold of 0.7 is then applied to B_w and B_h, and finally the top M bounding boxes after NMS are taken as the text region proposals.
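The following Python sketch (not part of the original disclosure) illustrates the scoring and suppression stage, assuming boxes have already been assembled from width/height segment pairs as described above; corner-format boxes and the geometric mean for g are assumed choices, and box_iou/nms are standard helpers written out for self-containment.

```python
import numpy as np

# Boxes are assumed to be (x1, y1, x2, y2); g is taken as the geometric mean,
# one of the admissible choices for evaluating a combined box's probability.
def box_iou(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    x1 = np.maximum(a[0], b[:, 0]); y1 = np.maximum(a[1], b[:, 1])
    x2 = np.minimum(a[2], b[:, 2]); y2 = np.minimum(a[3], b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes: np.ndarray, scores: np.ndarray, thr: float = 0.7) -> list:
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = int(order[0])
        keep.append(i)
        rest = order[1:]
        order = rest[box_iou(boxes[i], boxes[rest]) < thr]
    return keep

# Combined probability of a width/height segment pair (g = geometric mean):
# P_B = np.sqrt(p_w * p_h)
```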
Step 3: validate and refine the text proposals of step 2 using a refinement network, including text/non-text classification, bounding-box regression, and arbitrary-shape text region representation.

The geometric properties of the text (the text region, the text center line, and the bounding-box offsets) are used to accurately represent the shape of the text bounding box of step 2, wherein the text center line is obtained by shrinking the text bounding box, and the boundary offsets are four channel maps whose values are present only at positions corresponding to positive responses of the center-line feature map. n points are sampled at equal intervals from left to right on the predicted text center line, and at each point a normal perpendicular to the local tangent is drawn and intersected with the upper and lower boundary lines to obtain two boundary points. For each sampled center-line point, four boundary offsets are obtained by calculating the distances from the point to its two associated boundary points. By connecting all boundary points clockwise, a complete text polygon representation can be obtained.
Step 4: supervise the training of the network built in step 3 to obtain a detector model. As shown in fig. 2, annotated training images are input to train the model; the training images may be annotated with quadrangles or rectangles.
Designing a two-stage multitask loss function, and calculating loss by using the designed loss function:
L = L_a + λ·L_b

where L, L_a and L_b are the total loss, the first-stage loss and the second-stage loss, respectively, and λ is a weight coefficient balancing the first-stage and second-stage losses.
L_a = Σ_{j=1}^{N} (1/|R_j|) Σ_{i∈R_j} L_cls(P_i, P_i*) + λ_1·Σ_{j=1}^{N} (1/|G_j|) Σ_{i∈G_j} P_i*·L_reg(t_i, t_i*)

where R_j = {k | S_k = a_j, k = 1, 2, …, M} and G_j = {k | S_k = a_j, k ∈ A}. Here the anchor strings are set to a geometric series {a_n} = (16, 32, 64, 128, 256, 512, 1024); N is the number of items in {a_n}; M is the batch size; S_k denotes the anchor string of the k-th sample; and P_i denotes the predicted probability of the i-th anchor string in the mini-batch. If the anchor string is positive, the ground-truth label P_i* is set to 1; otherwise P_i* is 0. t_i denotes the prediction vector of parameterized coordinates and t_i* is the corresponding ground-truth vector. A is the set of matched anchor strings; R_j denotes the set of indices of anchor strings sharing the same scale, where j indexes the item a_j of {a_n}; similarly, G_j is the set of matched anchor-string indices sharing the same scale. The classification loss L_cls is the cross-entropy loss, the regression loss L_reg is the smooth L1 loss, and λ_1 is the balance parameter between L_cls and L_reg.
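An illustrative Python sketch of this scale-balanced loss follows (not part of the original disclosure); array layouts and the boolean encoding of the matched set A are assumptions.

```python
import numpy as np

# Scale-balanced first-stage loss L_a: cross-entropy terms are averaged within
# each same-scale set R_j, smooth-L1 regression terms within each matched set
# G_j, then summed over the N scales and balanced by lambda_1.
def stage_one_loss(P, P_star, t, t_star, scale_idx, matched, lam1=1.0):
    # P, P_star: (M,) predicted probability / 0-1 ground-truth label
    # t, t_star: (M, 2) parameterized coordinates and their targets
    # scale_idx: (M,) scale index j of each anchor string in {a_n}
    # matched:   (M,) bool mask of the matched set A
    eps, loss = 1e-9, 0.0
    for j in np.unique(scale_idx):
        R = scale_idx == j                          # same-scale indices R_j
        ce = -(P_star[R] * np.log(P[R] + eps)
               + (1 - P_star[R]) * np.log(1 - P[R] + eps))
        loss += ce.mean()
        G = R & matched                             # matched same-scale set G_j
        if G.any():
            d = np.abs(t[G] - t_star[G])
            sl1 = np.where(d < 1.0, 0.5 * d ** 2, d - 0.5)   # smooth L1
            loss += lam1 * sl1.sum(axis=1).mean()
    return loss
```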
L_b = L_1 + λ_2·L_2 + λ_3·L_3

where L_1, L_2 and L_3 are the text/non-text classification loss, the bounding-box regression loss and the arbitrary-shape representation loss, respectively, and λ_2, λ_3 are balance parameters.
L_1 = L_cls(P, t) = −log P_t. The text/non-text classification loss is a binary classification loss, where t is the class label (t = 1 indicates text, t = 0 indicates non-text), and the parameter P = (P_0, P_1) is the confidence of non-text and text after the softmax calculation.
L_2 = L_reg(v, v*) = Σ_{i∈{x,y,w,h}} smooth_L1(v_i − v_i*)

The bounding-box regression loss uses the smooth L1 loss, where v = (v_x, v_y, v_w, v_h) is the target of the bounding-box regression, comprising the center-point coordinates, the width and the height, and v* = (v_x*, v_y*, v_w*, v_h*) is the predicted tuple for each text proposal; v and v* use the parameterization given in Faster R-CNN, which specifies a scale-invariant translation and a log-space height/width shift relative to a target proposal.
L_3 = μ_1·L_tr + μ_2·L_tcl + μ_3·L_border, where L_tr and L_tcl are Dice-coefficient losses for the text region and the text center line, L_border is calculated with the smooth L1 loss, and μ_1, μ_2, μ_3 are balance parameters.
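An illustrative Python sketch of the second-stage shape losses follows (not part of the original disclosure); the Dice-loss form and the reduction of the border term to a mean are standard choices assumed here.

```python
import numpy as np

# Dice-coefficient losses for the text-region and centreline maps plus a
# smooth-L1 loss on the boundary offsets, weighted by mu_1..mu_3.
def dice_loss(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6) -> float:
    inter = float((pred * gt).sum())
    return 1.0 - 2.0 * inter / (float(pred.sum() + gt.sum()) + eps)

def smooth_l1(x: np.ndarray) -> np.ndarray:
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * ax ** 2, ax - 0.5)

def shape_loss(tr_p, tr_g, tcl_p, tcl_g, off_p, off_g, mu=(1.0, 1.0, 1.0)):
    l_border = float(smooth_l1(off_p - off_g).mean())
    return (mu[0] * dice_loss(tr_p, tr_g)
            + mu[1] * dice_loss(tcl_p, tcl_g)
            + mu[2] * l_border)
```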
Step 5: detect the picture to be detected by using the detector model of step 4, and output the polygonal text areas to obtain the final detection result.
Due to the irregular shapes and highly variable orientations of curved text, conventional bounding boxes do not fit curved scene text well, making it difficult to localize such text compactly and robustly. The RPN has proven to be an effective method for generating region proposals, but the anchor boxes it employs are very sensitive, which limits adaptability to different targets, and poorly set anchor boxes degrade performance. The region proposal network and the arbitrary-shape text region representation used in the invention have strong adaptability, improve the performance of natural scene curved-text detection, and localize text more tightly and robustly.

Claims (10)

1. A natural scene curve text detection method is characterized by comprising the following steps:
(1) acquiring a plurality of image data sets for training scene curve text detection;
(2) performing feature learning on the image data set obtained in the step (1) by using a Convolutional Neural Network (CNN), and generating a text proposal of an input image by using a dimension decomposition area proposal network DeRPN;
(3) validating and refining the text proposal in step (2) using a refining network, including text/non-text classification, bounding box regression, and arbitrary shape text region representation;
(4) carrying out supervision training on the network built in the step (3) to obtain a detector model;
(5) detecting the picture to be detected by using the detector model of step (4), and outputting the polygonal text areas to obtain the final detection result.
2. The natural scene curve text detection method according to claim 1, wherein in step (1), the image dataset is an existing public scene curved-text image dataset or a curved-text image dataset newly collected from real scenes; the image dataset comprises N training pictures, each training picture contains at least one curved text region, and each picture has an annotation file describing the position information of all text regions in the picture by the vertex coordinates of rectangles or polygons; the annotation file is called a label.
3. The natural scene curve text detection method of claim 1, wherein in step (2), the features (x) extracted by the convolutional neural network CNN are input to a regression layer and a classification layer; the regression layer, realized by a convolutional layer or a fully-connected layer, is a linear operation that predicts parameterized coordinates (t), which are decoded with respect to the anchors in order to obtain a predicted bounding box (B); the classification layer applies an activation function to the predicted values to generate the probability (P_B) of the bounding box; VGG16 is used as the backbone network and the DeRPN is attached to its conv5 layer; through a dimension-decomposition mechanism, the DeRPN introduces anchor strings (a_w, a_h) as independent regression references for object width and height, and simultaneously predicts independent segments (S_w(x,w), S_h(y,h)) and the corresponding probabilities (p_w, p_h) rather than a complete bounding box; the mathematical description of this process is as follows:

(t_w, t_h) = W_r·x + b_r

(S_w(x,w), S_h(y,h)) = decode((t_w, t_h), (a_w, a_h))

(p_w, p_h) = σ(W_c·x + b_c)

where W_r, b_r denote the weight and bias of the regression layer, and W_c, b_c denote the weight and bias of the classification layer; x, y, w, h are the coordinates of the bounding box and x_a, y_a, w_a, h_a are the corresponding anchor-string coordinates; t_w, t_h are the parameterized coordinates of the predicted width and height; S_w(x,w), S_h(y,h) denote the predicted independent width and height segments; a_w, a_h denote the independent regression references for object width and height; and p_w, p_h denote the corresponding width and height probabilities;
since the detection result requires a two-dimensional bounding box, the predicted segments need to be reasonably combined to recover bounding boxes; the combination process is described mathematically as follows:

B(x,y,w,h) = f(S_w(x,w), S_h(y,h))

P_B = g(p_w, p_h)

where f denotes a rule or algorithm for combining predicted segments, g is a function that evaluates the probability of a combined bounding box, P_B denotes the probability of the generated bounding box, and B(x,y,w,h) denotes the combined bounding box.
4. The natural scene curve text detection method of claim 1, wherein in step (3), the geometric properties of the text (the text region, the text center line, and the bounding-box offsets) are utilized to accurately represent the shape of the text bounding box of step (2), wherein the text center line is obtained by shrinking the text bounding box, and the boundary offsets are four channel maps whose values are present only at positions corresponding to positive responses of the center-line feature map; n points are sampled at equal intervals from left to right on the predicted text center line, and at each point a normal perpendicular to the local tangent is drawn and intersected with the upper and lower boundary lines to obtain two boundary points; for each sampled center-line point, four boundary offsets are obtained by calculating the distances from the point to its two associated boundary points; by connecting all boundary points clockwise, a complete text polygon representation is obtained.
5. The natural scene curve text detection method according to claim 1, wherein in the step (4), in constructing a detector model of the natural scene curve text detection method, the following loss function is used for calculating the loss:
L = L_a + λ·L_b

where L, L_a and L_b are the total loss, the first-stage loss and the second-stage loss, respectively, and λ is a weight coefficient balancing the first-stage and second-stage losses.
6. The natural scene curve text detection method of claim 5, wherein the first-stage loss is defined as follows:
L_a = Σ_{j=1}^{N} (1/|R_j|) Σ_{i∈R_j} L_cls(P_i, P_i*) + λ_1·Σ_{j=1}^{N} (1/|G_j|) Σ_{i∈G_j} P_i*·L_reg(t_i, t_i*)

where R_j = {k | S_k = a_j, k = 1, 2, …, M} and G_j = {k | S_k = a_j, k ∈ A}; the anchor strings are set to a geometric series {a_n} = (16, 32, 64, 128, 256, 512, 1024); N is the number of items in {a_n}; M is the batch size; S_k denotes the anchor string of the k-th sample; P_i denotes the predicted probability of the i-th anchor string in the mini-batch; if the anchor string is positive, the ground-truth label P_i* is set to 1, otherwise P_i* is 0; t_i denotes the prediction vector of parameterized coordinates and t_i* is the corresponding ground-truth vector; A is the set of matched anchor strings; R_j denotes the set of indices of anchor strings sharing the same scale, where j indexes the item a_j of {a_n}; similarly, G_j is the set of matched anchor-string indices sharing the same scale; the classification loss L_cls is the cross-entropy loss, the regression loss L_reg is the smooth L1 loss, and λ_1 is the balance parameter between L_cls and L_reg.
7. The natural scene curve text detection method of claim 5, wherein the second stage loss is defined as follows:
L_b = L_1 + λ_2·L_2 + λ_3·L_3

where L_1, L_2 and L_3 are the text/non-text classification loss, the bounding-box regression loss and the arbitrary-shape representation loss, respectively, and λ_2, λ_3 are balance parameters.
8. The natural scene curve text detection method of claim 7, wherein the text/non-text classification loss is a binary classification loss, L_1 = L_cls(P, t) = −log P_t, where t is the class label (t = 1 indicates text, t = 0 indicates non-text), and the parameter P = (P_0, P_1) is the confidence of non-text and text after the softmax calculation.
9. The natural scene curve text detection method of claim 1, wherein the bounding-box regression loss uses the smooth L1 loss:

L_2 = L_reg(v, v*) = Σ_{i∈{x,y,w,h}} smooth_L1(v_i − v_i*)

where v = (v_x, v_y, v_w, v_h) is the target of the bounding-box regression, comprising the center-point coordinates, the width and the height, and v* = (v_x*, v_y*, v_w*, v_h*) is the predicted tuple for each text proposal; v and v* use the parameterization given in Faster R-CNN, which specifies a scale-invariant translation and a log-space height/width shift relative to a target proposal.
10. The natural scene curve text detection method of claim 1, wherein the arbitrary-shape representation loss is L_3 = μ_1·L_tr + μ_2·L_tcl + μ_3·L_border, where L_tr and L_tcl are Dice-coefficient losses for the text region and the text center line, L_border is calculated with the smooth L1 loss, and μ_1, μ_2, μ_3 are balance parameters.
CN201911199614.6A 2019-11-29 2019-11-29 Natural scene curve text detection method Active CN110929665B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911199614.6A CN110929665B (en) 2019-11-29 2019-11-29 Natural scene curve text detection method

Publications (2)

Publication Number Publication Date
CN110929665A true CN110929665A (en) 2020-03-27
CN110929665B CN110929665B (en) 2022-08-26

Family

ID=69847731

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911199614.6A Active CN110929665B (en) 2019-11-29 2019-11-29 Natural scene curve text detection method

Country Status (1)

Country Link
CN (1) CN110929665B (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190130204A1 (en) * 2017-10-31 2019-05-02 The University Of Florida Research Foundation, Incorporated Apparatus and method for detecting scene text in an image
CN109919025A (en) * 2019-01-30 2019-06-21 华南理工大学 Video scene Method for text detection, system, equipment and medium based on deep learning

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3905112A1 (en) * 2020-04-28 2021-11-03 Beijing Baidu Netcom Science And Technology Co., Ltd. Method and apparatus for recognizing text content and electronic device
US11810384B2 (en) 2020-04-28 2023-11-07 Beijing Baidu Netcom Science Technology Co., Ltd. Method and apparatus for recognizing text content and electronic device
CN111753812A (en) * 2020-07-30 2020-10-09 上海眼控科技股份有限公司 Text recognition method and equipment
CN112070082A (en) * 2020-08-24 2020-12-11 西安理工大学 Curve character positioning method based on instance perception component merging network
CN112070082B (en) * 2020-08-24 2023-04-07 西安理工大学 Curve character positioning method based on instance perception component merging network
CN112183322A (en) * 2020-09-27 2021-01-05 成都数之联科技有限公司 Text detection and correction method for any shape
CN112183322B (en) * 2020-09-27 2022-07-19 成都数之联科技股份有限公司 Text detection and correction method for any shape
CN112464798A (en) * 2020-11-24 2021-03-09 创新奇智(合肥)科技有限公司 Text recognition method and device, electronic equipment and storage medium
CN113807336A (en) * 2021-08-09 2021-12-17 华南理工大学 Semi-automatic labeling method, system, computer equipment and medium for image text detection
CN113807336B (en) * 2021-08-09 2023-06-30 华南理工大学 Semi-automatic labeling method, system, computer equipment and medium for image text detection
CN115131797A (en) * 2022-06-28 2022-09-30 北京邮电大学 Scene text detection method based on feature enhancement pyramid network
CN115131797B (en) * 2022-06-28 2023-06-09 北京邮电大学 Scene text detection method based on feature enhancement pyramid network

Also Published As

Publication number Publication date
CN110929665B (en) 2022-08-26

Similar Documents

Publication Publication Date Title
CN110929665B (en) Natural scene curve text detection method
CN109299274B (en) Natural scene text detection method based on full convolution neural network
Ghaderizadeh et al. Hyperspectral image classification using a hybrid 3D-2D convolutional neural networks
CN108549893B (en) End-to-end identification method for scene text with any shape
CN112232149B (en) Document multimode information and relation extraction method and system
CN109886121B (en) Human face key point positioning method for shielding robustness
Bouti et al. A robust system for road sign detection and classification using LeNet architecture based on convolutional neural network
CN109886066A (en) Fast target detection method based on the fusion of multiple dimensioned and multilayer feature
Farag Recognition of traffic signs by convolutional neural nets for self-driving vehicles
CN112183545A (en) Method for recognizing natural scene text in any shape
Hossain et al. Recognition and solution for handwritten equation using convolutional neural network
Sharma et al. Deep eigen space based ASL recognition system
Sohal Improvement of artificial neural network based character recognition system, using SciLab
Yan Computational Methods for Deep Learning: Theory, Algorithms, and Implementations
He et al. Classification of metro facilities with deep neural networks
Li A deep learning-based text detection and recognition approach for natural scenes
Wu CNN-Based Recognition of Handwritten Digits in MNIST Database
Parashivamurthy et al. Recognition of Kannada character scripts using hybrid feature extraction and ensemble learning approaches
Varlik et al. Filtering airborne LIDAR data by using fully convolutional networks
Shi et al. Fuzzy support tensor product adaptive image classification for the internet of things
Li Special character recognition using deep learning
CN114708591A (en) Document image Chinese character detection method based on single character connection
CN114332473A (en) Object detection method, object detection device, computer equipment, storage medium and program product
Suvetha et al. Automatic Traffic Sign Detection System With Voice Assistant
Kurama et al. Detection of natural features and objects in satellite images by semantic segmentation using neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant