CN106897732A - Method for multi-oriented text detection in natural images based on connected text segments
- Publication number
- CN106897732A (application number CN201710010596.7A)
- Authority
- CN
- China
- Prior art keywords
- bounding box
- word section
- word
- connection
- section
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G06V10/225—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses a method for multi-oriented text detection in natural images based on connected text segments. Text segments and connections are the two key elements of the method, defined as follows: a text segment is one of many individual oriented bounding-box regions marked on the image, each covering part of a word or text line; a connection joins two adjacent segments, indicating that they belong to the same word or phrase. Segments and connections are detected jointly, densely and at multiple scales, by a fully convolutional neural network trained end to end. The final detections are obtained by first joining connected segments into new regions and then combining these regions. Compared with the prior art, the proposed method performs well in accuracy, speed and model simplicity; it is efficient and robust, copes with complex image backgrounds, and can also detect long lines of non-Latin text.
Description
Technical field
The invention belongs to the technical field of computer vision and, more particularly, relates to a method for multi-oriented text detection in natural images based on connected text segments.
Background
Reading text in natural images is a challenging and popular task with many practical applications in photo OCR, geo-location and image retrieval. In a text reading system, text detection means localizing text regions with bounding boxes at the word or text-line level, and it usually serves as the critical first step. In a sense, text detection can be regarded as a special case of object detection in which words, characters or text lines are the detection targets.
Although existing techniques have achieved considerable success by applying object detection methods to text detection, these methods still have clear shortcomings in localizing text regions. First, words and text lines usually have much larger aspect ratios than general objects, and earlier methods have difficulty producing bounding boxes with such ratios. Second, some non-Latin scripts, such as Chinese, place no spaces between adjacent words. Existing techniques can only detect words and do not apply to such text, because text without spaces offers no visual cue for separating words. Third, text in natural scene images may have any orientation, whereas most existing techniques detect only horizontal text. Text detection in natural scene images therefore remains one of the difficult problems in computer vision.
Summary of the invention
An object of the invention is to provide a method for multi-oriented text detection in natural images based on connected text segments. The method detects text with high accuracy and speed, uses a simple model, is robust, copes with complex image backgrounds, and can also detect long lines of non-Latin text.
To achieve this object, the invention approaches scene text detection from a new perspective and provides a method for multi-oriented text detection in natural images based on connected text segments, comprising the following steps:
(1) Train the text segment and connection detection network model, including the following sub-steps:
(1.1) Annotate the text in every image of the training image set at the text-line level, the label being the coordinates of the four corner points of the rectangular bounding box of each text line, to obtain the training data set;
(1.2) Define a text segment detection model that, given text-line annotations, predicts text segments and connections. The network model consists of cascaded convolutional neural network layers and convolutional predictors. Compute the segment and connection labels from the training data set, design the loss function, and train the network by backpropagation combined with online data augmentation and online hard negative mining to obtain the text segment detection model, including the following sub-steps:
(1.2.1) Build the convolutional neural network for text segment detection. The first convolutional units, which extract features, come from a pretrained VGG-16 network: they run from convolutional layer 1 through pooling layer 5, and fully connected layers 6 and 7 are converted into convolutional layers 6 and 7. These are followed by extra convolutional layers that extract deeper features for detection, namely convolutional layers 8, 9 and 10, with convolutional layer 11 as the last layer. The last 6 convolutional layers output feature maps of different sizes, which makes it possible to extract high-quality features at multiple scales; segments and connections are detected on these six feature maps. After each of these 6 convolutional layers, a filter of size 3 × 3 is added as a convolutional predictor that detects segments and connections jointly;
(1.2.2) Generate text segment bounding-box labels from the annotated word bounding boxes. For the original training image set $Itr$, denote the scaled training image set by $Itr'$, where $w_I$ and $h_I$ are the width and height of the images in $Itr'$ (384 × 384 or 512 × 512 pixels may be used). The $i$-th image $Itr_i'$ is the model input, and all word bounding boxes annotated on $Itr_i'$ are denoted $W_i = [W_{i1}, \ldots, W_{ip}]$, where $W_{ij}$ is the $j$-th word bounding box on the $i$-th image (at either the word level or the text-line level), $j = 1, \ldots, p$, and $p$ is the total number of word bounding boxes on the $i$-th image. Denote the set of feature maps output by the last 6 convolutional layers by $Itro_i' = [Itro_{i1}', \ldots, Itro_{i6}']$, where $Itro_{il}'$ is the feature map output by the $l$-th of these layers and $w_l$, $h_l$ are its width and height. A coordinate $(x, y)$ on $Itro_{il}'$ corresponds to a horizontal initial bounding box $B_{ilq}$ on $Itr_i'$ centered at $(x_a, y_a)$, where:
$$x_a = \frac{w_I}{w_l}(x + 0.5), \qquad y_a = \frac{h_I}{h_l}(y + 0.5)$$
The width and height of the initial bounding box $B_{ilq}$ are both set to a constant $a_l$, which controls the scale of the output segments, $l = 1, \ldots, 6$. Denote the set of initial bounding boxes corresponding to the layer-$l$ feature map $Itro_{il}'$ by $B_{il} = [B_{il1}, \ldots, B_{ilm}]$, $q = 1, \ldots, m$, where $m$ is the number of initial bounding boxes on the layer-$l$ feature map. If the center of $B_{ilq}$ lies inside any annotated word bounding box $W_{ij}$ on $Itr'$, and the size $a_l$ of $B_{ilq}$ and the height $h$ of that word bounding box satisfy
$$\max\left(\frac{a_l}{h}, \frac{h}{a_l}\right) \le 1.5,$$
then $B_{ilq}$ is labeled positive (label value 1) and matched to the word bounding box $W_{ij}$ whose height is closest to its size. Otherwise, when $B_{ilq}$ fails these two conditions for every word bounding box in $W_i$, it is labeled negative (label value 0). Text segments are generated on the initial bounding boxes and take the label class of their initial bounding box. The ratio constant 1.5 is an empirical value;
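The labeling rule of step (1.2.2) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the center mapping with the `+ 0.5` cell offset and the ratio test `max(a_l/h, h/a_l) <= 1.5` are assumptions about the equations that did not survive in the text, and word boxes are simplified to axis-aligned `(x0, y0, x1, y1)` rectangles.

```python
def initial_box_center(x, y, w_l, h_l, w_I, h_I):
    """Map feature-map cell (x, y) to an image-space center (x_a, y_a).

    Assumes each cell maps to the center of its receptive stride.
    """
    return (w_I / w_l) * (x + 0.5), (h_I / h_l) * (y + 0.5)


def label_initial_box(center, a_l, words, max_ratio=1.5):
    """Return (label, matched_word_index) for one initial bounding box.

    words: axis-aligned word boxes (x0, y0, x1, y1); a simplification,
    since the patent allows oriented word boxes.
    """
    cx, cy = center
    best = None  # (word index, size ratio) of the closest-sized match
    for j, (x0, y0, x1, y1) in enumerate(words):
        h = y1 - y0
        ratio = max(a_l / h, h / a_l)
        inside = x0 <= cx <= x1 and y0 <= cy <= y1
        if inside and ratio <= max_ratio:
            if best is None or ratio < best[1]:
                best = (j, ratio)
    if best is not None:
        return 1, best[0]   # positive, matched to the closest-sized word
    return 0, None          # negative


center = initial_box_center(0, 0, w_l=64, h_l=64, w_I=512, h_I=512)
print(center, label_initial_box(center, a_l=12, words=[(0, 0, 100, 10)]))
```

With a 512 × 512 input and a 64 × 64 feature map, cell (0, 0) maps to the image point (4.0, 4.0); a box of size 12 is positive against a word of height 10, while a box of size 32 would be negative because the size ratio exceeds 1.5.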
(1.2.3) Generate text segments on the labeled initial bounding boxes produced in step (1.2.2) and compute the offsets of the positive segments. A negative segment bounding box $s^-$ is simply the negative initial bounding box $B^-$. A positive segment bounding box $s^+$ is obtained from the positive initial bounding box $B^+$ by the following steps: a) let $\theta_s$ be the angle between the horizontal direction and the annotated word bounding box $W$ matched to $B^+$, and rotate $W$ clockwise by $\theta_s$ about the center of $B^+$; b) crop $W$, removing the parts that extend beyond the left and right sides of $B^+$; c) rotate the cropped word bounding box $W'$ counterclockwise by $\theta_s$ about the center of $B^+$, which gives the ground-truth geometric parameters $x_s, y_s, w_s, h_s, \theta_s$ of segment $s^+$; d) compute the offsets $(\Delta x_s, \Delta y_s, \Delta w_s, \Delta h_s, \Delta\theta_s)$ of $s^+$ relative to $B^+$ from:
$$x_s = a_l \Delta x_s + x_a$$
$$y_s = a_l \Delta y_s + y_a$$
$$w_s = a_l \exp(\Delta w_s)$$
$$h_s = a_l \exp(\Delta h_s)$$
$$\theta_s = \Delta\theta_s$$
where $x_s, y_s, w_s, h_s, \theta_s$ are the center abscissa, center ordinate, width, height and angle to the horizontal of the segment bounding box $s^+$; $x_a, y_a, w_a, h_a$ are the center abscissa, center ordinate, width and height of the horizontal initial bounding box $B^+$; and $\Delta x_s, \Delta y_s, \Delta w_s, \Delta h_s, \Delta\theta_s$ are, respectively, the offset of the center abscissa $x_s$ relative to $B^+$, the offset of the center ordinate $y_s$ relative to $B^+$, the offset of the width $w_s$, the offset of the height $h_s$ and the offset of the angle $\theta_s$;
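The offset formulas of step (1.2.3) can be inverted to compute training targets and applied forward to decode predictions. The sketch below is illustrative only; the function names `encode_offsets` and `decode_offsets` are hypothetical and not taken from the patent.

```python
import math


def encode_offsets(segment, box_center, a_l):
    """Ground-truth offsets of a segment relative to its initial box.

    segment: (x_s, y_s, w_s, h_s, theta_s); box_center: (x_a, y_a).
    Inverts x_s = a_l*dx + x_a, w_s = a_l*exp(dw), theta_s = dtheta.
    """
    x_s, y_s, w_s, h_s, theta_s = segment
    x_a, y_a = box_center
    return ((x_s - x_a) / a_l,
            (y_s - y_a) / a_l,
            math.log(w_s / a_l),
            math.log(h_s / a_l),
            theta_s)


def decode_offsets(offsets, box_center, a_l):
    """Recover segment geometry from offsets (the formulas as stated)."""
    dx, dy, dw, dh, dtheta = offsets
    x_a, y_a = box_center
    return (a_l * dx + x_a,
            a_l * dy + y_a,
            a_l * math.exp(dw),
            a_l * math.exp(dh),
            dtheta)


seg = (110.0, 52.0, 16.0, 10.0, 0.3)
off = encode_offsets(seg, (100.0, 50.0), a_l=8.0)
print(off)
print(decode_offsets(off, (100.0, 50.0), a_l=8.0))
```

Encoding followed by decoding returns the original segment geometry up to floating-point error, which is a quick way to check that the two directions agree.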
(1.2.4) Compute connection labels for the segment bounding boxes produced in step (1.2.3). Because each segment $s$ is generated on an initial bounding box $B$, the connection label between two segments is the same as the connection label between their corresponding initial bounding boxes. For the feature map set $Itro_i' = [Itro_{i1}', \ldots, Itro_{i6}']$: if two initial bounding boxes in the set $B_{il}$ of the same feature map $Itro_{il}'$ are both labeled positive and are matched to the same word, the in-layer connection between them is labeled positive; otherwise it is labeled negative. If an initial bounding box in the set $B_{il}$ of feature map $Itro_{il}'$ and an initial bounding box in the set $B_{i(l-1)}$ of feature map $Itro_{i(l-1)}'$ are both labeled positive and are matched to the same word bounding box $W_{ij}$, the cross-layer connection between them is labeled positive; otherwise it is labeled negative;
(1.2.5) Take the scaled training image set $Itr'$ as the input of the text segment detection model and predict the segment output $s$. Initialize the model weights and biases; the learning rate is set to $10^{-3}$ for the first 60,000 training iterations and decays to $10^{-4}$ afterwards. For each of the last 6 convolutional layers, at coordinate $(x, y)$ on the layer-$l$ feature map $Itro_{il}'$, which corresponds to the initial bounding box $B_{ilq}$ of size $a_l$ centered at $(x_a, y_a)$ on the input image $Itr_i'$, the 3 × 3 convolutional predictor predicts the scores $c_s$ with which $B_{ilq}$ is classified as positive or negative; $c_s$ is a two-dimensional vector whose values are decimals between 0 and 1. It also predicts 5 numbers $(\Delta\hat{x}_s, \Delta\hat{y}_s, \Delta\hat{w}_s, \Delta\hat{h}_s, \Delta\hat{\theta}_s)$ as the geometric offsets when the box is classified as a positive segment $s^+$. These are, respectively, the predicted offset of the center abscissa of the segment bounding box $s^+$ relative to the positive initial bounding box $B^+$, the offset of the center ordinate relative to $B^+$, the offset of the width, the offset of the height, and the offset of the angle;
(1.2.6) On the basis of the predicted segments, predict the in-layer and cross-layer connection outputs. For in-layer connections, at coordinate $(x, y)$ on a feature map $Itro_{il}'$, take the neighboring points $(x', y')$ in the range $x-1 \le x' \le x+1$, $y-1 \le y' \le y+1$. Mapped to the input image $Itr_i'$, these 8 points give the in-layer neighbor segments $s^{(x',y',l)}$ connected to the reference segment $s^{(x,y,l)}$ at $(x, y)$. The 8 in-layer neighbor segments can be written as the set:
$$N^{w}_{s^{(x,y,l)}} = \{ s^{(x',y',l)} \}_{x-1 \le x' \le x+1,\; y-1 \le y' \le y+1} \setminus \{ s^{(x,y,l)} \}$$
The 3 × 3 convolutional predictor predicts the positive and negative scores $c_{l1}$ of the connections between $s^{(x,y,l)}$ and its in-layer neighbor set $N^{w}_{s^{(x,y,l)}}$; $c_{l1}$ is a 16-dimensional vector. The superscript $w$ denotes in-layer connections.
A cross-layer connection joins the segments at two corresponding points on the feature maps output by two consecutive convolutional layers. Because each of these convolutional layers halves the width and height of the feature map, the width $w_l$ and height $h_l$ of the layer-$l$ feature map $Itro_{il}'$ are half the width $w_{l-1}$ and height $h_{l-1}$ of the layer-$(l-1)$ feature map $Itro_{i(l-1)}'$, and the initial bounding box scale $a_l$ of $Itro_{il}'$ is twice the scale $a_{l-1}$ of $Itro_{i(l-1)}'$. For a point $(x, y)$ on $Itro_{il}'$, take the 4 cross-layer neighboring points $(x', y')$ on $Itro_{i(l-1)}'$ in the range $2x \le x' \le 2x+1$, $2y \le y' \le 2y+1$. The initial bounding box on the input image $Itr_i'$ corresponding to $(x, y)$ on $Itro_{il}'$ coincides spatially with the 4 initial bounding boxes corresponding to the 4 cross-layer neighboring points on $Itro_{i(l-1)}'$. The 4 cross-layer neighbor segments $s^{(x',y',l-1)}$ can be written as the set:
$$N^{c}_{s^{(x,y,l)}} = \{ s^{(x',y',l-1)} \}_{2x \le x' \le 2x+1,\; 2y \le y' \le 2y+1}$$
The 3 × 3 convolutional predictor predicts the positive and negative scores $c_{l2}$ of the cross-layer connections between the layer-$l$ reference segment $s^{(x,y,l)}$ and its neighbor segment set $N^{c}_{s^{(x,y,l)}}$ on layer $l-1$; $c_{l2}$ is an 8-dimensional vector holding a positive and a negative score for the connection to each of the 4 neighbor segments. The superscript $c$ denotes cross-layer connections.
All in-layer connections $N^{w}$ and all cross-layer connections $N^{c}$ together form the connection set $N_s$;
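The two neighbor ranges of step (1.2.6) translate directly into index arithmetic. The sketch below enumerates them; the function names are hypothetical, and clipping in-layer neighbors at the feature-map border is an assumption, since the text does not say how border cells are handled.

```python
def in_layer_neighbors(x, y, w_l, h_l):
    """Up to 8 neighbors (x', y') of cell (x, y) on the same feature map.

    Range: x-1 <= x' <= x+1, y-1 <= y' <= y+1, excluding (x, y) itself.
    Border clipping is an assumption of this sketch.
    """
    return [(xn, yn)
            for yn in range(y - 1, y + 2)
            for xn in range(x - 1, x + 2)
            if (xn, yn) != (x, y) and 0 <= xn < w_l and 0 <= yn < h_l]


def cross_layer_neighbors(x, y):
    """The 4 cells on layer l-1 that cell (x, y) on layer l covers.

    Range: 2x <= x' <= 2x+1, 2y <= y' <= 2y+1, valid because layer l-1
    has twice the width and height of layer l.
    """
    return [(xn, yn) for yn in (2 * y, 2 * y + 1) for xn in (2 * x, 2 * x + 1)]


print(len(in_layer_neighbors(1, 1, 3, 3)))   # interior cell
print(cross_layer_neighbors(1, 2))
```

An interior cell has 8 in-layer neighbors, matching the 16-dimensional score vector (2 scores per connection), and every cell has 4 cross-layer neighbors, matching the 8-dimensional one.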
(1.2.7) Take the segment labels, the ground-truth offsets of the positive segments and the connection labels obtained in steps (1.2.3) and (1.2.4) as the reference outputs, and take the segment classes and scores and the segment offsets predicted in step (1.2.5), together with the connection scores predicted in step (1.2.6), as the predicted outputs. Design a target loss function between the predicted outputs and the reference outputs, and train the text segment and connection detection model by backpropagation to minimize the losses of segment classification, segment offset regression and connection classification. The target loss function of the model is a weighted sum of these three losses:
$$L(y_s, c_s, y_l, c_l, s, \hat{s}) = \frac{1}{n_s} L_{conf}(y_s, c_s) + \lambda_1 \frac{1}{n_s} L_{loc}(s, \hat{s}) + \lambda_2 \frac{1}{n_l} L_{conf}(y_l, c_l)$$
where $y_s$ denotes the labels of all segments, $c_s$ the predicted segment scores, $y_l$ the connection labels, and $c_l$ the predicted connection scores, composed of the in-layer scores $c_{l1}$ and the cross-layer scores $c_{l2}$. If the $i$-th initial bounding box is labeled positive, then $y_s(i) = 1$; otherwise it is 0. $L_{conf}(y_s, c_s)$ is the softmax loss of the predicted segment scores $c_s$, $L_{conf}(y_l, c_l)$ is the softmax loss of the predicted connection scores $c_l$, and $L_{loc}(s, \hat{s})$ is the Smooth L1 regression loss between the predicted segment geometric parameters $s$ and the ground truth $\hat{s}$. $n_s$ is the number of positive initial bounding boxes and normalizes the segment classification and regression losses; $n_l$ is the total number of positive connections and normalizes the connection classification loss. $\lambda_1$ and $\lambda_2$ are weight constants, set to 1 in practice;
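The three-term loss of step (1.2.7) can be sketched in plain Python to make the normalization explicit. This is a hedged illustration: the function names are hypothetical, the exact form of the weighted sum is an assumption reconstructed from the symbol definitions, scores are taken as raw (negative, positive) pairs fed to a two-way softmax, and the `max(1, ...)` guard against empty positive sets is an addition of this sketch.

```python
import math


def softmax_ce(scores, label):
    """Cross-entropy of a softmax over class scores for one sample."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_z - scores[label]


def smooth_l1(pred, target):
    """Smooth L1 loss summed over the geometric offsets of one segment."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def detection_loss(seg_scores, seg_labels, seg_pred, seg_true,
                   link_scores, link_labels, lam1=1.0, lam2=1.0):
    """Weighted sum of segment classification, regression and link loss."""
    n_s = max(1, sum(seg_labels))    # number of positive initial boxes
    n_l = max(1, sum(link_labels))   # number of positive connections
    l_seg = sum(softmax_ce(c, y) for c, y in zip(seg_scores, seg_labels))
    # Regression is only defined for positive segments.
    l_loc = sum(smooth_l1(p, t)
                for p, t, y in zip(seg_pred, seg_true, seg_labels) if y == 1)
    l_link = sum(softmax_ce(c, y) for c, y in zip(link_scores, link_labels))
    return l_seg / n_s + lam1 * l_loc / n_s + lam2 * l_link / n_l


loss = detection_loss([(0.0, 0.0)], [1], [(0.0,) * 5], [(0.0,) * 5],
                      [(0.0, 0.0)], [1])
print(loss)
```

With uninformative scores and a perfect regression, both classification terms equal ln 2 and the regression term vanishes, so the total is 2 ln 2.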
(1.2.8) During the training of step (1.2.7), augment the training data $Itr$ online with an online data augmentation method, and balance positive and negative samples with an online hard negative mining strategy. Before the training images $Itr$ are scaled to a common size and loaded in batches, they are randomly cropped into image patches, each of which has a minimum Jaccard overlap $o$ with the ground-truth segment bounding boxes. For multi-oriented text, augmentation is carried out on the minimum enclosing rectangle of each oriented word bounding box. The overlap $o$ of each sample is chosen at random from 0, 0.1, 0.3, 0.5, 0.7 and 0.9, and the patch size is between 0.1 and 1 times the original image size. Training images are not flipped horizontally. In addition, negative segment and connection samples make up most of the training samples, so the online hard negative mining strategy is used to balance positives and negatives: segments and connections are mined separately, and the ratio of negative to positive samples is kept at no more than 3:1.
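One common way to enforce the 3:1 cap described in step (1.2.8) is to keep all positives and only the negatives with the highest current loss. The sketch below follows that convention; the function name is hypothetical, and ranking negatives by per-sample loss is an assumption, since the text names the strategy without detailing it.

```python
def select_hard_negatives(losses, labels, max_ratio=3):
    """Pick training samples with at most max_ratio negatives per positive.

    losses: per-sample classification loss; labels: 1 positive, 0 negative.
    Returns (positive indices, kept negative indices), hardest first.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep = max_ratio * len(pos)
    hardest = sorted(neg, key=lambda i: losses[i], reverse=True)
    return pos, hardest[:keep]


labels = [1, 0, 0, 0, 0, 0]
losses = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7]
print(select_hard_negatives(losses, labels))
```

Following the text, this selection would be run once for segments and once for connections, so that each sample type is balanced on its own.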
(2) Use the trained convolutional neural network to detect text segments and connections in the text image to be detected, including the following sub-steps:
(2.1) Detect text segments in the text image to be detected. Feature maps output by different convolutional layers predict segments at different scales, and the feature map output by one convolutional layer predicts segments at one scale. Scale the $i$-th text image $Itst_i$ of the image set $Itst$ to be detected to a uniform size, which may be set manually according to the images to be detected, and denote the scaled image by $Itst_i'$. Input $Itst_i'$ into the text segment and connection detection model trained in step (1.2) to obtain the set of feature maps output by the last 6 convolutional layers, $Itsto_i' = [Itsto_{i1}', \ldots, Itsto_{i6}']$, where $Itsto_{il}'$ is the feature map output by the $l$-th of these layers, $l = 1, \ldots, 6$. At every coordinate $(x, y)$ on each output feature map $Itsto_{il}'$, the 3 × 3 convolutional predictor predicts the scores $c_s$ with which the initial bounding box $B_{ilq}$ corresponding to $(x, y)$ is classified as a positive or negative segment, and also predicts 5 numbers $(\Delta\hat{x}_s, \Delta\hat{y}_s, \Delta\hat{w}_s, \Delta\hat{h}_s, \Delta\hat{\theta}_s)$ as the geometric offsets when the box is predicted to be a positive segment $s^+$;
(2.2) Detect connections between the segments detected on all feature layers of the text image to be detected, the connections including in-layer connections and cross-layer connections. In-layer and cross-layer connections are predicted on the basis of the segments predicted in (2.1). For in-layer connections, at coordinate $(x, y)$ on a feature map $Itsto_{il}'$, the 3 × 3 convolutional predictor predicts the positive and negative scores $c_{l1}$ of the in-layer connections between $s^{(x,y,l)}$ and its 8 neighbor segments $N^{w}_{s^{(x,y,l)}}$. For cross-layer connections, the 3 × 3 convolutional predictor predicts the positive and negative scores $c_{l2}$ of the cross-layer connections between the layer-$l$ reference segment $s^{(x,y,l)}$ and its 4 neighbor segments $N^{c}_{s^{(x,y,l)}}$ on layer $l-1$. $c_{l1}$ and $c_{l2}$ together form the predicted connection scores $c_l$;
(2.3) Combine the detected segment confidence scores and connection confidence scores, where the segment confidence scores include the positive and negative class scores and the offset scores of the segments, and the convolutional predictor outputs softmax-normalized scores. In-layer and cross-layer connections are predicted on the basis of the segments predicted in (2.1). For in-layer connections, at coordinate $(x, y)$ on a feature map $Itsto_{il}'$, the 3 × 3 convolutional predictor predicts the positive and negative scores $c_{l1}$ of the in-layer connections between $s^{(x,y,l)}$ and its 8 neighbor segments $N^{w}_{s^{(x,y,l)}}$. For cross-layer connections, the 3 × 3 convolutional predictor predicts the positive and negative scores $c_{l2}$ of the cross-layer connections between the layer-$l$ reference segment $s^{(x,y,l)}$ and its 4 neighbor segments $N^{c}_{s^{(x,y,l)}}$ on layer $l-1$. $c_{l1}$ and $c_{l2}$ together form the predicted connection scores $c_l$.
(3) Combine the text segments and connections to obtain the output bounding boxes, including the following sub-steps:
(3.1) According to the normalized scores obtained in (2.3), filter the segments and connections output by the convolutional predictors, then build a connection graph with the remaining segments as nodes and the remaining connections as edges. The fixed number of segments $s$ and connections $N_s$ that the model produces for the image input in step (2) are filtered by their scores, with separate thresholds $\alpha$ for segments and $\beta$ for connections. The thresholds may be set manually to different values for different data. In practice, $\alpha = 0.9$, $\beta = 0.7$ may be used for multi-oriented text detection; $\alpha = 0.9$, $\beta = 0.5$ for multilingual long-text detection; and $\alpha = 0.6$, $\beta = 0.3$ for horizontal text detection. The filtered segments $s'$ serve as nodes and the filtered connections $N_s'$ as edges of a graph;
(3.2) Run a depth-first search on the graph to find its connected components. Each component is denoted as a set $S$ and contains the segments joined together by connections;
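Steps (3.1) and (3.2) amount to thresholding followed by connected-component search. The following is a minimal sketch with hypothetical names; treating a score equal to the threshold as passing is an assumption, and connections are represented as a dictionary from index pairs to scores purely for illustration.

```python
def group_segments(seg_scores, link_scores, alpha, beta):
    """Filter segments and connections, then find connected components.

    seg_scores: positive-class score per segment (index = segment id).
    link_scores: {(i, j): positive-class score} for candidate connections.
    Returns a list of components, each a sorted list of segment ids.
    """
    nodes = {i for i, s in enumerate(seg_scores) if s >= alpha}
    adj = {i: [] for i in nodes}
    for (i, j), s in link_scores.items():
        if s >= beta and i in nodes and j in nodes:
            adj[i].append(j)
            adj[j].append(i)

    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        # Iterative depth-first search from an unvisited node.
        stack, comp = [start], []
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(sorted(comp))
    return components


scores = [0.95, 0.92, 0.50, 0.99, 0.91]
links = {(0, 1): 0.8, (1, 2): 0.9, (3, 4): 0.75, (1, 3): 0.2}
print(group_segments(scores, links, alpha=0.9, beta=0.7))
```

In this example segment 2 is dropped by the segment threshold, which also removes its connection to segment 1, and the weak connection (1, 3) is dropped by the connection threshold, leaving two components.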
(3.3) Combine each segment set $S$ obtained by the depth-first search on the graph in step (3.2) into one complete word by the following steps:
(3.3.1) Input: $S = \{s^{(i)}\}_{i=1}^{|S|}$, where $|S|$ is the number of segments in the set $S$, $s^{(i)} = (x_s^{(i)}, y_s^{(i)}, w_s^{(i)}, h_s^{(i)}, \theta_s^{(i)})$ is the $i$-th segment, $i$ is an index, $x_s^{(i)}$ and $y_s^{(i)}$ are the center abscissa and ordinate of the $i$-th segment bounding box $s^{(i)}$, $w_s^{(i)}$ and $h_s^{(i)}$ are its width and height, and $\theta_s^{(i)}$ is the angle between $s^{(i)}$ and the horizontal direction;
(3.3.2) $\theta_b := \frac{1}{|S|} \sum_{i=1}^{|S|} \theta_s^{(i)}$, where $\theta_b$ is the angle of the output bounding box and $\theta_s^{(i)}$ is the angle of the $i$-th segment bounding box in the set; that is, $\theta_b$ is the mean angle of all segments in $S$;
(3.3.3) Find the intercept $b$ of the line $y = \tan(\theta_b)\,x + b$ that minimizes the sum of the distances from the line to the center points $(x_s^{(i)}, y_s^{(i)})$ of all segments in $S$;
(3.3.4) Find the two end points $(x_p, y_p)$ and $(x_q, y_q)$ of the line, where $p$ denotes the first end point and $q$ the second, $x_p$, $y_p$ are the abscissa and ordinate of the first end point, and $x_q$, $y_q$ those of the second;
(3.3.5) $x_b := \frac{1}{2}(x_p + x_q)$, $y_b := \frac{1}{2}(y_p + y_q)$, where $b$ denotes the output bounding box and $x_b$, $y_b$ are the abscissa and ordinate of its center;
(3.3.6) $w_b := \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2} + \frac{1}{2}(w_p + w_q)$, where $w_b$ is the width of the output bounding box and $w_p$, $w_q$ are the widths of the bounding boxes centered at $p$ and at $q$;
(3.3.7) $h_b := \frac{1}{|S|} \sum_{i=1}^{|S|} h_s^{(i)}$, where $h_b$ is the height of the output bounding box and $h_s^{(i)}$ is the height of the $i$-th segment bounding box in the set; that is, $h_b$ is the mean height of all segments in $S$;
(3.3.8) $b := (x_b, y_b, w_b, h_b, \theta_b)$, where $b$ is the output bounding box, represented by its coordinate, size and angle parameters;
(3.3.9) Output the combined bounding box $b$.
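The combination procedure of step (3.3) can be sketched as below. Several details are assumptions of this sketch rather than statements of the patent: the intercept is fitted by least squares (the mean of the residuals), the end points are taken as the two extreme projections of the segment centers onto the fitted line, and the formulas for the center and width follow the description of the symbols in steps (3.3.5) and (3.3.6). The function name `combine_segments` is hypothetical, and near-vertical text, where the tangent diverges, is not handled.

```python
import math


def combine_segments(segments):
    """Merge connected segments (x, y, w, h, theta) into one oriented box."""
    n = len(segments)
    theta_b = sum(s[4] for s in segments) / n            # step (3.3.2)
    k = math.tan(theta_b)
    # Step (3.3.3): least-squares intercept of y = k*x + b (an assumption).
    b = sum(s[1] - k * s[0] for s in segments) / n
    # Step (3.3.4): project centers onto the line through (0, b) with
    # direction (cos, sin) and keep the two extreme projections.
    ux, uy = math.cos(theta_b), math.sin(theta_b)
    proj = [(s[0] * ux + (s[1] - b) * uy, s[2]) for s in segments]
    t_p, w_p = min(proj, key=lambda tw: tw[0])
    t_q, w_q = max(proj, key=lambda tw: tw[0])
    x_p, y_p = t_p * ux, b + t_p * uy
    x_q, y_q = t_q * ux, b + t_q * uy
    x_b, y_b = (x_p + x_q) / 2, (y_p + y_q) / 2          # step (3.3.5)
    w_b = math.hypot(x_p - x_q, y_p - y_q) + (w_p + w_q) / 2  # (3.3.6)
    h_b = sum(s[3] for s in segments) / n                # step (3.3.7)
    return x_b, y_b, w_b, h_b, theta_b                   # step (3.3.8)


segs = [(10.0, 5.0, 10.0, 8.0, 0.0),
        (20.0, 5.0, 10.0, 8.0, 0.0),
        (30.0, 5.0, 10.0, 8.0, 0.0)]
print(combine_segments(segs))
```

For three horizontal segments of width 10 centered at x = 10, 20 and 30, the merged box is centered at (20, 5) with width 30: the 20-pixel span between the outer centers plus half a segment width at each end.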
Compared with the prior art, the technical scheme conceived by the invention has the following technical effects:
(1) It detects multi-oriented text. Text in natural scene images may have any orientation or be distorted. In the method of the invention, a text region is described locally by segment bounding boxes, which can take any orientation and can therefore cover multi-oriented or distorted text.
(2) It is flexible. The method can also detect text lines of arbitrary length, because combining segments depends only on the predicted connections; it can therefore detect both words and text lines;
(3) It is robust. The method describes text locally with segment bounding boxes, and this local description copes with complex natural image backgrounds and captures text features from the image;
(4) It is efficient. The segment detection model is trained end to end and can process more than 20 images of size 512x512 per second, because segments and connections are all obtained in a single forward pass of the fully convolutional CNN model, with no need for offline scaling or rotation of the input image;
(5) It is general. Some non-Latin scripts, such as Chinese, place no spaces between adjacent words. Existing techniques can only detect words and do not apply to such text, because text without spaces offers no visual cue for separating words. Besides Latin text, the invention can also detect long lines of non-Latin text, because the method does not rely on spaces as a visual cue for separation.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention for multi-oriented text detection in natural images based on connected text segments;
Fig. 2 is a schematic diagram of how the invention computes the ground-truth label parameters of a text segment;
Fig. 3 is a schematic diagram of the output composition of the convolutional predictor of the invention;
Fig. 4 is a diagram of the network structure of the text segment and connection detection model of the invention;
Fig. 5 shows, for one embodiment of the invention, the result of detecting segments and connections in a text image to be detected with the trained text segment and connection detection network model and outputting the bounding boxes.
Specific embodiments
To make the purpose, technical scheme and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and do not limit it. In addition, the technical features involved in the embodiments described below may be combined with one another as long as they do not conflict.
The technical terms of the invention are first explained below:
Convolutional neural network (Convolutional Neural Network, CNN): a neural network that can be used for tasks such as image classification and regression. The network usually consists of convolutional layers, downsampling layers and fully connected layers. The convolutional and downsampling layers extract image features, and the fully connected layers perform classification or regression. The network parameters comprise the convolution kernels and the weights and biases of the fully connected layers, and they can be learned from data by the backpropagation algorithm;
VGG16: VGGNet was the runner-up of ILSVRC 2014. It contains 16 CONV/FC layers and has a very uniform, appealing architecture that uses only 3x3 convolutions and 2x2 pooling layers from start to end, and it has become a classic convolutional neural network model. Its pretrained models are available for plug-and-play use in Caffe. It showed that network depth is a key component of good performance.
Depth-first search (DFS): an algorithm for traversing or searching a tree or graph. It traverses nodes along the depth of the tree, searching each branch as deep as possible. When all edges of a node v have been explored, the search backtracks to the start node of the edge through which v was discovered. This continues until all nodes reachable from the source node have been discovered. If undiscovered nodes remain, one of them is selected as a new source node and the process is repeated until all nodes have been visited. DFS is a classic algorithm in graph theory; it can produce a topological ordering of the target graph, which makes it convenient to solve many related graph problems, such as the maximum path problem.
As shown in Fig. 1, the method of the invention for multi-oriented text detection in natural images based on connected text segments comprises the following steps:
(1) Train the text segment and connection detection network model, including the following sub-steps:
(1.1) Annotate the text in every image of the training image set at the text-line level, the label being the coordinates of the four corner points of the rectangular bounding box of each text line, to obtain the training data set;
(1.2) Define a text segment detection model that, given text-line annotations, predicts text segments and connections. The network model consists of cascaded convolutional neural network layers and convolutional predictors. Compute the segment and connection labels from the training data set, design the loss function, and train the network by backpropagation combined with online data augmentation and online hard negative mining to obtain the text segment detection model, including the following sub-steps:
(1.2.1) Build the convolutional neural network for text segment detection. The first convolutional units, which extract features, come from a pretrained VGG-16 network: they run from convolutional layer 1 through pooling layer 5, and fully connected layers 6 and 7 are converted into convolutional layers 6 and 7. These are followed by extra convolutional layers that extract deeper features for detection, namely convolutional layers 8, 9 and 10, with convolutional layer 11 as the last layer. The last 6 convolutional layers output feature maps of different sizes, which makes it possible to extract high-quality features at multiple scales; segments and connections are detected on these six feature maps. After each of these 6 convolutional layers, a filter of size 3 × 3 is added as a convolutional predictor that detects segments and connections jointly;
(1.2.2) Generate text segment bounding box labels from the annotated text bounding boxes: for the original training image set Itr, denote the scaled training image set by Itr′, with w_I and h_I the width and height of the images in Itr′, which may be 384 × 384 or 512 × 512 pixels. The i-th image Itr_i′ is the model input, and all text bounding boxes annotated on Itr_i′ are denoted W_i = [W_i1, ..., W_ip], where W_ij is the j-th text bounding box on the i-th image, j = 1, ..., p, and p is the total number of text bounding boxes on the i-th image; a text bounding box may be at word level or at entry level. The feature maps output by the last 6 convolutional layers form the set Itro_i′ = [Itro_i1′, ..., Itro_i6′], where Itro_il′ is the feature map output by the l-th of these 6 layers and w_l, h_l are its width and height. The coordinate (x, y) on Itro_il′ corresponds to a horizontal initial bounding box B_ilq on Itr_i′ centred at (x_a, y_a), which satisfy:
x_a = (w_I / w_l) (x + 0.5)
y_a = (h_I / h_l) (y + 0.5)
The width and height of the initial bounding box B_ilq are both set to a constant a_l, which controls the scale of the output text segments, l = 1, ..., 6. The initial bounding boxes corresponding to the feature map Itro_il′ form the set B_il = [B_il1, ..., B_ilm], q = 1, ..., m, where m is the number of initial bounding boxes on the l-th output feature map. If the centre of B_ilq lies inside any annotated text bounding box W_ij on Itr′, and the size a_l of B_ilq and the height h of that text bounding box W_ij satisfy
max(a_l / h, h / a_l) ≤ 1.5,
then B_ilq is labelled positive, with label value 1, and is matched to the text bounding box W_ij whose height is closest to a_l. Otherwise, when B_ilq fails these two conditions for every text bounding box in W_i, it is labelled negative, with label value 0. Text segments are generated on the initial bounding boxes and take the same label class as their initial bounding box. The ratio constant 1.5 is an empirical value;
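The labelling rule of step (1.2.2) can be sketched as follows. This is an illustration under stated assumptions: the centre mapping x_a = (w_I / w_l)(x + 0.5), the height test max(a_l / h, h / a_l) ≤ 1.5, the use of axis-aligned annotation boxes `(x_min, y_min, x_max, y_max)`, and the function names are all assumptions made for the example; only the two conditions and the ratio constant 1.5 come from the text.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed axis-aligned


def default_box_center(x: int, y: int, w_l: int, h_l: int,
                       w_I: int, h_I: int) -> Tuple[float, float]:
    """Map feature-map coordinate (x, y) to the box centre (x_a, y_a) on the image."""
    return (w_I / w_l) * (x + 0.5), (h_I / h_l) * (y + 0.5)


def label_default_box(center: Tuple[float, float], a_l: float,
                      text_boxes: List[Box],
                      ratio: float = 1.5) -> Tuple[int, Optional[int]]:
    """Return (label, index of the matched text box); label 1 is positive, 0 negative."""
    cx, cy = center
    best = None  # (height gap, index) of the best match found so far
    for j, (x0, y0, x1, y1) in enumerate(text_boxes):
        h = y1 - y0
        inside = x0 <= cx <= x1 and y0 <= cy <= y1
        if inside and h > 0 and max(a_l / h, h / a_l) <= ratio:
            gap = abs(h - a_l)
            if best is None or gap < best[0]:
                best = (gap, j)  # keep the box whose height is closest to a_l
    if best is None:
        return 0, None
    return 1, best[1]


center = default_box_center(0, 0, w_l=64, h_l=64, w_I=512, h_I=512)
positive = label_default_box((100.0, 100.0), 32.0, [(50.0, 80.0, 300.0, 120.0)])
negative = label_default_box((100.0, 100.0), 32.0, [(50.0, 50.0, 300.0, 150.0)])
```

In the second call the centre lies inside the text box, but the box height of 100 is more than 1.5 times the scale 32, so the initial bounding box stays negative.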
(1.2.3) Generate text segments on the labelled initial bounding boxes produced in step (1.2.2) and compute the offsets of the positive text segments: a negative text segment bounding box s− is simply the negative initial bounding box B−. A positive text segment bounding box s+ is obtained from the positive initial bounding box B+ by the following steps: a) let θ_s be the angle between the horizontal direction and the annotated text bounding box W matched to B+, and rotate W clockwise by θ_s about the centre of B+; b) crop W, removing the parts that extend beyond the left and right sides of B+; c) rotate the cropped text bounding box W′ counter-clockwise by θ_s about the centre of B+, which gives the ground-truth geometric parameters x_s, y_s, w_s, h_s, θ_s of the text segment s+; d) compute the offsets (Δx_s, Δy_s, Δw_s, Δh_s, Δθ_s) of s+ relative to B+ from the following formulas:
x_s = a_l Δx_s + x_a
y_s = a_l Δy_s + y_a
w_s = a_l exp(Δw_s)
h_s = a_l exp(Δh_s)
θ_s = Δθ_s
where x_s, y_s, w_s, h_s, θ_s are respectively the centre abscissa, centre ordinate, width, height and angle to the horizontal direction of the text segment bounding box s+; x_a, y_a, w_a, h_a are respectively the centre abscissa, centre ordinate, width and height of the horizontal initial bounding box B+; and Δx_s, Δy_s, Δw_s, Δh_s, Δθ_s are respectively the offset of the centre abscissa x_s of s+ relative to B+, the offset of the ordinate y_s relative to B+, the offset of the width w_s, the offset of the height h_s, and the offset of the angle θ_s;
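The five formulas of step (1.2.3) can be inverted to obtain the offsets from a ground-truth segment, and applied directly to recover a segment from offsets. The sketch below shows both directions; the function names and the tuple layout are assumptions made for the example.

```python
import math
from typing import Tuple

Segment = Tuple[float, float, float, float, float]  # (x_s, y_s, w_s, h_s, theta_s)


def encode_offsets(seg: Segment, x_a: float, y_a: float, a_l: float) -> Segment:
    """Solve the formulas of step (1.2.3) for the offsets."""
    x_s, y_s, w_s, h_s, theta_s = seg
    return ((x_s - x_a) / a_l,
            (y_s - y_a) / a_l,
            math.log(w_s / a_l),
            math.log(h_s / a_l),
            theta_s)


def decode_offsets(off: Segment, x_a: float, y_a: float, a_l: float) -> Segment:
    """Apply the formulas of step (1.2.3) to recover the segment geometry."""
    dx, dy, dw, dh, dtheta = off
    return (a_l * dx + x_a,
            a_l * dy + y_a,
            a_l * math.exp(dw),
            a_l * math.exp(dh),
            dtheta)


segment = (110.0, 95.0, 32.0, 40.0, 0.2)
offsets = encode_offsets(segment, x_a=100.0, y_a=100.0, a_l=32.0)
restored = decode_offsets(offsets, x_a=100.0, y_a=100.0, a_l=32.0)
```

Because width and height are encoded through a logarithm, the decoded values are always positive regardless of the predicted offset.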
(1.2.4) Compute link labels for the text segment bounding boxes produced in step (1.2.3): since each text segment s is generated on an initial bounding box B, the link label between two text segments is identical to the link label between their corresponding initial bounding boxes. For the feature map set Itro_i′ = [Itro_i1′, ..., Itro_i6′], if two initial bounding boxes in the set B_il of the same feature map Itro_il′ are both labelled positive and are matched to the same text bounding box, the within-layer link between them is labelled positive; otherwise it is labelled negative. If an initial bounding box in the set B_il corresponding to feature map Itro_il′ and an initial bounding box in the set B_i(l-1) corresponding to Itro_i(l-1)′ are both labelled positive and are matched to the same text bounding box W_ij, the cross-layer link between them is labelled positive; otherwise it is labelled negative;
(1.2.5) Take the scaled training image set Itr′ as input to the text segment detection model and predict the text segment output s: initialise the weights and biases of the model; the learning rate is set to 10^-3 for the first 60,000 training iterations and then decays to 10^-4. For the last 6 convolutional layers, the coordinate (x, y) on the l-th feature map Itro_il′ corresponds to the initial bounding box B_ilq of size a_l centred at (x_a, y_a) on the input image Itr_i′, and the 3 × 3 convolutional predictor predicts the score c_s of B_ilq being classified as positive or negative; c_s is a two-dimensional vector whose components are decimals between 0 and 1. The predictor also predicts 5 numbers (Δx_s, Δy_s, Δw_s, Δh_s, Δθ_s) as the geometric offsets when B_ilq is classified as a positive text segment s+. These are respectively the predicted offset of the centre abscissa of the text segment bounding box s+ relative to the positive initial bounding box B+, the offset of its centre ordinate relative to B+, the offset of its width, the offset of its height, and the offset of its angle;
(1.2.6) On the basis of the predicted text segments, predict the within-layer link and cross-layer link outputs. For within-layer links, at coordinate (x, y) on the same feature map Itro_il′, take the neighbouring points (x′, y′) in the range x-1 ≤ x′ ≤ x+1, y-1 ≤ y′ ≤ y+1. Mapped to the input image Itr_i′, these 8 points give the 8 within-layer neighbouring text segments s^(x′,y′,l) linked to the reference text segment s^(x,y,l) that corresponds to (x, y). The 8 within-layer neighbouring text segments are represented as the set:
N^w(s^(x,y,l)) = { s^(x′,y′,l) : x-1 ≤ x′ ≤ x+1, y-1 ≤ y′ ≤ y+1 } \ { s^(x,y,l) }
The 3 × 3 convolutional predictor predicts the positive and negative scores c_l1 of the links between s^(x,y,l) and its within-layer neighbour set N^w; c_l1 is a 16-dimensional vector, and the index w denotes within-layer links.
For cross-layer links, one cross-layer link connects the text segments corresponding to two points on the feature maps output by two consecutive convolutional layers. Because the width and height of the feature map are halved after each convolutional layer, the width w_l and height h_l of the l-th output feature map Itro_il′ are half the width w_(l-1) and height h_(l-1) of the (l-1)-th feature map Itro_i(l-1)′, and the initial bounding box scale a_l of Itro_il′ is twice the scale a_(l-1) of Itro_i(l-1)′. For a point (x, y) on Itro_il′, take the 4 cross-layer neighbouring points (x′, y′) in the range 2x ≤ x′ ≤ 2x+1, 2y ≤ y′ ≤ 2y+1 on Itro_i(l-1)′. The initial bounding box on Itr_i′ corresponding to (x, y) on Itro_il′ then coincides spatially with the 4 initial bounding boxes on Itr_i′ corresponding to these 4 cross-layer neighbouring points. The 4 cross-layer neighbouring text segments s^(x′,y′,l-1) are represented as the set:
N^c(s^(x,y,l)) = { s^(x′,y′,l-1) : 2x ≤ x′ ≤ 2x+1, 2y ≤ y′ ≤ 2y+1 }
The 3 × 3 convolutional predictor predicts the positive and negative scores c_l2 of the cross-layer links between the reference text segment s^(x,y,l) on layer l and its neighbour set N^c on layer l-1; c_l2 is an 8-dimensional vector holding the positive and negative scores of the links between s^(x,y,l) and each of its 4 cross-layer neighbouring text segments, and the index c denotes cross-layer links.
All within-layer links N^w and all cross-layer links N^c together form the link set N_s;
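The neighbour ranges of step (1.2.6), and the number of values a 3 × 3 predictor emits per location, can be sketched as follows. Clipping neighbours at the feature-map border and omitting the cross-layer outputs on the first feature layer (which has no preceding feature layer) are assumptions made for the example, as are the function names.

```python
from typing import List, Tuple


def within_layer_neighbors(x: int, y: int, w_l: int, h_l: int) -> List[Tuple[int, int]]:
    """Up to 8 neighbours with x-1 <= x' <= x+1, y-1 <= y' <= y+1, excluding (x, y)."""
    return [(xn, yn)
            for yn in range(y - 1, y + 2)
            for xn in range(x - 1, x + 2)
            if (xn, yn) != (x, y) and 0 <= xn < w_l and 0 <= yn < h_l]


def cross_layer_neighbors(x: int, y: int, w_prev: int, h_prev: int) -> List[Tuple[int, int]]:
    """Up to 4 points on layer l-1 with 2x <= x' <= 2x+1, 2y <= y' <= 2y+1."""
    return [(xn, yn)
            for yn in (2 * y, 2 * y + 1)
            for xn in (2 * x, 2 * x + 1)
            if xn < w_prev and yn < h_prev]


def predictor_channels(layer: int) -> int:
    """Values predicted per location: 2 scores, 5 offsets, 16 and 8 link scores."""
    segment_scores, offsets, within_links, cross_links = 2, 5, 16, 8
    return segment_scores + offsets + within_links + (cross_links if layer > 1 else 0)


inner = within_layer_neighbors(5, 5, w_l=10, h_l=10)
corner = within_layer_neighbors(0, 0, w_l=10, h_l=10)
below = cross_layer_neighbors(1, 2, w_prev=8, h_prev=8)
```

The 16 and 8 link dimensions follow from one positive and one negative score for each of the 8 within-layer and 4 cross-layer neighbours.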
(1.2.7) Take the text segment labels, the ground-truth offsets of the positive text segments and the link labels obtained in steps (1.2.3) and (1.2.4) as the reference output, and take the text segment classes and scores and the text segment offsets predicted in step (1.2.5), together with the link scores predicted in step (1.2.6), as the predicted output. Design a target loss function between the predicted output and the reference output, and train the text segment link detection model continually by backpropagation so as to minimise the losses of text segment classification, text segment offset regression and link classification. The target loss function designed for the text segment link detection model is a weighted sum of three losses:
L(y_s, c_s, y_l, c_l, s, ŝ) = (1/n_s) L_conf(y_s, c_s) + λ_1 (1/n_s) L_loc(s, ŝ) + λ_2 (1/n_l) L_conf(y_l, c_l)
where y_s are the labels of all text segments, c_s are the predicted text segment scores, y_l are the link labels, and c_l are the predicted link scores, composed of the within-layer link scores c_l1 and the cross-layer link scores c_l2. If the i-th initial bounding box is labelled positive, then y_s(i) = 1; otherwise it is 0. L_conf(y_s, c_s) is the softmax loss of the predicted text segment scores c_s, L_conf(y_l, c_l) is the softmax loss of the predicted link scores c_l, and L_loc(s, ŝ) is the smooth L1 regression loss between the predicted text segment geometric parameters s and the ground-truth label ŝ. n_s is the number of positive initial bounding boxes, used to normalise the text segment classification and regression losses; n_l is the total number of positive links, used to normalise the link classification loss; λ_1 and λ_2 are weight constants, set to 1 in practice.
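A weighted sum of the three losses described in step (1.2.7) can be sketched in plain Python as follows. The exact arrangement of the terms, restricting the regression loss to positive segments, and guarding the normalisers against zero are assumptions made for the example; the text only states that the loss combines two softmax losses and a smooth L1 loss, normalised by n_s and n_l and weighted by λ_1 and λ_2.

```python
import math
from typing import List, Sequence, Tuple


def softmax_loss(scores: List[Tuple[float, float]], labels: List[int]) -> float:
    """Summed softmax cross-entropy; each score is a (negative, positive) pair."""
    total = 0.0
    for (neg, pos), label in zip(scores, labels):
        top = max(neg, pos)
        log_z = top + math.log(math.exp(neg - top) + math.exp(pos - top))
        total += log_z - (pos if label == 1 else neg)
    return total


def smooth_l1(pred: Sequence[float], target: Sequence[float]) -> float:
    """Smooth L1 loss summed over the geometric parameters of one segment."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def total_loss(seg_scores, seg_labels, seg_pred, seg_true,
               link_scores, link_labels, lam1=1.0, lam2=1.0) -> float:
    n_s = max(1, sum(seg_labels))    # number of positive initial bounding boxes
    n_l = max(1, sum(link_labels))   # number of positive links
    loc = sum(smooth_l1(p, t)
              for p, t, y in zip(seg_pred, seg_true, seg_labels) if y == 1)
    return (softmax_loss(seg_scores, seg_labels) / n_s
            + lam1 * loc / n_s
            + lam2 * softmax_loss(link_scores, link_labels) / n_l)


loss = total_loss([(0.0, 0.0)], [1], [(0.0,) * 5], [(0.0,) * 5], [(0.0, 0.0)], [1])
```

With undecided scores and a perfect regression, each classification term contributes log 2 and the regression term contributes nothing.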
(1.2.8) During the training of step (1.2.7), augment the training data Itr online, and balance positive and negative samples with an online hard negative mining strategy. Before the training images Itr are scaled to the same size and loaded in batches, they are randomly cropped into image patches, each of which has at least a minimum Jaccard overlap o with the ground-truth text segment bounding boxes. For multi-oriented text, augmentation is performed on the minimum enclosing rectangle of the multi-oriented text bounding box. The overlap coefficient o of each sample is chosen at random from 0, 0.1, 0.3, 0.5, 0.7 and 0.9, and the patch size is between 0.1 and 1 times the original image size. Training images are not flipped horizontally. In addition, negative text segment and link samples make up most of the training samples, so the online hard negative mining strategy is used to balance positive and negative samples: text segments and links are mined separately, and the ratio of negative to positive samples is kept at no more than 3:1.
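The 3:1 limit on negatives in step (1.2.8) can be sketched as follows. Ranking negatives by their current classification loss and keeping the hardest ones is the usual form of online hard negative mining and is an assumption here, as is the function name; the text only fixes the maximum ratio. Segments and links would each be passed through such a selection separately.

```python
from typing import List, Tuple


def select_hard_negatives(losses: List[float], labels: List[int],
                          max_ratio: int = 3) -> Tuple[List[int], List[int]]:
    """Return (positive indices, indices of the hardest negatives to keep)."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    keep = max_ratio * len(positives)  # at most 3 negatives per positive
    hardest = sorted(negatives, key=lambda i: losses[i], reverse=True)
    return positives, hardest[:keep]


pos_idx, neg_idx = select_hard_negatives(
    losses=[0.2, 0.1, 0.9, 0.5, 0.3, 0.7],
    labels=[1, 0, 0, 0, 0, 0],
)
```

Only the selected indices contribute to the classification loss of the current batch.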
(2) Use the trained convolutional neural network to detect text segments and links in the text image to be detected, including the following sub-steps:
(2.1) Detect text segments in the text image to be detected; feature maps output by different convolutional layers predict text segments of different scales, and the feature map output by one convolutional layer predicts text segments of the same scale: the i-th image Itst_i of the image set Itst to be detected is scaled to a uniform size, which may be set manually according to the images to be detected, and the scaled image is denoted Itst_i′. Itst_i′ is fed into the text segment link detection model trained in step (1.2), which gives the set Itsto_i′ = [Itsto_i1′, ..., Itsto_i6′] of feature maps output by the last 6 convolutional layers, where Itsto_il′ is the feature map output by the l-th of these layers, l = 1, ..., 6. At each coordinate (x, y) on every output feature map Itsto_il′, the 3 × 3 convolutional predictor predicts the score c_s of the initial bounding box B_ilq corresponding to (x, y) being a positive or negative text segment, and also predicts 5 numbers (Δx_s, Δy_s, Δw_s, Δh_s, Δθ_s) as the geometric offsets when it is predicted to be a positive text segment s+;
(2.2) Detect links between the text segments detected on all feature layers of the text image to be detected; the links include within-layer links and cross-layer links: within-layer and cross-layer links are predicted on the basis of the text segments predicted in (2.1). For within-layer links, at coordinate (x, y) on the same feature map Itsto_il′, the 3 × 3 convolutional predictor predicts the positive and negative scores c_l1 of the within-layer links between s^(x,y,l) and its 8 neighbouring text segments. For cross-layer links, the 3 × 3 convolutional predictor predicts the positive and negative scores c_l2 of the cross-layer links between the reference text segment s^(x,y,l) on layer l and its 4 neighbouring text segments on layer l-1. c_l1 and c_l2 together constitute the predicted link scores c_l;
(2.3) Combine the detected text segment confidence scores and link confidence scores, where the text segment confidence scores comprise the positive and negative class scores of the text segments and the offset scores, and the convolutional predictors output softmax-normalised scores: as in (2.2), within-layer and cross-layer links are predicted on the basis of the text segments predicted in (2.1). For within-layer links, at coordinate (x, y) on the same feature map Itsto_il′, the 3 × 3 convolutional predictor predicts the positive and negative scores c_l1 of the within-layer links between s^(x,y,l) and its 8 neighbouring text segments; for cross-layer links, it predicts the positive and negative scores c_l2 of the cross-layer links between the reference text segment s^(x,y,l) on layer l and its 4 neighbouring text segments on layer l-1. c_l1 and c_l2 together constitute the predicted link scores c_l.
(3) Combine text segments and links to obtain the output bounding boxes, including the following sub-steps:
(3.1) Using the normalised scores obtained in (2.3), filter the text segments and links output by the convolutional predictors, then build a link graph with the filtered text segments as nodes and the filtered links as edges: the fixed number of text segments s and links N_s produced by feeding the text image to be detected of step (2) into the text segment detection model are filtered by their scores. Separate filtering thresholds α and β are set for the text segments s and the links N_s. The thresholds may be set manually to different values for different data. In practice, α = 0.9 and β = 0.7 may be used for multi-oriented text detection, α = 0.9 and β = 0.5 for multilingual long-text detection, and α = 0.6 and β = 0.3 for horizontal text detection. The filtered text segments s′ serve as nodes and the filtered links N_s′ serve as edges, and a graph is built from them;
(3.2) Perform depth-first search on the graph to find its connected components; each component is denoted as a set S containing the text segments that are connected together by links;
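Filtering by the thresholds α and β and then grouping segments with depth-first search can be sketched as follows. The dictionary layout, the integer segment identifiers and the use of "greater than or equal" for the threshold test are assumptions made for the example; the default thresholds are the multi-oriented values given in the text.

```python
from typing import Dict, List, Tuple


def connected_segments(seg_scores: Dict[int, float],
                       link_scores: Dict[Tuple[int, int], float],
                       alpha: float = 0.9, beta: float = 0.7) -> List[List[int]]:
    """Filter segments and links by score, then return the connected components."""
    nodes = {s for s, score in seg_scores.items() if score >= alpha}
    adjacency = {s: [] for s in nodes}
    for (a, b), score in link_scores.items():
        if score >= beta and a in nodes and b in nodes:
            adjacency[a].append(b)
            adjacency[b].append(a)

    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, component = [start], []   # iterative depth-first search
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(sorted(component))
    return components


groups = connected_segments(
    seg_scores={0: 0.95, 1: 0.92, 2: 0.99, 3: 0.50, 4: 0.91},
    link_scores={(0, 1): 0.8, (1, 2): 0.6, (2, 4): 0.9, (3, 4): 0.95},
)
```

Segment 3 is dropped by α and the link (1, 2) by β, so two groups remain, each of which would be combined into one output bounding box.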
(3.3) Combine the text segment set S obtained by the depth-first search on the graph in step (3.2) into one complete word by the following steps:
(3.3.1) Input: S = {s^(i)}, i = 1, ..., |S|, where |S| is the number of text segments in the set S and s^(i) = (x_s^(i), y_s^(i), w_s^(i), h_s^(i), θ_s^(i)) is the i-th text segment, i being the index; x_s^(i) and y_s^(i) are the abscissa and ordinate of the centre of the i-th text segment bounding box s^(i), w_s^(i) and h_s^(i) are its width and height, and θ_s^(i) is the angle between s^(i) and the horizontal direction;
(3.3.2) θ_b := (1/|S|) Σ_i θ_s^(i), where θ_b is the angle of the output bounding box and θ_s^(i) is the angle of the i-th text segment bounding box in the set; θ_b is the mean angle of all text segments in S;
(3.3.3) Find the intercept b of the straight line y = tan(θ_b) x + b such that the sum of the distances from the line to the centres (x_s^(i), y_s^(i)) of all text segments in S is minimal;
(3.3.4) Find the two end points (x_p, y_p) and (x_q, y_q) of the line, where p denotes the first end point and q the second; x_p, y_p are the abscissa and ordinate of the first end point and x_q, y_q those of the second;
(3.3.5) x_b := (x_p + x_q) / 2, y_b := (y_p + y_q) / 2, where b denotes the output bounding box and x_b, y_b are the abscissa and ordinate of its centre;
(3.3.6) w_b := sqrt((x_p - x_q)^2 + (y_p - y_q)^2) + (w_p + w_q) / 2, where w_b is the width of the output bounding box and w_p, w_q are the widths of the bounding boxes centred at point p and at point q respectively;
(3.3.7) h_b := (1/|S|) Σ_i h_s^(i), where h_b is the height of the output bounding box and h_s^(i) is the height of the i-th text segment bounding box in the set; h_b is the mean height of all text segments in S;
(3.3.8) b := (x_b, y_b, w_b, h_b, θ_b), where b is the output bounding box, represented by its coordinate, size and angle parameters;
(3.3.9) Output the combined bounding box b.
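Steps (3.3.1) to (3.3.9) can be sketched as follows. Several choices are assumptions made for the example: the intercept is fitted by least squares (which minimises the sum of squared distances for a fixed slope), the end points are taken as the outermost projections of the segment centres onto the line, the width is the distance between the end points plus half the width of each end segment, and angles are in radians.

```python
import math
from typing import List, Tuple

Segment = Tuple[float, float, float, float, float]  # (x, y, w, h, theta in radians)


def combine_segments(segments: List[Segment]) -> Segment:
    """Merge the linked segments of one component into one oriented bounding box."""
    n = len(segments)
    theta_b = sum(s[4] for s in segments) / n                   # mean angle
    slope = math.tan(theta_b)
    intercept = sum(s[1] - slope * s[0] for s in segments) / n  # least-squares fit
    cos_t, sin_t = math.cos(theta_b), math.sin(theta_b)

    def position(s: Segment) -> float:
        # Signed position of the centre's projection along the fitted line.
        return s[0] * cos_t + (s[1] - intercept) * sin_t

    first, last = min(segments, key=position), max(segments, key=position)
    t_p, t_q = position(first), position(last)
    x_p, y_p = t_p * cos_t, intercept + t_p * sin_t             # end point p
    x_q, y_q = t_q * cos_t, intercept + t_q * sin_t             # end point q

    x_b, y_b = (x_p + x_q) / 2, (y_p + y_q) / 2
    w_b = math.hypot(x_p - x_q, y_p - y_q) + (first[2] + last[2]) / 2
    h_b = sum(s[3] for s in segments) / n                       # mean height
    return x_b, y_b, w_b, h_b, theta_b


box = combine_segments([
    (10.0, 20.0, 10.0, 8.0, 0.0),
    (20.0, 20.0, 10.0, 8.0, 0.0),
    (30.0, 20.0, 10.0, 10.0, 0.0),
])
```

For three horizontal segments centred at x = 10, 20 and 30, the combined box is centred at (20, 20) and spans the 20 pixels between the end points plus half a segment width on each side.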
Claims (7)
1. A method for detecting multi-oriented text in natural images based on linked text segments, characterized in that the method comprises the following steps:
(1) Train the text segment link detection network model, including the following sub-steps:
(1.1) Annotate the text content of all text images in the training image set at entry level, the label being the four corner coordinates of the rectangular initial bounding box of each entry, to obtain the training data set;
(1.2) Define a text segment link detection network model that, given entry-level annotations, predicts text segments and links as output, the text segment link detection network model consisting of a cascaded convolutional neural network and convolutional predictors; compute the labels of text segments and links from the above training data set, design the loss function, and train the text segment link detection network by backpropagation combined with online augmentation and online hard negative mining, to obtain the text segment link detection network model;
(2) Use the trained text segment link detection network model to detect text segments and links in the text image to be detected, including the following sub-steps:
(2.1) Detect text segments in the text image to be detected, where feature maps output by different convolutional layers predict text segments of different scales, and the feature map output by one convolutional layer predicts text segments of the same scale;
(2.2) Detect links between the text segments detected on all feature layers of the text image to be detected, the links including within-layer links and cross-layer links;
(2.3) Combine the detected text segment confidence scores and link confidence scores, where the text segment confidence scores comprise the positive and negative class scores of the text segments and the offset scores, and the convolutional predictors output softmax-normalised scores;
(3) Combine text segments and links to obtain the output bounding boxes, including the following sub-steps:
(3.1) Using the normalised scores obtained in (2.3), filter the text segments and links output by the convolutional predictors, and build a link graph with the filtered text segments as nodes and the links as edges;
(3.2) Perform depth-first search on the graph to find its connected components, each component being denoted as a set S containing the text segments that are connected together by links;
(3.3) Combine the text segments in the set into one complete entry, compute the bounding box of the complete entry, and output it.
2. The method for detecting multi-oriented text in natural images based on linked text segments according to claim 1, characterized in that step (1.2) specifically comprises:
(1.2.1) Build the convolutional neural network model for text segment detection: the first several convolutional units, which extract features, come from a pre-trained VGG-16 network; these units are convolutional layer 1 through pooling layer 5, and fully connected layers 6 and 7 are converted into convolutional layers 6 and 7 respectively; they are followed by several additional convolutional layers that extract deeper features for detection, namely convolutional layers 8, 9 and 10, with convolutional layer 11 as the last layer; the last 6 convolutional layers output feature maps of different sizes, which makes it easier to extract high-quality features at multiple scales, and text segments and links are detected on these six feature maps of different sizes; after each of these 6 convolutional layers, a filter of size 3 × 3 is added as a convolutional predictor that jointly detects text segments and links;
(1.2.2) Generate text segment bounding box labels from the annotated text bounding boxes: for the original training image set Itr, denote the scaled training image set by Itr′, with w_I and h_I the width and height of the images in Itr′. The i-th image Itr_i′ is the model input, and all text bounding boxes annotated on Itr_i′ are denoted W_i = [W_i1, ..., W_ip], where W_ij is the j-th text bounding box on the i-th image, a text bounding box is at word level or at entry level, j = 1, ..., p, and p is the total number of text bounding boxes on Itr_i′. The feature maps output by the last 6 convolutional layers form the set Itro_i′ = [Itro_i1′, ..., Itro_i6′], where Itro_il′ is the feature map output by the l-th of these 6 layers and w_l, h_l are its width and height. The coordinate (x, y) on Itro_il′ corresponds to a horizontal initial bounding box B_ilq on Itr_i′ centred at (x_a, y_a), which satisfy:
x_a = (w_I / w_l) (x + 0.5)
y_a = (h_I / h_l) (y + 0.5)
The width and height of the initial bounding box B_ilq are both set to a constant a_l, which controls the scale of the output text segments, l = 1, ..., 6. The initial bounding boxes corresponding to the feature map Itro_il′ form the set B_il = [B_il1, ..., B_ilm], q = 1, ..., m, where m is the number of initial bounding boxes on the l-th output feature map. If the centre of B_ilq lies inside any annotated text bounding box W_ij on Itr′, and the size a_l of B_ilq and the height h of that text bounding box W_ij satisfy
max(a_l / h, h / a_l) ≤ 1.5,
then B_ilq is labelled positive, with label value 1, and is matched to the text bounding box W_ij whose height is closest to a_l; otherwise, when B_ilq fails these two conditions for every text bounding box in W_i, it is labelled negative, with label value 0. Text segments are generated on the initial bounding boxes and take the same label class as their initial bounding box;
(1.2.3) Generate text segments on the labelled initial bounding boxes produced in step (1.2.2) and compute the offsets of the positive text segments: a negative text segment bounding box s− is simply the negative initial bounding box B−; a positive text segment bounding box s+ is obtained from the positive initial bounding box B+ by the following steps: a) let θ_s be the angle between the horizontal direction and the annotated text bounding box W matched to B+, and rotate W clockwise by θ_s about the centre of B+; b) crop W, removing the parts that extend beyond the left and right sides of B+; c) rotate the cropped text bounding box W′ counter-clockwise by θ_s about the centre of B+, which gives the ground-truth geometric parameters x_s, y_s, w_s, h_s, θ_s of the text segment s+; d) compute the offsets (Δx_s, Δy_s, Δw_s, Δh_s, Δθ_s) of s+ relative to B+ from the following formulas:
x_s = a_l Δx_s + x_a
y_s = a_l Δy_s + y_a
w_s = a_l exp(Δw_s)
h_s = a_l exp(Δh_s)
θ_s = Δθ_s
where x_s, y_s, w_s, h_s, θ_s are respectively the centre abscissa, centre ordinate, width, height and angle to the horizontal direction of the text segment bounding box s+; x_a, y_a, w_a, h_a are respectively the centre abscissa, centre ordinate, width and height of the horizontal initial bounding box B+; and Δx_s, Δy_s, Δw_s, Δh_s, Δθ_s are respectively the offset of the centre abscissa x_s of s+ relative to B+, the offset of the ordinate y_s relative to B+, the offset of the width w_s, the offset of the height h_s, and the offset of the angle θ_s;
(1.2.4) Compute link labels for the text segment bounding boxes produced in step (1.2.3): since each text segment s is generated on an initial bounding box B, the link label between two text segments is identical to the link label between their corresponding initial bounding boxes. For the feature map set Itro_i′ = [Itro_i1′, ..., Itro_i6′], if two initial bounding boxes in the set B_il of the same feature map Itro_il′ are both labelled positive and are matched to the same text bounding box, the within-layer link between them is labelled positive; otherwise it is labelled negative. If an initial bounding box in the set B_il corresponding to feature map Itro_il′ and an initial bounding box in the set B_i(l-1) corresponding to Itro_i(l-1)′ are both labelled positive and are matched to the same text bounding box, the cross-layer link between them is labelled positive; otherwise it is labelled negative;
(1.2.5) Take the scaled training image set Itr′ as input to the text segment detection model and predict the text segment output s: initialise the weights and biases of the model; the learning rate is set to 10^-3 for the first 60,000 training iterations and then decays to 10^-4. For the last 6 convolutional layers, the coordinate (x, y) on the l-th feature map Itro_il′ corresponds to the initial bounding box B_ilq of size a_l centred at (x_a, y_a) on the input image Itr_i′, and the 3 × 3 convolutional predictor predicts the score c_s of B_ilq being classified as positive or negative; c_s is a two-dimensional vector whose components are decimals between 0 and 1. The predictor also predicts 5 numbers (Δx_s, Δy_s, Δw_s, Δh_s, Δθ_s) as the geometric offsets when B_ilq is classified as a positive text segment s+; these are respectively the predicted offset of the centre abscissa of the text segment bounding box s+ relative to the positive initial bounding box B+, the offset of its centre ordinate relative to B+, the offset of its width, the offset of its height, and the offset of its angle;
(1.2.6) On the basis of the predicted text segments, predict the within-layer link and cross-layer link outputs. For within-layer links, at coordinate (x, y) on the same feature map Itro_il′, take the neighbouring points (x′, y′) in the range x-1 ≤ x′ ≤ x+1, y-1 ≤ y′ ≤ y+1. Mapped to the input image Itr_i′, these 8 points give the 8 within-layer neighbouring text segments s^(x′,y′,l) linked to the reference text segment s^(x,y,l) that corresponds to (x, y). The 8 within-layer neighbouring text segments are represented as the set:
N^w(s^(x,y,l)) = { s^(x′,y′,l) : x-1 ≤ x′ ≤ x+1, y-1 ≤ y′ ≤ y+1 } \ { s^(x,y,l) }
The 3 × 3 convolutional predictor predicts the positive and negative scores c_l1 of the links between s^(x,y,l) and its within-layer neighbour set N^w; c_l1 is a 16-dimensional vector, and the index w denotes within-layer links.
For cross-layer links, one cross-layer link connects the text segments corresponding to two points on the feature maps output by two consecutive convolutional layers. Because the width and height of the feature map are halved after each convolutional layer, the width w_l and height h_l of the l-th output feature map Itro_il′ are half the width w_(l-1) and height h_(l-1) of the (l-1)-th feature map Itro_i(l-1)′, and the initial bounding box scale a_l of Itro_il′ is twice the scale a_(l-1) of Itro_i(l-1)′. For a point (x, y) on Itro_il′, take the 4 cross-layer neighbouring points (x′, y′) in the range 2x ≤ x′ ≤ 2x+1, 2y ≤ y′ ≤ 2y+1 on Itro_i(l-1)′. The initial bounding box on Itr_i′ corresponding to (x, y) on Itro_il′ then coincides spatially with the 4 initial bounding boxes on Itr_i′ corresponding to these 4 cross-layer neighbouring points. The 4 cross-layer neighbouring text segments s^(x′,y′,l-1) are represented as the set:
N^c(s^(x,y,l)) = { s^(x′,y′,l-1) : 2x ≤ x′ ≤ 2x+1, 2y ≤ y′ ≤ 2y+1 }
The 3 × 3 convolutional predictor predicts the positive and negative scores c_l2 of the cross-layer links between the reference text segment s^(x,y,l) on layer l and its neighbour set N^c on layer l-1; c_l2 is an 8-dimensional vector holding the positive and negative scores of the links between s^(x,y,l) and each of its 4 cross-layer neighbouring text segments, and the index c denotes cross-layer links.
All within-layer links N^w and all cross-layer links N^c together form the link set N_s;
(1.2.7) Take the text segment labels, the ground-truth offsets of the positive text segments, and the link labels obtained in steps (1.2.3) and (1.2.4) as the output reference, and take the text segment classes and scores and the text segment offsets predicted in step (1.2.5), together with the link scores predicted in step (1.2.6), as the predicted output. Design a target loss function between the predicted output and the output reference, and train the text segment link detection model continuously by back-propagation to minimize the losses of text segment classification, text segment offset regression, and link classification. The target loss function designed for the text segment link detection model is a weighted sum of three losses:

$$L(y_s, c_s, y_l, c_l, \hat{s}, s) = \frac{1}{n_s} L_{conf}(y_s, c_s) + \lambda_1 \frac{1}{n_s} L_{loc}(\hat{s}, s) + \lambda_2 \frac{1}{n_l} L_{conf}(y_l, c_l)$$

where $y_s$ denotes the labels of all text segments, $c_s$ the predicted text segment scores, $y_l$ the link labels, and $c_l$ the predicted link scores, composed of the within-layer link scores $c_{l1}$ and the cross-layer link scores $c_{l2}$. If the i-th initial bounding box is labeled positive, then $y_s(i) = 1$; otherwise it is 0. $L_{conf}(y_s, c_s)$ is the softmax loss of the predicted text segment scores $c_s$, $L_{conf}(y_l, c_l)$ is the softmax loss of the predicted link scores $c_l$, and $L_{loc}(\hat{s}, s)$ is the smooth L1 regression loss between the predicted text segment geometric parameters $s$ and the ground-truth label $\hat{s}$. $n_s$ is the number of positive initial bounding boxes and normalizes the text segment classification and regression losses; $n_l$ is the total number of positive links and normalizes the link classification loss; $\lambda_1$ and $\lambda_2$ are weight constants;
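The weighted sum of three losses can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the function names, the (negative, positive) ordering of each score pair, the restriction of the regression term to positive segments, the guard against division by zero, and the default weights of 1.0 are all assumptions introduced here.

```python
import math


def softmax_loss(labels, scores):
    """Summed 2-class softmax cross-entropy; scores[i] = (negative, positive) (assumed order)."""
    total = 0.0
    for y, (neg, pos) in zip(labels, scores):
        m = max(neg, pos)  # subtract the max for numerical stability
        log_z = m + math.log(math.exp(neg - m) + math.exp(pos - m))
        total += log_z - (pos if y == 1 else neg)
    return total


def smooth_l1(pred, target):
    """Smooth L1 loss summed over the geometric parameters of one segment."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def total_loss(y_s, c_s, y_l, c_l, s_pred, s_true, lambda1=1.0, lambda2=1.0):
    """Weighted sum of segment classification, offset regression and link classification."""
    n_s = max(1, sum(y_s))  # number of positive boxes (guard is an assumption)
    n_l = max(1, sum(y_l))  # number of positive links (guard is an assumption)
    # Assumption: only positive segments contribute to the regression term.
    loc = sum(smooth_l1(p, t) for y, p, t in zip(y_s, s_pred, s_true) if y == 1)
    return (softmax_loss(y_s, c_s) / n_s
            + lambda1 * loc / n_s
            + lambda2 * softmax_loss(y_l, c_l) / n_l)
```

In a real training setup these quantities would be computed on tensors by a deep learning framework; the scalar version is only meant to make the three terms and their normalizers explicit.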
(1.2.8) During the training process of step (1.2.7), the training data $Itr$ is augmented online, and positive and negative samples are balanced with an online hard negative mining strategy. Before the training images $Itr$ are scaled to the same size and loaded in batches, they are randomly cropped into image patches, each patch having a minimum jaccard overlap coefficient $o$ with the ground-truth bounding boxes of the text segments. For multi-oriented text, the augmentation is performed on the minimum enclosing rectangle of the multi-oriented text bounding box. The overlap coefficient $o$ of each sample is chosen at random from 0, 0.1, 0.3, 0.5, 0.7 and 0.9, and the patch size is between 0.1 and 1 times the original image size. Training images are not flipped horizontally. In addition, negative samples of text segments and links make up the majority of the training samples, so the online hard negative mining strategy is used to balance positive and negative samples; mining is performed separately for text segments and for links, and the ratio of negative to positive samples is kept at no more than 3:1.
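One way to realize the 3:1 cap between negative and positive samples is sketched below. The function `hard_negative_mask` and its selection rule (keep the negatives with the highest loss) are assumptions made for illustration; the source only states that hard negatives are mined online, separately for segments and links, with a ratio of at most 3:1.

```python
def hard_negative_mask(labels, losses, max_ratio=3):
    """Return a keep-mask: all positives plus the hardest negatives.

    labels: 1 for positive, 0 for negative.
    losses: per-sample classification loss, used to rank negatives (assumed criterion).
    max_ratio: maximum number of kept negatives per positive.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), max_ratio * len(pos))
    # Rank negatives from highest to lowest loss and keep the top n_keep.
    hardest = sorted(neg, key=lambda i: losses[i], reverse=True)[:n_keep]
    keep = set(pos) | set(hardest)
    return [i in keep for i in range(len(labels))]


labels = [1, 0, 0, 0, 0, 0]
losses = [0.2, 0.9, 0.1, 0.8, 0.3, 0.7]
print(hard_negative_mask(labels, losses))
```

The same function would be called once on the segment samples and once on the link samples, since the two are mined separately. Note that with no positives in a batch this sketch keeps no negatives either; a practical implementation may want a different fallback.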
3. The method for multi-oriented text detection in natural images based on linked text segments according to claim 1 or 2, characterized in that step (2.1) is specifically:
For the i-th text image to be detected $Itst_i$ in the set of images to be detected $Itst$, scale it to a uniform size; the specific size can be set manually according to the text images to be detected. Denote the scaled image as $Itst'_i$. Input the image $Itst'_i$ into the text segment link detection model trained in step (1.2) to obtain the set of feature maps output by the last 6 convolutional layers, $Itsto'_i = [Itsto'_{i1}, \ldots, Itsto'_{i6}]$, where $Itsto'_{il}$ is the feature map output by the l-th of the last 6 convolutional layers, $l = 1, \ldots, 6$. At each coordinate $(x, y)$ on each output feature map $Itsto'_{il}$, the 3 × 3 convolutional predictor predicts the score $c_s$ that the initial bounding box $B_{ilq}$ corresponding to $(x, y)$ is a positive or negative text segment, and also predicts 5 numbers as the geometric offsets that apply when the box is predicted to be a positive text segment $s^+$.
4. The method for multi-oriented text detection in natural images based on linked text segments according to claim 1 or 2, characterized in that step (2.2) is specifically:
Within-layer links and cross-layer links are predicted on the basis of the text segments predicted in step (2.1). For within-layer links, at coordinate $(x, y)$ on the same feature map $Itsto'_{il}$, the 3 × 3 convolutional predictor predicts the positive and negative scores $c_{l1}$ of the within-layer links between $s^{(x, y, l)}$ and its 8 neighbor text segments. For cross-layer links, the 3 × 3 convolutional predictor predicts the positive and negative scores $c_{l2}$ of the cross-layer links between the layer-$l$ reference text segment $s^{(x, y, l)}$ and its 4 neighbor text segments on layer $l-1$. $c_{l1}$ and $c_{l2}$ together constitute the predicted link score $c_l$.
5. The method for multi-oriented text detection in natural images based on linked text segments according to claim 1 or 2, characterized in that step (2.3) is specifically:
According to the results of step (2.1) and step (2.2), at each coordinate $(x, y)$ on each feature map $Itsto'_{il}$, the predicted text segment score $c_s$, the text segment offsets, the within-layer link scores $c_{l1}$ and the cross-layer link scores $c_{l2}$ are concatenated into one 33-dimensional vector. An extra softmax layer is added after the output channels of the convolutional predictor to normalize the text segment scores and the link scores respectively.
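The softmax normalization of the scores can be illustrated with a short Python sketch. It assumes, purely for illustration, that the scores arrive as a flat list in which every two consecutive values form one pair of scores for the same segment or link; the function name `softmax_pairs` and this layout are not taken from the source.

```python
import math


def softmax_pairs(scores):
    """Apply a 2-way softmax to each consecutive pair in a flat score list.

    Assumed layout: [a0, b0, a1, b1, ...], one pair per segment or link.
    """
    out = []
    for i in range(0, len(scores), 2):
        a, b = scores[i], scores[i + 1]
        m = max(a, b)  # subtract the max for numerical stability
        ea, eb = math.exp(a - m), math.exp(b - m)
        out.extend([ea / (ea + eb), eb / (ea + eb)])
    return out


normalized = softmax_pairs([2.0, 0.0, -1.0, -1.0])
print(normalized)
```

After normalization each pair sums to 1, so the scores can be compared directly against fixed thresholds.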
6. The method for multi-oriented text detection in natural images based on linked text segments according to claim 1 or 2, characterized in that step (3.1) is specifically:
The fixed number of text segments $s$ and links $N_s$ produced in step (2) by feeding the text image to be detected into the text segment detection model are filtered by their scores; different filtering thresholds, $\alpha$ and $\beta$ respectively, are set for the text segments $s$ and the links $N_s$. The filtered text segments $s'$ are taken as nodes and the filtered links $N_s'$ as edges, and a graph is built from them.
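The filtering and graph construction, followed by the depth-first search that groups linked segments, might look like the sketch below. The data layout (dictionaries keyed by segment ids), the function names, and the use of `>=` when comparing against the thresholds are assumptions made for illustration.

```python
def build_graph(seg_scores, link_scores, alpha, beta):
    """Filter segments by alpha and links by beta, then build an adjacency map.

    seg_scores: {segment_id: score}; link_scores: {(id_a, id_b): score}.
    """
    nodes = {i for i, c in seg_scores.items() if c >= alpha}
    graph = {i: set() for i in nodes}
    for (a, b), c in link_scores.items():
        # A link is kept only if it passes beta and both end segments survived.
        if c >= beta and a in nodes and b in nodes:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_groups(graph):
    """Group nodes into connected components with an iterative depth-first search."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            node = stack.pop()
            group.append(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


graph = build_graph({0: 0.9, 1: 0.8, 2: 0.95, 3: 0.2},
                    {(0, 1): 0.7, (1, 2): 0.1, (2, 3): 0.9}, alpha=0.5, beta=0.5)
print(connected_groups(graph))
```

Each resulting group corresponds to one set of text segments that is later combined into a single output bounding box.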
7. The method for multi-oriented text detection in natural images based on linked text segments according to claim 1 or 2, characterized in that step (3.3) is specifically: the text segment set $S$ obtained by the depth-first search on the graph in step (3.2) is combined into a complete word by the following steps:
(3.3.1) Input: $S = \{s^{(i)}\}_{i=1}^{|S|}$, where $|S|$ is the number of text segments in the set $S$, $s^{(i)} = (x_s^{(i)}, y_s^{(i)}, w_s^{(i)}, h_s^{(i)}, \theta_s^{(i)})$ is the i-th text segment, $i$ is the index, $x_s^{(i)}$ and $y_s^{(i)}$ are the abscissa and ordinate of the center of the i-th text segment bounding box $s^{(i)}$, $w_s^{(i)}$ and $h_s^{(i)}$ are the width and height of the text segment bounding box $s^{(i)}$, and $\theta_s^{(i)}$ is the angle between the text segment bounding box $s^{(i)}$ and the horizontal direction;
(3.3.2) $\theta_b := \frac{1}{|S|} \sum_{i=1}^{|S|} \theta_s^{(i)}$, where $\theta_b$ is the offset angle of the output bounding box and $\theta_s^{(i)}$ is the offset angle of the i-th text segment bounding box in the set; $\theta_b$ is obtained as the mean offset angle of all text segments in the set $S$;
(3.3.3) Find the intercept $b$ of the line $y = \tan(\theta_b)\,x + b$ such that the sum of the distances from the center points $(x_s^{(i)}, y_s^{(i)})$ of all text segments in the set $S$ to the line is minimal;
(3.3.4) Find the two endpoints $(x_p, y_p)$ and $(x_q, y_q)$ of the line, where $p$ denotes the first endpoint and $q$ the second endpoint, $x_p$ and $y_p$ are the abscissa and ordinate of the first endpoint, and $x_q$ and $y_q$ are the abscissa and ordinate of the second endpoint;
(3.3.5) $x_b := \frac{1}{2}(x_p + x_q)$, $y_b := \frac{1}{2}(y_p + y_q)$, where $b$ denotes the output bounding box, and $x_b$ and $y_b$ are the abscissa and ordinate of the center of the output bounding box;
(3.3.6) $w_b := \sqrt{(x_p - x_q)^2 + (y_p - y_q)^2} + \frac{1}{2}(w_p + w_q)$, where $w_b$ is the width of the output bounding box, and $w_p$ and $w_q$ are the widths of the bounding boxes centered on point $p$ and on point $q$ respectively;
(3.3.7) $h_b := \frac{1}{|S|} \sum_{i=1}^{|S|} h_s^{(i)}$, where $h_b$ is the height of the output bounding box and $h_s^{(i)}$ is the height of the i-th text segment bounding box in the set; $h_b$ is obtained as the mean height of all text segments in the set $S$;
(3.3.8) $b := (x_b, y_b, w_b, h_b, \theta_b)$, where $b$ is the output bounding box, expressed by its coordinate, size and angle parameters;
(3.3.9) Output the combined bounding box $b$.
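Steps (3.3.1) to (3.3.9) can be sketched in Python as below. Several details are assumptions rather than statements from the source: angles are taken in radians, the intercept is chosen to minimize the sum of squared distances (which gives the mean of y − tan(θ_b)·x), and the two endpoints are found by projecting the segment centers onto the line and taking the extreme projections along x.

```python
import math


def combine_segments(segments):
    """Combine text segments (x, y, w, h, theta) into one box (x_b, y_b, w_b, h_b, theta_b)."""
    n = len(segments)
    theta_b = sum(s[4] for s in segments) / n          # (3.3.2) mean angle
    k = math.tan(theta_b)
    # (3.3.3) least-squares intercept for a line of fixed slope k (assumption).
    b = sum(s[1] - k * s[0] for s in segments) / n

    def project(s):
        # Orthogonal projection of the segment center onto the line y = k*x + b.
        xp = (s[0] + k * (s[1] - b)) / (1 + k * k)
        return xp, k * xp + b

    projected = [(project(s), s) for s in segments]
    # (3.3.4) endpoints: extreme projections along x (assumption; unsuitable for
    # near-vertical text, where tan(theta_b) becomes very large).
    (xp, yp), seg_p = min(projected, key=lambda t: t[0][0])
    (xq, yq), seg_q = max(projected, key=lambda t: t[0][0])
    x_b, y_b = (xp + xq) / 2, (yp + yq) / 2            # (3.3.5) box center
    w_b = math.hypot(xp - xq, yp - yq) + (seg_p[2] + seg_q[2]) / 2  # (3.3.6) width
    h_b = sum(s[3] for s in segments) / n              # (3.3.7) mean height
    return (x_b, y_b, w_b, h_b, theta_b)               # (3.3.8) output box


box = combine_segments([(0, 0, 2, 1, 0.0), (2, 0, 2, 1, 0.0), (4, 0, 2, 1, 0.0)])
print(box)
```

For three horizontal segments of width 2 centered at x = 0, 2 and 4, the endpoint distance is 4 and half of each end segment's width is added, giving a combined width of 6.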
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710010596.7A CN106897732B (en) | 2017-01-06 | 2017-01-06 | Multi-oriented text detection method in natural images based on linked text segments |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106897732A (en) | 2017-06-27 |
CN106897732B (en) | 2019-10-08 |
Family
ID=59197865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710010596.7A Active CN106897732B (en) | Multi-oriented text detection method in natural images based on linked text segments | 2017-01-06 | 2017-01-06 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106897732B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104050471A (en) * | 2014-05-27 | 2014-09-17 | 华中科技大学 | Natural scene character detection method and system |
WO2016124103A1 (en) * | 2015-02-03 | 2016-08-11 | 阿里巴巴集团控股有限公司 | Picture detection method and device |
CN106156711A (en) * | 2015-04-21 | 2016-11-23 | 华中科技大学 | The localization method of line of text and device |
CN105184312A (en) * | 2015-08-24 | 2015-12-23 | 中国科学院自动化研究所 | Character detection method and device based on deep learning |
CN105469047A (en) * | 2015-11-23 | 2016-04-06 | 上海交通大学 | Chinese detection method based on unsupervised learning and deep learning network and system thereof |
CN105574513A (en) * | 2015-12-22 | 2016-05-11 | 北京旷视科技有限公司 | Character detection method and device |
CN105608456A (en) * | 2015-12-22 | 2016-05-25 | 华中科技大学 | Multi-directional text detection method based on full convolution network |
Non-Patent Citations (1)
Title |
---|
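Yao Cong (姚聪): "Research on Text Detection and Recognition in Natural Images" (自然图像中文字检测与识别研究), China Doctoral Dissertations Full-text Database (《中国博士学位论文全文数据库》) * |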
Cited By (33)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11030471B2 (en) | 2017-09-25 | 2021-06-08 | Tencent Technology (Shenzhen) Company Limited | Text detection method, storage medium, and computer device |
WO2019057169A1 (en) * | 2017-09-25 | 2019-03-28 | 腾讯科技(深圳)有限公司 | Text detection method, storage medium, and computer device |
CN107766860A (en) * | 2017-10-31 | 2018-03-06 | 武汉大学 | Natural scene image Method for text detection based on concatenated convolutional neutral net |
CN107977620A (en) * | 2017-11-29 | 2018-05-01 | 华中科技大学 | A kind of multi-direction scene text single detection method based on full convolutional network |
CN107977620B (en) * | 2017-11-29 | 2020-05-19 | 华中科技大学 | Multi-direction scene text single detection method based on full convolution network |
CN107844785B (en) * | 2017-12-08 | 2019-09-24 | 浙江捷尚视觉科技股份有限公司 | A kind of method for detecting human face based on size estimation |
CN107844785A (en) * | 2017-12-08 | 2018-03-27 | 浙江捷尚视觉科技股份有限公司 | A kind of method for detecting human face based on size estimation |
CN108304835A (en) * | 2018-01-30 | 2018-07-20 | 百度在线网络技术(北京)有限公司 | character detecting method and device |
CN108427924A (en) * | 2018-03-09 | 2018-08-21 | 华中科技大学 | A kind of text recurrence detection method based on rotational sensitive feature |
CN108549893B (en) * | 2018-04-04 | 2020-03-31 | 华中科技大学 | End-to-end identification method for scene text with any shape |
CN108549893A (en) * | 2018-04-04 | 2018-09-18 | 华中科技大学 | A kind of end-to-end recognition methods of the scene text of arbitrary shape |
CN109086663B (en) * | 2018-06-27 | 2021-11-05 | 大连理工大学 | Natural scene text detection method based on scale self-adaption of convolutional neural network |
CN109086663A (en) * | 2018-06-27 | 2018-12-25 | 大连理工大学 | The natural scene Method for text detection of dimension self-adaption based on convolutional neural networks |
CN109583367A (en) * | 2018-11-28 | 2019-04-05 | 网易(杭州)网络有限公司 | Image text row detection method and device, storage medium and electronic equipment |
CN109685718A (en) * | 2018-12-17 | 2019-04-26 | 中国科学院自动化研究所 | Picture quadrate Zoom method, system and device |
CN109886286A (en) * | 2019-01-03 | 2019-06-14 | 武汉精测电子集团股份有限公司 | Object detection method, target detection model and system based on cascade detectors |
CN109886286B (en) * | 2019-01-03 | 2021-07-23 | 武汉精测电子集团股份有限公司 | Target detection method based on cascade detector, target detection model and system |
CN109886264A (en) * | 2019-01-08 | 2019-06-14 | 深圳禾思众成科技有限公司 | A kind of character detecting method, equipment and computer readable storage medium |
CN109977997A (en) * | 2019-02-13 | 2019-07-05 | 中国科学院自动化研究所 | Image object detection and dividing method based on convolutional neural networks fast robust |
CN110032969A (en) * | 2019-04-11 | 2019-07-19 | 北京百度网讯科技有限公司 | For text filed method, apparatus, equipment and the medium in detection image |
CN110032969B (en) * | 2019-04-11 | 2021-11-05 | 北京百度网讯科技有限公司 | Method, apparatus, device, and medium for detecting text region in image |
US11537816B2 (en) * | 2019-07-16 | 2022-12-27 | Ancestry.Com Operations Inc. | Extraction of genealogy data from obituaries |
US20230109073A1 (en) * | 2019-07-16 | 2023-04-06 | Ancestry.Com Operations Inc. | Extraction of genealogy data from obituaries |
US20210019569A1 (en) * | 2019-07-16 | 2021-01-21 | Ancestry.Com Operations Inc. | Extraction of genealogy data from obituaries |
CN110490232A (en) * | 2019-07-18 | 2019-11-22 | 北京捷通华声科技股份有限公司 | Method, apparatus, the equipment, medium of training literal line direction prediction model |
CN111259764A (en) * | 2020-01-10 | 2020-06-09 | 中国科学技术大学 | Text detection method and device, electronic equipment and storage device |
CN111291759A (en) * | 2020-01-17 | 2020-06-16 | 北京三快在线科技有限公司 | Character detection method and device, electronic equipment and storage medium |
CN111444674A (en) * | 2020-03-09 | 2020-07-24 | 稿定(厦门)科技有限公司 | Character deformation method, medium and computer equipment |
CN111444674B (en) * | 2020-03-09 | 2022-07-01 | 稿定(厦门)科技有限公司 | Character deformation method, medium and computer equipment |
CN111967463A (en) * | 2020-06-23 | 2020-11-20 | 南昌大学 | Method for detecting curve fitting of curved text in natural scene |
CN111914822A (en) * | 2020-07-23 | 2020-11-10 | 腾讯科技(深圳)有限公司 | Text image labeling method and device, computer readable storage medium and equipment |
CN111914822B (en) * | 2020-07-23 | 2023-11-17 | 腾讯科技(深圳)有限公司 | Text image labeling method, device, computer readable storage medium and equipment |
CN115620081A (en) * | 2022-09-27 | 2023-01-17 | 北京百度网讯科技有限公司 | Training method of target detection model, target detection method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106897732B (en) | 2019-10-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||