CN106897732A - Multi-oriented text detection method in natural images based on linking text segments - Google Patents

Multi-oriented text detection method in natural images based on linking text segments

Info

Publication number
CN106897732A
Authority
CN
China
Prior art keywords
bounding box
text segment
word
link
segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710010596.7A
Other languages
Chinese (zh)
Other versions
CN106897732B (en)
Inventor
Bai Xiang (白翔)
Shi Baoguang (石葆光)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201710010596.7A
Publication of CN106897732A
Application granted
Publication of CN106897732B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V 10/225 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Abstract

The invention discloses a multi-oriented text detection method for natural images based on linking text segments. Segments and links are the two key notions of the detection method and are defined as follows: a segment is an oriented bounding box on the image that covers a part of a word or text line; a link connects two adjacent segments, indicating that they belong to the same word or text line. Segments and links are detected jointly at several scales, on a regular grid, by a single fully convolutional neural network trained end to end. The final detections are obtained by first joining segments into groups through the links and then combining each group into one bounding box. Compared with the prior art, the proposed method achieves excellent accuracy, speed and model simplicity: it is efficient and robust, copes with complex image backgrounds, and can also detect long lines of non-Latin text in images.

Description

Multi-oriented text detection method in natural images based on linking text segments
Technical field
The invention belongs to the technical field of computer vision, and more particularly relates to a multi-oriented text detection method in natural images based on linking text segments.
Background technology
Reading text in natural images is a challenging and popular task with many practical applications in photo OCR, geo-location and image retrieval. In a text reading system, text detection, i.e. localising text regions with bounding boxes at the word or text-line level, is usually the crucial first step. In a sense, text detection can also be regarded as a particular kind of object detection in which words, characters or text lines are the detection targets.
Although existing techniques have achieved great success by applying object detection methods to text detection, object detection methods still have clear drawbacks when used to localise text regions. First, the aspect ratio of a word or text line is usually much larger than that of a general object, and previous methods have difficulty producing bounding boxes of such ratios. Second, some non-Latin scripts, such as Chinese, have no spaces between adjacent words; existing techniques can only detect words, so they do not apply to such text, because text without spaces provides no visual cue for separating individual words. Third, text in large-scale natural scene images may appear in any orientation, whereas most existing techniques can only detect horizontal text. Text detection in natural scene images therefore remains one of the difficulties of the computer vision field.
Summary of the invention
The object of the invention is to provide a multi-oriented text detection method in natural images based on linking text segments. The method detects text with high accuracy and speed, uses a simple model, is robust, can overcome complex image backgrounds, and can also detect long lines of non-Latin text.
To achieve the above object, the present invention approaches scene text detection from a new perspective and provides a multi-oriented text detection method in natural images based on linking text segments, comprising the following steps:
(1) Train the segment-link detection network model, including the following sub-steps:
(1.1) Annotate the text content of all text images in the training image set at entry level; each label is the four corner coordinates of the initial rectangular bounding box of an entry, giving the training data set.
(1.2) Define a segment detection model that predicts segments and links from the entry-level labels. The network model consists of a cascaded convolutional neural network and convolutional predictors. The segment and link labels are computed from the above training data set, a loss function is designed, and the network is trained by back-propagation, combined with online augmentation and online hard negative mining, to obtain the segment detection model, including the following sub-steps:
(1.2.1) Build the segment detection convolutional neural network model. The first convolutional units, which extract features, come from a pre-trained VGG-16 network, namely convolutional layer 1 through pooling layer 5, with fully connected layers 6 and 7 converted into convolutional layers 6 and 7. Several extra convolutional layers are appended to extract deeper features for detection, namely convolutional layers 8, 9 and 10, and the last layer is convolutional layer 11. Six of these convolutional layers output feature maps of different sizes, which makes it convenient to extract high-quality features at several scales, and segments and links are detected on these six feature maps of different sizes. For these six convolutional layers, a filter of size 3 × 3 is added after each layer as a convolutional predictor to jointly detect segments and links.
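For illustration only, the following is a minimal Python (PyTorch) sketch of a backbone with the structure described in step (1.2.1); it is not the patented implementation. The channel counts of the extra layers, the use of torchvision's VGG-16 and the feature-map tap points are assumptions; only the overall arrangement (a VGG-16 front end, fc6/fc7 re-expressed as conv6/conv7, extra layers conv8 to conv11, and a 3 × 3 convolutional predictor on each of six feature maps) follows the text above.

```python
import torch
import torch.nn as nn
import torchvision


class SegLinkBackbone(nn.Module):
    def __init__(self, num_outputs=31):  # 2 seg scores + 5 offsets + 16 + 8 link scores
        super().__init__()
        # VGG-16 front end: conv1_1 .. pool5 (randomly initialised here)
        self.vgg = torchvision.models.vgg16(weights=None).features
        # fc6 / fc7 of VGG-16 re-expressed as convolutions (conv6, conv7)
        self.conv6 = nn.Conv2d(512, 1024, kernel_size=3, padding=6, dilation=6)
        self.conv7 = nn.Conv2d(1024, 1024, kernel_size=1)

        # extra layers conv8 .. conv11, each halving the spatial resolution
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout // 2, 1), nn.ReLU(inplace=True),
                nn.Conv2d(cout // 2, cout, 3, stride=2, padding=1),
                nn.ReLU(inplace=True))

        self.conv8 = block(1024, 512)
        self.conv9 = block(512, 256)
        self.conv10 = block(256, 256)
        self.conv11 = block(256, 256)
        # one 3x3 convolutional predictor per feature map used for detection
        self.predictors = nn.ModuleList([
            nn.Conv2d(c, num_outputs, kernel_size=3, padding=1)
            for c in (512, 1024, 512, 256, 256, 256)])

    def forward(self, x):
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i == 22:              # activation after conv4_3 in torchvision's VGG-16
                feats.append(x)
        x = torch.relu(self.conv7(torch.relu(self.conv6(x))))
        feats.append(x)
        for extra in (self.conv8, self.conv9, self.conv10, self.conv11):
            x = extra(x)
            feats.append(x)
        # six prediction maps of decreasing resolution, 31 channels each
        return [pred(f) for pred, f in zip(self.predictors, feats)]
```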
(1.2.2) Generate segment bounding-box labels from the annotated word bounding boxes. Let Itr be the original training image set and Itr' the training image set after scaling, with w_I and h_I the width and height of images in Itr' (for example 384 × 384 or 512 × 512 pixels). The i-th image Itr_i' is the model input, and the word bounding boxes annotated on Itr_i' are denoted W_i = [W_i1, ..., W_ip], where W_ij is the j-th word bounding box on the i-th image; the word bounding boxes may be at word level or at entry level, j = 1, ..., p, and p is the total number of word bounding boxes on the i-th image. The feature maps output by the last six convolutional layers form the set Itro_i' = [Itro_i1', ..., Itro_i6'], where Itro_il' is the feature map output by the l-th of these layers and w_l, h_l are its width and height. A coordinate (x, y) on Itro_il' corresponds to a horizontal initial bounding box B_ilq on Itr_i' centred at the point (x_a, y_a), which satisfies
x_a = (w_I / w_l)(x + 0.5),  y_a = (h_I / h_l)(y + 0.5).
The width and height of the initial bounding box B_ilq are both set to a constant a_l, which controls the scale of the output segments, l = 1, ..., 6. The set of initial bounding boxes corresponding to the l-th output feature map Itro_il' is denoted B_il = [B_il1, ..., B_ilm], q = 1, ..., m, where m is the number of initial bounding boxes on that feature map. An initial bounding box B_ilq is labelled positive, with label value 1, as long as its centre lies inside any annotated word bounding box W_ij on Itr' and its size a_l and the height h of that word bounding box satisfy
max(a_l / h, h / a_l) ≤ 1.5,
in which case it is matched to the word bounding box W_ij whose height is closest to a_l. Otherwise, when B_ilq and all word bounding boxes W_i fail to meet the above two conditions, B_ilq is labelled negative with label value 0. Segments are generated on the initial bounding boxes and share their label class; the proportionality constant 1.5 is an empirical value.
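As a hedged illustration of the labelling rule in step (1.2.2), the sketch below (NumPy, assuming the ground-truth word boxes have already been converted to (cx, cy, w, h, theta) form with theta in radians) maps a feature-map location to the centre of its initial bounding box and applies the centre-inside and max(a_l/h, h/a_l) ≤ 1.5 criteria; the helper names are hypothetical.

```python
import numpy as np


def default_box_center(x, y, w_l, h_l, w_I, h_I):
    """Map feature-map location (x, y) on a w_l x h_l map to the image-space
    centre (x_a, y_a) of its initial (default) bounding box."""
    return (w_I / w_l) * (x + 0.5), (h_I / h_l) * (y + 0.5)


def center_inside_rotated_rect(pt, rect):
    """True if point pt = (px, py) lies inside the rotated rectangle
    rect = (cx, cy, w, h, theta)."""
    (px, py), (cx, cy, w, h, theta) = pt, rect
    dx, dy = px - cx, py - cy
    u = dx * np.cos(theta) + dy * np.sin(theta)      # express in the box frame
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return abs(u) <= w / 2 and abs(v) <= h / 2


def match_default_box(center, a_l, word_boxes):
    """Return the index of the matched word box (positive default box) or None
    (negative): the centre must lie inside the word box and the box size a_l
    must satisfy max(a_l / h, h / a_l) <= 1.5, preferring the closest height."""
    best, best_gap = None, None
    for j, (cx, cy, w, h, theta) in enumerate(word_boxes):
        if not center_inside_rotated_rect(center, (cx, cy, w, h, theta)):
            continue
        if max(a_l / h, h / a_l) <= 1.5:
            gap = abs(a_l - h)
            if best is None or gap < best_gap:
                best, best_gap = j, gap
    return best
```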
(1.2.3) Generate segments on the labelled initial bounding boxes produced in step (1.2.2) and compute the offsets of positive segments. A negative segment bounding box s- is simply the negative initial bounding box B-. A positive segment bounding box s+ is obtained from the positive initial bounding box B+ by the following steps: a) let θ_s be the angle between the matched annotated word bounding box W and the horizontal direction, and rotate W clockwise by θ_s about the centre of B+; b) crop W, removing the parts that extend beyond the left and right sides of B+; c) rotate the cropped word bounding box W' counter-clockwise by θ_s about the centre of B+, giving the ground-truth geometric parameters x_s, y_s, w_s, h_s, θ_s of the segment s+; d) compute the offsets (Δx_s, Δy_s, Δw_s, Δh_s, Δθ_s) of s+ relative to B+ from the following formulas:
xs=alΔxs+xa
ys=alΔys+ya
ws=alexp(Δws)
hs=alexp(Δhs)
θs=Δ θs
Here x_s, y_s, w_s, h_s, θ_s are respectively the centre abscissa, centre ordinate, width, height and angle to the horizontal of the segment bounding box s+; x_a, y_a, w_a, h_a are respectively the centre abscissa, centre ordinate, width and height of the horizontal initial bounding box B+; and Δx_s, Δy_s, Δw_s, Δh_s, Δθ_s are respectively the offset of the centre abscissa x_s relative to the initial bounding box B+, the offset of the ordinate y_s relative to the initial bounding box, the offset of the width w_s, the offset of the height h_s, and the offset of the angle θ_s.
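The following NumPy sketch illustrates step (1.2.3) under the same assumptions as above (word and segment boxes given as (cx, cy, w, h, theta), theta in radians, mathematical rotation convention): it derives the ground-truth segment by the rotate, crop and rotate-back procedure and encodes it with the offset formulas above. It is an illustrative reading of the step, not the patented implementation.

```python
import numpy as np


def segment_ground_truth(word_box, x_a, y_a, a_l):
    """Ground-truth segment for a positive initial box: rotate the matched word
    box so it is horizontal, crop it to the initial box's horizontal extent,
    then rotate the cropped box back (steps a)-c) above)."""
    cx, cy, w, h, theta = word_box
    # a) rotate the word-box centre clockwise by theta about (x_a, y_a)
    dx, dy = cx - x_a, cy - y_a
    cx_r = x_a + dx * np.cos(theta) + dy * np.sin(theta)
    cy_r = y_a - dx * np.sin(theta) + dy * np.cos(theta)
    # b) crop to the horizontal extent [x_a - a_l/2, x_a + a_l/2] of the initial box
    left = max(cx_r - w / 2, x_a - a_l / 2)
    right = min(cx_r + w / 2, x_a + a_l / 2)
    cx_c, w_c = (left + right) / 2, right - left
    # c) rotate the cropped centre counter-clockwise by theta about (x_a, y_a)
    dx, dy = cx_c - x_a, cy_r - y_a
    x_s = x_a + dx * np.cos(theta) - dy * np.sin(theta)
    y_s = y_a + dx * np.sin(theta) + dy * np.cos(theta)
    return x_s, y_s, w_c, h, theta          # (x_s, y_s, w_s, h_s, theta_s)


def encode_offsets(segment, x_a, y_a, a_l):
    """d) invert the formulas x_s = a_l*dx + x_a, w_s = a_l*exp(dw), etc."""
    x_s, y_s, w_s, h_s, theta_s = segment
    return ((x_s - x_a) / a_l, (y_s - y_a) / a_l,
            np.log(w_s / a_l), np.log(h_s / a_l), theta_s)
```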
(1.2.4) Compute link labels for the segment bounding boxes produced in step (1.2.3). Segments s are generated on the initial bounding boxes B, so the link labels between segments are identical to the link labels between their corresponding initial bounding boxes. For the feature-map set Itro_i' = [Itro_i1', ..., Itro_i6']: if, within the initial bounding-box set B_il of the same feature map Itro_il', two initial bounding boxes are both labelled positive and are matched to the same word, then the within-layer link between them is labelled positive, otherwise negative; if an initial bounding box in the set B_il corresponding to Itro_il' and an initial bounding box in the set B_i(l-1) corresponding to Itro_i(l-1)' are both labelled positive and are matched to the same word bounding box W_ij, then the cross-layer link between them is labelled positive, otherwise negative.
(1.2.5) Take the scaled training image set Itr' as the input of the segment detection model and predict the segment output s. The model weights and biases are initialised, the learning rate of the first 60,000 training iterations is set to 10^-3, and afterwards the learning rate decays to 10^-4. For each of the last six convolutional layers, at a coordinate (x, y) on the l-th feature map Itro_il', which corresponds to the initial bounding box B_ilq of size a_l centred at the point (x_a, y_a) on the input image Itr_i', the 3 × 3 convolutional predictor predicts the scores c_s with which B_ilq is classified as positive or negative; c_s is a two-dimensional vector whose values lie between 0 and 1. At the same time it predicts five numbers (Δx̂_s, Δŷ_s, Δŵ_s, Δĥ_s, Δθ̂_s) as the geometric offsets in case the box is classified as a positive segment s+, namely the offset of the centre abscissa relative to the positive initial bounding box B+, the offset of the ordinate relative to B+, the offset of the width, the offset of the height, and the offset of the angle.
(1.2.6) Predict within-layer and cross-layer link outputs on the basis of the predicted segments. For within-layer links, at a coordinate (x, y) on the feature map Itro_il', take the neighbouring points (x', y') in the range x-1 ≤ x' ≤ x+1, y-1 ≤ y' ≤ y+1; mapped back to the input image Itr_i', these eight points give the eight within-layer neighbouring segments s(x', y', l) connected to the reference segment s(x, y, l) corresponding to (x, y). The eight within-layer neighbours can be written as the set N^w(x, y, l) = { s(x', y', l) : x-1 ≤ x' ≤ x+1, y-1 ≤ y' ≤ y+1, (x', y') ≠ (x, y) }.
The 3 × 3 convolutional predictor predicts the positive and negative scores c_l1 of the links between s(x, y, l) and its within-layer neighbour set; c_l1 is a 16-dimensional vector (two scores per neighbour), and the superscript w denotes a within-layer link.
For cross-layer links, a cross-layer link connects the segments corresponding to two points on the feature maps output by two consecutive convolutional layers. Because each further convolutional layer halves the width and height of the feature map, the width w_l and height h_l of the l-th output feature map Itro_il' are half the width w_(l-1) and height h_(l-1) of the (l-1)-th feature map Itro_i(l-1)', and the initial bounding-box scale a_l of Itro_il' is twice the scale a_(l-1) of Itro_i(l-1)'. For a point (x, y) on the l-th output feature map Itro_il', take the four cross-layer neighbouring points (x', y') on Itro_i(l-1)' in the range 2x ≤ x' ≤ 2x+1, 2y ≤ y' ≤ 2y+1; the initial bounding box that (x, y) on Itro_il' corresponds to on the input image Itr_i' then spatially overlaps the four initial bounding boxes that these four cross-layer neighbours correspond to on Itr_i'. The four cross-layer neighbouring segments can be written as the set N^c(x, y, l) = { s(x', y', l-1) : 2x ≤ x' ≤ 2x+1, 2y ≤ y' ≤ 2y+1 }.
The 3 × 3 convolutional predictor predicts the positive and negative scores c_l2 of the cross-layer links between the reference segment s(x, y, l) on layer l and its neighbouring segment set on layer l-1; c_l2 is an 8-dimensional vector, two scores per cross-layer neighbour, and the superscript c denotes a cross-layer link.
All within-layer links and all cross-layer links together constitute the link set N_s.
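A small sketch of the neighbourhood structure used in steps (1.2.5) and (1.2.6) follows; the per-location channel split (2 segment scores + 5 offsets + 16 within-layer link scores + 8 cross-layer link scores = 31 channels) is implied by the text, while the function names and the exact channel ordering are assumptions.

```python
def within_layer_neighbours(x, y):
    """The 8 within-layer neighbour locations of (x, y) on the same feature map."""
    return [(xp, yp) for xp in (x - 1, x, x + 1)
                     for yp in (y - 1, y, y + 1)
                     if (xp, yp) != (x, y)]


def cross_layer_neighbours(x, y):
    """The 4 locations on the previous (twice-resolution) feature map whose
    initial bounding boxes spatially overlap the box at (x, y) on layer l."""
    return [(xp, yp) for xp in (2 * x, 2 * x + 1)
                     for yp in (2 * y, 2 * y + 1)]


# Assumed per-location channel layout of one 3x3 predictor (31 channels total):
#   0:2    segment positive/negative scores
#   2:7    segment offsets (dx, dy, dw, dh, dtheta)
#   7:23   within-layer link scores, 2 per neighbour (8 neighbours)
#   23:31  cross-layer link scores, 2 per neighbour (4 neighbours)
```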
(1.2.7) Take the segment labels and link labels obtained in steps (1.2.3) and (1.2.4) and the ground-truth offsets of positive segments as the reference output, and take the segment classes and scores and segment offsets predicted in step (1.2.5) and the link scores predicted in step (1.2.6) as the predicted output; design a target loss function between the predicted output and the reference output and train the segment-link detection model by back-propagation so as to minimise the segment classification loss, the segment offset regression loss and the link classification loss. The target loss function designed for the segment-link detection model is the weighted sum of three losses:
L(y_s, c_s, y_l, c_l, ŝ, s) = (1/n_s) L_conf(y_s, c_s) + λ1 (1/n_s) L_loc(ŝ, s) + λ2 (1/n_l) L_conf(y_l, c_l)
where y_s is the label of all segments, c_s is the predicted segment score, y_l is the link label, and c_l is the predicted link score, composed of the within-layer scores c_l1 and the cross-layer scores c_l2; if the i-th initial bounding box is labelled positive then y_s(i) = 1, otherwise 0. L_conf(y_s, c_s) is the softmax loss of the predicted segment scores c_s, L_conf(y_l, c_l) is the softmax loss of the predicted link scores c_l, and L_loc(ŝ, s) is the smooth L1 regression loss between the predicted segment geometric parameters ŝ and the ground truth s. n_s is the number of positive initial bounding boxes and normalises the segment classification and regression losses; n_l is the total number of positive links and normalises the link classification loss; λ1 and λ2 are weight constants, set to 1 in practice.
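For illustration, a minimal PyTorch sketch of an objective with this structure is given below. The tensor shapes are assumptions (seg_logits of shape (N, 2), seg_offsets and seg_targets of shape (N, 5), link_logits of shape (M, 2), 0/1 integer labels); only the structure of the loss (softmax losses for segments and links, smooth L1 regression on positive segments, normalisation by n_s and n_l, λ1 = λ2 = 1) follows the text.

```python
import torch
import torch.nn.functional as F


def seglink_loss(seg_logits, seg_labels, seg_offsets, seg_targets,
                 link_logits, link_labels, lambda1=1.0, lambda2=1.0):
    """Weighted sum of segment classification, segment regression and link
    classification losses, normalised by n_s and n_l as described above."""
    n_s = seg_labels.sum().clamp(min=1).float()    # number of positive initial boxes
    n_l = link_labels.sum().clamp(min=1).float()   # number of positive links
    l_seg_cls = F.cross_entropy(seg_logits, seg_labels, reduction='sum') / n_s
    pos = seg_labels == 1
    l_seg_reg = F.smooth_l1_loss(seg_offsets[pos], seg_targets[pos],
                                 reduction='sum') / n_s
    l_link_cls = F.cross_entropy(link_logits, link_labels, reduction='sum') / n_l
    return l_seg_cls + lambda1 * l_seg_reg + lambda2 * l_link_cls
```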
(1.2.8) During the training of step (1.2.7), augment the training data Itr online and balance positive and negative samples with an online hard negative mining strategy. Before the training images Itr are scaled to the same size and loaded in batches, they are randomly cropped into image blocks such that each block has at least a minimum Jaccard overlap o with a segment's ground-truth bounding box. For multi-oriented text, the data augmentation is performed on the minimum enclosing rectangle of the multi-oriented word bounding box. The overlap coefficient o of each sample is chosen randomly from 0, 0.1, 0.3, 0.5, 0.7 and 0.9, and the size of an image block is between 0.1 and 1 times the original image size; training images are not flipped horizontally. In addition, because negative samples of segments and links make up most of the training samples, positive and negative samples are balanced with an online hard negative mining strategy, applied separately to segments and links, which keeps the ratio of negative to positive samples at no more than 3:1.
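A possible sketch of the online hard negative mining rule of step (1.2.8) is shown below (NumPy); it assumes a per-sample difficulty score is available for ranking negatives, and the same rule would be applied separately to segments and to links.

```python
import numpy as np


def hard_negative_mask(labels, neg_difficulty, ratio=3):
    """labels: 0/1 array; neg_difficulty: per-sample loss (or background
    confidence deficit) used to rank negatives; keep all positives and only the
    hardest negatives so that negatives do not exceed ratio * positives."""
    pos = labels == 1
    n_keep = ratio * max(int(pos.sum()), 1)
    neg_idx = np.flatnonzero(~pos)
    hardest = neg_idx[np.argsort(-neg_difficulty[neg_idx])[:n_keep]]
    mask = pos.copy()
    mask[hardest] = True
    return mask   # boolean mask of samples that contribute to the loss
```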
(2) Detect segments and links on the text image to be detected with the convolutional neural network trained above, including the following sub-steps:
(2.1) Detect segments on the text image to be detected; the feature maps output by different convolutional layers predict segments of different scales, and the feature map output by the same convolutional layer predicts segments of the same scale. The i-th test image Itst_i in the test image set Itst is scaled to a uniform size, which can be set manually according to the images to be detected; the scaled test image is denoted Itst_i'. The image Itst_i' is fed into the segment-link detection model trained in step (1.2), giving the set Itsto_i' = [Itsto_i1', ..., Itsto_i6'] of feature maps output by the last six convolutional layers, where Itsto_il' is the feature map output by the l-th of these layers, l = 1, ..., 6. At every coordinate (x, y) on each output feature map Itsto_il', the 3 × 3 convolutional predictor predicts the scores c_s with which the corresponding initial bounding box B_ilq is classified as a positive or negative segment, and also predicts the five numbers (Δx̂_s, Δŷ_s, Δŵ_s, Δĥ_s, Δθ̂_s) as the geometric offsets in case it is predicted to be a positive segment s+.
(2.2) Detect links between the segments detected on all feature layers of the image; the links include within-layer links and cross-layer links. On the basis of the segments predicted in (2.1), predict the within-layer and cross-layer links: for within-layer links, at a coordinate (x, y) on the feature map Itsto_il', the 3 × 3 convolutional predictor predicts the positive and negative scores c_l1 of the links between s(x, y, l) and its eight within-layer neighbouring segments; for cross-layer links, the predictor predicts the positive and negative scores c_l2 of the links between the reference segment s(x, y, l) on layer l and its four neighbouring segments on layer l-1; c_l1 and c_l2 together constitute the predicted link scores c_l.
(2.3) Combine the segment confidence scores and link confidence scores obtained by detection, where a segment confidence score comprises the positive/negative class score of the segment and its offset score, and output softmax-normalised scores with the convolutional predictors.
(3) Combine segments and links to obtain the output bounding boxes, including the following sub-steps:
(3.1) Filter the segments and links output by the convolutional predictors according to the normalised scores obtained in (2.3), and build a connection graph with the filtered segments as nodes and the filtered links as edges. The segments s and links N_s produced by feeding the text image to be detected of step (2) into the segment detection model are filtered by their scores, with different thresholds α for segments and β for links; the thresholds can be set manually for different data, and in practice α = 0.9 and β = 0.7 can be used for multi-oriented text detection, α = 0.9 and β = 0.5 for multi-lingual long-text detection, and α = 0.6 and β = 0.3 for horizontal text detection. The filtered segments s' are taken as nodes and the filtered links N_s' as edges to build a graph.
(3.2) Perform a depth-first search on the graph to find its connected components; each component, denoted as a set S, contains the segments joined together by links.
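The following Python sketch illustrates steps (3.1) and (3.2): filtering segments and links by the thresholds α and β, building the graph and collecting connected components with an iterative depth-first search. The data layout (per-segment scores, links given as pairs of segment indices) is an assumption.

```python
def connected_segments(seg_scores, link_scores, links, alpha=0.9, beta=0.7):
    """seg_scores: list of segment scores; links: list of (i, j) segment-index
    pairs; link_scores: one score per link. Returns lists of segment indices,
    one list per connected component (one component per text instance)."""
    keep = {i for i, s in enumerate(seg_scores) if s >= alpha}
    adj = {i: [] for i in keep}
    for (i, j), s in zip(links, link_scores):
        if s >= beta and i in keep and j in keep:
            adj[i].append(j)
            adj[j].append(i)
    components, seen = [], set()
    for start in keep:                      # iterative depth-first search
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            comp.append(node)
            stack.extend(adj[node])
        components.append(comp)
    return components
```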
(3.3) Combine the segment set S obtained by the depth-first search of step (3.2) into one complete word by the following steps (a code sketch of this combining procedure is given after step (3.3.9)), including:
(3.3.1) Input: the segment set S, where |S| is the number of segments in S, s(i) is the i-th segment, i is the index, x_s(i) and y_s(i) are the centre abscissa and ordinate of the i-th segment bounding box s(i), w_s(i) and h_s(i) are its width and height, and θ_s(i) is its angle to the horizontal;
(3.3.2) θ_b := (1/|S|) Σ_i θ_s(i), where θ_b is the deviation angle of the output bounding box and θ_s(i) is the deviation angle of the i-th segment bounding box in the set, so θ_b is the mean deviation angle of all segments in S;
(3.3.3) find the intercept b of the straight line y = tan(θ_b)·x + b that minimises the sum of the distances from the centre points (x_s(i), y_s(i)) of all segments in S to the line;
(3.3.4) find the two end points (x_p, y_p) and (x_q, y_q) of the line, i.e. the two extreme projections of the segment centres onto the line, where p denotes the first end point and q the second; x_p, y_p are the abscissa and ordinate of the first end point and x_q, y_q those of the second;
(3.3.5) (x_b, y_b) := ((x_p + x_q)/2, (y_p + y_q)/2), where b denotes the output bounding box and x_b, y_b are the abscissa and ordinate of its centre;
(3.3.6) w_b := sqrt((x_q - x_p)^2 + (y_q - y_p)^2) + (w_p + w_q)/2, where w_b is the width of the output bounding box and w_p, w_q are the widths of the segment bounding boxes centred at the points p and q respectively;
(3.3.7) h_b := (1/|S|) Σ_i h_s(i), where h_b is the height of the output bounding box and h_s(i) is the height of the i-th segment bounding box in the set, so h_b is the mean height of all segments in S;
(3.3.8) b := (x_b, y_b, w_b, h_b, θ_b); b is the output bounding box, represented by its centre coordinates, size parameters and angle parameter;
(3.3.9) output the combined bounding box b.
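Below is a NumPy sketch of the combining procedure (3.3.1) to (3.3.9) referred to above; it uses a least-squares intercept in step (3.3.3) and assumes angles are in radians and not close to ±90 degrees, so it is an illustrative simplification rather than the exact patented computation.

```python
import numpy as np


def combine_segments(segments):
    """segments: array of shape (n, 5) with rows (x, y, w, h, theta);
    returns the combined bounding box (x_b, y_b, w_b, h_b, theta_b)."""
    segs = np.asarray(segments, dtype=float)
    xs, ys, ws, hs, thetas = segs.T
    theta_b = thetas.mean()                                    # (3.3.2)
    # (3.3.3) least-squares intercept of the line y = tan(theta_b) * x + b
    b = np.mean(ys - np.tan(theta_b) * xs)
    # (3.3.4) project the segment centres onto the line, take the two extremes
    d = np.array([np.cos(theta_b), np.sin(theta_b)])           # unit direction
    p0 = np.array([0.0, b])                                    # a point on the line
    t = (np.stack([xs, ys], axis=1) - p0) @ d
    (xp, yp), (xq, yq) = p0 + t.min() * d, p0 + t.max() * d
    x_b, y_b = (xp + xq) / 2, (yp + yq) / 2                    # (3.3.5)
    w_p, w_q = ws[t.argmin()], ws[t.argmax()]
    w_b = np.hypot(xq - xp, yq - yp) + (w_p + w_q) / 2         # (3.3.6)
    h_b = hs.mean()                                            # (3.3.7)
    return x_b, y_b, w_b, h_b, theta_b                         # (3.3.8)-(3.3.9)
```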
Through the technical scheme conceived above, and compared with the prior art, the present invention has the following technical effects:
(1) Multi-oriented text can be detected. Text in natural scene images is often arbitrarily oriented or deformed; in the method of the invention a text region is described locally by segment bounding boxes, and a segment bounding box can take any orientation, so multi-oriented or deformed text can be covered.
(2) High flexibility. The method can also detect text lines of arbitrary length, because combining segments relies only on the predicted links; both words and text lines can therefore be detected.
(3) Strong robustness. The method uses segment bounding boxes for local description; such a local description can overcome complex natural image backgrounds and capture text features from the image.
(4) High efficiency. The segment detection model is trained end to end and processes more than 20 images of size 512 × 512 per second, because segments and links are obtained in a single forward pass of the fully convolutional CNN model, with no need to scale or rotate the input image offline.
(5) High versatility. Some non-Latin scripts, such as Chinese, have no spaces between adjacent words; existing techniques can only detect words and do not apply to such text, because text without spaces provides no visual cue for separating individual words. Besides Latin text, the present invention can also detect long lines of non-Latin text, because the method does not rely on spaces for visual separation.
Brief description of the drawings
Fig. 1 is the flow chart of the multi-oriented text detection method in natural images based on segment linking according to the present invention;
Fig. 2 is a schematic diagram of computing the ground-truth parameters of a segment according to the present invention;
Fig. 3 is a schematic diagram of the output composition of the convolutional predictor according to the present invention;
Fig. 4 is the network connection diagram of the segment-link detection model according to the present invention;
Fig. 5 shows, for one embodiment of the invention, the segments and links detected on an image to be detected with the trained segment-link detection network model and the resulting output bounding boxes.
Specific embodiment
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it. In addition, the technical features involved in the embodiments described below can be combined with each other as long as they do not conflict with each other.
The technical terms of the invention are first explained and illustrated:
Convolutional Neural Network (CNN): a neural network that can be used for tasks such as image classification and regression. The network generally consists of convolutional layers, down-sampling layers and fully connected layers; the convolutional and down-sampling layers extract image features, while the fully connected layers perform classification or regression. The parameters of the network include the convolution kernels and the parameters and biases of the fully connected layers, and they can be learned from data with the back-propagation algorithm.
VGG16: VGGNet was the runner-up of ILSVRC 2014. It contains 16 CONV/FC layers, has a very uniform architecture that uses only 3 × 3 convolutions and 2 × 2 pooling layers from beginning to end, and has become a classic convolutional neural network model. Its pre-trained model is available for plug-and-play use in Caffe. It demonstrated that network depth is a key component of good performance.
Depth-First Search (DFS): an algorithm for traversing or searching a tree or graph. It traverses the nodes of the tree along its depth, searching the branches of the tree as deep as possible. When all edges at a node v have been explored, the search backtracks to the start node of the edge from which v was discovered, and this process continues until all nodes reachable from the source node have been found. If undiscovered nodes remain, one of them is selected as a new source node and the process is repeated until all nodes have been visited. DFS is a classic algorithm of graph theory; it can be used to produce a topological ordering of the target graph, with which many related graph problems, such as longest-path problems, can be conveniently solved.
As shown in Fig. 1, in one embodiment the multi-oriented text detection method of the present invention based on segment linking is carried out according to steps (1) to (3) described above: the segment-link detection network model is trained as in step (1), segments and links are detected on the images to be detected as in step (2), and the detected segments are combined into output bounding boxes as in step (3); Fig. 5 shows the detection results obtained in this way.

Claims (7)

1. A multi-oriented text detection method in natural images based on linking text segments, characterised in that the method comprises the following steps:
(1) training a segment-link detection network model, including the following sub-steps:
(1.1) annotating the text content of all text images in a training image set at entry level, each label being the four corner coordinates of the initial rectangular bounding box of an entry, to obtain a training data set;
(1.2) defining a segment-link detection network model that predicts segments and links from the entry-level labels, the segment-link detection network model consisting of a cascaded convolutional neural network and convolutional predictors; computing segment and link labels from the above training data set, designing a loss function, and training the segment-link detection network by back-propagation combined with online augmentation and an online hard negative mining method, to obtain the segment-link detection network model;
(2) detecting segments and links on a text image to be detected with the trained segment-link detection network model, including the following sub-steps:
(2.1) detecting segments on the text image to be detected, the feature maps output by different convolutional layers predicting segments of different scales and the feature map output by the same convolutional layer predicting segments of the same scale;
(2.2) detecting links between the segments detected on all feature layers of the text image to be detected, the links including within-layer links and cross-layer links;
(2.3) combining the confidence scores of the detected segments and the confidence scores of the links, a segment confidence score comprising the positive/negative class score of the segment and its offset score, and outputting softmax-normalised scores with the convolutional predictors;
(3) combining segments and links to obtain output bounding boxes, including the following sub-steps:
(3.1) filtering the segments and links output by the convolutional predictors according to the normalised scores obtained in (2.3), and building a connection graph with the filtered segments as nodes and the links as edges;
(3.2) performing a depth-first search on the graph to find its connected components, each component being denoted as a set S and containing the segments joined together by links;
(3.3) combining the segments in each set into one complete entry, computing the bounding box of the complete entry and outputting it.
2. according to claim 1 based on multi-direction Method for text detection, its feature in the natural picture for connecting word section It is that the step (1.2) is specially:
(1.2.1) builds word section detection convolutional neural networks model:The preceding several layers of convolution units for extracting feature come from pre-training VGG-16 networks, preceding several layers of convolution units are respectively converted into for convolutional layer 1 to pond layer 5, full articulamentum 6 and full articulamentum 7 Convolutional layer 6 and convolutional layer 7, it is behind some extra convolutional layers for adding to connect, and the feature for extracting more depth is carried out Detection, including convolutional layer 8, convolutional layer 9, convolutional layer 10, last layer is convolutional layer 11;6 different convolutional layer difference are defeated afterwards Go out various sizes of characteristic pattern, be easy to extract the high-quality characteristics of various yardsticks, detection word section and connection are at this six Carried out on various sizes of characteristic pattern;For this 6 convolutional layers, the wave filter that size is 3 × 3 is all added after each layer and is made It is convolution fallout predictor, to detect word section and connection jointly;
(1.2.2) produces word section bounding box label from the word bounding box of mark:For original training image collection Itr, note scaling Training image collection afterwards is Itr ', wI、hIThe respectively width of Itr ' and height, with the i-th pictures Itri' as mode input, ItriAll word bounding boxs of ' upper mark are denoted as Wi=[Wi1..., Wip], wherein WijFor j-th word on the i-th pictures is surrounded Box, word bounding box is word level or entry rank, and j=1 ..., p, p are ItriThe total quantity of ' upper word bounding box;6 after note The characteristic pattern that layer convolutional layer is exported respectively constitutes set Itroi'=[Itroi1' ..., Itroi6'], wherein Itroil' it is rear 6 The l layers of characteristic pattern of output, w in layer convolutional layerl、hlThe respectively width and height of this feature figure, Itroil' on coordinate (x, Y) correspondence Itri' on (xa, ya) centered on point coordinates the initial bounding box B of levelilq, they meet following equation:
The width and height of an initial bounding box Bilq are both set to a constant al, which controls the scale of the output word segments, l = 1, ..., 6; the set of initial bounding boxes corresponding to the feature map Itroil′ output by the l-th layer is denoted Bil = [Bil1, ..., Bilm], q = 1, ..., m, where m is the number of initial bounding boxes on the l-th output feature map; if the centre of an initial bounding box Bilq is contained in any annotated word bounding box Wij on Itr′, and the size al of Bilq and the height h of that word bounding box Wij satisfy max(al / h, h / al) ≤ 1.5, then this initial bounding box Bilq is labelled as the positive class, its label value is 1, and it is matched to the word bounding box Wij whose height is closest; otherwise, when Bilq satisfies neither of the above two conditions for any word bounding box in Wi, Bilq is labelled as the negative class and its label value is 0; word segments are generated on the initial bounding boxes and share the label class of their initial bounding box;
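The following numpy sketch illustrates this assignment: a feature-map cell is mapped back to an image-space default box of size al and labelled positive when its centre falls inside a ground-truth word box of compatible height. The helper names, the box representation and the height-ratio test written as 1.5 are assumptions made for illustration.

```python
# Hedged sketch of (1.2.2): map a feature-map cell (x, y) on layer l back to an
# image-space default box of size a_l, then label it positive when its centre
# falls inside a ground-truth word box of compatible height.
import numpy as np

def default_box_center(x, y, w_l, h_l, w_img, h_img):
    """Image coordinates (xa, ya) of cell (x, y) on a w_l x h_l feature map."""
    xa = (x + 0.5) * w_img / w_l
    ya = (y + 0.5) * h_img / h_l
    return xa, ya

def label_default_box(xa, ya, a_l, word_boxes):
    """word_boxes: list of dicts with keys cx, cy, w, h, theta (annotated boxes W_ij).

    Returns (label, matched_box): label 1 with the box whose height best fits
    a_l, or (0, None) when no box satisfies both tests.
    """
    best, best_ratio = None, np.inf
    for wb in word_boxes:
        # rotate the centre into the word box's own frame to test containment
        dx, dy = xa - wb["cx"], ya - wb["cy"]
        c, s = np.cos(-wb["theta"]), np.sin(-wb["theta"])
        u, v = c * dx - s * dy, s * dx + c * dy
        inside = abs(u) <= wb["w"] / 2 and abs(v) <= wb["h"] / 2
        ratio = max(a_l / wb["h"], wb["h"] / a_l)   # height-compatibility test (assumed 1.5)
        if inside and ratio <= 1.5 and ratio < best_ratio:
            best, best_ratio = wb, ratio
    return (1, best) if best is not None else (0, None)
```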
(1.2.3) generating word segments on the labelled initial bounding boxes produced in step (1.2.2) and computing the offsets of the positive word segments: a negative word-segment bounding box s− is simply the negative initial bounding box B−; a positive word-segment bounding box s+ is obtained from the positive initial bounding box B+ by the following steps: a) let θs be the angle between the annotated word bounding box W matched to B+ and the horizontal direction, and rotate W clockwise by θs about the centre of B+; b) crop W, removing the parts extending beyond the left and right sides of B+; c) rotate the cropped word bounding box W′ counter-clockwise by θs about the centre of B+, obtaining the ground-truth geometric parameters xs, ys, ws, hs, θs of the word segment s+; d) compute the offsets (Δxs, Δys, Δws, Δhs, Δθs) of s+ relative to B+, the quantities being related as follows:
xs = al · Δxs + xa
ys = al · Δys + ya
ws = al · exp(Δws)
hs = al · exp(Δhs)
θs = Δθs
where xs, ys, ws, hs, θs are respectively the centre abscissa, centre ordinate, width, height and angle with the horizontal direction of the word-segment bounding box s+; xa, ya, wa, ha are respectively the centre abscissa, centre ordinate, width and height of the horizontal initial bounding box B+; and Δxs, Δys, Δws, Δhs, Δθs are respectively the offset of the centre abscissa xs of s+ relative to the initial bounding box B+, the offset of the ordinate ys relative to the initial bounding box, the offset of the width ws, the offset of the height hs and the offset of the angle θs;
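Inverting the relations above gives the encoding used to produce the regression targets; a small numpy sketch of both directions, with illustrative function names:

```python
# Sketch of the offset encoding/decoding in (1.2.3): the segment geometry
# (xs, ys, ws, hs, thetas) is expressed relative to the default box centre
# (xa, ya) and scale a_l, following the formulas above.
import numpy as np

def encode_offsets(seg, xa, ya, a_l):
    xs, ys, ws, hs, ts = seg
    return np.array([(xs - xa) / a_l,      # delta_x
                     (ys - ya) / a_l,      # delta_y
                     np.log(ws / a_l),     # delta_w
                     np.log(hs / a_l),     # delta_h
                     ts])                  # delta_theta

def decode_offsets(offsets, xa, ya, a_l):
    dx, dy, dw, dh, dt = offsets
    return (a_l * dx + xa,                 # xs
            a_l * dy + ya,                 # ys
            a_l * np.exp(dw),              # ws
            a_l * np.exp(dh),              # hs
            dt)                            # thetas
```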
(1.2.4) computing connection labels for the word-segment bounding boxes produced in step (1.2.3): since word segments s are generated on the initial bounding boxes B, the connection labels between segments are identical to the connection labels between their corresponding initial bounding boxes B; for the feature-map set Itroi′ = [Itroi1′, ..., Itroi6′], if within the initial bounding-box set Bil of the same feature map Itroil′ two initial bounding boxes are both labelled positive and match the same word, the within-layer connection between them is labelled positive, otherwise it is labelled negative; if an initial bounding box in the set Bil corresponding to the feature map Itroil′ and an initial bounding box in the set Bi(l−1) corresponding to Itroi(l−1)′ are both labelled positive and match the same word bounding box, the cross-layer connection between them is labelled positive, otherwise it is labelled negative;
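A hedged numpy sketch of this label derivation, assuming each detection layer's segment labels are stored as a map of matched word indices (−1 for negative cells); the array layout and function names are assumptions:

```python
# Sketch of (1.2.4): derive within-layer and cross-layer connection (link)
# labels from the per-cell segment labels.  labels_l is an (h_l, w_l) array of
# matched word indices, -1 for negative cells.
import numpy as np

def within_layer_link_labels(labels_l):
    h, w = labels_l.shape
    out = np.zeros((h, w, 8), dtype=np.int64)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    for y in range(h):
        for x in range(w):
            for k, (dy, dx) in enumerate(offsets):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    same = labels_l[y, x] >= 0 and labels_l[y, x] == labels_l[ny, nx]
                    out[y, x, k] = 1 if same else 0
    return out

def cross_layer_link_labels(labels_l, labels_prev):
    """labels_prev: map of layer l-1, twice the resolution of layer l."""
    h, w = labels_l.shape
    out = np.zeros((h, w, 4), dtype=np.int64)
    for y in range(h):
        for x in range(w):
            for k, (dy, dx) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
                ny, nx = 2 * y + dy, 2 * x + dx
                if ny < labels_prev.shape[0] and nx < labels_prev.shape[1]:
                    same = labels_l[y, x] >= 0 and labels_l[y, x] == labels_prev[ny, nx]
                    out[y, x, k] = 1 if same else 0
    return out
```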
(1.2.5) taking the scaled training image set Itr′ as the input of the word-segment detection model and outputting the predicted word segments s: the model weights and biases are initialized, the learning rate is set to 10^-3 for the first 60,000 training iterations and then decays to 10^-4; for the last 6 convolutional layers, at coordinate (x, y) of the l-th feature map Itroil′, which corresponds on the input image Itri′ to the initial bounding box Bilq centred at (xa, ya) with size al, the 3 × 3 convolutional predictor predicts the score cs of Bilq being of the positive or negative class, cs being a two-dimensional vector whose values lie between 0 and 1; it simultaneously predicts 5 numbers as the predicted geometric offsets when the box is classified as a positive word segment s+, namely the offsets of the centre abscissa, centre ordinate, width, height and angle of the predicted word-segment bounding box relative to the positive initial bounding box B+;
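For the learning-rate schedule just described, a small PyTorch sketch; the optimiser type and momentum value are assumptions, only the rate values and the 60,000-iteration switch point come from the claim:

```python
# Learning rate 1e-3 for the first 60,000 iterations, then decayed to 1e-4.
import torch

def make_optimizer_and_scheduler(model):
    opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
    # MultiStepLR with gamma=0.1 drops 1e-3 -> 1e-4 at iteration 60,000
    sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[60_000], gamma=0.1)
    return opt, sched

# usage: call sched.step() once per training iteration, after opt.step()
```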
(1.2.6) predicting within-layer and cross-layer connections on the basis of the predicted word segments: for within-layer connections, at a coordinate point (x, y) of the same feature map Itroil′, the neighbouring points (x′, y′) in the range x − 1 ≤ x′ ≤ x + 1, y − 1 ≤ y′ ≤ y + 1 are taken; when these 8 points are mapped to the input image Itri′, the 8 within-layer neighbour word segments s(x′, y′, l) connected to the reference word segment s(x, y, l) corresponding to (x, y) are obtained, and they can be represented as the set
{ s(x′, y′, l) : x − 1 ≤ x′ ≤ x + 1, y − 1 ≤ y′ ≤ y + 1, (x′, y′) ≠ (x, y) };
the 3 × 3 convolutional predictor predicts the positive/negative scores cl1 of the connections between s(x, y, l) and its within-layer neighbour set, where cl1 is a 16-dimensional vector (two scores for each of the 8 neighbours) and the subscript w denotes a within-layer connection;
for cross-layer connections, a cross-layer connection joins the word segments corresponding to two points on the feature maps output by two consecutive convolutional layers; since the width and height of the feature map are halved by every convolutional layer, the width wl and height hl of the l-th output feature map Itroil′ are half of the width wl−1 and height hl−1 of the (l − 1)-th feature map Itroi(l−1)′, and the initial bounding-box scale al corresponding to Itroil′ is twice the scale al−1 corresponding to Itroi(l−1)′; for a point (x, y) on the l-th output feature map Itroil′, the 4 cross-layer neighbour points (x′, y′) in the range 2x ≤ x′ ≤ 2x + 1, 2y ≤ y′ ≤ 2y + 1 are taken on the feature map Itroi(l−1)′; the initial bounding box to which (x, y) on Itroil′ corresponds on the input image Itri′ spatially overlaps the 4 initial bounding boxes to which the 4 cross-layer neighbour points correspond on Itri′, and the 4 cross-layer neighbour word segments s(x′, y′, l−1) can be represented as the set
{ s(x′, y′, l−1) : 2x ≤ x′ ≤ 2x + 1, 2y ≤ y′ ≤ 2y + 1 };
the 3 × 3 convolutional predictor predicts the positive/negative scores cl2 of the cross-layer connections between the reference word segment s(x, y, l) on layer l and its neighbour word-segment set on layer l − 1, where cl2 is an 8-dimensional vector (two scores for each of the 4 cross-layer neighbours);
that is, the predictor predicts the positive/negative scores of the connections between s(x, y, l) and all 4 of its cross-layer neighbour word segments, the subscript c indicating a cross-layer connection;
all within-layer connections and all cross-layer connections together constitute the connection set Ns;
(1.2.7) taking the word-segment labels, the true offsets of the positive word segments and the connection labels obtained in steps (1.2.3) and (1.2.4) as the reference output, and the word-segment classes and scores and word-segment offsets predicted in step (1.2.5) together with the connection scores predicted in step (1.2.6) as the predicted output, an objective loss function between the predicted output and the reference output is designed, and the word-segment/connection detection model is trained continually with back-propagation so as to minimize the word-segment classification loss, the word-segment offset regression loss and the connection classification loss; the objective loss function designed for the word-segment/connection detection model is a weighted sum of these three losses:
L(ys, cs, yl, cl, ŝ, s) = (1 / ns) · Lconf(ys, cs) + λ1 · (1 / ns) · Lloc(ŝ, s) + λ2 · (1 / nl) · Lconf(yl, cl);
where ys is the label vector of all word segments, cs is the predicted word-segment score, yl is the connection label and cl is the predicted connection score, composed of the within-layer connection score cl1 and the cross-layer score cl2; if the i-th initial bounding box is labelled positive then ys(i) = 1, otherwise 0; Lconf(ys, cs) is the softmax loss of the predicted word-segment scores cs, Lconf(yl, cl) is the softmax loss of the predicted connection scores cl, and Lloc(ŝ, s) is the smooth L1 regression loss between the predicted word-segment geometric parameters ŝ and the ground-truth parameters s; ns is the number of positive initial bounding boxes and is used to normalize the word-segment classification and regression losses; nl is the total number of positive connections and is used to normalize the connection classification loss; λ1 and λ2 are weighting constants;
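A hedged PyTorch sketch of this objective, assuming the predictions and labels have been flattened across all locations and layers; tensor shapes, default weights and the helper name are assumptions:

```python
# Weighted objective of (1.2.7): softmax classification losses for segments and
# connections plus a Smooth-L1 regression loss on the positive segments'
# offsets, normalised by n_s and n_l.
import torch
import torch.nn.functional as F

def segment_link_loss(seg_logits, seg_labels, offsets_pred, offsets_gt,
                      link_logits, link_labels, lambda1=1.0, lambda2=1.0):
    """seg_logits: (N, 2); seg_labels: (N,) long; offsets_*: (N, 5);
    link_logits: (M, 2); link_labels: (M,) long.  lambda values are illustrative."""
    pos = seg_labels == 1
    n_s = pos.sum().clamp(min=1).float()
    n_l = (link_labels == 1).sum().clamp(min=1).float()

    l_seg_cls = F.cross_entropy(seg_logits, seg_labels, reduction="sum") / n_s
    l_link_cls = F.cross_entropy(link_logits, link_labels, reduction="sum") / n_l
    l_reg = F.smooth_l1_loss(offsets_pred[pos], offsets_gt[pos], reduction="sum") / n_s

    return l_seg_cls + lambda1 * l_reg + lambda2 * l_link_cls
```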
(1.2.8) during the training of step (1.2.7), the training data Itr are augmented online, and positive and negative samples are balanced with an online hard negative mining strategy; before the training pictures Itr are scaled to the same size and loaded in batches, they are randomly cropped into image patches, each of which must have at least a minimum Jaccard overlap coefficient o with the ground-truth word-segment bounding boxes; for multi-oriented words, the data augmentation is carried out on the minimum enclosing rectangle of the multi-oriented word bounding box; the overlap coefficient o of each sample is chosen randomly from 0, 0.1, 0.3, 0.5, 0.7 and 0.9, and the size of each image patch is between 0.1 and 1 times the original image size; training images are not flipped horizontally; in addition, because negative samples of word segments and connections account for the great majority of the training samples, positive and negative samples are balanced with an online hard negative mining strategy, word segments and connections being mined separately and the ratio of negative to positive samples being kept no larger than 3 : 1.
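A small PyTorch sketch of the online hard negative mining described above; it keeps every positive sample and only the highest-loss negatives, capping negatives at three times the positives. The helper name and the way losses are passed in are assumptions, and the same helper would be applied to segments and to connections separately:

```python
# Online hard negative mining of (1.2.8): keep all positives plus the
# highest-loss negatives, with a negative:positive ratio of at most 3:1.
import torch

def hard_negative_mask(per_sample_loss, labels, neg_pos_ratio=3):
    """per_sample_loss: (N,) unreduced classification loss; labels: (N,) 0/1."""
    pos = labels == 1
    num_neg = int(neg_pos_ratio * max(int(pos.sum()), 1))
    num_neg = min(num_neg, int((labels == 0).sum()))
    neg_loss = per_sample_loss.clone()
    neg_loss[pos] = -1.0                  # exclude positives from the ranking
    _, idx = neg_loss.topk(num_neg)       # hardest negatives
    keep = pos.clone()
    keep[idx] = True
    return keep                           # boolean mask of samples used in the loss
```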
3. The multi-direction text detection method in natural images based on connecting word segments according to claims 1 and 2, characterized in that step (2.1) specifically comprises:
scaling the i-th text image Itsti of the image set Itst to be detected to a uniform size, the specific size being set manually according to the situation of the text images to be detected, the scaled text image being denoted Itsti′; feeding the image Itsti′ into the word-segment/connection detection model trained in step (1.2) to obtain the set Itstoi′ = [Itstoi1′, ..., Itstoi6′] formed by the feature maps output by the last 6 convolutional layers, where Itstoil′ is the feature map output by the l-th of the last 6 convolutional layers, l = 1, ..., 6; at every coordinate (x, y) of each output feature map Itstoil′, the 3 × 3 convolutional predictor predicts the score cs that the initial bounding box Bilq corresponding to (x, y) is a positive or negative word segment, and simultaneously predicts 5 numbers as the geometric offsets when it is predicted to be a positive word segment s+.
4. The multi-direction text detection method in natural images based on connecting word segments according to claims 1 and 2, characterized in that step (2.2) specifically comprises:
predicting within-layer and cross-layer connections on the basis of the word segments predicted in (2.1): for within-layer connections, at a coordinate point (x, y) of the same feature map Itstoil′, the 3 × 3 convolutional predictor predicts the positive/negative within-layer connection scores cl1 between s(x, y, l) and its 8 neighbour word segments; for cross-layer connections, the 3 × 3 convolutional predictor predicts the positive/negative cross-layer connection scores cl2 between the reference word segment s(x, y, l) on layer l and its 4 neighbour word segments on layer l − 1; cl1 and cl2 constitute the predicted connection score cl.
5. The multi-direction text detection method in natural images based on connecting word segments according to claims 1 and 2, characterized in that step (2.3) specifically comprises:
according to the results of steps (2.1) and (2.2), at each coordinate (x, y) of every feature map Itstoil′, concatenating the predicted word-segment score cs, the word-segment offsets, the within-layer connection score cl1 and the cross-layer connection score cl2 into one 33-dimensional vector, and appending an extra softmax layer after the output channels of the convolutional predictor to normalize the word-segment scores and the connection scores respectively.
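The following numpy sketch shows one way such a prediction map can be split and normalized; the channel layout used here (2 class scores + 5 offsets + 16 within-layer + 8 cross-layer connection scores) is an assumption for illustration and need not match the claimed vector length:

```python
# Split one predictor output into segment scores, offsets and connection
# scores, and softmax-normalize every score pair.
import numpy as np

def softmax_pairs(x):
    """Softmax over the last axis of (..., 2) score pairs."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def normalize_prediction(pred):
    """pred: (C, H, W) raw output of one 3x3 convolutional predictor."""
    pred = np.moveaxis(pred, 0, -1)                  # (H, W, C)
    seg_scores   = softmax_pairs(pred[..., 0:2])
    offsets      = pred[..., 2:7]
    within_links = softmax_pairs(pred[..., 7:23].reshape(*pred.shape[:2], 8, 2))
    cross_links  = softmax_pairs(pred[..., 23:31].reshape(*pred.shape[:2], 4, 2))
    return seg_scores, offsets, within_links, cross_links
```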
6. The multi-direction text detection method in natural images based on connecting word segments according to claims 1 and 2, characterized in that step (3.1) specifically comprises:
filtering, by their scores, the fixed number of word segments s and connections Ns produced when the text image to be detected is fed into the word-segment detection model in step (2); setting different filtering thresholds for the word segments s and the connections Ns, namely α and β respectively; and building a graph with the filtered word segments s′ as nodes and the filtered connections Ns′ as edges.
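A plain-Python sketch of this filtering and graph construction, together with the depth-first search of step (3.2); the thresholds, data layout and function names are assumptions:

```python
# Steps (3.1)-(3.2): keep segments with score >= alpha and connections with
# score >= beta, build a graph, and extract connected components by DFS.
def connected_segment_sets(seg_scores, link_scores, links, alpha=0.9, beta=0.7):
    """seg_scores: {seg_id: score}; links: list of (seg_id_a, seg_id_b);
    link_scores: list of scores aligned with `links`."""
    nodes = {i for i, s in seg_scores.items() if s >= alpha}
    adj = {i: [] for i in nodes}
    for (a, b), s in zip(links, link_scores):
        if s >= beta and a in nodes and b in nodes:
            adj[a].append(b)
            adj[b].append(a)

    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:                      # iterative depth-first search
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.append(v)
            stack.extend(adj[v])
        components.append(comp)           # each comp is one word/text-line set S
    return components
```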
7. The multi-direction text detection method in natural images based on connecting word segments according to claims 1 and 2, characterized in that step (3.3) specifically comprises: for a word-segment set S obtained by the depth-first search on the graph in step (3.2), combining its word segments into a complete word through the following steps:
(3.3.1) input: the set S, where |S| is the number of word segments in S, s(i) is the i-th word segment, i is its index, xs(i) and ys(i) are respectively the centre abscissa and ordinate of the word-segment bounding box s(i), ws(i) and hs(i) are respectively the width and height of the word-segment bounding box s(i), and θs(i) is the angle between the word-segment bounding box s(i) and the horizontal direction;
(3.3.2) θb := (1/|S|) · Σi θs(i), where θb is the orientation angle of the output bounding box and θs(i) is the orientation angle of the i-th word-segment bounding box in the set; θb is obtained as the mean orientation angle of all word segments in S;
(3.3.3) find the intercept b of the straight line y = tan(θb) · x + b such that the sum of the distances from the centre points of all word segments in S to the straight line is minimal;
(3.3.4) find the two endpoints (xp, yp) and (xq, yq) on the straight line, p denoting the first endpoint and q the second; xp, yp are respectively the abscissa and ordinate of the first endpoint, and xq, yq those of the second;
(3.3.5) xb := (xp + xq) / 2, yb := (yp + yq) / 2, where b denotes the output bounding box and xb, yb are respectively the abscissa and ordinate of its centre;
(3.3.6) wb := sqrt((xp − xq)^2 + (yp − yq)^2) + (wp + wq) / 2, where wb is the width of the output bounding box and wp, wq are respectively the width of the word-segment bounding box centred at point p and the width of the one centred at point q;
(3.3.7) hb := (1/|S|) · Σi hs(i), where hb is the height of the output bounding box and hs(i) is the height of the i-th word-segment bounding box in the set; hb is obtained as the mean height of all word segments in the word-segment set S;
(3.3.8) b := (xb, yb, wb, hb, θb); b is the output bounding box, represented by its centre coordinates, size parameters and angle parameter;
(3.3.9) output the combined bounding box b.
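A numpy sketch of steps (3.3.1) to (3.3.9); it fits the line of slope tan(θb) through the centroid of the segment centres as a simple stand-in for the distance-minimizing intercept of step (3.3.3), and is otherwise a direct transcription, with illustrative function and variable names:

```python
# Combine the segments of one connected set S into a single oriented box.
import numpy as np

def combine_segments(segments):
    """segments: list of (cx, cy, w, h, theta) tuples for one connected set S."""
    segs = np.asarray(segments, dtype=float)
    cx, cy, w, h, theta = segs.T

    theta_b = theta.mean()                                 # (3.3.2) average angle
    direction = np.array([np.cos(theta_b), np.sin(theta_b)])
    centroid = np.array([cx.mean(), cy.mean()])            # line of slope tan(theta_b)
                                                           # through the centroid (stand-in for 3.3.3)
    # (3.3.4) project centres onto the line and take the extreme projections
    t = (np.stack([cx, cy], axis=1) - centroid) @ direction
    p, q = int(t.argmin()), int(t.argmax())
    end_p = centroid + t[p] * direction
    end_q = centroid + t[q] * direction

    xb, yb = (end_p + end_q) / 2.0                          # (3.3.5) centre
    wb = np.linalg.norm(end_p - end_q) + (w[p] + w[q]) / 2  # (3.3.6) width
    hb = h.mean()                                           # (3.3.7) height
    return xb, yb, wb, hb, theta_b                          # (3.3.8)-(3.3.9)
```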
CN201710010596.7A 2017-01-06 2017-01-06 Multi-direction text detection method in natural images based on connecting word segments Active CN106897732B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710010596.7A CN106897732B (en) 2017-01-06 2017-01-06 Multi-direction text detection method in natural images based on connecting word segments

Publications (2)

Publication Number Publication Date
CN106897732A true CN106897732A (en) 2017-06-27
CN106897732B CN106897732B (en) 2019-10-08

Family

ID=59197865

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710010596.7A Active CN106897732B (en) Multi-direction text detection method in natural images based on connecting word segments

Country Status (1)

Country Link
CN (1) CN106897732B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104050471A (en) * 2014-05-27 2014-09-17 华中科技大学 Natural scene character detection method and system
CN105184312A (en) * 2015-08-24 2015-12-23 中国科学院自动化研究所 Character detection method and device based on deep learning
CN105469047A (en) * 2015-11-23 2016-04-06 上海交通大学 Chinese detection method based on unsupervised learning and deep learning network and system thereof
CN105574513A (en) * 2015-12-22 2016-05-11 北京旷视科技有限公司 Character detection method and device
CN105608456A (en) * 2015-12-22 2016-05-25 华中科技大学 Multi-directional text detection method based on full convolution network
WO2016124103A1 (en) * 2015-02-03 2016-08-11 阿里巴巴集团控股有限公司 Picture detection method and device
CN106156711A (en) * 2015-04-21 2016-11-23 华中科技大学 The localization method of line of text and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YAO, Cong: "Research on text detection and recognition in natural images", China Doctoral Dissertations Full-text Database *

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11030471B2 (en) 2017-09-25 2021-06-08 Tencent Technology (Shenzhen) Company Limited Text detection method, storage medium, and computer device
WO2019057169A1 (en) * 2017-09-25 2019-03-28 腾讯科技(深圳)有限公司 Text detection method, storage medium, and computer device
CN107766860A (en) * 2017-10-31 2018-03-06 武汉大学 Natural scene image Method for text detection based on concatenated convolutional neutral net
CN107977620A (en) * 2017-11-29 2018-05-01 华中科技大学 A kind of multi-direction scene text single detection method based on full convolutional network
CN107977620B (en) * 2017-11-29 2020-05-19 华中科技大学 Multi-direction scene text single detection method based on full convolution network
CN107844785B (en) * 2017-12-08 2019-09-24 浙江捷尚视觉科技股份有限公司 A kind of method for detecting human face based on size estimation
CN107844785A (en) * 2017-12-08 2018-03-27 浙江捷尚视觉科技股份有限公司 A kind of method for detecting human face based on size estimation
CN108304835A (en) * 2018-01-30 2018-07-20 百度在线网络技术(北京)有限公司 character detecting method and device
CN108427924A (en) * 2018-03-09 2018-08-21 华中科技大学 A kind of text recurrence detection method based on rotational sensitive feature
CN108549893B (en) * 2018-04-04 2020-03-31 华中科技大学 End-to-end identification method for scene text with any shape
CN108549893A (en) * 2018-04-04 2018-09-18 华中科技大学 A kind of end-to-end recognition methods of the scene text of arbitrary shape
CN109086663B (en) * 2018-06-27 2021-11-05 大连理工大学 Natural scene text detection method based on scale self-adaption of convolutional neural network
CN109086663A (en) * 2018-06-27 2018-12-25 大连理工大学 The natural scene Method for text detection of dimension self-adaption based on convolutional neural networks
CN109583367A (en) * 2018-11-28 2019-04-05 网易(杭州)网络有限公司 Image text row detection method and device, storage medium and electronic equipment
CN109685718A (en) * 2018-12-17 2019-04-26 中国科学院自动化研究所 Picture quadrate Zoom method, system and device
CN109886286A (en) * 2019-01-03 2019-06-14 武汉精测电子集团股份有限公司 Object detection method, target detection model and system based on cascade detectors
CN109886286B (en) * 2019-01-03 2021-07-23 武汉精测电子集团股份有限公司 Target detection method based on cascade detector, target detection model and system
CN109886264A (en) * 2019-01-08 2019-06-14 深圳禾思众成科技有限公司 A kind of character detecting method, equipment and computer readable storage medium
CN109977997A (en) * 2019-02-13 2019-07-05 中国科学院自动化研究所 Image object detection and dividing method based on convolutional neural networks fast robust
CN110032969A (en) * 2019-04-11 2019-07-19 北京百度网讯科技有限公司 For text filed method, apparatus, equipment and the medium in detection image
CN110032969B (en) * 2019-04-11 2021-11-05 北京百度网讯科技有限公司 Method, apparatus, device, and medium for detecting text region in image
US11537816B2 (en) * 2019-07-16 2022-12-27 Ancestry.Com Operations Inc. Extraction of genealogy data from obituaries
US20230109073A1 (en) * 2019-07-16 2023-04-06 Ancestry.Com Operations Inc. Extraction of genealogy data from obituaries
US20210019569A1 (en) * 2019-07-16 2021-01-21 Ancestry.Com Operations Inc. Extraction of genealogy data from obituaries
CN110490232A (en) * 2019-07-18 2019-11-22 北京捷通华声科技股份有限公司 Method, apparatus, the equipment, medium of training literal line direction prediction model
CN111259764A (en) * 2020-01-10 2020-06-09 中国科学技术大学 Text detection method and device, electronic equipment and storage device
CN111291759A (en) * 2020-01-17 2020-06-16 北京三快在线科技有限公司 Character detection method and device, electronic equipment and storage medium
CN111444674A (en) * 2020-03-09 2020-07-24 稿定(厦门)科技有限公司 Character deformation method, medium and computer equipment
CN111444674B (en) * 2020-03-09 2022-07-01 稿定(厦门)科技有限公司 Character deformation method, medium and computer equipment
CN111967463A (en) * 2020-06-23 2020-11-20 南昌大学 Method for detecting curve fitting of curved text in natural scene
CN111914822A (en) * 2020-07-23 2020-11-10 腾讯科技(深圳)有限公司 Text image labeling method and device, computer readable storage medium and equipment
CN111914822B (en) * 2020-07-23 2023-11-17 腾讯科技(深圳)有限公司 Text image labeling method, device, computer readable storage medium and equipment
CN115620081A (en) * 2022-09-27 2023-01-17 北京百度网讯科技有限公司 Training method of target detection model, target detection method and device

Also Published As

Publication number Publication date
CN106897732B (en) 2019-10-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant