CN108764228A - Method for detecting text targets in an image - Google Patents

Method for detecting text targets in an image

Info

Publication number
CN108764228A
Authority
CN
China
Prior art keywords
layer
frame
bounding box
feature
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810520329.9A
Other languages
Chinese (zh)
Inventor
吕岳
吕淑静
张茹玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiaxing San Suo Intelligent Technology Co Ltd
Original Assignee
Jiaxing San Suo Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiaxing San Suo Intelligent Technology Co Ltd filed Critical Jiaxing San Suo Intelligent Technology Co Ltd
Priority to CN201810520329.9A priority Critical patent/CN108764228A/en
Publication of CN108764228A publication Critical patent/CN108764228A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The present invention provides a method for detecting text targets in an image, belonging to the technical fields of pattern recognition and image processing. It comprises the following steps. Step 1: construct an end-to-end convolutional neural network based on feature-layer fusion to predict targets of different scales in an image. Step 2: from the candidate boxes output by the feature-fusion network, obtain the final text target detection results of the image using a bounding-box fusion algorithm. The image target detection method of the present invention extracts the locations of text targets from natural-scene images, improving the efficiency and accuracy of subsequent target recognition. A feature-fusion neural network based on deep learning predicts the bounding boxes of the targets, and a bounding-box fusion algorithm merges the predicted boxes, so that the locations of text targets in images can be detected effectively.

Description

Method for detecting text targets in an image
Technical field
The invention belongs to the technical fields of pattern recognition and image processing, and in particular relates to a method for detecting text targets in an image.
Background art
With the development of the Internet and multimedia technology, more and more information carriers exist in the form of images. Images contain rich visual information: text, color, shape, pattern, position, and so on; this information helps humans analyze the meaning of a scene. Image-based text target detection is currently widely applied in areas such as license-plate recognition and traffic-sign analysis. However, because images are shot under arbitrary conditions, the text in an image may be deformed, incomplete, blurred, or broken, and such objective factors interfere with the detection of text regions. In addition, the background of a scene image is usually complex, and text and background may share similar textures, which further increases the difficulty of text target detection. Traditional text detection methods require hand-crafted feature selection for the text targets and a large number of heuristic rules to obtain text positions, with limited effect.
Summary of the invention
The object of the present invention is to provide a deep-learning-based method for detecting text targets in an image, so as to solve the problem of locating text targets in images. The invention predicts the positions and confidences of text target objects with a neural network, then merges all the output candidate boxes with a candidate-box fusion algorithm to obtain the final bounding boxes of the image targets, i.e., the image target detection result.
The technical solution adopted by the present invention to solve the technical problem is as follows:
A text target detection method for images that combines a feature-fusion network with a bounding-box fusion algorithm, comprising the following steps:
First, design an end-to-end convolutional neural network with multiple output layers of strong representational power. Different output layers of the network predict target objects of different scales: high-level output layers predict large-scale target objects, and low-level output layers predict small-scale target objects. The output layers of the network produce the positions and confidences of the target objects, yielding a series of candidate bounding boxes.
Then post-process the candidate text boxes output by the neural network: by merging multiple candidate bounding boxes, the optimal detected positions of the target objects are obtained.
Further, constructing the feature-layer-fusion convolutional neural network for detecting the positions of text targets comprises the following steps:
(1) Build a feed-forward convolutional neural network whose front end is VGG-16, in which the last two fully connected layers are replaced by convolutional layers; after the front-end structure, additional convolutional and pooling layers are appended.
(2) Insert a deconvolution layer between the highest feature layer and each of the other feature layers. The deconvolution operation in the deconvolution layer, similar to bilinear interpolation, selectively enlarges a feature map so that the feature-map scale of the highest feature layer reaches the size of the low-level scale. The output feature-map size of a deconvolution layer is
o = s · (i − 1) + k − 2p
where i is the size of the input feature map of the deconvolution layer, k the kernel size, s the stride, and p the padding. For example, with i = 10, k = 2, s = 2 and p = 0, the output size is o = 20. Given the sizes of the input and output feature maps, the high-level feature layer can be brought to the same size as a low-level feature map by setting the corresponding deconvolution parameters.
(3) Fuse the deconvolved feature map with the feature map of the low-level feature layer by element-wise product to obtain a new feature layer. The new feature layer serves as an output layer that produces the positions and confidences of the target objects. The element-wise product of two feature maps is equivalent to the Hadamard product of two matrices, multiplying corresponding elements.
(4) Define a series of fixed-size default boxes on each output layer; the output layer produces the text confidence and the offset coordinates relative to the default boxes. Suppose the sizes of the image and of the feature map are (wim, him) and (wmap, hmap) respectively, and that position (i, j) in the feature map corresponds to a default box b0 = (x0, y0, w0, h0). The output of the output layer is (Δx, Δy, Δw, Δh, c), where (Δx, Δy, Δw, Δh) are the offset coordinates of the predicted text bounding box relative to the default box and c is the text confidence. The predicted text bounding box is b = (x, y, w, h), where:
x = x0 + w0·Δx
y = y0 + h0·Δy
w = w0·exp(Δw)
h = h0·exp(Δh)
Here x and y are the coordinates of the upper-left corner of the predicted text box, and w and h are its width and height.
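For illustration, this default-box decoding can be sketched in a few lines of Python; the function name decode_box is hypothetical, and the exponential width/height form follows the equations above:

```python
import math

def decode_box(default_box, offsets):
    """Decode a predicted text box from a default box and predicted offsets.

    default_box: (x0, y0, w0, h0) -- upper-left corner plus width and height.
    offsets: (dx, dy, dw, dh) -- offsets produced by an output layer.
    """
    x0, y0, w0, h0 = default_box
    dx, dy, dw, dh = offsets
    x = x0 + w0 * dx          # the shift is scaled by the default-box size
    y = y0 + h0 * dy
    w = w0 * math.exp(dw)     # width and height are encoded logarithmically
    h = h0 * math.exp(dh)
    return (x, y, w, h)

# Example: a 100x40 default box with small predicted offsets.
print(decode_box((50, 80, 100, 40), (0.1, -0.05, 0.2, 0.0)))
```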
For the feature-layer-fusion neural network, a sampling strategy is set to select positive and negative samples (a code sketch follows this list). The specific steps are:
(1) Generate default boxes on the feature map of each output layer in sliding-window fashion: a feature map of size N × N has N × N feature points, and, according to the aspect ratios of the target objects, each feature point corresponds to six default boxes of different aspect ratios:
ar = {a1, a2, a3, a4, a5, a6}
(2) Establish the relationship between the ground-truth boxes of the target objects in the image and the default boxes, and label the default boxes, using the Jaccard overlap as the matching index: the higher the Jaccard overlap, the more similar the samples and the better the match. Given a default box A and a ground-truth box B, the Jaccard overlap of the default box and the ground-truth box is the ratio of the intersection area to the union area of A and B:
J(A, B) = area(A ∩ B) / area(A ∪ B)
Default boxes whose Jaccard overlap is greater than or equal to 0.5 are taken as matched default boxes, and those whose Jaccard overlap is below 0.5 as unmatched default boxes. Matched default boxes serve as positive samples and unmatched ones as negative samples.
(3) After the samples are labeled, sort the negative default boxes by confidence loss and select the default boxes with the highest confidence-loss values as the negative samples for network training, keeping the ratio of positive to negative training samples at 1:3.
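A minimal sketch of this matching and hard-negative-mining strategy, assuming boxes are (x, y, w, h) tuples and per-box confidence losses are already computed; all names here are illustrative, not from the patent:

```python
def jaccard(a, b):
    """Jaccard overlap of two boxes given as (x, y, w, h) tuples."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def split_samples(default_boxes, gt_boxes, conf_losses, neg_ratio=3):
    """Label default boxes (matched if Jaccard overlap >= 0.5 with any
    ground-truth box) and keep only the hardest negatives at a 1:3
    positive:negative ratio."""
    positives, negatives = [], []
    for idx, d in enumerate(default_boxes):
        if any(jaccard(d, g) >= 0.5 for g in gt_boxes):
            positives.append(idx)
        else:
            negatives.append(idx)
    # Hard negative mining: keep the negatives with the highest confidence loss.
    negatives.sort(key=lambda idx: conf_losses[idx], reverse=True)
    return positives, negatives[: neg_ratio * max(len(positives), 1)]
```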
For the feature-fusion network, the objective function is set as follows:
(1) The target loss function is the weighted sum of the localization loss and the confidence loss:
L(x, c, l, g) = (1/N) · (Lconf(x, c) + α · Lloc(x, l, g))
where x is the matching-result matrix, c the confidence, l the predicted position, g the ground-truth position of the target, and N the number of default boxes matched to ground-truth boxes; the weight coefficient α is set to 1.
(2) The localization loss Lloc is the L2 loss between the predicted position and the ground-truth position of the target, and the confidence loss Lconf is the softmax loss of the two-class classification (text versus non-text).
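A hedged PyTorch sketch of this objective, reading the description as an L2 localization loss plus a two-class softmax loss; the signatures and tensor shapes are assumptions made for illustration:

```python
import torch.nn.functional as F

def detection_loss(pred_loc, gt_loc, pred_conf, labels, alpha=1.0):
    """Weighted sum of localization and confidence losses.

    pred_loc:  (P, 4) predicted offsets for the matched (positive) boxes.
    gt_loc:    (P, 4) encoded ground-truth offsets for the same boxes.
    pred_conf: (M, 2) class scores (non-text / text) for all kept boxes.
    labels:    (M,)  0 for negative samples, 1 for positive samples.
    """
    n = max(pred_loc.size(0), 1)  # N: number of matched default boxes
    loc_loss = F.mse_loss(pred_loc, gt_loc, reduction="sum")         # L2 loss
    conf_loss = F.cross_entropy(pred_conf, labels, reduction="sum")  # softmax loss
    return (conf_loss + alpha * loc_loss) / n
```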
For the feature-fusion network, multiple output layers predict target bounding boxes of different scales. Setting the scale of the object bounding boxes produced by each output layer comprises the following steps (a short sketch follows this list):
(1) Select as the output layers of the network the highest feature layer and the feature layers formed by fusing the highest feature layer with the other feature layers.
(2) Set the size of the default boxes in each output layer; the output layers produce the offset coordinates of the object bounding boxes relative to the default boxes together with confidences, giving candidate object bounding boxes. Suppose the network has m output layers, each corresponding to one feature map; the default-box scale on the k-th feature map is
sk = smin + ((smax − smin) / (m − 1)) · (k − 1),  k ∈ [1, m]
and the width and height of each default box with aspect ratio ar are
wk = sk · √ar,  hk = sk / √ar
where smin and smax are the default-box scales of the lowest and the highest layer respectively. Low-level output layers predict small-scale target objects and high-level output layers predict large-scale target objects. The default boxes of the output layers thus have different scales on different feature maps and different aspect ratios within the same feature map; correspondingly, the whole network can predict targets of different scales and shapes through its multiple output layers.
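The two formulas above can be enumerated directly. The sketch below is a plain-Python illustration; the concrete values of smin, smax and the six aspect ratios are placeholders, since the patent does not fix them:

```python
import math

def default_box_sizes(m, s_min=0.2, s_max=0.9,
                      aspect_ratios=(1.0, 2.0, 3.0, 1 / 2, 1 / 3, 5.0)):
    """Yield (output layer k, width, height) for every default-box scale/ratio."""
    for k in range(1, m + 1):
        s_k = s_min + (s_max - s_min) / (m - 1) * (k - 1)  # scale of layer k
        for ar in aspect_ratios:
            yield k, s_k * math.sqrt(ar), s_k / math.sqrt(ar)

# Example: 4 output layers; low layers get small boxes, high layers large ones.
for k, w, h in default_box_sizes(m=4):
    print(f"layer {k}: w={w:.3f}, h={h:.3f}")
```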
Further, the multiple candidate target bounding boxes output by the feature-fusion network are post-processed with a bounding-box fusion algorithm to obtain the final positions of the image targets. The specific steps of the bounding-box fusion algorithm are as follows (a code sketch of the whole procedure follows this list):
(1) Sort the candidate bounding boxes of the target from high to low confidence, and take the first candidate box as the current fused box.
(2) Take the remaining candidate boxes in turn as boxes to be fused, and compare the confidences of the current fused box and the box to be fused. If the confidences of both boxes exceed the threshold α, compute the area overlap rate of the current fused box and the box to be fused; otherwise, execute step (3). The area overlap rate is the ratio of the overlapping area of the two boxes to the area of their union:
IOU(C, G) = area(C ∩ G) / area(C ∪ G)
where area(C) and area(G) are the areas of text boxes C and G respectively.
(3) If the area overlap rate of the two candidate boxes is greater than or equal to the threshold β, merge the two boxes: the merged box is the circumscribed rectangle of the two boxes, and its confidence is the confidence of the current fused box.
(4) If the area overlap rate of the two candidate boxes is less than the threshold β, compute the containment overlap rate of the two boxes; if the containment overlap rate of the two boxes exceeds the threshold γ, remove the contained box; otherwise, execute step (5). The containment overlap rate is the ratio of the overlapping area of the two boxes to the area of one of the boxes:
Ii(ti, tj) = area(ti ∩ tj) / area(ti)
where area(ti) and area(tj) are the areas of rectangles ti and tj, and Ii(ti, tj) denotes the containment overlap rate of ti relative to tj.
(5) If only the last text box remains, the algorithm ends, and the text boxes whose confidence is higher than the threshold δ are selected as the final target detection results; otherwise, update the candidate bounding boxes of the image target and, following the earlier ordering, take the next box that has not been fused as the current fused box and execute step (2).
The feature-fusion network outputs the candidate bounding boxes of the targets, and the bounding-box fusion algorithm processes these candidates to obtain the final detection result for the image targets.
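The control flow between steps (2) and (5) leaves some room for interpretation; the sketch below is one plausible Python reading of the fusion algorithm, with hypothetical names and placeholder values for the thresholds α, β, γ, δ:

```python
def intersection(a, b):
    """Intersection area of two boxes given as (x, y, w, h)."""
    iw = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    return iw * ih

def iou(a, b):
    """Area overlap rate: intersection over union."""
    inter = intersection(a, b)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def containment(a, b):
    """Containment overlap rate of box a relative to box b: intersection over area(a)."""
    area_a = a[2] * a[3]
    return intersection(a, b) / area_a if area_a > 0 else 0.0

def enclosing(a, b):
    """Circumscribed rectangle of two boxes."""
    x1, y1 = min(a[0], b[0]), min(a[1], b[1])
    x2 = max(a[0] + a[2], b[0] + b[2])
    y2 = max(a[1] + a[3], b[1] + b[3])
    return (x1, y1, x2 - x1, y2 - y1)

def fuse_boxes(candidates, alpha=0.5, beta=0.5, gamma=0.8, delta=0.6):
    """Merge candidate (box, confidence) pairs following steps (1)-(5)."""
    boxes = sorted(candidates, key=lambda bc: bc[1], reverse=True)  # step (1)
    i = 0
    while i < len(boxes):
        cur, cur_conf = boxes[i]
        j = i + 1
        while j < len(boxes):
            other, other_conf = boxes[j]
            if cur_conf > alpha and other_conf > alpha:            # step (2)
                if iou(cur, other) >= beta:                        # step (3)
                    cur = enclosing(cur, other)                    # merge
                    boxes[i] = (cur, cur_conf)
                    boxes.pop(j)
                    continue
                if containment(other, cur) > gamma:                # step (4)
                    boxes.pop(j)                                   # drop contained box
                    continue
            j += 1
        i += 1                                                     # step (5): next unfused box
    return [bc for bc in boxes if bc[1] > delta]                   # final δ filter
```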
Compared with the prior art, the present invention has the following advantages and effects. The proposed image target detection method locates the positions of target objects in natural scenes. The method predicts the positions of the target objects directly with multiple output layers of a single neural network, giving high recognition efficiency, while only one post-processing algorithm is needed to merge all candidate bounding boxes into the final image target detection result.
Description of the drawings
Fig. 1 is the flow chart of the text target detection method according to the technical solution of the present invention.
Fig. 2 shows the network structure of the feature-fusion network according to the technical solution of the present invention.
Fig. 3 shows the output layers of the feature-fusion network according to the technical solution of the present invention.
Fig. 4 shows the sampling scheme of the feature-fusion network according to the technical solution of the present invention.
Fig. 5 shows the candidate bounding boxes of text targets output by the feature-fusion network according to the technical solution of the present invention.
Fig. 6 is the flow chart of the bounding-box fusion algorithm according to the technical solution of the present invention.
Fig. 7 shows detection results processed with the bounding-box fusion algorithm according to the technical solution of the present invention.
Specific embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments; the following embodiments explain the invention, and the invention is not limited to them.
The present invention detects text targets by combining a feature-fusion network with a bounding-box fusion algorithm, in two main steps: (1) predict the locations of the image targets with the feature-fusion network to obtain the candidate bounding boxes of the text targets; (2) obtain the final detection result with the bounding-box fusion algorithm. Fig. 1 shows the flow chart of the text target detection of the present invention.
With the development of the Internet and multimedia technology, more and more information carriers exist in the form of images, and image target detection is widely applied in real life. Traditional text detection algorithms need a large number of heuristic rules to screen text regions, with limited effect; the method of the present patent, based on deep learning, builds an end-to-end feature-layer fusion network that can directly predict the positions and confidences of text targets in an image.
A neural network based on feature-layer fusion is built; Fig. 2 shows its network structure. As the network deepens, the feature-map scale in the feature layers shrinks while the representational power of the feature maps grows, so fusing high-level feature layers with low-level feature layers into new feature layers used as output layers enhances the representational power of the output layers. As shown in Fig. 3, the fusion network has two kinds of connections in its overall structure: bottom-up connections and top-down connections. The bottom-up path is the feed-forward pass of the network: feature maps shrink after convolutional and pooling layers, so the whole network has a pyramid-shaped hierarchy. The top-down connections use deconvolution to fuse the high-level features of the network into the low-level feature layers and build new output layers. As shown in Fig. 3, the output layers of the fusion network are A, B', C' and D': feature layers A and B fuse to form the new feature layer B', A and C fuse to form C', and A and D fuse to form D'; since feature layer A is the highest feature layer, it is also kept as an output layer of the network.
The feature-fusion network is constructed as follows (a code sketch of one fusion branch follows this list):
Step (1): Build a feed-forward convolutional neural network whose front end is VGG-16, with the last two fully connected layers replaced by convolutional layers; after the front-end structure, append additional convolutional and pooling layers.
Step (2): On top of the feed-forward network, insert a deconvolution layer between the highest feature layer and each of the other feature layers, so that the scale of the deconvolved feature map matches the scale of the feature map in the low-level feature layer. The deconvolution operation in the deconvolution layer, similar to bilinear interpolation, selectively enlarges a feature map so that the feature-map scale of the highest feature layer reaches the low-level scale. The output feature-map size of a deconvolution layer is
o = s · (i − 1) + k − 2p
where i is the size of the input feature map of the deconvolution layer, k the kernel size, s the stride, and p the padding. Given the sizes of the input and output feature maps, the high-level feature layer can be brought to the same size as a low-level feature map by setting the corresponding deconvolution parameters.
Step (3): Fuse the deconvolved feature map with the feature map of the low-level feature layer by element-wise product to obtain a new feature layer, which serves as an output layer producing the positions and confidences of the target objects. The element-wise product of two feature maps is equivalent to the Hadamard product of two matrices, multiplying corresponding elements.
Step (4): Define a series of fixed-size default boxes on each output layer; the output layer produces the text confidence and the offset coordinates relative to the default boxes. Suppose the sizes of the image and of the feature map are (wim, him) and (wmap, hmap) respectively, and that position (i, j) in the feature map corresponds to a default box b0 = (x0, y0, w0, h0). The output of the output layer is (Δx, Δy, Δw, Δh, c), where (Δx, Δy, Δw, Δh) are the offsets of the predicted text bounding box relative to the default box and c is the text confidence. The predicted text bounding box is b = (x, y, w, h), where:
x = x0 + w0·Δx
y = y0 + h0·Δy
w = w0·exp(Δw)
h = h0·exp(Δh)
Here x and y are the coordinates of the upper-left corner of the predicted text box, and w and h are its width and height.
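To make steps (2) and (3) concrete, here is a minimal PyTorch sketch of a single fusion branch; the channel counts and spatial sizes are illustrative and not taken from the patent:

```python
import torch
import torch.nn as nn

class FusionBranch(nn.Module):
    """Fuse the highest feature layer into one lower feature layer.

    The deconvolution upsamples the high-level map to the low-level spatial
    size (here 2x, with k=2, s=2, p=0, so o = s(i-1) + k - 2p = 2i), and the
    two maps are then combined by element-wise (Hadamard) product.
    """
    def __init__(self, high_ch=512, low_ch=256):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(high_ch, low_ch, kernel_size=2, stride=2)

    def forward(self, high, low):
        up = self.deconv(high)   # match the low-level feature-map scale
        return up * low          # element-wise product -> new output layer

high = torch.randn(1, 512, 8, 8)    # highest feature layer A
low = torch.randn(1, 256, 16, 16)   # a lower feature layer, e.g. B
fused = FusionBranch()(high, low)   # new output layer, e.g. B'
print(fused.shape)                  # torch.Size([1, 256, 16, 16])
```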
To set the sampling strategy of the feature-layer fusion network and obtain positive and negative samples, default boxes must be defined on the feature maps of the output layers, and the relationship between the ground-truth boxes of the target objects in the image and the default boxes must be established to select positive and negative samples. The specific steps are:
Step (1): Generate default boxes on the feature map of each output layer in sliding-window fashion: a feature map of size N × N has N × N feature points, and, according to the aspect ratios of the target objects, each feature point corresponds to six default boxes of different aspect ratios:
ar = {a1, a2, a3, a4, a5, a6}
Step (2): Establish the relationship between the ground-truth boxes of the target objects in the image and the default boxes, and label the default boxes, using the Jaccard overlap as the matching index: the higher the Jaccard overlap, the more similar the samples and the better the match. Given a default box A and a ground-truth box B, the Jaccard overlap of the default box and the ground-truth box is the ratio of the intersection area to the union area of A and B:
J(A, B) = area(A ∩ B) / area(A ∪ B)
Default boxes whose Jaccard overlap is greater than or equal to 0.5 are taken as matched default boxes, and those below 0.5 as unmatched default boxes; matched default boxes serve as positive samples and unmatched ones as negative samples.
To select positive and negative samples for the fusion network when detecting the text targets in an image, the relationship between the ground-truth boxes of the image and the default boxes must be established, as in Fig. 4. In Fig. 4(a), the ground-truth box of the text target "Marlboro" is the upper solid box in the figure, and the ground-truth box of the text "LIGHTS" is the lower solid box. The dashed boxes in Fig. 4(b) and Fig. 4(c) show the default boxes on a feature map of size 8 × 8 and a feature map of size 4 × 4 respectively. The text "LIGHTS" matches two dashed boxes and the text "Marlboro" matches one dashed box; the matched default boxes are labeled as positive samples and the unmatched default boxes as negative samples.
Step (3): After the samples are labeled, sort the negative default boxes by confidence loss and select the default boxes with the highest confidence-loss values as the negative samples for network training, keeping the ratio of positive to negative training samples at 1:3.
For the feature-fusion network, the objective function is set as follows, specifically comprising:
(1): The target loss function is set as the weighted sum of the localization loss and the confidence loss:
L(x, c, l, g) = (1/N) · (Lconf(x, c) + α · Lloc(x, l, g))
where x is the matching-result matrix, c the confidence, l the predicted position, g the ground-truth position of the target, and N the number of default boxes matched to ground-truth boxes; the weight coefficient α is set to 1.
(2): The localization loss Lloc is the L2 loss between the predicted position and the ground-truth position of the target, and the confidence loss Lconf is the softmax loss of the two-class classification.
Since different output layers in the network correspond to feature maps of different scales, different output layers predict targets of different scales: high-level output layers predict large-scale target objects, and low-level output layers predict small-scale target objects. Setting the scale of the object bounding boxes produced by the output layers of the feature-fusion network (the candidate boxes of the fusion network are shown in Fig. 5) specifically comprises the following steps:
(1) Select as the output layers of the network the highest feature layer and the feature layers formed by fusing the highest feature layer with the other feature layers.
(2) Different output layers in the network correspond to feature maps of different scales. Suppose the network has m output layers, each corresponding to one feature map; the default-box scale on the k-th feature map is
sk = smin + ((smax − smin) / (m − 1)) · (k − 1),  k ∈ [1, m]
and the width and height of each default box with aspect ratio ar are
wk = sk · √ar,  hk = sk / √ar
where smin and smax are the default-box scales of the lowest and the highest layer respectively. Low-level output layers predict small-scale target objects and high-level output layers predict large-scale target objects. The default boxes of the output layers thus have different scales on different feature maps and different aspect ratios within the same feature map; correspondingly, the whole network can predict text of different scales and shapes through its multiple output layers.
The feature-fusion network predicts the bounding boxes of the target objects directly with its multiple output layers, and each bounding box receives a confidence score. The boxes predicted by the output layers may overlap one another; the bounding-box fusion algorithm selects the boxes with higher confidence within a neighborhood and merges overlapping candidate boxes to obtain the optimal target positions, specifically comprising the following steps:
(1) Sort the candidate bounding boxes of the text target from high to low confidence, and take the first candidate box as the current fused box.
(2) Take the remaining candidate boxes in turn as boxes to be fused, and compare the confidences of the current fused box and the box to be fused. If the confidences of both boxes exceed the threshold α, compute the area overlap rate of the current fused box and the box to be fused; otherwise, execute step (3). The area overlap rate is the ratio of the overlapping area of the two boxes to the area of their union:
IOU(C, G) = area(C ∩ G) / area(C ∪ G)
where area(C) and area(G) are the areas of text boxes C and G respectively.
(3) If the area overlap rate of the two candidate boxes is greater than or equal to the threshold β, merge the two boxes: the merged box is the circumscribed rectangle of the two boxes, and its confidence is the confidence of the current fused box.
(4) If the area overlap rate of the two candidate boxes is less than the threshold β, compute the containment overlap rate of the two boxes; if the containment overlap rate of the two boxes exceeds the threshold γ, remove the contained box; otherwise, execute step (5). The containment overlap rate is the ratio of the overlapping area of the two boxes to the area of one of the boxes:
Ii(ti, tj) = area(ti ∩ tj) / area(ti)
where area(ti) and area(tj) are the areas of rectangles ti and tj, and Ii(ti, tj) denotes the containment overlap rate of ti relative to tj.
(5) If only the last text box remains, the algorithm ends, and the text boxes whose confidence is higher than the threshold δ are selected as the final target detection results; otherwise, update the candidate bounding boxes of the image target and, following the earlier ordering, take the next box that has not been fused as the current fused box and execute step (2).
Two bounding boxes are merged with the bounding-box fusion algorithm above; the flow chart of the algorithm is shown in Fig. 6, where IOU(ti, tj) denotes the IoU overlap rate of boxes ti and tj, Fusion(ti, tj) denotes the box obtained after merging ti and tj, i.e., the circumscribed rectangle of the two boxes, and Ii(ti, tj) and Ij(ti, tj) denote the containment overlap rates of ti and tj respectively. The bounding-box fusion algorithm involves three thresholds: the confidence threshold α, the IoU overlap-rate threshold β, and the containment overlap-rate threshold γ. The confidence threshold decides whether two bounding boxes are merged: only when the confidences of both boxes exceed α are the two boxes fused.
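Continuing the fuse_boxes sketch given earlier, a small usage example with invented boxes and confidences:

```python
candidates = [
    ((10, 10, 100, 30), 0.95),  # strong detection of a text line
    ((15, 12, 95, 28), 0.90),   # overlapping duplicate, merged into the first
    ((300, 50, 40, 20), 0.40),  # low confidence, removed by the final δ filter
]
final = fuse_boxes(candidates, alpha=0.5, beta=0.5, gamma=0.8, delta=0.6)
print(final)  # [((10, 10, 100, 30), 0.95)]: one box covering both detections
```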
Fig. 7 shows the final text target detection results obtained with the bounding-box fusion algorithm. The bounding-box fusion algorithm exploits the positional relationships and confidences of neighboring candidate boxes, merging the candidate boxes into the final image target detection result. The above content of this specification is only an illustration of the present invention.
Those skilled in the art can make various modifications or supplements to the described specific embodiments, or substitute them in a similar way; as long as these do not depart from the content of the description of the invention or exceed the scope defined by the claims, they fall within the protection scope of the invention.

Claims (4)

1. A method for detecting text targets in an image, characterized in that it comprises the following steps:
Step 1: construct an end-to-end convolutional neural network based on feature-layer fusion for predicting text targets of different scales in an image;
Step 2: from the candidate boxes output by the feature-fusion network, obtain the final text target detection result of the image using a bounding-box fusion algorithm.
2. The method for detecting text targets in an image according to claim 1, characterized in that constructing the end-to-end convolutional neural network based on feature-layer fusion, for detecting the positions of the text targets in the image, specifically comprises the following steps:
(1) build a feed-forward convolutional neural network whose front end is VGG-16, in which the last two fully connected layers are replaced by convolutional layers, and append additional convolutional and pooling layers after the front-end structure;
(2) on top of the feed-forward network, insert a deconvolution layer between the highest feature layer and each of the other feature layers, so that the scale of the deconvolved feature map matches the scale of the feature map in the low-level feature layer;
(3) fuse the deconvolved feature map with the feature map of the low-level feature layer by element-wise product to obtain a new feature layer, the new feature layer serving as an output layer that produces the positions and confidences of the target objects;
(4) define a series of fixed-size default boxes on each output layer, the output layer producing the text confidence and the offset coordinates relative to the default boxes.
3. The method for detecting text targets in an image according to claim 2, characterized in that, for the convolutional neural network based on feature-layer fusion, setting the scale of the target bounding boxes produced by the output layers of the feature-fusion network specifically comprises:
(1) select as the output layers of the network the highest feature layer and the feature layers formed by fusing the highest feature layer with the other feature layers;
(2) set the size of the default boxes in each output layer; the output layers produce offset coordinates relative to the default boxes and confidences, giving candidate target bounding boxes; the low-level output layers are set to predict small-scale text target objects and the high-level output layers to predict large-scale text target objects.
4. The method for detecting text targets in an image according to claim 1, characterized in that the candidate bounding boxes output by the feature-fusion network are processed with a bounding-box fusion algorithm to obtain the final positions of the text targets, specifically comprising the following steps:
(1) sort the candidate bounding boxes of the text targets from high to low confidence, and take the first candidate box as the current fused box;
(2) take the remaining candidate boxes in turn as boxes to be fused, and compare the confidences of the current fused box and the box to be fused; if the confidences of both boxes exceed the threshold α, compute the area overlap rate of the current fused box and the box to be fused; otherwise, execute step (3);
(3) if the area overlap rate of the two candidate boxes is greater than or equal to the threshold β, merge the two boxes; the merged box is the circumscribed rectangle of the two boxes, and its confidence is the confidence of the current fused box;
(4) if the area overlap rate of the two candidate boxes is less than the threshold β, compute the containment overlap rate of the two boxes; if the containment overlap rate exceeds the threshold γ, remove the contained box; otherwise, execute step (5);
(5) if only the last text box remains, the algorithm ends, and the text boxes whose confidence exceeds the threshold δ are selected as the final target detection result;
otherwise, update the candidate bounding boxes of the text targets and, following the earlier ordering, take the next box that has not been fused as the current fused box and execute step (2).
CN201810520329.9A 2018-05-28 2018-05-28 Method for detecting text targets in an image Pending CN108764228A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810520329.9A CN108764228A (en) 2018-05-28 2018-05-28 Method for detecting text targets in an image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810520329.9A CN108764228A (en) 2018-05-28 2018-05-28 Method for detecting text targets in an image

Publications (1)

Publication Number Publication Date
CN108764228A true CN108764228A (en) 2018-11-06

Family

ID=64005915

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810520329.9A Pending CN108764228A (en) 2018-05-28 Method for detecting text targets in an image

Country Status (1)

Country Link
CN (1) CN108764228A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106570497A (en) * 2016-10-08 2017-04-19 中国科学院深圳先进技术研究院 Text detection method and device for scene image
CN106650725A (en) * 2016-11-29 2017-05-10 华南理工大学 Full convolutional neural network-based candidate text box generation and text detection method
CN107688808A (en) * 2017-08-07 2018-02-13 电子科技大学 A kind of quickly natural scene Method for text detection
CN107563381A (en) * 2017-09-12 2018-01-09 国家新闻出版广电总局广播科学研究院 The object detection method of multiple features fusion based on full convolutional network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHENG-YANG FU et al., "DSSD: Deconvolutional Single Shot Detector", arXiv *
MINGHUI LIAO et al., "TextBoxes: A Fast Text Detector with a Single Deep Neural Network", Advancement of Artificial Intelligence (AAAI) *
WEI LIU et al., "SSD: Single Shot MultiBox Detector", Springer *

Cited By (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109299274B (en) * 2018-11-07 2021-12-17 南京大学 Natural scene text detection method based on full convolution neural network
CN109458978A (en) * 2018-11-07 2019-03-12 五邑大学 A kind of Downtilt measurement method based on multiple scale detecting algorithm
CN109299274A (en) * 2018-11-07 2019-02-01 南京大学 A kind of natural scene Method for text detection based on full convolutional neural networks
TWI706336B (en) * 2018-11-19 2020-10-01 中華電信股份有限公司 Image processing device and method for detecting and filtering text object
CN111222368A (en) * 2018-11-26 2020-06-02 北京金山办公软件股份有限公司 Method and device for identifying document paragraph and electronic equipment
CN111222368B (en) * 2018-11-26 2023-09-19 北京金山办公软件股份有限公司 Method and device for identifying document paragraphs and electronic equipment
CN109918951A (en) * 2019-03-12 2019-06-21 中国科学院信息工程研究所 A kind of artificial intelligence process device side channel system of defense based on interlayer fusion
CN110163081A (en) * 2019-04-02 2019-08-23 宜通世纪物联网研究院(广州)有限公司 Regional invasion real-time detection method, system and storage medium based on SSD
CN110110722A (en) * 2019-04-30 2019-08-09 广州华工邦元信息技术有限公司 A kind of region detection modification method based on deep learning model recognition result
WO2020221298A1 (en) * 2019-04-30 2020-11-05 北京金山云网络技术有限公司 Text detection model training method and apparatus, text region determination method and apparatus, and text content determination method and apparatus
CN110135423A (en) * 2019-05-23 2019-08-16 北京阿丘机器人科技有限公司 The training method and optical character recognition method of text identification network
CN113850264A (en) * 2019-06-10 2021-12-28 创新先进技术有限公司 Method and system for evaluating target detection model
CN110263877B (en) * 2019-06-27 2022-07-08 中国科学技术大学 Scene character detection method
CN110263877A (en) * 2019-06-27 2019-09-20 中国科学技术大学 Scene character detecting method
CN110414417A (en) * 2019-07-25 2019-11-05 电子科技大学 A kind of traffic mark board recognition methods based on multi-level Fusion multi-scale prediction
CN110458170A (en) * 2019-08-06 2019-11-15 汕头大学 Chinese character positioning and recognition methods in a kind of very noisy complex background image
CN112487848B (en) * 2019-09-12 2024-04-26 京东方科技集团股份有限公司 Character recognition method and terminal equipment
CN112487848A (en) * 2019-09-12 2021-03-12 京东方科技集团股份有限公司 Character recognition method and terminal equipment
CN110674804A (en) * 2019-09-24 2020-01-10 上海眼控科技股份有限公司 Text image detection method and device, computer equipment and storage medium
CN110796640A (en) * 2019-09-29 2020-02-14 郑州金惠计算机系统工程有限公司 Small target defect detection method and device, electronic equipment and storage medium
US20220083819A1 (en) * 2019-11-15 2022-03-17 Salesforce.Com, Inc. Image augmentation and object detection
US11710077B2 (en) * 2019-11-15 2023-07-25 Salesforce, Inc. Image augmentation and object detection
CN111046923B (en) * 2019-11-26 2023-02-28 佛山科学技术学院 Image target detection method and device based on bounding box and storage medium
CN111046923A (en) * 2019-11-26 2020-04-21 佛山科学技术学院 Image target detection method and device based on bounding box and storage medium
CN111598082A (en) * 2020-04-24 2020-08-28 云南电网有限责任公司电力科学研究院 Electric power nameplate text detection method based on full convolution network and instance segmentation network
CN111598082B (en) * 2020-04-24 2023-10-17 云南电网有限责任公司电力科学研究院 Electric power nameplate text detection method based on full convolution network and instance segmentation network
CN111783685A (en) * 2020-05-08 2020-10-16 西安建筑科技大学 Target detection improved algorithm based on single-stage network model
CN111680628B (en) * 2020-06-09 2023-04-28 北京百度网讯科技有限公司 Text frame fusion method, device, equipment and storage medium
CN111680628A (en) * 2020-06-09 2020-09-18 北京百度网讯科技有限公司 Text box fusion method, device, equipment and storage medium
CN111986252A (en) * 2020-07-16 2020-11-24 浙江工业大学 Method for accurately positioning candidate bounding box in target segmentation network
CN111986252B (en) * 2020-07-16 2024-03-29 浙江工业大学 Method for accurately positioning candidate bounding boxes in target segmentation network
CN111844101A (en) * 2020-07-31 2020-10-30 中国科学技术大学 Multi-finger dexterous hand sorting planning method
CN111985465A (en) * 2020-08-17 2020-11-24 中移(杭州)信息技术有限公司 Text recognition method, device, equipment and storage medium
CN112419310B (en) * 2020-12-08 2023-07-07 中国电子科技集团公司第二十研究所 Target detection method based on cross fusion frame optimization
CN112419310A (en) * 2020-12-08 2021-02-26 中国电子科技集团公司第二十研究所 Target detection method based on intersection and fusion frame optimization
CN112906699A (en) * 2020-12-23 2021-06-04 深圳市信义科技有限公司 Method for detecting and identifying enlarged number of license plate
WO2022150978A1 (en) * 2021-01-12 2022-07-21 Nvidia Corporation Neighboring bounding box aggregation for neural networks
CN113269049A (en) * 2021-04-30 2021-08-17 天津科技大学 Method for detecting handwritten Chinese character area
CN114359889A (en) * 2022-03-14 2022-04-15 北京智源人工智能研究院 Text recognition method for long text data
CN114898171A (en) * 2022-04-07 2022-08-12 中国科学院光电技术研究所 Real-time target detection method suitable for embedded platform
CN114898171B (en) * 2022-04-07 2023-09-22 中国科学院光电技术研究所 Real-time target detection method suitable for embedded platform
CN115080051A (en) * 2022-05-31 2022-09-20 武汉大学 GUI code automatic generation method based on computer vision

Similar Documents

Publication Publication Date Title
CN108764228A (en) Method for detecting text targets in an image
CN108416394B (en) Multi-target detection model building method based on convolutional neural networks
CN107134144B (en) A kind of vehicle checking method for traffic monitoring
CN108876780B (en) Bridge crack image crack detection method under complex background
CN109784203B (en) Method for inspecting contraband in weak supervision X-ray image based on layered propagation and activation
CN103049763B (en) Context-constraint-based target identification method
CN109583425A (en) A kind of integrated recognition methods of the remote sensing images ship based on deep learning
CN108830188A (en) Vehicle checking method based on deep learning
CN111091105A (en) Remote sensing image target detection method based on new frame regression loss function
CN111275688A (en) Small target detection method based on context feature fusion screening of attention mechanism
CN110097568A (en) A kind of the video object detection and dividing method based on the double branching networks of space-time
CN109977918A (en) A kind of target detection and localization optimization method adapted to based on unsupervised domain
Xu et al. Scale-aware feature pyramid architecture for marine object detection
CN108182454A (en) Safety check identifying system and its control method
CN110046572A (en) A kind of identification of landmark object and detection method based on deep learning
CN111079602A (en) Vehicle fine granularity identification method and device based on multi-scale regional feature constraint
CN107729801A (en) A kind of vehicle color identifying system based on multitask depth convolutional neural networks
CN107945153A (en) A kind of road surface crack detection method based on deep learning
CN109753949B (en) Multi-window traffic sign detection method based on deep learning
CN107092870A (en) A kind of high resolution image semantics information extracting method and system
CN112560675B (en) Bird visual target detection method combining YOLO and rotation-fusion strategy
CN105528575A (en) Sky detection algorithm based on context inference
CN107545571A (en) A kind of image detecting method and device
CN106991049A (en) A kind of Software Defects Predict Methods and forecasting system
CN110929746A (en) Electronic file title positioning, extracting and classifying method based on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20181106