CN107944443A - Method for object affordance detection based on end-to-end deep learning - Google Patents
Method for object affordance detection based on end-to-end deep learning Download PDF Info
- Publication number
- CN107944443A CN107944443A CN201711139653.8A CN201711139653A CN107944443A CN 107944443 A CN107944443 A CN 107944443A CN 201711139653 A CN201711139653 A CN 201711139653A CN 107944443 A CN107944443 A CN 107944443A
- Authority
- CN
- China
- Prior art keywords
- roi
- affordance
- detection
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
Abstract
The present invention proposes a method for object affordance detection based on end-to-end deep learning, which aims to find the position, class, and affordances of objects in an image simultaneously. A region of interest alignment layer (RoIAlign) correctly computes the features of each region of interest (RoI) from the image feature map; a sequence of deconvolutional layers upsamples the RoI feature map to a high-resolution affordance map; and a robust resizing strategy supervises the affordance masks during training. Object detection localizes objects, while affordance detection assigns each pixel inside an object to its affordance label; a multi-task loss trains bounding-box classification, localization, and affordance mapping jointly, and training and inference finally yield the affordance labels. The invention uses end-to-end deep learning and a multi-task loss function to jointly optimize object detection and affordance detection; it requires no extra information, reduces the complexity of training and testing, effectively improves detection accuracy, and is suitable for real-time robotic applications.
Description
Technical field
The present invention relates to the field of computer vision, and more particularly to a method for object affordance detection based on end-to-end deep learning.
Background art
In computer vision, simultaneously detecting and segmenting objects has become increasingly popular. Objects can be described by various visual attributes such as color and shape, or physical attributes such as weight, volume, and material, and these attributes are useful for identifying objects or grouping them into different categories. In many robotic applications, recognizing object affordances is essential, but a robot may still need more information to complete its task: it must not only detect object affordances, but also localize and recognize the relevant objects. Object affordance detection, as an emerging problem, has practical value in many fields, such as scene understanding, video retrieval, object detection, behavior analysis, 3D scene reconstruction, and human-computer interaction; in particular, it has broad application prospects in autonomous driving in the traffic field, human-computer interaction in smart homes, medical diagnosis, and so on. Understanding object affordances differs from merely describing the visual or physical properties of an object: it also requires capturing how the object affords interaction with humans. Understanding object affordances is therefore the key for autonomous robots to interact with objects and to assist people in various everyday tasks.
However, detecting the affordances of objects is more difficult than the traditional semantic segmentation problem: two objects with different appearances may share the same affordance label, because affordance labels are abstract concepts grounded in human interpretation of object behavior. In addition, detecting and generalizing affordances to unseen objects in real time is also vital. The existing common approach uses two sequential deep neural networks, which is very time-consuming and unsuitable for real-time applications.
The present invention proposes a method for object affordance detection based on end-to-end deep learning, which aims to find the position, class, and affordances of objects in an image simultaneously. A region of interest alignment layer (RoIAlign) correctly computes the features of each region of interest (RoI) from the image feature map; a sequence of deconvolutional layers upsamples the RoI feature map to a high-resolution affordance map; and a robust resizing strategy supervises the affordance masks during training. Object detection localizes objects, while affordance detection assigns each pixel inside an object to its affordance label; a multi-task loss trains bounding-box classification, localization, and affordance mapping jointly, and training and inference finally yield the affordance labels. The invention uses end-to-end deep learning and a multi-task loss function to jointly optimize object detection and affordance detection; it requires no extra information, reduces the complexity of training and testing, effectively improves detection accuracy, and is suitable for real-time robotic applications.
Summary of the invention
To address the problem that existing methods are time-consuming and unsuitable for real-time applications, the present invention uses end-to-end deep learning and a multi-task loss function to jointly optimize object detection and affordance detection; it requires no extra information, reduces the complexity of training and testing, effectively improves detection accuracy, and is suitable for real-time robotic applications.
To solve the above problems, the present invention provides a method for object affordance detection based on end-to-end deep learning, mainly including:
(1) problem formulation;
(2) the affordance network architecture;
(3) the multi-task loss;
(4) training and inference.
In the problem formulation, the framework aims to find simultaneously the position of each object, its object class, and its affordances in the image. Following the standard design in computer vision, the position of an object is defined by a rectangle relative to the upper-left corner of the image, and the object class is defined over the rectangular box. Each pixel inside the box is encoded with its affordance; pixel regions of an object that serve the same function are considered to share one affordance. Ideally, all relevant objects in the image are detected, and each pixel in these objects is mapped to its most probable affordance label.
The affordance network architecture has three main components: 1) a region of interest alignment layer (RoIAlign) that correctly computes the features of each region of interest (RoI) from the image feature map; 2) a sequence of deconvolutional layers that upsamples the RoI feature map to a high-resolution, smooth, and fine-grained affordance map; 3) a robust resizing strategy that supervises the affordance masks during training.
Further, regarding the region of interest alignment layer (RoIAlign): a region proposal network (RPN) performs region-based object detection and shares weights with the main convolutional backbone, outputting bounding boxes of different sizes. While a RoIPool layer pools each RoI from the image feature map into a small fixed-size feature map (e.g. 7 × 7) using rounding, the RoIAlign layer properly aligns the extracted features with the RoI without any rounding operation: it uses bilinear interpolation to compute the values at regularly sampled locations inside each RoI bin and aggregates the results with a max operation, avoiding misalignment between the RoI and the extracted features.
Further, regarding the high-resolution affordance map: segmentation models commonly represent an object mask at a small fixed size (e.g. 14 × 14 or 28 × 28), where the value of each pixel in the predicted RoI mask is binary, i.e. foreground or background. Because each object can contain multiple affordance classes, such a binary mask does not work well for the affordance detection problem; deconvolutional layers are therefore used to produce a high-resolution affordance mask. Formally, given an input feature map of size S_i, a deconvolutional layer performs the operation opposite to a convolutional layer in order to build a larger output map of size S_o; the relation between S_i and S_o is:

S_o = s * (S_i - 1) + S_f - 2 * d   (1)

where S_f is the filter size, and s and d are the stride and padding parameters, respectively. In practice, the RoIAlign layer outputs a 7 × 7 feature map, which is upsampled to a higher resolution using three deconvolutional layers: the first (padding d = 1, stride s = 4, kernel size S_f = 8) creates a map of size 30 × 30; similarly, the second layer (d = 1, s = 4, S_f = 8) and the third layer (d = 1, s = 2, S_f = 4) create the final high-resolution map of size 244 × 244. Before each deconvolutional layer, a convolutional layer learns the features that will be deconvolved; these convolutional layers can be regarded as adaptations between two successive deconvolutional layers.
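Equation (1) can be checked numerically. In the sketch below, the per-layer strides (4, 4, 2) are reconstructed from equation (1) so that the chain reproduces the 30 × 30 and 244 × 244 sizes stated in the text; the intermediate 122 × 122 size is implied rather than stated:

```python
def deconv_out(s_i, stride, kernel, pad):
    """S_o = s * (S_i - 1) + S_f - 2 * d: output size of a deconvolutional layer (eq. 1)."""
    return stride * (s_i - 1) + kernel - 2 * pad

# Assumed chain: 7x7 RoIAlign feature map -> 30x30 -> 122x122 -> 244x244
size = 7
for stride, kernel, pad in [(4, 8, 1), (4, 8, 1), (2, 4, 1)]:
    size = deconv_out(size, stride, kernel, pad)
    print(size)  # 30, then 122, then 244
```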
Further, regarding the training strategy: the affordance detection branch requires supervision at a fixed size (e.g. 244 × 244), and resizing a multi-label mask with a single threshold does not work for the affordance detection problem, so a multi-threshold resizing strategy is proposed. Given an original ground-truth mask, without loss of generality, let the mask contain n distinct labels P = (c_0, c_1, ..., c_n-1), and let P̂ = (ĉ_0, ĉ_1, ..., ĉ_n-1) be a linear mapping of the values in P; the mapping from P to P̂ converts the original mask into a new mask. The converted mask is resized to the predefined mask size, and a threshold is applied to the resized mask as follows:

ρ(x, y) = ĉ_j, if |ρ(x, y) - ĉ_j| < α; otherwise ρ(x, y) = 0   (2)

where ρ(x, y) is the value of the resized mask at pixel (x, y); ĉ_j is one of the values in P̂; and α is a hyperparameter, set to 0.005. The values of the thresholded mask are then remapped to the original label values (by using the mapping from P̂ back to P) to obtain the ground-truth training mask.
Further, regarding the end-to-end deep network: the network consists of two branches, one for object detection and one for affordance detection. Given an input image, deep features are extracted from the image using a VGG16 network as the backbone; an RPN that shares weights with the convolutional backbone then generates candidate bounding boxes (RoIs). For each RoI, the RoIAlign layer extracts and pools its corresponding features into a 7 × 7 feature map. In the object detection branch, two fully connected layers with 4096 neurons each are used, followed by a classification layer that classifies the object and a regression layer that regresses the object position. In the affordance detection branch, the 7 × 7 feature map is upsampled to 244 × 244 to obtain a high-resolution map, and a softmax layer assigns each pixel of the 244 × 244 map to its most probable affordance class. The whole network is trained end to end with the multi-task loss function.
Regarding the multi-task loss: in the end-to-end architecture, the classification layer outputs a probability distribution p = (p_0, ..., p_K) over the K + 1 object classes, where p is the output of a softmax layer, and the regression layer outputs K + 1 bounding-box regression offsets (each offset contains the box center and box size): t^k = (t^k_x, t^k_y, t^k_w, t^k_h). Each offset t^k corresponds to a class k, and t^k is parameterized as a scale-invariant translation and relative height/width shift with respect to an RPN bounding box. The affordance detection branch outputs, for each pixel i in the RoI, a set of probability distributions m = {m_i}_{i ∈ RoI}, where m_i = (m_i^0, ..., m_i^C) is the output of a softmax layer defined over the C + 1 affordance labels, including background. A multi-task loss L is used to jointly train bounding-box classification, bounding-box localization, and affordance mapping, as follows:

L = L_cls + L_loc + L_aff   (3)

where L_cls is defined on the output of the classification layer, L_loc on the output of the regression layer, and L_aff on the output of the affordance detection branch.
Further, each predicted RoI has a ground-truth object class u, a ground-truth bounding-box offset v, and a target affordance mask s. The training dataset provides the values of u and v; the target affordance mask s is the intersection between the RoI and its associated ground-truth mask, and the pixels inside the RoI that do not belong to this intersection are labeled as background. The target mask is resized to the fixed size (i.e. 244 × 244), and formula (3) is written as:

L(p, u, t^u, v, m, s) = L_cls(p, u) + I[u ≥ 1] L_loc(t^u, v) + I[u ≥ 1] L_aff(m, s)   (4)

The first loss L_cls(p, u) is the multinomial cross-entropy loss for classification, computed as:

L_cls(p, u) = -log(p_u)   (5)

where p_u is the softmax output for the ground-truth object class u. The second loss L_loc(t^u, v) is the smooth L1 loss between the regressed box offset t^u (corresponding to the ground-truth object class u) and the ground-truth bounding-box offset v, computed as:

L_loc(t^u, v) = Σ_{j ∈ {x, y, w, h}} smooth_L1(t^u_j - v_j)   (6)

where:

smooth_L1(x) = 0.5 x^2, if |x| < 1; |x| - 0.5, otherwise

L_aff(m, s) is the multinomial cross-entropy loss of the affordance detection branch, computed as:

L_aff(m, s) = -(1/N) Σ_{i ∈ RoI} log(m_i^{s_i})   (7)

where m_i^{s_i} is the softmax output at pixel i for the true label s_i, and N is the number of pixels in the RoI.
In equation (4), I[u ≥ 1] is an indicator function that outputs 1 when u ≥ 1 and 0 otherwise: the box localization loss L_loc and the affordance detection loss L_aff are defined only for positive RoIs, while the object classification loss L_cls is defined for both positive and negative RoIs. The affordance detection loss differs from the instance segmentation loss, which performs a binary segmentation of each RoI into foreground and background: in the affordance detection problem, affordance labels differ from object labels, and the number of affordance labels in each RoI is always greater than 2 (including background) rather than binary, so the affordance labels rely on a per-pixel softmax and a multinomial cross-entropy loss.
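The three loss terms combined in equation (4) can be sketched directly from their definitions; the toy class scores, offsets, and 4-pixel affordance map below are illustrative values only:

```python
import math

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def l_cls(p, u):
    """Multinomial cross-entropy (eq. 5): -log of the softmax score of the true class u."""
    return -math.log(p[u])

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear beyond |x| = 1."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5

def l_loc(t_u, v):
    """Smooth L1 loss summed over the four box-offset components (x, y, w, h)."""
    return sum(smooth_l1(t - g) for t, g in zip(t_u, v))

def l_aff(m, s):
    """Mean per-pixel multinomial cross-entropy over affordance labels.
    m: per-pixel softmax distributions, s: per-pixel ground-truth labels."""
    return -sum(math.log(mi[si]) for mi, si in zip(m, s)) / len(s)

# A positive RoI (u >= 1): all three terms are active, as gated by I[u >= 1].
u = 1
p = softmax([0.5, 3.0, 0.1])                     # class scores over K + 1 = 3 classes
gate = 1 if u >= 1 else 0
total = (l_cls(p, u)
         + gate * l_loc([0.1, 0.2, 0.0, -0.1], [0.0, 0.0, 0.0, 0.0])
         + gate * l_aff([softmax([4.0, 0.0, 0.0])] * 4, [0, 0, 0, 0]))
```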
Regarding training and inference: the network is trained end to end using stochastic gradient descent with a momentum of 0.9 and a weight decay of 0.0005. The network is trained for 200,000 iterations; the learning rate is set to 0.001 for the first 150,000 iterations and reduced for the last 50,000. Input images are resized so that the short edge is 600 pixels and the long edge does not exceed 1000 pixels; if the longer edge would exceed 1000 pixels, it is set to 1000 pixels and the image is resized based on that edge. The RPN uses 15 anchors, and its top 2000 RoIs are used to compute the multi-task loss. At the inference stage, the top 1000 RoIs generated by the RPN are selected and the object detection branch is run on them; from the outputs of the detection branch, the boxes whose classification score is higher than 0.9 are selected as the final detected objects, and if no box satisfies this condition, the box with the highest classification score is selected as the unique detected object. The detected objects are fed as input to the affordance detection branch; for each pixel in a detected object, the affordance class prediction gives the output affordance label of that pixel. Finally, the resizing strategy is used to adjust the predicted 244 × 244 affordance mask of each object to the object (box) size; if detected objects overlap, the final affordance label is determined based on priority.
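The box-selection rule used at inference (keep scores above 0.9, otherwise fall back to the single best box) can be sketched as follows; the box labels are placeholder values:

```python
def select_detections(boxes, scores, threshold=0.9):
    """Keep the boxes whose classification score exceeds the threshold; if none
    does, fall back to the single highest-scoring box as the unique detection."""
    kept = [b for b, s in zip(boxes, scores) if s > threshold]
    if not kept:
        best = max(range(len(scores)), key=lambda i: scores[i])
        kept = [boxes[best]]
    return kept

print(select_detections(["cup", "knife", "bowl"], [0.95, 0.50, 0.92]))  # ['cup', 'bowl']
print(select_detections(["cup", "knife"], [0.30, 0.70]))                # ['knife']
```

This guarantees at least one detection per image, which matters because the affordance branch only labels pixels inside detected boxes.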
Brief description of the drawings
Fig. 1 is the system flowchart of the method for object affordance detection based on end-to-end deep learning according to the present invention.
Fig. 2 is the affordance network architecture diagram of the method for object affordance detection based on end-to-end deep learning according to the present invention.
Fig. 3 is the deconvolutional upsampling diagram of the method for object affordance detection based on end-to-end deep learning according to the present invention.
Detailed description of the embodiments
It should be noted that, as long as there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with each other. The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is the system flowchart of the method for object affordance detection based on end-to-end deep learning according to the present invention, mainly including: (1) problem formulation; (2) the affordance network architecture; (3) the multi-task loss; (4) training and inference.
The problem formulation framework aims to find simultaneously the position of each object, its object class, and its affordances in the image. Following the standard design in computer vision, the position of an object is defined by a rectangle relative to the upper-left corner of the image, and the object class is defined over the rectangular box. Each pixel inside the box is encoded with its affordance; pixel regions of an object that serve the same function are considered to share one affordance. Ideally, all relevant objects in the image are detected, and each pixel in these objects is mapped to its most probable affordance label.
In the end-to-end architecture, the classification layer outputs a probability distribution p = (p_0, ..., p_K) over the K + 1 object classes, where p is the output of a softmax layer, and the regression layer outputs K + 1 bounding-box regression offsets (each offset contains the box center and box size): t^k = (t^k_x, t^k_y, t^k_w, t^k_h). Each offset t^k corresponds to a class k and is parameterized as a scale-invariant translation and relative height/width shift with respect to an RPN bounding box. The affordance detection branch outputs, for each pixel i in the RoI, a set of probability distributions m = {m_i}_{i ∈ RoI}, where m_i = (m_i^0, ..., m_i^C) is the output of a softmax layer defined over the C + 1 affordance labels, including background. A multi-task loss L is used to jointly train bounding-box classification, bounding-box localization, and affordance mapping, as follows:

L = L_cls + L_loc + L_aff   (1)

where L_cls is defined on the output of the classification layer, L_loc on the output of the regression layer, and L_aff on the output of the affordance detection branch.
Each predicted RoI has a ground-truth object class u, a ground-truth bounding-box offset v, and a target affordance mask s. The training dataset provides the values of u and v; the target affordance mask s is the intersection between the RoI and its associated ground-truth mask, and the pixels inside the RoI that do not belong to this intersection are labeled as background. The target mask is resized to the fixed size (i.e. 244 × 244), and the formula is written as:

L(p, u, t^u, v, m, s) = L_cls(p, u) + I[u ≥ 1] L_loc(t^u, v) + I[u ≥ 1] L_aff(m, s)   (2)

The first loss L_cls(p, u) is the multinomial cross-entropy loss for classification, computed as:

L_cls(p, u) = -log(p_u)   (3)

where p_u is the softmax output for the ground-truth object class u. The second loss L_loc(t^u, v) is the smooth L1 loss between the regressed box offset t^u (corresponding to the ground-truth object class u) and the ground-truth bounding-box offset v, computed as:

L_loc(t^u, v) = Σ_{j ∈ {x, y, w, h}} smooth_L1(t^u_j - v_j)   (4)

where:

smooth_L1(x) = 0.5 x^2, if |x| < 1; |x| - 0.5, otherwise

L_aff(m, s) is the multinomial cross-entropy loss of the affordance detection branch, computed as:

L_aff(m, s) = -(1/N) Σ_{i ∈ RoI} log(m_i^{s_i})   (5)

where m_i^{s_i} is the softmax output at pixel i for the true label s_i, and N is the number of pixels in the RoI.
In equation (2), I[u ≥ 1] is an indicator function that outputs 1 when u ≥ 1 and 0 otherwise: the box localization loss L_loc and the affordance detection loss L_aff are defined only for positive RoIs, while the object classification loss L_cls is defined for both positive and negative RoIs. The affordance detection loss differs from the instance segmentation loss, which performs a binary segmentation of each RoI into foreground and background: in the affordance detection problem, affordance labels differ from object labels, and the number of affordance labels in each RoI is always greater than 2 (including background) rather than binary, so the affordance labels rely on a per-pixel softmax and a multinomial cross-entropy loss.
The network is trained end to end using stochastic gradient descent with a momentum of 0.9 and a weight decay of 0.0005. The network is trained for 200,000 iterations; the learning rate is set to 0.001 for the first 150,000 iterations and reduced for the last 50,000. Input images are resized so that the short edge is 600 pixels and the long edge does not exceed 1000 pixels; if the longer edge would exceed 1000 pixels, it is set to 1000 pixels and the image is resized based on that edge. The RPN uses 15 anchors, and its top 2000 RoIs are used to compute the multi-task loss. At the inference stage, the top 1000 RoIs generated by the RPN are selected and the object detection branch is run on them; from the outputs of the detection branch, the boxes whose classification score is higher than 0.9 are selected as the final detected objects, and if no box satisfies this condition, the box with the highest classification score is selected as the unique detected object. The detected objects are fed as input to the affordance detection branch; for each pixel in a detected object, the affordance class prediction gives the output affordance label of that pixel. Finally, the resizing strategy is used to adjust the predicted 244 × 244 affordance mask of each object to the object (box) size; if detected objects overlap, the final affordance label is determined based on priority.
Fig. 2 is the affordance network architecture diagram of the method for object affordance detection based on end-to-end deep learning according to the present invention. The affordance network architecture has three main components: 1) a region of interest alignment layer (RoIAlign) that correctly computes the features of each region of interest (RoI) from the image feature map; 2) a sequence of deconvolutional layers that upsamples the RoI feature map to a high-resolution, smooth, and fine-grained affordance map; 3) a robust resizing strategy that supervises the affordance masks during training.
Regarding the region of interest alignment layer (RoIAlign): a region proposal network (RPN) performs region-based object detection and shares weights with the main convolutional backbone, outputting bounding boxes of different sizes. While a RoIPool layer pools each RoI from the image feature map into a small fixed-size feature map (e.g. 7 × 7) using rounding, the RoIAlign layer properly aligns the extracted features with the RoI without any rounding operation: it uses bilinear interpolation to compute the values at regularly sampled locations inside each RoI bin and aggregates the results with a max operation, avoiding misalignment between the RoI and the extracted features.
The affordance detection branch requires supervision at a fixed size (e.g. 244 × 244), and resizing a multi-label mask with a single threshold does not work for the affordance detection problem, so a multi-threshold resizing strategy is proposed. Given an original ground-truth mask, without loss of generality, let the mask contain n distinct labels P = (c_0, c_1, ..., c_n-1), and let P̂ = (ĉ_0, ĉ_1, ..., ĉ_n-1) be a linear mapping of the values in P; the mapping from P to P̂ converts the original mask into a new mask. The converted mask is resized to the predefined mask size, and a threshold is applied to the resized mask as follows:

ρ(x, y) = ĉ_j, if |ρ(x, y) - ĉ_j| < α; otherwise ρ(x, y) = 0   (6)

where ρ(x, y) is the value of the resized mask at pixel (x, y); ĉ_j is one of the values in P̂; and α is a hyperparameter, set to 0.005. The values of the thresholded mask are then remapped to the original label values (by using the mapping from P̂ back to P) to obtain the ground-truth training mask.
The end-to-end deep network consists of two branches, one for object detection and one for affordance detection. Given an input image, deep features are extracted from the image using a VGG16 network as the backbone; an RPN that shares weights with the convolutional backbone then generates candidate bounding boxes (RoIs). For each RoI, the RoIAlign layer extracts and pools its corresponding features into a 7 × 7 feature map. In the object detection branch, two fully connected layers with 4096 neurons each are used, followed by a classification layer that classifies the object and a regression layer that regresses the object position. In the affordance detection branch, the 7 × 7 feature map is upsampled to 244 × 244 to obtain a high-resolution map, and a softmax layer assigns each pixel of the 244 × 244 map to its most probable affordance class. The whole network is trained end to end with the multi-task loss function.
Fig. 3 is the deconvolutional upsampling diagram of the method for object affordance detection based on end-to-end deep learning according to the present invention. Segmentation models commonly represent an object mask at a small fixed size (e.g. 14 × 14 or 28 × 28), where the value of each pixel in the predicted RoI mask is binary, i.e. foreground or background. Because each object can contain multiple affordance classes, such a binary mask does not work well for the affordance detection problem; deconvolutional layers are therefore used to produce a high-resolution affordance mask. Formally, given an input feature map of size S_i, a deconvolutional layer performs the operation opposite to a convolutional layer in order to build a larger output map of size S_o; the relation between S_i and S_o is:

S_o = s * (S_i - 1) + S_f - 2 * d   (7)

where S_f is the filter size, and s and d are the stride and padding parameters, respectively. In practice, the RoIAlign layer outputs a 7 × 7 feature map, which is upsampled to a higher resolution using three deconvolutional layers: the first (padding d = 1, stride s = 4, kernel size S_f = 8) creates a map of size 30 × 30; similarly, the second layer (d = 1, s = 4, S_f = 8) and the third layer (d = 1, s = 2, S_f = 4) create the final high-resolution map of size 244 × 244. Before each deconvolutional layer, a convolutional layer learns the features that will be deconvolved; these convolutional layers can be regarded as adaptations between two successive deconvolutional layers.
For those skilled in the art, the present invention is not limited to the details of the above exemplary embodiments, and the present invention can be realized in other specific forms without departing from the spirit or scope of the present invention. In addition, those skilled in the art can make various modifications and variations to the present invention without departing from the spirit and scope of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Claims (10)
1. one kind carries out object consistency detection method based on end-to-end deep learning, it is characterised in that mainly determines including problem
Formula (one);The uniformity network architecture (two);(3) are lost in multitask;Training and reasoning (four).
2. the problem of based on described in claims 1 formulation (one), it is characterised in that frame is intended to find the position of object at the same time
Put, the uniformity of the object in object type and image, designed according to the standard in computer vision, the position of object is by opposite
In the upper left corner rectangle definition of image, object type is defined by rectangle frame, each pixel coder its consistency in rectangle frame,
Object pixel region has the function of identical, it is believed that be it is consistent, ideally, all related objects in detection image,
And by each pixel-map in these objects to most probable uniformity label.
3. based on the uniformity network architecture (two) described in claims 1, it is characterised in that three of the uniformity network architecture
Chief component:1) interest region aligned layer (RoIAlign) is used to be computed correctly interest region (RoI) from characteristics of image figure
Feature;2) RoI characteristic patterns are upsampled to high-resolution convolutional layer and obtain smooth, fine and smooth uniformity figure by convolution sequence of layer;3)
Its consistency is supervised using Robust Strategies adjusting training model.
4. The region-of-interest alignment layer (RoIAlign) according to claim 3, characterized in that a region proposal network (RPN), which shares weights with the main convolutional backbone, performs region-based object detection and outputs bounding boxes of different sizes; each RoI is pooled from the image feature map into a small fixed-size feature map (e.g., 7 × 7); the RoIAlign layer properly aligns the extracted features with the RoI by operating without any rounding: it computes interpolated values at regularly sampled positions inside each RoI cell using bilinear interpolation and aggregates the results with a max operation, thereby avoiding misalignment between the RoI and the extracted features.
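As a concrete illustration of the sampling scheme in this claim, the following NumPy sketch (illustrative only, not the patented implementation; the function names and the 2 × 2 sample grid per cell are assumptions) computes bilinearly interpolated values at regularly spaced positions inside each RoI cell and aggregates them with a max operation:

```python
import numpy as np

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate feature map feat (H x W) at continuous (y, x)."""
    h, w = feat.shape
    y0, x0 = max(int(np.floor(y)), 0), max(int(np.floor(x)), 0)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * feat[y0, x0] + (1 - wy) * wx * feat[y0, x1] +
            wy * (1 - wx) * feat[y1, x0] + wy * wx * feat[y1, x1])

def roi_align(feat, roi, out_size=7, samples=2):
    """Pool one RoI to out_size x out_size without rounding: bilinear
    interpolation at regularly spaced sample points inside each output
    cell, aggregated with a max operation."""
    y1, x1, y2, x2 = roi                     # continuous feature-map coords
    cell_h = (y2 - y1) / out_size
    cell_w = (x2 - x1) / out_size
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            vals = [
                bilinear_sample(feat,
                                y1 + (i + (sy + 0.5) / samples) * cell_h,
                                x1 + (j + (sx + 0.5) / samples) * cell_w)
                for sy in range(samples) for sx in range(samples)
            ]
            out[i, j] = max(vals)            # max aggregation over samples
    return out
```

Because no coordinate is rounded, a RoI with fractional boundaries (e.g., (2.3, 3.1, 10.7, 12.5)) is pooled exactly, which is the misalignment-avoidance property this claim describes.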
5. The high-resolution deconvolutional layers according to claim 3, characterized in that representing an object segmentation mask with a small fixed-size mask (e.g., 14 × 14 or 28 × 28) makes each pixel value of the RoI prediction binary, i.e., foreground or background; because each object can have multiple consistency classes, such a baseline mask cannot work well for the detection problem, and high-resolution consistency masks are therefore realized with deconvolutional layers; formally, given an input feature map of size S_i, a deconvolutional layer performs the operation opposite to a convolutional layer in order to build a larger output map of size S_o, where S_i and S_o are related by:

S_o = s·(S_i − 1) + S_f − 2·d    (1)

where S_f is the filter size, and s and d are the stride and padding parameters, respectively; in practice, the 7 × 7 feature map output by the RoIAlign layer is upsampled to higher resolution with three deconvolutional layers: the first layer, with padding d = 1, stride s = 4, and kernel size S_f = 8, creates a 30 × 30 map; similarly, the second layer parameters are (d = 1, s = 4, S_f = 8) and the third layer parameters are (d = 1, s = 2, S_f = 4), creating the final 244 × 244 high-resolution map; before each deconvolutional layer, a convolutional layer learns the features to be deconvolved and can be regarded as an adaptation between two consecutive deconvolutional layers.
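The size progression in this claim can be checked directly against equation (1); a small helper (illustrative, not part of the patent) reproduces the 7 → 30 → 122 → 244 chain, where 122 × 122 is the intermediate size implied by the stated layer parameters and the first layer's stride is taken as 4, the value equation (1) requires to yield a 30 × 30 map:

```python
def deconv_out_size(s_i, stride, kernel, pad):
    """Output size of a deconvolutional (transposed-convolution) layer,
    per equation (1): S_o = s * (S_i - 1) + S_f - 2 * d."""
    return stride * (s_i - 1) + kernel - 2 * pad

size = 7                               # RoIAlign output (claim 4)
size = deconv_out_size(size, 4, 8, 1)  # first layer:  7  -> 30
size = deconv_out_size(size, 4, 8, 1)  # second layer: 30 -> 122
size = deconv_out_size(size, 2, 4, 1)  # third layer:  122 -> 244
```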
6. The training-mask strategy according to claim 3, characterized in that the consistency detection branch requires a fixed-size (e.g., 244 × 244) mask for supervised training; because a single threshold does not work for the consistency detection problem, a multi-threshold strategy is proposed for resizing: given an original ground-truth mask, assume without loss of generality that the mask contains n distinct labels P = (c_0, c_1, …, c_{n−1}); the values in P are linearly mapped to P̂ = (ĉ_0, ĉ_1, …, ĉ_{n−1}), and the original mask is converted to a new mask using the mapping from P to P̂; the converted mask is resized to the predefined mask size, and a threshold is applied to the resized mask as follows:

ρ(x, y) = ĉ_i, if |ρ(x, y) − ĉ_i| < α, and background otherwise    (2)

where ρ(x, y) is a pixel value of the resized mask, ĉ_i is one of the values in P̂, and α is a hyperparameter set to 0.005; the thresholded mask values are then remapped to the original label values (using the mapping from P̂ to P) to obtain the ground-truth training mask.
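A minimal NumPy sketch of this multi-threshold resizing follows; the helper names, the choice of [0, 1] as the linear mapping range, and the hand-rolled bilinear resize are assumptions of this sketch — only the map → resize → threshold → remap sequence comes from the claim:

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Plain bilinear resize of a 2-D array (stand-in for a library resize)."""
    h, w = img.shape
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[:, None]; wx = (xs - x0)[None, :]
    return ((1 - wy) * (1 - wx) * img[np.ix_(y0, x0)] +
            (1 - wy) * wx * img[np.ix_(y0, x1)] +
            wy * (1 - wx) * img[np.ix_(y1, x0)] +
            wy * wx * img[np.ix_(y1, x1)])

def resize_gt_mask(mask, size, alpha=0.005):
    """Map labels P linearly into [0, 1] (P_hat), resize with interpolation,
    keep only pixels within alpha of a mapped label, and remap back to P;
    ambiguous interpolated pixels become background (0)."""
    labels = np.unique(mask)                      # P = (c_0, ..., c_{n-1})
    mapped = np.linspace(0.0, 1.0, len(labels))   # P_hat
    lut = dict(zip(labels, mapped))
    converted = np.vectorize(lut.get)(mask).astype(float)
    resized = bilinear_resize(converted, size, size)
    out = np.zeros_like(resized)                  # background by default
    for c, c_hat in zip(labels, mapped):
        out[np.abs(resized - c_hat) < alpha] = c  # threshold, then remap
    return out
```

The tight threshold (α = 0.005) is what prevents interpolation from inventing labels: a pixel whose resized value falls between two mapped labels matches none of them and is sent to background instead of being rounded to a wrong class.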
7. The end-to-end deep learning according to claim 1, characterized in that the network consists of two branches, for object detection and consistency detection; given an input image, deep features are extracted from the image using a VGG16 network as the backbone; an RPN that shares weights with the convolutional backbone then generates candidate bounding boxes (RoIs); for each RoI, the RoIAlign layer extracts and pools the corresponding features into a 7 × 7 feature map; the object detection branch uses two fully connected layers, each with 4096 neurons, after which a classification layer classifies the object and a regression layer regresses the object position; in the consistency detection branch, the 7 × 7 feature map is upsampled to 244 × 244 to obtain a high-resolution map, and a softmax layer assigns each pixel of the 244 × 244 map to its most probable consistency class; the whole network is trained end to end with a multi-task loss function.
8. The multi-task loss (3) according to claim 1, characterized in that, in the end-to-end framework, the classification layer outputs a probability distribution p = (p_0, …, p_K) over K + 1 object classes, where p is the output of a softmax layer; the regression layer outputs K + 1 bounding-box regression offsets t^k = (t_x^k, t_y^k, t_w^k, t_h^k), each offset comprising the box center and box size; each offset t^k corresponds to a class k and is parameterized as a scale-invariant translation and a relative height/width shift with respect to an RPN bounding box; the consistency detection branch outputs a set of probability distributions m = {m_i}_{i ∈ RoI}, one for each pixel i of the RoI, where m_i = (m_i^0, …, m_i^C) is the output of a softmax layer defined over the C + 1 consistency labels including the background; a multi-task loss L is used to jointly train bounding-box classification, bounding-box position, and consistency mapping, as follows:

L = L_cls + L_loc + L_aff    (3)

where L_cls is defined on the output of the classification layer, L_loc on the output of the regression layer, and L_aff on the output of the consistency detection branch.
9. The loss according to claim 8, characterized in that each RoI is associated with a ground-truth object class u, a ground-truth bounding-box offset v, and a target consistency mask s; the training dataset provides the values of u and v; the target consistency mask s is the intersection between the RoI and its associated ground-truth mask, and pixels inside the RoI that do not belong to the intersection are labeled as background; the target mask is resized to the fixed size (i.e., 244 × 244); equation (3) is then written as:

L(p, u, t^u, v, m, s) = L_cls(p, u) + I[u ≥ 1]·L_loc(t^u, v) + I[u ≥ 1]·L_aff(m, s)    (4)

The first loss L_cls(p, u) is the multinomial cross-entropy loss for classification, computed as:

L_cls(p, u) = −log(p_u)    (5)

where p_u is the softmax output for the ground-truth object class u; the second loss L_loc(t^u, v) is the smooth L1 loss between the regressed box offset t^u (corresponding to the ground-truth object class u) and the ground-truth box offset v, computed as:

L_loc(t^u, v) = Σ_{i ∈ {x, y, w, h}} Smooth_L1(t_i^u − v_i)    (6)

where:

Smooth_L1(x) = 0.5·x², if |x| < 1; |x| − 0.5, otherwise

L_aff(m, s) is the multinomial cross-entropy loss of the consistency detection branch, computed as:

L_aff(m, s) = −(1/N)·Σ_{i ∈ RoI} log(m_i^{s_i})    (7)

where m_i^{s_i} is the softmax output at pixel i for its true label s_i, and N is the number of pixels in the RoI; in equation (4), I[u ≥ 1] is an indicator function that outputs 1 when u ≥ 1 and 0 otherwise; the box-position loss L_loc and the consistency detection loss L_aff are defined only when the RoI is positive, whereas the object classification loss L_cls is defined for both positive and negative RoIs; the consistency detection loss differs from the instance segmentation loss, which performs a binary (foreground/background) segmentation within each RoI: in the consistency detection problem the consistency labels differ from the object labels, and the number of consistency labels in each RoI is not binary, i.e., it is always greater than two (including the background); therefore the consistency labels rely on a per-pixel softmax and a multinomial cross-entropy loss.
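The losses of equations (4)–(7) can be sketched compactly in NumPy; this is a simplified single-RoI version, and the flattened (C+1, N) layout of m and the 0-as-background convention are assumptions of the sketch:

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: 0.5*x^2 for |x| < 1, |x| - 0.5 otherwise."""
    a = np.abs(x)
    return np.where(a < 1, 0.5 * x ** 2, a - 0.5)

def multitask_loss(p, u, t_u, v, m, s):
    """Equation (4): L = Lcls(p,u) + I[u>=1]*Lloc(t^u,v) + I[u>=1]*Laff(m,s).
    p: (K+1,) class softmax; u: ground-truth class (0 = background);
    t_u, v: (4,) box offsets; m: (C+1, N) per-pixel softmax over labels;
    s: (N,) ground-truth label per pixel."""
    l_cls = -np.log(p[u])                             # eq. (5)
    l_loc = smooth_l1(t_u - v).sum()                  # eq. (6)
    l_aff = -np.log(m[s, np.arange(len(s))]).mean()   # eq. (7)
    ind = 1.0 if u >= 1 else 0.0                      # indicator I[u >= 1]
    return l_cls + ind * (l_loc + l_aff)
```

For a background RoI (u = 0) the indicator zeroes the box and consistency terms, so only the classification loss remains, matching the positive/negative RoI rule stated above.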
10. The training and inference (4) according to claim 1, characterized in that the network is trained end to end by stochastic gradient descent with a momentum of 0.9 and a weight decay of 0.0005; the network is trained for 200,000 iterations, with the learning rate set to 0.001 for the first 150,000 iterations and decreased for the last 50,000 iterations; input images are resized so that the shorter edge is 600 pixels while the longer edge does not exceed 1000 pixels; if the longer edge would otherwise exceed 1000 pixels, it is set to 1000 pixels and the image is resized based on that edge; 15 anchors are used in the RPN, and the top 2000 RoIs produced by the RPN are used to compute the multi-task loss; in the inference stage, the top 1000 RoIs produced by the RPN are selected and the object detection branch is run on them; from the outputs of the detection branch, the boxes whose classification score is higher than 0.9 are selected as the finally detected objects; if no box satisfies this condition, the single box with the highest classification score is selected as the only detected object; the detected objects are then fed as input to the consistency detection branch, where, for each pixel in a detected object, consistency classification predicts the output consistency label of that pixel; finally, the 244 × 244 consistency mask predicted for each object is resized to the object (box) size using the resizing strategy, and if detected objects overlap, the final consistency label is determined based on priority.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711139653.8A CN107944443A (en) | 2017-11-16 | 2017-11-16 | Object consistency detection method based on end-to-end deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107944443A true CN107944443A (en) | 2018-04-20 |
Family
ID=61932635
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711139653.8A Withdrawn CN107944443A (en) | 2017-11-16 | 2017-11-16 | One kind carries out object consistency detection method based on end-to-end deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107944443A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106204555A (en) * | 2016-06-30 | 2016-12-07 | 天津工业大学 | A kind of combination Gbvs model and the optic disc localization method of phase equalization |
CN106599939A (en) * | 2016-12-30 | 2017-04-26 | 深圳市唯特视科技有限公司 | Real-time target detection method based on region convolutional neural network |
CN106780536A (en) * | 2017-01-13 | 2017-05-31 | 深圳市唯特视科技有限公司 | A kind of shape based on object mask network perceives example dividing method |
Non-Patent Citations (1)
Title |
---|
THANH-TOAN DO ET AL.: "AffordanceNet: An End-to-End Deep Learning Approach for Object Affordance Detection", arXiv * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110633595B (en) * | 2018-06-21 | 2022-12-02 | 北京京东尚科信息技术有限公司 | Target detection method and device by utilizing bilinear interpolation |
CN110633595A (en) * | 2018-06-21 | 2019-12-31 | 北京京东尚科信息技术有限公司 | Target detection method and device by utilizing bilinear interpolation |
CN109145898A (en) * | 2018-07-26 | 2019-01-04 | 清华大学深圳研究生院 | A kind of object detecting method based on convolutional neural networks and iterator mechanism |
CN109190537A (en) * | 2018-08-23 | 2019-01-11 | 浙江工商大学 | A kind of more personage's Attitude estimation methods based on mask perceived depth intensified learning |
CN109190537B (en) * | 2018-08-23 | 2020-09-29 | 浙江工商大学 | Mask perception depth reinforcement learning-based multi-person attitude estimation method |
CN109299434A (en) * | 2018-09-04 | 2019-02-01 | 重庆公共运输职业学院 | Cargo customs clearance big data is intelligently graded and sampling observation rate computing system |
CN110909748A (en) * | 2018-09-17 | 2020-03-24 | 斯特拉德视觉公司 | Image encoding method and apparatus using multi-feed |
CN110909748B (en) * | 2018-09-17 | 2023-09-19 | 斯特拉德视觉公司 | Image encoding method and apparatus using multi-feed |
CN110008808A (en) * | 2018-12-29 | 2019-07-12 | 北京迈格威科技有限公司 | Panorama dividing method, device and system and storage medium |
CN109801297A (en) * | 2019-01-14 | 2019-05-24 | 浙江大学 | A kind of image panorama segmentation prediction optimization method realized based on convolution |
WO2020156303A1 (en) * | 2019-01-30 | 2020-08-06 | 广州市百果园信息技术有限公司 | Method and apparatus for training semantic segmentation network, image processing method and apparatus based on semantic segmentation network, and device and storage medium |
CN109871798A (en) * | 2019-02-01 | 2019-06-11 | 浙江大学 | A kind of remote sensing image building extracting method based on convolutional neural networks |
WO2020155518A1 (en) * | 2019-02-03 | 2020-08-06 | 平安科技(深圳)有限公司 | Object detection method and device, computer device and storage medium |
CN110298364A (en) * | 2019-06-27 | 2019-10-01 | 安徽师范大学 | Based on the feature selection approach of multitask under multi-threshold towards functional brain network |
CN110349167A (en) * | 2019-07-10 | 2019-10-18 | 北京悉见科技有限公司 | A kind of image instance dividing method and device |
CN110956131A (en) * | 2019-11-27 | 2020-04-03 | 北京迈格威科技有限公司 | Single-target tracking method, device and system |
CN110956131B (en) * | 2019-11-27 | 2024-01-05 | 北京迈格威科技有限公司 | Single-target tracking method, device and system |
CN112684704A (en) * | 2020-12-18 | 2021-04-20 | 华南理工大学 | End-to-end motion control method, system, device and medium based on deep learning |
CN112799401A (en) * | 2020-12-28 | 2021-05-14 | 华南理工大学 | End-to-end robot vision-motion navigation method |
CN112692875B (en) * | 2021-01-06 | 2021-08-10 | 华南理工大学 | Digital twin system for operation and maintenance of welding robot |
CN112692875A (en) * | 2021-01-06 | 2021-04-23 | 华南理工大学 | Digital twin system for operation and maintenance of welding robot |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107944443A (en) | Object consistency detection method based on end-to-end deep learning | |
CN110428428B (en) | Image semantic segmentation method, electronic equipment and readable storage medium | |
CN104809187B (en) | A kind of indoor scene semanteme marking method based on RGB D data | |
Volpi et al. | Deep multi-task learning for a geographically-regularized semantic segmentation of aerial images | |
Yang et al. | Layered object models for image segmentation | |
CN105869178B (en) | A kind of complex target dynamic scene non-formaldehyde finishing method based on the convex optimization of Multiscale combination feature | |
CN106920243A (en) | The ceramic material part method for sequence image segmentation of improved full convolutional neural networks | |
US20160055237A1 (en) | Method for Semantically Labeling an Image of a Scene using Recursive Context Propagation | |
CN107909015A (en) | Hyperspectral image classification method based on convolutional neural networks and empty spectrum information fusion | |
CN104281853A (en) | Behavior identification method based on 3D convolution neural network | |
CN110929665B (en) | Natural scene curve text detection method | |
CN106599805A (en) | Supervised data driving-based monocular video depth estimating method | |
CN104298974A (en) | Human body behavior recognition method based on depth video sequence | |
JP7329041B2 (en) | Method and related equipment for synthesizing images based on conditional adversarial generation networks | |
Liu et al. | Robust salient object detection for RGB images | |
CN112734789A (en) | Image segmentation method and system based on semi-supervised learning and point rendering | |
Hernández et al. | CUDA-based parallelization of a bio-inspired model for fast object classification | |
Zhang et al. | Class relatedness oriented-discriminative dictionary learning for multiclass image classification | |
CN107657276B (en) | Weak supervision semantic segmentation method based on searching semantic class clusters | |
CN109726725A (en) | The oil painting writer identification method of heterogeneite Multiple Kernel Learning between a kind of class based on large-spacing | |
Vinoth Kumar et al. | A decennary survey on artificial intelligence methods for image segmentation | |
Liu et al. | Dunhuang murals contour generation network based on convolution and self-attention fusion | |
CN103440651A (en) | Multi-label image annotation result fusion method based on rank minimization | |
CN104778683A (en) | Multi-modal image segmenting method based on functional mapping | |
Wang et al. | Self-attention deep saliency network for fabric defect detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | |
Application publication date: 20180420