CN107808167A

CN107808167A - A method for object detection using a fully convolutional network based on deformable parts

Info

Publication number: CN107808167A
Application number: CN201711022519.XA
Authority: CN
Inventors: 夏春秋
Original assignee: Shenzhen Vision Technology Co Ltd
Current assignee: Shenzhen Vision Technology Co Ltd
Priority date: 2017-10-27
Filing date: 2017-10-27
Publication date: 2018-03-16

Abstract

The present invention proposes a method for object detection using a fully convolutional network based on deformable parts. The method mainly comprises fully convolutional feature extraction, deformable part-based RoI pooling, and classification and localization prediction with deformable parts. A fully convolutional network (FCN) serves as the backbone. Deformable part-based RoI pooling divides each region proposal R into several parts, locates the best-matching configuration of these parts, and aligns them with the corresponding positions in the image. Prediction is performed by two branches that classify and localize each region proposal. Because the network is fully convolutional, computation is shared over the whole image and task-related feature maps are provided; positions within each region are inferred with a special RoI pooling, and the final predictions are obtained by simple pooling, from which classification and localization, and hence object detection, are carried out. The invention ensures that the classification and detection tasks are well matched while also preserving the resolution of the feature maps, which improves the accuracy of object detection.

Description

A method for object detection using a fully convolutional network based on deformable parts

Technical field

The present invention relates to the field of object detection, and more particularly to a method for object detection using a fully convolutional network based on deformable parts.

Background art

In recent years, deep learning based on deep convolutional networks has been widely applied in several areas of computer vision, mainly image segmentation, semantic segmentation and object detection. Object detection has become one of the key technologies of artificial intelligence and has broad applications in visual navigation, intelligent transportation, video retrieval and compression, three-dimensional reconstruction, security monitoring and medical care; it is also widely used in large-scale civilian scenarios such as traffic flow control and monitoring of abnormal vehicle behavior. At present, a region proposal network is commonly used to focus on the regions of interest in an image, which are then classified and localized. This approach still has many shortcomings: it tends to produce a mismatch between the classification and detection tasks, and it reduces the spatial resolution of the feature maps, which makes the network insensitive to the position of the object within a region and harms accuracy. Existing object detection methods therefore still have certain limitations.

The present invention proposes a method for object detection using a fully convolutional network based on deformable parts. A fully convolutional network (FCN) serves as the backbone. Deformable part-based RoI pooling divides each region proposal R into several parts, locates the best-matching configuration of these parts, and aligns them with the corresponding positions in the image. Prediction is performed by two branches that classify and localize each region proposal. Because the network is fully convolutional, computation is shared over the whole image and task-related feature maps are provided; positions within each region are inferred with a special RoI pooling, and the final predictions are obtained by simple pooling, from which classification and localization, and hence object detection, are carried out. The invention ensures that the classification and detection tasks are well matched while also preserving the resolution of the feature maps, which improves the accuracy of object detection.

Summary of the invention

To address the problems of low feature-map resolution and insufficient object detection accuracy, the present invention uses a fully convolutional network whose computation is shared over the whole image and which provides task-related feature maps. Positions within each region are inferred with a special RoI pooling, the final predictions are obtained by simple pooling, and classification and localization are performed to carry out object detection.

To solve the above problems, the present invention provides a method for object detection using a fully convolutional network based on deformable parts, mainly comprising:

Fully convolutional feature extraction (one);

Deformable part-based RoI pooling (two);

Classification and localization prediction with deformable parts (three).

In the deformable part-based fully convolutional network, each region is represented by parts, which are aligned by optimizing their positions; this improves both classification and localization prediction. A part-based representation is more invariant to local transformations, and the configuration of the parts provides important information about the geometry of the object.

The fully convolutional feature extractor (one) uses a fully convolutional network (FCN) as the backbone. This allows computation to be shared over the whole image and the RoI-wise layers to be reduced, and it provides task-related feature maps (for example, detection heat maps) from which the final predictions are obtained by simple pooling. In the deformable part-based fully convolutional network (DP-FCN), the positions within a region are inferred with a special RoI pooling. The fully convolutional structure is well suited to computing the responses of all parts for all classes, as one map per part and class, and a corresponding structure is used for localization. The complete representation of the whole image (the classification and localization maps of every part of every class) is obtained in a single forward pass and is shared between all regions of the same image.

Further, localizing parts in the feature maps places high demands on the resolution of those maps. FCNs contain only spatial layers and are therefore well suited to preserving spatial resolution, unlike networks that end in fully connected layers. Specifically, if the stride is too large, the part deformations may be too coarse to describe the object correctly. The stride is therefore reduced by using dilated convolutions in the last convolution block to skip pooling and by merging the last three stages.

The purpose of deformable part-based RoI pooling (two) is to divide the region proposal R into several parts and to relocate these parts to their best-matching configuration, so that each part models a discriminative element of the object and is aligned with the corresponding position in the image. This part-based deformation yields a representation that is more invariant to transformations of the object, because the parts are placed at their corresponding positions and their responses are more stable; it is particularly useful for non-rigid objects.

Further, to find the best-matching configuration, the region is divided by a regular grid of size k × k, and each cell (i, j) is treated as an independent part R_i,j. Because the number of parts (i.e., k²) is fixed as a hyperparameter, a complete detection heat map z_i,j,c can be computed for each part (i, j) of each class c, and each part only needs to be optimized in its corresponding map. Part deformation allows a part to move slightly around its reference position (its sub-region of the initial region): the best potential displacement is selected, and the value is pooled from the selected position. The pooled score P_c^R(i, j) of part (i, j) for class c is a trade-off between maximizing the score on the feature map and minimizing the displacement (dx, dy) from the reference position:

$$P_c^R(i,j) = \max_{dx,\,dy} \left[ \mathrm{Pool}_{(x,y) \in R_{i,j}} \, z_{i,j,c}(x+dx,\; y+dy) - \lambda^{def}\left(dx^2 + dy^2\right) \right] \qquad (1)$$

where λ^def denotes the strength of the regularization (which favors small deformations).
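As a rough illustration of this trade-off, the following NumPy sketch pools a single part of a single class by subtracting λ^def (dx² + dy²) from the value pooled at each candidate displacement and keeping the best result. It is a minimal sketch under stated assumptions, not the patented implementation: average pooling stands in for Pool, displacements are restricted to integer offsets inside a window `max_disp`, shifts that would leave the map are skipped, and all function and parameter names are hypothetical.

```python
import numpy as np


def deformable_part_pool(z, part_box, lam_def, max_disp=2):
    """Pool one part (i, j) of one class c from its 2-D score map z.

    part_box = (y0, y1, x0, x1) is the reference sub-region R_ij
    (half-open index ranges). Returns the pooled score and the
    selected displacement (dx, dy).
    Assumptions: average pooling, integer displacements, bounded window.
    """
    h, w = z.shape
    y0, y1, x0, x1 = part_box
    best_score, best_disp = -np.inf, (0, 0)
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            ys, ye, xs, xe = y0 + dy, y1 + dy, x0 + dx, x1 + dx
            if ys < 0 or xs < 0 or ye > h or xe > w:
                continue  # the shifted part would leave the map
            pooled = z[ys:ye, xs:xe].mean()
            # Trade-off: pooled response minus squared-distance penalty.
            score = pooled - lam_def * (dx * dx + dy * dy)
            if score > best_score:
                best_score, best_disp = score, (dx, dy)
    return best_score, best_disp
```

Calling the function once per part and per class yields both the pooled scores and the displacement of every part.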

Further, during training the deformations are optimized without part-level annotations. The displacements computed during the forward pass are stored and reused at the same positions during backpropagation. The deformations are computed independently for every part and every class, except for the background class, which has no discriminative elements; the same part displacements are used to pool values from the localization maps. λ^def directly determines the magnitude of the part displacements, and therefore also the deformation of the RoIs, by controlling the squared-distance regularization: increasing the regularization weight effectively reduces the part displacements. If λ^def = +∞, dx and dy tend to 0 and the pooling reduces to position-based region pooling at the reference positions:

$$P_c^R(i,j) = \mathrm{Pool}_{(x,y) \in R_{i,j}} \, z_{i,j,c}(x,\, y) \qquad (2)$$

On the other hand, if λ^def = 0, the regularization is removed and the parts can move freely. When λ^def is too low, the results degrade, which demonstrates the importance of the regularization; the results are stable over a wide range of λ^def values.
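The two limiting cases can be illustrated with a small sketch that selects the displacement maximizing the pooled value minus the squared-distance penalty. The pooled values below are made up for illustration and the function name is hypothetical. A large finite weight stands in for λ^def = +∞, because multiplying infinity by a zero displacement gives NaN in floating-point arithmetic.

```python
def best_displacement(pooled_by_disp, lam_def):
    """Return the displacement (dx, dy) with the best penalized score.

    pooled_by_disp maps each candidate (dx, dy) to the value pooled there.
    """
    def penalized(disp):
        dx, dy = disp
        return pooled_by_disp[disp] - lam_def * (dx * dx + dy * dy)

    return max(pooled_by_disp, key=penalized)


# Hypothetical pooled values: the response grows away from the reference position.
pooled = {(0, 0): 0.2, (1, 0): 0.5, (3, 0): 0.9}

free = best_displacement(pooled, lam_def=0.0)      # no regularization
moderate = best_displacement(pooled, lam_def=0.1)  # moderate regularization
rigid = best_displacement(pooled, lam_def=1e6)     # stands in for +infinity
```

With these values, the unregularized part moves to the farthest candidate, the moderately regularized part settles on the nearby one, and the strongly regularized part stays at its reference position.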

The classification and localization prediction with deformable parts (three) is performed by two branches that classify and localize each region proposal. The classification branch consists only of average pooling followed by a softmax layer, which is the strategy used in R-FCN; here, however, the deformations introduced by the deformable part-based RoI pooling bring greater invariance to object transformations and improve classification.
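The classification branch described here can be sketched in a few lines of NumPy. This is an illustrative sketch only: it assumes the pooled part scores of one region are arranged as a `(k, k, num_classes)` array, and the function name is hypothetical.

```python
import numpy as np


def classify_region(part_scores):
    """Average the pooled part scores per class, then apply softmax.

    part_scores: array of shape (k, k, num_classes) holding the pooled
    score of every part for every class (layout is an assumption).
    """
    class_scores = part_scores.mean(axis=(0, 1))  # average over the k*k parts
    shifted = class_scores - class_scores.max()   # for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()                        # class probabilities
```

Because the branch contains no learned layers after pooling, all class evidence must already be present in the part score maps.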

Further, the localization branch computes a first localization output from the corresponding features by average pooling. The positions of the parts, obtained from the alignment performed earlier, provide rich geometric information about the appearance of the object, such as its shape or pose, which can be used to improve localization accuracy.

Further, to improve localization accuracy, a new deformation-aware localization refinement module is introduced. For each region R, the feature vector of the displacements (dx, dy) of all parts for class c is extracted and used to refine the previous output for the same class. This vector is passed through two fully connected layers and is then multiplied element-wise with the first values to produce the final localization output for that class. Because the refinement is mainly geometric, it is performed separately for each class, with the parameters shared between classes.

Brief description of the drawings

Fig. 1 is a flow chart of the method of the present invention for object detection using a fully convolutional network based on deformable parts.

Fig. 2 is a flow chart of the deformable part-based RoI pooling in the method of the present invention for object detection using a fully convolutional network based on deformable parts.

Fig. 3 is a flow chart of the localization refinement in the method of the present invention for object detection using a fully convolutional network based on deformable parts.

Detailed description of the embodiments

It should be noted that, provided there is no conflict, the embodiments in the present application and the features in the embodiments may be combined with one another. The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a flow chart of the method of the present invention for object detection using a fully convolutional network based on deformable parts. The method mainly comprises fully convolutional feature extraction (one), deformable part-based RoI pooling (two), and classification and localization prediction with deformable parts (three). In the deformable part-based fully convolutional network, each region is represented by parts, which are aligned by optimizing their positions; this improves both classification and localization prediction. A part-based representation is more invariant to local transformations, and the configuration of the parts provides important information about the geometry of the object.

The fully convolutional feature extractor (one) uses a fully convolutional network (FCN) as the backbone. This allows computation to be shared over the whole image and the RoI-wise layers to be reduced, and it provides task-related feature maps from which the final predictions are obtained by simple pooling. In the deformable part-based fully convolutional network (DP-FCN), the positions within a region are inferred with a special RoI pooling. The fully convolutional structure is well suited to computing the responses of all parts for all classes, as one map per part and class, and a corresponding structure is used for localization. The complete representation of the whole image (the classification and localization maps of every part of every class) is obtained in a single forward pass and is shared between all regions of the same image.

Localizing parts in the feature maps places high demands on the resolution of those maps. FCNs contain only spatial layers and are therefore well suited to preserving spatial resolution, unlike networks that end in fully connected layers. Specifically, if the stride is too large, the part deformations may be too coarse to describe the object correctly. The stride is therefore reduced by using dilated convolutions in the last convolution block to skip pooling and by merging the last three stages.
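The effect of replacing down-sampling by dilation can be checked with simple stride arithmetic. The sketch below is illustrative only: the per-stage strides `[4, 2, 2, 2]` are an assumed, ResNet-like configuration that is not stated in this document, and the helper names are hypothetical. One cell of displacement on the feature map corresponds to as many image pixels as the overall stride, which is why a smaller stride gives finer part deformations.

```python
from math import prod


def output_stride(stage_strides):
    """Overall stride of a backbone: the product of its per-stage strides."""
    return prod(stage_strides)


def dilate_last_stage(stage_strides):
    """Remove the down-sampling of the last stage.

    Returns the new strides and the dilation rate commonly used in that
    stage to keep its receptive field (equal to the removed stride).
    """
    *head, last = stage_strides
    return head + [1], last


# Assumed ResNet-like stage strides (not taken from this document).
strides = [4, 2, 2, 2]
dilated_strides, dilation = dilate_last_stage(strides)
coarse = output_stride(strides)          # pixels per feature-map cell before
fine = output_stride(dilated_strides)    # pixels per feature-map cell after
```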

Fig. 2 is a flow chart of the deformable part-based RoI pooling in the method of the present invention for object detection using a fully convolutional network based on deformable parts. The purpose of deformable part-based RoI pooling (two) is to divide the region proposal R into several parts and to relocate these parts to their best-matching configuration (see Fig. 2), so that each part models a discriminative element of the object and is aligned with the corresponding position in the image. This part-based deformation yields a representation that is more invariant to transformations of the object, because the parts are placed at their corresponding positions and their responses are more stable; it is particularly useful for non-rigid objects.

The region is divided by a regular grid of size k × k, and each cell (i, j) is treated as an independent part R_i,j. Because the number of parts (i.e., k²) is fixed as a hyperparameter, a complete detection heat map z_i,j,c can be computed for each part (i, j) of each class c (left of Fig. 2), and each part only needs to be optimized in its corresponding map.
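The grid division can be sketched as follows. This is a minimal illustration in which the box format `(x0, y0, x1, y1)`, the convention that i indexes rows and j indexes columns, and the function name are all assumptions.

```python
def split_region(box, k):
    """Divide a region into a regular k x k grid of part sub-regions.

    box = (x0, y0, x1, y1). Returns a dict mapping (i, j) to the
    reference sub-box of part R_ij, with i the row and j the column.
    """
    x0, y0, x1, y1 = box
    w, h = (x1 - x0) / k, (y1 - y0) / k
    return {
        (i, j): (x0 + j * w, y0 + i * h, x0 + (j + 1) * w, y0 + (i + 1) * h)
        for i in range(k)
        for j in range(k)
    }


parts = split_region((0, 0, 9, 6), k=3)  # k * k = 9 reference sub-regions
```

Each sub-box is the reference position around which the corresponding part is later allowed to move.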

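Part deformation allows a part to move slightly around its reference position (its sub-region of the initial region): the best potential displacement is selected, and the value is pooled from the selected position. The pooled score P_c^R(i, j) of part (i, j) for class c is a trade-off between maximizing the score on the feature map and minimizing the displacement (dx, dy) from the reference position (see Fig. 2):

$$P_c^R(i,j) = \max_{dx,\,dy} \left[ \mathrm{Pool}_{(x,y) \in R_{i,j}} \, z_{i,j,c}(x+dx,\; y+dy) - \lambda^{def}\left(dx^2 + dy^2\right) \right] \qquad (1)$$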

where λ^def denotes the strength of the regularization (which favors small deformations).

During training, the deformations are optimized without part-level annotations. The displacements computed during the forward pass are stored and reused at the same positions during backpropagation. The deformations are computed independently for every part and every class, except for the background class, which has no discriminative elements; the same part displacements are used to pool values from the localization maps. λ^def directly determines the magnitude of the part displacements, and therefore also the deformation of the RoIs, by controlling the squared-distance regularization: increasing the regularization weight effectively reduces the part displacements. If λ^def = +∞, dx and dy tend to 0 and the pooling reduces to position-based region pooling at the reference positions:

$$P_c^R(i,j) = \mathrm{Pool}_{(x,y) \in R_{i,j}} \, z_{i,j,c}(x,\, y) \qquad (2)$$

Fig. 3 is a flow chart of the localization refinement in the method of the present invention for object detection using a fully convolutional network based on deformable parts. The classification and localization prediction with deformable parts (three) is performed by two branches that classify and localize each region proposal. The classification branch consists only of average pooling followed by a softmax layer, which is the strategy used in R-FCN; here, however, the deformations introduced by the deformable part-based RoI pooling bring greater invariance to object transformations and improve classification.

The localization branch computes a first localization output from the corresponding features by average pooling. The positions of the parts, obtained from the alignment performed earlier, provide rich geometric information about the appearance of the object, such as its shape or pose, which can be used to improve localization accuracy.

A new deformation-aware localization refinement module is introduced. For each region R, the feature vector of the displacements (dx, dy) of all parts for class c is extracted and used to refine the previous output for the same class. This vector is passed through two fully connected layers and is then multiplied element-wise with the first values to produce the final localization output for that class. Because the refinement is mainly geometric, it is performed separately for each class, with the parameters shared between classes.
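The refinement step can be sketched with NumPy as below. This is a sketch under assumptions, not the module as claimed: the localization output is taken to be four box values per class, a ReLU is assumed between the two fully connected layers, the hidden width is arbitrary, and all names are hypothetical. Applying the same weight matrices to every row is what shares the parameters between classes.

```python
import numpy as np


def refine_localization(first_loc, displacements, w1, b1, w2, b2):
    """Refine per-class box outputs using the part displacements.

    first_loc:     (num_classes, 4) first localization output (assumed 4 values).
    displacements: (num_classes, k, k, 2) selected (dx, dy) of every part.
    w1, b1, w2, b2: weights of two fully connected layers, shared by all classes.
    """
    num_classes = first_loc.shape[0]
    d = displacements.reshape(num_classes, -1)  # (num_classes, 2 * k * k)
    hidden = np.maximum(d @ w1 + b1, 0.0)       # first FC layer, ReLU assumed
    gate = hidden @ w2 + b2                     # second FC layer -> (num_classes, 4)
    return first_loc * gate                     # element-wise product per class
```

With weights that make the second layer output ones, the module leaves the first localization output unchanged, which is a convenient sanity check.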

For those skilled in the art, the present invention is not limited to the details of the above embodiments, and it can be realized in other specific forms without departing from its spirit and scope. In addition, those skilled in the art may make various changes and modifications to the present invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the scope of protection of the present invention. Therefore, the appended claims are intended to be construed as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.

Claims

1. A method for object detection using a fully convolutional network based on deformable parts, characterized in that it mainly comprises: fully convolutional feature extraction (one); deformable part-based RoI pooling (two); and classification and localization prediction with deformable parts (three).

2. The fully convolutional network based on deformable parts according to claim 1, characterized in that each region is represented by parts, which are aligned by optimizing their positions, improving classification and localization prediction; the part-based representation is more invariant to local transformations, and the configuration of the parts provides important information about the geometry of the object.

3. The fully convolutional feature extractor (one) according to claim 1, characterized in that a fully convolutional network (FCN) is used as the backbone, which allows computation to be shared over the whole image and the RoI-wise layers to be reduced, and provides task-related feature maps from which the final predictions are obtained by simple pooling; in the deformable part-based fully convolutional network (DP-FCN), the positions within a region are inferred with a special RoI pooling; the fully convolutional structure is well suited to computing the responses of all parts for all classes, as one map per part and class, and a corresponding structure is used for localization; the complete representation of the whole image (the classification and localization maps of every part of every class) is obtained in a single forward pass and is shared between all regions of the same image.

4. The responses according to claim 3, characterized in that localizing parts in the feature maps places high demands on the resolution of those maps; FCNs contain only spatial layers and are therefore well suited to preserving spatial resolution, unlike networks that end in fully connected layers; specifically, if the stride is too large, the part deformations may be too coarse to describe the object correctly, and the stride is therefore reduced by using dilated convolutions in the last convolution block to skip pooling and by merging the last three stages.

5. The deformable part-based RoI pooling (two) according to claim 1, characterized in that its purpose is to divide the region proposal R into several parts and to relocate these parts to their best-matching configuration, so that each part models a discriminative element of the object and is aligned with the corresponding position in the image; this part-based deformation yields a representation that is more invariant to transformations of the object, because the parts are placed at their corresponding positions and their responses are more stable, and it is particularly useful for non-rigid objects.

6. The best-matching configuration according to claim 5, characterized in that the region is divided by a regular grid of size k × k, and each cell (i, j) is treated as an independent part R_i,j; because the number of parts (i.e., k²) is fixed as a hyperparameter, a complete detection heat map z_i,j,c can be computed for each part (i, j) of each class c, and each part only needs to be optimized in its corresponding map;

part deformation allows a part to move slightly around its reference position (its sub-region of the initial region): the best potential displacement is selected, and the value is pooled from the selected position; the pooled score P_c^R(i, j) of part (i, j) for class c is a trade-off between maximizing the score on the feature map and minimizing the displacement (dx, dy) from the reference position (see Fig. 2):

$$P_c^R(i,j) = \max_{dx,\,dy} \left[ \mathrm{Pool}_{(x,y) \in R_{i,j}} \, z_{i,j,c}(x+dx,\; y+dy) - \lambda^{def}\left(dx^2 + dy^2\right) \right] \qquad (1)$$

where λ^def denotes the strength of the regularization (which favors small deformations).

7. The deformation according to claim 6, characterized in that, during training, the deformations are optimized without part-level annotations; the displacements computed during the forward pass are stored and reused at the same positions during backpropagation; the deformations are computed independently for every part and every class, except for the background class, which has no discriminative elements; the same part displacements are used to pool values from the localization maps; λ^def directly determines the magnitude of the part displacements, and therefore also the deformation of the RoIs, by controlling the squared-distance regularization: increasing the regularization weight effectively reduces the part displacements; if λ^def = +∞, dx and dy tend to 0 and the pooling reduces to position-based region pooling at the reference positions:

$$P_c^R(i,j) = \mathrm{Pool}_{(x,y) \in R_{i,j}} \, z_{i,j,c}(x,\, y) \qquad (2)$$

on the other hand, if λ^def = 0, the regularization is removed and the parts can move freely; when λ^def is too low, the results degrade, which demonstrates the importance of the regularization; the results are stable over a wide range of λ^def values.

8. The classification and localization prediction with deformable parts (three) according to claim 1, characterized in that prediction is performed by two branches that classify and localize each region proposal; the classification branch consists only of average pooling followed by a softmax layer, which is the strategy used in R-FCN, while the deformations introduced by the deformable part-based RoI pooling bring greater invariance to object transformations and improve classification.

9. The localization according to claim 8, characterized in that a first localization output is computed from the corresponding features by average pooling; the positions of the parts, obtained from the alignment performed earlier, provide rich geometric information about the appearance of the object, such as its shape or pose, which can be used to improve localization accuracy.

10. The localization accuracy according to claim 9, characterized in that a new deformation-aware localization refinement module is introduced; for each region R, the feature vector of the displacements (dx, dy) of all parts for class c is extracted and used to refine the previous output for the same class; this vector is passed through two fully connected layers and is then multiplied element-wise with the first values to produce the final localization output for that class; because the refinement is mainly geometric, it is performed separately for each class, with the parameters shared between classes.