CN108830300A

CN108830300A - An object transfer method based on mixed supervised detection

Info

Publication number: CN108830300A
Application number: CN201810520073.1A
Authority: CN
Inventors: 夏春秋
Original assignee: Shenzhen Vision Technology Co Ltd
Current assignee: Shenzhen Vision Technology Co Ltd
Priority date: 2018-05-28
Filing date: 2018-05-28
Publication date: 2018-11-16

Abstract

The present invention proposes an object transfer method based on mixed supervised detection. Its main components are domain-invariant object learning and an object-aware detection model. The process is as follows.

- Domain-invariant object knowledge is first learned from strongly labeled and weakly labeled images.
- An object predictor is trained with the bounding-box annotations of the strongly labeled images.
- A domain classifier is trained with the unannotated boxes of the weakly labeled images, and domain invariance is achieved through the gradient of the domain classifier.
- The object knowledge is then used to roughly separate objects from distractors.
- Finally, a model based on multiple instance learning (MIL) further discriminates between objects and distractors.

The invention addresses two problems of earlier object detection methods: they require a large number of bounding-box annotations, and they tend to fail because they cannot distinguish objects from distractors. The object knowledge learned from the weakly labeled images can be transferred effectively. This reduces the need for bounding-box annotations while improving the ability to distinguish objects from distractors.

Description

An object transfer method based on mixed supervised detection

Technical field

The present invention relates to the field of computer vision, and in particular to an object transfer method based on mixed supervised detection.

Background art

Mixed supervised detection trains a detection network with two kinds of images: strongly labeled images, which carry bounding-box annotations, and weakly labeled images, which carry only image-level labels. The trained network can recognize objects of new categories and exclude distractors.

Mixed supervised detection has several applications:

- In transportation, it can quickly and automatically recognize license plate numbers and faces, and so record traffic violations efficiently.
- In security, it can identify offenders in incidents such as theft and raise an alarm automatically.
- In the military, it can quickly rule out environmental interference and identify camouflaged enemy forces.

However, earlier object detection methods require a large number of bounding-box annotations. They are also prone to detection failures because they cannot distinguish objects from distractors.

The present invention proposes an object transfer method based on mixed supervised detection. The method works in four stages:

- Domain-invariant object knowledge is first learned from strongly labeled and weakly labeled images.
- An object predictor is trained with the bounding-box annotations of the strongly labeled images. A domain classifier is trained with the unannotated boxes of the weakly labeled images, and domain invariance is achieved through the gradient of the domain classifier.
- The object knowledge is then used to roughly separate objects from distractors.
- Finally, a model based on multiple instance learning (MIL) further discriminates between objects and distractors.

The invention can effectively transfer the object knowledge learned from weakly labeled images. It reduces the need for bounding-box annotations and improves the ability to distinguish objects from distractors.

Summary of the invention

Earlier object detection methods have two problems: they require a large number of bounding-box annotations, and they are prone to detection failures because they cannot distinguish objects from distractors. To address these problems, the purpose of the present invention is to provide an object transfer method based on mixed supervised detection. The method proceeds as follows:

- It first learns domain-invariant object knowledge from strongly labeled and weakly labeled images.
- An object predictor is trained with the bounding-box annotations of the strongly labeled images. A domain classifier is trained with the unannotated boxes of the weakly labeled images, and domain invariance is achieved through the gradient of the domain classifier.
- The object knowledge is then used to roughly separate objects from distractors.
- Finally, a model based on multiple instance learning (MIL) further discriminates between objects and distractors.

To solve the above problems, the present invention provides an object transfer method based on mixed supervised detection, whose main components are:

(1) domain-invariant object learning;

(2) an object-aware detection model.

Here, domain-invariant object learning means modeling object knowledge with a method based on convolutional neural networks (CNN). The object knowledge model is trained directly with a bottom-up training scheme that uses region proposals generated by selective search.

Further, the object knowledge is learned from the bounding-box annotations of the strong categories. Learning is carried out as a binary classification task:

- regions that overlap a reference (ground-truth) box substantially are regarded as objects;
- regions with a small overlap are regarded as non-objects.
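The overlap criterion above is usually measured as intersection-over-union (IoU). The sketch below is a minimal illustration. It assumes axis-aligned boxes in (x1, y1, x2, y2) form with continuous coordinates, and the function names are hypothetical rather than taken from the source.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def max_iou(region, reference_boxes):
    """Largest overlap between one proposal and any reference box (0.0 if none)."""
    return max((iou(region, g) for g in reference_boxes), default=0.0)


# A proposal shifted by half its width overlaps the reference box only weakly:
# intersection 25, union 175, so IoU is about 0.143.
shifted_overlap = iou((0, 0, 10, 10), (5, 5, 15, 15))
```

A proposal's label as object or non-object is then decided by thresholding `max_iou` against all reference boxes in the image.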

Further, in the training scheme, an image and a set of region proposals are fed through several convolutional layers, a region-of-interest (RoI) pooling layer and fully connected (fc) layers. Each region r_i is finally mapped to a 256-dimensional vector f_i, which serves as the internal representation of the input region. f_i is then connected to two branches: the object predictor and the domain classifier.
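The two-branch layout can be sketched as a shared 256-dimensional region feature feeding two independent heads. This is a minimal sketch under several assumptions:

- the convolutional, RoI pooling and shared fc layers are omitted;
- each branch is reduced to a single fc layer with a sigmoid output;
- the class names `FcSigmoid` and `TwoBranchHead` are hypothetical.

```python
import math
import random

FEAT_DIM = 256  # length of the region feature f_i given in the text


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class FcSigmoid:
    """One fc layer followed by a sigmoid (hypothetical stand-in for a branch)."""

    def __init__(self, dim, rng):
        self.w = [rng.gauss(0.0, 0.01) for _ in range(dim)]
        self.b = 0.0

    def __call__(self, f):
        return sigmoid(sum(wi * fi for wi, fi in zip(self.w, f)) + self.b)


class TwoBranchHead:
    """Shared feature f_i -> (object probability from G_obj, domain probability from G_dom)."""

    def __init__(self, dim=FEAT_DIM, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.g_obj = FcSigmoid(dim, rng)  # object predictor branch
        self.g_dom = FcSigmoid(dim, rng)  # domain classifier branch

    def __call__(self, f):
        if len(f) != self.dim:
            raise ValueError(f"expected a {self.dim}-dimensional feature")
        return self.g_obj(f), self.g_dom(f)


head = TwoBranchHead()
p_obj, p_dom = head([0.0] * FEAT_DIM)  # a zero feature gives 0.5 from both branches
```

Both heads read the same f_i. Because of this sharing, gradients from the domain branch can shape the representation used by the object branch.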

Further, the object predictor, denoted G_obj, consists of fc layers and predicts from f_i whether region r_i is an object. Separating object regions from non-object regions requires bounding-box annotations in the image. The object predictor is therefore trained only on regions from the strongly labeled set, using the binary logistic loss

$$L_{obj} = -\sum_i \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]$$

with y_i = 1 for positive (object) examples and y_i = 0 for negative examples.

Here p_i is the posterior probability that region r_i is an object. Examples are assigned as follows:

- a region whose intersection-over-union (IoU) with a reference box is at least 0.5 is a positive example;
- a region whose maximum IoU lies in the interval [0.1, 0.5) is a negative example.

Negative examples far outnumber positive ones, so positives and negatives in each image are kept at a ratio of 1:3. The object predictor branch is then trained on the strongly labeled set.

The object knowledge learned on the strongly labeled set should also apply well to the weakly labeled set. Object learning is therefore treated as domain adaptation, with the strongly labeled set as the source domain and the weakly labeled set as the target domain.
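The region labeling and 1:3 balancing rules above can be sketched as follows, together with a standard binary logistic loss for the object predictor. This is an illustration under several assumptions:

- the per-image batch size of 64 and the fixed seed are hypothetical;
- the loss is summed rather than averaged over regions;
- regions whose maximum IoU is below 0.1 are ignored, since the text assigns them to neither class.

```python
import math
import random


def assign_label(max_overlap):
    """1 = object (positive), 0 = non-object (negative), None = unused."""
    if max_overlap >= 0.5:
        return 1
    if 0.1 <= max_overlap < 0.5:
        return 0
    return None  # below 0.1: neither positive nor negative in the text


def sample_balanced(max_overlaps, batch_size=64, seed=0):
    """Pick region indices; the ratio is exactly 1:3 when at least batch_size // 4 positives exist."""
    rng = random.Random(seed)
    pos = [i for i, o in enumerate(max_overlaps) if assign_label(o) == 1]
    neg = [i for i, o in enumerate(max_overlaps) if assign_label(o) == 0]
    n_pos = min(len(pos), batch_size // 4)     # at most one quarter positives
    n_neg = min(len(neg), batch_size - n_pos)  # fill the rest with negatives
    return rng.sample(pos, n_pos), rng.sample(neg, n_neg)


def binary_logistic_loss(probs, labels, eps=1e-12):
    """Sum over regions of -[y log p + (1 - y) log(1 - p)]."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total
```

With 20 candidate positives and 100 candidate negatives, a batch of 64 contains 16 positives and 48 negatives, which is the 1:3 ratio.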

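Further, the domain classifier, denoted G_dom, is connected to the feature f, and this connection is what achieves domain invariance. The domain classifier receives regions from both the strongly labeled set and the weakly labeled set and predicts which set each input region comes from. This is a binary classification task, and the objective it optimizes is

$$L_{dom} = -\sum_i \left[ d_i \log q_i + (1 - d_i) \log (1 - q_i) \right]$$

with d_i = 1 for positive examples and d_i = 0 for negative examples.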

Here q_i is the probability that region r_i belongs to a given one of the two sets. In this domain classification task, regions sampled from that set are positive examples, and regions sampled from the other set are negative examples.

During forward propagation, the domain classifier operates normally and computes L_dom. During back-propagation, the gradient of the domain classifier is first reversed (multiplied by -1) and then passed to f. Through this gradient reversal operator, the network in effect maximizes L_dom during training.

In each training iteration, the domain classifier is trained on:

- 64 random unlabeled regions from the weakly labeled set;
- 64 balanced regions from the strongly labeled set;
- 64 random regions from the strongly labeled set.

Each training mini-batch therefore contains 192 regions.
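The gradient reversal step and the 192-region batch can be illustrated with a one-dimensional toy model. The following choices are assumptions made for illustration only:

- the feature is a single number, and the domain classifier is q = sigmoid(w*f + b);
- regions from the weakly labeled set get domain label 1 and regions from the strongly labeled set get label 0 (the text does not fix which set is positive);
- all function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def domain_loss(f, d, w, b):
    """Binary logistic loss L_dom of a toy domain classifier q = sigmoid(w*f + b)."""
    q = sigmoid(w * f + b)
    return -(d * math.log(q) + (1 - d) * math.log(1.0 - q))


def dloss_dfeature(f, d, w, b):
    """Analytic gradient of L_dom with respect to the feature: (q - d) * w."""
    return (sigmoid(w * f + b) - d) * w


def reverse_gradient(grad, lam=1.0):
    """Gradient reversal: the forward pass is the identity; backward multiplies by -lam."""
    return -lam * grad


def feature_step(f, d, w, b, lr=0.1):
    """One gradient-descent step on f using the reversed domain gradient.

    Descending along the reversed gradient is ascent on L_dom, so the
    feature moves in the direction that makes the domain classifier worse.
    """
    return f - lr * reverse_gradient(dloss_dfeature(f, d, w, b))


def domain_batch(weak_unlabeled, strong_balanced, strong_random, n=64, seed=0):
    """64 + 64 + 64 = 192 (region, domain_label) pairs for one iteration."""
    rng = random.Random(seed)
    batch = [(r, 1) for r in rng.sample(weak_unlabeled, n)]
    batch += [(r, 0) for r in rng.sample(strong_balanced, n)]
    batch += [(r, 0) for r in rng.sample(strong_random, n)]
    return batch
```

In a full network, only the feature extractor receives the reversed gradient. The domain classifier's own weights are still updated to minimize L_dom, which produces the adversarial game that yields domain-invariant features.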

Here, the object-aware detection model uses the object knowledge learned by the object detector to separate objects from distractors in the weakly labeled images of the weakly labeled set. It works in four steps:

- For each weakly labeled image, the region proposals are first fed into the object model to obtain their object scores.
- The proposals are ranked by score. The m% with the highest scores are selected as object regions, and the remaining regions are treated as distractor regions.
- The difference between objects and distractors is further modeled with a multiple instance learning (MIL) method.
- Finally, the object-aware detection model is trained within the Fast Region-based Convolutional Network (Fast R-CNN) framework.
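The rough object/distractor split can be sketched as a ranking by object score. The text leaves m unspecified, so it is a free parameter here. Rounding the count up is an assumption, and the function name is hypothetical.

```python
import math


def split_proposals(scores, m):
    """Split proposals into object regions (top m% by object score) and distractor regions.

    scores: one object score per region proposal.
    m: percentage in [0, 100].
    Returns (object_indices, distractor_indices), each ordered by descending score.
    """
    if not 0 <= m <= 100:
        raise ValueError("m must be a percentage in [0, 100]")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    k = math.ceil(len(scores) * m / 100)
    return order[:k], order[k:]


objects, distractors = split_proposals([0.1, 0.9, 0.5, 0.7], m=50)
# objects == [1, 3], distractors == [2, 0]
```

A larger m keeps more true objects but also lets more distractors into the object regions. The MIL stage is what refines this coarse split.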

Further, the modeling works within the MIL framework. For each weakly labeled image in the weakly labeled set, an object bag is built from the object regions and a distractor bag from the distractor regions. The bags are labeled over the K object classes plus one distractor class: the distractor bag is labeled with the distractor class, and the object bag is labeled with the corresponding object class.
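One way to represent the two bags per weakly labeled image is sketched below. It assumes K object classes indexed 0 to K-1 and uses index K for the distractor class. This index choice, the `Bag` type and the function name are hypothetical, since the source does not give the label symbols.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Bag:
    regions: List[int]  # indices of the region proposals in the bag
    label: int          # 0..K-1 = object class, K = distractor class


def make_bags(object_regions, distractor_regions, image_class, num_classes) -> Tuple[Bag, Bag]:
    """Build the object bag and the distractor bag for one weakly labeled image."""
    if not 0 <= image_class < num_classes:
        raise ValueError("image_class must be one of the K object classes")
    object_bag = Bag(list(object_regions), image_class)
    distractor_bag = Bag(list(distractor_regions), num_classes)  # extra class K
    return object_bag, distractor_bag


object_bag, distractor_bag = make_bags([1, 3], [2, 0], image_class=4, num_classes=20)
```

The `object_regions` and `distractor_regions` arguments would come from the top-m% ranking by object score described above.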

Further, training proceeds as follows:

- The weakly labeled images and the region proposals generated by selective search are used as the network input. Each image contributes two bags: an object bag and a distractor bag.
- The network computes features for all region proposals simultaneously and maps each feature to a (K+1)-dimensional vector of class scores for that region. These region-level class scores are used directly to evaluate detection performance at test time.
- The network is trained end to end with a cross-entropy loss function. A weight-decay parameter on the weights w helps the object knowledge learned in the CNN transfer better to the weakly labeled images.

Because the regions within a bag carry no individual labels, the region-level class scores S^R must be aggregated into a bag-level class score S^B to train the model.
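The bag-level training objective can be sketched as a cross-entropy over the K+1 classes, summed over bags, plus a weight-decay penalty on the weights w. The text does not give the exact form of the weight-decay term. A plain L2 penalty with a hypothetical coefficient is assumed here, and the bag posteriors are taken as given.

```python
import math


def bag_cross_entropy(bag_probs, label, eps=1e-12):
    """-log p(label | bag) for one bag; bag_probs holds K+1 posterior probabilities."""
    return -math.log(max(bag_probs[label], eps))


def mil_objective(bags, weights, decay=5e-4):
    """Sum of bag cross-entropies plus 0.5 * decay * ||w||^2.

    bags: list of (bag_probs, label) pairs, e.g. one object bag and one
    distractor bag per weakly labeled image.
    decay: hypothetical weight-decay coefficient.
    """
    ce = sum(bag_cross_entropy(p, y) for p, y in bags)
    l2 = 0.5 * decay * sum(w * w for w in weights)
    return ce + l2
```

Only bag labels enter the loss. The region-level scores are never supervised directly, which is why they must first be pooled into bag scores.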

Further, consider how the region-level class scores are aggregated. In the conventional MIL setting, the max operation takes the highest region-level score as the bag-level score, but max treats only one region in each bag as the positive example. To relax this restriction, an "exp-sum-log" (log-sum-exp) operator is used as a soft approximation of the max operation. After the bag-level scores are obtained, a hyperbolic function is used to compute the posterior probability that bag B_i belongs to the k-th class.
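The aggregation step can be sketched as follows, under two assumptions:

- The "exp-sum-log" operator is written as the common log-sum-exp soft maximum with a temperature r. Here r is a hypothetical parameter, and r = 1 gives the plain operator.
- The text names the function that turns bag scores into posteriors only as a hyperbolic function, so the softmax used below is also an assumption.

```python
import math


def lse_pool(scores, r=1.0):
    """Soft maximum (1/r) * log(sum_j exp(r * s_j)), computed stably; requires r > 0."""
    m = max(scores)
    return m + math.log(sum(math.exp(r * (s - m)) for s in scores)) / r


def bag_scores(region_scores, r=1.0):
    """Aggregate an N x (K+1) matrix of region scores S^R into K+1 bag scores S^B."""
    num_classes = len(region_scores[0])
    return [lse_pool([row[k] for row in region_scores], r) for k in range(num_classes)]


def softmax(scores):
    """Normalize bag scores into posteriors over the K+1 classes (assumed form)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# The soft maximum always lies between max(scores) and max(scores) + log(N) / r.
region_scores = [[2.0, 0.5], [1.5, 0.0], [-1.0, 1.0]]  # 3 regions, K+1 = 2 classes
posterior = softmax(bag_scores(region_scores))
```

The gradient of the log-sum-exp with respect to each region score is that region's softmax weight. Every region in the bag therefore receives some gradient, whereas max passes gradient to a single region.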

Description of the drawings

Fig. 1 is a system flow chart of the object transfer method based on mixed supervised detection of the present invention.

Fig. 2 is a flow chart of the object-aware detection model of the object transfer method based on mixed supervised detection of the present invention.

Detailed description of the embodiments

It should be noted that, where no conflict arises, the embodiments of the present application and the features in those embodiments may be combined with one another. The invention is described in further detail below with reference to the drawings and specific embodiments.

Fig. 1 is a system flow chart of the object transfer method based on mixed supervised detection of the present invention. The method mainly comprises domain-invariant object learning and the object-aware detection model.

The object-aware detection model uses the object knowledge learned by the object detector to separate objects from distractors in the weakly labeled images of the weakly labeled set. It works in four steps:

- For each weakly labeled image, the region proposals are first fed into the object model to obtain their object scores.
- The proposals are ranked by score. The m% with the highest scores are selected as object regions, and the remaining regions are treated as distractor regions.
- The difference between objects and distractors is further modeled with a multiple instance learning (MIL) method.
- Finally, the object-aware detection model is trained within the Fast Region-based Convolutional Network (Fast R-CNN) framework.

It will be apparent to those skilled in the art that the present invention is not limited to the details of the above embodiments. It can be realized in other specific forms without departing from its spirit and scope. In addition, those skilled in the art may make various modifications and variations to the present invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.

Claims

1. An object transfer method based on mixed supervised detection, characterized in that it mainly comprises domain-invariant object learning (1) and an object-aware detection model (2).

2. The domain-invariant object learning (1) according to claim 1, characterized in that object knowledge is modeled with a method based on convolutional neural networks (CNN), and the object knowledge model is trained directly with a bottom-up training scheme that uses region proposals generated by selective search.

3. The object knowledge according to claim 2, characterized in that it is learned from the bounding-box annotations of the strong categories, and learning is carried out as a binary classification task: regions that overlap a reference box substantially are regarded as objects, and regions with a small overlap are regarded as non-objects.

4. The training scheme according to claim 2, characterized in that, during training, an image and a set of region proposals are fed through several convolutional layers, a region-of-interest (RoI) pooling layer and fully connected (fc) layers; each region r_i is finally mapped to a 256-dimensional vector f_i serving as the internal representation of the input region; and f_i is then connected to two branches, the object predictor and the domain classifier.

5. The object predictor according to claim 4, characterized in that it is denoted G_obj, consists of fc layers, and predicts from f_i whether region r_i is an object; because separating object regions from non-object regions requires bounding-box annotations in the image, only regions from the strongly labeled set are used to train the object predictor, with the binary logistic loss

$$L_{obj} = -\sum_i \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]$$

with y_i = 1 for positive (object) examples and y_i = 0 for negative examples;

where p_i is the posterior probability that region r_i is an object; a region whose intersection-over-union (IoU) with a reference box is at least 0.5 is a positive example, and a region whose maximum IoU lies in the interval [0.1, 0.5) is a negative example; because negative examples far outnumber positive ones, positives and negatives in each image are kept at a ratio of 1:3; the object predictor branch is then trained on the strongly labeled set; and so that the object knowledge learned on the strongly labeled set can be applied well to the weakly labeled set, object learning is treated as domain adaptation, with the strongly labeled set and the weakly labeled set regarded as the source domain and the target domain, respectively.

6. The domain classifier according to claim 4, characterized in that it is denoted G_dom and is connected to the feature f, this connection achieving domain invariance; the domain classifier receives regions from the strongly labeled set and the weakly labeled set and predicts the source of each input region, which is a binary classification task, with the objective

$$L_{dom} = -\sum_i \left[ d_i \log q_i + (1 - d_i) \log (1 - q_i) \right]$$

with d_i = 1 for positive examples and d_i = 0 for negative examples;

where q_i is the probability that region r_i belongs to a given one of the two sets; in this domain classification task, regions sampled from that set are positive examples, and regions sampled from the other set are negative examples;

during forward propagation, the domain classifier operates normally and computes L_dom; during back-propagation, the gradient of the domain classifier is first reversed (multiplied by -1) and then passed to f, and through this gradient reversal operator the network in effect maximizes L_dom during training; in each training iteration, 64 random unlabeled regions are drawn from the weakly labeled set, and 64 balanced regions and 64 random regions are drawn from the strongly labeled set to train the domain classifier, so each training mini-batch contains 192 regions.

7. The object-aware detection model (2) according to claim 1, characterized in that the object knowledge learned by the object detector is used to separate objects from distractors in the weakly labeled images of the weakly labeled set; for each weakly labeled image, its region proposals are first fed into the object model to obtain their object scores and are then ranked by score; the m% of proposals with the highest scores are selected as object regions, and the remaining regions are treated as distractor regions; the difference between objects and distractors is then further modeled with a multiple instance learning (MIL) method; and finally the object-aware detection model is trained within the Fast Region-based Convolutional Network (Fast R-CNN) framework.

8. The modeling according to claim 6, characterized in that, within the MIL framework, for each weakly labeled image in the weakly labeled set an object bag is built from the object regions and a distractor bag is built from the distractor regions; the bags are labeled over the K object classes plus a distractor class, the distractor bag being labeled with the distractor class and the object bag with the corresponding object class.

9. The training according to claim 6, characterized in that, during training, the weakly labeled images and the region proposals generated by selective search are used as the network input, each image contributing two bags, an object bag and a distractor bag; the network computes features for all region proposals simultaneously and maps each feature to a (K+1)-dimensional vector of class scores for that region, these region-level class scores being used directly to evaluate detection performance at test time; the network is then trained end to end with a cross-entropy loss function, and a weight-decay parameter on the weights w helps the object knowledge learned in the CNN transfer better to the weakly labeled images; and because the regions within a bag carry no individual labels, the region-level class scores S^R must be aggregated into a bag-level class score S^B to train the model.

10. The region-level class scores according to claim 9, characterized in that, whereas in the conventional MIL setting the max operation takes the highest region-level score as the bag-level score and thereby treats only one region in each bag as the positive example, an "exp-sum-log" (log-sum-exp) operator is used as a soft approximation of the max operation to relax this restriction; and after the bag-level scores are obtained, a hyperbolic function is used to compute the posterior probability that bag B_i belongs to the k-th class.