CN108830300A - An object transfer method based on mixed supervised detection


Info

Publication number
CN108830300A
Authority
CN
China
Legal status
Withdrawn
Application number
CN201810520073.1A
Other languages
Chinese (zh)
Inventor
夏春秋 (Xia Chunqiu)
Current Assignee
Shenzhen Vision Technology Co Ltd
Original Assignee
Shenzhen Vision Technology Co Ltd
Priority date
2018-05-28
Filing date
2018-05-28
Publication date
2018-11-16
Application filed by Shenzhen Vision Technology Co Ltd
Priority to CN201810520073.1A
Publication of CN108830300A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The present invention proposes an object transfer method based on mixed supervised detection. Its main components are domain-invariant objectness learning and an object-aware detection model. First, domain-invariant objectness knowledge is learned from strongly labeled and weakly labeled images: the bounding-box annotations of the strongly labeled images are used to train an objectness predictor, the unannotated regions of the weakly labeled images are used to train a domain classifier, and domain invariance is achieved through the gradient of the domain classifier. The objectness knowledge is then used to roughly separate objects from distractors, and finally a model based on multiple-instance learning (MIL) discriminates further between objects and distractors. The invention addresses two problems of earlier object detection methods: they require a large number of bounding-box annotations, and they tend to fail because they cannot distinguish objects from distractors. The method effectively transfers the learned objectness knowledge to the weakly labeled images, reduces the need for bounding-box annotation, and improves the ability to distinguish objects from distractors.

Description

An object transfer method based on mixed supervised detection
Technical field
The present invention relates to the field of computer vision, and in particular to an object transfer method based on mixed supervised detection.
Background
Mixed supervised detection trains a detection network on strongly labeled images, which carry bounding-box annotations, together with weakly labeled images, which carry only image-level labels, so that the network can recognize objects of new categories and reject distractors. The technique has applications in several fields. In traffic management, it can quickly and automatically recognize license plate numbers and faces, making it efficient to record traffic violations. In security, it can identify offenders in incidents such as theft and raise an alarm automatically. In the military field, it can quickly filter out environmental interference and identify camouflaged enemy forces. However, earlier object detection methods require a large number of bounding-box annotations and are prone to detection failure because they cannot reliably distinguish objects from distractors.
The present invention proposes an object transfer method based on mixed supervised detection. First, domain-invariant objectness knowledge is learned from strongly and weakly labeled images: the bounding-box annotations of the strongly labeled images train an objectness predictor, the unannotated regions of the weakly labeled images train a domain classifier, and domain invariance is achieved through the gradient of the domain classifier. The objectness knowledge is then used to roughly separate objects from distractors, and finally a model based on multiple-instance learning (MIL) discriminates further between objects and distractors. The invention effectively transfers the learned objectness knowledge to the weakly labeled images, reduces the need for bounding-box annotation, and improves the ability to distinguish objects from distractors.
Summary of the invention
Earlier object detection methods need a large number of bounding-box annotations and tend to fail because they cannot distinguish objects from distractors. To address these problems, the present invention provides an object transfer method based on mixed supervised detection. The method first learns domain-invariant objectness knowledge from strongly and weakly labeled images, training an objectness predictor with the bounding-box annotations of the strongly labeled images and a domain classifier with the unannotated regions of the weakly labeled images, and achieving domain invariance through the gradient of the domain classifier. It then uses the objectness knowledge to roughly separate objects from distractors, and finally applies a model based on multiple-instance learning (MIL) to discriminate further between them.
To solve the above problems, the present invention provides an object transfer method based on mixed supervised detection, which mainly comprises:
(1) domain-invariant objectness learning;
(2) an object-aware detection model.
Domain-invariant objectness learning models objectness knowledge with a method based on convolutional neural networks (CNNs), and the objectness model is trained directly on the bottom-up region proposals generated by selective search.
The objectness knowledge is learned from the bounding-box annotations of the strong categories, and learning it is treated as a binary classification task: regions that overlap heavily with a ground-truth box are considered objects, and regions with little overlap are considered non-objects.
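The overlap test above can be made concrete with intersection-over-union (IoU). The following is a minimal Python sketch, not the patent's implementation. It assumes boxes are given as (x1, y1, x2, y2) corner coordinates and applies the IoU thresholds stated for the objectness predictor (at least 0.5 for objects, [0.1, 0.5) for non-objects). Regions whose maximum IoU falls below 0.1 are left unlabeled here, which is an assumption because the text does not say how they are treated.

```python
from typing import Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); assumed box convention


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def objectness_label(region: Box, gt_boxes: Sequence[Box]) -> Optional[int]:
    """1 = object, 0 = non-object, None = not used for training (assumption)."""
    max_iou = max((iou(region, g) for g in gt_boxes), default=0.0)
    if max_iou >= 0.5:
        return 1
    if 0.1 <= max_iou < 0.5:
        return 0
    return None
```

For example, a 10x10 box shifted halfway across an identical ground-truth box has IoU 50 / 150 = 1/3 and is therefore labeled as a non-object.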
During training, an image and a set of region proposals are fed through several convolutional layers, a region-of-interest (RoI) pooling layer, and fully connected (fc) layers. Each region $r_i$ is finally mapped to a 256-dimensional vector $f_i$, which serves as the internal representation of the input region. $f_i$ is then connected to two branches: the objectness predictor and the domain classifier.
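To illustrate how one shared representation $f_i$ feeds both branches, here is a toy pure-Python forward pass. It is a sketch under stated assumptions: the convolutional and RoI pooling stages are replaced by a precomputed RoI feature vector, a single fc layer with ReLU stands in for the shared fc layers, each branch is one linear unit followed by a sigmoid, and the weights are random. The class `TwoBranchHead` and all helper functions are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

FEATURE_DIM = 256  # size of f_i given in the text


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Fully connected layer: y = W x + b."""
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(n_out: int, n_in: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    """Uniform random weights scaled by 1/sqrt(n_in), zero biases (illustrative only)."""
    scale = 1.0 / math.sqrt(n_in)
    w = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class TwoBranchHead:
    """Hypothetical head: shared fc -> f_i, then G_obj and G_dom as one-unit sigmoid classifiers."""

    def __init__(self, roi_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.shared = init_layer(FEATURE_DIM, roi_dim, rng)
        self.g_obj = init_layer(1, FEATURE_DIM, rng)
        self.g_dom = init_layer(1, FEATURE_DIM, rng)

    def forward(self, roi_feature: Vector) -> Tuple[Vector, float, float]:
        f = relu(linear(*self.shared, roi_feature))  # internal representation f_i
        p_obj = sigmoid(linear(*self.g_obj, f)[0])    # objectness branch G_obj
        p_dom = sigmoid(linear(*self.g_dom, f)[0])    # domain branch G_dom
        return f, p_obj, p_dom
```

Sharing `f` between the two heads is what later lets the domain classifier's gradient shape the features that the objectness predictor also relies on.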
The objectness predictor, denoted $G_{obj}$, consists of fc layers and predicts from $f_i$ whether region $r_i$ is an object. Because separating object regions from non-object regions requires bounding-box annotations, only regions from the strongly labeled set are used to train the objectness predictor, with a binary logistic loss over the predicted posterior probability that each region $r_i$ is an object.
Regions whose intersection-over-union (IoU) with a ground-truth box is at least 0.5 are positive examples, and regions whose maximum IoU lies in the interval [0.1, 0.5) are negative examples. Because negative examples far outnumber positive ones, the positive and negative examples in each image are balanced at a ratio of 1:3. The objectness predictor branch is then trained on the strongly labeled set. So that the objectness learned on the strongly labeled set carries over well to the weakly labeled set, objectness learning is treated as a domain adaptation problem, with the strongly labeled set and the weakly labeled set serving as the source domain and the target domain, respectively.
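The sampling and loss described above might look roughly like this. This sketch assumes a per-image mini-batch of 64 regions, fills it with at most 25% positives (the 1:3 ratio) in the style of Fast R-CNN, and uses the standard mean binary cross-entropy as the binary logistic loss. The batch size, the fill rule, and the clipping constant `eps` are assumptions, not values from the text.

```python
import math
import random
from typing import List, Sequence


def sample_balanced(labels: Sequence[int], batch_size: int, rng: random.Random) -> List[int]:
    """Pick region indices with at most batch_size // 4 positives (1:3 when enough exist).

    labels: 1 = object, 0 = non-object; ignored regions should be filtered out beforehand.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_pos = min(len(pos), batch_size // 4)
    n_neg = min(len(neg), batch_size - n_pos)  # fill the rest with negatives (assumption)
    return rng.sample(pos, n_pos) + rng.sample(neg, n_neg)


def binary_logistic_loss(probs: Sequence[float], labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted objectness probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)
```

With 20 positive and 100 negative candidates and a batch of 64, this draws 16 positives and 48 negatives, exactly 1:3.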
The domain classifier, denoted $G_{dom}$, is connected to the feature $f$ so that domain invariance can be achieved. It receives regions from both the strongly labeled set and the weakly labeled set and predicts which set each input region comes from. This is a binary classification task with objective $L_{dom}$,
defined over the predicted probability that region $r_i$ belongs to one of the two sets; in this domain classification task, regions sampled from that set are positive examples and regions sampled from the other set are negative examples.
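As a hedged illustration, the standard binary cross-entropy form that such a domain classifier would typically minimize is given below, together with the adversarial min-max objective that gradient reversal induces. The symbols $p_i^{dom}$ (predicted probability that $r_i$ belongs to the positive set), $d_i$ (1 for a positive region, 0 otherwise), $N$ (number of regions in the mini-batch), and $\theta_f$ (parameters that produce $f$) are assumptions introduced for illustration, not notation from the text.

```latex
L_{dom} = -\frac{1}{N}\sum_{i=1}^{N}\Big[d_i \log p_i^{dom} + (1 - d_i)\log\big(1 - p_i^{dom}\big)\Big]

\min_{G_{dom}} \; \max_{\theta_f} \; L_{dom}
```

The domain classifier descends $L_{dom}$ while the feature parameters ascend it, so at equilibrium $f$ carries as little information as possible about which set a region came from.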
In the forward pass, the domain classifier runs as usual and computes $L_{dom}$. In the backward pass, the gradient from the domain classifier is first reversed (multiplied by -1) and then passed to $f$. Through this gradient reversal operator, the network in effect maximizes $L_{dom}$ with respect to $f$ during training, which pushes $f$ toward features that do not reveal the domain. In each training iteration, 64 random unlabeled regions are drawn from the weakly labeled set, and 64 balanced regions and 64 random regions are drawn from the strongly labeled set to train the domain classifier, so each training mini-batch contains 192 regions.
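The gradient reversal step can be checked on a one-dimensional toy. The sketch below is an assumption-laden illustration, not the patent's code: the 256-dimensional $f$ is reduced to a scalar, the domain classifier is a single logistic unit with weight `w` and bias `b`, gradients are derived by hand, and `lam` is a hypothetical scaling factor (the text uses plain -1, i.e. `lam = 1`). The batch helper reproduces the 64 + 64 + 64 split; which set receives domain label 1 is an assumption, since the text does not fix it.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def domain_loss(f: float, w: float, b: float, d: int) -> float:
    """BCE of a one-unit domain classifier p = sigmoid(w * f + b) for domain label d."""
    p = sigmoid(w * f + b)
    return -(d * math.log(p) + (1 - d) * math.log(1.0 - p))


def domain_grads(f: float, w: float, b: float, d: int) -> Tuple[float, float, float]:
    """Analytic gradients (dL/dw, dL/db, dL/df) of domain_loss."""
    g = sigmoid(w * f + b) - d  # dL/dz for sigmoid + BCE
    return g * f, g, g * w


def reverse_gradient(grad: float, lam: float = 1.0) -> float:
    """Gradient reversal: identity in the forward pass, multiply by -lam on the way back."""
    return -lam * grad


def train_step(f: float, w: float, b: float, d: int,
               lr: float = 0.1, lam: float = 1.0) -> Tuple[float, float, float]:
    gw, gb, gf = domain_grads(f, w, b, d)
    w_new = w - lr * gw                          # classifier: descent on L_dom
    b_new = b - lr * gb
    f_new = f - lr * reverse_gradient(gf, lam)   # feature: reversed gradient, ascent on L_dom
    return f_new, w_new, b_new


def compose_domain_batch(weak: Sequence, strong_balanced: Sequence,
                         strong_random: Sequence) -> List[Tuple[object, int]]:
    """64 + 64 + 64 = 192 regions; labeling strong regions 1 is a hypothetical choice."""
    for part in (weak, strong_balanced, strong_random):
        if len(part) != 64:
            raise ValueError("each part is expected to hold 64 regions")
    return [(r, 0) for r in weak] + [(r, 1) for r in list(strong_balanced) + list(strong_random)]
```

In one step, the classifier's own update lowers $L_{dom}$ while the reversed update to the feature raises it, which is the tug-of-war that yields domain-invariant features.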
The object-aware detection model uses the objectness knowledge learned by the object detector to separate objects from distractors in the weakly labeled images of the weakly labeled set. For each weakly labeled image, its region proposals are first fed into the objectness model to obtain their objectness scores and are then ranked by score; the top m% of proposals are selected as object regions, and the remaining regions serve as distractor regions. A multiple-instance learning (MIL) method then further models the difference between objects and distractors, and finally the object-aware detection model is trained within the Fast R-CNN (fast region-based convolutional network) framework.
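The rough object/distractor split is a ranking followed by a cut. A minimal sketch, assuming objectness scores are already computed per proposal; the text does not give a value for m, so `m_percent` is a free parameter here, and rounding the cut up with `math.ceil` is an assumption.

```python
import math
from typing import List, Sequence, Tuple


def split_by_objectness(scores: Sequence[float], m_percent: float) -> Tuple[List[int], List[int]]:
    """Rank proposals by objectness; top m% become object regions, the rest distractor regions."""
    if not 0.0 <= m_percent <= 100.0:
        raise ValueError("m_percent must lie in [0, 100]")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_obj = math.ceil(len(scores) * m_percent / 100.0)  # rounding up is an assumption
    return order[:n_obj], order[n_obj:]
```

For example, `split_by_objectness([0.9, 0.1, 0.5, 0.7, 0.3], m_percent=40)` returns `([0, 3], [2, 4, 1])`: the two highest-scoring proposals are treated as objects and the other three as distractors.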
Within the MIL framework, for each weakly labeled image in the weakly labeled set, an object bag is built from the object regions and a distractor bag from the distractor regions. The bag labels range over the K object classes plus one background label: the distractor bag is assigned the background label, and the object bag is assigned the corresponding object class.
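A minimal sketch of the bag construction, assuming each weakly labeled image carries a single image-level class in 1..K and that index 0 is used for the extra background (distractor) label. Both the single-class simplification and the index 0 are assumptions; the `Bag` type and `build_bags` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence

BACKGROUND_LABEL = 0  # hypothetical index for the extra (distractor/background) label


@dataclass
class Bag:
    region_ids: List[int]
    label: int  # BACKGROUND_LABEL for a distractor bag, 1..K for an object class


def build_bags(object_ids: Sequence[int], distractor_ids: Sequence[int],
               image_class: int, num_classes: int) -> List[Bag]:
    """Build the object bag and the distractor bag for one weakly labeled image.

    Assumes one image-level class in 1..num_classes per image (a simplification).
    """
    if not 1 <= image_class <= num_classes:
        raise ValueError("image_class must be in 1..num_classes")
    return [Bag(list(object_ids), image_class), Bag(list(distractor_ids), BACKGROUND_LABEL)]
```

Fed with the output of the ranking step, each image contributes exactly two training examples at the bag level, one positive for its class and one for background.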
During training, the weakly labeled images and the region proposals generated by selective search are used as the network input (each image contributes two bags: an object bag and a distractor bag). The network computes features for all region proposals simultaneously and maps each feature to a (K+1)-dimensional vector $S_R$ of class scores for the region; these region-level class scores are used directly to evaluate detection performance at test time. The network is trained end to end with a cross-entropy loss, and a weight-decay parameter on the weights $w$ helps the objectness knowledge learned in the CNN apply better to the weakly labeled images. Because the regions within a bag carry no individual labels, the region-level class scores $S_R$ must be aggregated into a bag-level class score $S_B$ to train the model.
For the region-level class scores, a conventional MIL setting uses the max operation to take the highest region-level score as the bag-level score. However, max uses only one region per bag as the positive example. To relax this restriction, a log-sum-exp ("exp-sum-log") operator is used as a soft approximation of max. Once the bag-level score is obtained, a hyperbolic function is used to compute the posterior probability that bag $B_i$ belongs to the k-th class.
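A minimal sketch of this aggregation, assuming the common log-sum-exp form $S_B[k] = \frac{1}{r}\log\sum_i \exp(r\,S_R[i][k])$ with a hypothetical sharpness parameter `r`. The text says a hyperbolic function produces the bag posterior; this sketch substitutes a softmax over the K+1 bag scores, which is an assumption, and then applies the cross-entropy loss mentioned for end-to-end training.

```python
import math
from typing import List, Sequence


def log_sum_exp_pool(values: Sequence[float], r: float = 5.0) -> float:
    """Smooth max: (1/r) * log(sum(exp(r * v))); tends to max(values) as r grows."""
    m = max(values)  # shift by the max for numerical stability
    return m + math.log(sum(math.exp(r * (v - m)) for v in values)) / r


def bag_scores(region_scores: Sequence[Sequence[float]], r: float = 5.0) -> List[float]:
    """Pool each of the K+1 class columns of S_R over the regions of one bag to get S_B."""
    num_classes = len(region_scores[0])
    return [log_sum_exp_pool([s[k] for s in region_scores], r) for k in range(num_classes)]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def bag_cross_entropy(region_scores: Sequence[Sequence[float]], label: int, r: float = 5.0) -> float:
    """Cross-entropy of the (softmax-substituted) bag posterior against the bag label."""
    probs = softmax(bag_scores(region_scores, r))
    return -math.log(probs[label])
```

A larger `r` makes the pooling behave more like max, so only the top region in a bag matters; a smaller `r` lets several high-scoring regions contribute, which is the relaxation the text motivates. The pooled value always lies between the max and the max plus log(n)/r for n regions.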
Brief description of the drawings
Fig. 1 is a system flowchart of the object transfer method based on mixed supervised detection according to the present invention.
Fig. 2 is a flowchart of the object-aware detection model of the object transfer method based on mixed supervised detection according to the present invention.
Detailed description of the embodiments
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another. The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a system flowchart of the object transfer method based on mixed supervised detection according to the present invention. The method mainly comprises domain-invariant objectness learning and an object-aware detection model.
Fig. 2 is a flowchart of the object-aware detection model of the object transfer method based on mixed supervised detection according to the present invention.
For those skilled in the art, the present invention is not limited to the details of the above embodiments and can be implemented in other specific forms without departing from its spirit and scope. In addition, those skilled in the art may make various modifications and variations to the present invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications falling within the scope of the invention.

Claims (10)

1. An object transfer method based on mixed supervised detection, characterized in that it mainly comprises domain-invariant objectness learning (1) and an object-aware detection model (2).
2. The domain-invariant objectness learning (1) according to claim 1, characterized in that objectness knowledge is modeled with a method based on convolutional neural networks (CNNs), and the objectness model is trained directly on the bottom-up region proposals generated by selective search.
3. The objectness knowledge according to claim 2, characterized in that it is learned from the bounding-box annotations of the strong categories as a binary classification task: regions that overlap heavily with a ground-truth box are considered objects, and regions with little overlap are considered non-objects.
4. The training scheme according to claim 2, characterized in that, during training, an image and a set of region proposals are fed into several convolutional layers, a region-of-interest (RoI) pooling layer, and fully connected (fc) layers; each region $r_i$ is finally mapped to a 256-dimensional vector $f_i$ as the internal representation of the input region, and $f_i$ is then connected to two branches: the objectness predictor and the domain classifier.
5. The objectness predictor according to claim 4, characterized in that it is denoted $G_{obj}$, consists of fc layers, and predicts from $f_i$ whether region $r_i$ is an object; because separating object regions from non-object regions requires bounding-box annotations, only regions from the strongly labeled set are used to train the objectness predictor, with a binary logistic loss over the posterior probability that region $r_i$ is an object;
regions whose intersection-over-union (IoU) with a ground-truth box is at least 0.5 are positive examples, and regions whose maximum IoU lies in [0.1, 0.5) are negative examples; because negative examples far outnumber positive ones, positive and negative examples in each image are balanced at a ratio of 1:3; the objectness predictor branch is then trained on the strongly labeled set, and, so that the objectness learned on the strongly labeled set applies well to the weakly labeled set, objectness learning is treated as domain adaptation, with the strongly labeled set and the weakly labeled set serving as the source domain and the target domain, respectively.
6. The domain classifier according to claim 4, characterized in that it is denoted $G_{dom}$ and is connected to the feature $f$ to achieve domain invariance; it receives regions from the strongly labeled set and the weakly labeled set and predicts which set each input region comes from, a binary classification task with objective $L_{dom}$,
defined over the probability that region $r_i$ belongs to one of the two sets, where regions sampled from that set are positive examples and regions sampled from the other set are negative examples;
in the forward pass, the domain classifier runs as usual and computes $L_{dom}$; in the backward pass, its gradient is first reversed (multiplied by -1) and then passed to $f$, so that through this gradient reversal operator the network in effect maximizes $L_{dom}$ during training; in each training iteration, 64 random unlabeled regions from the weakly labeled set, together with 64 balanced regions and 64 random regions from the strongly labeled set, are drawn to train the domain classifier, so each training mini-batch contains 192 regions.
7. The object-aware detection model (2) according to claim 1, characterized in that the objectness knowledge learned by the object detector is used to separate objects from distractors in the weakly labeled images of the weakly labeled set; for each weakly labeled image, its region proposals are first fed into the objectness model to obtain objectness scores and are then ranked by score, the top m% of proposals are selected as object regions, and the remaining regions serve as distractor regions; a multiple-instance learning (MIL) method then further models the difference between objects and distractors, and finally the object-aware detection model is trained within the Fast R-CNN framework.
8. The modeling according to claim 6, characterized in that, within the MIL framework, for each weakly labeled image in the weakly labeled set, an object bag is built from the object regions and a distractor bag from the distractor regions; the bag labels range over the K object classes plus one background label, the distractor bag being assigned the background label and the object bag being assigned the corresponding object class.
9. The training according to claim 6, characterized in that, during training, the weakly labeled images and the region proposals generated by selective search are used as the network input (each image contributing two bags: an object bag and a distractor bag); the network computes features for all region proposals simultaneously and maps each feature to a (K+1)-dimensional vector $S_R$ of region class scores, which are used directly to evaluate detection performance at test time; finally, the network is trained end to end with a cross-entropy loss, and a weight-decay parameter on the weights $w$ helps the objectness knowledge learned in the CNN apply better to the weakly labeled images; because the regions within a bag carry no individual labels, the region-level class scores $S_R$ must be aggregated into a bag-level class score $S_B$ to train the model.
10. The region-level class scores according to claim 9, characterized in that, whereas a conventional MIL setting uses the max operation to take the highest region-level score as the bag-level score, max uses only one region per bag as the positive example; to relax this restriction, a log-sum-exp ("exp-sum-log") operator is used as a soft approximation of max; after the bag-level score is obtained, a hyperbolic function is used to compute the posterior probability that bag $B_i$ belongs to the k-th class.
CN201810520073.1A 2018-05-28 2018-05-28 An object transfer method based on mixed supervised detection Withdrawn CN108830300A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810520073.1A CN108830300A (en) 2018-05-28 2018-05-28 An object transfer method based on mixed supervised detection

Publications (1)

Publication Number Publication Date
CN108830300A 2018-11-16

Family

ID=64145759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810520073.1A Withdrawn CN108830300A (en) 2018-05-28 2018-05-28 An object transfer method based on mixed supervised detection

Country Status (1)

Country Link
CN (1) CN108830300A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104063719A (en) * 2014-06-27 2014-09-24 深圳市赛为智能股份有限公司 Method and device for pedestrian detection based on depth convolutional network
CN104573731A (en) * 2015-02-06 2015-04-29 厦门大学 Rapid target detection method based on convolutional neural network

Non-Patent Citations (1)

Title
Yan Li et al.: "Mixed Supervised Object Detection with Robust Objectness Transfer", arXiv:1802.09778v2 [cs.CV] *

Cited By (4)

Publication number Priority date Publication date Assignee Title
CN113196291A (en) * 2019-01-23 2021-07-30 动态Ad有限责任公司 Automatic selection of data samples for annotation
CN110738263A (en) * 2019-10-17 2020-01-31 腾讯科技(深圳)有限公司 image recognition model training method, image recognition method and device
CN110738263B (en) * 2019-10-17 2020-12-29 腾讯科技(深圳)有限公司 Image recognition model training method, image recognition method and image recognition device
US11960571B2 (en) 2019-10-17 2024-04-16 Tencent Technology (Shenzhen) Company Limited Method and apparatus for training image recognition model, and image recognition method and apparatus

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20181116