CN108830300A - A kind of object transmission method based on mixing supervisory detection - Google Patents
An object transfer method based on mixed supervised detection
- Publication number
- CN108830300A
- Authority
- CN
- China
- Prior art keywords
- region
- domain
- score
- training
- packet
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
Abstract
The present invention proposes an object transfer method based on mixed supervised detection, which mainly comprises domain-invariant objectness learning and an objectness-aware detection model. The process is as follows: first, domain-invariant objectness knowledge is learned from strongly labeled images and weakly labeled images; the objectness predictor is trained with the annotated boxes of the strongly labeled images, the domain classifier is trained with the unannotated boxes of the weakly labeled images, and domain invariance is achieved through the gradient of the domain classifier. The objectness knowledge is then used to coarsely separate objects from distractors. Finally, a model based on multiple instance learning (MIL) further distinguishes objects from distractors. The invention addresses two problems of earlier object detection methods: they require a large number of bounding-box annotations, and they tend to fail because objects cannot be distinguished from distractors. The invention can effectively transfer the objectness knowledge learned from weakly labeled images, reduces the need for bounding-box annotations, and improves the ability to tell objects from distractors.
Description
Technical field
The present invention relates to the field of computer vision, and more particularly to an object transfer method based on mixed supervised detection.
Background
Mixed supervised detection refers to training a detection network with strongly labeled images that carry bounding-box annotations and weakly labeled images that carry only image-level labels, so that the network can recognize objects of new categories and reject distractors. The technique can be applied in traffic management, where rapid automatic recognition of license plates and faces allows traffic violations to be recorded efficiently; in security, where it can identify offenders in criminal incidents such as theft and raise an alarm automatically; and in the military field, where it can quickly filter out environmental interference and identify camouflaged enemy forces. However, earlier object detection methods require a large number of bounding-box annotations and tend to fail because they cannot distinguish objects from distractors.
The present invention proposes an object transfer method based on mixed supervised detection. Domain-invariant objectness knowledge is first learned from strongly labeled images and weakly labeled images; the objectness predictor is trained with the annotated boxes of the strongly labeled images, the domain classifier is trained with the unannotated boxes of the weakly labeled images, and domain invariance is achieved through the gradient of the domain classifier. The objectness knowledge is then used to coarsely separate objects from distractors, and finally a model based on multiple instance learning (MIL) further distinguishes objects from distractors. The invention can effectively transfer the objectness knowledge learned from weakly labeled images, reduces the need for bounding-box annotations, and improves the ability to tell objects from distractors.
Summary of the invention
Earlier object detection methods require a large number of bounding-box annotations and tend to fail because objects cannot be distinguished from distractors. To address these problems, the purpose of the present invention is to provide an object transfer method based on mixed supervised detection. Domain-invariant objectness knowledge is first learned from strongly labeled images and weakly labeled images; the objectness predictor is trained with the annotated boxes of the strongly labeled images, the domain classifier is trained with the unannotated boxes of the weakly labeled images, and domain invariance is achieved through the gradient of the domain classifier. The objectness knowledge is then used to coarsely separate objects from distractors, and finally a model based on multiple instance learning (MIL) further distinguishes objects from distractors.
To solve the above problems, the present invention provides an object transfer method based on mixed supervised detection, which mainly comprises:
(1) domain-invariant objectness learning;
(2) an objectness-aware detection model.
Wherein, the domain-invariant objectness learning models objectness knowledge with a method based on a convolutional neural network (CNN), and the objectness model is trained directly with a training scheme that uses the bottom-up region proposals generated by selective search.
Further, the objectness knowledge is learned from the bounding-box annotations of the strongly labeled categories, and the learning is carried out as a binary classification task: regions that overlap heavily with a ground-truth box are regarded as objects, and regions with smaller overlap are regarded as non-objects.
Further, in the training scheme, an image and a set of region proposals are fed during training into several convolutional layers, a region-of-interest (RoI) pooling layer and fully connected (fc) layers. Each region r_i is finally mapped to a 256-dimensional vector f_i, which serves as the internal representation of the input region; f_i is then fed to two branches, the objectness predictor and the domain classifier.
Further, the objectness predictor, denoted G_obj, consists of fc layers and predicts from f_i whether region r_i is an object. Because bounding-box annotations are needed to separate object regions from non-object regions in an image, only regions from the strongly labeled set are used to train the objectness predictor, with a binary logistic loss defined on the posterior probability that region r_i belongs to an object. Regions whose intersection over union (IoU) with a ground-truth box is not less than 0.5 are regarded as positive examples, and regions whose maximum IoU lies in the interval [0.1, 0.5) are negative examples. Positive and negative examples in each image are balanced at a ratio of 1:3, because negative examples far outnumber positive ones. The objectness predictor branch is then trained on the strongly labeled set. So that the objectness learned on the strongly labeled set also applies well to the weakly labeled set, objectness learning is treated as a domain adaptation problem in which the strongly labeled set and the weakly labeled set are regarded as the source domain and the target domain, respectively.
Further, the domain classifier, denoted G_dom, is connected to the feature f so that domain invariance can be achieved. The domain classifier receives regions from both the strongly labeled set and the weakly labeled set and predicts which set each input region comes from. This is a binary classification task optimized with a loss L_dom defined on the probability that region r_i belongs to one of the two sets; regions sampled from that set are positive examples, and regions sampled from the other set are negative examples.
In forward propagation, the domain classifier operates in the standard way and L_dom is computed. In back-propagation, the gradient of the domain classifier is first reversed (multiplied by -1) and then passed to f. Through this gradient reversal operator, the network in effect maximizes L_dom during training. In each training iteration, 64 random unlabeled regions are drawn from the weakly labeled set, and 64 balanced regions and 64 random regions are drawn from the strongly labeled set, to train the domain classifier; each training mini-batch therefore contains 192 regions.
Wherein, the objectness-aware detection model uses the learned objectness knowledge to separate objects from distractors in the weakly labeled images. For each weakly labeled image, its region proposals are first fed into the objectness model to obtain their objectness scores and are then ranked by score; the top-scoring m% of the proposals are selected as object regions, and the remaining proposals serve as distractor regions. The difference between objects and distractors is then modeled further with a multiple instance learning (MIL) method, and finally the objectness-aware detection model is trained within the Fast R-CNN (fast region-based convolutional network) framework.
Further, in the modeling, within the MIL framework, for each weakly labeled image in the weakly labeled set, an object bag is built from its object regions and a distractor bag is built from its distractor regions. Both bags carry labels: the distractor bag is given the distractor label, and the object bag is labeled with the corresponding object class.
Further, in the training, the weakly labeled images and the region proposals generated by selective search are used as the network input (each image contains two bags: an object bag and a distractor bag). The network computes features for every region proposal and maps them to a (K+1)-dimensional vector that represents the class scores of the region; these region-level scores are used directly to evaluate detection performance at test time. Finally, the network is trained end to end with a cross-entropy loss, together with a weight decay parameter on the weights w so that the objectness knowledge learned in the CNN is better applied to the weakly labeled images. Because the regions inside a bag cannot be labeled individually, the region-level scores S_R need to be aggregated into a bag-level class score S_B to train the model.
Further, regarding the region-level scores, the conventional MIL setting uses the max operation to take the highest region-level score as the bag-level score, but the max operation uses only one region in each bag as the positive example. To relax this restriction, the "exp-sum-log" operator is used as a soft approximation of the max operation. After the bag-level score is obtained, a hyperbolic function is used to compute the posterior probability that bag B_i belongs to the k-th class.
Brief description of the drawings
Fig. 1 is a system flow chart of the object transfer method based on mixed supervised detection of the present invention.
Fig. 2 is a flow chart of the objectness-aware detection model of the object transfer method based on mixed supervised detection of the present invention.
Detailed description of the embodiments
It should be noted that, provided there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with one another. The invention is described in further detail below with reference to the drawings and specific embodiments.
Fig. 1 is a system flow chart of the object transfer method based on mixed supervised detection of the present invention. The method mainly comprises domain-invariant objectness learning and an objectness-aware detection model.
Wherein, the domain-invariant objectness learning models objectness knowledge with a method based on a convolutional neural network (CNN), and the objectness model is trained directly with a training scheme that uses the bottom-up region proposals generated by selective search.
Further, the objectness knowledge is learned from the bounding-box annotations of the strongly labeled categories, and the learning is carried out as a binary classification task: regions that overlap heavily with a ground-truth box are regarded as objects, and regions with smaller overlap are regarded as non-objects.
Further, in the training scheme, an image and a set of region proposals are fed during training into several convolutional layers, a region-of-interest (RoI) pooling layer and fully connected (fc) layers. Each region r_i is finally mapped to a 256-dimensional vector f_i, which serves as the internal representation of the input region; f_i is then fed to two branches, the objectness predictor and the domain classifier.
Further, the objectness predictor, denoted G_obj, consists of fc layers and predicts from f_i whether region r_i is an object. Because bounding-box annotations are needed to separate object regions from non-object regions in an image, only regions from the strongly labeled set are used to train the objectness predictor, with a binary logistic loss defined on the posterior probability that region r_i belongs to an object. Regions whose intersection over union (IoU) with a ground-truth box is not less than 0.5 are regarded as positive examples, and regions whose maximum IoU lies in the interval [0.1, 0.5) are negative examples. Positive and negative examples in each image are balanced at a ratio of 1:3, because negative examples far outnumber positive ones. The objectness predictor branch is then trained on the strongly labeled set. So that the objectness learned on the strongly labeled set also applies well to the weakly labeled set, objectness learning is treated as a domain adaptation problem in which the strongly labeled set and the weakly labeled set are regarded as the source domain and the target domain, respectively.
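As a non-limiting illustration, the region labeling rule described above (IoU not less than 0.5 positive, maximum IoU in [0.1, 0.5) negative, 1:3 balancing) and a binary logistic loss can be sketched in Python as follows. The box format `(x1, y1, x2, y2)`, the function names, and the averaging of the loss over regions are assumptions made for this sketch and are not taken from the text.

```python
import math
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_proposals(proposals, gt_boxes):
    """Split proposals into (positives, negatives) by their maximum IoU."""
    positives, negatives = [], []
    for p in proposals:
        best = max((iou(p, g) for g in gt_boxes), default=0.0)
        if best >= 0.5:
            positives.append(p)
        elif 0.1 <= best < 0.5:
            negatives.append(p)
        # Proposals with a maximum IoU below 0.1 are left unused.
    return positives, negatives


def balance(positives, negatives, ratio=3, rng=random):
    """Keep at most `ratio` negatives per positive (1:3 by default)."""
    keep = min(len(negatives), ratio * len(positives))
    return positives, rng.sample(negatives, keep)


def binary_logistic_loss(probs, labels):
    """Mean binary logistic loss; probs are predicted objectness posteriors."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)
```

In this sketch a proposal that covers 60% of a ground-truth box (and lies inside it) is a positive example, while one that covers 30% is a negative example; proposals with almost no overlap are ignored rather than treated as negatives.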
Further, the domain classifier, denoted G_dom, is connected to the feature f so that domain invariance can be achieved. The domain classifier receives regions from both the strongly labeled set and the weakly labeled set and predicts which set each input region comes from. This is a binary classification task optimized with a loss L_dom defined on the probability that region r_i belongs to one of the two sets; regions sampled from that set are positive examples, and regions sampled from the other set are negative examples.
In forward propagation, the domain classifier operates in the standard way and L_dom is computed. In back-propagation, the gradient of the domain classifier is first reversed (multiplied by -1) and then passed to f. Through this gradient reversal operator, the network in effect maximizes L_dom during training. In each training iteration, 64 random unlabeled regions are drawn from the weakly labeled set, and 64 balanced regions and 64 random regions are drawn from the strongly labeled set, to train the domain classifier; each training mini-batch therefore contains 192 regions.
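As a non-limiting illustration of the gradient reversal described above, the following Python sketch uses a deliberately tiny model: a scalar feature `f = w * x` stands in for the region feature, and a one-weight logistic classifier stands in for the domain classifier. The names `w`, `v`, `x`, `d` and the logistic form of the domain loss are assumptions of this sketch, not details taken from the text.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def domain_loss(w, v, x, d):
    """Logistic domain loss for one region with domain label d (0 or 1).

    f = w * x is a scalar stand-in for the region feature and
    p = sigmoid(v * f) is the predicted probability of domain 1.
    """
    p = sigmoid(v * w * x)
    return -(d * math.log(p) + (1 - d) * math.log(1.0 - p))


def gradients_with_reversal(w, v, x, d):
    """Return (grad_v, grad_w) as seen by a gradient-descent optimizer.

    grad_v is the ordinary gradient for the domain classifier weight.
    grad_w is the gradient for the feature weight after the reversal
    operator has multiplied the incoming gradient by -1.
    """
    f = w * x
    p = sigmoid(v * f)
    dloss_dlogit = p - d              # derivative of the logistic loss
    grad_v = dloss_dlogit * f         # classifier: ordinary descent direction
    grad_f = dloss_dlogit * v         # gradient arriving at the feature
    grad_w = (-1.0 * grad_f) * x      # reversed before reaching the feature layer
    return grad_v, grad_w
```

With these gradients, an ordinary descent step on `v` lowers the domain loss, while the same descent step on `w` raises it, which is the sense in which the feature layers are pushed toward features the domain classifier cannot separate.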
Fig. 2 is a flow chart of the objectness-aware detection model of the object transfer method based on mixed supervised detection of the present invention.
The objectness-aware detection model uses the learned objectness knowledge to separate objects from distractors in the weakly labeled images. For each weakly labeled image, its region proposals are first fed into the objectness model to obtain their objectness scores and are then ranked by score; the top-scoring m% of the proposals are selected as object regions, and the remaining proposals serve as distractor regions. The difference between objects and distractors is then modeled further with a multiple instance learning (MIL) method, and finally the objectness-aware detection model is trained within the Fast R-CNN (fast region-based convolutional network) framework.
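As a non-limiting illustration, the ranking step described above can be sketched in Python as follows. The function name, the rounding of the cut-off with `math.ceil`, and the example value of `m` are assumptions of this sketch; the text does not state a value for m or a rounding rule.

```python
import math


def split_by_objectness(proposals, scores, m):
    """Split proposals into (object_regions, distractor_regions).

    The proposals are ranked by objectness score in descending order and
    the top m percent become object regions; the rest become distractors.
    """
    if len(proposals) != len(scores):
        raise ValueError("each proposal needs exactly one score")
    order = sorted(range(len(proposals)), key=lambda i: scores[i], reverse=True)
    k = math.ceil(len(proposals) * m / 100.0)  # rounding rule is an assumption
    objects = [proposals[i] for i in order[:k]]
    distractors = [proposals[i] for i in order[k:]]
    return objects, distractors
```

For example, with five proposals and a hypothetical m of 40, the two highest-scoring proposals form the object regions and the other three form the distractor regions of that image.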
Further, in the modeling, within the MIL framework, for each weakly labeled image in the weakly labeled set, an object bag is built from its object regions and a distractor bag is built from its distractor regions. Both bags carry labels: the distractor bag is given the distractor label, and the object bag is labeled with the corresponding object class.
Further, in the training, the weakly labeled images and the region proposals generated by selective search are used as the network input (each image contains two bags: an object bag and a distractor bag). The network computes features for every region proposal and maps them to a (K+1)-dimensional vector that represents the class scores of the region; these region-level scores are used directly to evaluate detection performance at test time. Finally, the network is trained end to end with a cross-entropy loss, together with a weight decay parameter on the weights w so that the objectness knowledge learned in the CNN is better applied to the weakly labeled images. Because the regions inside a bag cannot be labeled individually, the region-level scores S_R need to be aggregated into a bag-level class score S_B to train the model.
Further, regarding the region-level scores, the conventional MIL setting uses the max operation to take the highest region-level score as the bag-level score, but the max operation uses only one region in each bag as the positive example. To relax this restriction, the "exp-sum-log" operator is used as a soft approximation of the max operation. After the bag-level score is obtained, a hyperbolic function is used to compute the posterior probability that bag B_i belongs to the k-th class.
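As a non-limiting illustration, the aggregation of region-level scores into a bag-level score can be sketched in Python as follows. The sketch assumes the plain form log(sum(exp(s))) for the "exp-sum-log" operator; the text does not state whether any scaling or normalization is applied, and the function names are hypothetical.

```python
import math


def log_sum_exp(scores):
    """Numerically stable log(sum(exp(s))) over a non-empty list of scores."""
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))


def bag_scores(region_scores, soft=True):
    """Aggregate per-region class scores into one bag-level score per class.

    region_scores: list of equal-length score vectors, one per region.
    soft=True uses the log-sum-exp approximation; soft=False uses hard max.
    """
    n_classes = len(region_scores[0])
    pool = log_sum_exp if soft else max
    return [pool([r[k] for r in region_scores]) for k in range(n_classes)]
```

Unlike the hard max, the soft operator lets every region in the bag contribute to the bag-level score and therefore to the gradient; for a bag of n regions its value lies between the maximum score and the maximum score plus log(n).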
For those skilled in the art, the present invention is not limited to the details of the above embodiments, and it can be realized in other specific forms without departing from its spirit and scope. In addition, those skilled in the art may make various modifications and variations to the invention without departing from its spirit and scope, and such improvements and modifications shall also be regarded as falling within the protection scope of the invention. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the invention.
Claims (10)
1. An object transfer method based on mixed supervised detection, characterized in that it mainly comprises domain-invariant objectness learning (1) and an objectness-aware detection model (2).
2. The domain-invariant objectness learning (1) according to claim 1, characterized in that objectness knowledge is modeled with a method based on a convolutional neural network (CNN), and the objectness model is trained directly with a training scheme that uses the bottom-up region proposals generated by selective search.
3. The objectness knowledge according to claim 2, characterized in that it is learned from the bounding-box annotations of the strongly labeled categories, and the learning is carried out as a binary classification task: regions that overlap heavily with a ground-truth box are regarded as objects, and regions with smaller overlap are regarded as non-objects.
4. The training scheme according to claim 2, characterized in that, during training, an image and a set of region proposals are fed into several convolutional layers, a region-of-interest (RoI) pooling layer and fully connected (fc) layers; each region r_i is finally mapped to a 256-dimensional vector f_i, which serves as the internal representation of the input region; f_i is then fed to two branches, the objectness predictor and the domain classifier.
5. The objectness predictor according to claim 4, characterized in that it is denoted G_obj, consists of fc layers, and predicts from f_i whether region r_i is an object; because bounding-box annotations are needed to separate object regions from non-object regions in an image, only regions from the strongly labeled set are used to train the objectness predictor, with a binary logistic loss defined on the posterior probability that region r_i belongs to an object; regions whose intersection over union (IoU) with a ground-truth box is not less than 0.5 are regarded as positive examples, and regions whose maximum IoU lies in the interval [0.1, 0.5) are negative examples; positive and negative examples in each image are balanced at a ratio of 1:3, because negative examples far outnumber positive ones; the objectness predictor branch is then trained on the strongly labeled set; so that the objectness learned on the strongly labeled set also applies well to the weakly labeled set, objectness learning is treated as a domain adaptation problem in which the strongly labeled set and the weakly labeled set are regarded as the source domain and the target domain, respectively.
6. The domain classifier according to claim 4, characterized in that it is denoted G_dom and is connected to the feature f so that domain invariance can be achieved; the domain classifier receives regions from both the strongly labeled set and the weakly labeled set and predicts which set each input region comes from; this is a binary classification task optimized with a loss L_dom defined on the probability that region r_i belongs to one of the two sets; regions sampled from that set are positive examples, and regions sampled from the other set are negative examples;
in forward propagation, the domain classifier operates in the standard way and L_dom is computed; in back-propagation, the gradient of the domain classifier is first reversed (multiplied by -1) and then passed to f; through this gradient reversal operator, the network in effect maximizes L_dom during training; in each training iteration, 64 random unlabeled regions are drawn from the weakly labeled set, and 64 balanced regions and 64 random regions are drawn from the strongly labeled set, to train the domain classifier; each training mini-batch therefore contains 192 regions.
7. The objectness-aware detection model (2) according to claim 1, characterized in that the learned objectness knowledge is used to separate objects from distractors in the weakly labeled images; for each weakly labeled image, its region proposals are first fed into the objectness model to obtain their objectness scores and are then ranked by score; the top-scoring m% of the proposals are selected as object regions, and the remaining proposals serve as distractor regions; the difference between objects and distractors is then modeled further with a multiple instance learning (MIL) method, and finally the objectness-aware detection model is trained within the Fast R-CNN (fast region-based convolutional network) framework.
8. The modeling according to claim 6, characterized in that, within the MIL framework, for each weakly labeled image in the weakly labeled set, an object bag is built from its object regions and a distractor bag is built from its distractor regions; both bags carry labels: the distractor bag is given the distractor label, and the object bag is labeled with the corresponding object class.
9. The training according to claim 6, characterized in that, during training, the weakly labeled images and the region proposals generated by selective search are used as the network input (each image contains two bags: an object bag and a distractor bag); the network computes features for every region proposal and maps them to a (K+1)-dimensional vector that represents the class scores of the region; these region-level scores are used directly to evaluate detection performance at test time; finally, the network is trained end to end with a cross-entropy loss, together with a weight decay parameter on the weights w so that the objectness knowledge learned in the CNN is better applied to the weakly labeled images; because the regions inside a bag cannot be labeled individually, the region-level scores S_R need to be aggregated into a bag-level class score S_B to train the model.
10. The region-level scores according to claim 9, characterized in that the conventional MIL setting uses the max operation to take the highest region-level score as the bag-level score, but the max operation uses only one region in each bag as the positive example; to relax this restriction, the "exp-sum-log" operator is used as a soft approximation of the max operation; after the bag-level score is obtained, a hyperbolic function is used to compute the posterior probability that bag B_i belongs to the k-th class.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810520073.1A CN108830300A (en) | 2018-05-28 | 2018-05-28 | A kind of object transmission method based on mixing supervisory detection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810520073.1A CN108830300A (en) | 2018-05-28 | 2018-05-28 | A kind of object transmission method based on mixing supervisory detection |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108830300A true CN108830300A (en) | 2018-11-16 |
Family
ID=64145759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810520073.1A Withdrawn CN108830300A (en) | 2018-05-28 | 2018-05-28 | A kind of object transmission method based on mixing supervisory detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108830300A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110738263A (en) * | 2019-10-17 | 2020-01-31 | 腾讯科技(深圳)有限公司 | image recognition model training method, image recognition method and device |
CN113196291A (en) * | 2019-01-23 | 2021-07-30 | 动态Ad有限责任公司 | Automatic selection of data samples for annotation |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104063719A (en) * | 2014-06-27 | 2014-09-24 | 深圳市赛为智能股份有限公司 | Method and device for pedestrian detection based on depth convolutional network |
CN104573731A (en) * | 2015-02-06 | 2015-04-29 | 厦门大学 | Rapid target detection method based on convolutional neural network |
Non-Patent Citations (1)
Title |
---|
YAN LI ET.AL: "Mixed Supervised Object Detection with Robust Objectness Transfer", 《ARXIV:1802.09778V2 [CS.CV]》 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113196291A (en) * | 2019-01-23 | 2021-07-30 | 动态Ad有限责任公司 | Automatic selection of data samples for annotation |
CN110738263A (en) * | 2019-10-17 | 2020-01-31 | 腾讯科技(深圳)有限公司 | image recognition model training method, image recognition method and device |
CN110738263B (en) * | 2019-10-17 | 2020-12-29 | 腾讯科技(深圳)有限公司 | Image recognition model training method, image recognition method and image recognition device |
US11960571B2 (en) | 2019-10-17 | 2024-04-16 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for training image recognition model, and image recognition method and apparatus |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Rigano | Using artificial intelligence to address criminal justice needs | |
CN109978893B (en) | Training method, device, equipment and storage medium of image semantic segmentation network | |
CN108710892B (en) | Cooperative immune defense method for multiple anti-picture attacks | |
Liu et al. | Computational and statistical methods for analysing big data with applications | |
CN112163638A (en) | Defense method, device, equipment and medium for image classification model backdoor attack | |
CN109697503A (en) | The KI module and operation method of fuzzy parameter | |
CN110139067A (en) | A kind of wild animal monitoring data management information system | |
CN113642474A (en) | Hazardous area personnel monitoring method based on YOLOV5 | |
CN108830300A (en) | A kind of object transmission method based on mixing supervisory detection | |
CN112308093B (en) | Air quality perception method based on image recognition, model training method and system | |
WO2019180310A1 (en) | A method, an apparatus and a computer program product for an interpretable neural network representation | |
Apeagyei et al. | Evaluation of deep learning models for classification of asphalt pavement distresses | |
Haffar et al. | Explaining image misclassification in deep learning via adversarial examples | |
CN113343123A (en) | Training method and detection method for generating confrontation multiple relation graph network | |
Chandra et al. | RIPA: Real-time image privacy alert system | |
CN116680633B (en) | Abnormal user detection method, system and storage medium based on multitask learning | |
Greene et al. | Natural scene categorization from conjunctions of ecological global properties | |
Sheng et al. | Backdoor attack of graph neural networks based on subgraph trigger | |
CN115758337A (en) | Back door real-time monitoring method based on timing diagram convolutional network, electronic equipment and medium | |
CN110263842A (en) | For the neural network training method of target detection, device, equipment, medium | |
Touazi et al. | A k-nearest neighbor approach to improve change detection from remote sensing: Application to optical aerial images | |
CN112929380B (en) | Trojan horse communication detection method and system combining meta-learning and spatiotemporal feature fusion | |
Zhou et al. | UAV forest fire detection based on lightweight YOLOv5 model | |
Praneash et al. | Forest fire detection using computer vision | |
Tian | Detect and repair errors for DNN-based software |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20181116 |