CN107273922A - Sample screening and weight calculation method for multi-source instance transfer learning - Google Patents
- Publication number
- CN107273922A
- Authority
- CN
- China
- Prior art keywords
- sample
- source
- distance
- individual
- target domain
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/211—Selection of the most significant subset of features
- G06F18/2113—Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention belongs to the field of artificial intelligence and discloses a sample screening and weight calculation method for multi-source instance transfer learning. To make full use of the large amount of unlabeled data and the small amount of labeled data in the target domain, while rejecting source-domain samples that are only weakly related to the target domain, the invention first defines a distance between samples on the basis of the covariance and the weighted Euclidean distance, and defines the characteristic distance between each source sample and the target domain as its average sample distance. Second, the invention uses this characteristic distance to select the source-domain samples that are more closely related to the target domain and takes the distance as each sample's initial weight. The characteristic distance also serves as one of the criteria for dynamically updating the source-sample weights, providing more effective support for multi-source instance transfer learning.
Description
Technical field
The invention belongs to the field of artificial intelligence and relates to an instance transfer learning method for multiple source domains.
Background art
Transfer learning is the process and methodology of using one or more related fields that have sufficient labeled samples to learn in an emerging field where labeled samples are scarce. A field with sufficient labeled samples is called a source domain, and the emerging field with insufficient labeled samples is called the target domain. Instance-based transfer learning methods mainly search the source-domain data sets for sample data that can improve the performance of the target-domain classifier. The key to this kind of algorithm is to use the labeled target samples to assign a weight to every sample in the source and target domains and to update these weights iteratively. However, when the target domain has very few labeled training samples, those few samples are swamped by the large number of source-domain samples, so their contribution to building the final classifier is not fully reflected. Meanwhile, the source domains may contain samples that differ substantially from the target-domain samples; such data not only lower classification efficiency but can even have a negative effect on the final classification result. The invention aims to make full use of both the unlabeled and the labeled data in the target domain, to reject before iteration the source-domain samples that differ substantially from the target domain, and to use the characteristic distance during iteration as one of the criteria for dynamically updating sample weights, thereby further optimizing existing transfer learning algorithms.
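The iterative reweighting described above is the core idea of TrAdaBoost, a widely used instance-based transfer learning algorithm. The sketch below shows one reweighting round of that algorithm for binary labels, as background only; it is not the method claimed in this patent. It assumes that `weights` lists the source samples first and then the labeled target samples, that `errors` holds 0 for a correct and 1 for a wrong prediction, and that the helper name `tradaboost_round` is hypothetical. Clipping the target error below 0.5 is an added safeguard.

```python
import math

def tradaboost_round(weights, errors, n_source, n_iterations):
    # Hypothetical helper: one TrAdaBoost-style reweighting round.
    # Split into source samples (first n_source entries) and labeled target samples.
    w_src, w_tgt = weights[:n_source], weights[n_source:]
    e_src, e_tgt = errors[:n_source], errors[n_source:]

    # Weighted error of the current classifier on the labeled target samples only.
    eps = sum(w * e for w, e in zip(w_tgt, e_tgt)) / sum(w_tgt)
    eps = min(max(eps, 1e-10), 0.499)  # assumption: clip so that beta_t stays in (0, 1)
    beta_t = eps / (1.0 - eps)
    # Fixed source factor, depends on the number of source samples and rounds.
    beta = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_source) / n_iterations))

    new_src = [w * beta ** e for w, e in zip(w_src, e_src)]       # wrong source samples shrink
    new_tgt = [w * beta_t ** (-e) for w, e in zip(w_tgt, e_tgt)]  # wrong target samples grow
    total = sum(new_src) + sum(new_tgt)
    return [w / total for w in new_src + new_tgt]
```

Misclassified source samples are multiplied by a factor below 1, while misclassified labeled target samples are divided by a factor below 1, so the few target labels gradually gain influence; this is exactly where a flood of dissimilar source samples can still dominate in early rounds.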
Summary of the invention
The present invention is directed at multi-source instance transfer learning. It aims to make full use of the large amount of unlabeled data and the small amount of labeled data in the target domain while rejecting source-domain samples that are only weakly related to the target domain, and it provides a sample screening and weight calculation method for multi-source instance transfer learning.
1. A sample screening and weight calculation method for multi-source instance transfer learning, characterized by comprising the following steps:
Step 1: Input n source-domain data sets over the feature set $X = \{x_1, x_2, \ldots, x_m\}$ and the label attribute $y$, together with one labeled target-domain data set; and input one unlabeled target-domain data set over the feature set $X = \{x_1, x_2, \ldots, x_m\}$;
Step 2: On each of the n data sets, compute the covariance matrix and its eigenvalues $v_1 = \{v_{11}, v_{12}, \ldots, v_{1m}\}$, $v_2 = \{v_{21}, v_{22}, \ldots, v_{2m}\}$, …, $v_n = \{v_{n1}, v_{n2}, \ldots, v_{nm}\}$;
Step 3: Normalize $v_1, v_2, \ldots, v_n$ to obtain $w_1 = \{w_{11}, w_{12}, \ldots, w_{1m}\}$, $w_2 = \{w_{21}, w_{22}, \ldots, w_{2m}\}$, …, $w_n = \{w_{n1}, w_{n2}, \ldots, w_{nm}\}$;
Step 4: Compute the n distance matrices $R_1, R_2, \ldots, R_n$ between the source-domain data sets and the target domain, $1 \le k \le n$, where the entry in row i and column j of $R_k$ is the distance (a weighted Euclidean distance using the weights $w_k$) between the i-th sample of the k-th source-domain data set and the j-th target-domain sample;
Step 5: In each of $R_1, R_2, \ldots, R_n$, take the average distance of row i as the characteristic distance between the i-th sample of the k-th source-domain data set and the target domain, which yields n characteristic-distance vectors; that is, the characteristic distance of the i-th sample is the mean of row i of $R_k$. Then find the p smallest values in each characteristic-distance vector and select the corresponding p samples, i.e., from each source-domain data set select the p samples with the smallest average distance to the target domain;
Step 6: Compute the initial sample-weight vector of each of the n screened source data sets by normalizing the characteristic distances of the selected samples;
Step 7: During iterative training of the classifier, use the n characteristic-distance vectors from the n source domains to the target domain as one of the criteria for updating the source-sample weights.
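Steps 2 to 4 can be sketched in Python with NumPy as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the covariance is computed on one source data set at a time, "normalize" means dividing by the sum, the m eigenvalues are paired with the m features in NumPy's ascending order (the text does not say how eigenvalues are matched to features), and the distance takes the standard weighted Euclidean form. The function names `feature_weights` and `weighted_distance_matrix` are hypothetical.

```python
import numpy as np

def feature_weights(source):
    # Steps 2-3 (hypothetical helper): covariance eigenvalues of one source
    # data set, normalized so they sum to 1.
    cov = np.cov(source, rowvar=False)      # m x m covariance over the m features
    eigvals = np.linalg.eigvalsh(cov)       # eigenvalues of a symmetric matrix, ascending
    eigvals = np.clip(eigvals, 0.0, None)   # drop tiny negative round-off values
    return eigvals / eigvals.sum()

def weighted_distance_matrix(source, target, w):
    # Step 4 (assumed form): R[i, j] = sqrt(sum_l w[l] * (source[i, l] - target[j, l]) ** 2)
    diff = source[:, None, :] - target[None, :, :]  # shape (n_source, n_target, m)
    return np.sqrt(np.sum(w * diff ** 2, axis=-1))

rng = np.random.default_rng(0)
S = rng.normal(size=(6, 3))   # one source-domain data set: 6 samples, m = 3 features
T = rng.normal(size=(4, 3))   # target-domain samples
w = feature_weights(S)
R = weighted_distance_matrix(S, T, w)   # one distance matrix R_k, shape (6, 4)
```

Broadcasting the (n_source, 1, m) and (1, n_target, m) arrays avoids an explicit double loop; for very large domains the intermediate array has n_source × n_target × m entries, so computing the matrix in row blocks would be the usual alternative.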
Brief description of the drawings
Fig. 1 is a flow chart of the multi-source instance transfer learning process as improved by the present invention; the shaded part of the figure marks the invention's optimization and improvement of existing multi-source instance transfer learning methods.
Embodiment
The embodiment provided by the present invention is described in detail below with reference to Fig. 1.
As shown in Fig. 1, the sample screening and weight calculation method for multi-source instance transfer learning first computes the initial sample weights and screens the source samples according to the following steps:
Step 1: Input n source-domain data sets over the feature set $X = \{x_1, x_2, \ldots, x_m\}$ and the label attribute $y$, together with one labeled target-domain data set; and input one unlabeled target-domain data set over the feature set $X = \{x_1, x_2, \ldots, x_m\}$;
Step 2: On each of the n data sets, compute the covariance matrix and its eigenvalues $v_1 = \{v_{11}, v_{12}, \ldots, v_{1m}\}$, $v_2 = \{v_{21}, v_{22}, \ldots, v_{2m}\}$, …, $v_n = \{v_{n1}, v_{n2}, \ldots, v_{nm}\}$;
Step 3: Normalize $v_1, v_2, \ldots, v_n$ to obtain $w_1 = \{w_{11}, w_{12}, \ldots, w_{1m}\}$, $w_2 = \{w_{21}, w_{22}, \ldots, w_{2m}\}$, …, $w_n = \{w_{n1}, w_{n2}, \ldots, w_{nm}\}$;
Step 4: Compute the n distance matrices $R_1, R_2, \ldots, R_n$ between the source-domain data sets and the target domain, $1 \le k \le n$, where the entry in row i and column j of $R_k$ is the distance (a weighted Euclidean distance using the weights $w_k$) between the i-th sample of the k-th source-domain data set and the j-th target-domain sample;
Step 5: In each of $R_1, R_2, \ldots, R_n$, take the average distance of row i as the characteristic distance between the i-th sample of the k-th source-domain data set and the target domain, which yields n characteristic-distance vectors; that is, the characteristic distance of the i-th sample is the mean of row i of $R_k$. Then find the p smallest values in each characteristic-distance vector and select the corresponding p samples, i.e., from each source-domain data set select the p samples with the smallest average distance to the target domain;
Step 6: Compute the initial sample-weight vector of each of the n screened source data sets by normalizing the characteristic distances of the selected samples;
Step 7: During iterative training of the classifier, use the n characteristic-distance vectors from the n source domains to the target domain as one of the criteria for updating the source-sample weights.
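Steps 5 and 6 reduce to a row mean, a partial sort and a normalization. The sketch below handles one source domain and assumes that `R` is one distance matrix $R_k$ with source samples as rows and target samples as columns, and that "normalize" means dividing by the sum; the helper names `characteristic_distance` and `select_and_weight` are hypothetical.

```python
import numpy as np

def characteristic_distance(R):
    # Step 5: mean of row i = average distance from source sample i to all target samples.
    return R.mean(axis=1)

def select_and_weight(source, R, p):
    # Steps 5-6 for one source domain (hypothetical helper).
    d = characteristic_distance(R)
    keep = np.argsort(d, kind="stable")[:p]  # indices of the p smallest characteristic distances
    d_keep = d[keep]
    init_w = d_keep / d_keep.sum()           # normalized characteristic distances as initial weights
    return source[keep], d_keep, init_w

# Example: 4 source samples with 2 features, distances to 2 target samples.
R = np.array([[1.0, 3.0], [0.0, 2.0], [5.0, 5.0], [2.0, 2.0]])
source = np.arange(8).reshape(4, 2)
selected, d_keep, init_w = select_and_weight(source, R, p=2)
```

The text states only that the characteristic distances are normalized, so in this sketch a farther sample receives a larger initial weight. If closer samples should weigh more, a decreasing transform such as 1/d would have to be normalized instead; the patent does not specify this. The retained `d_keep` vector is what Step 7 would feed into the weight update during training.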
Claims (1)
1. A sample screening and weight calculation method for multi-source instance transfer learning, characterized by comprising the following steps:
Step 1: Input n source-domain data sets over the feature set $X = \{x_1, x_2, \ldots, x_m\}$ and the label attribute $y$, together with one labeled target-domain data set; and input one unlabeled target-domain data set over the feature set $X = \{x_1, x_2, \ldots, x_m\}$;
Step 2: On each of the n data sets, compute the covariance matrix and its eigenvalues $v_1 = \{v_{11}, v_{12}, \ldots, v_{1m}\}$, $v_2 = \{v_{21}, v_{22}, \ldots, v_{2m}\}$, …, $v_n = \{v_{n1}, v_{n2}, \ldots, v_{nm}\}$;
Step 3: Normalize $v_1, v_2, \ldots, v_n$ to obtain $w_1 = \{w_{11}, w_{12}, \ldots, w_{1m}\}$, $w_2 = \{w_{21}, w_{22}, \ldots, w_{2m}\}$, …, $w_n = \{w_{n1}, w_{n2}, \ldots, w_{nm}\}$;
Step 4: Compute the n distance matrices $R_1, R_2, \ldots, R_n$ between the source-domain data sets and the target domain, $1 \le k \le n$, where the entry in row i and column j of $R_k$ is the distance (a weighted Euclidean distance using the weights $w_k$) between the i-th sample of the k-th source-domain data set and the j-th target-domain sample;
Step 5: In each of $R_1, R_2, \ldots, R_n$, take the average distance of row i as the characteristic distance between the i-th sample of the k-th source-domain data set and the target domain, which yields n characteristic-distance vectors; that is, the characteristic distance of the i-th sample is the mean of row i of $R_k$. Then find the p smallest values in each characteristic-distance vector and select the corresponding p samples, i.e., from each source-domain data set select the p samples with the smallest average distance to the target domain;
Step 6: Compute the initial sample-weight vector of each of the n screened source data sets by normalizing the characteristic distances of the selected samples;
Step 7: During iterative training of the classifier, use the n characteristic-distance vectors from the n source domains to the target domain as one of the criteria for updating the source-sample weights.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710406537.1A CN107273922A (en) | 2017-06-02 | 2017-06-02 | Sample screening and weight calculation method for multi-source instance transfer learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710406537.1A CN107273922A (en) | 2017-06-02 | 2017-06-02 | Sample screening and weight calculation method for multi-source instance transfer learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107273922A true CN107273922A (en) | 2017-10-20 |
Family
ID=60065709
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710406537.1A Pending CN107273922A (en) | 2017-06-02 | 2017-06-02 | Sample screening and weight calculation method for multi-source instance transfer learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107273922A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108846444A (en) * | 2018-06-23 | 2018-11-20 | 重庆大学 | The multistage depth migration learning method excavated towards multi-source data |
CN109886303A (en) * | 2019-01-21 | 2019-06-14 | 武汉大学 | A kind of TrAdaboost sample migration aviation image classification method based on particle group optimizing |
CN110398986A (en) * | 2019-04-28 | 2019-11-01 | 清华大学 | A kind of intensive woods cognition technology of unmanned plane of multi-source data migration |
CN111261299A (en) * | 2020-01-14 | 2020-06-09 | 之江实验室 | Multi-center collaborative cancer prognosis prediction system based on multi-source transfer learning |
CN113420824A (en) * | 2021-07-03 | 2021-09-21 | 上海理想信息产业(集团)有限公司 | Pre-training data screening and training method and system for industrial vision application |
2017
- 2017-06-02: Application CN201710406537.1A filed in CN (CN107273922A), Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107273922A (en) | Sample screening and weight calculation method for multi-source instance transfer learning | |
Parsopoulos et al. | Objective function "stretching" to alleviate convergence to local minima | |
CN103729678B (en) | A kind of based on navy detection method and the system of improving DBN model | |
CN102520341B (en) | Analog circuit fault diagnosis method based on Bayes-KFCM (Kernelized Fuzzy C-Means) algorithm | |
CN106095872A (en) | Answer sort method and device for Intelligent Answer System | |
CN108304316B (en) | Software defect prediction method based on collaborative migration | |
CN108062572A (en) | A kind of Fault Diagnosis Method of Hydro-generating Unit and system based on DdAE deep learning models | |
Du et al. | Time series prediction using evolving radial basis function networks with new encoding scheme | |
CN103955702A (en) | SAR image terrain classification method based on depth RBF network | |
CN107392919B (en) | Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method | |
CN103886330A (en) | Classification method based on semi-supervised SVM ensemble learning | |
CN103473598A (en) | Extreme learning machine based on length-changing particle swarm optimization algorithm | |
CN106503731A (en) | A kind of based on conditional mutual information and the unsupervised feature selection approach of K means | |
CN102521656A (en) | Integrated transfer learning method for classification of unbalance samples | |
CN104732249A (en) | Deep learning image classification method based on popular learning and chaotic particle swarms | |
CN110309854A (en) | A kind of signal modulation mode recognition methods and device | |
CN110009030A (en) | Sewage treatment method for diagnosing faults based on stacking meta learning strategy | |
CN110287985B (en) | Depth neural network image identification method based on variable topology structure with variation particle swarm optimization | |
CN110363230A (en) | Stacking integrated sewage handling failure diagnostic method based on weighting base classifier | |
Zhang et al. | Evolving neural network classifiers and feature subset using artificial fish swarm | |
CN110298434A (en) | A kind of integrated deepness belief network based on fuzzy division and FUZZY WEIGHTED | |
CN109840413A (en) | A kind of detection method for phishing site and device | |
CN107220663A (en) | A kind of image automatic annotation method classified based on semantic scene | |
CN106569954A (en) | Method based on KL divergence for predicting multi-source software defects | |
CN105512675A (en) | Memory multi-point crossover gravitational search-based feature selection method |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171020 |