CN107273922A - Sample screening and weight calculation method for multi-source instance transfer learning - Google Patents

Info

Publication number
CN107273922A
Authority
CN
China
Prior art keywords
sample
source
distance
individual
target domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710406537.1A
Other languages
Chinese (zh)
Inventor
李维华
金宸
姬晨
王顺芳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU
Priority to CN201710406537.1A
Publication of CN107273922A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/211 Selection of the most significant subset of features
    • G06F18/2113 Selection of the most significant subset of features by ranking or filtering the set of features, e.g. using a measure of variance or of feature cross-correlation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of artificial intelligence and discloses a sample screening and weight calculation method for multi-source instance transfer learning. To make full use of the large amount of unlabeled data and the small amount of labeled data in the target domain, while rejecting source-domain samples that are only weakly related to the target domain, the invention first defines the distance between samples on the basis of covariance and weighted Euclidean distance, and defines the characteristic distance between each source sample and the target domain as an average sample distance. Second, according to the characteristic distance, the invention selects the source-domain samples that are more closely related to the target domain and uses the distance as the initial weight of each sample; the characteristic distance also serves as one of the bases for dynamically updating the source sample weights, providing more effective support for multi-source instance transfer learning.

Description

Sample screening and weight calculation method for multi-source instance transfer learning
Technical field
The invention belongs to the field of artificial intelligence and relates to a multi-source instance-based transfer learning method.
Background
Transfer learning refers to the processes and methods that use one or several domains with sufficient labeled samples to learn in a related emerging domain whose labeled samples are insufficient. A domain with sufficient labeled samples is called a source domain, and the emerging domain with insufficient labeled samples is called the target domain. Instance-based transfer learning methods mainly search the source-domain data sets for sample data that can improve the performance of the target-domain classifier. The key to this class of algorithms is to use the labeled target samples to assign a weight to each sample in the source and target domains and to update those weights iteratively. However, when the target domain contains very few labeled training samples, those few samples are swamped by the large number of source-domain samples, so the contribution of the target-domain training samples to the construction of the final classifier cannot be fully reflected. Meanwhile, the source domain may contain samples that differ greatly from the samples in the target domain; such data not only reduce classification efficiency but can even have a negative effect on the final classification result. The present invention aims to make full use of both the unlabeled and the labeled data in the target domain, to reject before iteration the source-domain samples that differ greatly from the target domain, and to use the characteristic distance during iteration as one of the bases for dynamically updating the sample weights, thereby further optimizing existing transfer learning algorithms.
Summary of the invention
The present invention is directed at multi-source instance transfer learning. It aims to make full use of the large amount of unlabeled data and the small amount of labeled data in the target domain while rejecting source-domain samples that are only weakly related to the target domain. The invention provides a sample screening and weight calculation method for multi-source instance transfer learning.
1. A sample screening and weight calculation method for multi-source instance transfer learning, characterized by comprising the following steps:
Step 1: Input n source-domain data sets and one labeled target-domain data set, all defined on the feature set X = {x_1, x_2, ..., x_m} and the label attribute y, together with one unlabeled target-domain data set defined on the feature set X = {x_1, x_2, ..., x_m}.
Step 2: Compute the covariance matrix on each of the data sets and obtain its eigenvalues v_1 = {v_11, v_12, ..., v_1m}, v_2 = {v_21, v_22, ..., v_2m}, ..., v_n = {v_n1, v_n2, ..., v_nm}.
Step 3: Normalize v_1, v_2, ..., v_n to obtain w_1 = {w_11, w_12, ..., w_1m}, w_2 = {w_21, w_22, ..., w_2m}, ..., w_n = {w_n1, w_n2, ..., w_nm}, respectively.
Step 4: Compute the n distance matrices R_1, R_2, ..., R_n between the source-domain data sets and the target-domain data, where, for 1 ≤ k ≤ n, each entry of R_k is the distance between the i-th sample of the k-th source-domain data set and the j-th sample of the target-domain data.
Step 5: In each of R_1, R_2, ..., R_n, take the average of row i as the characteristic distance between the i-th sample of the corresponding source-domain data set and the target domain, which yields one characteristic distance vector per source-domain data set; that is, the i-th entry of the k-th vector is the average of the values in row i of R_k. Then find the p smallest values in each vector and select the corresponding p samples from the source-domain data set, i.e., select from each source-domain data set the p samples whose average distance to the target domain is smallest.
Step 6: Compute the initial sample weight vector of each selected source sample set from its characteristic distances, and normalize it.
Step 7: Use the n characteristic distance vectors from the n source sample sets to the target domain as one index for updating the source sample weights while the classifier is trained iteratively.
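The screening procedure in Steps 2 to 5 can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the patented implementation. The function names (feature_weights, distance_matrix, screen_source) are hypothetical. The covariance matrix is assumed to be computed on the source data set alone, normalization is assumed to mean dividing the eigenvalues by their sum, and the distance is assumed to be a Euclidean distance whose squared per-feature differences are weighted by the normalized eigenvalues in the order NumPy returns them.

```python
import numpy as np


def feature_weights(data):
    """Steps 2-3 (assumed form): normalized eigenvalues of the covariance matrix."""
    cov = np.atleast_2d(np.cov(data, rowvar=False))  # m x m covariance over features
    eigvals = np.linalg.eigvalsh(cov)                # real eigenvalues, ascending order
    eigvals = np.clip(eigvals, 0.0, None)            # guard against tiny negative round-off
    return eigvals / eigvals.sum()                   # assumes at least one nonzero eigenvalue


def distance_matrix(source, target, w):
    """Step 4 (assumed form): R[i, j] is the weighted Euclidean distance
    between source sample i and target sample j."""
    diff = source[:, None, :] - target[None, :, :]   # shape (n_source, n_target, m)
    return np.sqrt((w * diff ** 2).sum(axis=2))


def screen_source(source, target, p):
    """Step 5: keep the p source samples with the smallest characteristic distance."""
    w = feature_weights(source)          # assumption: covariance taken on the source set
    R = distance_matrix(source, target, w)
    char_dist = R.mean(axis=1)           # row average = characteristic distance
    keep = np.argsort(char_dist)[:p]     # indices of the p nearest source samples
    return keep, char_dist[keep]


# Toy data: four source samples and two target samples with two features each.
source = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.0, 1.0]])
target = np.array([[0.0, 0.0], [0.0, 1.0]])
keep, dist = screen_source(source, target, p=2)
print(sorted(keep.tolist()), dist)
```

In this toy example the source sample at (5, 5) lies far from both target samples and is rejected, while the two source samples that coincide with target samples are kept. With several source domains, screen_source would be called once per source data set.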
Brief description of the drawings
Fig. 1 is a flowchart of multi-source instance transfer learning as improved by the present invention; the shaded portions of the figure are the optimizations and improvements that the invention makes to existing multi-source instance transfer learning methods.
Detailed description of the embodiments
An embodiment provided according to the present invention is described in detail below with reference to Fig. 1.
As shown in Fig. 1, the sample screening and weight calculation method for multi-source instance transfer learning first calculates the initial sample weights and completes the screening of the source samples according to the following steps:
Step 1: Input n source-domain data sets and one labeled target-domain data set, all defined on the feature set X = {x_1, x_2, ..., x_m} and the label attribute y, together with one unlabeled target-domain data set defined on the feature set X = {x_1, x_2, ..., x_m}.
Step 2: Compute the covariance matrix on each of the data sets and obtain its eigenvalues v_1 = {v_11, v_12, ..., v_1m}, v_2 = {v_21, v_22, ..., v_2m}, ..., v_n = {v_n1, v_n2, ..., v_nm}.
Step 3: Normalize v_1, v_2, ..., v_n to obtain w_1 = {w_11, w_12, ..., w_1m}, w_2 = {w_21, w_22, ..., w_2m}, ..., w_n = {w_n1, w_n2, ..., w_nm}, respectively.
Step 4: Compute the n distance matrices R_1, R_2, ..., R_n between the source-domain data sets and the target-domain data, where, for 1 ≤ k ≤ n, each entry of R_k is the distance between the i-th sample of the k-th source-domain data set and the j-th sample of the target-domain data.
Step 5: In each of R_1, R_2, ..., R_n, take the average of row i as the characteristic distance between the i-th sample of the corresponding source-domain data set and the target domain, which yields one characteristic distance vector per source-domain data set; that is, the i-th entry of the k-th vector is the average of the values in row i of R_k. Then find the p smallest values in each vector and select the corresponding p samples from the source-domain data set, i.e., select from each source-domain data set the p samples whose average distance to the target domain is smallest.
Step 6: Compute the initial sample weight vector of each selected source sample set from its characteristic distances, and normalize it.
Step 7: Use the n characteristic distance vectors from the n source sample sets to the target domain as one index for updating the source sample weights while the classifier is trained iteratively.
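Steps 6 and 7 leave the exact weight formulas unspecified in this text, so the following Python sketch only illustrates one way characteristic distances could feed into sample weights. Everything in it is an assumption: the function names (initial_weights, update_source_weights), the inverse-distance form of the initial weights, the parameter beta, and the update rule itself (a TrAdaBoost-style shrinking of misclassified source samples, made stronger for samples farther from the target domain) are hypothetical and are not taken from the patent.

```python
import numpy as np


def initial_weights(char_dist, eps=1e-12):
    """Step 6 (assumed form): closer samples get larger weights; weights sum to 1."""
    inv = 1.0 / (np.asarray(char_dist, dtype=float) + eps)  # eps avoids division by zero
    return inv / inv.sum()


def update_source_weights(weights, misclassified, char_dist, beta=0.5):
    """Step 7 (hypothetical rule): shrink the weights of misclassified source
    samples, shrinking samples that are farther from the target more strongly."""
    d = np.asarray(char_dist, dtype=float)
    scaled = d / d.max() if d.max() > 0 else np.zeros_like(d)  # distances mapped to [0, 1]
    factor = np.where(misclassified, beta ** (1.0 + scaled), 1.0)
    new = np.asarray(weights, dtype=float) * factor
    return new / new.sum()                                      # renormalize to sum to 1


# Toy example: three selected source samples with increasing characteristic distance.
char_dist = np.array([1.0, 2.0, 4.0])
w0 = initial_weights(char_dist)
w1 = update_source_weights(w0, np.array([False, True, True]), char_dist)
print(w0, w1)
```

Under these assumptions, a correctly classified source sample gains relative weight after renormalization, and of two misclassified samples the one farther from the target domain loses proportionally more.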

Claims (1)

1. A sample screening and weight calculation method for multi-source instance transfer learning, characterized by comprising the following steps:
Step 1: Input n source-domain data sets and one labeled target-domain data set, all defined on the feature set X = {x_1, x_2, ..., x_m} and the label attribute y, together with one unlabeled target-domain data set defined on the feature set X = {x_1, x_2, ..., x_m}.
Step 2: Compute the covariance matrix on each of the data sets and obtain its eigenvalues v_1 = {v_11, v_12, ..., v_1m}, v_2 = {v_21, v_22, ..., v_2m}, ..., v_n = {v_n1, v_n2, ..., v_nm}.
Step 3: Normalize v_1, v_2, ..., v_n to obtain w_1 = {w_11, w_12, ..., w_1m}, w_2 = {w_21, w_22, ..., w_2m}, ..., w_n = {w_n1, w_n2, ..., w_nm}, respectively.
Step 4: Compute the n distance matrices R_1, R_2, ..., R_n between the source-domain data sets and the target-domain data, where, for 1 ≤ k ≤ n, each entry of R_k is the distance between the i-th sample of the k-th source-domain data set and the j-th sample of the target-domain data.
Step 5: In each of R_1, R_2, ..., R_n, take the average of row i as the characteristic distance between the i-th sample of the corresponding source-domain data set and the target domain, which yields one characteristic distance vector per source-domain data set; that is, the i-th entry of the k-th vector is the average of the values in row i of R_k. Then find the p smallest values in each vector and select the corresponding p samples from the source-domain data set, i.e., select from each source-domain data set the p samples whose average distance to the target domain is smallest.
Step 6: Compute the initial sample weight vector of each selected source sample set from its characteristic distances, and normalize it.
Step 7: Use the n characteristic distance vectors from the n source sample sets to the target domain as one index for updating the source sample weights while the classifier is trained iteratively.
CN201710406537.1A 2017-06-02 2017-06-02 Sample screening and weight calculation method for multi-source instance transfer learning Pending CN107273922A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710406537.1A CN107273922A (en) 2017-06-02 2017-06-02 Sample screening and weight calculation method for multi-source instance transfer learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710406537.1A CN107273922A (en) 2017-06-02 2017-06-02 Sample screening and weight calculation method for multi-source instance transfer learning

Publications (1)

Publication Number Publication Date
CN107273922A (en) 2017-10-20

Family

ID=60065709

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710406537.1A Pending CN107273922A (en) 2017-06-02 2017-06-02 Sample screening and weight calculation method for multi-source instance transfer learning

Country Status (1)

Country Link
CN (1) CN107273922A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108846444A (en) * 2018-06-23 2018-11-20 重庆大学 Multi-stage deep transfer learning method for multi-source data mining
CN109886303A (en) * 2019-01-21 2019-06-14 武汉大学 TrAdaBoost sample transfer aerial image classification method based on particle swarm optimization
CN110398986A (en) * 2019-04-28 2019-11-01 清华大学 UAV dense forest perception technique based on multi-source data transfer
CN111261299A (en) * 2020-01-14 2020-06-09 之江实验室 Multi-center collaborative cancer prognosis prediction system based on multi-source transfer learning
CN113420824A (en) * 2021-07-03 2021-09-21 上海理想信息产业(集团)有限公司 Pre-training data screening and training method and system for industrial vision application

Similar Documents

Publication Publication Date Title
CN107273922A (en) Sample screening and weight calculation method for multi-source instance transfer learning
Parsopoulos et al. Objective function "stretching" to alleviate convergence to local minima
CN103729678B (en) Internet water army detection method and system based on improved DBN model
CN102520341B (en) Analog circuit fault diagnosis method based on Bayes-KFCM (Kernelized Fuzzy C-Means) algorithm
CN106095872A (en) Answer ranking method and device for intelligent question answering system
CN108304316B (en) Software defect prediction method based on collaborative migration
CN108062572A (en) Hydroelectric generating unit fault diagnosis method and system based on DdAE deep learning model
Du et al. Time series prediction using evolving radial basis function networks with new encoding scheme
CN103955702A (en) SAR image terrain classification method based on deep RBF network
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN103886330A (en) Classification method based on semi-supervised SVM ensemble learning
CN103473598A (en) Extreme learning machine based on variable-length particle swarm optimization algorithm
CN106503731A (en) Unsupervised feature selection method based on conditional mutual information and K-means
CN102521656A (en) Ensemble transfer learning method for classification of imbalanced samples
CN104732249A (en) Deep learning image classification method based on manifold learning and chaotic particle swarm
CN110309854A (en) Signal modulation mode recognition method and device
CN110009030A (en) Sewage treatment fault diagnosis method based on stacking meta-learning strategy
CN110287985B (en) Deep neural network image recognition method based on variable topology with mutation particle swarm optimization
CN110363230A (en) Stacking ensemble sewage treatment fault diagnosis method based on weighted base classifiers
Zhang et al. Evolving neural network classifiers and feature subset using artificial fish swarm
CN110298434A (en) Ensemble deep belief network based on fuzzy partition and fuzzy weighting
CN109840413A (en) Phishing website detection method and device
CN107220663A (en) Automatic image annotation method based on semantic scene classification
CN106569954A (en) KL divergence-based method for predicting multi-source software defects
CN105512675A (en) Feature selection method based on memory multi-point crossover gravitational search

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20171020