CN107273922A

CN107273922A - A sample screening and weight calculation method for multi-source instance transfer learning

Info

Publication number: CN107273922A
Application number: CN201710406537.1A
Authority: CN
Inventors: 李维华; 金宸; 姬晨; 王顺芳
Original assignee: Yunnan University YNU
Current assignee: Yunnan University YNU
Priority date: 2017-06-02
Filing date: 2017-06-02
Publication date: 2017-10-20

Abstract

The invention belongs to the field of artificial intelligence and discloses a sample screening and weight calculation method for multi-source instance transfer learning. To make full use of the large amount of unlabeled data and the small amount of labeled data in the target domain, while rejecting source-domain samples that are only weakly related to the target domain, the invention first defines the distance between samples on the basis of covariance and weighted Euclidean distance, and defines the feature distance between each source sample and the target domain as the average of these sample distances. Second, the invention selects the source-domain samples that are more related to the target domain according to this feature distance and uses the distance to set each sample's initial weight. The feature distance also serves as one of the criteria for dynamically updating source sample weights, providing more effective support for multi-source instance transfer learning.

Description

A sample screening and weight calculation method for multi-source instance transfer learning

Technical field

The invention belongs to the field of artificial intelligence and relates to a multi-source instance transfer learning method.

Background art

Transfer learning refers to the processes and methods that use one or more domains with sufficient labeled samples to learn in a related but new domain whose labeled samples are insufficient. A domain with sufficient labeled samples is called a source domain, and the new domain with insufficient labeled samples is called the target domain. Instance-based transfer learning methods mainly search the source-domain data sets for sample data that can improve the performance of the target-domain classifier. The key to this class of algorithms is to use the labeled target samples to assign a weight to every sample in the source and target domains and to update these weights iteratively. However, when the target domain contains very few labeled training samples, those few samples are submerged by the large number of source-domain samples, so the contribution of the target-domain training samples to the final classifier cannot be fully reflected. Meanwhile, the source domain may contain samples that differ greatly from the samples in the target domain. Such data not only lowers classification efficiency but can even have a negative effect on the final classification result. The invention aims to make full use of both the unlabeled and the labeled data in the target domain, to reject before iteration the source-domain samples that differ greatly from the target domain, and to use the feature distance during iteration as one of the criteria for dynamically updating sample weights, thereby further optimizing existing transfer learning algorithms.

Summary of the invention

The present invention is directed at multi-source instance transfer learning. It aims to make full use of the large amount of unlabeled data and the small amount of labeled data in the target domain, while rejecting source-domain samples that are only weakly related to the target domain. The invention provides a sample screening and weight calculation method for multi-source instance transfer learning.

1. A sample screening and weight calculation method for multi-source instance transfer learning, characterized by comprising the following steps:

Step 1: Input $n$ source-domain data sets and one labeled target-domain data set, all defined on the feature set $X = \{x_1, x_2, \ldots, x_m\}$ and the label attribute $y$, together with one unlabeled target-domain data set defined on the same feature set $X = \{x_1, x_2, \ldots, x_m\}$;

Step 2: On each of the $n$ data sets, compute the covariance matrix and its eigenvalues $v_1 = \{v_{11}, v_{12}, \ldots, v_{1m}\}$, $v_2 = \{v_{21}, v_{22}, \ldots, v_{2m}\}$, ..., $v_n = \{v_{n1}, v_{n2}, \ldots, v_{nm}\}$;

Step 3: Normalize $v_1, v_2, \ldots, v_n$ to obtain $w_1 = \{w_{11}, w_{12}, \ldots, w_{1m}\}$, $w_2 = \{w_{21}, w_{22}, \ldots, w_{2m}\}$, ..., $w_n = \{w_{n1}, w_{n2}, \ldots, w_{nm}\}$, respectively;

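Step 4: Compute the $n$ distance matrices $R^1, R^2, \ldots, R^n$ between each source-domain data set and the target-domain data set, where $1 \le k \le n$ and the $(i, j)$ entry of $R^k$ represents the distance between the $i$-th sample of the $k$-th source-domain data set and the $j$-th sample of the target-domain data set;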

Step 5: In each of $R^1, R^2, \ldots, R^n$, use the average distance of row $i$ to define the feature distance between the $i$-th sample of the corresponding source-domain data set and the target domain, obtaining $n$ feature distance vectors; that is, the $i$-th feature distance of the $k$-th source is the mean of row $i$ of $R^k$. For each source, find the $p$ smallest feature distances and select the corresponding $p$ samples from that source-domain data set, i.e. select the $p$ samples whose average distance to the target domain is smallest;

Step 6: For each screened source-domain data set, compute the initial sample weight vector from the feature distances of its selected samples, and normalize the weights;

Step 7: During iterative training of the classifier, use the $n$ feature distance vectors from the $n$ sets of source samples to the target domain as one indicator for updating the source sample weights.
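Steps 2 to 6 can be sketched for a single source domain as follows. This is a minimal illustration under stated assumptions, not the patented formulas, which are not reproduced in the text above. The assumptions are: the covariance matrix is computed on the source and target samples pooled together; eigenvalues are normalized by dividing by their sum and paired with features by position; the sample distance is a weighted Euclidean distance using those normalized eigenvalues; and the initial weight is the normalized inverse of the feature distance. The function names `feature_weights`, `distance_matrix`, and `screen_source` are hypothetical.

```python
import numpy as np


def feature_weights(data):
    """Steps 2-3 (assumed form): covariance eigenvalues normalized to sum to 1."""
    cov = np.atleast_2d(np.cov(data, rowvar=False))  # (m, m) covariance over features
    eigvals = np.linalg.eigvalsh(cov)                # real eigenvalues, ascending order
    eigvals = np.clip(eigvals, 0.0, None)            # guard against tiny negative values
    return eigvals / eigvals.sum()


def distance_matrix(source, target, w):
    """Step 4 (assumed form): R[i, j] is the weighted Euclidean distance
    between source sample i and target sample j."""
    diff = source[:, None, :] - target[None, :, :]   # shape (n_source, n_target, m)
    return np.sqrt((w * diff ** 2).sum(axis=2))


def screen_source(source, target, p):
    """Steps 5-6: keep the p source samples closest to the target on average."""
    # Assumption: the covariance is taken over the pooled source and target samples.
    w = feature_weights(np.vstack([source, target]))
    R = distance_matrix(source, target, w)
    feat_dist = R.mean(axis=1)                       # row means = feature distances
    keep = np.argsort(feat_dist)[:p]                 # indices of the p smallest distances
    # Assumption: closer samples get larger initial weights (inverse distance).
    inv = 1.0 / (feat_dist[keep] + 1e-12)
    return keep, feat_dist[keep], inv / inv.sum()
```

In a multi-source setting, `screen_source` would be called once per source-domain data set, with the target array holding both the labeled and unlabeled target samples (feature columns only).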

Brief description of the drawings

Fig. 1 is a flow chart of multi-source instance transfer learning as improved by the present invention. The shaded part of the figure is the optimization and improvement that the present invention makes to the existing multi-source instance transfer learning method.

Embodiment

The embodiment provided according to the present invention is described in detail below with reference to Fig. 1.

As shown in Fig. 1, the sample screening and weight calculation method for multi-source instance transfer learning is characterized in that the initial sample weights are first calculated, and the screening of source samples is completed, according to the following steps:

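For $1 \le k \le n$, the $(i, j)$ entry of the distance matrix $R^k$ represents the distance between the $i$-th sample of the $k$-th source-domain data set and the $j$-th sample of the target-domain data set;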

Step 7: During iterative training of the classifier, use the $n$ feature distance vectors from the $n$ sets of source samples to the target domain as one indicator for updating the source sample weights.
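The text does not state how the feature distance enters the weight update in Step 7. Purely as an illustration, the sketch below combines a boosting-style penalty for misclassified source samples with a penalty that grows with feature distance. The function name `update_source_weights`, the parameters `beta` and `gamma`, and the exponential form of the distance factor are all assumptions made for this example and are not taken from the patent.

```python
import numpy as np


def update_source_weights(weights, misclassified, feat_dist, beta=0.5, gamma=1.0):
    """One hypothetical iteration of a distance-aware source weight update."""
    weights = np.asarray(weights, dtype=float)
    feat_dist = np.asarray(feat_dist, dtype=float)
    miss = np.asarray(misclassified, dtype=bool)

    # Boosting-style factor (assumption): shrink misclassified source samples by beta.
    error_factor = np.where(miss, beta, 1.0)

    # Distance factor (assumption): rescale distances to [0, 1], penalize far samples.
    span = feat_dist.max() - feat_dist.min()
    if span > 0:
        scaled = (feat_dist - feat_dist.min()) / span
    else:
        scaled = np.zeros_like(feat_dist)
    distance_factor = np.exp(-gamma * scaled)

    new_weights = weights * error_factor * distance_factor
    return new_weights / new_weights.sum()           # renormalize to sum to 1
```

With equal starting weights, a correctly classified sample close to the target keeps the largest share, while samples that are misclassified or far from the target lose weight.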

Claims

1. A sample screening and weight calculation method for multi-source instance transfer learning, characterized by comprising the following steps:

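For $1 \le k \le n$, the $(i, j)$ entry of the distance matrix $R^k$ represents the distance between the $i$-th sample of the $k$-th source-domain data set and the $j$-th sample of the target-domain data set;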