CN103177114B - Cross-domain transfer learning classification method based on a discriminative manifold - Google Patents

Info

Publication number
CN103177114B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310113911.0A
Other languages
Chinese (zh)
Other versions
CN103177114A (en)
Inventor
方正
张仲非
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Zhejiang University ZJU
Priority to CN201310113911.0A
Publication of CN103177114A
Application granted
Publication of CN103177114B
Expired - Fee Related

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a cross-domain transfer learning classification method based on a discriminative manifold, comprising the following steps: input the data of each data domain together with the label data used for training, and build adjacency graphs on the data for spectral graph geometric regularization; combine the optimization objectives over the input data, the label information, and the adjacency graphs into a unified mathematical model; derive the update formulas of the variables from that model, and update the latent factors of each dimension of each data domain, the relational structure shared between domains, and the regression coefficients by alternating iteration until convergence; use the learned parameters to predict class labels for the target-domain data. The invention learns a discriminative data manifold space in which the new representation factors have a highly discriminative structure that favors classification while preserving the original cluster manifold structure of the data.

Description

Cross-domain transfer learning classification method based on a discriminative manifold
Technical field
The invention belongs to the technical field of data processing, and in particular relates to a cross-domain transfer learning classification method based on a discriminative manifold.
Background art
In the information age characterized by massive big data, data of every kind is growing explosively at a geometric rate, and mining its latent value has become a focus of attention and research. The internet, mobile communication, finance, and daily life all continuously produce large amounts of data, and classification is an effective way to mine potentially useful knowledge from it. For example, internet users send and receive large numbers of e-mails every day; helping users sort mail into categories and automatically identify spam requires accurate and effective classification techniques. As another example, effectively classifying and inspecting data streams on network router nodes, so that anomalies and Trojan or virus traffic are found in time, matters greatly for maintaining network security and stability. In finance, monitoring and classifying customer transaction behavior helps identify malicious fraudulent transactions and thereby avoid the heavy economic losses they would cause.
On the other hand, real data mining classification problems often need reliable labeled data as training samples, and obtaining such training data takes substantial manpower, material resources, and time. As a result, the domain under study often has only a small amount of manually classified labeled data available for training a model. If a related, similar data domain has a certain amount of reliable classified data, then transferring knowledge by effectively exploiting the relationship between the data domains makes it possible to model and accurately classify the target-domain data even when training data is scarce. Moreover, on the internet, even if sufficient labeled data exists at one moment, the data evolves over time, so a model trained on earlier data will not necessarily fit later data objects and must be readjusted or retrained, which again demands heavy investment of manpower and time. How to draw on the information and knowledge in training data from an earlier period, and so reduce the cost of retraining, is of vital significance for classification across data domains from different times. Transfer learning, among the most representative of the many existing advanced techniques, is devoted precisely to the question of how to use the labels and useful information of other data domains to assist knowledge mining tasks such as clustering and classification in the target data domain.
Among existing transfer learning text mining algorithms, many researchers have proposed mining latent data representation factors and using the relational structure between the latent factors of the data dimension and the latent factors of the feature dimension as the quantity shared among multiple domains. The relationship between data domains established through this shared latent factor relational structure transfers knowledge between domains to a certain extent, so that when the target domain has only a few training samples, the labeled data of the auxiliary domain can be used for training and classification. However, in most latent factor mining algorithms for transfer learning, the latent factors obtained lack discriminative features that favor accurate classification. Because most latent factors are obtained from a matrix factorization co-clustering framework, the cluster structure of the data is preserved but the discriminative structure of the data is ignored, which loses the ability to further improve class prediction accuracy. In addition, although the transfer learning process exploits and shares the latent connection between the latent factors of each dimension of the target domain and the auxiliary domain, a distribution gap between the domains remains in the latent factors finally learned. In particular, even when the target data domain and the auxiliary data domain have the same classification decision function, the classifier may classify auxiliary-domain data accurately yet still fail to reach the desired performance in the target domain because of the shift in data distribution between domains.
In view of the shortcomings of existing transfer learning classification methods based on latent factor mining, the transfer learning classification method proposed by the present invention mines the discriminative structure that favors classification while maintaining the good cluster structure of the data, and, by adjusting the Maximum Mean Discrepancy (MMD) distance between data domains, greatly reduces the inter-domain deviation of the latent factors finally obtained. The problem of transfer learning classification across data domains is thereby addressed effectively. Compared with existing transfer learning classification methods based on latent factor mining, the proposed classifier is greatly improved in accuracy and stability.
Summary of the invention
To solve the above problems, the object of the present invention is to provide a cross-domain transfer learning classification method based on a discriminative manifold. While performing cross-domain transfer learning classification, the method unifies joint matrix factorization under certain constraints with a discriminative regression model, and learns a discriminative data manifold space. The new representation factors of the data in this manifold space have a highly discriminative structure that favors classification and also preserve the original cluster manifold structure of the data. By minimizing the inter-domain data distribution distance MMD (Maximum Mean Discrepancy), the inter-domain difference of the latent factors learned for the different data domains is greatly reduced, which further improves the accuracy and stability of the cross-domain transfer learning classifier.
To achieve the above object, the technical solution of the present invention is:
A cross-domain transfer learning classification method based on a discriminative manifold, comprising the following steps:
S10, input the data of each data domain and the label data used for training, and build adjacency graphs on the data for spectral graph geometric regularization;
S20, for the input data, the label information, and the adjacency graphs built, combine the optimization objectives (the cross-domain joint matrix factorization model, the discriminative regression model, the cross-domain distance adjustment, and the manifold geometric regularization) into a unified mathematical model;
S30, from the mathematical model built, derive the update formulas of the variables, and update the latent factors of each dimension of each data domain, the relational structure shared between domains, and the regression coefficients by alternating iteration until convergence;
S40, using the parameters obtained, predict class labels for the target-domain data, obtaining the predicted class labels of the target-domain data.
Preferably, S10 specifically comprises the following steps:
S101, input the training sample data of the auxiliary data domain $D_s$ and the target data domain $D_t$, including the labeled data $X_s$ of the auxiliary data domain with its label information matrix $Y_s$, and the data $X_t$ of the target domain; when the target domain has a small amount of labeled data, also input the label indicator matrix $P_t$, which indicates which target-domain data are labeled, together with the label information $Y_t$ of the target-domain data; the set $I = \{s, t\}$ denotes the subscripts of the data domains, and when the domain referred to is $\pi \in I$, the other domain is referred to by the complementary index;
S102, use the input data to build, for the auxiliary domain, the adjacency graph $W_s^v$ of the data dimension and the adjacency graph $W_s^u$ of the feature dimension; the edge weights between the nodes of the adjacency graphs are respectively as follows:
[Equation image: edge weights of $W_s^v$]
[Equation image: edge weights of $W_s^u$]
where $N_p(x)$ denotes the $p$-nearest neighborhood of data point $x$, with $p = 5$;
Build the data-dimension adjacency graph $W_t^v$ and the feature-dimension adjacency graph $W_t^u$ of the target domain; the edge weights between the nodes of the adjacency graphs are respectively as follows:
[Equation image: edge weights of $W_t^v$]
[Equation image: edge weights of $W_t^u$]
where $N_p(x)$ denotes the $p$-nearest neighborhood of data point $x$, with $p = 5$.
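The graph construction of S102 can be sketched as follows. This is a minimal illustration only: the edge-weight formulas of the patent are not reproduced in this text, so binary weights, Euclidean distance, the symmetrization rule, and the function name `knn_adjacency` are all assumptions made for the example.

```python
import numpy as np

def knn_adjacency(X, p=5):
    """Symmetric binary p-nearest-neighbor graph over the columns of X.

    Assumption: binary weights and Euclidean distance; the patent's own
    edge-weight formulas are not reproduced in the text.
    """
    n = X.shape[1]
    # Pairwise squared Euclidean distances between columns.
    sq = np.sum(X * X, axis=0)
    dist = sq[:, None] + sq[None, :] - 2.0 * (X.T @ X)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    W = np.zeros((n, n))
    k = min(p, n - 1)
    nearest = np.argsort(dist, axis=1)[:, :k]
    rows = np.repeat(np.arange(n), k)
    W[rows, nearest.ravel()] = 1.0
    # Connect i and j if either is among the other's p nearest neighbors.
    return np.maximum(W, W.T)

rng = np.random.default_rng(0)
X_s = rng.random((20, 12))         # 20 features x 12 data points (toy data)
W_v = knn_adjacency(X_s, p=5)      # data-dimension graph: neighbors among columns
W_u = knn_adjacency(X_s.T, p=5)    # feature-dimension graph: neighbors among rows
```

Calling the same function on the matrix and on its transpose yields the two graphs per domain that S102 asks for; the target domain would be handled identically.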
Preferably, S20 specifically comprises the following steps:
S201, build the cross-domain joint matrix factorization model:
$$\min_{U_\pi, H, V_\pi \ge 0} \sum_{\pi \in I} \lVert X_\pi - U_\pi H V_\pi \rVert^2$$
The matrix factorization model decomposes the data of the target and auxiliary data domains simultaneously into low-dimensional data representations while retaining the knowledge structure common to the two domains, where $U_\pi$ represents the low-dimensional cluster structure of the features of the $\pi$-th data domain $D_\pi$, and $k_m$ is the number of feature clusters; $V_\pi$ represents the low-dimensional cluster structure of the data of $D_\pi$ and is at the same time the low-dimensional latent representation factor of the data, and $k_n$ is the number of data clusters; $H$ represents the relational structure between feature clusters and data clusters in $D_\pi$, and the target and auxiliary data domains share this stable relational structure;
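S202, incorporate the discriminative regression model, imposing a supervised constraint on the low-dimensional latent representation factors of the data:
$$\min_{V_\pi, U_\pi, H, A} \sum_{\pi \in I} \left( \lVert X_\pi - U_\pi H V_\pi \rVert^2 + \beta \lVert Y_\pi P_\pi - A V_\pi P_\pi \rVert^2 \right) + \alpha \lVert A \rVert^2$$
where $A$ is the regression coefficient matrix acting on the data latent factors, and the label indicator matrix $P_\pi$ is a diagonal matrix: a nonzero diagonal entry $P^\pi_{ii}$ indicates that the $i$-th element of the $\pi$-th data domain is used in the supervised discriminative regression constraint, and otherwise $P^\pi_{ii} = 0$;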
S203, reduce the difference between the target and auxiliary data domains by introducing an adjustment of the Maximum Mean Discrepancy (MMD) distance;
The inter-domain difference distance on the data dimension is defined as follows:
The inter-domain difference distance on the feature dimension is defined as follows:
To reduce the difference between the target and auxiliary data domains, the data latent representation factors and the feature low-dimensional cluster structure representation factors should make the inter-domain difference distance on their respective dimensions as small as possible; the two distance functions are therefore fused, as regularization terms to be minimized, into the model obtained in step S202, giving:
S204, preserve the low-dimensional manifold structure of the data: according to spectral graph theory, use the data-dimension adjacency graph of the auxiliary domain obtained in step S102 to build a measure of the smoothness, along geodesics of the low-dimensional manifold space, of the data mapping function:
where $D_s^v = \mathrm{diag}\left(\sum_i (W_s^v)_{ij}\right)$;
Use the feature-dimension adjacency graph of the auxiliary domain obtained in step S102 to build a measure of the smoothness, along geodesics of the low-dimensional manifold space, of the feature mapping function:
where $D_s^u = \mathrm{diag}\left(\sum_i (W_s^u)_{ij}\right)$;
Similarly, use the data-dimension adjacency graph of the target domain obtained in step S102 to build, on the data dimension of the target domain, a measure of the smoothness, along geodesics of the low-dimensional manifold space, of the data mapping function:
where $D_t^v = \mathrm{diag}\left(\sum_i (W_t^v)_{ij}\right)$;
Use the feature-dimension adjacency graph of the target domain obtained in step S102 to build, on the feature dimension, a measure of the smoothness, along geodesics of the low-dimensional manifold space, of the feature mapping function:
where $D_t^u = \mathrm{diag}\left(\sum_i (W_t^u)_{ij}\right)$;
S205, build the cross-domain transfer learning classification model based on the discriminative manifold as follows:
s.t. $V_s, V_t, U_s, U_t, H \ge 0$
Preferably, the alternating iteration in S30 specifically comprises the following steps:
S301, update the auxiliary-domain data latent factor $V_s$:
where $B_s = A^T Y_s P_s P_s^T$, $B_s^+ = (|B_s| + B_s)/2$, $B_s^- = (|B_s| - B_s)/2$, $E_s = A^T A V_s P_s P_s^T$, $R = A^T A$, $R^+ = (|R| + R)/2$, $R^- = (|R| - R)/2$;
S302, update the target-domain data latent factor $V_t$:
where $B_t = A^T Y_t P_t P_t^T$, $B_t^+ = (|B_t| + B_t)/2$, $B_t^- = (|B_t| - B_t)/2$, $E_t = A^T A V_t P_t P_t^T$, $R = A^T A$, $R^+ = (|R| + R)/2$, $R^- = (|R| - R)/2$;
S303, update the auxiliary-domain feature-dimension low-dimensional factor $U_s$:
S304, update the target-domain feature-dimension low-dimensional factor $U_t$:
S305, update the structure shared between the auxiliary and target domains, namely the relational structure $H$ between the latent factors of the data dimension and the latent factors of the feature dimension; the update formula is as follows:
where
S306, update the regression coefficient $A$:
where $\gamma = \alpha / \beta$.
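The quantities $B^+$, $B^-$, $R^+$, and $R^-$ defined in S301 and S302 are the elementwise positive and negative parts of a matrix. Splits of this kind are commonly used so that multiplicative update rules contain only nonnegative terms and the factors stay nonnegative; the update formulas themselves are not reproduced in this text, so the sketch below shows only the split, with a hypothetical helper name `split_parts`.

```python
import numpy as np

def split_parts(M):
    """Return (M_plus, M_minus) with M = M_plus - M_minus, both nonnegative."""
    return (np.abs(M) + M) / 2.0, (np.abs(M) - M) / 2.0

A = np.array([[1.0, -2.0], [0.5, 3.0]])   # toy regression coefficients
R = A.T @ A                                # R = A^T A as in S301
R_plus, R_minus = split_parts(R)
```

Here `R_plus` keeps the nonnegative entries of `R` and `R_minus` holds the magnitudes of the negative ones, so `R_plus - R_minus` recovers `R`.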
Preferably, S40 further comprises the following steps:
S401, use the regression coefficient $A$ obtained and the target-domain document latent factor $V_t$ to predict class labels for the documents of the target domain, obtaining the predicted class labels $\tilde{Y}_t$ of the target-domain news documents:
$$\tilde{Y}_t = A V_t;$$
S402, determine the class of each data item from the index of the largest element in the corresponding column of $\tilde{Y}_t$.
Compared with the prior art, the beneficial effects of the present invention are as follows:
(1) The classifier of the embodiment introduces a discriminative regression model into the latent factor mining algorithm of transfer learning, so that the learned data latent factors have a discriminative structure that favors classification, which improves the discriminative power and classification accuracy of the classifier;
(2) While mining the potentially useful structure of the data, the embodiment minimizes the inter-domain difference distance (Maximum Mean Discrepancy, MMD), so that the inter-domain difference of the learned latent factors is minimized, reducing the discrepancy between domains caused by data distribution drift; by sharing between domains the relationship matrix of the cluster structures of the feature dimension and the data dimension, it further overcomes a major difficulty of traditional transfer learning algorithms;
(3) While jointly factorizing the data of the auxiliary and target domains, the embodiment uses spectral graph geometric regularization so that the subspace of the mined latent factors retains the manifold structure of the data; the learned latent factors have a discriminative structure for classification and also retain the cluster structure of the original data, which improves the noise resistance and robustness of the classifier;
(4) The embodiment proposes the cross-domain Transfer Learning Classifier on Discriminative Manifold (TLCDM), together with an effective iterative parameter update method for training the classifier.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the cross-domain transfer learning classification method based on a discriminative manifold according to the embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and are not intended to limit it.
On the contrary, the invention covers any substitutions, modifications, equivalent methods, and schemes made within the spirit and scope of the invention as defined by the claims. Further, to give the public a better understanding of the invention, some specific details are described at length in the detailed description below. A person skilled in the art can fully understand the invention even without the description of these details.
The embodiment of the invention proposes a cross-domain Transfer Learning Classifier on Discriminative Manifold (TLCDM). It is described using news text as the input data and topic classification of news as the example. The classification method of the embodiment can of course also be applied to other kinds of cross-domain data classification problems: for example, the target domain may be video data and the auxiliary domain internet image data, for video classification; or the target and auxiliary domains may be the e-mail data of different users, for spam classification.
Referring to Fig. 1, which shows the flow chart of the steps of the cross-domain transfer learning classification method based on a discriminative manifold according to the embodiment of the present invention, the method comprises the following steps:
S10, input the data of each data domain and the class label data used for training, and build adjacency graphs on the data for spectral graph geometric regularization. This specifically comprises steps S101 to S102:
S101, input the training sample data of the auxiliary data domain $D_s$ and the target data domain $D_t$, including the labeled data $X_s$ of the auxiliary data domain with its label information matrix $Y_s$, and the data $X_t$ of the target domain; when the target domain has a small amount of class-labeled data, also input the class label indicator matrix $P_t$, which indicates which target-domain data are labeled, together with the class label information $Y_t$ of the target-domain data.
S102, for news data, the data dimension is each news document and the feature dimension is the text words in the news; build the document adjacency graph $W_s^v$ and the text word adjacency graph $W_s^u$ of the auxiliary domain; the edge weights between the nodes of the adjacency graphs are respectively as follows:
[Equation image: edge weights of $W_s^v$]
[Equation image: edge weights of $W_s^u$]
where $N_p(x)$ denotes the $p$-nearest neighborhood of object $x$, with $p = 5$ here.
Build the document adjacency graph $W_t^v$ and the text word adjacency graph $W_t^u$ of the target domain; the edge weights between the nodes of the adjacency graphs are respectively as follows:
[Equation image: edge weights of $W_t^v$]
[Equation image: edge weights of $W_t^u$]
where $N_p(x)$ denotes the $p$-nearest neighborhood of object $x$, with $p = 5$ here.
S20, for the input data, the label information, and the adjacency graphs built, combine the optimization objectives (the cross-domain joint matrix factorization model, the discriminative regression model, the cross-domain distance adjustment, and the manifold geometric regularization) into a unified mathematical model. This specifically comprises steps S201 to S205:
S201, build the cross-domain joint matrix factorization model:
$$\min_{U_\pi, H, V_\pi \ge 0} \sum_{\pi \in I} \lVert X_\pi - U_\pi H V_\pi \rVert^2$$
For conciseness of expression and convenience of discussion and modeling, the set $I = \{s, t\}$ denotes the subscripts of the data domains; when the domain referred to is $\pi \in I$, the other domain is referred to by the complementary index.
This matrix factorization model decomposes the documents and text words of the target and auxiliary data domains simultaneously into low-dimensional data representations while retaining the knowledge structure common to the two domains. Here $U_\pi$ represents the low-dimensional cluster structure of the text words of the $\pi$-th data domain, and $k_m$ is the number of text word clusters; $V_\pi$ represents the low-dimensional cluster structure of the documents of the $\pi$-th data domain and is at the same time the low-dimensional latent representation factor of the documents, and $k_n$ is the number of document clusters; $H$ represents the relational structure between text word clusters and document clusters in the $\pi$-th data domain. Empirical evidence shows that the target and auxiliary data domains share this stable relational structure.
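The objective of S201 can be evaluated directly, which makes the role of the shared matrix $H$ concrete. The sketch below is illustrative only: the matrix shapes (words by documents for $X_\pi$, so $U_\pi$ is $m \times k_m$, $H$ is $k_m \times k_n$, and $V_\pi$ is $k_n \times n_\pi$) are an assumption chosen to be consistent with $\tilde{Y}_t = A V_t$, and the toy sizes and the name `joint_factorization_loss` are hypothetical.

```python
import numpy as np

def joint_factorization_loss(X, U, H, V):
    """Sum over domains of ||X_pi - U_pi H V_pi||_F^2 with a shared H."""
    return sum(
        np.linalg.norm(X[d] - U[d] @ H @ V[d], "fro") ** 2 for d in X
    )

rng = np.random.default_rng(0)
m, k_m, k_n = 30, 4, 3           # words, word clusters, document clusters
n = {"s": 25, "t": 15}           # documents per domain (toy sizes)
H = rng.random((k_m, k_n))       # shared word-cluster/document-cluster relation
U = {d: rng.random((m, k_m)) for d in n}
V = {d: rng.random((k_n, n[d])) for d in n}
X = {d: U[d] @ H @ V[d] for d in n}   # exactly factorizable toy data

loss_exact = joint_factorization_loss(X, U, H, V)
loss_perturbed = joint_factorization_loss(X, U, H + 0.1, V)
```

Because both domains are reconstructed through the same `H`, perturbing it raises the loss in the auxiliary and the target domain at once, which is the coupling that carries knowledge between them.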
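S202, incorporate the discriminative regression model, imposing a supervised constraint on the low-dimensional latent representation factors of the documents:
$$\min_{V_\pi, U_\pi, H, A} \sum_{\pi \in I} \left( \lVert X_\pi - U_\pi H V_\pi \rVert^2 + \beta \lVert Y_\pi P_\pi - A V_\pi P_\pi \rVert^2 \right) + \alpha \lVert A \rVert^2$$
where $A$ is the regression coefficient matrix acting on the data latent factors, and the class indicator matrix $P_\pi$ is a diagonal matrix: a nonzero diagonal entry $P^\pi_{ii}$ indicates that the $i$-th element of the $\pi$-th data domain is used in the supervised discriminative regression constraint, and otherwise $P^\pi_{ii} = 0$.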
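The effect of the diagonal indicator matrix in S202 can be checked numerically: columns whose diagonal entry is zero contribute nothing to the regression term, so unlabeled target documents do not affect it. The sketch assumes one-hot label columns and indicator entries of 1 for labeled items; these choices, the toy sizes, and the name `regression_penalty` are illustrative assumptions.

```python
import numpy as np

def regression_penalty(Y, P, A, V, alpha, beta):
    """beta * sum_pi ||Y_pi P_pi - A V_pi P_pi||_F^2 + alpha * ||A||_F^2."""
    fit = sum(
        np.linalg.norm((Y[d] - A @ V[d]) @ P[d], "fro") ** 2 for d in Y
    )
    return beta * fit + alpha * np.linalg.norm(A, "fro") ** 2

rng = np.random.default_rng(1)
c, k_n = 3, 4                                   # classes, document clusters
V = {"s": rng.random((k_n, 6)), "t": rng.random((k_n, 5))}
Y = {"s": np.eye(c)[:, rng.integers(0, c, 6)],  # one-hot labels, fully labeled
     "t": np.zeros((c, 5))}
Y["t"][0, 0] = 1.0                              # only target document 0 is labeled
P = {"s": np.eye(6), "t": np.diag([1.0, 0, 0, 0, 0])}
A = rng.random((c, k_n))

base = regression_penalty(Y, P, A, V, alpha=0.1, beta=1.0)
Y["t"][:, 1:] = rng.random((c, 4))              # change unlabeled columns only
masked = regression_penalty(Y, P, A, V, alpha=0.1, beta=1.0)
```

`base` and `masked` agree, since the entries of `Y["t"]` that were changed belong to documents the indicator matrix marks as unlabeled.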
S203, reduce the difference between the target and auxiliary data domains by introducing an adjustment of the Maximum Mean Discrepancy (MMD) distance.
The inter-domain difference distance on the data dimension is defined as follows:
The inter-domain difference distance on the feature dimension is defined as follows:
To reduce the difference between the target and auxiliary data domains, the inter-domain difference distance defined on the document latent factors should be as small as possible, and so should the inter-domain difference distance defined on the low-dimensional representation factors of the text words. The two distance functions are therefore fused, as regularization terms to be minimized, into the model obtained in step S202, giving:
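The idea behind the distance adjustment of S203 can be illustrated with the simplest empirical form of MMD, the squared distance between the domain means of the latent factors. The patent's own distance definitions are not reproduced in this text, so the linear-kernel form below, like the name `mmd_linear`, is an assumption and may differ from the definitions actually used.

```python
import numpy as np

def mmd_linear(V_s, V_t):
    """Squared distance between the mean latent columns of two domains."""
    gap = V_s.mean(axis=1) - V_t.mean(axis=1)
    return float(gap @ gap)

rng = np.random.default_rng(2)
V_s = rng.random((3, 40))            # k_n x n_s document latent factors
V_t_near = rng.random((3, 30))       # same distribution as V_s
V_t_far = V_t_near + 2.0             # shifted distribution

d_near = mmd_linear(V_s, V_t_near)
d_far = mmd_linear(V_s, V_t_far)
```

Adding such a term to the objective penalizes latent factors whose average differs between domains, so `d_far` is penalized much more heavily than `d_near`.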
S204, preserve the low-dimensional manifold structure of the data. According to spectral graph theory, use the document-dimension adjacency graph of the auxiliary domain obtained in step S102 to build a measure of the smoothness, along geodesics of the low-dimensional manifold space, of the function that maps documents:
where $D_s^v = \mathrm{diag}\left(\sum_i (W_s^v)_{ij}\right)$.
Use the text-word-dimension adjacency graph of the auxiliary domain obtained in step S102 to build a measure of the smoothness, along geodesics of the low-dimensional manifold space, of the function that maps text words:
where $D_s^u = \mathrm{diag}\left(\sum_i (W_s^u)_{ij}\right)$.
Similarly, use the document-dimension adjacency graph of the target domain obtained in step S102 to build, on the document dimension of the target domain, a measure of the smoothness, along geodesics of the low-dimensional manifold space, of the function that maps documents:
where $D_t^v = \mathrm{diag}\left(\sum_i (W_t^v)_{ij}\right)$.
Use the text-word-dimension adjacency graph of the target domain obtained in step S102 to build, on the text word dimension, a measure of the smoothness, along geodesics of the low-dimensional manifold space, of the function that maps text words:
where $D_t^u = \mathrm{diag}\left(\sum_i (W_t^u)_{ij}\right)$.
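A smoothness measure built from an adjacency graph $W$ and its degree matrix $D$ usually takes the graph Laplacian form $\mathrm{tr}(V L V^T)$ with $L = D - W$. The text above gives only the definition of $D$, so treating the measure of S204 as this standard regularizer is an assumption; the function name `manifold_smoothness` and the toy graph are hypothetical.

```python
import numpy as np

def manifold_smoothness(V, W):
    """tr(V L V^T) with L = D - W, for latent columns V and symmetric graph W."""
    D = np.diag(W.sum(axis=0))       # degree matrix, D_jj = sum_i W_ij
    L = D - W
    return float(np.trace(V @ L @ V.T))

# Toy graph: a 4-node path 0-1-2-3.
W = np.zeros((4, 4))
for i in range(3):
    W[i, i + 1] = W[i + 1, i] = 1.0

V_smooth = np.array([[1.0, 1.0, 1.0, 1.0]])   # constant over the graph
V_rough = np.array([[1.0, 0.0, 1.0, 0.0]])    # alternates across every edge

s_smooth = manifold_smoothness(V_smooth, W)
s_rough = manifold_smoothness(V_rough, W)
```

For a symmetric graph this quantity equals half the weighted sum of squared differences between neighboring latent columns, so a representation that varies little between neighbors scores low (`s_smooth`) and one that changes across every edge scores high (`s_rough`).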
S205, build the cross-domain transfer learning classification model based on the discriminative manifold.
So that the data in the target and auxiliary domains keep their original structure in the manifold space of each dimension (in particular the spatial smoothness of the data), the smoothness measures of the functions on each dimension of the target and auxiliary domains are fused into the unified mathematical model as regularization constraints on the matrix factorization model. Taking into account also the nonnegativity of the low-dimensional representation factors of each dimension and of the relational structure matrix, the following cross-domain transfer learning classification model based on the discriminative manifold is finally obtained:
s.t. $V_s, V_t, U_s, U_t, H \ge 0$
The joint matrix factorization model above mines the latent factors, the discriminative regression model improves their discriminative power, the cross-domain distance adjustment reduces the distribution difference of the latent factors between data domains, and the manifold geometric regularization preserves the local cluster structure of the original data. The learned latent factors thus have a discriminative structure for classification while retaining the cluster structure of the original data, which improves the noise resistance and robustness of the classifier.
S30, from the mathematical model built in S20, derive the update formulas of the variables, and update the latent factors on the document and text word dimensions of each data domain, the relational structure shared between domains, and the regression coefficients by alternating iteration until convergence. Each iteration specifically comprises steps S301 to S306:
S301, update the auxiliary-domain document latent factor $V_s$:
where $B_s = A^T Y_s P_s P_s^T$, $B_s^+ = (|B_s| + B_s)/2$, $B_s^- = (|B_s| - B_s)/2$, $E_s = A^T A V_s P_s P_s^T$, $R = A^T A$, $R^+ = (|R| + R)/2$, $R^- = (|R| - R)/2$;
S302, update the target-domain document latent factor $V_t$:
where $B_t = A^T Y_t P_t P_t^T$, $B_t^+ = (|B_t| + B_t)/2$, $B_t^- = (|B_t| - B_t)/2$, $E_t = A^T A V_t P_t P_t^T$, $R = A^T A$, $R^+ = (|R| + R)/2$, $R^- = (|R| - R)/2$;
S303, update the auxiliary-domain text word low-dimensional representation factor $U_s$:
S304, update the target-domain text word low-dimensional representation factor $U_t$:
S305, update the structure factor shared between the auxiliary and target domains, namely the relationship factor $H$ between the cluster structure of the documents and the cluster structure of the text words. The update formula is as follows:
where
S306, update the regression coefficient $A$:
where $\gamma = \alpha / \beta$.
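The update formula of S306 is not reproduced in this text. With all other variables held fixed, however, the terms of the S202 objective that involve $A$ form a ridge regression problem, whose closed-form minimizer involves exactly the ratio $\gamma = \alpha / \beta$. The sketch below implements that closed form as an assumption about what the update computes; the function name `update_A` and the toy data are hypothetical.

```python
import numpy as np

def update_A(Y, P, V, alpha, beta):
    """Closed-form minimizer of beta*sum||(Y - A V) P||^2 + alpha*||A||^2."""
    gamma = alpha / beta
    k = next(iter(V.values())).shape[0]
    G = gamma * np.eye(k)                 # accumulates V P P^T V^T + gamma I
    C = 0.0                               # accumulates Y P P^T V^T
    for d in V:
        PPt = P[d] @ P[d].T
        G = G + V[d] @ PPt @ V[d].T
        C = C + Y[d] @ PPt @ V[d].T
    # Solve A G = C; G is symmetric positive definite for gamma > 0.
    return np.linalg.solve(G, C.T).T

rng = np.random.default_rng(3)
V = {"s": rng.random((4, 8)), "t": rng.random((4, 6))}
Y = {"s": rng.random((3, 8)), "t": rng.random((3, 6))}
P = {"s": np.eye(8), "t": np.diag([1.0, 1.0, 0, 0, 0, 0])}
A = update_A(Y, P, V, alpha=0.5, beta=2.0)
```

Setting the gradient of the $A$-dependent terms to zero gives $A \left( \sum_\pi V_\pi P_\pi P_\pi^T V_\pi^T + \gamma I \right) = \sum_\pi Y_\pi P_\pi P_\pi^T V_\pi^T$, which is the linear system solved above.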
S40, using the parameters obtained, predict class labels for the target-domain data, obtaining the predicted class labels of the target-domain data.
This specifically comprises:
S401, use the regression coefficient $A$ obtained in S30 and the target-domain document latent factor $V_t$ to predict class labels for the documents of the target domain, obtaining the predicted class labels $\tilde{Y}_t$ of the target-domain news documents:
$$\tilde{Y}_t = A V_t.$$
S402, determine the class of each data item from the index of the largest element in the corresponding column of $\tilde{Y}_t$.
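Steps S401 and S402 amount to a matrix product followed by a per-column argmax. A minimal sketch, assuming $A$ has one row per class and $V_t$ one column per target document; the toy values and the name `predict_labels` are illustrative only.

```python
import numpy as np

def predict_labels(A, V_t):
    """Class index per target document: argmax over each column of A V_t."""
    scores = A @ V_t                  # c x n_t matrix of class scores
    return np.argmax(scores, axis=0)

A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])            # 3 classes x 2 latent dimensions
V_t = np.array([[0.9, 0.1, 0.5],
                [0.1, 0.8, 0.6]])     # 2 latent dimensions x 3 documents
labels = predict_labels(A, V_t)
```

Each entry of `labels` is the row index of the largest score in the corresponding column, that is, the predicted class of that target document.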
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it; any modifications, equivalent replacements, and improvements made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (2)

1. A cross-domain transfer learning classification method based on a discriminative manifold, characterized by comprising the following steps:
S10, input the data of each data domain and the label data used for training, and build adjacency graphs on the data for spectral graph geometric regularization;
S20, for the input data, the label information, and the adjacency graphs built, combine the optimization objectives to build a unified mathematical model, the optimization objectives comprising the cross-domain joint matrix factorization model, the discriminative regression model, the cross-domain distance adjustment, and the manifold geometric regularization;
S30, from the mathematical model built, derive the update formulas of the variables, and update the latent factors of each dimension of each data domain, the relational structure shared between domains, and the regression coefficients by alternating iteration until convergence;
S40, using the parameters obtained, predict class labels for the target-domain data, obtaining the predicted class labels of the target-domain data;
wherein S10 specifically comprises the following steps:
S101, input the training sample data of the auxiliary data domain $D_s$ and the target data domain $D_t$, including the labeled data $X_s$ of the auxiliary data domain with its label information matrix $Y_s$, and the data $X_t$ of the target domain; when the target domain has a small amount of labeled data, also input the label indicator matrix $P_t$, which indicates which target-domain data are labeled, together with the label information $Y_t$ of the target-domain data; the set $I = \{s, t\}$ denotes the subscripts of the data domains, and when the domain referred to is $\pi \in I$, the other domain is referred to by the complementary index;
S102, utilizes the data of input to build the adjacent map of the data dimension in auxiliary territory respectively with the adjacent map of characteristic dimension limit weight between the point of adjacent map is as follows respectively:
Figure
Figure
where $N_p(x)$ denotes the p-nearest-neighbor set of data point x, with p = 5;
Build the data-dimension adjacency graph $W_t^v$ and the feature-dimension adjacency graph $W_t^u$ of the target domain; the edge weights between the points of the two graphs are respectively as follows:
Figure
Figure
where $N_p(x)$ denotes the p-nearest-neighbor set of data point x, with p = 5;
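As a rough illustration of step S102, the following Python sketch builds a symmetric p-nearest-neighbor graph with p = 5. The edge-weight formulas are not reproduced in this text, so the cosine-similarity weighting used here is an assumption, as are the helper name `knn_adjacency` and the toy data. Treating columns as data points and rows as features follows the shapes implied by the later formulas.

```python
import numpy as np

def knn_adjacency(X, p=5):
    """Symmetric p-nearest-neighbor graph over the columns of X.

    Assumption (not taken from the patent text): an edge carries the cosine
    similarity of its endpoints if either one is among the other's p nearest
    neighbors, and weight 0 otherwise.
    """
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    S = Xn.T @ Xn                          # cosine similarity between columns
    np.fill_diagonal(S, -np.inf)           # a point is not its own neighbor
    n = S.shape[0]
    nbrs = np.argsort(-S, axis=1)[:, :p]   # the p most similar points per row
    mask = np.zeros((n, n), dtype=bool)
    mask[np.repeat(np.arange(n), nbrs.shape[1]), nbrs.ravel()] = True
    mask |= mask.T                         # x_j in N_p(x_i) or x_i in N_p(x_j)
    np.fill_diagonal(S, 0.0)
    return np.where(mask, S, 0.0)

rng = np.random.default_rng(0)
X_s = rng.random((20, 12))                 # toy data: 20 features x 12 samples
W_v = knn_adjacency(X_s, p=5)              # data-dimension graph (over samples)
W_u = knn_adjacency(X_s.T, p=5)            # feature-dimension graph (over features)
```

The same helper is applied twice per domain: once to the data matrix for the data-dimension graph, and once to its transpose for the feature-dimension graph.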
S20 specifically comprises the following steps:
S201, set up the cross-domain joint matrix factorization model:
$$\min_{U_\pi, H, V_\pi \ge 0} \sum_{\pi \in I} \| X_\pi - U_\pi H V_\pi \|^2$$
The matrix factorization model simultaneously decomposes the data of the target data domain and the auxiliary data domain into low-dimensional data representations while retaining the knowledge structure common to the two data domains, wherein $U_\pi$ represents the low-dimensional cluster structure of the features of the π-th data domain $D_\pi$, and $k_m$ is the number of clusters in the feature dimension; $V_\pi$ represents the low-dimensional cluster structure of the data of the π-th data domain $D_\pi$ and is at the same time the low-dimensional latent representation factor of the data, and $k_n$ is the number of data clusters; $H$ represents the relational structure between the feature clusters and the data clusters, and the target data domain and the auxiliary data domain share this stable relational structure;
S202, merge in the discriminative regression model, imposing a supervised constraint on the low-dimensional latent representation factors of the data:
$$\min_{V_\pi, U_\pi, H, A} \sum_{\pi \in I} \left( \| X_\pi - U_\pi H V_\pi \|^2 + \beta \| Y_\pi P_\pi - A V_\pi P_\pi \|^2 \right) + \alpha \| A \|^2$$
where $A$ is the regression coefficient acting on the data latent factors, and the label indicator matrix $P_\pi$ is a diagonal matrix: $P^\pi_{ii} = 1$ indicates that the i-th element of the π-th data domain $D_\pi$ is used in the supervised discriminative regression constraint, and otherwise $P^\pi_{ii} = 0$;
S203, to reduce the difference between the target data domain and the auxiliary data domain, introduce Maximum Mean Discrepancy (MMD) distance regularization;
The inter-domain distance on the data dimension is defined as follows:
$$\mathrm{Dist}_v(D_s, D_t) = \left\| \frac{1}{n_s} \sum_{i=1}^{n_s} v^s_{\cdot i} - \frac{1}{n_t} \sum_{j=1}^{n_t} v^t_{\cdot j} \right\|^2 ;$$
The inter-domain distance on the feature dimension is defined as follows:
$$\mathrm{Dist}_u(D_s, D_t) = \left\| \frac{1}{m_s} \sum_{i=1}^{m_s} u^s_{i \cdot} - \frac{1}{m_t} \sum_{j=1}^{m_t} u^t_{j \cdot} \right\|^2 ;$$
To reduce the difference between the target data domain and the auxiliary data domain, the learned data latent representation factors and the feature low-dimensional cluster-structure factors should make the inter-domain distance in their respective dimensions as small as possible; these two distance functions are therefore merged, as regularization terms to be minimized, into the model obtained in step S202, giving:
$$\min_{V_s, V_t, U_s, U_t, H, A} \sum_{\pi \in I} \left( \| X_\pi - U_\pi H V_\pi \|^2 + \beta \| Y_\pi P_\pi - A V_\pi P_\pi \|^2 \right) + \alpha \| A \|^2 + \left\| \frac{1}{m_s} \mathbf{1}_{m_s}^T U_s - \frac{1}{m_t} \mathbf{1}_{m_t}^T U_t \right\|^2 + \left\| \frac{1}{n_s} V_s \mathbf{1}_{n_s} - \frac{1}{n_t} V_t \mathbf{1}_{n_t} \right\|^2$$
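The two MMD terms added in S203 are squared distances between domain means, which a few lines of NumPy can make concrete. This is a minimal sketch, assuming that $U_\pi$ has one row per feature, that $V_\pi$ has one column per data point (shapes inferred from the formula above), and that the norm is Euclidean. `mmd_terms` is a hypothetical helper name.

```python
import numpy as np

def mmd_terms(U_s, U_t, V_s, V_t):
    """Return (Dist_u, Dist_v) in the matrix form used in the objective."""
    m_s, m_t = U_s.shape[0], U_t.shape[0]   # number of features per domain
    n_s, n_t = V_s.shape[1], V_t.shape[1]   # number of data points per domain
    # (1/m) 1^T U is the mean row of U; (1/n) V 1 is the mean column of V.
    mean_u_gap = np.ones(m_s) @ U_s / m_s - np.ones(m_t) @ U_t / m_t
    mean_v_gap = V_s @ np.ones(n_s) / n_s - V_t @ np.ones(n_t) / n_t
    return np.sum(mean_u_gap ** 2), np.sum(mean_v_gap ** 2)
```

Both terms vanish when the two domains have identical mean latent factors, which is the alignment the regularizer rewards.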
S204, to preserve the low-dimensional manifold structure of the data, according to spectral graph theory, use the data-dimension adjacency graph of the auxiliary domain obtained in step S102 to set up a measure of the smoothness, along geodesics in the low-dimensional manifold space, of the data mapping function:
$$R_s^v = \frac{1}{2} \sum_{ij} \| v^s_{\cdot i} - v^s_{\cdot j} \|^2 (W_s^v)_{ij} = \sum_i \mathrm{tr}\left( v^s_{\cdot i} (v^s_{\cdot i})^T \right) (D_s^v)_{ii} - \sum_{ij} \mathrm{tr}\left( v^s_{\cdot i} (v^s_{\cdot j})^T \right) (W_s^v)_{ij} = \mathrm{tr}\left( V_s (D_s^v - W_s^v) V_s^T \right)$$
where $D_s^v = \mathrm{diag}\left( \sum_i (W_s^v)_{ij} \right)$;
Use the feature-dimension adjacency graph of the auxiliary domain obtained in step S102 to set up a measure of the smoothness, along geodesics in the low-dimensional manifold space, of the feature mapping function:
$$R_s^u = \frac{1}{2} \sum_{ij} \| u^s_{i \cdot} - u^s_{j \cdot} \|^2 (W_s^u)_{ij} = \sum_i \mathrm{tr}\left( (u^s_{i \cdot})^T u^s_{i \cdot} \right) (D_s^u)_{ii} - \sum_{ij} \mathrm{tr}\left( (u^s_{i \cdot})^T u^s_{j \cdot} \right) (W_s^u)_{ij} = \mathrm{tr}\left( U_s^T (D_s^u - W_s^u) U_s \right)$$
where $D_s^u = \mathrm{diag}\left( \sum_i (W_s^u)_{ij} \right)$;
Similarly, use the data-dimension adjacency graph of the target domain $D_t$ obtained in step S102 to set up, on the data dimension of the target domain $D_t$, a measure of the smoothness, along geodesics in the low-dimensional manifold space, of the data mapping function:
$$R_t^v = \frac{1}{2} \sum_{ij} \| v^t_{\cdot i} - v^t_{\cdot j} \|^2 (W_t^v)_{ij} = \sum_i \mathrm{tr}\left( v^t_{\cdot i} (v^t_{\cdot i})^T \right) (D_t^v)_{ii} - \sum_{ij} \mathrm{tr}\left( v^t_{\cdot i} (v^t_{\cdot j})^T \right) (W_t^v)_{ij} = \mathrm{tr}\left( V_t (D_t^v - W_t^v) V_t^T \right)$$
where $D_t^v = \mathrm{diag}\left( \sum_i (W_t^v)_{ij} \right)$;
Use the feature-dimension adjacency graph of the target domain obtained in step S102 to set up, on the feature dimension, a measure of the smoothness, along geodesics in the low-dimensional manifold space, of the feature mapping function:
$$R_t^u = \frac{1}{2} \sum_{ij} \| u^t_{i \cdot} - u^t_{j \cdot} \|^2 (W_t^u)_{ij} = \sum_i \mathrm{tr}\left( (u^t_{i \cdot})^T u^t_{i \cdot} \right) (D_t^u)_{ii} - \sum_{ij} \mathrm{tr}\left( (u^t_{i \cdot})^T u^t_{j \cdot} \right) (W_t^u)_{ij} = \mathrm{tr}\left( U_t^T (D_t^u - W_t^u) U_t \right)$$
where $D_t^u = \mathrm{diag}\left( \sum_i (W_t^u)_{ij} \right)$;
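The equality between the pairwise form and the trace form of the smoothness terms in S204 can be checked numerically. The sketch below assumes a symmetric weight matrix W with one column of V per data point. Both function names are hypothetical.

```python
import numpy as np

def smoothness_pairwise(V, W):
    """0.5 * sum_ij ||v_i - v_j||^2 * W_ij, with v_i the i-th column of V."""
    n = V.shape[1]
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += W[i, j] * np.sum((V[:, i] - V[:, j]) ** 2)
    return 0.5 * total

def smoothness_trace(V, W):
    """tr(V (D - W) V^T), where D is the diagonal degree matrix of W."""
    D = np.diag(W.sum(axis=0))
    return np.trace(V @ (D - W) @ V.T)
```

For the feature dimension the same functions apply to the transpose, since $\mathrm{tr}(U^T (D - W) U)$ is `smoothness_trace(U.T, W)`.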
S205, set up the cross-domain transfer learning classification model based on discriminative manifolds as follows:
$$\min_{V_s, V_t, U_s, U_t, H, A} \sum_{\pi \in I} \left( \| X_\pi - U_\pi H V_\pi \|^2 + \beta \| Y_\pi P_\pi - A V_\pi P_\pi \|^2 \right) + \alpha \| A \|^2 + \sum_{\pi \in I} \lambda \left( R_\pi^u + R_\pi^v \right) + \left\| \frac{1}{m_s} \mathbf{1}_{m_s}^T U_s - \frac{1}{m_t} \mathbf{1}_{m_t}^T U_t \right\|^2 + \left\| \frac{1}{n_s} V_s \mathbf{1}_{n_s} - \frac{1}{n_t} V_t \mathbf{1}_{n_t} \right\|^2$$
$$\text{s.t. } V_s, V_t, U_s, U_t, H \ge 0$$
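To make the structure of the full model in S205 easier to follow, here is a sketch that evaluates the objective for given factors. It assumes Frobenius norms throughout and stores per-domain quantities in dictionaries keyed by `"s"` and `"t"`. The function names, argument layout and dictionary convention are illustrative choices, not part of the claimed method.

```python
import numpy as np

def laplacian(W):
    """Graph Laplacian D - W, with D the diagonal degree matrix of W."""
    return np.diag(W.sum(axis=0)) - W

def objective(X, Y, P, U, V, H, A, W_u, W_v, alpha, beta, lam):
    """Value of the S205 objective; X, Y, P, U, V, W_u, W_v are dicts over 's', 't'."""
    total = alpha * np.sum(A ** 2)                               # ridge penalty on A
    for d in ("s", "t"):
        total += np.sum((X[d] - U[d] @ H @ V[d]) ** 2)           # joint factorization
        total += beta * np.sum(((Y[d] - A @ V[d]) @ P[d]) ** 2)  # discriminative regression
        total += lam * np.trace(U[d].T @ laplacian(W_u[d]) @ U[d])  # R_u
        total += lam * np.trace(V[d] @ laplacian(W_v[d]) @ V[d].T)  # R_v
    # MMD terms: squared distance between domain means of the latent factors.
    total += np.sum((U["s"].mean(axis=0) - U["t"].mean(axis=0)) ** 2)
    total += np.sum((V["s"].mean(axis=1) - V["t"].mean(axis=1)) ** 2)
    return total
```

Evaluating this value after each alternating update is one simple way to monitor the convergence required in S30.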
The alternating iteration in S30 specifically comprises the following steps:
S301, update the auxiliary-domain data latent factor $V_s$:
where $B_s = A^T Y_s P_s P_s^T$, $B_s^+ = (|B_s| + B_s)/2$, $B_s^- = (|B_s| - B_s)/2$, $E_s = A^T A V_s P_s P_s^T$, $R = A^T A$, $R^+ = (|R| + R)/2$, $R^- = (|R| - R)/2$,
S302, update the target-domain data latent factor $V_t$:
where $B_t = A^T Y_t P_t P_t^T$, $B_t^+ = (|B_t| + B_t)/2$, $B_t^- = (|B_t| - B_t)/2$, $E_t = A^T A V_t P_t P_t^T$, $R = A^T A$, $R^+ = (|R| + R)/2$, $R^- = (|R| - R)/2$,
S303, update the auxiliary-domain feature-dimension low-dimensional factor $U_s$:
S304, update the target-domain feature-dimension low-dimensional factor $U_t$:
S305, update the knowledge shared between the auxiliary domain and the target domain, namely the relational structure $H$ between the latent factors of the data dimension and the latent factors of the feature dimension; the update formula is as follows:
where I = {s, t}
S306, update the regression coefficient A:
$$A = \left( \sum_{\pi \in I} Y_\pi P_\pi (V_\pi P_\pi)^T \right) \left( \sum_{\pi \in I} V_\pi P_\pi (V_\pi P_\pi)^T + \gamma I \right)^{-1},$$ where the index set is I = {s, t}, the $I$ in $\gamma I$ is the identity matrix, and $\gamma = \alpha / \beta$.
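The S306 update is a ridge-regression closed form: it sets to zero the gradient, with respect to A, of $\beta \sum_\pi \| Y_\pi P_\pi - A V_\pi P_\pi \|^2 + \alpha \| A \|^2$. A minimal NumPy sketch follows, assuming per-domain matrices stored in dictionaries keyed by `"s"` and `"t"`. `update_A` is a hypothetical name, and solving a linear system instead of forming the inverse is an implementation choice made here.

```python
import numpy as np

def update_A(Y, P, V, alpha, beta):
    """Closed-form regression coefficient A for fixed latent factors V."""
    gamma = alpha / beta
    k = V["s"].shape[0]                       # latent dimension k_n
    num = sum(Y[d] @ P[d] @ (V[d] @ P[d]).T for d in ("s", "t"))
    gram = sum(V[d] @ P[d] @ (V[d] @ P[d]).T for d in ("s", "t"))
    # A (gram + gamma I) = num; the left factor is symmetric, so solve the
    # transposed system rather than computing an explicit inverse.
    return np.linalg.solve(gram + gamma * np.eye(k), num.T).T
```

Because $P_\pi$ zeroes out unlabeled columns, only labeled data points from either domain contribute to A.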
2. The cross-domain transfer learning classification method based on discriminative manifolds according to claim 1, characterized in that S40 further comprises the following steps:
S401, using the obtained regression coefficient A and the target-domain document latent factor $V_t$, predict class labels for the documents of the target domain, obtaining the predicted class labels $\tilde{Y}_t$ of the target-domain news documents:
$$\tilde{Y}_t = A V_t ;$$
S402, for each column of $\tilde{Y}_t$, the index of the largest element determines the class of the corresponding data item.
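Steps S401 and S402 amount to a linear map followed by an arg-max. The sketch below assumes that each column of $V_t$ corresponds to one target-domain document and each row of $A V_t$ to one class. `predict_labels` is a hypothetical helper name.

```python
import numpy as np

def predict_labels(A, V_t):
    """Return the score matrix A @ V_t and the arg-max class index per column."""
    Y_tilde = A @ V_t                         # S401: predicted label scores
    return Y_tilde, np.argmax(Y_tilde, axis=0)  # S402: largest entry per column
```

With class indices starting at 0, the returned vector has one entry per target-domain document.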
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310113911.0A CN103177114B (en) 2013-04-02 2013-04-02 Based on the shift learning sorting technique across data field differentiating stream shape

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310113911.0A CN103177114B (en) 2013-04-02 2013-04-02 Based on the shift learning sorting technique across data field differentiating stream shape

Publications (2)

Publication Number Publication Date
CN103177114A CN103177114A (en) 2013-06-26
CN103177114B true CN103177114B (en) 2016-01-27

Family

ID=48636975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310113911.0A Expired - Fee Related CN103177114B (en) 2013-04-02 2013-04-02 Based on the shift learning sorting technique across data field differentiating stream shape

Country Status (1)

Country Link
CN (1) CN103177114B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103473366B (en) * 2013-09-27 2017-01-04 浙江大学 A kind of various visual angles are across the sorting technique of data field picture material identification and device
CN103678580B (en) * 2013-12-07 2017-08-08 浙江大学 A kind of multitask machine learning method and its device for text classification
US11062792B2 (en) 2017-07-18 2021-07-13 Analytics For Life Inc. Discovering genomes to use in machine learning techniques
US11139048B2 (en) 2017-07-18 2021-10-05 Analytics For Life Inc. Discovering novel features to use in machine learning techniques, such as machine learning techniques for diagnosing medical conditions
CN107563452B (en) * 2017-09-18 2020-03-27 天津师范大学 Cross-domain foundation cloud picture classification method based on discriminant measure learning
CN109492094A (en) * 2018-10-15 2019-03-19 上海电力学院 A kind of mixing multidimensional property data processing method based on density
CN110411724B (en) * 2019-07-30 2021-07-06 广东工业大学 Rotary machine fault diagnosis method, device and system and readable storage medium
CN110928916B (en) * 2019-10-18 2022-03-25 平安科技(深圳)有限公司 Data monitoring method and device based on manifold space and storage medium
CN116538996B (en) * 2023-07-04 2023-09-29 云南超图地理信息有限公司 Laser radar-based topographic mapping system and method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100011025A1 (en) * 2008-07-09 2010-01-14 Yahoo! Inc. Transfer learning methods and apparatuses for establishing additive models for related-task ranking
US20110320387A1 (en) * 2010-06-28 2011-12-29 International Business Machines Corporation Graph-based transfer learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Transfer Learning with Graph Co-Regularization"; Long Mingsheng et al.; Proceedings of the Twenty-Sixth Conference on Artificial Intelligence; 20120726; page 2, right column, second-to-last paragraph, to page 4, left column, "Algorithm 1" *

Also Published As

Publication number Publication date
CN103177114A (en) 2013-06-26

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160127

Termination date: 20200402