CN106294506A

CN106294506A - Domain-adaptive viewpoint data classification method and device

Info

Publication number: CN106294506A
Application number: CN201510316353.7A
Authority: CN
Inventors: 周光有; 张小鹏; 肖磊; 刘婷婷; 王巨宏
Original assignee: Huazhong Normal University; Tencent Technology Shenzhen Co Ltd
Current assignee: Huazhong Normal University; Tencent Technology Shenzhen Co Ltd
Priority date: 2015-06-10
Filing date: 2015-06-10
Publication date: 2017-01-04
Anticipated expiration: 2035-06-10
Also published as: CN106294506B

Abstract

The present invention discloses a domain-adaptive viewpoint data classification method, belonging to the field of Internet technology. The method comprises: determining a source domain term matrix and a target domain term matrix; determining a source domain objective function and a target domain objective function; determining an overall objective function according to the source domain objective function and the target domain objective function; determining the target value of each parameter in the overall objective function; and training a specified classification model according to the target values of the parameters and the labeled viewpoint data in the source domain, then classifying the viewpoint data of the target domain with the specified classification model obtained by training. Because the overall objective function depends on the source domain specific topic matrix, the target domain specific topic matrix, and the hinge matrix that represents the topics shared between the source domain and the target domain, the method achieves domain adaptation through shared topics. Because shared topics reduce the difference between the source domain and the target domain, the accuracy of the classification results can be ensured.

Description

Domain-adaptive viewpoint data classification method and device

Technical field

The present invention relates to the field of Internet technology, and in particular to a domain-adaptive viewpoint data classification method and device.

Background

With the development of Internet technology, users share more and more viewpoint data on the Internet. For example, the comments a user posts on a shopping website and a user's feedback on a product are both viewpoint data. The viewpoint data shared by users can involve many domains, such as clothing and books. To facilitate research on or statistics of the viewpoint data of each domain, the viewpoint data of each domain usually need to be classified. Typically, classifying the viewpoint data of a domain requires first labeling viewpoint data in that domain and then training a classifier on the labeled data. However, because the viewpoint data on the Internet involve many domains, labeling the viewpoint data of every domain wastes considerable resources. A domain-adaptive viewpoint data classification method makes it possible to classify the viewpoint data of some domains without labeling the viewpoint data of those domains.

Taking domain-adaptive viewpoint data classification with the SFA (Spectral Feature Alignment) algorithm as an example, the related art first selects an arbitrary source domain and target domain and determines their domain-specific words and domain-independent words. A domain-specific word is a word peculiar to one domain, and the domain-independent words serve as a bridge connecting the source domain and the target domain. A bipartite graph is then built between the domain-specific words and the domain-independent words to represent their co-occurrence relations, and the SFA algorithm assigns the domain-specific words and domain-independent words that are more closely connected in the bipartite graph to the same cluster. Because such clusters reduce the gap between the domain-specific words of the source domain and those of the target domain, a classifier can be trained on the clusters and then used to perform domain-adaptive viewpoint data classification.

In the course of implementing the present invention, the inventors found that the related art has at least the following problem:

When the related art performs domain-adaptive viewpoint data classification, not every word involved in the selected source domain and target domain can be clearly assigned to either the domain-specific words or the domain-independent words. As a result, the classification results obtained by the domain-adaptive method of the related art are not very accurate.

Summary of the invention

To solve the problems of the prior art, embodiments of the present invention provide a domain-adaptive viewpoint data classification method and device. The technical solution is as follows:

In a first aspect, a domain-adaptive viewpoint data classification method is provided, the method comprising:

determining a source domain term matrix according to the relation between the documents and the terms of a source domain;

determining a target domain term matrix according to the relation between the documents and the terms of a target domain;

determining a source domain objective function according to the source domain term matrix, a source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and a hinge matrix between the source domain and the target domain;

determining a target domain objective function according to the target domain term matrix, a target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix;

determining an overall objective function according to the source domain objective function and the target domain objective function;

determining the target value of each parameter in the overall objective function;

training a specified classification model according to the target values of the parameters and the labeled viewpoint data in the source domain, and classifying the viewpoint data of the target domain with the specified classification model obtained by training.

In a second aspect, a domain-adaptive viewpoint data classification device is provided, the device comprising:

a first determining module, configured to determine a source domain term matrix according to the relation between the documents and the terms of a source domain;

a second determining module, configured to determine a target domain term matrix according to the relation between the documents and the terms of a target domain;

a third determining module, configured to determine a source domain objective function according to the source domain term matrix, a source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and a hinge matrix between the source domain and the target domain;

a fourth determining module, configured to determine a target domain objective function according to the target domain term matrix, a target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix;

a fifth determining module, configured to determine an overall objective function according to the source domain objective function and the target domain objective function;

a sixth determining module, configured to determine the target value of each parameter in the overall objective function;

a training module, configured to train a specified classification model according to the target values of the parameters and the labeled viewpoint data in the source domain;

a classification module, configured to classify the viewpoint data of the target domain with the specified classification model obtained by training.

The technical solution provided by the embodiments of the present invention has the following beneficial effects:

Because the determined overall objective function depends on the source domain specific topic matrix, the target domain specific topic matrix, and the hinge matrix representing the topics shared between the source domain and the target domain, a viewpoint data classification method is provided that achieves domain adaptation through the topics shared between the source domain and the target domain. Because shared topics reduce the difference between the source domain and the target domain, the accuracy of the classification results can be ensured when classifying with this domain-adaptive viewpoint data classification method.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed to describe the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow chart of a domain-adaptive viewpoint data classification method provided by one embodiment of the present invention;

Fig. 2 is a flow chart of a domain-adaptive viewpoint data classification method provided by another embodiment of the present invention;

Fig. 3 is a convergence curve provided by another embodiment of the present invention;

Fig. 4 shows experimental results obtained by running experiments on each pair of different domains, provided by another embodiment of the present invention;

Fig. 5 is a schematic structural diagram of a domain-adaptive viewpoint data classification device provided by another embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a server provided by another embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a terminal provided by another embodiment of the present invention.

Detailed description

To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the accompanying drawings.

With the rapid development of Internet technology, more and more viewpoint data that can identify users' sentiment are shared on the Internet. For example, after a successful online purchase, a user can post an evaluation of the purchased goods in a comment; after a user publishes a blog post, other users can comment on its content. Viewpoint data may be derogatory or commendatory, subjective or objective, and so on. That is, viewpoint data have a certain sentiment polarity, which includes positive and negative, and the process of studying the sentiment polarity of viewpoint data is the process of classifying viewpoint data. Studying the sentiment polarity of viewpoint data is of great significance for guiding the production of products and the provision of services, so viewpoint data often need to be classified.

Further, the viewpoint data on the Internet involve many different domains. To classify the viewpoint data of multiple domains conveniently, a domain-adaptive classification method is usually used. Such a method can classify the viewpoint data of one or more domains without labeling the viewpoint data of those domains. The embodiments of the present invention provide such a domain-adaptive viewpoint data classification method. In the embodiments, the source domain includes some labeled viewpoint data whose polarity has been marked, while the target domain may include no labeled viewpoint data. The method provided by the embodiments can determine the sentiment polarity of any viewpoint data in the target domain and thereby classify it. The domain-adaptive viewpoint data classification method is detailed in the following embodiments:

Fig. 1 is a flow chart of a domain-adaptive viewpoint data classification method provided according to an exemplary embodiment. Referring to Fig. 1, the method flow provided by this embodiment includes:

101: Determine the source domain term matrix according to the relation between the documents and the terms of the source domain.

102: Determine the target domain term matrix according to the relation between the documents and the terms of the target domain.

103: Determine the source domain objective function according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain.

104: Determine the target domain objective function according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix.

105: Determine the overall objective function according to the source domain objective function and the target domain objective function.

106: Determine the target value of each parameter in the overall objective function.

107: Train the specified classification model according to the target values of the parameters and the labeled viewpoint data in the source domain, and classify the viewpoint data of the target domain with the specified classification model obtained by training.

In the method provided by this embodiment, the determined overall objective function depends on the source domain specific topic matrix, the target domain specific topic matrix, and the hinge matrix representing the topics shared between the source domain and the target domain, so a viewpoint data classification method is provided that achieves domain adaptation through the topics shared between the source domain and the target domain. Because shared topics reduce the difference between the source domain and the target domain, the accuracy of the classification results can be ensured when classifying with this domain-adaptive viewpoint data classification method.

In another embodiment, determining the source domain objective function according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain comprises:

determining the source domain objective function from these matrices by the following formula:

O_s = \| X_s - [U_0, U_s] V_s \|_F^2

where O_s is the source domain objective function, X_s is the source domain term matrix, U_0 is the hinge matrix, U_s is the source domain specific topic matrix, V_s is the coefficient matrix of the source domain specific topic matrix, and \| \cdot \|_F denotes the Frobenius norm;

determining the target domain objective function according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix comprises:

determining the target domain objective function from these matrices by the following formula:

O_t = \| X_t - [U_0, U_t] V_t \|_F^2

where O_t is the target domain objective function, X_t is the target domain term matrix, U_0 is the hinge matrix, U_t is the target domain specific topic matrix, and V_t is the coefficient matrix of the target domain specific topic matrix.
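The two objective functions above share one form, which can be sketched in NumPy as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `domain_objective` and the toy sizes are hypothetical, and the coefficient matrix is assumed to store one document per row, so the product is taken with its transpose to make the shapes conform.

```python
import numpy as np

def domain_objective(X, U0, U_dom, V):
    """Squared Frobenius reconstruction error ||X - [U0, U_dom] V^T||_F^2.

    X:     (m, n)          term matrix of one domain
    U0:    (m, k0)         hinge matrix (shared topics)
    U_dom: (m, k_dom)      domain-specific topic matrix
    V:     (n, k0 + k_dom) coefficient matrix, one row per document (assumed layout)
    """
    U = np.hstack([U0, U_dom])      # [U0, U_dom], shape (m, k0 + k_dom)
    residual = X - U @ V.T
    return float(np.sum(residual ** 2))

# Toy data (hypothetical sizes) for the source domain.
rng = np.random.default_rng(0)
m, n_s, k0, k_s = 6, 4, 2, 3
U0 = rng.random((m, k0))
U_s = rng.random((m, k_s))
V_s = rng.random((n_s, k0 + k_s))

X_exact = np.hstack([U0, U_s]) @ V_s.T                    # factorizes exactly
O_exact = domain_objective(X_exact, U0, U_s, V_s)         # approximately 0
O_noisy = domain_objective(X_exact + 0.1, U0, U_s, V_s)   # 0.1^2 per entry
```

The same function evaluates O_t when it is called with X_t, U_0, U_t and V_t.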

In another embodiment, determining the overall objective function according to the source domain objective function and the target domain objective function comprises:

determining the overall objective function from the source domain objective function and the target domain objective function by the following formulas:

\Phi = \lambda_s \| X_s - [U_0, U_s] V_s \|_F^2 + \lambda_t \| X_t - [U_0, U_t] V_t \|_F^2 + D(U_0, U_s, U_t, V_s, V_t)

\begin{aligned} D(U_0, U_s, U_t, V_s, V_t) = {} & \alpha \| U_0^T U_s \|_F^2 + \beta \| U_0^T U_t \|_F^2 + \gamma \| U_s^T U_t \|_F^2 \\ & + Tr(A^{U_0} U_0^T) + Tr(A^{U_s} U_s^T) + Tr(A^{U_t} U_t^T) + Tr(A^{V_s} V_s^T) + Tr(A^{V_t} V_t^T) \end{aligned}

\lambda_s = \| X_s \|_F^{-2}

\lambda_t = \| X_t \|_F^{-2}

where Φ is the overall objective function, D(U_0, U_s, U_t, V_s, V_t) is a regularization term, α, β and γ are regularization parameters, Tr(·) is the trace of a matrix, and A^{U_0}, A^{U_s}, A^{U_t}, A^{V_s} and A^{V_t} are the Lagrange multiplier matrices obtained by the Lagrange multiplier method under the constraints U_0(i, j) ≥ 0, U_s(i, j) ≥ 0, U_t(i, j) ≥ 0, V_s(i, j) ≥ 0 and V_t(i, j) ≥ 0, respectively.
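To make the roles of the weights and of the regularization term concrete, the following NumPy sketch evaluates the two weighted reconstruction terms and the three orthogonality penalties. It is a simplified illustration under stated assumptions: the Lagrange multiplier trace terms, which only enforce non-negativity, are omitted; the coefficient matrices are assumed to hold one document per row; and the names `fro2` and `overall_objective` and all numeric values are hypothetical.

```python
import numpy as np

def fro2(A):
    """Squared Frobenius norm."""
    return float(np.sum(A ** 2))

def overall_objective(X_s, X_t, U0, U_s, U_t, V_s, V_t, alpha, beta, gamma):
    # lambda_s and lambda_t normalize each domain by the size of its term matrix.
    lam_s = 1.0 / fro2(X_s)
    lam_t = 1.0 / fro2(X_t)
    fit_s = fro2(X_s - np.hstack([U0, U_s]) @ V_s.T)
    fit_t = fro2(X_t - np.hstack([U0, U_t]) @ V_t.T)
    # Penalties push shared, source-specific and target-specific topics apart.
    penalty = (alpha * fro2(U0.T @ U_s)
               + beta * fro2(U0.T @ U_t)
               + gamma * fro2(U_s.T @ U_t))
    return lam_s * fit_s + lam_t * fit_t + penalty

I = np.eye(6)
U0, U_s, U_t = I[:, :2], I[:, 2:4], I[:, 4:6]   # mutually orthogonal topic columns
rng = np.random.default_rng(0)
V_s, V_t = rng.random((5, 4)), rng.random((3, 4))
X_s = np.hstack([U0, U_s]) @ V_s.T
X_t = np.hstack([U0, U_t]) @ V_t.T
phi_orthogonal = overall_objective(X_s, X_t, U0, U_s, U_t, V_s, V_t, 0.1, 0.1, 0.5)

# Reusing U_s as the target-specific topics makes U_s^T U_t the 2x2 identity,
# so only the gamma penalty contributes: 0.5 * 2 = 1.0.
X_t_overlap = np.hstack([U0, U_s]) @ V_t.T
phi_overlap = overall_objective(X_s, X_t_overlap, U0, U_s, U_s, V_s, V_t, 0.1, 0.1, 0.5)
```

In this toy case the objective is zero when the three topic matrices are mutually orthogonal and the data factorize exactly, and it grows as soon as the specific topics of the two domains overlap.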

In another embodiment, determining the target value of each parameter in the overall objective function comprises:

randomly assigning a non-negative value to each parameter as its initial value;

calculating the converged value of each parameter from its initial value, and taking the converged value of each parameter as its target value.
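The two steps above can be sketched as a random non-negative initialization followed by a generic fixed-point loop. The sketch is illustrative only: the stopping rule (relative change below a tolerance), the helper names, and the toy update are assumptions, since the text does not state how convergence is detected.

```python
import numpy as np

def random_nonnegative(shape, rng):
    """Uniform samples in [0, 1), hence non-negative."""
    return rng.random(shape)

def iterate_to_convergence(update, initial, tol=1e-8, max_iter=1000):
    """Apply `update` repeatedly until the relative change falls below `tol`."""
    current = initial
    for _ in range(max_iter):
        nxt = update(current)
        change = np.linalg.norm(nxt - current)
        if change <= tol * max(1.0, np.linalg.norm(current)):
            return nxt
        current = nxt
    return current

rng = np.random.default_rng(0)
U0_init = random_nonnegative((5, 2), rng)   # e.g. an initial hinge matrix

# Toy update used only to exercise the loop: x <- (x + 2/x) / 2 converges to sqrt(2).
x_init = random_nonnegative((3,), rng) + 0.5
x_star = iterate_to_convergence(lambda x: 0.5 * (x + 2.0 / x), x_init)
```

In an actual run, `update` would be replaced by the update rule of the parameter being solved for.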

In another embodiment, the parameters in the overall objective function include U_0, U_s, U_t, V_s and V_t;

calculating the converged value of each parameter from its initial value comprises:

starting from the initial value of U_0, iterating U_0 according to

U_0^{m} = U_0^{m-1} \frac{[\lambda_s X_s H_s^T + \lambda_t X_t H_t^T]}{[\lambda_s X_s^{(r)} H_s^T + \lambda_t X_t^{(r)} H_t^T + (\alpha U_s U_s^T + \beta U_t U_t^T) U_0]}

until the converged value of U_0 is obtained, where U_0^{m-1} is the value of U_0 obtained in the previous iteration, U_0^{m} is the value of U_0 obtained by iterating from U_0^{m-1}, H_s is the coefficient matrix of the hinge matrix for the source domain, H_t is the coefficient matrix of the hinge matrix for the target domain, and r denotes the iteration number;

starting from the initial value of U_s, iterating U_s according to

U_s^{m} = U_s^{m-1} \frac{[\lambda_s X_s L_s^T]}{[\lambda_s X_s^{(r)} L_s^T + (\alpha U_0 U_0^T + \gamma U_t U_t^T) U_s]}

until the converged value of U_s is obtained, where U_s^{m-1} is the value of U_s obtained in the previous iteration, U_s^{m} is the value of U_s obtained by iterating from U_s^{m-1}, L_s is the coefficient matrix of the source domain specific topic matrix, and L_t is the coefficient matrix of the target domain specific topic matrix;

starting from the initial value of U_t, iterating U_t according to

U_t^{m} = U_t^{m-1} \frac{[\lambda_t X_t L_t^T]}{[\lambda_t X_t^{(r)} L_t^T + (\beta U_0 U_0^T + \gamma U_s U_s^T) U_t]}

until the converged value of U_t is obtained, where U_t^{m-1} is the value of U_t obtained in the previous iteration and U_t^{m} is the value of U_t obtained by iterating from U_t^{m-1};

starting from the initial value of V_s, iterating V_s according to the corresponding update formula until the converged value of V_s is obtained, where V_s^{m-1} is the value of V_s obtained in the previous iteration and V_s^{m} is the value of V_s obtained by iterating from V_s^{m-1};

starting from the initial value of V_t, iterating V_t according to the corresponding update formula until the converged value of V_t is obtained, where V_t^{m-1} is the value of V_t obtained in the previous iteration and V_t^{m} is the value of V_t obtained by iterating from V_t^{m-1}.

With reference to the embodiment corresponding to Fig. 1, Fig. 2 is a flow chart of a domain-adaptive viewpoint data classification method provided according to an exemplary embodiment. Referring to Fig. 2, the method flow provided by this embodiment includes:

201: Determine the source domain term matrix according to the relation between the documents and the terms of the source domain.

The source domain includes some labeled viewpoint data and may also include some unlabeled viewpoint data; the labeled viewpoint data can be labeled documents. For any labeled viewpoint data, the label indicates whether its sentiment polarity is positive or negative. For example, if a piece of labeled viewpoint data is a labeled document, and "+1" and "-1" denote positive and negative sentiment polarity respectively, then a label of "+1" means that the sentiment polarity of the document is positive. The embodiments of the present invention do not specifically limit the type of the source domain; for example, the source domain can be the book domain, the electronics domain, the clothing domain, and so on.

Generally, each domain can include multiple documents, and each document consists of at least one term. For any domain, a term matrix can therefore represent the relation between the documents and the terms of that domain and thereby characterize the domain. In this embodiment, to determine the relation between documents and terms in the source domain and hence the features of the source domain, let the source domain be X_s, the number of documents it contains be n_s, and the number of terms contained in each document be m. On this basis, the source domain term matrix can be expressed as:

X_s = \{ x_1^{(s)}, \ldots, x_{n_s}^{(s)} \}.

Each element of the source domain term matrix X_s represents the weight of the corresponding term. The weight of each term can be computed with the TF-IDF algorithm from the relation between the documents and the terms of the source domain.

Because each document in the source domain contains m terms, the source domain term matrix can also be written as:

X_s \in \mathbb{R}^{m \times n_s}.
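As an illustration of how such an m × n_s term matrix could be built, the sketch below computes TF-IDF weights with plain NumPy. The text names TF-IDF but not a specific variant, so the choices here are assumptions: term frequency is the count divided by document length, inverse document frequency is log(n / df), and the function name `tfidf_term_matrix` and the sample documents are hypothetical.

```python
import numpy as np

def tfidf_term_matrix(docs):
    """docs: list of token lists. Returns (X, vocab) with X of shape (m, n):
    one row per term and one column per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    index = {tok: i for i, tok in enumerate(vocab)}
    m, n = len(vocab), len(docs)
    counts = np.zeros((m, n))
    for j, doc in enumerate(docs):
        for tok in doc:
            counts[index[tok], j] += 1
    # Term frequency: share of each term within its document.
    tf = counts / np.maximum(counts.sum(axis=0, keepdims=True), 1)
    # Inverse document frequency: log(n / number of documents containing the term).
    df = (counts > 0).sum(axis=1)
    idf = np.log(n / df)
    return tf * idf[:, None], vocab

docs = [
    ["cheap", "good", "book"],
    ["expensive", "book"],
    ["good", "good", "story"],
]
X_s, vocab = tfidf_term_matrix(docs)
```

With this variant every weight is non-negative, which is what a non-negative factorization of X_s requires.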

In addition, when the viewpoint data are documents, the source domain contains a certain number of labeled viewpoint data, that is, a certain number of labeled documents. So that the labeled viewpoint data in the source domain can later be used to train the specified classification model, a document polarity matrix Y_s can represent the sentiment polarity of each labeled document in the source domain. Specifically, Y_s can be an n_s × 2 matrix, where n_s is the number of documents contained in the source domain and 2 indicates that documents have two kinds of sentiment polarity: positive, meaning that the document expresses a positive viewpoint, and negative, meaning that the document expresses a negative viewpoint. Taking the i-th document in the source domain as an example, if its corresponding element in the document polarity matrix is y_i = 1, the sentiment polarity of the i-th document is positive, that is, the document expresses a positive viewpoint; if y_i = -1, the sentiment polarity of the i-th document is negative, that is, the document expresses a negative viewpoint. The above uses "+1" and "-1" only as an example; in a specific implementation, other values can also represent the sentiment polarity of a document, and this embodiment places no specific limit on this.
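One possible way to lay out an n_s × 2 polarity matrix from "+1"/"-1" labels is sketched below. The one-hot encoding, the column order (column 0 for positive, column 1 for negative), and the function name `polarity_matrix` are assumptions made for illustration; the text does not fix the encoding.

```python
import numpy as np

def polarity_matrix(labels):
    """Map +1/-1 document labels to an (n, 2) indicator matrix.

    Column 0 marks positive documents and column 1 marks negative ones
    (an assumed layout)."""
    labels = np.asarray(labels)
    Y = np.zeros((labels.shape[0], 2))
    Y[labels == 1, 0] = 1.0
    Y[labels == -1, 1] = 1.0
    return Y

# Four labeled source-domain documents: positive, negative, negative, positive.
Y_s = polarity_matrix([1, -1, -1, 1])
```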

202: Determine the target domain term matrix according to the relation between the documents and the terms of the target domain.

The target domain may include no labeled viewpoint data. The target domain can be a domain different from the source domain, such as the book domain, the electronics domain, or the clothing domain; this embodiment does not specifically limit the type of the target domain. As in step 201, to determine the relation between the documents and the terms of the target domain and hence the features of the target domain, let the target domain be X_t, the number of documents it contains be n_t, and the number of terms contained in each document be m. The target domain term matrix can then be expressed as:

X_t = \{ x_1^{(t)}, \ldots, x_{n_t}^{(t)} \}.

Because each document in the target domain contains m terms, the target domain term matrix can also be written as:

X_t \in \mathbb{R}^{m \times n_t}.

It should be noted that steps 201 and 202 are described with the source domain term matrix determined first and the target domain term matrix determined second only as an example. In a specific implementation, the target domain term matrix can be determined first and the source domain term matrix second, or the two matrices can be determined at the same time.

203: Determine the source domain objective function according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain.

Different types of domains usually have some domain-specific topics. For example, for the electronics domain, "durable" and "brightness" are domain-specific topics. The source domain specific topic matrix is the matrix formed by the domain-specific topics of the source domain. For ease of description, assume that the number of domain-specific topics of the source domain is k_s and that the source domain specific topic matrix is U_s. The source domain specific topic matrix can then be expressed as:

U_s = [u_1^{(s)}, \ldots, u_{k_s}^{(s)}].

Because each document in the source domain contains m terms, the source domain specific topic matrix can also be written as U_s \in \mathbb{R}^{m \times k_s}, where each column of the matrix represents one specific topic of the source domain.

In addition, the source domain and the target domain usually also include some shared topics, that is, topics that both the source domain and the target domain can involve. For example, if the source domain is the book domain and the target domain is the clothing domain, topics such as "expensive" and "cheap" can appear in both, so they can serve as shared topics between the source domain and the target domain. The embodiments of the present invention represent the topics shared between the source domain and the target domain by the hinge matrix.

Specifically, for ease of description, let the number of shared topics be k_0 and the hinge matrix between the source domain and the target domain be U_0. The hinge matrix U_0 can then be expressed as:

U_0 = [u_1^{(0)}, \ldots, u_{k_0}^{(0)}].

Because each document in the source domain and the target domain contains m terms, the hinge matrix can also be written as U_0 \in \mathbb{R}^{m \times k_0}, where each column of the hinge matrix represents one topic shared between the source domain and the target domain.

Because the topics of the source domain comprise those in the source domain specific topic matrix and those in the hinge matrix, the number of topics included in the source domain is k_0 + k_s.

In addition, in the embodiments of the present invention, the source domain objective function represents the features of the source domain accurately and is an important basis for the domain-adaptive classification of viewpoint data in the subsequent steps, so the source domain objective function needs to be determined. Because the source domain term matrix, the source domain specific topic matrix, and the hinge matrix can all represent features of the source domain, the source domain objective function can be determined according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain.

Specifically, the source domain objective function can be determined from these matrices by the following formula:

O_s = \| X_s - [U_0, U_s] V_s \|_F^2 \quad (1)

In formula (1), O_s is the source domain objective function, X_s is the source domain term matrix, U_0 is the hinge matrix, U_s is the source domain specific topic matrix, V_s is the coefficient matrix of the source domain specific topic matrix, and \| \cdot \|_F denotes the Frobenius norm.

As formula (1) shows, the hinge matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the source domain term matrix are the key to determining the source domain objective function. Therefore, before the source domain objective function is determined, the source domain specific topic matrix, its coefficient matrix, and the hinge matrix need to be determined first. The source domain term matrix is related to the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix; this relation is explained below.

Specifically, with reference to formula (1), in the ideal case the source domain term matrix X_s can be decomposed into two matrices: the source domain document-topic matrix V_s and the source domain term-topic matrix U_s'. The source domain term-topic matrix U_s' is an m × (k_s + k_0) matrix, that is, U_s' \in \mathbb{R}^{m \times (k_s + k_0)}. The matrices contained in U_s' include, but are not limited to, the hinge matrix U_0 and the source domain specific topic matrix U_s. The source domain document-topic matrix V_s is an n_s × (k_s + k_0) matrix, that is, V_s \in \mathbb{R}^{n_s \times (k_s + k_0)}, and each of its rows represents one document in the source domain. The source domain document-topic matrix V_s can in turn be decomposed into a matrix H_s and a matrix L_s. H_s is an n_s × k_0 matrix, the coefficient matrix of the hinge matrix for the source domain, and represents the weight of the hinge matrix in the source domain; L_s is an n_s × k_s matrix, the coefficient matrix of the source domain specific topic matrix.
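The block structure described above can be checked numerically. The sketch below assumes that the columns of V_s are ordered to match [U_0, U_s], so that its first k_0 columns form H_s and the remaining k_s columns form L_s, and that the product with the topic matrices is taken with the transpose of V_s. Both points are assumptions made so that the stated shapes conform, and the sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_s, k0, k_s = 7, 5, 2, 3

U0 = rng.random((m, k0))            # hinge matrix, m x k0
U_s = rng.random((m, k_s))          # source domain specific topics, m x k_s
V_s = rng.random((n_s, k0 + k_s))   # document-topic matrix, one row per document

# Split the coefficients into the hinge part and the domain-specific part.
H_s = V_s[:, :k0]                   # n_s x k0
L_s = V_s[:, k0:]                   # n_s x k_s

# Reconstruction with the stacked topic matrix ...
combined = np.hstack([U0, U_s]) @ V_s.T
# ... equals the shared-topic part plus the domain-specific part.
split = U0 @ H_s.T + U_s @ L_s.T
```

Under these assumptions the two reconstructions coincide, which is why the shared topics and the domain-specific topics can be weighted separately by H_s and L_s.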

About the method carrying out decomposing by the term matrix of source domain, include but not limited to use nonnegative matrix Source domain term matrix is decomposed by decomposition method.Wherein, Non-negative Matrix Factorization method is for own in a matrix Element is the matrix disassembling method under nonnegative number constraints, Non-negative Matrix Factorization method by find low-rank, Matrix decomposition is become the matrix of several non-negative.

Actual application use the example of Non-negative Matrix Factorization method split-matrix to have a lot, as used nonnegative matrix Pixel in decomposition digital picture, the word statistics in text analyzing and stock price etc..Nonnegative matrix is divided The basic thought of solution can be briefly described into: for any given nonnegative matrix A, one can be found Individual nonnegative matrix U and nonnegative matrix V so that the matrix A of non-negative can resolve into nonnegative matrix U and The product of V.Non-negative Matrix Factorization method is utilized to carry out text, the analysis of image large-scale data, more traditional place Adjustment method more can describe and portray potential semantic information.
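The basic idea described above can be illustrated with a minimal NumPy sketch. It uses the well-known multiplicative update rules for NMF under the Frobenius norm, which are a generic technique and not necessarily the exact procedure of this embodiment. The function name `nmf`, the rank, the iteration count and the small constant `eps` are illustrative assumptions.

```python
import numpy as np

def nmf(A, rank, n_iter=200, eps=1e-12, seed=0):
    """Approximate a non-negative matrix A (m x n) as U @ V.

    U is m x rank and V is rank x n; both stay non-negative because
    every update multiplies by a ratio of non-negative quantities.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Random non-negative initial values (hypothetical choice of distribution).
    U = rng.random((m, rank)) + eps
    V = rng.random((rank, n)) + eps
    for _ in range(n_iter):
        # Multiplicative updates; eps guards against division by zero.
        V *= (U.T @ A) / (U.T @ U @ V + eps)
        U *= (A @ V.T) / (U @ V @ V.T + eps)
    return U, V

def reconstruction_error(A, U, V):
    """Squared Frobenius norm of the residual A - U V."""
    return np.linalg.norm(A - U @ V, "fro") ** 2
```

For example, `U, V = nmf(A, rank=3)` followed by `reconstruction_error(A, U, V)` gives a rough measure of how well the low-rank product reproduces A.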

204: Determine the target domain objective function according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix.

With reference to step 203, the domain-specific topics of the target domain are the topics peculiar to the target domain, and the target domain specific topic matrix is the matrix composed of these specific topics. For ease of explanation, the embodiment of the present invention assumes that the number of domain-specific topics of the target domain is k_t and that the target domain specific topic matrix is U_t. U_t can then be expressed as:

U_{t} = [u_{1}^{(t)}, \ldots, u_{k_{t}}^{(t)}]

Since each document in the target domain contains m terms, the target domain specific topic matrix can also be denoted as U_t ∈ ℝ^{m×k_t}, where each column of the target domain specific topic matrix represents one specific topic of the target domain.

With reference to step 203, since the topics covered by the target domain are those of the target domain specific topic matrix and of the hinge matrix, the number of topics included in the target domain is k₀+k_t.

In addition, in the embodiment of the present invention, the target domain objective function represents the features of the target domain well, and it is an important basis for the domain-adaptive viewpoint data classification carried out in subsequent steps. The target domain objective function therefore needs to be determined. Since the target domain term matrix, the target domain specific topic matrix and the hinge matrix can all be used to represent features of the target domain, the target domain objective function can be determined according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix.

Specifically, the target domain objective function can be determined from the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix by the following formula:

O_{t} = \| X_{t} - [U_{0}, U_{t}] V_{t} \|_{F}^{2} \quad (2)

In formula (2), O_t is the target domain objective function, X_t is the target domain term matrix, U₀ is the hinge matrix, U_t is the target domain specific topic matrix, and V_t is the coefficient matrix of the target domain specific topic matrix.

Further, as formula (2) shows, the hinge matrix and the target domain specific topic matrix are the key to determining the target domain objective function. Therefore, before the target domain objective function is determined, the hinge matrix and the target domain specific topic matrix must first be determined. Methods for determining the hinge matrix, the target domain specific topic matrix and the coefficient matrix of the target domain specific topic matrix include, but are not limited to, decomposing the target domain term matrix.

Specifically, with reference to formula (2), in the ideal case the target domain term matrix X_t can be decomposed into two matrices: the target domain document-topic matrix V_t and the target domain term-topic matrix U_t'. Here U_t' is an m × (k_t+k₀) matrix, i.e. U_t' ∈ ℝ^{m×(k_t+k₀)}. The matrices contained in U_t' include, but are not limited to, the hinge matrix U₀ and the target domain specific topic matrix U_t. The target domain document-topic matrix V_t is an n_t × (k_t+k₀) matrix, i.e. V_t ∈ ℝ^{n_t×(k_t+k₀)}, and each row of the matrix represents one document in the target domain. V_t can in turn be decomposed into a matrix H_t and a matrix L_t. H_t is an n_t × k₀ matrix; it is the coefficient matrix of the hinge matrix for the target domain and represents the weight of the hinge matrix in the target domain. L_t is an n_t × k_t matrix; it is the coefficient matrix of the target domain specific topic matrix.

Methods for decomposing the target domain term matrix include, but are not limited to, non-negative matrix factorization.

It should be noted that this embodiment does not limit the order in which the source domain objective function and the target domain objective function are determined in steps 203 and 204. In a specific implementation, the source domain objective function may be determined first, the target domain objective function may be determined first, or the two objective functions may be determined at the same time.

205: Determine the general objective function according to the source domain objective function and the target domain objective function.

Specifically, a simple and direct way of determining the general objective function is to add the source domain objective function and the target domain objective function. This approach, however, has an obvious defect: it cannot clearly separate the domain-specific topics of the source domain and of the target domain from the topics the two domains share. As a result, when the domain-specific topics of the source domain are obtained, no constraint prevents them from being drawn from the topics shared between the source domain and the target domain; and when the shared topics are obtained, no constraint prevents them from being drawn from the domain-specific topics of the source domain or the target domain. To overcome this defect, the embodiment of the present invention adds a regularization term to the sum of the source domain objective function and the target domain objective function when determining the general objective function. This regularization term overcomes the above problem.

In view of the above, the general objective function can be determined from the source domain objective function and the target domain objective function by the following formulas:

Φ = λ_{s} \| X_{s} - [U_{0}, U_{s}] V_{s} \|_{F}^{2} + λ_{t} \| X_{t} - [U_{0}, U_{t}] V_{t} \|_{F}^{2} + D(U_{0}, U_{s}, U_{t}, V_{s}, V_{t}) \quad (3)

D(U_{0}, U_{s}, U_{t}, V_{s}, V_{t}) = α \| U_{0}^{T} U_{s} \|_{F}^{2} + β \| U_{0}^{T} U_{t} \|_{F}^{2} + γ \| U_{s}^{T} U_{t} \|_{F}^{2} + Tr(A^{U_{0}} U_{0}^{T}) + Tr(A^{U_{s}} U_{s}^{T}) + Tr(A^{U_{t}} U_{t}^{T}) + Tr(A^{V_{s}} V_{s}^{T}) + Tr(A^{V_{t}} V_{t}^{T}) \quad (4)

λ_{s} = \| X_{s} \|_{F}^{-2}

λ_{t} = \| X_{t} \|_{F}^{-2}

In formulas (3) and (4), Φ is the general objective function, D(U₀, U_s, U_t, V_s, V_t) is the regularization term, α, β and γ are the regularization parameters, and Tr(·) is the matrix trace. A^{U₀} is the Lagrange multiplier matrix obtained by the Lagrange multiplier method under the constraint U₀(i, j) ≥ 0; A^{U_s} is the Lagrange multiplier matrix obtained under the constraint U_s(i, j) ≥ 0; A^{U_t} is the Lagrange multiplier matrix obtained under the constraint U_t(i, j) ≥ 0; A^{V_s} is the Lagrange multiplier matrix obtained under the constraint V_s(i, j) ≥ 0; and A^{V_t} is the Lagrange multiplier matrix obtained under the constraint V_t(i, j) ≥ 0. Here i and j denote any row and any column, respectively, of U₀, U_s, U_t, V_s and V_t.

Here α = a/(k₀·k_s), β = a/(k₀·k_t) and γ = a/(k_s·k_t), where a can be determined by cross-validation. The embodiment of the present invention does not limit the specific value of a.
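As an illustration of formulas (3) and (4), the following NumPy sketch evaluates the general objective function without the Lagrange multiplier terms, which vanish once the non-negativity constraints are satisfied at a stationary point. It is a sketch under stated assumptions rather than the embodiment's implementation: the function names are hypothetical, and the coefficient matrices V_s and V_t are assumed to be stored with one column per document so that the product [U₀, U_s] V_s is defined.

```python
import numpy as np

def regularization_weights(a, k0, ks, kt):
    """alpha, beta, gamma as defined in the text: a divided by topic-count products."""
    return a / (k0 * ks), a / (k0 * kt), a / (ks * kt)

def general_objective(Xs, Xt, U0, Us, Ut, Vs, Vt, a=1.0):
    """Value of formula (3) with the regularizer of formula (4), multiplier terms omitted.

    Assumed shapes: Xs is m x n_s, U0 is m x k0, Us is m x ks,
    Vs is (k0 + ks) x n_s, and likewise for the target domain.
    """
    k0, ks, kt = U0.shape[1], Us.shape[1], Ut.shape[1]
    alpha, beta, gamma = regularization_weights(a, k0, ks, kt)
    # lambda_s and lambda_t normalize each domain by its squared Frobenius norm.
    lam_s = 1.0 / np.linalg.norm(Xs, "fro") ** 2
    lam_t = 1.0 / np.linalg.norm(Xt, "fro") ** 2
    fit_s = lam_s * np.linalg.norm(Xs - np.hstack([U0, Us]) @ Vs, "fro") ** 2
    fit_t = lam_t * np.linalg.norm(Xt - np.hstack([U0, Ut]) @ Vt, "fro") ** 2
    # The three penalties push shared and domain-specific topics apart.
    penalty = (alpha * np.linalg.norm(U0.T @ Us, "fro") ** 2
               + beta * np.linalg.norm(U0.T @ Ut, "fro") ** 2
               + gamma * np.linalg.norm(Us.T @ Ut, "fro") ** 2)
    return fit_s + fit_t + penalty
```

The objective is zero when both term matrices are reproduced exactly and the three topic matrices are mutually orthogonal, which is the situation the regularization term is meant to encourage.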

206: Randomly assign a non-negative value to each parameter as its initial value, calculate the convergence value of each parameter from its initial value, and take the convergence value of each parameter as its target value.

This step is a specific implementation of determining the target value of each parameter in the general objective function. As described in steps 203 and 204, the source domain specific topic matrix, its coefficient matrix and the hinge matrix can be obtained by decomposing the source domain term matrix with non-negative matrix factorization, and the target domain specific topic matrix, its coefficient matrix and the hinge matrix can be obtained by decomposing the target domain term matrix with non-negative matrix factorization. According to the expression of the general objective function, its parameters are the hinge matrix U₀, the source domain specific topic matrix U_s, the target domain specific topic matrix U_t, the coefficient matrix V_s of the source domain specific topic matrix, and the coefficient matrix V_t of the target domain specific topic matrix. However, when the source domain term matrix X_s and the target domain term matrix X_t are decomposed to obtain these parameters, a single operation does not yield the optimal decomposition matrices; the optimal value of each parameter has to be determined by iterative computation. Therefore, when determining the target value of each parameter in the general objective function, the embodiment of the present invention may first randomly assign a non-negative value to each parameter as its initial value, then iterate on each parameter from its initial value using a given algorithm to obtain its convergence value, and take the convergence value as the target value of that parameter.

Since every parameter in the general objective function is a matrix, randomly assigning a non-negative value to a parameter means randomly assigning a non-negative value to every element of that parameter.

Specifically, the algorithm used in the iterative computation differs from parameter to parameter. The way of calculating the convergence value of each parameter is described below.

1. Calculate the convergence value of the hinge matrix U₀:

First, U₀ is treated as the unknown parameter, and U_s, U_t, V_s and V_t are treated as known parameters. The first derivative of the general objective function Φ with respect to U₀ can then be expressed as:

\nabla_{U_{0}} Φ = [λ_{s} (X_{s}^{(r)} - X_{s}) H_{s}^{T} + λ_{t} (X_{t}^{(r)} - X_{t}) H_{t}^{T} + (α U_{s} U_{s}^{T} + β U_{t} U_{t}^{T}) U_{0} + A^{U_{0}}] \quad (5)

X_{s}^{(r)} = [U_{0}^{(r)}, U_{s}^{(r)}] H_{s}^{(r)}

X_{t}^{(r)} = [U_{0}^{(r)}, U_{t}^{(r)}] H_{t}^{(r)}

It follows that use KKT (Karush-Kuhn-Tucke, Caro need-Kuhn-Tucker condition) conditionWith gradient Φ of general objective function phi, above-mentioned formula (5) is defined, can To obtain:

[λ_{s} (X_{s}^{(r)} - X_{s}) H_{s}^{T} + λ_{t} (X_{t}^{(r)} - X_{t}) H_{t}^{T} + (α U_{s} U_{s}^{T} + β U_{t} U_{t}^{T}) U_{0}] (i, j) U_{0} (i, j) = 0 - - - (6) .

This formula (6) is calculated, can obtain:

U_{0}^{m} = U_{0}^{m - 1} \frac{[λ_{s} X_{s} H_{s}^{T} + λ_{t} X_{t} H_{t}^{T}]}{[λ_{s} X_{s}^{(r)} H_{s}^{T} + λ_{t} X_{t}^{(r)} H_{t} + (α U_{s} U_{s}^{T} + β U_{t} U_{t}^{T}) U_{0}]} - - - (7)

In formula (7),The U that last iteration obtains₀Value,According toThe U that iteration obtains₀ Value, H_sFor the hinge matrix coefficient matrix to source domain, represent that hinge matrix weight in source domain is big Little；H_tFor the hinge matrix coefficient matrix to target domain, represent hinge matrix weight in target domain Size；R represents iterations, i.e. the r time iteration；Representing matrix point division operation.

Finally, use above-mentioned formula (7) to U₀It is iterated calculating, until obtaining U₀Convergency value Wherein, when carrying out iterative computation for the first time, will be for U₀The initial value conduct of random assortment
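One iteration of formula (7) can be sketched in NumPy as follows. This is an illustrative sketch, not the embodiment's implementation. It assumes that the coefficient matrices are stored with one column per document (H_s is k₀ × n_s, L_s is k_s × n_s, and likewise for the target domain), which is the orientation in which the products X_s H_sᵀ in formula (7) are defined, and that the current reconstruction is X_s^(r) = U₀ H_s + U_s L_s. The function name and the constant `eps` are hypothetical.

```python
import numpy as np

def update_U0(U0, Us, Ut, Hs, Ls, Ht, Lt, Xs, Xt, alpha, beta, eps=1e-12):
    """One multiplicative update of the hinge matrix U0 following formula (7)."""
    lam_s = 1.0 / np.linalg.norm(Xs, "fro") ** 2
    lam_t = 1.0 / np.linalg.norm(Xt, "fro") ** 2
    # Current reconstructions of the two term matrices (assumed form).
    Xs_r = U0 @ Hs + Us @ Ls
    Xt_r = U0 @ Ht + Ut @ Lt
    numerator = lam_s * (Xs @ Hs.T) + lam_t * (Xt @ Ht.T)
    denominator = (lam_s * (Xs_r @ Hs.T) + lam_t * (Xt_r @ Ht.T)
                   + alpha * (Us @ (Us.T @ U0))
                   + beta * (Ut @ (Ut.T @ U0)))
    # Element-wise multiply and divide; eps avoids division by zero.
    return U0 * numerator / (denominator + eps)
```

Because the update multiplies U₀ by a ratio of non-negative quantities, a non-negative initial value stays non-negative, and U₀ is left unchanged when both reconstructions are already exact and the regularization weights are zero.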

2. Calculate the convergence value of the source domain specific topic matrix U_s:

First, U_s is treated as the unknown parameter, and U₀, U_t, V_s and V_t are treated as known parameters. The first derivative of the general objective function Φ with respect to U_s can then be expressed as:

\nabla_{U_{s}} Φ = 2 [λ_{s} (X_{s}^{(r)} - X_{s}) L_{s}^{T} + (α U_{0} U_{0}^{T} + γ U_{t} U_{t}^{T}) U_{s} + A^{U_{s}}]

Next, applying the KKT condition A^{U_s}(i, j) U_s(i, j) = 0 together with the gradient ∇Φ of the general objective function Φ to the above formula gives:

U_{s}^{m} = U_{s}^{m-1} \circ \frac{[λ_{s} X_{s} L_{s}^{T}]}{[λ_{s} X_{s}^{(r)} L_{s}^{T} + (α U_{0} U_{0}^{T} + γ U_{t} U_{t}^{T}) U_{s}]} \quad (8)

In formula (8), U_s^(m-1) is the value of U_s obtained in the previous iteration, and U_s^(m) is the value of U_s obtained by iterating from U_s^(m-1). L_s is the coefficient matrix of the domain-specific topic matrix of the source domain, and L_t is the coefficient matrix of the domain-specific topic matrix of the target domain.

Finally, formula (8) is applied iteratively to U_s until the convergence value of U_s is obtained. In the first iteration, the initial value randomly assigned to U_s is used as U_s^(m-1).
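One iteration of formula (8) can be sketched in the same way. As before, this is a sketch under assumptions: the coefficient matrices H_s (k₀ × n_s) and L_s (k_s × n_s) are taken to be stored with one column per document, the reconstruction is assumed to be X_s^(r) = U₀ H_s + U_s L_s, and the function name and `eps` are hypothetical.

```python
import numpy as np

def update_Us(Us, U0, Ut, Hs, Ls, Xs, alpha, gamma, eps=1e-12):
    """One multiplicative update of the source domain specific topic matrix (formula (8))."""
    lam_s = 1.0 / np.linalg.norm(Xs, "fro") ** 2
    # Current reconstruction of the source domain term matrix (assumed form).
    Xs_r = U0 @ Hs + Us @ Ls
    numerator = lam_s * (Xs @ Ls.T)
    denominator = (lam_s * (Xs_r @ Ls.T)
                   + alpha * (U0 @ (U0.T @ Us))
                   + gamma * (Ut @ (Ut.T @ Us)))
    # Element-wise multiply and divide; eps avoids division by zero.
    return Us * numerator / (denominator + eps)
```

The update for U_t in formula (9) has the same structure, with the target domain matrices and the weights β and γ in place of the source domain matrices and the weights α and γ.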

3. Calculate the convergence value of the target domain specific topic matrix U_t:

The principle of this process is the same as that of calculating the convergence value of the hinge matrix U₀ or of the source domain specific topic matrix U_s in 1 or 2 above; see 1 or 2 above for details. The resulting expression for U_t is:

U_{t}^{m} = U_{t}^{m-1} \circ \frac{[λ_{t} X_{t} L_{t}^{T}]}{[λ_{t} X_{t}^{(r)} L_{t}^{T} + (β U_{0} U_{0}^{T} + γ U_{s} U_{s}^{T}) U_{t}]} \quad (9)

In formula (9), U_t^(m-1) is the value of U_t obtained in the previous iteration, and U_t^(m) is the value of U_t obtained by iterating from U_t^(m-1). To calculate the convergence value of U_t, formula (9) is applied iteratively to U_t until the convergence value of U_t is obtained.

4. Calculate the convergence value of the coefficient matrix V_s of the source domain specific topic matrix:

The principle of this process is the same as that of calculating the convergence value of the hinge matrix U₀ or of the source domain specific topic matrix U_s in 1 or 2 above; see 1 or 2 above for details. The resulting expression for V_s is:

V_{s}^{m} = V_{s}^{m-1} \circ \frac{[\bar{U}_{s}^{T} X_{s}]}{[\bar{U}_{s}^{T} \bar{U}_{s} V_{s}]} \quad (10)

In formula (10), V_s^(m-1) is the value of V_s obtained in the previous iteration, and V_s^(m) is the value of V_s obtained by iterating from V_s^(m-1). To calculate the convergence value of V_s, formula (10) is applied iteratively to V_s until the convergence value of V_s is obtained.

5. Calculate the convergence value of the coefficient matrix V_t of the target domain specific topic matrix:

The principle of this process is the same as that of calculating the convergence value of the hinge matrix U₀ or of the source domain specific topic matrix U_s in 1 or 2 above; see 1 or 2 above for details. The resulting expression for V_t is:

V_{t}^{m} = V_{t}^{m-1} \circ \frac{[\bar{U}_{t}^{T} X_{t}]}{[\bar{U}_{t}^{T} \bar{U}_{t} V_{t}]}

In the above formula, V_t^(m-1) is the value of V_t obtained in the previous iteration, and V_t^(m) is the value of V_t obtained by iterating from V_t^(m-1). To calculate the convergence value of V_t, the above formula is applied iteratively to V_t until the convergence value of V_t is obtained.
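The coefficient matrix updates have the form of the standard multiplicative NMF update for the coefficient factor, and can be sketched as follows. The text does not define the barred matrix; this sketch assumes it is the horizontal concatenation of the hinge matrix and the domain-specific topic matrix, for example [U₀, U_s] for the source domain, and that V is stored with one column per document. Both points, together with the function name and `eps`, are assumptions made for illustration.

```python
import numpy as np

def update_V(V, U_bar, X, eps=1e-12):
    """One multiplicative update of a coefficient matrix V, in the form of formula (10).

    U_bar is assumed to be np.hstack([U0, U_domain]) with shape m x k,
    V has shape k x n, and X has shape m x n.
    """
    numerator = U_bar.T @ X
    denominator = U_bar.T @ U_bar @ V
    # Element-wise multiply and divide; eps avoids division by zero.
    return V * numerator / (denominator + eps)
```

With the topic matrices held fixed, repeated calls such as `Vs = update_V(Vs, np.hstack([U0, Us]), Xs)` do not increase the reconstruction error ‖X_s − Ū_s V_s‖_F², which is the property the convergence discussion relies on.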

Further, to ensure that the convergence value of each parameter can be obtained from the convergence formulas of the parameters in the general objective function (formulas (7) to (10)), the method provided by the embodiment of the present invention also verifies the convergence of each convergence formula after the formulas have been determined by the above process. For ease of explanation, the verification is illustrated below with reference to formula (8), taking the convergence of the source domain specific topic matrix U_s as an example. The verification principle for the other parameters is the same as that for U_s, so the process of verifying the convergence of the other parameters is not described in detail in the embodiment of the present invention.

Specifically, before the convergence is verified, one definition, two lemmas and one theorem need to be introduced.

Definition 1: Suppose F(X, X') is an auxiliary function of Φ(X); then

Φ(X) ≤ F(X, X')

where the equality holds if and only if Φ(X) = F(X, X).

Lemma 1: Suppose F is an auxiliary function of Φ; then Φ is non-increasing under the update:

X^{(r+1)} = \arg \min_{X} F(X, X^{(r)})

Combining this with Definition 1 gives:

Φ(X^{(r+1)}) ≤ F(X^{(r+1)}, X^{(r)}) ≤ F(X^{(r)}, X^{(r)}) = Φ(X^{(r)})

Lemma 2: Let Φ(U_s) denote the sum of all terms of Φ that contain U_s. The following function is an auxiliary function of Φ(U_s):

F(U_{s}, U_{s}^{(r)}) = Φ(U_{s}^{(r)}) + (U_{s} - U_{s}^{(r)}) \nabla Φ(U_{s}^{(r)}) + \frac{1}{2} (U_{s} - U_{s}^{(r)})^{2} S(U_{s}^{(r)})

S(U_{s}^{(r)}) = \frac{[λ_{s} X_{s}^{(r)} L_{s}^{T} + (α U_{0} U_{0}^{T} + γ U_{t} U_{t}^{T}) U_{s}^{(r)}]}{[U_{s}^{(r)}]}

Theorem 1: Under the update rules of formulas (7) to (10), Φ(U₀, U_s, U_t, V_s, V_t) is a non-increasing function.

The convergence of U_s is proved as follows:

Because the purpose of optimizing the general objective function is to minimize Φ(U_s) by means of the auxiliary function F(U_s, U_s^(r)), the derivative of F(U_s, U_s^(r)) with respect to U_s is set to zero. Using Lemma 1 and Lemma 2 then gives the following equation:

U_{s}^{(r+1)} = U_{s}^{(r)} - [\nabla Φ(U_{s}^{(r)}) / S(U_{s}^{(r)})] \quad (11)

and

\nabla Φ(U_{s}^{(r)}) = [λ_{s} (X_{s}^{(r)} - X_{s}) L_{s}^{T} + α U_{0} U_{0}^{T} U_{s}^{(r)} + γ U_{t} U_{t}^{T} U_{s}^{(r)}] \quad (12)

Substituting formula (12) and the expression for S(U_s^(r)) in Lemma 2 into formula (11) yields formula (8).
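The substitution can be written out explicitly as a short worked derivation. It starts from the minimizer of the quadratic auxiliary function of Lemma 2, which is the previous iterate minus the gradient divided element-wise by S, and uses the gradient of Φ with respect to U_s with the multiplier term omitted. The symbol R is shorthand introduced here for brevity and does not appear in the original text; all products and quotients between matrices of equal size are element-wise.

```latex
% R is shorthand introduced for this sketch: R = \alpha U_0 U_0^T + \gamma U_t U_t^T
\begin{aligned}
U_s^{(r+1)}
  &= U_s^{(r)} - \frac{\nabla\Phi(U_s^{(r)})}{S(U_s^{(r)})} \\
  &= U_s^{(r)} - U_s^{(r)} \circ
     \frac{\lambda_s (X_s^{(r)} - X_s) L_s^T + R\,U_s^{(r)}}
          {\lambda_s X_s^{(r)} L_s^T + R\,U_s^{(r)}} \\
  &= U_s^{(r)} \circ
     \frac{\lambda_s X_s L_s^T}
          {\lambda_s X_s^{(r)} L_s^T + R\,U_s^{(r)}}
\end{aligned}
```

The last line is the multiplicative update of formula (8): the subtracted gradient cancels every term of the denominator except λ_s X_s L_sᵀ, which is why the additive step turns into an element-wise ratio.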

In addition, after the convergence formulas of the parameters in the general objective function have been obtained, the complexity of computing each parameter can be analyzed further. In the embodiment of the present invention, the complexity of each parameter is expressed with O.

Specifically, from the above process of deriving the convergence formulas it follows that, in each iteration, the complexity of calculating the hinge matrix U₀ is O(m × n × k₀), where n = max(n_s, n_t). Similarly, in each iteration, the complexities of calculating the source domain specific topic matrix U_s and the target domain specific topic matrix U_t are O(m × n_s × k_s) and O(m × n_t × k_t) respectively. In each iteration, the complexities of calculating the coefficient matrix V_s of the source domain specific topic matrix and the coefficient matrix V_t of the target domain specific topic matrix are O(m × n_s × (k₀+k_s)) and O(m × n_t × (k₀+k_t)) respectively.

These complexity expressions show that the complexity of the whole computation is dominated by the calculation of the coefficient matrix V_s of the source domain specific topic matrix and the coefficient matrix V_t of the target domain specific topic matrix.

It should be noted that step 206 is described only by taking the determination of the target values of the parameters in the general objective function by formulas (7) to (10) as an example. In a specific implementation, the target values may also be determined on the basis of alternating least squares, active set methods, projection methods or the like; the embodiment of the present invention does not specifically limit the way in which the target value of each parameter is determined.

207: Train a specified classification model according to the target values of the parameters and the labeled viewpoint data in the source domain, and classify the viewpoint data of the target domain with the specified classification model.

The convergence value of each parameter in the general objective function is obtained in step 206, and each of these parameters identifies features of the source domain and the target domain. For example, U_s is the source domain specific topic matrix and identifies the topics peculiar to the source domain; U_t is the target domain specific topic matrix and identifies the topics peculiar to the target domain; U₀ is the hinge matrix and identifies the topics common to the source domain and the target domain. In other words, the parameters in the general objective function identify the features of the source domain and the target domain, so once the convergence values of the parameters have been obtained, the features of the source domain and the target domain are available. Furthermore, the source domain includes some labeled viewpoint data, whereas the target domain may not include any labeled viewpoint data. The specified classification model can therefore be trained according to the target values of the parameters and the labeled viewpoint data in the source domain, and the viewpoint data of the target domain can then be classified by the specified classification model obtained by training.

Specifically, the specified classification model can be trained by combining the convergence value of the source domain specific topic matrix U_s, the convergence value of the target domain specific topic matrix U_t and the labeled viewpoint data in the source domain. The process of training the specified classification model according to the target values of the parameters in the general objective function and the labeled viewpoint data in the source domain is not described in detail in the embodiment of the present invention and can be implemented with existing model training methods.

Further, after the classification model has been obtained by training, if any document in the target domain subsequently needs to be classified, that is, if the sentiment polarity of the document needs to be determined, the document can be input to the specified classification model obtained by training, and the sentiment polarity of the document is determined from the output of that model.

Specifically, suppose the specified classification model obtained by training outputs "+1" and "-1" to indicate that the sentiment polarity of a document is positive and negative respectively. If a document is input to the model and the output is "+1", the sentiment polarity of the document is determined to be positive; if the output is "-1", the sentiment polarity of the document is determined to be negative.

The specified classification model can take many concrete forms. For example, the specified classification model may be an SVM (Support Vector Machine).

It should be noted that the above embodiment is described only by taking one source domain and one target domain as an example. In a specific implementation, the numbers of source domains and target domains may take other values.

Optionally, to verify the accuracy of the domain-adaptive viewpoint data classification realized by steps 201 to 207, the embodiment of the present invention also verified the method of steps 201 to 207 experimentally.

Specifically, four domains were selected for the experimental verification: books (B), DVDs (Digital Versatile Discs) (D), electronic products (E) and kitchen products (K). In the experiment, a viewpoint label of +1 or -1 was assigned to each item of viewpoint data in the four domains. A viewpoint label of +1 indicates that the sentiment polarity of the viewpoint is positive, and a viewpoint label of -1 indicates that the sentiment polarity of the viewpoint is negative. Each domain includes 1000 positive viewpoint data items and 1000 negative viewpoint data items, together with some unlabeled viewpoint data. For the domain-adaptive viewpoint data classification, 12 classification tasks can be constructed: D → B, E → B, K → B, K → E, D → E, B → E, B → D, K → D, E → D, B → K, D → K and E → K, where the domain before the arrow is the source domain and the domain after the arrow is the target domain. Table 1 shows the composition of the experimental data.

Table 1

Domain	Training data	Test data	Unlabeled data	Proportion of negative data
Books	1600	400	4465	50%
DVDs	1600	400	5945	50%
Electronic products	1600	400	5681	50%
Kitchen products	1600	400	3586	50%

The data listed in Table 1 are the viewpoint data of the four selected domains. Each domain contains training data, test data and unlabeled data, and negative data make up 50% of the data in each domain. In the 12 classification tasks constructed, each domain can serve either as the source domain or as the target domain. When a domain is selected as the source domain, its training data are used to build the specified classification model; when a domain is selected as the target domain, its test data are used to test the specified classification model obtained by training. Therefore, to ensure the accuracy of the experiment, the embodiment of the present invention uses the same amount of training data and test data for every domain: as shown in Table 1, each domain has 1600 training data items and 400 test data items.
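The way the 12 tasks arise from the four domains can be sketched in a few lines of Python: every ordered pair of distinct domains gives one source-to-target task. The variable names are illustrative, and the ASCII arrow `->` stands in for the arrow used in the text.

```python
from itertools import permutations

# Domain abbreviations as used in the experiment: books, DVDs, electronics, kitchen.
DOMAINS = ["B", "D", "E", "K"]

# Each ordered pair (source, target) of distinct domains is one adaptation task.
tasks = [f"{source}->{target}" for source, target in permutations(DOMAINS, 2)]

print(len(tasks))  # 4 * 3 ordered pairs
print(tasks)
```

Since a domain is never paired with itself, four domains give 4 × 3 = 12 tasks, which matches the list in the experiment description.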

To show intuitively the advantage of the method provided by this embodiment in domain-adaptive viewpoint data classification, the experiments on the viewpoint data of the four selected domains also included a baseline algorithm, SCL (Structural Correspondence Learning), MCT (Multi-label Consensus Training), SFA (Spectral Feature Alignment), SDA (Stacked Denoising Auto-encoders), CODA (the state-of-the-art domain adaptation method proposed by Chen et al. [2011]) and PJNMF (Linking Heterogeneous Input Features via Pivots via Joint Non-negative Matrix Factorization, an algorithm that links different input features through a hinge on the basis of non-negative matrix factorization). PJNMF is the method provided by the embodiment of the present invention.

Table 2 shows the classification results obtained with the various algorithms.

Table 2

Task	Baseline	SCL	MCT	SFA	SDA	CODA	PJNMF
B→D	76.41±0.31	78.68±0.26	78.92±0.23	80.58±0.18	81.12±0.17	80.64±0.16	81.85±0.17
E→D	71.95±0.19	75.51±0.27	72.67±0.35	76.02±0.12	76.63±0.25	76.10±0.23	77.35±0.20
K→D	73.35±0.20	76.88±0.29	74.05±0.28	76.55±0.16	76.85±0.28	76.62±0.21	78.62±0.28
D→B	73.8±0.24	78.27±0.18	75.67±0.30	77.58±0.23	78.22±0.33	77.83±0.17	79.27±0.25
E→B	72.14±0.26	75.06±0.21	72.90±0.27	75.38±0.27	75.50±0.19	75.46±0.25	76.30±0.22
K→B	71.25±0.18	73.08±0.24	74.01±0.31	74.15±0.34	74.47±0.25	75.41±0.22	75.87±0.23
B→E	71.75±0.32	75.21±0.18	75.62±0.26	75.35±0.26	75.77±0.27	76.34±0.18	76.28±0.27
D→E	72.38±0.20	75.95±0.25	76.82±0.34	77.13±0.23	77.65±0.22	77.94±0.20	77.86±0.24
K→E	83.35±0.13	85.18±0.15	84.24±0.25	85.01±0.23	84.65±0.34	84.50±0.32	85.92±0.32
B→K	74.44±0.30	77.06±0.21	78.31±0.22	78.28±0.25	78.54±0.23	78.35±0.26	79.15±0.29
D→K	75.11±0.33	78.96±0.19	80.57±0.24	80.35±0.29	80.77±0.31	80.65±0.24	81.26±0.33
E→K	85.11±0.13	85.08±0.16	85.33±0.26	85.91±0.19	87.25±0.20	86.08±0.27	86.37±0.21
Average	75.09±0.23	77.91±0.20	77.43±0.28	78.52±0.23	78.95±0.25	78.83±0.23	79.68±0.25

The data in Table 2 are given in the form "accuracy ± standard deviation", and the bold entries in Table 2 denote the best experimental result obtained among these algorithms. The data in Table 2 show that the PJNMF method proposed by the embodiment of the present invention performs well in all 12 tasks, and that in almost all tasks its classification results are better than those obtained by the other algorithms.

Further, the embodiment of the present invention also analyzed the convergence of the method it provides. Fig. 3 shows a convergence curve obtained from the training data with the method provided by the embodiment of the present invention. The X axis in Fig. 3 represents the number of iterations, and the Y axis represents the value of the general objective function. Fig. 3 shows that the general objective function obtained with the method provided by the embodiment of the present invention converges quickly; in general, it converges within 200 iterations.

In addition, the embodiment of the present invention further studied the similarity between the source domain and the target domain. Experiments show that the A-distance can be used to identify the difference between two domains. Let A-distance = 2(1 − 2ε), where ε denotes the error of the trained specified model (for example, the error of the SVM trained in step 207). Fig. 4 shows the experimental results obtained by testing the difference of every pair of domains. The horizontal axis in Fig. 4 is the A-distance computed from bag-of-words data, and the vertical axis is the A-distance computed with the PJNMF method provided by the embodiment of the present invention. The experimental results show that, with the method provided by the embodiment of the present invention, the A-distance tends to increase. This further demonstrates that, when the parameters in the general objective function are determined by the method provided by the embodiment of the present invention, the domain-specific topics of the source domain or the target domain are obtained only from the topics peculiar to the source domain or the target domain and not from the topics shared between the source domain and the target domain; and that the hinge topics are obtained only from the topics shared between the source domain and the target domain and not from the domain-specific topics of the source domain or the target domain.
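The A-distance expression given above is simple enough to state as a small Python helper. The function name and the range check are illustrative assumptions; only the expression 2(1 − 2ε) comes from the text.

```python
def a_distance(error):
    """A-distance as defined in the text: 2 * (1 - 2 * error).

    `error` is the error rate of the trained model, expected in [0, 1].
    """
    if not 0.0 <= error <= 1.0:
        raise ValueError("error must lie between 0 and 1")
    return 2.0 * (1.0 - 2.0 * error)
```

Under this definition an error of 0.5 gives an A-distance of 0 and an error of 0 gives the maximum value of 2, so a lower error corresponds to a larger A-distance.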

Fig. 5 is a schematic structural diagram of a domain-adaptive viewpoint data classification device according to an exemplary embodiment. The device can be used to perform the domain-adaptive viewpoint data classification method provided by the embodiment corresponding to Fig. 1 or Fig. 2. Referring to Fig. 5, the domain-adaptive viewpoint data classification device includes:

A first determining module 501, configured to determine the source domain term matrix according to the relationship between the documents and the terms of the source domain;

A second determining module 502, configured to determine the target domain term matrix according to the relationship between the documents and the terms of the target domain;

A third determining module 503, configured to determine the source domain objective function according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain;

A fourth determining module 504, configured to determine the target domain objective function according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix;

A fifth determining module 505, configured to determine the general objective function according to the source domain objective function and the target domain objective function;

A sixth determining module 506, configured to determine the desired value of each parameter in the general objective function;

A training module 507, configured to train the specified classification model according to the desired value of each parameter and the labeled viewpoint data in the source domain;

A classification module 508, configured to classify the viewpoint data of the target domain using the specified classification model obtained by training.
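As an illustration of what the first and second determining modules produce, the following sketch builds a term matrix from a small list of documents. It rests on assumptions that this passage does not state: the relationship between documents and terms is taken to be a raw occurrence count, rows are terms and columns are documents, and tokenization is a plain whitespace split. The function name `term_matrix` is hypothetical.

```python
from collections import Counter


def term_matrix(documents):
    """Build a term-by-document count matrix (assumed layout: rows = terms)."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    row_of = {word: i for i, word in enumerate(vocabulary)}
    # One row per term, one column per document, all counts start at zero.
    matrix = [[0] * len(documents) for _ in vocabulary]
    for column, tokens in enumerate(tokenized):
        for word, count in Counter(tokens).items():
            matrix[row_of[word]][column] = count
    return vocabulary, matrix


# Hypothetical source-domain reviews, used only to show the shape of the result.
vocab, X_s = term_matrix(["good good phone", "bad phone"])
for word, row in zip(vocab, X_s):
    print(word, row)
```

The same function applied to target domain documents would give the target domain term matrix; a weighted scheme could replace the raw counts without changing the layout.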

In the device provided by the embodiment of the present invention, the determined general objective function depends on the source domain specific topic matrix, the target domain specific topic matrix, and the hinge matrix representing the topics shared between the source domain and the target domain. The device thereby provides a viewpoint data classification method that achieves domain adaptation through the topics shared between the source domain and the target domain. Because the shared topics reduce the difference between the source domain and the target domain, the accuracy of the classification results can be ensured when classification is performed with this domain-adaptive viewpoint data classification method.

In another embodiment, the third determining module 503 is configured to determine the source domain objective function by the following formula, according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain:

O_{s} = {| | X_{s} - [U_{0}, U_{s}] V_{s} | |}_{F}^{2}

In the formula, O_s is the source domain objective function, X_s is the source domain term matrix, U₀ is the hinge matrix, U_s is the source domain specific topic matrix, V_s is the coefficient matrix of the source domain specific topic matrix, and ‖·‖_F denotes the Frobenius norm;

The fourth determining module 504 is configured to determine the target domain objective function by the following formula, according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix:

O_{t} = {| | X_{t} - [U_{0}, U_{t}] V_{t} | |}_{F}^{2}
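The two objective functions above have the same form, so a single helper can evaluate either one. The sketch below uses NumPy and assumes that [U₀, U_s] denotes horizontal concatenation of the hinge matrix and the domain-specific topic matrix (topics as columns); the function name `domain_objective` and the matrix sizes are illustrative only.

```python
import numpy as np


def domain_objective(X, U0, U_specific, V):
    """Squared Frobenius norm of X - [U0, U_specific] V for one domain."""
    U = np.hstack([U0, U_specific])  # assumed meaning of [U0, U_s]
    return float(np.linalg.norm(X - U @ V, ord="fro") ** 2)


rng = np.random.default_rng(0)
U0 = rng.random((6, 2))   # hinge (shared) topics
Us = rng.random((6, 3))   # source domain specific topics
Vs = rng.random((5, 4))   # coefficients for 4 source documents
Xs = np.hstack([U0, Us]) @ Vs

print(domain_objective(Xs, U0, Us, Vs))        # exact reconstruction
print(domain_objective(Xs + 1.0, U0, Us, Vs))  # perturbed term matrix
```

Calling the helper with X_t, U₀, U_t and V_t gives O_t in the same way, which makes it visible that U₀ is the only factor the two domains share.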

In another embodiment, the fifth determining module 505 is configured to determine the general objective function by the following formulas, according to the source domain objective function and the target domain objective function:

Φ = λ_{s} {| | X_{s} - [U_{0}, U_{s}] V_{s} | |}_{F}^{2} + λ_{t} {| | X_{t} - [U_{0}, U_{t}] V_{t} | |}_{F}^{2} + D (U_{0}, U_{s}, U_{t}, V_{s}, V_{t})

\begin{matrix} D (U_{0}, U_{s}, U_{t}, V_{s}, V_{t}) = α {| | U_{0}^{T} U_{s} | |}_{F}^{2} + β {| | U_{0}^{T} U_{t} | |}_{F}^{2} + γ {| | U_{s}^{T} U_{t} | |}_{F}^{2} \\ + Tr (A^{U_{0}} U_{0}^{T}) + Tr (A^{U_{s}} U_{s}^{T}) + Tr (A^{U_{t}} U_{t}^{T}) + Tr (A^{V_{s}} V_{s}^{T}) + Tr (A^{V_{t}} V_{t}^{T}) \end{matrix}

λ_{s} = {| | X_{s} | |}_{F}^{- 2}

λ_{t} = {| | X_{t} | |}_{F}^{- 2}
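The general objective function can be evaluated numerically as sketched below. The sketch makes two assumptions: [U₀, U] is horizontal concatenation, and the trace terms Tr(A U^T) are left out, because the matrices A are not defined in this passage. The names `fro2` and `general_objective` are hypothetical.

```python
import numpy as np


def fro2(M):
    """Squared Frobenius norm."""
    return float(np.sum(M * M))


def general_objective(Xs, Xt, U0, Us, Ut, Vs, Vt, alpha, beta, gamma):
    """Weighted reconstruction errors plus the three cross-topic penalties.

    The trace terms of D are omitted in this sketch (see the note above).
    """
    lam_s = 1.0 / fro2(Xs)  # lambda_s = ||X_s||_F^(-2)
    lam_t = 1.0 / fro2(Xt)  # lambda_t = ||X_t||_F^(-2)
    fit = (lam_s * fro2(Xs - np.hstack([U0, Us]) @ Vs)
           + lam_t * fro2(Xt - np.hstack([U0, Ut]) @ Vt))
    penalty = (alpha * fro2(U0.T @ Us)
               + beta * fro2(U0.T @ Ut)
               + gamma * fro2(Us.T @ Ut))
    return fit + penalty


rng = np.random.default_rng(0)
Xs, Xt = rng.random((6, 4)), rng.random((6, 3))
U0, Us, Ut = rng.random((6, 2)), rng.random((6, 2)), rng.random((6, 2))
Vs, Vt = rng.random((4, 4)), rng.random((4, 3))
print(general_objective(Xs, Xt, U0, Us, Ut, Vs, Vt, 1.0, 1.0, 1.0))
```

Because λ_s and λ_t normalize each reconstruction error by the size of its term matrix, neither domain dominates the sum merely by having more documents. The penalties weighted by α, β and γ grow when hinge topics and domain-specific topics overlap.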

In another embodiment, the sixth determining module 506 includes:

An allocation unit, configured to randomly assign a non-negative value to each parameter as the initial value of that parameter;

A computing unit, configured to calculate the converged value of each parameter according to the initial value of that parameter, and to take the converged value of each parameter as the desired value of that parameter.

The computing unit is configured to:

Based on the initial value of U₀, calculate the converged value of U₀ according to

U_{0}^{m} = U_{0}^{m - 1} \frac{[λ_{s} X_{s} H_{s}^{T} + λ_{t} X_{t} H_{t}^{T}]}{[λ_{s} X_{s}^{(r)} H_{s}^{T} + λ_{t} X_{t}^{(r)} H_{t}^{T} + (α U_{s} U_{s}^{T} + β U_{t} U_{t}^{T}) U_{0}]}

Based on the initial value of U_s, calculate the converged value of U_s according to

U_{s}^{m} = U_{s}^{m - 1} \frac{[λ_{s} X_{s} L_{s}^{T}]}{[λ_{s} X_{s}^{(r)} L_{s}^{T} + (α U_{0} U_{0}^{T} + γ U_{t} U_{t}^{T}) U_{s}]}

Based on the initial value of U_t, calculate the converged value of U_t according to

U_{t}^{m} = U_{t}^{m - 1} \frac{[λ_{t} X_{t} H_{t}^{T}]}{[λ_{t} X_{t}^{(r)} H_{t}^{T} + (β U_{0} U_{0}^{T} + γ U_{s} U_{s}^{T}) U_{t}]}
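The allocation and computing units follow a common pattern: start from random non-negative values, then apply multiplicative updates until the objective stops changing. The sketch below shows that pattern only. It does not implement the update rules above, because H, L and X^{(r)} are defined elsewhere in the document; instead it substitutes the standard multiplicative updates for plain non-negative matrix factorization X ≈ UV. The element-wise reading of the product and quotient, the tolerance, and the name `factorize` are all assumptions.

```python
import numpy as np


def factorize(X, k, max_iter=200, tol=1e-6, seed=0):
    """Plain NMF by multiplicative updates (a stand-in for the patent's rules)."""
    rng = np.random.default_rng(seed)
    # Allocation step: random non-negative initial values.
    U = rng.random((X.shape[0], k))
    V = rng.random((k, X.shape[1]))
    tiny = 1e-12  # guards against division by zero
    history = []
    for _ in range(max_iter):
        # Element-wise multiply by a ratio of non-negative matrices,
        # so U and V stay non-negative at every iteration.
        U *= (X @ V.T) / (U @ V @ V.T + tiny)
        V *= (U.T @ X) / (U.T @ U @ V + tiny)
        history.append(float(np.linalg.norm(X - U @ V, ord="fro") ** 2))
        # Stop once the relative change of the objective is below tol.
        if len(history) > 1 and history[-2] - history[-1] <= tol * max(history[-2], tiny):
            break
    return U, V, history


X = np.random.default_rng(1).random((8, 6))
U, V, history = factorize(X, k=3)
print(len(history), history[0], history[-1])
```

The recorded `history` plays the role of the convergence curve in Fig. 3: the iteration count is on one axis and the objective value on the other. The default cap of 200 iterations mirrors the iteration count mentioned in the convergence analysis.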

Fig. 6 is a schematic structural diagram of a server according to an exemplary embodiment. The server can be used to perform the domain-adaptive viewpoint data classification method provided by the embodiment corresponding to Fig. 1 or Fig. 2. Referring to Fig. 6, the server 600 includes a processing component 622, which further includes one or more processors, and memory resources represented by a memory 632 for storing instructions executable by the processing component 622, such as application programs. The application programs stored in the memory 632 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 622 is configured to execute the instructions so as to perform the domain-adaptive viewpoint data classification method provided by the embodiment corresponding to Fig. 1 or Fig. 2.

The server 600 may also include a power supply component 626 configured to perform power management of the server 600, a wired or wireless network interface 650 configured to connect the server 600 to a network, and an input/output (I/O) interface 658. The server 600 can operate based on an operating system stored in the memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.

One or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs contain instructions for performing the following operations:

Determining the source domain objective function according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain;

Determining the target domain objective function according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix;

Determining the general objective function according to the source domain objective function and the target domain objective function;

Determining the desired value of each parameter in the general objective function;

Training the specified classification model according to the desired value of each parameter and the labeled viewpoint data in the source domain, and classifying the viewpoint data of the target domain using the specified classification model obtained by training.

Assuming the above is a first possible embodiment, in a second possible embodiment provided on the basis of the first possible embodiment, the memory of the server further contains instructions for performing the following operations: determining the source domain objective function according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain, including:

O_{s} = {| | X_{s} - [U_{0}, U_{s}] V_{s} | |}_{F}^{2}

O_{t} = {| | X_{t} - [U_{0}, U_{t}] V_{t} | |}_{F}^{2}

In a third possible embodiment provided on the basis of the second possible embodiment, the memory of the server further contains instructions for performing the following operations: determining the general objective function according to the source domain objective function and the target domain objective function, including:

Φ = λ_{s} {| | X_{s} - [U_{0}, U_{s}] V_{s} | |}_{F}^{2} + λ_{t} {| | X_{t} - [U_{0}, U_{t}] V_{t} | |}_{F}^{2} + D (U_{0}, U_{s}, U_{t}, V_{s}, V_{t})

\begin{matrix} D (U_{0}, U_{s}, U_{t}, V_{s}, V_{t}) = α {| | U_{0}^{T} U_{s} | |}_{F}^{2} + β {| | U_{0}^{T} U_{t} | |}_{F}^{2} + γ {| | U_{s}^{T} U_{t} | |}_{F}^{2} \\ + Tr (A^{U_{0}} U_{0}^{T}) + Tr (A^{U_{s}} U_{s}^{T}) + Tr (A^{U_{t}} U_{t}^{T}) + Tr (A^{V_{s}} V_{s}^{T}) + Tr (A^{V_{t}} V_{t}^{T}) \end{matrix}

λ_{s} = {| | X_{s} | |}_{F}^{- 2}

λ_{t} = {| | X_{t} | |}_{F}^{- 2}

In a fourth possible embodiment provided on the basis of the first or the third possible embodiment, the memory of the server further contains instructions for performing the following operations: determining the desired value of each parameter in the general objective function, including:

In a fifth possible embodiment provided on the basis of the fourth possible embodiment, the memory of the server further contains instructions for performing the following operations: the parameters in the general objective function include U₀, U_s, U_t, V_s and V_t; calculating the converged value of each parameter according to the initial value of that parameter, including:

Based on the initial value of U₀, calculate the converged value of U₀ according to

U_{0}^{m} = U_{0}^{m - 1} \frac{[λ_{s} X_{s} H_{s}^{T} + λ_{t} X_{t} H_{t}^{T}]}{[λ_{s} X_{s}^{(r)} H_{s}^{T} + λ_{t} X_{t}^{(r)} H_{t}^{T} + (α U_{s} U_{s}^{T} + β U_{t} U_{t}^{T}) U_{0}]}

Based on the initial value of U_s, calculate the converged value of U_s according to

U_{s}^{m} = U_{s}^{m - 1} \frac{[λ_{s} X_{s} L_{s}^{T}]}{[λ_{s} X_{s}^{(r)} L_{s}^{T} + (α U_{0} U_{0}^{T} + γ U_{t} U_{t}^{T}) U_{s}]}

Based on the initial value of U_t, calculate the converged value of U_t according to

U_{t}^{m} = U_{t}^{m - 1} \frac{[λ_{t} X_{t} H_{t}^{T}]}{[λ_{t} X_{t}^{(r)} H_{t}^{T} + (β U_{0} U_{0}^{T} + γ U_{s} U_{s}^{T}) U_{t}]}

In the server provided by the embodiment of the present invention, the determined general objective function depends on the source domain specific topic matrix, the target domain specific topic matrix, and the hinge matrix representing the topics shared between the source domain and the target domain. The server thereby provides a viewpoint data classification method that achieves domain adaptation through the topics shared between the source domain and the target domain. Because the shared topics reduce the difference between the source domain and the target domain, the accuracy of the classification results can be ensured when classification is performed with this domain-adaptive viewpoint data classification method.

Fig. 7 is a schematic structural diagram of a terminal according to an exemplary embodiment. The terminal can be used to perform the domain-adaptive viewpoint data classification method provided by the embodiment corresponding to Fig. 1 or Fig. 2. Specifically:

The terminal 700 may include components such as an RF (Radio Frequency) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (Wireless Fidelity) module 170, a processor 180 including one or more processing cores, and a power supply 190. Those skilled in the art will understand that the terminal structure shown in Fig. 7 does not constitute a limitation on the terminal, which may include more or fewer components than shown, combine some components, or use a different arrangement of components. Wherein:

The RF circuit 110 can be used to receive and send signals while sending and receiving information or during a call. In particular, after receiving downlink information from a base station, it passes the information to one or more processors 180 for processing; in addition, it sends uplink data to the base station. Generally, the RF circuit 110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 110 can communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System of Mobile communication), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.

The memory 120 can be used to store software programs and modules. The processor 180 executes various functional applications and data processing by running the software programs and modules stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area. The program storage area can store the operating system, the application programs required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area can store data created through use of the terminal 700 (such as audio data or a phone book), and the like. In addition, the memory 120 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Accordingly, the memory 120 may also include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.

The input unit 130 can be used to receive input numeric or character information, and to generate keyboard, mouse, joystick, optical, or trackball signal input related to user settings and function control. Specifically, the input unit 130 may include a touch-sensitive surface 131 and other input devices 132. The touch-sensitive surface 131, also called a touch display screen or a touchpad, can collect the user's touch operations on or near it (such as operations performed by the user on or near the touch-sensitive surface 131 with a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program. Optionally, the touch-sensitive surface 131 may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch, detects the signal produced by the touch operation, and passes the signal to the touch controller. The touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 180, and can receive and execute commands sent by the processor 180. Furthermore, the touch-sensitive surface 131 can be implemented in several types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch-sensitive surface 131, the input unit 130 may also include other input devices 132. Specifically, the other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and a power key), a trackball, a mouse, a joystick, and the like.

The display unit 140 can be used to display information input by the user or provided to the user, as well as the various graphical user interfaces of the terminal 700. These graphical user interfaces can be composed of graphics, text, icons, video, and any combination thereof. The display unit 140 may include a display panel 141, which can optionally be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 131 may cover the display panel 141. When the touch-sensitive surface 131 detects a touch operation on or near it, it passes the operation to the processor 180 to determine the type of the touch event, and the processor 180 then provides the corresponding visual output on the display panel 141 according to the type of the touch event. Although in Fig. 7 the touch-sensitive surface 131 and the display panel 141 implement the input and output functions as two independent components, in some embodiments the touch-sensitive surface 131 and the display panel 141 may be integrated to implement the input and output functions.

The terminal 700 may also include at least one sensor 150, such as an optical sensor, a motion sensor, and other sensors. Specifically, the optical sensor may include an ambient light sensor and a proximity sensor. The ambient light sensor can adjust the brightness of the display panel 141 according to the brightness of the ambient light, and the proximity sensor can turn off the display panel 141 and/or the backlight when the terminal 700 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in all directions (generally three axes), and can detect the magnitude and direction of gravity when stationary. It can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that the terminal 700 may also be equipped with, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described here.

The audio circuit 160, a speaker 161, and a microphone 162 can provide an audio interface between the user and the terminal 700. The audio circuit 160 can convert received audio data into an electrical signal and transmit it to the speaker 161, which converts it into a sound signal for output. Conversely, the microphone 162 converts a collected sound signal into an electrical signal, which the audio circuit 160 receives and converts into audio data. After the audio data is output to the processor 180 for processing, it is sent through the RF circuit 110 to, for example, another terminal, or output to the memory 120 for further processing. The audio circuit 160 may also include an earphone jack to provide communication between a peripheral earphone and the terminal 700.

WiFi is a short-range wireless transmission technology. Through the WiFi module 170, the terminal 700 can help the user send and receive e-mail, browse web pages, access streaming media, and so on, providing the user with wireless broadband Internet access. Although Fig. 7 shows the WiFi module 170, it can be understood that the module is not an essential part of the terminal 700 and can be omitted as needed without changing the essence of the invention.

The processor 180 is the control center of the terminal 700. It connects all parts of the mobile phone through various interfaces and lines, and performs the various functions of the terminal 700 and processes data by running or executing the software programs and/or modules stored in the memory 120 and calling the data stored in the memory 120, thereby monitoring the mobile phone as a whole. Optionally, the processor 180 may include one or more processing cores. Preferably, the processor 180 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 180.

The terminal 700 also includes a power supply 190 (such as a battery) that supplies power to all components. Preferably, the power supply may be logically connected to the processor 180 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The power supply 190 may also include any components such as one or more direct-current or alternating-current power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.

Although not shown, the terminal 700 may also include a camera, a Bluetooth module, and the like, which are not described here. Specifically, in this embodiment, the display unit of the terminal is a touch-screen display, and the terminal also includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors. The one or more programs contain instructions for performing the following operations:

Assuming the above is a first possible embodiment, in a second possible embodiment provided on the basis of the first possible embodiment, the memory of the terminal further contains instructions for performing the following operations:

Determining the source domain objective function according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain, including:

O_{s} = {| | X_{s} - [U_{0}, U_{s}] V_{s} | |}_{F}^{2}

O_{t} = {| | X_{t} - [U_{0}, U_{t}] V_{t} | |}_{F}^{2}

In a third possible embodiment provided on the basis of the second possible embodiment, the memory of the terminal further contains instructions for performing the following operations: determining the general objective function according to the source domain objective function and the target domain objective function, including:

Φ = λ_{s} {| | X_{s} - [U_{0}, U_{s}] V_{s} | |}_{F}^{2} + λ_{t} {| | X_{t} - [U_{0}, U_{t}] V_{t} | |}_{F}^{2} + D (U_{0}, U_{s}, U_{t}, V_{s}, V_{t})

\begin{matrix} D (U_{0}, U_{s}, U_{t}, V_{s}, V_{t}) = α {| | U_{0}^{T} U_{s} | |}_{F}^{2} + β {| | U_{0}^{T} U_{t} | |}_{F}^{2} + γ {| | U_{s}^{T} U_{t} | |}_{F}^{2} \\ + Tr (A^{U_{0}} U_{0}^{T}) + Tr (A^{U_{s}} U_{s}^{T}) + Tr (A^{U_{t}} U_{t}^{T}) + Tr (A^{V_{s}} V_{s}^{T}) + Tr (A^{V_{t}} V_{t}^{T}) \end{matrix}

λ_{s} = {| | X_{s} | |}_{F}^{- 2}

λ_{t} = {| | X_{t} | |}_{F}^{- 2}

In a fourth possible embodiment provided on the basis of the first or the third possible embodiment, the memory of the terminal further contains instructions for performing the following operations: determining the desired value of each parameter in the general objective function, including:

In a fifth possible embodiment provided on the basis of the fourth possible embodiment, the memory of the terminal further contains instructions for performing the following operations: the parameters in the general objective function include U₀, U_s, U_t, V_s and V_t; calculating the converged value of each parameter according to the initial value of that parameter, including:

Based on the initial value of U₀, calculate the converged value of U₀ according to

U_{0}^{m} = U_{0}^{m - 1} \frac{[λ_{s} X_{s} H_{s}^{T} + λ_{t} X_{t} H_{t}^{T}]}{[λ_{s} X_{s}^{(r)} H_{s}^{T} + λ_{t} X_{t}^{(r)} H_{t}^{T} + (α U_{s} U_{s}^{T} + β U_{t} U_{t}^{T}) U_{0}]}

Based on the initial value of U_s, calculate the converged value of U_s according to

U_{s}^{m} = U_{s}^{m - 1} \frac{[λ_{s} X_{s} L_{s}^{T}]}{[λ_{s} X_{s}^{(r)} L_{s}^{T} + (α U_{0} U_{0}^{T} + γ U_{t} U_{t}^{T}) U_{s}]}

Based on the initial value of U_t, calculate the converged value of U_t according to

U_{t}^{m} = U_{t}^{m - 1} \frac{[λ_{t} X_{t} H_{t}^{T}]}{[λ_{t} X_{t}^{(r)} H_{t}^{T} + (β U_{0} U_{0}^{T} + γ U_{s} U_{s}^{T}) U_{t}]}

In the terminal provided by the embodiment of the present invention, the determined general objective function depends on the source domain specific topic matrix, the target domain specific topic matrix, and the hinge matrix representing the topics shared between the source domain and the target domain. The terminal thereby provides a viewpoint data classification method that achieves domain adaptation through the topics shared between the source domain and the target domain. Because the shared topics reduce the difference between the source domain and the target domain, the accuracy of the classification results can be ensured when classification is performed with this domain-adaptive viewpoint data classification method.

An embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium may be the one included in the memory in the above embodiment, or it may exist separately without being assembled into the terminal. The computer-readable storage medium stores one or more programs, which are used by one or more processors to perform the domain-adaptive viewpoint data classification method, the method including:

O_{s} = {| | X_{s} - [U_{0}, U_{s}] V_{s} | |}_{F}^{2}

O_{t} = {| | X_{t} - [U_{0}, U_{t}] V_{t} | |}_{F}^{2}

Φ = λ_{s} {| | X_{s} - [U_{0}, U_{s}] V_{s} | |}_{F}^{2} + λ_{t} {| | X_{t} - [U_{0}, U_{t}] V_{t} | |}_{F}^{2} + D (U_{0}, U_{s}, U_{t}, V_{s}, V_{t})

\begin{matrix} D (U_{0}, U_{s}, U_{t}, V_{s}, V_{t}) = α {| | U_{0}^{T} U_{s} | |}_{F}^{2} + β {| | U_{0}^{T} U_{t} | |}_{F}^{2} + γ {| | U_{s}^{T} U_{t} | |}_{F}^{2} \\ + Tr (A^{U_{0}} U_{0}^{T}) + Tr (A^{U_{s}} U_{s}^{T}) + Tr (A^{U_{t}} U_{t}^{T}) + Tr (A^{V_{s}} V_{s}^{T}) + Tr (A^{V_{t}} V_{t}^{T}) \end{matrix}

λ_{s} = {| | X_{s} | |}_{F}^{- 2}

λ_{t} = {| | X_{t} | |}_{F}^{- 2}

Based on the initial value of U₀, calculate the converged value of U₀ according to

U_{0}^{m} = U_{0}^{m - 1} \frac{[λ_{s} X_{s} H_{s}^{T} + λ_{t} X_{t} H_{t}^{T}]}{[λ_{s} X_{s}^{(r)} H_{s}^{T} + λ_{t} X_{t}^{(r)} H_{t}^{T} + (α U_{s} U_{s}^{T} + β U_{t} U_{t}^{T}) U_{0}]}

Based on the initial value of U_s, calculate the converged value of U_s according to

U_{s}^{m} = U_{s}^{m - 1} \frac{[λ_{s} X_{s} L_{s}^{T}]}{[λ_{s} X_{s}^{(r)} L_{s}^{T} + (α U_{0} U_{0}^{T} + γ U_{t} U_{t}^{T}) U_{s}]}

Based on the initial value of U_t, calculate the converged value of U_t according to

U_{t}^{m} = U_{t}^{m - 1} \frac{[λ_{t} X_{t} H_{t}^{T}]}{[λ_{t} X_{t}^{(r)} H_{t}^{T} + (β U_{0} U_{0}^{T} + γ U_{s} U_{s}^{T}) U_{t}]}

In the computer-readable storage medium provided by the embodiment of the present invention, the determined general objective function depends on the source domain specific topic matrix, the target domain specific topic matrix, and the hinge matrix representing the topics shared between the source domain and the target domain. The storage medium thereby provides a viewpoint data classification method that achieves domain adaptation through the topics shared between the source domain and the target domain. Because the shared topics reduce the difference between the source domain and the target domain, the accuracy of the classification results can be ensured when classification is performed with this domain-adaptive viewpoint data classification method.

An embodiment of the present invention provides a graphical user interface used on a terminal. The terminal includes a touch-screen display, a memory, and one or more processors for executing one or more programs; the graphical user interface includes:

In the graphical user interface provided by the embodiment of the present invention, the determined general objective function depends on the source domain specific topic matrix, the target domain specific topic matrix, and the hinge matrix representing the topics shared between the source domain and the target domain. The graphical user interface thereby provides a viewpoint data classification method that achieves domain adaptation through the topics shared between the source domain and the target domain. Because the shared topics reduce the difference between the source domain and the target domain, the accuracy of the classification results can be ensured when classification is performed with this domain-adaptive viewpoint data classification method.

It should be noted that, when the domain-adaptive viewpoint data classification device provided by the above embodiment classifies viewpoint data, the division into the above functional modules is used only as an example. In practical applications, the above functions can be allocated to different functional modules as required; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the domain-adaptive viewpoint data classification device, server, and terminal provided by the above embodiments belong to the same concept as the domain-adaptive viewpoint data classification method embodiment; for their specific implementation, refer to the method embodiment, which is not repeated here.

Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium mentioned above can be a read-only memory, a magnetic disk, an optical disc, or the like.

The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present invention shall be included within the protection scope of the present invention.

Claims

1. A domain-adaptive viewpoint data classification method, characterized in that the method comprises:

2. The method according to claim 1, characterized in that determining the source domain objective function according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain comprises:

Determining the source domain objective function by the following formula, according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain:

O_{s} = {| | X_{s} - [U_{0}, U_{s}] V_{s} | |}_{F}^{2}

Determining the target domain objective function according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix comprises:

Determining the target domain objective function by the following formula, according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix:

O_{t} = {| | X_{t} - [U_{0}, U_{t}] V_{t} | |}_{F}^{2}

In the formula, O_t is the target domain objective function, X_t is the target domain term matrix, U₀ is the hinge matrix, U_t is the target domain specific topic matrix, and V_t is the coefficient matrix of the target domain specific topic matrix.

3. The method according to claim 2, characterized in that determining the general objective function according to the source domain objective function and the target domain objective function comprises:

According to the source domain objective function and the target domain objective function, determining the general objective function by the following formulas:

Φ = λ_{s} \left\| X_{s} - [U_{0}, U_{s}] V_{s} \right\|_{F}^{2} + λ_{t} \left\| X_{t} - [U_{0}, U_{t}] V_{t} \right\|_{F}^{2} + D (U_{0}, U_{s}, U_{t}, V_{s}, V_{t})

\begin{aligned} D (U_{0}, U_{s}, U_{t}, V_{s}, V_{t}) = {} & α \left\| U_{0}^{T} U_{s} \right\|_{F}^{2} + β \left\| U_{0}^{T} U_{t} \right\|_{F}^{2} + γ \left\| U_{s}^{T} U_{t} \right\|_{F}^{2} \\ & + \operatorname{Tr} (A^{U_{0}} U_{0}^{T}) + \operatorname{Tr} (A^{U_{s}} U_{s}^{T}) + \operatorname{Tr} (A^{U_{t}} U_{t}^{T}) + \operatorname{Tr} (A^{V_{s}} V_{s}^{T}) + \operatorname{Tr} (A^{V_{t}} V_{t}^{T}) \end{aligned}

λ_{s} = \left\| X_{s} \right\|_{F}^{- 2}

λ_{t} = \left\| X_{t} \right\|_{F}^{- 2}
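
The following NumPy sketch evaluates the general objective function for given factor matrices. It is a simplified illustration rather than the claimed formula in full: the matrices A that appear in the trace terms are not defined in this passage, so the sketch leaves those terms out (an assumption equivalent to setting every A to zero). The names `sq_fro` and `general_objective` and the toy matrices are hypothetical.

```python
import numpy as np

def sq_fro(M):
    """Squared Frobenius norm of a matrix."""
    return float(np.sum(M * M))

def general_objective(Xs, Xt, U0, Us, Ut, Vs, Vt, alpha, beta, gamma):
    # Weights that normalise each domain by the size of its term matrix.
    lam_s = 1.0 / sq_fro(Xs)
    lam_t = 1.0 / sq_fro(Xt)
    # Reconstruction errors of the source and target domains.
    fit_s = sq_fro(Xs - np.hstack([U0, Us]) @ Vs)
    fit_t = sq_fro(Xt - np.hstack([U0, Ut]) @ Vt)
    # Penalties on overlap between shared and domain-specific topics.
    # Assumption: the trace terms of D are omitted here.
    penalty = (alpha * sq_fro(U0.T @ Us)
               + beta * sq_fro(U0.T @ Ut)
               + gamma * sq_fro(Us.T @ Ut))
    return lam_s * fit_s + lam_t * fit_t + penalty

# Toy data: 3 terms, 2 documents per domain, one topic per topic matrix.
Xs = np.array([[1.0, 2.0], [0.0, 1.0], [3.0, 0.0]])
Xt = np.array([[2.0, 0.0], [1.0, 1.0], [0.0, 4.0]])
e1, e2, e3 = np.eye(3)[:, [0]], np.eye(3)[:, [1]], np.eye(3)[:, [2]]
Z = np.zeros((2, 2))  # zero coefficients, so each fit term equals ||X||_F^2

phi_orth = general_objective(Xs, Xt, e1, e2, e3, Z, Z, 0.5, 0.5, 0.5)
phi_overlap = general_objective(Xs, Xt, e1, e1, e3, Z, Z, 0.5, 0.5, 0.5)
```

With zero coefficients each weighted fit term equals 1, so mutually orthogonal topic matrices give a value of 2, while making U_{s} identical to U_{0} adds the penalty α.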

4. The method according to claim 1 or 3, characterized in that determining the desired value of each parameter in the general objective function comprises:

Randomly assigning a non-negative value to each of the parameters as the initial value of that parameter;

Calculating the convergence value of each parameter according to its initial value, and using the convergence value of each parameter as the desired value of that parameter.
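
The two steps of this claim (random non-negative initialization, then iteration to convergence) can be sketched as follows. As a stand-in, the sketch factorizes a single matrix with the well-known multiplicative updates for non-negative matrix factorization; these are not the update rules of the claims. The stopping test on the relative change of the error, the tolerance, and the name `fit_nonnegative` are assumptions.

```python
import numpy as np

def fit_nonnegative(X, k, tol=1e-6, max_iter=500, seed=0, eps=1e-12):
    rng = np.random.default_rng(seed)
    # Step 1: random non-negative initial values (uniform on [0, 1)).
    U = rng.random((X.shape[0], k))
    V = rng.random((k, X.shape[1]))
    history = [float(np.sum((X - U @ V) ** 2))]
    # Step 2: iterate until the error stops changing noticeably.
    for _ in range(max_iter):
        # Stand-in multiplicative updates; they keep U and V non-negative.
        U = U * (X @ V.T) / (U @ V @ V.T + eps)
        V = V * (U.T @ X) / (U.T @ U @ V + eps)
        history.append(float(np.sum((X - U @ V) ** 2)))
        # Assumed convergence test: small relative change in the error.
        if abs(history[-2] - history[-1]) <= tol * max(history[-2], eps):
            break
    return U, V, history

# Hypothetical non-negative data of rank 3.
rng = np.random.default_rng(1)
X = rng.random((8, 3)) @ rng.random((3, 10))
U, V, history = fit_nonnegative(X, k=3)
```

The values of U and V at the end of the loop play the role of the convergence values that the claim uses as desired values.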

5. The method according to claim 4, characterized in that the parameters in the general objective function include U_{0}, U_{s}, U_{t}, V_{s}, and V_{t};

Calculating the convergence value of each parameter according to its initial value comprises:

According to the initial value of U_{0}, iteratively calculating U_{0} according to the following formula:

U_{0}^{m} = U_{0}^{m - 1} \frac{[λ_{s} X_{s} H_{s}^{T} + λ_{t} X_{t} H_{t}^{T}]}{[λ_{s} X_{s}^{(r)} H_{s}^{T} + λ_{t} X_{t}^{(r)} H_{t}^{T} + (α U_{s} U_{s}^{T} + β U_{t} U_{t}^{T}) U_{0}]}

According to the initial value of U_{s}, iteratively calculating U_{s} according to the following formula:

U_{s}^{m} = U_{s}^{m - 1} \frac{[λ_{s} X_{s} L_{s}^{T}]}{[λ_{s} X_{s}^{(r)} L_{s}^{T} + (α U_{0} U_{0}^{T} + γ U_{t} U_{t}^{T}) U_{s}]}

According to the initial value of U_{t}, iteratively calculating U_{t} according to the following formula:

U_{t}^{m} = U_{t}^{m - 1} \frac{[λ_{t} X_{t} L_{t}^{T}]}{[λ_{t} X_{t}^{(r)} L_{t}^{T} + (β U_{0} U_{0}^{T} + γ U_{s} U_{s}^{T}) U_{t}]}
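
One iteration of the update for U_{0} can be sketched in NumPy as below. Several readings are assumptions, because the symbols are not defined in this passage: H_{s} and H_{t} are taken to be the rows of V_{s} and V_{t} that pair with the shared topics in U_{0}; X^{(r)} is taken to be the current reconstruction [U_{0}, U] V; the fraction is read as element-wise division; and H_{t} is transposed in the denominator so that the matrix shapes agree. The small constant `eps` and the name `update_U0` are additions for the sketch.

```python
import numpy as np

def update_U0(Xs, Xt, U0, Us, Ut, Vs, Vt, alpha, beta, eps=1e-12):
    k0 = U0.shape[1]
    # Assumption: the first k0 rows of V are the coefficients of the
    # shared topics (H); the remaining rows belong to the specific topics.
    Hs, Ht = Vs[:k0], Vt[:k0]
    lam_s = 1.0 / np.sum(Xs * Xs)
    lam_t = 1.0 / np.sum(Xt * Xt)
    # Assumption: X^(r) is the reconstruction from the current factors.
    Xs_r = np.hstack([U0, Us]) @ Vs
    Xt_r = np.hstack([U0, Ut]) @ Vt
    num = lam_s * Xs @ Hs.T + lam_t * Xt @ Ht.T
    den = (lam_s * Xs_r @ Hs.T + lam_t * Xt_r @ Ht.T
           + (alpha * Us @ Us.T + beta * Ut @ Ut.T) @ U0)
    # Element-wise multiplicative update; eps guards against division by zero.
    return U0 * num / (den + eps)

# Hypothetical factors: 6 terms, 2 shared topics, 3 specific topics each.
rng = np.random.default_rng(0)
U0 = rng.random((6, 2))
Us = rng.random((6, 3))
Ut = rng.random((6, 3))
Vs = rng.random((5, 4))
Vt = rng.random((5, 7))
Xs = np.hstack([U0, Us]) @ Vs
Xt = np.hstack([U0, Ut]) @ Vt

# With exact reconstruction and no overlap penalty, U0 is a fixed point.
same = update_U0(Xs, Xt, U0, Us, Ut, Vs, Vt, alpha=0.0, beta=0.0)
# A positive penalty enlarges the denominator and shrinks every entry.
shrunk = update_U0(Xs, Xt, U0, Us, Ut, Vs, Vt, alpha=0.1, beta=0.1)
```

Because the update only multiplies and divides non-negative quantities, a non-negative initial value stays non-negative throughout the iteration. The updates for U_{s} and U_{t} follow the same pattern, with L_{s} and L_{t} read as the remaining rows of V_{s} and V_{t}.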

6. A domain-adaptive viewpoint data classification device, characterized in that the device comprises:

7. The device according to claim 6, characterized in that the third determining module is configured to determine the source domain objective function, according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain, by the following formula:

O_{s} = \left\| X_{s} - [U_{0}, U_{s}] V_{s} \right\|_{F}^{2}

The fourth determining module is configured to determine the target domain objective function, according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix, by the following formula:

O_{t} = \left\| X_{t} - [U_{0}, U_{t}] V_{t} \right\|_{F}^{2}

8. The device according to claim 7, characterized in that the fifth determining module is configured to determine the general objective function, according to the source domain objective function and the target domain objective function, by the following formulas:

Φ = λ_{s} \left\| X_{s} - [U_{0}, U_{s}] V_{s} \right\|_{F}^{2} + λ_{t} \left\| X_{t} - [U_{0}, U_{t}] V_{t} \right\|_{F}^{2} + D (U_{0}, U_{s}, U_{t}, V_{s}, V_{t})

\begin{aligned} D (U_{0}, U_{s}, U_{t}, V_{s}, V_{t}) = {} & α \left\| U_{0}^{T} U_{s} \right\|_{F}^{2} + β \left\| U_{0}^{T} U_{t} \right\|_{F}^{2} + γ \left\| U_{s}^{T} U_{t} \right\|_{F}^{2} \\ & + \operatorname{Tr} (A^{U_{0}} U_{0}^{T}) + \operatorname{Tr} (A^{U_{s}} U_{s}^{T}) + \operatorname{Tr} (A^{U_{t}} U_{t}^{T}) + \operatorname{Tr} (A^{V_{s}} V_{s}^{T}) + \operatorname{Tr} (A^{V_{t}} V_{t}^{T}) \end{aligned}

λ_{s} = \left\| X_{s} \right\|_{F}^{- 2}

λ_{t} = \left\| X_{t} \right\|_{F}^{- 2}

9. The device according to claim 6 or 8, characterized in that the sixth determining module comprises:

An allocation unit, configured to randomly assign a non-negative value to each of the parameters as the initial value of that parameter;

A computing unit, configured to calculate the convergence value of each parameter according to its initial value, and to use the convergence value of each parameter as the desired value of that parameter.

10. The device according to claim 9, characterized in that the parameters in the general objective function include U_{0}, U_{s}, U_{t}, V_{s}, and V_{t};

The computing unit is configured to:

According to the initial value of U_{0}, iteratively calculate U_{0} according to the following formula:

U_{0}^{m} = U_{0}^{m - 1} \frac{[λ_{s} X_{s} H_{s}^{T} + λ_{t} X_{t} H_{t}^{T}]}{[λ_{s} X_{s}^{(r)} H_{s}^{T} + λ_{t} X_{t}^{(r)} H_{t}^{T} + (α U_{s} U_{s}^{T} + β U_{t} U_{t}^{T}) U_{0}]}

According to the initial value of U_{s}, iteratively calculate U_{s} according to the following formula:

U_{s}^{m} = U_{s}^{m - 1} \frac{[λ_{s} X_{s} L_{s}^{T}]}{[λ_{s} X_{s}^{(r)} L_{s}^{T} + (α U_{0} U_{0}^{T} + γ U_{t} U_{t}^{T}) U_{s}]}

According to the initial value of U_{t}, iteratively calculate U_{t} according to the following formula:

U_{t}^{m} = U_{t}^{m - 1} \frac{[λ_{t} X_{t} L_{t}^{T}]}{[λ_{t} X_{t}^{(r)} L_{t}^{T} + (β U_{0} U_{0}^{T} + γ U_{s} U_{s}^{T}) U_{t}]}

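According to the initial value of V_{t}, iteratively calculate V_{t} according to [formula missing] until the convergence value [symbol missing] of V_{t} is obtained, where [symbol missing] is the value of V_{t} obtained in the previous iteration and [symbol missing] is the value of V_{t} obtained by iterating according to [formula missing].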