CN105095277B - Classification method and device for cross-domain viewpoint data - Google Patents

Classification method and device for cross-domain viewpoint data

Info

Publication number
CN105095277B
Authority
CN
China
Prior art keywords
matrix
domain
value
parameter
source domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410201027.7A
Other languages
Chinese (zh)
Other versions
CN105095277A (en)
Inventor
周光有
薛伟
王巨宏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Tencent Cyber Tianjin Co Ltd
Original Assignee
Institute of Automation of Chinese Academy of Science
Tencent Cyber Tianjin Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science, Tencent Cyber Tianjin Co Ltd filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN201410201027.7A priority Critical patent/CN105095277B/en
Publication of CN105095277A publication Critical patent/CN105095277A/en
Application granted granted Critical
Publication of CN105095277B publication Critical patent/CN105095277B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a classification method and device for cross-domain viewpoint data, belonging to the field of Internet technology. The method includes: obtaining a shared topic matrix according to the shared topics of a source domain and a target domain, and obtaining the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain according to the domain-specific topics of the source domain and of the target domain, respectively; determining the objective function of the source domain and the objective function of the target domain; determining a total objective function according to the objective function of the source domain and the objective function of the target domain; determining the converged value of each parameter in the total objective function, and obtaining a classification function according to those converged values; and classifying the viewpoint data of the target domain according to the classification function. The invention classifies cross-domain viewpoint data with a classification function obtained from the shared topic matrix. Because the shared topic matrix can narrow the gap between different domains, the accuracy of classifying cross-domain viewpoint data is improved.

Description

Classification method and device for cross-domain viewpoint data
Technical field
The present invention relates to the field of Internet technology, and in particular to a classification method and device for cross-domain viewpoint data.
Background
With the development of Internet technology, more and more viewpoint data expressing users' opinions is shared online, in the form of user reviews on shopping websites, blog articles, user feedback, and so on. Because viewpoint data on the Internet spans different domains, and the viewpoint data of each domain is valuable for guiding users' practice in that domain, the viewpoint data of different domains needs to be obtained and studied. Moreover, because the volume of data on the Internet is large, it is difficult to label the data of every domain. How to classify cross-domain viewpoint data has therefore become the key to obtaining the viewpoint data of different domains.
Take classification of cross-domain viewpoint data with the SFA (Spectral Feature Alignment) algorithm as an example. When classifying cross-domain viewpoint data, the related art first arbitrarily selects a source domain and a target domain and determines the domain-specific words and domain-independent words of the two domains. It then builds a bipartite graph between the domain-specific words and the domain-independent words, which represents the co-occurrence relations between them. Using the SFA algorithm, the domain-specific words and domain-independent words that are most strongly connected in the bipartite graph are assigned to the same cluster. Because such a cluster can narrow the gap between the domain-specific words of the source domain and of the target domain, a classifier can be trained on the clusters and then used to classify cross-domain viewpoint data.
In the course of implementing the present invention, the inventors found that the related art has at least the following problem:
When the related art classifies cross-domain viewpoint data, the selected source domain and target domain do not necessarily have distinct domain-specific words and domain-independent words, so the classification results of the related art are not accurate.
Summary of the invention
To solve the problems of the related art, embodiments of the invention provide a classification method and device for cross-domain viewpoint data. The technical solution is as follows:
In a first aspect, a classification method for cross-domain viewpoint data is provided, the method comprising:
obtaining a shared topic matrix according to the shared topics of a source domain and a target domain, and obtaining the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain according to the domain-specific topics of the source domain and the domain-specific topics of the target domain, respectively;
determining the objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain, and determining the objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain;
determining a total objective function according to the objective function of the source domain and the objective function of the target domain, and determining the converged value of each parameter in the total objective function;
obtaining a classification function according to the converged values of the parameters in the total objective function, and classifying the viewpoint data of the target domain according to the classification function.
In a second aspect, a classification device for cross-domain viewpoint data is provided, the device comprising:
a first obtaining module, configured to obtain a shared topic matrix according to the shared topics of a source domain and a target domain;
a second obtaining module, configured to obtain the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain according to the domain-specific topics of the source domain and the domain-specific topics of the target domain, respectively;
a first determining module, configured to determine the objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain;
a second determining module, configured to determine the objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain;
a third determining module, configured to determine a total objective function according to the objective function of the source domain and the objective function of the target domain;
a fourth determining module, configured to determine the converged value of each parameter in the total objective function;
a third obtaining module, configured to obtain a classification function according to the converged values of the parameters in the total objective function;
a classification module, configured to classify the viewpoint data of the target domain according to the classification function.
The technical solution provided by the embodiments of the invention has the following beneficial effects:
The shared topic matrix of the source domain and the target domain is obtained, and the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain are constructed from the domain-specific topics of the two domains. A total objective function is then determined from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, the term matrix of the source domain, the domain-specific topic matrix of the target domain, and the term matrix of the target domain. A classification function is obtained from the converged values of the parameters in the total objective function, and the viewpoint data of the target domain is classified according to the classification function. Because shared topics can act as a bridge that reduces the difference between domains, classifying cross-domain viewpoint data with this classification function improves classification accuracy.
Brief description of the drawings
To describe the technical solutions in the embodiments of the invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a classification method for cross-domain viewpoint data provided by one embodiment of the invention;
Fig. 2 is a flowchart of a classification method for cross-domain viewpoint data provided by another embodiment of the invention;
Fig. 3 is a schematic diagram of the results of classifying cross-domain viewpoint data with different algorithms, provided by another embodiment of the invention;
Fig. 4 is a schematic diagram of the results of classifying cross-domain viewpoint data with different algorithms, provided by another embodiment of the invention;
Fig. 5 is a schematic diagram of a convergence curve provided by another embodiment of the invention;
Fig. 6 is a schematic diagram of a convergence curve provided by another embodiment of the invention;
Fig. 7 is a schematic structural diagram of a classification device for cross-domain viewpoint data provided by another embodiment of the invention;
Fig. 8 is a schematic structural diagram of a third determining module provided by another embodiment of the invention;
Fig. 9 is a schematic structural diagram of a server provided by another embodiment of the invention.
Detailed description
To make the objectives, technical solutions, and advantages of the invention clearer, the embodiments of the invention are described in further detail below with reference to the drawings.
With the development of Internet technology, sharing viewpoint data has become a trend in today's society. Because the viewpoint data of different domains is valuable for guiding users' practice, and the viewpoint data on the Internet is large in both quantity and variety, the different viewpoint data on the Internet needs to be classified. To this end, an embodiment of the invention provides a classification method for cross-domain viewpoint data. Referring to Fig. 1, the method flow provided by this embodiment includes:
101: Obtain a shared topic matrix according to the shared topics of the source domain and the target domain, and obtain the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain according to the domain-specific topics of the source domain and the domain-specific topics of the target domain, respectively.
102: Determine the objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain, and determine the objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain.
103: Determine a total objective function according to the objective function of the source domain and the objective function of the target domain, and determine the converged value of each parameter in the total objective function.
104: Obtain a classification function according to the converged values of the parameters in the total objective function, and classify the viewpoint data of the target domain according to the classification function.
As an optional embodiment, the objective function ψ_s of the source domain, determined according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain, is:
where ‖·‖_F^2 denotes the squared Frobenius norm, Tr[·] is the matrix trace, X_s is the term matrix of the source domain, U_0 is the shared topic matrix, U_s is the domain-specific topic matrix of the source domain, V_s is the document-topic matrix of the source domain, V_s^T is the transpose of the document-topic matrix of the source domain, α is a tunable parameter, W_s is the matrix of linear model coefficients used to predict the polarity of viewpoint data from V_s, Y_s is the polarity matrix of the source domain, and C_s is a diagonal matrix.
As an optional embodiment, the objective function ψ_t of the target domain, determined according to the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain, is:
where X_t is the term matrix of the target domain, U_0 is the shared topic matrix, U_t is the domain-specific topic matrix of the target domain, V_t is the document-topic matrix of the target domain, and V_t^T is the transpose of the document-topic matrix of the target domain.
As an optional embodiment, the total objective function ψ, determined according to the objective function of the source domain and the objective function of the target domain, is:
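The formulas for ψ_s, ψ_t, and ψ are not reproduced in this text. One reconstruction that is consistent with the symbol definitions above and with the matrix dimensions given later in the description is sketched below. It is an assumption, not the patent's exact expressions, which may contain additional weights or regularization terms that cannot be recovered here.

```latex
% Assumed reconstruction; [U_0, U_s] denotes column-wise concatenation.
\psi_s = \left\lVert X_s - [U_0, U_s]\, V_s^{T} \right\rVert_F^{2}
       + \alpha \,\mathrm{Tr}\!\left[ (V_s W_s - Y_s)^{T} C_s \,(V_s W_s - Y_s) \right]

\psi_t = \left\lVert X_t - [U_0, U_t]\, V_t^{T} \right\rVert_F^{2}

\psi = \psi_s + \psi_t
```

Under this reading, the first term of ψ_s measures how well the topics reconstruct the source term matrix, and the trace term penalizes polarity prediction errors only on documents that C_s marks as labeled.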
As an optional embodiment, determining the converged value of each parameter in the total objective function includes:
iteratively computing the current iteration value of parameter U_0 according to the update formula for U_0 until the current iteration value converges, and taking the converged current iteration value as the converged value of U_0, where H_s is the coefficient matrix of the shared topic matrix for the source domain and H_t is the coefficient matrix of the shared topic matrix for the target domain;
iteratively computing the current iteration value of parameter U_s according to the update formula for U_s until the current iteration value converges, and taking the converged current iteration value as the converged value of U_s, where L_s is the coefficient matrix of the domain-specific topic matrix of the source domain;
iteratively computing the current iteration value of parameter V_s according to the update formula for V_s until the current iteration value converges, and taking the converged current iteration value as the converged value of V_s;
iteratively computing the current iteration value of parameter W_s according to the update formula for W_s until the current iteration value converges, and taking the converged current iteration value as the converged value of W_s;
iteratively computing the current iteration value of parameter U_t according to the update formula for U_t until the current iteration value converges, and taking the converged current iteration value as the converged value of U_t, where L_t is the coefficient matrix of the domain-specific topic matrix of the target domain;
iteratively computing the current iteration value of parameter V_t according to the update formula for V_t until the current iteration value converges, and taking the converged current iteration value as the converged value of V_t;
where the product operator used in the formulas denotes the inner product operation, a superscript t denotes the current iteration, and a superscript t-1 denotes the previous iteration.
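The update formulas themselves are missing from this text, but the control flow described above (apply an update, compare iteration t with iteration t-1, stop on convergence) can be sketched generically. In the sketch below, `iterate_until_converged`, `tol`, and `max_iter` are hypothetical names, and the toy `update` function is a stand-in, not one of the patent's update rules.

```python
import numpy as np

def iterate_until_converged(update, initial, tol=1e-6, max_iter=1000):
    """Apply `update` repeatedly until successive iterates stop changing.

    `update` maps the value from iteration t-1 to the value at iteration t.
    Returns the final value and the number of iterations performed.
    """
    current = initial
    for t in range(1, max_iter + 1):
        previous = current
        current = update(previous)
        # Relative change between iteration t and iteration t-1.
        change = np.linalg.norm(current - previous)
        if change <= tol * max(1.0, np.linalg.norm(previous)):
            return current, t
    return current, max_iter

# Toy stand-in update with fixed point 2.0 (NOT a rule from the patent).
value, iterations = iterate_until_converged(lambda x: 0.5 * x + 1.0,
                                            np.array([0.0]))
```

In a full implementation each parameter (U_0, U_s, V_s, W_s, U_t, V_t) would have its own update function, and the same stopping test would be applied to each.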
As an optional embodiment, the classification function y_i obtained according to the converged values of the parameters in the total objective function is:
where v_i is the document-topic vector of any document in the target domain, i is the row of the document-topic matrix of the target domain in which v_i is located, and j is the column corresponding to that row.
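The classification formula is not reproduced in this text. A minimal sketch of one plausible reading is given below: score each target document by multiplying its document-topic row v_i by the converged linear coefficients W_s, then pick the column j with the largest score. The argmax rule and the convention that column 0 means positive polarity are assumptions made for illustration.

```python
import numpy as np

def classify(V_t, W_s):
    """Predict a polarity for each target-domain document.

    V_t: n_t x (k0 + k) document-topic matrix of the target domain.
    W_s: (k0 + k) x 2 linear coefficients learned on the source domain.
    Returns +1 (positive) or -1 (negative) per document, assuming
    column 0 of the score matrix is the positive class.
    """
    scores = V_t @ W_s                   # n_t x 2 score matrix
    best_column = np.argmax(scores, axis=1)
    return np.where(best_column == 0, 1, -1)

V_t = np.array([[1.0, 0.0],
                [0.0, 1.0]])
W_s = np.array([[1.0, -1.0],
                [-1.0, 1.0]])
predictions = classify(V_t, W_s)
```

The point of the sketch is that no target-domain labels are needed: W_s is learned from labeled source documents, and the shared topics make the rows of V_t comparable to the rows of V_s.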
In the method provided by this embodiment of the invention, the shared topic matrix of the source domain and the target domain is obtained, and the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain are constructed from the domain-specific topics of the two domains. A total objective function is then determined from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, the term matrix of the source domain, the domain-specific topic matrix of the target domain, and the term matrix of the target domain. A classification function is obtained from the converged values of the parameters in the total objective function, and the viewpoint data of the target domain is classified according to the classification function. Because shared topics can act as a bridge that reduces the difference between domains, classifying cross-domain viewpoint data with this classification function improves classification accuracy.
For the case where the source domain and the target domain share a certain number of topics and each also has its own domain-specific topics, an embodiment of the invention provides a classification method for cross-domain viewpoint data. The method of this embodiment classifies cross-domain viewpoint data with a new algorithm, TCT (Topical Correspondence Transfer). The TCT algorithm trains a classification function based on the topics shared between domains and uses that classification function to classify cross-domain viewpoint data. Referring to Fig. 2, the method flow provided by this embodiment includes:
201: Obtain the shared topic matrix according to the shared topics of the source domain and the target domain, and obtain the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain according to the domain-specific topics of the source domain and the domain-specific topics of the target domain.
The source domain may be the book domain, the electronics domain, the clothing domain, and so on; this embodiment does not specifically limit the source domain. Let the source domain be denoted X_s, let the number of documents in the source domain be n_s, and let the number of terms covered by each document be m. The source domain can then be represented by a term matrix, giving the term matrix of the source domain: X_s = [x_1^(s), ..., x_{n_s}^(s)].
Because each document in the source domain covers m terms, the term matrix of the source domain can also be expressed as an element of R^{m×n_s}, that is, X_s ∈ R^{m×n_s}.
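As a minimal illustration of an m × n_s term matrix, the sketch below builds one column per document and one row per vocabulary term. Whitespace tokenization and raw term counts are assumptions made for illustration; the patent does not state how the entries of X_s are weighted.

```python
import numpy as np

def build_term_matrix(documents):
    """Return (X, vocabulary) where X[j, i] counts term j in document i."""
    vocabulary = sorted({term for doc in documents for term in doc.split()})
    index = {term: j for j, term in enumerate(vocabulary)}
    X = np.zeros((len(vocabulary), len(documents)))
    for i, doc in enumerate(documents):
        for term in doc.split():
            X[index[term], i] += 1.0
    return X, vocabulary

# Two toy source-domain documents (hypothetical data).
docs = ["good book good plot", "boring book"]
X_s, vocab = build_term_matrix(docs)   # X_s has shape (m, n_s) = (4, 2)
```

Each column of X_s is the term vector of one document, which matches the column-per-document layout used for the topic matrices later in the description.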
Because the source domain contains a certain number of documents labeled with a polarity, those labeled documents can be represented by a document polarity matrix Y_s. Y_s is an n_s × 2 matrix, where n_s is the number of documents in the source domain and 2 indicates that a document can have one of two polarities: positive polarity means the document expresses a positive viewpoint, and negative polarity means the document expresses a negative viewpoint. To determine the polarity of the i-th document in the source domain: if the first element of the row of the polarity matrix corresponding to the i-th document is y_i = 1, the polarity of the i-th document is positive, that is, the document expresses a positive viewpoint; if that first element is y_i = -1, the polarity of the i-th document is negative, that is, the document expresses a negative viewpoint. Other ways of determining polarity may also be used; this embodiment does not specifically limit this.
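A small sketch of this encoding follows. The text fixes only the first element of each row (+1 or -1); filling the second element with the opposite sign and using an all-zero row for unlabeled documents are assumptions made here for illustration.

```python
import numpy as np

def build_polarity_matrix(labels):
    """labels: +1 (positive), -1 (negative), or None (unlabeled) per document."""
    Y = np.zeros((len(labels), 2))
    for i, label in enumerate(labels):
        if label == 1:
            Y[i] = [1.0, -1.0]     # first element +1: positive viewpoint
        elif label == -1:
            Y[i] = [-1.0, 1.0]     # first element -1: negative viewpoint
    return Y

def polarity_of(Y, i):
    """Read the polarity of document i from the first element of its row."""
    first = Y[i, 0]
    if first == 1:
        return "positive"
    if first == -1:
        return "negative"
    return "unlabeled"

Y_s = build_polarity_matrix([1, -1, None])   # n_s = 3 documents
```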
The target domain may be any domain different from the source domain, such as the book domain, the electronics domain, or the kitchenware domain; this embodiment does not specifically limit the target domain. Let the target domain be denoted X_t, let the number of documents in the target domain be n_t, and let the number of terms covered by each document be m. The target domain can then be represented by a term matrix, giving the term matrix of the target domain: X_t = [x_1^(t), ..., x_{n_t}^(t)].
Because each document in the target domain covers m terms, the term matrix of the target domain can also be expressed as an element of R^{m×n_t}, that is, X_t ∈ R^{m×n_t}.
The classification method for cross-domain viewpoint data provided by this embodiment is based mainly on the shared topics of the source domain and the target domain. As a bridge between the two domains, the shared topics can narrow the gap between the source domain and the target domain and make it possible to transfer knowledge across domains. Therefore, to classify viewpoint data across domains, the method provided by this embodiment needs to determine the number of shared topics. A shared topic of the source domain and the target domain is a topic that both domains touch on. For example, if the source domain is the book domain and the target domain is the clothing domain, topics such as "expensive" and "cheap" arise in both domains and can therefore serve as shared topics.
To simplify subsequent analysis and computation, this embodiment sets the number of shared topics to k_0 and denotes the shared topic matrix by U_0. The shared topic matrix obtained from the shared topics is then: U_0 = [u_1^(0), ..., u_{k_0}^(0)].
Because each document in the source domain and the target domain covers m terms, the shared topic matrix can also be expressed as an element of R^{m×k_0}, that is, U_0 ∈ R^{m×k_0}. Each column of the shared topic matrix represents one shared topic of the source domain and the target domain.
Further, the source domain and the target domain not only have shared topics but also each have domain-specific topics. Domain-specific topics characterize what is unique to each domain and are an important basis for classifying cross-domain viewpoint data. Therefore, before classifying viewpoint data across domains, the method provided by this embodiment needs to set the domain-specific topics of the source domain and the domain-specific topics of the target domain. The domain-specific topics of the source domain are topics exclusive to the source domain, and the domain-specific topics of the target domain are topics exclusive to the target domain. For example, if the source domain is the electronics domain and the target domain is the book domain, topics such as "power consumption" and "sensitive" are domain-specific topics of the source domain, and topics such as "exquisite" and "tedious" are domain-specific topics of the target domain. To simplify subsequent analysis and computation and to reduce the gap between the two domains as far as possible, the method provided by this embodiment may set the same number of domain-specific topics for the source domain and the target domain.
Let the number of domain-specific topics of the source domain be k and the domain-specific topic matrix of the source domain be U_s. The domain-specific topic matrix of the source domain obtained from the domain-specific topics of the source domain is then: U_s = [u_1^(s), ..., u_k^(s)].
Because each document in the source domain covers m terms, the domain-specific topic matrix of the source domain can also be expressed as an element of R^{m×k}, that is, U_s ∈ R^{m×k}. Each column of the domain-specific topic matrix of the source domain represents one domain-specific topic of the source domain.
Let the number of domain-specific topics of the target domain be k and the domain-specific topic matrix of the target domain be U_t. The domain-specific topic matrix of the target domain obtained from the domain-specific topics of the target domain is then:
U_t = [u_1^(t), ..., u_k^(t)].
Because each document in the target domain covers m terms, the domain-specific topic matrix of the target domain can also be expressed as an element of R^{m×k}, that is, U_t ∈ R^{m×k}. Each column of the domain-specific topic matrix of the target domain represents one domain-specific topic of the target domain.
202: Determine the objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain.
Because the objective function of the source domain is an important basis for classifying the cross-domain viewpoint data of the target domain in later steps, it must be determined before that classification is performed. This embodiment does not specifically limit how the objective function of the source domain is determined; one option, among others, is to determine it from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain.
Specifically, the objective function ψ_s of the source domain, determined according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain, is:
where ‖·‖_F^2 denotes the squared Frobenius norm; Tr[·] is the matrix trace; X_s is the term matrix of the source domain; U_0 is the shared topic matrix; U_s is the domain-specific topic matrix of the source domain; V_s is the document-topic matrix of the source domain; V_s^T is the transpose of the document-topic matrix of the source domain; α is a tunable parameter that may be 1, 2, 3, and so on, and this embodiment does not specifically limit its value; W_s is the matrix of linear model coefficients used to predict the polarity of viewpoint data from V_s; Y_s is the polarity matrix of the source domain; and C_s is a diagonal matrix in which each row and each column corresponds to one document in the source domain. C_s is an n_s × n_s matrix, that is, C_s ∈ R^{n_s×n_s}. The elements of C_s may be set, for example, as follows: if the document corresponding to an element on the diagonal of C_s is labeled with a polarity, the element is set to 1, that is, C_s(i, i) = 1 when the i-th document in the source domain is a labeled document; if the document corresponding to an element on the diagonal is not labeled with a polarity, the element is set to 0, that is, C_s(i, i) = 0 when the i-th document in the source domain is an unlabeled document.
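The sketch below builds C_s as described and evaluates a source objective of the form ‖X_s − [U_0, U_s] V_s^T‖_F^2 + α·Tr[(V_s W_s − Y_s)^T C_s (V_s W_s − Y_s)]. That form is an assumption reconstructed from the symbol definitions, since the formula itself is missing from this text; the function names and toy data are hypothetical.

```python
import numpy as np

def label_indicator(labels):
    """Diagonal C_s: 1 where document i is labeled, 0 where it is not."""
    return np.diag([0.0 if label is None else 1.0 for label in labels])

def source_objective(X_s, U_0, U_s, V_s, W_s, Y_s, C_s, alpha=1.0):
    """Assumed form of psi_s: reconstruction error plus masked label error."""
    U = np.hstack([U_0, U_s])                  # m x (k0 + k)
    fit = np.linalg.norm(X_s - U @ V_s.T, "fro") ** 2
    error = V_s @ W_s - Y_s                    # n_s x 2 prediction error
    supervised = np.trace(error.T @ C_s @ error)
    return fit + alpha * supervised

rng = np.random.default_rng(0)
m, n_s, k0, k = 5, 3, 2, 2
U_0, U_s = rng.random((m, k0)), rng.random((m, k))
V_s = rng.random((n_s, k0 + k))
W_s = rng.random((k0 + k, 2))
X_s = np.hstack([U_0, U_s]) @ V_s.T            # exact factorization
Y_s = V_s @ W_s                                # labeled rows predicted exactly
Y_s[2] = [5.0, 5.0]                            # arbitrary row for the unlabeled doc
C_s = label_indicator([1, -1, None])           # third document is unlabeled
psi_s = source_objective(X_s, U_0, U_s, V_s, W_s, Y_s, C_s)
```

Because C_s(2, 2) = 0, the mismatched third row of Y_s contributes nothing, so psi_s is numerically zero in this toy setup; replacing C_s with the identity would make the unlabeled row count and the value positive.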
Further, the expression for the objective function of the source domain shows that the shared topic matrix and the domain-specific topic matrix of the source domain are key to determining that objective function. Therefore, before the objective function of the source domain is determined, the shared topic matrix and the domain-specific topic matrix of the source domain must be determined first. One way to determine them, among others, is to decompose the term matrix of the source domain. Decomposing the term matrix X_s of the source domain yields two matrices: the document-topic matrix V_s of the source domain and the term-topic matrix U_s' of the source domain. The term-topic matrix U_s' of the source domain is an m × (k + k_0) matrix, that is, U_s' ∈ R^{m×(k+k_0)}. The matrices contained in U_s' include, but are not limited to, the shared topic matrix U_0 and the domain-specific topic matrix U_s of the source domain. The document-topic matrix V_s of the source domain is an n_s × (k + k_0) matrix, that is, V_s ∈ R^{n_s×(k+k_0)}, and each row of the matrix represents one document in the source domain. The matrices contained in V_s include, but are not limited to, the matrices H_s and L_s, where H_s is an n_s × k_0 matrix that is the coefficient matrix of the shared topic matrix, and L_s is an n_s × k matrix that is the coefficient matrix of the domain-specific topic matrix of the source domain.
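The block structure described here can be checked numerically. Assuming the blocks are concatenated column-wise as U_s' = [U_0, U_s] and V_s = [H_s, L_s] (the ordering is an assumption), the product U_s' V_s^T splits into a shared part U_0 H_s^T and a domain-specific part U_s L_s^T. The sizes below are arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n_s, k0, k = 6, 4, 2, 3            # toy sizes, chosen for illustration

U_0 = rng.random((m, k0))             # shared topics
U_s = rng.random((m, k))              # source domain-specific topics
H_s = rng.random((n_s, k0))           # coefficients on shared topics
L_s = rng.random((n_s, k))            # coefficients on domain-specific topics

U_prime = np.hstack([U_0, U_s])       # m x (k0 + k) term-topic matrix
V_s = np.hstack([H_s, L_s])           # n_s x (k0 + k) document-topic matrix

approximation = U_prime @ V_s.T       # m x n_s, same shape as X_s
shared_part = U_0 @ H_s.T
specific_part = U_s @ L_s.T
```

The identity approximation = shared_part + specific_part is what lets the shared topics be tied across domains while each domain keeps its own specific topics.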
One way to decompose the term matrix of the source domain, among others, is non-negative matrix factorization. Non-negative matrix factorization is a matrix decomposition method under the constraint that all elements of the matrices are non-negative; it decomposes a matrix into several matrices by finding low-rank, non-negative factors. There are many practical applications of non-negative matrix factorization, such as decomposing the pixels of digital images, the word counts used in text analysis, and stock prices. The basic idea can be described briefly as follows: for any given non-negative matrix A, a non-negative matrix U and a non-negative matrix V can be found such that A is approximately decomposed into the product of U and V. When non-negative matrix factorization is used to analyze large-scale data such as text and images, it is faster and more convenient than traditional processing algorithms.
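A minimal sketch of plain non-negative matrix factorization follows, using the standard Lee-Seung multiplicative updates for the squared Frobenius error ‖A − U V^T‖_F^2. This is the generic textbook algorithm, not the patent's update rules (which also tie U_0 across domains and include the polarity term); the function name `nmf` and its parameters are illustrative.

```python
import numpy as np

def nmf(A, rank, iterations=200, eps=1e-9, seed=0):
    """Factor a non-negative A (m x n) as U @ V.T with U, V >= 0."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = rng.random((m, rank)) + eps        # m x rank, strictly positive start
    V = rng.random((n, rank)) + eps        # n x rank, strictly positive start
    for _ in range(iterations):
        # Multiplicative updates keep every entry non-negative.
        U *= (A @ V) / (U @ (V.T @ V) + eps)
        V *= (A.T @ U) / (V @ (U.T @ U) + eps)
    return U, V

# Toy non-negative matrix of rank 2 (hypothetical data).
rng = np.random.default_rng(42)
A = rng.random((8, 2)) @ rng.random((2, 6))
U, V = nmf(A, rank=2)
relative_error = np.linalg.norm(A - U @ V.T) / np.linalg.norm(A)
```

Here V has one row per column of A, matching the convention in the description that each row of the document-topic matrix represents one document.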
203: Determine the objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain.
Because the objective function of the target domain is an important basis for classifying the cross-domain viewpoint data of the target domain in later steps, it must be determined before that classification is performed. One way to determine the objective function of the target domain, among others, is to determine it from the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain.
Specifically, the objective function ψ_t of the target domain, determined according to the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain, is:
where ‖·‖_F^2 denotes the squared Frobenius norm; X_t is the term matrix of the target domain; U_0 is the shared topic matrix; U_t is the domain-specific topic matrix of the target domain; V_t is the document-topic matrix of the target domain; and V_t^T is the transpose of the document-topic matrix of the target domain.
Further, the expression above shows that the shared topic matrix and the domain-specific topic matrix of the target domain are the key to determining the objective function of the target domain, so both must be determined first. Methods for determining them include, but are not limited to, decomposing the term matrix of the target domain. Decomposing the term matrix Xt of the target domain yields two matrices: the document-topic matrix Vt of the target domain and the term-topic matrix Ut′ of the target domain.
The term-topic matrix Ut′ of the target domain is an m × (k + k0) matrix, i.e. $U_t' \in \mathbb{R}_{+}^{m \times (k + k_0)}$. The matrices contained in Ut′ include, but are not limited to, the shared topic matrix U0 and the domain-specific topic matrix Ut of the target domain. The document-topic matrix Vt of the target domain is an nt × (k + k0) matrix, i.e. $V_t \in \mathbb{R}_{+}^{n_t \times (k + k_0)}$, and each of its rows represents one document in the target domain. The matrices contained in Vt include, but are not limited to, Ht and Lt, where Ht is an nt × k0 matrix that serves as the coefficient matrix of the shared topic matrix, and Lt is an nt × k matrix that serves as the coefficient matrix of the domain-specific topic matrix of the target domain.
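The block structure described above can be made concrete with a small sketch. It assumes, from the symbol definitions, that the term-topic matrix is the side-by-side concatenation [U0, Ut], that Vt is the concatenation [Ht, Lt], and that the target objective is the squared Frobenius norm of Xt minus [U0, Ut] times the transpose of Vt. The helper names and the toy numbers are illustrative assumptions.

```python
def hstack(left, right):
    """Place two matrices with the same number of rows side by side."""
    return [l + r for l, r in zip(left, right)]


def target_objective(x_t, u0, u_t, v_t):
    """Squared Frobenius norm of X_t - [U_0, U_t] V_t^T (assumed form)."""
    u_full = hstack(u0, u_t)  # term-topic matrix, m x (k0 + k)
    total = 0.0
    for i, u_row in enumerate(u_full):
        for j, v_row in enumerate(v_t):
            # Entry (i, j) of [U_0, U_t] V_t^T is the dot product of row i of the
            # term-topic matrix with row j of the document-topic matrix.
            recon = sum(a * b for a, b in zip(u_row, v_row))
            total += (x_t[i][j] - recon) ** 2
    return total


# Toy sizes: m = 2 terms, n_t = 2 documents, k0 = 1 shared topic, k = 1 specific topic.
U0 = [[1.0], [0.0]]    # m x k0
Ut = [[0.0], [1.0]]    # m x k
Ht = [[1.0], [3.0]]    # n_t x k0, coefficients of the shared topics
Lt = [[2.0], [4.0]]    # n_t x k, coefficients of the domain-specific topics
Vt = hstack(Ht, Lt)    # n_t x (k0 + k); each row is one document
Xt = [[1.0, 3.0], [2.0, 5.0]]

print(target_objective(Xt, U0, Ut, Vt))  # only entry (1, 1) is off by 1
```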
Methods for decomposing the term matrix of the target domain include, but are not limited to, non-negative matrix factorization.
It should be noted that this embodiment does not limit the order in which the objective function of the source domain and the objective function of the target domain are determined. In a specific implementation, either may be determined first.
204: Determine the overall objective function according to the objective function of the source domain and the objective function of the target domain.
The objective function of the source domain obtained in step 202 and the objective function of the target domain obtained in step 203 are complementary, and optimizing an overall objective function built from both can improve the precision and speed with which the viewpoint data of the target domain are classified. Therefore, in order to classify the viewpoint data of the target domain quickly and accurately, the method provided in this embodiment first determines an overall objective function from the two objective functions before classifying.
This embodiment does not specifically limit the method for determining the overall objective function, which includes, but is not limited to, adding the objective function of the source domain to the objective function of the target domain. The overall objective function ψ so determined is:
$$\psi = \psi_s + \psi_t$$
where the parameters of the overall objective function include, but are not limited to, U0, Us, Vs, Ws, Ut and Vt.
Further, after the overall objective function is obtained, the method provided in this embodiment optimizes it. Methods for optimizing the overall objective function include, but are not limited to, using the following formula:
where d ∈ {s, t}: when d is s, Ud and Vd are Us and Vs respectively; when d is t, Ud and Vd are Ut and Vt respectively. Optimizing the overall objective function with the above formula yields the convergence values of its parameters U0, Us, Vs, Ws, Ut and Vt, and these convergence values are the key to obtaining the classification function in a subsequent step.
It should be noted that optimizing the objective function involves decomposing U0, Ud and Vd, and this decomposition may produce matrices with negative entries, whereas classification with the TCT algorithm provided in this embodiment requires every matrix in the computation to be non-negative. Therefore, to prevent negative matrices from appearing, decomposition constraints may be set for U0, Ud and Vd before the overall objective function is optimized. The constraints set include, but are not limited to:
where $U_0^T$ is the transpose of the shared topic matrix; $U_d^T$ is the transpose of the domain-specific topic matrix of the source domain when d is s, and of the target domain when d is t; and I is the identity matrix, whose diagonal entries are 1.
205: Determine the convergence value of each parameter in the overall objective function.
The formula used above to optimize the overall objective function shows that, for the overall objective function to attain its optimal solution, each of its parameters must take the value at which the function is minimized; that value is the convergence value of the parameter.
The parameters of the overall objective function are U0, Us, Vs, Ws, Ut and Vt. The process of determining the convergence value of each parameter is described below, one parameter at a time:
(1) Determine the convergence value of parameter U0:
First, a Lagrangian is introduced for parameter U0:
where the Lagrange multiplier is used to enforce the constraint on U0.
Second, the above expression is differentiated with respect to U0 and the derivative is set to zero, which gives:
Third, the KKT (Karush-Kuhn-Tucker) conditions are applied to the above formula, which yields the convergence formula of parameter U0:
where the operator in the formula denotes the inner product operation, t denotes the current iteration, t-1 denotes the previous iteration, Hs is the coefficient matrix of the shared topic matrix in the source domain, and Ht is the coefficient matrix of the shared topic matrix in the target domain.
Further, to ensure that the convergence value of each parameter can be obtained from its convergence formula, the method provided in this embodiment, after determining the convergence formula of parameter U0 as described above, also verifies that this formula converges. Before the verification, a definition, a lemma and a theorem are introduced.
The definition is: F(X, X′) is an auxiliary function of L(X) if L(X) ≤ F(X, X′) for all X and X′, with equality when X = X′, that is, F(X, X) = L(X).
The lemma is: if F is an auxiliary function of L, then L is non-increasing under the update sequence $X^{(t+1)} = \arg\min_X F(X, X^{(t)})$.
The lemma is proved as follows:
Since F is an auxiliary function of L, $L(X^{(t+1)}) \le F(X^{(t+1)}, X^{(t)})$; and since $X^{(t+1)}$ minimizes $F(\cdot, X^{(t)})$, $F(X^{(t+1)}, X^{(t)}) \le F(X^{(t)}, X^{(t)}) = L(X^{(t)})$. Therefore $L(X^{(t+1)}) \le L(X^{(t)})$, i.e. L is non-increasing under the update sequence.
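The lemma can be checked numerically on a one-dimensional toy problem. The sketch below is an illustration only and is unrelated to the embodiment's actual objective: it assumes L(x) = (x - 3)^2 and the auxiliary function F(x, x') = L(x) + 2(x - x')^2, which satisfies L(x) ≤ F(x, x') and F(x, x) = L(x). Minimizing F over x for fixed x' gives the closed-form update x = (3 + 2x') / 3.

```python
def L(x):
    """Toy objective to be minimized."""
    return (x - 3.0) ** 2


def F(x, x_prev):
    """Auxiliary function: never below L, and equal to L when x == x_prev."""
    return L(x) + 2.0 * (x - x_prev) ** 2


def update(x_prev):
    """argmin over x of F(x, x_prev), from dF/dx = 2(x - 3) + 4(x - x_prev) = 0."""
    return (3.0 + 2.0 * x_prev) / 3.0


xs = [10.0]
for _ in range(30):
    xs.append(update(xs[-1]))

losses = [L(x) for x in xs]
print(losses[0], losses[-1])  # the loss shrinks towards 0 as x approaches 3
```

Each step multiplies the distance to the minimizer by 2/3, so the sequence of L values never increases, as the lemma states.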
The theorem is: if the function
is an auxiliary function of L(U0), then the convex function converges with respect to U0.
From the definition, lemma and theorem above, the convergence of the convergence formula of parameter U0 follows.
Similarly, the convergence of the convergence formulas of the other parameters Us, Vs, Ws, Ut and Vt determined below is proved in the same way as for parameter U0, and the proofs are not repeated here.
Based on the above, the current iteration value of parameter U0 is computed iteratively according to the convergence formula of U0 until the current iteration value converges, and the converged current iteration value is taken as the convergence value of parameter U0.
Specifically, the convergence value of parameter U0 is determined as follows:
First, an initial value is chosen at random for each of the parameters U0, Us, Vs, Ws, Ut and Vt according to its dimensions. For example, if U0 is a 2 × 2 matrix, a 2 × 2 initial value is chosen at random for U0.
Second, the chosen initial values of the parameters are substituted into the convergence formula of U0, and the iteration value of the first iteration is computed.
Third, the iteration value so obtained is substituted back into the convergence formula of U0, and the iteration value of the second iteration is computed. The iteration continues on each newly obtained current iteration value until the current iteration value converges, at which point the converged current iteration value is taken as the convergence value of parameter U0.
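The three steps above (random initial value, repeated substitution, stop on convergence) follow a generic fixed-point pattern that can be sketched as below. The update rule `toy_update` is a stand-in chosen only because its limit is known; it is not the convergence formula of U0, and the tolerance, the iteration cap and all names are illustrative assumptions.

```python
import random


def iterate_until_converged(update, initial, tol=1e-8, max_iters=1000):
    """Apply `update` repeatedly until no matrix entry changes by `tol` or more."""
    current = initial
    for step in range(1, max_iters + 1):
        new = update(current)
        change = max(abs(a - b)
                     for row_new, row_old in zip(new, current)
                     for a, b in zip(row_new, row_old))
        current = new
        if change < tol:
            return current, step
    return current, max_iters


def toy_update(matrix):
    """Stand-in update rule whose fixed point is 2.0 in every entry."""
    return [[0.5 * x + 1.0 for x in row] for row in matrix]


rng = random.Random(0)
# Step 1: random initial value with the dimensions of the parameter (2 x 2 here).
initial = [[rng.random() for _ in range(2)] for _ in range(2)]
# Steps 2 and 3: substitute repeatedly until the iteration value stops changing.
value, steps = iterate_until_converged(toy_update, initial)
print(steps, value)
```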
(2) Determine the convergence value of parameter Us:
Following the same principle as for parameter U0, the convergence formula of parameter Us is determined as:
where Ls is the coefficient matrix of the domain-specific topic matrix of the source domain.
The current iteration value of parameter Us is computed iteratively according to the convergence formula of Us until the current iteration value converges, and the converged current iteration value is taken as the convergence value of parameter Us.
Specifically, the convergence value of parameter Us is determined as follows:
First, an initial value is chosen at random for each of the parameters U0, Us, Vs, Ws, Ut and Vt according to its dimensions. For example, if Us is a 2 × 2 matrix, a 2 × 2 initial value is chosen at random for Us.
Second, the chosen initial values of the parameters are substituted into the convergence formula of Us, and the iteration value of the first iteration is computed.
Third, the iteration value so obtained is substituted back into the convergence formula of Us, and the iteration value of the second iteration is computed. The iteration continues on each newly obtained current iteration value until the current iteration value converges, at which point the converged current iteration value is taken as the convergence value of parameter Us.
(3) Determine the convergence value of parameter Vs:
Following the same principle as for parameter U0, the convergence formula of parameter Vs is determined as:
The current iteration value of parameter Vs is computed iteratively according to the convergence formula of Vs until the current iteration value converges, and the converged current iteration value is taken as the convergence value of parameter Vs.
Specifically, the convergence value of parameter Vs is determined as follows:
First, an initial value is chosen at random for each of the parameters U0, Us, Vs, Ws, Ut and Vt according to its dimensions. For example, if Vs is a 2 × 2 matrix, a 2 × 2 initial value is chosen at random for Vs.
Second, the chosen initial values of the parameters are substituted into the convergence formula of Vs, and the iteration value of the first iteration is computed.
Third, the iteration value so obtained is substituted back into the convergence formula of Vs, and the iteration value of the second iteration is computed. The iteration continues on each newly obtained current iteration value until the current iteration value converges, at which point the converged current iteration value is taken as the convergence value of parameter Vs.
(4) Determine the convergence value of parameter Ws:
Following the same principle as for parameter U0, the convergence formula of parameter Ws is determined as:
The current iteration value of parameter Ws is computed iteratively according to the convergence formula of Ws until the current iteration value converges, and the converged current iteration value is taken as the convergence value of parameter Ws.
Specifically, the convergence value of parameter Ws is determined as follows:
First, an initial value is chosen at random for each of the parameters U0, Us, Vs, Ws, Ut and Vt according to its dimensions. For example, if Ws is a 2 × 2 matrix, a 2 × 2 initial value is chosen at random for Ws.
Second, the chosen initial values of the parameters are substituted into the convergence formula of Ws, and the iteration value of the first iteration is computed.
Third, the iteration value so obtained is substituted back into the convergence formula of Ws, and the iteration value of the second iteration is computed. The iteration continues on each newly obtained current iteration value until the current iteration value converges, at which point the converged current iteration value is taken as the convergence value of parameter Ws.
(5) Determine the convergence value of parameter Ut:
Following the same principle as for parameter U0, the convergence formula of parameter Ut is determined as:
where Lt is the coefficient matrix of the domain-specific topic matrix of the target domain.
The current iteration value of parameter Ut is computed iteratively according to the convergence formula of Ut until the current iteration value converges, and the converged current iteration value is taken as the convergence value of parameter Ut.
Specifically, the convergence value of parameter Ut is determined as follows:
First, an initial value is chosen at random for each of the parameters U0, Us, Vs, Ws, Ut and Vt according to its dimensions. For example, if Ut is a 2 × 2 matrix, a 2 × 2 initial value is chosen at random for Ut.
Second, the chosen initial values of the parameters are substituted into the convergence formula of Ut, and the iteration value of the first iteration is computed.
Third, the iteration value so obtained is substituted back into the convergence formula of Ut, and the iteration value of the second iteration is computed. The iteration continues on each newly obtained current iteration value until the current iteration value converges, at which point the converged current iteration value is taken as the convergence value of parameter Ut.
(6) Determine the convergence value of parameter Vt:
Following the same principle as for parameter U0, the convergence formula of parameter Vt is determined as:
The current iteration value of parameter Vt is computed iteratively according to the convergence formula of Vt until the current iteration value converges, and the converged current iteration value is taken as the convergence value of parameter Vt.
First, an initial value is chosen at random for each of the parameters U0, Us, Vs, Ws, Ut and Vt according to its dimensions. For example, if Vt is a 2 × 2 matrix, a 2 × 2 initial value is chosen at random for Vt.
Second, the chosen initial values of the parameters are substituted into the convergence formula of Vt, and the iteration value of the first iteration is computed.
Third, the iteration value so obtained is substituted back into the convergence formula of Vt, and the iteration value of the second iteration is computed. The iteration continues on each newly obtained current iteration value until the current iteration value converges, at which point the converged current iteration value is taken as the convergence value of parameter Vt.
It should be noted that although the initial values of the parameters in the overall objective function can be chosen at random, the chosen initial values determine how quickly the convergence formulas converge. The initial value of each parameter may therefore be chosen according to the data in the source domain and the target domain. Choosing suitable initial values speeds up the convergence of each parameter's convergence formula and reduces the number of iterations.
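The effect of the initial value on the number of iterations can be seen on a scalar toy iteration. The sketch assumes a stand-in update x ← 0.5x + 1 with fixed point 2; it is not one of the convergence formulas of the embodiment, and the starting points and tolerance are illustrative assumptions.

```python
def steps_to_converge(start, tol=1e-8, max_iters=10_000):
    """Count the iterations of x <- 0.5 * x + 1 until the change drops below tol."""
    x = start
    for step in range(1, max_iters + 1):
        new_x = 0.5 * x + 1.0
        if abs(new_x - x) < tol:
            return step
        x = new_x
    return max_iters


far = steps_to_converge(1000.0)  # initial value far from the fixed point 2.0
near = steps_to_converge(2.5)    # initial value close to the fixed point
print(far, near)
```

The start closer to the fixed point needs fewer iterations, which is the motivation for choosing initial values from the data rather than purely at random.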
Further, in order to know the resource consumption when the TCT method provided in this embodiment is used to classify cross-domain viewpoint data, the method also computes the computational complexity of each parameter's convergence formula. The computed complexities are listed in Table 1:
Table 1
where k′ = k + k0, n = max{ns, nt}, m ≫ k′ and n ≫ k′.
206: Obtain the classification function according to the convergence values of the parameters in the overall objective function.
Since the convergence values of the parameters in the overall objective function were determined in step 205, this step obtains the classification function from those convergence values, so that the viewpoint data of the target domain can be classified with it in the subsequent step.
Specifically, obtaining the classification function according to the convergence values of the parameters in the overall objective function includes, but is not limited to, the following steps:
First, obtain the convergence values of parameters U0, Us and Ut reached when the objective function is optimized;
Second, substitute the obtained convergence values of parameters U0 and Ut into the following formula:
where xi is any document in the target domain, vi is the document-topic representation of that document, i is the row at which vi is located in the document-topic matrix of the target domain, and j is the column corresponding to that row.
Third, obtain the classification function according to vi and Ws;
Specifically, the classification function obtained according to vi and Ws is as follows:
where i is the row at which vi is located and j is the column corresponding to that row.
207: Classify the viewpoint data of the target domain according to the classification function.
Since the classification function was determined in step 206, this step classifies the viewpoint data of the target domain with it. Specifically, a value of 1 for yi represents a positive viewpoint and a value of -1 represents a negative viewpoint. When any document of the target domain is classified with the classification function, a computed yi of 1 indicates that the document expresses a positive viewpoint, so the document is classified as a positive document; a computed yi of -1 indicates that the document expresses a negative viewpoint, so the document is classified as a negative document.
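The mapping from a document to the labels 1 and -1 can be sketched as follows. The sketch makes two assumptions that are not taken from the classification function of step 206: that a document's score is the dot product of its topic representation vi with a single weight vector, and that the label is the sign of that score, with a score of exactly zero counted as positive. The function name and the numbers are hypothetical.

```python
def classify(v_i, w_s):
    """Return 1 (positive viewpoint) or -1 (negative viewpoint) for one document.

    Assumption of this sketch: score = v_i . w_s and label = sign(score).
    """
    score = sum(a * b for a, b in zip(v_i, w_s))
    return 1 if score >= 0 else -1


w_s = [1.0, -0.5]                  # hypothetical weights, one per topic
documents = {
    "doc_a": [0.9, 0.1],           # hypothetical topic representations
    "doc_b": [0.2, 0.8],
}
labels = {name: classify(v, w_s) for name, v in documents.items()}
print(labels)
```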
Preferably, to test the accuracy with which the TCT algorithm provided in this embodiment classifies cross-domain viewpoint data, the method is also verified experimentally on four selected domains: books (B), DVD (Digital Versatile Disc) (D), electronic products (E) and kitchen articles (K). During the experiment, each viewpoint in the four domains is assigned a viewpoint label of +1 or -1: a label of +1 indicates that the viewpoint is positive, and a label of -1 indicates that it is negative. Each domain is also given 1000 positive viewpoint data items and 1000 negative viewpoint data items, together with some data without viewpoint labels. Twelve cross-domain classification tasks can be constructed: D → B, E → B, K → B, K → E, D → E, B → E, B → D, K → D, E → D, B → K, D → K and E → K, where the domain before the arrow is the source domain and the domain after the arrow is the target domain. Taking computing capacity into account, about 5000 data items are selected for each domain in this embodiment, as shown in Table 2:
Table 2
Domain | Training data | Test data | Data without viewpoint labels | Proportion of negative data
Books | 1600 | 400 | 4465 | 50%
DVD | 1600 | 400 | 5945 | 50%
Electronic products | 1600 | 400 | 5681 | 50%
Kitchen articles | 1600 | 400 | 3586 | 50%
Table 2 lists the data of the four selected domains. Each domain contains training data, test data and data without viewpoint labels, and negative data make up 50% of the data in each domain. In the 12 cross-domain classification tasks, every domain serves both as a source domain and as a target domain: when a domain is the source domain, its training data are used to construct the classification function; when it is the target domain, its test data are used to test the constructed classification function. Therefore, to ensure the accuracy of cross-domain viewpoint data classification, this embodiment sets the same amounts of training data and test data for every domain: as Table 2 shows, each domain has 1600 training items and 400 test items.
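The twelve source → target tasks and the per-domain split described above can be enumerated as follows. The dictionary layout and the variable names are illustrative assumptions; the domain letters and the counts of 1600 training items and 400 test items come from the text.

```python
from itertools import permutations

DOMAINS = {
    "B": "Books",
    "D": "DVD",
    "E": "Electronic products",
    "K": "Kitchen articles",
}
TRAIN_PER_DOMAIN = 1600  # used when the domain acts as the source
TEST_PER_DOMAIN = 400    # used when the domain acts as the target

# Every ordered pair of distinct domains is one cross-domain task.
tasks = [
    {"source": s, "target": t,
     "train_items": TRAIN_PER_DOMAIN, "test_items": TEST_PER_DOMAIN}
    for s, t in permutations(DOMAINS, 2)
]

for task in tasks:
    print(f"{task['source']} -> {task['target']}")
```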
To show intuitively the advantage of classifying cross-domain viewpoint data with the method provided in this embodiment, different algorithms are used for classification when the data of the four domains are tested. Specifically, in addition to the TCT algorithm used in this embodiment, the experiment also uses No Transf, SCL (Structural Correspondence Learning), SFA (Spectral Feature Alignment), SDA (Stacked Denoising Auto-encoders) and NMTF (non-negative matrix tri-factorization), among other algorithms.
After the algorithms are selected, the parameters of each algorithm are determined so that the 12 cross-domain classification tasks set above can be executed. Because No Transf, SCL and SFA use logistic regression as the base classifier, their parameters are chosen with the data of the four given domains in mind; for SDA and NMTF, the parameter settings from the published papers are used; for the TCT algorithm used in this embodiment, parameter α is set to 1, and the values of parameters k and k0 are determined on the constructed classification task E → B.
Further, classifying the 12 constructed tasks with the different algorithms gives the results shown in Fig. 3 and Fig. 4, in which the horizontal axis indicates the classification task, the vertical axis indicates the classification accuracy, and NF stands for the No Transf algorithm. Figs. 3 and 4 show that accuracy is higher on D → B and B → D and on K → E and E → K, which indicates that domains B and D are more similar to each other, as are domains E and K. Comparing the accuracy of the different algorithms on the 12 tasks also shows that the TCT algorithm provided in this embodiment is clearly more accurate than the other algorithms, such as SCL and SFA.
Further, to show how the data converge when the TCT algorithm is used to classify cross-domain viewpoint data, the experimental results include the convergence behaviour for B → D and E → K; see Fig. 5 and Fig. 6. Fig. 5 is the convergence curve for B → D: once the number of iterations reaches 300, the value of the objective function no longer changes. Fig. 6 is the convergence curve for E → K: likewise, once the number of iterations reaches 300, the value of the objective function no longer changes.
The method provided in this embodiment obtains the shared topic matrix of the source domain and the target domain, and constructs the domain-specific topic matrix of the source domain and that of the target domain from their respective domain-specific topics. It then determines the overall objective function according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, the term matrix of the source domain, the domain-specific topic matrix of the target domain and the term matrix of the target domain, obtains the classification function according to the convergence values of the parameters in the overall objective function, and classifies the viewpoint data of the target domain according to the classification function. Because shared topics act as a bridge that narrows the difference between domains, classifying cross-domain viewpoint data with this classification function improves classification accuracy.
Referring to Fig. 7, an embodiment of the present invention provides a device for classifying cross-domain viewpoint data. The device includes:
a first obtaining module 701, configured to obtain the shared topic matrix according to the shared topics of the source domain and the target domain;
a second obtaining module 702, configured to obtain the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain according to the domain-specific topics of the source domain and the domain-specific topics of the target domain, respectively;
a first determining module 703, configured to determine the objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain and the term matrix of the source domain;
a second determining module 704, configured to determine the objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain and the term matrix of the target domain;
a third determining module 705, configured to determine the overall objective function according to the objective function of the source domain and the objective function of the target domain;
a fourth determining module 706, configured to determine the convergence value of each parameter in the overall objective function;
a third obtaining module 707, configured to obtain the classification function according to the convergence values of the parameters in the overall objective function;
a classification module 708, configured to classify the viewpoint data of the target domain according to the classification function.
As an optional embodiment, the objective function ψs of the source domain determined by the first determining module 703 is:
where $\|\cdot\|_F^2$ denotes the squared Frobenius norm; Tr[·] is the trace of a matrix; Xs is the term matrix of the source domain; U0 is the shared topic matrix; Us is the domain-specific topic matrix of the source domain; Vs is the document-topic matrix of the source domain; $V_s^T$ is the transpose of the document-topic matrix of the source domain; α is an arbitrary parameter; Ws holds the coefficients of a linear model used to predict the viewpoint data from Vs; Ys is the polarity matrix of the source domain; and Cs is a diagonal matrix.
As an optional embodiment, the objective function ψt of the target domain determined by the second determining module 704 is:
$$\psi_t = \left\| X_t - [U_0, U_t]\, V_t^{T} \right\|_F^2$$
where Xt is the term matrix of the target domain, U0 is the shared topic matrix, Ut is the domain-specific topic matrix of the target domain, Vt is the document-topic matrix of the target domain, and $V_t^T$ is the transpose of the document-topic matrix of the target domain.
As an optional embodiment, the overall objective function ψ determined by the third determining module 705 is:
$$\psi = \psi_s + \psi_t$$
Referring to Fig. 8, the fourth determining module 706 includes:
a first determining unit 7061, configured to iteratively compute the current iteration value of parameter U0 according to the convergence formula of U0 until the current iteration value converges, and to take the converged current iteration value as the convergence value of parameter U0, where Hs is the coefficient matrix of the shared topic matrix in the source domain and Ht is the coefficient matrix of the shared topic matrix in the target domain;
a second determining unit 7062, configured to iteratively compute the current iteration value of parameter Us according to the convergence formula of Us until the current iteration value converges, and to take the converged current iteration value as the convergence value of parameter Us, where Ls is the coefficient matrix of the domain-specific topic matrix of the source domain;
a third determining unit 7063, configured to iteratively compute the current iteration value of parameter Vs according to the convergence formula of Vs until the current iteration value converges, and to take the converged current iteration value as the convergence value of parameter Vs;
a fourth determining unit 7064, configured to iteratively compute the current iteration value of parameter Ws according to the convergence formula of Ws until the current iteration value converges, and to take the converged current iteration value as the convergence value of parameter Ws;
a fifth determining unit 7065, configured to iteratively compute the current iteration value of parameter Ut according to the convergence formula of Ut until the current iteration value converges, and to take the converged current iteration value as the convergence value of parameter Ut, where Lt is the coefficient matrix of the domain-specific topic matrix of the target domain;
a sixth determining unit 7066, configured to iteratively compute the current iteration value of parameter Vt according to the convergence formula of Vt until the current iteration value converges, and to take the converged current iteration value as the convergence value of parameter Vt;
where the operator in the convergence formulas denotes the inner product operation, t denotes the current iteration, and t-1 denotes the previous iteration.
As an optional embodiment, the classification function yi obtained by the third obtaining module 707 according to the convergence values of the parameters in the overall objective function is:
where vi is the document-topic representation of any document of the target domain, i is the row at which vi is located in the document-topic matrix of the target domain, and j is the column corresponding to that row.
In summary, the device provided in this embodiment of the present invention obtains the shared topic matrix from the shared topics of the source domain and the target domain, and constructs the domain-specific topic matrix of the source domain and that of the target domain from their respective domain-specific topics. It determines the objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain and the term matrix of the source domain, and determines the objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain and the term matrix of the target domain. After obtaining the overall objective function from the two objective functions, it obtains the classification function according to the convergence values of the parameters in the overall objective function and classifies the viewpoint data of the target domain according to the classification function. Because shared topics act as a bridge that narrows the difference between domains, the accuracy of cross-domain viewpoint data classification is improved.
Fig. 9 is a block diagram of a device 900 for the classification method of cross-domain viewpoint data according to an exemplary embodiment. For example, device 900 may be provided as a server. Referring to Fig. 9, device 900 includes a processing component 922, which further includes one or more processors, and memory resources represented by a memory 932 for storing instructions executable by the processing component 922, such as application programs. The application programs stored in memory 932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 922 is configured to execute the instructions so as to perform the above classification method of cross-domain viewpoint data, the method comprising:
obtaining a shared topic matrix according to the topics shared by the source domain and the target domain, and obtaining the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain according to the domain-specific topics of the source domain and of the target domain, respectively;
determining the objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain, and determining the objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain;
determining the total objective function according to the objective function of the source domain and the objective function of the target domain, and determining the converged value of each parameter in the total objective function;
obtaining a classification function according to the converged values of the parameters in the total objective function, and classifying the viewpoint data of the target domain according to the classification function.
As an optional embodiment, the objective function $\psi_s$ of the source domain, determined according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain, is:
Here, $\|\cdot\|_F$ denotes the Frobenius norm, $\mathrm{Tr}[\cdot]$ is the trace of a matrix, $X_s$ is the term matrix of the source domain, $U_0$ is the shared topic matrix, $U_s$ is the domain-specific topic matrix of the source domain, $V_s$ is the document-topic matrix of the source domain, $V_s^T$ is the transpose of the document-topic matrix of the source domain, $\alpha$ is an arbitrary parameter, $W_s$ is the coefficient matrix of a linear model used to predict the viewpoint data of $V_s$, $Y_s$ is the polarity matrix of the source domain, and $C_s$ is a diagonal matrix.
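The formula for $\psi_s$ is not reproduced in this text. As a hedged illustration only, the sketch below assumes one form that is consistent with the symbols listed above: the squared Frobenius reconstruction error of $X_s$ by $[U_0, U_s]V_s^T$, plus $\alpha$ times a trace term that penalizes the linear predictor $V_s W_s$ for deviating from $Y_s$, weighted by $C_s$. The function names, the matrix shapes, and the exact form of both terms are assumptions, not the patent's definition.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b on nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def subtract(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def source_objective(X_s: Matrix, U_0: Matrix, U_s: Matrix, V_s: Matrix,
                     W_s: Matrix, Y_s: Matrix, C_s: Matrix, alpha: float) -> float:
    # ASSUMED form (not taken from the patent text):
    #   ||X_s - [U_0, U_s] V_s^T||_F^2
    #   + alpha * Tr[(V_s W_s - Y_s)^T C_s (V_s W_s - Y_s)]
    U = [r0 + rs for r0, rs in zip(U_0, U_s)]  # [U_0, U_s], concatenated column-wise
    residual = subtract(X_s, matmul(U, transpose(V_s)))
    reconstruction = sum(x * x for row in residual for x in row)  # squared Frobenius norm
    error = subtract(matmul(V_s, W_s), Y_s)
    weighted = matmul(matmul(transpose(error), C_s), error)
    trace = sum(weighted[i][i] for i in range(len(weighted)))
    return reconstruction + alpha * trace


# Toy data: 2 terms, 2 documents, 1 shared topic, 1 source-specific topic.
U_0 = [[1.0], [0.0]]
U_s = [[0.0], [1.0]]
V_s = [[1.0, 0.0], [0.0, 1.0]]
X_s = [[1.0, 0.0], [0.0, 1.0]]   # reconstructed exactly by [U_0, U_s] V_s^T
W_s = [[1.0], [-1.0]]
Y_s = [[1.0], [1.0]]             # the second label disagrees with V_s W_s
C_s = [[1.0, 0.0], [0.0, 1.0]]
psi_s = source_objective(X_s, U_0, U_s, V_s, W_s, Y_s, C_s, alpha=0.5)
```

In this toy case the reconstruction term vanishes and only the mislabeled second document contributes, so the value of the objective comes entirely from the $\alpha$-weighted trace term.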
As an optional embodiment, the objective function $\psi_t$ of the target domain, determined according to the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain, is:
Here, $X_t$ is the term matrix of the target domain, $U_0$ is the shared topic matrix, $U_t$ is the domain-specific topic matrix of the target domain, $V_t$ is the document-topic matrix of the target domain, and $V_t^T$ is the transpose of the document-topic matrix of the target domain.
As an optional embodiment, the total objective function $\psi$, determined according to the objective function of the source domain and the objective function of the target domain, is:
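The expressions for $\psi_t$ and $\psi$ are not reproduced in this text. A minimal sketch follows, assuming that $\psi_t$ is the squared Frobenius reconstruction error of $X_t$ by $[U_0, U_t]V_t^T$ and that the total objective is the unweighted sum of the source and target objectives. Both forms, and the function names, are assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]


def target_objective(X_t: Matrix, U_0: Matrix, U_t: Matrix, V_t: Matrix) -> float:
    # ASSUMED form: ||X_t - [U_0, U_t] V_t^T||_F^2
    U = [r0 + rt for r0, rt in zip(U_0, U_t)]  # [U_0, U_t], concatenated column-wise
    total = 0.0
    for i, x_row in enumerate(X_t):
        for j, x in enumerate(x_row):
            # Entry (i, j) of U V_t^T is the dot product of row i of U and row j of V_t.
            approx = sum(u * v for u, v in zip(U[i], V_t[j]))
            total += (x - approx) ** 2
    return total


def total_objective(psi_s: float, psi_t: float) -> float:
    # ASSUMED combination: a plain sum; the source may weight the two terms differently.
    return psi_s + psi_t


# Toy data: the shared topic matrix U_0 appears in both domains.
U_0 = [[1.0], [0.0]]
U_t = [[0.0], [1.0]]
V_t = [[1.0, 0.0], [0.0, 1.0]]
X_t = [[1.0, 0.0], [0.0, 3.0]]   # entry (1, 1) is off by 2 from the reconstruction
psi_t = target_objective(X_t, U_0, U_t, V_t)
psi = total_objective(2.0, psi_t)  # 2.0 stands in for a source-domain value
```

Because $U_0$ enters both reconstruction terms, minimizing the combined value forces the shared topics to explain the term matrices of both domains at once, which is the bridging role the description attributes to them.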
As an optional embodiment, determining the converged value of each parameter in the total objective function comprises:
iteratively computing the current iteration value of parameter $U_0$ according to the update formula for $U_0$ until the current iteration value converges, and taking the converged current iteration value as the converged value of parameter $U_0$, where $H_s$ is the coefficient matrix of the shared topic matrix of the source domain and $H_t$ is the coefficient matrix of the shared topic matrix of the target domain;
iteratively computing the current iteration value of parameter $U_s$ according to the update formula for $U_s$ until the current iteration value converges, and taking the converged current iteration value as the converged value of parameter $U_s$, where $L_s$ is the coefficient matrix of the domain-specific topic matrix of the source domain;
iteratively computing the current iteration value of parameter $V_s$ according to the update formula for $V_s$ until the current iteration value converges, and taking the converged current iteration value as the converged value of parameter $V_s$;
iteratively computing the current iteration value of parameter $W_s$ according to the update formula for $W_s$ until the current iteration value converges, and taking the converged current iteration value as the converged value of parameter $W_s$;
iteratively computing the current iteration value of parameter $U_t$ according to the update formula for $U_t$ until the current iteration value converges, and taking the converged current iteration value as the converged value of parameter $U_t$, where $L_t$ is the coefficient matrix of the domain-specific topic matrix of the target domain;
iteratively computing the current iteration value of parameter $V_t$ according to the update formula for $V_t$ until the current iteration value converges, and taking the converged current iteration value as the converged value of parameter $V_t$;
where the operator symbol in the update formulas denotes the inner product operation, $t$ denotes the current iteration, and $t-1$ denotes the previous iteration.
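The update formulas themselves are not reproduced in this text, so the following sketch does not implement them. It only illustrates the iterate-until-convergence pattern described above, using the classical multiplicative updates for a plain non-negative factorization $X \approx UV^T$ as a stand-in. The element-wise form of the updates, the stopping rule on the largest parameter change, and all names are assumptions.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def loss(X: Matrix, U: Matrix, V: Matrix) -> float:
    """Squared Frobenius norm of X - U V^T."""
    approx = matmul(U, transpose(V))
    return sum((x - a) ** 2 for xr, ar in zip(X, approx) for x, a in zip(xr, ar))


def scale(P: Matrix, num: Matrix, den: Matrix, eps: float = 1e-12) -> Matrix:
    """Element-wise update P * num / den (eps guards against division by zero)."""
    return [[p * n / (d + eps) for p, n, d in zip(pr, nr, dr)]
            for pr, nr, dr in zip(P, num, den)]


def max_change(a: Matrix, b: Matrix) -> float:
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def factorize(X: Matrix, k: int, max_iter: int = 1000, tol: float = 1e-8,
              seed: int = 0) -> Tuple[Matrix, Matrix, List[float]]:
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in X]
    V = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in X[0]]
    history = [loss(X, U, V)]
    for _ in range(max_iter):
        # Value at iteration t is computed from the value at iteration t-1.
        U_new = scale(U, matmul(X, V), matmul(U, matmul(transpose(V), V)))
        V_new = scale(V, matmul(transpose(X), U_new),
                      matmul(V, matmul(transpose(U_new), U_new)))
        change = max(max_change(U, U_new), max_change(V, V_new))
        U, V = U_new, V_new
        history.append(loss(X, U, V))
        if change <= tol:   # treat the current iteration values as converged
            break
    return U, V, history


X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [1.0, 1.0, 3.0]]
U, V, history = factorize(X, k=2)
```

The same loop structure would apply to each of the six parameters in turn: compute the value at iteration $t$ from the values at iteration $t-1$, and stop once successive values no longer change appreciably.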
As an optional embodiment, the classification function $y_i$ obtained according to the converged values of the parameters in the total objective function is:
Here, $v_i$ is any document-topic vector of the target domain, $i$ is the row in which $v_i$ lies in the document-topic matrix of the target domain, and $j$ indexes the columns of that row.
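The formula for $y_i$ is not reproduced in this text. A minimal sketch follows, assuming that each target-domain document is scored by multiplying its row $v_i$ of the converged $V_t$ with the converged linear-model coefficients $W_s$, and that the polarity column with the highest score is chosen. This argmax rule, the column layout of $W_s$, and the function name are assumptions.

```python
from typing import List

Matrix = List[List[float]]


def classify(V_t: Matrix, W_s: Matrix) -> List[int]:
    """Return, for every row v_i of V_t, the index of the best-scoring column of v_i W_s."""
    labels = []
    for v_i in V_t:
        # One score per polarity column of W_s (hypothetical layout: one column per class).
        scores = [sum(v * w for v, w in zip(v_i, column)) for column in zip(*W_s)]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Toy data: two documents over two topics, two polarity classes.
V_t = [[0.9, 0.1], [0.2, 0.8]]
W_s = [[1.0, 0.0], [0.0, 1.0]]
labels = classify(V_t, W_s)
```

Note that $W_s$ is learned only from labeled source-domain documents; it can be applied to $V_t$ because both document-topic matrices are expressed partly over the same shared topics.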
Device 900 may also include a power supply component 926 configured to perform power management of device 900, a wired or wireless network interface 950 configured to connect device 900 to a network, and an input/output (I/O) interface 958. Device 900 can operate based on an operating system stored in memory 932, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In summary, the server provided by this embodiment of the present invention obtains the shared topic matrix of the source domain and the target domain, and constructs the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain from the domain-specific topics of the respective domains. After the total objective function is determined from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, the term matrix of the source domain, the domain-specific topic matrix of the target domain, and the term matrix of the target domain, a classification function is obtained from the converged values of the parameters in the total objective function, and the viewpoint data of the target domain are classified according to the classification function. Because the shared topics act as a bridge that reduces the difference between domains, the accuracy of classification improves when cross-domain viewpoint data are classified with this classification function.
It should be noted that, when the classification apparatus for cross-domain viewpoint data provided by the above embodiment classifies cross-domain viewpoint data, the division into the functional modules described above is only an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the classification apparatus for cross-domain viewpoint data and the classification method for cross-domain viewpoint data provided by the above embodiments belong to the same concept; for the specific implementation process, see the method embodiment, which is not repeated here.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are merely preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A classification method for cross-domain viewpoint data, characterized in that the method comprises:
obtaining a shared topic matrix according to the topics shared by a source domain and a target domain, and obtaining a domain-specific topic matrix of the source domain and a domain-specific topic matrix of the target domain according to the domain-specific topics of the source domain and the domain-specific topics of the target domain, respectively;
determining an objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, a polarity matrix of the source domain, and a term matrix of the source domain, the objective function $\psi_s$ of the source domain being:
wherein $\|\cdot\|_F$ denotes the Frobenius norm, $\mathrm{Tr}[\cdot]$ is the trace of a matrix, $X_s$ is the term matrix of the source domain, $U_0$ is the shared topic matrix, $U_s$ is the domain-specific topic matrix of the source domain, $V_s$ is the document-topic matrix of the source domain, $V_s^T$ is the transpose of the document-topic matrix of the source domain, $\alpha$ is an arbitrary parameter, $W_s$ is the coefficient matrix of a linear model used to predict the viewpoint data of $V_s$, $Y_s$ is the polarity matrix of the source domain, and $C_s$ is a diagonal matrix;
and determining an objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain, and a term matrix of the target domain, the objective function $\psi_t$ of the target domain being:
wherein $X_t$ is the term matrix of the target domain, $U_0$ is the shared topic matrix, $U_t$ is the domain-specific topic matrix of the target domain, $V_t$ is the document-topic matrix of the target domain, and $V_t^T$ is the transpose of the document-topic matrix of the target domain;
determining a total objective function according to the objective function of the source domain and the objective function of the target domain, and determining the converged value of each parameter in the total objective function;
obtaining a classification function according to the converged values of the parameters in the total objective function, and classifying the viewpoint data of the target domain according to the classification function.
2. The method according to claim 1, characterized in that the total objective function $\psi$ determined according to the objective function of the source domain and the objective function of the target domain is:
3. The method according to claim 2, characterized in that the parameters of the total objective function are $U_0$, $U_s$, $V_s$, $W_s$, $U_t$, and $V_t$;
determining the converged value of each parameter in the total objective function comprises:
iteratively computing the current iteration value of parameter $U_0$ according to the update formula for $U_0$ until the current iteration value converges, and taking the converged current iteration value as the converged value of parameter $U_0$, wherein $H_s$ is the coefficient matrix of the shared topic matrix of the source domain and $H_t$ is the coefficient matrix of the shared topic matrix of the target domain;
iteratively computing the current iteration value of parameter $U_s$ according to the update formula for $U_s$ until the current iteration value converges, and taking the converged current iteration value as the converged value of parameter $U_s$, wherein $L_s$ is the coefficient matrix of the domain-specific topic matrix of the source domain;
iteratively computing the current iteration value of parameter $V_s$ according to the update formula for $V_s$ until the current iteration value converges, and taking the converged current iteration value as the converged value of parameter $V_s$;
iteratively computing the current iteration value of parameter $W_s$ according to the update formula for $W_s$ until the current iteration value converges, and taking the converged current iteration value as the converged value of parameter $W_s$;
iteratively computing the current iteration value of parameter $U_t$ according to the update formula for $U_t$ until the current iteration value converges, and taking the converged current iteration value as the converged value of parameter $U_t$, wherein $L_t$ is the coefficient matrix of the domain-specific topic matrix of the target domain;
iteratively computing the current iteration value of parameter $V_t$ according to the update formula for $V_t$ until the current iteration value converges, and taking the converged current iteration value as the converged value of parameter $V_t$;
wherein the operator symbol in the update formulas denotes the inner product operation, $t$ denotes the current iteration, and $t-1$ denotes the previous iteration.
4. The method according to claim 3, characterized in that the classification function $y_i$ obtained according to the converged values of the parameters in the total objective function is:
wherein $v_i$ is any document-topic vector of the target domain, $i$ is the row in which $v_i$ lies in the document-topic matrix of the target domain, and $j$ indexes the columns of that row.
5. A classification apparatus for cross-domain viewpoint data, characterized in that the apparatus comprises:
a first obtaining module, configured to obtain a shared topic matrix according to the topics shared by a source domain and a target domain;
a second obtaining module, configured to obtain a domain-specific topic matrix of the source domain and a domain-specific topic matrix of the target domain according to the domain-specific topics of the source domain and the domain-specific topics of the target domain, respectively;
a first determining module, configured to determine an objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, a polarity matrix of the source domain, and a term matrix of the source domain, the objective function $\psi_s$ of the source domain being:
wherein $\|\cdot\|_F$ denotes the Frobenius norm, $\mathrm{Tr}[\cdot]$ is the trace of a matrix, $X_s$ is the term matrix of the source domain, $U_0$ is the shared topic matrix, $U_s$ is the domain-specific topic matrix of the source domain, $V_s$ is the document-topic matrix of the source domain, $V_s^T$ is the transpose of the document-topic matrix of the source domain, $\alpha$ is an arbitrary parameter, $W_s$ is the coefficient matrix of a linear model used to predict the viewpoint data of $V_s$, $Y_s$ is the polarity matrix of the source domain, and $C_s$ is a diagonal matrix;
a second determining module, configured to determine an objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain, and a term matrix of the target domain, the objective function $\psi_t$ of the target domain being:
wherein $X_t$ is the term matrix of the target domain, $U_0$ is the shared topic matrix, $U_t$ is the domain-specific topic matrix of the target domain, $V_t$ is the document-topic matrix of the target domain, and $V_t^T$ is the transpose of the document-topic matrix of the target domain;
a third determining module, configured to determine a total objective function according to the objective function of the source domain and the objective function of the target domain;
a fourth determining module, configured to determine the converged value of each parameter in the total objective function;
a third obtaining module, configured to obtain a classification function according to the converged values of the parameters in the total objective function;
a classification module, configured to classify the viewpoint data of the target domain according to the classification function.
6. The apparatus according to claim 5, characterized in that the total objective function $\psi$ determined by the third determining module is:
7. The apparatus according to claim 6, characterized in that the fourth determining module comprises:
a first determination unit, configured to iteratively compute the current iteration value of parameter $U_0$ according to the update formula for $U_0$ until the current iteration value converges, and to take the converged current iteration value as the converged value of parameter $U_0$, wherein $H_s$ is the coefficient matrix of the shared topic matrix of the source domain and $H_t$ is the coefficient matrix of the shared topic matrix of the target domain;
a second determination unit, configured to iteratively compute the current iteration value of parameter $U_s$ according to the update formula for $U_s$ until the current iteration value converges, and to take the converged current iteration value as the converged value of parameter $U_s$, wherein $L_s$ is the coefficient matrix of the domain-specific topic matrix of the source domain;
a third determination unit, configured to iteratively compute the current iteration value of parameter $V_s$ according to the update formula for $V_s$ until the current iteration value converges, and to take the converged current iteration value as the converged value of parameter $V_s$;
a fourth determination unit, configured to iteratively compute the current iteration value of parameter $W_s$ according to the update formula for $W_s$ until the current iteration value converges, and to take the converged current iteration value as the converged value of parameter $W_s$;
a fifth determination unit, configured to iteratively compute the current iteration value of parameter $U_t$ according to the update formula for $U_t$ until the current iteration value converges, and to take the converged current iteration value as the converged value of parameter $U_t$, wherein $L_t$ is the coefficient matrix of the domain-specific topic matrix of the target domain;
a sixth determination unit, configured to iteratively compute the current iteration value of parameter $V_t$ according to the update formula for $V_t$ until the current iteration value converges, and to take the converged current iteration value as the converged value of parameter $V_t$;
wherein the operator symbol in the update formulas denotes the inner product operation, $t$ denotes the current iteration, and $t-1$ denotes the previous iteration.
8. The apparatus according to claim 7, characterized in that the classification function $y_i$ obtained by the third obtaining module according to the converged values of the parameters in the total objective function is:
wherein $v_i$ is any document-topic vector of the target domain, $i$ is the row in which $v_i$ lies in the document-topic matrix of the target domain, and $j$ indexes the columns of that row.
9. A server, characterized in that the server comprises one or more processors and a memory, the memory storing one or more instructions, the instructions being loaded and executed by the processor to implement the operations performed in the classification method for cross-domain viewpoint data according to any one of claims 1 to 4.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more instructions, the instructions being loaded and executed by a processor to implement the operations performed in the classification method for cross-domain viewpoint data according to any one of claims 1 to 4.
CN201410201027.7A 2014-05-13 2014-05-13 The classification method and device of cross-cutting viewpoint data Active CN105095277B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410201027.7A CN105095277B (en) 2014-05-13 2014-05-13 The classification method and device of cross-cutting viewpoint data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410201027.7A CN105095277B (en) 2014-05-13 2014-05-13 The classification method and device of cross-cutting viewpoint data

Publications (2)

Publication Number Publication Date
CN105095277A CN105095277A (en) 2015-11-25
CN105095277B true CN105095277B (en) 2019-12-03

Family

ID=54575730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410201027.7A Active CN105095277B (en) 2014-05-13 2014-05-13 The classification method and device of cross-cutting viewpoint data

Country Status (1)

Country Link
CN (1) CN105095277B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392242B (en) * 2017-07-18 2020-06-19 广东工业大学 Cross-domain picture classification method based on homomorphic neural network

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102937960A (en) * 2012-09-06 2013-02-20 北京邮电大学 Device and method for identifying and evaluating emergency hot topic
CN103761311A (en) * 2014-01-23 2014-04-30 中国矿业大学 Sentiment classification method based on multi-source field instance migration
CN104239402A (en) * 2014-07-23 2014-12-24 中国科学院自动化研究所 Document enquiry method and document enquiry device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9194800B2 (en) * 2012-10-29 2015-11-24 Tokitae Llc Systems, devices, and methods employing angular-resolved scattering and spectrally resolved measurements for classification of objects

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"微博话题评论的情感分析研究";曾佳妮等;《信息安全与通信保密》;20130328;第56-58页 *

Also Published As

Publication number Publication date
CN105095277A (en) 2015-11-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant