CN105095277B - The classification method and device of cross-cutting viewpoint data - Google Patents
- Publication number: CN105095277B (CN201410201027.7A)
- Authority: CN (China)
- Prior art keywords: matrix, domain, value, parameter, source domain
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention discloses a method and device for classifying cross-domain viewpoint data, belonging to the field of Internet technology. The method includes: obtaining a shared topic matrix from the topics shared by a source domain and a target domain, and obtaining a domain-specific topic matrix for the source domain and for the target domain from their respective domain-specific topics; determining an objective function of the source domain and an objective function of the target domain; determining an overall objective function from the objective function of the source domain and the objective function of the target domain; determining the converged value of each parameter in the overall objective function and obtaining a classification function from those converged values; and classifying the viewpoint data of the target domain with the classification function. The classification function is derived from the shared topic matrix, and because the shared topic matrix narrows the gap between domains, the precision of cross-domain viewpoint data classification improves.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a method and device for classifying cross-domain viewpoint data.
Background
With the development of Internet technology, more and more data expressing users' viewpoints is shared online, in forms such as user reviews on shopping websites, blog articles, and user feedback. Viewpoint data on the Internet spans many domains, and the viewpoint data of each domain is valuable for guiding users' production practice in that domain, so the viewpoint data of different domains needs to be obtained and studied. Because the volume of Internet data is large, it is difficult to label the data of every domain. How to classify viewpoint data across domains has therefore become the key to obtaining the viewpoint data of different domains.
Take the SFA (Spectral Feature Alignment) algorithm as an example. To classify cross-domain viewpoint data, the related art first selects an arbitrary source domain and target domain and determines the domain-specific words and domain-independent words of the two domains. It then builds a bipartite graph between the domain-specific words and the domain-independent words, which represents the co-occurrence relationships between them. The SFA algorithm next assigns closely connected domain-specific words and domain-independent words in the bipartite graph to the same cluster. Because such a cluster narrows the gap between the domain-specific words of the source domain and those of the target domain, a classifier can be trained on the clusters and then used to classify cross-domain viewpoint data.
In the course of implementing the present invention, the inventors found that the related art has at least the following problem:
When the related art classifies cross-domain viewpoint data, the selected source domain and target domain do not necessarily have clearly defined domain-specific words and domain-independent words, so the classification results of the related art are not accurate.
Summary of the invention
To solve the problems of the related art, embodiments of the present invention provide a method and device for classifying cross-domain viewpoint data. The technical solution is as follows:
In a first aspect, a method for classifying cross-domain viewpoint data is provided, the method including:
obtaining a shared topic matrix from the topics shared by a source domain and a target domain, and obtaining a domain-specific topic matrix of the source domain and a domain-specific topic matrix of the target domain from the domain-specific topics of the source domain and of the target domain, respectively;
determining an objective function of the source domain from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain, and determining an objective function of the target domain from the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain;
determining an overall objective function from the objective function of the source domain and the objective function of the target domain, and determining the converged value of each parameter in the overall objective function;
obtaining a classification function from the converged values of the parameters in the overall objective function, and classifying the viewpoint data of the target domain with the classification function.
In a second aspect, a device for classifying cross-domain viewpoint data is provided, the device including:
a first obtaining module, configured to obtain a shared topic matrix from the topics shared by a source domain and a target domain;
a second obtaining module, configured to obtain a domain-specific topic matrix of the source domain and a domain-specific topic matrix of the target domain from the domain-specific topics of the source domain and of the target domain, respectively;
a first determining module, configured to determine an objective function of the source domain from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain;
a second determining module, configured to determine an objective function of the target domain from the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain;
a third determining module, configured to determine an overall objective function from the objective function of the source domain and the objective function of the target domain;
a fourth determining module, configured to determine the converged value of each parameter in the overall objective function;
a third obtaining module, configured to obtain a classification function from the converged values of the parameters in the overall objective function;
a classification module, configured to classify the viewpoint data of the target domain with the classification function.
The technical solution provided by the embodiments of the present invention has the following beneficial effects:
A shared topic matrix of the source domain and the target domain is obtained, and a domain-specific topic matrix is constructed for each of the two domains from its domain-specific topics. An overall objective function is then determined from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, the term matrix of the source domain, the domain-specific topic matrix of the target domain, and the term matrix of the target domain. A classification function is obtained from the converged values of the parameters in the overall objective function and used to classify the viewpoint data of the target domain. Because the shared topics act as a bridge that narrows the differences between domains, classifying cross-domain viewpoint data with this classification function improves classification accuracy.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for classifying cross-domain viewpoint data provided by one embodiment of the present invention;
Fig. 2 is a flowchart of a method for classifying cross-domain viewpoint data provided by another embodiment of the present invention;
Fig. 3 is a schematic diagram of the results of classifying cross-domain viewpoint data with different algorithms, provided by another embodiment of the present invention;
Fig. 4 is a schematic diagram of the results of classifying cross-domain viewpoint data with different algorithms, provided by another embodiment of the present invention;
Fig. 5 is a schematic diagram of a convergence curve provided by another embodiment of the present invention;
Fig. 6 is a schematic diagram of a convergence curve provided by another embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a device for classifying cross-domain viewpoint data provided by another embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a third determining module provided by another embodiment of the present invention;
Fig. 9 is a schematic structural diagram of a server provided by another embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
With the development of Internet technology, sharing viewpoint data has become a trend in today's society. The viewpoint data of different domains is valuable for guiding consumers' production practice, and the viewpoint data on the Internet is large in both quantity and variety, so the different viewpoint data on the Internet needs to be classified. To this end, an embodiment of the present invention provides a method for classifying cross-domain viewpoint data. Referring to Fig. 1, the method flow provided by this embodiment includes:
101: Obtain a shared topic matrix from the topics shared by the source domain and the target domain, and obtain the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain from the domain-specific topics of the source domain and of the target domain, respectively.
102: Determine the objective function of the source domain from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain, and determine the objective function of the target domain from the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain.
103: Determine the overall objective function from the objective function of the source domain and the objective function of the target domain, and determine the converged value of each parameter in the overall objective function.
104: Obtain a classification function from the converged values of the parameters in the overall objective function, and classify the viewpoint data of the target domain with the classification function.
In an optional embodiment, the objective function ψ_s of the source domain is determined from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain, using the following quantities:
||·||_F^2 is the squared Frobenius norm, Tr[·] is the matrix trace, X_s is the term matrix of the source domain, U_0 is the shared topic matrix, U_s is the domain-specific topic matrix of the source domain, V_s is the document-topic matrix of the source domain, V_s^T is the transpose of the document-topic matrix of the source domain, α is an arbitrary parameter, W_s is the linear model coefficient matrix used to predict the viewpoint polarity from V_s, Y_s is the polarity matrix of the source domain, and C_s is a diagonal matrix.
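The expression for ψ_s itself does not appear in the text above. As a sketch only, one form that is consistent with every symbol and matrix dimension defined in this document is shown below; it is an assumption reconstructed from those definitions, not a confirmed copy of the patent's formula.

```latex
% Assumed reconstruction from the symbol definitions; not confirmed by the source.
\psi_s = \left\| X_s - [\,U_0,\ U_s\,]\, V_s^{T} \right\|_F^{2}
       + \alpha \,\mathrm{Tr}\!\left[ \left( V_s W_s - Y_s \right)^{T} C_s \left( V_s W_s - Y_s \right) \right]
```

Under this reading, the first term measures how well the shared and domain-specific topics reconstruct X_s, and the second term penalizes the polarity prediction error of the linear model W_s, weighted by the diagonal matrix C_s.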
In an optional embodiment, the objective function ψ_t of the target domain is determined from the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain, using the following quantities:
X_t is the term matrix of the target domain, U_0 is the shared topic matrix, U_t is the domain-specific topic matrix of the target domain, V_t is the document-topic matrix of the target domain, and V_t^T is the transpose of the document-topic matrix of the target domain.
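The expression for ψ_t is likewise absent from the text. A minimal sketch, assuming the target domain is handled like the source domain but without a polarity term (the target domain has no polarity matrix among its inputs), would be:

```latex
% Assumed reconstruction from the symbol definitions; not confirmed by the source.
\psi_t = \left\| X_t - [\,U_0,\ U_t\,]\, V_t^{T} \right\|_F^{2}
```

The same U_0 appears in both domains' objective functions, which is what lets the shared topics act as a bridge between them.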
In an optional embodiment, the overall objective function ψ is determined from the objective function of the source domain and the objective function of the target domain.
In an optional embodiment, determining the converged value of each parameter in the overall objective function includes:
iteratively computing the current iteration value of parameter U_0 according to its update formula until the current iteration value converges, and taking the converged current iteration value as the converged value of U_0, where H_s is the coefficient matrix of the shared topic matrix for the source domain and H_t is the coefficient matrix of the shared topic matrix for the target domain;
iteratively computing the current iteration value of parameter U_s according to its update formula until the current iteration value converges, and taking the converged current iteration value as the converged value of U_s, where L_s is the coefficient matrix of the domain-specific topic matrix of the source domain;
iteratively computing the current iteration value of parameter V_s according to its update formula until the current iteration value converges, and taking the converged current iteration value as the converged value of V_s;
iteratively computing the current iteration value of parameter W_s according to its update formula until the current iteration value converges, and taking the converged current iteration value as the converged value of W_s;
iteratively computing the current iteration value of parameter U_t according to its update formula until the current iteration value converges, and taking the converged current iteration value as the converged value of U_t, where L_t is the coefficient matrix of the domain-specific topic matrix of the target domain;
iteratively computing the current iteration value of parameter V_t according to its update formula until the current iteration value converges, and taking the converged current iteration value as the converged value of V_t.
In these formulas, the operator symbol denotes the inner product operation, t denotes the current iteration, and t-1 denotes the previous iteration.
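The "iterate until the current iteration value converges" pattern used for every parameter can be sketched generically. The update formulas themselves are not available in this text, so the helper below takes the update rule as an argument. The function name, the tolerance `tol`, the iteration cap `max_iter`, and the norm-based stopping test are all assumptions made for illustration.

```python
import numpy as np

def iterate_until_converged(update, x0, tol=1e-10, max_iter=1000):
    """Apply `update` repeatedly until successive iterates differ by at most `tol`.

    `update` maps the iterate from step t-1 to the iterate for step t.
    Returns the final iterate and the number of iterations performed.
    """
    x_prev = np.asarray(x0, dtype=float)
    for t in range(1, max_iter + 1):
        x_curr = update(x_prev)                      # iterate t from iterate t-1
        if np.linalg.norm(x_curr - x_prev) <= tol:   # assumed convergence test
            return x_curr, t                         # converged value, iterations used
        x_prev = x_curr
    return x_prev, max_iter                          # budget exhausted before reaching tol
```

In the method described here, this loop would be run with the update rule of each parameter in turn (U_0, U_s, V_s, W_s, U_t, V_t), and the returned iterate taken as that parameter's converged value.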
In an optional embodiment, the classification function y_i is obtained from the converged values of the parameters in the overall objective function, where v_i is any document-topic vector in the document-topic matrix of the target domain, i is the row of the document-topic matrix of the target domain in which v_i lies, and j is the column corresponding to that row.
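The expression for y_i is not reproduced in the text. A minimal sketch, assuming that each row v_i is scored with the converged linear model W_s and that the polarity is taken from the column j with the largest score, is shown below. The function name and the convention that column 0 means positive are hypothetical.

```python
import numpy as np

def classify_documents(V_t, W_s):
    """Return +1 (positive) or -1 (negative) for each row v_i of V_t.

    V_t: (n_t, k0 + k) document-topic matrix of the target domain.
    W_s: (k0 + k, 2) linear model coefficients learned on the source domain.
    """
    scores = V_t @ W_s                         # (n_t, 2): one score per polarity column j
    best_column = np.argmax(scores, axis=1)    # column with the highest score in each row i
    return np.where(best_column == 0, 1, -1)   # assumption: column 0 encodes "positive"
```

Because V_t is expressed over the same shared topics as V_s, a model fitted on source-domain polarity labels can be applied to target-domain rows without target-domain labels.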
In the method provided by this embodiment of the present invention, a shared topic matrix of the source domain and the target domain is obtained, and a domain-specific topic matrix is constructed for each of the two domains from its domain-specific topics. An overall objective function is then determined from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, the term matrix of the source domain, the domain-specific topic matrix of the target domain, and the term matrix of the target domain. A classification function is obtained from the converged values of the parameters in the overall objective function and used to classify the viewpoint data of the target domain. Because the shared topics act as a bridge that narrows the differences between domains, classifying cross-domain viewpoint data with this classification function improves classification accuracy.
For the case where the source domain and the target domain have a certain number of shared topics and each also has its own domain-specific topics, an embodiment of the present invention provides a method for classifying cross-domain viewpoint data. The method uses a new algorithm, TCT (Topical Correspondence Transfer). The TCT algorithm trains a classification function on the topics shared between domains and uses that classification function to classify cross-domain viewpoint data. Referring to Fig. 2, the method flow provided by this embodiment of the present invention includes:
201: Obtain a shared topic matrix from the topics shared by the source domain and the target domain, and obtain the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain from the domain-specific topics of the source domain and of the target domain.
The source domain may be the books domain, the electronics domain, the clothing domain, and so on; this embodiment does not specifically limit the source domain. Let the source domain be X_s, the number of documents in the source domain be n_s, and the number of terms in each document be m. The source domain can then be represented by a term matrix, which gives the term matrix of the source domain.
Because each document in the source domain contains m terms, the term matrix of the source domain can also be written as R^{m×n_s}, that is, X_s ∈ R^{m×n_s}.
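As an illustration of how such a term matrix might be built, the sketch below counts vocabulary terms per document and stores one column per document. The text does not say how entries are weighted, so raw counts, whitespace tokenization, the fixed `vocabulary` of m terms, and the function name are all assumptions.

```python
import numpy as np

def build_term_matrix(documents, vocabulary):
    """Return an m x n matrix whose entry (r, c) counts term r in document c."""
    index = {term: r for r, term in enumerate(vocabulary)}
    X = np.zeros((len(vocabulary), len(documents)))
    for c, doc in enumerate(documents):
        for token in doc.lower().split():   # naive whitespace tokenization (assumption)
            if token in index:              # terms outside the vocabulary are ignored
                X[index[token], c] += 1
    return X
```

Using the same vocabulary for both domains keeps the row dimension m identical, which the shared topic matrix requires.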
Because the source domain contains a certain number of polarity-labeled documents, the labeled documents in the source domain can be represented by a document polarity matrix Y_s. Y_s is an n_s × 2 matrix, where n_s is the number of documents in the source domain and 2 corresponds to the two polarity classes of a document: positive polarity, meaning that the document expresses a positive viewpoint, and negative polarity, meaning that the document expresses a negative viewpoint. The polarity of the i-th document in the source domain is determined as follows: if the first element of its row in the polarity matrix is y_i = 1, the polarity of the i-th document is positive, that is, the document expresses a positive viewpoint; if the first element is y_i = -1, the polarity of the i-th document is negative, that is, the document expresses a negative viewpoint. Other ways of determining polarity may also be used; this embodiment does not specifically limit this.
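A small sketch of one way to build such a polarity matrix follows. The text fixes only the first element of each row (+1 or -1), so filling the second column with the opposite sign and leaving unlabeled rows at zero are assumptions, as is the function name.

```python
import numpy as np

def build_polarity_matrix(labels):
    """labels[i] is +1, -1, or None (unlabeled). Returns an n_s x 2 matrix."""
    Y = np.zeros((len(labels), 2))
    for i, label in enumerate(labels):
        if label is not None:
            Y[i] = [label, -label]   # first element carries the sign described in the text
    return Y
```

For example, `build_polarity_matrix([1, -1, None])` yields one positive row, one negative row, and one all-zero row for the unlabeled document.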
The target domain may be any domain different from the source domain, such as the books domain, the electronics domain, or the kitchenware domain; this embodiment does not specifically limit the target domain. Let the target domain be X_t, the number of documents in the target domain be n_t, and the number of terms in each document be m. The target domain can then be represented by a term matrix, which gives the term matrix of the target domain.
Because each document in the target domain contains m terms, the term matrix of the target domain can also be written as R^{m×n_t}, that is, X_t ∈ R^{m×n_t}.
The cross-domain classification method provided by this embodiment is based mainly on the topics shared by the source domain and the target domain. Acting as a bridge between the two domains, the shared topics narrow the gap between them and make it possible to transfer knowledge across domains. To classify viewpoint data across domains, the method therefore needs to determine the number of shared topics. A shared topic is a topic that arises in both the source domain and the target domain. For example, if the source domain is books and the target domain is clothing, topics such as "expensive" and "cheap" arise in both domains and can therefore serve as shared topics.
To simplify the subsequent analysis, this embodiment sets the number of shared topics to k_0 and denotes the shared topic matrix by U_0, which is obtained from the shared topics.
Because each document in the source domain and the target domain contains m terms, the shared topic matrix can also be written as R^{m×k_0}, that is, U_0 ∈ R^{m×k_0}. Each column of the shared topic matrix represents one topic shared by the source domain and the target domain.
Further, the source domain and the target domain not only have shared topics but also each have domain-specific topics. Domain-specific topics characterize what is unique to each domain and are an important basis for classifying cross-domain viewpoint data. Before classifying viewpoint data across domains, the method provided by this embodiment therefore sets domain-specific topics for the source domain and for the target domain. The domain-specific topics of the source domain are topics exclusive to the source domain, and the domain-specific topics of the target domain are topics exclusive to the target domain. For example, if the source domain is electronics and the target domain is books, topics such as "power consumption" and "sensitive" are domain-specific topics of the source domain, and topics such as "exquisite" and "tedious" are domain-specific topics of the target domain. To simplify the subsequent analysis and narrow the gap between the source domain and the target domain as far as possible, the method may set the same number of domain-specific topics for both domains.
Let the number of domain-specific topics of the source domain be k and the domain-specific topic matrix of the source domain be U_s, which is obtained from the domain-specific topics of the source domain.
Because each document in the source domain contains m terms, the domain-specific topic matrix of the source domain can also be written as R^{m×k}, that is, U_s ∈ R^{m×k}. Each column of the domain-specific topic matrix of the source domain represents one domain-specific topic of the source domain.
Let the number of domain-specific topics of the target domain be k and the domain-specific topic matrix of the target domain be U_t. The domain-specific topic matrix of the target domain obtained from the domain-specific topics of the target domain is then:
U_t = [u_1^(t), ..., u_k^(t)].
Because each document in the target domain contains m terms, the domain-specific topic matrix of the target domain can also be written as R^{m×k}, that is, U_t ∈ R^{m×k}. Each column of the domain-specific topic matrix of the target domain represents one domain-specific topic of the target domain.
202: Determine the objective function of the source domain from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain.
The objective function of the source domain is an important basis for classifying the viewpoint data of the target domain in the subsequent steps, so it must be determined before that classification takes place. This embodiment does not specifically limit how the objective function of the source domain is determined; the ways include, but are not limited to, determining it from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain.
Specifically, the objective function ψ_s of the source domain is determined from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain, using the following quantities:
||·||_F^2 is the squared Frobenius norm; Tr[·] is the matrix trace; X_s is the term matrix of the source domain; U_0 is the shared topic matrix; U_s is the domain-specific topic matrix of the source domain; V_s is the document-topic matrix of the source domain; V_s^T is the transpose of the document-topic matrix of the source domain; α is an arbitrary parameter, which may be 1, 2, 3, and so on, and whose value this embodiment does not specifically limit; W_s is the linear model coefficient matrix used to predict the polarity of viewpoint data from V_s; Y_s is the polarity matrix of the source domain; C_s is a diagonal matrix in which every row and every column corresponds to one document in the source domain.
C_s is an n_s × n_s matrix, that is, C_s ∈ R^{n_s×n_s}. The ways of setting the elements of the diagonal matrix C_s include, but are not limited to, the following: if the document corresponding to an element on the diagonal of C_s is a polarity-labeled document, the value of that element is set to 1, that is, C_s(i, i) = 1 when the i-th document in the source domain is labeled; if the document corresponding to an element on the diagonal is not polarity-labeled, the value of that element is set to 0, that is, C_s(i, i) = 0 when the i-th document in the source domain is unlabeled.
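The quantities defined above can be combined in code as follows. The diagonal matrix follows the rule stated in the text (1 for labeled documents, 0 otherwise). The objective itself is evaluated under an assumed form, a reconstruction term plus α times a trace-weighted prediction error, because the patent's own expression is not reproduced here; both function names are hypothetical.

```python
import numpy as np

def labeled_mask_matrix(is_labeled):
    """C_s: diagonal matrix with C_s(i, i) = 1 for labeled documents, else 0."""
    return np.diag(np.asarray(is_labeled, dtype=float))

def source_objective(X_s, U_0, U_s, V_s, W_s, Y_s, C_s, alpha=1.0):
    """Assumed form: ||X_s - [U_0, U_s] V_s^T||_F^2 + alpha * Tr[R^T C_s R], R = V_s W_s - Y_s."""
    U = np.hstack([U_0, U_s])                                  # m x (k0 + k)
    reconstruction = np.linalg.norm(X_s - U @ V_s.T, "fro") ** 2
    residual = V_s @ W_s - Y_s                                 # n_s x 2 prediction error
    supervised = np.trace(residual.T @ C_s @ residual)         # only labeled rows contribute
    return reconstruction + alpha * supervised
```

Because C_s zeroes out unlabeled rows, unlabeled source documents still shape the topics through the reconstruction term but contribute nothing to the polarity error.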
Further, the expression of the objective function of the source domain shows that the shared topic matrix and the domain-specific topic matrix of the source domain are the key to determining that objective function. They must therefore be determined first. The ways of determining the shared topic matrix and the domain-specific topic matrix of the source domain include, but are not limited to, decomposing the term matrix of the source domain. Decomposing the term matrix X_s of the source domain yields two matrices: the document-topic matrix V_s of the source domain and the term-topic matrix U_s' of the source domain. The term-topic matrix U_s' of the source domain is an m × (k + k_0) matrix, that is, U_s' ∈ R^{m×(k+k_0)}. The matrices contained in U_s' include, but are not limited to, the shared topic matrix U_0 and the domain-specific topic matrix U_s of the source domain. The document-topic matrix V_s of the source domain is an n_s × (k + k_0) matrix, that is, V_s ∈ R^{n_s×(k+k_0)}, and each row of it represents one document in the source domain. The matrices contained in V_s include, but are not limited to, the matrices H_s and L_s, where H_s is an n_s × k_0 matrix that serves as the coefficient matrix of the shared topic matrix, and L_s is an n_s × k matrix that serves as the coefficient matrix of the domain-specific topic matrix of the source domain.
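The block structure described above can be made concrete with a short sketch. It assumes, purely for illustration, that the k_0 shared-topic columns come first in both factors; the function name is hypothetical.

```python
import numpy as np

def split_factors(U_prime, V, k0):
    """Split the term-topic and document-topic factors into shared and domain-specific blocks."""
    U_0, U_dom = U_prime[:, :k0], U_prime[:, k0:]   # m x k0 and m x k
    H, L = V[:, :k0], V[:, k0:]                     # n x k0 and n x k
    return U_0, U_dom, H, L
```

With this partition, the product of the term-topic matrix and the transposed document-topic matrix equals U_0 H^T + U_dom L^T, so each document is explained partly by shared topics (through H) and partly by domain-specific topics (through L).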
The ways of decomposing the term matrix of the source domain include, but are not limited to, non-negative matrix factorization. Non-negative matrix factorization decomposes a matrix under the constraint that all elements are non-negative: it finds a low-rank approximation that factors the matrix into several non-negative matrices. It is widely used in practice, for example on the pixels of digital images, word statistics in text analysis, and stock prices. Its basic idea can be stated briefly: for any given non-negative matrix A, a non-negative matrix U and a non-negative matrix V can be found such that A is decomposed into the product of U and V. For analyzing large-scale data such as text and images, non-negative matrix factorization is faster and more convenient than traditional processing algorithms.
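For reference, a minimal sketch of non-negative matrix factorization using the standard Lee and Seung multiplicative updates is given below. This is the generic textbook algorithm, not the patent's own update rules, and the rank, iteration count, random initialization, and small constant `eps` are illustrative assumptions.

```python
import numpy as np

def nmf(A, rank, n_iter=500, eps=1e-9, seed=0):
    """Factor non-negative A (m x n) as U @ V.T with non-negative U (m x rank), V (n x rank)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    U = rng.random((m, rank)) + eps                # strictly positive start
    V = rng.random((n, rank)) + eps
    for _ in range(n_iter):
        U *= (A @ V) / (U @ (V.T @ V) + eps)       # multiplicative update keeps U >= 0
        V *= (A.T @ U) / (V @ (U.T @ U) + eps)     # multiplicative update keeps V >= 0
    return U, V
```

Because every update multiplies by a non-negative ratio, the factors never leave the non-negative orthant, and the squared Frobenius reconstruction error does not increase from one iteration to the next.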
203: Determine the objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain.
Because the objective function of the target domain is an important basis for classifying the cross-domain viewpoint data of the target domain in subsequent steps, it must be determined before that classification is performed. Methods for determining the objective function of the target domain include, but are not limited to, determining it from the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain.
Specifically, the objective function ψ_t of the target domain, determined from the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain, is:
$$\psi_t = \left\| X_t - [U_0, U_t]\, V_t^{T} \right\|_F^2$$
where ‖·‖_F^2 denotes the squared Frobenius norm; X_t is the term matrix of the target domain; U_0 is the shared topic matrix; U_t is the domain-specific topic matrix of the target domain; V_t is the document-topic matrix of the target domain; and V_t^T is the transpose of the document-topic matrix of the target domain.
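To make the role of each symbol concrete, the following Python sketch evaluates a reconstruction error of the form ‖X_t − [U_0, U_t] V_t^T‖_F^2. That form is an assumption inferred from the symbols defined above (Frobenius norm, X_t, U_0, U_t, V_t and its transpose); the function name `target_objective` and the toy matrices are hypothetical.

```python
def target_objective(x_t, u_0, u_t, v_t):
    """Squared Frobenius norm of X_t - [U_0, U_t] V_t^T (assumed form).

    x_t: m x n_t term matrix; u_0: m x k0; u_t: m x k;
    v_t: n_t x (k0 + k), one row per document.
    """
    # Concatenate the shared and domain-specific topic columns row by row.
    u = [row_0 + row_t for row_0, row_t in zip(u_0, u_t)]
    total = 0.0
    for i, x_row in enumerate(x_t):
        for j, x_ij in enumerate(x_row):
            # Entry (i, j) of [U_0, U_t] V_t^T is the dot product of
            # row i of [U_0, U_t] with row j of V_t.
            approx = sum(u_ic * v_jc for u_ic, v_jc in zip(u[i], v_t[j]))
            total += (x_ij - approx) ** 2
    return total


x_t = [[1.0, 0.0], [0.0, 1.0]]
u_0 = [[1.0], [0.0]]
u_t = [[0.0], [1.0]]
exact = target_objective(x_t, u_0, u_t, [[1.0, 0.0], [0.0, 1.0]])
empty = target_objective(x_t, u_0, u_t, [[0.0, 0.0], [0.0, 0.0]])
```

In this toy case the first call reconstructs X_t exactly, so the error is zero, while an all-zero V_t leaves the whole of X_t unexplained.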
Further, the expression of the objective function of the target domain shows that the shared topic matrix and the domain-specific topic matrix of the target domain are the key to determining that objective function. They must therefore be determined before the objective function of the target domain is determined. Methods for determining the shared topic matrix and the domain-specific topic matrix of the target domain include, but are not limited to, decomposing the term matrix of the target domain. Decomposing the term matrix X_t of the target domain yields two matrices: the document-topic matrix V_t of the target domain and the term-topic matrix U_t' of the target domain.
The term-topic matrix U_t' of the target domain is an m × (k + k_0) matrix, i.e., U_t' = [U_0, U_t]; the matrices it contains include, but are not limited to, the shared topic matrix U_0 and the domain-specific topic matrix U_t of the target domain. The document-topic matrix V_t of the target domain is an n_t × (k + k_0) matrix, i.e., V_t = [H_t, L_t], and each of its rows represents one document in the target domain. The matrices contained in V_t include, but are not limited to, H_t and L_t, where H_t is an n_t × k_0 matrix holding the coefficients of the shared topic matrix, and L_t is an n_t × k matrix holding the coefficients of the domain-specific topic matrix of the target domain.
Methods for decomposing the term matrix of the target domain include, but are not limited to, non-negative matrix factorization.
It should be noted that this embodiment does not restrict the order in which the objective function of the source domain and the objective function of the target domain are determined. In a specific implementation, either one may be determined first.
204: Determine the overall objective function according to the objective function of the source domain and the objective function of the target domain.
The objective function of the source domain obtained in step 202 and the objective function of the target domain obtained in step 203 are complementary, and optimizing an overall objective function built from the two can improve both the accuracy and the speed of classifying the viewpoint data of the target domain. Therefore, in order to classify the viewpoint data of the target domain quickly and accurately, the method provided in this embodiment first determines an overall objective function from the objective function of the source domain and the objective function of the target domain.
This embodiment does not specifically limit how the overall objective function is determined from the two objective functions. Methods include, but are not limited to, adding the objective function of the source domain to the objective function of the target domain. The overall objective function ψ so determined is:
$$\psi = \psi_s + \psi_t$$
where the parameters of the overall objective function include, but are not limited to, U_0, U_s, V_s, W_s, U_t, and V_t.
Further, after the overall objective function is obtained, the method provided in this embodiment optimizes it. Methods for optimizing the overall objective function include, but are not limited to, using the following formula:
$$\min_{U_0,\, U_d,\, V_d,\, W_s} \psi$$
where d ∈ {s, t}: when d is s, U_d and V_d are U_s and V_s, and when d is t, U_d and V_d are U_t and V_t. Optimizing the overall objective function with this formula yields the convergence values of the parameters U_0, U_s, V_s, W_s, U_t, and V_t, and these convergence values are the key to obtaining the classification function in subsequent steps.
It should be noted that optimizing the objective function requires decomposing U_0, U_d, and V_d. This decomposition may produce matrices with negative entries, whereas classification with the TCT algorithm provided in this embodiment requires every matrix in the computation to be non-negative. Therefore, to prevent negative matrices from arising when U_0, U_d, and V_d are decomposed, decomposition constraints may be set for U_0, U_d, and V_d before the overall objective function is optimized. The decomposition constraints that are set include, but are not limited to:
Here, U_0^T is the transpose of the shared topic matrix; U_d^T is the transpose of the domain-specific topic matrix of the source domain or of the target domain (of the source domain when d is s, and of the target domain when d is t); and I is the identity matrix.
205: Determine the convergence value of each parameter in the overall objective function.
From the formula used above to optimize the overall objective function, it follows that for the overall objective function to attain its optimal solution, each of its parameters must take a certain minimizing value; that value is the convergence value of the parameter.
The parameters of the overall objective function are U_0, U_s, V_s, W_s, U_t, and V_t. The process of determining the convergence value of each parameter is described one by one below:
(1) Determine the convergence value of parameter U_0:
First, a Lagrangian is introduced for parameter U_0:
Here, the Lagrange multiplier in this expression is used to enforce the constraint.
Second, the derivative of the above expression is taken and set to zero, which gives:
Third, the KKT (Karush-Kuhn-Tucker) conditions are applied to the above formula, yielding the convergence formula of parameter U_0:
Here, the operator symbol in the formula denotes the inner product operation, t denotes the current iteration, t−1 denotes the previous iteration, H_s is the coefficient matrix of the shared topic matrix for the source domain, and H_t is the coefficient matrix of the shared topic matrix for the target domain.
Further, to ensure that the convergence value of each parameter can be obtained from its convergence formula, the method provided in this embodiment, after determining the convergence formula of parameter U_0 as described above, also verifies that the formula converges. Before the convergence verification, a definition, a lemma, and a theorem are introduced.
The definition introduced is: F(X, X') is an auxiliary function of L(X) if L(X) ≤ F(X, X') and F(X, X) = L(X), i.e., equality holds when X = X'.
The lemma introduced is: if F is an auxiliary function of L, then L is non-increasing under the update sequence
$$X^{(t+1)} = \arg\min_{X} F\left(X, X^{(t)}\right)$$
The lemma is proved as follows:
Since F is an auxiliary function of L, L(X^{(t+1)}) ≤ F(X^{(t+1)}, X^{(t)}); and since X^{(t+1)} minimizes F(·, X^{(t)}), F(X^{(t+1)}, X^{(t)}) ≤ F(X^{(t)}, X^{(t)}) = L(X^{(t)}). Therefore L(X^{(t+1)}) ≤ L(X^{(t)}), i.e., L is non-increasing under the update sequence.
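The lemma can be checked numerically on a small scalar example. The Python sketch below is not the auxiliary function of the embodiment; it uses a hypothetical loss L(x) = log(1 + e^x) − 0.3x, whose second derivative is at most 1/4, together with the standard quadratic upper bound F(x, x') = L(x') + L'(x')(x − x') + (c/2)(x − x')^2 with c = 1/4. All names in the sketch are illustrative.

```python
import math


def loss(x):
    # L(x) = log(1 + e^x) - 0.3 x; convex, with second derivative <= 1/4.
    return math.log1p(math.exp(x)) - 0.3 * x


def grad(x):
    # L'(x) = sigmoid(x) - 0.3
    return 1.0 / (1.0 + math.exp(-x)) - 0.3


CURVATURE = 0.25  # upper bound on the second derivative of loss


def auxiliary(x, x_prev):
    """Quadratic upper bound F(x, x_prev) >= L(x), with F(x, x) = L(x)."""
    d = x - x_prev
    return loss(x_prev) + grad(x_prev) * d + 0.5 * CURVATURE * d * d


def update(x_prev):
    # Exact minimizer of F(., x_prev): set the derivative in x to zero.
    return x_prev - grad(x_prev) / CURVATURE


x = 3.0
values = [loss(x)]
for _ in range(50):
    x = update(x)
    values.append(loss(x))
```

Each update minimizes the auxiliary function built at the previous point, so the recorded values of L never increase, and x approaches the minimizer log(0.3 / 0.7) of this toy loss.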
The theorem introduced is: if the function
is an auxiliary function of L(U_0), then the convex function converges at U_0.
With the definition, lemma, and theorem introduced, the following can be obtained from them:
Similarly, the convergence proofs for the convergence formulas of the other parameters U_s, V_s, W_s, U_t, and V_t determined below are the same as that for parameter U_0 and are not repeated here.
Based on the above, the current iteration value of parameter U_0 is computed iteratively according to the convergence formula of U_0 until the current iteration value converges, and the converged iteration value is taken as the convergence value of parameter U_0.
Specifically, the convergence value of parameter U_0 is determined as follows:
First, an initial value is chosen at random for each of the parameters U_0, U_s, V_s, W_s, U_t, and V_t according to its dimensions. For example, if U_0 is a 2 × 2 matrix, a 2 × 2 initial value is chosen at random for U_0.
Second, the chosen initial values of the parameters are substituted into the convergence formula of U_0, and the iteration value of the first iteration is obtained by calculation.
Third, the iteration value obtained is substituted back into the convergence formula of U_0, and the iteration value of the second iteration is obtained by calculation. The iteration continues from each newly obtained iteration value until the current iteration value converges, at which point the converged iteration value is taken as the convergence value of parameter U_0.
(2) Determine the convergence value of parameter U_s:
Following the same principle used to determine parameter U_0, the convergence formula of parameter U_s is determined as:
Here, L_s is the coefficient matrix of the domain-specific topic matrix of the source domain.
The current iteration value of parameter U_s is computed iteratively according to this convergence formula until the current iteration value converges, and the converged iteration value is taken as the convergence value of parameter U_s.
Specifically, the convergence value of parameter U_s is determined as follows:
First, an initial value is chosen at random for each of the parameters U_0, U_s, V_s, W_s, U_t, and V_t according to its dimensions. For example, if U_s is a 2 × 2 matrix, a 2 × 2 initial value is chosen at random for U_s.
Second, the chosen initial values of the parameters are substituted into the convergence formula of U_s, and the iteration value of the first iteration is obtained by calculation.
Third, the iteration value obtained is substituted back into the convergence formula of U_s, and the iteration value of the second iteration is obtained by calculation. The iteration continues from each newly obtained iteration value until the current iteration value converges, at which point the converged iteration value is taken as the convergence value of parameter U_s.
(3) Determine the convergence value of parameter V_s:
Following the same principle used to determine parameter U_0, the convergence formula of parameter V_s is determined as:
The current iteration value of parameter V_s is computed iteratively according to this convergence formula until the current iteration value converges, and the converged iteration value is taken as the convergence value of parameter V_s.
Specifically, the convergence value of parameter V_s is determined as follows:
First, an initial value is chosen at random for each of the parameters U_0, U_s, V_s, W_s, U_t, and V_t according to its dimensions. For example, if V_s is a 2 × 2 matrix, a 2 × 2 initial value is chosen at random for V_s.
Second, the chosen initial values of the parameters are substituted into the convergence formula of V_s, and the iteration value of the first iteration is obtained by calculation.
Third, the iteration value obtained is substituted back into the convergence formula of V_s, and the iteration value of the second iteration is obtained by calculation. The iteration continues from each newly obtained iteration value until the current iteration value converges, at which point the converged iteration value is taken as the convergence value of parameter V_s.
(4) Determine the convergence value of parameter W_s:
Following the same principle used to determine parameter U_0, the convergence formula of parameter W_s is determined as:
The current iteration value of parameter W_s is computed iteratively according to this convergence formula until the current iteration value converges, and the converged iteration value is taken as the convergence value of parameter W_s.
Specifically, the convergence value of parameter W_s is determined as follows:
First, an initial value is chosen at random for each of the parameters U_0, U_s, V_s, W_s, U_t, and V_t according to its dimensions. For example, if W_s is a 2 × 2 matrix, a 2 × 2 initial value is chosen at random for W_s.
Second, the chosen initial values of the parameters are substituted into the convergence formula of W_s, and the iteration value of the first iteration is obtained by calculation.
Third, the iteration value obtained is substituted back into the convergence formula of W_s, and the iteration value of the second iteration is obtained by calculation. The iteration continues from each newly obtained iteration value until the current iteration value converges, at which point the converged iteration value is taken as the convergence value of parameter W_s.
(5) Determine the convergence value of parameter U_t:
Following the same principle used to determine parameter U_0, the convergence formula of parameter U_t is determined as:
Here, L_t is the coefficient matrix of the domain-specific topic matrix of the target domain.
The current iteration value of parameter U_t is computed iteratively according to this convergence formula until the current iteration value converges, and the converged iteration value is taken as the convergence value of parameter U_t.
Specifically, the convergence value of parameter U_t is determined as follows:
First, an initial value is chosen at random for each of the parameters U_0, U_s, V_s, W_s, U_t, and V_t according to its dimensions. For example, if U_t is a 2 × 2 matrix, a 2 × 2 initial value is chosen at random for U_t.
Second, the chosen initial values of the parameters are substituted into the convergence formula of U_t, and the iteration value of the first iteration is obtained by calculation.
Third, the iteration value obtained is substituted back into the convergence formula of U_t, and the iteration value of the second iteration is obtained by calculation. The iteration continues from each newly obtained iteration value until the current iteration value converges, at which point the converged iteration value is taken as the convergence value of parameter U_t.
(6) Determine the convergence value of parameter V_t:
Following the same principle used to determine parameter U_0, the convergence formula of parameter V_t is determined as:
The current iteration value of parameter V_t is computed iteratively according to this convergence formula until the current iteration value converges, and the converged iteration value is taken as the convergence value of parameter V_t.
First, an initial value is chosen at random for each of the parameters U_0, U_s, V_s, W_s, U_t, and V_t according to its dimensions. For example, if V_t is a 2 × 2 matrix, a 2 × 2 initial value is chosen at random for V_t.
Second, the chosen initial values of the parameters are substituted into the convergence formula of V_t, and the iteration value of the first iteration is obtained by calculation.
Third, the iteration value obtained is substituted back into the convergence formula of V_t, and the iteration value of the second iteration is obtained by calculation. The iteration continues from each newly obtained iteration value until the current iteration value converges, at which point the converged iteration value is taken as the convergence value of parameter V_t.
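The six procedures above share one pattern: choose random initial values by dimension, apply each parameter's update formula in turn, and stop once the iteration values no longer change. The Python sketch below shows only that control flow. The names `run_updates` and `update_rules`, the tolerance-based stopping test, and the two toy update rules are assumptions of this sketch; the toy rules are placeholders and are not the convergence formulas of the embodiment.

```python
import random


def random_nonnegative(rows, cols, rng):
    """Random non-negative initial value with the given dimensions."""
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


def max_abs_change(old, new):
    """Largest absolute entry-wise change across all parameters."""
    return max(abs(a - b)
               for name in old
               for row_old, row_new in zip(old[name], new[name])
               for a, b in zip(row_old, row_new))


def run_updates(shapes, update_rules, tol=1e-8, max_iter=1000, seed=0):
    """Iterate the update rules until every parameter stops changing.

    shapes: parameter name -> (rows, cols)
    update_rules: parameter name -> function(params) returning a NEW matrix
    """
    rng = random.Random(seed)
    params = {name: random_nonnegative(r, c, rng) for name, (r, c) in shapes.items()}
    for iteration in range(1, max_iter + 1):
        previous = params
        params = dict(params)  # rules must not modify matrices in place
        for name, rule in update_rules.items():
            params[name] = rule(params)  # sees the latest values of the others
        if max_abs_change(previous, params) < tol:
            return params, iteration
    return params, max_iter


# Toy placeholder rules: U_0 moves halfway toward all-ones, U_s halfway toward U_0.
toy_rules = {
    "U_0": lambda p: [[(x + 1.0) / 2 for x in row] for row in p["U_0"]],
    "U_s": lambda p: [[(a + b) / 2 for a, b in zip(ra, rb)]
                      for ra, rb in zip(p["U_s"], p["U_0"])],
}
converged, n_iter = run_updates({"U_0": (2, 2), "U_s": (2, 2)}, toy_rules)
```

In a real implementation each entry of `update_rules` would hold the convergence formula of the corresponding parameter, and the seed would control the random initial values.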
It should be noted that although the initial values of the parameters in the overall objective function may be chosen at random, the chosen initial values determine how quickly the convergence formulas of the parameters converge. The initial value of each parameter may therefore be determined according to the data in the source domain and the target domain. Choosing suitable initial values speeds up the convergence of the convergence formulas and reduces the number of iterations.
Further, to know the resource consumption of classifying cross-domain viewpoint data with the TCT method provided in this embodiment, the method also calculates the computational complexity of each parameter when the TCT algorithm is used for classification. The calculated complexities are listed in Table 1:
Table 1
Here, k' = k + k_0, n = max{n_s, n_t}, m ≫ k', and n ≫ k'.
206: Obtain the classification function according to the convergence values of the parameters in the overall objective function.
Since the convergence values of the parameters in the overall objective function were determined in step 205, this step obtains the classification function from those convergence values, so that the viewpoint data of the target domain can be classified according to it in subsequent steps.
Specifically, obtaining the classification function according to the convergence values of the parameters in the overall objective function includes, but is not limited to, the following steps:
First, obtain the convergence values of the parameters U_0, U_s, and U_t reached when the objective function is optimized.
Second, substitute the obtained convergence values of the parameters U_0 and U_t into the following formula:
Here, x_i is any document in the target domain, v_i is the document-topic representation of that document, i is the row of v_i in the document-topic matrix of the target domain, and j is the column corresponding to that row.
Third, obtain the classification function from v_i and W_s.
Specifically, the classification function obtained from v_i and W_s is as follows:
Here, i is the row in which v_i lies and j is the column corresponding to that row.
207: Classify the viewpoint data of the target domain according to the classification function.
Since the classification function was determined in step 206, this step classifies the viewpoint data of the target domain according to it. Specifically, let y_i = 1 represent a positive viewpoint and y_i = −1 represent a negative viewpoint. When any document of the target domain is classified with the classification function, if the computed value of y_i is 1, the viewpoint expressed by the document is positive, so the document is classified as a positive document; if the computed value of y_i is −1, the viewpoint expressed by the document is negative, so the document is classified as a negative document.
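The decision rule above can be sketched in Python as follows. The exact classification formula is not reproduced in this text, so the sketch makes its own assumptions and labels them as such: it treats W_s as a matrix with one column per polarity (positive first, negative second), scores a document by multiplying its document-topic row v_i with W_s, and returns the label of the larger score. The function name `classify` and the toy numbers are hypothetical.

```python
def classify(v_i, w_s, labels=(1, -1)):
    """Return +1 (positive) or -1 (negative) for one target-domain document.

    Assumed form, for illustration only: scores = v_i W_s with one column of
    W_s per label, and the label of the highest-scoring column is returned.
    """
    scores = [sum(v * w_row[j] for v, w_row in zip(v_i, w_s))
              for j in range(len(labels))]
    best = max(range(len(labels)), key=lambda j: scores[j])
    return labels[best]


# Toy W_s with two topics (rows) and two polarity columns (positive, negative).
w_s = [[2.0, 1.0],
       [0.0, 3.0]]
label_a = classify([1.0, 0.0], w_s)  # scores (2, 1): positive column wins
label_b = classify([0.0, 1.0], w_s)  # scores (0, 3): negative column wins
```

A document whose result is 1 would be placed with the positive documents, and one whose result is −1 with the negative documents.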
Preferably, to test the accuracy with which the TCT algorithm provided in this embodiment classifies cross-domain viewpoint data, the method provided in this embodiment is also verified experimentally on four selected domains: books (B), DVD (Digital Versatile Disc) (D), electronics (E), and kitchen articles (K). During the experiment, each viewpoint in these four domains is assigned a viewpoint label of +1 or −1. A label of +1 indicates that the viewpoint is positive, and a label of −1 indicates that it is negative. Each domain also has 1000 positive viewpoint data items and 1000 negative viewpoint data items, together with some data whose viewpoints are not labeled. In the cross-domain viewpoint data classification task, 12 classification tasks can be constructed: D → B, E → B, K → B, K → E, D → E, B → E, B → D, K → D, E → D, B → K, D → K, E → K, where the domain before the arrow is the source domain and the domain after the arrow is the target domain. In view of computing capacity, about 5000 data items are selected for each domain in this embodiment, as shown in Table 2:
Table 2
Domain | Training data | Test data | Data without viewpoint labels | Proportion of negative data |
Books | 1600 | 400 | 4465 | 50% |
DVD | 1600 | 400 | 5945 | 50% |
Electronics | 1600 | 400 | 5681 | 50% |
Kitchen articles | 1600 | 400 | 3586 | 50% |
Table 2 lists the data of the four selected domains. Each domain contains training data, test data, and data without viewpoint labels, and negative data account for 50% of the data in each domain. In the 12 cross-domain classification tasks constructed, every domain serves both as a source domain and as a target domain: when a domain is selected as the source domain, its training data are used to construct the classification function; when it is selected as the target domain, its test data are used to test the constructed classification function. Therefore, to ensure the accuracy of cross-domain viewpoint data classification, this embodiment sets the same amounts of training data and test data for every domain; as shown in Table 2, each domain has 1600 training data items and 400 test data items.
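The 12 classification tasks described earlier are exactly the ordered (source, target) pairs of distinct domains among the four. A short Python sketch enumerates them; the dictionary `DOMAINS` and the variable names are illustrative.

```python
from itertools import permutations

# Domain abbreviations as used in the experiment description.
DOMAINS = {"B": "books", "D": "DVD", "E": "electronics", "K": "kitchen articles"}

# Every ordered pair of two different domains: (source, target).
tasks = list(permutations(DOMAINS, 2))
task_names = [f"{source} → {target}" for source, target in tasks]
```

Four domains give 4 × 3 = 12 ordered pairs, which matches the number of tasks stated in the text.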
To show intuitively the advantage of the method provided in this embodiment for classifying cross-domain viewpoint data, different algorithms are used for classification when the data of the four selected domains are tested. Specifically, in addition to the TCT algorithm used in this embodiment, the following algorithms are also selected for the experiment: No Transf, SCL (Structural Correspondence Learning), SFA (Spectral Feature Alignment), SDA (Stacked Denoising Auto-encoders), and NMTF (Non-negative Matrix Tri-Factorization), among others.
After the algorithms are selected, the method provided in this embodiment also determines the parameters of each algorithm so that the 12 cross-domain classification tasks set above can be executed with them. Because the algorithms No Transf, SCL, and SFA use logistic regression as the base classifier, their parameters are chosen with the data of the four given domains in mind. For the algorithms SDA and NMTF, the parameter settings from the published papers are used. For the TCT algorithm used in the embodiment of the present invention, the value of parameter α is set to 1, and the values of parameters k and k_0 are determined from the constructed classification task E → B.
Further, classifying the 12 constructed classification tasks with the different algorithms gives the classification results shown in Fig. 3 and Fig. 4, in which the horizontal axis indicates the constructed classification task, the vertical axis indicates the classification accuracy, and NF stands for the No Transf algorithm. Fig. 3 and Fig. 4 show that, among the 12 classification tasks, the classification accuracy for D → B, B → D, K → E, and E → K is higher, which indicates that domains B and D are more similar to each other, as are domains E and K. Comparing the accuracies of the different algorithms on the 12 classification tasks also shows that the accuracy of the TCT algorithm provided in this embodiment is clearly higher than that of the other algorithms, such as SCL and SFA.
Further, to show how the data converge when the TCT algorithm is used to classify cross-domain viewpoint data, the experimental results of the method provided in this embodiment present the convergence behavior for B → D and E → K; see Fig. 5 and Fig. 6. Fig. 5 is the convergence curve for B → D: once the number of iterations reaches 300, the value of the objective function no longer changes. Fig. 6 is the convergence curve for E → K: once the number of iterations reaches 300, the value of the objective function likewise no longer changes.
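The point at which an objective value "no longer changes" can be located programmatically from a recorded trace of objective values. The Python sketch below is one simple way to do so; the function name `first_stable_iteration`, the tolerance, the window length, and the synthetic trace are all assumptions of this sketch and do not reproduce the data behind Fig. 5 or Fig. 6.

```python
def first_stable_iteration(values, tol=1e-6, window=10):
    """Index of the first iteration after which the objective changes by at
    most tol for `window` consecutive steps, or None if there is none."""
    for i in range(len(values) - window):
        if all(abs(values[j + 1] - values[j]) <= tol for j in range(i, i + window)):
            return i
    return None


# Synthetic trace: decreasing for 50 iterations, then flat.
trace = [1.0 / (k + 1) for k in range(50)] + [0.0] * 20
stable_from = first_stable_iteration(trace)
```

Applied to the objective values logged during training, the returned index gives the number of iterations after which further updates no longer change the objective within the chosen tolerance.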
The method provided in this embodiment obtains the shared topic matrix of the source domain and the target domain; constructs the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain from the domain-specific topics of the source domain and of the target domain, respectively; determines the overall objective function from the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, the term matrix of the source domain, the domain-specific topic matrix of the target domain, and the term matrix of the target domain; obtains the classification function from the convergence values of the parameters in the overall objective function; and classifies the viewpoint data of the target domain according to the classification function. Because the shared topics can serve as a bridge that reduces the difference between domains, classifying cross-domain viewpoint data with this classification function can improve classification accuracy.
Referring to Fig. 7, an embodiment of the present invention provides a device for classifying cross-domain viewpoint data. The device includes:
a first obtaining module 701, configured to obtain the shared topic matrix according to the shared topics of the source domain and the target domain;
a second obtaining module 702, configured to obtain the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain according to the domain-specific topics of the source domain and the domain-specific topics of the target domain, respectively;
a first determining module 703, configured to determine the objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain, and the term matrix of the source domain;
a second determining module 704, configured to determine the objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain, and the term matrix of the target domain;
a third determining module 705, configured to determine the overall objective function according to the objective function of the source domain and the objective function of the target domain;
a fourth determining module 706, configured to determine the convergence value of each parameter in the overall objective function;
a third obtaining module 707, configured to obtain the classification function according to the convergence values of the parameters in the overall objective function;
a classification module 708, configured to classify the viewpoint data of the target domain according to the classification function.
In an optional embodiment, the objective function ψ_s of the source domain determined by the first determining module 703 is:
Here, ‖·‖_F^2 denotes the squared Frobenius norm, Tr[·] is the trace of a matrix, X_s is the term matrix of the source domain, U_0 is the shared topic matrix, U_s is the domain-specific topic matrix of the source domain, V_s is the document-topic matrix of the source domain, V_s^T is the transpose of the document-topic matrix of the source domain, α is an arbitrary parameter, W_s is the coefficient matrix of a linear model used to predict the viewpoint data from V_s, Y_s is the polarity matrix of the source domain, and C_s is a diagonal matrix.
In an optional embodiment, the objective function ψ_t of the target domain determined by the second determining module 704 is:
$$\psi_t = \left\| X_t - [U_0, U_t]\, V_t^{T} \right\|_F^2$$
where X_t is the term matrix of the target domain, U_0 is the shared topic matrix, U_t is the domain-specific topic matrix of the target domain, V_t is the document-topic matrix of the target domain, and V_t^T is the transpose of the document-topic matrix of the target domain.
In an optional embodiment, the overall objective function ψ determined by the third determining module 705 is:
$$\psi = \psi_s + \psi_t$$
Referring to Fig. 8, the fourth determining module 706 includes:
a first determining unit 7061, configured to compute the current iteration value of parameter U_0 iteratively according to the convergence formula of U_0 until the current iteration value converges, and to take the converged iteration value as the convergence value of parameter U_0, where H_s is the coefficient matrix of the shared topic matrix for the source domain and H_t is the coefficient matrix of the shared topic matrix for the target domain;
a second determining unit 7062, configured to compute the current iteration value of parameter U_s iteratively according to the convergence formula of U_s until the current iteration value converges, and to take the converged iteration value as the convergence value of parameter U_s, where L_s is the coefficient matrix of the domain-specific topic matrix of the source domain;
a third determining unit 7063, configured to compute the current iteration value of parameter V_s iteratively according to the convergence formula of V_s until the current iteration value converges, and to take the converged iteration value as the convergence value of parameter V_s;
a fourth determining unit 7064, configured to compute the current iteration value of parameter W_s iteratively according to the convergence formula of W_s until the current iteration value converges, and to take the converged iteration value as the convergence value of parameter W_s;
a fifth determining unit 7065, configured to compute the current iteration value of parameter U_t iteratively according to the convergence formula of U_t until the current iteration value converges, and to take the converged iteration value as the convergence value of parameter U_t, where L_t is the coefficient matrix of the domain-specific topic matrix of the target domain;
a sixth determining unit 7066, configured to compute the current iteration value of parameter V_t iteratively according to the convergence formula of V_t until the current iteration value converges, and to take the converged iteration value as the convergence value of parameter V_t.
In these formulas, the operator symbol denotes the inner product operation, t denotes the current iteration, and t−1 denotes the previous iteration.
In an optional embodiment, the classification function y_i obtained by the third obtaining module 707 according to the convergence values of the parameters in the overall objective function is:
Here, v_i is the document-topic representation of any document of the target domain, i is the row of v_i in the document-topic matrix of the target domain, and j is the column corresponding to that row.
In summary, the device provided by the embodiment of the present invention obtains a shared topic matrix from the shared topics of the source domain and the target domain, and constructs the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain from the domain-specific topics of the source domain and of the target domain, respectively. It then determines the objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain and the term matrix of the source domain, and determines the objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain and the term matrix of the target domain. After obtaining the overall objective function from the objective function of the source domain and the objective function of the target domain, it obtains a classification function according to the convergence values of the parameters in the overall objective function, and classifies the viewpoint data of the target domain according to the classification function. Because the shared topics can serve as a bridge that reduces the difference between domains, the accuracy of classification can be improved when viewpoint data are classified across domains.
Fig. 9 is a block diagram of a device 900 for the classification method of cross-domain viewpoint data according to an exemplary embodiment. For example, the device 900 may be provided as a server. Referring to Fig. 9, the device 900 includes a processing component 922, which further includes one or more processors, and memory resources represented by a memory 932 for storing instructions executable by the processing component 922, such as application programs. The application programs stored in the memory 932 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 922 is configured to execute the instructions so as to perform the above classification method of cross-domain viewpoint data, the method including:
101: Obtaining a shared topic matrix according to the shared topics of the source domain and the target domain, and obtaining the domain-specific topic matrix of the source domain and the domain-specific topic matrix of the target domain according to the domain-specific topics of the source domain and the domain-specific topics of the target domain, respectively;
Determining the objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain and the term matrix of the source domain, and determining the objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain and the term matrix of the target domain;
Determining the overall objective function according to the objective function of the source domain and the objective function of the target domain, and determining the convergence value of each parameter in the overall objective function;
Obtaining a classification function according to the convergence values of the parameters in the overall objective function, and classifying the viewpoint data of the target domain according to the classification function.
As an optional embodiment, the objective function ψs of the source domain determined according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain and the term matrix of the source domain is:
Where ‖·‖_F denotes the Frobenius norm, Tr[·] denotes the trace of a matrix, Xs is the term matrix of the source domain, U0 is the shared topic matrix, Us is the domain-specific topic matrix of the source domain, Vs is the document topic matrix of the source domain, Vs^T is the transpose of the document topic matrix of the source domain, α is an arbitrary parameter, Ws is the linear model coefficient used to predict the viewpoint data of Vs, Ys is the polarity matrix of the source domain, and Cs is a diagonal matrix.
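The formula itself is not reproduced in this text, so the following is only a minimal sketch of how an objective of this kind could be evaluated. It assumes (this is an assumption, not the patented formula) that ψs is the sum of a squared Frobenius-norm reconstruction term for Xs and an α-weighted trace term that compares the linear prediction Vs·Ws with the polarity matrix Ys under the diagonal weights Cs. The function name, the matrix shapes and the exact form of both terms are hypothetical.

```python
import numpy as np


def source_objective(Xs, U0, Us, Vs, Ws, Ys, Cs, alpha):
    """Evaluate an ASSUMED source-domain objective (illustrative only).

    Assumed shapes:
      Xs: (n_terms, n_docs)      term matrix of the source domain
      U0: (n_terms, k0)          shared topic matrix
      Us: (n_terms, ks)          domain-specific topic matrix
      Vs: (n_docs, k0 + ks)      document topic matrix
      Ws: (k0 + ks, n_classes)   linear model coefficients
      Ys: (n_docs, n_classes)    polarity matrix
      Cs: (n_docs, n_docs)       diagonal weight matrix
    """
    # Shared and domain-specific topics placed side by side.
    topics = np.hstack([U0, Us])
    # Reconstruction term: squared Frobenius norm of Xs - [U0, Us] Vs^T.
    reconstruction = np.linalg.norm(Xs - topics @ Vs.T, ord="fro") ** 2
    # Supervised term: Tr[(Vs Ws - Ys)^T Cs (Vs Ws - Ys)].
    residual = Vs @ Ws - Ys
    supervised = np.trace(residual.T @ Cs @ residual)
    return float(reconstruction + alpha * supervised)
```

Under these assumptions the value is zero exactly when Xs is perfectly reconstructed and the linear model reproduces every polarity label, which makes the function convenient for checking a candidate set of parameters.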
As an optional embodiment, the objective function ψt of the target domain determined according to the shared topic matrix, the domain-specific topic matrix of the target domain and the term matrix of the target domain is:
Where Xt is the term matrix of the target domain, U0 is the shared topic matrix, Ut is the domain-specific topic matrix of the target domain, Vt is the document topic matrix of the target domain, and Vt^T is the transpose of the document topic matrix of the target domain.
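Since the target domain has no polarity matrix among its inputs, a plausible reading is that ψt contains only a reconstruction term. The sketch below makes that assumption explicit: it evaluates the squared Frobenius norm of Xt minus [U0, Ut]·Vt^T. The function name and shapes are hypothetical, and the actual formula may contain further terms that are not recoverable from this text.

```python
import numpy as np


def target_objective(Xt, U0, Ut, Vt):
    """Evaluate an ASSUMED target-domain objective (illustrative only).

    Assumed shapes:
      Xt: (n_terms, n_docs)   term matrix of the target domain
      U0: (n_terms, k0)       shared topic matrix (same U0 as the source side)
      Ut: (n_terms, kt)       domain-specific topic matrix of the target domain
      Vt: (n_docs, k0 + kt)   document topic matrix of the target domain
    """
    # The shared topics U0 appear in both domains; only Ut is target-specific.
    topics = np.hstack([U0, Ut])
    # Unsupervised reconstruction error; no label term is assumed here.
    return float(np.linalg.norm(Xt - topics @ Vt.T, ord="fro") ** 2)
```

Because the same U0 would be passed to both the source-side and the target-side evaluation, any update of U0 is constrained by the documents of both domains, which is the bridging role the description attributes to the shared topics.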
As an optional embodiment, the overall objective function ψ determined according to the objective function of the source domain and the objective function of the target domain is:
As an optional embodiment, determining the convergence value of each parameter in the overall objective function includes:
Iteratively calculating the current iteration value of parameter U0 according to the update formula for U0 until the current iteration value converges, and taking the converged current iteration value as the convergence value of parameter U0, where Hs is the coefficient matrix of the shared topic matrix of the source domain and Ht is the coefficient matrix of the shared topic matrix of the target domain;
Iteratively calculating the current iteration value of parameter Us according to the update formula for Us until the current iteration value converges, and taking the converged current iteration value as the convergence value of parameter Us, where Ls is the coefficient matrix of the domain-specific topic matrix of the source domain;
Iteratively calculating the current iteration value of parameter Vs according to the update formula for Vs until the current iteration value converges, and taking the converged current iteration value as the convergence value of parameter Vs;
Iteratively calculating the current iteration value of parameter Ws according to the update formula for Ws until the current iteration value converges, and taking the converged current iteration value as the convergence value of parameter Ws;
Iteratively calculating the current iteration value of parameter Ut according to the update formula for Ut until the current iteration value converges, and taking the converged current iteration value as the convergence value of parameter Ut, where Lt is the coefficient matrix of the domain-specific topic matrix of the target domain;
Iteratively calculating the current iteration value of parameter Vt according to the update formula for Vt until the current iteration value converges, and taking the converged current iteration value as the convergence value of parameter Vt;
Where the operator symbol in the formulas denotes the inner product operation, t denotes the current iteration, and t-1 denotes the previous iteration.
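The "iterate until the current value converges" pattern used for each parameter can be sketched as below. The update formulas of this embodiment are not reproduced in the text, so the sketch substitutes a different, well-known rule as a stand-in: the classic multiplicative update for non-negative matrix factorization, which refines U in X ≈ U·V^T while V is held fixed. The function names, the tolerance and the iteration cap are all hypothetical choices.

```python
import numpy as np


def multiplicative_update(X, U, V, eps=1e-12):
    """Stand-in update (classic NMF rule, NOT the patent's formula):
    U <- U * (X V) / (U V^T V), applied element-wise, with V held fixed."""
    return U * (X @ V) / (U @ (V.T @ V) + eps)


def iterate_until_converged(update, initial, tol=1e-6, max_iter=500):
    """Apply `update` repeatedly; stop when the change between iteration
    t-1 and iteration t is small relative to the previous value."""
    current = initial
    for t in range(1, max_iter + 1):
        previous, current = current, update(current)
        change = np.linalg.norm(current - previous)
        if change <= tol * max(1.0, np.linalg.norm(previous)):
            return current, t  # converged value and iterations used
    return current, max_iter  # cap reached; last iterate is returned
```

In a full implementation each of the six parameters would presumably have its own update, applied in turn within one outer loop; the sketch isolates a single parameter so that the stopping rule is easy to see.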
As an optional embodiment, the classification function yi obtained according to the convergence values of the parameters in the overall objective function is:
Where vi is any document topic vector of the target domain, i is the row of the target-domain document topic matrix in which vi is located, and j is a column corresponding to that row.
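As the classification formula is missing from this text, the following sketch only illustrates one plausible reading: each target-domain document vector vi (row i of Vt) is scored with the converged linear coefficients Ws, and yi is the column j with the highest score. That argmax rule, the function name and the shapes are assumptions made for illustration.

```python
import numpy as np


def classify_target_documents(Vt, Ws):
    """Return, for each row v_i of Vt, the index j of the best-scoring class.

    Assumed shapes:
      Vt: (n_docs, k)        converged document topic matrix of the target domain
      Ws: (k, n_classes)     converged linear model coefficients
    """
    scores = Vt @ Ws                  # scores[i, j]: score of class j for document i
    return np.argmax(scores, axis=1)  # one class index per target document
```

For instance, with two documents that each load on a single topic and a diagonal Ws, every document is assigned the class tied to its own topic.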
The device 900 may also include a power supply component 926 configured to perform power management of the device 900, a wired or wireless network interface 950 configured to connect the device 900 to a network, and an input/output (I/O) interface 958. The device 900 can operate based on an operating system stored in the memory 932, such as Windows Server(TM), Mac OS X(TM), Unix(TM), Linux(TM), FreeBSD(TM) or the like.
In conclusion server provided in an embodiment of the present invention, by the shared topic for obtaining source domain and target domain
Matrix, and the field specific topics matrix and mesh of source domain are constructed according to the field specific topics of source domain and target domain respectively
The field specific topics matrix in mark field, and then according to shared topic matrix, field specific topics matrix, the source domain of source domain
Polarity Matrix, source domain term matrix, the field specific topics matrix of target domain and the term square of target domain
After battle array determines catalogue scalar functions, classification function is obtained according to the convergency value of parameters in catalogue scalar functions, and according to classification
Function classifies to the viewpoint data of target domain.Since shared topic can be used as the difference between bridge reduction field,
Therefore, when classifying according to above-mentioned classification function to cross-cutting viewpoint data, the accuracy of classification can be improved.
It should be noted that, when the classification device for cross-domain viewpoint data provided by the above embodiment classifies cross-domain viewpoint data, the division into the functional modules described above is only an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the classification device for cross-domain viewpoint data provided by the above embodiment and the embodiments of the classification method for cross-domain viewpoint data belong to the same concept; for the specific implementation process, see the method embodiments, which are not repeated here.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disc or the like.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A classification method for cross-domain viewpoint data, characterized in that the method comprises:
obtaining a shared topic matrix according to the shared topics of a source domain and a target domain, and obtaining a domain-specific topic matrix of the source domain and a domain-specific topic matrix of the target domain according to the domain-specific topics of the source domain and the domain-specific topics of the target domain, respectively;
determining an objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain and the term matrix of the source domain, the objective function ψs of the source domain being:
where ‖·‖_F denotes the Frobenius norm, Tr[·] denotes the trace of a matrix, Xs is the term matrix of the source domain, U0 is the shared topic matrix, Us is the domain-specific topic matrix of the source domain, Vs is the document topic matrix of the source domain, Vs^T is the transpose of the document topic matrix of the source domain, α is an arbitrary parameter, Ws is the linear model coefficient used to predict the viewpoint data of Vs, Ys is the polarity matrix of the source domain, and Cs is a diagonal matrix;
and determining an objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain and the term matrix of the target domain, the objective function ψt of the target domain being:
where Xt is the term matrix of the target domain, U0 is the shared topic matrix, Ut is the domain-specific topic matrix of the target domain, Vt is the document topic matrix of the target domain, and Vt^T is the transpose of the document topic matrix of the target domain;
determining an overall objective function according to the objective function of the source domain and the objective function of the target domain, and determining the convergence value of each parameter in the overall objective function;
obtaining a classification function according to the convergence values of the parameters in the overall objective function, and classifying the viewpoint data of the target domain according to the classification function.
2. The method according to claim 1, characterized in that the overall objective function ψ determined according to the objective function of the source domain and the objective function of the target domain is:
3. The method according to claim 2, characterized in that the parameters of the overall objective function are U0, Us, Vs, Ws, Ut and Vt;
the determining of the convergence value of each parameter in the overall objective function comprises:
iteratively calculating the current iteration value of parameter U0 according to the update formula for U0 until the current iteration value converges, and taking the converged current iteration value as the convergence value of parameter U0, where Hs is the coefficient matrix of the shared topic matrix of the source domain and Ht is the coefficient matrix of the shared topic matrix of the target domain;
iteratively calculating the current iteration value of parameter Us according to the update formula for Us until the current iteration value converges, and taking the converged current iteration value as the convergence value of parameter Us, where Ls is the coefficient matrix of the domain-specific topic matrix of the source domain;
iteratively calculating the current iteration value of parameter Vs according to the update formula for Vs until the current iteration value converges, and taking the converged current iteration value as the convergence value of parameter Vs;
iteratively calculating the current iteration value of parameter Ws according to the update formula for Ws until the current iteration value converges, and taking the converged current iteration value as the convergence value of parameter Ws;
iteratively calculating the current iteration value of parameter Ut according to the update formula for Ut until the current iteration value converges, and taking the converged current iteration value as the convergence value of parameter Ut, where Lt is the coefficient matrix of the domain-specific topic matrix of the target domain;
iteratively calculating the current iteration value of parameter Vt according to the update formula for Vt until the current iteration value converges, and taking the converged current iteration value as the convergence value of parameter Vt;
where the operator symbol in the formulas denotes the inner product operation, t denotes the current iteration, and t-1 denotes the previous iteration.
4. The method according to claim 3, characterized in that the classification function yi obtained according to the convergence values of the parameters in the overall objective function is:
where vi is any document topic vector of the target domain, i is the row of the target-domain document topic matrix in which vi is located, and j is a column corresponding to that row.
5. A classification device for cross-domain viewpoint data, characterized in that the device comprises:
a first obtaining module, configured to obtain a shared topic matrix according to the shared topics of a source domain and a target domain;
a second obtaining module, configured to obtain a domain-specific topic matrix of the source domain and a domain-specific topic matrix of the target domain according to the domain-specific topics of the source domain and the domain-specific topics of the target domain, respectively;
a first determining module, configured to determine an objective function of the source domain according to the shared topic matrix, the domain-specific topic matrix of the source domain, the polarity matrix of the source domain and the term matrix of the source domain, the objective function ψs of the source domain being:
where ‖·‖_F denotes the Frobenius norm, Tr[·] denotes the trace of a matrix, Xs is the term matrix of the source domain, U0 is the shared topic matrix, Us is the domain-specific topic matrix of the source domain, Vs is the document topic matrix of the source domain, Vs^T is the transpose of the document topic matrix of the source domain, α is an arbitrary parameter, Ws is the linear model coefficient used to predict the viewpoint data of Vs, Ys is the polarity matrix of the source domain, and Cs is a diagonal matrix;
a second determining module, configured to determine an objective function of the target domain according to the shared topic matrix, the domain-specific topic matrix of the target domain and the term matrix of the target domain, the objective function ψt of the target domain being:
where Xt is the term matrix of the target domain, U0 is the shared topic matrix, Ut is the domain-specific topic matrix of the target domain, Vt is the document topic matrix of the target domain, and Vt^T is the transpose of the document topic matrix of the target domain;
a third determining module, configured to determine an overall objective function according to the objective function of the source domain and the objective function of the target domain;
a fourth determining module, configured to determine the convergence value of each parameter in the overall objective function;
a third obtaining module, configured to obtain a classification function according to the convergence values of the parameters in the overall objective function;
a classification module, configured to classify the viewpoint data of the target domain according to the classification function.
6. The device according to claim 5, characterized in that the overall objective function ψ determined by the third determining module is:
7. The device according to claim 6, characterized in that the fourth determining module comprises:
a first determination unit, configured to iteratively calculate the current iteration value of parameter U0 according to the update formula for U0 until the current iteration value converges, and to take the converged current iteration value as the convergence value of parameter U0, where Hs is the coefficient matrix of the shared topic matrix of the source domain and Ht is the coefficient matrix of the shared topic matrix of the target domain;
a second determination unit, configured to iteratively calculate the current iteration value of parameter Us according to the update formula for Us until the current iteration value converges, and to take the converged current iteration value as the convergence value of parameter Us, where Ls is the coefficient matrix of the domain-specific topic matrix of the source domain;
a third determination unit, configured to iteratively calculate the current iteration value of parameter Vs according to the update formula for Vs until the current iteration value converges, and to take the converged current iteration value as the convergence value of parameter Vs;
a fourth determination unit, configured to iteratively calculate the current iteration value of parameter Ws according to the update formula for Ws until the current iteration value converges, and to take the converged current iteration value as the convergence value of parameter Ws;
a fifth determination unit, configured to iteratively calculate the current iteration value of parameter Ut according to the update formula for Ut until the current iteration value converges, and to take the converged current iteration value as the convergence value of parameter Ut, where Lt is the coefficient matrix of the domain-specific topic matrix of the target domain;
a sixth determination unit, configured to iteratively calculate the current iteration value of parameter Vt according to the update formula for Vt until the current iteration value converges, and to take the converged current iteration value as the convergence value of parameter Vt;
where the operator symbol in the formulas denotes the inner product operation, t denotes the current iteration, and t-1 denotes the previous iteration.
8. The device according to claim 7, characterized in that the classification function yi obtained by the third obtaining module according to the convergence values of the parameters in the overall objective function is:
where vi is any document topic vector of the target domain, i is the row of the target-domain document topic matrix in which vi is located, and j is a column corresponding to that row.
9. A server, characterized in that the server comprises one or more processors and a memory, the memory storing one or more instructions, the instructions being loaded and executed by the processor to implement the operations performed by the classification method for cross-domain viewpoint data according to any one of claims 1 to 4.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores one or more instructions, the instructions being loaded and executed by a processor to implement the operations performed by the classification method for cross-domain viewpoint data according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410201027.7A CN105095277B (en) | 2014-05-13 | 2014-05-13 | The classification method and device of cross-cutting viewpoint data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410201027.7A CN105095277B (en) | 2014-05-13 | 2014-05-13 | The classification method and device of cross-cutting viewpoint data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105095277A (en) | 2015-11-25 |
CN105095277B (en) | 2019-12-03 |
Family
ID=54575730
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410201027.7A Active CN105095277B (en) | 2014-05-13 | 2014-05-13 | The classification method and device of cross-cutting viewpoint data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105095277B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107392242B (en) * | 2017-07-18 | 2020-06-19 | 广东工业大学 | Cross-domain picture classification method based on homomorphic neural network |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102937960A (en) * | 2012-09-06 | 2013-02-20 | 北京邮电大学 | Device and method for identifying and evaluating emergency hot topic |
CN103761311A (en) * | 2014-01-23 | 2014-04-30 | 中国矿业大学 | Sentiment classification method based on multi-source field instance migration |
CN104239402A (en) * | 2014-07-23 | 2014-12-24 | 中国科学院自动化研究所 | Document enquiry method and document enquiry device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9194800B2 (en) * | 2012-10-29 | 2015-11-24 | Tokitae Llc | Systems, devices, and methods employing angular-resolved scattering and spectrally resolved measurements for classification of objects |
2014-05-13 CN CN201410201027.7A, patent CN105095277B (en), status: Active
Non-Patent Citations (1)
Title |
---|
"微博话题评论的情感分析研究";曾佳妮等;《信息安全与通信保密》;20130328;第56-58页 * |
Also Published As
Publication number | Publication date |
---|---|
CN105095277A (en) | 2015-11-25 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Sahoo et al. | Exploratory data analysis using Python | |
Shao et al. | Online multi-view clustering with incomplete views | |
Sharma | Deep challenges associated with deep learning | |
Sussman et al. | A consistent adjacency spectral embedding for stochastic blockmodel graphs | |
Gao et al. | Stability analysis of learning algorithms for ontology similarity computation | |
CN106909931B (en) | Feature generation method and device for machine learning model and electronic equipment | |
CN106095966B (en) | User extensible label labeling method and system | |
CN109241290A (en) | A kind of knowledge mapping complementing method, device and storage medium | |
Reff | Spectral properties of oriented hypergraphs | |
CN105825269B (en) | A kind of feature learning method and system based on parallel automatic coding machine | |
CN104616029A (en) | Data classification method and device | |
Bai et al. | Multidimensional scaling on multiple input distance matrices | |
CN105320764A (en) | 3D model retrieval method and 3D model retrieval apparatus based on slow increment features | |
Mutar et al. | Smoke detection based on image processing by using grey and transparency features | |
CN103942214B (en) | Natural image classification method and device on basis of multi-modal matrix filling | |
CN104077408B (en) | Extensive across media data distributed semi content of supervision method for identifying and classifying and device | |
Sharma et al. | Comparative Analysis of Data Storage Solutions for Responsive Big Data Applications | |
Gavval et al. | CUDA-Self-Organizing feature map based visual sentiment analysis of bank customer complaints for Analytical CRM | |
Aftab et al. | Sentiment analysis of customer for ecommerce by applying AI | |
Belanche et al. | Handling missing values in kernel methods with application to microbiology data | |
CN105095277B (en) | The classification method and device of cross-cutting viewpoint data | |
Baskaran et al. | Accelerated low-rank updates to tensor decompositions | |
CN104268217A (en) | User behavior time relativity determining method and device | |
Aluja-Banet et al. | GRAFT, a complete system for data fusion | |
Lu et al. | Explainable, stable, and scalable graph convolutional networks for learning graph representation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||