CN106294506A - Domain-adaptive viewpoint data classification method and device - Google Patents
- Publication number
- CN106294506A, CN201510316353.7A
- Authority
- CN
- China
- Prior art keywords
- matrix
- domain
- source domain
- target domain
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a domain-adaptive viewpoint (opinion) data classification method and belongs to the field of Internet technology. The method comprises: determining a source domain term matrix and a target domain term matrix; determining a source domain objective function and a target domain objective function; determining a general objective function from the source domain objective function and the target domain objective function; determining the desired value of each parameter in the general objective function; and training a specified classification model according to the desired values of the parameters and the labeled viewpoint data in the source domain, then classifying the viewpoint data of the target domain with the trained model. Because the general objective function depends on the source domain specific topic matrix, the target domain specific topic matrix, and the hinge matrix that represents the topics shared between the source domain and the target domain, the method achieves domain adaptation through shared topics. Because shared topics reduce the difference between the source domain and the target domain, this helps ensure the accuracy of the classification results.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a domain-adaptive viewpoint data classification method and device.
Background
With the development of Internet technology, users share more and more viewpoint data on the Internet. For example, the reviews that users post on shopping websites and the feedback that users give on a particular product are viewpoint data. The viewpoint data shared by users can relate to many domains, such as clothing and books. To make it easier to study or compile statistics on the viewpoint data of each domain, the viewpoint data of each domain usually need to be classified. Classifying the viewpoint data of a domain usually requires first labeling viewpoint data in that domain and then training a classifier on the labeled data. However, because Internet viewpoint data cover a great many domains, labeling the viewpoint data of every domain would waste considerable resources. A domain-adaptive viewpoint data classification method makes it possible to classify the viewpoint data of some domains without labeling the viewpoint data of those domains.
Take domain-adaptive viewpoint data classification with the SFA (Spectral Feature Alignment) algorithm as an example. The related art first selects an arbitrary source domain and target domain and determines the domain-specific words and domain-independent words of the two domains. A domain-specific word is a word peculiar to one domain, and domain-independent words serve as a bridge between the source domain and the target domain. A bipartite graph is then built between the domain-specific words and the domain-independent words to represent their co-occurrence relationships, and the SFA algorithm assigns the domain-specific words and domain-independent words that are more strongly connected in the bipartite graph to the same cluster. Because these clusters reduce the gap between the domain-specific words of the source domain and those of the target domain, a classifier can be trained on the clusters, and the trained classifier performs the domain-adaptive viewpoint data classification.
In the course of realizing the present invention, the inventors found that the related art has at least the following problem:
When the related art performs domain-adaptive viewpoint data classification, the words involved in the selected source domain and target domain cannot always be clearly divided into domain-specific words and domain-independent words. As a result, the classification results produced by the domain-adaptive viewpoint data classification method of the related art are not very accurate.
Summary of the invention
To solve the problem of the prior art, embodiments of the present invention provide a domain-adaptive viewpoint data classification method and device. The technical solution is as follows:
In a first aspect, a domain-adaptive viewpoint data classification method is provided, the method comprising:
determining a source domain term matrix according to the relationship between the documents and terms of a source domain;
determining a target domain term matrix according to the relationship between the documents and terms of a target domain;
determining a source domain objective function according to the source domain term matrix, a source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and a hinge matrix between the source domain and the target domain;
determining a target domain objective function according to the target domain term matrix, a target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix;
determining a general objective function according to the source domain objective function and the target domain objective function;
determining the desired value of each parameter in the general objective function;
training a specified classification model according to the desired values of the parameters and the labeled viewpoint data in the source domain, and classifying the viewpoint data of the target domain with the specified classification model obtained by training.
In a second aspect, a domain-adaptive viewpoint data classification device is provided, the device comprising:
a first determining module, configured to determine a source domain term matrix according to the relationship between the documents and terms of a source domain;
a second determining module, configured to determine a target domain term matrix according to the relationship between the documents and terms of a target domain;
a third determining module, configured to determine a source domain objective function according to the source domain term matrix, a source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and a hinge matrix between the source domain and the target domain;
a fourth determining module, configured to determine a target domain objective function according to the target domain term matrix, a target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix;
a fifth determining module, configured to determine a general objective function according to the source domain objective function and the target domain objective function;
a sixth determining module, configured to determine the desired value of each parameter in the general objective function;
a training module, configured to train a specified classification model according to the desired values of the parameters and the labeled viewpoint data in the source domain;
a classification module, configured to classify the viewpoint data of the target domain with the specified classification model obtained by training.
The technical solution provided by the embodiments of the present invention has the following beneficial effects:
Because the determined general objective function depends on the source domain specific topic matrix, the target domain specific topic matrix, and the hinge matrix that represents the topics shared between the source domain and the target domain, a viewpoint data classification method is provided that achieves domain adaptation through the topics shared between the source domain and the target domain. Because shared topics reduce the difference between the source domain and the target domain, classification with this domain-adaptive viewpoint data classification method helps ensure the accuracy of the classification results.
Brief description of the drawings
To illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a domain-adaptive viewpoint data classification method provided by one embodiment of the present invention;
Fig. 2 is a flowchart of a domain-adaptive viewpoint data classification method provided by another embodiment of the present invention;
Fig. 3 is a convergence curve provided by another embodiment of the present invention;
Fig. 4 is a diagram of experimental results obtained by testing each pair of domains separately, provided by another embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a domain-adaptive viewpoint data classification device provided by another embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a server provided by another embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a terminal provided by another embodiment of the present invention.
Detailed description of the invention
To make the objectives, technical solutions, and advantages of the present invention clearer, the embodiments of the present invention are described in further detail below with reference to the drawings.
With the rapid development of Internet technology, more and more viewpoint data that reveal users' sentiment are shared on the Internet. For example, after a successful online purchase, a user can post an evaluation of the purchased goods in a review; after a user publishes a blog post, other users comment on its content. Viewpoint data may be derogatory or commendatory, and may be subjective or objective. In other words, viewpoint data have a sentiment polarity, such as positive or negative, and studying the sentiment polarity of viewpoint data is the process of classifying viewpoint data. Studying the sentiment polarity of viewpoint data is of great significance for guiding practice such as product development and service delivery, so viewpoint data often need to be classified.
Furthermore, the viewpoint data on the Internet relate to many different domains. To make it easier to classify the viewpoint data of multiple domains, a domain-adaptive classification method is usually used. With a domain-adaptive classification method, the viewpoint data of one or more domains can be classified without labeling the viewpoint data of those domains. The embodiment of the present invention provides such a domain-adaptive viewpoint data classification method. In the embodiment of the present invention, the source domain includes some labeled viewpoint data whose polarity has been marked, while the target domain may include no labeled viewpoint data. The method provided by the embodiment of the present invention can determine the sentiment polarity of any viewpoint data in the target domain and thereby classify any viewpoint data of the target domain. The specific domain-adaptive viewpoint data classification method is described in the following embodiments:
Fig. 1 is a flowchart of a domain-adaptive viewpoint data classification method according to an exemplary embodiment. Referring to Fig. 1, the method flow provided by the embodiment of the present invention includes:
101: Determine a source domain term matrix according to the relationship between the documents and terms of the source domain.
102: Determine a target domain term matrix according to the relationship between the documents and terms of the target domain.
103: Determine a source domain objective function according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain.
104: Determine a target domain objective function according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix.
105: Determine a general objective function according to the source domain objective function and the target domain objective function.
106: Determine the desired value of each parameter in the general objective function.
107: Train a specified classification model according to the desired values of the parameters and the labeled viewpoint data in the source domain, and classify the viewpoint data of the target domain with the specified classification model obtained by training.
In the method provided by the embodiment of the present invention, the determined general objective function depends on the source domain specific topic matrix, the target domain specific topic matrix, and the hinge matrix that represents the topics shared between the source domain and the target domain, so the method achieves domain-adaptive viewpoint data classification through the topics shared between the two domains. Because shared topics reduce the difference between the source domain and the target domain, classification with this domain-adaptive viewpoint data classification method helps ensure the accuracy of the classification results.
In another embodiment, determining the source domain objective function according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain comprises:
determining the source domain objective function by the following formula:
$$O_s = \left\| X_s - [U_0, U_s]\, V_s^{T} \right\|_F^2$$
where $O_s$ is the source domain objective function, $X_s$ is the source domain term matrix, $U_0$ is the hinge matrix, $U_s$ is the source domain specific topic matrix, $V_s$ is the coefficient matrix of the source domain specific topic matrix, and $\|\cdot\|_F$ denotes the Frobenius norm;
determining the target domain objective function according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix comprises:
determining the target domain objective function by the following formula:
$$O_t = \left\| X_t - [U_0, U_t]\, V_t^{T} \right\|_F^2$$
where $O_t$ is the target domain objective function, $X_t$ is the target domain term matrix, $U_0$ is the hinge matrix, $U_t$ is the target domain specific topic matrix, and $V_t$ is the coefficient matrix of the target domain specific topic matrix.
In another embodiment, determining the general objective function according to the source domain objective function and the target domain objective function comprises:
determining the general objective function according to the source domain objective function and the target domain objective function by the following formula:
In the formula, $\Phi$ is the general objective function, $D(U_0, U_s, U_t, V_s, V_t)$ is a regularization term, $\alpha$, $\beta$, and $\gamma$ are the regularization parameters, and $\mathrm{Tr}(\cdot)$ is the trace of a matrix. The remaining five symbols are the Lagrange multiplier matrices obtained by the Lagrange multiplier method under the non-negativity constraints $U_0(i, j) \ge 0$, $U_s(i, j) \ge 0$, $U_t(i, j) \ge 0$, $V_s(i, j) \ge 0$, and $V_t(i, j) \ge 0$, respectively.
In another embodiment, determining the desired value of each parameter in the general objective function comprises:
randomly assigning each parameter a non-negative value as the initial value of that parameter;
calculating the convergence value of each parameter from its initial value, and taking the convergence value of each parameter as the desired value of that parameter.
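The initialize-then-iterate procedure above can be sketched as follows. This is a minimal illustration, not the patented update rules: `iterate_to_convergence`, the `update` callable, the tolerance `tol`, and the toy `halve` update are all hypothetical names and choices introduced here, and the convergence test (largest entry-wise change below a threshold) is an assumption.

```python
import numpy as np

def iterate_to_convergence(update, params, tol=1e-6, max_iter=1000):
    """Apply `update` repeatedly until no parameter entry changes by more than `tol`.

    `update` is a hypothetical callable standing in for the per-parameter
    update formulas; it maps a dict of matrices to a dict of new matrices.
    """
    for r in range(1, max_iter + 1):
        new_params = update(params)
        # Largest absolute change of any entry of any parameter in this iteration.
        change = max(np.abs(new_params[name] - params[name]).max() for name in params)
        params = new_params
        if change < tol:
            break
    return params, r

# Random non-negative initial values: rng.random draws from [0, 1).
rng = np.random.default_rng(0)
initial = {"U_0": rng.random((4, 2)), "V_s": rng.random((3, 2))}

def halve(params):
    """Toy update used only to exercise the loop: halve every entry."""
    return {name: 0.5 * value for name, value in params.items()}

converged, n_iter = iterate_to_convergence(halve, initial)
```

In a real run, `update` would apply the update formulas for $U_0$, $U_s$, $U_t$, $V_s$, and $V_t$ in turn, and the returned matrices would serve as the desired values.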
In another embodiment, the parameters in the general objective function include $U_0$, $U_s$, $U_t$, $V_s$, and $V_t$;
calculating the convergence value of each parameter from its initial value comprises:
starting from the initial value of $U_0$, iteratively updating $U_0$ according to the update formula for $U_0$ until the convergence value of $U_0$ is obtained, where each iteration computes a new value of $U_0$ from the value obtained in the previous iteration, $H_s$ is the coefficient matrix of the hinge matrix for the source domain, $H_t$ is the coefficient matrix of the hinge matrix for the target domain, and $r$ denotes the iteration count;
starting from the initial value of $U_s$, iteratively updating $U_s$ according to the update formula for $U_s$ until the convergence value of $U_s$ is obtained, where each iteration computes a new value of $U_s$ from the value obtained in the previous iteration, $L_s$ is the coefficient matrix of the source domain specific topic matrix, and $L_t$ is the coefficient matrix of the target domain specific topic matrix;
starting from the initial value of $U_t$, iteratively updating $U_t$ according to the update formula for $U_t$ until the convergence value of $U_t$ is obtained, where each iteration computes a new value of $U_t$ from the value obtained in the previous iteration;
starting from the initial value of $V_s$, iteratively updating $V_s$ according to the update formula for $V_s$ until the convergence value of $V_s$ is obtained, where each iteration computes a new value of $V_s$ from the value obtained in the previous iteration;
starting from the initial value of $V_t$, iteratively updating $V_t$ according to the update formula for $V_t$ until the convergence value of $V_t$ is obtained, where each iteration computes a new value of $V_t$ from the value obtained in the previous iteration.
With reference to the embodiment corresponding to Fig. 1, Fig. 2 is a flowchart of a domain-adaptive viewpoint data classification method according to an exemplary embodiment. Referring to Fig. 2, the method flow provided by the embodiment of the present invention includes:
201: Determine a source domain term matrix according to the relationship between the documents and terms of the source domain.
The source domain includes some labeled viewpoint data and may also include some unlabeled viewpoint data. Labeled viewpoint data can be labeled documents. For any labeled viewpoint data, the label shows whether the sentiment polarity of the viewpoint data is positive or negative. For example, suppose a piece of labeled viewpoint data is a labeled document and "+1" and "-1" denote positive and negative sentiment polarity respectively. If the label of the document is "+1", the sentiment polarity of the labeled document is positive. The embodiment of the present invention does not specifically limit the type of the source domain. For example, the source domain can be the books domain, the electronics domain, or the clothing domain.
Generally, each domain can include multiple documents, and each document consists of at least one term. Therefore, for any domain, a term matrix can represent the relationship between the documents and terms of the domain and thereby identify the features of the domain. In the embodiment of the present invention, to determine the relationship between documents and terms in the source domain and hence the features of the source domain, the source domain is denoted $X_s$, the number of documents in the source domain is $n_s$, and the number of terms contained in each document is $m$. On this basis, the source domain term matrix $X_s$ can be expressed as a matrix in which each element is the weight of the corresponding term. The weight of each term can be obtained with the TF-IDF algorithm from the relationship between the documents and terms of the source domain.
Because each document in the source domain comprises $m$ terms, the source domain term matrix can also be written as $X_s \in \mathbb{R}^{m \times n_s}$.
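The construction of a TF-IDF weighted term matrix can be sketched as follows. The text names TF-IDF but not a specific variant, so the formula used here (term frequency normalized by document length, unsmoothed logarithmic inverse document frequency), the whitespace tokenizer, and the function name `tfidf_term_matrix` are assumptions made for illustration. Rows correspond to terms and columns to documents, matching the $m \times n_s$ shape.

```python
import math
from collections import Counter

def tfidf_term_matrix(docs):
    """Return (vocab, X) where X[i][j] is the TF-IDF weight of term i in document j."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    X = [[0.0] * n_docs for _ in vocab]
    for j, toks in enumerate(tokenized):
        counts = Counter(toks)
        for i, term in enumerate(vocab):
            if counts[term]:
                tf = counts[term] / len(toks)
                idf = math.log(n_docs / df[term])
                X[i][j] = tf * idf
    return vocab, X

# Hypothetical source-domain documents, used only to exercise the function.
source_docs = ["cheap and durable", "expensive but durable", "cheap cheap cheap"]
vocab, X_s = tfidf_term_matrix(source_docs)
```

With this variant, a term that appears in every document receives weight zero; smoothed variants avoid that, and either choice keeps all weights non-negative.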
In addition, when the viewpoint data are documents, the source domain contains a certain amount of labeled viewpoint data, that is, a certain number of labeled documents. To make it easier to train the specified classification model later with the labeled viewpoint data in the source domain, the sentiment polarity of each labeled document in the source domain can be represented by a document polarity matrix $Y_s$. Specifically, $Y_s$ can be an $n_s \times 2$ matrix, where $n_s$ is the number of documents in the source domain and 2 indicates that a document has two kinds of sentiment polarity: positive, meaning that the document expresses a positive viewpoint, and negative, meaning that the document expresses a negative viewpoint. Taking the $i$-th document in the source domain as an example, if the element corresponding to the $i$-th document in the document polarity matrix is $y_i = 1$, the sentiment polarity of the $i$-th document is positive, that is, the document expresses a positive viewpoint; if the element is $y_i = -1$, the sentiment polarity of the $i$-th document is negative, that is, the document expresses a negative viewpoint. The above only uses "+1" and "-1" as an example of representing the sentiment polarity of a document; in a specific implementation, other values may also be used, and this embodiment does not specifically limit this.
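One way to reconcile the $n_s \times 2$ shape of $Y_s$ with the "+1"/"-1" labels is a one-hot encoding, sketched below. The column order (positive first, negative second), the all-zero row for unlabeled documents, and the function name `polarity_matrix` are assumptions introduced here; the text does not specify the encoding.

```python
def polarity_matrix(labels):
    """Build an n_s x 2 polarity matrix from per-document labels.

    Assumed encoding: +1 -> [1, 0] (positive), -1 -> [0, 1] (negative),
    anything else (for example None for an unlabeled document) -> [0, 0].
    """
    rows = []
    for y in labels:
        if y == 1:
            rows.append([1, 0])
        elif y == -1:
            rows.append([0, 1])
        else:
            rows.append([0, 0])
    return rows

# Three source-domain documents: positive, negative, and unlabeled.
Y_s = polarity_matrix([1, -1, None])
```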
202: Determine a target domain term matrix according to the relationship between the documents and terms of the target domain.
The target domain may include no labeled viewpoint data. The target domain can be the books domain, the electronics domain, the clothing domain, or another domain different from the source domain; this embodiment does not specifically limit the type of the target domain. Following step 201, to determine the relationship between the documents and terms of the target domain and hence the features of the target domain, the embodiment of the present invention denotes the target domain $X_t$, the number of documents in the target domain $n_t$, and the number of terms contained in each document $m$. The target domain term matrix $X_t$ can then be expressed as a matrix in which each element is the weight of the corresponding term.
Because each document in the target domain comprises $m$ terms, the target domain term matrix can also be written as $X_t \in \mathbb{R}^{m \times n_t}$.
It should be noted that steps 201 and 202 are described with the source domain term matrix determined first and the target domain term matrix determined second only as an example. In a specific implementation, the target domain term matrix can be determined first and the source domain term matrix second, or the two matrices can be determined at the same time.
203: Determine a source domain objective function according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain.
Different types of domains usually have some domain-specific topics. For example, for the electronic products domain, "durable" and "brightness" are domain-specific topics. The source domain specific topic matrix is the matrix composed of the domain-specific topics of the source domain. For convenience of description, the embodiment of the present invention assumes that the number of domain-specific topics of the source domain is $k_s$ and that the source domain specific topic matrix is $U_s$, which can be expressed as:
$U_s = [u_1^{(s)}, \ldots, u_{k_s}^{(s)}]$.
Because each document in the source domain comprises $m$ terms, the source domain specific topic matrix can also be written as $U_s \in \mathbb{R}^{m \times k_s}$, where each column represents one specific topic of the source domain.
In addition, the source domain and the target domain usually also include some shared topics, which are topics that both the source domain and the target domain can involve. For example, if the source domain is the books domain and the target domain is the clothing domain, topics such as "expensive" and "cheap" can arise in both domains and can therefore serve as shared topics between the source domain and the target domain. The embodiment of the present invention represents the shared topics between the source domain and the target domain with the hinge matrix.
Specifically, for convenience of description, the embodiment of the present invention sets the number of shared topics to $k_0$ and denotes the hinge matrix between the source domain and the target domain $U_0$.
Because each document in the source domain and the target domain comprises $m$ terms, the hinge matrix can be written as $U_0 \in \mathbb{R}^{m \times k_0}$, where each column represents one topic shared between the source domain and the target domain.
Following the above, because the source domain specific topic matrix and the hinge matrix together make up the topics included in the source domain, the number of topics included in the source domain is $k_0 + k_s$.
In addition, in the embodiment of the present invention, the source domain objective function represents the features of the source domain accurately and is an important basis for the domain-adaptive viewpoint data classification performed in subsequent steps, so the source domain objective function needs to be determined. Because the source domain term matrix, the source domain specific topic matrix, and the hinge matrix can all be used to represent the features of the source domain, the source domain objective function can be determined according to the source domain term matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix between the source domain and the target domain.
Specifically, the source domain objective function can be determined from these matrices by the following formula:
$$O_s = \left\| X_s - [U_0, U_s]\, V_s^{T} \right\|_F^2 \qquad (1)$$
In formula (1), $O_s$ is the source domain objective function, $X_s$ is the source domain term matrix, $U_0$ is the hinge matrix, $U_s$ is the source domain specific topic matrix, $V_s$ is the coefficient matrix of the source domain specific topic matrix, and $\|\cdot\|_F$ denotes the Frobenius norm.
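The evaluation of the source domain objective function can be sketched numerically as follows, assuming the objective is the squared Frobenius norm of the residual $X_s - [U_0, U_s] V_s^T$, a form inferred from the matrix shapes given in this description. The function name `source_objective` and the toy dimensions are hypothetical.

```python
import numpy as np

def source_objective(X_s, U_0, U_s, V_s):
    """Squared Frobenius reconstruction error of X_s against [U_0, U_s] V_s^T."""
    U = np.hstack([U_0, U_s])      # m x (k_0 + k_s): shared topics, then specific topics
    residual = X_s - U @ V_s.T     # m x n_s
    return float(np.sum(residual ** 2))

# Toy sizes, chosen only for illustration.
rng = np.random.default_rng(0)
m, n_s, k_0, k_s = 6, 4, 2, 3
U_0 = rng.random((m, k_0))
U_s = rng.random((m, k_s))
V_s = rng.random((n_s, k_0 + k_s))

# A term matrix that the factors reproduce exactly, so the objective is zero.
X_exact = np.hstack([U_0, U_s]) @ V_s.T
```

A smaller value means that the shared and domain-specific topics together explain the term matrix better; the same function applies to the target domain when $X_t$, $U_t$, and $V_t$ are substituted.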
As formula (1) shows, the hinge matrix, the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the source domain term matrix are the keys to determining the source domain objective function. Therefore, before the source domain objective function is determined, the source domain specific topic matrix, its coefficient matrix, and the hinge matrix need to be determined first. There is a certain relationship between the source domain term matrix on the one hand and the source domain specific topic matrix, the coefficient matrix of the source domain specific topic matrix, and the hinge matrix on the other. This relationship is explained below.
Specifically, with reference to formula (1), in the ideal case the source domain term matrix $X_s$ can be decomposed into two matrices: the source domain document-topic matrix $V_s$ and the source domain term-topic matrix $U_s'$. $U_s'$ is an $m \times (k_s + k_0)$ matrix, i.e. $U_s' \in \mathbb{R}^{m \times (k_s + k_0)}$, and the matrices it contains include but are not limited to the hinge matrix $U_0$ and the source domain specific topic matrix $U_s$. $V_s$ is an $n_s \times (k_s + k_0)$ matrix, i.e. $V_s \in \mathbb{R}^{n_s \times (k_s + k_0)}$, and each row of it represents one document in the source domain. $V_s$ can in turn be decomposed into the matrices $H_s$ and $L_s$. $H_s$ is an $n_s \times k_0$ matrix, the coefficient matrix of the hinge matrix for the source domain, which represents the weight of the hinge matrix in the source domain; $L_s$ is an $n_s \times k_s$ matrix, the coefficient matrix of the source domain specific topic matrix.
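Written out under the shapes stated above, the decomposition amounts to the following identity. It is a worked reading of the block structure, assuming that $U_s'$ is exactly the column-wise concatenation $[U_0, U_s]$ and that $V_s$ is the column-wise concatenation $[H_s, L_s]$ in the same order; the block ordering is an assumption.

```latex
X_s \approx U_s' V_s^{T}
    = \begin{bmatrix} U_0 & U_s \end{bmatrix}
      \begin{bmatrix} H_s & L_s \end{bmatrix}^{T}
    = U_0 H_s^{T} + U_s L_s^{T},
\qquad
U_0 H_s^{T},\; U_s L_s^{T} \in \mathbb{R}^{m \times n_s}
```

The first summand is the part of the source domain term matrix explained by the shared topics, and the second is the part explained by the source domain specific topics.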
Methods for decomposing the source domain term matrix include, but are not limited to, non-negative matrix factorization (NMF). NMF is a matrix factorization method carried out under the constraint that all elements of the matrices are non-negative. It seeks low-rank, non-negative factors into which the matrix is decomposed.
NMF is widely used in practice, for example to decompose the pixels of digital images, word statistics in text analysis, and stock prices. The basic idea of NMF can be stated briefly: for any given non-negative matrix A, a non-negative matrix U and a non-negative matrix V can be found such that A is decomposed into, or closely approximated by, the product of U and V. When NMF is used to analyze large-scale text and image data, it describes latent semantic information better than traditional processing methods.
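As a minimal sketch of the basic idea above, the following uses the classical multiplicative update rules for plain NMF (minimizing the squared Frobenius error of A ≈ UV). It is a generic illustration, not the update formulas of this embodiment; the function names, the `eps` guard against division by zero, and the toy matrix are assumptions.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frob_sq(a, b):
    """Squared Frobenius norm of a - b."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative m x n matrix A into U (m x k) and V (k x n)."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    U = [[rng.random() + 0.1 for _ in range(k)] for _ in range(m)]
    V = [[rng.random() + 0.1 for _ in range(n)] for _ in range(k)]
    history = []
    for _ in range(iters):
        # V <- V * (U^T A) / (U^T U V), element-wise
        Ut = transpose(U)
        num, den = matmul(Ut, A), matmul(matmul(Ut, U), V)
        V = [[v * p / (q + eps) for v, p, q in zip(rv, rp, rq)]
             for rv, rp, rq in zip(V, num, den)]
        # U <- U * (A V^T) / (U V V^T), element-wise
        Vt = transpose(V)
        num, den = matmul(A, Vt), matmul(U, matmul(V, Vt))
        U = [[u * p / (q + eps) for u, p, q in zip(ru, rp, rq)]
             for ru, rp, rq in zip(U, num, den)]
        history.append(frob_sq(A, matmul(U, V)))
    return U, V, history

A = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [3.0, 6.0, 9.0]]  # toy rank-1 matrix
U, V, history = nmf(A, k=1, iters=50)
print(history[-1])  # reconstruction error after the last iteration
```

Because the updates only multiply by non-negative ratios, factors that start non-negative stay non-negative throughout.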
204: Determine the target domain objective function according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix.
As described in step 203 above, the domain-specific topics of the target domain are the topics peculiar to the target domain, and the target domain specific topic matrix is the matrix formed by these topics. For ease of explanation, the embodiment of the present invention assumes that the number of domain-specific topics of the target domain is k_t and that the target domain specific topic matrix is U_t. U_t can then be expressed as:
U_t = [u_1^(t), ..., u_{k_t}^(t)].
Because each document in the target domain contains m terms, the target domain specific topic matrix U_t is an m × k_t matrix. Each column of U_t represents one specific topic of the target domain.
As described in step 203 above, because the topics in the target domain specific topic matrix and the topics in the hinge matrix are all topics included in the target domain, the number of topics included in the target domain is k_0 + k_t.
In addition, in the embodiment of the present invention, the target domain objective function represents the features of the target domain well, and it is an important basis for the domain-adaptive viewpoint data classification carried out in the subsequent steps. The target domain objective function therefore needs to be determined. Because the target domain term matrix, the target domain specific topic matrix, and the hinge matrix can all be used to represent features of the target domain, the target domain objective function can be determined according to the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix.
Specifically, the target domain objective function can be determined from the target domain term matrix, the target domain specific topic matrix, the coefficient matrix of the target domain specific topic matrix, and the hinge matrix by formula (2).
In formula (2), O_t is the target domain objective function, X_t is the target domain term matrix, U_0 is the hinge matrix, U_t is the target domain specific topic matrix, and V_t is the coefficient matrix of the target domain specific topic matrix.
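Formula (2) itself does not survive in this text. One form that is consistent with the symbols defined above, with the stated matrix dimensions, and with the Frobenius norm named for the source domain objective function in the device embodiment is sketched here. The squared norm, the column-wise concatenation [U_0, U_t], and the m × n_t shape of X_t are assumptions, so this should be read as a plausible reconstruction rather than a quotation of formula (2).

```latex
% Assumed shapes: X_t is m x n_t, U_0 is m x k_0, U_t is m x k_t,
% V_t is n_t x (k_0 + k_t); all factors are constrained to be non-negative.
O_t = \left\| X_t - [\,U_0,\; U_t\,]\, V_t^{\top} \right\|_F^{2},
\qquad U_0 \ge 0,\quad U_t \ge 0,\quad V_t \ge 0
```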
Further, as formula (2) shows, the hinge matrix and the target domain specific topic matrix are the key to determining the target domain objective function. Therefore, before the target domain objective function is determined, the hinge matrix and the domain-specific topic matrix of the target domain must first be determined. Methods for determining the hinge matrix, the target domain specific topic matrix, and the coefficient matrix of the target domain specific topic matrix include, but are not limited to, decomposing the target domain term matrix.
Specifically, with reference to formula (2), in the ideal case the target domain term matrix X_t can be decomposed into two matrices: the target domain document-topic matrix V_t and the target domain term-topic matrix U_t'. The target domain term-topic matrix U_t' is an m × (k_t + k_0) matrix. The matrices contained in U_t' include, but are not limited to, the hinge matrix U_0 and the target domain specific topic matrix U_t. The target domain document-topic matrix V_t is an n_t × (k_t + k_0) matrix, and each row of V_t represents one document in the target domain. V_t can in turn be decomposed into a matrix H_t and a matrix L_t. H_t is an n_t × k_0 matrix; it is the coefficient matrix of the hinge matrix for the target domain and represents the weight of the hinge matrix in the target domain. L_t is an n_t × k_t matrix; it is the coefficient matrix of the target domain specific topic matrix.
Methods for decomposing the target domain term matrix include, but are not limited to, non-negative matrix factorization.
It should be noted that this embodiment does not limit the order in which the source domain objective function is determined in step 203 and the target domain objective function is determined in step 204. In a specific implementation, the source domain objective function may be determined first, the target domain objective function may be determined first, or the two may be determined simultaneously.
205: Determine the general objective function according to the source domain objective function and the target domain objective function.
Specifically, a simple and direct way to determine the general objective function is to add the source domain objective function and the target domain objective function. However, a general objective function determined in this way has an obvious defect: it cannot clearly separate the domain-level constraints specific to the source domain and to the target domain from the domain-level constraints that they share. As a result, when the domain-specific topics of the source domain are obtained, no constraint prevents them from being taken from the topics shared between the source domain and the target domain; likewise, when the shared topics are obtained, no constraint prevents them from being taken from the domain-specific topics of the source domain or the target domain. To overcome this defect, the embodiment of the present invention adds a regularization term to the sum of the source domain objective function and the target domain objective function when determining the general objective function. This regularization term overcomes the problem described above.
In view of the above, the general objective function can be determined from the source domain objective function and the target domain objective function by formula (3) and formula (4).
In formula (3) and formula (4), Φ is the general objective function; D(U_0, U_s, U_t, V_s, V_t) is the regularization term; α, β, and γ are the regularization parameters; and Tr(·) is the trace of a matrix. The five remaining symbols are the Lagrange multiplier matrices obtained by the Lagrange multiplier method under the constraints U_0(i, j) ≥ 0, U_s(i, j) ≥ 0, U_t(i, j) ≥ 0, V_s(i, j) ≥ 0, and V_t(i, j) ≥ 0, respectively, where i and j denote any row and any column of U_0, U_s, U_t, V_s, and V_t.
Here α = a/(k_0 × k_s), β = a/(k_0 × k_t), and γ = a/(k_s × k_t). The value of a can be determined by cross-validation; the embodiment of the present invention does not limit its specific value.
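The three regularization parameters follow directly from a and the topic counts. A small sketch, in which the function name and the sample values of a, k_0, k_s, and k_t are hypothetical:

```python
def regularization_params(a, k_0, k_s, k_t):
    """Return (alpha, beta, gamma) as defined above from a and the topic counts."""
    alpha = a / (k_0 * k_s)  # scales with the shared and source-specific topic counts
    beta = a / (k_0 * k_t)   # scales with the shared and target-specific topic counts
    gamma = a / (k_s * k_t)  # scales with the two domain-specific topic counts
    return alpha, beta, gamma

# Hypothetical values; in practice a would be chosen by cross-validation.
alpha, beta, gamma = regularization_params(a=1.0, k_0=2, k_s=4, k_t=5)
print(alpha, beta, gamma)
```

Dividing by the product of the two topic counts keeps each parameter on a comparable scale when the number of topics changes.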
206: Randomly assign a non-negative value to each parameter as its initial value, calculate the convergence value of each parameter from its initial value, and take the convergence value of each parameter as its target value.
This step is a specific implementation of determining the target value of each parameter in the general objective function. From steps 203 and 204, the source domain specific topic matrix, its coefficient matrix, and the hinge matrix can be obtained by decomposing the source domain term matrix with NMF, and the target domain specific topic matrix, its coefficient matrix, and the hinge matrix can be obtained by decomposing the target domain term matrix with NMF. According to the expression of the general objective function, its parameters are the hinge matrix U_0, the source domain specific topic matrix U_s, the target domain specific topic matrix U_t, the coefficient matrix V_s of the source domain specific topic matrix, and the coefficient matrix V_t of the target domain specific topic matrix. However, when the source domain term matrix X_s and the target domain term matrix X_t are decomposed to obtain these parameters, a single operation does not yield the optimal factor matrices; the optimal value of each parameter must be found by iterative calculation. Therefore, when determining the target value of each parameter in the general objective function, the embodiment of the present invention first randomly assigns a non-negative value to each parameter as its initial value and then, starting from these initial values, iterates each parameter with a given algorithm to obtain its convergence value, which is taken as the target value of that parameter.
Because every parameter in the general objective function is a matrix, randomly assigning a non-negative value to a parameter means randomly assigning a non-negative value to each element of that parameter.
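The element-wise random initialization can be sketched as follows. The function name, the dictionary layout, and the use of uniform values in [0, 1) are assumptions; the text only requires that every element be non-negative.

```python
import random

def init_parameters(m, n_s, n_t, k_0, k_s, k_t, seed=None):
    """Randomly initialize the five parameter matrices with non-negative entries."""
    rng = random.Random(seed)

    def rand(rows, cols):
        # random() returns a float in [0, 1), so every entry is non-negative
        return [[rng.random() for _ in range(cols)] for _ in range(rows)]

    return {
        "U_0": rand(m, k_0),          # hinge matrix
        "U_s": rand(m, k_s),          # source domain specific topic matrix
        "U_t": rand(m, k_t),          # target domain specific topic matrix
        "V_s": rand(n_s, k_0 + k_s),  # source domain coefficient matrix
        "V_t": rand(n_t, k_0 + k_t),  # target domain coefficient matrix
    }

params = init_parameters(m=6, n_s=4, n_t=5, k_0=2, k_s=3, k_t=3, seed=0)
print({name: (len(mat), len(mat[0])) for name, mat in params.items()})
```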
Specifically, the algorithm used in the iterative calculation differs from parameter to parameter. The way in which the convergence value of each parameter is calculated is described below.
1. Calculating the convergence value of the hinge matrix U_0:
First, U_0 is treated as the unknown parameter and U_s, U_t, V_s, and V_t are treated as known parameters. The first derivative of the general objective function Φ with respect to U_0 can then be expressed as formula (5).
Next, the KKT (Karush-Kuhn-Tucker) condition is applied to the gradient of the general objective function Φ in formula (5), which gives formula (6).
Solving formula (6) gives formula (7).
In formula (7), the two iterates of U_0 are the value of U_0 obtained in the previous iteration and the value of U_0 computed from it in the current iteration; H_s is the coefficient matrix of the hinge matrix for the source domain and represents the weight of the hinge matrix in the source domain; H_t is the coefficient matrix of the hinge matrix for the target domain and represents the weight of the hinge matrix in the target domain; r is the iteration number, that is, the r-th iteration; and the division is an element-wise matrix division.
Finally, U_0 is iterated with formula (7) until its convergence value is obtained. In the first iteration, the initial value randomly assigned to U_0 serves as the value from the previous iteration.
2. Calculating the convergence value of the source domain specific topic matrix U_s:
First, U_s is treated as the unknown parameter and U_0, U_t, V_s, and V_t are treated as known parameters, so that the first derivative of the general objective function Φ with respect to U_s can be written down.
Next, the KKT condition is applied to the gradient of the general objective function Φ in that derivative, which gives formula (8).
In formula (8), the two iterates of U_s are the value of U_s obtained in the previous iteration and the value of U_s computed from it in the current iteration; L_s is the coefficient matrix of the domain-specific topic matrix of the source domain; and L_t is the coefficient matrix of the domain-specific topic matrix of the target domain.
Finally, U_s is iterated with formula (8) until its convergence value is obtained. In the first iteration, the initial value randomly assigned to U_s serves as the value from the previous iteration.
3. Calculating the convergence value of the target domain specific topic matrix U_t:
The principle of this process is the same as that of calculating the convergence value of the hinge matrix U_0 in item 1 or of the source domain specific topic matrix U_s in item 2 above; see those items for details. Specifically, the resulting expression for U_t is formula (9).
In formula (9), the two iterates of U_t are the value of U_t obtained in the previous iteration and the value of U_t computed from it in the current iteration. When the convergence value of U_t is calculated, U_t is iterated repeatedly with formula (9) until its convergence value is obtained.
4. Calculating the convergence value of the coefficient matrix V_s of the source domain specific topic matrix:
The principle of this process is the same as that of calculating the convergence value of U_0 in item 1 or of U_s in item 2 above; see those items for details. Specifically, the resulting expression for V_s is formula (10).
In formula (10), the two iterates of V_s are the value of V_s obtained in the previous iteration and the value of V_s computed from it in the current iteration. When the convergence value of V_s is calculated, V_s is iterated repeatedly with formula (10) until its convergence value is obtained.
5. Calculating the convergence value of the coefficient matrix V_t of the target domain specific topic matrix:
The principle of this process is the same as that of calculating the convergence value of U_0 in item 1 or of U_s in item 2 above; see those items for details. Specifically, the resulting expression for V_t is given by the corresponding update formula.
In formula (10), the two iterates of V_t are the value of V_t obtained in the previous iteration and the value of V_t computed from it in the current iteration. When the convergence value of V_t is calculated, V_t is iterated repeatedly with formula (10) until its convergence value is obtained.
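Items 1 to 5 above each update one parameter while treating the others as known. That alternating scheme can be sketched generically as below. The driver function, its stopping rule based on the change in the objective, and the two-variable toy problem are all assumptions for illustration; the update formulas (7) to (10) themselves are not reproduced here.

```python
def alternate_until_converged(params, updates, objective, tol=1e-6, max_iter=200):
    """Update each named parameter in turn, holding the others fixed,
    until the objective changes by no more than a relative tolerance."""
    prev = objective(params)
    history = [prev]
    for _ in range(max_iter):
        for name, update in updates.items():
            params[name] = update(params)  # other parameters act as known values
        cur = objective(params)
        history.append(cur)
        if abs(prev - cur) <= tol * max(1.0, abs(prev)):
            break
        prev = cur
    return params, history

# Toy stand-in for the general objective function: two scalar "parameters".
def toy_objective(p):
    return (p["x"] - 1) ** 2 + (p["y"] - 2) ** 2 + (p["x"] - p["y"]) ** 2

toy_updates = {
    "x": lambda p: (1 + p["y"]) / 2,  # exact minimizer over x with y fixed
    "y": lambda p: (2 + p["x"]) / 2,  # exact minimizer over y with x fixed
}

result, history = alternate_until_converged(
    {"x": 0.0, "y": 0.0}, toy_updates, toy_objective, tol=1e-12)
print(result, len(history))
```

In the setting of this embodiment, the dictionary would hold U_0, U_s, U_t, V_s, and V_t, and each update callable would apply the corresponding update formula.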
Further, to ensure that the convergence value of each parameter can be obtained from the update formulas of the parameters in the general objective function (formula (7) to formula (10)), the method provided by the embodiment of the present invention also verifies the convergence of each update formula after the formulas have been determined by the process above. For ease of explanation, the verification is illustrated below with formula (8), taking the convergence of the source domain specific topic matrix U_s as an example. The verification principle for the other parameters is the same as that for U_s, so the embodiment of the present invention does not describe their convergence verification in detail.
Specifically, before the convergence verification is carried out, one definition, two lemmas, and one theorem need to be introduced.
Definition 1: F(X, X') is an auxiliary function of Φ(X) if
Φ(X) ≤ F(X, X')
and Φ(X) = F(X, X).
Lemma 1: If F is an auxiliary function of Φ, then Φ is a non-increasing function under the associated update of X. Combined with Definition 1, this gives:
Φ(X^(r+1)) ≤ F(X^(r+1), X^(r)) ≤ F(X^(r), X^(r)) = Φ(X^(r)).
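The update rule that Lemma 1 refers to is not legible in this text. In the standard auxiliary-function argument for non-negative matrix factorization, which this passage appears to follow, the update is the minimizer of the auxiliary function at the current iterate; that reading is an assumption. Under it, the chain of inequalities above is justified term by term as follows:

```latex
X^{(r+1)} = \arg\min_{X} F\bigl(X, X^{(r)}\bigr)
\quad\Longrightarrow\quad
\underbrace{\Phi\bigl(X^{(r+1)}\bigr) \le F\bigl(X^{(r+1)}, X^{(r)}\bigr)}_{\text{Definition 1: } \Phi \le F}
\;\le\;
\underbrace{F\bigl(X^{(r)}, X^{(r)}\bigr)}_{X^{(r+1)} \text{ minimizes } F(\cdot,\, X^{(r)})}
\;=\;
\underbrace{\Phi\bigl(X^{(r)}\bigr)}_{\text{Definition 1: } F(X,X) = \Phi(X)}
```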
Lemma 2: Let Φ(U_s) denote the sum of all terms of Φ that contain U_s. The function defined in this lemma is an auxiliary function of Φ(U_s).
Theorem 1: Under the update rules of formulas (7) to (10), Φ(U_0, U_s, U_t, V_s, V_t) is a non-increasing function.
The convergence of U_s is proved as follows:
Because the purpose of optimizing the general objective function is to minimize Φ(U_s) by means of its auxiliary function, Lemma 1 and Lemma 2 are applied, which gives two equations, the second of which is formula (12).
Substituting formula (12) into the auxiliary function of Lemma 2 yields formula (8).
In addition, after the update formulas of the parameters in the general objective function have been obtained, the computational complexity of each parameter can be analyzed further. In the embodiment of the present invention, complexity is expressed in O notation.
Specifically, the process of deriving the update formulas shows that, in each iteration, the complexity of calculating the hinge matrix U_0 is O(m × n × k_0), where n = max(n_s, n_t). Similarly, in each iteration the complexities of calculating the source domain specific topic matrix U_s and the target domain specific topic matrix U_t are O(m × n_s × k_s) and O(m × n_t × k_t), respectively, and the complexities of calculating the coefficient matrix V_s of the source domain specific topic matrix and the coefficient matrix V_t of the target domain specific topic matrix are O(m × n_s × (k_0 + k_s)) and O(m × n_t × (k_0 + k_t)), respectively.
These complexity expressions show that the complexity of the whole calculation is dominated by the calculation of the coefficient matrix V_s of the source domain specific topic matrix and the coefficient matrix V_t of the target domain specific topic matrix.
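To make the comparison concrete, the per-iteration cost expressions above can be evaluated for given sizes. The function name and the sample sizes are hypothetical, and the numbers are operation-count estimates up to a constant factor, not measured running times.

```python
def per_iteration_costs(m, n_s, n_t, k_0, k_s, k_t):
    """Evaluate the per-iteration cost expressions stated above."""
    n = max(n_s, n_t)
    return {
        "U_0": m * n * k_0,
        "U_s": m * n_s * k_s,
        "U_t": m * n_t * k_t,
        "V_s": m * n_s * (k_0 + k_s),
        "V_t": m * n_t * (k_0 + k_t),
    }

# Hypothetical sizes: 1000 terms, 2000 documents per domain.
costs = per_iteration_costs(m=1000, n_s=2000, n_t=2000, k_0=10, k_s=20, k_t=20)
dominant = max(costs, key=costs.get)
print(costs, dominant)
```

Since k_0 + k_s is at least k_s, the cost of V_s is never smaller than that of U_s, and the same holds for V_t and U_t, which matches the statement that the coefficient matrices dominate.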
It should be noted that step 206 is described only by taking formulas (7) to (10) as an example of determining the target value of each parameter in the general objective function. In a specific implementation, the target values may also be determined on the basis of alternating least squares, the active set method, projected methods, and so on; the embodiment of the present invention does not specifically limit the way in which the target value of each parameter is determined.
207: Train the specified classification model according to the target values of the parameters and the labeled viewpoint data in the source domain, and classify the viewpoint data of the target domain with the specified classification model.
Step 206 yields the convergence value of each parameter in the general objective function, and each of these parameters characterizes the source domain or the target domain. For example, U_s is the source domain specific topic matrix and identifies the topics specific to the source domain; U_t is the target domain specific topic matrix and identifies the topics specific to the target domain; U_0 is the hinge matrix and identifies the topics common to the source domain and the target domain. In other words, the parameters of the general objective function identify the features of the source domain and the target domain, so once their convergence values are obtained, the features of both domains are available. Furthermore, the source domain includes some labeled viewpoint data, whereas the target domain may include none. The specified classification model can therefore be trained according to the target values of the parameters and the labeled viewpoint data in the source domain, and the trained model can then be used to classify the viewpoint data of the target domain.
Specifically, the specified classification model can be trained by combining the convergence value of the source domain specific topic matrix U_s, the convergence value of the target domain specific topic matrix U_t, and the labeled viewpoint data in the source domain. The embodiment of the present invention does not describe in detail the process of training the specified classification model from the target values of the parameters and the labeled viewpoint data in the source domain; it can be implemented with existing model training methods.
Further, after the classification model has been trained, if any document in the target domain subsequently needs to be classified, that is, if the sentiment polarity of the document needs to be determined, the document can be input into the trained specified classification model, and its sentiment polarity is determined from the output of the model.
Specifically, suppose the trained specified classification model outputs "+1" and "-1" to indicate that the sentiment polarity of a document is positive and negative, respectively. If a document is input into the trained model and the output is "+1", the sentiment polarity of the document is determined to be positive; if the output is "-1", it is determined to be negative.
The specified classification model can take many forms. For example, it may be an SVM (Support Vector Machine).
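As a rough sketch of such a model, the following trains a linear SVM by sub-gradient descent on the regularized hinge loss and maps its output to +1 or -1. It stands in for whatever SVM implementation is actually used. The function names, the hyperparameters, and the two-dimensional toy features are hypothetical; in particular, the text does not specify how document features are built from the learned topic matrices.

```python
def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=200):
    """Train a linear SVM (weights w, bias b) with hinge-loss sub-gradient steps."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularization
            if margin < 1.0:                          # example violates the margin
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b

def predict_polarity(w, b, x):
    """Return +1 (positive sentiment) or -1 (negative sentiment)."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1

# Hypothetical labeled source-domain documents in a 2-dimensional feature space.
X_train = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
y_train = [1, 1, -1, -1]
w, b = train_linear_svm(X_train, y_train)

# Hypothetical target-domain documents to classify.
print(predict_polarity(w, b, [4.0, 1.0]), predict_polarity(w, b, [1.0, 4.0]))
```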
It should be noted that the above embodiment is described only with one source domain and one target domain as an example. In a specific implementation, the numbers of source domains and target domains may take other values.
Optionally, to verify the accuracy of the domain-adaptive viewpoint data classification realized by steps 201 to 207, the embodiment of the present invention also verified the method proposed in steps 201 to 207 experimentally.
Specifically, four domains were chosen for the experimental verification: books (B), DVD (Digital Versatile Disc) (D), electronics (E), and kitchen products (K). In the experiment, each item of viewpoint data in the four domains was assigned a viewpoint label of +1 or -1. A label of +1 indicates that the sentiment polarity of the viewpoint is positive; a label of -1 indicates that it is negative. Each domain includes 1000 positive viewpoint data items and 1000 negative viewpoint data items, together with some unlabeled viewpoint data. For the domain-adaptive viewpoint data classification task, 12 classification tasks can be constructed: D→B, E→B, K→B, K→E, D→E, B→E, B→D, K→D, E→D, B→K, D→K, and E→K, where the domain before the arrow is the source domain and the domain after the arrow is the target domain. Table 1 shows the composition of the experimental data.
Table 1
Domain | Training data | Test data | Unlabeled data | Proportion of negative data |
Books | 1600 | 400 | 4465 | 50% |
DVD | 1600 | 400 | 5945 | 50% |
Electronics | 1600 | 400 | 5681 | 50% |
Kitchen products | 1600 | 400 | 3586 | 50% |
The data listed in Table 1 are the viewpoint data of the four chosen domains. Each domain contains training data, test data, and unlabeled data, and negative data make up 50% of the data of each domain. Among the 12 classification tasks, each domain can serve either as a source domain or as a target domain. When a domain is selected as the source domain, its training data are used to build the specified classification model; when it is selected as the target domain, its test data are used to test the trained specified classification model. Therefore, to ensure the accuracy of the experiment, the embodiment of the present invention sets the same amounts of training data and test data for every domain: as shown in Table 1, each domain has 1600 training items and 400 test items.
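The twelve source→target tasks and the per-domain split in Table 1 can be enumerated as in the sketch below. The variable names are hypothetical, and splitting by taking the first 1600 labeled items is an assumption; the text does not say how the 1600/400 split was drawn.

```python
from itertools import permutations

DOMAINS = {"B": "books", "D": "DVD", "E": "electronics", "K": "kitchen products"}

# Every ordered pair of distinct domains is one (source, target) task.
tasks = list(permutations(DOMAINS, 2))

def split_labeled(items, n_train=1600):
    """Split the labeled items of one domain into training and test parts."""
    return items[:n_train], items[n_train:]

labeled = list(range(2000))  # stand-in for 1000 positive + 1000 negative items
train, test = split_labeled(labeled)

print(len(tasks), len(train), len(test))
for source, target in tasks:
    print(f"{source}→{target}")
```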
To show intuitively the advantage of the method provided by this embodiment in domain-adaptive viewpoint data classification, the following algorithms were also run on the viewpoint data of the four chosen domains: a baseline algorithm, SCL (Structural Correspondence Learning), MCT (Multi-label Consensus Training), SFA (Spectral Feature Alignment), SDA (Stacked Denoising Auto-encoders), CODA (the state-of-the-art domain adaptation method proposed by Chen et al. [2011]), and PJNMF (Linking Heterogeneous Input Features via Pivots via Joint Non-negative Matrix Factorization, an algorithm that links different input features through hinges (pivots) on the basis of non-negative matrix factorization). PJNMF is the method provided by the embodiment of the present invention.
Table 2 shows the classification results obtained with the different algorithms.
Table 2
Task | Baseline | SCL | MCT | SFA | SDA | CODA | PJNMF |
B→D | 76.41±0.31 | 78.68±0.26 | 78.92±0.23 | 80.58±0.18 | 81.12±0.17 | 80.64±0.16 | 81.85±0.17 |
E→D | 71.95±0.19 | 75.51±0.27 | 72.67±0.35 | 76.02±0.12 | 76.63±0.25 | 76.10±0.23 | 77.35±0.20 |
K→D | 73.35±0.20 | 76.88±0.29 | 74.05±0.28 | 76.55±0.16 | 76.85±0.28 | 76.62±0.21 | 78.62±0.28 |
D→B | 73.8±0.24 | 78.27±0.18 | 75.67±0.30 | 77.58±0.23 | 78.22±0.33 | 77.83±0.17 | 79.27±0.25 |
E→B | 72.14±0.26 | 75.06±0.21 | 72.90±0.27 | 75.38±0.27 | 75.50±0.19 | 75.46±0.25 | 76.30±0.22 |
K→B | 71.25±0.18 | 73.08±0.24 | 74.01±0.31 | 74.15±0.34 | 74.47±0.25 | 75.41±0.22 | 75.87±0.23 |
B→E | 71.75±0.32 | 75.21±0.18 | 75.62±0.26 | 75.35±0.26 | 75.77±0.27 | 76.34±0.18 | 76.28±0.27 |
D→E | 72.38±0.20 | 75.95±0.25 | 76.82±0.34 | 77.13±0.23 | 77.65±0.22 | 77.94±0.20 | 77.86±0.24 |
K→E | 83.35±0.13 | 85.18±0.15 | 84.24±0.25 | 85.01±0.23 | 84.65±0.34 | 84.50±0.32 | 85.92±0.32 |
B→K | 74.44±0.30 | 77.06±0.21 | 78.31±0.22 | 78.28±0.25 | 78.54±0.23 | 78.35±0.26 | 79.15±0.29 |
D→K | 75.11±0.33 | 78.96±0.19 | 80.57±0.24 | 80.35±0.29 | 80.77±0.31 | 80.65±0.24 | 81.26±0.33 |
E→K | 85.11±0.13 | 85.08±0.16 | 85.33±0.26 | 85.91±0.19 | 87.25±0.20 | 86.08±0.27 | 86.37±0.21 |
Average | 75.09±0.23 | 77.91±0.20 | 77.43±0.28 | 78.52±0.23 | 78.95±0.25 | 78.83±0.23 | 79.68±0.25 |
The data in Table 2 are given in the form "accuracy ± standard deviation", and the entries shown in bold in Table 2 mark the best experimental result among these algorithms. The data in Table 2 show that the PJNMF method proposed by the embodiment of the present invention performs well on all 12 tasks: it has the highest accuracy on 9 of the 12 tasks (CODA is higher on B→E and D→E, and SDA is higher on E→K) and the highest average accuracy, 79.68.
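The comparison in Table 2 can be checked mechanically. The sketch below copies the accuracy values (without the standard deviations) from Table 2, counts for each method the number of tasks on which it is best, and recomputes the PJNMF average; the variable and function names are hypothetical.

```python
METHODS = ["Baseline", "SCL", "MCT", "SFA", "SDA", "CODA", "PJNMF"]

# Accuracy per task, in the column order of Table 2.
ACCURACY = {
    "B->D": [76.41, 78.68, 78.92, 80.58, 81.12, 80.64, 81.85],
    "E->D": [71.95, 75.51, 72.67, 76.02, 76.63, 76.10, 77.35],
    "K->D": [73.35, 76.88, 74.05, 76.55, 76.85, 76.62, 78.62],
    "D->B": [73.8, 78.27, 75.67, 77.58, 78.22, 77.83, 79.27],
    "E->B": [72.14, 75.06, 72.90, 75.38, 75.50, 75.46, 76.30],
    "K->B": [71.25, 73.08, 74.01, 74.15, 74.47, 75.41, 75.87],
    "B->E": [71.75, 75.21, 75.62, 75.35, 75.77, 76.34, 76.28],
    "D->E": [72.38, 75.95, 76.82, 77.13, 77.65, 77.94, 77.86],
    "K->E": [83.35, 85.18, 84.24, 85.01, 84.65, 84.50, 85.92],
    "B->K": [74.44, 77.06, 78.31, 78.28, 78.54, 78.35, 79.15],
    "D->K": [75.11, 78.96, 80.57, 80.35, 80.77, 80.65, 81.26],
    "E->K": [85.11, 85.08, 85.33, 85.91, 87.25, 86.08, 86.37],
}

def best_method(row):
    """Return the name of the method with the highest accuracy in one row."""
    return METHODS[row.index(max(row))]

wins = {}
for task, row in ACCURACY.items():
    name = best_method(row)
    wins[name] = wins.get(name, 0) + 1

pjnmf_mean = sum(row[-1] for row in ACCURACY.values()) / len(ACCURACY)
print(wins, round(pjnmf_mean, 2))
```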
Further, the embodiment of the present invention also analyzed the convergence of the method it provides. Fig. 3 shows a convergence curve obtained on the training data with the method provided by the embodiment of the present invention. The X axis in Fig. 3 represents the number of iterations, and the Y axis represents the value of the general objective function. Fig. 3 shows that the general objective function obtained with the method converges quickly; in general, it converges within 200 iterations.
In addition, the embodiment of the present invention also studied the similarity between the source domain and the target domain. Experiments show that the A-distance can be used to identify the difference between two domains. Let A-distance = 2(1 − 2ε), where ε is the error of the trained specified model (for example, the error of the SVM trained in step 207). Fig. 4 shows the experimental results obtained for the difference between each pair of domains. The horizontal axis in Fig. 4 is the A-distance computed from bag-of-words data, and the vertical axis is the A-distance computed with the PJNMF method provided by the embodiment of the present invention. The experimental results show that, with the method provided by the embodiment of the present invention, the A-distance tends to increase. This further demonstrates that when the parameters of the general objective function are determined with this method, the domain-specific topics of the source domain or the target domain are obtained only from the topics specific to that domain and not from the topics shared between the source domain and the target domain, and the hinge topics are obtained only from the shared topics and not from the domain-specific topics of the source domain or the target domain.
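The A-distance expression above is a one-line computation once the error ε is known. A minimal sketch, in which the function name and the sample error values are hypothetical:

```python
def a_distance(error):
    """Return 2 * (1 - 2 * error) for an error rate between 0 and 1."""
    if not 0.0 <= error <= 1.0:
        raise ValueError("error must lie in [0, 1]")
    return 2.0 * (1.0 - 2.0 * error)

# An error of 0.5 gives a distance of 0; an error of 0 gives the maximum, 2.
for eps in (0.5, 0.25, 0.0):
    print(eps, a_distance(eps))
```

Under this formula a lower error corresponds to a larger A-distance.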
In the method provided by the embodiment of the present invention, the general objective function that is determined depends on the source domain specific topic matrix, the target domain specific topic matrix, and the hinge matrix that represents the topics shared between the source domain and the target domain. The method thus provides a way of classifying viewpoint data with domain adaptation by means of the topics shared between the source domain and the target domain. Because shared topics reduce the difference between the source domain and the target domain, classification with this domain-adaptive viewpoint data classification method can ensure the accuracy of the classification results.
Fig. 5 is a schematic structural diagram of a domain-adaptive viewpoint data classification device according to an exemplary embodiment. The device can be used to perform the domain-adaptive viewpoint data classification method provided by the embodiment corresponding to Fig. 1 or Fig. 2. Referring to Fig. 5, the domain-adaptive viewpoint data classification device includes:
a first determining module 501, configured to determine a source domain term matrix according to the relation between the documents and the terms of the source domain;
a second determining module 502, configured to determine a target domain term matrix according to the relation between the documents and the terms of the target domain;
a third determining module 503, configured to determine a source domain objective function according to the source domain term matrix, a source domain-specific topic matrix, a coefficient matrix of the source domain-specific topic matrix, and a hinge matrix between the source domain and the target domain;
a fourth determining module 504, configured to determine a target domain objective function according to the target domain term matrix, a target domain-specific topic matrix, a coefficient matrix of the target domain-specific topic matrix, and the hinge matrix;
a fifth determining module 505, configured to determine an overall objective function according to the source domain objective function and the target domain objective function;
a sixth determining module 506, configured to determine the target value of each parameter in the overall objective function;
a training module 507, configured to train a specified classification model according to the target value of each parameter and the labeled viewpoint data in the source domain;
a classification module 508, configured to classify the viewpoint data of the target domain with the trained specified classification model.
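As an illustration of what the first and second determining modules compute, the following minimal sketch builds a term matrix from tokenized documents. The function name `build_term_matrix`, the shared vocabulary, the terms-by-documents orientation, and the use of raw counts as the document-term relation are all assumptions made for illustration; the text does not specify the weighting.

```python
from collections import Counter

def build_term_matrix(docs, vocabulary):
    """Return a terms-by-documents count matrix as a list of rows.

    docs: list of token lists (one per document); vocabulary: ordered terms.
    Raw counts are an assumed weighting, not one prescribed by the text.
    """
    counts = [Counter(doc) for doc in docs]
    return [[c[term] for c in counts] for term in vocabulary]

# Hypothetical toy data: a labeled source domain and an unlabeled target domain.
source_docs = [["good", "plot", "good"], ["bad", "plot"]]
target_docs = [["good", "battery"], ["bad", "battery", "bad"]]
vocabulary = sorted({t for d in source_docs + target_docs for t in d})

X_s = build_term_matrix(source_docs, vocabulary)  # source domain term matrix
X_t = build_term_matrix(target_docs, vocabulary)  # target domain term matrix
```

Building both matrices over one shared vocabulary keeps their rows aligned, which is what lets a single hinge matrix describe topics common to both domains.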
In the device provided by this embodiment of the present invention, the determined overall objective function is related to the source domain-specific topic matrix, the target domain-specific topic matrix, and the hinge matrix representing the topics shared between the source domain and the target domain. The device thus provides a viewpoint data classification method that achieves domain adaptation through the topics shared between the source domain and the target domain. Because the shared topics reduce the difference between the source domain and the target domain, the accuracy of the classification results can be ensured when classifying with this domain-adaptive viewpoint data classification method.
In another embodiment, the third determining module 503 is configured to determine the source domain objective function, according to the source domain term matrix, the source domain-specific topic matrix, the coefficient matrix of the source domain-specific topic matrix, and the hinge matrix between the source domain and the target domain, by the following formula:
where O_s is the source domain objective function, X_s is the source domain term matrix, U_0 is the hinge matrix, U_s is the source domain-specific topic matrix, V_s is the coefficient matrix of the source domain-specific topic matrix, and ‖·‖_F denotes the Frobenius norm;
the fourth determining module 504 is configured to determine the target domain objective function, according to the target domain term matrix, the target domain-specific topic matrix, the coefficient matrix of the target domain-specific topic matrix, and the hinge matrix, by the following formula:
where O_t is the target domain objective function, X_t is the target domain term matrix, U_0 is the hinge matrix, U_t is the target domain-specific topic matrix, and V_t is the coefficient matrix of the target domain-specific topic matrix.
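The formulas referred to above are not reproduced in this text. As a hedged illustration only, the sketch below assumes a common joint-factorization form in which each domain's term matrix is approximated by the hinge matrix concatenated with the domain-specific topic matrix, multiplied by the coefficient matrix, with the squared Frobenius norm as the reconstruction error. The concatenation [U_0, U_s], the transpose on V_s, the squaring, the shapes, and the function name `domain_objective` are assumptions, not the confirmed formula.

```python
import numpy as np

def domain_objective(X, U0, U_dom, V):
    """Squared Frobenius reconstruction error ||X - [U0, U_dom] V^T||_F^2.

    Assumed shapes: X (terms x docs), U0 (terms x k0), U_dom (terms x k),
    V (docs x (k0 + k)). This form is an illustrative assumption.
    """
    U = np.hstack([U0, U_dom])          # shared topics followed by domain-specific topics
    residual = X - U @ V.T
    return np.linalg.norm(residual, "fro") ** 2

rng = np.random.default_rng(0)
U0 = rng.random((6, 2))                 # hinge (shared-topic) matrix
Us, Vs = rng.random((6, 3)), rng.random((4, 5))
Ut, Vt = rng.random((6, 3)), rng.random((5, 5))
Xs = np.hstack([U0, Us]) @ Vs.T         # synthetic source data that factorizes exactly
Xt = rng.random((6, 5))                 # arbitrary target data

O_s = domain_objective(Xs, U0, Us, Vs)  # zero by construction
O_t = domain_objective(Xt, U0, Ut, Vt)  # positive in general
```

Because the same U0 appears in both calls, lowering both errors at once pushes U0 toward topics that explain terms in both domains.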
In another embodiment, the fifth determining module 505 is configured to determine the overall objective function, according to the source domain objective function and the target domain objective function, by the following formula:
where Φ is the overall objective function, D(U_0, U_s, U_t, V_s, V_t) is a regularization term, α, β and γ are regularization parameters, Tr(·) is the trace of a matrix, and the remaining five symbols are the Lagrange multiplier matrices obtained by the Lagrange multiplier method under the constraints U_0(i, j) ≥ 0, U_s(i, j) ≥ 0, U_t(i, j) ≥ 0, V_s(i, j) ≥ 0 and V_t(i, j) ≥ 0, respectively.
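To make the structure described above concrete, one plausible shape of the overall objective is sketched below. The formula itself is not reproduced in this text, so everything here is an assumption made for illustration: the unweighted sum of the two domain objectives, the multiplier names Ψ_M (one per constrained matrix M), and the sign convention are hypothetical, and the way α, β and γ enter D is left unspecified.

```latex
\Phi = O_s + O_t + D(U_0, U_s, U_t, V_s, V_t)
     + \sum_{M \in \{U_0,\, U_s,\, U_t,\, V_s,\, V_t\}} \operatorname{Tr}\!\left(\Psi_M M^{\top}\right),
\qquad
\Psi_M(i,j)\, M(i,j) = 0 \quad \text{for all } i, j .
```

The second relation is the standard complementary-slackness (KKT) condition for a constraint M(i, j) ≥ 0. Combined with setting the derivative of Φ with respect to each matrix to zero, it is what typically turns an objective of this kind into element-wise iterative update rules.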
In another embodiment, the sixth determining module 506 includes:
an allocation unit, configured to randomly assign a non-negative value to each parameter as the initial value of that parameter;
a computing unit, configured to calculate the convergence value of each parameter from its initial value and to take the convergence value of each parameter as the target value of that parameter.
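The two units can be sketched as a random non-negative initialization followed by a generic fixed-point loop. The stopping rule (largest entry change below `tol`), the iteration cap, the matrix shapes, and the placeholder update used in the demonstration are assumptions; the text only requires that iteration continue until a convergence value is obtained.

```python
import numpy as np

def init_nonnegative(shapes, seed=0):
    """Randomly assign a non-negative initial value to every parameter matrix."""
    rng = np.random.default_rng(seed)
    return {name: rng.random(shape) for name, shape in shapes.items()}

def iterate_until_converged(params, update, tol=1e-6, max_iter=1000):
    """Apply `update` repeatedly until the largest entry change is below `tol`."""
    for _ in range(max_iter):
        new_params = update(params)
        change = max(np.abs(new_params[k] - params[k]).max() for k in params)
        params = new_params
        if change < tol:
            break
    return params

# Hypothetical stand-in update: halve every entry, so each parameter converges to 0.
shapes = {"U0": (4, 2), "Us": (4, 3), "Ut": (4, 3), "Vs": (5, 5), "Vt": (6, 5)}
initial = init_nonnegative(shapes)
converged = iterate_until_converged(initial, lambda p: {k: v / 2 for k, v in p.items()})
```

In a real run the `update` argument would apply the update formulas for U_0, U_s, U_t, V_s and V_t in turn; the halving function is only there to make the loop observable.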
In another embodiment, the parameters in the overall objective function include U_0, U_s, U_t, V_s and V_t, and the computing unit is configured to:
starting from the initial value of U_0, iteratively calculate U_0 according to the following formula until the convergence value of U_0 is obtained:
where the formula relates the value of U_0 obtained in the previous iteration to the value of U_0 obtained from it in the current iteration, H_s is the coefficient matrix of the hinge matrix for the source domain, H_t is the coefficient matrix of the hinge matrix for the target domain, and r denotes the iteration number;
starting from the initial value of U_s, iteratively calculate U_s according to the following formula until the convergence value of U_s is obtained:
where the formula relates the value of U_s obtained in the previous iteration to the value of U_s obtained from it in the current iteration, L_s is the coefficient matrix of the source domain-specific topic matrix, and L_t is the coefficient matrix of the target domain-specific topic matrix;
starting from the initial value of U_t, iteratively calculate U_t according to the following formula until the convergence value of U_t is obtained:
where the formula relates the value of U_t obtained in the previous iteration to the value of U_t obtained from it in the current iteration;
starting from the initial value of V_s, iteratively calculate V_s according to the following formula until the convergence value of V_s is obtained:
where the formula relates the value of V_s obtained in the previous iteration to the value of V_s obtained from it in the current iteration;
starting from the initial value of V_t, iteratively calculate V_t according to the following formula until the convergence value of V_t is obtained:
where the formula relates the value of V_t obtained in the previous iteration to the value of V_t obtained from it in the current iteration.
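The update formulas themselves are not reproduced in this text. To show how an iteration of this kind behaves, the sketch below substitutes the standard multiplicative update rules for plain non-negative matrix factorization (X ≈ U Vᵀ). This is a stand-in technique, not the update rules of this embodiment; the function name, the rank `k`, the fixed iteration count, and the small `eps` guard against division by zero are assumptions.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
    """Standard multiplicative-update NMF: X ~= U @ V.T with U, V >= 0."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k))               # random non-negative initial values
    V = rng.random((X.shape[1], k))
    errors = []
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)      # update U with V fixed
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)    # update V with U fixed
        errors.append(np.linalg.norm(X - U @ V.T, "fro") ** 2)
    return U, V, errors

rng = np.random.default_rng(1)
X = rng.random((8, 3)) @ rng.random((3, 10))      # non-negative rank-3 toy matrix
U, V, errors = nmf_multiplicative(X, k=3)
```

Each update multiplies the previous value by a non-negative ratio, so a non-negative initial value stays non-negative throughout, which is why the initial values are required to be non-negative.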
Fig. 6 is a schematic structural diagram of a server according to an exemplary embodiment. The server may be used to perform the domain-adaptive viewpoint data classification method provided by the embodiment corresponding to Fig. 1 or Fig. 2 above. Referring to Fig. 6, the server 600 includes a processing component 622, which further includes one or more processors, and memory resources represented by a memory 632 for storing instructions executable by the processing component 622, such as application programs. The application programs stored in the memory 632 may include one or more modules, each corresponding to a set of instructions. In addition, the processing component 622 is configured to execute the instructions so as to perform the domain-adaptive viewpoint data classification method provided by the embodiment corresponding to Fig. 1 or Fig. 2 above.
The server 600 may also include a power supply component 626 configured to perform power management of the server 600, a wired or wireless network interface 650 configured to connect the server 600 to a network, and an input/output (I/O) interface 658. The server 600 may operate based on an operating system stored in the memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™ or the like.
One or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs contain instructions for performing the following operations:
determining a source domain term matrix according to the relation between the documents and the terms of the source domain;
determining a target domain term matrix according to the relation between the documents and the terms of the target domain;
determining a source domain objective function according to the source domain term matrix, a source domain-specific topic matrix, a coefficient matrix of the source domain-specific topic matrix, and a hinge matrix between the source domain and the target domain;
determining a target domain objective function according to the target domain term matrix, a target domain-specific topic matrix, a coefficient matrix of the target domain-specific topic matrix, and the hinge matrix;
determining an overall objective function according to the source domain objective function and the target domain objective function;
determining the target value of each parameter in the overall objective function;
training a specified classification model according to the target value of each parameter and the labeled viewpoint data in the source domain, and classifying the viewpoint data of the target domain with the trained specified classification model.
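The last operation, training on labeled source-domain data and then classifying target-domain data, can be sketched as follows. The text does not name the specified classification model, so a nearest-centroid classifier is used here purely as a stand-in. Representing each document by its coefficients on the shared topics (rows of hypothetical matrices `H_s` and `H_t`), the toy values, and the 1/0 label encoding are likewise assumptions.

```python
import numpy as np

def train_centroids(features, labels):
    """Train a nearest-centroid model: one mean feature vector per class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def classify(features, classes, centroids):
    """Assign each row of `features` to the class of its closest centroid."""
    dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]

# Hypothetical shared-topic coefficients: rows are documents, columns are shared topics.
H_s = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]])  # labeled source documents
y_s = np.array([1, 1, 0, 0])                                      # 1 = positive, 0 = negative
H_t = np.array([[0.7, 0.3], [0.1, 0.8]])                          # unlabeled target documents

classes, centroids = train_centroids(H_s, y_s)
y_t = classify(H_t, classes, centroids)
```

Because both domains are expressed in the same shared-topic coordinates, a model fitted only on source labels can be applied to target documents without any target labels.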
Taking the above as a first possible implementation, in a second possible implementation provided on the basis of the first possible implementation, the memory of the server further contains instructions for performing the following operations: determining the source domain objective function according to the source domain term matrix, the source domain-specific topic matrix, the coefficient matrix of the source domain-specific topic matrix, and the hinge matrix between the source domain and the target domain includes:
determining the source domain objective function, according to the source domain term matrix, the source domain-specific topic matrix, the coefficient matrix of the source domain-specific topic matrix, and the hinge matrix between the source domain and the target domain, by the following formula:
where O_s is the source domain objective function, X_s is the source domain term matrix, U_0 is the hinge matrix, U_s is the source domain-specific topic matrix, V_s is the coefficient matrix of the source domain-specific topic matrix, and ‖·‖_F denotes the Frobenius norm;
determining the target domain objective function according to the target domain term matrix, the target domain-specific topic matrix, the coefficient matrix of the target domain-specific topic matrix, and the hinge matrix includes:
determining the target domain objective function, according to the target domain term matrix, the target domain-specific topic matrix, the coefficient matrix of the target domain-specific topic matrix, and the hinge matrix, by the following formula:
where O_t is the target domain objective function, X_t is the target domain term matrix, U_0 is the hinge matrix, U_t is the target domain-specific topic matrix, and V_t is the coefficient matrix of the target domain-specific topic matrix.
In a third possible implementation provided on the basis of the second possible implementation, the memory of the server further contains instructions for performing the following operations: determining the overall objective function according to the source domain objective function and the target domain objective function includes:
determining the overall objective function, according to the source domain objective function and the target domain objective function, by the following formula:
where Φ is the overall objective function, D(U_0, U_s, U_t, V_s, V_t) is a regularization term, α, β and γ are regularization parameters, Tr(·) is the trace of a matrix, and the remaining five symbols are the Lagrange multiplier matrices obtained by the Lagrange multiplier method under the constraints U_0(i, j) ≥ 0, U_s(i, j) ≥ 0, U_t(i, j) ≥ 0, V_s(i, j) ≥ 0 and V_t(i, j) ≥ 0, respectively.
In a fourth possible implementation provided on the basis of the first or the third possible implementation, the memory of the server further contains instructions for performing the following operations: determining the target value of each parameter in the overall objective function includes:
randomly assigning a non-negative value to each parameter as the initial value of that parameter;
calculating the convergence value of each parameter from its initial value, and taking the convergence value of each parameter as the target value of that parameter.
In a fifth possible implementation provided on the basis of the fourth possible implementation, the memory of the server further contains instructions for performing the following operations: the parameters in the overall objective function include U_0, U_s, U_t, V_s and V_t, and calculating the convergence value of each parameter from its initial value includes:
starting from the initial value of U_0, iteratively calculating U_0 according to the following formula until the convergence value of U_0 is obtained:
where the formula relates the value of U_0 obtained in the previous iteration to the value of U_0 obtained from it in the current iteration, H_s is the coefficient matrix of the hinge matrix for the source domain, H_t is the coefficient matrix of the hinge matrix for the target domain, and r denotes the iteration number;
starting from the initial value of U_s, iteratively calculating U_s according to the following formula until the convergence value of U_s is obtained:
where the formula relates the value of U_s obtained in the previous iteration to the value of U_s obtained from it in the current iteration, L_s is the coefficient matrix of the source domain-specific topic matrix, and L_t is the coefficient matrix of the target domain-specific topic matrix;
starting from the initial value of U_t, iteratively calculating U_t according to the following formula until the convergence value of U_t is obtained:
where the formula relates the value of U_t obtained in the previous iteration to the value of U_t obtained from it in the current iteration;
starting from the initial value of V_s, iteratively calculating V_s according to the following formula until the convergence value of V_s is obtained:
where the formula relates the value of V_s obtained in the previous iteration to the value of V_s obtained from it in the current iteration;
starting from the initial value of V_t, iteratively calculating V_t according to the following formula until the convergence value of V_t is obtained:
where the formula relates the value of V_t obtained in the previous iteration to the value of V_t obtained from it in the current iteration.
In the server provided by this embodiment of the present invention, the determined overall objective function is related to the source domain-specific topic matrix, the target domain-specific topic matrix, and the hinge matrix representing the topics shared between the source domain and the target domain. The server thus provides a viewpoint data classification method that achieves domain adaptation through the topics shared between the source domain and the target domain. Because the shared topics reduce the difference between the source domain and the target domain, the accuracy of the classification results can be ensured when classifying with this domain-adaptive viewpoint data classification method.
Fig. 7 is a schematic structural diagram of a terminal according to an exemplary embodiment. The terminal may be used to perform the domain-adaptive viewpoint data classification method provided by the embodiment corresponding to Fig. 1 or Fig. 2 above. Specifically:
The terminal 700 may include an RF (Radio Frequency) circuit 110, a memory 120 including one or more computer-readable storage media, an input unit 130, a display unit 140, a sensor 150, an audio circuit 160, a WiFi (Wireless Fidelity) module 170, a processor 180 including one or more processing cores, a power supply 190, and other components. Those skilled in the art will understand that the terminal structure shown in Fig. 7 does not limit the terminal, which may include more or fewer components than shown, combine certain components, or arrange the components differently. Specifically:
The RF circuit 110 may be used to receive and send signals while sending and receiving information or during a call. In particular, after receiving downlink information from a base station, it passes the information to the one or more processors 180 for processing; it also sends uplink data to the base station. Typically, the RF circuit 110 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, an LNA (Low Noise Amplifier), a duplexer, and the like. In addition, the RF circuit 110 may communicate with networks and other devices by wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System of Mobile communication), GPRS (General Packet Radio Service), CDMA (Code Division Multiple Access), WCDMA (Wideband Code Division Multiple Access), LTE (Long Term Evolution), e-mail, SMS (Short Messaging Service), and the like.
The memory 120 may be used to store software programs and modules. The processor 180 performs various functional applications and data processing by running the software programs and modules stored in the memory 120. The memory 120 may mainly include a program storage area and a data storage area. The program storage area may store the operating system, the application programs required by at least one function (such as a sound playback function or an image playback function), and the like; the data storage area may store data created through use of the terminal 700 (such as audio data or a phone book), and the like. In addition, the memory 120 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device. Accordingly, the memory 120 may also include a memory controller to provide the processor 180 and the input unit 130 with access to the memory 120.
The input unit 130 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control. Specifically, the input unit 130 may include a touch-sensitive surface 131 and other input devices 132. The touch-sensitive surface 131, also called a touch display screen or touchpad, can collect touch operations by the user on or near it (for example, operations performed by the user on or near the touch-sensitive surface 131 with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch-sensitive surface 131 may include two parts: a touch detection device and a touch controller. The touch detection device detects the touch position of the user, detects the signal produced by the touch operation, and passes the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends them to the processor 180, and can receive and execute commands sent by the processor 180. The touch-sensitive surface 131 may be implemented in several types, such as resistive, capacitive, infrared, and surface acoustic wave. Besides the touch-sensitive surface 131, the input unit 130 may also include other input devices 132. Specifically, the other input devices 132 may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit 140 may be used to display information entered by the user or provided to the user, as well as the various graphical user interfaces of the terminal 700, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 140 may include a display panel 141; optionally, the display panel 141 may be configured in the form of an LCD (Liquid Crystal Display), an OLED (Organic Light-Emitting Diode), or the like. Further, the touch-sensitive surface 131 may cover the display panel 141. When the touch-sensitive surface 131 detects a touch operation on or near it, it passes the operation to the processor 180 to determine the type of the touch event, and the processor 180 then provides a corresponding visual output on the display panel 141 according to the type of the touch event. Although in Fig. 7 the touch-sensitive surface 131 and the display panel 141 implement the input and output functions as two separate components, in some embodiments the touch-sensitive surface 131 may be integrated with the display panel 141 to implement the input and output functions.
The terminal 700 may also include at least one sensor 150, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor: the ambient light sensor can adjust the brightness of the display panel 141 according to the ambient light, and the proximity sensor can turn off the display panel 141 and/or the backlight when the terminal 700 is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally along three axes) and, when stationary, the magnitude and direction of gravity. It can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that the terminal 700 may also be equipped with, such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described further here.
The audio circuit 160, a speaker 161, and a microphone 162 can provide an audio interface between the user and the terminal 700. The audio circuit 160 can convert received audio data into an electrical signal and transmit it to the speaker 161, which converts it into a sound signal for output. Conversely, the microphone 162 converts a collected sound signal into an electrical signal, which the audio circuit 160 receives and converts into audio data; the audio data is then output to the processor 180 for processing and sent through the RF circuit 110 to, for example, another terminal, or output to the memory 120 for further processing. The audio circuit 160 may also include an earphone jack to provide communication between a peripheral earphone and the terminal 700.
WiFi is a short-range wireless transmission technology. Through the WiFi module 170, the terminal 700 can help the user send and receive e-mail, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. Although Fig. 7 shows the WiFi module 170, it can be understood that the module is not an essential part of the terminal 700 and can be omitted as needed without changing the essence of the invention.
The processor 180 is the control center of the terminal 700. It connects the various parts of the whole mobile phone through various interfaces and lines, and performs the various functions of the terminal 700 and processes data by running or executing the software programs and/or modules stored in the memory 120 and invoking the data stored in the memory 120, thereby monitoring the mobile phone as a whole. Optionally, the processor 180 may include one or more processing cores. Preferably, the processor 180 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor need not be integrated into the processor 180.
The terminal 700 also includes a power supply 190 (such as a battery) that powers the components. Preferably, the power supply may be logically connected to the processor 180 through a power management system, so that functions such as charging, discharging, and power consumption management are handled by the power management system. The power supply 190 may also include any components such as one or more DC or AC power sources, a recharging system, a power failure detection circuit, a power converter or inverter, and a power status indicator.
Although not shown, the terminal 700 may also include a camera, a Bluetooth module, and the like, which are not described further here. Specifically, in this embodiment the display unit of the terminal is a touch-screen display, and the terminal further includes a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors. The one or more programs contain instructions for performing the following operations:
determining a source domain term matrix according to the relation between the documents and the terms of the source domain;
determining a target domain term matrix according to the relation between the documents and the terms of the target domain;
determining a source domain objective function according to the source domain term matrix, a source domain-specific topic matrix, a coefficient matrix of the source domain-specific topic matrix, and a hinge matrix between the source domain and the target domain;
determining a target domain objective function according to the target domain term matrix, a target domain-specific topic matrix, a coefficient matrix of the target domain-specific topic matrix, and the hinge matrix;
determining an overall objective function according to the source domain objective function and the target domain objective function;
determining the target value of each parameter in the overall objective function;
training a specified classification model according to the target value of each parameter and the labeled viewpoint data in the source domain, and classifying the viewpoint data of the target domain with the trained specified classification model.
Taking the above as a first possible implementation, in a second possible implementation provided on the basis of the first possible implementation, the memory of the terminal further contains instructions for performing the following operations:
determining the source domain objective function according to the source domain term matrix, the source domain-specific topic matrix, the coefficient matrix of the source domain-specific topic matrix, and the hinge matrix between the source domain and the target domain includes:
determining the source domain objective function, according to the source domain term matrix, the source domain-specific topic matrix, the coefficient matrix of the source domain-specific topic matrix, and the hinge matrix between the source domain and the target domain, by the following formula:
where O_s is the source domain objective function, X_s is the source domain term matrix, U_0 is the hinge matrix, U_s is the source domain-specific topic matrix, V_s is the coefficient matrix of the source domain-specific topic matrix, and ‖·‖_F denotes the Frobenius norm;
determining the target domain objective function according to the target domain term matrix, the target domain-specific topic matrix, the coefficient matrix of the target domain-specific topic matrix, and the hinge matrix includes:
determining the target domain objective function, according to the target domain term matrix, the target domain-specific topic matrix, the coefficient matrix of the target domain-specific topic matrix, and the hinge matrix, by the following formula:
where O_t is the target domain objective function, X_t is the target domain term matrix, U_0 is the hinge matrix, U_t is the target domain-specific topic matrix, and V_t is the coefficient matrix of the target domain-specific topic matrix.
In a third possible implementation provided on the basis of the second possible implementation, the memory of the terminal further contains instructions for performing the following operations: determining the overall objective function according to the source domain objective function and the target domain objective function includes:
determining the overall objective function, according to the source domain objective function and the target domain objective function, by the following formula:
where Φ is the overall objective function, D(U_0, U_s, U_t, V_s, V_t) is a regularization term, α, β and γ are regularization parameters, Tr(·) is the trace of a matrix, and the remaining five symbols are the Lagrange multiplier matrices obtained by the Lagrange multiplier method under the constraints U_0(i, j) ≥ 0, U_s(i, j) ≥ 0, U_t(i, j) ≥ 0, V_s(i, j) ≥ 0 and V_t(i, j) ≥ 0, respectively.
In a fourth possible implementation provided on the basis of the first or the third possible implementation, the memory of the terminal further contains instructions for performing the following operations: determining the target value of each parameter in the overall objective function includes:
randomly assigning a non-negative value to each parameter as the initial value of that parameter;
calculating the convergence value of each parameter from its initial value, and taking the convergence value of each parameter as the target value of that parameter.
In a fifth possible implementation provided on the basis of the fourth possible implementation, the memory of the terminal further contains instructions for performing the following operations: the parameters in the overall objective function include U_0, U_s, U_t, V_s and V_t, and calculating the convergence value of each parameter from its initial value includes:
starting from the initial value of U_0, iteratively calculating U_0 according to the following formula until the convergence value of U_0 is obtained:
where the formula relates the value of U_0 obtained in the previous iteration to the value of U_0 obtained from it in the current iteration, H_s is the coefficient matrix of the hinge matrix for the source domain, H_t is the coefficient matrix of the hinge matrix for the target domain, and r denotes the iteration number;
starting from the initial value of U_s, iteratively calculating U_s according to the following formula until the convergence value of U_s is obtained:
where the formula relates the value of U_s obtained in the previous iteration to the value of U_s obtained from it in the current iteration, L_s is the coefficient matrix of the source domain-specific topic matrix, and L_t is the coefficient matrix of the target domain-specific topic matrix;
starting from the initial value of U_t, iteratively calculating U_t according to the following formula until the convergence value of U_t is obtained:
where the formula relates the value of U_t obtained in the previous iteration to the value of U_t obtained from it in the current iteration;
starting from the initial value of V_s, iteratively calculating V_s according to the following formula until the convergence value of V_s is obtained:
where the formula relates the value of V_s obtained in the previous iteration to the value of V_s obtained from it in the current iteration;
starting from the initial value of V_t, iteratively calculating V_t according to the following formula until the convergence value of V_t is obtained:
where the formula relates the value of V_t obtained in the previous iteration to the value of V_t obtained from it in the current iteration.
In the terminal provided by this embodiment of the present invention, the determined overall objective function is related to the source domain-specific topic matrix, the target domain-specific topic matrix, and the hinge matrix representing the topics shared between the source domain and the target domain. The terminal thus provides a viewpoint data classification method that achieves domain adaptation through the topics shared between the source domain and the target domain. Because the shared topics reduce the difference between the source domain and the target domain, the accuracy of the classification results can be ensured when classifying with this domain-adaptive viewpoint data classification method.
An embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium may be the one included in the memory in the above embodiments, or it may be a stand-alone computer-readable storage medium that is not assembled into a terminal. The computer-readable storage medium stores one or more programs, which are used by one or more processors to perform a domain-adaptive viewpoint data classification method, the method including:
Determining a source-domain term matrix according to the relationship between the documents and the terms of the source domain;
Determining a target-domain term matrix according to the relationship between the documents and the terms of the target domain;
Determining a source-domain objective function according to the source-domain term matrix, a source-domain-specific topic matrix, the coefficient matrix of the source-domain-specific topic matrix, and a hinge matrix between the source domain and the target domain;
Determining a target-domain objective function according to the target-domain term matrix, a target-domain-specific topic matrix, the coefficient matrix of the target-domain-specific topic matrix, and the hinge matrix;
Determining a general objective function according to the source-domain objective function and the target-domain objective function;
Determining the target value of each parameter in the general objective function;
Training a specified classification model according to the target value of each parameter and the labeled viewpoint data in the source domain, and classifying the viewpoint data of the target domain with the specified classification model obtained by training.
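The first two steps above can be illustrated with a minimal sketch. It assumes that the "relationship between documents and terms" is a raw term count and that both domains share one vocabulary; neither assumption is stated in the text, and the function name `build_term_matrix` and the toy documents are hypothetical.

```python
import numpy as np

def build_term_matrix(documents, vocabulary):
    """Return a (terms x documents) count matrix for whitespace-tokenized text."""
    index = {term: i for i, term in enumerate(vocabulary)}
    matrix = np.zeros((len(vocabulary), len(documents)))
    for j, doc in enumerate(documents):
        for token in doc.lower().split():
            if token in index:
                matrix[index[token], j] += 1.0  # assumed weighting: raw counts
    return matrix

# Hypothetical toy corpora for a source domain and a target domain.
source_docs = ["great battery great screen", "poor battery"]
target_docs = ["great plot", "poor acting poor plot"]

# Assumed: one vocabulary shared by both domains, so the rows line up.
vocabulary = sorted({t for d in source_docs + target_docs for t in d.split()})
X_s = build_term_matrix(source_docs, vocabulary)  # source-domain term matrix
X_t = build_term_matrix(target_docs, vocabulary)  # target-domain term matrix
```

Using a shared vocabulary keeps the row indices of X_s and X_t aligned, which is what allows a single matrix of shared topics to describe terms in both domains.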
Assuming that the above is the first possible implementation, in a second possible implementation provided on the basis of the first possible implementation, the memory of the terminal further contains instructions for performing the following operations:
Determining the source-domain objective function according to the source-domain term matrix, the source-domain-specific topic matrix, the coefficient matrix of the source-domain-specific topic matrix, and the hinge matrix between the source domain and the target domain includes:
Determining the source-domain objective function by the following formula according to the source-domain term matrix, the source-domain-specific topic matrix, the coefficient matrix of the source-domain-specific topic matrix, and the hinge matrix between the source domain and the target domain:
In the formula, O_s is the source-domain objective function, X_s is the source-domain term matrix, U_0 is the hinge matrix, U_s is the source-domain-specific topic matrix, V_s is the coefficient matrix of the source-domain-specific topic matrix, and ‖·‖_F denotes the Frobenius norm;
Determining the target-domain objective function according to the target-domain term matrix, the target-domain-specific topic matrix, the coefficient matrix of the target-domain-specific topic matrix, and the hinge matrix includes:
Determining the target-domain objective function by the following formula according to the target-domain term matrix, the target-domain-specific topic matrix, the coefficient matrix of the target-domain-specific topic matrix, and the hinge matrix:
In the formula, O_t is the target-domain objective function, X_t is the target-domain term matrix, U_0 is the hinge matrix, U_t is the target-domain-specific topic matrix, and V_t is the coefficient matrix of the target-domain-specific topic matrix.
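The formulas themselves are not reproduced in this text, so the following is only a sketch of one plausible form of a per-domain objective built from the symbols defined above. It assumes, without confirmation from the source, that a term matrix is approximated by the product of the concatenated topic matrix [U_0, U_d] and the transposed coefficient matrix, and that the objective is the squared Frobenius norm of the residual. The function name `domain_objective` and all matrix sizes are hypothetical.

```python
import numpy as np

def domain_objective(X, U0, U_d, V):
    """Squared Frobenius reconstruction error, assuming X ~ [U0, U_d] @ V.T.

    X   : (terms x documents) term matrix of one domain
    U0  : (terms x k0) hinge matrix of shared topics
    U_d : (terms x kd) domain-specific topic matrix
    V   : (documents x (k0 + kd)) coefficient matrix (assumed layout)
    """
    U = np.hstack([U0, U_d])      # shared topics next to domain-specific ones
    residual = X - U @ V.T
    return np.linalg.norm(residual, "fro") ** 2

# Toy check: a matrix built exactly from the factors has zero error.
rng = np.random.default_rng(0)
U0 = rng.random((6, 2))
U_s = rng.random((6, 3))
V_s = rng.random((4, 5))
X_s = np.hstack([U0, U_s]) @ V_s.T
O_s_exact = domain_objective(X_s, U0, U_s, V_s)
# Adding 1 to each of the 6 * 4 entries gives an all-ones residual.
O_s_perturbed = domain_objective(X_s + 1.0, U0, U_s, V_s)
```

Under the same assumption, the target-domain value would be obtained by calling the function with X_t, U_0, U_t, and V_t; the hinge matrix U_0 is the only argument shared by the two calls.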
In a third possible implementation provided on the basis of the second possible implementation, the memory of the terminal further contains instructions for performing the following operations: determining the general objective function according to the source-domain objective function and the target-domain objective function includes:
Determining the general objective function by the following formula according to the source-domain objective function and the target-domain objective function:
In the formula, Φ is the general objective function, D(U_0, U_s, U_t, V_s, V_t) is a regularization term, α, β, and γ are the regularization parameters, Tr(·) is the trace of a matrix, and the five Lagrange multiplier matrices in the formula are those obtained by the Lagrange multiplier method under the constraints U_0(i, j) ≥ 0, U_s(i, j) ≥ 0, U_t(i, j) ≥ 0, V_s(i, j) ≥ 0, and V_t(i, j) ≥ 0, respectively.
In a fourth possible implementation provided on the basis of the first or the third possible implementation, the memory of the terminal further contains instructions for performing the following operations: determining the target value of each parameter in the general objective function includes:
Randomly assigning a non-negative value to each parameter as the initial value of that parameter;
Calculating the convergence value of each parameter from its initial value, and taking the convergence value of each parameter as the target value of that parameter.
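The two steps above (random non-negative initialization, then iteration to convergence) can be sketched as follows. The update formulas of this document are not reproduced in the text, so the sketch substitutes the well-known multiplicative updates for plain non-negative matrix factorization X ≈ U Vᵀ as a stand-in. It is not the update rule of the method described here, and the function name `factorize`, the tolerance, and the toy data are all assumptions.

```python
import numpy as np

def factorize(X, n_topics, tol=1e-6, max_iter=500, seed=0):
    """Non-negative factorization X ~ U @ V.T by multiplicative updates."""
    rng = np.random.default_rng(seed)
    eps = 1e-12  # keeps denominators away from zero
    # Step 1: random non-negative initial values for every parameter.
    U = rng.random((X.shape[0], n_topics))
    V = rng.random((X.shape[1], n_topics))
    previous = np.linalg.norm(X - U @ V.T, "fro") ** 2
    current = previous
    # Step 2: iterate until the objective stops decreasing noticeably.
    for _ in range(max_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)    # stand-in update for U
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)  # stand-in update for V
        current = np.linalg.norm(X - U @ V.T, "fro") ** 2
        if previous - current < tol * max(previous, eps):
            break  # treated as converged
        previous = current
    return U, V, current

rng = np.random.default_rng(1)
X = rng.random((8, 2)) @ rng.random((2, 6))  # non-negative toy matrix
U, V, error = factorize(X, n_topics=2)
```

Multiplicative updates only ever multiply non-negative entries by non-negative ratios, so a non-negative starting point keeps every iterate non-negative, which matches the non-negativity constraints placed on the parameters.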
In a fifth possible implementation provided on the basis of the fourth possible implementation, the memory of the terminal further contains instructions for performing the following operations: the parameters in the general objective function include U_0, U_s, U_t, V_s, and V_t; calculating the convergence value of each parameter from its initial value includes:
Starting from the initial value of U_0, iterating U_0 according to its update formula until the convergence value of U_0 is obtained, where the formula gives the value of U_0 in the current iteration in terms of the value of U_0 obtained in the previous iteration, H_s is the coefficient matrix of the hinge matrix for the source domain, H_t is the coefficient matrix of the hinge matrix for the target domain, and r denotes the iteration number;
Starting from the initial value of U_s, iterating U_s according to its update formula until the convergence value of U_s is obtained, where the formula gives the value of U_s in the current iteration in terms of the value of U_s obtained in the previous iteration, L_s is the coefficient matrix of the source-domain-specific topic matrix, and L_t is the coefficient matrix of the target-domain-specific topic matrix;
Starting from the initial value of U_t, iterating U_t according to its update formula until the convergence value of U_t is obtained, where the formula gives the value of U_t in the current iteration in terms of the value of U_t obtained in the previous iteration;
Starting from the initial value of V_s, iterating V_s according to its update formula until the convergence value of V_s is obtained, where the formula gives the value of V_s in the current iteration in terms of the value of V_s obtained in the previous iteration;
Starting from the initial value of V_t, iterating V_t according to its update formula until the convergence value of V_t is obtained, where the formula gives the value of V_t in the current iteration in terms of the value of V_t obtained in the previous iteration.
In the computer-readable storage medium provided by this embodiment of the present invention, the general objective function that is determined depends on the source-domain-specific topic matrix, the target-domain-specific topic matrix, and the hinge matrix that represents the topics shared between the source domain and the target domain. The storage medium therefore provides a viewpoint data classification method that achieves domain adaptation through the topics shared between the source domain and the target domain. Because the shared topics reduce the difference between the source domain and the target domain, the accuracy of the classification results can be ensured when classifying with this domain-adaptive viewpoint data classification method.
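As a minimal illustration of the final step (training on labeled source-domain data and classifying target-domain data), the sketch below uses a nearest-centroid classifier as a stand-in for the "specified classification model", which the text does not name. It further assumes that each document is represented by its coefficients on the shared topics, so that source and target documents live in the same feature space. The function names and all numbers are hypothetical.

```python
import numpy as np

def train_centroids(features, labels):
    """Compute one mean feature vector (centroid) per class label."""
    classes = np.unique(labels)
    centroids = np.vstack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def classify(features, classes, centroids):
    """Assign each row to the class whose centroid is nearest (Euclidean)."""
    diffs = features[:, None, :] - centroids[None, :, :]
    distances = np.linalg.norm(diffs, axis=2)
    return classes[distances.argmin(axis=1)]

# Hypothetical shared-topic coefficients: one row per document.
H_s = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]])  # source
y_s = np.array([1, 1, 0, 0])                # labels of the source documents
H_t = np.array([[0.7, 0.3], [0.1, 0.8]])    # unlabeled target documents

classes, centroids = train_centroids(H_s, y_s)
predicted = classify(H_t, classes, centroids)
```

Because both domains are expressed over the same shared topics in this sketch, a model fitted only on source labels can be applied directly to target documents, which is the intuition behind the accuracy claim above.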
An embodiment of the present invention provides a graphical user interface. The graphical user interface is used on a terminal that includes a touch-screen display, a memory, and one or more processors for executing one or more programs. The graphical user interface includes:
Determining a source-domain term matrix according to the relationship between the documents and the terms of the source domain;
Determining a target-domain term matrix according to the relationship between the documents and the terms of the target domain;
Determining a source-domain objective function according to the source-domain term matrix, a source-domain-specific topic matrix, the coefficient matrix of the source-domain-specific topic matrix, and a hinge matrix between the source domain and the target domain;
Determining a target-domain objective function according to the target-domain term matrix, a target-domain-specific topic matrix, the coefficient matrix of the target-domain-specific topic matrix, and the hinge matrix;
Determining a general objective function according to the source-domain objective function and the target-domain objective function;
Determining the target value of each parameter in the general objective function;
Training a specified classification model according to the target value of each parameter and the labeled viewpoint data in the source domain, and classifying the viewpoint data of the target domain with the specified classification model obtained by training.
In the graphical user interface provided by this embodiment of the present invention, the general objective function that is determined depends on the source-domain-specific topic matrix, the target-domain-specific topic matrix, and the hinge matrix that represents the topics shared between the source domain and the target domain. The graphical user interface therefore provides a viewpoint data classification method that achieves domain adaptation through the topics shared between the source domain and the target domain. Because the shared topics reduce the difference between the source domain and the target domain, the accuracy of the classification results can be ensured when classifying with this domain-adaptive viewpoint data classification method.
It should be noted that, when the domain-adaptive viewpoint data classification device provided by the foregoing embodiments classifies viewpoint data, the division into the functional modules described above is used only as an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the domain-adaptive viewpoint data classification device, server, and terminal provided by the foregoing embodiments belong to the same concept as the embodiments of the domain-adaptive viewpoint data classification method; for their specific implementation, refer to the method embodiments, which are not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A domain-adaptive viewpoint data classification method, characterized in that the method comprises:
determining a source-domain term matrix according to the relationship between the documents and the terms of the source domain;
determining a target-domain term matrix according to the relationship between the documents and the terms of the target domain;
determining a source-domain objective function according to the source-domain term matrix, a source-domain-specific topic matrix, the coefficient matrix of the source-domain-specific topic matrix, and a hinge matrix between the source domain and the target domain;
determining a target-domain objective function according to the target-domain term matrix, a target-domain-specific topic matrix, the coefficient matrix of the target-domain-specific topic matrix, and the hinge matrix;
determining a general objective function according to the source-domain objective function and the target-domain objective function;
determining the target value of each parameter in the general objective function;
training a specified classification model according to the target value of each parameter and the labeled viewpoint data in the source domain, and classifying the viewpoint data of the target domain with the specified classification model obtained by training.
2. The method according to claim 1, characterized in that determining the source-domain objective function according to the source-domain term matrix, the source-domain-specific topic matrix, the coefficient matrix of the source-domain-specific topic matrix, and the hinge matrix between the source domain and the target domain comprises:
determining the source-domain objective function by the following formula according to the source-domain term matrix, the source-domain-specific topic matrix, the coefficient matrix of the source-domain-specific topic matrix, and the hinge matrix between the source domain and the target domain:
In the formula, O_s is the source-domain objective function, X_s is the source-domain term matrix, U_0 is the hinge matrix, U_s is the source-domain-specific topic matrix, V_s is the coefficient matrix of the source-domain-specific topic matrix, and ‖·‖_F denotes the Frobenius norm;
determining the target-domain objective function according to the target-domain term matrix, the target-domain-specific topic matrix, the coefficient matrix of the target-domain-specific topic matrix, and the hinge matrix comprises:
determining the target-domain objective function by the following formula according to the target-domain term matrix, the target-domain-specific topic matrix, the coefficient matrix of the target-domain-specific topic matrix, and the hinge matrix:
In the formula, O_t is the target-domain objective function, X_t is the target-domain term matrix, U_0 is the hinge matrix, U_t is the target-domain-specific topic matrix, and V_t is the coefficient matrix of the target-domain-specific topic matrix.
3. The method according to claim 2, characterized in that determining the general objective function according to the source-domain objective function and the target-domain objective function comprises:
determining the general objective function by the following formula according to the source-domain objective function and the target-domain objective function:
In the formula, Φ is the general objective function, D(U_0, U_s, U_t, V_s, V_t) is a regularization term, α, β, and γ are the regularization parameters, Tr(·) is the trace of a matrix, and the five Lagrange multiplier matrices in the formula are those obtained by the Lagrange multiplier method under the constraints U_0(i, j) ≥ 0, U_s(i, j) ≥ 0, U_t(i, j) ≥ 0, V_s(i, j) ≥ 0, and V_t(i, j) ≥ 0, respectively.
4. The method according to claim 1 or 3, characterized in that determining the target value of each parameter in the general objective function comprises:
randomly assigning a non-negative value to each parameter as the initial value of that parameter;
calculating the convergence value of each parameter from its initial value, and taking the convergence value of each parameter as the target value of that parameter.
5. The method according to claim 4, characterized in that the parameters in the general objective function include U_0, U_s, U_t, V_s, and V_t;
calculating the convergence value of each parameter from its initial value comprises:
starting from the initial value of U_0, iterating U_0 according to its update formula until the convergence value of U_0 is obtained, where the formula gives the value of U_0 in the current iteration in terms of the value of U_0 obtained in the previous iteration, H_s is the coefficient matrix of the hinge matrix for the source domain, H_t is the coefficient matrix of the hinge matrix for the target domain, and r denotes the iteration number;
starting from the initial value of U_s, iterating U_s according to its update formula until the convergence value of U_s is obtained, where the formula gives the value of U_s in the current iteration in terms of the value of U_s obtained in the previous iteration, L_s is the coefficient matrix of the source-domain-specific topic matrix, and L_t is the coefficient matrix of the target-domain-specific topic matrix;
starting from the initial value of U_t, iterating U_t according to its update formula until the convergence value of U_t is obtained, where the formula gives the value of U_t in the current iteration in terms of the value of U_t obtained in the previous iteration;
starting from the initial value of V_s, iterating V_s according to its update formula until the convergence value of V_s is obtained, where the formula gives the value of V_s in the current iteration in terms of the value of V_s obtained in the previous iteration;
starting from the initial value of V_t, iterating V_t according to its update formula until the convergence value of V_t is obtained, where the formula gives the value of V_t in the current iteration in terms of the value of V_t obtained in the previous iteration.
6. A domain-adaptive viewpoint data classification device, characterized in that the device comprises:
a first determining module, configured to determine a source-domain term matrix according to the relationship between the documents and the terms of the source domain;
a second determining module, configured to determine a target-domain term matrix according to the relationship between the documents and the terms of the target domain;
a third determining module, configured to determine a source-domain objective function according to the source-domain term matrix, a source-domain-specific topic matrix, the coefficient matrix of the source-domain-specific topic matrix, and a hinge matrix between the source domain and the target domain;
a fourth determining module, configured to determine a target-domain objective function according to the target-domain term matrix, a target-domain-specific topic matrix, the coefficient matrix of the target-domain-specific topic matrix, and the hinge matrix;
a fifth determining module, configured to determine a general objective function according to the source-domain objective function and the target-domain objective function;
a sixth determining module, configured to determine the target value of each parameter in the general objective function;
a training module, configured to train a specified classification model according to the target value of each parameter and the labeled viewpoint data in the source domain;
a classification module, configured to classify the viewpoint data of the target domain with the specified classification model obtained by training.
7. The device according to claim 6, characterized in that the third determining module is configured to determine the source-domain objective function by the following formula according to the source-domain term matrix, the source-domain-specific topic matrix, the coefficient matrix of the source-domain-specific topic matrix, and the hinge matrix between the source domain and the target domain:
In the formula, O_s is the source-domain objective function, X_s is the source-domain term matrix, U_0 is the hinge matrix, U_s is the source-domain-specific topic matrix, V_s is the coefficient matrix of the source-domain-specific topic matrix, and ‖·‖_F denotes the Frobenius norm;
the fourth determining module is configured to determine the target-domain objective function by the following formula according to the target-domain term matrix, the target-domain-specific topic matrix, the coefficient matrix of the target-domain-specific topic matrix, and the hinge matrix:
In the formula, O_t is the target-domain objective function, X_t is the target-domain term matrix, U_0 is the hinge matrix, U_t is the target-domain-specific topic matrix, and V_t is the coefficient matrix of the target-domain-specific topic matrix.
8. The device according to claim 7, characterized in that the fifth determining module is configured to determine the general objective function by the following formula according to the source-domain objective function and the target-domain objective function:
In the formula, Φ is the general objective function, D(U_0, U_s, U_t, V_s, V_t) is a regularization term, α, β, and γ are the regularization parameters, Tr(·) is the trace of a matrix, and the five Lagrange multiplier matrices in the formula are those obtained by the Lagrange multiplier method under the constraints U_0(i, j) ≥ 0, U_s(i, j) ≥ 0, U_t(i, j) ≥ 0, V_s(i, j) ≥ 0, and V_t(i, j) ≥ 0, respectively.
9. The device according to claim 6 or 8, characterized in that the sixth determining module comprises:
an allocation unit, configured to randomly assign a non-negative value to each parameter as the initial value of that parameter;
a computing unit, configured to calculate the convergence value of each parameter from its initial value and to take the convergence value of each parameter as the target value of that parameter.
10. The device according to claim 9, characterized in that the parameters in the general objective function include U_0, U_s, U_t, V_s, and V_t;
the computing unit is configured to:
starting from the initial value of U_0, iterate U_0 according to its update formula until the convergence value of U_0 is obtained, where the formula gives the value of U_0 in the current iteration in terms of the value of U_0 obtained in the previous iteration, H_s is the coefficient matrix of the hinge matrix for the source domain, H_t is the coefficient matrix of the hinge matrix for the target domain, and r denotes the iteration number;
starting from the initial value of U_s, iterate U_s according to its update formula until the convergence value of U_s is obtained, where the formula gives the value of U_s in the current iteration in terms of the value of U_s obtained in the previous iteration, L_s is the coefficient matrix of the source-domain-specific topic matrix, and L_t is the coefficient matrix of the target-domain-specific topic matrix;
starting from the initial value of U_t, iterate U_t according to its update formula until the convergence value of U_t is obtained, where the formula gives the value of U_t in the current iteration in terms of the value of U_t obtained in the previous iteration;
starting from the initial value of V_s, iterate V_s according to its update formula until the convergence value of V_s is obtained, where the formula gives the value of V_s in the current iteration in terms of the value of V_s obtained in the previous iteration;
starting from the initial value of V_t, iterate V_t according to its update formula until the convergence value of V_t is obtained, where the formula gives the value of V_t in the current iteration in terms of the value of V_t obtained in the previous iteration.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510316353.7A CN106294506B (en) | 2015-06-10 | 2015-06-10 | Domain-adaptive viewpoint data classification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106294506A true CN106294506A (en) | 2017-01-04 |
CN106294506B CN106294506B (en) | 2020-04-24 |
Family
ID=57659599
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510316353.7A Active CN106294506B (en) | 2015-06-10 | 2015-06-10 | Domain-adaptive viewpoint data classification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106294506B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1310825A (en) * | 1998-06-23 | 2001-08-29 | 微软公司 | Methods and apparatus for classifying text and for building a text classifier |
WO2002013055A2 (en) * | 2000-08-09 | 2002-02-14 | Elron Software, Inc. | Automatic categorization of documents based on textual content |
CN101714135A (en) * | 2009-12-11 | 2010-05-26 | 中国科学院计算技术研究所 | Emotional orientation analytical method of cross-domain texts |
CN103473380A (en) * | 2013-09-30 | 2013-12-25 | 南京大学 | Computer text sentiment classification method |
CN103646097A (en) * | 2013-12-18 | 2014-03-19 | 北京理工大学 | Constraint relationship based opinion objective and emotion word united clustering method |
CN104199829A (en) * | 2014-07-25 | 2014-12-10 | 中国科学院自动化研究所 | Emotion data classifying method and system |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109635110A (en) * | 2018-11-30 | 2019-04-16 | 北京百度网讯科技有限公司 | Data processing method, device, equipment and computer readable storage medium |
CN110414631A (en) * | 2019-01-29 | 2019-11-05 | 腾讯科技(深圳)有限公司 | Lesion detection method, the method and device of model training based on medical image |
CN110414631B (en) * | 2019-01-29 | 2022-02-01 | 腾讯科技(深圳)有限公司 | Medical image-based focus detection method, model training method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106294506B (en) | 2020-04-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |