CN107391492A - Label distribution Chinese emotion prediction method based on local sample correlation - Google Patents
- Publication number
- CN107391492A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Probability & Statistics with Applications (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a label distribution Chinese emotion prediction method based on local sample correlation, comprising the following steps: clustering the training set into m clusters using the k-means clustering method, and initializing the local correlation feature matrix and the cluster-centre label matrix; optimizing the objective function by gradient descent to solve for the original-feature coefficient matrix, the local-correlation-feature coefficient matrix and the local correlation feature matrix; training m linear regression models with the original features as input and each column of the solved local correlation feature matrix c as output; predicting the local correlation features of a test sample with the trained linear regression models; and predicting the label distribution of the test sample with the output model.
Description
Technical field
The present invention relates to emotion prediction technology, and in particular to a label distribution Chinese emotion prediction method based on local sample correlation.
Background
Label ambiguity is an active research topic in machine learning. Two relatively mature paradigms currently address it: single-label learning and multi-label learning. In single-label learning, an instance corresponds to exactly one label; in multi-label learning, an instance may be associated with several labels, so multi-label learning is an extension of single-label learning. Extensive research and experiments have shown multi-label learning to be an effective paradigm with a wide range of applications. Some problems, however, are still not well suited to it. For example, in some cases we need to know not only which emotions a sentence is associated with, but also the degree to which each emotion describes that sentence. Label distribution learning was proposed to solve problems of this kind. It is a further extension of multi-label learning: whereas multi-label learning outputs a label set, label distribution learning outputs a label distribution, in which each component represents the degree to which the corresponding label describes the instance (the description degree). Label distribution learning is therefore a more widely applicable paradigm and can handle more label ambiguity problems.
Three main strategies are currently used to design label distribution learning algorithms. The first is problem transformation: the label distribution learning problem is first converted into another problem, such as single-label learning, and solved with existing algorithms for that paradigm; the output is then converted back into a label distribution. The second is algorithm adaptation: the problem is not converted into another learning paradigm. Instead, existing algorithms that solve multivariate regression problems are identified and then adapted to label distribution learning. The third is specialized algorithms designed for the label distribution learning setting. This strategy involves no problem transformation and solves the label distribution learning problem directly. Unlike the second strategy, it outputs the label distribution directly, so the output does not need to be converted.
Existing label distribution algorithms seldom consider the correlations between labels, or consider only global label correlations, whereas in practice label correlations are usually local. Here we exploit the label correlations within local samples and propose a new label distribution algorithm. We assume that instances can be divided into different clusters and that the instances in each cluster share the same label correlations. To represent the influence of local label correlations, we construct a local correlation vector for each instance as an additional feature of that instance; each entry of the vector represents the influence of one group of local samples on the instance.
Summary of the invention
The object of the present invention is to provide a label distribution Chinese emotion prediction method based on local sample correlation, comprising:
Step 1: cluster the training set into m clusters using the k-means clustering method, and initialize the local correlation feature matrix c and the cluster-centre label matrix P;
Step 2: optimize the objective function by gradient descent to solve for the original-feature coefficient matrix θ, the local-correlation-feature coefficient matrix w and the local correlation feature matrix c;
Step 3: train m linear regression models with an existing linear regression method, taking the original features of the data as input and the local correlation feature matrix c obtained in step 2 as output;
Step 4: predict the local correlation features of a test sample using the trained linear regression models;
Step 5: predict the label distribution of the test sample using the output model.
The present invention exploits the label correlations within local samples and proposes a new label distribution algorithm. Instances are divided into different clusters, the instances in each cluster are assumed to share the same label correlations, and a local correlation vector is built for each instance as an additional feature of that instance; each entry of the vector represents the influence of one group of local samples on the instance. The label distribution Chinese emotion prediction method based on local sample correlation achieves good performance.
The present invention is further described below with reference to the accompanying drawing.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed description
With reference to Fig. 1, a label distribution Chinese emotion prediction method based on local sample correlation comprises the following steps:
Step 1: cluster the training set into m clusters using the k-means clustering method, and initialize the local correlation feature matrix c and the cluster-centre label matrix P;
Step 2: optimize the objective function by gradient descent to solve for the original-feature coefficient matrix θ, the local-correlation-feature coefficient matrix w and the local correlation feature matrix c;
Step 3: train m linear regression models with an existing linear regression method, taking the original features of the data as input and the local correlation feature matrix c obtained in step 2 as output;
Step 4: predict the local correlation features of a test sample using the trained linear regression models;
Step 5: predict the label distribution of the test sample using the output model.
In step 1, the training set is clustered into m clusters using the k-means clustering method, and the local correlation feature matrix c and the cluster-centre label matrix P are initialized, as follows:
Step S100: Let the original feature space of the Chinese emotion data be $X = \mathbb{R}^q$, and let the emotion label set of the i-th instance in the data set be $D_i = \{d_i^1, d_i^2, \ldots, d_i^L\}$, where q is the dimension of the original features, L is the number of labels, and $d_i^l$ is the description degree of the l-th label for instance $x_i$. Given a training set $S = \{(x_1, D_1), (x_2, D_2), \ldots, (x_n, D_n)\}$, where $x_i \in X$ is an instance, the samples are clustered into m clusters in the label space using the k-means clustering method.
Step S101: According to the clustering result, the local correlation feature matrix c and the cluster-centre label matrix P are initialized as follows. If instance $x_i$ belongs to the j-th cluster, $c_i^j$ is initialized to 1; otherwise it is initialized to 0, where $c_i^j$ is an element of the local correlation feature matrix c. The j-th cluster centre is the mean label distribution of its members, $p_j = \frac{1}{|G_j|} \sum_{x_k \in G_j} D_k$, where $|G_j|$ is the number of instances in the j-th cluster $G_j$ and $x_k$ is the k-th element of the j-th cluster.
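The initialization in steps S100 and S101 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names `kmeans_label_space` and `init_c_and_P`, the random seeding, and the handling of empty clusters are all assumptions, and a plain Lloyd-style k-means is written out using only the Python standard library.

```python
import random

def kmeans_label_space(D, m, iters=100, seed=0):
    """Cluster label distributions D (n rows of length L) into m clusters."""
    rng = random.Random(seed)
    centres = [list(d) for d in rng.sample(D, m)]  # m rows of D as starting centres
    assign = None
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        new_assign = [
            min(range(m), key=lambda j: sum((a - b) ** 2 for a, b in zip(d, centres[j])))
            for d in D
        ]
        if new_assign == assign:  # assignments stable, centres already up to date
            break
        assign = new_assign
        # Update step: each centre becomes the mean of its members.
        for j in range(m):
            members = [d for d, a in zip(D, assign) if a == j]
            if members:  # assumption: an empty cluster keeps its old centre
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centres

def init_c_and_P(D, assign, centres):
    """c[i][j] = 1 if instance i is in cluster j, else 0; P[j] = mean label distribution of cluster j."""
    m = len(centres)
    c = [[1.0 if a == j else 0.0 for j in range(m)] for a in assign]
    P = []
    for j in range(m):
        members = [d for d, a in zip(D, assign) if a == j]
        if members:
            P.append([sum(col) / len(members) for col in zip(*members)])
        else:
            P.append(list(centres[j]))  # assumption: fall back to the k-means centre
    return c, P

# Toy label distributions over L = 2 emotions (hypothetical data).
D = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
assign, centres = kmeans_label_space(D, m=2)
c, P = init_c_and_P(D, assign, centres)
```

Note that the clustering runs on the label distributions $D_i$, not on the original features, which matches the statement that samples are clustered in the label space.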
In step 2, the objective function is optimized by gradient descent as follows:
Step S200: The objective function of the algorithm is:
$$
\begin{aligned}
T(\theta, w, c) = {} & \sum_i \sum_l \left( d_i^l \ln \frac{d_i^l}{p(y_l \mid x_i; \theta, w, c)} \right) + \lambda_1 \|\theta\|_F^2 + \lambda_2 \|w\|_F^2 \\
& + \lambda_3 \sum_{i=1}^{n} \sum_{j=1}^{m} c_i^j \left\| p(y \mid x_i; \theta, w, c) - p_j \right\|_2^2
\end{aligned}
$$
where n is the number of samples, m is the number of clusters, $p_j$ is the j-th cluster centre, $c_i^j$ is an element of the local correlation feature matrix c, $\|\cdot\|_F$ is the Frobenius norm of a matrix, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are three balance parameters, $p(y_l \mid x_i; \theta, w, c)$ is the l-th component of $p(y \mid x_i; \theta, w, c)$, and $p(y \mid x_i; \theta, w, c)$ is the predicted label distribution. The first term of the objective function is the KL divergence, which measures the similarity between the predicted and the true distributions. The second and third terms are regularizers that keep the model simple. The fourth term encourages similar samples to have similar local correlations: the more similar sample $x_i$ is to $p_j$, the larger $c_i^j$.
Step S201: The objective function of S200 is optimized by gradient descent to solve for the parameters θ, w and c.
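To make the four terms of the objective concrete, the following sketch evaluates $T(\theta, w, c)$ for given parameters. It is illustrative only: the names `predict` and `objective`, and the layout in which `theta[l]` and `w[l]` hold the coefficients for label l, are assumptions made for this example. The gradient descent update itself is not shown.

```python
import math

def predict(x, c_row, theta, w):
    """Predicted label distribution: exponential of the two linear terms, normalized over labels."""
    scores = [
        sum(t * xk for t, xk in zip(theta_l, x)) + sum(v * ck for v, ck in zip(w_l, c_row))
        for theta_l, w_l in zip(theta, w)
    ]
    shift = max(scores)  # a constant shift cancels in the normalization
    exps = [math.exp(s - shift) for s in scores]
    Z = sum(exps)
    return [e / Z for e in exps]

def objective(X, D, c, P, theta, w, lam1, lam2, lam3):
    """T = KL term + lam1*||theta||_F^2 + lam2*||w||_F^2 + lam3 * local correlation term."""
    kl = 0.0
    local = 0.0
    for x, d, c_row in zip(X, D, c):
        p = predict(x, c_row, theta, w)
        # KL divergence between the true and predicted distributions (0 * ln 0 treated as 0).
        kl += sum(dl * math.log(dl / pl) for dl, pl in zip(d, p) if dl > 0)
        # Weighted squared distance between the prediction and every cluster centre.
        for cij, pj in zip(c_row, P):
            local += cij * sum((a - b) ** 2 for a, b in zip(p, pj))
    reg_theta = sum(t * t for row in theta for t in row)
    reg_w = sum(v * v for row in w for v in row)
    return kl + lam1 * reg_theta + lam2 * reg_w + lam3 * local

# One sample, two labels, one cluster, all coefficients zero (hypothetical values).
theta = [[0.0, 0.0], [0.0, 0.0]]
w = [[0.0], [0.0]]
value = objective([[1.0, 2.0]], [[1.0, 0.0]], [[1.0]], [[1.0, 0.0]], theta, w, 0.1, 0.1, 2.0)
```

With all coefficients zero the prediction is uniform, so only the KL term and the local correlation term contribute in this toy call.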
In step 3, the m linear regression models are trained as follows:
The local correlation features of a test sample are unknown. We therefore train m linear regression models, taking the original features of the training set as input and each column of the local correlation feature matrix c as output.
In step 4, the local correlation features of the test sample are predicted using the trained linear regression models, as follows:
Using the m linear regression models obtained in step 3, and taking the original features of the test sample as input, each of the m local correlation features of the test sample is predicted in turn.
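Steps 3 and 4 can be sketched as below. The patent only says that an existing linear regression method is used, so ordinary least squares with an intercept, solved through the normal equations, is an assumption here, as are the function names and the tiny `ridge` term added purely for numerical stability.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z

def fit_linear(X, y, ridge=1e-8):
    """Least-squares fit of y on [1, x]; returns the weights with the intercept first."""
    Xb = [[1.0] + list(x) for x in X]
    d = len(Xb[0])
    # Normal equations (Xb^T Xb + ridge * I) beta = Xb^T y; ridge is an assumed stabilizer.
    A = [[sum(r[a] * r[k] for r in Xb) + (ridge if a == k else 0.0) for k in range(d)]
         for a in range(d)]
    rhs = [sum(r[a] * yi for r, yi in zip(Xb, y)) for a in range(d)]
    return solve(A, rhs)

def fit_local_models(X, c):
    """Step 3: one regression model per column of the local correlation feature matrix c."""
    m = len(c[0])
    return [fit_linear(X, [row[j] for row in c]) for j in range(m)]

def predict_local_features(models, x):
    """Step 4: predict the m local correlation features of one test sample."""
    xb = [1.0] + list(x)
    return [sum(wk * xk for wk, xk in zip(weights, xb)) for weights in models]

# Toy data: one original feature, m = 2 local correlation features (hypothetical values).
X_train = [[0.0], [1.0], [2.0], [3.0]]
c_train = [[1.0, 0.0], [0.75, 0.25], [0.5, 0.5], [0.25, 0.75]]
models = fit_local_models(X_train, c_train)
c_test = predict_local_features(models, [4.0])
```

The predicted vector `c_test` plays the role of the test sample's local correlation vector in the output model of step 5.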
In step 5, the label distribution of the test sample is predicted with the output model as follows:
The original features and local correlation features of the test sample, together with the original-feature coefficient matrix and the local-correlation-feature coefficient matrix, are substituted into the output model to predict the emotion distribution of the test sample. The output model is:
$$
p(y_l \mid x_i; \theta, w, c) = \frac{1}{Z} \exp\left( \sum_{k_1} \theta_{l,k_1} x_i^{k_1} + \sum_{k_2} w_{l,k_2} c_i^{k_2} \right)
$$
where $Z = \sum_{l} \exp\left( \sum_{k_1} \theta_{l,k_1} x_i^{k_1} + \sum_{k_2} w_{l,k_2} c_i^{k_2} \right)$ is a normalization term that ensures the description degrees of all labels of a sample sum to 1. $\theta_{l,k_1}$ is the element of the original-feature coefficient matrix for label l and the $k_1$-th original feature, $x_i^{k_1}$ is the $k_1$-th original feature of instance $x_i$, $w_{l,k_2}$ is the element of the local-sample-correlation feature coefficient matrix for label l and the $k_2$-th local correlation feature, and $c_i^{k_2}$ is the $k_2$-th element of the local sample correlation vector of instance $x_i$. The first term in the exponent carries the information of the original features; the second term carries the information of the additional features, that is, the local correlation information.
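The output model is a normalized exponential over the labels, which can be sketched in a few lines. The function name `output_model` and the nested-list layout (`theta[l]` and `w[l]` holding the coefficients for label l) are assumptions for illustration.

```python
import math

def output_model(x, c_vec, theta, w):
    """Return p(y_l | x; theta, w, c) for every label l as a list that sums to 1."""
    scores = []
    for theta_l, w_l in zip(theta, w):
        original_term = sum(t * xk for t, xk in zip(theta_l, x))  # original features
        local_term = sum(v * ck for v, ck in zip(w_l, c_vec))     # local correlation features
        scores.append(original_term + local_term)
    shift = max(scores)  # subtracting a constant avoids overflow and cancels in the ratio
    exps = [math.exp(s - shift) for s in scores]
    Z = sum(exps)        # normalization term
    return [e / Z for e in exps]

# Two labels, one original feature, one local correlation feature (hypothetical values).
theta = [[1.0], [0.0]]
w = [[0.0], [0.0]]
p = output_model([math.log(3.0)], [1.0], theta, w)
```

In this toy call the scores are ln 3 and 0, so the two description degrees are in the ratio 3 to 1.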
Finally, the performance of label distribution algorithms is compared using six evaluation measures: Euclidean, Sørensen, Squared χ², K-L, Intersection and Fidelity.
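The patent names the six measures without giving formulas. The sketch below uses the definitions commonly found in the label distribution learning literature, which is an assumption about what is intended here. The first four are distances (lower is better) and the last two are similarities (higher is better). `p` is the true distribution and `q` the predicted one.

```python
import math

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def sorensen(p, q):
    return sum(abs(a - b) for a, b in zip(p, q)) / sum(a + b for a, b in zip(p, q))

def squared_chi2(p, q):
    # Terms with a + b == 0 contribute nothing and are skipped.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def kl_divergence(p, q):
    # Assumes q is strictly positive, as a normalized exponential output is.
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)

def intersection(p, q):
    return sum(min(a, b) for a, b in zip(p, q))

def fidelity(p, q):
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

# Hypothetical true and predicted emotion distributions.
true_dist = [1.0, 0.0]
pred_dist = [0.5, 0.5]
scores = {
    "Euclidean": euclidean(true_dist, pred_dist),
    "Sorensen": sorensen(true_dist, pred_dist),
    "Squared chi2": squared_chi2(true_dist, pred_dist),
    "K-L": kl_divergence(true_dist, pred_dist),
    "Intersection": intersection(true_dist, pred_dist),
    "Fidelity": fidelity(true_dist, pred_dist),
}
```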
Claims (4)
1. A label distribution Chinese emotion prediction method based on local sample correlation, characterized by comprising the following steps:
Step 1: cluster the training set into m clusters using the k-means clustering method, and initialize the local correlation feature matrix c and the cluster-centre label matrix P;
Step 2: optimize the objective function by gradient descent to solve for the original-feature coefficient matrix θ, the local-correlation-feature coefficient matrix w and the local correlation feature matrix c;
Step 3: train m linear regression models with an existing linear regression method, taking the original features of the data as input and the local correlation feature matrix c obtained in step 2 as output;
Step 4: predict the local correlation features of a test sample using the trained linear regression models;
Step 5: predict the label distribution of the test sample using the output model.
2. The method according to claim 1, characterized in that the specific process of step 1 is as follows:
Let the original feature space of the Chinese emotion data be $X = \mathbb{R}^q$, and let the emotion label set of the i-th instance in the data set be $D_i = \{d_i^1, d_i^2, \ldots, d_i^L\}$, where q is the dimension of the original features, L is the number of labels, and $d_i^l$ is the description degree of the l-th label for instance $x_i$;
given a training set $S = \{(x_1, D_1), (x_2, D_2), \ldots, (x_n, D_n)\}$, where $x_i \in X$ is an instance;
the samples are clustered into m clusters in the label space using the k-means clustering method;
according to the clustering result, the local correlation feature matrix c and the cluster-centre label matrix P are initialized as follows: if instance $x_i$ belongs to the j-th cluster, $c_i^j$ is initialized to 1, and otherwise to 0, where $c_i^j$ is an element of the local correlation feature matrix c, the j-th cluster centre is $p_j = \frac{1}{|G_j|} \sum_{x_k \in G_j} D_k$, $|G_j|$ is the number of instances in the j-th cluster $G_j$, and $x_k$ is the k-th element of the j-th cluster.
3. The method according to claim 2, characterized in that the specific process of step 2 is:
The objective function is established as follows:
$$
\begin{aligned}
T(\theta, w, c) = {} & \sum_i \sum_l \left( d_i^l \ln \frac{d_i^l}{p(y_l \mid x_i; \theta, w, c)} \right) + \lambda_1 \|\theta\|_F^2 + \lambda_2 \|w\|_F^2 \\
& + \lambda_3 \sum_{i=1}^{n} \sum_{j=1}^{m} c_i^j \left\| p(y \mid x_i; \theta, w, c) - p_j \right\|_2^2
\end{aligned}
$$
where n is the number of samples, m is the number of clusters, $p_j$ is the j-th cluster centre, $c_i^j$ is an element of the local correlation feature matrix c, $\|\cdot\|_F$ is the Frobenius norm of a matrix, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are three balance parameters, $p(y_l \mid x_i; \theta, w, c)$ is the l-th component of $p(y \mid x_i; \theta, w, c)$, and $p(y \mid x_i; \theta, w, c)$ is the predicted label distribution;
the above objective function is optimized by gradient descent to solve for the parameters θ, w and c.
4. The method according to claim 3, characterized in that the specific process of step 5 is:
The output model is established as follows:
$$
p(y_l \mid x_i; \theta, w, c) = \frac{1}{Z} \exp\left( \sum_{k_1} \theta_{l,k_1} x_i^{k_1} + \sum_{k_2} w_{l,k_2} c_i^{k_2} \right)
$$
where $\theta_{l,k_1}$ is the element of the original-feature coefficient matrix for label l and the $k_1$-th original feature, $x_i^{k_1}$ is the $k_1$-th original feature of instance $x_i$, $w_{l,k_2}$ is the element of the local-sample-correlation feature coefficient matrix for label l and the $k_2$-th local correlation feature, and $c_i^{k_2}$ is the $k_2$-th element of the local sample correlation vector of instance $x_i$;
the original features and local correlation features of the test sample, together with the original-feature coefficient matrix and the local-correlation-feature coefficient matrix, are substituted into the output model to predict the emotion distribution of the test sample;
the performance of label distribution algorithms is compared using six evaluation measures: Euclidean, Sørensen, Squared χ², K-L, Intersection and Fidelity.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710661382.6A CN107391492A (en) | 2017-08-04 | 2017-08-04 | Indicia distribution Chinese emotion Forecasting Methodology based on fractional sample correlation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710661382.6A CN107391492A (en) | 2017-08-04 | 2017-08-04 | Indicia distribution Chinese emotion Forecasting Methodology based on fractional sample correlation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107391492A true CN107391492A (en) | 2017-11-24 |
Family
ID=60343872
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710661382.6A Pending CN107391492A (en) | 2017-08-04 | 2017-08-04 | Indicia distribution Chinese emotion Forecasting Methodology based on fractional sample correlation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107391492A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111177384A (en) * | 2019-12-25 | 2020-05-19 | 南京理工大学 | Multi-mark Chinese emotion marking method based on global and local mark correlation |
CN113449048A (en) * | 2021-08-31 | 2021-09-28 | 腾讯科技(深圳)有限公司 | Data label distribution determining method and device, computer equipment and storage medium |
-
2017
- 2017-08-04 CN CN201710661382.6A patent/CN107391492A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111177384A (en) * | 2019-12-25 | 2020-05-19 | 南京理工大学 | Multi-mark Chinese emotion marking method based on global and local mark correlation |
CN111177384B (en) * | 2019-12-25 | 2023-01-20 | 南京理工大学 | Multi-mark Chinese emotion marking method based on global and local mark correlation |
CN113449048A (en) * | 2021-08-31 | 2021-09-28 | 腾讯科技(深圳)有限公司 | Data label distribution determining method and device, computer equipment and storage medium |
CN113449048B (en) * | 2021-08-31 | 2021-11-09 | 腾讯科技(深圳)有限公司 | Data label distribution determining method and device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107402993B (en) | The cross-module state search method for maximizing Hash is associated with based on identification | |
CN108229590B (en) | Method and device for acquiring multi-label user portrait | |
CN110222771B (en) | Class identification method of zero-sample picture | |
CN105808752B (en) | A kind of automatic image marking method based on CCA and 2PKNN | |
CN108897989A (en) | A kind of biological event abstracting method based on candidate events element attention mechanism | |
De Souza et al. | A probabilistic approach to emission-line galaxy classification | |
CN104966105A (en) | Robust machine error retrieving method and system | |
CN110377727B (en) | Multi-label text classification method and device based on multi-task learning | |
CN105335756A (en) | Robust learning model and image classification system | |
CN109918642A (en) | The sentiment analysis method and system of Active Learning frame based on committee's inquiry | |
CN106203483A (en) | A kind of zero sample image sorting technique of multi-modal mapping method of being correlated with based on semanteme | |
US20200193220A1 (en) | Method for data imputation and classification and system for data imputation and classification | |
CN110598753A (en) | Defect identification method based on active learning | |
CN110598733A (en) | Multi-label distance measurement learning method based on interactive modeling | |
CN107330448A (en) | A kind of combination learning method based on mark covariance and multiple labeling classification | |
CN107391492A (en) | Indicia distribution Chinese emotion Forecasting Methodology based on fractional sample correlation | |
CN114998613B (en) | Multi-mark zero sample learning method based on deep mutual learning | |
CN104036021A (en) | Method for semantically annotating images on basis of hybrid generative and discriminative learning models | |
Pu et al. | Deep performance factors analysis for knowledge tracing | |
CN104809468A (en) | Multi-view classification method based on indefinite kernels | |
Fang et al. | [Retracted] Application of the Deep Learning Algorithm and Similarity Calculation Model in Optimization of Personalized Online Teaching System of English Course | |
CN107885854A (en) | A kind of semi-supervised cross-media retrieval method of feature based selection and virtual data generation | |
CN108491865A (en) | A kind of more example multiple labeling sorting techniques based on metric learning | |
CN108734207A (en) | A kind of model prediction method based on double preferred Semi-Supervised Regression algorithms | |
CN103544500B (en) | Multi-user natural scene mark sequencing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20171124 |