CN107391492A - Label distribution Chinese emotion prediction method based on local sample correlation

Info

Publication number: CN107391492A
Authority: CN (China)
Legal status: Pending (as listed; an assumption, not a legal conclusion)
Application number: CN201710661382.6A
Other languages: Chinese (zh)
Inventors: 贾修一, 郑翔
Current assignee: Nanjing University of Science and Technology (as listed; may be inaccurate)
Original assignee: Nanjing University of Science and Technology
Priority date: 2017-08-04 (as listed; an assumption, not a legal conclusion)
Filing date: 2017-08-04
Publication date: 2017-11-24

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a label distribution Chinese emotion prediction method based on local sample correlation, comprising the following steps: cluster the training set into m clusters using the k-means clustering method, and initialize the local correlation feature matrix and the cluster-center label matrix; optimize the objective function by gradient descent to solve for the original feature coefficient matrix, the local correlation feature coefficient matrix and the local correlation feature matrix; taking the original features as input and each column of the solved local correlation feature matrix c as output, train m linear regression models; predict the local correlation features of a test sample using the trained linear regression models; and predict the label distribution of the test sample using the output model.

Description

Label distribution Chinese emotion prediction method based on local sample correlation
Technical field
The present invention relates to emotion prediction technology, and in particular to a label distribution Chinese emotion prediction method based on local sample correlation.
Background
Label ambiguity is an active research direction in machine learning. Two relatively mature paradigms currently address it: single-label learning and multi-label learning. In single-label learning an example corresponds to exactly one label, whereas in multi-label learning an example may be associated with several labels, so multi-label learning is an extension of single-label learning. Extensive research and experiments have shown multi-label learning to be an effective paradigm with a wide range of applications. Some problems, however, are still poorly suited to it. For example, in some cases we need to know not only which emotions a sentence is associated with, but also the degree to which each emotion describes that sentence. Label distribution learning was proposed to solve such problems. It further extends multi-label learning: where multi-label learning outputs a label set, label distribution learning outputs a label distribution in which each component is the degree to which the corresponding label describes the example (the description degree). Label distribution learning applies to a wider range of scenarios and can handle more label ambiguity problems.
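To make the difference between the three paradigms concrete, the following sketch shows what each would output for a single sentence. The emotion names and all numeric values are invented for illustration and do not come from the patent; the helper `is_valid_distribution` is likewise a hypothetical name.

```python
# Hypothetical emotion labels and values, for illustration only.
emotions = ["joy", "anger", "sadness", "surprise"]

# Single-label learning: exactly one label per example.
single_label = "joy"

# Multi-label learning: a set of relevant labels, without degrees.
multi_label = {"joy", "surprise"}

# Label distribution learning: one description degree per label.
# Degrees are non-negative and sum to 1 over the full label set.
label_distribution = {"joy": 0.6, "anger": 0.05, "sadness": 0.05, "surprise": 0.3}


def is_valid_distribution(dist, labels, tol=1e-9):
    """Check that dist covers every label, is non-negative, and sums to 1."""
    covers_all = set(dist) == set(labels)
    non_negative = all(v >= 0 for v in dist.values())
    return covers_all and non_negative and abs(sum(dist.values()) - 1.0) <= tol


print(is_valid_distribution(label_distribution, emotions))
```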
Three strategies are mainly used to design label distribution learning algorithms. The first is problem transformation: the label distribution learning problem is first converted into another problem such as single-label learning, an existing algorithm of that paradigm is applied, and the output is then converted back into a label distribution. The second is algorithm adaptation: instead of converting the problem into another learning paradigm, existing algorithms for multivariate regression are identified and adapted to label distribution learning. The third is specialized algorithms designed for the label distribution learning setting: no problem transformation is involved and the label distribution learning problem is solved directly. Unlike the second strategy, this strategy outputs the label distribution directly, with no need to convert the result.
Existing label distribution algorithms seldom consider correlations among labels, or consider only global label correlations; in practice, however, label correlations are usually local. Here we exploit the label correlations within local samples and propose a new label distribution algorithm. We assume that the examples can be divided into different clusters and that the examples in each cluster share the same label correlations. To represent the influence of local label correlation, we construct a local correlation vector for each example as additional features of that example; each entry of the vector represents the influence of one local sample group on the example.
Summary of the invention
The object of the present invention is to provide a label distribution Chinese emotion prediction method based on local sample correlation, comprising:
Step 1, cluster the training set into m clusters using the k-means clustering method, and initialize the local correlation feature matrix c and the cluster-center label matrix P;
Step 2, optimize the objective function by gradient descent to solve for the original feature coefficient matrix θ, the local correlation feature coefficient matrix w and the local correlation feature matrix c;
Step 3, taking the original features of the data as input and the local correlation feature matrix c obtained in step 2 as output, train m linear regression models using an existing linear regression method;
Step 4, predict the local correlation features of a test sample using the trained linear regression models;
Step 5, predict the label distribution of the test sample using the output model.
The present invention exploits the label correlations within local samples and proposes a new label distribution algorithm. The examples are divided into different clusters, and the examples in each cluster are assumed to share the same label correlations. A local correlation vector is built for each example as additional features of that example, and each entry of the vector represents the influence of one local sample group on the example. The resulting label distribution Chinese emotion prediction method based on local sample correlation achieves good performance.
The present invention is further described below with reference to the accompanying drawing.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed description
With reference to Fig. 1, a label distribution Chinese emotion prediction method based on local sample correlation comprises the following steps:
Step 1, cluster the training set into m clusters using the k-means clustering method, and initialize the local correlation feature matrix c and the cluster-center label matrix P;
Step 2, optimize the objective function by gradient descent to solve for the original feature coefficient matrix θ, the local correlation feature coefficient matrix w and the local correlation feature matrix c;
Step 3, taking the original features of the data as input and the local correlation feature matrix c obtained in step 2 as output, train m linear regression models using an existing linear regression method;
Step 4, predict the local correlation features of a test sample using the trained linear regression models;
Step 5, predict the label distribution of the test sample using the output model.
In step 1, the detailed process of clustering the training set into m clusters using the k-means clustering method and initializing the local correlation feature matrix c and the cluster-center label matrix P is as follows:
Step S100, let the original feature space of the Chinese emotion data be $X = \mathbb{R}^q$, and let the emotion label set corresponding to the i-th example in the data set be $D_i = \{d_i^1, d_i^2, \ldots, d_i^L\}$, where q is the dimension of the original features, L is the number of labels, and $d_i^l$ is the degree to which the l-th label describes example $x_i$. Given the training set $S = \{(x_1, D_1), (x_2, D_2), \ldots, (x_n, D_n)\}$, where $x_i \in X$ is an example, the samples are clustered into m clusters in the label space using the k-means clustering method.
Step S101, according to the clustering result, the local correlation feature matrix c and the cluster-center label matrix P are initialized as follows: if example $x_i$ belongs to the j-th cluster, $c_i^j$ is initialized to 1, otherwise it is initialized to 0, where $c_i^j$ is an element of the local correlation feature matrix c. The j-th cluster center is $p_j = \frac{1}{|G_j|} \sum_{x_k \in G_j} D_k$, where $|G_j|$ is the number of examples in the j-th cluster $G_j$ and $x_k$ is the k-th example in that cluster.
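Steps S100 and S101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: it uses a plain Lloyd's k-means written with the standard library, and it takes each cluster center p_j to be the mean label distribution of the examples in cluster j, which is an assumption consistent with the |G_j| normalization mentioned in step S101. The function names `kmeans` and `initialize` are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, m, iterations=50, seed=0):
    """Plain Lloyd's k-means. Returns (assignment, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, m)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest center.
        assignment = [
            min(range(m), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        # Move every center to the mean of its members.
        for j in range(m):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


def initialize(label_distributions, m, seed=0):
    """Cluster in label space, then build c (n x m) and P (m x L).

    c[i][j] is 1.0 if example i falls in cluster j, else 0.0.
    P[j] is the center of cluster j, i.e. the mean label distribution
    of its members (assumed definition of p_j).
    """
    assignment, centers = kmeans(label_distributions, m, seed=seed)
    c = [[1.0 if a == j else 0.0 for j in range(m)] for a in assignment]
    return c, centers


# Toy label distributions (invented values): two obvious groups.
D = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
c, P = initialize(D, m=2)
```

The clustering runs on the label distributions rather than on the original features, as step S100 specifies. In practice a library k-means could replace the hand-written loop.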
In step 2, the detailed process of optimizing the objective function by gradient descent is as follows:
Step S200, the objective function of the algorithm is:
$$T(\theta, w, c) = \sum_{i}\sum_{l}\left(d_i^l \ln\frac{d_i^l}{p(y_l \mid x_i; \theta, w, c)}\right) + \lambda_1 \|\theta\|_F^2 + \lambda_2 \|w\|_F^2 + \lambda_3 \sum_{i=1}^{n}\sum_{j=1}^{m} c_i^j \,\|p(y \mid x_i; \theta, w, c) - p_j\|_2^2$$
where n is the number of samples, m is the number of clusters, $p_j$ is the j-th cluster center, $c_i^j$ is an element of the local correlation feature matrix c, $\|\cdot\|_F$ is the Frobenius norm of a matrix, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are three balance parameters, $p(y_l \mid x_i; \theta, w, c)$ is the l-th component of $p(y \mid x_i; \theta, w, c)$, and $p(y \mid x_i; \theta, w, c)$ is the predicted label distribution. The first term of the objective function is the KL divergence, which measures how closely the prediction matches the true distribution; the second and third terms are regularization terms that keep the model simple; the fourth term makes similar samples have similar local correlations, so that the more similar the prediction for sample $x_i$ is to $p_j$, the larger $c_i^j$ is;
Step S201, the objective function in S200 is optimized by gradient descent to solve for the parameters θ, w and c.
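As an illustration of step S200, the sketch below evaluates the four terms of the objective for given parameters. It assumes the softmax-style output model stated in claim 4 of this document, and it stores the coefficients as `theta[l][k]` and `w[l][k]` (one row per label), which is a layout chosen for this sketch only. The gradient descent updates of step S201 are not shown; the function names are hypothetical.

```python
import math


def softmax_prediction(x, c_row, theta, w):
    """Predicted label distribution p(y | x; theta, w, c) for one example."""
    scores = [
        sum(t * xi for t, xi in zip(theta[l], x))
        + sum(v * ci for v, ci in zip(w[l], c_row))
        for l in range(len(theta))
    ]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def objective(X, D, c, P, theta, w, lam1, lam2, lam3):
    """KL term + two Frobenius regularizers + local correlation term."""
    kl = 0.0
    locality = 0.0
    for x, d, c_row in zip(X, D, c):
        p = softmax_prediction(x, c_row, theta, w)
        # Terms with d_l = 0 contribute 0 to the KL divergence.
        kl += sum(dl * math.log(dl / pl) for dl, pl in zip(d, p) if dl > 0)
        for cij, pj in zip(c_row, P):
            locality += cij * sum((a - b) ** 2 for a, b in zip(p, pj))
    reg_theta = sum(v * v for row in theta for v in row)
    reg_w = sum(v * v for row in w for v in row)
    return kl + lam1 * reg_theta + lam2 * reg_w + lam3 * locality
```

With all coefficients at zero the prediction is uniform, so the value of the objective reduces to the KL divergence from the uniform distribution plus the weighted squared distances to the cluster centers, which gives a simple sanity check before optimization starts.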
In step 3, the detailed process of training the m linear regression models is as follows:
For a test sample, the local correlation features are unknown. Therefore the original features of the training set are used as input and each column of the local correlation feature matrix c is used as output to train m linear regression models.
In step 4, the detailed process of predicting the local correlation features of a test sample using the trained linear regression models is as follows:
Using the m linear regression models obtained in step 3, with the original features of the test sample as input, the m local correlation features of the test sample are predicted one by one.
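Steps 3 and 4 can be sketched as below. The patent only says that an existing linear regression method is used; this sketch assumes ordinary least squares with an intercept, solved through the normal equations, and all function names are hypothetical. It also assumes the training features are not collinear, since otherwise the normal equations are singular.

```python
def solve_linear_system(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_linear_regression(X, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    Xb = [list(x) + [1.0] for x in X]  # append a bias feature
    k = len(Xb[0])
    XtX = [[sum(r[a] * r[b] for r in Xb) for b in range(k)] for a in range(k)]
    Xty = [sum(r[a] * t for r, t in zip(Xb, y)) for a in range(k)]
    return solve_linear_system(XtX, Xty)


def fit_local_feature_models(X_train, c):
    """Step 3: train one regressor per column of c (m models in total)."""
    m = len(c[0])
    return [fit_linear_regression(X_train, [row[j] for row in c]) for j in range(m)]


def predict_local_features(models, x):
    """Step 4: predict the m local correlation features of one test example."""
    xb = list(x) + [1.0]
    return [sum(wk * xk for wk, xk in zip(weights, xb)) for weights in models]


# Toy data (invented): column 0 of c equals the first feature, column 1 its complement.
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
c_solved = [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
models = fit_local_feature_models(X_train, c_solved)
c_test = predict_local_features(models, [0.5, 0.3])
```

Note that a linear regressor does not constrain its outputs to the interval [0, 1], so predicted local correlation features may fall slightly outside the range of the training values.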
In step 5, the detailed process of predicting the label distribution of a test sample using the output model is as follows:
The original features and local correlation features of the test sample, the original feature coefficient matrix and the local correlation feature coefficient matrix are substituted into the output model to predict the emotion distribution of the test sample. The output model is:
$$p(y_l \mid x_i; \theta, w, c) = \frac{1}{Z}\exp\left(\sum_{k_1}\theta_{l,k_1} x_i^{k_1} + \sum_{k_2} w_{l,k_2} c_i^{k_2}\right)$$
where $Z = \sum_{l}\exp\left(\sum_{k_1}\theta_{l,k_1} x_i^{k_1} + \sum_{k_2} w_{l,k_2} c_i^{k_2}\right)$ is a normalization term which ensures that the description degrees of all labels of a sample sum to 1. $\theta_{l,k_1}$ is the element in row $k_1$, column l of the original feature coefficient matrix, $x_i^{k_1}$ is the $k_1$-th original feature of example $x_i$, $w_{l,k_2}$ is the element in row $k_2$, column l of the local sample correlation feature coefficient matrix, and $c_i^{k_2}$ is the $k_2$-th element of the local sample correlation vector of example $x_i$. The first term of the exponent carries the information of the original features, and the second term carries the information of the additional features, i.e. the local correlation information.
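The output model of step 5 is a softmax over a linear score that combines the original features with the local correlation features. The sketch below illustrates it under the same storage assumption as before (`theta[l][k]` and `w[l][k]`, one row per label, chosen for this sketch); the function name `predict_distribution` is hypothetical and the numbers are invented.

```python
import math


def predict_distribution(x, c_vec, theta, w):
    """Return p(y_l | x; theta, w, c) for every label l.

    x      original features of the example (length q)
    c_vec  local correlation features of the example (length m)
    theta  L x q coefficients for the original features
    w      L x m coefficients for the local correlation features
    """
    scores = [
        sum(t * xi for t, xi in zip(theta[l], x))
        + sum(v * ci for v, ci in zip(w[l], c_vec))
        for l in range(len(theta))
    ]
    peak = max(scores)  # shifting by a constant leaves the softmax unchanged
    exps = [math.exp(s - peak) for s in scores]
    z = sum(exps)  # normalization term Z
    return [e / z for e in exps]


# Two labels, one original feature, one local correlation feature.
theta = [[1.0], [0.0]]
w = [[0.0], [0.0]]
p = predict_distribution([math.log(3.0)], [0.0], theta, w)
# The scores are ln 3 and 0, so the distribution is approximately [0.75, 0.25].
```

Subtracting the largest score before exponentiating avoids overflow for large scores and does not change the result, because the same factor cancels between the numerator and Z.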
Finally, the performance of label distribution algorithms is compared using six evaluation metrics: Euclidean, Sørensen, Squared χ², K-L, Intersection and Fidelity.
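The patent names the six metrics without giving formulas. The sketch below uses the definitions commonly used for comparing a true distribution d with a predicted distribution p in the label distribution learning literature; treating these as the intended definitions is an assumption. The first four are distances (lower is better) and the last two are similarities (higher is better).

```python
import math


def euclidean(d, p):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(d, p)))


def sorensen(d, p):
    return sum(abs(a - b) for a, b in zip(d, p)) / sum(a + b for a, b in zip(d, p))


def squared_chi2(d, p):
    # Components where both entries are 0 contribute nothing.
    return sum((a - b) ** 2 / (a + b) for a, b in zip(d, p) if a + b > 0)


def kl_divergence(d, p):
    # Requires p_l > 0 wherever d_l > 0; terms with d_l = 0 contribute 0.
    return sum(a * math.log(a / b) for a, b in zip(d, p) if a > 0)


def intersection(d, p):
    return sum(min(a, b) for a, b in zip(d, p))


def fidelity(d, p):
    return sum(math.sqrt(a * b) for a, b in zip(d, p))


# Invented example: true distribution versus a uniform prediction.
true_dist = [1.0, 0.0]
predicted = [0.5, 0.5]
scores = {
    "Euclidean": euclidean(true_dist, predicted),
    "Sorensen": sorensen(true_dist, predicted),
    "Squared chi2": squared_chi2(true_dist, predicted),
    "K-L": kl_divergence(true_dist, predicted),
    "Intersection": intersection(true_dist, predicted),
    "Fidelity": fidelity(true_dist, predicted),
}
```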

Claims (4)

1. A label distribution Chinese emotion prediction method based on local sample correlation, characterized by comprising the following steps:
Step 1, cluster the training set into m clusters using the k-means clustering method, and initialize the local correlation feature matrix c and the cluster-center label matrix P;
Step 2, optimize the objective function by gradient descent to solve for the original feature coefficient matrix θ, the local correlation feature coefficient matrix w and the local correlation feature matrix c;
Step 3, taking the original features of the data as input and the local correlation feature matrix c obtained in step 2 as output, train m linear regression models using an existing linear regression method;
Step 4, predict the local correlation features of a test sample using the trained linear regression models;
Step 5, predict the label distribution of the test sample using the output model.
2. The method according to claim 1, characterized in that the detailed process of step 1 is as follows:
Let the original feature space of the Chinese emotion data be $X = \mathbb{R}^q$, and let the emotion label set corresponding to the i-th example in the data set be $D_i = \{d_i^1, d_i^2, \ldots, d_i^L\}$, where q is the dimension of the original features, L is the number of labels, and $d_i^l$ is the degree to which the l-th label describes example $x_i$;
Given the training set $S = \{(x_1, D_1), (x_2, D_2), \ldots, (x_n, D_n)\}$, where $x_i \in X$ is an example;
In the label space, the samples are clustered into m clusters using the k-means clustering method;
According to the clustering result, the local correlation feature matrix c and the cluster-center label matrix P are initialized as follows: if example $x_i$ belongs to the j-th cluster, $c_i^j$ is initialized to 1, otherwise it is initialized to 0, where $c_i^j$ is an element of the local correlation feature matrix c; the j-th cluster center is $p_j = \frac{1}{|G_j|} \sum_{x_k \in G_j} D_k$, where $|G_j|$ is the number of examples in the j-th cluster $G_j$ and $x_k$ is the k-th example in that cluster.
3. The method according to claim 2, characterized in that the detailed process of step 2 is:
The objective function is established as follows:
$$T(\theta, w, c) = \sum_{i}\sum_{l}\left(d_i^l \ln\frac{d_i^l}{p(y_l \mid x_i; \theta, w, c)}\right) + \lambda_1 \|\theta\|_F^2 + \lambda_2 \|w\|_F^2 + \lambda_3 \sum_{i=1}^{n}\sum_{j=1}^{m} c_i^j \,\|p(y \mid x_i; \theta, w, c) - p_j\|_2^2$$
where n is the number of samples, m is the number of clusters, $p_j$ is the j-th cluster center, $c_i^j$ is an element of the local correlation feature matrix c, $\|\cdot\|_F$ is the Frobenius norm of a matrix, $\lambda_1$, $\lambda_2$ and $\lambda_3$ are three balance parameters, $p(y_l \mid x_i; \theta, w, c)$ is the l-th component of $p(y \mid x_i; \theta, w, c)$, and $p(y \mid x_i; \theta, w, c)$ is the predicted label distribution;
The above objective function is optimized by gradient descent to solve for the parameters θ, w and c.
4. The method according to claim 3, characterized in that the detailed process of step 5 is:
The output model is established as follows:
$$p(y_l \mid x_i; \theta, w, c) = \frac{1}{Z}\exp\left(\sum_{k_1}\theta_{l,k_1} x_i^{k_1} + \sum_{k_2} w_{l,k_2} c_i^{k_2}\right)$$
where $\theta_{l,k_1}$ is the element in row $k_1$, column l of the original feature coefficient matrix, $x_i^{k_1}$ is the $k_1$-th original feature of example $x_i$, $w_{l,k_2}$ is the element in row $k_2$, column l of the local sample correlation feature coefficient matrix, and $c_i^{k_2}$ is the $k_2$-th element of the local sample correlation vector of example $x_i$;
The original features and local correlation features of the test sample, the original feature coefficient matrix and the local correlation feature coefficient matrix are substituted into the output model to predict the emotion distribution of the test sample;
The performance of label distribution algorithms is compared using six evaluation metrics: Euclidean, Sørensen, Squared χ², K-L, Intersection and Fidelity.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710661382.6A 2017-08-04 2017-08-04 Label distribution Chinese emotion prediction method based on local sample correlation

Publications (1)

Publication Number Publication Date
CN107391492A 2017-11-24

Family

ID=60343872

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN201710661382.6A Pending CN107391492A (en) 2017-08-04 2017-08-04 Label distribution Chinese emotion prediction method based on local sample correlation

Country Status (1)

Country Link
CN (1) CN107391492A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177384A (en) * 2019-12-25 2020-05-19 南京理工大学 Multi-mark Chinese emotion marking method based on global and local mark correlation
CN113449048A (en) * 2021-08-31 2021-09-28 腾讯科技(深圳)有限公司 Data label distribution determining method and device, computer equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177384A (en) * 2019-12-25 2020-05-19 南京理工大学 Multi-mark Chinese emotion marking method based on global and local mark correlation
CN111177384B (en) * 2019-12-25 2023-01-20 南京理工大学 Multi-mark Chinese emotion marking method based on global and local mark correlation
CN113449048A (en) * 2021-08-31 2021-09-28 腾讯科技(深圳)有限公司 Data label distribution determining method and device, computer equipment and storage medium
CN113449048B (en) * 2021-08-31 2021-11-09 腾讯科技(深圳)有限公司 Data label distribution determining method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
CN107402993B (en) The cross-module state search method for maximizing Hash is associated with based on identification
CN108229590B (en) Method and device for acquiring multi-label user portrait
CN110222771B (en) Class identification method of zero-sample picture
CN105808752B (en) A kind of automatic image marking method based on CCA and 2PKNN
CN108897989A (en) A kind of biological event abstracting method based on candidate events element attention mechanism
De Souza et al. A probabilistic approach to emission-line galaxy classification
CN104966105A (en) Robust machine error retrieving method and system
CN110377727B (en) Multi-label text classification method and device based on multi-task learning
CN105335756A (en) Robust learning model and image classification system
CN109918642A (en) The sentiment analysis method and system of Active Learning frame based on committee's inquiry
CN106203483A (en) A kind of zero sample image sorting technique of multi-modal mapping method of being correlated with based on semanteme
US20200193220A1 (en) Method for data imputation and classification and system for data imputation and classification
CN110598753A (en) Defect identification method based on active learning
CN110598733A (en) Multi-label distance measurement learning method based on interactive modeling
CN107330448A (en) A kind of combination learning method based on mark covariance and multiple labeling classification
CN107391492A (en) Label distribution Chinese emotion prediction method based on local sample correlation
CN114998613B (en) Multi-mark zero sample learning method based on deep mutual learning
CN104036021A (en) Method for semantically annotating images on basis of hybrid generative and discriminative learning models
Pu et al. Deep performance factors analysis for knowledge tracing
CN104809468A (en) Multi-view classification method based on indefinite kernels
Fang et al. [Retracted] Application of the Deep Learning Algorithm and Similarity Calculation Model in Optimization of Personalized Online Teaching System of English Course
CN107885854A (en) A kind of semi-supervised cross-media retrieval method of feature based selection and virtual data generation
CN108491865A (en) A kind of more example multiple labeling sorting techniques based on metric learning
CN108734207A (en) A kind of model prediction method based on double preferred Semi-Supervised Regression algorithms
CN103544500B (en) Multi-user natural scene mark sequencing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20171124)