CN110008477A - A Chinese sentiment evaluation unit extraction method - Google Patents

A Chinese sentiment evaluation unit extraction method

Info

Publication number
CN110008477A
CN110008477A (application CN201910301318.6A)
Authority
CN
China
Prior art keywords
word
candidate
node
alignment
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910301318.6A
Other languages
Chinese (zh)
Inventor
万常选
喻聪
刘德喜
刘喜平
江腾蛟
张子靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi University of Finance and Economics
Original Assignee
Jiangxi University of Finance and Economics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi University of Finance and Economics
Priority to CN201910301318.6A
Publication of CN110008477A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The present invention relates to the field of information extraction and discloses a Chinese sentiment evaluation unit extraction method. Candidate evaluation objects and candidate sentiment words are mined, the alignment relations between them are obtained, and an association graph of the sentiment relations and semantic relations between candidate evaluation objects and candidate sentiment words is constructed. The sentiment relation strength between candidate evaluation objects and candidate sentiment words is calculated, as are the semantic similarities among candidate evaluation objects and among candidate sentiment words; at the same time, the word information entropy of each node is computed and used to adjust the random walk impact factors. The confidence of each candidate evaluation object and candidate sentiment word is obtained with a random walk model, and the candidate evaluation objects and candidate sentiment words whose confidence exceeds a preset threshold are extracted as the final evaluation objects and final sentiment words, respectively. This sentiment evaluation unit extraction method requires no large amount of manually labeled data, and its extraction precision and recall are high.

Description

A Chinese sentiment evaluation unit extraction method
Technical field
The present invention relates to the field of information extraction, and in particular to a Chinese sentiment evaluation unit extraction method.
Background art
With the rapid development of the Internet, large amounts of unstructured data accumulate quickly, and extracting valuable information from them is of great significance. Taking product review data as an example, a review text generally contains some product features and the user's description of those features; these fine-grained features can effectively help users understand the product. The extraction of evaluation objects and sentiment words is a subtask of sentiment analysis, and its main goal is to extract the evaluation objects described by the author and the author's sentiment orientation toward them.
Many research results on sentiment analysis have been achieved at home and abroad. According to granularity, they can be broadly divided into sentence-level extraction of evaluation objects and sentiment words and corpus-level extraction of evaluation objects and sentiment words. Sentence-level extraction extracts the evaluation objects and sentiment words that may be present in each sentence; the task is generally regarded as a sequence labeling task in which each word in the sentence is labeled and judged to be an evaluation object or a sentiment word. A word can therefore be modeled by its contextual features or other auxiliary features, and popular models include CRF (Conditional Random Field), HMM (Hidden Markov Model) and neural networks.
Corpus-level extraction of evaluation objects and sentiment words extracts evaluation objects and sentiment words from an entire corpus. Most traditional methods adopt a joint extraction scheme, because an evaluation object and a sentiment word usually appear in the same sentence and there is a strong co-dependency between them, referred to as a sentiment relation. For example, in "the appearance of this phone is beautiful", "appearance" and "beautiful" form such a pair; when we find that "appearance" and "beautiful" frequently appear together in the corpus, then when "beautiful" is used as a sentiment word, "appearance" is very likely the evaluation object it modifies.
The performance of methods that jointly extract evaluation objects and sentiment words depends on how the sentiment relations are obtained, and error propagation is a common problem: if the sentiment relations are obtained inaccurately and contain some erroneous information, that error will be passed on throughout the iterative process of bidirectional propagation.
To solve the above problems, Liu K et al. combined a word alignment model and a random walk model to extract evaluation objects and sentiment words. First, the alignment results of candidate evaluation objects and candidate sentiment words are obtained by the word alignment model. Then a two-layer heterogeneous graph consisting of candidate evaluation objects and candidate sentiment words is constructed, which considers the semantic relevance among evaluation objects and among sentiment words as well as the sentiment relevance between evaluation objects and sentiment words. The confidence of each candidate evaluation object and candidate sentiment word is estimated by a random walk model, and candidates with high confidence are extracted as the final evaluation objects or sentiment words.
We found that the word alignment model has two problems on Chinese corpora. The first is the multi-POS problem: one word may be labeled with different parts of speech in different sentences. The second is the model assumption problem: the word alignment model assumes that nouns and noun phrases are candidate evaluation objects and that adjectives and verbs are candidate sentiment words. This assumption is problematic; for example, in "the phone runs fast", "running" should be extracted as an indicator describing the phone's fluency, but the word is usually labeled as a verb and therefore never enters the candidate set of evaluation objects.
In addition, the random walk model also has deficiencies. Liu K et al. use the same impact factor to penalize a word node when deciding whether the walk continues from that word node to a neighboring word node, jumps to the prior confidence, or stops. However, the walks between word nodes should also be distinguished: from a sentiment word or evaluation object node, does the walk move through a sentiment relation to an evaluation object or sentiment word node, or through a semantic relation to another sentiment word or evaluation object node? Their impact factors should be different.
Summary of the invention
The present invention provides a Chinese sentiment evaluation unit extraction method that can solve the above problems in the prior art.
The present invention provides a Chinese sentiment evaluation unit extraction method comprising the following steps:
S1: Mine candidate evaluation objects and candidate sentiment words based on a word alignment model; correct the assumption problem of the word alignment model with dependency syntax rules and solve the multi-POS problem of the word alignment model through word expansion. From the mined candidate evaluation objects and candidate sentiment words, obtain the alignment relations between candidate evaluation objects and candidate sentiment words, also called sentiment relations, and based on these alignment relations construct an association graph of the sentiment relations and semantic relations between candidate evaluation objects and candidate sentiment words.
S2: Calculate the sentiment relation strength between candidate evaluation objects and candidate sentiment words using the co-occurrence information between words, and calculate the semantic similarity among candidate evaluation objects and among candidate sentiment words through a topic model or word vectors. At the same time, calculate the word information entropy of each node from the alignment results of the word alignment model, which measures the amount of information carried by the candidate evaluation object or candidate sentiment word corresponding to the node, and adaptively adjust the random walk impact factors according to the sentiment relation strengths, the semantic similarities and the word information entropy. Based on the sentiment relations and semantic relations between candidate evaluation objects and candidate sentiment words, obtain the confidence of each candidate evaluation object and candidate sentiment word with the random walk model; the candidate evaluation objects and candidate sentiment words whose confidence exceeds a preset threshold are extracted as the final evaluation objects and final sentiment words, respectively.
The dependency syntax rules used in step S1 to correct the assumption problem of the word alignment model include:
Rule 1: If the sentence contains an attribute relation ATT, the POS of the modifier in ATT is {adjective | verb | foreign word}, the POS of the head word in ATT is {noun | verb | foreign word}, and the head word in ATT is also the modifier of the head HED, then the modifier and the head word in ATT are aligned with each other, and the other words are aligned with themselves or with "NULL".
Rule 2: If the sentence contains a subject-verb relation SBV, the POS of the modifier in SBV is {noun | verb | foreign word}, the POS of the head word in SBV is {adjective | verb | foreign word}, and the head word in SBV is also the modifier of the head HED, then the modifier and the head word in SBV are aligned with each other, and the other words are aligned with themselves or with "NULL".
Rule 3: If the sentence contains a complement structure CMP, the POS of the modifier in CMP is {adjective | verb | adverb | foreign word}, the POS of the head word in CMP is {verb | foreign word}, and the head word in CMP is also the modifier of the head HED, then the modifier and the head word in CMP are aligned with each other, and the other words are aligned with themselves or with "NULL".
Rule 4: If the sentence contains a verb-object relation VOB, the POS of the modifier in VOB is {noun | verb | foreign word}, the POS of the head word in VOB is {verb | foreign word}, and the head word in VOB is also the modifier of the head HED, then the modifier and the head word in VOB are aligned with each other, and the other words are aligned with themselves or with "NULL".
Rule 5: If the sentence matches any one of Rule 1, Rule 2, Rule 3 or Rule 4, the sentence also contains a coordination relation COO, and the modifier and head word in COO satisfy the same matching rule, then the modifier in COO is aligned in the same way as the head word in COO, and the remaining words are aligned according to the rule matched by the sentence.
Rule 6: If the sentence does not match any of Rule 1, Rule 2, Rule 3 or Rule 4, alignment follows the originally assumed alignment rules, which are: nouns and noun phrases can only be aligned with adjectives or verbs; conversely, adjectives and verbs can only be aligned with nouns or noun phrases; all other words can only be aligned with themselves, and if no alignment can be found under the above rules, the word is aligned with "NULL".
The word alignment process of the word alignment model in step S1 comprises the following steps:
S11: In the word alignment model, mining the sentiment relations between evaluation objects and sentiment words is regarded as a word alignment process in machine translation; during word alignment, each sentence is copied into two parts, representing the source language and the target language.
S12: Given a sentence vector of length n, S = (w_1, w_2, ..., w_i, ..., w_n)^T, the alignment result of S is A = {(i, a_i) | i ∈ [1, n], a_i ∈ [1, n]}, expressed as
A* = argmax_A P(A|S)   (1)
where w_i denotes the i-th word in the sentence vector S, (i, a_i) indicates that the word at source position i is aligned with the word at target position a_i, and both i and a_i are indices within the sentence length n.
S13: In the word alignment model, the IBM-3 model is used for word alignment, expressed as
P(A|S) ∝ ∏_{i=1}^{n} n(θ_i | w_i) · ∏_{j=1}^{n} t(w_j | w_{a_j}) · d(j | a_j, n)   (2)
where t(w_j | w_{a_j}) denotes the possibility that word w_{a_j} is aligned with word w_j and describes the alignment relation between words; d(j | a_j, n) denotes the possibility that word position a_j is aligned with word position j and describes the alignment relation between word positions; and n(θ_i | w_i) denotes the ability of word w_i to be aligned with θ_i words and describes the modification capacity of the word, θ_i ∈ [1, n].
The association graph of sentiment relations and semantic relations between candidate evaluation objects and candidate sentiment words in step S1 is denoted as an undirected graph G = (V, E), with V = V_t ∪ V_o and E = E_tt ∪ E_oo ∪ E_to, where V_t denotes the set of candidate evaluation objects, V_o denotes the set of candidate sentiment words, E_tt denotes the semantic relations among candidate evaluation objects, E_oo denotes the semantic relations among candidate sentiment words, and E_to denotes the sentiment relations between candidate evaluation objects and candidate sentiment words.
The random walk model in step S2 performs a random walk on the association graph of sentiment relations and semantic relations between candidate evaluation objects and candidate sentiment words; the random walk process is expressed as
C_t = α M_tt C_t + β M_to C_o + γ I_t   (3)
C_o = α M_oo C_o + β (M_to)^T C_t + γ I_o   (4)
In formulas (3) and (4), C_t denotes the confidence of the candidate evaluation objects, C_o denotes the confidence of the candidate sentiment words, M_tt denotes the semantic similarity matrix among candidate evaluation objects, M_oo denotes the semantic similarity matrix among candidate sentiment words, and M_to denotes the sentiment relation strength matrix between candidate evaluation objects and candidate sentiment words. The vector I_t denotes the prior confidence of all candidate evaluation objects, and I_o denotes the prior confidence of all candidate sentiment words. α is the impact factor of the semantic relations, β is the impact factor of the sentiment relations, γ is the impact factor of the prior confidence, and α + β + γ = 1. When α = 1, the confidence of an evaluation object or sentiment word is influenced only by the semantic relations; similarly, when β = 1, the confidence of an evaluation object or sentiment word is influenced only by the sentiment relations; when γ = 1, the confidence of an evaluation object or sentiment word is influenced only by the prior confidence.
The semantic similarity among candidate evaluation objects and the semantic similarity among candidate sentiment words in step S2 are obtained by a topic model as follows: the similarity between two words is measured by the KL distance, and the correlation between topics is used to represent the semantic similarity between evaluation objects or between sentiment words. For two words w_i and w_j, their semantic similarity on topic z is computed as
KL_z(w_i, w_j) = p(z|w_i) · log( p(z|w_i) / p(z|w_j) )   (5)
D(w_i, w_j) = average over all topics z of [ KL_z(w_i, w_j) + KL_z(w_j, w_i) ]   (6)
In formula (5), p(z|w_i) is the probability that word w_i expresses topic z, and KL_z(w_i, w_j) is the KL distance from word w_i to word w_j on topic z. In formula (6), D(w_i, w_j) is the average, over all topics, of the sum of KL_z(w_i, w_j) and KL_z(w_j, w_i). In formula (7), SA(w_i, w_j) is the semantic similarity between word w_i and word w_j obtained by normalizing D(w_i, w_j) into the interval (0, 1).
The semantic similarity among candidate evaluation objects and among candidate sentiment words in step S2 is obtained by word vectors as follows: using the word2vec tool, each word is represented as a vector. If two vectors are denoted x_i = (x_i1, x_i2, x_i3, ..., x_im)^T and x_j = (x_j1, x_j2, x_j3, ..., x_jm)^T, the cosine of the angle between them is used as the measure of semantic similarity, computed as
SA(w_i, w_j) = (x_i · x_j) / (||x_i|| · ||x_j||)   (8)
The sentiment relation strength between a candidate evaluation object and a candidate sentiment word is computed as
p(w_t | w_o) = count(w_t, w_o) / count(w_o),  p(w_o | w_t) = count(w_t, w_o) / count(w_t)   (9)
OA(w_t, w_o) = ω · p(w_t | w_o) + (1 - ω) · p(w_o | w_t)   (10)
In formulas (9) and (10), p(w_t | w_o) and p(w_o | w_t) are conditional probabilities; count(w_t, w_o) is the number of times candidate evaluation object w_t and candidate sentiment word w_o are aligned with each other in the word alignment model; count(w_o) and count(w_t) are the total numbers of occurrences of candidate sentiment word w_o and candidate evaluation object w_t, respectively, in the word alignment model. OA(w_t, w_o) denotes the sentiment relation strength between candidate evaluation object w_t and candidate sentiment word w_o, and ω is the weight factor trading off the two conditional probabilities p(w_t | w_o) and p(w_o | w_t); here its value is set to 0.5, i.e., the two conditional probabilities have equal influence on the sentiment relation strength.
For the random walk process of formulas (3) and (4), in order to solve the problem that the parameter values of the semantic relation impact factor α, the sentiment relation impact factor β and the prior confidence impact factor γ must be given in advance, information entropy is used to measure the amount of information of the word w corresponding to a word node v, which to some extent reflects the degree of uncertainty of the word w corresponding to word node v.
For a candidate evaluation object node v_t or a candidate sentiment word node v_o in the association graph of sentiment relations and semantic relations between candidate evaluation objects and candidate sentiment words, there are three walk options when performing a random walk:
1) Continue the walk to an adjacent node. The impact factor parameter values for continuing the walk from a candidate evaluation object node v_t or a candidate sentiment word node v_o to an adjacent candidate evaluation object node or candidate sentiment word node are denoted λ_t^c and λ_o^c, respectively.
2) Stop the random walk. The impact factor parameter values for stopping the random walk at a candidate evaluation object node v_t or a candidate sentiment word node v_o are denoted λ_t^s and λ_o^s, respectively.
3) Jump to the prior confidence. The impact factor parameter values for jumping from a candidate evaluation object node v_t or a candidate sentiment word node v_o to the prior confidence are denoted λ_t^p and λ_o^p, respectively.
After the impact factor parameter values are adjusted dynamically, the random walk model is given by formulas (11) and (12).
In formulas (11) and (12), each element of the two all-ones vectors is set to 1; the vector I_t denotes the prior confidence of all candidate evaluation objects, and I_o denotes the prior confidence of all candidate sentiment words; V_t is the vector of candidate evaluation object nodes and V_o the vector of candidate sentiment word nodes; λ_t^c is the vector of impact factor parameter values for continuing the random walk from candidate evaluation object nodes to adjacent nodes, and λ_o^c the corresponding vector for candidate sentiment word nodes; similarly, λ_t^p and λ_o^p are the vectors of impact factor parameter values for jumping from candidate evaluation object nodes and candidate sentiment word nodes to the prior confidence, and λ_t^s and λ_o^s are the vectors of impact factor parameter values for stopping the random walk at candidate evaluation object nodes and candidate sentiment word nodes. τ is the weight factor trading off the two ways of walking to an adjacent node; here its value is set to 0.5, i.e., when the walk continues from a candidate evaluation object node v_t or a candidate sentiment word node v_o to an adjacent node, the chances of walking to a candidate evaluation object node or to a candidate sentiment word node are the same. The operator appearing in formulas (11) and (12) applies these impact factor vectors to the corresponding terms element by element.
According to a heuristic, the random walk impact factor parameter values are obtained adaptively by formulas (13) and (14), where v_NULL is the "NULL" word node introduced in the word alignment model.
From formulas (13) and (14), in the random walk process, if the uncertainty of the word w of a word node v_t or v_o is higher, its entropy is larger and the corresponding factor is smaller, so the random walk impact factor parameter values are also smaller; to some extent this reduces the influence of error propagation from highly uncertain words in the random walk.
In this random walk model, the weight factor τ trading off the two ways of walking to an adjacent node is fixed, which is difficult to suit all words. Because of the uncertainty of words, this uncertainty index also affects the two ways of walking to an adjacent node: the higher the uncertainty of a word node v_t or v_o, the more sentiment collocations it has, and when the walk continues to an adjacent node, the more likely it is to walk through a sentiment relation to a word node v_o or v_t. Therefore, we improve the weight factor τ so that it can adapt to different words; the improved random walk is given by formulas (15) and (16).
In formulas (15) and (16), τ_t and τ_o are the weight factor vectors trading off the two ways of walking from a candidate evaluation object node or candidate sentiment word node to an adjacent node, i.e., they denote the weight factor vectors for walking from a candidate evaluation object node v_t or a candidate sentiment word node v_o through a semantic relation to another candidate evaluation object or candidate sentiment word node, while their complements denote the weight factor vectors for walking from a candidate evaluation object node v_t or a candidate sentiment word node v_o through a sentiment relation to a candidate sentiment word or candidate evaluation object node.
Based on the above idea, we explored several schemes for τ_t and τ_o:
Scheme 1: the values of τ_t and τ_o directly borrow the impact factor parameter value vectors λ_t^c and λ_o^c for continuing the walk.
Scheme 2: normalization is applied on the basis of Scheme 1.
Scheme 3: the word information entropy is borrowed and normalization is also applied.
Through the heuristic, the word information entropy of each node is computed from the pairing results of word alignment, and the random walk impact factor parameter values are adjusted adaptively, reducing the influence of error propagation from highly uncertain words in the random walk.
Compared with the prior art, the innovations of the present invention are:
The present invention solves the multi-POS problem in the word alignment model through word expansion and corrects the assumption problem of the word alignment model through dependency syntax rules. In the random walk model, sentiment relations and semantic relations are considered at the same time, and to solve the problem that the impact factor parameter values must be set in advance, information entropy is used to measure the amount of information of the word corresponding to a word node, reflecting to some extent the degree of uncertainty of that word. The present invention is an unsupervised method that requires no large amount of manually labeled data, and its extraction precision and recall are high.
Brief description of the drawings
Fig. 1 is a flow chart of the Chinese sentiment evaluation unit extraction method provided by the present invention;
Fig. 2 is an example diagram of the word alignment model of the present invention;
Fig. 3 is an example diagram of Rule 1 of the present invention;
Fig. 4 is an example diagram of Rule 2 of the present invention;
Fig. 5 is an example diagram of Rule 3 of the present invention;
Fig. 6 is an example diagram of Rule 4 of the present invention;
Fig. 7 is example diagram 1 of Rule 5 of the present invention;
Fig. 8 is example diagram 2 of Rule 5 of the present invention;
Fig. 9 is example diagram 3 of Rule 5 of the present invention;
Fig. 10 is example diagram 4 of Rule 5 of the present invention;
Fig. 11 is example diagram 1 of Rule 6 of the present invention;
Fig. 12 is example diagram 2 of Rule 6 of the present invention;
Fig. 13 is a framework diagram of the word alignment model;
Fig. 14 is an example diagram of a random walk on the semantic relation undirected graph;
Fig. 15 is an example diagram of a random walk on the sentiment relation undirected graph;
Fig. 16 is an example diagram of a random walk on the sentiment relation and semantic relation undirected graph.
Specific embodiments
The specific embodiments of the present invention are described in detail below with reference to Figs. 1 to 16. It should be understood that the protection scope of the present invention is not limited by the specific embodiments.
1. Word alignment model
1.1 Definition of the word alignment model
In the word alignment model, mining the sentiment relations between evaluation objects and sentiment words is regarded as the word alignment process in machine translation. During word alignment, each sentence is copied into two parts, representing the source language and the target language. In general, given a sentence vector of length n, S = (w_1, w_2, ..., w_i, ..., w_n)^T, the alignment result of S is A = {(i, a_i) | i ∈ [1, n], a_i ∈ [1, n]}, and the word alignment model can be expressed as formula (1).
A* = argmax_A P(A|S)   (1)
Here w_i denotes the i-th word in the sentence vector S, (i, a_i) indicates that the word at source position i is aligned with the word at target position a_i, and both i and a_i are indices within the sentence length n. Currently, the IBM-3 model is widely used for word alignment; the alignment process is shown in formula (2).
P(A|S) ∝ ∏_{i=1}^{n} n(θ_i | w_i) · ∏_{j=1}^{n} t(w_j | w_{a_j}) · d(j | a_j, n)   (2)
Here t(w_j | w_{a_j}) denotes the possibility that word w_{a_j} is aligned with word w_j and describes the alignment relation between words; d(j | a_j, n) denotes the possibility that word position a_j is aligned with word position j and describes the alignment relation between positions; n(θ_i | w_i) denotes the ability of word w_i to be aligned with θ_i words and describes the modification capacity of the word, θ_i ∈ [1, n]. For example, in the sentence "the touch and the appearance are both very good", the sentiment word "good" modifies two evaluation objects, "touch" and "appearance".
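As a concrete illustration of formula (2), the following minimal Python sketch scores one candidate alignment given lookup tables for the three factors t, d and n. The helper names (score_alignment, t_table, d_table, n_table) and the toy values are illustrative assumptions, not part of the patent; a real system would estimate these tables over the copied source/target sentences.

```python
import math

def score_alignment(words, alignment, t_table, d_table, n_table):
    """Log-score of one candidate alignment A for sentence S under the three
    factors of formula (2): word alignment t, position alignment d, fertility n."""
    n_len = len(words)
    fertility = {j: 0 for j in range(n_len)}  # how many positions each word attracts
    for i, a_i in alignment:
        fertility[a_i] += 1

    log_p = 0.0
    for i, a_i in alignment:
        # t(w_i | w_{a_i}): lexical alignment between the two words
        log_p += math.log(t_table.get((words[i], words[a_i]), 1e-9))
        # d(i | a_i, n): positional alignment
        log_p += math.log(d_table.get((i, a_i, n_len), 1e-9))
    for j, fert in fertility.items():
        # n(theta_j | w_j): how many words w_j tends to modify
        log_p += math.log(n_table.get((fert, words[j]), 1e-9))
    return log_p

# toy usage: "touch and appearance very good" with "good" aligned to both nouns
words = ["touch", "and", "appearance", "very", "good"]
alignment = [(0, 4), (2, 4), (4, 0)]  # hypothetical (i, a_i) pairs
print(score_alignment(words, alignment, {}, {}, {}))
```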
To make the matching relation between evaluation objects and sentiment words more accurate during word alignment, some POS constraints are usually added to the alignment process, for example: (1) nouns and noun phrases can only be aligned with adjectives or verbs; (2) conversely, adjectives and verbs can only be aligned with nouns or noun phrases; (3) all other words can only be aligned with themselves. Therefore, for "the touch and the appearance are both very good", the word alignment result is shown in Fig. 2.
1.2 Problems with word alignment and their solutions
In Chinese corpora there are multi-POS phenomena, incomplete model assumptions, and alignments with "NULL" in sentences. For these phenomena, the present invention makes some improvements to the word alignment model, as follows:
1) The multi-POS problem
In Chinese product reviews, one word can be labeled with several parts of speech. For example, "taking pictures" is a verb in "I often use it to take pictures" and a noun in "the pictures it takes are very clear". As can be seen, when "taking pictures" is used as a verb, the word is neither an evaluation object nor a sentiment word; but when "taking pictures" is used as a noun, the word is very likely an evaluation object. Under the alignment rules assumed in advance by the word alignment model, the alignment rules differ for different parts of speech; because of its multi-POS phenomenon, the word "taking pictures" is disturbed by its POS during the alignment process, which affects the alignment result.
To solve the multi-POS problem of words, we expand the words: in the process of building the dictionary, the expanded "word" is identified as "word + underscore separator + POS". Taking the above example, the word "taking pictures" in "I often use it to take pictures" is expanded to "taking pictures_v", and the word "taking pictures" in "the pictures it takes are very clear" is expanded to "taking pictures_n". In this way, the multi-POS phenomenon of words can be effectively resolved and the alignment effect improved.
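The word expansion described above can be implemented as a simple preprocessing step. The sketch below assumes segmentation and POS tagging have already been done (here with hypothetical LTP-style lowercase tags); the function name and tag values are illustrative only.

```python
def expand_words(tagged_sentence):
    """Turn (word, pos) pairs into 'word_pos' tokens so that the same surface
    form with different parts of speech becomes distinct dictionary entries,
    e.g. 'taking-pictures_v' vs 'taking-pictures_n'."""
    return [f"{word}_{pos}" for word, pos in tagged_sentence]

# hypothetical POS-tagged review clauses
clause_1 = [("often", "d"), ("use", "v"), ("it", "r"), ("taking-pictures", "v")]
clause_2 = [("taking-pictures", "n"), ("very", "d"), ("clear", "a")]

print(expand_words(clause_1))  # ['often_d', 'use_v', 'it_r', 'taking-pictures_v']
print(expand_words(clause_2))  # ['taking-pictures_n', 'very_d', 'clear_a']
```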
2) The "NULL" alignment problem
During word alignment, some words may not find a word to align with. To keep the alignment process going, a "NULL" word is provided; when a word cannot find an alignment result, it is aligned with "NULL". For example, in "the phone runs fast", "running" is aligned with "fast", while the remaining noun "phone" cannot find an alignment result and is therefore aligned with "NULL". Similarly, in "trust the performance of the Xiaomi phone", "trust" is aligned with "performance", while the remaining nouns "Xiaomi" and "phone" cannot find an alignment result and are all aligned with "NULL". Aligning with "NULL" resolves, to some extent, the influence of words that cannot find an alignment result during alignment.
3) The model assumption problem
When extracting evaluation objects and sentiment words, many scholars commonly assume that the POS of an evaluation object is basically a noun or noun phrase and that the POS of a sentiment word is basically an adjective or verb. In Chinese corpora, however, this assumption has certain flaws. For example, in "the phone runs fast", "running" is labeled as a verb during POS tagging, yet "running", as an indicator describing the phone's fluency, is an evaluation object. Because "running" is usually labeled as a verb, it never enters the candidate set of evaluation objects, which makes evaluation object extraction insufficiently accurate.
To make up for the deficiency of the model assumption on Chinese corpora, we constrain the alignment results of the word alignment model with dependency syntax rules. Dependency parsing captures well the dependency relations between the words in a sentence and provides good guidance for the alignment of the word alignment model. The dependency parse trees are built with LTP, and the dependency syntax rules used are as follows:
Rule 1: If the sentence contains an attribute relation ATT (head word, modifier), the POS of the modifier in ATT is {adjective | verb | foreign word}, the POS of the head word is {noun | verb | foreign word}, and the head word of ATT is also the modifier of the head HED, then the modifier and the head word in the ATT relation are aligned with each other, and the other words are aligned with themselves or with "NULL".
Example 1: In the sentence "perfect appearance" there is an attribute relation ATT (appearance, perfect); the modifier "perfect" is an adjective, the head word "appearance" is a noun, and the head word "appearance" is also the modifier of the head HED. The dependency parsing result and the word alignment result are shown in Fig. 3.
Rule 2: If the sentence contains a subject-verb relation SBV (head word, modifier), the POS of the modifier in SBV is {noun | verb | foreign word}, the POS of the head word is {adjective | verb | foreign word}, and the head word of SBV is also the modifier of the head HED, then the modifier and the head word in the SBV relation are aligned with each other, and the other words are aligned with themselves or with "NULL".
Example 2: In the sentence "the appearance is exquisite" there is a subject-verb relation SBV (exquisite, appearance); the modifier "appearance" is a noun, the head word "exquisite" is an adjective, and the head word "exquisite" is also the modifier of the head HED. The dependency parsing result and the word alignment result are shown in Fig. 4.
Rule 3: If the sentence contains a complement structure CMP (head word, modifier), the POS of the modifier in CMP is {adjective | verb | adverb | foreign word}, the POS of the head word is {verb | foreign word}, and the head word of CMP is also the modifier of the head HED, then the modifier and the head word in the CMP structure are aligned with each other, and the other words are aligned with themselves or with "NULL".
Example 3: In the sentence "it runs fast" there is a complement structure CMP (running, fast); the modifier "fast" is an adverb, the head word "running" is a verb, and the head word "running" is also the modifier of the head HED. The dependency parsing result and the word alignment result are shown in Fig. 5.
Rule 4: If the sentence contains a verb-object relation VOB (head word, modifier), the POS of the modifier in VOB is {noun | verb | foreign word}, the POS of the head word is {verb | foreign word}, and the head word of the VOB relation is also the modifier of the head HED, then the modifier and the head word in the VOB relation are aligned with each other, and the other words are aligned with themselves or with "NULL".
Example 4: In the sentence "trust the performance of the Xiaomi phone" there is a verb-object relation VOB (trust, performance); the modifier "performance" is a noun, the head word "trust" is a verb, and the head word "trust" is also the modifier of the head HED. The dependency parsing result and the word alignment result are shown in Fig. 6.
Rule 5: If the sentence matches one of Rule 1, Rule 2, Rule 3 or Rule 4, the sentence also contains a coordination relation COO (head word, modifier), and the modifier and the head word in COO satisfy the same matching rule, then the modifier in the coordination relation COO is aligned in the same way as the head word in COO, and the remaining words are aligned according to the rule matched by the sentence.
Example 5: In the sentence "beautiful and exquisite appearance" there is an attribute relation ATT (appearance, beautiful), which matches Rule 1; at the same time there is a coordination relation COO (beautiful, exquisite), so the modifier "exquisite" in the COO relation is aligned in the same way as the head word "beautiful", i.e. "exquisite" follows the alignment of "beautiful" and is aligned with "appearance". The dependency parsing result and the word alignment result are shown in Fig. 7.
Example 6: In the sentence "the appearance is beautiful and exquisite" there is a subject-verb relation SBV (beautiful, appearance), which matches Rule 2; at the same time there is a coordination relation COO (beautiful, exquisite), so the modifier "exquisite" in the COO relation is aligned in the same way as the head word "beautiful", i.e. "exquisite" follows the alignment of "beautiful" and is aligned with "appearance". The dependency parsing result and the word alignment result are shown in Fig. 8.
Example 7: In the sentence "it runs smoothly and fast" there is a complement structure CMP (running, smooth), which matches Rule 3; at the same time there is a coordination relation COO (smooth, fast), so the modifier "fast" in the COO relation is aligned in the same way as the head word "smooth", i.e. "fast" follows the alignment of "smooth" and is aligned with "running". The dependency parsing result and the word alignment result are shown in Fig. 9.
Example 8: In the sentence "trust the performance and service of the Xiaomi phone" there is a verb-object relation VOB (trust, performance), which matches Rule 4; at the same time there is a coordination relation COO (performance, service), so the modifier "service" in the COO relation is aligned in the same way as the head word "performance", i.e. "service" follows the alignment of "performance" and is aligned with "trust". The dependency parsing result and the word alignment result are shown in Fig. 10.
Rule 6: If the sentence does not match any of Rule 1, Rule 2, Rule 3 or Rule 4, alignment follows the originally assumed alignment rules, which are: nouns and noun phrases can only be aligned with adjectives or verbs; conversely, adjectives and verbs can only be aligned with nouns or noun phrases; all other words can only be aligned with themselves, and if no alignment can be found under the above rules, the word is aligned with "NULL".
Example 9: The sentence "warm and thoughtful" matches none of Rule 1, Rule 2, Rule 3 or Rule 4; its dependency parsing result and word alignment result are shown in Fig. 11.
Example 10: The sentence "bought at a special price" matches none of Rule 1, Rule 2, Rule 3 or Rule 4; its dependency parsing result and word alignment result are shown in Fig. 12.
For the above phenomena in Chinese corpora, we make several improvements to the alignment of the word alignment model. First, for the multi-POS problem, word expansion is used to distinguish words with different parts of speech so that they satisfy the alignment rules. Second, for the deficiency of the word alignment model assumption, Rules 1, 2, 3 and 4 are built through dependency parsing to guide the alignment results, and Rule 5 is used to guide one-to-many alignment. Finally, word alignment is carried out with the initial alignment rule of the model assumption, Rule 6. The framework of the improvement scheme is shown in Fig. 13, and a minimal sketch of applying one of these rules is given below.
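The sketch below illustrates, under stated assumptions, how Rule 1 (the ATT case) can be checked against a dependency parse. The parse is represented as parallel lists of head indices and relation labels, as LTP-style output might provide; the data structures, tag sets and the toy example are illustrative assumptions, not the patent's exact implementation.

```python
ADJ_LIKE = {"a", "v", "ws"}   # modifier POS allowed by Rule 1 (adjective | verb | foreign word)
NOUN_LIKE = {"n", "v", "ws"}  # head-word POS allowed by Rule 1 (noun | verb | foreign word)

def rule1_alignments(words, pos, heads, rels):
    """Return word-index pairs to align under Rule 1: an ATT modifier and its
    head word are aligned with each other when their POS match the rule and
    the head word is itself the modifier of the sentence head HED."""
    pairs = []
    for i, (h, rel) in enumerate(zip(heads, rels)):
        if rel == "ATT" and pos[i] in ADJ_LIKE and pos[h] in NOUN_LIKE and rels[h] == "HED":
            pairs.append((i, h))
            pairs.append((h, i))
    return pairs

# hypothetical parse of "perfect appearance": "perfect" --ATT--> "appearance" (HED)
words = ["perfect", "appearance"]
pos = ["a", "n"]
heads = [1, 1]
rels = ["ATT", "HED"]
print(rule1_alignments(words, pos, heads, rels))  # [(0, 1), (1, 0)]
```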
2. Random walk model
The extraction of candidate evaluation objects and candidate sentiment words can be completed by the word alignment model of the previous section, which yields the alignment relations between candidate evaluation objects and candidate sentiment words, also called sentiment relations, so that a graph of the sentiment relations and semantic relations between candidate evaluation objects and candidate sentiment words can be constructed based on the alignment relations. Denote the undirected graph G = (V, E), with V = V_t ∪ V_o and E = E_tt ∪ E_oo ∪ E_to, where V_t denotes the set of candidate evaluation objects, V_o the set of candidate sentiment words, E_tt the semantic relations among candidate evaluation objects, E_oo the semantic relations among candidate sentiment words, and E_to the sentiment relations between candidate evaluation objects and candidate sentiment words.
1) Considering only semantic relations
When only semantic relations are considered, if a word is an evaluation object or a sentiment word, then a word that has a strong semantic relation with it is probably also an evaluation object or a sentiment word. The semantic relation undirected graph is shown in Fig. 14, where TC_i is the i-th candidate evaluation object, OC_i is the i-th candidate sentiment word, M_tt denotes the semantic relevance matrix among evaluation objects, and M_oo denotes the semantic relevance matrix among sentiment words.
Considering only the semantic relations among evaluation objects and among sentiment words, the random walk process is as follows:
C_t = α M_tt C_t + γ I_t   (18)
C_o = α M_oo C_o + γ I_o   (19)
where C_t denotes the confidence of the candidate evaluation objects, C_o the confidence of the candidate sentiment words, and I_t and I_o the prior confidence of the candidate evaluation objects and candidate sentiment words, respectively; α is the impact factor of the semantic relations, γ is the impact factor of the prior confidence, and α + γ = 1. When γ = 0, the confidence of an evaluation object or sentiment word is influenced only by the semantic relations; conversely, when γ = 1, the confidence of an evaluation object or sentiment word is influenced only by the prior confidence.
2) Considering only sentiment relations
When only sentiment relations are considered, if a word is an evaluation object or a sentiment word, then a word that has a strong sentiment relation with it is probably a sentiment word or an evaluation object. The sentiment relation undirected graph is shown in Fig. 15, where TC_i is the i-th candidate evaluation object, OC_i is the i-th candidate sentiment word, and M_to denotes the sentiment relevance matrix between evaluation objects and sentiment words.
Considering only the sentiment relations between evaluation objects and sentiment words, the random walk process is as follows:
C_t = β M_to C_o + γ I_t   (20)
C_o = β (M_to)^T C_t + γ I_o   (21)
where, similarly to the above, β is the impact factor of the sentiment relations, γ is the impact factor of the prior confidence, and β + γ = 1. When γ = 0, the confidence of an evaluation object or sentiment word is influenced only by the sentiment relations; conversely, when γ = 1, it is influenced only by the prior confidence.
3) Considering sentiment relations and semantic relations at the same time
When sentiment relations and semantic relations are considered at the same time, if a word is an evaluation object or a sentiment word, then a word that has a strong sentiment relation with it is probably a sentiment word or an evaluation object; meanwhile, if a word is an evaluation object or a sentiment word, then a word that has a strong semantic relation with it is probably also an evaluation object or a sentiment word. The sentiment relation and semantic relation undirected graph is shown in Fig. 16, where TC_i is the i-th candidate evaluation object, OC_i is the i-th candidate sentiment word, M_tt denotes the semantic relevance matrix among evaluation objects, M_oo the semantic relevance matrix among sentiment words, and M_to the sentiment relevance matrix between evaluation objects and sentiment words.
Considering sentiment relations and semantic relations at the same time, the random walk process is as follows:
C_t = α M_tt C_t + β M_to C_o + γ I_t   (3)
C_o = α M_oo C_o + β (M_to)^T C_t + γ I_o   (4)
where, similarly to the above, α is the impact factor of the semantic relations, β is the impact factor of the sentiment relations, γ is the impact factor of the prior confidence, and α + β + γ = 1. When α = 1, the confidence of an evaluation object or sentiment word is influenced only by the semantic relations; similarly, when β = 1, it is influenced only by the sentiment relations; when γ = 1, it is influenced only by the prior confidence. A minimal iterative sketch of this random walk is given below.
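The following minimal numpy sketch iterates formulas (3) and (4) to convergence on a toy graph. The matrices, priors, fixed impact factors and stopping tolerance are illustrative assumptions; in the method they come from the similarity and sentiment relation computations described below.

```python
import numpy as np

def random_walk(M_tt, M_oo, M_to, I_t, I_o, alpha=0.3, beta=0.4, gamma=0.3,
                max_iter=200, tol=1e-8):
    """Iterate C_t = a*M_tt*C_t + b*M_to*C_o + g*I_t and
    C_o = a*M_oo*C_o + b*M_to^T*C_t + g*I_o until the confidences converge."""
    C_t, C_o = I_t.copy(), I_o.copy()
    for _ in range(max_iter):
        new_t = alpha * M_tt @ C_t + beta * M_to @ C_o + gamma * I_t
        new_o = alpha * M_oo @ C_o + beta * M_to.T @ C_t + gamma * I_o
        done = max(np.abs(new_t - C_t).max(), np.abs(new_o - C_o).max()) < tol
        C_t, C_o = new_t, new_o
        if done:
            break
    return C_t, C_o

# toy graph: 2 candidate evaluation objects, 2 candidate sentiment words
M_tt = np.array([[0.0, 0.6], [0.6, 0.0]])
M_oo = np.array([[0.0, 0.2], [0.2, 0.0]])
M_to = np.array([[0.8, 0.1], [0.1, 0.5]])
C_t, C_o = random_walk(M_tt, M_oo, M_to, I_t=np.array([0.5, 0.5]), I_o=np.array([0.5, 0.5]))
print(C_t, C_o)  # candidates whose confidence exceeds a threshold are kept
```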
3. Obtaining the semantic relations among evaluation objects and among sentiment words
To obtain the semantic similarity among evaluation objects or among sentiment words, we measure it mainly through a topic model or word vectors.
1) Obtaining semantic similarity with a topic model
In the topic model (LDA), a text is represented as a probability distribution over topics, and a topic is represented as a probability distribution over words. The KL distance is used to measure the similarity between two words, and the correlation between topics is used to represent the semantic similarity between evaluation objects or between sentiment words. For two words w_i and w_j, their semantic similarity on topic z is computed as
KL_z(w_i, w_j) = p(z|w_i) · log( p(z|w_i) / p(z|w_j) )   (5)
D(w_i, w_j) = average over all topics z of [ KL_z(w_i, w_j) + KL_z(w_j, w_i) ]   (6)
where p(z|w_i) is the probability that word w_i expresses topic z, KL_z(w_i, w_j) is the KL distance from word w_i to word w_j on topic z, D(w_i, w_j) is the average, over all topics, of the sum of KL_z(w_i, w_j) and KL_z(w_j, w_i), and SA(w_i, w_j) in formula (7) is the semantic similarity between word w_i and word w_j obtained by normalizing D(w_i, w_j) into the interval (0, 1).
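A minimal sketch of the topic-based similarity, assuming the p(z|w) distributions are already available (e.g., derived from a trained LDA model). The normalization into (0, 1) shown here, 1/(1+D), is one illustrative choice, since the patent's exact formula (7) is not reproduced in this text.

```python
import math

def topic_similarity(p_z_wi, p_z_wj, eps=1e-12):
    """Symmetric KL-based semantic similarity between two words, each given as
    a distribution p(z|w) over topics (formulas (5)-(6)), normalized into
    (0, 1) in the spirit of formula (7)."""
    kl_ij = sum(pi * math.log((pi + eps) / (pj + eps)) for pi, pj in zip(p_z_wi, p_z_wj))
    kl_ji = sum(pj * math.log((pj + eps) / (pi + eps)) for pi, pj in zip(p_z_wi, p_z_wj))
    D = (kl_ij + kl_ji) / len(p_z_wi)   # average over topics of the summed per-topic KL terms
    return 1.0 / (1.0 + D)              # assumed normalization into (0, 1)

# hypothetical topic distributions for two candidate evaluation objects
print(topic_similarity([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))
```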
2) Obtaining semantic similarity with word vectors
To measure the semantic similarity between words, each word is represented as a vector using the word2vec tool. If two vectors are denoted x_i = (x_i1, x_i2, x_i3, ..., x_im)^T and x_j = (x_j1, x_j2, x_j3, ..., x_jm)^T, the cosine of the angle between them is used as the measure of semantic similarity, computed as
SA(w_i, w_j) = (x_i · x_j) / (||x_i|| · ||x_j||)   (8)
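A minimal sketch of the word-vector similarity of formula (8); the vectors would come from a word2vec model trained on the review corpus, but here they are hard-coded toy values.

```python
import numpy as np

def cosine_similarity(x_i, x_j):
    """Semantic similarity between two word vectors as the cosine of their angle."""
    return float(np.dot(x_i, x_j) / (np.linalg.norm(x_i) * np.linalg.norm(x_j)))

# toy vectors standing in for word2vec embeddings of two candidate words
print(cosine_similarity(np.array([0.2, 0.8, 0.1]), np.array([0.25, 0.7, 0.2])))
```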
4. Obtaining the sentiment relations between evaluation objects and sentiment words
To measure the sentiment relation strength between an evaluation object w_t and a sentiment word w_o, the co-occurrence information between the words is used, computed as
p(w_t | w_o) = count(w_t, w_o) / count(w_o),  p(w_o | w_t) = count(w_t, w_o) / count(w_t)   (9)
OA(w_t, w_o) = ω · p(w_t | w_o) + (1 - ω) · p(w_o | w_t)   (10)
where p(w_t | w_o) and p(w_o | w_t) are conditional probabilities, count(w_t, w_o) is the number of times evaluation object w_t and sentiment word w_o are aligned with each other in the word alignment model, and count(w_o) and count(w_t) are the total numbers of occurrences of sentiment word w_o and evaluation object w_t, respectively, in the word alignment model. OA(w_t, w_o) denotes the sentiment relation strength between candidate evaluation object w_t and candidate sentiment word w_o, and ω is the weight factor trading off the two conditional probabilities; here its value is set to 0.5, i.e., the two conditional probabilities have equal influence.
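A minimal sketch of formulas (9) and (10), assuming the alignment counts have already been collected from the word alignment step; the dictionary layout and the toy counts are assumptions for illustration.

```python
def sentiment_strength(pair_counts, word_counts, w_t, w_o, omega=0.5):
    """OA(w_t, w_o) = omega * p(w_t|w_o) + (1 - omega) * p(w_o|w_t),
    with the conditional probabilities estimated from alignment counts."""
    joint = pair_counts.get((w_t, w_o), 0)
    p_t_given_o = joint / word_counts.get(w_o, 1)
    p_o_given_t = joint / word_counts.get(w_t, 1)
    return omega * p_t_given_o + (1 - omega) * p_o_given_t

# hypothetical counts gathered from the word alignment model
pair_counts = {("appearance_n", "beautiful_a"): 30}
word_counts = {"appearance_n": 50, "beautiful_a": 60}
print(sentiment_strength(pair_counts, word_counts, "appearance_n", "beautiful_a"))
```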
5. Random walk on the graph
In the above random walk process, the following shortcomings can be found: (1) the parameter values of the semantic relation impact factor α, the sentiment relation impact factor β and the prior confidence impact factor γ are all given in advance, which has certain limitations; (2) for every word node the impact factor parameter values are treated equally, which leads to a certain amount of error propagation.
To solve the problem that the parameter values of the semantic relation impact factor α, the sentiment relation impact factor β and the prior confidence impact factor γ must be given in advance, we use information entropy to measure the amount of information of the word w corresponding to a word node v, which to some extent reflects the degree of uncertainty of the word w corresponding to word node v.
For a candidate evaluation object node v_t or a candidate sentiment word node v_o in the sentiment relation and semantic relation graph between candidate evaluation objects and candidate sentiment words, there are the following three walk options when performing a random walk:
1) Continue the walk to an adjacent node. The impact factor parameter values for continuing the walk from a candidate evaluation object node v_t or a candidate sentiment word node v_o to an adjacent candidate evaluation object node or candidate sentiment word node are denoted λ_t^c and λ_o^c, respectively.
2) Stop the random walk. The impact factor parameter values for stopping the random walk at a candidate evaluation object node v_t or a candidate sentiment word node v_o are denoted λ_t^s and λ_o^s, respectively.
3) Jump to the prior confidence. The impact factor parameter values for jumping from a candidate evaluation object node v_t or a candidate sentiment word node v_o to the prior confidence are denoted λ_t^p and λ_o^p, respectively.
After the impact factor parameter values are adjusted dynamically, the random walk model is given by formulas (11) and (12), in which each element of the two all-ones vectors is set to 1; V_t is the vector of candidate evaluation object nodes and V_o the vector of candidate sentiment word nodes; λ_t^c and λ_o^c are the vectors of impact factor parameter values for continuing the random walk from candidate evaluation object nodes and candidate sentiment word nodes to adjacent nodes; similarly, λ_t^p and λ_o^p are the vectors of impact factor parameter values for jumping from candidate evaluation object nodes and candidate sentiment word nodes to the prior confidence, and λ_t^s and λ_o^s are the vectors of impact factor parameter values for stopping the random walk at candidate evaluation object nodes and candidate sentiment word nodes. τ is the weight factor trading off the two ways of walking to an adjacent node; here its value is set to 0.5, i.e., when the walk continues from a candidate evaluation object node v_t or a candidate sentiment word node v_o to an adjacent node, the chances of walking to a candidate evaluation object node or to a candidate sentiment word node are the same. The operator appearing in formulas (11) and (12) applies these impact factor vectors to the corresponding terms element by element.
According to the heuristic, the random walk impact factor parameter values can be obtained adaptively by formulas (13) and (14), where v_NULL is the "NULL" word node introduced in the word alignment model.
From formulas (13) and (14), in the random walk process, if the uncertainty of the word w of a word node v_t or v_o is higher, its entropy is larger and the corresponding factor is smaller, so the random walk impact factor parameter values are also smaller; to some extent this reduces the influence of error propagation from highly uncertain words in the random walk.
For example, if the word of node v_o is "good_a", it is a general sentiment word that can collocate with many evaluation objects, such as "performance", "color" and "workmanship", so the information entropy of the word is large and the corresponding factor is small; its random walk impact factor parameter values are 0.07559704, 0.92440296 and 5.55e-17, respectively. Similarly, if the word of node v_o is "dirt-resistant_a", it is a dedicated sentiment word that can collocate with few evaluation objects, such as "color", so the information entropy of the word is small and the corresponding factor is large; its random walk impact factor parameter values are 0.43543369, 0.56456631 and 1.11E-16, respectively.
In the above random walk parameters, the weight factor τ trading off the two ways of walking to an adjacent node is fixed, which is difficult to suit all words. Since the uncertainty of a word also affects the two ways of walking to an adjacent node, we believe that the higher the uncertainty of a word node v_t or v_o, the more sentiment collocations it has, and when the walk continues to an adjacent node, the more likely it is to walk through a sentiment relation to a word node v_o or v_t. Therefore, we improve the weight factor τ so that it can adapt to different words; the improved random walk is given by formulas (15) and (16), where τ_t and τ_o are the weight factor vectors trading off the two ways of walking from a candidate evaluation object node or a candidate sentiment word node to an adjacent node, i.e., they denote the weight factor vectors for walking from a candidate evaluation object node v_t or a candidate sentiment word node v_o through a semantic relation to another candidate evaluation object or candidate sentiment word node, while their complements denote the weight factor vectors for walking from a candidate evaluation object node v_t or a candidate sentiment word node v_o through a sentiment relation to a candidate sentiment word or candidate evaluation object node.
Based on the above idea, we explored several schemes for τ_t and τ_o:
Scheme 1: the values of τ_t and τ_o directly borrow the impact factor parameter value vectors λ_t^c and λ_o^c for continuing the walk.
Scheme 2: normalization is applied on the basis of Scheme 1.
Scheme 3: the word information entropy is borrowed and normalization is also applied.
Through the heuristic, the word information entropy of each node is computed from the pairing results of word alignment, and the random walk impact factors are adjusted adaptively, with the aim of reducing the influence of error propagation from highly uncertain words in the random walk; an illustrative sketch follows.
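The exact adaptive formulas (13) to (16) are not reproduced in this text, but the sketch below illustrates the underlying idea under stated assumptions: the entropy of a node's alignment distribution (including alignments to "NULL") measures its uncertainty, and nodes with higher entropy receive a smaller continue-the-walk factor. The specific mapping from entropy to the three factors shown here is an illustrative choice, not the patent's formula.

```python
import math

def alignment_entropy(align_counts):
    """Entropy of a node's alignment distribution; 'NULL' alignments count too."""
    total = sum(align_counts.values())
    probs = [c / total for c in align_counts.values() if c > 0]
    return -sum(p * math.log(p) for p in probs)

def walk_factors(align_counts):
    """Illustrative mapping: higher entropy gives a smaller continue factor and
    a larger jump-to-prior factor; the stop factor takes the small remainder."""
    h = alignment_entropy(align_counts)
    cont = 1.0 / (1.0 + h)        # continue-the-walk factor shrinks with uncertainty
    prior = (1.0 - cont) * 0.99   # most of the remaining mass goes to the prior jump
    stop = 1.0 - cont - prior     # tiny stopping mass
    return cont, prior, stop

# a general word collocating with many objects (high entropy) vs a dedicated word
print(walk_factors({"performance_n": 40, "color_n": 35, "workmanship_n": 30, "NULL": 10}))
print(walk_factors({"color_n": 12, "NULL": 1}))
```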
First, review data are crawled from e-commerce websites and cleaned to a certain extent, and then LTP is used for word segmentation, POS tagging and dependency parsing. Second, the matching relations between candidate evaluation objects and candidate sentiment words are obtained by the word alignment model, and comparative experiments are carried out. Third, the random walk model is used to extract high-confidence evaluation objects and sentiment words, which are likewise compared and analyzed. Finally, the effect of obtaining adaptive parameters with the heuristic is evaluated experimentally.
6. Word alignment model experiments
In the word alignment model experiments, to verify the influence of the multi-POS problem, the "NULL" alignment problem and the model assumption problem, experiments are carried out and analyzed for each.
1) Word expansion solves the multi-POS problem
In Chinese product reviews the multi-POS phenomenon exists, so the words in the original dictionary need to be expanded, and the expanded "word" is identified as "word + underscore separator + POS". Taking the product review data from Suning.com (8944 review texts in total) as an example, the dictionary size without word expansion is 9847 and the dictionary size with word expansion is 10979; obviously, word expansion enriches the content of the dictionary. Table 1 gives word expansion examples.
2) The "NULL" alignment problem
In the alignment process, according to the alignment rules, there are often words that cannot find an alignment result. For example, in "trust the performance of the Xiaomi phone", "trust" is aligned with "performance", while the nouns "Xiaomi" and "phone" cannot find words to align with; to keep the alignment complete, a virtual "NULL" word is introduced, and words that cannot find an alignment result are aligned with "NULL". As another example, some reviews are "very advanced, handy", "very good, very good, very good", "beautiful, highly praised", and so on; they are all sentiment words, but according to the alignment rules the evaluation objects are omitted and no alignment word can be found, so in these cases the words are also aligned with "NULL". Table 2 gives examples of words aligned with "NULL".
Table 1. Word extension examples
Table 2. Examples of words aligned with "NULL"
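The sketch below illustrates the alignment with "NULL" described in 2) above: a virtual NULL token backs up every candidate word that found no partner, while other words stay aligned with themselves. The simple POS-based candidate filter is an illustrative assumption.

NULL = "NULL"

def complete_alignment(words, postags, partial_alignment):
    """partial_alignment maps word index -> aligned word index; every remaining
    candidate word (noun/adjective/verb) is aligned with the virtual NULL token,
    and every other word is aligned with itself."""
    full = dict(partial_alignment)
    for i, pos in enumerate(postags):
        if i not in full:
            full[i] = NULL if pos in {"n", "a", "v"} else i
    return {words[i]: (j if j == NULL else words[j]) for i, j in full.items()}

words = ["信任", "小米", "手机", "性能"]   # "trust the performance of the Xiaomi phone"
postags = ["v", "n", "n", "n"]
print(complete_alignment(words, postags, {0: 3, 3: 0}))   # 信任<->性能, 小米/手机 -> NULL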
3) Improving the model hypothesis problem with dependency syntactic rules
On Chinese review corpora, the hypothesis of the word alignment model is incomplete, so in the experiments the alignment results of the word alignment model are constrained by dependency syntactic rules. The constructed dependency syntactic rules are rule 1, rule 2, rule 3, rule 4, rule 5 and rule 6. Taking the 8,944 review texts of the Suning.com product review data as an example, the texts contain 46,669 sentences in total after sentence splitting; the coverage of the rules over these sentences is shown in Table 3.
Table 3. Coverage of the dependency syntactic rules
Through dependency parsing, part-of-speech constraints and dependency syntactic rule constraints are constructed, which effectively improves the alignment effect of the word alignment model and makes up for the deficiency in its hypothesis, namely that nouns and noun phrases can only be aligned with adjectives or verbs, and adjectives and verbs can only be aligned with nouns or noun phrases. Dependency syntactic rules 1, 2, 3, 4 and 5 can capture words of other parts of speech as candidate evaluation objects or candidate emotion words. For example, words with the parts of speech v, ws or n may serve as candidate evaluation objects, and words with the parts of speech v, ws, a, b, m or i may appear as candidate emotion words; examples of such words are shown in Table 4.
Table 4. Part-of-speech examples of candidate evaluation objects and candidate emotion words
The word alignment model experiments show that, with the dependency syntactic rules, candidate evaluation objects and candidate emotion words of more parts of speech can be found: it is not the case that only nouns and noun phrases can serve as candidate evaluation objects, nor that only adjectives and verbs can serve as candidate emotion words. By applying part-of-speech constraints and dependency syntactic rule constraints, we not only broaden the part-of-speech range of candidate evaluation objects and candidate emotion words, but also make the alignment between candidate evaluation objects and candidate emotion words more accurate.
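To illustrate how a dependency rule constrains the alignment, the sketch below checks rule 1 (the attributive ATT rule, whose full conditions are spelled out in claim 2) against LTP-style dependency triples. The data structures are simplified assumptions, and only this single rule is shown.

ADJ_VERB_FOREIGN = {"a", "v", "ws"}       # adjective | verb | foreign-word POS
NOUN_VERB_FOREIGN = {"n", "v", "ws"}      # noun | verb | foreign-word POS

def rule1_alignments(words, postags, deps):
    """deps: (child, head, relation) triples with 1-based indices and head == 0
    for the HED (root) word; returns ATT modifier/head pairs that rule 1
    forces to be mutually aligned."""
    heads = {child: head for child, head, _ in deps}
    root = next((child for child, head, rel in deps if rel == "HED"), None)
    pairs = []
    for child, head, rel in deps:
        if rel != "ATT":
            continue
        if postags[child - 1] not in ADJ_VERB_FOREIGN:    # modifier POS constraint
            continue
        if postags[head - 1] not in NOUN_VERB_FOREIGN:    # head-word POS constraint
            continue
        if heads.get(head) == root:                       # the ATT head also modifies the HED core
            pairs.append((words[child - 1], words[head - 1]))
    return pairs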
7. Experiments on the random walk model
To examine the extraction effect for evaluation objects, emotion words and other words, precision, recall and the F-value are used as evaluation indices, computed as follows:
Precision = Rextract / Textract, Recall = Rextract / Tsample, F = 2 × Precision × Recall / (Precision + Recall)
where Rextract is the number of correctly extracted results, Textract is the total number of extracted results, and Tsample is the total number of actual samples. Precision reflects how accurate the extracted results are, while recall reflects how completely the extracted results cover the samples.
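A small Python sketch of these metrics; treating the F-value as the standard harmonic mean of precision and recall is an assumption about the exact formula used.

def evaluate(extracted, gold):
    """Precision, recall and F over sets of extracted and gold-standard words."""
    extracted, gold = set(extracted), set(gold)
    correct = len(extracted & gold)                       # R_extract
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

print(evaluate(["屏幕", "电池", "物流"], ["屏幕", "电池", "外观"]))   # ~(0.67, 0.67, 0.67)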
To verify the influence of the semantic relation, the emotional relation and the prior confidence on the random walk model, we conducted the following three groups of comparative experiments; the experimental results are shown in Tables 5 to 7.
Table 5. Extraction results for evaluation objects and emotion words when verifying the semantic relation
Table 6. Extraction results for evaluation objects and emotion words when verifying the emotional relation
Table 7. Extraction results for evaluation objects and emotion words when verifying the prior confidence
1) Table 5 shows the results for verifying the influence of the semantic relation on the random walk model. The walk model follows formulas (11) and (12), and the adaptive impact factor parameter values are all obtained with the heuristic of formulas (13) and (14). RW-1 does not consider the semantic relation and only considers the emotional relation, i.e. the weight factor parameter τ is set to 0; RW-2 and RW-3 consider the semantic relation and the emotional relation simultaneously, i.e. τ is set to 0.5, but RW-2 mines the semantic relation with an LDA model, while RW-3 mines it with word vectors.
2) Table 6 shows the results for verifying the influence of the emotional relation on the random walk model. The walk model follows formulas (11) and (12), and the adaptive impact factor parameter values are all obtained with the heuristic of formulas (13) and (14). RW-4 does not consider the emotional relation and only considers the semantic relation (i.e. τ is set to 1); RW-3 considers the emotional relation and the semantic relation simultaneously (τ is set to 0.5), see Table 5. The semantic relation is mined with word vectors.
3) Table 7 shows the results for verifying the influence of the prior confidence on the random walk model. The walk model follows formulas (3) and (4). RW-5 and RW-6 are random walk experiments that consider both the semantic relation and the emotional relation, do not use the heuristic to obtain adaptive impact factor parameter values, and mine the semantic relation with word vectors; RW-5 does not consider the influence of the prior confidence, i.e. the impact factor parameters α, β and γ take the values 0.5, 0.5 and 0.0 respectively, while RW-6 considers the influence of the prior confidence, i.e. α, β and γ take the values 0.25, 0.25 and 0.5 respectively.
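The baseline walk of formulas (3) and (4) can be sketched as a simple fixed-point iteration; the matrices and prior vectors below are tiny made-up examples, and the iteration cap and tolerance are assumptions.

import numpy as np

def random_walk(Mtt, Moo, Mto, It, Io, alpha, beta, gamma, iters=100, tol=1e-6):
    """Iterate Ct = alpha*Mtt*Ct + beta*Mto*Co + gamma*It and
       Co = alpha*Moo*Co + beta*Mto.T*Ct + gamma*Io until convergence."""
    Ct, Co = It.copy(), Io.copy()
    for _ in range(iters):
        Ct_new = alpha * Mtt @ Ct + beta * Mto @ Co + gamma * It
        Co_new = alpha * Moo @ Co + beta * Mto.T @ Ct + gamma * Io
        if max(np.abs(Ct_new - Ct).max(), np.abs(Co_new - Co).max()) < tol:
            return Ct_new, Co_new
        Ct, Co = Ct_new, Co_new
    return Ct, Co

# Two candidate evaluation objects and two candidate emotion words (toy example).
Mtt = np.array([[0.0, 0.6], [0.6, 0.0]])
Moo = np.array([[0.0, 0.4], [0.4, 0.0]])
Mto = np.array([[0.8, 0.1], [0.2, 0.7]])
It, Io = np.array([0.9, 0.5]), np.array([0.8, 0.6])
print(random_walk(Mtt, Moo, Mto, It, Io, alpha=0.25, beta=0.25, gamma=0.5))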
8. Experiments on obtaining adaptive impact factor parameter values with the heuristic
To verify the influence of using the heuristic to obtain adaptive impact factor parameter values on the random walk model, we also compared 5 experiments; the results are shown in Table 8.
Table 8. Extraction results for evaluation objects and emotion words when verifying the heuristic for obtaining adaptive parameter values
In these 5 experiments, the random walk model always considers the influence of the semantic relation and the emotional relation, as well as the influence of the prior confidence, and mines the semantic relation between words with word vectors. The experiments are described in detail as follows:
RW-6: the heuristic is not used to obtain impact factor parameter values; the impact factor parameters α, β and γ take the values 0.25, 0.25 and 0.5 respectively, see Table 7;
RW-3: adaptive impact factor parameter values are obtained with the heuristic; the random walk model follows formulas (11) and (12), and the weight factor parameter τ is set to 0.5, see Tables 5 and 6;
RW-7: adaptive impact factor parameter values are obtained with the heuristic; the random walk model follows formulas (15) and (16), and the weight factors trading off the two ways of walking to adjacent nodes follow scheme one in formula (17);
RW-8: adaptive impact factor parameter values are obtained with the heuristic; the random walk model follows formulas (15) and (16), and the weight factors trading off the two ways of walking to adjacent nodes follow scheme two in formula (17);
RW-9: adaptive impact factor parameter values are obtained with the heuristic; the random walk model follows formulas (15) and (16), and the weight factors trading off the two ways of walking to adjacent nodes follow scheme three in formula (17).
In the random walk model experiments, the influence of the semantic relation, the emotional relation and the prior confidence on the extraction effect of the random walk model is fully considered, while adaptive weight factor parameter values are obtained with the heuristic. The experiments show that considering the semantic relation effectively expands the initial evaluation object set and the initial emotion word set and significantly improves recall; considering the emotional relation effectively captures accurate evaluation objects and emotion words and has a strong effect on improving precision; meanwhile, the prior confidence also provides important guidance for the random walk extraction, with a notable effect on both precision and recall, so the F-value improves accordingly.
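As the final step of the method, the candidates whose converged confidence exceeds the preset threshold are kept; a minimal sketch (the threshold value and toy confidences are placeholders):

def extract_final(candidates, confidences, threshold=0.3):
    """Keep candidate words whose random-walk confidence exceeds the threshold."""
    return [w for w, c in zip(candidates, confidences) if c > threshold]

final_objects = extract_final(["屏幕", "物流"], [0.46, 0.21])
final_emotions = extract_final(["清晰", "一般"], [0.52, 0.33])
print(final_objects, final_emotions)   # final evaluation objects and final emotion words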
The above discloses only several specific embodiments of the present invention; however, the embodiments of the present invention are not limited thereto, and any variation conceivable to those skilled in the art shall fall within the protection scope of the present invention.

Claims (8)

1. A Chinese Affective Evaluation unit abstracting method, characterized by comprising the following steps:
S1: mining candidate evaluation objects and candidate emotion words based on a word alignment model, correcting the hypothesis problem of the word alignment model through dependency syntactic rules, and solving the multiple-part-of-speech problem of the word alignment model through word extension; according to the mined candidate evaluation objects and candidate emotion words, obtaining the alignment relations between candidate evaluation objects and candidate emotion words, also referred to as emotional relations, and constructing, based on the alignment relations, an association graph of the emotional relations and semantic relations between candidate evaluation objects and candidate emotion words;
S2: computing the emotional relation strength between candidate evaluation objects and candidate emotion words using the co-occurrence information between words; computing separately, through a topic model or word vectors, the semantic similarity between candidate evaluation objects and the semantic similarity between candidate emotion words; meanwhile, computing the word information entropy of different nodes based on the alignment situation of the word alignment model, which measures the amount of information of the candidate evaluation object or candidate emotion word corresponding to a node; adaptively adjusting the random walk impact factors according to the emotional relation strength between candidate evaluation objects and candidate emotion words, the semantic similarity between candidate evaluation objects, the semantic similarity between candidate emotion words and the word information entropy; based on the emotional relations and semantic relations between candidate evaluation objects and candidate emotion words, obtaining the confidence of candidate evaluation objects and candidate emotion words through a random walk model, and extracting the candidate evaluation objects and candidate emotion words whose confidence exceeds a preset threshold as the final evaluation objects and final emotion words respectively.
2. The Chinese Affective Evaluation unit abstracting method according to claim 1, characterized in that in step S1 the hypothesis problem of the word alignment model is corrected through dependency syntactic rules, and the dependency syntactic rules include:
Rule 1: if the sentence contains an attributive relation ATT, the part of speech of the modifier in ATT satisfies {adjective | verb | foreign-word POS}, the part of speech of the head word in ATT satisfies {noun | verb | foreign-word POS}, and the head word in ATT is also a modifier of the core HED, then the modifier and the head word in ATT are mutually aligned, and the other words are aligned with themselves or with "NULL";
Rule 2: if the sentence contains a subject-verb relation SBV, the part of speech of the modifier in SBV satisfies {noun | verb | foreign-word POS}, the part of speech of the head word in SBV satisfies {adjective | verb | foreign-word POS}, and the head word in SBV is also a modifier of the core HED, then the modifier and the head word in SBV are mutually aligned, and the other words are aligned with themselves or with "NULL";
Rule 3: if the sentence contains a complement structure CMP, the part of speech of the modifier in CMP satisfies {adjective | verb | adverb | foreign-word POS}, the part of speech of the head word in CMP satisfies {verb | foreign-word POS}, and the head word in CMP is also a modifier of the core HED, then the modifier and the head word in CMP are mutually aligned, and the other words are aligned with themselves or with "NULL";
Rule 4: if the sentence contains a verb-object relation VOB, the part of speech of the modifier in VOB satisfies {noun | verb | foreign-word POS}, the part of speech of the head word in VOB satisfies {verb | foreign-word POS}, and the head word in VOB is also a modifier of the core HED, then the modifier and the head word in VOB are mutually aligned, and the other words are aligned with themselves or with "NULL";
Rule 5: if the sentence matches any one of rules 1, 2, 3 or 4 above and at the same time contains a coordinate relation COO, and the modifier and the head word in COO satisfy the same matching rule, then the alignment of the modifier in COO is consistent with the alignment of the head word in COO, and the alignment of the other words is consistent with the alignment under the rule matched by the sentence;
Rule 6: if the sentence does not match any one of rules 1, 2, 3 or 4, the alignment follows the original hypothesis; the alignment rules of the original hypothesis are as follows: nouns and noun phrases can only be aligned with adjectives or verbs; conversely, adjectives and verbs can only be aligned with nouns or noun phrases; other words can only be aligned with themselves, and if no alignment can be found under the above, the word is aligned with "NULL".
3. The Chinese Affective Evaluation unit abstracting method according to claim 1, characterized in that the word alignment process of the word alignment model in step S1 comprises the following steps:
S11: in the word alignment model, mining the emotional relations between evaluation objects and emotion words is treated as the word alignment process in machine translation, and each sentence is copied into two parts that serve as the source language and the target language during word alignment;
S12: given a sentence vector S = (w1, w2, …, wi, …, wn)T of length n, the alignment result of the sentence vector S is A = {(i, ai) | i ∈ [1, n], ai ∈ [1, n]}, formulated as:
A* = argmaxA P(A|S)   (1)
wherein wi denotes the i-th word in the sentence vector S, (i, ai) denotes that the word at source-language position i is aligned with the corresponding target position ai, and both i and ai are indices within the sentence length n;
S13: in the word alignment model, word alignment is carried out with the IBM-3 model, formulated as:
P(A|S) = ∏i=1..n n(θi|wi) × ∏j=1..n t(waj|wj) d(j|aj, n)   (2)
wherein t(waj|wj) denotes the possibility that word waj is aligned with word wj and describes the alignment relation between words; d(j|aj, n) denotes the possibility that word position aj is aligned with word position j and describes the alignment relation between word positions; n(θi|wi) denotes the ability of word wi to align with θi words and describes the modifying capability of the word, θi ∈ [1, n].
4. The Chinese Affective Evaluation unit abstracting method according to claim 1, characterized in that the association graph of emotional relations and semantic relations between candidate evaluation objects and candidate emotion words in step S1 is denoted as an undirected graph G = (V, E), V = Vt ∪ Vo, E = Ett ∪ Eoo ∪ Eto, wherein Vt denotes the set of candidate evaluation objects, Vo denotes the set of candidate emotion words, Ett denotes the semantic relations between candidate evaluation objects in the candidate evaluation object set, Eoo denotes the semantic relations between candidate emotion words in the candidate emotion word set, and Eto denotes the emotional relations between candidate evaluation objects in the candidate evaluation object set and candidate emotion words in the candidate emotion word set;
the random walk model in step S2 performs a random walk over the association graph of emotional relations and semantic relations between candidate evaluation objects and candidate emotion words, and the random walk process is formulated as:
Ct=α MttCt+βMtoCo+γIt (3)
Co=α MooCo+β(Mto)TCt+γIo (4)
in formulas (3) and (4), Ct denotes the confidence of the candidate evaluation objects, Co denotes the confidence of the candidate emotion words, Mtt denotes the semantic similarity matrix between candidate evaluation objects, Moo denotes the semantic similarity matrix between candidate emotion words, and Mto denotes the emotional relation strength matrix between candidate evaluation objects and candidate emotion words; the vector It denotes the prior confidence of all candidate evaluation objects, and Io denotes the prior confidence of all candidate emotion words; α is the impact factor of the semantic relation, β is the impact factor of the emotional relation, γ is the impact factor of the prior confidence, and α + β + γ = 1; when α = 1, the confidence of an evaluation object or emotion word is influenced only by the semantic relation; similarly, when β = 1, the confidence of an evaluation object or emotion word is influenced only by the emotional relation; and when γ = 1, the confidence of an evaluation object or emotion word is influenced only by the prior confidence.
5. The Chinese Affective Evaluation unit abstracting method according to claim 1, characterized in that in step S2 the semantic similarity between candidate evaluation objects and the semantic similarity between candidate emotion words are obtained separately through a topic model, specifically as follows: the similarity between two words is measured by the KL distance, and the correlation measure over topics is used to represent the semantic similarity between evaluation objects or between emotion words; assuming two words wi and wj, their semantic similarity over the topics z is computed as follows:
in formula (5), p(z|wi) is the probability that word wi expresses topic z, and KLz(wi, wj) is the KL distance from word wi to word wj on topic z; in formula (6), D(wi, wj) is the sum over all topics of the averages of KLz(wi, wj) and KLz(wj, wi); in formula (7), SA(wi, wj) is the semantic similarity of word wi and word wj normalized to the interval (0, 1).
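A sketch of this topic-model similarity is given below; the exact per-topic KL term of formula (5) and the normalization of formula (7) are not reproduced, so the symmetric averaging and the 1/(1+D) mapping into (0, 1) used here are assumptions.

import math

def topic_similarity(p_z_wi, p_z_wj, eps=1e-12):
    """p_z_wi[k] = p(z_k | wi): topic distributions of the two words."""
    D = 0.0
    for pi, pj in zip(p_z_wi, p_z_wj):
        kl_ij = pi * math.log((pi + eps) / (pj + eps))    # KL term from wi to wj on this topic
        kl_ji = pj * math.log((pj + eps) / (pi + eps))    # KL term from wj to wi on this topic
        D += (kl_ij + kl_ji) / 2.0                        # average of the two directions
    return 1.0 / (1.0 + D)                                # smaller distance -> similarity near 1

print(topic_similarity([0.7, 0.2, 0.1], [0.6, 0.3, 0.1]))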
6. The Chinese Affective Evaluation unit abstracting method according to claim 1, characterized in that in step S2 the semantic similarity between candidate evaluation objects and the semantic similarity between candidate emotion words are obtained separately through word vectors, specifically as follows: the word2vec tool is used to represent each word as a vector; given two vectors xi = (xi1, xi2, xi3, …, xim)T and xj = (xj1, xj2, xj3, …, xjm)T, the cosine of the angle between them is used as the basis for measuring semantic similarity, and the semantic similarity is computed as follows:
cos(xi, xj) = (xi · xj) / (|xi| |xj|)   (8)
the emotional relation strength between a candidate evaluation object and a candidate emotion word is computed as follows:
p(wt|wo) = count(wt, wo) / count(wo),  p(wo|wt) = count(wt, wo) / count(wt)   (9)
OA(wt,wo)=ω p (wt|wo)+(1-ω)p(wo|wt) (10)
in formulas (9) and (10), p(wt|wo) and p(wo|wt) are conditional probabilities, count(wt, wo) is the number of times candidate evaluation object wt and candidate emotion word wo are mutually aligned in the word alignment model, and count(wo) and count(wt) are respectively the total number of times candidate emotion word wo and candidate evaluation object wt occur in the word alignment model; OA(wt, wo) denotes the emotional relation strength between candidate evaluation object wt and candidate emotion word wo, and ω is the weight factor trading off the two conditional probabilities p(wt|wo) and p(wo|wt); its value is set to 0.5 here, i.e. the two conditional probabilities influence the emotional relation strength equally.
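A minimal sketch of formulas (9) and (10), computing the emotional relation strength from the alignment counts (the toy counts are made up):

def emotion_strength(count_pair, count_wo, count_wt, omega=0.5):
    """OA(wt, wo) = omega * p(wt|wo) + (1 - omega) * p(wo|wt), with
    p(wt|wo) = count(wt, wo) / count(wo) and p(wo|wt) = count(wt, wo) / count(wt)."""
    return omega * (count_pair / count_wo) + (1 - omega) * (count_pair / count_wt)

# "屏幕" aligned with "清晰" 8 times; "清晰" occurs 10 times, "屏幕" occurs 20 times.
print(emotion_strength(count_pair=8, count_wo=10, count_wt=20))   # 0.6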
7. The Chinese Affective Evaluation unit abstracting method according to claim 4, characterized in that, for the random walk process of formulas (3) and (4),
in order to solve the problem of setting the parameter values of the semantic relation impact factor α, the emotional relation impact factor β and the prior confidence impact factor γ, information entropy is used to measure the amount of information of the word w corresponding to a word node v, which to a certain extent reflects the degree of uncertainty of the word w corresponding to the word node v;
for a candidate evaluation object node vt or a candidate emotion word node vo in the association graph of emotional relations and semantic relations between candidate evaluation objects and candidate emotion words, the random walk includes three walking options:
1) continuing the walk to an adjacent node: the impact factor parameter values for continuing the walk from a candidate evaluation object node vt or a candidate emotion word node vo to an adjacent node, that is, to another candidate evaluation object node or candidate emotion word node, are recorded separately for the two kinds of nodes;
2) stopping the random walk: the impact factor parameter values for stopping the random walk at a candidate evaluation object node vt or a candidate emotion word node vo are recorded separately for the two kinds of nodes;
3) jumping to the prior confidence: the impact factor parameter values for jumping from a candidate evaluation object node vt or a candidate emotion word node vo to the prior confidence are recorded separately for the two kinds of nodes;
after the impact factor parameter values are adjusted dynamically, the formulas of the random walk model are as follows:
in formulas (11) and (12), every element of the two all-ones vectors is set to 1; the vector It denotes the prior confidence of all candidate evaluation objects, and Io denotes the prior confidence of all candidate emotion words; Vt is the vector of candidate evaluation object nodes, and Vo is the vector of candidate emotion word nodes; one impact factor parameter value vector governs the random walk from candidate evaluation object nodes to adjacent nodes, and another governs the random walk from candidate emotion word nodes to adjacent nodes; similarly, one vector governs candidate evaluation object nodes jumping to the prior confidence, and another governs candidate emotion word nodes jumping to the prior confidence; one vector governs candidate evaluation object nodes stopping the random walk, and another governs candidate emotion word nodes stopping the random walk; τ is the weight factor trading off the two ways of walking to adjacent nodes, and its value is set to 0.5 here, i.e. when continuing the walk from a candidate evaluation object node vt or a candidate emotion word node vo to an adjacent node, the chance of walking to a candidate evaluation object node and the chance of walking to a candidate emotion word node are the same; the operator used in the formulas is defined as:
according to the heuristic, the random walk impact factor parameter values are obtained adaptively, computed as follows:
in formulas (13) and (14), vNULL is the "NULL" word node introduced in the word alignment model;
from formulas (13) and (14) it can be seen that, during the random walk, if the word w corresponding to a word node vt or vo has higher uncertainty, its entropy is larger and the corresponding weight term is smaller, so the random walk impact factor parameter value for continuing the walk from that node is also smaller; to a certain extent this reduces the influence of error propagation from highly uncertain words in the random walk.
8. The Chinese Affective Evaluation unit abstracting method according to claim 7, characterized in that, in the random walk model, the weight factor τ trading off the two ways of walking to adjacent nodes is fixed and therefore difficult to suit all words; because words carry uncertainty, this uncertainty likewise influences the two ways of walking to adjacent nodes; the higher the uncertainty of a word node vt or vo, the more emotion collocations it has, and when continuing the walk to adjacent nodes, the more likely the walk is to move through an emotional relation towards a word node vo or vt; therefore, the weight factor τ trading off the two ways of walking to adjacent nodes is improved so that it can adapt to different words, and the improved random walk formulas are as follows:
in formulas (15) and (16), the two weight factor vectors trade off the two ways of walking from a candidate evaluation object node or a candidate emotion word node to adjacent nodes: one denotes the weight factors for walking from a candidate evaluation object node vt or a candidate emotion word node vo through a semantic relation to another candidate evaluation object or candidate emotion word node, and the other denotes the weight factors for walking from a candidate evaluation object node vt or a candidate emotion word node vo through an emotional relation to a candidate emotion word or candidate evaluation object node;
based on the above idea, several schemes for these weight factor vectors are explored, with the specific formulas as follows:
Scheme one: the values of the weight factor vectors directly borrow the impact factor parameter value vectors for continuing the walk to adjacent nodes;
Scheme two: normalization is applied on the basis of scheme one;
Scheme three: the word information entropy is incorporated, and normalization is applied as well;
through the heuristic, the word information entropy of different nodes is computed based on the pairing situation of the word alignment, and the random walk impact factor parameter values are adjusted adaptively, which reduces the influence of error propagation from highly uncertain words in the random walk.


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190712