CN110222185A - Entity-associated sentiment information representation method - Google Patents

Entity-associated sentiment information representation method

Info

Publication number
CN110222185A
CN110222185A (application CN201910511692.9A)
Authority
CN
China
Prior art keywords
entity
text
word
word vector
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910511692.9A
Other languages
Chinese (zh)
Inventor
徐睿峰
梁斌
杜嘉晨
黄锦辉
何瑜岚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN201910511692.9A priority Critical patent/CN110222185A/en
Publication of CN110222185A publication Critical patent/CN110222185A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Abstract

The invention relates to a sentiment information classification method for associated entities. The method comprises: step 1), training large-scale word vectors on a Wikipedia corpus as the general word-vector representations of the words in a text; step 2), fine-tuning the word vectors for the different entities and entity attributes in the text using the Q-learning method from reinforcement learning, so that a word has different vector representations when modifying different entities or entity attributes; step 3), applying the learned word sentiment-information vector representations to an entity-level text sentiment analysis task. With this method, the sentiment polarities of different entities or entity attributes can be effectively distinguished without using an attention mechanism.

Description

Entity-associated sentiment information representation method
Technical field
The invention belongs to the technical field of sentiment information representation, and in particular relates to an entity-associated sentiment information representation method.
Background technique
Text sentiment analysis determines the sentiment polarity of a text by analyzing, summarizing, and otherwise processing the text. In a text sentiment analysis task, the words of the text, and in particular words that carry emotional coloring, directly affect the text's sentiment polarity. In an entity-level sentiment analysis task, the sentiment polarity must be judged separately for the different entities in the text, which requires considering not only the text itself but also the information about the different entities it contains. In real text data, a single text often mentions multiple entities, and different entities carry different sentiment expressions. Moreover, even when the same modifier is applied to different entities, completely opposite sentiment polarities can result. For example, in "the noise of the car is very big" and "the space of the car is very big", the same word "big" describes an attribute of the entity "car", yet it expresses negative sentiment when describing the car's noise and positive sentiment when describing the car's space.
There are many traditional word representation methods, such as one-hot representations and word-vector representations that express a word as a vector of continuous values (continuous bag-of-words, CBOW, and skip-gram). By expressing a word as a multi-dimensional vector that a model can learn and adjust, such methods capture the word's characteristic information in the text. However, these methods usually consider only the word itself and its dependencies on other words in the text, so a word has exactly the same vector representation across different scenes, different entities, and different entity attributes. For entity-level sentiment analysis tasks, the commonly used approaches either concatenate the representation of a specific entity with the individual word representations to construct new word representations, or introduce external knowledge bases or dependency parsing to obtain the links between words and entities. Although these methods can, to some extent, address the word representation problem in multi-entity text sentiment analysis, they still have shortcomings:
1. Methods that concatenate entity vectors add the same vector information to every word and therefore cannot effectively distinguish how much different words contribute to an entity or entity attribute;
2. Methods that incorporate external knowledge depend heavily on the quality of that knowledge; when the introduced information is inappropriate, it instead makes the model harder to train;
3. None of these methods constructs distinct vector representations for different words with respect to a specific entity or entity attribute, so a word does not obtain different representations when modifying different entities, and the importance of individual words is not distinguished.
Summary of the invention
To overcome the shortcomings of the prior art, the present invention proposes an entity-associated sentiment information representation method that performs targeted fine-tuning of word vectors without using external knowledge, so that a word has different vector representations when associated with different entities and the sentiment polarities of different entities or entity attributes can be effectively distinguished.
To achieve the above goal, the technical solution adopted by the present invention is as follows:
A kind of emotion information representation method of associated entity, which is characterized in that this method includes the following steps:
Step 1), training large-scale word vectors on a Wikipedia corpus as the general word-vector representations of the words in the text;
Step 2), fine-tuning the word vectors for the different entities and entity attributes in the text using the Q-learning method from reinforcement learning, so that a word has different vector representations when modifying different entities or entity attributes;
Step 3), applying the learned word sentiment-information vector representations to a specific text sentiment analysis task.
The next word is chosen with an ε-greedy policy, and different entities are assigned different reward values.
Compared with the prior art, the advantages of the present invention are:
1. The proposed method of fine-tuning word vectors with Q-learning from reinforcement learning performs targeted vector fine-tuning of words without using external knowledge, so that a word has different vector representations when associated with different entities;
2. With the ε-greedy method, the sentiment links between an entity or entity attribute and words that lie far from it in the text can still be captured;
3. By representing the input text with the fine-tuned word vectors proposed by the present invention, the sentiment polarities of different entities or entity attributes can be effectively distinguished without using an attention mechanism.
Detailed description of the invention
Fig. 1 shows the training of the general word vectors;
Fig. 2 shows the classification model that uses the fine-tuned word vectors.
Specific embodiment
The present invention is further described below with reference to the accompanying drawings and specific embodiments.
The present invention is an entity-associated sentiment information representation method.
The main steps of the method are:
Step 1: train large-scale word vectors on a Wikipedia corpus as the general word-vector representations of the words in the text;
Step 2: fine-tune the word vectors for the different entities and entity attributes in the text using the Q-learning method from reinforcement learning, so that a word has different vector representations when modifying different entities or entity attributes;
Step 3: apply the learned word sentiment-information vector representations to a specific text sentiment analysis task.
A schematic diagram of the method is shown in Figs. 1 and 2.
In step 1 of the above method, the general word vectors are trained on a large-scale Wikipedia corpus, specifically as follows (shown in Fig. 1; an illustrative code sketch follows the list):
1. Crawl a sufficient amount of corpus text from Wikipedia and preprocess it, filtering out text that is of no use to the task;
2. Train word vectors on the Wikipedia corpus with a deep language-model network (ASGD Weight-Dropped Long-Short Term Memory, AWD-LSTM) to obtain the word-vector set of the vocabulary.
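The patent does not spell out the training loop, so the following is only a minimal sketch under stated assumptions: a plain PyTorch LSTM language model stands in for AWD-LSTM (its ASGD optimizer and weight-drop regularization are omitted), and the learned embedding table is taken as the general word-vector set. All class and function names, dimensions, and hyperparameters are illustrative, not from the patent.

```python
import torch
import torch.nn as nn

class LMEmbeddingTrainer(nn.Module):
    """Simplified language model; its embedding table serves as the
    general word vectors of step 1 (weight drop / ASGD omitted)."""
    def __init__(self, vocab_size, emb_dim=300, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))
        return self.out(h)                        # next-word logits per position

def train_general_vectors(batches, vocab_size, epochs=3):
    """batches: iterable of LongTensor [batch, seq_len] built from the
    preprocessed Wikipedia sentences."""
    model = LMEmbeddingTrainer(vocab_size)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for token_ids in batches:
            logits = model(token_ids[:, :-1])     # predict the next token
            loss = loss_fn(logits.reshape(-1, vocab_size),
                           token_ids[:, 1:].reshape(-1))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model.embed.weight.detach()            # general word-vector set
```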
In step 2 of the above method, the word vectors are fine-tuned on the task-specific corpus using Q-learning from reinforcement learning together with the AWD-LSTM network:
v_{s,w} = v_{s,w} + α(r_i + γ max_{w'} v_{s',w'} - v_{s,w})
where v_{s,w} is the vector representation of the current word, v_{s',w'} is the vector representation reached when moving from the current word to the next word, r_i is the reward given for this word move with respect to entity or entity attribute i, α is the learning rate, and γ is the reward decay coefficient. In the present invention, words are traversed one by one toward a chosen entity or entity attribute: each intermediate move receives a reward of 0, and a specific reward r_i is given when the traversal reaches entity or entity attribute i. By setting different rewards for different entities and entity attributes, different words can be adjusted in a targeted way during learning; at the same time, moving word by word also distinguishes the degree to which different words influence the sentiment of the entity or entity attribute.
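For illustration only, the sketch below applies the update above in the simplest setting the description suggests: the traversal steps deterministically toward the entity, so the max over the next word w' reduces to that single word's vector, rewards are 0 until the entity is reached, and the update is applied element-wise to the word's entity-specific vector. These are assumptions; the patent leaves the exact formulation open, and every name here is hypothetical.

```python
import numpy as np

def q_finetune_pass(vectors, sent, entity_pos, reward_i, alpha=0.01, gamma=0.9):
    """One fine-tuning pass over one sentence for one entity/attribute.

    vectors    : dict word -> np.ndarray, entity-specific copy of the word vectors
    sent       : list of words in the sentence
    entity_pos : index of the entity or entity attribute in `sent`
    reward_i   : reward r_i paid when the traversal reaches the entity
    """
    for pos in range(len(sent)):
        if pos == entity_pos:
            continue
        nxt = pos + (1 if pos < entity_pos else -1)   # step toward the entity
        r = reward_i if nxt == entity_pos else 0.0    # 0 for intermediate moves
        v_cur, v_nxt = vectors[sent[pos]], vectors[sent[nxt]]
        # v_{s,w} <- v_{s,w} + alpha * (r_i + gamma * v_{s',w'} - v_{s,w})
        vectors[sent[pos]] = v_cur + alpha * (r + gamma * v_nxt - v_cur)
    return vectors
```

In this reading, `vectors` would be a per-entity (or per-attribute) copy of the general word vectors from step 1, so that after fine-tuning the same word carries different representations for different entities.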
In addition, in real text, words that are highly relevant to an entity may appear far away from that entity; in such cases the fine-tuning method above cannot learn the sentiment links between those words and the entity or entity attribute well. To address this, the present invention chooses the next word with an ε-greedy policy: in the method above, a word is selected at random from the text with probability ε. In this way, the sentiment links to the entity or entity attribute of words that are far from the entity but have an important influence can still be captured effectively.
In the word-vector fine-tuning method, the mean squared error is used as the objective function:
L(v) = E[(r_i + γ max_{w'} v_{s',w'} - v_{s,w})^2]
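Continuing the hypothetical helpers above, here is a minimal sketch of the ε-greedy choice of the next word and of the per-move squared error over which the expectation in L(v) would be estimated; again, the concrete formulation is an assumption, not prescribed by the patent.

```python
import numpy as np

def choose_next(pos, entity_pos, sent_len, epsilon=0.1, rng=np.random.default_rng()):
    """epsilon-greedy move: usually step toward the entity, but with probability
    epsilon jump to a random word, so that distant but sentiment-bearing words
    still get linked to the entity."""
    if rng.random() < epsilon:
        return int(rng.integers(sent_len))            # random word in the text
    return pos + (1 if pos < entity_pos else -1)      # greedy step toward the entity

def squared_td_error(v_cur, v_nxt, r, gamma=0.9):
    """Per-move term of L(v) = E[(r_i + gamma * v_{s',w'} - v_{s,w})^2],
    averaged over vector dimensions; the expectation is taken empirically
    over the moves sampled during fine-tuning."""
    return float(np.mean((r + gamma * v_nxt - v_cur) ** 2))
```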
In step 3 of the above method, a conventional long short-term memory network (Long Short-Term Memory, LSTM) is used to perform entity-level text sentiment analysis on the specific corpus. The specific method is as follows (shown in Fig. 2):
Step 31): represent the input text with the fine-tuned word vectors, and feed the text into the LSTM network in time order.
Step 32): learnt and adjusted the abstract of the available text of ginseng to the term vector matrix in 1 by LSTM network Change character representation:
H = [h_1, h_2, ..., h_n]
Step 33): taking the abstract features of the last network layer obtained in step 32) as the input of a fully connected layer, the sentiment analysis result for the associated entity is obtained through the softmax function:
y = softmax(W h_n + b).
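A minimal sketch of the step-3 classifier: the sentence, embedded with the fine-tuned entity-specific vectors, is run through an LSTM, and the last hidden state h_n is mapped to sentiment classes by a fully connected layer with softmax. Layer sizes and the class count are illustrative assumptions, not values given in the patent.

```python
import torch
import torch.nn as nn

class EntitySentimentLSTM(nn.Module):
    def __init__(self, emb_dim=300, hidden_dim=128, num_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)   # W, b of y = softmax(W h_n + b)

    def forward(self, finetuned_vectors):
        # finetuned_vectors: [batch, seq_len, emb_dim], built from the
        # entity-specific word vectors produced in step 2
        H, _ = self.lstm(finetuned_vectors)            # H = [h_1, ..., h_n]
        h_n = H[:, -1, :]                              # last hidden state
        return torch.softmax(self.fc(h_n), dim=-1)     # sentiment class distribution
```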
In summary, the method performs targeted fine-tuning of word vectors without using external knowledge, so that a word has different vector representations when associated with different entities; with the ε-greedy method, the sentiment links between an entity or entity attribute and words far from it in the text can be captured; and by representing the input text with the fine-tuned word vectors, the sentiment polarities of different entities or entity attributes can be effectively distinguished without using an attention mechanism.
The above is a further detailed description of the present invention in conjunction with specific preferred embodiments, but the specific implementation of the present invention shall not be considered limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, several simple deductions or substitutions may be made without departing from the concept of the present invention, and all of these shall be regarded as falling within the protection scope of the present invention.

Claims (6)

1. An entity-associated sentiment information representation method, characterized in that it comprises:
Step 1), training large-scale word vectors on a large-scale text corpus as the general word-vector representations of the words in the text;
Step 2), fine-tuning the word vectors for the different entities and entity attributes in the text using the Q-learning method from reinforcement learning, so that a word has different vector representations when modifying different entities or entity attributes;
Step 3), applying the learned word sentiment-information vector representations to a specific text sentiment analysis task.
2. The method according to claim 1, characterized in that in step 1), the specific steps of training the general word vectors are as follows: crawl a large amount of text corpus from the internet and preprocess the corpus, removing irrelevant symbols and stop words from the text; then train word vectors on the large-scale corpus with a deep language-model network (ASGD Weight-Dropped Long-Short Term Memory, AWD-LSTM) to obtain the word-vector set of the words.
3. The method according to claim 1 or 2, characterized in that in step 2), the word vectors are fine-tuned on the task-specific corpus using Q-learning from reinforcement learning together with the AWD-LSTM network:
v_{s,w} = v_{s,w} + α(r_i + γ max_{w'} v_{s',w'} - v_{s,w})
where v_{s,w} is the vector representation of the current word, v_{s',w'} is the vector representation reached when moving from the current word to the next word, r_i is the reward given for this word move with respect to entity or entity attribute i, α is the learning rate, and γ is the reward decay coefficient.
4. The method according to claim 1 or 2, characterized in that in step 3), a long short-term memory network (Long Short-Term Memory, LSTM) is used to perform entity-level text sentiment analysis on the specific corpus.
5. The method according to claim 4, characterized in that the specific steps of performing entity-level text sentiment analysis on the specific corpus with the LSTM network are as follows:
Step 31): represent the input text with the fine-tuned word vectors, and feed the text into the LSTM network in time order;
Step 32): the LSTM network learns from, and adjusts the parameters over, the word-vector matrix of step 31) to obtain the abstract feature representation of the text:
H = [h_1, h_2, ..., h_n]
Step 33): taking the abstract features of the last network layer obtained in step 32) as the input of a fully connected layer, obtain the sentiment analysis result for the associated entity through the softmax function:
y = softmax(W h_n + b)
where W is a weight matrix and b is a bias.
6. The method according to claim 3, characterized in that the next word is chosen with ε-greedy, and different entities are assigned different reward values.
CN201910511692.9A 2019-06-13 2019-06-13 Entity-associated sentiment information representation method Pending CN110222185A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910511692.9A CN110222185A (en) 2019-06-13 2019-06-13 Entity-associated sentiment information representation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910511692.9A CN110222185A (en) 2019-06-13 2019-06-13 Entity-associated sentiment information representation method

Publications (1)

Publication Number Publication Date
CN110222185A true CN110222185A (en) 2019-09-10

Family

ID=67816893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910511692.9A Pending CN110222185A (en) Entity-associated sentiment information representation method

Country Status (1)

Country Link
CN (1) CN110222185A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241453A (en) * 2020-10-20 2021-01-19 虎博网络技术(上海)有限公司 Emotion attribute determining method and device and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6963839B1 (en) * 2000-11-03 2005-11-08 At&T Corp. System and method of controlling sound in a multi-media communication application
CN107066446A (en) * 2017-04-13 2017-08-18 广东工业大学 A kind of Recognition with Recurrent Neural Network text emotion analysis method of embedded logic rules
CN107301171A (en) * 2017-08-18 2017-10-27 武汉红茶数据技术有限公司 A kind of text emotion analysis method and system learnt based on sentiment dictionary
CN108629690A (en) * 2018-04-28 2018-10-09 福州大学 Futures based on deeply study quantify transaction system
CN109857848A (en) * 2019-01-18 2019-06-07 深圳壹账通智能科技有限公司 Interaction content generation method, device, computer equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6963839B1 (en) * 2000-11-03 2005-11-08 At&T Corp. System and method of controlling sound in a multi-media communication application
CN107066446A (en) * 2017-04-13 2017-08-18 广东工业大学 A kind of Recognition with Recurrent Neural Network text emotion analysis method of embedded logic rules
CN107301171A (en) * 2017-08-18 2017-10-27 武汉红茶数据技术有限公司 A kind of text emotion analysis method and system learnt based on sentiment dictionary
CN108629690A (en) * 2018-04-28 2018-10-09 福州大学 Futures based on deeply study quantify transaction system
CN109857848A (en) * 2019-01-18 2019-06-07 深圳壹账通智能科技有限公司 Interaction content generation method, device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
He Yanxiang, Sun Songtao, Niu Feifei, Li Fei: "用于微博情感分析的一种情感语义增强的" (a sentiment-semantics-enhanced ... for microblog sentiment analysis; title truncated in the source), 《计算机学报》 (Chinese Journal of Computers) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241453A (en) * 2020-10-20 2021-01-19 虎博网络技术(上海)有限公司 Emotion attribute determining method and device and electronic equipment
CN112241453B (en) * 2020-10-20 2023-10-13 虎博网络技术(上海)有限公司 Emotion attribute determining method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN108021616B (en) Community question-answer expert recommendation method based on recurrent neural network
US8818926B2 (en) Method for personalizing chat bots
US9396724B2 (en) Method and apparatus for building a language model
US9454958B2 (en) Exploiting heterogeneous data in deep neural network-based speech recognition systems
CN107818164A (en) A kind of intelligent answer method and its system
US20060206333A1 (en) Speaker-dependent dialog adaptation
CN110188202A (en) Training method, device and the terminal of semantic relation identification model
Armbrust A history of new media in the Arab Middle East
KR102415101B1 (en) A device that analyzes the emotions of the examinee using voice data, text data, and picture data extracted from the subject's voice
CN110825850B (en) Natural language theme classification method and device
EP3270374A1 (en) Systems and methods for automatic repair of speech recognition engine output
CN110210027A (en) Fine granularity sentiment analysis method, apparatus, equipment and medium based on integrated study
CN110134863A (en) The method and device that application program is recommended
Zhao et al. Domain-oriented prefix-tuning: Towards efficient and generalizable fine-tuning for zero-shot dialogue summarization
CN114818703A (en) Multi-intention recognition method and system based on BERT language model and TextCNN model
CN115630145A (en) Multi-granularity emotion-based conversation recommendation method and system
CN110222185A (en) Entity-associated sentiment information representation method
CN107734123A (en) A kind of contact sequencing method and device
CN112434165A (en) Ancient poetry classification method and device, terminal equipment and storage medium
Hirzel et al. I can parse you: Grammars for dialogs
Wijayanti et al. Illocutionary Acts in Main Character's Dialogue of “Maleficent: Mistress of Evil” Movie
Tomko et al. Speech graffiti vs. natural language: Assessing the user experience
Zhang et al. The iscslp 2022 intelligent cockpit speech recognition challenge (icsrc): Dataset, tracks, baseline and results
CN113627155A (en) Data screening method, device, equipment and storage medium
Zhan et al. Application of machine learning and image target recognition in English learning task

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190910