CN109376251A - Method for constructing a microblog Chinese sentiment dictionary based on a term vector learning model - Google Patents

Method for constructing a microblog Chinese sentiment dictionary based on a term vector learning model

Info

Publication number
CN109376251A
Authority
CN
China
Prior art keywords
term vector
learning model
microblogging
training
word
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811143903.XA
Other languages
Chinese (zh)
Inventor
葛季栋
李传艺
孔力
杨玉凡
冯奕
周筱羽
骆斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Application filed by Nanjing University
Priority to CN201811143903.XA
Publication of CN109376251A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a method for constructing a microblog Chinese sentiment dictionary based on a term vector learning model, comprising: (1) obtaining a training corpus suited to the characteristics of current microblog data; (2) preprocessing the training corpus; (3) constructing a candidate dictionary; (4) constructing a seed sentiment dictionary; (5) selecting and defining the training parameters and configuration; (6) training the term vector learning model; (7) evaluating the training result of the term vector learning model; (8) repeating step (6) until all parameter settings have been traversed; (9) selecting the term vectors with the best evaluation result; (10) training a word-level sentiment polarity classifier; (11) applying the word-level sentiment polarity classifier to obtain the final target sentiment dictionary. The invention designs a term vector learning model that combines semantic and sentiment information and, on that basis, a microblog-oriented method for constructing a Chinese sentiment dictionary, which can improve the efficiency and quality of obtaining a Chinese sentiment dictionary.

Description

Method for constructing a microblog Chinese sentiment dictionary based on a term vector learning model
Technical field
The present invention relates to term vector learning technology, and in particular to a method for constructing a microblog Chinese sentiment dictionary based on a term vector learning model. It belongs to the technical field of natural language processing.
Background art
Sentiment analysis, also known as opinion mining or orientation analysis, is an important branch of natural language processing. Its task is to use computing resources to help people quickly obtain, organize, and analyze review information, and to analyze, process, summarize, and reason about subjective texts that carry emotional coloring. In recent years, with the spread and development of the Internet, and especially the rise of social networks, users publish and propagate hundreds of millions of messages every day. A large share of this text expresses users' opinions and sentiment orientation. Such texts are a valuable opinion resource: they contain people's differing views and positions on social phenomena, across topics spanning politics, the economy, the military, entertainment, and daily life. Individuals and organizations pay increasing attention to users' sentiment orientation and opinions and use the analysis results in decision making. Automatic analysis with computer technology therefore has wide application in public opinion analysis, precision marketing, sales forecasting, and similar fields, and has attracted broad attention from enterprises, researchers, and government agencies.
Sentiment analysis is a relatively new and popular research field that began after 2000. Its sub-directions include sentiment classification, opinion extraction, opinion question answering, and opinion summarization. Classifying the sentiment orientation of a text can be regarded as sentiment analysis in the narrow sense and falls within text classification; extracting elements such as the opinion holder and the evaluation object is an information extraction problem; and identifying opinions about a given object from a large body of text can be regarded as an information retrieval problem.
For sentiment analysis, a high-quality sentiment dictionary is of great help. A sentiment dictionary differs from an ordinary dictionary in that each word is annotated not with its meaning or a foreign-language translation but with its sentiment polarity. The polarity labels may be coarse-grained ("positive", "negative", "neutral") or fine-grained ("anger", "fear", "liking", and so on). In addition to the polarity category, the strength of the polarity can also be given to express the emotional intensity of the word. Sentiment dictionaries fall into three classes: basic sentiment dictionaries, expanded sentiment dictionaries, and domain sentiment dictionaries. A basic sentiment dictionary contains basic, widely accepted sentiment words such as "fine", "beautiful", "villain", and "ugly". An expanded sentiment dictionary is derived from a basic one, mainly by extending the sentiment words through a synonym dictionary. A basic sentiment dictionary alone is not enough to identify the sentiment words in a sentence, because words absent from it may still carry sentiment in certain domains. For example, in "this mobile phone always blue-screens", "blue screen" carries negative sentiment in the domain of digital products such as mobile phones, so a domain dictionary is also needed.
A complete sentiment dictionary is a necessary but not sufficient condition for sentiment analysis; the sentiment of a text correlates strongly with the sentiment of the words it contains. With a sentiment dictionary, one can judge whether each word in a sentence carries a positive or negative sentiment orientation, or obtain finer-grained emotional coloring and intensity, which provides a reference for judging the sentiment orientation of a sentence or a document. How to automatically construct a sentiment dictionary with wide coverage and high quality is therefore of great research significance. At present, English sentiment dictionaries have produced many good results, whereas Chinese sentiment dictionaries, although some exist, still need improvement in quality.
Summary of the invention
The present invention is a method for constructing a microblog Chinese sentiment dictionary based on a term vector learning model. Addressing the task of constructing a microblog Chinese sentiment dictionary, and drawing on existing term vector learning methods and the characteristics of the Chinese language, it proposes a term vector learning model that combines semantic and sentiment information and, taking the characteristics of microblog data into account, a method for constructing a microblog Chinese sentiment dictionary with that model. During construction, microblog sentences are preprocessed in a targeted way and the training process of the term vector learning model is optimized, which improves the semantic and sentiment expressiveness of the resulting term vectors and ultimately the quality of the resulting Chinese sentiment dictionary.
The method for constructing a microblog Chinese sentiment dictionary based on a term vector learning model according to the present invention is characterized by comprising the following steps:
Step (1) obtain a training corpus suited to the characteristics of current microblog data;
Step (2) preprocess the obtained microblog training corpus;
Step (3) construct a candidate dictionary, used as the vocabulary in the subsequent training of the term vector learning model;
Step (4) construct a seed sentiment dictionary;
Step (5) select and define the training parameters and configuration of the term vector learning model;
Step (6) train the term vector learning model;
Step (7) evaluate the training result of the term vector learning model;
Step (8) repeat step (6) until all parameter settings have been traversed;
Step (9) select the term vectors with the best evaluation result as the final vector features of the words;
Step (10) train a word-level sentiment polarity classifier;
Step (11) apply the word-level sentiment polarity classifier to obtain the final target sentiment dictionary.
Specifically, step (1) obtains a training corpus suited to the characteristics of current microblog data.
Step (2) preprocesses the obtained microblog training corpus to derive, from the raw corpus, data that can be used directly for model training. It mainly includes the following sub-steps:
Step (2.1) data normalization: extract the useful information from microblog sentences;
Step (2.2) sentiment parsing of microblog sentences: label each sentence with its sentiment polarity;
Step (2.3) Chinese word segmentation of microblog sentences;
Step (2.4) stop word setup: filter out words that are meaningless for training the term vector learning model.
Step (3) constructs the candidate dictionary from the preprocessed microblog corpus.
Step (4) constructs the seed sentiment dictionary, which is used in the subsequent term vector learning process and in training the final word-level sentiment polarity classifier. It mainly includes the following sub-steps:
Step (4.1) construct a basic seed sentiment dictionary from the candidate dictionary;
Step (4.2) enlarge the seed sentiment dictionary by synonym expansion.
Step (5) selects and defines the parameters and configuration of the term vector learning model. It mainly includes the following sub-steps:
Step (5.1) non-compositional word list setup: filter out words whose constituent characters' semantics need not be considered when training the term vector learning model;
Step (5.2) set the training parameters of the term vector learning model;
Step (5.3) set the evaluation criteria of the term vector learning model.
Step (6) trains the term vector learning model on the training corpus and obtains the corresponding term vectors.
Step (7) evaluates the trained term vectors. The evaluation uses the term vectors in a word-level sentiment polarity classification task and judges their quality by accuracy and macro-averaged F score.
Step (8) adjusts the training parameters and repeats step (6) until all parameter settings have been traversed.
Step (9) selects, based on the evaluation results under the different training parameters, the term vectors with the best result as the final vector features of the words.
Step (10) trains a word-level sentiment polarity classifier on the obtained term vectors.
Step (11) applies the word-level sentiment polarity classifier to infer the sentiment polarity of the words in the candidate dictionary and forms the final target sentiment dictionary.
Compared with the prior art, the present invention has the following notable advantages. Regular expressions and similar techniques are used to strip irrelevant information from microblog sentences, preventing it from affecting the training result of the term vector learning model. Stop words are used to remove the influence of meaningless words on term vector training, which reduces noise words and computational complexity. Non-compositional words are used to exclude words whose constituent characters' semantics need not be considered, which reduces computational complexity. Term vectors are trained with a term vector learning model that combines semantic and sentiment information. The model combines three kinds of information: the semantic information of a word's context and of the characters that make up the word, the sentiment polarity of the sentence, and the sentiment polarity of the word. Term vectors obtained with this model better express the semantic and sentiment features of words.
Brief description of the drawings
Fig. 1: Construction process of the microblog Chinese sentiment dictionary based on the term vector learning model
Fig. 2: Seed sentiment dictionary expansion process
Fig. 3: Number of words in each class of the seed sentiment dictionary
Fig. 4: Term vector learning model combining semantic and sentiment information
Fig. 5: Term vector evaluation flow chart
Fig. 6: Word-level sentiment polarity classifier training flow chart
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings and specific embodiments.
The object of the present invention is to provide an efficient and accurate method for constructing a Chinese sentiment dictionary, namely a method for constructing a microblog Chinese sentiment dictionary based on a term vector learning model. Valid data are filtered out with regular expressions, sentiment is labeled using emoticons, the seed sentiment dictionary is constructed semi-automatically, and term vectors are trained with a novel term vector learning model that combines semantic and sentiment information; together these measures improve the quality of the resulting Chinese sentiment dictionary. The invention mainly includes the following steps:
Step (1) obtain a training corpus suited to the characteristics of current microblog data;
Step (2) preprocess the obtained microblog training corpus;
Step (3) construct a candidate dictionary, used as the vocabulary in the subsequent training of the term vector learning model;
Step (4) construct a seed sentiment dictionary;
Step (5) select and define the training parameters and configuration of the term vector learning model;
Step (6) train the term vector learning model;
Step (7) evaluate the training result of the term vector learning model;
Step (8) repeat step (6) until all parameter settings have been traversed;
Step (9) select the term vectors with the best evaluation result as the final vector features of the words;
Step (10) train a word-level sentiment polarity classifier;
Step (11) apply the word-level sentiment polarity classifier to obtain the final target sentiment dictionary.
The detailed workflow of the above method is shown in Fig. 1. Each step is described in detail below.
1. Because the ways users express themselves on microblogs vary endlessly and new words emerge constantly, the corpus is chosen to be rich in emotional expression and representative of current usage, so as to maximize the accuracy and timeliness of the resulting sentiment dictionary. Because users increasingly express opinions through emoticons, and a large number of microblog sentences contain emotionally rich emoticons, only sentences that contain emoticons are collected when crawling microblog data.
2. To extract valuable information from the microblog training corpus and discard meaningless content, the data are preprocessed. This specifically includes the following sub-steps:
(2.1) Data normalization. The collected microblog data are everyday comments posted by users and lack the regularity of formal documents, and part of the information is useless for model training: punctuation in the text (commas, full stops, exclamation marks, and so on), web links, and other forms of information (such as #topic#, @user, and other special symbols). In addition, because the goal is to build a Chinese sentiment dictionary, English words in microblog sentences must be removed. Regular expressions are used to strip this redundant information from the microblog text, leaving only the valuable text;
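The normalization in (2.1) could be sketched as follows. This is a minimal illustration, not the patent's implementation: the patterns, the `normalize` name, and the decision to keep only characters in the basic CJK range (which also drops digits) are all assumptions.

```python
import re

# Illustrative patterns only; the patent does not list its expressions.
URL_RE = re.compile(r"https?://[A-Za-z0-9./?=&_%-]+")   # web links
TOPIC_RE = re.compile(r"#[^#]+#")                        # #topic# markers
MENTION_RE = re.compile(r"@[^\s::]+")                   # @user mentions
EMOTICON_RE = re.compile(r"\[[^\[\]]+\]")                # [emoticon] tags
NON_CHINESE_RE = re.compile(r"[^\u4e00-\u9fff]+")        # punctuation, English, digits


def normalize(text: str) -> str:
    """Return only the Chinese characters that remain after removing noise."""
    # Structured noise goes first, so that Chinese text inside a topic,
    # mention, or emoticon tag does not leak into the cleaned sentence.
    for pattern in (URL_RE, TOPIC_RE, MENTION_RE, EMOTICON_RE):
        text = pattern.sub(" ", text)
    # Everything outside the basic CJK block is dropped last.
    return NON_CHINESE_RE.sub("", text)
```

Because this sketch also strips emoticon tags, any emoticon-based labeling would have to read the raw sentence before `normalize` is applied.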
(2.2) Labeling the sentiment polarity of microblog sentences. Only data containing emoticons are collected, and sentences are labeled with the following strategy: if a microblog sentence contains only emoticons that express positive emotion, its sentiment polarity is positive; conversely, its sentiment polarity is negative. In the implementation, two emoticon sets must be compiled; for example, [relative], [heart], and [struggle] express positive emotion, while [sad], [cursing in rage], and [terrified] express negative emotion. Using this strategy and the emoticon sets, the sentiment information of each microblog text is parsed by regular expression matching, so that every microblog sentence is finally labeled with a sentiment polarity.
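A minimal sketch of the emoticon-based labeling in (2.2) follows. The emoticon tags are English placeholders taken from the examples above, not the actual microblog tag names, and returning `None` for sentences with mixed or unknown emoticons is an assumption; the text only specifies the case where all emoticons share one polarity.

```python
import re
from typing import Optional

# Placeholder emoticon sets (assumed); a real run would use the platform's tags.
POSITIVE_EMOTICONS = {"[heart]", "[struggle]"}
NEGATIVE_EMOTICONS = {"[sad]", "[terrified]"}

EMOTICON_RE = re.compile(r"\[[^\[\]]+\]")


def label_polarity(text: str) -> Optional[str]:
    """Label a raw microblog sentence by the emoticons it contains."""
    tags = set(EMOTICON_RE.findall(text))
    has_positive = bool(tags & POSITIVE_EMOTICONS)
    has_negative = bool(tags & NEGATIVE_EMOTICONS)
    if has_positive and not has_negative:
        return "positive"
    if has_negative and not has_positive:
        return "negative"
    # Mixed or unrecognized emoticons: left unlabeled in this sketch.
    return None
```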
(2.3) Chinese word segmentation. Because the goal of the method is to build a microblog Chinese sentiment dictionary, the Chinese word is the basic unit of operation. In the implementation, the Jieba segmentation tool is used. It offers three segmentation modes: precise mode, full mode, and search engine mode. Precise mode cuts the sentence most accurately and is suited to text analysis; full mode returns every word that can be found in the sentence and is prone to ambiguity; search engine mode produces results suited to search engine indexing. Given these characteristics, the method uses precise mode. Every sentence in the microblog sentence set is segmented in turn.
(2.4) Stop word setup. The natural-language text in which users express emotion takes many forms and contains a large number of pronouns, conjunctions, and interjections, such as "oh", "I", and "then". These words contribute nothing to training the term vector learning model or to the final sentiment dictionary. A stop word list is therefore set up before model training, and the words in it are removed during training to reduce their negative effect. In the implementation, words in the segmented microblog sentences that appear in the stop word list are removed, and the filtered sentences serve as the corpus for subsequent training.
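Stop word filtering as described in (2.4) could look like the sketch below. It assumes each sentence has already been segmented into a list of tokens (for example by Jieba in precise mode); the stop word list shown is a tiny illustrative sample, not the list used in the patent.

```python
from typing import Iterable, List, Set

# Illustrative sample only; a real stop word list would be far longer.
STOP_WORDS: Set[str] = {"哦", "我", "然后", "的", "了"}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Drop every token that appears in the stop word list, keeping order."""
    return [token for token in tokens if token not in stop_words]


def filter_corpus(sentences: Iterable[Iterable[str]]) -> List[List[str]]:
    """Apply stop word removal to each segmented sentence, skipping empty results."""
    filtered = (remove_stop_words(sentence) for sentence in sentences)
    return [sentence for sentence in filtered if sentence]
```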
3. Construct the candidate dictionary, that is, the word set needed in the subsequent steps. The words in the preprocessed corpus are first sorted by frequency of occurrence, and words that occur too rarely are removed: a frequency threshold MIN_FREQUENCY is set, all words whose frequency falls below it are discarded, and the remaining words form the candidate dictionary used in the subsequent steps. In the actual implementation the threshold is set to 10.
4. Construct the seed sentiment dictionary, which is used in the subsequent term vector learning process and in training the final word-level sentiment polarity classifier. This specifically includes the following sub-steps:
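The frequency filter in step 3 can be sketched with a counter. `MIN_FREQUENCY = 10` comes from the text; the function name and the input shape (a list of token lists) are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List

MIN_FREQUENCY = 10  # threshold stated in the description


def build_candidate_dictionary(
    sentences: Iterable[Iterable[str]], min_frequency: int = MIN_FREQUENCY
) -> List[str]:
    """Return words sorted by descending frequency, dropping rare words."""
    counts = Counter(token for sentence in sentences for token in sentence)
    # Words whose count falls below the threshold are removed.
    return [word for word, count in counts.most_common() if count >= min_frequency]
```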
(4.1) Construct the basic seed sentiment dictionary from the candidate dictionary. The basic seed sentiment dictionary is a small, highly reliable dictionary built by manual annotation. Five annotators are selected, and the 500 most frequent words are taken from the candidate dictionary. Each annotator independently labels the sentiment polarity of these words with one of three values: positive, negative, or other. Words whose labels agree across all five annotations go into the basic dictionary directly; for words with differing labels, the majority-vote value is taken as the annotation result.
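The vote resolution in (4.1) might be sketched as below. With five annotators and three labels a 2-2-1 split is possible; the text does not say how such ties are handled, so returning `None` for them is an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional


def resolve_label(votes: List[str]) -> Optional[str]:
    """Return the unanimous or majority label, or None when the top labels tie."""
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # tie between the two most frequent labels (assumed policy)
    return ranked[0][0]


def build_basic_seed_dictionary(annotations: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each annotated word to its resolved label, skipping unresolved words."""
    resolved = {word: resolve_label(votes) for word, votes in annotations.items()}
    return {word: label for word, label in resolved.items() if label is not None}
```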
(4.2) Seed sentiment dictionary expansion. Although the sentiment polarities in the basic seed sentiment dictionary are highly reliable, the dictionary is small and manual annotation is inefficient, so an automated method is needed to expand it. The expansion process is shown in Fig. 2. The steps are as follows: for each sentiment word w in the basic seed sentiment dictionary (excluding sentiment-irrelevant words, that is, words of the "other" class), look up the synonym set S of w in the Harbin Institute of Technology Chinese thesaurus. For each word w_new in S, look up its synonym set M in the same thesaurus and count the numbers n1, n2, and n3 of positive, negative, and other-class words in M. If n1 > n2 + threshold_pos and n1 > n3 + threshold_pos, w_new is assigned to the positive class; the negative and other classes are handled in the same way. The algorithm stops once every word in S has been examined. In practice, setting the three thresholds to 1, 0, and 0 respectively gave the best expanded dictionary. The class sizes of the final seed sentiment dictionary are shown in Fig. 3.
5. Select and define the training parameters and configuration of the term vector learning model. This specifically includes the following sub-steps:
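The expansion rule in (4.2) can be sketched as follows. The thesaurus is modeled as a plain dictionary from a word to its synonym list, standing in for the Harbin Institute of Technology thesaurus, whose real access interface is not described here. Counting class membership against the original seed dictionary only (rather than the growing one) is also an assumption.

```python
from collections import Counter
from typing import Dict, List, Optional

# Margins for positive, negative, other; values 1, 0, 0 come from the description.
DEFAULT_THRESHOLDS = {"positive": 1, "negative": 0, "other": 0}


def expand_seed_dictionary(
    seed: Dict[str, str],
    thesaurus: Dict[str, List[str]],
    thresholds: Optional[Dict[str, int]] = None,
) -> Dict[str, str]:
    """Add synonyms of seed sentiment words whose neighbors favor one class."""
    thresholds = thresholds or DEFAULT_THRESHOLDS
    expanded = dict(seed)
    for word, label in seed.items():
        if label == "other":
            continue  # only sentiment-bearing seed words are expanded
        for w_new in thesaurus.get(word, ()):
            if w_new in expanded:
                continue
            # n1, n2, n3: class counts among the labeled synonyms of w_new.
            counts = Counter(seed[m] for m in thesaurus.get(w_new, ()) if m in seed)
            for cls, margin in thresholds.items():
                rivals = [counts[c] for c in thresholds if c != cls]
                if all(counts[cls] > rival + margin for rival in rivals):
                    expanded[w_new] = cls
                    break
    return expanded
```

At most one class can satisfy the rule for a given word, since winning requires a strictly larger count than both rivals.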
(5.1) Non-compositional word list setup. The subsequent training of the term vector learning model takes into account the features of the characters that make up a word. However, the characters are not meaningful in every Chinese word. In loanwords transliterated from a foreign language, such as "chocolate" and "sofa", the Chinese form follows the pronunciation of the English word, and the individual characters cannot express the meaning of the word itself. The same holds for many entity nouns, such as personal names, place names, and organization names. For these words, character features need not be considered during model training. In the implementation, non-compositional words are identified and extracted by manually reviewing the candidate dictionary.
(5.2) Term vector learning model training parameter setup, which determines the number of training runs to be carried out. Because what is needed is the term vectors, a by-product of model training, the effect of different training parameters on the quality of the resulting term vectors must be considered; the term vector quality evaluation itself is set up in the next sub-step. The main parameters of interest during training are window size, vector dimension, and initial learning rate.
(5.3) Evaluation criteria setup. Because what is needed is an intermediate product of model training, the final concern is not the result of model training itself but the term vector data that are output, so it is the quality of the term vectors that must be evaluated. Here the term vectors are used as features in a word-level sentiment polarity classification task, and the classification result serves as the evaluation criterion. The evaluation procedure is described in detail in the model evaluation step below.
6. Train the term vector learning model. After the microblog corpus has been preprocessed, the term vector learning model combining semantic and sentiment information is used to train the corresponding term vector features. The network structure of the model is shown in Fig. 4. The model jointly trains term vectors on three kinds of language information: the semantic information of a word's context and of the characters that make up the word, the sentiment polarity of the sentence, and the sentiment polarity of the word. During training, each of the three kinds of information is trained separately, and the objective function is optimized using negative sampling. The objective functions of the three parts are given by formulas f1, f2, and f3 respectively.
7. Evaluate the training result of the term vector learning model. The object of evaluation is the obtained term vectors. The overall idea is to use them as features in a word-level sentiment polarity classification task and finally to choose the parameter combination with the best classification performance. In the implementation, an SVM classifier is built for the classification task; the overall evaluation process is shown in Fig. 5. The procedure is as follows. First, prepare the training and test sets: the expanded seed sentiment word set is used as the combined training and test data and, following the idea of k-fold cross validation, is divided evenly into 5 parts. Second, select one part as the test set and use the remaining 4 parts as the training set for the SVM classifier. Third, repeat the second step 5 times so that each part serves as the test set once; each training set yields a model, which is tested on the corresponding test set, and its evaluation metrics are computed and saved. Fourth, compute the average of the 5 groups of test results as the estimate of model accuracy and as the performance indicator of the model under the current k-fold cross validation.
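The four-step evaluation above can be sketched in plain Python. The classifier is passed in as a hypothetical `fit` callable (standing in for SVM training) that returns a prediction function; the interleaved fold assignment and all function names are assumptions, while the 5 folds, accuracy, and macro-averaged F score come from the text.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_splits(n_samples: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) so each fold is the test set once."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


def cross_validate(features: Sequence, labels: Sequence[str],
                   fit: Callable, k: int = 5) -> Tuple[float, float]:
    """Average accuracy and macro F1 over k folds."""
    accs, f1s = [], []
    for train, test in k_fold_splits(len(labels), k):
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        y_true = [labels[i] for i in test]
        y_pred = [predict(features[i]) for i in test]
        accs.append(accuracy(y_true, y_pred))
        f1s.append(macro_f1(y_true, y_pred))
    return sum(accs) / k, sum(f1s) / k
```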
8. Adjust the parameters of the term vector learning model and repeat model training and evaluation, recording the performance indicators that the models obtained under the different parameters show in the evaluation step.
9. Based on the results of the preceding model training and evaluation, decide which parameter combination to use for the final training of the term vector learning model. The term vectors generated by that training run are used as the basic word features for the subsequent construction of the sentiment dictionary. Following the actual training and evaluation, the final parameter settings chosen for the term vector learning model are: window size 5, term vector dimension 200, and initial learning rate 0.025.
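Steps 8 and 9 amount to a search over parameter combinations. The sketch below assumes a hypothetical `train_and_score` callable that trains the term vector model with the given parameters and returns its evaluation score; the candidate values in the grid are illustrative, and only the final choice (window 5, dimension 200, learning rate 0.025) is stated in the text.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

# Illustrative grid (assumed); the description names the three parameters
# but not the candidate values that were tried.
PARAM_GRID: Dict[str, List[float]] = {
    "window": [3, 5, 7],
    "dimension": [100, 200, 300],
    "learning_rate": [0.025, 0.05],
}


def select_best_parameters(
    param_grid: Dict[str, List[float]],
    train_and_score: Callable[[Dict[str, float]], float],
) -> Tuple[Dict[str, float], float]:
    """Try every combination and keep the one with the highest score."""
    best_params: Dict[str, float] = {}
    best_score = float("-inf")
    for values in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), values))
        score = train_and_score(params)  # train, then evaluate the term vectors
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```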
10. Train the word-level sentiment polarity classifier. The preceding training of the term vector learning model yields term vector features associated with both semantics and sentiment, and the word-level sentiment polarity classifier is built from them together with the seed sentiment dictionary. The sentiment polarity of a word is divided into three classes: positive, negative, and other. In the implementation, an SVM classifier performs the classification; the overall training process is shown in Fig. 6. The seed sentiment word set and the term vector features are the input to model training. The seed sentiment word set is divided into training and test sets, and k-fold cross validation is used to train the model and obtain its classification metrics. In addition, model training is repeated while adjusting the hyperparameters of the model itself, and by comparing the classification metrics of the different models, the classifier with the best classification performance is chosen as the final sentiment dictionary inference engine. The parameters finally confirmed are: Gaussian kernel (radial basis function, RBF), penalty coefficient C of 1, and gamma of 1/k, where k is the number of features of a word, that is, the dimension of the term vector.
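To make the kernel setting concrete, the sketch below computes the standard RBF kernel with the default gamma of 1/k, where k is the vector dimension. It illustrates only the kernel formula, not a full SVM; in scikit-learn, a comparable configuration would be `SVC(kernel="rbf", C=1.0, gamma="auto")`, which is mentioned here as an assumption about tooling rather than something the patent specifies.

```python
import math
from typing import Optional, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], gamma: Optional[float] = None) -> float:
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    if gamma is None:
        gamma = 1.0 / len(x)  # gamma = 1/k, with k the term vector dimension
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)
```

For 200-dimensional term vectors this gives gamma = 0.005.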
11. Apply the trained word-level sentiment polarity classifier. The word set to be inferred is the candidate dictionary obtained earlier. The classifier assigns each word to one of the three classes according to its term vector features, yielding the sets of words whose sentiment polarity is positive, negative, and other. Merging the positive and negative word sets forms the final target Chinese sentiment dictionary.
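The final inference step could be sketched as below. The `classify` callable stands in for the trained classifier and `vectors` for the learned term vectors; both names, and the choice to skip words without a vector, are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, Sequence


def build_sentiment_dictionary(
    candidates: Iterable[str],
    vectors: Dict[str, Sequence[float]],
    classify: Callable[[Sequence[float]], str],
) -> Dict[str, str]:
    """Classify each candidate word and keep only positive and negative words."""
    dictionary: Dict[str, str] = {}
    for word in candidates:
        vector = vectors.get(word)
        if vector is None:
            continue  # no term vector was learned for this word (assumed policy)
        label = classify(vector)
        if label in ("positive", "negative"):
            dictionary[word] = label  # words of the "other" class are left out
    return dictionary
```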
The method for constructing a microblog Chinese sentiment dictionary based on a term vector learning model according to the present invention has been described in detail above with reference to the accompanying drawings. The invention has the following advantages. Regular expressions and similar techniques are used to strip irrelevant information from microblog sentences, preventing it from affecting the training result of the term vector learning model. Stop words are used to remove the influence of meaningless words on term vector training, which reduces noise words and computational complexity. Non-compositional words are used to exclude words whose constituent characters' semantics need not be considered, which reduces computational complexity. Term vectors are trained with a term vector learning model that combines semantic and sentiment information. The model combines three kinds of information: the semantic information of a word's context and of the characters that make up the word, the sentiment polarity of the sentence, and the sentiment polarity of the word. Term vectors obtained with this model better express the semantic and sentiment features of words.
It should be made clear that the invention is not limited to the specific configurations and processes described above and shown in the drawings. For brevity, detailed descriptions of known methods are omitted here. The present embodiments are to be regarded in all respects as illustrative and not restrictive; the scope of the invention is defined by the appended claims rather than by the foregoing description, and all changes that fall within the meaning and range of equivalents of the claims are included within the scope of the invention.

Claims (12)

1. A microblog Chinese sentiment dictionary construction method based on a term vector learning model, characterized in that a term vector learning model fusing Chinese semantic and sentiment information is designed and trained to obtain the corresponding term vector features, and a word-level sentiment classifier is then trained to infer the sentiment of the final Chinese words, the method comprising the following steps:
Step (1): obtain a corresponding training corpus according to the characteristics of current microblog data;
Step (2): perform data preprocessing on the obtained microblog training corpus;
Step (3): construct a candidate dictionary, which serves as the dictionary in the subsequent training of the term vector learning model;
Step (4): construct a seed sentiment dictionary;
Step (5): select and define the training parameters and configuration of the term vector learning model;
Step (6): train the term vector learning model;
Step (7): evaluate the training result of the term vector learning model;
Step (8): iteratively execute step (6) until all parameter settings have been traversed and trained;
Step (9): select the term vectors with the best evaluation result as the final vector features of the words;
Step (10): train a word-level sentiment polarity classifier;
Step (11): apply the word-level sentiment polarity classifier and obtain the final target sentiment dictionary.
2. The microblog Chinese sentiment dictionary construction method based on a term vector learning model according to claim 1, characterized in that, in step (1), a corresponding training corpus is obtained according to the characteristics of current microblog data.
3. The microblog Chinese sentiment dictionary construction method based on a term vector learning model according to claim 1, characterized in that, in step (2), data preprocessing is performed on the obtained microblog training corpus to derive, from the raw corpus, data that can be used directly for model training, mainly comprising the following sub-steps:
Step (2.1): data normalization, extracting the useful information from microblog sentences;
Step (2.2): sentiment parsing of microblog sentences, labeling the sentiment polarity of each sentence;
Step (2.3): Chinese word segmentation of the microblog sentences;
Step (2.4): setting stop words, filtering out words that are meaningless for the training of the term vector learning model.
4. The microblog Chinese sentiment dictionary construction method based on a term vector learning model according to claim 1, characterized in that, in step (3), the candidate dictionary is constructed from the preprocessed microblog corpus.
5. The microblog Chinese sentiment dictionary construction method based on a term vector learning model according to claim 1, characterized in that, in step (4), a seed sentiment dictionary is constructed, which is used in the subsequent term vector learning process and in the training of the final word-level sentiment polarity classifier, mainly comprising the following sub-steps:
Step (4.1): construct a basic seed sentiment dictionary based on the candidate dictionary;
Step (4.2): based on the basic seed sentiment dictionary, expand the size of the seed sentiment dictionary using a synonym expansion method.
6. The microblog Chinese sentiment dictionary construction method based on a term vector learning model according to claim 1, characterized in that, in step (5), the relevant parameters and configuration of the term vector learning model are selected and defined, mainly comprising the following sub-steps:
Step (5.1): set a non-compositional word list, filtering out words whose constituent-character semantics need not be considered during the training of the term vector learning model;
Step (5.2): set the training parameters of the term vector learning model;
Step (5.3): set the evaluation criteria of the term vector learning model.
7. The microblog Chinese sentiment dictionary construction method based on a term vector learning model according to claim 1, characterized in that, in step (6), the term vector learning model is trained on the training corpus to obtain the corresponding term vectors.
8. The microblog Chinese sentiment dictionary construction method based on a term vector learning model according to claim 1, characterized in that, in step (7), the term vectors obtained by training are evaluated; the evaluation uses the term vectors in a word sentiment polarity classification task, and the quality of the term vectors is determined by accuracy and macro-averaged F value.
9. The microblog Chinese sentiment dictionary construction method based on a term vector learning model according to claim 1, characterized in that, in step (8), the training parameters are adjusted and step (6) is executed iteratively until all parameter settings have been iterated over.
10. The microblog Chinese sentiment dictionary construction method based on a term vector learning model according to claim 1, characterized in that, in step (9), according to the evaluation results under the different training parameters of the term vector learning model, the term vectors with the best result are selected as the final vector features of the words.
11. The microblog Chinese sentiment dictionary construction method based on a term vector learning model according to claim 1, characterized in that, in step (10), the word-level sentiment polarity classifier is trained using the obtained term vectors.
12. The microblog Chinese sentiment dictionary construction method based on a term vector learning model according to claim 1, characterized in that, in step (11), the word-level sentiment polarity classifier is applied to infer the sentiment polarity of the words in the candidate dictionary, forming the final target sentiment dictionary.
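The evaluation and selection loop of steps (6) to (9) can be sketched roughly as below. Several things here are assumptions rather than part of the claims: the macro-averaged F value is taken to be the unweighted mean of per-class F1 scores, parameter settings are ranked by macro-averaged F first and accuracy second, and `train`, `evaluate`, and the parameter name `dim` are hypothetical placeholders for the real training and evaluation routines.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of words whose predicted polarity equals the gold polarity."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


def select_best(
    param_grid: Dict[str, List[Any]],
    train: Callable[[Dict[str, Any]], Any],
    evaluate: Callable[[Any], Tuple[float, float]],
) -> Tuple[Tuple[float, float], Dict[str, Any], Any]:
    """Train once per parameter combination and keep the best-scoring result."""
    names = sorted(param_grid)
    best = None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        vectors = train(params)      # step (6)
        score = evaluate(vectors)    # step (7)
        if best is None or score > best[0]:
            best = (score, params, vectors)  # step (9) keeps the optimum
    return best


# Toy run: canned predictions stand in for "train vectors, then classify".
LABELS = ("positive", "negative", "other")
gold = ["positive", "positive", "negative", "negative", "other", "other"]
predictions_by_dim = {
    50: ["positive", "negative", "negative", "negative", "other", "positive"],
    100: ["positive", "positive", "negative", "negative", "other", "positive"],
}
best_score, best_params, _ = select_best(
    {"dim": [50, 100]},
    train=lambda params: predictions_by_dim[params["dim"]],
    evaluate=lambda pred: (macro_f1(gold, pred, LABELS), accuracy(gold, pred)),
)
print(best_params, best_score)
```

Returning the score as a tuple lets Python's tuple comparison apply the ranking rule directly; a different rule (for example, a weighted sum of the two metrics) would only require changing `evaluate`.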
CN201811143903.XA 2018-09-25 2018-09-25 A kind of microblogging Chinese sentiment dictionary construction method based on term vector learning model Pending CN109376251A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811143903.XA CN109376251A (en) 2018-09-25 2018-09-25 A kind of microblogging Chinese sentiment dictionary construction method based on term vector learning model


Publications (1)

Publication Number Publication Date
CN109376251A true CN109376251A (en) 2019-02-22

Family

ID=65402988

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811143903.XA Pending CN109376251A (en) 2018-09-25 2018-09-25 A kind of microblogging Chinese sentiment dictionary construction method based on term vector learning model

Country Status (1)

Country Link
CN (1) CN109376251A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150278195A1 (en) * 2014-03-31 2015-10-01 Abbyy Infopoisk Llc Text data sentiment analysis method
CN106503049A (en) * 2016-09-22 2017-03-15 南京理工大学 A kind of microblog emotional sorting technique for merging multiple affection resources based on SVM
CN107193801A (en) * 2017-05-21 2017-09-22 北京工业大学 A kind of short text characteristic optimization and sentiment analysis method based on depth belief network


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
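杨玉凡 (Yang Yufan): "中文情感词典构建中词向量学习技术的研究与应用" [Research and Application of Word Vector Learning Techniques in Chinese Sentiment Dictionary Construction], 《中国知网》 (CNKI) *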

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858034B (en) * 2019-02-25 2023-02-03 武汉大学 Text emotion classification method based on attention model and emotion dictionary
CN109858034A (en) * 2019-02-25 2019-06-07 武汉大学 A kind of text sentiment classification method based on attention model and sentiment dictionary
CN110083825A (en) * 2019-03-21 2019-08-02 昆明理工大学 A kind of Laotian sentiment analysis method based on GRU model
CN110263321A (en) * 2019-05-06 2019-09-20 成都数联铭品科技有限公司 A kind of sentiment dictionary construction method and system
CN110570941A (en) * 2019-07-17 2019-12-13 北京智能工场科技有限公司 System and device for assessing psychological state based on text semantic vector model
CN110570941B (en) * 2019-07-17 2020-08-14 北京智能工场科技有限公司 System and device for assessing psychological state based on text semantic vector model
CN110597997A (en) * 2019-07-19 2019-12-20 中国人民解放军国防科技大学 Military scenario text event extraction corpus iterative construction method and device
CN110597997B (en) * 2019-07-19 2022-03-22 中国人民解放军国防科技大学 Military scenario text event extraction corpus iterative construction method and device
CN110569354A (en) * 2019-07-22 2019-12-13 中国农业大学 Barrage emotion analysis method and device
CN110569354B (en) * 2019-07-22 2022-08-09 中国农业大学 Barrage emotion analysis method and device
CN110750648A (en) * 2019-10-21 2020-02-04 南京大学 Text emotion classification method based on deep learning and feature fusion
CN111061876B (en) * 2019-12-10 2023-06-13 中国建设银行股份有限公司 Event public opinion data analysis method and device
CN111061876A (en) * 2019-12-10 2020-04-24 中国建设银行股份有限公司 Event public opinion data analysis method and device
CN111191463A (en) * 2019-12-30 2020-05-22 杭州远传新业科技有限公司 Emotion analysis method and device, electronic equipment and storage medium
WO2021147298A1 (en) * 2020-01-21 2021-07-29 中国银联股份有限公司 Sentiment lexicon construction method and system, sentiment recognition method and system, and storage medium
CN111353044B (en) * 2020-03-09 2022-11-11 重庆邮电大学 Comment-based emotion analysis method and system
CN111353044A (en) * 2020-03-09 2020-06-30 重庆邮电大学 Comment-based emotion analysis method and system
CN111400496A (en) * 2020-03-18 2020-07-10 江苏海洋大学 Public praise emotion analysis method for user behavior analysis
CN111400496B (en) * 2020-03-18 2023-05-09 江苏海洋大学 Public praise emotion analysis method for user behavior analysis
CN111522913A (en) * 2020-04-16 2020-08-11 山东贝赛信息科技有限公司 Emotion classification method suitable for long text and short text
CN111881676B (en) * 2020-07-03 2024-03-15 南京航空航天大学 Emotion classification method based on word vector and emotion part of speech
CN111881676A (en) * 2020-07-03 2020-11-03 南京航空航天大学 Emotion classification method based on word vectors and emotion part of speech
CN112765350A (en) * 2021-01-15 2021-05-07 西华大学 Microblog comment emotion classification method based on emoticons and text information
CN113191135A (en) * 2021-01-26 2021-07-30 北京联合大学 Multi-category emotion extraction method fusing facial characters
CN113111655A (en) * 2021-05-12 2021-07-13 数库(上海)科技有限公司 Construction method of separation dictionary, word segmentation method and device based on separation dictionary
CN113420151A (en) * 2021-07-13 2021-09-21 上海明略人工智能(集团)有限公司 Emotion polarity intensity classification method, system, electronic device and medium
CN116340511A (en) * 2023-02-16 2023-06-27 深圳市深弈科技有限公司 Public opinion analysis method combining deep learning and language logic reasoning
CN116340511B (en) * 2023-02-16 2023-09-15 深圳市深弈科技有限公司 Public opinion analysis method combining deep learning and language logic reasoning
CN116450840A (en) * 2023-03-22 2023-07-18 武汉理工大学 Deep learning-based field emotion dictionary construction method
CN117217218A (en) * 2023-11-08 2023-12-12 中国科学技术信息研究所 Emotion dictionary construction method and device, electronic equipment and storage medium
CN117217218B (en) * 2023-11-08 2024-01-23 中国科学技术信息研究所 Emotion dictionary construction method and device for science and technology risk event related public opinion


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20190222