CN108536801A - A deep-learning-based method for sentiment analysis of civil aviation security public opinion on microblogs - Google Patents

Info

Publication number
CN108536801A
Authority
CN
China
Prior art keywords
microblogging
threat
text
word
deep learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810290094.9A
Other languages
Chinese (zh)
Inventor
韩萍
孙佳慧
方澄
贾云飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Civil Aviation University of China
Original Assignee
Civil Aviation University of China

Abstract

A deep-learning-based method for sentiment analysis of civil aviation security public opinion on microblogs. It comprises the following steps: preprocessing and segmenting the text of a microblog dataset; training word vectors; building the combined deep learning network C-LSTM and training a classifier on it to decide whether a microblog text contains a threat to civil aviation security; and scoring the threatening microblog texts in finer detail to rate their threat level. The invention first trains a classifier with a deep learning method to coarsely select subjective negative statements about civil aviation and discard microblogs that contain objective statements such as news and factual reports. It then uses civil aviation public-opinion keywords and rules to compute and grade the threat degree. This solves the problem, found in dictionary- and rule-based methods, that objective statements containing civil aviation public-opinion keywords are judged to be high-threat, and it gives wider applicability and higher accuracy.

Description

A deep-learning-based method for sentiment analysis of civil aviation security public opinion on microblogs
Technical field
The invention belongs to the field of text sentiment analysis in natural language processing, and in particular relates to a deep-learning-based method for sentiment analysis of civil aviation security public opinion on microblogs.
Background
In recent years, bomb threats against flights and airports and false terrorist information have appeared frequently on the Internet. Some people, out of dissatisfaction with society, post false threats, rumors, and extremist language online. When facing terrorist threats, the civil aviation industry generally suffers greater damage than other industries. Because microblogs spread quickly, are open, have wide influence, and make it hard to expose the publisher's identity, they are a common channel for spreading terrorist information.
Sentiment analysis is the process of handling, analyzing, and applying text that carries emotional color. It can extract user opinions and sentiment orientation from text data and has significant practical value. The research methods and techniques of sentiment analysis usually depend on the research goal. In general, existing results are mostly applied to products or to public-opinion analysis, such as the public's evaluations of and opinions on a product or event. Sentiment analysis methods aimed at civil aviation terrorist information are extremely rare. By performing sentiment analysis on microblog texts related to civil aviation, microblogs that threaten civil aviation safety can be selected, and key users with criminal tendencies can then be identified.
At present, Chinese text sentiment analysis methods fall mainly into two classes: those based on semantic understanding and those based on conventional machine learning. Both have problems when applied to microblog sentiment analysis: 1. Methods based on semantic understanding build a benchmark library of commendatory and derogatory words, define explicit rules, and match patterns against the corpus; this is severely limited when processing microblog text, whose expression is complex and irregular. 2. Methods based on conventional machine learning require complex feature engineering and consume a great deal of manual labor.
Summary of the invention
To solve the above problems, the object of the invention is to provide a deep-learning-based method for sentiment analysis of civil aviation security public opinion on microblogs.
To achieve this object, the method provided by the invention comprises the following steps, carried out in order:
(1) screen a large amount of online text for keywords related to civil aviation security public opinion, and build a keyword library from these keywords and their corresponding threat intensity values;
(2) take the microblog texts selected by the civil aviation security public-opinion keywords, together with their labels, as the training set, and preprocess and segment the microblog texts in the training set; each microblog text consists of at least one microblog clause, and the labels are of two kinds, threat and no threat;
(3) train word vectors on the segmented microblog texts obtained in step (2) to obtain a word vector model;
(4) build a combined deep learning network from a convolutional neural network and a long short-term memory network, and add a fully connected layer and a softmax layer after it; together these form the combined deep learning classification model;
(5) feed the segmented microblog texts of the training set into the word vector model obtained in step (3) to vectorize them;
(6) feed the vectorized microblog texts obtained in step (5), together with their labels, into the combined deep learning classification model, train the threat/no-threat text classifier, and save it;
(7) preprocess and segment the microblog text to be analyzed as in step (2), vectorize it with the word vector model obtained in step (3), and classify it with the threat/no-threat classifier obtained in step (6); finally, for microblog texts judged to be threatening, compute a threat-degree score according to the sentiment dictionary and rules;
(8) determine the threat level from the threat-degree score.
In step (1), the keywords are divided into two classes, location words and behavior words. Each behavior word has two attributes. The first is the threat intensity value, which measures the degree of threat the word poses to civil aviation security, on a scale of five intensities: 1, 3, 5, 7, and 9. The second is the word type, of which there are two: the direct type, where the word alone is enough to establish a threat to civil aviation, and the indirect type, where the word must co-occur with a location word before a threat to civil aviation security can be established.
In step (2), the microblog texts in the training set are preprocessed and segmented as follows: preprocessing removes noise such as web links, the user nicknames that appear when a microblog is forwarded or replied to, and special characters, while retaining the topic labels in non-threatening microblog texts, which are an important feature for deciding whether a microblog text containing civil aviation public-opinion keywords is a subjective threat or a news topic; a word segmentation tool is then applied to the preprocessed microblog texts.
In step (3), word vectors are trained with the Skip-gram method of the word2vec algorithm, and the resulting word vector model is saved.
In step (4), the combined deep learning classification model is built as follows: the sentence matrix in the input layer is convolved with convolution kernels of different sizes; the feature values produced by kernels of the same size are concatenated in temporal order and used as the input of the long short-term memory network, which extracts the contextual features of the microblog text; the fully connected layer applies a nonlinear transformation to obtain the score vector of the labels; passing this score vector through the softmax layer yields the class probabilities and hence the final class.
In step (5), the microblog text is vectorized as follows: the word vector of each word in the microblog text is looked up in the word vector model, and the word vectors are then concatenated into a sentence matrix.
In step (6), the vectorized microblog texts obtained in step (5) and their labels are fed into the combined deep learning classification model obtained in step (4) for training, and the trained model is called the threat/no-threat text classifier. The training method is: the model uses SGD optimization and updates its weights by minimizing the cross-entropy function; dropout is applied to the parameters of the fully connected layer to prevent overfitting; a suitable mini-batch size is chosen according to the size of the training set; finally the trained classification model is saved so that it can classify input microblog texts directly.
In step (7), the threat-degree score is calculated as follows:
1) extract the emotion words in the threatening microblog text and calculate the emotion score of each microblog clause;
2) extract the emoticons in the threatening microblog text and calculate the emoticon score; take the weighted sum of the emotion score and the emoticon score to obtain the emotion value;
3) calculate the behavior threat score of the microblog text according to the civil aviation security public-opinion keyword library;
4) take the weighted sum of the emotion value of the microblog text and the behavior threat score to obtain the threat-degree score of the microblog text.
In step (8), the threat level is determined from the threat-degree score as follows: using the threat-degree score obtained in step (7), the threatening microblog texts are divided into high, medium, and low threat levels by thresholding.
The deep-learning-based method provided by the invention has the following advantages: (1) the invention first uses a text classifier to coarsely select the threatening microblog texts, and then uses the civil aviation public-opinion keyword library and syntactic rules to calculate the threat-degree score, which improves efficiency and accuracy; (2) the text classifier is trained with a neural network that combines CNN and LSTM, which avoids extensive manual feature engineering, extracts text features more comprehensively, and gives higher classification accuracy.
Description of the drawings
Fig. 1 is a flow chart of the deep-learning-based method for sentiment analysis of civil aviation security public opinion on microblogs provided by the invention.
Fig. 2 is the overall structure of the combined deep learning model in the invention.
Detailed description
The method provided by the invention is described in detail below with reference to the drawings and a specific embodiment.
As shown in Fig. 1 and Fig. 2, the method provided by the invention comprises the following steps, carried out in order:
(1) screen a large amount of online text for keywords related to civil aviation security public opinion, and build a keyword library from these keywords and their corresponding threat intensity values;
Keywords are divided into two classes, location words and behavior words. Location words include airport, runway, terminal, flight, and so on; behavior words include bombing an aircraft, hijacking, occupying an aircraft (refusing to disembark), in-flight disturbance, fighting, protesting, smoking, and so on. Each behavior word has two attributes. The first is the threat intensity value, which measures the degree of threat the word poses to civil aviation security, on a scale of five intensities: 1, 3, 5, 7, and 9. The second is the word type, of which there are two. One is the direct type, where the word alone is enough to establish a threat to civil aviation, for example bombing an aircraft, hijacking, occupying an aircraft, or in-flight disturbance. The other is the indirect type, where the word must co-occur with a location word before a threat to civil aviation security can be established, for example fighting, protesting, or smoking. An indirect-type behavior word on its own is not sufficient to establish a threat to civil aviation security.
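The keyword library described above can be sketched as a small data structure. This is a minimal illustration only: the class name `BehaviorWord`, the English placeholder words, and the intensity values assigned to them are all hypothetical, since the patent does not list the actual library entries.

```python
from dataclasses import dataclass

# The five intensity values named in the text.
ALLOWED_INTENSITIES = (1, 3, 5, 7, 9)


@dataclass(frozen=True)
class BehaviorWord:
    """One behavior-word entry: threat intensity plus word type."""
    word: str
    intensity: int  # one of 1, 3, 5, 7, 9
    direct: bool    # True = direct type, False = indirect type

    def __post_init__(self):
        if self.intensity not in ALLOWED_INTENSITIES:
            raise ValueError(f"intensity must be one of {ALLOWED_INTENSITIES}")


# Hypothetical entries; real entries and intensities are not given in the source.
LOCATION_WORDS = frozenset({"airport", "runway", "terminal", "flight"})
BEHAVIOR_WORDS = {
    bw.word: bw
    for bw in (
        BehaviorWord("bomb", 9, True),
        BehaviorWord("hijack", 9, True),
        BehaviorWord("fight", 5, False),
        BehaviorWord("smoke", 3, False),
    )
}


def words_of_type(direct):
    """Return the behavior words of the requested type, sorted by name."""
    return sorted(w for w, bw in BEHAVIOR_WORDS.items() if bw.direct == direct)
```

Validating the intensity at construction time keeps malformed entries out of the library before any scoring is done.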
(2) take the microblog texts selected by the civil aviation security public-opinion keyword library, together with their labels, as the training set, and preprocess and segment the microblog texts in the training set; each microblog text consists of at least one microblog clause, and the labels are of two kinds, threat and no threat;
The microblog texts in the training set are preprocessed to remove noise that is unrelated to emotional expression, such as: 1) web links of the form "http://t.cn/ECj0WN", which carry no useful information and are removed during preprocessing; 2) user nicknames that appear when a microblog is forwarded or replied to, special characters, and the like, as in "Li Qiong: reply @Sketch Craftsman Lao Wang: this is a matter for the mental hospital, not for the police", where the microblog user name after the @ symbol must be removed. Note that the topic labels in non-threatening texts are retained, as an important feature for training the threat/no-threat microblog text classifier. A word segmentation tool is then applied to each microblog clause of the preprocessed microblog text, yielding N words; the segmentation tool is jieba, an open-source Python word segmenter. The labels of the microblog texts are annotated manually.
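The cleaning rules above can be sketched with regular expressions. This is only an illustration under stated assumptions: the function name `preprocess` and the exact patterns (for example, which characters count as "special") are choices made here, not taken from the patent, and segmentation with jieba would follow as a separate step.

```python
import re

# Short links such as http://t.cn/ECj0WN; the character class is an assumption.
URL_RE = re.compile(r"https?://[A-Za-z0-9./?=&%_~:\-]+")
# "@nickname" with an optional trailing ASCII or full-width colon.
MENTION_RE = re.compile(r"@[^\s:：@]+[:：]?")
# Anything that is not a word character, whitespace, '#', or common punctuation.
SPECIAL_RE = re.compile(r"[^\w\s#，。！？,.!?]")


def preprocess(text):
    """Remove links, @mentions, and special characters; keep #topic# labels."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = SPECIAL_RE.sub("", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

The `#` character is deliberately excluded from the special-character pattern so that topic labels survive, as the text requires.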
(3) train word vectors on the segmented microblog texts obtained in step (2) to obtain a word vector model;
When deep learning is used for microblog sentiment analysis, the computer cannot directly recognize Chinese characters, so the microblog text must be vectorized before it is fed as training data into the combined deep learning network. The Word2Vec algorithm compresses the scale of the data while capturing context information. The word2vec tool provided by Google contains two language models, CBOW and Skip-gram, each with an input layer, a projection layer, and an output layer. The CBOW model predicts the current word from its context; the Skip-gram model predicts the words inside the context window from the current word. The invention uses the word2vec implementation in the Python toolkit gensim, with the Skip-gram model. The training window size is set to 5 and the word vector dimension to 300.
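To make the Skip-gram idea concrete, the following sketch enumerates the (current word, context word) pairs that a window of a given size produces. It is a simplified illustration, not the gensim implementation: the function name `skipgram_pairs` is hypothetical, and real word2vec training additionally uses techniques such as subsampling that are omitted here.

```python
def skipgram_pairs(tokens, window=5):
    """List (center, context) pairs for every word within the window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

With the window size of 5 stated in the text, each word is paired with up to five words on each side.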
(4) build a combined deep learning network from a convolutional neural network and a long short-term memory network, and add a fully connected layer and a softmax layer after it; together these form the combined deep learning classification model;
In the combined deep learning network (C-LSTM), the convolutional neural network (CNN) effectively captures the relations between words and abstracts local features; the long short-term memory network (LSTM) learns long-term dependencies and captures syntactic features related to word order. Concretely, after the convolution kernels are convolved with the sentence matrix, the features produced by kernels of the same size are concatenated in temporal order and used as the input of the LSTM. The kernel window sizes are 3, 4, and 5, with 100 kernels of each size; the structure of C-LSTM is shown in Fig. 2. The fully connected layer applies a nonlinear transformation to the output vector of the combined deep learning network, using the ReLU function. The softmax layer maps the output of the combined deep learning network into the interval (0, 1) for classification; the activation function of this layer is sigmoid.
(5) feed the segmented microblog texts of the training set into the word vector model obtained in step (3) to vectorize them;
Suppose a microblog clause contains N words after segmentation and the word vector of each word has dimension d. The whole microblog clause is then represented as:
X = x_1 ⊕ x_2 ⊕ … ⊕ x_N
where x_j is the word vector of the j-th word and ⊕ denotes concatenation along the row-vector direction. Concatenating the word vectors therefore gives the sentence matrix X ∈ R^{d×N}, which serves as the input layer of the combined deep learning network.
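The lookup-and-concatenate step can be sketched as follows. This is an assumption-laden illustration: the function name `sentence_matrix` is hypothetical, the word vector model is stood in for by a plain dictionary, and mapping out-of-vocabulary words to a zero vector is a choice made here that the patent does not specify.

```python
def sentence_matrix(tokens, vectors, dim):
    """Return the N word vectors of a clause, one per token.

    Each inner list is one column x_j of the d x N sentence matrix X.
    Unknown words map to a zero vector (an assumption, not from the source).
    """
    zero = [0.0] * dim
    return [list(vectors.get(token, zero)) for token in tokens]
```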
(6) feed the vectorized microblog texts obtained in step (5), together with their labels, into the combined deep learning classification model, train the threat/no-threat text classifier, and save it;
C-LSTM can be divided into two parts: word-feature extraction and syntactic-feature extraction. Word-feature extraction is done by the CNN convolutional layer: three convolution kernels of different sizes are applied to the word vector matrix of the microblog text in one-dimensional convolutions to extract features, which yields several groups of different feature maps. The convolution operation can be expressed as:
c_ij = f(W_i ∘ X_j + b_i), with X_j = [x_j, x_{j+1}, …, x_{j+d_win−1}]
where c_ij is the feature generated by the i-th convolution kernel at the j-th word; f is a nonlinear transformation function, for which the invention uses ReLU; X_j is the matrix spanning positions j through the end of the convolution window; x_j is the word vector of the j-th word of the sentence matrix X; ∘ denotes the matrix inner product; W_i is the parameter matrix over a window of size d_win; b_i is the bias of the i-th convolution kernel; h denotes the number of parameters in the convolution kernel, one of the important CNN hyperparameters, and this method uses 100. Depending on how the two ends of the input are handled, convolution is either narrow or wide; for an input of length d_m the corresponding output dimensions are d_m − d_win + 1 and d_m + d_win − 1 respectively. All convolutions in the invention are wide convolutions. If n feature maps are taken for each kernel size, the total feature map for a given size can be expressed as:
W = [c_1; c_2; …; c_n]
where each row W_j of W is the new feature vector generated at position j by the n convolution kernels, that is, the values obtained from each convolution are joined in order. These new feature vectors generated by the CNN capture high-level features of the words; joining them in order preserves the word order of the original sentence, and the result is used as the input of the LSTM.
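A wide one-dimensional convolution with ReLU can be sketched in plain Python to show how the output length and the LSTM input sequence arise. This is a minimal sketch, assuming zero padding of d_win − 1 positions on both sides; the function names `wide_conv1d` and `stack_feature_maps` are hypothetical, and a real implementation would use a deep learning framework.

```python
def relu(value):
    return max(0.0, value)


def wide_conv1d(columns, kernel, bias=0.0):
    """Wide 1-D convolution of one kernel over a sentence.

    columns: N word vectors (each of length d).
    kernel:  k weight vectors (each of length d), one per window position.
    Zero-pads k - 1 positions on both sides, so the result has N + k - 1 values.
    """
    k, d = len(kernel), len(kernel[0])
    pad = [[0.0] * d for _ in range(k - 1)]
    padded = pad + columns + pad
    out = []
    for j in range(len(padded) - k + 1):
        total = bias
        for r in range(k):
            total += sum(w * x for w, x in zip(kernel[r], padded[j + r]))
        out.append(relu(total))
    return out


def stack_feature_maps(maps):
    """Join the n feature maps position by position: row j holds the n
    features generated at position j, which becomes one LSTM time step."""
    return [list(step) for step in zip(*maps)]
```

Because all kernels of one size produce maps of equal length, stacking them position by position preserves word order for the LSTM.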
An RNN is a recurrent neural network over time that can describe dynamic temporal behavior: its state is fed back recursively within the network, so it can accept a wide range of time-series inputs. A simple RNN, however, cannot cope with the exponential explosion or vanishing of weights under recursion, and so has difficulty capturing long-term temporal associations. The LSTM replaces the hidden layer of the RNN with a memory cell, which solves this problem well. The words of a clause enter the network in order and are treated as inputs at successive time steps; the LSTM can capture long-term temporal associations, that is, syntactic associations between words that are far apart, and thereby obtains syntactic features.
The LSTM consists of an input gate i_t, an output gate o_t, a forget gate f_t, and a memory cell c_t. The gates determine how the current memory cell c_t and the current hidden vector h_t are updated; the current hidden vector h_t contains the outputs of all LSTM cells. Formally, the LSTM is described by:
i_t = σ(W_i · [h_{t-1}, x_t] + b_i) (5)
f_t = σ(W_f · [h_{t-1}, x_t] + b_f) (6)
q_t = tanh(W_q · [h_{t-1}, x_t] + b_q) (7)
o_t = σ(W_o · [h_{t-1}, x_t] + b_o) (8)
c_t = f_t ⊙ c_{t-1} + i_t ⊙ q_t (9)
h_t = o_t ⊙ tanh(c_t) (10)
where h_{t-1} is the hidden vector at the previous time step; x_t is the input vector at the current time step; σ is the logistic sigmoid function, whose output lies in [0, 1]; tanh has output in [-1, 1]; and ⊙ denotes element-wise multiplication.
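One LSTM time step following equations (5) to (8) can be sketched in plain Python. The memory-cell and hidden-state updates used here are the standard LSTM updates, which is an assumption about the parts of the formulation not legible in this text; the function names and the parameter dictionary layout are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(weights, bias, vector):
    """Compute W · v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p maps 'Wi','bi','Wf','bf','Wq','bq','Wo','bo' to parameters."""
    z = h_prev + x_t  # list concatenation stands for [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in affine(p["Wi"], p["bi"], z)]    # (5)
    f_t = [sigmoid(a) for a in affine(p["Wf"], p["bf"], z)]    # (6)
    q_t = [math.tanh(a) for a in affine(p["Wq"], p["bq"], z)]  # (7)
    o_t = [sigmoid(a) for a in affine(p["Wo"], p["bo"], z)]    # (8)
    # Standard cell and hidden-state updates (assumed, see lead-in).
    c_t = [f * c + i * q for f, c, i, q in zip(f_t, c_prev, i_t, q_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

Feeding the rows of the stacked CNN feature map through `lstm_step` one at a time, carrying `h_t` and `c_t` forward, gives the sequence processing described above.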
(7) preprocess and segment the microblog text to be analyzed as in step (2), vectorize it with the word vector model of step (3), and classify it with the threat/no-threat classifier described above; finally, for microblog texts judged to be threatening, compute a threat-degree score according to the sentiment dictionary and rules;
1) extract the emotion words in the threatening microblog text and calculate the emotion score of each microblog clause;
The words obtained by segmenting a threatening microblog text are matched against the sentiment dictionary; a word found there is selected as an emotion word. If a word does not appear in the sentiment dictionary, semantic similarity is used to decide whether it is an emotion word. To reduce computation, only nouns, verbs, and adjectives are kept as candidate emotion words. The invention uses the HowNet semantic similarity algorithm as the baseline algorithm, which performs well in measuring the similarity of two words. Specifically, for two words w_1 and w_2, if w_1 has n senses or concepts x_1, x_2, …, x_n and w_2 has m senses or concepts y_1, y_2, …, y_m, the similarity of w_1 and w_2 is defined as the maximum similarity over all pairs of senses or concepts, that is:
Sim(w_1, w_2) = max_{i=1…n, j=1…m} Sim(x_i, y_j) (11)
The similarity of two sememes is calculated as:
Sim(x_1, y_2) = λ / (d(x_1, y_2) + λ) (12)
where λ is a positive adjustable parameter and d(x_1, y_2) is the distance between sememe x_1 and sememe y_2 in the sememe hierarchy tree.
For any word, its sentiment orientation value can be obtained by calculating its similarity to the seed words in the sentiment dictionary. The calculation is: compute the similarity of word w to each seed word in the positive sentiment dictionary by formulas (11) and (12) to obtain its similarity to the positive seed words; then compute its similarity to each seed word in the negative sentiment dictionary to obtain its similarity to the negative seed words; the difference between the two mean values gives the sentiment orientation value of word w:
S_w = (1/k) Σ_{i=1}^{k} Sim(w, p_i) − (1/l) Σ_{j=1}^{l} Sim(w, n_j)
where p_i is a positive seed word, n_j is a negative seed word, and k and l are the numbers of positive and negative seed words; the sentiment orientation value S_w lies in (-1, 1). A threshold T is set, and the calculated S_w is compared with T to decide whether word w is an emotion word: when |S_w| > T, word w is judged to be an emotion word. The intensity of this emotion word is set to 10·S_w, so that it is on the same scale as the intensities in the sentiment dictionary.
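The orientation test for out-of-dictionary words can be sketched as below. This is a hedged illustration: the function names are hypothetical, the default λ = 1.6 is an assumed value rather than one stated in the patent, and the word-to-word similarity is passed in as a function because a real system would compute it from HowNet.

```python
def sememe_similarity(distance, lam=1.6):
    """Similarity of two sememes at the given tree distance; lam is assumed."""
    return lam / (distance + lam)


def sentiment_orientation(word, pos_seeds, neg_seeds, sim):
    """Mean similarity to positive seeds minus mean similarity to negative seeds."""
    pos = sum(sim(word, p) for p in pos_seeds) / len(pos_seeds)
    neg = sum(sim(word, n) for n in neg_seeds) / len(neg_seeds)
    return pos - neg


def emotion_intensity(s_w, threshold):
    """Return 10 * S_w if the word counts as an emotion word, else None."""
    return 10 * s_w if abs(s_w) > threshold else None
```

Scaling by 10 puts the derived intensity on the same scale as dictionary entries, as the text describes.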
If a microblog clause contains an emotion word that is preceded by a negation word from the negation dictionary or a modifier from the modifier dictionary, the emotion score Sa of the clause is calculated according to the following cases:
a) degree adverb + emotion word: the intensity of the emotion word changes with the intensity of the degree adverb, and the emotion score is:
Sa = M_a · ps · pa (14)
b) negation word + emotion word: the polarity of the emotion word changes according to the number n of negation words, and the emotion score is:
Sa = (-1)^n · ps · pa (15)
c) degree adverb + negation word + emotion word: the polarity of the emotion word is reversed and its intensity changes with the intensity of the degree adverb, and the emotion score is:
Sa = (-1) · M_a · ps · pa (16)
d) negation word + degree adverb + emotion word: because the negation precedes the degree adverb, the polarity of the emotion word is reversed and its intensity is weaker than under direct negation; a first weight factor z_1 = 0.5 is introduced, and the emotion score is:
Sa = (-1) · M_a · ps · pa · z_1 (17)
where ps is the intensity of the emotion word, pa is the polarity of the emotion word, and M_a is the intensity of the degree adverb.
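Rules (14) to (17) translate directly into a small function. The function name `clause_emotion_score` and the pattern labels are hypothetical; detecting which pattern applies in a clause is assumed to have been done beforehand.

```python
def clause_emotion_score(ps, pa, pattern, degree=1.0, negations=1, z1=0.5):
    """Score a clause whose emotion word has intensity ps and polarity pa (+1/-1).

    pattern names which modifier structure precedes the emotion word.
    """
    if pattern == "degree":           # (14) degree adverb + emotion word
        return degree * ps * pa
    if pattern == "negation":         # (15) n negation words + emotion word
        return (-1) ** negations * ps * pa
    if pattern == "degree+negation":  # (16) degree adverb + negation + emotion word
        return -1 * degree * ps * pa
    if pattern == "negation+degree":  # (17) negation + degree adverb + emotion word
        return -1 * degree * ps * pa * z1
    raise ValueError(f"unknown pattern: {pattern}")
```

For example, a negative emotion word of intensity 5 preceded by a degree adverb of intensity 2 scores 2 · 5 · (-1) = -10 under rule (14).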
If a microblog clause contains a conjunction from the conjunction dictionary, it is a compound sentence, and the shift of sentiment polarity between its parts must be taken into account. The emotion score is calculated according to the following cases:
a) adversative relation: when a semantically reversing word such as "but" or "however" appears, the polarity of the preceding clause changes, and the overall polarity of the two clauses is the same as that of the following clause; a second weight factor z_2 = -1 is introduced, and the emotion score is:
Sen = z_2 · Sen_1 + Sen_2 (18)
b) progressive relation: the two clauses have the same polarity and the intensity is strengthened; a third weight factor z_3 = 1.5 is introduced, and the emotion score is:
Sen = z_3 · (Sen_1 + Sen_2) (19)
c) concessive relation: the polarity of the following clause is reversed, and the polarity of the whole sentence is the same as that of the preceding clause; a fourth weight factor z_4 = -1 is introduced, and the emotion score is:
Sen = Sen_1 + z_4 · Sen_2 (20)
where Sen_1 is the emotion score of the preceding clause and Sen_2 is the emotion score of the following clause;
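The three compound-sentence rules (18) to (20) can be sketched as follows, using the weight factors given in the text. The function name `compound_score` and the relation labels are hypothetical; recognizing the relation from the conjunction is assumed to happen elsewhere.

```python
Z2_ADVERSATIVE = -1.0
Z3_PROGRESSIVE = 1.5
Z4_CONCESSIVE = -1.0


def compound_score(sen1, sen2, relation):
    """Combine the scores of the preceding (sen1) and following (sen2) clauses."""
    if relation == "adversative":   # (18)
        return Z2_ADVERSATIVE * sen1 + sen2
    if relation == "progressive":   # (19)
        return Z3_PROGRESSIVE * (sen1 + sen2)
    if relation == "concessive":    # (20)
        return sen1 + Z4_CONCESSIVE * sen2
    raise ValueError(f"unknown relation: {relation}")
```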
2) extract the emoticons in the threatening microblog text and calculate the emoticon score;
Sina Weibo provides a large number of emoticons, and using them in a microblog expresses the sentiment orientation of the microblog text vividly. Treating the emoticons as a weighted term of the emotion value helps correct the overall sentiment judgment of the microblog text. Using the emoticon dictionary, the polarity and intensity of every emoticon in the microblog text are looked up, and the number of occurrences of each emoticon is recorded. Let N_i be the number of occurrences of the i-th emoticon, e_i its intensity, and p_i its polarity; the emoticon score score_emo of the microblog text is computed from these quantities by formula (21).
The weighted sum of the emotion score and the emoticon score of the microblog text gives the emotion value of the microblog text:
S_1 = α · score_emo + β · score_text (22)
where α and β are adjustable weights in the range (0, 1) with α + β = 1; the α and β that maximize the probability of correct classification can be selected by verification on a cross-validation test set; score_text is the emotion score of the microblog text, which is the average of the emotion scores of its microblog clauses.
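The emotion value of formula (22) can be sketched as below. Because the emoticon formula itself is not legible in this text, the sketch assumes the simplest form consistent with the definitions, an unnormalized sum of N_i · e_i · p_i; the original may normalize differently. The function names, the emoticon labels, and the default α are hypothetical.

```python
def emoticon_score(counts, lexicon):
    """Sum N_i * e_i * p_i over the emoticons found (assumed form, see lead-in).

    counts:  emoticon -> number of occurrences N_i
    lexicon: emoticon -> (intensity e_i, polarity p_i)
    """
    return sum(n * lexicon[e][0] * lexicon[e][1]
               for e, n in counts.items() if e in lexicon)


def emotion_value(score_emo, clause_scores, alpha=0.3):
    """Formula (22): S1 = alpha * score_emo + beta * score_text, alpha + beta = 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    beta = 1.0 - alpha
    score_text = sum(clause_scores) / len(clause_scores)  # mean clause score
    return alpha * score_emo + beta * score_text
```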
3) calculate the behavior threat score of the microblog text according to the keyword library for civil aviation security public opinion built in step (1);
As described above, the keywords in the keyword library are divided into two classes, location words and behavior words. Location words include airport, runway, terminal, flight, and so on; behavior words include bombing an aircraft, hijacking, occupying an aircraft (refusing to disembark), in-flight disturbance, fighting, protesting, smoking, and so on. Each behavior word has two attributes. The first is the threat intensity value, which measures the degree of threat the word poses to civil aviation security, on a scale of five intensities (1, 3, 5, 7, and 9) that matches the intensity scale of emotion words. The second is the word type, of which there are two. One is the direct type, where the word alone is enough to establish a threat to civil aviation, for example bombing an aircraft, hijacking, occupying an aircraft, or in-flight disturbance. The other is the indirect type, where the word must co-occur with a location word before a threat to civil aviation security can be established, for example fighting, protesting, or smoking. An indirect-type behavior word on its own is not sufficient to establish a threat to civil aviation security.
The behavior threat score S_2⟨L, B⟩ is calculated as follows: find the behavior word B in the microblog text, then determine its type from the keyword library. If the behavior word is of the direct type, S_2⟨L, B⟩ takes the intensity of the behavior word. If the behavior word is of the indirect type, check whether a location word also appears in the microblog text: if so, S_2⟨L, B⟩ takes the intensity of the behavior word; if not, S_2⟨L, B⟩ is 0.
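The decision procedure for the behavior threat score can be sketched as follows. The function name `behavior_threat_score`, the English placeholder words, and their intensities are hypothetical. The text does not say what to do when a microblog contains several behavior words, so taking the highest qualifying intensity is an assumption made here.

```python
def behavior_threat_score(tokens, location_words, behavior_words):
    """Return the behavior threat score of a segmented microblog text.

    behavior_words maps a word to (intensity, is_direct).
    A direct-type word always counts; an indirect-type word counts only
    when a location word appears in the same text; otherwise the score is 0.
    """
    has_location = any(t in location_words for t in tokens)
    score = 0
    for token in tokens:
        if token not in behavior_words:
            continue
        intensity, is_direct = behavior_words[token]
        if is_direct or has_location:
            score = max(score, intensity)  # assumption: keep the strongest word
    return score
```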
4) take the weighted sum of the emotion value of the microblog text and the behavior threat score to obtain the threat-degree score of the microblog text;
The threat-degree score is calculated by formula (23), in which D is the threat-degree score, with range [-10, 10]; S_1 is the emotion value of the microblog text; and S_2⟨L, B⟩ is the behavior threat score, where L denotes a location word and B a behavior word;
(8) The threat level is determined from the above threat score;
Microblog texts judged to be threatening are divided into high, medium and low threat levels by thresholding, as follows:
1) Low threat when -4.5 ≤ D ≤ 0.
2) Medium threat when -7 ≤ D < -4.5.
3) High threat when -10 ≤ D < -7.
Table 1 lists the threat scores and threat levels obtained after processing certain microblog texts with the method of the present invention. As the table shows, the method of the present invention can accurately determine whether a microblog text poses a threat to civil aviation safety.
Table 1: Threat determination results for microblog texts
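The three thresholds can be expressed as a small Python function. This is a minimal sketch, not the patented implementation; the function name and the choice to return `None` for scores outside [-10, 0] are assumptions, since the description assigns levels only to texts already classified as threatening.

```python
def threat_level(d):
    """Map a threat score D to a threat level using the stated thresholds."""
    if -10 <= d < -7:
        return "high"
    if -7 <= d < -4.5:
        return "medium"
    if -4.5 <= d <= 0:
        return "low"
    # Assumption: scores outside [-10, 0] are given no threat level.
    return None
```

Note that the boundary values fall into the milder class on each side: D = -7 is medium and D = -4.5 is low.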

Claims (9)

1. A deep-learning-based method for sentiment analysis of civil aviation security public opinion on microblogs, characterized in that it comprises the following steps carried out in order:
(1) screening keywords related to civil aviation security public opinion from a large volume of network text, and forming a keyword database from these keywords and their corresponding threat intensity values;
(2) taking microblog texts screened according to the civil aviation security public opinion keywords, together with their corresponding labels, as a training set, and performing pre-processing and word segmentation on the microblog texts in the training set; each microblog text consists of at least one microblog clause, and the labels are of two kinds, threatening and non-threatening;
(3) training word vectors on the segmented microblog texts obtained in step (2) to obtain a word vector model;
(4) building a combined deep learning network from a convolutional neural network and a long short-term memory network, and adding a fully connected layer and a softmax layer after the combined deep learning network, which together form a combined deep learning classification model;
(5) inputting the segmented microblog texts of the training set into the word vector model obtained in step (3) to vectorize the microblog texts;
(6) inputting the vectorized microblog texts obtained in step (5) and their corresponding labels into the combined deep learning classification model, training a threat/non-threat text classifier in the combined deep learning network of that model, and saving it;
(7) pre-processing and segmenting the microblog text to be analyzed according to the method of step (2), vectorizing it with the word vector model obtained in step (3), and inputting it into the threat/non-threat classifier obtained in step (6) for classification; then, for microblog texts judged to be threatening, further calculating a threat score according to a sentiment dictionary and rules;
(8) determining the threat level from the above threat score.
2. The deep-learning-based civil aviation microblog public opinion sentiment analysis method according to claim 1, characterized in that: in step (1), the keywords are divided into two classes, place words and behavior words; each behavior word has two attributes; the first attribute is the threat intensity value, which measures how seriously the word threatens civil aviation security and takes one of five intensities, 1, 3, 5, 7 or 9; the second attribute is the word type, of which there are two: a direct-type word is sufficient on its own to establish a threat to civil aviation, while an indirect-type word establishes a threat to civil aviation security only when it appears together with a place word.
3. The deep-learning-based civil aviation microblog public opinion sentiment analysis method according to claim 1, characterized in that: in step (2), the pre-processing and word segmentation of the microblog texts in the training set are performed as follows: the pre-processing includes removing noise such as web links, the user nicknames attached when forwarding or replying to a microblog, and special characters, while retaining the topic labels in non-threatening microblog texts as an important feature for distinguishing whether a microblog text containing civil aviation public opinion keywords is a subjective threatening statement or a news topic; the pre-processed microblog text is then segmented with a word segmentation tool.
4. The deep-learning-based civil aviation microblog public opinion sentiment analysis method according to claim 1, characterized in that: in step (3), the word vectors are trained with the Skip-gram method of the word2vec algorithm, and the word vector model obtained by training with this method is saved.
5. The deep-learning-based civil aviation microblog public opinion sentiment analysis method according to claim 1, characterized in that: in step (4), the combined deep learning network is built from a convolutional neural network and a long short-term memory network, and a fully connected layer and a softmax layer are added after it to form the combined deep learning classification model, as follows: different convolution kernels are convolved with the sentence matrix in the input layer; the feature values obtained under convolution kernels of the same size are concatenated in temporal order and used as the input of the long short-term memory network, which further extracts the contextual relationship features of the microblog text; the fully connected layer applies a nonlinear transformation to obtain the score vector of the labels; the score vector is then passed through the softmax layer to compute the class probabilities, from which the final class is obtained.
6. The deep-learning-based civil aviation microblog public opinion sentiment analysis method according to claim 1, characterized in that: in step (5), the microblog text is vectorized as follows: the word vector corresponding to each word of the microblog text is looked up in the word vector model, and the word vectors are then concatenated into a sentence matrix.
7. The deep-learning-based civil aviation microblog public opinion sentiment analysis method according to claim 1, characterized in that: in step (6), the vectorized microblog texts obtained in step (5) and their corresponding labels are input into the combined deep learning classification model obtained in step (4) for training, and the trained model is referred to as the threat/non-threat text classifier, as follows: the combined deep learning classification model uses SGD optimization and continuously updates the training weights by minimizing the cross-entropy function; dropout is applied to the parameters of the fully connected layer to prevent overfitting; a suitable mini-batch size is selected according to the size of the training set; finally, the trained classification model is saved so that it can directly classify input microblog texts.
8. The deep-learning-based civil aviation microblog public opinion sentiment analysis method according to claim 1, characterized in that: in step (7), the threat score is calculated as follows:
1) extracting the emotion words in the threatening microblog text and calculating the emotion-word score of each microblog clause;
2) extracting the emoticons in the threatening microblog text and calculating the emoticon score of each microblog clause; and combining the above emotion-word score and emoticon score by weighted summation to obtain the emotion score of each microblog clause;
3) calculating the behavior threat score of the microblog text according to the civil aviation security public opinion keyword database;
4) combining the emotion score and the behavior threat score of the microblog text by weighted summation to obtain the final threat score of the microblog text.
9. The deep-learning-based civil aviation microblog public opinion sentiment analysis method according to claim 1, characterized in that: in step (8), the threat level is determined from the threat score as follows: according to the threat score obtained in step (7), the threatening microblog texts are divided into high, medium and low threat levels by thresholding.
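The noise removal recited in claim 3 can be illustrated with a short Python sketch. It is a rough approximation under stated assumptions, not the patented procedure: the regular expressions, the "回复@name:" reply format, and the set of characters treated as "special" are all assumptions, and word segmentation is left to a separate tool. The one deliberate design choice that follows the claim is that `#topic#` labels are kept.

```python
import re

# Assumed patterns; the claims do not define the exact formats.
URL = re.compile(r"https?://[A-Za-z0-9./?=&_%~-]+")
# User nicknames attached to forwards and replies, e.g. "//@name:" or "回复@name:".
MENTION = re.compile(r"(?:回复)?@[\w-]+[:：]?")
# Assumed "special characters": anything that is not a word character,
# whitespace, '#' (topic label delimiter) or common punctuation.
SPECIAL = re.compile(r"[^\w\s#，。！？,.!?]")


def preprocess(text):
    """Strip links, nicknames and special characters, keeping #topic# labels."""
    text = URL.sub(" ", text)
    text = MENTION.sub(" ", text)
    text = SPECIAL.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()
```

The cleaned string would then be passed to the word segmentation tool before vectorization.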
CN201810290094.9A 2018-04-03 2018-04-03 A kind of civil aviaton's microblogging security public sentiment sentiment analysis method based on deep learning Pending CN108536801A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810290094.9A CN108536801A (en) 2018-04-03 2018-04-03 A kind of civil aviaton's microblogging security public sentiment sentiment analysis method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810290094.9A CN108536801A (en) 2018-04-03 2018-04-03 A kind of civil aviaton's microblogging security public sentiment sentiment analysis method based on deep learning

Publications (1)

Publication Number Publication Date
CN108536801A true CN108536801A (en) 2018-09-14

Family

ID=63482373

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810290094.9A Pending CN108536801A (en) 2018-04-03 2018-04-03 A kind of civil aviaton's microblogging security public sentiment sentiment analysis method based on deep learning

Country Status (1)

Country Link
CN (1) CN108536801A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016187472A1 (en) * 2015-05-21 2016-11-24 Baidu Usa Llc Multilingual image question answering
CN105491013A (en) * 2015-11-20 2016-04-13 电子科技大学 Multi-domain network security situation perception model and method based on SDN
CN105719291A (en) * 2016-01-20 2016-06-29 江苏省沙钢钢铁研究院有限公司 Surface defect image classification system having selectable types
CN106598944A (en) * 2016-11-25 2017-04-26 中国民航大学 Civil aviation security public opinion emotion analysis method
CN107562784A (en) * 2017-07-25 2018-01-09 同济大学 Short text classification method based on ResLCNN models

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
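PING HAN et al.: "A Topic-Independent Hybrid Approach for Sentiment Analysis of Chinese Microblog", 《2016 IEEE 17TH INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION (IRI)》 *
韩萍 (Han Ping) et al.: "Design and Implementation of an Early-Warning System for Civil Aviation Terrorist Threat Information" (民航恐怖威胁信息预警系统的设计与实现), 《中国民航大学学报》 (Journal of Civil Aviation University of China) *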

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325125B (en) * 2018-10-08 2022-06-14 中山大学 Social network rumor detection method based on CNN optimization
CN109325125A (en) * 2018-10-08 2019-02-12 中山大学 A social network rumor detection method based on CNN optimization
CN109582785A (en) * 2018-10-31 2019-04-05 天津大学 Emergency event public opinion evolution analysis method based on text vectors and machine learning
CN110083825A (en) * 2019-03-21 2019-08-02 昆明理工大学 A Lao sentiment analysis method based on a GRU model
CN110232109A (en) * 2019-05-17 2019-09-13 深圳市兴海物联科技有限公司 An Internet public opinion analysis method and system
CN110321562A (en) * 2019-06-28 2019-10-11 广州探迹科技有限公司 A short text matching method and device based on BERT
CN110321562B (en) * 2019-06-28 2023-06-02 广州探迹科技有限公司 Short text matching method and device based on BERT
CN110377739A (en) * 2019-07-19 2019-10-25 出门问问(苏州)信息科技有限公司 Text sentiment classification method, readable storage medium and electronic device
CN111104526A (en) * 2019-11-21 2020-05-05 新华智云科技有限公司 Financial label extraction method and system based on keyword semantics
CN111523319A (en) * 2020-04-10 2020-08-11 广东海洋大学 Microblog emotion analysis method based on scene LSTM structure network
CN111523319B (en) * 2020-04-10 2023-06-30 广东海洋大学 Microblog emotion analysis method based on scene LSTM structure network
CN111767398A (en) * 2020-06-30 2020-10-13 国网新疆电力有限公司电力科学研究院 Secondary equipment fault short text data classification method based on convolutional neural network
CN111967494A (en) * 2020-07-01 2020-11-20 北京工业大学 Multi-source heterogeneous data analysis method for security protection of large-scale activity public security system guard
CN111967494B (en) * 2020-07-01 2024-03-26 北京工业大学 Multi-source heterogeneous data analysis method for guard security of large movable public security system
CN112329974A (en) * 2020-09-03 2021-02-05 中国人民公安大学 LSTM-RNN-based civil aviation security event behavior subject identification and prediction method and system
CN112329974B (en) * 2020-09-03 2024-02-27 中国人民公安大学 LSTM-RNN-based civil aviation security event behavior subject identification and prediction method and system
CN113688240A (en) * 2021-08-25 2021-11-23 南京中孚信息技术有限公司 Threat element extraction method, device, equipment and storage medium
CN113688240B (en) * 2021-08-25 2024-01-30 南京中孚信息技术有限公司 Threat element extraction method, threat element extraction device, threat element extraction equipment and storage medium
CN113792118A (en) * 2021-09-08 2021-12-14 浙江力石科技股份有限公司 Satisfaction improving system and method based on scenic spot evaluation
CN115982473A (en) * 2023-03-21 2023-04-18 环球数科集团有限公司 AIGC-based public opinion analysis arrangement system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
Application publication date: 20180914