CN108536801A - A deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs

Info

Publication number: CN108536801A
Application number: CN201810290094.9A
Authority: CN
Inventors: 韩萍; 孙佳慧; 方澄; 贾云飞
Original assignee: Civil Aviation University of China
Current assignee: Civil Aviation University of China
Priority date: 2018-04-03
Filing date: 2018-04-03
Publication date: 2018-09-14

Abstract

A deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs. The method comprises the following steps: preprocessing and segmenting the text content of a microblog data set; training word vectors; building a combined deep learning network, C-LSTM, and training a classifier based on this network to decide whether a microblog text contains content that threatens civil aviation security; and scoring the texts judged threatening in finer detail to evaluate their threat degree grade. The invention first trains a classifier with a deep-learning-based method to coarsely select subjective negative remarks about civil aviation and to remove microblogs containing objective statements such as news and statements of fact; it then uses civil aviation public opinion keywords and rules to calculate and grade the threat degree. This solves the problem in dictionary- and rule-based methods whereby objective statements containing civil aviation public opinion keywords are assigned a high threat degree grade, and it offers broader applicability and higher accuracy.

Description

A deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs

Technical field

The invention belongs to the technical field of text sentiment analysis in natural language processing, and in particular relates to a deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs.

Background art

In recent years, bomb threats against flights and airports and false terrorist information have appeared with increasing frequency on the internet. Out of dissatisfaction with society, some people post false threats, rumors, and extremist remarks online. When facing terrorist threats, the civil aviation industry generally suffers greater harm than other industries. Because microblogs spread quickly, are open, have wide influence, and make it hard to reveal the publisher's identity, they are a common channel for spreading terrorist information.

Sentiment analysis is the process of processing, analyzing, and applying text that carries emotional color. It can extract user opinions and sentiment orientation from text data and has important practical value. The research methods and techniques of sentiment analysis usually depend on the research goal. In general, existing work is mostly applied to the product domain or to public opinion analysis, for example the public's evaluations of and opinions on products or events. Sentiment analysis methods aimed at civil aviation terrorist information are extremely rare. By performing sentiment analysis on microblog texts related to civil aviation, microblogs that threaten civil aviation safety can be selected, so that key users with criminal tendencies can be identified.

At present, Chinese text sentiment analysis methods fall mainly into two classes: those based on semantic understanding and those based on traditional machine learning. Applied to microblog sentiment analysis, both have problems: 1. Methods based on semantic understanding build a benchmark library of commendatory and derogatory words and define explicit rules to pattern-match the corpus, which is very limiting for microblog text, whose expression is complex and irregular. 2. Methods based on traditional machine learning require complicated feature engineering and consume a great deal of manual effort.

Summary of the invention

To solve the above problems, the object of the present invention is to provide a deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs.

To achieve the above object, the deep-learning-based civil aviation security public opinion sentiment analysis method provided by the invention comprises the following steps, carried out in order:

(1) select keywords related to civil aviation security public opinion from a large volume of web text, and form a keyword library from these keywords and their corresponding threat intensity values;

(2) take the microblog texts selected according to the civil aviation security public opinion keywords, together with their corresponding labels, as the training set, and perform preprocessing and word segmentation on the microblog texts in the training set; each microblog text consists of at least one microblog clause, and the labels are of two kinds, threat and no threat;

(3) train word vectors on the segmented microblog texts obtained in step (2) to obtain a word vector model;

(4) build a combined deep learning network from a convolutional neural network and a long short-term memory network, and add a fully connected layer and a softmax layer after the combined deep learning network, which together form the combined deep learning classification model;

(5) input the segmented microblog texts of the training set into the word vector model obtained in step (3) to vectorize the microblog texts;

(6) input the vectorized microblog texts obtained in step (5) and their corresponding labels into the combined deep learning classification model, train the threat/no-threat text classifier in the combined deep learning network of the model, and save it;

(7) preprocess and segment the microblog text to be analyzed by the method of step (2), vectorize it with the word vector model obtained in step (3), and input it into the threat/no-threat classifier obtained in step (6) for classification; finally, for microblog texts judged to be threatening, further calculate a threat degree score according to a sentiment dictionary and rules;

(8) determine the threat degree grade from the threat degree score.

In step (1), the keywords are divided into two classes, place words and behavior words. Each behavior word has two attributes. The first is the threat intensity value, which measures the word's degree of threat to civil aviation security on a scale of five intensities: 1, 3, 5, 7, and 9. The second is the word type, of which there are two: the direct type, where the appearance of the word alone is enough to determine a threat to civil aviation; and the indirect type, where the word must appear together with a place word before a threat to civil aviation security can be determined.

In step (2), the preprocessing and word segmentation of the microblog texts in the training set are carried out as follows: preprocessing removes noise from the microblog text, including web links, user nicknames introduced by forwarding or replying to microblogs, and special characters, while retaining the topic tags in no-threat microblog texts, which serve as an important feature for distinguishing whether a microblog text containing civil aviation public opinion keywords is a subjective threatening remark or a news topic; a word segmentation tool is then used to segment the preprocessed microblog text.

In step (3), word vectors are trained with the Skip_gram method of the word2vec algorithm, and the word vector model trained by this method is saved.

In step (4), the combined deep learning classification model is built from the convolutional neural network, the long short-term memory network, the fully connected layer, and the softmax layer as follows: convolution is performed between convolution kernels of different sizes and the sentence matrix of the input layer; the feature values obtained under kernels of the same size are concatenated in temporal order and used as the input of the long short-term memory network, which further extracts the contextual features of the microblog text; the fully connected layer applies a nonlinear transformation to obtain the score vector of the labels; passing the score vector through the softmax layer yields the class probabilities, from which the final class is obtained.

In step (5), the microblog text is vectorized as follows: the word vector of each word in the microblog text is looked up in the word vector model, and the word vectors are then concatenated into a sentence matrix.

In step (6), the vectorized microblog texts obtained in step (5) and their corresponding labels are input into the combined deep learning classification model obtained in step (4) for training, and the trained model is called the threat/no-threat text classifier. The training method is: the combined deep learning classification model uses SGD optimization and continually updates the training weights by minimizing the cross-entropy function; dropout is applied to the parameters of the fully connected layer to prevent overfitting; a suitable mini-batch size is chosen according to the scale of the training set; finally, the trained classification model is saved so that input microblog texts can be classified directly.

In step (7), the threat degree score is calculated as follows:

1) extract the emotion words in the threatening microblog text and calculate the emotion score of each microblog clause;

2) extract the emoticons in the threatening microblog text and calculate the emoticon score of each microblog clause; then take a weighted sum of the emotion score and the emoticon score to obtain the emotion score value of each microblog clause;

3) calculate the behavior threat score of the microblog text according to the civil aviation security public opinion keyword library;

4) take a weighted sum of the emotion score value and the behavior threat score of the microblog text to obtain the final threat degree score of the microblog text.

In step (8), the threat degree grade is determined from the threat degree score as follows: according to the threat degree score obtained in step (7), a threshold method divides the threatening microblog texts into high, medium, and low threat grades.

The deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs provided by the invention has the following advantages: (1) the invention first uses a text classifier to coarsely select the threatening microblog texts and then calculates the threat degree score with the civil aviation public opinion keyword library and syntactic rules, improving efficiency and accuracy. (2) Training the text classifier with a neural network that combines CNN and LSTM avoids extensive manual feature engineering, makes text feature extraction more comprehensive, and gives higher classification accuracy.

Brief description of the drawings

Fig. 1 is a flow chart of the deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs provided by the invention.

Fig. 2 is an overall structure diagram of the combined deep learning model in the invention.

Detailed description

The deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs provided by the invention is described in detail below with reference to the drawings and specific embodiments.

As shown in Fig. 1 and Fig. 2, the civil aviation security microblog public opinion sentiment analysis method provided by the invention comprises the following steps, carried out in order:

Keywords are divided into two classes, place words and behavior words. Place words include airport, runway, terminal, flight, and so on; behavior words include bombing an aircraft, hijacking, refusing to leave the aircraft, in-flight disturbance, fighting, protesting, smoking, and so on. Each behavior word has two attributes. The first is the threat intensity value, which measures the word's degree of threat to civil aviation security on a scale of five intensities: 1, 3, 5, 7, and 9. The second is the word type, of which there are two. One is the direct type, where the appearance of the word alone is enough to determine a threat to civil aviation, for example bombing an aircraft, hijacking, refusing to leave the aircraft, and in-flight disturbance. The other is the indirect type, where the word must appear together with a place word before a threat to civil aviation security can be determined, for example fighting, protesting, and smoking. When only an indirect-type behavior word is present, it is not sufficient to judge that there is a threat to civil aviation security.

(2) take the microblog texts selected according to the civil aviation security public opinion keyword library, together with their corresponding labels, as the training set, and perform preprocessing and word segmentation on the microblog texts in the training set; each microblog text consists of at least one microblog clause, and the labels are of two kinds, threat and no threat;

Preprocessing is applied to the microblog texts in the training set to remove noise unrelated to emotional expression, such as: 1) web links of the form "http://t.cn/ECj0WN", which carry no useful information and are removed during preprocessing; 2) user nicknames introduced by forwarding or replying to microblogs, special characters, and the like, as in "Li Qiong: reply @sketch craftsman Old Wang: the asylum handles this, the police do not", where the microblog user name after the @ symbol must be removed. Note that the topic tags in no-threat texts are retained as an important feature for training the threat/no-threat microblog text classifier. A word segmentation tool is then used to segment each microblog clause of the preprocessed microblog text into N words; the tool used is jieba, an open-source Python word segmentation tool. The labels of the microblog texts are annotated manually.
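
As an illustration only, the preprocessing described above might be sketched as follows. The regular expressions, the function names `preprocess` and `segment`, and the `keep_topics` switch are assumptions made for this sketch; the text itself only states what is removed and that jieba performs the segmentation.

```python
import re

# Assumed patterns; the original text does not give exact rules.
URL_RE = re.compile(r"https?://[A-Za-z0-9./?=&_%#-]+")
# Matches "回复@name:", "//@name:" and a bare "@name"
MENTION_RE = re.compile(r"(回复|//)?@[^\s:：]+[:：]?")
# Weibo topic tags are written as #topic#
TOPIC_RE = re.compile(r"#[^#]+#")


def preprocess(text, keep_topics=True):
    """Remove links and @user names; topic tags are kept by default."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    if not keep_topics:
        text = TOPIC_RE.sub("", text)
    # Collapse the whitespace left behind by the removals
    return re.sub(r"\s+", " ", text).strip()


def segment(text):
    """Split a preprocessed clause into words with jieba (third-party)."""
    import jieba  # imported lazily so preprocess() works without it
    return jieba.lcut(text)
```

For example, `preprocess("回复@老王:这事不管 http://t.cn/ECj0WN #机场#")` leaves only the remark and the topic tag.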

When deep learning is used for microblog text sentiment analysis, a computer cannot recognize Chinese characters directly, so the microblog text must be vectorized before it is fed as training data into the combined deep learning network. The Word2Vec algorithm compresses the data scale while capturing contextual information. The word2vec tool provided by Google contains two language models, CBOW and Skip_gram, each comprising an input layer, a projection layer, and an output layer. The CBOW model predicts the current word from its context; the Skip_gram model predicts the words inside the context window from the current word. The invention uses the word2vec implementation in the Python toolkit gensim with the Skip_gram model. The training window size is set to 5 and the word vector dimension to 300.
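
A minimal sketch of this training step with gensim is given below. The window size, dimension, and Skip_gram choice come from the text; `min_count=1`, the function name, and the file name `weibo_w2v.model` are assumptions for illustration.

```python
# Parameters for gensim's Word2Vec (gensim >= 4.0 naming;
# older releases call the dimension `size` instead of `vector_size`).
W2V_PARAMS = {
    "vector_size": 300,  # word vector dimension stated in the text
    "window": 5,         # training window size stated in the text
    "sg": 1,             # 1 selects Skip-gram, 0 selects CBOW
    "min_count": 1,      # assumption: keep rare words in a small corpus
}


def train_word_vectors(tokenized_texts, model_path="weibo_w2v.model"):
    """Train and save a word vector model.

    tokenized_texts: list of word lists, one list per segmented clause.
    """
    from gensim.models import Word2Vec  # third-party, imported lazily
    model = Word2Vec(sentences=tokenized_texts, **W2V_PARAMS)
    model.save(model_path)
    return model
```

The saved model can later be reloaded with `Word2Vec.load(model_path)` and queried through `model.wv[word]`.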

The convolutional neural network (CNN) in the combined deep learning network (C-LSTM) effectively captures relations between words and abstracts local features; the long short-term memory network (LSTM) learns long-term dependencies and captures syntactic features related to word order. The concrete operation is: after the convolution kernels are convolved with the sentence matrix, the features obtained under kernels of the same size are concatenated in temporal order and used as the input of the LSTM. The kernel window sizes are 3, 4, and 5, with 100 kernels of each size; the structure of C-LSTM is shown in Fig. 2. The fully connected layer applies a nonlinear transformation to the vector output by the combined deep learning network, using the ReLU function as the nonlinearity. The softmax layer maps the output of the combined deep learning network into the interval (0,1) for classification; the activation function of this layer is sigmoid.

Suppose a microblog clause contains N words after segmentation and each word vector has dimension d; the whole microblog clause is then represented as:

$$X = x_1 \oplus x_2 \oplus \cdots \oplus x_N \tag{1}$$

where $\oplus$ denotes concatenation along the row-vector direction. Concatenating the word vectors thus gives the sentence matrix $X \in \mathbb{R}^{d \times N}$, which serves as the input layer of the combined deep learning network.
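
The lookup-and-concatenate step can be sketched in plain Python as follows. The dictionary `vectors` stands in for the trained word vector model, and mapping unknown words to a zero vector is an assumption; the text does not say how out-of-vocabulary words are handled.

```python
def sentence_matrix(words, vectors, dim):
    """Build the d x N sentence matrix for one segmented clause.

    words:   list of N words
    vectors: mapping word -> list of `dim` floats (stand-in for the model)
    dim:     word vector dimension d
    """
    # One column per word; unknown words become zero vectors (assumption)
    columns = [vectors.get(w, [0.0] * dim) for w in words]
    # Transpose the N columns into d rows of length N
    return [[col[r] for col in columns] for r in range(dim)]
```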

(6) input the vectorized microblog texts obtained in step (5) and their corresponding labels into the combined deep learning classification model, train the threat/no-threat text classifier in the combined deep learning network of the model, and save it;

C-LSTM can be divided into two parts: word feature extraction and syntactic feature extraction. Word feature extraction is performed by the convolutional layer of the CNN: convolution kernels of three different sizes are applied to the word vector matrix of the microblog text in a one-dimensional convolution to extract features, yielding multiple groups of different feature maps. The convolution operation can be expressed as:

where c_ij denotes the feature map generated by the i-th convolution kernel at the j-th word; f is a nonlinear transformation function, for which the invention chooses ReLU; X_j denotes the matrix of word vectors covered by the convolution window starting at position j; x_j denotes the word vector of the j-th word of the sentence matrix X; the product in the formula is the matrix inner product; the parameter matrix of the i-th kernel spans a window of d_win words; b_i denotes the bias of the i-th convolution kernel; H denotes the number of parameters in the convolution kernel and is one of the important hyperparameters of the CNN, set to 100 in this method. Depending on how convolution is handled at the two ends of the feature sequence, the convolution window can be of the narrow or the wide type, with result dimensions d_m - d_win + 1 and d_m + d_win - 1 respectively; all convolutions in the invention use wide convolution. If n feature maps are taken for each kernel size, the total feature map for a given size can be expressed as:

where W is the new feature vector generated by the n convolution kernels at position j, i.e. the values produced by each convolution operation concatenated in order. These new feature vectors produced by the CNN capture high-level features of the words; concatenating them in order preserves the word order of the original sentence, and the result is used as the input of the LSTM.
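
To make the convolution stage concrete, the following plain-Python sketch applies several kernels of one window size to a clause with wide (zero-padded) convolution and ReLU, then groups the results by position so that each position yields one feature vector for the LSTM. The data layout (a list of word vectors, kernels as lists of weight rows) and the function names are assumptions of this sketch, not the patent's notation.

```python
def relu(v):
    return v if v > 0 else 0.0


def wide_conv1d(word_vecs, kernel, bias):
    """Wide 1-D convolution of one kernel over a clause.

    word_vecs: list of N word vectors (each a list of d floats)
    kernel:    list of k weight rows (each a list of d floats)
    Returns N + k - 1 activations, one per window position.
    """
    k, d = len(kernel), len(word_vecs[0])
    pad = [[0.0] * d for _ in range(k - 1)]  # zero padding at both ends
    padded = pad + word_vecs + pad
    out = []
    for j in range(len(padded) - k + 1):
        window = padded[j:j + k]
        s = sum(w * x
                for wrow, xrow in zip(kernel, window)
                for w, x in zip(wrow, xrow))
        out.append(relu(s + bias))
    return out


def feature_sequence(word_vecs, kernels, biases):
    """Stack the maps of n same-size kernels position by position.

    Returns one n-dimensional feature vector per position, in word order.
    """
    maps = [wide_conv1d(word_vecs, k, b) for k, b in zip(kernels, biases)]
    return [list(col) for col in zip(*maps)]
```

With 100 kernels of a given size, each position would produce a 100-dimensional vector; the toy test below uses two kernels of window size 2.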

An RNN is a recurrent neural network over time that can describe dynamic temporal behavior; its state is fed back recurrently within the network, so it can accept a wider range of time-series inputs. A plain RNN, however, cannot cope with the exponential explosion or vanishing of weights that arises with recursion and so has difficulty capturing long-term temporal dependencies. The LSTM replaces the hidden layer of the RNN with a memory cell and solves this problem well. The words of a clause entering the network in order are treated as inputs at successive time steps; the LSTM can capture long-term temporal dependencies, i.e. syntactic relations between words that are far apart, and thereby obtains syntactic features.

The LSTM consists of an input gate i_t, an output gate o_t, a forget gate f_t, and a memory cell c_t. The gates determine how to update the current memory cell c_t and the current hidden vector h_t, which contains the outputs of all LSTM cells. Formally, the LSTM is described by:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{5}$$

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{6}$$

$$q_t = \tanh(W_q \cdot [h_{t-1}, x_t] + b_q) \tag{7}$$

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{8}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot q_t \tag{9}$$

$$h_t = o_t \odot \tanh(c_t) \tag{10}$$

where $h_{t-1}$ is the hidden vector at the previous time step; $x_t$ is the input vector at the current time step; $\sigma$ is the logistic sigmoid function, whose output lies in [0,1]; tanh has output in [-1,1]; $\odot$ denotes element-wise multiplication.
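
A single LSTM time step following the gate equations (5) to (8) can be sketched in plain Python as below. The cell and hidden-state updates use the standard LSTM form (forget gate times previous cell plus input gate times candidate, then output gate times tanh of the cell); the parameter dictionary layout and helper names are assumptions of this sketch.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, b, v):
    """Compute W . v + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step; p holds W_i, W_f, W_q, W_o and b_i, b_f, b_q, b_o."""
    z = h_prev + x_t  # list concatenation plays the role of [h_{t-1}, x_t]
    i_t = [sigmoid(a) for a in affine(p["W_i"], p["b_i"], z)]    # (5)
    f_t = [sigmoid(a) for a in affine(p["W_f"], p["b_f"], z)]    # (6)
    q_t = [math.tanh(a) for a in affine(p["W_q"], p["b_q"], z)]  # (7)
    o_t = [sigmoid(a) for a in affine(p["W_o"], p["b_o"], z)]    # (8)
    # Standard cell and hidden-state updates (element-wise products)
    c_t = [f * c + i * q for f, c, i, q in zip(f_t, c_prev, i_t, q_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

Feeding the per-position feature vectors from the convolution stage through `lstm_step` one after another, carrying `h_t` and `c_t` forward, gives the sequence encoding used by the classifier.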

(7) preprocess and segment the microblog text to be analyzed by the method of step (2), vectorize it with the word vector model of step (3), and input it into the threat/no-threat classifier described above for classification; finally, for microblog texts judged to be threatening, further calculate a threat degree score according to a sentiment dictionary and rules;

1) extract the emotion words in the threatening microblog text and calculate the emotion score of each microblog clause;

The words obtained by segmenting a threatening microblog text are matched against the sentiment dictionary; a word found there is selected as an emotion word. If a word does not appear in the sentiment dictionary, semantic similarity is used to decide whether it is an emotion word. To reduce computation, only nouns, verbs, and adjectives are kept as candidate emotion words. The invention uses the HowNet semantic similarity algorithm as the base algorithm, which works well for measuring the similarity of two words. Specifically, for two words w₁ and w₂, if w₁ has n senses or concepts x₁, x₂, …, x_n and w₂ has m senses or concepts y₁, y₂, …, y_m, the similarity of w₁ and w₂ is defined as the maximum of the similarities between their senses or concepts, i.e.:

$$Sim(w_1, w_2) = \max_{i=1,\dots,n;\ j=1,\dots,m} Sim(x_i, y_j) \tag{11}$$

The similarity of two sememes is calculated as:

$$Sim(x_1, y_2) = \frac{\lambda}{d(x_1, y_2) + \lambda} \tag{12}$$

where λ is an adjustable positive parameter and d(x₁,y₂) denotes the distance between sememes x₁ and y₂ in the sememe hierarchy tree.

For any word, its sentiment orientation value can be obtained from its similarity to the seed words in the sentiment dictionary. The calculation is: compute the similarity between word w and each seed word in the positive sentiment dictionary by formulas (11) and (12) to obtain the similarity of the word to the positive seed words; likewise compute the similarity between w and each seed word in the negative sentiment dictionary to obtain its similarity to the negative seed words; the difference between the two means then gives the sentiment orientation value of w, calculated as follows:

where p_i denotes a positive seed word and n_j a negative seed word; the sentiment orientation value S_w ranges over (-1,1). A threshold T is set and the calculated S_w is compared with it to decide whether w is an emotion word: when |S_w| > T, w is judged to be an emotion word. The intensity of the emotion word is set to 10·S_w, consistent with the intensity scale of the sentiment dictionary.
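
The similarity-based decision can be sketched as follows. Several points are assumptions: the sememe similarity uses the common HowNet-style form λ / (d + λ) with an illustrative λ = 1.6, the orientation value is taken as the mean similarity to positive seeds minus the mean similarity to negative seeds, and all function names are hypothetical. A real system would obtain sememe distances from HowNet rather than from the toy table used in the test.

```python
def sememe_similarity(distance, lam=1.6):
    """Assumed HowNet-style similarity of two sememes at a tree distance."""
    return lam / (distance + lam)


def word_similarity(senses_1, senses_2, sense_sim):
    """Word similarity = maximum similarity over all pairs of senses."""
    return max(sense_sim(x, y) for x in senses_1 for y in senses_2)


def orientation(word, pos_seeds, neg_seeds, sim):
    """Assumed form of S_w: mean positive minus mean negative similarity."""
    pos = sum(sim(word, p) for p in pos_seeds) / len(pos_seeds)
    neg = sum(sim(word, n) for n in neg_seeds) / len(neg_seeds)
    return pos - neg


def emotion_intensity(s_w, threshold):
    """Return 10 * S_w if |S_w| exceeds the threshold T, else None."""
    return 10 * s_w if abs(s_w) > threshold else None
```

The threshold T is not given in the text, so any value passed to `emotion_intensity` is a tuning choice.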

If a microblog clause contains an emotion word that is preceded by a negation word from the negation dictionary or a modifier from the modifier dictionary, the emotion score Sa of the clause is calculated according to the following cases:

a) degree adverb + emotion word: the intensity of the emotion word changes with the intensity of the degree adverb, and the emotion score is:

$$Sa = M_a \cdot ps \cdot pa \tag{14}$$

b) negation word + emotion word: the polarity of the emotion word changes according to the number of negation words, and the emotion score is:

$$Sa = (-1)^n \cdot ps \cdot pa \tag{15}$$

c) degree adverb + negation word + emotion word: the polarity of the emotion word is reversed and its intensity changes with the intensity of the degree adverb, and the emotion score is:

$$Sa = (-1) \cdot M_a \cdot ps \cdot pa \tag{16}$$

d) negation word + degree adverb + emotion word: because the negation precedes the degree adverb, the polarity of the emotion word is reversed and its intensity is weakened relative to direct negation; a first weight factor z₁ = 0.5 is introduced, and the emotion score is:

$$Sa = (-1) \cdot M_a \cdot ps \cdot pa \cdot z_1 \tag{17}$$

where ps denotes the intensity of the emotion word, pa its polarity, M_a the intensity of the degree adverb, and n the number of negation words.
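
The four cases map directly onto a small function. The pattern labels and the fallback for a bare emotion word (intensity times polarity) are assumptions of this sketch; detecting which pattern applies in a clause is left out.

```python
Z1 = 0.5  # first weight factor from case d)


def clause_score(ps, pa, pattern, m_a=1.0, n_neg=0):
    """Emotion score Sa of a clause.

    ps: emotion word intensity, pa: polarity (+1 or -1),
    m_a: degree adverb intensity, n_neg: number of negation words.
    """
    if pattern == "degree+emotion":        # formula (14)
        return m_a * ps * pa
    if pattern == "neg+emotion":           # formula (15)
        return (-1) ** n_neg * ps * pa
    if pattern == "degree+neg+emotion":    # formula (16)
        return -1 * m_a * ps * pa
    if pattern == "neg+degree+emotion":    # formula (17)
        return -1 * m_a * ps * pa * Z1
    # Assumption: an unmodified emotion word scores intensity * polarity
    return ps * pa
```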

If a microblog clause contains a connective from the conjunction dictionary, the clause is a compound sentence, and the transfer of sentiment polarity between its parts must be considered; the emotion score of the clause is calculated according to the following cases:

a) adversative relation: when semantically reversing words such as "but" or "however" appear in the clause, the polarity of the preceding clause changes and the overall polarity of the two clauses is the same as that of the following clause; a second weight factor z₂ = -1 is introduced, and the emotion score is:

$$Sen = z_2 \cdot Sen_1 + Sen_2 \tag{18}$$

b) progressive relation: the two clauses have the same polarity and the intensity is strengthened; a third weight factor z₃ = 1.5 is introduced, and the emotion score is:

$$Sen = z_3 \cdot (Sen_1 + Sen_2) \tag{19}$$

c) concessive relation: the polarity of the following clause is reversed and the polarity of the whole sentence is the same as that of the preceding clause; a fourth weight factor z₄ = -1 is introduced, and the emotion score is:

$$Sen = Sen_1 + z_4 \cdot Sen_2 \tag{20}$$

where Sen₁ denotes the emotion score of the preceding clause and Sen₂ the emotion score of the following clause;
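
The three compound-sentence rules can be sketched as a lookup on the relation type. The relation labels are hypothetical names; deciding which relation a connective signals would come from the conjunction dictionary.

```python
Z2, Z3, Z4 = -1, 1.5, -1  # weight factors from formulas (18) to (20)


def compound_score(sen1, sen2, relation):
    """Combine the scores of the preceding (sen1) and following (sen2) clause."""
    if relation == "adversative":   # formula (18)
        return Z2 * sen1 + sen2
    if relation == "progressive":   # formula (19)
        return Z3 * (sen1 + sen2)
    if relation == "concessive":    # formula (20)
        return sen1 + Z4 * sen2
    raise ValueError("unknown relation: " + relation)
```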

2) extract the emoticons in the threatening microblog text and calculate the emoticon score of each microblog clause;

Sina Weibo provides a large number of emoticons, and the emoticons used in a microblog clearly indicate the sentiment orientation of the text. Using the emoticons as a weighted term of the emotion score value helps correct the judgment of the overall sentiment orientation of the microblog text. According to the emoticon dictionary, the polarity and intensity of every emoticon in the microblog text are looked up and the number of occurrences of each emoticon is recorded. Let N_i be the count of the i-th emoticon, e_i its intensity, and p_i its polarity; the emoticon score of the microblog text is then calculated as:

The emotion score and the emoticon score of the microblog text are combined in a weighted sum to obtain the emotion score value of each microblog text:

$$S_1 = \alpha \cdot score_{emo} + \beta \cdot score_{text} \tag{22}$$

where α and β are adjustable weights in (0,1) with α + β = 1, which can be chosen by cross-validation on a test set as the values that maximize the probability of correct classification; score_text is the emotion score of the microblog text, taken as the mean of the emotion scores of its clauses.
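
A sketch of the weighted combination in formula (22) follows. The emoticon score is assumed here to be the plain sum of count × intensity × polarity over all emoticons, which is only one reading of the description, and the default α = 0.3 is an arbitrary illustrative value.

```python
def emoticon_score(emoticons):
    """Assumed emoticon score: sum of count * intensity * polarity.

    emoticons: list of (N_i, e_i, p_i) tuples.
    """
    return sum(n * e * p for n, e, p in emoticons)


def emotion_value(clause_scores, emoticons, alpha=0.3):
    """S1 = alpha * score_emo + beta * score_text, with beta = 1 - alpha."""
    beta = 1.0 - alpha
    # score_text is the mean of the clause emotion scores
    score_text = sum(clause_scores) / len(clause_scores)
    return alpha * emoticon_score(emoticons) + beta * score_text
```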

3) calculate the behavior threat score of the microblog text according to the keyword library related to civil aviation security public opinion built in step (1);

As described above, the keywords in the keyword library are divided into two classes, place words and behavior words. Place words include airport, runway, terminal, flight, and so on; behavior words include bombing an aircraft, hijacking, refusing to leave the aircraft, in-flight disturbance, fighting, protesting, smoking, and so on. Each behavior word has two attributes. The first is the threat intensity value, which measures the word's degree of threat to civil aviation security on a scale of five intensities, 1, 3, 5, 7, and 9, consistent with the intensity scale of the emotion words. The second is the word type, of which there are two. One is the direct type, where the appearance of the word alone is enough to determine a threat to civil aviation, for example bombing an aircraft, hijacking, refusing to leave the aircraft, and in-flight disturbance. The other is the indirect type, where the word must appear together with a place word before a threat to civil aviation security can be determined, for example fighting, protesting, and smoking. When only an indirect-type behavior word is present, it is not sufficient to judge that there is a threat to civil aviation security.

The behavior threat score S₂⟨L, B⟩ is calculated as follows: find the behavior word B in the microblog text, then determine the type of the behavior word from the keyword library. When the behavior word is of the direct type, S₂⟨L, B⟩ takes the intensity of the behavior word. When the behavior word is of the indirect type, check whether a place word also appears in the microblog text: if so, S₂⟨L, B⟩ takes the intensity of the behavior word; if not, S₂⟨L, B⟩ is 0.
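
The rule above can be sketched with a small in-memory keyword library. The Chinese entries are back-translations of the examples in the text, the intensity assigned to each word is made up for illustration, and taking the maximum when several behavior words occur is an assumption; the text does not cover that case.

```python
# Hypothetical keyword library; intensities are illustrative only.
PLACE_WORDS = {"机场", "跑道", "航站楼", "航班"}
BEHAVIOR_WORDS = {
    "炸机": (9, "direct"),
    "劫机": (9, "direct"),
    "霸机": (5, "direct"),
    "空闹": (5, "direct"),
    "打架": (3, "indirect"),
    "抗议": (3, "indirect"),
    "吸烟": (1, "indirect"),
}


def behavior_threat_score(tokens):
    """Return S2 for a segmented microblog text (list of words)."""
    has_place = any(t in PLACE_WORDS for t in tokens)
    score = 0
    for t in tokens:
        if t not in BEHAVIOR_WORDS:
            continue
        intensity, kind = BEHAVIOR_WORDS[t]
        # Direct words always count; indirect words need a place word
        if kind == "direct" or has_place:
            score = max(score, intensity)  # assumption: keep the strongest
    return score
```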

4) take a weighted sum of the emotion score value and the behavior threat score of the microblog text to obtain the final threat degree score of the microblog text;

The threat degree score is calculated by formula (23):

where D denotes the threat degree score, which lies in the range [-10,10]; S₁ denotes the emotion score value of the microblog text; S₂⟨L, B⟩ is the behavior threat score, in which L denotes the place word and B the behavior word;

(8) determine the threat degree grade from the threat degree score;

The threshold method divides the threatening microblog texts into high, medium, and low threat grades, as follows:

1) low threat degree when -4.5 ≤ D ≤ 0.

2) medium threat degree when -7 ≤ D < -4.5.

3) high threat degree when -10 ≤ D < -7.
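
The three thresholds translate into a short function. Returning `None` for scores outside [-10, 0] is an assumption, since the text defines no grade for positive scores.

```python
def threat_level(d):
    """Map a threat degree score D to a grade using the stated thresholds."""
    if -10 <= d < -7:
        return "high"
    if -7 <= d < -4.5:
        return "medium"
    if -4.5 <= d <= 0:
        return "low"
    return None  # assumption: no grade outside [-10, 0]
```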

Table 1 lists the threat degree scores and threat degree grades obtained by processing some microblog texts with the method of the invention. As the table shows, the method can accurately determine whether a microblog text threatens civil aviation safety.

Table 1. Threat degree judgment results for microblog texts

Claims

1. A deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs, characterized in that it comprises the following steps, carried out in order:

(1) select keywords related to civil aviation security public opinion from a large volume of web text, and form a keyword library from these keywords and their corresponding threat intensity values;

(2) take the microblog texts selected according to the civil aviation security public opinion keywords, together with their corresponding labels, as the training set, and perform preprocessing and word segmentation on the microblog texts in the training set; each microblog text consists of at least one microblog clause, and the labels are of two kinds, threat and no threat;

(3) train word vectors on the segmented microblog texts obtained in step (2) to obtain a word vector model;

(4) constructing a combined deep learning network from a convolutional neural network and a long short-term memory network, and adding a fully connected layer and a softmax layer after the combined deep learning network, which together form a combined deep learning classification model;

(5) inputting the segmented microblog texts of the training set into the word vector model obtained in step (3) to vectorize the microblog texts;

(6) inputting the vectorized microblog texts obtained in step (5) and their corresponding labels into the combined deep learning classification model, training the threat/no-threat text classifier in the combined deep learning network of that model, and saving it;

(7) preprocessing and segmenting the microblog text to be analyzed according to the method of step (2), vectorizing it with the word vector model obtained in step (3), and then inputting it into the threat/no-threat classifier obtained in step (6) for classification; finally, for microblog texts judged to be threatening, further calculating a threat score according to a sentiment dictionary and rules;

(8) determining the threat level according to the above threat score.

2. The civil aviation microblog public opinion sentiment analysis method based on deep learning according to claim 1, characterized in that: in step (1), the keywords are divided into two classes, place words and behavior words; each behavior word has two attributes; the first attribute is the threat intensity value, which measures the degree to which the word threatens civil aviation security and takes one of five intensities: 1, 3, 5, 7, or 9; the second attribute is the word type, which falls into two classes: the direct type, in which the occurrence of the word alone is sufficient to determine that there is a threat to civil aviation; and the indirect type, in which the word must occur together with a place word before it can be determined whether there is a threat to civil aviation security.
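One way to picture the keyword library of claim 2 is the sketch below. All names (`BehaviorWord`, `WordType`, `matched_behavior_words`) and the example entries are hypothetical placeholders, not taken from the actual keyword library; only the two attributes, the five intensity values, and the direct/indirect rule come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class WordType(Enum):
    DIRECT = "direct"      # the word alone indicates a threat
    INDIRECT = "indirect"  # must co-occur with a place word


ALLOWED_INTENSITIES = (1, 3, 5, 7, 9)


@dataclass(frozen=True)
class BehaviorWord:
    word: str
    intensity: int
    word_type: WordType

    def __post_init__(self):
        if self.intensity not in ALLOWED_INTENSITIES:
            raise ValueError(f"intensity must be one of {ALLOWED_INTENSITIES}")


# Hypothetical example entries, for illustration only.
PLACE_WORDS = {"airport", "plane"}
BEHAVIOR_WORDS = [
    BehaviorWord("hijack", 9, WordType.DIRECT),
    BehaviorWord("explode", 7, WordType.INDIRECT),
]


def matched_behavior_words(tokens, behavior_words=BEHAVIOR_WORDS,
                           place_words=PLACE_WORDS):
    """Return the behavior words in `tokens` that count as threatening."""
    token_set = set(tokens)
    has_place = bool(token_set & set(place_words))
    return [
        b for b in behavior_words
        if b.word in token_set
        and (b.word_type is WordType.DIRECT or has_place)
    ]
```

With these placeholder entries, an indirect word such as "explode" is reported only when a place word such as "airport" appears in the same token list, while a direct word is reported on its own.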

3. The civil aviation microblog public opinion sentiment analysis method based on deep learning according to claim 1, characterized in that: in step (2), the method of performing preprocessing and word segmentation on the microblog texts in the training set is: the preprocessing operation includes removing noise information from the microblog text, including web links, user nicknames attached when forwarding or replying to microblogs, and special characters, while retaining the topic tags in non-threatening microblog texts as an important feature for distinguishing whether a microblog text containing civil aviation public opinion keywords is a subjective threatening remark or a news topic; the preprocessed microblog text is then segmented using a word segmentation tool.
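The preprocessing of claim 3 might look roughly like the sketch below. The regular expressions are assumptions about typical microblog formatting (links starting with http, nicknames written as `@name`, optionally preceded by `回复` or `//`, and topic tags written as `#topic#`); which characters count as "special" is also an assumption. Square brackets are kept here on the assumption that bracketed emoticons are needed later for scoring. Word segmentation is left out because the text does not name a specific tool.

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Assumed nickname formats: "@name", "回复@name:", "//@name:".
MENTION_RE = re.compile(r"(?:回复|//)?@[\w\-]+[:：]?")
# Assumption: keep word characters, whitespace, '#', brackets, basic punctuation.
SPECIAL_RE = re.compile(r"[^\w\s#\[\]，。！？,.!?]")


def preprocess(text: str) -> str:
    """Remove links, nicknames, and special characters; keep #topic# tags."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = SPECIAL_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(preprocess("回复@user_1: 机场太差了 http://t.cn/abc #航班延误# ★"))
```

Links are removed first so that the `//` inside a URL is never considered by the nickname pattern.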

4. The civil aviation microblog public opinion sentiment analysis method based on deep learning according to claim 1, characterized in that: in step (3), word vector training uses the Skip-gram method of the word2vec algorithm, and the word vector model obtained by training with this method is saved.

5. The civil aviation microblog public opinion sentiment analysis method based on deep learning according to claim 1, characterized in that: in step (4), the method of constructing the combined deep learning network from a convolutional neural network and a long short-term memory network, and adding a fully connected layer and a softmax layer after the combined deep learning network to form the combined deep learning classification model, is: performing convolution operations between different convolution kernels and the sentence matrix in the input layer; concatenating, in temporal order, the feature values obtained under convolution kernels of the same size and using them as the input to the long short-term memory network, which further extracts the contextual relationship features of the microblog text; the fully connected layer obtains the score vector of the labels after a nonlinear transformation; after the score vector of the labels passes through the softmax layer, the class probabilities are computed and the final category is obtained.
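The last stage of claim 5, turning the label score vector into class probabilities and a category, can be sketched in plain Python as follows. This covers only the softmax step, not the convolution or long short-term memory layers; the label order in `LABELS` and the function names are hypothetical.

```python
import math

# Assumption: index 0 is "threat", index 1 is "no threat".
LABELS = ("threat", "no threat")


def softmax(scores):
    """Convert a score vector into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return the most probable label and the full probability vector."""
    probs = softmax(scores)
    return LABELS[probs.index(max(probs))], probs
```

For a score vector such as `[2.0, 0.5]`, the first label receives the larger probability and is returned as the category.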

6. The civil aviation microblog public opinion sentiment analysis method based on deep learning according to claim 1, characterized in that: in step (5), the method of vectorizing the microblog text is: looking up, in the word vector model, the word vector corresponding to each word of the microblog text, and then concatenating the word vectors into a sentence matrix.
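The lookup-and-concatenate step of claim 6 can be illustrated as below. The word vector model is represented by a plain dictionary, and mapping out-of-vocabulary words to a zero vector is an assumption, since the text does not say how such words are handled; the function name `sentence_matrix` is hypothetical.

```python
def sentence_matrix(tokens, word_vectors, dim):
    """Stack the word vector of each token into a len(tokens) x dim matrix."""
    zero = [0.0] * dim  # assumption: unknown words map to a zero vector
    return [list(word_vectors.get(token, zero)) for token in tokens]


if __name__ == "__main__":
    # Toy two-dimensional vectors, for illustration only.
    vectors = {"airport": [0.1, 0.2], "delay": [0.3, 0.4]}
    for row in sentence_matrix(["airport", "unknown", "delay"], vectors, dim=2):
        print(row)
```

Each row of the result corresponds to one word, in the order the words appear in the segmented microblog text.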

7. The civil aviation microblog public opinion sentiment analysis method based on deep learning according to claim 1, characterized in that: in step (6), the method of inputting the vectorized microblog texts obtained in step (5) and their corresponding labels into the combined deep learning classification model obtained in step (4) for training, the trained model being referred to as the threat/no-threat text classifier, is: the combined deep learning classification model uses the SGD optimization technique and continuously updates the training weights by minimizing the cross-entropy function; a dropout operation is applied to the parameters of the fully connected layer to prevent the model from overfitting; a suitable mini-batch size is selected according to the size of the training set; finally, the classification model obtained by training is saved and used to classify input microblog texts directly.

8. The civil aviation microblog public opinion sentiment analysis method based on deep learning according to claim 1, characterized in that: in step (7), the method of calculating the threat score is:

2) extracting the emoticons in the threatening microblog text and calculating the emoticon score of each microblog clause; then combining the above emotion score and the emoticon score by weighted summation to obtain the emotion score of each microblog clause;

4) combining the emotion score of the above microblog text and the behavior threat score by weighted summation to obtain the final threat score of the microblog text.

9. The civil aviation microblog public opinion sentiment analysis method based on deep learning according to claim 1, characterized in that: in step (8), the method of determining the threat level according to the threat score is: according to the threat score obtained in step (7), dividing the threatening microblog texts into high, medium, and low threat levels using a threshold method.