CN108536801A - A deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs - Google Patents
- Publication number
- CN108536801A (application number CN201810290094.9A)
- Authority
- CN
- China
- Prior art keywords
- microblogging
- threat
- text
- word
- deep learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Machine Translation (AREA)
Abstract
A deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs. It comprises the following steps: preprocessing and segmenting the text content of a microblog data set; training word vectors; building the combined deep learning network C-LSTM and training a classifier based on this network to decide whether a microblog text contains a threat to civil aviation security; and scoring the threatening microblog texts in finer detail to evaluate their threat level. The invention first trains a classifier with a deep learning method, which coarsely screens out subjective negative statements about civil aviation and discards objective statements such as news and statements of fact; it then uses civil aviation public opinion keywords and rules to calculate and grade the threat degree. This solves the problem in dictionary- and rule-based methods that objective statements containing civil aviation public opinion keywords are judged to have a high threat level, and the method has wider applicability and higher accuracy.
Description
Technical field
The invention belongs to the technical field of text sentiment analysis in natural language processing, and in particular relates to a deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs.
Background art
In recent years, bomb threats against flights and airports and false terrorist information have appeared frequently on the Internet. Some members of the public, out of dissatisfaction with society, post false threats, rumors and extreme language online. When facing terrorist threats, the civil aviation industry generally suffers more serious damage than other industries. Because microblogs spread quickly, are open, have wide influence and do not readily reveal the publisher's identity, they are a common channel for spreading terrorist information.
Sentiment analysis is the process of processing, analyzing and applying text that carries emotional color. It can extract user opinions and sentiment orientation from text data and has important practical value. The research methods and techniques of sentiment analysis usually depend on the research goal. In general, existing results are mostly applied to the product domain or the public opinion domain, for example the public's evaluation of and opinions on products or events. Sentiment analysis methods aimed at civil aviation terrorist information are extremely rare. By performing sentiment analysis on microblog texts related to civil aviation, microblogs that threaten civil aviation safety can be screened out, so that key users with criminal tendencies can be identified.
At present, Chinese text sentiment analysis methods fall mainly into two classes: methods based on semantic understanding and methods based on traditional machine learning. Applying either class to microblog sentiment analysis raises the following problems: 1. Methods based on semantic understanding build a benchmark lexicon of commendatory and derogatory words, define explicit rules and pattern-match the corpus; they are severely limited when processing microblog text, whose expressions are complex and irregular. 2. Methods based on traditional machine learning require complex feature engineering and consume a great deal of manual labor.
Summary of the invention
To solve the above problems, the object of the invention is to provide a deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs.
To achieve this object, the method provided by the invention comprises the following steps, carried out in order:
(1) Screen keywords related to civil aviation security public opinion out of a large amount of text from the Internet, and build a keyword library from these keywords and their corresponding threat intensity values;
(2) Take the microblog texts screened out according to the civil aviation security public opinion keywords, together with their labels, as the training set, and preprocess and segment the microblog texts in the training set; each microblog text consists of at least one microblog clause, and the labels are of two kinds, threat and no threat;
(3) Train word vectors on the segmented microblog texts obtained in step (2) to obtain a word vector model;
(4) Build a combined deep learning network from a convolutional neural network and a long short-term memory network, and add a fully connected layer and a softmax layer after it; together these form the combined deep learning classification model;
(5) Input the segmented microblog texts of the training set into the word vector model obtained in step (3) to vectorize the microblog texts;
(6) Input the vectorized microblog texts obtained in step (5) and their labels into the combined deep learning classification model, train the threat/no-threat text classifier in the combined deep learning network of the model, and save it;
(7) Preprocess and segment the microblog text to be analyzed by the method of step (2), vectorize it with the word vector model obtained in step (3), and input it into the threat/no-threat classifier obtained in step (6) for classification; finally, for the microblog texts judged to be threatening, further calculate a threat score according to the sentiment dictionary and rules;
(8) Determine the threat level from the threat score.
In step (1), the keywords are divided into two classes, place words and behavior words. Each behavior word has two attributes. The first attribute is the threat intensity value, which measures how strongly the word threatens civil aviation security and takes one of five intensities: 1, 3, 5, 7 or 9. The second attribute is the word type, of which there are two: the direct type, where the appearance of the word alone is enough to determine a threat to civil aviation, and the indirect type, where the word must appear together with a place word before a threat to civil aviation security can be determined.
In step (2), the microblog texts in the training set are preprocessed and segmented as follows: preprocessing removes noise from the microblog text, including web links, user nicknames added when forwarding or replying to a microblog, and special characters; the topic tags of non-threatening microblog texts are retained, as an important feature for distinguishing whether a microblog text containing civil aviation public opinion keywords is a subjective threatening statement or a news topic; a word segmentation tool is then used to segment the preprocessed microblog text.
In step (3), the word vectors are trained with the Skip_gram method of the word2vec algorithm, and the word vector model trained with this method is saved.
In step (4), the combined deep learning network of a convolutional neural network and a long short-term memory network is built, and the fully connected layer and softmax layer are added after it to form the combined deep learning classification model, as follows: convolution kernels of different sizes are convolved with the sentence matrix of the input layer; the feature values produced by kernels of the same size are concatenated in temporal order and used as the input of the long short-term memory network, which further extracts the contextual features of the microblog text; the fully connected layer applies a nonlinear transformation to obtain the score vector of the labels; the score vector then passes through the softmax layer, which computes the class probabilities and yields the final class.
In step (5), the microblog text is vectorized as follows: the word vector of each word of the microblog text is looked up in the word vector model, and the word vectors are then concatenated into a sentence matrix.
In step (6), the vectorized microblog texts obtained in step (5) and their labels are input into the combined deep learning classification model obtained in step (4) for training, and the trained model is called the threat/no-threat text classifier. The method is as follows: the combined deep learning classification model uses SGD optimization and continually updates the training weights by minimizing the cross-entropy function; dropout is applied to the parameters of the fully connected layer to prevent overfitting; a suitable mini-batch size is chosen according to the size of the training set; finally, the trained classification model is saved so that it can directly classify input microblog texts.
In step (7), the threat score is calculated as follows:
1) extract the emotion words from the threatening microblog text and calculate the emotion score of each microblog clause;
2) extract the emoticons from the threatening microblog text and calculate the emoticon score of each microblog clause; then take the weighted sum of the emotion score and the emoticon score to obtain the emotion value of each microblog clause;
3) calculate the behavior threat score of the microblog text according to the civil aviation security public opinion keyword library;
4) take the weighted sum of the emotion value and the behavior threat score of the microblog text to obtain the threat score of the microblog text.
In step (8), the threat level is determined from the threat score as follows: according to the threat score obtained in step (7), the threatening microblog texts are divided into high, medium and low threat levels by a threshold method.
The deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs provided by the invention has the following advantages: (1) the invention first uses a text classifier to coarsely screen out the threatening microblog texts and then uses the civil aviation public opinion keyword library and syntactic rules to calculate the threat score, which improves efficiency and accuracy. (2) The text classifier is trained with a neural network that combines a CNN and an LSTM, which avoids a large amount of manual feature engineering, makes text feature extraction more comprehensive and gives higher classification accuracy.
Brief description of the drawings
Fig. 1 is a flow chart of the deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs provided by the invention.
Fig. 2 shows the overall structure of the combined deep learning model of the invention.
Detailed description
The deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs provided by the invention is described in detail below with reference to the drawings and a specific embodiment.
As shown in Fig. 1 and Fig. 2, the method provided by the invention comprises the following steps, carried out in order:
(1) Screen keywords related to civil aviation security public opinion out of a large amount of text from the Internet, and build a keyword library from these keywords and their corresponding threat intensity values;
The keywords are divided into two classes, place words and behavior words. Place words include airport, runway, terminal, flight and so on; behavior words include bombing an aircraft, hijacking, occupying an aircraft (refusing to disembark), in-flight disturbance, fighting, protesting and smoking. Each behavior word has two attributes. The first attribute is the threat intensity value, which measures how strongly the word threatens civil aviation security and takes one of five intensities: 1, 3, 5, 7 or 9. The second attribute is the word type, of which there are two. The direct type covers words whose appearance alone is enough to determine a threat to civil aviation, such as bombing an aircraft, hijacking, occupying an aircraft and in-flight disturbance. The indirect type covers words that must appear together with a place word before a threat to civil aviation security can be determined, such as fighting, protesting and smoking. An indirect-type behavior word on its own is not sufficient to conclude that the text threatens civil aviation security.
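The keyword library described above can be pictured as a small lookup structure. The following is a minimal sketch, not the patent's implementation: the class name `BehaviorWord`, the helper `requires_place_word`, the English placeholder tokens and the intensity assigned to each word are all illustrative assumptions; only the five allowed intensities and the direct/indirect distinction come from the text.

```python
from dataclasses import dataclass

ALLOWED_INTENSITIES = (1, 3, 5, 7, 9)  # the five intensities named in the text


@dataclass(frozen=True)
class BehaviorWord:
    word: str
    intensity: int  # threat intensity value
    direct: bool    # True = direct type, False = indirect type

    def __post_init__(self):
        if self.intensity not in ALLOWED_INTENSITIES:
            raise ValueError(f"intensity must be one of {ALLOWED_INTENSITIES}")


# Hypothetical entries: tokens and intensity values are placeholders.
PLACE_WORDS = {"airport", "runway", "terminal", "flight"}
BEHAVIOR_WORDS = {
    bw.word: bw
    for bw in (
        BehaviorWord("bomb_aircraft", 9, True),
        BehaviorWord("hijack", 9, True),
        BehaviorWord("fight", 5, False),
        BehaviorWord("smoke", 3, False),
    )
}


def requires_place_word(word, library=BEHAVIOR_WORDS):
    """Indirect-type words only count when a place word co-occurs."""
    return not library[word].direct
```

Validating the intensity at construction time keeps malformed entries out of the library before any scoring is done.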
(2) Take the microblog texts screened out according to the civil aviation security public opinion keyword library, together with their labels, as the training set, and preprocess and segment the microblog texts in the training set; each microblog text consists of at least one microblog clause, and the labels are of two kinds, threat and no threat;
The microblog texts in the training set are preprocessed to remove noise that is unrelated to emotional expression, such as: 1) web links of the form "http://t.cn/ECj0WN", which carry no useful information and are removed during preprocessing; 2) user nicknames added when forwarding or replying to a microblog, special characters and the like, as in "Li Qiong: Reply to @Sketch Craftsman Old Wang: this is a matter for the asylum, the police won't deal with it", where the microblog user name after the @ symbol must be removed. Note that the topic tags of non-threatening texts are retained, as an important feature for training the threat/no-threat microblog text classifier. A word segmentation tool is then used to segment each microblog clause of the preprocessed microblog text into N words; the tool used is jieba, an open-source Python word segmenter. The labels of the microblog texts are annotated manually.
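The cleaning rules above can be sketched with regular expressions from the Python standard library. This is only an illustration under stated assumptions: the function name `clean_weibo`, the exact patterns and the set of punctuation that is kept are hypothetical choices, not taken from the patent.

```python
import re

# Assumed patterns: short links, "@name:" mentions and "#topic#" tags.
URL_RE = re.compile(r"https?://[A-Za-z0-9./?=&_%-]+")
MENTION_RE = re.compile(r"@[^\s:：@]+[:：]?")
HASHTAG_RE = re.compile(r"#[^#]+#")
# Everything except word characters, '#', whitespace and basic punctuation.
SPECIAL_RE = re.compile(r"[^\w#\s，。！？,.!?]")


def clean_weibo(text, keep_topics=True):
    """Remove links, @-mentions and special characters from one microblog."""
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    if not keep_topics:
        text = HASHTAG_RE.sub("", text)
    text = SPECIAL_RE.sub("", text)
    return re.sub(r"\s+", " ", text).strip()
```

The cleaned string would then be handed to the segmenter; with jieba, `jieba.lcut(text)` returns the list of tokens for one clause.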
(3) Train word vectors on the segmented microblog texts obtained in step (2) to obtain a word vector model;
When deep learning is used for microblog sentiment analysis, the computer cannot recognize Chinese characters directly, so the microblog text must be vectorized before it is fed into the combined deep learning network as training data. The word2vec algorithm captures context information while compressing the scale of the data. The word2vec tool provided by Google contains two language models, CBOW and Skip_gram, both of which consist of an input layer, a projection layer and an output layer. The CBOW model predicts the current word from its context; the Skip_gram model predicts the words within the context window from the current word. The invention uses the word2vec tool of the gensim Python toolkit with the Skip_gram model. The training window size is set to 5 and the word vector dimension to 300.
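To make the Skip_gram idea concrete, the sketch below enumerates the (current word, context word) training pairs that such a model learns from, using the window size of 5 given above. It only illustrates how the pairs are formed; it does not train vectors, and the function name `skipgram_pairs` is a hypothetical helper.

```python
def skipgram_pairs(tokens, window=5):
    """List (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs
```

In practice the training itself would be delegated to a library. Assuming gensim 4.x, the settings in the text correspond to `Word2Vec(sentences, vector_size=300, window=5, sg=1)`; older gensim releases name the dimension parameter `size` instead of `vector_size`.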
(4) Build a combined deep learning network from a convolutional neural network and a long short-term memory network, and add a fully connected layer and a softmax layer after it; together these form the combined deep learning classification model;
In the combined deep learning network (C-LSTM), the convolutional neural network (CNN) effectively captures the relations between words and abstracts local features, while the long short-term memory network (LSTM) learns long-range dependencies and captures syntactic features related to word order. The procedure is as follows: after the convolution kernels are convolved with the sentence matrix, the features produced by kernels of the same size are concatenated in temporal order and used as the input of the LSTM. The convolution window sizes are 3, 4 and 5, with 100 kernels of each size; the structure of C-LSTM is shown in Fig. 2. The fully connected layer applies a nonlinear transformation to the vector output by the combined deep learning network, using the ReLU function as the nonlinearity. The softmax layer maps the output of the combined deep learning network into the interval (0, 1) for classification; the activation function of this layer is sigmoid.
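The classification head described above (a fully connected layer with ReLU followed by a normalization into class probabilities) can be sketched in plain Python. This is a toy illustration under assumptions: the weights are supplied by the caller, the label indices (0 for no threat, 1 for threat) are hypothetical, and a standard softmax is used for the final normalization.

```python
import math


def relu(v):
    return [max(0.0, x) for x in v]


def dense(weights, bias, x):
    """Affine map: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    """Numerically stable softmax; outputs are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(features, w1, b1, w2, b2):
    """Fully connected layer with ReLU, then label scores, then probabilities."""
    hidden = relu(dense(w1, b1, features))
    probs = softmax(dense(w2, b2, hidden))
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs
```

For a two-class problem a softmax over two scores and a sigmoid over their difference give the same probabilities, which may be why the text mentions both.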
(5) Input the segmented microblog texts of the training set into the word vector model obtained in step (3) to vectorize the microblog texts;
Suppose a microblog clause contains $N$ words after segmentation and the word vector of each word has dimension $d$. The whole microblog clause is then represented as:

$$X = x_1 \oplus x_2 \oplus \dots \oplus x_N$$

where $\oplus$ denotes concatenation along the row-vector direction. Concatenating the word vectors therefore gives the sentence matrix $X \in \mathbb{R}^{d \times N}$, which serves as the input layer of the combined deep learning network.
(6) Input the vectorized microblog texts obtained in step (5) and their labels into the combined deep learning classification model, train the threat/no-threat text classifier in the combined deep learning network of the model, and save it;
C-LSTM can be divided into two parts, word feature extraction and syntactic feature extraction. Word feature extraction is performed by the convolutional layer of the CNN: convolution kernels of three different sizes are convolved in one dimension with the word vector matrix of the microblog text to extract features, and several groups of different feature maps are obtained after the convolution. The convolution operation can be expressed as:

$$c_{ij} = f(W_i \circ X_j + b_i)$$

where $c_{ij}$ denotes the feature generated by the $i$-th convolution kernel at the $j$-th word; $f$ is a nonlinear transformation function, for which the invention selects the ReLU function; $X_j$ denotes the matrix that starts at position $j$ and spans one convolution window; $x_j$ denotes the word vector of the $j$-th word of the sentence matrix $X$; $\circ$ denotes the matrix inner product; $W_i$ denotes the parameter matrix within the window $d_{win}$; $b_i$ denotes the bias of the $i$-th convolution kernel; $h$ denotes the number of parameters in the convolution kernel, one of the important hyperparameters of the CNN, and is set to 100 in this method. Depending on how the two ends of the input are handled, convolution windows are of two types, narrow and wide, and for an input of length $d_m$ the corresponding convolution results have dimensions $d_m - d_{win} + 1$ and $d_m + d_{win} - 1$ respectively; all convolutions in the invention are wide convolutions. If $n$ feature maps are taken for each kernel size, the total feature map for a given size can be expressed as:

$$W = [c_{1j}, c_{2j}, \dots, c_{nj}]$$

where $W$ is the new feature vector generated at position $j$ by the $n$ convolution kernels, that is, the values obtained by the individual convolution operations concatenated in order. These new feature vectors generated by the CNN capture high-level features of the words; concatenating them in order preserves the word order of the original sentence, and the result is used as the input of the LSTM.
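The convolution step and the reordering of feature maps into an LSTM input sequence can be sketched in plain Python. The sketch makes several assumptions that are not stated in the text: zero-padding is used for the wide convolution, which gives N + win - 1 output positions for N words under the usual convention, and the function names `conv1d_wide` and `feature_sequence` are hypothetical.

```python
def conv1d_wide(sentence, kernel, bias=0.0):
    """Wide 1-D convolution of one kernel over a sentence, followed by ReLU.

    sentence: list of N word vectors, each of dimension d
    kernel:   list of win weight vectors, each of dimension d
    """
    win, d = len(kernel), len(kernel[0])
    pad = [[0.0] * d for _ in range(win - 1)]  # assumed zero-padding at both ends
    padded = pad + sentence + pad
    out = []
    for j in range(len(padded) - win + 1):
        s = bias
        for k in range(win):
            s += sum(w * x for w, x in zip(kernel[k], padded[j + k]))
        out.append(max(0.0, s))  # ReLU
    return out


def feature_sequence(sentence, kernels):
    """Stack the maps of same-size kernels so that step j holds [c_1j, ..., c_nj]."""
    maps = [conv1d_wide(sentence, k) for k in kernels]
    return [list(step) for step in zip(*maps)]
```

Each element of the returned sequence is one time step for the LSTM, so the word order of the clause is preserved.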
An RNN is a recurrent neural network that can describe dynamic temporal behavior; its state is passed recursively through its own network, so it can accept a wider range of time-series inputs. A simple RNN, however, cannot cope with the exponential explosion or vanishing of weights that arises from this recursion, and therefore has difficulty capturing long-term temporal dependencies. The LSTM replaces the hidden layer of the RNN with a memory unit (memory cell), which solves this problem well. The words of a clause enter the network in order and are treated as inputs at different time steps; the LSTM can capture long-term temporal dependencies, that is, the syntactic relations between words that are far apart, and thereby obtains syntactic features.
An LSTM consists of an input gate $i_t$, an output gate $o_t$, a forget gate $f_t$ and a memory cell $c_t$. The gates determine how the current memory cell $c_t$ and the current hidden-layer vector $h_t$ are updated; $h_t$ contains the outputs of all LSTM cells. Formally, the LSTM is described by:

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{5}$$
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{6}$$
$$q_t = \tanh(W_q \cdot [h_{t-1}, x_t] + b_q) \tag{7}$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{8}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot q_t \tag{9}$$
$$h_t = o_t \odot \tanh(c_t) \tag{10}$$

where $h_{t-1}$ is the hidden-layer vector at the previous time step; $x_t$ is the input vector at the current time step; $\sigma$ is the logistic sigmoid function, whose output lies in $[0, 1]$; $\tanh$ has an output in $[-1, 1]$; and $\odot$ denotes element-wise multiplication.
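One LSTM time step built from the gate equations (5) to (8) can be sketched in plain Python. The memory cell and hidden state updates used here, c_t = f_t * c_{t-1} + i_t * q_t and h_t = o_t * tanh(c_t) with element-wise products, are the standard LSTM formulation and are an assumption as far as this text is concerned. The parameter layout (a dict mapping gate names to a weight matrix and bias) is hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(weights, v):
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def lstm_step(params, h_prev, c_prev, x_t):
    """One LSTM step; params maps 'i', 'f', 'q', 'o' to (weight matrix, bias)."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]

    def gate(name, activation):
        weights, bias = params[name]
        return [activation(s + b) for s, b in zip(matvec(weights, z), bias)]

    i_t = gate("i", sigmoid)    # input gate, eq. (5)
    f_t = gate("f", sigmoid)    # forget gate, eq. (6)
    q_t = gate("q", math.tanh)  # candidate values, eq. (7)
    o_t = gate("o", sigmoid)    # output gate, eq. (8)
    # Standard cell and hidden-state updates (assumed, see lead-in).
    c_t = [f * c + i * q for f, c, i, q in zip(f_t, c_prev, i_t, q_t)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

Running the step once per feature vector of the clause, and carrying `h_t` and `c_t` forward, yields the sequence of hidden states that the classifier head consumes.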
(7) Preprocess and segment the microblog text to be analyzed by the method of step (2), vectorize it with the word vector model of step (3), and input it into the threat/no-threat classifier described above for classification; finally, for the microblog texts judged to be threatening, further calculate a threat score according to the sentiment dictionary and rules;
1) Extract the emotion words from the threatening microblog text and calculate the emotion score of each microblog clause;
The words obtained by segmenting a threatening microblog text are matched against the sentiment dictionary, and a word found in the dictionary is selected as an emotion word. If a word does not appear in the sentiment dictionary, semantic similarity is used to decide whether it is an emotion word. To reduce computation, only nouns, verbs and adjectives are kept as candidate emotion words. The invention uses the HowNet semantic similarity algorithm as the baseline algorithm, which works well for measuring the similarity of two words. Specifically, for two words $w_1$ and $w_2$, if $w_1$ has $n$ senses or concepts $x_1, x_2, \dots, x_n$ and $w_2$ has $m$ senses or concepts $y_1, y_2, \dots, y_m$, the similarity of $w_1$ and $w_2$ is defined as the maximum of the similarities between their senses or concepts, that is:

$$Sim(w_1, w_2) = \max_{i = 1, \dots, n;\ j = 1, \dots, m} Sim(x_i, y_j) \tag{11}$$

The similarity of two sememes is calculated as:

$$Sim(x_1, y_2) = \frac{\lambda}{d(x_1, y_2) + \lambda} \tag{12}$$

where $\lambda$ is an adjustable positive parameter and $d(x_1, y_2)$ is the distance between sememe $x_1$ and sememe $y_2$ in the hierarchy tree.
For any word, the sentiment orientation value can be obtained from its similarity to the seed words of the sentiment dictionary. The calculation is as follows: the similarity between word $w$ and each seed word of the positive sentiment dictionary is calculated by formulas (11) and (12), giving the similarity of the word to the positive seed words; the similarity between $w$ and each seed word of the negative sentiment dictionary is calculated in the same way, giving its similarity to the negative seed words; the sentiment orientation value of $w$ is finally obtained from the difference between the two by formula (13), in which $p_i$ denotes a positive seed word and $n_j$ a negative seed word. The sentiment orientation value $S_w$ lies in $(-1, 1)$. A threshold $T$ is set and compared with the calculated $S_w$ to decide whether $w$ is an emotion word: when $|S_w| > T$, $w$ is judged to be an emotion word. The intensity of the emotion word is set to $10 S_w$, so as to be consistent with the intensity scale of the sentiment dictionary.
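The decision rule for out-of-dictionary words can be sketched as follows. Two assumptions should be stated plainly, because the corresponding formulas are not reproduced in this text: sememe similarity is taken to have the λ/(d + λ) form commonly used with HowNet, and the sentiment orientation value is taken to be the mean similarity to the positive seeds minus the mean similarity to the negative seeds. The default λ of 1.6 and all function names are hypothetical.

```python
def sememe_similarity(distance, lam=1.6):
    """Assumed HowNet-style similarity from tree distance; lam is a placeholder."""
    return lam / (distance + lam)


def orientation(word, pos_seeds, neg_seeds, sim):
    """Assumed form: mean similarity to positive seeds minus mean to negative seeds."""
    pos = sum(sim(word, p) for p in pos_seeds) / len(pos_seeds)
    neg = sum(sim(word, n) for n in neg_seeds) / len(neg_seeds)
    return pos - neg


def emotion_intensity(s_w, threshold):
    """Return 10 * S_w when |S_w| > T, otherwise None (not an emotion word)."""
    return 10 * s_w if abs(s_w) > threshold else None
```

Here `sim` is any word-similarity function with values in [0, 1], such as one built on the sense-level maximum described above.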
If a microblog clause contains an emotion word that is preceded by a negation word from the negation dictionary or a modifier from the modifier dictionary, the emotion score $S_a$ of the clause is calculated according to the following cases:
a) Degree adverb + emotion word: the intensity of the emotion word changes with the intensity of the degree adverb, and the emotion score is:

$$S_a = M_a \cdot p_s \cdot p_a \tag{14}$$

b) Negation word + emotion word: the polarity of the emotion word changes with the number $n$ of negation words, and the emotion score is:

$$S_a = (-1)^n \cdot p_s \cdot p_a \tag{15}$$

c) Degree adverb + negation word + emotion word: the polarity of the emotion word is reversed and its intensity changes with the intensity of the degree adverb, and the emotion score is:

$$S_a = (-1) \cdot M_a \cdot p_s \cdot p_a \tag{16}$$

d) Negation word + degree adverb + emotion word: because the negation precedes the degree adverb, the polarity of the emotion word is reversed but its intensity is weaker than under direct negation; a first weight factor $z_1 = 0.5$ is introduced, and the emotion score is:

$$S_a = (-1) \cdot M_a \cdot p_s \cdot p_a \cdot z_1 \tag{17}$$

where $p_s$ denotes the intensity of the emotion word, $p_a$ its polarity and $M_a$ the intensity of the degree adverb.
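The four cases (14) to (17) can be collected into one function. This is a minimal sketch: the function name, the argument names and the fallback for an unmodified emotion word (intensity times polarity) are assumptions; the formulas and the weight factor of 0.5 come from the text.

```python
Z1 = 0.5  # first weight factor from case d)


def clause_emotion_score(ps, pa, degree=None, negations=0, negation_first=False):
    """ps: emotion word intensity, pa: polarity (+1 or -1), degree: adverb intensity."""
    if degree is None:
        if negations == 0:
            return ps * pa                        # assumed default: no modifier
        return (-1) ** negations * ps * pa        # (15) negation + emotion word
    if negations == 0:
        return degree * ps * pa                   # (14) degree adverb + emotion word
    if negation_first:
        return (-1) * degree * ps * pa * Z1       # (17) negation + adverb + emotion word
    return (-1) * degree * ps * pa                # (16) adverb + negation + emotion word
```

For example, a negative emotion word of intensity 5 preceded by a degree adverb of intensity 2 scores -10, and the same phrase with a negation word in front of the adverb scores 5.0.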
If a microblog clause contains a conjunction from the conjunction dictionary, it is a compound sentence, and the shift of sentiment polarity between its parts must be taken into account. The emotion score of the clause is calculated according to the following cases:
a) Adversative relation: when words that reverse the meaning, such as "but" or "however", appear in the clause, the polarity of the preceding clause changes and the overall polarity of the two clauses is the same as that of the following clause; a second weight factor $z_2 = -1$ is introduced, and the emotion score is:

$$Sen = z_2 \, Sen_1 + Sen_2 \tag{18}$$

b) Progressive relation: the two clauses have the same polarity and the intensity is strengthened; a third weight factor $z_3 = 1.5$ is introduced, and the emotion score is:

$$Sen = z_3 \, (Sen_1 + Sen_2) \tag{19}$$

c) Concessive relation: the polarity of the following clause is reversed and the polarity of the whole sentence is the same as that of the preceding clause; a fourth weight factor $z_4 = -1$ is introduced, and the emotion score is:

$$Sen = Sen_1 + z_4 \, Sen_2 \tag{20}$$

where $Sen_1$ denotes the emotion score of the preceding clause and $Sen_2$ the emotion score of the following clause;
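The three compound-sentence rules (18) to (20) translate directly into code. In this sketch the relation labels ("adversative", "progressive", "concessive") and the function name are hypothetical; the weight factors are the ones given in the text.

```python
Z2, Z3, Z4 = -1, 1.5, -1  # second, third and fourth weight factors


def compound_score(relation, sen1, sen2):
    """Combine the scores of the preceding (sen1) and following (sen2) clauses."""
    if relation == "adversative":   # (18)
        return Z2 * sen1 + sen2
    if relation == "progressive":   # (19)
        return Z3 * (sen1 + sen2)
    if relation == "concessive":    # (20)
        return sen1 + Z4 * sen2
    raise ValueError(f"unknown relation: {relation}")
```

Which relation applies would be decided by looking the conjunction up in the conjunction dictionary before this function is called.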
2) Extract the emoticons from the threatening microblog text and calculate the emoticon score of each microblog clause;
Sina Weibo provides a large number of emoticons, and using emoticons in a microblog expresses the sentiment orientation of the text vividly. Treating the emoticons as a weighted term of the emotion value helps to correct the sentiment orientation judgment for the whole microblog text. According to the emoticon dictionary, the polarity and intensity of every emoticon in the microblog text are looked up and the number of occurrences of each emoticon is recorded. Let $N_i$ be the number of occurrences of the $i$-th emoticon, $e_i$ its intensity and $p_i$ its polarity; the emoticon score $score_{emo}$ of the microblog text is then calculated from these quantities by formula (21).
The weighted sum of the emotion score and the emoticon score of the microblog text gives the emotion value of each microblog text:

$$S_1 = \alpha \cdot score_{emo} + \beta \cdot score_{text} \tag{22}$$

where $\alpha$ and $\beta$ are adjustable weights with values in $(0, 1)$ and $\alpha + \beta = 1$; they can be chosen by cross-validation on a test set as the values that maximize the probability of correct classification. $score_{text}$ is the emotion score of the microblog text, the average of the emotion scores of its clauses.
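The weighted combination in formula (22) can be sketched as follows. The emoticon score is assumed here to be the sum of count times intensity times polarity over all emoticons, because formula (21) is not reproduced in this text; any normalization in the original is unknown. The lexicon layout, the bracketed emoticon tokens and the default weight are hypothetical.

```python
def emoticon_score(counts, lexicon):
    """Assumed form of the emoticon score: sum of N_i * e_i * p_i.

    counts:  {emoticon: number of occurrences}
    lexicon: {emoticon: (intensity, polarity)}
    """
    return sum(
        n * lexicon[emo][0] * lexicon[emo][1]
        for emo, n in counts.items()
        if emo in lexicon
    )


def emotion_value(clause_scores, emo_score, alpha=0.3):
    """S1 = alpha * score_emo + beta * score_text, with beta = 1 - alpha."""
    beta = 1 - alpha
    score_text = sum(clause_scores) / len(clause_scores)  # mean clause score
    return alpha * emo_score + beta * score_text
```

In the text the weights are tuned by cross-validation, so `alpha` would be selected from a grid rather than fixed.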
3) Calculate the behavior threat score of the microblog text according to the keyword library related to civil aviation security public opinion built in step (1);
As described above, the keywords of the keyword library are divided into two classes, place words and behavior words. Place words include airport, runway, terminal, flight and so on; behavior words include bombing an aircraft, hijacking, occupying an aircraft (refusing to disembark), in-flight disturbance, fighting, protesting and smoking. Each behavior word has two attributes. The first attribute is the threat intensity value, which measures how strongly the word threatens civil aviation security and takes one of five intensities, 1, 3, 5, 7 or 9, consistent with the intensity scale of the emotion words. The second attribute is the word type, of which there are two. The direct type covers words whose appearance alone is enough to determine a threat to civil aviation, such as bombing an aircraft, hijacking, occupying an aircraft and in-flight disturbance. The indirect type covers words that must appear together with a place word before a threat to civil aviation security can be determined, such as fighting, protesting and smoking. An indirect-type behavior word on its own is not sufficient to conclude that the text threatens civil aviation security.
The behavior threat score $S_2\langle L, B\rangle$ is calculated as follows: the behavior word $B$ is looked up in the microblog text and its type is determined from the keyword library. When the behavior word is of the direct type, $S_2\langle L, B\rangle$ takes the intensity of the behavior word. When the behavior word is of the indirect type, the microblog text is checked for a place word: if one is present, $S_2\langle L, B\rangle$ takes the intensity of the behavior word; if not, $S_2\langle L, B\rangle$ is 0.
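The rule for the behavior threat score can be sketched as a single function over the segmented tokens. The text does not say what happens when several behavior words occur in one microblog; taking the largest qualifying intensity is an assumption of this sketch, as are the function name, the dictionary layout and the placeholder tokens.

```python
def behavior_threat_score(tokens, place_words, behavior_words):
    """behavior_words maps word -> (intensity, is_direct).

    Direct-type words always count; indirect-type words count only
    when a place word occurs in the same text. Returns 0 otherwise.
    """
    has_place = any(t in place_words for t in tokens)
    best = 0
    for t in tokens:
        if t in behavior_words:
            intensity, is_direct = behavior_words[t]
            if is_direct or has_place:
                best = max(best, intensity)  # assumed: keep the strongest word
    return best
```

With `places = {"airport"}` and `words = {"hijack": (9, True), "fight": (5, False)}`, the tokens `["fight"]` score 0 while `["fight", "airport"]` score 5.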
4) Take the weighted sum of the emotion value and the behavior threat score of the microblog text to obtain the threat score of the microblog text;
The threat score is calculated by formula (23), in which $D$ denotes the threat score, whose range is $[-10, 10]$; $S_1$ denotes the emotion value of the microblog text; and $S_2\langle L, B\rangle$ is the behavior threat score, where $L$ denotes a place word and $B$ a behavior word;
(8) Determine the threat level from the threat score;
The threatening microblog texts are divided into high, medium and low threat levels by a threshold method, as follows:
1) low threat when -4.5 ≤ D ≤ 0;
2) medium threat when -7 ≤ D < -4.5;
3) high threat when -10 ≤ D < -7.
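The three thresholds above map directly onto a small grading function. In this sketch the function name and the string labels are hypothetical, and returning `None` for scores outside [-10, 0] is an assumption, since the text assigns no level to such values.

```python
def threat_level(d):
    """Map a threat score D to 'low', 'medium' or 'high'."""
    if not -10 <= d <= 0:
        return None  # assumed: no level defined outside [-10, 0]
    if d >= -4.5:
        return "low"
    if d >= -7:
        return "medium"
    return "high"
```

The boundary values follow the inequalities in the text: -4.5 is still low and -7 is still medium.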
Table 1 lists the threat scores and threat levels obtained after some microblog texts were processed by the method of the invention. As the table shows, the method of the invention can accurately determine whether a microblog text threatens civil aviation safety.
Table 1 Threat determination results for microblog texts
Claims (9)
1. A deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs, characterized in that it comprises the following steps carried out in order:
(1) screening keywords related to civil aviation security public opinion out of a large amount of text from the network, and building a keyword library from these keywords and their corresponding threat intensity values;
(2) taking the microblog texts screened out according to the civil aviation security public-opinion keywords, together with their corresponding labels, as a training set, and performing preprocessing and word segmentation on the microblog texts in the training set; each microblog text consists of at least one microblog clause, and the labels are of two kinds, threat and no threat;
(3) training word vectors on the segmented microblog texts obtained in step (2) to obtain a word vector model;
(4) constructing a combined deep learning network from a convolutional neural network and a long short-term memory network, and adding a fully connected layer and a softmax layer after the combined deep learning network, which together form a combined deep learning classification model;
(5) inputting the segmented microblog texts of the training set into the word vector model obtained in step (3) to vectorize the microblog texts;
(6) inputting the vectorized microblog texts obtained in step (5) and their corresponding labels into the combined deep learning classification model, training the combined deep learning network of the model into a threat/no-threat text classifier, and saving it;
(7) after preprocessing and segmenting the microblog text to be analyzed by the method of step (2), vectorizing it with the word vector model obtained in step (3), then inputting it into the threat/no-threat classifier obtained in step (6) for classification, and finally, for microblog texts judged to be threatening, further calculating a threat-degree score according to the sentiment dictionary and rules;
(8) determining the threat level from the above threat-degree score.
2. The deep-learning-based civil aviation microblog public-opinion sentiment analysis method according to claim 1, characterized in that: in step (1), the keywords are divided into two classes, place words and behavior words; a behavior word has two attributes: the first attribute is the threat intensity value, which measures the degree to which the word threatens civil aviation security and takes one of five intensities, 1, 3, 5, 7 or 9; the second attribute is the word type, of which there are two: direct-type, meaning the occurrence of the word alone is enough to determine a threat to civil aviation; and indirect-type, meaning the word must appear together with a place word before it can be determined whether there is a threat to civil aviation security.
3. The deep-learning-based civil aviation microblog public-opinion sentiment analysis method according to claim 1, characterized in that: in step (2), the method of preprocessing and segmenting the microblog texts in the training set is: the preprocessing includes removing noise information from the microblog text, including web links, the user nicknames attached when forwarding or replying to a microblog, and special characters, while retaining the topic tags in non-threatening microblog texts as an important feature for distinguishing whether a microblog text containing civil aviation public-opinion keywords is subjective threatening speech or a news topic; the preprocessed microblog text is then segmented with a word segmentation tool.
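As an illustration of the preprocessing described in claim 3, the following Python sketch strips links, forwarded or replied-to nicknames and special characters while keeping `#` topic markers. The regular expressions are assumptions rather than the patent's actual rules: the text does not define which characters count as special, and this sketch keeps topic markers for every text rather than only for non-threatening ones. Word segmentation is left out because no specific segmentation tool is named.

```python
import re

# Assumed noise patterns (not specified in the source text).
URL_RE = re.compile(r"https?://\S+")
# "@nickname", optionally preceded by the "//" forward marker
# and followed by an ASCII colon.
MENTION_RE = re.compile(r"(?://)?@[\w-]+:?")
# Everything except word characters, whitespace and the '#' topic marker.
SPECIAL_RE = re.compile(r"[^\w\s#]")


def preprocess(text):
    """Remove links, nicknames and special characters; keep #topic# tags."""
    text = URL_RE.sub(" ", text)      # links first: they also contain "//"
    text = MENTION_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return " ".join(text.split())     # collapse runs of whitespace


print(preprocess("//@user_1: flight delayed again!! http://t.cn/abc #airport#"))
```

Links are removed before nicknames so that the `//` inside a URL is not mistaken for a forward marker.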
4. The deep-learning-based civil aviation microblog public-opinion sentiment analysis method according to claim 1, characterized in that: in step (3), the word vector training uses the Skip-gram method of the word2vec algorithm, and the word vector model obtained by training with this method is saved.
5. The deep-learning-based civil aviation microblog public-opinion sentiment analysis method according to claim 1, characterized in that: in step (4), the method of constructing the combined deep learning network from a convolutional neural network and a long short-term memory network, and adding a fully connected layer and a softmax layer after the combined deep learning network to form the combined deep learning classification model, is: different convolution kernels are convolved with the sentence matrix in the input layer; the feature values obtained under convolution kernels of the same size are concatenated in temporal order and used as the input of the long short-term memory network, which further extracts the contextual relationship features of the microblog text; the fully connected layer produces the score vector of the labels after a nonlinear transformation; once the score vector of the labels passes through the softmax layer, the class probabilities can be calculated, finally giving the predicted class.
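The last stage of claim 5, turning the label score vector into class probabilities, can be illustrated with a plain-Python softmax. This is a minimal sketch, not the patent's implementation; the label order in `LABELS` is a hypothetical choice.

```python
import math

# Hypothetical label order for the two-class score vector.
LABELS = ("threat", "no threat")


def softmax(scores):
    """Convert a score vector into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(scores):
    """Return the label with the highest softmax probability."""
    probs = softmax(scores)
    return LABELS[probs.index(max(probs))]
```

Because softmax is monotonic, the predicted class is simply the label with the largest score; the probabilities matter mainly for the cross-entropy loss used in training.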
6. The deep-learning-based civil aviation microblog public-opinion sentiment analysis method according to claim 1, characterized in that: in step (5), the method of vectorizing the microblog text is: the word vector corresponding to each word of the microblog text is looked up in the word vector model, and the word vectors are then concatenated into a sentence matrix.
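The lookup-and-stack step of claim 6 can be sketched as follows. The dictionary standing in for the word vector model, the zero vector for unknown words, and the truncation or padding to a fixed `max_len` are all assumptions made for illustration; the text only states that word vectors are looked up and concatenated into a sentence matrix.

```python
def sentence_matrix(tokens, word_vectors, dim, max_len):
    """Stack the word vectors of a segmented text into a max_len x dim matrix.

    word_vectors: dict mapping a word to a list of `dim` floats
                  (a stand-in for a trained word vector model).
    Unknown words map to a zero vector; short texts are zero-padded and
    long texts are truncated (both are assumptions).
    """
    zero = [0.0] * dim
    rows = [list(word_vectors.get(t, zero)) for t in tokens[:max_len]]
    rows += [list(zero) for _ in range(max_len - len(rows))]
    return rows
```

A fixed number of rows lets texts of different lengths be batched together before the convolution step.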
7. The deep-learning-based civil aviation microblog public-opinion sentiment analysis method according to claim 1, characterized in that: in step (6), the method of inputting the vectorized microblog texts obtained in step (5) and their corresponding labels into the combined deep learning classification model obtained in step (4) for training, the trained model being called the threat/no-threat text classifier, is: the combined deep learning classification model uses the SGD optimization technique and continually updates the training weights by minimizing the cross-entropy function; dropout is applied to the parameters of the fully connected layer to prevent the model from overfitting; a suitable mini-batch size is selected according to the scale of the training set; finally, the classification model obtained by training is saved for directly classifying input microblog texts.
8. The deep-learning-based civil aviation microblog public-opinion sentiment analysis method according to claim 1, characterized in that: in step (7), the method of calculating the threat-degree score is:
1) extracting the emotion words in the threatening microblog text and calculating the emotion score of each microblog clause;
2) extracting the emoticons in the threatening microblog text and calculating the emoticon score of each microblog clause; then taking a weighted sum of the above emotion score and emoticon score to obtain the emotion score value of each microblog clause;
3) calculating the behavior threat score of the microblog text according to the civil aviation security public-opinion keyword library;
4) taking a weighted sum of the emotion score value of the above microblog text and the behavior threat score to obtain the threat-degree score of the microblog text.
9. The deep-learning-based civil aviation microblog public-opinion sentiment analysis method according to claim 1, characterized in that: in step (8), the method of determining the threat level from the threat-degree score is: according to the threat-degree score obtained in step (7), the threatening microblog texts are divided into high, medium and low threat levels using a threshold method.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810290094.9A CN108536801A (en) | 2018-04-03 | 2018-04-03 | A deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108536801A (en) | 2018-09-14 |
Family
ID=63482373
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810290094.9A Pending CN108536801A (en) | 2018-04-03 | 2018-04-03 | A deep-learning-based sentiment analysis method for civil aviation security public opinion on microblogs |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108536801A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105491013A (en) * | 2015-11-20 | 2016-04-13 | 电子科技大学 | Multi-domain network security situation perception model and method based on SDN |
CN105719291A (en) * | 2016-01-20 | 2016-06-29 | 江苏省沙钢钢铁研究院有限公司 | Surface defect image classification system having selectable types |
WO2016187472A1 (en) * | 2015-05-21 | 2016-11-24 | Baidu Usa Llc | Multilingual image question answering |
CN106598944A (en) * | 2016-11-25 | 2017-04-26 | 中国民航大学 | Civil aviation security public opinion emotion analysis method |
CN107562784A (en) * | 2017-07-25 | 2018-01-09 | 同济大学 | Short text classification method based on ResLCNN models |
Non-Patent Citations (2)
Title |
---|
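PING HAN et al.: "A Topic-Independent Hybrid Approach for Sentiment Analysis of Chinese Microblog", 2016 IEEE 17th International Conference on Information Reuse and Integration (IRI) * |
HAN Ping et al.: "Design and Implementation of an Early Warning System for Civil Aviation Terrorist Threat Information", Journal of Civil Aviation University of China * |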
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109325125B (en) * | 2018-10-08 | 2022-06-14 | 中山大学 | Social network rumor detection method based on CNN optimization |
CN109325125A (en) * | 2018-10-08 | 2019-02-12 | 中山大学 | A kind of social networks rumour method based on CNN optimization |
CN109582785A (en) * | 2018-10-31 | 2019-04-05 | 天津大学 | Emergency event public sentiment evolution analysis method based on text vector and machine learning |
CN110083825A (en) * | 2019-03-21 | 2019-08-02 | 昆明理工大学 | A kind of Laotian sentiment analysis method based on GRU model |
CN110232109A (en) * | 2019-05-17 | 2019-09-13 | 深圳市兴海物联科技有限公司 | A kind of Internet public opinion analysis method and system |
CN110321562A (en) * | 2019-06-28 | 2019-10-11 | 广州探迹科技有限公司 | A kind of short text matching process and device based on BERT |
CN110321562B (en) * | 2019-06-28 | 2023-06-02 | 广州探迹科技有限公司 | Short text matching method and device based on BERT |
CN110377739A (en) * | 2019-07-19 | 2019-10-25 | 出门问问(苏州)信息科技有限公司 | Text sentiment classification method, readable storage medium storing program for executing and electronic equipment |
CN111104526A (en) * | 2019-11-21 | 2020-05-05 | 新华智云科技有限公司 | Financial label extraction method and system based on keyword semantics |
CN111523319A (en) * | 2020-04-10 | 2020-08-11 | 广东海洋大学 | Microblog emotion analysis method based on scene LSTM structure network |
CN111523319B (en) * | 2020-04-10 | 2023-06-30 | 广东海洋大学 | Microblog emotion analysis method based on scene LSTM structure network |
CN111767398A (en) * | 2020-06-30 | 2020-10-13 | 国网新疆电力有限公司电力科学研究院 | Secondary equipment fault short text data classification method based on convolutional neural network |
CN111967494A (en) * | 2020-07-01 | 2020-11-20 | 北京工业大学 | Multi-source heterogeneous data analysis method for security protection of large-scale activity public security system guard |
CN111967494B (en) * | 2020-07-01 | 2024-03-26 | 北京工业大学 | Multi-source heterogeneous data analysis method for guard security of large movable public security system |
CN112329974A (en) * | 2020-09-03 | 2021-02-05 | 中国人民公安大学 | LSTM-RNN-based civil aviation security event behavior subject identification and prediction method and system |
CN112329974B (en) * | 2020-09-03 | 2024-02-27 | 中国人民公安大学 | LSTM-RNN-based civil aviation security event behavior subject identification and prediction method and system |
CN113688240A (en) * | 2021-08-25 | 2021-11-23 | 南京中孚信息技术有限公司 | Threat element extraction method, device, equipment and storage medium |
CN113688240B (en) * | 2021-08-25 | 2024-01-30 | 南京中孚信息技术有限公司 | Threat element extraction method, threat element extraction device, threat element extraction equipment and storage medium |
CN113792118A (en) * | 2021-09-08 | 2021-12-14 | 浙江力石科技股份有限公司 | Satisfaction improving system and method based on scenic spot evaluation |
CN115982473A (en) * | 2023-03-21 | 2023-04-18 | 环球数科集团有限公司 | AIGC-based public opinion analysis arrangement system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20180914 |