CN108108354B - Microblog user gender prediction method based on deep learning - Google Patents

Microblog user gender prediction method based on deep learning

Info

Publication number
CN108108354B
Authority
CN
China
Prior art keywords
microblog
layer
word
term memory
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711380014.0A
Other languages
Chinese (zh)
Other versions
CN108108354A (en)
Inventor
张春霞
冉昇
武嘉玉
冯丽霞
牛振东
黄达友
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Publication of CN108108354A publication Critical patent/CN108108354A/en
Application granted granted Critical
Publication of CN108108354B publication Critical patent/CN108108354B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Primary Health Care (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to a microblog user gender prediction method based on deep learning, and belongs to the field of Web mining and intelligent information processing. The prediction method comprises the following steps: collecting microblog information; preprocessing microblog texts; constructing word vectors of microblog text words; constructing feature vectors of microblog text sentences with a convolutional neural network-based microblog text representation method; and predicting or classifying the gender of microblog users with a method based on a long short-term memory network model. The convolutional neural network-based microblog text representation method requires no manually constructed microblog text features and realizes semantic modeling of microblog texts. The microblog user gender prediction method based on the long short-term memory network can extract semantic sequence dependency features from microblog texts. The method accurately extracts microblog text features, improves the performance of microblog user gender identification, and has wide application prospects in the fields of information recommendation and product marketing.

Description

Microblog user gender prediction method based on deep learning
Technical Field
The invention relates to the field of Web mining and intelligent information processing, in particular to a microblog user gender prediction method based on deep learning.
Background
Microblog user gender prediction is an important research topic in constructing user identity portraits. User identity portrait construction refers to identifying various identity attributes of a user, including gender, age, education level and the like. User identity portrait construction technology can be widely applied in fields such as computer forensics, network public opinion monitoring, and commodity marketing.
Currently, user gender prediction mainly adopts classification methods to identify the gender of a user. Mikros, in "Authorship Attribution and Gender Identification in Greek Blogs" (Methods and Applications of Quantitative Linguistics, 2012), constructed features based on high-frequency words and characters and then used a support vector machine classifier to identify the gender of blog authors. Ansari et al., in "Gender Classification of Blog Authors" (Special Issue of International Journal of Sustainable Development and Green Economics, 2013), extracted part-of-speech features and then used a Bayesian classifier to identify the gender of blog authors. Wang Jing et al., in "Research on gender classification methods for Chinese microblog users" (Journal of Chinese Information Processing, 2014), first built two classifiers based on user information and microblog texts respectively, and then integrated the two classifiers with a Bayesian rule to identify the gender of microblog authors.
Conventional microblog user gender identification methods mainly suffer from the following problems: microblog text features must be constructed manually, and existing microblog text representations mainly adopt a vector space model or a bag-of-words model, which yields sparse, high-dimensional feature vectors.
To address these problems, an efficient microblog user gender identification technology is urgently needed to support the construction of microblog user identity portraits.
Disclosure of Invention
The invention aims to provide a microblog user gender prediction method based on deep learning, aiming at solving the problems in the microblog user gender recognition method. A microblog user gender prediction method based on deep learning comprises a microblog text representation method based on a convolutional neural network and a microblog user gender prediction or classification method based on a Long Short Term Memory network (LSTM). The microblog text representation method based on the convolutional neural network can automatically extract microblog text features. According to the microblog user gender prediction method based on the long-short term memory network, the semantic sequence dependency relationship in the microblog text can be obtained, and therefore the gender of the microblog user can be predicted more accurately.
The purpose of the invention is realized by the following technical scheme.
A microblog user gender prediction method based on deep learning comprises the following steps:
step 1, microblog information acquisition: acquiring a microblog text of a user on a microblog platform by using a web crawler, and storing the microblog text in a computer;
the method comprises the steps of collecting microblog texts of a plurality of microblog users with different genders, and storing the microblog text of each user into an extensible markup language file named by a user ID. In addition, the gender attributes of all microblog users are stored in a file.
Step 2, microblog text preprocessing: performing text extraction, lemmatization, and stop word and punctuation filtering on the microblog texts collected in step 1;
The extensible markup language files collected in step 1 are preprocessed to obtain the microblog text of each microblog user. In addition, the microblog text is lemmatized with the NLTK (Natural Language Toolkit), and stop words and punctuation marks in the microblog text are filtered out.
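As an illustration of this preprocessing step, a minimal Python sketch using NLTK is given below; it assumes English-language tweets, assumes the standard NLTK resources (punkt, wordnet, stopwords) have been downloaded, and uses an illustrative function name that is not part of the patent:

import string

from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Assumes nltk.download('punkt'), nltk.download('wordnet') and
# nltk.download('stopwords') have been run beforehand.
LEMMATIZER = WordNetLemmatizer()
STOP_WORDS = set(stopwords.words('english'))

def preprocess_microblog_text(text):
    """Tokenize a microblog sentence, lemmatize it, and drop stop words and punctuation."""
    tokens = word_tokenize(text.lower())
    kept = []
    for token in tokens:
        if token in STOP_WORDS:
            continue
        if all(ch in string.punctuation for ch in token):
            continue
        kept.append(LEMMATIZER.lemmatize(token))
    return kept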
Step 3, constructing word vectors of microblog text words: and taking the microblog text as input, and mapping all words in the microblog text sentences into word vectors through an input mapping layer of a microblog text representation model convolutional neural network.
For each word of a sentence in the microblog text, a k-dimensional vector of the current word is obtained with a word vector model, where k is a positive integer. The word vector model is either Word2Vec from Google or GloVe from Stanford University. If the current word is not contained in the word vector set constructed by the word vector model, a k-dimensional vector of the current word is generated by a random method.
For a sentence w_1 w_2 w_3 … w_m of the microblog text, where w_i represents a word, 1 ≤ i ≤ m, and m is a positive integer, let the word vector of w_1 be <x_11, x_12, …, x_1n> with n a positive integer, the word vector of w_2 be <x_21, x_22, …, x_2n>, …, and the word vector of w_m be <x_m1, x_m2, …, x_mn>. The initial feature vector of the sentence is then constructed as:

[ x_11  x_12  …  x_1n ]
[ x_21  x_22  …  x_2n ]
[  …     …        …   ]
[ x_m1  x_m2  …  x_mn ]
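This input mapping can be sketched as follows, assuming pre-trained vectors (Word2Vec or GloVe) loaded through gensim's KeyedVectors; the 100-dimensional setting, the uniform range of the random fallback, and the file path are illustrative assumptions, not values fixed by the patent:

import numpy as np
from gensim.models import KeyedVectors

def sentence_matrix(words, vectors, k=100, seed=0):
    """Map a tokenized sentence of m words to an m x k matrix of word vectors.

    Words missing from the pre-trained vocabulary receive a random k-dimensional
    vector, as described in step 3.
    """
    rng = np.random.default_rng(seed)
    rows = []
    for w in words:
        if w in vectors:                    # known word: use its pre-trained vector
            rows.append(np.asarray(vectors[w], dtype=np.float32))
        else:                               # out-of-vocabulary word: random vector
            rows.append(rng.uniform(-0.25, 0.25, size=k).astype(np.float32))
    return np.vstack(rows)

# vectors = KeyedVectors.load_word2vec_format('word_vectors.txt')  # hypothetical path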
and 4, constructing a feature vector of the microblog text sentence by adopting a convolutional neural network-based microblog text representation method.
The convolutional neural network includes the input mapping layer of step 3, as well as convolutional and pooling layers.
And 4.1, performing a convolution operation on the word vectors generated in step 3 through a convolution layer of the convolutional neural network of the microblog text representation model to generate a feature map (Feature Map) of the microblog text sentence.
For a convolution kernel with a window length of h, the convolution operation is performed on h consecutive words, i.e.

c_i = f(w · v_{i:i+h-1} + b)

where w and b are parameters, v_{i:i+h-1} denotes the concatenation of the word vectors from the i-th word to the (i+h-1)-th word, and the function f denotes the activation function.
For example, the activation function may be the ReLU function f(x) = max{0, x}; that is, f(x) is the greater of 0 and x, where x is the input to the activation function.
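A plain numpy sketch of this convolution for a single kernel is given below; the variable names mirror the formula above and are illustrative:

import numpy as np

def relu(x):
    """ReLU activation: f(x) = max{0, x}."""
    return np.maximum(0.0, x)

def convolve_sentence(X, w, b, h):
    """Apply one convolution kernel of window length h to a sentence matrix.

    X is the m x k matrix of word vectors (one row per word), w is a kernel of
    length h * k, and b is a scalar bias.  Returns (c_1, ..., c_{m-h+1}) with
    c_i = relu(w . v_{i:i+h-1} + b).
    """
    m, k = X.shape
    features = []
    for i in range(m - h + 1):
        v = X[i:i + h].reshape(-1)    # concatenation of h consecutive word vectors
        features.append(relu(np.dot(w, v) + b))
    return np.array(features)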
Step 4.2, extracting the salient features of the microblog text sentences through the pooling layer of the convolutional neural network in the microblog text representation model, and generating the feature vectors of the microblog text sentences;
The pooling layer realizes feature selection for the feature vectors of the microblog text sentences through a pooling operation. The pooling operation integrates a max pooling operation and an average pooling operation.
Let the feature map of the microblog text sentence generated in step 4.1 be:

[ y_11  y_12  …  y_1s ]
[ y_21  y_22  …  y_2s ]
[  …     …        …   ]
[ y_r1  y_r2  …  y_rs ]

where y_ij represents the result of the convolution operation of the j-th convolution kernel on the word vectors from the i-th word to the (i+h-1)-th word, h is the window length of the convolution kernel, and r and s are positive integers. The average pooling operation is:

( (1/s) Σ_{j=1..s} y_1j, (1/s) Σ_{j=1..s} y_2j, …, (1/s) Σ_{j=1..s} y_rj )

The max pooling operation is:

( max{y_11, y_12, …, y_1s}, max{y_21, y_22, …, y_2s}, …, max{y_r1, y_r2, …, y_rs} )
The integrated result of the max pooling operation and the average pooling operation is:

( max{y_11, …, y_1s}, …, max{y_r1, …, y_rs}, (1/s) Σ_{j=1..s} y_1j, …, (1/s) Σ_{j=1..s} y_rj )
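The combined pooling step can be sketched as below; following the description above, max pooling and average pooling are applied row-wise to the feature map, and reading "integrated" as the concatenation of the two pooled vectors is an assumption of this sketch:

import numpy as np

def integrated_pooling(Y):
    """Row-wise max pooling and average pooling of a feature map, concatenated.

    Y is the r x s feature map with Y[i, j] the output of the j-th kernel at the
    i-th word window.  Returns a vector of length 2r.
    """
    max_pooled = Y.max(axis=1)     # max pooling over each row
    avg_pooled = Y.mean(axis=1)    # average pooling over each row
    return np.concatenate([max_pooled, avg_pooled])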
and 5, predicting the gender of the microblog user by adopting a method based on a long-short term memory network model.
The long-short term memory network model comprises a sequence generation layer, a bidirectional long-short term memory network layer and a classification layer.
And 5.1, using the feature vector of the microblog text sentence generated in the step 3 as input, regenerating the feature vector of the microblog text sentence through a sequence generation layer in a gender prediction method based on the long-short term memory network model, and using the feature vector as the input of the bidirectional long-short term memory network layer in the step 5.2.
The sequence generation layer sequentially comprises a first convolution layer, a second pooling layer, a third convolution layer and a fourth pooling layer. (1) In the first convolutional layer, convolution is performed using 64 convolution kernels with a window length of 2 and a step size of 1. (2) In the second pooling layer, pooling is performed using a pooling window having a window length of 2 and a step size of 1. (3) In the third convolutional layer, 64 convolutional kernels with a window length of 3 and a step size of 1 are used for convolution. (4) And in the fourth pooling layer, pooling is carried out by using a pooling window with the window length of 3 and the step length of 1, so as to generate the feature vector of the microblog text sentence.
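The sequence generation layer can be sketched in Keras as follows; mapping "window length" to kernel_size/pool_size, the use of max pooling (the pooling type is not specified above), and the activation and input shape are assumptions of this sketch:

from tensorflow import keras
from tensorflow.keras import layers

def build_sequence_generation_layer(max_words=200, embedding_dim=100):
    """Conv(64, window 2) -> Pool(window 2) -> Conv(64, window 3) -> Pool(window 3), all stride 1."""
    inputs = keras.Input(shape=(max_words, embedding_dim))
    x = layers.Conv1D(64, kernel_size=2, strides=1, activation='relu')(inputs)
    x = layers.MaxPooling1D(pool_size=2, strides=1)(x)
    x = layers.Conv1D(64, kernel_size=3, strides=1, activation='relu')(x)
    x = layers.MaxPooling1D(pool_size=3, strides=1)(x)
    return keras.Model(inputs, x, name='sequence_generation_layer')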
And 5.2, taking the feature vector of the microblog text sentence generated in the step 5.1 as the input of a bidirectional long-short term memory network layer in the gender prediction method based on the long-short term memory network model, and regenerating the feature vector of the microblog text sentence by the bidirectional long-short term memory network layer by capturing the semantic sequence dependency relationship in the microblog text sentence.
The input of the bidirectional long short-term memory network layer is the feature vector sequence v_1, v_2, …, v_t of all sentences of the microblog text generated in step 5.1, where t is a positive integer. The feature vector sequence v_1, v_2, …, v_t can be regarded as a time series, with vector v_i as the input state of time step i; the bidirectional long short-term memory network layer generates an output state for each time step.
If the feature vector sequence v_1, v_2, …, v_n is input into a long short-term memory network layer in the order v_1, v_2, …, v_n, that layer is called a forward long short-term memory network layer. If the feature vector sequence v_1, v_2, …, v_n is input into a long short-term memory network layer in the order v_n, v_{n-1}, …, v_2, v_1, that layer is called a reverse long short-term memory network. Let the feature vector sequence v_1, v_2, …, v_n be input into the first long short-term memory network layer in the order v_1, v_2, …, v_n, and let the output vector sequence be w_1, w_2, …, w_n. The vector sequence w_1, w_2, …, w_n is then input into the second long short-term memory network layer in the order w_n, w_{n-1}, …, w_2, w_1; the two layers together are called a bidirectional long short-term memory network.
Further, for the output vector sequence of the second long short-term memory network layer in the bidirectional long short-term memory network, the output state at the last time step is taken as the output state of the bidirectional long short-term memory network.
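The two-layer arrangement described above can be sketched in Keras as follows: a first LSTM reads the sequence forward and returns its full output sequence, and a second LSTM consumes that sequence in reverse order, with its final output state used as the sentence feature. The 32 units match the example value of q in step 5.3, and using go_backwards for the reversed second pass is an implementation assumption:

from tensorflow import keras
from tensorflow.keras import layers

def build_bidirectional_lstm_layer(steps, features, units=32):
    """First LSTM reads v_1..v_n forward; second LSTM reads its outputs w_n..w_1 in reverse."""
    inputs = keras.Input(shape=(steps, features))
    w = layers.LSTM(units, return_sequences=True)(inputs)   # forward layer, outputs w_1..w_n
    u = layers.LSTM(units, go_backwards=True)(w)            # reverse layer, returns its last output state
    return keras.Model(inputs, u, name='bidirectional_lstm_layer')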
And 5.3, combining the feature vectors of the microblog text sentences constructed in the step 4 and the step 5.2.
Let the feature vector of the microblog text sentence constructed in step 4 be <a_1, a_2, …, a_p>, where p is a positive integer and is a parameter set by the bidirectional long short-term memory network layer. For example, p may take the value 70. Let the feature vector of the microblog text sentence constructed in step 5.2 be <b_1, b_2, …, b_q>, where q is a positive integer. For example, q may take the value 32. The two feature vectors are merged into <a_1, a_2, …, a_p, b_1, b_2, …, b_q>, which serves as the input vector of the classification layer in step 5.4.
And 5.4, entering a classification layer in the gender prediction model based on the long-term and short-term memory network. The classification layer is composed of a fully connected neural network. And the classification layer inputs the feature vectors of the microblog text sentences constructed in the step 5.3 and outputs the feature vectors as gender classifications of microblog users, wherein the gender classifications include male and female categories.
The fully-connected neural network is formed by connecting a plurality of neurons. A single neuron receives a vector as input, computes a weighted sum, and applies an activation function to obtain its output.
The activation function is the ReLU function f(x) = max{0, x}; that is, f(x) is the greater of 0 and x, where x is the input to the activation function.
The fully-connected neural network can be constructed by connecting a plurality of neurons in layers, so that the output of each neuron in the upper layer is used as an input of each neuron in the lower layer.
For predicting the gender of the microblog user, the output vector of the fully-connected neural network is <p_1, p_2>, where p_1 represents the probability that the predicted result is female and p_2 the probability that it is male. If p_1 > p_2, the gender prediction result for the microblog user is female; otherwise, it is male.
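The classification layer can be sketched as follows: the merged feature vector <a_1, …, a_p, b_1, …, b_q> is passed through a fully-connected ReLU layer and a two-way output giving <p_1, p_2>; the hidden-layer size and the use of softmax for the two output probabilities are assumptions of this sketch:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def build_classification_layer(input_dim):
    """Fully-connected classifier producing <p1 (female), p2 (male)>."""
    inputs = keras.Input(shape=(input_dim,))
    hidden = layers.Dense(64, activation='relu')(inputs)      # hidden size 64 is an assumption
    outputs = layers.Dense(2, activation='softmax')(hidden)   # <p1, p2>
    return keras.Model(inputs, outputs, name='classification_layer')

def predict_gender(model, feature_vector):
    """Return 'female' if p1 > p2, otherwise 'male'."""
    p1, p2 = model.predict(feature_vector[np.newaxis, :], verbose=0)[0]
    return 'female' if p1 > p2 else 'male'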
Thus, the whole process of the method is completed.
Advantageous effects
The invention addresses the problems of existing microblog user gender identification methods: microblog text features must be constructed manually, and conventional microblog text representations, which mainly adopt a vector space model or a bag-of-words model, suffer from sparse, high-dimensional feature vectors. To this end, a microblog user gender prediction method based on deep learning is provided. The method comprises a microblog text representation method based on a convolutional neural network and a microblog user gender prediction method based on a long short-term memory network, and improves the performance of microblog user gender identification. The concrete aspects are as follows:
(1) The microblog text representation method based on the convolutional neural network automatically constructs the feature vectors of the words and sentences of microblog texts without manually constructing microblog text features, and realizes semantic modeling of microblog texts.
(2) According to the microblog user gender prediction method based on the long short-term memory network, on one hand, the long short-term memory network can extract semantic sequence dependency relations in microblog text sentences and capture implicit features of microblog texts. On the other hand, compared with the traditional recurrent neural network, the long short-term memory network effectively avoids the vanishing gradient problem, in which gradient values become very small during backpropagation when the input sequence is too long, making it difficult for the model to converge. Therefore, the microblog user gender prediction method based on the long short-term memory network improves the performance of microblog user gender identification.
(3) According to the invention, the microblog text representation based on the convolutional neural network and the microblog text representation based on the long short-term memory network are combined into the feature representation of the microblog text, so that both the local features and the semantic dependency features of the microblog text are extracted. In addition, the fully-connected neural network, which has strong fitting capability, is used as the classification layer, effectively solving the microblog user gender prediction problem.
Drawings
Fig. 1 is a schematic flow chart of a microblog user gender prediction method based on deep learning according to an embodiment of the invention.
Detailed Description
According to the technical scheme, the following describes a preferred embodiment of the invention in detail with reference to the accompanying drawings and examples.
Example 1
Step 1, microblog information acquisition: acquiring a microblog text of a user on a microblog platform by using a web crawler, and storing the microblog text in a computer;
the method comprises the steps of collecting microblog texts of a plurality of microblog users with different genders, and storing the microblog text of each user into an extensible markup language file named by a user ID. In addition, the gender attributes of all microblog users are stored in a file.
For example, for the microblog platform Twitter, a web crawler is used to collect the Twitter text of a microblog user, namely the microblog text. The microblog text of the user with ID "1a4a60942a15426c9a7ec3764e7d0ede" is saved to the file "1a4a60942a15426c9a7ec3764e7d0ede.xml" in the form:
[The XML file layout is shown as an image in the original publication.]
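Since the XML layout itself appears only as an image in the source, the sketch below uses a hypothetical structure consistent with the surrounding description (a single user element carrying the user ID, with one child element per collected tweet); the element and attribute names are assumptions, not the patent's schema:

import xml.etree.ElementTree as ET

def save_user_tweets(user_id, tweets, path):
    """Write one user's collected microblog texts to an XML file named by the user ID."""
    root = ET.Element('user', id=user_id)
    for tweet in tweets:
        ET.SubElement(root, 'text').text = tweet
    ET.ElementTree(root).write(path, encoding='utf-8', xml_declaration=True)

# Example call with the user ID from the text above (hypothetical tweet list):
# save_user_tweets('1a4a60942a15426c9a7ec3764e7d0ede', tweets,
#                  '1a4a60942a15426c9a7ec3764e7d0ede.xml')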
Step 2, microblog text preprocessing: performing text extraction, lemmatization, and stop word and punctuation filtering on the microblog texts collected in step 1;
The extensible markup language files collected in step 1 are preprocessed to obtain the microblog text of each microblog user. In addition, the microblog text is lemmatized with the NLTK (Natural Language Toolkit), and stop words and punctuation marks in the microblog text are filtered out.
For example, the file "1a4a60942a15426c9a7ec3764e7d0ede.xml" collected in step 1 is preprocessed to obtain the microblog text "@Michael_J_Parry can't com" on the microblog text but the microblog girl in me a bit registration of reacted". Lemmatization is then performed on the microblog text, with the following result: "Michael_J_Parry can not comment on the transpositional bit in me be bit restriction on 'f' nd".
Step 3, constructing word vectors of microblog text words: and taking the microblog text as input, and mapping all words in the microblog text sentences into word vectors through an input mapping layer of a microblog text representation model convolutional neural network.
For each word of a sentence in the microblog text, a k-dimensional vector of the current word is obtained with a word vector model, where k is a positive integer. The word vector model is either Word2Vec from Google or GloVe from Stanford University. If the current word is not contained in the word vector set constructed by the word vector model, a k-dimensional vector of the current word is generated by a random method.
For a sentence w_1 w_2 w_3 … w_m of the microblog text, where w_i represents a word, 1 ≤ i ≤ m, and m is a positive integer, let the word vector of w_1 be <x_11, x_12, …, x_1n> with n a positive integer, the word vector of w_2 be <x_21, x_22, …, x_2n>, …, and the word vector of w_m be <x_m1, x_m2, …, x_mn>. The initial feature vector of the sentence is then constructed as:

[ x_11  x_12  …  x_1n ]
[ x_21  x_22  …  x_2n ]
[  …     …        …   ]
[ x_m1  x_m2  …  x_mn ]
for example, for The sentence "The quick brown fox jumps over The lazy dog", 100-dimensional vectors for each word are generated by a word vector model and a stochastic method, and are stacked to form a 100 × 9 matrix.
For example, the 100-dimensional word vector for the word "dog" is of the form: <0.50779, -1.0274, 0.48136, -0.09417, 0.44837, -0.52291, 0.51498, -0.038927, 0.35867, -0.065994, -0.82882, 0.76179, -3.803, -0.010576, 0.21654, 0.59712, 0.37424, -0.022629, -0.010331, -0.33966, …, 0.67659, -0.071224, 0.17458, -0.033406, 0.73152>.
And 4, constructing a feature vector of the microblog text sentence by adopting a convolutional neural network-based microblog text representation method.
The convolutional neural network includes the input mapping layer of step 3, as well as convolutional and pooling layers.
And 4.1, performing a convolution operation on the word vectors generated in step 3 through a convolution layer of the convolutional neural network of the microblog text representation model to generate a feature map (Feature Map) of the microblog text sentence.
For a convolution kernel with a window length of h, the convolution operation is performed on h consecutive words, i.e.

c_i = f(w · v_{i:i+h-1} + b)

where w and b are parameters, v_{i:i+h-1} denotes the concatenation of the word vectors from the i-th word to the (i+h-1)-th word, and the function f denotes the activation function.
For example, the activation function may be the ReLU function f(x) = max{0, x}; that is, f(x) is the greater of 0 and x, where x is the input to the activation function.
For example, a convolution kernel of 3 × 100 means that the window length of the convolution kernel is 3 and the convolution operation is performed on word vectors of dimension 100. Assuming that the maximum number of words in a sentence is 200, and selecting 32 such 3 × 100 convolution kernels with a step size of 1, a feature map with dimensions of 32 × 198 can be generated, expressed as:

( y_ij ), 1 ≤ i ≤ 198, 1 ≤ j ≤ 32

where y_ij represents the result of the convolution operation of the j-th convolution kernel on the word vectors from the i-th word to the (i+2)-th word.
Step 4.2, extracting the salient features of the microblog text sentences through the pooling layer of the convolutional neural network in the microblog text representation model, and generating the feature vectors of the microblog text sentences;
The pooling layer realizes feature selection for the feature vectors of the microblog text sentences through a pooling operation. The pooling operation integrates a max pooling operation and an average pooling operation.
Let the feature map of the microblog text sentence generated in step 4.1 be:

[ y_11  y_12  …  y_1s ]
[ y_21  y_22  …  y_2s ]
[  …     …        …   ]
[ y_r1  y_r2  …  y_rs ]

where y_ij represents the result of the convolution operation of the j-th convolution kernel on the word vectors from the i-th word to the (i+h-1)-th word, h is the window length of the convolution kernel, and r and s are positive integers. The average pooling operation is:

( (1/s) Σ_{j=1..s} y_1j, (1/s) Σ_{j=1..s} y_2j, …, (1/s) Σ_{j=1..s} y_rj )

The max pooling operation is:

( max{y_11, y_12, …, y_1s}, max{y_21, y_22, …, y_2s}, …, max{y_r1, y_r2, …, y_rs} )
The integrated result of the max pooling operation and the average pooling operation is:

( max{y_11, …, y_1s}, …, max{y_r1, …, y_rs}, (1/s) Σ_{j=1..s} y_1j, …, (1/s) Σ_{j=1..s} y_rj )
and 5, predicting the gender of the microblog user by adopting a method based on a long-short term memory network model.
The long-short term memory network model comprises a sequence generation layer, a bidirectional long-short term memory network layer and a classification layer.
And 5.1, using the feature vector of the microblog text sentence generated in the step 3 as input, regenerating the feature vector of the microblog text sentence through a sequence generation layer in a gender prediction method based on the long-short term memory network model, and using the feature vector as the input of the bidirectional long-short term memory network layer in the step 5.2.
The sequence generation layer sequentially comprises a first convolution layer, a second pooling layer, a third convolution layer and a fourth pooling layer. (1) In the first convolutional layer, convolution is performed using 64 convolution kernels with a window length of 2 and a step size of 1. (2) In the second pooling layer, pooling is performed using a pooling window having a window length of 2 and a step size of 1. (3) In the third convolutional layer, 64 convolutional kernels with a window length of 3 and a step size of 1 are used for convolution. (4) And in the fourth pooling layer, pooling is carried out by using a pooling window with the window length of 3 and the step length of 1, so as to generate the feature vector of the microblog text sentence.
And 5.2, taking the feature vector of the microblog text sentence generated in the step 5.1 as the input of a bidirectional long-short term memory network layer in the gender prediction method based on the long-short term memory network model, and regenerating the feature vector of the microblog text sentence by the bidirectional long-short term memory network layer by capturing the semantic sequence dependency relationship in the microblog text sentence.
The input of the bidirectional long short-term memory network layer is the feature vector sequence v_1, v_2, …, v_t of all sentences of the microblog text generated in step 5.1, where t is a positive integer. The feature vector sequence v_1, v_2, …, v_t can be regarded as a time series, with vector v_i as the input state of time step i; the bidirectional long short-term memory network layer generates an output state for each time step.
If the feature vector sequence v_1, v_2, …, v_n is input into a long short-term memory network layer in the order v_1, v_2, …, v_n, that layer is called a forward long short-term memory network layer. If the feature vector sequence v_1, v_2, …, v_n is input into a long short-term memory network layer in the order v_n, v_{n-1}, …, v_2, v_1, that layer is called a reverse long short-term memory network. Let the feature vector sequence v_1, v_2, …, v_n be input into the first long short-term memory network layer in the order v_1, v_2, …, v_n, and let the output vector sequence be w_1, w_2, …, w_n. The vector sequence w_1, w_2, …, w_n is then input into the second long short-term memory network layer in the order w_n, w_{n-1}, …, w_2, w_1; the two layers together are called a bidirectional long short-term memory network.
Further, for the output vector sequence of the second long short-term memory network layer in the bidirectional long short-term memory network, the output state at the last time step is taken as the output state of the bidirectional long short-term memory network.
And 5.3, combining the feature vectors of the microblog text sentences constructed in the step 4 and the step 5.2.
Let the feature vector of the microblog text sentence constructed in step 4 be <a_1, a_2, …, a_p>, where p is a positive integer and is a parameter set by the bidirectional long short-term memory network layer. For example, p may take the value 70. Let the feature vector of the microblog text sentence constructed in step 5.2 be <b_1, b_2, …, b_q>, where q is a positive integer. For example, q may take the value 32. The two feature vectors are merged into <a_1, a_2, …, a_p, b_1, b_2, …, b_q>, which serves as the input vector of the classification layer in step 5.4.
And 5.4, entering a classification layer in the gender prediction model based on the long-term and short-term memory network. The classification layer is composed of a fully connected neural network. And the classification layer inputs the feature vectors of the microblog text sentences constructed in the step 5.3 and outputs the feature vectors as gender classifications of microblog users, wherein the gender classifications include male and female categories.
The fully-connected neural network is formed by connecting a plurality of neurons. A single neuron receives a vector as input, computes a weighted sum, and applies an activation function to obtain its output.
The activation function may be the ReLU function f(x) = max{0, x}; that is, f(x) is the greater of 0 and x, where x is the input to the activation function.
The fully-connected neural network can be constructed by connecting a plurality of neurons in layers, so that the output of each neuron in the upper layer is used as an input of each neuron in the lower layer.
For predicting the gender of the microblog user, the output vector of the fully-connected neural network is <p_1, p_2>, where p_1 represents the probability that the predicted result is female and p_2 the probability that it is male. If p_1 > p_2, the gender prediction result for the microblog user is female; otherwise, it is male.
Thus, the whole process of the method is completed.
To illustrate the gender prediction performance for microblog users, two methods are compared on the same training set and test set under the same conditions. The first method combines the microblog text representation based on the convolutional neural network with a microblog user gender prediction method based on logistic regression. The second method is the microblog user gender prediction method based on deep learning of the invention. The evaluation index is accuracy (Accuracy), computed as:

Accuracy = N_1 / N_2

where N_1 is the number of microblog users whose gender is correctly identified, and N_2 is the total number of microblog users whose gender is identified.
The microblog user gender prediction results are as follows: the accuracy of the first method is about 63%, and that of the method of the invention is about 71%. The experiments show the effectiveness of the proposed microblog user gender prediction method based on deep learning.
While the foregoing describes the preferred embodiment of the present invention, the invention is not limited to the embodiment and the drawings disclosed herein. Equivalents and modifications made without departing from the spirit of the disclosure are considered to be within the scope of the invention.

Claims (6)

1. A microblog user gender prediction method based on deep learning is characterized by comprising the following steps: the method comprises the following steps:
step 1, microblog information acquisition: aiming at a Twitter webpage, collecting Twitter texts of microblog users, namely microblog texts by using a web crawler, and storing the microblog texts into a local computer;
step 2, microblog text preprocessing: performing text extraction, lemmatization, and stop word and punctuation filtering on the microblog texts acquired in the microblog information acquisition step 1;
step 3, vectorization representation of microblog text words: the method comprises the following steps of taking a microblog text as an input, mapping all words in a microblog text sentence into word vectors through an input mapping layer of a microblog text representation model convolutional neural network, and specifically comprises the following steps:
for each word of a sentence in the microblog text, acquiring a k-dimensional vector of the current word by using a word vector model; if the current word is not contained in the word vector set constructed by the word vector model, generating a k-dimensional vector of the current word by a random method;
step 4, constructing feature vector representation of the microblog text sentence by adopting a convolutional neural network-based microblog text representation method, which specifically comprises the following steps:
step 4.1, carrying out convolution operation on the word vectors generated in the step 3 through a microblog text representation model convolution neural network to generate a feature map representation of a microblog text sentence;
step 4.2, extracting significant features of microblog text sentences through a pooling layer of a microblog text representation model convolutional neural network, and generating feature vector representations of the microblog text sentences;
step 5, adopting a gender classification model based on a long-short term memory network to predict the gender of the microblog user, specifically comprising the following steps:
step 5.1, using the feature vector representation of the microblog text sentences generated in the step 3 as input, and regenerating the feature vector representation of the microblog text sentences by adopting a sequence generation layer in a gender classification model based on a long-short term memory network as input of a bidirectional long-short term memory network layer in the step 5.2;
the long-short term memory network model comprises a sequence generation layer, a bidirectional long-short term memory network layer and a classification layer; the sequence generation layer sequentially comprises a convolution layer, a pooling layer, a convolution layer and a pooling layer;
step 5.2, representing the feature vectors of the microblog text sentences generated in the step 5.1 as input of a bidirectional long-short term memory network layer in the gender classification model based on the long-short term memory network, wherein the bidirectional long-short term memory network layer constructs the feature vectors of the microblog text sentences by capturing semantic sequence dependency relations in the microblog text sentences;
step 5.3, combining the feature vectors of the microblog text sentences constructed in the step 4 and the step 5.2;
step 5.4, entering a classification layer in the gender classification model based on the long-term and short-term memory network, wherein the classification layer is formed by a fully-connected neural network;
the classification layer inputs the feature vectors of the microblog text sentences constructed in the step 5.3 and outputs the feature vectors as gender classifications of microblog users, wherein the gender classifications include male and female categories;
the fully-connected neural network is formed by connecting a plurality of neural elements of the neural network, and a single neural element receives a vector as input, sums and applies an activation function to obtain the output of the single neural element;
the fully-connected neural network can be constructed by connecting a plurality of neurons in a layered manner so that the output of each neuron on the upper layer is used as the input of each neuron on the lower layer;
for predicting the gender of the microblog user, the output vector of the fully-connected neural network is (p_0, p_1), where p_0 represents the probability that the predicted result is female and p_1 represents the probability that the predicted result is male.
2. The microblog user gender prediction method based on deep learning according to claim 1, characterized by comprising the following steps: the step 1 is realized by the following processes:
the method comprises the steps of collecting microblog texts of a plurality of microblog users with different genders, storing the microblog text of each user into an extensible markup language file named by a user ID, and simultaneously storing gender attributes of all the microblog users into one file.
3. The microblog user gender prediction method based on deep learning according to claim 2, characterized in that: the step 2 is realized by the following processes:
preprocessing the extensible markup language file acquired in the step 1 to obtain a microblog text of each microblog user;
in addition, performing lemmatization on the microblog text with the NLTK tool, and filtering out stop words and punctuation marks in the microblog text;
wherein NLTK denotes the Natural Language Toolkit.
4. The microblog user gender prediction method based on deep learning according to claim 3, wherein the microblog user gender prediction method comprises the following steps: k in step 3 is a positive integer; the word vector model includes Word2Vec of Google or GloVe of Stanford University;
for a sentence w_1, w_2, w_3, …, w_m of the microblog text, where w_i represents a word, let the word vector of w_1 be (x_11, x_12, …, x_1n), the word vector of w_2 be (x_21, x_22, …, x_2n), …, and the word vector of w_m be (x_m1, x_m2, …, x_mn); a vector representation of the sentence is then constructed as:

[ x_11  x_12  …  x_1n ]
[ x_21  x_22  …  x_2n ]
[  …     …        …   ]
[ x_m1  x_m2  …  x_mn ]

wherein the value range of the subscript i in w_i is 1 ≤ i ≤ n.
5. The microblog user gender prediction method based on deep learning according to claim 4, wherein the microblog user gender prediction method comprises the following steps: step 4.1 is specifically: for a convolution kernel with a window length of h, the convolution operation is performed on h consecutive words, i.e.

c_i = f(w · v_{i:i+h-1} + b)

where w and b are parameters, v_{i:i+h-1} denotes the concatenation of the word vectors from the i-th word to the (i+h-1)-th word, and the function f denotes the activation function;
in step 4.2, the pooling layer realizes feature selection for the feature vectors of the microblog text sentences through a pooling operation, and the pooling operation integrates a max pooling operation and an average pooling operation;
let the feature map of the microblog text sentence generated in step 4.1 be:

[ y_11  y_12  …  y_1s ]
[ y_21  y_22  …  y_2s ]
[  …     …        …   ]
[ y_r1  y_r2  …  y_rs ]

where y_ij represents the result of the convolution operation of the j-th convolution kernel on the word vectors from the i-th word to the (i+h-1)-th word, and h is the window length of the convolution kernel; the average pooling operation is:

( (1/s) Σ_{j=1..s} y_1j, (1/s) Σ_{j=1..s} y_2j, …, (1/s) Σ_{j=1..s} y_rj )

the max pooling operation is:

( max{y_11, y_12, …, y_1s}, max{y_21, y_22, …, y_2s}, …, max{y_r1, y_r2, …, y_rs} )

and the integrated result of the max pooling operation and the average pooling operation is:
( max{y_11, …, y_1s}, …, max{y_r1, …, y_rs}, (1/s) Σ_{j=1..s} y_1j, …, (1/s) Σ_{j=1..s} y_rj )
6. the microblog user gender prediction method based on deep learning according to claim 5, wherein the microblog user gender prediction method comprises the following steps:
the sequence generation layer in the step 5.1 sequentially comprises a first convolution layer, a second pooling layer, a third convolution layer and a fourth pooling layer;
(1) in the first convolution layer, performing convolution by using 64 convolution kernels with the window length of 2 and the step length of 1;
(2) in the second layer of the pooling layer, pooling is carried out by using a pooling window with the window length of 2 and the step length of 1;
(3) in the third convolutional layer, performing convolution by using 64 convolution kernels with the window length of 3 and the step length of 1;
(4) in the fourth pooling layer, pooling is carried out by using a pooling window with the window length of 3 and the step length of 1, and a feature vector representation of a microblog text sentence is generated;
the input of the bidirectional long short-term memory network layer in step 5.2 is the feature vector sequence v_1, v_2, …, v_n of all sentences of the microblog text generated in step 5.1; the feature vector sequence v_1, v_2, …, v_n can be regarded as a time series, with vector v_i as the input state of time step i, and the bidirectional long short-term memory network layer generates an output state for each time step;
if the feature vector sequence v_1, v_2, …, v_n is input into a long short-term memory network layer in the order v_1, v_2, …, v_n, that layer is called a forward long short-term memory network layer;
if the feature vector sequence v_1, v_2, …, v_n is input into a long short-term memory network layer in the order v_n, v_{n-1}, …, v_2, v_1, that layer is called a reverse long short-term memory network;
if the feature vector sequence v_1, v_2, …, v_n is input into the first long short-term memory network layer in the order v_1, v_2, …, v_n, let the output vector sequence be t_1, t_2, …, t_n;
further, the vector sequence t_1, t_2, …, t_n is input into the second long short-term memory network layer in the order t_n, t_{n-1}, …, t_2, t_1, and the output vector sequence is set as u_1, u_2, …, u_n; this arrangement is called a bidirectional long short-term memory network;
further, with the bidirectional long short-term memory network, for the output vector sequence of the second long short-term memory network layer, the output state u_n at the last time step is taken as the output state of the bidirectional long short-term memory network;
step 5.3, specifically:
let the feature vector of the microblog text sentence constructed in step 4 be (a_1, a_2, …, a_p);
where p is a parameter set by the bidirectional long short-term memory network layer; let the feature vector of the microblog text sentence constructed in step 5.2 be (b_1, b_2, …, b_q);
the two feature vectors are merged into (a_1, a_2, …, a_p, b_1, b_2, …, b_q), which serves as the input vector of the classification layer in step 5.4;
in step 5.4, a single neuron in the fully-connected neural network receives a vector as input, sums, and applies an activation function to obtain the output of the single neuron, specifically: the activation function is the ReLU function f(x) = max{0, x}; that is, f(x) is the greater of 0 and x, where x is the input to the activation function.
CN201711380014.0A 2017-06-18 2017-12-20 Microblog user gender prediction method based on deep learning Active CN108108354B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2017104608043 2017-06-18
CN201710460804 2017-06-18

Publications (2)

Publication Number Publication Date
CN108108354A CN108108354A (en) 2018-06-01
CN108108354B true CN108108354B (en) 2021-04-06

Family

ID=62211311

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711380014.0A Active CN108108354B (en) 2017-06-18 2017-12-20 Microblog user gender prediction method based on deep learning

Country Status (1)

Country Link
CN (1) CN108108354B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111191668B (en) * 2018-11-15 2023-04-28 零氪科技(北京)有限公司 Method for identifying disease content in medical record text
CN109697288B (en) * 2018-12-25 2020-09-15 北京理工大学 Instance alignment method based on deep learning
CN109918649B (en) * 2019-02-01 2023-08-11 杭州师范大学 Suicide risk identification method based on microblog text
CN110196945B (en) * 2019-05-27 2021-10-01 北京理工大学 Microblog user age prediction method based on LSTM and LeNet fusion
CN110275953B (en) * 2019-06-21 2021-11-30 四川大学 Personality classification method and apparatus
CN112200197A (en) * 2020-11-10 2021-01-08 天津大学 Rumor detection method based on deep learning and multi-mode
CN112487406B (en) * 2020-12-02 2022-05-31 中国电子科技集团公司第三十研究所 Network behavior analysis method based on machine learning
CN115186095B (en) * 2022-09-13 2022-12-13 广州趣丸网络科技有限公司 Juvenile text recognition method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295507A (en) * 2016-07-25 2017-01-04 华南理工大学 A kind of gender identification method based on integrated convolutional neural networks
CN106570148A (en) * 2016-10-27 2017-04-19 浙江大学 Convolutional neutral network-based attribute extraction method
CN106599933A (en) * 2016-12-26 2017-04-26 哈尔滨工业大学 Text emotion classification method based on the joint deep learning model
CN106611055A (en) * 2016-12-27 2017-05-03 大连理工大学 Chinese hedge scope detection method based on stacked neural network
CN106845373A (en) * 2017-01-04 2017-06-13 天津大学 Towards pedestrian's attribute forecast method of monitor video

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10783900B2 (en) * 2014-10-03 2020-09-22 Google Llc Convolutional, long short-term memory, fully connected deep neural networks

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295507A (en) * 2016-07-25 2017-01-04 华南理工大学 A kind of gender identification method based on integrated convolutional neural networks
CN106570148A (en) * 2016-10-27 2017-04-19 浙江大学 Convolutional neutral network-based attribute extraction method
CN106599933A (en) * 2016-12-26 2017-04-26 哈尔滨工业大学 Text emotion classification method based on the joint deep learning model
CN106611055A (en) * 2016-12-27 2017-05-03 大连理工大学 Chinese hedge scope detection method based on stacked neural network
CN106845373A (en) * 2017-01-04 2017-06-13 天津大学 Towards pedestrian's attribute forecast method of monitor video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"News Text Classification Based on Event Convolutional Features"; 夏从零 et al.; Application Research of Computers; 2016-06-22; Vol. 34, No. 4; full text *
"Research on Semi-supervised Gender Classification Methods Based on Multiple Types of Text"; 戴斌 et al.; Journal of Shanxi University (Natural Science Edition); 2017-02-15; Vol. 40, No. 1; pp. 15-17 *

Also Published As

Publication number Publication date
CN108108354A (en) 2018-06-01

Similar Documents

Publication Publication Date Title
CN108108354B (en) Microblog user gender prediction method based on deep learning
CN111291181B (en) Representation learning for input classification via topic sparse self-encoder and entity embedding
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
Sun et al. Sentiment analysis for Chinese microblog based on deep neural networks with convolutional extension features
CN111966917B (en) Event detection and summarization method based on pre-training language model
US11941366B2 (en) Context-based multi-turn dialogue method and storage medium
Xiao et al. Semantic relation classification via hierarchical recurrent neural network with attention
CN106970910B (en) Keyword extraction method and device based on graph model
CN108681557B (en) Short text topic discovery method and system based on self-expansion representation and similar bidirectional constraint
US20120253792A1 (en) Sentiment Classification Based on Supervised Latent N-Gram Analysis
CN107247702A (en) A kind of text emotion analysis and processing method and system
CN107818084B (en) Emotion analysis method fused with comment matching diagram
CN110750640A (en) Text data classification method and device based on neural network model and storage medium
CN107122349A (en) A kind of feature word of text extracting method based on word2vec LDA models
CN112966091B (en) Knowledge map recommendation system fusing entity information and heat
Subramanian et al. A survey on sentiment analysis
CN111241271B (en) Text emotion classification method and device and electronic equipment
CN112347761B (en) BERT-based drug relation extraction method
US20220156489A1 (en) Machine learning techniques for identifying logical sections in unstructured data
Huang et al. Location prediction for tweets
CN110705279A (en) Vocabulary selection method and device and computer readable storage medium
Chaudhuri Visual and text sentiment analysis through hierarchical deep learning networks
Af'idah et al. Long short term memory convolutional neural network for Indonesian sentiment analysis towards touristic destination reviews
CN114036938B (en) News classification method for extracting text features by combining topic information and word vectors
CN109858035A (en) A kind of sensibility classification method, device, electronic equipment and readable storage medium storing program for executing

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant