CN115935075A

CN115935075A - Social network user depression detection method integrating tweet information and behavior characteristics

Info

Publication number: CN115935075A
Application number: CN202310045687.XA
Authority: CN
Inventors: 王李冬; 曹世华; 胡克用; 李文娟; 安康
Original assignee: Qianjiang College of Hangzhou Normal University
Current assignee: Guangzhou Dayu Chuangfu Technology Co ltd
Priority date: 2023-01-30
Filing date: 2023-01-30
Publication date: 2023-04-07
Anticipated expiration: 2043-01-30
Also published as: CN115935075B

Abstract

The invention discloses a method for detecting depression in social network users that integrates tweet information and behavior characteristics. First, a user data set is crawled from Sina Weibo, the text is cleaned, and a depressed user data set and a non-depressed user data set are generated through manual labeling. Next, a multichannel CNN is combined with an attention-based BiGRU to analyze the emotional tendency of each tweet posted by a user, a portion of the positive-emotion tweets is filtered out, and the remaining tweets form the user's historical text. Then, features such as posting time, forwarding behavior and image publishing behavior are extracted to form a user behavior feature vector. Finally, a depression detection model that fuses the user's historical text and the user's behavior is built and trained with the Adam optimization method; after training, the model is used to detect the users to be examined. The method effectively integrates tweet information and user behavior characteristics to automatically detect a user's depression state, and offers low detection cost and convenient operation.

Description

Social network user depression detection method integrating tweet information and behavior characteristics

Technical Field

The invention relates to the field of automatic depression detection, and in particular to a technique for detecting depression in social network users based on tweet information and user behavior characteristics.

Background

Depression is a serious mental disorder that affects the physical and mental health of patients. According to World Health Organization statistics, the number of people with depression worldwide has reached 322 million. Accurate diagnosis is a prerequisite for treatment, but a patient can obtain a diagnosis only by actively contacting mental health professionals and seeking medical advice. However, because most people lack the relevant medical knowledge and do not recognize the risk of the disease, or because of factors such as shame, more than 70% of patients with early-stage depression do not receive effective treatment. An automatic depression screening technique that requires no in-person consultation is therefore urgently needed, so that potential patients can be identified and the harm that depression causes to individuals and society can be reduced through automatic early warning or through auxiliary diagnosis provided to medical institutions.

Current automatic depression detection methods mainly rely on voice or video features. For example, Srimadhur et al. proposed a spectrogram-based convolutional neural network for processing voice signals, which achieves an accuracy of about 60%. Negi et al. used voice, pitch and rhythm attributes to build a depression detection model. Melo et al. proposed a prediction method based on distribution learning that builds on facial expression analysis, explores the relationship between facial images and depression levels, and is robust to noisy data and uncertain labels. These methods have one thing in common: most of them analyze voice, facial image and video data collected during diagnosis and treatment, and acquiring such data requires the user to actively seek medical advice.

With the popularity of social networks, more and more users share their emotions and feelings on social media such as Twitter and Facebook, and a growing number of researchers have found that social media can serve as a window onto a user's mental health. For example, Shen et al. studied the Twitter platform and found that depressed and non-depressed users behave differently on it. Chiu et al. predicted a composite depression score for each Instagram post using features such as images and text, taking full account of the time interval between posts. Zogan et al. focused on text, encoding text semantics with a multi-layer attention mechanism and predicting the probability that a user is depressed with a neural network. However, the above methods still have several problems:

1) A user's tweets contain a large amount of useless, noisy text that interferes with the detection of depressed users and reduces detection accuracy. Nevertheless, most algorithms analyze all of a user's historical tweets, so a satisfactory detection result cannot be achieved.

2) Most existing methods ignore behavior attributes of a user publishing a tweet, such as publishing time, whether the published tweet contains an image, whether the published tweet has a forwarding attribute, and the like.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provides a depression detection method fusing a tweet text and user behavior characteristics.

The technical scheme adopted by the invention for solving the technical problem comprises the following steps:

Step 1: crawl a user data set from Sina Weibo, clean the text, and generate a depressed user data set and a non-depressed user data set through manual labeling. This is the data acquisition module.

Step 2: analyze the emotional tendency of each tweet of a user by fusing a multichannel CNN with an attention-based BiGRU, obtaining the emotional tendency probability value of the tweet. Randomly remove a proportion p of the positive-emotion tweets from each user's historical tweets and concatenate the remaining tweets into the user historical text T. This is the tweet emotional tendency analysis module.

Step 3: extract features such as the user's posting time, forwarding behavior and image publishing behavior to form a user behavior feature vector. This is the user behavior feature module.

Step 4: build a depression detection model that fuses the user historical text T and the user behavior feature vector. T is input into a BiGRU layer and a feed-forward attention layer to obtain the feature vector of each user's historical text; this feature vector is fused with the user behavior feature vector and then input into a fully connected layer and a softmax layer.

Step 5: train the depression detection model with the Adam optimization method and, after training, detect the depression state of users on a test set.

Further, the step 1 is specifically realized as follows:

1-1. Collect a data set of candidate depressed users from Sina Weibo. Several topics related to depression, such as "depression" and "juvenile depression", are randomly selected, and candidate depressed users are crawled from each topic. The historical data of the candidate depressed users is then crawled, including historical tweets, publishing time, whether a tweet is a forwarded tweet, and whether an image is published.

1-2. From the candidate depressed user data set, select as depressed users those who mention a history of depression diagnosis in their tweets. In addition, a user is also labeled as a depressed user if the user's tweets contain words for depression-related symptoms, such as "depression", or names of related therapeutic drugs, such as "sertraline" and "fluoxetine".

1-3. Randomly select users from topics unrelated to depression, for example "happy today", "food" and "travel", and crawl their historical data, including historical tweets, publishing time, whether a tweet is a forwarded tweet, and whether an image is published, to form the non-depressed user data set.

1-4. Clean the text data of each user through word segmentation and data filtering. Word segmentation is performed with the "Jieba" word segmentation package. Data filtering mainly removes "#" topics, URL information, irregular characters, stop words and official account users, and converts emoticons into text.
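The filtering part of step 1-4 can be illustrated with a minimal Python sketch. The regular expressions, the `clean_tweet` helper, the emoticon mapping and the stop-word set are all illustrative assumptions, not the patent's actual rules. Whitespace splitting stands in for Jieba so that the example has no third-party dependency; a Jieba-based tokenizer could be passed as `tokenize` instead. Removing official account users is a user-level filter and is not shown.

```python
import re

# Hypothetical patterns; the patent does not specify its exact filtering rules.
TOPIC_RE = re.compile(r"#[^#]*#")                           # Weibo-style "#topic#" tags
URL_RE = re.compile(r"https?://[\x21-\x7e]+")               # URLs made of printable ASCII
IRREGULAR_RE = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s]")   # keep CJK, letters, digits, spaces


def clean_tweet(text, emoji_map=None, stop_words=frozenset(), tokenize=str.split):
    """Return the cleaned token list for one tweet."""
    text = TOPIC_RE.sub(" ", text)                  # drop "#" topics first
    text = URL_RE.sub(" ", text)                    # drop URL information
    for emoji, word in (emoji_map or {}).items():
        text = text.replace(emoji, f" {word} ")     # emoticon -> text
    text = IRREGULAR_RE.sub(" ", text)              # drop irregular characters
    return [tok for tok in tokenize(text) if tok not in stop_words]


tokens = clean_tweet(
    "今天好难过 #抑郁症# http://t.cn/abc123 😊!!",
    emoji_map={"😊": "微笑"},      # illustrative mapping only
    stop_words={"的"},             # illustrative stop-word set
)
print(tokens)
```

Topics are removed before URLs so that a trailing "#" in a topic tag is not mistaken for a URL fragment; this ordering is a design choice of the sketch.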

Further, the step 2 is specifically realized as follows:

2-1. Pre-train a CBOW model on a large-scale Chinese Wikipedia data set to obtain embedding vectors for Chinese words. Each historical tweet $t_i$ of a user is passed through the CBOW model to obtain a matrix $S \in \mathbb{R}^{n \times d}$, where n is the number of words in the tweet and d is the embedding dimension of each word.

2-2. As shown in FIG. 2, the matrix S is input into a multichannel CNN that contains a convolutional layer and a pooling layer. In the convolutional layer, let the convolution kernel be $W \in \mathbb{R}^{h \times d}$, where $h \in \{2, 3, 4\}$ is the kernel size. Applying W yields the feature vector $a = [a_0, a_1, \ldots, a_{n-h}] \in \mathbb{R}^{n-h+1}$ with $a_j = \sigma(W \cdot S_{j:j+h-1} + b)$, where $\sigma$ is a non-linear function, b is a bias term, and $S_{j:j+h-1}$ denotes rows j through j+h-1 of the matrix S. In the pooling layer, the convolutional outputs for the different kernel sizes are pooled to extract the most important feature O of fixed dimension.
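To make the convolution and pooling of step 2-2 concrete, here is a small pure-Python sketch. It assumes ReLU for the non-linearity and max-over-time pooling for "extracting the most important feature"; the text names neither explicitly. It also uses one kernel per size, whereas the patent uses many. The helper names and toy values are hypothetical.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def conv_feature_map(S, W, b=0.0, sigma=relu):
    """a_j = sigma(sum(W * S[j:j+h]) + b) for j = 0 .. n-h."""
    n, h = len(S), len(W)
    feature_map = []
    for j in range(n - h + 1):
        window = S[j:j + h]                       # rows j .. j+h-1 of S
        total = sum(W[r][c] * window[r][c]
                    for r in range(h) for c in range(len(W[0])))
        feature_map.append(sigma(total + b))
    return feature_map


def multichannel_features(S, kernels):
    """Max-pool each kernel's feature map and concatenate the results into O."""
    return [max(conv_feature_map(S, W, b)) for W, b in kernels]


# Toy input: n = 4 words, d = 2 embedding dimensions.
S = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
kernels = [
    ([[1.0, 1.0]] * 2, 0.0),    # kernel size h = 2
    ([[1.0, 1.0]] * 3, 0.0),    # kernel size h = 3
    ([[1.0, 1.0]] * 4, -1.0),   # kernel size h = 4
]
a = conv_feature_map(S, *kernels[0])      # feature map of length n - h + 1 = 3
O = multichannel_features(S, kernels)     # one pooled value per kernel
print(a, O)
```

The sketch does not pad short inputs, so a tweet with fewer words than the kernel size would need separate handling.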

2-3. Each tweet $t_i$ is also input into the attention-based BiGRU model. The first layer is a BiGRU layer with a forward GRU and a backward GRU; the hidden-layer outputs from both directions are concatenated as the final output of the BiGRU layer. The second layer is a feed-forward attention layer that produces a fixed-length representation vector:

$c_i = \tanh(W_i h_i + b_i)$

where $h_i$ is the BiGRU output vector for word $s_i$, $c_i$ is the output of the fully connected layer, $W_i \in \mathbb{R}^{1 \times d}$ and $b_i \in \mathbb{R}$ are the weight and bias used in the attention computation, $\alpha_i$ is the attention coefficient assigned to word $s_i$, and h is the output of the attention layer.
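Only the scoring equation for $c_i$ appears in this text; the definitions of $\alpha_i$ and h that the paragraph refers to are not reproduced. Assuming the layer follows the usual feed-forward attention formulation (softmax-normalized scores followed by a weighted sum), which the source does not confirm, the two remaining equations would read:

```latex
\alpha_i = \frac{\exp(c_i)}{\sum_{k=1}^{n} \exp(c_k)}, \qquad
h = \sum_{i=1}^{n} \alpha_i h_i
```

Here n is the number of words in the tweet, as in step 2-1.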

2-4. As shown in FIG. 2, the feature O output by step 2-2 and the attention-layer output h of step 2-3 are concatenated to obtain the vector V = [O, h]. V is input into a fully connected layer, followed by a dropout layer to prevent overfitting. A softmax layer after the dropout layer outputs the positive and negative emotional tendency probabilities of the tweet $t_i$, $p(y_i = \text{'positive'})$ and $p(y_i = \text{'negative'})$: the former is the probability that the tweet expresses a positive emotional tendency, and the latter the probability that it expresses a negative one.

2-5. Train the model with an Adam optimizer.

2-6. Randomly remove a proportion p of the positive-emotion tweets from each user's historical tweets and concatenate the remaining tweets into the historical text T.
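Step 2-6 can be sketched as follows. The sketch assumes a tweet counts as positive when its predicted positive probability exceeds 0.5, and that the kept tweets are joined with spaces in their original order; the text states neither detail. `build_history_text` and its parameters are hypothetical names.

```python
import random


def build_history_text(tweets, p_positive, p=0.5, threshold=0.5, seed=0):
    """Drop a random fraction p of the positive tweets, then join the rest."""
    # Assumption: "positive" means p(y = 'positive') > threshold.
    positive_idx = [i for i, prob in enumerate(p_positive) if prob > threshold]
    n_drop = int(round(p * len(positive_idx)))
    dropped = set(random.Random(seed).sample(positive_idx, n_drop))
    kept = [t for i, t in enumerate(tweets) if i not in dropped]
    return " ".join(kept), kept


tweets = ["t0", "t1", "t2", "t3", "t4", "t5"]
p_positive = [0.9, 0.1, 0.8, 0.7, 0.2, 0.95]   # toy outputs of the sentiment model
T, kept = build_history_text(tweets, p_positive, p=0.5)
print(T)
```

With four positive tweets and p = 0.5, two positive tweets are removed, while both negative tweets always remain in T.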

Further, the step 3 is realized as follows:

3-1. To extract a user's posting-time feature, compute the proportion of tweets the user publishes in each hour of the week. In a specific implementation, the proportion for a given hour is calculated from the number of tweets published in that hour.

The posting times within one day form a 24-dimensional feature, and those within one week form a 168-dimensional feature, denoted $f_t$.

3-2. To extract a user's forwarding behavior feature, take the forwarding labels of the user's first 150 historical tweets as the forwarding behavior feature vector. If a tweet is forwarded from another user's tweet, its forwarding label is set to 1; otherwise it is set to 0. If the user has fewer than 150 historical tweets, the vector is padded with 1s. The resulting forwarding behavior feature vector is denoted $f_r$.

3-3. To extract a user's image publishing feature, take the image publishing labels of the user's first 150 historical tweets to form a feature vector. If a tweet published by the user contains an image, its image publishing label is set to 1; otherwise it is set to 0. If the user has fewer than 150 historical tweets, the vector is padded with 0s. The resulting image publishing feature vector is denoted $f_g$.

3-4. Because different features have different value ranges, the feature $f_t$ is normalized to [0, 1] by min-max normalization to give $f'_t$; then $f'_t$, $f_r$ and $f_g$ are concatenated to obtain f, the user's behavior feature vector of dimension 468.
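Steps 3-2 to 3-4 amount to padding two label vectors to length 150, rescaling the 168-dimensional time feature, and concatenating the three parts (168 + 150 + 150 = 468). A minimal sketch follows. The helper names are hypothetical, and mapping a constant vector to zeros in `min_max` is an assumption for a case the text does not cover.

```python
MAX_TWEETS = 150


def pad_labels(labels, pad_value, length=MAX_TWEETS):
    """Keep the first `length` 0/1 labels and pad short vectors with pad_value."""
    labels = list(labels)[:length]
    return labels + [pad_value] * (length - len(labels))


def min_max(values):
    """Scale values to [0, 1]; a constant vector maps to all zeros (assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def behavior_vector(f_t, forwarded, has_image):
    f_r = pad_labels(forwarded, pad_value=1)   # step 3-2 pads with 1s
    f_g = pad_labels(has_image, pad_value=0)   # step 3-3 pads with 0s
    return min_max(f_t) + f_r + f_g            # 168 + 150 + 150 = 468


f_t = [0.0] * 168
f_t[5], f_t[8] = 0.1, 0.05                      # toy posting-time proportions
f = behavior_vector(f_t, forwarded=[1, 0, 0], has_image=[0, 1, 0])
print(len(f))
```

Only $f_t$ is rescaled, because the two label vectors already lie in {0, 1}.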

Further, the step 4 is implemented as follows:

4-1. Using the CBOW model trained in step 2-1, obtain the embedding vector of each word in a user's historical text T to form the historical tweet sequence $S' \in \mathbb{R}^{m \times d}$, where m is the total number of words in the historical tweet sequence and d = 300 is the embedding dimension of each word.

4-2. As shown in FIG. 3, the historical tweet sequence S' of each user is input into the attention-based BiGRU model. The first layer is a BiGRU layer with a forward GRU and a backward GRU; the hidden-layer outputs from both directions are concatenated as the final output of the BiGRU. The second layer is a feed-forward attention layer that produces a fixed-length representation vector:

$c'_i = \tanh(W'_i h'_i + b'_i)$

where $h'_i$ is the BiGRU output vector for word $s'_i$ in the historical tweet sequence S', $c'_i$ is the output of the fully connected layer, $W'_i \in \mathbb{R}^{1 \times d}$ and $b'_i \in \mathbb{R}$ are the weight and bias used in the attention computation, $\alpha'_i$ is the attention coefficient assigned to word $s'_i$, and h' is the output of the attention layer, that is, the feature vector of the user's historical text.
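The attention pooling in step 4-2 can be sketched in plain Python. The sketch assumes the attention coefficients are a softmax over the scores $c'_i$ and that h' is the weighted sum of the BiGRU outputs; only the scoring equation is visible in this text. It also shares one weight vector `w` and one bias `b` across positions, although the source writes them with an index i. All names and values are illustrative.

```python
import math


def attention_pool(H, w, b=0.0):
    """Score each h_i with tanh(w . h_i + b), softmax the scores, sum the h_i."""
    scores = [math.tanh(sum(wk * hk for wk, hk in zip(w, h_i)) + b) for h_i in H]
    peak = max(scores)
    exps = [math.exp(c - peak) for c in scores]      # numerically stable softmax
    total = sum(exps)
    alphas = [e / total for e in exps]
    dim = len(H[0])
    pooled = [sum(a * h_i[k] for a, h_i in zip(alphas, H)) for k in range(dim)]
    return pooled, alphas


H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # three toy BiGRU output vectors
pooled, alphas = attention_pool(H, w=[0.5, 0.5])
print(alphas, pooled)
```

The pooled vector has the same length as each BiGRU output regardless of how many words the sequence contains, which is what makes the representation fixed-length.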

4-3. As shown in FIG. 3, the feature vector h' of the user's historical text and the user behavior feature vector f are concatenated and input into a fully connected layer followed by a sigmoid layer, which outputs the probability that the user is depressed. The formulas for the output probability and for the loss function are not reproduced in this text; in them, $W_f$ and $b_f$ denote the weight and bias of the fully connected layer, the loss is a cross-entropy loss function, and K denotes the number of training samples.
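Because the formulas of step 4-3 are missing from this text, the following is only a plausible reconstruction from the surrounding definitions. It assumes a single-output fully connected layer, a standard sigmoid, and a binary cross-entropy averaged over the K training samples. The symbols $z$, $\hat{y}$ and $y_k$ are introduced here for illustration and do not appear in the source.

```latex
z = W_f\,[h'; f] + b_f, \qquad
\hat{y} = \mathrm{sigmoid}(z) = \frac{1}{1 + e^{-z}}, \qquad
\mathcal{L} = -\frac{1}{K} \sum_{k=1}^{K}
  \left[\, y_k \log \hat{y}_k + (1 - y_k) \log\bigl(1 - \hat{y}_k\bigr) \right]
```

Here $[h'; f]$ denotes the concatenation of the two feature vectors, $y_k \in \{0, 1\}$ is the manual label of the k-th training user, and $\hat{y}_k$ is the predicted depression probability for that user.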

Further, the step 5 is implemented as follows:

5-1. Train the model in FIG. 3 on the training set with an Adam optimizer.

5-2. After training, input the test set into the tweet emotion judgment model, filter out a proportion p of the positive-emotion tweets, and form each user's historical text from the remaining tweets. Extract the user's behavior feature vector according to step 3, input the behavior feature vector and the historical text into the trained automatic detection model, and output the probability that the user suffers from depression.

The invention has the following beneficial effects:

The method focuses on how to effectively fuse a user's tweet information with the user's behavior characteristics to automatically detect the user's depression state. Based on publicly available social platform data, it can track a user's psychological and behavioral condition at any time. It can be used for automatic detection of depressed users and as an early automatic screening technique for depressed users in a social network, with low detection cost and convenient operation. The method forms a user behavior feature vector from features such as posting time, forwarding behavior and image publishing behavior, and then builds a depression detection model that fuses the user's historical text with the user's behavior. It can automatically predict potentially depressed users of a social network, providing a useful technical means for auxiliary diagnosis of depression in hospitals, early warning and tracking of psychological problems among college students, and onboarding assessment of employees in enterprises and public institutions.

Drawings

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a diagram of the tweet emotion judgment model that integrates a multichannel CNN and an attention-based BiGRU;

FIG. 3 is a diagram of the automatic depression detection model that fuses historical text sequences and user behavior.

Detailed Description

The invention will be further described with reference to the accompanying drawings.

As shown in FIG. 1, the method for detecting depression in social network users by fusing tweet information and behavior characteristics includes the following steps:

Step 1: crawl a user data set from Sina Weibo, clean the text, and generate a depressed user data set and a non-depressed user data set through manual labeling.

Step 2: analyze the emotional tendency of each tweet of a user by fusing a multichannel CNN with an attention-based BiGRU, obtaining the emotional tendency probability value of the tweet. Randomly remove a proportion p of the positive-emotion tweets from each user's historical tweets and concatenate the remaining tweets into the user historical text T.

Step 3: extract features such as the user's posting time, forwarding behavior and image publishing behavior to form a user behavior feature vector.

Step 4: build a depression detection model that fuses the user's historical text and the user's behavior. T is input into a BiGRU layer and a feed-forward attention layer to obtain the feature vector of each user's historical text; this feature vector is fused with the user behavior feature vector and then input into a fully connected layer and a softmax layer.

Step 5: train the detection model with the Adam optimization method and, after training, detect the depression state of users on a test set.

Further, the step 1 is specifically realized as follows:

1-1. Collect a data set of candidate depressed users from Sina Weibo. Several topics related to depression, such as "depression" and "juvenile depression", are selected, and candidate depressed users are crawled from each topic. The historical data of these users is crawled, including historical tweets, publishing time, whether a tweet is a forwarded tweet, and whether an image is published.

1-2. From the candidate depressed user data set, select as depressed users those who mention a history of depression diagnosis in their tweets. In addition, a user is also labeled as a depressed user if the user's tweets contain words for depression-related symptoms, such as "depression", or names of related therapeutic drugs, such as "sertraline" and "fluoxetine".

1-3. Randomly select users from topics such as "happy today", "food" and "travel", and crawl their historical data, including historical tweets, publishing time, whether a tweet is a forwarded tweet, and whether an image is published, to form the non-depressed user data set.

Following these steps, data was crawled from the domestic social platform Sina Weibo to build a large data set of depressed and non-depressed users. The data set contains 6423 depressed users and 8617 normal users, as detailed in Table 1:

TABLE 1 Sina Weibo user data set

User type	Number of users	Number of tweets
Depressed users	6423	207322
Non-depressed users	8617	496327
Total	15040	703649

1-4. Clean the text data of each user through word segmentation and data filtering. Word segmentation is performed with the "Jieba" word segmentation package. Data filtering mainly removes "#" topics, URL information, irregular characters, stop words and official account users, and converts emoticons into text.

Further, the step 2 is specifically realized as follows:

2-1. Pre-train a CBOW model on a large-scale Chinese Wikipedia data set to obtain embedding vectors for Chinese words. The present invention sets the word vector size to 300. Each historical tweet $t_i$ of a user is passed through the CBOW model to obtain a matrix $S \in \mathbb{R}^{n \times d}$, where n is the number of words in the tweet and d is the embedding dimension of each word.

2-2. As shown in FIG. 2, S is input into a multichannel CNN that contains a convolutional layer and a pooling layer. In the convolutional layer, let the convolution kernel be $W \in \mathbb{R}^{h \times d}$, where $h \in \{2, 3, 4\}$ is the kernel size. Applying W yields the feature vector $a = [a_0, a_1, \ldots, a_{n-h}] \in \mathbb{R}^{n-h+1}$ with $a_j = \sigma(W \cdot S_{j:j+h-1} + b)$, where $\sigma$ is a non-linear function, b is a bias term, and $S_{j:j+h-1}$ denotes rows j through j+h-1 of the matrix S. The present invention uses 128 convolution kernels of each size with a stride of 1. In the pooling layer, the convolutional outputs for the different kernel sizes are pooled to extract the most important feature O of fixed dimension 128 × 3.

2-3. Each tweet is also input into the attention-based BiGRU model. The first layer is a BiGRU layer with a forward GRU and a backward GRU, and the present invention sets the dimension of the hidden layer to 128. In this layer, the hidden-layer outputs from both directions are concatenated as the final output of the BiGRU. The second layer is a feed-forward attention layer that produces a fixed-length representation vector:

$c_i = \tanh(W_i h_i + b_i)$

where $h_i$ is the BiGRU output vector for word $s_i$, $c_i$ is the output of the fully connected layer, $W_i \in \mathbb{R}^{1 \times d}$ and $b_i \in \mathbb{R}$ are the weight and bias used in the attention computation, $\alpha_i$ is the attention coefficient assigned to word $s_i$, and h is the output of the attention layer, with a fixed length of 128.

2-4. As shown in FIG. 2, the output O of step 2-2 and the output h of step 2-3 are concatenated to obtain the vector V = [O, h] of dimension 512. V is input into a fully connected layer, followed by a dropout layer to prevent overfitting. A softmax layer after the dropout layer outputs the positive and negative emotional tendency probabilities of the tweet $t_i$, $p(y_i = \text{'positive'})$ and $p(y_i = \text{'negative'})$: the former is the probability that the tweet expresses a positive emotional tendency, and the latter the probability that it expresses a negative one.

2-5. Train the model in FIG. 2 with an Adam optimizer. Specifically, the mini-batch size is set to 100, the learning rate to 0.001, the number of epochs to 50, and the dropout rate to 0.5.

2-6. Randomly remove a proportion p of the positive-emotion tweets from each user's historical tweets and concatenate the remaining tweets into the historical text T. In a specific implementation, the proportion is set to p = 0.5.

Further, the step 3 is realized as follows:

3-1. To extract a user's posting-time feature, compute the proportion of tweets the user publishes in each hour of the week. Specifically, the proportion for a given hour is calculated from the number of tweets published in that hour.

The posting times within one day form a 24-dimensional feature, and those within one week form a 168-dimensional feature, denoted $f_t$. For example, if a user has published 20 tweets and the numbers of tweets published in hours 0 through 23 of a given day are [0,0,0,0,0,2,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0], the 24-dimensional feature for that day is [0,0,0,0,0,0.1,0,0,0.05,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0].
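A minimal sketch of the posting-time feature follows. It assumes the proportion for an hour is the tweet count in that hour divided by the user's total number of tweets, which is what the numbers 2/20 = 0.1 and 1/20 = 0.05 in the example suggest. It also assumes the 168 slots are indexed from Monday 00:00. `posting_time_features` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def posting_time_features(timestamps):
    """Return 168 proportions, indexed by weekday * 24 + hour (Monday = 0)."""
    counts = [0] * 168
    for ts in timestamps:
        counts[ts.weekday() * 24 + ts.hour] += 1
    total = len(timestamps)
    return [c / total for c in counts] if total else [0.0] * 168


monday = datetime(2023, 1, 30)                                        # a Monday
stamps = [monday + timedelta(hours=5, minutes=m) for m in (0, 30)]    # 2 tweets in hour 5
stamps.append(monday + timedelta(hours=8))                            # 1 tweet in hour 8
stamps += [monday + timedelta(days=1, hours=k) for k in range(17)]    # 17 tweets on Tuesday
f_t = posting_time_features(stamps)
print(f_t[5], f_t[8])
```

With 20 tweets in total, the Monday slots for hours 5 and 8 come out as 0.1 and 0.05, matching the 24-dimensional example.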

3-2. To extract a user's forwarding behavior feature, take the forwarding labels of the user's first 150 historical tweets as the forwarding behavior feature vector. If a tweet is forwarded from another user's tweet, its forwarding label is set to 1; otherwise it is set to 0. If the user has fewer than 150 historical tweets, the vector is padded with 1s. The resulting forwarding behavior feature vector is denoted $f_r$.

3-3. To extract a user's image publishing feature, take the image publishing labels of the user's first 150 historical tweets to form a feature vector. If a tweet published by the user contains an image, its image publishing label is set to 1; otherwise it is set to 0. If the user has fewer than 150 historical tweets, the vector is padded with 0s. The resulting image publishing feature vector is denoted $f_g$.

3-4. Because different features have different value ranges, the feature $f_t$ is normalized to [0, 1] by min-max normalization to give $f'_t$; then $f'_t$, $f_r$ and $f_g$ are concatenated to obtain f, the user's behavior feature vector of dimension 468.

Further, the step 4 is implemented as follows:

4-1. Using the CBOW model trained in step 2-1, obtain the embedding vector of each word in a user's historical text T to form a matrix $S' \in \mathbb{R}^{m \times d}$, where m is the total number of words in the historical tweet sequence and d = 300 is the embedding dimension of each word.

4-2. As shown in FIG. 3, the historical tweet sequence S' of each user is input into the attention-based BiGRU model. The first layer is a BiGRU layer with a forward GRU and a backward GRU, and the present invention sets the dimension of the hidden layer to 128. In this layer, the hidden-layer outputs from both directions are concatenated as the final output of the BiGRU. The second layer is a feed-forward attention layer that produces a fixed-length representation vector:

$c'_i = \tanh(W'_i h'_i + b'_i)$

where $h'_i$ is the BiGRU output vector for word $s'_i$ in the historical tweet sequence S', $c'_i$ is the output of the fully connected layer, $W'_i \in \mathbb{R}^{1 \times d}$ and $b'_i \in \mathbb{R}$ are the weight and bias used in the attention computation, $\alpha'_i$ is the attention coefficient assigned to word $s'_i$, and h' is the output of the attention layer. Specifically, h' has a fixed length of 128.

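4-3. As shown in FIG. 3, the feature vector h' of the user's historical text and the user behavior feature vector f are concatenated and input into a fully connected layer followed by a sigmoid layer, which outputs the probability that the user is depressed. The model is trained with a cross-entropy loss function, in which K represents the number of training samples.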

Further, the step 5 is implemented as follows:

5-1. Train the model in FIG. 3 on the training set with an Adam optimizer. Specifically, the mini-batch size is set to 200, the learning rate to 0.001, the number of epochs to 100, and the dropout rate to 0.5.

5-2. After training, input the test set into the tweet emotion judgment model, filter out a proportion p of the positive-emotion tweets, and form each user's historical text from the remaining tweets. Extract the user's behavior feature vector according to step 3, input the behavior feature vector and the historical text into the trained automatic detection model, and output the probability that the user suffers from depression. The crawled Sina Weibo user data set is divided into a training set and a test set at a ratio of 7:3. The evaluation metrics are F1-Score, Recall and Precision, and the test results are shown in Table 2.

TABLE 2 test results

Method	Precision	Recall	F1-Score
TBF	0.8581	0.7258	0.7864
EHLM	0.8723	0.7896	0.8289
This patent	0.8887	0.8749	0.8823

In addition, the present invention was compared with the TBF (Chiong et al.) and EHLM (Ansari et al.) methods. The results in Table 2 show that the present invention clearly outperforms the other two methods, with substantial improvements in both Precision and Recall. Compared with the TBF method, the invention improves the F1-Score by 0.0959; compared with the EHLM method, it improves the F1-Score by 0.0534.
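As a quick arithmetic check on Table 2, the F1-Score can be recomputed as the harmonic mean of Precision and Recall. The sketch below uses only the numbers listed in the table. Treating the reported F1 as this per-row harmonic mean is an assumption, since the text does not state how the scores were averaged.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported = {                       # (Precision, Recall, F1-Score) from Table 2
    "TBF": (0.8581, 0.7258, 0.7864),
    "EHLM": (0.8723, 0.7896, 0.8289),
    "This patent": (0.8887, 0.8749, 0.8823),
}
for name, (p, r, f1) in reported.items():
    print(f"{name}: recomputed F1 = {f1_score(p, r):.4f}, reported = {f1:.4f}")

# F1 gains quoted in the text: 0.0959 over TBF and 0.0534 over EHLM.
gain_tbf = round(reported["This patent"][2] - reported["TBF"][2], 4)
gain_ehlm = round(reported["This patent"][2] - reported["EHLM"][2], 4)
print(gain_tbf, gain_ehlm)
```

Under that assumption the recomputed values agree with the reported F1-Scores to within about 0.001, and the two quoted gains follow directly from the F1 column.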

Claims

1. A method for detecting depression in social network users by fusing tweet information and behavior characteristics, characterized by comprising the following steps:

step 1, crawling a user data set from Sina Weibo, cleaning the text, and generating a depressed user data set and a non-depressed user data set through manual labeling;

step 2, fusing a multichannel CNN and an attention-based BiGRU to analyze the emotional tendency of each tweet of a user, and obtaining the emotional tendency probability value of the tweet; randomly removing a proportion p of the positive-emotion tweets from each user's historical tweets, and concatenating the remaining tweets into a user historical text T;

step 3, extracting features such as the user's posting time, forwarding behavior and image publishing behavior to form a user behavior feature vector;

step 4, building a depression detection model that fuses the user historical text T and the user behavior feature vector, inputting T into a BiGRU layer and a feed-forward attention layer to obtain the feature vector of each user's historical text, fusing this feature vector with the user behavior feature vector, and then inputting the fused vector into a fully connected layer and a softmax layer;

2. The method for detecting depression of social network users based on fusion of tweet information and behavior features as claimed in claim 1, wherein the step 1 is implemented as follows:

1-1, collecting a data set of candidate depression users from the Xinlang microblog; randomly selecting several topics relevant to depression, and then crawling candidate depression users from each relevant topic; historical data of the candidate depressed users are crawled, wherein the historical data comprises historical tweets, release time, whether forwarding tweets are used or not and whether image information is released or not;

1-2, from the candidate depressed user data set, selecting as depressed users those who mention a history of depression diagnosis in their tweets; in addition, a user whose tweets contain depression-related symptom words, including "suicide" and "depression", or the names of related therapeutic drugs, including "sertraline" and "fluoxetine", is likewise labeled as a depressed user;

1-3, randomly selecting users from topics unrelated to depression and crawling their historical data, the historical data comprising historical tweets, posting time, whether each tweet is a forwarded tweet, and whether it contains image information, to form a non-depressed user data set;

1-4, cleaning the text data of each user through word segmentation and data filtering; word segmentation is performed with the "Jieba" word segmentation package; data filtering mainly comprises removing #topic tags, URL information, irregular characters, stop words and official account users, and converting emoticons into text information.
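The cleaning in step 1-4 might look like the sketch below. It is only an illustration: the regular expressions, the tiny stop-word set, and the function name `clean_tweet` are assumptions, not the lists used in the patent, and the removal of official account users is not covered because it operates on accounts rather than on text. The tokenizer defaults to whitespace splitting so that the sketch runs without third-party packages; for Chinese text one would pass `jieba.lcut` instead.

```python
import re

# Illustrative assumptions: these patterns and the small stop-word set are
# placeholders, not the resources used in the patent.
URL_RE = re.compile(r"https?://\S+")
TOPIC_RE = re.compile(r"#[^#]+#")            # microblog topics are written #topic#
EMOTICON_RE = re.compile(r"\[([^\[\]]+)\]")  # emoticons appear as [name]
IRREGULAR_RE = re.compile(r"[^\w\s]")        # anything that is not a word char or space
STOP_WORDS = {"的", "了", "the", "a"}

def clean_tweet(text, tokenize=str.split):
    """Clean one tweet and return its tokens.

    tokenize defaults to whitespace splitting; pass jieba.lcut for Chinese.
    """
    text = URL_RE.sub(" ", text)
    text = TOPIC_RE.sub(" ", text)
    text = EMOTICON_RE.sub(r" \1 ", text)    # keep the emoticon name as plain text
    text = IRREGULAR_RE.sub(" ", text)
    return [tok for tok in tokenize(text) if tok.strip() and tok not in STOP_WORDS]
```

URLs are removed first so that a `#` inside a link is not mistaken for a topic tag.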

3. The method for detecting depression of social network users based on fusion of tweet information and behavior features as claimed in claim 2, wherein the step 2 is implemented as follows:

2-1, pre-training a CBOW model on a large-scale Chinese Wikipedia data set to obtain embedding vectors of Chinese words; passing each historical tweet $t_i$ of a user through the CBOW model yields a matrix $S \in \mathbb{R}^{n \times d}$, where $n$ represents the number of words in the tweet and $d$ represents the embedding dimension of each word;

2-2, inputting the matrix $S$ into a multichannel CNN comprising a convolution layer and a pooling layer; in the convolution layer, given a convolution kernel $W \in \mathbb{R}^{h \times d}$, where $h \in \{2, 3, 4\}$ is the size of the convolution kernel, the feature vector $a = [a_0, a_1, \ldots, a_{n-h}] \in \mathbb{R}^{n-h+1}$ is obtained with $a_j = \sigma(W \cdot S_{j:j+h-1} + b)$, where $\sigma$ denotes a non-linear function, $b$ denotes a bias term, and $S_{j:j+h-1}$ denotes rows $j$ through $j+h-1$ of the matrix $S$; in the pooling layer, the convolution outputs obtained under the different convolution kernels are input into the pooling layer, which extracts the most important feature $O$ of fixed dimension;

2-3, inputting each tweet into an attention-based BiGRU model; the first layer is a BiGRU layer with a forward GRU structure and a backward GRU structure, and the hidden-layer outputs of the two directions are concatenated as the final output of the BiGRU layer; the second layer is a feed-forward attention layer that produces a fixed-length representation vector:

$c_i = \tanh(W_i h_i + b_i)$

wherein $h_i$ represents the output vector of the word $s_i$ at the BiGRU layer, $c_i$ represents the output of the fully connected layer, $W_i \in \mathbb{R}^{1 \times d}$ and $b_i \in \mathbb{R}$ are the weight and bias used in the attention calculation, $\alpha_i$ represents the attention distribution coefficient of the word $s_i$, and $h$ represents the output of the attention layer;

2-4, concatenating the output feature $O$ of step 2-2 with the attention-layer output $h$ of step 2-3 to obtain a vector $V = [O, h]$; inputting $V$ into a fully connected layer, followed by a dropout layer to prevent overfitting; a softmax layer after the dropout layer outputs the positive and negative emotional tendency probability values of the user's tweet $t_i$, $p(y_i = \text{'positive'})$ and $p(y_i = \text{'negative'})$, where $p(y_i = \text{'positive'})$ is the inferred probability that the tweet has a positive emotional tendency and $p(y_i = \text{'negative'})$ is the inferred probability that it has a negative emotional tendency;

2-5, training the model using the Adam optimizer.
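The convolution and pooling of step 2-2 can be illustrated with a dependency-free sketch. Several points are assumptions rather than statements of the claim: tanh stands in for the unspecified non-linear function, max-over-time pooling stands in for "extracting the most important feature", and the function names are hypothetical. A real model would use learned kernels in a deep learning framework.

```python
import math

def conv_feature_map(S, W, b):
    """Return [a_0, ..., a_{n-h}] with a_j = tanh(W . S[j:j+h] + b).

    S: n x d embedding matrix of one tweet (list of rows).
    W: h x d convolution kernel, b: scalar bias.
    tanh is an assumed choice for the non-linear function.
    """
    h, n = len(W), len(S)
    if n < h:
        raise ValueError("tweet is shorter than the kernel height")
    feature_map = []
    for j in range(n - h + 1):
        z = b
        for r in range(h):
            # Element-wise product of kernel row r with embedding row j + r.
            z += sum(w * s for w, s in zip(W[r], S[j + r]))
        feature_map.append(math.tanh(z))
    return feature_map

def multichannel_features(S, kernels):
    """Max-over-time pooling of one feature map per (W, b) kernel, giving O."""
    return [max(conv_feature_map(S, W, b)) for W, b in kernels]
```

Each kernel contributes one entry to O regardless of the tweet length n, which is what gives O a fixed dimension.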

4. The method for detecting depression of social network users based on fusion of tweet information and behavior features as claimed in claim 3, wherein the step 3 is implemented as follows:

3-1, to extract the user's posting-time features, computing the proportion of the user's tweets posted in each hour of the week, the proportion for a given time slot being calculated from the number of tweets posted in that slot; the posting times within one day form a 24-dimensional feature, and the posting times within one week form a 168-dimensional posting-time feature, denoted $f_t$;

3-2, to extract the user's forwarding behavior features, taking the forwarding labels of the user's first 150 historical tweets as the forwarding behavior feature vector; if a tweet is forwarded from another user's tweet, its forwarding label is set to 1, otherwise to 0; if the user has fewer than 150 historical tweets, the vector is padded with 1; the resulting forwarding behavior feature vector is denoted $f_r$;

3-3, to extract the user's image posting features, taking the image posting labels of the user's first 150 historical tweets as a feature vector; if a tweet posted by the user contains image information, its image posting label is set to 1, otherwise to 0; if the user has fewer than 150 historical tweets, the vector is padded with 0; the resulting image posting feature vector is denoted $f_g$;

3-4, because different features have different value ranges, normalizing the feature $f_t$ to $[0, 1]$ by min-max normalization to obtain $f'_t$, then concatenating the feature vectors $f'_t$, $f_r$ and $f_g$ to obtain $f$; $f$ is the behavior feature vector of the user, with dimension 468.
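Steps 3-1 to 3-4 can be put together in a short sketch that yields the 168 + 150 + 150 = 468 dimensions. Two choices here are assumptions because the text does not pin them down: each slot's proportion is taken relative to the user's total number of tweets, and min-max normalization is applied across the 168 entries of a single user's vector. All function names are hypothetical.

```python
from datetime import datetime

MAX_TWEETS = 150  # number of tweets used for the forwarding and image labels

def posting_time_feature(timestamps):
    """168-dim vector: share of tweets in each (weekday, hour) slot.

    Dividing by the user's total tweet count is an assumption.
    """
    counts = [0] * (7 * 24)
    for ts in timestamps:
        counts[ts.weekday() * 24 + ts.hour] += 1
    total = len(timestamps)
    return [c / total if total else 0.0 for c in counts]

def min_max(values):
    """Scale values to [0, 1]; a constant vector maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def label_feature(flags, pad_value):
    """0/1 labels of the first MAX_TWEETS tweets, padded to fixed length."""
    labels = [1 if flag else 0 for flag in flags[:MAX_TWEETS]]
    return labels + [pad_value] * (MAX_TWEETS - len(labels))

def behavior_vector(timestamps, is_forward, has_image):
    """Concatenate f'_t (168), f_r (150) and f_g (150) into f (468)."""
    f_t = min_max(posting_time_feature(timestamps))
    f_r = label_feature(is_forward, pad_value=1)  # step 3-2 pads with 1
    f_g = label_feature(has_image, pad_value=0)   # step 3-3 pads with 0
    return f_t + f_r + f_g
```

The two label vectors already lie in {0, 1}, which is why only the posting-time part needs rescaling before concatenation.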

5. The method for detecting depression of social network users based on fusion of tweet information and behavior features as claimed in claim 4, wherein the step 4 is implemented as follows:

4-1, using the CBOW model trained in step 2-1 to obtain the embedding vector of each word in a user's historical text $T$, forming the historical tweet sequence $S' \in \mathbb{R}^{m \times d}$, where $m$ represents the total number of words in the historical tweet sequence and $d$ represents the embedding dimension of each word, $d = 300$;

4-2, inputting the historical tweet sequence $S'$ of each user into an attention-based BiGRU model; the first layer is a BiGRU layer with a forward GRU structure and a backward GRU structure, and the hidden-layer outputs of the two directions are concatenated as the final output of the BiGRU; the second layer is a feed-forward attention layer that produces a fixed-length representation vector:

$c'_i = \tanh(W'_i h'_i + b'_i)$

wherein $h'_i$ represents the output vector at the BiGRU of the word $s'_i$ in the historical tweet sequence $S'$, $c'_i$ represents the output of the fully connected layer, $W'_i \in \mathbb{R}^{1 \times d}$ and $b'_i \in \mathbb{R}$ are the weight and bias used in the attention calculation, $\alpha'_i$ represents the attention distribution coefficient of the word $s'_i$, and $h'$ represents the output of the attention layer, i.e. the feature vector of the user's historical text;

4-3, concatenating the feature vector $h'$ of the user's historical text with the user behavior feature vector $f$, inputting the concatenated vector into a fully connected layer followed by a sigmoid layer, and outputting the depression probability value of the user;

wherein $K$ represents the number of samples in the training set.
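The attention pooling of step 4-2 and the fusion of step 4-3 can be sketched as below. The formulas for the attention coefficients, the attention output, and the training loss are not reproduced in the text above, so this sketch fills them with common choices that should be read as assumptions: a softmax over the scores $c'_i$, a weighted sum of the BiGRU outputs, and binary cross-entropy averaged over $K$ training samples. It also uses one shared weight vector and bias for all positions, and a single linear unit in place of the fully connected layer; all names are hypothetical.

```python
import math

def attention_pool(H, w, b):
    """Feed-forward attention over BiGRU outputs H (list of equal-length vectors).

    Assumed form: c_i = tanh(w . h_i + b), alpha = softmax(c), h = sum(alpha_i * h_i).
    """
    scores = [math.tanh(sum(wk * hk for wk, hk in zip(w, h_i)) + b) for h_i in H]
    exp_scores = [math.exp(c) for c in scores]
    total = sum(exp_scores)
    alphas = [e / total for e in exp_scores]
    dim = len(H[0])
    pooled = [sum(a * h_i[k] for a, h_i in zip(alphas, H)) for k in range(dim)]
    return pooled, alphas

def depression_probability(h_text, f_behavior, weights, bias):
    """Concatenate h' and f, apply one linear unit and a sigmoid."""
    fused = list(h_text) + list(f_behavior)
    z = sum(wk * xk for wk, xk in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean binary cross-entropy over K samples (an assumed loss)."""
    K = len(y_true)
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / K
```

Because the attention weights sum to 1, the pooled vector has the same dimension as a single BiGRU output no matter how many words the history text contains, which is what allows it to be concatenated with the 468-dimensional behavior vector.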

6. The method for detecting depression of social network users based on fusion of tweet information and behavior features as claimed in claim 5, wherein the step 3 is implemented as follows:

after the training of the depression detection model is finished, for each user in the test set, filtering out positive-emotion tweets at the proportion $p$ and concatenating the user's remaining tweets into the user's historical text, extracting the user's behavior feature vector according to step 3, inputting the behavior feature vector and the historical text into the trained depression detection model, and outputting the probability that the user suffers from depression.