CN113780012B - Depression interview dialogue generation method based on a pre-trained language model


Info

Publication number
CN113780012B
Authority
CN
China
Prior art keywords
user
questions
question
emotion
language model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111165245.6A
Other languages
Chinese (zh)
Other versions
CN113780012A (en)
Inventor
周德宇
王博宇
张林海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202111165245.6A
Publication of CN113780012A
Application granted
Publication of CN113780012B
Legal status: Active


Classifications

    • G06F 40/30 Semantic analysis; G06F 40/35 Discourse or dialogue representation (G PHYSICS; G06 COMPUTING; G06F ELECTRIC DIGITAL DATA PROCESSING; G06F 40/00 Handling natural language data)
    • G06F 18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate (G06F 18/00 Pattern recognition; G06F 18/20 Analysing; G06F 18/24 Classification techniques)
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks (G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS; G06N 3/00 Computing arrangements based on biological models; G06N 3/02 Neural networks; G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/045 Combinations of networks
    • G06N 3/084 Backpropagation, e.g. using gradient descent (G06N 3/08 Learning methods)


Abstract

The invention discloses a depression interview dialogue generation method based on a pre-trained language model. The method collects the user's basic information through fixed questions; extracts part of the questions in a preset question library according to a strategy and constructs a main question stream for questioning the user; classifies the emotion polarity of the user's replies and selects corresponding reply sentences as responses to the user's emotion; and uses the fine-tuned pre-trained language model GPT-2 to generate follow-up questions related to the user's replies, conditioned on the current question and the user's reply. The system of the invention mainly comprises: a speech recognition and synthesis module, a preset question library, a main question stream construction module, an emotion classification and response module, and a follow-up question generation module. Compared with previous dialogue generation methods that rely entirely on fixed templates, the fine-tuned pre-trained language model generates more flexible follow-up questions and conducts more effective depression diagnosis interviews, so the method achieves better results.

Description

Depression interview dialogue generation method based on a pre-trained language model
Technical Field
The invention relates to a method for generating dialogue using deep learning techniques, and in particular to a depression interview dialogue generation method and system based on a pre-trained language model.
Background
In the medical field, the application of dialogue systems to the diagnosis and evaluation of mental health has a long history. In fact, the first chatbot, ELIZA, developed in the 1960s by Joseph Weizenbaum, was an automated dialogue system for mental health intervention.
The DSM-IV structured clinical interview is a clinical diagnostic approach primarily directed at depression. Existing dialogue systems for medical diagnosis are mainly question-driven: the system continually poses questions to push the whole dialogue flow forward. However, most current mental health interview systems employ a fixed set of questions for semi-structured interviews. Fixed questions make the dialogue system easier to build, but also greatly limit its flexibility. An ideal depression diagnosis dialogue system should be able to pose more relevant follow-up questions based on the user's responses. The invention therefore adopts text generation techniques from the deep learning field for follow-up question generation.
Traditional text generation techniques include models such as GAN and Seq2Seq. These models have achieved good results on specific text generation tasks. However, when training data are scarce, such models are less robust and struggle to generate text close to the true distribution.
In 2018, OpenAI proposed the GPT (Generative Pre-Training) model. GPT adopts a pre-train/fine-tune paradigm: it is pre-trained on a large-scale corpus and then fine-tuned for a specific generation task. The basic structure of the GPT model is a stack of 12 Transformer decoder layers. When a sequence is input, identifiers are added at its beginning and end; if multiple sequences are input, they are concatenated with identifiers marking the separations and processed as one sequence, so most text can be input in this manner. The pre-train/fine-tune paradigm reduces training cost and can yield good generation results with only a small number of training samples. In 2019, OpenAI proposed the upgraded GPT-2. Currently, research on many kinds of text generation is actively exploring pre-trained language models.
Disclosure of Invention
To solve the above problems, the invention discloses a depression interview dialogue generation method and system based on a pre-trained language model, which use the pre-trained language model for text generation, thereby addressing the fixed question settings, the inability to conduct effective inquiry, and the scarcity of training data. The method disclosed by the invention mainly comprises the following steps:
(1) Basic information collection: when the depression interview dialogue starts, fixed questions collecting basic information are presented to the user; the basic information comprises name, age, and place of residence, which are extracted, recorded, and provided to subsequent steps;
(2) Main question stream construction: questions are extracted from a preset question library according to a given strategy to construct the main question stream of the depression interview dialogue; the questions in the main question stream are output to the user in order over the course of the dialogue, and the depression interview dialogue ends when no questions remain in the main question stream and the corresponding follow-up steps have been executed;
(3) Reply emotion classification and response: each time the user's reply to a question in the main question stream is received, a trained classifier classifies the emotion of the reply, and a response is made according to the classification result;
(4) Follow-up question generation condition judgment: whether to generate a follow-up question is determined according to the emotion classification result and the length of the reply obtained in step (3);
(5) Follow-up question generation: when the judgment in step (4) is that the generation condition is met, a follow-up question is generated with the fine-tuned pre-trained language model GPT-2, conditioned on the system's current question and the user's reply to it.
Further, the basic information collection in step (1) sets simple questions asking about three aspects, the user's name, age, and place of residence, and extracts the corresponding information from the user's replies; names and ages are extracted with regular expressions matching name entities and age numbers in the user's replies, and the place of residence is extracted with a keyword dictionary method.
Further, the preset question library in step (2) is derived from two sources: (a) the DAIC-WOZ dataset, from which the questions in the text data are extracted and counted, and the 50 most frequent open-domain depression interview questions are retained, each carrying a topic label and an emotion-guiding label; (b) the PHQ-9 (Patient Health Questionnaire-9) questionnaire.
Further, the main question stream construction strategy in step (2) is specifically as follows: (1) macroscopically, the main question stream is divided into three phases, a starting phase, an inquiry phase, and an ending phase; the starting phase selects from the DAIC-WOZ questions whose emotion-guiding label is Common, the inquiry phase uses the entire content of the PHQ-9 questionnaire, the ending phase selects from the DAIC-WOZ questions whose emotion-guiding label is Positive, and a set number of questions is randomly extracted for the starting and ending phases; (2) according to the topic labels of the questions, the proportion of each topic label is computed for the questions extracted in the starting and ending phases, and questions under topics whose proportion exceeds the corresponding proportion in the preset question library are replaced until the proportions of all topics meet the requirements; (3) considering that users of different age groups care about different topics, the main question stream is supplemented according to the user's age information; (4) post-processing replaces placeholders in some questions according to the collected user information.
Further, the emotion classification and response method in step (3) is specifically as follows: word2vec word vectors convert the word sequence of the dialogue text into a word-vector sequence (w_0, w_1, w_2, …, w_n), which is input to a trained bidirectional long short-term memory network; after the computation, the hidden state h_n of the last time step is taken as the semantic representation of the whole input, and a fully connected layer with a softmax activation yields the emotion-class probability distribution of the dialogue text, the class with the maximum probability being the emotion classification result. If the emotion classification result is positive or negative, a corresponding reply is randomly extracted from a preset emotion reply library and output to the user.
Further, the follow-up question generation condition in step (4) is judged as follows: if the emotion classification result of the reply is negative, it is directly judged that no follow-up question is generated; if the result is neutral or positive and the reply length exceeds a manually set length threshold, a follow-up question is generated from the current question and the user's reply, and the user's reply to that follow-up question is then received.
Further, the fine-tuning method of the pre-trained language model GPT-2 in step (5) is specifically as follows: 4112 data samples of the form {q, r, f} are obtained from the Asynchronous Interview dataset and the Empathetic Dialogues dataset, where q denotes a question, r a reply to that question, and f a follow-up question; all samples are divided into a training set, a validation set, and a test set. 300 samples are randomly extracted from the training set, and f in these samples is randomly replaced with a question unrelated to q and r; these 300 samples serve as interference samples with label 0, while the other, normal samples have label 1. Each sample {q, r, f} is spliced with special separators into a sequence of the form <startoftext> q <speaker1> r <speaker2> f <endoftext>, denoted U = (u_0, u_1, u_2, …, u_N);
For each input sequence, the GPT-2 model is used to obtain a predicted sequence Û = (û_1, û_2, …, û_N) and a semantic representation h of the entire input sequence. For each sample, the language model loss on the predicted sequence Û is computed with multi-class cross entropy and denoted L_LM; the semantic representation h computed by the GPT-2 model for each sample is fed to a content-relevance classification, giving a classification result ŷ, whose loss is computed with binary cross entropy and denoted L_cls. The total loss L = L_LM + λ·L_cls is optimized by gradient descent, updating the parameters of the pre-trained language model GPT-2.
Further, the follow-up question generation in step (5) is specifically as follows: the fine-tuned GPT-2 model is applied to follow-up question generation, with <startoftext> q <speaker1> r <speaker2> as the initial input sequence; the fine-tuned GPT-2 model computes the probability distribution of the next word to be predicted, a Top-k sampling strategy decodes it, and the decoding result is appended to the end of the original sequence to form a new input sequence. This process is repeated until the terminator <endoftext> is generated or the maximum sequence length is reached. The text generated by this process is the follow-up question, which is output to the user as the system's response to the user's reply.
Based on the above depression diagnosis dialogue generation method, the invention also provides a DSM-IV structured clinical interview dialogue generation system based on a pre-trained language model, comprising a speech recognition and synthesis module, a basic information collection module, a preset question library for the main question stream, a main question stream construction module, an emotion classification and response module, and a follow-up question generation module;
the basic information collection module presents fixed questions to the user to collect user information, extracts the basic information from the user's replies, records it, and provides it to subsequent steps of the dialogue;
the main question stream preset question library stores the questions used to construct the main question stream;
the main question stream construction module extracts a number of questions from the preset question library according to the given construction strategy to form the main question stream of one depression interview dialogue;
the emotion classification and response module classifies the emotion polarity of user replies into positive and negative and responds with a preset emotion reply sentence;
the follow-up question generation module generates follow-up questions to ask the user according to the current question and the user's reply, using the fine-tuned pre-trained language model GPT-2 for generation.
Compared with existing methods, the DSM-IV structured clinical interview dialogue generation method and system based on a pre-trained language model have the following beneficial effects: the GPT-2 model is pre-trained on a large-scale corpus and, unlike traditional generation models, needs only a small number of samples for fine-tuning, and the pre-train/fine-tune paradigm reduces training cost; the fine-tuned pre-trained language model can automatically generate follow-up questions that are more richly phrased and more relevant to the user's reply, enabling more effective inquiry during the dialogue; in addition, the system can perceive the user's emotion during the dialogue and respond to it, making the whole dialogue process more humane.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram of the structure and fine-tuning process of the pre-trained language model GPT-2 according to the present invention.
Detailed Description
The present invention is further illustrated by the following drawings and specific embodiments, which should be understood as merely illustrating the invention and not limiting its scope. It should be noted that the words "front", "rear", "left", "right", "upper", and "lower" used in the following description refer to directions in the drawings, and the words "inner" and "outer" refer to directions toward or away from the geometric center of a particular component, respectively.
The embodiment of the invention implements a depression interview dialogue system based on the pre-trained language model GPT-2; the model structure is shown in FIG. 2. In the embodiment, the Asynchronous Interview dataset is used to fine-tune the GPT-2 model, and during implementation the training samples are expanded with data extracted from the Empathetic Dialogues dataset. The overall training data comprise 4112 samples, each of the form {q, r, f}; as described above, q denotes an initial question, r a reply to that question, and f a follow-up question. All samples are divided into training, validation, and test sets at a ratio of 8:1:1. Fine-tuning is performed on the small English pre-trained language model GPT-2, with the following specific steps:
1. each sample { q, r, f } is spliced by special segmenters to obtain a shape like a Chinese character<startoftext>q<speaker1>r<speaker2>f<endoftext>Is a sequence of inputs to the computer. This sequence was denoted as u= (U) 0 ,u 1 ,u 2 ,…,u N )
2. According to the English vocabulary of GPT-2, each token in the input sequence is mapped to a token id, which is converted into a one-hot vector when input to the model. For example, the word "happy" is mapped to token id 3772 and then converted into a one-hot vector such as [0, …, 1, …, 0], where the value at position 3772 is 1 and all other positions are 0. The matrix formed by the one-hot vectors is multiplied by the embedding matrix W_E to obtain the word-embedding representation sequence of the whole input sequence.
Meanwhile, the position embedding is calculated as follows:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))

where pos denotes the position of the current token in the input sequence, i indexes the word-embedding dimension, and d_model is the word-embedding dimension. The word-embedding sequence and the position-embedding sequence are added element-wise to obtain the representation sequence input to the model, denoted U_E.
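As an illustrative aside (not part of the original embodiment text), the token one-hot lookup and the sinusoidal position embedding described above can be sketched in Python as follows; the random matrix stands in for the learned W_E, and note that the released GPT-2 actually learns its position embeddings, while the sinusoidal form follows this document's description:

```python
import numpy as np

vocab_size, d_model = 50257, 768                # GPT-2 vocabulary and embedding sizes
W_E = np.random.randn(vocab_size, d_model)      # stands in for the learned embedding matrix

token_id = 3772                                 # "happy" in the example above
one_hot = np.zeros(vocab_size)
one_hot[token_id] = 1.0
word_emb = one_hot @ W_E                        # equivalent to the lookup W_E[token_id]

def position_embedding(pos, d_model=768):
    # Sinusoidal position embedding: even dimensions use sin, odd dimensions use cos.
    pe = np.zeros(d_model)
    i = np.arange(0, d_model, 2)
    pe[0::2] = np.sin(pos / 10000 ** (i / d_model))
    pe[1::2] = np.cos(pos / 10000 ** (i / d_model))
    return pe

u_e = word_emb + position_embedding(pos=0)      # one element of the input representation U_E
```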
3. The GPT-2 model is formed by stacking 12 identical Transformer decoder modules, each containing layer normalization, a masked self-attention mechanism, residual connections, and a fully connected feed-forward network. The computation of a Transformer decoder module is as follows:

Transform(H) = FN(Norm(M_output(H))) + M_output(H)

where Norm denotes the layer normalization function, computed as

Norm(x) = γ ⊙ (x − μ) / √(σ² + ε) + β

with μ and σ² the mean and variance of x, γ and β learned scale and shift parameters, and ε a small constant for numerical stability; M_output denotes the result of the masked self-attention mechanism with a residual connection:

M_output(H) = M(Norm(H)) + H

where M denotes the masked self-attention mechanism.
The overall GPT-2 model calculation process can then be expressed as:
H_1 = Transformer(U_E)
H_2 = Transformer(H_1)
…
H_12 = Transformer(H_11)
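The formulas above describe a pre-LayerNorm decoder block. A minimal PyTorch sketch is given below; the GELU activation and the fourfold feed-forward expansion are assumptions (standard GPT-2 choices) that the text does not spell out:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    # Implements the two formulas above:
    #   M_output(H) = M(Norm(H)) + H
    #   Transform(H) = FN(Norm(M_output(H))) + M_output(H)
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model))

    def forward(self, h):
        n = h.size(1)
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)  # causal mask
        x = self.ln1(h)
        m_out = self.attn(x, x, x, attn_mask=mask)[0] + h   # masked self-attention + residual
        return self.ffn(self.ln2(m_out)) + m_out            # feed-forward + residual

decoder_stack = nn.ModuleList(DecoderBlock() for _ in range(12))  # yields H_1 ... H_12
```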
4. h calculated by GPT-2 model 12 Is the hidden state representation of each token in the input sequence, and the probability distribution of the next token corresponding to each position is obtained through a fully connected layer with an activation function being a softmax function. The final predicted sequence is noted as
5. Design of the supervised task in the fine-tuning process. To improve the relevance of the model's generated content to the preceding text, a classification task is constructed that judges whether the generated follow-up question is relevant to the context. In this embodiment, 300 samples are randomly extracted from the training set, and f in these samples is randomly replaced with a question unrelated to q and r; these 300 samples serve as interference samples with label 0, while the other, normal samples have label 1. The last item of the H_12 sequence contains the semantic information of the whole sequence and is denoted h; it is fed into a classifier C for binary classification, computed as:

ŷ = C(h) = softmax(h·W_c)

where W_c is the parameter matrix of the classification layer.
during the fine tuning process, the loss function of the model is composed of two parts. The first part is the loss of language model, the form is as follows:
specifically, in the present embodiment, multi-class cross entropy is used as a loss function of the language model, and thereforeCan be written in the following form:
where N represents the total number of predicted tokens in the sample. The loss function of the classification task for judging and generating the content correlation is recorded as:
the overall loss function of the trimming process is noted as:
where λ is a super parameter for controlling the weight, and is set to 0.2 in the present embodiment.
During fine-tuning, the text embedding dimension is set to 768, the maximum sequence length to 1024, and the batch size to 4; the loss function is optimized with gradient descent to update the model parameters, using an Adam optimizer with an initial learning rate of 0.0015.
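A minimal fine-tuning sketch, assuming the Hugging Face transformers implementation of GPT-2 (the patent does not name a library) and unpadded single-sequence batches; the separator tokens, λ = 0.2, and the Adam learning rate 0.0015 follow the text above:

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.add_special_tokens({"additional_special_tokens":
    ["<startoftext>", "<speaker1>", "<speaker2>", "<endoftext>"]})
model = GPT2LMHeadModel.from_pretrained("gpt2", output_hidden_states=True)
model.resize_token_embeddings(len(tokenizer))

cls_head = nn.Linear(model.config.n_embd, 2)    # content-relevance classifier over h
optimizer = torch.optim.Adam(
    list(model.parameters()) + list(cls_head.parameters()), lr=0.0015)

def fine_tune_step(input_ids, relevance_label, lam=0.2):
    # input_ids: (1, N) ids of <startoftext> q <speaker1> r <speaker2> f <endoftext>
    out = model(input_ids, labels=input_ids)    # L_LM: shifted multi-class cross entropy
    h = out.hidden_states[-1][:, -1, :]         # last token's state as the summary h
    l_cls = nn.functional.cross_entropy(cls_head(h), relevance_label)
    loss = out.loss + lam * l_cls               # L = L_LM + lambda * L_cls
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```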
The follow-up question generation module in the system applies the fine-tuned model to generate follow-up questions. For one round of follow-up question generation, the current q and r are first connected by separator tokens as the initial input sequence, and the fine-tuned GPT-2 model computes the probability distribution of the next word to be predicted. Decoding uses a Top-k sampling strategy: the tokens corresponding to the k largest values in the predicted next-token distribution are selected as candidates for this decoding round, and one of the k candidates is randomly drawn as the decoding result. Here k is a hyperparameter, set to 5 in this embodiment. The decoding result is appended to the end of the original sequence to form a new input sequence, and this process is repeated until the terminator <endoftext> is generated or the maximum sequence length is reached.
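A sketch of this decoding loop, again assuming the transformers GPT-2 interface; sampling in proportion to the renormalized top-k probabilities is the common convention, though the text's "randomly extracting one" could also be read as a uniform choice among the k candidates:

```python
import torch

def top_k_sample(next_token_logits, k=5):
    # Keep the k highest-scoring tokens, renormalize, and sample one of them.
    values, indices = torch.topk(next_token_logits, k)
    probs = torch.softmax(values, dim=-1)
    return indices[torch.multinomial(probs, num_samples=1)]

def generate_followup(model, tokenizer, input_ids, max_len=1024, k=5):
    # Append sampled tokens until the <endoftext> terminator or the maximum length.
    end_id = tokenizer.convert_tokens_to_ids("<endoftext>")
    while input_ids.size(1) < max_len:
        logits = model(input_ids).logits[0, -1, :]   # distribution over the next token
        next_id = top_k_sample(logits, k)
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)
        if next_id.item() == end_id:
            break
    return input_ids
```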
The DSM-IV structured clinical interview dialogue generation method based on a pre-trained language model disclosed by the embodiment of the invention, whose flow chart is shown in FIG. 1, mainly comprises the following steps:
s1: and (5) basic information collection. A simple question asking three aspects of the user name, age and residence is set, and the question is presented to the user at the beginning of the dialogue; the method for extracting the names and the ages in the user replies adopts a regular expression to extract the entities representing the names and the numbers representing the ages in the user replies; the method for extracting the residence in the user reply adopts a keyword dictionary method, and the residence information of the user is extracted by comparing the user reply with the city names stored in the preset dictionary. Recording the acquired name, age and residence information, and providing the information for the subsequent steps.
S2: and constructing a dialogue main problem stream. The preset problem library used for constructing the dialogue main problem stream is derived from two aspects: (1) Open domain problems associated with depressed clinics extracted from the DAIC-WOZ dataset; (2) PHQ-9 (Patient Health Questionnaire) questionnaire.
The dialogue transcripts in the DAIC-WOZ dataset are semi-structured depression interviews using fixed questions. The questions in the dataset's text data are extracted and counted, and the 50 most frequent open-domain depression interview questions are retained as part of the preset question library of the invention. Topic labels and emotion-guiding labels are applied to the 50 questions following the original DAIC-WOZ study. The topic labels comprise five classes: Experience, Education, Work, Family, and Self. The emotion-guiding labels are divided into Positive and Common, where a question labeled Positive is one that guides the patient toward positive emotion during the conversation. These questions are shown in Table 1:
TABLE 1 Open-domain depression interview questions extracted from DAIC-WOZ
The PHQ-9 questionnaire content is shown in Table 2:
TABLE 2 PHQ-9 questionnaire content
The strategy for constructing the dialogue's main question stream is as follows:
(1) Macroscopically, the main question stream is divided into three phases: a starting phase, an inquiry phase, and an ending phase. The starting phase selects from the questions whose emotion-guiding label in Table 1 is Common, from which n_1 questions are randomly extracted; the inquiry phase uses the entire content of the PHQ-9 questionnaire; the ending phase selects from the questions whose emotion-guiding label in Table 1 is Positive, from which n_2 questions are randomly extracted. In this embodiment, n_1 is set to 7 and n_2 to 5.
(2) Adjustment according to the questions' topic labels. To avoid an excessive number of questions on one topic, the proportion of questions under each topic label is computed for the questions extracted in the starting and ending phases; if the proportion of some topic exceeds its proportion in the preset question library, part of the extracted questions on that topic are randomly replaced. This process is repeated until the question proportions of all topics are appropriate.
(3) Supplementation according to the user's age information. Considering that users of different age groups care about different topics, the main question stream is supplemented according to the age information collected in step S1: if the collected age is at most 25, 2 questions with the topic Education are added; if the age is greater than 25 and at most 35, 2 questions with the topic Work are added; and if the age is greater than 35, 2 questions with the topic Family are added. The newly added questions are inserted into the corresponding phases according to their emotion-guiding labels.
(4) Post-processing. Placeholders in some questions are replaced according to the collected user information. For example, the question "what are some things you really like about <place>" contains the placeholder "<place>"; if the user's residence information is "Jiangsu", post-processing turns the question into "what are some things you really like about Jiangsu".
Through the above strategy, the main question stream of the depression diagnosis dialogue is constructed. The questions in the main question stream are output to the user in order throughout the dialogue, and the depression interview dialogue ends when no questions remain in the main question stream and the corresponding follow-up steps have been executed.
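An illustrative sketch of this assembly; the two-entry question bank stands in for Table 1, and the topic-ratio balancing and age-based supplements of steps (2) and (3) are omitted for brevity:

```python
import random

# Toy question bank; real entries come from Table 1 (topic and emotion labels)
# and the PHQ-9 questionnaire.
DAIC = [
    {"text": "how are you doing today", "topic": "self", "emotion": "common"},
    {"text": "what are some things you really like about <place>",
     "topic": "experience", "emotion": "positive"},
]
PHQ9 = ["little interest or pleasure in doing things",
        "feeling down, depressed, or hopeless"]

def build_main_stream(place, n1=7, n2=5):
    common = [q["text"] for q in DAIC if q["emotion"] == "common"]
    positive = [q["text"] for q in DAIC if q["emotion"] == "positive"]
    opening = random.sample(common, min(n1, len(common)))      # starting phase
    closing = random.sample(positive, min(n2, len(positive)))  # ending phase
    stream = opening + PHQ9 + closing                          # inquiry phase in the middle
    return [t.replace("<place>", place) for t in stream]       # step (4) post-processing
```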
S3: and judging the emotion polarity replied by the user by using an emotion classification module. Firstly, training a classifier in an emotion classification module, extracting 6327 samples for training the classification model from DailyDialog data set, wherein emotion labels of data samples are divided into three types of positive, negative and none, and the three types of emotion labels are respectively used for marking positive emotion samples, negative emotion samples and no emotion samples. These samples were used to train a classifier consisting of word2vec in combination with a Bi-directional long and short Term Memory network (Bi-directional Long Short-Term Memory, biLSTM). In the dialogue process, the specific use process of classification is that word2vec word vectors are used to convert word sequences of dialogue texts into word vector sequences (w 0 ,w 1 ,w 2 ,…,w n ) Each word vector dimension is 200 dimension, biLSTM is input, and after calculation, the hidden layer state h of the last time step is taken n And (3) as semantic representation of the whole input, obtaining emotion category probability distribution of the dialogue text through a full-connection layer using a softmax function as an activation function, wherein the category corresponding to the maximum probability is an emotion classification result. If the emotion classification result is positive or negative, randomly extracting corresponding replies from a preset emotion reply library, and outputting the replies to the user.
S4: and judging the subsequent problem generation conditions. If the returned emotion classification result is negative, directly judging that no subsequent problem is generated; if the result of the emotion reply classification is neutral or positive and the reply length exceeds the artificially set length threshold, generating a subsequent problem according to the current problem and the user reply, and receiving the reply of the user about the subsequent problem. In the embodiment of the invention, the length threshold value is set to 25.
S5: subsequent problems are generated. And carrying out subsequent problem generation by using the GPT-2 model after fine adjustment. Firstly, the reply r of the user is preprocessed to remove redundant blank spaces and nonsensical words generated by voice recognition. Connecting the current q and r through a connection indicator to obtain < startoftext > q < speaker1> r < speaker2> as an initial input sequence, and obtaining probability distribution of the next word to be predicted after calculation of a fine-tuned GPT-2 model. Specific calculation procedures are described above in the context of the GPT-2 calculation procedure. And decoding by using a Top-k sampling strategy, and adding a decoding result to the end of the original sequence to form a new input sequence. The above procedure is repeated until a terminator < endoftext > is generated or a maximum sequence length is reached. The text generated in the process is the follow-up question and is used as the response of the system to the reply of the user and is output to the user.
The technical means disclosed by the solution of the invention are not limited to those disclosed in the above embodiment, and also include technical solutions formed by any combination of the above technical features.

Claims (8)

1. A method of generating a depression interview dialogue based on a pre-trained language model, characterized by:
the dialogue generation method comprises the following steps:
step (1), basic information collection: when the depression interview dialogue starts, fixed questions collecting basic information are presented to the user; the basic information comprises name, age, and place of residence, which are extracted, recorded, and provided to subsequent steps;
step (2), main question stream construction: questions are extracted from a preset question library according to a given strategy to construct the main question stream of the depression interview dialogue; the questions in the main question stream are output to the user in order over the course of the dialogue, and the depression interview dialogue ends when no questions remain in the main question stream and the corresponding follow-up steps have been executed;
step (3), reply emotion classification and response: each time the user's reply to a question in the main question stream is received, a trained classifier classifies the emotion of the reply, and a response is made according to the classification result;
step (4), follow-up question generation condition judgment: whether to generate a follow-up question is determined according to the emotion classification result and the length of the reply obtained in step (3);
step (5), follow-up question generation: when the judgment in step (4) is that the follow-up question generation condition is met, a follow-up question is generated with the fine-tuned pre-trained language model GPT-2, conditioned on the system's current question and the user's reply to it; the fine-tuning method of the pre-trained language model GPT-2 in step (5) comprises the following steps:
(51) 4112 data samples of the form {q, r, f} are obtained from the Asynchronous Interview dataset and the Empathetic Dialogues dataset, where q denotes a question, r a reply to that question, and f a follow-up question; all samples are divided into a training set, a validation set, and a test set;
(52) 300 samples are randomly extracted from the training set, and f in these samples is randomly replaced with a question unrelated to q and r; these 300 samples serve as interference samples with label 0, while the other, normal samples have label 1;
(53) each sample {q, r, f} is spliced with special separators to obtain a sequence of the form <startoftext> q <speaker1> r <speaker2> f <endoftext>, denoted U = (u_0, u_1, u_2, …, u_N);
(54) for each input sequence, the GPT-2 model is used to obtain a predicted sequence Û = (û_1, û_2, …, û_N) and a semantic representation h of the entire input sequence;
(55) for each sample, the language model loss on the predicted sequence Û obtained from the GPT-2 model is computed with multi-class cross entropy; this part of the loss is denoted L_LM;
(56) the semantic representation h computed by the GPT-2 model for each sample is used for a content-relevance classification, giving a classification result ŷ; the loss is computed with binary cross entropy and denoted L_cls;
(57) the total loss L = L_LM + λ·L_cls is optimized by gradient descent, updating the parameters of the pre-trained language model GPT-2.
2. The pre-trained language model based depression interview dialogue generation method of claim 1, wherein: the basic information collection in step (1) sets simple questions asking about three aspects, the user's name, age, and place of residence, and extracts the corresponding information from the user's replies; names and ages are extracted with regular expressions matching name entities and age numbers in the user's replies; the place of residence is extracted with a keyword dictionary method.
3. The pre-trained language model based depression interview dialogue generation method of claim 1, wherein: the preset question library in step (2) is derived from two sources: (a) the DAIC-WOZ dataset, from which the questions in the text data are extracted and counted, and the 50 most frequent open-domain depression interview questions are retained, each carrying a topic label and an emotion-guiding label; (b) the PHQ-9 (Patient Health Questionnaire-9) questionnaire.
4. The pre-trained language model based depression interview dialogue generation method of claim 1, wherein: the main question stream construction strategy in step (2) is specifically as follows:
(21) macroscopically, the main question stream is divided into three phases, a starting phase, an inquiry phase, and an ending phase; the starting phase selects from the open-domain questions whose emotion-guiding label is Common, the inquiry phase uses the entire content of the PHQ-9 questionnaire, the ending phase selects from the open-domain questions whose emotion-guiding label is Positive, and a set number of questions is randomly extracted for the starting and ending phases;
(22) according to the topic labels of the questions, the proportion of each topic label is computed for the questions extracted in the starting and ending phases, and questions under topics whose proportion exceeds the corresponding proportion in the preset question library are replaced until the proportions of all topics meet the requirements;
(23) considering that users of different age groups care about different topics, the main question stream is supplemented according to the user's age information;
(24) post-processing replaces placeholders in some questions according to the collected user information.
5. The pre-trained language model based depression interview dialogue generation method of claim 1, wherein: the emotion classification and response method in step (3) is specifically as follows: word2vec word vectors convert the word sequence of the dialogue text into a word-vector sequence (w_0, w_1, w_2, …, w_n), which is input to a trained bidirectional long short-term memory network; after the computation, the hidden state h_n of the last time step is taken as the semantic representation of the whole input, and a fully connected layer with a softmax activation yields the emotion-class probability distribution of the dialogue text, the class with the maximum probability being the emotion classification result; if the emotion classification result is positive or negative, a corresponding reply is randomly extracted from the preset emotion reply library and output to the user.
6. The pre-trained language model based depression interview dialogue generation method of claim 1, wherein: the follow-up question generation condition in step (4) is judged as follows: if the emotion classification result of the reply is negative, it is directly judged that no follow-up question is generated; if the result is neutral or positive and the reply length exceeds a manually set length threshold, a follow-up question is generated from the current question and the user's reply, and the user's reply to that follow-up question is received.
7. The pre-trained language model based depression interview dialogue generation method of claim 1, wherein: the follow-up question generation in step (5) is specifically as follows: the fine-tuned GPT-2 model is applied to follow-up question generation, with <startoftext> q <speaker1> r <speaker2> as the initial input sequence; the fine-tuned GPT-2 model computes the probability distribution of the next word to be predicted, a Top-k sampling strategy decodes it, and the decoding result is appended to the end of the original sequence to form a new input sequence; this process is repeated until the terminator <endoftext> is generated or the maximum sequence length is reached; the text generated by this process is the follow-up question, which is output to the user as the system's response to the user's reply.
8. A pre-trained language model based depression interview dialogue generation system, characterized in that: the system applies the fine-tuning method of the pre-trained language model GPT-2 according to claim 1 and comprises the following modules:
the speech recognition and synthesis module converts the user's speech input into text through speech recognition for processing by the subsequent modules of the dialogue system; correspondingly, when the system outputs a question to the user, it is broadcast in speech form after passing through the speech synthesis module;
the basic information collection module presents fixed questions to the user to collect user information, extracts the basic information from the user's replies, records it, and provides it to subsequent steps of the dialogue;
the main question stream preset question library stores the questions used to construct the main question stream;
the main question stream construction module extracts a number of questions from the preset question library according to the given construction strategy to form the main question stream of one depression interview dialogue;
the emotion classification and response module classifies the emotion polarity of user replies into positive and negative and responds with a preset emotion reply sentence;
the follow-up question generation module generates follow-up questions to ask the user according to the current question and the user's reply, using the fine-tuned pre-trained language model GPT-2 for generation.
CN202111165245.6A 2021-09-30 2021-09-30 Depression interview dialogue generating method based on pre-training language model Active CN113780012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111165245.6A CN113780012B (en) 2021-09-30 2021-09-30 Depression interview dialogue generating method based on pre-training language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111165245.6A CN113780012B (en) 2021-09-30 2021-09-30 Depression interview dialogue generating method based on pre-training language model

Publications (2)

Publication Number Publication Date
CN113780012A CN113780012A (en) 2021-12-10
CN113780012B (en) 2023-12-29

Family

ID=78855182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111165245.6A Active CN113780012B (en) 2021-09-30 2021-09-30 Depression interview dialogue generating method based on pre-training language model

Country Status (1)

Country Link
CN (1) CN113780012B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114496221B (en) * 2022-01-17 2024-05-14 天津大学 Automatic depression diagnosis system based on closed-loop voice chain and deep learning
CN115714002B (en) * 2022-09-06 2023-08-11 湖南工商大学 Training method for depression risk detection model, depression symptom early warning method and related equipment
CN116259407B (en) * 2023-05-16 2023-07-25 季华实验室 Disease diagnosis method, device, equipment and medium based on multi-mode data
CN116992867B (en) * 2023-06-14 2024-01-23 合肥工业大学 Depression emotion detection method and system based on soft prompt theme modeling
CN116775911B (en) * 2023-08-22 2023-11-03 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model
CN116776105A (en) * 2023-08-22 2023-09-19 北京大学人民医院 Method and device for constructing wound data safety management system and electronic equipment
CN117574919B (en) * 2023-08-24 2024-05-17 华东师范大学 Stream question-answering template generation method based on large language model instruction fine tuning
CN117055845B (en) * 2023-10-13 2023-12-29 边无际(北京)科技有限公司 Internet of things intelligent application method and device based on large language model
CN117932041B (en) * 2024-03-21 2024-06-11 南京信息工程大学 Emotion support dialogue generation method, system and device based on thinking chain reasoning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019153522A1 (en) * 2018-02-09 2019-08-15 卫盈联信息技术(深圳)有限公司 Intelligent interaction method, electronic device, and storage medium
CN113139042A (en) * 2021-04-25 2021-07-20 内蒙古工业大学 Emotion controllable reply generation method using fine-tuning and reordering strategy

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111694937A (en) * 2020-04-26 2020-09-22 平安科技(深圳)有限公司 Interviewing method and device based on artificial intelligence, computer equipment and storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019153522A1 (en) * 2018-02-09 2019-08-15 卫盈联信息技术(深圳)有限公司 Intelligent interaction method, electronic device, and storage medium
CN113139042A (en) * 2021-04-25 2021-07-20 内蒙古工业大学 Emotion controllable reply generation method using fine-tuning and reordering strategy

Also Published As

Publication number Publication date
CN113780012A (en) 2021-12-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant