CN113780012A - Depression interview conversation generation method based on pre-training language model - Google Patents


Info

Publication number
CN113780012A
CN113780012A (application number CN202111165245.6A)
Authority
CN
China
Prior art keywords
user
question
language model
depression
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111165245.6A
Other languages
Chinese (zh)
Other versions
CN113780012B (en)
Inventor
周德宇
王博宇
张林海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202111165245.6A priority Critical patent/CN113780012B/en
Publication of CN113780012A publication Critical patent/CN113780012A/en
Application granted granted Critical
Publication of CN113780012B publication Critical patent/CN113780012B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a depression interview dialogue generation method based on a pre-trained language model, which comprises: collecting the user's basic information through fixed questions; drawing questions from a preset question bank according to a strategy and constructing a main question flow for questioning the user; classifying the emotion polarity of the user's replies and selecting corresponding response sentences to respond to the user's emotion; and generating a follow-up question related to the user's reply from the current question and that reply, using the fine-tuned pre-trained language model GPT-2. The system of the invention mainly comprises: a speech recognition and synthesis module, a preset question bank, a main question flow construction module, an emotion classification and response module, and a follow-up question generation module. Compared with traditional dialogue generation methods that rely entirely on fixed templates, the fine-tuned pre-trained language model can generate more flexible follow-up questions, enabling more effective depression diagnosis interviews and achieving better results.

Description

Depression interview conversation generation method based on pre-training language model
Technical Field
The invention relates to a method for generating dialogue using deep learning technology, and in particular to a method and system for generating depression interview dialogue based on a pre-trained language model.
Background
In the medical field, the application of dialogue systems to the diagnosis and assessment of mental health has a long history. In fact, the first chatbot in history, ELIZA, developed by Joseph Weizenbaum in the 1960s, was an automated dialogue system for mental health intervention.
The DSM-IV clinical interview is the principal clinical diagnostic tool for depression. Existing dialogue systems for medical diagnosis are mainly question-driven: the system continually poses questions to advance the whole conversation. However, most currently available mental health interview systems conduct semi-structured interviews with a fixed question set. Although fixed questions make the dialogue system easier to build, they greatly limit the system's flexibility. An ideal depression diagnosis dialogue system should be able to pose more relevant follow-up questions based on the user's replies. The invention therefore considers generating follow-up questions with text generation techniques from the field of deep learning.
Conventional text generation techniques include models such as GAN and Seq2Seq. These models have achieved good results on specific text generation tasks. However, when training data are scarce, their robustness is poor and they struggle to generate text close to the real distribution.
OpenAI proposed the GPT (Generative Pre-Training) model in 2018. GPT adopts a pre-training plus fine-tuning regime: it is first pre-trained on a large-scale corpus and then fine-tuned for a specific generation task. The basic structure of the GPT model is a stack of 12 Transformer decoder layers. When a sequence is fed to the input layer, identifiers are added at its beginning and end. If several sequences are input, they are concatenated with separator identifiers and processed as one sequence; most text can be input in this way. The pre-training plus fine-tuning regime reduces training cost, and good generation results can be obtained with only a small number of training samples. In 2019, OpenAI released an upgraded version of GPT, GPT-2. Currently, much text generation research is actively exploring pre-trained language models.
Disclosure of Invention
To solve these problems, the invention discloses a method and system for generating depression interview dialogue based on a pre-trained language model, which uses the pre-trained language model for text generation and thereby addresses the problems that existing systems use fixed question settings, cannot ask effective follow-up questions, and have little data available for training. The disclosed method mainly comprises the following steps:
(1) basic information collection: when a depression interview dialogue begins, fixed questions collecting basic information are presented to the user; the basic information comprises name, age and place of residence, and this information is extracted and recorded for subsequent steps;
(2) main question flow construction: questions are drawn from a preset question bank according to a set strategy to construct the main question flow of the depression interview dialogue; throughout the dialogue, the questions in the main question flow are output to the user in order, and when no questions remain in the main question flow and the corresponding follow-up steps have been completed, the depression interview dialogue ends;
(3) reply emotion classification and response: each time the user's reply to a question in the main question flow is received, a trained classifier classifies the emotion of the reply, and a response is made according to the classification result;
(4) follow-up question condition judgment: whether to generate a follow-up question is determined from the reply emotion classification result and the reply length obtained in step (3);
(5) follow-up question generation: when the judgment in step (4) satisfies the follow-up question generation condition, the fine-tuned pre-trained language model GPT-2 generates a follow-up question from the system's current question and the user's reply to it.
Further, the basic information collection in step (1) sets simple questions asking for the user's name, age and place of residence, and extracts the name, age and residence information from the user's replies; the name and age are extracted with regular expressions that match the entity representing the name and the number representing the age in the reply; the place of residence is extracted with a keyword dictionary method.
Further, the preset question bank in step (2) is derived from two sources: (a) the DAIC-WOZ dataset, from which the questions in the text data are extracted and counted, and the 50 most frequently occurring open-domain depression interview questions are finally selected, each carrying a topic label and an emotion-guidance label; (b) the PHQ-9 (Patient Health Questionnaire-9) questionnaire.
Further, the main question flow construction strategy in step (2) specifically comprises: (a) macroscopically dividing the main question flow into three phases, a start phase, an inquiry phase and an end phase, where the start phase draws from the DAIC-WOZ questions whose emotion-guidance label is Common, the inquiry phase uses the entire contents of the PHQ-9 questionnaire, the end phase draws from the DAIC-WOZ questions whose emotion-guidance label is Positive, and the questions of the start and end phases are drawn randomly in set numbers; (b) adjusting according to the topic labels of the questions: for the questions drawn in the start and end phases, the proportion of each topic label is computed, and questions under topics whose proportion exceeds the corresponding proportion in the preset question bank are replaced until the proportions of all topics meet the requirements; (c) taking into account the differences in topics of interest to users of different age groups and supplementing the main question flow according to the user's age information; (d) post-processing: replacing the placeholders of some questions according to the collected user information.
Further, the emotion classification and response method in step (3) specifically comprises: converting the word sequence of the dialogue text into a word vector sequence (w_0, w_1, w_2, …, w_n) using word2vec word vectors; feeding it into a trained bidirectional long short-term memory network and taking the hidden state h_n of the last time step as the semantic representation of the whole input; obtaining the emotion class probability distribution of the dialogue text through a fully-connected layer with a softmax activation function, the class with the maximum probability being the emotion classification result. If the emotion classification result is positive or negative, a corresponding response is randomly drawn from a preset emotion response library and output to the user.
Further, the judgment of the follow-up question generation condition in step (4) specifically comprises: if the reply emotion classification result is negative, it is directly decided not to generate a follow-up question; if the reply emotion classification result is neutral or positive and the reply length exceeds a manually set length threshold, a follow-up question is generated from the current question and the user's reply, and the user's answer to the follow-up question is received.
Further, the fine-tuning method for the pre-trained language model GPT-2 in step (5) specifically comprises: obtaining 4112 data samples of the form {q, r, f} from the Asynchronous Interview dataset and the empirical diagnostics dataset, where q denotes a question, r the reply to the question and f the follow-up question, and dividing all samples into a training set, a validation set and a test set; randomly drawing 300 samples from the training set and randomly replacing their f with a question unrelated to q and r, these 300 samples serving as interference samples with label 0 while the other normal samples are labeled 1; concatenating each sample {q, r, f} with special separator tokens to obtain an input sequence of the form <startoftext> q <speaker1> r <speaker2> f <endoftext>, denoted U = (u_0, u_1, u_2, …, u_N); for each input sequence, obtaining through the GPT-2 model a predicted sequence Û = (û_0, û_1, …, û_N) and a semantic representation h of the whole input sequence; computing the language-model loss L_LM over the predicted sequence with multi-class cross-entropy; performing binary classification of content relevance on the semantic representation h to obtain a classification result and computing the classification loss L_cls with binary cross-entropy; and optimizing the total loss L = L_LM + λ · L_cls with gradient descent to update the parameters of the pre-trained language model GPT-2.
Further, the follow-up question generation in step (5) specifically comprises: applying the fine-tuned GPT-2 model to generate follow-up questions; <startoftext> q <speaker1> r <speaker2> is taken as the initial input sequence, the fine-tuned GPT-2 model computes the probability distribution of the next word to be predicted, a Top-k sampling strategy is used for decoding, and the decoding result is appended to the end of the original sequence to form a new input sequence. The above process is repeated until the terminator <endoftext> is generated or the maximum sequence length is reached. The text generated in this process is the follow-up question, which is output to the user as the system's response to the user's reply.
Based on the above depression diagnosis dialogue generation method, the invention further provides a dialogue generation system for DSM-IV clinical fixed interviews based on a pre-trained language model, the system comprising a speech recognition and synthesis module, a basic information collection module, a main question flow preset question bank, a main question flow construction module, an emotion classification and response module, and a follow-up question generation module;
the basic information collection module presents the user with fixed questions for collecting user information, extracts the basic information from the user's replies, and records this information for the subsequent steps of the dialogue;
the main question flow preset question bank stores the questions used to construct the main question flow;
the main question flow construction module draws several questions from the main question flow preset question bank according to the set construction strategy to form the main question flow of one depression interview dialogue;
the emotion classification and response module classifies the emotion polarity of the user's replies and, for positive and negative classification results, responds to the user with preset emotion response sentences;
and the follow-up question generation module generates follow-up questions to ask the user from the current question and the user's reply, using the fine-tuned pre-trained language model GPT-2.
Compared with the prior art, the DSM-IV clinical interview oriented dialogue generation method and system based on a pre-trained language model have the following beneficial effects: the GPT-2 model is pre-trained on a large-scale corpus, and compared with traditional generation models only a small number of samples are needed for fine-tuning, so the pre-training plus fine-tuning regime reduces training cost; the fine-tuned pre-trained language model can automatically generate follow-up questions that are richer in expression and more relevant to the user's reply, enabling more effective follow-up questioning during the conversation; moreover, the system can perceive the user's emotion during the conversation and respond to it, making the whole conversation more humane.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram illustrating the structure and fine-tuning process of the pre-training language model GPT-2 according to the present invention.
Detailed Description
The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which are to be understood as merely illustrative of the invention and not as limiting the scope of the invention. It should be noted that the terms "front," "back," "left," "right," "upper" and "lower" used in the following description refer to directions in the drawings, and the terms "inner" and "outer" refer to directions toward and away from, respectively, the geometric center of a particular component.
The embodiment of the invention realizes a depression interview dialogue system based on the pre-trained language model GPT-2, whose structure is shown in FIG. 2. In the embodiment, the Asynchronous Interview dataset is used to fine-tune the GPT-2 model, and the training samples are further augmented with data extracted from the empirical diagnostics dataset. The total training data consist of 4112 samples, each of the form {q, r, f}. As described above, q denotes the initial question, r the reply to the question, and f the follow-up question. All samples are divided into training, validation and test sets at a ratio of 8:1:1. A small-scale English pre-trained GPT-2 language model is fine-tuned; the fine-tuning procedure is as follows:
1. Each sample {q, r, f} is concatenated with special separator tokens to obtain an input sequence of the form <startoftext> q <speaker1> r <speaker2> f <endoftext>. This sequence is denoted U = (u_0, u_1, u_2, …, u_N).
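For illustration, a minimal sketch of this splicing step (the helper name is an assumption, not code from the patent):

    # Concatenate a {q, r, f} sample with the special separator tokens described above.
    START, SPK1, SPK2, END = "<startoftext>", "<speaker1>", "<speaker2>", "<endoftext>"

    def splice(q: str, r: str, f: str) -> str:
        """Build the input sequence <startoftext> q <speaker1> r <speaker2> f <endoftext>."""
        return f"{START} {q} {SPK1} {r} {SPK2} {f} {END}"

    print(splice("how are you feeling today", "not great lately", "how long have you felt this way"))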
2. According to the GPT-2 English vocabulary, each token in the input sequence is mapped to its token id; on input to the model, each token id is converted to a one-hot vector. For example, the word "happy" is first mapped to token id 3772 and then converted to a one-hot vector of the form [0, 0, 0, …, 1, …, 0], in which only the value at position 3772 is 1 and all other positions are 0. The matrix composed of these one-hot vectors is multiplied by the embedding matrix W_E to obtain the word-embedding representation of the whole input sequence.
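A minimal sketch of this token-id-to-embedding mapping (an nn.Embedding lookup is mathematically the one-hot-times-W_E product described above; the vocabulary size 50257 is GPT-2's standard value and an assumption here):

    import torch

    vocab_size, d_model = 50257, 768                 # 768 matches this embodiment's embedding dimension
    W_E = torch.nn.Embedding(vocab_size, d_model)    # embedding matrix W_E

    token_ids = torch.tensor([[3772]])               # e.g. "happy" -> token id 3772
    word_embeddings = W_E(token_ids)                 # equivalent to one-hot(3772) @ W_E; shape (1, 1, 768)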
Position embeddings are computed at the same time, with the following formulas:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))

where pos denotes the position of the current token in the input sequence, i indexes the position within the word-embedding dimension, and d_model is the word-embedding dimension. The word-embedding sequence and the position-embedding sequence are added element-wise to obtain the representation sequence fed into the model, denoted U_E.
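A sketch of the sinusoidal position embedding above (the formulas match the standard Transformer position encoding implied by the variable definitions):

    import torch

    def position_embedding(max_len: int, d_model: int) -> torch.Tensor:
        """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))."""
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)   # (max_len, 1)
        i = torch.arange(0, d_model, 2, dtype=torch.float)            # even embedding dimensions
        angle = pos / torch.pow(10000.0, i / d_model)
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(angle)
        pe[:, 1::2] = torch.cos(angle)
        return pe

    # Added element-wise to the word-embedding sequence to obtain U_E.
    pe = position_embedding(1024, 768)   # max sequence length 1024, d_model 768 in this embodiment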
3. The GPT-2 model is formed by stacking 12 identical Transformer decoder modules, each of which contains layer normalization, a masked self-attention mechanism, residual connections and a fully-connected feed-forward network. The computation of a Transformer decoder module is:

Transformer(H) = FN(Norm(M_output(H))) + M_output(H)

where Norm denotes the layer normalization function, computed as

Norm(x) = γ · (x - μ) / sqrt(σ² + ε) + β

with μ and σ² the mean and variance of x, γ and β learned parameters, and ε a small constant for numerical stability. M_output denotes the result of the masked self-attention mechanism with a residual connection:

M_output(H) = M(Norm(H)) + H

where M denotes the masked self-attention mechanism.

The computation of the whole GPT-2 model can then be expressed as:

H_1 = Transformer(U_E)
H_2 = Transformer(H_1)
…
H_12 = Transformer(H_11)
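A compact sketch of this 12-layer stack with PyTorch primitives (a simplified stand-in for GPT-2's actual blocks; the head count of 12 is the usual GPT-2-small value and an assumption here):

    import torch
    import torch.nn as nn

    class DecoderBlock(nn.Module):
        """Pre-norm Transformer decoder block: masked self-attention plus feed-forward,
        each wrapped with layer normalization and a residual connection."""
        def __init__(self, d_model=768, n_heads=12):
            super().__init__()
            self.norm1 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm2 = nn.LayerNorm(d_model)
            self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                    nn.Linear(4 * d_model, d_model))

        def forward(self, h):
            T = h.size(1)
            mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)  # causal mask
            x = self.norm1(h)
            a, _ = self.attn(x, x, x, attn_mask=mask)
            m_out = a + h                               # M_output(H) = M(Norm(H)) + H
            return self.ff(self.norm2(m_out)) + m_out   # FN(Norm(M_output(H))) + M_output(H)

    blocks = nn.ModuleList(DecoderBlock() for _ in range(12))

    def gpt2_stack(U_E: torch.Tensor) -> torch.Tensor:
        h = U_E
        for block in blocks:                            # H_k = Transformer(H_{k-1})
            h = block(h)
        return h                                        # H_12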
4. H_12, computed by the GPT-2 model, contains the hidden-state representation of each token in the input sequence; a fully-connected layer with a softmax activation function yields the probability distribution of the next token at each position. The final predicted sequence is denoted Û = (û_0, û_1, …, û_N).
5. Design of the supervision task for fine-tuning. During fine-tuning, to improve the relevance between the model's generated content and the existing text, a binary classification task is constructed that judges the relevance between the generated follow-up question and the text. Concretely, 300 samples are randomly drawn from the training set and their f is randomly replaced with a question unrelated to q and r; these 300 samples serve as interference samples and are labeled 0, while the other normal samples are labeled 1. The last item of H_12 contains the semantic information of the whole sequence and is denoted h; it is fed into a classifier C for binary classification, computed as:

C(h) = softmax(h · W_C)

where W_C is a learned parameter matrix.

During fine-tuning, the loss function of the model consists of two parts. The first part is the language-model loss L_LM. Specifically, this embodiment uses multi-class cross-entropy as the language-model loss function, so L_LM can be written as:

L_LM = -Σ_{i=1}^{N} log P(u_i | u_0, …, u_{i-1})

where N is the total number of tokens predicted in the sample. The loss of the binary task that judges the relevance of the generated content is computed with binary cross-entropy and denoted L_cls. The total loss of the fine-tuning process is:

L = L_LM + λ · L_cls

where λ is a hyper-parameter controlling the weight, set to 0.2 in this embodiment.

During fine-tuning, the text embedding dimension is set to 768, the maximum sequence length to 1024 and the sample batch size to 4; the loss function is optimized with gradient descent to update the model parameters, using the Adam optimizer with an initial learning rate of 0.0015.
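For illustration, a sketch of this two-part loss with the stated λ = 0.2 (function and tensor names are assumptions; the logits and labels would come from the model's forward pass):

    import torch
    import torch.nn.functional as F

    def fine_tuning_loss(lm_logits, target_ids, cls_logits, cls_label, lam=0.2):
        """Total loss L = L_LM + lambda * L_cls (lambda = 0.2 in this embodiment).
        lm_logits:  (N, vocab_size) next-token predictions; target_ids: (N,)
        cls_logits: (2,) relevance-classifier output;       cls_label:  scalar 0 or 1
        """
        loss_lm = F.cross_entropy(lm_logits, target_ids)        # multi-class cross-entropy
        loss_cls = F.cross_entropy(cls_logits.unsqueeze(0),     # binary cross-entropy over 2 classes
                                   cls_label.unsqueeze(0))
        return loss_lm + lam * loss_cls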
The follow-up question generation module in the system provided by the invention applies the fine-tuned model to generate follow-up questions. In one generation pass, the current q and r are connected with the separator tokens to form the initial input sequence; after the fine-tuned GPT-2 model, the probability distribution of the next word to be predicted is obtained. Decoding uses a Top-k sampling strategy: the k words with the largest values in the predicted next-token probability distribution are selected as the candidates for the current decoding round, and one word is randomly drawn from these k candidates as the decoding result. Here k is a hyper-parameter, set to 5 in this embodiment. The decoding result is appended to the end of the original sequence to form a new input sequence. The above process is repeated until the terminator <endoftext> is generated or the maximum sequence length is reached.
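A sketch of this Top-k decoding loop using the Hugging Face GPT-2 interface (the stock "gpt2" checkpoint and tokenizer stand in for the fine-tuned model, whose vocabulary is assumed to contain the patent's special tokens; the eos token stands in for the <endoftext> terminator):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")            # stand-in for the fine-tuned checkpoint
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def generate_follow_up(q: str, r: str, k: int = 5, max_len: int = 1024) -> str:
        ids = tok.encode(f"<startoftext> {q} <speaker1> {r} <speaker2>", return_tensors="pt")
        end_id = tok.eos_token_id                          # stands in for <endoftext>
        while ids.size(1) < max_len:
            with torch.no_grad():
                logits = model(ids).logits[0, -1]          # next-token probability distribution
            topk = torch.topk(logits, k)                   # k most probable candidates
            next_id = topk.indices[torch.randint(k, (1,))] # draw one candidate at random
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
            if next_id.item() == end_id:
                break
        return tok.decode(ids[0])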
The embodiment of the invention discloses a dialogue generation method for DSM-IV clinical fixed interviews based on a pre-trained language model; a flow chart is shown in FIG. 1, and the method mainly comprises the following steps:
s1: and (5) collecting basic information. Simple questions inquiring about the three aspects of the name, age and place of residence of the user are set and are presented to the user at the beginning of the conversation; the method for extracting the name and the age in the user reply adopts a regular expression to extract the entity representing the name and the number representing the age in the user reply; the method for extracting the residence place in the user reply adopts a keyword dictionary method, and the residence place information of the user is extracted by comparing the user reply with the city name stored in the preset dictionary. And recording the collected information of name, age and residence, and providing the information for the subsequent steps.
S2: dialogue main question flow construction. The preset question bank used for constructing the main question flow comes from two sources: (1) open-domain questions related to depression interviews extracted from the DAIC-WOZ dataset; (2) the PHQ-9 (Patient Health Questionnaire-9) questionnaire.
The dialogues in the DAIC-WOZ dataset are semi-structured depression interviews using fixed questions. The questions in the text data of the dataset were extracted and counted, and the 50 most frequently occurring open-domain questions were finally selected as part of the preset question bank of the invention. Following the original DAIC-WOZ work, the 50 questions were labeled with a topic label and an emotion-guidance label. The topic labels comprise five categories: experience, education, work, family and self. The emotion-guidance labels are positive and common, where a question labeled positive is one that guides the patient toward positive emotion during the conversation. These questions are detailed in Table 1:
TABLE 1 Open-domain depression interview questions extracted from DAIC-WOZ
[Table 1 appears as images in the original publication; it lists the 50 open-domain questions together with their topic and emotion-guidance labels.]
The contents of the PHQ-9 questionnaire are shown in Table 2:
TABLE 2 PHQ-9 questionnaire contents
[Table 2 appears as images in the original publication; it lists the PHQ-9 questionnaire items.]
The strategy for constructing the dialogue main question flow is as follows:
(1) Macroscopically, the main question flow is divided into three phases: a start phase, an inquiry phase and an end phase. The start phase draws from the questions whose emotion-guidance label in Table 1 is Common, from which n_1 questions are randomly drawn; the inquiry phase uses the entire contents of the PHQ-9 questionnaire; the end phase draws from the questions whose emotion-guidance label in Table 1 is Positive, from which n_2 questions are randomly drawn. In this embodiment, n_1 is set to 7 and n_2 to 5.
(2) Adjustment according to topic labels. To avoid an excessive share of questions on any one topic, the proportion of each topic label is computed for the questions drawn in the start and end phases; if the proportion for some topic exceeds its corresponding proportion in the preset question bank, part of the drawn questions on that topic are randomly replaced. This process is repeated until the proportions for all topics are appropriate.
(3) Supplementation according to user age information. Considering the differences in topics of interest to users of different age groups, the main question flow is supplemented according to the age information collected in step S1: if the obtained age is 25 or less, 2 not-yet-drawn questions under the topic Education are added; if the age is over 25 and at most 35, 2 not-yet-drawn questions under the topic Work are added; if the age is over 35, two not-yet-drawn questions under the topic Family are added. The new questions are inserted into the corresponding phase according to their emotion-guidance labels.
(4) Post-processing. Placeholders in some questions are replaced according to the collected user information; for example, the place-of-residence placeholder in a question of the form "what are some things you like about [place of residence]" is filled with the user's recorded residence.
Through the above strategy, the main question flow of a depression interview dialogue is constructed (a sketch follows). During the whole dialogue, the questions in the main question flow are output to the user in order; when no questions remain in the main question flow and the corresponding follow-up steps have been completed, the depression interview dialogue ends.
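A sketch of this three-phase construction strategy (the question-bank structure, the iteration guard and the topic handling are illustrative assumptions):

    import random
    from collections import Counter

    def build_main_flow(bank, phq9, age, n1=7, n2=5, rng=random.Random(0)):
        """bank: list of dicts {"text", "topic", "emotion"}; phq9: list of question strings."""
        common = [q for q in bank if q["emotion"] == "Common"]
        positive = [q for q in bank if q["emotion"] == "Positive"]
        start, end = rng.sample(common, n1), rng.sample(positive, n2)

        # (2) re-draw while any topic is over-represented relative to the bank
        bank_share = {t: c / len(bank) for t, c in Counter(q["topic"] for q in bank).items()}
        def balanced(qs, pool):
            for _ in range(100):                              # guard for this sketch
                share = Counter(q["topic"] for q in qs)
                over = [q for q in qs
                        if share[q["topic"]] / len(qs) > bank_share.get(q["topic"], 1)]
                candidates = [p for p in pool if p not in qs]
                if not over or not candidates:
                    break
                qs[qs.index(rng.choice(over))] = rng.choice(candidates)
            return qs
        start, end = balanced(start, common), balanced(end, positive)

        # (3) age-based supplement: Education if <= 25, Work if <= 35, else Family
        topic = "education" if age <= 25 else "work" if age <= 35 else "family"
        extra = [q for q in bank if q["topic"] == topic and q not in start + end][:2]
        for q in extra:
            (end if q["emotion"] == "Positive" else start).append(q)

        return [q["text"] for q in start] + phq9 + [q["text"] for q in end]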
S3: judging the emotion polarity of the user's reply with the emotion classification module. The classifier in the emotion classification module is trained first: 6327 samples for training the classification model are extracted from the DailyDialog dataset, whose emotion labels fall into the classes positive, negative and none, marking positive-emotion, negative-emotion and emotionless samples respectively. These samples are used to train a classifier consisting of word2vec combined with a bidirectional long short-term memory network (BiLSTM). During the conversation, classification proceeds as follows: the word sequence of the dialogue text is converted with word2vec into a word vector sequence (w_0, w_1, w_2, …, w_n), each word vector having 200 dimensions, and fed into the BiLSTM; the hidden state h_n of the last time step is taken as the semantic representation of the whole input, and a fully-connected layer with a softmax activation function yields the emotion class probability distribution of the dialogue text, the class with the maximum probability being the emotion classification result. If the emotion classification result is positive or negative, a corresponding reply is randomly drawn from the preset emotion reply library and output to the user.
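A sketch of this word2vec + BiLSTM classifier (the hidden size and random input are illustrative; in the embodiment the inputs are 200-dimensional word2vec vectors from a separately trained model):

    import torch
    import torch.nn as nn

    class EmotionClassifier(nn.Module):
        """BiLSTM over 200-dim word2vec vectors; softmax over 3 classes
        (positive / negative / none)."""
        def __init__(self, emb_dim=200, hidden=128, n_classes=3):
            super().__init__()
            self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.fc = nn.Linear(2 * hidden, n_classes)

        def forward(self, word_vectors):            # (batch, seq_len, 200)
            out, _ = self.bilstm(word_vectors)
            h_n = out[:, -1, :]                     # hidden state of the last time step
            return torch.softmax(self.fc(h_n), dim=-1)

    clf = EmotionClassifier()
    probs = clf(torch.randn(1, 12, 200))            # emotion class probability distribution
    label = ["positive", "negative", "none"][probs.argmax(-1).item()]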
S4: judging the follow-up question generation condition. If the reply emotion classification result is negative, it is directly decided not to generate a follow-up question; if the reply emotion classification result is neutral or positive and the reply length exceeds a manually set length threshold, a follow-up question is generated from the current question and the user's reply, and the user's answer to the follow-up question is received. In the embodiment of the invention, the length threshold is set to 25.
S5: follow-up question generation. The fine-tuned GPT-2 model is applied to generate follow-up questions. First, the user's reply r is preprocessed to remove redundant spaces and meaningless filler words produced by speech recognition. The current q and r are connected with the separator tokens to obtain <startoftext> q <speaker1> r <speaker2> as the initial input sequence, and after computation by the fine-tuned GPT-2 model (the specific computation is the GPT-2 procedure described above), the probability distribution of the next word to be predicted is obtained. Decoding uses the Top-k sampling strategy, and the decoding result is appended to the end of the original sequence to form a new input sequence. The above process is repeated until the terminator <endoftext> is generated or the maximum sequence length is reached. The text generated in this process is the follow-up question, which is output to the user as the system's response to the user's reply.
The technical means disclosed by the scheme of the invention are not limited to those disclosed in the above embodiments, but also include technical schemes formed by any combination of the above technical features.

Claims (9)

1. A depression interview dialogue generation method based on a pre-trained language model, characterized in that the dialogue generation method comprises the following steps:
step (1), basic information collection: when a depression interview dialogue begins, fixed questions collecting basic information are presented to the user, the basic information comprising name, age and place of residence, and this information is extracted and recorded for subsequent steps;
step (2), main question flow construction: questions are drawn from a preset question bank according to a set strategy to construct the main question flow of the depression interview dialogue; throughout the dialogue, the questions in the main question flow are output to the user in order, and when no questions remain in the main question flow and the corresponding follow-up steps have been completed, the depression interview dialogue ends;
step (3), reply emotion classification and response: each time the user's reply to a question in the main question flow is received, a trained classifier classifies the emotion of the reply and a response is made according to the classification result;
step (4), follow-up question condition judgment: whether to generate a follow-up question is determined from the reply emotion classification result and the reply length of step (3);
step (5), follow-up question generation: when the judgment of step (4) satisfies the follow-up question generation condition, the fine-tuned pre-trained language model GPT-2 generates a follow-up question from the system's current question and the user's reply to it.
2. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the fine-tuning method for the pre-trained language model GPT-2 in step (5) comprises the following steps:
(51) obtaining 4112 data samples of the form {q, r, f} from the Asynchronous Interview dataset and the empirical diagnostics dataset, where q denotes a question, r the reply to the question and f the follow-up question, and dividing all samples into a training set, a validation set and a test set;
(52) randomly drawing 300 samples from the training set and randomly replacing their f with a question unrelated to q and r; these 300 samples serve as interference samples with label 0, and the other normal samples are labeled 1;
(53) concatenating each sample {q, r, f} with special separator tokens to obtain an input sequence of the form <startoftext> q <speaker1> r <speaker2> f <endoftext>, denoted U = (u_0, u_1, u_2, …, u_N);
(54) for each input sequence, obtaining through the GPT-2 model a predicted sequence Û = (û_0, û_1, …, û_N) and a semantic representation h of the whole input sequence;
(55) computing the language-model loss L_LM over the predicted sequence with multi-class cross-entropy;
(56) performing binary classification of content relevance on the semantic representation h of each sample to obtain a classification result, and computing the classification loss L_cls with binary cross-entropy;
(57) optimizing the total loss L = L_LM + λ · L_cls with gradient descent and updating the parameters of the pre-trained language model GPT-2.
3. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the basic information collection in step (1) sets simple questions asking for the user's name, age and place of residence and extracts the name, age and residence information from the user's replies; the name and age are extracted with regular expressions that match the entity representing the name and the number representing the age in the reply, and the place of residence is extracted with a keyword dictionary method.
4. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the preset question bank in step (2) comes from two sources: (a) the DAIC-WOZ dataset, from which the questions in the text data are extracted and counted and the 50 most frequently occurring open-domain depression interview questions are finally selected, each carrying a topic label and an emotion-guidance label; (b) the PHQ-9 (Patient Health Questionnaire-9) questionnaire.
5. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the main question flow construction strategy in step (2) comprises the following:
(21) macroscopically dividing the main question flow into three phases, a start phase, an inquiry phase and an end phase, where the start phase draws from the open-domain questions whose emotion-guidance label is Common, the inquiry phase uses the entire contents of the PHQ-9 questionnaire, the end phase draws from the open-domain questions whose emotion-guidance label is Positive, and the questions of the start and end phases are drawn randomly in set numbers;
(22) adjusting according to the topic labels of the questions: for the questions drawn in the start and end phases, the proportion of each topic label is computed, and questions under topics whose proportion exceeds the corresponding proportion in the preset question bank are replaced until the proportions of all topics meet the requirements;
(23) taking into account the differences in topics of interest to users of different age groups and supplementing the main question flow according to the user's age information;
(24) post-processing: replacing the placeholders in some questions according to the collected user information.
6. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the emotion classification and response method in step (3) comprises: converting the word sequence of the dialogue text into a word vector sequence (w_0, w_1, w_2, …, w_n) with word2vec word vectors; feeding it into a trained bidirectional long short-term memory network and taking the hidden state h_n of the last time step as the semantic representation of the whole input; obtaining the emotion class probability distribution of the dialogue text through a fully-connected layer with a softmax activation function, the class with the maximum probability being the emotion classification result; and, if the emotion classification result is positive or negative, randomly drawing a corresponding response from a preset emotion response library and outputting it to the user.
7. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the judgment of the follow-up question generation condition in step (4) comprises: if the reply emotion classification result is negative, directly deciding not to generate a follow-up question; if the reply emotion classification result is neutral or positive and the reply length exceeds a manually set length threshold, generating a follow-up question from the current question and the user's reply and receiving the user's answer to the follow-up question.
8. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the follow-up question generation in step (5) comprises: applying the fine-tuned GPT-2 model to generate follow-up questions, taking <startoftext> q <speaker1> r <speaker2> as the initial input sequence, obtaining the probability distribution of the next word to be predicted after computation by the fine-tuned GPT-2 model, decoding with a Top-k sampling strategy and appending the decoding result to the end of the original sequence to form a new input sequence; repeating the above process until the terminator <endoftext> is generated or the maximum sequence length is reached; the text generated in this process being the follow-up question, output to the user as the system's response to the user's reply.
9. A depression interview dialogue generation system based on a pre-trained language model, characterized in that the system comprises the following modules:
a speech recognition and synthesis module, which converts the user's speech input into text form through speech recognition for information processing by the subsequent modules of the dialogue system and, correspondingly, broadcasts the system's output questions to the user in speech form after speech synthesis;
a basic information collection module, which presents the user with fixed questions for collecting user information, extracts the basic information from the user's replies and records it for the subsequent steps of the dialogue;
a main question flow preset question bank, which stores the questions used to construct the main question flow;
a main question flow construction module, which draws several questions from the main question flow preset question bank according to the set construction strategy to form the main question flow of one depression interview dialogue;
an emotion classification and response module, which classifies the emotion polarity of the user's replies and responds to the user with preset emotion response sentences for positive and negative classification results;
and a follow-up question generation module, which generates follow-up questions to ask the user from the current question and the user's reply, using the fine-tuned pre-trained language model GPT-2.
CN202111165245.6A 2021-09-30 2021-09-30 Depression interview dialogue generating method based on pre-training language model Active CN113780012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111165245.6A CN113780012B (en) 2021-09-30 2021-09-30 Depression interview dialogue generating method based on pre-training language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111165245.6A CN113780012B (en) 2021-09-30 2021-09-30 Depression interview dialogue generating method based on pre-training language model

Publications (2)

Publication Number Publication Date
CN113780012A true CN113780012A (en) 2021-12-10
CN113780012B CN113780012B (en) 2023-12-29

Family

ID=78855182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111165245.6A Active CN113780012B (en) 2021-09-30 2021-09-30 Depression interview dialogue generating method based on pre-training language model

Country Status (1)

Country Link
CN (1) CN113780012B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114496221A (en) * 2022-01-17 2022-05-13 天津大学 Depression automatic diagnosis system based on closed-loop voice chain and deep learning
CN115714002A (en) * 2022-09-06 2023-02-24 湖南工商大学 Depression risk detection model training method, depression state early warning method and related equipment
CN116259407A (en) * 2023-05-16 2023-06-13 季华实验室 Disease diagnosis method, device, equipment and medium based on multi-mode data
CN116775911A (en) * 2023-08-22 2023-09-19 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model
CN116776105A (en) * 2023-08-22 2023-09-19 北京大学人民医院 Method and device for constructing wound data safety management system and electronic equipment
CN116992867A (en) * 2023-06-14 2023-11-03 合肥工业大学 Depression emotion detection method and system based on soft prompt theme modeling
CN117055845A (en) * 2023-10-13 2023-11-14 边无际(北京)科技有限公司 Internet of things intelligent application method and device based on large language model
CN117574919A (en) * 2023-08-24 2024-02-20 华东师范大学 Stream question-answering template generation method based on large language model instruction fine tuning
CN117932041A (en) * 2024-03-21 2024-04-26 南京信息工程大学 Emotion support dialogue generation method, system and device based on thinking chain reasoning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019153522A1 (en) * 2018-02-09 2019-08-15 卫盈联信息技术(深圳)有限公司 Intelligent interaction method, electronic device, and storage medium
CN113139042A (en) * 2021-04-25 2021-07-20 内蒙古工业大学 Emotion controllable reply generation method using fine-tuning and reordering strategy
WO2021218029A1 (en) * 2020-04-26 2021-11-04 平安科技(深圳)有限公司 Artificial intelligence-based interview method and apparatus, computer device, and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019153522A1 (en) * 2018-02-09 2019-08-15 卫盈联信息技术(深圳)有限公司 Intelligent interaction method, electronic device, and storage medium
WO2021218029A1 (en) * 2020-04-26 2021-11-04 平安科技(深圳)有限公司 Artificial intelligence-based interview method and apparatus, computer device, and storage medium
CN113139042A (en) * 2021-04-25 2021-07-20 内蒙古工业大学 Emotion controllable reply generation method using fine-tuning and reordering strategy

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114496221A (en) * 2022-01-17 2022-05-13 天津大学 Depression automatic diagnosis system based on closed-loop voice chain and deep learning
CN114496221B (en) * 2022-01-17 2024-05-14 天津大学 Automatic depression diagnosis system based on closed-loop voice chain and deep learning
CN115714002A (en) * 2022-09-06 2023-02-24 湖南工商大学 Depression risk detection model training method, depression state early warning method and related equipment
CN115714002B (en) * 2022-09-06 2023-08-11 湖南工商大学 Training method for depression risk detection model, depression symptom early warning method and related equipment
CN116259407A (en) * 2023-05-16 2023-06-13 季华实验室 Disease diagnosis method, device, equipment and medium based on multi-mode data
CN116259407B (en) * 2023-05-16 2023-07-25 季华实验室 Disease diagnosis method, device, equipment and medium based on multi-mode data
CN116992867B (en) * 2023-06-14 2024-01-23 合肥工业大学 Depression emotion detection method and system based on soft prompt theme modeling
CN116992867A (en) * 2023-06-14 2023-11-03 合肥工业大学 Depression emotion detection method and system based on soft prompt theme modeling
CN116775911B (en) * 2023-08-22 2023-11-03 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model
CN116776105A (en) * 2023-08-22 2023-09-19 北京大学人民医院 Method and device for constructing wound data safety management system and electronic equipment
CN116775911A (en) * 2023-08-22 2023-09-19 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model
CN117574919A (en) * 2023-08-24 2024-02-20 华东师范大学 Stream question-answering template generation method based on large language model instruction fine tuning
CN117574919B (en) * 2023-08-24 2024-05-17 华东师范大学 Stream question-answering template generation method based on large language model instruction fine tuning
CN117055845A (en) * 2023-10-13 2023-11-14 边无际(北京)科技有限公司 Internet of things intelligent application method and device based on large language model
CN117055845B (en) * 2023-10-13 2023-12-29 边无际(北京)科技有限公司 Internet of things intelligent application method and device based on large language model
CN117932041A (en) * 2024-03-21 2024-04-26 南京信息工程大学 Emotion support dialogue generation method, system and device based on thinking chain reasoning

Also Published As

Publication number Publication date
CN113780012B (en) 2023-12-29

Similar Documents

Publication Publication Date Title
CN113780012B (en) Depression interview dialogue generating method based on pre-training language model
CN108597541B (en) Speech emotion recognition method and system for enhancing anger and happiness recognition
Bharathi et al. Findings of the shared task on Speech Recognition for Vulnerable Individuals in Tamil
CN109977413A (en) A kind of sentiment analysis method based on improvement CNN-LDA
Kheddar et al. Deep transfer learning for automatic speech recognition: Towards better generalization
CN111666381A (en) Task type question-answer interaction system oriented to intelligent control
CN112417894A (en) Conversation intention identification method and system based on multi-task learning
KR20200105057A (en) Apparatus and method for extracting inquiry features for alalysis of inquery sentence
Kshirsagar et al. A review on application of deep learning in natural language processing
CN111984780A (en) Multi-intention recognition model training method, multi-intention recognition method and related device
CN114203177A (en) Intelligent voice question-answering method and system based on deep learning and emotion recognition
CN115935975A (en) Controllable-emotion news comment generation method
CN111966824A (en) Text emotion recognition method based on emotion similarity attention mechanism
Antit et al. TunRoBERTa: a Tunisian robustly optimized BERT approach model for sentiment analysis
Zhao et al. Knowledge-aware bayesian co-attention for multimodal emotion recognition
CN114003700A (en) Method and system for processing session information, electronic device and storage medium
CN114297342A (en) Legal document generation method and system based on reading understanding and intention recognition model
Bhangdia et al. Speech emotion recognition and sentiment analysis based therapist bot
Daouad et al. An automatic speech recognition system for isolated Amazigh word using 1D & 2D CNN-LSTM architecture
Dhiaf et al. DocNER: A deep learning system for named entity recognition in handwritten document images
Hore et al. Code-switched end-to-end Marathi speech recognition for especially abled people
Andra et al. Contextual keyword spotting in lecture video with deep convolutional neural network
Bhagchandani et al. A hybrid solution to abstractive multi-document summarization using supervised and unsupervised learning
Bharathi et al. Overview of the third shared task on speech recognition for vulnerable individuals in tamil
Zajíc et al. First insight into the processing of the language consulting center data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant