CN113780012A - Depression interview conversation generation method based on pre-training language model - Google Patents


Info

Publication number
CN113780012A
CN113780012A (application number CN202111165245.6A)
Authority
CN
China
Prior art keywords
user
question
language model
depression
emotion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111165245.6A
Other languages
Chinese (zh)
Other versions
CN113780012B (en)
Inventor
周德宇
王博宇
张林海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202111165245.6A priority Critical patent/CN113780012B/en
Publication of CN113780012A publication Critical patent/CN113780012A/en
Application granted granted Critical
Publication of CN113780012B publication Critical patent/CN113780012B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G06F40/35 Discourse or dialogue representation
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a depression interview dialogue generation method based on a pre-trained language model, which comprises: collecting the user's basic information through fixed questions; drawing questions from a preset question bank according to a strategy and constructing a main question flow for questioning the user; classifying the emotion polarity of the user's replies and selecting corresponding response sentences to respond to the user's emotion; and generating a follow-up question related to the user's reply from the current question and that reply, using the fine-tuned pre-trained language model GPT-2. The system of the invention mainly comprises: a speech recognition and synthesis module, a preset question bank, a main question flow construction module, an emotion classification and response module, and a follow-up question generation module. Compared with traditional dialogue generation methods that rely entirely on fixed templates, the fine-tuned pre-trained language model can generate more flexible follow-up questions, enabling more effective depression diagnosis interviews and achieving better results.

Description

Depression interview conversation generation method based on pre-training language model
Technical Field
The invention relates to a method for generating dialogue using deep learning technology, and in particular to a method and system for generating depression interview dialogue based on a pre-trained language model.
Background
In the medical field, the application of dialogue systems to the diagnosis and assessment of mental health has a long history. In fact, the first chatbot in history, ELIZA, developed by Joseph Weizenbaum in the 1960s, was an automated dialogue system for mental health intervention.
The DSM-IV clinical interview is the principal clinical diagnostic tool for depression. Existing dialogue systems for medical diagnosis are mainly question-driven: the system continually poses questions to advance the whole conversation. However, most currently available mental health interview systems conduct semi-structured interviews with a fixed question set. Although fixed questions make the dialogue system easier to build, they greatly limit the system's flexibility. An ideal depression diagnosis dialogue system should be able to pose more relevant follow-up questions based on the user's replies. The invention therefore considers generating follow-up questions with text generation techniques from the field of deep learning.
Conventional text generation techniques include models such as GAN and Seq2Seq. These models have achieved good results on specific text generation tasks. However, when training data are scarce, their robustness is poor and they struggle to generate text close to the real distribution.
OpenAI proposed the GPT (Generative Pre-Training) model in 2018. GPT adopts a pre-training plus fine-tuning regime: it is first pre-trained on a large-scale corpus and then fine-tuned for a specific generation task. The basic structure of the GPT model is a stack of 12 Transformer decoder layers. When a sequence is fed to the input layer, identifiers are added at its beginning and end. If several sequences are input, they are concatenated with separator identifiers and processed as one sequence; most text can be input in this way. The pre-training plus fine-tuning regime reduces training cost, and good generation results can be obtained with only a small number of training samples. In 2019, OpenAI released an upgraded version of GPT, GPT-2. Currently, much text generation research is actively exploring pre-trained language models.
Disclosure of Invention
To solve these problems, the invention discloses a method and system for generating depression interview dialogue based on a pre-trained language model, which uses the pre-trained language model for text generation and thereby addresses the problems that existing systems use fixed question settings, cannot ask effective follow-up questions, and have little data available for training. The disclosed method mainly comprises the following steps:
(1) basic information collection: when a depression interview dialogue begins, fixed questions collecting basic information are presented to the user; the basic information comprises name, age and place of residence, and this information is extracted and recorded for subsequent steps;
(2) main question flow construction: questions are drawn from a preset question bank according to a set strategy to construct the main question flow of the depression interview dialogue; throughout the dialogue, the questions in the main question flow are output to the user in order, and when no questions remain in the main question flow and the corresponding follow-up steps have been completed, the depression interview dialogue ends;
(3) reply emotion classification and response: each time the user's reply to a question in the main question flow is received, a trained classifier classifies the emotion of the reply, and a response is made according to the classification result;
(4) follow-up question condition judgment: whether to generate a follow-up question is determined from the reply emotion classification result and the reply length obtained in step (3);
(5) follow-up question generation: when the judgment in step (4) satisfies the follow-up question generation condition, the fine-tuned pre-trained language model GPT-2 generates a follow-up question from the system's current question and the user's reply to it.
Further, the basic information collection in step (1) sets simple questions asking for the user's name, age and place of residence, and extracts the name, age and residence information from the user's replies; the name and age are extracted with regular expressions that match the entity representing the name and the number representing the age in the reply; the place of residence is extracted with a keyword dictionary method.
Further, the preset question bank in step (2) is derived from two sources: (a) the DAIC-WOZ dataset, from which the questions in the text data are extracted and counted, and the 50 most frequently occurring open-domain depression interview questions are finally selected, each carrying a topic label and an emotion-guidance label; (b) the PHQ-9 (Patient Health Questionnaire-9) questionnaire.
Further, the main question flow construction strategy in step (2) specifically comprises: (a) macroscopically dividing the main question flow into three phases, a start phase, an inquiry phase and an end phase, where the start phase draws from the DAIC-WOZ questions whose emotion-guidance label is Common, the inquiry phase uses the entire contents of the PHQ-9 questionnaire, the end phase draws from the DAIC-WOZ questions whose emotion-guidance label is Positive, and the questions of the start and end phases are drawn randomly in set numbers; (b) adjusting according to the topic labels of the questions: for the questions drawn in the start and end phases, the proportion of each topic label is computed, and questions under topics whose proportion exceeds the corresponding proportion in the preset question bank are replaced until the proportions of all topics meet the requirements; (c) taking into account the differences in topics of interest to users of different age groups and supplementing the main question flow according to the user's age information; (d) post-processing: replacing the placeholders of some questions according to the collected user information.
Further, the emotion classification and response method in step (3) specifically comprises: converting the word sequence of the dialogue text into a word vector sequence (w_0, w_1, w_2, …, w_n) using word2vec word vectors; feeding it into a trained bidirectional long short-term memory network and taking the hidden state h_n of the last time step as the semantic representation of the whole input; obtaining the emotion class probability distribution of the dialogue text through a fully-connected layer with a softmax activation function, the class with the maximum probability being the emotion classification result. If the emotion classification result is positive or negative, a corresponding response is randomly drawn from a preset emotion response library and output to the user.
Further, the judgment of the follow-up question generation condition in step (4) specifically comprises: if the reply emotion classification result is negative, it is directly decided not to generate a follow-up question; if the reply emotion classification result is neutral or positive and the reply length exceeds a manually set length threshold, a follow-up question is generated from the current question and the user's reply, and the user's answer to the follow-up question is received.
Further, the fine-tuning method for the pre-trained language model GPT-2 in step (5) specifically comprises: obtaining 4112 data samples of the form {q, r, f} from the Asynchronous Interview dataset and the empirical diagnostics dataset, where q denotes a question, r the reply to the question and f the follow-up question, and dividing all samples into a training set, a validation set and a test set; randomly drawing 300 samples from the training set and randomly replacing their f with a question unrelated to q and r, these 300 samples serving as interference samples with label 0 while the other normal samples are labeled 1; concatenating each sample {q, r, f} with special separator tokens to obtain an input sequence of the form <startoftext> q <speaker1> r <speaker2> f <endoftext>, denoted U = (u_0, u_1, u_2, …, u_N); for each input sequence, obtaining through the GPT-2 model a predicted sequence Û = (û_0, û_1, …, û_N) and a semantic representation h of the whole input sequence; computing the language-model loss L_LM over the predicted sequence with multi-class cross-entropy; performing binary classification of content relevance on the semantic representation h to obtain a classification result and computing the classification loss L_cls with binary cross-entropy; and optimizing the total loss L = L_LM + λ · L_cls with gradient descent to update the parameters of the pre-trained language model GPT-2.
Further, the follow-up question generation in step (5) specifically comprises: applying the fine-tuned GPT-2 model to generate follow-up questions; <startoftext> q <speaker1> r <speaker2> is taken as the initial input sequence, the fine-tuned GPT-2 model computes the probability distribution of the next word to be predicted, a Top-k sampling strategy is used for decoding, and the decoding result is appended to the end of the original sequence to form a new input sequence. The above process is repeated until the terminator <endoftext> is generated or the maximum sequence length is reached. The text generated in this process is the follow-up question, which is output to the user as the system's response to the user's reply.
Based on the above depression diagnosis dialogue generation method, the invention further provides a dialogue generation system for DSM-IV clinical fixed interviews based on a pre-trained language model, the system comprising a speech recognition and synthesis module, a basic information collection module, a main question flow preset question bank, a main question flow construction module, an emotion classification and response module, and a follow-up question generation module;
the basic information collection module presents the user with fixed questions for collecting user information, extracts the basic information from the user's replies, and records this information for the subsequent steps of the dialogue;
the main question flow preset question bank stores the questions used to construct the main question flow;
the main question flow construction module draws several questions from the main question flow preset question bank according to the set construction strategy to form the main question flow of one depression interview dialogue;
the emotion classification and response module classifies the emotion polarity of the user's replies and, for positive and negative classification results, responds to the user with preset emotion response sentences;
and the follow-up question generation module generates follow-up questions to ask the user from the current question and the user's reply, using the fine-tuned pre-trained language model GPT-2.
Compared with the prior art, the DSM-IV clinical interview oriented dialogue generation method and system based on a pre-trained language model have the following beneficial effects: the GPT-2 model is pre-trained on a large-scale corpus, and compared with traditional generation models only a small number of samples are needed for fine-tuning, so the pre-training plus fine-tuning regime reduces training cost; the fine-tuned pre-trained language model can automatically generate follow-up questions that are richer in expression and more relevant to the user's reply, enabling more effective follow-up questioning during the conversation; moreover, the system can perceive the user's emotion during the conversation and respond to it, making the whole conversation more humane.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
FIG. 2 is a diagram illustrating the structure and fine-tuning process of the pre-training language model GPT-2 according to the present invention.
Detailed Description
The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which are to be understood as merely illustrative of the invention and not as limiting the scope of the invention. It should be noted that the terms "front," "back," "left," "right," "upper" and "lower" used in the following description refer to directions in the drawings, and the terms "inner" and "outer" refer to directions toward and away from, respectively, the geometric center of a particular component.
The embodiment of the invention realizes a depression interview dialogue system based on the pre-trained language model GPT-2, whose structure is shown in FIG. 2. In the embodiment, the Asynchronous Interview dataset is used to fine-tune the GPT-2 model, and the training samples are further augmented with data extracted from the empirical diagnostics dataset. The total training data consist of 4112 samples, each of the form {q, r, f}. As described above, q denotes the initial question, r the reply to the question, and f the follow-up question. All samples are divided into training, validation and test sets at a ratio of 8:1:1. A small-scale English pre-trained GPT-2 language model is fine-tuned; the fine-tuning procedure is as follows:
1. Each sample {q, r, f} is concatenated with special separator tokens to obtain an input sequence of the form <startoftext> q <speaker1> r <speaker2> f <endoftext>. This sequence is denoted U = (u_0, u_1, u_2, …, u_N).
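For illustration, a minimal sketch of this splicing step (the helper name is an assumption, not code from the patent):

    # Concatenate a {q, r, f} sample with the special separator tokens described above.
    START, SPK1, SPK2, END = "<startoftext>", "<speaker1>", "<speaker2>", "<endoftext>"

    def splice(q: str, r: str, f: str) -> str:
        """Build the input sequence <startoftext> q <speaker1> r <speaker2> f <endoftext>."""
        return f"{START} {q} {SPK1} {r} {SPK2} {f} {END}"

    print(splice("how are you feeling today", "not great lately", "how long have you felt this way"))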
2. According to the GPT-2 English vocabulary, each token in the input sequence is mapped to its token id; on input to the model, each token id is converted to a one-hot vector. For example, the word "happy" is first mapped to token id 3772 and then converted to a one-hot vector of the form [0, 0, 0, …, 1, …, 0], in which only the value at position 3772 is 1 and all other positions are 0. The matrix composed of these one-hot vectors is multiplied by the embedding matrix W_E to obtain the word-embedding representation of the whole input sequence.
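A minimal sketch of this token-id-to-embedding mapping (an nn.Embedding lookup is mathematically the one-hot-times-W_E product described above; the vocabulary size 50257 is GPT-2's standard value and an assumption here):

    import torch

    vocab_size, d_model = 50257, 768                 # 768 matches this embodiment's embedding dimension
    W_E = torch.nn.Embedding(vocab_size, d_model)    # embedding matrix W_E

    token_ids = torch.tensor([[3772]])               # e.g. "happy" -> token id 3772
    word_embeddings = W_E(token_ids)                 # equivalent to one-hot(3772) @ W_E; shape (1, 1, 768)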
Position embeddings are computed at the same time, with the following formulas:

PE(pos, 2i) = sin(pos / 10000^(2i/d_model))
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))

where pos denotes the position of the current token in the input sequence, i indexes the position within the word-embedding dimension, and d_model is the word-embedding dimension. The word-embedding sequence and the position-embedding sequence are added element-wise to obtain the representation sequence fed into the model, denoted U_E.
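A sketch of the sinusoidal position embedding above (the formulas match the standard Transformer position encoding implied by the variable definitions):

    import torch

    def position_embedding(max_len: int, d_model: int) -> torch.Tensor:
        """PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))."""
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)   # (max_len, 1)
        i = torch.arange(0, d_model, 2, dtype=torch.float)            # even embedding dimensions
        angle = pos / torch.pow(10000.0, i / d_model)
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(angle)
        pe[:, 1::2] = torch.cos(angle)
        return pe

    # Added element-wise to the word-embedding sequence to obtain U_E.
    pe = position_embedding(1024, 768)   # max sequence length 1024, d_model 768 in this embodiment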
3. The GPT-2 model is formed by stacking 12 identical Transformer decoder modules, each of which contains layer normalization, a masked self-attention mechanism, residual connections and a fully-connected feed-forward network. The computation of a Transformer decoder module is:

Transformer(H) = FN(Norm(M_output(H))) + M_output(H)

where Norm denotes the layer normalization function, computed as

Norm(x) = γ · (x - μ) / sqrt(σ² + ε) + β

with μ and σ² the mean and variance of x, γ and β learned parameters, and ε a small constant for numerical stability. M_output denotes the result of the masked self-attention mechanism with a residual connection:

M_output(H) = M(Norm(H)) + H

where M denotes the masked self-attention mechanism.

The computation of the whole GPT-2 model can then be expressed as:

H_1 = Transformer(U_E)
H_2 = Transformer(H_1)
…
H_12 = Transformer(H_11)
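A compact sketch of this 12-layer stack with PyTorch primitives (a simplified stand-in for GPT-2's actual blocks; the head count of 12 is the usual GPT-2-small value and an assumption here):

    import torch
    import torch.nn as nn

    class DecoderBlock(nn.Module):
        """Pre-norm Transformer decoder block: masked self-attention plus feed-forward,
        each wrapped with layer normalization and a residual connection."""
        def __init__(self, d_model=768, n_heads=12):
            super().__init__()
            self.norm1 = nn.LayerNorm(d_model)
            self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.norm2 = nn.LayerNorm(d_model)
            self.ff = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                    nn.Linear(4 * d_model, d_model))

        def forward(self, h):
            T = h.size(1)
            mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)  # causal mask
            x = self.norm1(h)
            a, _ = self.attn(x, x, x, attn_mask=mask)
            m_out = a + h                               # M_output(H) = M(Norm(H)) + H
            return self.ff(self.norm2(m_out)) + m_out   # FN(Norm(M_output(H))) + M_output(H)

    blocks = nn.ModuleList(DecoderBlock() for _ in range(12))

    def gpt2_stack(U_E: torch.Tensor) -> torch.Tensor:
        h = U_E
        for block in blocks:                            # H_k = Transformer(H_{k-1})
            h = block(h)
        return h                                        # H_12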
4. H_12, computed by the GPT-2 model, contains the hidden-state representation of each token in the input sequence; a fully-connected layer with a softmax activation function yields the probability distribution of the next token at each position. The final predicted sequence is denoted Û = (û_0, û_1, …, û_N).
5. Design of the supervision task for fine-tuning. During fine-tuning, to improve the relevance between the model's generated content and the existing text, a binary classification task is constructed that judges the relevance between the generated follow-up question and the text. Concretely, 300 samples are randomly drawn from the training set and their f is randomly replaced with a question unrelated to q and r; these 300 samples serve as interference samples and are labeled 0, while the other normal samples are labeled 1. The last item of H_12 contains the semantic information of the whole sequence and is denoted h; it is fed into a classifier C for binary classification, computed as:

C(h) = softmax(h · W_C)

where W_C is a learned parameter matrix.

During fine-tuning, the loss function of the model consists of two parts. The first part is the language-model loss L_LM. Specifically, this embodiment uses multi-class cross-entropy as the language-model loss function, so L_LM can be written as:

L_LM = -Σ_{i=1}^{N} log P(u_i | u_0, …, u_{i-1})

where N is the total number of tokens predicted in the sample. The loss of the binary task that judges the relevance of the generated content is computed with binary cross-entropy and denoted L_cls. The total loss of the fine-tuning process is:

L = L_LM + λ · L_cls

where λ is a hyper-parameter controlling the weight, set to 0.2 in this embodiment.

During fine-tuning, the text embedding dimension is set to 768, the maximum sequence length to 1024 and the sample batch size to 4; the loss function is optimized with gradient descent to update the model parameters, using the Adam optimizer with an initial learning rate of 0.0015.
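For illustration, a sketch of this two-part loss with the stated λ = 0.2 (function and tensor names are assumptions; the logits and labels would come from the model's forward pass):

    import torch
    import torch.nn.functional as F

    def fine_tuning_loss(lm_logits, target_ids, cls_logits, cls_label, lam=0.2):
        """Total loss L = L_LM + lambda * L_cls (lambda = 0.2 in this embodiment).
        lm_logits:  (N, vocab_size) next-token predictions; target_ids: (N,)
        cls_logits: (2,) relevance-classifier output;       cls_label:  scalar 0 or 1
        """
        loss_lm = F.cross_entropy(lm_logits, target_ids)        # multi-class cross-entropy
        loss_cls = F.cross_entropy(cls_logits.unsqueeze(0),     # binary cross-entropy over 2 classes
                                   cls_label.unsqueeze(0))
        return loss_lm + lam * loss_cls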
The follow-up question generation module in the system provided by the invention applies the fine-tuned model to generate follow-up questions. In one generation pass, the current q and r are connected with the separator tokens to form the initial input sequence; after the fine-tuned GPT-2 model, the probability distribution of the next word to be predicted is obtained. Decoding uses a Top-k sampling strategy: the k words with the largest values in the predicted next-token probability distribution are selected as the candidates for the current decoding round, and one word is randomly drawn from these k candidates as the decoding result. Here k is a hyper-parameter, set to 5 in this embodiment. The decoding result is appended to the end of the original sequence to form a new input sequence. The above process is repeated until the terminator <endoftext> is generated or the maximum sequence length is reached.
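A sketch of this Top-k decoding loop using the Hugging Face GPT-2 interface (the stock "gpt2" checkpoint and tokenizer stand in for the fine-tuned model, whose vocabulary is assumed to contain the patent's special tokens; the eos token stands in for the <endoftext> terminator):

    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tok = GPT2Tokenizer.from_pretrained("gpt2")            # stand-in for the fine-tuned checkpoint
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def generate_follow_up(q: str, r: str, k: int = 5, max_len: int = 1024) -> str:
        ids = tok.encode(f"<startoftext> {q} <speaker1> {r} <speaker2>", return_tensors="pt")
        end_id = tok.eos_token_id                          # stands in for <endoftext>
        while ids.size(1) < max_len:
            with torch.no_grad():
                logits = model(ids).logits[0, -1]          # next-token probability distribution
            topk = torch.topk(logits, k)                   # k most probable candidates
            next_id = topk.indices[torch.randint(k, (1,))] # draw one candidate at random
            ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
            if next_id.item() == end_id:
                break
        return tok.decode(ids[0])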
The embodiment of the invention discloses a dialogue generation method for DSM-IV clinical fixed interviews based on a pre-trained language model; a flow chart is shown in FIG. 1, and the method mainly comprises the following steps:
s1: and (5) collecting basic information. Simple questions inquiring about the three aspects of the name, age and place of residence of the user are set and are presented to the user at the beginning of the conversation; the method for extracting the name and the age in the user reply adopts a regular expression to extract the entity representing the name and the number representing the age in the user reply; the method for extracting the residence place in the user reply adopts a keyword dictionary method, and the residence place information of the user is extracted by comparing the user reply with the city name stored in the preset dictionary. And recording the collected information of name, age and residence, and providing the information for the subsequent steps.
S2: dialogue main question flow construction. The preset question bank used for constructing the main question flow comes from two sources: (1) open-domain questions related to depression interviews extracted from the DAIC-WOZ dataset; (2) the PHQ-9 (Patient Health Questionnaire-9) questionnaire.
The dialogues in the DAIC-WOZ dataset are semi-structured depression interviews using fixed questions. The questions in the text data of the dataset were extracted and counted, and the 50 most frequently occurring open-domain questions were finally selected as part of the preset question bank of the invention. Following the original DAIC-WOZ work, the 50 questions were labeled with a topic label and an emotion-guidance label. The topic labels comprise five categories: experience, education, work, family and self. The emotion-guidance labels are positive and common, where a question labeled positive is one that guides the patient toward positive emotion during the conversation. These questions are detailed in Table 1:
TABLE 1 Open-domain depression interview questions extracted from DAIC-WOZ
[Table 1 appears as images in the original publication; it lists the 50 open-domain questions together with their topic and emotion-guidance labels.]
The contents of the PHQ-9 questionnaire are shown in Table 2:
TABLE 2 PHQ-9 questionnaire contents
[Table 2 appears as images in the original publication; it lists the PHQ-9 questionnaire items.]
The strategy for constructing the dialogue main question flow is as follows:
(1) Macroscopically, the main question flow is divided into three phases: a start phase, an inquiry phase and an end phase. The start phase draws from the questions whose emotion-guidance label in Table 1 is Common, from which n_1 questions are randomly drawn; the inquiry phase uses the entire contents of the PHQ-9 questionnaire; the end phase draws from the questions whose emotion-guidance label in Table 1 is Positive, from which n_2 questions are randomly drawn. In this embodiment, n_1 is set to 7 and n_2 to 5.
(2) Adjustment according to topic labels. To avoid an excessive share of questions on any one topic, the proportion of each topic label is computed for the questions drawn in the start and end phases; if the proportion for some topic exceeds its corresponding proportion in the preset question bank, part of the drawn questions on that topic are randomly replaced. This process is repeated until the proportions for all topics are appropriate.
(3) Supplementation according to user age information. Considering the differences in topics of interest to users of different age groups, the main question flow is supplemented according to the age information collected in step S1: if the obtained age is 25 or less, 2 not-yet-drawn questions under the topic Education are added; if the age is over 25 and at most 35, 2 not-yet-drawn questions under the topic Work are added; if the age is over 35, two not-yet-drawn questions under the topic Family are added. The new questions are inserted into the corresponding phase according to their emotion-guidance labels.
(4) Post-processing. Placeholders in some questions are replaced according to the collected user information; for example, the place-of-residence placeholder in a question of the form "what are some things you like about [place of residence]" is filled with the user's recorded residence.
Through the above strategy, the main question flow of a depression interview dialogue is constructed (a sketch follows). During the whole dialogue, the questions in the main question flow are output to the user in order; when no questions remain in the main question flow and the corresponding follow-up steps have been completed, the depression interview dialogue ends.
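A sketch of this three-phase construction strategy (the question-bank structure, the iteration guard and the topic handling are illustrative assumptions):

    import random
    from collections import Counter

    def build_main_flow(bank, phq9, age, n1=7, n2=5, rng=random.Random(0)):
        """bank: list of dicts {"text", "topic", "emotion"}; phq9: list of question strings."""
        common = [q for q in bank if q["emotion"] == "Common"]
        positive = [q for q in bank if q["emotion"] == "Positive"]
        start, end = rng.sample(common, n1), rng.sample(positive, n2)

        # (2) re-draw while any topic is over-represented relative to the bank
        bank_share = {t: c / len(bank) for t, c in Counter(q["topic"] for q in bank).items()}
        def balanced(qs, pool):
            for _ in range(100):                              # guard for this sketch
                share = Counter(q["topic"] for q in qs)
                over = [q for q in qs
                        if share[q["topic"]] / len(qs) > bank_share.get(q["topic"], 1)]
                candidates = [p for p in pool if p not in qs]
                if not over or not candidates:
                    break
                qs[qs.index(rng.choice(over))] = rng.choice(candidates)
            return qs
        start, end = balanced(start, common), balanced(end, positive)

        # (3) age-based supplement: Education if <= 25, Work if <= 35, else Family
        topic = "education" if age <= 25 else "work" if age <= 35 else "family"
        extra = [q for q in bank if q["topic"] == topic and q not in start + end][:2]
        for q in extra:
            (end if q["emotion"] == "Positive" else start).append(q)

        return [q["text"] for q in start] + phq9 + [q["text"] for q in end]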
S3: judging the emotion polarity of the user's reply with the emotion classification module. The classifier in the emotion classification module is trained first: 6327 samples for training the classification model are extracted from the DailyDialog dataset, whose emotion labels fall into the classes positive, negative and none, marking positive-emotion, negative-emotion and emotionless samples respectively. These samples are used to train a classifier consisting of word2vec combined with a bidirectional long short-term memory network (BiLSTM). During the conversation, classification proceeds as follows: the word sequence of the dialogue text is converted with word2vec into a word vector sequence (w_0, w_1, w_2, …, w_n), each word vector having 200 dimensions, and fed into the BiLSTM; the hidden state h_n of the last time step is taken as the semantic representation of the whole input, and a fully-connected layer with a softmax activation function yields the emotion class probability distribution of the dialogue text, the class with the maximum probability being the emotion classification result. If the emotion classification result is positive or negative, a corresponding reply is randomly drawn from the preset emotion reply library and output to the user.
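A sketch of this word2vec + BiLSTM classifier (the hidden size and random input are illustrative; in the embodiment the inputs are 200-dimensional word2vec vectors from a separately trained model):

    import torch
    import torch.nn as nn

    class EmotionClassifier(nn.Module):
        """BiLSTM over 200-dim word2vec vectors; softmax over 3 classes
        (positive / negative / none)."""
        def __init__(self, emb_dim=200, hidden=128, n_classes=3):
            super().__init__()
            self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.fc = nn.Linear(2 * hidden, n_classes)

        def forward(self, word_vectors):            # (batch, seq_len, 200)
            out, _ = self.bilstm(word_vectors)
            h_n = out[:, -1, :]                     # hidden state of the last time step
            return torch.softmax(self.fc(h_n), dim=-1)

    clf = EmotionClassifier()
    probs = clf(torch.randn(1, 12, 200))            # emotion class probability distribution
    label = ["positive", "negative", "none"][probs.argmax(-1).item()]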
S4: judging the follow-up question generation condition. If the reply emotion classification result is negative, it is directly decided not to generate a follow-up question; if the reply emotion classification result is neutral or positive and the reply length exceeds a manually set length threshold, a follow-up question is generated from the current question and the user's reply, and the user's answer to the follow-up question is received. In the embodiment of the invention, the length threshold is set to 25.
S5: follow-up question generation. The fine-tuned GPT-2 model is applied to generate follow-up questions. First, the user's reply r is preprocessed to remove redundant spaces and meaningless filler words produced by speech recognition. The current q and r are connected with the separator tokens to obtain <startoftext> q <speaker1> r <speaker2> as the initial input sequence, and after computation by the fine-tuned GPT-2 model (the specific computation is the GPT-2 procedure described above), the probability distribution of the next word to be predicted is obtained. Decoding uses the Top-k sampling strategy, and the decoding result is appended to the end of the original sequence to form a new input sequence. The above process is repeated until the terminator <endoftext> is generated or the maximum sequence length is reached. The text generated in this process is the follow-up question, which is output to the user as the system's response to the user's reply.
The technical means disclosed by the scheme of the invention are not limited to those disclosed in the above embodiments, but also include technical schemes formed by any combination of the above technical features.

Claims (9)

1. A depression interview dialogue generation method based on a pre-trained language model, characterized in that the dialogue generation method comprises the following steps:
step (1), basic information collection: when a depression interview dialogue begins, fixed questions collecting basic information are presented to the user, the basic information comprising name, age and place of residence, and this information is extracted and recorded for subsequent steps;
step (2), main question flow construction: questions are drawn from a preset question bank according to a set strategy to construct the main question flow of the depression interview dialogue; throughout the dialogue, the questions in the main question flow are output to the user in order, and when no questions remain in the main question flow and the corresponding follow-up steps have been completed, the depression interview dialogue ends;
step (3), reply emotion classification and response: each time the user's reply to a question in the main question flow is received, a trained classifier classifies the emotion of the reply and a response is made according to the classification result;
step (4), follow-up question condition judgment: whether to generate a follow-up question is determined from the reply emotion classification result and the reply length of step (3);
step (5), follow-up question generation: when the judgment of step (4) satisfies the follow-up question generation condition, the fine-tuned pre-trained language model GPT-2 generates a follow-up question from the system's current question and the user's reply to it.
2. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the fine-tuning method for the pre-trained language model GPT-2 in step (5) comprises the following steps:
(51) obtaining 4112 data samples of the form {q, r, f} from the Asynchronous Interview dataset and the empirical diagnostics dataset, where q denotes a question, r the reply to the question and f the follow-up question, and dividing all samples into a training set, a validation set and a test set;
(52) randomly drawing 300 samples from the training set and randomly replacing their f with a question unrelated to q and r; these 300 samples serve as interference samples with label 0, and the other normal samples are labeled 1;
(53) concatenating each sample {q, r, f} with special separator tokens to obtain an input sequence of the form <startoftext> q <speaker1> r <speaker2> f <endoftext>, denoted U = (u_0, u_1, u_2, …, u_N);
(54) for each input sequence, obtaining through the GPT-2 model a predicted sequence Û = (û_0, û_1, …, û_N) and a semantic representation h of the whole input sequence;
(55) computing the language-model loss L_LM over the predicted sequence with multi-class cross-entropy;
(56) performing binary classification of content relevance on the semantic representation h of each sample to obtain a classification result, and computing the classification loss L_cls with binary cross-entropy;
(57) optimizing the total loss L = L_LM + λ · L_cls with gradient descent and updating the parameters of the pre-trained language model GPT-2.
3. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the basic information collection in step (1) sets simple questions asking for the user's name, age and place of residence and extracts the name, age and residence information from the user's replies; the name and age are extracted with regular expressions that match the entity representing the name and the number representing the age in the reply, and the place of residence is extracted with a keyword dictionary method.
4. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the preset question bank in step (2) comes from two sources: (a) the DAIC-WOZ dataset, from which the questions in the text data are extracted and counted and the 50 most frequently occurring open-domain depression interview questions are finally selected, each carrying a topic label and an emotion-guidance label; (b) the PHQ-9 (Patient Health Questionnaire-9) questionnaire.
5. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the main question flow construction strategy in step (2) comprises the following:
(21) macroscopically dividing the main question flow into three phases, a start phase, an inquiry phase and an end phase, where the start phase draws from the open-domain questions whose emotion-guidance label is Common, the inquiry phase uses the entire contents of the PHQ-9 questionnaire, the end phase draws from the open-domain questions whose emotion-guidance label is Positive, and the questions of the start and end phases are drawn randomly in set numbers;
(22) adjusting according to the topic labels of the questions: for the questions drawn in the start and end phases, the proportion of each topic label is computed, and questions under topics whose proportion exceeds the corresponding proportion in the preset question bank are replaced until the proportions of all topics meet the requirements;
(23) taking into account the differences in topics of interest to users of different age groups and supplementing the main question flow according to the user's age information;
(24) post-processing: replacing the placeholders in some questions according to the collected user information.
6. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the emotion classification and response method in step (3) comprises: converting the word sequence of the dialogue text into a word vector sequence (w_0, w_1, w_2, …, w_n) with word2vec word vectors; feeding it into a trained bidirectional long short-term memory network and taking the hidden state h_n of the last time step as the semantic representation of the whole input; obtaining the emotion class probability distribution of the dialogue text through a fully-connected layer with a softmax activation function, the class with the maximum probability being the emotion classification result; and, if the emotion classification result is positive or negative, randomly drawing a corresponding response from a preset emotion response library and outputting it to the user.
7. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the judgment of the follow-up question generation condition in step (4) comprises: if the reply emotion classification result is negative, directly deciding not to generate a follow-up question; if the reply emotion classification result is neutral or positive and the reply length exceeds a manually set length threshold, generating a follow-up question from the current question and the user's reply and receiving the user's answer to the follow-up question.
8. The depression interview dialogue generation method based on a pre-trained language model of claim 1, wherein the follow-up question generation in step (5) comprises: applying the fine-tuned GPT-2 model to generate follow-up questions, taking <startoftext> q <speaker1> r <speaker2> as the initial input sequence, obtaining the probability distribution of the next word to be predicted after computation by the fine-tuned GPT-2 model, decoding with a Top-k sampling strategy and appending the decoding result to the end of the original sequence to form a new input sequence; repeating the above process until the terminator <endoftext> is generated or the maximum sequence length is reached; the text generated in this process being the follow-up question, output to the user as the system's response to the user's reply.
9. A depression interview dialogue generation system based on a pre-trained language model, characterized in that the system comprises the following modules:
a speech recognition and synthesis module, which converts the user's speech input into text form through speech recognition for information processing by the subsequent modules of the dialogue system and, correspondingly, broadcasts the system's output questions to the user in speech form after speech synthesis;
a basic information collection module, which presents the user with fixed questions for collecting user information, extracts the basic information from the user's replies and records it for the subsequent steps of the dialogue;
a main question flow preset question bank, which stores the questions used to construct the main question flow;
a main question flow construction module, which draws several questions from the main question flow preset question bank according to the set construction strategy to form the main question flow of one depression interview dialogue;
an emotion classification and response module, which classifies the emotion polarity of the user's replies and responds to the user with preset emotion response sentences for positive and negative classification results;
and a follow-up question generation module, which generates follow-up questions to ask the user from the current question and the user's reply, using the fine-tuned pre-trained language model GPT-2.
CN202111165245.6A 2021-09-30 2021-09-30 Depression interview dialogue generating method based on pre-training language model Active CN113780012B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111165245.6A CN113780012B (en) 2021-09-30 2021-09-30 Depression interview dialogue generating method based on pre-training language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111165245.6A CN113780012B (en) 2021-09-30 2021-09-30 Depression interview dialogue generating method based on pre-training language model

Publications (2)

Publication Number Publication Date
CN113780012A true CN113780012A (en) 2021-12-10
CN113780012B CN113780012B (en) 2023-12-29

Family

ID=78855182

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111165245.6A Active CN113780012B (en) 2021-09-30 2021-09-30 Depression interview dialogue generating method based on pre-training language model

Country Status (1)

Country Link
CN (1) CN113780012B (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114496221A (en) * 2022-01-17 2022-05-13 天津大学 Depression automatic diagnosis system based on closed-loop voice chain and deep learning
CN115714002A (en) * 2022-09-06 2023-02-24 湖南工商大学 Depression risk detection model training method, depression state early warning method and related equipment
CN116259407A (en) * 2023-05-16 2023-06-13 季华实验室 Disease diagnosis method, device, equipment and medium based on multi-mode data
CN116775911A (en) * 2023-08-22 2023-09-19 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model
CN116776105A (en) * 2023-08-22 2023-09-19 北京大学人民医院 Method and device for constructing wound data safety management system and electronic equipment
CN116992867A (en) * 2023-06-14 2023-11-03 合肥工业大学 Depression emotion detection method and system based on soft prompt theme modeling
CN117055845A (en) * 2023-10-13 2023-11-14 边无际(北京)科技有限公司 Internet of things intelligent application method and device based on large language model
CN117574919A (en) * 2023-08-24 2024-02-20 华东师范大学 Stream question-answering template generation method based on large language model instruction fine tuning
CN117932041A (en) * 2024-03-21 2024-04-26 南京信息工程大学 Emotion support dialogue generation method, system and device based on thinking chain reasoning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019153522A1 (en) * 2018-02-09 2019-08-15 卫盈联信息技术(深圳)有限公司 Intelligent interaction method, electronic device, and storage medium
CN113139042A (en) * 2021-04-25 2021-07-20 内蒙古工业大学 Emotion controllable reply generation method using fine-tuning and reordering strategy
WO2021218029A1 (en) * 2020-04-26 2021-11-04 平安科技(深圳)有限公司 Artificial intelligence-based interview method and apparatus, computer device, and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019153522A1 (en) * 2018-02-09 2019-08-15 卫盈联信息技术(深圳)有限公司 Intelligent interaction method, electronic device, and storage medium
WO2021218029A1 (en) * 2020-04-26 2021-11-04 平安科技(深圳)有限公司 Artificial intelligence-based interview method and apparatus, computer device, and storage medium
CN113139042A (en) * 2021-04-25 2021-07-20 内蒙古工业大学 Emotion controllable reply generation method using fine-tuning and reordering strategy

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114496221A (en) * 2022-01-17 2022-05-13 天津大学 Depression automatic diagnosis system based on closed-loop voice chain and deep learning
CN114496221B (en) * 2022-01-17 2024-05-14 天津大学 Automatic depression diagnosis system based on closed-loop voice chain and deep learning
CN115714002A (en) * 2022-09-06 2023-02-24 湖南工商大学 Depression risk detection model training method, depression state early warning method and related equipment
CN115714002B (en) * 2022-09-06 2023-08-11 湖南工商大学 Training method for depression risk detection model, depression symptom early warning method and related equipment
CN116259407A (en) * 2023-05-16 2023-06-13 季华实验室 Disease diagnosis method, device, equipment and medium based on multi-mode data
CN116259407B (en) * 2023-05-16 2023-07-25 季华实验室 Disease diagnosis method, device, equipment and medium based on multi-mode data
CN116992867B (en) * 2023-06-14 2024-01-23 合肥工业大学 Depression emotion detection method and system based on soft prompt theme modeling
CN116992867A (en) * 2023-06-14 2023-11-03 合肥工业大学 Depression emotion detection method and system based on soft prompt theme modeling
CN116775911B (en) * 2023-08-22 2023-11-03 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model
CN116776105A (en) * 2023-08-22 2023-09-19 北京大学人民医院 Method and device for constructing wound data safety management system and electronic equipment
CN116775911A (en) * 2023-08-22 2023-09-19 北京六元空间信息科技有限责任公司 Medical queue follow-up dialogue assisting method and system based on questionnaire and large model
CN117574919A (en) * 2023-08-24 2024-02-20 华东师范大学 Stream question-answering template generation method based on large language model instruction fine tuning
CN117574919B (en) * 2023-08-24 2024-05-17 华东师范大学 Stream question-answering template generation method based on large language model instruction fine tuning
CN117055845A (en) * 2023-10-13 2023-11-14 边无际(北京)科技有限公司 Internet of things intelligent application method and device based on large language model
CN117055845B (en) * 2023-10-13 2023-12-29 边无际(北京)科技有限公司 Internet of things intelligent application method and device based on large language model
CN117932041A (en) * 2024-03-21 2024-04-26 南京信息工程大学 Emotion support dialogue generation method, system and device based on thinking chain reasoning

Also Published As

Publication number Publication date
CN113780012B (en) 2023-12-29

Similar Documents

Publication Publication Date Title
CN113780012B (en) Depression interview dialogue generating method based on pre-training language model
CN108597541B (en) Speech emotion recognition method and system for enhancing anger and happiness recognition
Bharathi et al. Findings of the shared task on Speech Recognition for Vulnerable Individuals in Tamil
CN109977413A (en) A kind of sentiment analysis method based on improvement CNN-LDA
Kheddar et al. Deep transfer learning for automatic speech recognition: Towards better generalization
CN111666381A (en) Task type question-answer interaction system oriented to intelligent control
CN112417894A (en) Conversation intention identification method and system based on multi-task learning
KR20200105057A (en) Apparatus and method for extracting inquiry features for alalysis of inquery sentence
Kshirsagar et al. A review on application of deep learning in natural language processing
CN111984780A (en) Multi-intention recognition model training method, multi-intention recognition method and related device
CN114203177A (en) Intelligent voice question-answering method and system based on deep learning and emotion recognition
CN115935975A (en) Controllable-emotion news comment generation method
CN111966824A (en) Text emotion recognition method based on emotion similarity attention mechanism
Antit et al. TunRoBERTa: a Tunisian robustly optimized BERT approach model for sentiment analysis
Zhao et al. Knowledge-aware bayesian co-attention for multimodal emotion recognition
CN114003700A (en) Method and system for processing session information, electronic device and storage medium
CN114297342A (en) Legal document generation method and system based on reading understanding and intention recognition model
Bhangdia et al. Speech emotion recognition and sentiment analysis based therapist bot
Daouad et al. An automatic speech recognition system for isolated Amazigh word using 1D & 2D CNN-LSTM architecture
Dhiaf et al. DocNER: A deep learning system for named entity recognition in handwritten document images
Hore et al. Code-switched end-to-end Marathi speech recognition for especially abled people
Andra et al. Contextual keyword spotting in lecture video with deep convolutional neural network
Bhagchandani et al. A hybrid solution to abstractive multi-document summarization using supervised and unsupervised learning
Bharathi et al. Overview of the third shared task on speech recognition for vulnerable individuals in tamil
Zajíc et al. First insight into the processing of the language consulting center data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant