CN114999610A - Deep learning-based emotion perception and support dialog system construction method - Google Patents

Deep learning-based emotion perception and support dialog system construction method Download PDF

Info

Publication number
CN114999610A
Authority
CN
China
Prior art keywords
conversation
reply
user
psychological
sep
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210332004.4A
Other languages
Chinese (zh)
Inventor
杨燕
谭振东
孙宇翔
张雨时
陈妍
贺樑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University
Priority to CN202210332004.4A
Publication of CN114999610A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H 20/70 ICT specially adapted for therapies or health-improving plans relating to mental therapies, e.g. psychological therapy or autogenous training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Child & Adolescent Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Developmental Disabilities (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Psychology (AREA)
  • Social Psychology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a deep-learning-based method for constructing an emotion perception and support dialogue system comprising a dialogue strategy selection module, a dialogue generation module, a psychological state cognition module, and a suggestion recommendation module. The dialogue strategy selection module controls the flow of the conversation and selects a dialogue strategy in real time from the user's dialogue history; the dialogue generation module feeds the user's dialogue history and the current dialogue strategy into a decoder, which generates the reply sentence; the psychological state cognition module classifies the user's current psychological cognition state from the dialogue history; and the suggestion recommendation module retrieves a knowledge base to produce suggestion replies according to the user's psychological cognition state and the dialogue history. By introducing an adaptive dialogue strategy selection module, the system builds multiple dialogue strategies, can select a strategy according to context, guides the model to generate empathetic reply content, and provides more effective emotional support for the user.

Description

Deep learning-based emotion perception and support dialog system construction method
Technical Field
The invention relates to the technical field of dialogue and recommendation systems, and in particular to a deep-learning-based method for constructing an emotion perception and support dialogue system, which determines the user's psychological cognition condition from the dialogue history, selects a suitable dialogue strategy to generate more empathetic replies, and recommends the most suitable answer from a question-answer suggestion knowledge base to feed back to the user.
Background
Dialogue systems receive a great deal of attention from industry and academia because of their high commercial value, and have advanced further in recent years thanks to the massive growth of network data and the development of deep learning.
Against the background of a global pandemic, the incidence of psychological conditions such as anxiety and depression has risen above 2019 levels in recent years. However, psychological counseling is expensive and professional counselors are in short supply, so many people cannot receive timely treatment for psychological conditions. Combining artificial intelligence with psychology can, to a certain extent, ease the tension between limited mental health resources and the essentially unlimited needs of patients. Artificial intelligence research has made great progress in deep learning, natural language processing, emotion recognition, computer vision, and related areas, and has been widely applied in dialogue systems, particularly intelligent question answering and intelligent customer service. Meanwhile, psychological counseling systems built on artificial intelligence have gradually entered the public eye, and experiments have shown that an AI-assisted counseling robot can perceive the user's psychological state during interaction and relieve the user's psychological pressure to a certain extent.
Disclosure of Invention
The invention aims to provide a deep-learning-based method for constructing an emotion perception and support dialogue system, addressing the problem that the emotional support process of existing dialogue systems is fixed.
The specific technical scheme for realizing the purpose of the invention is as follows:
A method for constructing a deep-learning-based emotion perception and support dialogue system comprises the following steps:
1) Establishing the dialogue strategy selection module
The dialogue strategy selection module dynamically selects a dialogue strategy from the dialogue context history. Training its strategy selection model requires a dialogue data set in which each record contains a multi-turn dialogue history, is annotated with the user's psychological cognition state type (13 types in total, such as 'lingering symptoms', 'occupational troubles', 'school overlong', 'heavy drinking', 'depression', 'academic pressure', 'appearance anxiety', 'sleep problems', 'family conflicts', 'friendship troubles', 'love troubles', 'family troubles', and 'growth troubles'), and marks the dialogue strategy used in the reply of each turn. The dialogue system defines six dialogue strategies in total: 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', 'giving advice', and 'other';
For the dialogue data set, let the given dialogue context history be $\{u_1, r_1, u_2, r_2, \ldots, u_m, r_m\}$, where $i$ denotes the dialogue turn, $u_i$ denotes the user utterance in the $i$-th turn, and $r_i$ denotes the system reply in the $i$-th turn; the dialogue strategies $\{S_1, S_2, \ldots, S_m\}$ and the user's psychological cognition state $O$ are given. Given the current dialogue turn $i$, the two most recent turns are taken as the user's current dialogue history, $U = \{u_{i-2}, r_{i-2}, u_{i-1}, r_{i-1}, u_i\}$; when fewer than two previous turns exist, the utterances from the 1st turn up to the current turn are used;
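For illustration, a single annotated record in such a dialogue data set could be organized as follows (a hypothetical sketch in Python; the field names are placeholders, not the actual annotation schema of the data set):

```python
# Hypothetical example of one annotated record in the dialogue data set.
# Field names are illustrative placeholders, not the patent's actual schema.
example_record = {
    "state": "academic pressure",   # user psychological cognition state type O (1 of 13)
    "turns": [
        {"user": "I can't sleep before every exam.",                              # u_1
         "reply": "That sounds exhausting. How long has this been going on?",     # r_1
         "strategy": "questioning"},                                              # S_1 (1 of 6)
        {"user": "Since the start of this semester.",                             # u_2
         "reply": "It is understandable to feel this much pressure.",             # r_2
         "strategy": "affirmation and comfort"},                                  # S_2
    ],
}
```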
The current dialogue history $U$ is concatenated in chronological order, with a separator tag [SEP] between different utterances, a tag [CLS] prepended to the concatenated sequence, and a final [SEP] appended, giving the concatenated sequence $U_{cat} = [\mathrm{CLS}]\,u_{i-2}\,[\mathrm{SEP}]\,r_{i-2}\,[\mathrm{SEP}]\,u_{i-1}\,[\mathrm{SEP}]\,r_{i-1}\,[\mathrm{SEP}]\,u_i\,[\mathrm{SEP}]$. Taking $U_{cat}$ as input, a pre-trained Transformer Encoder model encodes $U_{cat}$ to obtain the encoded sequence $U_{cat}^{Enc}$ output by the Transformer Encoder model. The specific process is:
$U_E = \mathrm{Embedding}(U_{cat})$
$U_{cat}^{Enc} = \mathrm{Encoder}(U_E)$
$\mathrm{Encoder}(U_E) = \mathrm{Add\&Norm}(\mathrm{MultiHead}(U_E, U_E, U_E))$
$\mathrm{Add\&Norm}(x) = \mathrm{LayerNorm}[\mathrm{FFN}(x) + x]$
$\mathrm{FFN}(x) = \mathrm{Relu}(xW_1 + b_1)W_2 + b_2$
$\mathrm{MultiHead}(Q, K, V) = \mathrm{LayerNorm}(\mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^O + Q)$
$\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V$
The Embedding function converts the input sequence into a continuous embedding matrix; $K^T$ denotes the transpose of the matrix $K$; $d_k$ is a constant whose value depends on the number of columns of the matrix $Q$ at computation time. $W_i^Q$, $W_i^K$, $W_i^V$ are learnable parameter matrices of the Attention modules, $h$ of each kind, where $h$ is the number of Attention modules; the subscript $i$ of $W_i^Q$, $W_i^K$, $W_i^V$ indicates the $i$-th Attention module, and the superscripts $Q$, $K$, $V$ indicate the corresponding matrix. $W^O$ is a learnable parameter matrix of the MultiHead module, and LayerNorm denotes the layer normalization function. $W_1$, $b_1$, $W_2$, $b_2$ denote the learnable parameters of the two-layer feed-forward neural network (FFN), with Relu as the activation function;
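The formulas above follow the standard Transformer encoder; the following is a minimal PyTorch sketch of the scaled dot-product attention and the multi-head combination with the residual-plus-LayerNorm wrapping shown in the MultiHead formula (hyperparameters such as d_model and h are illustrative assumptions):

```python
import math
import torch
import torch.nn as nn

def attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    return torch.softmax(scores, dim=-1) @ V

class MultiHead(nn.Module):
    """Multi-head attention with the LayerNorm(Concat(...) W^O + Q) wrapping used above."""
    def __init__(self, d_model=768, h=12):
        super().__init__()
        self.h, self.d_head = h, d_model // h
        self.W_q = nn.Linear(d_model, d_model)   # stacks W_i^Q for all h heads
        self.W_k = nn.Linear(d_model, d_model)   # stacks W_i^K
        self.W_v = nn.Linear(d_model, d_model)   # stacks W_i^V
        self.W_o = nn.Linear(d_model, d_model)   # W^O
        self.norm = nn.LayerNorm(d_model)

    def forward(self, Q, K, V):
        B, T, _ = Q.shape
        split = lambda x: x.view(B, T, self.h, self.d_head).transpose(1, 2)
        heads = attention(split(self.W_q(Q)), split(self.W_k(K)), split(self.W_v(V)))
        concat = heads.transpose(1, 2).reshape(B, T, -1)   # Concat(head_1 ... head_h)
        return self.norm(self.W_o(concat) + Q)             # LayerNorm(Concat(...) W^O + Q)
```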
After the encoded sequence $U_{cat}^{Enc}$ is obtained, the vector at the position $P_{CLS}$ of the tag [CLS] in the concatenated sequence $U_{cat}$ is taken from $U_{cat}^{Enc}$ as the feature representation of the current dialogue history, denoted here $h_{CLS}$. Then $h_{CLS}$ is fed into two FFN layers and a Softmax layer for classification, yielding the dialogue strategy $\hat{S}_i$, where $i$ denotes the current dialogue turn:
$\hat{S}_i = \mathrm{Softmax}(\mathrm{FFN}(\mathrm{FFN}(h_{CLS})))$
Finally, the loss function for training the dialogue strategy selection module is the multi-class cross-entropy loss:
$\mathcal{L}_{S} = -\sum_{j=1}^{N_S} S_i^j \log \hat{S}_i^j$
where $i$ denotes the current dialogue turn, $N_S$ denotes the total number of dialogue strategies, $\hat{S}_i^j$ is the predicted score of the $j$-th strategy in the $i$-th turn, and $S_i^j$ is the true label of the $j$-th strategy in the $i$-th turn, with positive samples labeled 1 and negative samples labeled 0;
Six dialogue strategies are defined in the dialogue data set: 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', 'giving advice', and 'other'. When the predicted dialogue strategy $\hat{S}_i$ is 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', or 'other', the system switches to the dialogue generation module to give a reply; when $\hat{S}_i$ is predicted to be 'giving advice', the system switches to the psychological state cognition module to predict the user's psychological cognition type, which then guides the reply of the suggestion recommendation module;
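A minimal sketch of this strategy selection step, assuming the HuggingFace transformers library and using bert-base-chinese as a stand-in for the pre-trained Chinese encoder (the checkpoint name and layer sizes are assumptions, not the patent's own choices):

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

NUM_STRATEGIES = 6  # questioning, emotional reflection, self-disclosure,
                    # affirmation and comfort, giving advice, other

class StrategySelector(nn.Module):
    """[CLS]-pooled encoder representation -> two FFN layers -> 6-way strategy logits."""
    def __init__(self, encoder_name="bert-base-chinese"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.ffn = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, NUM_STRATEGIES))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        h_cls = out.last_hidden_state[:, 0]    # vector at the [CLS] position
        return self.ffn(h_cls)                 # logits; Softmax is folded into the loss below

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
history = ["u_{i-2}", "r_{i-2}", "u_{i-1}", "r_{i-1}", "u_i"]     # placeholder utterances
u_cat = tokenizer.sep_token.join(history)                          # [CLS]/final [SEP] added by tokenizer
inputs = tokenizer(u_cat, return_tensors="pt", truncation=True, max_length=512)
logits = StrategySelector()(inputs["input_ids"], inputs["attention_mask"])
# Multi-class cross entropy; e.g. index 4 as the true label for "giving advice".
loss = nn.CrossEntropyLoss()(logits, torch.tensor([4]))
```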
2) Establishing the dialogue generation module
The dialogue generation module generates a reply guided by the dialogue context history and the dialogue strategy $\hat{S}_i$. The two most recent turns are taken as the dialogue history and concatenated with the dialogue strategy $\hat{S}_i$ to form the sequence $US_{cat}$, which is fed into a pre-trained Transformer Decoder; the per-step outputs of the Decoder form the complete reply:
$US_E = \mathrm{Embedding}(US_{cat})$
$\mathrm{Response} = \mathrm{Decoder}(US_E)$
$\mathrm{Decoder}(US_E) = \mathrm{Add\&Norm}(\mathrm{MaskedMultiHead}(US_E, US_E, US_E))$
$\mathrm{MaskedMultiHead}(Q, K, V) = \mathrm{LayerNorm}(\mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^O + Q)$
$\mathrm{head}_i = \mathrm{MaskedAttention}(QW_i^Q, KW_i^K, VW_i^V)$
$\mathrm{MaskedAttention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}} \odot \mathrm{Mask}\right)V$
MaskedMultiHead differs from MultiHead in that, when computing the Attention, the prediction for position $i$ may depend only on the known outputs at positions before $i$, so the positions at or after $i$ are masked out; the operator $\odot$ denotes element-wise multiplication of corresponding matrix entries, and Mask is a lower-triangular matrix whose entries on and below the main diagonal are 1 and whose remaining entries are 0. The loss function for training the dialogue generation module is:
$\mathcal{L}_{G} = -\sum_{j=1}^{T_r} \log P\!\left(r_i^j \mid r_i^{1:j-1}\right)$
where $T_r$ denotes the length of the true reply $r_i$, $i$ denotes the current dialogue turn, $r_i^j$ denotes the $j$-th reply character of the true reply $r_i$ in the $i$-th turn, and $r_i^{1:j-1}$ denotes the sequence of the 1st to $(j{-}1)$-th reply characters of $r_i$;
3) Establishing the psychological state cognition module
The psychological state cognition module predicts the user's psychological cognition condition from the user's dialogue history in order to guide the suggestion selection of the suggestion recommendation module; this narrows the retrieval range of the knowledge base and speeds up retrieval. All of the current user dialogue history is concatenated as $U_{cur} = [\mathrm{CLS}]\,u_i\,[\mathrm{SEP}]\,r_{i-1}\,[\mathrm{SEP}]\,u_{i-1}\,[\mathrm{SEP}]\ldots[\mathrm{SEP}]\,u_1\,[\mathrm{SEP}]\,r_1\,[\mathrm{SEP}]$, where $i$ denotes the current dialogue turn. If concatenating the utterance $u_j$ and reply $r_j$ of some turn $j$ would make the sequence $U_{cur}$ exceed 512 tokens in length, the history of turn $j$ is omitted, and the history of all turns before $j$ is likewise left unconcatenated. $U_{cur}$ is fed into a pre-trained Transformer Encoder to obtain the sentence-level feature representation, which is classified through FFN and Softmax layers:
$U_{cur}^{Enc} = \mathrm{Encoder}(\mathrm{Embedding}(U_{cur}))$
After the encoded sequence $U_{cur}^{Enc}$ is obtained, the vector at the position $P_{CLS}$ of the tag [CLS] in the concatenated sequence $U_{cur}$ is taken from $U_{cur}^{Enc}$ as the feature representation of all of the current user dialogue history, denoted here $h_{CLS}^{cur}$. Then $h_{CLS}^{cur}$ is fed into two FFN layers and a Softmax layer for classification, predicting the user's current psychological cognition type $\hat{O}$:
$\hat{O} = \mathrm{Softmax}(\mathrm{FFN}(\mathrm{FFN}(h_{CLS}^{cur})))$
Finally, the loss function for training the psychological state cognition module is the multi-class cross-entropy loss:
$\mathcal{L}_{O} = -\sum_{j=1}^{N_O} O_j \log \hat{O}_j$
where $N_O$ denotes the total number of psychological cognition state types, $\hat{O}_j$ denotes the predicted score of the $j$-th psychological cognition state type, and $O_j$ is the true label of the $j$-th type, with positive samples labeled 1 and negative samples labeled 0;
4) Establishing the suggestion recommendation module
The suggestion recommendation module uses the predicted psychological cognition state $\hat{O}$ of the user to filter the psychological question-answer suggestion knowledge base, selects the related candidate suggestions, scores the candidates using the dialogue context history, and replies with the highest-scoring suggestion.
Building the suggestion recommendation module requires a question-answer suggestion knowledge base that satisfies the following: for each question there are multiple replies, each reply has a score, and each question-answer pair is annotated with the user psychological cognition state type of the question.
The user's utterance in the current dialogue turn $j$, i.e., $u_j$, is taken as the key sentence and converted into a sentence vector $U_{key}$ by a pre-trained language model; likewise, all questions in the question-answer suggestion knowledge base are converted into vector representations by the pre-trained language model. The knowledge base here is a psychological question-answer suggestion knowledge base whose raw data come from question-answer records between patients and psychological counselors, screened by a psychology team; the knowledge base requires multiple replies for every question posed by a patient;
After the user's dialogue vector $U_{key}$ and the psychological cognition type $\hat{O}$ predicted by the psychological state cognition module are obtained, the questions in the question-answer suggestion knowledge base are first filtered by $\hat{O}$, keeping the questions of the same psychological cognition type. The vector representations of the filtered questions are then matched for similarity with the Annoy tool, the 10 most similar questions are selected, and all replies under these questions are recorded as the candidate reply set $A$. Each candidate reply $A_i$ in $A$ is concatenated with the key sentence $u_j$ as $[u_j; A_i] = [\mathrm{CLS}]\,u_j\,[\mathrm{SEP}]\,A_i\,[\mathrm{SEP}]$. Finally, every concatenated sentence is fed into a scoring model Score, composed of a Transformer Encoder and a Sigmoid function, and the candidate reply $A_i$ with the highest score is returned to the user:
$\mathrm{Score}_i = \mathrm{Score}(A_i)$
$\mathrm{Score}(A_i) = \mathrm{Sigmoid}(\mathrm{Encoder}[u_j; A_i])$
$A_{best} = \arg\max_{A_i \in A} \mathrm{Score}_i$
where $A$ denotes the set of all recommended replies under the 10 questions in the question-answer suggestion knowledge base most relevant to the key sentence $U_{key}$, retrieved with the Annoy tool; $\mathrm{Score}_i$ denotes the score of candidate reply $A_i$ against $U_{key}$, and $A_{best}$ denotes the candidate reply with the highest score;
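A minimal sketch of this retrieval-and-scoring step, assuming the Annoy library for nearest-neighbour search and an in-memory view of the knowledge base whose field names (question, state, replies, question_vec) are placeholders; the scorer is assumed to be a module mapping a tokenized sentence pair to a single logit:

```python
import torch
from annoy import AnnoyIndex

def retrieve_candidates(kb, u_key_vec, predicted_state, dim=768, top_q=10):
    """Filter questions by the predicted psychological cognition type, find the 10 most
    similar questions with Annoy, and collect every reply under them as candidate set A."""
    filtered = [e for e in kb if e["state"] == predicted_state]
    index = AnnoyIndex(dim, "angular")
    for i, entry in enumerate(filtered):
        index.add_item(i, entry["question_vec"])
    index.build(10)                                     # 10 trees (illustrative)
    nearest = index.get_nns_by_vector(u_key_vec, top_q)
    return [reply for i in nearest for reply in filtered[i]["replies"]]

def best_reply(candidates, u_j, scorer, tokenizer):
    """Score every [CLS] u_j [SEP] A_i [SEP] pair with the Encoder+Sigmoid scorer
    and return the highest-scoring candidate A_best."""
    scores = []
    for a_i in candidates:
        enc = tokenizer(u_j, a_i, return_tensors="pt", truncation=True, max_length=512)
        scores.append(torch.sigmoid(scorer(**enc)).item())
    best_idx = max(range(len(candidates)), key=lambda k: scores[k])
    return candidates[best_idx]
```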
To train the scoring model Score composed of a Transformer Encoder and a Sigmoid function, the question-answer suggestion knowledge base data are first preprocessed: for each question sentence Query in the knowledge base, all of its $N$ replies are recorded as the recommended replies $R_{pos}$, and $N$ replies are drawn at random, with replacement, from the replies of other questions as the non-recommended replies $R_{neg}$; $R_{pos}$ and $R_{neg}$ together form the reply set $R$. The question sentence is then concatenated with each reply in $R$ as $UR_i = [\mathrm{CLS}]\,\mathrm{Query}\,[\mathrm{SEP}]\,R_i\,[\mathrm{SEP}]$, where $R_i$ denotes the $i$-th reply in $R$; when the concatenated $R_i$ belongs to the recommended replies $R_{pos}$, it is a positive sample with label $y = 1$, and when it belongs to the non-recommended replies $R_{neg}$, the label is $y = 0$;
$\mathrm{Score}_i = \mathrm{Score}(UR_i)$
$\mathrm{Score}(UR_i) = \mathrm{Sigmoid}(\mathrm{Encoder}[\mathrm{Query}; UR_i])$
Finally, the loss function for training the scoring model is the binary cross-entropy loss:
$\mathcal{L}_{R} = -\sum_i \left[ y_i \log \mathrm{Score}_i + (1 - y_i) \log(1 - \mathrm{Score}_i) \right]$
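A minimal sketch of the positive/negative pair construction and the binary cross-entropy training step (assuming the HuggingFace transformers library; bert-base-chinese stands in for the pre-trained encoder, and the knowledge-base field names are placeholders):

```python
import random
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ReplyScorer(nn.Module):
    """Transformer Encoder + Sigmoid scorer; the Sigmoid is folded into BCEWithLogitsLoss."""
    def __init__(self, encoder_name="bert-base-chinese"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, **enc):
        return self.head(self.encoder(**enc).last_hidden_state[:, 0]).squeeze(-1)

def make_training_pairs(kb_entry, kb):
    """For one Query: its own N replies are positives (y=1); N replies drawn with
    replacement from the replies of other questions are negatives (y=0)."""
    positives = kb_entry["replies"]
    others = [r for e in kb if e is not kb_entry for r in e["replies"]]
    negatives = [random.choice(others) for _ in positives]
    return ([(kb_entry["question"], r, 1.0) for r in positives]
            + [(kb_entry["question"], r, 0.0) for r in negatives])

def training_step(scorer, tokenizer, pairs):
    queries, replies, labels = zip(*pairs)
    enc = tokenizer(list(queries), list(replies),          # [CLS] Query [SEP] R_i [SEP]
                    padding=True, truncation=True, max_length=512, return_tensors="pt")
    logits = scorer(**enc)
    return nn.BCEWithLogitsLoss()(logits, torch.tensor(labels))  # binary cross entropy
```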
At present, the world faces a contradiction between a large population with mental health problems and a scarcity of professional psychological counselors, so actively promoting the participation of artificial intelligence in relieving psychological problems is a focus of attention both at home and abroad. Compared with a human counselor, a psychological counseling system has the following characteristics: first, it is inexpensive or even free; second, those seeking counseling may lower their defenses more easily toward a 'machine', which makes it easier to establish trust; third, it can exist in the form of an app, truly enabling one-to-one interaction anytime and anywhere, and can provide effective psychological counseling to a large number of users within a given period of time.
A psychological counseling system can play a role in many scenarios. In education, it can help students manage their emotions outside class and relieve study and examination pressure; in the workplace, it can regularly help workers relieve work stress; in elderly-care centers, it can provide emotional support and companionship; and in hospitals, it can help patients reduce anxiety about illness. Such a system has a wide range of application scenarios: it can enrich users' daily lives, support mental well-being, effectively improve psychological health, save nursing resources, reduce the burden on caregivers, and is not affected by caregivers' cultural differences. In the future, the psychological counseling system aims to combine advanced artificial intelligence technology with authoritative psychologists, explore innovative service and working modes, combine psychological theory with practice, and build a social psychological service network. This is conducive to building a social psychological service system, to everyday psychological testing and emotional relief for the public, and to establishing a healthy awareness of mental well-being in society.
The invention aims to ease the contradiction between the enormous demand for psychological counseling and the scarcity of psychologists. Guided by affective computing theory and helping-skills theory, it develops a deep-learning-based emotion perception and support dialogue system in which artificial intelligence provides psychological counseling, recommends suitable solutions to psychological problems for users in need, and offers effective, convenient, and low-cost emotional support. The invention has the following specific characteristics:
1) Addressing the problem that the emotional support process of existing dialogue systems is fixed, the project constructs multiple dialogue strategies under the guidance of helping-skills theory, can adaptively select a dialogue strategy according to the dialogue context, and guides the dialogue generation module to produce empathetic reply content, providing more effective emotional support for users.
2) Addressing the problem that the reply quality of existing psychological counseling systems often suffers when giving suggestions, high-quality replies are provided by retrieving the question-answer suggestion knowledge base.
By constructing this system, emotion perception and support rich in empathy can be provided to users, effectively relieving the accumulation of negative emotions and, to a certain extent, giving due attention to people's mental health.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the specific embodiments and the accompanying drawings. Except for the content specifically mentioned below, the procedures, conditions, experimental methods, and the like for implementing the invention are general knowledge and common knowledge in the art, and the invention is not particularly limited in these respects.
Referring to FIG. 1, the present invention mainly includes the following steps:
Step one: predict the dialogue strategy of the current turn from the user's dialogue history.
A dialogue strategy is adaptively selected based on the dialogue history to guide the dialogue generation module in reply generation;
The six dialogue strategies 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', 'giving advice', and 'other' are defined in the dialogue strategy selection module.
The current dialogue history $U$ is concatenated in chronological order, with the tag [SEP] separating the different utterances, the tag [CLS] prepended to the concatenated sequence, and a final [SEP] appended, giving the concatenated sequence $U_{cat} = [\mathrm{CLS}]\,u_{i-2}\,[\mathrm{SEP}]\,r_{i-2}\,[\mathrm{SEP}]\,u_{i-1}\,[\mathrm{SEP}]\,r_{i-1}\,[\mathrm{SEP}]\,u_i\,[\mathrm{SEP}]$. If the length of $U_{cat}$ exceeds 512 tokens, the sequence is truncated by whole utterances, removing the earliest utterances in order until the sequence length no longer exceeds 512;
The Transformer Encoder in the dialogue strategy selection module is initialized with a Chinese ERNIE pre-trained model. Training follows the pre-train and fine-tune paradigm: after the Encoder is initialized, it is trained in a supervised manner on the full data of the dialogue data set;
Step two: determine whether to switch to the dialogue generation module or the psychological state cognition module according to the dialogue strategy predicted in step one:
If the predicted dialogue strategy is 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', or 'other', the dialogue generation module gives the user a reply; if the predicted strategy is 'giving advice', the psychological state cognition module predicts the user's psychological cognition state.
1) The dialogue generation module generates the reply content from the user's dialogue history and the current dialogue strategy. The two most recent turns are taken as the dialogue history and concatenated with the dialogue strategy $\hat{S}_i$ to form the sequence $US_{cat}$, which is fed into a pre-trained Transformer Decoder; the per-step outputs of the Decoder form the complete reply;
The Transformer Decoder is initialized with the Chinese pre-trained model RoBERTa. In the training stage, to improve training effectiveness, the concatenated dialogue strategy is the ground-truth strategy $S_i$; in the testing stage, the predicted strategy $\hat{S}_i$ is concatenated instead. As with the truncation of the concatenated sequence in the dialogue strategy module, if the length of the concatenated sequence $US_{cat}$ exceeds 512 tokens, it is truncated by whole utterances, removing the earliest utterances in order until the length no longer exceeds 512;
After the Transformer Decoder is initialized with the Chinese pre-trained RoBERTa model, a standard causal mask matrix is used for the attention over the Decoder input; that is, each position can only see the data before the current position and not the data after it. Viewed as a matrix, this mask is a lower triangle: the lower half is 1 and the upper half is 0. Meanwhile, since the Decoder output must produce the generated reply, a language-model head is attached after the Decoder to complete the generation. Training follows the pre-train and fine-tune paradigm: after the Decoder is initialized, it is trained in a supervised manner on the full data of the dialogue data set;
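A minimal sketch of the lower-triangular mask described above, which restricts every position to attend only to itself and earlier positions:

```python
import torch

def causal_mask(seq_len):
    """Lower-triangular mask: row i can see columns 0..i only (the lower half is 1,
    the upper half is 0, as described above)."""
    return torch.tril(torch.ones(seq_len, seq_len))

mask = causal_mask(5)
# tensor([[1., 0., 0., 0., 0.],
#         [1., 1., 0., 0., 0.],
#         [1., 1., 1., 0., 0.],
#         [1., 1., 1., 1., 0.],
#         [1., 1., 1., 1., 1.]])
```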
2) Psychological state cognition module: all of the current user dialogue history is concatenated as $U_{cur} = [\mathrm{CLS}]\,u_i\,[\mathrm{SEP}]\,r_{i-1}\,[\mathrm{SEP}]\,u_{i-1}\,[\mathrm{SEP}]\ldots[\mathrm{SEP}]\,u_1\,[\mathrm{SEP}]\,r_1\,[\mathrm{SEP}]$, where $i$ denotes the current turn. If concatenating the utterance $u_j$ and reply $r_j$ of some turn $j$ would make the sequence $U_{cur}$ exceed 512 tokens, the history of turn $j$ is omitted, and the history of all turns before $j$ is likewise left unconcatenated. $U_{cur}$ is fed into a pre-trained Transformer Encoder to obtain the sentence-level feature representation, which is classified through FFN and Softmax layers.
The psychological state cognition module initializes its Transformer Encoder with the Chinese pre-trained model BERT-Large. Training follows the pre-train and fine-tune paradigm: after the Encoder is initialized, it is trained in a supervised manner on the full data of the dialogue data set;
3) After the user's psychological cognition condition is classified, the suggestion recommendation module is entered: the question-answer suggestion knowledge base is filtered by the predicted psychological cognition state of the user, the related candidate replies are selected, the candidate suggestions are scored against the user's dialogue history, and the highest-scoring candidate reply is returned.
The user's utterance in the current dialogue turn $j$, i.e., $u_j$, is taken as the key sentence and converted into a sentence vector $U_{key}$ by a pre-trained language model; likewise, all questions in the question-answer suggestion knowledge base are converted into vector representations by the pre-trained language model;
After the user's dialogue vector $U_{key}$ and the psychological cognition type $\hat{O}$ predicted by the psychological state cognition module are obtained, the questions in the question-answer suggestion knowledge base are first filtered by $\hat{O}$, keeping the questions of the same psychological cognition type. The vector representations of the filtered questions are matched for similarity with the Annoy tool, the 10 most similar questions are selected, and all replies under these questions are recorded as the candidate reply set $A$. Each candidate reply $A_i$ in $A$ is concatenated with the key sentence $u_j$ as $[u_j; A_i] = [\mathrm{CLS}]\,u_j\,[\mathrm{SEP}]\,A_i\,[\mathrm{SEP}]$. Finally, every concatenated sentence is fed into the scoring model Score, composed of a Transformer Encoder and a Sigmoid function, and the candidate reply $A_i$ with the highest score is returned to the user:
$\mathrm{Score}_i = \mathrm{Score}(A_i)$
$\mathrm{Score}(A_i) = \mathrm{Sigmoid}(\mathrm{Encoder}[u_j; A_i])$
$A_{best} = \arg\max_{A_i \in A} \mathrm{Score}_i$
where $A$ denotes the set of all recommended replies under the 10 questions in the question-answer suggestion knowledge base most relevant to the key sentence $U_{key}$, retrieved with the Annoy tool; $\mathrm{Score}_i$ denotes the score of candidate reply $A_i$ against $U_{key}$, and $A_{best}$ denotes the candidate reply with the highest score.
To train the scoring model Score composed of a Transformer Encoder and a Sigmoid function, the question-answer suggestion knowledge base data are first preprocessed: for each question sentence Query in the knowledge base, all of its $N$ replies are recorded as the recommended replies $R_{pos}$, and $N$ replies are drawn at random, with replacement, from the replies of other questions as the non-recommended replies $R_{neg}$; $R_{pos}$ and $R_{neg}$ together form the reply set $R$. The question sentence is then concatenated with each reply in $R$ as $UR_i = [\mathrm{CLS}]\,\mathrm{Query}\,[\mathrm{SEP}]\,R_i\,[\mathrm{SEP}]$, where $R_i$ denotes the $i$-th reply in $R$; when the concatenated $R_i$ belongs to the recommended replies $R_{pos}$, it is a positive sample with label $y = 1$, and when it belongs to the non-recommended replies $R_{neg}$, the label is $y = 0$. The Transformer Encoder of the scoring model is initialized with the pre-trained model BERT-Large; training follows the pre-train and fine-tune paradigm, with supervised training on the full data after the Encoder is initialized;
$\mathrm{Score}_i = \mathrm{Score}(UR_i)$
$\mathrm{Score}(UR_i) = \mathrm{Sigmoid}(\mathrm{Encoder}[\mathrm{Query}; UR_i])$
Finally, the loss function for training the scoring model is the binary cross-entropy loss:
$\mathcal{L}_{R} = -\sum_i \left[ y_i \log \mathrm{Score}_i + (1 - y_i) \log(1 - \mathrm{Score}_i) \right]$

Claims (1)

1. A method for constructing a deep-learning-based emotion perception and support dialogue system, characterized by comprising the following steps:
1) Establishing the dialogue strategy selection module
A dialogue data set is used, which must satisfy the following: each data record contains a multi-turn dialogue history, each turn is annotated with the dialogue strategy used in its reply, and each record is annotated with a user psychological cognition state type;
For the dialogue data set, let the given dialogue context history be $\{u_1, r_1, u_2, r_2, \ldots, u_m, r_m\}$, where $i$ denotes the dialogue turn, $u_i$ denotes the user utterance in the $i$-th turn, and $r_i$ denotes the system reply in the $i$-th turn; the dialogue strategies $\{S_1, S_2, \ldots, S_m\}$ and the user's psychological cognition state $O$ are given. Given the current dialogue turn $i$, the two most recent turns are taken as the user's current dialogue history, $U = \{u_{i-2}, r_{i-2}, u_{i-1}, r_{i-1}, u_i\}$; when fewer than two previous turns exist, the utterances from the 1st turn up to the current turn are used;
The current dialogue history $U$ is concatenated in chronological order, with a separator tag [SEP] between different utterances, a tag [CLS] prepended to the concatenated sequence, and a final [SEP] appended, giving the concatenated sequence $U_{cat} = [\mathrm{CLS}]\,u_{i-2}\,[\mathrm{SEP}]\,r_{i-2}\,[\mathrm{SEP}]\,u_{i-1}\,[\mathrm{SEP}]\,r_{i-1}\,[\mathrm{SEP}]\,u_i\,[\mathrm{SEP}]$. Taking $U_{cat}$ as input, a pre-trained Transformer Encoder model encodes $U_{cat}$ to obtain the encoded sequence $U_{cat}^{Enc}$ output by the Transformer Encoder model. The specific process is:
$U_E = \mathrm{Embedding}(U_{cat})$
$U_{cat}^{Enc} = \mathrm{Encoder}(U_E)$
$\mathrm{Encoder}(U_E) = \mathrm{Add\&Norm}(\mathrm{MultiHead}(U_E, U_E, U_E))$
$\mathrm{Add\&Norm}(x) = \mathrm{LayerNorm}[\mathrm{FFN}(x) + x]$
$\mathrm{FFN}(x) = \mathrm{Relu}(xW_1 + b_1)W_2 + b_2$
$\mathrm{MultiHead}(Q, K, V) = \mathrm{LayerNorm}(\mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^O + Q)$
$\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V$
The Embedding function converts the input sequence into a continuous embedding matrix; $K^T$ denotes the transpose of the matrix $K$; $d_k$ is a constant whose value depends on the number of columns of the matrix $Q$ at computation time. $W_i^Q$, $W_i^K$, $W_i^V$ are learnable parameter matrices of the Attention modules, $h$ of each kind, where $h$ is the number of Attention modules; the subscript $i$ of $W_i^Q$, $W_i^K$, $W_i^V$ indicates the $i$-th Attention module, and the superscripts $Q$, $K$, $V$ indicate the corresponding matrix. $W^O$ is a learnable parameter matrix of the MultiHead module, and LayerNorm denotes the layer normalization function. $W_1$, $b_1$, $W_2$, $b_2$ denote the learnable parameters of the two-layer feed-forward neural network (FFN), with Relu as the activation function;
After the encoded sequence $U_{cat}^{Enc}$ is obtained, the vector at the position $P_{CLS}$ of the tag [CLS] in the concatenated sequence $U_{cat}$ is taken from $U_{cat}^{Enc}$ as the feature representation of the current dialogue history, denoted here $h_{CLS}$. Then $h_{CLS}$ is fed into two FFN layers and a Softmax layer for classification, yielding the dialogue strategy $\hat{S}_i$, where $i$ denotes the current dialogue turn:
$\hat{S}_i = \mathrm{Softmax}(\mathrm{FFN}(\mathrm{FFN}(h_{CLS})))$
Finally, the loss function for training the dialogue strategy selection module is the multi-class cross-entropy loss:
$\mathcal{L}_{S} = -\sum_{j=1}^{N_S} S_i^j \log \hat{S}_i^j$
where $i$ denotes the current dialogue turn, $N_S$ denotes the total number of dialogue strategies, $\hat{S}_i^j$ is the predicted score of the $j$-th strategy in the $i$-th turn, and $S_i^j$ is the true label of the $j$-th strategy in the $i$-th turn, with positive samples labeled 1 and negative samples labeled 0;
Six dialogue strategies are defined in the dialogue data set: 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', 'giving advice', and 'other'. When the predicted dialogue strategy $\hat{S}_i$ is 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', or 'other', the system switches to the dialogue generation module to give a reply; when $\hat{S}_i$ is predicted to be 'giving advice', the system switches to the psychological state cognition module to predict the user's psychological cognition type, which then guides the reply of the suggestion recommendation module;
2) Establishing the dialogue generation module
The two most recent turns are taken as the dialogue history and concatenated with the dialogue strategy $\hat{S}_i$ to form the sequence $US_{cat}$, which is fed into a pre-trained Transformer Decoder; the per-step outputs of the Decoder form the complete reply:
$US_E = \mathrm{Embedding}(US_{cat})$
$\mathrm{Response} = \mathrm{Decoder}(US_E)$
$\mathrm{Decoder}(US_E) = \mathrm{Add\&Norm}(\mathrm{MaskedMultiHead}(US_E, US_E, US_E))$
$\mathrm{MaskedMultiHead}(Q, K, V) = \mathrm{LayerNorm}(\mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^O + Q)$
$\mathrm{head}_i = \mathrm{MaskedAttention}(QW_i^Q, KW_i^K, VW_i^V)$
$\mathrm{MaskedAttention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}} \odot \mathrm{Mask}\right)V$
MaskedMultiHead differs from MultiHead in that, when computing the Attention, the prediction for position $i$ may depend only on the known outputs at positions before $i$, so the positions at or after $i$ are masked out; the operator $\odot$ denotes element-wise multiplication of corresponding matrix entries, and Mask is a lower-triangular matrix whose entries on and below the main diagonal are 1 and whose remaining entries are 0;
The loss function for training the dialogue generation module is:
$\mathcal{L}_{G} = -\sum_{j=1}^{T_r} \log P\!\left(r_i^j \mid r_i^{1:j-1}\right)$
where $T_r$ denotes the length of the true reply $r_i$, $i$ denotes the current dialogue turn, $r_i^j$ denotes the $j$-th reply character of the true reply $r_i$ in the $i$-th turn, and $r_i^{1:j-1}$ denotes the sequence of the 1st to $(j{-}1)$-th reply characters of $r_i$;
3) Establishing the psychological state cognition module
All of the current user dialogue history is concatenated as $U_{cur} = [\mathrm{CLS}]\,u_i\,[\mathrm{SEP}]\,r_{i-1}\,[\mathrm{SEP}]\,u_{i-1}\,[\mathrm{SEP}]\ldots[\mathrm{SEP}]\,u_1\,[\mathrm{SEP}]\,r_1\,[\mathrm{SEP}]$, where $i$ denotes the current turn. If concatenating the utterance $u_j$ and reply $r_j$ of some turn $j$ would make the sequence $U_{cur}$ exceed 512 tokens in length, the history of turn $j$ is omitted, and the history of all turns before $j$ is likewise left unconcatenated. $U_{cur}$ is fed into a pre-trained Transformer Encoder to obtain the sentence-level feature representation, which is classified through FFN and Softmax layers:
$U_{cur}^{Enc} = \mathrm{Encoder}(\mathrm{Embedding}(U_{cur}))$
After the encoded sequence $U_{cur}^{Enc}$ is obtained, the vector at the position $P_{CLS}$ of the tag [CLS] in the concatenated sequence $U_{cur}$ is taken from $U_{cur}^{Enc}$ as the feature representation of all of the current user dialogue history, denoted here $h_{CLS}^{cur}$. Then $h_{CLS}^{cur}$ is fed into two FFN layers and a Softmax layer for classification, predicting the user's current psychological cognition type $\hat{O}$:
$\hat{O} = \mathrm{Softmax}(\mathrm{FFN}(\mathrm{FFN}(h_{CLS}^{cur})))$
Finally, the loss function for training the psychological state cognition module is the multi-class cross-entropy loss:
$\mathcal{L}_{O} = -\sum_{j=1}^{N_O} O_j \log \hat{O}_j$
where $N_O$ denotes the total number of psychological cognition state types, $\hat{O}_j$ denotes the predicted score of the $j$-th psychological cognition state type, and $O_j$ is the true label of the $j$-th type, with positive samples labeled 1 and negative samples labeled 0;
4) Establishing the suggestion recommendation module
A question-answer suggestion knowledge base is used, which must satisfy the following: each question in the knowledge base has multiple replies, and each question-answer pair is annotated with the user psychological cognition state type of the question;
The user's utterance in the current dialogue turn $j$, i.e., $u_j$, is taken as the key sentence and converted into a sentence vector $U_{key}$ by a pre-trained language model; likewise, all questions in the question-answer suggestion knowledge base are converted into vector representations by the pre-trained language model;
After the user's dialogue vector $U_{key}$ and the psychological cognition type $\hat{O}$ predicted by the psychological state cognition module are obtained, the questions in the question-answer suggestion knowledge base are first filtered by $\hat{O}$, keeping the questions of the same psychological cognition type. The vector representations of the filtered questions are matched for similarity with the Annoy tool, the 10 most similar questions are selected, and all replies under these questions are recorded as the candidate reply set $A$. Each candidate reply $A_i$ in $A$ is concatenated with the key sentence $u_j$ as $[u_j; A_i] = [\mathrm{CLS}]\,u_j\,[\mathrm{SEP}]\,A_i\,[\mathrm{SEP}]$. Finally, every concatenated sentence is fed into a scoring model Score, composed of a Transformer Encoder and a Sigmoid function, and the candidate reply $A_i$ with the highest score is returned to the user:
$\mathrm{Score}_i = \mathrm{Score}(A_i)$
$\mathrm{Score}(A_i) = \mathrm{Sigmoid}(\mathrm{Encoder}[u_j; A_i])$
$A_{best} = \arg\max_{A_i \in A} \mathrm{Score}_i$
where $A$ denotes the set of all recommended replies under the 10 questions in the question-answer suggestion knowledge base most relevant to the key sentence $U_{key}$, retrieved with the Annoy tool; $\mathrm{Score}_i$ denotes the score of candidate reply $A_i$ against $U_{key}$, and $A_{best}$ denotes the candidate reply with the highest score;
To train the scoring model Score composed of a Transformer Encoder and a Sigmoid function, the question-answer suggestion knowledge base data are first preprocessed: for each question sentence Query in the knowledge base, all of its $N$ replies are recorded as the recommended replies $R_{pos}$, and $N$ replies are drawn at random, with replacement, from the replies of other questions as the non-recommended replies $R_{neg}$; $R_{pos}$ and $R_{neg}$ together form the reply set $R$. The question sentence is then concatenated with each reply in $R$ as $UR_i = [\mathrm{CLS}]\,\mathrm{Query}\,[\mathrm{SEP}]\,R_i\,[\mathrm{SEP}]$, where $R_i$ denotes the $i$-th reply in $R$; when the concatenated $R_i$ belongs to the recommended replies $R_{pos}$, it is a positive sample with label $y = 1$, and when it belongs to the non-recommended replies $R_{neg}$, the label is $y = 0$;
$\mathrm{Score}_i = \mathrm{Score}(UR_i)$
$\mathrm{Score}(UR_i) = \mathrm{Sigmoid}(\mathrm{Encoder}[\mathrm{Query}; UR_i])$
Finally, the loss function for training the scoring model is the binary cross-entropy loss:
$\mathcal{L}_{R} = -\sum_i \left[ y_i \log \mathrm{Score}_i + (1 - y_i) \log(1 - \mathrm{Score}_i) \right]$
CN202210332004.4A 2022-03-31 2022-03-31 Deep learning-based emotion perception and support dialog system construction method Pending CN114999610A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210332004.4A CN114999610A (en) 2022-03-31 2022-03-31 Deep learning-based emotion perception and support dialog system construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210332004.4A CN114999610A (en) 2022-03-31 2022-03-31 Deep learning-based emotion perception and support dialog system construction method

Publications (1)

Publication Number Publication Date
CN114999610A true CN114999610A (en) 2022-09-02

Family

ID=83023386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210332004.4A Pending CN114999610A (en) 2022-03-31 2022-03-31 Deep learning-based emotion perception and support dialog system construction method

Country Status (1)

Country Link
CN (1) CN114999610A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116483981A (en) * 2023-06-16 2023-07-25 北京好心情互联网医院有限公司 Dialogue generation method, device, equipment and storage medium
CN116501852A (en) * 2023-06-29 2023-07-28 之江实验室 Controllable dialogue model training method and device, storage medium and electronic equipment
CN116501852B (en) * 2023-06-29 2023-09-01 之江实验室 Controllable dialogue model training method and device, storage medium and electronic equipment
CN117828063A (en) * 2024-01-10 2024-04-05 广东数业智能科技有限公司 Psychological field data generation and model training method and device and storage medium
CN117828063B (en) * 2024-01-10 2024-05-17 广东数业智能科技有限公司 Psychological field data generation and model training method and device and storage medium

Similar Documents

Publication Publication Date Title
CN109885672B (en) Question-answering type intelligent retrieval system and method for online education
CN110781680B (en) Semantic similarity matching method based on twin network and multi-head attention mechanism
Cuayáhuitl et al. Ensemble-based deep reinforcement learning for chatbots
US10860630B2 (en) Methods and systems for generating and traversing discourse graphs using artificial neural networks
CN110175227B (en) Dialogue auxiliary system based on team learning and hierarchical reasoning
CN114999610A (en) Deep learning-based emotion perception and support dialog system construction method
Weber et al. Pedagogical Agents for Interactive Learning: A Taxonomy of Conversational Agents in Education.
CN111275401B (en) Intelligent interview method and system based on position relation
CN110569356B (en) Interviewing method and device based on intelligent interviewing interaction system and computer equipment
CN110532363B (en) Task-oriented automatic dialogue method based on decision tree
Juan et al. Particle swarm optimization neural network for research on artificial intelligence college English classroom teaching framework
CN113672720A (en) Power audit question and answer method based on knowledge graph and semantic similarity
Mazzei et al. Analyzing social robotics research with natural language processing techniques
CN116741411A (en) Intelligent health science popularization recommendation method and system based on medical big data analysis
Catterall Using computer programs to code qualitative data
Chandiok et al. CIT: Integrated cognitive computing and cognitive agent technologies based cognitive architecture for human-like functionality in artificial systems
CN117609486A (en) Intelligent dialogue system in psychological field
Kaiss et al. Effectiveness of an Adaptive Learning Chatbot on Students’ Learning Outcomes Based on Learning Styles.
Day et al. AI customer service system with pre-trained language and response ranking models for university admissions
CN115617960A (en) Post recommendation method and device
CN113011196A (en) Concept-enhanced representation and one-way attention-containing subjective question automatic scoring neural network model
CN117438047A (en) Psychological consultation model training and psychological consultation processing method and device and electronic equipment
CN116737911A (en) Deep learning-based hypertension question-answering method and system
GUO et al. Design and implementation of intelligent medical customer service robot based on deep learning
Nair HR based Chatbot using deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination