CN114999610A - Deep learning-based emotion perception and support dialog system construction method - Google Patents

Deep learning-based emotion perception and support dialog system construction method Download PDF

Info

Publication number
CN114999610A
Authority
CN
China
Prior art keywords
conversation
reply
user
psychological
sep
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210332004.4A
Other languages
Chinese (zh)
Inventor
杨燕
谭振东
孙宇翔
张雨时
陈妍
贺樑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University
Priority to CN202210332004.4A
Publication of CN114999610A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H 20/70 ICT specially adapted for therapies or health-improving plans relating to mental therapies, e.g. psychological therapy or autogenous training
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/332 Query formulation
    • G06F 16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Child & Adolescent Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Developmental Disabilities (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Psychology (AREA)
  • Social Psychology (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a deep-learning-based method for constructing an emotion perception and support dialogue system comprising a dialogue strategy selection module, a dialogue generation module, a psychological state cognition module, and a suggestion recommendation module. The dialogue strategy selection module controls the flow of the conversation and selects a dialogue strategy in real time from the user's dialogue history; the dialogue generation module feeds the user's dialogue history and the current dialogue strategy into a decoder, which generates the reply sentence; the psychological state cognition module classifies the user's current psychological cognition state from the dialogue history; and the suggestion recommendation module retrieves a knowledge base to produce suggestion replies according to the user's psychological cognition state and the dialogue history. By introducing an adaptive dialogue strategy selection module, the system builds multiple dialogue strategies, can select a strategy according to context, guides the model to generate empathetic reply content, and provides more effective emotional support for the user.

Description

Deep learning-based emotion perception and support dialog system construction method
Technical Field
The invention relates to the technical field of dialogue and recommendation systems, and in particular to a deep-learning-based method for constructing an emotion perception and support dialogue system, which determines the user's psychological cognition condition from the dialogue history, selects a suitable dialogue strategy to generate more empathetic replies, and recommends the most suitable answer from a question-answer suggestion knowledge base to feed back to the user.
Background
Dialogue systems receive a great deal of attention from industry and academia because of their high commercial value, and have advanced further in recent years thanks to the massive growth of network data and the development of deep learning.
Against the background of a global pandemic, the incidence of psychological conditions such as anxiety and depression has risen above 2019 levels in recent years. However, psychological counseling is expensive and professional counselors are in short supply, so many people cannot receive timely treatment for psychological conditions. Combining artificial intelligence with psychology can, to a certain extent, ease the tension between limited mental health resources and the essentially unlimited needs of patients. Artificial intelligence research has made great progress in deep learning, natural language processing, emotion recognition, computer vision, and related areas, and has been widely applied in dialogue systems, particularly intelligent question answering and intelligent customer service. Meanwhile, psychological counseling systems built on artificial intelligence have gradually entered the public eye, and experiments have shown that an AI-assisted counseling robot can perceive the user's psychological state during interaction and relieve the user's psychological pressure to a certain extent.
Disclosure of Invention
The invention aims to provide a deep-learning-based method for constructing an emotion perception and support dialogue system, addressing the problem that the emotional support process of existing dialogue systems is fixed.
The specific technical scheme for realizing the purpose of the invention is as follows:
A method for constructing a deep-learning-based emotion perception and support dialogue system comprises the following steps:
1) Establishing the dialogue strategy selection module
The dialogue strategy selection module dynamically selects a dialogue strategy from the dialogue context history. Training its strategy selection model requires a dialogue data set in which each record contains a multi-turn dialogue history, is annotated with the user's psychological cognition state type (13 types in total, such as 'lingering symptoms', 'occupational troubles', 'school overlong', 'heavy drinking', 'depression', 'academic pressure', 'appearance anxiety', 'sleep problems', 'family conflicts', 'friendship troubles', 'love troubles', 'family troubles', and 'growth troubles'), and marks the dialogue strategy used in the reply of each turn. The dialogue system defines six dialogue strategies in total: 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', 'giving advice', and 'other';
For the dialogue data set, let the given dialogue context history be $\{u_1, r_1, u_2, r_2, \ldots, u_m, r_m\}$, where $i$ denotes the dialogue turn, $u_i$ denotes the user utterance in the $i$-th turn, and $r_i$ denotes the system reply in the $i$-th turn; the dialogue strategies $\{S_1, S_2, \ldots, S_m\}$ and the user's psychological cognition state $O$ are given. Given the current dialogue turn $i$, the two most recent turns are taken as the user's current dialogue history, $U = \{u_{i-2}, r_{i-2}, u_{i-1}, r_{i-1}, u_i\}$; when fewer than two previous turns exist, the utterances from the 1st turn up to the current turn are used;
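For illustration, a single annotated record in such a dialogue data set could be organized as follows (a hypothetical sketch in Python; the field names are placeholders, not the actual annotation schema of the data set):

```python
# Hypothetical example of one annotated record in the dialogue data set.
# Field names are illustrative placeholders, not the patent's actual schema.
example_record = {
    "state": "academic pressure",   # user psychological cognition state type O (1 of 13)
    "turns": [
        {"user": "I can't sleep before every exam.",                              # u_1
         "reply": "That sounds exhausting. How long has this been going on?",     # r_1
         "strategy": "questioning"},                                              # S_1 (1 of 6)
        {"user": "Since the start of this semester.",                             # u_2
         "reply": "It is understandable to feel this much pressure.",             # r_2
         "strategy": "affirmation and comfort"},                                  # S_2
    ],
}
```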
The current dialogue history $U$ is concatenated in chronological order, with a separator tag [SEP] between different utterances, a tag [CLS] prepended to the concatenated sequence, and a final [SEP] appended, giving the concatenated sequence $U_{cat} = [\mathrm{CLS}]\,u_{i-2}\,[\mathrm{SEP}]\,r_{i-2}\,[\mathrm{SEP}]\,u_{i-1}\,[\mathrm{SEP}]\,r_{i-1}\,[\mathrm{SEP}]\,u_i\,[\mathrm{SEP}]$. Taking $U_{cat}$ as input, a pre-trained Transformer Encoder model encodes $U_{cat}$ to obtain the encoded sequence $U_{cat}^{Enc}$ output by the Transformer Encoder model. The specific process is:
$U_E = \mathrm{Embedding}(U_{cat})$
$U_{cat}^{Enc} = \mathrm{Encoder}(U_E)$
$\mathrm{Encoder}(U_E) = \mathrm{Add\&Norm}(\mathrm{MultiHead}(U_E, U_E, U_E))$
$\mathrm{Add\&Norm}(x) = \mathrm{LayerNorm}[\mathrm{FFN}(x) + x]$
$\mathrm{FFN}(x) = \mathrm{Relu}(xW_1 + b_1)W_2 + b_2$
$\mathrm{MultiHead}(Q, K, V) = \mathrm{LayerNorm}(\mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^O + Q)$
$\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V$
The Embedding function converts the input sequence into a continuous embedding matrix; $K^T$ denotes the transpose of the matrix $K$; $d_k$ is a constant whose value depends on the number of columns of the matrix $Q$ at computation time. $W_i^Q$, $W_i^K$, $W_i^V$ are learnable parameter matrices of the Attention modules, $h$ of each kind, where $h$ is the number of Attention modules; the subscript $i$ of $W_i^Q$, $W_i^K$, $W_i^V$ indicates the $i$-th Attention module, and the superscripts $Q$, $K$, $V$ indicate the corresponding matrix. $W^O$ is a learnable parameter matrix of the MultiHead module, and LayerNorm denotes the layer normalization function. $W_1$, $b_1$, $W_2$, $b_2$ denote the learnable parameters of the two-layer feed-forward neural network (FFN), with Relu as the activation function;
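The formulas above follow the standard Transformer encoder; the following is a minimal PyTorch sketch of the scaled dot-product attention and the multi-head combination with the residual-plus-LayerNorm wrapping shown in the MultiHead formula (hyperparameters such as d_model and h are illustrative assumptions):

```python
import math
import torch
import torch.nn as nn

def attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    return torch.softmax(scores, dim=-1) @ V

class MultiHead(nn.Module):
    """Multi-head attention with the LayerNorm(Concat(...) W^O + Q) wrapping used above."""
    def __init__(self, d_model=768, h=12):
        super().__init__()
        self.h, self.d_head = h, d_model // h
        self.W_q = nn.Linear(d_model, d_model)   # stacks W_i^Q for all h heads
        self.W_k = nn.Linear(d_model, d_model)   # stacks W_i^K
        self.W_v = nn.Linear(d_model, d_model)   # stacks W_i^V
        self.W_o = nn.Linear(d_model, d_model)   # W^O
        self.norm = nn.LayerNorm(d_model)

    def forward(self, Q, K, V):
        B, T, _ = Q.shape
        split = lambda x: x.view(B, T, self.h, self.d_head).transpose(1, 2)
        heads = attention(split(self.W_q(Q)), split(self.W_k(K)), split(self.W_v(V)))
        concat = heads.transpose(1, 2).reshape(B, T, -1)   # Concat(head_1 ... head_h)
        return self.norm(self.W_o(concat) + Q)             # LayerNorm(Concat(...) W^O + Q)
```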
After the encoded sequence $U_{cat}^{Enc}$ is obtained, the vector at the position $P_{CLS}$ of the tag [CLS] in the concatenated sequence $U_{cat}$ is taken from $U_{cat}^{Enc}$ as the feature representation of the current dialogue history, denoted here $h_{CLS}$. Then $h_{CLS}$ is fed into two FFN layers and a Softmax layer for classification, yielding the dialogue strategy $\hat{S}_i$, where $i$ denotes the current dialogue turn:
$\hat{S}_i = \mathrm{Softmax}(\mathrm{FFN}(\mathrm{FFN}(h_{CLS})))$
Finally, the loss function for training the dialogue strategy selection module is the multi-class cross-entropy loss:
$\mathcal{L}_{S} = -\sum_{j=1}^{N_S} S_i^j \log \hat{S}_i^j$
where $i$ denotes the current dialogue turn, $N_S$ denotes the total number of dialogue strategies, $\hat{S}_i^j$ is the predicted score of the $j$-th strategy in the $i$-th turn, and $S_i^j$ is the true label of the $j$-th strategy in the $i$-th turn, with positive samples labeled 1 and negative samples labeled 0;
Six dialogue strategies are defined in the dialogue data set: 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', 'giving advice', and 'other'. When the predicted dialogue strategy $\hat{S}_i$ is 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', or 'other', the system switches to the dialogue generation module to give a reply; when $\hat{S}_i$ is predicted to be 'giving advice', the system switches to the psychological state cognition module to predict the user's psychological cognition type, which then guides the reply of the suggestion recommendation module;
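A minimal sketch of this strategy selection step, assuming the HuggingFace transformers library and using bert-base-chinese as a stand-in for the pre-trained Chinese encoder (the checkpoint name and layer sizes are assumptions, not the patent's own choices):

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

NUM_STRATEGIES = 6  # questioning, emotional reflection, self-disclosure,
                    # affirmation and comfort, giving advice, other

class StrategySelector(nn.Module):
    """[CLS]-pooled encoder representation -> two FFN layers -> 6-way strategy logits."""
    def __init__(self, encoder_name="bert-base-chinese"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        self.ffn = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, NUM_STRATEGIES))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        h_cls = out.last_hidden_state[:, 0]    # vector at the [CLS] position
        return self.ffn(h_cls)                 # logits; Softmax is folded into the loss below

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
history = ["u_{i-2}", "r_{i-2}", "u_{i-1}", "r_{i-1}", "u_i"]     # placeholder utterances
u_cat = tokenizer.sep_token.join(history)                          # [CLS]/final [SEP] added by tokenizer
inputs = tokenizer(u_cat, return_tensors="pt", truncation=True, max_length=512)
logits = StrategySelector()(inputs["input_ids"], inputs["attention_mask"])
# Multi-class cross entropy; e.g. index 4 as the true label for "giving advice".
loss = nn.CrossEntropyLoss()(logits, torch.tensor([4]))
```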
2) Establishing the dialogue generation module
The dialogue generation module generates a reply guided by the dialogue context history and the dialogue strategy $\hat{S}_i$. The two most recent turns are taken as the dialogue history and concatenated with the dialogue strategy $\hat{S}_i$ to form the sequence $US_{cat}$, which is fed into a pre-trained Transformer Decoder; the per-step outputs of the Decoder form the complete reply:
$US_E = \mathrm{Embedding}(US_{cat})$
$\mathrm{Response} = \mathrm{Decoder}(US_E)$
$\mathrm{Decoder}(US_E) = \mathrm{Add\&Norm}(\mathrm{MaskedMultiHead}(US_E, US_E, US_E))$
$\mathrm{MaskedMultiHead}(Q, K, V) = \mathrm{LayerNorm}(\mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^O + Q)$
$\mathrm{head}_i = \mathrm{MaskedAttention}(QW_i^Q, KW_i^K, VW_i^V)$
$\mathrm{MaskedAttention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}} \odot \mathrm{Mask}\right)V$
MaskedMultiHead differs from MultiHead in that, when computing the Attention, the prediction for position $i$ may depend only on the known outputs at positions before $i$, so the positions at or after $i$ are masked out; the operator $\odot$ denotes element-wise multiplication of corresponding matrix entries, and Mask is a lower-triangular matrix whose entries on and below the main diagonal are 1 and whose remaining entries are 0. The loss function for training the dialogue generation module is:
$\mathcal{L}_{G} = -\sum_{j=1}^{T_r} \log P\!\left(r_i^j \mid r_i^{1:j-1}\right)$
where $T_r$ denotes the length of the true reply $r_i$, $i$ denotes the current dialogue turn, $r_i^j$ denotes the $j$-th reply character of the true reply $r_i$ in the $i$-th turn, and $r_i^{1:j-1}$ denotes the sequence of the 1st to $(j{-}1)$-th reply characters of $r_i$;
3) Establishing the psychological state cognition module
The psychological state cognition module predicts the user's psychological cognition condition from the user's dialogue history in order to guide the suggestion selection of the suggestion recommendation module; this narrows the retrieval range of the knowledge base and speeds up retrieval. All of the current user dialogue history is concatenated as $U_{cur} = [\mathrm{CLS}]\,u_i\,[\mathrm{SEP}]\,r_{i-1}\,[\mathrm{SEP}]\,u_{i-1}\,[\mathrm{SEP}]\ldots[\mathrm{SEP}]\,u_1\,[\mathrm{SEP}]\,r_1\,[\mathrm{SEP}]$, where $i$ denotes the current dialogue turn. If concatenating the utterance $u_j$ and reply $r_j$ of some turn $j$ would make the sequence $U_{cur}$ exceed 512 tokens in length, the history of turn $j$ is omitted, and the history of all turns before $j$ is likewise left unconcatenated. $U_{cur}$ is fed into a pre-trained Transformer Encoder to obtain the sentence-level feature representation, which is classified through FFN and Softmax layers:
$U_{cur}^{Enc} = \mathrm{Encoder}(\mathrm{Embedding}(U_{cur}))$
After the encoded sequence $U_{cur}^{Enc}$ is obtained, the vector at the position $P_{CLS}$ of the tag [CLS] in the concatenated sequence $U_{cur}$ is taken from $U_{cur}^{Enc}$ as the feature representation of all of the current user dialogue history, denoted here $h_{CLS}^{cur}$. Then $h_{CLS}^{cur}$ is fed into two FFN layers and a Softmax layer for classification, predicting the user's current psychological cognition type $\hat{O}$:
$\hat{O} = \mathrm{Softmax}(\mathrm{FFN}(\mathrm{FFN}(h_{CLS}^{cur})))$
Finally, the loss function for training the psychological state cognition module is the multi-class cross-entropy loss:
$\mathcal{L}_{O} = -\sum_{j=1}^{N_O} O_j \log \hat{O}_j$
where $N_O$ denotes the total number of psychological cognition state types, $\hat{O}_j$ denotes the predicted score of the $j$-th psychological cognition state type, and $O_j$ is the true label of the $j$-th type, with positive samples labeled 1 and negative samples labeled 0;
4) Establishing the suggestion recommendation module
The suggestion recommendation module uses the predicted psychological cognition state $\hat{O}$ of the user to filter the psychological question-answer suggestion knowledge base, selects the related candidate suggestions, scores the candidates using the dialogue context history, and replies with the highest-scoring suggestion.
Building the suggestion recommendation module requires a question-answer suggestion knowledge base that satisfies the following: for each question there are multiple replies, each reply has a score, and each question-answer pair is annotated with the user psychological cognition state type of the question.
The user's utterance in the current dialogue turn $j$, i.e., $u_j$, is taken as the key sentence and converted into a sentence vector $U_{key}$ by a pre-trained language model; likewise, all questions in the question-answer suggestion knowledge base are converted into vector representations by the pre-trained language model. The knowledge base here is a psychological question-answer suggestion knowledge base whose raw data come from question-answer records between patients and psychological counselors, screened by a psychology team; the knowledge base requires multiple replies for every question posed by a patient;
After the user's dialogue vector $U_{key}$ and the psychological cognition type $\hat{O}$ predicted by the psychological state cognition module are obtained, the questions in the question-answer suggestion knowledge base are first filtered by $\hat{O}$, keeping the questions of the same psychological cognition type. The vector representations of the filtered questions are then matched for similarity with the Annoy tool, the 10 most similar questions are selected, and all replies under these questions are recorded as the candidate reply set $A$. Each candidate reply $A_i$ in $A$ is concatenated with the key sentence $u_j$ as $[u_j; A_i] = [\mathrm{CLS}]\,u_j\,[\mathrm{SEP}]\,A_i\,[\mathrm{SEP}]$. Finally, every concatenated sentence is fed into a scoring model Score, composed of a Transformer Encoder and a Sigmoid function, and the candidate reply $A_i$ with the highest score is returned to the user:
$\mathrm{Score}_i = \mathrm{Score}(A_i)$
$\mathrm{Score}(A_i) = \mathrm{Sigmoid}(\mathrm{Encoder}[u_j; A_i])$
$A_{best} = \arg\max_{A_i \in A} \mathrm{Score}_i$
where $A$ denotes the set of all recommended replies under the 10 questions in the question-answer suggestion knowledge base most relevant to the key sentence $U_{key}$, retrieved with the Annoy tool; $\mathrm{Score}_i$ denotes the score of candidate reply $A_i$ against $U_{key}$, and $A_{best}$ denotes the candidate reply with the highest score;
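A minimal sketch of this retrieval-and-scoring step, assuming the Annoy library for nearest-neighbour search and an in-memory view of the knowledge base whose field names (question, state, replies, question_vec) are placeholders; the scorer is assumed to be a module mapping a tokenized sentence pair to a single logit:

```python
import torch
from annoy import AnnoyIndex

def retrieve_candidates(kb, u_key_vec, predicted_state, dim=768, top_q=10):
    """Filter questions by the predicted psychological cognition type, find the 10 most
    similar questions with Annoy, and collect every reply under them as candidate set A."""
    filtered = [e for e in kb if e["state"] == predicted_state]
    index = AnnoyIndex(dim, "angular")
    for i, entry in enumerate(filtered):
        index.add_item(i, entry["question_vec"])
    index.build(10)                                     # 10 trees (illustrative)
    nearest = index.get_nns_by_vector(u_key_vec, top_q)
    return [reply for i in nearest for reply in filtered[i]["replies"]]

def best_reply(candidates, u_j, scorer, tokenizer):
    """Score every [CLS] u_j [SEP] A_i [SEP] pair with the Encoder+Sigmoid scorer
    and return the highest-scoring candidate A_best."""
    scores = []
    for a_i in candidates:
        enc = tokenizer(u_j, a_i, return_tensors="pt", truncation=True, max_length=512)
        scores.append(torch.sigmoid(scorer(**enc)).item())
    best_idx = max(range(len(candidates)), key=lambda k: scores[k])
    return candidates[best_idx]
```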
To train the scoring model Score composed of a Transformer Encoder and a Sigmoid function, the question-answer suggestion knowledge base data are first preprocessed: for each question sentence Query in the knowledge base, all of its $N$ replies are recorded as the recommended replies $R_{pos}$, and $N$ replies are drawn at random, with replacement, from the replies of other questions as the non-recommended replies $R_{neg}$; $R_{pos}$ and $R_{neg}$ together form the reply set $R$. The question sentence is then concatenated with each reply in $R$ as $UR_i = [\mathrm{CLS}]\,\mathrm{Query}\,[\mathrm{SEP}]\,R_i\,[\mathrm{SEP}]$, where $R_i$ denotes the $i$-th reply in $R$; when the concatenated $R_i$ belongs to the recommended replies $R_{pos}$, it is a positive sample with label $y = 1$, and when it belongs to the non-recommended replies $R_{neg}$, the label is $y = 0$;
$\mathrm{Score}_i = \mathrm{Score}(UR_i)$
$\mathrm{Score}(UR_i) = \mathrm{Sigmoid}(\mathrm{Encoder}[\mathrm{Query}; UR_i])$
Finally, the loss function for training the scoring model is the binary cross-entropy loss:
$\mathcal{L}_{R} = -\sum_i \left[ y_i \log \mathrm{Score}_i + (1 - y_i) \log(1 - \mathrm{Score}_i) \right]$
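A minimal sketch of the positive/negative pair construction and the binary cross-entropy training step (assuming the HuggingFace transformers library; bert-base-chinese stands in for the pre-trained encoder, and the knowledge-base field names are placeholders):

```python
import random
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class ReplyScorer(nn.Module):
    """Transformer Encoder + Sigmoid scorer; the Sigmoid is folded into BCEWithLogitsLoss."""
    def __init__(self, encoder_name="bert-base-chinese"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)

    def forward(self, **enc):
        return self.head(self.encoder(**enc).last_hidden_state[:, 0]).squeeze(-1)

def make_training_pairs(kb_entry, kb):
    """For one Query: its own N replies are positives (y=1); N replies drawn with
    replacement from the replies of other questions are negatives (y=0)."""
    positives = kb_entry["replies"]
    others = [r for e in kb if e is not kb_entry for r in e["replies"]]
    negatives = [random.choice(others) for _ in positives]
    return ([(kb_entry["question"], r, 1.0) for r in positives]
            + [(kb_entry["question"], r, 0.0) for r in negatives])

def training_step(scorer, tokenizer, pairs):
    queries, replies, labels = zip(*pairs)
    enc = tokenizer(list(queries), list(replies),          # [CLS] Query [SEP] R_i [SEP]
                    padding=True, truncation=True, max_length=512, return_tensors="pt")
    logits = scorer(**enc)
    return nn.BCEWithLogitsLoss()(logits, torch.tensor(labels))  # binary cross entropy
```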
At present, the world faces a contradiction between a large population with mental health problems and a scarcity of professional psychological counselors, so actively promoting the participation of artificial intelligence in relieving psychological problems is a focus of attention both at home and abroad. Compared with a human counselor, a psychological counseling system has the following characteristics: first, it is inexpensive or even free; second, those seeking counseling may lower their defenses more easily toward a 'machine', which makes it easier to establish trust; third, it can exist in the form of an app, truly enabling one-to-one interaction anytime and anywhere, and can provide effective psychological counseling to a large number of users within a given period of time.
A psychological counseling system can play a role in many scenarios. In education, it can help students manage their emotions outside class and relieve study and examination pressure; in the workplace, it can regularly help workers relieve work stress; in elderly-care centers, it can provide emotional support and companionship; and in hospitals, it can help patients reduce anxiety about illness. Such a system has a wide range of application scenarios: it can enrich users' daily lives, support mental well-being, effectively improve psychological health, save nursing resources, reduce the burden on caregivers, and is not affected by caregivers' cultural differences. In the future, the psychological counseling system aims to combine advanced artificial intelligence technology with authoritative psychologists, explore innovative service and working modes, combine psychological theory with practice, and build a social psychological service network. This is conducive to building a social psychological service system, to everyday psychological testing and emotional relief for the public, and to establishing a healthy awareness of mental well-being in society.
The invention aims to ease the contradiction between the enormous demand for psychological counseling and the scarcity of psychologists. Guided by affective computing theory and helping-skills theory, it develops a deep-learning-based emotion perception and support dialogue system in which artificial intelligence provides psychological counseling, recommends suitable solutions to psychological problems for users in need, and offers effective, convenient, and low-cost emotional support. The invention has the following specific characteristics:
1) Addressing the problem that the emotional support process of existing dialogue systems is fixed, the project constructs multiple dialogue strategies under the guidance of helping-skills theory, can adaptively select a dialogue strategy according to the dialogue context, and guides the dialogue generation module to produce empathetic reply content, providing more effective emotional support for users.
2) Addressing the problem that the reply quality of existing psychological counseling systems often suffers when giving suggestions, high-quality replies are provided by retrieving the question-answer suggestion knowledge base.
By constructing this system, emotion perception and support rich in empathy can be provided to users, effectively relieving the accumulation of negative emotions and, to a certain extent, giving due attention to people's mental health.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the specific embodiments and the accompanying drawings. Except for the content specifically mentioned below, the procedures, conditions, experimental methods, and the like for implementing the invention are general knowledge and common knowledge in the art, and the invention is not particularly limited in these respects.
Referring to FIG. 1, the present invention mainly includes the following steps:
Step one: predict the dialogue strategy of the current turn from the user's dialogue history.
A dialogue strategy is adaptively selected based on the dialogue history to guide the dialogue generation module in reply generation;
The six dialogue strategies 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', 'giving advice', and 'other' are defined in the dialogue strategy selection module.
The current dialogue history $U$ is concatenated in chronological order, with the tag [SEP] separating the different utterances, the tag [CLS] prepended to the concatenated sequence, and a final [SEP] appended, giving the concatenated sequence $U_{cat} = [\mathrm{CLS}]\,u_{i-2}\,[\mathrm{SEP}]\,r_{i-2}\,[\mathrm{SEP}]\,u_{i-1}\,[\mathrm{SEP}]\,r_{i-1}\,[\mathrm{SEP}]\,u_i\,[\mathrm{SEP}]$. If the length of $U_{cat}$ exceeds 512 tokens, the sequence is truncated by whole utterances, removing the earliest utterances in order until the sequence length no longer exceeds 512;
The Transformer Encoder in the dialogue strategy selection module is initialized with a Chinese ERNIE pre-trained model. Training follows the pre-train and fine-tune paradigm: after the Encoder is initialized, it is trained in a supervised manner on the full data of the dialogue data set;
Step two: determine whether to switch to the dialogue generation module or the psychological state cognition module according to the dialogue strategy predicted in step one:
If the predicted dialogue strategy is 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', or 'other', the dialogue generation module gives the user a reply; if the predicted strategy is 'giving advice', the psychological state cognition module predicts the user's psychological cognition state.
1) The dialogue generation module generates the reply content from the user's dialogue history and the current dialogue strategy. The two most recent turns are taken as the dialogue history and concatenated with the dialogue strategy $\hat{S}_i$ to form the sequence $US_{cat}$, which is fed into a pre-trained Transformer Decoder; the per-step outputs of the Decoder form the complete reply;
The Transformer Decoder is initialized with the Chinese pre-trained model RoBERTa. In the training stage, to improve training effectiveness, the concatenated dialogue strategy is the ground-truth strategy $S_i$; in the testing stage, the predicted strategy $\hat{S}_i$ is concatenated instead. As with the truncation of the concatenated sequence in the dialogue strategy module, if the length of the concatenated sequence $US_{cat}$ exceeds 512 tokens, it is truncated by whole utterances, removing the earliest utterances in order until the length no longer exceeds 512;
After the Transformer Decoder is initialized with the Chinese pre-trained RoBERTa model, a standard causal mask matrix is used for the attention over the Decoder input; that is, each position can only see the data before the current position and not the data after it. Viewed as a matrix, this mask is a lower triangle: the lower half is 1 and the upper half is 0. Meanwhile, since the Decoder output must produce the generated reply, a language-model head is attached after the Decoder to complete the generation. Training follows the pre-train and fine-tune paradigm: after the Decoder is initialized, it is trained in a supervised manner on the full data of the dialogue data set;
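A minimal sketch of the lower-triangular mask described above, which restricts every position to attend only to itself and earlier positions:

```python
import torch

def causal_mask(seq_len):
    """Lower-triangular mask: row i can see columns 0..i only (the lower half is 1,
    the upper half is 0, as described above)."""
    return torch.tril(torch.ones(seq_len, seq_len))

mask = causal_mask(5)
# tensor([[1., 0., 0., 0., 0.],
#         [1., 1., 0., 0., 0.],
#         [1., 1., 1., 0., 0.],
#         [1., 1., 1., 1., 0.],
#         [1., 1., 1., 1., 1.]])
```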
2) Psychological state cognition module: all of the current user dialogue history is concatenated as $U_{cur} = [\mathrm{CLS}]\,u_i\,[\mathrm{SEP}]\,r_{i-1}\,[\mathrm{SEP}]\,u_{i-1}\,[\mathrm{SEP}]\ldots[\mathrm{SEP}]\,u_1\,[\mathrm{SEP}]\,r_1\,[\mathrm{SEP}]$, where $i$ denotes the current turn. If concatenating the utterance $u_j$ and reply $r_j$ of some turn $j$ would make the sequence $U_{cur}$ exceed 512 tokens, the history of turn $j$ is omitted, and the history of all turns before $j$ is likewise left unconcatenated. $U_{cur}$ is fed into a pre-trained Transformer Encoder to obtain the sentence-level feature representation, which is classified through FFN and Softmax layers.
The psychological state cognition module initializes its Transformer Encoder with the Chinese pre-trained model BERT-Large. Training follows the pre-train and fine-tune paradigm: after the Encoder is initialized, it is trained in a supervised manner on the full data of the dialogue data set;
3) After the user's psychological cognition condition is classified, the suggestion recommendation module is entered: the question-answer suggestion knowledge base is filtered by the predicted psychological cognition state of the user, the related candidate replies are selected, the candidate suggestions are scored against the user's dialogue history, and the highest-scoring candidate reply is returned.
The user's utterance in the current dialogue turn $j$, i.e., $u_j$, is taken as the key sentence and converted into a sentence vector $U_{key}$ by a pre-trained language model; likewise, all questions in the question-answer suggestion knowledge base are converted into vector representations by the pre-trained language model;
After the user's dialogue vector $U_{key}$ and the psychological cognition type $\hat{O}$ predicted by the psychological state cognition module are obtained, the questions in the question-answer suggestion knowledge base are first filtered by $\hat{O}$, keeping the questions of the same psychological cognition type. The vector representations of the filtered questions are matched for similarity with the Annoy tool, the 10 most similar questions are selected, and all replies under these questions are recorded as the candidate reply set $A$. Each candidate reply $A_i$ in $A$ is concatenated with the key sentence $u_j$ as $[u_j; A_i] = [\mathrm{CLS}]\,u_j\,[\mathrm{SEP}]\,A_i\,[\mathrm{SEP}]$. Finally, every concatenated sentence is fed into the scoring model Score, composed of a Transformer Encoder and a Sigmoid function, and the candidate reply $A_i$ with the highest score is returned to the user:
$\mathrm{Score}_i = \mathrm{Score}(A_i)$
$\mathrm{Score}(A_i) = \mathrm{Sigmoid}(\mathrm{Encoder}[u_j; A_i])$
$A_{best} = \arg\max_{A_i \in A} \mathrm{Score}_i$
where $A$ denotes the set of all recommended replies under the 10 questions in the question-answer suggestion knowledge base most relevant to the key sentence $U_{key}$, retrieved with the Annoy tool; $\mathrm{Score}_i$ denotes the score of candidate reply $A_i$ against $U_{key}$, and $A_{best}$ denotes the candidate reply with the highest score.
To train the scoring model Score composed of a Transformer Encoder and a Sigmoid function, the question-answer suggestion knowledge base data are first preprocessed: for each question sentence Query in the knowledge base, all of its $N$ replies are recorded as the recommended replies $R_{pos}$, and $N$ replies are drawn at random, with replacement, from the replies of other questions as the non-recommended replies $R_{neg}$; $R_{pos}$ and $R_{neg}$ together form the reply set $R$. The question sentence is then concatenated with each reply in $R$ as $UR_i = [\mathrm{CLS}]\,\mathrm{Query}\,[\mathrm{SEP}]\,R_i\,[\mathrm{SEP}]$, where $R_i$ denotes the $i$-th reply in $R$; when the concatenated $R_i$ belongs to the recommended replies $R_{pos}$, it is a positive sample with label $y = 1$, and when it belongs to the non-recommended replies $R_{neg}$, the label is $y = 0$. The Transformer Encoder of the scoring model is initialized with the pre-trained model BERT-Large; training follows the pre-train and fine-tune paradigm, with supervised training on the full data after the Encoder is initialized;
$\mathrm{Score}_i = \mathrm{Score}(UR_i)$
$\mathrm{Score}(UR_i) = \mathrm{Sigmoid}(\mathrm{Encoder}[\mathrm{Query}; UR_i])$
Finally, the loss function for training the scoring model is the binary cross-entropy loss:
$\mathcal{L}_{R} = -\sum_i \left[ y_i \log \mathrm{Score}_i + (1 - y_i) \log(1 - \mathrm{Score}_i) \right]$

Claims (1)

1. A method for constructing a deep-learning-based emotion perception and support dialogue system, characterized by comprising the following steps:
1) Establishing the dialogue strategy selection module
A dialogue data set is used, which must satisfy the following: each data record contains a multi-turn dialogue history, each turn is annotated with the dialogue strategy used in its reply, and each record is annotated with a user psychological cognition state type;
For the dialogue data set, let the given dialogue context history be $\{u_1, r_1, u_2, r_2, \ldots, u_m, r_m\}$, where $i$ denotes the dialogue turn, $u_i$ denotes the user utterance in the $i$-th turn, and $r_i$ denotes the system reply in the $i$-th turn; the dialogue strategies $\{S_1, S_2, \ldots, S_m\}$ and the user's psychological cognition state $O$ are given. Given the current dialogue turn $i$, the two most recent turns are taken as the user's current dialogue history, $U = \{u_{i-2}, r_{i-2}, u_{i-1}, r_{i-1}, u_i\}$; when fewer than two previous turns exist, the utterances from the 1st turn up to the current turn are used;
The current dialogue history $U$ is concatenated in chronological order, with a separator tag [SEP] between different utterances, a tag [CLS] prepended to the concatenated sequence, and a final [SEP] appended, giving the concatenated sequence $U_{cat} = [\mathrm{CLS}]\,u_{i-2}\,[\mathrm{SEP}]\,r_{i-2}\,[\mathrm{SEP}]\,u_{i-1}\,[\mathrm{SEP}]\,r_{i-1}\,[\mathrm{SEP}]\,u_i\,[\mathrm{SEP}]$. Taking $U_{cat}$ as input, a pre-trained Transformer Encoder model encodes $U_{cat}$ to obtain the encoded sequence $U_{cat}^{Enc}$ output by the Transformer Encoder model. The specific process is:
$U_E = \mathrm{Embedding}(U_{cat})$
$U_{cat}^{Enc} = \mathrm{Encoder}(U_E)$
$\mathrm{Encoder}(U_E) = \mathrm{Add\&Norm}(\mathrm{MultiHead}(U_E, U_E, U_E))$
$\mathrm{Add\&Norm}(x) = \mathrm{LayerNorm}[\mathrm{FFN}(x) + x]$
$\mathrm{FFN}(x) = \mathrm{Relu}(xW_1 + b_1)W_2 + b_2$
$\mathrm{MultiHead}(Q, K, V) = \mathrm{LayerNorm}(\mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^O + Q)$
$\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}}\right)V$
The Embedding function converts the input sequence into a continuous embedding matrix; $K^T$ denotes the transpose of the matrix $K$; $d_k$ is a constant whose value depends on the number of columns of the matrix $Q$ at computation time. $W_i^Q$, $W_i^K$, $W_i^V$ are learnable parameter matrices of the Attention modules, $h$ of each kind, where $h$ is the number of Attention modules; the subscript $i$ of $W_i^Q$, $W_i^K$, $W_i^V$ indicates the $i$-th Attention module, and the superscripts $Q$, $K$, $V$ indicate the corresponding matrix. $W^O$ is a learnable parameter matrix of the MultiHead module, and LayerNorm denotes the layer normalization function. $W_1$, $b_1$, $W_2$, $b_2$ denote the learnable parameters of the two-layer feed-forward neural network (FFN), with Relu as the activation function;
After the encoded sequence $U_{cat}^{Enc}$ is obtained, the vector at the position $P_{CLS}$ of the tag [CLS] in the concatenated sequence $U_{cat}$ is taken from $U_{cat}^{Enc}$ as the feature representation of the current dialogue history, denoted here $h_{CLS}$. Then $h_{CLS}$ is fed into two FFN layers and a Softmax layer for classification, yielding the dialogue strategy $\hat{S}_i$, where $i$ denotes the current dialogue turn:
$\hat{S}_i = \mathrm{Softmax}(\mathrm{FFN}(\mathrm{FFN}(h_{CLS})))$
Finally, the loss function for training the dialogue strategy selection module is the multi-class cross-entropy loss:
$\mathcal{L}_{S} = -\sum_{j=1}^{N_S} S_i^j \log \hat{S}_i^j$
where $i$ denotes the current dialogue turn, $N_S$ denotes the total number of dialogue strategies, $\hat{S}_i^j$ is the predicted score of the $j$-th strategy in the $i$-th turn, and $S_i^j$ is the true label of the $j$-th strategy in the $i$-th turn, with positive samples labeled 1 and negative samples labeled 0;
Six dialogue strategies are defined in the dialogue data set: 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', 'giving advice', and 'other'. When the predicted dialogue strategy $\hat{S}_i$ is 'questioning', 'emotional reflection', 'self-disclosure', 'affirmation and comfort', or 'other', the system switches to the dialogue generation module to give a reply; when $\hat{S}_i$ is predicted to be 'giving advice', the system switches to the psychological state cognition module to predict the user's psychological cognition type, which then guides the reply of the suggestion recommendation module;
2) Establishing the dialogue generation module
The two most recent turns are taken as the dialogue history and concatenated with the dialogue strategy $\hat{S}_i$ to form the sequence $US_{cat}$, which is fed into a pre-trained Transformer Decoder; the per-step outputs of the Decoder form the complete reply:
$US_E = \mathrm{Embedding}(US_{cat})$
$\mathrm{Response} = \mathrm{Decoder}(US_E)$
$\mathrm{Decoder}(US_E) = \mathrm{Add\&Norm}(\mathrm{MaskedMultiHead}(US_E, US_E, US_E))$
$\mathrm{MaskedMultiHead}(Q, K, V) = \mathrm{LayerNorm}(\mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)W^O + Q)$
$\mathrm{head}_i = \mathrm{MaskedAttention}(QW_i^Q, KW_i^K, VW_i^V)$
$\mathrm{MaskedAttention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^T}{\sqrt{d_k}} \odot \mathrm{Mask}\right)V$
MaskedMultiHead differs from MultiHead in that, when computing the Attention, the prediction for position $i$ may depend only on the known outputs at positions before $i$, so the positions at or after $i$ are masked out; the operator $\odot$ denotes element-wise multiplication of corresponding matrix entries, and Mask is a lower-triangular matrix whose entries on and below the main diagonal are 1 and whose remaining entries are 0;
The loss function for training the dialogue generation module is:
$\mathcal{L}_{G} = -\sum_{j=1}^{T_r} \log P\!\left(r_i^j \mid r_i^{1:j-1}\right)$
where $T_r$ denotes the length of the true reply $r_i$, $i$ denotes the current dialogue turn, $r_i^j$ denotes the $j$-th reply character of the true reply $r_i$ in the $i$-th turn, and $r_i^{1:j-1}$ denotes the sequence of the 1st to $(j{-}1)$-th reply characters of $r_i$;
3) Establishing the psychological state cognition module
All of the current user dialogue history is concatenated as $U_{cur} = [\mathrm{CLS}]\,u_i\,[\mathrm{SEP}]\,r_{i-1}\,[\mathrm{SEP}]\,u_{i-1}\,[\mathrm{SEP}]\ldots[\mathrm{SEP}]\,u_1\,[\mathrm{SEP}]\,r_1\,[\mathrm{SEP}]$, where $i$ denotes the current turn. If concatenating the utterance $u_j$ and reply $r_j$ of some turn $j$ would make the sequence $U_{cur}$ exceed 512 tokens in length, the history of turn $j$ is omitted, and the history of all turns before $j$ is likewise left unconcatenated. $U_{cur}$ is fed into a pre-trained Transformer Encoder to obtain the sentence-level feature representation, which is classified through FFN and Softmax layers:
$U_{cur}^{Enc} = \mathrm{Encoder}(\mathrm{Embedding}(U_{cur}))$
After the encoded sequence $U_{cur}^{Enc}$ is obtained, the vector at the position $P_{CLS}$ of the tag [CLS] in the concatenated sequence $U_{cur}$ is taken from $U_{cur}^{Enc}$ as the feature representation of all of the current user dialogue history, denoted here $h_{CLS}^{cur}$. Then $h_{CLS}^{cur}$ is fed into two FFN layers and a Softmax layer for classification, predicting the user's current psychological cognition type $\hat{O}$:
$\hat{O} = \mathrm{Softmax}(\mathrm{FFN}(\mathrm{FFN}(h_{CLS}^{cur})))$
Finally, the loss function for training the psychological state cognition module is the multi-class cross-entropy loss:
$\mathcal{L}_{O} = -\sum_{j=1}^{N_O} O_j \log \hat{O}_j$
where $N_O$ denotes the total number of psychological cognition state types, $\hat{O}_j$ denotes the predicted score of the $j$-th psychological cognition state type, and $O_j$ is the true label of the $j$-th type, with positive samples labeled 1 and negative samples labeled 0;
4) Establishing the suggestion recommendation module
A question-answer suggestion knowledge base is used, which must satisfy the following: each question in the knowledge base has multiple replies, and each question-answer pair is annotated with the user psychological cognition state type of the question;
The user's utterance in the current dialogue turn $j$, i.e., $u_j$, is taken as the key sentence and converted into a sentence vector $U_{key}$ by a pre-trained language model; likewise, all questions in the question-answer suggestion knowledge base are converted into vector representations by the pre-trained language model;
After the user's dialogue vector $U_{key}$ and the psychological cognition type $\hat{O}$ predicted by the psychological state cognition module are obtained, the questions in the question-answer suggestion knowledge base are first filtered by $\hat{O}$, keeping the questions of the same psychological cognition type. The vector representations of the filtered questions are matched for similarity with the Annoy tool, the 10 most similar questions are selected, and all replies under these questions are recorded as the candidate reply set $A$. Each candidate reply $A_i$ in $A$ is concatenated with the key sentence $u_j$ as $[u_j; A_i] = [\mathrm{CLS}]\,u_j\,[\mathrm{SEP}]\,A_i\,[\mathrm{SEP}]$. Finally, every concatenated sentence is fed into a scoring model Score, composed of a Transformer Encoder and a Sigmoid function, and the candidate reply $A_i$ with the highest score is returned to the user:
$\mathrm{Score}_i = \mathrm{Score}(A_i)$
$\mathrm{Score}(A_i) = \mathrm{Sigmoid}(\mathrm{Encoder}[u_j; A_i])$
$A_{best} = \arg\max_{A_i \in A} \mathrm{Score}_i$
where $A$ denotes the set of all recommended replies under the 10 questions in the question-answer suggestion knowledge base most relevant to the key sentence $U_{key}$, retrieved with the Annoy tool; $\mathrm{Score}_i$ denotes the score of candidate reply $A_i$ against $U_{key}$, and $A_{best}$ denotes the candidate reply with the highest score;
To train the scoring model Score composed of a Transformer Encoder and a Sigmoid function, the question-answer suggestion knowledge base data are first preprocessed: for each question sentence Query in the knowledge base, all of its $N$ replies are recorded as the recommended replies $R_{pos}$, and $N$ replies are drawn at random, with replacement, from the replies of other questions as the non-recommended replies $R_{neg}$; $R_{pos}$ and $R_{neg}$ together form the reply set $R$. The question sentence is then concatenated with each reply in $R$ as $UR_i = [\mathrm{CLS}]\,\mathrm{Query}\,[\mathrm{SEP}]\,R_i\,[\mathrm{SEP}]$, where $R_i$ denotes the $i$-th reply in $R$; when the concatenated $R_i$ belongs to the recommended replies $R_{pos}$, it is a positive sample with label $y = 1$, and when it belongs to the non-recommended replies $R_{neg}$, the label is $y = 0$;
$\mathrm{Score}_i = \mathrm{Score}(UR_i)$
$\mathrm{Score}(UR_i) = \mathrm{Sigmoid}(\mathrm{Encoder}[\mathrm{Query}; UR_i])$
Finally, the loss function for training the scoring model is the binary cross-entropy loss:
$\mathcal{L}_{R} = -\sum_i \left[ y_i \log \mathrm{Score}_i + (1 - y_i) \log(1 - \mathrm{Score}_i) \right]$
CN202210332004.4A 2022-03-31 2022-03-31 Deep learning-based emotion perception and support dialog system construction method Pending CN114999610A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210332004.4A CN114999610A (en) 2022-03-31 2022-03-31 Deep learning-based emotion perception and support dialog system construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210332004.4A CN114999610A (en) 2022-03-31 2022-03-31 Deep learning-based emotion perception and support dialog system construction method

Publications (1)

Publication Number Publication Date
CN114999610A true CN114999610A (en) 2022-09-02

Family

ID=83023386

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210332004.4A Pending CN114999610A (en) 2022-03-31 2022-03-31 Deep learning-based emotion perception and support dialog system construction method

Country Status (1)

Country Link
CN (1) CN114999610A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116483981A (en) * 2023-06-16 2023-07-25 北京好心情互联网医院有限公司 Dialogue generation method, device, equipment and storage medium
CN116501852A (en) * 2023-06-29 2023-07-28 之江实验室 Controllable dialogue model training method and device, storage medium and electronic equipment
CN116501852B (en) * 2023-06-29 2023-09-01 之江实验室 Controllable dialogue model training method and device, storage medium and electronic equipment
CN117828063A (en) * 2024-01-10 2024-04-05 广东数业智能科技有限公司 Psychological field data generation and model training method and device and storage medium
CN117828063B (en) * 2024-01-10 2024-05-17 广东数业智能科技有限公司 Psychological field data generation and model training method and device and storage medium

Similar Documents

Publication Publication Date Title
CN109885672B (en) Question-answering type intelligent retrieval system and method for online education
CN110781680B (en) Semantic similarity matching method based on twin network and multi-head attention mechanism
Cuayáhuitl et al. Ensemble-based deep reinforcement learning for chatbots
US10860630B2 (en) Methods and systems for generating and traversing discourse graphs using artificial neural networks
CN110175227B (en) Dialogue auxiliary system based on team learning and hierarchical reasoning
CN114999610A (en) Deep learning-based emotion perception and support dialog system construction method
Weber et al. Pedagogical Agents for Interactive Learning: A Taxonomy of Conversational Agents in Education.
CN111275401B (en) Intelligent interview method and system based on position relation
CN110569356B (en) Interviewing method and device based on intelligent interviewing interaction system and computer equipment
CN110532363B (en) Task-oriented automatic dialogue method based on decision tree
Juan et al. Particle swarm optimization neural network for research on artificial intelligence college English classroom teaching framework
CN113672720A (en) Power audit question and answer method based on knowledge graph and semantic similarity
Mazzei et al. Analyzing social robotics research with natural language processing techniques
CN116741411A (en) Intelligent health science popularization recommendation method and system based on medical big data analysis
Catterall Using computer programs to code qualitative data
Chandiok et al. CIT: Integrated cognitive computing and cognitive agent technologies based cognitive architecture for human-like functionality in artificial systems
CN117609486A (en) Intelligent dialogue system in psychological field
Kaiss et al. Effectiveness of an Adaptive Learning Chatbot on Students’ Learning Outcomes Based on Learning Styles.
Day et al. AI customer service system with pre-trained language and response ranking models for university admissions
CN115617960A (en) Post recommendation method and device
CN113011196A (en) Concept-enhanced representation and one-way attention-containing subjective question automatic scoring neural network model
CN117438047A (en) Psychological consultation model training and psychological consultation processing method and device and electronic equipment
CN116737911A (en) Deep learning-based hypertension question-answering method and system
GUO et al. Design and implementation of intelligent medical customer service robot based on deep learning
Nair HR based Chatbot using deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination