CN117828063A - Psychological field data generation and model training method and device and storage medium - Google Patents


Info

Publication number
CN117828063A
CN117828063A (application number CN202410033864.7A; granted publication CN117828063B)
Authority
CN
China
Prior art keywords
strategy
dialogue
psychological
model
language model
Prior art date
Legal status
Granted
Application number
CN202410033864.7A
Other languages
Chinese (zh)
Other versions
CN117828063B (en)
Inventor
唐天驰
刘昌松
刘胜坤
张汝民
Current Assignee
Guangdong Shuye Intelligent Technology Co ltd
Original Assignee
Guangdong Shuye Intelligent Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Shuye Intelligent Technology Co ltd filed Critical Guangdong Shuye Intelligent Technology Co ltd
Priority to CN202410033864.7A priority Critical patent/CN117828063B/en
Publication of CN117828063A publication Critical patent/CN117828063A/en
Application granted granted Critical
Publication of CN117828063B publication Critical patent/CN117828063B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The invention belongs to the technical field of artificial intelligence and provides a psychological-domain data generation and model training method, device, and storage medium. The method aims to solve the current lack of an effective way to generate high-quality dialogue data in the psychological field and to train a vertical-domain large language model. A strategy-annotated corpus and an open-source dataset are taken as input; a Markov chain is constructed to generate strategies; the dataset is deduplicated; and a topic list and a strategy-group list are initialized for generating multi-round dialogues. The most similar dialogue case is found through strategy matching and provided to the expert large language model to guide its replies. Templates are defined and populated with cases, questions, and strategies, and multi-round dialogues are generated and saved as a psychological-domain dataset. A large language model is then trained to adapt to multi-round dialogue tasks in the psychological field, and its expertise and pertinence are improved by adopting special identifiers, a causal language model, and a weighted-average loss function.

Description

Psychological field data generation and model training method and device and storage medium
Technical Field
The invention belongs to the field of natural language understanding in artificial-intelligence human-machine dialogue systems, and provides a psychological-domain data generation and model training method, device, and storage medium.
Background
In recent years, with the rapid growth of large-scale corpora and hardware capacity, researchers have found that model capability can be continually increased by scaling up models and training data, which ultimately drove the emergence of large language models such as GPT-3 (175 billion parameters), PaLM (540 billion parameters), and LLaMA (65 billion parameters). Compared with smaller models, large language models perform remarkably well in understanding and generating text, and are increasingly a central trend in artificial-intelligence research. By using large language models for efficient document analysis, complex data interpretation, document writing, and the like, the research methods of the natural and social sciences have been thoroughly changed and interdisciplinary cooperation has been promoted.
While large language models have great potential to accomplish a wide variety of tasks in the general domain, effectively turning them into a "conversational robot" that solves practical problems still faces great challenges. This has led to the rise of "large language model domain specialization". First, there are significant differences in dialogue and language styles between different fields, roles, and tasks, from psychological counseling to legal consultation to online chat; the pertinence and expertise required of a large language model's replies differ across scenarios, with some requiring a wealth of past case experience and others requiring a deep understanding of specialized, complex theoretical knowledge. In addition, different domains, organizations, and teams have their own "business models" for different businesses, and a single generic large language model has no way to directly and effectively solve the problems of a particular domain. Second, in many scenarios domain knowledge must be deep, real-time, and accurate; such knowledge resources are the proprietary assets and core competencies of institutions and organizations and cannot be disclosed to a general large language model.
Downstream task learning is common practice in the field of large language models: a large language model after domain specialization is better suited to specific tasks and to integration into applications.
As shown in fig. 5, methods for large-language-model domain specialization can be classified into three types: external augmentation, prompt engineering, and model fine-tuning. Model fine-tuning brings the most remarkable performance improvement to a large language model; this method requires training the model on a smaller, domain-specific dataset. The latest research shows that, by adopting imitation learning, an open-source small-scale large language model can "distill" knowledge from a closed-source expert large language model (such as ChatGPT): prompt engineering is used to mine the knowledge contained in the higher-performing expert model, and the lower-performing small-scale model then learns that knowledge, improving its performance on specific tasks. An open-source small-scale large language model (about 7B parameters) has advantages over a more general large model in hardware requirements, computation cost, and engineering-optimization difficulty, and through imitation learning a proprietary model comparable to the closed-source expert large language model on specific tasks can be built at only a small capital cost, thereby realizing large-language-model domain specialization. With increasing social pressure, rapid information transmission, and changes in family structure, the mental-health problems of teenagers are gradually attracting widespread attention. Teenagers face many challenges, including academic stress, interpersonal relationships, and body image, which can negatively impact their mental health. It is therefore urgent to conduct mental-health risk screening on campuses using advanced artificial-intelligence techniques and to provide intelligent psychological counseling services for at-risk individuals.
Applying artificial-intelligence technology to psychological counseling services is a challenge. Existing services mainly guide and reply to users based on predefined dialogue paths and corpora, which remarkably reduces the flexibility of counseling and lacks emotional understanding and empathy. Directly using a general-domain large language model for psychological counseling lacks pertinence and specialization, while creating a large language model specialized in the psychological field for counseling services lacks a sufficient amount of high-quality dialogue-annotation data, and such annotation data depends on time-consuming and expensive manual labor. To date, there is still a lack of methods to efficiently generate high-quality psychological-domain dialogue data and to train a large language model specialized in the psychological field.
The drawbacks of the prior art are mainly manifested in the following aspects:
1. Insufficient generality: while highly versatile, large language models still have significant shortcomings in understanding and generating text for a particular domain. As a result, large language models often require domain specialization to improve their performance on specific tasks. However, the domain-specialization process is complex and time-consuming, and there are significant differences in dialogue and language styles between different domains, roles, and tasks, which limits the applicability of general-domain large language models.
2. Insufficient pertinence: a general large language model has insufficient problem-solving capability in a specific domain and needs domain specialization. However, different domains, organizations, and teams have their own "business models" for different businesses, and a single generic large language model has no way to directly and effectively solve the problems of a particular domain. In addition, many scenarios require deep, real-time, and accurate domain knowledge; these knowledge resources are the proprietary assets and core competencies of institutions and organizations and cannot be disclosed to a general large language model.
3. Data annotation problem: high-quality dialogue-annotation data in the psychological domain is inadequate, making it difficult for large language models to be adequately trained and optimized for psychological counseling services. Such annotation data relies on time-consuming, expensive human labor and is difficult to obtain. How to effectively generate high-quality psychological-domain dialogue data and train a professional psychological-domain large language model has therefore become an urgent problem to be solved.
Disclosure of Invention
The invention aims to solve the current lack of an effective method for generating high-quality dialogue data in the psychological field and for training a vertical-domain large language model. Prompt engineering is a common method for mining the knowledge of a large language model: by writing suitable prompts, a closed-source expert large language model can be guided to generate corpora.
In order to solve the technical problems, the invention adopts the following technical means:
the invention provides a psychological field data generation and model training method, which comprises the following steps:
step 1, strategy simulation step:
the method comprises the steps of inputting a strategy labeling corpus and an open source data set, then simulating manual labeling to generate a strategy by constructing a Markov chain, then de-duplicating the open source data set, initializing a topic list and a strategy group list, and finally outputting the topic list and the strategy group list to a multi-round dialogue generation step.
Step 2, dialogue case retrieval step:
inputting a strategy group and questions, finding similar candidate dialogue cases through strategy matching, encoding the candidate dialogue cases and the questions, calculating the cosine similarity between them to find the best-matched dialogue case as a sample, and finally providing the found best-matched dialogue case as a reference case to the expert large language model of the multi-round dialogue generation step, to better guide it in generating replies.
Step 3, a multi-round dialogue generating step:
defining a template, filling the template with dialogue cases, problems of users and strategies generated based on Markov chains and then inputting the large language model to generate multiple rounds of dialogue, and storing data to form a psychological domain multiple rounds of dialogue data set;
Step 4, model training step:
the method comprises the steps of training a large language model based on multi-round dialogue data to adapt to multi-round dialogue tasks in the psychological field, distinguishing different parts of the dialogue data by adding special identifiers, keeping the generated causal relationship by using a causal language model, improving the expertise of the model in the psychological field by using a weighted average loss function of a vector space distance loss function and a cross entropy loss function, calculating loss and carrying out weighted average, and training a model to predict a reply strategy and answer so that the model has more expertise and pertinence on the problems in the psychological field.
In the above technical solution, the policy simulation step specifically includes the following steps:
step 2.1, inputting strategy labeling corpus and open source data set;
the strategy-annotated corpus is manually annotated by a team of psychological-domain experts, covering psychology-related areas such as the daily psychological troubles and complaints of teenagers and common-sense psychological questions;
the open source data set comprises multiple rounds of dialogue data in the psychological vertical field, and corpus with low similarity is reserved as topics after duplication removal;
step 2.2, constructing a Markov chain and simulating manual annotation to generate strategies based on the existing strategy-annotated corpus; specifically, first counting the first strategy marker encountered in the reply content of each strategy-annotated corpus and normalizing to obtain an initial strategy distribution, then counting and normalizing the number of adjacent pairs of strategy markers in each corpus to calculate the probability of transitioning between strategies, finally obtaining a strategy transition matrix; a Markov chain is uniquely determined by the initial strategy distribution together with the strategy transition matrix, and this Markov chain is used for subsequent strategy generation;
Step 2.3, de-duplication is carried out on the open source data set, and corpus with lower similarity is reserved as topics;
step 2.4, initializing a topic list and a strategy-group list, adding the first question of each dialogue in the deduplicated dialogue dataset to the topic list as an initial topic, and then generating, for each topic, a strategy list of length T;
step 2.5, generating a strategy repeatedly for a plurality of times according to one-step transfer of a Markov chain based on the topic list and the strategy group list to obtain a strategy list;
step 2.6, outputting the topic list and the strategy group list to the multi-round dialogue generating step.
In the above technical solution, the dialogue case retrieval step specifically comprises the following units:
the strategy matching unit is used for searching strategies similar to the current strategy group in the strategy labeling corpus to serve as candidate dialogue cases;
the BERT encoder is used for encoding the candidate dialogue cases and the questions into semantic vectors and calculating cosine similarity between the candidate dialogue cases and the questions;
and the output unit is used for outputting, as the sample, the candidate dialogue case with the largest cosine-similarity score with the question.
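A minimal sketch of the retrieval ranking described above. Note that in the invention the texts are encoded into semantic vectors by a BERT encoder; here a simple bag-of-words vector is used as a stand-in so the example stays self-contained, and the sample texts are hypothetical:

```python
import math
from collections import Counter

def vec(text):
    # Stand-in for the BERT encoder: a bag-of-words count vector.
    return Counter(text.split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(question, candidates):
    # Output unit: the candidate case with the largest cosine score.
    q = vec(question)
    return max(candidates, key=lambda c: cosine(q, vec(c)))
```

With a real encoder, `vec` would return a dense semantic vector, but the selection logic is the same.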
In the above technical solution, in the policy matching unit, the similarity of two groups of policies is determined by calculating the minimum editing distance of the two groups of policies.
In the above technical solution, in the policy matching unit, a dynamic programming manner is adopted to search for a policy similar to an input policy in a policy annotation corpus, and the specific steps include:
step 5.1, treating each individual policy within a policy group as a single symbol rather than as a string of characters;
step 5.2, judging the similarity of the two strategies by calculating the minimum editing distance of the two strategies;
and 5.3, screening all dialogue cases with minimum editing distances smaller than a certain threshold value k as candidate dialogue cases.
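Steps 5.1 to 5.3 can be sketched as follows (an illustrative implementation; the function names, strategy names, and example corpus are hypothetical):

```python
def edit_distance(g1, g2):
    # Step 5.1/5.2: classic dynamic-programming edit distance, where each
    # element of a policy group is treated as a single symbol.
    m, n = len(g1), len(g2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if g1[i - 1] == g2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n]

def candidate_cases(query_group, corpus, k=2):
    # Step 5.3: keep cases whose group is within edit distance k of the query.
    return [case for case, group in corpus if edit_distance(query_group, group) < k]
```

For example, the groups `["empathy", "approval"]` and `["empathy", "suggestion"]` differ by one substitution, so their distance is 1.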
In the above technical solution, the multi-round dialogue data generating step specifically includes the following steps:
step 6.1, defining a template: firstly, three templates, namely a reply template 1, a reply template 2 and a question template, are defined, and are used for guiding a large language model to reply and question;
step 6.2, filling template content: when the turn number is 1, the dialogue case output by the dialogue case retrieval step, the initial topic, and the first group of strategies generated based on the Markov chain are filled into the slots of reply template 1 so that the template content is complete; the initial topic is defined as question 1, and the first group of strategies as strategy group 1;
step 6.3, input to the large language model: the filled reply template 1 is input into large language model 1, which replies according to the given prompt, obtaining reply 1 of large language model 1;
Step 6.4. Generating a plurality of rounds of dialogue:
step 6.4.1, defining the current turn number as N, filling reply N into the slot {answer} of the question template, and filling question N into the slot {query};
step 6.4.2. The question template guides the large language model 2 to simulate the user to question to obtain a new question N;
step 6.4.3, selecting an N group of strategies from a strategy list generated by a Markov chain, namely, using the strategy group N as a reply strategy of the problem N;
step 6.4.4, invoking the dialogue case retrieval step to query the dialogue case most relevant to question N and strategy group N, then filling the dialogue case, question N, and strategy group N into reply template 2 to guide large language model 1 to reply, obtaining reply N;
step 6.4.5, judging whether N equals T; if not, setting the turn number N = N + 1 and returning to step 6.4.1; when N equals T, proceeding to the next step 6.4.6;
step 6.4.6, saving the question of each round, the policy group corresponding to the question, and the reply, obtaining a T-round psychological-domain multi-round dialogue of the form (Q1, S1, A1), (Q2, S2, A2), ..., (QT, ST, AT), where Q represents a question, S represents a policy group (reply strategy), and A represents the answer to the question.
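The loop in step 6.4 can be sketched as follows. This is a simplified illustration: `llm1`, `llm2`, and `retrieve_case` are hypothetical stand-ins for the two expert large language models and the dialogue case retrieval step, and the f-strings are placeholders rather than the actual templates of the invention:

```python
def generate_dialogue(topic, policy_groups, retrieve_case, llm1, llm2):
    """Generate a T-round dialogue, T = len(policy_groups).

    Returns a list of (question N, policy group N, reply N) tuples.
    """
    T = len(policy_groups)
    q, dialogue = topic, []          # question 1 is the initial topic
    for n in range(1, T + 1):
        s = policy_groups[n - 1]     # strategy group N from the Markov chain
        case = retrieve_case(q, s)   # dialogue case retrieval step
        # Reply template: case + question + strategies -> large language model 1.
        reply = llm1(f"Case: {case}\nQuestion: {q}\nStrategies: {'-'.join(s)}")
        dialogue.append((q, s, reply))
        if n < T:
            # Question template: reply N + question N -> large language model 2
            # simulates the user and produces the next question.
            q = llm2(f"Answer: {reply}\nQuery: {q}")
    return dialogue
```

In practice `llm1` and `llm2` would call the expert models; here they can be any callables, which also makes the flow easy to unit-test.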
In the above technical solution, the model training step specifically includes the following steps:
step 7.1: acquiring multiple rounds of dialogue data based on the multiple rounds of dialogue generation step;
Step 7.2: different pieces of multi-round dialogue data are distinguished using special identifiers.
For the policy group S part, the policy identifiers "<STRATEGY>" and "</STRATEGY>" are added;
for reply part A, the identifiers "Assistant:" and "</s>" are added;
the above identifiers "<STRATEGY>", "</STRATEGY>", "Assistant:" and "</s>" are collectively denoted Z, i.e., Z represents these special identifier portions.
Step 7.3: inputting the identifier-added multi-round dialogue data into a large language model;
step 7.4: output vectors corresponding to the strategy group S part in the large language model output are denoted as a1, a2,..as, and pass through the linear layerMapping to a dialogue state semantic vector->
Step 7.5: embedding policies in an embedding layer of a large language model into a matrixBy means of a linear layer->Mapping into a reply strategy semantic matrix->
Step 7.6: for each dialog state semantic vectorAnd reply to the tactics semantic matrixMatrix multiplication is carried out to obtain a similarity score vector +.>;
Step 7.7: using vector space distance loss function to make dialogue state semantic vector and its correspondent recovery strategy semantic vector similarity scoreLarger, and other returnsComplex strategy semantic vector similarity score ++>The smaller the vector space distance loss function, the following formula:
Wherein,dot product represents the loss function value, ">Representing natural numbers,/->Representing text length,/->Representing a negative sample of all current policies, +.>Word representing the t-th position of the text, +.>To indicate the function, only>The function takes a value of 1 when belonging to the policy group part S, otherwise, the function takes a value of 0;
step 7.8: for Z and reply portion A, using cross entropy loss, given the text of the first 1 to t positions, let the model predict the word of the t+1st position, the specific formula is as follows:
wherein,represents the cross entropy loss function value, t represents the text length, < ->Word representing the t-th position of the text, +.>Words representing the first t positions, +.>For the sum symbol>To indicate the function, only>The function takes a value of 1 when belonging to the answer part A or the special identification part Z, otherwise, the function takes a value of 0;
step 7.9: the vector space distance loss function and the cross entropy loss function are used for weighted average to improve the expertise of the model on the dialogue in the psychology field:
wherein, withIs a real number between 0 and 1 and satisfies +.>;
Step 7.10: using loss functionsA large language model is trained on multiple rounds of dialog data in the psychological domain.
Step 7.11: after training is complete, for a given question the model can predict a series of reply strategies based on the context and generate replies.
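A toy numeric sketch of how the losses in steps 7.6 to 7.9 combine. This is illustrative only: real training computes these quantities over model logits for every position, whereas here a single position is scored and all values and names (`strategy_loss`, `token_ce`) are made up:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a score vector.
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def strategy_loss(score_vec, gold):
    # Vector-space distance loss for one position: negative log of the
    # softmax-normalized score of the correct reply strategy (index `gold`),
    # so the correct score is pushed up and the negatives pushed down.
    return -math.log(softmax(score_vec)[gold])

def token_ce(prob_correct):
    # Cross-entropy for one next-token prediction, given the probability
    # the model assigned to the correct word.
    return -math.log(prob_correct)

def weighted_loss(l1, l2, alpha=0.5):
    # Step 7.9: weighted average with beta = 1 - alpha.
    return alpha * l1 + (1 - alpha) * l2
```

With two tied strategy scores the strategy loss is log 2, and a perfectly predicted token contributes zero cross-entropy, so the weighted average is alpha times log 2.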
The invention also provides a psychological field data generation and model training device, which comprises the following modules:
and a strategy simulation module: the method comprises the steps of inputting a strategy labeling corpus and an open source data set, then simulating manual labeling to generate a strategy by constructing a Markov chain, then de-duplicating the open source data set, initializing a topic list and a strategy group list, and finally outputting the topic list and the strategy group list to a multi-round dialogue generation module.
The dialogue case retrieval module: inputting a strategy group and questions, finding similar candidate dialogue cases through strategy matching, encoding the candidate dialogue cases and the questions, and calculating the cosine similarity between them; finally, the found best-matched dialogue case is taken as a sample and provided as a reference case to the expert large language model of the multi-round dialogue generation module, to better guide it in generating replies.
The multi-round dialogue generation module is used for defining a template, filling the template with dialogue cases, user problems and a strategy based on Markov chain generation output by the dialogue case retrieval module, inputting a large language model, generating multi-round dialogue, and storing data to form a multi-round dialogue data set in the psychological field;
The model training module is used for training a large language model based on multi-round dialogue data so as to adapt to multi-round dialogue tasks in the psychological field, distinguishing different parts of dialogue data by adding special identifiers, keeping the generated causal relationship by using the causal language model, improving the expertise of the model in the psychological field by using a weighted average loss function of a vector space distance loss function and a cross entropy loss function, and training a model to predict a reply strategy and answer by calculating loss and carrying out weighted average so as to ensure that the model has more expertise and pertinence on the problems in the psychological field.
The invention also provides a storage medium; when a processor executes the program stored in the storage medium, the above psychological-domain data generation and model training method is realized.
Because the invention adopts the technical means, the invention has the following beneficial effects:
1. According to the invention, the strategy simulation module simulates manual annotation to generate rich strategy labels, the multi-round dialogue data generation module automatically generates high-quality psychological-domain multi-round dialogue data based on strategies and questions, and the dialogue case retrieval module provides reference cases for the model training module, remarkably improving the model's dialogue capability in the psychological field. In addition, the invention designs a loss function, obtained as the weighted average of a vector-space distance loss function and a cross-entropy loss function, for training the large language model, so that the model can adapt to dialogue tasks in the psychological field and show higher professionalism on them.
2. By the above technical means, the invention solves the problems of data generation and model training in the psychological field and achieves the purpose of providing intelligent psychological counseling services. First, the strategy simulation module uses the strategy-annotated corpus and the open-source dataset to simulate manual annotation and generate rich topics and strategy groups; the high-quality knowledge contained in the large language model is then mined based on these, and psychological-domain multi-round dialogues are generated, remarkably improving the model's dialogue capability in the psychological field.
3. Through the dialogue case retrieval module, the invention finds the corpus similar to the current strategy group in the strategy labeling corpus based on the given strategy group and the question as the candidate dialogue case, then encodes the candidate dialogue case and the question into semantic vectors by using the BERT encoder, calculates cosine similarity between the candidate dialogue case and the question, thereby finding the best matched dialogue case as a sample, and provides the best matched dialogue case as a reference case for an expert large language model of the multi-round dialogue generation module to better guide the expert large language model to generate replies.
4. Through the multi-round dialogue data generation module, the invention designs an interactive dialogue flow of two expert large language models, and can automatically generate high-quality multi-round dialogue data in the psychological field. Meanwhile, through the model training module, the large language model is trained based on the multi-round dialogue data obtained by the multi-round dialogue generating module, so that the model can adapt to the mode of multi-round dialogue tasks in the psychological field, and the degree of expertise of the model in the psychological field is remarkably improved.
Drawings
FIG. 1 is a block diagram of mental domain data generation and model training;
FIG. 2 is a simulation strategy module;
FIG. 3 is a dialogue case retrieval module;
FIG. 4 is a multi-round dialog data generation module;
FIG. 5 is a schematic diagram of a large language model domain specialized method classification.
Detailed Description
Hereinafter, embodiments of the present invention will be described in detail. While the invention will be described and illustrated in conjunction with certain specific embodiments, it will be understood that it is not intended to limit the invention to these embodiments alone. On the contrary, the invention is intended to cover modifications and equivalent arrangements included within the scope of the appended claims.
Furthermore, in the following detailed description, numerous specific details are set forth in order to provide a better illustration of the invention. It will be understood by those skilled in the art that the present invention may be practiced without these specific details.
Examples
The invention relates to a psychological field data generation and model training method and system, as shown in figure 1, comprising a strategy simulation module, a multi-round dialogue data generation module, a dialogue case retrieval module and a model training module. The following detailed description is made in terms of modules:
Strategy simulation module
A high-quality strategy-annotated corpus can significantly improve the performance of a model on downstream tasks, but such a corpus relies on expensive and time-consuming manual annotation. The strategy simulation module constructs a Markov chain by counting the strategy labels in the collected strategy-annotated corpus, and repeatedly applies one-step transitions of the Markov chain to simulate manual annotation and generate rich strategy labels. In addition, an open-source dataset is introduced, deduplicated, and combined with the strategies generated from the Markov chain, yielding a large number of rich and varied topics and reply strategies used to guide the large language model to generate high-quality vertical-domain data.
The module structure is shown in fig. 2. The input is a strategy-annotated corpus and an open-source dataset. The strategy-annotated corpus is manually annotated by a team of psychological-domain experts; the chat scenarios cover several psychology-related areas such as the daily psychological troubles and complaints of teenagers and common-sense psychological questions, effectively ensuring the authenticity, content diversity, and high professionalism of the chat corpus. The strategy labels are professionally divided by psychologists, according to the characteristics of a professional counseling corpus of cognitive behavioral therapy, into several categories such as "empathy, affirmation, interpretation, positive suggestion, ...". An example from the strategy-annotated corpus is as follows. The question: "On campus I often see people who are studying hard, and then I feel I have become a stranger to myself. My anxiety, fear, and distance from others keep growing, and I always feel there is no room for improvement. What should I do?". The reply with strategy labels: "(Empathy) I really feel for you; I can see that your recent state is not good. From your words I can see that you are someone with a sense of crisis, who is also easily influenced by the people around you and sensitive to pressure; I empathize with you. I believe you are a motivated student who only temporarily lacks a firm learning direction. Generally, mild anxiety can be converted into motivation, helping us make progress in learning, so anxiety can have a positive impact on us. If we cannot properly transform the anxiety, external intervention and adjustment of the anxious emotion are needed. For example, communicate your learning situation with teachers and friends in time, so that their evaluation of your situation may be more objective, and you can view your own learning from multiple angles. Perhaps you can then also see your advantages and room for progress."
In the annotated corpus, the manual strategy markers indicate which strategy the following content belongs to, and there are K strategies in total.
A Markov chain is constructed from the existing strategy-annotated corpus to imitate manual annotation when generating strategies. Specifically, the first "(strategy)" marker encountered in the reply content of each annotated corpus entry is counted, and normalizing these counts yields an initial strategy distribution. Then, the occurrences of each pair of adjacent "(strategy)" markers in each corpus entry are counted and normalized to obtain the transition probabilities between strategies, giving a K×K strategy transition matrix, where K is the size of the strategy set. The initial strategy distribution together with the strategy transition matrix uniquely determines a Markov chain, which is used for subsequent strategy generation. One-step transitions of the chain are repeated S times to obtain a strategy group of length S; assuming the multi-round dialogue dataset to be constructed has T rounds, this is repeated T times to obtain a strategy list of length T in which each strategy group has length S. The resulting strategy list has the form: ["strategy 1-strategy 2-…-strategy S", "strategy 1-strategy 2-…-strategy S", …]. If T = 5 and S = 5, a strategy list of length 5 is generated, each element of which is a strategy group consisting of 5 strategies.
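The construction and sampling described above can be sketched as follows; the strategy names and annotated sequences are invented for illustration, and a real run would count markers from the actual annotated corpus.

```python
import random

# Hypothetical annotated replies: each is a sequence of strategy labels.
annotated = [
    ["empathy", "affirmation", "positive advice"],
    ["empathy", "positive advice", "positive advice"],
    ["affirmation", "empathy", "positive advice"],
]
strategies = sorted({s for seq in annotated for s in seq})

# Initial distribution: frequency of the first strategy in each reply, normalized.
init = {s: 0.0 for s in strategies}
for seq in annotated:
    init[seq[0]] += 1
total = sum(init.values())
init = {s: c / total for s, c in init.items()}

# Transition matrix: counts of adjacent strategy pairs, row-normalized.
trans = {a: {b: 0.0 for b in strategies} for a in strategies}
for seq in annotated:
    for a, b in zip(seq, seq[1:]):
        trans[a][b] += 1
for a in strategies:
    row = sum(trans[a].values())
    if row:
        trans[a] = {b: c / row for b, c in trans[a].items()}
    else:  # strategy never seen as a predecessor: fall back to init
        trans[a] = dict(init)

def sample_group(s_len, rng):
    """One strategy group: draw the first label from the initial
    distribution, then take s_len - 1 one-step transitions of the chain."""
    cur = rng.choices(list(init), weights=init.values())[0]
    out = [cur]
    for _ in range(s_len - 1):
        cur = rng.choices(list(trans[cur]), weights=trans[cur].values())[0]
        out.append(cur)
    return "-".join(out)

rng = random.Random(0)
T, S = 5, 5  # rounds per dialogue, strategies per group
policy_list = [sample_group(S, rng) for _ in range(T)]
print(policy_list)
```

With T = 5 and S = 5 this yields the strategy list of five 5-strategy groups described in the text.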
The open-source dataset mainly consists of multi-round dialogue data in the psychological vertical domain. To ensure data quality and diversity, the open-source dataset is first de-duplicated, and only corpus entries with low mutual similarity are retained as topics. Concretely, taking the open-source psychological multi-round dialogue dataset SMILE as an example, the MinHashLSH algorithm is used for de-duplication. The text corpus of the open-source dataset is first segmented into words, for example with the Jieba segmentation tool. An empty list of MinHash objects, minhashes, is then created; a MinHash object is built for each segmented text, and MinHashLSH is used to check whether the current MinHash object is similar to any object already in the minhashes list. If no similar object is found, the current text is not a duplicate and is retained for subsequent processing; if a similar object is found, the current text is considered a duplicate and is discarded without further processing.
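As a minimal stand-in for the MinHashLSH step, the sketch below de-duplicates with exact Jaccard similarity over character 3-gram shingles. The described module tokenizes with Jieba and approximates this comparison with MinHash signatures plus LSH so it scales to large corpora; the 0.5 threshold and the sample sentences here are illustrative assumptions, not values from the patent.

```python
def shingles(text, k=3):
    """Set of character k-grams of a text (at least one for short texts)."""
    return {text[i:i + k] for i in range(max(1, len(text) - k + 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def deduplicate(corpus, threshold=0.5):
    kept, kept_shingles = [], []
    for text in corpus:
        sig = shingles(text)
        if all(jaccard(sig, s) < threshold for s in kept_shingles):
            kept.append(text)          # no near-duplicate found: keep
            kept_shingles.append(sig)  # otherwise the text is discarded
    return kept

corpus = [
    "I feel anxious before every exam and cannot sleep.",
    "I feel anxious before every exam and can not sleep.",  # near-duplicate
    "How do I talk to my parents about my grades?",
]
unique = deduplicate(corpus)
print(len(unique))  # the near-duplicate is dropped
```

MinHashLSH replaces the pairwise Jaccard scan with sub-linear candidate lookup, which matters once the corpus reaches tens of thousands of dialogues.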
An empty topic list L = [ ] and an empty strategy-group list G = [ ] are initialized. The first question of each dialogue in the de-duplicated SMILE dialogue dataset is added to L as an initial topic; if the de-duplicated dataset has size M, then finally L = [initial topic 1, initial topic 2, …, initial topic M]. Assuming the multi-round dialogue dataset to be constructed has T rounds, a strategy list of length T, in which each strategy group contains S strategies, is generated for each topic in the initial topic list and appended to G, giving the strategy-group list G = [strategy list 1, strategy list 2, …, strategy list M]. Finally, the topic list and the strategy-group list are output to the multi-round dialogue generation module.
Dialogue case retrieval module
Few-shot learning aims to guide a model to learn general patterns from limited sample data so that it performs well on new samples. To make full use of the high-quality strategy-annotated corpus, the invention designs a dialogue case retrieval module. Given a strategy group and a question, it first finds corpus entries whose strategy groups are similar to the current one via strategy matching and treats them as candidate dialogue cases. The candidate dialogue cases and the question are then encoded into semantic vectors with a BERT encoder, and the cosine similarity between each candidate case and the question is computed to find the best-matching dialogue case as a sample, which is provided to the expert large language model of the multi-round dialogue generation module as a reference case to better guide its reply generation.
The inputs to the module are a strategy group and a question; the module then retrieves dialogue cases similar to the current question and strategy. Strategy matching compares the input strategy group with the dialogue strategies in the strategy-annotated corpus and takes the corpus entries whose strategies are similar to the input as candidate dialogue cases. Strategies similar to the input strategy are found in the strategy-annotated corpus by dynamic programming, specifically as follows:
Assume two strategy groups S1 = strategy 1-strategy 2-…-strategy m and S2 = strategy 1-strategy 2-…-strategy n, where m and n are the numbers of strategies in S1 and S2 respectively. Each individual strategy within a group is treated as a single symbol rather than a character string, and the similarity of the two strategy groups is judged by computing their minimum edit distance, obtained from the following recurrence:
d[i, j] = d[i−1, j−1], if S1[i−1] = S2[j−1];
d[i, j] = 1 + min(d[i−1, j], d[i, j−1], d[i−1, j−1]), otherwise;
with boundary conditions d[i, 0] = i and d[0, j] = j, where S1[i] denotes the strategy at position i+1 of strategy group S1, S2[j] denotes the strategy at position j+1 of strategy group S2, and d is a two-dimensional array whose entry d[i, j] (row i+1, column j+1) is the minimum edit distance between the first i strategies of S1 and the first j strategies of S2. The minimum edit distance between the current strategy group and each strategy group in the strategy-annotated corpus is computed, and all dialogue cases whose minimum edit distance is smaller than a threshold k are selected as candidate dialogue cases.
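The recurrence above can be sketched directly in code; the strategy labels are illustrative, and the threshold value shown in the comment is an example choice.

```python
def min_edit_distance(s1, s2):
    """Minimum edit distance between two strategy groups, treating each
    strategy label as a single symbol (not a character string)."""
    m, n = len(s1), len(s2)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of the first i strategies of s1
    for j in range(n + 1):
        d[0][j] = j          # insert all of the first j strategies of s2
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                d[i][j] = d[i - 1][j - 1]            # labels match: no cost
            else:
                d[i][j] = 1 + min(d[i - 1][j],       # deletion
                                  d[i][j - 1],       # insertion
                                  d[i - 1][j - 1])   # substitution
    return d[m][n]

g1 = "empathy-affirmation-positive advice".split("-")
g2 = "empathy-positive advice".split("-")
print(min_edit_distance(g1, g2))  # 1: drop "affirmation"
# Candidate cases would be all annotated dialogues whose strategy group is
# within edit distance k of the query group, e.g. k = 2.
```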
Next, the BERT encoder encodes the input question into a semantic vector q and each candidate dialogue case into a semantic vector c_i, where the subscript i denotes the i-th candidate dialogue case. The similarity score between each candidate dialogue case and the question is computed as the cosine similarity score_i = (q · c_i) / (‖q‖ ‖c_i‖), and the dialogue case with the largest cosine-similarity score with the question is selected as the output.
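The selection step can be sketched as below. The 3-dimensional vectors stand in for BERT sentence embeddings (for instance the [CLS] vector) and are invented for the example, as are the case names.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

question_vec = [0.9, 0.1, 0.2]          # encoded question (illustrative)
candidates = {                           # encoded candidate dialogue cases
    "case_a": [0.8, 0.2, 0.1],
    "case_b": [0.1, 0.9, 0.3],
}
best = max(candidates, key=lambda k: cosine(question_vec, candidates[k]))
print(best)
```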
Multi-round dialogue generation module
In recent years the field of large language models has developed rapidly; with only a few instructions, a large language model can be guided to complete various natural language tasks with excellent results. However, the best-performing large language models are almost all closed-source, and common open-source small-scale large language models cannot match these closed-source models in performance. By training an open-source small-scale large language model through imitation learning, it can match or even surpass the closed-source large models in a vertical domain: the high-quality knowledge contained in a closed-source large language model is fully mined through prompt engineering, and that knowledge is then used for supervised fine-tuning of the small-scale model, so that the open-source model can approximate the excellent closed-source large language model at very little cost. Based on prompt engineering, the multi-round dialogue data module fully mines the knowledge contained in large language models, designs an interactive dialogue flow between two expert large language models, and can automatically generate high-quality multi-round dialogue data in the psychological domain.
This module uses three kinds of templates: reply template 1, reply template 2, and a question template. Reply templates 1 and 2 guide the large language model in answering questions; the difference between them is that reply template 1 is used only at the start of a dialogue and must specify which rules the large language model should follow when replying, whereas reply template 2 is used within the dialogue and need not specify reply rules.
Specifically, the content of reply template 1 may be: "I want you to play a smart, empathetic mental-health intelligent assistant that can resolve various problems for the user. You should follow these requirements: 1. The user will specify your reply strategy at the end of the question, e.g. (strategy 1-strategy 2-…-strategy k). 2. When replying to the user, you should strictly follow the given reply strategies and reply in order. 3. You should mark the strategy at the beginning of each sentence of the reply, e.g. "(strategy) reply. (strategy) reply. … (strategy) reply." 4. Your reply should provide emotional support and comfort, be rich in content, and offer helpful information. A typical dialogue case is: {conversation}. Please continue step by step with reference to the case. User: {query} ({strategies})". The content of reply template 2 may be: "User: {query} ({strategies})". In reply template 1, the braces {conversation}, {query}, and {strategies} are slots to be filled with, respectively, the dialogue case output by the case retrieval module, the user question, and the strategies generated from the Markov chain, after which the content of the reply template is complete.
The question template guides the large language model in simulating the user asking questions. Its content may be: "I want you to play a smart, emotionally rich dialogue-prediction intelligent assistant. Given the dialogue content between the user and the intelligent assistant, you predict the user's next question or thought; you do not need to predict the intelligent assistant's reply. User: {query} Intelligent assistant: {answer}. Your prediction. User:", where the braces {query} and {answer} are slots to be filled with the question and the reply of the previous dialogue round, after which the content of the question template is complete.
The specific steps of multi-round dialogue generation, taking the generation of a T-round dialogue as an example, are as follows. First, topic 1 is selected from the topic list L as the initial topic, and strategy list 1 is selected from the strategy-group list G. The first strategy group in strategy list 1, strategy group 1, serves as the reply strategy for the initial topic (question 1). The initial topic (question 1) and strategy group 1 are input to the case retrieval module to query the most relevant dialogue case; the dialogue case, the initial topic (question 1), and strategy group 1 are then filled into the slots {conversation}, {query}, and {strategies} of reply template 1, which is input to large language model 1, and it answers the question according to the given prompt, outputting reply 1. The initial topic and reply 1 are filled into the slots {query} and {answer} of the question template, which guides large language model 2 to simulate a user question, yielding question 2. The second strategy group of the Markov-chain-generated strategy list, strategy group 2, is then taken as the reply strategy for question 2; question 2 and strategy group 2 are input to the case retrieval module to query the most relevant dialogue case, the dialogue case, question 2, and strategy group 2 are filled into reply template 2, and large language model 1 is guided to produce reply 2. Repeating this process for T rounds and saving each round's question, its corresponding strategy group, and its reply yields a T-round multi-round dialogue in the psychological domain of the form {(Q1, S1, A1), (Q2, S2, A2), …, (QT, ST, AT)}, where Q denotes a question, S denotes a strategy group (reply strategy), and A denotes the answer to the question. Large language model 1 and large language model 2 can be closed-source expert large language models, such as ChatGPT and Bard.
The above process is carried out for each initial topic in the topic list L. If the number of topics in L is M, then M T-round psychological-domain dialogues can be generated with the above method; if M = 2000 and T = 5, a dataset of 2000 five-round psychological-domain dialogues can be generated.
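The generation loop above can be sketched as follows. The two closed-source expert models and the case retrieval module are replaced by deterministic stubs (`assistant_llm`, `user_llm`, `retrieve_case`) so the control flow can run end to end; all names and strings are illustrative.

```python
def assistant_llm(question, strategy_group, case):
    """Stub for large language model 1 filling a reply template."""
    return f"[{strategy_group}] reply to: {question}"

def user_llm(question, answer):
    """Stub for large language model 2 filling the question template."""
    return f"follow-up to: {answer}"

def retrieve_case(question, strategy_group):
    """Stub for the dialogue case retrieval module."""
    return "example dialogue case"

def generate_dialogue(initial_topic, strategy_list):
    dialogue, question = [], initial_topic
    for turn, strategy_group in enumerate(strategy_list):
        case = retrieve_case(question, strategy_group)
        # Turn 1 would use reply template 1 (rules + case); later turns
        # would use reply template 2.
        answer = assistant_llm(question, strategy_group, case)
        dialogue.append((question, strategy_group, answer))
        if turn < len(strategy_list) - 1:
            question = user_llm(question, answer)  # question template
    return dialogue  # [(Q1, S1, A1), ..., (QT, ST, AT)]

dlg = generate_dialogue("I feel lonely at school.",
                        ["empathy-affirmation", "positive advice-empathy"])
print(len(dlg))
```

Running this over every topic in L with its strategy list from G would produce the M dialogues described above.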
Model training module
This module trains a large language model on the multi-round dialogue data obtained from the multi-round dialogue generation module, so that the model adapts to the pattern of multi-round dialogue tasks in the psychological domain. The module designs a loss function that is the weighted average of a vector-space distance loss function and a cross-entropy loss function; their combination improves the model's expertise in psychological-domain dialogue.
The large language model is a causal language model, which generates text word by word in order: the word at the current position is influenced only by the preceding words, a property that preserves causality, i.e. the current output is not predicted from future information. Given a T-round dialogue {(Q1, S1, A1), …, (QT, ST, AT)}, where Q denotes a question, S denotes the strategy group (reply strategy) for the question, and A denotes the answer to the question, special identifiers are first added to the different segments of the multi-round dialogue data to distinguish them. Strategy identifiers "<STRATEGY>" and "</STRATEGY>" are added to the beginning and end of the strategy-group part S, which tells the model that a strategy group is to be predicted. For the reply part A, the identifiers "Assistant:" and "</s>" are added at its two ends, which tells the model that what follows is what the intelligent assistant says. These identifiers clearly delimit the target regions each part is to predict. The identifiers "<STRATEGY>", "</STRATEGY>", "Assistant:", and "</s>" are collectively denoted Z, i.e. Z stands for these special identifier parts.
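A minimal sketch of the identifier layout described above follows; the "User:" prefix, the line breaks, and the concrete strings are illustrative assumptions, not fixed by the patent.

```python
def format_turn(question, strategy_group, answer):
    """Wrap one dialogue turn with the special identifiers: the strategy
    group in <STRATEGY>...</STRATEGY>, the reply between "Assistant:"
    and the end-of-sequence marker "</s>"."""
    return (f"User: {question}\n"
            f"<STRATEGY>{strategy_group}</STRATEGY>\n"
            f"Assistant: {answer}</s>")

sample = format_turn("I can't sleep before exams.",
                     "empathy-positive advice",
                     "That sounds stressful; mild anxiety can be redirected.")
print(sample)
```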
Specifically, assume there are K distinct individual strategies in total and that a strategy group S consists of s individual strategies, each treated as a special token, denoted "token1", "token2", …, "tokens". The multi-round dialogue with identifiers added is input to the large language model, and the output vectors corresponding to the strategy-group part S are denoted a1, a2, …, as; passing them through a linear layer W1 yields the dialogue-state semantic vectors h1, h2, …, hs. The strategy embedding matrix E in the embedding layer of the large language model is mapped by a linear layer W2 into the reply-strategy semantic matrix E′, each row of which is a reply-strategy semantic vector. Each dialogue-state semantic vector h_t (t = 1, …, s) is multiplied with the reply-strategy semantic matrix, i.e. p_t = h_t E′ᵀ, giving a similarity-score vector p_t. To make the similarity score between a dialogue-state semantic vector and its corresponding reply-strategy semantic vector larger, and the similarity scores with the other reply-strategy semantic vectors smaller, a vector-space distance loss function is used, given by the following formula:
L1 = −(1/N) Σ_{t=1}^{T} I(x_t ∈ S) · log [ exp(h_t · e_{x_t}) / ( exp(h_t · e_{x_t}) + Σ_{e⁻ ∈ E⁻} exp(h_t · e⁻) ) ]
wherein L1 represents the loss value, N is the number of positions belonging to the strategy-group part, T represents the text length, E⁻ represents the negative samples of the current strategy, i.e. the reply-strategy semantic vectors of all strategies other than the current one, x_t represents the token at position t of the text, e_{x_t} is the reply-strategy semantic vector of strategy x_t, Σ is the summation symbol, and I(·) is the indicator function, taking the value 1 only when x_t belongs to the strategy-group part S and 0 otherwise.
For Z and the reply part A, cross-entropy loss is used: given the text at positions 1 to t, the model predicts the word at position t+1. The specific formula is as follows:
L2 = −(1/T) Σ_{t=1}^{T} I(x_t ∈ A ∪ Z) · log P(x_t | x_{<t})
wherein L2 represents the loss value, T represents the text length, x_t represents the word at position t of the text, x_{<t} represents the preceding words, Σ is the summation symbol, and I(·) is the indicator function, taking the value 1 only when x_t belongs to the answer part A or the special identifier part Z and 0 otherwise. The two losses L1 and L2 are combined by weighted average to obtain the final loss function L, as follows:
L = α·L1 + β·L2
wherein α and β are real numbers between 0 and 1 satisfying α + β = 1. In summary, only the losses of the strategy-group part S, the answer part A, and the special identifier part Z are computed. After the large language model is trained on the psychological-domain multi-round dialogue data in this way, given the context it can predict a series of reply strategies for a question from left to right and generate a reply under the guidance of those strategies; strategy-guided answers are more professional and more targeted on psychological-domain questions.
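As an illustration of the combined loss, the sketch below computes L1 and L2 as masked softmax negative log-likelihoods on toy score vectors and takes their weighted average. The scores, gold indices, and the weights α = β = 0.5 are invented for the example; a real implementation would obtain the scores from the model's outputs as described above.

```python
import math

def softmax_nll(score_vec, target_idx):
    """Negative log-likelihood of the target under a softmax over scores
    (computed via log-sum-exp for numerical stability)."""
    m = max(score_vec)
    logz = m + math.log(sum(math.exp(s - m) for s in score_vec))
    return logz - score_vec[target_idx]

# Positions in the strategy-group part S: similarity scores h_t · E'ᵀ
# against all K strategies, plus the gold strategy index (loss L1).
strategy_positions = [
    ([2.0, 0.1, -0.5], 0),
    ([0.3, 1.8, 0.0], 1),
]
# Positions in the reply/identifier parts A and Z: vocabulary logits
# plus the gold next-token index (loss L2).
reply_positions = [
    ([1.2, 0.1, 0.4, -0.2], 2),
    ([0.0, 2.1, 0.3, 0.1], 1),
]

L1 = sum(softmax_nll(s, y) for s, y in strategy_positions) / len(strategy_positions)
L2 = sum(softmax_nll(s, y) for s, y in reply_positions) / len(reply_positions)

alpha, beta = 0.5, 0.5  # illustrative weights with alpha + beta = 1
L = alpha * L1 + beta * L2
print(round(L, 4))
```

The indicator functions of the formulas correspond here to simply restricting each sum to the positions of its part; all other positions contribute no loss.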

Claims (9)

1. The psychological domain data generation and model training method is characterized by comprising the following steps of:
step 1, strategy simulation step:
inputting a strategy labeling corpus and an open source data set, then simulating manual labeling by constructing a Markov chain to generate a strategy, then de-duplicating the open source data set, then initializing a topic list and a strategy group list, and finally outputting the topic list and the strategy group list to a multi-round dialogue generation step;
Step 2, dialogue case retrieval step:
inputting a strategy group and a question, then finding similar candidate dialogue cases through strategy matching, then encoding the candidate dialogue cases and the question and computing the cosine similarity between them to find the best-matching dialogue case as a sample;
step 3, a multi-round dialogue generating step:
defining templates, filling them with the dialogue case, the user's question, and the strategies generated from the Markov chain, inputting the filled templates into the large language model to generate multi-round dialogue, and saving the data to form a psychological-domain multi-round dialogue dataset;
step 4, model training step:
training a large language model based on the multi-round dialogue data so that it adapts to multi-round dialogue tasks in the psychological domain; distinguishing the different parts of the dialogue data by adding special identifiers; preserving causality of generation by using a causal language model; improving the model's expertise in the psychological domain with a loss function that is the weighted average of a vector-space distance loss function and a cross-entropy loss function; and, by computing the losses and taking their weighted average, training the model to predict reply strategies and answers so that it is more professional and targeted on psychological-domain questions.
2. The method for generating and training the psychological-field data according to claim 1, wherein the strategy simulation step specifically comprises the steps of:
step 2.1, inputting strategy labeling corpus and open source data set;
the strategy annotation corpus is manually labeled by a team of psychology-domain experts and covers psychology-related areas such as teenagers' everyday psychological worries and complaints and questions about common psychological knowledge;
the open source data set comprises multiple rounds of dialogue data in the psychological vertical field, and corpus with low similarity is reserved as topics after duplication removal;
step 2.2, constructing a Markov chain to imitate manual annotation when generating strategies, based on the existing strategy annotation corpus; specifically, first counting the first "(strategy)" marker encountered in the reply content of each strategy-annotated corpus entry and normalizing to obtain an initial strategy distribution, then counting and normalizing the numbers of adjacent "(strategy)" marker pairs in each corpus entry to compute the transition probabilities between strategies, finally obtaining a strategy transition matrix; the initial strategy distribution together with the strategy transition matrix uniquely determines a Markov chain, which is used for subsequent strategy generation;
Step 2.3, de-duplication is carried out on the open source data set, and corpus with low similarity is reserved as topics;
step 2.4, initializing a topic list and a strategy group list, adding the first question of each dialogue in the de-duplicated dialogue dataset to the topic list as an initial topic, and then generating a strategy list of length T for each topic;
step 2.5, repeatedly generating strategies by one-step transitions of the Markov chain, based on the topic list and the strategy group list, to obtain the strategy lists;
step 2.6. Output the topic list and the policy group list to the multi-turn dialog generation step.
3. The method for generating and training the psychological-field data according to claim 1, wherein the dialogue case searching step comprises the following steps:
step 3.1, a strategy matching unit for retrieving, from the strategy annotation corpus, entries whose strategies are similar to the current strategy group, as candidate dialogue cases;
step 3.2, a BERT encoder for encoding the candidate dialogue cases and the question into semantic vectors and computing the cosine similarity between them;
step 3.3, an output unit for outputting the candidate dialogue case with the largest cosine-similarity score with the question as the sample.
4. The psychological field data generation and model training method according to claim 3, wherein in the strategy matching unit, the similarity of two strategy groups is judged by computing their minimum edit distance.
5. The method for generating and training the psychological-field data according to claim 3, wherein in the policy matching unit, a dynamic programming mode is adopted to search a policy similar to an input policy in a policy annotation corpus, and the specific steps include:
step 5.1. Treating the individual policies within each set of policies as a single character rather than a string of characters;
step 5.2, judging the similarity of the two strategies by calculating the minimum editing distance of the two strategies;
and 5.3, screening all dialogue cases with minimum editing distances smaller than a certain threshold value k as candidate dialogue cases.
6. The psychological field data generation and model training method according to claim 3, wherein the multi-round dialogue generation step comprises the following steps:
step 6.1, defining a template: firstly, three templates, namely a reply template 1, a reply template 2 and a question template, are defined, and are used for guiding a large language model to reply and question;
step 6.2, filling template content: when the round number is 1, the slots of reply template 1 are filled with the dialogue case output by the dialogue case retrieval step, the initial topic, and the first strategy group generated from the Markov chain, so that the template content is complete; the initial topic is defined as question 1 and the first strategy group as strategy group 1;
step 6.3, inputting the large language model: the filled reply template 1 is input into large language model 1, which replies according to the given prompt, obtaining reply 1 of large language model 1;
step 6.4. Generating a plurality of rounds of dialogue:
step 6.4.1, defining the current round number as N, filling reply N into the slot {answer} of the question template and question N into the slot {query};
step 6.4.2, the question template guides large language model 2 to simulate the user asking a question, obtaining a new question N;
step 6.4.3, selecting the N-th strategy group, strategy group N, from the strategy list generated by the Markov chain, as the reply strategy for question N;
step 6.4.4, querying the most relevant dialogue case from the case retrieval step with question N and strategy group N, then filling the dialogue case, question N, and strategy group N into reply template 2 and guiding large language model 1 to reply, obtaining reply N;
step 6.4.5, judging whether N equals T; if not, setting the round number N = N + 1 and returning to step 6.4.1, until N equals T, and then performing step 6.4.6;
step 6.4.6, saving each round's question, the strategy group corresponding to the question, and the reply, obtaining a T-round psychological-domain multi-round dialogue of the form {(Q1, S1, A1), (Q2, S2, A2), …, (QT, ST, AT)}, where Q represents a question, S represents a strategy group, and A represents the answer to the question.
7. The psychological field data generation and model training method according to claim 1, wherein the model training step specifically comprises the following steps:
step 7.1: acquiring multiple rounds of dialogue data based on the multiple rounds of dialogue generation step;
step 7.2: distinguishing different fragments of the multi-round dialogue data by using a special identifier;
for the strategy group S part, adding the strategy identifiers "&lt;STRATEGY&gt;" and "&lt;/STRATEGY&gt;";
for the reply part A, adding the identifiers "Assistant:" and "&lt;/s&gt;";
the above identifiers "&lt;STRATEGY&gt;", "&lt;/STRATEGY&gt;", "Assistant:", and "&lt;/s&gt;" are denoted Z, i.e. Z represents these special identifier parts;
step 7.3: inputting the identifier-added multi-round dialogue data into a large language model;
step 7.4: denoting the output vectors corresponding to the strategy group S part in the large language model output as a1, a2, …, as, and mapping them through a linear layer W1 into dialogue-state semantic vectors;
step 7.5: mapping the strategy embedding matrix E in the embedding layer of the large language model through a linear layer W2 into the reply-strategy semantic matrix E′;
step 7.6: multiplying each dialogue-state semantic vector h_t with the reply-strategy semantic matrix E′, i.e. p_t = h_t E′ᵀ, to obtain a similarity-score vector p_t;
step 7.7: adopting a vector-space distance loss function so that the similarity score between each dialogue-state semantic vector and its corresponding reply-strategy semantic vector becomes larger while the similarity scores with the other reply-strategy semantic vectors become smaller, the vector-space distance loss function being:
L1 = −(1/N) Σ_{t=1}^{T} I(x_t ∈ S) · log [ exp(h_t · e_{x_t}) / ( exp(h_t · e_{x_t}) + Σ_{e⁻ ∈ E⁻} exp(h_t · e⁻) ) ]
wherein L1 represents the loss value, · denotes the dot product, N is the number of positions belonging to the strategy group part, T represents the text length, E⁻ represents the negative samples of the current strategy, x_t represents the token at position t of the text, e_{x_t} is the reply-strategy semantic vector of strategy x_t, and I(·) is the indicator function, taking the value 1 only when x_t belongs to the strategy group part S and 0 otherwise;
step 7.8: for Z and the reply part A, using cross-entropy loss: given the text at positions 1 to t, the model predicts the word at position t+1, according to:
L2 = −(1/T) Σ_{t=1}^{T} I(x_t ∈ A ∪ Z) · log P(x_t | x_{<t})
wherein L2 represents the cross-entropy loss value, T represents the text length, x_t represents the word at position t of the text, x_{<t} represents the preceding words, Σ is the summation symbol, and I(·) is the indicator function, taking the value 1 only when x_t belongs to the answer part A or the special identifier part Z and 0 otherwise;
step 7.9: taking the weighted average of the vector-space distance loss function and the cross-entropy loss function to improve the model's expertise in psychological-domain dialogue:
L = α·L1 + β·L2
wherein α and β are real numbers between 0 and 1 satisfying α + β = 1;
step 7.10: training the large language model on the psychological-domain multi-round dialogue data using the loss function L;
step 7.11: after training is complete, the model can predict a series of reply strategies for a question based on the context and generate replies.
8. The psychological domain data generating and model training device is characterized by comprising the following modules:
and a strategy simulation module: inputting a strategy labeling corpus and an open source data set, then simulating manual labeling by constructing a Markov chain to generate a strategy, then de-duplicating the open source data set, then initializing a topic list and a strategy group list, and finally outputting the topic list and the strategy group list to a multi-round dialogue generation module;
a dialogue case retrieval module: inputting a strategy group and a question, then finding similar candidate dialogue cases through strategy matching, then encoding the candidate dialogue cases and the question and computing the cosine similarity between them to find the best-matching dialogue case as a sample;
The multi-round dialogue generation module is used for defining a template, filling the template with dialogue cases, user problems and a strategy based on Markov chain generation output by the dialogue case retrieval module, inputting a large language model, generating multi-round dialogue, and storing data to form a multi-round dialogue data set in the psychological field;
a model training module for training a large language model based on the multi-round dialogue data so that it adapts to multi-round dialogue tasks in the psychological domain; distinguishing the different parts of the dialogue data by adding special identifiers; preserving causality of generation by using a causal language model; improving the model's expertise in the psychological domain with a loss function that is the weighted average of a vector-space distance loss function and a cross-entropy loss function; and, by computing the losses and taking their weighted average, training the model to predict reply strategies and answers so that it is more professional and targeted on psychological-domain questions.
9. A storage medium storing a program which, when executed by a processor, implements the psychological field data generation and model training method according to any one of claims 1 to 7.
CN202410033864.7A 2024-01-10 2024-01-10 Psychological field data generation and model training method and device and storage medium Active CN117828063B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410033864.7A CN117828063B (en) 2024-01-10 2024-01-10 Psychological field data generation and model training method and device and storage medium


Publications (2)

Publication Number Publication Date
CN117828063A true CN117828063A (en) 2024-04-05
CN117828063B CN117828063B (en) 2024-05-17


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118014039A (en) * 2024-04-08 2024-05-10 亚信科技(中国)有限公司 Model training method and device, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667926A (en) * 2020-06-06 2020-09-15 中国科学院心理研究所 Psychological consultation (conversation) system and method based on artificial intelligence
CN114999610A (en) * 2022-03-31 2022-09-02 华东师范大学 Deep learning-based emotion perception and support dialog system construction method
US20220309348A1 (en) * 2019-10-24 2022-09-29 Northwestern Polytechnical University Method for generating personalized dialogue content
CN115495568A (en) * 2022-11-17 2022-12-20 苏州浪潮智能科技有限公司 Training method and device for dialogue model and dialogue response method and device
CN117093696A (en) * 2023-10-16 2023-11-21 浙江同花顺智能科技有限公司 Question text generation method, device, equipment and medium of large language model

Also Published As

Publication number Publication date
CN117828063B (en) 2024-05-17

Similar Documents

Publication Publication Date Title
CN111444709B (en) Text classification method, device, storage medium and equipment
CN108763284B (en) Question-answering system implementation method based on deep learning and topic model
CN111667926B (en) Psychological consultation/conversation system based on artificial intelligence and method thereof
CN107944027A (en) Create the method and system of semantic key index
CN117828063B (en) Psychological field data generation and model training method and device and storage medium
JP2020071869A (en) Video-based job provider and job seeker matching server and method
CN113344053B (en) Knowledge tracking method based on examination question different composition representation and learner embedding
CN112818106B (en) Evaluation method for generating question and answer
Spatiotis et al. Sentiment analysis of teachers using social information in educational platform environments
Murad et al. Towards smart LMS to improve learning outcomes students using LenoBot with natural language processing
CN116415650A (en) Method, device and storage medium for generating dialogue language model and generating dialogue
CN113011196B (en) Concept-enhanced representation and one-way attention-containing subjective question automatic scoring neural network model
CN117609486A (en) Intelligent dialogue system in psychological field
Seising Warren Weaver’s “Science and complexity” revisited
CN114372454A (en) Text information extraction method, model training method, device and storage medium
CN116821294A (en) Question-answer reasoning method and device based on implicit knowledge ruminant
CN114579706B (en) Automatic subjective question review method based on BERT neural network and multi-task learning
CN116306653A (en) Regularized domain knowledge-aided named entity recognition method
CN116151242B (en) Intelligent problem recommendation method, system and storage medium for programming learning scene
Yoo et al. Using natural language processing to analyze elementary teachers’ mathematical pedagogical content knowledge in online community of practice
CN116991982B (en) Interactive dialogue method, device, equipment and storage medium based on artificial intelligence
CN117808011B (en) Chat robot method, medium and system with simulated emotion
CN112818108B (en) Text semantic misinterpretation chat robot based on shape and near words and data processing method thereof
Su Research on Integration of Emotion Analysis in English Modular Teaching Based on Natural Language Processing
Wang Shared Learning Platform for English Based on Feature Extraction Algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant