CN117933423A - Multi-round dialogue fine-tuning method of autoregressive LLM - Google Patents

Multi-round dialogue fine-tuning method of autoregressive LLM

Info

Publication number
CN117933423A
CN117933423A
Authority
CN
China
Prior art keywords
dialogue
loss
user
data
round
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410127140.9A
Other languages
Chinese (zh)
Inventor
王子豪
叶佳豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Focus Technology Co Ltd
Original Assignee
Focus Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Focus Technology Co Ltd filed Critical Focus Technology Co Ltd
Priority to CN202410127140.9A
Publication of CN117933423A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a multi-round dialogue fine-tuning method for an autoregressive LLM. The method comprises: obtaining multi-round dialogue data with a large language model, labeling and concatenating it, and tokenizing the text after a stop token is inserted after each turn; generating loss-mask vectors that distinguish User turns and allow losses to be computed in parallel; constructing topic-transfer data; performing parameter-efficient fine-tuning of the large language model, using the loss-mask vectors to obtain the Assistant loss and the User loss in parallel and weighting the loss of each dialogue round; and outputting replies to the user with the fine-tuned model. Because every round of dialogue is trained without splitting the data, multi-round dialogue data is used more fully and the fluency and naturalness of the model in non-unilateral question-answering scenarios are improved; combining topic-transfer data with relevance-based loss weights balances the model's attention to historical and current topics, handles topic transfer better, and improves the performance and user experience of the dialogue system.

Description

Multi-round dialogue fine-tuning method of autoregressive LLM
Technical Field
The invention relates to the fields of natural language processing and large-language-model fine-tuning, and in particular to a multi-round dialogue fine-tuning method for an autoregressive LLM.
Background
GPT is representative of autoregressive large language models (also called Decoder-only models) that use the decoder structure of the Transformer. Such models are usually causal language models (Causal Language Model), i.e., they predict each subsequent token from the preceding context; they perform well on many natural language tasks and have gradually come into wide use. In multi-round dialogue tasks, a common current practice is to start from a pre-trained model that has undergone full-parameter fine-tuning on dialogue data (i.e., a Chat model) and then perform Parameter-Efficient Fine-Tuning (PEFT) for the downstream task: the model parameters are frozen and only an additionally inserted structure is trained, so that the model can better complete the downstream task while retaining its original capability.
Currently, for common large language models such as LLaMA, the input prompt template of the Chat model follows the User-Assistant organization format of Alpaca, i.e., a User input part and a model reply part. In multi-round dialogue fine-tuning, two data construction approaches are generally adopted. One takes the history messages as input and the answer of the last round as the label, so only the last round of dialogue is trained. The other splits a multi-round dialogue into all of its rounds: each time, the history up to the current round is used as input and the answer of the current round as the label, so the whole dialogue is split into multiple pieces of data. Because the User-Assistant organization format is mostly applied to task-oriented question-answering scenarios, both approaches make full use of the history when computing the loss of the model's reply; but in non-unilateral question-answering scenarios such as daily chat, replies are easily dragged back to topics in the history when a topic transfer must be handled. On the other hand, both construction approaches train only the Assistant part of the dialogue and ignore the User part, so the training data is under-utilized; in non-unilateral question-answering scenarios the model then tends to answer the user's questions like an answering robot rather than interact like a chat partner, and its replies lack natural fluency. Further improvement is therefore needed.
Disclosure of Invention
The invention solves the above technical problem by overcoming the defects of the prior art and providing a multi-round dialogue fine-tuning method for an autoregressive LLM, which mainly comprises the following steps:
Step 1: acquiring and labeling multi-round dialogue data with a large language model;
Step 2: concatenating the multi-round dialogue data, inserting stop tokens into the text, and then tokenizing;
Step 3: generating loss-mask vectors;
Step 4: constructing and adding topic-transfer data;
Step 5: performing parameter-efficient fine-tuning of the large language model;
Step 6: outputting replies to the user with the fine-tuned model.
In step 1, acquiring and labeling multi-round dialogue data with a large language model specifically includes:
collecting original multi-round dialogue data from dialogue systems, including open-source datasets and web forums, and using a large language model with prompt words referencing the dialogue topic to generate dialogue data of more than one round; each multi-round dialogue is labeled in the format [User: dialogue 1, Assistant: dialogue 2, User: dialogue 3, Assistant: dialogue 4, ...] to form one piece of training data.
In step 2, concatenating the multi-round dialogue data and tokenizing after inserting stop tokens specifically includes:
when training data is read, concatenating each piece of training data with stop tokens inserted: a stop token is inserted after each User text and after each Assistant text, giving the concatenated form [<s> dialogue 1 </s> dialogue 2 </s> dialogue 3 </s> dialogue 4 </s> …], which is passed through the tokenizer to obtain the input data.
In step 3, generating the loss-mask vectors specifically includes:
generating vectors consisting of 0s and 1s that mark the parts of the input data for which the loss must be computed; an Assistant loss-mask vector and a User loss-mask vector are generated at the same time, so that the losses of the Assistant and User roles are distinguished while the loss is computed in parallel during training; in the Assistant loss-mask vector the positions corresponding to Assistant dialogue are 1 and all other positions are 0, while in the User loss-mask vector the positions corresponding to User dialogue from the second round onward are 1, and the first-round User dialogue position and all other positions are 0.
In step 4, constructing and adding topic-transfer data specifically includes:
randomly selecting 10% of the training data to construct a topic-transfer training set; the large language model judges the topic of each selected dialogue and, via prompt words, generates a new dialogue unrelated to that topic; the new dialogue is labeled and concatenated after the original dialogue in the same manner as in steps 1-2 to form new input data, for which the loss-mask vectors of step 3 are generated.
In step 5, fine-tuning the large language model specifically includes:
the large language model computes predictions from the input data; the Assistant loss and the User loss are obtained with the loss-mask vectors; after each round's Assistant and User dialogue losses are weighted, the loss value is propagated back to the large language model via the back-propagation algorithm, and the trainable module corresponding to the parameter-efficient fine-tuning method is updated.
In step 5, obtaining the Assistant loss and the User loss with the loss-mask vectors specifically includes:
during fine-tuning, the large language model predicts over the input text to obtain an output sequence and computes a cross-entropy loss value at each position; the loss-mask vectors are then used to pick out only the loss values at positions marked 1 for the weight update; the Assistant loss set and the User loss set are obtained through the Assistant loss-mask vector and the User loss-mask vector respectively.
In step 5, weighting each round's Assistant and User dialogue losses specifically includes:
for each round of dialogue, computing the cosine similarity between the current turn and the turn immediately preceding it, and using this relevance as the loss weight of the current turn when computing the loss, which balances the influence of the historical context and the current round and makes each weighted round's loss pay extra attention to the preceding turn; the weighted Assistant and User losses are then summed with weights again to obtain the total loss, controlled by User weight control parameters: in a daily chat scenario the Assistant and User loss weights are 0.5 and 0.5, while in a fully unilateral question-answering scenario they are set to 1 and 0.
In step 6, outputting replies to the user with the fine-tuned model specifically includes:
the inference module of the large language model loads the fine-tuned model weights, feeds the user's input and the context into the large language model, outputs tokens after forward inference through the network, decodes them against the vocabulary to generate a reply text sequence, and returns the generated reply to the user.
The invention has the beneficial effects that:
The invention provides a multi-round dialogue fine-tuning method for an autoregressive LLM. Loss-mask vectors are introduced so that losses in multi-round dialogue fine-tuning are computed in parallel and every round of dialogue can be trained without splitting the dialogue data; an Assistant loss-mask vector and a User loss-mask vector are further designed to add the User dialogue loss to the training objective, so multi-round dialogue data is used more fully, the model's output improves, and replies in non-unilateral question-answering scenarios become smoother and more natural. On this basis, topic-transfer data is constructed and relevance-based loss weights balance the model's attention to historical and current topics during training, so topic transfer is handled better.
Drawings
FIG. 1 is a flow chart of a method of an exemplary embodiment of the present invention.
Detailed Description
The invention relates to a multi-round dialogue fine-tuning method weighted by context relevance and user contribution, mainly comprising a data acquisition and processing module, a model fine-tuning module, and a model inference module.
The data acquisition and processing module collects multi-round dialogue data from dialogue systems such as open-source datasets and web forums, uses a large language model represented by ChatGPT with prompt words that reference the dialogue subject to generate multi-round dialogue data on the same topic, and also generates multi-round dialogue data on different topics to produce topic-transfer data. The data is then labeled and processed: a stop token is inserted after each Assistant and User text, and loss-mask vectors are designed so that the loss can be computed in parallel during fine-tuning and every round of dialogue can be trained without splitting the dialogue data. Corresponding Assistant and User loss-mask vectors are further generated for the loss computation in the subsequent fine-tuning, making full use of the multi-round dialogue training data.
In the model fine-tuning module, Parameter-Efficient Fine-Tuning (PEFT) is performed on the large language model, using methods including but not limited to LoRA, Prompt-Tuning, and Adapter-Tuning. During fine-tuning the model predicts over the input text to obtain an output sequence, the loss at each position is computed, and the loss-mask vectors are used to select only the losses at positions marked 1 for the weight update. The Assistant loss and the User loss are obtained from the Assistant loss-mask vector and the User loss-mask vector respectively. The relevance between the most recent turn and the current turn is then used as the loss weight, balancing the influence of the historical context and the current round so that the model learns fully while avoiding actively generating answers unrelated to the current dialogue.
The model inference module loads the fine-tuned model weights, feeds the user's input and the context into the model for inference, generates a natural and fluent reply, and returns it to the user.
To implement the above method, the flow of an exemplary embodiment of the invention is described below with reference to FIG. 1:
step 1: the method for acquiring and labeling the multi-round dialogue data by using the large language model specifically comprises the following steps:
The data acquisition and processing module collects multiple rounds of dialogue data from dialogue systems such as open source data sets, internet forums and the like, and generates multiple rounds of dialogue data by using chatgpt reference dialogue topics through prompt words. Taking two rounds as an example, each two rounds of dialogue data are marked according to the formats of [ User: dialogue 1, assistant: dialogue 2, user: dialogue 3, assistant: dialogue 4] to form a piece of training data.
Step 2: the dialogue text is spliced and inserted with a stop character for word segmentation, which specifically comprises the following steps:
The stop character is used for each piece of data to splice when training the read data. A stop is inserted after each User text, and a stop is also inserted after each Assistant text, and a spliced form such as [ < s > dialog 1</s > dialog 2</s > dialog 3</s > dialog 4</s > ], is subjected to tokenizer segmentation to obtain a segmented representation, such as [1,29871,232,147,134,30743,232,147,154,30882,2,29871,232,147,134,30743,30214,30919,232,148,165,30882,2,29871,233,154,172,30429,232,147,134,30210,30417,30940,232,188,181,30214,232,193,136,30437,231,188,179,234,150,185,30716,31475,30882,2,29871,232,150,139,232,150,139,30214,235,184,179,30743,232,150,139,30214,30557,30936,30417,30816,30287,31558,31863,31687,30267,2],, wherein 1 and 2 respectively represent numbers corresponding to the start symbol < s > and the stop in a vocabulary, and the number in the middle of each 1 or 2 represents a segmented result of each dialog text, such as [.. 29871,232,147,134,30743,232,147,154,30882, ] represents a segmented result of dialog 1, [.. 29871,232,147,134,30743,30214,30919,232,148,165,30882, ] represents a segmented result of dialog 2, and dialog 3 is the same as dialog 4. The purpose of inserting the stop sign at each position is to ensure that the model learns the stop sign of each dialogue round when calculating loss in parallel in the fine tuning process, so that the stop sign can be normally output to finish output in the reasoning process.
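As an illustration only, the following sketch shows one way this concatenation and tokenization could be implemented with a Hugging Face LLaMA-style tokenizer; the model path, function names, and the per-turn span bookkeeping (reused by the later sketches) are assumptions for the example, not prescribed by the patent.

```python
from transformers import AutoTokenizer

# Assumed LLaMA-style tokenizer whose special tokens are <s> (id 1) and </s> (id 2).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

def build_input_ids(turns):
    """Concatenate dialogue turns, appending </s> after every turn.

    turns alternates [user_1, assistant_1, user_2, assistant_2, ...].
    Returns the full token-id list plus per-turn (start, end) spans, where each
    span covers a turn's tokens together with its trailing </s>.
    """
    input_ids = [tokenizer.bos_token_id]               # <s>
    spans = []
    for text in turns:
        ids = tokenizer.encode(text, add_special_tokens=False)
        ids.append(tokenizer.eos_token_id)             # </s> after every turn
        spans.append((len(input_ids), len(input_ids) + len(ids)))
        input_ids.extend(ids)
    return input_ids, spans

input_ids, spans = build_input_ids(["dialogue 1", "dialogue 2", "dialogue 3", "dialogue 4"])
```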
Step 3: the generation of the loss mark mask vector specifically includes:
And generating two loss mark mask vectors, namely an assuredly loss mark mask vector and a User loss mark mask vector, which are used for simultaneously calculating loss in parallel and distinguishing loss of User and assuredly roles in the training process. Taking the above dialog segmentation result as an example, the dialog loss markup mask vector is in the form of [0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1],, in which the position of 1 corresponds to dialog 2 and dialog 4, which are the dialog of the position of the assant, and the User loss markup mask vector is in the form of [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0],, in which assant dialog 2 and dialog 4 are marked 0, user initial dialog 1 is marked 0, and dialog 3 is marked 1.
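Building on the span bookkeeping of the previous sketch, the two mask vectors could be derived as follows; the helper name and the alternating User/Assistant turn-index convention are assumptions of the example.

```python
def build_loss_masks(input_ids, spans):
    """Assistant mask: 1 over every Assistant turn (odd turn index).
    User mask: 1 over every User turn from the second round onward; the
    first User turn and all other positions stay 0, as in step 3."""
    assistant_mask = [0] * len(input_ids)
    user_mask = [0] * len(input_ids)
    for turn_idx, (start, end) in enumerate(spans):
        if turn_idx % 2 == 1:            # Assistant turns: dialogue 2, 4, ...
            assistant_mask[start:end] = [1] * (end - start)
        elif turn_idx >= 2:              # User turns from round 2: dialogue 3, 5, ...
            user_mask[start:end] = [1] * (end - start)
    return assistant_mask, user_mask

assistant_mask, user_mask = build_loss_masks(input_ids, spans)
```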
Step 4: constructing and adding topic transfer data, which specifically comprises the following steps:
10% of data is randomly selected from the training data, and the data is constructed as topic transfer data and added into a training set. Firstly, using chatgpt to judge the theme of every selected conversation, using prompt words to make chatgpt produce new conversation irrelevant to the theme, and marking and splicing the new conversation after the original conversation according to the same mode, if the original conversation is [ User 1 conversation, assistant 2 conversation, user 3 conversation, assistant 4 conversation ] two-round conversation, the new irrelevant theme conversation is conversation 5 and conversation 6 one-round conversation, after splicing marking is [ User 1 conversation, assistant 2 conversation, user 3 conversation, assistant 4 conversation, user 5 conversation, assistant 6 conversation ], after inserting stop character form is as [ < s > conversation 2</s > conversation 3</s > conversation 4</s conversation 6</s > ], the corresponding generated Assistant loss mark mask vector is added 0 in conversation 5, conversation 6 is added 1, user loss mark mask vector is added 1 in conversation 5, conversation 6 is 0 and the form is the same as above.
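Reusing the helpers from the step-2 and step-3 sketches, the construction could look like this; the off-topic turns would come from prompting the large language model, which is elided here.

```python
original_turns = ["dialogue 1", "dialogue 2", "dialogue 3", "dialogue 4"]
off_topic_round = ["dialogue 5", "dialogue 6"]     # LLM-generated, unrelated topic

shifted_turns = original_turns + off_topic_round
input_ids, spans = build_input_ids(shifted_turns)
assistant_mask, user_mask = build_loss_masks(input_ids, spans)
# dialogue 5 (User, round 3) is now 1 in user_mask and 0 in assistant_mask;
# dialogue 6 (Assistant) is 1 in assistant_mask and 0 in user_mask.
```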
Step 5: and carrying out parameter efficient fine adjustment on the large language model. And obtaining the assant loss and the User loss by using the loss marking mask vector. After weighting each round of dialogue loss of the Assistant and the User, transmitting the loss value back to the large language model through a back propagation algorithm, updating a trainable module corresponding to the efficient parameter fine adjustment method, using the Lora method, and using a low-rank adaptation matrix as an adapter and updating the low-rank adaptation matrix.
Performing parameter-efficient fine-tuning of the large language model in step 5 specifically comprises the following:
During fine-tuning, the model predicts over the input text to obtain an output sequence, the loss at each position is computed, and the loss-mask vectors are used to select only the losses at positions marked 1 for the weight update. Then, through the Assistant loss-mask vector, a loss set representing the Assistant, defined as A_Loss, can be obtained and expressed as:
$$A\_Loss = \mathrm{Loss}(U_1, A_1) + \mathrm{Loss}[(U_1, A_1, U_2), A_2] + \dots + \mathrm{Loss}[(U_1, A_1, \dots, U_{n-1}, A_{n-1}, U_n), A_n]$$
where $U_n$ and $A_n$ denote the User and Assistant turns of the $n$-th dialogue round, respectively. The Loss function is defined as:

$$\mathrm{Loss}(h, y) = -\sum_{x_i \in y} \log p(x_i \mid h)$$

where $h$ denotes the history-round context and $p(x_i \mid h)$ the probability of outputting the correct token $x_i$ given the input, i.e., the history context; $x_i$ ranges over the true label $y$, and the summation yields the loss of the whole output sequence in one computation.
Similarly, a loss set representing the User, defined as U_Loss, can be obtained from the User loss-mask vector and expressed as:
$$U\_Loss = \mathrm{Loss}[(U_1, A_1), U_2] + \dots + \mathrm{Loss}[(U_1, A_1, \dots, U_{n-1}, A_{n-1}), U_n]$$
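As an illustration, the masked selection that yields A_Loss and U_Loss in parallel could look like the following sketch; it assumes the logits and labels are already shifted and aligned for causal-LM training, the masks are float tensors aligned the same way, and it averages over masked positions, a reduction the patent does not prescribe (the per-round weighting follows further below).

```python
import torch
import torch.nn.functional as F

def masked_losses(logits, labels, assistant_mask, user_mask):
    """logits: [T, vocab]; labels and masks: [T], already aligned."""
    per_pos = F.cross_entropy(logits, labels, reduction="none")       # loss at every position
    a_loss = (per_pos * assistant_mask).sum() / assistant_mask.sum().clamp(min=1.0)
    u_loss = (per_pos * user_mask).sum() / user_mask.sum().clamp(min=1.0)
    return a_loss, u_loss                                             # A_Loss and U_Loss
```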
To fine-tune better on topic-transfer data, letting the model learn fully while avoiding actively generating answers unrelated to the current dialogue, for each round of dialogue the relevance between the most recent turn and the current turn is used as a loss weight that balances the influence of the historical context and the current round. The weighted losses can be expressed as:
$$wA\_Loss = w_{a_1}\cdot\mathrm{Loss}(U_1, A_1) + w_{a_2}\cdot\mathrm{Loss}[(U_1, A_1, U_2), A_2] + \dots + w_{a_n}\cdot\mathrm{Loss}[(U_1, A_1, \dots, U_{n-1}, A_{n-1}, U_n), A_n]$$

$$wU\_Loss = w_{u_2}\cdot\mathrm{Loss}[(U_1, A_1), U_2] + \dots + w_{u_n}\cdot\mathrm{Loss}[(U_1, A_1, \dots, U_{n-1}, A_{n-1}), U_n]$$
The loss weights $w_{a_n}$ and $w_{u_n}$ of the Assistant and the User are computed as:
$$[w_{a_1}, \dots, w_{a_n}] = \mathrm{softmax}\big[\mathrm{cos\_similarity}(U_1, A_1), \dots, \mathrm{cos\_similarity}(U_n, A_n)\big]$$

$$[w_{u_2}, \dots, w_{u_n}] = \mathrm{softmax}\big[\mathrm{cos\_similarity}(A_1, U_2), \dots, \mathrm{cos\_similarity}(A_{n-1}, U_n)\big], \quad n \ge 2$$
If the current turn is a node where a topic transfer occurs, its cosine similarity with the immediately preceding turn is lower than it would be had no topic transfer occurred, so the weight of the current turn in the loss computation is reduced as a penalty. Each weighted round's dialogue loss thus pays extra attention to the most recent turn, so that after a topic transfer the model focuses better on the topic change when replying.
The total loss function is defined as:
$$Total\_loss = \alpha \cdot wA\_Loss + \beta \cdot wU\_Loss$$
α and β sum to 1 and are the User weight control parameters, with default settings 0.5 and 0.5; the more important the User turns or the Assistant turns are in the actual application scenario, the larger the corresponding weight control parameter is set. In particular, setting α to 1 and β to 0 corresponds to the fully unilateral question-answering scenario, i.e., only the Assistant loss is considered. A sketch of the weight computation and the loss combination follows.
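The patent leaves the embedding used for the cosine similarities unspecified, so the sketch below assumes per-turn sentence embeddings from some encoder, and per-round loss tensors `round_losses_a` / `round_losses_u` produced by splitting the masked losses by round.

```python
import torch
import torch.nn.functional as F

def relevance_weights(user_embs, assistant_embs):
    """user_embs / assistant_embs: lists of 1-D embedding tensors, one per turn."""
    # w_a: similarity of each round's User turn with its Assistant reply.
    a_sims = torch.stack([F.cosine_similarity(u, a, dim=0)
                          for u, a in zip(user_embs, assistant_embs)])
    # w_u (rounds >= 2): similarity of the previous Assistant turn with the
    # current User turn; a topic transfer lowers it, penalising that round.
    u_sims = torch.stack([F.cosine_similarity(a, u, dim=0)
                          for a, u in zip(assistant_embs[:-1], user_embs[1:])])
    return torch.softmax(a_sims, dim=0), torch.softmax(u_sims, dim=0)

def total_loss(round_losses_a, round_losses_u, w_a, w_u, alpha=0.5, beta=0.5):
    wa_loss = (w_a * round_losses_a).sum()    # wA_Loss
    wu_loss = (w_u * round_losses_u).sum()    # wU_Loss
    return alpha * wa_loss + beta * wu_loss   # Total_loss
```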
The loss is propagated back to the model via the back-propagation algorithm, and the LoRA low-rank adaptation matrix parameters are updated to minimize the loss function. Compared with ordinary adapter fine-tuning such as the Adapter-Tuning method, the low-rank adaptation matrices of the LoRA method update only a small number of lightweight parameters, so fine-tuning finishes in less time with fewer resources, which suits multi-round dialogue scenarios containing large amounts of dialogue data. Moreover, the low-rank adaptation matrices in LoRA are a low-rank approximation of the original model parameter matrices; compared with Prompt-Tuning, which fine-tunes the model by writing dialogue-task information directly into the input prompt text, LoRA generalizes better and better preserves the feature-representation capability of the pre-trained model while fine-tuning on topic-transfer dialogues and User loss weights, keeping the dialogue natural and fluent.
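As one possible realization of this step, the LoRA setup could be expressed with the Hugging Face peft library as below; the model name and hyperparameter values are illustrative assumptions, not prescribed by the patent.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,                                   # rank of the low-rank adaptation matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
)
model = get_peft_model(base_model, lora_config)  # base weights frozen, adapters trainable
model.print_trainable_parameters()
```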
Step 6: and outputting a reply to the user by using the trimmed model. The method specifically comprises the following steps:
The model inference module loads the fine-tuned model weights, feeds the user's input and the context into the model, outputs tokens after forward inference through the network, decodes them against the vocabulary to generate a reply text sequence, and returns the generated reply to the user. Because a </s> stop token was added after both the User and Assistant text of every dialogue round during fine-tuning, the model is guaranteed to have learned the stop token of every round while the User and Assistant losses were computed; at inference time it therefore stops after replying to the input given the input and context, instead of endlessly generating new dialogue.
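A possible inference sketch, reusing the tokenizer and the fine-tuned model from the earlier sketches; the sampling parameters are illustrative.

```python
import torch

prompt_ids = torch.tensor([input_ids])        # history plus current User turn, from the step-2 sketch
output = model.generate(
    prompt_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,                          # illustrative sampling parameters
    eos_token_id=tokenizer.eos_token_id,      # the learned </s> terminates the reply
)
reply = tokenizer.decode(output[0, prompt_ids.shape[1]:], skip_special_tokens=True)
print(reply)
```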
In summary, the multi-round dialogue fine-tuning method of the invention proceeds as follows: multi-round dialogue data is first obtained and labeled with a large language model; the dialogue texts are concatenated with a stop token inserted after every turn and then tokenized, so the stop tokens are learned during fine-tuning and parallel training without splitting the data becomes possible. Loss-mask vectors are generated to extract the positions whose loss must be computed and to compute the loss in parallel. A portion of the training data is used to construct topic-transfer data, and the data is fed into the model for fine-tuning with a parameter-efficient method. During fine-tuning, the loss-mask vectors are used to obtain the Assistant loss and the User loss in parallel; the relevance between the most recent turn and the current turn is then used as a loss weight over each Assistant and User dialogue round to balance the influence of the historical context and the current round, and the weighted Assistant and User losses are combined through the User weight control parameters. Finally, user input and context are fed to the fine-tuned model; because a </s> stop token was added after every User and Assistant turn, the model stops after producing its reply to the input and context, and the output is returned to the user as the reply.
In the conventional multi-round dialogue fine-tuning of a Decoder-only autoregressive LLM, the data construction that trains only the last round discards most of the training data, while the splitting method inflates the amount of training data. Because the User-Assistant organization format is mostly applied to task-oriented question-answering, only the Assistant turns are trained during fine-tuning and the User turns are ignored, so in non-unilateral question-answering scenarios the model tends to answer like an answering robot rather than interact like a chat partner and lacks natural fluency. In addition, both data constructions use the history rounds as context during fine-tuning, so the model attends to more historical information when replying, but is easily dragged back to topics in the history when handling topic transfer in daily conversation. The data construction and loss computation therefore need further improvement. The multi-round dialogue fine-tuning method for an autoregressive LLM according to the invention computes the losses of multi-round dialogue fine-tuning in parallel by introducing loss-mask vectors, so every round of dialogue can be trained without splitting the dialogue data; the Assistant and User loss-mask vectors further bring the User dialogue loss into consideration, so multi-round dialogue data is used more fully, the model's output improves, and the model becomes smoother and more natural in non-unilateral question-answering scenarios. On this basis, topic-transfer data is constructed and relevance weights balance the model's attention to historical and current topics during training, so topic transfer is handled better and the performance and user experience of the dialogue system improve.
The above embodiments do not limit the invention in any way. Based on the above description, those skilled in the art can make various changes and modifications without departing from the scope of the technical idea of the invention, and all other improvements and applications made to the above embodiments by equivalent transformation fall within the protection scope of the invention; the technical scope of the invention is not limited to the content of the description and must be determined according to the scope of the claims.

Claims (9)

1. A multi-round dialogue fine-tuning method for an autoregressive LLM, comprising the steps of:
Step 1: acquiring and labeling multi-round dialogue data with a large language model;
Step 2: concatenating the multi-round dialogue data, inserting stop tokens into the text, and then tokenizing;
Step 3: generating loss-mask vectors;
Step 4: constructing and adding topic-transfer data;
Step 5: performing parameter-efficient fine-tuning of the large language model;
Step 6: outputting replies to the user with the fine-tuned model.
2. The multi-round dialogue fine-tuning method for an autoregressive LLM according to claim 1, wherein in step 1, acquiring and labeling multi-round dialogue data with a large language model specifically comprises:
collecting original multi-round dialogue data from dialogue systems, including open-source datasets and web forums, and using a large language model with prompt words referencing the dialogue topic to generate dialogue data of more than one round; each multi-round dialogue is labeled in the format [User: dialogue 1, Assistant: dialogue 2, User: dialogue 3, Assistant: dialogue 4, ...] to form one piece of training data.
3. The multi-round dialogue fine-tuning method for an autoregressive LLM according to claim 2, wherein in step 2, concatenating the multi-round dialogue data and tokenizing after inserting stop tokens specifically comprises:
when training data is read, concatenating each piece of training data with stop tokens inserted: a stop token is inserted after each User text and after each Assistant text, giving the concatenated form [<s> dialogue 1 </s> dialogue 2 </s> dialogue 3 </s> dialogue 4 </s> …], which is passed through the tokenizer to obtain the input data.
4. The multi-round dialogue fine-tuning method for an autoregressive LLM according to claim 3, wherein in step 3, generating the loss-mask vectors specifically comprises:
generating vectors consisting of 0s and 1s that mark the parts of the input data for which the loss must be computed; an Assistant loss-mask vector and a User loss-mask vector are generated at the same time, so that the losses of the Assistant and User roles are distinguished while the loss is computed in parallel during training; in the Assistant loss-mask vector the positions corresponding to Assistant dialogue are 1 and all other positions are 0, while in the User loss-mask vector the positions corresponding to User dialogue from the second round onward are 1, and the first-round User dialogue position and all other positions are 0.
5. The multi-round dialogue fine-tuning method for an autoregressive LLM according to claim 4, wherein in step 4, constructing and adding topic-transfer data specifically comprises:
randomly selecting 10% of the training data to construct a topic-transfer training set; the large language model judges the topic of each selected dialogue and, via prompt words, generates a new dialogue unrelated to that topic; the new dialogue is labeled and concatenated after the original dialogue in the same manner as in steps 1-2 to form new input data, which is processed to generate the loss-mask vectors of step 3.
6. The multi-round dialogue fine-tuning method for an autoregressive LLM according to claim 5, wherein in step 5, fine-tuning the large language model specifically comprises:
the large language model computes predictions from the input data; the Assistant loss and the User loss are obtained with the loss-mask vectors; after each round's Assistant and User dialogue losses are weighted, the loss value is propagated back to the large language model via the back-propagation algorithm, and the trainable module corresponding to the parameter-efficient fine-tuning method is updated.
7. The multi-round dialogue fine-tuning method for an autoregressive LLM according to claim 6, wherein in step 5, obtaining the Assistant loss and the User loss with the loss-mask vectors specifically comprises:
during fine-tuning, the large language model predicts over the input text to obtain an output sequence and computes a cross-entropy loss value at each position; the loss-mask vectors are used to pick out only the loss values at positions marked 1 for the weight update; the Assistant loss set and the User loss set are obtained through the Assistant loss-mask vector and the User loss-mask vector respectively.
8. The multi-round dialogue fine-tuning method for an autoregressive LLM according to claim 7, wherein in step 5, weighting each round's Assistant and User dialogue losses specifically comprises:
for each round of dialogue, computing the cosine similarity between the current turn and the turn immediately preceding it, and using this relevance as the loss weight of the current turn when computing the loss, which balances the influence of the historical context and the current round and makes each weighted round's loss pay extra attention to the preceding turn; the weighted Assistant and User losses are then summed with weights again to obtain the total loss, controlled by User weight control parameters: in a daily chat scenario the Assistant and User loss weights are 0.5 and 0.5, while in a fully unilateral question-answering scenario they are set to 1 and 0.
9. The multi-round dialogue fine-tuning method for an autoregressive LLM according to claim 8, wherein in step 6, outputting replies to the user with the fine-tuned model specifically comprises:
the inference module of the large language model loads the fine-tuned model weights, feeds the user's input and the context into the large language model, outputs tokens after forward inference through the network, decodes them against the vocabulary to generate a reply text sequence, and returns the generated reply to the user.
CN202410127140.9A 2024-01-30 2024-01-30 Multi-round dialogue fine tuning method of autoregressive LLM Pending CN117933423A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410127140.9A CN117933423A (en) 2024-01-30 2024-01-30 Multi-round dialogue fine tuning method of autoregressive LLM


Publications (1)

Publication Number Publication Date
CN117933423A 2024-04-26

Family

ID=90759253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410127140.9A Pending CN117933423A (en) 2024-01-30 2024-01-30 Multi-round dialogue fine tuning method of autoregressive LLM

Country Status (1)

Country Link
CN (1) CN117933423A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination