CN117273067B - Dialogue response method and device based on large language model - Google Patents

Dialogue response method and device based on large language model

Info

Publication number
CN117273067B
Authority
CN
China
Prior art keywords
matrix
language model
fine tuning
value matrix
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311548034.XA
Other languages
Chinese (zh)
Other versions
CN117273067A (en)
Inventor
杨展悌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xinlianxin Intelligent Technology Co ltd
Original Assignee
Shanghai Xinlianxin Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xinlianxin Intelligent Technology Co ltd filed Critical Shanghai Xinlianxin Intelligent Technology Co ltd
Priority to CN202311548034.XA priority Critical patent/CN117273067B/en
Publication of CN117273067A publication Critical patent/CN117273067A/en
Application granted granted Critical
Publication of CN117273067B publication Critical patent/CN117273067B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 - Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 - Complex mathematical operations
    • G06F17/16 - Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/04 - Inference or reasoning models
    • G06N5/041 - Abduction
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Optimization (AREA)
  • Computational Mathematics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Pure & Applied Mathematics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

A dialogue response method and device based on a large language model. The method comprises: acquiring the dialogue text with which a user performs the Nth fine-tuning of the user's dedicated large language model; adjusting the self-attention module of each layer of the fine-tuned large language model through the dialogue text, the fine-tuned large language model being the Mth overall-adjusted large language model or the (N-1)th fine-tuned large language model; adjusting the parameters of the fine-tuning value matrix of the self-attention module based on the output features of the dialogue text until the accuracy requirement of the large language model is met, obtaining the Nth adjusted fine-tuning value matrix; obtaining the Nth adjusted value matrix from the reference value matrix and the Nth adjusted fine-tuning value matrix; generating the Nth adjusted large language model based on the reference query matrix, the reference key matrix and the Nth adjusted value matrix of the self-attention module of each layer; and taking the Nth adjusted large language model as the user's dedicated large language model and continuing the dialogue with the user.

Description

Dialogue response method and device based on large language model
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a dialogue response method and device based on a large language model.
Background
Generative AI (GAI) is one of the most actively developing technical directions at present; its representative breakthrough and popularization are ChatGPT and other large language models (large language model, LLM), and GAI applications cover many fields such as translation, article generation, article summarization, search, image generation, image analysis and code generation. With the development of large language models, more and more companies are developing their own models, so that users can train their own dedicated large language models.
The core of a large language model is the Transformer neural network (TNN), and each self-attention module in the TNN has a large number of parameters. Therefore, when the large language model needs to learn new knowledge, adjusting every parameter of the model involves a large amount of computation, the training time becomes too long, and a large amount of storage space is needed to store the parameters.
Therefore, a scheme is needed that reduces the amount of computation and memory when the large language model is adjusted and shortens the training time.
Disclosure of Invention
The present application provides a dialogue response method and device based on a large language model, which are used to reduce the amount of computation and memory when the large language model is adjusted and to shorten the training time.
In a first aspect, the present application provides a dialogue response method based on a large language model, the method comprising: acquiring the dialogue text with which a user performs the Nth fine-tuning of the user's dedicated large language model; adjusting the self-attention module of each layer of the fine-tuned large language model through the dialogue text, the fine-tuned large language model being the Mth overall-adjusted large language model or the (N-1)th fine-tuned large language model, wherein the self-attention module of any layer of the fine-tuned large language model comprises a reference query matrix, a reference key matrix and a fine-tuning value matrix, and the reference query matrix and the reference key matrix are obtained in the Mth overall adjustment of the large language model; in the process of adjusting the self-attention module of each layer, for each word in the dialogue text, acquiring the word's own features through the reference query matrix of the self-attention module, acquiring the word's association features with other words through the reference key matrix of the self-attention module, obtaining the attention score of the word from the word's own features and its association features with other words, and using the attention score of the word as the weight of the fine-tuning value matrix of the self-attention module to obtain the output features of the word under the self-attention module; adjusting the parameters of the fine-tuning value matrix of the self-attention module based on the output features of the dialogue text until the accuracy requirement of the large language model is met, obtaining the Nth adjusted fine-tuning value matrix; obtaining the Nth adjusted value matrix from the reference value matrix and the Nth adjusted fine-tuning value matrix, the reference value matrix being obtained in the Mth overall adjustment of the large language model; generating the Nth adjusted large language model based on the reference query matrix, the reference key matrix and the Nth adjusted value matrix of the self-attention module of each layer; and taking the Nth adjusted large language model as the user's dedicated large language model and continuing the dialogue with the user.
In the above technical solution, since the value matrix of the self-attention module is the matrix most directly related to the text output, adjusting the parameters in the value matrix has a large influence on the output of the large language model. Therefore, each time the large language model is fine-tuned, the parameter values of the query matrix and the key matrix of the self-attention module are kept unchanged and only the parameter values in the value matrix are adjusted, which reduces the amount of computation and memory needed to adjust the large language model and shortens the training time.
In one possible design, the method further comprises: determining a first fine-tuning weight of the dialogue text, the first fine-tuning weight being used to characterize the degree to which the dialogue text influences the user's dedicated large language model; and obtaining the Nth adjusted value matrix from the reference value matrix and the Nth adjusted fine-tuning value matrix comprises: multiplying the Nth adjusted fine-tuning value matrix by the first fine-tuning weight and adding the result to the reference value matrix to obtain the Nth adjusted value matrix.
In this technical solution, the content represented by the dialogue text can be emphasized or de-emphasized by adjusting the value of the first fine-tuning weight, so that the trained user-dedicated large language model is more personalized.
In one possible design, the weighting of the fine tuning value matrix of the self-attention module by using the attention score of the word to obtain the corresponding output feature of the word under the self-attention module includes: decomposing the fine tuning value matrix into a multiplied first sub-matrix, a second sub-matrix and a third sub-matrix; taking the attention score of the word as the weights of the multiplied first submatrix, the multiplied second submatrix and the multiplied third submatrix to obtain the corresponding output characteristics of the word under the multiplied first submatrix, the multiplied second submatrix and the multiplied third submatrix; the step of adjusting parameters of the fine tuning value matrix of the self-attention module based on the output characteristics of the dialogue text until the parameters meet the precision requirement of the large language model, and obtaining the fine tuning value matrix after the Nth adjustment comprises the following steps: based on the output characteristics of the dialogue text, the parameters of the first submatrix, the second submatrix and the third submatrix are adjusted, and when the precision requirement of a large language model is met, the first submatrix, the second submatrix and the third submatrix after the Nth adjustment are obtained; multiplying the first sub-matrix, the second sub-matrix and the third sub-matrix after the N-th adjustment to obtain a trimming value matrix after the N-th adjustment.
In the technical scheme, after the fine tuning value matrix is decomposed into the three multiplied submatrices, only parameters in the three submatrices are required to be adjusted in training of the model, so that the number of parameters participating in training is obviously reduced, and further the training time can be shortened.
In one possible design, determining the first fine-tuning weight of the dialogue text comprises: comparing the information represented by each vector in the Nth adjusted first sub-matrix and/or third sub-matrix with the information represented by the corresponding vectors in the first sub-matrix and/or third sub-matrix obtained in the adjustments before the Nth adjustment, and if new topic information is determined to exist, raising the first fine-tuning weight of the dialogue text.
In the above technical solution, whether a new topic exists is determined by comparing the information represented by each vector in the Nth adjusted first sub-matrix and/or third sub-matrix with the information represented by the corresponding vectors in the first sub-matrix and/or third sub-matrix obtained in the adjustments before the Nth adjustment. This automatic, quantitative way of determining whether a new topic exists requires no user involvement and makes the adjustment of the large language model more intelligent.
In one possible design, the method further comprises: storing the fine-tuning value matrix after each fine-tuning following the Mth overall adjustment and the corresponding first fine-tuning weight; and multiplying the Nth adjusted fine-tuning value matrix by the first fine-tuning weight and adding the result to the reference value matrix to obtain the Nth adjusted value matrix comprises: accumulating the products of the fine-tuning value matrix after each fine-tuning and the corresponding first fine-tuning weight, and adding the result to the reference value matrix to obtain the parameters of the Nth adjusted value matrix.
In the above technical solution, storing the fine-tuning value matrix after each fine-tuning and the corresponding first fine-tuning weight gives the user an opportunity to make corrections: for example, if the dialogue text of a certain fine-tuning is wrong or no longer needed, the fine-tuning value matrix of that fine-tuning can be deleted; or, if the user wants to emphasize or de-emphasize the dialogue text of a particular fine-tuning, the first fine-tuning weight of that fine-tuning can be turned up or down.
In one possible design, the method further comprises: acquiring each dialogue text of K times of fine tuning of a user on the large language model exclusive to the user to form a dialogue text set; the matrices of the self-attention modules of all layers of the large language model after the M-th integral adjustment are integrally adjusted through the dialogue text set; in the adjustment process, for each word in the dialogue text set, acquiring the self-characteristics of the word through a reference query matrix of the self-attention module; acquiring association features with other words through a reference key matrix of the self-attention module; obtaining the attention score of the word according to the self characteristics of the word and the related characteristics of the word and other words; taking the attention score of the word as the weight of a reference value matrix to obtain the corresponding output characteristic of the word under the self-attention module; based on the output characteristics of the dialogue text set, parameters of a reference query matrix, a reference key matrix and a reference value matrix of the self-attention module are adjusted until the parameters meet the precision requirement of a large language model, and an adjusted fine tuning query matrix, a fine tuning key matrix and a fine tuning value matrix are obtained; and updating the reference query matrix, the reference key matrix and the reference value matrix based on the adjusted fine tuning query matrix, fine tuning key matrix and fine tuning value matrix to obtain an M+1st overall adjustment large language model.
According to the technical scheme, the three matrixes of the self-attention module of the large language model are all adjusted through the dialogue text set formed by the K fine-tuning dialogue texts, so that the model can learn knowledge in the K fine-tuning dialogue texts better, and the trained large language model is more accurate.
In one possible design, the method further comprises: determining a second fine tuning weight for the set of dialog texts; updating the reference query matrix, the reference key matrix and the reference value matrix based on the adjusted fine tuning query matrix, fine tuning key matrix and fine tuning value matrix to obtain an M+1st overall adjustment large language model, wherein the method comprises the following steps: and multiplying the adjusted fine tuning query matrix, the fine tuning key matrix and the fine tuning value matrix by the second fine tuning weight respectively, and then adding the second fine tuning weight with the reference query matrix, the reference key matrix and the reference value matrix respectively to obtain an M+1st overall adjustment large language model.
In the technical scheme, the content of the text set characterization of the dialogue can be emphasized or weakened by adjusting the value of the second fine tuning weight, so that the large language model special for the user is trained to be more personalized.
In a second aspect, embodiments of the present application provide a dialogue response apparatus based on a large language model, where the apparatus includes:
the acquisition module is used for acquiring dialogue texts subjected to Nth fine tuning on the large language model exclusive to the user by the user;
the adjusting module is used for adjusting the self-attention module of each layer of the fine tuning language model through the dialogue text; the fine-tuning large language model is an M-th large language model after overall adjustment or an N-1-th large language model after fine tuning; the self-attention module of any layer of the fine tuning large language model comprises a reference query matrix, a reference key matrix and a fine tuning value matrix; the reference query matrix and the reference key matrix are obtained by carrying out the M-th integral adjustment on the large language model;
the adjusting module is further used for acquiring self-characteristics of the words through a reference query matrix of the self-attention module for each word in the dialogue text in the process of adjusting the self-attention module of each layer; acquiring association features with other words through a reference key matrix of the self-attention module; obtaining the attention score of the word according to the self characteristics of the word and the related characteristics of the word and other words; taking the attention score of the word as the weight of the fine tuning value matrix of the self-attention module to obtain the corresponding output characteristic of the word under the self-attention module;
The adjustment module is further used for adjusting parameters of the fine adjustment value matrix of the self-attention module based on the output characteristics of the dialogue text until the parameters meet the precision requirement of the large language model, and obtaining an Nth adjusted fine adjustment value matrix;
the processing module is used for obtaining the value matrix after the Nth adjustment according to the reference value matrix and the fine adjustment value matrix after the Nth adjustment; the reference value matrix is obtained by carrying out the M-th integral adjustment on the large language model;
the processing module is further used for generating an Nth adjusted large language model based on the reference query matrix, the reference key matrix and the Nth adjusted value matrix of the self-attention module of each layer;
and the processing module is also used for taking the Nth adjusted large language model as the exclusive large language model of the user and continuing to carry out dialogue with the user.
In one possible design, the processing module is further configured to determine a first fine tuning weight of the dialog text; the first fine tuning weight is used for representing the influence degree of the dialogue text on the exclusive large language model of the user; the processing module is specifically configured to multiply the trimming value matrix after the nth adjustment by the first trimming weight and add the trimming value matrix after the nth adjustment to the reference value matrix to obtain the value matrix after the nth adjustment when obtaining the value matrix after the nth adjustment according to the reference value matrix and the trimming value matrix after the nth adjustment.
In one possible design, the adjustment module is specifically configured to decompose the fine adjustment value matrix into a first submatrix, a second submatrix and a third submatrix that are multiplied when the attention score of the word is used as a weight of the fine adjustment value matrix of the self-attention module to obtain the corresponding output feature of the word under the self-attention module; taking the attention score of the word as the weights of the multiplied first submatrix, the multiplied second submatrix and the multiplied third submatrix to obtain the corresponding output characteristics of the word under the multiplied first submatrix, the multiplied second submatrix and the multiplied third submatrix; the adjustment module is specifically configured to adjust parameters of the fine adjustment value matrix of the self-attention module based on the output characteristics of the dialog text until the parameters meet the accuracy requirement of the large language model, and when the fine adjustment value matrix after the nth adjustment is obtained, obtain the first sub-matrix, the second sub-matrix and the third sub-matrix after the nth adjustment when the parameters of the first sub-matrix, the second sub-matrix and the third sub-matrix meet the accuracy requirement of the large language model by adjusting the parameters of the first sub-matrix, the second sub-matrix and the third sub-matrix based on the output characteristics of the dialog text; multiplying the first sub-matrix, the second sub-matrix and the third sub-matrix after the N-th adjustment to obtain a trimming value matrix after the N-th adjustment.
In one possible design, the processing module is specifically configured to compare, when determining the first fine tuning weight of the dialog text, information represented by each vector in the first matrix and/or the third matrix after the nth adjustment with information represented by each vector corresponding to the first matrix and/or the third matrix obtained by the first matrix and/or the third matrix after the nth adjustment, and if it is determined that new topic information exists, adjust the first fine tuning weight of the dialog text higher.
In one possible design, the processing module is further configured to store the trim value matrix after each trimming after the mth overall adjustment and the corresponding first trim weight; the processing module multiplies the trimming value matrix after the nth adjustment by the first trimming weight, and adds the trimming value matrix with the reference value matrix to obtain the value matrix after the nth adjustment, and is specifically configured to add the product of the trimming value matrix after each trimming and the corresponding first trimming weight to the reference value matrix to obtain the parameter of the value matrix after the nth adjustment.
In one possible design, the obtaining module is further configured to obtain each dialog text that is trimmed K times by the user for the user-specific large language model, to form a dialog text set; the adjusting module is further used for integrally adjusting each matrix of the self-attention module of each layer of the large language model after the M-th integral adjustment through the dialogue text set; in the adjustment process, for each word in the dialogue text set, acquiring the self-characteristics of the word through a reference query matrix of the self-attention module; acquiring association features with other words through a reference key matrix of the self-attention module; obtaining the attention score of the word according to the self characteristics of the word and the related characteristics of the word and other words; taking the attention score of the word as the weight of a reference value matrix to obtain the corresponding output characteristic of the word under the self-attention module; based on the output characteristics of the dialogue text set, parameters of a reference query matrix, a reference key matrix and a reference value matrix of the self-attention module are adjusted until the parameters meet the precision requirement of a large language model, and an adjusted fine tuning query matrix, a fine tuning key matrix and a fine tuning value matrix are obtained; and the processing module is further used for updating the reference query matrix, the reference key matrix and the reference value matrix based on the adjusted fine tuning query matrix, the fine tuning key matrix and the fine tuning value matrix to obtain an M+1st overall adjustment large language model.
In one possible design, the processing module is further configured to determine a second fine tuning weight for the set of dialog texts; the processing module is further specifically configured to, when updating the reference query matrix, the reference key matrix, and the reference value matrix based on the adjusted fine tuning query matrix, fine tuning key matrix, and fine tuning value matrix to obtain an m+1th overall-adjustment large language model, multiply the adjusted fine tuning query matrix, fine tuning key matrix, and fine tuning value matrix by the second fine tuning weights respectively, and then add the multiplied fine tuning query matrix, fine tuning key matrix, and fine tuning value matrix to the reference query matrix, the reference key matrix, and the reference value matrix, respectively, to obtain the m+1th overall-adjustment large language model.
In a third aspect, embodiments of the present application further provide a computing device, including:
a memory for storing program instructions;
a processor for invoking program instructions stored in said memory and executing the method as described in any of the possible designs of the first aspect in accordance with the obtained program instructions.
In a fourth aspect, embodiments of the present application further provide a computer-readable storage medium, in which computer-readable instructions are stored, which when read and executed by a computer, cause the method described in any one of the possible designs of the first aspect to be implemented.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a prior art self-attention module of a large language model;
FIG. 2 is a flow chart of a dialogue response method based on a large language model according to an embodiment of the present application;
FIG. 3 is a schematic diagram I of a self-attention module of a large language model according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for overall tuning a large language model according to an embodiment of the present disclosure;
FIG. 5 is a second schematic diagram of a self-attention module of a large language model according to an embodiment of the present application;
FIG. 6 is a third schematic diagram of a self-attention module of a large language model provided in an embodiment of the present application;
FIG. 7 is a schematic diagram of a self-attention module of a large language model according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a dialogue response device based on a large language model according to an embodiment of the present application;
Fig. 9 is a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the present application more apparent, the present application will be described in further detail below with reference to the accompanying drawings, wherein it is apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In the embodiments of the present application, "a plurality" means two or more. The words "first", "second" and the like are used merely to distinguish between descriptions and are not to be construed as indicating or implying relative importance or order.
In one application scenario, a user collects chat dialogues with a friend over a number of days, takes the friend's words as the dialogue texts and the user's own words as the replies to those dialogue texts, arranges them into the format required by the model, and trains a large language model, obtaining a personalized large language model dedicated to the user. Having learned the dialogue information from those days, the large language model can reply in place of the user in subsequent dialogues, provided the dialogue content remains within that scope. Because the user continually generates new chat dialogues, the large language model needs to be adjusted frequently or periodically based on the new chat dialogues so that it can learn new information.
As shown in FIG. 1, the self-attention module of each layer of a large language model includes three matrices: a query matrix Wq, a key matrix Wk and a value matrix Wv. For each word in the input text there is a corresponding query matrix, key matrix and value matrix. If the parameters in the query matrix, the key matrix and the value matrix are all adjusted each time the large language model is adjusted, the number of parameters to be adjusted is very large, which leads to a very large amount of computation and memory and a long training time.
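For reference, the following minimal sketch (not from the patent; a PyTorch-style illustration with an assumed hidden size of 256) shows the three projection matrices of such a self-attention layer and the number of parameters a full adjustment of all three would touch per layer.

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """Single-head self-attention with the three matrices Wq, Wk, Wv."""
    def __init__(self, d_model: int = 256):  # 256 is an assumed illustrative size
        super().__init__()
        self.Wq = nn.Linear(d_model, d_model, bias=False)  # query matrix Wq
        self.Wk = nn.Linear(d_model, d_model, bias=False)  # key matrix Wk
        self.Wv = nn.Linear(d_model, d_model, bias=False)  # value matrix Wv

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q, k, v = self.Wq(x), self.Wk(x), self.Wv(x)
        scores = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        return scores @ v

layer = SelfAttention()
# Adjusting all three matrices touches 3 * 256 * 256 = 196,608 parameters per layer.
print(sum(p.numel() for p in layer.parameters()))
```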
The dialogue response method based on the large language model can reduce the calculated amount and the memory amount when the large language model is adjusted, and shortens the training time.
Fig. 2 is a flow chart of a dialogue response method based on a large language model according to an embodiment of the present application, as shown in fig. 2, the method includes the following steps:
step 210, obtaining the dialogue text of the nth fine tuning of the user on the user-specific large language model.
Step 220, the self-attention module of each layer of the fine-tuned large language model is adjusted through the dialogue text.
In this embodiment, the user-specific large language model means that the large language model is obtained by performing personalized training on the original large language model according to training data of the user. Taking the nth fine tuning of a user-specific large language model as an example, the solution of the embodiment of the present application will be described.
Before the Nth fine-tuning of the large language model, N-1 fine-tunings and M overall adjustments have already been performed on it. An overall adjustment of the large language model means adjusting the query matrix, the key matrix and the value matrix of the large language model; fine-tuning the large language model means adjusting only its value matrix.
In step 220, the large language model being fine-tuned may be the Mth overall-adjusted large language model or the (N-1)th fine-tuned large language model. The self-attention module of any layer of this large language model comprises a reference query matrix, a reference key matrix and a fine-tuning value matrix. The reference query matrix and the reference key matrix are obtained in the Mth overall adjustment of the large language model.
That is, if the fine-tuned large language model is the M-th large language model after the overall tuning, the initial value of the parameter in the fine-tuned value matrix at this time of fine-tuning is the parameter value in the value matrix of the M-th large language model after the overall tuning; if the N-1 th fine-tuned large language model is fine-tuned, the initial value of the parameter in the fine-tuned value matrix is the parameter value in the N-1 th fine-tuned large language model value matrix.
Step 230, in the process of adjusting the self-attention module of each layer, for each word in the dialogue text, acquiring the self-feature of the word through the reference query matrix of the self-attention module; acquiring association features with other words through a reference key matrix of the self-attention module; obtaining the attention score of the word according to the self characteristics of the word and the associated characteristics of the word and other words; and taking the attention score of the word as the weight of the fine tuning value matrix of the self-attention module to obtain the corresponding output characteristic of the word under the self-attention module.
And 240, adjusting parameters of the fine tuning value matrix of the self-attention module based on the output characteristics of the dialogue text until the parameters meet the precision requirement of the large language model, and obtaining the fine tuning value matrix after the Nth adjustment.
In the embodiment of the application, when the nth fine adjustment is performed on the large language model, parameters of a reference query matrix and a reference key matrix of the self-attention module are kept unchanged, and only parameters of a fine adjustment value matrix of the self-attention module are adjusted to obtain the fine adjustment value matrix after the nth fine adjustment.
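A minimal sketch of this arrangement is given below (illustrative only, not the patent's implementation; PyTorch-style, single head). The reference query and key matrices are frozen buffers, and only the fine-tuning value matrix is a trainable parameter, so an optimizer built from `parameters()` updates it alone.

```python
import torch
import torch.nn as nn

class ValueOnlyFineTuneAttention(nn.Module):
    """One layer's self-attention during the Nth fine-tuning:
    Wq and Wk are frozen, only the fine-tuning value matrix is trainable."""
    def __init__(self, Wq: torch.Tensor, Wk: torch.Tensor, Wv_init: torch.Tensor):
        super().__init__()
        # Reference query/key matrices from the Mth overall adjustment (frozen).
        self.register_buffer("Wq", Wq)
        self.register_buffer("Wk", Wk)
        # Fine-tuning value matrix, initialised from the previous value matrix.
        self.Wv_ft = nn.Parameter(Wv_init.clone())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = x @ self.Wq                      # self-features of each word
        k = x @ self.Wk                      # features for associations with other words
        scores = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        v = x @ self.Wv_ft                   # attention scores weight the fine-tuning value matrix
        return scores @ v                    # output features of the dialogue text
```

Only `Wv_ft` appears in `parameters()`, so gradient updates leave the reference query and key matrices untouched.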
When adjusting the parameters of the fine-tuning value matrix, one approach is to adjust every parameter in the fine-tuning value matrix directly. However, the number of parameters to be adjusted is then large: for a 256×256 fine-tuning value matrix, 256×256 = 65,536 parameters must be adjusted, and even though only the fine-tuning value matrix is adjusted, adjusting every one of its parameters still takes a long time. For this reason, the present application provides another approach, in which the fine-tuning value matrix is decomposed into three sub-matrices that are multiplied together, and adjusting the parameters of the fine-tuning value matrix is converted into adjusting the parameters of the three sub-matrices, thereby reducing the number of adjusted parameters and shortening the training time.
Specifically, in step 230, the attention score of the word is used as the weight of the fine tuning value matrix of the self-attention module to obtain the corresponding output feature of the word under the self-attention module, which includes:
step 231, decomposing the fine tuning value matrix into a multiplied first sub-matrix, a second sub-matrix and a third sub-matrix.
Illustratively, the fine-tuning value matrix M may be decomposed, by singular value decomposition, into the multiplied first sub-matrix U, second sub-matrix Σ and third sub-matrix V^T, i.e. by Equation 1:
M = U Σ V^T (Equation 1)
If the fine-tuning value matrix M is an m×m matrix, the decomposed first sub-matrix U is an m×n matrix, the second sub-matrix Σ is an n×n matrix, and the third sub-matrix V^T is an n×m matrix.
The effective size of the second sub-matrix Σ is determined by the number of non-zero elements on its diagonal, no matter how large the fine-tuning value matrix M is; the larger the value of an element on the diagonal, the more important the fine-tuning value matrix M is in that direction.
And 232, taking the attention scores of the words as weights of the multiplied first submatrices, the multiplied second submatrices and the multiplied third submatrices to obtain corresponding output characteristics of the words under the multiplied first submatrices, the multiplied second submatrices and the multiplied third submatrices.
In step 240, based on the output feature of the dialogue text, parameters of the fine tuning value matrix of the self-attention module are adjusted until the parameters meet the accuracy requirement of the large language model, and the fine tuning value matrix after the nth adjustment is obtained, including:
step 241, based on the output characteristics of the dialogue text, the parameters of the first sub-matrix, the second sub-matrix and the third sub-matrix are adjusted, and when the precision requirement of the large language model is met, the first sub-matrix, the second sub-matrix and the third sub-matrix after the Nth adjustment are obtained.
And step 242, multiplying the first sub-matrix, the second sub-matrix and the third sub-matrix after the nth adjustment to obtain a trimming value matrix after the nth adjustment.
Again taking a 256×256 fine-tuning value matrix as an example, and assuming that the decomposed first sub-matrix is 256×8, the second sub-matrix is 8×8 and the third sub-matrix is 8×256, after the decomposition the parameters to be adjusted are reduced to 256×8 + 8×8 + 8×256 = 4160, so the number of parameters involved in training is significantly reduced and the training time can be further shortened.
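The decomposition and the parameter counts above can be illustrated with the following sketch (numpy; the 256×256 size and the retained rank of 8 follow the example in the text, everything else is an assumption):

```python
import numpy as np

d, r = 256, 8                       # assumed matrix size and retained rank
M = np.random.randn(d, d)           # fine-tuning value matrix to decompose

# Singular value decomposition: M = U @ diag(s) @ Vt  (Equation 1)
U, s, Vt = np.linalg.svd(M)

# Keep only the r most important directions; larger singular values on the
# diagonal of the second sub-matrix indicate more important directions.
U_r = U[:, :r]                      # first sub-matrix, 256 x 8
S_r = np.diag(s[:r])                # second sub-matrix, 8 x 8
Vt_r = Vt[:r, :]                    # third sub-matrix, 8 x 256

full_params = d * d                                 # 65,536 parameters
low_rank_params = U_r.size + S_r.size + Vt_r.size   # 2048 + 64 + 2048 = 4160
print(full_params, low_rank_params)

# Multiplying the three adjusted sub-matrices recovers the fine-tuning value matrix.
M_approx = U_r @ S_r @ Vt_r
```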
Step 250, obtaining the value matrix after the nth adjustment according to the reference value matrix and the trimming value matrix after the nth adjustment.
The fine-tuning value matrix obtained after the Nth adjustment in step 240 has been trained only on the dialogue text of the Nth fine-tuning. The value matrix after the Nth adjustment is therefore obtained from the reference value matrix together with the Nth adjusted fine-tuning value matrix, so that the large language model retains both the newly learned knowledge and the old knowledge learned before. The reference value matrix is obtained in the Mth overall adjustment of the large language model. Specifically, the reference value matrix may be added to the Nth adjusted fine-tuning value matrix to obtain the Nth adjusted value matrix. As shown in 3-1 of Fig. 3, the parameters in the fine-tuning value matrix are adjusted based on the output features of the dialogue text to obtain the Nth adjusted fine-tuning value matrix W_vi-N; then, as shown in 3-2, the reference value matrix W_vi and the Nth adjusted fine-tuning value matrix W_vi-N are added to obtain the Nth adjusted value matrix W_vi + W_vi-N, where i denotes the ith input and i ranges over [1, X].
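A minimal sketch of this combination step (illustrative only; the function and variable names are mine): for every layer, the Nth adjusted value matrix is the reference value matrix plus the Nth adjusted fine-tuning value matrix, so the reference matrix keeps the old knowledge while the fine-tuning matrix carries the new knowledge.

```python
import numpy as np

def nth_adjusted_value_matrices(Wv_ref_per_layer, dWv_N_per_layer):
    """For every layer i, the Nth adjusted value matrix is the reference value
    matrix W_vi plus the Nth adjusted fine-tuning value matrix W_vi-N."""
    return [Wv_ref + dWv_N for Wv_ref, dWv_N in zip(Wv_ref_per_layer, dWv_N_per_layer)]
```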
Step 260, generating the nth adjusted large language model based on the reference query matrix, the reference key matrix and the nth adjusted value matrix of the self-attention module of each layer.
And 270, taking the Nth adjusted large language model as a special large language model of the user, and continuing to converse with the user.
This completes the Nth adjustment of the user's dedicated large language model, which has now learned the knowledge in the dialogue text of the Nth fine-tuning, and the user can continue the dialogue with it. Because the parameter values of the query matrix and the key matrix of the self-attention module are kept unchanged during fine-tuning and only the parameter values in the value matrix are adjusted, the amount of computation and memory needed to adjust the large language model is reduced and the training time is shortened.
Further, after K fine-tunings have been performed, the dialogue texts of those K fine-tunings may be formed into a dialogue text set, through which all three matrices of the self-attention module of the large language model are adjusted. Specifically, the method for overall adjustment of the large language model is shown in Fig. 4 and includes the following steps:
step 410, each dialogue text of K times of fine tuning of the user-specific large language model is obtained to form a dialogue text set.
In this embodiment of the application, the (M+1)th overall adjustment of the user's dedicated large language model is performed through a dialogue text set formed from the dialogue texts of K fine-tunings, where the K fine-tuned dialogue texts may be the dialogue texts of every fine-tuning, from the first to the last, after the Mth overall adjustment, or K fine-tuned dialogue texts selected by the user.
And 420, integrally adjusting each matrix of the self-attention module of each layer of the large language model after the M-th integral adjustment through the dialogue text set.
Step 430, in the adjustment process, for each word in the dialogue text set, acquiring the word's own features through the reference query matrix of the self-attention module; acquiring its association features with other words through the reference key matrix of the self-attention module; obtaining the attention score of the word from the word's own features and its association features with other words; and using the attention score of the word as the weight of the reference value matrix to obtain the output features of the word under the self-attention module.
And step 440, adjusting parameters of the reference query matrix, the reference key matrix and the reference value matrix of the self-attention module based on the output characteristics of the dialogue text set until the parameters meet the precision requirement of the large language model, and obtaining an adjusted fine tuning query matrix, a fine tuning key matrix and a fine tuning value matrix.
In this embodiment of the application, when the (M+1)th overall adjustment is performed on the large language model, the parameters of the reference query matrix, the reference key matrix and the reference value matrix of the self-attention module are all adjusted, obtaining the adjusted fine-tuning query matrix, fine-tuning key matrix and fine-tuning value matrix.
When parameters of the reference query matrix, the reference key matrix and the reference value matrix are adjusted, one adjustment mode is to adjust each parameter in the three matrices; the other adjustment mode is to decompose the reference query matrix, the reference key matrix and the reference value matrix into three sub-matrices for multiplication respectively, and for each matrix, the parameters in the adjustment matrix are converted into the parameters for adjusting the three sub-matrices respectively, so that the purposes of reducing adjustment parameters and shortening training time are achieved. The specific decomposition method refers to the specific steps of decomposing the fine tuning value matrix.
And 450, updating the reference query matrix, the reference key matrix and the reference value matrix based on the adjusted fine tuning query matrix, the fine tuning key matrix and the fine tuning value matrix to obtain the M+1st integrally adjusted large language model.
The adjusted fine-tuning query matrix, fine-tuning key matrix and fine-tuning value matrix obtained in step 440 are trained only on the dialogue text set formed from the K fine-tuned dialogue texts, so the reference query matrix, reference key matrix and reference value matrix need to be updated based on them. Specifically, the reference query matrix, reference key matrix and reference value matrix of the Mth overall-adjusted large language model may be added to the adjusted fine-tuning query matrix, fine-tuning key matrix and fine-tuning value matrix respectively, and the summed matrices are used as the reference query matrix, reference key matrix and reference value matrix of the (M+1)th overall-adjusted large language model. As shown in 5-1 of Fig. 5, the parameters of the reference query matrix, reference key matrix and reference value matrix are adjusted based on the output features of the dialogue text set to obtain the fine-tuning query matrix W_qi-M+1, fine-tuning key matrix W_ki-M+1 and fine-tuning value matrix W_vi-M+1 of the (M+1)th overall adjustment; then, as shown in 5-2, the reference query matrix W_qi, reference key matrix W_ki and reference value matrix W_vi of the Mth overall-adjusted large language model are added to the adjusted fine-tuning query matrix W_qi-M+1, fine-tuning key matrix W_ki-M+1 and fine-tuning value matrix W_vi-M+1 respectively, obtaining the reference query matrix W_qi + W_qi-M+1, reference key matrix W_ki + W_ki-M+1 and reference value matrix W_vi + W_vi-M+1 of the (M+1)th overall-adjusted large language model, where i denotes the ith input and i ranges over [1, X].
According to the technical scheme, the three matrixes of the self-attention module of the large language model are all adjusted through the dialogue text set formed by the K fine-tuning dialogue texts, so that the model can learn knowledge in the K fine-tuning dialogue texts better, and the trained large language model is more accurate.
In one possible implementation, a first fine-tuning weight of the dialogue text may also be determined. The first fine-tuning weight is used to characterize the degree to which the dialogue text influences the user's dedicated large language model; the adjusted fine-tuning value matrix is multiplied by the first fine-tuning weight of the dialogue text, so that the content represented by the dialogue text is emphasized or de-emphasized according to the specific value of the first fine-tuning weight.
It will be appreciated that, assuming the original large language model was trained on one million dialogues and the fine-tuning uses one thousand dialogues, the new knowledge theoretically accounts for only 0.1% of the model's total knowledge. In general, new or important knowledge should receive a higher weight, that is, the corresponding fine-tuning value matrix is multiplied by a higher weight; taking a reference weight of 1 as an example, multiplying by a value greater than 1 is equivalent to giving extra emphasis to this knowledge. For old or optional knowledge, the corresponding fine-tuning value matrix may be multiplied by a smaller weight, e.g. a value less than 1, which is equivalent to downplaying the knowledge of this fine-tuning.
For example, when determining the first fine-tuning weight of the dialogue text, the information represented by each vector in the Nth adjusted first sub-matrix and/or third sub-matrix may be compared with the information represented by the corresponding vectors in the first sub-matrix and/or third sub-matrix obtained in the adjustments before the Nth adjustment; if it is determined that new topic information exists, the first fine-tuning weight of the dialogue text is raised. Specifically, the absolute values of the differences between each parameter in the Nth adjusted first sub-matrix and/or third sub-matrix and the corresponding parameters in the first sub-matrix and/or third sub-matrix obtained in the adjustments before the Nth adjustment may be accumulated, and the resulting value compared with a set threshold; if it is higher than the set threshold, it is determined that new topic information exists.
In addition, the user of the large language model may also set the fine-tuning weight of the dialogue text at each fine-tuning according to the importance of that dialogue text.
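One way the comparison described above could be realized is sketched below (illustrative only; the summed absolute difference and the threshold test follow the paragraph above, while the concrete numbers and the boost factor are assumptions):

```python
import numpy as np

def first_finetune_weight(U_new: np.ndarray, U_prev: np.ndarray,
                          base_weight: float = 1.0,
                          boost: float = 1.5,
                          threshold: float = 10.0) -> float:
    """Raise the first fine-tuning weight if the Nth adjusted first (and/or third)
    sub-matrix differs enough from the earlier one to suggest new topic information.
    base_weight, boost and threshold are illustrative values, not from the patent."""
    diff = np.abs(U_new - U_prev).sum()   # accumulate absolute parameter differences
    if diff > threshold:                  # new topic information detected
        return base_weight * boost
    return base_weight
```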
Step 250, in which the Nth adjusted value matrix is obtained from the reference value matrix and the Nth adjusted fine-tuning value matrix, may then be further refined as step 250': multiplying the Nth adjusted fine-tuning value matrix by the first fine-tuning weight and adding the result to the reference value matrix to obtain the Nth adjusted value matrix. As shown in 6-1 of Fig. 6, the Nth adjusted fine-tuning value matrix W_vi-N is multiplied by the first fine-tuning weight α and then added to the reference value matrix W_vi, obtaining the Nth adjusted value matrix W_vi + α·W_vi-N, where i denotes the ith input and i ranges over [1, X].
For a set of dialog texts when an overall adjustment is made to a large language model, a second fine-tuning weight for the set of dialog texts may also be determined.
Then, step 450, in which the reference query matrix, reference key matrix and reference value matrix are updated based on the adjusted fine-tuning query matrix, fine-tuning key matrix and fine-tuning value matrix to obtain the (M+1)th overall-adjusted large language model, may be further refined as step 450': the adjusted fine-tuning query matrix, fine-tuning key matrix and fine-tuning value matrix are each multiplied by the second fine-tuning weight and then added to the reference query matrix, reference key matrix and reference value matrix respectively, obtaining the (M+1)th overall-adjusted large language model. As shown in 6-2 of Fig. 6, the adjusted fine-tuning query matrix W_qi-M+1, fine-tuning key matrix W_ki-M+1 and fine-tuning value matrix W_vi-M+1 are each multiplied by the second fine-tuning weight α_M+1 and then added to the reference query matrix, reference key matrix and reference value matrix respectively, obtaining the reference query matrix W_qi + α_M+1·W_qi-M+1, reference key matrix W_ki + α_M+1·W_ki-M+1 and reference value matrix W_vi + α_M+1·W_vi-M+1 of the (M+1)th overall-adjusted large language model, where i denotes the ith input and i ranges over [1, X].
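The weighted update of step 450' (and, with a weight of 1, the unweighted update of step 450) can be sketched as follows for one layer (illustrative only; the names are mine):

```python
import numpy as np

def overall_adjustment(Wq_ref, Wk_ref, Wv_ref, dWq, dWk, dWv, alpha_m1: float = 1.0):
    """(M+1)th overall adjustment of one layer: each fine-tuning matrix is scaled
    by the second fine-tuning weight alpha_m1 and added to its reference matrix.
    alpha_m1 = 1.0 reproduces the unweighted update of step 450."""
    Wq_new = Wq_ref + alpha_m1 * dWq   # new reference query matrix
    Wk_new = Wk_ref + alpha_m1 * dWk   # new reference key matrix
    Wv_new = Wv_ref + alpha_m1 * dWv   # new reference value matrix
    return Wq_new, Wk_new, Wv_new
```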
In one possible implementation, the fine-tuning value matrix after each fine-tuning following the Mth overall adjustment, together with the corresponding first fine-tuning weight, may be stored.
Step 250', in which the Nth adjusted fine-tuning value matrix is multiplied by the first fine-tuning weight and added to the reference value matrix to obtain the Nth adjusted value matrix, then includes: accumulating the products of the fine-tuning value matrix after each fine-tuning and the corresponding first fine-tuning weight, and adding the result to the reference value matrix to obtain the parameters of the Nth adjusted value matrix.
It will be appreciated that, unlike a person, once information has been written into the matrix parameters it is impossible to delete only part of it. Therefore, in this embodiment of the application, the parameters of the fine-tuning value matrix after each fine-tuning are not immediately added to the reference value matrix to become permanent; instead, the fine-tuning value matrix after each fine-tuning and the corresponding first fine-tuning weight are stored.
As shown in 7-1 of Fig. 7, the parameters in the fine-tuning value matrix after each fine-tuning are stored; when the large language model is used, the fine-tuning value matrices W_vi-1, W_vi-2, …, W_vi-N of the individual fine-tunings are added to the reference value matrix W_vi, obtaining the Nth adjusted value matrix W_vi + ΣW_vi-n. As shown in 7-2 of Fig. 7, the parameters in the fine-tuning value matrix after each fine-tuning and the corresponding first fine-tuning weights are stored; when the large language model is used, the products of the fine-tuning value matrices W_vi-1, W_vi-2, …, W_vi-N and the corresponding first fine-tuning weights α_1, α_2, …, α_N are accumulated and added to the reference value matrix W_vi, obtaining the Nth adjusted value matrix W_vi + Σα_n·W_vi-n, where i denotes the ith input and i ranges over [1, X].
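The storage and accumulation shown in Fig. 7 can be sketched as follows (illustrative only; the class and method names are mine). Deleting a stored entry or changing its weight and then recomputing the effective value matrix corresponds to the correction mechanism described further below.

```python
import numpy as np

class FineTuneStore:
    """Keeps every stored fine-tuning value matrix with its first fine-tuning weight,
    so an individual fine-tuning can later be deleted or re-weighted."""
    def __init__(self, Wv_ref: np.ndarray):
        self.Wv_ref = Wv_ref          # reference value matrix from the Mth overall adjustment
        self.deltas = []              # list of [fine-tuning value matrix, weight] pairs

    def add(self, dWv: np.ndarray, alpha: float = 1.0):
        self.deltas.append([dWv, alpha])

    def delete(self, n: int):
        self.deltas.pop(n)            # drop an unwanted or erroneous fine-tuning

    def reweight(self, n: int, alpha: float):
        self.deltas[n][1] = alpha     # emphasize or de-emphasize one fine-tuning

    def effective_value_matrix(self) -> np.ndarray:
        # Accumulate the weighted fine-tuning matrices and add the reference value matrix.
        acc = sum(alpha * dWv for dWv, alpha in self.deltas)
        return self.Wv_ref + acc
```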
When the large language model is overall-adjusted, after the reference query matrix, reference key matrix and reference value matrix have been updated, the storage space holding the fine-tuning value matrices and first fine-tuning weights can be released for the next round of fine-tuning.
Storing each fine-tuning separately gives the user an opportunity to make corrections: for example, if the dialogue text of a certain fine-tuning is wrong or no longer needed, the fine-tuning value matrix of that fine-tuning can be deleted; or, if the user wants to emphasize or de-emphasize the dialogue text of a particular fine-tuning, the first fine-tuning weight of that fine-tuning can be turned up or down.
Based on the same technical concept, fig. 8 exemplarily shows a schematic structural diagram of a dialogue response device based on a large language model according to an embodiment of the present application, and as shown in fig. 8, the device 800 includes:
An obtaining module 801, configured to obtain a dialogue text for performing an nth fine adjustment on the user-specific large language model by a user;
an adjustment module 802, configured to adjust the self-attention module of each layer of the fine-tuning language model through the dialog text; the fine-tuning large language model is an M-th large language model after overall adjustment or an N-1-th large language model after fine tuning; the self-attention module of any layer of the fine tuning large language model comprises a reference query matrix, a reference key matrix and a fine tuning value matrix; the reference query matrix and the reference key matrix are obtained by carrying out the M-th integral adjustment on the large language model;
the adjusting module 802 is further configured to obtain, for each word in the dialog text, a self-feature of the word through a reference query matrix of the self-attention module in the process of adjusting the self-attention module of each layer; acquiring association features with other words through a reference key matrix of the self-attention module; obtaining the attention score of the word according to the self characteristics of the word and the related characteristics of the word and other words; taking the attention score of the word as the weight of the fine tuning value matrix of the self-attention module to obtain the corresponding output characteristic of the word under the self-attention module;
The adjustment module 802 is further configured to adjust parameters of the fine adjustment value matrix of the self-attention module based on the output feature of the dialog text until the parameters meet the accuracy requirement of the large language model, and obtain an nth adjusted fine adjustment value matrix;
a processing module 803, configured to obtain an Nth adjusted value matrix according to the reference value matrix and the Nth adjusted fine tuning value matrix; the reference value matrix is obtained by carrying out the M-th integral adjustment on the large language model;
the processing module 803 is further configured to generate an Nth adjusted large language model based on the reference query matrix, the reference key matrix, and the Nth adjusted value matrix of the self-attention module of each layer;
the processing module 803 is further configured to use the Nth adjusted large language model as the dedicated large language model of the user, and continue to perform a dialogue with the user.
In one possible design, the processing module 803 is further configured to determine a first fine tuning weight of the dialog text; the first fine tuning weight is used to represent the degree of influence of the dialog text on the user's exclusive large language model; when obtaining the Nth adjusted value matrix according to the reference value matrix and the Nth adjusted fine tuning value matrix, the processing module 803 is specifically configured to multiply the Nth adjusted fine tuning value matrix by the first fine tuning weight and add the product to the reference value matrix to obtain the Nth adjusted value matrix.
In one possible design, when the attention score of the word is used as the weight of the fine tuning value matrix of the self-attention module to obtain the corresponding output feature of the word under the self-attention module, the adjustment module 802 is specifically configured to decompose the fine tuning value matrix into a first submatrix, a second submatrix and a third submatrix that are multiplied together, and to use the attention score of the word as the weight of the multiplied first, second and third submatrices to obtain the corresponding output feature of the word under the multiplied first, second and third submatrices; when adjusting the parameters of the fine tuning value matrix of the self-attention module based on the output features of the dialog text until the accuracy requirement of the large language model is met to obtain the Nth adjusted fine tuning value matrix, the adjustment module 802 is specifically configured to adjust the parameters of the first, second and third submatrices based on the output features of the dialog text, obtaining the Nth adjusted first, second and third submatrices when the accuracy requirement of the large language model is met, and to multiply the Nth adjusted first, second and third submatrices to obtain the Nth adjusted fine tuning value matrix.
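For illustration, the sketch below shows one way such a three-factor decomposition could look; the chosen ranks, the zero initialization and the variable names are assumptions for the example and are not specified by this embodiment.

```python
# Hedged sketch of decomposing the fine tuning value matrix into three
# multiplied submatrices (a low-rank style factorization W_delta = A @ B @ C).
import numpy as np

d_model, r1, r2 = 8, 2, 2
rng = np.random.default_rng(1)

A = rng.normal(scale=0.01, size=(d_model, r1))   # first submatrix
B = rng.normal(scale=0.01, size=(r1, r2))        # second submatrix
C = np.zeros((r2, d_model))                      # third submatrix (zero init keeps the start neutral)

# During fine tuning only A, B and C receive parameter updates; once the
# accuracy requirement is met they are multiplied back into the Nth adjusted
# fine tuning value matrix:
W_v_delta = A @ B @ C
```

Training three small factors instead of a full d_model by d_model matrix keeps the number of tunable parameters per round small, which is consistent with storing one fine tuning value matrix per fine tuning round.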
In one possible design, when determining the first fine tuning weight of the dialog text, the processing module 803 is specifically configured to compare the information represented by each vector in the Nth adjusted first submatrix and/or third submatrix with the information represented by the corresponding vectors in the first submatrix and/or third submatrix obtained by adjustments before the Nth adjustment, and to raise the first fine tuning weight of the dialog text if it is determined that new topic information exists.
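The embodiment does not prescribe how the vectors are compared; the sketch below is one possible realization using cosine similarity, where the similarity threshold and the weight increment are assumptions chosen only for illustration.

```python
# Possible realization: low similarity between corresponding columns of the
# newly adjusted submatrix and an earlier round's submatrix is taken as a sign
# of new topic information, and the first fine tuning weight is raised.
import numpy as np

def first_weight_from_drift(A_new, A_prev, base_weight=1.0,
                            sim_threshold=0.5, boost=0.5):
    sims = []
    for v_new, v_prev in zip(A_new.T, A_prev.T):          # column-by-column comparison
        denom = np.linalg.norm(v_new) * np.linalg.norm(v_prev) + 1e-12
        sims.append(float(v_new @ v_prev) / denom)
    if min(sims) < sim_threshold:                          # new topic information detected
        return base_weight + boost
    return base_weight
```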
In one possible design, the processing module 803 is further configured to store the fine tuning value matrix after each fine tuning after the Mth overall adjustment and the corresponding first fine tuning weight; when multiplying the Nth adjusted fine tuning value matrix by the first fine tuning weight and adding it to the reference value matrix to obtain the Nth adjusted value matrix, the processing module 803 is specifically configured to accumulate the products of the fine tuning value matrix after each fine tuning and the corresponding first fine tuning weight, and to add the accumulated result to the reference value matrix to obtain the parameters of the Nth adjusted value matrix.
In one possible design, the obtaining module 801 is further configured to obtain each dialogue text of the K fine tunings performed by the user on the user-specific large language model, to form a dialogue text set; the adjustment module 802 is further configured to integrally adjust, by using the dialog text set, each matrix of the self-attention module of each layer of the large language model after the M-th overall adjustment; in the adjustment process, for each word in the dialogue text set, acquiring the self-characteristics of the word through a reference query matrix of the self-attention module; acquiring association features with other words through a reference key matrix of the self-attention module; obtaining the attention score of the word according to the self characteristics of the word and the related characteristics of the word and other words; taking the attention score of the word as the weight of a reference value matrix to obtain the corresponding output characteristic of the word under the self-attention module; based on the output characteristics of the dialogue text set, parameters of a reference query matrix, a reference key matrix and a reference value matrix of the self-attention module are adjusted until the parameters meet the precision requirement of a large language model, and an adjusted fine tuning query matrix, a fine tuning key matrix and a fine tuning value matrix are obtained; the processing module 803 is further configured to update the reference query matrix, the reference key matrix, and the reference value matrix based on the adjusted fine tuning query matrix, fine tuning key matrix, and fine tuning value matrix, to obtain an M+1st overall adjustment large language model.
In one possible design, the processing module 803 is further configured to determine a second fine tuning weight of the set of dialog texts; when updating the reference query matrix, the reference key matrix, and the reference value matrix based on the adjusted fine tuning query matrix, fine tuning key matrix, and fine tuning value matrix to obtain an M+1st overall adjusted large language model, the processing module 803 is specifically configured to multiply the adjusted fine tuning query matrix, fine tuning key matrix, and fine tuning value matrix by the second fine tuning weight respectively, and then to add the products to the reference query matrix, the reference key matrix, and the reference value matrix respectively, obtaining the M+1st overall adjusted large language model.
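As an illustration of this update step, the Python sketch below folds the weighted fine tuning matrices into the reference matrices; the dictionary layout and the example value of the second fine tuning weight are assumptions made for the example.

```python
# Sketch of the M+1st overall adjustment step: scale the adjusted fine tuning
# query/key/value matrices by the second fine tuning weight and fold them
# into the corresponding reference matrices.
import numpy as np

def apply_overall_adjustment(refs, deltas, beta):
    """refs / deltas: dicts keyed by 'q', 'k', 'v'; beta: second fine tuning weight."""
    return {name: refs[name] + beta * deltas[name] for name in ('q', 'k', 'v')}

d = 8
rng = np.random.default_rng(2)
refs = {n: rng.normal(size=(d, d)) for n in ('q', 'k', 'v')}               # reference matrices
deltas = {n: rng.normal(scale=0.01, size=(d, d)) for n in ('q', 'k', 'v')} # adjusted fine tuning matrices
refs = apply_overall_adjustment(refs, deltas, beta=0.8)                    # beta is an example value

# After the references are updated, the stored per-round fine tuning value
# matrices and first fine tuning weights can be released for the next round.
```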
Based on the same technical concept, the embodiment of the present application provides a computing device, as shown in fig. 9, including at least one processor 901, and a memory 902 connected to the at least one processor, where a specific connection medium between the processor 901 and the memory 902 is not limited in the embodiment of the present application, and in fig. 9, the processor 901 and the memory 902 are connected by a bus, for example. The buses may be divided into address buses, data buses, control buses, etc.
In the embodiment of the present application, the memory 902 stores instructions executable by the at least one processor 901, and the at least one processor 901 may perform the above-described dialogue response method based on the large language model by executing the instructions stored in the memory 902.
The processor 901 is the control center of the computing device, and may use various interfaces and lines to connect various parts of the computing device, performing resource configuration by running or executing the instructions stored in the memory 902 and invoking the data stored in the memory 902.
Alternatively, the processor 901 may include one or more processing units, and the processor 901 may integrate an application processor and a modem processor, wherein the application processor primarily processes operating systems, user interfaces, application programs, and the like, and the modem processor primarily processes wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 901. In some embodiments, processor 901 and memory 902 may be implemented on the same chip, and in some embodiments they may be implemented separately on separate chips.
The processor 901 may be a general purpose processor such as a Central Processing Unit (CPU), digital signal processor, application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present application. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in the processor for execution.
The memory 902 is a non-volatile computer-readable storage medium that can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The memory 902 may include at least one type of storage medium, for example flash memory, hard disk, multimedia card, card memory, random access memory (Random Access Memory, RAM), static random access memory (Static Random Access Memory, SRAM), programmable read-only memory (Programmable Read Only Memory, PROM), read-only memory (Read-Only Memory, ROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), magnetic memory, magnetic disk, optical disk, and the like. The memory 902 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 902 in this embodiment may also be circuitry or any other device capable of implementing a storage function, for storing program instructions and/or data.
Based on the same technical concept, the embodiments of the present application also provide a computer-readable storage medium storing a computer-executable program for causing a computer to execute the large language model-based dialogue response method listed in any one of the above-described modes.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the spirit or scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to cover such modifications and variations.

Claims (9)

1. A large language model based dialog response method, the method comprising:
acquiring a dialogue text of performing Nth fine tuning on the large language model exclusive to the user by the user;
adjusting the self-attention module of each layer of the large language model to be fine tuned through the dialogue text; the large language model to be fine tuned is the large language model after the M-th overall adjustment or the large language model after the (N-1)-th fine tuning; the self-attention module of any layer of the large language model to be fine tuned comprises a reference query matrix, a reference key matrix and a fine tuning value matrix; the reference query matrix and the reference key matrix are obtained by carrying out the M-th integral adjustment on the large language model;
in the process of adjusting the self-attention module of each layer, acquiring the self-characteristics of each word in the dialogue text through a reference query matrix of the self-attention module; acquiring association features with other words through a reference key matrix of the self-attention module; obtaining the attention score of the word according to the self characteristics of the word and the related characteristics of the word and other words; taking the attention score of the word as the weight of the fine tuning value matrix of the self-attention module to obtain the corresponding output characteristic of the word under the self-attention module;
Based on the output characteristics of the dialogue text, adjusting parameters of a fine tuning value matrix of the self-attention module until the parameters meet the precision requirement of a large language model, and obtaining an Nth adjusted fine tuning value matrix;
determining a first fine tuning weight of the dialogue text; the first fine tuning weight is used for representing the degree of influence of the dialogue text on the exclusive large language model of the user; multiplying the fine tuning value matrix after the Nth adjustment by the first fine tuning weight, and adding the product to a reference value matrix to obtain a value matrix after the Nth adjustment; the reference value matrix is obtained by carrying out the M-th integral adjustment on the large language model;
generating an Nth adjusted large language model based on the reference query matrix, the reference key matrix and the Nth adjusted value matrix of the self-attention module of each layer;
and taking the Nth adjusted large language model as the exclusive large language model of the user, and continuing to converse with the user.
2. The method according to claim 1, wherein the taking the attention score of the word as the weight of the fine tuning value matrix of the self-attention module to obtain the corresponding output characteristic of the word under the self-attention module comprises:
Decomposing the fine tuning value matrix into a multiplied first sub-matrix, a second sub-matrix and a third sub-matrix;
taking the attention score of the word as the weights of the multiplied first submatrix, the multiplied second submatrix and the multiplied third submatrix to obtain the corresponding output characteristics of the word under the multiplied first submatrix, the multiplied second submatrix and the multiplied third submatrix;
the step of adjusting parameters of the fine tuning value matrix of the self-attention module based on the output characteristics of the dialogue text until the parameters meet the precision requirement of the large language model, and obtaining the fine tuning value matrix after the Nth adjustment comprises the following steps:
based on the output characteristics of the dialogue text, the parameters of the first submatrix, the second submatrix and the third submatrix are adjusted, and when the precision requirement of a large language model is met, the first submatrix, the second submatrix and the third submatrix after the Nth adjustment are obtained;
multiplying the first sub-matrix, the second sub-matrix and the third sub-matrix after the N-th adjustment to obtain the fine tuning value matrix after the N-th adjustment.
3. The method of claim 2, wherein the determining the first fine tuning weight of the dialog text comprises:
Comparing the information represented by each vector in the first submatrix and/or the third submatrix after the N-th adjustment with the information represented by the corresponding vectors in the first submatrix and/or the third submatrix obtained by adjustments before the N-th adjustment, and if it is determined that new topic information exists, raising the first fine tuning weight of the dialogue text.
4. The method according to claim 1, wherein the method further comprises:
storing the fine tuning value matrix after each fine tuning after the Mth overall adjustment and the corresponding first fine tuning weight;
the multiplying the fine tuning value matrix after the Nth adjustment by the first fine tuning weight, and adding the product to the reference value matrix to obtain a value matrix after the Nth adjustment, comprises:
accumulating the products of the fine tuning value matrix after each fine tuning and the corresponding first fine tuning weight, and adding the accumulated result to the reference value matrix to obtain the parameters of the value matrix after the Nth adjustment.
5. The method according to claim 1, wherein the method further comprises:
acquiring each dialogue text of K times of fine tuning of a user on the large language model exclusive to the user to form a dialogue text set;
the matrices of the self-attention modules of all layers of the large language model after the M-th integral adjustment are integrally adjusted through the dialogue text set; in the adjustment process, for each word in the dialogue text set, acquiring the self-characteristics of the word through a reference query matrix of the self-attention module; acquiring association features with other words through a reference key matrix of the self-attention module; obtaining the attention score of the word according to the self characteristics of the word and the related characteristics of the word and other words; taking the attention score of the word as the weight of a reference value matrix to obtain the corresponding output characteristic of the word under the self-attention module;
Based on the output characteristics of the dialogue text set, parameters of a reference query matrix, a reference key matrix and a reference value matrix of the self-attention module are adjusted until the parameters meet the precision requirement of a large language model, and an adjusted fine tuning query matrix, a fine tuning key matrix and a fine tuning value matrix are obtained;
and updating the reference query matrix, the reference key matrix and the reference value matrix based on the adjusted fine tuning query matrix, fine tuning key matrix and fine tuning value matrix to obtain an M+1st overall adjustment large language model.
6. The method of claim 5, wherein the method further comprises:
determining a second fine tuning weight for the set of dialog texts;
updating the reference query matrix, the reference key matrix and the reference value matrix based on the adjusted fine tuning query matrix, fine tuning key matrix and fine tuning value matrix to obtain an M+1st overall adjustment large language model, wherein the method comprises the following steps:
and multiplying the adjusted fine tuning query matrix, the fine tuning key matrix and the fine tuning value matrix by the second fine tuning weight respectively, and then adding the second fine tuning weight with the reference query matrix, the reference key matrix and the reference value matrix respectively to obtain an M+1st overall adjustment large language model.
7. A large language model based dialog response device comprising:
the acquisition module is used for acquiring dialogue texts subjected to Nth fine tuning on the large language model exclusive to the user by the user;
the adjusting module is used for adjusting the self-attention module of each layer of the large language model to be fine tuned through the dialogue text; the large language model to be fine tuned is the large language model after the M-th overall adjustment or the large language model after the (N-1)-th fine tuning; the self-attention module of any layer of the large language model to be fine tuned comprises a reference query matrix, a reference key matrix and a fine tuning value matrix; the reference query matrix and the reference key matrix are obtained by carrying out the M-th integral adjustment on the large language model;
the adjusting module is further used for acquiring self-characteristics of the words through a reference query matrix of the self-attention module for each word in the dialogue text in the process of adjusting the self-attention module of each layer; acquiring association features with other words through a reference key matrix of the self-attention module; obtaining the attention score of the word according to the self characteristics of the word and the related characteristics of the word and other words; taking the attention score of the word as the weight of the fine tuning value matrix of the self-attention module to obtain the corresponding output characteristic of the word under the self-attention module;
The adjustment module is further used for adjusting parameters of the fine adjustment value matrix of the self-attention module based on the output characteristics of the dialogue text until the parameters meet the precision requirement of the large language model, and obtaining an Nth adjusted fine adjustment value matrix;
a processing module for determining a first fine tuning weight of the dialogue text; the first fine tuning weight is used for representing the degree of influence of the dialogue text on the exclusive large language model of the user; multiplying the fine tuning value matrix after the Nth adjustment by the first fine tuning weight, and adding the product to a reference value matrix to obtain a value matrix after the Nth adjustment; the reference value matrix is obtained by carrying out the M-th integral adjustment on the large language model;
the processing module is further used for generating an Nth adjusted large language model based on the reference query matrix, the reference key matrix and the Nth adjusted value matrix of the self-attention module of each layer;
and the processing module is also used for taking the Nth adjusted large language model as the exclusive large language model of the user and continuing to carry out dialogue with the user.
8. A computing device, comprising:
a memory for storing program instructions;
A processor for invoking program instructions stored in said memory and for performing the method according to any of claims 1-6 in accordance with the obtained program instructions.
9. A computer readable storage medium comprising computer readable instructions which, when read and executed by a computer, cause the method of any one of claims 1 to 6 to be implemented.
CN202311548034.XA 2023-11-20 2023-11-20 Dialogue response method and device based on large language model Active CN117273067B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311548034.XA CN117273067B (en) 2023-11-20 2023-11-20 Dialogue response method and device based on large language model

Publications (2)

Publication Number Publication Date
CN117273067A (en) 2023-12-22
CN117273067B (en) 2024-02-02

Family

ID=89219965

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311548034.XA Active CN117273067B (en) 2023-11-20 2023-11-20 Dialogue response method and device based on large language model

Country Status (1)

Country Link
CN (1) CN117273067B (en)

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230135179A1 (en) * 2021-10-21 2023-05-04 Meta Platforms, Inc. Systems and Methods for Implementing Smart Assistant Systems
US20230325725A1 (en) * 2022-04-12 2023-10-12 Google Llc Parameter Efficient Prompt Tuning for Efficient Models at Scale

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395841A (en) * 2020-11-18 2021-02-23 福州大学 BERT-based method for automatically filling blank text
CN112612881A (en) * 2020-12-28 2021-04-06 电子科技大学 Chinese intelligent dialogue method based on Transformer
WO2023209198A1 (en) * 2022-04-28 2023-11-02 Deepmind Technologies Limited Language model for processing a multi-mode query input
CN115964467A (en) * 2023-01-02 2023-04-14 西北工业大学 Visual situation fused rich semantic dialogue generation method
CN116663638A (en) * 2023-07-26 2023-08-29 海信集团控股股份有限公司 Model fine adjustment training method, device, equipment and medium
CN117034090A (en) * 2023-09-06 2023-11-10 北京百度网讯科技有限公司 Model parameter adjustment and model application methods, devices, equipment and media
CN117059103A (en) * 2023-10-12 2023-11-14 慧言科技(天津)有限公司 Acceleration method of voice recognition fine tuning task based on low-rank matrix approximation

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Matan Ben Noach, Yoav Goldberg. Compressing Pre-trained Language Models by Matrix Decomposition. Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, pp. 884-889 *
Hu E J, Shen Y, Wallis P, et al. LoRA: Low-Rank Adaptation of Large Language Models. arXiv, pp. 1-26 *
Yuchao Li, Fuli Luo, Chuanqi Tan, Mengdi Wang, Songfang Huang, Shen Li, Junjie Bai. Parameter-Efficient Sparsity for Large Language Models Fine-Tuning. arXiv, pp. 1-7 *
Wen Sen, Qian Li, et al. A Review of Research Progress on Question Answering Technology Based on Large Language Models. Data Analysis and Knowledge Discovery, pp. 1-19 *

Also Published As

Publication number Publication date
CN117273067A (en) 2023-12-22

Similar Documents

Publication Publication Date Title
JP2022549238A (en) Semantic understanding model training method, apparatus, electronic device and computer program
AU2016327448B2 (en) Methods for the automated generation of speech sample asset production scores for users of a distributed language learning system, automated accent recognition and quantification and improved speech recognition
KR20200014510A (en) Method for providing prediction service based on mahcine-learning and apparatus thereof
CN111241814A (en) Error correction method and device for voice recognition text, electronic equipment and storage medium
CN111783873A (en) Incremental naive Bayes model-based user portrait method and device
JP7186591B2 (en) Text Classifier, Learner, and Program
CN111968678B (en) Audio data processing method, device, equipment and readable storage medium
CN111310828A (en) Target detection model fine-tuning method and device for ADAS scene
CN113488023A (en) Language identification model construction method and language identification method
CN115017178A (en) Training method and device for data-to-text generation model
CN116129888A (en) Audio data classification method, device, equipment and medium
CN108021544B (en) Method and device for classifying semantic relation of entity words and electronic equipment
CN117273067B (en) Dialogue response method and device based on large language model
CN111241820A (en) Bad phrase recognition method, device, electronic device, and storage medium
US20210073645A1 (en) Learning apparatus and method, and program
EP4092666A1 (en) Information processing device, information processing method, and program
CN117708428A (en) Recommendation information prediction method and device and electronic equipment
CN112084831B (en) Age estimation method based on age editing
JP7112348B2 (en) SIGNAL PROCESSING DEVICE, SIGNAL PROCESSING METHOD AND SIGNAL PROCESSING PROGRAM
CN111368056A (en) Ancient poetry generation method and device
CN107402994B (en) Method and device for classifying multi-group hierarchical division
JP6518142B2 (en) Language model generation device and program thereof
KR20210074833A (en) Syntactic analysis apparatus and method for the same
CN113901842A (en) Machine translation model generation method, machine translation method and device
CN113257235B (en) Model training method, voice recognition method, device, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant