CN117689041A

CN117689041A - Cloud integrated embedded large language model training method and language question-answering method

Info

Publication number: CN117689041A
Application number: CN202410108095.2A
Authority: CN
Inventors: 陈浩; 田聪; 于斌; 贺子轩
Original assignee: Xidian University
Current assignee: Xi'an Dongjian Data Technology Co.,Ltd.
Priority date: 2024-01-26
Filing date: 2024-01-26
Publication date: 2024-03-12
Anticipated expiration: 2044-01-26
Also published as: CN117689041B

Abstract

The invention discloses a cloud integrated embedded large language model training method and a language question-answering method. The training method is applied to an edge end and comprises the following steps: inputting a test sample into a first model to be trained and obtaining, through inference, the perplexity and throughput of the first model to be trained; when the perplexity and the throughput do not meet the user requirements, sending related information to at least one server in communication connection with the edge end, so that the server trains a second model to be trained using training samples and a regularized loss function and then issues the network parameters of the trained second model to be trained to the edge end; and returning to the step of obtaining the perplexity and throughput of the first model to be trained through inference until the perplexity and the throughput meet the user requirements, and taking the corresponding first model to be trained as the language question-answering model. The training method balances inference speed against accuracy, and the resulting language question-answering model provides more efficient and intelligent language processing.

Description

Cloud integrated embedded large language model training method and language question-answering method

Technical Field

The invention belongs to the technical field of artificial intelligence, and particularly relates to a cloud integrated embedded large language model training method and a language question-answering method.

Background

Large language models are crucial in the field of natural language processing. A large language model is an artificial intelligence model based on deep learning that has strong natural language processing capability, can understand and generate natural language text, and can be applied to automatic question-answering scenarios. For example, after a user poses a question, the model can understand the semantics of the question and extract relevant information from a large amount of text data, thereby giving an accurate answer.

Because large language models raise concerns such as data privacy and information security, there is strong demand for deploying and applying large language models at the edge, and the demand for deploying them on mobile phones is growing as well. However, large language models place very high demands on CPU (Central Processing Unit) computing power and system memory. When a large language model runs on an edge device such as a mobile phone, computing resources such as processor speed and memory capacity often severely limit inference, which directly affects the model's answer accuracy and answer speed. How to achieve high-speed inference under such limited computing resources is therefore a problem that those skilled in the art urgently need to solve.

To cope with this challenge, existing methods optimize the large language model to reduce the amount of computation required in the inference stage, for example through compression and quantization, which lower the computing load by reducing the number of parameters and computing operations and thereby improve inference efficiency. In addition, lightweight models designed specifically for edge devices have been proposed; these reduce model size and complexity while maintaining relatively high accuracy, so that inference tasks can be completed faster with limited computing resources.

However, these methods have limitations in solving the inference problem of edge-side large language models. Existing methods usually focus on only one of inference speed or accuracy and lack the ability to adjust flexibly across different scenarios. Meanwhile, because large language models are poorly compatible across different edge computing hardware platforms, balancing speed and accuracy requires a large amount of computation and is inefficient, which makes natural language processing difficult to realize.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a cloud integrated embedded large language model training method and a language question-answering method. The technical problems to be solved by the invention are realized by the following technical scheme:

In a first aspect, the present invention provides a cloud integrated embedded large language model training method, applied to an edge end, including:

inputting a test sample into a first model to be trained, and obtaining the perplexity PS and the throughput T of the first model to be trained through inference;

when the perplexity PS and the throughput T do not meet the user requirements, sending related information to at least one server in communication connection with the edge end, so that the server trains a second model to be trained using training samples and a regularized loss function and issues the network parameters of the trained second model to be trained to the edge end; the test sample and the training samples are natural language sequences;

substituting the network parameters of the trained second model to be trained into the first model to be trained, returning to the step of obtaining the perplexity PS and the throughput T of the first model to be trained through inference until the perplexity PS and the throughput T meet the user requirements, and taking the first model to be trained that meets the user requirements as the trained language question-answering model.

In one embodiment of the invention, the regularized loss function is:

$$\text{Regularized Loss} = \text{Standard Loss} + \lambda \cdot IL$$

where Regularized Loss represents the regularized loss function, Standard Loss represents the standard loss function, IL represents the objective function, and $\lambda$ represents the hyperparameter.

In one embodiment of the invention, the objective function IL combines the perplexity PS and the throughput T using a first weight coefficient $W_p$ and a second weight coefficient $W_t$.

In one embodiment of the invention, the standard loss function is:

$$\text{Standard Loss} = -\sum_{i=1}^{N} \sum_{j=1}^{|y_i|} \log P\left(y_{i,j} \mid x_i, y_{i,<j}, \theta\right)$$

where $N$ represents the number of natural language sequences input to the second model to be trained, $x_{i,j}$ represents the $j$-th token of the $i$-th natural language sequence $x_i$ input to the second model to be trained, $y_i$ represents the desired output sequence corresponding to $x_i$, $y_{i,j}$ represents the $j$-th token of $y_i$, $|y_i|$ represents the number of tokens in $y_i$, $y_{i,<j}$ represents the first $j-1$ elements of $y_i$, $\theta$ represents the network parameters of the second model to be trained, and $P(y_{i,j} \mid x_i, y_{i,<j}, \theta)$ represents the conditional probability of $y_{i,j}$ given $x_i$, $y_{i,<j}$ and $\theta$.

In one embodiment of the invention, the related information includes at least an initial value of the hyperparameter $\lambda$, and a first weight coefficient $W_p$ and a second weight coefficient $W_t$ determined in advance according to the user requirements and the characteristics of the edge end's own computing platform;

the step of, when the perplexity PS and the throughput T do not meet the user requirements, sending related information to at least one server in communication connection with the edge end, so that the server trains the second model to be trained using training samples and the regularized loss function and issues the network parameters of the trained second model to be trained to the edge end, includes:

when the perplexity PS and the throughput T do not meet the user requirements, sending the initial value of the hyperparameter $\lambda$, the first weight coefficient $W_p$ and the second weight coefficient $W_t$ to at least one server in communication connection with the edge end, so that the server substitutes the first weight coefficient $W_p$, the second weight coefficient $W_t$ and the initial value of the hyperparameter $\lambda$ into the regularized loss function, trains the second model to be trained using the training samples and the regularized loss function, and then issues the network parameters of the trained second model to be trained to the edge end.

In one embodiment of the present invention, the server trains the second model to be trained according to the following steps:

inputting a preset number of training samples into the second model to be trained;

calculating a loss value of the current second model to be trained based on the regularized loss function;

judging whether the loss value converges; if not, adjusting the hyperparameter $\lambda$ in the regularized loss function and returning to the step of inputting the preset number of training samples into the second model to be trained; otherwise, ending the training and issuing the network parameters of the trained second model to be trained to the edge end.

In one embodiment of the present invention, the edge end sends the initial value of the hyperparameter $\lambda$, the first weight coefficient $W_p$ and the second weight coefficient $W_t$, in the form of a message, to at least one server in communication connection with the edge end.

In a second aspect, the present invention provides a cloud integrated embedded large language model training method, applied to a server, including:

receiving related information sent by an edge terminal, and establishing a regularization loss function according to the related information;

after a preset number of training samples are input into a second model to be trained, calculating a loss value of a regularized loss function;

judging whether the loss value converges; if not, adjusting the hyperparameter $\lambda$ in the regularized loss function and returning to the step of inputting the preset number of training samples into the second model to be trained; if so, ending the training and issuing the network parameters of the trained second model to be trained to the edge end, so that the edge end substitutes these network parameters into a first model to be trained, obtains the perplexity PS and the throughput T of the first model to be trained through inference, and, when the inferred perplexity PS and throughput T meet the user requirements, takes the first model to be trained that meets the user requirements as the trained language question-answering model.

In one embodiment of the present invention, the server issues parameters of the trained second model to be trained to the edge in the form of a message.

In a third aspect, the present invention further provides a language question-answering method, including:

acquiring a question input by a user;

inputting the question into a language question-answering model trained by the cloud integrated embedded large language model training method according to the first aspect or the second aspect, and obtaining an answer to the question output in natural language form.

Compared with the prior art, the invention has the beneficial effects that:

(1) The invention provides a cloud integrated embedded large language model training method that can flexibly adjust the training parameters of the model according to a user's differing requirements for inference speed and accuracy. By combining optimization on the training side and the inference side during training, it can achieve an optimal balance between inference speed and accuracy.

(2) The invention adopts an adaptive parameter adjustment strategy. Through end-to-end adaptive parameter adjustment, the training parameters of the second model to be trained on the server side can be adjusted dynamically during inference, so that the parameter configuration and training progress of server-side training are guided adaptively by the speed and accuracy metrics observed in different edge inference scenarios.

(3) The invention adopts an end-to-end real-time feedback mechanism and introduces a real-time feedback network protocol. Inference speed and accuracy are monitored in real time during edge inference, and the related parameters are transmitted to the cloud server through the protocol, so that the cloud server can adjust the training parameters promptly to adapt to dynamically changing speed and accuracy requirements.

(4) The trained language question-answering model can be applied in a variety of edge scenarios, such as embedded systems, mobile devices and the Internet of Things, providing more efficient and intelligent language processing for natural language question answering at the edge end.

The present invention will be described in further detail with reference to the accompanying drawings and examples.

Drawings

FIG. 1 is a flowchart of a cloud-integrated embedded large language model training method applied to an edge according to an embodiment of the present invention;

FIG. 2 is a flowchart of a cloud integrated embedded large language model training method according to an embodiment of the present invention;

FIG. 3 is a flowchart of a cloud integrated embedded large language model training method applied to a server according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of an application scenario of a cloud integrated embedded large language model training method according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of another application scenario of the cloud integrated embedded large language model training method provided by the embodiment of the invention;

FIG. 6 is a flowchart of a language question-answering method according to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.

It should be understood that the limited resources of edge devices force a large language model to trade off inference speed against accuracy during inference. In general, high-speed inference comes with some loss of accuracy, while high-accuracy inference sacrifices inference speed. Moreover, because computing resources are heterogeneous, different edge computing hardware platforms may impose different requirements on inference speed and accuracy.

To strike the best balance between speed and accuracy for the user experience, a large language model needs to be trained and deployed for each device's hardware configuration and performance. Although some inference optimization methods exist, they still have limitations when applied to large language model inference at the edge: they focus on only one of inference speed or accuracy and lack the ability to adjust flexibly across scenarios. Meanwhile, inference for an edge-side large language model runs on the edge device while training runs on a server. This split training-inference setup is extremely unfriendly to developers, who must connect and test the two ends manually. If there are dozens of high-end, mid-range and low-end mobile phone series, each series has dozens of products, and each product is iterated once every 3-6 months, then a great deal of manual effort is needed to tune the parameters of the cloud-trained large language model for phones of different models and performance levels.

In view of the above, the invention provides a cloud integrated embedded large language model training method and a language question-answering method.

Fig. 1 is a flowchart of the cloud integrated embedded large language model training method applied to an edge end according to an embodiment of the present invention. As shown in Fig. 1, an embodiment of the present invention provides a cloud integrated embedded large language model training method, which is applied to an edge end and includes:

S11, inputting a test sample into a first model to be trained, and obtaining the perplexity PS and the throughput T of the first model to be trained through inference;

S12, when the perplexity PS and the throughput T do not meet the user requirements, sending related information to at least one server in communication connection with the edge end, so that the server trains a second model to be trained using training samples and a regularized loss function and issues the network parameters of the trained second model to be trained to the edge end; the test sample and the training samples are natural language sequences;

S13, substituting the network parameters of the trained second model to be trained into the first model to be trained, returning to the step of obtaining the perplexity PS and the throughput T of the first model to be trained through inference until the perplexity PS and the throughput T meet the user requirements, and taking the first model to be trained that meets the user requirements as the trained language question-answering model.
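The loop formed by steps S11 to S13 can be sketched as follows. This is only an illustrative sketch: the `Requirement` thresholds, the callables `evaluate` and `request_training`, and the `max_rounds` cap are hypothetical names introduced here, and the embodiment does not prescribe how the user requirements are expressed.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Requirement:
    """Assumed form of the user requirements: an upper bound on PS, a lower bound on T."""
    max_perplexity: float
    min_throughput: float


def edge_training_loop(
    evaluate: Callable[[dict], Tuple[float, float]],   # hypothetical: runs inference, returns (PS, T)
    request_training: Callable[[dict], dict],          # hypothetical: sends info, returns new parameters
    params: dict,
    related_info: dict,
    req: Requirement,
    max_rounds: int = 10,                              # assumption: safety cap, not part of the method
) -> dict:
    for _ in range(max_rounds):
        # S11: infer on the test sample to obtain perplexity PS and throughput T.
        ps, t = evaluate(params)
        if ps <= req.max_perplexity and t >= req.min_throughput:
            # Requirements met: the current first model is the question-answering model.
            return params
        # S12: send related information to the server and wait for trained parameters.
        # S13: substitute the returned parameters into the first model and repeat.
        params = request_training(related_info)
    raise RuntimeError("user requirements not met within max_rounds")
```

In a deployment, `request_training` would wrap the message exchange with one or more servers; here it is left abstract so that the control flow stays visible.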

It should be noted that, in this embodiment, the first model to be trained and the second model to be trained are large language models with the same network structure. An existing large language model, such as LLaMA or LLaMA2, may optionally be used, which is not limited in this application.

According to this embodiment, the training parameters of the large language model can be adjusted dynamically according to the requirements of different scenarios and edge computing hardware platforms, so as to obtain a language question-answering model with optimal inference speed and accuracy.

Optionally, the regularized loss function is:

$$\text{Regularized Loss} = \text{Standard Loss} + \lambda \cdot IL$$

Specifically, the regularized loss function is defined as the sum of the standard loss function Standard Loss and the objective function IL (information loss) scaled by the hyperparameter $\lambda$. The standard loss function is a non-negative real-valued function that measures the degree of inconsistency between the predicted values of the second model to be trained and the true values; this embodiment optionally uses a negative log-likelihood function. The objective function IL describes the inference accuracy and speed of the edge end. It should be noted that, by introducing the hyperparameter $\lambda$ for the objective function IL in the regularized loss function, this embodiment can control the strength of the objective function IL during training, so that IL is minimized during training.
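As a minimal numerical sketch of the role of the hyperparameter, assume the regularized loss has the additive form Standard Loss + λ · IL described in this paragraph. The function name and the sample values below are hypothetical.

```python
def regularized_loss(standard_loss: float, il: float, lam: float) -> float:
    """Assumed additive form: the objective function IL is scaled by the hyperparameter lam."""
    return standard_loss + lam * il


# With the standard loss and IL held fixed, a larger lam gives IL more influence.
for lam in (0.0, 0.1, 1.0):
    print(lam, regularized_loss(standard_loss=2.0, il=4.0, lam=lam))
```

With λ = 0 the regularized loss reduces to the standard loss, which is one way to check an implementation before enabling the IL term.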

Optionally, the objective function IL combines the perplexity PS and the throughput T using a first weight coefficient $W_p$ and a second weight coefficient $W_t$.

Specifically, PS (perplexity score) may be used to characterize the inference accuracy of the first model to be trained at the edge end. In this embodiment, the perplexity PS is defined as:

$$PS = \exp\left(-\frac{1}{n} \sum_{k=1}^{n} \log P\left(w_k \mid w_{<k}, \theta\right)\right)$$

where the test sample at the edge end is also a natural language sequence, $w_k$ represents the $k$-th token in the natural language sequence input to the first model to be trained during inference, $w_{<k}$ represents all tokens before the $k$-th token, $n$ represents the length of that natural language sequence, i.e. the number of tokens, and $P(w_k \mid w_{<k}, \theta)$ represents the conditional probability of the $k$-th token $w_k$ given $w_{<k}$ and the network parameters $\theta$ of the first model to be trained.
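Perplexity in its conventional form (the exponential of the average negative log-probability per token) can be computed as in the sketch below. It assumes natural logarithms and that the per-token log-probabilities have already been obtained from the first model to be trained; the function name is hypothetical.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the mean negative natural-log probability assigned to each token."""
    if not token_log_probs:
        raise ValueError("at least one token is required")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# A model that assigns probability 0.25 to every token has perplexity of about 4.
print(perplexity([math.log(0.25)] * 4))
```

A lower PS means the model assigns higher probability to the observed tokens, which is why PS serves as the accuracy indicator here.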

T (throughput) characterizes the inference speed of the first model to be trained at the edge end and is defined as:

$$T = \frac{N_{out}}{t_{total}}$$

where $N_{out}$ represents the total number of output tokens and $t_{total}$ represents the total time from inputting the test sample into the first model to be trained until all outputs are obtained.

In addition, the first weight coefficient $W_p$ and the second weight coefficient $W_t$ may be determined based on the user requirements, the characteristics of the edge computing platform, and the like, and are used to trade off the perplexity PS, which characterizes accuracy, against the throughput T, which characterizes speed.
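Throughput as defined here (output tokens divided by total elapsed time) might be measured as in the following sketch. The `generate` callable stands in for inference with the first model to be trained and is hypothetical; the injectable `clock` parameter is an assumption added so that the measurement can be tested deterministically.

```python
import time
from typing import Callable, List


def measure_throughput(
    generate: Callable[[str], List[str]],        # hypothetical: returns the output tokens
    prompt: str,
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Output tokens per second, timed from input of the sample to the last output."""
    start = clock()
    output_tokens = generate(prompt)
    elapsed = clock() - start
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return len(output_tokens) / elapsed
```

On a real edge device the default `time.perf_counter` clock would be used, and the measurement would normally be averaged over several test samples.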

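Further, the standard loss function is:

$$\text{Standard Loss} = -\sum_{i=1}^{N} \sum_{j=1}^{|y_i|} \log P\left(y_{i,j} \mid x_i, y_{i,<j}, \theta\right)$$

where $x_i$ is the $i$-th of the $N$ natural language sequences input to the second model to be trained, $y_i$ is the corresponding desired output sequence with tokens $y_{i,j}$, and $\theta$ represents the network parameters of the second model to be trained.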
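A negative log-likelihood of the kind this embodiment optionally uses as the standard loss can be sketched as below. The sketch assumes the loss is summed over all sequences and target tokens without averaging; the function name and the input layout (one list of target-token probabilities per sequence) are hypothetical.

```python
import math
from typing import Sequence


def standard_loss(batch_target_probs: Sequence[Sequence[float]]) -> float:
    """Negative log-likelihood: -sum of log p over every target token of every sequence.

    batch_target_probs[i][j] is the probability the model assigns to the j-th
    token of the i-th desired output sequence, given the input sequence and
    the preceding target tokens.
    """
    return -sum(math.log(p) for seq in batch_target_probs for p in seq)


# Two sequences: the first has two target tokens, the second has one.
print(standard_loss([[0.5, 0.5], [0.25]]))
```

The loss is zero only when every target token receives probability 1, and it grows as the model's predictions diverge from the desired outputs.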

Fig. 2 is a flowchart of a cloud integrated embedded large language model training method according to an embodiment of the present invention. In this embodiment, the related information includes at least an initial value of the hyperparameter $\lambda$, and a first weight coefficient $W_p$ and a second weight coefficient $W_t$ determined in advance according to the user requirements and the characteristics of the edge end's own computing platform.

In step S12, the step of, when the perplexity PS and the throughput T do not meet the user requirements, sending related information to at least one server in communication connection with the edge end, so that the server trains the second model to be trained using the training samples and the regularized loss function and issues the network parameters of the trained second model to be trained to the edge end, includes:

when the perplexity PS and the throughput T do not meet the user requirements, sending the initial value of the hyperparameter $\lambda$, the first weight coefficient $W_p$ and the second weight coefficient $W_t$ to at least one server in communication connection with the edge end, so that the server substitutes the first weight coefficient $W_p$, the second weight coefficient $W_t$ and the initial value of the hyperparameter $\lambda$ into the regularized loss function, trains the second model to be trained using the training samples and the regularized loss function, and then issues the network parameters of the trained second model to be trained to the edge end.

Specifically, with reference to Figs. 1-2, the edge end first obtains the user requirements, then inputs the test sample into the first model to be trained, obtains the perplexity PS and the throughput T of the first model to be trained through inference, and compares them with the user requirements. If the requirements are met, the process ends, which indicates that the first model to be trained is already a language question-answering model that meets the user requirements. Otherwise, the edge end obtains related information such as the first weight coefficient $W_p$, the second weight coefficient $W_t$ and the initial value of the hyperparameter $\lambda$, and sends it to the server in the form of a message. The server uses the related information to construct the regularized loss function, trains the second model to be trained using the training samples and the regularized loss function, and adjusts the hyperparameter $\lambda$ during training. After convergence, the server issues the network parameters of the trained second model to be trained to the edge end; the edge end substitutes the received network parameters into the first model to be trained and performs inference again. This repeats until the inferred perplexity PS and throughput T meet the user requirements, at which point the trained language question-answering model is obtained.

Optionally, the server trains the second model to be trained according to the following steps:

inputting a preset number of training samples into the second model to be trained;

calculating a loss value of the current second model to be trained based on the regularized loss function;

judging whether the loss value converges; if not, adjusting the hyperparameter $\lambda$ in the regularized loss function and returning to the step of inputting the preset number of training samples into the second model to be trained; otherwise, ending the training and issuing the network parameters of the trained second model to be trained to the edge end.

In this embodiment, the hyperparameter $\lambda$ is an empirical parameter that controls the influence of the objective function IL on the regularized loss function Regularized Loss. The larger $\lambda$ is, the more weight the second model to be trained gives to the objective function IL during training; however, with too large a $\lambda$ the second model to be trained may fail to converge or may overfit. Conversely, too small a $\lambda$ weakens the consideration given to the objective function IL. A suitable $\lambda$ can therefore be obtained through repeated experiments.

Optionally, the edge end sends the initial value of the hyperparameter $\lambda$, the first weight coefficient $W_p$ and the second weight coefficient $W_t$, in the form of a message, to at least one server in communication connection with the edge end. The message contains: a message ID, a server name, an edge end ID, an edge end name, a hash value of the first model to be trained, the initial value of the hyperparameter $\lambda$, the first weight coefficient $W_p$, the second weight coefficient $W_t$, and a timestamp. The message content is shown in Table 1:

TABLE 1

In table 1, "int", "string", "long" and "float" are data types, and represent integer, string, long integer and floating point types, respectively.
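A message carrying the listed fields could be modeled as in the sketch below. The field names, the mapping of each field to a data type, and the use of JSON as the wire encoding are all assumptions made for illustration; the embodiment only lists the fields and the data types involved.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class EdgeToServerMessage:
    # Field names and per-field types are hypothetical.
    message_id: int
    server_name: str
    edge_id: int
    edge_name: str
    model_hash: str      # hash value of the first model to be trained
    lambda_init: float   # initial value of the hyperparameter
    w_p: float           # first weight coefficient
    w_t: float           # second weight coefficient
    timestamp: int       # a long integer in the message; Python ints are unbounded

    def to_json(self) -> str:
        """Serialize the message; JSON is an assumed encoding."""
        return json.dumps(asdict(self), ensure_ascii=False)

    @classmethod
    def from_json(cls, payload: str) -> "EdgeToServerMessage":
        """Rebuild the message from its JSON form."""
        return cls(**json.loads(payload))
```

Including the model hash lets the server confirm that the parameters it later issues are intended for the same first model to be trained that produced the request.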

Fig. 3 is a flowchart of the cloud integrated embedded large language model training method applied to a server according to an embodiment of the present invention. As shown in Fig. 3, an embodiment of the present invention provides a cloud integrated embedded large language model training method, which is applied to a server and includes:

S31, receiving related information sent by an edge end, and establishing a regularized loss function according to the related information;

S32, after a preset number of training samples are input into a second model to be trained, calculating a loss value of the regularized loss function;

S33, judging whether the loss value converges;

S34, if not, adjusting the hyperparameter $\lambda$ in the regularized loss function and returning to the step of inputting the preset number of training samples into the second model to be trained;

S35, if so, ending the training and issuing the network parameters of the trained second model to be trained to the edge end, so that the edge end substitutes these network parameters into the first model to be trained and obtains the perplexity PS and the throughput T of the first model to be trained through inference, and, when the inferred perplexity PS and throughput T meet the user requirements, takes the first model to be trained that meets the user requirements as the trained language question-answering model.
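The server-side loop of steps S32 to S35 can be sketched as follows. The convergence test (change in loss below a tolerance), the rule for adjusting the hyperparameter, and the callables `train_step` and `adjust_lam` are hypothetical; the embodiment does not specify them.

```python
from typing import Callable, Tuple


def server_train(
    train_step: Callable[[float], float],   # hypothetical: trains on a preset batch, returns the loss
    initial_lam: float,
    adjust_lam: Callable[[float], float],   # hypothetical rule for adjusting the hyperparameter
    tol: float = 1e-3,                      # assumed convergence tolerance
    max_iters: int = 100,                   # assumed safety cap
) -> Tuple[float, float]:
    lam = initial_lam
    prev_loss = None
    for _ in range(max_iters):
        # S32: feed the preset number of training samples and compute the regularized loss.
        loss = train_step(lam)
        # S33: judge convergence (assumed criterion: loss changed by less than tol).
        if prev_loss is not None and abs(prev_loss - loss) < tol:
            # S35: training ends; the caller would now issue the parameters to the edge end.
            return lam, loss
        prev_loss = loss
        # S34: not converged, so adjust the hyperparameter and repeat.
        lam = adjust_lam(lam)
    raise RuntimeError("loss did not converge within max_iters")
```

The function returns the final hyperparameter and loss; in a full system `train_step` would update the second model to be trained in place, and its network parameters would be sent once the loop returns.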

Specifically, the regularized loss function Regularized Loss is:

$$\text{Regularized Loss} = \text{Standard Loss} + \lambda \cdot IL$$

where Standard Loss represents the standard loss function, IL represents the objective function, and $\lambda$ represents the hyperparameter.

The standard loss function Standard Loss is:

$$\text{Standard Loss} = -\sum_{i=1}^{N} \sum_{j=1}^{|y_i|} \log P\left(y_{i,j} \mid x_i, y_{i,<j}, \theta\right)$$

where $N$ represents the number of natural language sequences input to the second model to be trained, $x_{i,j}$ represents the $j$-th token of the $i$-th natural language sequence $x_i$ input to the second model to be trained, $y_i$ represents the desired output sequence corresponding to $x_i$, $y_{i,j}$ represents the $j$-th token of $y_i$, $|y_i|$ represents the number of tokens in $y_i$, $y_{i,<j}$ represents the first $j-1$ elements of $y_i$, $\theta$ represents the network parameters of the second model to be trained, and $P(y_{i,j} \mid x_i, y_{i,<j}, \theta)$ represents the conditional probability of $y_{i,j}$ given $x_i$, $y_{i,<j}$ and $\theta$.

In this embodiment, the server may also send the network parameters of the trained second model to be trained to the edge in the form of a message.

The related information sent by the edge end may include: an initial value of the hyperparameter $\lambda$, and a first weight coefficient $W_p$ and a second weight coefficient $W_t$ determined in advance according to the user requirements and the characteristics of the edge end's own computing platform. $W_p$ and $W_t$ are used to trade off the perplexity PS, which characterizes accuracy, against the throughput T, which characterizes speed; the initial value of the hyperparameter $\lambda$ is a random number.

Fig. 4-5 are schematic diagrams of two application scenarios of the cloud integrated embedded large language model training method provided by the embodiment of the invention. For example, as shown in fig. 4, for a smaller scale natural language model, if it is desired to run on multiple different brands of handsets (edge terminals), then an "economy mode" may be employed, i.e., the edge terminals share the same server during training. In another application scenario with high performance requirements for the models, as shown in fig. 5, an edge end may perform collaborative training with multiple servers, and finally select one with optimal performance from the multiple obtained models.
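For the multi-server scenario, choosing among the models returned by several servers might look like the sketch below. The scoring rule is entirely hypothetical, since the embodiment does not define how optimal performance is judged; `evaluate` stands in for edge-side inference returning (PS, T).

```python
from typing import Callable, Dict, Tuple


def select_best_model(
    candidates: Dict[str, dict],                       # server name -> network parameters
    evaluate: Callable[[dict], Tuple[float, float]],   # hypothetical: returns (PS, T) at the edge end
    score: Callable[[float, float], float],            # hypothetical: lower score is better
) -> str:
    """Evaluate every candidate at the edge end and return the name of the best one."""
    scored = {}
    for name, params in candidates.items():
        ps, t = evaluate(params)
        scored[name] = score(ps, t)
    return min(scored, key=scored.get)
```

For instance, a score such as `lambda ps, t: ps - 0.1 * t` would favor low perplexity and high throughput; the weighting of 0.1 is illustrative only.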

Fig. 6 is a flowchart of a language question-answering method according to an embodiment of the present invention. As shown in fig. 6, the embodiment of the present invention further provides a language question-answering method, including:

S61, acquiring a question input by a user;

S62, inputting the question into a language question-answering model trained by the cloud integrated embedded large language model training method to obtain an answer to the question output in natural language form.

According to the above embodiments, the beneficial effects of the invention are as follows:

In the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. In the description of the present invention, the meaning of "a plurality" is two or more, unless explicitly defined otherwise.

The description of the terms "one embodiment," "some embodiments," "example," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Further, one skilled in the art can join and combine the different embodiments or examples described in this specification.

The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.

Claims

1. The cloud integrated embedded large language model training method is characterized by being applied to an edge end and comprising the following steps of:

2. The cloud integrated embedded large language model training method of claim 1, wherein the regularized loss function is:

$$\text{Regularized Loss} = \text{Standard Loss} + \lambda \cdot IL$$

3. The cloud integrated embedded large language model training method of claim 2, wherein the objective function IL combines the perplexity PS and the throughput T using a first weight coefficient $W_p$ and a second weight coefficient $W_t$.

4. The cloud integrated embedded large language model training method of claim 3, wherein the standard loss function is:

$$\text{Standard Loss} = -\sum_{i=1}^{N} \sum_{j=1}^{|y_i|} \log P\left(y_{i,j} \mid x_i, y_{i,<j}, \theta\right)$$

where $N$ represents the number of natural language sequences input to the second model to be trained, $x_{i,j}$ represents the $j$-th token of the $i$-th natural language sequence $x_i$ input to the second model to be trained, $y_i$ represents the desired output sequence corresponding to $x_i$, $y_{i,j}$ represents the $j$-th token of $y_i$, $|y_i|$ represents the number of tokens in $y_i$, $y_{i,<j}$ represents the first $j-1$ elements of $y_i$, $\theta$ represents the network parameters of the second model to be trained, and $P(y_{i,j} \mid x_i, y_{i,<j}, \theta)$ represents the conditional probability of $y_{i,j}$ given $x_i$, $y_{i,<j}$ and $\theta$.

5. The cloud integrated embedded large language model training method of claim 1, wherein the related information includes at least an initial value of the hyperparameter $\lambda$, and a first weight coefficient $W_p$ and a second weight coefficient $W_t$ determined in advance according to the user requirements and the characteristics of the edge end's own computing platform;

when the perplexity PS and the throughput T do not meet the user requirements, the initial value of the hyperparameter $\lambda$, the first weight coefficient $W_p$ and the second weight coefficient $W_t$ are sent to at least one server in communication connection with the edge end, so that the server substitutes the first weight coefficient $W_p$, the second weight coefficient $W_t$ and the initial value of the hyperparameter $\lambda$ into the regularized loss function, trains the second model to be trained using the training samples and the regularized loss function, and then issues the network parameters of the trained second model to be trained to the edge end.

6. The cloud integrated embedded large language model training method according to claim 5, wherein the server trains the second model to be trained according to the following steps:

inputting a preset number of training samples into the second model to be trained;

7. The cloud integrated embedded large language model training method according to claim 5, wherein the edge end sends the initial value of the hyperparameter $\lambda$, the first weight coefficient $W_p$ and the second weight coefficient $W_t$, in the form of a message, to at least one server in communication connection with the edge end.

8. The cloud integrated embedded large language model training method is characterized by being applied to a server and comprising the following steps of:

9. The cloud integrated training method of the embedded large language model according to claim 8, wherein the server issues parameters of the trained second model to be trained to the edge in the form of a message.

10. A language question-answering method, comprising:

acquiring a question input by a user;

inputting the question into a language question-answering model trained by the cloud integrated embedded large language model training method according to any one of claims 1-5 or 8-9 to obtain an answer to the question output in natural language form.