CN117494717B

CN117494717B - Context construction method and system based on AI large language model

Info

Publication number: CN117494717B
Application number: CN202311818165.5A
Authority: CN
Inventors: 屠静; 赵策; 王亚; 苏岳; 万晶晶; 李伟伟; 颉彬; 周勤民; 张玥; 雷媛媛; 孙岩; 潘亮亮; 刘岩
Original assignee: Zhuoshi Future Beijing technology Co ltd
Current assignee: Zhuoshi Future Beijing technology Co ltd
Priority date: 2023-12-27
Filing date: 2023-12-27
Publication date: 2024-03-19
Anticipated expiration: 2043-12-27
Also published as: CN117494717A

Abstract

The invention relates to the technical field of processing analysis of natural language, in particular to a context construction method and system based on an AI large language model. According to the method, firstly, the task correlation of an upstream task and a downstream task is obtained according to the theme coincidence degree of corresponding texts between corresponding training sets of the upstream task and the downstream task, and then, in the pre-training process of the upstream task, the reward and punishment reference weight lost in the pre-training process is obtained according to the prediction accuracy of the upstream task and the task correlation; and then, adjusting the large language model according to the reward and punishment reference weight, and adjusting the learning target of the model by iterative adjustment and pre-training continuously, so that the adjusted pre-trained model can be subjected to fine adjustment in the task of the special field and can be stored and applied. The invention combines the correlation of the upstream task and the downstream task to introduce a punishment mechanism to the pre-training of the model, strengthens the correlation of the model and the downstream task, and improves the context construction accuracy of the large language model.

Description

Context construction method and system based on AI large language model

Technical Field

The invention relates to the technical field of processing analysis of natural language, in particular to a context construction method and system based on an AI large language model.

Background

The large language model has strong generating capability and context understanding capability, is more suitable for processing text tasks in the real world, has more natural dialogue, is more in line with the human language expression mode, can improve the interaction experience between a system and a user, and provides a powerful and flexible tool for processing different types of text tasks.

However, if the model pre-trained on the general text data is directly applied to the specific field task, the model obtains a more abstract general understanding in the pre-training process of the upstream task, but a more fine granularity or a more special understanding may be required for the downstream specific field task, so that the model cannot capture the context information or semantic association required by the specific field task, and the context construction effect and efficiency of the large language model are affected, thereby affecting the execution effect and efficiency of the downstream specific field task.

Disclosure of Invention

In order to solve the technical problem that the prior large language model pre-training cannot be generalized to a special field so as to cause poor context construction effect of the model, the invention aims to provide a context construction method and system based on an AI large language model, and the adopted technical scheme is as follows:

the invention provides a context construction method based on an AI large language model, which comprises the following steps:

respectively acquiring training sets of an upstream task and a downstream task of a large language model, wherein the training sets comprise sentence pairs formed by all segmentation texts for training corresponding tasks;

acquiring task correlation of an upstream task and a downstream task according to the topic coincidence degree of texts corresponding to the training sets between the upstream task and the downstream task; in the pre-training process of the upstream task, obtaining the punishment and punishment reference weight lost in the pre-training process according to the prediction accuracy of the upstream task and the task correlation;

adjusting a large language model according to the reward and punishment reference weight; pre-training the upstream task by using the adjusted large language model, acquiring the reward and punishment reference weight in the corresponding pre-training process and adjusting the large language model, and continuously and iteratively acquiring the reward and punishment reference weight and adjusting the large language model until the preset cut-off condition is met; and taking the large language model meeting the preset cut-off condition as a pre-trained final large language model.

Further, the task correlation acquisition method includes:

respectively acquiring topic sets of texts corresponding to the training sets and corresponding topic distribution probabilities of each topic, wherein the topic sets correspond to upstream tasks and downstream tasks; acquiring task correlation of an upstream task and a downstream task according to a task correlation calculation formula; the calculation formula of the task correlation is as follows:

wherein, the->Task dependencies between upstream and downstream tasks; />For the number of topics in the union of the upstream task corresponding topic collection and the downstream task corresponding topic collection,the topic number in the intersection of the topic set corresponding to the upstream task and the topic set corresponding to the downstream task is set; />For the upstream task corresponding topic set and the downstream task corresponding topic set +.>Difference of topic distribution probability between the same topics, < ->Is a preset positive constant.

Further, the method for acquiring the reward and punishment reference weight comprises the following steps:

in each pre-training process, respectively acquiring the correct prediction times and the incorrect prediction times, and acquiring the attention weight average value of all the correctly predicted segmented text or sentence pairs and the attention weight average value of all the incorrectly predicted segmented text or sentence pairs; acquiring rewarding parameters according to the correct number of predictions and the corresponding attention weight average value and combining the task correlation; according to the number of prediction errors and the corresponding attention weight average value, combining the task correlation to obtain penalty parameters;

obtaining the reward and punishment reference weight according to a calculation formula of the reward and punishment reference weight; the calculation formula of the prize and punishment reference weight is as follows:

the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Is->The reward and punishment reference weight corresponding to the pre-training process of the secondary upstream task; />Is->A reward parameter of the pre-training process of the secondary upstream task; />Is->Penalty parameters for the pre-training process of the secondary upstream task.

Further, the method for acquiring the reward parameter comprises the following steps:

obtaining the rewarding parameters according to a calculation formula of the rewarding parameters; the calculation formula of the reward parameter is:

wherein, the->Is->A reward parameter of the pre-training process of the secondary upstream task; />To predict the correct symbol; />Is->Predicting correct times in the pre-training process of the secondary upstream task; />Task dependencies between upstream and downstream tasks; />Is->Predicting the corresponding attention weight average value of the correct segmentation text or sentence pairs in the pre-training process of the secondary upstream task; />Is a natural constant.

Further, the method for acquiring the penalty parameter comprises the following steps:

acquiring punishment parameters according to a calculation formula of the punishment parameters; the calculation formula of the punishment parameters is as follows:

wherein, the->Is->Penalty parameters for the pre-training process of the secondary upstream task; />A symbol that is a prediction error; />Is->Predicting the number of errors in the pre-training process of the secondary upstream task;task dependencies between upstream and downstream tasks; />Is->The corresponding attention weight average value of the wrong segmentation text or sentence pair is predicted in the pre-training process of the secondary upstream task; />Is a natural constant.

Further, the method for adjusting the large language model comprises the following steps:

in the first pre-training process of the upstream task, acquiring initial training loss of the upstream task in the first pre-training process; multiplying the reward and punishment reference weight by the initial training loss to obtain the reward and punishment training loss, training the large language model based on the reward and punishment training loss, and taking the large language model obtained after training as an adjusted large language model;

in each non-first pre-training process of an upstream task, the reward and punishment training loss acquired in the previous pre-training process is the initial training loss of the pre-training process, the reward and punishment reference weight of the pre-training process is multiplied by the initial training loss of the pre-training process, the reward and punishment training loss of the pre-training process is obtained, the large language model is trained based on the reward and punishment training loss of the pre-training process, and the large language model obtained after training is used as the adjusted large language model.

Further, the acquisition method of the topic collection and the corresponding topic distribution probability is an LDA algorithm.

Further, the preset cut-off condition is that the reward and punishment training loss is smaller than or equal to a preset constant value.

Further, the sentence pair obtaining method includes:

cutting text data for corresponding training tasks through a BPE algorithm to obtain cut text; and taking any two sentences subjected to text segmentation as a sentence pair.

The invention also provides a context construction system based on the AI large language model, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes the steps of any context construction method based on the AI large language model when executing the computer program.

The invention has the following beneficial effects:

according to the method, firstly, the task correlation of an upstream task and a downstream task is obtained according to the subject coincidence degree of corresponding texts between corresponding training sets of the upstream task and the downstream task, a reward punishment mechanism is introduced into the task correlation so as to strengthen the connection between model training and the downstream task, and then, in the pre-training process of the upstream task, the reward punishment reference weight lost in the pre-training process is obtained according to the prediction accuracy of the upstream task and the task correlation, and the reward punishment reference weight enables the model to pay more attention to the execution of the behavior of obtaining rewards, and meanwhile, the behavior of causing punishment is avoided as much as possible, so that the convergence rate of the model is adjusted; and then, adjusting the large language model according to the reward and punishment reference weight, and adjusting the learning target of the model by iterative adjustment and pre-training continuously, so that the adjusted pre-trained model can be subjected to fine adjustment in the task of the special field and can be stored and applied. The invention combines the correlation of the upstream task and the downstream task to introduce a punishment mechanism to the pre-training of the model, strengthens the correlation of the model and the downstream task, and improves the context construction accuracy of the large language model.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a method for constructing a context based on an AI large language model according to an embodiment of the present invention.

Detailed Description

In order to further describe the technical means and effects adopted by the present invention to achieve the preset purposes, the following detailed description is given below of a context construction method based on an AI large language model according to the present invention, and the detailed description of the specific embodiments, the structure, the features and the effects thereof is given below in conjunction with the accompanying drawings and the preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of a context construction method based on an AI large language model provided by the invention with reference to the accompanying drawings.

Referring to fig. 1, a method flowchart of a context construction method based on an AI large language model according to an embodiment of the invention is shown, including the following steps:

step S1, training sets of an upstream task and a downstream task of a large language model are respectively obtained, wherein the training sets comprise sentence pairs formed by all segmentation texts for training corresponding tasks.

In order to improve the construction effect and efficiency of the large language model, the invention firstly obtains the task correlation of the upstream and downstream tasks according to the field theme coincidence condition of the training text corresponding to the upstream and downstream tasks, and further introduces a reward and punishment mechanism in combination with the accuracy of prediction in the pre-training process of the upstream tasks, thereby intervening the descending speed of the loss function in the convergence process, adjusting the learning target of the model, leading the model to pay more attention to the execution to obtain the behavior of rewarding, avoiding the execution to the greatest extent, thereby enhancing the relevance of the model and the downstream tasks, and improving the context construction accuracy and efficiency of the large language model.

In one embodiment of the invention, firstly, text data for upstream task pre-training and downstream task fine tuning of a large language model are obtained from a large-scale text corpus, wherein a training set for the upstream pre-training task is a universal text, and multi-domain model data is contained as far as possible so that the model has certain understanding and generalization capability; the training set for the downstream fine tuning training task is text data with labels in specific fields, so that the model is adapted to specific requirements of specific fields; in order to facilitate the subsequent training of the text data, the acquired text data needs to be correspondingly preprocessed so as to improve the data quality; in the embodiment of the invention, firstly, data cleaning is carried out on the collected text data to remove repeated invalid data, and then, a byte pair encoding (Byte Pair Encoding, BPE) text word segmentation method is used for segmenting the text data so as to facilitate subsequent mask prediction; and then, any two sentences subjected to segmentation are built into a sentence pair, wherein the sentences in the sentence pair can be two adjacent sentences with related semantics or two irrelevant sentences randomly selected from the text data, and all the sentence pairs built after the text data segmentation are used as training sets of corresponding tasks. The collection of training set text, data cleaning, text segmentation and sentence pair construction are model training preprocessing techniques well known to those skilled in the art, and are not described in detail herein. In other embodiments of the invention, the practitioner may employ other means to collect training text and perform corresponding preprocessing, depending on the particular implementation.

It should be noted that, the large language model according to the embodiment of the present invention is a deep bi-directional training model (Bidirectional Encoder Representations from Transformers, BERT) based on semantic understanding, and because the training of the large language model is a multi-task learning process, in order to facilitate the context construction of the large language model, in the embodiment of the present invention, the upstream task and the downstream task are respectively analyzed as an overall task, so as to integrally adjust the model; in other embodiments of the present invention, the implementer may also perform a one-to-one correlation analysis on each of the upstream tasks and the downstream tasks, and perform a corresponding adjustment of the large language model according to the correlation of each pair of the upstream tasks and the pre-training process of the corresponding upstream task.

Step S2, acquiring task correlation of an upstream task and a downstream task according to the topic coincidence degree of corresponding texts between training sets corresponding to the upstream task and the downstream task; and in the pre-training process of the upstream task, acquiring the punishment and punishment reference weight lost in the pre-training process according to the prediction accuracy and the task correlation of the upstream task.

The large language model generally has two stages of pre-training and fine-tuning, and in the pre-training stage of an upstream task, the model uses large-scale unlabeled text to perform self-supervision learning, so that the complexity and the context relation of natural language can be better captured, and the model has stronger generalization and adaptability; and in the fine tuning stage of the downstream task, the model uses the labeling text of the supervised task to fine tune the pre-trained large language model, and trains the model on the task in the specific field so as to improve the performance of the model on the task in the field.

However, the relevance of the upstream task and the downstream task is not directly considered by the large language model, if the difference of the fields where the training set texts of the upstream task and the downstream task are located is large or the understanding of the upstream task to the downstream specific field is relatively abstract and general, the model may not perform well in the downstream task, the professional terms of the specific field cannot be understood, and further, the model is wrongly inferred or deviated in the downstream task, so that the performance of the model is affected. Therefore, the embodiment of the invention firstly acquires the task correlation of the upstream task and the downstream task according to the theme coincidence degree of the corresponding texts between the training sets corresponding to the upstream task and the downstream task, and further introduces a reward and punishment mechanism to the pre-training process of the upstream task according to the task correlation so as to adjust the learning target and strengthen the correlation of the model and the downstream task.

Preferably, in an embodiment of the present invention, the task relevance obtaining method includes obtaining a topic set of text corresponding to a training set corresponding to an upstream task and a downstream task, and a topic distribution probability corresponding to each topic, respectively; and acquiring the task correlation of the upstream task and the downstream task according to a task correlation calculation formula.

In the embodiment of the invention, a latent dirichlet allocation model (Linear Discriminant Analysis, LDA) is specifically adopted to obtain corresponding topic distribution probability and topic set by respectively carrying out modeling comparison on text data in training sets of upstream and downstream tasks. The application of LDA is well known to those skilled in the art and will not be described in detail herein.

The calculation formula of the task correlation is as follows:

wherein,task dependencies between upstream and downstream tasks; />For the union of the topic sets corresponding to the upstream tasks and the topic sets corresponding to the downstream tasksConcentrated topic quantity, < >>The topic number in the intersection of the topic set corresponding to the upstream task and the topic set corresponding to the downstream task is set; />For the upstream task corresponding topic set and the downstream task corresponding topic set +.>Difference of topic distribution probability between the same topics, < ->For presetting a positive constant, there is a possibility that the corresponding domain topics in the training set of the upstream and downstream tasks are completely different, so +.>Taking 1 to ensure that the numerator and denominator are not 0.

In the calculation formula of the task dependency,the bigger and the more>Smaller, all illustrate that the topics in the upstream task generic text approach to the topics in the downstream task specific annotation text, which may be more related in terms of topic; meanwhile, the smaller the topic distribution probability difference of the same topic is, the greater the probability that topics expressed by text contents between the topic distribution probability difference and the topic distribution probability difference is consistent is, and the greater the correlation between the topic distribution probability difference and the topic distribution probability difference is on the topic; and combining the two through multiplication, wherein if the corresponding product is smaller, the task correlation is larger as a denominator, so that the correlation of the special labeling of the upstream task and the downstream task is larger.

In the pre-training stage of the upstream task, mask prediction and next sentence continuous prediction are required to be carried out on data in the training set so as to judge the learning effect of the pre-training stage, and model parameters are adjusted according to the learning effect so as to minimize loss and achieve the expected training effect; if the model can pay more attention to the context related to the downstream task in the non-supervision learning process of the pre-training stage, and make corresponding rewards and punishments for each training, the model is guided to be more self-adaptive to the requirements of the downstream task in the training process, so that the model has good performance in the downstream task. Therefore, in the pre-training process of the upstream task, the embodiment of the invention acquires the punishment and punishment reference weight lost in the pre-training process according to the prediction accuracy and the task correlation of the upstream task.

Preferably, in one embodiment of the present invention, considering that there is a certain correlation between upstream and downstream tasks, for an upstream task with a larger correlation with a downstream task, penalty is increased if the model is wrong during training, so as to avoid the behavior of the model which is further executed and causes penalty; for an upstream task with smaller correlation with a downstream task, the more rewards are given to the model when a correct result appears, so as to guide the model to adapt to the requirement of the downstream task in the training process, and the model has better adaptability in the downstream task. Based on the method, the method for acquiring the reward and punishment reference weight comprises the steps of respectively acquiring the correct prediction times and the incorrect prediction times in each pre-training process, and acquiring the attention weight average value of all the correctly predicted segmented text or sentence pairs and the attention weight average value of all the incorrectly predicted segmented text or sentence pairs; acquiring rewarding parameters according to the correct number of predictions and the corresponding attention weight average value and combining task correlation; according to the number of prediction errors and the corresponding attention weight average value, acquiring punishment parameters by combining task correlation, and then acquiring punishment reference weights according to a calculation formula of the punishment reference weights.

It should be noted that, in the embodiment of the present invention, since the pre-training stage of the task upstream of the BERT large language model includes two prediction processes, which are respectively a mask language model (Masked Language Model, MLM) and a next sentence prediction (Next Sentence Prediction, NSP), the number of prediction correctness in each training process is the sum of the number of prediction correctness in the two prediction processes of MLM and NSP; correspondingly, the number of prediction errors is the sum of the number of prediction errors in the two prediction processes. Meanwhile, in each pre-training process, the attention weight of each correctly predicted segmentation text or sentence pair is obtained from the model. The MLM, NSP prediction process and the corresponding segmentation text or sentence acquisition of the corresponding attention weights are well known to those skilled in the art, and will not be described in detail herein.

The calculation formula of the prize and punishment reference weight is as follows:

wherein (1)>Is->The reward and punishment reference weight corresponding to the pre-training process of the secondary upstream task; />Is->A reward parameter of the pre-training process of the secondary upstream task; />Is->Penalty parameters for the pre-training process of the secondary upstream task.

In a calculation formula of the reward and punishment reference weight, the penalty parameters are combined with the reward parameters after carrying out negative correlation operation, so that the reward and punishment parameters in the corresponding training process are obtained; when the rewarding parameter is larger than the punishment parameterIn order to be positive, the more attention is paid to the execution of the behavior of obtaining rewards in the training process in the pre-training of the upstream task, namely, the more tendency is to learn text semantics related to the downstream task; conversely, when the reward parameter is less than the penalty parameter +.>When the model is negative, the text semantics irrelevant to the downstream task are learned in the pre-training of the upstream task by the explanation model; by punishing the prize and punish parameters->After adding 1, represent punishment reference weight, when +.>If the weight is positive, namely the weight of the reward and punishment reference weight is greater than 1, the loss in the pre-training process is increased so as to accelerate the loss convergence; when->When negative, namely the reward and punishment reference weight is smaller than 1, the loss in the pre-training process is reduced, so that convergence is slowed down.

Preferably, in one embodiment of the present invention, the bonus parameters are obtained according to a calculation formula of the bonus parameters; the calculation formula of the reward parameter is:

wherein,is->A reward parameter of the pre-training process of the secondary upstream task; />To predict the correct symbol; />Is->Predicting correct times in the pre-training process of the secondary upstream task; />Task dependencies between upstream and downstream tasks; />Is->Predicting the corresponding attention weight average value of the correct segmentation text or sentence pairs in the pre-training process of the secondary upstream task; />Is a natural constant.

In the calculation formula of the rewarding parameter, the attention weight average value reflects the firstThe pre-training process of the secondary upstream task predicts the importance of the correct segmentation text or sentence pair relative to all other segmentation text or sentence pairs, the task correlation between the upstream task and the downstream task is combined with the attention weight average corresponding to the correct segmentation text or sentence pair through multiplication, and the negative correlation is mapped into an exponential function for normalization, the smaller the task correlation and the relative importance are, the smaller the corresponding exponential weight value is, which means that in the upstream task with the smaller correlation with the downstream task, the model can accurately predict the continuity of the segmentation text or sentence with the lower importance mask in the self-learning process without supervision, and the model is more suitable for the downstream task and the corresponding rewards are carried out; meanwhile, if the number of predicted correct times is larger, the description model is more suitable for downstream tasks, and the corresponding index weight and the predicted correct number are combined through multiplication to represent the rewarding parameter.

Preferably, in one embodiment of the present invention, the penalty parameter is obtained according to a calculation formula of the penalty parameter; the calculation formula of the punishment parameters is as follows:

wherein,is->Penalty parameters for the pre-training process of the secondary upstream task; />A symbol that is a prediction error; />Is->Predicting the number of errors in the pre-training process of the secondary upstream task; />Task dependencies between upstream and downstream tasks; />First->The corresponding attention weight average value of the wrong segmentation text or sentence pair is predicted in the pre-training process of the secondary upstream task; />Is a natural constant.

In the calculation formula of penalty parameters, the attention weight average value reflects the firstThe pre-training process of the secondary upstream task predicts the importance of the wrong segmented text or sentence pair relative to all other segmented text or sentence pairs, merges the task correlation between the upstream task and the downstream task with the attention weight average corresponding to the wrong segmented text or sentence pair through multiplication, and maps to +.>The greater the task correlation and relative importance, the corresponding normalized weightsThe larger the weight value is, the more relevant the model can mispredict the continuity of texts or sentences segmented by masks with higher importance in the unsupervised self-learning process in the upstream task with higher correlation with the downstream task, and the more corresponding punishment should be carried out; meanwhile, if the number of times of prediction errors is larger, the description model is not suitable for downstream tasks, and the corresponding normalized weight and the number of times of prediction errors are combined through multiplication to represent punishment parameters.

Step S3, adjusting a large language model according to the reward and punishment reference weight; pre-training the upstream task by using the adjusted large language model, acquiring the reward and punishment reference weight in the corresponding pre-training process and adjusting the large language model, and continuously and iteratively acquiring the reward and punishment reference weight and adjusting the large language model until the preset cut-off condition is met; and carrying out a large language model which meets the preset cut-off condition for the first time as a final large language model for pre-training.

Because the reward and punishment reference weight reflects the learning tendency of the model in the upstream task pre-training process, the embodiment of the invention adjusts the large language model according to the reward and punishment reference weight, so that the model is more prone to execute the action of obtaining the reward, and meanwhile, the action of causing the punishment is avoided, so that the upstream task training of the model is more suitable for the requirement of the downstream task.

Preferably, in one embodiment of the present invention, the method for adjusting a large language model includes, during a first pre-training process of an upstream task, acquiring an initial training loss of the upstream task during the first pre-training process; multiplying the reward and punishment reference weight by the initial training loss to obtain the reward and punishment training loss, training the large language model based on the reward and punishment training loss, and taking the large language model obtained after training as the adjusted large language model; in each non-first pre-training process of an upstream task, the reward and punishment training loss acquired in the previous pre-training process is the initial training loss of the pre-training process, the reward and punishment reference weight of the pre-training process is multiplied by the initial training loss of the pre-training process, the reward and punishment training loss of the pre-training process is obtained, the large language model is trained based on the reward and punishment training loss of the pre-training process, and the large language model obtained after training is used as the adjusted large language model. And (3) using the reward and punishment training loss to carry out back propagation to update the parameters of the model, starting from the reward and punishment training loss, carrying out back propagation along the model network to calculate the gradient of each model parameter, and simultaneously selecting an adaptive learning rate optimization algorithm Adam to adaptively adjust the learning rate so as to update the model parameters and obtain the adjusted large language model.

It should be noted that, the obtaining of the initial training loss, the back propagation adjustment model parameters, and the Adam algorithm adaptive adjustment learning rate are all well known techniques for those skilled in the art, and are not described herein. In other embodiments of the present invention, model parameters may be adjusted by other means, such as bayesian optimization, genetic algorithm, and the like, and learning rate may be adjusted by other adaptive learning rate optimization algorithms.

Training the upstream task by using the adjusted large language model, acquiring the reward and punishment reference weight in the corresponding pre-training process, readjusting the large language model, and continuously and iteratively acquiring the reward and punishment reference weight and adjusting the large language model until the preset cut-off condition is met; and taking the large language model which meets the preset cut-off condition for the first time as a pre-trained final large language model. In one embodiment of the invention, the preset cutoff condition is that the penalty and punishment training loss is smaller than or equal to a preset constant value, wherein the preset constant value is set to be 0.1, when the penalty and punishment training loss is smaller than or equal to 0.1, the large language model is considered to have the expected fitting effect in the pre-training process of the upstream task, namely, the model has good performance in the downstream task corresponding to the specific field while the complexity and the context relation of the natural language can be well captured, and the updating training of the model is stopped at the moment and is used as the final large language model in the pre-training process.

And further fine-tuning the final large language model obtained in the pre-training stage of the upstream task by using the labeling text of the supervised task in the downstream task training set, and continuously fine-tuning model parameters on the basis of pre-training so as to enable the model to adapt to the requirements of the tasks in the specific field. The tuning of large language models is well known to those skilled in the art and will not be described in detail herein.

After the fine adjustment of the downstream task is completed, the model parameters and related configurations of the final large language model are saved for subsequent model application. It should be noted that, when the model is stored, it is required to ensure that the model parameters when the model reaches the final expected loss after pre-training and fine tuning and the segmentation method adopted during text training set preprocessing are set, for example, in an inference task, if the segmentation method adopted on text data during training and inference application is inconsistent, the input of the model may be inconsistent, and the final performance of the model is further affected.

In summary, according to the method, firstly, the task correlation of the upstream task and the downstream task is obtained according to the topic coincidence degree of the corresponding texts between the corresponding training sets of the upstream task and the downstream task, and then, in the pre-training process of the upstream task, the prize and punishment reference weight lost in the pre-training process is obtained according to the prediction accuracy of the upstream task and the task correlation; and then, adjusting the large language model according to the reward and punishment reference weight, and adjusting the learning target of the model by iterative adjustment and pre-training continuously, so as to fine-tune and save the adjusted pre-trained model in the special field task, thereby obtaining and saving the trained large language model. The invention combines the correlation of the upstream task and the downstream task to introduce a punishment mechanism to the pre-training of the model, strengthens the correlation of the model and the downstream task, and improves the context construction accuracy of the large language model.

The invention also provides a context construction system based on the AI large language model, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor realizes any one of the steps of the context construction method based on the AI large language model when executing the computer program.

It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.

Claims

1. A method of context construction based on AI large language models, the method comprising:

adjusting a large language model according to the reward and punishment reference weight; pre-training the upstream task by using the adjusted large language model, acquiring the reward and punishment reference weight in the corresponding pre-training process and adjusting the large language model, and continuously and iteratively acquiring the reward and punishment reference weight and adjusting the large language model until the preset cut-off condition is met; taking the large language model meeting the preset cut-off condition as a pre-trained final large language model;

the task correlation acquisition method comprises the following steps:

wherein, the->For a task phase between an upstream task and a downstream taskA relationship; />For the number of topics in the union of the topic sets corresponding to the upstream task and the topic sets corresponding to the downstream task,/>The topic number in the intersection of the topic set corresponding to the upstream task and the topic set corresponding to the downstream task is set; />For the upstream task corresponding topic set and the downstream task corresponding topic set +.>Difference of topic distribution probability between the same topics, < ->Is a preset positive constant;

the method for adjusting the large language model comprises the following steps:

in each non-first pre-training process of an upstream task, the reward and punishment training loss obtained in the previous pre-training process is the initial training loss of the pre-training process, the reward and punishment reference weight of the pre-training process is multiplied by the initial training loss of the pre-training process to obtain the reward and punishment training loss of the pre-training process, the large language model is trained based on the reward and punishment training loss of the pre-training process, and the large language model obtained after training is used as an adjusted large language model;

the preset cut-off condition is that the reward and punishment training loss is smaller than or equal to a preset constant value.

2. The context construction method based on AI big language model of claim 1, wherein the obtaining method of the reward and punishment reference weight comprises:

the method comprises the steps of carrying out a first treatment on the surface of the Wherein (1)>Is->The reward and punishment reference weight corresponding to the pre-training process of the secondary upstream task; />Is->A reward parameter of the pre-training process of the secondary upstream task; />Is->Pre-training of secondary upstream tasksPenalty parameters for a pass.

3. The context construction method based on AI large language model of claim 2, wherein the obtaining method of the reward parameter comprises:

4. The context construction method based on AI large language model of claim 2, wherein the penalty parameter obtaining method comprises:

wherein, the->Is->Penalty parameters for the pre-training process of the secondary upstream task; />A symbol that is a prediction error; />Is->Predicting the number of errors in the pre-training process of the secondary upstream task; />Task dependencies between upstream and downstream tasks; />Is->The corresponding attention weight average value of the wrong segmentation text or sentence pair is predicted in the pre-training process of the secondary upstream task; />Is a natural constant.

5. The context construction method based on the AI large language model of claim 1, wherein the topic collection and the obtaining method of the corresponding topic distribution probability are LDA algorithms.

6. The context construction method based on the AI large language model of claim 1, wherein the sentence pair obtaining method comprises:

7. An AI large language model-based context construction system comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the steps of an AI large language model-based context construction method as claimed in any one of claims 1-6 when the computer program is executed.