CN116610789A - Accurate low-cost large language model using method and system - Google Patents


Info

Publication number
CN116610789A
CN116610789A
Authority
CN
China
Prior art keywords
answer
language model
large language
question
dialog box
Prior art date
Legal status
Pending
Application number
CN202310836247.6A
Other languages
Chinese (zh)
Inventor
汤猛帆
周万江
张旭中
Current Assignee
Zhongke Jiji Huzhou Information Technology Co ltd
Original Assignee
Zhongke Jiji Huzhou Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Zhongke Jiji Huzhou Information Technology Co ltd filed Critical Zhongke Jiji Huzhou Information Technology Co ltd
Priority to CN202310836247.6A priority Critical patent/CN116610789A/en
Publication of CN116610789A publication Critical patent/CN116610789A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention belongs to the technical field of automatic text generation for AI response systems, and specifically relates to an accurate, low-cost method and system for using large language models. The method comprises the following steps: S1, creating a dialog box; S2, inputting the current question in the dialog box; S3, outputting an answer to the current question in the dialog box using the large language model with the lowest usage cost; S4, judging whether the current answer is qualified; if so, taking it as the final answer to the current question and returning to step S2; if not, executing step S5; S5, outputting an answer to the current question in the dialog box using the next large language model with a higher usage cost, and returning to step S4. In step S4, each large language model that has already output an answer to the current question is trained on the current question and its final answer, for use the next time a question is input. The invention can answer the questions posed by a user while balancing accuracy and usage cost.

Description

Accurate low-cost large language model using method and system
Technical Field
The invention belongs to the technical field of automatic text generation for AI response systems, and specifically relates to an accurate, low-cost method and system for using large language models.
Background
GPT (Generative Pre-trained Transformer) refers to a class of large language models developed by OpenAI on the basis of the Transformer architecture. Through extensive unsupervised pre-training, these models learn the structure, semantics, and contextual information of a language from large amounts of text data.
Large language models have strong language understanding and generation capabilities and can be applied to many natural language processing tasks, such as text generation, machine translation, dialogue systems, question answering, and summary generation. These models can not only generate coherent text but also reason and answer questions based on context.
The GPT model is an autoregressive generation model. Its working principle is to learn the probability distribution of text sequences during the pre-training phase and then, when generating text, to produce the next word or character from the preceding context. The model is highly flexible and creative and can generate natural language text consistent with the input context.
Currently, OpenAI has released multiple versions of large language models, such as GPT-2, GPT-3, and so on. These models have achieved significant results in the field of natural language processing and demonstrate powerful generation and understanding capabilities in various application scenarios. The development of large language models is of great significance to natural language processing and artificial intelligence, providing powerful tools and technical support for all kinds of text-related tasks.
In addition to GPT-2 and GPT-3, there are other large language models, which differ in cost, accuracy, and domain specialization. For example, the currently popular GPT-4 achieves an accuracy above 90% but costs about $30 per million prompt characters, whereas GPT-J costs only about $0.2 for the same volume of prompt characters, at an overall accuracy of only about 78%.
How to select an appropriate language model, or design a language model application scheme, for a specific problem is therefore worth studying in depth. Designing a language model selection mechanism that greatly reduces the usage cost of language models while guaranteeing answer accuracy, or that improves accuracy at equal cost, is the problem to be solved.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an accurate, low-cost method and system for using large language models, which can answer the questions posed by a user while balancing accuracy and usage cost.
The invention adopts the following technical scheme:
A first aspect of the embodiments of the invention provides an accurate, low-cost method for using large language models, comprising the steps of:
S1, creating a dialog box;
S2, inputting the current question in the dialog box;
S3, outputting an answer to the current question in the dialog box using the large language model with the lowest usage cost;
S4, judging whether the current answer is qualified; if so, taking it as the final answer to the current question and returning to step S2; if not, executing step S5;
S5, outputting an answer to the current question in the dialog box using the next large language model with a higher usage cost, and returning to step S4;
wherein in step S4, after the final answer to the current question is output, if there are large language models that have already output an answer to the current question, those models are trained on the current question and its final answer, for use the next time a question is input.
Preferably, step S2 further includes inputting a local database related to the question in the dialog box;
and between step S2 and step S3 the method further comprises the steps of:
A. obtaining an initial answer by matching the current question against the local database;
B. judging whether the initial answer is qualified; if so, taking it as the final answer to the current question and returning to step S2; if not, executing step S3.
Preferably, every large language model outputs the answer to the current question based on all the content in the dialog box.
As a preferred scheme, before outputting the answer to the current question based on all the content in the dialog box, the large language model compresses the content in the dialog box; during compression, questions for which no final answer has been output are not compressed, nor is a local database compressed during the dialog turn in which it is input.
Preferably, compressing the content in the dialog box comprises the steps of:
calculating compression rates for the content of different turns based on each turn's number and the total number of currently completed dialog turns;
and compressing the content of the different turns according to the calculated compression rates.
As a preferred scheme, the compression rate of the content of different turns is calculated as:
P_Kt = e^(λ(K+1−t))
where P_Kt denotes the compression rate applied to the content of the t-th dialog turn after the K-th dialog turn has been completed, t is the dialog turn to which the content belongs, K is the number of completed dialog turns, and λ is a preset compression-rate adjustment value and a negative number.
Preferably, in step S4, when there is a correlation between the next question and the previous question, the process returns to step S2, and when there is no correlation between the next question and the previous question, the process returns to step S1.
Preferably, step S3 is preceded by the step of selecting several preliminarily screened large language models corresponding to the field of the current question;
and steps S3 and S5 select the corresponding large language model from among these preliminarily screened models and output the answer to the current question in the dialog box.
As a preferred scheme, in steps S3 and S5, after an answer is output, the user may manually mark the accuracy of the output answer, or a background expert may mark its accuracy;
and in step S4, whether the current answer is qualified is judged automatically from the historical answer-accuracy marking data.
A second aspect of the embodiments of the invention provides an accurate, low-cost system for using large language models, comprising a dialog box creation module, a dialogue module, a large language model module, a training module, and a judging module; the dialogue module and the large language model module are each connected to the dialog box creation module, the training module is connected to the large language model module and the dialog box creation module, and the judging module is connected to the dialog box creation module and the training module;
the dialog box creation module is used to create a dialog box;
the dialogue module is used to input questions in the dialog box;
the large language model module is used to output answers to the questions in the dialog box using large language models in order of increasing usage cost, until an output answer is qualified;
the judging module is used to judge whether an answer output in the dialog box is qualified;
and the training module is used to train the large language models that have output answers to a question on the question and its final answer, for use the next time a question is input.
The beneficial effects of the invention are as follows:
in the invention, firstly, the large language model with the lowest cost is used for outputting answers to questions, if the answers are qualified, the large language model with higher cost is not used continuously, and if the answers are unqualified, the large language model with higher cost is used, so that the use cost of the large language model is reduced as much as possible.
In the invention, after the final answer of the current question is output, if the low-cost large language model which has already output the answer of the current question exists, the low-cost large language model which has already output the answer of the current question is trained based on the current question and the final answer of the current question, so that the trained low-cost large language model can directly output qualified answers of the questions when similar questions are input next time, and the high-cost large language model is not required to be used. It should be noted that, training is performed on all low-cost large language models which have already output the answers of the current questions, so that when similar questions are input next time, a trained low-cost large language model can directly output qualified answers of the questions. The use cost of the large language model is further reduced.
In the invention, before formally using the large language model to output the answer, the answer is matched based on the input local database, if the answer is qualified by matching, the large language model is not needed, and the use cost of the large language model is further reduced.
In the invention, all large language models output current answers to questions based on all contents in the dialog box, so that consistency and accuracy of the answers to questions are ensured, but the disadvantage of such processing is that the amount of the prompt characters is excessive, and in order to further reduce the use cost of the large language models, the step of compressing the contents in the dialog box is added before the large language models output the current answers to questions based on all contents in the dialog box.
Because of the difference in the specific attack field of the large language model, the step S3 in the invention further comprises the following steps: selecting a plurality of preliminary screening large language models corresponding to the current problem field; and step S3, selecting a corresponding large language model from the plurality of preliminary screening large language models in step S5, and outputting a current question answer in a dialog box. Minimizing the number of improper large language models that are tried before screening them for.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of an accurate low cost method of using a large language model in accordance with the present invention.
Fig. 2 is an overall flowchart of the compression process.
Fig. 3 is a schematic diagram of compression rate when a fourth dialog round is performed.
Fig. 4 is a schematic diagram of the compression rate when the sixth dialog round is performed.
FIG. 5 is a schematic diagram of an accurate low cost large language model usage system according to the present invention.
Detailed Description
The following specific examples illustrate the invention; those skilled in the art will readily appreciate its further advantages and capabilities from this disclosure. The invention may also be practiced or carried out in other, different embodiments, and the details of this description may be modified or varied in various respects without departing from the spirit and scope of the invention. It should be noted that, absent conflict, the following embodiments and the features within them may be combined with each other.
Embodiment one:
referring to fig. 1, the present embodiment provides an accurate low-cost large language model using method, which includes the steps of:
s1, creating a dialog box;
s2, inputting a current problem in a dialog box;
s3, outputting a current question answer in a dialog box by using a large language model with the lowest cost;
s4, judging whether the current question answer is qualified or not, if so, taking the current question answer as a final answer of the current question, returning to the step S2, and if not, executing the step S5;
s5, outputting a current question answer in a dialog box by using a next large language model with higher use cost, and returning to the step S4;
in step S4, after the final answer of the current question is output, if there is a large language model that has already output the answer of the current question, training the large language model that has already output the answer of the current question based on the current question and the final answer of the current question for the next time of inputting the question.
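By way of illustration only, the selection loop of steps S1 to S5 can be sketched as follows; the model names, costs, answer callables, and qualification callback are assumptions for the sketch, not elements fixed by the embodiment:

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Model:
    name: str
    cost: float                    # usage cost per million prompt characters
    answer: Callable[[str], str]   # placeholder for a real LLM API call

def answer_with_cheapest(models: List[Model], question: str,
                         is_qualified: Callable[[str], bool]) -> Tuple[str, str]:
    """Steps S3-S5: try models in ascending cost order until an answer
    qualifies; step S4's training hook then covers every cheaper model that
    answered along the way."""
    tried: List[Model] = []
    candidate = ""
    for model in sorted(models, key=lambda m: m.cost):
        candidate = model.answer(question)
        if is_qualified(candidate):
            for earlier in tried:
                # In a real system: earlier.fine_tune(question, candidate),
                # so the cheap model can answer similar questions next time.
                pass
            return model.name, candidate
        tried.append(model)
    return tried[-1].name, candidate  # nothing qualified: keep the best effort
```

The loop stops at the first model whose answer qualifies, so a high-cost model is invoked only when every cheaper candidate has failed.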
Whether the current answer is qualified can be judged in either of two ways.
The first way:
the user directly makes a subjective judgment on the currently output answer. This judgment incurs no cost, but its disadvantage is that only obviously incorrect answers can be eliminated.
The second way is:
in the step S3 and the step S5, after outputting the answer, the user can manually mark the accuracy of the output answer or the background expert personnel marks the accuracy of the output answer;
what needs to be explained here is: the marking and outputting of the answers by the background expert personnel is not carried out every time, if the answers of the questions do not have marking data, the answers are pushed to the background expert personnel for marking, and if the answers of the questions have the preset number of expert marks, the answers of the questions are not pushed to the background expert personnel for marking, so that the number of times of marking by the expert is reduced, and the cost is controlled.
In step S4, whether the current answer is qualified is judged automatically from the historical answer-accuracy marking data. During automatic judgment, the background can assign corresponding weights to the user marks and the expert marks to improve the reliability of the result.
It should also be noted that, because different users phrase the same or similar questions differently, the answers output by the large language model also differ to some extent; when the background automatically judges whether the current answer is qualified from the historical marking data, it judges against the marking data of all answers corresponding to similar questions.
The advantage of this approach is that the accuracy of the final output answer is greatly improved.
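A minimal sketch of this automatic qualification judgment from weighted user and expert marks might look as follows; the weight values, the 0.6 threshold, and the 0/1 mark encoding are illustrative assumptions:

```python
def auto_qualify(user_marks, expert_marks,
                 w_user=0.3, w_expert=0.7, threshold=0.6):
    """Step S4 sketch: judge qualification from historical accuracy marks.

    user_marks / expert_marks are lists of 0/1 accuracy labels collected for
    all historical answers to similar questions; expert marks carry more
    weight, as the embodiment suggests.
    """
    if not user_marks and not expert_marks:
        return False  # no history: defer to manual judgment
    u = sum(user_marks) / len(user_marks) if user_marks else 0.0
    e = sum(expert_marks) / len(expert_marks) if expert_marks else 0.0
    wu = w_user if user_marks else 0.0
    we = w_expert if expert_marks else 0.0
    score = (wu * u + we * e) / (wu + we)  # weighted accuracy estimate
    return score >= threshold
```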
It can be seen that in this embodiment, the answer to the question is output by using the large language model with the lowest cost, if the answer is qualified, the large language model with higher cost is not used continuously, and if the answer is unqualified, the large language model with higher cost is used, so that the use cost of the large language model is reduced as much as possible.
It should be noted that in this embodiment several low-cost large language models may be used before a high-cost model outputs the final answer, so compared with directly using the high-cost model the cost of answering increases; however, the usage costs of different models differ greatly. For example, the currently popular GPT-4 costs about $30 per million prompt characters, while GPT-J costs only about $0.2 for the same volume of prompt characters, so even if GPT-4 is ultimately used to answer the question, the cost of the GPT-J calls made beforehand is negligible. This embodiment avoids, as far as possible, using a high-cost model to output an answer that a low-cost model could have produced; an answer that GPT-J could output would, if output by GPT-4 instead, greatly increase the usage cost.
In this embodiment, after the final answer to the current question is output, any low-cost large language model that has already output an answer to the current question is trained (i.e., fine-tuned) on the current question and its final answer, so that the next time a similar question is input, a trained low-cost model can directly output a qualified answer without using a high-cost model. Note that all low-cost models that have already output an answer to the current question are trained, so a trained low-cost model is available the next time a similar question is input, further reducing usage cost. Note also that, because this training happens during use, the performance of the low-cost models improves continuously in use while the cost of answering questions remains under control.
Further, the low-cost large language models that have already output an answer to the current question can be trained selectively on the current question and its final answer. Specifically:
in step S4, after the final answer to the current question is output, if there are large language models that have already output an answer to the current question, several of them whose answers were the most reasonable are trained on the current question and its final answer, and the trained models are marked for use the next time a question is input.
Step S3 then becomes: judge from the marks whether any large language models have previously been trained on questions related to the current input; if so, use the lowest-cost model among those related-trained models to output the answer to the current question in the dialog box; if not, use the lowest-cost model among all large language models.
In step S5, similarly: if related-trained models remain, the model with the next-higher usage cost among them is used to output the answer in the dialog box; if not, the model with the next-higher usage cost among all large language models is used.
That is, the next time a similar question is input, the answer can be output directly by the lowest-cost model among the related-trained large language models, further reducing the cost of answering; answer reasonableness can be judged automatically by the background from the historical answer-accuracy marking data.
It should be noted that related training means training on similar questions and the answers corresponding to those similar questions.
The training process of the large language model is not the focus of this embodiment, and will not be described in detail here.
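The preference for related-trained models in steps S3 and S5 can be sketched as follows; the mark structure (a model name mapped to the set of topics it has been fine-tuned on) is an assumed representation for illustration:

```python
from typing import Dict, List, Set

def pick_model(models: List[dict], trained_marks: Dict[str, Set[str]],
               question_topic: str) -> dict:
    """Refined step S3: prefer the lowest-cost model already marked as
    related-trained for the current question's topic; otherwise fall back
    to the lowest-cost model overall."""
    by_cost = sorted(models, key=lambda m: m["cost"])
    related = [m for m in by_cost
               if question_topic in trained_marks.get(m["name"], set())]
    return (related or by_cost)[0]
```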
More specifically:
step S2 further includes inputting a local database related to the question in the dialog box;
and between step S2 and step S3 the method further comprises the steps of:
A. obtaining an initial answer by matching the current question against the local database;
B. judging whether the initial answer is qualified; if so, taking it as the final answer to the current question and returning to step S2; if not, executing step S3.
Therefore, in this embodiment, before a large language model is formally used to output an answer, an answer is first matched from the input local database; if the matched answer is qualified, no large language model is needed, further reducing the usage cost of large language models.
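Steps A and B of the local database matching might be sketched as follows; the string-similarity measure and the 0.8 qualification cutoff are assumptions for illustration:

```python
from difflib import SequenceMatcher

def match_local(question: str, local_db: dict, cutoff: float = 0.8):
    """Steps A-B sketch: match the current question against a local
    question -> answer database before invoking any paid model."""
    best_q, best_score = None, 0.0
    for stored_q in local_db:
        score = SequenceMatcher(None, question.lower(), stored_q.lower()).ratio()
        if score > best_score:
            best_q, best_score = stored_q, score
    if best_score >= cutoff:
        return local_db[best_q]   # qualified: use as the final answer
    return None                   # unqualified: fall through to step S3
```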
Further, all large language models output the current question answer based on all the content in the dialog box.
Referring to figs. 2, 3, and 4, before outputting the answer to the current question based on all the content in the dialog box, the large language model compresses the content of the dialog box; during compression, questions for which no final answer has been output are not compressed, nor is a local database compressed during the dialog turn in which it is input.
Compressing the content in the dialog box comprises the steps of:
calculating compression rates for the content of different turns based on each turn's number and the total number of currently completed dialog turns;
and compressing the content of the different turns according to the calculated compression rates.
Here, the compression rate denotes the fraction of the original content retained after compression: compressed data size = compression rate x original data size. The smaller the compression rate, the greater the amplitude of compression and the smaller the compressed data.
The compression rate of the content of different turns is calculated as:
P_Kt = e^(λ(K+1−t))
where P_Kt denotes the compression rate applied to the content of the t-th dialog turn after K dialog turns have been completed; t is the dialog turn to which the content belongs; K is the number of completed dialog turns; and λ is a preset compression-rate adjustment value used to adjust the rate of change of compression between turns; it is a negative number less than 0 and can be set as required.
It should be noted that the completion mark of a dialog turn is that an answer is output in the dialog box; an input question, a local database, and an initial answer output by matching against the local database all belong to the content of the same dialog turn.
Referring to fig. 2, compression occurs only after at least one dialog turn has been completed in the dialog box; otherwise no compression is performed. (To simplify the drawing, fig. 2 omits dialog turns whose answers are output by matching against the local database.)
The compression rate is explained below. Referring to fig. 3, suppose three dialog turns have been completed and the fourth is about to be executed: the compression rate of the first dialog turn is e^(3λ), that of the second is e^(2λ), and that of the third is e^(λ).
Referring to fig. 4, suppose five dialog turns have been completed and the sixth is about to be executed: the compression rates of the first through fifth dialog turns are e^(5λ), e^(4λ), e^(3λ), e^(2λ), and e^(λ), respectively.
Since λ < 0, e^(3λ) < e^(2λ) < e^(λ): the later the dialog turn, the larger the compression rate and the larger the fraction of its content that is retained. Conversely, for the content of a given turn, the compression rate shrinks as the number of completed turns grows, so its compression deepens; the first turn's rate, for example, falls from e^(3λ) before the fourth turn to e^(5λ) before the sixth.
The specific compression rate formula adopted can be adjusted according to actual conditions or requirements; it only needs to satisfy two conditions: 1. the more recent the content, the larger its compression rate and the more of it is retained; 2. for content of a given turn, the compression rate decreases, and the compression deepens, as the number of completed dialog turns grows. This embodiment adopts the preferred formula given above.
As stated above, the compression does not compress a question for which no final answer has been output (preserving the integrity of the question until its final answer appears), nor does it compress a local database during the dialog turn in which the local database is input. This is explained in detail with reference to fig. 4:
If the first through fifth dialog turns all concern the first question, and the final answer to the first question is output when the fifth turn completes, then when the sixth turn is executed the second question and a local database corresponding to the second question must be input; the first question and the local database entered in the first turn are compressed at the rate e^(5λ), but the second question and the local database entered in the sixth turn are not compressed.
If the first through fifth dialog turns all concern the first question but no final answer has been output when the fifth turn completes, then when the sixth turn is executed the first question is not compressed, but the local database entered in the first turn is still compressed at the rate e^(5λ).
It should be noted that the compression in this embodiment truncates text segments: a field of the corresponding percentage can be cut directly from the first character of the content, or a character inside the content can be chosen at random as the starting point and a field of the corresponding percentage cut from there, so the cost of the compression operation is low. Alternatively, the original text can be summarized and compressed; this generally involves a machine learning model and can be used when the inference cost of summarization with that model is low.
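The per-turn compression can be sketched as follows, assuming the rate P_Kt = e^(λ(K+1−t)) implied by figs. 3 and 4 and interpreting it as the fraction of characters retained under the truncation scheme just described:

```python
import math

def compression_rate(K: int, t: int, lam: float) -> float:
    """P_Kt = e^(lam * (K + 1 - t)): fraction of round-t content kept
    after K dialog turns have been completed (lam < 0)."""
    assert lam < 0 and 1 <= t <= K
    return math.exp(lam * (K + 1 - t))

def compress_round(text: str, K: int, t: int, lam: float,
                   protected: bool = False) -> str:
    """Truncate round-t content to its compression-rate share of characters.

    `protected` marks content that must not be compressed: a question whose
    final answer has not yet been output, or a local database in the very
    round in which it was input.
    """
    if protected:
        return text
    keep = int(len(text) * compression_rate(K, t, lam))
    # Truncation from the first character; a summarizer could be substituted
    # when its inference cost is low.
    return text[:keep]
```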
The compression process described above is similar to the time-decay method used for processing time-series data. That method rests on one assumption: over time, past data has less impact on current predictions. In time-series analysis, earlier data points are therefore given lower weight and newer data points higher weight; the time-decay method models time-series data using its time information and fully accounts for the decaying influence of past data points, improving the prediction of future values. By adopting this approach, the number of prompt characters is reduced as much as possible while the consistency and accuracy of the output answers are preserved; and since this embodiment also requires a local database to be input for a question, the compression processing can further reduce the usage cost of the large language model.
Further, in step S4, when the next question is related to the previous question, the process returns to step S2; when it is not, the process returns to step S1. That is, a dialog box is created independently for each group of related questions, avoiding content redundancy.
Further, the following step precedes step S3: selecting a plurality of preliminarily screened large language models corresponding to the current question's domain;
and in step S3 and step S5, a corresponding large language model is selected from the plurality of preliminarily screened large language models, and the current question's answer is output in the dialog box.
Because large language models differ in their specialized fields, starting from the lowest-cost model for every question can, to some extent, increase usage cost. This embodiment therefore classifies all large language models by domain; after a question is input, a plurality of preliminarily screened large language models corresponding to the question's domain can be selected according to that domain, a suitable model is then chosen from among them, and the current question's answer is output in the dialog box. This reduces, as far as possible, the number of unsuitable large language models tried before a suitable one is found, further reducing the usage cost of the large language models.
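A minimal sketch of the prescreening step, assuming a hypothetical registry that maps domains to candidate models ordered cheapest-first (all names and domains here are illustrative):

```python
# Hypothetical registry: per-domain candidates, ordered by ascending use cost.
MODEL_REGISTRY = {
    "finance": ["fin-small", "fin-medium", "general-large"],
    "medicine": ["med-small", "general-large"],
}

def prescreen_models(question_domain):
    """Return the preliminarily screened models for the question's domain,
    still ordered cheapest-first so escalation starts at the lowest cost.
    Unknown domains fall back to general-purpose models."""
    return MODEL_REGISTRY.get(question_domain, ["general-small", "general-large"])

assert prescreen_models("finance")[0] == "fin-small"
assert prescreen_models("unknown") == ["general-small", "general-large"]
```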
Embodiment two:
referring to fig. 5, the present embodiment provides an accurate low-cost large language model usage system, which is based on the accurate low-cost large language model usage method described in the first embodiment, and includes a dialog box creation module, a dialog module, a large language model module, a training module, and a judgment module, where the dialog module and the large language model module are respectively connected with the dialog box creation module, the training module is respectively connected with the large language model module and the dialog box creation module, and the judgment module is respectively connected with the dialog box creation module and the training module;
a dialog box creation module for creating a dialog box;
a dialogue module for inputting questions in a dialogue box;
the large language model module is used for sequentially adopting large language models in order of increasing use cost to output an answer to the question in the dialog box until the output answer is qualified;
the judging module is used for judging whether the answer of the question output in the dialog box is qualified or not;
and the training module is used for training the large language model which outputs the answers of the questions according to the questions and the final answers of the questions so as to be used when the questions are input next time.
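The training module's bookkeeping can be sketched as follows: for every question that reached a final answer, each model that had already output an answer to it is paired with the (question, final answer) sample for later fine-tuning. The `history` tuple structure is an assumption for illustration, not part of the patent text:

```python
def collect_training_pairs(history):
    """Group (question, final_answer) training samples per model.
    history: iterable of (question, final_answer, models_that_answered),
    where models_that_answered lists every model that output an answer
    to that question before the final answer was settled."""
    jobs = {}
    for question, final_answer, models in history:
        for model in models:
            jobs.setdefault(model, []).append((question, final_answer))
    return jobs

jobs = collect_training_pairs([
    ("q1", "a1", ["m1", "m2"]),
    ("q2", "a2", ["m1"]),
])
assert jobs["m1"] == [("q1", "a1"), ("q2", "a2")]
```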
Further, the dialogue module is also used for inputting a local database in the dialogue box;
and the large language model module is also used for matching an answer to the question based on the question and the local database: if the matching succeeds, the matched answer (i.e., the initial answer to the question) is output in the dialog box; if the matching fails, a large language model of the corresponding use cost is used and the answer to the question is output in the dialog box based on all contents in the dialog box.
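The match-first, escalate-on-failure flow above can be sketched as a single loop; `ask(model, question)` and `is_qualified(answer)` are assumed interfaces standing in for the model call and the judging module, and a plain dict stands in for the local database:

```python
def answer_question(question, local_db, models, is_qualified, ask):
    """Try a local-database match first; if no qualified match is found,
    escalate through models in ascending-cost order until an answer
    passes the qualification check. Returns None if nothing qualifies."""
    initial = local_db.get(question)          # match against local database
    if initial is not None and is_qualified(initial):
        return initial                        # initial answer becomes the final answer
    for model in models:                      # cheapest model first
        answer = ask(model, question)
        if is_qualified(answer):
            return answer
    return None

ok = lambda a: a == "good"
ask = lambda model, q: "good" if model == "expensive" else "bad"
assert answer_question("q", {}, ["cheap", "expensive"], ok, ask) == "good"
assert answer_question("q", {"q": "good"}, [], ok, ask) == "good"
```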
The large language model module comprises a dialog-turn recording unit and a compression unit, each connected to the dialog box creation module. The dialog-turn recording unit assigns each piece of content in the dialog box the dialog turn in which it was generated and records the total number of dialog turns completed so far; the compression unit calculates compression rates for content from different turns based on those turn numbers and the total completed turns, and compresses the content accordingly.
Further, the system also comprises a domain division module connected to the dialog module and the large language model module respectively; the domain division module classifies the domain of the user's input question and also classifies the specialized domain of each large language model;
the large language model module also comprises a preliminary screening unit, wherein the preliminary screening unit is used for selecting a plurality of preliminary screening large language models corresponding to the problem field, and then selecting corresponding large language models from the plurality of preliminary screening large language models, and outputting the current problem answers in a dialog box.
The system also comprises a foreground marking module and a background marking module. The foreground marking module allows a user to manually mark the accuracy of an answer after it is output, and the background marking module allows background expert personnel to mark answer accuracy. The judging module is connected to both marking modules and automatically judges whether the current question's answer is qualified based on the historical answer-accuracy marking data.
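One simple way the judging module could use the historical marking data, given as an assumption since the patent does not specify the rule: pool the foreground and background accuracy marks and require their average to clear a threshold.

```python
def is_qualified(marks, threshold=0.8):
    """Hypothetical qualification rule: marks are 0/1 accuracy labels from
    users (foreground) and experts (background); an answer is qualified
    when the average historical mark is at or above the threshold."""
    if not marks:
        return False                  # no history yet: treat as unqualified
    return sum(marks) / len(marks) >= threshold

assert is_qualified([1, 1, 1, 1]) is True
assert is_qualified([1, 1, 1, 0]) is False   # 0.75 < 0.8
```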
It should be noted that the accurate low-cost large language model usage system provided in this embodiment works in the same way as the method of Embodiment One and will not be described in detail here.
The above examples are merely illustrative of the preferred embodiments of the present invention and are not intended to limit the scope of the present invention, and various modifications and improvements made by those skilled in the art to the technical solution of the present invention should fall within the protection scope of the present invention without departing from the design spirit of the present invention.

Claims (10)

1. An accurate low-cost large language model using method is characterized by comprising the following steps:
s1, creating a dialog box;
s2, inputting a current problem in a dialog box;
s3, outputting a current question answer in a dialog box by using a large language model with the lowest cost;
s4, judging whether the current question answer is qualified or not, if so, taking the current question answer as a final answer of the current question, returning to the step S2, and if not, executing the step S5;
s5, outputting a current question answer in a dialog box by using a next large language model with higher use cost, and returning to the step S4;
in step S4, after the final answer of the current question is output, if there is a large language model that has already output the answer of the current question, training the large language model that has already output the answer of the current question based on the current question and the final answer of the current question for the next time of inputting the question.
2. The method for using an accurate low cost large language model according to claim 1, wherein step S2 further comprises: inputting a local database related to the problem in a dialog box;
the steps between the step S2 and the step S3 also comprise the steps of:
A. obtaining an initial question answer based on matching of the current question and a local database;
B. judging whether the initial question answer is qualified or not, if so, taking the initial question answer as a final answer of the current question, returning to the step S2, and if not, executing the step S3.
3. The accurate low-cost large language model usage method according to claim 2, wherein all large language models output the current question's answer based on all contents in the dialog box.
4. A method of using an accurate low cost large language model according to claim 3, wherein the large language model compresses the contents of the dialog box before outputting the answer to the current question based on all the contents of the dialog box, and does not compress the questions for which no final answer is output during the compression process, nor does it compress the local database during the dialog turn for which the input of the local database is performed.
5. The method for using an accurate low cost large language model according to claim 4, wherein compressing contents in the dialog box comprises the steps of:
the method comprises the steps of calculating the compression rate of the content of different rounds based on the conversation rounds of the content and the total conversation rounds which are completed currently;
and compressing the contents of different rounds based on the calculated compression rates.
6. The method for using an accurate low cost large language model according to claim 5, wherein the compression rate calculation formula of the contents of different rounds is:
wherein P_Kt represents the compression rate of content generated in the t-th dialog turn after the K-th dialog turn has been completed, t represents the dialog turn in which the content was generated, K represents the number of completed dialog turns, and λ, a negative number, represents a preset compression-rate adjustment value.
7. The method of claim 1, wherein in step S4, when there is a correlation between the next question and the previous question, the method returns to step S2, and when there is no correlation between the next question and the previous question, the method returns to step S1.
8. The method for using an accurate low cost large language model according to claim 1, further comprising the step of, before step S3: selecting a plurality of preliminary screening large language models corresponding to the current problem field;
and in step S3 and step S5, a corresponding large language model is selected from the plurality of preliminarily screened large language models, and the current question's answer is output in the dialog box.
9. The method for using an accurate low-cost large language model according to claim 1, wherein in step S3 and step S5, after an answer is output, its accuracy can be manually marked by the user or marked by background expert personnel;
in step S4, whether the current question answer is qualified or not is automatically judged according to the historical answer accuracy marking data.
10. An accurate low-cost large language model using system based on the accurate low-cost large language model using method of any one of claims 1-9, characterized by comprising a dialog box creating module, a dialog module, a large language model module, a training module and a judging module, wherein the dialog module and the large language model module are respectively connected with the dialog box creating module, the training module is respectively connected with the large language model module and the dialog box creating module, and the judging module is respectively connected with the dialog box creating module and the training module;
a dialog box creation module for creating a dialog box;
a dialogue module for inputting questions in a dialogue box;
the large language model module is used for sequentially adopting large language models in order of increasing use cost to output an answer to the question in the dialog box until the output answer is qualified;
the judging module is used for judging whether the answer of the question output in the dialog box is qualified or not;
and the training module is used for training the large language model which outputs the answers of the questions according to the questions and the final answers of the questions so as to be used when the questions are input next time.
CN202310836247.6A 2023-07-10 2023-07-10 Accurate low-cost large language model using method and system Pending CN116610789A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310836247.6A CN116610789A (en) 2023-07-10 2023-07-10 Accurate low-cost large language model using method and system


Publications (1)

Publication Number Publication Date
CN116610789A true CN116610789A (en) 2023-08-18

Family

ID=87674954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310836247.6A Pending CN116610789A (en) 2023-07-10 2023-07-10 Accurate low-cost large language model using method and system

Country Status (1)

Country Link
CN (1) CN116610789A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117633174A (en) * 2023-11-22 2024-03-01 北京万物可知技术有限公司 Voting consensus system based on multiple large model conversations



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination