CN117217191A - Prompt processing method, device, equipment and storage medium of language model

Prompt processing method, device, equipment and storage medium of language model

Info

Publication number
CN117217191A
CN117217191A
Authority
CN
China
Prior art keywords
template
prompt
language model
sentence
bias
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310448055.8A
Other languages
Chinese (zh)
Inventor
吴秉哲
马焕
张长青
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310448055.8A priority Critical patent/CN117217191A/en
Publication of CN117217191A publication Critical patent/CN117217191A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The application provides a prompt processing method, apparatus, device, and storage medium for a language model. The method includes: acquiring a first prompt template to be optimized for the language model, a first bias index, and a training set; iteratively performing the following process: traversing the training set to perform sampling without replacement, generating a prompt based on the sentence sample obtained by each sampling and its corresponding label, merging the prompt and the first prompt template into a second prompt template, acquiring a second bias index, selecting the second prompt template corresponding to the second bias index with the smallest value as a candidate optimization template, and updating the first prompt template based on the candidate optimization template; and, in response to the smallest second bias index obtained in the current iteration being greater than or equal to the first bias index, taking the candidate optimization template obtained in the current iteration as the optimized prompt template of the language model. The application can efficiently find a prompt template suitable for the language model while saving computing resources.

Description

Prompt processing method, device, equipment and storage medium of language model
Technical Field
The present application relates to artificial intelligence technology, and in particular, to a method, an apparatus, a device, and a storage medium for processing a prompt for a language model.
Background
In recent years, with the growth of high-quality data accumulated in industry, the growth of computing resources, and the development of large language models and training techniques, large language models have been widely used in fields such as translation and dialogue. The applicant has found that constructing a prompt for a language model generally depends on factors such as the selection of samples and their arrangement order, and differences in these factors often cause large differences in final performance. Given the huge search space, the related art provides no effective technical solution for finding an optimal prompt template to improve the performance of the language model on downstream prediction tasks.
Disclosure of Invention
The embodiments of the present application provide a prompt processing method and apparatus for a language model, an electronic device, a computer-readable storage medium, and a computer program product, which can efficiently find a prompt template suitable for the language model while saving computing resources.
The technical scheme of the embodiment of the application is realized as follows:
The embodiment of the application provides a prompt processing method of a language model, which comprises the following steps:
acquiring a first prompt template to be optimized of a language model;
acquiring a first bias index of the first prompt template;
acquiring a training set of the language model, wherein the training set comprises a plurality of statement samples and labels corresponding to the statement samples;
the following process is iteratively performed:
traversing the training set to perform sampling without replacement, and generating a prompt based on the sentence sample obtained by each sampling without replacement and the label corresponding to the sentence sample,
combining the prompt with the first prompt template to form a second prompt template,
acquiring second bias indexes of the second prompt template corresponding to each sentence sample, and taking the second prompt template corresponding to the second bias index with the smallest value in all acquired second bias indexes as a candidate optimization template;
and in response to the second bias index with the minimum value obtained in the current iteration being greater than or equal to the first bias index, taking the candidate optimization template obtained in the current iteration as the prompt template after the language model is optimized.
The embodiment of the application provides a prompt processing device of a language model, which comprises:
the first acquisition module is used for acquiring a first prompt template to be optimized of the language model;
the second acquisition module is used for acquiring a first bias index of the first prompt template;
a third obtaining module, configured to obtain a training set of the language model, where the training set includes a plurality of sentence samples and labels corresponding to the sentence samples;
the updating module is used for iteratively executing the following process: traversing the training set to perform sampling without replacement, and generating a prompt based on the sentence sample obtained by each sampling without replacement and the label corresponding to the sentence sample; combining the prompt with the first prompt template to form a second prompt template; acquiring second bias indexes of the second prompt templates corresponding to the sentence samples, and taking the second prompt template corresponding to the second bias index with the smallest value among all acquired second bias indexes as a candidate optimization template;
and the selection module is used for, in response to the second bias index with the minimum value obtained in the current iteration being greater than or equal to the first bias index, taking the candidate optimization template obtained in the current iteration as the prompt template after the language model is optimized.
An embodiment of the present application provides an electronic device, including:
a memory for storing computer executable instructions or computer programs;
and the processor is used for realizing the prompt processing method of the language model provided by the embodiment of the application when executing the computer executable instructions or the computer programs stored in the memory.
The embodiment of the application provides a computer readable storage medium which stores computer executable instructions or a computer program for realizing the prompt processing method of the language model provided by the embodiment of the application when being executed by a processor.
The embodiment of the application provides a computer program product, which comprises a computer program or a computer executable instruction, wherein the computer program or the computer executable instruction realizes the prompt processing method of the language model provided by the embodiment of the application when being executed by a processor.
The embodiment of the application has the following beneficial effects:
By calling the language model multiple times, the second bias index corresponding to each second prompt template is calculated and compared with the first bias index, so that the prompt template with the smallest bias index in the current traversal, i.e., a locally optimal solution, is determined. By traversing the training set multiple times and taking the candidate optimization template with the smallest bias index as the optimized prompt template, a prompt template corresponding to an optimal bias index is obtained without an exhaustive search over the training set, which reduces the computational complexity of the search, saves computing resources, and achieves a search result close to the globally optimal solution.
Drawings
FIG. 1 is a schematic diagram of a natural language processing system of a language model according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a server according to an embodiment of the present application;
FIG. 3A is a schematic diagram of a language model according to an embodiment of the present application;
FIG. 3B is a schematic diagram of converting text information into vectors according to an embodiment of the present application;
FIG. 3C is a schematic diagram of a language model processing text classification task provided by an embodiment of the application;
FIG. 3D is a schematic diagram of a language model for processing text fill tasks according to an embodiment of the present application;
FIG. 3E is a schematic diagram of a language model processing sentence matching task provided by an embodiment of the present application;
FIG. 3F is a schematic diagram of a language model processing statement question-answering task provided by an embodiment of the application;
FIGS. 4A-4C are schematic flow diagrams of a method for processing a prompt in a language model according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for processing a resume evaluation prompt template of a language model according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a resume evaluation prompt template provided by an embodiment of the present application;
FIG. 7 is a schematic diagram of a resume evaluation prompt template under scenario learning provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of sample selection of an emotion classification task according to an embodiment of the present application;
FIG. 9 is a schematic diagram of the effect of optimized alert templates on language model performance provided by an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
In the following description, the terms "first", "second", "third" and the like are merely used to distinguish similar objects and do not represent a specific ordering of the objects, it being understood that the "first", "second", "third" may be interchanged with a specific order or sequence, as permitted, to enable embodiments of the application described herein to be practiced otherwise than as illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the embodiments of the application is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
It should be noted that the embodiments of the present application involve data such as user information and past resumes. When the embodiments of the present application are applied to specific products or technologies, the permission or consent of the user must be obtained, and the collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Before describing the embodiments of the present application in further detail, the terms involved in the embodiments of the present application are explained; these explanations apply to the following description.
1) Large language model (Large Language Models, LLM): simply referred to as a language model, a machine learning model capable of processing and generating natural language, such as the Bidirectional Encoder Representations from Transformers (BERT) model or the Generative Pre-Training (GPT) model. The prediction tasks of a language model may include text classification, cloze completion, question answering, resume evaluation, and the like.
2) Prompt (Prompt): also called a prompt phrase or prompt word, text that conforms to natural-language rules and is used to guide or stimulate the language model to complete a prediction task.
3) Prompt template: a template used to generate prompts. Structurally, it may include two types of positions (also referred to as slots): one type of position is used to fill in the text that needs to be input to the language model (denoted by X), and the other type is used to fill in the result that the language model is expected to predict (denoted by Z); there is at least one position of each type. For example, if the language model is to perform an emotion classification task, a sentence such as "The sentence expresses a 'Z' emotion" may be appended to the input sentence, and two options such as "positive" and "negative" are given so that the language model predicts the value of "Z" (see the sketch after these term definitions).
4) Bias index: an index used to quantify the degree to which the prediction results of the language model deviate from fairness. Due to bias contained in the prompt template, the output of the language model's prediction task may contain prediction results that are unfair with respect to certain attributes, such as region, gender, educational background, or social status.
5) Uniform distribution: also called the rectangular distribution, a symmetric probability distribution in which intervals of equal length have equal probability. The uniform distribution is defined by two parameters a and b, the minimum and maximum values on the number axis, and is commonly abbreviated as U(a, b).
6) KL distance (Kullback-Leibler Divergence): also called the Kullback-Leibler difference or relative entropy (Relative Entropy), used to measure the difference between two probability distributions over the same event space. Its physical meaning: in the same event space, if the events corresponding to the probability distribution P(x) are encoded with a code optimized for the probability distribution Q(x), the average code length of each basic event (symbol) increases by the KL distance in bits (see the sketch after these term definitions).
7) Globally optimal solution: for a problem or goal under given conditions, a decision that is optimal compared with all decisions that can solve the problem, or a solution that is optimal over a larger range or higher dimension, i.e., optimal over the entire feasible region, is called a globally optimal solution.
8) Locally optimal solution: a solution to a problem that is optimal only within a certain range or region, or a means of solving the problem or achieving the goal that is optimal within a certain range or under certain restrictions.
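As a rough, non-authoritative illustration of terms 3), 4), and 6) above, the following Python sketch fills the [X] slot of a prompt template and measures how far a predicted distribution is from the uniform distribution using the KL distance; the function names and the toy distributions are illustrative assumptions and are not part of the original disclosure.

```python
import math

def fill_template(template: str, x: str) -> str:
    """Fill the input slot [X] of a prompt template; [Z] is left for the model to predict."""
    return template.replace("[X]", x)

def kl_to_uniform(probs):
    """KL distance between a predicted distribution and the uniform distribution U."""
    u = 1.0 / len(probs)
    return sum(p * math.log(p / u) for p in probs if p > 0)

template = "[X] The sentence expresses a [Z] emotion."
print(fill_template(template, "I love this movie."))
# -> "I love this movie. The sentence expresses a [Z] emotion."

# Toy predicted distributions over {"positive", "negative"} for the slot [Z];
# the closer the distribution is to uniform, the smaller the bias index.
print(kl_to_uniform([0.9, 0.1]))  # skewed prediction -> larger bias index
print(kl_to_uniform([0.5, 0.5]))  # uniform prediction -> 0.0
```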
With the growth of high-quality data accumulated in industry, the growth of computing resources, and the development of training techniques, large language models such as GPT-3 are widely used in fields such as translation, dialogue systems, and advertisement recommendation. Compared with traditional smaller-scale models, a large language model has more parameters, computation, and memory footprint, as well as stronger expressive power and data-fitting ability, which greatly raises the performance ceiling of neural network models in various services and even far exceeds the level of human experts on many tasks. The most powerful capability of large language models represented by GPT-3 is small-sample learning through In-context Learning without adjusting the original model parameters; in-context learning allows a large language model to be quickly migrated to various downstream tasks, so that downstream developers can quickly build new applications on top of the capabilities of the large language model.
With the advent of ChatGPT, a phenomenon-level product related to large language models, practitioners in various fields have also recognized the great potential of applying large language models to traditional vertical industries (e.g., finance and law), and the related teams of large companies have begun to use large language models to drive the digital upgrade of their businesses, for example, generating product descriptions for search engine optimization, intelligent customer service, and translation scenarios based on large language models. A language model is trained on a text corpus of one or more languages to produce probabilities: for a given word sequence of any length m, the trained language model assigns a probability P(w_1, w_2, …, w_m) to the whole sequence. Since language can be used to express an infinite number of valid sentences (digital infinity), the language model needs to assign non-zero probabilities to many sequences that are linguistically valid, most of which may never be encountered in the training data. Several modeling methods have been devised to overcome this problem, such as applying Markov assumptions or using neural architectures such as recurrent neural networks or Transformers.
In the embodiments of the present application that use a large language model, the model architecture may be built based on a Transformer. As an example, the language model may be applied to next-word prediction for completing an incomplete sentence: given the embedded representation (w_1, w_2, …, w_t), the large language model predicts the probability distribution of the next word over the vocabulary, P(w_{t+1} | w_1, …, w_t); after the prediction is completed, the word with the highest probability is selected as the candidate next word of the original sentence. The large language model is trained on data from large-scale corpora (such as Wikipedia) so that it acquires the ability of context-aware learning, and it can therefore be used directly to solve downstream tasks such as resume evaluation, emotion analysis, and sentence translation without being trained on downstream data.
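The next-word prediction described above can be sketched as follows; this is only an illustrative example under assumed inputs (the logits and vocabulary are hypothetical stand-ins for the output of a real large language model), not the method claimed in this application.

```python
import numpy as np

def softmax(logits):
    z = np.exp(logits - np.max(logits))
    return z / z.sum()

def predict_next_word(logits, vocabulary):
    """Greedy next-word prediction: pick the word with the highest probability
    under P(w_{t+1} | w_1, ..., w_t)."""
    probs = softmax(logits)
    return vocabulary[int(np.argmax(probs))], probs

# Hypothetical scores for the context "my dog is"; a real model would produce these.
vocabulary = ["cute", "apple", "ran"]
logits = np.array([3.2, 0.1, 1.0])
word, probs = predict_next_word(logits, vocabulary)
print(word, probs)  # "cute" is selected as the candidate next word
```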
However, the applicant has found that, due to variations in Training Examples, Example Orders, and Prompt Formats, language models may exhibit a high degree of instability during in-context learning, so constructing a suitable prompt template is critical to improving the performance of the model in in-context learning. The related art generally attempts to build a suitable prompt template from two directions: (1) prompt tuning in the encoding space (Prompt Tuning); (2) prompt search in the original space (Prompt Search). The key idea of prompt tuning is to inject task-specific embedding vectors (embeddings) into the hidden layer and then adjust these embeddings with gradient-based optimization. Prompt search automatically finds a template given a set of training inputs x and outputs y: a large text corpus (e.g., Wikipedia) is crawled with strings containing x and y, the middle words or dependency paths between the inputs and the outputs are found, and the middle words or dependency paths that occur most frequently are finally taken as a template of the form "[X] middle words [Z]". However, these methods require modifying the original inference process of the model and obtaining model gradients, which is impractical for black-box large language model services such as GPT-3 and ChatGPT. In addition, the related art needs to optimize the prompt template globally or locally, which consumes significant computing resources.
The embodiment of the application provides a method, a device, an electronic device, a computer readable storage medium and a computer program product for processing a prompt of a language model, which can quickly find an optimal prompt template on the premise of greatly reducing computing resources, and the following description shows an exemplary application of the prompt processing device of the language model provided by the embodiment of the application. In the following, an exemplary application when the electronic device is implemented as a server will be described.
Referring to fig. 1, fig. 1 is a schematic architecture diagram of a language model natural language processing system provided in an embodiment of the present application, where a terminal 200 is connected to a server 400 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
The terminal 200 sends a request for optimizing the prompt template to the server 400 through the network 300; after computation, the server 400 obtains the optimized prompt template and sends it to the terminal 200 through the network 300; the user of the terminal 200 fills in the prompt according to the optimized prompt template; the terminal 200 or the server 400 then calls the language model to perform prediction for the specific task, thereby obtaining an unbiased prediction result, which is displayed on the terminal.
The solution of the embodiments of the present application belongs to artificial intelligence (Artificial Intelligence, AI) technology. Artificial intelligence is a comprehensive technology of computer science that studies the design principles and implementation methods of various intelligent machines so that machines can perceive, reason, and make decisions. Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, such as natural language processing and machine learning/deep learning; as technology develops, it will be applied in more fields and take on increasingly important value.
An exemplary application scenario for performing different downstream tasks is described below using the language model as BERT.
In the scenario of a text classification task, referring to fig. 3C, fig. 3C is a schematic diagram of a language model processing a text classification task according to an embodiment of the present application. When the input text needs to be classified, a symbol [CLS] representing classification is added at the beginning of the input corpus (such as resume text), the output of the BERT model at this position is input to a classifier (such as a softmax classifier), and the classifier predicts a category. After the parameters in BERT are fine-tuned, category prediction is also performed for the other characters in the sentence, the predicted probability distribution of the text to be classified (such as a resume) over each category (such as a plurality of resume levels) is output, and the category with the highest predicted probability is selected as the category of the whole text.
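A minimal sketch of this [CLS]-based classification flow, assuming the Hugging Face transformers library, an arbitrary Chinese BERT checkpoint, and five assumed resume levels (none of which are specified by the original text), might look as follows.

```python
import torch
from transformers import BertModel, BertTokenizer

# Assumed checkpoint and an assumed five resume levels; neither is specified by the text.
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
classifier = torch.nn.Linear(bert.config.hidden_size, 5)

def classify(text: str) -> torch.Tensor:
    # The tokenizer prepends [CLS] and appends [SEP] automatically.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = bert(**inputs)
    cls_vector = outputs.last_hidden_state[:, 0, :]       # output at the [CLS] position
    return torch.softmax(classifier(cls_vector), dim=-1)  # distribution over the 5 levels

probs = classify("Resume text to be evaluated ...")
print(probs.argmax(dim=-1))  # category with the highest predicted probability
```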
In the scenario of a text fill-in-the-blank task, referring to fig. 3D, fig. 3D is a schematic diagram of a language model processing a text fill-in-the-blank task according to an embodiment of the present application. After the symbol [CLS] is added to the head of the sentence, the sentence is input into the BERT model to obtain feature vectors, and a classifier is then called to calculate and output the predicted probability distribution of each character in the dictionary for the slot to be filled in the prompt.
In the scenario of a sentence matching task, referring to fig. 3E, fig. 3E is a schematic diagram of a language model processing a sentence matching task according to an embodiment of the present application. Given a premise (a first sentence) and a hypothesis (a second sentence), the symbol [CLS] is added before the first sentence and the separator [SEP] is added between the first sentence and the second sentence; the sequence is input into the BERT model for prediction, the output of the BERT model at the [CLS] position is input into a classifier, the classifier outputs the predicted probability distribution over whether the hypothesis is correct, incorrect, or unknown, and the option with the highest probability is selected as the prediction result for the hypothesis under the premise.
In the scenario of a question-answering task, referring to fig. 3F, fig. 3F is a schematic diagram of a language model processing a sentence question-answering task according to an embodiment of the present application. When processing a question-answering task, an article and a question (whose answer appears in the article) are input into the language model: the question and the article are separated by [SEP], [CLS] is added at the head of the question, and the model then outputs a representation for each character in the text. Each representation is processed by dot product with two extra vectors (a start vector and an end vector) to obtain the probability that each character corresponds to the start vector and the end vector, and the characters corresponding to the representations with the highest probabilities are taken as the start word and the end word respectively. That is, the language model outputs two parameters s and e, indicating that the answer to the question appears from the s-th word to the e-th word of the article.
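A rough sketch of the start/end-vector dot-product step, with randomly generated stand-ins for the token features that a BERT model would normally produce, might look as follows.

```python
import torch

def answer_span(token_features, start_vector, end_vector):
    """Dot each token's feature vector with the extra start/end vectors and take
    the most probable start word s and end word e of the answer."""
    start_probs = torch.softmax(token_features @ start_vector, dim=0)
    end_probs = torch.softmax(token_features @ end_vector, dim=0)
    return int(start_probs.argmax()), int(end_probs.argmax())

# Random stand-ins: 12 tokens with hidden size 8 (real features would come from BERT).
features = torch.randn(12, 8)
s, e = answer_span(features, torch.randn(8), torch.randn(8))
print(s, e)  # the answer is taken to span the s-th to the e-th word of the article
```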
In some embodiments, the server 400 may be a stand-alone physical server, a server cluster or a distributed system formed by a plurality of physical servers, or may be a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, content delivery networks (Content Delivery Network, CDNs), and basic cloud computing services such as big data and artificial intelligence platforms.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a server provided in an embodiment of the present application, and a server 400 shown in fig. 2 includes: at least one processor 410, a memory 430, at least one network interface 420. The various components in server 400 are coupled together by bus system 440. It is understood that the bus system 440 is used to enable connected communication between these components. The bus system 440 includes a power bus, a control bus, and a status signal bus in addition to the data bus. But for clarity of illustration the various buses are labeled in fig. 2 as bus system 440.
The processor 410 may be an integrated circuit chip having signal processing capabilities such as a general purpose processor, such as a microprocessor or any conventional processor, a digital signal processor (Digital Signal Processor, DSP), or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like.
Memory 430 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 430 optionally includes one or more storage devices physically remote from processor 410.
Memory 430 includes volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The non-volatile memory may be Read-Only Memory (ROM) and the volatile memory may be Random Access Memory (RAM). The memory 430 described in the embodiments of the present application is intended to comprise any suitable type of memory.
In some embodiments, memory 430 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 431 including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
network communication module 432 for reaching other electronic devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 include: bluetooth, wireless compatibility authentication (WiFi), and universal serial bus (Universal Serial Bus, USB), etc.;
In some embodiments, the prompt processing apparatus of the language model provided in the embodiments of the present application may be implemented in software. Fig. 2 shows a prompt processing apparatus 433 of the language model stored in the memory 430, which may be software in the form of a program, a plug-in, or the like, and includes the following software modules: a first acquiring module 4331, a second acquiring module 4332, a third acquiring module 4333, an updating module 4334, and a selecting module 4335. These modules are logical and may therefore be combined arbitrarily or further split according to the functions implemented. The functions of the respective modules are described hereinafter.
The method for processing the prompt in the language model provided by the embodiment of the application will be described in connection with the exemplary application and implementation of the server provided by the embodiment of the application.
In the following, an electronic device for implementing the method for processing a prompt in a language model according to the embodiment of the present application is taken as an example of a server, and the method for processing a prompt in a language model according to the embodiment of the present application is described, so that the execution subject of each step will not be described repeatedly.
Referring to fig. 4A, fig. 4A is a flowchart of a method for processing a prompt in a language model according to an embodiment of the present application, and will be described with reference to the steps shown in fig. 4A.
In step 101, a first prompt template of a language model to be optimized is obtained.
In some embodiments, the first prompt template of the language model to be optimized may be constructed by methods such as manual construction, automatic generation, and hidden space.
In the manual construction method, templates into which sentences can be filled are built using various data sets, i.e., cloze prompt templates (Cloze Prompt), and whether the pre-trained model can predict the missing word is verified in turn, for example, "Dante was born in [Z].". Alternatively, all natural language processing tasks are treated as text generation tasks and a template for continuing the text on the basis of a prefix, i.e., a prefix prompt template (Prefix Prompt), is added: the language model is given additional prefix condition information to guide it to generate the subsequent text; after the input token sequence is embedded by the model, the tokens are spliced after the prefix prompt template and input into the encoding layer of the language model. A knowledge distillation method may also be adopted, i.e., multiple prompts are constructed for one task at the same time and the model is fine-tuned based on each prompt, after which the prediction results of the models are fused; that is, through multiple rounds of fine-tuning, in each round the fine-tuning data of each prompt is expanded with labels produced by the other fine-tuned models in the previous round, so that the information of each prompt template can be fused, yielding a fused prompt template.
In the automatic generation approach, a number of templates that can serve as prompts are mined from a large corpus: for example, after the input [X] of the template and the [Z] corresponding to the label are determined, matching is performed over a massive text corpus to find sentences containing both [X] and [Z], and these sentences are then used to construct the prompt template. Words may also be traversed from the vocabulary and randomly combined into a prompt template so that the template can ultimately generate the words to be filled in for the training data, which is equivalent to running the text fill-in-the-blank task in reverse: the words to be filled into the prompt template are initialized with a mask ([MASK]), and the mask is then replaced with other words so that, after the prompt template is input into the model, the probability of the correct label corresponding to the filled word is maximized, thereby obtaining the prompt template.
A hidden-space template refers to a text vector learned in the hidden space; it cannot be mapped to a specific word but lies in the same vector space as the embedding vector of each word. Since an automatically generated template does not need to be real text, it has greater flexibility.
In the embodiment of the application, the first prompt template to be optimized meeting the requirement of the downstream task can be quickly constructed through the one or more template construction schemes, so that the first prompt template to be optimized can be conveniently and further optimized later, and the optimal prompt template is obtained.
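As a hedged illustration of the cloze-style and prefix-style templates mentioned above (the helper names and example templates are assumptions, not taken from the original text), one might write:

```python
def cloze_prompt(template: str, x: str) -> str:
    """Cloze prompt: the answer slot [Z] sits inside the sentence,
    e.g. "Dante was born in [Z]."; the input fills the [X] slot."""
    return template.replace("[X]", x)

def prefix_prompt(prefix: str, x: str) -> str:
    """Prefix prompt: the input is appended after a prefix and the language model
    is expected to continue the text."""
    return f"{prefix}{x}"

print(cloze_prompt("[X] Overall, it was a [Z] movie.", "I love this movie."))
print(prefix_prompt("Translate English to French: ", "I missed the bus today."))
```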
In step 102, a first bias indicator for a first prompt template is obtained.
In some embodiments, referring to fig. 4B, fig. 4B is a flowchart illustrating a method for processing a prompt in a language model according to an embodiment of the present application, and step 102 shown in fig. 4A may be implemented by steps 1021 to 1023 in fig. 4B, which are described in detail below.
In step 1021, the original sentence in the first prompt template is replaced with semantics-free characters to obtain the replaced first prompt template.
In some embodiments, the first prompt template t_p includes at least one original sentence, which is a sentence in the training set. The original sentence in the first prompt template is replaced with the semantics-free character [N/A] to obtain the replaced first prompt template, thereby avoiding interference from the original sentence on the bias index of the prompt template after the sentence samples and their corresponding labels are inserted.
By way of example, a semantics-free character may be any character that does not express specific semantics, such as code symbols that only carry programmatic meaning, including operators such as "+", "-", "++", "--", "+=", and "-="; if such characters are segmented and tokenized individually, they cannot represent specific semantic features and would only increase the dimension of the constructed feature vector.
For example, in a question-answering task, one of the original sentences included in the first prompt template is "Su Shi was a famous writer of the Northern Song dynasty, a native of Meishan, Meizhou. Where was Su Shi from?". The original sentence in the first prompt template is replaced with semantics-free characters, i.e., every character of the above sentence is replaced with the semantics-free character [N/A], to obtain the replaced first prompt template.
In step 1022, the language model is invoked to determine the predictive probability distribution of the replaced first prompt template.
In some embodiments, referring to fig. 3B, fig. 3B is a schematic diagram of converting text information into vectors according to an embodiment of the present application, including the input (text information), a word embedding vector sequence, a sentence embedding vector sequence, and a position embedding vector sequence. First, for the two input sentences "my dog is cute" and "he likes playing", the sentences are tokenized, i.e., the text is converted into a number of elements (also called tokens); a special element [CLS] is inserted at the beginning of the first sentence to mark the start of the sequence, [SEP] is inserted at the end of the second sentence to mark the end position, and [SEP] is inserted between the two sentences to separate them. The word embedding vector layer of the language model then converts each element into a vector, for example the vector for "my" in fig. 3B is denoted E_my, yielding the word embedding vector sequence. The sentence embedding vector of each element of the first sentence is assigned 0 and that of each element of the second sentence is assigned 1, serial numbers used to identify the sentences; for example, every element of the first sentence "my dog is cute" in fig. 3B is denoted E_A, yielding the two sentence embedding vector sequences. The position embedding vector layer of the language model automatically generates the corresponding position embedding vector sequence according to the serial number of each element in the sentence; for example, "my" is at the second position of the sequence in fig. 3B, with serial number E_1. Finally, the word embedding vector, sentence embedding vector, and position embedding vector of each element are summed, so that the word identity, sentence identity, and position information corresponding to each element of the input text information are obtained. Then, according to the specific downstream prediction task, a classifier is called to output a predicted probability distribution; for example, in the prompt template task, the language model outputs the feature vector of the prompt template, and the corresponding classifier is then called to output the predicted probability distribution of the replaced first prompt template.
Referring to fig. 3A, fig. 3A is a schematic structural diagram of a language model provided in an embodiment of the present application. In addition to the above embedding layer, the language model includes an encoding structure, which may specifically include a multi-head attention layer, a residual-and-normalization module, a feed-forward network, and another residual-and-normalization module; a vector of size n×h (where n is the maximum sentence length and h is the hidden layer size) is input and, after processing by these internal modules, a vector of size n×h is output. Specifically, in the text modality, semantic encoding is performed on the embedded vector representation of the sequence by calling the pre-trained language model, and the output of the last hidden layer of the model (equivalent to a fully connected layer) is used as the feature vector sequence of the token sequence, which includes the feature vector of each token in the sequence.
In some embodiments, the replaced first prompt template is input into the language model to obtain a feature vector sequence, and a classifier is then called to output a predicted probability distribution based on the feature vector sequence. Here, the predicted probability distribution may be the probability distribution of the prediction results in the various scenarios of the embodiments of the present application, for example, the predicted probability distribution over a plurality of levels in a resume evaluation scenario.
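A simplified sketch of the embedding summation described for fig. 3B, with illustrative sizes and hypothetical token ids, might look as follows.

```python
import torch

vocab_size, max_len, hidden = 30522, 512, 768    # illustrative sizes only
word_emb = torch.nn.Embedding(vocab_size, hidden)
segment_emb = torch.nn.Embedding(2, hidden)       # sentence A vs. sentence B
position_emb = torch.nn.Embedding(max_len, hidden)

def embed(token_ids, segment_ids):
    """Sum the word, sentence (segment) and position embedding of every element,
    as described for fig. 3B."""
    positions = torch.arange(token_ids.size(1)).unsqueeze(0)
    return word_emb(token_ids) + segment_emb(segment_ids) + position_emb(positions)

# Hypothetical ids for "[CLS] my dog is cute [SEP] he likes playing [SEP]".
token_ids = torch.tensor([[101, 2026, 3899, 2003, 10140, 102, 2002, 7777, 2652, 102]])
segment_ids = torch.tensor([[0, 0, 0, 0, 0, 0, 1, 1, 1, 1]])
print(embed(token_ids, segment_ids).shape)  # (1, 10, 768): an n x h input to the encoder
```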
In step 1023, the distance between the predictive probability distribution and the uniform distribution is determined, and the distance is used as a first bias index of a first prompt template.
In some embodiments, the predicted probability distribution P(z | t′_p), conditioned on the replaced first prompt template t′_p, is compared with the uniform distribution U, and their KL distance is computed as follows:
Bias(t_p) = KL(P(z | t′_p) ‖ U) = Σ_{i=a}^{b} P(z = i | t′_p) · log(P(z = i | t′_p) / U(i)) (1)
where i indexes the prediction categories of the replaced first prompt template t′_p; in the resume evaluation scenario, the maximum value b of the uniform distribution U is the highest of the plurality of levels, and the minimum value a is the lowest of the plurality of levels.
The KL distance of formula (1) is used as the first bias index Bias(t_p) for measuring the first prompt template.
In some embodiments, the first bias index of the first prompt template may also be calculated with an alternative formula, denoted formula (2), whose symbols are as follows: η represents the semantics-free character [N/A] substituted into the first prompt template; ρ represents the other sentence samples already in the first prompt template together with their corresponding labels; y represents a sample in the sample set, and Y represents the number of samples drawn from the sample set for insertion into the first prompt template; fair(ρ) represents the first bias index of the first prompt template t_p containing the sentence samples and corresponding labels ρ.
In the embodiments of the present application, the samples inserted into the first prompt template and their labels are combined with the semantics-free substitute for the original sentence, and the language model is then called on the combined first prompt template to obtain the predicted probability distribution.
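Putting steps 1021 to 1023 together, a minimal sketch of the content-free bias probe might look as follows; predict() is a hypothetical wrapper around the actual language-model call, and the toy distribution is made up for illustration.

```python
import math

def replace_with_na(sentence: str) -> str:
    """Step 1021: replace the original sentence with the semantics-free character [N/A]."""
    return " ".join("[N/A]" for _ in sentence.split())

def kl_to_uniform(probs):
    u = 1.0 / len(probs)
    return sum(p * math.log(p / u) for p in probs if p > 0)

def bias_index(original_sentences, predict):
    """Bias(t_p): replace the original sentence(s), obtain the predicted distribution over
    the label space from the language model, and measure its KL distance to uniform."""
    replaced = " ".join(replace_with_na(s) for s in original_sentences)
    return kl_to_uniform(predict(replaced))

# Toy stand-in for the language-model call, e.g. over five resume levels.
fake_predict = lambda text: [0.3, 0.3, 0.2, 0.1, 0.1]
print(bias_index(["Where was Su Shi from?"], fake_predict))
```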
With continued reference to fig. 4A, in step 103, a training set of language models is obtained.
In some embodiments, the training set includes a plurality of sentence samples and labels corresponding to the sentence samples. For example, when the number of sentence samples and corresponding labels in the training set is N, the training set is {(x_i, y_i)}_{i=1..N}, where x_i represents a sentence sample, y_i ∈ Y is the label of the i-th sentence sample, and Y is the space of all labels of the prediction task of the language model.
For example, a sentence sample may be "'Ding Feng Bo' is a ci poem composed by Su Shi.", and the label may be "Who is the author of 'Ding Feng Bo'? Su Shi", or "What kind of work is 'Ding Feng Bo'? A ci poem."
In step 104, the training set is traversed to perform sampling without replacement, and a prompt is generated based on each sentence sample obtained by sampling without replacement and the label corresponding to the sentence sample.
In some embodiments, traversing the training set to perform sampling without replacement may be implemented as follows: each time, one sentence sample and its corresponding label are read from the training set as the sample and label obtained by sampling without replacement, and the read sentence sample and label are then deleted from the training set, so that the same sample and its label never appear repeatedly in the prompt template. For example, a copy of sentence sample 1 and a copy of its label are read from the sample set the first time and taken as the sampling result, after which sentence sample 1 and its label are deleted from the training set. Generating a prompt based on the sentence sample obtained by each sampling without replacement and its corresponding label may be implemented as follows: a preset conversion template corresponding to the prediction task of the language model is acquired, the conversion template including at least one input position (slot) [X] and at least one output position [Z]; the sentence sample obtained by each sampling and its corresponding label are filled into the corresponding positions of the conversion template, i.e., the sentence sample is filled into the input position and the label is filled into the output position, and the filled conversion template is taken as a prompt that conforms to the input of the language model.
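A minimal sketch of sampling without replacement and prompt generation from a conversion template (the template format and helper names are illustrative assumptions) might look as follows.

```python
import random

def traverse_without_replacement(training_set):
    """Read each (sentence sample, label) pair once and delete it, so the same sample
    and label never appear twice in the prompt template."""
    pool = list(training_set)
    random.shuffle(pool)
    while pool:
        yield pool.pop()

def make_prompt(sentence_sample, label, conversion_template="[X] [Z]"):
    """Fill the sentence sample into the input slot [X] and its label into the output
    slot [Z] of a preset conversion template."""
    return conversion_template.replace("[X]", sentence_sample).replace("[Z]", label)

training_set = [("I love this movie.", "positive"),
                ("This plot is a mess.", "negative")]
for x, y in traverse_without_replacement(training_set):
    print(make_prompt(x, y))
```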
When the language model is used to handle a specific task, a sample text x is converted into the prompt form x′ through the function f_prompt(x), namely:
x′ = f_prompt(x) (3)
The function f_prompt(x) includes the following steps: first, a prompt template is constructed, usually in natural language, containing two slots: a position [X] for filling in x and a position [Z] for generating the answer text z; then, the input x is filled into the position [X]. For example, in a text emotion classification task, suppose the input is "I love this movie." and the template used is "[X] Overall, it was a [Z] movie."; the resulting x′ is then "I love this movie. Overall, it was a [Z] movie.", where the word representing the emotion predicted by the language model is filled into the position [Z].
For example, in a question-answering task, suppose the input sentence sample is "'Ding Feng Bo' is a ci poem composed by Su Shi." and the question is "Who is the author of 'Ding Feng Bo'?". The language model concatenates the original question "Who is the author of 'Ding Feng Bo'?" and the passage "'Ding Feng Bo' is a ci poem composed by Su Shi." into one input sequence, separated by the [SEP] symbol, which is then input into the BERT model for feature extraction. A feature vector sequence is extracted by BERT, and the feature vector of each element of the sequence output by the BERT model is classified by a classifier, thereby obtaining the predicted probability distribution of each element being the start position or the end position of the answer.
For example, in a cloze task, suppose the input sentence sample is "my dog is hairy"; 15% of the elements (words or characters) in the sentence sample are replaced as follows: with a probability of 80%, the element is replaced with a mask ([MASK]), e.g., "my dog is [MASK]"; with a probability of 10%, it is replaced with any other token, e.g., "my dog is apple"; with a probability of 10%, it is left unchanged, e.g., "my dog is hairy". The language model then predicts and restores the covered or replaced portions.
For example, in a translation task, suppose the sentence sample is "I missed the bus today." and the template used is "English: [X], French: [Z]"; the input to the model is then "English: I missed the bus today. French: [Z]", where "English:" and "French:" are the prompt words, and the language model fills the slot [Z] with the corresponding French sentence according to the input sentence sample.
For example, in a resume evaluation task, the i-th past resume text to be inserted is denoted x_i and the corresponding resume evaluation label is denoted y_i; the process of connecting the past resume sample and its corresponding resume evaluation label into a prompt can be expressed as t_i = T(x_i, y_i), where T denotes the process of connecting the past resume sample and its label into a prompt.
In step 105, the alert is merged with the first alert template into a second alert template.
In some embodiments, merging the prompt with the first prompt template into the second prompt template may be implemented as follows: the prompt is inserted into the first prompt template at one of the following positions: the start position, an intermediate position, or the end position; the first prompt template with the prompt inserted is taken as the second prompt template.
For example, in a text emotion classification task, suppose the prompt is "This movie made me feel relaxed." and the first prompt template is "I love this movie. Overall it was a [Z] movie.". The prompt may be inserted at the start position of the first prompt template to generate the second prompt template "This movie made me feel relaxed. I love this movie. Overall it was a [Z] movie.", or inserted at the end position to generate the second prompt template "I love this movie. This movie made me feel relaxed. Overall it was a [Z] movie."; when a plurality of sentence samples have already been inserted into the first prompt template, the prompt may also be inserted at an intermediate position of the first prompt template to generate the second prompt template.
In some embodiments, referring to fig. 8, fig. 8 is a schematic diagram of sample selection for an emotion classification task provided by an embodiment of the present application. When a sentence sample contains a clearly positive word or other similar description, the attitude of the sentence can be considered positive, and its emotion label in the training set is positive; when a sentence sample contains a clearly negative word or other similar description, the attitude of the sentence can be considered negative, and its emotion label in the training set is negative. When traversing the samples in the training set, a sentence sample and its corresponding emotion label are randomly selected from the training set, connected into a prompt, and inserted into the first prompt template (i.e., the prompt template in fig. 8), until all samples in the training set have been selected, or the samples selected from the training set can no longer reduce the bias index of the first prompt template.
For example, in a resume evaluation task, after the connected prompt t_i is obtained, the prompt t_i is inserted at the end position of the first prompt template to obtain the second prompt template t_q for resume evaluation, i.e., t_q = concat(t_1, …, t_m, t_i); or inserted at the start position of the first prompt template, i.e., t_q = concat(t_i, t_1, …, t_m); or inserted at an intermediate position of the first prompt template, i.e., t_q = concat(t_1, …, t_i, …, t_m), where concat denotes inserting the prompt t_i into the first prompt template.
In the embodiments of the present application, after the prompt is inserted at the start position, an intermediate position, or the end position of the first prompt template, the second prompt template is obtained and then input into the language model for prediction. Since the plurality of sentence samples inserted into the second prompt template are input into the model at the same time and the computation is performed over the whole second prompt template, the arrangement order of the sentence samples in the second prompt template does not affect the computation result, while making the insertion operation convenient.
The word representing the emotion predicted by the language model is filled in at [Z]. In general, the empty positions in the prompt template used for filling in answers are located within sentences or at the ends of sentences; the positions and numbers of [X] and [Z] may affect the results and can therefore be adjusted flexibly according to actual needs.
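A simplified sketch of merging a prompt into the first prompt template at the start, an intermediate, or the end position might look as follows; treating the template as a list of sentences is an assumption made for illustration.

```python
def merge(first_template, prompt, position="end"):
    """Insert the prompt into the first prompt template (a list of sentences) at the
    start, an intermediate, or the end position to form the second prompt template."""
    if position == "start":
        return [prompt] + first_template
    if position == "middle":
        mid = len(first_template) // 2
        return first_template[:mid] + [prompt] + first_template[mid:]
    return first_template + [prompt]  # default: end position

first_template = ["I love this movie.", "Overall it was a [Z] movie."]
prompt = "This movie made me feel relaxed. positive"
print(" ".join(merge(first_template, prompt, position="start")))
```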
In step 106, a second bias index of a second prompt template corresponding to each sentence sample is obtained.
In some embodiments, referring to fig. 4C, fig. 4C is a flowchart illustrating a method for processing a prompt in a language model according to an embodiment of the present application, and step 106 shown in fig. 4A may be implemented by executing steps 1061 to 1063 shown in fig. 4C on a second prompt template corresponding to each statement sample without a put-back sample, which is described in detail below.
In step 1061, the sentence in the second prompt template is replaced with semantics-free characters to obtain the replaced second prompt template.
In some embodiments, the second prompt template t_q includes at least one original sentence, which is a sentence in the training set. The original sentence in the second prompt template is replaced with the semantics-free character [N/A] to obtain the replaced second prompt template, thereby avoiding interference from the original sentence on the bias index of the prompt template after the sentence samples and their corresponding labels are inserted.
For example, in a question-answering task, one of the original sentences included in the second prompt template is "Su Shi was a famous writer of the Northern Song dynasty, a native of Meishan, Meizhou. Where was Su Shi from?". The original sentence in the second prompt template is replaced with semantics-free characters, i.e., every character of the above sentence is replaced with the semantics-free character [N/A], to obtain the replaced second prompt template.
In step 1062, the language model is invoked to determine the predictive probability distribution of the replaced second prompt template.
In some embodiments, referring to fig. 3B, fig. 3B is a schematic diagram of converting text information into vectors according to an embodiment of the present application, including the input (text information), a word embedding vector sequence, a sentence embedding vector sequence, and a position embedding vector sequence. First, for the two input sentences "my dog is cute" and "he likes playing", the sentences are tokenized, i.e., the text is converted into a number of elements (also called tokens); a special element [CLS] is inserted at the beginning of the first sentence to mark the start of the sequence, [SEP] is inserted at the end of the second sentence to mark the end position, and [SEP] is inserted between the two sentences to separate them. The word embedding vector layer of the language model then converts each element into a vector, for example the vector for "my" in fig. 3B is denoted E_my, yielding the word embedding vector sequence. The sentence embedding vector of each element of the first sentence is assigned 0 and that of each element of the second sentence is assigned 1, serial numbers used to identify the sentences; for example, every element of the first sentence "my dog is cute" in fig. 3B is denoted E_A, yielding the two sentence embedding vector sequences. The position embedding vector layer of the language model automatically generates the corresponding position embedding vector sequence according to the serial number of each element in the sentence; for example, "my" is at the second position of the sequence in fig. 3B, with serial number E_1. Finally, the word embedding vector, sentence embedding vector, and position embedding vector of each element are summed, so that the word identity, sentence identity, and position information corresponding to each element of the input text information are obtained. Then, according to the specific downstream prediction task, a classifier is called to output a predicted probability distribution; for example, in the prompt template task, the language model outputs the feature vector of the prompt template, and the corresponding classifier is then called to output the predicted probability distribution of the replaced second prompt template.
Referring to fig. 3A, fig. 3A is a schematic structural diagram of a language model provided in an embodiment of the present application. In addition to the above embedding layer, the language model includes an encoding structure, which may specifically include a multi-head attention layer, a residual-and-normalization module, a feed-forward network, and another residual-and-normalization module; a vector of size n×h (where n is the maximum sentence length and h is the hidden layer size) is input and, after processing by these internal modules, a vector of size n×h is output. Specifically, in the text modality, semantic encoding is performed on the embedded vector representation of the sequence by calling the pre-trained language model, and the output of the last hidden layer of the model (equivalent to a fully connected layer) is used as the feature vector sequence of the token sequence, which includes the feature vector of each token in the sequence.
In some embodiments, the replaced second prompt template is input into the language model to obtain a feature vector sequence, and a classifier is invoked to output a predictive probability distribution based on the feature vector sequence. The predictive probability distribution here may be the probability distribution over the prediction results of the various scenarios in the embodiments of the present application, for example, the prediction probability distribution over a plurality of grades in a resume evaluation scenario.
In step 1063, the distance between the predictive probability distribution and the uniform distribution is determined, and the distance is used as the second bias index of the second prompt template.
In some embodiments, the KL distance between the predictive probability distribution of the replaced second prompt template and the uniform distribution U is calculated as shown in formula (1), and the KL distance is used as the second bias index for measuring the second prompt template; alternatively, the calculation result of formula (2) may be used as the second bias index for measuring the second prompt template.
In the embodiment of the present application, the second prompt template combines the inserted sentence sample and its label with the original sentence, whose content has been replaced so that it carries no semantics, and the second prompt template is then input into the language model to obtain its prediction probability distribution. Ideally, because the original sentence (the test sample) lacks semantic information, the prediction probabilities output by the language model should be close to the uniform distribution; the deviation of the prediction from the uniform distribution is therefore used to measure the prediction bias, so that the bias index of the second prompt template can be measured intuitively and accurately.
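The following sketch shows one way such a bias index could be computed; the callable label_distribution, which queries the language model for its predictive probabilities over the task labels, is an assumed helper for illustration and is not an interface defined by the embodiment.

```python
import math

def kl_to_uniform(probs):
    """KL distance between a categorical distribution and the uniform distribution."""
    u = 1.0 / len(probs)
    return sum(p * math.log(p / u) for p in probs if p > 0)

def second_bias_index(masked_template, label_distribution):
    # masked_template: second prompt template whose original sentence (test sample)
    # has been replaced by the semantic-free character [N/A]
    # label_distribution: assumed callable returning the model's predictive probabilities
    return kl_to_uniform(label_distribution(masked_template))
```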
With continued reference to fig. 4A, in step 107, the second prompt template corresponding to the second bias index with the smallest value among all the obtained second bias indexes is used as the candidate optimization template.
In some embodiments, the second prompt template corresponding to the second bias index with the smallest value among all the obtained second bias indexes is used as the candidate optimization template. In this way, each traversal of the training set selects the most suitable sentence sample to insert into the first prompt template, producing the best second prompt template, i.e., the candidate optimization template, whose bias index is the smallest, so that the optimized prompt template is found step by step.
In some embodiments, while steps 104 to 107 are performed iteratively, the first prompt template may also be updated during the iteration based on the candidate optimization template, as follows: in response to the second bias index of the candidate optimization template being smaller than the first bias index, the candidate optimization template is taken as the new first prompt template; in response to the second bias index of the candidate optimization template being greater than or equal to the first bias index, the first prompt template is kept unchanged.
Each time the sentence samples in the training set have been traversed and a candidate optimization template has been found, the second bias index of the candidate optimization template is compared with the first bias index corresponding to the first prompt template. When the second bias index of the candidate optimization template is smaller than the first bias index corresponding to the first prompt template, the candidate optimization template is taken as the new first prompt template, i.e., it replaces the original first prompt template, so that the next round of iterative processing can insert prompts based on the updated first prompt template. When the second bias index of the candidate optimization template is greater than or equal to the first bias index corresponding to the first prompt template, the next iteration is performed directly, i.e., the training set is traversed again for sampling without replacement and the subsequent steps, thereby ensuring that the first bias index corresponding to the first prompt template in each round of iterative processing is the current minimum.
In step 108, in response to the second bias index with the minimum value obtained by the current iteration being greater than or equal to the first bias index, the candidate optimization template obtained by the current iteration is used as the prompt template after the language model is optimized.
In some embodiments, when the second bias index with the smallest value obtained after the current iteration is greater than or equal to the first bias index, the bias index of the current prompt template can no longer be reduced; the iterative processing is therefore stopped, and the candidate optimization template obtained in the current iteration is taken as the optimized prompt template of the language model, so that the bias index of the optimized prompt template is the smallest among all the prompt templates.
In some embodiments, in response to the training set having been completely traversed, the candidate optimization template obtained in the last iteration is used as the optimized prompt template of the language model. When the smallest second bias index obtained after each iteration is smaller than the first bias index, the bias index of the prompt template decreases after each iteration; once all sentence samples in the training set and their corresponding labels have been sampled, the iterative processing is stopped, and the candidate optimization template obtained in the last iteration is taken as the optimized prompt template of the language model, so that the bias index of the optimized prompt template is the smallest among all the prompt templates.
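For illustration, the iterative processing of steps 104 to 108 can be sketched as the greedy loop below, where make_prompt(sample, label) (building a prompt from a sentence sample and its label) and bias_index(template) (returning the bias index of a template, with the semantic-free replacement and model query assumed to happen inside) are assumed helpers; the newline used when merging templates and the removal of the chosen sample from the pool are likewise illustrative assumptions rather than the definitive implementation of the embodiment.

```python
def greedy_prompt_search(first_template, train_set, make_prompt, bias_index):
    current = first_template
    current_bias = bias_index(current)                    # first bias index of the first prompt template
    remaining = list(train_set)                           # pool for sampling without replacement
    while remaining:
        candidates = []
        for sample, label in remaining:                   # traverse the training set
            demo = make_prompt(sample, label)             # generate a prompt from the sample and label
            second = demo + "\n" + current                # merge into a second prompt template
            candidates.append((bias_index(second), second, (sample, label)))
        best_bias, best_template, best_example = min(candidates, key=lambda c: c[0])
        if best_bias >= current_bias:                     # bias can no longer be reduced: stop
            return best_template
        current, current_bias = best_template, best_bias  # update the first prompt template
        remaining.remove(best_example)                    # assumed: the chosen sample is not sampled again
    return current                                        # training set fully traversed
```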
Referring to fig. 9, fig. 9 is a schematic diagram showing the influence of the optimized prompt template provided by the embodiment of the present application on the performance of the language model. Panels (a), (b) and (c) in fig. 9 respectively show the influence of the prompt templates obtained for the same task on the performance of the language model when the Stanford Sentiment Treebank, a news article dataset and a recognizing-textual-entailment dataset are used as the training set. "Random Prompt" denotes the average accuracy of the prediction results obtained over all prompt templates, "Oracle Accuracy" denotes the upper-limit performance of the language model over different prompt templates, "T-fair-Prompting" denotes the upper-limit performance of the language model when using the prompt template obtained with the global search strategy, and "G-fair-Prompting" denotes the upper-limit performance of the language model when using the prompt template obtained with the method provided by the embodiment of the present application, where the upper-limit performance refers to the best values of evaluation indexes such as accuracy, precision and recall of the prediction results output by the language model when executing a specific downstream task. As can be seen from fig. 9, the upper-limit performance of the language model with the prompt template obtained by the method provided by the embodiment of the present application is very close to that with the prompt template obtained by the global search strategy, but the computational complexity of obtaining the prompt template with the method provided by the embodiment of the present application is far smaller than that of the global search strategy, thereby saving computing resources.
In the following, an exemplary application of the embodiment of the present application in a practical application scenario will be described.
When a large language model is applied to a resume evaluation task, referring to fig. 6, fig. 6 is a schematic diagram of a resume evaluation prompt template provided by an embodiment of the present application: a developer inserts a question after the text of a specific test resume, for example, "How do you evaluate this resume?", and the resume together with the inserted question is input into the large language model as a prompt; the large language model then outputs an evaluation of the test resume. Referring to fig. 7, fig. 7 is a schematic diagram of a resume evaluation prompt template under scenario learning provided by an embodiment of the present application; the scenario learning shown in fig. 7 inserts, before the prompt, multiple past resumes that have already been manually evaluated together with their corresponding resume evaluation information (e.g., scores), so that the model learns from the past resumes and their evaluations before evaluating the test resume, and thus produces a more accurate evaluation result.
By inputting the manually annotated data into the large language model, the large language model gains the capability of in-context learning, so that the label of the test sample (the score of the test resume) can be accurately predicted. However, in practical applications of this scheme, the training data or the manually annotated data may contain social bias, which causes the large language model to output prediction results that are unfair with respect to some attribute of the test sample. For example, in the resume evaluation example above, the past resumes may carry a gender bias, so the final output of the model will also contain a bias with respect to the gender of the test sample, e.g., a lower score for resumes associated with a particular gender. Therefore, in practical tasks, the construction of the above prompt template generally depends on multiple factors, such as the selection of the past resume samples and their arrangement order, and differences in these factors often lead to large differences in the prompts, thereby affecting the accuracy of the results output by the model. When constructing a prompt template, an exhaustive search would select multiple samples from the sample set to form the prompt template under scenario learning so that its bias index is the smallest; however, obtaining the optimal prompt template through exhaustive search incurs enormous computational complexity and consumes huge computing resources.
Referring to fig. 5, fig. 5 is a schematic flow chart of a method for processing a resume evaluation prompt template of a language model according to an embodiment of the present application, and the following details are described with reference to the steps shown in fig. 5.
In step 501, a first resume evaluation prompt template to be optimized is obtained.
Referring to fig. 6, fig. 6 is a schematic diagram of a resume evaluation prompt template provided by an embodiment of the present application, in which a developer inserts a question after the text of a specific test resume, for example, "How do you evaluate this resume?". The resume and the inserted question then need to be converted into a prompt that can be input into the large language model. The conversion process can be expressed as t_test = T(x_test), where x_test denotes the text of the given test resume, T denotes the text transformation from the test resume text to its corresponding prompt template, and t_test is the first resume evaluation prompt template obtained after the transformation.
In step 502, a first bias indicator of a first resume evaluation prompt template is obtained.
The test resume text in the first resume evaluation prompt template is replaced with the character [N/A], which carries no semantic information, to obtain the replaced first resume evaluation prompt template t̂_test. The replaced first resume evaluation prompt template is then input into the language model M to obtain its predictive probability distribution, i.e. p̂ = P(l | M, t̂_test), where p̂ is the distribution over the resume evaluation grades output by the language model, t̂_test is the replaced first prompt template, M denotes the language model, and l denotes the plurality of resume evaluation grades and their corresponding probabilities. The KL distance between the conditional probability distribution p̂ and the uniform distribution U is then calculated as D_KL(p̂ ‖ U) = Σ_l p̂(l) log( p̂(l) / U(l) ).
The calculation result is used as the first bias index of the first resume evaluation prompt template; the calculation result is positively correlated with the bias of the corresponding template.
The following steps 503 to 506 are iteratively performed:
in step 503, the past resumes in the sample set are traversed to perform sampling without replacement, and the resume sample obtained in each sampling is concatenated with its corresponding label to form a prompt.
Suppose a downstream developer has a sample set S containing n past resumes x and labels y, i.e., S = {(x_i, y_i)}, i = 1, …, n. Each time, one past resume sample and its corresponding label are selected from the sample set and then deleted from it; assuming the i-th past resume is selected, its text is denoted x_i and the corresponding resume evaluation is denoted as the label y_i. Concatenating the resume sample obtained in each sampling with its corresponding label to form a prompt can be implemented as follows: a preset conversion template corresponding to the resume evaluation task of the language model is obtained, the conversion template including at least one input position [X] and at least one output position [Z]; the past resume sample obtained in each sampling and its corresponding label are filled into the corresponding positions of the conversion template, i.e., the past resume sample is filled into the input position and the label into the output position, and the filled conversion template is used as a prompt satisfying the input requirements of the language model. The above steps can be expressed as t_i = T(x_i, y_i), where T denotes the conversion from the text of a past resume to the prompt of the prompt template, and t_i denotes the converted prompt of the i-th past resume that can be input into the large language model.
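A minimal sketch of this filling step is given below; the wording of the conversion template and the grade label are assumptions for illustration and are not prescribed by the embodiment.

```python
def make_prompt(resume_text, label, template="Resume: [X]\nEvaluation: [Z]"):
    """Fill the past resume into the input position [X] and its label into the output position [Z]."""
    return template.replace("[X]", resume_text).replace("[Z]", str(label))

# e.g. make_prompt("five years of backend development, led two teams", "A")
# -> "Resume: five years of backend development, led two teams\nEvaluation: A"
```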
In step 504, the prompt is inserted into the first resume evaluation prompt template to obtain a second resume evaluation prompt template.
Referring to fig. 7, fig. 7 is a schematic diagram of a resume evaluation prompt template under scenario learning provided by an embodiment of the present application. After a past resume has been text-converted and concatenated into a prompt, the prompt is inserted into the first resume evaluation prompt template to obtain the second resume evaluation prompt template. The prompt may be inserted at any position in the first resume evaluation prompt template; taking insertion of the prompt t_i at the starting position of the first resume evaluation prompt template as an example, t_p = concat(t_i, t_1, …, t_m, t_test), where concat denotes the splicing process and t_p is the second resume evaluation prompt template under scenario learning.
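The splicing t_p = concat(t_i, t_1, …, t_m, t_test) can be sketched as follows; the newline separator is an illustrative assumption.

```python
def insert_at_start(new_demo, existing_demos, test_prompt):
    """Place the new demonstration prompt ahead of the existing demonstrations and the test prompt."""
    return "\n".join([new_demo, *existing_demos, test_prompt])
```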
In step 505, a second bias index of a second resume evaluation prompt template corresponding to each past resume sample is obtained, and a second resume evaluation prompt template corresponding to the smallest second bias index is selected as a candidate optimization prompt template.
For the second resume evaluation prompt template corresponding to each past resume sample, a second bias index of the second resume evaluation prompt template is calculated, and the specific calculation method refers to step 502 and is not described herein.
And then selecting a second resume evaluation prompt template corresponding to the minimum second bias index as a candidate optimization prompt template, so as to gradually iterate to obtain the resume evaluation prompt template with the minimum bias index.
In step 506, the first resume evaluation prompt template is updated based on the candidate optimized prompt template.
Each time the past resume samples in the sample set have been traversed and a candidate optimized prompt template has been found, the second bias index corresponding to the candidate optimized prompt template is compared with the first bias index corresponding to the first resume evaluation prompt template. When the second bias index of the candidate optimized prompt template is smaller than the first bias index corresponding to the first resume evaluation prompt template, the candidate optimized prompt template is taken as the new first resume evaluation prompt template, so that the next round of iterative processing can perform the prompt-insertion operation based on the updated first resume evaluation prompt template. When the second bias index of the candidate optimized prompt template is greater than or equal to the first bias index corresponding to the first resume evaluation prompt template, the next iteration is performed directly, i.e., the sample set is traversed again for sampling without replacement and the subsequent steps, thereby ensuring that the first bias index corresponding to the first resume evaluation prompt template in each round of iterative processing is the current minimum.
In step 507, when the second bias index and the first bias index of the candidate optimized prompt template meet the iteration stop condition, the candidate optimized prompt template obtained in the last iteration is used as the resume evaluation prompt template after optimization.
The iteration stop condition includes: the smallest second bias index obtained after the current iteration is greater than or equal to the first bias index. When the bias index of the current resume evaluation prompt template can no longer be reduced, the iterative processing is stopped, and the candidate optimized prompt template obtained in the last iteration is taken as the optimized resume evaluation prompt template of the language model.
When all the past resume samples in the sample set have been traversed, the candidate optimized prompt template obtained after the last iteration is used as the optimized resume evaluation prompt template of the language model. That is, once all the past resume samples in the sample set and their corresponding labels have been sampled, the iterative processing is stopped, and the candidate optimized prompt template obtained in the last iteration is taken as the optimized resume evaluation prompt template of the language model, so that its bias index is the smallest among all the resume evaluation prompt templates.
According to the above scheme, the language model needs to be invoked multiple times while the training set is traversed, in order to calculate the second bias index corresponding to the second resume evaluation prompt template into which the current resume sample has been inserted, and the second bias index is compared with the first bias index to determine the resume evaluation prompt template with the smallest bias index found in that traversal. In this way, each traversal of the training set yields the prompt template with the smallest bias index for that traversal, i.e., a locally optimal solution; by traversing the training set multiple times, the candidate optimized template with the smallest bias index over the traversals is used as the optimized resume evaluation prompt template. The resume evaluation prompt template corresponding to the optimal bias index is thus obtained while avoiding an exhaustive search of the training set, which greatly reduces the computational complexity of a global search while achieving a search result close to the globally optimal solution.
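As a hedged end-to-end usage sketch, the helpers sketched above (make_prompt, kl_to_uniform, greedy_prompt_search) could be combined as follows for the resume evaluation scenario; the sample resumes, grade labels, masking rule and placeholder label distribution are all invented for illustration, and a real deployment would query the language model instead.

```python
TEST_MARKER = "<test resume text>"

def mask_test(prompt):
    # placeholder: replace the test resume text with the semantic-free character
    return prompt.replace(TEST_MARKER, "[N/A]")

def label_distribution(prompt):
    # placeholder distribution over grades A/B/C; a real system would call the language model
    return [0.5, 0.3, 0.2]

def bias_index_for(prompt_template):
    return kl_to_uniform(label_distribution(mask_test(prompt_template)))

sample_set = [
    ("three years of data engineering, led two migrations", "B"),
    ("ten years of backend work, team lead since 2019", "A"),
    ("recent graduate, internship in quality assurance", "C"),
]
first_template = TEST_MARKER + "\nHow do you evaluate this resume?"
optimized = greedy_prompt_search(first_template, sample_set, make_prompt, bias_index_for)
```

With the constant placeholder distribution the loop stops after its first traversal; with a real model the bias index varies across candidate templates and drives the greedy selection.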
Table 1: accuracy of different hint strategies
Table 2: comparison of precision after calibration
Table 1 shows the accuracy of different prompt strategies, i.e., the performance of the different strategies. The method provided by the embodiment of the present application is compared with the diversity-guided search strategy and the similarity-guided search strategy in the related art; experiments are carried out on different models, and the prompts obtained by each strategy are compared, where "random" denotes the average accuracy over an enumeration of all cases, and "diversity" and "similarity" denote selecting demonstrations by diversity and by similarity, respectively. It can be seen that although the performance of the global search strategy is better than random selection, the performance of the local search strategy provided by the embodiment of the present application is consistently better than that of the global search strategy.
Table 2 shows the comparison of accuracy after calibration. In most cases, the accuracy of the prompt found by the search can be further improved after calibration on the training set, so the data in Table 2 show the post-calibration performance of the prompt search method provided by the embodiment of the present application and of the random selection method, where "average" and "worst" denote the average accuracy and the worst performance over all arrangements of the training examples. It can be seen that in most cases the accuracy of the method provided by the embodiment of the present application is better than that of random selection; for example, when a grammar dataset is used, the method provided by the embodiment of the present application produces better results on different models.
Continuing with the description of an exemplary structure of the prompt processing device 433 of the language model implemented as software modules provided in an embodiment of the present application, in some embodiments, as shown in fig. 2, the software modules stored in the prompt processing device 433 of the language model in the memory 430 may include:
the first obtaining module 4331 is configured to obtain a first prompt template to be optimized of the language model.
The second obtaining module 4332 is configured to obtain a first bias indicator of the first prompt template.
In some embodiments, the second obtaining module 4332 is further configured to replace the sentence in the first prompt template with a semantic-free character to obtain a replaced first prompt template; invoking the language model to determine the predictive probability distribution of the replaced first prompt template; and determining the distance between the predictive probability distribution and the uniform distribution, and taking the distance as a first bias index of the first prompt template.
And a third obtaining module 4333, configured to obtain a training set of the language model, where the training set includes a plurality of sentence samples and labels corresponding to the sentence samples.
Update module 4334, configured to iteratively perform the following processes: traversing the training set to perform non-return sampling, and generating a prompt based on a sentence sample obtained by each non-return sampling and a label corresponding to the sentence sample; combining the prompt with the first prompt template to form a second prompt template; and acquiring second bias indexes of the second prompt templates corresponding to each sentence sample, and taking the second prompt templates corresponding to the second bias indexes with the smallest values in all acquired second bias indexes as candidate optimization templates.
In some embodiments, the updating module 4334 is further configured to perform the following processing for the second hint template corresponding to each of the sentence samples: replacing the sentence sample in the second prompt template with the semantic-free character to obtain a replaced second prompt template; invoking the language model to determine the predicted probability distribution of the replaced second prompt template; and determining the distance between the predictive probability distribution and the uniform distribution, and taking the distance as a second bias index of the second prompt template.
In some embodiments, the updating module 4334 is further configured to obtain a conversion template corresponding to the prediction task of the language model, wherein the conversion template comprises at least one input location and at least one output location; filling a sentence sample obtained by sampling each time and a label corresponding to the sentence sample into a corresponding position in the conversion template, and taking the filled conversion template as a prompt.
In some embodiments, the updating module 4334 is further configured to insert the cue into one of the following locations in the first cue template: a start position, an intermediate position, and an end position; and taking the first prompt template inserted with the prompt as a second prompt template.
In some embodiments, the updating module 4334 is further configured to update the first prompt template based on the candidate optimization template.
In some embodiments, the updating module 4334 is further configured to, in response to the second bias indicator for the candidate optimization template being less than the first bias indicator, treat the candidate optimization template as a new first prompt template; and in response to the second bias indicator of the candidate optimized template being greater than or equal to the first bias indicator, maintaining the first prompt template unchanged.
In some embodiments, the updating module 4334 is further configured to read, each time, one sentence sample and the label corresponding to the sentence sample from the training set as the sentence sample obtained by sampling without replacement and its corresponding label, and to delete the read sentence sample and its corresponding label from the training set.
The selecting module 4335 is configured to, in response to the second bias index with the smallest value obtained by the current iteration being greater than or equal to the first bias index, use the candidate optimization template obtained by the current iteration as the optimized prompt template of the language model, where the iteration stop condition includes: the smallest second bias index obtained after the current iteration is greater than or equal to the first bias index.
In some embodiments, the selecting module 4335 is further configured to use the candidate optimization template obtained in the last iteration as the prompt template after the language model optimization in response to completing the training set through traversal.
Embodiments of the present application provide a computer program product comprising a computer program or computer-executable instructions stored in a computer-readable storage medium. The processor of the electronic device reads the computer executable instructions from the computer readable storage medium, and the processor executes the computer executable instructions, so that the electronic device executes the prompt processing method of the language model according to the embodiment of the application.
The embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions or a computer program, which, when executed by a processor, cause the processor to perform the prompt processing method of the language model provided by the embodiment of the present application, for example, the prompt processing method of the language model shown in fig. 4A.
In some embodiments, the computer-readable storage medium may be RAM, ROM, flash memory, magnetic surface memory, an optical disc, or a CD-ROM; it may also be any device including one or any combination of the above memories.
In some embodiments, computer-executable instructions may be written in any form of programming language, including compiled or interpreted languages, or declarative or procedural languages, in the form of programs, software modules, scripts, or code, and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, computer-executable instructions may, but need not, correspond to files in a file system, may be stored as part of a file that holds other programs or data, such as in one or more scripts in a hypertext markup language (Hyper Text Markup Language, HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, computer-executable instructions may be deployed to be executed on one electronic device or on multiple electronic devices located at one site or, alternatively, on multiple electronic devices distributed across multiple sites and interconnected by a communication network.
In summary, in the embodiment of the present application, the first resume evaluation prompt template to be optimized of the language model is obtained and its first bias index is calculated; each resume sample in the training set is then traversed for sampling, and the resume sample obtained in each sampling is inserted into the first resume evaluation prompt template to obtain the second resume evaluation prompt template. In this way, each traversal of the training set invokes the language model to calculate the second bias index corresponding to the current second resume evaluation prompt template, and the second bias index is compared with the first bias index to determine the resume evaluation prompt template with the smallest bias index found in that traversal, so that each traversal of the training set yields a locally optimal solution. By traversing the training set multiple times, the candidate optimized prompt template with the smallest bias index is used as the optimized resume evaluation prompt template, so that the resume evaluation prompt template corresponding to the optimal bias index is obtained while avoiding an exhaustive search of the training set, which greatly reduces the computational complexity of a global search while achieving a search result close to the globally optimal solution.
The foregoing is merely exemplary embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (13)

1. A method for processing a prompt for a language model, the method comprising:
acquiring a first prompt template to be optimized of a language model, and acquiring a first bias index of the first prompt template;
acquiring a training set of the language model, wherein the training set comprises a plurality of sentence samples and labels corresponding to the sentence samples;
the following process is iteratively performed:
traversing the training set to perform non-return sampling, generating a prompt based on a sentence sample obtained by each non-return sampling and a label corresponding to the sentence sample, combining the prompt and the first prompt template into a second prompt template,
acquiring second bias indexes of the second prompt template corresponding to each sentence sample, and taking the second prompt template corresponding to the second bias index with the smallest value in all acquired second bias indexes as a candidate optimization template;
and in response to the second bias index with the minimum value obtained by the current iteration being greater than or equal to the first bias index, taking the candidate optimization template obtained by the current iteration as the prompt template after the language model is optimized.
2. The method of claim 1, wherein:
the prompt template comprises at least one sentence;
the obtaining the first bias index of the first prompt template includes:
replacing the sentences in the first prompt template with semantic-free characters to obtain a replaced first prompt template;
invoking the language model to determine the predictive probability distribution of the replaced first prompt template;
and determining the distance between the predictive probability distribution and the uniform distribution, and taking the distance as a first bias index of the first prompt template.
3. The method of claim 1, wherein the obtaining a second bias indicator for the second prompt template corresponding to each sentence sample comprises:
executing the following processing for the second prompt template corresponding to each sentence sample:
replacing the sentence sample in the second prompt template with the semantic-free character to obtain a replaced second prompt template;
Invoking the language model to determine the predicted probability distribution of the replaced second prompt template;
and determining the distance between the predictive probability distribution and the uniform distribution, and taking the distance as a second bias index of the second prompt template.
4. The method of claim 1, wherein generating the hint based on one sentence sample obtained from each no-put-back sample and the tag corresponding to the sentence sample comprises:
obtaining a conversion template corresponding to a prediction task of the language model, wherein the conversion template comprises at least one input position and at least one output position;
filling a sentence sample obtained by sampling each time and a label corresponding to the sentence sample into a corresponding position in the conversion template, and taking the filled conversion template as a prompt.
5. The method of claim 1, wherein the merging the prompt with the first prompt template into a second prompt template comprises:
inserting the prompt into the first prompt template at one of the following locations: a start position, an intermediate position, and an end position;
and taking the first prompt template inserted with the prompt as a second prompt template.
6. The method according to claim 1 or 5, characterized in that the method further comprises:
and responding to the completion of traversing the training set, and taking the candidate optimization template obtained in the last iteration as the prompt template after the language model optimization.
7. The method according to any one of claims 1 to 5, wherein after taking, as the candidate optimization template, the second prompt template corresponding to the second bias index having the smallest value among all the obtained second bias indexes, the method further comprises:
updating the first prompt template based on the candidate optimized template.
8. The method of claim 7, wherein:
the updating the first prompt template based on the candidate optimization template includes:
in response to the second bias index of the candidate optimization template being less than the first bias index, taking the candidate optimization template as the new first prompt template;
the method further comprises: in response to the second bias index of the candidate optimized template being greater than or equal to the first bias index, maintaining the first prompt template unchanged.
9. The method of any of claims 1 to 5, wherein said traversing the training set for non-return sampling comprises:
Reading a sentence sample and a label corresponding to the sentence sample from the training set each time to serve as the sentence sample obtained by sampling without replacement and the label corresponding to the sentence sample, and deleting the read sentence sample and the label corresponding to the sentence sample from the training set.
10. A prompt processing apparatus for a language model, the apparatus comprising:
the first acquisition module is used for acquiring a first prompt template to be optimized of the language model;
the second acquisition module is used for acquiring a first bias index of the first prompt template;
a third obtaining module, configured to obtain a training set of the language model, where the training set includes a plurality of sentence samples and labels corresponding to the sentence samples;
the updating module is used for iteratively executing the following processes: traversing the training set to perform non-return sampling, and generating a prompt based on a sentence sample obtained by each non-return sampling and a label corresponding to the sentence sample; combining the prompt with the first prompt template to form a second prompt template; acquiring second bias indexes of the second prompt template corresponding to each sentence sample, and taking the second prompt template corresponding to the second bias index with the smallest value in all acquired second bias indexes as a candidate optimization template;
And the selection module is used for responding to the fact that the second bias index with the minimum value obtained by the current iteration is larger than or equal to the first bias index, and taking the candidate optimization template obtained by the current iteration as the prompt template after the language model is optimized.
11. An electronic device, the electronic device comprising:
a memory for storing computer executable instructions or computer programs;
a processor for implementing the prompt processing method of the language model of any one of claims 1 to 9 when executing computer-executable instructions or computer programs stored in the memory.
12. A computer-readable storage medium storing computer-executable instructions or a computer program, wherein the computer-executable instructions or the computer program, when executed by a processor, implement the method for processing a prompt for a language model according to any one of claims 1 to 9.
13. A computer program product comprising computer executable instructions or a computer program which, when executed by a processor, implements the method for processing a prompt for a language model as claimed in any one of claims 1 to 9.
CN202310448055.8A 2023-04-17 2023-04-17 Prompt processing method, device, equipment and storage medium of language model Pending CN117217191A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310448055.8A CN117217191A (en) 2023-04-17 2023-04-17 Prompt processing method, device, equipment and storage medium of language model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310448055.8A CN117217191A (en) 2023-04-17 2023-04-17 Prompt processing method, device, equipment and storage medium of language model

Publications (1)

Publication Number Publication Date
CN117217191A true CN117217191A (en) 2023-12-12

Family

ID=89041314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310448055.8A Pending CN117217191A (en) 2023-04-17 2023-04-17 Prompt processing method, device, equipment and storage medium of language model

Country Status (1)

Country Link
CN (1) CN117217191A (en)

Similar Documents

Publication Publication Date Title
CN111967242A (en) Text information extraction method, device and equipment
WO2023137911A1 (en) Intention classification method and apparatus based on small-sample corpus, and computer device
CN116820429B (en) Training method and device of code processing model, electronic equipment and storage medium
CN114218379B (en) Attribution method for question answering incapacity of intelligent question answering system
CN117033667B (en) Knowledge graph construction method and device, storage medium and electronic equipment
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN115759119B (en) Financial text emotion analysis method, system, medium and equipment
CN110084323A (en) End-to-end semanteme resolution system and training method
CN115827819A (en) Intelligent question and answer processing method and device, electronic equipment and storage medium
CN112101042A (en) Text emotion recognition method and device, terminal device and storage medium
CN115858750A (en) Power grid technical standard intelligent question-answering method and system based on natural language processing
CN112988982B (en) Autonomous learning method and system for computer comparison space
CN117648429B (en) Question-answering method and system based on multi-mode self-adaptive search type enhanced large model
CN116882450B (en) Question-answering model editing method and device, electronic equipment and storage medium
CN117648093A (en) RPA flow automatic generation method based on large model and self-customized demand template
CN113705207A (en) Grammar error recognition method and device
CN111831624A (en) Data table creating method and device, computer equipment and storage medium
CN115906818A (en) Grammar knowledge prediction method, grammar knowledge prediction device, electronic equipment and storage medium
CN116483314A (en) Automatic intelligent activity diagram generation method
CN114625759A (en) Model training method, intelligent question answering method, device, medium, and program product
CN115115984A (en) Video data processing method, apparatus, program product, computer device, and medium
CN117217191A (en) Prompt processing method, device, equipment and storage medium of language model
CN114254622A (en) Intention identification method and device
CN111782781A (en) Semantic analysis method and device, computer equipment and storage medium
CN114444470B (en) Method, device, medium and equipment for recognizing domain named entities in patent text

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40099419

Country of ref document: HK