Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a method for constructing a code example library and a method for using the code example library, which solve the problems that existing recall strategies generalize poorly and cannot be interpreted.
The technical scheme for achieving the purpose is as follows:
the invention provides a method for constructing a code example library, which comprises the following steps:
collecting task data of all scenes;
inputting the collected task data of all scenes into a large model to obtain a corresponding operation sequence;
extracting the label from the operation sequence to obtain a first label list;
marking the API document based on the first tag list to obtain a marked API document;
manually labeling the collected task data of all scenes to obtain a basic example library;
performing tag extraction on each example data in the basic example library based on the large model to obtain a tag list of the example task;
code syntax analysis is carried out on each example data in the basic example library so as to obtain all called function names; searching corresponding labels for all called function names from the marked API document to obtain a label list of the example codes;
calculating, based on the labels in the label list of the example task and the labels in the label list of the example code, an accuracy of example code validity that is used to evaluate the quality of the basic example library.
The construction method of the code example library of the invention analyzes and summarizes the API document and the usage scenes of a closed domain to form a label system. It can improve the construction quality of the code example library to the greatest extent, ensure that code and task semantics are consistent, and, when only a small number of samples is available, improve code generation performance by improving the code quality of the recalled examples.
The construction method of the code example library of the invention is further improved in that the task data of all scenes is collected, and the method comprises the following steps:
collecting manually designed scenes, wherein a manually designed scene is obtained by manual design;
collecting supplementary scenes, wherein a supplementary scene is obtained by clustering the user problems in collected historical data of user problems;
and summarizing the task data of the manually designed scenes and the task data of the supplementary scenes to serve as the task data of all scenes.
The construction method of the code example library is further improved in that the label extraction of the operation sequence comprises the following steps:
and extracting the operation menu names and tab names in the operation sequence by using regular expressions or a named entity recognition (NER) method, as the labels extracted from the operation sequence.
The method for constructing the code example library of the present invention further comprises the steps of:
and screening the labels extracted from the operation sequence, and storing the screened labels in the first tag list.
The construction method of the code example library of the invention is further improved in that the marking of the API document based on the first tag list comprises the following steps:
and calculating the similarity between the descriptive text of the API document and the first tag list, and obtaining the top k tags as the tags of the API document by applying a threshold, wherein k is a positive integer.
A further improvement of the method for constructing the code example library of the present invention is that the API document includes functions and constants;
the marking the API document based on the first tag list comprises the following steps:
calculating the similarity between the descriptive text of each function and the first tag list, and obtaining the top k tags as the tags of the target function by applying a threshold, wherein k is a positive integer;
calculating the similarity between the descriptive text of each constant and the first tag list, and obtaining the top k tags as the tags of the target constant by applying a threshold, wherein k is a positive integer;
and marking the target function in the API document based on the first tag list.
The method for constructing the code example library is further improved in that the method for calculating the accuracy of the effectiveness of the example code for evaluating the quality of the basic example library based on the labels in the label list of the example tasks and the labels in the label list of the example code comprises the following steps:
taking the label in the label list of the example task as an actual value, and taking the label in the label list of the example code as a predicted value;
and calculating: $F_1 = \dfrac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$;
where $\text{recall} = \dfrac{TP}{TP + FN}$ and $\text{precision} = \dfrac{TP}{TP + FP}$;
F1 is the accuracy of the example code validity; TP is the number of true positives, where the predicted value and the actual value are both positive; FP is the number of false positives, where the predicted value is positive and the actual value is negative; FN is the number of false negatives, where the predicted value is negative and the actual value is positive; TN is the number of true negatives, where the predicted value and the actual value are both negative.
The method for constructing the code example library is further improved in that the method for calculating the accuracy of the effectiveness of the example code for evaluating the quality of the basic example library based on the labels in the label list of the example tasks and the labels in the label list of the example code comprises the following steps:
taking the label in the label list of the example task as an actual value, and taking the label in the label list of the example code as a predicted value;
and calculating: $F\text{-Score} = \dfrac{(1 + \beta^2) \times \text{precision} \times \text{recall}}{\beta^2 \times \text{precision} + \text{recall}}$;
where $\text{recall} = \dfrac{TP}{TP + FN}$ and $\text{precision} = \dfrac{TP}{TP + FP}$;
F-Score is the accuracy of the example code validity; TP is the number of true positives, where the predicted value and the actual value are both positive; FP is the number of false positives, where the predicted value is positive and the actual value is negative; FN is the number of false negatives, where the predicted value is negative and the actual value is positive; TN is the number of true negatives, where the predicted value and the actual value are both negative; and β is a set value greater than 1.
The invention also provides a use method of the code example library, the code example library is constructed based on the construction method of the code example library, and the use method comprises the following steps:
receiving a task input by a user as an original task, and extracting task labels of the original task to obtain a label list of the original task;
calculating the similarity between the labels in the label list of the original task and the labels in the label list of the example task of each example data in the basic example library, and taking the top h examples as candidate examples for example recall, wherein h is a positive integer;
designing a prompt framework, the prompt framework comprising: a system prompt that describes the target task, and the recalled candidate examples;
and calling an LLM with code generation capability to generate the code of the target task based on the prompt framework.
Detailed Description
The invention will be further described with reference to the drawings and the specific examples.
Referring to fig. 1, the invention provides a method for constructing a code example library and a method for using the code example library, which are used for analyzing and summarizing an API document and a usage scenario of a closed domain to form a label system, aligning examples and labels by using a large model based on the label system, and quantitatively evaluating the quality of the code example library. The scheme of the invention comprises three parts of label system construction, example library construction and example library use, which can improve the construction quality of the code example library to the greatest extent, ensure the consistency of code and task semantics, and improve the code generation performance by improving the quality of recall example codes under the condition of a small number of samples. The method for constructing the code example library of the present invention and the method for using the code example library will be described below with reference to the accompanying drawings.
Referring to FIG. 1, a flow chart of a method of constructing a code example library is shown. The construction method of the code example library of the present invention will be described below with reference to fig. 1.
As shown in fig. 1, a method for constructing a code example library of the present invention includes the following steps:
executing step S101, collecting task data of all scenes; step S102 is then executed;
step S102 is executed, task data of all the collected scenes are input into a large model, and a corresponding operation sequence is obtained; step S103 is then performed;
step S103 is executed, wherein the operation sequence is subjected to label extraction to obtain a first label list; step S104 is then executed;
executing step S104, marking the API document based on the first label list to obtain a marked API document; step S105 is then performed;
step S105 is executed, wherein the collected task data of all scenes are manually marked to obtain a basic example library; step S106 is then executed;
executing step S106, extracting labels from each example data in the basic example library based on the large model to obtain a label list of the example task; step S107 is then performed;
step S107 is executed, wherein code syntax analysis is carried out on each example data in the basic example library so as to obtain all called function names; searching corresponding labels for all called function names from the marked API document to obtain a label list of the example codes; step S108 is then executed;
step S108 is performed to calculate the accuracy of the example code validity for evaluating the quality of the base example library based on the labels in the label list of the example task and the labels in the label list of the example code.
Steps S101 to S104 of the present invention construct the label system, and steps S105 to S108 construct the example library. Specifically, constructing the label system means taking the API document and the scenes of a specific project, extracting a label list, and marking the API functions to obtain marked API functions. The API document matches the specific project and can be regarded as a service agreement between two parties: it outlines how the second party and its software respond when the first party sends a request of a certain type. The API document is divided into two parts, functions and constants. Constructing the example library means writing code examples on the basis of the constructed label system and evaluating the written code examples to ensure their quality.
In one embodiment of the present invention, collecting task data for all scenes includes the steps of: collecting manually designed scenes, wherein a manually designed scene is obtained by manual design; collecting supplementary scenes, wherein a supplementary scene is obtained by clustering the user problems in collected historical data of user problems; and summarizing the task data of the manually designed scenes and the task data of the supplementary scenes to serve as the task data of all scenes.
The collected task data of all scenes comprises manually designed common scenes and supplementary scene data discovered by clustering the historical data of user problems. The scene data matches the specific project, that is, it is used under the specific project to which the code examples to be constructed relate. One project has a plurality of scenes, and each scene has a plurality of task data. Manually designed common scenes may include table region operations, statistics operations, and the like, and a small number of example tasks are written for each designed common scene. The user problems in the collected historical data are tasks that actual users want to accomplish. Clustering may use the k-means method, checking whether the categories returned for the set value of K are appropriate. Clustering may also use a density-based clustering algorithm, trying different hyperparameters to determine whether the returned categories are appropriate. The clustering method is not limited to k-means and density-based algorithms; other clustering methods may also be used. The task data of all scenes is obtained by adding the supplementary scene data to the manually designed scene data.
A prompt for the large model is constructed from the collected scene data: the batch of user problems in the collected scene data is used as the input of the large model (foundation model), and the large model outputs the operation sequence related to each task. For example, in a project related to office software, the large model can give the specific operation steps for accomplishing a task in the software or tool, which yields the operation sequence of the task.
Further, the tag extraction of the operation sequence includes the steps of: extracting the operation menu names and tab names in the operation sequence by using regular expressions or a named entity recognition (NER) method, as the labels extracted from the operation sequence. For example, from < open "screen" menu, select "top ten" >, labels such as "screen" and "top ten" can be obtained.
Still further, the step of obtaining a first tag list includes: screening the labels extracted from the operation sequence, and storing the screened labels in the first tag list. The manual screening mainly consists of deduplication and of checking whether tag names contain errors, such as a tag that is implausible in the project scene (for example "ten items"). Erroneous labels are modified or deleted, which finally forms the final label system, namely the label list related to the current project.
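As an illustration of the regular-expression variant of this step, the following is a minimal sketch. It assumes that menu and tab names appear inside double quotes in the operation sequence, as in the example above; the pattern `QUOTED_NAME` and the function `extract_tags` are hypothetical names introduced here and are not part of the described method.

```python
import re

# Assumption: menu and tab names are the substrings enclosed in double quotes.
QUOTED_NAME = re.compile(r'"([^"]+)"')

def extract_tags(operation_sequence: str) -> list[str]:
    """Return quoted menu/tab names in order of first appearance, without duplicates."""
    tags: list[str] = []
    for name in QUOTED_NAME.findall(operation_sequence):
        name = name.strip()
        if name and name not in tags:
            tags.append(name)
    return tags

step = '<open "screen" menu, select "top ten">'
print(extract_tags(step))  # ['screen', 'top ten']
```

An NER-based extractor could replace the pattern when names are not reliably quoted; the deduplication shown here does not replace the manual screening of the extracted labels.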
Still further, marking the API document based on the first tag list includes the steps of:
and calculating the similarity between the descriptive text of the API document and the first tag list, and obtaining the top k tags as the tags of the API document by applying a threshold, wherein k is a positive integer. k is typically given a value of 1.
Still further, the API document includes functions and constants;
marking the API document based on the first tag list comprises the following steps:
calculating the similarity between the descriptive text of each function and the first tag list, and obtaining the top k tags as the tags of the target function by applying a threshold, wherein k is a positive integer; k is typically given a value of 1.
Calculating the similarity between the descriptive text of each constant and the first tag list, and obtaining the top k tags as the tags of the target constant by applying a threshold, wherein k is a positive integer; k is typically given a value of 1.
Marking the target function in the API document based on the first tag list.
The API document is aligned with the label system, that is, each API function is labeled with specific labels from the established first tag list. The API document is divided into two parts, functions and constants. First, the similarity between the descriptive text of each function and the first tag list is calculated (for example with the Dice coefficient, although the similarity measure is not limited to it), and the top k tags (k is typically 1) that pass the threshold are taken as the tags of the target function. Next, the similarity between the descriptive text of each constant and the first tag list is calculated in the same way, and the top k tags (k is typically 1) that pass the threshold are taken as the tags of the target constant. The related target functions are obtained by searching the API document and are marked with the corresponding labels, which yields the marked API functions based on the first tag list, denoted TAPI.
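The tagging step can be sketched as follows. This is only one possible reading: it assumes the Dice coefficient is computed over character bigrams, that the threshold is applied first and the top k survivors are kept, and that 0.2 is a reasonable threshold. The function names and the sample description are hypothetical.

```python
def char_bigrams(text: str) -> set[str]:
    """Character bigrams of the text, lowercased and with whitespace removed."""
    s = "".join(text.lower().split())
    if len(s) < 2:
        return {s} if s else set()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def dice(a: str, b: str) -> float:
    """Dice coefficient 2|X ∩ Y| / (|X| + |Y|) over character bigram sets."""
    x, y = char_bigrams(a), char_bigrams(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))

def top_k_tags(description: str, tag_list: list[str],
               k: int = 1, threshold: float = 0.2) -> list[str]:
    """Keep tags whose similarity reaches the threshold, then return the best k."""
    scored = [(dice(description, tag), tag) for tag in tag_list]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [tag for _, tag in kept[:k]]

# Hypothetical API description and first tag list.
description = "Filter a range and keep the top ten items"
first_tag_list = ["screen", "top ten", "sort"]
print(top_k_tags(description, first_tag_list, k=1))  # ['top ten']
```

Because a short tag is compared with a much longer description, the absolute scores stay low, so the threshold would need tuning per project; an embedding-based similarity is an alternative the text leaves open.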
In a specific embodiment of the invention, the example library is constructed as follows. First, the collected task data of all scenes, which comprises the established scenes and the collected tasks, is manually labeled; here, manual labeling means writing the task code, and the result is the basic example library. The basic example library comprises a plurality of pieces of example data. Each piece of example data is denoted S and comprises the following contents: 1. an example task, denoted S.TASK, for example: select the top ten items in column A; 2. example code, denoted S.CODE, which is the piece of code that needs to be executed to accomplish the example task.
For each S.TASK in the basic example library, label extraction is performed with a large-model-based method, and the result is denoted S.TASK_TAGS.
And carrying out code syntax analysis on each S.CODE in the basic example library to obtain all called function names, inquiring TAPI for the name of each function call to obtain a label list corresponding to the whole code, and marking the label list as S.CODE_TAGS.
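A minimal sketch of this lookup is given below. It assumes, purely for illustration, that the example code is written in Python so that the standard `ast` module can parse it; for another language a parser for that language would be needed. The API names in `TAPI` and in the sample code are hypothetical.

```python
import ast

def called_function_names(code: str) -> list[str]:
    """Collect the names of all functions and methods called in the code."""
    names: list[str] = []
    for node in ast.walk(ast.parse(code)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        if isinstance(func, ast.Name):         # plain call: f(...)
            name = func.id
        elif isinstance(func, ast.Attribute):  # method call: obj.f(...)
            name = func.attr
        else:
            continue
        if name not in names:
            names.append(name)
    return names

def code_tags(code: str, tapi: dict[str, list[str]]) -> list[str]:
    """Look up every called name in the marked API (TAPI) and merge the tags."""
    tags: set[str] = set()
    for name in called_function_names(code):
        tags.update(tapi.get(name, []))  # names without tags contribute nothing
    return sorted(tags)

# Hypothetical marked API and example code.
TAPI = {"apply_top_filter": ["screen", "top ten"], "sort_range": ["sort"]}
S_CODE = 'rng = sheet.get_range("A:A")\nrng.apply_top_filter(10)\n'
print(code_tags(S_CODE, TAPI))  # ['screen', 'top ten']
```

The code is only parsed, never executed, so undefined names such as `sheet` do not matter for the analysis.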
Further, calculating the accuracy of the example code validity, which is used to evaluate the quality of the basic example library, based on the labels in the label list of the example task and the labels in the label list of the example code, comprises the steps of:
taking the label in the label list of the example task as an actual value, wherein the actual value is the label contained in the S.TASK_TAGS, and taking the label in the label list of the example code as a predicted value, and the predicted value is the label contained in the S.CODE_TAGS;
and calculating: $F_1 = \dfrac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$;
where $\text{recall} = \dfrac{TP}{TP + FN}$ and $\text{precision} = \dfrac{TP}{TP + FP}$;
F1 is the accuracy of the example code validity; TP (True Positives) is the number of true positives, where the predicted value and the actual value are both positive; FP (False Positives) is the number of false positives, where the predicted value is positive and the actual value is negative; FN (False Negatives) is the number of false negatives, where the predicted value is negative and the actual value is positive; TN (True Negatives) is the number of true negatives, where the predicted value and the actual value are both negative. Recall is the proportion of true positives among all actual positives, and precision is the proportion of true positives among all predicted positives.
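The calculation above can be sketched for a single example as follows, treating S.TASK_TAGS as the actual values and S.CODE_TAGS as the predicted values. The function name and the convention of returning 0.0 when a denominator is zero are assumptions of this sketch.

```python
def precision_recall_f1(task_tags: list[str],
                        code_tags: list[str]) -> tuple[float, float, float]:
    """Compare the tags of the code (predicted) with the tags of the task (actual)."""
    actual, predicted = set(task_tags), set(code_tags)
    tp = len(actual & predicted)   # tags present in both lists
    fp = len(predicted - actual)   # tags only in the code
    fn = len(actual - predicted)   # tags only in the task
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# The task asks for "screen" and "top ten"; the code only covers "screen" and adds "sort".
print(precision_recall_f1(["screen", "top ten"], ["screen", "sort"]))  # (0.5, 0.5, 0.5)
```

TN does not appear in any of the three quantities, which suits tag comparison, where the set of tags absent from both lists is not meaningful.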
In another preferred embodiment, calculating the accuracy of the example code validity, which is used to evaluate the quality of the basic example library, based on the labels in the label list of the example task and the labels in the label list of the example code, comprises the steps of:
taking the label in the label list of the example task as an actual value, wherein the actual value is the label contained in the S.TASK_TAGS, and taking the label in the label list of the example code as a predicted value, and the predicted value is the label contained in the S.CODE_TAGS;
and calculating: $F\text{-Score} = \dfrac{(1 + \beta^2) \times \text{precision} \times \text{recall}}{\beta^2 \times \text{precision} + \text{recall}}$;
where $\text{recall} = \dfrac{TP}{TP + FN}$ and $\text{precision} = \dfrac{TP}{TP + FP}$;
F-Score is the accuracy of the example code validity; TP (True Positives) is the number of true positives, where the predicted value and the actual value are both positive; FP (False Positives) is the number of false positives, where the predicted value is positive and the actual value is negative; FN (False Negatives) is the number of false negatives, where the predicted value is negative and the actual value is positive; TN (True Negatives) is the number of true negatives, where the predicted value and the actual value are both negative; and β is a set value greater than 1.
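The effect of β can be seen in a short sketch. With β greater than 1 the score weights recall more heavily than precision, so an example whose code covers every task tag but calls some extra functions is penalized less than under F1. The function name and the sample values are hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-Score = (1 + beta^2) * P * R / (beta^2 * P + R); 0.0 if the denominator is zero."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator

# Code that covers all task tags (recall 1.0) but half of its tags are extra (precision 0.5).
p, r = 0.5, 1.0
print(round(f_beta(p, r, beta=1.0), 4))  # 0.6667, identical to F1
print(round(f_beta(p, r, beta=2.0), 4))  # 0.8333, recall counts more
```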
The quality of the constructed example library is evaluated by the calculated accuracy of the example code validity: the larger this value, the better the quality of the constructed example library.
Based on the accuracy of the example code validity, manual iterations may optionally be performed, for example by adjusting the example code, after which the accuracy is calculated again; this loop repeats until the established basic example library meets the quality requirements. Specifically, an evaluation limit value may be set. If the accuracy of the example code validity is higher than the evaluation limit value, the quality of the constructed example library is good and the example library can be put to use. If the accuracy is lower than the evaluation limit value, the quality of the constructed example library is poor and the library must be rebuilt, that is, the task code is rewritten until the accuracy of the example code validity is higher than the evaluation limit value.
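The check against the evaluation limit value might look like the sketch below. The text does not state how the scores of individual examples are combined into one value for the library, so this sketch assumes micro-averaging, that is, summing TP, FP and FN over all examples before computing F1. The function names and the limit of 0.8 are likewise assumptions.

```python
def library_f1(examples: list[tuple[list[str], list[str]]]) -> float:
    """Micro-averaged F1 over (task_tags, code_tags) pairs of the whole library."""
    tp = fp = fn = 0
    for task_tags, code_tags in examples:
        actual, predicted = set(task_tags), set(code_tags)
        tp += len(actual & predicted)
        fp += len(predicted - actual)
        fn += len(actual - predicted)
    denominator = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
    return 2 * tp / denominator if denominator else 0.0

def passes_quality_gate(examples: list[tuple[list[str], list[str]]],
                        limit: float) -> bool:
    """True when the library-level score is higher than the evaluation limit value."""
    return library_f1(examples) > limit

library = [
    (["screen", "top ten"], ["screen", "top ten"]),  # code matches the task
    (["sort"], ["screen"]),                          # code does not match the task
]
print(passes_quality_gate(library, limit=0.8))  # False: the second example needs rework

library[1] = (["sort"], ["sort"])                # after the task code is rewritten
print(passes_quality_gate(library, limit=0.8))  # True
```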
The invention also provides a using method of the code example library, and the using method is described below.
The code example library in the use method of the code example library of the invention is constructed based on the construction method of the code example library, and the use method comprises the following steps:
receiving a task input by a user as an original task, and extracting task labels of the original task to obtain a label list of the original task;
calculating the similarity between the labels in the label list of the original task and the labels in the label list of the example task of each example data in the basic example library, and taking the top h examples as candidate examples for example recall, wherein h is a positive integer;
designing a prompt framework, the prompt framework comprising: a system prompt that describes the target task, and the recalled candidate examples;
based on the prompt framework, calling an LLM with code generation capability to generate the code of the target task.
Using the example library, examples related to the user problem are recalled with a simple label matching algorithm, and controllable code generation is achieved by letting the LLM learn from these examples in a few-shot manner.
Code generation mainly comprises the following modules: task understanding, example recall, and task code generation. The task understanding module takes the original task input by the user and uses an LLM with few-shot prompting to extract the task labels, denoted INPUT.TASK_TAGS. The example recall module calculates character-level text similarity scores between INPUT.TASK_TAGS and S.TASK_TAGS (the similarity algorithm may be, for example, Dice or BM25, without limitation) and takes the top h examples as candidate examples. The task code generation module combines the recalled candidate examples and uses the LLM for conditional code generation. This requires designing a prompt framework, which mainly comprises a system prompt and related examples. System prompt: describes the target task. Related examples: the recalled examples are prepended to the prompt in a fixed data structure, namely the descriptions and implementation code of similar tasks or subtasks related to the task currently input by the user, which helps the LLM learn the code writing conventions. Based on the prompt framework, an LLM with code generation capability (such as an OpenAI chat model) is called to continue the text, so that the generated code content is obtained quickly.
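The example recall and prompt assembly can be sketched as follows. For brevity this sketch scores similarity with the Dice coefficient over whole tags rather than over characters, which is a simplification of the character-level scoring described above; the `Example` class, the function names, the prompt wording and the sample library are hypothetical. The LLM call itself is omitted because it depends on the chosen provider.

```python
from dataclasses import dataclass

@dataclass
class Example:
    task: str             # S.TASK
    code: str             # S.CODE
    task_tags: list[str]  # S.TASK_TAGS

def tag_similarity(a: list[str], b: list[str]) -> float:
    """Dice coefficient over two tag sets."""
    x, y = set(a), set(b)
    if not x or not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))

def recall_examples(input_tags: list[str], library: list[Example], h: int) -> list[Example]:
    """Return the h examples whose task tags are most similar to the input tags."""
    ranked = sorted(library,
                    key=lambda ex: tag_similarity(input_tags, ex.task_tags),
                    reverse=True)
    return ranked[:h]

def build_prompt(task: str, examples: list[Example]) -> str:
    """System prompt first, then the recalled examples, then the target task."""
    parts = ["Write code for the target task. Follow the conventions of the examples.", ""]
    for i, ex in enumerate(examples, start=1):
        parts += [f"Example {i} task: {ex.task}", f"Example {i} code:", ex.code, ""]
    parts += [f"Target task: {task}", "Code:"]
    return "\n".join(parts)

library = [
    Example("select the top ten of column A", "filter_top(col='A', n=10)", ["screen", "top ten"]),
    Example("sort column B ascending", "sort_range(col='B')", ["sort"]),
    Example("filter column C", "filter_range(col='C')", ["screen"]),
]
candidates = recall_examples(["screen", "top ten"], library, h=2)
print(build_prompt("select the top ten of column D", candidates))
```

The resulting string would be passed to the chosen LLM, which continues the text after the final "Code:" line.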
The invention can improve the construction quality of the code example library to the greatest extent, ensure the consistency of the code and task semantics, and improve the performance of generating the code by improving the quality of the recall example code under the condition of a small number of samples.
The present invention has been described in detail with reference to the embodiments of the drawings, and those skilled in the art can make various modifications to the invention based on the above description. Accordingly, certain details of the illustrated embodiments are not to be taken as limiting the invention, which is defined by the appended claims.