CN116301893A - Lightweight code generation method based on prompt learning - Google Patents

Lightweight code generation method based on prompt learning

Info

Publication number
CN116301893A
CN116301893A
Authority
CN
China
Prior art keywords
model
prompt
training
code
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310237856.XA
Other languages
Chinese (zh)
Inventor
周宇
徐一然
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202310237856.XA priority Critical patent/CN116301893A/en
Publication of CN116301893A publication Critical patent/CN116301893A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • G06F8/42Syntactic analysis
    • G06F8/425Lexical analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a lightweight code generation method based on prompt learning, which comprises the following steps: retrieving, for a given natural language query, the 'natural language-code' pairs in the corpus that are most similar to it; reordering the retrieved results and using the highest-scoring result as prompt information; splicing the retrieved result with the original natural language input and training with the pre-trained language model CodeT5-base; applying a lightweight scheme during training, in which most of the model parameters are frozen and do not participate in fine-tuning; and finally testing the model to achieve the goal of code generation. From the perspectives of prompt learning and lightweight pre-trained language models, the invention studies the effectiveness of template prompts on the model's generation results and the feasibility of the lightweight scheme, and combines them with deep learning so as to generate code in the target programming language from natural language.

Description

Lightweight code generation method based on prompt learning
Technical Field
The invention belongs to the technical field of automatic code generation in intelligent software engineering, and in particular relates to a code generation method that integrates prompt learning and lightweight deep learning.
Background
With the development of modern society, computer software plays an indispensable role in human life. Throughout the evolution of software engineering, researchers have continually pursued automatic program generation, because coding is a vital link in the software life cycle: it is the direct realization of the earlier requirements and design, and it is also a difficult and tedious task. Automatic code generation can improve programmers' productivity and reduce repetitive labor. How to make a computer automatically generate the code that programmers need is therefore one of the current research topics.
Academic research on code generation can be traced back to early formal logical deduction methods. As open-source code has continued to grow, many high-quality open-source projects now provide rich program information such as code comments, source code, and test cases, so researchers turned to machine learning methods that exploit the statistical properties of these data. However, machine learning depends on feature extraction, and manually constructed feature engineering is often unreliable and fails to reflect the true characteristics of the data; for highly abstract code data, hand-crafted features are clearly inadequate. Research has since shown that deep learning performs well on automatic code generation and has become the mainstream approach.
Prompt learning has likewise attracted extensive research in the field of natural language processing (NLP). Its main idea is to guide the model through templates constructed in a fixed format. As a form of knowledge enhancement, such prompts can improve the quality of model generation, and the effect is especially pronounced when the pre-trained language model faces only a small amount of data or must generate a programming language that was not covered in its pre-training phase. Because deep learning depends on high-quality data sets, prompt learning can select suitable prompt information from limited data resources to guide the model's generation.
For lightweight tuning of model parameters, the main approach is to freeze most of the parameters of the pre-trained language model so that they do not participate in the subsequent fine-tuning updates. As data scales keep growing, ever larger models are used for code generation research, which raises the entry barrier for researchers, since such models usually require substantial computing resources to train. Freezing most parameters can therefore effectively reduce the computational overhead of fine-tuning without noticeably harming model performance.
Disclosure of Invention
In view of the above-described drawbacks of the related art, an object of the present invention is to provide a lightweight code generation method based on prompt learning.
In order to achieve the above purpose, the invention adopts the following technical scheme:
according to the lightweight code generation method based on prompt learning, based on the pre-training language model with the CodeT5-base as a main body, the performance of the model can be effectively guaranteed and even improved on the premise of reducing calculation cost by using two technical means of retrieval knowledge prompt and lightweight parameter fine adjustment. The method comprises the following steps:
(1) Searching the corpus according to the natural language to be input into the model, and returning the natural language-code pairs most similar to the current natural language under a given retrieval scheme;
(2) Reordering the returned results according to a given rule, and recording the highest-scoring result after reordering for later use as an input prompt;
(3) Splicing the retrieved result and the natural language to be input into the model according to a given template, and training with a deep learning Transformer model (specifically the pre-trained model CodeT5-base); during training, the input sequence is fed to the encoder, which encodes the text and extracts its features, capturing the relations between tokens through its multi-head attention mechanism;
(4) When training the decoder, combining the hidden states output by the multi-layer encoder with the ground truth (i.e., the code the model is expected to output) in a teacher-forcing procedure; the masked self-attention mechanism in the decoder produces the probability output of the final prediction, the cross-entropy loss function computes the training loss, and the subsequent model parameter update is carried out;
(5) Applying a lightweight code generation scheme simultaneously during training; specifically, most parameters of the pre-trained language model are frozen and do not participate in fine-tuning, and the fine-tuned model is finally tested.
In order to optimize the technical scheme, the specific measures adopted further comprise:
the above-mentioned search operation in step (1) does not adopt conventional information search modes such as BM25 and TFIDF, because these search modes only consider word frequency and repetition degree and fail to solve the problem of semantic relevance of terms, and meanwhile, some pre-training language models based on deep learning exist, such as BERT, when information search is performed, that is, semantic similarity matching, there is an anisotropic (anisotropic) problem between the tokens, so when final cosine similarity is performed, the result obtained by the model cannot truly reflect the similarity, that is, there is a deviation, so that the most direct cause of the problem arises is that the calculation of cosine similarity is established on the premise of standard orthogonality, but the sentence vectors provided by the pre-training language model are not necessarily on the basis of the standard orthogonality, so there is a problem.
To solve this problem, the sentence vectors produced by the pre-trained language model are transformed with a whitening method so that they become isotropic. The whitening transform is given by the following formula:
$\tilde{x}_i = (x_i - \mu)W$
where x_i is a sentence vector generated by the language model, μ is the mean of the set of sentence vectors, and W is the whitening transformation matrix that maps the covariance matrix of the sentence vectors to the identity matrix.
After the transformation, the new sentence vectors satisfy the orthonormal-basis assumption, and the semantic similarity between any two of them can then be computed with the cosine similarity formula:
$\cos(x, y) = \frac{\sum_{i=1}^{d} x_i y_i}{\sqrt{\sum_{i=1}^{d} x_i^2}\,\sqrt{\sum_{i=1}^{d} y_i^2}}$
where x and y are two sentence vectors and x_i, y_i are their components.
When the retrieval results are reordered in step (2), the top-K similar results retrieved after whitening (with K = 5) are re-ranked, and the single most similar result is kept. Jaccard similarity is used as the ranking criterion, calculated as follows:
$J(S_1, S_2) = \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|}$
where S1 and S2 are the token sets of the current query statement and the candidate statement to be ranked, respectively; the value reflects the similarity of the two sets.
The code generator used in step (3) is a pre-trained language model with the Encoder-Decoder CodeT5-base as its backbone; this architecture is naturally suited to code generation and is widely recognized as one of the models best suited to the task. The model input requires preprocessing in advance: before the similar natural language-code pair obtained after retrieval and reordering in step (2) is spliced to the original input as prompt information, the input is reconstructed according to the following template:
‘Generate<LAN>:<NL`>,Code:<PL`>;Generate<LAN>:<NL>,Code:’
where <LAN> is the type of language to be generated, and <NL`> and <PL`> are, respectively, the natural language requirement most similar to the current input <NL> found in the retrieval phase and its corresponding code implementation.
In step (3), the multi-head attention mechanism of the standard Transformer is used when fine-tuning the generation model. In the encoder stage, the input text is converted into vectors by the embedding layer, and positional encoding attaches position information to each word vector so that token positions are available when the self-attention values are computed. Computing self-attention extracts the features of the text sequence and the relations between tokens; the self-attention value is calculated as follows:
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$
where Q, K, and V are new vectors obtained from linear transformations of the input vectors, d_k is the dimension of the key vectors used in the QK^T product, and softmax(·) normalizes the scores with the softmax operation.
When the decoder in step (4) produces the output probabilities during training, the invention adopts the cross-entropy loss function, which measures the difference between the true probability distribution and the predicted probability distribution, to compute the loss value for the subsequent model parameter update. The cross-entropy loss is calculated as follows:
$H(p, q) = -\sum_{x} p(x)\log q(x)$
where p(x) represents the true distribution of the samples and q(x) represents the distribution predicted by the model.
The lightweight code generation described in step (5) means that most parameters are frozen when the model is fine-tuned, and the frozen parameters do not participate in the parameter update. This reduces the computational cost of training while achieving an effect close to that of full-parameter fine-tuning, which makes lightweight code generation feasible. Specifically, the template of step (3) is further improved: in addition to the fixed prompt (hard prompt), a segment of learnable vectors is prepended to it as a soft prompt. This prompt operates directly in the model's embedding space and has its own learnable parameters, which are adjusted and optimized according to the training data of the downstream task. The combination serves as a hybrid prompt, with the following format:
Hybrid Prompt:=CONCAT(<SP>,<HP>)
where <SP> denotes the soft prompt, <HP> denotes the fixed (hard) prompt already constructed in step (3), and CONCAT(·) denotes direct concatenation.
Drawings
Fig. 1 is a schematic diagram of a lightweight code generation method based on prompt learning
Detailed Description
The invention is further described below with reference to the examples and drawings, which are provided by way of illustration, not limitation, to aid the understanding of those skilled in the art.
Referring to fig. 1, the lightweight code generation method based on prompt learning includes the steps of:
(1) Searching the corpus according to the natural language to be input into the model, and returning the natural language-code pairs most similar to the current natural language under a given retrieval scheme. The invention does not use conventional information retrieval methods such as BM25 or TF-IDF, because those methods only consider word frequency and term overlap and cannot capture semantic relevance. Moreover, when a deep-learning pre-trained language model is used for information retrieval, i.e., semantic similarity matching, its token representations suffer from an anisotropy problem, so the cosine similarities computed from the model do not truly reflect the similarity between sentences; there is a deviation. The most direct cause is that cosine similarity assumes an orthonormal basis, whereas the sentence vectors produced by the pre-trained language model do not necessarily satisfy this assumption.
To solve this problem, the sentence vectors produced by the pre-trained language model are transformed with a whitening method so that they become isotropic. The whitening transform is given by the following formula:
$\tilde{x}_i = (x_i - \mu)W$
where x_i is a sentence vector generated by the language model, μ is the mean of the set of sentence vectors, and W is the whitening transformation matrix that maps the covariance matrix of the sentence vectors to the identity matrix.
After the transformation, the new sentence vectors satisfy the orthonormal-basis assumption, and the semantic similarity between any two of them can then be computed with the cosine similarity formula:
$\cos(x, y) = \frac{\sum_{i=1}^{d} x_i y_i}{\sqrt{\sum_{i=1}^{d} x_i^2}\,\sqrt{\sum_{i=1}^{d} y_i^2}}$
where x and y are two sentence vectors and x_i, y_i are their components.
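As a concrete illustration of this retrieval step, the following Python sketch shows one way to whiten a set of sentence vectors and then compare them with cosine similarity. It is a minimal example under assumptions not stated in the patent: the sentence vectors are taken as already produced by some sentence encoder, and the SVD-based construction of the whitening matrix W is one common choice, not necessarily the exact implementation used by the invention.

```python
import numpy as np

def whitening_transform(embeddings: np.ndarray):
    """Compute whitening parameters (mu, W) so that (x - mu) @ W has an
    identity covariance matrix, i.e. the vectors become isotropic."""
    mu = embeddings.mean(axis=0, keepdims=True)      # mean of the sentence-vector set
    cov = np.cov((embeddings - mu).T)                # covariance matrix of the vectors
    u, s, _ = np.linalg.svd(cov)                     # SVD of the symmetric covariance
    W = u @ np.diag(1.0 / np.sqrt(s))                # maps the covariance to the identity
    return mu, W

def whiten(embeddings: np.ndarray, mu: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Apply x_tilde = (x - mu) W and L2-normalize each row."""
    x = (embeddings - mu) @ W
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def cosine_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine similarity between two sentence vectors."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))
```

After whitening and L2 normalization, the cosine similarity between two vectors reduces to their dot product, which keeps the corpus search inexpensive.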
(2) Reordering the returned results according to a given rule, and recording the highest-scoring result after reordering for later use as an input prompt. The top-K similar results retrieved after whitening (with K = 5) are re-ranked, and the single most similar result is kept. Jaccard similarity is used as the ranking criterion, calculated as follows:
$J(S_1, S_2) = \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|}$
where S1 and S2 are the token sets of the current query statement and the candidate statement to be ranked, respectively; the value reflects the similarity of the two sets.
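The retrieval-then-re-ranking logic of steps (1) and (2) can be sketched as follows. This is an assumed, simplified implementation: corpus_vecs is taken to be the whitened, unit-normalized matrix of corpus sentence vectors from the previous sketch, corpus_tokens holds the token lists of the corresponding natural language statements, and the function names are illustrative.

```python
import numpy as np

def jaccard_similarity(tokens_a, tokens_b) -> float:
    """J(S1, S2) = |S1 n S2| / |S1 u S2| over token sets."""
    s1, s2 = set(tokens_a), set(tokens_b)
    return len(s1 & s2) / len(s1 | s2) if (s1 | s2) else 0.0

def retrieve_and_rerank(query_vec, query_tokens, corpus_vecs, corpus_tokens, k=5):
    """Retrieve the top-K candidates by cosine similarity over whitened,
    unit-norm vectors, then keep the single best one by Jaccard similarity."""
    scores = corpus_vecs @ query_vec                 # cosine similarity (unit-norm vectors)
    topk_idx = np.argsort(-scores)[:k]               # K most similar candidates
    best = max(topk_idx, key=lambda i: jaccard_similarity(query_tokens, corpus_tokens[i]))
    return int(best)                                 # index of the retrieved "NL-code" pair
```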
(3) Splicing the retrieved result and the natural language to be input into the model according to a given template, and training with a deep learning Transformer model; during training, the input sequence is fed to the encoder, which encodes the text and extracts its features, capturing the relations between tokens through its multi-head attention mechanism. The code generator used is a pre-trained language model with the Encoder-Decoder CodeT5-base as its backbone; this architecture is naturally suited to code generation and is widely recognized as one of the models best suited to the task. The model input requires preprocessing in advance: before the similar natural language-code pair obtained after retrieval and reordering in step (2) is spliced to the original input as prompt information, the input is reconstructed according to the following template:
‘Generate<LAN>:<NL`>,Code:<PL`>;Generate<LAN>:<NL>,Code:’
where <LAN> is the type of language to be generated, and <NL`> and <PL`> are, respectively, the natural language requirement most similar to the current input <NL> found in the retrieval phase and its corresponding code implementation.
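A minimal helper that assembles the model input according to this template might look as follows; the function and argument names are illustrative and not part of the patent.

```python
def build_prompt(lan: str, retrieved_nl: str, retrieved_code: str, current_nl: str) -> str:
    """Splice the retrieved pair and the current requirement following
    'Generate <LAN>: <NL`>, Code: <PL`>; Generate <LAN>: <NL>, Code:'."""
    return (f"Generate {lan}: {retrieved_nl}, Code: {retrieved_code}; "
            f"Generate {lan}: {current_nl}, Code:")

# Hypothetical usage:
# build_prompt("Java", "return the larger of two integers",
#              "int max(int a, int b) { return a > b ? a : b; }",
#              "return the smaller of two integers")
```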
When fine-tuning the generation model, the multi-head attention mechanism of the standard Transformer is used. In the encoder stage, the input text is converted into vectors by the embedding layer, and positional encoding attaches position information to each word vector so that token positions are available when the self-attention values are computed. Computing self-attention extracts the features of the text sequence and the relations between tokens; the self-attention value is calculated as follows:
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$
where Q, K, and V are new vectors obtained from linear transformations of the input vectors, d_k is the dimension of the key vectors used in the QK^T product, and softmax(·) normalizes the scores with the softmax operation.
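The formula above is the standard scaled dot-product attention; for reference, a minimal PyTorch sketch (not the patent's own code) is:

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q: torch.Tensor, K: torch.Tensor, V: torch.Tensor):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5    # (..., seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)              # attention weights over tokens
    return weights @ V
```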
(4) When training the decoder, the hidden states output by the multi-layer encoder are combined with the ground truth (i.e., the code the model is expected to output) in a teacher-forcing procedure, and the masked self-attention mechanism in the decoder produces the probability output of the final prediction. The invention adopts the cross-entropy loss function, which measures the difference between the true probability distribution and the predicted probability distribution, to compute the training loss used in the subsequent model parameter update. The cross-entropy loss is calculated as follows:
$H(p, q) = -\sum_{x} p(x)\log q(x)$
where p(x) represents the true distribution of the samples and q(x) represents the distribution predicted by the model.
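A minimal fine-tuning step with the publicly available CodeT5-base checkpoint could look like the sketch below. The checkpoint name, optimizer, and hyperparameters are illustrative assumptions; when labels are supplied, the Hugging Face implementation computes exactly the cross-entropy loss described above over the decoder's predicted distribution.

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")
model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def training_step(prompt: str, target_code: str) -> float:
    """One teacher-forced step: the decoder is trained against the ground-truth
    code, and the loss is the cross entropy H(p, q) over the vocabulary."""
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    labels = tokenizer(target_code, return_tensors="pt", truncation=True).input_ids
    outputs = model(**inputs, labels=labels)         # loss = cross-entropy over tokens
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return outputs.loss.item()
```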
(5) A lightweight code generation scheme is applied simultaneously during training; specifically, most parameters of the pre-trained language model are frozen and do not participate in fine-tuning, and the fine-tuned model is finally tested. This lightweight scheme reduces the computational cost of training while achieving an effect close to that of full-parameter fine-tuning, which makes lightweight code generation feasible. Specifically, the template of step (3) is further improved: in addition to the fixed prompt (hard prompt), a segment of learnable vectors is prepended to it as a soft prompt. This prompt operates directly in the model's embedding space and has its own learnable parameters, which are adjusted and optimized according to the training data of the downstream task. The combination serves as a hybrid prompt, with the following format:
Hybrid Prompt:=CONCAT(<SP>,<HP>)
where <SP> denotes the soft prompt, <HP> denotes the fixed (hard) prompt already constructed in step (3), and CONCAT(·) denotes direct concatenation.
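One possible way to realize this hybrid prompt with a frozen backbone is sketched below: the CodeT5-base parameters are frozen, a learnable soft-prompt matrix (the <SP> part) is prepended to the embedded hard prompt (the <HP> part), and only the soft prompt is optimized. The soft-prompt length, initialization, and learning rate are assumptions for illustration, not values specified by the invention.

```python
import torch
from torch import nn
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("Salesforce/codet5-base")
model = T5ForConditionalGeneration.from_pretrained("Salesforce/codet5-base")

for p in model.parameters():           # freeze the backbone: no gradient updates
    p.requires_grad = False

N_SOFT = 20                            # assumed length of the learnable soft prompt
soft_prompt = nn.Parameter(torch.randn(N_SOFT, model.config.d_model) * 0.02)
optimizer = torch.optim.AdamW([soft_prompt], lr=1e-3)

def hybrid_step(hard_prompt: str, target_code: str) -> float:
    """Hybrid Prompt := CONCAT(<SP>, <HP>): prepend the soft-prompt embeddings
    to the embedded hard prompt and update only the soft prompt."""
    enc = tokenizer(hard_prompt, return_tensors="pt", truncation=True)
    labels = tokenizer(target_code, return_tensors="pt", truncation=True).input_ids
    hp_embeds = model.get_input_embeddings()(enc.input_ids)            # (1, seq, d_model)
    inputs_embeds = torch.cat([soft_prompt.unsqueeze(0), hp_embeds], dim=1)
    attention_mask = torch.cat(
        [torch.ones(1, N_SOFT, dtype=enc.attention_mask.dtype), enc.attention_mask], dim=1)
    out = model(inputs_embeds=inputs_embeds, attention_mask=attention_mask, labels=labels)
    out.loss.backward()                # gradients flow only into the soft prompt
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```

Because the backbone is frozen, only the small soft-prompt matrix carries gradients and optimizer state, which is what keeps the fine-tuning memory footprint low.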
The performance of the method according to the invention is demonstrated experimentally below.
The main content of the experiment is as follows: the 'natural language-code' pair most similar to the current natural language is retrieved from the corpus (training set) and spliced with the original input as prompt information; the code generated by the model is recorded and compared with the ground truth to demonstrate the effectiveness of the method. At the same time, the memory overhead of model fine-tuning is recorded to demonstrate the feasibility of lightweight parameter fine-tuning.
The experiment uses two data sets: Solidity4CG, for the domain-specific language Solidity, and CONCODE, for the general-purpose language Java.
Model performance is evaluated with the BLEU, CodeBLEU, and EM metrics, which are widely used in code generation tasks; higher values indicate better results. The MEM value reports the memory actually consumed during model fine-tuning, and the PAR value reports the proportion of parameters actually participating in fine-tuning, relative to the parameter count of the CodeT5-base model. The specific results are shown in Tables 1 and 2 (a computation sketch for the BLEU and EM metrics is given after the tables):
Table 1. Solidity4CG experimental results (presented as an image in the original publication)
Table 2. CONCODE experimental results (presented as an image in the original publication)
In Tables 1 and 2, bold data mark the best results and underlined data mark the second-best results. The experiments show that while effectively reducing memory overhead during fine-tuning, the method keeps the model's performance unaffected and can even exceed the original generation results of the pre-trained language model; the model involved in the method therefore has good code generation capability.
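For reference, a minimal sketch of how the BLEU and EM values reported above could be computed is given below, using NLTK's corpus BLEU with simple whitespace tokenization; the tokenization and smoothing choices are illustrative assumptions, and CodeBLEU requires its dedicated implementation, which is not reproduced here.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

def evaluate(predictions, references):
    """Corpus BLEU and exact-match (EM) over generated code strings."""
    refs = [[r.split()] for r in references]         # one reference per sample
    hyps = [p.split() for p in predictions]
    bleu = corpus_bleu(refs, hyps, smoothing_function=SmoothingFunction().method1)
    em = sum(p.strip() == r.strip() for p, r in zip(predictions, references)) / len(references)
    return bleu, em
```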
The present invention has been described in terms of the preferred embodiments thereof, and it should be understood by those skilled in the art that various modifications can be made without departing from the principles of the invention, and such modifications should also be considered as being within the scope of the invention.

Claims (7)

1. A lightweight code generation method based on prompt learning is characterized by comprising the following steps:
(1) Searching the corpus according to the natural language to be input into the model, and returning the natural language-code pairs most similar to the current natural language under a given retrieval scheme;
(2) Reordering the returned results according to a given rule, and recording the highest-scoring result after reordering for later use as an input prompt;
(3) Splicing the retrieved result and the natural language to be input into the model according to a given template, and training with a deep learning Transformer model (specifically the pre-trained model CodeT5-base); during training, the input sequence is fed to the encoder, which encodes the text and extracts its features, capturing the relations between tokens through its multi-head attention mechanism;
(4) When training the decoder, combining the hidden states output by the multi-layer encoder with the ground truth (i.e., the code the model is expected to output) in a teacher-forcing procedure; the masked self-attention mechanism in the decoder produces the probability output of the final prediction, the cross-entropy loss function computes the training loss, and the subsequent model parameter update is carried out;
(5) Applying a lightweight code generation scheme simultaneously during training; specifically, most parameters of the pre-trained language model are frozen and do not participate in fine-tuning, and the fine-tuned model is finally tested.
2. The lightweight code generation method based on prompt learning according to claim 1, wherein the search operation in step (1) does not adopt conventional information retrieval methods such as BM25 or TF-IDF, because those methods only consider word frequency and term overlap and cannot capture semantic relevance; meanwhile, when deep-learning pre-trained language models such as BERT are used for information retrieval, i.e., semantic similarity matching, their token representations suffer from an anisotropy problem, so the cosine similarities computed from the model do not truly reflect the similarity, i.e., there is a deviation; the most direct cause is that cosine similarity assumes an orthonormal basis, whereas the sentence vectors provided by the pre-trained language model are not necessarily expressed in such a basis.
To solve this problem, the sentence vectors produced by the pre-trained language model are transformed with a whitening method so that they become isotropic. The whitening transform is given by the following formula:
$\tilde{x}_i = (x_i - \mu)W$
where x_i is a sentence vector generated by the language model, μ is the mean of the set of sentence vectors, and W is the whitening transformation matrix that maps the covariance matrix of the sentence vectors to the identity matrix.
After the transformation, the new sentence vectors satisfy the orthonormal-basis assumption, and the semantic similarity between any two of them can then be computed with the cosine similarity formula:
$\cos(x, y) = \frac{\sum_{i=1}^{d} x_i y_i}{\sqrt{\sum_{i=1}^{d} x_i^2}\,\sqrt{\sum_{i=1}^{d} y_i^2}}$
where x and y are two sentence vectors and x_i, y_i are their components.
3. The lightweight code generation method based on prompt learning according to claim 1, wherein when the retrieval results are reordered in step (2), the top-K similar results retrieved after whitening (with K = 5) are re-ranked to obtain the single most similar result; Jaccard similarity is used as the ranking criterion, calculated as follows:
$J(S_1, S_2) = \frac{|S_1 \cap S_2|}{|S_1 \cup S_2|}$
where S1 and S2 are the token sets of the current query statement and the candidate statement to be ranked, respectively; the value reflects the similarity of the two sets.
4. The method of claim 1, wherein the code generator used in step (3) is a pre-trained language model with the Encoder-Decoder CodeT5-base as its backbone; this architecture is naturally suited to the task of code generation and is widely recognized as one of the models best suited to it. The model input requires preprocessing in advance: before the similar natural language-code pair obtained after retrieval and reordering in step (2) is spliced to the original input as prompt information, the input is reconstructed according to the following template:
‘Generate<LAN>:<NL`>,Code:<PL`>;Generate<LAN>:<NL>,Code:’
where <LAN> is the type of language to be generated, and <NL`> and <PL`> are, respectively, the natural language requirement most similar to the current input <NL> found in the retrieval phase and its corresponding code implementation.
5. The lightweight code generation method based on prompt learning according to claim 1, wherein in step (3), the multi-head attention mechanism of the standard Transformer is used when fine-tuning the generation model; in the encoder stage, the input text is converted into vectors by the embedding layer, and positional encoding attaches position information to each word vector so that token positions are available when the self-attention values are computed; computing self-attention extracts the features of the text sequence and the relations between tokens, and the self-attention value is calculated as follows:
$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V$
where Q, K, and V are new vectors obtained from linear transformations of the input vectors, d_k is the dimension of the key vectors used in the QK^T product, and softmax(·) normalizes the scores with the softmax operation.
6. The lightweight code generation method based on prompt learning according to claim 1, wherein when the decoder in step (4) produces the output probabilities during training, a cross-entropy loss function, which measures the difference between the true probability distribution and the predicted probability distribution, is used to compute the loss value for the subsequent model parameter update; the cross-entropy loss is calculated as follows:
$H(p, q) = -\sum_{x} p(x)\log q(x)$
where p(x) represents the true distribution of the samples and q(x) represents the distribution predicted by the model.
7. The lightweight code generation method based on prompt learning according to claim 1, wherein the lightweight scheme described in step (5) freezes most of the parameters when the model is fine-tuned, and the frozen parameters do not participate in the parameter update; this reduces the computational cost of model training while achieving an effect close to that of full-parameter fine-tuning, which makes lightweight code generation feasible. Specifically, the template of step (3) is further improved: in addition to the fixed prompt (hard prompt), a segment of learnable vectors is prepended to it as a soft prompt; this prompt operates directly in the model's embedding space and has its own learnable parameters, which are adjusted and optimized according to the training data of the downstream task; the combination serves as a hybrid prompt, with the following format:
Hybrid Prompt:=CONCAT(<SP>,<HP>)
where <SP> denotes the soft prompt, <HP> denotes the fixed (hard) prompt already constructed in step (3), and CONCAT(·) denotes direct concatenation.
CN202310237856.XA 2023-03-13 2023-03-13 Lightweight code generation method based on prompt learning Pending CN116301893A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310237856.XA CN116301893A (en) 2023-03-13 2023-03-13 Lightweight code generation method based on prompt learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310237856.XA CN116301893A (en) 2023-03-13 2023-03-13 Lightweight code generation method based on prompt learning

Publications (1)

Publication Number Publication Date
CN116301893A true CN116301893A (en) 2023-06-23

Family

ID=86820186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310237856.XA Pending CN116301893A (en) 2023-03-13 2023-03-13 Lightweight code generation method based on prompt learning

Country Status (1)

Country Link
CN (1) CN116301893A (en)


Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116644196A (en) * 2023-07-26 2023-08-25 北京智谱华章科技有限公司 Parameter-based efficient general retrieval method and device
CN116719520A (en) * 2023-08-07 2023-09-08 支付宝(杭州)信息技术有限公司 Code generation method and device
CN116719520B (en) * 2023-08-07 2023-11-17 支付宝(杭州)信息技术有限公司 Code generation method and device
CN117724695A (en) * 2024-02-18 2024-03-19 浙江同花顺智能科技有限公司 Code generation optimization method, device, equipment and medium for large language model
CN117724695B (en) * 2024-02-18 2024-04-30 浙江同花顺智能科技有限公司 Code generation optimization method, device, equipment and medium for large language model

Similar Documents

Publication Publication Date Title
CN116301893A (en) Lightweight code generation method based on prompt learning
CN110020438A (en) Enterprise or tissue Chinese entity disambiguation method and device based on recognition sequence
CN117236337B (en) Method for generating natural language based on mixed prompt learning completion history knowledge graph
CN110765264A (en) Text abstract generation method for enhancing semantic relevance
CN115048447B (en) Database natural language interface system based on intelligent semantic completion
CN118170894B (en) Knowledge graph question-answering method, knowledge graph question-answering device and storage medium
CN117827886B (en) Method for converting natural sentence into SQL sentence based on large language model
CN115858750A (en) Power grid technical standard intelligent question-answering method and system based on natural language processing
CN117807202A (en) Knowledge graph enhanced large language model reasoning trademark law intelligent question-answering method
CN117909458A (en) Construction method of mould specialized question-answering system based on LLM model
CN117828050B (en) Traditional Chinese medicine question-answering method, equipment and medium based on long-document retrieval enhancement generation
CN117851445A (en) Large language model Text2SQL chart generation method and device
CN117492825A (en) Method for generating stability annotation based on context learning and large language model
CN114676708B (en) Low-resource neural machine translation method based on multi-strategy prototype generation
CN115858736A (en) Emotion text generation method based on emotion prompt fine adjustment
CN115906879A (en) Translation model training method for vertical domain and storage medium
Chen et al. Eliciting knowledge from language models with automatically generated continuous prompts
Gu et al. Extension-Compression Learning: A deep learning code search method that simulates reading habits
Ribeiro et al. Domain adaptation in dialogue systems using transfer and meta-learning
CN118331152B (en) Industrial control system logic optimization method and system based on natural language big model
CN117035064B (en) Combined training method for retrieving enhanced language model and storage medium
CN117575026B (en) Large model reasoning analysis method, system and product based on external knowledge enhancement
CN117272979B (en) Unsupervised sentence representation method, device, computer equipment and storage medium
PV et al. An approach to customization of pre-trained neural network language model to specific domain
Xia et al. SPEADO: Segmentation and Punctuation for Ancient Chinese Texts via Example Augmentation and Decoding Optimization

Legal Events

Date Code Title Description
PB01 Publication