CN116541071A - Application programming interface migration method based on prompt learning - Google Patents


Info

Publication number
CN116541071A
Authority
CN
China
Prior art keywords
api
source
parameter
migration
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310486787.6A
Other languages
Chinese (zh)
Inventor
王新颖
姚远
徐锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN202310486787.6A
Publication of CN116541071A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/76 Adapting program code to run in a different environment; Porting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses an application programming interface (API) migration method based on prompt learning, which addresses two problems in API migration: matching a source API to a target API, and matching API parameters. A prompt construction method is presented for generating API migration code with a large-scale code pre-training model. The method mainly comprises the following steps: extracting a source API from the source code to be migrated, and retrieving an API migration mapping table to obtain a list of candidate target APIs; obtaining the parameterized function declarations of the source API and the candidate target APIs from API documents; combining the parameter type similarity, parameter name similarity, and parameter count matching degree between the source API and the candidate target APIs to obtain the best-matched source and target APIs with parameterized function declarations; and constructing a prompt, inputting it into the large-scale pre-training model, obtaining the prediction output, and extracting the migrated code from the output. Experiments show that the method achieves higher accuracy than prior work.

Description

Application programming interface migration method based on prompt learning
Technical Field
The invention relates to an application programming interface migration method based on prompt learning, in particular to a prompt-learning-based method for cross-language or cross-library migration of code containing application programming interfaces (APIs), and belongs to the technical field of code migration in software development and maintenance.
Background
In software development and maintenance, it is often necessary to migrate a project from a source library or source language to a target library or target language. The code blocks to be migrated frequently contain API calls, so migrating code that contains APIs is key to code migration.
Conventional rule-based code migration methods typically rely on carefully hand-written rules, which are costly in both time and effort. Work on cross-library code migration is particularly scarce; it focuses mainly on replacement relationships between APIs and does not go on to migrate the code that contains them. Cross-language code migration includes methods based on statistical machine translation models; more commonly, a pre-training model with an encoder-decoder architecture is trained and then fine-tuned for code migration. However, most such work requires high-quality parallel code as a training dataset. Although GitHub hosts a large number of open-source repositories in many languages, these data are not parallel and cannot be used for supervised translation. The lack of high-quality parallel code data is therefore one of the main bottlenecks hindering the development of code migration models. Furthermore, such work does not consider API migration separately.
Disclosure of Invention
Purpose of the invention: existing work tends to rely on prior knowledge, such as high-quality parallel code, so the efficiency of code migration drops when such knowledge cannot be collected or is of low quality. In addition, much related work performs code migration with manually formulated conversion rules, which consumes considerable manpower and resources and is expensive. To address these problems and shortcomings in the prior art, a prompt-learning-based method for migrating code that contains application programming interfaces (APIs) is provided. The invention holds that treating API migration and code migration separately achieves better results when migrating code that contains APIs.
The invention provides an application programming interface migration method based on prompt learning. Specifically, given the source code to be migrated and the APIs used in it, a matching algorithm computes the target API most similar to each source API. Depending on the code migration task, three kinds of information (the source code, the source API, and the target API) are input as a prompt, in the form of code comments, into a large-scale code pre-training model, and the migrated code is then extracted from the model's prediction output. The technical scheme must address the following problems:
(1) How to design a matching algorithm to calculate a target API most similar to the source API;
(2) How to construct the prompt to be input into the model from the existing data so as to obtain the migration result.
For problem (1), the method collects a series of APIs with their formal parameters from API documents, distinguishes APIs that share a method name but have different parameter lists, and constructs an API dataset with complete parameter lists. An API migration mapping table is then constructed based on the semantic similarity between source-domain and target-domain APIs, and records that similarity. The method designs a matching algorithm that matches the parameterized function declaration of an API in the specified source domain to a suitable parameterized function declaration of a target-domain API. The algorithm jointly considers parameter type similarity, parameter name similarity, and parameter count matching degree, and combines them with the API semantic similarity obtained by querying the API migration mapping table.
For problem (2), research shows that a large-scale code pre-training model can return the desired information when given a suitable prompt. The invention designs two prompt construction methods, one for each of the two code migration tasks, cross-language and cross-library. First, for a given pair of parameterized declarations of the source and target APIs, the package name is deleted while the class name and method name are retained. For cross-library migration within the same programming language, the source and target code use the same language and differ only in the libraries used, so the prompt needs to contain only the source-domain API, the target-domain API, and the location of the source code. For cross-language migration, the prompt must contain the source-domain API, the target-domain API, and the location of the source code, together with the source-domain and target-domain programming languages. Finally, depending on the code migration task, the prompt is constructed in the form of code comments from the source API and target API (with class and method names retained) and the source code to be migrated; it is input into a large-scale code pre-training model to obtain a prediction output, and the migrated code is extracted from the output, thereby achieving cross-language or cross-library migration of code containing APIs. For example, in experiments with the large-scale pre-training model Codex, the model's predictions complete code migration tasks containing application programming interfaces more naturally and accurately.
The invention provides an application programming interface migration method based on prompt learning, in particular a prompt-learning-based method for cross-language or cross-library migration of code containing application programming interfaces, abbreviated CMIAPL (Code Migration Including APIs based on Prompt Learning). Experimental evaluation shows that the method performs well on cross-library and cross-language code migration tasks that contain application programming interfaces.
Technical scheme: in the prompt-learning-based application programming interface migration method, a source API is extracted from the source code to be migrated, and the API dataset constructed from API documents is queried to find the parameterized function declaration of the source API that matches the parameter information of the source API in the source code. A list of candidate target APIs similar to the source API, together with the similarity value between each candidate and the source API, is obtained by retrieving the API migration mapping table. The API dataset is then queried for the parameterized declarations of the candidate target APIs. A matching algorithm computes the parameter type similarity, parameter name similarity, and parameter count matching degree between the source API and every candidate target API, and combines them with the semantic similarity recorded in the API migration mapping table, so as to match the parameterized function declaration of the API in the specified source domain to a suitable parameterized function declaration of a target-domain API. Finally, depending on the code migration task, a prompt for the code pre-training model is constructed in the form of code comments from the source API declaration, the target API declaration, and the source code to be migrated; the prompt is input into the large-scale code pre-training model to obtain a prediction output, and the migrated code is extracted from the output, thereby achieving cross-language or cross-library migration of code containing application programming interfaces.
The method specifically comprises the following five parts:
(1) A source API is extracted from the source code to be migrated, and a list of candidate target APIs similar to the source API, together with their similarity values, is obtained by retrieving the API migration mapping table.
The method constructs an API migration mapping table based on the semantic similarity between source APIs and target APIs, and records that similarity. For a given source API, the list of similar candidate target APIs and the similarity value between the source API and each candidate can be obtained by retrieving the API migration mapping table.
(2) The parameterized function declarations of the source and target APIs are obtained from the API dataset built from API documents.
A series of APIs with their formal parameters is collected from API documents, APIs that share a method name but have different parameter lists are distinguished, and an API dataset with complete parameter lists is constructed. For a given API, its parameterized function declaration can be obtained by querying the API dataset.
(3) A matching algorithm computes the parameter type similarity, the parameter name similarity, and the parameter count matching degree between the source API and every candidate target API, and combines them with the similarity values from the API migration mapping table to obtain the best-matched source and target APIs with parameterized function declarations.
Based on the source code and the source APIs it contains, a matching algorithm is designed that matches the parameterized function declaration of an API in the specified source domain to a suitable parameterized function declaration of a target-domain API. The algorithm jointly considers parameter type similarity, parameter name similarity, and parameter count matching degree, and combines them with the API semantic similarity obtained by querying the API migration mapping table.
(4) The class names and method names of the source API and the target API are retained, and a prompt for the pre-training model is constructed in the form of code comments, according to the code migration task, from the source API, the target API, and the source code to be migrated.
First, the package names of the parameterized declarations of the source API and the target API are deleted, and the class names and method names are retained, because package names tend to be diverse and misleading, whereas class names and method names are closely related to an API's function. The invention designs two prompt construction methods, one for each of the two code migration tasks, cross-language and cross-library. For cross-library migration within the same programming language, the source and target code use the same language and differ only in the libraries used, so the prompt needs to contain only the source-domain API, the target-domain API, and the location of the source code. For cross-language migration, the prompt must contain the source-domain API, the target-domain API, and the location of the source code, together with the source-domain and target-domain programming languages.
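The cross-library case of this step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `strip_package` and `build_cross_library_prompt` are hypothetical, the `///source api`, `///target api`, and `///source code` comment markers follow the cross-library example given in the detailed description, and the cross-language variant is left out because its exact layout is not specified in this text.

```python
def strip_package(declaration: str) -> str:
    """Keep only 'Class.method(params)' from a fully qualified declaration.

    Assumes the declaration has the form 'pkg.sub.Class.method(type name, ...)'.
    """
    head, sep, params = declaration.partition("(")
    short_name = ".".join(head.strip().split(".")[-2:])  # class name + method name
    return short_name + sep + params


def build_cross_library_prompt(source_api: str, target_api: str, source_code: str) -> str:
    """Assemble a comment-style prompt: source API, target API, then source code."""
    return "\n".join([
        "///source api",
        strip_package(source_api),
        "///target api",
        strip_package(target_api),
        "///source code",
        source_code,
    ])


if __name__ == "__main__":
    print(build_cross_library_prompt(
        "junit.framework.Assert.assertEquals(String message, int expected, int actual)",
        "org.testng.Assert.assertEquals(int actual, int expected, String message)",
        "Assert.assertEquals(\"sum\", 3, add(1, 2));",
    ))
```

The source code line passed in the usage example is invented for illustration; in the method it would be the code block to be migrated.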
(5) The prompt is input into a large-scale code pre-training model to obtain a prediction output, and the migrated code is extracted from the output, thereby achieving cross-language or cross-library migration of code containing application programming interfaces.
The prompt constructed in the previous step is input into a large-scale code pre-training model to obtain a prediction output, and the migrated code is extracted from that output. The invention uses the large-scale code pre-training model Codex for experiments and demonstration; the model's predictions complete code migration tasks containing application programming interfaces more naturally and accurately.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing a prompt learning based application programming interface migration method as described above when executing the computer program.
A computer-readable storage medium storing a computer program that performs the prompt learning-based application programming interface migration method as described above.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention.
Detailed Description
The present invention is further illustrated below with reference to specific embodiments. It should be understood that these embodiments only illustrate the invention and do not limit its scope, and that equivalent modifications made by those skilled in the art after reading this disclosure fall within the scope defined by the appended claims.
The implementation flow of the method is shown in FIG. 1. First, a source API is extracted from the source code, and the API migration mapping table is queried to obtain candidate target APIs whose functions are similar to that of the source API. Next, the API dataset constructed from API documents is queried to find the parameterized function declaration of the source API that matches the parameter information of the source API in the source code, and to find the parameterized function declarations of the candidate target APIs. A matching algorithm then matches the parameterized function declaration of the API in the specified source domain to a suitable parameterized function declaration of a target-domain API; the algorithm jointly considers parameter type similarity, parameter name similarity, and parameter count matching degree, and combines them with the API semantic similarity obtained by querying the API migration mapping table. Finally, depending on the code migration task, a prompt is constructed in the form of code comments from the source API declaration, the target API declaration, and the source code to be migrated; the prompt is input into a large-scale code pre-training model to obtain a prediction output, and the migrated code is extracted from the output, thereby achieving cross-language or cross-library migration of code containing application programming interfaces.
A prompt-learning-based method for cross-language or cross-library migration of code containing application programming interfaces, comprising:
(1) The source API to be migrated is extracted from the source code, and a list of candidate target APIs, together with the similarity value between the source API and each candidate, is obtained by retrieving the API migration mapping table.
The method constructs an API migration mapping table based on the semantic similarity between source-domain and target-domain APIs, and records that similarity. For a given source API, the candidate target API list and the similarity value between the source API and each candidate can be obtained by retrieving the API migration mapping table.
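As a rough illustration of step (1), the mapping table can be pictured as a dictionary from a source API to scored candidates. The sketch below assumes that layout; the table name, the function name, and the similarity scores are all made up for illustration and do not come from the patent.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping table: source API -> [(candidate target API, semantic similarity)].
# The entries and scores are invented for illustration only.
MAPPING_TABLE: Dict[str, List[Tuple[str, float]]] = {
    "junit.framework.Assert.assertEquals": [
        ("org.testng.Assert.assertSame", 0.61),
        ("org.testng.Assert.assertEquals", 0.92),
    ],
}


def candidate_targets(source_api: str) -> List[Tuple[str, float]]:
    """Return the candidate target APIs and their similarity values, best first."""
    candidates = MAPPING_TABLE.get(source_api, [])
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for api, score in candidate_targets("junit.framework.Assert.assertEquals"):
        print(f"{score:.2f}  {api}")
```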
(2) The parameterized function declarations of the source and target APIs are obtained from API documents.
A series of APIs with their formal parameters is collected from API documents, APIs that share a method name but have different parameter lists are distinguished, and an API dataset with complete parameter lists is constructed. For a given API, its parameterized function declaration can be obtained by querying the API dataset.
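One way to picture such a dataset is as a dictionary keyed by qualified method name, with one entry per overload. The sketch below is an assumption-laden illustration: the `ApiDeclaration` type, the `API_DATASET` contents, and the choice to select an overload by argument count are all hypothetical, since the text does not say how the call site is matched to a declaration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ApiDeclaration:
    qualified_name: str                   # e.g. "org.testng.Assert.assertEquals"
    params: Tuple[Tuple[str, str], ...]   # ((type, name), ...) in declaration order


# Hypothetical dataset: overloads with the same method name are kept as separate entries.
API_DATASET: Dict[str, List[ApiDeclaration]] = {
    "org.testng.Assert.assertEquals": [
        ApiDeclaration("org.testng.Assert.assertEquals",
                       (("int", "actual"), ("int", "expected"))),
        ApiDeclaration("org.testng.Assert.assertEquals",
                       (("int", "actual"), ("int", "expected"), ("String", "message"))),
    ],
}


def find_declarations(qualified_name: str, arg_count: int) -> List[ApiDeclaration]:
    """Return the overloads whose parameter count matches the call site (an assumed criterion)."""
    return [d for d in API_DATASET.get(qualified_name, []) if len(d.params) == arg_count]
```

A fuller implementation would presumably also compare argument types, but that detail is not given here.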
(3) A matching algorithm computes the parameter type similarity, the parameter name similarity, and the parameter count matching degree between the source API and every candidate target API, and combines them with the similarity values from the API migration mapping table to obtain the best-matched source and target APIs with parameterized function declarations.
For a source API, the candidate target APIs and their similarity values are obtained by consulting the API migration mapping table, and the parameterized function declarations of the source API and of all candidate target APIs are found by querying the API dataset. The invention then designs a matching algorithm that matches the parameterized function declaration of the API in the specified source domain to a suitable parameterized function declaration of a target-domain API. The algorithm computes the similarity between the source API $API_{src}$ and a candidate target API $API_{tgt}$:
$Sim(API_{src}, API_{tgt}) = k \cdot Sim(API_{src'}, API_{tgt'})$
if $Cnt_{src} = Cnt_{tgt} = 0$, then $k = 1$;
if $Cnt_{src} = 0$ and $Cnt_{tgt} \neq 0$, or $Cnt_{src} \neq 0$ and $Cnt_{tgt} = 0$, then $k = 0.5$ when $Sim(API_{src'}, API_{tgt'}) > 0.5$, and $k = 0.1$ otherwise;
if $Cnt_{src} \neq 0$ and $Cnt_{tgt} \neq 0$, the similarity is computed from the quantities defined below [the formula for this case is missing from the extracted text],
where $Sim(API_{src}, API_{tgt})$ is the similarity value between the source API and the candidate target API; $Cnt_{src}$ is the number of parameters of the source API and $Cnt_{tgt}$ is the number of parameters of the candidate target API; $Sim(type_i, type_{tgt})$ is the similarity between the i-th parameter type of the source API, $type_i$, and the parameter types of the candidate target API; $Sim(name_i, name_{tgt})$ is the similarity between the i-th parameter name of the source API, $name_i$, and the parameter names of the candidate target API; $p$ is a constant related to the number of parameters in the source API's parameter list and in the candidate target API's parameter list; and $Sim(API_{src'}, API_{tgt'})$ is the semantic similarity value of the source API and the candidate target API in the API migration mapping table. The algorithm is described in detail below.
First consider the similarity between the parameters of the source API and those of the candidate target API. The number of parameters in the source API's parameter list is recorded as $Cnt_{src}$, and likewise the number of parameters of the candidate target API is recorded as $Cnt_{tgt}$. All parameters in a parameter list are separated into two lists. One list records the parameter types of the source API, $(type_1, \ldots, type_n)$, where $n = Cnt_{src}$. The other records the parameter names corresponding to those types, $(name_1, \ldots, name_n)$, where $n = Cnt_{src}$. Each list stores its elements in the order in which the parameters appear in the source API. Likewise, for each candidate target API, its parameter types and parameter names are saved as two lists of length $n = Cnt_{tgt}$.
To find the best-matched source and target APIs with parameterized function declarations as accurately as possible, the method also considers the API semantic similarity between the source API and the candidate target API, namely $Sim(API_{src'}, API_{tgt'})$. This allows the candidate target APIs to be preliminarily filtered and ranked, because a candidate target API with high semantic similarity is more likely to form the most functionally similar pair with the source API. Consider the special case in which the source API has $Cnt_{src} = 0$ parameters and the candidate target API has $Cnt_{tgt} \neq 0$, or the source API has $Cnt_{src} \neq 0$ and the candidate target API has $Cnt_{tgt} = 0$. The similarity between the source API $API_{src}$ and the candidate target API $API_{tgt}$ is then:
$Sim(API_{src}, API_{tgt}) = k \cdot Sim(API_{src'}, API_{tgt'})$
where $k$ is a constant: $k = 0.5$ when $Sim(API_{src'}, API_{tgt'}) > 0.5$, and $k = 0.1$ otherwise. Thus, when the similarity between the source API and the candidate target APIs is calculated, candidates with high semantic similarity according to the API migration mapping table are ranked somewhat higher, but not excessively so, while candidates with low semantic similarity are ranked lower but are not ignored.
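The case analysis for $k$ when at least one parameter list is empty can be written directly as a small function. The sketch below covers only those cases and returns `None` when both APIs have parameters, since that branch depends on the parameter-level similarities; the function name is hypothetical.

```python
from typing import Optional


def zero_param_similarity(cnt_src: int, cnt_tgt: int, semantic_sim: float) -> Optional[float]:
    """Sim(API_src, API_tgt) = k * semantic_sim when at least one parameter list is empty.

    semantic_sim is the value looked up in the API migration mapping table.
    Returns None when both APIs have parameters (handled by the parameter-level matching).
    """
    if cnt_src == 0 and cnt_tgt == 0:
        k = 1.0
    elif cnt_src == 0 or cnt_tgt == 0:
        # Exactly one side has no parameters: damp the score, more strongly if it was low.
        k = 0.5 if semantic_sim > 0.5 else 0.1
    else:
        return None
    return k * semantic_sim
```

For instance, a candidate with semantic similarity 0.8 keeps the full 0.8 when neither API takes parameters, but is scaled to 0.4 when only one of them does.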
The similarity between the parameter lists of the source API and the candidate target API is then calculated. First, all parameter types are converted to lowercase; if a parameter type contains "<>", the "<>" marker and the content it encloses are deleted. Any punctuation marks in the parameter type name are also deleted. This is done for both the source API and the candidate target API. Parameter types are then handled differently depending on the migration task. For cross-library migration tasks, the source and target code use the same programming language, so basic parameter types such as int and bool are identical in both; when the parameter type of the source API is identical to that of the candidate target API, the similarity of the two types is recorded as $Sim(type_{src}, type_{tgt}) = 1$. A parameter type may also contain the "[]" structure. Since int and int[] are similar in nature but must be distinguished from two identical types, the similarity of int and int[] is recorded as $Sim(type_{src}, type_{tgt}) = 0.95$. If the parameter type of the source API or the candidate target API is Object, T, or class, the similarity is $Sim(type_{src}, type_{tgt}) = 0.95$. For cross-language migration tasks, the source and target code use different programming languages. For example, the common Boolean type is written boolean in Java and bool in C#; in this case the similarity of the two parameter types is recorded as $Sim(type_{src}, type_{tgt}) = 1$. For the pairs string and char[], charsequence and char[], and string and charsequence, the similarity is recorded as $Sim(type_{src}, type_{tgt}) = 0.95$. If the parameter type of the source API or the candidate target API is Object, T, class, or ICollection, the similarity is $Sim(type_{src}, type_{tgt}) = 0.95$. In all other cases, the similarity of the parameter types is computed from the similarity of their names. The method uses a trained BERT word vector model: given a parameter type name, the model outputs the corresponding embedding, recorded as $typeVec_{src}$ for the source API and $typeVec_{tgt}$ for the candidate target API. The similarity of the two parameter types is the cosine similarity:
$Sim(type_{src}, type_{tgt}) = \cos(typeVec_{src}, typeVec_{tgt})$
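The cross-library rules for parameter types can be sketched as follows. This is a simplified illustration, not the patented implementation: the function names are hypothetical, the embedding model is passed in as a caller-supplied `embed` function standing in for the trained BERT word vector model, "[]" is kept during normalization so that array types can still be recognized, and the cross-language special cases (such as boolean versus bool) are omitted.

```python
import math
import re
from typing import Callable, Sequence

# Types treated as "matches almost anything" in the cross-library rules (lowercased).
GENERIC_TYPES = {"object", "t", "class"}


def normalize_type(type_name: str) -> str:
    """Lowercase, drop '<...>' with its contents, and strip punctuation except '[]'."""
    name = type_name.lower()
    name = re.sub(r"<.*>", "", name)            # remove angle brackets and what they enclose
    return re.sub(r"[^a-z0-9_\[\]]", "", name)  # assumption: keep '[]' for the array rule


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def type_similarity(src: str, tgt: str,
                    embed: Callable[[str], Sequence[float]]) -> float:
    """Rule-based similarity first, embedding cosine similarity as the fallback."""
    s, t = normalize_type(src), normalize_type(tgt)
    if s == t:
        return 1.0                               # identical types
    if s.replace("[]", "") == t.replace("[]", ""):
        return 0.95                              # e.g. int vs int[]
    if s in GENERIC_TYPES or t in GENERIC_TYPES:
        return 0.95                              # Object, T, class
    return cosine(embed(s), embed(t))            # embed() is a stand-in for the BERT model
```

Checking identical types before the generic-type rule is a design choice made here; the text does not state which rule takes precedence when both apply.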
Some functionally similar source and target APIs list their parameters in different orders. For example, the source API "junit.framework.Assert.assertEquals(String message, int expected, int actual)" and the functionally similar candidate target API "org.testng.Assert.assertEquals(int actual, int expected, String message)" order their parameters differently. Therefore, for the i-th parameter type of the source API, $type_i$, its similarity to each parameter type of the candidate target API is calculated, and the maximum value is taken as the similarity between $type_i$ and the parameter types of the candidate target API:
$Sim(type_i, type_{tgt}) = \max_{j = 1, \ldots, Cnt_{tgt}} Sim(type_i, type_j)$
where $i = 1, \ldots, Cnt_{src}$ and $type_j$ is the j-th parameter type of the candidate target API.
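The order-insensitive matching described here amounts to taking, for each source parameter, the maximum similarity over all target parameters. A minimal sketch, with a hypothetical function name and any pairwise similarity function supplied by the caller:

```python
from typing import Callable, List, Sequence


def best_match_similarities(src_items: Sequence[str],
                            tgt_items: Sequence[str],
                            sim: Callable[[str, str], float]) -> List[float]:
    """For each source item, the maximum similarity to any target item.

    Assumes tgt_items is non-empty (the Cnt_tgt != 0 case); works for types or names.
    """
    return [max(sim(s, t) for t in tgt_items) for s in src_items]


if __name__ == "__main__":
    exact = lambda a, b: 1.0 if a == b else 0.0   # toy similarity for illustration
    # assertEquals(String, int, int) vs assertEquals(int, int, String): order differs.
    print(best_match_similarities(["string", "int", "int"],
                                  ["int", "int", "string"], exact))
```

With the toy exact-match similarity, every source parameter finds a perfect partner even though the parameter order differs between the two declarations.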
Next, the parameter names are processed. All parameter names are converted to lowercase, and any special characters they contain are deleted. If a parameter name contains "<>", the "<>" marker and the content it encloses are deleted. The same BERT word vector model is used to obtain an embedding for each parameter name: given a parameter name, the model outputs the corresponding embedding, recorded as $nameVec_{src}$ for the source API and $nameVec_{tgt}$ for the candidate target API. The similarity of the two parameter names is the cosine similarity:
$Sim(name_{src}, name_{tgt}) = \cos(nameVec_{src}, nameVec_{tgt})$
Likewise, the parameters of the source API "junit.framework.Assert.assertEquals(String message, int expected, int actual)" and of the functionally similar candidate target API "org.testng.Assert.assertEquals(int actual, int expected, String message)" appear in different orders. For the i-th parameter name of the source API, $name_i$, its similarity to each parameter name of the candidate target API is therefore calculated, and the maximum value is taken as the similarity between $name_i$ and the parameter names of the candidate target API:
$Sim(name_i, name_{tgt}) = \max_{j = 1, \ldots, Cnt_{tgt}} Sim(name_i, name_j)$
where $i = 1, \ldots, Cnt_{src}$ and $name_j$ is the j-th parameter name of the candidate target API.
Next consider the number of API parameters. If neither the parameter count of the source API, $Cnt_{src}$, nor that of the candidate target API, $Cnt_{tgt}$, is 0, the matching degree $Sim_i$ between the i-th parameter of the source API and the parameters of the candidate target API is calculated for $i = 1, \ldots, Cnt_{src}$ [the formula is missing from the extracted text].
Here $p$ is a constant: $p = 1$ when $Cnt_{src} = Cnt_{tgt}$ and $Cnt_{src} \neq 0$; $p = 0.95$ when $Cnt_{src} \neq Cnt_{tgt}$ and $Cnt_{src} \neq 0$.
Finally, the semantic similarity Sim (API) between the source API and the candidate target API is obtained by consulting an API migration mapping table src′ ,API tgt′ ) Finally source API with parameters src With the target API tgt The similarity calculation formula of (2) is as follows:
The values of $Sim(API_{src}, API_{tgt})$ are then sorted in descending order to find the target API $API_{tgt}$ most similar to the source API $API_{src}$. The source API $API_{src}$, the target API $API_{tgt}$, and the source code to be migrated are then used to build the prompt for code migration.
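The ranking step can be sketched as below. The function names and the sample values are hypothetical. The factor values for the zero-parameter cases follow those stated in claim 7; for the case where both APIs have parameters, the factor is passed in by the caller, since its formula is not reproduced in this text.

```python
def zero_param_factor(cnt_src: int, cnt_tgt: int, semantic_sim: float):
    """Factor k when at least one API has no parameters; None when both have some."""
    if cnt_src == 0 and cnt_tgt == 0:
        return 1.0
    if cnt_src == 0 or cnt_tgt == 0:
        return 0.5 if semantic_sim > 0.5 else 0.1
    return None  # both APIs have parameters: k must be supplied separately


def rank_candidates(candidates):
    """Sort (name, k, semantic_sim) triples by k * semantic_sim, largest first."""
    scored = [(name, k * sem) for name, k, sem in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up candidates and values, for illustration only.
candidates = [
    ("candidateA", 0.60, 0.70),
    ("candidateB", 0.90, 0.80),
    ("candidateC", zero_param_factor(2, 0, 0.90), 0.90),
]
best_name, best_score = rank_candidates(candidates)[0]
print(best_name, best_score)
```

The first element of the sorted list is the target API that would be paired with the source API in the prompt.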
(4) The class names and method names of the source API and the target API are retained, and the prompt for the pre-training model is constructed in the form of code comments, according to the code migration task, from the source API, the target API, and the source code to be migrated.
Large-scale code pre-training models have been used successfully to help programmers with a variety of programming tasks, such as type prediction, code completion, and code repair, and can also be applied to code migration tasks. Prompt learning is a common way to complete a single learning task with a large-scale pre-training model; the key is how to construct a suitable prompt according to the actual task requirements.
To enable the model to understand and successfully perform code migration, only the class names and method names are kept in the parameterized function declarations of a given source API and target API, while the package names are deleted. Two different prompt construction methods are designed for the different migration tasks. For cross-library migration of code in the same programming language, the source code and the target code use the same language and differ only in the libraries used, so the prompt only needs to contain the source-domain API, the target-domain API, and the source code, each at its marked position. An example of a cross-library prompt is as follows:
///source api
Assert.assertEquals(String message,double expected,double actual,double delta)
///target api
Assert.assertEquals(double actual,double expected,double delta,String message)
///source code
double esp=17.2;
double comp=devolvido.comparticipaCom();
assertEquals("Doesn't Match",esp,comp,0);
///target code
The comment markers "///source api", "///target api", "///source code", and "///target code" are added to the prompt to guide the model to output migrated code that uses the target API and implements functionality similar to the source code.
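A cross-library prompt of this shape could be assembled as in the following sketch. The helper names are hypothetical, and the package-stripping rule (keep the last two dotted components before the parameter list) is an assumption about how "class name and method name" are isolated.

```python
def strip_package(declaration: str) -> str:
    """Keep 'Class.method(params)' and drop the leading package path."""
    head, sep, params = declaration.partition("(")
    parts = head.split(".")
    # Assumption: the last two dotted components are the class and the method.
    return ".".join(parts[-2:]) + sep + params


def build_cross_library_prompt(source_api: str, target_api: str, source_code: str) -> str:
    """Lay out the four comment-marked sections; the model continues after the last one."""
    return "\n".join([
        "///source api",
        strip_package(source_api),
        "///target api",
        strip_package(target_api),
        "///source code",
        source_code.strip(),
        "///target code",
        "",
    ])


prompt = build_cross_library_prompt(
    "org.junit.Assert.assertEquals(String message,double expected,double actual,double delta)",
    "org.testng.Assert.assertEquals(double actual,double expected,double delta,String message)",
    'double esp=17.2;\nassertEquals("Doesn\'t Match",esp,comp,0);',
)
print(prompt)
```

The prompt deliberately ends right after the "///target code" marker, so that the model's completion is the migrated code.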
For a cross-language code migration task, the prompt must contain not only the source-domain API, the target-domain API, and the source code, but also the programming language types of the source domain and the target domain. An example of a cross-language prompt is as follows:
///JAVA source api
File.mkdirs()
///C# target api
Directory.CreateDirectory(string path)
///JAVA source code
File f=new File(path);
f.mkdirs();
System.out.println("Directory is created");
///C# target code
The above shows an example prompt for migrating Java code to C# code. The comment markers "///JAVA source api", "///C# target api", "///JAVA source code", and "///C# target code" are added to the prompt to guide the model to output migrated code that uses the target API and implements functionality similar to the source code.
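The cross-language variant differs only in that each marker also names the programming language. A minimal sketch, with a hypothetical function name and with the API declarations assumed to have their package names already removed:

```python
def build_cross_language_prompt(src_lang: str, tgt_lang: str,
                                source_api: str, target_api: str,
                                source_code: str) -> str:
    """Build a prompt whose section markers carry the language of each part."""
    return "\n".join([
        f"///{src_lang} source api",
        source_api,
        f"///{tgt_lang} target api",
        target_api,
        f"///{src_lang} source code",
        source_code.strip(),
        f"///{tgt_lang} target code",
        "",
    ])


prompt = build_cross_language_prompt(
    "JAVA", "C#",
    "File.mkdirs()",
    "Directory.CreateDirectory(string path)",
    'File f=new File(path);\nf.mkdirs();\nSystem.out.println("Directory is created");',
)
print(prompt)
```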
(5) The prompt is input into a large-scale code pre-training model to obtain a predicted output, and the migrated code is extracted from that output, thereby achieving cross-language or cross-library migration of code containing application programming interfaces.
The method is demonstrated and evaluated with the large-scale code pre-training model Codex. Codex is an improved version of the GPT-3 series of models, trained on a corpus containing billions of lines of code and natural-language text, and can handle programming languages such as Python, Java, and C#.
Finally, the prompt constructed in the previous step is input into Codex to obtain a predicted output, and the migrated code is extracted from that output, thereby achieving cross-language or cross-library migration of code containing application programming interfaces.
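The text does not specify how the migrated code is extracted from the model output. One plausible approach, shown here purely as an assumption, is to keep the completion up to the first line that opens a new "///" section, since a model continuing the prompt may start another example. The function name and the sample completion are hypothetical.

```python
def extract_migrated_code(completion: str, marker: str = "///") -> str:
    """Return the completion text that precedes the next comment-marked section."""
    kept = []
    for line in completion.splitlines():
        if line.lstrip().startswith(marker):
            break  # the model has started a new prompt section; stop here
        kept.append(line)
    return "\n".join(kept).strip()


# Made-up completion text, for illustration only.
completion = (
    "Directory.CreateDirectory(path);\n"
    'Console.WriteLine("Directory is created");\n'
    "///JAVA source api\n"
    "File.delete()\n"
)
print(extract_migrated_code(completion))
```

One caveat of this heuristic is that C# documentation comments also begin with "///", so generated code containing such comments would be cut short.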
It will be apparent to those skilled in the art that the steps of the prompt-learning-based application programming interface migration method of the embodiments described above may be implemented on general-purpose computing devices, either centralized on a single computing device or distributed over a network of computing devices. Alternatively, they may be implemented as program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; in some cases the steps may be performed in an order different from that shown or described here. They may also each be fabricated as a separate integrated circuit module, or several of the modules or steps may be fabricated as a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
Experimental settings: the experiments use the large-scale code pre-training model Codex, with model version "code-davinci-002" and the temperature parameter set to 0, i.e., the lowest randomness of model output. The experiments were run on a server with 2 Tesla T4 GPUs.
Evaluation metrics: (1) BLEU: the BLEU score is commonly used to measure the quality of machine translation; here it measures the n-gram match between the generated target migration code and the correct target migration code, and the invention computes the 1-gram match. (2) CodeBLEU: the CodeBLEU score evaluates the quality of generated code through the n-gram match, weighted n-gram match, syntax match, and semantic match between the generated target migration code and the correct target migration code. The weights of the n-gram match are set to (0.25, 0.25, 0.25, 0.25); for the weighted n-gram match, the weight of a token is 1 if it is a keyword and 0.2 otherwise. The CodeBLEU value is the weighted sum of the n-gram match, weighted n-gram match, syntax match, and semantic match scores, with weights (0.1, 0.1, 0.4, 0.4).
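The two metrics can be illustrated with a simplified sketch. The function names are hypothetical; the 1-gram score shown is the clipped unigram precision only (BLEU's brevity penalty is omitted), and the four CodeBLEU component scores are taken as given inputs rather than computed from syntax trees or data flow.

```python
from collections import Counter


def unigram_precision(candidate_tokens, reference_tokens) -> float:
    """Clipped 1-gram precision of the candidate against one reference."""
    if not candidate_tokens:
        return 0.0
    cand = Counter(candidate_tokens)
    ref = Counter(reference_tokens)
    clipped = sum(min(count, ref[token]) for token, count in cand.items())
    return clipped / len(candidate_tokens)


def token_weight(token: str, keywords) -> float:
    """Weight used in the weighted n-gram match: 1 for keywords, 0.2 otherwise."""
    return 1.0 if token in keywords else 0.2


def code_bleu(ngram, weighted_ngram, syntax, semantic,
              weights=(0.1, 0.1, 0.4, 0.4)) -> float:
    """Weighted sum of the four component scores."""
    parts = (ngram, weighted_ngram, syntax, semantic)
    return sum(w * p for w, p in zip(weights, parts))


generated = "Directory . CreateDirectory ( path ) ;".split()
reference = "Directory . CreateDirectory ( dir ) ;".split()
print(unigram_precision(generated, reference))
print(code_bleu(0.5, 0.5, 1.0, 1.0))
```

With weights of 0.4 each, the syntax and semantic components dominate the final CodeBLEU value, so a structurally correct migration scores well even when some identifiers differ.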
Experimental data: the invention performs cross-library and cross-language migration experiments on code containing application programming interfaces. The libraries used in the cross-library experiments are shown in Table 1; the cross-language experiments migrate Java code to C# code. Two reference files are prepared in advance. First, an API migration mapping table is constructed based on the semantic similarity between source-domain APIs and target-domain APIs, recording the semantic similarity between APIs. Second, APIs with their formal parameters are collected from the API documents, and APIs that share a method name but have different parameter lists are distinguished, producing an API dataset with complete parameter lists. The invention also prepares a source code dataset for constructing prompts. For the cross-library experiments it collects 14 different source code snippets containing APIs of the mockito library, 52 for the junit library, 13 for the log4j library, 15 for the gson library, and 15 for the commons-httpclient library. For the cross-language experiments it collects 144 different Java source code snippets containing APIs, used to migrate code from Java to C#.
Comparison methods: two different code migration methods were chosen for comparison with CMIAPL:
(1) For the cross-library comparison (listed as MigrationMapper in Table 2), the invention takes the code fragments contained in code commit records that implement similar functionality with the source library and with the target library, uses them as the source code dataset for constructing prompts, and has Codex perform the code migration;
(2) For the cross-language comparison (listed as MuST-CoST in Table 3), a model pre-trained with multilingual snippet denoising auto-encoding and multilingual snippet translation is used for code migration between different programming languages.
Experimental results: CMIAPL was compared with the two methods above; the results are shown in Tables 2 and 3. For cross-library code migration, CMIAPL achieves a BLEU score of 0.963 and a CodeBLEU score of 0.946. For cross-language code migration, it achieves a BLEU score of 0.997 and a CodeBLEU score of 0.882. CMIAPL outperforms both comparison methods on both metrics, which shows that the method is effective for cross-library and cross-language migration of code containing APIs.
TABLE 1 Five pairs of Java libraries used for migration
TABLE 2 Cross-library code migration effect

| Method | BLEU | CodeBLEU |
|---|---|---|
| MigrationMapper | 0.898 | 0.929 |
| CMIAPL | 0.963 | 0.946 |
TABLE 3 Cross-language code migration effect

| Method | BLEU | CodeBLEU |
|---|---|---|
| MuST-CoST | 0.880 | 0.654 |
| CMIAPL | 0.997 | 0.882 |

Claims (9)

1. An application programming interface migration method based on prompt learning, characterized in that: a source API is extracted from the source code to be migrated, and a list of candidate target APIs with functionality similar to the source API is obtained by searching an API migration mapping table; an API dataset constructed from API documents is queried to obtain the parameterized function declarations of the source API and the candidate target APIs; a designed matching algorithm calculates the parameter type similarity, the parameter name similarity, and the parameter count matching degree between the source API and every candidate target API, and combines them with the semantic similarity values of the source API and the candidate target APIs in the API migration mapping table, so as to obtain the best-matching source API and target API with parameterized function declarations and to construct the prompt for a pre-training model; finally, the prompt is input into a large-scale code pre-training model to obtain a predicted output, and the migrated code is extracted from that output, thereby achieving cross-language or cross-library migration of code containing application programming interfaces; the method specifically comprises the following five parts:
(1) extracting the source API from the source code to be migrated, and obtaining the candidate target API list and the similarity values between the source API and the candidate target APIs by searching the API migration mapping table;
(2) obtaining the parameterized function declarations of the source API and the candidate target APIs from the API documents;
(3) calculating, with the designed matching algorithm, the parameter type similarity, the parameter name similarity, and the parameter count matching degree between the source API and every candidate target API, and combining them with the similarity values of the source API and the candidate target APIs in the API migration mapping table, so as to obtain the best-matching source API and target API with parameterized function declarations;
(4) retaining the class names and method names of the source API and the target API, and constructing the prompt for the pre-training model in the form of code comments, according to the code migration task, from the source API, the target API, and the source code to be migrated;
(5) inputting the prompt into a large-scale code pre-training model to obtain a predicted output, and extracting the migrated code from that output, thereby achieving cross-language or cross-library migration of code containing application programming interfaces.
2. The prompt learning based application programming interface migration method of claim 1, wherein an API migration mapping table is constructed based on semantic similarity between the source domain API and the target domain API, and the semantic similarity between APIs is recorded.
3. The prompt-learning-based application programming interface migration method of claim 1, wherein APIs with their formal parameters are collected from the API documents, APIs that share a method name but have different parameter lists are distinguished, and an API dataset with complete parameter lists is constructed; for a given API, its parameterized function declaration is obtained by querying the API dataset.
4. The prompt-learning-based application programming interface migration method of claim 1, wherein a matching algorithm is designed to match a parameterized API function declaration in the specified source domain to a suitable parameterized API function declaration in the target domain; the algorithm jointly considers the parameter type similarity, the parameter name similarity, and the parameter count matching degree, and combines them with the API semantic similarity obtained by querying the API migration mapping table.
5. The prompt-learning-based application programming interface migration method of claim 1, wherein two different prompt construction methods are designed for the two code migration tasks, cross-language and cross-library; first, for a given pair of parameterized declarations of a source API and a target API, the package name is deleted from each declaration while the class name and method name are retained; for cross-library migration of code in the same programming language, the prompt only needs to contain the source-domain API, the target-domain API, and the source code, each at its marked position; for a cross-language code migration task, the prompt must contain not only the source-domain API, the target-domain API, and the source code, but also the programming language types of the source domain and the target domain.
6. The prompt learning-based application programming interface migration method of claim 1, wherein the prompts are input into a large-scale code pre-training model to obtain predicted outputs, and migrated codes are extracted from the outputs, thereby realizing cross-language or cross-library code migration containing application programming interfaces.
7. The prompt-learning-based application programming interface migration method of claim 1, wherein a matching algorithm is designed to match a parameterized API function declaration in the specified source domain to a suitable parameterized API function declaration in the target domain; the matching algorithm calculates the similarity between the source API $API_{src}$ and a candidate target API $API_{tgt}$:
$$Sim(API_{src}, API_{tgt}) = k \cdot Sim(API_{src'}, API_{tgt'})$$
if $Cnt_{src} = Cnt_{tgt} = 0$, then $k = 1$;
if $Cnt_{src} = 0$ and $Cnt_{tgt} \neq 0$, or $Cnt_{src} \neq 0$ and $Cnt_{tgt} = 0$, then $k = 0.5$ when $Sim(API_{src'}, API_{tgt'}) > 0.5$, and $k = 0.1$ otherwise;
if $Cnt_{src} \neq 0$ and $Cnt_{tgt} \neq 0$,
wherein $Sim(API_{src}, API_{tgt})$ represents the similarity value of the source API and the candidate target API; $Cnt_{src}$ represents the number of parameters of the source API; $Cnt_{tgt}$ represents the number of parameters of the candidate target API; $Sim_{type_i}$ represents the similarity between the type $type_i$ of the ith parameter of the source API and the parameter types of the candidate target API; $Sim_{name_i}$ represents the similarity between the ith parameter name $name_i$ of the source API and the parameter names of the candidate target API; $p$ is a constant related to the number of parameters in the source API parameter list and the number of parameters in the candidate target API parameter list; and $Sim(API_{src'}, API_{tgt'})$ represents the semantic similarity value of the source API and the candidate target API in the API migration mapping table.
8. A computer device, characterized by: the computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the prompt learning based application programming interface migration method of any one of claims 1-6 when executing the computer program.
9. A computer-readable storage medium, characterized by: the computer-readable storage medium stores a computer program for executing the prompt learning-based application programming interface migration method according to any one of claims 1 to 6.
CN202310486787.6A 2023-05-04 2023-05-04 Application programming interface migration method based on prompt learning Pending CN116541071A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310486787.6A CN116541071A (en) 2023-05-04 2023-05-04 Application programming interface migration method based on prompt learning

Publications (1)

Publication Number Publication Date
CN116541071A true CN116541071A (en) 2023-08-04

Family

ID=87448184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310486787.6A Pending CN116541071A (en) 2023-05-04 2023-05-04 Application programming interface migration method based on prompt learning

Country Status (1)

Country Link
CN (1) CN116541071A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination