CN111090462A

CN111090462A - API (application program interface) matching method and device based on API document

Info

Publication number: CN111090462A
Application number: CN201911239725.5A
Authority: CN
Inventors: 潘敏学; 张天; 张则君
Original assignee: Nanjing University
Current assignee: Nanjing University
Priority date: 2019-12-06
Filing date: 2019-12-06
Publication date: 2020-05-01
Anticipated expiration: 2039-12-06
Also published as: CN111090462B

Abstract

The invention discloses an API matching method and device based on API documents. The method extracts API information by analyzing the description document of an API; the API information includes input information, output information, and behavior information. It then calculates the similarity of the input information, the output information, and the behavior information of two APIs separately, and combines the results to judge whether the two APIs match. By integrating input, output, and behavior information, the invention improves the accuracy of API matching.

Description

API (application program interface) matching method and device based on API document

Technical Field

The present invention relates to the field of automated software design and development.

Background

Software developers often need to rewrite a project in a different programming language to migrate it to another platform. As the amount of software grows dramatically, relying solely on manual migration is time-consuming and laborious. Many code migration tools have been developed to speed up migrating the same project between languages, but they all face the challenge of API matching, i.e., how to map an API in language A to the corresponding API in language B.

To address API matching, the mainstream approach obtains API mappings by analyzing and learning from project source code written in different languages. However, this approach places strict requirements on the dataset: it needs, for example, multiple large-scale equivalent projects in different languages, equivalent code fragments, or large API mapping datasets. Another existing approach, based on API documents, mainly uses statistics and text similarity to derive API mappings, but it does not fully exploit the semantic information in the documents, such as parameter descriptions, return value descriptions, and API signatures, and therefore does not match APIs well.

Therefore, how to avoid the strict data requirements of code-based approaches and achieve better API matching by fully exploiting the semantic information of API documents is a problem that currently needs to be solved.

Disclosure of Invention

The problem to be solved by the invention is how to match the corresponding APIs of two languages with each other.

To solve this problem, the invention adopts the following scheme:

The API matching method based on API documents comprises the following steps:

S1: obtaining the description documents of at least two APIs;

S2: extracting API information by analyzing the description documents of the APIs;

the API information includes: input information, output information, and behavior information;

S3: calculating the similarity of the input information, the output information, and the behavior information of the two pieces of API information respectively, and judging whether the two APIs match after combining the results;

Step S2 includes the following steps:

S21: extracting the API name, the input parameters, and the return type from the description document of the API;

S22: extracting keywords from the API function description text of the description document as the behavior information;

S23: extracting keywords from the API parameter description text of the description document, and forming the input information from the keywords and the corresponding input parameters;

S24: extracting keywords from the API return description text of the description document, and forming the output information from the keywords and the return type;

Step S3 includes:

S31: calculating the similarity between the keywords in the behavior information of the two pieces of API information to obtain a first similarity value;

S32: calculating the similarity between the keywords and the input parameters in the input information of the two pieces of API information to obtain a second similarity value;

S33: calculating the similarity between the keywords and the return types in the output information of the two pieces of API information to obtain a third similarity value;

S34: calculating a weighted average of the first, second, and third similarity values to obtain an API similarity value.

Further, in the API matching method based on API documents of the present invention, step S1 obtains description documents of APIs in two languages: the description document of an API in a first language and the description documents of a plurality of APIs in a second language. Through steps S2 and S3, the API similarity value between the description document of the first-language API and the description document of each second-language API is calculated, and the second-language API whose description document has the highest API similarity value is selected as the match for the first-language API.
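The selection rule above amounts to an argmax over candidates. The Python sketch below is a minimal, hypothetical illustration: `best_match`, the Swift candidate names, and the word-overlap scorer are all assumptions, with the scorer standing in for the full S2/S3 pipeline that would actually produce the API similarity value.

```python
from typing import Callable, Dict, Optional, Tuple

def best_match(
    source_doc: str,
    candidates: Dict[str, str],
    api_similarity: Callable[[str, str], float],
) -> Tuple[Optional[str], float]:
    """Return the candidate whose description document scores highest against source_doc."""
    best_name: Optional[str] = None
    best_score = float("-inf")
    for name, doc in candidates.items():
        score = api_similarity(source_doc, doc)  # in the method, this is the S2 + S3 result
        if score > best_score:  # strict '>' keeps the first candidate on ties
            best_name, best_score = name, score
    return best_name, best_score

# Hypothetical stand-in scorer: word overlap between the raw description texts.
def word_overlap(a: str, b: str) -> float:
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

candidates = {
    "insert(contentsOf:at:)": "insert elements of collection at index",
    "removeAll()": "remove all elements",
}
name, score = best_match("insert all elements of collection at position", candidates, word_overlap)
```

Passing the scorer in as a function keeps the selection step independent of how the similarity value is computed.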

The API matching device based on API documents comprises the following modules:

M1, used for: obtaining the description documents of at least two APIs;

M2, used for: extracting API information by analyzing the description documents of the APIs;

M3, used for: calculating the similarity of the input information, the output information, and the behavior information of the two pieces of API information respectively, and judging whether the two APIs match after combining the results;

Module M2 includes:

M21, used for: extracting the API name, the input parameters, and the return type from the description document of the API;

M22, used for: extracting keywords from the API function description text of the description document as the behavior information;

M23, used for: extracting keywords from the API parameter description text of the description document, and forming the input information from the keywords and the corresponding input parameters;

M24, used for: extracting keywords from the API return description text of the description document, and forming the output information from the keywords and the return type;

Module M3 includes:

M31, used for: calculating the similarity between the keywords in the behavior information of the two pieces of API information to obtain a first similarity value;

M32, used for: calculating the similarity between the keywords and the input parameters in the input information of the two pieces of API information to obtain a second similarity value;

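M33, used for: calculating the similarity between the keywords and the return types in the output information of the two pieces of API information to obtain a third similarity value;

M34, used for: calculating a weighted average of the first, second, and third similarity values to obtain an API similarity value.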

Further, in the API matching device based on API documents of the present invention, module M1 obtains description documents of APIs in two languages: the description document of an API in a first language and the description documents of a plurality of APIs in a second language. Through modules M2 and M3, the API similarity value between the description document of the first-language API and the description document of each second-language API is calculated, and the second-language API whose description document has the highest API similarity value is selected as the match for the first-language API.

The invention has the following technical effect: by integrating input, output, and behavior information, it improves the accuracy of API matching.

Drawings

FIG. 1 is a flow chart of an embodiment of the present invention.

FIG. 2 is an example of an API description document input to an embodiment of the present invention.

Detailed Description

The present invention will be described in further detail with reference to the accompanying drawings.

The API matching method based on API documents in this embodiment matches APIs of the Java language with APIs of the Swift language. As is well known, mobile applications on the Android system are typically developed in Java, while mobile applications on Apple's iOS are typically developed in Swift. Specifically, given a Java API of a certain project, this embodiment finds the corresponding Swift API in the set of Swift APIs of the same project. More specifically, the description document of the specified Java API and the description documents of all APIs in the Swift API set are first obtained; the API similarity value between the Java API's description document and each Swift API's description document is then calculated, and the Swift API whose description document has the largest API similarity value is selected as the match for the Java API. As shown in FIG. 1, calculating the API similarity value between the description documents of two APIs mainly involves two steps: step S2 extracts the input, output, and behavior information, and step S3 calculates the similarity from that information.

Step S2 specifically includes the following steps:

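S21: extracting the API name, the input parameters, and the return type from the description document of the API.

S22: extracting keywords from the API function description text of the description document as the behavior information.

S23: extracting keywords from the API parameter description text of the description document, and forming the input information from the keywords and the corresponding input parameters.

S24: extracting keywords from the API return description text of the description document, and forming the output information from the keywords and the return type.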

Taking FIG. 2 as an example, the API description in FIG. 2 defines an API named addAll. The description document of this API has four parts: the first part is the API definition text, namely "LinkedList Boolean addAll (LinkedList, int index, Collection c)"; the second part is the API function description text, i.e., the content under Description; the third part is the API parameter description text, i.e., the content under Parameters; and the fourth part is the API return description text, i.e., the content under Return.

In step S21, the first part of the description document, the API definition text, is processed, and the following are extracted: the API name, addAll; the input parameters, {{LinkedList, anonym}, {int, index}, {Collection, c}}; and the return type, Boolean. The input parameters can be expressed as a set of {p_type, p_name} pairs, where p_type is the type of an input parameter and p_name is its name.
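Step S21 can be sketched as a small parser for definition text laid out like the FIG. 2 example. This is a hypothetical sketch: the assumed layout "<Owner> <ReturnType> <name> (<type> [name], ...)" is inferred from that single example, and `parse_definition` and `ApiSignature` are illustrative names. Splitting on commas and whitespace would break on generic types such as `Collection<? extends E>`, so a real extractor would need a proper signature grammar.

```python
import re
from typing import List, NamedTuple, Tuple

class ApiSignature(NamedTuple):
    name: str
    params: List[Tuple[str, str]]  # (p_type, p_name) pairs
    return_type: str

# Assumed layout, inferred from FIG. 2: "<Owner> <ReturnType> <name> (<params>)"
_DEFINITION = re.compile(r"^\s*\S+\s+(\S+)\s+(\w+)\s*\((.*)\)\s*$")

def parse_definition(text: str) -> ApiSignature:
    match = _DEFINITION.match(text)
    if match is None:
        raise ValueError(f"unrecognised API definition: {text!r}")
    return_type, name, raw_params = match.groups()
    params: List[Tuple[str, str]] = []
    for raw in raw_params.split(","):
        tokens = raw.split()
        if not tokens:
            continue  # empty parameter list
        # A parameter written without a name is recorded as "anonym", as in FIG. 2.
        p_name = tokens[1] if len(tokens) > 1 else "anonym"
        params.append((tokens[0], p_name))
    return ApiSignature(name, params, return_type)

sig = parse_definition("LinkedList Boolean addAll (LinkedList, int index, Collection c)")
```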

In step S22, the second part of the description document, the API function description text, is processed: stop words and numbers are removed from its first sentence, and each remaining word is stemmed to obtain the keywords. In the example of FIG. 2, the stemmed keywords "insert", "element", "spec", "collect", "list", "start", "posit" are obtained. The set of these stemmed keywords is the behavior information.
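The keyword extraction in step S22 (and in S23 and S24) can be sketched as: take the first sentence, drop stop words and numbers, stem what remains. In the sketch below, the stop-word list is deliberately tiny and the suffix stripper is a crude, hypothetical stand-in for a real stemmer such as Porter's. As a result, its output is close to, but not identical with, the keywords reported for FIG. 2 (it yields "specifi" where the document reports "spec"). The input sentence is an assumed Javadoc-style description, because FIG. 2 is not reproduced here.

```python
import re
from typing import List

# A deliberately small stop-word list; a real system would use a full list.
STOP_WORDS = {
    "a", "an", "the", "of", "in", "into", "to", "at", "on", "this", "that", "all",
    "and", "or", "if", "as", "for", "from", "with", "by", "is", "be", "it", "its",
}
# Crude suffix stripping: an illustrative stand-in for a real stemmer.
SUFFIXES = ("ing", "ion", "ed", "es", "s")

def stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def first_sentence(text: str) -> str:
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]

def extract_keywords(text: str) -> List[str]:
    """Stemmed keywords of the first sentence, in order of first appearance."""
    words = re.findall(r"[A-Za-z]+", first_sentence(text).lower())  # digits never match
    keywords: List[str] = []
    for word in words:
        if word in STOP_WORDS:
            continue
        keyword = stem(word)
        if keyword not in keywords:
            keywords.append(keyword)
    return keywords

description = ("Inserts all of the elements in the specified collection into this list, "
               "starting at the specified position. Shifts subsequent elements to the right.")
behavior = set(extract_keywords(description))
```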

In step S23, the third part of the description document, the API parameter description text, is processed: for each input parameter, stop words and numbers are removed from the first sentence of its description, each remaining word is stemmed to obtain keywords, and the keywords are associated with that parameter. In the example of FIG. 2, the stemmed keywords "insert", "first", "element", "spec", "collect" are obtained for the input parameter {int, index}, and the stemmed keywords "collect", "contact", "element", "add", "list" are obtained for the input parameter {Collection, c}. The resulting input information has two parts: the first is the input parameter type information, namely {"LinkedList", "int", "Collection"}; the second is the input parameter semantic information, namely { }, {"insert", "first", "element", "spec", "collect"}, {"collect", "contact", "element", "add", "list"}, where the empty set corresponds to the anonymous LinkedList parameter.
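Assembling the two-part input information can be sketched as follows, reusing the keyword lists reported for FIG. 2. `InputInfo` and `build_input_info` are hypothetical names. The assumption that a parameter with no extracted keywords (the anonymous LinkedList parameter) contributes an empty set mirrors the `{ }` entry above.

```python
from typing import Dict, List, NamedTuple, Set, Tuple

class InputInfo(NamedTuple):
    types: List[str]           # input parameter type information
    semantics: List[Set[str]]  # one keyword set per parameter, in declaration order

def build_input_info(
    params: List[Tuple[str, str]],
    param_keywords: Dict[str, List[str]],
) -> InputInfo:
    """Pair each (p_type, p_name) with the keywords extracted from its description.

    A parameter without extracted keywords gets an empty keyword set.
    """
    types = [p_type for p_type, _ in params]
    semantics = [set(param_keywords.get(p_name, [])) for _, p_name in params]
    return InputInfo(types, semantics)

params = [("LinkedList", "anonym"), ("int", "index"), ("Collection", "c")]
param_keywords = {
    "index": ["insert", "first", "element", "spec", "collect"],
    "c": ["collect", "contact", "element", "add", "list"],
}
info = build_input_info(params, param_keywords)
```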

In step S24, the fourth part of the description document, the API return description text, is processed: stop words and numbers are removed from its first sentence, and each remaining word is stemmed to obtain the keywords. In the example of FIG. 2, the output information is {Boolean, "true", "list", "change", "result"}.

Note that in steps S22, S23, and S24, removing stop words and numbers from text and stemming the words in it are familiar to those skilled in the art, so the specific procedures are not described in detail here.

Step S3 calculates similarity values for the behavior information, the input information, and the output information separately, and then combines them into an API similarity value, specifically:

In step S31, the first similarity value is calculated as the similarity between the sets of stemmed keywords that form the behavior information of the two APIs. Similarity calculation between keyword sets is well known to those skilled in the art and is not described in detail here.
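Because the document leaves the set-similarity measure open, the sketch below assumes Jaccard similarity, |A ∩ B| / |A ∪ B|, one common choice. The Swift keyword set is a hypothetical candidate. Treating two empty sets as identical is likewise an assumed convention.

```python
from typing import Iterable

def keyword_similarity(a: Iterable[str], b: Iterable[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B| between two keyword sets.

    Two empty sets are treated as identical (similarity 1.0); this is an
    assumed convention, not something the document specifies.
    """
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)

java_behavior = {"insert", "element", "spec", "collect", "list", "start", "posit"}
swift_behavior = {"insert", "element", "collect", "posit"}  # hypothetical candidate
s1 = keyword_similarity(java_behavior, swift_behavior)
```

Here four of the seven distinct keywords are shared, so the first similarity value is 4/7 ≈ 0.571.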

In step S32, a similarity value is first calculated for the input parameter type information, a second similarity value is then calculated from the input parameter semantic information, and the two are finally integrated.

Calculating the similarity value for the input parameter type information means comparing the input parameter type information of the two APIs. For two APIs to match, their input parameter types must be consistent, so this similarity value is either 1 or 0: it is 1 if the input parameter type information of the two APIs is the same, and 0 otherwise. In the example of FIG. 2, it suffices to check whether the input parameter type information of the other API is the same as {"LinkedList", "int", "Collection"}.

Calculating the similarity value from the input parameter semantic information means comparing the input parameter semantic information of the two APIs. Because there are several input parameters, a similarity can be calculated for each input parameter separately and the results averaged. In the example of FIG. 2, the keyword sets { }, {"insert", "first", "element", "spec", "collect"}, and {"collect", "contact", "element", "add", "list"} of the individual input parameters are each compared with the keyword sets of the corresponding input parameters of the other API, and the average similarity value is then calculated. Those skilled in the art will understand that the semantic information of all input parameters may instead be merged before calculating a similarity value: in the example of FIG. 2, merging yields the keyword set {"insert", "first", "element", "spec", "collect", "contact", "add", "list"}, which is then compared with the merged keyword set of the other API's input parameters.
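Both variants can be sketched as below. Three assumptions apply. Jaccard is again the set-similarity measure. Parameters are paired by position, which seems natural since the types must already match, but the document does not state the pairing. The "other API" keyword sets are hypothetical.

```python
from typing import List, Set

def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0  # assumed: two empty sets count as identical
    return len(a & b) / len(a | b)

def semantic_similarity_per_param(params_a: List[Set[str]], params_b: List[Set[str]]) -> float:
    """Average of position-wise similarities; differing parameter counts score 0."""
    if len(params_a) != len(params_b):
        return 0.0
    if not params_a:
        return 1.0  # two parameterless APIs
    return sum(jaccard(a, b) for a, b in zip(params_a, params_b)) / len(params_a)

def semantic_similarity_merged(params_a: List[Set[str]], params_b: List[Set[str]]) -> float:
    """Merge all parameter keyword sets of each API, then compare the two unions."""
    return jaccard(set().union(*params_a), set().union(*params_b))

java = [set(), {"insert", "first", "element", "spec", "collect"},
        {"collect", "contact", "element", "add", "list"}]
other = [set(), {"insert", "element", "collect"}, {"collect", "element", "add"}]  # hypothetical
per_param = semantic_similarity_per_param(java, other)
merged = semantic_similarity_merged(java, other)
```

With the per-parameter variant, the two empty sets for the anonymous parameter count as a perfect match and lift the average to about 0.733. The merged variant (0.5 here) is not affected by that.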

The similarity value calculated for the input parameter type information and the one calculated from the input parameter semantic information can generally be integrated in two ways: by a weighted average or by their product. Because the input parameter types of two matching APIs must be consistent, this embodiment adopts the latter: if the similarity value calculated for the input parameter type information is a and the one calculated from the input parameter semantic information is b, the integrated second similarity value is a × b.

In another preferred mode, if the input parameter type information of the two APIs differs, the second similarity value is directly set to 0; otherwise, the similarity value calculated from the input parameter semantic information is taken as the second similarity value.
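The two ways of forming the second similarity value can be sketched as below. The function names are hypothetical. Comparing the type lists as ordered, exactly equal strings is an assumption: the document only says the types must be "the same". The gated form gives the same result as the product when a is 0 or 1, but it skips the semantic computation on a type mismatch.

```python
from typing import Callable, Sequence

def type_similarity(types_a: Sequence[str], types_b: Sequence[str]) -> float:
    """1.0 if the input parameter type lists are identical (same order), else 0.0."""
    return 1.0 if list(types_a) == list(types_b) else 0.0

def second_similarity_product(types_a: Sequence[str], types_b: Sequence[str],
                              semantic: float) -> float:
    """Embodiment: second similarity value = a * b."""
    return type_similarity(types_a, types_b) * semantic

def second_similarity_gated(types_a: Sequence[str], types_b: Sequence[str],
                            semantic: Callable[[], float]) -> float:
    """Alternative mode: 0 on a type mismatch, otherwise the semantic similarity alone."""
    if type_similarity(types_a, types_b) == 0.0:
        return 0.0  # the semantic similarity is never computed
    return semantic()

java_types = ["LinkedList", "int", "Collection"]
same = second_similarity_product(java_types, ["LinkedList", "int", "Collection"], 0.7)
different = second_similarity_gated(java_types, ["LinkedList", "Int", "Collection"], lambda: 0.7)
```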

Step S33 is similar to step S32. First, the return types of the two APIs are compared. If they differ, the third similarity value is 0; otherwise, the similarity between the stemmed keyword sets in the output information is calculated as the third similarity value. In the example of FIG. 2, this means calculating the similarity between the stemmed keyword set {"true", "list", "change", "result"} in the output information and the corresponding keyword set in the output information of the other API.
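Step S33 can be sketched the same way. Jaccard is again an assumed choice of measure, and the candidate keyword sets are hypothetical. The exact string comparison of return types is also an assumption: the document does not say how differently named types across languages (for example Java's `boolean` and Swift's `Bool`) are reconciled.

```python
from typing import Iterable

def jaccard(a: Iterable[str], b: Iterable[str]) -> float:
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0  # assumed: two empty sets count as identical
    return len(set_a & set_b) / len(set_a | set_b)

def third_similarity(return_a: str, keywords_a: Iterable[str],
                     return_b: str, keywords_b: Iterable[str]) -> float:
    """0 if the return types differ; otherwise keyword-set similarity of the output information."""
    if return_a != return_b:
        return 0.0
    return jaccard(keywords_a, keywords_b)

fig2 = ("Boolean", {"true", "list", "change", "result"})
s3 = third_similarity(*fig2, "Boolean", {"true", "list", "change"})  # hypothetical candidate
s3_mismatch = third_similarity(*fig2, "Void", {"true", "list", "change"})
```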

Step S34 can be formulated as:

$$s = w_1 s_1 + w_2 s_2 + w_3 s_3$$

where $s_1$, $s_2$, and $s_3$ are the first, second, and third similarity values respectively; $w_1$, $w_2$, and $w_3$ are the weighting coefficients corresponding to the first, second, and third similarity values; and $s$ is the API similarity value. The weighting coefficients $w_1$, $w_2$, and $w_3$ are preset and satisfy $w_1 + w_2 + w_3 = 1$.
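The weighted average of step S34 translates directly into code. The document does not give values for the preset weights, so the equal default and the (0.4, 0.3, 0.3) weights in the usage line are purely hypothetical, as are the input similarity values.

```python
import math
from typing import Tuple

def api_similarity(
    s1: float, s2: float, s3: float,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """s = w1*s1 + w2*s2 + w3*s3 with preset weights that must sum to 1."""
    w1, w2, w3 = weights
    if not math.isclose(w1 + w2 + w3, 1.0):
        raise ValueError("weights must sum to 1")
    return w1 * s1 + w2 * s2 + w3 * s3

s = api_similarity(0.57, 0.7, 0.75, weights=(0.4, 0.3, 0.3))
```

Here s = 0.4 × 0.57 + 0.3 × 0.7 + 0.3 × 0.75 = 0.228 + 0.21 + 0.225 = 0.663.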

Claims

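1. An API matching method based on API documents, characterized by comprising the following steps:

S1: obtaining the description documents of at least two APIs;

S2: extracting API information by analyzing the description documents of the APIs;

the API information includes: input information, output information, and behavior information;

S3: calculating the similarity of the input information, the output information, and the behavior information of the two pieces of API information respectively, and judging whether the two APIs match after combining the results;

Step S2 includes:

S21: extracting the API name, the input parameters, and the return type from the description document of the API;

S22: extracting keywords from the API function description text of the description document as the behavior information;

S23: extracting keywords from the API parameter description text of the description document, and forming the input information from the keywords and the corresponding input parameters;

S24: extracting keywords from the API return description text of the description document, and forming the output information from the keywords and the return type;

Step S3 includes:

S31: calculating the similarity between the keywords in the behavior information of the two pieces of API information to obtain a first similarity value;

S32: calculating the similarity between the keywords and the input parameters in the input information of the two pieces of API information to obtain a second similarity value;

S33: calculating the similarity between the keywords and the return types in the output information of the two pieces of API information to obtain a third similarity value;

S34: calculating a weighted average of the first, second, and third similarity values to obtain an API similarity value.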

2. The API matching method based on API documents according to claim 1, wherein step S1 obtains description documents of APIs in two languages: the description document of an API in a first language and the description documents of a plurality of APIs in a second language; through steps S2 and S3, the API similarity value between the description document of the first-language API and the description document of each second-language API is calculated, and the second-language API whose description document has the highest API similarity value is selected as the match for the first-language API.

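3. An API matching device based on API documents, characterized by comprising the following modules:

M1, used for: obtaining the description documents of at least two APIs;

M2, used for: extracting API information by analyzing the description documents of the APIs, the API information including input information, output information, and behavior information;

M3, used for: calculating the similarity of the input information, the output information, and the behavior information of the two pieces of API information respectively, and judging whether the two APIs match after combining the results;

Module M2 includes:

M21, used for: extracting the API name, the input parameters, and the return type from the description document of the API;

M22, used for: extracting keywords from the API function description text of the description document as the behavior information;

M23, used for: extracting keywords from the API parameter description text of the description document, and forming the input information from the keywords and the corresponding input parameters;

M24, used for: extracting keywords from the API return description text of the description document, and forming the output information from the keywords and the return type;

Module M3 includes:

M31, used for: calculating the similarity between the keywords in the behavior information of the two pieces of API information to obtain a first similarity value;

M32, used for: calculating the similarity between the keywords and the input parameters in the input information of the two pieces of API information to obtain a second similarity value;

M33, used for: calculating the similarity between the keywords and the return types in the output information of the two pieces of API information to obtain a third similarity value;

M34, used for: calculating a weighted average of the first, second, and third similarity values to obtain an API similarity value.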

4. The API matching device based on API documents according to claim 3, wherein module M1 obtains description documents of APIs in two languages: the description document of an API in a first language and the description documents of a plurality of APIs in a second language; through modules M2 and M3, the API similarity value between the description document of the first-language API and the description document of each second-language API is calculated, and the second-language API whose description document has the highest API similarity value is selected as the match for the first-language API.