CN111090462A - API (application program interface) matching method and device based on API document - Google Patents

Publication number
CN111090462A
Authority
CN
China
Prior art keywords
api
information
description
similarity value
document
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911239725.5A
Other languages
Chinese (zh)
Other versions
CN111090462B (en)
Inventor
潘敏学
张天
张则君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University
Priority to CN201911239725.5A
Publication of CN111090462A
Application granted
Publication of CN111090462B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/73 Program documentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an API matching method and device based on API documents. The method extracts API information by analyzing the description document of an API. The API information includes input information, output information, and behavior information. Similarity is then calculated separately for the input information, output information, and behavior information of two APIs, and the results are combined to judge whether the two APIs match. By integrating input, output, and behavior information, the invention improves the accuracy of API matching.

Description

API (application program interface) matching method and device based on API document
Technical Field
The present invention relates to the field of automation of software design and development.
Background
Software developers often need to rewrite a project in a different programming language in order to migrate it to another platform. With the dramatic growth in the amount of software, relying solely on manual migration is time-consuming and laborious. Many code migration tools have been developed to speed up the migration of a project between languages, but they all face the challenge of API matching, i.e., how to match an API in language A to an API in language B.
To solve the API matching challenge, the mainstream approach is to obtain API mappings by analyzing and learning from project source code in different languages. However, this approach places strict requirements on the data set: for example, it requires multiple large-scale projects implemented identically in different languages, equivalent code fragments, and large API mapping data sets. Another existing approach, based on API documents, mainly uses statistics and text similarity to derive API mappings, but it does not make full use of the semantic information in the documents, such as formal parameter descriptions, return value descriptions, and API signatures, and therefore cannot match APIs well.
Therefore, the problem to be solved is how to achieve better API matching by making full use of the semantic information in API documents, while avoiding the strict data requirements of code-based approaches.
Disclosure of Invention
The problem to be solved by the invention is how to match the APIs of two languages with each other.
To solve this problem, the invention adopts the following scheme:
The API matching method based on the API document comprises the following steps:
S1: obtaining description documents of at least two APIs;
S2: extracting API information by analyzing the description document of each API;
the API information includes: input information, output information, and behavior information;
S3: calculating similarity separately for the input information, output information, and behavior information of the two APIs, and combining the results to judge whether the two APIs match;
the step S2 includes the following steps:
S21: extracting the API name, input parameters, and return type from the description document of the API;
S22: extracting keywords from the API function description text of the description document as the behavior information;
S23: extracting keywords from the API parameter description text of the description document, and forming the input information from these keywords and the corresponding input parameters;
S24: extracting keywords from the API return description text of the description document, and forming the output information from these keywords and the return type;
the step S3 includes:
S31: calculating the similarity of the keywords in the behavior information of the two APIs to obtain a first similarity value;
S32: calculating the similarity of the keywords and input parameters in the input information of the two APIs to obtain a second similarity value;
S33: calculating the similarity of the keywords and return types in the output information of the two APIs to obtain a third similarity value;
S34: taking a weighted average of the first, second, and third similarity values to obtain an API similarity value.
Further, according to the API matching method based on API documents of the present invention, in step S1 the description documents of APIs in two languages are obtained: a description document of an API of a first language and description documents of a plurality of APIs of a second language. Through steps S2 and S3, the API similarity value between the description document of the API of the first language and the description document of each API of the second language is calculated, and the description document of the second-language API with the highest API similarity value is then selected as the matching result for the description document of the first-language API.
The API matching device based on the API document comprises the following modules:
M1, used for: obtaining description documents of at least two APIs;
M2, used for: extracting API information by analyzing the description document of each API;
the API information includes: input information, output information, and behavior information;
M3, used for: calculating similarity separately for the input information, output information, and behavior information of the two APIs, and combining the results to judge whether the two APIs match;
the module M2 includes:
M21, used for: extracting the API name, input parameters, and return type from the description document of the API;
M22, used for: extracting keywords from the API function description text of the description document as the behavior information;
M23, used for: extracting keywords from the API parameter description text of the description document, and forming the input information from these keywords and the corresponding input parameters;
M24, used for: extracting keywords from the API return description text of the description document, and forming the output information from these keywords and the return type;
the module M3 includes:
M31, used for: calculating the similarity of the keywords in the behavior information of the two APIs to obtain a first similarity value;
M32, used for: calculating the similarity of the keywords and input parameters in the input information of the two APIs to obtain a second similarity value;
M33, used for: calculating the similarity of the keywords and return types in the output information of the two APIs to obtain a third similarity value;
M34, used for: taking a weighted average of the first, second, and third similarity values to obtain an API similarity value.
Further, according to the API matching device based on API documents of the present invention, in the module M1 the description documents of APIs in two languages are obtained: a description document of an API of a first language and description documents of a plurality of APIs of a second language. Through the modules M2 and M3, the API similarity value between the description document of the API of the first language and the description document of each API of the second language is calculated, and the description document of the second-language API with the highest API similarity value is then selected as the matching result for the description document of the first-language API.
The invention has the following technical effect: by integrating input, output, and behavior information, it improves the accuracy of API matching.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is an example of an API description document used as input in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings.
In this embodiment, the API matching method based on API documents is used to match APIs of the Java language with APIs of the Swift language. As is well known, mobile applications on the Android system are typically developed in Java, while mobile applications on Apple systems are typically developed in Swift. Specifically, given a Java API of a certain project, the method finds the corresponding Swift API from the set of Swift APIs of the same project. More specifically, the description document of the given Java API and the description documents of the APIs in the Swift API set are first obtained. The API similarity value between the Java description document and each Swift description document is then calculated, and the Swift API whose description document has the largest API similarity value is selected as the match for the Java API. As shown in FIG. 1, calculating the API similarity value of two description documents mainly involves two steps: step S2 extracts the input, output, and behavior information, and step S3 calculates the similarity from that information.
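The selection of the best-scoring candidate can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `best_match`, the dictionary layout of the candidates, and the toy overlap-count similarity in the usage lines are all assumptions introduced here.

```python
def best_match(source_info, candidates, similarity):
    """Return (name, score) of the candidate most similar to source_info.

    source_info -- extracted information of the first-language API
    candidates  -- dict mapping a second-language API name to its information
    similarity  -- callable(source_info, candidate_info) -> number (assumed)
    """
    if not candidates:
        raise ValueError("candidate set is empty")
    # Score every candidate once, then pick the highest score.
    scores = {name: similarity(source_info, info)
              for name, info in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Usage with hypothetical keyword sets and a toy overlap-count similarity.
java_api = {"insert", "element", "list"}
swift_apis = {
    "swiftInsert": {"insert", "element", "array"},
    "swiftRemove": {"remove", "element"},
}
result = best_match(java_api, swift_apis, lambda a, b: len(a & b))
```

When several candidates tie, `max` returns the first one in iteration order; a real tool might instead report all tied candidates.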
Step S2 specifically includes the following steps:
S21: extracting the API name, input parameters, and return type from the description document of the API;
S22: extracting keywords from the API function description text of the description document as the behavior information;
S23: extracting keywords from the API parameter description text of the description document, and forming the input information from these keywords and the corresponding input parameters;
S24: extracting keywords from the API return description text of the description document, and forming the output information from these keywords and the return type.
Taking FIG. 2 as an example, the API description in FIG. 2 defines an API named addAll. The description document of the API comprises four parts: the first part is the API definition text, namely "LinkedList Boolean addAll (LinkedList, int index, Collection c)"; the second part is the API function description text, i.e., the content under Description; the third part is the API parameter description text, i.e., the content under Parameters; the fourth part is the API return description text, i.e., the content under Return.
In step S21, the first part of the description document, the API definition text, is processed. The API name is extracted as addAll; the input parameters are { {LinkedList, anonym}, {int, index}, {Collection, c} }; the return type is Boolean. The input parameters are expressed as a set of {p_type, p_name} pairs, where p_type is the type of an input parameter and p_name is its name.
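The extraction in step S21 could be sketched as below. The sketch assumes the definition text always has the shape "Owner ReturnType name (params)" seen in the FIG. 2 example; the function name `parse_api_definition` and the regular expression are illustrative assumptions, not something specified by the source.

```python
import re


def parse_api_definition(definition):
    """Split an API definition line into name, parameters, and return type.

    Assumed layout: an optional owner type, the return type, the API name,
    and a parenthesized, comma-separated parameter list.
    """
    match = re.match(r"\s*(?:(\w+)\s+)?(\w+)\s+(\w+)\s*\((.*)\)\s*$", definition)
    if match is None:
        raise ValueError(f"unrecognized API definition: {definition!r}")
    _owner, return_type, name, raw_params = match.groups()

    params = []
    for raw in raw_params.split(","):
        parts = raw.split()
        if not parts:
            continue  # empty parameter list
        p_type = parts[0]
        # A parameter given only by its type gets the placeholder name "anonym".
        p_name = parts[1] if len(parts) > 1 else "anonym"
        params.append((p_type, p_name))
    return {"name": name, "params": params, "return_type": return_type}


info = parse_api_definition(
    "LinkedList Boolean addAll (LinkedList, int index, Collection c)"
)
```

Real API documents use richer syntax (generics, modifiers, Swift argument labels), so a production parser would need a language-specific grammar rather than one regular expression.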
In step S22, the second part of the description document, the API function description text, is processed as follows: stop words and numbers are removed from the first sentence, and each remaining word is stemmed to obtain the keywords. In the example of FIG. 2, the stemmed keywords "insert", "element", "spec", "collect", "list", "start", "posit" are obtained. The set formed by these stemmed keywords is the behavior information.
In step S23, the third part of the description document, the API parameter description text, is processed as follows: for each input parameter, stop words and numbers are removed from the first sentence of its description, each remaining word is stemmed to obtain the keywords, and the keywords are then associated with that parameter. In the example of FIG. 2, the stemmed keywords "insert", "first", "element", "spec", "collect" are obtained for the input parameter {int, index}, and the stemmed keywords "collect", "contact", "element", "add", "list" are obtained for the input parameter {Collection, c}. The resulting input information can be divided into two parts: the first part is the input parameter type information, namely { "LinkedList", "int", "Collection" }; the second part is the input parameter semantic information, namely { }, { "insert", "first", "element", "spec", "collect" }, { "collect", "contact", "element", "add", "list" }.
In step S24, the fourth part of the description document, the API return description text, is processed as follows: stop words and numbers are removed from the first sentence, and each remaining word is stemmed to obtain the keywords. In the example of FIG. 2, the output information is { Boolean, "true", "list", "change", "result" }.
It should be noted that, in steps S22, S23, and S24, removing stop words and numbers from text and stemming the words in text are familiar to those skilled in the art, so the specific processing is not described in detail here.
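The keyword extraction shared by steps S22 to S24 can be sketched as follows. The source does not name a stop-word list or a word-normalization algorithm, so both are assumptions here: `STOP_WORDS` is a tiny hand-picked list and `crude_stem` is a toy suffix stripper standing in for a real stemmer such as Porter's. The sample sentence is hypothetical, and the sketch's output is not guaranteed to equal the keywords listed for FIG. 2.

```python
import re

# Assumption: a tiny illustrative stop-word list, not the one used by the authors.
STOP_WORDS = {"a", "all", "an", "at", "in", "into", "is", "of", "the", "this", "to"}

# Assumption: toy suffix list; a real system would use a proper stemmer.
SUFFIXES = ("ing", "ion", "ed", "s")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(sentence):
    """Lowercase, keep alphabetic tokens only (digits are dropped),
    remove stop words, and reduce each remaining word to a crude stem."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return {crude_stem(token) for token in tokens if token not in STOP_WORDS}


# Hypothetical first sentence of a function description.
keywords = extract_keywords(
    "Inserts all of the elements in the specified collection into this list, "
    "starting at the specified position."
)
```

Returning a set matches the text's description of behavior information as a keyword set, and it discards repeated words such as the second "specified".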
Step S3 calculates similarity values for the behavior information, the input information, and the output information separately, and then combines them to obtain an API similarity value. Specifically:
S31: calculating the similarity of the keywords in the behavior information of the two APIs to obtain a first similarity value;
S32: calculating the similarity of the keywords and input parameters in the input information of the two APIs to obtain a second similarity value;
S33: calculating the similarity of the keywords and return types in the output information of the two APIs to obtain a third similarity value;
S34: taking a weighted average of the first, second, and third similarity values to obtain an API similarity value.
In step S31, the first similarity value is the similarity value between the sets of stemmed keywords that constitute the behavior information of the two APIs. Similarity calculation for keyword sets is well known to those skilled in the art and is not described in detail here.
In step S32, a similarity value is first calculated for the input parameter type information, then a similarity value is calculated for the input parameter semantic information, and finally the two are integrated.
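The source leaves the choice of set-similarity measure open. As one possible instance, and purely as an assumption, the sketch below uses the Jaccard index (size of the intersection divided by size of the union); the convention that two empty sets have similarity 1.0 is also an assumption made here.

```python
def jaccard(keywords_a, keywords_b):
    """Jaccard index of two keyword collections, in the range [0, 1]."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 1.0  # assumed convention: two empty sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical behavior-information keyword sets of two APIs.
first_similarity = jaccard({"insert", "element"}, {"insert", "list"})
```

Other measures, such as cosine similarity over term vectors, would fit the same place in the pipeline.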
The similarity value is calculated for the input parameter type information, i.e. the similarity value between the input parameter type information of the two APIs is calculated. For API matching, the two matching API input parameter types must be consistent, and therefore, the similarity value between the input parameter type information of the two APIs is either 1 or 0. If the input parameter type information of the two APIs is the same, the similarity is 1, and otherwise, the similarity is 0. Specifically, in the example of fig. 2, it is sufficient to compare whether the input parameter type information of another API is the same as { "LinkedList", "int", "Collection" }.
The similarity value for the input parameter semantic information is calculated by comparing the input parameter semantic information of the two APIs. Because there are several input parameters, a similarity value can be calculated for each input parameter independently and the values then averaged. In the example of FIG. 2, similarity is calculated between each of the keyword sets { }, { "insert", "first", "element", "spec", "collect" }, { "collect", "contact", "element", "add", "list" } and the keyword set of the corresponding input parameter of the other API, and the average of these similarity values is then taken. Those skilled in the art will understand that the semantic information of all input parameters may instead be merged before calculating a single similarity value. In the example of FIG. 2, merging yields the keyword set { "insert", "first", "element", "spec", "collect", "contact", "add", "list" }, which is then compared with the merged keyword set of the other API's input parameters to calculate the similarity value.
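The two ways of comparing input parameter semantic information can be sketched side by side. This is an illustration under assumptions: parameters are paired by position, the set-similarity measure is the Jaccard index (the source does not fix one), and the second API's keyword sets are invented for the example.

```python
def jaccard(a, b):
    """Assumed set-similarity measure; two empty sets count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def per_parameter_similarity(params_a, params_b):
    """Average the similarity of positionally paired parameter keyword sets."""
    if len(params_a) != len(params_b):
        raise ValueError("parameter counts differ")
    if not params_a:
        return 1.0
    pairs = zip(params_a, params_b)
    return sum(jaccard(x, y) for x, y in pairs) / len(params_a)


def merged_similarity(params_a, params_b):
    """Merge all parameter keyword sets of each API, then compare once."""
    return jaccard(set().union(*params_a), set().union(*params_b))


# Keyword sets from the FIG. 2 example, and a hypothetical second API.
api_a = [set(),
         {"insert", "first", "element", "spec", "collect"},
         {"collect", "contact", "element", "add", "list"}]
api_b = [set(), {"insert", "element"}, {"collect", "list"}]

averaged = per_parameter_similarity(api_a, api_b)
merged = merged_similarity(api_a, api_b)
```

The two variants can disagree: averaging rewards parameters that line up one to one, while merging ignores which parameter a keyword came from.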
The integration between the similarity values calculated for the input parameter type information and the similarity values calculated according to the input parameter semantic information can generally be performed in two ways: the first is a weighted average and the second is a calculated product. Considering that the two matching API input parameter types must be kept consistent, the latter implementation manner is adopted in this embodiment, that is, assuming that the similarity value calculated for the input parameter type information is a, and the similarity value calculated according to the input parameter semantic information is b, then the second similarity value after the two are integrated is a × b.
In addition, another preferable mode is that if the two API input parameter type information are different, the second similarity value is directly taken as 0, otherwise, the similarity value is calculated according to the input parameter semantic information and taken as the second similarity value.
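The two integration options for the second similarity value can be sketched as follows. The function names are hypothetical, and comparing type lists by exact equality is an assumption; matching Java types against Swift types would in practice need some mapping between the two type systems, which the source does not detail.

```python
def type_similarity(types_a, types_b):
    """1.0 if the ordered parameter type lists are identical, else 0.0."""
    return 1.0 if list(types_a) == list(types_b) else 0.0


def second_similarity_product(types_a, types_b, semantic_value):
    """Integrate by multiplication: a * b, with a in {0, 1}."""
    return type_similarity(types_a, types_b) * semantic_value


def second_similarity_short_circuit(types_a, types_b, compute_semantic):
    """Skip the semantic comparison entirely when the types differ."""
    if type_similarity(types_a, types_b) == 0.0:
        return 0.0
    return compute_semantic()


types = ["LinkedList", "int", "Collection"]
same = second_similarity_product(types, types, 0.6)
different = second_similarity_product(types, ["LinkedList", "int"], 0.6)
lazy = second_similarity_short_circuit(types, types, lambda: 0.6)
```

Both variants give the same value; the short-circuit form only saves the cost of the semantic comparison when the types already rule out a match.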
Step S33 is similar to step S32. First, the return types of the two APIs are compared: if they are not consistent, the third similarity value is 0; otherwise, the similarity value calculated from the stemmed keyword sets in the output information is taken as the third similarity value. In the example of FIG. 2, the similarity value between the keyword set { "true", "list", "change", "result" } and the stemmed keyword set in the output information of the other API is calculated to obtain the third similarity value.
Step S34 can be formulated as $s = w_1 \times s_1 + w_2 \times s_2 + w_3 \times s_3$, where $s_1$, $s_2$, $s_3$ are the first, second, and third similarity values, respectively; $w_1$, $w_2$, $w_3$ are the corresponding weighting coefficients; and $s$ is the API similarity value. The weighting coefficients $w_1$, $w_2$, $w_3$ are preset and satisfy $w_1 + w_2 + w_3 = 1$.
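A minimal sketch of the weighted average in step S34 follows. The default weights are placeholders chosen only for illustration; the source says the coefficients are preset but does not give their values.

```python
def api_similarity(s1, s2, s3, weights=(0.4, 0.3, 0.3)):
    """Weighted average of the behavior (s1), input (s2), and output (s3)
    similarity values. The default weights are hypothetical."""
    w1, w2, w3 = weights
    if abs(w1 + w2 + w3 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w1 * s1 + w2 * s2 + w3 * s3


# Example: behavior 0.5, input 1.0, output 0.0 with explicit weights.
score = api_similarity(0.5, 1.0, 0.0, weights=(0.5, 0.25, 0.25))
```

Because the weights sum to 1 and each similarity value lies in [0, 1], the combined score also stays in [0, 1], which keeps scores comparable across candidate APIs.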

Claims (4)

1. An API matching method based on an API document, characterized by comprising the following steps:
S1: obtaining description documents of at least two APIs;
S2: extracting API information by analyzing the description document of each API;
the API information includes: input information, output information, and behavior information;
S3: calculating similarity separately for the input information, output information, and behavior information of the two APIs, and combining the results to judge whether the two APIs match;
the step S2 includes:
S21: extracting the API name, input parameters, and return type from the description document of the API;
S22: extracting keywords from the API function description text of the description document as the behavior information;
S23: extracting keywords from the API parameter description text of the description document, and forming the input information from these keywords and the corresponding input parameters;
S24: extracting keywords from the API return description text of the description document, and forming the output information from these keywords and the return type;
the step S3 includes:
S31: calculating the similarity of the keywords in the behavior information of the two APIs to obtain a first similarity value;
S32: calculating the similarity of the keywords and input parameters in the input information of the two APIs to obtain a second similarity value;
S33: calculating the similarity of the keywords and return types in the output information of the two APIs to obtain a third similarity value;
S34: taking a weighted average of the first, second, and third similarity values to obtain an API similarity value.
2. The API matching method based on an API document as recited in claim 1, wherein in said step S1 the description documents of APIs in two languages are obtained: a description document of an API of a first language and description documents of a plurality of APIs of a second language; through the steps S2 and S3, the API similarity value between the description document of the API of the first language and the description document of each API of the second language is calculated, and the description document of the second-language API with the highest API similarity value is then selected as the matching result for the description document of the first-language API.
3. An API matching device based on an API document, characterized by comprising the following modules:
M1, used for: obtaining description documents of at least two APIs;
M2, used for: extracting API information by analyzing the description document of each API;
the API information includes: input information, output information, and behavior information;
M3, used for: calculating similarity separately for the input information, output information, and behavior information of the two APIs, and combining the results to judge whether the two APIs match;
the module M2 includes:
M21, used for: extracting the API name, input parameters, and return type from the description document of the API;
M22, used for: extracting keywords from the API function description text of the description document as the behavior information;
M23, used for: extracting keywords from the API parameter description text of the description document, and forming the input information from these keywords and the corresponding input parameters;
M24, used for: extracting keywords from the API return description text of the description document, and forming the output information from these keywords and the return type;
the module M3 includes:
M31, used for: calculating the similarity of the keywords in the behavior information of the two APIs to obtain a first similarity value;
M32, used for: calculating the similarity of the keywords and input parameters in the input information of the two APIs to obtain a second similarity value;
M33, used for: calculating the similarity of the keywords and return types in the output information of the two APIs to obtain a third similarity value;
M34, used for: taking a weighted average of the first, second, and third similarity values to obtain an API similarity value.
4. The API matching device based on an API document as set forth in claim 3, wherein in said module M1 the description documents of APIs in two languages are obtained: a description document of an API of a first language and description documents of a plurality of APIs of a second language; through the modules M2 and M3, the API similarity value between the description document of the API of the first language and the description document of each API of the second language is calculated, and the description document of the second-language API with the highest API similarity value is then selected as the matching result for the description document of the first-language API.
CN201911239725.5A 2019-12-06 2019-12-06 API (application program interface) matching method and device based on API document Active CN111090462B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911239725.5A CN111090462B (en) 2019-12-06 2019-12-06 API (application program interface) matching method and device based on API document

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911239725.5A CN111090462B (en) 2019-12-06 2019-12-06 API (application program interface) matching method and device based on API document

Publications (2)

Publication Number Publication Date
CN111090462A (en) 2020-05-01
CN111090462B (en) 2021-04-30

Family

ID=70394819

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911239725.5A Active CN111090462B (en) 2019-12-06 2019-12-06 API (application program interface) matching method and device based on API document

Country Status (1)

Country Link
CN (1) CN111090462B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112650833A (en) * 2020-12-25 2021-04-13 哈尔滨工业大学(深圳) API (application program interface) matching model establishing method and cross-city government affair API matching method

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070255718A1 (en) * 2006-04-28 2007-11-01 Sap Ag Method and system for generating and employing a dynamic web services interface model
CN101547218A (en) * 2009-05-07 2009-09-30 南京大学 Multi-stage semantic Web service finding method
CN101567005A (en) * 2009-05-07 2009-10-28 浙江大学 Semantic service registration and query method based on WordNet
CN102780580A (en) * 2012-06-21 2012-11-14 东南大学 Trust-based composite service optimization method
CN103036931A (en) * 2011-09-30 2013-04-10 富士通株式会社 Generating equipment and method of semantic network service document and web ontology language (OWL) concept analysis method
CN103473243A (en) * 2012-06-08 2013-12-25 富士通株式会社 Method and device for generating semantic network service document

Also Published As

Publication number Publication date
CN111090462B (en) 2021-04-30

Similar Documents

Publication Publication Date Title
CN110502361B (en) Fine granularity defect positioning method for bug report
CN110348214B (en) Method and system for detecting malicious codes
CN107301170B (en) Method and device for segmenting sentences based on artificial intelligence
CN111680494A (en) Similar text generation method and device
CN104008166A (en) Dialogue short text clustering method based on form and semantic similarity
CN112084794A (en) Tibetan-Chinese translation method and device
JP6738769B2 (en) Sentence pair classification device, sentence pair classification learning device, method, and program
CN111210432A (en) Image semantic segmentation method based on multi-scale and multi-level attention mechanism
CN108536735A (en) Multi-modal lexical representation method and system based on multichannel self-encoding encoder
CN111723192B (en) Code recommendation method and device
CN111090462B (en) API (application program interface) matching method and device based on API document
CN111783843A (en) Feature selection method and device and computer system
CN111354354B (en) Training method, training device and terminal equipment based on semantic recognition
CN113220996B (en) Scientific and technological service recommendation method, device, equipment and storage medium based on knowledge graph
CN113935387A (en) Text similarity determination method and device and computer readable storage medium
CN117573084A (en) Code complement method based on layer-by-layer fusion abstract syntax tree
KR102474042B1 (en) Method for analyzing association of diseases using data mining
CN115794105A (en) Micro-service extraction method and device and electronic equipment
CN112988999A (en) Construction method, device, equipment and storage medium of Buddha question and answer pair
CN109783820B (en) Semantic parsing method and system
CN112597776A (en) Keyword extraction method and system
CN117435246B (en) Code clone detection method based on Markov chain model
CN112559841A (en) Method and system for processing item comments, electronic equipment and readable storage medium
CN118587017B (en) Big data marketing service method and system based on multi-mode generation type artificial intelligence
CN118332387B (en) Classification method of text content classification system based on BERT model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant