CN112650833A - API (application program interface) matching model establishing method and cross-city government affair API matching method - Google Patents

API (application program interface) matching model establishing method and cross-city government affair API matching method

Info

Publication number
CN112650833A
CN112650833A
Authority
CN
China
Prior art keywords
api
similarity
apis
matching
short
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011558922.6A
Other languages
Chinese (zh)
Inventor
李旭涛
龙永深
陈武桥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202011558922.6A priority Critical patent/CN112650833A/en
Publication of CN112650833A publication Critical patent/CN112650833A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/3332 Query translation
    • G06F 16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00 Machine learning
    • G06N 20/20 Ensemble learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an API matching model establishing method and a cross-city government affair API matching method. The model establishing method comprises the following steps: acquiring training samples, wherein each training sample comprises the description texts of two APIs (application program interfaces) and the description text of each API consists of at least one short text; calculating the semantic similarity between the short texts of the two APIs in each training sample; constructing, from those semantic similarities, a similarity vector corresponding to each training sample; and inputting the similarity vectors into a preset model for training until the loss function of the preset model converges, the converged preset model being taken as the API matching model. By converting the description information of each API into similarities and using the semantic similarity between APIs as the input data for model training, the invention effectively improves matching accuracy and realizes a high-accuracy, high-efficiency, highly automated API matching scheme.

Description

API (application program interface) matching model establishing method and cross-city government affair API matching method
Technical Field
The invention relates to the technical field of machine learning, and in particular to an API matching model establishing method and a cross-city government affair API matching method.
Background
In recent years, local governments across China have launched government data opening platforms, from which many applications based on open government data have been derived. However, since there is no unified standard among the APIs provided by the various governments' open data platforms, it is difficult for developers to find equivalent APIs across cities: the APIs must be searched and screened manually, which is inefficient and prone to omissions. This leads to higher development costs when building cross-city applications and higher migration costs when an application moves from one city to another.
Currently, besides manual search and matching, the existing technology also uses string matching to find equivalent APIs between cities, for example by computing the longest common substring or the edit distance. The string similarity between the description text of the API to be matched and that of every API of the target city is calculated one by one, and the several APIs with the highest similarity are returned to the user. However, owing to the richness of Chinese expression and the inconsistent naming of government APIs, the accuracy of string matching is low. Moreover, this approach does not minimize manual participation and achieves only semi-automatic matching.
Disclosure of Invention
The invention addresses the problem that the matching accuracy of existing API matching methods is too low.
In order to solve this problem, the invention provides an API matching model establishing method and a cross-city government affair API matching method.
The invention provides an API matching model establishing method, which comprises the following steps:
acquiring training samples, wherein each training sample comprises the description texts of two APIs (application program interfaces), and the description text of each API consists of at least one short text;
calculating semantic similarity between short texts of two APIs in each training sample;
according to semantic similarity between short texts of two APIs in each training sample, constructing a similarity vector corresponding to each training sample;
and inputting the similarity vector into a preset model for training until the loss function of the preset model converges, and taking the preset model with the converged loss function as the API matching model.
Optionally, the description text contains a name short text, a keyword short text and at least one return parameter name short text;
the calculating the semantic similarity between short texts of two APIs in each training sample comprises:
in each training sample, the short texts of the return parameter names of the two APIs are arranged and combined to obtain a plurality of short text pairs of the return parameter names;
calculating the similarity of all the returned parameter name short text pairs;
and calculating the similarity of the short texts of the two API names and the similarity of the short texts of the keywords.
Optionally, the constructing a similarity vector corresponding to each training sample according to semantic similarity between short texts of two APIs in each training sample includes:
selecting a preset number of similarities with the maximum similarity from the similarities of all the returned parameter name short text pairs, wherein the preset number is marked as N;
and combining the preset number of similarities with the similarity of the name short text, the similarity of the keyword short text and a training sample label to form a 1 × M-dimensional similarity vector, wherein M = N + 3.
Optionally, the API matching model is an XGBoost model.
Optionally, the calculating semantic similarity between short texts of two APIs in each training sample includes:
performing word segmentation processing on each short text of the two APIs respectively through a Jieba word segmentation algorithm to obtain a word set after word segmentation processing;
mapping each word in the set of words into a vector using a FastText algorithm;
calculating the TextRank value of each word in the word set;
obtaining sentence vectors of short texts of two APIs (application program interfaces) according to the vectors and the TextRank values of the words in the word set;
and calculating semantic similarity between sentence vectors of the short texts of the two APIs, wherein the semantic similarity between the sentence vectors of the short texts of the two APIs is the semantic similarity between the short texts of the two APIs.
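The pipeline above (Jieba segmentation, FastText word vectors, TextRank weights, weighted sentence vectors, then similarity) can be sketched in a few lines. In the sketch below, toy two-dimensional embeddings stand in for FastText vectors, pre-tokenized words stand in for Jieba output, and the per-word weights are made-up stand-ins for TextRank scores; only the weighted-average and cosine-similarity steps are implemented as described.

```python
import math

def sentence_vector(words, embeddings, weights):
    """Weighted average of word vectors; in the patent's pipeline the
    weights would come from TextRank and the embeddings from FastText."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    total = 0.0
    for w in words:
        if w in embeddings:
            wt = weights.get(w, 1.0)
            total += wt
            for i, x in enumerate(embeddings[w]):
                vec[i] += wt * x
    return [x / total for x in vec] if total else vec

def cosine(a, b):
    """Cosine similarity between two sentence vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy embeddings and stand-in TextRank scores (illustrative values only)
emb = {"air": [1.0, 0.0], "quality": [0.5, 0.5], "index": [0.0, 1.0], "data": [0.2, 0.8]}
wts = {"air": 0.9, "quality": 0.8, "index": 0.4, "data": 0.3}

s1 = sentence_vector(["air", "quality", "index"], emb, wts)
s2 = sentence_vector(["air", "quality", "data"], emb, wts)
print(cosine(s1, s2))  # high similarity: the two short texts share most words
```

The generic words ("data", "index") carry low weights, so the shared high-weight words dominate the similarity, which is the effect the TextRank weighting is meant to achieve.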
The invention also provides an API matching method, which comprises the following steps:
obtaining description texts of two APIs to be matched, wherein the description text of each API consists of at least one short text;
calculating semantic similarity between short texts of the two APIs to be matched;
determining similarity vectors corresponding to the two APIs to be matched according to semantic similarity between short texts of the two APIs to be matched;
and inputting the similarity vectors corresponding to the two APIs to be matched into an API matching model, wherein the output result of the API matching model is the matching result of the two APIs to be matched, and the API matching model is generated based on the API matching model establishing method.
Optionally, after obtaining the description texts of the two APIs to be matched and before calculating the semantic similarity between the short texts of the two APIs to be matched, the method further includes:
identifying whether the description text of each API contains short text of the geographic position qualifier;
and when the description text of any API contains the short text of the geographic position limiting word, removing the short text of the geographic position limiting word.
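A sketch of this removal step; the qualifier list is a made-up example, and a real system would hold the full set of city and district names it serves.

```python
# Hypothetical list of geographic limiting words (illustrative only)
GEO_QUALIFIERS = ["深圳", "深圳市", "哈尔滨", "哈尔滨市", "广东省"]

def strip_geo_qualifiers(short_text: str) -> str:
    """Remove geographic limiting words so that two cities' APIs compare
    on content rather than on place names; longest qualifiers are removed
    first so that e.g. "深圳市" is not left as a dangling "市"."""
    for q in sorted(GEO_QUALIFIERS, key=len, reverse=True):
        short_text = short_text.replace(q, "")
    return short_text

print(strip_geo_qualifiers("深圳市空气质量指数"))  # -> "空气质量指数"
```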
The invention also provides a cross-city government affair API matching method, which comprises the following steps:
receiving a cross-city government affair API matching query request, and obtaining a query API and a query range based on the cross-city government affair API matching query request;
traversing a preset government affair API database, respectively forming an API pair to be matched by the query API and each API in the query range in the preset government affair API database, and judging whether the API pair to be matched is matched based on the API matching method;
and outputting all the APIs which are matched with the query API in the query range in the preset government affair API database.
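The traversal described above can be sketched as follows; the database entries and city names are made up, and `is_match` stands in for the trained API matching model's prediction.

```python
def find_matching_apis(query_api, query_cities, api_db, is_match):
    """Pair the query API with every API of the target cities and keep the matches."""
    results = []
    for api in api_db:
        if api["city"] in query_cities and is_match(query_api, api):
            results.append(api)
    return results

# Hypothetical government affair API database (illustrative only)
api_db = [
    {"city": "B", "name": "air quality index"},
    {"city": "B", "name": "parking lot list"},
    {"city": "C", "name": "air quality data"},
]
query = {"city": "A", "name": "air quality index"}

# Toy matcher standing in for the model: names share at least two words
toy_match = lambda q, a: len(set(q["name"].split()) & set(a["name"].split())) >= 2

print([a["name"] for a in find_matching_apis(query, {"B", "C"}, api_db, toy_match)])
# -> ['air quality index', 'air quality data']
```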
The invention provides an electronic device, comprising a memory and a processor; the memory for storing a computer program; the processor, when executing the computer program, is configured to implement an API matching model building method as described above or an API matching method as described above or a cross-city government API matching method as described above.
The present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements an API matching model building method as described above or an API matching method as described above or a cross-city government API matching method as described above.
The API matching model establishing method and the cross-city government affair API matching method have the beneficial effects that: the method solves the problem of cost of manual matching by using a machine learning method to carry out automatic API matching, and simultaneously, in the aspect of feature extraction, the API description text similarity is adopted as a feature, and the description information of the API is converted into a similarity vector, so that the matching accuracy is effectively improved, and finally, a high-accuracy, high-efficiency and high-automation API matching scheme is realized.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of an API matching model building method according to the present invention;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of the API matching model building method of the present invention;
FIG. 3 is a schematic flowchart of another embodiment of a method for creating an API matching model according to the present invention;
FIG. 4 is a schematic diagram of the construction process from an API pair to a similarity vector;
FIG. 5 is a flowchart illustrating an embodiment of an API matching method of the present invention;
FIG. 6 is a flowchart illustrating an API matching method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an embodiment of the cross-city API matching system;
FIG. 8 is a schematic diagram of another embodiment of the cross-city API matching system.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
According to statistics in the China Local Government Open Data Report, by the first half of 2019 there were 82 government data opening platforms in China, which had opened 62,801 data sets, an increase of nearly sevenfold over 2017. Meanwhile, more and more open platforms provide callable APIs (application programming interfaces) for their data sets, so that developers can integrate these APIs into urban service applications and obtain the latest data in real time. The problem is that the various governments' open data platforms follow no uniform standard when defining APIs, which hinders developers who need to use APIs of the same type. When an urban service application needs to integrate the APIs of multiple cities, or when an application migrates from one city to another, developers must find APIs of the same type on different open platforms; this is called cross-city API matching. The non-normative naming of APIs and the diversity of Chinese expression make cross-city matching challenging, which in turn makes application migration difficult. Yet migrating successful applications from other cities is one of the most economical and efficient ways for a city to extend its urban service system. The problem of cross-city API matching therefore urgently needs to be solved.
The disclosed embodiments provide an API matching model establishing method, which may be executed by a processor in an electronic device; the electronic device may be implemented as, for example, a smart phone, a tablet computer, or a computer device serving as a server. FIG. 1 is a flowchart of the API matching model establishing method according to an embodiment of the present invention. The method comprises the following steps:
step S10, obtaining training samples, wherein one training sample comprises description texts of two APIs, and the description text of each API is composed of at least one short text.
The description text of an API contains at least one short text. For example, the description text may comprise one or more of the API name, the API keywords, and the API return parameter names; since an API may have more than one return parameter, there may be several return-parameter-name short texts. For cross-city API matching, the description text may also contain geographic limiting words such as the city or region to which the API belongs; when constructing training samples, these geographic qualifiers can be automatically identified and removed so as to improve the matching accuracy of the trained model. As open data platforms improve, the number and types of short texts contained in an API's description text may change; such changed description texts can equally be applied to the embodiments of the present disclosure and fall within its scope.
The training samples are labeled samples, i.e., each carries a label identifying whether it is a positive or a negative sample.
And step S20, calculating semantic similarity between short texts of the two APIs in each training sample.
The semantic similarity between the short texts of two APIs refers to the semantic similarity between their corresponding short texts. For example, if each API's description contains three kinds of short text (the API name, the API keywords, and the API return parameter names), then the semantic similarity between the short texts of the two APIs means: the similarity of one API's name to the other's name, of one API's keywords to the other's keywords, and of one API's return parameter names to the other's return parameter names.
It should be noted that the invention is applied to cross-city government affair API matching, and owing to the limitations of government data platforms, only the keywords, the API name, the API return parameter names, and the like can be obtained. These are all short texts containing different words, some of which are highly generic and do not characterize the current short text well. The method therefore uses semantic similarity as the short-text similarity, which weakens meaningless words and reduces the interference caused by Chinese synonyms, so that the similarity of corresponding short texts between two APIs is represented more accurately and the accuracy of API matching is greatly improved.
And step S30, constructing a similarity vector corresponding to each training sample according to semantic similarity between short texts of two APIs in each training sample.
For a training sample, calculating the semantic similarity between the short texts of its two APIs yields several semantic similarities; the number obtained is greater than or equal to the number of short texts contained in one API's description text.
To facilitate subsequent model training and simplify the form of the model input, these semantic similarities are integrated into a single similarity vector: each training sample corresponds to one similarity vector, and during training the similarity vector of each sample is input into the model to represent that sample.
The similarity vector corresponding to each training sample can be constructed from the semantic similarities in several ways. In one embodiment, the semantic similarities between all the short texts of the two APIs are combined into the similarity vector. In another embodiment, several of the semantic similarities are selected and integrated into the vector. In yet another embodiment, a weight is assigned to each short text, the final similarity of each short text is calculated from its similarity and weight, and these final similarities are combined into the vector. Converting the description texts of the APIs into similarity vectors effectively improves the training and prediction speed of the model, as well as the accuracy and efficiency of API matching.
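The weighted variant above (assign each short text a weight, combine the weighted similarities) can be sketched as follows; the field names and weight values are illustrative assumptions, not values given by the invention.

```python
def weighted_similarity_vector(field_sims, field_weights):
    """Scale each short text's similarity by its assumed field weight and
    combine the results into a vector (third construction variant)."""
    return [field_sims[f] * field_weights[f] for f in field_sims]

# Hypothetical per-field similarities and weights (illustrative only)
sims = {"name": 0.9, "keyword": 0.7, "return_params": 0.8}
weights = {"name": 0.5, "keyword": 0.2, "return_params": 0.3}

print(weighted_similarity_vector(sims, weights))
```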
And step S40, inputting the similarity vector into a preset model for training until the loss function of the preset model is converged, and taking the preset model with the converged loss function as an API (application program interface) matching model.
The preset model is an initialized machine learning model; the API matching model trained from it performs automatic binary classification, and its output indicates whether the two APIs match or not. The preset model/API matching model may be a LightGBM model, a gradient boosting decision tree (GBDT) implementation whose principles are well known and are not described here.
Optionally, the preset model/API matching model is an XGBoost model. Compared with traditional machine learning classifiers such as logistic regression and support vector machines, XGBoost can automatically handle missing values. Regarding overfitting, traditional methods generally lack built-in protection and therefore place higher demands on the quantity and purity of the training data, the feature dimensionality, the model complexity, and so on, whereas XGBoost uses regularization to prevent overfitting and thus relaxes these requirements; using an XGBoost model for API matching yields a better matching effect and higher accuracy.
The loss function of the preset model may be the logarithmic loss (logloss) or another user-defined loss function.
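The logarithmic loss has a standard closed form; a plain-Python version, for reference:

```python
import math

def logloss(y_true, y_pred, eps=1e-15):
    """Logarithmic loss: -(1/n) * sum(y*log(p) + (1-y)*log(1-p))."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

print(round(logloss([1, 0, 1], [0.9, 0.1, 0.8]), 4))  # -> 0.1446
```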
The prior art suffers from low API matching accuracy, high labor cost, low efficiency, and similar defects, and cannot be widely applied in production. Addressing these defects, the invention performs automatic API matching with a machine learning method, eliminating the cost of manual matching. For feature extraction, the similarity of each short text of the API description text is adopted as a feature: because the semantic similarities between corresponding short texts correlate strongly with whether two APIs match, the description texts are converted into a similarity vector built from those similarities, which effectively improves matching accuracy and finally realizes a high-accuracy, high-efficiency, highly automated API matching scheme.
The invention uses semantic similarity as the similarity between the short texts of two APIs and trains an automatic API matching model with a machine learning algorithm. This realizes automatic API matching, saves cost, improves efficiency, and overcomes the low accuracy, poor practical effect, and heavy reliance on manual work of existing methods. For cross-city API matching in particular, it can assist the development of applications based on open government data and accelerate the construction of smart cities.
Optionally, the description text contains a name short text, a keyword short text and at least one return parameter name short text; as shown in fig. 2, the step S20 includes:
step S201, permutation and combination are performed on the short texts of the return parameter names of the two APIs in each training sample, so as to obtain a plurality of pairs of short texts of the return parameter names.
The number and order of the return parameter names of different APIs may differ, so the return parameter names of the two APIs are paired by permutation and combination, and the similarity of each resulting pair is calculated separately, so that the return-parameter-name similarities are computed accurately. Since a return-parameter-name short text may contain several return parameter names, and each semantic similarity is computed between one name from each API, the return-parameter-name short texts may yield multiple similarities.
Step S202, calculating the similarity of all the returned parameter name short text pairs.
For two APIs that share identical or highly similar return parameter names, even when the number and order of their return parameter names differ, computing the similarity over all pairs identifies the shared names, and taking the preset number of largest similarities avoids misjudgments caused by differing counts and orderings, further improving matching accuracy.
For two APIs that share no identical or highly similar return parameter names, the preset number of largest similarities still represents the similarity of their return-parameter-name short texts well.
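The pairing and top-N selection of steps S201 to S202 can be sketched with `itertools.product`. The parameter names below are made up, and a character-level Jaccard score stands in for the sentence-vector similarity described earlier.

```python
from itertools import product

def top_param_similarities(params_a, params_b, sim, n=8):
    """Score every return-parameter-name pair from the two APIs and keep
    the n largest similarities (permutation-and-combination step)."""
    sims = sorted((sim(a, b) for a, b in product(params_a, params_b)),
                  reverse=True)
    return sims[:n]

# Toy similarity: Jaccard overlap of characters (stand-in only)
jaccard = lambda a, b: len(set(a) & set(b)) / len(set(a) | set(b))

# Hypothetical return parameter names of two air-quality APIs
la = ["stationName", "pm25", "pm10", "so2", "no2", "time"]
lb = ["station", "pm2_5", "o3", "updateTime"]

print(top_param_similarities(la, lb, jaccard, n=3))
```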
Step S203, calculating the similarity of the short texts of the two API names and the similarity of the short texts of the keywords.
The order in which the similarities of the name short text, the keyword short text, and the return-parameter-name short texts are calculated is not limited here: the name and keyword similarities may be computed before, after, or simultaneously with the return-parameter-name similarities, and the order between the name and keyword similarities is likewise not limited.
When the description text contains a name short text, a keyword short text, and return-parameter-name short texts, the similarity of each is calculated separately to obtain a similarity set; a similarity vector is then generated from this set in the subsequent steps, comprehensively representing the similarity between the two APIs' description texts, ensuring accurate feature construction, and improving both the accuracy of model training and of prediction.
Optionally, the step S30 includes: selecting a preset number of similarities with the maximum similarity from the similarities of all the returned parameter name short text pairs, wherein the preset number is marked as N; and combining the similarity of the preset number with the similarity of the name short text, the similarity of the keyword short text and a training sample label to form a similarity vector of 1 × M dimension, wherein M is N + 3.
The number of return-parameter-name pairs obtained after permutation and combination may be large, while the preset number of largest similarities already represents the similarity of the two APIs' return parameter names accurately. Limiting the number of similarities that finally represent this field therefore prevents the similarity vector fed into the preset model from growing too large, ensuring training and prediction efficiency; the construction of the similarity vector thus balances the accuracy and the efficiency of model training.
When the number of similarities over all return-parameter-name pairs is larger than the preset number, the preset number of largest similarities are taken as the similarities of the two APIs' return-parameter-name short texts and the redundant similarities are discarded; when it is smaller than the preset number, all the pair similarities are taken and one or more similarities filled with 0 are added until the preset number is reached. The preset number may be chosen between 6 and 10.
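A sketch of this truncate-or-pad rule and of the vector assembly; the input similarities are illustrative, and the label is placed at the tail end here (the description notes it may equally go at the head).

```python
def build_similarity_vector(name_sim, keyword_sim, param_sims, label, n=8):
    """Assemble the 1 x (n + 3) feature vector: the n largest
    return-parameter similarities (zero-padded when fewer are available),
    then the name similarity, keyword similarity, and sample label."""
    top = sorted(param_sims, reverse=True)[:n]
    top += [0.0] * (n - len(top))  # pad with zeros up to the preset number
    return top + [name_sim, keyword_sim, float(label)]

# Only 3 pair similarities available, preset number n=6: padding kicks in
vec = build_similarity_vector(0.92, 0.85, [0.9, 0.7, 0.4], label=1, n=6)
print(len(vec), vec)  # 9 [0.9, 0.7, 0.4, 0.0, 0.0, 0.0, 0.92, 0.85, 1.0]
```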
The similarity vector also includes a training sample label. The label identifies whether the training sample is a positive sample or a negative sample: a positive sample means the two APIs are matched APIs of the same type, and a negative sample means the two APIs are unmatched APIs of different types.
The training sample label can be represented in the similarity vector as 1 or 0: the positive sample labeled 1 and the negative sample 0, or vice versa.
The training sample label can be placed at the head or the tail of the similarity vector for convenient model access, so the formed similarity vector comprises the similarities of each short text of the two APIs together with the training sample label.
Fig. 4 is a schematic diagram of constructing a similarity vector from an API pair. In API A shown in fig. 4, the description text includes 1 name A, 1 keyword A, and a return-parameter-name short text LA containing 6 return parameter names; in API B, the description text includes 1 name B, 1 keyword B, and a return-parameter-name short text LB containing 5 return parameter names. The API pair shown in fig. 4 also carries a label indicating whether A matches B.
Calculate the similarity of each short text: the similarity of name A and name B, and the similarity of keyword A and keyword B. Then permute and combine the 6 return parameter names of LA with the 5 return-parameter-name short texts of LB, compute the similarity of each return-parameter-name short-text pair to obtain 30 similarities, sort these 30 similarities, and select the largest preset number of them as the final return-parameter-name short-text similarities, which participate in the construction of the subsequent feature vector (i.e. the similarity vector above).
Combine the name short-text similarity, the keyword short-text similarity and the final return-parameter-name short-text similarities with the label to construct a feature vector, which is used for training a machine learning model (i.e. the preset model or API matching model).
As can be seen from fig. 4, the finally constructed feature vector has dimension 1 × M, composed of: 1 (name similarity) + 1 (keyword similarity) + N (return-parameter-name similarities) + 1 (label).
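The 1 × M layout just described can be sketched as follows (function and argument names are illustrative assumptions, not from the patent):

```python
def build_feature_vector(name_sim, keyword_sim, return_sims, label, preset_n=8):
    """Assemble the 1 x M similarity vector: 1 name similarity + 1 keyword
    similarity + preset_n return-parameter-name similarities (largest first,
    zero-padded) + 1 label, so M = preset_n + 3."""
    top = sorted(return_sims, reverse=True)[:preset_n]
    top += [0.0] * (preset_n - len(top))
    return [name_sim, keyword_sim] + top + [label]
```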
Other schemes that use vectors as model input often take word vectors trained by a word2vec model, and such word vectors typically have a large dimension, usually more than 1 × 200. Here, the similarities of each short text of the two APIs are combined with a training sample label into a similarity vector that serves as model input. Because each short-text similarity is strongly associated with the API, a vector constructed with this scheme represents API features better, so the trained model predicts more accurately. Meanwhile, because an API description text generally contains few short texts, few similarities are produced and the final vector has a low dimension; compared with a word vector generated by a word2vec model, the vector dimension is reduced and both model training speed and prediction speed are improved.
Alternatively, as shown in fig. 3, the step S20 includes:
Step S211, performing word segmentation on each short text of the two APIs through the Jieba word segmentation algorithm to obtain word sets after segmentation.
The Jieba word segmentation algorithm uses a prefix dictionary for efficient word-graph scanning and generates a directed acyclic graph of all possible word segmentations of the Chinese characters in a sentence; it then uses dynamic programming to search for the maximum-probability path and finds the maximum segmentation combination based on word frequency. For unknown words it adopts an HMM model based on the word-forming capability of Chinese characters, using the Viterbi algorithm. Jieba is an existing open-source Chinese word segmentation algorithm and is not detailed here.
Word segmentation is performed on each short text of the two APIs through the Jieba algorithm to obtain the word set corresponding to each short text; for example, segmenting the keyword short text yields the word set corresponding to the keywords, and segmenting the name short text yields the word set corresponding to the name.
In step S212, each word in the set of words is mapped into a vector using the FastText algorithm.
Compared with other text classification models such as support vector machines, logistic regression and neural networks, FastText greatly shortens training time while keeping comparable classification effect; it needs no pre-trained word vectors and can train word vectors itself, which facilitates the training of the XGBoost model.
Step S213, calculating TextRank values of the words in the word set.
Step S213 specifically includes: (1) Perform part-of-speech tagging on the words obtained by segmenting each short text of the API, filter out stop words, and keep only words of specified parts of speech, such as nouns, verbs and adjectives, i.e. $S_i = [t_{i,1}, t_{i,2}, \ldots, t_{i,n}]$, where $S_i$ denotes a short text of the API and $t_{i,n}$ denotes a retained word. (2) Construct a candidate keyword graph G = (V, E), where V is the node set composed of the candidate keywords generated in (1); an edge between any two nodes is built by the co-occurrence relation: an edge exists between two nodes only when the corresponding words co-occur within a window of length K, where K is the window size (at most K words co-occur; typically K = 2). (3) Then, according to the formula
$$WS(V_i) = (1-d) + d \sum_{V_j \in In(V_i)} \frac{\omega_{ji}}{\sum_{V_k \in Out(V_j)} \omega_{jk}} WS(V_j),$$
the weights of the nodes are propagated iteratively until convergence, where $In(V_i)$ is the set of predecessor nodes of node $V_i$, $Out(V_j)$ is the set of successor nodes of node $V_j$, d is the damping coefficient, and $\omega_{ji}$ indicates that the edges between two nodes may have different degrees of importance.
According to the formula, the TextRank value of each word can be obtained.
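The iteration in steps (2)–(3) can be sketched as follows, assuming the co-occurrence graph has already been built as a dict mapping each word to its weighted neighbours (symmetric for co-occurrence, so In and Out coincide); this is a minimal illustration, not the patent's implementation:

```python
def textrank(graph, d=0.85, tol=1e-6, max_iter=100):
    """Iterate WS(Vi) = (1-d) + d * sum_j [w_ji / sum_k w_jk] * WS(Vj) to convergence."""
    ws = {v: 1.0 for v in graph}
    for _ in range(max_iter):
        new_ws = {}
        for vi in graph:
            s = 0.0
            for vj, w_ji in graph[vi].items():      # V_j in In(V_i)
                out_sum = sum(graph[vj].values())   # sum of w_jk over Out(V_j)
                if out_sum:
                    s += w_ji / out_sum * ws[vj]
            new_ws[vi] = (1 - d) + d * s
        converged = max(abs(new_ws[v] - ws[v]) for v in graph) < tol
        ws = new_ws
        if converged:
            break
    return ws
```

On a small star-shaped graph, the hub word ends up with the highest score, matching the intuition that words co-occurring with many others are more central.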
Step S214, obtaining sentence vectors of short texts of the two APIs according to the vectors and the TextRank values of the words in the word set.
The execution order of step S212 and step S213 is not limited, and step S212 may be executed before step S213 or after step S213.
Each short text has a corresponding word set. For the word set of a short text, take the reciprocal of each word's TextRank value as that word's weight, multiply each word's vector by its weight, and sum the weighted vectors to obtain the sentence vector of the short text.
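The weighted sum just described can be sketched as follows (word vectors and TextRank values are assumed to be precomputed, e.g. by FastText and the TextRank step above; all names are illustrative):

```python
def sentence_vector(words, word_vecs, textrank_scores):
    """Sum each word's vector weighted by 1 / its TextRank value."""
    dim = len(next(iter(word_vecs.values())))
    vec = [0.0] * dim
    for w in words:
        weight = 1.0 / textrank_scores[w]   # weight is the reciprocal of TextRank
        for i, x in enumerate(word_vecs[w]):
            vec[i] += weight * x
    return vec
```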
Step S215, calculating semantic similarity between sentence vectors of short texts of the two APIs, wherein the semantic similarity between the sentence vectors of the short texts of the two APIs is the semantic similarity between the short texts of the two APIs.
Semantic similarity between sentence vectors can be calculated by cosine similarity. The cosine formula is as follows:
$$\mathrm{similarity} = \frac{\sum_{i=1}^{k} A_i B_i}{\sqrt{\sum_{i=1}^{k} A_i^2}\,\sqrt{\sum_{i=1}^{k} B_i^2}}$$
where similarity denotes the similarity, $V_A$ and $V_B$ are the sentence vectors, and $A_i$ and $B_i$ are the values of each dimension of $V_A$ and $V_B$ respectively, with k dimensions in total.
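The formula can be implemented directly with the standard library (a minimal sketch; zero vectors are mapped to similarity 0.0 by convention):

```python
import math

def cosine_similarity(va, vb):
    """Cosine of the angle between two sentence vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(va, vb))
    na = math.sqrt(sum(a * a for a in va))
    nb = math.sqrt(sum(b * b for b in vb))
    return dot / (na * nb) if na and nb else 0.0
```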
Because the API name, keyword and return parameter names are short texts composed of several words, each short text contains different words, and some words are so generic that they do not characterize the current short text well. Therefore, in the independently developed feature engineering, each short text of the API is segmented with Jieba, each segmented word is mapped into a vector with the FastText algorithm, each word's weight is computed through the TextRank algorithm (the weight being the reciprocal of the TextRank value), each word's vector is multiplied by its weight, and the weighted vectors are summed to obtain the vector of the short text; the semantic similarity between the vectors of corresponding short texts then gives the semantic similarity between the short texts of the two APIs. The similarity obtained in this way accurately reflects the similarity between the short texts of the two APIs, provides accurate input data for subsequent model training, and helps improve the prediction accuracy of the API matching model.
Based on the semantic similarity between the short texts of the two APIs obtained in the steps above, a specified number of similarity values can be selected. For example, when the return-parameter-name short texts contain several return parameter names, the return parameter names of the two APIs are permuted and combined into return-parameter-name short-text pairs, the similarity of each pair is computed according to steps S211 to S215 to obtain multiple similarities, and a preset number of them are selected as the final return-parameter-name short-text similarities. The similarities of the name short texts and of the keyword short texts are likewise computed with steps S211 to S215. All finally obtained similarities are used to construct a similarity vector, which is input into a machine learning model for training; this greatly reduces the training time of the machine learning model and yields more accurate prediction results.
The embodiment of the disclosure provides an API matching method. The method may be performed by a processor in an electronic device, where the electronic device refers to a computer, a server, and the like. As shown in fig. 5, which is a flowchart of an API matching method according to an embodiment of the present disclosure, the API matching method includes:
step S50, obtaining description texts of two APIs to be matched, where the description text of each API is composed of at least one short text.
The short text may be one or more of an API name, an API keyword, and an API return-parameter name, where the return-parameter-name short text may contain one or more return parameter names, i.e. the names of the API's respective return parameters.
For cross-city API matching, the description text also contains geographic-position qualifiers such as the city or region to which the API belongs. Therefore, before judging whether two APIs to be matched match, the geographic qualifiers in the API description texts are automatically identified and removed, and the description texts with the qualifiers removed undergo the subsequent processing, which improves the accuracy of model prediction. As open data platforms improve, the number and kinds of short texts contained in an API description text may change; description texts containing such changed short texts can also be applied to the embodiments of the present disclosure and fall within its scope.
And step S60, calculating semantic similarity between short texts of the two APIs to be matched.
The semantic similarity between short texts of the two APIs refers to the similarity between similar short texts of the two APIs, for example, if the API includes three short texts such as an API name, an API keyword, and an API return parameter name, the semantic similarity between short texts of the two APIs refers to: similarity of a name of one API to a name of another API, similarity of a keyword of one API to a keyword of another API, similarity of a return argument name of one API to a return argument name of another API.
The specific calculation method of the semantic similarity between the short texts of the two APIs may be the same as the calculation method of the similarity of the training samples during the training of the API matching model.
And step S70, determining similarity vectors corresponding to the two APIs to be matched according to the semantic similarity between the short texts of the two APIs to be matched.
And combining the semantic similarity between the short texts of the two APIs to be matched to form a similarity vector, wherein the similarity vector is the similarity vector corresponding to the two APIs to be matched.
In the embodiment of the present disclosure, the specific manner of determining the similarity vector corresponding to the two APIs based on the semantic similarity between their short texts is the same as that described for step S30 in the API matching model establishing method above and is not repeated here.
Step S80, inputting the similarity vectors corresponding to the two APIs to be matched into an API matching model, where an output result of the API matching model is a matching result of the two APIs to be matched, where the API matching model is generated based on any one of the API matching model establishing methods.
And taking the similarity vectors corresponding to the two APIs to be matched as the feature data of the two APIs, inputting the feature data into the API matching model, and outputting a result of whether the two APIs are matched or not by the API matching model.
The API matching model may be an XGBoost model or a LightGBM model.
By performing automatic API matching with a machine learning method, the method removes the cost of manual matching. In feature extraction, the similarity of API description texts is adopted as the feature, and the description information of an API is converted into a similarity vector, which effectively improves matching accuracy; the result is an API matching scheme with high accuracy, high efficiency and a high degree of automation.
Optionally, after the step S50 and before the step S60, the method further includes: identifying whether the description text of each API contains short text of the geographic position qualifier; and when the description text of any API contains the short text of the geographic position limiting word, removing the short text of the geographic position limiting word.
The geographic-qualifier short text is removed, and the remaining short texts are used for the similarity calculation in step S60. In the specific application scenario of cross-city API matching, the geographic qualifiers of APIs from different cities differ; if they were incorporated into similarity calculation and similarity-vector construction, they would interfere with the API matching model's prediction of the matching result and reduce its accuracy, so removing the geographic-qualifier short texts improves the prediction accuracy of the API matching model. Specifically, a geographic-qualifier word bank can be preset, the description text of each API matched word by word against the bank, and every successfully matched word treated as a geographic qualifier.
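The lexicon-based filtering described above can be sketched as follows (the lexicon contents here are hypothetical examples; a real system would load the preset geographic-qualifier word bank):

```python
# Hypothetical geographic-qualifier lexicon; a real system loads the preset word bank.
GEO_QUALIFIERS = {"Shenzhen", "Guangzhou", "Foshan"}

def strip_geo_qualifiers(words, lexicon=GEO_QUALIFIERS):
    """Drop any segmented word that matches an entry in the qualifier lexicon."""
    return [w for w in words if w not in lexicon]
```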
The embodiment of the disclosure provides a cross-city government affair API matching method. The method may be performed by a processor in an electronic device, where the electronic device refers to a computer, a server, and the like. As shown in fig. 6, which is a flowchart of a cross-city government API matching method according to an embodiment of the present disclosure, the API matching method includes:
step S90, receiving the cross-city government affair API matching query request, and obtaining the query API and the query range based on the cross-city government affair API matching query request.
The cross-city government affair API matching query request contains the relevant information of the query API to be queried and may also contain the query scope, e.g. the region scope of the query.
Step S100, traversing a preset government API database, respectively combining the query API and each API in the query range in the preset government API database to form an API pair to be matched, and determining whether the API pair to be matched is matched based on the API matching method (step S50-step S80).
The preset government affair API database stores known API data; for the specific cross-city matching scenario, the description texts of all APIs can be obtained from the open data platforms of the various local governments.
When the API matching query request contains a query range, the query API is paired with all the APIs in the query range in a preset government affair API database one by one to respectively form an API pair to be matched.
And step S110, outputting all the APIs matched with the query API in the query range in the preset government affair API database.
Whether each API pair to be matched matches is judged in turn based on step S100, and finally all APIs in the preset government affair API database that match the query API are output. When the query scope is included in the API matching query request, all APIs matching the query API refer to those within the query scope.
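Steps S100–S110 can be sketched as a simple traversal, with the trained API matching model passed in as a predicate (the database layout and all names here are illustrative assumptions):

```python
def find_matching_apis(query_api, api_db, query_scope, is_match):
    """Pair the query API with every candidate API in the scope and keep the matches.

    api_db maps a city to its list of API description records; is_match stands in
    for steps S50-S80 (similarity-vector construction + API matching model prediction).
    """
    matches = []
    for city in query_scope:
        for candidate in api_db.get(city, []):
            if is_match(query_api, candidate):
                matches.append(candidate)
    return matches
```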
Through the steps above, automatic API matching is achieved: all matching APIs are returned simply by entering the query conditions (including the query API and the query scope), which greatly improves query efficiency and API matching efficiency, while this API matching approach also ensures matching accuracy.
The embodiment of the present disclosure provides an implementation: the API cross-city matching system shown in fig. 7 and 8, to which the cross-city government affair API matching method above can be applied. The system comprises an API matching model, a matching mining engine, and a government affair API database. The system receives known API information and stores it in the government affair API database classified by city. The API matching model is built by the API matching model establishing method above; it receives the description texts of a pair of APIs, automatically judges whether the two APIs match, and outputs the result. After receiving an API matching query request, the matching mining engine determines the query API and the query scope, pairs the query API one by one with all APIs within the query scope in the government affair API database, feeds the pairs into the API matching model in turn, and, according to the model's judgments, retrieves all matching APIs from the database and returns them to the query interface.
Another embodiment of the present disclosure provides an electronic device comprising a memory and a processor; the memory for storing a computer program; the processor, when executing the computer program, is configured to implement an API matching model building method as described above or an API matching method as described above or a cross-city government API matching method as described above.
Yet another embodiment of the present disclosure provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements an API matching model building method as described above or an API matching method as described above or a cross-city government API matching method as described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like. In this application, the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment of the present invention. In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
Although the present disclosure has been described above, the scope of the present disclosure is not limited thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the spirit and scope of the present disclosure, and these changes and modifications are intended to be within the scope of the present disclosure.

Claims (10)

1. An API matching model building method is characterized by comprising the following steps:
acquiring training samples, wherein one training sample comprises description texts of two APIs (application program interfaces), and the description text of each API consists of at least one short text;
calculating semantic similarity between short texts of two APIs in each training sample;
according to semantic similarity between short texts of two APIs in each training sample, constructing a similarity vector corresponding to each training sample;
and inputting the similarity vector into a preset model for training until a loss function of the preset model is converged, and taking the preset model with the converged loss function as an API (application program interface) matching model.
2. The API matching model building method of claim 1, wherein said descriptive text comprises a name short text, a keyword short text, and at least one return parameter name short text;
the calculating the semantic similarity between short texts of two APIs in each training sample comprises:
in each training sample, the short texts of the return parameter names of the two APIs are arranged and combined to obtain a plurality of short text pairs of the return parameter names;
calculating the similarity of all the returned parameter name short text pairs;
and calculating the similarity of the short texts of the two API names and the similarity of the short texts of the keywords.
3. The method for building the API matching model according to claim 2, wherein the constructing the similarity vector corresponding to each training sample according to the semantic similarity between the short texts of the two APIs in each training sample comprises:
selecting a preset number of similarities with the maximum similarity from the similarities of all the returned parameter name short text pairs, wherein the preset number is marked as N;
and combining the similarity of the preset number with the similarity of the name short text, the similarity of the keyword short text and a training sample label to form a similarity vector of 1 × M dimension, wherein M is N + 3.
4. The API matching model creation method of any one of claims 1 to 3, wherein the API matching model is an XGBoost model.
5. The API matching model building method of any one of claims 1 to 3 wherein the calculating semantic similarity between short texts of two APIs in each of the training samples comprises:
performing word segmentation processing on each short text of the two APIs respectively through a Jieba word segmentation algorithm to obtain a word set after word segmentation processing;
mapping each word in the set of words into a vector using a FastText algorithm;
calculating the TextRank value of each word in the word set;
obtaining sentence vectors of short texts of two APIs (application program interfaces) according to the vectors and the TextRank values of the words in the word set;
and calculating semantic similarity between sentence vectors of the short texts of the two APIs, wherein the semantic similarity between the sentence vectors of the short texts of the two APIs is the semantic similarity between the short texts of the two APIs.
6. An API matching method, comprising:
obtaining description texts of two APIs to be matched, wherein the description text of each API consists of at least one short text;
calculating semantic similarity between short texts of the two APIs to be matched;
determining similarity vectors corresponding to the two APIs to be matched according to semantic similarity between short texts of the two APIs to be matched;
inputting the similarity vectors corresponding to the two APIs to be matched into an API matching model, wherein an output result of the API matching model is a matching result of the two APIs to be matched, and the API matching model is generated based on the API matching model establishing method according to any one of claims 1 to 5.
7. The API matching method according to claim 6, wherein after obtaining the description texts of the two APIs to be matched and before calculating the semantic similarity between the short texts of the two APIs to be matched, further comprising:
identifying whether the description text of each API contains short text of the geographic position qualifier;
and when the description text of any API contains the short text of the geographic position limiting word, removing the short text of the geographic position limiting word.
8. A cross-city government API matching method, comprising:
receiving a cross-city government affair API matching query request, and obtaining a query API and a query range based on the cross-city government affair API matching query request;
traversing a preset government affair API database, respectively forming the query API and each API in the query range in the preset government affair API database into an API pair to be matched, and judging whether the API pair to be matched is matched based on the API matching method according to claim 6 or 7;
and outputting all the APIs which are matched with the query API in the query range in the preset government affair API database.
9. An electronic device comprising a memory and a processor;
the memory for storing a computer program;
the processor, when executing the computer program, for implementing the API matching model building method according to any one of claims 1 to 5 or the API matching method according to claim 6 or 7 or the cross-city government API matching method according to claim 8.
10. A computer-readable storage medium, characterized in that the storage medium has stored thereon a computer program which, when executed by a processor, implements the API matching model building method according to any one of claims 1 to 5 or the API matching method according to claim 6 or 7 or the cross-city government API matching method according to claim 8.
CN202011558922.6A 2020-12-25 2020-12-25 API (application program interface) matching model establishing method and cross-city government affair API matching method Pending CN112650833A (en)

Publication of CN112650833A: 2021-04-13.
CN111859950A (en) Method for automatically generating lecture notes
CN111881264B (en) Method and electronic equipment for searching long text in question-answering task in open field
CN112417170B (en) Relationship linking method for incomplete knowledge graph
CN109472032A (en) A kind of determination method, apparatus, server and the storage medium of entity relationship diagram
CN110969005A (en) Method and device for determining similarity between entity corpora
CN110377753B (en) Relation extraction method and device based on relation trigger word and GRU model
CN116204622A (en) Query expression enhancement method in cross-language dense retrieval
CN116186219A (en) Man-machine dialogue interaction method, system and storage medium
JP7121819B2 (en) Image processing method and apparatus, electronic device, computer-readable storage medium, and computer program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination