CN116107619A - Web API recommendation method based on factorization machine - Google Patents
Web API recommendation method based on factorization machine
- Publication number: CN116107619A
- Application number: CN202211534754.6A
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06F8/70—Software maintenance or management
- G06F16/3344—Query execution using natural language analysis
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G06N3/08—Neural networks; learning methods
- G06F16/951—Indexing; web crawling techniques
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a Web API recommendation method based on a factorization machine, comprising the following steps: crawling Mashup and Web API metadata from the ProgrammableWeb site to construct a service library data set; preprocessing the obtained Mashup and API functional text descriptions; inputting the preprocessed text into a Sentence-BERT model to obtain vector representations of sentences, and calculating the similarity between APIs and Mashups from the obtained sentence vectors with a multi-feature extraction component; calculating the popularity of APIs and the compatibility of API combinations from the interaction records between Mashups and APIs; obtaining a Mashup-API feature matrix through complete concatenation and using it as the input of an AFMHN model; the output of the AFMHN model gives the Top-k Web APIs with the highest probability. The method extracts different features of Web API metadata, uses a deep neural network to capture arbitrary low-order and high-order nonlinear feature interactions, and uses an attention mechanism to capture the different importance of features, so that the recommended Web APIs better meet the Mashup development requirements proposed by developers.
Description
Technical Field
The invention belongs to the technical field of data mining and recommendation, and particularly relates to a Web API recommendation method based on a factorization machine.
Background
As research in the field of service computing deepens, the value of Internet service resources is increasingly recognized, and more and more enterprises publish their business functions as remotely accessible APIs (application programming interfaces). In the last decade, driven by the micro-service architecture (MSA), more and more enterprises have begun to use reusable APIs to create Mashup applications that meet complex business needs, rather than coding from scratch, which greatly shortens the development cycle.
As the API economy continues to flourish, many shared API libraries have emerged. ProgrammableWeb is one of the largest online repositories. According to the site's statistics, by April 2022 the number of available APIs had reached 24000, covering 400 categories. The growing number of APIs makes it difficult for a developer to select an appropriate API from the large pool of candidates when composing a service. There is therefore an urgent need for better recommendation techniques that help developers find APIs suitable for Mashup development.
The recently emerging combination of factorization-machine-based hybrid models with deep neural networks has proven to be a relatively successful and highly scalable recommendation approach. In practice, however, implicit features such as API popularity and the compatibility between APIs are not modeled well, even though these features play a very important role in effective recommendation. For long text such as service function descriptions, simple bag-of-words models cannot fully reflect content associations. Moreover, a plain DNN network cannot learn explicit low-order and high-order feature interactions. Worse, it lacks a corresponding attention mechanism and cannot assign different weights to feature interactions to reduce the impact of noise.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a Web API recommendation method based on a factorization machine, which makes the recommended Web APIs meet the Mashup development requirements proposed by developers.
A Web API recommendation method based on a factorization machine comprises the following steps:
(1) Mashup and Web API metadata are crawled from the ProgrammableWeb site to construct a service library data set, which includes Mashup and API category information, functional description text, historical interaction records, and the like.
(2) The obtained Mashup and API functional text descriptions are preprocessed. Preprocessing includes cleaning the sentences, i.e. deleting invalid words and restoring abbreviations in the English sentences, and normalization, i.e. removing affixes and unifying sentence tenses.
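The preprocessing step can be sketched in a few lines of Python. The stop-word list, abbreviation map and suffix-stripping rule below are illustrative stand-ins for a full pipeline (stop-word removal, abbreviation expansion, lemmatization), not the exact resources used by the method.

```python
import re

# Illustrative resources; a real pipeline would use a complete stop-word
# list and a proper lemmatizer instead of these stand-ins.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "that"}
ABBREVIATIONS = {"don't": "do not", "it's": "it is"}

def preprocess(text: str) -> list[str]:
    """Lower-case, expand abbreviations, drop stop words, strip simple affixes."""
    text = text.lower()
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    tokens = re.findall(r"[a-z]+", text)
    cleaned = []
    for tok in tokens:
        if tok in STOPWORDS:
            continue
        # naive suffix stripping as a stand-in for lemmatization
        for suffix in ("ing", "es", "s"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        cleaned.append(tok)
    return cleaned

print(preprocess("The API is tracking photos and maps"))
```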
(3) The preprocessed text is input into a Sentence-BERT model to obtain a vector representation of each sentence, and the similarity between an API and a Mashup is calculated from the obtained sentence vectors by a multi-feature extraction component.
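Once sentence vectors are available, the Mashup-API text similarity is typically a cosine similarity between the two embeddings. A minimal sketch, with made-up vectors standing in for Sentence-BERT outputs:

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two sentence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Placeholder vectors standing in for Sentence-BERT embeddings of a
# Mashup description and an API description.
mashup_vec = [0.2, 0.7, 0.1]
api_vec = [0.3, 0.6, 0.2]
print(round(cosine_similarity(mashup_vec, api_vec), 4))
```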
(4) Popularity of the APIs and compatibility of API combinations are calculated according to the interaction records between Mashups and APIs, wherein the popularity of API j is defined by the following formula:
pop(j) = (1/M) Σ_{i=1}^{M} p_{i,j}
wherein: M is the number of all Mashups, N is the number of all APIs, and p_{i,j} indicates whether Mashup i calls API j: if Mashup i calls API j, p_{i,j} = 1; otherwise, p_{i,j} = 0. Compatibility is defined over an API co-invocation graph G whose nodes are APIs: if APIs i and j are called together by the same Mashup, an edge is added between them. Let d(i,j) ≥ 1 denote the shortest distance between i and j. The compatibility of i and j is defined as com(i,j) = e^{1-d(i,j)}.
(5) A Mashup-API feature matrix is obtained through complete concatenation and used as the input of an AFMHN model. The AFMHN model has four training components, namely a linear component, a DNN component, a CIN component and an attention component, and two output components, namely a prediction component and an evaluation component. The interaction formula by which the AFMHN model learns complex features is defined as follows:
ŷ = σ(y_linear + y_att + y_DNN + y_CIN)
wherein: σ is the sigmoid function, and the formula can be divided into four parts, namely a linear regression part y_linear for learning basic feature contributions, an attention mechanism part y_att for learning feature interactions of different importance, a DNN network part y_DNN for capturing implicit high-order feature interactions, and a CIN network part y_CIN for capturing explicit high-order feature interactions.
(6) The output of the AFMHN model, namely the Top-k Web APIs with the highest probability, is given according to the developer's Mashup textual requirement description.
Preferably, in the step (5), the linear regression part y_linear for learning basic feature contributions is calculated by the following formula:
y_linear = w_0 + Σ_{i=1}^{m} w_i x_i
wherein: w_0 is the global bias and w_i is the strength of the i-th variable. Each feature i is further associated with an embedding vector v_i ∈ R^D, one row of the embedding matrix, where D is a hyper-parameter defining the factorization dimension.
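The linear part is a plain weighted sum; a minimal sketch with made-up weights and a sparse feature vector:

```python
def linear_part(w0: float, w: list[float], x: list[float]) -> float:
    """y_linear = w0 + sum_i w_i * x_i (global bias plus per-feature strengths)."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x))

# Made-up weights and feature values.
print(linear_part(0.1, [0.5, -0.2, 0.3], [1.0, 0.0, 1.0]))
```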
Preferably, in the step (5), the attention mechanism part y_att for learning feature interactions of different importance is calculated by the following formulas:
a'_{ij} = h^T ReLU(W(v_i ⊙ v_j) x_i x_j + b)
a_{ij} = exp(a'_{ij}) / Σ_{(i,j)} exp(a'_{ij})
y_att = p^T Σ_i Σ_{j>i} a_{ij} (v_i ⊙ v_j) x_i x_j
wherein: W ∈ R^{t×D}, b ∈ R^t and h ∈ R^t are parameters of the model, and t represents the size of the hidden layer of the attention network, called the attention factor. v_i represents the embedding vector corresponding to a feature field, x_i represents the feature value, ⊙ denotes the element-wise product of two vectors, and ReLU represents the activation function of the network. a_{ij} represents the attention score weighting the cross term, and p represents the neural weight of the prediction layer.
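A pure-Python sketch of the attention part over pairwise feature interactions. The softmax normalization, the attention weight matrix W and the final projection by p follow the standard AFM formulation assumed here; all concrete numbers are made up.

```python
import math

def hadamard(u, v):
    return [a * b for a, b in zip(u, v)]

def relu(vec):
    return [max(0.0, a) for a in vec]

def attention_part(V, x, W, b, h, p):
    """AFM-style attention over pairwise feature interactions.

    V: m feature embeddings of dimension D, x: m feature values,
    W: t x D attention weights, b and h: length-t vectors,
    p: length-D prediction-layer weights.
    """
    pairs, scores = [], []
    m, t = len(V), len(b)
    for i in range(m):
        for j in range(i + 1, m):
            # (v_i ⊙ v_j) * x_i * x_j
            inter = [e * x[i] * x[j] for e in hadamard(V[i], V[j])]
            hidden = relu([sum(W[r][d] * inter[d] for d in range(len(inter))) + b[r]
                           for r in range(t)])
            scores.append(sum(h[r] * hidden[r] for r in range(t)))  # a'_ij
            pairs.append(inter)
    # softmax normalization of the attention scores
    exp_s = [math.exp(s) for s in scores]
    total = sum(exp_s)
    att = [e / total for e in exp_s]
    # attention-weighted sum of the interactions, projected by p
    pooled = [sum(a * inter[d] for a, inter in zip(att, pairs)) for d in range(len(p))]
    return sum(pd * vd for pd, vd in zip(p, pooled))

# Tiny example: two features, D = 2, attention factor t = 2, made-up weights.
y_att = attention_part(V=[[1.0, 2.0], [3.0, 4.0]], x=[1.0, 1.0],
                       W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0],
                       h=[1.0, 1.0], p=[1.0, 1.0])
```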
Preferably, in the step (5), the DNN network part y_DNN capturing implicit high-order feature interactions is calculated by the following formulas:
a^{(l)} = σ_l(W_l a^{(l-1)} + b_l), l = 1, …, L
y_DNN = h^T a^{(L)}
wherein: a^{(0)} is the concatenated feature embedding input, the vector h represents the neural weight of the prediction layer, L represents the number of hidden layers, and W_l, b_l and σ_l represent the weight matrix, bias vector and activation function of layer l.
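The DNN part is a standard feed-forward pass followed by a dot product with the prediction weight h; a minimal sketch with one hidden layer and made-up weights:

```python
def dnn_part(x, layers, h):
    """Feed-forward pass a_l = relu(W_l a_{l-1} + b_l), then y_DNN = h . a_L.

    x: input vector, layers: list of (W, b) pairs, h: prediction-layer weights.
    """
    a = x
    for W, b in layers:
        a = [max(0.0, sum(w * ai for w, ai in zip(row, a)) + bi)
             for row, bi in zip(W, b)]
    return sum(hi * ai for hi, ai in zip(h, a))

# One tiny hidden layer with made-up weights.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.1]
y_dnn = dnn_part([2.0, 1.0], [(W1, b1)], h=[1.0, 2.0])
```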
Preferably, in the step (5), the CIN network part y_CIN capturing explicit high-order feature interactions is calculated by the following formulas:
X^k_{h,*} = Σ_{i=1}^{H_{k-1}} Σ_{j=1}^{m} W^{k,h}_{i,j} (X^{k-1}_{i,*} ∘ X^0_{j,*})
p^k_i = Σ_{d=1}^{D} X^k_{i,d}
y_CIN = w_0^T [p^1; p^2; …; p^T]
wherein: X^0 ∈ R^{m×D} is the input feature matrix, m is the number of features, and D is the dimension of the embedding vectors. X^k ∈ R^{H_k×D} represents the output of the k-th CIN layer, where H_k is the number of features of the k-th layer, which can also be understood as the number of neurons. Furthermore, ∘ denotes the Hadamard product, i.e. the multiplication of corresponding dimensional elements between vectors. The H_k feature vectors of the k-th layer are summed over the D dimension to obtain the pooling vector p^k ∈ R^{H_k}. Letting T denote the depth of the network, a pooling vector of length H_k is obtained for each layer k; by concatenating all pooling vectors obtained from the different layers, the final output of the CIN is produced, and w_0 is the regression parameter.
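One CIN layer and its sum pooling can be sketched as follows. Tensor shapes follow the description (m features, D-dimensional embeddings, H_k feature maps per layer), and all concrete weights are made up.

```python
def cin_layer(X_prev, X0, W):
    """One CIN layer: X^k_h = sum_{i,j} W[h][i][j] * (X^{k-1}_i ∘ X^0_j).

    X_prev: H_{k-1} x D feature maps, X0: m x D input embeddings,
    W: H_k x H_{k-1} x m parameter tensor. Returns H_k x D feature maps.
    """
    D = len(X0[0])
    out = []
    for Wh in W:  # one output feature map per row of W
        row = [0.0] * D
        for i, Xi in enumerate(X_prev):
            for j, Xj in enumerate(X0):
                for d in range(D):
                    row[d] += Wh[i][j] * Xi[d] * Xj[d]  # Hadamard product term
        out.append(row)
    return out

def sum_pool(X):
    """Pool each feature map over the D dimension: p^k_i = sum_d X^k_{i,d}."""
    return [sum(row) for row in X]

# Tiny example: m = 2 features, D = 2, one CIN layer with H_1 = 1 feature map.
X0 = [[1.0, 2.0], [3.0, 4.0]]
W = [[[0.5, 0.5], [0.5, 0.5]]]  # H_1 x m x m, made-up weights
X1 = cin_layer(X0, X0, W)
p1 = sum_pool(X1)
# Final CIN output: concatenated pooling vectors times regression weights w0.
y_cin = sum(w * p for w, p in zip([0.1], p1))
```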
The beneficial effects of the invention are as follows:
The invention provides a novel hybrid-network factorization machine model for Web API recommendation during Mashup development. The functional text descriptions of Mashups and Web APIs are better learned through a Sentence-BERT model. The proposed AFMHN model integrates DNN and CIN networks to learn explicit and implicit feature interactions of both low and high order, and integrates an attention network (namely the attention component) to capture the specific importance of different feature interactions. The invention makes the recommended Web APIs better match the developer's Mashup development requirements, thereby reducing the developer's search cost and improving user satisfaction.
Drawings
Fig. 1 is a schematic diagram of a system architecture of a Web API recommendation method of the present invention.
Fig. 2 is a schematic diagram of a neural network structure of an AFMHN model in the Web API recommendation method of the present invention.
Detailed Description
In order to describe the present invention more specifically, the technical solution of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
This embodiment provides a Web API recommendation method based on a factorization machine, which comprises the following steps:
(1) Mashup and Web API metadata are crawled from the ProgrammableWeb site to construct a service library data set, which includes Mashup and API category information, functional description text, historical interaction records, and the like.
(2) The obtained Mashup and API functional text descriptions are preprocessed, including cleaning, normalization and standardization of the sentences.
(3) The preprocessed text is input into a Sentence-BERT model to obtain a vector representation of each sentence, and the similarity between an API and a Mashup is calculated from the obtained sentence vectors by a multi-feature extraction component.
(4) Popularity of the APIs and compatibility of API combinations are calculated according to the interaction records between Mashups and APIs, wherein the popularity of API j is defined by the following formula:
pop(j) = (1/M) Σ_{i=1}^{M} p_{i,j}
wherein: M is the number of all Mashups, N is the number of all APIs, and p_{i,j} indicates whether Mashup i calls API j: if Mashup i calls API j, p_{i,j} = 1; otherwise, p_{i,j} = 0. Compatibility is defined over an API co-invocation graph G whose nodes are APIs: if APIs i and j are called together by the same Mashup, an edge is added between them. Let d(i,j) ≥ 1 denote the shortest distance between i and j. The compatibility of i and j is defined as com(i,j) = e^{1-d(i,j)}.
(5) A Mashup-API feature matrix is obtained through complete concatenation and used as the input of an AFMHN model. The AFMHN model has four training components, namely a linear component, a DNN component, a CIN component and an attention component, and two output components, namely a prediction component and an evaluation component. The interaction formula by which the AFMHN model learns complex features is defined as follows:
ŷ = σ(y_linear + y_att + y_DNN + y_CIN)
wherein: σ is the sigmoid function, and the formula can be divided into four parts, namely a linear regression part y_linear for learning basic feature contributions, an attention mechanism part y_att for learning feature interactions of different importance, a DNN network part y_DNN for capturing implicit high-order feature interactions, and a CIN network part y_CIN for capturing explicit high-order feature interactions.
(6) The output of the AFMHN model, namely the Top-k Web APIs with the highest probability, is given according to the developer's Mashup textual requirement description.
Fig. 1 shows the architecture of the Web API recommendation method based on the factorization machine of this embodiment. The framework consists of a data preparation part and an attention-based factorization machine hybrid network model (AFMHN) that combines a deep neural network with an attention mechanism. The data preparation part preprocesses the obtained Mashup and API text descriptions and embeds them as sentence vectors, calculates the popularity of APIs and the compatibility of combined APIs according to the interaction records between Mashups and APIs, and feeds the resulting Mashup-API feature matrix to the AFMHN model as input. The AFMHN model has four training components, namely a linear component, a DNN component, a CIN component and an attention component, and two output components, namely a prediction component and an evaluation component. Finally, the output of the AFMHN model, namely the Top-k Web APIs with the highest probability, is given according to the developer's Mashup textual requirement description.
Fig. 2 shows the neural network structure of the proposed AFMHN model, with four training components, namely a linear component, a DNN component, a CIN component and an attention component, and two output components, namely a prediction component and an evaluation component. The linear part, the attention mechanism part, the DNN part and the CIN part are represented by red, green, blue and purple connections, respectively. The prediction component sums the outputs of all training components to predict the probability that a Mashup calls the Web API.
The above description of the embodiments is provided to enable a person of ordinary skill in the art to make and use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles described herein may be applied to other embodiments without inventive effort. Therefore, the present invention is not limited to the above embodiments; improvements and modifications made by those skilled in the art based on this disclosure fall within the protection scope of the present invention.
Claims (9)
1. A Web API recommendation method based on a factorization machine, comprising the following steps:
(1) Crawling Mashup and Web API metadata to construct a service library data set, wherein the service library data set comprises category information, function description text information and historical interaction records of Mashup and APIs;
(2) Preprocessing the obtained service library data set;
(3) Inputting the preprocessed functional description text information into a Sentence-BERT model to obtain a vector representation of a Sentence, and calculating the similarity between an API and Mashup through the obtained vector of the Sentence and a multi-feature extraction component;
(4) Calculating popularity of the APIs and compatibility of the API combination according to the historical interaction record between Mashup and the APIs;
(5) A Mashup-API feature matrix is obtained through complete concatenation and used as the input of an AFMHN model, wherein the AFMHN model has four training components, namely a linear component, a DNN component, a CIN component and an attention component, and two output components, namely a prediction component and an evaluation component, and the interaction formula by which the AFMHN model learns complex features is defined as follows:
ŷ = σ(y_linear + y_att + y_DNN + y_CIN)
wherein σ is the sigmoid function and the formula is divided into four parts, namely a linear regression part y_linear for learning basic feature contributions, an attention mechanism part y_att for learning feature interactions of different importance, a DNN network part y_DNN for capturing implicit high-order feature interactions, and a CIN network part y_CIN for capturing explicit high-order feature interactions;
(6) The output of the AFMHN model, namely the Top-k Web APIs with the highest probability, is given according to the developer's Mashup textual requirement description.
2. The Web API recommendation method based on a factorization machine as recited in claim 1, wherein in said step (1) the service library data set includes Mashup and API category information, functional description text information and historical interaction records.
3. The Web API recommendation method based on a factorization machine of claim 2, wherein the preprocessing of said service library data set comprises: deleting invalid words, restoring abbreviations in English sentences, removing affixes and unifying sentence tenses.
4. The Web API recommendation method based on a factorization machine of claim 1, wherein the popularity of an API is calculated as follows:
the popularity of API j is defined by the formula
pop(j) = (1/M) Σ_{i=1}^{M} p_{i,j}
where M is the number of all Mashups, N is the number of all APIs, and p_{i,j} indicates whether Mashup i calls API j: if Mashup i calls API j, p_{i,j} = 1; otherwise, p_{i,j} = 0.
5. The Web API recommendation method based on a factorization machine of claim 4, wherein the compatibility of an API combination is calculated as follows:
let G be an API co-invocation graph containing API nodes; if APIs i and j are called together by the same Mashup, an edge is added between them; let d(i,j) ≥ 1 denote the shortest distance between i and j; the compatibility of i and j is defined as com(i,j) = e^{1-d(i,j)}.
6. The Web API recommendation method based on a factorization machine of claim 1, wherein said step (5) calculates the linear regression part y_linear for learning basic feature contributions by the following formula:
y_linear = w_0 + Σ_{i=1}^{m} w_i x_i
where w_0 is the global bias and w_i is the strength of the i-th variable.
7. The Web API recommendation method based on a factorization machine of claim 1, wherein said step (5) calculates the attention mechanism part y_att for learning feature interactions of different importance by the following formulas:
a'_{ij} = h^T ReLU(W(v_i ⊙ v_j) x_i x_j + b)
a_{ij} = exp(a'_{ij}) / Σ_{(i,j)} exp(a'_{ij})
y_att = p^T Σ_i Σ_{j>i} a_{ij} (v_i ⊙ v_j) x_i x_j
wherein W, b and h are parameters of the model, t represents the size of the attention-network hidden layer, called the attention factor, v_i represents the embedding vector corresponding to a feature field, x_i represents the feature value, ⊙ denotes the element-wise product of two vectors, ReLU represents the activation function of the network, a_{ij} represents the attention score weighting the cross term, and p represents the neural weight of the prediction layer.
8. The Web API recommendation method based on a factorization machine of claim 1, wherein said step (5) calculates the DNN network part y_DNN capturing implicit high-order feature interactions by the following formulas:
a^{(l)} = σ_l(W_l a^{(l-1)} + b_l), l = 1, …, L
y_DNN = h^T a^{(L)}
wherein the vector h represents the neural weight of the prediction layer, L represents the number of hidden layers, and W_l, b_l and σ_l represent the weight matrix, bias vector and activation function of layer l.
9. The Web API recommendation method based on a factorization machine of claim 1, wherein said step (5) calculates the CIN network part y_CIN capturing explicit high-order feature interactions by the following formulas:
X^k_{h,*} = Σ_{i=1}^{H_{k-1}} Σ_{j=1}^{m} W^{k,h}_{i,j} (X^{k-1}_{i,*} ∘ X^0_{j,*})
p^k_i = Σ_{d=1}^{D} X^k_{i,d}
y_CIN = w_0^T [p^1; p^2; …; p^T]
wherein X^0 ∈ R^{m×D} is the input feature matrix, m is the number of features, and D is the dimension of the embedding vectors; X^k ∈ R^{H_k×D} represents the output of the k-th CIN layer, where H_k is the number of features of the k-th layer, which can also be understood as the number of neurons; ∘ denotes the Hadamard product, i.e. the multiplication of corresponding dimensional elements between vectors; the H_k feature vectors of the k-th layer are summed over the D dimension to obtain the pooling vector p^k ∈ R^{H_k}; letting T denote the depth of the network, a pooling vector of length H_k is obtained for each layer k; by concatenating all pooling vectors obtained from the different layers, the final output of the CIN is generated, and w_0 is the regression parameter.
Priority Applications (1)
- CN202211534754.6A — filed 2022-12-02 — Web API recommendation method based on factorization machine
Publications (1)
- CN116107619A (application) — published 2023-05-12 — status: Pending
Cited By (2)
- CN117493697A / CN117493697B (Xidian University) — Web API recommendation method and system based on multi-modal feature fusion
Legal Events
- PB01 — Publication
- SE01 — Entry into force of request for substantive examination