CN116107619A - Web API recommendation method based on factorization machine - Google Patents
Web API recommendation method based on factorization machine
- Publication number: CN116107619A
- Application number: CN202211534754.6A
- Authority: CN (China)
- Legal status: Pending
Classifications
- G06F8/70—Software maintenance or management
- G06F16/3344—Query execution using natural language analysis
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G06N3/08—Neural networks; learning methods
- G06F16/951—Indexing; web crawling techniques
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a Web API recommendation method based on a factorization machine, comprising the following steps: crawling Mashup and Web API metadata from the ProgrammableWeb site to construct a service library data set; preprocessing the obtained Mashup and API functional text descriptions; inputting the preprocessed text into a Sentence-BERT model to obtain vector representations of sentences, and calculating the similarity between APIs and Mashups from the obtained sentence vectors with a multi-feature extraction component; calculating the popularity of APIs and the compatibility of API combinations from the interaction records between Mashups and APIs; obtaining a Mashup-API feature matrix through complete concatenation and using it as the input of an AFMHN model; the output of the AFMHN model gives the Top-k Web APIs with the highest probability. The method extracts different features of Web API metadata, uses a deep neural network to capture arbitrary low-order and high-order nonlinear feature interactions, and uses an attention mechanism to capture the different importance of features, so that the recommended Web APIs better meet the Mashup development requirements proposed by developers.
Description
Technical Field
The invention belongs to the technical field of data mining and recommendation, and particularly relates to a Web API recommendation method based on a factorization machine.
Background
As research in the field of service computing deepens, the value of Internet service resources is increasingly recognized, and more and more enterprises publish their business functions as remotely accessible APIs (application programming interfaces). In the last decade, driven by the micro-service architecture (MSA), more and more enterprises have begun to use reusable APIs to create Mashup applications that meet complex business needs, rather than coding from scratch, which greatly shortens the development cycle.
As the API economy continues to flourish, many shared API libraries have emerged. ProgrammableWeb is one of the largest online repositories. According to the site's statistics, by April 2022 the number of available APIs had reached 24000, covering 400 categories. The growing number of APIs makes it difficult for a developer to select an appropriate API from the large pool of candidates when composing a service. There is therefore an urgent need for better recommendation techniques that help developers find APIs suitable for Mashup development.
The recently emerging combination of factorization-machine-based hybrid models with deep neural networks has proven to be a relatively successful and highly scalable recommendation approach. In practice, however, implicit features such as API popularity and the compatibility between APIs are not modeled well, even though these features play a very important role in effective recommendation. For long text such as service function descriptions, simple bag-of-words models cannot fully reflect content associations. Moreover, a plain DNN network cannot learn explicit low-order and high-order feature interactions. Worse, it lacks a corresponding attention mechanism and cannot assign different weights to feature interactions to reduce the impact of noise.
Disclosure of Invention
Aiming at the above technical problems in the prior art, the invention provides a Web API recommendation method based on a factorization machine, which makes the recommended Web APIs meet the Mashup development requirements proposed by developers.
A Web API recommendation method based on a factorization machine comprises the following steps:
(1) Mashup and Web API metadata are crawled from the ProgrammableWeb site to construct a service library data set, which includes Mashup and API category information, functional description text, historical interaction records, and the like.
(2) The obtained Mashup and API functional text descriptions are preprocessed. Preprocessing includes cleaning the sentences, i.e. deleting invalid words and restoring abbreviations in the English sentences, and normalization, i.e. removing affixes and unifying sentence tenses.
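The preprocessing step can be sketched in a few lines of Python. The stop-word list, abbreviation map and suffix-stripping rule below are illustrative stand-ins for a full pipeline (stop-word removal, abbreviation expansion, lemmatization), not the exact resources used by the method.

```python
import re

# Illustrative resources; a real pipeline would use a complete stop-word
# list and a proper lemmatizer instead of these stand-ins.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "that"}
ABBREVIATIONS = {"don't": "do not", "it's": "it is"}

def preprocess(text: str) -> list[str]:
    """Lower-case, expand abbreviations, drop stop words, strip simple affixes."""
    text = text.lower()
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    tokens = re.findall(r"[a-z]+", text)
    cleaned = []
    for tok in tokens:
        if tok in STOPWORDS:
            continue
        # naive suffix stripping as a stand-in for lemmatization
        for suffix in ("ing", "es", "s"):
            if tok.endswith(suffix) and len(tok) > len(suffix) + 2:
                tok = tok[: -len(suffix)]
                break
        cleaned.append(tok)
    return cleaned

print(preprocess("The API is tracking photos and maps"))
```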
(3) The preprocessed text is input into a Sentence-BERT model to obtain a vector representation of each sentence, and the similarity between an API and a Mashup is calculated from the obtained sentence vectors by a multi-feature extraction component.
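Once sentence vectors are available, the Mashup-API text similarity is typically a cosine similarity between the two embeddings. A minimal sketch, with made-up vectors standing in for Sentence-BERT outputs:

```python
import math

def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two sentence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Placeholder vectors standing in for Sentence-BERT embeddings of a
# Mashup description and an API description.
mashup_vec = [0.2, 0.7, 0.1]
api_vec = [0.3, 0.6, 0.2]
print(round(cosine_similarity(mashup_vec, api_vec), 4))
```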
(4) Popularity of the APIs and compatibility of API combinations are calculated according to the interaction records between Mashups and APIs, wherein the popularity of API j is defined by the following formula:
pop(j) = (1/M) Σ_{i=1}^{M} p_{i,j}
wherein: M is the number of all Mashups, N is the number of all APIs, and p_{i,j} indicates whether Mashup i calls API j: if Mashup i calls API j, p_{i,j} = 1; otherwise, p_{i,j} = 0. Compatibility is defined over an API co-invocation graph G whose nodes are APIs: if APIs i and j are called together by the same Mashup, an edge is added between them. Let d(i,j) ≥ 1 denote the shortest distance between i and j. The compatibility of i and j is defined as com(i,j) = e^{1-d(i,j)}.
(5) A Mashup-API feature matrix is obtained through complete concatenation and used as the input of an AFMHN model. The AFMHN model has four training components, namely a linear component, a DNN component, a CIN component and an attention component, and two output components, namely a prediction component and an evaluation component. The interaction formula by which the AFMHN model learns complex features is defined as follows:
ŷ = σ(y_linear + y_att + y_DNN + y_CIN)
wherein: σ is the sigmoid function, and the formula can be divided into four parts, namely a linear regression part y_linear for learning basic feature contributions, an attention mechanism part y_att for learning feature interactions of different importance, a DNN network part y_DNN for capturing implicit high-order feature interactions, and a CIN network part y_CIN for capturing explicit high-order feature interactions.
(6) The output of the AFMHN model, namely the Top-k Web APIs with the highest probability, is given according to the developer's Mashup textual requirement description.
Preferably, in the step (5), the linear regression part y_linear for learning basic feature contributions is calculated by the following formula:
y_linear = w_0 + Σ_{i=1}^{m} w_i x_i
wherein: w_0 is the global bias and w_i is the strength of the i-th variable. Each feature i is further associated with an embedding vector v_i ∈ R^D, one row of the embedding matrix, where D is a hyper-parameter defining the factorization dimension.
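The linear part is a plain weighted sum; a minimal sketch with made-up weights and a sparse feature vector:

```python
def linear_part(w0: float, w: list[float], x: list[float]) -> float:
    """y_linear = w0 + sum_i w_i * x_i (global bias plus per-feature strengths)."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x))

# Made-up weights and feature values.
print(linear_part(0.1, [0.5, -0.2, 0.3], [1.0, 0.0, 1.0]))
```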
Preferably, in the step (5), the attention mechanism part y_att for learning feature interactions of different importance is calculated by the following formulas:
a'_{ij} = h^T ReLU(W(v_i ⊙ v_j) x_i x_j + b)
a_{ij} = exp(a'_{ij}) / Σ_{(i,j)} exp(a'_{ij})
y_att = p^T Σ_i Σ_{j>i} a_{ij} (v_i ⊙ v_j) x_i x_j
wherein: W ∈ R^{t×D}, b ∈ R^t and h ∈ R^t are parameters of the model, and t represents the size of the hidden layer of the attention network, called the attention factor. v_i represents the embedding vector corresponding to a feature field, x_i represents the feature value, ⊙ denotes the element-wise product of two vectors, and ReLU represents the activation function of the network. a_{ij} represents the attention score weighting the cross term, and p represents the neural weight of the prediction layer.
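A pure-Python sketch of the attention part over pairwise feature interactions. The softmax normalization, the attention weight matrix W and the final projection by p follow the standard AFM formulation assumed here; all concrete numbers are made up.

```python
import math

def hadamard(u, v):
    return [a * b for a, b in zip(u, v)]

def relu(vec):
    return [max(0.0, a) for a in vec]

def attention_part(V, x, W, b, h, p):
    """AFM-style attention over pairwise feature interactions.

    V: m feature embeddings of dimension D, x: m feature values,
    W: t x D attention weights, b and h: length-t vectors,
    p: length-D prediction-layer weights.
    """
    pairs, scores = [], []
    m, t = len(V), len(b)
    for i in range(m):
        for j in range(i + 1, m):
            # (v_i ⊙ v_j) * x_i * x_j
            inter = [e * x[i] * x[j] for e in hadamard(V[i], V[j])]
            hidden = relu([sum(W[r][d] * inter[d] for d in range(len(inter))) + b[r]
                           for r in range(t)])
            scores.append(sum(h[r] * hidden[r] for r in range(t)))  # a'_ij
            pairs.append(inter)
    # softmax normalization of the attention scores
    exp_s = [math.exp(s) for s in scores]
    total = sum(exp_s)
    att = [e / total for e in exp_s]
    # attention-weighted sum of the interactions, projected by p
    pooled = [sum(a * inter[d] for a, inter in zip(att, pairs)) for d in range(len(p))]
    return sum(pd * vd for pd, vd in zip(p, pooled))

# Tiny example: two features, D = 2, attention factor t = 2, made-up weights.
y_att = attention_part(V=[[1.0, 2.0], [3.0, 4.0]], x=[1.0, 1.0],
                       W=[[1.0, 0.0], [0.0, 1.0]], b=[0.0, 0.0],
                       h=[1.0, 1.0], p=[1.0, 1.0])
```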
Preferably, in the step (5), the DNN network part y_DNN capturing implicit high-order feature interactions is calculated by the following formulas:
a^{(l)} = σ_l(W_l a^{(l-1)} + b_l), l = 1, …, L
y_DNN = h^T a^{(L)}
wherein: a^{(0)} is the concatenated feature embedding input, the vector h represents the neural weight of the prediction layer, L represents the number of hidden layers, and W_l, b_l and σ_l represent the weight matrix, bias vector and activation function of layer l.
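The DNN part is a standard feed-forward pass followed by a dot product with the prediction weight h; a minimal sketch with one hidden layer and made-up weights:

```python
def dnn_part(x, layers, h):
    """Feed-forward pass a_l = relu(W_l a_{l-1} + b_l), then y_DNN = h . a_L.

    x: input vector, layers: list of (W, b) pairs, h: prediction-layer weights.
    """
    a = x
    for W, b in layers:
        a = [max(0.0, sum(w * ai for w, ai in zip(row, a)) + bi)
             for row, bi in zip(W, b)]
    return sum(hi * ai for hi, ai in zip(h, a))

# One tiny hidden layer with made-up weights.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.1]
y_dnn = dnn_part([2.0, 1.0], [(W1, b1)], h=[1.0, 2.0])
```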
Preferably, in the step (5), the CIN network part y_CIN capturing explicit high-order feature interactions is calculated by the following formulas:
X^k_{h,*} = Σ_{i=1}^{H_{k-1}} Σ_{j=1}^{m} W^{k,h}_{i,j} (X^{k-1}_{i,*} ∘ X^0_{j,*})
p^k_i = Σ_{d=1}^{D} X^k_{i,d}
y_CIN = w_0^T [p^1; p^2; …; p^T]
wherein: X^0 ∈ R^{m×D} is the input feature matrix, m is the number of features, and D is the dimension of the embedding vectors. X^k ∈ R^{H_k×D} represents the output of the k-th CIN layer, where H_k is the number of features of the k-th layer, which can also be understood as the number of neurons. Furthermore, ∘ denotes the Hadamard product, i.e. the multiplication of corresponding dimensional elements between vectors. The H_k feature vectors of the k-th layer are summed over the D dimension to obtain the pooling vector p^k ∈ R^{H_k}. Letting T denote the depth of the network, a pooling vector of length H_k is obtained for each layer k; by concatenating all pooling vectors obtained from the different layers, the final output of the CIN is produced, and w_0 is the regression parameter.
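One CIN layer and its sum pooling can be sketched as follows. Tensor shapes follow the description (m features, D-dimensional embeddings, H_k feature maps per layer), and all concrete weights are made up.

```python
def cin_layer(X_prev, X0, W):
    """One CIN layer: X^k_h = sum_{i,j} W[h][i][j] * (X^{k-1}_i ∘ X^0_j).

    X_prev: H_{k-1} x D feature maps, X0: m x D input embeddings,
    W: H_k x H_{k-1} x m parameter tensor. Returns H_k x D feature maps.
    """
    D = len(X0[0])
    out = []
    for Wh in W:  # one output feature map per row of W
        row = [0.0] * D
        for i, Xi in enumerate(X_prev):
            for j, Xj in enumerate(X0):
                for d in range(D):
                    row[d] += Wh[i][j] * Xi[d] * Xj[d]  # Hadamard product term
        out.append(row)
    return out

def sum_pool(X):
    """Pool each feature map over the D dimension: p^k_i = sum_d X^k_{i,d}."""
    return [sum(row) for row in X]

# Tiny example: m = 2 features, D = 2, one CIN layer with H_1 = 1 feature map.
X0 = [[1.0, 2.0], [3.0, 4.0]]
W = [[[0.5, 0.5], [0.5, 0.5]]]  # H_1 x m x m, made-up weights
X1 = cin_layer(X0, X0, W)
p1 = sum_pool(X1)
# Final CIN output: concatenated pooling vectors times regression weights w0.
y_cin = sum(w * p for w, p in zip([0.1], p1))
```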
The beneficial effects of the invention are as follows:
The invention provides a novel hybrid-network factorization machine model for Web API recommendation during Mashup development. The functional text descriptions of Mashups and Web APIs are better learned through a Sentence-BERT model. The proposed AFMHN model integrates DNN and CIN networks to learn explicit and implicit feature interactions of both low and high order, and integrates an attention network (namely the attention component) to capture the specific importance of different feature interactions. The invention makes the recommended Web APIs better match the developer's Mashup development requirements, thereby reducing the developer's search cost and improving user satisfaction.
Drawings
Fig. 1 is a schematic diagram of a system architecture of a Web API recommendation method of the present invention.
Fig. 2 is a schematic diagram of a neural network structure of an AFMHN model in the Web API recommendation method of the present invention.
Detailed Description
In order to describe the present invention more specifically, the technical solution of the present invention is described in detail below with reference to the accompanying drawings and specific embodiments.
This embodiment provides a Web API recommendation method based on a factorization machine, which comprises the following steps:
(1) Mashup and Web API metadata are crawled from the ProgrammableWeb site to construct a service library data set, which includes Mashup and API category information, functional description text, historical interaction records, and the like.
(2) The obtained Mashup and API functional text descriptions are preprocessed, including cleaning, normalization and standardization of the sentences.
(3) The preprocessed text is input into a Sentence-BERT model to obtain a vector representation of each sentence, and the similarity between an API and a Mashup is calculated from the obtained sentence vectors by a multi-feature extraction component.
(4) Popularity of the APIs and compatibility of API combinations are calculated according to the interaction records between Mashups and APIs, wherein the popularity of API j is defined by the following formula:
pop(j) = (1/M) Σ_{i=1}^{M} p_{i,j}
wherein: M is the number of all Mashups, N is the number of all APIs, and p_{i,j} indicates whether Mashup i calls API j: if Mashup i calls API j, p_{i,j} = 1; otherwise, p_{i,j} = 0. Compatibility is defined over an API co-invocation graph G whose nodes are APIs: if APIs i and j are called together by the same Mashup, an edge is added between them. Let d(i,j) ≥ 1 denote the shortest distance between i and j. The compatibility of i and j is defined as com(i,j) = e^{1-d(i,j)}.
(5) A Mashup-API feature matrix is obtained through complete concatenation and used as the input of an AFMHN model. The AFMHN model has four training components, namely a linear component, a DNN component, a CIN component and an attention component, and two output components, namely a prediction component and an evaluation component. The interaction formula by which the AFMHN model learns complex features is defined as follows:
ŷ = σ(y_linear + y_att + y_DNN + y_CIN)
wherein: σ is the sigmoid function, and the formula can be divided into four parts, namely a linear regression part y_linear for learning basic feature contributions, an attention mechanism part y_att for learning feature interactions of different importance, a DNN network part y_DNN for capturing implicit high-order feature interactions, and a CIN network part y_CIN for capturing explicit high-order feature interactions.
(6) The output of the AFMHN model, namely the Top-k Web APIs with the highest probability, is given according to the developer's Mashup textual requirement description.
Fig. 1 shows the architecture of the Web API recommendation method based on the factorization machine of this embodiment. The framework consists of a data preparation part and an attention-based factorization machine hybrid network model (AFMHN) that combines a deep neural network with an attention mechanism. The data preparation part preprocesses the obtained Mashup and API text descriptions and embeds them as sentence vectors, calculates the popularity of APIs and the compatibility of combined APIs according to the interaction records between Mashups and APIs, and feeds the resulting Mashup-API feature matrix to the AFMHN model as input. The AFMHN model has four training components, namely a linear component, a DNN component, a CIN component and an attention component, and two output components, namely a prediction component and an evaluation component. Finally, the output of the AFMHN model, namely the Top-k Web APIs with the highest probability, is given according to the developer's Mashup textual requirement description.
Fig. 2 shows the neural network structure of the proposed AFMHN model, with four training components, namely a linear component, a DNN component, a CIN component and an attention component, and two output components, namely a prediction component and an evaluation component. The linear part, the attention mechanism part, the DNN part and the CIN part are represented by red, green, blue and purple connections, respectively. The prediction component sums the outputs of all training components to predict the probability that a Mashup calls the Web API.
The above description of the embodiments is provided to enable a person of ordinary skill in the art to make and use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles described herein may be applied to other embodiments without inventive effort. Therefore, the present invention is not limited to the above embodiments; improvements and modifications made by those skilled in the art based on this disclosure fall within the protection scope of the present invention.
Claims (9)
1. A Web API recommendation method based on a factorization machine, comprising the following steps:
(1) Crawling Mashup and Web API metadata to construct a service library data set, wherein the service library data set comprises category information, function description text information and historical interaction records of Mashup and APIs;
(2) Preprocessing the obtained service library data set;
(3) Inputting the preprocessed functional description text information into a Sentence-BERT model to obtain a vector representation of a Sentence, and calculating the similarity between an API and Mashup through the obtained vector of the Sentence and a multi-feature extraction component;
(4) Calculating popularity of the APIs and compatibility of the API combination according to the historical interaction record between Mashup and the APIs;
(5) A Mashup-API feature matrix is obtained through complete concatenation and used as the input of an AFMHN model, wherein the AFMHN model has four training components, namely a linear component, a DNN component, a CIN component and an attention component, and two output components, namely a prediction component and an evaluation component, and the interaction formula by which the AFMHN model learns complex features is defined as follows:
ŷ = σ(y_linear + y_att + y_DNN + y_CIN)
wherein σ is the sigmoid function and the formula is divided into four parts, namely a linear regression part y_linear for learning basic feature contributions, an attention mechanism part y_att for learning feature interactions of different importance, a DNN network part y_DNN for capturing implicit high-order feature interactions, and a CIN network part y_CIN for capturing explicit high-order feature interactions;
(6) The output of the AFMHN model, namely the Top-k Web APIs with the highest probability, is given according to the developer's Mashup textual requirement description.
2. The Web API recommendation method based on a factorization machine as recited in claim 1, wherein in said step (1) the service library data set includes Mashup and API category information, functional description text information and historical interaction records.
3. The Web API recommendation method based on a factorization machine of claim 2, wherein the preprocessing of said service library data set comprises: deleting invalid words, restoring abbreviations in English sentences, removing affixes and unifying sentence tenses.
4. The Web API recommendation method based on a factorization machine of claim 1, wherein the popularity of an API is calculated as follows:
the popularity of API j is defined by the formula
pop(j) = (1/M) Σ_{i=1}^{M} p_{i,j}
where M is the number of all Mashups, N is the number of all APIs, and p_{i,j} indicates whether Mashup i calls API j: if Mashup i calls API j, p_{i,j} = 1; otherwise, p_{i,j} = 0.
5. The Web API recommendation method based on a factorization machine of claim 4, wherein the compatibility of an API combination is calculated as follows:
let G be an API co-invocation graph containing API nodes; if APIs i and j are called together by the same Mashup, an edge is added between them; let d(i,j) ≥ 1 denote the shortest distance between i and j; the compatibility of i and j is defined as com(i,j) = e^{1-d(i,j)}.
6. The Web API recommendation method based on a factorization machine of claim 1, wherein said step (5) calculates the linear regression part y_linear for learning basic feature contributions by the following formula:
y_linear = w_0 + Σ_{i=1}^{m} w_i x_i
where w_0 is the global bias and w_i is the strength of the i-th variable.
7. The Web API recommendation method based on a factorization machine of claim 1, wherein said step (5) calculates the attention mechanism part y_att for learning feature interactions of different importance by the following formulas:
a'_{ij} = h^T ReLU(W(v_i ⊙ v_j) x_i x_j + b)
a_{ij} = exp(a'_{ij}) / Σ_{(i,j)} exp(a'_{ij})
y_att = p^T Σ_i Σ_{j>i} a_{ij} (v_i ⊙ v_j) x_i x_j
wherein W, b and h are parameters of the model, t represents the size of the attention-network hidden layer, called the attention factor, v_i represents the embedding vector corresponding to a feature field, x_i represents the feature value, ⊙ denotes the element-wise product of two vectors, ReLU represents the activation function of the network, a_{ij} represents the attention score weighting the cross term, and p represents the neural weight of the prediction layer.
8. The Web API recommendation method based on a factorization machine of claim 1, wherein said step (5) calculates the DNN network part y_DNN capturing implicit high-order feature interactions by the following formulas:
a^{(l)} = σ_l(W_l a^{(l-1)} + b_l), l = 1, …, L
y_DNN = h^T a^{(L)}
wherein the vector h represents the neural weight of the prediction layer, L represents the number of hidden layers, and W_l, b_l and σ_l represent the weight matrix, bias vector and activation function of layer l.
9. The Web API recommendation method based on a factorization machine of claim 1, wherein said step (5) calculates the CIN network part y_CIN capturing explicit high-order feature interactions by the following formulas:
X^k_{h,*} = Σ_{i=1}^{H_{k-1}} Σ_{j=1}^{m} W^{k,h}_{i,j} (X^{k-1}_{i,*} ∘ X^0_{j,*})
p^k_i = Σ_{d=1}^{D} X^k_{i,d}
y_CIN = w_0^T [p^1; p^2; …; p^T]
wherein X^0 ∈ R^{m×D} is the input feature matrix, m is the number of features, and D is the dimension of the embedding vectors; X^k ∈ R^{H_k×D} represents the output of the k-th CIN layer, where H_k is the number of features of the k-th layer, which can also be understood as the number of neurons; ∘ denotes the Hadamard product, i.e. the multiplication of corresponding dimensional elements between vectors; the H_k feature vectors of the k-th layer are summed over the D dimension to obtain the pooling vector p^k ∈ R^{H_k}; letting T denote the depth of the network, a pooling vector of length H_k is obtained for each layer k; by concatenating all pooling vectors obtained from the different layers, the final output of the CIN is generated, and w_0 is the regression parameter.
Priority Applications (1)
- CN202211534754.6A — filed 2022-12-02 — Web API recommendation method based on factorization machine
Publications (1)
- CN116107619A (application) — published 2023-05-12 — status: Pending
Cited By (2)
- CN117493697A / CN117493697B (Xidian University) — Web API recommendation method and system based on multi-modal feature fusion
Legal Events
- PB01 — Publication
- SE01 — Entry into force of request for substantive examination