CN114996567A - API recommendation method based on context and graph learning - Google Patents

Publication number
CN114996567A
Authority
CN
China
Prior art keywords
api
information
prediction
graph
recommendation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210487835.9A
Other languages
Chinese (zh)
Inventor
郭俊霞
赖宝强
李征
赵瑞莲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Chemical Technology

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9538 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The API recommendation method based on context and graph learning analyzes and predicts APIs from limited context information, addressing the cold-start problem that degrades recommendation performance when context information is insufficient. The method models and analyzes APIs at a fine granularity and enriches the context information by fusing structure and attribute information. This provides more feature information for the API prediction task, uncovers more potential relationships and usage relationships, and expands the range of APIs that can be predicted. Link prediction is then performed on the feature representations learned from the graph to discover potential relationships. In parallel, Bayesian prediction is performed on historical usage information to infer the API likely to be used next. Finally, the prediction scores of the two are combined to predict the API likely to be called next, improving recommendation performance. Compared with existing API recommendation methods, the method can effectively improve API recommendation accuracy, can provide rich API recommendation lists, and has certain application value.

Description

API recommendation method based on context and graph learning
Technical Field
The invention belongs to the field of intelligent software development in software engineering and relates to a method that applies graph learning technology, combined with programming context information, to recommend suitable APIs to developers.
Background
APIs are widely used in everyday software development as the function access interfaces of software toolkits, software frameworks, and other applications. Many help documents are currently of low quality or contain incomplete code examples, so developers often find APIs difficult to use. In recent years, big data and artificial intelligence technologies have advanced research on code search and recommendation, and many researchers have studied the API recommendation problem in depth and produced a series of results.
However, current API recommendation methods make inadequate use of context information. When context information is insufficient, recommendation tends to run into a "cold start" and recommendation performance suffers. Graph learning, a technique that fuses relationship and attribute features, can learn deep feature representations of related entities from a data set and thereby discover hidden relationships. Graph learning technology is therefore widely applied in classification, prediction, recommendation, and other fields.
To address the "cold start" problem in API recommendation, the invention proposes a graph-learning-based API recommendation method. By studying fine-grained API usage relationships and API attribute information in the context, and by fusing API structure and attribute information with graph learning, the method strengthens the representation of context information and improves API prediction capability.
Disclosure of Invention
The invention aims to provide developers with rich API usage suggestions from a limited context when the context information of the code they are working on is insufficient. Its main advantage is that a graph learning mechanism fuses the fine-grained API structure relationships with the API attribute features, so that deep feature representations of the APIs are learned, the context information is enhanced, and the cold-start problem is alleviated. These API features are used in the downstream prediction task and give the recommendation task more effective information. The API feature fusion model based on graph learning is shown in FIG. 1. In addition, the invention combines link prediction with a Bayesian prediction method to predict candidate APIs: it predicts the generation probability of unknown relationship edges and uses this prediction as the basis for recommendation.
The method mainly comprises the following steps:
(1) Relationship construction
An API usage relationship graph is constructed to represent the API usage relationships of the context.
(2) Feature extraction
Node features are extracted from the API relationship graph.
(3) Model training
The API relationship graph and the features are fused into a joint representation to obtain deep features of the nodes.
(4) Prediction and recommendation
Unknown APIs are predicted from the feature representations of the context API nodes and the API usage distribution information, and are then ranked and recommended by predicted value.
Detailed Description
The API recommendation framework based on graph learning is shown in FIG. 2. The specific implementation steps are as follows:
Step 1: Relationship construction.
Static analysis is used to extract the call relationships between methods and APIs in a project. Association relationships are constructed from these call relationships, yielding the API association subgraph of the project. The association relationship is defined as follows:
Given a data set $P = \{M_i \mid i = 1, 2, \ldots, n\}$, where $M_i$ denotes the set of APIs called in the $i$-th client method. If a pair of API nodes $(u, v)$ is used together in client method $M_i$, then $o_i(u, v) = 1$; otherwise $o_i(u, v) = 0$. When API nodes $u$ and $v$ satisfy
$$\sum_{i=1}^{n} o_i(u, v) \ge minsup$$
an association is considered to exist between API node $u$ and API node $v$, where $minsup$ is the minimum support and defaults to 1.
The fine-grained API structure relationship is represented by a weighted undirected API association graph $G = \{V, E\}$, where the node set $V$ is the set of APIs and the edge set $E$ is the set of association relationships. Each node in $V$ is labeled with the fully qualified name of its API and identified by a unique number. Each edge in $E$ is represented by a node pair with a weight: for example, $(u, v, w)$ denotes an edge between node $u$ and node $v$ with association strength $w$, i.e., $e = (u, v, w)$, $e \in E$.
The API structure relationships are constructed by the extraction method above. FIG. 3 shows the API association graph of one project.
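The construction in Step 1 can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the function name `build_association_graph` is hypothetical, and using the co-occurrence count as the edge weight w is an assumption, since the text does not state how the association strength is computed.

```python
from collections import Counter
from itertools import combinations


def build_association_graph(methods, minsup=1):
    """Return weighted undirected edges {(u, v): w} from per-method API sets.

    Assumption: the edge weight w is the number of client methods in which
    u and v are used together (the source does not define w explicitly).
    """
    support = Counter()
    for apis in methods:
        # Each unordered pair used in the same client method co-occurs once.
        for u, v in combinations(sorted(set(apis)), 2):
            support[(u, v)] += 1
    # Keep only pairs whose support reaches the minimum support threshold.
    return {pair: w for pair, w in support.items() if w >= minsup}


# Two client methods, each given as the set of APIs it calls.
methods = [
    {"java.io.File.exists", "java.io.File.delete"},
    {"java.io.File.exists", "java.io.File.delete", "java.io.File.mkdir"},
]
edges = build_association_graph(methods, minsup=2)
```

With `minsup=2`, only the pair that co-occurs in both methods survives; with the default `minsup=1`, every co-occurring pair becomes an edge.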
Step 2: Feature extraction.
Feature extraction targets the node attributes in the API association graph; the extracted features serve as the initial node features in the graph learning process. The attribute information of a node is considered from two aspects: API project structure information and API functional semantic information.
To exploit both the project structure information and the functional semantic information of an API, the method fuses the two kinds of information into the node attribute and embeds the node attributes into a single vector space as feature vectors. Specifically, the fully qualified name of the API is first split along the hierarchy of project name (project), package name (package), class name (class), and method name (method) to obtain the API project structure information. Each name is then split by camel case into a word sequence to obtain the API functional semantic information. Finally, all the parts are concatenated to form the attribute information of the API node.
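The attribute extraction described above might look like the following sketch. The helper names `split_camel` and `node_attribute` are hypothetical, and passing the project name as a separate argument is an assumption made for illustration; the text only says that the name is split along the project, package, class, and method hierarchy.

```python
import re


def split_camel(name):
    """Split a camel-case identifier into lowercase words.

    Acronym runs stay together, e.g. "XMLParser" -> ["xml", "parser"].
    """
    pattern = r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+"
    return [token.lower() for token in re.findall(pattern, name)]


def node_attribute(project, fq_name):
    """Build the word sequence of an API node from its hierarchical name."""
    tokens = split_camel(project)
    # Package, class, and method names are separated by dots.
    for part in fq_name.split("."):
        tokens.extend(split_camel(part))
    return tokens


words = node_attribute("jdk", "java.nio.file.Files.readAllBytes")
```

The resulting word sequence keeps the structural order (project, package, class, method) while exposing the semantic words hidden inside each identifier.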
A vector space model is then constructed to vectorize the node text attributes. The weight of each word in the bag-of-words model of a node text attribute is calculated with the TF-IDF algorithm, which completes the encoding and initial vector representation of the API. The initial feature vector of API node $i$ is represented as
$$(i_{w_1}, i_{w_2}, \ldots)$$
where $i_w$ is calculated as follows.
$$i_w = f_w \times \log\frac{|i|}{a_w}$$
In the formula, $f_w$ is the number of occurrences of the word $w$ in the text attribute of API node $i$, $|i|$ is the number of API nodes in the API association graph, and $a_w$ is the number of APIs in the API association graph whose text attributes contain the word $w$.
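As a rough sketch of this encoding step, the snippet below computes TF-IDF vectors over the node word sequences. It assumes the plain, unsmoothed form (term count times the logarithm of node count over document frequency); the exact weighting variant used by the method is not recoverable from the text, and `tfidf_vectors` is a hypothetical name.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) for a list of word sequences.

    Assumption: weight = term count * log(N / document frequency),
    with no smoothing.
    """
    vocab = sorted({word for doc in docs for word in doc})
    n_nodes = len(docs)
    # Number of nodes whose text attribute contains each word.
    doc_freq = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        term_count = Counter(doc)
        vectors.append(
            [term_count[w] * math.log(n_nodes / doc_freq[w]) for w in vocab]
        )
    return vocab, vectors


docs = [["file", "read"], ["file", "write"]]
vocab, vectors = tfidf_vectors(docs)
```

A word that appears in every node, such as "file" here, receives weight zero, so the initial features emphasize the words that distinguish one API from another.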
Step 3: Model training.
To jointly learn the topology information and the node attributes, the method performs graph representation learning with GraphSAGE, a spatial-domain graph learning framework, and thereby obtains the fused feature representation of each API. The GraphSAGE-based API association graph learning model has three parts: an input layer, a convolutional layer, and an output layer.
① Input layer
The input information has two parts: structure information and attribute information. The structure information is the API association graph G, which is used for computation on the graph. The attribute information is the text attribute T of each node, which is first encoded with the vector space model described above to obtain the initial feature vector of the API.
② Convolutional layer
The convolutional layer uses the structure of graph G to aggregate the feature information of neighbor nodes through a sampling strategy and an aggregation function, thereby fusing and updating the features. Neighbor nodes are sampled with a fixed-length method: the number of neighbors S is defined first so that the neighbor count stays constant, which makes it convenient to concatenate multiple nodes for batch training. Sampling with replacement is then used to reach S, giving the neighbor node set N(v) of API node v. The aggregation function passes messages from the k-order neighbor nodes to update the feature representation of a node, balancing the attribute features of the node while preserving the structural characteristics of the graph.
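The fixed-length sampling step can be illustrated as follows. This is a sketch under assumptions: `sample_neighbors` is a hypothetical helper, and drawing without replacement when a node already has at least S neighbors follows common GraphSAGE practice rather than anything stated in the text, which only mentions sampling with replacement to reach S.

```python
import random


def sample_neighbors(adj, v, S, rng=random):
    """Return exactly S sampled neighbors of v (or [] for an isolated node)."""
    neighbors = adj.get(v, [])
    if not neighbors:
        return []
    if len(neighbors) >= S:
        # Assumption: enough neighbors, so draw S distinct ones.
        return rng.sample(neighbors, S)
    # Too few neighbors: sample with replacement until S is reached.
    return [rng.choice(neighbors) for _ in range(S)]


adj = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
sampled = sample_neighbors(adj, "a", 5, random.Random(0))
```

Because every node yields the same number of neighbors, the sampled sets can be stacked into fixed-shape batches.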
The following three aggregation functions are provided to update the features of API node v:
1) Mean aggregation function
The weighted average of the neighbor nodes of v is taken as the update of the node:
$$h_v^{k} = \sigma\left(W \cdot \mathrm{MEAN}\left(\{h_v^{k-1}\} \cup \{h_u^{k-1}, \forall u \in N(v)\}\right)\right)$$
where $h_v^{k}$ is the representation of node v at layer k, node u is a neighbor node of node v, and W is a parameter to be learned.
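A minimal sketch of a mean aggregator is shown below, assuming the standard GraphSAGE form: average the node's own vector with its sampled neighbors' vectors, apply a learned matrix W, then a sigmoid. The unweighted mean and the function names are assumptions; the text speaks of a weighted average without giving the weights.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def mean_aggregate(h_v, neighbor_hs, W):
    """Update node v from the mean of its own and its neighbors' vectors."""
    vectors = [h_v] + neighbor_hs
    dim = len(h_v)
    # Element-wise mean over the node itself and its sampled neighbors.
    mean = [sum(vec[j] for vec in vectors) / len(vectors) for j in range(dim)]
    # Linear map with W (one row per output dimension), then the nonlinearity.
    return [sigmoid(sum(w * m for w, m in zip(row, mean))) for row in W]


identity = [[1.0, 0.0], [0.0, 1.0]]
updated = mean_aggregate([1.0, 0.0], [[0.0, 1.0], [2.0, 2.0]], identity)
```

In training, W would be learned by backpropagation; the identity matrix here only makes the arithmetic easy to follow.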
2) LSTM aggregation function
The neighbor node set N(v) of v is treated as a sequence and processed with an LSTM module.
3) Pooling aggregation function
Information from the neighbor nodes is aggregated with max pooling:
$$\mathrm{AGGREGATE}_k^{pool} = \max\left(\{\sigma(W h_u^{k} + B), \forall u \in N(v)\}\right)$$
where σ is the Sigmoid function, and W and B are parameters to be learned.
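The pooling aggregator can be sketched in the same style, assuming the usual GraphSAGE formulation: each neighbor vector passes through a small fully connected layer (W, B, sigmoid) and the results are combined with an element-wise maximum. `pool_aggregate` is a hypothetical name.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pool_aggregate(neighbor_hs, W, B):
    """Element-wise max over sigmoid(W h_u + B) for every neighbor u."""
    transformed = [
        [sigmoid(sum(w * x for w, x in zip(row, h)) + b) for row, b in zip(W, B)]
        for h in neighbor_hs
    ]
    # zip(*transformed) groups the values of each output dimension together.
    return [max(column) for column in zip(*transformed)]


identity = [[1.0, 0.0], [0.0, 1.0]]
pooled = pool_aggregate([[1.0, 0.0], [0.0, 2.0]], identity, [0.0, 0.0])
```

Max pooling keeps, for each dimension, the strongest signal among the neighbors, so the result does not depend on neighbor order.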
③ Output layer
The output layer outputs the final feature representation $z_v$ of API node v, which is used for API prediction. To make training more stable, the output vector of each convolutional layer is normalized, i.e.
$$h_v^{k} \leftarrow \frac{h_v^{k}}{\lVert h_v^{k} \rVert_2}$$
The vectors of the convolutional layers carry the low-order and high-order feature signals of a node and reflect the local characteristics of the graph, so the node representations of the k convolutional layers are output through a linear transformation layer:
$$z_v = W \cdot \mathrm{concat}(h_v^{1}, h_v^{2}, \ldots, h_v^{k}) + B$$
where the concatenation function concat() joins the per-layer feature representations of a node in order, and W and B are parameters to be learned.
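The output layer can be illustrated with the sketch below, which assumes L2 normalization of each layer output and a single linear layer over the concatenated per-layer vectors. Both function names are hypothetical, and the choice of the L2 norm is an assumption based on common GraphSAGE practice.

```python
import math


def l2_normalize(h):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in h))
    return [x / norm for x in h] if norm > 0 else list(h)


def output_representation(layer_hs, W, B):
    """Concatenate the per-layer vectors in order and apply W x + B."""
    concat = [x for h in layer_hs for x in h]
    return [sum(w * x for w, x in zip(row, concat)) + b for row, b in zip(W, B)]


h1 = l2_normalize([3.0, 4.0])
z_v = output_representation([[1.0, 0.0], [0.0, 1.0]], [[1.0, 1.0, 1.0, 1.0]], [0.5])
```

Concatenating all layers before the linear map lets the final representation draw on both near and far neighborhood information.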
The parameters of the aggregation function are learned and updated by the backpropagation algorithm in an unsupervised manner. Following the idea of the SkipGram model, a graph-based loss function is used so that adjacent nodes have similar representations. The loss function is shown below.
$$J_G(z_u) = -\log\left(\sigma(z_u^{\top} z_v)\right) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)} \log\left(\sigma(-z_u^{\top} z_{v_n})\right)$$
where $z_u$ and $z_v$ are the final feature representations of nodes u and v, node u is a neighbor node of node v sampled within its k-order neighborhood, $P_n$ is the negative sampling probability distribution, Q is the number of negative samples, and σ is the Sigmoid function.
Step 4: Prediction and recommendation.
From the limited API information in the context, graph learning yields deep API feature representations that are used to predict and recommend APIs. Given an API node set B, it is only necessary to look up the corresponding node embeddings in a parameter server or database; the latest feature vectors of the nodes can then be recalled through a forward propagation algorithm. The mini-batch-based forward propagation algorithm is shown in FIG. 4.
In the algorithm, lines 2 to 7 are the sampling process, which samples the 1-order, 2-order, and higher-order neighbor nodes of node u. Lines 9 to 15 are the aggregation process, which aggregates only the nodes of the local neighborhood and thereby speeds up iteration.
The feature vectors fuse the structure information and the attribute information of the APIs. Unknown connection relationships are predicted from them by link prediction, which gives the prediction probability of each candidate API. In addition, Bayesian prediction is performed on the historical API usage information in the code base to make the prediction more stable. Finally, the prediction probabilities of the two are combined into the final score of each candidate API. Assuming that the API usage information of the context, denoted Q, is the input of the model and the API sequence list D is the output, the API prediction problem becomes that of solving the generation probability P(D | Q) of D given Q.
① Link prediction based on API feature representation
To capture higher-order potential relationships, the potential edges of candidate APIs are predicted from the features of the API association graph, using the similarity between nodes as the feature representation of a potential edge. Given API node u and candidate API node v, the similarity between the two nodes can be calculated as the inner product of their vectors, i.e.
$$z_u^{\top} z_v$$
Given a set Q of context API nodes, the feature representation between Q and the candidate API node v is denoted as
Figure BDA0003629926970000052
The probability distribution $y_d$ of the potential edges is then obtained through a softmax function:
$$y_d = P_1(d \mid Q) = \mathrm{softmax}(E_Q)$$
where d is the API node to be predicted.
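The link prediction step might be sketched as follows. How the per-node similarities are combined into the representation between Q and a candidate is not recoverable from the text, so summing the inner products over the context nodes is an assumption made purely for illustration; `link_prediction` is a hypothetical name.

```python
import math


def link_prediction(context_vecs, candidate_vecs):
    """Return a softmax distribution over candidate APIs.

    Assumption: the score of a candidate is the sum of its inner products
    with every context node vector.
    """
    scores = {
        d: sum(sum(a * b for a, b in zip(z_q, z_d)) for z_q in context_vecs)
        for d, z_d in candidate_vecs.items()
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exps = {d: math.exp(s - top) for d, s in scores.items()}
    total = sum(exps.values())
    return {d: e / total for d, e in exps.items()}


context = [[1.0, 0.0]]
candidates = {"read": [1.0, 0.0], "write": [0.0, 1.0]}
probs = link_prediction(context, candidates)
```

Candidates whose embeddings align with the context embeddings receive a higher probability of forming a new edge.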
② Bayesian prediction based on API usage information
Bayesian prediction estimates the posterior probability from the prior probability and can predict the API likely to be called next. According to the Bayesian formula below, the prior probability $P(d)$ can be estimated from the frequency with which d occurs in the code base.
$$P(d \mid Q) = \frac{P(d, Q)}{P(Q)} \propto P(d, Q) = P(Q \mid d)\,P(d)$$
The probability $P(Q \mid d)$ is calculated with a conditional probability similar to that of the n-gram model:
Figure BDA0003629926970000061
Thus, given the API set Q, the prediction probability of a candidate API d is given by:
Figure BDA0003629926970000062
where f is the co-occurrence frequency.
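One way to realize such a Bayesian score is sketched below. It assumes a naive factorization of P(Q | d) into per-API conditionals estimated from co-occurrence counts, with no smoothing; the exact formula in the original figures is not recoverable, so this is an illustration of the idea only, and `bayes_scores` is a hypothetical name.

```python
def bayes_scores(methods, context, candidates):
    """Score each candidate d by P(d) * product over q in Q of f(q, d) / f(d).

    Assumptions: f(d) counts client methods that call d, f(q, d) counts
    methods that call both q and d, and P(d) = f(d) / number of methods.
    """
    n_methods = len(methods)
    scores = {}
    for d in candidates:
        f_d = sum(1 for m in methods if d in m)
        if f_d == 0:
            scores[d] = 0.0  # d never appears in the code base
            continue
        score = f_d / n_methods  # prior P(d)
        for q in context:
            f_qd = sum(1 for m in methods if d in m and q in m)
            score *= f_qd / f_d  # estimate of P(q | d)
        scores[d] = score
    return scores


history = [{"open", "read", "close"}, {"open", "write", "close"}, {"open", "read"}]
scores = bayes_scores(history, ["open"], ["read", "write", "flush"])
```

Here "read" follows "open" in two of the three historical methods, so it scores higher than "write", and an API never seen in the code base scores zero.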
According to this formula, each candidate result d in the candidate result set D obtains a prediction probability, which is used as its API recommendation score, and each score is normalized. The weighted sum of the two scores is then taken as the final score of candidate API d:
$$Score(d) = \alpha\,Score_1(d) + \beta\,Score_2(d)$$
where $Score_1(d)$ is the score of each candidate API after $P_1(d \mid Q)$ is normalized, and $Score_2(d)$ is the score of each candidate API after $P_2(d \mid Q)$ is normalized. α and β are each set to 0.5 by default. Finally, the scores in the result set D are sorted, and the Top-k results are returned as the final recommendation.
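The final scoring and ranking can be sketched as follows. The text does not say how the scores are normalized, so dividing by the sum is an assumption; `normalize` and `recommend` are hypothetical names, and ties are broken alphabetically only to make the output deterministic.

```python
def normalize(scores):
    """Scale scores so they sum to 1 (assumed normalization scheme)."""
    total = sum(scores.values())
    return {d: (s / total if total > 0 else 0.0) for d, s in scores.items()}


def recommend(p1, p2, k, alpha=0.5, beta=0.5):
    """Combine link-prediction and Bayesian scores and return the Top-k."""
    s1, s2 = normalize(p1), normalize(p2)
    final = {
        d: alpha * s1.get(d, 0.0) + beta * s2.get(d, 0.0)
        for d in set(s1) | set(s2)
    }
    # Sort by descending score, then by name for a stable order.
    return sorted(final.items(), key=lambda item: (-item[1], item[0]))[:k]


p1 = {"read": 0.2, "write": 0.8}   # link-prediction probabilities
p2 = {"read": 3.0, "write": 1.0}   # unnormalized Bayesian scores
top = recommend(p1, p2, k=2)
```

With equal weights, a candidate favored strongly by one predictor can still outrank one favored mildly by the other, which is the purpose of combining the two sources.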
Drawings
FIG. 1 API feature fusion model based on graph learning
FIG. 2 API recommendation framework based on graph learning
FIG. 3 Example API association graph of a project
FIG. 4 Mini-batch-based forward propagation algorithm

Claims (4)

1. An API recommendation method based on context and graph learning, characterized in that an API association graph is constructed by a relational modeling method to represent the context API usage relationships; a graph learning model fuses the relationship and attribute features of the APIs to alleviate the problem of insufficient context information; and candidate APIs are predicted by a prediction method based on multi-source information fusion.
2. The relational modeling method according to claim 1, characterized in that class file information, method declaration information, method parameter information, API call information, and API parameter information in a project are extracted by a static analysis technique; the API association graph is constructed from the co-occurrence relationships between methods and APIs; and attribute features are extracted from the API project structure information and semantic information.
3. The graph learning model according to claim 1, characterized in that the relationship structure and attribute information of the APIs are fused using the transductive graph learning framework GraphSAGE, thereby obtaining the feature representations of the API nodes and learning the parameters of the aggregation function; when API nodes are input, the API feature vectors can be recalled from the shared parameters for the downstream prediction task.
4. The prediction method according to claim 1, characterized in that the target API is predicted by combining a link prediction method based on the API feature representation with a Bayesian prediction method; when only limited context information is known, potential relationships can be predicted from the existing API node information, and the prediction and recommendation of candidate APIs are completed based on the prediction probability.
CN202210487835.9A 2022-05-06 2022-05-06 API recommendation method based on context and graph learning Pending CN114996567A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210487835.9A CN114996567A (en) 2022-05-06 2022-05-06 API recommendation method based on context and graph learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210487835.9A CN114996567A (en) 2022-05-06 2022-05-06 API recommendation method based on context and graph learning

Publications (1)

Publication Number Publication Date
CN114996567A true CN114996567A (en) 2022-09-02

Family

ID=83024547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210487835.9A Pending CN114996567A (en) 2022-05-06 2022-05-06 API recommendation method based on context and graph learning

Country Status (1)

Country Link
CN (1) CN114996567A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination