CN116860991A

CN116860991A - API recommendation-oriented intent clarification method based on knowledge graph driving path optimization

Info

Publication number: CN116860991A
Application number: CN202310757316.4A
Authority: CN
Inventors: 黄箐; 李子帅; 左正康; 邢振昌; 曾锦山; 王昌晶
Original assignee: Jiangxi Normal University
Current assignee: Jiangxi Normal University
Priority date: 2023-06-26
Filing date: 2023-06-26
Publication date: 2023-10-10

Abstract

The application provides an API recommendation-oriented intent clarification method based on knowledge graph driving path optimization, comprising the following steps: S1, extracting entities that reflect API actions, events, objects and constraints, enriching the relation categories among the entities, and storing the extracted entities and relations with a knowledge graph as the carrier; S2, generating a dialogue process based on the API knowledge graph to clarify the user's requirement, by searching a subgraph from the knowledge graph according to the query statement input by the user and designing a guiding mechanism and a decision tree algorithm to generate a man-machine dialogue process; S3, expanding the APIs and providing interpretability, by expanding to obtain other APIs that have a semantic relation with the initial APIs and mapping from the dialogue process to the knowledge graph to obtain an optimal clarification path for each API. The application combines the instant response capability of API search with the interactive, clarifying, explanatory and extensible capabilities of social technical information seeking, which helps clarify questions efficiently and enables a user to navigate and understand API functionality more effectively.

Description

API recommendation-oriented intent clarification method based on knowledge graph driving path optimization

Technical Field

The application relates to the technical field of Application Programming Interface (API) recommendation, in particular to an intention clarification method based on knowledge graph driving path optimization and oriented to API recommendation.

Background

Developers' needs regarding application programming interfaces (APIs) now go beyond simply finding the so-called best API for a particular programming task. To minimize API misuse, developers must consider several aspects, such as the specific usage context of an API, its relationships to cooperating APIs, and possible confusion between similar APIs that differ only slightly. API search techniques should therefore guide developers to clarify ambiguous question intentions, provide diverse and heuristic APIs for different needs, explain search results, and extend to other potentially useful API knowledge. This reveals a practical API requirement: seeking API recommendation and knowledge discovery that is heuristic, interpretable and extensible, rather than just presenting a so-called best API. Meeting this requirement not only helps developers choose the API that best suits their needs, but also stimulates and broadens their thinking, e.g., by exploring alternative or better solutions and discovering previously unknown API knowledge.

Mainstream API search techniques perform the search directly from the API query statement and can be roughly divided into the following two types:

(1) Keyword matching based methods. Such methods employ fuzzy keyword matching to retrieve APIs that match the literal meaning of the keywords (e.g., API names, requirement descriptions, and labels). However, this approach is limited by its inability to capture semantic relationships between keywords.

(2) Deep learning based methods. Such methods require the query statement to contain enough keywords to accurately reflect the user's needs. In practice, however, query statements often lack keywords, so the developer's intention is not sufficiently expressed, which seriously reduces the effectiveness of deep learning methods.

To address the problem of insufficient keywords in the initial query, methods based on intent clarification have been proposed, which mainly fall into the following two categories:

(1) Query expansion based methods. These methods retrieve expansion keywords related to the query from a knowledge base and use them to modify or extend the initial query statement. However, they may produce negative expansion effects, because the keywords in the initial query statement alone can hardly express the user's expansion needs fully, and if the knowledge base lacks the correct expansion keywords, the quality of knowledge recommendation is greatly compromised.

(2) Query clarification based methods. These methods capture the user's needs accurately by asking clarification questions, offering options for the user to choose from, and adjusting the query statement based on the user's replies. However, their performance depends on the diversity and scale of the training data, so the query result lacks diversity if the training data lacks the correct expansion keywords. Even if the expansion keywords exist in the data, these methods may need multiple rounds of query clarification to capture the user's intent and obtain the correct expansion keywords, which greatly reduces the efficiency of the clarification process.

In summary, the prior art has two limitations.

Limitation 1: the query results lack diversity. Knowledge recommendation in existing methods is limited to recommending the so-called best API and lacks heuristic, diverse and interpretable recommended knowledge, so it fails to meet developers' needs for practical APIs.

Limitation 2: the query process is inefficient. Existing methods lack efficient query clarification to guide query statements whose requirements are undefined or whose details are unspecified, so the number of query rounds is excessive, redundant information appears in the query process, and accurate, practical API knowledge cannot be recommended in the shortest time.

Disclosure of Invention

The application aims to overcome the defects of the prior art and provides an API recommendation-oriented intent clarification method based on knowledge graph driving path optimization, so as to solve the problems described above.

In order to achieve the above purpose, in one aspect, the present application provides an API recommendation-oriented method for clarifying intent based on knowledge graph driving path optimization, comprising the following steps:

Step S1, extracting entities that reflect API actions, events, objects and constraints from an initial API document, enriching the relation categories among the entities, including API functional relations and API semantic relations, and storing the extracted entities and richly typed relations with a knowledge graph as the carrier;

Step S2, generating an efficient dialogue process based on the API knowledge graph to clarify the user's requirement: searching a subgraph from the knowledge graph according to the query statement input by the user, and designing an efficient guiding mechanism and a decision tree algorithm so as to generate a man-machine dialogue process from the content of the subgraph;

Step S3, expanding the APIs and providing interpretability: expanding to obtain other APIs that have a semantic relation with the initial APIs, mapping from the dialogue process to the knowledge graph, obtaining the optimal clarification path of each API, and using it for explanation.

In step S1, entities that reflect API actions, events, objects and constraints are extracted from the initial API document, the relation categories among the entities (including API functional relations and API semantic relations) are enriched, and the extracted entities and richly typed relations are stored with a knowledge graph as the carrier. The process is as follows:

Step S11, designing sentence screening rules and extracting API description sentences related to API behaviors from the API document;

Step S12, designing grammar and semantic tags and labeling rules, and performing grammar role labeling and semantic role labeling on the API description sentences according to these rules;

Step S13, combining the grammar roles and the semantic roles according to rules to form entities;

Step S14, designing entity relations that reflect API behaviors to organize the entities into triples;

Step S15, storing all triples with the knowledge graph as the carrier to form an API behavior knowledge graph.

Further, grammar role labeling and semantic role labeling are performed on the API description sentences. Labeling yields 6 classes of grammar roles: verb, direct object, direct object modifier, preposition, preposition object, and preposition object modifier. Labeling yields 9 classes of semantic roles, all of them constraints: location, direction, manner, extent, temporal, goal, purpose, result, and condition constraints.

In step S1, the entities are of 6 types: API (application programming interface), Action, Event, Object, Object Constraint, and Event Constraint. The relation categories comprise event functional relations, constraint functional relations and semantic relations;

the event functional relations organize four types of entities, namely API, Action, Event and Object, and comprise 4 classes: API Has Event, Act Has Event, Has Direct Object and Has Preposition Object;

the constraint functional relations organize four types of entities, namely Event, Object, Object Constraint and Event Constraint, and comprise 11 classes: Has Status, Has Type, Has Location, Has Direction, Has Manner, Has Extent, Has Temporal, Has Goal, Has Purpose, Has Result and Has Condition;

the semantic relations organize different API entities and comprise 7 classes: Function Similarity, Function Opposite, Function Replacement, Function Collaboration, Logic Constraint, Behavior Difference and Efficiency Comparison.

In step S2, the retrieved subgraph is closely related to the user's query statement and contains API entities together with the other entities and relations related to them. The efficient guiding mechanism comprises:

Step S21, converting the entities and relations in the subgraph into Aspects and their options, and designing an attribute table to store them;

Step S22, designing a decision tree algorithm to select the best Aspect from the attribute table and narrow the scope of the subgraph, wherein the Aspect serves as a node of the decision tree and the options of the Aspect serve as the edges of that node;

Step S23, repeating step S21 and step S22 to generate a complete decision tree;

Step S24, designing clarification question templates according to the decision tree, finally forming the man-machine dialogue process.

Further, the decision tree algorithm is an information gain algorithm that preferentially selects the Aspect with the maximum information gain. The calculation formula is:

$$\mathrm{Gain}(\mathit{aspect}) = I(\mathrm{API}_1, \ldots, \mathrm{API}_m) - E(\mathit{aspect}) \tag{1}$$

In formula (1), Gain(aspect) is the information gain, I(API_1, ..., API_m) is the information entropy of all APIs, and E(aspect) is the information entropy of the current Aspect.

Furthermore, the man-machine dialogue process comprises multiple rounds of man-machine interaction. Each round comprises a clarification question generated by the system, the candidate options, and the option selected by the user; in the last round the system gives a list of recommended APIs. There are fourteen clarification question templates, corresponding to the 14 types of nodes in the decision tree, and all edges of each node serve as the candidate options.

In step S3, other APIs that have a semantic relation with the initial API are obtained by expansion, and the dialogue process is mapped to the knowledge graph to obtain an optimal clarification path for each API. Obtaining the clarification path comprises:

converting the dialogue process into a path in the decision tree, namely the path from the head node that reaches the recommended API most efficiently; the starting point of the path is the head node of the decision tree, and the end point is the recommended API list;

mapping the path in the decision tree to the knowledge graph, that is, restoring the nodes and edges in the path to entities and relations in the knowledge graph; these entities and relations are connected in the knowledge graph and thus form the optimal clarification path.
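The two sub-steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes a decision tree stored as nested dictionaries (an internal node has "aspect" and "edges", a leaf has "apis"), and it assumes node names follow the "e1#r" column naming that the embodiment uses for its attribute table. The function names and the sample tree are hypothetical.

```python
from typing import List, Optional, Tuple

Step = Tuple[str, str]          # (aspect shown to the user, option chosen)
Triple = Tuple[str, str, str]   # (head entity, relation, tail entity)


def find_path(node: dict, api: str, prefix: Tuple[Step, ...] = ()) -> Optional[List[Step]]:
    """Return the root-to-leaf path of (aspect, option) pairs that ends at `api`."""
    if "apis" in node:  # leaf: the recommended API list
        return list(prefix) if api in node["apis"] else None
    for option, child in node["edges"].items():
        found = find_path(child, api, prefix + ((node["aspect"], option),))
        if found is not None:
            return found
    return None


def path_to_triples(path: List[Step]) -> List[Triple]:
    """Restore each (node, edge) pair to a knowledge-graph triple <e1, r, e2>."""
    triples = []
    for aspect, option in path:
        head, _, relation = aspect.partition("#")  # assumed "e1#r" naming
        triples.append((head, relation, option))
    return triples


# Hypothetical decision tree with two clarification rounds.
tree = {
    "aspect": "Action#Has Event",
    "edges": {
        "convert a string to a path": {
            "aspect": "convert a string to a path#Has Condition",
            "edges": {
                "when joined": {"apis": ["api-1"]},
                None: {"apis": ["api-2"]},
            },
        },
        "delete a file": {"apis": ["api-3"]},
    },
}

path = find_path(tree, "api-1")
clarification_path = path_to_triples(path)
```

Because every restored triple shares an entity with the previous one or with the recommended API, the triples can be read in order as an explanation of why that API was reached.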

In another aspect, the application also provides an intent clarification system that applies the intent clarification method, comprising a diversified knowledge composition form construction module, a knowledge efficient guiding strategy module, and a recommendation result expansion and interpretation module;

the diversified knowledge composition form construction module takes API description sentences in an API document as input, extracts API behavior knowledge, outputs various entities and entity relations, and stores them in an API behavior knowledge graph;

the knowledge efficient guiding strategy module receives an initial query statement input by a user, retrieves a subgraph from the API behavior knowledge graph using a subgraph search algorithm, takes the subgraph as input, constructs a decision tree using a decision tree algorithm, generates the clarification question and options of each round according to the decision tree, and returns them to the user; it updates the decision tree according to the option selected by the user, generates a new round of clarification question and options, and repeats these steps until the user chooses to stop the dialogue, and then outputs the recommended result APIs;

the recommendation result expansion and interpretation module takes the result APIs as input and uses an API expansion strategy to obtain expanded APIs, and then obtains the optimal clarification path corresponding to each result API and each expanded API.

Compared with the prior art, the application has the beneficial effects that:

1. The application provides an efficient API recommendation-oriented query clarification method based on knowledge graph (KG) driving path optimization. It designs a novel knowledge-aware artificial intelligence dialogue agent, KAHAID, which combines the instant response capability of API search with the interactive, clarifying, explanatory and extensible capabilities of social technical information seeking. The application shifts API search from merely finding the best API to enhancing the whole query process by providing potentially useful and heuristic knowledge, enabling interactive, heuristic, interpretable and extensible exploratory discovery.

2. The application mines meaningful API behavior knowledge, including the actions, objects, constraints and functional/semantic relations of APIs, and organizes this knowledge into an API behavior knowledge graph (KG). This comprehensive knowledge graph not only helps clarify questions efficiently, but also allows the user to navigate and understand API functionality more effectively.

3. On the basis of this knowledge graph, the application designs an efficient knowledge guiding strategy based on a decision tree algorithm, which prioritizes the best question aspect, reduces the number of question-answer rounds, and gradually guides the developer to clarify an ambiguous question intention.

Drawings

FIG. 1 is a flow chart of the API-oriented recommendation intent clarification method based on knowledge-graph driven path optimization of the present application;

FIG. 2 is a diagram of a novel knowledge organization form design in an embodiment of the application;

FIG. 3 is a flow chart of a knowledge extraction method in an embodiment of the application;

FIG. 4 is a flowchart of a man-machine dialogue generating method in an embodiment of the application.

Detailed Description

To make the technical concept, objects and advantages of the application clearer, the technical scheme of the application is described in further detail below with reference to the accompanying drawings. It is to be understood that the following examples illustrate and describe preferred embodiments of the application and should not be construed as limiting the scope of the application as claimed.

Example 1

As shown in FIG. 1, the present application provides an API recommendation-oriented intent clarification method based on knowledge graph driving path optimization.

To meet practical API requirements, the application designs a new form of knowledge organization that is accurate, diverse, heuristic and interpretable; it uses a knowledge graph as the carrier to organize rich entities and entity relations.

As shown in FIG. 2, in this embodiment, through the processing and study of API documents, 6 classes of entities are designed and organized through 22 classes of entity relations.

The 6 entity types are API, Action, Event, Object, Object Constraint and Event Constraint. An Event entity is composed of an Action entity and an Object entity.

The entity relations are designed from three aspects, namely event functional relations, constraint functional relations and semantic relations:

(1) The event functional relations organize four types of entities, namely API, Action, Event and Object, and comprise 4 classes: API Has Event, Act Has Event, Has Direct Object and Has Preposition Object;

(2) The constraint functional relations organize four types of entities, namely Event, Object, Object Constraint and Event Constraint, and comprise object constraint relations and event constraint relations. The object constraint relations comprise 2 classes: Has Status and Has Type. The event constraint relations comprise 9 classes: Has Location, Has Direction, Has Manner, Has Extent, Has Temporal, Has Goal, Has Purpose, Has Result and Has Condition;

(3) The semantic relations organize different API entities and comprise 7 classes: Function Similarity, Function Opposite, Function Replacement, Function Collaboration, Logic Constraint, Behavior Difference and Efficiency Comparison.
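The schema described above can be written down as plain data so that the counts (6 entity types, 4 + 11 + 7 = 22 relation classes) are easy to check. This is only a sketch: the Python names (EntityType, relation_family, the list constants) are hypothetical, and the relation labels are taken from the text above.

```python
from enum import Enum


class EntityType(Enum):
    API = "API"
    ACTION = "Action"
    EVENT = "Event"
    OBJECT = "Object"
    OBJECT_CONSTRAINT = "Object Constraint"
    EVENT_CONSTRAINT = "Event Constraint"


# Event functional relations (4 classes).
EVENT_RELATIONS = [
    "API Has Event", "Act Has Event", "Has Direct Object", "Has Preposition Object",
]

# Constraint functional relations: 2 object constraints + 9 event constraints.
OBJECT_CONSTRAINT_RELATIONS = ["Has Status", "Has Type"]
EVENT_CONSTRAINT_RELATIONS = [
    "Has Location", "Has Direction", "Has Manner", "Has Extent", "Has Temporal",
    "Has Goal", "Has Purpose", "Has Result", "Has Condition",
]

# Semantic relations between API entities (7 classes).
SEMANTIC_RELATIONS = [
    "Function Similarity", "Function Opposite", "Function Replacement",
    "Function Collaboration", "Logic Constraint", "Behavior Difference",
    "Efficiency Comparison",
]

ALL_RELATIONS = (
    EVENT_RELATIONS
    + OBJECT_CONSTRAINT_RELATIONS
    + EVENT_CONSTRAINT_RELATIONS
    + SEMANTIC_RELATIONS
)


def relation_family(relation: str) -> str:
    """Classify a relation label into one of the three families named in the text."""
    if relation in EVENT_RELATIONS:
        return "event"
    if relation in OBJECT_CONSTRAINT_RELATIONS or relation in EVENT_CONSTRAINT_RELATIONS:
        return "constraint"
    if relation in SEMANTIC_RELATIONS:
        return "semantic"
    raise ValueError(f"unknown relation: {relation!r}")
```

Keeping the three families in separate lists makes it straightforward to treat only the functional relations as clarification aspects and to reserve the semantic relations for API expansion.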

As shown in FIG. 3, extracting entities that reflect API actions, events, objects and constraints from the initial API document in step S1 includes the following sub-processes:

Step S12, designing grammar and semantic tags and labeling rules, and performing grammar role labeling and semantic role labeling on the API description sentences according to these rules. Labeling yields 6 classes of grammar roles: verb, direct object, direct object modifier, preposition, preposition object, and preposition object modifier. Labeling yields 9 classes of semantic roles, such as the time constraint, place constraint, object constraint and aspect constraint;

Taking FIG. 3 as an example, after receiving the input API description sentence "converts a path string, or a sequence of strings that when joined form a path string, to a path", an NLP tool (such as AllenNLP or ChatGPT) labels the sentence components with different tags (e.g., V, ARG1, ARGM-TMP, and ARG2). Each grammar role and semantic role of the sentence is then extracted according to these tags. For example, the verb "convert" among the grammar roles is extracted from the position of the V tag, and the semantic role "time constraint" is extracted from the ARGM-TMP tag, which expresses a restriction of the whole event in time. These sentence components are then combined to form entities. In this case, the verb, the direct object and the preposition object are combined to form one Event entity, e.g., "convert a path string to a nonempty path". As the entities form, the relations between them follow naturally. All extracted entities and functional relations are organized into triples. For example, an Action entity and an Event entity form an "Act Has Event" relation, giving the triple <"convert", Act Has Event, "convert a path string to a nonempty path">.
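The combination of labeled roles into an Event entity and triples can be sketched as below. The sketch starts from roles that are already labeled (the NLP tagging itself is not shown), and the class name, function names, the "api-0" identifier and the direction chosen for each relation are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (head entity, relation, tail entity)


@dataclass
class GrammarRoles:
    """Grammar roles already extracted from one API description sentence."""
    verb: str
    direct_object: str
    preposition: str
    preposition_object: str


def build_event(roles: GrammarRoles) -> str:
    """Combine verb, direct object and preposition object into an Event entity."""
    parts = [roles.verb, roles.direct_object, roles.preposition, roles.preposition_object]
    return " ".join(part for part in parts if part)


def build_triples(api_id: str, roles: GrammarRoles) -> List[Triple]:
    """Organize the entities with the four event functional relations."""
    event = build_event(roles)
    return [
        (api_id, "API Has Event", event),
        (roles.verb, "Act Has Event", event),
        (event, "Has Direct Object", roles.direct_object),
        (event, "Has Preposition Object", roles.preposition_object),
    ]


# Roles for the example sentence; "api-0" is a placeholder API identifier.
roles = GrammarRoles("convert", "a path string", "to", "a nonempty path")
triples = build_triples("api-0", roles)
```

The second triple produced here is the one given in the text, <"convert", Act Has Event, "convert a path string to a nonempty path">.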

And finally, storing the triples by utilizing a Neo4j graph database, thereby constructing an API behavior knowledge graph.
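One way to store such a triple in Neo4j is a Cypher MERGE statement, which creates nodes and relationships only if they do not exist yet. The sketch below only builds the statement and its parameters; sending it to a database (for example through a Neo4j driver session) is not shown. The helper names, the `name` property and the way labels are derived from entity types are assumptions, not details taken from the application.

```python
import re
from typing import Dict, Tuple


def cypher_name(name: str) -> str:
    """Turn a label such as 'Act Has Event' into an identifier such as 'Act_Has_Event'."""
    cleaned = re.sub(r"\W+", "_", name.strip()).strip("_")
    if not cleaned or cleaned[0].isdigit():
        raise ValueError(f"cannot use {name!r} as a Cypher label or relationship type")
    return cleaned


def merge_triple(head_label: str, head: str, relation: str,
                 tail_label: str, tail: str) -> Tuple[str, Dict[str, str]]:
    """Build an idempotent Cypher statement for one <head, relation, tail> triple."""
    # Labels and the relationship type are interpolated into the query text here,
    # so they are sanitized first; the entity names are passed as parameters.
    query = (
        f"MERGE (h:{cypher_name(head_label)} {{name: $head}}) "
        f"MERGE (t:{cypher_name(tail_label)} {{name: $tail}}) "
        f"MERGE (h)-[:{cypher_name(relation)}]->(t)"
    )
    return query, {"head": head, "tail": tail}


query, params = merge_triple(
    "Action", "convert", "Act Has Event",
    "Event", "convert a path string to a nonempty path",
)
```

Using MERGE rather than CREATE means that re-running the extraction over the same API document does not duplicate entities that are shared between triples.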

The performance of the above method was evaluated experimentally in this example.

The accuracy of the triples was verified by experiment; because the triples contain both entities and relations, their accuracy effectively reflects the performance of the knowledge extraction method.

The experimental results show that the average accuracy of the triples exceeds 90%, and the lowest accuracy still reaches 85%. This shows that the triples constructed in this embodiment are highly accurate and that the method described in step S1 can extract accurate knowledge efficiently.

FIG. 4 is a flowchart of the man-machine dialogue generating method in step S2; the flowchart includes the following four steps:

1. Organizing each API and its functional relations into an attribute table. Constructing the attribute table is a key step in building the decision tree: it converts the APIs and functional relations into a two-dimensional form, makes different aspects easy to compare, and helps determine the most critical aspect that needs clarification in the query. In this embodiment, an API entity and its functional relations are denoted API{<e1, r, e2>, ...}. The first column of the attribute table is the API column, which contains the ID numbers of all matching APIs. The columns after the first are aspect columns: for a functional relation <e1, r, e2>, "e1#r" is used as the column name and the option "e2" as the column value. Note that if r in <e1, r, e2> is an "Act Has Event" relation, "Action#Has Event" is used directly as the column name instead of "e1#r".
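A minimal sketch of this attribute-table construction is shown below, assuming the matching APIs and their functional relations are available as a dictionary from API ID to triples. The function names and the sample data are hypothetical; missing cells are filled with None to represent null values.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # <e1, r, e2>
Row = Dict[str, Optional[str]]


def column_name(e1: str, r: str) -> str:
    """Use 'e1#r' as the aspect name, except for the Act Has Event relation."""
    if r == "Act Has Event":
        return "Action#Has Event"
    return f"{e1}#{r}"


def build_attribute_table(api_triples: Dict[str, List[Triple]]) -> Tuple[List[str], Dict[str, Row]]:
    """Return (aspect columns, table), where table maps API ID -> {aspect: option}."""
    columns: List[str] = []
    sparse: Dict[str, Row] = {}
    for api_id, triples in api_triples.items():
        row: Row = {}
        for e1, r, e2 in triples:
            col = column_name(e1, r)
            if col not in columns:
                columns.append(col)
            row[col] = e2  # the option e2 is the column value
        sparse[api_id] = row
    # Fill cells that an API does not define with None (a null value).
    table = {api: {col: row.get(col) for col in columns} for api, row in sparse.items()}
    return columns, table


# Hypothetical subgraph content for two matching APIs.
api_triples = {
    "api-1": [
        ("convert", "Act Has Event", "convert a string to a path"),
        ("convert a string to a path", "Has Condition", "when joined"),
    ],
    "api-2": [
        ("convert", "Act Has Event", "convert a URI to a path"),
    ],
}
columns, table = build_attribute_table(api_triples)
```

Each aspect column is then a candidate clarification question, and the distinct values in that column are the options offered to the user.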

2. Selecting the key clarification aspect and dividing the subgraph according to the attribute table to obtain sub-data sets. Selecting the best clarification aspect helps minimize the height of the decision tree and thus improves the efficiency of the question-answering process. In this embodiment, the information-gain-based decision tree algorithm ID3 is used to select the aspect column with the highest information gain. A higher information gain means that the aspect involves more options or APIs related to the question, which helps reduce dialogue rounds and guides the user to clarify the intention quickly;

Specifically, the decision tree algorithm uses the following calculation formula:

$$\mathrm{Gain}(\mathit{aspect}) = I(\mathrm{API}_1, \ldots, \mathrm{API}_m) - E(\mathit{aspect}) \tag{1}$$

The information gain of each attribute column is calculated by formula (1) as the difference between the information entropy I(API_1, ..., API_m) of all APIs and the information entropy E(aspect) of the current attribute column. Entropy measures the uncertainty of a random variable and represents the impurity of a sample set; the higher the entropy, the greater the amount of information.

The information entropy I(API_1, ..., API_m) of all APIs is calculated by formula (2), where m is the number of APIs and p_i is the probability that the i-th API appears among all APIs;

The information entropy E(aspect) of the current attribute column is calculated by formula (3), where k is the number of distinct column values in the attribute column, "API_{1j}, ...
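Formulas (2) and (3) are not reproduced in this text. Assuming they follow the standard ID3 definitions, and using the symbols defined here and in claim 6 (m, p_i, k, and the APIs associated with the j-th column value), a reconstruction would read as follows. The symbol n_j, the number of APIs that share the j-th column value, is introduced only for this sketch.

$$I(\mathrm{API}_1, \ldots, \mathrm{API}_m) = -\sum_{i=1}^{m} p_i \log_2 p_i \tag{2}$$

$$E(\mathit{aspect}) = \sum_{j=1}^{k} \frac{n_j}{m}\, I(\mathrm{API}_{1j}, \ldots, \mathrm{API}_{mj}) \tag{3}$$

Under this reading, an aspect whose column values split the m APIs into many small groups has a low E(aspect) and therefore a high Gain(aspect) in formula (1).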

Based on the above calculations, the attribute column with the highest information gain can be determined from the attribute table. Its attribute is used as the current node in the decision tree, and its different column values (including null values) are connected to the current node as edges. To generate a child node for each edge of the decision tree, the subgraph needs to be divided: in this embodiment, the APIs API{<e1, r, e2>, ...} that share the same column value are grouped to form a sub-data set. This completes the construction of one layer of nodes and edges in the decision tree.
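The selection and splitting step can be sketched as follows. The sketch makes a simplifying assumption that is not stated in the text: every API appears exactly once in the table, so each API has probability 1/m and the entropy of a group of n APIs is log2(n). The function names and the sample table are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]
Table = Dict[str, Row]  # API ID -> {aspect: option or None}


def entropy(num_apis: int) -> float:
    """Entropy of num_apis equally likely APIs (assumption: p_i = 1 / num_apis)."""
    return math.log2(num_apis) if num_apis > 0 else 0.0


def split(table: Table, aspect: str) -> Dict[Optional[str], Table]:
    """Group APIs that share the same column value (None counts as a value)."""
    groups: Dict[Optional[str], Table] = defaultdict(dict)
    for api_id, row in table.items():
        groups[row.get(aspect)][api_id] = row
    return dict(groups)


def information_gain(table: Table, aspect: str) -> float:
    """Gain(aspect) = I(all APIs) - E(aspect), with E weighted by group size."""
    m = len(table)
    expected = sum(len(g) / m * entropy(len(g)) for g in split(table, aspect).values())
    return entropy(m) - expected


def best_aspect(table: Table, aspects: List[str]) -> str:
    """Pick the aspect column with the highest information gain."""
    return max(aspects, key=lambda a: information_gain(table, a))


# Hypothetical attribute table with four APIs and three aspect columns.
table = {
    "api-1": {"A": "x", "B": "p", "C": "same"},
    "api-2": {"A": "x", "B": "q", "C": "same"},
    "api-3": {"A": "y", "B": "r", "C": "same"},
    "api-4": {"A": "y", "B": None, "C": "same"},
}
chosen = best_aspect(table, ["A", "B", "C"])
sub_data_sets = split(table, chosen)
```

In this toy table, column B separates all four APIs at once, column A only halves them, and column C does not separate them at all, so B is the aspect that shortens the dialogue the most.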

3. Steps 1 and 2 are repeated recursively to build a complete decision tree that supports the query clarification process; the results are shown in FIG. 4-b through FIG. 4-c.

To build a complete decision tree, this embodiment recursively repeats steps 1 and 2 for each sub-data set to build child nodes until one of the following stopping criteria is met:

a. the sub-data set contains only one API;

b. the sub-data set contains multiple APIs with the same functionality.

When a stopping criterion is met, all APIs in the current sub-data set are taken as the current node, which is set as a leaf node of the decision tree. Note that when a new attribute table is constructed, the aspects already selected are deleted to avoid generating redundant clarification questions.
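The recursion with its two stopping criteria can be sketched as below. Two assumptions are made for illustration: each API appears once in the table (so the entropy of n APIs is log2(n)), and "the same functionality" is interpreted as all remaining aspect columns holding a single value across the sub-data set. The nested-dictionary tree layout and all names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Optional

Table = Dict[str, Dict[str, Optional[str]]]  # API ID -> {aspect: option}


def split(table: Table, aspect: str) -> Dict[Optional[str], Table]:
    """Group APIs by their value in the given aspect column."""
    groups: Dict[Optional[str], Table] = defaultdict(dict)
    for api_id, row in table.items():
        groups[row.get(aspect)][api_id] = row
    return dict(groups)


def gain(table: Table, aspect: str) -> float:
    """Information gain under the one-row-per-API assumption."""
    m = len(table)
    groups = split(table, aspect).values()
    return math.log2(m) - sum(len(g) / m * math.log2(len(g)) for g in groups)


def build_tree(table: Table, aspects: List[str]) -> dict:
    """Recursively build the decision tree; leaves hold the recommended APIs."""
    # Criterion b (as interpreted here): no remaining aspect tells the APIs apart.
    same_function = all(len(split(table, a)) == 1 for a in aspects)
    if len(table) == 1 or same_function:  # criterion a, or criterion b
        return {"apis": sorted(table)}
    best = max(aspects, key=lambda a: gain(table, a))
    remaining = [a for a in aspects if a != best]  # drop the selected aspect
    return {
        "aspect": best,
        "edges": {value: build_tree(sub, remaining) for value, sub in split(table, best).items()},
    }


table = {
    "a1": {"obj": "s", "mode": "x", "t": "p"},
    "a2": {"obj": "s", "mode": "y", "t": "p"},
    "a3": {"obj": "f", "mode": "x", "t": "p"},
    "a4": {"obj": "f", "mode": "y", "t": "q"},
}
tree = build_tree(table, ["obj", "mode", "t"])
```

Removing the chosen aspect before recursing is what prevents the same clarification question from being asked twice on one path.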

4. A query clarification process is generated from the decision tree, as shown in FIG. 4-c through FIG. 4-d.

In this embodiment, the current node of the decision tree and all of its edges are used to generate the clarification question and the diverse options, and a question template that corresponds to the aspect of the current node is used to generate the clarification question. After the user selects an option, the decision tree is pruned so that only the subtree of the selected branch remains. If the subtree has only one node, the answer is recommended directly; otherwise, a new round of clarification question and options is generated. During the dialogue, the user can manually stop the generation of new rounds at any time; in this case, all APIs contained in the subtree are returned as the recommended API sequence.
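The dialogue loop described above can be sketched as follows. The fourteen question templates are not reproduced in this text, so the sketch uses a single generic, hypothetical template. The user's replies are passed in as a list, and running out of replies before a leaf is reached is treated as the user stopping the dialogue.

```python
from typing import List, Optional, Tuple

Round = Tuple[str, List[Optional[str]], Optional[str]]  # (question, options, choice)


def collect_apis(node: dict) -> List[str]:
    """Gather every API contained in a subtree."""
    if "apis" in node:
        return list(node["apis"])
    apis: List[str] = []
    for child in node["edges"].values():
        apis.extend(collect_apis(child))
    return apis


def run_dialogue(tree: dict, answers: List[Optional[str]]) -> Tuple[List[str], List[Round]]:
    """Walk the tree with the user's choices and return (recommended APIs, transcript)."""
    node = tree
    transcript: List[Round] = []
    for choice in answers:
        if "apis" in node:  # only one node left: recommend directly
            break
        # Hypothetical generic template; the real templates depend on the node type.
        question = f"Which option best matches '{node['aspect']}'?"
        transcript.append((question, list(node["edges"]), choice))
        node = node["edges"][choice]  # prune: keep only the selected branch
    return collect_apis(node), transcript


tree = {
    "aspect": "obj",
    "edges": {
        "s": {"aspect": "mode", "edges": {"x": {"apis": ["a1"]}, "y": {"apis": ["a2"]}}},
        "f": {"apis": ["a3"]},
    },
}

full, rounds = run_dialogue(tree, ["s", "x"])   # two rounds, one API left
early, _ = run_dialogue(tree, ["s"])            # user stops after one round
```

Stopping early returns every API under the current subtree, which matches the behaviour described for a user who ends the dialogue before a single answer remains.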

Finally, this embodiment experimentally evaluates the performance of the API recommendation-oriented intent clarification method based on knowledge graph driving path optimization. The evaluation covers three aspects: dialogue efficiency, API recommendation and knowledge extension.

In terms of dialogue efficiency, the method generates semantically diverse question options (the average diversity between any two options is 74.9%) and effectively reduces dialogue rounds (a query requires no more than three rounds on average). The method also shows strong capability in API recommendation and knowledge extension.

For API recommendation, the method achieves a mean reciprocal rank (MRR) of 0.769 and a mean average precision (MAP) of 0.794, outperforming the two prior API search methods BIKER and CLEAR: the MRR is improved by at least 47%, and the MAP by 226.7%.

For knowledge extension, the method achieves an MRR of 0.815 and a MAP of 0.864; compared with ZaCQ, the most advanced dialogue-based code search method, the MRR is at least 42% higher and the MAP is 45.2% higher.
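For reference, MRR and MAP can be computed from ranked recommendation lists as sketched below. The text does not state which variant of average precision was used; this sketch normalizes by the number of relevant APIs, which is one common convention. The toy queries are invented for illustration and are unrelated to the reported results.

```python
from typing import List, Sequence, Set, Tuple

Query = Tuple[Sequence[str], Set[str]]  # (ranked recommendations, relevant APIs)


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@rank over the ranks at which relevant items appear."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_reciprocal_rank(queries: List[Query]) -> float:
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


def mean_average_precision(queries: List[Query]) -> float:
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Toy data only: two queries with hand-made rankings.
queries: List[Query] = [
    (["a", "b", "c"], {"b", "c"}),
    (["x", "y"], {"x"}),
]
mrr = mean_reciprocal_rank(queries)
mean_ap = mean_average_precision(queries)
```

MRR rewards placing the first correct API near the top, while MAP also accounts for how the remaining correct APIs are ranked, which is why both are reported.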

While the preferred embodiments of the present application have been illustrated and described, the present application is not limited to the embodiments, and various equivalent modifications and substitutions can be made by one skilled in the art without departing from the spirit of the present application, and these modifications and substitutions are intended to be included in the scope of the present application as defined in the appended claims.

Claims

1. An API-oriented recommendation intention clarification method based on knowledge graph driving path optimization is characterized by comprising the following steps:

2. The method for clarifying the intention of the path optimization based on the knowledge graph driving of the API recommendation according to claim 1, wherein in the step S1, the entity reflecting the action, the event, the object and the constraint of the API is extracted from the initial API document, the relation category among the entities is enriched, the relation category comprises the API function relation and the API semantic relation, the knowledge graph is taken as a carrier, and the extracted entity and the relation with rich types are stored, and the process is as follows:

3. The API recommendation-oriented intent clarification method based on knowledge graph driving path optimization according to claim 2, wherein grammar role labeling and semantic role labeling are performed on the API description sentences; labeling yields 6 classes of grammar roles: verb, direct object, direct object modifier, preposition, preposition object, and preposition object modifier; labeling yields 9 classes of semantic roles: location, direction, manner, extent, temporal, goal, purpose, result, and condition constraints.

4. The API recommendation-oriented intent clarification method based on knowledge graph driving path optimization of claim 1, wherein in step S1, the entities are of 6 types: API (application programming interface), Action, Event, Object, Object Constraint, and Event Constraint; the relation categories comprise event functional relations, constraint functional relations and semantic relations;

5. The method for clarifying the intention of the API-oriented recommendation based on the knowledge graph driving path optimization according to claim 1, wherein in the step S2, the searching sub-graph is closely related to the user query statement, the searching sub-graph content comprises an API entity and other entities and relations related to the API entity, and the whole efficient guiding mechanism process comprises:

6. The API recommendation-oriented intent clarification method based on knowledge graph driving path optimization of claim 1, wherein the decision tree algorithm is an information gain algorithm that preferentially selects the Aspect with the maximum information gain, and the specific calculation formula is:

$$\mathrm{Gain}(\mathit{aspect}) = I(\mathrm{API}_1, \ldots, \mathrm{API}_m) - E(\mathit{aspect}) \tag{1}$$

in the above formula, Gain(aspect) represents the information gain; I(API_1, ..., API_m) represents the information entropy of all APIs; E(aspect) represents the information entropy of the current Aspect; m represents the number of APIs; p_i represents the probability that the i-th API appears among all APIs; k represents the number of distinct column values in the attribute column; API_{1j}, ..., API_{mj} represent the m APIs associated with the j-th column value.

7. The API recommendation-oriented intent clarification method based on knowledge graph driving path optimization of claim 5, wherein the man-machine dialogue process comprises multiple rounds of man-machine interaction; each round comprises a clarification question generated by the system, the candidate options, and the option selected by the user; in the last round the system gives a list of recommended APIs; there are fourteen clarification question templates, corresponding to the 14 types of nodes in the decision tree, and all edges of each node serve as the candidate options.

8. The method for clarifying the intention of driving path optimization based on the knowledge graph according to claim 1, wherein in step S3, the expanding obtains other APIs having semantic relation with the initial API, mapping from a dialogue process to the knowledge graph is realized, an optimal clarifying path of each API is obtained, and the process of obtaining the clarifying path comprises:

9. An intention clarification system applying the intention clarification method as claimed in claims 1 to 8, characterized by comprising a diversified knowledge composition form construction module, a knowledge efficient guidance strategy module and a recommendation result expansion and interpretation module;