CN115543402B - Software knowledge graph increment updating method based on code submission - Google Patents

Info

Publication number
CN115543402B
Authority
CN
China
Prior art keywords
code
entity
relationship
commit
knowledge graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211453262.4A
Other languages
Chinese (zh)
Other versions
CN115543402A (en)
Inventor
邹艳珍
常文辉
王玥
谢冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202211453262.4A
Publication of CN115543402A
Application granted
Publication of CN115543402B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates
    • G06F8/658 Incremental updates; Differential updates
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a software knowledge graph incremental updating method based on code submission, comprising the following steps: 1) according to the existing knowledge graph of a target software project, establishing a code evolution knowledge graph meta-model for the project, the meta-model comprising a code submission (Commit) entity, code entities, and the relationships among them; the code entities comprise code files, variables, functions, structures and domain members; the relationships comprise the addition, deletion and modification relationships, the definition relationships of a code file to variables, functions and structures, the reference relationship, the calling relationship and the membership relationship; 2) parsing the data of each Commit of the target software project to obtain the code syntax trees of the two versions before and after the update, then traversing each syntax tree and extracting the set of code elements to be updated; 3) updating the current knowledge graph of the target software project according to the set of code elements to be updated and the code evolution knowledge graph meta-model to obtain the updated software knowledge graph.

Description

Software knowledge graph increment updating method based on code submission
Technical Field
The invention belongs to the technical field of computer software, and relates to a software knowledge graph increment updating method based on code submission (Commit).
Background
A software knowledge graph is an effective way to help developers master domain knowledge and understand the architecture of a software project. However, as project code continues to evolve, code understanding and knowledge graph construction face new challenges. Research shows that the maintenance tasks caused by the continuous evolution of software account for most of the cost of software development. In real development scenarios, code is modified and committed frequently and in large volume, so the timeliness and consistency of a statically constructed software knowledge graph are difficult to guarantee. Developers and testers also need to track the evolution of the project source code, such as the addition and deletion of data structures and changes to function call relationships, in order to safeguard the quality and maintainability of the code architecture. During development, code evolution information is generally managed by a version control system (such as Git) in units of code submissions (Commits), which makes it difficult for a developer to quickly determine the scope and content of a code change and thereby understand how the project architecture has changed.
Existing work on knowledge graph updating mainly studies knowledge representation learning methods for general-purpose knowledge graphs. These methods depend heavily on data; a domain-specific knowledge graph such as a software project knowledge graph lacks sufficient data, so many domain-specific concepts are difficult to learn effectively. As for representing code evolution information, existing work on software knowledge graphs mostly studies the code of one specific version, and comparatively little research addresses the extraction and use of code evolution knowledge. For multi-version code, a separate knowledge graph is usually just built for the code of each version; the isolation between versions greatly hinders user queries and makes code evolution information inconvenient to obtain.
To update an established software knowledge graph in a timely manner and to help developers obtain code evolution information and understand the software project architecture conveniently and quickly, an incremental updating method for the software knowledge graph is needed.
Disclosure of Invention
Aiming at the problems in the prior art, the invention aims to provide a method and a mechanism for history tracing of software code segments.
To overcome the low updating efficiency of existing software knowledge graphs and the lack of an explicit representation of code evolution knowledge, the invention provides a software knowledge graph incremental updating method based on code submission (Commit) and implements a tool plug-in based on the method. With the method and tool, the set of updated files can be extracted from the code submission data of the Git version control system of a software source code repository, the code evolution knowledge corresponding to the software knowledge graph can be obtained by analyzing the updated code files, and a corresponding knowledge graph update algorithm performs the incremental update of the software knowledge graph.
The technical scheme adopted by the invention for solving the technical problem is as follows:
A software knowledge graph incremental updating method based on code submission, comprising the following steps:
1) Establishing a code evolution knowledge graph meta-model for a target software project according to the existing knowledge graph of the project. The meta-model comprises a code submission entity (i.e., a Commit entity) and code entities, the code entities comprising code files, variables, functions, structures and domain members. The relationship types of the meta-model comprise: an addition relationship, a deletion relationship and a modification relationship between the Commit entity and a code entity; a definition relationship of a code file to a variable, to a function and to a structure; a reference relationship between code files; a calling relationship of a function entity to other functions or variables; and a membership relationship between a domain member and a structure. Each Commit of the target software project in the version control system is taken as one Commit entity;
2) Parsing the data of each Commit of the target software project, extracting the set of code files modified by the current Commit, and parsing the two versions of each code file, before and after the update, to obtain two corresponding syntax trees; then traversing each syntax tree and recording its information in corresponding sets according to node type and the parent-child relationships between nodes, obtaining the sets corresponding to the two versions of the code file; and performing difference analysis on these sets to extract the set of updated code elements;
3) Updating the current knowledge graph of the target software project according to the set of updated code elements and the code evolution knowledge graph meta-model to obtain the updated software knowledge graph:
(1) For an added code entity, creating an entity node of the corresponding type in the knowledge graph, setting the BEGIN_TIMESTAMP attribute of the node to the timestamp of the current Commit, attaching the entity to the knowledge graph by establishing its relationships with other entities, and setting the BEGIN_TIMESTAMP attribute of each established relationship; BEGIN_TIMESTAMP is the start-time attribute;
(2) For a deleted code entity, retaining its content in the knowledge graph and setting the END_TIMESTAMP attribute of the entity and of its related relationships to the timestamp of the current Commit;
(3) For a modified code entity, updating its modified attribute information in the knowledge graph; for each relationship newly added between the modified entity and other entities, creating the relationship and setting its BEGIN_TIMESTAMP attribute to the timestamp of the current Commit; and for each relationship between the modified entity and other entities that has been deleted, setting its END_TIMESTAMP attribute to the timestamp of the current Commit.
Further, in step 2), the method for extracting the set of updated code elements comprises: parsing the Commit data to obtain added-type, deleted-type and modified-type code elements; 21) for an added-type code element: performing local static analysis on the source code in the updated version where the element is located, and obtaining the entity attributes and association relationships that the element corresponds to in the knowledge graph; 22) for a deleted-type code element: matching the element to an entity in the knowledge graph according to the absolute path, within the target software project, of the code file to which it belongs and the element name; 23) for a modified-type code element: first matching the element to an entity in the knowledge graph, then performing static analysis on the source code where the element is located, and performing difference analysis between the analysis result and the current knowledge graph to obtain the entity attributes and association relationships to be updated in the knowledge graph for that element.
Further, in step 3), if dependencies exist between the entities to be updated in the current Commit data, for example when the added elements of the current Commit include a function A and a function B and function A calls function B, the knowledge graph is updated as follows:
31) Searching the knowledge graph for an entity matching function B; if the match fails, function B is judged to be an added entity. The virtual node set is then queried for a virtual node matching function B; if a matching virtual node b exists in the set, an association relationship is established between the node corresponding to function A and virtual node b, and the BEGIN_TIMESTAMP attribute of the association relationship is set; virtual node b is then removed from the virtual node set;
32) If no virtual node matching function B exists in the virtual node set, a virtual node b' is created for function B by combining the file name, element name and type of function B, and is added to the virtual node set; an association relationship is then established between the node corresponding to function A and virtual node b', and the BEGIN_TIMESTAMP attribute of the association relationship is set.
Further, when the current knowledge graph of the target software project is updated, for an added code entity: a corresponding entity is created in the knowledge graph, its attribute information is filled in, and the BEGIN_TIMESTAMP attribute of the created entity is set to the timestamp of the current Commit; relationships between the entity and other entities are established, and the BEGIN_TIMESTAMP attribute of each established relationship is set to the timestamp of the current Commit; and an addition relationship is established between the current Commit entity and the newly created entity.
Further, when the current knowledge graph of the target software project is updated, for a deleted code entity: the code entity to be deleted is matched in the knowledge graph; its incoming edges and the head entities associated with them, and its outgoing edges and the tail entities associated with them, are queried, and the tail entities are queried recursively; the END_TIMESTAMP attribute values of all queried entities and relationships are then set to the timestamp of the current Commit, and a deletion relationship is created between the current Commit entity and the related code entity.
Further, when the current knowledge graph of the target software project is updated, for a modified code entity: if the modification only changes entity attributes, the attribute values of the related entity are modified; if the modification adds a relationship between entities, the relationship is established between the corresponding head and tail entities and its BEGIN_TIMESTAMP attribute is set; if the modification involves deleting an entity, the END_TIMESTAMP attribute of the deleted entity is set to the Commit timestamp; and a modification relationship is created between the current Commit entity and the updated entity.
Further, the Eclipse CDT tool is used for analyzing the code files of the two versions before and after updating respectively to obtain two corresponding syntax trees.
A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the above method.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned method.
The invention comprises the following steps:
1. Design of the software code evolution knowledge graph meta-model
For a specific software project, the knowledge graph must define the types and attributes of the knowledge entities and relationships to be captured in the domain; that is, a knowledge graph meta-model is defined, which specifies and constrains the basic concepts and structural patterns of the knowledge graph. The design of the meta-model considers two aspects. The first is the dependencies in the syntactic structure of C code, such as a file's references to header files, the variables and functions defined in a file, a function's calls to variables and other functions, and so on. The second is the evolution information of the code: an update made by a Commit to a code file involves adding or deleting code entities and the relationships between them, so a relationship is established between the Commit and each code entity according to the type of update operation (addition, deletion, modification) the Commit performs on it, and the time/version information contained in the Commit is identified and recorded. The resulting code evolution knowledge graph meta-model contains 6 entity types. Each Commit of the version control system is one type of entity, the Commit entity, whose attributes are the information of the Commit: the id of the Commit, its commit date (i.e., timestamp), and the list of files it modifies. The code content actually modified by the Commit is represented by establishing relationships between the Commit entity and the related code entities. The remaining 5 entity types are code entities: code files, variables, functions, structures, and domain members (the fields of a structure). The model contains 9 relationship types. Between the Commit entity and a code entity there are, according to the update type, the addition relationship (Add), the deletion relationship (Delete) and the modification relationship (Modify). Between code entities there are the definition relationships (Define) of a code file to variables, to functions and to structures, the reference relationship (Include) between code files, the calling relationship (Invoke) of a function entity to other functions and variables, and the membership relationship (Member_of) between domain members and structures.
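The meta-model described above can be written down as a small schema. The following is a minimal sketch in Python, not the patent's implementation: the enum names, the `ALLOWED` triple set and the `is_allowed` helper are illustrative assumptions, and the direction of Member_of (domain member to structure) is inferred from the wording. Counting Define once per target type (variable, function, structure) appears to be how the text arrives at 9 relationship types.

```python
from enum import Enum


class EntityType(Enum):
    COMMIT = "Commit"
    FILE = "File"
    VARIABLE = "Variable"
    FUNCTION = "Function"
    STRUCT = "Struct"
    MEMBER = "Member"  # domain member, i.e. a field of a structure


class RelationType(Enum):
    ADD = "Add"
    DELETE = "Delete"
    MODIFY = "Modify"
    DEFINE = "Define"
    INCLUDE = "Include"
    INVOKE = "Invoke"
    MEMBER_OF = "Member_of"


# The five code entity types (everything except the Commit entity).
CODE_TYPES = frozenset(EntityType) - {EntityType.COMMIT}

# Allowed (head type, relation, tail type) triples. Assumed layout.
ALLOWED = {
    (EntityType.COMMIT, rel, tail)
    for rel in (RelationType.ADD, RelationType.DELETE, RelationType.MODIFY)
    for tail in CODE_TYPES
}
ALLOWED |= {
    (EntityType.FILE, RelationType.DEFINE, EntityType.VARIABLE),
    (EntityType.FILE, RelationType.DEFINE, EntityType.FUNCTION),
    (EntityType.FILE, RelationType.DEFINE, EntityType.STRUCT),
    (EntityType.FILE, RelationType.INCLUDE, EntityType.FILE),
    (EntityType.FUNCTION, RelationType.INVOKE, EntityType.FUNCTION),
    (EntityType.FUNCTION, RelationType.INVOKE, EntityType.VARIABLE),
    (EntityType.MEMBER, RelationType.MEMBER_OF, EntityType.STRUCT),
}


def is_allowed(head: EntityType, rel: RelationType, tail: EntityType) -> bool:
    """Return True if the meta-model permits this relationship."""
    return (head, rel, tail) in ALLOWED
```

A graph writer could call `is_allowed` before creating an edge so that relationships outside the meta-model are rejected early.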
A Commit update operation essentially corresponds to adding or deleting related entities and relationships, so the model adds two attributes to every type of code entity and to every relationship between code entities: BEGIN_TIMESTAMP and END_TIMESTAMP (whose initial value is -1). The commit time of the corresponding Commit is used as the timestamp; the two attributes record when the entity or relationship was created and deleted, and together they represent its life cycle. Changes during code evolution correspond to adding or deleting entities or relationships in the graph and to modifying entity attributes, and these two attributes describe the evolution of graph elements. For example, when a function entity is deleted, the entity and its directly associated Define and Invoke relationships are retained, and only their END_TIMESTAMP attribute values are set to the timestamp of the current Commit to indicate that the graph elements have been deleted. If the END_TIMESTAMP attribute value of a code entity or relationship is not the default value, the entity or relationship belongs to a historical version, and the time span over which it existed and the update record of the corresponding Commit can be determined from the two attribute values.
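The life-cycle semantics of the two attributes can be illustrated with a short sketch. The `Lifecycle` class and its method names are hypothetical; the default of -1 comes from the text, while treating the interval as half-open (the element no longer exists at the instant of its END_TIMESTAMP) is an assumption made here for illustration.

```python
from dataclasses import dataclass

NOT_ENDED = -1  # default END_TIMESTAMP value stated in the text


@dataclass
class Lifecycle:
    begin_timestamp: int
    end_timestamp: int = NOT_ENDED

    def is_current(self) -> bool:
        """True if the element has not been deleted by any Commit."""
        return self.end_timestamp == NOT_ENDED

    def alive_at(self, t: int) -> bool:
        """True if the element existed at time t.

        Assumption: the life cycle is the half-open interval
        [BEGIN_TIMESTAMP, END_TIMESTAMP).
        """
        if t < self.begin_timestamp:
            return False
        return self.is_current() or t < self.end_timestamp
```

With this predicate, the graph as of any past Commit can be recovered by keeping only the entities and relationships for which `alive_at` holds at that Commit's timestamp.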
Based on the definition of the code evolution knowledge graph meta-model, the invention specifies how the update operations of code evolution are represented in the graph structure. Specifically:
(1) Addition of a code element: a corresponding entity is created in the graph, its attribute information is filled in, and its relationships with other entities are established. For all entities and relationships newly created in this process, the BEGIN_TIMESTAMP attribute is set to the timestamp of the current Commit, and an Add relationship is established between the current Commit and each newly created entity.
(2) Deletion of a code element: the code entity to be deleted is matched; its incoming edges and the head entities associated with them, and its outgoing edges and the tail entities associated with them, are then queried, and the tail entities are queried recursively in the same way. The END_TIMESTAMP attribute values of all queried entities and relationships are set to the timestamp of the current Commit, and a Delete relationship is created between the Commit entity and the related code entity.
(3) Modification of a code element: some modifications involve only the attributes of an entity, such as a change to a function's modifiers, and do not change any relationship between entities; for such updates only the attribute values of the related entity are modified. Where a modification changes relationships between entities, each newly added relationship is established between the corresponding head and tail entities and its BEGIN_TIMESTAMP attribute is set, while for each deleted relationship only its END_TIMESTAMP attribute needs to be set to the Commit timestamp. Finally, a Modify relationship is created between the Commit entity and the modified entity.
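The three operations above can be sketched on a small in-memory graph. This is an illustrative sketch only: `EvolutionGraph`, `Node`, `Edge` and all method names are hypothetical, and entities are keyed by plain strings. The text does not say which outgoing edge types the recursive deletion follows; this sketch assumes only Define edges propagate deletion, since following Invoke edges would also end entities that are merely called.

```python
from dataclasses import dataclass, field

NOT_ENDED = -1
COMMIT_RELS = ("Add", "Delete", "Modify")


@dataclass
class Node:
    kind: str
    begin: int
    end: int = NOT_ENDED
    attrs: dict = field(default_factory=dict)


@dataclass
class Edge:
    head: str
    rel: str
    tail: str
    begin: int
    end: int = NOT_ENDED


class EvolutionGraph:
    def __init__(self):
        self.nodes = {}  # entity key -> Node
        self.edges = []  # all relationships, never physically removed

    def _link(self, head, rel, tail, ts):
        self.edges.append(Edge(head, rel, tail, ts))

    def add_entity(self, commit, ts, key, kind, attrs=None, relations=()):
        """(1) Create the entity and its relationships, stamped with ts."""
        self.nodes[key] = Node(kind, ts, attrs=dict(attrs or {}))
        for head, rel, tail in relations:
            self._link(head, rel, tail, ts)
        self._link(commit, "Add", key, ts)

    def delete_entity(self, commit, ts, key):
        """(2) Keep the content, but close the entity and its edges."""
        stack = [key]
        while stack:
            current = stack.pop()
            node = self.nodes.get(current)
            if node is None or node.end != NOT_ENDED:
                continue
            node.end = ts
            for e in self.edges:
                if e.end != NOT_ENDED or e.rel in COMMIT_RELS:
                    continue
                if current in (e.head, e.tail):
                    e.end = ts
                    # Assumption: only containment (Define) propagates.
                    if e.head == current and e.rel == "Define":
                        stack.append(e.tail)
        self._link(commit, "Delete", key, ts)

    def modify_entity(self, commit, ts, key, new_attrs=None,
                      added=(), removed=()):
        """(3) Update attributes; open new edges and close removed ones."""
        self.nodes[key].attrs.update(new_attrs or {})
        for head, rel, tail in added:
            self._link(head, rel, tail, ts)
        for triple in removed:
            for e in self.edges:
                if (e.head, e.rel, e.tail) == triple and e.end == NOT_ENDED:
                    e.end = ts
        self._link(commit, "Modify", key, ts)
```

Because nothing is ever removed from `nodes` or `edges`, the history of every element remains queryable through its begin and end values.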
2. Code evolution knowledge extraction based on code submission
The Commit data is parsed with the JGit tool to extract the set of code files modified by the current Commit and to obtain the content of each code file before and after the update. The Eclipse CDT tool then parses the two versions of each code file to obtain two corresponding abstract syntax trees. Each syntax tree is traversed, and its information is recorded in corresponding sets according to node type and the parent-child relationships between nodes: the header files, variables, structures, domain members and functions contained in the code file, and the calls made by functions to other functions and variables. Difference analysis is performed on the sets obtained for the two versions of the code file, and the set of updated code elements is finally extracted.
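The difference analysis on the two recorded sets can be sketched as follows. The sketch does not use JGit or Eclipse CDT; it assumes that each version of a file has already been reduced to a mapping from an element key `(kind, name)` to a fingerprint of that element (here, the set of names a function calls). The function name, the key layout and the sample element names are all hypothetical.

```python
def diff_element_sets(before, after):
    """Classify elements as added, deleted or modified.

    before/after map (kind, name) -> fingerprint, where the fingerprint
    is any comparable summary of the element (assumed representation).
    """
    added = after.keys() - before.keys()
    deleted = before.keys() - after.keys()
    modified = {
        key for key in before.keys() & after.keys()
        if before[key] != after[key]
    }
    return set(added), set(deleted), modified


# Hypothetical contents of one file before and after a Commit.
before = {
    ("include", "types.h"): None,
    ("function", "set_bits"): frozenset({"WORD_BITS", "WORD_MASK"}),
    ("function", "old_helper"): frozenset(),
}
after = {
    ("include", "types.h"): None,
    ("include", "util.h"): None,
    ("function", "set_bits"): frozenset({"last_word_mask"}),
    ("function", "low_bit"): frozenset(),
}

added, deleted, modified = diff_element_sets(before, after)
```

Here `added` holds the new header reference and the new function, `deleted` holds `old_helper`, and `modified` holds `set_bits`, whose call set changed.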
Parsing the Commit data locates the updated code elements. To obtain the specific content that each updated element changes in the graph, a processing rule is designed for each update type:
(1) Added-type code element: the Eclipse CDT tool performs local static analysis on the updated version of the source code (only the source file containing the updated element is analyzed) to obtain the entity attributes, association relationships and other information that the element corresponds to in the target graph;
(2) Deleted-type code element: the deleted code element is matched to an entity in the existing graph according to the absolute path, within the project, of the code file to which the element belongs and the element name;
(3) Modified-type code element: the element is first matched to an entity in the graph as in (2); the updated code file is then analyzed with the local static analysis of (1), and difference analysis (attribute differences and associated-entity differences) is performed between the analysis result and the existing graph content to obtain the attributes and relationships to be updated for the graph entity corresponding to the modified code element.
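Rules (2) and (3) can be sketched as a lookup followed by a two-part difference. In this sketch the graph is assumed to be indexed by `(absolute file path, element name)`, and both the stored entity and the static analysis result are assumed to be dictionaries with an `attrs` mapping and a `relations` set; these layouts, the function names and the sample data are hypothetical.

```python
def match_entity(graph_index, file_path, element_name):
    """Rule (2): match an element to a graph entity by path and name."""
    return graph_index.get((file_path, element_name))


def diff_with_graph(analysed, stored):
    """Rule (3): attribute differences and associated-entity differences."""
    changed_attrs = {
        name: value
        for name, value in analysed["attrs"].items()
        if stored["attrs"].get(name) != value
    }
    new_relations = analysed["relations"] - stored["relations"]
    removed_relations = stored["relations"] - analysed["relations"]
    return changed_attrs, new_relations, removed_relations


# Hypothetical graph content and analysis result for one function.
graph_index = {
    ("/proj/src/bits.c", "set_bits"): {
        "attrs": {"return_type": "void", "modifier": ""},
        "relations": {("Invoke", "WORD_BITS"), ("Invoke", "WORD_MASK")},
    }
}
analysed = {
    "attrs": {"return_type": "void", "modifier": "static"},
    "relations": {("Invoke", "last_word_mask")},
}

stored = match_entity(graph_index, "/proj/src/bits.c", "set_bits")
changed, new_rel, gone_rel = diff_with_graph(analysed, stored)
```

The three results correspond to the attribute values to overwrite, the relationships to open with a BEGIN_TIMESTAMP, and the relationships to close with an END_TIMESTAMP.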
3. Knowledge graph updating algorithm based on virtual nodes
Once the code evolution knowledge has been obtained, the existing knowledge graph is updated. The following operations are performed according to the update type of each entity:
(1) For an added-type code entity, an entity node of the corresponding type is created in the graph and its BEGIN_TIMESTAMP attribute is set to the timestamp of the Commit; the entity is attached to the graph by establishing its relationships with other entities, and the BEGIN_TIMESTAMP attribute of each relationship is set;
(2) For a deleted-type entity, its content in the graph is retained, and only the END_TIMESTAMP attribute of the entity and of its related relationships is set to the timestamp of the current Commit to indicate that the entity and relationships have been deleted;
(3) For a modified-type entity, the attributes of the related entity are modified and relationships between entities are added or deleted, with the code evolution information represented by setting the BEGIN_TIMESTAMP and END_TIMESTAMP attribute values.
During graph updating, dependencies may exist between the entities to be updated. For example, in one Commit update, function A gains a call relationship to function B, and function B is itself an entity added in the same update. Function B would then have to be updated before function A; otherwise, the search for the function B entity fails to match because the entity has not yet been created. To handle this situation, the invention proposes creating virtual nodes: when entity matching fails, a node is created temporarily so that the relationship between entities can be established, and its attribute information is filled in later in the update process. The specific procedure is as follows:
(1) For an entity whose matching fails, a corresponding virtual node is created using the hash Hash(fileName, elementName, elementType), which combines the file name, element name and element type of the code element, and the node is added to the virtual node set for maintenance;
(2) When querying for a matching entity, if no ordinary entity node matches, the virtual node set is queried. If a matching virtual node exists, the association between the entity nodes is established directly and the BEGIN_TIMESTAMP attribute of the relationship is set to record when the relationship was created. If no matching node exists in the virtual node set, a corresponding virtual node is created, its hash value is computed, and the node is added to the virtual node set; the relationship between the nodes is created at the same time and the corresponding timestamp attribute is set;
(3) When a code entity node is to be created, if the hash index corresponding to the entity already exists in the virtual node set, the attribute information of the virtual node is completed from the static analysis result and the node is removed from the set.
Taking functions A and B above as an example: while function entity A is being updated, the function entity B that A calls must be looked up, but B has not yet been created. A virtual node corresponding to B, denoted vB here, is therefore generated and added to the virtual node set, and the call relationship from function entity A to vB is established. Later, when function entity B is updated, the virtual node set is accessed first at entity-creation time and the corresponding virtual node vB is matched by the computed hash value. No new entity node is created; the attribute information of the virtual node is simply completed and the node is removed from the virtual node set, yielding the updated function entity B.
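The virtual-node procedure and the A/B example can be sketched as follows. This is a minimal illustration under stated assumptions: the text only specifies a hash over file name, element name and element type, so the use of SHA-256 and the separator byte are choices made here; `VirtualNodeUpdater` and its methods are hypothetical names, and nodes are plain dictionaries.

```python
import hashlib


def node_hash(file_name, element_name, element_type):
    """Hash(fileName, elementName, elementType); SHA-256 is an assumption."""
    raw = "\x00".join((file_name, element_name, element_type))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


class VirtualNodeUpdater:
    def __init__(self):
        self.entities = {}   # hash -> attributes of ordinary entity nodes
        self.virtual = {}    # hash -> placeholder (the virtual node set)
        self.relations = []  # (head hash, relation, tail hash, begin ts)

    def resolve(self, file_name, name, kind):
        """Return the node for an element, creating a virtual node if
        neither an ordinary entity nor a virtual node matches."""
        h = node_hash(file_name, name, kind)
        if h not in self.entities and h not in self.virtual:
            self.virtual[h] = {"file": file_name, "name": name, "type": kind}
        return h

    def link(self, head_hash, rel, target, ts):
        """Create a relationship whose tail may still be virtual."""
        tail_hash = self.resolve(*target)
        self.relations.append((head_hash, rel, tail_hash, ts))

    def create_entity(self, file_name, name, kind, attrs, ts):
        """Create an entity, or complete a matching virtual node and
        remove it from the virtual node set."""
        h = node_hash(file_name, name, kind)
        node = self.virtual.pop(h, None)
        promoted = node is not None
        if node is None:
            node = {"file": file_name, "name": name, "type": kind}
        node.update(attrs)
        node["BEGIN_TIMESTAMP"] = ts
        self.entities[h] = node
        return h, promoted


# Function A is processed before function B, which it calls.
updater = VirtualNodeUpdater()
a, _ = updater.create_entity("m.c", "A", "function", {}, 10)
updater.link(a, "Invoke", ("m.c", "B", "function"), 10)
pending = len(updater.virtual)  # vB is waiting in the virtual node set
b, promoted = updater.create_entity("m.c", "B", "function", {"ret": "int"}, 10)
```

Because the relationship stores the hash rather than a node reference, the Invoke edge created for vB points at the completed entity B without being rewritten.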
The invention has the beneficial effects that:
By supplying Commit data as the tool's input, a user can incrementally update an existing software knowledge graph while recording and representing in the graph the evolution of the software code over the range covered by that data. This greatly reduces the redundant overhead of rebuilding the whole graph, ensures the timeliness and consistency of the graph content, and gives the user a more intuitive way to obtain the scope and content of code changes.
Drawings
FIG. 1 is a flow chart illustrating the implementation of the present invention.
FIG. 2 is the software code evolution knowledge graph meta-model.
Fig. 3 is an overall implementation framework of the present invention.
FIG. 4 is an updated exemplary subgraph of the present invention.
Detailed Description
The invention will be described in further detail with reference to the following drawings, which are given by way of example only for the purpose of illustrating the invention and are not intended to limit the scope of the invention.
FIG. 1 illustrates the execution flow of the tool of the invention: first, the software code evolution knowledge graph meta-model is designed to represent code evolution knowledge explicitly; then, based on code submissions, code evolution knowledge is extracted through update element extraction and update entity mapping; finally, the software code evolution knowledge graph is incrementally updated by the graph update algorithm. FIG. 2 shows the design of the software code evolution knowledge graph meta-model, which defines the entities and relationships in the knowledge graph and their attributes. FIG. 3 illustrates the overall implementation framework of the tool: the update element extraction module obtains the code elements updated by a Commit, the update entity mapping module obtains the specific content to be updated in the knowledge graph for those code elements, and the knowledge graph update module incrementally updates the software knowledge graph by executing database transactions.
To verify the effectiveness of the tool, namely the efficiency and the correctness of incremental knowledge graph updating, four C-language software projects were selected for experimental verification, including highly starred open-source projects on GitHub and enterprise-internal software projects. The content of a code file modified by one Commit is as follows:
[Figure: content of the code file modified by the Commit; image not available]
the content of this update includes: adding a reference of a head file los _ toolchain.h; a new function LOS _ LowBitGet is defined; the LOS _ BITMAP SetNBits are modified, the calls to the variables BITMP _ BITS _ PER _ WORD and OS _ BITMP _ WORD _ MASK are deleted, and the call to the function BITMP _ LAST _ WORD _ MASK is added.
In the updating element extraction stage, the operation is identified as the modification operation of the code file los _ bitmap.c, and is recorded as the addition of a header file; adding a function and modifying the function; in the entity mapping updating stage, the recorded updating code elements are mapped to the map entities and the relations, and the specific content to be updated corresponding to the knowledge map in the updating operation is obtained; and then, by executing the database transaction, the entities to be updated and the relationships are represented in the graph, so as to obtain a structural representation as shown in fig. 4 (for visual representation, only a partial sub-graph of the Commit update is shown).
In fig. 4, the Commit entity establishes a modify relationship with the file entity los_bitmap.c. The specific update content is as follows. An include relationship to the header file los_toolchain.h is added, and the BEGIN_TIMESTAMP attribute of this include relationship is set to the submission date of the Commit, indicating when the relationship was created. The Commit entity also establishes an add relationship with the function entity LOS_LowBitGet to represent the new definition; the BEGIN_TIMESTAMP attribute of the function entity, of its two outgoing invoke edges, and of its incoming define edge is set to the Commit submission date. The Commit's modification of the function entity LOS_BitmapSetNBits is likewise represented by a modify relationship: because the function deletes its calls to two variable entities in this update, the END_TIMESTAMP attribute of those two invoke relationships is set to the Commit submission date, indicating when the relationships were deleted; similarly, the BEGIN_TIMESTAMP attribute of the newly added invoke relationship to the function entity BITMAP_LAST_WORD_MASK is set to the Commit submission date, indicating when the relationship was created.
Although specific embodiments of the invention have been disclosed for purposes of illustration, and to facilitate an understanding of the context of the invention and its implementation, those skilled in the art will appreciate that: various substitutions, changes and modifications are possible without departing from the spirit and scope of the present invention and the appended claims. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed as the best mode contemplated for carrying out this invention, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (9)

1. A software knowledge graph increment updating method based on code submission comprises the following steps:
1) Establishing a code evolution knowledge graph meta-model for a target software project according to the existing knowledge graph of the target software project; the code evolution knowledge graph meta-model comprises a code submission entity and a code entity, wherein the code submission entity is called the Commit entity; the code entity includes: code files, variables, functions, structures, and domain members; the relationship types of the code evolution knowledge graph meta-model comprise an add relationship between the Commit entity and the code entity, a delete relationship between the Commit entity and the code entity, a modify relationship between the Commit entity and the code entity, a define relationship from the code file to the variable, a define relationship from the code file to the function, a define relationship from the code file to the structure, a reference relationship between code files, an invoke relationship from the function entity to other functions or variables, and a belonging relationship between the domain member and the structure; each Commit of the target software project in the version control system is taken as one Commit entity;
2) Parsing the data of each Commit of the target software project, extracting the set of code files modified by the current Commit, and parsing the two versions of each code file, before and after the update, to obtain two corresponding syntax trees; then traversing each syntax tree, recording the information of each syntax tree in corresponding sets according to the node types and the parent-child relationships between the nodes, obtaining the sets corresponding to the two versions of the code file before and after the update, performing difference analysis on the sets, and extracting an updated code element set;
3) Updating the current knowledge graph of the target software project according to the updated code element set and the code evolution knowledge graph meta-model to obtain an updated software knowledge graph:
(1) For a newly added code entity, creating an entity node of the corresponding type in the knowledge graph, setting the BEGIN_TIMESTAMP attribute value of the entity node to the timestamp of the current Commit, mounting the code entity into the knowledge graph, establishing relationships with other entities, and setting the BEGIN_TIMESTAMP attribute value of the established relationships; BEGIN_TIMESTAMP is the start time attribute;
(2) For a deleted code entity, retaining the relevant content of the code entity in the knowledge graph, and setting the END_TIMESTAMP attribute of the code entity and of its related relationships to the timestamp of the current Commit;
(3) For a modified code entity, updating the modified attribute information of the code entity in the knowledge graph; for each relationship newly added between the modified code entity and other entities, executing the relationship creation operation and setting the BEGIN_TIMESTAMP attribute of the relationship to the timestamp of the current Commit; and for each relationship between the modified code entity and other entities that has been deleted, setting the END_TIMESTAMP attribute of the relationship to the timestamp of the current Commit.
2. The method of claim 1, wherein in step 2), the method of extracting the updated code element set comprises: parsing the Commit data to obtain added-type code elements, deleted-type code elements, and modified-type code elements; 21) for an added-type code element: performing static analysis on the local source code in which the added-type code element is located in the updated version, to obtain the entity attributes and association relationships of the added-type code element in the knowledge graph; 22) for a deleted-type code element: matching the deleted-type code element to an entity in the knowledge graph according to the absolute path, within the target software project, of the code file to which the deleted-type code element belongs and the element name; 23) for a modified-type code element: first matching the modified code element to an entity in the knowledge graph, then performing static analysis on the source code in which the code element is located, and performing difference analysis between the analysis result and the current knowledge graph to obtain the entity attributes and association relationships to be updated in the knowledge graph for the code element.
3. The method according to claim 1 or 2, wherein in step 3), if a dependency relationship exists between entities to be updated in the current Commit data, that is, when the elements newly added by the current Commit include a function A and a function B and function A calls function B, the knowledge graph is updated as follows:
31) Searching the knowledge graph for an entity matching function B, and if the matching fails, determining that function B is a newly added entity; then querying the virtual node set for a virtual node matching function B; if a virtual node b matching function B exists in the virtual node set, establishing an association relationship between the node corresponding to function A and the virtual node b, and setting the BEGIN_TIMESTAMP attribute value of the association relationship; then removing the virtual node b from the virtual node set;
32) If no virtual node matching function B exists in the virtual node set, creating a virtual node b' for function B by combining the file name, the name, and the type of function B, and adding the virtual node b' to the virtual node set; then establishing an association relationship between the node corresponding to function A and the virtual node b', and setting the BEGIN_TIMESTAMP attribute value of the association relationship.
4. The method of claim 1, wherein, when updating the current knowledge graph of the target software project, for a newly added code entity: creating a corresponding entity in the knowledge graph, supplementing its attribute information, and setting the BEGIN_TIMESTAMP attribute value of the created entity to the timestamp of the current Commit; establishing relationships between the entity and other entities, and setting the BEGIN_TIMESTAMP attribute value of the established relationships to the timestamp of the current Commit; and establishing an add relationship between the current Commit entity and the newly created entity.
5. The method of claim 1, wherein, when updating the current knowledge graph of the target software project, for a deleted code entity: matching the code entity to be deleted in the knowledge graph, querying the incoming edges of the code entity and the head entities associated with the incoming edges, and the outgoing edges of the code entity and the tail entities associated with the outgoing edges, and recursively querying the tail entities; then setting the END_TIMESTAMP attribute values of all queried entities and relationships to the timestamp of the current Commit, and creating a delete relationship between the current Commit entity and the related code entity.
6. The method of claim 1, wherein, when updating the current knowledge graph of the target software project, for a modified code entity: if the modification operation only involves modification of entity attributes, modifying the attribute values of the related entity; if the modification operation involves a new relationship between entities, establishing the relationship between the corresponding head and tail entities and setting the relationship attribute BEGIN_TIMESTAMP; if the modification operation involves an entity deletion, setting the END_TIMESTAMP attribute of the deleted entity to the timestamp of the Commit; and creating a modify relationship between the current Commit entity and the updated entity.
7. The method of claim 1, wherein the Eclipse CDT tool is utilized to parse the code files of the two versions before and after the update respectively to obtain two corresponding syntax trees.
8. A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 7.
9. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
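Steps 22) and 23) of claim 2 can be pictured with a short Python sketch, offered only as an illustration under assumptions. The function `plan_modification`, the dictionary layout, the entity identifier `fn:42`, and the path `/project/los_bitmap.c` are all hypothetical. The element is matched to a graph entity by the absolute path of its code file plus the element name, and the relationships found by static analysis are then compared with the relationships currently valid in the graph.

```python
def plan_modification(graph_index, live_relations, path, name, analysed):
    """Match an element to its graph entity and plan the relationship updates.

    graph_index    maps (absolute file path, element name) to an entity id
    live_relations maps an entity id to its currently valid (type, target) pairs
    analysed       is the set of (type, target) pairs found by static analysis
    """
    entity_id = graph_index.get((path, name))
    if entity_id is None:
        return None  # no match: the caller would treat the element as newly added
    current = live_relations.get(entity_id, set())
    return {
        "entity": entity_id,
        "begin": analysed - current,  # relationships to create with BEGIN_TIMESTAMP
        "end": current - analysed,    # relationships to close with END_TIMESTAMP
    }


# Hypothetical graph state for the function modified in the example Commit.
graph_index = {("/project/los_bitmap.c", "LOS_BitmapSetNBits"): "fn:42"}
live_relations = {"fn:42": {("invoke", "BITMP_BITS_PER_WORD"),
                            ("invoke", "OS_BITMP_WORD_MASK")}}
analysed = {("invoke", "BITMAP_LAST_WORD_MASK")}

plan = plan_modification(graph_index, live_relations,
                         "/project/los_bitmap.c", "LOS_BitmapSetNBits", analysed)
print(plan)
```

Comparing against the graph rather than against the previous source version means that only the matched entity's current relationships have to be read, which is consistent with the incremental character of the claimed method.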
CN202211453262.4A 2022-11-21 2022-11-21 Software knowledge graph increment updating method based on code submission Active CN115543402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211453262.4A CN115543402B (en) 2022-11-21 2022-11-21 Software knowledge graph increment updating method based on code submission

Publications (2)

Publication Number Publication Date
CN115543402A CN115543402A (en) 2022-12-30
CN115543402B CN115543402B (en) 2023-03-24

Family

ID=84721188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211453262.4A Active CN115543402B (en) 2022-11-21 2022-11-21 Software knowledge graph increment updating method based on code submission

Country Status (1)

Country Link
CN (1) CN115543402B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant