CN114491536A - Code analysis method and device based on knowledge graph - Google Patents

Code analysis method and device based on knowledge graph

Info

Publication number
CN114491536A
Authority
CN
China
Prior art keywords
class
code
calling
function
relation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210103947.XA
Other languages
Chinese (zh)
Inventor
吕博良
旷亚和
程佩哲
张�诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202210103947.XA
Publication of CN114491536A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55Detecting local intrusion or implementing counter-measures
    • G06F21/56Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562Static detection
    • G06F21/563Static detection by source code analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Stored Programmes (AREA)

Abstract

The disclosure provides a code analysis method based on a knowledge graph, which can be applied to the technical field of artificial intelligence. The method comprises the following steps: determining class information to be analyzed according to full class code information and user configuration information; determining the inheritance and implementation relationship and the function single-level calling relationship of each class according to the class information to be analyzed; supplementing the function single-level calling relationship according to the inheritance and implementation relationship of the class to generate a single-level function calling node; generating a code class calling knowledge graph according to the single-level function calling node; and analyzing the code according to the code class calling knowledge graph. The present disclosure also provides a code analysis apparatus, a device, a storage medium, and a program product based on the knowledge graph.

Description

Code analysis method and device based on knowledge graph
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to the field of static code analysis technologies, and more particularly, to a code analysis method, apparatus, device, medium, and program product based on a knowledge graph.
Background
Static code analysis refers to analyzing code semantics and behavior without actually executing the program, in order to find abnormal program semantics or undefined behavior caused by incorrect coding. Static analysis does not need to wait until all code has been written, and it does not require setting up a runtime environment or writing test cases, so it can find various problems in the code at an early stage of the software development process.
However, modern software systems keep growing: code bases have increased from tens or hundreds of thousands of lines to tens of millions of lines, and system complexity keeps rising. In particular, with object-oriented languages, features such as polymorphism and abstract classes make function calls more complex. How to analyze the function call flow quickly and accurately is a core problem of code-level vulnerability mining and dangerous function call chain mining.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
In view of the foregoing, the present disclosure provides methods, apparatus, devices, media and program products for knowledge-graph based code analysis.
According to a first aspect of the present disclosure, there is provided a method of knowledge-graph based code analysis, comprising: determining class information to be analyzed according to full class code information and user configuration information;
determining the inheritance and implementation relationship and the function single-level calling relationship of each class according to the class information to be analyzed;
supplementing the function single-level calling relationship according to the inheritance and implementation relationship of the class to generate a single-level function calling node;
generating a code class calling knowledge graph according to the single-level function calling node; and
analyzing the code according to the code class calling knowledge graph.
According to an embodiment of the present disclosure, the inheritance and implementation relationship of the class includes the inheritance relationship of the class, the relationship between an interface class and its implementation classes, and the relationship between an abstract class and its implementation classes, and supplementing the function single-level calling relationship according to the inheritance and implementation relationship of the class to generate a single-level function calling node includes:
determining a class function calling relationship according to the inheritance relationship of the class, the relationship between the interface class and the implementation class, and the relationship between the abstract class and the implementation class;
determining a supplemented function single-level calling relationship according to the class function calling relationship and the function single-level calling relationship; and
determining a single-level function calling node according to the function name and the supplemented function single-level calling relationship corresponding to the function name, and storing the single-level function calling node in a graph database.
According to an embodiment of the present disclosure, generating a code class calling knowledge graph according to the single-level function calling node includes:
acquiring the single-level function calling node from the graph database;
determining an entity relationship table, an entity attribute table and a schema table according to the single-level function calling node; and
generating the code class calling knowledge graph according to the entity relationship table, the entity attribute table and the schema table.
According to an embodiment of the present disclosure, determining the entity relationship table and the schema table according to the single-level function calling node includes:
generating, from the code class function and class calling relationship fields, a plurality of entity-relationship-entity triples of the form code class, calling relationship, code class, wherein the entity is the code class and the relationship is the calling relationship; and
constructing a schema table, wherein the schema table comprises code class names, attribute types and association relations.
According to an embodiment of the present disclosure, analyzing the code according to the code class calling knowledge graph includes:
searching the code class call chain relationships through a depth-first traversal algorithm to generate all paths from any starting point; and
determining a program call chain of a target function according to the target function name and the paths.
According to an embodiment of the present disclosure, analyzing the code according to the code class calling knowledge graph further includes:
determining the in-degree and out-degree of the function according to the target function name and the knowledge graph; and
determining the popularity of the function according to the in-degree and out-degree.
According to an embodiment of the present disclosure, analyzing the code according to the code class calling knowledge graph further includes:
calculating the call chain similarity of a plurality of pieces of software.
According to an embodiment of the present disclosure, calculating the call chain similarity of the plurality of pieces of software includes:
acquiring the code class calling knowledge graphs of at least two pieces of software;
generating at least two vector matrices according to the point-edge relationships of the connected graphs of the code class calling knowledge graphs;
calculating the similarity of the vector matrices by using a deep learning algorithm; and
determining the similarity of the two pieces of software according to the similarity of the vector matrices.
A second aspect of the present disclosure provides a knowledge-graph-based code analysis apparatus including: a first determining module, used for determining class information to be analyzed according to full class code information and user configuration information;
a second determining module, used for determining the inheritance and implementation relationship and the function single-level calling relationship of each class according to the class information to be analyzed;
a first generation module, used for supplementing the function single-level calling relationship according to the inheritance and implementation relationship of the class so as to generate a single-level function calling node;
a second generation module, used for generating a code class calling knowledge graph according to the single-level function calling node; and
an analysis module, used for analyzing the code according to the code class calling knowledge graph.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described method of knowledge-graph based code analysis.
The fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described method of knowledge-graph based code analysis.
The fifth aspect of the present disclosure also provides a computer program product comprising a computer program that, when executed by a processor, implements the above-described method of knowledge-graph based code analysis.
According to the code analysis method based on the knowledge graph provided by the present disclosure, a knowledge graph is constructed by analyzing the function single-level calling relationships in the class code, and the function call flow can be analyzed quickly and accurately by searching graph paths in the knowledge graph, thereby assisting static code analysis and the search for dangerous call chains.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of a method, apparatus, device, medium, and program product for knowledge-graph based code analysis in accordance with an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a method of knowledge-graph based code analysis in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow chart of a method of determining a single-level function call node according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a single-level function call node diagram according to an embodiment of the disclosure;
FIG. 5 schematically illustrates a flow diagram of a method of building a class function knowledge-graph according to an embodiment of the present disclosure;
FIG. 6 schematically shows a flow diagram of a code analysis method according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a block diagram of a knowledge-graph based code analysis apparatus according to an embodiment of the present disclosure; and
FIG. 8 schematically illustrates a block diagram of an electronic device suitable for implementing a knowledge-graph based code analysis method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
The embodiment of the disclosure provides a code analysis method based on a knowledge graph, which comprises the following steps: determining class information to be analyzed according to full class code information and user configuration information; determining the inheritance and implementation relationship and the function single-level calling relationship of each class according to the class information to be analyzed; supplementing the function single-level calling relationship according to the inheritance and implementation relationship of the class to generate a single-level function calling node; generating a code class calling knowledge graph according to the single-level function calling node; and analyzing the code according to the code class calling knowledge graph.
FIG. 1 schematically illustrates an application scenario diagram of a method, apparatus, device, medium, and program product for knowledge-graph based code analysis, in accordance with embodiments of the present disclosure. It should be noted that the application scenario shown in fig. 1 is only an example of an application scenario that may be used in the embodiments of the present disclosure to help those skilled in the art understand the technical content of the present disclosure, but does not mean that the embodiments of the present disclosure may not be used in other devices, systems, environments or scenarios. It should be noted that the code analysis method and apparatus based on the knowledge graph provided by the embodiment of the present disclosure may be used in the relevant aspects of the static code analysis technical field and the financial field, and may also be used in any field other than the financial field.
As shown in fig. 1, the application scenario 100 according to this embodiment may include a code package analysis mining scenario. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various client applications installed thereon, such as a shopping-like application, a web browser application, a search-like application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and otherwise process the received data such as the user request, and feed back a processing result (e.g., information or data generated according to the user configuration information and the request information) to the terminal device.
It should be noted that the method for analyzing code based on knowledge graph provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the code analysis apparatus based on knowledge graph provided by the embodiments of the present disclosure may be generally disposed in the server 105. The code analysis method based on knowledge graph provided by the embodiment of the present disclosure may also be performed by a server or a server cluster which is different from the server 105 and can communicate with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the code analysis apparatus based on knowledge graph provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster different from the server 105 and capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The knowledge-graph based code analysis method of the disclosed embodiments will be described in detail below with reference to fig. 2 to 6 based on the scenario described in fig. 1.
FIG. 2 schematically shows a flow diagram of a method of knowledge-graph based code analysis in accordance with an embodiment of the present disclosure.
As shown in FIG. 2, the method for knowledge-graph based code analysis of this embodiment includes operations S210-S250, which may be performed by a processor or other computing device.
In operation S210, class information to be analyzed is determined according to the full-scale class code information and the user configuration information.
In one example, the code to be analyzed is consolidated according to the user configuration, including consolidating the code package, individual code classes, and the base class information of the runtime environment, and the range of classes to be analyzed is determined.
In operation S220, an inheritance and implementation relationship and a function single-level call relationship of each class are determined according to the class information to be analyzed.
In one example, each class file is scanned and analyzed in turn according to the range of classes to be analyzed determined in operation S210, and the inheritance relationship of each class and the single-level calling relationships of all functions contained in the class are extracted. In object-oriented languages, features such as polymorphism and abstract classes make function calls more complex. Therefore, after all classes have been analyzed, the inheritance and implementation relationships of each class need to be sorted out in order to obtain complete single-level function calling relationships. The inheritance and implementation relationships of a class include the inheritance relationship of the class, the relationship between an interface class and its implementation classes, and the relationship between an abstract class and its implementation classes.
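By way of illustration only, the scanning step of operation S220 can be sketched as follows. The disclosure does not name a source language or a parser; this sketch assumes Python source and the standard `ast` module, and the names `scan_classes` and `SOURCE` are hypothetical. It records only the names of directly called methods and does not resolve the type of the receiver, which a real analyzer would need to do.

```python
import ast

# Hypothetical input: two small classes, one inheriting from the other.
SOURCE = '''
class Base:
    def run(self):
        self.prepare()

    def prepare(self):
        pass

class Child(Base):
    def run(self):
        helper = Base()
        helper.prepare()
        self.finish()

    def finish(self):
        pass
'''

def scan_classes(source):
    """Return (inheritance, calls) for every top-level class in `source`.

    inheritance: class name -> list of base class names
    calls: "Class.method" -> sorted names of methods it calls directly
    """
    inheritance, calls = {}, {}
    for node in ast.parse(source).body:
        if not isinstance(node, ast.ClassDef):
            continue
        # Only simple base names (e.g. `Base`) are recorded in this sketch.
        inheritance[node.name] = [b.id for b in node.bases if isinstance(b, ast.Name)]
        for item in node.body:
            if not isinstance(item, ast.FunctionDef):
                continue
            callees = set()
            for sub in ast.walk(item):
                # Record `obj.method(...)` calls; plain `name(...)` calls are skipped.
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Attribute):
                    callees.add(sub.func.attr)
            calls[f"{node.name}.{item.name}"] = sorted(callees)
    return inheritance, calls

inheritance, calls = scan_classes(SOURCE)
print(inheritance)
print(calls)
```

The two dictionaries correspond to the inheritance relationships and the single-level calling relationships that the later operations consume.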
In operation S230, the function single-level calling relationship is supplemented according to the inheritance and implementation relationship of the class to generate a single-level function calling node.
In one example, in order to obtain complete single-level function calling relationships, after the inheritance and implementation relationships of the classes and the function single-level calling relationships have been determined, the single-level calling relationships of each class are traversed and supplemented according to the inheritance and implementation relationships. The supplemented single-level calling relationships are turned into complete single-level function calling nodes and stored in a graph database. For the specific process of supplementing the single-level calling relationships, refer to the method for determining the single-level function calling node shown in fig. 3.
In operation S240, a code class calling knowledge graph is generated from the single-level function calling node.
In one example, function information data, i.e., the single-level function calling nodes (including function names and function single-level calling relationships), are obtained from the graph database. The code class function and class calling relationship fields are mapped to entity-relationship-entity triples, the basic unit of a knowledge graph, wherein the entity is a code class. For example, "entity-relationship-entity" triples of the form code class, calling relationship, code class are generated, a point-edge relationship table and a schema table for graph generation are constructed, and a class calling graph is formed from the entity relationship table and the schema table for subsequent presentation and analysis. For the generation process of the code class calling knowledge graph, refer to operations S241 to S243 in fig. 5.
In operation S250, the code is analyzed according to the code class calling knowledge graph.
In one example, graph analysis may be performed on the code class calling knowledge graph to analyze the software code quickly. For example, a depth-first traversal algorithm explores the call chain depth, a centrality algorithm measures how heavily a class or method is called, and the DeepWalk algorithm may be combined with the node2vec algorithm to measure the similarity of different pieces of software. The code class calling knowledge graph may also be extended. For example, the function call chain depth may be displayed: by entering a starting point v as the search value, the code class call chains starting from any v can be searched and displayed, including the possible reachable paths, path depths, path nodes, and other information. The popularity of a function may also be displayed: given a function as the starting point, the centrality of classes and methods can be calculated from the knowledge graph, popularity can be ranked by the in-degree and out-degree of the entity nodes, and a node with high in-degree and out-degree is a frequently called method or an inherited method.
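As a minimal sketch of the popularity ranking described above, the in-degree and out-degree of each node can be counted directly from the call edges. The disclosure does not state how the two degrees are combined; ranking by their sum is an assumption made here, and `EDGES` and `degree_ranking` are hypothetical names. The edge list reuses the calling relationships given for fig. 4.

```python
from collections import Counter

# Call edges (caller, callee), taken from the fig. 4 example.
EDGES = [
    ("Class A Method1", "Class B Method1"),
    ("Class A Method1", "Class C Method1"),
    ("Class A Method1", "Class D Method1"),
    ("Class A Method1", "Class A Method2"),
    ("Class B Method1", "Class B Method2"),
    ("Class B Method2", "Class B Method3"),
    ("Class B Method2", "Class C Method5"),
]

def degree_ranking(edges):
    """Return (node, in_degree, out_degree) tuples, most connected first.

    Assumption: popularity is ranked by in_degree + out_degree, with ties
    broken alphabetically by node name.
    """
    in_deg, out_deg = Counter(), Counter()
    for caller, callee in edges:
        out_deg[caller] += 1
        in_deg[callee] += 1
    nodes = set(in_deg) | set(out_deg)
    rows = [(n, in_deg[n], out_deg[n]) for n in nodes]
    return sorted(rows, key=lambda r: (-(r[1] + r[2]), r[0]))

ranking = degree_ranking(EDGES)
for name, indeg, outdeg in ranking:
    print(f"{name}: in={indeg} out={outdeg}")
```

A graph database or a graph library would normally supply degree and centrality measures directly; the plain counters here only make the definition explicit.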
According to the code analysis method based on the knowledge graph provided by the present disclosure, a knowledge graph is constructed by analyzing the function single-level calling relationships in the class code, and the function call flow can be analyzed quickly and accurately by searching graph paths in the knowledge graph, thereby assisting static code analysis and the search for dangerous call chains.
Fig. 3 schematically illustrates a flow chart of a method of determining a single-level function call node according to an embodiment of the present disclosure. FIG. 4 schematically illustrates a single-level function call node diagram according to an embodiment of the disclosure. As shown in fig. 3, operation S230 includes operations S231 through S233.
In operation S231, a class function calling relationship is determined according to the inheritance relationship of the class, the relationship between the interface class and the implementation class, and the relationship between the abstract class and the implementation class. In operation S232, a supplemented function single-level calling relationship is determined according to the class function calling relationship and the function single-level calling relationship. In operation S233, a single-level function calling node is determined according to the function name and the supplemented function single-level calling relationship corresponding to the function name, and stored in the graph database.
In one example, the single-level calling relationships of each class need to be traversed and supplemented according to the inheritance and implementation relationships of the classes. For example, Method1 of Class A calls the Method1 interface of interface class Interface I, and Class B and Class C are both implementation classes of Interface I, so the single-level calling relationship Class A.Method1 -> Interface I.Method1 is expanded to Class A.Method1 -> Class B.Method1 and Class A.Method1 -> Class C.Method1. As another example, Method1 of Class A calls Method1 of abstract class Abstract Class M, and Class D is a concrete implementation class of Abstract Class M, so Class A.Method1 -> Abstract Class M.Method1 is updated to Class A.Method1 -> Class D.Method1. Similarly, if a class function called by Class A.Method1 is overridden in subclasses, the relationship is also expanded so that all subclass functions overriding the called function are stored in the single-level calling relationship node. After this processing, complete single-level function calling nodes are formed. As shown in fig. 4, there are three single-level calling relationship nodes: Class A Method1 has calling relationships with Class B Method1, Class C Method1, Class D Method1 and Class A Method2; Class B Method1 has a calling relationship with Class B Method2; and Class B Method2 has calling relationships with Class B Method3 and Class C Method5. Aggregating the single-level calling relationship nodes yields multiple complete program call chains, for example, Class A Method1 -> Class B Method1 -> Class B Method2 -> Class B Method3.
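The supplementing step in the example above can be sketched as a simple rewrite over a calling table. This is a minimal sketch under stated assumptions, not the disclosed implementation: the dictionary layout, the `Owner.Method` string format, and the names `IMPLEMENTATIONS`, `SINGLE_LEVEL_CALLS` and `supplement_calls` are all hypothetical, and the subclass-override case is not covered.

```python
# Hypothetical input mirroring the example: interface or abstract class -> implementations.
IMPLEMENTATIONS = {
    "Interface I": ["Class B", "Class C"],
    "Abstract Class M": ["Class D"],
}

# Raw single-level calling relationships before supplementing.
SINGLE_LEVEL_CALLS = {
    "Class A.Method1": [
        "Interface I.Method1",
        "Abstract Class M.Method1",
        "Class A.Method2",
    ],
    "Class B.Method1": ["Class B.Method2"],
    "Class B.Method2": ["Class B.Method3", "Class C.Method5"],
}

def supplement_calls(calls, implementations):
    """Replace calls to interface/abstract methods with calls to each implementation."""
    supplemented = {}
    for caller, callees in calls.items():
        expanded = []
        for callee in callees:
            owner, method = callee.rsplit(".", 1)
            targets = implementations.get(owner)
            if targets:
                # One edge per implementing class, same method name.
                expanded.extend(f"{impl}.{method}" for impl in targets)
            else:
                expanded.append(callee)
        # Drop duplicates while keeping first-seen order.
        supplemented[caller] = list(dict.fromkeys(expanded))
    return supplemented

nodes = supplement_calls(SINGLE_LEVEL_CALLS, IMPLEMENTATIONS)
for caller, callees in nodes.items():
    print(caller, "->", callees)
```

With this input, the entry for Class A.Method1 ends up pointing at Class B.Method1, Class C.Method1, Class D.Method1 and Class A.Method2, which matches the first node described for fig. 4.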
FIG. 5 schematically shows a flow chart of a method of building a class function knowledge-graph according to an embodiment of the present disclosure. As shown in fig. 5, operation S240 includes operations S241 to S243.
In operation S241, the single-level function calling node is obtained from the graph database. In operation S242, an entity relationship table, an entity attribute table, and a schema table are determined according to the single-level function calling node. In operation S243, a code class calling knowledge graph is generated according to the entity relationship table, the entity attribute table, and the schema table.
According to the embodiment of the disclosure, the code class function and class calling relationship fields are used to generate a plurality of entity-relationship-entity triples of the form code class, calling relationship, code class, wherein the entity is the code class and the relationship is the calling relationship; and a schema table is constructed, wherein the schema table comprises code class names, attribute types and association relations.
In one example, a point-edge relationship table and a schema table for graph generation need to be constructed, such as Tables 1 to 3 below. As shown in Table 1, in the entity relationship table, an entity is a code class and serves as a point of the knowledge graph, and an association relation is a calling relationship and serves as an edge of the knowledge graph. The entity attribute table (Table 2) records the relationships between code classes; for example, Class B inherits from Class A. The entity-relationship-entity table and the schema table can be imported into the graph database to draw the graph and generate the code class calling knowledge graph. After the knowledge graph is generated, the code can be analyzed quickly and accurately based on the knowledge graph.
TABLE 1 "entity-relationship-entity" table
Code class (start point) | Code class (destination) | Association relation | Edge number
Class A Method3 | Class A Method1 | 1 | 0_flow
Class A Method1 | Class A Method2 | 1 | 1_flow
Class A Method1 | Class B Method1 | 1 | 2_flow
Class B Method1 | Class B Method2 | 1 | 3_flow
TABLE 2 Entity-attribute value table
Code class | Relationship | Relationship value
Class B | Inheritance | Class A
Class C | Inheritance | Class B
Class A | Implementation | Interface I
Class B | Implementation | Interface I
TABLE 3 Schema Table
Type | Name | Description | Unidirectional | Attribute name | Attribute type
Entity | Code class | Start-point code class |  | Start-point code class | String
Entity | Code class | Destination code class |  | Destination code class | String
Relationship | Association relation | Association relation | TRUE | Start-point code class | String
 |  |  |  | Destination code class | String
 |  |  |  | Edge number | String
 |  |  |  | Association relation | int
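As an illustration of how the rows of Table 1 could become points and edges, the following is a minimal Python sketch. It does not use a graph database; the names `CallEdge`, `TABLE_1`, and `build_call_graph` are hypothetical and do not appear in the disclosure, and an in-memory adjacency list is assumed as a stand-in for the graph database import step.

```python
from typing import NamedTuple


class CallEdge(NamedTuple):
    """One row of the entity-relationship-entity table (Table 1)."""
    start: str     # start-point code class and method
    end: str       # destination code class and method
    relation: int  # association relation value from the table
    edge_id: str   # edge number, e.g. "0_flow"


# Rows taken from Table 1.
TABLE_1 = [
    CallEdge("Class A Method3", "Class A Method1", 1, "0_flow"),
    CallEdge("Class A Method1", "Class A Method2", 1, "1_flow"),
    CallEdge("Class A Method1", "Class B Method1", 1, "2_flow"),
    CallEdge("Class B Method1", "Class B Method2", 1, "3_flow"),
]


def build_call_graph(edges):
    """Return an adjacency list mapping each point to the points it calls."""
    graph = {}
    for edge in edges:
        graph.setdefault(edge.start, []).append(edge.end)
        graph.setdefault(edge.end, [])  # callees with no outgoing edge are still points
    return graph


call_graph = build_call_graph(TABLE_1)
```

In this sketch every code class method that appears in either column becomes a point, so a method that is only ever called (such as Class B Method2) is still present in the graph with an empty list of callees.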
FIG. 6 schematically shows a flow diagram of a code analysis method according to an embodiment of the present disclosure. As shown in fig. 6, operation S250 includes operations S251 to S253. It should be noted that operations S251, S252, and S253 are only different analysis directions based on the knowledge graph, and the execution order is not particularly limited.
In operation S251, a depth-first traversal algorithm is used to explore the call chain depth.
According to an embodiment of the present disclosure, the code class call chain relation is searched by a depth-first traversal algorithm, and all paths from any starting point are generated. The program call chain of the target function is then determined according to the name of the target function and the paths.
Performing operation S251 yields the program call chain. Specifically, taking the depth-first traversal algorithm as an example, the code class call chain relation is searched by depth-first traversal, assuming that none of the vertices of the given graph G has been visited yet. If a vertex v in G is chosen as the initial starting point (the starting point code class), depth-first traversal can be defined as follows: first, visit the starting point v and mark it as visited; then, starting from v, search each adjacent point w of v in turn. If w has not been visited, continue the depth-first traversal with w as the new starting point, until every vertex in the graph that is connected by a path to the starting point code class v (that is, every vertex reachable from the source point) has been visited. If unvisited vertices still remain in the graph, repeat the above process with another unvisited vertex as the new source point, until all vertices in the graph have been visited.
The specific traversal process is as follows. Let x be the currently visited vertex. After marking x as visited, select an unexplored edge (x, y) starting from x. If vertex y has already been visited, select another unexplored edge starting from x; otherwise, follow the edge (x, y) to the unvisited vertex y, visit y, and mark it as visited. Then search from y until all paths from y have been searched, that is, until all vertices reachable from y have been visited, after which backtrack to vertex x and select another unexplored edge from x. Repeat this process until all edges from x have been explored. At this point, if x is not the source point, backtrack to the vertex visited before x; otherwise, all vertices connected by a path to the source point (that is, all vertices reachable from the source point) have been visited. If the graph G is a connected graph, the traversal ends; otherwise, another unvisited vertex is selected as a new source point and a new search is started. After the traversal, all paths from any starting point v can be generated, for example the call chain relation 'Class A Method1-Class B Method1-Class B Method2-Class C Method5'.
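The traversal described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the implementation of the disclosure: to enumerate every call chain rather than merely visit every vertex, it marks only the vertices on the current path as visited and unmarks them on backtracking, which is a variation on the global visited marking described in the text. The function name `all_call_chains` and the dictionary `CALL_GRAPH` are hypothetical; the edges combine Table 1 with the example chain ending in Class C Method5.

```python
# Adjacency list of the call graph: caller -> list of callees (assumed example data).
CALL_GRAPH = {
    "Class A Method3": ["Class A Method1"],
    "Class A Method1": ["Class A Method2", "Class B Method1"],
    "Class A Method2": [],
    "Class B Method1": ["Class B Method2"],
    "Class B Method2": ["Class C Method5"],
    "Class C Method5": [],
}


def all_call_chains(graph, start):
    """Enumerate every maximal simple call path from `start` by depth-first search."""
    chains = []
    path = [start]
    on_path = {start}  # vertices on the current path, used to skip cycles

    def visit(node):
        extended = False
        for callee in graph.get(node, []):
            if callee in on_path:  # already on this path (e.g. a recursive call)
                continue
            extended = True
            path.append(callee)
            on_path.add(callee)
            visit(callee)           # go deeper first
            on_path.remove(callee)  # backtrack
            path.pop()
        if not extended:
            # No further unvisited callee: the current path is a complete chain.
            chains.append("-".join(path))

    visit(start)
    return chains


chains = all_call_chains(CALL_GRAPH, "Class A Method1")
```

For the starting point Class A Method1, the sketch yields two chains, one of which is the example chain from the text. The recursion depth equals the call chain depth, so a very deep graph would call for an explicit stack instead of recursion.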
In operation S252, call chain similarities of a plurality of software are calculated.
According to an embodiment of the present disclosure, the code class call knowledge graphs of at least two pieces of software are obtained; at least two vector matrices are generated from the point-edge relations of the connected graphs of the code class call knowledge graphs; the similarity of the vector matrices is calculated using a deep learning algorithm; and the similarity of the two pieces of software is determined from the similarity of the vector matrices.
In one example, the similarity of different pieces of software can be analyzed on the basis of the knowledge graph. Code class call knowledge graphs of at least two pieces of software are constructed according to the knowledge graph construction method described above. The point-edge relations of a connected graph of each knowledge graph are converted into a set of mathematical vectors using a graph embedding method such as DeepWalk or node2vec, so that the graph data is represented as a matrix and each piece of software is converted into vectors. The similarity of the matrices can then be measured using a deep learning algorithm such as a convolutional neural network, or a clustering algorithm, and the similarity between the pieces of software is judged from the similarity of the vector matrices.
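The idea of turning each call graph into a vector and comparing the vectors can be illustrated with a deliberately simplified Python sketch. It is not DeepWalk, node2vec, or a neural network: as a stand-in, each graph is summarized by a histogram of out-degrees and in-degrees, and the two histograms are compared by cosine similarity. The names `degree_profile`, `cosine_similarity`, `SOFTWARE_A`, and `SOFTWARE_B`, and the edges of the second graph, are hypothetical.

```python
import math


def degree_profile(edges, max_degree=4):
    """Summarize a call graph as a fixed-length vector.

    The first half is a histogram of out-degrees, the second half a histogram
    of in-degrees; degrees above `max_degree` share the last bucket.
    """
    out_deg, in_deg = {}, {}
    for start, end in edges:
        out_deg[start] = out_deg.get(start, 0) + 1
        in_deg[end] = in_deg.get(end, 0) + 1
        out_deg.setdefault(end, 0)   # make sure both maps cover every vertex
        in_deg.setdefault(start, 0)
    vector = [0] * (2 * (max_degree + 1))
    for node in out_deg:
        vector[min(out_deg[node], max_degree)] += 1
        vector[max_degree + 1 + min(in_deg[node], max_degree)] += 1
    return vector


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Call edges of two pieces of software (the first follows Table 1, the second is made up).
SOFTWARE_A = [
    ("Class A Method3", "Class A Method1"),
    ("Class A Method1", "Class A Method2"),
    ("Class A Method1", "Class B Method1"),
    ("Class B Method1", "Class B Method2"),
]
SOFTWARE_B = [
    ("Class X Method1", "Class X Method2"),
    ("Class X Method2", "Class X Method3"),
]

similarity = cosine_similarity(degree_profile(SOFTWARE_A), degree_profile(SOFTWARE_B))
```

A degree histogram ignores which vertex is connected to which, so it is far coarser than a learned graph embedding; it only shows the shape of the pipeline (graph, then vector, then similarity score).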
In operation S253, the popularity (heat) of the target function is calculated.
According to an embodiment of the present disclosure, the in-degree and out-degree of the function are determined according to the target function name and the knowledge graph, and the function popularity is determined according to the in-degree and out-degree.
In one example, given an input starting-point function, the centrality of classes and methods can be calculated from the knowledge graph. Function popularity can be ranked by the in-degree and out-degree of the entity nodes; a node with high in-degree and out-degree corresponds to a frequently used method or an inherited method. The approach can be extended to show the depth of a function call chain: by entering a starting-point function V as the search value, the code class call chains starting from V can be retrieved and displayed for evaluation, including the possible reachable paths, path depths, path nodes, and other information.
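A ranking by in-degree and out-degree can be sketched in Python as follows. The disclosure does not give a formula for popularity, so the sketch assumes the score is simply in-degree plus out-degree; the names `rank_by_degree` and `CALL_EDGES` are hypothetical, and the edges are those of Table 1.

```python
from collections import Counter

# Call edges (caller, callee) taken from Table 1.
CALL_EDGES = [
    ("Class A Method3", "Class A Method1"),
    ("Class A Method1", "Class A Method2"),
    ("Class A Method1", "Class B Method1"),
    ("Class B Method1", "Class B Method2"),
]


def rank_by_degree(edges):
    """Return (function, (in_degree, out_degree)) pairs, most popular first.

    Popularity is assumed to be in_degree + out_degree; ties are broken by name.
    """
    in_deg, out_deg = Counter(), Counter()
    for caller, callee in edges:
        out_deg[caller] += 1
        in_deg[callee] += 1
    nodes = set(in_deg) | set(out_deg)
    scores = {node: (in_deg[node], out_deg[node]) for node in nodes}
    return sorted(scores.items(), key=lambda item: (-(item[1][0] + item[1][1]), item[0]))


ranking = rank_by_degree(CALL_EDGES)
```

With the Table 1 edges, Class A Method1 ranks first because it is called once and calls two other methods. A weighted sum or another centrality measure could be substituted for the assumed score without changing the rest of the sketch.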
Based on the code analysis method based on the knowledge graph, the disclosure also provides a code analysis device based on the knowledge graph. The apparatus will be described in detail below with reference to fig. 7.
Fig. 7 schematically shows a block diagram of a knowledge-graph based code analysis apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the knowledge-graph-based code analysis apparatus 700 of this embodiment includes a first determination module 710, a second determination module 720, a first generation module 730, a second generation module 740, and an analysis module 750.
The first determining module 710 is configured to determine class information to be analyzed according to the full-scale class code information and the user configuration information. In an embodiment, the first determining module 710 may be configured to perform the operation S210 described above, which is not described herein again.
The second determining module 720 is configured to determine an inheritance and implementation relationship and a function single-level call relationship of each class according to the class information to be analyzed. In an embodiment, the second determining module 720 may be configured to perform the operation S220 described above, which is not described herein again.
The first generation module 730 is configured to supplement the function single-level call relationship according to the inheritance and implementation relationship of the class to generate a single-level function call node. In an embodiment, the first generating module 730 may be configured to perform the operation S230 described above, which is not described herein again.
The second generating module 740 is configured to generate a code class call knowledge graph from the single-level function call nodes. In an embodiment, the second generating module 740 may be configured to perform the operation S240 described above, which is not described herein again.
The analysis module 750 is configured to analyze the code according to the code class call knowledge graph. In an embodiment, the analysis module 750 may be configured to perform the operation S250 described above, which is not described herein again.
According to an embodiment of the present disclosure, any plurality of the first determining module 710, the second determining module 720, the first generating module 730, the second generating module 740, and the analyzing module 750 may be combined into one module to be implemented, or any one of them may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first determining module 710, the second determining module 720, the first generating module 730, the second generating module 740, and the analyzing module 750 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware, and firmware, or in any suitable combination of any of them. Alternatively, at least one of the first determining module 710, the second determining module 720, the first generating module 730, the second generating module 740 and the analyzing module 750 may be at least partially implemented as a computer program module, which when executed, may perform a corresponding function.
FIG. 8 schematically illustrates a block diagram of an electronic device suitable for implementing a knowledge-graph based code analysis method according to an embodiment of the present disclosure.
As shown in fig. 8, an electronic apparatus 900 according to an embodiment of the present disclosure includes a processor 901 which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage portion 908 into a Random Access Memory (RAM) 903. Processor 901 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 901 may also include on-board memory for caching purposes. The processor 901 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present disclosure.
In the RAM 903, various programs and data necessary for the operation of the electronic apparatus 900 are stored. The processor 901, the ROM 902, and the RAM 903 are connected to each other through a bus 904. The processor 901 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 902 and/or the RAM 903. Note that the programs may also be stored in one or more memories other than the ROM 902 and the RAM 903. The processor 901 may also perform various operations of the method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
Electronic device 900 may also include input/output (I/O) interface 905, input/output (I/O) interface 905 also connected to bus 904, according to an embodiment of the present disclosure. The electronic device 900 may also include one or more of the following components connected to the I/O interface 905: an input portion 906 including a keyboard, a mouse, and the like; an output section 907 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage portion 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read out therefrom is mounted into the storage section 908 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include the ROM 902 and/or RAM 903 described above and/or one or more memories other than the ROM 902 and RAM 903.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the code analysis method based on the knowledge graph provided by the embodiment of the disclosure.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 901. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, and the like. In another embodiment, the computer program may also be transmitted, distributed in the form of a signal on a network medium, and downloaded and installed through the communication section 909 and/or installed from the removable medium 911. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The computer program, when executed by the processor 901, performs the above-described functions defined in the system of the embodiment of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, these computer programs may be implemented using high level procedural and/or object oriented programming languages, and/or assembly/machine languages. The programming language includes, but is not limited to, programming languages such as Java, C++, Python, the "C" language, or the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that various combinations and/or associations of the features recited in the various embodiments and/or claims of the present disclosure can be made, even if such combinations or associations are not expressly recited in the present disclosure. In particular, various combinations and/or associations of the features recited in the various embodiments and/or claims of the present disclosure may be made without departing from the spirit or teaching of the present disclosure. All such combinations and/or associations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (12)

1. A code analysis method based on a knowledge graph, characterized by comprising:
determining class information to be analyzed according to full-scale class code information and user configuration information;
determining the inheritance and implementation relationships and the function single-level call relationship of each class according to the class information to be analyzed;
supplementing the function single-level call relationship according to the inheritance and implementation relationships of the class to generate a single-level function call node;
generating a code class call knowledge graph according to the single-level function call node; and
analyzing the code according to the code class call knowledge graph.
2. The method of claim 1, wherein the inheritance and implementation relationships of the class comprise an inheritance relationship of the class, a relationship between an interface class and an implementation class, and a relationship between an abstract class and an implementation class, and wherein supplementing the function single-level call relationship according to the inheritance and implementation relationships of the class to generate a single-level function call node comprises:
determining a class function call relationship according to the inheritance relationship of the class, the relationship between the interface class and the implementation class, and the relationship between the abstract class and the implementation class;
determining a supplemented function single-level call relationship according to the class function call relationship and the function single-level call relationship; and
determining a single-level function call node according to a function name and the supplemented function single-level call relationship corresponding to the function name, and storing the single-level function call node in a graph database.
3. The method of claim 2, wherein generating a code class call knowledge graph according to the single-level function call node comprises:
acquiring the single-level function call node from the graph database;
determining an entity relationship table, an entity attribute table, and a schema table according to the single-level function call node; and
generating the code class call knowledge graph according to the entity relationship table, the entity attribute table, and the schema table.
4. The method of claim 3, wherein determining the entity relationship table and the schema table according to the single-level function call node comprises:
generating a plurality of entity-relationship-entity triples of the form (code class, call relationship, code class) using the code class function and class call relationship fields, wherein each entity is a code class and the relationship is a call relationship; and
constructing the schema table, wherein the schema table comprises code class names, attribute types, and association relations.
5. The method of claim 4, wherein analyzing the code according to the code class call knowledge graph comprises:
searching the code class call chain relation by a depth-first traversal algorithm to generate all paths from any starting point; and
determining a program call chain of a target function according to the target function name and the paths.
6. The method of claim 4, wherein analyzing the code according to the code class call knowledge graph further comprises:
determining the in-degree and out-degree of a function according to the target function name and the knowledge graph; and
determining the popularity of the function according to the in-degree and out-degree.
7. The method of claim 4, wherein analyzing the code according to the code class call knowledge graph further comprises:
calculating call chain similarity of a plurality of pieces of software.
8. The method of claim 7, wherein calculating call chain similarity of a plurality of pieces of software comprises:
acquiring code class call knowledge graphs of at least two pieces of software;
generating at least two vector matrices according to the point-edge relations of the connected graphs of the code class call knowledge graphs;
calculating the similarity of the vector matrices using a deep learning algorithm; and
determining the similarity of the two pieces of software according to the similarity of the vector matrices.
9. A code analysis apparatus based on a knowledge graph, comprising:
a first determining module, configured to determine class information to be analyzed according to full-scale class code information and user configuration information;
a second determining module, configured to determine the inheritance and implementation relationships and the function single-level call relationship of each class according to the class information to be analyzed;
a first generation module, configured to supplement the function single-level call relationship according to the inheritance and implementation relationships of the class to generate a single-level function call node;
a second generation module, configured to generate a code class call knowledge graph according to the single-level function call node; and
an analysis module, configured to analyze the code according to the code class call knowledge graph.
10. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-8.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 8.
12. A computer program product comprising a computer program which, when executed by a processor, implements a method according to any one of claims 1 to 8.
CN202210103947.XA 2022-01-27 2022-01-27 Code analysis method and device based on knowledge graph Pending CN114491536A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210103947.XA CN114491536A (en) 2022-01-27 2022-01-27 Code analysis method and device based on knowledge graph


Publications (1)

Publication Number Publication Date
CN114491536A true CN114491536A (en) 2022-05-13

Family

ID=81476449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210103947.XA Pending CN114491536A (en) 2022-01-27 2022-01-27 Code analysis method and device based on knowledge graph

Country Status (1)

Country Link
CN (1) CN114491536A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination