CN116661857A - Data extraction method, device, equipment and storage medium - Google Patents

Publication number
CN116661857A
Authority
CN
China
Prior art keywords
node
target
leaf node
adjacency matrix
target leaf
Prior art date
Legal status
Pending
Application number
CN202310636435.4A
Other languages
Chinese (zh)
Inventor
李勉燕
连煜伟
李欢余
黄琼
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202310636435.4A
Publication of CN116661857A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/74 Reverse engineering; Extracting design information from source code
    • G06F8/75 Structural analysis for program understanding

Abstract

The disclosure provides a data extraction method, a device, equipment and a storage medium, which can be applied to the technical field of data processing or the technical field of finance and technology. The method comprises the following steps: converting the acquired code file into an initial grammar tree, wherein the initial grammar tree comprises a root node and a leaf node; traversing root nodes and leaf nodes in an initial grammar tree to obtain a directed adjacency matrix, wherein the directed adjacency matrix is used for representing the existence condition of the hierarchical relationship between the root nodes and target leaf nodes and between the target leaf nodes, and comprises a plurality of element values obtained according to the existence condition of the hierarchical relationship, and the target leaf nodes are leaf nodes with preset node identifiers; and extracting data associated with the target leaf nodes from the database according to target element values meeting preset conditions in the directed adjacency matrix as a data extraction result, wherein the data extraction result also comprises a target grammar tree generated according to the directed adjacency matrix.

Description

Data extraction method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of data processing technology or financial technology, and in particular, to a data extraction method, apparatus, device, storage medium, and program product.
Background
Along with the transformation of the service architecture, in the process of transferring the service from the original service architecture to the new service architecture, all the services need to be subjected to detailed design, coding, testing and other processes. In the service transformation process, the original service development is generally not affected, and the original service program logic is required to be kept unchanged.
In the process of implementing the inventive concept of the present disclosure, the inventors found that the following problems generally exist in the related art: in the detailed design flow for redeveloping a transformed service, the process of sorting out the programs and the logic for extracting program data are complicated, and program data is extracted inefficiently, so that drawing the program call flow chart takes a long time, is inefficient, and has a high error rate.
Disclosure of Invention
In view of the above, the present disclosure provides a data extraction method, apparatus, device, storage medium, and program product.
One aspect of the present disclosure provides a data extraction method, including: converting an acquired code file into an initial syntax tree, wherein the code file comprises a plurality of code objects, and the initial syntax tree comprises a root node and leaf nodes for representing the code objects; traversing the root node and the leaf nodes in the initial syntax tree to obtain a directed adjacency matrix, wherein the directed adjacency matrix is used for representing whether a hierarchical relationship exists between the root node and a target leaf node and between target leaf nodes, the directed adjacency matrix comprises a plurality of element values obtained according to whether the hierarchical relationship exists, and a target leaf node is a leaf node with a preset node identifier; and extracting data associated with the target leaf nodes from a database, as a data extraction result, according to target element values in the directed adjacency matrix that meet a preset condition, wherein the data extraction result further comprises a target syntax tree generated according to the directed adjacency matrix.
According to an embodiment of the present disclosure, traversing the root node and the leaf node in the initial syntax tree to obtain a directed adjacency matrix includes: adding a target node to the root node of the initial grammar tree to obtain a complete grammar tree; identifying the root node and the leaf node in the complete grammar tree based on the preset node identifier to obtain an identification result; traversing the complete grammar tree based on the identification result to obtain the directed adjacency matrix.
According to an embodiment of the present disclosure, the identification result includes a root node and the target leaf node; traversing the complete grammar tree based on the identification result, and obtaining the directed adjacency matrix comprises the following steps: constructing an initial adjacent matrix, wherein the size of the initial adjacent matrix is related to the total number of nodes of the root node and the leaf node, and initial element values in the initial adjacent matrix are the same; assigning different numbers to the root node and the target leaf node in the target syntax tree; determining the position of an initial element value to be modified in the initial adjacency matrix according to the number of the root node and the number of the target leaf node under the condition that the root node and at least one target leaf node have a hierarchical relationship; and/or determining the position of the initial element value to be modified in the initial adjacency matrix according to the number of the target leaf nodes under the condition that the hierarchical relationship exists between at least two target leaf nodes; and modifying the initial element value to be modified into a preset value to obtain the directed adjacency matrix.
According to an embodiment of the present disclosure, the extracting, from the database, data associated with the target leaf node according to the target element value satisfying a preset condition in the directed adjacency matrix includes: traversing the directed adjacency matrix based on a preset traversing rule, and screening out target element values meeting preset conditions; tracing at least one target leaf node according to the position of the target element value in the directed adjacent matrix; and extracting data associated with the target leaf node from the code base based on the target leaf node.
According to an embodiment of the present disclosure, the extracting, based on the target leaf node, data associated with the target leaf node from the code base includes: determining the node type of the target leaf node; and extracting data associated with the target leaf node from the code base directly according to a mapping file associated with the target leaf node when the node type of the target leaf node is a data declaration type node; searching a calling method of the target leaf node under the condition that the node type of the target leaf node is a calling method type node; analyzing a grammar tree constructed based on the calling method, and determining a mapping file associated with the target leaf node; and extracting data associated with the target leaf node from the code base according to the mapping file associated with the target leaf node.
According to an embodiment of the present disclosure, the extracting, from the code base, data associated with the target leaf node according to a mapping file associated with the target leaf node includes: generating an initial mapping file list based on the mapping files associated with the target leaf nodes; removing repeated mapping files from the initial mapping file list to obtain a target mapping file list; and extracting data associated with the target leaf node from the code base according to the target mapping file list.
According to an embodiment of the present disclosure, the node type of the leaf node includes at least one of: identifier type node, data declaration type node, function call type node, statement type node, expression type node.
Another aspect of the present disclosure also provides a data extraction apparatus, including a conversion module, a traversing module, and an extraction module. The conversion module is used for converting the acquired code file into an initial grammar tree, wherein the code file comprises a plurality of code objects, and the initial grammar tree comprises a root node and leaf nodes for representing the code objects. The traversing module is configured to traverse the root node and the leaf nodes in the initial syntax tree to obtain a directed adjacency matrix, where the directed adjacency matrix is used to characterize whether a hierarchical relationship exists between the root node and a target leaf node and between target leaf nodes, the directed adjacency matrix includes a plurality of element values obtained according to whether the hierarchical relationship exists, and a target leaf node is a leaf node with a preset node identifier. The extraction module is used for extracting data associated with the target leaf nodes from the database, as a data extraction result, according to target element values in the directed adjacency matrix that meet a preset condition, wherein the data extraction result also comprises a target grammar tree generated according to the directed adjacency matrix.
Another aspect of the present disclosure also provides an electronic device, including: one or more processors; and a storage device for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the data extraction method.
Another aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the data extraction method described above.
Another aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the data extraction method described above.
According to the data extraction method, apparatus, device, storage medium, and program product provided by the embodiments of the present disclosure, the acquired code file is converted into an initial syntax tree; the initial syntax tree is traversed to obtain a directed adjacency matrix; and data is extracted according to the target element values in the directed adjacency matrix that meet the preset condition, yielding a data extraction result that comprises the extracted data and a target syntax tree generated according to the directed adjacency matrix. In this process, nodes with a preset identifier are retained and nodes without one are removed while the syntax tree is traversed, so nodes without the preset identifier need not be considered when data is extracted according to the adjacency matrix. This reduces the workload of extracting program data, simplifies the extraction logic, and allows the whole process to run automatically, which raises extraction efficiency and lowers the error rate. The problem of low program data extraction efficiency in the related art is thereby at least partially solved, and the time cost of drawing a call flow chart is shortened, drawing efficiency is improved, and the drawing error rate is reduced.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be more apparent from the following description of embodiments of the disclosure with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates a system architecture diagram of a data extraction method and apparatus according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a data extraction method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a schematic diagram of an initial syntax tree according to an embodiment of the present disclosure;
FIG. 4 schematically illustrates a diagram of a complete syntax tree obtained by adding target nodes on the basis of the initial syntax tree shown in FIG. 3, according to an embodiment of the present disclosure;
FIG. 5 schematically illustrates a diagram of leaving only a target leaf node with a preset identifier, in accordance with an embodiment of the present disclosure;
FIG. 6 schematically illustrates a simplified syntax tree diagram according to an embodiment of the present disclosure;
FIG. 7 schematically illustrates a flowchart for obtaining relevant data table names according to a mapping relationship of a subroutine and a mapper, according to an embodiment of the disclosure;
FIG. 8 schematically illustrates a flow chart of a data extraction method according to another implementation of the present disclosure;
FIG. 9 schematically illustrates a flow chart of eliminating redundant nodes of operation S820 according to an embodiment of the disclosure;
FIG. 10 schematically illustrates a block diagram of a data extraction apparatus according to an embodiment of the present disclosure; and
FIG. 11 schematically illustrates a block diagram of an electronic device adapted to implement a data extraction method according to an embodiment of the disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It should be noted that the terms used herein should be construed to have meanings consistent with the context of the present specification and should not be construed in an idealized or overly formal manner.
Where expressions like "at least one of A, B and C, etc." are used, the expressions should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B and C together, etc.).
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and application of the data involved (including but not limited to personal information of users) all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated.
With the ongoing transformation of the service architecture, the credit card services on the current mainframe are being decoupled and migrated off the mainframe platform, and every service has to go through detailed design, coding, and testing again. The program call flow chart produced during detailed design is very important to developers and testers, especially when they need to understand how the contents of the service modules relate to one another and how they flow. However, extracting program data is tedious and error-prone, so parts of the program call flow chart are easily omitted and the chart can deviate from the actual program processing. Because program calls are nested layer by layer, sorting them out is costly and error-prone, and the cost of drawing program call flow charts and sorting out test assets is high.
In view of this, embodiments of the present disclosure provide a data extraction method, apparatus, device, storage medium, and program product for improving data extraction efficiency and accuracy, thereby shortening the drawing time cost of a call flow chart, improving drawing efficiency, and reducing drawing error rate. Specifically, the method includes converting an acquired code file into an initial syntax tree, wherein the code file includes a plurality of code objects, and the initial syntax tree includes a root node and a leaf node for representing the code objects; traversing root nodes and leaf nodes in an initial grammar tree to obtain a directed adjacency matrix, wherein the directed adjacency matrix is used for representing the existence condition of the hierarchical relationship between the root nodes and target leaf nodes and between the target leaf nodes, and comprises a plurality of element values obtained according to the existence condition of the hierarchical relationship, and the target leaf nodes are leaf nodes with preset node identifiers; and extracting data associated with the target leaf nodes from the database according to target element values meeting preset conditions in the directed adjacency matrix as a data extraction result, wherein the data extraction result also comprises a target grammar tree generated according to the directed adjacency matrix.
It should be noted that the data extraction method and apparatus determined in the embodiments of the present disclosure may be used in the field of data processing technology or the field of financial technology, and may also be used in any field other than the field of data processing technology or the field of financial technology, and the application field of the determined data extraction method and apparatus in the embodiments of the present disclosure is not limited.
Fig. 1 schematically illustrates a system architecture diagram of a data extraction method and apparatus according to an embodiment of the present disclosure.
As shown in fig. 1, a system architecture 100 according to this embodiment may include a first terminal device 101, a second terminal device 102, a third terminal device 103, a network 104, and a server 105. The network 104 is a medium used to provide a communication link between the first terminal device 101, the second terminal device 102, the third terminal device 103, and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 through the network 104 using at least one of the first terminal device 101, the second terminal device 102, the third terminal device 103, to receive or transmit code files, etc. Various communication client applications, such as a code analysis class application, a financial class application, a shopping class application, a web browser application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only) may be installed on the first terminal device 101, the second terminal device 102, the third terminal device 103.
The first terminal device 101, the second terminal device 102, the third terminal device 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (merely an example) providing analysis processing for a code file transmitted by a user using the first terminal apparatus 101, the second terminal apparatus 102, and the third terminal apparatus 103. The background management server may analyze and process the received data such as the code file, and feed back the processing result (e.g., program data, database table, grammar tree, web page, information, or data obtained or generated from the code file) to the terminal device.
It should be noted that the data extraction method provided by the embodiments of the present disclosure may be generally performed by the server 105. Accordingly, the data extraction apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The data extraction method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105. Accordingly, the data extraction apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the first terminal device 101, the second terminal device 102, the third terminal device 103, and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The data extraction method of the disclosed embodiment will be described in detail below with reference to fig. 2 to 9 based on the system architecture described in fig. 1.
Fig. 2 schematically illustrates a flow chart of a data extraction method according to an embodiment of the present disclosure.
As shown in fig. 2, the data extraction method of this embodiment includes operations S210 to S230.
In operation S210, the acquired code file is converted into an initial syntax tree, wherein the code file includes a plurality of code objects, and the initial syntax tree includes a root node and leaf nodes for representing the code objects.
In operation S220, the root node and the leaf nodes in the initial syntax tree are traversed to obtain a directed adjacency matrix, wherein the directed adjacency matrix is used for representing whether a hierarchical relationship exists between the root node and a target leaf node and between target leaf nodes, the directed adjacency matrix comprises a plurality of element values obtained according to whether the hierarchical relationship exists, and a target leaf node is a leaf node with a preset node identifier.
In operation S230, data associated with the target leaf node is extracted from the database according to the target element values satisfying the preset condition in the directed adjacency matrix as a data extraction result, wherein the data extraction result further includes a target syntax tree generated according to the directed adjacency matrix.
According to embodiments of the present disclosure, a code file may refer to a code source file, and a plurality of code objects may be included in the code source file, each of which may be a program. Specifically, the code file may be used to draw a program call flow chart, and the method provided in the embodiment of the present disclosure may automatically obtain program data corresponding to each program, for example, various data tables related to the program, and may also convert the program into a tree diagram form according to the code source file, thereby implementing automatic drawing of the program call flow chart, and improving drawing efficiency and accuracy.
According to embodiments of the present disclosure, the initial syntax tree may be an abstract representation of the syntax structure of the code file and may represent the structure and semantics of the code. Each node is an object representing a statement or an expression in the code and has its own attributes and child nodes, which facilitates analysis of the code. The conversion of code into a syntax tree may be implemented with a common conversion tool and is not described in detail here.
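As a rough illustration of this conversion step, the sketch below parses a small code string into a syntax tree and lists each node with its depth. The disclosure does not name the conversion tool; Python's standard `ast` module is used here purely as a stand-in, and the sample source and the helper names `to_syntax_tree` and `describe` are hypothetical.

```python
import ast

# Hypothetical sample "code file" content, used only for illustration.
SOURCE = """
total = 0
def add(x, y):
    return x + y
total = add(1, 2)
"""


def to_syntax_tree(source: str) -> ast.AST:
    """Convert source text into a syntax tree (stand-in for operation S210)."""
    return ast.parse(source)


def describe(node: ast.AST, depth: int = 0) -> list:
    """Return (depth, node type name) pairs in pre-order."""
    rows = [(depth, type(node).__name__)]
    for child in ast.iter_child_nodes(node):
        rows.extend(describe(child, depth + 1))
    return rows


tree = to_syntax_tree(SOURCE)
rows = describe(tree)
for depth, name in rows:
    print("  " * depth + name)
```

Each printed line corresponds to one node object with its own attributes and children, which is the property the later filtering steps rely on.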
According to an embodiment of the present disclosure, the preset identifiers may include var_decl (variable declaration), stmt_expr (statement expression), modification_expr (modification expression), call_expr (function call), and the like. A node with a preset identifier may be understood as a useful node, and a target leaf node may be such a useful node. Traversing the initial grammar tree to obtain the directed adjacency matrix can be understood as removing the nodes without a preset identifier, that is, eliminating the useless redundant nodes and retaining the useful ones. If a hierarchical relationship exists between the root node and a target leaf node, an edge exists between them, and when this edge is mapped into the directed adjacency matrix, the element corresponding to the root node and the target leaf node takes the preset value that denotes a hierarchical relationship. If no hierarchical relationship exists between two target leaf nodes, no edge exists between them, and the corresponding element in the directed adjacency matrix takes the preset value that denotes no hierarchical relationship.
According to embodiments of the present disclosure, a target leaf node may be regarded as a subroutine, and the data associated with the target leaf node may be the names of the database tables related to that subroutine. The data extraction result may include these database table names as well as the target grammar tree, which is generated from the directed adjacency matrix and contains only the useful nodes.
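The extraction step can be sketched as follows, under several assumptions that the disclosure leaves open: the "preset condition" is taken to be an element value equal to 1, each matrix column index maps to a subroutine name, and two hypothetical lookup dictionaries (`mapper_of` and `tables_of`) stand in for the mapping files and the table names they reference. The function name `extract_table_names` and all sample names are illustrative only.

```python
def extract_table_names(matrix, labels, mapper_of, tables_of):
    """Trace matrix entries back to subroutines and collect their table names.

    matrix    -- directed adjacency matrix as a list of lists of 0/1
    labels    -- labels[i] is the subroutine name of node number i
    mapper_of -- hypothetical mapping: subroutine name -> mapping file
    tables_of -- hypothetical mapping: mapping file -> list of table names
    """
    # Screen out element values that meet the assumed preset condition (== 1)
    # and trace each one back to the child node it points at.
    targets = []
    for row in matrix:
        for j, value in enumerate(row):
            if value == 1 and labels[j] not in targets:
                targets.append(labels[j])

    # Build the mapping file list and drop repeated mapping files.
    mapping_files = []
    for name in targets:
        mapping_file = mapper_of.get(name)
        if mapping_file is not None and mapping_file not in mapping_files:
            mapping_files.append(mapping_file)

    # Look up the table names for each remaining mapping file.
    return {mf: tables_of.get(mf, []) for mf in mapping_files}


# Illustrative data: node 0 is the head node, nodes 1 and 2 are subroutines.
matrix = [
    [0, 1, 1],
    [0, 0, 0],
    [0, 0, 0],
]
labels = ["head", "queryCard", "updateCard"]
mapper_of = {"queryCard": "CardMapper.xml", "updateCard": "CardMapper.xml"}
tables_of = {"CardMapper.xml": ["CARD_INFO"]}

result = extract_table_names(matrix, labels, mapper_of, tables_of)
print(result)
```

Both subroutines resolve to the same mapping file here, so the deduplication step leaves a single entry to look up.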
According to the data extraction method, apparatus, device, storage medium, and program product provided by the embodiments of the present disclosure, the acquired code file is converted into an initial syntax tree; the initial syntax tree is traversed to obtain a directed adjacency matrix; and data is extracted according to the target element values in the directed adjacency matrix that meet the preset condition, yielding a data extraction result that comprises the extracted data and a target syntax tree generated according to the directed adjacency matrix. In this process, nodes with a preset identifier are retained and nodes without one are removed while the syntax tree is traversed, so nodes without the preset identifier need not be considered when data is extracted according to the adjacency matrix. This reduces the workload of extracting program data, simplifies the extraction logic, and allows the whole process to run automatically, which raises extraction efficiency and lowers the error rate. The problem of low program data extraction efficiency in the related art is thereby at least partially solved, and the time cost of drawing a call flow chart is shortened, drawing efficiency is improved, and the drawing error rate is reduced.
Fig. 3 schematically illustrates a schematic diagram of an initial syntax tree according to an embodiment of the present disclosure.
As shown in fig. 3, the figure may be an initial grammar tree generated from a code file; the data in fig. 3 is for example only. After the code file is converted into the initial syntax tree, each node in the syntax tree may also be assigned a node type identifier, such as an identifier type node, a data declaration type node, a function call type node, a statement type node, an expression type node, etc. An identifier type node indicates that the node is an identifier; a data type node may indicate that the node is of a numeric type, an alphabetic type, or the like; a data declaration type node may indicate that the node is a variable declaration or the like; a function call type node may indicate that the node calls a function; a statement type node indicates that the node is a statement; and an expression type node indicates that the node is an expression. Because node types are marked, redundant nodes can be deleted directly according to the node mark when the adjacency matrix is constructed later, which improves the efficiency of constructing the adjacency matrix.
According to an embodiment of the present disclosure, operation S220 may include the following operations: adding a target node for the root node of the initial grammar tree to obtain a complete grammar tree; based on a preset node identifier, identifying root nodes and leaf nodes in the complete grammar tree to obtain an identification result; traversing the complete grammar tree based on the identification result to obtain the directed adjacency matrix.
According to an embodiment of the present disclosure, operation S220 may be understood as eliminating the redundant nodes of the syntax tree. Redundant nodes are nodes in the abstract syntax tree that do not express program meaning; eliminating these program-irrelevant nodes improves the efficiency of extracting static information from the syntax tree.
Fig. 4 schematically illustrates a schematic diagram of a complete syntax tree obtained by adding a target node on the basis of the initial syntax tree illustrated in fig. 3 according to an embodiment of the present disclosure.
Referring to fig. 4, a head node may be added above the root nodes of the multiple syntax trees obtained in operation S210, yielding a single complete syntax tree. The target node refers to this head node.
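The head node step can be sketched as follows. The `Node` class, the label `HEAD` and the sample forest are assumptions made for illustration; the disclosure only states that one head node is added above the existing root nodes.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def add_head_node(roots, head_label="HEAD"):
    """Attach several root nodes under one new head node to form a single tree."""
    return Node(head_label, list(roots))

# Two hypothetical per-file syntax trees, each with its own root.
forest = [
    Node("file_a", [Node("var_decl")]),
    Node("file_b", [Node("call_expr")]),
]
complete_tree = add_head_node(forest)
```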
Fig. 5 schematically illustrates a schematic diagram of leaving only a target leaf node with a preset identifier according to an embodiment of the present disclosure.
After the complete syntax tree is obtained, a text traversal may be performed on it to identify useful nodes. A node may be regarded as a useful node if its node identifier satisfies one of the following four cases: the node identifier is var_decl, i.e., a variable declaration; the node identifier is stmt_expr, i.e., a statement expression; the node identifier is modify_expr, i.e., a modification expression; or the node identifier is call_expr, i.e., a function call. The initial syntax tree is then traversed according to the identification result, and the useless redundant nodes in it are eliminated. The remaining useful nodes may be as shown in fig. 5.
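A minimal sketch of the filtering step is given below. It assumes the four identifiers are spelled `var_decl`, `stmt_expr`, `modify_expr` and `call_expr`, that the root is always kept, and that useful descendants of a removed node are re-attached to the nearest kept ancestor. The re-attachment rule and the other identifiers in the sample tree are assumptions, not details stated in the disclosure.

```python
USEFUL_IDS = {"var_decl", "stmt_expr", "modify_expr", "call_expr"}

def prune(node, is_root=True):
    """Return the list of kept subtrees for a node given as {'id': ..., 'children': [...]}."""
    kept_children = []
    for child in node.get("children", []):
        kept_children.extend(prune(child, is_root=False))
    if is_root or node["id"] in USEFUL_IDS:
        return [{"id": node["id"], "children": kept_children}]
    # Redundant node: drop it and hand its useful descendants to the parent.
    return kept_children

# Hypothetical tree; "compound_stmt", "nop_expr" and "paren" stand for redundant nodes.
tree = {"id": "head", "children": [
    {"id": "compound_stmt", "children": [{"id": "var_decl"}, {"id": "nop_expr"}]},
    {"id": "stmt_expr", "children": [{"id": "paren", "children": [{"id": "call_expr"}]}]},
]}
pruned = prune(tree)[0]
```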
According to embodiments of the present disclosure, the identification result may include a root node and target leaf nodes. Traversing the complete syntax tree according to the identification result to obtain a directed adjacency matrix may include the following operations: constructing an initial adjacency matrix, wherein the size of the initial adjacency matrix is related to the total number of root nodes and leaf nodes, and all initial element values in the initial adjacency matrix are the same; assigning different numbers to the root node and to each target leaf node in the target syntax tree; in the case that a hierarchical relationship exists between the root node and at least one target leaf node, determining the position of the initial element value to be modified in the initial adjacency matrix according to the number assigned to the root node and the number assigned to that target leaf node; and/or, in the case that a hierarchical relationship exists between at least two target leaf nodes, determining the position of the initial element value to be modified in the initial adjacency matrix according to the numbers assigned to those target leaf nodes; and modifying the initial element value to be modified into a preset value to obtain the directed adjacency matrix.
According to an embodiment of the present disclosure, the directed adjacency matrix is established from the initial syntax tree: depth-first traversal and breadth-first traversal are performed on the marks of the useful nodes to find the nodes of interest in the abstract syntax tree, such as function types, function names and variable names. First, a depth-first traversal is performed from top to bottom, and the identifiers of the visited nodes are recorded; a second traversal is then performed over the branches of each node. A branch is determined by the node identifier, and the depth-first traversal is realized directly by passing down the mark value of the node during the top-down traversal; after one branch has been traversed, the next marked branch of the node is traversed, which realizes a breadth-first traversal of the node marks.
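For orientation, the two traversal orders can be sketched on a small hypothetical tree stored as a parent-to-children dictionary. The node labels and the functions `depth_first` and `breadth_first` are illustrative assumptions; the disclosure does not specify how the traversals are implemented.

```python
from collections import deque

def depth_first(tree, root):
    """Pre-order walk: follow one branch to the bottom before starting the next."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(tree.get(node, [])))
    return order

def breadth_first(tree, root):
    """Level-order walk: visit all nodes of one level before the next level."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    return order

tree = {"func": ["params", "body"], "params": ["name"], "body": ["var", "call"]}
dfs_order = depth_first(tree, "func")
bfs_order = breadth_first(tree, "func")
```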
Fig. 6 schematically illustrates a simplified syntax tree diagram according to an embodiment of the present disclosure.
Referring to fig. 6, the process of traversing a simplified syntax tree with depth-first and breadth-first traversal and creating a directed adjacency matrix may be as follows. Taking the simplified syntax tree shown in fig. 6 as an example, a number may first be created to represent each node. For example, the number 0 represents node A, the number 1 represents node B, and so on. These numbers index the rows and columns of a matrix called the adjacency matrix. The adjacency matrix is an n x n matrix, where n is the number of nodes; if there is an edge from node i to node j, the value of matrix element a[i][j] is 1; otherwise, the value of a[i][j] is 0. The syntax tree is traversed using depth-first traversal and the adjacency matrix is constructed in the following specific steps:
Starting from the root node A, its number is assigned 0.
All child nodes of A (B and C) are traversed, and their numbers are assigned 1 and 2, respectively.
The child node D of B is traversed, and its number is assigned 3.
The child nodes E and F of C are traversed, and their numbers are assigned 4 and 5, respectively.
A 6x6 adjacency matrix with an initial value of 0 can be created by the above steps. Then, for the node pair where an edge exists, the corresponding matrix element is set to 1. For the above example, the adjacency matrix may be as shown in equation (1).
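Equation (1) is not reproduced in this text. Assuming the directed convention in which the row is the parent and the column is the child, and the numbering A = 0 through F = 5, the five parent-child edges of the example (A to B, A to C, B to D, C to E, C to F) give the following reconstruction:

```latex
A =
\begin{pmatrix}
0 & 1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}
\tag{1}
```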
Here, the first row of the matrix corresponds to node A, the second row corresponds to node B, and so on. The value in the first row, second column is 1, indicating that there is an edge between node A and node B. The value in the first row, third column is 1, indicating that there is an edge between node A and node C. The value in the second row, fourth column is 1, indicating that there is an edge between node B and node D. The value in the third row, fifth column is 1, indicating that there is an edge between node C and node E. The value in the third row, sixth column is 1, indicating that there is an edge between node C and node F. An adjacency matrix is one way of representing a tree; for non-tree structures such as general graphs, constructing the adjacency matrix requires some additional processing, e.g., merging repeated edges.
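The construction above can be sketched in a few lines. The function name `build_directed_adjacency` and the dictionary form of the tree are assumptions; nodes are numbered level by level here because that order reproduces the numbers used in the example (A = 0, B = 1, C = 2, D = 3, E = 4, F = 5).

```python
from collections import deque

def build_directed_adjacency(tree, root):
    """Number nodes level by level from the root and fill a parent-to-child 0/1 matrix."""
    order, queue = [], deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(tree.get(node, []))
    index = {label: i for i, label in enumerate(order)}
    n = len(order)
    matrix = [[0] * n for _ in range(n)]   # n x n matrix, all elements initially 0
    for parent, children in tree.items():
        for child in children:
            matrix[index[parent]][index[child]] = 1   # edge parent -> child
    return index, matrix

tree = {"A": ["B", "C"], "B": ["D"], "C": ["E", "F"]}
index, matrix = build_directed_adjacency(tree, "A")
```

Only the parent-to-child direction is set, so the matrix is not symmetric; this is what makes it a directed adjacency matrix.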
According to the embodiment of the disclosure, by converting the initial grammar tree into the directed adjacency matrix, redundant nodes in the initial grammar tree can be eliminated, and only useful nodes are left, so that when data extraction is performed according to the adjacency matrix, the workload of program data extraction can be reduced, the extraction logic of the program data is simplified, and the data extraction efficiency is improved.
According to an embodiment of the present disclosure, operation S230 may include the following operations: traversing the directed adjacency matrix based on a preset traversing rule, and screening out target element values meeting preset conditions; tracing at least one target leaf node according to the position of the target element value in the directed adjacency matrix; based on the target leaf node, data associated with the target leaf node is extracted from the code base.
According to an embodiment of the present disclosure, based on the target leaf node, extracting data associated with the target leaf node from the code base may include the following operations: determining the node type of the target leaf node; and extracting data associated with the target leaf node from the code base directly according to the mapping file associated with the target leaf node under the condition that the node type of the target leaf node is a data declaration type node; searching a calling method of the target leaf node under the condition that the node type of the target leaf node is a calling method type node; analyzing a grammar tree constructed based on a calling method, and determining a mapping file associated with a target leaf node; data associated with the target leaf node is extracted from the code base according to the mapping file associated with the target leaf node.
According to an embodiment of the present disclosure, extracting data associated with a target leaf node from a code base according to a mapping file associated with the target leaf node may include the operations of: generating an initial mapping file list based on the mapping files associated with the target leaf nodes; removing repeated mapping files from the initial mapping file list to obtain a target mapping file list; and extracting data associated with the target leaf node from the code base according to the target mapping file list.
According to an embodiment of the present disclosure, based on the adjacency matrix obtained in operation S220, the nodes that call programs may be visited from left to right to further acquire subprograms; at the same time, a mapping (mapper) file may be acquired according to the program name, and the related table names may then be acquired. A mapper file is an XML file under a distributed architecture that stores the SQL statements for adding to, deleting from and modifying tables; the related database table names can be obtained from this file. The specific steps are as follows:
The obtained adjacency matrix is traversed according to a preset traversal rule, from top to bottom and from left to right, to obtain the elements a[i][j] whose value is 1, where 1 ≤ i, j ≤ 6.
After an element a[i][j] with a value of 1 is determined, the information of the target leaf node corresponding to the element a[i][j] is looked up.
If the identifier of the target leaf node is a declaration node identifier, it is judged whether the defined value of the target leaf node ends with Mapper; if so, the code base is scanned globally according to that value to acquire the corresponding mapper file, and the file path is stored; otherwise, the node is skipped.
If the node identifier is a calling method identifier, the program class name corresponding to the method is acquired, a syntax tree analysis is performed on that class, redundant nodes are eliminated and useful nodes are acquired according to the steps above, and any mapper file that may exist is then acquired and stored.
The obtained mapper files are merged, and duplicate mapper files are removed.
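The steps above can be sketched as follows for declaration nodes only. The node records, the identifier `var_decl`, and the dictionary `mapper_index` (standing in for a global scan of the code base) are all hypothetical; call nodes are skipped here because they require a further syntax tree analysis.

```python
def find_edges(matrix):
    """Scan rows top to bottom and columns left to right; yield (i, j) where the value is 1."""
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value == 1:
                yield i, j

def collect_mapper_paths(matrix, nodes, mapper_index):
    """Return de-duplicated mapper file paths for declaration nodes ending in 'Mapper'."""
    paths = []
    for _, j in find_edges(matrix):
        node = nodes[j]   # column j identifies the child (target leaf) node
        if node["id"] == "var_decl" and node["value"].endswith("Mapper"):
            path = mapper_index.get(node["value"])
            if path is not None:
                paths.append(path)
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(paths))

nodes = [
    {"id": "head", "value": ""},
    {"id": "var_decl", "value": "LoanMapper"},
    {"id": "var_decl", "value": "count"},
    {"id": "var_decl", "value": "LoanMapper"},
]
matrix = [[0, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
mapper_index = {"LoanMapper": "mapper/LoanMapper.xml"}
paths = collect_mapper_paths(matrix, nodes, mapper_index)
```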
FIG. 7 schematically illustrates a flow chart for obtaining relevant data table names from a mapping relationship of a subroutine to a mapper, according to an embodiment of the disclosure.
Referring to fig. 7, for the program name of a target leaf node: in the case that the node type is the class to which the program belongs, the reference package is called and the mapper file is acquired. In the case that the node type is a function call type, the method or class to which the function belongs is found, the reference package is called, and the mapper file is acquired. In the case that the node type is a declaration type, the method or class to which the declaration belongs is found, the reference package is called, and the mapper file is acquired. Finally, the obtained mapper files are merged, duplicate mapper files are removed, and the corresponding distributed platform database tables are found according to the deduplicated mapper files.
According to an embodiment of the present disclosure, the above operations may be understood as follows. In the case that the identifier of the target leaf node is the first identifier, the current target leaf node is at the minimum granularity: the node does not call a method or function, and it has no leaf nodes at a lower level. In this case, the mapper file may be obtained directly according to whether the program name of the target leaf node ends with Mapper, and the corresponding table name of the distributed platform database is found according to the mapper file. In the case that the identifier of the target leaf node is the second identifier, the current target leaf node is not at the minimum granularity: the node calls a method or function and may have leaf nodes at a lower level. In this case, a new syntax tree may be constructed with the target leaf node as its vertex, where the new syntax tree includes the method or function called by the target leaf node. The new syntax tree is processed by the method described above (building an adjacency matrix, eliminating useless redundant nodes, retaining useful nodes, etc.) to determine whether a new leaf node in the new syntax tree calls a method or function again. If not, the new leaf node is processed as in the case of the first identifier; if so, it is processed again as in the case of the second identifier, and so on, until leaf nodes of the minimum granularity that no longer call any method or function are found.
Finally, the associated mapper file is found according to the minimum-granularity leaf nodes that no longer call a method or function, and the table name of the corresponding distributed platform database is found according to the mapper file. The processing for the case in which the identifier of the target leaf node is the second identifier may be understood as cyclically executing the relevant steps in operations S210 to S220.
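The repeated processing of the second-identifier case can be sketched as a recursion. The class names, the `(identifier, value)` node pairs and the `seen` set that guards against circular calls are assumptions added for illustration; the disclosure does not describe how call cycles are handled.

```python
def resolve_mappers(class_name, classes, seen=None):
    """Collect mapper names reachable from a class, following calls recursively."""
    seen = set() if seen is None else seen
    if class_name in seen or class_name not in classes:
        return []
    seen.add(class_name)
    found = []
    for node_id, value in classes[class_name]:
        if node_id == "var_decl" and value.endswith("Mapper"):
            found.append(value)   # first identifier: minimum granularity
        elif node_id == "call_expr":
            # Second identifier: analyse the called class in the same way.
            found.extend(resolve_mappers(value, classes, seen))
    return list(dict.fromkeys(found))   # drop duplicates, keep order

# Hypothetical useful nodes per class, as (identifier, value) pairs.
classes = {
    "LoanClass": [("var_decl", "LoanMapper"), ("call_expr", "HisClass")],
    "HisClass": [("var_decl", "LoanMapper"), ("var_decl", "HisMapper"),
                 ("call_expr", "LoanClass")],
}
result = resolve_mappers("LoanClass", classes)
```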
Illustratively, suppose that the program Loanclass to which MDL1 belongs calls 3 mapper files, namely ABCD001.xml, ABCD002.xml and ABCD003.xml, and that the program Hisclass under the method corresponding to the function to which MDL1 belongs calls 2 mapper files, namely ABCD002.xml and ABCD004.xml. The list generated from these files is shown in Table 1. After the duplicate mapper file is removed, the final mapper files are ABCD001.xml, ABCD002.xml, ABCD003.xml and ABCD004.xml.
TABLE 1
The corresponding distributed platform database table is found according to the mapper file, which can be shown in table 2.
TABLE 2
MAPPER file     Content of MAPPER file
ABCD001.xml     update bcu_abe set...
ABCD002.xml     select * from bcu_abc
                insert into bcu_abc
                update bcu_abc set...
ABCD003.xml     select * from bcu_setor
ABCD004.xml     select * from bcu_manor
The database table names output by the above method may include at least one of: [MDL1, update, bcu_abe], [MDL1, query, bcu_abc], [MDL1, insert, bcu_abc], [MDL1, update, bcu_abc], [MDL1, query, bcu_monitor], [MDL1, query, bcu_manger].
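A minimal sketch of turning mapper SQL into such records is given below. It assumes the SQL statements have already been read out of the mapper XML as plain strings, and it uses simple regular expressions rather than a real SQL parser; `SQL_PATTERNS` and `table_records` are hypothetical names. The sample statements are the ABCD001.xml and ABCD002.xml rows of Table 2.

```python
import re

# (operation name, pattern capturing the table name); a select is reported as "query".
SQL_PATTERNS = [
    ("query", re.compile(r"^\s*select\b.*?\bfrom\s+(\w+)", re.I | re.S)),
    ("insert", re.compile(r"^\s*insert\s+into\s+(\w+)", re.I)),
    ("update", re.compile(r"^\s*update\s+(\w+)", re.I)),
]

def table_records(module, mapper_sql):
    """Return [module, operation, table] for each recognised SQL statement."""
    records = []
    for statements in mapper_sql.values():
        for sql in statements:
            for operation, pattern in SQL_PATTERNS:
                match = pattern.match(sql)
                if match:
                    records.append([module, operation, match.group(1)])
                    break
    return records

mapper_sql = {
    "ABCD001.xml": ["update bcu_abe set..."],
    "ABCD002.xml": ["select*from bcu_abc", "insert into bcu_abc", "update bcu_abc set..."],
}
records = table_records("MDL1", mapper_sql)
```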
According to an embodiment of the present disclosure, because the directed adjacency matrix represents the hierarchical relationships between the root node and the useful target leaf nodes, a syntax tree consisting of the root node and the useful target leaf nodes can be regenerated from the directed adjacency matrix as the target syntax tree. The target syntax tree diagram and the related database table names are output as the data extraction result. A detailed design flowchart is thus drawn automatically, which improves the efficiency of drawing the flowchart.
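Regenerating a tree from the directed adjacency matrix can be sketched as follows. The nested-dictionary output format and the function `matrix_to_tree` are assumptions, and the sketch presumes the matrix describes a tree (no cycles), as in the six-node example with nodes A to F.

```python
def matrix_to_tree(matrix, labels, root=0):
    """Rebuild a nested dict {label: subtree} from a parent-to-child 0/1 matrix."""
    def build(i):
        # Every column j with a 1 in row i is a child of node i.
        return {labels[j]: build(j) for j, value in enumerate(matrix[i]) if value == 1}
    return {labels[root]: build(root)}

labels = ["A", "B", "C", "D", "E", "F"]
matrix = [
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
target_tree = matrix_to_tree(matrix, labels)
```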
By the data extraction method provided by the embodiment of the disclosure, the whole process from code to data extraction and flow chart drawing can be automatically completed, and the problems of high time cost, low completion efficiency and high error rate caused by manual checking and drawing are at least partially solved.
Fig. 8 schematically illustrates a flow chart of a data extraction method according to another implementation of the present disclosure.
As shown in fig. 8, the data extraction method of this embodiment may include operations S810 to S840.
In operation S810, the code is parsed.
Specifically, a source code file is acquired and an initial abstract syntax tree is constructed for the corresponding program code. The node types of the initial abstract syntax tree include identifier nodes, type nodes, declaration nodes, function nodes, statement nodes and expression nodes.
In operation S820, the redundant nodes of the initial abstract syntax tree are eliminated.
Redundant nodes are nodes in the initial abstract syntax tree that do not express program meaning. Eliminating these program-irrelevant nodes improves the efficiency of extracting static information from the syntax tree. Specifically, a head node is added above the top nodes of the multiple trees obtained in operation S810 to obtain a single complete tree; a text traversal is then performed on the initial abstract syntax tree to identify useful nodes. A node is a useful node if its node identifier is var_decl, i.e., a variable declaration; stmt_expr, i.e., a statement expression; modify_expr, i.e., a modification expression; or call_expr, i.e., a function call. The child nodes of a useful node are also useful nodes, so the child nodes are traversed further.
In operation S830, the relevant database table names and fields are acquired according to the mapping relationship of the program names and the mapper.
According to the adjacency matrix obtained in operation S820, the nodes that call programs are visited from left to right to further acquire subprograms; at the same time, the mapping (mapper) file is acquired according to the program name, and the related table names are then acquired. A mapper file is an XML file under a distributed architecture that stores the SQL statements for adding to, deleting from and modifying tables; the related database table names can be obtained from this XML file.
In operation S840, a detailed design flowchart corresponding to the transaction correlation table and the subroutine is output.
According to the embodiment of the present disclosure, the contents of operations S810 to S840 may refer to the related contents of operations S210 to S230, and are not described herein.
Fig. 9 schematically illustrates a flow chart of eliminating redundant nodes of operation S820 according to an embodiment of the present disclosure.
As shown in fig. 9, operation S820 may further include operations S821 to S824.
In operation S821, an initial abstract syntax tree is acquired.
In operation S822, depth-first traversal and breadth-first traversal are performed on the syntax tree.
In operation S823, redundant nodes are eliminated, and a directed adjacency matrix is generated.
In operation S824, a target syntax tree is generated according to the directed adjacency matrix.
According to the embodiment of the present disclosure, the contents of operations S821 to S824 may refer to the relevant contents of operations S210 to S230, and will not be described herein.
The data extraction method provided by the embodiment of the disclosure reduces manual check and improves the data extraction efficiency and the drawing efficiency of the calling program flow chart; the method has wide applicability to various coding languages based on the abstract syntax tree, and meanwhile, the program flow chart and the table relation are output quickly and accurately, so that convenience is provided for scheme design.
It should be noted that, unless an execution order between different operations is explicitly stated or is required by the technical implementation, the operations in the embodiments of the present disclosure may be executed in a different order, and multiple operations may also be executed simultaneously.
Based on the data extraction method, the disclosure also provides a data extraction device. The device will be described in detail below in connection with fig. 10.
Fig. 10 schematically shows a block diagram of a data extraction apparatus according to an embodiment of the present disclosure.
As shown in fig. 10, the data extraction apparatus 1000 of this embodiment may include a conversion module 1010, a traversal module 1020, and an extraction module 1030.
The conversion module 1010 is configured to convert the obtained code file into an initial syntax tree, where the code file includes a plurality of code objects, and the initial syntax tree includes a root node and a leaf node for representing the code objects.
The traversing module 1020 is configured to traverse the root node and the leaf nodes in the initial syntax tree to obtain a directed adjacency matrix, where the directed adjacency matrix is used to characterize whether a hierarchical relationship exists between the root node and a target leaf node and between target leaf nodes, the directed adjacency matrix includes a plurality of element values obtained according to whether the hierarchical relationship exists, and a target leaf node is a leaf node with a preset node identifier.
The extraction module 1030 is configured to extract, from the code base, data associated with the target leaf node according to the target element values in the directed adjacency matrix that satisfy the preset condition, as a data extraction result, where the data extraction result further includes a target syntax tree generated according to the directed adjacency matrix.
According to the data extraction method, apparatus, device, storage medium and program product provided by the embodiments of the present disclosure, the obtained code file is converted into an initial syntax tree; the initial syntax tree is traversed to obtain a directed adjacency matrix; and data extraction is performed according to the target element values in the directed adjacency matrix that satisfy the preset condition, to obtain a data extraction result that includes the data and a target syntax tree generated according to the directed adjacency matrix. In the process of extracting program data, a syntax tree is constructed based on the code file and traversed to obtain the directed adjacency matrix, so that the nodes with preset identifiers are retained and the nodes without preset identifiers are removed. When data is extracted according to the adjacency matrix, the nodes without preset identifiers therefore need not be considered, which reduces the workload of program data extraction and simplifies its logic. The whole process can be executed automatically, with high extraction efficiency and a low error rate. The problem of low program data extraction efficiency in the related art is thus at least partially solved, and the technical effects of shortening the time needed to draw a call flowchart, improving drawing efficiency and reducing the drawing error rate are achieved.
According to embodiments of the present disclosure, the traversal module may include an adding sub-module, an identification sub-module and a traversal sub-module.
The adding sub-module is used for adding a target node for the root node of the initial syntax tree to obtain a complete syntax tree.
The identification sub-module is used for identifying root nodes and leaf nodes in the complete syntax tree based on a preset node identifier to obtain an identification result.
The traversal sub-module is used for traversing the complete syntax tree based on the identification result to obtain the directed adjacency matrix.
According to an embodiment of the present disclosure, the traversal sub-module may include a construction unit, an assigning unit, a first determining unit, a second determining unit and a modification unit.
The construction unit is used for constructing an initial adjacency matrix, wherein the size of the initial adjacency matrix is related to the total number of root nodes and leaf nodes, and all initial element values in the initial adjacency matrix are the same.
The assigning unit is used for assigning different numbers to the root node and to each target leaf node in the target syntax tree.
The first determining unit is used for determining the position of the initial element value to be modified in the initial adjacency matrix according to the number assigned to the root node and the number assigned to the target leaf node, in the case that a hierarchical relationship exists between the root node and at least one target leaf node.
The second determining unit is used for determining the position of the initial element value to be modified in the initial adjacency matrix according to the numbers assigned to the target leaf nodes, in the case that a hierarchical relationship exists between at least two target leaf nodes.
The modification unit is used for modifying the initial element value to be modified into a preset value to obtain the directed adjacency matrix.
According to embodiments of the present disclosure, the extraction module may include a screening sub-module, a tracing sub-module and an extraction sub-module.
The screening sub-module is used for traversing the directed adjacency matrix based on a preset traversal rule and screening out target element values that satisfy the preset condition.
The tracing sub-module is used for tracing at least one target leaf node according to the position of the target element value in the directed adjacency matrix.
The extraction sub-module is used for extracting data associated with the target leaf node from the code base based on the target leaf node.
According to an embodiment of the present disclosure, the extraction sub-module may include a third determining unit, a first extraction unit, a search unit, an analysis unit and a second extraction unit.
The third determining unit is used for determining the node type of the target leaf node.
The first extraction unit is used for extracting the data associated with the target leaf node from the code base directly according to the mapping file associated with the target leaf node, in the case that the node type of the target leaf node is a data declaration type node.
The search unit is used for searching for the calling method of the target leaf node, in the case that the node type of the target leaf node is a calling method type node.
The analysis unit is used for analyzing the syntax tree constructed based on the calling method and determining the mapping file associated with the target leaf node.
The second extraction unit is used for extracting the data associated with the target leaf node from the code base according to the mapping file associated with the target leaf node.
According to an embodiment of the present disclosure, the first extraction unit may include a first generation subunit, a first culling subunit and a first extraction subunit.
The first generation subunit is used for generating an initial mapping file list based on the mapping files associated with the target leaf node.
The first culling subunit is used for removing duplicate mapping files from the initial mapping file list to obtain a target mapping file list.
The first extraction subunit is used for extracting the data associated with the target leaf node from the code base according to the target mapping file list.
According to an embodiment of the present disclosure, the second extraction unit may include a second generation subunit, a second culling subunit and a second extraction subunit.
The second generation subunit is used for generating an initial mapping file list based on the mapping files associated with the target leaf node.
The second culling subunit is used for removing duplicate mapping files from the initial mapping file list to obtain a target mapping file list.
The second extraction subunit is used for extracting the data associated with the target leaf node from the code base according to the target mapping file list.
According to embodiments of the present disclosure, any number of the conversion module 1010, the traversal module 1020 and the extraction module 1030 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least some of the functionality of one or more of these modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the conversion module 1010, the traversal module 1020 and the extraction module 1030 may be implemented at least in part as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system in package or an application specific integrated circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of the three implementation manners of software, hardware and firmware, or in a suitable combination of them. Alternatively, at least one of the conversion module 1010, the traversal module 1020 and the extraction module 1030 may be at least partially implemented as a computer program module that, when executed, performs the corresponding functions.
It should be noted that, in the embodiments of the present disclosure, the data extraction device portion corresponds to the data extraction method portion in the embodiments of the present disclosure, and the description of the data extraction device portion specifically refers to the data extraction method portion and is not described herein.
Fig. 11 schematically illustrates a block diagram of an electronic device adapted to implement a data extraction method according to an embodiment of the disclosure.
As shown in fig. 11, an electronic device 1100 according to an embodiment of the present disclosure includes a processor 1101 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. The processor 1101 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 1101 may also include on-board memory for caching purposes. The processor 1101 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flow according to embodiments of the present disclosure.
In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 are stored. The processor 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. The processor 1101 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1102 and/or the RAM 1103. Note that the program may be stored in one or more memories other than the ROM 1102 and the RAM 1103. The processor 1101 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the disclosure, the electronic device 1100 may also include an input/output (I/O) interface 1105, the input/output (I/O) interface 1105 also being connected to the bus 1104. The electronic device 1100 may also include one or more of the following components connected to an input/output (I/O) interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output portion 1107 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 1108 including a hard disk or the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, and the like. The communication section 1109 performs communication processing via a network such as the internet. The drive 1110 is also connected to an input/output (I/O) interface 1105 as required. Removable media 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in drive 1110, so that a computer program read therefrom is installed as needed in storage section 1108.
The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read Only Memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 1102 and/or the RAM 1103 described above and/or one or more memories other than the ROM 1102 and the RAM 1103.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the methods shown in the flowcharts. The program code, when executed in a computer system, causes the computer system to implement the data extraction method provided by embodiments of the present disclosure.
The above-described functions defined in the system/apparatus of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1101. The systems, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed over a network medium in the form of signals, downloaded and installed via the communication section 1109, and/or installed from the removable media 1111. The program code contained in the computer program may be transmitted using any appropriate network medium, including but not limited to wireless media, wired media, or any suitable combination of the foregoing.
In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1109, and/or installed from the removable media 1111. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1101. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
According to embodiments of the present disclosure, program code for executing the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages. In particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. Where a remote computing device is involved, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in a variety of ways, even if such combinations or integrations are not explicitly recited in the disclosure. In particular, the features recited in the various embodiments of the present disclosure and/or the claims may be variously combined and/or integrated without departing from the spirit and teachings of the present disclosure. All such combinations and/or integrations fall within the scope of the present disclosure.
The embodiments of the present disclosure are described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims (11)

1. A data extraction method, comprising:
converting an acquired code file into an initial syntax tree, wherein the code file comprises a plurality of code objects, and the initial syntax tree comprises a root node and leaf nodes for representing the code objects;
traversing the root node and the leaf nodes in the initial syntax tree to obtain a directed adjacency matrix, wherein the directed adjacency matrix is used for representing whether a hierarchical relationship exists between the root node and a target leaf node and between target leaf nodes, the directed adjacency matrix comprises a plurality of element values obtained according to whether the hierarchical relationship exists, and a target leaf node is a leaf node having a preset node identifier; and
extracting, as a data extraction result, data associated with the target leaf node from a database according to target element values in the directed adjacency matrix that satisfy a preset condition, wherein the data extraction result further comprises a target syntax tree generated according to the directed adjacency matrix.
2. The method of claim 1, wherein traversing the root node and the leaf nodes in the initial syntax tree to obtain a directed adjacency matrix comprises:
adding a target node to the root node of the initial syntax tree to obtain a complete syntax tree;
identifying the root node and the leaf nodes in the complete syntax tree based on the preset node identifier to obtain an identification result; and
traversing the complete syntax tree based on the identification result to obtain the directed adjacency matrix.
3. The method of claim 2, wherein the identification result comprises the root node and the target leaf node;
wherein traversing the complete syntax tree based on the identification result to obtain the directed adjacency matrix comprises:
constructing an initial adjacency matrix, wherein the size of the initial adjacency matrix is related to the total number of the root node and the leaf nodes, and the initial element values in the initial adjacency matrix are all the same;
assigning different numbers to the root node and the target leaf node in the target syntax tree respectively;
determining the position of an initial element value to be modified in the initial adjacency matrix according to the number of the root node and the number of the target leaf node in a case where a hierarchical relationship exists between the root node and at least one target leaf node; and/or
determining the position of an initial element value to be modified in the initial adjacency matrix according to the numbers of the target leaf nodes in a case where a hierarchical relationship exists between at least two target leaf nodes; and
modifying the initial element value to be modified into a preset value to obtain the directed adjacency matrix.
4. The method of claim 3, wherein the extracting data associated with the target leaf node from a database according to target element values in the directed adjacency matrix satisfying a preset condition comprises:
traversing the directed adjacency matrix based on a preset traversal rule, and screening out the target element values that satisfy the preset condition;
tracing at least one target leaf node according to the position of the target element value in the directed adjacency matrix; and
extracting, based on the target leaf node, data associated with the target leaf node from the code base.
5. The method of claim 4, wherein the extracting data associated with the target leaf node from the code base based on the target leaf node comprises:
determining a node type of the target leaf node;
in a case where the node type of the target leaf node is a data declaration type node, extracting data associated with the target leaf node from the code base directly according to a mapping file associated with the target leaf node; and
in a case where the node type of the target leaf node is a calling method type node:
searching for a calling method of the target leaf node;
analyzing a syntax tree constructed based on the calling method, and determining a mapping file associated with the target leaf node; and
extracting data associated with the target leaf node from the code base according to the mapping file associated with the target leaf node.
6. The method of claim 1, wherein the extracting data associated with the target leaf node from the code base according to a mapping file associated with the target leaf node comprises:
generating an initial mapping file list based on the mapping files associated with the target leaf nodes;
removing repeated mapping files from the initial mapping file list to obtain a target mapping file list;
and extracting data associated with the target leaf node from the code base according to the target mapping file list.
7. The method of claim 1, wherein the node type of the leaf node comprises at least one of: identifier type node, data declaration type node, function call type node, statement type node, expression type node.
8. A data extraction apparatus, comprising:
a conversion module, configured to convert an acquired code file into an initial syntax tree, wherein the code file comprises a plurality of code objects, and the initial syntax tree comprises a root node and leaf nodes for representing the code objects;
a traversing module, configured to traverse the root node and the leaf nodes in the initial syntax tree to obtain a directed adjacency matrix, wherein the directed adjacency matrix is used for representing whether a hierarchical relationship exists between the root node and a target leaf node and between target leaf nodes, the directed adjacency matrix comprises a plurality of element values obtained according to whether the hierarchical relationship exists, and a target leaf node is a leaf node having a preset node identifier; and
an extraction module, configured to extract, as a data extraction result, data associated with the target leaf node from a database according to target element values in the directed adjacency matrix that satisfy a preset condition, wherein the data extraction result further comprises a target syntax tree generated according to the directed adjacency matrix.
9. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.
10. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-7.
11. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 7.
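As a rough illustration only, the steps recited in claims 1 to 4 and the de-duplication step of claim 6 might be sketched in Python as follows. This is not the implementation of the disclosure. The use of Python's standard `ast` module, the choice of `TARGET_TYPES` as a stand-in for the "preset node identifier", the element values 0 and 1, and all function names and mapping file names are assumptions introduced for the example.

```python
import ast

# Assumption: nodes of these AST classes stand in for "target leaf nodes",
# i.e. nodes carrying the preset node identifier.
TARGET_TYPES = (ast.FunctionDef, ast.Call, ast.Assign)


def build_directed_adjacency(source):
    """Parse source code, number the root and target nodes, and return
    (labels, matrix), where matrix[i][j] == 1 means node i is the nearest
    numbered ancestor of node j."""
    root = ast.parse(source)      # initial syntax tree
    labels = ["root"]             # number 0 is reserved for the root node
    edges = []                    # (parent number, child number) pairs

    def visit(node, parent_idx):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, TARGET_TYPES):
                idx = len(labels)  # assign a new, distinct number
                labels.append(f"{type(child).__name__}@{child.lineno}")
                edges.append((parent_idx, idx))
                visit(child, idx)
            else:
                # Non-target nodes are skipped; their descendants attach
                # to the nearest numbered ancestor.
                visit(child, parent_idx)

    visit(root, 0)

    n = len(labels)
    matrix = [[0] * n for _ in range(n)]  # identical initial element values
    for i, j in edges:
        matrix[i][j] = 1                  # preset value marks a relationship
    return labels, matrix


def trace_targets(labels, matrix, preset=1):
    """Scan the matrix row by row and return (parent, child) label pairs
    for every element equal to the preset value."""
    return [
        (labels[i], labels[j])
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
        if value == preset
    ]


def dedupe_mapping_files(files):
    """Remove repeated mapping files while keeping first-seen order."""
    return list(dict.fromkeys(files))


if __name__ == "__main__":
    sample = "def load():\n    rows = query('t_user')\n    return rows\n"
    labels, matrix = build_directed_adjacency(sample)
    for parent, child in trace_targets(labels, matrix):
        print(parent, "->", child)
    # Hypothetical mapping file names, used only to show de-duplication.
    print(dedupe_mapping_files(["user.xml", "order.xml", "user.xml"]))
```

In this sketch the matrix is sized by the number of numbered nodes rather than by every node in the tree, and a row-major scan plays the role of the preset traversal rule. Both are simplifications chosen for brevity.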
CN202310636435.4A 2023-05-31 2023-05-31 Data extraction method, device, equipment and storage medium Pending CN116661857A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310636435.4A CN116661857A (en) 2023-05-31 2023-05-31 Data extraction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310636435.4A CN116661857A (en) 2023-05-31 2023-05-31 Data extraction method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116661857A true CN116661857A (en) 2023-08-29

Family

ID=87718537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310636435.4A Pending CN116661857A (en) 2023-05-31 2023-05-31 Data extraction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116661857A (en)

Similar Documents

Publication Publication Date Title
US20230281012A1 (en) Systems and methods for automating and monitoring software development operations
CN113326247B (en) Cloud data migration method and device and electronic equipment
CN115599386A (en) Code generation method, device, equipment and storage medium
CN114281803A (en) Data migration method, device, equipment, medium and program product
CN116483888A (en) Program evaluation method and device, electronic equipment and computer readable storage medium
CN116166547A (en) Code change range analysis method, device, equipment and storage medium
CN113138767B (en) Code language conversion method, device, electronic equipment and storage medium
CN116661857A (en) Data extraction method, device, equipment and storage medium
CN113392311A (en) Field searching method, field searching device, electronic equipment and storage medium
CN113419740A (en) Program data stream analysis method and device, electronic device and readable storage medium
CN113032256A (en) Automatic test method, device, computer system and readable storage medium
CN111949259A (en) Risk decision configuration method, system, electronic equipment and storage medium
CN116382703B (en) Software package generation method, code development method and device, electronic equipment and medium
Habibi et al. Sharif-TaaWS: a tool to automate unit testing of web services
CN112860259B (en) Interface processing method, device, electronic equipment and storage medium
CN116680184A (en) Code scanning method, device, electronic equipment and medium
CN114841707A (en) Check account rule extraction method, device, equipment, storage medium and program product
CN115600578A (en) Data blood relationship analysis method, apparatus, device, medium, and program product
CN117785205A (en) Data evaluation method, device, electronic equipment and computer readable medium
CN114266547A (en) Method, device, equipment, medium and program product for identifying business processing strategy
CN114780517A (en) Data verification method, device, equipment and storage medium
CN115688687A (en) Data processing method, device, equipment and medium
CN117370177A (en) Project document normalization checking method and device
CN116450416A (en) Redundancy check method and device for software test cases, electronic equipment and medium
CN116610296A (en) Program call chain statistical method, apparatus, device, medium and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination