CN116431639A - Data optimization method, device, computer equipment and medium based on graphics - Google Patents

Data optimization method, device, computer equipment and medium based on graphics

Publication number
CN116431639A
Authority
CN
China
Prior art keywords
path
nodes
data
loop
call
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310413516.8A
Other languages
Chinese (zh)
Inventor
石娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202310413516.8A
Publication of CN116431639A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/2228: Indexing structures
    • G06F16/2246: Trees, e.g. B+trees
    • G06F16/2282: Tablespace storage structures; Management thereof
    • G06F16/24: Querying
    • G06F16/242: Query formulation
    • G06F16/2433: Query languages
    • G06F16/245: Query processing
    • G06F16/2455: Query execution
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application belongs to the technical field of data processing, is applied to insurance claims in financial technology, and relates to a data optimization method based on graphics. The method comprises: analyzing a data query script corresponding to a large database table to obtain a table blood edge relation diagram; identifying leaf table nodes in the table blood edge relation diagram, and acquiring all calling paths from the leaf table nodes to the root table nodes according to the table blood edge relation diagram; identifying a ring call loop in the call path, and disassembling the ring call loop to obtain a unidirectional table relation diagram; determining a longest calling path between the leaf table node and the root table node based on the unidirectional table relation diagram; and optimizing the longest calling path according to the target service to obtain an optimized table blood edge relation diagram. The application also provides a data optimization device, computer equipment and medium based on graphics. Furthermore, the application also relates to blockchain technology: the table blood edge relation diagram can be stored in a blockchain. The method and the device can simplify the calling link of the data table and reduce the complexity of the data processing link.

Description

Data optimization method, device, computer equipment and medium based on graphics
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data optimization method, apparatus, computer device, and medium based on graphics.
Background
The data architecture for big data generally comprises an operational data store layer (ODS, Operational Data Store), a data warehouse layer (DW, Data Warehouse), and an application data store layer (ADS, Application Data Store). Each layer comprises several subdivision levels, and each subdivision level in turn contains multiple pieces of data processing logic. As data requirements grow more complex, processing tends to suffer from "three excesses": too many processing levels, too many upstream tables relied on within the same level, and too many cyclic calls between tables. As a result, data reaches the downstream application end late and the user experience is poor.
In a large-scale big data cluster in particular, the number of service database tables is large, and a single end table may depend on thousands or even tens of thousands of upstream tables. This degrades query and read performance and lowers the processing efficiency of service data; meanwhile, the lack of clear calling relations between tables hinders service development and modification.
Disclosure of Invention
The embodiments of the application aim to provide a data optimization method, device, computer equipment and medium based on graphics, so as to solve the technical problems in the related art that, when the number of service database tables is large, calls between data tables are complex, data processing efficiency is low, and service development and modification are hindered.
In order to solve the above technical problems, the embodiments of the present application provide a data optimization method based on graphics, which adopts the following technical scheme:
acquiring a data query script corresponding to a large database table, and analyzing the data query script to obtain a table blood-edge relation diagram;
identifying leaf table nodes in the table blood edge relation diagram, and acquiring all calling paths between the leaf table nodes and root table nodes according to the table blood edge relation diagram;
identifying a ring call loop in the call path, and disassembling the ring call loop to obtain a unidirectional table relation diagram;
determining a longest call path between the leaf table node and the root table node based on the unidirectional table relation diagram;
and optimizing the longest calling path according to the target service to obtain an optimized table blood edge relation diagram.
Further, the step of analyzing the data query script to obtain the table blood edge relationship graph includes:
performing syntax analysis on the data query script to obtain an abstract syntax tree;
extracting data table information according to the abstract syntax tree;
determining a table node based on the data table information;
determining the hierarchical relation among the table nodes according to the hierarchical relation of the sentences corresponding to the data table information in the abstract syntax tree;
and obtaining a table blood edge relation graph based on the hierarchical relation among the table nodes.
Further, the step of obtaining all call paths from the leaf table node to the root table node according to the table blood edge relation graph includes:
obtaining the directed edges of the leaf table nodes according to the table blood edge relation diagram;
and traversing upstream from the leaf table node along the directed edges, stopping at the root table node, to obtain the calling paths.
Further, the step of identifying a loop call loop in the call path includes:
extracting all table nodes on the calling path;
de-duplicating the table nodes to obtain the number of path nodes;
obtaining the calling path length according to the calling path;
and determining a loop call loop based on the number of path nodes and the call path length.
Further, the step of determining a loop call loop based on the number of path nodes and the call path length includes:
comparing the number of path nodes with the call path length;
and when the length of the calling path is greater than or equal to the number of the path nodes, the calling path is a ring-shaped calling path.
Further, the step of disassembling the ring call loop includes:
determining a minimum loop in the loop call loop;
and disassembling the minimum loop based on the directed edges in the minimum loop and a preset disassembly rule.
Further, the step of determining the smallest loop in the loop call loop includes:
acquiring the ring table nodes on the ring call path to form a node set;
traversing the ring table nodes in the node set along the directed edges to obtain ring paths whose start node and end node are the same;
obtaining, from each ring path, the de-duplicated number of ring nodes and the ring path length;
comparing the ring path length with the number of ring nodes;
when the ring path length equals the number of ring nodes, the ring path is a minimum loop.
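The minimum-loop steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the edge list `EDGES` is an assumption inferred from the passing-edge column of Table 1 in the detailed description (edge number mapped to caller and callee), and the function name `minimum_loops` is hypothetical.

```python
# Assumed edge list (edge id -> (caller, callee)), inferred from Table 1.
EDGES = {1: ("A", "E"), 2: ("A", "B"), 3: ("C", "A"), 4: ("B", "C"),
         5: ("E", "E"), 6: ("E", "D"), 7: ("B", "D"), 8: ("D", "C")}


def minimum_loops(edges):
    """Return {set of edge ids: node string} for every minimum loop."""
    out = {}
    for eid, (src, dst) in edges.items():
        out.setdefault(src, []).append((eid, dst))
    found = {}

    def walk(start, node, nodes, used):
        for eid, nxt in out.get(node, []):
            if eid in used:              # a directed edge is never walked twice
                continue
            if nxt == start:             # ring path: start node == end node
                ring_length = len(used) + 1
                ring_nodes = len(set(nodes))   # de-duplicated node count
                if ring_length == ring_nodes:  # equal => minimum loop
                    key = frozenset(used + [eid])
                    found.setdefault(key, "".join(nodes + [start]))
            else:
                walk(start, nxt, nodes + [nxt], used + [eid])

    for start in list(out):
        walk(start, start, [start], [])
    return found


loops = minimum_loops(EDGES)
for ring in sorted(loops.values()):
    print(ring)
```

Loops found from different start nodes are merged by their edge set, so each ring is reported once. A ring path that revisits a node, such as one passing through the self-call on E, has more edges than distinct nodes and is rejected.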
In order to solve the above technical problems, the embodiments of the present application further provide a data optimization device based on graphics, which adopts the following technical scheme:
the analysis module is used for acquiring a data query script corresponding to the large database table and analyzing the data query script to obtain a table blood edge relation diagram;
the identifying module is used for identifying the leaf table nodes in the table blood edge relation diagram and acquiring all calling paths from the leaf table nodes to the root table nodes according to the table blood edge relation diagram;
the disassembly module is used for identifying a ring call loop in the call path and disassembling the ring call loop to obtain a unidirectional table relation diagram;
an obtaining module, configured to determine a longest call path between the leaf table node and the root table node based on the unidirectional table relationship graph;
and the optimizing module is used for optimizing the longest calling path according to the target service to obtain a table optimization blood-margin relation diagram.
In order to solve the above technical problems, the embodiments of the present application further provide a computer device, which adopts the following technical schemes:
the computer device comprises a memory and a processor, the memory storing computer readable instructions which, when executed by the processor, implement the steps of the graphics-based data optimization method described above.
In order to solve the above technical problems, embodiments of the present application further provide a computer readable storage medium, which adopts the following technical solutions:
the computer readable storage medium has stored thereon computer readable instructions which when executed by a processor implement the steps of the graphically-based data optimization method as described above.
Compared with the prior art, the embodiment of the application has the following main beneficial effects:
The method obtains a data query script corresponding to a large database table and analyzes it to obtain a table blood edge relation diagram; identifies leaf table nodes in the diagram and acquires all calling paths from the leaf table nodes to the root table nodes; identifies a ring call loop in the call paths and disassembles it to obtain a unidirectional table relation diagram; determines the longest calling path between the leaf table node and the root table node based on the unidirectional table relation diagram; and optimizes the longest calling path according to the target service to obtain an optimized table blood edge relation diagram. By obtaining the call paths among data tables from the table blood edge relation diagram of the large database and identifying and disassembling the ring call loops in those paths, the application simplifies the call links of the data tables, reduces the complexity of the data processing links, and improves the efficiency of data acquisition and processing. Clear table call links also make it easier to locate and analyze problems and facilitate service development and modification.
Drawings
For a clearer description of the solution in the present application, a brief description will be given below of the drawings that are needed in the description of the embodiments of the present application, it being obvious that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained from these drawings without inventive effort for a person of ordinary skill in the art.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a graph-based data optimization method according to the present application;
FIG. 3 is a table blood relationship graph of one particular embodiment of a graph-based data optimization method according to the present application;
FIG. 4 is a diagram of a one-way table relationship for one particular embodiment of a graph-based data optimization method according to the present application;
FIG. 5 is a schematic diagram of one embodiment of a graphics-based data optimization device according to the present application;
FIG. 6 is a schematic structural diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the applications herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "comprising" and "having" and any variations thereof in the description and claims of the present application and in the description of the figures above are intended to cover non-exclusive inclusions. The terms first, second and the like in the description and in the claims or in the above-described figures, are used for distinguishing between different objects and not necessarily for describing a sequential or chronological order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
In order to better understand the technical solutions of the present application, the following description will clearly and completely describe the technical solutions in the embodiments of the present application with reference to the accompanying drawings.
The present application provides a data optimization method based on graphics, which can be applied to a system architecture 100 shown in fig. 1, where the system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as a web browser application, a shopping class application, a search class application, an instant messaging tool, a mailbox client, social platform software, etc., may be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablet computers, electronic book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that, the data optimization method based on the graphics provided in the embodiments of the present application is generally executed by a server/terminal device, and accordingly, the data optimization device based on the graphics is generally set in the server/terminal device.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flowchart of one embodiment of a graphically-based data optimization method according to the present application is shown, including the steps of:
Step S201, obtaining a data query script corresponding to the large database table, and analyzing the data query script to obtain a table blood edge relation diagram.
The data architecture design of big data generally comprises a data operation layer ODS, a data warehouse layer DW and a data application layer ADS. Each layer comprises different subdivision levels, and each subdivision level in turn contains multiple pieces of data processing logic. The data in the large database is stored in the form of tables, and calling relationships exist between the data of different tables, that is, a data blood relationship (data lineage) exists. Data lineage is a data management concept: it refers to the relationships between related data that are found when tracing where data comes from.
The large database has corresponding data query scripts, and analyzing them yields the blood-edge relation diagram between the tables in the large database, i.e., the table blood edge relation diagram.
The data query script includes an SQL (Structured Query Language ) script, where the SQL script may include program instructions for data access, query, update, and management operations.
In this embodiment, static analysis or dynamic analysis may be performed on the data query script, and a table blood-edge relationship graph may be obtained after analysis.
The table blood edge relation diagram comprises table nodes and directed edges. A table node is the node corresponding to a source table or a target table in the large database. A directed edge is a directed connecting line from the table node of a target table to the table node of its source table and represents a calling relation; for example, if A and B denote table nodes, A→B indicates that table A calls table B.
It should be noted that the source table and the target table are relative concepts, and one data table may be the source table or the target table at the same time.
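As a rough illustration of how a table blood edge relation diagram might be derived from a script by static analysis, the sketch below uses regular expressions as a simplified stand-in for the syntax-tree parsing described in this application. The table names, the sample script, and the helper `build_lineage` are all hypothetical, and the patterns only cover simple `INSERT ... SELECT` statements.

```python
import re
from collections import defaultdict

# Simplified patterns: the target table of an INSERT, and the tables read
# through FROM / JOIN. A real parser would walk an abstract syntax tree.
TARGET_RE = re.compile(
    r"\binsert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", re.I)
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.I)


def build_lineage(script):
    """Map each target table to the set of source tables it calls."""
    graph = defaultdict(set)
    for statement in script.split(";"):
        target = TARGET_RE.search(statement)
        if target is None:
            continue
        for source in SOURCE_RE.findall(statement):
            graph[target.group(1)].add(source)   # directed edge target -> source
    return graph


SCRIPT = """
INSERT INTO dw.claim_detail
SELECT * FROM ods.claim c JOIN ods.policy p ON c.pid = p.id;
INSERT OVERWRITE TABLE ads.claim_report
SELECT * FROM dw.claim_detail;
"""

graph = build_lineage(SCRIPT)
for target, sources in graph.items():
    print(target, "->", sorted(sources))
```

Here `dw.claim_detail` is a target table in the first statement and a source table in the second, which illustrates that source and target are relative roles.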
It should be emphasized that to further ensure the privacy and security of the table blood relationship graph, the table blood relationship graph may also be stored in a node of a blockchain.
The blockchain referred to in the application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another by cryptographic methods, each data block containing information on a batch of network transactions, used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product services layer, an application services layer, and the like.
Step S202, identifying leaf table nodes in the table blood edge relation diagram, and acquiring all calling paths from the leaf table nodes to the root table nodes according to the table blood edge relation diagram.
In this embodiment, a table close to the source-layer data in the large database is called a root node; a root node has no upstream and forms the first level of the big data processing link. A table at the end of big data processing is called a leaf node; a leaf node has no downstream and is closest to the application end.
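Under these definitions, leaf and root table nodes can be read directly off the call relations. The following is a minimal sketch; the `calls` mapping (each table mapped to the upstream tables it calls) and the table names are hypothetical.

```python
def leaf_and_root_tables(calls):
    """Split tables into leaves (no downstream) and roots (no upstream)."""
    callers = set(calls)
    callees = {table for ups in calls.values() for table in ups}
    all_tables = callers | callees
    leaves = all_tables - callees                       # called by no other table
    roots = {t for t in all_tables if not calls.get(t)}  # call no upstream table
    return leaves, roots


# Hypothetical call relations: table -> upstream tables it calls.
calls = {
    "ads.report": ["dw.detail"],
    "dw.detail": ["ods.claim", "ods.policy"],
}
leaves, roots = leaf_and_root_tables(calls)
print(sorted(leaves), sorted(roots))
```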
The leaf table nodes in the table blood edge relation diagram are determined, and the directed edges of each leaf table node are obtained from the diagram; the diagram is then traversed upstream from the leaf table node along the directed edges, stopping at the root table node, to obtain the calling paths.
Following the calling relation expressed by the directed edges of the leaf table node, the search proceeds in the direction of the arrows of the directed edges to the table nodes of the next upper level, and stops when a root table node is reached. Note that an edge that has already been walked cannot be traversed again during the search.
There are multiple paths between the root table node and leaf table node, each path being a call path.
According to the embodiment, the directed edges of the table blood edge relation graph traverse from the leaf table nodes to the root table nodes, so that the acquisition efficiency and accuracy of a calling path can be ensured.
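The traversal can be sketched as a depth-first search that never reuses a directed edge. This is an illustrative sketch only: the edge list is an assumption inferred from the passing-edge column of Table 1 (edge number mapped to caller and callee), the search is restricted to the sub-path between tables A and D as in that table, and `all_call_paths` is a hypothetical helper name.

```python
# Assumed edge list (edge id -> (caller, callee)), inferred from Table 1.
EDGES = {1: ("A", "E"), 2: ("A", "B"), 3: ("C", "A"), 4: ("B", "C"),
         5: ("E", "E"), 6: ("E", "D"), 7: ("B", "D"), 8: ("D", "C")}


def all_call_paths(edges, start, end):
    """List every path from start to end that walks no directed edge twice."""
    out = {}
    for eid, (src, dst) in edges.items():
        out.setdefault(src, []).append((eid, dst))
    paths = []

    def walk(node, nodes, used):
        if node == end and used:
            paths.append(("".join(nodes), used))   # record, then keep exploring
        for eid, nxt in out.get(node, []):
            if eid not in used:                    # walked edges are not reused
                walk(nxt, nodes + [nxt], used + [eid])

    walk(start, [start], [])
    return paths


paths = all_call_paths(EDGES, "A", "D")
for nodes, used in paths:
    print(nodes, used)
```

A path is recorded each time it arrives at the end node, and the search continues past that node as long as unused edges remain, which is why the same end node can appear more than once in a recorded path.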
Step S203, a ring call loop in the call path is identified, and the ring call loop is disassembled to obtain a unidirectional table relation diagram.
Because large database table processing is not constrained by primary keys, foreign keys, or indexes, a table can be read while it is being written. Different tables can also call one another cyclically, which produces a loop, referred to as a ring, in the table blood edge relation diagram. In actual processing a large number of ring calls exist: some arise because the processing logic for the fields of a large wide table involves many associated calls, some because different data partitions within a table call each other, and some because of data model design errors, special data processing requirements, and the like. Ring calls increase the complexity of the data model and the actual scheduling risk. Moreover, the processing complexity of the upstream link of a leaf table node directly affects output timeliness and the data results, which degrades the experience of users at the application end.
In this embodiment, a ring call loop in the call path is identified and disassembled, so that the complexity of the call link can be reduced, and the data model is simplified.
In some alternative implementations, the step of identifying the loop call loop in the call path includes:
Extracting all table nodes on the calling path;
removing the duplication of the table nodes to obtain the number of path nodes;
obtaining the length of a calling path according to the calling path;
a loop call loop is determined based on the number of path nodes and the call path length.
There are multiple call paths between the root table node and the leaf table node, and assuming that the path length of a certain call path is M (i.e. the number of passing directed edges and also represents the processing level), the number of path nodes after the duplication removal of all table nodes (including the root table node and the leaf table node) passing through the path is N (i.e. the number of physical tables).
The relationship between M and N exists in two cases:
1) M = N-1: the call path is acyclic, i.e., the N table nodes are traversed using only N-1 directed edges;
2) M is greater than or equal to N, which indicates that a ring call loop exists between the table nodes on the call path.
In the present embodiment, the number N of path nodes and the call path length M are compared; when the calling path length M is greater than or equal to the number N of the path nodes, the calling path is a ring calling path.
In some embodiments, obtaining the call paths between a leaf table node and a root table node and identifying the ring call loops in them is equivalent to first solving for the table nodes $z$ passed by directed edges between leaf table node $x$ and root table node $y$, where $z = f(x, y)$ and $z \in \{z_1, z_2, \ldots, z_n\}$, and then solving for the table nodes $u$ passed by directed edges that take the same table node $z_i$ as both start point and end point, where $u = f(z_i, z_i)$ and $u \in \{u_1, u_2, \ldots, u_m\}$. If $u$ is empty, the upstream link of leaf table node $x$ is unidirectional and loop-free; if $u$ is not empty, $u$ is the minimal set of table nodes that can form a ring call loop.
The identified ring call loop is then disassembled to obtain a ring-free unidirectional table relation diagram, making the call link unidirectional and clear.
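One way to read the node set u is as the set of table nodes that can reach themselves again along directed edges. The sketch below takes that reading as an assumption; the `CALLS` mapping is inferred from Table 1 with the leaf and root table nodes attached to A and D, and `ring_nodes` is a hypothetical helper name.

```python
# Assumed call relations (table -> tables it calls), inferred from Table 1,
# with the leaf node calling A and D calling the root node.
CALLS = {
    "leaf": ["A"],
    "A": ["E", "B"],
    "B": ["C", "D"],
    "C": ["A"],
    "D": ["C", "root"],
    "E": ["E", "D"],
}


def ring_nodes(calls):
    """Return the table nodes that are both start and end of some directed path."""
    def reachable(start):
        seen, stack = set(), list(calls.get(start, []))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(calls.get(node, []))
        return seen

    return {node for node in calls if node in reachable(node)}


u = ring_nodes(CALLS)
print(sorted(u))
```

An empty result would mean the upstream link of the leaf table node is loop-free; here every table between the leaf and root nodes lies on some ring.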
For example, referring to fig. 3, the leaf table node calls only one upstream table A, and the root table node has only one downstream table D, so analyzing the call relationship between the root table node and the leaf table node can be simplified to analyzing the call paths between table A and table D. The analysis of the call paths between A and D is shown in Table 1.
TABLE 1
M  N  Passing nodes (in order)  Passing edges (in order)  Comparison  Conclusion
2  3  ABD                       27                        M = N-1     No loop
2  3  AED                       16                        M = N-1     No loop
3  3  AEED                      156                       M = N       Single-point ring
5  5  ABCAED                    24316                     M = N       Single-point ring
6  5  AEDCABD                   168327                    M > N       Multi-point ring
6  5  ABCAEED                   243156                    M > N       Multi-point ring
6  5  ABDCAED                   278316                    M > N       Multi-point ring
7  5  AEEDCABD                  1568327                   M > N       Multi-point ring
7  5  ABDCAEED                  2783156                   M > N       Multi-point ring
As can be seen from the above table, 7 of the call paths contain ring call loops.
The annular calling loop is determined through the number of path nodes and the calling path length, and the method is simple, convenient and quick.
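The comparison of M and N can be reproduced in a few lines. In this sketch the node strings are copied from the passing-nodes column of Table 1; the function name `classify` is hypothetical.

```python
# Passing-node sequences copied from Table 1.
PATHS = ["ABD", "AED", "AEED", "ABCAED", "AEDCABD",
         "ABCAEED", "ABDCAED", "AEEDCABD", "ABDCAEED"]


def classify(path):
    """Return (M, N, has_ring) for one call path given as a node sequence."""
    m = len(path) - 1      # call path length: number of directed edges walked
    n = len(set(path))     # number of path nodes after de-duplication
    return m, n, m >= n    # M >= N means the path contains a ring call loop


results = {path: classify(path) for path in PATHS}
ring_count = sum(has_ring for _, _, has_ring in results.values())
for path, (m, n, has_ring) in results.items():
    print(path, m, n, "ring" if has_ring else "no loop")
```

Only the de-duplicated node count and the edge count are needed, so the check runs in time linear in the path length and does not require the full graph.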
In some alternative embodiments, the step of disassembling the ring call loop includes:
Determining a minimum loop in the loop call loop;
and disassembling the minimum loop based on the directed edges in the minimum loop and a preset disassembly rule.
A minimum loop is a ring in which, starting from any table node in the ring, one can return to that node, and the directed edges passed are the same regardless of the starting node; for example, a single-point ring is necessarily a minimum loop.
As the above example shows, 7 of the call paths contain ring call loops. Although no directed edge is walked twice, these loops force a detour on the way from the leaf table node to the root table node, which reduces the efficiency of data query and acquisition.
The number of minimum loops indicates the complexity of data calls, and this embodiment disassembles a ring call loop by disassembling its minimum loops. Disassembly is performed according to the directed edges in the minimum loop, i.e., the calling relations, and a preset disassembly rule.
The preset disassembly rule is set according to the actual service logic so that data accuracy is preserved and service availability is ensured. For example, according to the service logic, a table is split by table field into several tables, and the split tables are placed at the corresponding calling positions in the calling path.
By way of example, referring to fig. 4, fig. 4 illustrates one way of disassembling a ring call loop. In order to simultaneously meet the condition that the processing contents of the upstream table and the downstream table of the table C and the table E are unchanged, the table C is disassembled into a table C1 and a table C2, the table E is disassembled into a table E1 and a table E2, and the tables are placed on different processing levels to obtain a unidirectional table relation diagram; in this case, although the number of tables is large in the unidirectional table relation diagram, the links of the call path are unidirectional and clear.
With the ring call loops present, the longest path for table nodes A and D has M = 9, and there are two such paths, "leaf AEEDCABD root" and "leaf ABDCAEED root"; in the ring-free unidirectional table relation diagram, the longest path for A and D has M = 7 and is unique, "leaf C1AE1E2DC2 root". Clearly, the latter has clearer data levels and a smoother flow.
It should be understood that the above-mentioned disassembly method is only one possibility, and multiple disassembly methods can exist in practice, and no optimal or worst disassembly method exists, so long as no loop exists, the traversing complexity of the whole path can be simplified.
The embodiment can disassemble the annular call loop by disassembling the minimum loop, so that the disassembly can be realized more efficiently and accurately, and the disassembled table can be ensured to be used in the service.
Step S204, determining the longest calling path between the leaf table node and the root table node based on the unidirectional table relation diagram.
Based on the loop-free unidirectional table relation diagram, the longest call path between the leaf table node and the root table node is obtained. The longest-call-path index measures the complexity of the data model: a high value means that the data has more levels and is more complex, so shortening the longest call path further simplifies data calls and the data processing model.
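Step S204 can be sketched as a memoised depth-first search over the loop-free graph. The table names, the `UPSTREAM` mapping, and the function name `longest_path` below are hypothetical; the length m is counted in directed edges, which matches the way the specification equates the ring path length with the number of directed edges.

```python
from functools import lru_cache
from typing import Dict, Tuple

# Assumed encoding: each table maps to the upstream tables reached by its directed edges.
UPSTREAM: Dict[str, Tuple[str, ...]] = {
    "leaf": ("C1", "B"),
    "C1": ("A",),
    "B": ("D",),
    "A": ("D",),
    "D": ("root",),
    "root": (),
}


@lru_cache(maxsize=None)
def longest_path(node: str, root: str = "root") -> Tuple[str, ...]:
    """Longest call path from `node` to `root`; empty tuple if root is unreachable.

    Valid only on a loop-free graph: the recursion would not terminate on a
    ring call loop, which is why the loops are disassembled first.
    """
    if node == root:
        return (root,)
    best: Tuple[str, ...] = ()
    for nxt in UPSTREAM.get(node, ()):
        tail = longest_path(nxt, root)
        if tail and len(tail) + 1 > len(best):
            best = (node,) + tail
    return best


path = longest_path("leaf")
m = len(path) - 1  # path length counted in directed edges
```

Because every sub-result is cached, each table is expanded once, so the cost grows with the number of edges rather than with the number of distinct paths.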
Step S205, optimizing the longest calling path according to the target service to obtain a table-optimized blood-edge relationship graph.
The longest call path is evaluated comprehensively and optimized according to the data query script logic, the data characteristics of the associated tables, the data update time, and the service requirements. The resulting table-optimized blood-edge relationship graph varies with the requirements.
In some optional implementations, after the step of obtaining all call paths between the leaf table node and the root table node according to the table blood edge relationship graph, the method further includes:
when it is identified that no ring call loop exists in the call paths, determining the longest call path between the leaf table node and the root table node based on the table blood-edge relationship graph;
and optimizing the longest call path according to the target service to obtain a table-optimized blood-edge relationship graph.
When the table blood-edge relationship graph contains no ring call loop, all call paths in it are unidirectional and clear, and the graph is optimized directly according to the service.
After optimization, the number of minimum loops between the leaf table node and the root table node is compared with the number before optimization to judge whether the complexity of the model has been reduced.
In this application, the call paths among the data tables are obtained from the table blood-edge relationship graph of the large database's tables, and the ring call loops in the call paths are identified and disassembled. This simplifies the call links of the data tables and improves data acquisition and processing efficiency; clear table call links also make it easier to locate and analyze problems and to develop and modify services. In addition, the blood-edge relationships of the data tables are expressed graphically, and by identifying the number of minimum loops in the paths upstream of a leaf table node and calculating the longest call path corresponding to that leaf table node, the complexity of the data model can be measured and a reference basis can be provided for optimizing the model.
In some optional implementations, the step of parsing the data query script to obtain the table blood-edge relationship graph includes:
carrying out grammar analysis on the data query script to obtain an abstract grammar tree;
extracting data table information according to the abstract syntax tree;
determining table nodes based on the data table information;
determining the hierarchical relation among the table nodes according to the hierarchical relation of the sentences corresponding to the data table information in the abstract syntax tree;
and obtaining a table blood edge relation graph based on the hierarchical relation among the table nodes.
The abstract syntax tree is an abstract representation of the syntactic and lexical information of the data query script. The data query script is split into several grammar units, and the grammar units are parsed to determine the grammar nodes of the abstract syntax tree and the hierarchical relations among those nodes. Data table information is then extracted from the grammar nodes. The data table information comprises table information and field information: the table information identifies a table and may be the table name of the data table, and the field information identifies a field and may be the field name.
Each piece of table information and field information may correspond to a table node in the table blood-edge relationship graph, and the hierarchical relations among the table nodes in the graph are determined according to the hierarchical relations of the table information and the field information.
In this embodiment, the data query script may be statically parsed or dynamically parsed.
For example, for static parsing, SQL script files may be obtained from a git/svn code repository, and Python (sqlparse) or Java code may be written to format and split the SQL code in the scripts. The tables that follow the keywords insert / with as / create as / from / join are identified, and the source and/or target tables are thereby extracted.
In addition, the table blood-edge relationship graph can be obtained by parsing dynamically executed scripts, that is, the SQL script is obtained and parsed while the query is executing or after execution has finished. A hook (LineageLogger) is set before the Hive script is executed, so that the blood-edge information of the query is written to a designated log file; parsing that log file yields the exact tables and the call relations among them.
It should be understood that there are other ways to parse the data query script besides the two above, and that the above methods may be combined to obtain a more accurate blood-edge relationship.
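The keyword-based static extraction can be sketched as follows. The specification mentions sqlparse; to stay dependency-free, this sketch uses only Python's standard `re` module, which is a substitution and not the method of the source. The regular expressions, the function name `extract_lineage`, and the sample table names are assumptions. The sketch handles only simple statements: semicolons inside string literals and constructs such as `extract(year from dt)` would mislead it.

```python
import re
from typing import Dict, Set

# Keywords after which a target table name is expected (simplified).
TARGET_RE = re.compile(
    r"\b(?:insert\s+(?:into|overwrite)\s+(?:table\s+)?"
    r"|create\s+table\s+(?:if\s+not\s+exists\s+)?)([\w.]+)",
    re.IGNORECASE,
)
# Keywords after which a source table name is expected (simplified).
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)
# Names introduced by "with x as (" are temporary result sets, not physical tables.
CTE_RE = re.compile(r"(?:\bwith\s+|,\s*)(\w+)\s+as\s*\(", re.IGNORECASE)


def extract_lineage(sql: str) -> Dict[str, Set[str]]:
    """Map each target table of a script to the source tables it reads from."""
    sql = re.sub(r"--[^\n]*", " ", sql)  # drop single-line comments
    lineage: Dict[str, Set[str]] = {}
    for statement in sql.split(";"):
        targets = {t.lower() for t in TARGET_RE.findall(statement)}
        ctes = {c.lower() for c in CTE_RE.findall(statement)}
        sources = {s.lower() for s in SOURCE_RE.findall(statement)} - ctes - targets
        for target in targets:
            lineage.setdefault(target, set()).update(sources)
    return lineage


script = """
insert overwrite table dm.report
select p.id, c.amount
from ods.policy p
join ods.claim c on p.id = c.policy_id;

create table dm.summary as
with recent as (select * from dm.report)
select * from recent;
"""
lineage = extract_lineage(script)
```

Each entry of `lineage` corresponds to a set of directed edges between table nodes, from which a table blood-edge relationship graph could be assembled.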
According to the method and the device, the table blood edge relation graph is obtained by analyzing the data query script, and the accuracy of the obtained table blood edge relation can be ensured.
In some alternative implementations, the step of determining the smallest loop in the loop call loop includes:
Acquiring ring table nodes on a ring call path to form a node set;
traversing the ring table nodes in the node set according to the directed edges to obtain ring paths whose start node and end node are the same;
obtaining, from the ring path, the number of ring nodes after de-duplication and the ring path length;
comparing the ring path length with the number of ring nodes;
when the ring path length and the number of ring nodes are equal, the ring path is the minimum loop.
The number of minimum loops may be used to measure the complexity of the data processing link; in this embodiment it may be determined by algorithms and scripts provided by the large database.
Taking the above example for a detailed description, the steps are as follows:
1) Extracting all table nodes in the AD path to form a node set;
2) Sequentially traversing the table nodes in the node set and setting the current node n as the start node; if a path p that starts from node n and returns to it can be found, and the de-duplicated number of nodes N on the path equals the number of directed edges M (namely, the ring path length), then p is a minimum loop; if no such path is found, none of the edges through node n is part of a ring call path.
3) Obtaining all table nodes on loop p, namely the points forming the loop, and returning them as a node set, which is the minimum loop.
The resulting minimum loop is shown in table 2.
TABLE 2
[Table 2 appears only as images in the original publication (Figures BDA0004186223780000131 and BDA0004186223780000141) and is not reproduced here.]
The minimum loops in the loop call loops are determined, so that the number of the minimum loops is obtained, the complexity of the data model can be measured, and meanwhile, a reference basis can be provided for the optimization model.
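The N = M criterion described above can be sketched as follows. The graph encoding, the function names `is_minimum_loop` and `minimum_loops`, and the sample tables are assumptions; the specification states that the actual determination may rely on algorithms and scripts provided by the large database, which are not shown in the source.

```python
from typing import Dict, List, Set, Tuple

Graph = Dict[str, Set[str]]  # table -> tables reached by its outgoing directed edges


def is_minimum_loop(ring_path: List[str]) -> bool:
    """`ring_path` lists nodes in call order and must end where it starts."""
    if len(ring_path) < 2 or ring_path[0] != ring_path[-1]:
        return False
    edge_count = len(ring_path) - 1   # M, the ring path length
    node_count = len(set(ring_path))  # N, the number of nodes after de-duplication
    return node_count == edge_count


def minimum_loops(graph: Graph, node_set: Set[str]) -> List[List[str]]:
    """Enumerate the minimum loops that pass through nodes of `node_set`."""
    found: Dict[Tuple[str, ...], List[str]] = {}

    def walk(start: str, path: List[str]) -> None:
        for nxt in sorted(graph.get(path[-1], ())):
            if nxt not in node_set:
                continue
            if nxt == start:
                ring = path + [start]
                if is_minimum_loop(ring):
                    body = ring[:-1]
                    pivot = body.index(min(body))
                    # Rotate so that the same loop found from another start node
                    # is stored only once.
                    key = tuple(body[pivot:] + body[:pivot])
                    found.setdefault(key, list(key) + [key[0]])
            elif nxt not in path:
                walk(start, path + [nxt])

    for node in sorted(node_set):
        walk(node, [node])
    return list(found.values())


# Toy graph with two minimum loops that share the edges D -> C and C -> A.
calls: Graph = {"A": {"B", "E"}, "B": {"D"}, "E": {"D"}, "D": {"C"}, "C": {"A"}}
loops = minimum_loops(calls, {"A", "B", "C", "D", "E"})
```

A ring path that passes through the same table twice, such as A, B, A, C, A, has four edges but only three distinct nodes, so `is_minimum_loop` rejects it; it is a combination of two smaller loops.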
The subject application is operational with numerous general purpose or special purpose computer system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like. The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
This application can be applied to the financial technology field: the data query script corresponding to the financial data tables is obtained from the large database and parsed to obtain the table blood-edge relationship graph of the financial data, and the ring call paths in that graph are identified and disassembled to obtain a unidirectional table relation diagram. This simplifies the call links of the financial data tables and improves the efficiency of acquiring and processing financial data; clear table call links also make it easier to locate and analyze problems and to develop and modify financial services.
Specifically, the application may be applied to insurance claim settlement in the financial technology field. A claim settlement request is obtained, claim settlement parameters are extracted from the request, the corresponding data query scripts are obtained from an insurance database according to the claim settlement parameters, and the data query scripts are parsed to obtain the table blood-edge relationship graph corresponding to the claim settlement request. For example, the table nodes in the graph include a policy table, a guarantee schedule table, a risk table, a responsibility table, a payment responsibility table, a bill item list and the like, and the call relations include: policy table, guarantee schedule table A, risk table B, responsibility table C, bill item list; and policy table, guarantee schedule table A, risk table B, responsibility table C, payment responsibility table D, guarantee schedule table A, guarantee schedule table E, bill item list. The policy table is determined to be the leaf table node in the table blood-edge relationship graph according to the claim settlement parameters, and all call paths from the policy table to the root table node are then acquired. A ring call loop in the call paths is identified and disassembled to obtain a unidirectional table relation diagram. For example, payment responsibility table D is disassembled into payment responsibility tables D1 and D2, and guarantee schedule table A is disassembled into guarantee schedule tables A1 and A2, which are placed on different processing levels, so that the call relations become: policy table, guarantee schedule table A1, risk table B, responsibility table C, payment responsibility table D1, bill item list; and policy table, guarantee schedule table A1, risk table B, responsibility table C, guarantee schedule table A2, guarantee schedule table E, bill item list. The unidirectional table relation diagram is then optimized.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by computer-readable instructions stored in a computer-readable storage medium; when executed, the instructions may carry out the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (Read-Only Memory, ROM), or it may be a random access memory (Random Access Memory, RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in order as indicated by the arrows, these steps are not necessarily performed in order as indicated by the arrows. The steps are not strictly limited in order and may be performed in other orders, unless explicitly stated herein. Moreover, at least some of the steps in the flowcharts of the figures may include a plurality of sub-steps or stages that are not necessarily performed at the same time, but may be performed at different times, the order of their execution not necessarily being sequential, but may be performed in turn or alternately with other steps or at least a portion of the other steps or stages.
With further reference to fig. 5, as an implementation of the method shown in fig. 2, the present application provides an embodiment of a graphics-based data optimization apparatus. This apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices.
As shown in fig. 5, the graphics-based data optimization apparatus 500 according to this embodiment includes: parsing module 501, identification module 502, disassembly module 503, acquisition module 504, and optimization module 505. Wherein:
the parsing module 501 is configured to obtain a data query script corresponding to a large database table, and parse the data query script to obtain a table blood edge relationship diagram;
the identifying module 502 is configured to identify a leaf table node in the table blood edge relationship graph, and obtain all call paths between the leaf table node and a root table node according to the table blood edge relationship graph;
the disassembly module 503 is configured to identify a ring call loop in the call path, disassemble the ring call loop, and obtain a unidirectional table relationship diagram;
the obtaining module 504 is configured to determine a longest call path between the leaf table node and the root table node based on the unidirectional table relationship graph;
the optimizing module 505 is configured to optimize the longest call path according to the target service, and obtain a table-optimized blood-edge relationship graph.
It should be emphasized that to further ensure the privacy and security of the table blood relationship graph, the table blood relationship graph may also be stored in a node of a blockchain.
With the graphics-based data optimization apparatus, the call paths among the data tables are obtained from the table blood-edge relationship graph of the large database's tables, and the ring call loops in the call paths are identified and disassembled. This simplifies the call links of the data tables, reduces the complexity of the data processing links, and improves data acquisition and processing efficiency; clear table call links also make it easier to locate and analyze problems and to develop and modify services.
In some alternative implementations, parsing module 501 is further configured to:
carrying out grammar analysis on the data query script to obtain an abstract grammar tree;
extracting data table information according to the abstract syntax tree;
determining table nodes based on the data table information;
determining the hierarchical relation among the table nodes according to the hierarchical relation of the sentences corresponding to the data table information in the abstract syntax tree;
and obtaining a table blood edge relation graph based on the hierarchical relation among the table nodes.
According to the embodiment, the data query script is analyzed to obtain the table blood edge relation graph, so that the accuracy of the obtained table blood edge relation can be ensured.
In this embodiment, the identification module 502 includes an acquisition sub-module and a traversal sub-module. The acquisition sub-module is configured to acquire the directed edges of a leaf table node according to the table blood-edge relationship graph; the traversal sub-module is configured to traverse upstream from the leaf table node along the directed edges until the root table node is reached, thereby obtaining a call path.
According to the embodiment, the directed edges of the table blood edge relation graph traverse from the leaf table nodes to the root table nodes, so that the acquisition efficiency and accuracy of a calling path can be ensured.
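The upstream traversal can be sketched as a depth-first walk from the leaf table node. The source does not state how the traversal terminates when the graph contains a ring call loop; this sketch assumes that each directed edge may be used at most once per path, which lets a path pass through a loop once while still guaranteeing termination. The function name `call_paths` and the sample tables are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Graph = Dict[str, Set[str]]  # table -> upstream tables reached by its directed edges


def call_paths(upstream: Graph, leaf: str, root: str) -> List[List[str]]:
    """All call paths from `leaf` to `root`, each directed edge used at most once."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str], used: FrozenSet[Tuple[str, str]]) -> None:
        if node == root:
            paths.append(path)  # the traversal stops at the root table node
            return
        for nxt in sorted(upstream.get(node, ())):
            edge = (node, nxt)
            if edge in used:
                continue  # assumption: never reuse an edge within one path
            walk(nxt, path + [nxt], used | {edge})

    walk(leaf, [leaf], frozenset())
    return paths


# Toy graph: A and B call each other, so one of the call paths contains a loop.
upstream: Graph = {"leaf": {"A"}, "A": {"B", "root"}, "B": {"A"}}
paths = call_paths(upstream, "leaf", "root")
```

Here the walk yields one path that goes around the A, B loop before reaching the root and one path that goes to the root directly.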
In this embodiment, the disassembling module 503 includes an identifying sub-module, where the identifying sub-module includes an extracting unit, a deduplication unit, an obtaining unit, and a determining unit, where:
the extraction unit is used for extracting all table nodes on the calling path;
the de-duplication unit is used for de-duplicating the table nodes to obtain the number of the path nodes;
the obtaining unit is used for obtaining the length of the calling path according to the calling path;
the determining unit is used for determining a ring call loop based on the number of path nodes and the call path length.
The annular calling loop is determined through the number of path nodes and the calling path length, and the method is simple, convenient and quick.
In this embodiment, the determining unit is further configured to: comparing the number of path nodes with the length of the calling path; when the length of the calling path is greater than or equal to the number of the path nodes, the calling path is a ring calling path.
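The comparison performed by the determining unit can be sketched in a few lines. A loop-free path over k distinct table nodes has exactly k - 1 directed edges, so a call path whose length reaches or exceeds its de-duplicated node count must revisit a table. The function name `is_ring_call_path` and the sample paths are hypothetical.

```python
from typing import List


def is_ring_call_path(call_path: List[str]) -> bool:
    """`call_path` lists the table nodes in call order, from leaf to root."""
    path_length = len(call_path) - 1  # call path length, counted in directed edges
    path_nodes = len(set(call_path))  # number of path nodes after de-duplication
    return path_length >= path_nodes


direct = ["leaf", "A", "root"]            # 2 edges, 3 distinct nodes
looped = ["leaf", "A", "B", "A", "root"]  # 4 edges, 4 distinct nodes
```

With these inputs, `is_ring_call_path(direct)` is false and `is_ring_call_path(looped)` is true.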
In some alternative implementations, the disassembly module 503 further includes a disassembly sub-module for: determining a minimum loop in the loop call loop; and disassembling the minimum loop according to a preset disassembly rule according to the directed edge in the minimum loop.
The embodiment can disassemble the annular call loop by disassembling the minimum loop, so that the disassembly can be realized more efficiently and accurately, and the disassembled table can be ensured to be used in the service.
In this embodiment, the disassembly submodule includes an acquisition unit, a traversal unit, an obtaining unit, and a comparison unit, where:
the acquisition unit is used for acquiring the ring table nodes on the ring call path to form a node set;
the traversal unit is used for traversing the ring table nodes in the node set according to the directed edges to obtain ring paths whose start node and end node are the same;
the obtaining unit is used for obtaining, from the ring path, the number of ring nodes after de-duplication and the ring path length;
the comparison unit is used for comparing the ring path length with the number of ring nodes; when the ring path length and the number of ring nodes are equal, the ring path is a minimum loop.
The minimum loops in the loop call loops are determined, so that the number of the minimum loops is obtained, the complexity of the data model can be measured, and meanwhile, a reference basis can be provided for the optimization model.
In order to solve the technical problems, the embodiment of the application also provides computer equipment. Referring specifically to fig. 6, fig. 6 is a basic structural block diagram of a computer device according to the present embodiment.
The computer device 6 comprises a memory 61, a processor 62, and a network interface 63 communicatively connected to each other via a system bus. It is noted that only the computer device 6 having components 61-63 is shown in the figure, but it should be understood that not all of the illustrated components are required and that more or fewer components may be implemented instead. It will be appreciated by those skilled in the art that the computer device herein is a device capable of automatically performing numerical calculations and/or information processing in accordance with predetermined or stored instructions, the hardware of which includes, but is not limited to, microprocessors, application-specific integrated circuits (Application Specific Integrated Circuit, ASIC), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), digital signal processors (Digital Signal Processor, DSP), embedded devices, etc.
The computer equipment can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The computer equipment can perform man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch pad or voice control equipment and the like.
The memory 61 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card memory (e.g., SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 61 may be an internal storage unit of the computer device 6, such as a hard disk or a memory of the computer device 6. In other embodiments, the memory 61 may also be an external storage device of the computer device 6, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card provided on the computer device 6. Of course, the memory 61 may also comprise both an internal storage unit of the computer device 6 and an external storage device. In this embodiment, the memory 61 is typically used to store an operating system and various application software installed on the computer device 6, such as the computer-readable instructions of the graphics-based data optimization method. Further, the memory 61 may be used to temporarily store various types of data that have been output or are to be output.
The processor 62 may be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip in some embodiments. The processor 62 is typically used to control the overall operation of the computer device 6. In this embodiment, the processor 62 is configured to execute computer readable instructions stored in the memory 61 or process data, for example, execute computer readable instructions of the graphics-based data optimization method.
The network interface 63 may comprise a wireless network interface or a wired network interface, which network interface 63 is typically used for establishing a communication connection between the computer device 6 and other electronic devices.
In this embodiment, when the processor executes the computer-readable instructions stored in the memory, the steps of the graphics-based data optimization method of the above embodiments are implemented: the call paths among the data tables are obtained from the table blood-edge relationship graph of the large database's tables, and the ring call loops in the call paths are identified and disassembled. This simplifies the call links of the data tables, reduces the complexity of the data processing links, and improves data acquisition and processing efficiency; clear table call links also make it easier to locate and analyze problems and to develop and modify services.
The application further provides another embodiment, namely a computer-readable storage medium storing computer-readable instructions that can be executed by at least one processor, so that the at least one processor performs the steps of the graphics-based data optimization method described above: the call paths among the data tables are obtained from the table blood-edge relationship graph of the large database's tables, and the ring call loops in the call paths are identified and disassembled. This simplifies the call links of the data tables, reduces the complexity of the data processing links, and improves data acquisition and processing efficiency; clear table call links also make it easier to locate and analyze problems and to develop and modify services.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the method described in the embodiments of the present application.
It is apparent that the embodiments described above are only some, not all, of the embodiments of the present application. The preferred embodiments of the present application are given in the drawings, but they do not limit the patent scope of the present application. This application may be embodied in many different forms; rather, these embodiments are provided so that the disclosure of the present application will be understood more thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the technical solutions described in the foregoing embodiments, or equivalents may be substituted for some of their features. All equivalent structures made using the specification and the drawings of the application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the application.

Claims (10)

1. A data optimization method based on graphics, comprising the steps of:
acquiring a data query script corresponding to a large database table, and analyzing the data query script to obtain a table blood-edge relation diagram;
identifying leaf table nodes in the table blood edge relation diagram, and acquiring all calling paths between the leaf table nodes and root table nodes according to the table blood edge relation diagram;
identifying a ring call loop in the call path, and disassembling the ring call loop to obtain a unidirectional table relation diagram;
determining a longest call path between the leaf table node and the root table node based on the unidirectional table relationship graph;
and optimizing the longest calling path according to the target service to obtain a table-optimized blood-edge relation diagram.
2. The method of claim 1, wherein the step of parsing the data query script to obtain a table blood relationship graph comprises:
carrying out grammar analysis on the data query script to obtain an abstract grammar tree;
extracting data table information according to the abstract syntax tree;
determining a table node based on the data table information;
determining the hierarchical relation among the table nodes according to the hierarchical relation of the sentences corresponding to the data table information in the abstract syntax tree;
and obtaining a table blood edge relation graph based on the hierarchical relation among the table nodes.
3. The method of claim 1, wherein the step of obtaining all call paths from the leaf table node to the root table node according to the table blood edge relationship graph comprises:
obtaining the directed edges of the leaf table nodes according to the table blood edge relation diagram;
and traversing upstream from the leaf table node along the directed edges until the root table node is reached, to obtain the calling path.
4. The graphics-based data optimization method of claim 3, wherein the step of identifying a ring call loop in the call path comprises:
extracting all table nodes on the calling path;
de-duplicating the table nodes to obtain the number of path nodes;
obtaining the calling path length according to the calling path;
and determining a ring call loop based on the number of path nodes and the call path length.
5. The graphics-based data optimization method of claim 4, wherein the step of determining a ring call loop based on the number of path nodes and the call path length comprises:
comparing the number of path nodes with the call path length;
and when the length of the calling path is greater than or equal to the number of the path nodes, the calling path is a ring-shaped calling path.
6. The method of claim 1, wherein the step of disassembling the ring call loop comprises:
determining a minimum loop in the ring call loop;
and disassembling the minimum loop according to the directed edges in the minimum loop and a preset disassembly rule.
7. The graphics-based data optimization method of claim 6, wherein the step of determining a minimum loop in the ring call loop comprises:
acquiring ring table nodes on the ring call path to form a node set;
traversing the ring table nodes in the node set according to the directed edges to obtain ring paths whose start node and end node are the same;
obtaining, from the ring path, the number of ring nodes after de-duplication and the ring path length;
comparing the ring path length with the number of ring nodes;
when the ring path length and the ring node number are equal, the ring path is the minimum loop.
8. A graphics-based data optimization apparatus, comprising:
the analysis module is used for acquiring a data query script corresponding to the large database table and analyzing the data query script to obtain a table blood edge relation diagram;
the identifying module is used for identifying the leaf table nodes in the table blood edge relation diagram and acquiring all calling paths from the leaf table nodes to the root table nodes according to the table blood edge relation diagram;
the disassembly module is used for identifying a ring call loop in the call path and disassembling the ring call loop to obtain a unidirectional table relation diagram;
an obtaining module, configured to determine a longest call path between the leaf table node and the root table node based on the unidirectional table relationship graph;
and the optimizing module is used for optimizing the longest calling path according to the target service to obtain a table-optimized blood-edge relation diagram.
9. A computer device comprising a memory having stored therein computer readable instructions which when executed by a processor implement the steps of the graphics-based data optimization method of any one of claims 1 to 7.
10. A computer readable storage medium having stored thereon computer readable instructions which when executed by a processor implement the steps of the graphics-based data optimization method according to any one of claims 1 to 7.
CN202310413516.8A 2023-04-11 2023-04-11 Data optimization method, device, computer equipment and medium based on graphics Pending CN116431639A (en)

Publications (1)

Publication Number Publication Date
CN116431639A (en) 2023-07-14

Family ID: 87085113

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination