CN112579709B

CN112579709B - Data table identification method and device, storage medium and electronic equipment

Info

Publication number: CN112579709B
Application number: CN202011497172.6A
Authority: CN
Inventors: 顾冠雄; 段义霖
Original assignee: Agricultural Bank of China
Current assignee: Agricultural Bank of China
Priority date: 2020-12-17
Filing date: 2020-12-17
Publication date: 2023-07-28
Anticipated expiration: 2040-12-17
Also published as: CN112579709A

Abstract

The invention discloses a data table identification method and device, a storage medium, and electronic equipment. A target table node is determined in a first database-level table association graph associated with a target database; a second database-level table association graph is formed from the table nodes other than the target table node and the directed edges among those table nodes; and the table nodes to be split are determined in the second database-level table association graph, from which the data tables to be split are determined. The ratio of the number of table nodes to be split to the number of table nodes in the first database-level table association graph lies within a preset splitting ratio interval. Because the data tables to be split are determined from table nodes that satisfy this ratio, the database can be split effectively and scientifically when the target database is subsequently modified according to the data tables to be split.

Description

Data table identification method and device, storage medium and electronic equipment

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a data table identification method, a data table identification device, a storage medium, and an electronic device.

Background

In the early stage of building a monolithic project, the load and data volume of the database are small, and the capacity of the database and its data-processing performance can meet customers' business requirements for the monolithic project.

However, as the business requirements of the monolithic project keep growing, the data in the database keeps expanding, and some data tables may even grow geometrically. Once the data in the database reaches a certain scale, the performance of operations such as querying and reading the data degrades, which lowers the efficiency of processing the data in the database. Meanwhile, with the rise of microservice projects, the database of a monolithic project needs to be transformed into a database that meets customers' business requirements for microservice projects. How to determine a scheme for effectively transforming the database has therefore become a technical problem that must be solved first.

Disclosure of Invention

In view of the above problems, the present invention provides a data table identification method, apparatus, storage medium and electronic device, which overcome or at least partially solve the above problems, and the technical solutions are as follows:

A data table identification method, comprising:

obtaining a first database-level table association graph associated with a target database, the first database-level table association graph comprising table nodes and directed edges, wherein each table node is a node corresponding to a source table or a target table in the target database, and each directed edge is a directed connecting line pointing from the table node corresponding to a source table to the table node corresponding to a target table;

determining a target table node among the table nodes in the first database-level table association graph according to the number of directed edges connected to each table node;

obtaining a second database-level table association graph according to the first database-level table association graph, wherein the second database-level table association graph is composed of the table nodes other than the target table node and the directed edges between those table nodes;

determining whether the second database-level table association graph comprises at least two independent connected graphs, and if so, determining the table nodes in at least one independent connected graph in the second database-level table association graph as table nodes to be split, wherein the ratio of the number of table nodes to be split to the number of table nodes in the first database-level table association graph is within a preset splitting ratio interval, the table nodes within any one independent connected graph are directly or indirectly connected through at least one directed edge, and no directed edge connects any table node in one independent connected graph to any table node in another independent connected graph;

and determining the data table corresponding to each table node to be split as a data table to be split.

Optionally, the determining the table nodes in at least one independent connected graph in the second database-level table association graph as table nodes to be split, wherein the ratio of the number of table nodes to be split to the number of table nodes in the first database-level table association graph is within a preset splitting ratio interval, comprises:

determining the table nodes in the independent connected graph with the fewest table nodes in the second database-level table association graph as current target split table nodes; determining whether the ratio of the number of current target split table nodes to the number of table nodes in the first database-level table association graph is within the preset splitting ratio interval; if so, determining the current target split table nodes as the table nodes to be split; if not, determining, among the independent connected graphs that do not contain the current target split table nodes, the independent connected graph with the fewest table nodes, determining the table nodes in that independent connected graph as current target split table nodes as well, and returning to the step of determining whether the ratio of the number of current target split table nodes to the number of table nodes in the first database-level table association graph is within the preset splitting ratio interval.

Optionally, the method further comprises:

when the ratio of the number of the determined current target split table nodes to the number of table nodes in the first database-level table association graph cannot fall within the preset splitting ratio interval, identifying the independent connected graph with the most table nodes in the second database-level table association graph;

determining at least two table node groups in the independent connected graph with the most table nodes through a first preset graph search algorithm;

dividing the at least two table node groups into at least two table node subgraphs according to the preset splitting ratio interval, wherein the at least two table node subgraphs comprise a target table node subgraph, and the ratio of the number of table nodes in the target table node subgraph to the number of table nodes in the first database-level table association graph is within the preset splitting ratio interval;

and determining the table nodes in the target table node subgraph as the table nodes to be split.

Optionally, the method further comprises:

when the second database-level table association graph is an overall connected graph, determining at least two table node groups in the overall connected graph through a second preset graph search algorithm;

Optionally, the method further comprises:

determining the target table node as a table node to be synchronized;

and determining the data table corresponding to the table node to be synchronized as the data table to be synchronized.

Optionally, the method further comprises:

determining a second associated node of the at least two table node subgraphs as a table node to be synchronized;

Optionally, the obtaining a first database-level table association graph associated with the target database comprises:

acquiring, through a preset database connection string, at least one SQL statement in the target database and a built-in relation associated with the SQL statement;

establishing, according to preset key characters and a preset regular expression, association relations between the data tables in the at least one SQL statement and in the built-in relation associated with the SQL statement;

and generating the first database-level table association graph associated with the target database according to the association relations between the at least two data tables.

A data table identification apparatus, comprising: a first database-level table association graph obtaining unit, a target table node determining unit, a second database-level table association graph obtaining unit, a connected graph determining unit, a to-be-split table node determining unit, and a to-be-split data table determining unit, wherein:

the first database-level table association graph obtaining unit is configured to obtain a first database-level table association graph associated with a target database, the first database-level table association graph comprising table nodes and directed edges, wherein each table node is a node corresponding to a source table or a target table in the target database, and each directed edge is a directed connecting line pointing from the table node corresponding to a source table to the table node corresponding to a target table;

the target table node determining unit is configured to determine a target table node among the table nodes in the first database-level table association graph according to the number of directed edges connected to each table node;

the second database-level table association graph obtaining unit is configured to obtain a second database-level table association graph according to the first database-level table association graph, wherein the second database-level table association graph is composed of the table nodes other than the target table node and the directed edges between those table nodes;

the connected graph determining unit is configured to determine whether the second database-level table association graph comprises at least two independent connected graphs, and if so, to trigger the to-be-split table node determining unit;

the to-be-split table node determining unit is configured to determine the table nodes in at least one independent connected graph in the second database-level table association graph as table nodes to be split, wherein the ratio of the number of table nodes to be split to the number of table nodes in the first database-level table association graph is within a preset splitting ratio interval, the table nodes within any one independent connected graph are directly or indirectly connected through at least one directed edge, and no directed edge connects any table node in one independent connected graph to any table node in another independent connected graph;

the to-be-split data table determining unit is configured to determine the data table corresponding to each table node to be split as a data table to be split.

A storage medium having stored thereon a program which, when executed by a processor, implements the data table identification method of any one of the above.

An electronic device, comprising at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is configured to invoke program instructions in the memory to perform the data table identification method of any one of the above.

By means of the above technical solution, the data table identification method, device, storage medium and electronic equipment provided by the invention can determine at least one target table node in a first database-level table association graph associated with a target database, form a second database-level table association graph from the table nodes other than the target table node and the directed edges among those table nodes, and determine the table nodes to be split in the second database-level table association graph, from which the data tables to be split are determined. The ratio of the number of table nodes to be split to the number of table nodes in the first database-level table association graph lies within a preset splitting ratio interval. Because the data tables to be split are determined from table nodes that satisfy this ratio, the database can be split effectively and scientifically when the target database is subsequently modified according to the data tables to be split.

The foregoing is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the description, and so that the above and other objects, features and advantages of the invention are more readily apparent, specific embodiments of the invention are set forth below.

Drawings

In order to illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention, and a person skilled in the art may obtain other drawings from them without inventive effort.

Fig. 1 shows a flow chart of a data table identification method according to an embodiment of the present invention;

FIG. 2 illustrates a first database-level table association diagram provided by an embodiment of the present invention;

FIG. 3 is a schematic flow chart of another method for identifying a data table according to an embodiment of the present invention;

FIG. 4 illustrates a second database-level table association diagram provided by an embodiment of the present invention;

FIG. 5 illustrates another second database-level table association diagram provided by an embodiment of the present invention;

FIG. 6 is a schematic flow chart of another method for identifying a data table according to an embodiment of the present invention;

FIG. 7 is a schematic flow chart of another method for identifying a data table according to an embodiment of the present invention;

FIG. 8 is a schematic flow chart of another method for identifying a data table according to an embodiment of the present invention;

Fig. 9 shows a schematic structural diagram of a data table identification device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of protection of the invention.

As shown in fig. 1, a data table identification method provided by an embodiment of the present invention may include:

S100, obtaining a first database-level table association graph associated with a target database, wherein the first database-level table association graph comprises table nodes and directed edges, each table node is a node corresponding to a source table or a target table in the target database, and each directed edge is a directed connecting line pointing from the table node corresponding to a source table to the table node corresponding to a target table.

The target database may be an organized, sharable, uniformly managed data set stored in the computer for a long period of time. The user can perform operations such as adding, inquiring, updating, deleting and the like on the data in the data set. The target database is a relational database.

The first database-level table association graph may be a directed graph built from table nodes corresponding to the data tables in the target database and the directed edges between those data tables. The data tables may include source tables and target tables. For example, as shown in Fig. 2, the first database-level table association graph may use a square as a table node and an arrowed line segment as a directed edge. Normally, the arrow of a directed edge points to the table node corresponding to the target table, and the table node at the opposite end of the directed edge corresponds to the source table. Note that source table and target table are relative concepts, and one data table may be both a source table and a target table at the same time. Take table node A in Fig. 2 as an example: relative to table node B, table node A is the table node corresponding to the source table, and relative to table node C, table node A is the table node corresponding to the target table.

Specifically, based on the method shown in Fig. 1, in another data table identification method provided by an embodiment of the present invention and shown in Fig. 3, step S100 may include:

S110, obtaining, through a preset database connection string, at least one SQL statement in the target database and a built-in relation associated with the SQL statement.

The database connection string may be a character string including at least one of a user name, a password, the IP address of the database server, a port number, and a service name. For example, the database connection string may be "user/password@127.0.0.1:1521/server_name". The preset database connection string corresponds to the target database. In the embodiment of the invention, the preset database connection string can be used to log in to and connect to the target database; the SQL log and the foreign key and trigger configurations are then queried in the target database, so that at least one SQL statement is obtained from the SQL log and the built-in relation associated with the SQL statement is determined from the foreign key and trigger configurations. Optionally, the embodiment of the invention may remove exact duplicate SQL statements, that is, keep only one copy of several identical SQL statements.

A foreign key is a field in one data table that is associated with a preset key field in another data table in the database. The associations between data tables in the database are not fully reflected by the table relationships that appear in SQL statements. For example, given the trigger "{when 1 row of data is inserted into table a, the insertion time of that data is inserted into table b}", inserting data into table a with an SQL statement also causes an action on table b, so table a and table b are associated; the SQL statement, however, only shows data being inserted into table a and does not show the association between table a and table b. Therefore, by additionally querying the built-in relations given by the foreign key and trigger configurations, associations between data tables that are not reflected in the SQL statements can also be found.
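As an illustration of step S110, the following minimal sketch parses a connection string of the form given in the example above and removes exact duplicate SQL statements. The names `ConnectionInfo`, `parse_connection_string` and `drop_duplicate_statements` are hypothetical, and the assumed layout `user/password@host:port/service_name` is taken only from the example string; the actual login, SQL log query, and foreign key and trigger queries depend on the database product and are not shown.

```python
import re
from typing import NamedTuple


class ConnectionInfo(NamedTuple):
    """Fields of a connection string (hypothetical container)."""
    user: str
    password: str
    host: str
    port: int
    service_name: str


# Assumed layout, based only on the example: user/password@host:port/service_name
_CONNECTION_RE = re.compile(
    r"^(?P<user>[^/]+)/(?P<password>[^@]+)@"
    r"(?P<host>[^:/]+):(?P<port>\d+)/(?P<service_name>.+)$"
)


def parse_connection_string(text: str) -> ConnectionInfo:
    """Split a connection string into its named fields."""
    match = _CONNECTION_RE.match(text)
    if match is None:
        raise ValueError("unsupported connection string format")
    fields = match.groupdict()
    return ConnectionInfo(
        fields["user"],
        fields["password"],
        fields["host"],
        int(fields["port"]),
        fields["service_name"],
    )


def drop_duplicate_statements(statements):
    """Keep the first copy of each exactly identical SQL statement, in order."""
    return list(dict.fromkeys(statements))
```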

S120, establishing, according to preset key characters and a preset regular expression, association relations between the data tables in the at least one SQL statement and in the built-in relation associated with the SQL statement.

Using the preset regular expression, the embodiment of the invention can split each SQL statement, according to a first preset key character, into at least one sub-statement containing only one first preset key character. Optionally, the first preset key character may be SELECT. For example, the SQL statement may first be split into at least one sub-statement, and any sub-statement that still contains two or more first preset key characters is split further until every sub-statement contains only one first preset key character.

Each sub-statement containing only one first preset key character can then be split into at least two parts according to second preset key characters. Note that some of these parts may be empty. Optionally, the second preset key characters may be at least one of SELECT, FROM, and WHERE. Optionally, each such sub-statement may be split into four parts by SELECT, FROM and WHERE, and 1 to 3 of the four parts may be empty. Specifically, the target table name can be determined in the part before SELECT according to a third preset key character and its position (for example, after "insert into" or after "merge into"). For example, if the third preset key character and position is "after insert into" and the sub-statement is "insert into TableA SELECT taba, tabb from B where tabb > tab", the part before SELECT is "insert into TableA" and the determined target table name is "TableA". The source table name can be determined by filtering the part after FROM and before WHERE according to fourth preset key characters. The fourth preset key characters may include at least one of the comma ",", "join", "inner", "outer", "on", and the brackets "()".

The embodiment of the invention can query whether a data table corresponding to the target table name exists in the database, and if so, determine that data table as the target table. Similarly, it can query whether a data table corresponding to the source table name exists in the database, and if so, determine that data table as the source table. Querying the database in this way both determines the target table and the source table and filters out target table names and source table names for which no data table exists in the database.

For each sub-statement containing only one first preset key character, the embodiment of the invention can establish a directed edge pointing from the source table to the target table in that sub-statement, so as to reflect the association relation between the source table and the target table.

Optionally, the embodiment of the invention may remove exact duplicate sub-statements, that is, keep only one copy of several identical sub-statements each containing only one first preset key character.
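The extraction described in step S120 can be sketched roughly as follows for a single sub-statement that contains only one SELECT. This is an illustrative simplification, not the patented implementation: the function name `extract_edges`, the concrete regular expressions, and the `known_tables` set (standing in for the query that checks whether a table exists in the database) are all assumptions.

```python
import re

# Third preset key characters (assumed): the table name follows
# "insert into" or "merge into".
TARGET_RE = re.compile(r"(?:insert|merge)\s+into\s+([\w.]+)", re.IGNORECASE)
# Part after FROM and before WHERE (or the end of the sub-statement).
FROM_RE = re.compile(r"\bfrom\b(.*?)(?:\bwhere\b|$)", re.IGNORECASE | re.DOTALL)
# Fourth preset key characters that are filtered out as words; commas and
# brackets are removed by the tokenising split below.
FILTER_WORDS = {"join", "inner", "outer", "on"}


def extract_edges(sub_statement, known_tables):
    """Return a set of (source_table, target_table) pairs for one sub-statement."""
    select = re.search(r"\bselect\b", sub_statement, re.IGNORECASE)
    if select is None:
        return set()
    head = sub_statement[: select.start()]   # part before SELECT
    tail = sub_statement[select.end():]      # part after SELECT

    target_match = TARGET_RE.search(head)
    if target_match is None or target_match.group(1) not in known_tables:
        return set()  # no target table, or the name is not a real table
    target = target_match.group(1)

    from_match = FROM_RE.search(tail)
    if from_match is None:
        return set()
    tokens = re.split(r"[,\s()]+", from_match.group(1))
    sources = {
        token
        for token in tokens
        if token and token.lower() not in FILTER_WORDS and token in known_tables
    }
    return {(source, target) for source in sources}
```

Aliases, join conditions and other leftover tokens are discarded here only because they are not in `known_tables`, which mirrors the existence check described above; an alias that happens to equal a real table name would be misread by this sketch.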

S130, generating a first database-level table association diagram associated with the target database according to the association relation between the at least two data tables.

Specifically, according to the directed edges, which are established in step S120 and point to the target table from the source table, a first database-level table association diagram is generated. The target table and the source table are embodied in the form of table nodes in a first database-level table association graph. Optionally, in the embodiment of the present invention, the first database-level table association map may be stored in a preset data structure.
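One possible "preset data structure" for the graph is a pair of sets, one for table nodes and one for directed edges. The sketch below is only an assumption about how such a structure might look; `TableGraph` and `build_graph` are hypothetical names, and storing edges in a set means repeated source-to-target pairs collapse into one edge.

```python
from dataclasses import dataclass, field


@dataclass
class TableGraph:
    """Directed table association graph: table nodes plus (source, target) edges."""
    nodes: set = field(default_factory=set)
    edges: set = field(default_factory=set)

    def add_edge(self, source, target):
        # Both endpoints become table nodes; the edge points source -> target.
        self.nodes.update((source, target))
        self.edges.add((source, target))


def build_graph(edge_pairs):
    """Build a TableGraph from an iterable of (source_table, target_table) pairs."""
    graph = TableGraph()
    for source, target in edge_pairs:
        graph.add_edge(source, target)
    return graph
```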

According to the embodiment of the invention, the first database level table association diagram is constructed through at least one SQL sentence and the built-in relation associated with the SQL sentence in the target database, so that the risk that the association diagram established only according to the SQL sentence is inconsistent with the actual situation is avoided, and the subsequent processing of the first database level table association diagram is ensured to be reasonable and accurate.

S200, determining target table nodes in all the table nodes in the first database-level table association diagram according to the number of the directed edges connected with the table nodes.

Alternatively, the embodiment of the present invention may determine 0 or at least one target table node among the table nodes in the first database-level table association graph.

It will be appreciated that in the case where 0 target table nodes are determined, the first database-level table association diagram is equivalent to the second database-level table association diagram herein.

Optionally, the embodiment of the present invention may sort the table nodes in the first database-level table association graph in descending order of the number of directed edges connected to each table node, and take the top-ranked table nodes, up to a preset number of table nodes, as the target table nodes. The preset number of table nodes can be set according to actual requirements. For example, as shown in Fig. 2, if the preset number of table nodes is 1, table node A is the target table node.
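The ranking in step S200 can be sketched as follows. The function name `pick_target_nodes` is hypothetical, the graph in the usage note is made up rather than taken from Fig. 2, and breaking ties by node name is an assumption, since the text does not say how equal counts are ordered.

```python
from collections import Counter


def pick_target_nodes(nodes, edges, preset_count):
    """Return the preset_count table nodes with the most connected directed edges."""
    degree = Counter({node: 0 for node in nodes})
    for source, target in edges:
        # An edge counts once for each table node it is connected to.
        degree[source] += 1
        degree[target] += 1
    # Descending by edge count; ties broken by name (assumption).
    ranked = sorted(nodes, key=lambda node: (-degree[node], node))
    return ranked[:preset_count]
```

For a hypothetical graph with edges A→B, C→A, A→D and D→E, table node A is connected to three directed edges and is selected when the preset number is 1; a preset number of 0 selects no target table node.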

S300, obtaining a second database-level table association graph according to the first database-level table association graph, wherein the second database-level table association graph is composed of the table nodes other than the target table node and the directed edges between those table nodes.

Depending on which table nodes are target table nodes, the second database-level table association graph formed in this way may fall into one of two cases. In the first case, all table nodes in the second database-level table association graph are still directly or indirectly connected through at least one directed edge. In the second case, at least two table nodes in the second database-level table association graph are not connected, directly or indirectly, through any directed edge, that is, the graph has at least two mutually independent parts. In the first case the second database-level table association graph can be regarded as still being an overall connected graph; in the second case it can be regarded as being composed of at least two independent connected graphs. For ease of understanding, this is illustrated with the second database-level table association graphs shown in Fig. 4 and Fig. 5, both derived from the first database-level table association graph shown in Fig. 2. If table node A in the first database-level table association graph is the target table node, the second database-level table association graph is as shown in Fig. 4, and the table nodes other than table node A are still directly or indirectly connected through directed edges. If table node A, table node C and table node D in the first database-level table association graph are target table nodes, the second database-level table association graph is as shown in Fig. 5, and consists of one independent connected graph containing table node B and another independent connected graph containing table node E, table node F and table node H. When the second database-level table association graph is not composed of at least two independent connected graphs, it contains only one independent connected graph, that is, as shown in Fig. 4, the second database-level table association graph is an overall connected graph.
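Step S300 amounts to taking the subgraph induced by the non-target table nodes. A minimal sketch, with the hypothetical name `remove_target_nodes` and a made-up example graph (the edges of Fig. 2 are not reproduced here):

```python
def remove_target_nodes(nodes, edges, target_nodes):
    """Return (remaining_nodes, remaining_edges) after dropping the target table nodes."""
    targets = set(target_nodes)
    remaining_nodes = set(nodes) - targets
    # Keep only directed edges whose two endpoints both remain.
    remaining_edges = {
        (source, target)
        for source, target in edges
        if source in remaining_nodes and target in remaining_nodes
    }
    return remaining_nodes, remaining_edges
```

With no target table nodes the function returns the original nodes and edges, which matches the remark that the two graphs are then equivalent.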

S400, determining whether the second database-level table association graph comprises at least two independent connected graphs, and if so, executing step S500.

Optionally, the embodiment of the present invention may use a target graph search algorithm to determine whether the second database-level table association graph is an overall connected graph. The target graph search algorithm may include at least one of a breadth-first search (BFS) algorithm and a depth-first search (DFS) algorithm. Preferably, the embodiment of the invention determines whether the second database-level table association graph is an overall connected graph through the breadth-first search algorithm. Other graph search algorithms may also be used for this determination. When the second database-level table association graph is not an overall connected graph, it includes at least two independent connected graphs.
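A breadth-first search of the kind mentioned above could be sketched as follows. The sketch assumes that "connected through a directed edge" ignores edge direction, so each independent connected graph is a weakly connected component; the function name `connected_components` and the example edges are hypothetical.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Group table nodes into independent connected graphs using BFS."""
    neighbours = defaultdict(set)
    for source, target in edges:
        # Direction is ignored when testing connectivity (assumption).
        neighbours[source].add(target)
        neighbours[target].add(source)

    seen = set()
    components = []
    for start in sorted(nodes):  # sorted only to make the order deterministic
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            node = queue.popleft()
            component.add(node)
            for other in neighbours[node]:
                if other not in seen:
                    seen.add(other)
                    queue.append(other)
        components.append(component)
    return components
```

The graph is an overall connected graph when the returned list has exactly one element, and step S500 applies when it has two or more.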

S500, determining the table nodes in at least one independent connected graph in the second database-level table association graph as table nodes to be split, wherein the ratio of the number of table nodes to be split to the number of table nodes in the first database-level table association graph is within a preset splitting ratio interval, the table nodes within any one independent connected graph are directly or indirectly connected through at least one directed edge, and no directed edge connects any table node in one independent connected graph to any table node in another independent connected graph.

The preset split proportion interval can be set according to actual requirements. For example: the preset split ratio interval may be 0.5 to 0.6. The user can set a preset split proportion interval according to the proportion of the split data table to the data tables corresponding to all table nodes in the first database-level table association diagram, so that the proportion of the split data table to the data tables corresponding to all table nodes in the first database-level table association diagram meets the requirement of the user.

Optionally, the embodiment of the present invention may traverse all the independent connected graphs in the second database-level table association graph and, for each connected graph combination containing at least one independent connected graph, simulate the ratio of the number of table nodes to be split to the number of table nodes in the first database-level table association graph that would result if the table nodes in every independent connected graph of that combination were taken as table nodes to be split. When the ratio falls within the preset splitting ratio interval, the table nodes in the independent connected graphs of that combination can be determined as the table nodes to be split.
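The traversal of connected graph combinations can be sketched with `itertools.combinations`. The function name `find_combination`, the inclusive interval bounds, and the choice to return the first matching combination are assumptions; the sizes and total in the usage note are hypothetical figures. The number of combinations grows exponentially with the number of independent connected graphs, so this exhaustive form is only practical when there are few of them.

```python
from itertools import combinations


def find_combination(component_sizes, total_nodes, low, high):
    """Return indices of a combination of independent connected graphs whose
    combined node count, divided by total_nodes, lies in [low, high]."""
    indices = range(len(component_sizes))
    for size in range(1, len(component_sizes) + 1):
        for combo in combinations(indices, size):
            split_count = sum(component_sizes[i] for i in combo)
            if low <= split_count / total_nodes <= high:
                return combo
    return None  # no combination satisfies the interval
```

For example, with hypothetical independent connected graphs of 30, 20 and 10 table nodes, 70 table nodes in the first graph, and an interval of 0.5 to 0.6, no single graph qualifies, while the graphs of 30 and 10 nodes together give 40/70, which lies inside the interval.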

Optionally, based on the method shown in fig. 1, as shown in fig. 6, another data table identification method provided in the embodiment of the present invention, step S500 may include:

S510, determining the table nodes in the independent connected graph with the fewest table nodes in the second database-level table association graph as the current target split table nodes.

The embodiment of the invention can determine the number of table nodes in each independent connected graph in the second database-level table association graph, and determine the table nodes in the independent connected graph with the fewest table nodes as the current target split table nodes.

S520, determining whether the ratio of the number of the current target split table nodes to the number of table nodes in the first database level table association graph is in a preset split ratio interval, if so, executing a step S530, and if not, executing a step S540.

S530, determining the current target splitting table node as the table node to be split.

S540, among the independent connected graphs that do not include the current target split table nodes, determining the independent connected graph with the fewest table nodes, determining the table nodes in that independent connected graph also as current target split table nodes, and returning to step S520.

To facilitate understanding of steps S510 to S540, an example is given here. Assume that the second database-level table association graph includes independent connected graph 1 (30 table nodes), independent connected graph 2 (20 table nodes) and independent connected graph 3 (10 table nodes). The embodiment of the invention first determines the 10 table nodes in independent connected graph 3 as the current target split table nodes and checks whether the ratio of these 10 table nodes to the number of table nodes in the first database-level table association graph is in the preset split ratio interval. If so, the 10 table nodes in independent connected graph 3 are determined as the table nodes to be split. If not, the 20 table nodes in independent connected graph 2 are also determined as current target split table nodes, so that the current target split table nodes are the table nodes in independent connected graphs 3 and 2 (30 in total), and it is checked whether the ratio of these 30 table nodes to the number of table nodes in the first database-level table association graph is in the preset split ratio interval. If so, the 30 table nodes are determined as the table nodes to be split. If not, the 30 table nodes in independent connected graph 1 are likewise determined as current target split table nodes, so that the current target split table nodes are the table nodes in independent connected graphs 1, 2 and 3 (60 in total), and it is checked whether the ratio of these 60 table nodes to the number of table nodes in the first database-level table association graph is in the preset split ratio interval.

It can be understood that if, even after the table nodes in the independent connected graph with the most table nodes in the second database-level table association graph have also been taken as current target split table nodes, the ratio of the number of the current target split table nodes to the number of table nodes in the first database-level table association graph is still not in the preset split ratio interval, the loop of steps S510 to S540 ends.
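Steps S510 to S540, together with the termination condition just described, can be sketched as the loop below. The sketch follows the cumulative reading given in the example (each newly chosen independent connected graph is added to the current target split table nodes); the function name, the list-of-lists representation and the closed interval are assumptions made for illustration.

```python
def select_split_nodes(components, total_nodes, low, high):
    """Accumulate independent connected graphs from smallest to largest until
    the accumulated share of table nodes falls in [low, high].

    Returns the table nodes to be split, or None when the loop ends without
    the ratio ever entering the interval.
    """
    current = []  # current target split table nodes
    for component in sorted(components, key=len):  # S510 / S540: fewest nodes first
        current.extend(component)
        ratio = len(current) / total_nodes          # S520: check the ratio
        if low <= ratio <= high:
            return current                          # S530: nodes to be split
    return None  # every graph was added and the interval was never reached


# The example from the text: graphs with 30, 20 and 10 table nodes.
# The total of 110 table nodes in the first graph is a made-up figure.
graphs = [
    [f"g1_{i}" for i in range(30)],
    [f"g2_{i}" for i in range(20)],
    [f"g3_{i}" for i in range(10)],
]
to_split = select_split_nodes(graphs, 110, 0.5, 0.6)
```

With these figures the loop checks 10/110, then 30/110, then 60/110, and stops at the third check because roughly 0.545 lies in the interval 0.5 to 0.6.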

In practical applications, it may happen that the ratio of the number of table nodes in every connected graph combination including at least one independent connected graph to the number of table nodes in the first database-level table association graph cannot be in the preset split ratio interval, or that the ratio of the number of the current target split table nodes determined in steps S510 to S540 to the number of table nodes in the first database-level table association graph can never be in the preset split ratio interval. Specifically, the latter can be understood as follows: even after the table nodes in every independent connected graph in the second database-level table association graph have been determined as current target split table nodes, the ratio of the number of the current target split table nodes to the number of table nodes in the first database-level table association graph is still not in the preset split ratio interval.

Optionally, when the ratio of the number of the determined current target split table nodes to the number of table nodes in the first database-level table association graph cannot be in the preset split ratio interval, and/or when the ratio of the number of table nodes in any connected graph combination including at least one independent connected graph to the number of table nodes in the first database-level table association graph cannot be in the preset split ratio interval, the embodiment of the present invention may further: identify, in the second database-level table association graph, the independent connected graph including the most table nodes; determine, through a first preset graph search algorithm, at least two table node groups in that independent connected graph; divide the at least two table node groups into at least two table node subgraphs according to the preset split ratio interval, wherein the at least two table node subgraphs include a target table node subgraph, and the ratio of the number of table nodes in the target table node subgraph to the number of table nodes in the first database-level table association graph is in the preset split ratio interval; and determine the table nodes in the target table node subgraph as the table nodes to be split.

Specifically, the embodiment of the invention can identify the independent connected graph with the most table nodes in the second database-level table association graph by using a target graph searching algorithm. Preferably, the target graph search algorithm may be a breadth-first search algorithm.
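A breadth-first search that finds the independent connected graph with the most table nodes might look like the sketch below. It assumes that connectivity ignores edge direction (two table nodes are connected if a directed edge joins them in either direction) and that every edge endpoint appears in `nodes`; the function names are hypothetical.

```python
from collections import defaultdict, deque


def independent_connected_graphs(nodes, edges):
    """Group table nodes into independent connected graphs using BFS.

    nodes -- iterable of table names
    edges -- iterable of (source_table, target_table) directed edges
    """
    neighbours = defaultdict(set)
    for src, dst in edges:
        # Assumption: direction is ignored when testing connectivity.
        neighbours[src].add(dst)
        neighbours[dst].add(src)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


def largest_connected_graph(nodes, edges):
    """Return the independent connected graph containing the most table nodes."""
    return max(independent_connected_graphs(nodes, edges), key=len)


tables = ["a", "b", "c", "d", "e"]
links = [("a", "b"), ("c", "b"), ("d", "e")]
largest = largest_connected_graph(tables, links)
```

Here `largest` contains the tables a, b and c, while d and e form a second, smaller independent connected graph.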

The first preset graph search algorithm may be a community discovery (Community Detection) algorithm. Specifically, the embodiment of the invention can determine the aggregation condition of each table node in the independent connected graph with the most table nodes through a community discovery algorithm, and determine at least two table node groups meeting the preset association degree according to the aggregation condition.

The embodiment of the invention can determine the table nodes that connect the table node groups to one another as first association nodes.

Optionally, in the embodiment of the present invention, at least one first target subgraph may be determined among the at least two table node subgraphs obtained by the dividing, a first target subgraph being a table node subgraph for which the ratio of the number of table nodes it includes to the number of table nodes in the first database-level table association graph is in the preset split ratio interval. Optionally, when the number of first target subgraphs is 1, the first target subgraph is determined as the target table node subgraph. Optionally, when the number of first target subgraphs is not less than 2, a second target subgraph is determined among the first target subgraphs. Optionally, the second target subgraph may be the first target subgraph that includes the fewest first association nodes.
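The selection among candidate subgraphs can be illustrated with the short sketch below. It assumes that the table node subgraphs and the set of first association nodes have already been computed, and that the second target subgraph is the one finally used; the names and the closed interval are illustrative assumptions.

```python
def pick_target_subgraph(subgraphs, association_nodes, total_nodes, low, high):
    """Pick a subgraph whose share of table nodes lies in [low, high].

    When several subgraphs qualify, prefer the one that contains the fewest
    first association nodes. Returns None when no subgraph qualifies.
    """
    candidates = [
        subgraph for subgraph in subgraphs
        if low <= len(subgraph) / total_nodes <= high
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda sg: len(association_nodes & set(sg)))


# Hypothetical data: 7 table nodes in total, three subgraphs, and three
# first association nodes linking the groups.
subgraphs = [["a", "b", "c"], ["d", "e", "f"], ["g"]]
first_association_nodes = {"c", "d", "e"}
target = pick_target_subgraph(subgraphs, first_association_nodes, 7, 0.4, 0.5)
```

Both three-node subgraphs satisfy the interval here, and the first is chosen because it contains only one first association node. Preferring fewer association nodes tends to reduce the number of tables that straddle the split.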

The target table node subgraph may be a table node subgraph with the least number of table nodes in the at least two table node subgraphs. Preferably, the embodiment of the present invention may divide the at least two table node groups into two table node subgraphs.

Optionally, based on the method shown in fig. 1, as shown in fig. 7, another data table identification method provided by the embodiment of the present invention may further include: when it is determined in step S400 that the second database-level table association diagram is an overall connected diagram, step S700 is performed.

S700, determining at least two table node groups in the whole communication graph through a second preset graph searching algorithm.

The second preset graph search algorithm may be the same as the first preset graph search algorithm. Specifically, the embodiment of the invention can determine the aggregation of the table nodes in the overall connected graph through a community discovery algorithm, and determine at least two table node groups meeting the preset association degree according to the aggregation.

S800, dividing the at least two table node groups into at least two table node subgraphs according to the preset split proportion interval, wherein the at least two table node subgraphs comprise target table node subgraphs, and the ratio of the number of table nodes in the target table node subgraphs to the number of table nodes in the first database level table associated graph is in the preset split proportion interval.

Optionally, in the embodiment of the present invention, at least one third target subgraph may be determined among the at least two table node subgraphs, a third target subgraph being a table node subgraph for which the ratio of the number of table nodes it includes to the number of table nodes in the first database-level table association graph is in the preset split ratio interval. Optionally, when the number of third target subgraphs is 1, the third target subgraph is determined as the target table node subgraph. Optionally, when the number of third target subgraphs is not less than 2, a fourth target subgraph is determined among the third target subgraphs. Optionally, the fourth target subgraph may be the third target subgraph that includes the fewest first association nodes.

S900, determining the table node in the target table node subgraph as the table node to be split.

S600, determining the data table corresponding to the table node to be split as the data table to be split.

According to the embodiment of the invention, the table node groups can be divided into at least two table node subgraphs according to the aggregation of the table nodes and the preset split ratio interval, so that the at least two table node subgraphs include a target table node subgraph whose ratio of table nodes to the number of table nodes in the first database-level table association graph is in the preset split ratio interval. The table nodes of the target table node subgraph can then be directly determined as the table nodes to be split. The table nodes to be split determined in this way are more scientific and effective, blind selection of split points in the subsequent modification of the database is avoided, and the modified database can meet the service requirements of users for micro-service projects.

The embodiment of the invention can provide a reasonable splitting scheme according to the determined data table to be split aiming at the database with high functional coupling degree, and improves the efficiency of logically splitting the database.

The data table identification method provided by the embodiment of the invention can determine at least one target table node in the first database-level table association diagram associated with the target database, and form a second database-level table association diagram by other table nodes except the target table node and directed edges among other table nodes, and further determine the data table to be split by determining the table nodes to be split in the second database-level table association diagram, wherein the ratio of the number of the table nodes to be split to the number of the table nodes in the first database-level table association diagram is in a preset splitting ratio interval. The method and the system enable the ratio of the number of the table nodes to be split to the number of the table nodes in the first database-level table association graph to be in a preset splitting ratio interval, determine the data table to be split through the table nodes to be split, and facilitate the effective and scientific splitting of the database when the target database is modified according to the data table to be split subsequently.

Optionally, based on the method shown in fig. 1, as shown in fig. 8, another data table identification method provided by the embodiment of the present invention may further include:

S10, determining the target table node as a table node to be synchronized.

S20, determining the data table corresponding to the node of the table to be synchronized as the data table to be synchronized.

Optionally, another data table identification method provided by the embodiment of the present invention may further include: and determining a second association node of the at least two table node subgraphs as a table node to be synchronized, and determining a data table corresponding to the table node to be synchronized as a data table to be synchronized.

In particular, the at least two table node subgraphs may comprise a first table node subgraph and a second table node subgraph, which are connected to each other through the second association node. The first table node subgraph may or may not include the second association node. The second table node subgraph may or may not include the second association node.

According to the embodiment of the invention, the target table node and/or the second association node are taken as high-association table nodes, the high-association table nodes are determined as the table nodes to be synchronized, and the data tables to be synchronized are determined accordingly. This effectively reduces the number of data tables to be synchronized when the database is modified, avoids large-scale synchronization operations between databases that would otherwise be needed to ensure service consistency after the database is split, and improves the efficiency of modifying the database.

Optionally, in the embodiment of the present invention, other table nodes except for the table node to be split and the table node to be synchronized in the first database level table association graph may be determined as a current state maintaining table node, and the data table corresponding to the current state maintaining table node may be determined as a current state maintaining data table.
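The three-way classification of table nodes described above amounts to simple set arithmetic, as the following sketch suggests. The function name and the dictionary keys are hypothetical labels chosen for illustration, and the sketch assumes that the sets of table nodes to be split and to be synchronized do not overlap.

```python
def classify_tables(all_nodes, split_nodes, sync_nodes):
    """Classify every table node of the first association graph.

    Nodes that are neither to be split nor to be synchronized are treated
    as current state maintaining table nodes.
    """
    split = set(split_nodes)
    sync = set(sync_nodes)
    keep = set(all_nodes) - split - sync
    return {
        "to_split": sorted(split),
        "to_synchronize": sorted(sync),
        "maintain_current_state": sorted(keep),
    }


plan = classify_tables(
    all_nodes=["orders", "customers", "audit_log", "products", "config"],
    split_nodes=["orders", "products"],
    sync_nodes=["config"],
)
```

In this made-up example `customers` and `audit_log` end up as current state maintaining tables.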

According to the embodiment of the invention, a database modification scheme can be generated according to at least one of the data table to be split, the data table to be synchronized and the current state maintaining data table, so that the target database can be modified according to the database modification scheme and the logical splitting of the database can be realized. Optionally, the embodiment of the invention can display the generated database modification scheme.

Corresponding to the above method embodiment, the structure of the data table identifying apparatus provided in the embodiment of the present invention is shown in fig. 9, and may include: the first database-level table association diagram obtaining unit 100, the target table node determining unit 200, the second database-level table association diagram obtaining unit 300, the connected diagram determining unit 400, the table node to be split determining unit 500, and the data table to be split determining unit 600.

The first database-level table association diagram obtaining unit 100 is configured to obtain a first database-level table association diagram associated with a target database, where the first database-level table association diagram includes table nodes and directed edges, the table nodes are nodes corresponding to source tables or target tables in the target database, and the directed edges are directed connecting lines from the table nodes corresponding to the source tables to the table nodes corresponding to the target tables.

The first database-level table association graph may be a directed graph established from the table nodes corresponding to the data tables in the target database and the directed edges between those table nodes. The data tables may include a source table and a target table.

Specifically, in another data table identifying apparatus provided by the embodiment of the present invention, the first database-level table association diagram obtaining unit 100 may include: an SQL statement and built-in relation obtaining subunit, an association relationship establishing subunit and a first database-level table association diagram obtaining subunit.

The SQL statement and built-in relation obtaining subunit is configured to obtain, through a preset database connection string, at least one SQL statement in the target database and a built-in relation associated with the SQL statement.

The association relationship establishing subunit is configured to establish, according to preset keywords and preset regular expressions, the association relationship between the data tables in the at least one SQL statement and in the built-in relation associated with the SQL statement.

The first database-level table association diagram obtaining subunit is configured to generate the first database-level table association diagram associated with the target database according to the association relationship between at least two data tables.
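As a rough illustration of how keywords and regular expressions could yield directed edges from an SQL statement, consider the sketch below. The specific keywords (`INSERT INTO`, `FROM`, `JOIN`), the patterns and the function name are assumptions chosen for the example; the description does not specify which keywords or expressions are preset, and real SQL would need far more careful parsing.

```python
import re

# Hypothetical "preset" patterns: the table written to, and the tables read from.
TARGET_RE = re.compile(r"\bINSERT\s+INTO\s+([A-Za-z_][\w.]*)", re.IGNORECASE)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def edges_from_sql(statement):
    """Return sorted (source_table, target_table) pairs found in one statement."""
    targets = TARGET_RE.findall(statement)
    sources = SOURCE_RE.findall(statement)
    return sorted({(s, t) for t in targets for s in sources if s != t})


sql = (
    "INSERT INTO orders_summary "
    "SELECT o.id, c.name FROM orders o JOIN customers c ON o.cid = c.id"
)
edges = edges_from_sql(sql)
```

For this statement the sketch produces two directed edges, one from `customers` and one from `orders`, both pointing to `orders_summary`.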

The target table node determining unit 200 is configured to determine a target table node in each table node in the first database-level table association graph according to the number of the directed edges connected by the table node.
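One way to pick target table nodes from the number of connected directed edges is sketched below. The selection rule used here (every table node whose count of incoming plus outgoing edges reaches a threshold) is an assumption for illustration only, and `min_degree` is a hypothetical parameter.

```python
from collections import Counter


def target_table_nodes(edges, min_degree):
    """Return table nodes connected to at least `min_degree` directed edges.

    edges -- iterable of (source_table, target_table) pairs
    """
    degree = Counter()
    for src, dst in edges:
        degree[src] += 1  # outgoing edge
        degree[dst] += 1  # incoming edge
    return {node for node, count in degree.items() if count >= min_degree}


links = [("a", "hub"), ("b", "hub"), ("hub", "c"), ("c", "d")]
targets = target_table_nodes(links, min_degree=3)
```

In this small graph only `hub` is connected to three directed edges, so it is the only target table node.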

The second database-level table association diagram obtaining unit 300 is configured to obtain a second database-level table association diagram according to the first database-level table association diagram, where the second database-level table association diagram is configured by other table nodes except the target table node and the directed edges between the other table nodes.

Wherein, according to the difference of the target table nodes, two situations may occur in the second database-level table association diagram formed by other table nodes except the target table node and the directed edges between the other table nodes: the first case is that all table nodes in the second database level table association diagram are still directly or indirectly communicated through at least one directed edge, and the second case is that at least two table nodes in the second database level table association diagram are not directly or indirectly communicated through any directed edge, that is, at least two mutually independent parts exist. The first case may be considered that the second database-level table association diagram is still an overall connectivity diagram, and the second case may be considered that the second database-level table association diagram is composed of at least two independent connectivity diagrams.
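The two cases can be distinguished by removing the target table nodes and counting the independent parts that remain. The sketch below uses a union-find structure for the counting, which is a substitute technique chosen for brevity rather than anything prescribed by this description; edge direction is ignored when testing connectivity, and all names are hypothetical.

```python
def second_graph(nodes, edges, targets):
    """Drop the target table nodes and every directed edge touching them."""
    kept_nodes = [n for n in nodes if n not in targets]
    kept_edges = [(s, d) for s, d in edges if s not in targets and d not in targets]
    return kept_nodes, kept_edges


def count_independent_graphs(nodes, edges):
    """Count independent connected graphs with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for s, d in edges:
        parent[find(s)] = find(d)
    return len({find(n) for n in nodes})


tables = ["a", "b", "hub", "c", "d"]
links = [("a", "hub"), ("b", "hub"), ("hub", "c"), ("c", "d")]
rest_nodes, rest_edges = second_graph(tables, links, {"hub"})
parts = count_independent_graphs(rest_nodes, rest_edges)
is_overall_connected = parts == 1  # first case; otherwise the second case
```

Removing `hub` leaves three independent parts (a alone, b alone, and c with d), so this example falls under the second case.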

The connected diagram determining unit 400 is configured to determine whether the second database-level table association diagram includes at least two independent connected graphs, and if so, to trigger the to-be-split table node determining unit 500.

The table node to be split determining unit 500 is configured to determine table nodes in at least one independent connection graph in the second database level table association graph as table nodes to be split, where a ratio of the number of table nodes to be split to the number of table nodes in the first database level table association graph is in a preset splitting ratio interval, each table node in any independent connection graph is directly or indirectly connected through at least one directed edge, and no connection relationship of a directed edge exists between any table node in one independent connection graph and each table node in another independent connection graph.

The preset split proportion interval can be set according to actual requirements.

Optionally, in another data table identifying apparatus provided by the embodiment of the present invention, the to-be-split table node determining unit 500 may include: a first split table node determining subunit, a ratio determining subunit, a to-be-split table node determining subunit and a second split table node determining subunit.

The first split table node determining subunit may be configured to determine the table nodes in the independent connected graph with the fewest table nodes in the second database-level table association graph as the current target split table nodes.

The ratio determining subunit may be configured to determine whether the ratio of the number of the current target split table nodes to the number of table nodes in the first database-level table association graph is in the preset split ratio interval, and to trigger the to-be-split table node determining subunit if so and the second split table node determining subunit if not.

The to-be-split table node determining subunit is configured to determine the current target split table nodes as the table nodes to be split.

The second split table node determining subunit is configured to determine, among the independent connected graphs that do not include the current target split table nodes, the independent connected graph with the fewest table nodes, to determine the table nodes in that independent connected graph also as current target split table nodes, and to trigger the ratio determining subunit.

Optionally, when the ratio of the number of the determined current target split table nodes to the number of table nodes in the first database-level table association graph cannot be in the preset split ratio interval, and/or when the ratio of the number of table nodes in any connected graph combination including at least one independent connected graph to the number of table nodes in the first database-level table association graph cannot be in the preset split ratio interval, the to-be-split table node determining unit 500 may be further configured to: identify, in the second database-level table association graph, the independent connected graph including the most table nodes; determine, through a first preset graph search algorithm, at least two table node groups in that independent connected graph; divide the at least two table node groups into at least two table node subgraphs according to the preset split ratio interval, wherein the at least two table node subgraphs include a target table node subgraph, and the ratio of the number of table nodes in the target table node subgraph to the number of table nodes in the first database-level table association graph is in the preset split ratio interval; and determine the table nodes in the target table node subgraph as the table nodes to be split.

Optionally, another data table identifying apparatus provided by the embodiment of the present invention may further include: a node group determination unit and a table node sub-graph dividing unit.

The node group determining unit may be configured to determine, when the connection graph determining unit 400 determines that the second database level table association graph is an overall connection graph, at least two table node groups in the overall connection graph through a second preset graph searching algorithm.

The table node sub-graph dividing unit may be configured to divide the at least two table node groups into at least two table node subgraphs according to the preset split ratio interval, wherein the at least two table node subgraphs include a target table node subgraph, and the ratio of the number of table nodes in the target table node subgraph to the number of table nodes in the first database-level table association graph is in the preset split ratio interval.

The to-be-split table node determining subunit may be further configured to determine a table node in the target table node subgraph as the to-be-split table node.

The to-be-split data table determining unit 600 is configured to determine a data table corresponding to the to-be-split table node as the to-be-split data table.

The data table identification device provided by the embodiment of the invention can determine at least one target table node in the first database-level table association diagram associated with the target database, and form a second database-level table association diagram by other table nodes except the target table node and directed edges among other table nodes, and further determine the data table to be split by determining the table nodes to be split in the second database-level table association diagram, wherein the ratio of the number of the table nodes to be split to the number of the table nodes in the first database-level table association diagram is in a preset splitting ratio interval. The method and the system enable the ratio of the number of the table nodes to be split to the number of the table nodes in the first database-level table association graph to be in a preset splitting ratio interval, determine the data table to be split through the table nodes to be split, and facilitate the effective and scientific splitting of the database when the target database is modified according to the data table to be split subsequently.

Optionally, another data table identifying apparatus provided by the embodiment of the present invention may further include: a to-be-synchronized table node determining unit and a to-be-synchronized data table determining unit.

The to-be-synchronized table node determining unit may be configured to determine the target table node as the table node to be synchronized.

Optionally, the to-be-synchronized table node determining unit may be further configured to determine a second associated node of the at least two table node subgraphs as the to-be-synchronized table node.

The to-be-synchronized data table determining unit may be configured to determine the data table corresponding to the table node to be synchronized as the data table to be synchronized.

Optionally, another data table identifying apparatus provided by the embodiment of the present invention may further include: a current state maintaining data table determining unit.

The current state maintaining data table determining unit is configured to determine the table nodes in the first database-level table association graph other than the table nodes to be split and the table nodes to be synchronized as current state maintaining table nodes, and to determine the data tables corresponding to the current state maintaining table nodes as current state maintaining data tables.

Optionally, another data table identifying apparatus provided by the embodiment of the present invention may further include: a database modification scheme generating unit.

The database modification scheme generating unit may be configured to generate a database modification scheme according to at least one of the data table to be split, the data table to be synchronized and the current state maintaining data table, so that the target database is modified according to the database modification scheme and the logical splitting of the database is realized. Optionally, the embodiment of the invention can display the generated database modification scheme.

The data table identifying apparatus includes a processor and a memory, where the first database level table association diagram obtaining unit 100, the target table node determining unit 200, the second database level table association diagram obtaining unit 300, the connection diagram determining unit 400, the table node to be split determining unit 500, the data table to be split determining unit 600, and the like are stored as program units, and the processor executes the program units stored in the memory to implement corresponding functions.

The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided. By adjusting kernel parameters, the ratio of the number of table nodes to be split to the number of table nodes in the first database-level table association graph is brought into the preset split ratio interval, and the data table to be split is determined through the table nodes to be split, so that the database can be split effectively and scientifically when the target database is subsequently modified according to the data table to be split.

The storage medium provided by the embodiment of the invention stores a program, and the program realizes the data table identification method according to any one of the above when being executed by a processor.

The embodiment of the invention provides a processor which is used for running a program, wherein the data table identification method is executed when the program runs.

The embodiment of the invention provides electronic equipment, which comprises at least one processor, at least one memory connected with the processor and a bus; the processor and the memory complete communication with each other through the bus; the processor is configured to invoke program instructions in the memory to perform the data table identification method of any of the above.

The electronic device herein may be a server, a PC, a PAD, a mobile phone, etc.

The present application also provides a computer program product which, when executed on an electronic device, is adapted to execute a program initialized with the steps of the above data table identification method.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, the electronic device includes one or more processors (CPUs), memory, and a bus. The electronic device may also include input/output interfaces, network interfaces, and the like.

The memory may include volatile memory, random Access Memory (RAM), and/or nonvolatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM), among other forms in computer readable media, the memory including at least one memory chip. Memory is an example of a computer-readable medium.

Computer readable media, including both non-transitory and non-transitory, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims

1. A method of data table identification, comprising:

2. The method according to claim 1, wherein determining the table nodes in the at least one independent connectivity graph in the second database-level table association graph as the table nodes to be split, such that a ratio of the number of table nodes to be split to the number of table nodes in the first database-level table association graph is in a preset splitting ratio interval, comprises:

3. The method as recited in claim 2, further comprising:

4. The method as recited in claim 1, further comprising:

5. The method as recited in claim 1, further comprising:

determining the target table node as a table node to be synchronized;

6. The method according to claim 3 or 4, further comprising:

7. The method of claim 1, wherein obtaining a first database-level table association graph associated with a target database comprises:

and generating the first database-level table association graph associated with the target database according to the association relation between at least two data tables.

8. A data table identification apparatus, comprising: a first database-level table association diagram obtaining unit, a target table node determining unit, a second database-level table association diagram obtaining unit, a connected diagram determining unit, a to-be-split table node determining unit, and a to-be-split data table determining unit,

9. A storage medium having a program stored thereon, wherein the program, when executed by a processor, implements the data table identification method according to any one of claims 1 to 7.

10. An electronic device comprising at least one processor, and at least one memory and a bus connected to the processor; the processor and the memory communicate with each other through the bus; the processor is configured to invoke program instructions in the memory to perform the data table identification method of any one of claims 1 to 7.