WO2023130771A1

WO2023130771A1 - Data management method and apparatus, and electronic device and storage medium

Info

Publication number: WO2023130771A1
Application number: PCT/CN2022/121315
Authority: WO
Inventors: 张聪; 严茂胜; 王一涵; 周剑
Original assignee: 中移(成都)信息通信科技有限公司; 中国移动通信集团有限公司
Priority date: 2022-01-05
Filing date: 2022-09-26
Publication date: 2023-07-13
Also published as: CN116450637A

Abstract

Disclosed in the embodiments of the present application are a data management method and apparatus, and an electronic device and a storage medium. The method comprises: acquiring a data stream to be processed, wherein said data stream comprises several database tables; determining order information and key field information of the several database tables; and generating a data stream model according to the order information and the key field information of the several database tables, wherein the data stream model is used for representing the association relationship between the several database tables. In this way, by means of a data stream model which is generated on the basis of order information and key field information corresponding to several database tables in a data stream to be processed, the efficient management of the several database tables can be realized, such that the manual maintenance cost is reduced, and querying is also facilitated, thereby improving the data management efficiency.

Description

A data management method, device, electronic device and storage medium

Cross References to Related Applications

This application claims priority to the Chinese patent application with application number 202210006368.3, entitled "a data management method, device, electronic device and storage medium", filed with the China Patent Office on January 05, 2022, the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to the technical field of data management, and in particular to a data management method, device, electronic equipment and storage medium.

Background

Metadata is data that describes other data (data about data), or structured data used to provide information about a resource. Here, metadata is data that describes objects such as information resources or data. Its purposes are to: identify resources; evaluate resources; track changes in resources during use; realize simple and efficient management of large amounts of networked data; and realize effective discovery, search and integrated organization of information resources, as well as efficient management of resource usage.

As the business logic of business systems becomes increasingly complex, how to effectively manage metadata information under massive data has become an urgent problem to be solved. At present, for the management of metadata, it is mainly to enter the metadata information one by one into the document through manual sorting, and it is also necessary to establish a standard data dictionary model, resulting in high manual maintenance costs and a large query workload, which in turn leads to low efficiency.

Summary of the invention

The present application provides a data management method, device, electronic equipment, and storage medium, which can realize efficient management of several database tables through a data flow model, reduce manual maintenance costs, facilitate query, and improve data management efficiency.

The technical solution of the present application is implemented as follows:

In the first aspect, the embodiment of the present application provides a data management method, the method includes:

Obtain a data stream to be processed, the data stream to be processed includes several database tables;

Determine the sequence information and key field information of the several database tables;

A data flow model is generated according to the sequence information and key field information of the several database tables; wherein, the data flow model is used to characterize the association relationship among the several database tables.

In the second aspect, the embodiment of the present application provides a data management device, including an acquisition unit, a determination unit, and a generation unit, wherein,

The acquiring unit is configured to acquire a data stream to be processed, and the data stream to be processed includes several database tables;

The determining unit is configured to determine sequence information and key field information of the several database tables;

The generation unit is configured to generate a data flow model according to the sequence information and key field information of the several database tables; wherein the data flow model is used to characterize the association relationship between the several database tables.

In a third aspect, an embodiment of the present application provides an electronic device, where the electronic device includes a memory and a processor, wherein,

The memory is configured to store a computer program capable of running on the processor;

The processor is configured to execute the data management method as described in the first aspect when running the computer program.

In a fourth aspect, an embodiment of the present application provides a computer storage medium, the computer storage medium stores a computer program, and when the computer program is executed by at least one processor, the data management method as described in the first aspect is implemented.

In the data management method, device, electronic device, and storage medium provided by the embodiments of the present application, the method includes: obtaining a data stream to be processed, where the data stream to be processed includes several database tables; determining the sequence information and key field information of the several database tables; and generating a data flow model according to the sequence information and key field information of the several database tables, where the data flow model is used to represent the association relationship among the several database tables. In this way, the data flow model generated from the sequence information and key field information of the several database tables in the data stream to be processed can not only realize efficient management of these database tables and reduce the cost of manual maintenance, but is also applicable to complex application scenarios. In addition, because the data flow model can completely record the relationships between database tables, the performance problems caused by creating physical foreign keys can be avoided, and the data flow information can be queried conveniently in the data flow model, which improves the efficiency of data management.

Description of drawings

FIG. 1 is a schematic flow diagram of a data management method provided in an embodiment of the present application;

FIG. 2 is a schematic flow diagram of another data management method provided in the embodiment of the present application;

FIG. 3 is a schematic flowchart of another data management method provided in the embodiment of the present application;

FIG. 4 is a schematic diagram of the composition and structure of a data management device provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of the composition and structure of an electronic device provided in an embodiment of the present application;

FIG. 6 is a schematic diagram of the composition and structure of another electronic device provided by the embodiment of the present application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the drawings in the embodiments of the present application. It should be understood that the specific embodiments described here are only used to explain the related application, not to limit the application. It should also be noted that, for the convenience of description, only the parts related to the relevant application are shown in the drawings.

Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the technical field to which this application belongs. The terms used herein are only for the purpose of describing the embodiments of the present application, and are not intended to limit the present application.

In the following description, references to "some embodiments" describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other where there is no conflict.

It should be pointed out that the terms "first\second\third" used in the embodiments of this application only distinguish similar objects and do not represent a specific ordering of the objects. It is understood that, where permitted, the specific order or sequence of "first\second\third" may be interchanged, so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein.

Metadata is data that describes other data (data about data), or structured data used to provide information about a resource. Metadata is data that describes objects such as information resources or data. Its purposes are to: identify resources; evaluate resources; track changes in resources during use; realize simple and efficient management of large amounts of networked data; and realize effective discovery, search and integrated organization of information resources, as well as efficient management of resource usage.

Since metadata is also data, it can be stored and retrieved in the database in a similar way to data. If the organization that provides the data element also provides the metadata describing the data element, the use of the data element will become accurate and efficient. When users use data, they can first check its metadata so that they can obtain the information they need.

As the business logic of business systems becomes more and more complex, and especially since agile development became popular, a project may be divided into multiple small projects that are interrelated but can run independently and are completed separately, which has a great impact on the data quality and consistency of the system. In this context, a metadata management method is needed to ensure the quality of business system data and its subsequent maintainability.

At present, one solution is to manage business system metadata through a data dictionary or through document management. A data dictionary is a collection of information describing data, and a collection of definitions for all data elements used in the system. In traditional relational databases (such as Oracle and MySQL), there are data dictionary tables that store table information, field information, indexes, constraint information, and so on, and the information in the data dictionary tables is updated accordingly as the database changes. The database data dictionary is not only the center of every database, but also very important information for every user. Another solution is to use documents (such as Excel or Word) to manage metadata. Managing metadata in documents requires establishing a standard data dictionary model to manage the definition and description information of tables and fields in the database, and dedicated personnel must be assigned to maintain and update the documents to ensure consistency between the data dictionary document and the database table structure.

However, when a data dictionary is used for metadata management, physical foreign keys must be established to record the relationships between tables. The advantage of a physical foreign key is that data that does not satisfy the foreign key constraint cannot be entered into the system, which avoids the generation of garbage data. However, physical foreign keys also have significant problems: first, they increase the difficulty of system development; second, they increase the difficulty of data processing; and third, they have a considerable impact on system performance. Therefore, most enterprises do not establish physical foreign keys, but instead add verification of foreign key constraints during system development. In addition, a data dictionary cannot describe some complex business scenarios: a foreign key relationship in the data dictionary can only describe the relationship between one child table and one parent table, whereas in actual business one child table may be related to multiple parent tables. When documents are used for metadata management, the cost of manual maintenance is too high, and the documents are prone to not being updated in time and becoming inconsistent with the system version. In addition, the query workload is often large, because multiple places in the document must be checked to confirm the relationship between tables, which makes queries complex and inefficient. Furthermore, both existing solutions share the problem that only the association relationship between two tables can be queried at a time, and the upstream and downstream relationships of multiple tables cannot be queried at one time.

Based on this, the embodiments of the present application provide a data management method. The basic idea of the method is: obtaining a data stream to be processed, which includes several database tables; determining the sequence information and key field information of the several database tables; and generating a data flow model according to the sequence information and key field information of the several database tables, where the data flow model is used to represent the association relationship among the several database tables. In this way, the data flow model generated from the sequence information and key field information of the several database tables in the data stream to be processed can not only realize efficient management of these database tables and reduce the cost of manual maintenance, but is also applicable to complex application scenarios. In addition, because the data flow model can completely record the relationships between database tables, the performance problems caused by creating physical foreign keys can be avoided, and the data flow information can be queried conveniently in the data flow model, which improves the efficiency of data management.

Various embodiments of the present application will be described in detail below with reference to the accompanying drawings.

In an embodiment of the present application, refer to FIG. 1 , which shows a schematic flowchart of a data management method provided in an embodiment of the present application. As shown in Figure 1, the method may include:

S101. Obtain a data stream to be processed, where the data stream to be processed includes several database tables.

S102. Determine sequence information and key field information of several database tables.

S103. Generate a data flow model according to the sequence information and key field information of several database tables; wherein, the data flow model is used to represent the association relationship between several database tables.

It should be noted that the embodiment of the present application provides a data management method, which may specifically refer to a metadata management method. The method can be applied to a data management device, or an electronic device integrated with the data management device. Here, the electronic device may be, for example, a computer, a smart phone, a tablet computer, a notebook computer, a palmtop computer, a personal digital assistant (Personal Digital Assistant, PDA), a navigation device, a server, etc., which are not specifically limited in this embodiment of the present application.

It should also be noted that, for a certain business process in the business system, data generated in the business process can be stored in individual tables of the database, and these tables are called database tables. These database tables form the data stream to be processed, and each database table represents a certain stage in the business process. Here, the database tables are metadata, and the data management method provided by the embodiment of the present application can efficiently manage these metadata, and clearly represent the association relationship between various database tables.

Taking a simple commodity ordering business as an example, in the business process of this business, refer to Table 1 and Table 2 for the database tables included in the corresponding data flow to be processed.

Table 1

Table 2

As shown in Table 1 and Table 2, each database table may include a table name, field names of several fields, field descriptions, field types, and whether the fields are primary keys. Among them, table 1 is the product table, and table 2 is the order table.

Taking Table 1 as an example, the table name is product, and Table 1 includes three fields whose field names are product_no, product_name and price. Among them, product_no represents the product number, its field type is integer (int), and it is the primary key of Table 1; product_name represents the product name, and its field type is a variable-length string (varchar(100)); price represents the unit price, and its field type is decimal (decimal(10,2)).

The table name of Table 2 is order. In Table 2, order_no and item_no are the primary keys of Table 2. At this time, order_no and item_no can form the joint primary key of Table 2.

In some embodiments, determining the sequence information and key field information of several database tables may include:

Determine the sequence of several database tables in the data stream to be processed, and generate sequence information of several database tables according to the sequence;

An inflow field and an outflow field corresponding to each of the several database tables are determined, and key field information of the several database tables is determined according to the inflow field and the outflow field corresponding to each database table.

It should be noted that, in the embodiment of the present application, the sequence information of the database tables indicates the sequence in which the database tables are generated in the data stream to be processed. Therefore, first determine the sequence of several database tables in the data stream to be processed, and then generate the sequence information of each database table according to the sequence. Here, the sequence information can be represented by a natural number that increases sequentially from 1.

In addition, a data stream to be processed may contain two or more database tables with the same order. For example, in a purchasing flow, the object being purchased may be product A or product B, where product A corresponds to database table A and product B corresponds to database table B. In this case, database table A and database table B have the same sequence information.

For the data stream to be processed composed of Table 1 and Table 2, the sequence may be to form an order table for the product according to the product table, that is, Table 1 is generated first, and Table 2 is generated later. Then the sequence information corresponding to Table 1 is 1, and the sequence information corresponding to Table 2 is 2.
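The numbering rule described above (natural numbers increasing from 1, with tables generated at the same stage sharing one number) can be illustrated with a minimal sketch. The function name `assign_sequence` and the list-of-stages input format are assumptions made for illustration only; the table names `table_a`, `table_b` and `purchase` are hypothetical stand-ins for database table A, database table B and a later table.

```python
def assign_sequence(stages):
    """Map each table name to its sequence information.

    `stages` is an ordered list of lists (an assumed input format): each
    inner list holds the tables generated at the same stage of the flow,
    so they all receive the same sequence number.
    """
    sequence = {}
    for number, tables in enumerate(stages, start=1):
        for table in tables:
            sequence[table] = number
    return sequence


# Product order flow: Table 1 (product) is generated before Table 2 (order).
simple = assign_sequence([["product"], ["order"]])

# Two tables in the same order share one sequence number (hypothetical names).
parallel = assign_sequence([["table_a", "table_b"], ["purchase"]])
```

Here `simple` maps product to 1 and order to 2, while `table_a` and `table_b` both map to 1.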

It should also be noted that, in the embodiments of the present application, when determining the key field information, the inflow field and outflow field corresponding to each of the several database tables can be determined, and the key field information of the database tables can then be determined according to the inflow fields and outflow fields. That is to say, the key field information of a database table may include the inflow field and the outflow field of the database table.

The outflow field is the primary key field of the database table, the inflow field is the primary key field of the inflow table of the database table, and the inflow table is usually the preceding database table whose sequence information is adjacent to that of the database table. Here, the primary key field may be one or more fields in the database table, whose value can be used to uniquely identify a record in the database table.

Further, for each database table, the key field information may also include: an inflow table and a receiving field.

It should be noted that the key field information corresponding to a database table may also include the inflow table name and the receiving field. The inflow table name is the name of the inflow table of the database table, and the receiving field is the field associated with the inflow table (usually the foreign key field of the database table); the receiving field of the database table is associated with the inflow field, that is, the primary key field of the inflow table.

In some embodiments, the data flow model is generated according to the sequence information and key field information of several database tables, which may include:

Generate several data flow nodes according to several database tables;

According to the sequence information and key field information of several database tables, several data flow nodes are connected in series to obtain a data flow model.

It should be noted that several data flow nodes can be generated according to the several database tables, and one database table can generate one or more data flow nodes. When a database table corresponds to one parent table, one data flow node is generated for the database table; when a database table corresponds to multiple parent tables, multiple data flow nodes are generated for the database table.

The content of a data flow node includes but not limited to the sequence information and key field information of the database table. Then, according to the sequence information and key field information of each database table, the obtained data flow nodes are sequentially connected together to obtain the data flow model.

In the data flow model, the data flow nodes are connected in series according to the sequence information of the database table corresponding to each data flow node. For two data flow nodes with adjacent sequence information, the outflow field of the previous data flow node is the inflow field of the latter data flow node, which forms the association between the two data flow nodes and clearly shows the association relationship between the database tables.
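As a rough illustration of connecting nodes in series, the sketch below sorts nodes by sequence information and links each adjacent pair whose outflow field and inflow field match. The function name `link_in_series` and the dictionary keys are assumptions for illustration; the sketch covers only a linear flow with one node per sequence number, and the field values follow the product order example in this document.

```python
def link_in_series(nodes):
    """Return (upstream_table, downstream_table, field) links for a linear flow."""
    ordered = sorted(nodes, key=lambda node: node["seq"])
    links = []
    for previous, current in zip(ordered, ordered[1:]):
        adjacent = current["seq"] == previous["seq"] + 1
        outflow = previous["outflow_field"]
        # Two adjacent nodes are associated when the outflow field of the
        # previous node is the inflow field of the latter node.
        if adjacent and outflow is not None and outflow == current["inflow_field"]:
            links.append((previous["table"], current["table"], outflow))
    return links


# Nodes may arrive in any order; they are sorted by sequence information first.
nodes = [
    {"table": "order", "seq": 2, "outflow_field": None, "inflow_field": "product_no"},
    {"table": "product", "seq": 1, "outflow_field": "product_no", "inflow_field": None},
]
links = link_in_series(nodes)
```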

In some embodiments, determining the inflow field and outflow field corresponding to each database table in several database tables may include:

Determine the primary key field of the first database table, and use the primary key field as the outflow field corresponding to the first database table;

Determining the second database table at the previous data flow node relative to the first database table, determining the second database table as the inflow table corresponding to the first database table, and using the primary key field of the inflow table as the inflow field corresponding to the first database table;

Wherein, the first database table is any one of several database tables.

It should be noted that, for any one of the several database tables (called the first database table), when determining its inflow field and outflow field, the primary key field of the first database table can be determined first and used as the outflow field of the first database table. At the same time, the second database table at the previous data flow node relative to the first database table is determined; that is, the second database table is the database table whose sequence information is adjacent to and precedes that of the first database table. The second database table is the inflow table of the first database table, and the primary key field of the inflow table is used as the inflow field corresponding to the first database table.

In some embodiments, the method may also include:

When the first database table is at the initial data flow node of the data stream to be processed, it is determined that the inflow table and the inflow field corresponding to the first database table are both empty;

When the first database table is at the end data flow node of the data flow to be processed, it is determined that the outflow field corresponding to the first database table is empty.

It should be noted that, if the first database table is the start of the data stream to be processed, it has no inflow table, so its corresponding inflow table and inflow field are both determined to be empty (null), and its receiving field is also empty. If the first database table is the end of the data stream to be processed, it has no outflow field, and its corresponding outflow field is empty.

For Table 1, its outflow field is its primary key field product_no. Since it is the start of the data stream to be processed, there is no inflow of database tables. Therefore, the inflow field, inflow table and receiving field of Table 1 are all empty. For Table 2, its inflow table is Table 1, and its inflow field is the primary key field product_no of Table 1. Since Table 2 is the end of the data flow to be processed, its outflow field is empty.

See Table 3, which is a data flow model generated based on Table 1 and Table 2 (also referred to as a data flow model table).

Table 3

flow_name	seq_no	tab_name	out_col	get_col	in_tab_name	in_col
product order flow	1	product	product_no	null	null	null
product order flow	2	order	null	product_no	product	product_no

In the data flow model shown in Table 3, each data row is a data flow node of the data flow model. Table 3 includes two data flow nodes: one is the data flow node corresponding to Table 1 (that is, the second row of Table 3), and the other is the data flow node corresponding to Table 2 (that is, the third row of Table 3).

In Table 3, flow_name represents the name of the data flow model, usually the name of the data stream to be processed that it represents; since Table 3 represents the product order flow, its flow_name is the product order flow. seq_no represents the sequence information of the database table corresponding to the data flow node. tab_name indicates the table name of the database table corresponding to the data flow node. out_col indicates the outflow field of the data flow node, that is, the outflow field of the database table corresponding to the data flow node. get_col indicates the receiving field of the data flow node, that is, the receiving field of the database table corresponding to the data flow node. in_tab_name indicates the inflow table of the data flow node, that is, the name of the inflow table of the database table corresponding to the data flow node. in_col indicates the inflow field of the data flow node, that is, the inflow field of the database table corresponding to the data flow node.

As shown in Table 3, in addition to sequence information and key field information, the data flow model may also include a data flow name (flow_name), which is used to indicate the data flow to be processed corresponding to the data flow model.

It can be seen that Table 3 characterizes the relationship between Table 1 and Table 2: for example, the inflow and outflow relationship between them (Table 1 flows into Table 2), as well as the inflow field, outflow field, and receiving field of each table.
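The derivation of the two rows of Table 3 from Table 1 and Table 2 can be sketched as follows. This is a minimal sketch for a linear flow only, not the applicant's implementation: the names `FlowNode` and `build_linear_flow`, the input dictionaries, and the choice to join a joint primary key with commas are all assumptions for illustration. The rules it applies are the ones stated above: the outflow field is the table's primary key, the inflow field is the primary key of the inflow table, the start node has empty inflow information, and the end node has an empty outflow field.

```python
from typing import NamedTuple, Optional


class FlowNode(NamedTuple):
    flow_name: str
    seq_no: int
    tab_name: str
    out_col: Optional[str]
    get_col: Optional[str]
    in_tab_name: Optional[str]
    in_col: Optional[str]


def build_linear_flow(flow_name, tables):
    """Build one node per table for a linear flow (one table per stage).

    `tables` is ordered by generation and uses an assumed format:
    {"name": str, "primary_key": tuple of str, "foreign_key": str or None}.
    """
    nodes = []
    last = len(tables) - 1
    for index, table in enumerate(tables):
        upstream = tables[index - 1] if index > 0 else None
        has_upstream = upstream is not None
        nodes.append(FlowNode(
            flow_name=flow_name,
            seq_no=index + 1,
            tab_name=table["name"],
            # The end node of the flow has no outflow field.
            out_col=None if index == last else ",".join(table["primary_key"]),
            # The start node has no receiving field, inflow table or inflow field.
            get_col=table["foreign_key"] if has_upstream else None,
            in_tab_name=upstream["name"] if has_upstream else None,
            in_col=",".join(upstream["primary_key"]) if has_upstream else None,
        ))
    return nodes


tables = [
    {"name": "product", "primary_key": ("product_no",), "foreign_key": None},
    {"name": "order", "primary_key": ("order_no", "item_no"), "foreign_key": "product_no"},
]
model = build_linear_flow("product order flow", tables)
```

With these inputs, `model` holds the same two records as Table 3, with `None` standing in for null.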

It should also be noted that the data flow model generated by the data management method can realize the management of database tables in a certain business process. Here, the business process may not have been actually performed; that is, the embodiment of the present application can generate a data flow model at any time before or after the business is started.

It should also be noted that the basic idea of the embodiments of the present application is to regard each database table as a node, connect the nodes in series according to the order in which data is generated and the relationships between the database tables to form the data flow of the business process, and then store the information of the data flow in the database according to a certain data structure to obtain a data flow model, which includes several data flow nodes. Exemplarily, Table 4 shows a table structure and example data of a data flow model provided by the embodiments of the present application. It mainly describes the content included in a data flow node in the data flow model, as well as the field description, field type and field attribute of each field of the data flow node.

Table 4

As shown in Table 4, data_flow represents a data flow. For a data flow to be processed, the data flow nodes included in the generated data flow model may include the following contents: flow_name, seq_no, tab_name, out_col, get_col, in_tab_name and in_col.

flow_name, seq_no, tab_name, and in_tab_name are the primary key fields of the data flow model, and together they can be used as the joint primary key of the data flow model. seq_no indicates the sequence information of tab_name (the database table corresponding to the data flow node) in the data stream to be processed, and is also called the serial number. in_tab_name is the name of the inflow table (also called the upstream node table name), and in_col is the primary key field of in_tab_name. out_col is the outflow field of the data flow node (usually the primary key field of the data flow node), and get_col is the associated field through which this node receives data from the upstream node, that is, the data flow node corresponding to the inflow table (usually the foreign key field of the data flow node). A complete data flow model consists of the multiple records (that is, data flow nodes) of the data_flow table that belong to the same data stream to be processed.
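To make the storage step concrete, the following sketch keeps data flow nodes in a data_flow table using Python's built-in sqlite3 module and reads one flow back in sequence order. The column names come from the text above, but the column types are assumptions (the field types of Table 4 are not reproduced here), the joint primary key constraint is left out for brevity, and the helper name `load_flow` is hypothetical. The two inserted records are the ones shown in Table 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE data_flow (
        flow_name   TEXT NOT NULL,     -- name of the data flow
        seq_no      INTEGER NOT NULL,  -- sequence information (serial number)
        tab_name    TEXT NOT NULL,     -- table of this data flow node
        out_col     TEXT,              -- outflow field
        get_col     TEXT,              -- receiving field
        in_tab_name TEXT,              -- inflow (upstream) table name
        in_col      TEXT               -- inflow field
    )
    """
)

rows = [
    ("product order flow", 1, "product", "product_no", None, None, None),
    ("product order flow", 2, "order", None, "product_no", "product", "product_no"),
]
conn.executemany("INSERT INTO data_flow VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def load_flow(connection, flow_name):
    """Return (seq_no, tab_name, in_tab_name) for one flow, in sequence order."""
    cursor = connection.execute(
        "SELECT seq_no, tab_name, in_tab_name FROM data_flow "
        "WHERE flow_name = ? ORDER BY seq_no",
        (flow_name,),
    )
    return cursor.fetchall()


flow = load_flow(conn, "product order flow")
```

A single query on flow_name returns every node of the flow, which is the sense in which a complete model is the set of data_flow records for one data stream.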

Table 2, Table 3, and Table 4 show the process of generating a data flow model from the data stream to be processed in a simple business scenario. In practice, there are often more complex scenarios, for example, a business scenario in which one child table corresponds to multiple parent tables.

Therefore, in some embodiments, generating several data flow nodes according to several database tables may include:

If the first database table corresponds to a parent table, then generate a data flow node according to the first database table;

If the first database table corresponds to at least two parent tables, at least two data flow nodes are generated according to the first database table, and the order information of at least two data flow nodes is the same;

Wherein, the first database table is any one of several database tables.

It should be noted that any one of the several database tables is denoted as the first database table, and a parent table is an inflow table of the first database table.

For the first database table (denoted as a child table), its corresponding inflow table is denoted as a parent table. If a child table corresponds to only one parent table, that is, there is only one inflow table in the first database table, as in Tables 2 to 4 above, the first database table corresponds to a data flow node.

If a child table corresponds to at least two parent tables, that is, the first database table has multiple inflow tables, at least two data flow nodes corresponding to the first database table are generated. These data flow nodes have the same sequence information but different inflow information (the inflow information may include the aforementioned inflow table name, inflow field and receiving field), and the number of data flow nodes corresponding to the child table is the same as the number of its parent tables.
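The one-node-per-parent rule can be sketched as below. The function name `nodes_for_table`, the dictionary layout, and the table and field names (`child_tab`, `parent_a`, `parent_b`, and so on) are hypothetical and used only to illustrate the rule; only the inflow-related columns are included.

```python
def nodes_for_table(seq_no, tab_name, parents):
    """Generate the data flow node(s) for one database table.

    `parents` is a list of (in_tab_name, in_col, get_col) tuples, one per
    parent (inflow) table; it is empty for a table at the start of the flow.
    """
    if not parents:
        return [{"seq_no": seq_no, "tab_name": tab_name,
                 "in_tab_name": None, "in_col": None, "get_col": None}]
    # One node per parent table: same sequence information, different inflow information.
    return [
        {"seq_no": seq_no, "tab_name": tab_name,
         "in_tab_name": in_tab_name, "in_col": in_col, "get_col": get_col}
        for in_tab_name, in_col, get_col in parents
    ]


single = nodes_for_table(2, "child_tab", [("parent_a", "a_no", "ref_no")])
multiple = nodes_for_table(
    2, "child_tab",
    [("parent_a", "a_no", "ref_no"), ("parent_b", "b_no", "ref_no")],
)
```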

Exemplarily, taking a material procurement flow as an example of the data flow to be processed, the database tables it includes are Tables 5 to 8 below.

Table 5

Table 6

Table 7

Table 8

Tables 5 to 8 are the database tables of the material procurement business of a certain manufacturing industry system. Among them, Table 5 is the table of engineering components, Table 6 is the table of standard parts, Table 7 is the table of technical requirements, and Table 8 is the table of purchase orders. Engineering parts are special materials for a certain project, and standard parts are common materials for all projects. Since the characteristic attributes of the two materials are quite different, they are stored in two tables.

In this kind of business, a technical requirement may be for an engineering component or for a standard part, so Table 5 and Table 6 are both parent tables of Table 7. In this case, two data flow nodes need to be generated for Table 7 when generating the data flow model. The final generated data flow model is shown in Table 9.

Table 9 Example data of material procurement flow in a manufacturing industry

As shown in Table 9, since both Table 5 and Table 6 are at the start of the material procurement flow, the data flow nodes corresponding to Table 5 and Table 6 share one auto-incrementing sequence (the sequence information) to ensure that the primary keys of the two tables do not conflict. The material procurement process in this industry is that the technical department first creates a technical demand list, specifying the materials and quantities to be purchased, and then submits it to the purchasing department, which creates a purchase order and initiates the procurement.

There are two start data flow nodes in the data flow model, proj_mat and std_mat, so there are two records with seq_no 1 in the data flow model. The receiving field mat_no of tech_mat_req comes either from the outflow field proj_mat_no of proj_mat or from the outflow field std_mat_no of std_mat, so there are two records with seq_no 2 in the data flow model.

In addition, since the primary key of tech_mat_req is the joint primary key composed of tmr_no and tmr_item_no, the outflow field of the table tech_mat_req and the inflow field and receiving field of the table purchase_order are expressed in the tuple format of the joint primary key. Tables 5 to 9 illustrate how to use the data flow model to represent complex business scenarios where one child table corresponds to multiple parent tables.
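The records described in the preceding paragraphs can be sketched in code. This is a reconstruction from the prose only, not a copy of Table 9: the table and field names (proj_mat, std_mat, tech_mat_req, purchase_order, mat_no, tmr_no, tmr_item_no) come from the text, while the helper nodes_for_table, the tuple layout, the seq_no of 3 for purchase_order, and the string form of the joint key are assumptions.

```python
from collections import Counter

FLOW = "material procurement flow"
TMR_KEY = "(tmr_no, tmr_item_no)"  # joint primary key written as a tuple string


def nodes_for_table(seq_no, tab_name, out_col, get_col=None, parents=()):
    """Return one data flow node per parent table (hypothetical helper).

    Each node is (flow_name, seq_no, tab_name, in_tab_name, in_col,
    out_col, get_col). A table with no parent yields a single start node
    whose inflow information is None.
    """
    if not parents:
        return [(FLOW, seq_no, tab_name, None, None, out_col, None)]
    return [
        (FLOW, seq_no, tab_name, in_tab, in_col, out_col, get_col)
        for in_tab, in_col in parents
    ]


model = (
    nodes_for_table(1, "proj_mat", "proj_mat_no")
    + nodes_for_table(1, "std_mat", "std_mat_no")
    # The child table has two parents, so two nodes share seq_no 2.
    + nodes_for_table(
        2, "tech_mat_req", TMR_KEY, "mat_no",
        [("proj_mat", "proj_mat_no"), ("std_mat", "std_mat_no")],
    )
    # The end node has no outflow field.
    + nodes_for_table(3, "purchase_order", None, TMR_KEY,
                      [("tech_mat_req", TMR_KEY)])
)

records_per_seq = Counter(node[1] for node in model)
```

In this sketch the two tech_mat_req nodes differ only in their inflow table and inflow field, which is why the inflow table name is needed to tell them apart.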

Furthermore, in reality, there is a scenario where one data stream to be processed corresponds to multiple business processes. For this kind of complex data stream to be processed, it can also be processed in the manner of first splitting and then merging. Referring to FIG. 2 , it shows a schematic flowchart of another data management method provided by an embodiment of the present application. As shown in Figure 2, the method may include:

S201. Perform split processing on several database tables included in the data stream to be processed to obtain at least two groups of database tables.

S202. Determine sequence information and key field information of each group of database tables in at least two groups of database tables.

S203. Generate at least two data flow sub-models according to the sequence information and key field information of each group of database tables; each data flow sub-model is used to represent the association relationship between each group of database tables.

It should be noted that, in some complex business scenarios, although the data flow to be processed corresponds to the business process of a single target business, that business may proceed through different business processes; that is, the data flow to be processed corresponds to at least two business processes.

In this case, the embodiment of the present application may also split the database tables included in the data stream to be processed, that is, split them into at least two groups of database tables. Here, a database table that participates in multiple business processes at the same time is included in each of the corresponding groups of database tables during splitting.

For each group of database tables obtained by splitting, the order information and key field information of each group of database tables are respectively determined, and corresponding data flow sub-models are generated accordingly. Each data flow sub-model is used to represent the relationship between each group of database tables.

For each group of database tables, the method of determining the data flow sub-model is as described above.

Exemplarily, the procurement process of materials may be divided into two situations: one is to generate a purchase order directly from the technical demand list, and the other is to first generate an inquiry form from the technical demand list and then generate a purchase order. This scenario starts with the technical demand list, passes through the two branches of inquiry and non-inquiry, and finally merges into the purchase order. For this scenario, the business process should be attributed to two business flows, that is, the data flow to be processed is divided into an inquiry procurement flow and a non-inquiry procurement flow.

Taking this material procurement scenario of inquiry and non-inquiry as an example, the database tables included in the data flow to be processed are Table 7 and the following Tables 10-14.

Table 10

Table 11

Table 12

Table 13

Table 14

Among them, Table 7 is the technical demand list table; Table 10 is the inquiry form table; Table 11 is the association table between technical requirements and inquiries (the tmr and enq association table); Table 12 is the association table between inquiries and procurement (the enq and po association table); Table 13 is the association table between technical requirements and procurement (the tmr and po association table); and Table 14 is the purchase order table.

It should be noted that, for the material procurement flow with the two business processes of inquiry and non-inquiry, there may be many-to-many relationships among the technical demand list, the inquiry form and the purchase order, so they need to be associated through several association tables in order to distinguish the inquiry process from the non-inquiry process. In addition, the tmr_no and tmr_item_no fields are no longer needed in the purchase order table.

The data flow to be processed, composed of Table 7 and Tables 10 to 14, is split, and the result of the split is two groups of data flows: the inquiry procurement flow and the non-inquiry procurement flow. The inquiry procurement flow includes Table 7, Table 11, Table 10, Table 12 and Table 14, which correspond to the material procurement process in the inquiry scenario; the non-inquiry procurement flow includes Table 7, Table 13 and Table 14, which correspond to the material procurement process in the non-inquiry scenario.
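The split described above can be sketched as a grouping step in which a table that takes part in several business processes is copied into every corresponding group. The function split_tables and the membership mapping are illustrative assumptions. The table names tmr_enq_rel, enquiry, enq_po_rel and tmr_po_rel are hypothetical placeholders for Tables 11, 10, 12 and 13, and reusing the name purchase_order for Table 14 is likewise assumed.

```python
INQUIRY = "inquiry procurement flow"
NON_INQUIRY = "non-inquiry procurement flow"

# Tables in business order; names other than tech_mat_req are placeholders.
ORDERED_TABLES = [
    "tech_mat_req",    # Table 7
    "tmr_enq_rel",     # Table 11 (hypothetical name)
    "enquiry",         # Table 10 (hypothetical name)
    "enq_po_rel",      # Table 12 (hypothetical name)
    "tmr_po_rel",      # Table 13 (hypothetical name)
    "purchase_order",  # Table 14 (name assumed)
]

# Which business processes each table participates in.
MEMBERSHIP = {
    "tech_mat_req": (INQUIRY, NON_INQUIRY),
    "tmr_enq_rel": (INQUIRY,),
    "enquiry": (INQUIRY,),
    "enq_po_rel": (INQUIRY,),
    "tmr_po_rel": (NON_INQUIRY,),
    "purchase_order": (INQUIRY, NON_INQUIRY),
}


def split_tables(ordered_tables, membership):
    """Group tables by business process, preserving their order.

    A table that belongs to several processes appears in each group.
    """
    groups = {}
    for table in ordered_tables:
        for process in membership[table]:
            groups.setdefault(process, []).append(table)
    return groups


groups = split_tables(ORDERED_TABLES, MEMBERSHIP)
```

Each resulting group can then be turned into its own data flow sub-model in the same way as a single data flow.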

Table 15 shows the data flow sub-model obtained according to the inquiry procurement flow.

Table 15

Table 16 shows the data flow sub-model obtained according to the non-inquiry procurement flow.

Table 16

As shown in Tables 15 and 16, dividing the database tables corresponding to the inquiry and non-inquiry business processes into two data streams describes the business process more clearly. The data stream to be processed is divided into two groups of database tables, and two data flow sub-models are obtained. The database table corresponding to the start data flow node of both sub-models is Table 7, and the database table corresponding to the end data flow node of both is Table 14; that is, the two sub-models have the same starting point and, after diverging, finally merge into the same end point.

Further, in some embodiments, the method may also include:

Merge processing is performed on at least two data flow sub-models to obtain a data flow model.

It should be noted that, after the at least two data flow sub-models are obtained, they can also be merged, for example spliced together or saved in the same directory, to obtain the data flow model for this complex scenario.

This example illustrates how to use the data flow model to represent a scenario in which data is split and then merged. For a very complex data flow to be processed, if splitting and re-merging occur in a middle stage, the data flow to be processed can be divided into multiple stages for processing. When naming the data streams, "XX flow-XX stage" can be used to distinguish them, and the database tables of each branch are placed in separate data streams. For example, the data flow corresponding to Table 15 is named "material procurement flow-inquiry stage", and the data flow corresponding to Table 16 is named "material procurement flow-no inquiry stage".
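One possible reading of the merge step is a simple concatenation of the sub-model records, with each record labelled by an "XX flow-XX stage" name. The sketch below assumes that reading. The function merge_submodels is hypothetical, only the start and end nodes of each branch are shown, and the seq_no values assume one node per table in the order listed for each branch.

```python
def merge_submodels(flow, stages):
    """Concatenate sub-models, naming each record '<flow>-<stage>'.

    stages maps a stage name to the list of node records of that sub-model.
    """
    merged = []
    for stage, nodes in stages.items():
        for node in nodes:
            merged.append({**node, "flow_name": f"{flow}-{stage}"})
    return merged


# Only start and end nodes are listed; intermediate nodes and the
# remaining columns are omitted to keep the sketch short.
inquiry_nodes = [
    {"seq_no": 1, "tab_name": "tech_mat_req"},
    {"seq_no": 5, "tab_name": "purchase_order"},
]
no_inquiry_nodes = [
    {"seq_no": 1, "tab_name": "tech_mat_req"},
    {"seq_no": 3, "tab_name": "purchase_order"},
]

merged = merge_submodels(
    "material procurement flow",
    {"inquiry stage": inquiry_nodes, "no inquiry stage": no_inquiry_nodes},
)
flow_names = {node["flow_name"] for node in merged}
```

Because the stage name is part of flow_name, the shared start and end tables remain distinguishable after the records are stored together.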

Further, after the data flow model is obtained, the data flow model can also be compared with the data dictionary to ensure that the information is correct. Therefore, in some embodiments, after generating the data flow model, the method may further include:

Compare the data information in the data flow model with the data dictionary; wherein, the data dictionary stores the data information corresponding to the data flow to be processed;

If the data information in the data flow model is inconsistent with the data information in the data dictionary, the data information in the data flow model is corrected based on the data information in the data dictionary, so that the data information in the data flow model is consistent with the data information in the data dictionary.

It should be noted that the data dictionary stores the business data of the target business corresponding to the data flow to be processed. After the data flow model corresponding to the data flow to be processed is obtained, the data flow model is compared and checked against the data dictionary. If the information in the two is consistent, the generated data flow model is correct; if it is inconsistent, the data flow information in the data flow model is corrected based on the data information in the data dictionary, so that the data flow model is consistent with the data dictionary and an accurate data flow model is obtained.
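The comparison and correction can be sketched as follows. The text does not specify the structure of the data dictionary, so this sketch assumes it can be reduced to a mapping from table name to primary key field. The function reconcile, the key order_no and the deliberately wrong value prod_no are illustrative only.

```python
def reconcile(nodes, data_dictionary):
    """Correct key fields of data flow nodes against a data dictionary.

    data_dictionary maps table name -> primary key field (assumed shape).
    Returns the corrected nodes and a list of
    (tab_name, column, old_value, new_value) corrections.
    Raises KeyError if a table is missing from the dictionary.
    """
    corrected, changes = [], []
    for node in nodes:
        fixed = dict(node)
        own_pk = data_dictionary[node["tab_name"]]
        # The outflow field is expected to be the table's own primary key.
        if node["out_col"] is not None and node["out_col"] != own_pk:
            fixed["out_col"] = own_pk
            changes.append((node["tab_name"], "out_col", node["out_col"], own_pk))
        # The inflow field is expected to be the inflow table's primary key.
        in_tab = node["in_tab_name"]
        if in_tab is not None and node["in_col"] != data_dictionary[in_tab]:
            fixed["in_col"] = data_dictionary[in_tab]
            changes.append((node["tab_name"], "in_col", node["in_col"], fixed["in_col"]))
        corrected.append(fixed)
    return corrected, changes


data_dictionary = {"product": "product_no", "order": "order_no"}
model = [
    {"tab_name": "product", "in_tab_name": None, "in_col": None,
     "out_col": "product_no", "get_col": None},
    {"tab_name": "order", "in_tab_name": "product", "in_col": "prod_no",
     "out_col": None, "get_col": "product_no"},
]
corrected, changes = reconcile(model, data_dictionary)
```

Returning the list of corrections alongside the corrected nodes keeps a record of what was inconsistent, which may be useful when the two are maintained together.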

Furthermore, based on the generated data flow model, the database table corresponding to each data flow node can also be queried, and the upstream and downstream relationship of any data flow node can be queried. Therefore, in some embodiments, after generating the data flow model, the method may further include:

Determine the information to be queried;

Perform a query in the data flow model based on the information to be queried, determine a database table corresponding to the information to be queried, and/or determine a data flow node corresponding to the information to be queried and an association relationship between the data flow nodes.

It should be noted that the information to be queried is determined first. Here, the sequence information and/or the key field information can be used as the information to be queried in the data flow model, so that at least one data flow node corresponding to the information to be queried is obtained, the database table corresponding to that data flow node is further obtained, and the association relationship between that data flow node and other data flow nodes can be obtained at the same time. The association relationship may include the upstream and downstream relationship between data flow nodes, the association between key field information, and so on. A data flow node that flows into a given data flow node is called its upstream data flow node, and a data flow node into which the given data flow node flows is called its downstream data flow node. For example, in the aforementioned Table 3, the product order flow 1 in the second row is the upstream data flow node of the product order flow 2 in the third row, and the product order flow 2 is the downstream data flow node of the product order flow 1. When querying, Structured Query Language (SQL) statements can be used to conveniently query the upstream and downstream relationships and complex logical associations of multiple database tables.
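As an illustration of such queries, the following sketch loads the two product order flow nodes into an in-memory SQLite table and looks up the downstream node of product and the upstream node of order. The column names and the product_no association follow the text; the table name data_flow, the column types and the query wording are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_flow ("
    "flow_name TEXT, seq_no INTEGER, tab_name TEXT, in_tab_name TEXT, "
    "in_col TEXT, out_col TEXT, get_col TEXT)"
)
conn.executemany(
    "INSERT INTO data_flow VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # Start node: no inflow table, inflow field or receiving field.
        ("product order flow", 1, "product", None, None, "product_no", None),
        # End node: no outflow field.
        ("product order flow", 2, "order", "product", "product_no", None, "product_no"),
    ],
)

# Downstream nodes of 'product': rows whose inflow table is 'product'.
downstream = conn.execute(
    "SELECT tab_name, get_col FROM data_flow "
    "WHERE flow_name = ? AND in_tab_name = ? ORDER BY seq_no",
    ("product order flow", "product"),
).fetchall()

# Upstream nodes of 'order': the inflow tables recorded on its own rows.
upstream = conn.execute(
    "SELECT in_tab_name, in_col FROM data_flow "
    "WHERE flow_name = ? AND tab_name = ? AND in_tab_name IS NOT NULL",
    ("product order flow", "order"),
).fetchall()
```

Both lookups are plain filters on one table, so no join against the business tables themselves is needed to trace the relationship.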

This embodiment provides a data management method: a data stream to be processed is obtained, the data stream to be processed including several database tables; sequence information and key field information of the several database tables are determined; and a data flow model is generated according to the sequence information and key field information of the several database tables, wherein the data flow model is used to represent the association relationship between the several database tables. In this way, the data flow model generated based on the sequence information and key field information of the several database tables in the data stream to be processed can not only realize efficient management of these database tables and reduce the cost of manual maintenance, but is also applicable to complex application scenarios. In addition, because the data flow model can completely record the relationships between database tables, the performance problems caused by creating physical foreign keys can be avoided, and data flow information can be conveniently queried in the data flow model, which improves the efficiency of data management. Furthermore, for scenarios where one child table corresponds to multiple parent tables, or where the data flow to be processed is too complex, the embodiment of this application can generate multiple data flow nodes for one database table, or split the data stream to be processed, obtain the data flow sub-models separately and then merge them, so that the data streams to be processed in these complex scenarios can be converted into a concise and clear data flow model for efficient management.

In another embodiment of the present application, refer to FIG. 3 , which shows a schematic flowchart of another data management method provided by the embodiment of the present application. As shown in Figure 3, the method may include:

S301. Determine the data flow to be processed.

It should be noted that a data flow is a description of a business process, and in this embodiment of the application, each data flow to be processed generally corresponds to only one business process. To determine the data flow to be processed is to determine the business process. The data flow to be processed can be named according to the business process, for example: product order flow, material procurement flow, etc., and the name can be used as the unique identifier of the data flow to be processed.

S302. Determine all database tables included in the data stream to be processed.

It should be noted that after the data flow to be processed is determined, all database tables involved in the business process corresponding to the data flow to be processed need to be determined.

It should also be noted that the same database table may exist in multiple different data streams to be processed, that is, different data streams to be processed may include the same database table. For example, the aforementioned Table 7 exists in two data streams to be processed respectively.

Exemplarily, for the product order flow in the aforementioned embodiments, the database tables it includes are Table 1 (product table) and Table 2 (order table); for the material procurement flow in the aforementioned embodiments, the database tables it includes are Table 5 (engineering component table), Table 6 (standard part table), Table 7 (technical demand list table) and Table 8 (purchase order table); for the complex material procurement flow in the aforementioned embodiments that needs to be split and then merged, the database tables it includes are Table 7 (technical demand list table), Table 10 (inquiry form table), Table 11 (technical requirements and inquiry association table), Table 12 (inquiry and procurement association table), Table 13 (technical requirements and procurement association table) and Table 14 (purchase order table).

S303. Determine the sequence of all database tables in the data stream to be processed, number each database table, determine the inflow and outflow field information of each database table, and determine the data flow information.

S304. Write the determined data flow information into the data flow model.

It should be noted that, according to the sequence of database tables generated in the business process, the sequence of all database tables in the data stream to be processed is determined, each database table is numbered, and the inflow and outflow field information of each database table is determined.

The determined data flow information may include sequence information, an inflow field, an outflow field, and may also include a receiving field, an inflow table, and the like.

Specifically, according to the order of the database tables in the data stream to be processed, the sequence number of each database table in the data stream to be processed is generated, that is, the sequence information in the foregoing embodiments. Here, a sequence of numbers from 1 to n can be generated in numerical order and written into the data flow model.

In the data flow model, each row of data is a data flow node. For each data flow node, the outflow field is the primary key of the database table itself, the receiving field is the foreign key field, the inflow table is the database table of the previous node, and the inflow field is the primary key of the inflow table (the receiving field of the database table is associated with the inflow field of the inflow table).

The data flow node corresponding to the first database table is the start data flow node, and the receiving field, inflow table, and inflow field of the start data flow node are null; the data flow node corresponding to the last database table is the end data flow node, and the outflow field of the end data flow node is null.
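The numbering and null rules above can be sketched for the simple linear case, where each table has exactly one inflow table. The function build_linear_model and the dictionary layout are assumptions, and order_no is a hypothetical primary key name for the order table (it does not reach the output because the end node has no outflow field).

```python
def build_linear_model(flow_name, tables):
    """Build data flow nodes for tables given in business order.

    tables is a list of (tab_name, primary_key, foreign_key) tuples, where
    foreign_key is the field that references the previous table.
    """
    nodes = []
    last = len(tables) - 1
    for index, (tab_name, primary_key, foreign_key) in enumerate(tables):
        previous = tables[index - 1] if index > 0 else None
        nodes.append({
            "flow_name": flow_name,
            "seq_no": index + 1,  # serial numbers run from 1 to n
            "tab_name": tab_name,
            # The start node has no inflow table, inflow field or receiving field.
            "in_tab_name": previous[0] if previous else None,
            "in_col": previous[1] if previous else None,
            "get_col": foreign_key if previous else None,
            # The end node has no outflow field.
            "out_col": primary_key if index < last else None,
        })
    return nodes


nodes = build_linear_model(
    "product order flow",
    [("product", "product_no", None), ("order", "order_no", "product_no")],
)
```

A child table with several parent tables does not fit this linear helper and would need one node per parent, as described for the material procurement flow.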

In an implementation of the embodiment of the present application, for a product order flow that includes only two database tables (Table 1 and Table 2), in the data flow model (Table 3) the serial number of the data flow node corresponding to Table 1 is 1, and the serial number of the data flow node corresponding to Table 2 is 2. Table 3 shows the data flow model corresponding to the business process.

As shown in Table 3, it includes two data flow nodes: product order flow 1 and product order flow 2. Product order flow 1 is the data flow node corresponding to Table 1 (the product table, product), and product order flow 2 is the data flow node corresponding to Table 2 (the order table, order).

Since the product order flow 1 corresponding to the product table is the start data flow node of the data flow model, the get_col, in_tab_name, and in_col of the product order flow 1 are all null; the product order flow 2 corresponding to the order table is the end data flow node of the data flow model, so the out_col of product order flow 2 is null. Among them, product.product_no and order.product_no are associated with each other.

Tables 1 to 3 show a simple business scenario of product order flow. In practice, there are often more complex business scenarios.

In another implementation of the embodiment of the present application, consider a complex situation where one child table corresponds to multiple parent tables, such as the material procurement flow. Tables 5 to 8 are the database tables of the material procurement business of a certain manufacturing industry system. Engineering components can be special materials for a certain project, and standard parts can be general materials for all projects. Because the characteristic attributes of the two kinds of material differ greatly, they are stored in two tables (Table 5 and Table 6). The material procurement process in this industry is that the technical department first creates a technical demand list, specifying the materials to be purchased and their quantities, and then submits it to the purchasing department, which creates a purchase order and initiates the purchase. The data flow information of material procurement in this industry is shown in Table 9.

In this case, Table 7 can be associated with Table 5 and can also be associated with Table 6; that is, Table 7 is a child table with two parent tables, Table 5 and Table 6. In the database model, the primary keys of the data flow nodes corresponding to Table 5 and Table 6 share one auto-incrementing sequence (that is, the order information) to ensure that the primary keys of the two tables do not conflict. As shown in Table 9, since the data flow nodes corresponding to Table 5 and Table 6 can both be used as start data flow nodes of the data flow model, the node serial numbers of the two data flow nodes corresponding to Table 5 and Table 6 are both 1. tech_mat_req.mat_no comes from either proj_mat.proj_mat_no or std_mat.std_mat_no, so there are two records with seq_no 2 for the table tech_mat_req.

In addition, since the primary key of tech_mat_req is the joint primary key of tmr_no and tmr_item_no, the outflow field of the table tech_mat_req and the inflow field and receiving field of the table purchase_order are expressed in tuple format. This example illustrates how to use the data flow model to represent a complex business scenario where one child table corresponds to multiple parent tables.

In yet another implementation of the embodiment of the present application, there may be different business processes in the procurement of materials. For example, the procurement process of materials may be divided into two situations: one is to generate a purchase order directly from a technical demand list, and the other is to first generate an inquiry form from the technical demand list and then generate a purchase order. There may be a many-to-many relationship between tmr, enq, and po, so several association tables are needed, and the purchase order table no longer needs the tmr_no and tmr_item_no fields. This scenario starts with a technical demand list, passes through the two branches of inquiry and non-inquiry, and finally merges into the purchase order. For this scenario, the business can be attributed to two data flow sub-models, as shown in Table 15 and Table 16.

As shown in Tables 15 and 16, dividing inquiry and non-inquiry into two data streams describes the business process more clearly. This example illustrates how to use the data flow model to represent a scenario in which data is split and then merged. For a very complex data flow, if splitting and re-merging occur in a middle stage, the data flow can be divided into multiple stages for processing. The data flows are named in the form "XX flow-XX stage" to distinguish them, and the information of each branch is kept in a separate data flow.

Furthermore, when inserting data flow information into the data flow model, it can be realized by the following statement:

This statement can be executed repeatedly until the number of data flow nodes in the data flow model reaches the total number of data flow nodes contained in the data flow model.
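As an illustrative sketch only (not the statement of the original filing), inserting data flow information could look like the following. It assumes a table named data_flow with the columns described for the data flow model and uses Python's sqlite3 module; the parameterized INSERT is executed once per data flow node until all nodes of the flow are written.

```python
import sqlite3

# Hypothetical parameterized insert; one execution writes one data flow node.
INSERT_SQL = (
    "INSERT INTO data_flow "
    "(flow_name, seq_no, tab_name, in_tab_name, in_col, out_col, get_col) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_flow ("
    "flow_name TEXT, seq_no INTEGER, tab_name TEXT, in_tab_name TEXT, "
    "in_col TEXT, out_col TEXT, get_col TEXT)"
)

pending_nodes = [
    ("product order flow", 1, "product", None, None, "product_no", None),
    ("product order flow", 2, "order", "product", "product_no", None, "product_no"),
]

# Repeat the statement until every node of the flow has been written.
for node in pending_nodes:
    conn.execute(INSERT_SQL, node)
conn.commit()

node_count = conn.execute(
    "SELECT COUNT(*) FROM data_flow WHERE flow_name = ?",
    ("product order flow",),
).fetchone()[0]
```

Passing None for a parameter stores SQL NULL, which matches the null inflow information of a start node and the null outflow field of an end node.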

When there is an update requirement for data flow services, such as updating data based on the aforementioned determined data flow information (including sequence information, inflow fields, outflow fields, receiving fields, and inflow tables, etc.), it can be implemented through the following statements:

The parameters of this statement can be set according to the aforementioned determined data flow information, and the fields to be updated are all optional, and the data is updated according to the actual situation.

S305. Check the data flow information, and keep it consistent with the relevant information of the data dictionary.

After the data flow information has been processed and the data flow model obtained, check whether the table names and key field information in the data flow are consistent with the table names and key field information in the data dictionary, and keep the two consistent so that the data flow model can subsequently be used in conjunction with the data dictionary.

To sum up, the embodiment of the present application provides a data flow model and a method for generating a data flow model. The data flow model can be used to describe the relationships between tables in the business system database, so that the logic of the business system is easier to understand; at the same time, it makes the system easier to maintain and to develop further, and also makes it more convenient to integrate system data into a data warehouse later. In the embodiment of the present application, a concise data flow model is used to record the relationships between tables in the database, and it can be stored in the database and coexist with the data dictionary. In the data flow model, SQL statements can be used to easily query the upstream and downstream relationships and complex logical associations of multiple database tables. At the same time, the system also avoids the various performance problems caused by creating physical foreign keys.

A simple example is used to illustrate the use of the data flow model. Taking product ordering as an example, as shown in Tables 1 to 3, the two records in Table 3 are the data flow node information of the product table and the order table of the product order flow. Since the product table is the start data flow node, its get_col, in_tab_name, and in_col are null; since the order table is the end data flow node, its out_col is null; and product.product_no and order.product_no are associated with each other.

The above is a simple example of the data flow model. The core of the data flow model lies in the collection of data flow information. The complete data collection process in the embodiment of the present application is as follows: (1) Determine the data flow. (2) Determine all database tables included in the data stream. (3) Determine the sequence of all database tables in the data stream, number each database table, and determine the inflow and outflow fields of each database table in the data stream. (4) Write the determined data flow information into the data flow model. (5) Check the data flow information and keep it consistent with the relevant information of the data dictionary.

In related technical solutions, recording the relationships between tables in the business system database through physical foreign keys or through documents in the data dictionary leads to system performance problems, low efficiency, and a tendency to errors. In contrast, the embodiment of this application proposes a data flow model, which can not only completely record the relationships between database tables and avoid the performance problems caused by creating physical foreign keys, but also allows data flow information to be queried in the database, which is far more convenient and efficient than using documents. In addition, along with the data flow model, the embodiment of the present application also proposes a data standard for business scenarios in which one child table corresponds to multiple parent tables and in which data is split and then merged, so that more complex business scenarios can be recorded. Combined with the single-table metadata of the data dictionary, the data flow model will have a wider range of application scenarios.

This embodiment provides a data management method and is a detailed description of the specific implementation of the foregoing embodiments. It can be seen that, compared with related technologies, the technical solution provided by this embodiment of the application has at least the following advantages: (1) Related technologies use data fields to manage metadata, which requires establishing physical foreign keys on tables to record the relationships between tables. Establishing physical foreign keys makes system development and data processing more difficult and affects system performance; using the data flow model avoids establishing physical foreign keys during system development, and thus avoids these related problems. (2) Related technologies use documents to manage metadata, which requires a large manual maintenance workload, makes the metadata difficult to consult, and easily causes inconsistencies with system information. Using the data flow model means creating a data flow table in the system database and maintaining data flow information in the database, where it can be easily compared with the data dictionary to avoid inconsistencies with system information and where the data flow information can also be easily queried. (3) The embodiment of the present application uses a data flow model, which can represent not only the association relationship between two adjacent tables but also the complete sequence of association relationships along the entire data link. Establishing a data flow model is a process of sorting out the business logic, which can be carried out simultaneously with the development of the business system to help discover problems in the business logic. The data flow model can also represent complex business scenarios in which one child table corresponds to multiple parent tables and in which data is split and then merged, and therefore has wider application scenarios than related technologies.

In yet another embodiment of the present application, refer to FIG. 4 , which shows a schematic diagram of the composition and structure of a data management device 40 provided in the embodiment of the present application. As shown in FIG. 4, the data management device 40 may include an acquisition unit 401, a determination unit 402 and a generation unit 403, wherein,

The obtaining unit 401 is configured to obtain a data stream to be processed, and the data stream to be processed includes several database tables;

A determination unit 402 configured to determine sequence information and key field information of several database tables;

The generating unit 403 is configured to generate a data flow model according to sequence information and key field information of several database tables; wherein, the data flow model is used to represent the association relationship between several database tables.

In some embodiments, the determining unit 402 is specifically configured to determine the sequence of several database tables in the data stream to be processed, and generate sequence information of several database tables according to the sequence; and determine the sequence information of several database tables The inflow field and outflow field corresponding to each database table, and the key field information of several database tables are determined according to the inflow field and outflow field corresponding to each database table.

In some embodiments, the generation unit 403 is specifically configured to generate several data flow nodes according to several database tables; and to concatenate several data flow nodes according to the sequence information and key field information of several database tables , get the data flow model.

In some embodiments, the determining unit 402 is also specifically configured to determine the primary key field of the first database table, and use the primary key field as the outflow field corresponding to the first database table; and determine the first database table corresponding to the first data flow node Two database tables, the second database table is determined as the inflow table corresponding to the first database table, and the primary key field of the inflow table is used as the inflow field corresponding to the first database table; wherein, the first database table is one of several database tables any database table.

In some embodiments, the determining unit 402 is further configured to determine that both the inflow table and the inflow field corresponding to the first database table are empty when the first database table is at the start data flow node of the data stream to be processed; and to determine that the outflow field corresponding to the first database table is empty when the first database table is at the end data flow node of the data stream to be processed.
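As a non-limiting sketch, the rules above for inflow and outflow fields, including the empty values at the start and end data flow nodes, could be expressed as below. The function name `determine_key_fields`, the dictionary keys and the example tables are assumptions made for illustration only, and the sketch assumes a linear stream with one primary key field per table.

```python
from typing import Optional


def determine_key_fields(
    ordered_tables: list[str],
    primary_keys: dict[str, str],
) -> dict[str, dict[str, Optional[str]]]:
    """Derive inflow table, inflow field and outflow field for each table."""
    info = {}
    last = len(ordered_tables) - 1
    for i, table in enumerate(ordered_tables):
        # The table at the preceding node is the inflow table; the start node has none.
        inflow_table = ordered_tables[i - 1] if i > 0 else None
        info[table] = {
            "inflow_table": inflow_table,
            # The inflow field is the primary key of the inflow table.
            "inflow_field": primary_keys[inflow_table] if inflow_table else None,
            # The outflow field is the table's own primary key; empty at the end node.
            "outflow_field": primary_keys[table] if i < last else None,
        }
    return info


key_info = determine_key_fields(
    ["customer", "orders", "shipment"],
    {"customer": "customer_id", "orders": "order_id", "shipment": "shipment_id"},
)
```

Here `None` stands in for "empty": the start table `customer` has no inflow table or inflow field, and the end table `shipment` has no outflow field.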

In some embodiments, the generating unit 403 is further specifically configured to generate one data flow node according to the first database table if the first database table corresponds to one parent table; and to generate at least two data flow nodes according to the first database table if the first database table corresponds to at least two parent tables, the sequence information of the at least two data flow nodes being the same; wherein the first database table is any one of the several database tables.
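The parent-table rule above may be illustrated with the following minimal sketch. It assumes, beyond what the text states, that exactly one node is generated per parent table and that a table without any parent still yields a single node; `generate_nodes` and the table names are hypothetical.

```python
from typing import Optional


def generate_nodes(table: str, sequence: int, parents: list[str]) -> list[dict]:
    """Return the data flow node(s) generated for one database table."""
    if len(parents) <= 1:
        # One parent (or none, an assumption for the start table): a single node.
        inflow: Optional[str] = parents[0] if parents else None
        return [{"table": table, "sequence": sequence, "inflow_table": inflow}]
    # Two or more parents: one node per parent, all sharing the same sequence.
    return [{"table": table, "sequence": sequence, "inflow_table": parent}
            for parent in parents]


single = generate_nodes("orders", 2, ["customer"])
multiple = generate_nodes("invoice", 3, ["orders", "contract"])
```

Keeping the same sequence value on every node of a multi-parent table lets each node carry a different inflow table while the table still occupies a single position in the stream.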

In some embodiments, as shown in FIG. 4 , the data management device may further include a splitting unit 404 configured to split several database tables included in the data stream to be processed to obtain at least two groups of database tables;

The determination unit 402 is further configured to determine the sequence information and key field information of each group of database tables in at least two groups of database tables;

The generating unit 403 is further configured to generate at least two data flow sub-models according to the sequence information and key field information of each group of database tables; wherein each data flow sub-model is used to represent the association relationship between the database tables in the corresponding group.

In some embodiments, as shown in FIG. 4 , the data management apparatus may further include a merging unit 405 configured to perform merging processing on at least two data flow sub-models to obtain a data flow model.
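A possible reading of the splitting and merging described above is sketched below. The passage does not state the splitting criterion or the merging rule, so the sketch assumes a hypothetical `group` label on each table and treats merging as a duplicate-free union of nodes and associations; all function and table names are illustrative.

```python
from collections import defaultdict


def split_tables(tables: list[dict]) -> dict[str, list[dict]]:
    """Split the tables into groups by their (hypothetical) group label."""
    groups = defaultdict(list)
    for table in tables:
        groups[table["group"]].append(table)
    return dict(groups)


def build_submodel(group: list[dict]) -> dict:
    """Chain the tables of one group in sequence order."""
    names = [t["name"] for t in sorted(group, key=lambda t: t["sequence"])]
    return {"nodes": names, "edges": list(zip(names, names[1:]))}


def merge_submodels(submodels: list[dict]) -> dict:
    """Union the sub-models, keeping first-seen order and dropping duplicates."""
    nodes, edges = [], []
    for sub in submodels:
        nodes += [n for n in sub["nodes"] if n not in nodes]
        edges += [e for e in sub["edges"] if e not in edges]
    return {"nodes": nodes, "edges": edges}


tables = [
    {"name": "customer", "group": "sales", "sequence": 1},
    {"name": "orders", "group": "sales", "sequence": 2},
    {"name": "warehouse", "group": "logistics", "sequence": 1},
    {"name": "shipment", "group": "logistics", "sequence": 2},
]
submodels = [build_submodel(group) for group in split_tables(tables).values()]
merged = merge_submodels(submodels)
```

Building each sub-model independently keeps every group small enough to check on its own before the results are combined into one data flow model.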

In some embodiments, as shown in FIG. 4, the data management apparatus may further include a comparison unit 406 configured to compare the data information in the data flow model with a data dictionary; and, if the data information in the data flow model is inconsistent with the data information in the data dictionary, to correct the data information in the data flow model based on the data information in the data dictionary, so that the data information in the data flow model is consistent with the data information in the data dictionary.
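The comparison and correction against the data dictionary might, for example, look like the sketch below. The representation of both the model and the dictionary as nested Python dictionaries, and the name `correct_model`, are assumptions for illustration; the application does not prescribe a storage format.

```python
def correct_model(model: dict[str, dict], dictionary: dict[str, dict]) -> list[str]:
    """Overwrite model entries that disagree with the data dictionary.

    Returns the corrected entries as "table.field" strings.
    """
    corrected = []
    for table, info in model.items():
        reference = dictionary.get(table)
        if reference is None:
            continue  # the dictionary has no entry to compare against
        for key, value in reference.items():
            if info.get(key) != value:
                info[key] = value  # the data dictionary is treated as authoritative
                corrected.append(f"{table}.{key}")
    return corrected


flow_model = {"orders": {"inflow_field": "cust_id", "outflow_field": "order_id"}}
data_dictionary = {"orders": {"inflow_field": "customer_id", "outflow_field": "order_id"}}
changes = correct_model(flow_model, data_dictionary)
```

After the call, `flow_model` agrees with `data_dictionary`, and `changes` records which entries were rewritten so that the correction can be audited.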

In some embodiments, as shown in FIG. 4, the data management apparatus may further include a query unit 407 configured to determine the information to be queried; and to perform a query in the data flow model based on the information to be queried, so as to determine the database table corresponding to the information to be queried, and/or to determine the data flow node corresponding to the information to be queried and the association relationship between data flow nodes.
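As one hedged illustration of such a query, the sketch below treats the information to be queried as a table name or field name and returns the matching tables together with the associations that touch them. The model layout, the function name `query_model` and the sample data are hypothetical.

```python
def query_model(model: dict, wanted: str) -> dict:
    """Find tables matching a table or field name, plus their associations."""
    tables = [node["table"] for node in model["nodes"]
              if wanted == node["table"] or wanted in node["fields"]]
    # Keep every association that starts or ends at a matching table.
    relations = [edge for edge in model["edges"]
                 if edge[0] in tables or edge[1] in tables]
    return {"tables": tables, "relations": relations}


flow_model = {
    "nodes": [
        {"table": "customer", "fields": ["customer_id"]},
        {"table": "orders", "fields": ["order_id", "customer_id"]},
        {"table": "shipment", "fields": ["shipment_id", "order_id"]},
    ],
    "edges": [("customer", "orders"), ("orders", "shipment")],
}
by_field = query_model(flow_model, "order_id")
by_table = query_model(flow_model, "customer")
```

Because the associations are recorded in the model itself, such a lookup does not need physical foreign keys in the underlying database.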

It can be understood that, in this embodiment, a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc., of course it may also be a module, or it may be non-modular. Moreover, each component in this embodiment may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules.

If the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk.

Therefore, this embodiment provides a computer storage medium, where the computer storage medium stores a computer program, and when the computer program is executed by at least one processor, the steps of the data management method described in any one of the preceding embodiments are implemented.

Based on the composition of the above-mentioned data management apparatus 40 and the computer storage medium, FIG. 5 shows a schematic diagram of the composition and structure of an electronic device 50 provided by an embodiment of the present application. As shown in FIG. 5, the electronic device 50 may include a communication interface 501, a memory 502 and a processor 503, and the components are coupled together through a bus system 504. It can be understood that the bus system 504 is used to realize connection and communication between these components. In addition to a data bus, the bus system 504 also includes a power bus, a control bus and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 504 in FIG. 5. Among these components:

The communication interface 501 is configured to receive and send signals in the process of exchanging information with other external network elements;

The memory 502 is configured to store a computer program that can run on the processor 503;

The processor 503 is configured to, when running the computer program, execute:

Obtain the data stream to be processed, which includes several database tables;

Determine the sequence information and key field information of several database tables;

According to the sequence information and key field information of several database tables, a data flow model is generated; wherein, the data flow model is used to represent the association relationship between several database tables.

It can be understood that the memory 502 in the embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or flash memory. The volatile memory can be random access memory (Random Access Memory, RAM), which serves as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM) and direct Rambus random access memory (Direct Rambus RAM, DRRAM). The memory 502 of the systems and methods described herein is intended to include, but is not limited to, these and any other suitable types of memory.

The processor 503 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 503 or by instructions in the form of software. The above-mentioned processor 503 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the method disclosed in the embodiments of the present application may be directly executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory or a register. The storage medium is located in the memory 502, and the processor 503 reads the information in the memory 502 and completes the steps of the above method in combination with its hardware.

It should be understood that the embodiments described herein may be implemented by hardware, software, firmware, middleware, microcode or a combination thereof. For a hardware implementation, the processing unit can be implemented in one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processor, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application, or a combination thereof.

For a software implementation, the techniques described herein can be implemented through modules (e.g., procedures, functions, and so on) that perform the functions described herein. Software code can be stored in a memory and executed by a processor. The memory can be implemented within the processor or external to the processor.

Optionally, as another embodiment, the processor 503 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.

Based on the composition of the data management apparatus 40 described above, FIG. 6 shows a schematic diagram of the composition and structure of another electronic device 50 provided by an embodiment of the present application. As shown in FIG. 6, the electronic device 50 includes at least the data management apparatus 40 described in any one of the foregoing embodiments.

For the electronic device 50, the data flow model generated based on the sequence information and key field information of the several database tables in the data stream to be processed not only enables efficient management of these database tables and reduces the cost of manual maintenance, but can also be applied to complex application scenarios. In addition, because the data flow model can completely record the relationships among the database tables, it avoids the performance problems caused by creating physical foreign keys, and data flow information can be conveniently queried in the data flow model, which improves the efficiency of data management.

The above descriptions are only preferred embodiments of the present application, and are not intended to limit the protection scope of the present application.

It should be noted that in this application, the terms "comprises", "includes" or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus comprising a set of elements includes not only those elements, but also other elements not expressly listed, or elements inherent in such a process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not preclude the presence of additional identical elements in the process, method, article or apparatus comprising that element.

The serial numbers of the above embodiments of the present application are for description only, and do not represent the advantages and disadvantages of the embodiments.

The methods disclosed in several method embodiments provided in this application can be combined arbitrarily to obtain new method embodiments under the condition of no conflict.

The features disclosed in several product embodiments provided in this application can be combined arbitrarily without conflict to obtain new product embodiments.

The features disclosed in several method or device embodiments provided in this application can be combined arbitrarily without conflict to obtain new method embodiments or device embodiments.

The above is only a specific implementation of the application, but the scope of protection of the application is not limited thereto. Any change or substitution that a person familiar with the technical field can readily conceive of within the technical scope disclosed in the application shall be covered within the protection scope of this application. Therefore, the protection scope of the present application should be determined by the protection scope of the claims.

Industrial Applicability

In the embodiments of this application, the data flow model generated based on the sequence information and key field information of the several database tables in the data stream to be processed not only enables efficient management of these database tables and reduces the cost of manual maintenance, but can also be applied to complex application scenarios. In addition, because the data flow model can completely record the relationships among the database tables, it avoids the performance problems caused by creating physical foreign keys, and data flow information can be conveniently queried in the data flow model, which improves the efficiency of data management.

Claims

1. A data management method, comprising:

Obtain a data stream to be processed, the data stream to be processed includes several database tables;

Determine the sequence information and key field information of the several database tables;

A data flow model is generated according to the sequence information and key field information of the several database tables; wherein, the data flow model is used to characterize the association relationship among the several database tables.
2. The method according to claim 1, wherein said determining the sequence information and key field information of said several database tables comprises:

Determining the sequence of the several database tables in the data stream to be processed, and generating sequence information of the several database tables according to the sequence;

Determine the inflow field and outflow field corresponding to each database table in the several database tables, and determine the key field information of the several database tables according to the inflow field and outflow field corresponding to each database table.
3. The method according to claim 2, wherein said generating a data flow model according to the sequence information and key field information of said several database tables comprises:

Generate several data flow nodes according to the several database tables;

According to the sequence information and the key field information of the several database tables, the several data flow nodes are connected in series to obtain the data flow model.
4. The method according to claim 3, wherein said determining the inflow field and outflow field corresponding to each of the several database tables comprises:

Determine the primary key field of the first database table, and use the primary key field as the outflow field corresponding to the first database table;

Determining a second database table corresponding to the data flow node preceding that of the first database table, determining the second database table as the inflow table corresponding to the first database table, and using the primary key field of the inflow table as the inflow field corresponding to the first database table;

Wherein, the first database table is any one of the several database tables.
5. The method according to claim 4, wherein the method further comprises:

When the first database table is at the start data flow node of the data flow to be processed, it is determined that both the inflow table and the inflow field corresponding to the first database table are empty;

In a case where the first database table is at an end data flow node of the data flow to be processed, it is determined that the outflow field corresponding to the first database table is empty.
6. The method according to claim 3, wherein said generating several data flow nodes according to said several database tables comprises:

If the first database table corresponds to a parent table, then generate a data flow node according to the first database table;

If the first database table corresponds to at least two parent tables, at least two data flow nodes are generated according to the first database table, and the sequence information of the at least two data flow nodes is the same;

Wherein, the first database table is any one of the several database tables.
7. The method according to claim 1, wherein the method further comprises:

Perform splitting processing on several database tables included in the data stream to be processed to obtain at least two groups of database tables;

determining sequence information and key field information for each set of database tables in the at least two sets of database tables;

According to the sequence information and key field information of each group of database tables, at least two data flow sub-models are generated; wherein each data flow sub-model is used to represent the association relationship between each group of database tables.
8. The method according to claim 7, wherein the method further comprises:

Perform merge processing on the at least two data flow sub-models to obtain the data flow model.
9. The method according to any one of claims 1 to 8, wherein, after said generating the data flow model, said method further comprises:

comparing the data information in the data flow model with a data dictionary;

If the data information in the data flow model is inconsistent with the data information in the data dictionary, the data information in the data flow model is corrected based on the data information in the data dictionary, so that the data information in the data flow model is consistent with the data information in the data dictionary.
10. The method according to any one of claims 1 to 8, wherein, after said generating the data flow model, said method further comprises:

Determine the information to be queried;

Perform a query in the data flow model based on the information to be queried, determine a database table corresponding to the information to be queried, and/or determine a data flow node corresponding to the information to be queried and an association relationship between data flow nodes.
11. A data management device, comprising an acquisition unit, a determination unit and a generation unit, wherein,

The acquiring unit is configured to acquire a data stream to be processed, and the data stream to be processed includes several database tables;

The determining unit is configured to determine sequence information and key field information of the several database tables;

The generation unit is configured to generate a data flow model according to the sequence information and key field information of the several database tables; wherein the data flow model is used to characterize the association relationship between the several database tables.
12. An electronic device comprising a memory and a processor, wherein,

said memory for storing a computer program capable of running on said processor;

The processor is configured to execute the data management method according to any one of claims 1 to 10 when running the computer program.
13. A computer storage medium storing a computer program, wherein, when the computer program is executed by at least one processor, the data management method according to any one of claims 1 to 10 is implemented.