WO2023130771A1 - Data management method and apparatus, and electronic device and storage medium - Google Patents

Data management method and apparatus, and electronic device and storage medium

Info

Publication number
WO2023130771A1
Authority
WO
WIPO (PCT)
Prior art keywords
data flow
data
information
database
database tables
Prior art date
Application number
PCT/CN2022/121315
Other languages
French (fr)
Chinese (zh)
Inventor
张聪
严茂胜
王一涵
周剑
Original Assignee
中移(成都)信息通信科技有限公司
中国移动通信集团有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中移(成都)信息通信科技有限公司, 中国移动通信集团有限公司
Publication of WO2023130771A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2282Tablespace storage structures; Management thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures

Definitions

  • the present application relates to the technical field of data management, and in particular to a data management method, device, electronic equipment and storage medium.
  • Metadata is data describing other data (data about other data), or structured data (structured data) used to provide information about certain resources.
  • metadata is data that describes objects such as information resources or data. Its purpose is to: identify resources; evaluate resources; track changes in resources during use; realize simple and efficient management of large amounts of networked data; and enable effective discovery, search, and integrated organization of information resources, as well as efficient management of resource usage.
  • the present application provides a data management method, device, electronic equipment, and storage medium, which can realize efficient management of several database tables through a data flow model, reduce manual maintenance costs, facilitate query, and improve data management efficiency.
  • the embodiment of the present application provides a data management method, and the method includes:
  • obtaining a data stream to be processed, where the data stream to be processed includes several database tables;
  • determining sequence information and key field information of the several database tables;
  • generating a data flow model according to the sequence information and key field information of the several database tables; wherein the data flow model is used to characterize the association relationship among the several database tables.
  • the embodiment of the present application provides a data management device, including an acquisition unit, a determination unit, and a generation unit, wherein,
  • the acquiring unit is configured to acquire a data stream to be processed, and the data stream to be processed includes several database tables;
  • the determining unit is configured to determine sequence information and key field information of the several database tables;
  • the generation unit is configured to generate a data flow model according to the sequence information and key field information of the several database tables; wherein the data flow model is used to characterize the association relationship between the several database tables.
  • an embodiment of the present application provides an electronic device, where the electronic device includes a memory and a processor, wherein,
  • the memory is configured to store a computer program capable of running on the processor;
  • the processor is configured to execute the data management method as described in the first aspect when running the computer program.
  • an embodiment of the present application provides a computer storage medium, the computer storage medium stores a computer program, and when the computer program is executed by at least one processor, the data management method as described in the first aspect is implemented.
  • the embodiments of the present application provide a data management method, device, electronic device, and storage medium. The method includes: obtaining a data stream to be processed, where the data stream to be processed includes several database tables; determining the sequence information and key field information of the several database tables; and generating a data flow model according to the sequence information and key field information of the several database tables, where the data flow model is used to represent the association relationship between the several database tables.
  • in this way, the data flow model generated from the sequence information and key field information of the several database tables in the data stream to be processed not only enables efficient management of these database tables and reduces the cost of manual maintenance, but is also applicable to complex application scenarios. In addition, because the data flow model completely records the relationships between database tables, the performance problems caused by creating physical foreign keys can be avoided, and data flow information can be queried conveniently in the data flow model, which improves the efficiency of data management.
  • FIG. 1 is a schematic flow diagram of a data management method provided in an embodiment of the present application
  • FIG. 2 is a schematic flow diagram of another data management method provided in the embodiment of the present application.
  • FIG. 3 is a schematic flowchart of another data management method provided in the embodiment of the present application.
  • FIG. 4 is a schematic diagram of the composition and structure of a data management device provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of the composition and structure of an electronic device provided in an embodiment of the present application.
  • FIG. 6 is a schematic diagram of the composition and structure of another electronic device provided by the embodiment of the present application.
  • the terms "first", "second", and "third" used in the embodiments of this application only distinguish similar objects and do not represent a specific ordering of the objects. Understandably, where permitted, the specific order or sequence of "first", "second", and "third" may be interchanged, so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein.
  • Metadata is data describing other data (data about other data), or structured data used to provide information about certain resources. Metadata is data that describes objects such as information resources or data. Its purpose is to: identify resources; evaluate resources; track changes in resources during use; realize simple and efficient management of large amounts of networked data; and enable effective discovery, search, and integrated organization of information resources, as well as efficient management of resource usage.
  • Metadata is also data, so it can be stored and retrieved in the database in a similar way to data. If the organization that provides a data element also provides the metadata describing that data element, the use of the data element becomes accurate and efficient. When users use data, they can first check its metadata to obtain the information they need.
  • the data dictionary is a collection of information describing data and a collection of definitions for all data elements used in the system.
  • traditional relational databases (such as Oracle and MySQL) use data dictionary tables to store table information, field information, indexes, constraint information, and so on, and the information in the data dictionary tables is updated accordingly when the database changes.
  • the database data dictionary is not only the center of every database, but also very important information for every user.
  • Another solution is to use documents (such as Excel, Word) to manage metadata.
  • managing metadata in documents requires establishing a standard data dictionary model to manage the definition and description information of the tables and fields in the database, and dedicated personnel need to be assigned to maintain and update the document information to ensure consistency between the data dictionary document and the database table structure.
  • the embodiment of the present application provides a data management method.
  • the basic idea of the method is: obtain the data stream to be processed, which includes several database tables; determine the sequence information and key field information of the several database tables; and generate a data flow model according to the sequence information and key field information of the several database tables, where the data flow model is used to represent the association relationship between the several database tables.
  • in this way, the data flow model generated from the sequence information and key field information of the several database tables in the data stream to be processed not only enables efficient management of these database tables and reduces the cost of manual maintenance, but is also applicable to complex application scenarios. In addition, because the data flow model completely records the relationships between database tables, the performance problems caused by creating physical foreign keys can be avoided, and data flow information can be queried conveniently in the data flow model, which improves the efficiency of data management.
  • FIG. 1 shows a schematic flowchart of a data management method provided in an embodiment of the present application.
  • the method may include:
  • the embodiment of the present application provides a data management method, which may specifically refer to a metadata management method.
  • the method can be applied to a data management device, or an electronic device integrated with the data management device.
  • the electronic device may be, for example, a computer, a smart phone, a tablet computer, a notebook computer, a palmtop computer, a personal digital assistant (Personal Digital Assistant, PDA), a navigation device, a server, etc., which are not specifically limited in this embodiment of the present application.
  • data generated in the business process can be stored in individual tables of the database, and these tables are called database tables.
  • database tables form the data stream to be processed, and each database table represents a certain stage in the business process.
  • the database tables are metadata, and the data management method provided by the embodiment of the present application can efficiently manage these metadata, and clearly represent the association relationship between various database tables.
  • each database table may include a table name, field names of several fields, field descriptions, field types, and whether the fields are primary keys.
  • table 1 is the product table
  • table 2 is the order table.
  • in Table 1, the table name is product.
  • Table 1 includes three fields, and the field names of the three fields are: product_no, product_name and price.
  • product_no represents the product number; its field type is integer (int), and it is the primary key of Table 1.
  • product_name represents the product name; its field type is a variable-length string (varchar(100)).
  • price represents the unit price; its field type is decimal (decimal(10,2)).
  • the table name of Table 2 is order.
  • order_no and item_no are the primary key fields of Table 2; that is, order_no and item_no form the joint primary key of Table 2.
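  • as a minimal sketch, the table metadata described above (table name, field names, field descriptions, field types, and primary key flags) could be held in a small structure like the following. The class names Field and TableMeta are hypothetical, and for Table 2 only the table name and the joint primary key come from the text; the field types and descriptions shown for it are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    description: str
    field_type: str
    is_primary_key: bool = False


@dataclass(frozen=True)
class TableMeta:
    name: str
    fields: tuple

    def primary_key(self):
        """Return the names of the primary key fields, in declaration order."""
        return tuple(f.name for f in self.fields if f.is_primary_key)


# Table 1 (product), as described in the text.
product = TableMeta("product", (
    Field("product_no", "product number", "int", True),
    Field("product_name", "product name", "varchar(100)"),
    Field("price", "unit price", "decimal(10,2)"),
))

# Table 2 (order): the types and descriptions below are assumptions.
order = TableMeta("order", (
    Field("order_no", "order number (assumed)", "int", True),
    Field("item_no", "item number (assumed)", "int", True),
))

print(product.primary_key())
print(order.primary_key())
```

  • a joint primary key is simply a primary key with more than one field, so the same primary_key() helper covers both tables.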
  • determining the sequence information and key field information of several database tables may include:
  • An inflow field and an outflow field corresponding to each of the several database tables are determined, and key field information of the several database tables is determined according to the inflow field and the outflow field corresponding to each database table.
  • the sequence information of the database tables indicates the sequence in which the database tables are generated in the data stream to be processed. Therefore, first determine the sequence of several database tables in the data stream to be processed, and then generate the sequence information of each database table according to the sequence.
  • the sequence information can be represented by a natural number that increases sequentially from 1.
  • in a data stream to be processed, two or more database tables may have the same sequence information.
  • for example, the object of purchasing can be product A or product B, where product A corresponds to database table A and product B corresponds to database table B.
  • in this case, database table A and database table B have the same sequence information.
  • for Table 1 and Table 2, the sequence may be that the order table for a product is formed according to the product table; that is, Table 1 is generated first and Table 2 is generated later. The sequence information corresponding to Table 1 is therefore 1, and the sequence information corresponding to Table 2 is 2.
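  • the numbering rule above can be sketched as follows, assuming the generation order is supplied as a list of groups in which tables generated at the same point of the flow share a group. The function name assign_sequence and the input layout are illustrative assumptions, not part of the application.

```python
def assign_sequence(stages):
    """Map each table name to its sequence information.

    `stages` is a list of groups ordered by generation; tables in the same
    group share one sequence number, and numbering starts from 1.
    """
    sequence = {}
    for number, group in enumerate(stages, start=1):
        for table in group:
            sequence[table] = number
    return sequence


# Table 1 (product) is generated before Table 2 (order).
print(assign_sequence([["product"], ["order"]]))

# Database table A and database table B sit at the same position in the flow.
print(assign_sequence([["table_a", "table_b"]]))
```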
  • the inflow fields and outflow fields corresponding to the several database tables can be determined respectively, and the key field information of the database tables can be determined according to the inflow fields and outflow fields. That is to say, the key field information of a database table may include the inflow field and the outflow field of the database table.
  • the outflow field is the primary key field of the database table
  • the inflow field is the primary key field of the inflow table of the database table
  • the inflow table is usually the previous database table whose sequence information is adjacent to the database table.
  • the primary key field can be one or more fields in the database table, and its value can be used to uniquely identify a record in the database table.
  • the key field information may also include: an inflow table and a receiving field.
  • the key field information corresponding to the database table can also include the name of the inflow table and the receiving field; the name of the inflow table is the table name of the inflow table of the database table, and the receiving field is the field associated with the inflow table (usually a foreign key field of the database table). The receiving field corresponding to the database table is associated with the inflow field, that is, with the primary key field of the inflow table.
  • the data flow model is generated according to the sequence information and key field information of several database tables, which may include:
  • one database table can generate one or more data flow nodes.
  • in the case of one database table corresponding to one parent table, one data flow node is generated for the database table; in the case of one database table corresponding to multiple parent tables, multiple data flow nodes are generated for the database table.
  • the content of a data flow node includes, but is not limited to, the sequence information and key field information of the database table. Then, according to the sequence information and key field information of each database table, the obtained data flow nodes are connected in sequence to obtain the data flow model.
  • the data flow nodes are connected in series according to the sequence information of the database table corresponding to each data flow node. For two data flow nodes with adjacent sequence information, the outflow field of the previous data flow node is the inflow field of the latter data flow node, which forms the association between the two data flow nodes and clearly shows the association relationship of the database tables.
  • determining the inflow field and outflow field corresponding to each database table in several database tables may include:
  • the first database table is any one of several database tables.
  • for the first database table, when determining its inflow field and outflow field, the primary key field of the first database table can be determined first and used as the outflow field of the first database table. At the same time, the second database table, which corresponds to the previous data flow node of the first database table, is determined; that is, the second database table is the database table whose sequence information is adjacent to and precedes that of the first database table, namely the inflow table of the first database table. The primary key field of the inflow table is used as the inflow field corresponding to the first database table.
  • the method may also include:
  • if the first database table is the start of the data stream to be processed, the inflow table and the inflow field corresponding to the first database table are both empty;
  • if the first database table is the end of the data stream to be processed, the outflow field corresponding to the first database table is empty.
  • that is, if the first database table is the start of the data stream to be processed, it has no inflow table, so its inflow table and inflow field are both determined to be empty (null), and its receiving field is also empty; if the first database table is the end of the data stream to be processed, it has no outflow field, so its corresponding outflow field is empty.
  • Table 3 is a data flow model generated based on Table 1 and Table 2 (also referred to as a data flow model table).
  • Table 3 includes two data flow nodes: one is the data flow node corresponding to Table 1 (that is, the second row of Table 3), and the other is the data flow node corresponding to Table 2 (that is, the third row of Table 3).
  • flow_name represents the name of the data flow model, usually the name of the data stream to be processed that it represents; since Table 3 represents the product order flow, its flow_name is the product order flow. seq_no represents the sequence information of the database table corresponding to the data flow node; tab_name indicates the table name of the database table corresponding to the data flow node; out_col indicates the outflow field of the data flow node, that is, the outflow field of the database table corresponding to the data flow node; get_col indicates the receiving field of the data flow node, that is, the receiving field of the database table corresponding to the data flow node; in_tab_name indicates the inflow table of the data flow node, that is, the name of the inflow table of the database table corresponding to the data flow node; in_col indicates the inflow field of the data flow node, that is, the inflow field of the database table corresponding to the data flow node.
  • the data flow model may also include a data flow name (flow_name), which is used to indicate the data flow to be processed corresponding to the data flow model.
  • through Table 3, the relationship between Table 1 and Table 2 can be characterized, for example, the inflow and outflow relationship between them (Table 1 flows into Table 2) and the inflow fields, outflow fields, and receiving fields of each table.
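  • the generation of a data flow model such as Table 3 can be sketched for a simple linear flow as follows. This is an illustrative sketch rather than the implementation of the application: the function name build_flow_model and the input dictionaries are hypothetical, the receiving field product_no of the order table is an assumption (the text does not name it), and the tuple-style rendering of the joint primary key is likewise assumed.

```python
def build_flow_model(flow_name, tables):
    """Build the data flow nodes of a linear flow.

    `tables` is ordered by generation; each entry holds the table name, its
    primary key, and the foreign key that references the previous table.
    """
    nodes = []
    last = len(tables) - 1
    for i, table in enumerate(tables):
        inflow = tables[i - 1] if i > 0 else None
        nodes.append({
            "flow_name": flow_name,
            "seq_no": i + 1,
            "tab_name": table["name"],
            # The end of the flow has no outflow field.
            "out_col": table["primary_key"] if i < last else None,
            # The start of the flow has no receiving field, inflow table,
            # or inflow field.
            "get_col": table["foreign_key"] if inflow is not None else None,
            "in_tab_name": inflow["name"] if inflow is not None else None,
            "in_col": inflow["primary_key"] if inflow is not None else None,
        })
    return nodes


tables = [
    {"name": "product", "primary_key": "product_no", "foreign_key": None},
    # Assumption: order references product through a product_no field.
    {"name": "order", "primary_key": "(order_no, item_no)",
     "foreign_key": "product_no"},
]

for node in build_flow_model("product order flow", tables):
    print(node)
```

  • each printed dictionary corresponds to one data flow node, that is, to one row of the data flow model table.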
  • the data flow model generated by the data management method can realize the management of database tables in a certain business process.
  • the business process may not have been actually performed; that is, the embodiment of the present application can generate a data flow model at any time before or after the business is started.
  • the basic idea of the embodiment of the present application is to regard each database table as a node, connect the nodes in series according to the sequence of data generation and the relationships between the database tables to form the data flow of the business process, and then store the information in the data flow in the database according to a certain data structure to obtain a data flow model, which includes several data flow nodes.
  • Table 4 shows a table structure and example data of a data flow model provided by the embodiment of the present application. It mainly describes the content included in a data flow node in the data flow model, as well as the field description, field type and field attribute of each field of the data flow node.
  • data_flow represents a data flow.
  • the data flow nodes included in the generated data flow model may include the following contents: flow_name, seq_no, tab_name, out_col, get_col, in_tab_name and in_col.
  • flow_name, seq_no, tab_name, and in_tab_name are the primary key fields of the data flow model, and together they can be used as its joint primary key; seq_no indicates the sequence information of tab_name (the database table corresponding to the data flow node) in the data flow to be processed, and is also called the serial number; in_tab_name is the name of the inflow table (also called the name of the upstream node table), and in_col is the primary key field of in_tab_name; out_col is the outflow field of the data flow node (usually the primary key field of the data flow node); get_col is the associated field (usually a foreign key field of the data flow node) through which this node receives data from the upstream node (that is, the data flow node corresponding to the inflow table). A complete data flow model consists of multiple records of the data_flow table (that is, multiple data flow nodes) that belong to the same data flow to be processed.
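  • one possible in-memory shape for a data_flow record is sketched below. The class name DataFlowNode and the helper has_unique_keys are hypothetical, and the sketch assumes that the joint key is made up of flow_name, seq_no, tab_name and in_tab_name as listed above; the example values for the order node (its receiving field in particular) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataFlowNode:
    flow_name: str
    seq_no: int
    tab_name: str
    out_col: Optional[str]
    get_col: Optional[str]
    in_tab_name: Optional[str]
    in_col: Optional[str]

    @property
    def key(self):
        """Joint key that identifies one record of the data flow model."""
        return (self.flow_name, self.seq_no, self.tab_name, self.in_tab_name)


def has_unique_keys(nodes):
    """Check that no two data flow nodes share the same joint key."""
    keys = [node.key for node in nodes]
    return len(keys) == len(set(keys))


nodes = [
    DataFlowNode("product order flow", 1, "product", "product_no",
                 None, None, None),
    # The receiving field product_no is an assumption.
    DataFlowNode("product order flow", 2, "order", None,
                 "product_no", "product", "product_no"),
]
print(has_unique_keys(nodes))
```

  • including in_tab_name in the key is what allows one database table to appear in several records with the same seq_no, one per inflow table.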
  • Table 2, Table 3, and Table 4 show the process of generating a data flow model from the data stream to be processed in a simple business scenario.
  • the following describes a business scenario in which one child table corresponds to multiple parent tables.
  • generating several data flow nodes according to several database tables may include:
  • if the first database table corresponds to one parent table, one data flow node is generated according to the first database table;
  • if the first database table corresponds to at least two parent tables, at least two data flow nodes are generated according to the first database table, and the sequence information of the at least two data flow nodes is the same;
  • the first database table is any one of several database tables.
  • any one of the several database tables is denoted as the first database table, and a parent table is an inflow table of the first database table.
  • the first database table is denoted as a child table, and its corresponding inflow table is denoted as a parent table. If a child table corresponds to only one parent table, that is, the first database table has only one inflow table, as in Tables 2 to 4 above, the first database table corresponds to one data flow node.
  • if a child table corresponds to at least two parent tables, that is, the first database table has multiple inflow tables, at least two data flow nodes corresponding to the first database table are generated. The sequence information of these data flow nodes is the same, but their inflow information is different (the inflow information may include the aforementioned inflow table name, inflow field, and receiving field), and the number of data flow nodes corresponding to the child table is the same as the number of parent tables.
  • the database tables included in it are the following tables 5-8.
  • Tables 5 to 8 are the database tables of the material procurement business of a certain manufacturing industry system. Among them, Table 5 is the table of engineering components, Table 6 is the table of standard parts, Table 7 is the table of technical requirements, and Table 8 is the table of purchase orders. Engineering parts are special materials for a certain project, and standard parts are common materials for all projects. Since the characteristic attributes of the two materials are quite different, they are stored in two tables.
  • in Table 9, since both Table 5 and Table 6 are the start of the material procurement flow, the data flow nodes corresponding to Table 5 and Table 6 share a self-incrementing sequence (sequence information) to ensure that the primary keys of the two tables do not conflict.
  • the material procurement process in this industry is that the technical department first creates a technical demand list, specifies the materials and quantities to be purchased, and then submits it to the purchasing department to create a purchase order and initiate the procurement.
  • there are two initial data flow nodes in the data flow model, proj_mat and std_mat, so there are two records with seq_no 1 in the data flow model. The receiving field mat_no of tech_mat_req comes either from the outflow field proj_mat_no of proj_mat or from the outflow field std_mat_no of std_mat, so there are two records with seq_no 2 in the data flow model.
  • the primary key of tech_mat_req is the joint primary key composed of tmr_no and tmr_item_no, so the outflow field of the table tech_mat_req and the inflow field and receiving field of the table purchase_order are expressed in the tuple format of the joint primary key.
  • Tables 5 to 9 illustrate how to use the data flow model to represent complex business scenarios where one child table corresponds to multiple parent tables.
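  • the rule of generating one data flow node per parent table can be sketched with the table and field names mentioned above (proj_mat, std_mat, tech_mat_req, mat_no). The helper names format_key and nodes_for_table, the input dictionaries, and the exact textual form of the tuple format are assumptions made for illustration.

```python
def format_key(columns):
    """Render a primary key: one name as-is, a joint key in tuple format."""
    if len(columns) == 1:
        return columns[0]
    return "(" + ", ".join(columns) + ")"


def nodes_for_table(flow_name, seq_no, table, parents):
    """Return one data flow node per parent table, or one start node."""
    base = {
        "flow_name": flow_name,
        "seq_no": seq_no,
        "tab_name": table["name"],
        "out_col": format_key(table["primary_key"]),
    }
    if not parents:
        return [dict(base, get_col=None, in_tab_name=None, in_col=None)]
    return [
        dict(base, get_col=table["get_col"], in_tab_name=parent["name"],
             in_col=format_key(parent["primary_key"]))
        for parent in parents
    ]


proj_mat = {"name": "proj_mat", "primary_key": ["proj_mat_no"]}
std_mat = {"name": "std_mat", "primary_key": ["std_mat_no"]}
tech_mat_req = {"name": "tech_mat_req",
                "primary_key": ["tmr_no", "tmr_item_no"],
                "get_col": "mat_no"}

flow = "material procurement flow"
model = (nodes_for_table(flow, 1, proj_mat, [])
         + nodes_for_table(flow, 1, std_mat, [])
         + nodes_for_table(flow, 2, tech_mat_req, [proj_mat, std_mat]))
for node in model:
    print(node["seq_no"], node["tab_name"], node["in_tab_name"], node["in_col"])
```

  • the two tech_mat_req nodes share seq_no 2 and differ only in their inflow information, which matches the two records with seq_no 2 described above.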
  • FIG. 2 shows a schematic flowchart of another data management method provided by an embodiment of the present application. As shown in Figure 2, the method may include:
  • S201 Perform split processing on several database tables included in the data stream to be processed to obtain at least two groups of database tables.
  • S202 Determine sequence information and key field information of each group of database tables in at least two groups of database tables.
  • S203 Generate at least two data flow sub-models according to the sequence information and key field information of each group of database tables; each data flow sub-model is used to represent the association relationship between each group of database tables.
  • the data flow to be processed corresponds to the business process of a target business; even for the same target business, there may be different business processes when the business is carried out, that is, the data flow to be processed may correspond to at least two business processes.
  • the embodiment of the present application may also perform split processing on the database tables included in the data stream to be processed, that is, split the database tables in the data stream to be processed into at least two groups of database tables.
  • during splitting, a database table that is shared by several business processes is included in each corresponding group of database tables.
  • for each group of database tables obtained by splitting, the sequence information and key field information of the group are determined, and a corresponding data flow sub-model is generated accordingly. Each data flow sub-model is used to represent the association relationship between the database tables in its group.
  • the method of determining the data flow sub-model is as described above.
  • the procurement process of materials may be divided into two situations: in one, a purchase order is generated directly from the technical demand form; in the other, an inquiry form is generated from the technical demand form first, and then a purchase order is generated.
  • this scenario starts with a technical demand form, passes through the two branches of inquiry and non-inquiry, and finally merges into the purchase order.
  • the business process should therefore be attributed to two business flows, that is, the data flow to be processed is divided into an inquiry procurement flow and a non-inquiry procurement flow.
  • the database tables included in the data flow to be processed are Table 7 and the following Tables 10-14.
  • Table 7 is the technical demand list
  • Table 10 is the inquiry form
  • Table 11 is the technical demand and inquiry association table (tmr and enq association table)
  • Table 12 is the inquiry and procurement association table (enq and po association table)
  • Table 13 is the association table between technical requirements and procurement (the association table between tmr and po)
  • Table 14 is the purchase order table.
  • Table 15 shows the data flow sub-model obtained according to the inquiry procurement flow.
  • Table 16 shows the data flow sub-model obtained according to the non-inquiry procurement flow.
  • as shown in Tables 15 and 16, dividing the database tables corresponding to the inquiry and non-inquiry business processes into two data streams describes the business process more clearly.
  • the data stream to be processed is divided into two groups of database tables, and two data flow sub-models are obtained. The database table corresponding to the start data flow node of both data flow sub-models is Table 7, and the database table corresponding to the end data flow node of both is Table 14; that is, both have the same starting point and, after diverging, finally merge into the same end point.
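  • the split processing can be sketched as follows, under the assumption that each database table is tagged with the business processes that use it. The function name split_by_process is hypothetical; tech_mat_req (Table 7) is named in the text, purchase_order is assumed to be the name of Table 14 as it is for Table 8, and the names used for Tables 10 to 13, as well as the order in which the tables are listed, are invented for illustration.

```python
def split_by_process(table_processes):
    """Group table names by business process, preserving input order.

    `table_processes` maps a table name to the set of business processes
    that use it; a table used by several processes lands in every group.
    """
    groups = {}
    for table, processes in table_processes.items():
        for process in sorted(processes):
            groups.setdefault(process, []).append(table)
    return groups


tables = {
    "tech_mat_req": {"inquiry", "non-inquiry"},    # Table 7
    "tmr_enq_rel": {"inquiry"},                    # Table 11 (hypothetical name)
    "enquiry": {"inquiry"},                        # Table 10 (hypothetical name)
    "enq_po_rel": {"inquiry"},                     # Table 12 (hypothetical name)
    "tmr_po_rel": {"non-inquiry"},                 # Table 13 (hypothetical name)
    "purchase_order": {"inquiry", "non-inquiry"},  # Table 14 (assumed name)
}

groups = split_by_process(tables)
for process, names in groups.items():
    print(process, names)
```

  • the shared start and end tables appear in both groups, so each group can be turned into its own data flow sub-model.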
  • the method may also include:
  • Merge processing is performed on at least two data flow sub-models to obtain a data flow model.
  • the at least two data flow sub-models can also be merged, for example spliced together or saved in the same directory, to obtain the data flow model for this complex scenario.
  • This example illustrates how to use the data flow model to represent the scene of data splitting and then merging.
  • the data flow to be processed can be divided into multiple stages.
  • for example, the data flow corresponding to Table 15 is named "material procurement flow-inquiry stage",
  • and the data flow corresponding to Table 16 is named "material procurement flow-no inquiry stage".
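  • building one sub-model per group and then merging them by splicing can be sketched as follows. The function names are hypothetical, each node is reduced to four fields for brevity, and the table names other than tech_mat_req and purchase_order, together with the order of the tables inside each branch, are assumptions.

```python
def build_sub_model(flow_name, table_names):
    """Build simplified data flow nodes for one ordered group of tables."""
    nodes = []
    for i, name in enumerate(table_names):
        nodes.append({
            "flow_name": flow_name,
            "seq_no": i + 1,
            "tab_name": name,
            "in_tab_name": table_names[i - 1] if i > 0 else None,
        })
    return nodes


def merge_sub_models(*sub_models):
    """Merge sub-models by splicing their records into one list."""
    merged = []
    for sub_model in sub_models:
        merged.extend(sub_model)
    return merged


inquiry = build_sub_model(
    "material procurement flow-inquiry stage",
    ["tech_mat_req", "tmr_enq_rel", "enquiry", "enq_po_rel", "purchase_order"],
)
no_inquiry = build_sub_model(
    "material procurement flow-no inquiry stage",
    ["tech_mat_req", "tmr_po_rel", "purchase_order"],
)

merged = merge_sub_models(inquiry, no_inquiry)
starts = {n["tab_name"] for n in merged if n["in_tab_name"] is None}
ends = {sub[-1]["tab_name"] for sub in (inquiry, no_inquiry)}
print(len(merged), starts, ends)
```

  • because every record keeps its own flow_name, the two stages remain distinguishable after the merge.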
  • the method may further include:
  • the data dictionary stores the data information corresponding to the data flow to be processed;
  • if the data information in the data flow model is inconsistent with the data information in the data dictionary, the data information in the data flow model is corrected based on the data information in the data dictionary, so that the data information in the data flow model is consistent with the data information in the data dictionary.
  • the data dictionary stores the business data of the target business corresponding to the data flow to be processed. After the data flow model corresponding to the data flow to be processed is obtained, the data flow model is compared and checked against the data dictionary. If the information in the two is consistent, the generated data flow model is correct; if it is inconsistent, the data flow information in the data flow model is corrected based on the data information in the data dictionary so that the two are consistent, thereby obtaining an accurate data flow model.
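  • the comparison and correction step can be sketched as follows. The text does not specify which items are compared, so this sketch assumes that the data dictionary supplies the primary key of each table and that the out_col and in_col values of the model are checked against it; the function name reconcile and the deliberately wrong value prod_no are illustrative only.

```python
def reconcile(model, dictionary):
    """Correct primary-key-derived fields of `model` in place.

    `dictionary` maps a table name to its primary key as recorded in the
    data dictionary. Returns the corrections that were applied as
    (tab_name, column, old_value, new_value) tuples.
    """
    corrections = []
    for node in model:
        expected = {}
        if node["out_col"] is not None:
            expected["out_col"] = dictionary[node["tab_name"]]
        if node["in_tab_name"] is not None:
            expected["in_col"] = dictionary[node["in_tab_name"]]
        for column, value in expected.items():
            if node[column] != value:
                corrections.append(
                    (node["tab_name"], column, node[column], value))
                node[column] = value
    return corrections


dictionary = {"product": "product_no", "order": "(order_no, item_no)"}
model = [
    # "prod_no" is a deliberate error to be corrected.
    {"tab_name": "product", "out_col": "prod_no",
     "in_tab_name": None, "in_col": None},
    {"tab_name": "order", "out_col": None,
     "in_tab_name": "product", "in_col": "product_no"},
]

corrections = reconcile(model, dictionary)
print(corrections)
```

  • an empty list of corrections means the data flow model already agrees with the data dictionary.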
  • the method may further include:
  • the information to be queried is determined.
  • The sequence information and/or key field information can be used as the information to be queried in the data flow model, so that at least one data flow node corresponding to the information to be queried is obtained, and the database table corresponding to that data flow node is obtained in turn; the association relationship between that data flow node and other data flow nodes can be obtained at the same time.
  • The association relationship may include the upstream and downstream relationship between data flow nodes, the association between key field information, and so on. A data flow node that flows into a given data flow node is called its upstream data flow node, and a data flow node that the given node flows into is called its downstream data flow node. For example, in the aforementioned Table 3, product order flow 1 in the second row is the upstream data flow node of product order flow 2 in the third row, and product order flow 2 is the downstream data flow node of product order flow 1.
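The query described above can be sketched as plain lookups over the model rows. The Python below is an illustrative assumption rather than the patent's query mechanism: the helper names, the `tab_name` key, the `purchase_order` table name, and the `req_no` column are hypothetical, while `seq_no`, `out_col`, `get_col`, `in_tab_name`, `in_col`, and the table names `proj_mat`, `std_mat`, and `tech_mat_req` follow the source.

```python
def node(seq_no, tab_name, out_col, get_col, in_tab_name, in_col):
    """Build one data flow node (one row of the model)."""
    return {"seq_no": seq_no, "tab_name": tab_name, "out_col": out_col,
            "get_col": get_col, "in_tab_name": in_tab_name, "in_col": in_col}


# Material procurement flow; req_no and purchase_order are hypothetical names.
model = [
    node(1, "proj_mat", "proj_mat_no", None, None, None),
    node(1, "std_mat", "std_mat_no", None, None, None),
    node(2, "tech_mat_req", "req_no", "mat_no", "proj_mat", "proj_mat_no"),
    node(2, "tech_mat_req", "req_no", "mat_no", "std_mat", "std_mat_no"),
    node(3, "purchase_order", None, "req_no", "tech_mat_req", "req_no"),
]


def find_nodes(model, seq_no=None, key_field=None):
    """Return nodes matching the sequence number and/or a key field name."""
    hits = []
    for n in model:
        if seq_no is not None and n["seq_no"] != seq_no:
            continue
        if key_field is not None and key_field not in (n["out_col"], n["get_col"], n["in_col"]):
            continue
        hits.append(n)
    return hits


def upstream(model, tab_name):
    """Tables that flow into tab_name."""
    return sorted({n["in_tab_name"] for n in model
                   if n["tab_name"] == tab_name and n["in_tab_name"]})


def downstream(model, tab_name):
    """Tables that tab_name flows into."""
    return sorted({n["tab_name"] for n in model if n["in_tab_name"] == tab_name})


print(upstream(model, "tech_mat_req"), downstream(model, "tech_mat_req"))
```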
  • Structured Query Language (SQL) statement
  • This embodiment provides a data management method: a data stream to be processed, which includes several database tables, is obtained; the sequence information and key field information of the several database tables are determined; and a data flow model is generated according to that sequence information and key field information, where the data flow model is used to represent the association relationship between the several database tables.
  • The data flow model generated from the sequence information and key field information of the several database tables in the data stream to be processed not only enables efficient management of those tables and reduces the cost of manual maintenance, but is also applicable to complex application scenarios. In addition, because the data flow model completely records the association relationships between database tables, the performance problems caused by creating physical foreign keys are avoided, and data flow information can be queried conveniently in the data flow model, which improves the efficiency of data management. Furthermore, for scenarios in which one child table corresponds to multiple parent tables, or in which the data flow to be processed is too complex, the embodiment of this application can generate multiple data flow nodes for one database table, or split the data stream to be processed, obtain the data flow sub-models separately, and then merge them, so that the data streams in these complex scenarios are converted into a concise and clear data flow model for efficient management.
  • FIG. 3 shows a schematic flowchart of another data management method provided by the embodiment of the present application. As shown in Figure 3, the method may include:
  • a data flow is a description of a business process, and in this embodiment of the application, each data flow to be processed generally corresponds to only one business process.
  • To determine the data flow to be processed is to determine the business process.
  • the data flow to be processed can be named according to the business process, for example: product order flow, material procurement flow, etc., and the name can be used as the unique identifier of the data flow to be processed.
  • the same database table may exist in multiple different data streams to be processed, that is, different data streams to be processed may include the same database table.
  • the aforementioned Table 7 exists in two data streams to be processed respectively.
  • For the product order flow in the aforementioned embodiments, the database tables it includes are Table 1 (product table) and Table 2 (order table); for the material procurement flow, they are Table 5 (engineering component table), Table 6 (standard part table), Table 7 (technical demand list table) and Table 8 (purchase order table); for the complex material procurement flow that must be split and then merged, they are Table 7 (technical requirements form), Table 10 (inquiry form), Table 11 (technical requirements and inquiry association table), Table 12 (inquiry and purchase association table), Table 13 (technical requirements and purchasing association table) and Table 14 (purchasing form).
  • S303 Determine the sequence of all database tables in the data stream to be processed, number each database table, determine the inflow and outflow field information of each database table, and determine the data flow information.
  • each database table is numbered, and the inflow and outflow field information of each database table is determined.
  • the determined data flow information may include sequence information, an inflow field, an outflow field, and may also include a receiving field, an inflow table, and the like.
  • the sequence number of each database table in the data stream to be processed is generated, that is, the sequence information in the foregoing embodiments.
  • a sequence of numbers from 1 to n can be generated in numerical order and written into the data flow model.
  • each row of data is a data flow node in the data flow model.
  • For a data flow node, the outflow field is the primary key of the database table itself, the receiving field is the foreign key field, the inflow table is the database table of the previous node, and the inflow field is the primary key of the inflow table (the receiving field of the database table is associated with the inflow field of the inflow table).
  • The data flow node corresponding to the first database table is the start data flow node, whose receiving field, inflow table, and inflow field are null; the data flow node corresponding to the last database table is the end data flow node, whose outflow field is null.
  • Table 3 includes two data flow nodes, product order flow 1 and product order flow 2: product order flow 1 is the data flow node corresponding to Table 1 (the product table, product), and product order flow 2 is the data flow node corresponding to Table 2 (the order table, order).
  • The get_col, in_tab_name, and in_col of product order flow 1 are all null; product order flow 2, which corresponds to the order table, is the end data flow node of the data flow model, so the out_col of product order flow 2 is null.
  • product.product_no and order.product_no are related to each other.
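The node rules described above (outflow field, receiving field, inflow table, inflow field, and the null fields on the start and end nodes) can be sketched for a simple linear flow as follows. This is a minimal Python sketch under stated assumptions: the `FlowNode` class, the `flow_name` and `tab_name` attributes, the function name `build_linear_model`, and the `order_no` column are hypothetical, while `seq_no`, `out_col`, `get_col`, `in_tab_name`, and `in_col` are the field names used in the source.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowNode:
    flow_name: str               # hypothetical: name of the data flow
    seq_no: int                  # sequence information
    tab_name: str                # hypothetical: table this node represents
    out_col: Optional[str]       # outflow field (primary key of the table)
    get_col: Optional[str]       # receiving field (foreign key of the table)
    in_tab_name: Optional[str]   # inflow table (table of the previous node)
    in_col: Optional[str]        # inflow field (primary key of the inflow table)


def build_linear_model(flow_name, tables):
    """tables: ordered list of (tab_name, primary_key, foreign_key_or_None)."""
    nodes = []
    for i, (tab, pk, fk) in enumerate(tables):
        first = i == 0
        last = i == len(tables) - 1
        prev = None if first else tables[i - 1]
        nodes.append(FlowNode(
            flow_name=flow_name,
            seq_no=i + 1,                       # numbered 1..n in flow order
            tab_name=tab,
            out_col=None if last else pk,       # end node has no outflow field
            get_col=None if first else fk,      # start node receives nothing
            in_tab_name=None if first else prev[0],
            in_col=None if first else prev[1],
        ))
    return nodes


model = build_linear_model("product order flow", [
    ("product", "product_no", None),
    ("order", "order_no", "product_no"),  # order_no is a hypothetical primary key
])
for n in model:
    print(n)
```

For the two-table product order flow this yields a start node whose `get_col`, `in_tab_name`, and `in_col` are `None` and an end node whose `out_col` is `None`, matching the description of Table 3.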
  • Tables 1 to 3 show a simple business scenario of product order flow. In practice, there are often more complex business scenarios.
  • Tables 5 to 8 are the database tables of the material procurement business of a certain manufacturing industry system. Components can be special materials for a certain project, and standard parts can be general materials for all projects. Because the characteristic attributes of the two kinds of material differ greatly, they are stored in two tables (Table 5 and Table 6). The material procurement process in this industry is as follows: the technical department first creates a technical demand list that specifies the materials to be purchased and their quantities, and then submits it to the purchasing department, which creates a purchase order and initiates the purchase. The data flow information of material procurement in this industry is shown in Table 9.
  • Table 7 can be associated with Table 5, and Table 7 can also be associated with Table 6, that is, Table 7 is a child table, and there are two parent tables, Table 5 and Table 6.
  • the primary keys of the data flow nodes corresponding to Table 5 and Table 6 share a self-increasing sequence (that is, order information) to ensure that the primary keys of the two tables do not conflict.
  • The node sequence numbers of the two data flow nodes corresponding to Table 5 and Table 6 are both 1.
  • tech_mat_req.mat_no comes from either proj_mat.proj_mat_no or std_mat.std_mat_no, so there are two records with seq_no 2 for the table tech_mat_req.
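The one-child, multiple-parent case can be sketched by emitting one node per parent table, all sharing the same sequence number. The Python below is illustrative only: the helper names, the `flow_name` and `tab_name` keys, and the `req_no` primary key are hypothetical, while the table and column names `proj_mat`, `std_mat`, `tech_mat_req`, `proj_mat_no`, `std_mat_no`, and `mat_no` come from the source.

```python
FLOW = "material procurement flow"


def parent_node(tab_name, primary_key):
    """A start node: receiving field, inflow table, and inflow field are null."""
    return {"flow_name": FLOW, "seq_no": 1, "tab_name": tab_name,
            "out_col": primary_key, "get_col": None,
            "in_tab_name": None, "in_col": None}


def child_nodes(seq_no, tab_name, out_col, get_col, parents):
    """One node per parent table; every node carries the same seq_no."""
    return [
        {"flow_name": FLOW, "seq_no": seq_no, "tab_name": tab_name,
         "out_col": out_col, "get_col": get_col,
         "in_tab_name": parent_tab, "in_col": parent_pk}
        for parent_tab, parent_pk in parents
    ]


parents = [("proj_mat", "proj_mat_no"), ("std_mat", "std_mat_no")]
model = [parent_node(tab, pk) for tab, pk in parents]
model += child_nodes(2, "tech_mat_req", "req_no", "mat_no", parents)  # req_no is hypothetical

for n in model:
    print(n["seq_no"], n["tab_name"], n["in_tab_name"], n["in_col"])
```

Both parent tables appear with `seq_no` 1, and `tech_mat_req` appears twice with `seq_no` 2, once for each table that `mat_no` may refer to.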
  • The procurement process of materials may be divided into two situations: the first is to generate a purchase order directly from a technical demand list, and the second is to first generate an inquiry form from the technical demand list and then generate a purchase order.
  • This scenario starts with a technical demand list, passes through the two branches of inquiry and non-inquiry, and finally merges into the purchase order.
  • The business can be attributed to two data flow sub-models, as shown in Table 15 and Table 16.
  • This statement can be executed repeatedly until the number of data flow nodes in the data flow model reaches the total number of data flow nodes contained in the data flow model.
  • the parameters of this statement can be set according to the aforementioned determined data flow information, and the fields to be updated are all optional, and the data is updated according to the actual situation.
  • the embodiment of the present application provides a data flow model and a method for generating a data flow model.
  • The data flow model can be used to describe the relationship between tables in the business system database, which makes the logic of the business system easier to understand, makes the system easier to maintain and to extend through secondary development, and also makes it more convenient to integrate system data into a data warehouse later.
  • a concise data flow model can be used to record the relationship between tables in the database, and can be stored in the database and coexist with the data dictionary.
  • the system also avoids various performance problems caused by creating physical foreign keys.
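As one way to picture a model that is stored in the database without physical foreign keys, the following Python sketch keeps the data flow model in an ordinary table and derives the logical join condition from it at query time. It uses the standard-library `sqlite3` module purely for illustration; the table name `data_flow_model` and the `flow_name` and `tab_name` columns are assumptions, and the source does not say which database or schema is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# No FOREIGN KEY constraints anywhere: relationships live only in these rows.
conn.execute("""
    CREATE TABLE data_flow_model (   -- hypothetical table name
        flow_name   TEXT NOT NULL,
        seq_no      INTEGER NOT NULL,
        tab_name    TEXT NOT NULL,
        out_col     TEXT,
        get_col     TEXT,
        in_tab_name TEXT,
        in_col      TEXT
    )
""")
conn.executemany(
    "INSERT INTO data_flow_model VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("product order flow", 1, "product", "product_no", None, None, None),
        ("product order flow", 2, "order", None, "product_no", "product", "product_no"),
    ],
)

# Read the association relationships back and express them as join conditions.
rows = conn.execute(
    "SELECT tab_name, get_col, in_tab_name, in_col FROM data_flow_model "
    "WHERE flow_name = ? AND in_tab_name IS NOT NULL ORDER BY seq_no",
    ("product order flow",),
).fetchall()
join_conditions = [
    f"{tab}.{get_col} = {in_tab}.{in_col}" for tab, get_col, in_tab, in_col in rows
]
print(join_conditions)
```

Because the relationship is recorded as data rather than as a constraint, the business tables themselves carry no foreign-key checks, while the association can still be looked up whenever it is needed.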
  • Tables 1 to 3 give a simple example that illustrates the use of the data flow model.
  • The two rows in Table 3 are the data flow node information of the product table and the order table of the product order flow. Because the product table is the start data flow node, its get_col, in_tab_name, and in_col are null; because the order table is the end data flow node, its out_col is null; and product.product_no and order.product_no are related to each other.
  • the above is a simple example of the data flow model.
  • the core of the data flow model lies in the collection of data flow information.
  • the complete data collection process in the embodiment of the present application is as follows: (1) Determine the data flow. (2) Determine all database tables included in the data stream. (3) Determine the sequence of all database tables in the data stream, number each database table, and determine the inflow and outflow fields of each database table in the data stream. (4) Write the determined data flow information into the data flow model. (5) Check the data flow information and keep it consistent with the relevant information of the data dictionary.
  • The data flow model in the embodiment of the present application also proposes a data standard for business scenarios in which one child table corresponds to multiple parent tables and in which data is split and then merged, so that more complex business scenarios can be recorded; combined with the single-table metadata of the data dictionary, the data flow model has a wider range of application scenarios.
  • This embodiment provides a data management method.
  • This embodiment is a detailed description of the specific implementation of the foregoing embodiments. Compared with related technologies, the technical solution provided by this embodiment of the application has at least the following advantages: (1) Related technologies use data fields to manage metadata, which requires establishing physical foreign keys on tables to record the relationships between tables. Establishing physical foreign keys makes system development and data processing more difficult and affects system performance; using the data flow model avoids establishing physical foreign keys during system development and thus avoids these related problems. (2) Related technologies use documents to manage metadata, which requires a large manual maintenance workload, makes the metadata difficult to consult, and easily causes inconsistencies with system information.
  • The embodiment of the present application uses a data flow model, which can represent not only the association relationship between two adjacent tables but also the complete sequential association relationship of the entire data link. Establishing a data flow model is a process of sorting out the business logic, and this process can be carried out in parallel with the development of the business system, which makes it easier to discover problems in the business logic. The data flow model can also represent complex business scenarios in which one child table corresponds to multiple parent tables or in which data is split and then merged, so it has wider application scenarios than related technologies.
  • FIG. 4 shows a schematic diagram of the composition and structure of a data management device 40 provided in the embodiment of the present application.
  • the data management device 40 may include an acquisition unit 401, a determination unit 402 and a generation unit 403, wherein,
  • the obtaining unit 401 is configured to obtain a data stream to be processed, and the data stream to be processed includes several database tables;
  • a determination unit 402 configured to determine sequence information and key field information of several database tables
  • the generating unit 403 is configured to generate a data flow model according to sequence information and key field information of several database tables; wherein, the data flow model is used to represent the association relationship between several database tables.
  • the determining unit 402 is specifically configured to determine the sequence of the several database tables in the data stream to be processed, and generate sequence information of the several database tables according to that sequence; and to determine the inflow field and outflow field corresponding to each of the several database tables, and determine the key field information of the several database tables according to the inflow field and outflow field corresponding to each database table.
  • the generation unit 403 is specifically configured to generate several data flow nodes according to several database tables; and to concatenate several data flow nodes according to the sequence information and key field information of several database tables , get the data flow model.
  • the determining unit 402 is also specifically configured to determine the primary key field of the first database table, and use the primary key field as the outflow field corresponding to the first database table; and to determine a second database table, namely the database table corresponding to the data flow node preceding that of the first database table, determine the second database table as the inflow table corresponding to the first database table, and use the primary key field of the inflow table as the inflow field corresponding to the first database table; wherein the first database table is any one of the several database tables.
  • the determining unit 402 is further configured to determine that both the inflow table and the inflow field corresponding to the first database table are empty when the first database table is at the start data flow node of the data flow to be processed; and When the first database table is at the end data flow node of the data flow to be processed, it is determined that the outflow field corresponding to the first database table is empty.
  • the generation unit 403 is further specifically configured to generate a data flow node according to the first database table if the first database table corresponds to one parent table; and if the first database table corresponds to at least two parent tables, then At least two data flow nodes are generated according to the first database table, and the sequence information of the at least two data flow nodes is the same; wherein, the first database table is any one of several database tables.
  • the data management device may further include a splitting unit 404 configured to split several database tables included in the data stream to be processed to obtain at least two groups of database tables;
  • the determination unit 402 is further configured to determine the sequence information and key field information of each group of database tables in at least two groups of database tables;
  • the generation unit 403 is further configured to generate at least two data flow sub-models according to the sequence information and key field information of each group of database tables; wherein, each data flow sub-model is used to represent each group of database tables relationship.
  • the data management apparatus may further include a merging unit 405 configured to perform merging processing on at least two data flow sub-models to obtain a data flow model.
  • the data management device may further include a comparison unit 406 configured to compare the data information in the data flow model with the data dictionary; and, if the data information in the data flow model is inconsistent with the data information in the data dictionary, to correct the data information in the data flow model based on the data information in the data dictionary, so that the data information in the data flow model is consistent with the data information in the data dictionary.
  • the data management device may further include a query unit 407 configured to determine the information to be queried; and to perform a query in the data flow model based on the information to be queried, and determine the database table corresponding to the information to be queried, and/or determine the data flow node corresponding to the information to be queried and the association relationship between the data flow nodes.
  • a "unit" may be a part of a circuit, a part of a processor, a part of a program or software, etc.; of course, it may also be a module, or it may be non-modular.
  • each component in this embodiment may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules.
  • If the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • the technical solution of this embodiment, in essence, or the part of it that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
  • the aforementioned storage medium includes: U disk, mobile hard disk, read only memory (Read Only Memory, ROM), random access memory (Random Access Memory, RAM), magnetic disk or optical disk and other various media that can store program codes.
  • this embodiment provides a computer storage medium, where the computer storage medium stores a computer program, and when the computer program is executed by at least one processor, the steps of the data processing method described in any one of the preceding embodiments are implemented.
  • FIG. 5 shows a schematic diagram of the composition and structure of an electronic device 50 provided by an embodiment of the present application.
  • the bus system 504 is used to realize connection and communication between these components.
  • the bus system 504 also includes a power bus, a control bus and a status signal bus.
  • the various buses are labeled as bus system 504 in FIG. 5 .
  • the communication interface 501 is used for receiving and sending signals during the process of sending and receiving information with other external network elements;
  • memory 502 used to store computer programs that can run on the processor 503;
  • the processor 503 is configured to, when running the computer program, execute:
  • a data flow model is generated; wherein, the data flow model is used to represent the association relationship between several database tables.
  • the memory 502 in the embodiment of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
  • the non-volatile memory can be read-only memory (Read-Only Memory, ROM), programmable read-only memory (Programmable ROM, PROM), erasable programmable read-only memory (Erasable PROM, EPROM), electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or flash memory.
  • the volatile memory can be Random Access Memory (RAM), which acts as external cache memory.
  • available forms of RAM include static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDRSDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchronous link DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM).
  • the processor 503 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 503 or instructions in the form of software.
  • the above-mentioned processor 503 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module may be located in a storage medium that is mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 502, and the processor 503 reads the information in the memory 502, and completes the steps of the above method in combination with its hardware.
  • the processing unit can be implemented in one or more application specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processing, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units used to perform the functions described in this application, or a combination thereof.
  • the techniques described herein can be implemented through modules (e.g., procedures, functions, and so on) that perform the functions described herein.
  • Software codes can be stored in memory and executed by a processor.
  • Memory can be implemented within the processor or external to the processor.
  • the processor 503 is further configured to execute the method described in any one of the foregoing embodiments when running the computer program.
  • FIG. 6 shows a schematic diagram of the composition and structure of another electronic device 50 provided by the embodiment of the present application.
  • the electronic device 50 at least includes the data management apparatus 40 described in any one of the foregoing embodiments.
  • For the electronic device 50, because the data flow model is generated based on the sequence information and key field information of the several database tables in the data stream to be processed, these database tables can be managed efficiently, the cost of manual maintenance is reduced, and complex application scenarios are also supported. In addition, because the data flow model completely records the association relationships between database tables, the performance problems caused by creating physical foreign keys are avoided, and data flow information can be queried conveniently in the data flow model, which improves the efficiency of data management.
  • The data flow model generated based on the sequence information and key field information of the several database tables in the data flow to be processed not only enables efficient management of these database tables and reduces the cost of manual maintenance, but is also applicable to complex application scenarios; in addition, because the data flow model completely records the association relationships between database tables, it avoids the performance problems caused by creating physical foreign keys and allows data flow information to be queried conveniently, which improves the efficiency of data management.

Abstract

Disclosed in the embodiments of the present application are a data management method and apparatus, and an electronic device and a storage medium. The method comprises: acquiring a data stream to be processed, wherein said data stream comprises several database tables; determining order information and key field information of the several database tables; and generating a data stream model according to the order information and the key field information of the several database tables, wherein the data stream model is used for representing the association relationship between the several database tables. In this way, by means of a data stream model which is generated on the basis of order information and key field information corresponding to several database tables in a data stream to be processed, the efficient management of the several database tables can be realized, such that the manual maintenance cost is reduced, and querying is also facilitated, thereby improving the data management efficiency.

Description

A data management method, device, electronic device and storage medium
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. 202210006368.3, entitled "A data management method, device, electronic device and storage medium", filed with the China Patent Office on January 5, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of data management, and in particular to a data management method, device, electronic device and storage medium.
Background
Metadata is data that describes other data (data about other data), or structured data used to provide information about a resource. Here, metadata is data that describes objects such as information resources or data, and its purposes are to: identify resources; evaluate resources; track changes in resources during use; manage large amounts of networked data simply and efficiently; and enable effective discovery, search, and integrated organization of information resources and effective management of the resources in use.
As the business logic of business systems grows more complex, how to manage metadata information effectively under massive data volumes has become a problem in urgent need of a solution. At present, metadata is managed mainly by manually sorting out metadata information and entering it into documents item by item, and a standard data dictionary model must also be established; this leads to high manual maintenance costs and a heavy query workload, and therefore to low efficiency.
Summary of the Invention
The present application provides a data management method, device, electronic device and storage medium, which can manage several database tables efficiently through a data flow model, reduce manual maintenance costs, make queries convenient, and improve data management efficiency.
The technical solution of the present application is implemented as follows:
In a first aspect, an embodiment of the present application provides a data management method, the method including:
obtaining a data stream to be processed, the data stream to be processed including several database tables;
determining the sequence information and key field information of the several database tables;
generating a data flow model according to the sequence information and key field information of the several database tables; wherein the data flow model is used to represent the association relationship between the several database tables.
In a second aspect, an embodiment of the present application provides a data management device, including an acquisition unit, a determination unit, and a generation unit, wherein
the acquisition unit is configured to obtain a data stream to be processed, the data stream to be processed including several database tables;
the determination unit is configured to determine the sequence information and key field information of the several database tables;
the generation unit is configured to generate a data flow model according to the sequence information and key field information of the several database tables; wherein the data flow model is used to represent the association relationship between the several database tables.
In a third aspect, an embodiment of the present application provides an electronic device, the electronic device including a memory and a processor, wherein:
the memory is configured to store a computer program capable of running on the processor;
the processor is configured to execute, when running the computer program, the data management method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer storage medium storing a computer program which, when executed by at least one processor, implements the data management method according to the first aspect.
The embodiments of the present application provide a data management method and apparatus, an electronic device, and a storage medium. The method includes: obtaining a data flow to be processed, the data flow to be processed including several database tables; determining sequence information and key field information of the several database tables; and generating a data flow model according to the sequence information and key field information of the several database tables, wherein the data flow model is used to represent the association relationships among the several database tables. A data flow model generated in this way from the sequence information and key field information of the database tables in the data flow to be processed not only manages those tables efficiently and reduces manual maintenance costs, but is also applicable to complex application scenarios. In addition, because the data flow model fully records the associations between database tables, it avoids the performance problems caused by creating physical foreign keys, and data flow information can be queried conveniently within the model, which improves data management efficiency.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a data management method provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of another data management method provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of yet another data management method provided by an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a data management apparatus provided by an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of another electronic device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the application.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used herein are only for the purpose of describing the embodiments of the present application and are not intended to limit the present application.
The following description refers to "some embodiments", which describes a subset of all possible embodiments. It should be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and that they may be combined with one another where no conflict arises.
It should be noted that the terms "first\second\third" in the embodiments of the present application merely distinguish similar objects and do not denote a particular ordering of the objects. Where permitted, "first\second\third" may be interchanged in a specific order or sequence, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein.
Metadata is data that describes other data (data about other data), or structured data that provides information about a resource. Metadata describes objects such as information resources or data, and its purposes are to: identify resources; evaluate resources; track changes to resources during use; manage large amounts of networked data simply and efficiently; and enable effective discovery, lookup, integrated organization, and effective management of the resources in use.
Because metadata is itself data, it can be stored in and retrieved from a database in the same way as data. If the organization that provides data elements also provides metadata describing them, use of the data elements becomes accurate and efficient. When using data, a user can first consult its metadata to obtain the information needed.
As the business logic of business systems grows increasingly complex, and especially since agile development became popular, a project may be split into multiple small projects that are interrelated yet can run independently and are completed separately. This poses a great challenge to the data quality and consistency of the system. Against this background, a metadata management method is needed to guarantee the quality of business system data and its subsequent maintainability.
At present, business system metadata is managed either through a data dictionary or through documents. A data dictionary is a collection of information describing data, that is, a collection of definitions of all data elements used in the system. Traditional relational databases (such as Oracle and MySQL) contain data dictionary tables that store table information, field information, indexes, constraint information, and so on. During system development, whenever a table is added or a field is changed, the information in the data dictionary tables is updated accordingly. The database data dictionary is not only the center of every database but also very important information for every user. The other solution is to manage metadata with documents (such as Excel or Word). Document-based metadata management requires establishing a standard data dictionary model to manage the definitions and descriptions of the tables and fields in the database, and dedicated personnel must be assigned to maintain and update the documents so that the data dictionary documents remain consistent with the database table structure.
However, when a data dictionary is used for metadata management, physical foreign keys must be created to record the associations between tables. The benefit of a physical foreign key is that data violating the foreign key constraint cannot be entered into the system, which largely prevents garbage data. Physical foreign keys also have serious drawbacks: first, they make system development harder; second, they make data processing harder; third, they have a considerable impact on system performance. Most enterprises therefore choose not to create physical foreign keys and instead add foreign key constraint checks during system development. In addition, a data dictionary cannot describe some complex business scenarios: the foreign key relationships in a data dictionary can only describe one child table corresponding to one parent table, whereas in real business one child table may correspond to multiple parent tables. When document management is used for metadata management, manual maintenance is too costly, and documents are easily left out of date and inconsistent with the system version. The query workload also tends to be heavy, since several places in the documents must be consulted to confirm the association between tables, which makes queries complicated and inefficient. Furthermore, with both existing solutions only the association between two tables can be queried at a time; the upstream and downstream relationships of multiple tables cannot be queried in one pass.
On this basis, an embodiment of the present application provides a data management method whose basic idea is: obtaining a data flow to be processed, the data flow to be processed including several database tables; determining sequence information and key field information of the several database tables; and generating a data flow model according to the sequence information and key field information of the several database tables, wherein the data flow model is used to represent the association relationships among the several database tables. A data flow model generated in this way not only manages the database tables efficiently and reduces manual maintenance costs, but is also applicable to complex application scenarios. In addition, because the data flow model fully records the associations between database tables, it avoids the performance problems caused by creating physical foreign keys, and data flow information can be queried conveniently within the model, which improves data management efficiency.
The embodiments of the present application are described in detail below with reference to the drawings.
In an embodiment of the present application, FIG. 1 shows a schematic flowchart of a data management method provided by an embodiment of the present application. As shown in FIG. 1, the method may include:
S101. Obtain a data flow to be processed, the data flow to be processed including several database tables.
S102. Determine sequence information and key field information of the several database tables.
S103. Generate a data flow model according to the sequence information and key field information of the several database tables, wherein the data flow model is used to represent the association relationships among the several database tables.
It should be noted that the data management method provided by the embodiments of the present application may specifically be a metadata management method. The method may be applied to a data management apparatus or to an electronic device integrating the data management apparatus. The electronic device may be, for example, a computer, a smartphone, a tablet computer, a notebook computer, a palmtop computer, a personal digital assistant (PDA), a navigation device, or a server, which is not specifically limited in the embodiments of the present application.
It should also be noted that, for a given business process in a business system, the data produced in that process can be stored in individual tables of a database, referred to here as database tables. These database tables make up the data flow to be processed, and each database table represents a stage of the business process. Here the database tables are the metadata; the data management method provided by the embodiments of the present application can manage this metadata efficiently and clearly represent the associations among the database tables.
Taking a simple product ordering business as an example, the database tables included in the corresponding data flow to be processed are shown in Table 1 and Table 2.
Table 1
Figure PCTCN2022121315-appb-000001
Table 2
Figure PCTCN2022121315-appb-000002
As shown in Table 1 and Table 2, each database table may include a table name and, for each of its fields, the field name, field description, field type, and whether the field is a primary key. Table 1 is the product table and Table 2 is the order table.
Taking Table 1 as an example, the table name is product, and the table has three fields named product_no, product_name, and price. product_no is the product number, has the integer type (int), and is the primary key of Table 1; product_name is the product name and has the variable-length string type (varchar(100)); price is the unit price and has the decimal type (decimal(10,2)).
The table name of Table 2 is order. In Table 2, order_no and item_no are both primary key fields, so together they form the composite primary key of Table 2.
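As an illustration only, Tables 1 and 2 could be declared roughly as follows with Python's built-in sqlite3 module. The product columns follow the description above. For the order table the text states only that order_no and item_no form the composite primary key, so the column types and the product_no column shown here are assumptions, as is the helper name primary_key_columns. Note that order is a reserved word in SQL and therefore has to be quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (
    product_no   INTEGER PRIMARY KEY,   -- product number (primary key)
    product_name VARCHAR(100),          -- product name
    price        DECIMAL(10, 2)         -- unit price
);
CREATE TABLE "order" (
    order_no   INTEGER NOT NULL,        -- assumed type
    item_no    INTEGER NOT NULL,        -- assumed type
    product_no INTEGER,                 -- assumed column linking back to product
    PRIMARY KEY (order_no, item_no)     -- composite primary key
);
""")


def primary_key_columns(conn, table):
    """Return the primary key column names of a table, in key order."""
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk);
    # pk is the 1-based position in the primary key, or 0 if not part of it.
    keyed = sorted((row[5], row[1]) for row in rows if row[5] > 0)
    return [name for _, name in keyed]


print(primary_key_columns(conn, "product"))
print(primary_key_columns(conn, "order"))
```

Reading the primary key from the database catalog in this way is one possible source for the outflow fields discussed later; the source does not say how the primary keys are obtained.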
In some embodiments, determining the sequence information and key field information of the several database tables may include:
determining the order of the several database tables in the data flow to be processed, and generating the sequence information of the several database tables according to that order;
determining the inflow field and outflow field corresponding to each of the several database tables, and determining the key field information of the several database tables according to the inflow field and outflow field of each database table.
It should be noted that, in the embodiments of the present application, the sequence information of a database table indicates the order in which the database table is produced in the data flow to be processed. The order of the database tables in the data flow to be processed is therefore determined first, and the sequence information of each database table is then generated from that order. The sequence information can be represented by natural numbers increasing from 1.
In addition, two or more database tables in a data flow to be processed may have the same position in the order. For example, in a purchase, the purchased object may be product A or product B, where product A corresponds to database table A and product B corresponds to database table B. In that case database table A and database table B have the same sequence information.
For the data flow to be processed made up of Table 1 and Table 2, the order table for a product is formed from the product table, so Table 1 is produced first and Table 2 afterwards. The sequence information of Table 1 is therefore 1 and that of Table 2 is 2.
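The numbering rule above can be sketched in a few lines of Python. The function name assign_seq_numbers and the representation of the flow as a list of stages are assumptions made for this example; the names table_a, table_b, and purchase are hypothetical.

```python
def assign_seq_numbers(stages):
    """Map each table name to its sequence number.

    `stages` is a list of lists of table names, ordered by when the tables
    are produced. Tables in the same inner list are produced at the same
    stage and therefore share one sequence number. Numbering starts at 1.
    """
    seq = {}
    for seq_no, stage in enumerate(stages, start=1):
        for table in stage:
            seq[table] = seq_no
    return seq


# Tables 1 and 2: product is produced first, order second.
simple = assign_seq_numbers([["product"], ["order"]])

# Hypothetical flow in which table_a and table_b sit at the same stage.
tied = assign_seq_numbers([["table_a", "table_b"], ["purchase"]])
```

Here simple maps product to 1 and order to 2, while in tied both table_a and table_b receive 1 and purchase receives 2.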
It should also be noted that, in the embodiments of the present application, the key field information is determined by determining the inflow field and outflow field of each database table. In other words, the key field information of a database table may include its inflow field and outflow field.
The outflow field is the primary key field of the database table itself, and the inflow field is the primary key field of the table's inflow table. The inflow table is usually the database table whose sequence information immediately precedes that of the database table. A primary key field may be one or more fields of a database table whose values uniquely identify a record in that table.
Further, for each database table, the key field information may also include an inflow table name and a receiving field.
The inflow table name is the name of the table's inflow table. The receiving field is the field of the database table that is associated with the inflow table (usually a foreign key field of the database table); it is associated with the inflow field, that is, with the primary key field of the inflow table.
In some embodiments, generating the data flow model according to the sequence information and key field information of the several database tables may include:
generating several data flow nodes according to the several database tables;
connecting the several data flow nodes in series according to the sequence information and key field information of the several database tables to obtain the data flow model.
It should be noted that several data flow nodes can be generated from the several database tables, and one database table may give rise to one or more data flow nodes. When a database table corresponds to one parent table, one data flow node is generated for it; when a database table corresponds to multiple parent tables, multiple data flow nodes are generated for it.
The content of a data flow node includes, but is not limited to, the sequence information and key field information of the database table. The data flow nodes are then connected in series according to the sequence information and key field information of the database tables to obtain the data flow model.
In the data flow model, the data flow nodes are connected in the order given by the sequence information of their database tables. For two data flow nodes with adjacent sequence information, the outflow field of the earlier node is the inflow field of the later node. This links the two nodes and clearly expresses the association between the database tables.
In some embodiments, determining the inflow field and outflow field corresponding to each of the several database tables may include:
determining the primary key field of a first database table, and using the primary key field as the outflow field of the first database table;
determining a second database table, which corresponds to the data flow node preceding that of the first database table, taking the second database table as the inflow table of the first database table, and using the primary key field of the inflow table as the inflow field of the first database table;
wherein the first database table is any one of the several database tables.
It should be noted that, for any one of the several database tables (called the first database table), the inflow and outflow fields can be determined as follows. The primary key field of the first database table is determined and taken as its outflow field. The second database table is the table of the preceding data flow node, that is, the database table whose sequence information is adjacent to and precedes that of the first database table. It is the inflow table of the first database table, and its primary key field is taken as the inflow field of the first database table.
In some embodiments, the method may further include:
when the first database table is at the start data flow node of the data flow to be processed, determining that the inflow table and inflow field of the first database table are both empty;
when the first database table is at the end data flow node of the data flow to be processed, determining that the outflow field of the first database table is empty.
It should be noted that if the first database table is the start of the data flow to be processed, it has no inflow table, so its inflow table and inflow field are both empty (null), and its receiving field is empty as well. If the first database table is the end of the data flow to be processed, it has no outflow field, so its outflow field is empty.
For Table 1, the outflow field is its primary key field product_no. Because Table 1 is the start of the data flow to be processed and no database table flows into it, its inflow field, inflow table, and receiving field are all empty. For Table 2, the inflow table is Table 1 and the inflow field is product_no, the primary key field of Table 1. Because Table 2 is the end of the data flow to be processed, its outflow field is empty.
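The rules above for the start and end of a flow can be put together in a minimal sketch for a strictly linear flow, in which every table has at most one inflow table. The class and function names are assumptions made for this example. The sketch also assumes that the receiving field has the same name as the primary key of the inflow table, which holds for product_no in the ordering example but is not required by the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Table:
    name: str
    primary_key: str  # a single key column, kept simple for illustration


@dataclass(frozen=True)
class FlowNode:
    flow_name: str
    seq_no: int
    tab_name: str
    out_col: Optional[str]      # outflow field; None at the end of the flow
    get_col: Optional[str]      # receiving field; None at the start
    in_tab_name: Optional[str]  # inflow table name; None at the start
    in_col: Optional[str]       # inflow field; None at the start


def build_linear_flow(flow_name: str, tables: List[Table]) -> List[FlowNode]:
    """Build one node per table; `tables` is given in production order."""
    nodes = []
    last = len(tables) - 1
    for i, table in enumerate(tables):
        upstream = tables[i - 1] if i > 0 else None
        nodes.append(FlowNode(
            flow_name=flow_name,
            seq_no=i + 1,
            tab_name=table.name,
            out_col=None if i == last else table.primary_key,
            # Assumption: the receiving column reuses the upstream key name.
            get_col=upstream.primary_key if upstream else None,
            in_tab_name=upstream.name if upstream else None,
            in_col=upstream.primary_key if upstream else None,
        ))
    return nodes


nodes = build_linear_flow(
    "product order flow",
    [Table("product", "product_no"), Table("order", "order_no")],
)
for node in nodes:
    print(node)
```

For the two tables of the ordering example, the first node has an outflow field of product_no and empty inflow information, and the second node has product as its inflow table and an empty outflow field.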
Table 3 shows the data flow model (also called the data flow model table) generated from Table 1 and Table 2.
Table 3
flow_name           seq_no  tab_name  out_col     get_col     in_tab_name  in_col
product order flow  1       product   product_no  null        null         null
product order flow  2       order     null        product_no  product      product_no
In the data flow model shown in Table 3, each row is one data flow node of the model. Table 3 contains two data flow nodes: one corresponding to Table 1 (the second row of Table 3, counting the header row) and one corresponding to Table 2 (the third row of Table 3).
In Table 3, flow_name is the name of the data flow model, usually the name of the data flow to be processed that it represents; since Table 3 represents the product order flow, its flow_name is "product order flow". seq_no is the sequence information of the database table corresponding to the data flow node. tab_name is the table name of the database table corresponding to the data flow node. out_col is the outflow field of the data flow node, that is, the outflow field of the corresponding database table. get_col is the receiving field of the data flow node, that is, the receiving field of the corresponding database table. in_tab_name is the inflow table of the data flow node, that is, the name of the inflow table of the corresponding database table. in_col is the inflow field of the data flow node, that is, the inflow field of the corresponding database table.
As shown in Table 3, besides the sequence information and key field information, the data flow model may also include a data flow name (flow_name) that identifies the data flow to be processed to which the model corresponds.
Table 3 thus represents the association between Table 1 and Table 2: for example, the inflow and outflow relationship between them (Table 1 flows into Table 2), together with each table's inflow field, outflow field, receiving field, and so on.
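Because every node records its inflow table, the downstream relationships of several tables can be read from the model in a single traversal. The following is a minimal sketch under that reading. The function name downstream_tables and the tuple layout are assumptions chosen to mirror the columns of Table 3.

```python
from collections import defaultdict, deque

# Rows mirror Table 3:
# (flow_name, seq_no, tab_name, out_col, get_col, in_tab_name, in_col)
MODEL = [
    ("product order flow", 1, "product", "product_no", None, None, None),
    ("product order flow", 2, "order", None, "product_no", "product", "product_no"),
]


def downstream_tables(model, flow_name, start):
    """Return every table reachable downstream of `start`, nearest first."""
    children = defaultdict(list)
    for flow, _seq, tab, _out, _get, in_tab, _in in model:
        if flow == flow_name and in_tab is not None:
            children[in_tab].append(tab)

    seen = {start}
    queue = deque([start])
    found = []
    while queue:  # breadth-first walk over inflow-table links
        current = queue.popleft()
        for child in children[current]:
            if child not in seen:
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


print(downstream_tables(MODEL, "product order flow", "product"))
```

An upstream query would follow the same pattern with the direction of the links reversed, that is, by indexing nodes on tab_name and following in_tab_name.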
It should also be noted that the data flow model generated by this data management method can manage the database tables of a given business process. The business process need not have actually taken place; that is, the embodiments of the present application can generate the data flow model at any time before or after the business is carried out.
It should also be noted that the basic idea of the embodiments of the present application is to treat the database tables as individual nodes in the database and to connect these nodes in series according to the order in which the data is produced and the associations between the database tables, which forms the data flow of the business process. The information in the data flow is then stored in the database in a defined data structure to obtain the data flow model, which includes several data flow nodes. By way of example, Table 4 shows the table structure and example data of a data flow model provided by an embodiment of the present application. It describes the content of a data flow node in the data flow model, together with the field description, field type, and field attributes of each field of the node.
Table 4
Figure PCTCN2022121315-appb-000003
As shown in Table 4, data_flow denotes the data flow. For a data flow to be processed, each data flow node of the generated data flow model may include the following fields: flow_name, seq_no, tab_name, out_col, get_col, in_tab_name, and in_col.
flow_name, seq_no, tab_name, and in_tab_name are all primary key fields of the data flow model and together can serve as its composite primary key. seq_no is the sequence information (also called the sequence number) of tab_name, the database table corresponding to the data flow node, in the data flow to be processed. in_tab_name is the inflow table name (also called the upstream node table name), and in_col is the primary key field of in_tab_name. out_col is the outflow field of the data flow node (usually the primary key field of its database table). get_col is the field through which this node is associated with the upstream node, that is, the data flow node of the inflow table (usually a foreign key field of this node's database table). A complete data flow model consists of the multiple data_flow records (data flow nodes) of the same data flow to be processed.
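A data_flow table along these lines could be sketched with Python's built-in sqlite3 module as follows. Table 4 is available only as an image, so the column types here are assumptions, and load_flow is a hypothetical helper. The sketch stores the empty inflow table of the start node as NULL inside the composite primary key; SQLite accepts this for ordinary tables, but other database engines generally do not, and a sentinel value would be needed there.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE data_flow (
    flow_name   TEXT    NOT NULL,  -- name of the data flow
    seq_no      INTEGER NOT NULL,  -- sequence number of the node
    tab_name    TEXT    NOT NULL,  -- table represented by the node
    out_col     TEXT,              -- outflow field
    get_col     TEXT,              -- receiving field
    in_tab_name TEXT,              -- inflow (upstream) table name
    in_col      TEXT,              -- inflow field
    PRIMARY KEY (flow_name, seq_no, tab_name, in_tab_name)
)
""")

conn.executemany(
    "INSERT INTO data_flow VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("product order flow", 1, "product", "product_no", None, None, None),
        ("product order flow", 2, "order", None, "product_no", "product", "product_no"),
    ],
)


def load_flow(conn, flow_name):
    """Return (seq_no, tab_name, in_tab_name) for one flow, in flow order."""
    return conn.execute(
        "SELECT seq_no, tab_name, in_tab_name FROM data_flow "
        "WHERE flow_name = ? ORDER BY seq_no, tab_name",
        (flow_name,),
    ).fetchall()


print(load_flow(conn, "product order flow"))
```

All records sharing one flow_name make up one data flow model, so selecting by flow_name and ordering by seq_no returns the nodes in flow order.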
Tables 2, 3, and 4 show how a data flow model is generated from the data flow to be processed in a simple business scenario. In practice, more complex scenarios often arise, for example a business scenario in which one child table corresponds to multiple parent tables.
Therefore, in some embodiments, generating several data flow nodes according to the several database tables may include:
if the first database table corresponds to one parent table, generating one data flow node according to the first database table;
if the first database table corresponds to at least two parent tables, generating at least two data flow nodes according to the first database table, the at least two data flow nodes having the same order information;
wherein the first database table is any one of the several database tables.
It should be noted that any one of the several database tables is denoted as the first database table, and a parent table is an inflow table of the first database table.
For the first database table (denoted as the child table), its corresponding inflow table is denoted as the parent table. If a child table corresponds to only one parent table, that is, the first database table has only one inflow table, then, as in Tables 2 to 4 above, one data flow node is generated for the first database table.
If a child table corresponds to at least two parent tables, that is, the first database table has multiple inflow tables, then at least two data flow nodes are generated for the first database table. These data flow nodes have the same order information but different inflow information (the inflow information may include the aforementioned inflow table name, inflow field, and receiving field), and the number of data flow nodes generated for the child table equals the number of its parent tables.
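The rule above can be sketched as a small Python function that emits one node per parent table while keeping the sequence number fixed. The function name nodes_for_table, the dictionary representation of a node, and the table and field names in the usage lines are hypothetical and serve only as an illustration.

```python
def nodes_for_table(flow_name, seq_no, tab_name, out_col, parents):
    """Return one data flow node per parent table, all sharing seq_no.

    parents is a list of (in_tab_name, in_col, get_col) tuples; an empty
    list yields a single start node whose inflow information is None.
    """
    inflows = parents or [(None, None, None)]
    return [
        {
            "flow_name": flow_name,
            "seq_no": seq_no,
            "tab_name": tab_name,
            "out_col": out_col,
            "get_col": get_col,
            "in_tab_name": in_tab_name,
            "in_col": in_col,
        }
        for in_tab_name, in_col, get_col in inflows
    ]


# Hypothetical child table with two parent tables.
nodes = nodes_for_table(
    "example flow", 2, "child_tab", "child_id",
    [("parent_a", "a_id", "ref_id"), ("parent_b", "b_id", "ref_id")],
)
print(len(nodes), {n["seq_no"] for n in nodes})
```

The two resulting nodes differ only in their inflow information, which is why the inflow table name is needed to tell them apart.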
Exemplarily, take a material procurement flow as the data flow to be processed; the database tables it includes are Tables 5 to 8 below.
Table 5
Figure PCTCN2022121315-appb-000004
Table 6
Figure PCTCN2022121315-appb-000005
Table 7
Figure PCTCN2022121315-appb-000006
Table 8
Figure PCTCN2022121315-appb-000007
Tables 5 to 8 are the database tables of the material procurement business in a manufacturing industry system: Table 5 is the engineering component table, Table 6 is the standard part table, Table 7 is the technical requirement sheet table, and Table 8 is the purchase order table. Engineering components are materials dedicated to a particular project, whereas standard parts are general-purpose materials shared by all projects. Because the characteristic attributes of the two kinds of material differ considerably, they are stored in two separate tables.
In this business, a technical requirement may be a requirement for an engineering component or for a standard part, so Table 5 and Table 6 are both parent tables of Table 7. In this case, two data flow nodes need to be generated for Table 7 when the data flow model is generated. The resulting data flow model is shown in Table 9.
Table 9: Example data of the material procurement flow in a manufacturing industry
Figure PCTCN2022121315-appb-000008
As shown in Table 9, because Table 5 and Table 6 are both starting points of the material procurement flow, the data flow nodes corresponding to Table 5 and Table 6 share one auto-increment sequence (the order information), which ensures that the primary keys of the two tables do not conflict. In this industry, the material procurement process is as follows: the technical department first creates a technical requirement sheet specifying the materials to be purchased and their quantities, and the purchasing department then creates a purchase order and initiates the purchase.
The data flow model has two start data flow nodes, proj_mat and std_mat, so it contains two records whose seq_no is 1. The receiving field mat_no of tech_mat_req comes either from the outflow field proj_mat_no of proj_mat or from the outflow field std_mat_no of std_mat, so the data flow model contains two records whose seq_no is 2.
In addition, because the primary key of tech_mat_req is a composite primary key consisting of tmr_no and tmr_item_no, the outflow field of table tech_mat_req and the inflow field and receiving field of table purchase_order are all expressed as tuples of the composite primary key. Tables 5 to 9 illustrate how the data flow model represents a complex business scenario in which one child table corresponds to multiple parent tables.
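The node records described in the preceding paragraphs can be written out as follows. This is a reconstruction from the prose only: Table 9 is an image that is not reproduced here, so the flow name string, the exact tuple notation, the sequence number 3 for purchase_order, and the helper key_tuple are assumptions.

```python
def key_tuple(*cols):
    """Render a composite key as a tuple-style string; a single column is returned as-is."""
    return cols[0] if len(cols) == 1 else "(" + ", ".join(cols) + ")"


FLOW = "material procurement flow"
TMR_KEY = key_tuple("tmr_no", "tmr_item_no")

# (flow_name, seq_no, tab_name, out_col, get_col, in_tab_name, in_col)
rows = [
    (FLOW, 1, "proj_mat", "proj_mat_no", None, None, None),
    (FLOW, 1, "std_mat", "std_mat_no", None, None, None),
    (FLOW, 2, "tech_mat_req", TMR_KEY, "mat_no", "proj_mat", "proj_mat_no"),
    (FLOW, 2, "tech_mat_req", TMR_KEY, "mat_no", "std_mat", "std_mat_no"),
    (FLOW, 3, "purchase_order", None, TMR_KEY, "tech_mat_req", TMR_KEY),
]

# Count how many node records share each sequence number.
seq_counts = {}
for row in rows:
    seq_counts[row[1]] = seq_counts.get(row[1], 0) + 1
print(seq_counts)
```

Sequence numbers 1 and 2 each occur twice, matching the two start nodes and the two nodes generated for tech_mat_req.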
Further, in practice there are also scenarios in which one data flow to be processed corresponds to multiple business processes. Such a complex data flow can be handled by first splitting and then merging. Refer to FIG. 2, which shows a schematic flowchart of another data management method provided by an embodiment of the present application. As shown in FIG. 2, the method may include:
S201. Split the several database tables included in the data flow to be processed to obtain at least two groups of database tables.
S202. Determine the order information and key field information of each of the at least two groups of database tables.
S203. Generate at least two data flow sub-models according to the order information and key field information of each group of database tables, each data flow sub-model representing the association relationships among the database tables of its group.
It should be noted that, in some complex business scenarios, although the data flow to be processed corresponds to the business process of a single target business, that target business may follow different business processes as it proceeds; that is, the data flow to be processed corresponds to at least two business processes.
In this case, the embodiments of the present application may split the database tables included in the data flow to be processed into at least two groups of database tables. A database table that participates in multiple business processes is included in each of the corresponding groups when splitting.
For each group of database tables obtained by splitting, the order information and key field information of the group are determined, and the corresponding data flow sub-model is generated accordingly. Each data flow sub-model represents the association relationships among the database tables of its group.
For each group of database tables, the data flow sub-model is determined in the manner described above.
Exemplarily, the material procurement process may take two forms: in the first, a purchase order is generated directly from the technical requirement sheet; in the second, an inquiry sheet is first generated from the technical requirement sheet, and the purchase order is then generated. This scenario starts from the technical requirement sheet, passes through either the inquiry branch or the non-inquiry branch, and finally converges on the purchase order. For such a scenario, the business process should be assigned to two business flows; that is, the data flow to be processed is split into an inquiry procurement flow and a non-inquiry procurement flow.
Taking this material procurement scenario with and without inquiry as an example, the database tables included in the data flow to be processed are Table 7 and Tables 10 to 14 below.
Table 10
Figure PCTCN2022121315-appb-000009
Table 11
Figure PCTCN2022121315-appb-000010
Table 12
Figure PCTCN2022121315-appb-000011
Table 13
Figure PCTCN2022121315-appb-000012
Table 14
Figure PCTCN2022121315-appb-000013
Here, Table 7 is the technical requirement sheet table; Table 10 is the inquiry sheet table; Table 11 is the association table between technical requirements and inquiries (the tmr-enq association table); Table 12 is the association table between inquiries and purchases (the enq-po association table); Table 13 is the association table between technical requirements and purchases (the tmr-po association table); and Table 14 is the purchase order table.
It should be noted that, for a material procurement flow with both inquiry and non-inquiry business processes, many-to-many relationships may exist among technical requirement sheets, inquiry sheets, and purchase orders. Several association tables are therefore needed to link them and to distinguish the inquiry process from the non-inquiry process. Moreover, the purchase order table no longer needs the tmr_no and tmr_item_no fields.
The data flow to be processed, consisting of Table 7 and Tables 10 to 14, is split into two groups: the inquiry procurement flow and the non-inquiry procurement flow. The inquiry procurement flow includes Table 7, Table 11, Table 10, Table 12, and Table 14 and corresponds to the material procurement process with inquiry; the non-inquiry procurement flow includes Table 7, Table 13, and Table 14 and corresponds to the material procurement process without inquiry.
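The text does not prescribe how the split is computed. One possible sketch, assuming the tables form an acyclic graph, enumerates every path from the start table to the end table and treats each path as one group. The function split_flows is hypothetical, as are the table names used for Tables 10 to 14 (only tech_mat_req and purchase_order appear in the text, and purchase_order is assumed here to also name Table 14).

```python
def split_flows(edges, start, end):
    """Enumerate every path from start to end in an acyclic table graph.

    edges maps a table name to the tables that directly follow it.
    """
    paths = []

    def walk(table, path):
        path = path + [table]
        if table == end:
            paths.append(path)
            return
        for nxt in edges.get(table, []):
            walk(nxt, path)

    walk(start, [])
    return paths


# Hypothetical names: tmr_enq_rel (Table 11), enquiry (Table 10),
# enq_po_rel (Table 12), tmr_po_rel (Table 13).
edges = {
    "tech_mat_req": ["tmr_enq_rel", "tmr_po_rel"],
    "tmr_enq_rel": ["enquiry"],
    "enquiry": ["enq_po_rel"],
    "enq_po_rel": ["purchase_order"],
    "tmr_po_rel": ["purchase_order"],
}
groups = split_flows(edges, "tech_mat_req", "purchase_order")
for group in groups:
    print(" -> ".join(group))
```

Tables shared by both business processes, here the start and end tables, appear in both groups, as the splitting rule requires.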
The data flow sub-model obtained from the inquiry procurement flow is shown in Table 15.
Table 15
Figure PCTCN2022121315-appb-000014
The data flow sub-model obtained from the non-inquiry procurement flow is shown in Table 16.
Table 16
Figure PCTCN2022121315-appb-000015
As shown in Tables 15 and 16, dividing the database tables of the inquiry and non-inquiry business processes into two data flows describes the business process more clearly. The data flow to be processed is divided into two groups of database tables, yielding two data flow sub-models. In both sub-models, the start data flow node corresponds to Table 7 and the end data flow node corresponds to Table 14; that is, the two share the same starting point and, after splitting, merge again at the same end point.
Further, in some embodiments, the method may also include:
merging the at least two data flow sub-models to obtain the data flow model.
It should be noted that, after the two data flow sub-models are obtained, they may be merged, for example by splicing them together or by saving them in the same directory. Merging the at least two data flow sub-models yields the data flow model for this complex scenario.
This example shows how the data flow model represents a scenario in which data is split and then merged. For a very complex data flow to be processed in which splitting and merging occur at an intermediate stage, the data flow can be divided into multiple stages for processing. The data flows can be named in the form "XX flow-XX stage" to distinguish them, and the split database tables reside in their respective data flows. For example, the data flow corresponding to Table 15 may be named "material procurement flow-inquiry stage", and the data flow corresponding to Table 16 may be named "material procurement flow-non-inquiry stage".
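Merging by splicing can be sketched as a simple concatenation in which every node is tagged with the name of the flow it came from. The function merge_submodels and the abbreviated node dictionaries (only the start and end nodes, with assumed sequence numbers) are illustrative assumptions.

```python
def merge_submodels(submodels):
    """Concatenate sub-models into one model, tagging rows with their flow name.

    submodels maps a flow name to a list of node dicts without flow_name.
    """
    merged = []
    for flow_name, nodes in submodels.items():
        for node in nodes:
            merged.append({"flow_name": flow_name, **node})
    return merged


# Intermediate nodes are omitted for brevity; seq_no values are assumed.
submodels = {
    "material procurement flow-inquiry stage": [
        {"seq_no": 1, "tab_name": "tech_mat_req"},
        {"seq_no": 5, "tab_name": "purchase_order"},
    ],
    "material procurement flow-non-inquiry stage": [
        {"seq_no": 1, "tab_name": "tech_mat_req"},
        {"seq_no": 3, "tab_name": "purchase_order"},
    ],
}
model = merge_submodels(submodels)
print(len(model))
```

Because the flow name is part of each record, the shared start and end tables remain distinguishable after the sub-models are combined.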
Further, after the data flow model is obtained, it may be compared with the data dictionary to ensure that the information is accurate. Therefore, in some embodiments, after the data flow model is generated, the method may further include:
comparing the data information in the data flow model with the data dictionary, wherein the data dictionary stores the data information corresponding to the data flow to be processed;
if the data information in the data flow model is inconsistent with the data information in the data dictionary, correcting the data information in the data flow model based on the data information in the data dictionary, so that the two are consistent.
It should be noted that the data dictionary stores the business data of the target business corresponding to the data flow to be processed. After the data flow model is obtained, it is checked against the data dictionary. If the two are consistent, the generated data flow model is correct; if not, the data flow information in the data flow model is corrected based on the data information in the data dictionary so that the two agree, yielding an accurate data flow model.
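A minimal sketch of such a check is given below. It assumes, purely for illustration, that the data dictionary can be reduced to a mapping from table name to primary key field and that only the outflow field is compared; the function reconcile and the field name order_no are hypothetical.

```python
def reconcile(model, data_dictionary):
    """Align each node's out_col with the data dictionary.

    data_dictionary maps a table name to its primary key field. Returns the
    list of (tab_name, old_value, new_value) corrections that were applied.
    """
    corrections = []
    for node in model:
        expected = data_dictionary.get(node["tab_name"])
        current = node.get("out_col")
        # End nodes have no outflow field, and unknown tables are left alone.
        if expected is None or current is None or current == expected:
            continue
        corrections.append((node["tab_name"], current, expected))
        node["out_col"] = expected
    return corrections


model = [
    {"tab_name": "product", "out_col": "prod_no"},  # deliberately stale value
    {"tab_name": "order", "out_col": None},
]
fixed = reconcile(model, {"product": "product_no", "order": "order_no"})
print(fixed)
```

The data dictionary is treated as the authoritative source, so the model is changed and the dictionary is never modified.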
Further, based on the generated data flow model, the database table corresponding to each data flow node can be queried, as can the upstream and downstream relationships of any data flow node. Therefore, in some embodiments, after the data flow model is generated, the method may further include:
determining information to be queried;
querying the data flow model based on the information to be queried, to determine the database table corresponding to the information to be queried and/or the association relationships between the data flow node corresponding to the information to be queried and other data flow nodes.
It should be noted that the information to be queried is determined first. Order information and/or key field information may be used as the information to be queried in the data flow model, which yields at least one data flow node corresponding to that information and, in turn, the database table corresponding to that node. The association relationships between that data flow node and other data flow nodes can be obtained at the same time. The association relationships may include upstream and downstream relationships between data flow nodes, associations between key field information, and so on. A data flow node that flows into a given data flow node is called its upstream data flow node, and a data flow node into which the given node flows is called its downstream data flow node. For example, in Table 3 above, product order flow 1 in the second row is the upstream data flow node of product order flow 2 in the third row, and product order flow 2 is the downstream data flow node of product order flow 1. Structured Query Language (SQL) statements can be used to conveniently query the upstream and downstream relationships and complex logical associations of multiple database tables.
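The upstream and downstream lookups can be sketched with two SQL queries run through Python's sqlite3 module. The simplified table definition, the helper names upstream_tables and downstream_tables, and the flow name string are assumptions; the two rows mirror the product order flow nodes described for Table 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_flow (flow_name TEXT, seq_no INTEGER, tab_name TEXT, "
    "out_col TEXT, get_col TEXT, in_tab_name TEXT, in_col TEXT)"
)
conn.executemany(
    "INSERT INTO data_flow VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        ("product order flow", 1, "product", "product_no", None, None, None),
        ("product order flow", 2, "order", None, "product_no", "product", "product_no"),
    ],
)


def upstream_tables(conn, flow_name, tab_name):
    """Tables that flow into tab_name within the given data flow."""
    sql = (
        "SELECT in_tab_name FROM data_flow "
        "WHERE flow_name = ? AND tab_name = ? AND in_tab_name IS NOT NULL"
    )
    return [r[0] for r in conn.execute(sql, (flow_name, tab_name))]


def downstream_tables(conn, flow_name, tab_name):
    """Tables that tab_name flows into within the given data flow."""
    sql = "SELECT tab_name FROM data_flow WHERE flow_name = ? AND in_tab_name = ?"
    return [r[0] for r in conn.execute(sql, (flow_name, tab_name))]


up = upstream_tables(conn, "product order flow", "order")
down = downstream_tables(conn, "product order flow", "product")
print(up, down)
```

Both lookups read only the data_flow table, so no physical foreign key between the business tables is needed to answer them.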
This embodiment provides a data management method that includes: obtaining a data flow to be processed, the data flow including several database tables; determining order information and key field information of the several database tables; and generating a data flow model according to the order information and key field information, wherein the data flow model represents the association relationships among the several database tables. A data flow model generated in this way not only enables efficient management of the database tables and reduces manual maintenance cost, but is also applicable to complex application scenarios. In addition, because the data flow model fully records the association relationships between database tables, the performance problems caused by creating physical foreign keys are avoided, and data flow information can be queried conveniently in the data flow model, improving data management efficiency. Furthermore, for scenarios in which one child table corresponds to multiple parent tables or the data flow to be processed is too complex, the embodiments of the present application can generate multiple data flow nodes for one database table, or split the data flow to be processed, obtain data flow sub-models separately, and then merge them. In this way, the data flows to be processed in these complex scenarios are converted into a concise and clear data flow model for efficient data flow management.
In another embodiment of the present application, refer to FIG. 3, which shows a schematic flowchart of yet another data management method provided by an embodiment of the present application. As shown in FIG. 3, the method may include:
S301. Determine the data flow to be processed.
It should be noted that a data flow is a description of a business process. In the embodiments of the present application, each data flow to be processed usually corresponds to only one business process. Determining the data flow to be processed therefore means determining the business process. The data flow can be named after the business process, for example product order flow or material procurement flow, and this name can serve as the unique identifier of the data flow to be processed.
S302. Determine all database tables included in the data flow to be processed.
It should be noted that, after the data flow to be processed is determined, all database tables involved in the corresponding business process need to be determined.
It should also be noted that the same database table may exist in multiple different data flows to be processed; that is, different data flows may include the same database table. For example, Table 7 above exists in two data flows to be processed.
Exemplarily, the product order flow of the foregoing embodiments includes Table 1 (product table) and Table 2 (order table). The material procurement flow of the foregoing embodiments includes Table 5 (engineering component table), Table 6 (standard part table), Table 7 (technical requirement sheet table), and Table 8 (purchase order table). The complex material procurement flow of the foregoing embodiments that requires splitting and merging includes Table 7 (technical requirement sheet table), Table 10 (inquiry sheet table), Table 11 (technical requirement and inquiry association table), Table 12 (inquiry and purchase association table), Table 13 (technical requirement and purchase association table), and Table 14 (purchase order table).
S303. Determine the order of all database tables in the data flow to be processed, number each database table, and determine the inflow and outflow field information of each database table, thereby determining the data flow information.
S304. Write the determined data flow information into the data flow model.
It should be noted that the order of all database tables in the data flow to be processed is determined according to the order in which the database tables are produced in the business process; each database table is numbered, and its inflow and outflow field information is determined.
The determined data flow information may include order information, inflow fields, and outflow fields, and may also include receiving fields, inflow tables, and so on.
Specifically, the sequence number of each database table in the data flow to be processed, that is, the order information of the foregoing embodiments, is generated according to the order of the database tables in the data flow. A numeric sequence from 1 to n may be generated in ascending order and written into the data flow model.
In the data flow model, each row of data is one data flow node. For each data flow node, the outflow field is the primary key of the database table itself, the receiving field is a foreign key field, the inflow table is the database table of the previous node, and the inflow field is the primary key of the inflow table (the receiving field of the database table is associated with the inflow field of the inflow table).
The data flow node corresponding to the first database table in the order is the start data flow node, whose receiving field, inflow table, and inflow field are null. The data flow node corresponding to the last database table in the order is the end data flow node, whose outflow field is null.
In one implementation of the embodiments of the present application, for a product order flow that includes only two database tables (Table 1 and Table 2), the data flow node corresponding to Table 1 has sequence number 1 in the data flow model, and the node corresponding to Table 2 has sequence number 2. Table 3 shows the data flow model corresponding to this business process.
As shown in Table 3, the model includes two data flow nodes: product order flow 1 and product order flow 2. Product order flow 1 is the data flow node corresponding to Table 1 (the product table), and product order flow 2 is the data flow node corresponding to Table 2 (the order table).
Because product order flow 1, which corresponds to the product table, is the start data flow node of the data flow model, its get_col, in_tab_name, and in_col are all null. Because product order flow 2, which corresponds to the order table, is the end data flow node, its out_col is null. Here, product.product_no and order.product_no are associated with each other.
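The numbering and null rules for a simple linear flow can be sketched as follows. The function build_linear_model, the dictionary node format, and the primary key name order_no are hypothetical; product_no and the table names come from the text.

```python
def build_linear_model(flow_name, tables):
    """Build nodes for tables listed in business order.

    tables is a list of (tab_name, primary_key, foreign_key) tuples, where
    foreign_key is the field that references the previous table's primary key.
    """
    nodes = []
    last = len(tables) - 1
    for i, (tab_name, primary_key, foreign_key) in enumerate(tables):
        prev = tables[i - 1] if i > 0 else None
        nodes.append({
            "flow_name": flow_name,
            "seq_no": i + 1,  # sequence numbers run from 1 to n
            "tab_name": tab_name,
            "out_col": None if i == last else primary_key,  # end node: null
            "get_col": foreign_key if prev else None,  # start node: null
            "in_tab_name": prev[0] if prev else None,
            "in_col": prev[1] if prev else None,
        })
    return nodes


# order_no is a hypothetical primary key name for the order table.
model = build_linear_model(
    "product order flow",
    [("product", "product_no", None), ("order", "order_no", "product_no")],
)
for node in model:
    print(node)
```

This sketch covers only a single chain of tables; a child table with several parent tables needs one node per parent, as described earlier.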
Tables 1 to 3 show a simple business scenario, the product order flow. In practice, more complex business scenarios often exist.
In another implementation of the embodiments of the present application, consider the complex case in which one child table corresponds to multiple parent tables, such as the material procurement flow. Tables 5 to 8 are the database tables of the material procurement business in a manufacturing industry system, where engineering components may be materials dedicated to a particular project and standard parts may be general-purpose materials shared by all projects. Because the characteristic attributes of the two kinds of material differ considerably, they are stored in two tables (Table 5 and Table 6). In this industry, the technical department first creates a technical requirement sheet specifying the materials to be purchased and their quantities, and the purchasing department then creates a purchase order and initiates the purchase. The data flow information for material procurement in this industry is shown in Table 9.
In this case, Table 7 may be associated with Table 5 or with Table 6; that is, Table 7 is a child table with two parent tables, Table 5 and Table 6. In the database model, the primary keys of the data flow nodes corresponding to Table 5 and Table 6 share one auto-increment sequence (that is, the order information) to ensure that the primary keys of the two tables do not conflict. As shown in Table 9, because the data flow nodes corresponding to Table 5 and Table 6 can both serve as start data flow nodes of the data flow model, both nodes have sequence number 1. tech_mat_req.mat_no comes either from proj_mat.proj_mat_no or from std_mat.std_mat_no, so there are two records for table tech_mat_req whose seq_no is 2.
In addition, because the primary key of tech_mat_req is the composite primary key of tmr_no and tmr_item_no, the outflow field of table tech_mat_req and the inflow field and receiving field of table purchase_order are expressed in tuple format. This example shows how the data flow model represents a complex business scenario in which one child table corresponds to multiple parent tables.
In yet another implementation of the embodiments of the present application, the material procurement process may involve different business processes. For example, the procurement process may take two forms: in the first, a purchase order is generated directly from the technical requirement sheet; in the second, an inquiry sheet is first generated from the technical requirement sheet, and the purchase order is then generated. Because many-to-many relationships may exist among tmr, enq, and po, several association tables are needed, and the purchase order table no longer needs the tmr_no and tmr_item_no fields. This scenario starts from the technical requirement sheet, passes through either the inquiry branch or the non-inquiry branch, and finally converges on the purchase order. For such a scenario, the business can be assigned to two data flow sub-models, shown in Table 15 and Table 16 respectively.
As shown in Tables 15 and 16, dividing the inquiry and non-inquiry processes into two data flows describes the business process more clearly. This example shows how the data flow model represents a scenario in which data is split and then merged. For a very complex data flow in which splitting and merging occur at an intermediate stage, the data flow can be divided into multiple stages for processing, the data flows can be named in the form "XX flow-XX stage" to distinguish them, and the split information remains in its respective data flow.
Further, data flow information can be inserted into the data flow model with the following statement:
Figure PCTCN2022121315-appb-000016
This statement can be executed repeatedly until the number of data flow nodes in the data flow model reaches the total number of data flow nodes that the data flow model contains.
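The insert statement itself is available only as a figure, so the following is a minimal sketch of what repeated insertion might look like, not the statement from the application. It assumes a table named data_flow; the column names flow_name, seq_no and tab_name are hypothetical, while get_col, in_tab_name, in_col and out_col follow the field names used in this description. SQLite is used only to keep the sketch self-contained.

```python
import sqlite3

# Hypothetical schema: the table name and the columns flow_name, seq_no and
# tab_name are assumptions; get_col, in_tab_name, in_col and out_col follow
# the field names used in the description.
DDL = """
CREATE TABLE data_flow (
    flow_name   TEXT NOT NULL,
    seq_no      INTEGER NOT NULL,
    tab_name    TEXT NOT NULL,
    get_col     TEXT,
    in_tab_name TEXT,
    in_col      TEXT,
    out_col     TEXT
)
"""

INSERT_NODE = """
INSERT INTO data_flow
    (flow_name, seq_no, tab_name, get_col, in_tab_name, in_col, out_col)
VALUES (?, ?, ?, ?, ?, ?, ?)
"""


def insert_nodes(conn, nodes):
    """Execute the insert once per node and return the stored node count."""
    for node in nodes:
        conn.execute(INSERT_NODE, node)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM data_flow").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute(DDL)

# Start node: no receiving field, inflow table or inflow field.
# End node: no outflow field.
nodes = [
    ("product_order", 1, "product", None, None, None, "product_no"),
    ("product_order", 2, "order", "product_no", "product", "product_no", None),
]
stored = insert_nodes(conn, nodes)
```

Here the loop stops once every node of the flow has been written, which matches the stopping condition described above.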
When the data flow business needs to be updated, for example when data is updated according to the data flow information determined above (including sequence information, inflow fields, outflow fields, receiving fields, inflow tables, and so on), the following statement can be used:
Figure PCTCN2022121315-appb-000017
The parameters of this statement can be set according to the data flow information determined above. Every field to be updated is optional, so only the data that actually needs to change is updated.
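Because the update statement is also available only as a figure, the sketch below merely illustrates the idea of an update in which every field is optional. The table name data_flow, the columns flow_name, seq_no and tab_name, and the helper update_node are all hypothetical; only get_col, in_tab_name, in_col and out_col come from this description.

```python
import sqlite3

# Columns that may be updated. seq_no is an assumed name for the sequence
# information; the other names follow the description.
UPDATABLE = ("seq_no", "get_col", "in_tab_name", "in_col", "out_col")


def update_node(conn, flow_name, tab_name, **changes):
    """Update only the fields that were passed in; return rows changed."""
    unknown = set(changes) - set(UPDATABLE)
    if unknown:
        raise ValueError(f"not updatable: {sorted(unknown)}")
    if not changes:
        return 0  # nothing to update
    # Column names come from the whitelist above; values are bound parameters.
    set_clause = ", ".join(f"{col} = ?" for col in changes)
    sql = (
        f"UPDATE data_flow SET {set_clause} "
        "WHERE flow_name = ? AND tab_name = ?"
    )
    cur = conn.execute(sql, [*changes.values(), flow_name, tab_name])
    conn.commit()
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE data_flow (flow_name TEXT, seq_no INTEGER, tab_name TEXT, "
    "get_col TEXT, in_tab_name TEXT, in_col TEXT, out_col TEXT)"
)
# A node whose inflow field was recorded incorrectly (hypothetical data).
conn.execute(
    "INSERT INTO data_flow VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("product_order", 2, "order", "product_no", "product", "prod_id", None),
)
changed = update_node(conn, "product_order", "order", in_col="product_no")
```

Note that a table with several parent tables has several nodes, so a real statement would probably need a more specific condition than the flow name and table name used here.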
S305. Check the data flow information and keep it consistent with the related information in the data dictionary.
After the data flow information has been processed and the data flow model obtained, the table names and key field information in the data flow are checked against the table names and key field information in the data dictionary. Keeping the two consistent allows the data flow model and the data dictionary to be used together later.
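The check in S305 could be sketched roughly as follows. The representation of the data dictionary (a mapping from table name to its set of column names), the node layout and the function name are assumptions made for illustration, and composite (tuple) fields are left out for brevity.

```python
def check_against_dictionary(nodes, dictionary):
    """Return (table, message) pairs for entries that disagree with the
    data dictionary. `dictionary` maps a table name to its column names."""
    problems = []
    for node in nodes:
        tab = node["tab_name"]
        if tab not in dictionary:
            problems.append((tab, "table not in data dictionary"))
            continue
        # The receiving and outflow fields belong to the node's own table.
        for key in ("get_col", "out_col"):
            col = node.get(key)
            if col is not None and col not in dictionary[tab]:
                problems.append((tab, f"{key} '{col}' not in table"))
        # The inflow field belongs to the inflow (upstream) table.
        in_tab, in_col = node.get("in_tab_name"), node.get("in_col")
        if in_tab is not None:
            if in_tab not in dictionary:
                problems.append((tab, f"inflow table '{in_tab}' unknown"))
            elif in_col is not None and in_col not in dictionary[in_tab]:
                problems.append((tab, f"in_col '{in_col}' not in {in_tab}"))
    return problems


# Hypothetical data dictionary and data flow nodes.
dictionary = {
    "product": {"product_no", "product_name"},
    "order": {"order_no", "product_no"},
}
nodes = [
    {"tab_name": "product", "get_col": None, "in_tab_name": None,
     "in_col": None, "out_col": "product_no"},
    {"tab_name": "order", "get_col": "product_no", "in_tab_name": "product",
     "in_col": "prod_id", "out_col": None},
]
problems = check_against_dictionary(nodes, dictionary)
```

Any reported mismatch would then be corrected in the data flow model so that it agrees with the data dictionary.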
In summary, the embodiments of the present application provide a data flow model and a method for generating it. The data flow model can describe the relationships between tables in a business system database, which makes the logic of the business system easier to understand, makes the system easier to maintain and extend, and simplifies later integration of system data into a data warehouse. The embodiments record the associations between tables in a concise data flow model that can be stored in the database alongside the data dictionary. With the data flow model, SQL statements can conveniently query the upstream and downstream relationships and complex logical associations of multiple database tables, and the system avoids the performance problems caused by creating physical foreign keys.
A simple example illustrates the use of the data flow model. Taking product ordering as an example, as shown in Tables 1 to 3, the two rows in Table 3 are the data flow node information for the product table and the order table in the product ordering flow. Because the product table is the start data flow node, its get_col, in_tab_name and in_col are null; because the order table is the end data flow node, its out_col is null. product.product_no and order.product_no are associated with each other.
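As an illustration of the product ordering example, the sketch below stores the two nodes and recovers the association between the product table and the order table with a SQL query. Tables 1 to 3 are not reproduced here, so the table name data_flow and the columns flow_name, seq_no and tab_name are assumptions; the null values and the product_no association follow the description above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical layout; only get_col, in_tab_name, in_col and out_col are
# field names taken from the description.
conn.execute(
    "CREATE TABLE data_flow (flow_name TEXT, seq_no INTEGER, tab_name TEXT, "
    "get_col TEXT, in_tab_name TEXT, in_col TEXT, out_col TEXT)"
)
conn.executemany(
    "INSERT INTO data_flow VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # Start node: get_col, in_tab_name and in_col are null.
        ("product_order", 1, "product", None, None, None, "product_no"),
        # End node: out_col is null.
        ("product_order", 2, "order", "product_no", "product", "product_no", None),
    ],
)

# Every node that has an inflow table describes one upstream link.
UPSTREAM_LINKS = """
SELECT in_tab_name, in_col, tab_name, get_col
FROM data_flow
WHERE flow_name = ? AND in_tab_name IS NOT NULL
ORDER BY seq_no
"""
links = conn.execute(UPSTREAM_LINKS, ("product_order",)).fetchall()

# Render each link as the logical join condition it stands for.
join_conditions = [f"{p}.{pc} = {c}.{cc}" for p, pc, c, cc in links]
```

The resulting condition is the same association that a physical foreign key would otherwise have had to express.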
The above is a simple example of the data flow model. The core of the data flow model lies in collecting the data flow information. The complete collection process in the embodiments of the present application is as follows: (1) determine the data flow; (2) determine all database tables included in the data flow; (3) determine the order of all database tables in the data flow, number each database table, and determine the inflow and outflow fields of each database table in the data flow; (4) write the determined data flow information into the data flow model; (5) check the data flow information and keep it consistent with the related information in the data dictionary.
In the related art, the relationships between tables in a business system database are recorded through physical foreign keys in the data dictionary or through documents, which causes system performance problems and is inefficient and error-prone. The embodiments of the present application propose a data flow model that not only records the relationships between database tables completely and avoids the performance problems caused by creating physical foreign keys, but also allows data flow information to be queried in the database, which is far more convenient and efficient than document records. In addition, along with the data flow model, the embodiments propose data standards for the business scenarios in which one child table corresponds to multiple parent tables and in which data splits and later merges, so that more complex business scenarios can be recorded. Used together with the single-table metadata of the data dictionary, the data flow model has a wider range of applications.
This embodiment provides a data management method and describes the specific implementation of the foregoing embodiments in detail. Compared with the related art, the technical solution provided by the embodiments of the present application has at least the following advantages: (1) The related art manages metadata with a data dictionary, which requires physical foreign keys to be created on the tables in order to record the associations between them. Creating physical foreign keys makes system development and data processing more difficult and affects system performance. Using the data flow model avoids creating physical foreign keys during system development and therefore avoids these problems. (2) The related art manages metadata with documents, which involves a heavy manual maintenance workload, is difficult to consult, and easily becomes inconsistent with the system information. Using the data flow model means creating a data flow table in the system database and maintaining the data flow information in the database, so that it can be conveniently compared with the data dictionary to avoid inconsistency with the system information, and the data flow information can be conveniently queried in the database. (3) The data flow model used in the embodiments of the present application can represent not only the association between two adjacent tables but also the complete sequence of associations along the entire data link. Building the data flow model is itself a process of sorting out the business logic, which can be carried out in parallel with business system development and helps to uncover problems in the business logic. The data flow model can also represent the complex business scenarios in which one child table corresponds to multiple parent tables and in which data splits and later merges, and therefore has a wider range of applications than the related art.
In yet another embodiment of the present application, refer to FIG. 4, which shows a schematic structural diagram of a data management apparatus 40 provided by an embodiment of the present application. As shown in FIG. 4, the data management apparatus 40 may include an acquisition unit 401, a determination unit 402 and a generation unit 403, wherein:
the acquisition unit 401 is configured to acquire a data flow to be processed, the data flow to be processed including several database tables;
the determination unit 402 is configured to determine sequence information and key field information of the several database tables;
the generation unit 403 is configured to generate a data flow model according to the sequence information and key field information of the several database tables, wherein the data flow model is used to represent the associations among the several database tables.
In some embodiments, the determination unit 402 is specifically configured to determine the order of the several database tables in the data flow to be processed and generate the sequence information of the several database tables according to that order; and to determine the inflow field and outflow field corresponding to each of the several database tables and determine the key field information of the several database tables according to the inflow field and outflow field corresponding to each database table.
In some embodiments, the generation unit 403 is specifically configured to generate several data flow nodes according to the several database tables; and to connect the several data flow nodes in series according to the sequence information and key field information of the several database tables to obtain the data flow model.
In some embodiments, the determination unit 402 is further specifically configured to determine the primary key field of a first database table and use the primary key field as the outflow field corresponding to the first database table; and to determine a second database table at the data flow node preceding the first database table, determine the second database table as the inflow table corresponding to the first database table, and use the primary key field of the inflow table as the inflow field corresponding to the first database table; wherein the first database table is any one of the several database tables.
In some embodiments, the determination unit 402 is further configured to determine, when the first database table is at the start data flow node of the data flow to be processed, that the inflow table and inflow field corresponding to the first database table are both empty; and to determine, when the first database table is at the end data flow node of the data flow to be processed, that the outflow field corresponding to the first database table is empty.
In some embodiments, the generation unit 403 is further specifically configured to generate one data flow node according to the first database table if the first database table corresponds to one parent table; and to generate at least two data flow nodes according to the first database table if the first database table corresponds to at least two parent tables, the at least two data flow nodes having the same sequence information; wherein the first database table is any one of the several database tables.
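The rule of one data flow node per parent table, all sharing the same sequence information, could be sketched as follows. The function name, the dictionary layout, the seq_no field name and the second parent table supplier are hypothetical; tech_mat_req, purchase_order and the tuple format for the composite key (tmr_no, tmr_item_no) follow the earlier example.

```python
def nodes_for_table(tab_name, seq_no, parents, out_col=None):
    """Build the data flow nodes for one database table.

    `parents` is a list of (in_tab_name, in_col, get_col) entries. A table
    with no parent is a start node; a table with several parents yields
    several nodes that share the same seq_no.
    """
    if not parents:
        return [{"tab_name": tab_name, "seq_no": seq_no, "get_col": None,
                 "in_tab_name": None, "in_col": None, "out_col": out_col}]
    return [
        {"tab_name": tab_name, "seq_no": seq_no, "get_col": get_col,
         "in_tab_name": in_tab, "in_col": in_col, "out_col": out_col}
        for in_tab, in_col, get_col in parents
    ]


# purchase_order with two parent tables. The composite key of tech_mat_req
# is written as a tuple; `supplier` is a hypothetical second parent.
po_nodes = nodes_for_table(
    "purchase_order",
    seq_no=2,
    parents=[
        ("tech_mat_req", ("tmr_no", "tmr_item_no"), ("tmr_no", "tmr_item_no")),
        ("supplier", "supplier_no", "supplier_no"),
    ],
)
```

Sharing one sequence number keeps the child table at a single position in the flow while still recording each parent association separately.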
In some embodiments, as shown in FIG. 4, the data management apparatus may further include a splitting unit 404 configured to split the several database tables included in the data flow to be processed into at least two groups of database tables;
the determination unit 402 is further configured to determine the sequence information and key field information of each of the at least two groups of database tables;
the generation unit 403 is further configured to generate at least two data flow sub-models according to the sequence information and key field information of each group of database tables, wherein each data flow sub-model is used to represent the associations among the database tables of one group.
In some embodiments, as shown in FIG. 4, the data management apparatus may further include a merging unit 405 configured to merge the at least two data flow sub-models to obtain the data flow model.
In some embodiments, as shown in FIG. 4, the data management apparatus may further include a comparison unit 406 configured to compare the data information in the data flow model with the data dictionary; and, if the data information in the data flow model is inconsistent with the data information in the data dictionary, to correct the data information in the data flow model based on the data information in the data dictionary so that the two are consistent.
In some embodiments, as shown in FIG. 4, the data management apparatus may further include a query unit 407 configured to determine information to be queried; and to query the data flow model based on the information to be queried in order to determine the database table corresponding to the information to be queried and/or the associations between the data flow nodes corresponding to the information to be queried.
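One way the query unit might trace associations between data flow nodes is sketched below: given a table name as the information to be queried, it walks the in_tab_name links to list every downstream table. The function name, the node layout and the third table delivery are hypothetical; product and order come from the product ordering example.

```python
from collections import deque


def downstream_tables(nodes, start):
    """List the tables reachable downstream of `start`, nearest first."""
    # Map each inflow (upstream) table to the tables that receive from it.
    children = {}
    for node in nodes:
        parent = node["in_tab_name"]
        if parent is not None:
            children.setdefault(parent, []).append(node["tab_name"])

    seen, found, queue = {start}, [], deque([start])
    while queue:
        current = queue.popleft()
        for child in children.get(current, []):
            if child not in seen:  # a table with two parents is listed once
                seen.add(child)
                found.append(child)
                queue.append(child)
    return found


# product -> order comes from the description; delivery is hypothetical.
flow_nodes = [
    {"tab_name": "product", "in_tab_name": None},
    {"tab_name": "order", "in_tab_name": "product"},
    {"tab_name": "delivery", "in_tab_name": "order"},
]
after_product = downstream_tables(flow_nodes, "product")
```

The same traversal run in the opposite direction (from tab_name to in_tab_name) would give the upstream tables instead.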
It can be understood that, in this embodiment, a "unit" may be part of a circuit, part of a processor, part of a program or software, and so on; it may also be a module, or it may be non-modular. Moreover, the components in this embodiment may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional module.
If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the method described in this embodiment. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Accordingly, this embodiment provides a computer storage medium storing a computer program which, when executed by at least one processor, implements the steps of the data management method of any one of the foregoing embodiments.
Based on the composition of the data management apparatus 40 and the computer storage medium described above, refer to FIG. 5, which shows a schematic structural diagram of an electronic device 50 provided by an embodiment of the present application. As shown in FIG. 5, the electronic device 50 may include a communication interface 501, a memory 502 and a processor 503, which are coupled together through a bus system 504. It can be understood that the bus system 504 is used to implement connection and communication between these components. In addition to a data bus, the bus system 504 includes a power bus, a control bus and a status signal bus. For clarity of illustration, however, the various buses are all labeled as the bus system 504 in FIG. 5. The communication interface 501 is used to receive and send signals while exchanging information with other external network elements;
the memory 502 is used to store a computer program that can run on the processor 503;
the processor 503 is used to perform the following when running the computer program:
acquiring a data flow to be processed, the data flow to be processed including several database tables;
determining sequence information and key field information of the several database tables;
generating a data flow model according to the sequence information and key field information of the several database tables, wherein the data flow model is used to represent the associations among the several database tables.
It can be understood that the memory 502 in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (Static RAM, SRAM), dynamic RAM (Dynamic RAM, DRAM), synchronous DRAM (Synchronous DRAM, SDRAM), double data rate SDRAM (Double Data Rate SDRAM, DDRSDRAM), enhanced SDRAM (Enhanced SDRAM, ESDRAM), synchronous link DRAM (Synchronous link DRAM, SLDRAM) and direct Rambus RAM (Direct Rambus RAM, DRRAM). The memory 502 of the systems and methods described herein is intended to include, without being limited to, these and any other suitable types of memory.
The processor 503 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 503 or by instructions in the form of software. The processor 503 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 502; the processor 503 reads the information in the memory 502 and completes the steps of the above method in combination with its hardware.
It can be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing unit may be implemented in one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in the present application, or a combination thereof.
For a software implementation, the techniques described herein can be implemented through modules (for example, procedures or functions) that perform the functions described herein. The software code can be stored in a memory and executed by a processor. The memory can be implemented inside or outside the processor.
Optionally, as another embodiment, the processor 503 is further configured to perform the method of any one of the foregoing embodiments when running the computer program.
Based on the composition of the data management apparatus 40 described above, refer to FIG. 6, which shows a schematic structural diagram of another electronic device 50 provided by an embodiment of the present application. As shown in FIG. 6, the electronic device 50 includes at least the data management apparatus 40 of any one of the foregoing embodiments.
For the electronic device 50, the data flow model generated from the sequence information and key field information of the several database tables in the data flow to be processed not only enables efficient management of these database tables and reduces manual maintenance costs, but is also applicable to complex application scenarios. In addition, because the data flow model records the associations between database tables completely, it avoids the performance problems caused by creating physical foreign keys, and the data flow information can be conveniently queried in the data flow model, which improves data management efficiency.
The above are only preferred embodiments of the present application and are not intended to limit its scope of protection.
It should be noted that, in the present application, the terms "include" and "comprise", and any variants thereof, are intended to cover a non-exclusive inclusion, so that a process, method, article or apparatus that includes a series of elements includes not only those elements but also other elements that are not expressly listed, or elements inherent to that process, method, article or apparatus. Without further limitation, an element introduced by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or apparatus that includes that element.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
The methods disclosed in the several method embodiments provided in the present application can be combined arbitrarily, provided there is no conflict, to obtain new method embodiments.
The features disclosed in the several product embodiments provided in the present application can be combined arbitrarily, provided there is no conflict, to obtain new product embodiments.
The features disclosed in the several method or device embodiments provided in the present application can be combined arbitrarily, provided there is no conflict, to obtain new method embodiments or device embodiments.
The above are only specific implementations of the present application, but the scope of protection of the present application is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be determined by the scope of protection of the claims.
Industrial Applicability
In the embodiments of the present application, the data flow model generated from the sequence information and key field information of the several database tables in the data flow to be processed not only enables efficient management of these database tables and reduces manual maintenance costs, but is also applicable to complex application scenarios. In addition, because the data flow model records the associations between database tables completely, it avoids the performance problems caused by creating physical foreign keys, and the data flow information can be conveniently queried in the data flow model, which improves data management efficiency.

Claims (13)

  1. 一种数据管理方法,包括:A data management method comprising:
    获取待处理数据流,所述待处理数据流包括若干个数据库表;Obtain a data stream to be processed, the data stream to be processed includes several database tables;
    确定所述若干个数据库表的次序信息和关键字段信息;Determine the sequence information and key field information of the several database tables;
    根据所述若干个数据库表的次序信息和关键字段信息,生成数据流模型;其中,所述数据流模型用于表征所述若干个数据库表之间的关联关系。A data flow model is generated according to the sequence information and key field information of the several database tables; wherein, the data flow model is used to characterize the association relationship among the several database tables.
  2. 根据权利要求1所述的方法,其中,所述确定所述若干个数据库表的次序信息和关键字段信息,包括:The method according to claim 1, wherein said determining the order information and key field information of said several database tables comprises:
    确定所述若干个数据库表在所述待处理数据流中的先后顺序,并根据所述先后顺序生成所述若干个数据库表的次序信息;Determining the sequence of the several database tables in the data stream to be processed, and generating sequence information of the several database tables according to the sequence;
    确定所述若干个数据库表中每一个数据库表对应的流入字段和流出字段,并根据所述每一个数据库表对应的流入字段和流出字段确定所述若干个数据库表的关键字段信息。Determine the inflow field and outflow field corresponding to each database table in the several database tables, and determine the key field information of the several database tables according to the inflow field and outflow field corresponding to each database table.
  3. 根据权利要求2所述的方法,其中,所述根据所述若干个数据库表的次序信息和关键字段信息,生成数据流模型,包括:The method according to claim 2, wherein said generating a data flow model according to the sequence information and key field information of said several database tables comprises:
    根据所述若干个数据库表,生成若干个数据流节点;Generate several data flow nodes according to the several database tables;
    根据所述若干个数据库表的次序信息和所述关键字段信息,将所述若干个数据流节点进行串接,得到所述数据流模型。According to the order information of the several database tables and the key field information, the several data flow nodes are concatenated to obtain the data flow model.
  4. The method according to claim 3, wherein determining the inflow field and the outflow field corresponding to each of the several database tables comprises:
    determining a primary key field of a first database table, and using the primary key field as the outflow field corresponding to the first database table; and
    determining a second database table, which is the table at the data flow node preceding the first database table, determining the second database table as the inflow table corresponding to the first database table, and using the primary key field of the inflow table as the inflow field corresponding to the first database table;
    wherein the first database table is any one of the several database tables.
  5. The method according to claim 4, wherein the method further comprises:
    when the first database table is at the start data flow node of the data flow to be processed, determining that the inflow table and the inflow field corresponding to the first database table are both empty; and
    when the first database table is at the end data flow node of the data flow to be processed, determining that the outflow field corresponding to the first database table is empty.
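The field rules of claims 4 and 5 can be sketched as follows, assuming the tables arrive as (table name, primary key) pairs already sorted in data-flow order. The `FlowNode` class, the `build_nodes` function, and the sample tables are hypothetical names chosen for illustration, not part of the application.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowNode:
    table: str
    order: int                    # position in the data flow (order information)
    inflow_table: Optional[str]   # table at the previous node; None at the start node
    inflow_field: Optional[str]   # primary key of the inflow table; None at the start node
    outflow_field: Optional[str]  # this table's primary key; None at the end node


def build_nodes(tables: list[tuple[str, str]]) -> list[FlowNode]:
    """tables: (table_name, primary_key) pairs, assumed to be in data-flow order."""
    nodes = []
    last = len(tables) - 1
    for i, (name, primary_key) in enumerate(tables):
        previous = tables[i - 1] if i > 0 else None
        nodes.append(FlowNode(
            table=name,
            order=i + 1,
            inflow_table=previous[0] if previous else None,
            inflow_field=previous[1] if previous else None,
            outflow_field=primary_key if i < last else None,
        ))
    return nodes


# Hypothetical sample data.
nodes = build_nodes([
    ("customer", "customer_id"),
    ("order", "order_id"),
    ("invoice", "invoice_id"),
])
```

The start node ends up with an empty inflow table and inflow field, and the end node with an empty outflow field, matching the two boundary cases of claim 5.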
  6. The method according to claim 3, wherein generating several data flow nodes according to the several database tables comprises:
    if a first database table corresponds to one parent table, generating one data flow node according to the first database table; and
    if the first database table corresponds to at least two parent tables, generating at least two data flow nodes according to the first database table, the at least two data flow nodes having the same order information;
    wherein the first database table is any one of the several database tables.
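A minimal sketch of the node-generation rule in claim 6, assuming each node is a plain dictionary and that one node is generated per parent table. The handling of a table with no parent (a single node with an empty inflow table) is an assumption made here so the sketch is total; the claim itself only covers the one-parent and multi-parent cases. The function name `nodes_for_table` and the sample tables are hypothetical.

```python
from __future__ import annotations

from typing import Optional


def nodes_for_table(table: str, order: int, parents: list[str]) -> list[dict]:
    """Generate one data flow node per parent table, all sharing the same order."""
    # Assumption: a table without parents still gets one node, with no inflow table.
    inflows: list[Optional[str]] = list(parents) if parents else [None]
    return [{"table": table, "order": order, "inflow_table": parent}
            for parent in inflows]


# Hypothetical sample data: "payment" is fed by two parent tables.
single = nodes_for_table("order", 2, ["customer"])
multi = nodes_for_table("payment", 3, ["order", "refund"])
```

Duplicating the node per parent keeps each node linked to exactly one inflow table, which is why the duplicates carry identical order information.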
  7. The method according to claim 1, wherein the method further comprises:
    performing splitting processing on the several database tables included in the data flow to be processed to obtain at least two groups of database tables;
    determining order information and key field information of each group of database tables in the at least two groups of database tables; and
    generating at least two data flow sub-models according to the order information and key field information of each group of database tables, wherein each data flow sub-model is used to characterize the association relationship among the database tables of the corresponding group.
  8. The method according to claim 7, wherein the method further comprises:
    performing merging processing on the at least two data flow sub-models to obtain the data flow model.
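The splitting and merging of claims 7 and 8 can be sketched as below. The claims do not say how tables are assigned to groups or how sub-models are combined, so this sketch assumes a hypothetical grouping label per table and treats merging as a de-duplicated union of edges. All function names and sample data are illustrative assumptions.

```python
from collections import defaultdict


def split_tables(tables, group_of):
    """Partition tables (given in flow order) by a hypothetical grouping label."""
    groups = defaultdict(list)
    for table in tables:
        groups[group_of[table]].append(table)
    return dict(groups)


def build_submodel(ordered_tables):
    """Sub-model as a list of (upstream, downstream) edges within one group."""
    return list(zip(ordered_tables, ordered_tables[1:]))


def merge_submodels(submodels):
    """Union of all sub-model edges, keeping first-seen order and dropping duplicates."""
    merged = []
    for edges in submodels:
        for edge in edges:
            if edge not in merged:
                merged.append(edge)
    return merged


# Hypothetical sample data.
tables = ["customer", "order", "invoice", "product", "stock"]
group_of = {"customer": "sales", "order": "sales", "invoice": "sales",
            "product": "inventory", "stock": "inventory"}
groups = split_tables(tables, group_of)
submodels = {name: build_submodel(members) for name, members in groups.items()}
model = merge_submodels(submodels.values())
```

Building each sub-model independently keeps the groups decoupled until the final merge, which is one plausible reading of why the claims separate the two steps.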
  9. The method according to any one of claims 1 to 8, wherein, after generating the data flow model, the method further comprises:
    comparing the data information in the data flow model with a data dictionary; and
    if the data information in the data flow model is inconsistent with the data information in the data dictionary, correcting the data information in the data flow model based on the data information in the data dictionary, so that the data information in the data flow model is consistent with the data information in the data dictionary.
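The consistency check of claim 9 can be sketched as follows, assuming both the data flow model and the data dictionary are nested mappings of the form table -> attribute -> value. That layout, the `reconcile` function name, and the sample values are assumptions for illustration; the application does not specify a storage format.

```python
from __future__ import annotations


def reconcile(model: dict[str, dict[str, str]],
              dictionary: dict[str, dict[str, str]]) -> list[tuple[str, str, str, str]]:
    """Overwrite model entries that disagree with the dictionary; return the corrections."""
    corrections = []
    for table, fields in model.items():
        for field, value in fields.items():
            expected = dictionary.get(table, {}).get(field)
            # Only entries that the dictionary actually defines are compared.
            if expected is not None and expected != value:
                corrections.append((table, field, value, expected))
                fields[field] = expected  # the dictionary is treated as authoritative
    return corrections


# Hypothetical sample data: the model carries a stale primary key name.
model = {"order": {"primary_key": "order_no", "inflow_field": "customer_id"}}
dictionary = {"order": {"primary_key": "order_id"}}
corrections = reconcile(model, dictionary)
```

Returning the list of corrections alongside the in-place fix gives a simple audit trail of what was changed.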
  10. The method according to any one of claims 1 to 8, wherein, after generating the data flow model, the method further comprises:
    determining information to be queried; and
    querying the data flow model based on the information to be queried to determine the database table corresponding to the information to be queried, and/or to determine the association relationship between the data flow nodes corresponding to the information to be queried.
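A minimal sketch of the query step in claim 10, assuming the data flow model is stored as a list of (upstream table, downstream table, linking field) edges and that the information to be queried is a table name or a field name. The function name `query_model` and the sample edges are hypothetical.

```python
def query_model(edges, term):
    """Return (tables, related_edges) for a table name or field name.

    edges: (upstream_table, downstream_table, linking_field) tuples.
    """
    # An edge is related if the term matches either table or the linking field.
    related = [edge for edge in edges if term in edge]
    # Collect every table that takes part in a related edge.
    tables = sorted({table for up, down, _ in related for table in (up, down)})
    return tables, related


# Hypothetical sample data.
edges = [
    ("customer", "order", "customer_id"),
    ("order", "invoice", "order_id"),
]
tables_for_field, edges_for_field = query_model(edges, "order_id")
tables_for_table, edges_for_table = query_model(edges, "order")
```

Querying by field name locates the tables that the field connects, while querying by table name returns both its upstream and downstream relationships.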
  11. A data management apparatus, comprising an acquisition unit, a determination unit, and a generation unit, wherein:
    the acquisition unit is configured to acquire a data flow to be processed, the data flow to be processed including several database tables;
    the determination unit is configured to determine order information and key field information of the several database tables; and
    the generation unit is configured to generate a data flow model according to the order information and key field information of the several database tables, wherein the data flow model is used to characterize the association relationship among the several database tables.
  12. An electronic device, comprising a memory and a processor, wherein:
    the memory is configured to store a computer program capable of running on the processor; and
    the processor is configured to execute the data management method according to any one of claims 1 to 10 when running the computer program.
  13. A computer storage medium storing a computer program, wherein the computer program, when executed by at least one processor, implements the data management method according to any one of claims 1 to 10.
PCT/CN2022/121315 2022-01-05 2022-09-26 Data management method and apparatus, and electronic device and storage medium WO2023130771A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210006368.3A CN116450637A (en) 2022-01-05 2022-01-05 Data management method, device, electronic equipment and storage medium
CN202210006368.3 2022-01-05

Publications (1)

Publication Number Publication Date
WO2023130771A1 (en) 2023-07-13

Family

ID=87073014

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/121315 WO2023130771A1 (en) 2022-01-05 2022-09-26 Data management method and apparatus, and electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN116450637A (en)
WO (1) WO2023130771A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117539869B (en) * 2024-01-08 2024-03-15 北京睿企信息科技有限公司 Data processing system for acquiring data table

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140122796A1 (en) * 2012-10-31 2014-05-01 Netapp, Inc. Systems and methods for tracking a sequential data stream stored in non-sequential storage blocks
CN106250382A (en) * 2016-01-28 2016-12-21 新博卓畅技术(北京)有限公司 A kind of metadata management automotive engine system and implementation method
CN108132957A (en) * 2016-12-01 2018-06-08 中国移动通信有限公司研究院 A kind of data base processing method and device
CN110908978A (en) * 2019-11-06 2020-03-24 中盈优创资讯科技有限公司 Database data structure verification method and device
CN111078695A (en) * 2019-11-29 2020-04-28 东软集团股份有限公司 Method and device for calculating metadata association relation in enterprise

Also Published As

Publication number Publication date
CN116450637A (en) 2023-07-18

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22918225

Country of ref document: EP

Kind code of ref document: A1