CN117851405A

CN117851405A - Improved method for automatically creating and updating data synchronous wide table structure

Info

Publication number: CN117851405A
Application number: CN202311722358.0A
Authority: CN
Inventors: 肖明睿
Original assignee: Tianyi Cloud Technology Co Ltd
Current assignee: Tianyi Cloud Technology Co Ltd
Priority date: 2023-12-14
Filing date: 2023-12-14
Publication date: 2024-04-09

Abstract

The invention discloses an improved method for automatically creating and updating a synchronous wide table structure of data, which comprises the steps that firstly, a wide table structure of a plurality of storage bodies is created and updated through configuration of schema metadata configured by a table and associated information configuration tables in a database; step two, automatically generating a required wide table field according to best practice rules; thirdly, redundant configuration is reduced; fourth, the type of the specific field of the output stream is customized; fifthly, adding a table custom attribute table; sixthly, a table related information table related to the task wide table is formed; seventh, the configuration of the directory names and the database names to which the tables belong is performed. The invention efficiently and automatically creates and updates the wide table structures of a plurality of storage bodies by configuring the schema metadata and the associated information configuration tables of the table configuration in the database, simultaneously automatically generates the required wide table fields, standardizes the wide table field setting rules, reduces redundant configuration, improves configuration efficiency and reduces inconsistency.

Description

Improved method for automatically creating and updating data synchronous wide table structure

Technical Field

The invention relates to the field of large data real-time flow dimension table query, in particular to an improved method for automatically creating and updating a data synchronous wide table structure.

Background

Large data real-time bins are data warehouse systems capable of processing and analyzing large-scale data in real-time. The method combines a big data technology and a real-time data processing technology, can meet the rapidly-changing business requirement, and supports real-time data analysis and decision.

However, the field synchronization of the table at the present stage can only be limited to a single table, a wide table generated by multi-table association cannot be adapted, manual field correspondence is needed or only selection operation of a web interface is needed, no certain rule is needed to automatically associate the fields, the existing database belongs to the storage body with media and the storage body, so that a lot of things which are the same need to be repeatedly configured, and the zoned barrel configuration of the multi-key type field is redundant.

Disclosure of Invention

The present invention aims to provide an improved method for automatically creating and updating a data synchronous wide table structure, so as to solve the problems set forth in the background art.

In order to achieve the above purpose, the present invention provides the following technical solutions: an improved method of automating the creation and updating of a data synchronized wide table structure, said improved method comprising the steps of;

the method comprises the steps that firstly, a wide table structure of a plurality of storage bodies is created and updated through configuration of schema metadata configured by a table in a database and a related information configuration table thereof;

step two, automatically generating a required wide table field according to best practice rules;

thirdly, redundant configuration is reduced, a medium and a storage body are split into a storage body medium table and a metadata storage body table, and then a metadata information table is added to form a core base;

the fourth step, the type of the specific field of the output stream is customized, the field to be converted and the type to be converted are defined through a new flink task wide table specific field type conversion table, and the strong association of the field type of the output stream and the input stream is decoupled;

fifthly, adding a table custom attribute table, and supporting attribute custom configuration, attribute encryption and decryption and replacement;

step six, splitting a table association information table related to the task wide table into table association information related to the task wide table and a metadata storage body, and analyzing the wide table corresponding to the task and the association relation thereof;

seventh, the configuration of the directory names and the database names of the tables is carried out, and the directory names and the database names are added in the metadata storage table.

Preferably, the memory banks in the first step include flink, hudi, and doris.

Preferably, after the redundancy configuration reducing operation in the third step, when the other tables are distinguished at three progressive levels of metadata, a storage body and a medium, the primary key of one of the corresponding tables is used as the external key to perform targeted configuration.

Preferably, after the progressive layer is distinguished, the association table is improved, including splitting the medium classification from the storage body classification, so as to ensure that only the table template attribute, the table template field and the table custom attribute configuration and the storage body medium table association configuration are ensured, and the configuration is required according to different media.

Preferably, the redundancy reducing configuration further comprises reducing redundancy of the split bucket configuration, and moving key fields and key types of the metadata storage table to a new metadata key value table.

Preferably, in the fifth step, the attribute custom configuration is that the attribute of the table template is removed from the table template, and a table custom attribute table is added to separate the field from the attribute configuration.

Preferably, the step of analyzing the broad table and the association relation thereof corresponding to the task in the sixth step is to look up the association relation of metadata information of the broad table and the association relation thereof and information of the corresponding association table according to the task name, and the broad table needs to be synchronized with a storage body, a medium, a corresponding template, a key value, a custom field and attribute information.

Preferably, the obtaining of the field information required by the broad table comprises the following steps;

a1, acquiring field name information required by a broad table according to the broad table and the association relation thereof and the information of the corresponding association table;

a2, unifying the field formats of the fact table;

a3, performing increase and decrease processing on the specific field;

and A4, obtaining the type and length type complete field information of the field name corresponding field required by the wide table.

Preferably, after the field information required by the wide table is acquired, the structure of the wide table is set, and the structure of the wide table corresponding to different storage bodies is created.

Preferably, after the wide table structure setting is completed, the sentence setting is created and updated, and the executed operation includes the following steps;

b1, setting table attribute, key value and sub-division barrel information;

b2, judging whether to create or update, including searching whether a corresponding storage body has a table, if the table does not exist, preparing to create the table, and if the table does exist, acquiring an old table structure;

b3, creating a data table operation statement, including judging whether to create, set table attributes, search corresponding table templates and replace placeholders in the table to create corresponding table creation and update statements;

and B4, executing the statement.

The invention has the technical effects and advantages that:

the invention efficiently and automatically creates and updates the wide table structure of a plurality of memory banks by configuring the schema metadata and the associated information configuration table of the table configuration in the database, and simultaneously, the method can automatically generate the required wide table field and standardize the wide table field setting rule, thereby reducing redundant configuration, improving configuration efficiency and reducing inconsistency; meanwhile, the setting of the method supports attribute custom configuration, attribute encryption and decryption and replacement, so that the consistency of data fields is ensured, and the operation flow is correct and rapid.

Drawings

FIG. 1 is a flow chart of the operation of the improved method of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The present invention provides an improved method for automatically creating and updating a synchronous wide table structure of data as shown in fig. 1, the improved method comprising the following steps;

specifically, the memory banks in the first step include flink, hudi, and doris.

specifically, after the redundancy configuration operation is reduced in the third step, when the other tables are distinguished in three progressive layers of metadata, a storage body and a medium, the main key of one corresponding table is used as an external key to perform targeted configuration.

Further, after the progressive layer is distinguished, the association table is improved, including splitting the medium classification from the storage body classification, so as to ensure that only the table template attribute, the table template field and the table custom attribute configuration and the storage body medium table association configuration are ensured, and the configuration is required according to different mediums.

Still further, reducing redundancy configuration further includes reducing split bucket configuration redundancy, moving key fields, key types of the metadata store table to a new metadata key value table.

specifically, in the fifth step, the attribute custom configuration is to detach the attribute of the table template from the table template, and add the table custom attribute table to separate the field from the attribute configuration.

specifically, in the sixth step, the broad table and the association relationship corresponding to the task are analyzed, wherein the metadata information of the broad table and the association relationship of the association table and the information of the corresponding association table are checked through the task name, and the broad table needs to be synchronized with the storage body, the medium, the corresponding template, the key value, the custom field and the attribute information.

Specifically, the acquisition of the field information required by the broad table comprises the following steps;

further, whether the field name information is a key value is judged, and if so, the field name information is reserved. If the addition mode of the association table related to the wide table is the supplement mode, the field name required by the wide table is all the fields of the association table. If the update mode of the association table related to the wide table is the supplement mode, the field name required by the wide table is the invariable column of the value of the association table and the key field set of the corresponding metadata storage information table record, and the combination is carried out after the duplication removal.

A2, unifying the field formats of the fact table, and storing the fields named by the non-standardized fields into the table in a standardized way, so that repeated fields can be conveniently found, and humping and underlining are carried out on the fields of the fact table.

A3, performing increase and decrease processing on the specific field;

further, a task wide table specific field increasing and decreasing table corresponding to the task is obtained according to the task name, and a record of the field needing to be deleted is obtained.

According to records, field names required by the wide table are added and deleted, and renaming is generally used for processing fields with the same name.

Further, according to the wide table, the association relation thereof and the name of the corresponding association table, complete field information in the category table structure of the flink is obtained;

if the information is incomplete, connecting the data source of the association table to obtain complete field information;

and finding a field to be converted according to the specific field type conversion table of the task wide table, and converting into a corresponding field type if the field exists.

Further, after the field information required by the wide table is acquired, the structure of the wide table is set, and the structure of the wide table corresponding to different memory banks is created.

It should be noted that creating the structure of the wide table corresponding to different storage volumes (such as hudi, doris, etc.) and media (such as flink, spark, etc.), includes:

setting a field, namely setting complete field information into a corresponding table structure, and corresponding field types according to a table storage body;

setting key and partition type, and obtaining required key field, such as id of metadata information table, memory type, key field, key type, type of table partition, and type information of table partition bucket according to metadata memory and metadata key table record.

Still further, after the wide table structure setting is completed, the sentence setting is created and updated and executed, and the executed operation includes the following steps;

b1, setting table attribute, key value and sub-division barrel information;

specifically, comparing the new table structure, if there is a field change, if so, recording the field structure information to be changed, and preparing for updating; nothing is done if there is no change.

it should be noted that, setting a table attribute, finding a corresponding template and custom configuration from a table template attribute and a table custom attribute table, changing the corresponding template attribute according to the custom attribute, if a placeholder is in the attribute, replacing the template attribute with an attribute value in a configuration file, if encryption and decryption configuration is in the attribute, encrypting and decrypting according to rules, and then replacing the template attribute.

And B4, executing the statement.

Note that, the data table uses:

metadata information table: the table is in the required relevant information of several bins. Metadata associated information table: association relationship information between tables. Task wide table association information table: correspondence between tasks and broad tables. Table association information table related to task width table: the broad table corresponding to the task is related metadata associated information. Task wide table specific field increase and decrease table: the deletion and increasing and decreasing renaming tasks of specific fields are performed on the basis of the general rule, such as the deletion and renaming of the fields with the same names in the association table. Task wide table specific field type conversion table: the field type of the decoupled output stream is strongly correlated with the input stream, and the type of the specific field of the output stream is customized. Table custom attribute table: the self-defined configuration of the attributes of the support table in different media supports ordering and encryption and decryption configuration. Metadata repository information table: for setting the structure and configuration information specific to the memory bank for the different memory banks. A storage medium table: a bank and media combination relationship is defined. Metadata key table: the definition table corresponds to the key fields, types and order of the memory banks. Table template table: the creation update table template is used for recording different media. Table template attribute table: the table attribute templates are used for recording different media and support ordering and encryption and decryption configuration.

The specific table structure information is as follows:

metadata information table: primary key, original table name and several bin layer types, 1: ods table, 2: dwd broad table, add mode, 0: applied, 1: upsert, value-invariant column;

metadata associated information table: primary key, foreign key and associated data field, relational model, metadata table id, associated metadata table id;

task wide table association information table: primary key, task name, id of broad table (id of corresponding metadata information table);

table association information table related to task width table: a primary key, an id of a link task wide table association information table, and an id of a related metadata table information table, whether the related metadata table information table is a temporal temporary association and ordering number;

task wide table specific field increase and decrease table: a primary key, an id of a flink task wide table associated information table, an id of a related metadata associated information table, a field set to be excluded, a field set to be reserved, and a field set to be renamed;

task wide table specific field type conversion table: a primary key, an id of a flink task wide table associated information table, an id of a metadata storage table, a field of a type to be converted, and a type to be converted;

table custom attribute table: primary key, id of metadata storage table, id of storage medium table, operation type, 1: upsert 2: delete, key of attribute value, value of attribute value, encryption type, rank number;

metadata repository information table: primary key, id of metadata information table, bank type, 0: mysql,1: hudi,2: doris, directory name, library name, type of table partition, 0: non-partitioned, 1: field value, 2:range, 3:list list, table type of bucket, 0: not barrel separation, 1: HASH,2: a RANDOM;

a storage medium table: primary key, bank type, media type, 0: memory bank, 1: flink,2: spark;

metadata key table: primary key, id of metadata store table, key field, key type, 0: unique primary key, 1: repeating the primary key, 2: partition key, 3: barrel dividing keys and sorting numbers;

table template table: primary key, id of the storage medium table, table sentence template, template type: 1: build a table, 2: adding fields, 3: deleting the field; 4: modifying the field;

table template table: a primary key, an id of a storage medium table, a key of an attribute value, a value of the attribute value, an encryption type and a sequence number.

Finally, it should be noted that: the foregoing description is only illustrative of the preferred embodiments of the present invention, and although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that modifications may be made to the embodiments described, or equivalents may be substituted for elements thereof, and any modifications, equivalents, improvements or changes may be made without departing from the spirit and principles of the present invention.

Claims

1. An improved method for automatically creating and updating a data synchronous wide table structure, characterized in that the improved method comprises the following steps;

2. The improved method of automatically creating and updating a data synchronized wide table structure of claim 1, wherein the memory banks in the first step include flink, hudi, and doris.

3. The improvement in the automated data synchronization wide table structure according to claim 1, wherein after the redundancy configuration reduction operation in the third step, when the other tables are distinguished at three progressive levels of metadata, memory bank, and medium, the main key of one of the corresponding tables is used as the external key for the targeted configuration.

4. The improvement in automatically creating and updating a data synchronized wide table structure of claim 3, wherein said progressive level distinguishes itself and improves the associated table by splitting the medium classification from the bank classification to ensure that only table template attributes, table template fields and table custom attribute configurations and configuration of the bank medium table associations need to be configured for different mediums.

5. The improved method of automatically creating and updating a synchronized wide table of data structures of claim 4, wherein said reducing redundancy configuration further comprises reducing split bucket configuration redundancy by moving key fields, key types of a metadata store table to a new metadata key value table.

6. The improvement in automatically creating and updating a data synchronized wide table structure of claim 1, wherein the attribute customization configuration in the fifth step is to detach the table template attributes from the table template and add a table customization attribute table to separate the field and attribute configuration.

7. The method for automatically creating and updating a synchronous wide table structure according to claim 1, wherein the step six is to analyze the wide table corresponding to the task and its association relationship, and the metadata information of the wide table and its association relationship, and the information of the corresponding association table are searched by the task name, and the wide table needs to be synchronized to the memory bank, the medium, the corresponding template, the key value, the custom field and the attribute information.

8. The improved method for automatically creating and updating a synchronized wide table structure of data according to claim 7, wherein the obtaining of the information of the required fields of said wide table comprises the steps of;

a2, unifying the field formats of the fact table;

a3, performing increase and decrease processing on the specific field;

9. The improved method for automatically creating and updating a synchronous wide table structure of data according to claim 8, wherein the wide table structure is set after the field information required by the wide table is acquired, and the wide table is created to correspond to the structures of different memory banks.

10. The improved method for automatically creating and updating a data synchronized wide table structure of claim 9, wherein said wide table structure is set up and then the sentence is created and updated and executed, said executing comprising the steps of;

b1, setting table attribute, key value and sub-division barrel information;

and B4, executing the statement.