CN112000657A - Data management method, device, server and storage medium

Info

Publication number: CN112000657A
Application number: CN201910446394.6A
Authority: CN
Inventors: 贾烈; 刘荣明
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2019-05-27
Filing date: 2019-05-27
Publication date: 2020-11-27

Abstract

The invention provides a data management method, a device, a server and a storage medium, wherein the method comprises the following steps: determining the importance level of a data table of the data application according to the level of the data application; and managing the data processing process of the data table according to the importance level of the data table. The invention can reduce the workload of data operation and maintenance and improve the accuracy and timeliness of the data.

Description

Data management method, device, server and storage medium

Technical Field

The present invention relates to the field of big data technologies, and in particular, to a data management method, an apparatus, a server, and a storage medium.

Background

With the development of big data technology, the volume of data assets keeps growing and occupies a large amount of storage and computing resources.

Within these huge data assets, data either has no level or is assigned a level arbitrarily by a user, so the accuracy of the data level cannot be guaranteed. Meanwhile, the data processing process is also managed through manual configuration.

Managing data through manual configuration entails a heavy operation and maintenance workload and is error-prone, so the accuracy and timeliness of the data cannot be guaranteed.

Disclosure of Invention

The invention provides a data management method, a data management device, a server and a storage medium, which are used for reducing the workload of data operation and maintenance and improving the accuracy and timeliness of data.

In a first aspect, the present invention provides a data management method, where the data management method is applicable to a big data platform, and the method includes:

determining the importance level of a data table of the data application according to the level of the data application;

and managing the data processing process of the data table according to the importance level of the data table.

In a second aspect, the present invention provides a data management apparatus, which is suitable for a big data platform, and includes:

the determining module is used for determining the importance level of a data table of the data application according to the level of the data application;

and the management module is used for managing the data processing process of the data table according to the importance level of the data table.

In a third aspect, the present invention further provides a server, including: a memory and a processor; the memory is connected with the processor;

the memory is configured to store program instructions;

the processor is configured to implement the data management method according to the first aspect when the program instructions are executed.

In a fourth aspect, the present invention may also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the data management method of the first aspect described above.

The invention provides a data management method, a device, a server and a storage medium, which can determine the importance level of a data table of a data application according to the level of the data application and manage the data processing process of the data table according to the importance level of the data table. Because the importance level of the data table is determined from the level of the data application, the accuracy of the data level can be ensured; because the data processing process of the data table is managed according to that importance level, no manual configuration is needed, which reduces the workload of data operation and maintenance and ensures the accuracy and timeliness of the data.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, a brief description will be given below of the drawings required for the embodiments or the technical solutions in the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a first schematic diagram of a data system according to an embodiment of the present invention;

Fig. 2 is a second schematic diagram of a data system according to an embodiment of the present invention;

Fig. 3 is a first flowchart of a data management method according to an embodiment of the present invention;

Fig. 4 is a second flowchart of a data management method according to an embodiment of the present invention;

Fig. 5 is a schematic diagram illustrating determining importance levels of data tables in a data management method according to an embodiment of the present invention;

Fig. 6 is a third flowchart of a data management method according to an embodiment of the present invention;

Fig. 7 is a fourth flowchart of a data management method according to an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of a data management apparatus according to an embodiment of the present invention;

Fig. 9 is a schematic structural diagram of a server according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first", "second", "third" and the like in the various parts of the embodiments and drawings are used for distinguishing similar objects and not necessarily for describing a particular order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The method flow diagrams of the embodiments of the invention described below are merely exemplary and do not necessarily include all of the contents and steps, nor do they necessarily have to be performed in the order described. For example, some steps may be broken down and some steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.

The functional blocks in the block diagrams referred to in the embodiments of the present invention described below are only functional entities and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processors and/or microcontrollers.

A data system to which the embodiments of the present invention are applied will be described below. Fig. 1 is a first schematic diagram of a data system according to an embodiment of the present invention. As shown in fig. 1, a data system provided by an embodiment of the present invention may include: database system 11, big data platform 12, and data application management system 13.

The database system 11 may be a database management system, a relational database system, or a distributed column-oriented database such as HBase. The database system 11, which may also be referred to as a database service platform, supports the business systems; it may contain a database for each of at least one business system, and each such database stores the data of its business system. The database system 11 serves as both the access data source and the push data source of the big data platform. The database system 11 may be implemented on one or more servers.

After processing the data from the database system 11, the big data platform 12 may transmit the processed data to the data application management system 13 for use by the data applications in the data application management system 13. The big data platform 12 may process data from a business system in the database system 11 by using an Extract-Transform-Load (ETL) tool, also called a data through vehicle, so as to load the data into a data warehouse and push the data from the data warehouse to a data mart; or, after the data is processed by the ETL tool, the data is pushed directly from the data warehouse to the data mart. In the big data platform 12, data may be pushed to a data application in the data application management system 13 after being processed in the data warehouse and the data mart, or pushed to a database of the data application, so that the data application can use it. The big data platform 12 may be implemented on a server cluster consisting of a plurality of servers.

The data application management system 13 may manage a plurality of data applications, for example, a report platform, an operation monitoring platform, an operation analysis platform, a data compass platform, an external data report platform, and the like. The data application management system 13 may also be implemented on one or more servers. Each of the data applications managed by the data application management system 13 may have its own level, and the level of a data application may be an importance level that characterizes the importance of the data application. The importance of a data application may be determined, for example, based on the scope of the business it serves and/or the importance of that business, and may also be determined based on the importance of the data it generates. Data applications in different industries may be graded in different ways.

For example, in the e-commerce industry, the importance of a data application may be determined according to the importance of the data it generates. The importance of the data may be, for example, the scope of influence of the data: the larger the scope of influence, the more important the data; the smaller the scope of influence, the less important the data. The importance of the data application determined in this way may be at any of the following levels:

L1: indicates that the data generated by the data application relates to asset loss and/or clearance risks, etc.

L2: indicates that the data generated by the data application relates to business impact, business efficiency loss, etc. within the enterprise or for users, suppliers, etc.

L3: indicates that the data generated by the data application relates to data products, operation reports, and other internal inputs of the enterprise, and that its influence is small.

L4: indicates that the influence of the data generated by the data application is uncertain, for example because the usage scenario cannot yet be determined.
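The four levels above can be modelled as an ordered enumeration. The following is a minimal Python sketch and not part of the disclosed embodiment; the names ImportanceLevel and more_important are hypothetical, and the only facts taken from the description are the four levels and that L1 is the most important and L4 the least.

```python
from enum import IntEnum


class ImportanceLevel(IntEnum):
    """Importance levels of the e-commerce example; a smaller value is more important."""
    L1 = 1  # asset loss and/or clearance risk
    L2 = 2  # business impact or efficiency loss for the enterprise, users, suppliers
    L3 = 3  # internal data products and operation reports, small influence
    L4 = 4  # influence not yet determined


def more_important(a: ImportanceLevel, b: ImportanceLevel) -> ImportanceLevel:
    """Return whichever of the two levels is more important (L1 outranks L4)."""
    return min(a, b)
```

Using an integer-backed enumeration keeps comparisons between levels explicit, which is convenient when levels derived from several data applications must later be reconciled.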

Fig. 2 is a second schematic diagram of a data system according to an embodiment of the present invention. As shown in fig. 2, in the data system provided by the embodiment of the present invention, the database system 11 is an external system on which the big data platform 12 depends, and is an access data source and a push data source of the big data platform 12.

The requirement management platform is also an external system on which the big data platform 12 depends, and may be a management system for enterprise requirements and projects, and business requirements in the business system are managed and controlled by the requirement management platform.

The data application management system 13 is an external system on which the big data platform 12 depends, and is a management system of data applications, and is a software platform integrating compilation, testing, online release and deployment of data applications.

The data management platform in the big data platform 12 is a software data management tool. It is the platform for life-cycle management and information maintenance of the data tables in the enterprise's big data platform, and provides data table operations such as creation, change, deletion, recovery, information maintenance, and viewing of the various indexes of a data table.

The data development platform in the big data platform 12 is a software platform that can acquire information of a data table in the big data platform from the data management platform and perform script development, testing, release and the like of scheduling tasks based on the acquired information of the data table.

The data source management system in the big data platform 12 is used for managing the data source used by the big data platform 12 when extracting and/or pushing data, and plays a role of a bridge between the big data platform 12 and the data source.

The cluster management system in the big data platform 12 is a software platform that manages distributed clusters, marts, and execution queues, and provides services for querying, creating, modifying, and deleting cluster, mart, and queue information.

The scheduling center in the big data platform 12 is an execution control center for scheduling tasks such as ETL tasks, calculation tasks, and data push tasks.

The data quality monitoring platform of the big data platform 12 is a software platform for monitoring the data quality after executing the scheduling task, and can provide various data quality monitoring indexes and corresponding monitoring strategies, and also has a monitoring alarm function for reporting the data quality.

The distributed cluster may be a Hadoop cluster, which is the underlying technology cluster of the big data platform 12 and may provide storage and computing services for mass data, and the like.

The requirement management platform, the data development platform, the data source management system, the cluster management system, the scheduling center, the data quality monitoring platform and the distributed cluster may each be implemented on one or more servers.

The data management method provided by the embodiment of the invention is described in the following with reference to the data system shown in fig. 2. Fig. 3 is a first flowchart of a data management method according to an embodiment of the present invention. The data management method may be performed by the big data platform 12 shown in FIG. 1 or FIG. 2 described above. As shown in fig. 3, the data management method may include:

S301, determining the importance level of the data table of the data application according to the level of the data application.

In this method, the level of the data application may be determined as the importance level of the data table of the data application. Once the importance level of the data table is determined, the data table may also be identified in the big data platform to indicate its importance level. Identifying the data table may include: identifying the data table in the big data platform with the identifier corresponding to the importance level, identifying the table in the business system that generates the data table with the identifier corresponding to the importance level, and the like.

The data application may be any data application managed by the data application management system 13 described above. This S301 may be implemented by the data management platform in fig. 2 described above. The data application management system 13 serves as a management system of the data application, and may have a level of the data application. After the level of the data application is determined, the data application management system 13 may transmit the level of the data application to the data management platform, so that the data management platform determines the importance level of the data table of the data application in the big data platform 12 according to the level of the data application. Under the condition that the data management platform determines the importance level of the data table, the data management platform can also adopt the identifier corresponding to the importance level to identify the data table in the big data platform.

S302, managing the data processing process of the data table according to the importance level of the data table.

The data processing process of the data table may be the processing of the data of the data table. The data of the data table may be data obtained from a data source of the big data platform 12. For the big data platform 12, the database system 11 shown in fig. 2 above is the data source. Data obtained from the data source refers to data that originates from a business system and is stored in the database of that business system in the database system 11.

In this method, the data processing process of the data table can be managed according to the importance level of the data table. Specifically, during the data processing process, corresponding processing resources may be allocated to the data table according to its importance level, and the data of the data table is processed using those resources. Managing the data processing process of the data table may include at least one of: allocating the processing resources required for the data processing process of the data table, monitoring the structure of the data processing of the data table, changing the structure of the data table, and the like.

Taking the allocated resources in the data processing process as an example, the higher the importance level is, the more processing resources corresponding to the importance level are allocated, and the lower the importance level is, the less processing resources corresponding to the importance level are allocated. The processing resources may include, for example: storage and/or computing resources.

This S302 may be performed, for example, by the cluster management system in fig. 2 described above. The computing and/or storage resources of the distributed cluster are managed and coordinated by the cluster management system of fig. 2. The cluster management system allocates processing resources to the data table according to the importance level of the data table. The cluster management system can allocate processing resources to the data table under the condition of acquiring cluster and mart information from the distributed cluster.
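The allocation principle of S302 (a higher importance level receives more processing resources) can be illustrated with a proportional split. This is a minimal sketch under stated assumptions: the weights in LEVEL_WEIGHTS, the notion of a "slot", and the function name allocate_slots are all hypothetical, since the description does not specify how much more a higher level receives.

```python
from typing import Dict

# Hypothetical weights: the description only states that a higher
# importance level is allocated more processing resources.
LEVEL_WEIGHTS = {"L1": 8, "L2": 4, "L3": 2, "L4": 1}


def allocate_slots(table_levels: Dict[str, str], total_slots: int) -> Dict[str, int]:
    """Split total_slots across tables in proportion to the weight of each table's level.

    Integer division is used, so a few slots may remain unassigned.
    """
    total_weight = sum(LEVEL_WEIGHTS[level] for level in table_levels.values())
    return {
        table: total_slots * LEVEL_WEIGHTS[level] // total_weight
        for table, level in table_levels.items()
    }
```

A cluster management system could apply the same idea to storage quotas or compute queues; the proportional rule is only one possible policy.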

The data of the data table is processed using the processing resources allocated based on the importance level, and the processed data is pushed to the data application for its use, so that the processed data plays a role commensurate with its importance level.

According to the data management method provided by the embodiment of the application, the importance level of the data table of the data application can be determined according to the level of the data application, and the data processing process of the data table is managed according to the importance level of the data table. Because the importance level of the data table is determined from the level of the data application, the accuracy of the data level can be ensured; because the data processing process of the data table is managed according to that importance level, no manual configuration is needed, which reduces the workload of data operation and maintenance and ensures the accuracy and timeliness of the data. In addition, more processing resources can be used in the data processing process of data tables with higher importance levels, achieving priority processing. This safeguards the execution of the services of data applications with higher importance levels and their data quality, lets important data play as full a role as possible, and improves resource utilization.

On the basis of the data management method shown in fig. 3, an embodiment of the present invention may further provide a data management method. This embodiment may provide one possible implementation of determining the importance level of a data table. The data table of the data application includes: a usage data table for the data application, and, an upstream data table for the usage data table. The upstream data table is the data table used to generate the usage data table. Fig. 4 is a second flowchart of a data management method according to an embodiment of the present invention. As shown in fig. 4, determining the importance level of the data table of the data application according to the level of the data application in S301 may include:

S401, according to the level of the data application, determining the importance level of the usage data table.

S402, according to the importance level of the usage data table, reverse tracing is conducted, and the importance level of the upstream data table is determined.

In this method, for example, the level of the data application can be directly determined as the importance level of the usage data table. Reverse tracing is then performed based on the blood relationship (lineage) between data in the big data platform to determine the importance level of the upstream data tables of the usage data table of the data application.

By performing reverse tracing from the importance level of the usage data table, the method determines the importance level of its upstream data tables and thereby obtains the importance level of every data table of the data application. This makes the importance levels more accurate, effectively ensures the quality of higher-importance data after processing, lets important data play as full a role as possible, and improves resource utilization.
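Steps S401 and S402 amount to a graph traversal over table lineage. The sketch below assumes that lineage is available as a mapping from each table to the tables used to generate it; that representation, the breadth-first strategy, and the name trace_upstream are assumptions made for illustration and are not specified by the description.

```python
from collections import deque
from typing import Dict, List


def trace_upstream(usage_table: str, level: str,
                   upstream_of: Dict[str, List[str]]) -> Dict[str, str]:
    """Give `level` to the usage table (S401) and to every upstream table (S402).

    upstream_of maps a table name to the tables used to generate it.
    """
    levels = {usage_table: level}
    queue = deque([usage_table])
    while queue:
        table = queue.popleft()
        for parent in upstream_of.get(table, []):
            if parent not in levels:  # visit each table once, even if lineage paths rejoin
                levels[parent] = level
                queue.append(parent)
    return levels
```

For a hypothetical lineage {"summary": ["detail"], "detail": ["raw"]}, calling trace_upstream("summary", "L1", lineage) assigns L1 to all three tables.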

Optionally, determining the importance level of the data table of the data application according to the level of the data application in S301 may further include:

S403, if one data table is an upstream data table of the usage data tables of a plurality of data applications, and the importance levels of the one data table obtained according to the plurality of data applications are different, determining the highest importance level as the importance level of the one data table.

For example, fig. 5 is a schematic diagram illustrating determining the importance level of a data table in a data management method according to an embodiment of the present invention. As shown in fig. 5, the operation monitoring platform provides operation data for the enterprise and provides functions such as monitoring, enterprise operation analysis, and online supervision, so in the data application management system its level is L2. The data directly used by the operation monitoring platform, namely its usage data table, is an order detail table. According to the level L2 of the operation monitoring platform, the importance level of the order detail table is also L2. Reverse tracing from the order detail table at importance level L2 determines that its upstream data tables include the order width table, the order type dimension table, the order detail table, and the order pull list, all of which have importance level L2.

An external data report discloses important business information of the enterprise and serves a public relations role, so its level may be L1. The data directly used by the external data report, namely its usage data table, is an order amount summary table. According to the level L1 of the external data report, the importance level of the order amount summary table is also L1. Reverse tracing from the order amount summary table at importance level L1 determines that its upstream data tables include the order detail table and the order pull list, both of which have importance level L1.

The order detail table and the order pull list are thus upstream data tables of the usage data tables of both the operation monitoring platform and the external data report. Their importance level obtained based on the operation monitoring platform is L2, while their importance level obtained based on the external data report is L1. The highest importance level is selected, so the importance level of the order detail table and the order pull list is determined to be L1.
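The reconciliation in S403 can be sketched as a merge of the per-application results in which the more important level wins. The table-to-level mappings below restate the levels given in the example; the identifiers (merge_levels, RANK, and the snake_case table names) are hypothetical.

```python
from typing import Dict, Iterable

RANK = {"L1": 1, "L2": 2, "L3": 3, "L4": 4}  # smaller rank means more important


def merge_levels(per_application: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Combine table levels derived from several applications, keeping the highest level."""
    merged: Dict[str, str] = {}
    for levels in per_application:
        for table, level in levels.items():
            current = merged.get(table)
            if current is None or RANK[level] < RANK[current]:
                merged[table] = level
    return merged


# Levels obtained by reverse tracing from each data application in the example.
from_operation_monitoring = {
    "order_width": "L2", "order_type_dimension": "L2",
    "order_detail": "L2", "order_pull_list": "L2",
}
from_external_report = {
    "order_amount_summary": "L1", "order_detail": "L1", "order_pull_list": "L1",
}
final_levels = merge_levels([from_operation_monitoring, from_external_report])
```

In final_levels, order_detail and order_pull_list end up at L1, while order_width and order_type_dimension stay at L2, matching the outcome described for fig. 5.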

In this method, when one data table is an upstream data table of the usage data tables of a plurality of data applications and the importance levels obtained from those data applications differ, the highest importance level is determined as the importance level of the data table. This effectively ensures that the importance of the data table is not understated, and that importance level can then be used to manage the data processing process of the data table, for example by executing the quality monitoring rules corresponding to the importance level, so as to ensure the data quality of the data table after data processing.

On the basis of the data management method shown in fig. 3 or fig. 4, an embodiment of the present invention may further provide a data management method. This embodiment may provide one possible implementation of managing the data processing process. Fig. 6 is a third flowchart of a data management method according to an embodiment of the present invention. As shown in fig. 6, the managing the data processing process of the data table according to the importance level of the data table in S302 may include:

S601, determining a quality monitoring rule corresponding to the importance level according to the importance level of the data table.

S602, according to the quality monitoring rule, checking the data quality to obtain the data quality information of the data table in the scheduling execution process or after the scheduling execution of the data task of the data table.

The scheduling execution of the data task is used for processing the data of the data table.

Before the scheduling execution of the data task, a corresponding processing resource may be allocated to the data table according to the importance level, and the data task corresponding to the data table is scheduled according to the processing resource to perform data processing on the data table.

The processing resources may include resources required for scheduling the data task, such as accounts and queues. In this method, scheduling the data task corresponding to the data table according to the processing resources allows the data task to be carried out safely and effectively, ensuring the data quality of the processed data.

The data task may also be referred to as a scheduling task, which may be a task encapsulating data collection, processing, transmission back and output, and the like. Each time the data task is executed, a task instance is formed. The allocation of the processing resources may be implemented by the scheduling center shown in fig. 2, and the scheduling center may schedule the data task corresponding to the data table according to the processing resources allocated by the cluster management system, such as the account number, the queue, and other resources required for scheduling the data task, so that the data task is implemented in the distributed cluster, and the data processing is performed on the data table.

In this method, processing resources can be preferentially allocated to data tables with higher importance levels, so that the Service-Level Agreement (SLA) time of the data task is ensured.
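Preferential treatment of higher-level tables can be illustrated with a priority queue of data tasks. The sketch below is one possible realization and not the scheduling center's actual design: the class TaskQueue, its methods, and the first-come-first-served tie-break within a level are assumptions.

```python
import heapq
from itertools import count
from typing import List, Tuple

RANK = {"L1": 1, "L2": 2, "L3": 3, "L4": 4}  # smaller rank is served first


class TaskQueue:
    """Orders data tasks by the importance level of their table, then by submission order."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._order = count()  # tie-breaker so equal levels are served first-come-first-served

    def submit(self, task_name: str, table_level: str) -> None:
        heapq.heappush(self._heap, (RANK[table_level], next(self._order), task_name))

    def next_task(self) -> str:
        """Remove and return the name of the most important pending task."""
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)
```

Submitting an L3 task, then an L1 task, then another L3 task yields the L1 task first, followed by the two L3 tasks in their submission order.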

Optionally, the data task as shown above may include at least one of: ETL tasks, data computation tasks, and data push tasks.

In this method, the quality monitoring rule corresponding to the importance level of the data table can be determined according to that importance level: the higher the importance level, the more data quality information items the corresponding quality monitoring rule includes; the lower the importance level, the fewer items it includes. S601 may be implemented by the distributed cluster in fig. 2 described above.

Optionally, the data quality information includes at least one of the following information: data integrity, data accuracy, data consistency, timeliness, and the like.

For example, if the importance level of the data table is L1 or L2, the quality monitoring rule corresponding to the importance level must include at least the four data quality information checking rules of data integrity, data accuracy, data consistency, and timeliness, with timeliness having the highest priority among them. If the importance level of the data table is L3, the quality monitoring rule must include at least the checking rules of data integrity, data accuracy, data consistency, timeliness and the like. If the importance level of the data table is L4, the quality monitoring rule may include at least one of the checking rules of data integrity, data accuracy, data consistency, timeliness and the like, or may include no checking rule at all; nothing is mandatory at this importance level. The importance level of L1 is higher than that of L2, the importance level of L2 is higher than that of L3, and the importance level of L3 is higher than that of L4.

The quality monitoring rules corresponding to the importance levels are only possible examples, and may be in other forms including other information, and the embodiments of the present invention are not limited thereto.
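The example rule assignment can be written as a small lookup. This is a sketch of the example only: the check names, the function names required_checks and run_checks, and the representation of each check as a zero-argument callable are hypothetical. For L4 the sketch returns an empty list because no check is mandatory at that level.

```python
from typing import Callable, Dict, List

ALL_CHECKS = ["integrity", "accuracy", "consistency", "timeliness"]


def required_checks(level: str) -> List[str]:
    """Return the mandatory data quality checks for a level, in execution order."""
    if level in ("L1", "L2"):
        # Timeliness has the highest priority at L1 and L2, so it is listed first.
        return ["timeliness", "integrity", "accuracy", "consistency"]
    if level == "L3":
        return list(ALL_CHECKS)
    if level == "L4":
        return []  # checks are optional at L4; none is enforced
    raise ValueError(f"unknown importance level: {level}")


def run_checks(level: str, checkers: Dict[str, Callable[[], bool]]) -> Dict[str, bool]:
    """Run every mandatory check for the level and collect pass/fail results."""
    return {name: checkers[name]() for name in required_checks(level)}
```

A data quality monitoring platform would supply real checkers (row counts, reconciliation queries, arrival-time comparisons); the callables here stand in for those.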

Once the quality monitoring rule is determined, the distributed cluster in fig. 2 may control the data quality monitoring platform to perform the data quality check according to the quality monitoring rule, so that the data quality monitoring platform obtains, from the scheduling center, the data quality information of the data table during or after the scheduled execution of the data task.

According to this method, a quality monitoring rule corresponding to the importance level of the data table can be determined according to that importance level, and a data quality check is carried out according to the quality monitoring rule to obtain the data quality information of the data table during or after the scheduled execution of the data task of the data table. Stricter data quality assurance measures are thus applied to data tables with higher importance levels, which ensures the accuracy of important data in use.

Optionally, if a scheduling fault occurs during the scheduled execution of the data task, the corresponding fault response time may differ by importance level: the higher the importance level, the shorter the corresponding fault response time; the lower the importance level, the longer the corresponding fault response time. Data tasks such as the acquisition and processing of data tables of different importance levels thus have different requirements for timeliness, correctness, and fault response time, so that the scheduled execution of the data tasks corresponding to the data tables is guaranteed on the basis of a Service-Level Agreement (SLA).

Optionally, in the method as described above, if there is a quality problem in the data of the data table during the scheduling execution of the data task or after the scheduling execution of the data task or the scheduling execution time of the data task is delayed, the method may further include:

S603, determining an alarm mode corresponding to the importance level according to the importance level of the data table.

S604, sending out corresponding alarm by adopting the alarm mode to indicate that the data of the data table has quality problem or the scheduling execution time of the data task is delayed in the scheduling execution process of the data task or after the scheduling execution.

In this embodiment, if at least one piece of data quality information does not satisfy the preset condition in the scheduling execution result obtained based on the quality monitoring rule, it may be determined that the data quality problem exists in the data table after the data task is scheduled and executed. On the contrary, if each piece of data quality information in the scheduling execution result obtained by the quality monitoring rule meets the preset condition, it can be determined that the data quality problem does not exist in the data table after the data task is scheduled and executed. If the scheduling execution time of the data task is greater than or equal to the preset time, the scheduling execution time of the data task can be determined to be delayed.
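The two determinations described above can be sketched as follows. The function names and the representation of a check result as a boolean are assumptions made for illustration; the embodiment does not prescribe a data format.

```python
# Minimal sketch with hypothetical names. `results` maps each checked data
# quality item to True if it met its preset condition and False otherwise.
def has_quality_problem(results: dict) -> bool:
    """A quality problem exists if at least one checked item failed."""
    return not all(results.values())


def is_delayed(execution_time: float, preset_time: float) -> bool:
    """The task is delayed if its execution time reaches the preset time."""
    return execution_time >= preset_time
```

With an empty `results` mapping, for example for a table on which no check is mandatory, `has_quality_problem` reports no problem, which matches the rule that every checked item must satisfy its preset condition.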

In this embodiment, the alarm modes corresponding to different importance levels may have corresponding response times.

For example, if the importance level of the data table is L1 or L2, the alarm mode corresponding to the importance level of the data table includes at least a telephone alarm mode; if the importance level of the data table is L3, the alarm mode corresponding to the importance level of the data table may include a short message alarm mode; if the importance level of the data table is L4, the alarm mode corresponding to the importance level of the data table may include an e-mail alarm mode.

The above-mentioned alarm modes corresponding to the importance levels are only possible examples, and may also be in other forms, including other information, and the embodiment of the present invention is not limited thereto.
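A sketch of S603 and S604 under the example mapping above might look like the following. The names `ALARM_MODE` and `raise_alarm`, the message format, and the injectable `send` callable are hypothetical; a real deployment would call its own telephone, short message, or e-mail gateway.

```python
# Minimal sketch with hypothetical names; the mapping follows the example.
ALARM_MODE = {"L1": "phone", "L2": "phone", "L3": "sms", "L4": "email"}


def raise_alarm(table: str, level: str, reason: str, send=print) -> str:
    """S603: pick the alarm mode for the level; S604: send the alarm."""
    mode = ALARM_MODE[level]
    send(f"[{mode}] table {table} ({level}): {reason}")
    return mode
```

For instance, `raise_alarm("orders", "L3", "scheduled execution delayed")` would emit one message through the short message mode.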

Both S603 and S604 may be implemented by the distributed cluster in fig. 2.

In this embodiment, the alarm mode corresponding to the importance level of the data table is determined according to that importance level, so that each importance level has a corresponding alarm mode. A corresponding alarm is then sent in that alarm mode to indicate that, during or after the scheduled execution of the data task, the data of the data table has a quality problem or the scheduled execution time of the data task is delayed. This ensures the timeliness, correctness, and fault response time of important data and keeps applications of important data running normally.

On the basis of the data management method shown in any one of fig. 3 to fig. 6, an embodiment of the present invention may further provide a data management method. The embodiment can provide a possible implementation mode for carrying out change processing on the structure of the data table in the big data platform under the condition that the table structure of the business system is changed. Fig. 7 is a fourth flowchart of a data management method according to an embodiment of the present invention. As shown in fig. 7, on the basis of the data management method, the method may further include:

S701, if the table structure of the business system is changed, determining whether to change the structure of the data table according to the importance level of the data table and the change type of the table structure of the business system.

S702, if so, executing the change operation corresponding to the table structure of the business system on the data table.

In this embodiment, whether to change the table structure of the data table may be determined by using a change requirement table corresponding to a preset importance level and a preset change type according to the importance level of the data table and the change type of the table structure. The change request table corresponding to the importance level and the change type may be, for example, as shown in table 1 below.

TABLE 1

If the importance level of the data table is L1 or L2 and the change type of the table structure is an added field, it can be determined that the change requirement is "no", i.e., the table structure of the data table does not need to be changed; if the importance level of the data table is L1 or L2 and the change type of the table structure is a modified field, a deleted field, a modified field type, an added or modified comment, or a deleted table, it can be determined that the change requirement is "yes", i.e., the table structure of the data table needs to be changed.

If the importance level of the data table is L3 and the change type of the table structure is an added field or an added or modified comment, it can be determined that the change requirement is "no", i.e., the table structure of the data table does not need to be changed; if the importance level of the data table is L3 and the change type of the table structure is a modified field, a deleted field, a modified field type, or a deleted table, it can be determined that the change requirement is "yes", i.e., the table structure of the data table needs to be changed.

If the importance level of the data table is L4 and the change type of the table structure is an added field, a deleted field, a modified field type, or an added or modified comment, it can be determined that the change requirement is "no", i.e., the table structure of the data table does not need to be changed; if the importance level of the data table is L4 and the change type of the table structure is a modified field or a deleted table, it can be determined that the change requirement is "yes", i.e., the table structure of the data table needs to be changed.
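The decision in S701 can be illustrated as a lookup keyed by importance level and change type. The sketch below encodes only the combinations listed in the preceding paragraphs; the identifiers `CHANGE_TYPES`, `REQUIRES_CHANGE`, and `needs_change`, as well as the string names of the change types, are hypothetical.

```python
# Minimal sketch with hypothetical names, encoding the combinations described
# in the text for which the change requirement is "yes".
CHANGE_TYPES = {
    "add_field", "modify_field", "delete_field",
    "modify_field_type", "add_or_modify_comment", "delete_table",
}

_L1_L2 = {"modify_field", "delete_field", "modify_field_type",
          "add_or_modify_comment", "delete_table"}

REQUIRES_CHANGE = {
    "L1": _L1_L2,
    "L2": _L1_L2,
    "L3": {"modify_field", "delete_field", "modify_field_type", "delete_table"},
    "L4": {"modify_field", "delete_table"},
}


def needs_change(level: str, change_type: str) -> bool:
    """S701: decide whether the data table's structure must be changed."""
    if change_type not in CHANGE_TYPES:
        raise ValueError(f"unknown change type: {change_type}")
    return change_type in REQUIRES_CHANGE[level]
```

Storing only the "yes" combinations keeps the table compact: every change type that is not listed for a level is treated as not requiring a change.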

In this method, the database system 11 in fig. 2 determines whether to push a change request according to the importance level of the data table and the change type of the table structure. If so, it transmits the change request to the request management platform and transmits the information on the changed table structure to the data management platform, so that the data management platform makes the corresponding change to the data table according to the information on the changed table structure.

In this embodiment, whether to change the table structure of the data table may be determined based on the importance level of the data table and the change type of the table structure, and then the table structure of the data table is changed, so that accurate control of the change of the table structure on the big data platform after the table structure of the business system is changed is achieved.

The business system may be an online business system. If the table structure of the business system is changed, the table structure of the data in the database of the business system in the database system 11 is also changed. To ensure that the big data platform accurately processes the data table derived from the database of the business system, the change operation corresponding to the table structure of the business system needs to be performed on the data table. In S702, the data management platform in fig. 2 may change the table structure by calling, on the distributed cluster, the change resources corresponding to the importance level to change the table, and the change result is fed back to the data management platform in real time.

S703, adjusting the script of the data task corresponding to the data table according to the changed table.

In the method, after the change, the script of the data task can be determined according to the information of the data task corresponding to the data table, and the script of the data task is adjusted. The script of the data task may include: a computation script and/or an analysis script for the data task.

After the data management platform in fig. 2 performs the table change processing, it may obtain, from the distributed cluster, the metadata information, the storage information, and the data calculation execution log corresponding to the changed table, and may send a change instruction to the data development platform according to the information obtained from the distributed cluster, so that the data development platform adjusts the script of the data task corresponding to the data table, such as a calculation script and/or an analysis script of the data task.

After the script of the data task is changed, a notification can be sent to the responsible node of the downstream task of the data task to notify the downstream responsible node to make corresponding modification, so that each data task corresponding to the scheduling data table is ensured to carry out stable and smooth data processing.

S704, determining the test strategy corresponding to the importance level according to the importance level of the data table.

S705, testing the adjusted script under the test strategy corresponding to the importance level.

Because the scripts of the data tasks corresponding to the data table are adjusted, the adjusted scripts are required to be tested in order to ensure the normal operation of the adjusted scripts. Because certain test resources are needed for script testing, if the same test strategy is adopted for all the adjusted scripts, the test resources may be wasted, or the test accuracy is not high, and the test effect is difficult to ensure. Therefore, in this embodiment, the test policy corresponding to the importance level may be determined according to the importance level of the data table, and then the adjusted script may be tested according to the test policy corresponding to the importance level. The test strategies corresponding to different importance levels include: information of different test environments.

Optionally, the test policy is also called a test scheme, and may include information of at least one of the following test environments: development environment, test environment, grayscale environment, and online environment.

The development environment refers to the software environment used during development, debugging, and unit testing of the adjusted script. The test environment is the software environment in which the adjusted script is tested, for example against a preset test data set, after its development is complete. The testing range of the grayscale environment is larger than that of the test environment but smaller than that of the online environment; the grayscale environment may also be called a small-range online environment, and testing in the grayscale environment may be called internal testing. The online environment, also called the formal environment, is the software environment in which the adjusted script is tested under online running conditions after its development is complete.

For example, if the importance level of the data table is L1, the test policy corresponding to the importance level may include: testing in test environments such as the development environment, the test environment, the grayscale environment, and the online environment with formal data. The adjusted script is then tested according to the test policy corresponding to the importance level.

If the importance level of the data table is L2, the test policy corresponding to the importance level may include: testing in test environments such as the development environment, the test environment, the grayscale environment, and the online environment with dry-run (Dryrun) data. The adjusted script is then tested according to the test policy corresponding to the importance level.

If the importance level of the data table is L3, the test policy corresponding to the importance level may include: testing in test environments such as the development environment, the test environment, and the grayscale environment. The adjusted script is then tested according to the test policy corresponding to the importance level.

If the importance level of the data table is L4, the test policy corresponding to the importance level may include: testing in test environments such as the development environment and the test environment. The adjusted script is then tested according to the test policy corresponding to the importance level. Here, the importance level of L1 is higher than that of L2, the importance level of L2 is higher than that of L3, and the importance level of L3 is higher than that of L4.

The test policies corresponding to the importance levels are only possible examples, and may be in other forms, including other information, and the embodiments of the present invention are not limited thereto.
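By way of illustration, S704 and S705 could be sketched as selecting an ordered list of environments by importance level and running the adjusted script in each. The names `TEST_STRATEGY` and `run_tests`, the environment identifiers, the `run_in_env` callback, and the choice to stop at the first failing environment are assumptions of this sketch and are not stated in the embodiment.

```python
# Minimal sketch with hypothetical names; the environments per level follow
# the example test policies in the text.
TEST_STRATEGY = {
    "L1": ("development", "test", "grayscale", "online-formal-data"),
    "L2": ("development", "test", "grayscale", "online-dry-run"),
    "L3": ("development", "test", "grayscale"),
    "L4": ("development", "test"),
}


def run_tests(level: str, script: str, run_in_env):
    """Run `script` in every environment required for `level`.

    `run_in_env(script, env)` is a caller-supplied function returning True on
    success. Stopping at the first failure is an assumption of this sketch.
    Returns (passed, failed_env).
    """
    for env in TEST_STRATEGY[level]:
        if not run_in_env(script, env):
            return False, env
    return True, None
```

Because lower importance levels map to fewer environments, scripts for less important tables consume fewer test resources, which is the effect the embodiment describes.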

In the method of this embodiment, the test strategy corresponding to the importance level may be determined according to the importance level of the data table, and then the adjusted script may be tested according to the test strategy corresponding to the importance level, thereby avoiding waste of test resources, ensuring test accuracy, and ensuring test effect.

Optionally, the method may further include:

S706, determining an online approval process corresponding to the importance level according to the importance level of the data table.

The online approval process comprises at least one online approval node in the big data platform. The at least one online approval node comprises: the upper-level approval node of the script responsible node of the data task, the lower-level approval node of the script responsible node of the data task, the approval node of the party that proposed the change requirement, the approval node of the party responsible for the business corresponding to the data task, and the like.

For example, if the importance level of the data table is L1 or L2, the online approval process corresponding to the importance level may include the following four online approval nodes: the upper-level approval node of the script responsible node of the data task, the lower-level approval node of the script responsible node of the data task, the approval node of the party that proposed the change requirement, and the approval node of the party responsible for the business corresponding to the data task.

If the importance level of the data table is L3, the online approval process corresponding to the importance level may include the following three online approval nodes: the upper-level approval node of the script responsible node of the data task, the lower-level approval node of the script responsible node of the data task, and the approval node of the party that proposed the change requirement.

If the importance level of the data table is L4, the online approval process corresponding to the importance level may include: the upper-level approval node of the script responsible node of the data task. Among them, the importance level of L1 is higher than that of L2, the importance level of L2 is higher than that of L3, and the importance level of L3 is higher than that of L4.

The above online approval process corresponding to each importance level is only a possible example, and it may also be in other forms, including other information, and the embodiment of the present invention is not limited thereto.

S707, performing approval processing on the adjusted script according to the online approval process.

The embodiment of the invention can also perform approval processing on the adjusted script according to the online approval process corresponding to the importance level of the data table, so as to ensure that the adjusted script goes online in a timely manner and runs stably once online.

On the basis of any one of the above methods, optionally, the method may further include:

If the database of the business system is expanded or migrated, the data source of the big data platform is expanded or changed accordingly. Specifically, the database system shown in fig. 2 may transmit the change information of the expanded or migrated database to the data source management system shown in fig. 2, and the data source management system may expand or change the data source of the big data platform according to the change information. The data source management system can also transmit the change information to the scheduling center, and the scheduling center adds or modifies the data source referenced by the data task according to the change information.

When the data source referenced by the data task is changed, for example expanded or modified, a test strategy corresponding to the importance level can be determined according to the importance level of the data table related to the data task, and the data task whose referenced data source has been changed is then tested according to that test strategy. Determining the test strategy by importance level in this way avoids wasting test resources while ensuring test accuracy and test effect.

After the data task whose referenced data source has been changed is tested, an online approval process corresponding to the importance level can be determined according to the importance level of the data table related to the data task, and approval processing is performed on that data task according to the online approval process. In this way, the data task whose referenced data source has been changed goes online in a timely manner and runs stably afterwards.

The following is an embodiment of the apparatus of the present invention, which can be used to implement the above-mentioned embodiment of the method of the present invention, and the implementation principle and technical effects are similar.

Fig. 8 is a schematic structural diagram of a data management apparatus according to an embodiment of the present invention. As shown in fig. 8, the data management apparatus 80 of the present embodiment is applicable to a big data platform, and the data management apparatus 80 may include:

A determining module 81, configured to determine the importance level of the data table of the data application according to the level of the data application.

A management module 82, configured to manage the data processing process of the data table according to the importance level of the data table.

Optionally, the data table of the data application includes: a usage data table of the data application and an upstream data table of the usage data table; the upstream data table is a data table used to generate the usage data table.

The determining module 81 is specifically configured to determine the importance level of the usage data table of the data application according to the level of the data application, and to perform reverse tracing according to the importance level of the usage data table to determine the importance level of the upstream data table.

Optionally, the determining module 81 is further specifically configured to, if one data table is an upstream data table of the usage data tables of multiple data applications and the importance levels obtained for that data table according to the multiple data applications are different, determine the highest of those importance levels as the importance level of that data table.
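The reverse tracing performed by the determining module 81 can be illustrated as a traversal of the table lineage from each usage data table toward its upstream tables. In the sketch below, levels are integers where 1 stands for L1 (the highest importance); the function and parameter names are hypothetical, and two simplifications are assumed: a usage data table takes the level of its data application, and an upstream table inherits the level of the table it feeds.

```python
from collections import deque


# Minimal sketch with hypothetical names. Levels are integers: 1 = L1
# (highest importance) ... 4 = L4 (lowest importance).
def propagate_levels(app_levels: dict, app_tables: dict, upstream: dict) -> dict:
    """Assign an importance level to every table reachable by reverse tracing.

    app_levels: data application -> level of the application
    app_tables: data application -> usage data tables of the application
    upstream:   table -> tables used to generate it
    """
    levels = {}
    queue = deque(
        (table, level)
        for app, level in app_levels.items()
        for table in app_tables.get(app, ())
    )
    while queue:
        table, level = queue.popleft()
        current = levels.get(table)
        if current is not None and current <= level:
            continue  # the table already has an equal or higher importance
        levels[table] = level  # keep the highest importance seen so far
        for parent in upstream.get(table, ()):
            queue.append((parent, level))
    return levels
```

A table that feeds both an L1 application and an L3 application therefore ends up at L1, which corresponds to taking the highest importance level when several data applications disagree. Because a table is only revisited when its level improves, the traversal also terminates on cyclic lineage.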

Optionally, the management module 82 is specifically configured to determine, according to the importance level of the data table, a quality monitoring rule corresponding to the importance level of the data table; and according to the quality monitoring rule, checking the data quality to acquire the data quality information of the data table in the scheduling execution process or after the scheduling execution of the data task. The scheduling execution of the data task is used for processing the data of the data table.

Optionally, the data task includes at least one of the following: ETL tasks, data computation tasks, and data push tasks.

Optionally, the data quality information includes at least one of the following information: data integrity, data accuracy, data consistency.

Optionally, the management module 82 is further configured to, if the data of the data table has a quality problem during or after the scheduled execution of the data task, or the scheduled execution time of the data task is delayed, determine an alarm mode corresponding to the importance level of the data table according to that importance level, and send a corresponding alarm in that alarm mode to indicate that the data of the data table has a quality problem during or after the scheduled execution of the data task or that the scheduled execution time of the data task is delayed.

Optionally, the data management apparatus 80 further includes:

the change module is used for determining whether to change the structure of the data table according to the importance level of the data table and the change type of the table structure of the business system if the table structure of the business system is changed; if so, executing the change operation corresponding to the table structure of the service system on the data table; adjusting the script of the data task corresponding to the data table according to the changed table;

the test module is used for determining a test strategy corresponding to the importance level according to the importance level of the data table; and testing the adjusted script under the test strategy corresponding to the importance level. The test strategies corresponding to different importance levels include: testing under different testing environments.

Optionally, the test strategy includes a test in at least one of the following test environments: a development environment, a test environment, a grayscale environment, or an online environment.

Optionally, the management module 82 is further configured to determine, according to the importance level of the data table, an online approval process corresponding to the importance level; and according to the online examination and approval process, carrying out examination and approval processing on the adjusted script.

Fig. 9 is a schematic structural diagram of a server according to an embodiment of the present invention. As shown in fig. 9, the server 90 of the present embodiment includes: a memory 91 and a processor 92. The memory 91 is connected to the processor 92 via a bus. The server 90 may be a server of the above-mentioned big data platform, may be a single server, or may be implemented by a plurality of servers.

A memory 91 for storing program instructions.

A processor 92 for executing the data management method of any of the methods of fig. 3-7 described above when the program instructions are executed.

An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executable by the processor 92 described in the foregoing fig. 9 to implement a data management method in any one of the methods described in the foregoing fig. 3 to fig. 7.

The data management apparatus, the server, and the computer-readable storage medium according to the embodiments of the present invention may execute the data management method in any one of the methods shown in fig. 3 to 7; for the specific implementation and beneficial effects, reference may be made to the above description, which is not repeated here.

Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The computer program may be stored in a computer readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media capable of storing program codes, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, and an optical disk.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

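1. A data management method, characterized in that the data management method is suitable for a big data platform, and the method comprises the following steps:

determining the importance level of a data table of a data application according to the level of the data application;

and managing the data processing process of the data table according to the importance level of the data table.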

2. The method of claim 1, wherein the data table of the data application comprises: a usage data table of the data application and an upstream data table of the usage data table; the upstream data table is a data table used for generating the usage data table; the determining the importance level of the data table of the data application according to the level of the data application comprises:

determining an importance level of the usage data table according to a level of the data application;

and performing reverse tracing according to the importance level of the use data table, and determining the importance level of the upstream data table.

3. The method of claim 1, wherein determining the importance level of the data table of the data application according to the level of the data application further comprises:

if one data table is an upstream data table of the usage data tables of a plurality of data applications, and the importance levels of the data table obtained according to the data applications are different, determining the highest importance level as the importance level of the data table.

4. The method according to any one of claims 1-3, wherein managing the data processing procedure of the data table according to the importance level of the data table comprises:

determining a quality monitoring rule corresponding to the importance level according to the importance level of the data table;

and according to the quality monitoring rule, checking the data quality to acquire the data quality information of the data table in the scheduling execution process or after the scheduling execution of the data tasks of the data table, wherein the scheduling execution of the data tasks is used for carrying out data processing on the data table.

5. The method of claim 4, wherein the data task comprises at least one of: an extract-transform-load (ETL) task, a data calculation task, and a data push task.

6. The method of claim 4, wherein the data quality information comprises at least one of: data integrity, data accuracy, data consistency.

7. The method of claim 4, wherein if there is a quality problem in the data of the data table during the scheduled execution of the data task or after the scheduled execution of the data task, or if the scheduled execution time of the data task is delayed, the method further comprises:

determining an alarm mode corresponding to the importance level according to the importance level of the data table;

and sending corresponding alarms by adopting the alarm mode to indicate that the data of the data table has quality problems or the scheduling execution time of the data task is delayed in the scheduling execution process of the data task or after the scheduling execution.

8. The method of claim 4, further comprising:

if the table structure of the business system is changed, determining whether to change the structure of the data table according to the importance level of the data table and the change type of the table structure of the business system;

if so, executing change operation corresponding to the table structure of the service system on the data table;

adjusting the script of the data task corresponding to the data table according to the changed table;

determining a test strategy corresponding to the importance level according to the importance level of the data table;

testing the adjusted script under the test strategies corresponding to the importance levels, wherein the test strategies corresponding to different importance levels comprise: testing under different testing environments.

9. The method of claim 8, wherein the test strategy comprises testing in at least one of the following test environments: a development environment, a test environment, a grayscale environment, or an online environment.

10. The method of claim 8, further comprising:

determining an online approval process corresponding to the importance level according to the importance level of the data table;

and according to the online examination and approval process, carrying out examination and approval processing on the adjusted script.

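11. A data management device, wherein the data management device is adapted for a big data platform, the data management device comprising:

a determining module, configured to determine the importance level of a data table of a data application according to the level of the data application;

and a management module, configured to manage the data processing process of the data table according to the importance level of the data table.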

12. A server, comprising: a memory and a processor; the memory is connected with the processor;

the memory to store program instructions;

the processor, when the program instructions are executed, implementing the data management method of any of claims 1-10.

13. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the data management method of any one of claims 1 to 10.