CN110457371A - Data management method, device, storage medium and system - Google Patents


Info

Publication number
CN110457371A
Authority
CN
China
Prior art keywords
data
task
rule
verification
result table
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910744219.5A
Other languages
Chinese (zh)
Inventor
孙伟伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Zan Technology Co Ltd
Original Assignee
Hangzhou Zan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Zan Technology Co Ltd
Priority to CN201910744219.5A
Publication of CN110457371A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G06F16/248 Presentation of query results
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a data management method executed by a metadata system. The method comprises: in response to an offline data task of a data warehouse outputting a result table, executing a data quality check task on the result table to obtain a check result; determining whether the check result is a check failure; in response to the check result being a check failure, determining the task node whose check failed and determining whether that task node is configured with a first alarm mode, the first alarm mode being used to indicate whether the task node is a key task node; and in response to determining that the failed task node is configured with the first alarm mode, sending an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task. The data management method provided by the invention can improve data management efficiency, prevent an avalanche effect in downstream tasks, and avoid large-scale data problems. The invention also provides a data management device, a storage medium and a system.

Description

Data management method, device, storage medium and system
Technical field
The present invention relates to the field of data warehouse technology, and in particular to a data management method, device, storage medium and system.
Background
A data warehouse is an indispensable part of any company that holds large amounts of data; it exists to mine data resources further and to support decision-making. Building one necessarily involves a series of operations such as data acquisition, cleaning and integration, followed by data mining, so data-oriented work cannot do without data management.
In the course of practicing the invention, the inventor found that the prior art has the following defects. When performing data management, data warehouse developers often spend a great deal of effort monitoring separate systems that check data quality, or developing additional data quality check tasks. Moreover, this style of data management is essentially after-the-fact: when data quality goes wrong, the problem cannot be perceived immediately, which easily causes an avalanche effect and large-scale data problems.
Summary of the invention
On this basis, embodiments of the present invention provide a data management method, device, storage medium and system that can improve data management efficiency, prevent an avalanche effect in downstream tasks, and avoid large-scale data problems.
The data management method provided by an embodiment of the present invention is applied to a metadata system and comprises:
in response to an offline data task of a data warehouse outputting a result table, executing a data quality check task on the result table to obtain a check result;
determining whether the check result is a check failure;
in response to the check result being a check failure, determining the task node whose check failed, and determining whether that task node is configured with a first alarm mode, the first alarm mode being used to indicate whether the task node is a key task node;
in response to determining that the failed task node is configured with the first alarm mode, sending an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task.
In an optional embodiment, executing the data quality check task on the result table in response to the offline data task of the data warehouse outputting the result table, to obtain the check result, comprises:
in response to the offline data task of the data warehouse outputting the result table, determining at least one preconfigured data quality check rule corresponding to the result table, based on metadata lineage obtained in advance from the offline data task; and
performing a data quality check on the result table according to the at least one data quality check rule to obtain the check result;
wherein the metadata lineage includes table-to-task lineage and table-to-table lineage.
In an optional embodiment, the at least one data quality check rule includes general rules, custom table-level rules and custom field rules.
In this case, determining the at least one preconfigured data quality check rule corresponding to the result table, in response to the offline data task of the data warehouse outputting the result table and based on the metadata lineage obtained in advance from the offline data task, comprises:
in response to the offline data task of the data warehouse outputting the result table, querying, based on the metadata lineage obtained in advance from the offline data task, the preconfigured custom table-level rules and custom field rules corresponding to the result table, the custom table-level rules including non-SQL custom rules and SQL custom rules;
putting the general rules into a first queue;
putting the non-SQL custom rules into a second queue;
putting the SQL custom rules into a third queue;
putting the custom field rules into a fourth queue;
wherein the first queue, the second queue and the third queue can be executed concurrently.
In an optional embodiment, performing the data quality check on the result table according to the at least one data quality check rule to obtain the check result comprises:
taking rules, in turn, from each of the four queues corresponding to the non-SQL custom rules, the SQL custom rules, the general rules and the custom field rules, and checking the result table against them to obtain the check result.
In an optional embodiment, the method further comprises:
in response to the check result being a check failure, determining alarm recipients according to preconfigured alarm channels and the asset level of the table, and sending an alarm to those recipients.
In an optional embodiment, the method further comprises:
collecting information about the result table while executing the data quality check task on it;
tracing from the offline data task to the result table through the metadata lineage, to determine the cluster resources consumed by the offline data task that produces the result table;
displaying the information about the result table, the cluster resources and the check result with a visualization tool.
In an optional embodiment, the method further comprises:
checking data standard rules with a scheduled asynchronous task to obtain an inspection result;
sending a notification to the owner of the corresponding business domain according to the inspection result;
wherein checking the data standard rules with the scheduled asynchronous task comprises:
checking whether critical fields are configured with data quality check rules;
checking the naming rules and partitioning rules of tables.
An embodiment of the present invention further provides a data management device applied to a metadata system. The device comprises a data quality check module, a check result judgment module, an alarm mode judgment module and a task interrupt module.
The data quality check module is configured to, in response to an offline data task of a data warehouse outputting a result table, execute a data quality check task on the result table to obtain a check result.
The check result judgment module is configured to determine whether the check result is a check failure.
The alarm mode judgment module is configured to, in response to the check result being a check failure, determine the task node whose check failed and determine whether that task node is configured with a first alarm mode, the first alarm mode being used to indicate whether the task node is a key task node.
The task interrupt module is configured to, in response to determining that the failed task node is configured with the first alarm mode, send an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task.
As an improvement of the above solution, another embodiment of the present invention correspondingly provides a storage medium comprising a stored computer program, wherein, when the computer program runs, it controls the device on which the storage medium is located to implement the data management method of any of the above embodiments.
Another embodiment of the present invention correspondingly provides a data management system, comprising one or more processors, a memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the computer programs include instructions for executing the data management method of any of the above embodiments.
Compared with the prior art, the present invention has the following beneficial effects. Embodiments of the invention provide a data management method, device, storage medium and system. The method is applied to a metadata system and performs data quality checks on the result tables output by the offline data tasks of a data warehouse, so that the data quality problems developers care about during data warehouse production are surfaced through the metadata system. Developers no longer need to track data quality actively; instead, the metadata system drives the process, which greatly improves the development efficiency of data warehouse staff and avoids the situation in which developers spend a great deal of effort on data management, work inefficiently, and still let data problems slip through. Because the data check process is combined with the metadata system, developers perceive data problems more comprehensively and automatically. The method also determines whether a task node is a key task node by checking whether the failed task node is configured with the first alarm mode and, when it is, sends an interrupt signal back to the data warehouse so that the warehouse interrupts the offline data task. This prevents an avalanche effect in downstream tasks and avoids large-scale data problems.
Brief description of the drawings
Fig. 1 is a flow diagram of a data management method according to an embodiment of the invention;
Fig. 2 is a flow diagram of step S110 according to an embodiment of the invention;
Fig. 3 is a flow diagram of a data management method according to another embodiment of the invention;
Fig. 4 is a flow diagram of a data management method according to another embodiment of the invention;
Fig. 5 is a flow diagram of a data management method according to another embodiment of the invention;
Fig. 6 is a structural schematic diagram of a data management device according to an embodiment of the invention;
Fig. 7 is a structural schematic diagram of a data management system according to an embodiment of the invention.
Detailed description of embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the invention.
Fig. 1 is a flow diagram of a data management method according to an embodiment of the invention; the data management method is applied to a metadata system.
Metadata refers to data that describes data and usually consists of descriptions of information structure. As technology has developed, the scope of metadata has expanded greatly, for example to UML models, data exchange rules, APIs written in Java, .NET, C++ and so on, business processes and workflow models, product configuration descriptions and tuning parameters, and various business rules, terms and definitions. Metadata should also cover descriptions of new data types, such as location, name, user click frequency, audio, video, images, and data from various wireless sensing and monitoring devices. Metadata is generally divided into business metadata, technical metadata and operational metadata. Business metadata mainly includes business rules, definitions, terms, glossaries, algorithms and the business language used by systems; its main users are business users. Technical metadata mainly defines the metadata structure of the components of the Information Supply Chain (ISC), including the table and field structures, attributes, sources and dependencies of each system, as well as objects such as stored procedures, functions and sequences. Operational metadata refers to information about application runs, such as run frequency, record counts, and analyses and other statistics of the various components.
A metadata system, also called a metadata management system, is a platform for managing metadata.
The data management method provided in this embodiment comprises:
Step S110: in response to an offline data task of a data warehouse outputting a result table, execute a data quality check task on the result table to obtain a check result;
Step S120: determine whether the check result is a check failure;
Step S130: in response to the check result being a check failure, determine the task node whose check failed, and determine whether that task node is configured with a first alarm mode, the first alarm mode being used to indicate whether the task node is a key task node;
Step S140: in response to determining that the failed task node is configured with the first alarm mode, send an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task.
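Steps S110 to S140 can be sketched as a single handler. This is a minimal illustration under stated assumptions, not the patent's implementation: `TaskNode`, `CheckResult`, `handle_result_table`, `run_checks` and `send_interrupt` are hypothetical names, the check is modeled as a function returning failure messages, and the node that failed verification is assumed to be the node that produced the result table.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class TaskNode:
    """A node in the warehouse's task DAG (hypothetical model)."""
    name: str
    # True when the node is configured with the "first alarm mode",
    # i.e. it is marked as a key task node.
    first_alarm_mode: bool = False


@dataclass
class CheckResult:
    table: str
    passed: bool
    failed_node: Optional[TaskNode] = None
    messages: List[str] = field(default_factory=list)


def handle_result_table(table: str,
                        producing_node: TaskNode,
                        run_checks: Callable[[str], List[str]],
                        send_interrupt: Callable[[TaskNode], None]) -> CheckResult:
    # S110: run the data quality check task; run_checks returns failure messages.
    failures = run_checks(table)
    # S120: the check fails if any rule reported a failure.
    if not failures:
        return CheckResult(table, passed=True)
    # S130: determine the failed task node (assumed: the producing node).
    result = CheckResult(table, passed=False, failed_node=producing_node, messages=failures)
    # S140: interrupt the warehouse only for key task nodes.
    if producing_node.first_alarm_mode:
        send_interrupt(producing_node)
    return result
```

Passing `run_checks` and `send_interrupt` in as callables keeps the decision logic separate from the transport used to reach the warehouse, which the text does not specify.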
In day-to-day data warehouse production, many offline data tasks output tables; in the present invention, a table output by an offline data task is called a result table. After the data warehouse finishes executing a data task, it calls an interface of the metadata system over HTTP, which triggers the data quality check tasks associated with the result table output by that offline data task.
Optionally, the first alarm mode may be a telephone alarm or a similar mode.
Specifically, the method further comprises: in response to determining that the failed task node is configured with the first alarm mode, alerting the personnel responsible for the offline data task by telephone. The telephone alarm signals that, if the problem is not handled, downstream data may fail. It prompts the responsible personnel to deal with the problem in time and, once it is handled, to resume the data task, so that a closed loop forms between the metadata system and the data warehouse and the whole process runs smoothly.
Further, the method comprises: in response to determining that the failed task node is not configured with the first alarm mode, the task node is not a key task node; in this case the relevant personnel are simply notified so that developers can follow up later. Handling task nodes differently according to their importance improves flexibility and keeps offline data tasks processing efficiently.
The data management method provided in this embodiment is applied to a metadata system and performs data quality checks on the result tables output by the offline data tasks of the data warehouse, so that the data quality problems developers care about during data warehouse production are surfaced through the metadata system. Developers no longer need to track data quality actively; the metadata system drives the process, which greatly improves the development efficiency of data warehouse staff and avoids developers spending a great deal of effort on data management, working inefficiently, and still letting data problems slip through. Because the data check process is combined with the metadata system, developers perceive data problems more comprehensively and automatically. The method also determines whether a task node is a key task node by checking whether the failed task node is configured with the first alarm mode and, when it is, sends an interrupt signal back to the data warehouse so that the warehouse interrupts the offline data task, which prevents an avalanche effect in downstream tasks and avoids large-scale data problems.
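The HTTP callback from the warehouse to the metadata system could look roughly like the following sketch. The endpoint path `/quality/check`, the JSON field names and the function `build_check_trigger` are hypothetical; the text only says that the warehouse calls a metadata system interface over HTTP after a task finishes.

```python
import json
import urllib.request
from typing import List


def build_check_trigger(metadata_base_url: str, task_id: str,
                        result_tables: List[str]) -> urllib.request.Request:
    """Build (but do not send) the request a warehouse could issue after a task finishes."""
    body = json.dumps({"taskId": task_id, "resultTables": result_tables}).encode("utf-8")
    return urllib.request.Request(
        url=metadata_base_url.rstrip("/") + "/quality/check",  # hypothetical endpoint path
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request would then be sent with `urllib.request.urlopen(req)` once a real metadata system endpoint exists.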
Referring to fig. 2, be an embodiment provided by the invention step S110 flow diagram, with above-described embodiment Unlike step S110~step S140 of offer, in the present embodiment, step S110 includes:
Step S1101, in response to the off-line data task output result table of data warehouse, based on previously according to described offline The metadata genetic connection that data task obtains determines preconfigured at least one quality of data corresponding with the result table Verification rule;
Step S1102, quality of data school is carried out to the result table according at least one quality of data verification rule It tests, obtains check results.
Wherein, the metadata genetic connection includes the genetic connection of table and task and the genetic connection of table and table.
Generation, processing fusion, the circulation circulation of data, wither away to final, will form a kind of relationship, blood naturally between data Edge relationship is for expressing this relationship between data.Database, table and field are the storage organizations of data.Inhomogeneity The data of type have different storage organizations.Storage organization determines the hierarchical structure of genetic connection.
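A minimal in-memory model of the two kinds of lineage may help. This sketch assumes hypothetical names (`Lineage`, `rules_for_task`, a plain dict used as the rule registry); it shows task-to-table lookup for step S1101 and a table-to-table traversal that gives the set of downstream tables a bad result table could affect.

```python
from collections import defaultdict
from typing import Dict, List, Set


class Lineage:
    """task -> output tables (table-task lineage), table -> downstream tables (table-table lineage)."""

    def __init__(self) -> None:
        self.task_outputs: Dict[str, Set[str]] = defaultdict(set)
        self.downstream: Dict[str, Set[str]] = defaultdict(set)

    def add_task_output(self, task: str, table: str) -> None:
        self.task_outputs[task].add(table)

    def add_table_edge(self, upstream: str, downstream: str) -> None:
        self.downstream[upstream].add(downstream)

    def tables_of(self, task: str) -> Set[str]:
        return set(self.task_outputs.get(task, set()))

    def all_downstream(self, table: str) -> Set[str]:
        """Every table reachable from `table`, i.e. the potential blast radius of bad data."""
        seen: Set[str] = set()
        stack = [table]
        while stack:
            for nxt in self.downstream.get(stack.pop(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


def rules_for_task(lineage: Lineage, task: str,
                   rule_registry: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Step S1101: resolve the preconfigured rules of every table the task produced."""
    return {t: rule_registry.get(t, []) for t in sorted(lineage.tables_of(task))}
```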
Based on metadata genetic connection, guarantee the correspondence of quality of data verification rule and result table, convenient for flexibly for not Same result table configures different quality of data verification rules;Result table is carried out according at least one quality of data verification rule Verification, is conducive to more fully find data quality problem.
Optionally, the at least one data quality check rule includes general rules, custom table-level rules and custom field rules.
In this case, step S1101 comprises:
in response to the offline data task of the data warehouse outputting the result table, querying, based on the metadata lineage obtained in advance from the offline data task, the preconfigured custom table-level rules and custom field rules corresponding to the result table, the custom table-level rules including non-SQL (Structured Query Language) custom rules and SQL custom rules;
putting the general rules into a first queue;
putting the non-SQL custom rules into a second queue;
putting the SQL custom rules into a third queue;
putting the custom field rules into a fourth queue;
wherein the first queue, the second queue and the third queue can be executed concurrently.
Note that custom table-level rules and custom field rules are both custom check rules; unlike general rules, they can be configured according to developers' needs. In other embodiments, the custom table-level rules may include only one of the non-SQL custom rules and the SQL custom rules.
Further, performing the data quality check on the result table according to the at least one data quality check rule to obtain the check result comprises:
taking rules, in turn, from each of the four queues corresponding to the non-SQL custom rules, the SQL custom rules, the general rules and the custom field rules, and checking the result table against them to obtain the check result.
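The routing of rules into the four queues can be sketched as follows. The rule-kind labels (`"general"`, `"table_non_sql"`, `"table_sql"`, `"field"`), the `Rule` type and `enqueue_rules` are hypothetical names chosen for illustration; Python's thread-safe `queue.Queue` stands in for whatever queue the metadata system actually uses.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Dict, Iterable


@dataclass(frozen=True)
class Rule:
    name: str
    kind: str  # "general", "table_non_sql", "table_sql" or "field" (hypothetical labels)


# General -> first queue, non-SQL -> second, SQL -> third, custom field -> fourth.
QUEUE_OF_KIND = {"general": 1, "table_non_sql": 2, "table_sql": 3, "field": 4}


def enqueue_rules(rules: Iterable[Rule]) -> Dict[int, Queue]:
    """Put each rule into the first..fourth queue according to its kind."""
    queues = {i: Queue() for i in QUEUE_OF_KIND.values()}
    for rule in rules:
        queues[QUEUE_OF_KIND[rule.kind]].put(rule)
    return queues
```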
To avoid slowing down the offline data task, the tasks in the queues are consumed asynchronously using the well-known producer-consumer pattern. Specifically, each time a data quality check rule is put into a queue, the execute() method of Java's ExecutorService is called. execute() registers a task with the task factory and takes a worker parameter that defines how to consume; the worker that picks up the task then takes a check rule from the queues and executes the check.
Each worker waits until it is entitled to consume, then takes a rule from the four queues in turn, in the order non-SQL custom rules, SQL custom rules, general rules and custom field rules, and performs the check. This keeps the number of workers equal to the number of check tasks, which guarantees that every check task is eventually consumed.
When checking a result table against the general rules, the metadata system accesses the HDFS file system by database name and table name and retrieves metadata about the result table, such as its size and row count. It then performs a full check and an incremental check; each compares the current data with earlier data according to a certain algorithm, giving a complete picture of the table's data quality.
Custom table-level check rules are divided into SQL and non-SQL types to meet the differing needs of data warehouse staff. Developers can enter custom table-level rules in advance in the table's quality check module on the metadata system. The metadata system offers different configurations, such as the comparison method, comparison cycle, comparison content and comparison range; when these options do not meet a requirement, developers can write custom SQL that returns exactly the result they want to compare. Based on the configured comparison cycle, the metadata system automatically chooses the Presto or Druid query engine to run the query.
For custom field rules, data warehouse developers likewise enter the rules on the metadata system in advance; for example, enumeration, uniqueness and non-null rules can be configured. When the check task runs, query SQL is generated automatically from the configuration and executed in Presto, and the rule is checked against the query result.
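The producer-consumer scheme above is described in terms of Java's `ExecutorService`; the following is a Python analogue using `concurrent.futures.ThreadPoolExecutor`, offered as a sketch rather than the original code. Rule kinds, `run_checks` and the `check` callable are hypothetical. One worker is submitted per enqueued rule, and each worker scans the queues in the stated order until it obtains exactly one rule, so every rule is consumed.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor, wait
from typing import Callable, Dict, List, Tuple

# Order in which a worker looks for work, as described in the text.
CONSUME_ORDER = ("table_non_sql", "table_sql", "general", "field")


def run_checks(rules: List[Tuple[str, str]], check: Callable[[str], bool],
               max_workers: int = 4) -> Dict[str, bool]:
    """rules: (kind, rule_name) pairs; check(rule_name) returns True when the table passes."""
    queues = {kind: queue.Queue() for kind in CONSUME_ORDER}
    results: Dict[str, bool] = {}
    lock = threading.Lock()

    def worker() -> None:
        # Each worker consumes exactly one rule. One worker is submitted per
        # enqueued rule, so a rule is always eventually available and the loop ends.
        while True:
            for kind in CONSUME_ORDER:
                try:
                    rule = queues[kind].get_nowait()
                except queue.Empty:
                    continue
                passed = check(rule)
                with lock:
                    results[rule] = passed
                return

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = []
        for kind, name in rules:
            queues[kind].put(name)               # producer: enqueue one rule ...
            futures.append(pool.submit(worker))  # ... and register one consumer for it
        wait(futures)
    return results
```

Because workers and rules are matched one-to-one, the caller never has to poll for leftover items; rule names are assumed to be unique per table.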
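The text does not name the comparison algorithm for the full and incremental checks. The sketch below swaps in a simple relative-fluctuation test against the previous snapshot with a hypothetical 30% threshold; `TableStats`, `fluctuation` and `general_check` are illustrative names, and the statistics are assumed to have already been read from HDFS.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TableStats:
    size_bytes: int
    row_count: int


def fluctuation(current: int, previous: int) -> float:
    """Relative change versus the previous value; an empty baseline counts as 100%."""
    if previous == 0:
        return 0.0 if current == 0 else 1.0
    return abs(current - previous) / previous


def general_check(full_now: TableStats, full_prev: TableStats,
                  part_now: TableStats, part_prev: TableStats,
                  max_ratio: float = 0.3) -> List[str]:
    """Full-table and incremental (latest partition) checks; returns failure messages."""
    failures = []
    if part_now.row_count == 0:
        failures.append("incremental: latest partition is empty")
    for scope, now, prev in (("full", full_now, full_prev), ("incremental", part_now, part_prev)):
        for metric in ("size_bytes", "row_count"):
            ratio = fluctuation(getattr(now, metric), getattr(prev, metric))
            if ratio > max_ratio:
                failures.append(f"{scope}: {metric} changed by {ratio:.0%}")
    return failures
```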
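The text says the engine is chosen from the configured comparison cycle but does not give the policy. In this sketch the mapping (cycles of up to 7 days go to Druid, longer ones to Presto) is purely a hypothetical assumption, as are `TableRule`, `choose_engine` and `within_range`; it only illustrates how a rule configuration could drive engine selection and the range comparison.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableRule:
    metric_sql: Optional[str]   # set for SQL-type rules: custom SQL returning one number
    metric: str = "row_count"   # used by non-SQL rules
    compare_days: int = 1       # comparison cycle: 1 = day-over-day, 7 = week-over-week
    max_ratio: float = 0.2      # allowed relative change (comparison range)

    @property
    def is_sql(self) -> bool:
        return self.metric_sql is not None


def choose_engine(rule: TableRule, druid_max_days: int = 7) -> str:
    """Hypothetical policy: short comparison cycles go to Druid, longer ones to Presto."""
    return "druid" if rule.compare_days <= druid_max_days else "presto"


def within_range(current: float, baseline: float, rule: TableRule) -> bool:
    """True when the compared value stays inside the configured comparison range."""
    if baseline == 0:
        return current == 0
    return abs(current - baseline) / abs(baseline) <= rule.max_ratio
```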
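Automatic SQL generation for the enumeration, uniqueness and non-null field rules might look like this sketch. The function name, the `violations` alias and the optional raw `partition_filter` are hypothetical; each query counts violating rows, so a result of 0 means the rule passes. Table and column names are assumed to come from trusted configuration and are not escaped.

```python
from typing import Optional, Sequence


def field_rule_sql(table: str, column: str, rule: str,
                   allowed: Optional[Sequence[str]] = None,
                   partition_filter: Optional[str] = None) -> str:
    """Build a query counting rows that violate one field rule (0 means the rule passes)."""
    conditions = [partition_filter] if partition_filter else []
    if rule == "not_null":
        conditions.append(f"{column} IS NULL")
        select = "count(*)"
    elif rule == "enum":
        if not allowed:
            raise ValueError("enum rule needs at least one allowed value")
        values = ", ".join("'" + v.replace("'", "''") + "'" for v in allowed)
        conditions.append(f"{column} NOT IN ({values})")
        select = "count(*)"
    elif rule == "unique":
        # Duplicates among non-null values.
        select = f"count({column}) - count(DISTINCT {column})"
    else:
        raise ValueError(f"unknown field rule: {rule}")
    where = " WHERE " + " AND ".join(conditions) if conditions else ""
    return f"SELECT {select} AS violations FROM {table}{where}"
```

The generated string would then be submitted to Presto through whatever client the metadata system uses.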
Fig. 3 is a flow diagram of a data management method according to another embodiment of the invention. Relative to steps S110 to S140 of the above embodiment, in this embodiment the data management method further comprises:
Step S150: in response to the check result being a check failure, determine the alarm recipients according to the preconfigured alarm channels and the asset level of the table, and send an alarm to those recipients.
Note that this embodiment does not have to follow the strict order of steps S110 to S140; for example, steps S140 and S150 may be executed in parallel.
Specifically, the check result includes the general rule check result, the custom table-level rule check result and the custom field rule check result. For general rules, a successful check is recorded directly in the general rule check result; on failure, the failure details are automatically emailed to the table's alarm recipients and then recorded in the general rule check result. The results are stored in the metadata system, so relevant personnel can see how related tasks were checked and handle the data quality problems behind the failures. For custom field rule checks and custom table-level rule checks, a failure produces a check problem report according to the check procedure, which is assigned to the relevant handler. Failed checks can also be entered as problem reports into Jira (a project and issue tracking tool), the system developers already watch every day for handling data quality problems. Because the metadata system automates this process, developers ultimately only need to watch Jira and work through their issue list.
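How alarm recipients might be resolved from alarm channels and a table's asset level is sketched below. The numeric asset levels (higher means more important), the escalation map and the channel names are all hypothetical assumptions; the text only states that recipients depend on the preconfigured alarm channels and the table's asset level.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class AlarmChannel:
    name: str             # e.g. "mail" or "phone" (hypothetical)
    min_asset_level: int  # channel is used only for tables at or above this level


def resolve_alarm_targets(asset_level: int, owners: List[str],
                          escalation: Dict[int, List[str]],
                          channels: List[AlarmChannel]) -> Dict[str, Set[str]]:
    """Return {channel name: recipients} for a table whose check failed."""
    recipients = set(owners)
    for level, people in escalation.items():
        if asset_level >= level:  # more important tables also alert escalation contacts
            recipients.update(people)
    return {c.name: set(recipients) for c in channels if asset_level >= c.min_asset_level}
```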
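The result routing just described (record every result, email general-rule failures, file custom-rule failures as issues) can be sketched with injected callables. `route_result`, `record`, `send_mail` and `open_issue` are hypothetical; no real Jira or mail API is assumed.

```python
from typing import Callable, Dict


def route_result(rule_type: str, table: str, passed: bool, detail: str,
                 record: Callable[[Dict], None],
                 send_mail: Callable[[str, str], None],
                 open_issue: Callable[[str, str], None]) -> None:
    """Route one check result: general-rule failures are mailed, custom-rule failures become issues."""
    if not passed:
        if rule_type == "general":
            send_mail(table, detail)   # mail the table's alarm recipients first ...
        else:                          # "custom_table" or "custom_field"
            open_issue(table, detail)  # ... or file a problem report for the handler
    # Every result, passed or failed, is recorded in the metadata system.
    record({"table": table, "rule_type": rule_type, "passed": passed, "detail": detail})
```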
Referring to fig. 4, be another embodiment provided by the invention data managing method flow diagram.With it is above-mentioned Unlike step S110~step S140 that embodiment provides, in the present embodiment, the data managing method further include:
Step S160 collects the information of the result table when executing quality of data verification task to the result table;
Step S170, by the metadata genetic connection by off-line data task orientation to the result table, with determination For producing cluster resource consumed by the off-line data task of the result table;
Step S180, using visualization tool to the information of the result table, the cluster resource and the check results It is shown.
It should be noted that the present embodiment and being executed not in strict accordance with the sequence of step S110~S140, such as step S160 and step S110 can be executed parallel.
Specifically, quality of data billboard can be established, visualization tool is presented in quality of data billboard.
The visualization tool may include the chart for display data trend.During above-mentioned verification, metadata System has collected the partition size of table, line number, full table size, line number, and field enumerates distribution situation etc. process metadata, and It will be in these data filings to different mysql tables.The information of result table can more intuitively be shown using visualization tool To user, facilitates the hiding data problem of discovery, facilitate data warehouse developer perception problems and investigation problem.
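Archiving per-partition statistics and reading them back as a trend could look like this sketch. It uses Python's built-in `sqlite3` as a stand-in for the MySQL database named in the text; the table `table_partition_stats`, its columns and both functions are hypothetical.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """CREATE TABLE IF NOT EXISTS table_partition_stats (
    table_name TEXT NOT NULL,
    part_name  TEXT NOT NULL,
    size_bytes INTEGER NOT NULL,
    row_count  INTEGER NOT NULL,
    PRIMARY KEY (table_name, part_name)
)"""


def archive(conn: sqlite3.Connection, table_name: str, part_name: str,
            size_bytes: int, row_count: int) -> None:
    """Store (or overwrite) the stats collected for one partition during a check."""
    conn.execute(SCHEMA)
    conn.execute("INSERT OR REPLACE INTO table_partition_stats VALUES (?, ?, ?, ?)",
                 (table_name, part_name, size_bytes, row_count))


def row_count_trend(conn: sqlite3.Connection, table_name: str) -> List[Tuple[str, int, int]]:
    """(partition, row_count, change vs previous partition), ordered by partition name."""
    rows = conn.execute("SELECT part_name, row_count FROM table_partition_stats "
                        "WHERE table_name = ? ORDER BY part_name", (table_name,)).fetchall()
    trend, prev = [], None
    for part, count in rows:
        trend.append((part, count, 0 if prev is None else count - prev))
        prev = count
    return trend
```

A dashboard chart would plot the second and third columns of the trend per partition.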
Executing off-line data task and while data quality indicator task, Hive that data task is relied on or Spark engine also will record the process metadata of data task, and time-consuming of such as handling up, gc time etc., these information can all be pushed away It is sent to the library mysql of metadata system, by the offline task state of timing, in conjunction with metadata genetic connection, by task orientation To table, just can obtain producing the cluster resource that the data task of each table is spent.Optionally, certain algorithm can be used, it will Cluster resource conversion is the amount of money, as cost result, is then shown using visualization tool to cost result.Data warehouse Developer can also be concerned about the calculating cost variation of table, be convenient for data warehouse developer while focused data trend More intuitively cluster resource consumed by impression task, so that the relevant data task logic of data warehouse personnel optimization is pushed, Reduce the resource of consumption.
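The text leaves the resource-to-cost algorithm unspecified. This sketch swaps in a simple linear price per resource unit and splits a task's cost evenly across the tables it outputs; the metric names, unit prices, the even split and `cost_per_table` are hypothetical assumptions.

```python
from collections import defaultdict
from typing import Dict, List


def cost_per_table(task_usage: Dict[str, Dict[str, float]],
                   task_outputs: Dict[str, List[str]],
                   unit_price: Dict[str, float]) -> Dict[str, float]:
    """task_usage: {task: {metric: amount}}; task_outputs: lineage from task to its tables."""
    costs: Dict[str, float] = defaultdict(float)
    for task, usage in task_usage.items():
        tables = task_outputs.get(task, [])
        if not tables:
            continue  # no lineage: the cost cannot be attributed to any table
        cost = sum(amount * unit_price.get(metric, 0.0) for metric, amount in usage.items())
        for table in tables:
            costs[table] += cost / len(tables)  # even split across output tables (assumption)
    return dict(costs)
```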
Referring to Fig. 5, which is a flow diagram of the data management method of another embodiment provided by the invention. In addition to steps S110 to S140 provided in the above embodiment, the data management method in this embodiment further includes:
Step S190: checking data standard rules through a timed asynchronous task to obtain an inspection result;
Step S200: sending a notification to the responsible person of the corresponding business domain according to the inspection result.
Specifically, step S190 includes:
Checking whether key fields are configured with data quality verification rules;
Checking the naming rules and partitioning rules of tables.
It should be noted that step S190 may also include checking other specification rules, which can be configured according to actual needs and are not listed here.
This embodiment automatically inspects data standards, avoiding the avalanche effect that non-standard data can cause.
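A minimal sketch of steps S190 and S200 follows. The naming pattern, the required `dt` partition key, the `TableMeta` fields, and the `load_tables`/`notify_owner` callbacks are hypothetical conventions chosen for illustration; the source does not specify the concrete naming or partitioning rules.

```python
import asyncio
import re
from dataclasses import dataclass, field

# Hypothetical naming convention: warehouse layer prefix + lowercase snake_case.
TABLE_NAME_PATTERN = re.compile(r"^(ods|dwd|dws|ads|dim)_[a-z0-9_]+$")
# Hypothetical partitioning convention: every table is partitioned by "dt".
REQUIRED_PARTITION_KEY = "dt"


@dataclass
class TableMeta:
    name: str
    business_domain: str
    partition_keys: list[str]
    key_fields: list[str]
    fields_with_rules: set[str] = field(default_factory=set)


def check_standards(tables: list[TableMeta]) -> dict[str, list[str]]:
    """Step S190: run the standard checks and group findings by business domain."""
    findings: dict[str, list[str]] = {}
    for t in tables:
        issues = []
        missing = [f for f in t.key_fields if f not in t.fields_with_rules]
        if missing:
            issues.append(f"{t.name}: key fields without quality rules: {', '.join(missing)}")
        if not TABLE_NAME_PATTERN.match(t.name):
            issues.append(f"{t.name}: name violates naming rule")
        if REQUIRED_PARTITION_KEY not in t.partition_keys:
            issues.append(f"{t.name}: missing partition key '{REQUIRED_PARTITION_KEY}'")
        if issues:
            findings.setdefault(t.business_domain, []).extend(issues)
    return findings


async def run_periodically(load_tables, notify_owner, interval_s: float) -> None:
    """Timed asynchronous task: re-run the checks and notify each domain owner (step S200)."""
    while True:
        for domain, issues in check_standards(load_tables()).items():
            notify_owner(domain, issues)
        await asyncio.sleep(interval_s)
```

Grouping by business domain keeps step S200 to one notification per responsible person rather than one per table.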
Referring to Fig. 6, which is a structural schematic diagram of the data management apparatus of an embodiment provided by the invention. The data management apparatus 1 provided by this embodiment includes: a data quality verification module 210, a check result judgment module 220, an alarm mode judgment module 230, and a task interruption module 240.
The data quality verification module 210 is configured to, in response to an offline data task of the data warehouse outputting a result table, execute a data quality verification task on the result table to obtain a check result.
The check result judgment module 220 is configured to judge whether the check result is a verification failure.
The alarm mode judgment module 230 is configured to, in response to the check result being a verification failure, determine the task node that failed verification and judge whether that task node is configured with a first alarm mode; the first alarm mode indicates whether the task node is a key task node.
The task interruption module 240 is configured to, in response to determining that the task node that failed verification is configured with the first alarm mode, send an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task.
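The control flow of modules 210 to 240 (steps S110 to S140) can be sketched as follows. This is an illustration under assumptions, not the patented implementation: the quality check and the interrupt call are passed in as callbacks, and the second `AlarmMode` value and the returned status strings are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional


class AlarmMode(Enum):
    FIRST = auto()   # first alarm mode: marks a key task node, so a failure blocks the pipeline
    NOTIFY = auto()  # hypothetical weaker mode: alert only, the pipeline continues


@dataclass
class CheckResult:
    passed: bool
    failed_task_node: Optional[str] = None


def handle_result_table(table: str,
                        run_quality_check: Callable[[str], CheckResult],
                        alarm_modes: dict[str, AlarmMode],
                        send_interrupt: Callable[[str], None]) -> str:
    """Check the output table and interrupt the warehouse only for key task nodes."""
    result = run_quality_check(table)             # S110: data quality verification task
    if result.passed:                             # S120: did verification fail?
        return "passed"
    node = result.failed_task_node                # S130: task node that failed verification
    if alarm_modes.get(node) is AlarmMode.FIRST:  #       configured with the first alarm mode?
        send_interrupt(node)                      # S140: interrupt the offline data task
        return "interrupted"
    return "failed_non_blocking"
```

Only key task nodes stop the pipeline; failures elsewhere fall through to alerting, so a minor table does not block downstream jobs.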
It should be noted that, in this embodiment, the data quality verification module 210 may be used to execute step S110 of the above embodiment, the check result judgment module 220 may be used to execute step S120, the alarm mode judgment module 230 may be used to execute step S130, and the task interruption module 240 may be used to execute step S140.
Optionally, the data quality verification module 210 includes a verification rule determination unit and a verification task execution unit.
The verification rule determination unit is configured to, in response to an offline data task of the data warehouse outputting a result table, determine at least one preconfigured data quality verification rule corresponding to the result table based on metadata lineage obtained in advance from the offline data task.
The verification task execution unit is configured to perform data quality verification on the result table according to the at least one data quality verification rule to obtain a check result;
The metadata lineage includes table-to-task lineage and table-to-table lineage.
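The lookup performed by the verification rule determination unit might look like the sketch below, in which the lineage is assumed to be held in in-memory dictionaries and rules are keyed by table name; the rule strings are placeholders. Only the task-to-table edges are consulted here; the source does not say how the table-to-table edges take part in rule selection, so the sketch merely stores them.

```python
from dataclasses import dataclass, field


@dataclass
class Lineage:
    """Metadata lineage held in memory (hypothetical structure)."""
    task_outputs: dict[str, set[str]] = field(default_factory=dict)     # task -> tables it writes
    table_upstreams: dict[str, set[str]] = field(default_factory=dict)  # table -> source tables


def rules_for_task_output(task_id: str, lineage: Lineage,
                          rule_config: dict[str, list[str]]) -> dict[str, list[str]]:
    """When a task finishes, find the result tables it produced and their preconfigured rules."""
    return {table: rule_config.get(table, [])
            for table in sorted(lineage.task_outputs.get(task_id, set()))}
```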
Optionally, the at least one data quality verification rule includes: general rules, custom table-level rules, and custom field rules;
The verification rule determination unit then includes:
A query subunit, configured to, in response to an offline data task of the data warehouse outputting a result table, query the preconfigured custom table-level rules and custom field rules corresponding to the result table based on metadata lineage obtained in advance from the offline data task; the custom table-level rules include non-SQL custom rules and SQL custom rules;
A first subunit, configured to put the general rules into a first queue;
A second subunit, configured to put the non-SQL custom rules into a second queue;
A third subunit, configured to put the SQL custom rules into a third queue;
A fourth subunit, configured to put the custom field rules into a fourth queue. The first queue, the second queue, and the third queue can be executed concurrently.
Optionally, the verification task execution unit includes:
A rule consumption subunit, configured to take rules in turn from each of the four queues corresponding to the non-SQL custom rules, the SQL custom rules, the general rules, and the custom field rules, and verify the result table with them to obtain a check result.
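The four-queue arrangement can be sketched with the standard `queue` and `concurrent.futures` modules. The source states that the first three queues may run concurrently; draining the field-rule queue afterwards is an assumption of this sketch, and representing each rule as a Python callable returning a boolean is likewise hypothetical.

```python
import queue
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    kind: str                     # "general", "non_sql", "sql" or "field" (hypothetical labels)
    check: Callable[[str], bool]  # returns True when the table passes the rule


QUEUE_ORDER = ("general", "non_sql", "sql", "field")  # first to fourth queue, as in the text


def enqueue(rules: list[Rule]) -> dict[str, "queue.Queue[Rule]"]:
    """Distribute rules into one queue per rule kind."""
    queues = {kind: queue.Queue() for kind in QUEUE_ORDER}
    for rule in rules:
        queues[rule.kind].put(rule)
    return queues


def drain(q: "queue.Queue[Rule]", table: str) -> list[str]:
    """Consume one queue and return the names of the rules that failed."""
    failed = []
    while True:
        try:
            rule = q.get_nowait()
        except queue.Empty:
            return failed
        if not rule.check(table):
            failed.append(rule.name)


def verify(table: str, rules: list[Rule]) -> list[str]:
    """Drain the first three queues concurrently, then the field-rule queue."""
    queues = enqueue(rules)
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(drain, queues[k], table) for k in QUEUE_ORDER[:3]]
        failed = [name for f in futures for name in f.result()]
    failed.extend(drain(queues["field"], table))
    return failed  # an empty list means the check result is a pass
```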
Optionally, the apparatus further includes:
An alarm unit, configured to, in response to the check result being a verification failure, determine alarm recipients according to the preconfigured alarm channel and the asset level of the table, and send alerts to those recipients.
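One way to combine the alarm channel with the table's asset level is sketched below. The asset levels A/B/C, the recipient roles, and the message text are hypothetical; the source only says that recipients are determined from these two inputs.

```python
from dataclasses import dataclass

# Hypothetical asset levels: higher-value tables alert more people.
RECIPIENTS_BY_LEVEL = {
    "A": ["table_owner", "domain_owner", "on_call"],
    "B": ["table_owner", "domain_owner"],
    "C": ["table_owner"],
}


@dataclass
class AlarmConfig:
    channels: list[str]  # preconfigured alarm channels, e.g. ["sms", "email"]
    asset_level: str     # asset level of the table, e.g. "A"


def plan_alarms(table: str, cfg: AlarmConfig) -> list[tuple[str, str, str]]:
    """Return (channel, recipient, message) triples for a failed check on `table`."""
    recipients = RECIPIENTS_BY_LEVEL.get(cfg.asset_level, ["table_owner"])
    message = f"Data quality check failed for {table} (asset level {cfg.asset_level})"
    return [(ch, who, message) for ch in cfg.channels for who in recipients]
```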
Optionally, the apparatus further includes:
A collection unit, configured to collect information of the result table when the data quality verification task is executed on the result table;
A cluster resource determination unit, configured to map offline data tasks to the result table through the metadata lineage, so as to determine the cluster resources consumed by the offline data task that produces the result table;
A visualization unit, configured to display the information of the result table, the cluster resources, and the check result using a visualization tool.
Optionally, the apparatus further includes:
A rule detection module, configured to check data standard rules through a timed asynchronous task to obtain an inspection result;
A notification module, configured to send a notification to the responsible person of the corresponding business domain according to the inspection result;
Specifically, the rule detection module includes:
A verification rule detection subunit, configured to check whether key fields are configured with data quality verification rules;
A specification rule verification subunit, configured to check the naming rules and partitioning rules of tables.
As an improvement of the above scheme, the invention correspondingly provides a preferred embodiment of a system. Referring to Fig. 7, which is a structural schematic diagram of the data management system of an embodiment provided by the invention, the system includes one or more processors 301, a memory 302, and one or more computer programs 303. The one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and include instructions for executing the data management method described in any of the above embodiments.
Illustratively, the computer program may be divided into one or more modules, which are stored in the memory and executed by the processor to accomplish the invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions; these segments describe how the computer program executes in the system. For example, the computer program may be divided into: a data quality verification module 210, configured to, in response to an offline data task of the data warehouse outputting a result table, execute a data quality verification task on the result table to obtain a check result; a check result judgment module 220, configured to judge whether the check result is a verification failure; an alarm mode judgment module 230, configured to, in response to the check result being a verification failure, determine the task node that failed verification and judge whether that task node is configured with a first alarm mode, the first alarm mode indicating whether the task node is a key task node; and a task interruption module 240, configured to, in response to determining that the task node that failed verification is configured with the first alarm mode, send an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task.
The system may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud system. The system may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the schematic diagram is only an example of the system and does not limit it; the system may include more or fewer components than illustrated, combine certain components, or use different components. For example, the system may also include input/output devices, network access devices, buses, and so on.
The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. A general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the system and connects the various parts of the whole system through various interfaces and lines.
The memory may be used to store the computer program and/or modules. The processor implements the various functions of the system by running or executing the computer program and/or modules stored in the memory and by calling data stored in the memory. The memory may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application programs required by at least one function (such as a sound playback function or an image playback function), and the data storage area may store data created through use of the mobile phone (such as audio data or a phone book). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
If the modules integrated in the system are implemented as software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be accomplished by a computer program instructing the relevant hardware. The computer program may be stored in a storage medium and, when run, controls the device on which the storage medium is located to implement the data management method described in any of the above embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory (ROM), random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content included in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
It should be noted that the apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separated, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the invention, a connection between modules indicates a communication connection between them, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
The above is a preferred embodiment of the invention. It should be noted that those of ordinary skill in the art can make various improvements and modifications without departing from the principle of the invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the invention.

Claims (10)

1. A data management method, characterized in that the method is applied to a metadata system and comprises:
In response to an offline data task of a data warehouse outputting a result table, executing a data quality verification task on the result table to obtain a check result;
Judging whether the check result is a verification failure;
In response to the check result being a verification failure, determining the task node that failed verification, and judging whether that task node is configured with a first alarm mode, wherein the first alarm mode indicates whether the task node is a key task node; and
In response to determining that the task node that failed verification is configured with the first alarm mode, sending an interrupt signal to the data warehouse, so that the data warehouse interrupts execution of the offline data task.
2. The data management method according to claim 1, characterized in that the step of executing, in response to the offline data task of the data warehouse outputting the result table, the data quality verification task on the result table to obtain the check result comprises:
In response to the offline data task of the data warehouse outputting the result table, determining at least one preconfigured data quality verification rule corresponding to the result table based on metadata lineage obtained in advance from the offline data task; and
Performing data quality verification on the result table according to the at least one data quality verification rule to obtain the check result;
Wherein the metadata lineage includes table-to-task lineage and table-to-table lineage.
3. The data management method according to claim 2, characterized in that the at least one data quality verification rule includes: general rules, custom table-level rules, and custom field rules;
Then the step of determining, in response to the offline data task of the data warehouse outputting the result table, the at least one preconfigured data quality verification rule corresponding to the result table based on the metadata lineage obtained in advance from the offline data task comprises:
In response to the offline data task of the data warehouse outputting the result table, querying the preconfigured custom table-level rules and custom field rules corresponding to the result table based on the metadata lineage obtained in advance from the offline data task, wherein the custom table-level rules include non-SQL custom rules and SQL custom rules;
Putting the general rules into a first queue;
Putting the non-SQL custom rules into a second queue;
Putting the SQL custom rules into a third queue;
Putting the custom field rules into a fourth queue;
Wherein the first queue, the second queue, and the third queue can be executed concurrently.
4. The data management method according to claim 3, characterized in that the step of performing data quality verification on the result table according to the at least one data quality verification rule to obtain the check result comprises:
Taking rules in turn from each of the four queues corresponding to the non-SQL custom rules, the SQL custom rules, the general rules, and the custom field rules, and verifying the result table with them to obtain the check result.
5. The data management method according to any one of claims 1-4, characterized in that the method further comprises:
In response to the check result being a verification failure, determining alarm recipients according to the preconfigured alarm channel and the asset level of the table, and alerting the alarm recipients.
6. The data management method according to any one of claims 1-4, characterized in that the method further comprises:
Collecting information of the result table when executing the data quality verification task on the result table;
Mapping offline data tasks to the result table through the metadata lineage, so as to determine the cluster resources consumed by the offline data task that produces the result table;
Displaying the information of the result table, the cluster resources, and the check result using a visualization tool.
7. The data management method according to any one of claims 1-4, characterized in that the method further comprises:
Checking data standard rules through a timed asynchronous task to obtain an inspection result;
Sending a notification to the responsible person of the corresponding business domain according to the inspection result;
Wherein the step of checking data standard rules through the timed asynchronous task comprises:
Checking whether key fields are configured with data quality verification rules;
Checking the naming rules and partitioning rules of tables.
8. A data management apparatus, characterized in that the data management apparatus is applied to a metadata system and comprises:
A data quality verification module, configured to, in response to an offline data task of a data warehouse outputting a result table, execute a data quality verification task on the result table to obtain a check result;
A check result judgment module, configured to judge whether the check result is a verification failure;
An alarm mode judgment module, configured to, in response to the check result being a verification failure, determine the task node that failed verification, and judge whether that task node is configured with a first alarm mode, wherein the first alarm mode indicates whether the task node is a key task node; and
A task interruption module, configured to, in response to determining that the task node that failed verification is configured with the first alarm mode, send an interrupt signal to the data warehouse, so that the data warehouse interrupts execution of the offline data task.
9. A storage medium, characterized in that the storage medium comprises a stored computer program, wherein, when the computer program runs, the device on which the storage medium is located is controlled to implement the data management method according to any one of claims 1-7.
10. A data management system, comprising: one or more processors; a memory; and
One or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the computer programs include instructions for executing the data management method according to any one of claims 1 to 7.
CN201910744219.5A 2019-08-13 2019-08-13 Data managing method, device, storage medium and system Pending CN110457371A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910744219.5A CN110457371A (en) 2019-08-13 2019-08-13 Data managing method, device, storage medium and system

Publications (1)

Publication Number Publication Date
CN110457371A true CN110457371A (en) 2019-11-15

Family

ID=68486238

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910744219.5A Pending CN110457371A (en) 2019-08-13 2019-08-13 Data managing method, device, storage medium and system

Country Status (1)

Country Link
CN (1) CN110457371A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102005818A (en) * 2010-11-10 2011-04-06 国电南瑞科技股份有限公司 Method for detecting consistency of SCD (System Configuration Document) and IED (Intelligent Electronic Device) model on line
CN103929326A (en) * 2014-03-18 2014-07-16 烽火通信科技股份有限公司 Communication network transmission type alarm uniform analysis device and method
CN104766151A (en) * 2014-12-29 2015-07-08 国家电网公司 Quality management and control method for electricity transaction data warehouses and management and control system thereof
CN105278373A (en) * 2015-10-16 2016-01-27 中国南方电网有限责任公司电网技术研究中心 Substation integrated information processing system realizing method
CN107766572A (en) * 2017-11-13 2018-03-06 北京国信宏数科技有限责任公司 Distributed extraction and visual analysis method and system based on economic field data
CN109542901A (en) * 2018-11-12 2019-03-29 北京懿医云科技有限公司 Data processing method, device, computer readable storage medium and electronic equipment
CN109739893A (en) * 2018-12-28 2019-05-10 上海连尚网络科技有限公司 A kind of metadata management method, equipment and computer-readable medium
CN109857755A (en) * 2019-01-08 2019-06-07 中国联合网络通信集团有限公司 A kind of rule method of calibration and device
CN109947746A (en) * 2017-10-26 2019-06-28 亿阳信通股份有限公司 A kind of quality of data management-control method and system based on ETL process

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105603A (en) * 2019-12-20 2020-05-05 万申科技股份有限公司 Venue and stadium integrated security management platform based on big data
CN111105603B (en) * 2019-12-20 2021-04-27 万申科技股份有限公司 Venue and stadium integrated security management platform based on big data
CN111597255A (en) * 2020-04-29 2020-08-28 北京金山云网络技术有限公司 Data disaster recovery processing method and device, electronic equipment and storage medium
CN112328619A (en) * 2020-09-24 2021-02-05 杭州小电科技股份有限公司 Data quality monitoring method, device, system, electronic device and storage medium
CN112506911A (en) * 2020-12-18 2021-03-16 杭州数澜科技有限公司 Data quality monitoring method and device, electronic equipment and storage medium
CN112632174A (en) * 2020-12-31 2021-04-09 江苏苏宁云计算有限公司 Data inspection method, device and system
CN112860803A (en) * 2021-03-29 2021-05-28 中信银行股份有限公司 Account checking method, device and equipment and readable storage medium
CN112860803B (en) * 2021-03-29 2024-05-03 中信银行股份有限公司 Method, device and equipment for checking account and readable storage medium
CN113641739A (en) * 2021-07-05 2021-11-12 南京联创信息科技有限公司 Spark-based intelligent data conversion method

Similar Documents

Publication Publication Date Title
CN110457371A (en) Data managing method, device, storage medium and system
US11886464B1 (en) Triage model in service monitoring system
CN107958049B (en) Data quality inspection management system
US8555248B2 (en) Business object change management using release status codes
US8276152B2 (en) Validation of the change orders to an I T environment
CN108711030A (en) The end-to-end project management platform integrated with artificial intelligence
US7908160B2 (en) System and method for producing audit trails
CN107810500A (en) Data quality analysis
US8463811B2 (en) Automated correlation discovery for semi-structured processes
CN105183625A (en) Log data processing method and apparatus
US8904357B2 (en) Dashboard for architectural governance
CN109753596B (en) Information source management and configuration method and system for large-scale network data acquisition
US20110131247A1 (en) Semantic Management Of Enterprise Resourses
US10754901B2 (en) Analytics of electronic content management systems using a staging area database
US11853794B2 (en) Pipeline task verification for a data processing platform
CN103714133A (en) Data operation and maintenance management method and device
CN109656963A (en) Metadata acquisition methods, device, equipment and computer readable storage medium
CN111400288A (en) Data quality inspection method and system
CN112905323B (en) Data processing method, device, electronic equipment and storage medium
CN102609789A (en) Information monitoring and abnormality predicting system for library
US11797339B2 (en) Systems and methods for maintaining data objects to manage asynchronous workflows
CN107480188B (en) Audit service data processing method and computer equipment
CN109819019B (en) Monitoring and statistical analysis method and system for large-scale network data acquisition
CN115221337A (en) Data weaving processing method and device, electronic equipment and readable storage medium
Pintas et al. SciLightning: a cloud provenance-based event notification for parallel workflows

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 2019-11-15)