CN110457371A - Data managing method, device, storage medium and system - Google Patents
Data managing method, device, storage medium and system
- Publication number
- CN110457371A (application CN201910744219.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- task
- rule
- verification
- result table
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/283—Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention discloses a data management method executed by a metadata system. The method comprises: in response to an offline data task of a data warehouse outputting a result table, executing a data quality verification task on the result table to obtain a verification result; judging whether the verification result is a verification failure; in response to the verification result being a verification failure, determining the task node at which verification failed and judging whether that task node is configured with a first alarm mode, where the first alarm mode indicates whether the task node is a key task node; and, in response to determining that the task node is configured with the first alarm mode, sending an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task. The data management method provided by the invention can improve data management efficiency, prevent an avalanche effect in downstream jobs, and avoid large-scale data problems. The invention also provides a data management device, a storage medium and a system.
Description
Technical field
The present invention relates to the field of data warehouse technology, and in particular to a data management method, device, storage medium and system.
Background
A data warehouse is an indispensable part of any company that holds massive amounts of data. It is built to mine data resources further and to support decision-making. This process necessarily involves a series of operations such as data acquisition, cleaning and integration, so data mining, and data-oriented work in general, cannot do without data management.
In the course of implementing the present invention, the inventors found the following defects in the prior art. When performing data management, warehouse developers often have to spend a great deal of effort watching other systems that check data quality, or developing additional data quality checking tasks. Moreover, this style of data management is in fact reactive: when data quality goes wrong, the problem cannot be perceived at the first moment, which easily causes an avalanche effect and large-scale data problems.
Summary of the invention
Based on this, embodiments of the present invention propose a data management method, device, storage medium and system that can improve data management efficiency, prevent an avalanche effect in downstream jobs, and avoid large-scale data problems.
The data management method provided by an embodiment of the present invention is applied to a metadata system, and the method comprises:
in response to an offline data task of a data warehouse outputting a result table, executing a data quality verification task on the result table to obtain a verification result;
judging whether the verification result is a verification failure;
in response to the verification result being a verification failure, determining the task node at which verification failed, and judging whether that task node is configured with a first alarm mode, where the first alarm mode indicates whether the task node is a key task node;
in response to determining that the task node at which verification failed is configured with the first alarm mode, sending an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task.
In an optional embodiment, executing the data quality verification task on the result table to obtain the verification result, in response to the offline data task of the data warehouse outputting the result table, comprises:
in response to the offline data task of the data warehouse outputting the result table, determining, based on metadata lineage obtained in advance from the offline data task, at least one preconfigured data quality verification rule corresponding to the result table; and
performing data quality verification on the result table according to the at least one data quality verification rule to obtain the verification result;
wherein the metadata lineage includes the lineage between tables and tasks and the lineage between tables.
In an optional embodiment, the at least one data quality verification rule includes general rules, custom table-level rules and custom field rules;
in that case, determining the at least one preconfigured data quality verification rule corresponding to the result table, based on the metadata lineage obtained in advance from the offline data task and in response to the offline data task of the data warehouse outputting the result table, comprises:
in response to the offline data task of the data warehouse outputting the result table, querying, based on the metadata lineage obtained in advance from the offline data task, the preconfigured custom table-level rules and custom field rules corresponding to the result table, where the custom table-level rules include non-SQL custom rules and SQL custom rules;
putting the general rules into a first queue;
putting the non-SQL custom rules into a second queue;
putting the SQL custom rules into a third queue;
putting the custom field rules into a fourth queue;
wherein the first queue, the second queue and the third queue can be executed concurrently.
In an optional embodiment, performing data quality verification on the result table according to the at least one data quality verification rule to obtain the verification result comprises:
taking rules in turn from each of the four queues corresponding to the non-SQL custom rules, the SQL custom rules, the general rules and the custom field rules, and verifying the result table against them to obtain the verification result.
In an optional embodiment, the method further comprises:
in response to the verification result being a verification failure, determining an alarm target according to the preconfigured alarm channel and the asset level of the table, and alerting the alarm target.
In an optional embodiment, the method further comprises:
collecting information about the result table while the data quality verification task is executed on the result table;
locating the offline data task to the result table through the metadata lineage, to determine the cluster resources consumed by the offline data task that produces the result table;
displaying the information about the result table, the cluster resources and the verification result with a visualization tool.
In an optional embodiment, the method further comprises:
checking data standard rules through a scheduled asynchronous task to obtain an inspection result;
sending a notification to the person responsible for the corresponding business domain according to the inspection result;
wherein checking the data standard rules through the scheduled asynchronous task comprises:
checking whether data quality verification rules are configured for key fields;
checking the naming rules and partition rules of tables.
An embodiment of the present invention also provides a data management device applied to a metadata system. The device includes a data quality verification module, a verification result judgment module, an alarm mode judgment module and a task interruption module.
The data quality verification module is configured to, in response to an offline data task of a data warehouse outputting a result table, execute a data quality verification task on the result table to obtain a verification result;
the verification result judgment module is configured to judge whether the verification result is a verification failure;
the alarm mode judgment module is configured to, in response to the verification result being a verification failure, determine the task node at which verification failed and judge whether that task node is configured with a first alarm mode, where the first alarm mode indicates whether the task node is a key task node;
the task interruption module is configured to, in response to determining that the task node at which verification failed is configured with the first alarm mode, send an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task.
As an improvement of the above scheme,
Another embodiment of the present invention correspondingly provides a storage medium that includes a stored computer program, wherein, when the computer program runs, it controls the device in which the storage medium is located to implement the data management method of any of the above embodiments.
Another embodiment of the present invention correspondingly provides a data management system comprising one or more processors, a memory, and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the computer programs include instructions for executing the data management method of any of the above embodiments.
Compared with the prior art, the present invention has the following notable beneficial effects. Embodiments of the invention provide a data management method, device, storage medium and system in which the method is applied to a metadata system. By performing data quality verification on the result table output by an offline data task of the data warehouse, the metadata system surfaces the data quality problems that developers care about during data warehouse production. Developers no longer need to watch for data quality problems themselves; the metadata system drives the process actively. This greatly improves the efficiency of data warehouse development and avoids the situation in which developers spend a great deal of effort on data management, work inefficiently, and still let data problems through. Combining the data verification process with the metadata system lets developers perceive data problems more comprehensively and automatically. The method also determines whether a task node is a key task node by judging whether the task node at which verification failed is configured with the first alarm mode, and sends an interrupt signal to the data warehouse when the task node is determined to be a key node, so that the data warehouse in turn interrupts execution of the offline data task. This prevents an avalanche effect in downstream jobs and avoids large-scale data problems.
Brief description of the drawings
Fig. 1 is a flow diagram of a data management method according to an embodiment of the present invention;
Fig. 2 is a flow diagram of step S110 according to an embodiment of the present invention;
Fig. 3 is a flow diagram of a data management method according to another embodiment of the present invention;
Fig. 4 is a flow diagram of a data management method according to another embodiment of the present invention;
Fig. 5 is a flow diagram of a data management method according to another embodiment of the present invention;
Fig. 6 is a structural diagram of a data management device according to an embodiment of the present invention;
Fig. 7 is a structural diagram of a data management system according to an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, which is a flow diagram of a data management method according to an embodiment of the present invention, the data management method is applied to a metadata system.
Metadata (Meta Data) is data that describes data, and usually consists of descriptions of information structure. As technology has developed, the scope of metadata has expanded considerably to include, for example, UML models, data transaction rules, APIs written in Java, .NET, C++ and the like, business process and workflow models, product configuration descriptions and tuning parameters, and various business rules, terms and definitions. Metadata should also cover descriptions of new data types, such as location, name, user click frequency, audio, video, pictures, data from wireless sensing devices, and data from monitoring devices. Metadata is generally divided into business metadata, technical metadata and operational metadata. Business metadata mainly includes business rules, definitions, terms, glossaries, algorithms and the business language used by systems; its main users are business users. Technical metadata mainly defines the metadata structure of the components of the information supply chain (Information Supply Chain, ISC), including the structure, attributes, sources and dependencies of objects such as the tables and fields of each system, as well as stored procedures, functions and sequences. Operational metadata refers to information about application runs, such as run frequency, record counts, and analysis of individual components and other statistics.
A metadata system, also called a metadata management system, is a platform that manages metadata.
The data management method provided in this embodiment includes:
Step S110, in response to an offline data task of a data warehouse outputting a result table, executing a data quality verification task on the result table to obtain a verification result;
Step S120, judging whether the verification result is a verification failure;
Step S130, in response to the verification result being a verification failure, determining the task node at which verification failed, and judging whether that task node is configured with a first alarm mode, where the first alarm mode indicates whether the task node is a key task node;
Step S140, in response to determining that the task node at which verification failed is configured with the first alarm mode, sending an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task.
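As an illustration only, steps S110 to S140 above could be arranged as the following Python sketch. The names `TaskNode`, `handle_result_table`, `verify` and `send_interrupt` are hypothetical and do not come from the source; the verification itself and the interrupt transport are passed in as callables because the source does not specify them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TaskNode:
    name: str
    first_alarm_configured: bool  # assumption: True marks a key task node


def handle_result_table(table: str, node: TaskNode,
                        verify: Callable[[str], bool],
                        send_interrupt: Callable[[str], None]) -> str:
    """Walk through S110-S140 for one result table and report the outcome."""
    passed = verify(table)               # S110: run the data quality verification task
    if passed:                           # S120: was the result a verification failure?
        return "passed"
    if node.first_alarm_configured:      # S130: is the failing node a key task node?
        send_interrupt(node.name)        # S140: ask the warehouse to stop the task
        return "interrupted"
    return "failed"
```

In this sketch a failure on a node without the first alarm mode is only reported back to the caller; the offline data task is left running.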
In day-to-day data warehouse production, many offline data tasks output tables; in the present invention, a table output by an offline data task is called a result table. After the data warehouse finishes executing a data task, it calls an interface of the metadata system over HTTP to trigger the data quality verification task associated with the result table output by that offline data task.
Optionally, the first alarm mode may be a telephone alarm or a similar mode.
Specifically, the method further includes: in response to determining that the task node at which verification failed is configured with the first alarm mode, raising a telephone alarm to the personnel responsible for the offline data task. The telephone alarm can explain that leaving the problem unhandled may cause downstream data failures. It lets the responsible personnel handle the problem promptly and resume the data task once the problem is resolved, so that the metadata system and the data warehouse form a closed loop and the whole pipeline runs smoothly.
Further, the method also includes: in response to determining that the task node at which verification failed is not configured with the first alarm mode, treating the task node as a non-key task node. In this case the related personnel can still be notified by an alarm so that developers can follow up later. Handling failures differently according to the importance of the task node improves flexibility and preserves the processing efficiency of offline data tasks.
The data management method provided in this embodiment is applied to a metadata system. By performing data quality verification on the result table output by an offline data task of the data warehouse, the metadata system surfaces the data quality problems that developers care about during data warehouse production. Developers no longer need to watch for data quality problems themselves; the metadata system drives the process actively. This greatly improves the efficiency of data warehouse development and avoids the situation in which developers spend a great deal of effort on data management, work inefficiently, and still let data problems through. Combining the data verification process with the metadata system lets developers perceive data problems more comprehensively and automatically. The method also determines whether a task node is a key task node by judging whether the task node at which verification failed is configured with the first alarm mode, and sends an interrupt signal to the data warehouse when the task node is determined to be a key node, so that the data warehouse in turn interrupts execution of the offline data task. This prevents an avalanche effect in downstream jobs and avoids large-scale data problems.
Referring to Fig. 2, which is a flow diagram of step S110 according to an embodiment of the present invention. Unlike the above embodiment, which provides steps S110 to S140 as a whole, in this embodiment step S110 includes:
Step S1101, in response to the offline data task of the data warehouse outputting the result table, determining, based on metadata lineage obtained in advance from the offline data task, at least one preconfigured data quality verification rule corresponding to the result table;
Step S1102, performing data quality verification on the result table according to the at least one data quality verification rule to obtain the verification result.
The metadata lineage includes the lineage between tables and tasks and the lineage between tables.
Data is generated, processed and merged, circulated, and finally retired, and relationships form naturally between data along the way; lineage expresses these relationships. Databases, tables and fields are the storage structures of data. Different types of data have different storage structures, and the storage structure determines the hierarchy of the lineage.
Relying on metadata lineage guarantees the correspondence between data quality verification rules and result tables, which makes it easy to configure different data quality verification rules for different result tables. Verifying a result table against at least one data quality verification rule helps uncover data quality problems more completely.
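A minimal sketch of how lineage could tie a finished task to the rules of its result tables is given below. The dictionaries `TABLE_TO_TASK`, `TABLE_TO_UPSTREAM` and `RULES_BY_TABLE`, the table and task names, and both functions are hypothetical; the source does not describe how lineage or rules are stored.

```python
from typing import Dict, List

# Hypothetical stores; the source does not describe how lineage or rules are persisted.
TABLE_TO_TASK: Dict[str, str] = {"dw.orders_daily": "load_orders"}          # table-task lineage
TABLE_TO_UPSTREAM: Dict[str, List[str]] = {"dw.orders_daily": ["ods.orders"]}  # table-table lineage
RULES_BY_TABLE: Dict[str, List[str]] = {
    "dw.orders_daily": ["row_count_not_zero", "order_id_unique"],
}


def rules_for_task_output(task: str) -> Dict[str, List[str]]:
    """Map every table produced by `task` to its preconfigured verification rules."""
    produced = [t for t, producer in TABLE_TO_TASK.items() if producer == task]
    return {t: RULES_BY_TABLE.get(t, []) for t in produced}


def upstream_tables(table: str) -> List[str]:
    """Return the tables that `table` is derived from, per table-table lineage."""
    return TABLE_TO_UPSTREAM.get(table, [])
```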
Optionally, the at least one data quality verification rule includes general rules, custom table-level rules and custom field rules;
in that case step S1101 includes:
in response to the offline data task of the data warehouse outputting the result table, querying, based on the metadata lineage obtained in advance from the offline data task, the preconfigured custom table-level rules and custom field rules corresponding to the result table, where the custom table-level rules include non-SQL (Structured Query Language) custom rules and SQL custom rules;
putting the general rules into a first queue;
putting the non-SQL custom rules into a second queue;
putting the SQL custom rules into a third queue;
putting the custom field rules into a fourth queue;
wherein the first queue, the second queue and the third queue can be executed concurrently.
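The routing of rules into the four queues could look like the following sketch. The queue names, the `enqueue_rules` function and the `"type"` key on table-level rules are assumptions made for illustration.

```python
from queue import Queue
from typing import Dict, Iterable


def enqueue_rules(general: Iterable[dict],
                  table_rules: Iterable[dict],
                  field_rules: Iterable[dict]) -> Dict[str, Queue]:
    """Place each verification rule in the queue that matches its kind."""
    queues = {name: Queue() for name in ("general", "non_sql", "sql", "field")}
    for rule in general:
        queues["general"].put(rule)                  # first queue
    for rule in table_rules:
        # Assumption: a table-level rule carries a "type" key of "sql" or "non_sql".
        kind = "sql" if rule["type"] == "sql" else "non_sql"
        queues[kind].put(rule)                       # third or second queue
    for rule in field_rules:
        queues["field"].put(rule)                    # fourth queue
    return queues
```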
It should be noted that custom table-level rules and custom field rules are both custom verification rules. Unlike general rules, custom verification rules can be configured to suit the needs of developers. In other embodiments, the custom table-level rules may include only one of the non-SQL custom rules and the SQL custom rules.
Further, performing data quality verification on the result table according to the at least one data quality verification rule to obtain the verification result includes:
taking rules in turn from each of the four queues corresponding to the non-SQL custom rules, the SQL custom rules, the general rules and the custom field rules, and verifying the result table against them to obtain the verification result.
In order not to affect the progress of the offline data task, the tasks in the queues are consumed in turn using the well-known asynchronous producer-consumer pattern. Specifically, each time a data quality verification rule is put into a queue, the execute() method of Java's ExecutorService is called. The execute() method registers a task with the task factory and takes a worker argument that defines how consumption is done; the worker that receives the task then takes a verification rule from the queue and executes the verification.
When a worker obtains the right to consume, it takes a rule for verification from the four queues in turn: non-SQL custom rules, SQL custom rules, general rules and custom field rules. This ensures that the number of workers matches the number of verification tasks, so that every verification task is eventually consumed.
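The producer-consumer arrangement described above can be sketched in Python with `concurrent.futures.ThreadPoolExecutor` standing in for Java's ExecutorService. This is an analogue under stated assumptions, not the implementation from the source: `run_all`, `ORDER`, the pool size and the `check` callable are hypothetical. One worker is submitted for each rule that is enqueued, and each worker scans the queues in the stated order and consumes at most one rule.

```python
from concurrent.futures import ThreadPoolExecutor
from queue import Empty, Queue
from typing import Callable, Dict, List, Tuple

# Consumption order stated in the text: non-SQL, SQL, general, custom field.
ORDER = ("non_sql", "sql", "general", "field")


def run_all(rules_by_kind: Dict[str, List[str]],
            check: Callable[[str], bool]) -> List[Tuple[str, bool]]:
    """Enqueue every rule and submit one worker per rule to consume it."""
    queues = {kind: Queue() for kind in ORDER}
    results: List[Tuple[str, bool]] = []

    def worker() -> None:
        for kind in ORDER:                    # scan the queues in the fixed order
            try:
                rule = queues[kind].get_nowait()
            except Empty:
                continue
            results.append((rule, check(rule)))   # list.append is atomic in CPython
            return

    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = []
        for kind, rules in rules_by_kind.items():
            for rule in rules:
                queues[kind].put(rule)        # producer side: enqueue first...
                futures.append(pool.submit(worker))  # ...then register one consumer
        for future in futures:
            future.result()                   # surface any worker exception
    return results
```

Because a worker is only submitted after its rule has been enqueued, the number of workers equals the number of rules and no rule is left unconsumed, which mirrors the guarantee described in the text.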
When the result table is verified against the general rules, the metadata system accesses the HDFS file system by database name and table name and reads metadata about the result table, such as its size and row count. It then performs full verification and incremental verification separately; for each, the data is compared with earlier data using a certain algorithm, giving a comprehensive picture of the table's verification status.
Custom table-level verification rules are divided into SQL and non-SQL types to meet the different needs of data warehouse staff. Data warehouse developers can enter custom table-level rules in advance in the table's quality verification module in the metadata system. The metadata system offers configurable comparison methods, comparison periods, comparison content and comparison ranges. When these options do not meet a requirement, developers can use custom SQL to fetch exactly the result they want and compare its content. The metadata system automatically selects the Presto or Druid query engine according to the configured comparison period.
For custom field verification rules, data warehouse developers can likewise enter rules in advance in the metadata system; for example, enumeration, uniqueness and non-null rules can be configured. When the verification task runs, the query SQL is generated automatically from the configuration and run in Presto, and the rules are checked once the query result is returned.
Referring to Fig. 3, which is a flow diagram of a data management method according to another embodiment of the present invention. Unlike the above embodiment, which provides steps S110 to S140, the data management method of this embodiment further includes:
Step S150, in response to the verification result being a verification failure, determining an alarm target according to the preconfigured alarm channel and the asset level of the table, and alerting the alarm target.
It should be noted that this embodiment is not executed strictly in the order of steps S110 to S140; for example, step S140 and step S150 can be executed in parallel.
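How the alarm target in step S150 is derived is not detailed in the source. The following sketch assumes, purely for illustration, that each table records its owners, an asset level and optionally its own alarm channels, and that an asset level maps to default channels. All names and level values are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from a table's asset level to default alarm channels.
CHANNELS_BY_ASSET_LEVEL: Dict[str, List[str]] = {
    "core": ["phone", "mail"],
    "normal": ["mail"],
}


def alarm_targets(table_meta: dict) -> List[Tuple[str, str]]:
    """Return (recipient, channel) pairs for a table whose verification failed."""
    # Channels configured on the table take precedence over the asset-level default.
    channels = table_meta.get("alarm_channels") or CHANNELS_BY_ASSET_LEVEL.get(
        table_meta.get("asset_level", "normal"), ["mail"])
    return [(owner, channel)
            for owner in table_meta.get("owners", [])
            for channel in channels]
```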
Specifically, the verification results include general rule verification results, custom table-level rule verification results and custom field rule verification results. For general rule verification, a success is recorded directly in the general rule verification results; on failure, the failure details are automatically e-mailed to the alarm recipients associated with the table and then recorded in the general rule verification results. The results are stored in the metadata system so that the relevant personnel can obtain the verification status of related tasks and handle the data quality problems behind failures. For custom field rule verification and custom table-level rule verification, if verification fails, a verification problem report is generated from the verification process and assigned to the relevant handlers. For failures, the verification problem report can also be entered into the Jira system (a project and issue tracking tool), which developers already watch every day, to handle data quality problems. Because the metadata system automates this flow, developers ultimately only need to watch Jira and handle the problem tickets.
Referring to Fig. 4, which is a flow diagram of a data management method according to another embodiment of the present invention. Unlike the above embodiment, which provides steps S110 to S140, the data management method of this embodiment further includes:
Step S160, collecting information about the result table while the data quality verification task is executed on the result table;
Step S170, locating the offline data task to the result table through the metadata lineage, to determine the cluster resources consumed by the offline data task that produces the result table;
Step S180, displaying the information about the result table, the cluster resources and the verification result with a visualization tool.
It should be noted that this embodiment is not executed strictly in the order of steps S110 to S140; for example, step S160 and step S110 can be executed in parallel.
Specifically, a data quality dashboard can be established, and the visualization tool is presented in the dashboard. The visualization tool may include charts that show data trends. During the verification described above, the metadata system collects process metadata such as the size and row count of partitions, the size and row count of the full table, and the distribution of field enumeration values, and archives this data in different MySQL tables. Using a visualization tool shows the result table's information to users more intuitively, helps uncover hidden data problems, and makes it easier for data warehouse developers to notice and investigate problems.
While offline data tasks and data quality verification tasks run, the Hive or Spark engine on which a data task depends also records process metadata of the data task, such as throughput, elapsed time and GC time. This information is pushed to the MySQL database of the metadata system. A scheduled offline task then combines it with the metadata lineage to locate each task to its table, yielding the cluster resources spent by the data task that produces each table. Optionally, a certain algorithm can convert the cluster resources into a monetary amount as a cost result, which is then shown with the visualization tool. Data warehouse developers can thus follow changes in a table's computing cost while watching data trends, and see more directly the cluster resources a task consumes, which encourages them to optimize the logic of the related data tasks and reduce resource consumption.
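The conversion from cluster resources to a cost is left unspecified in the source. The sketch below assumes a single `cpu_seconds` metric per task and a flat price per CPU second, and uses a task-to-table mapping as a stand-in for the metadata lineage; all of these names and the pricing model are hypothetical.

```python
from typing import Dict


def cost_per_table(task_metrics: Dict[str, dict],
                   task_to_table: Dict[str, str],
                   price_per_cpu_second: float) -> Dict[str, float]:
    """Attribute each task's resource usage to the table it produces, as a cost."""
    costs: Dict[str, float] = {}
    for task, metrics in task_metrics.items():
        table = task_to_table.get(task)       # lineage lookup: task -> result table
        if table is None:
            continue                          # no lineage recorded for this task
        costs[table] = costs.get(table, 0.0) + metrics["cpu_seconds"] * price_per_cpu_second
    return costs
```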
It is the flow diagram of the data managing method of another embodiment provided by the invention referring to Fig. 5.With it is above-mentioned
Unlike step S110~step S140 that embodiment provides, in the present embodiment, the data managing method further include:
Step S190, data standard rule is checked by timing asynchronous task, obtains inspection result;
Step S200, notice is sent to corresponding business domains responsible person according to the inspection result.
Specifically, step S190 includes:
checking whether data quality verification rules are configured for key fields;
checking the naming rules and partition rules of tables.
It should be noted that step S190 may also include checks of other standard rules, which can be configured according to actual needs and are not listed here.
This embodiment automates the inspection of data standards and prevents non-standard data from causing an avalanche effect.
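Illustratively, the checks of step S190 and the notification of step S200 can be sketched as follows. The naming pattern, the required partition key `dt`, the table description format, and the `notify` callback are all assumptions made for this sketch; the embodiment does not specify them.

```python
import re

# Assumed conventions for illustration only.
TABLE_NAME_PATTERN = re.compile(r"^(ods|dwd|dws|ads)_[a-z0-9_]+$")
REQUIRED_PARTITION_KEY = "dt"


def check_table_standards(table):
    """Return the list of standard violations found for one table description."""
    problems = []
    if not TABLE_NAME_PATTERN.match(table["name"]):
        problems.append("naming rule violated")
    if REQUIRED_PARTITION_KEY not in table.get("partition_keys", []):
        problems.append("partition rule violated")
    # Key fields that have no data quality verification rule configured.
    for field in table.get("key_fields", []):
        if not table.get("field_rules", {}).get(field):
            problems.append(f"key field '{field}' has no quality rule")
    return problems


def run_scheduled_check(tables, notify):
    """Body of the timed asynchronous task: check each table, notify its owner."""
    for table in tables:
        problems = check_table_standards(table)
        if problems:
            notify(table["owner"], table["name"], problems)


sent = []
tables = [
    {"name": "dwd_orders", "partition_keys": ["dt"], "key_fields": ["order_id"],
     "field_rules": {"order_id": ["not_null"]}, "owner": "alice"},
    {"name": "TmpOrders", "partition_keys": [], "key_fields": ["order_id"],
     "field_rules": {}, "owner": "bob"},
]
run_scheduled_check(tables, lambda owner, name, problems: sent.append((owner, name, problems)))
```

Only the second table produces a notification, because it breaks the assumed naming pattern, has no partition key, and leaves its key field without a rule.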
Referring to Fig. 6, which is a structural schematic diagram of the data management apparatus of an embodiment provided by the invention. The data management apparatus 1 provided by this embodiment includes: a data quality verification module 210, a verification result judgment module 220, an alarm mode judgment module 230, and a task interruption module 240.
The data quality verification module 210 is configured to, in response to an offline data task of the data warehouse outputting a result table, execute a data quality verification task on the result table to obtain a verification result.
The verification result judgment module 220 is configured to judge whether the verification result is a verification failure.
The alarm mode judgment module 230 is configured to, in response to the verification result being a verification failure, determine the task node at which verification failed and judge whether that task node is configured with a first alarm mode. The first alarm mode indicates whether the task node is a key task node.
The task interruption module 240 is configured to, in response to determining that the task node at which verification failed is configured with the first alarm mode, send an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task.
It should be noted that, in this embodiment, the data quality verification module 210 can be used to execute step S110 of the embodiment above, the verification result judgment module 220 can be used to execute step S120, the alarm mode judgment module 230 can be used to execute step S130, and the task interruption module 240 can be used to execute step S140.
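Illustratively, the cooperation of the four modules can be sketched as a single function. The names `CheckResult`, `manage`, `first_alarm_nodes`, and the callbacks are hypothetical and serve only to show the control flow of steps S110 to S140.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set


@dataclass
class CheckResult:
    passed: bool
    failed_node: Optional[str] = None  # task node at which verification failed


def manage(result_table: str,
           verify: Callable[[str], CheckResult],
           first_alarm_nodes: Set[str],
           send_interrupt: Callable[[str], None]) -> bool:
    """Return True if an interrupt signal was sent to the data warehouse."""
    result = verify(result_table)                    # module 210, step S110
    if result.passed:                                # module 220, step S120
        return False
    if result.failed_node in first_alarm_nodes:      # module 230, step S130
        send_interrupt(result.failed_node)           # module 240, step S140
        return True
    return False  # failure on a non-key node: the offline task keeps running


interrupted = []
sent = manage(
    "dw.orders",
    verify=lambda table: CheckResult(passed=False, failed_node="load_orders"),
    first_alarm_nodes={"load_orders"},
    send_interrupt=interrupted.append,
)
```

A failure on a node that is not configured with the first alarm mode returns `False` here, so downstream tasks are not interrupted.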
Optionally, the data quality verification module 210 includes a quality verification rule determination unit and a verification task execution unit.
The quality verification rule determination unit is configured to, in response to the offline data task of the data warehouse outputting the result table, determine, based on the metadata lineage obtained in advance from the offline data task, at least one preconfigured data quality verification rule corresponding to the result table.
The verification task execution unit is configured to perform data quality verification on the result table according to the at least one data quality verification rule to obtain the verification result.
The metadata lineage includes the lineage between tables and tasks and the lineage between tables.
Optionally, the at least one data quality verification rule includes: general rules, custom table-level rules, and custom field rules.
The quality verification rule determination unit then includes:
a query subunit, configured to, in response to the offline data task of the data warehouse outputting the result table, query, based on the metadata lineage obtained in advance from the offline data task, the preconfigured custom table-level rules and custom field rules corresponding to the result table, where the custom table-level rules include non-SQL custom rules and SQL custom rules;
a first subunit, configured to put the general rules into a first queue;
a second subunit, configured to put the non-SQL custom rules into a second queue;
a third subunit, configured to put the SQL custom rules into a third queue;
a fourth subunit, configured to put the custom field rules into a fourth queue.
The first queue, the second queue, and the third queue can be executed concurrently.
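Illustratively, the placement of rules into the four queues can be sketched as follows. The rule dictionaries, the `kind` key, and the rule names are hypothetical. Python's standard `queue.Queue` is used because it is thread-safe, which suits queues that may be consumed concurrently.

```python
from queue import Queue


def dispatch_rules(general_rules, table_rules, field_rules):
    """Place each rule into one of four queues according to its category."""
    queues = {
        "general": Queue(),   # first queue
        "non_sql": Queue(),   # second queue
        "sql": Queue(),       # third queue
        "field": Queue(),     # fourth queue
    }
    for rule in general_rules:
        queues["general"].put(rule)
    for rule in table_rules:
        # Custom table-level rules are split by whether they are SQL-type.
        queues["sql" if rule["kind"] == "sql" else "non_sql"].put(rule)
    for rule in field_rules:
        queues["field"].put(rule)
    return queues


queues = dispatch_rules(
    general_rules=[{"name": "table_not_empty"}],
    table_rules=[
        {"name": "row_count_vs_yesterday", "kind": "non_sql"},
        {"name": "orders_match_payments", "kind": "sql"},
    ],
    field_rules=[{"name": "order_id_not_null"}],
)
sizes = {name: q.qsize() for name, q in queues.items()}
```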
Optionally, the verification task execution unit includes:
a rule consumption subunit, configured to take rules in turn from each of the four queues corresponding to the non-SQL custom rules, the SQL custom rules, the general rules, and the custom field rules, and to verify the result table with the rules taken out to obtain the verification result.
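Illustratively, rule consumption can be sketched as draining the queues one after another in the order listed above. This sketch is sequential for simplicity and uses `collections.deque`; the rule format (a `name` and a `check` callable) and the table representation are assumptions.

```python
from collections import deque

# Order in which the queues are drained, following the text above.
CONSUME_ORDER = ("non_sql", "sql", "general", "field")


def consume_rules(queues, result_table, order=CONSUME_ORDER):
    """Drain each queue in turn and return (passed, names_of_failed_rules)."""
    failures = []
    for name in order:
        queue = queues.get(name, deque())
        while queue:
            rule = queue.popleft()
            if not rule["check"](result_table):
                failures.append(rule["name"])
    return (not failures, failures)


table = {"row_count": 3, "rows": [{"id": 1}, {"id": None}, {"id": 3}]}
queues = {
    "general": deque([
        {"name": "non_empty", "check": lambda t: t["row_count"] > 0},
    ]),
    "field": deque([
        {"name": "id_not_null",
         "check": lambda t: all(r["id"] is not None for r in t["rows"])},
    ]),
}
passed, failures = consume_rules(queues, table)
```

In this example the table passes the general rule but fails the field rule because one `id` is missing, so `passed` is `False`.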
Optionally, the apparatus further includes:
an alarm unit, configured to, in response to the verification result being a verification failure, determine an alarm object according to the preconfigured alarm channel and the asset level of the table, and to send an alarm to the alarm object.
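Illustratively, determining the alarm object from the alarm channel and the asset level can be sketched as a lookup. The asset levels `P0` to `P2`, the role names, and the escalation table are hypothetical; the embodiment only states that both inputs are preconfigured.

```python
# Hypothetical escalation table: asset level -> roles that are alerted.
ESCALATION = {
    "P0": ["table_owner", "team_lead", "on_call"],
    "P1": ["table_owner", "team_lead"],
    "P2": ["table_owner"],
}


def alarm_targets(table, channels):
    """Return (channel, recipient) pairs for a table whose verification failed."""
    roles = ESCALATION.get(table["asset_level"], ["table_owner"])
    recipients = [table["contacts"][r] for r in roles if r in table["contacts"]]
    return [(channel, person) for channel in channels for person in recipients]


table = {
    "asset_level": "P1",
    "contacts": {"table_owner": "alice", "team_lead": "bob"},
}
targets = alarm_targets(table, ["email", "sms"])
```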
Optionally, the apparatus further includes:
a collection unit, configured to collect information of the result table while the data quality verification task is executed on the result table;
a cluster resource determination unit, configured to map the offline data task to the result table through the metadata lineage, so as to determine the cluster resources consumed by the offline data task that produces the result table;
a visualization unit, configured to display the information of the result table, the cluster resources, and the verification result using a visualization tool.
Optionally, the apparatus further includes:
a rule detection module, configured to check data standard rules through a timed asynchronous task to obtain an inspection result;
a notification module, configured to send a notification to the responsible person of the corresponding business domain according to the inspection result.
Specifically, the rule detection module includes:
a verification rule detection subunit, configured to check whether data quality verification rules are configured for key fields;
a standard rule checking subunit, configured to check the naming rules and partition rules of tables.
As an improvement of the above scheme, the present invention correspondingly provides a preferred embodiment of a system. Referring to Fig. 7, which is a structural schematic diagram of the data management system of an embodiment provided by the invention, the system includes one or more processors 301, a memory 302, and one or more computer programs 303. The one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the computer programs include instructions for executing the data management method described in any of the embodiments above.
Illustratively, the computer program may be divided into one or more modules, which are stored in the memory and executed by the processor to carry out the present invention. The one or more modules may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments describe the execution of the computer program in the system. For example, the computer program may be divided into: a data quality verification module 210, configured to, in response to an offline data task of the data warehouse outputting a result table, execute a data quality verification task on the result table to obtain a verification result; a verification result judgment module 220, configured to judge whether the verification result is a verification failure; an alarm mode judgment module 230, configured to, in response to the verification result being a verification failure, determine the task node at which verification failed and judge whether that task node is configured with a first alarm mode, the first alarm mode indicating whether the task node is a key task node; and a task interruption module 240, configured to, in response to determining that the task node at which verification failed is configured with the first alarm mode, send an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task.
The system may be a computing device such as a desktop computer, a notebook, or a cloud system. The system may include, but is not limited to, a processor and a memory. Those skilled in the art will understand that the schematic diagram is merely an example of the system and does not limit it; the system may include more or fewer components than shown, may combine certain components, or may use different components. For example, the system may also include input/output devices, network access devices, buses, and the like.
The processor may be a central processing unit (Central Processing Unit, CPU), or it may be another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor. The processor is the control center of the system and connects the various parts of the whole system through various interfaces and lines.
The memory may be used to store the computer program and/or modules. The processor realizes the various functions of the system by running or executing the computer program and/or modules stored in the memory and by calling the data stored in the memory. The memory may mainly include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function). The data storage area may store data created according to the use of the mobile phone (such as audio data or a phone book). In addition, the memory may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, internal memory, a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash card (Flash Card), at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
If the modules integrated in the system are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes of the method embodiments above may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a storage medium, and when the program runs it controls the device in which the storage medium is located to implement the data management method described in any of the embodiments above. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), an electric carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content included in the computer-readable medium may be increased or decreased as appropriate according to the requirements of legislation and patent practice in a jurisdiction. For example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electric carrier signals and telecommunication signals.
It should be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. In addition, in the drawings of the apparatus embodiments provided by the invention, a connection between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines. Those of ordinary skill in the art can understand and implement this without creative effort.
The above is a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make improvements and modifications without departing from the principle of the present invention, and these improvements and modifications are also regarded as falling within the protection scope of the present invention.
Claims (10)
1. A data management method, characterized in that the method is applied to a metadata system and comprises:
in response to an offline data task of a data warehouse outputting a result table, executing a data quality verification task on the result table to obtain a verification result;
judging whether the verification result is a verification failure;
in response to the verification result being a verification failure, determining the task node at which verification failed, and judging whether that task node is configured with a first alarm mode, the first alarm mode indicating whether the task node is a key task node; and
in response to determining that the task node at which verification failed is configured with the first alarm mode, sending an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task.
2. The data management method of claim 1, characterized in that executing, in response to the offline data task of the data warehouse outputting the result table, the data quality verification task on the result table to obtain the verification result comprises:
in response to the offline data task of the data warehouse outputting the result table, determining, based on metadata lineage obtained in advance from the offline data task, at least one preconfigured data quality verification rule corresponding to the result table; and
performing data quality verification on the result table according to the at least one data quality verification rule to obtain the verification result;
wherein the metadata lineage includes the lineage between tables and tasks and the lineage between tables.
3. The data management method of claim 2, characterized in that the at least one data quality verification rule includes: general rules, custom table-level rules, and custom field rules;
and determining, in response to the offline data task of the data warehouse outputting the result table and based on the metadata lineage obtained in advance from the offline data task, the at least one preconfigured data quality verification rule corresponding to the result table comprises:
in response to the offline data task of the data warehouse outputting the result table, querying, based on the metadata lineage obtained in advance from the offline data task, the preconfigured custom table-level rules and custom field rules corresponding to the result table, the custom table-level rules including non-SQL custom rules and SQL custom rules;
putting the general rules into a first queue;
putting the non-SQL custom rules into a second queue;
putting the SQL custom rules into a third queue;
putting the custom field rules into a fourth queue;
wherein the first queue, the second queue, and the third queue can be executed concurrently.
4. The data management method of claim 3, characterized in that performing data quality verification on the result table according to the at least one data quality verification rule to obtain the verification result comprises:
taking rules in turn from each of the four queues corresponding to the non-SQL custom rules, the SQL custom rules, the general rules, and the custom field rules, and verifying the result table with the rules taken out to obtain the verification result.
5. The data management method of any one of claims 1-4, characterized in that the method further comprises:
in response to the verification result being a verification failure, determining an alarm object according to a preconfigured alarm channel and the asset level of the table, and sending an alarm to the alarm object.
6. The data management method of any one of claims 1-4, characterized in that the method further comprises:
collecting information of the result table while the data quality verification task is executed on the result table;
mapping the offline data task to the result table through the metadata lineage, so as to determine the cluster resources consumed by the offline data task that produces the result table;
displaying the information of the result table, the cluster resources, and the verification result using a visualization tool.
7. The data management method of any one of claims 1-4, characterized in that the method further comprises:
checking data standard rules through a timed asynchronous task to obtain an inspection result;
sending a notification to the responsible person of the corresponding business domain according to the inspection result;
wherein checking the data standard rules through the timed asynchronous task comprises:
checking whether data quality verification rules are configured for key fields;
checking the naming rules and partition rules of tables.
8. A data management apparatus, characterized in that the data management apparatus is applied to a metadata system and comprises:
a data quality verification module, configured to, in response to an offline data task of a data warehouse outputting a result table, execute a data quality verification task on the result table to obtain a verification result;
a verification result judgment module, configured to judge whether the verification result is a verification failure;
an alarm mode judgment module, configured to, in response to the verification result being a verification failure, determine the task node at which verification failed, and judge whether that task node is configured with a first alarm mode, the first alarm mode indicating whether the task node is a key task node; and
a task interruption module, configured to, in response to determining that the task node at which verification failed is configured with the first alarm mode, send an interrupt signal to the data warehouse so that the data warehouse interrupts execution of the offline data task.
9. A storage medium, characterized in that the storage medium includes a stored computer program, wherein the computer program, when run, controls the device in which the storage medium is located to implement the data management method of any one of claims 1-7.
10. A data management system, comprising: one or more processors; a memory; and
one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the computer programs include instructions for executing the data management method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910744219.5A CN110457371A (en) | 2019-08-13 | 2019-08-13 | Data managing method, device, storage medium and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110457371A true CN110457371A (en) | 2019-11-15 |
Family
ID=68486238
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910744219.5A Pending CN110457371A (en) | 2019-08-13 | 2019-08-13 | Data managing method, device, storage medium and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110457371A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191115 |