Summary of the Invention
The invention provides a data cleaning method in which an indicator statistics maintenance task table and an indicator cleaning code template are set in advance. The method includes:
When a synchronization time is reached, configuring a data cleaning task according to the indicator statistics maintenance task table whose current state is effective and the indicator cleaning code template, where the indicator statistics maintenance task table contains the elements currently used for indicator cleaning and their corresponding data;
Performing a run test on the data cleaning task;
If the run test of the data cleaning task succeeds, performing scheduling configuration according to the indicator statistics maintenance task table and the indicator cleaning code template, and publishing the data cleaning task to the production environment so that the data warehouse performs data cleaning.
Preferably, performing the run test on the data cleaning task specifically includes:
Executing a trial-run process according to the data cleaning task, and determining whether the trial-run process succeeds;
If the trial-run process succeeds, verifying the result data obtained by the trial-run process;
If the data passes verification, confirming that the run test of the data cleaning task has succeeded;
If the trial-run process fails or the data does not pass verification, confirming that the run test of the data cleaning task has failed.
Preferably, executing the trial-run process according to the data cleaning task specifically includes:
Running the indicator cleaning code template;
Reading the indicator statistics maintenance task table according to the indicator cleaning code template, and parsing the data corresponding to each element in the indicator statistics maintenance task table;
Splicing an SQL statement from the parsing result and the indicator cleaning code template, and running the SQL statement.
Preferably, before the synchronization time is reached, the method further includes:
Obtaining the current state of each indicator statistics maintenance task table;
If there is an indicator statistics maintenance task table whose state is pending business approval, performing business approval on that table, and updating its state to pending technical review after the business approval passes;
If there is an indicator statistics maintenance task table whose state is pending technical review, performing technical review on that table, and updating its state to effective after the technical review passes.
Preferably, the method further includes:
If the business approval of an indicator statistics maintenance task table whose state is pending business approval does not pass, updating the state of that table to business approval pending modification, and, once the table in that state has been modified, updating its state back to pending business approval;
If the technical review of an indicator statistics maintenance task table whose state is pending technical review does not pass, updating the state of that table to technical review pending modification, and, once the table in that state has been modified, updating its state back to pending technical review.
Preferably, before obtaining the current state of each indicator statistics maintenance task table, the method further includes:
When a request to add a data cleaning task is received, generating a new indicator statistics maintenance task table from the data corresponding to each element carried in the request, and setting the state of the new table to pending business approval;
When a request to modify a data cleaning task is received, generating a new indicator statistics maintenance task table from the data corresponding to the elements to be modified carried in the request and from the original indicator statistics maintenance task table to which the request corresponds, and setting the state of the new table to pending business approval.
Preferably, the method further includes:
If an indicator statistics maintenance task table whose state is technical review pending modification or business approval pending modification is not modified within a preset time threshold, updating the state of that table to invalid.
Correspondingly, the application also proposes a data cleaning device in which an indicator statistics maintenance task table and an indicator cleaning code template are set in advance. The device further includes:
A configuration module, which, when a synchronization time is reached, configures a data cleaning task according to the indicator statistics maintenance task table whose current state is effective and the indicator cleaning code template, where the indicator statistics maintenance task table contains the elements currently used for indicator cleaning and their corresponding data;
A test module, which performs a run test on the data cleaning task;
A release module, which, when the run test of the data cleaning task succeeds, performs scheduling configuration according to the indicator statistics maintenance task table and the indicator cleaning code template, and publishes the data cleaning task to the production environment so that the data warehouse performs data cleaning.
Preferably, the test module specifically includes:
A trial-run submodule, which executes a trial-run process according to the data cleaning task and determines whether the trial-run process succeeds;
If the trial-run process succeeds, the trial-run submodule verifies the result data obtained by the trial-run process;
If the data passes verification, the trial-run submodule confirms that the run test of the data cleaning task has succeeded;
If the trial-run process fails or the data does not pass verification, the trial-run submodule confirms that the run test of the data cleaning task has failed.
Preferably, the trial-run submodule executes the trial-run process according to the data cleaning task specifically by:
Running the indicator cleaning code template;
Reading the indicator statistics maintenance task table according to the indicator cleaning code template, and parsing the data corresponding to each element in the indicator statistics maintenance task table;
Splicing an SQL statement from the parsing result and the indicator cleaning code template, and running the SQL statement.
Preferably, the device further includes:
An acquisition module, which obtains the current state of each indicator statistics maintenance task table;
If there is an indicator statistics maintenance task table whose state is pending business approval, the acquisition module performs business approval on that table and updates its state to pending technical review after the business approval passes;
If there is an indicator statistics maintenance task table whose state is pending technical review, the acquisition module performs technical review on that table and updates its state to effective after the technical review passes.
Preferably, the device further provides that:
If the business approval of an indicator statistics maintenance task table whose state is pending business approval does not pass, the acquisition module updates the state of that table to business approval pending modification, and, once the table in that state has been modified, updates its state back to pending business approval;
If the technical review of an indicator statistics maintenance task table whose state is pending technical review does not pass, the acquisition module updates the state of that table to technical review pending modification, and, once the table in that state has been modified, updates its state back to pending technical review.
Preferably, the device further includes:
A generation module, which, when a request to add a data cleaning task is received, generates a new indicator statistics maintenance task table from the data corresponding to each element carried in the request, and sets the state of the new table to pending business approval;
A modification module, which, when a request to modify a data cleaning task is received, generates a new indicator statistics maintenance task table from the data corresponding to the elements to be modified carried in the request and from the original indicator statistics maintenance task table to which the request corresponds, and sets the state of the new table to pending business approval.
Preferably, the device further includes:
A removal module, which updates the state of an indicator statistics maintenance task table to invalid when that table is in the technical review pending modification or business approval pending modification state and has not been modified within a preset time threshold.
It can thus be seen that, by applying the technical solution of the application, with an indicator statistics maintenance task table and an indicator cleaning code template set in advance, a data cleaning task is configured when the synchronization time is reached according to the indicator statistics maintenance task table whose current state is effective and the indicator cleaning code template, and a run test is performed on the data cleaning task. Only when the run test succeeds is scheduling configuration performed according to the indicator statistics maintenance task table and the indicator cleaning code template and the data cleaning task published to the production environment so that the data warehouse performs data cleaning. Data cleaning tasks are thereby carried out automatically, which reduces the workload of data warehouse developers and improves data development efficiency.
Detailed Description of the Embodiments
As stated in the Background Art, for the technical staff who process data, the deployment method of the model corresponding to each new environment is basically the same; however, because the resources of these technical staff are limited, requests often have to wait a very long time. Meanwhile, for these homogeneous requirements the technical staff must develop new code each time, even though the indicator requirements of the model-building staff are all similar, which is a great waste of development resources. Taking the data processing of a credible system as an example, the indicators are a fixed few; only the dimensions corresponding to the indicators (the environment information corresponding to an account) differ.
Based on the above, the application proposes a data cleaning method that reduces the workload of data development personnel while lowering code maintenance cost, and thereby improves development efficiency. The method abstracts the homogeneous indicator cleaning logic into a code template and cleans different indicators by passing parameters, while the variable part of the homogeneous indicator cleaning logic is maintained separately. Therefore, before the method is carried out, an indicator statistics maintenance task table and an indicator cleaning code template are set in advance. The indicator cleaning code template is a piece of code that can fully perform a cleaning task once its variable values have been filled in; the places where variable values are to be filled in may be left blank or empty. The indicator statistics maintenance task table contains the variable value corresponding to each element needed to generate a complete piece of data cleaning code.
As shown in Figure 1, the flow of the data cleaning method proposed by the application includes the following steps:
S101: when the synchronization time is reached, configure a data cleaning task according to the indicator statistics maintenance task table whose current state is effective and the indicator cleaning code template.
Because the application is intended to carry out data cleaning tasks automatically, technical staff may, according to the actual situation, set a synchronization period or manually specify a synchronization time. When the synchronization time is reached, the data cleaning task is configured from the preset indicator statistics maintenance task table whose current state is effective and the indicator cleaning code template. Because the indicator statistics maintenance task table contains the elements currently used for indicator cleaning and their corresponding data, the elements in the table can be filled into the indicator cleaning code template during configuration, thereby generating the execution code for data cleaning.
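As a minimal sketch of the timing rule described above, the following Python function decides whether a synchronization is due, either because a configured period has elapsed or because a manually specified time has been reached. The function name `sync_due` and its parameters are hypothetical and are not part of the application.

```python
from datetime import datetime, timedelta

def sync_due(now, last_sync, period=None, manual_time=None):
    """Return True when a synchronization should be triggered.

    period      -- hypothetical synchronization period (timedelta), or None
    manual_time -- hypothetical manually specified synchronization time, or None
    """
    # A manually specified time fires once it falls after the last sync
    # and is no later than the current time.
    if manual_time is not None and last_sync < manual_time <= now:
        return True
    # A periodic schedule fires when at least one full period has elapsed.
    if period is not None and now - last_sync >= period:
        return True
    return False

last = datetime(2013, 1, 1)
print(sync_due(datetime(2013, 1, 2), last, period=timedelta(days=1)))
```

Treating the period and the manual time as independent triggers is one possible reading; an implementation could equally allow only one of the two to be configured.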
In the preferred embodiment of the application, indicator cleaning mainly involves the elements shown in Table 1 below:
Element | Explanation
Source table | Example: the ctu event table
Dimension field | Example: account USER_ID, UMID
Metric field | Example: amount AMOUNT
Metric form | Example: COUNT DISTINCT, SUM
Time identification field | Example: event occurrence time gmt_occur
Summary start and end time | Example: statistics over 20120101~20130101
Target table | The table in which the computed indicators are to be placed
Table 1
Correspondingly, based on the elements to be filled in as shown above, the indicator cleaning code template may be set as the following pseudo-code:
INSERT OVERWRITE TABLE <target table> PARTITION (dt=${yyyymmdd})
SELECT <dimension field 1>,
       <dimension field 2>,
       <metric form 1>(<metric field 1>),
       <metric form 2>(<metric field 2>)
FROM <source table>
WHERE <time identification field> BETWEEN <summary start time> AND <summary end time>
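The template filling described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary keys, the function name `render_cleaning_sql`, the target table name `user_index`, and the field `EVENT_ID` are hypothetical. The GROUP BY clause is an assumption added here, since most SQL engines require it when aggregates are selected alongside plain columns.

```python
def render_cleaning_sql(elements, partition="${yyyymmdd}"):
    """Fill the cleaning template with the values of one task table."""
    dims = list(elements["dimension_fields"])
    metrics = []
    for form, field in elements["metrics"]:
        # "COUNT DISTINCT" cannot simply wrap the field name.
        if form.upper() == "COUNT DISTINCT":
            metrics.append(f"COUNT(DISTINCT {field})")
        else:
            metrics.append(f"{form}({field})")
    select_list = ",\n       ".join(dims + metrics)
    start, end = elements["period"]
    # Values are interpolated directly; this assumes they come from a
    # reviewed task table rather than from untrusted input.
    return (
        f"INSERT OVERWRITE TABLE {elements['target_table']} "
        f"PARTITION (dt={partition})\n"
        f"SELECT {select_list}\n"
        f"FROM {elements['source_table']}\n"
        f"WHERE {elements['time_field']} BETWEEN '{start}' AND '{end}'\n"
        f"GROUP BY {', '.join(dims)}"
    )

elements = {
    "source_table": "ctu_event",
    "dimension_fields": ["USER_ID", "UMID"],
    "metrics": [("SUM", "AMOUNT"), ("COUNT DISTINCT", "EVENT_ID")],
    "time_field": "gmt_occur",
    "period": ("20120101", "20130101"),
    "target_table": "user_index",  # hypothetical target table
}
print(render_cleaning_sql(elements))
```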
Taking as an example an indicator statistics maintenance task table that is to compute two indicators over a combination of two dimensions, the front-end page in this specific embodiment may use a maintenance page similar to that shown in Table 2 below:
Table 2
It should be noted that although the specific embodiment of the application illustrates how the execution code for data cleaning is generated by means of the above code template and form template, the application is not limited thereto; other form layouts or code templates built on this basis all fall within the protection scope of the application.
In addition, in order to maintain and manage indicator statistics maintenance task tables effectively, the preferred embodiment of the application sets different task states for them, including the two states pending business approval and pending technical review. A table in the pending business approval state indicates that the data cleaning task has not yet been permitted to be carried out; a table in the pending technical review state indicates that the task has been permitted to be carried out but that its technical feasibility has not yet been confirmed. Reviewing both business reasonableness and technical reasonableness ensures that data warehouse resources are used reasonably.
Specifically, in the preferred embodiment of the application, after the current state of each indicator statistics maintenance task table is obtained, the corresponding processing is as follows:
(1) If there is an indicator statistics maintenance task table whose state is pending business approval, business approval is performed on that table, and its state is updated to pending technical review after the business approval passes;
(2) If there is an indicator statistics maintenance task table whose state is pending technical review, technical review is performed on that table, and its state is updated to effective after the technical review passes.
The processing above covers only the ideal case. In practice, technical staff need to keep adding new data cleaning tasks according to actual requirements, and these newly added tasks often fail approval for various reasons and have to be modified by the technical staff. In the preferred embodiment of the application, the measures taken for the different situations are as follows:
(1) The business approval of an indicator statistics maintenance task table whose state is pending business approval does not pass
In this case, the state of the table is updated to business approval pending modification, and, once the table in that state has been modified, its state is updated back to pending business approval.
(2) The technical review of an indicator statistics maintenance task table whose state is pending technical review does not pass
In this case, the state of the table is updated to technical review pending modification, and, once the table in that state has been modified, its state is updated back to pending technical review.
(3) A request to add a data cleaning task is received
A new indicator statistics maintenance task table is generated from the data corresponding to each element carried in the request, and the state of the new table is set to pending business approval;
(4) A request to modify a data cleaning task is received
A new indicator statistics maintenance task table is generated from the data corresponding to the elements to be modified carried in the request and from the original indicator statistics maintenance task table to which the request corresponds, and the state of the new table is set to pending business approval.
While the task states are maintained as described above, in order to clear out unneeded data cleaning tasks in time, the preferred embodiment of the application updates the state of an indicator statistics maintenance task table to invalid when that table is in the technical review pending modification or business approval pending modification state and has not been modified within a preset time threshold.
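The state handling described above amounts to a small state machine. The following Python sketch shows one way to encode it; the state and event names, the transition table, and the function names are hypothetical labels chosen for illustration and are not prescribed by the application.

```python
from datetime import datetime, timedelta

# (current state, event) -> next state
TRANSITIONS = {
    ("pending_business_approval", "approve"): "pending_technical_review",
    ("pending_business_approval", "reject"): "business_approval_pending_modification",
    ("business_approval_pending_modification", "modify"): "pending_business_approval",
    ("pending_technical_review", "approve"): "effective",
    ("pending_technical_review", "reject"): "technical_review_pending_modification",
    ("technical_review_pending_modification", "modify"): "pending_technical_review",
}

PENDING_MODIFICATION = {
    "business_approval_pending_modification",
    "technical_review_pending_modification",
}

# New tables, whether created by an add request or a modify request,
# start in this state.
INITIAL_STATE = "pending_business_approval"

def next_state(state, event):
    """Apply one approval, rejection, or modification event."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"event {event!r} is not allowed in state {state!r}")

def expire_if_stale(state, last_modified, now, threshold):
    """Mark a table invalid if it waited for modification past the threshold."""
    if state in PENDING_MODIFICATION and now - last_modified > threshold:
        return "invalid"
    return state

state = next_state(INITIAL_STATE, "approve")
state = next_state(state, "approve")
print(state)
```

Keeping the transitions in a single table makes it easy to see that the effective state is reachable only through both reviews.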
S102: perform a run test on the data cleaning task.
After the execution code for the data cleaning task has been obtained in S101, this step performs a run test on the data cleaning task. In the preferred embodiment of the application, a trial-run process is first executed according to the data cleaning task and it is determined whether the trial-run process succeeds; processing then proceeds according to the following cases:
(1) If the trial-run process succeeds, the result data obtained by the trial-run process is verified;
(2) If the data passes verification, it is confirmed that the run test of the data cleaning task has succeeded;
(3) If the trial-run process fails or the data does not pass verification, it is confirmed that the run test of the data cleaning task has failed.
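The three cases above reduce to a short decision routine. The sketch below assumes that the trial run signals failure by raising an exception and that verification is a caller-supplied predicate; both conventions, and the name `run_test`, are assumptions made for illustration.

```python
def run_test(trial_run, validate):
    """Return True only if the trial run succeeds and its result data is valid.

    trial_run -- callable that executes the trial run and returns result data
    validate  -- callable that returns True when the result data is acceptable
    """
    try:
        result = trial_run()
    except Exception:
        # Case (3): the trial-run process itself failed.
        return False
    # Cases (1) and (2): the trial run succeeded, so verify its result data.
    return bool(validate(result))

print(run_test(lambda: [("u1", 15.0)], lambda rows: len(rows) > 0))
```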
It should be noted that, depending on the application environment and device type, technical staff may adopt trial-run processes with different steps. In the preferred embodiment of the application, the trial-run process is completed by the following steps:
Step a) run the indicator cleaning code template;
Step b) read the indicator statistics maintenance task table according to the indicator cleaning code template, and parse the data corresponding to each element in the indicator statistics maintenance task table;
Step c) splice an SQL statement from the parsing result and the indicator cleaning code template, and run the SQL statement.
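Steps a) to c) can be sketched end to end in Python. This is a toy illustration under stated assumptions: an in-memory SQLite database stands in for the data warehouse, the statement is a plain SELECT rather than an INSERT OVERWRITE (which SQLite does not support), and the dictionary keys and function names are hypothetical.

```python
import sqlite3

def build_select(elements):
    """Step c), first half: splice the SQL statement from the parsed elements."""
    dims = ", ".join(elements["dimension_fields"])
    metrics = ", ".join(f"{form}({field})" for form, field in elements["metrics"])
    return (
        f"SELECT {dims}, {metrics} FROM {elements['source_table']} "
        f"WHERE {elements['time_field']} BETWEEN ? AND ? GROUP BY {dims}"
    )

def trial_run(conn, elements):
    """Step c), second half: run the spliced statement and return its rows."""
    sql = build_select(elements)
    return conn.execute(sql, elements["period"]).fetchall()

# Toy source table standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ctu_event (USER_ID TEXT, AMOUNT REAL, gmt_occur TEXT)")
conn.executemany(
    "INSERT INTO ctu_event VALUES (?, ?, ?)",
    [("u1", 10.0, "20120301"), ("u1", 5.0, "20120401"), ("u2", 7.0, "20140101")],
)

# Step b): the parsed content of one task table (hypothetical layout).
elements = {
    "source_table": "ctu_event",
    "dimension_fields": ["USER_ID"],
    "metrics": [("SUM", "AMOUNT")],
    "time_field": "gmt_occur",
    "period": ("20120101", "20130101"),
}
rows = trial_run(conn, elements)
print(rows)
```

The summary period is passed as bound parameters, while table and column names are spliced into the text because SQL placeholders cannot stand for identifiers.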
In the specific embodiment shown in Figure 2, after the online configuration is completed, the warehouse needs to start the trial-run process: the code run by the node reads the synchronized configuration table, parses the information in the configuration table, and splices it into a complete SQL statement to run. Only after the trial run confirms that the program is error-free, the data is accurate, performance is stable, and resource consumption is reasonable is the task formally published to the production environment.
S103: if the run test of the data cleaning task succeeds, perform scheduling configuration according to the indicator statistics maintenance task table and the indicator cleaning code template, and publish the data cleaning task to the production environment so that the data warehouse performs data cleaning.
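Taken together, steps S101 to S103 form a gate: only tasks built from effective tables that also pass the run test reach production. A minimal Python sketch of that control flow follows; the function `synchronize`, the dictionary layout of a task table, and the three callables are hypothetical placeholders for whatever the configuration, test, and release steps actually do.

```python
def synchronize(tables, configure, run_test, publish):
    """Configure, test, and publish one task per effective task table.

    Returns the names of the tables whose tasks were published.
    """
    published = []
    for table in tables:
        # S101: only tables whose current state is effective are configured.
        if table["state"] != "effective":
            continue
        task = configure(table)
        # S102: run test; S103: publish only on success.
        if run_test(task):
            publish(task)
            published.append(table["name"])
    return published

tables = [
    {"name": "user_amount", "state": "effective"},
    {"name": "umid_count", "state": "pending_technical_review"},
    {"name": "broken_task", "state": "effective"},
]
released = []
names = synchronize(
    tables,
    configure=lambda t: {"table": t["name"]},
    run_test=lambda task: task["table"] != "broken_task",
    publish=released.append,
)
print(names)
```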
By using the technical solution of the above embodiment, the online system and the data warehouse are combined: taking advantage of the online system's ease of editing and maintenance, the variable part of the homogeneous indicator cleaning logic is maintained in the online system, and the online data is synchronized back to the warehouse for the data warehouse to use, which improves development efficiency.
To achieve the above technical purpose, the application also proposes a data cleaning device. As shown in Figure 3, an indicator statistics maintenance task table and an indicator cleaning code template are set in advance in the device, and the device further includes:
A configuration module 310, which, when a synchronization time is reached, configures a data cleaning task according to the indicator statistics maintenance task table whose current state is effective and the indicator cleaning code template, where the indicator statistics maintenance task table contains the elements currently used for indicator cleaning and their corresponding data;
A test module 320, which performs a run test on the data cleaning task;
A release module 330, which, when the run test of the data cleaning task succeeds, performs scheduling configuration according to the indicator statistics maintenance task table and the indicator cleaning code template, and publishes the data cleaning task to the production environment so that the data warehouse performs data cleaning.
In a specific application scenario, the test module specifically includes:
A trial-run submodule, which executes a trial-run process according to the data cleaning task and determines whether the trial-run process succeeds;
If the trial-run process succeeds, the trial-run submodule verifies the result data obtained by the trial-run process;
If the data passes verification, the trial-run submodule confirms that the run test of the data cleaning task has succeeded;
If the trial-run process fails or the data does not pass verification, the trial-run submodule confirms that the run test of the data cleaning task has failed.
In a specific application scenario, the trial-run submodule executes the trial-run process according to the data cleaning task specifically by:
Running the indicator cleaning code template;
Reading the indicator statistics maintenance task table according to the indicator cleaning code template, and parsing the data corresponding to each element in the indicator statistics maintenance task table;
Splicing an SQL statement from the parsing result and the indicator cleaning code template, and running the SQL statement.
In a specific application scenario, the device further includes:
An acquisition module, which obtains the current state of each indicator statistics maintenance task table;
If there is an indicator statistics maintenance task table whose state is pending business approval, the acquisition module performs business approval on that table and updates its state to pending technical review after the business approval passes;
If there is an indicator statistics maintenance task table whose state is pending technical review, the acquisition module performs technical review on that table and updates its state to effective after the technical review passes.
In a specific application scenario, the device further provides that:
If the business approval of an indicator statistics maintenance task table whose state is pending business approval does not pass, the acquisition module updates the state of that table to business approval pending modification, and, once the table in that state has been modified, updates its state back to pending business approval;
If the technical review of an indicator statistics maintenance task table whose state is pending technical review does not pass, the acquisition module updates the state of that table to technical review pending modification, and, once the table in that state has been modified, updates its state back to pending technical review.
In a specific application scenario, the device further includes:
A generation module, which, when a request to add a data cleaning task is received, generates a new indicator statistics maintenance task table from the data corresponding to each element carried in the request, and sets the state of the new table to pending business approval;
A modification module, which, when a request to modify a data cleaning task is received, generates a new indicator statistics maintenance task table from the data corresponding to the elements to be modified carried in the request and from the original indicator statistics maintenance task table to which the request corresponds, and sets the state of the new table to pending business approval.
In a specific application scenario, the device further includes:
A removal module, which updates the state of an indicator statistics maintenance task table to invalid when that table is in the technical review pending modification or business approval pending modification state and has not been modified within a preset time threshold.
By applying the technical solution of the application, with an indicator statistics maintenance task table and an indicator cleaning code template set in advance, a data cleaning task is configured when the synchronization time is reached according to the indicator statistics maintenance task table whose current state is effective and the indicator cleaning code template, and a run test is performed on the data cleaning task. Only when the run test succeeds is scheduling configuration performed according to the indicator statistics maintenance task table and the indicator cleaning code template and the data cleaning task published to the production environment so that the data warehouse performs data cleaning. Data cleaning tasks are thereby carried out automatically, which reduces the workload of data warehouse developers and improves data development efficiency.
From the above description of the embodiments, those skilled in the art can clearly understand that the invention may be implemented in hardware, or in software together with a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the invention may be embodied as a software product, which may be stored in a non-volatile storage medium (a CD-ROM, USB flash drive, removable hard disk, or the like) and which includes a number of instructions that cause a computer device (a personal computer, server, network device, or the like) to perform the method described in each implementation scenario of the invention.
Those skilled in the art will understand that the accompanying drawings are schematic diagrams of a preferred implementation scenario, and that the modules or flows in the drawings are not necessarily required to implement the invention.
Those skilled in the art will understand that the modules of the device in an implementation scenario may be distributed in the device as described for that scenario, or may be changed accordingly and located in one or more devices different from that of the present implementation scenario. The modules of the above implementation scenario may be merged into one module, or may be further split into multiple submodules.
The above serial numbers of the invention are for description only and do not represent the merits of the implementation scenarios.
The above discloses only several specific implementation scenarios of the invention; however, the invention is not limited thereto, and any change that a person skilled in the art can conceive of shall fall within the protection scope of the invention.