Detailed description of the embodiments
To achieve the purpose of the present application, the embodiments of the present application provide a data processing method and device. Multiple groups of data information associated with a target service and stored in a first data table are obtained, where each group of data information contains the generation time of business data of the target service and first data content of that business data. When it is determined that the business data of the target service has undergone data drift in the first data table, second data content associated with the drifted business data of the target service is obtained from a second data table, the first data table being different from the second data table. The obtained first data content and second data content of the business data are merged, and a data cleansing operation is performed on the merged data content. In this way, before performing data cleansing, the data warehouse judges whether the obtained business data has undergone data drift and, if so, obtains the data content of the drifted business data and merges it with the data content already obtained. This effectively avoids omissions in accumulation during the merging of business data caused by data drift, improves the accuracy of the business data stored in the data warehouse, simplifies the data warehouse's synchronization procedure, and improves the data warehouse's processing efficiency.
It should be noted that data cleansing, as described in the embodiments of the present application, refers to the data warehouse examining the extracted data in order to find and correct errors in it. It generally includes checking data consistency and handling invalid or missing values. The handling here may include deletion.
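As a rough illustration of cleansing in this sense, the sketch below deletes records whose content is missing or invalid. The record layout (`service_id`, `content`) and the rule that a negative value counts as invalid are assumptions made for this example only; the embodiments do not specify them.

```python
from typing import Optional, TypedDict


class Record(TypedDict):
    service_id: str
    content: Optional[int]  # None models a missing value


def cleanse(records: list[Record]) -> list[Record]:
    """Keep only records whose content is present and valid.

    Assumption: a negative content value counts as invalid.
    """
    return [r for r in records if r["content"] is not None and r["content"] >= 0]


rows: list[Record] = [
    {"service_id": "1111", "content": 10},
    {"service_id": "1111", "content": None},  # missing value, deleted
    {"service_id": "2222", "content": -5},    # invalid value, deleted
]
cleaned = cleanse(rows)
```

Note that a record whose content has merely drifted into another table would also be deleted by such a rule, which is the problem the method below addresses.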
The embodiments of the present application can be applied to multi-stage services, for example an installment payment service or a service that requires multiple operations to complete.
The embodiments of the present application are described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present application, without creative effort, fall within the scope of protection of the present application.
Fig. 1 is a schematic flowchart of a data processing method provided by an embodiment of the present application. The method may be as described below. The method may be executed by a data warehouse.
Step 101: Obtain multiple groups of data information associated with a target service and stored in a first data table.
Each group of data information contains the generation time of business data of the target service and first data content of the business data.
In step 101, because a data warehouse is able to manage massive amounts of data, the business data collected by each of the separate systems needs to be synchronized to the data warehouse at a specified data synchronization time, so that the data warehouse can manage it.
The functions of a data warehouse can be implemented with tools such as Open Data Processing Service (ODPS) or Hive.
It should be noted that Hive is an open-source data warehouse tool based on Hadoop. It can map structured data files to data tables, provides simple SQL query functions, and can convert SQL statements into MapReduce tasks for execution.
To complete data synchronization, a data warehouse generally goes through two stages: data extraction and data cleansing. Data extraction means that the data warehouse collects, from the separate systems, the business data that each system gathered within a specified time.
It should be noted that the specified time can be determined according to actual needs or set according to system requirements, for example 00:00:00~23:59:59 every day.
The data warehouse may execute data synchronization at a fixed time or periodically, for example at 00:00:00~00:30:00 every day, or at 00:00:00~00:30:00 every Monday. Suppose the synchronization time is set to 00:00:00~00:30:00 every day; in that period the data warehouse extracts from the separate systems the business data collected during the previous day. For example, at 00:00:00~00:30:00 on the 2nd, the data warehouse extracts from the separate systems the business data collected on the 1st.
A separate system generally stores the business data collected in one day in the form of a table.
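The relationship between the synchronization day and the data it covers can be sketched as follows. The function name `collection_window` and the concrete date are hypothetical; only the rule that a daily synchronization extracts the previous day's data (00:00:00~23:59:59) comes from the description above.

```python
from datetime import date, datetime, time, timedelta


def collection_window(sync_day: date) -> tuple[datetime, datetime]:
    """Return the inclusive window of business data extracted on sync_day."""
    previous = sync_day - timedelta(days=1)
    return (
        datetime.combine(previous, time(0, 0, 0)),
        datetime.combine(previous, time(23, 59, 59)),
    )


# Synchronizing on the 2nd covers everything collected on the 1st.
start, end = collection_window(date(2024, 1, 2))
```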
Thus, when executing data synchronization, the data warehouse obtains from the first data table the multiple groups of data information associated with the target service.
In the first data table, for each service, a group of data information is generated for each item of business data that the service produces. The group contains the service identifier of the service, the generation time of the business data, the data content of the business data produced at that generation time, and so on.
In practice, the data content of business data may be produced across a day boundary, which causes data drift. That is, for business data of the target service, the change time of the business data falls at 23:59:59 on the 1st, but the data content corresponding to that change is produced at 00:00:00 on the 2nd. The system may treat the data content produced at 00:00:00 on the 2nd as invalid data, in which case it is removed during data cleansing, leaving the business data of the target service incomplete.
Step 102: For one group of data information, judge whether the business data of the target service has undergone data drift in the first data table. If data drift has occurred, execute step 103; if not, perform data extraction according to the prior art.
In step 102, for one group of data information, it is judged, according to the generation time of the business data of the target service contained in that data information, whether the generation time falls within a preset first time range.
The preset first time range is determined according to the time at which the data warehouse extracts business data from the databases of the different systems.
If the judgment result is that the generation time of the business data of the target service falls within the preset first time range, it is determined that the business data of the target service has undergone data drift in the first data table.
Specifically, for a group of data information in the first data table, suppose the business data content in that group is empty. It is then further determined, from the generation time of the business data in that data information, whether the generation time falls within the preset first time range. If it does, it can be determined that the business data in that data information has undergone data drift in the first data table.
For example, if the time at which the data warehouse extracts business data from the databases of the different systems is 00:00:00~00:30:00, the preset first time range may be set to 23:59:50~23:59:59. Once the generation time of the business data of the target service falls within 23:59:50~23:59:59, it is determined that the business data of the target service has undergone data drift in the first data table.
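The judgment in step 102 amounts to a time-of-day range check. The sketch below is one minimal way to express it, using the 23:59:50~23:59:59 range from the example; the names `FIRST_RANGE` and `has_drifted` are hypothetical, and the dates are arbitrary placeholders.

```python
from datetime import datetime, time

# Preset first time range from the example above (inclusive on both ends).
FIRST_RANGE = (time(23, 59, 50), time(23, 59, 59))


def has_drifted(generated_at: datetime, first_range=FIRST_RANGE) -> bool:
    """Return True if the generation time falls within the preset first time range."""
    start, end = first_range
    return start <= generated_at.time() <= end


late = has_drifted(datetime(2024, 1, 2, 23, 59, 59))   # just before midnight
noon = has_drifted(datetime(2024, 1, 1, 11, 59, 59))   # well outside the range
```

This check looks only at the generation time. An implementation could additionally require the data content to be empty, as in the preceding paragraph, before treating the record as drifted.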
Step 103: When it is determined that the business data of the target service has undergone data drift in the first data table, obtain from a second data table the second data content associated with the drifted business data of the target service.
The first data table is different from the second data table.
In step 103, because the data content of business data may be stored in another data table after data drift, the second data content associated with the drifted business data of the target service is obtained from the second data table.
Specifically, the second data table is searched for data content that was produced within a preset second time range and is associated with the target service, where the preset second time range characterizes the time at which the data warehouse extracts business data from the databases of the different systems. When it is determined that the data content found is associated with the business data of the target service, the data content found is taken as the second data content associated with the drifted business data of the target service.
It should be noted that the preset first time range differs from the preset second time range, but the time difference between them satisfies a set threshold. The set threshold can be determined according to actual needs or according to the characteristics of the data drift.
That is, first, the other data tables are searched for a data table that contains the service identifier of the target service (assumed here to be the second data table).
Second, the second data table is searched for data content that was produced within the preset second time range and is associated with the target service. That is, according to the generation times of the business data contained in the second data table, the business data whose generation time falls within the preset second time range is determined, and from that business data the data content of the business data that drifted in the first data table is determined.
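The lookup in step 103 can be sketched as a filter over the rows of the second data table. The row layout, the function name `find_second_content`, and the sample rows are assumptions for illustration; the 00:00:00~00:15:00 range is the preset second time range used in the example later in this description. A real implementation would probably also compare dates, which this sketch leaves out.

```python
from datetime import datetime, time
from typing import Optional

# Preset second time range, as used in the later example (inclusive).
SECOND_RANGE = (time(0, 0, 0), time(0, 15, 0))


def find_second_content(second_table: list[dict], service_id: str,
                        second_range=SECOND_RANGE) -> Optional[int]:
    """Return the first non-empty content for service_id produced in second_range."""
    start, end = second_range
    for row in second_table:
        if (row["service_id"] == service_id
                and start <= row["generated_at"].time() <= end
                and row["content"] is not None):
            return row["content"]
    return None  # nothing drifted into this table for the service


second_table = [
    {"service_id": "2222", "generated_at": datetime(2024, 1, 3, 0, 0, 5), "content": 7},
    {"service_id": "1111", "generated_at": datetime(2024, 1, 3, 9, 30, 0), "content": 99},
    {"service_id": "1111", "generated_at": datetime(2024, 1, 3, 0, 0, 0), "content": 20},
]
found = find_second_content(second_table, "1111")
```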
Table 1 illustrates the first data table and the second data table:
Table 1
Step 104: Merge the obtained first data content of the business data with the second data content of the business data, and perform a data cleansing operation on the merged data content.
In step 104, for the extracted business data, the first data content of the business data is merged with the second data content of the business data to obtain the complete data content of the business data.
In another embodiment of the present application, the data warehouse needs to update historical data after completing data extraction. The data warehouse therefore also obtains the historical data content of the business data of the target service, and merges the historical data content, the obtained first data content of the business data, and the second data content of the business data.
In another embodiment of the present application, when the data warehouse extracts the data information in the first data table, for business data that has not yet undergone data drift it may first merge the historical data content of the business data of the target service with the obtained first data content of the business data, and then merge that result with the obtained second data content of the business data.
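The description does not define the merge operation itself. The sketch below assumes, purely for illustration, that data content is numeric and that merging means accumulating the non-empty parts, which is consistent with the references to accumulation elsewhere in this description. The name `merge_contents` and the sample values are hypothetical.

```python
from typing import Optional


def merge_contents(*contents: Optional[int]) -> int:
    """Merge data contents by summing them; None (empty) contributes nothing.

    Assumption: content is numeric and merging is accumulation.
    """
    return sum(c for c in contents if c is not None)


# Historical content, an empty first content, and a drifted second content.
history, first, second = 10, None, 20
merged = merge_contents(history, first, second)

# Two-stage variant: merge history with the first content, then add the second.
staged = merge_contents(merge_contents(history, first), second)
```

Under this assumption the one-step and two-stage merges give the same result, so the choice between the two embodiments is a matter of processing order rather than outcome.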
With the data processing method provided by the embodiments of the present application, multiple groups of data information associated with a target service and stored in a first data table are obtained, where each group of data information contains the generation time of business data of the target service and first data content of the business data. When it is determined that the business data of the target service has undergone data drift in the first data table, second data content associated with the drifted business data of the target service is obtained from a second data table, the first data table being different from the second data table. The obtained first data content and second data content of the business data are merged, and a data cleansing operation is performed on the merged data content. In this way, before performing data cleansing, the data warehouse judges whether the obtained business data has undergone data drift and, if so, obtains the data content of the drifted business data and merges it with the data content already obtained. This effectively avoids omissions in accumulation during the merging of business data caused by data drift, and improves the accuracy of the business data stored in the data warehouse.
For example, the target service has the following groups of data information, as shown in Table 2:
Table 2
Service identifier of the target service | Generation time | Business data | Data content
1111 | 1st 11:59:59 | Pay | 10
1111 | 2nd 23:59:59 | Pay | Empty
1111 | 3rd 00:00:00 | Empty | 20
If the data warehouse extracts business data at 00:00:00~00:30:00 on the 2nd, the generation time of the business data, 11:59:59 on the 1st, is not within the preset first time range (23:59:50~23:59:59), so the data content extracted for the business data of the target service is 10. If the data warehouse extracts business data at 00:00:00~00:30:00 on the 3rd, the generation time of the business data, 23:59:59 on the 2nd, is within the preset first time range (23:59:50~23:59:59), so it is determined that this business data has undergone data drift. The data content of the drifted business data must then be determined from within the preset second time range (00:00:00~00:15:00), which yields 20. In this way the data warehouse obtains relatively accurate business data for the target service, and the data information is not invalidated because its content is missing. This effectively avoids omissions in accumulation during the merging of business data caused by data drift, and improves the accuracy of the business data stored in the data warehouse.
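The walk-through of Table 2 can be reproduced with a short script. This is a sketch under stated assumptions: the year and month are arbitrary placeholders, the names `TABLE_2`, `in_range`, and `resolve_content` are hypothetical, and for brevity all three rows sit in one list, whereas in the embodiment the drifted content would be read from a second data table.

```python
from datetime import datetime, time

FIRST_RANGE = (time(23, 59, 50), time(23, 59, 59))   # preset first time range
SECOND_RANGE = (time(0, 0, 0), time(0, 15, 0))       # preset second time range

# Rows of Table 2; None stands for an empty data content.
TABLE_2 = [
    {"service_id": "1111", "generated_at": datetime(2024, 1, 1, 11, 59, 59), "content": 10},
    {"service_id": "1111", "generated_at": datetime(2024, 1, 2, 23, 59, 59), "content": None},
    {"service_id": "1111", "generated_at": datetime(2024, 1, 3, 0, 0, 0), "content": 20},
]


def in_range(moment: datetime, bounds) -> bool:
    """Inclusive time-of-day range check."""
    start, end = bounds
    return start <= moment.time() <= end


def resolve_content(row: dict, table: list[dict]):
    """Return the row's content, following a drift into the second range if needed."""
    if not in_range(row["generated_at"], FIRST_RANGE):
        return row["content"]  # no drift: use the first data content as-is
    for other in table:
        if (other["service_id"] == row["service_id"]
                and other["generated_at"] > row["generated_at"]
                and in_range(other["generated_at"], SECOND_RANGE)
                and other["content"] is not None):
            return other["content"]  # drifted content found
    return row["content"]


extracted_on_2nd = resolve_content(TABLE_2[0], TABLE_2)
extracted_on_3rd = resolve_content(TABLE_2[1], TABLE_2)
```

The first call returns the content 10 directly, and the second follows the drift and returns 20, matching the two cases described above.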
Fig. 2 is a schematic structural diagram of a data processing device provided by an embodiment of the present application. The data processing device includes an acquiring unit 21 and a processing unit 22, wherein:
The acquiring unit 21 is configured to obtain multiple groups of data information associated with a target service and stored in a first data table, where each group of data information contains the generation time of business data of the target service and first data content of the business data;
The acquiring unit 21 is further configured to, when it is determined that the business data of the target service has undergone data drift in the first data table, obtain from a second data table the second data content associated with the drifted business data of the target service, where the first data table is different from the second data table;
The processing unit 22 is configured to merge the obtained first data content of the business data with the second data content of the business data, and to perform a data cleansing operation on the merged data content.
Specifically, the acquiring unit 21 determining that the business data of the target service has undergone data drift in the first data table includes:
For one group of data information, judging, according to the generation time of the business data of the target service contained in the data information, whether the generation time of the business data of the target service falls within a preset first time range, where the preset first time range is determined according to the time at which the data warehouse extracts business data from the databases of the different systems;
If the judgment result is that the generation time of the business data of the target service falls within the preset first time range, determining that the business data of the target service has undergone data drift in the first data table.
Specifically, the acquiring unit 21 obtaining from the second data table the second data content associated with the drifted business data of the target service includes:
Searching the second data table for data content that was produced within a preset second time range and is associated with the target service, where the preset second time range characterizes the time at which the data warehouse extracts business data from the databases of the different systems;
When it is determined that the data content found is associated with the business data of the target service, taking the data content found as the second data content associated with the drifted business data of the target service.
Specifically, the processing unit 22 merging the obtained first data content of the business data with the second data content of the business data includes:
Obtaining the historical data content of the business data of the target service;
Merging the historical data content, the obtained first data content of the business data, and the second data content of the business data.
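In a software implementation, the division into an acquiring unit and a processing unit might look like the sketch below. The class and method names are hypothetical stand-ins for units 21 and 22, the tables are reduced to dictionaries keyed by service identifier, and merging is again assumed to be accumulation of non-empty numeric content.

```python
from typing import Optional


class AcquisitionUnit:
    """Hypothetical stand-in for acquiring unit 21."""

    def __init__(self, first_table: dict, second_table: dict):
        self.first_table = first_table    # service_id -> first data content
        self.second_table = second_table  # service_id -> second data content

    def first_content(self, service_id: str) -> Optional[int]:
        return self.first_table.get(service_id)

    def second_content(self, service_id: str) -> Optional[int]:
        return self.second_table.get(service_id)


class ProcessingUnit:
    """Hypothetical stand-in for processing unit 22."""

    def merge_and_cleanse(self, first: Optional[int],
                          second: Optional[int]) -> Optional[int]:
        # Assumption: merging sums the non-empty contents; cleansing here
        # only rejects a result that has no content at all.
        parts = [c for c in (first, second) if c is not None]
        return sum(parts) if parts else None


acquirer = AcquisitionUnit({"1111": None}, {"1111": 20})
processor = ProcessingUnit()
result = processor.merge_and_cleanse(acquirer.first_content("1111"),
                                     acquirer.second_content("1111"))
```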
It should be noted that the device provided by the embodiments of the present application can be implemented in hardware or in software, which is not limited here.
Before performing data cleansing, the device judges whether the obtained business data has undergone data drift and, if so, obtains the data content of the drifted business data and merges it with the data content already obtained. This effectively avoids omissions in accumulation during the merging of business data caused by data drift, and improves the accuracy of the business data stored in the data warehouse.
Those skilled in the art will understand that the embodiments of the present application may be provided as a method, an apparatus (device), or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, apparatus (device), and computer program product according to the embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, so that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn of the basic inventive concept. The appended claims are therefore intended to be construed as covering the preferred embodiments and all changes and modifications that fall within the scope of the present application.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its scope. Thus, if these changes and modifications fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.