Detailed Description of the Embodiments
To achieve the purpose of the present application, the embodiments of the present application provide a data processing method and device. Multiple groups of data information associated with a target service and stored in a first data table are obtained, where each group of data information includes the generation time of business data of the target service and first data content of the business data. When it is determined that data drift has occurred in the first data table for the business data of the target service, second data content associated with the drifted business data of the target service is obtained from a second data table, the first data table being different from the second data table. The obtained first data content of the business data is merged with the second data content of the business data, and a data cleansing operation is performed on the merged data content. In this way, before performing data cleansing, the data warehouse determines whether the obtained business data has undergone data drift and, if so, obtains the data content of the drifted business data and merges it with the rest. This avoids data content being left out of the accumulation during merging because of data drift, improves the accuracy of the business data stored in the data warehouse, simplifies the way the data warehouse synchronizes data, and improves its processing efficiency.
It should be noted that data cleansing, as used in the embodiments of the present application, means that the data warehouse cleans the extracted data by finding and correcting errors in it. This generally includes checking data consistency and handling data that contains invalid or missing values. Such handling may include deletion.
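By way of illustration only, a cleansing pass of this kind might be sketched as follows. The record layout (a dictionary with a "content" field), the function name cleanse, and the rule that a valid value is a non-negative number are assumptions made for this example and are not taken from the embodiment.

```python
from typing import Any, Dict, List


def cleanse(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return only the records whose content is present and valid.

    Assumption for this sketch: "valid" means a non-negative number.
    Records with a missing or invalid value are handled by deletion.
    """
    cleaned = []
    for record in records:
        content = record.get("content")
        if content is None:
            # Missing value: drop the record.
            continue
        if not isinstance(content, (int, float)) or content < 0:
            # Invalid value: drop the record.
            continue
        cleaned.append(record)
    return cleaned
```

Under these assumptions a record whose content is empty is simply deleted, which is why content that has drifted into another table needs to be merged back before cleansing runs.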
The embodiments of the present application can be applied to multi-stage services, for example installment services or services that require multiple operations.
The embodiments of the present application are described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
Fig. 1 is a schematic flowchart of a data processing method provided by an embodiment of the present application. The method may be as described below. The method may be executed by a data warehouse.
Step 101: Obtain multiple groups of data information associated with a target service and stored in a first data table.
Each group of data information includes the generation time of business data of the target service and first data content of the business data.
In step 101, because a data warehouse is able to manage massive amounts of data, the business data collected by each distributed system needs to be synchronized to the data warehouse at a specified synchronization time, so that the data warehouse can manage that data.
The functions of a data warehouse can be implemented with tools such as the Open Data Processing Service (ODPS) or Hive.
It should be noted that Hive is an open-source data warehouse tool based on Hadoop. It can map structured data files to a data table, provides simple SQL query functionality, and can convert SQL statements into MapReduce tasks for execution.
To complete data synchronization, a data warehouse generally goes through two stages: data extraction and data cleansing. Data extraction means that the data warehouse collects, from the distributed systems, the business data that each system gathered within a specified time.
It should be noted that the specified time may be determined according to actual needs or set according to system requirements, for example 00:00:00 to 23:59:59 each day.
The time at which the data warehouse performs data synchronization may be scheduled or periodic, for example 00:00:00 to 00:30:00 every day, or 00:00:00 to 00:30:00 every Monday. Assuming the synchronization time is set to 00:00:00 to 00:30:00 every day, then within this period the data warehouse extracts from the distributed systems the business data collected during the previous day. For example, between 00:00:00 and 00:30:00 on the 2nd, the data warehouse extracts from the distributed systems the business data collected on the 1st.
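The relationship between the synchronization day and the extracted day can be sketched as follows. This is a minimal illustration that assumes the daily schedule described above; the function name extraction_window and the concrete calendar dates in the usage note are hypothetical.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def extraction_window(sync_day: date) -> Tuple[datetime, datetime]:
    """Return the generation-time window extracted during the sync on sync_day.

    Assumes the daily schedule described in the text: a sync that runs
    shortly after midnight extracts the data collected on the previous
    day, from 00:00:00 to 23:59:59.
    """
    previous_day = sync_day - timedelta(days=1)
    start = datetime.combine(previous_day, time(0, 0, 0))
    end = datetime.combine(previous_day, time(23, 59, 59))
    return start, end
```

For instance, calling extraction_window(date(2024, 1, 2)) yields the window covering the whole of 1 January 2024. Data content generated at 00:00:00 on the sync day itself falls just outside that window, which is where the drift problem described later arises.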
A distributed system usually stores the business data collected in one day in a table.
Thus, when performing data synchronization, the data warehouse obtains the multiple groups of data information associated with the target service from the first data table.
In the first data table, data information is generated for each piece of business data produced by each service. The data information includes the service identifier of the service, the generation time of the business data, the data content of the business data generated at that time, and so on.
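One possible in-memory representation of a group of data information is sketched below. The class name DataInfo, the field names, and the choice of an integer for the data content are assumptions for illustration; the embodiment only names the kinds of information a group contains.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class DataInfo:
    """One group of data information in a data table (hypothetical layout)."""

    service_id: str               # service identifier of the service
    generated_at: datetime        # generation time of the business data
    business_data: Optional[str]  # kind of business data, e.g. "payment"; None if empty
    content: Optional[int]        # data content; None if empty
```

Allowing business_data and content to be empty is deliberate: it lets a single record type represent both a record whose content has drifted away and the record that holds the drifted content.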
In practical applications, the data content of business data may be generated across a day boundary, which causes data drift in the data content of the business data. That is, for the business data of the target service, the change to the business data occurs at 23:59:59 on the 1st, but the data content corresponding to that change is generated at 00:00:00 on the 2nd. The system may treat the data content generated at 00:00:00 on the 2nd as invalid data, in which case it is removed during data cleansing, leaving the business data of the target service incomplete.
Step 102: For one of the groups of data information, determine whether data drift has occurred in the first data table for the business data of the target service. If data drift has occurred, perform step 103; if not, perform data extraction according to the existing technical solution.
In step 102, for one of the groups of data information, it is determined, according to the generation time of the business data of the target service included in that data information, whether the generation time falls within a preset first time range.
The preset first time range is determined according to the time at which the data warehouse extracts business data from the different system databases.
If the generation time of the business data of the target service falls within the preset first time range, it is determined that data drift has occurred in the first data table for the business data of the target service.
Specifically, for a group of data information in the first data table, suppose the business data content in that group is empty. It is then further determined, according to the generation time of the business data in that data information, whether the generation time falls within the preset first time range. If it does, it can be determined that data drift has occurred in the first data table for the business data in that data information.
For example, if the time at which the data warehouse extracts business data from the different system databases is 00:00:00 to 00:30:00, the preset first time range may be set to 23:59:50 to 23:59:59. Once the generation time of the business data of the target service falls within 23:59:50 to 23:59:59, it is determined that data drift has occurred in the first data table for the business data of the target service.
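The determination in step 102 can be sketched as follows. This is an illustrative sketch, not a definitive implementation: the function name has_drifted and the use of None for empty content are assumptions, and the check combines the empty-content condition with the time-range condition as in the specific case described above. The range 23:59:50 to 23:59:59 is the example value from the text.

```python
from datetime import datetime, time
from typing import Optional, Tuple

# Example preset first time range from the text: 23:59:50 to 23:59:59.
FIRST_RANGE: Tuple[time, time] = (time(23, 59, 50), time(23, 59, 59))


def has_drifted(
    generated_at: datetime,
    content: Optional[int],
    first_range: Tuple[time, time] = FIRST_RANGE,
) -> bool:
    """Decide whether a record in the first data table has undergone data drift.

    A record is treated as drifted when its content is empty (None) and
    its generation time falls within the preset first time range.
    """
    start, end = first_range
    # Ignore sub-second precision so 23:59:59.5 still counts as 23:59:59.
    clock = generated_at.time().replace(microsecond=0)
    return content is None and start <= clock <= end
```

With these definitions, a record generated at 23:59:59 with empty content is flagged as drifted, while a record generated at 11:59:59 with content 10 is not.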
Step 103: When it is determined that data drift has occurred in the first data table for the business data of the target service, obtain from a second data table second data content associated with the drifted business data of the target service.
The first data table is different from the second data table.
In step 103, after data drift occurs, the data content of the business data may be stored in another data table, so the second data content associated with the drifted business data of the target service is obtained from the second data table.
Specifically, data content that was generated within a preset second time range and is associated with the target service is looked up in the second data table, where the preset second time range is used to characterize the extraction of business data by the data warehouse from the different system databases. When it is determined that the data content found is associated with the business data of the target service, that data content is taken as the second data content associated with the drifted business data of the target service.
It should be noted that the preset first time range and the preset second time range are different, but the time difference between the preset first time range and the preset second time range satisfies a set threshold.
The set threshold may be determined according to actual needs or according to the characteristics of the data drift.
First, a data table containing the service identifier of the target service is found among the other data tables (assumed here to be the second data table).
Second, data content that was generated within the preset second time range and is associated with the target service is looked up in the second data table. That is, according to the generation times of the business data included in the second data table, the business data whose generation time falls within the preset second time range is determined, and from that business data the data content that drifted from the first data table is determined.
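The lookup in step 103 might be sketched as follows. The function name find_second_content, the row layout, and the 30-minute value for the set threshold are assumptions made for this example; the threshold is interpreted here as the maximum gap between the drifted record and the candidate content. The second time range of 00:00:00 to 00:15:00 is the value used in the worked example of this description.

```python
from datetime import datetime, time, timedelta
from typing import Any, Dict, List, Optional, Tuple

# Example preset second time range: 00:00:00 to 00:15:00.
SECOND_RANGE: Tuple[time, time] = (time(0, 0, 0), time(0, 15, 0))


def find_second_content(
    second_table: List[Dict[str, Any]],
    service_id: str,
    drifted_at: datetime,
    second_range: Tuple[time, time] = SECOND_RANGE,
    max_gap: timedelta = timedelta(minutes=30),  # hypothetical set threshold
) -> Optional[int]:
    """Look up the drifted data content for a service in the second data table.

    A row qualifies when it carries the target service identifier, has
    non-empty content, was generated within the preset second time range,
    and follows the drifted record by no more than max_gap.
    """
    start, end = second_range
    for row in second_table:
        if row["service_id"] != service_id or row["content"] is None:
            continue
        generated_at = row["generated_at"]
        in_range = start <= generated_at.time() <= end
        close_enough = timedelta(0) <= generated_at - drifted_at <= max_gap
        if in_range and close_enough:
            return row["content"]
    return None
```

Returning None when nothing qualifies leaves it to the caller to decide how to treat a drifted record whose content cannot be recovered.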
Table 1 schematically shows the first data table and the second data table:
Table 1
Step 104: Merge the obtained first data content of the business data with the second data content of the business data, and perform a data cleansing operation on the merged data content.
In step 104, for the extracted business data, the first data content of the business data is merged with the second data content of the business data to obtain the complete data content of the business data.
In another embodiment of the present application, the data warehouse needs to update historical data after completing data extraction. The data warehouse therefore also obtains the historical data content of the business data of the target service, and merges the historical data content, the obtained first data content of the business data, and the second data content of the business data.
In another embodiment of the present application, for the business data in the data information extracted from the first data table that has not undergone data drift, the data warehouse may first merge the historical data content of the business data of the target service with the obtained first data content of the business data, and then merge the result with the obtained second data content of the business data.
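The merging described in step 104 and in the two further embodiments can be illustrated with a short sketch. It assumes, for illustration only, that data content is a numeric amount, that merging means accumulation, and that empty content is represented by None; the function name merge_contents is hypothetical.

```python
from typing import Optional


def merge_contents(*parts: Optional[int]) -> int:
    """Merge data contents by accumulating every non-empty part.

    Empty parts (None) contribute nothing, so a drifted record whose
    first data content is empty is completed by its second data content.
    """
    return sum(part for part in parts if part is not None)


# Merging historical, first, and second data content in one pass ...
one_pass = merge_contents(10, None, 20)

# ... or merging history with the first content, then adding the second.
two_pass = merge_contents(merge_contents(10, None), 20)
```

Under the accumulation assumption both orders give the same total, which is consistent with the description presenting them as alternative embodiments.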
With the data processing method provided by the embodiments of the present application, multiple groups of data information associated with a target service and stored in a first data table are obtained, where each group of data information includes the generation time of business data of the target service and first data content of the business data. When it is determined that data drift has occurred in the first data table for the business data of the target service, second data content associated with the drifted business data of the target service is obtained from a second data table, the first data table being different from the second data table. The obtained first data content of the business data is merged with the second data content of the business data, and a data cleansing operation is performed on the merged data content. In this way, before performing data cleansing, the data warehouse determines whether the obtained business data has undergone data drift and, if so, obtains the data content of the drifted business data and merges it with the rest. This avoids data content being left out of the accumulation during merging because of data drift and improves the accuracy of the business data stored in the data warehouse.
For example, the following groups of data information exist for the target service, as shown in Table 2:
Table 2
Service identifier of the target service | Generation time | Business data | Data content
1111 | 11:59:59 on the 1st | Payment | 10
1111 | 23:59:59 on the 2nd | Payment | Empty
1111 | 00:00:00 on the 3rd | Empty | 20
If the data warehouse extracts business data between 00:00:00 and 00:30:00 on the 2nd, the generation time of the business data is 11:59:59 on the 1st, which is outside the preset first time range (23:59:50 to 23:59:59), so the extracted data content of the business data of the target service is 10. If the data warehouse extracts business data between 00:00:00 and 00:30:00 on the 3rd, the generation time of the business data is 23:59:59 on the 2nd, which is within the preset first time range (23:59:50 to 23:59:59), so it is determined that data drift has occurred for this business data. The data content of the drifted business data then needs to be determined within the preset second time range (00:00:00 to 00:15:00), which yields 20. In this way, the data warehouse obtains relatively accurate business data for the target service, and the data information is not invalidated because of missing content. This avoids data content being left out of the accumulation during merging because of data drift and improves the accuracy of the business data stored in the data warehouse.
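The worked example based on Table 2 can be reproduced with the following sketch. The rows mirror Table 2; the calendar month, the field names, the function names, and the decision to keep all rows in a single list (rather than in separate first and second data tables) are simplifications assumed for this illustration.

```python
from datetime import date, datetime, time, timedelta
from typing import Any, Dict, List, Optional, Tuple

FIRST_RANGE = (time(23, 59, 50), time(23, 59, 59))   # preset first time range
SECOND_RANGE = (time(0, 0, 0), time(0, 15, 0))       # preset second time range

# Rows of Table 2; January 2024 is a hypothetical month.
ROWS: List[Dict[str, Any]] = [
    {"service_id": "1111", "generated_at": datetime(2024, 1, 1, 11, 59, 59),
     "business_data": "payment", "content": 10},
    {"service_id": "1111", "generated_at": datetime(2024, 1, 2, 23, 59, 59),
     "business_data": "payment", "content": None},
    {"service_id": "1111", "generated_at": datetime(2024, 1, 3, 0, 0, 0),
     "business_data": None, "content": 20},
]


def in_range(moment: datetime, bounds: Tuple[time, time]) -> bool:
    """Check whether the time of day of moment lies within bounds."""
    return bounds[0] <= moment.time() <= bounds[1]


def recover_content(row: Dict[str, Any], rows: List[Dict[str, Any]]) -> Optional[int]:
    """Return the row's content, fetching drifted content when needed."""
    if row["content"] is not None or not in_range(row["generated_at"], FIRST_RANGE):
        return row["content"]
    # Drift detected: look for content generated early on the next day.
    next_day = row["generated_at"].date() + timedelta(days=1)
    for candidate in rows:
        if (candidate["service_id"] == row["service_id"]
                and candidate["content"] is not None
                and candidate["generated_at"].date() == next_day
                and in_range(candidate["generated_at"], SECOND_RANGE)):
            return candidate["content"]
    return None


def extract_for_day(rows: List[Dict[str, Any]], day: date) -> int:
    """Accumulate the content of business data generated on the given day."""
    total = 0
    for row in rows:
        if row["business_data"] is None or row["generated_at"].date() != day:
            continue
        content = recover_content(row, rows)
        if content is not None:
            total += content
    return total
```

Here extract_for_day(ROWS, date(2024, 1, 1)) corresponds to the extraction performed on the 2nd, and extract_for_day(ROWS, date(2024, 1, 2)) to the extraction performed on the 3rd, where the empty content of the drifted record is replaced by the content generated at 00:00:00 on the 3rd.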
Fig. 2 is a schematic structural diagram of a data processing device provided by an embodiment of the present application. The data processing device includes an acquiring unit 21 and a processing unit 22, where:
the acquiring unit 21 is configured to obtain multiple groups of data information associated with a target service and stored in a first data table, where each group of data information includes the generation time of business data of the target service and first data content of the business data;
the acquiring unit 21 is further configured to, when it is determined that data drift has occurred in the first data table for the business data of the target service, obtain from a second data table second data content associated with the drifted business data of the target service, where the first data table is different from the second data table; and
the processing unit 22 is configured to merge the obtained first data content of the business data with the second data content of the business data, and to perform a data cleansing operation on the merged data content.
Specifically, the acquiring unit 21 determining that data drift has occurred in the first data table for the business data of the target service includes:
for one of the groups of data information, determining, according to the generation time of the business data of the target service included in that data information, whether the generation time falls within a preset first time range, where the preset first time range is determined according to the time at which the data warehouse extracts business data from the different system databases; and
if the generation time of the business data of the target service falls within the preset first time range, determining that data drift has occurred in the first data table for the business data of the target service.
Specifically, the acquiring unit 21 obtaining from the second data table the second data content associated with the drifted business data of the target service includes:
looking up, in the second data table, data content that was generated within a preset second time range and is associated with the target service, where the preset second time range is used to characterize the extraction of business data by the data warehouse from the different system databases; and
when it is determined that the data content found is associated with the business data of the target service, taking that data content as the second data content associated with the drifted business data of the target service.
Specifically, the processing unit 22 merging the obtained first data content of the business data with the second data content of the business data includes:
obtaining the historical data content of the business data of the target service; and
merging the historical data content, the obtained first data content of the business data, and the second data content of the business data.
It should be noted that the device provided by the embodiments of the present application may be implemented in hardware or in software, which is not limited here. Before performing data cleansing, the device determines whether the obtained business data has undergone data drift and, if so, obtains the data content of the drifted business data and merges it with the rest. This avoids data content being left out of the accumulation during merging because of data drift and improves the accuracy of the business data stored in the data warehouse.
A person skilled in the art will understand that the embodiments of the present application may be provided as a method, an apparatus (device), or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage, CD-ROM, and optical storage) that contain computer-usable program code.
The present application is described with reference to flowcharts and/or block diagrams of the method, apparatus (device), and computer program product according to the embodiments of the present application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, so that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus. The instruction apparatus implements the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing. The instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
Although preferred embodiments of the present application have been described, a person skilled in the art can make additional changes and modifications to these embodiments once the basic inventive concept is known. The appended claims are therefore intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present application.
Obviously, a person skilled in the art can make various changes and variations to the present application without departing from its scope. Thus, if these modifications and variations fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to include them.