CN104424220B - Data processing method and device - Google Patents


Info

Publication number
CN104424220B
Authority
CN
China
Legal status
Active
Application number
CN201310373788.6A
Other languages
Chinese (zh)
Other versions
CN104424220A
Inventor
黄晓锋 (Huang Xiaofeng)
Current Assignee
Alibaba China Network Technology Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201310373788.6A
Publication of CN104424220A
Application granted
Publication of CN104424220B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Abstract

This application discloses a data processing method and device. The method includes: obtaining the dimension data of at least one to-be-processed dimension of a to-be-processed data record; for each to-be-processed dimension, selecting, according to the dimension data of that dimension, the data processing unit that will process the to-be-processed data record from a plurality of preset data processing units corresponding to that dimension; distributing the to-be-processed data record to the selected data processing unit; and processing, by the selected data processing unit, the dimension data of that dimension of the to-be-processed data record. The scheme provided by the embodiments of the present application improves data processing efficiency.

Description

Data processing method and device
Technical field
This application relates to the field of computer technology, in particular to the technical field of data processing, and more particularly to a data processing method and device.
Background
Currently, practical applications of computer and Internet technology often require statistics, aggregation, and analysis over large amounts of data, for example summing data, deduplicating data, and finding the maximum or minimum value.
In the prior art, when streaming data is processed, a data source sends data records in batches through message-oriented middleware to a data processing device. The data processing device processes the dimension data of the to-be-processed dimension of the data records and obtains a processing result for each batch. It may further aggregate the processing results obtained from multiple batches, and it stores the data records and the final result in a database.
In this prior-art scheme, the data processing device processes data records serially: it must wait until the previous data record has been processed before processing the next one. Moreover, for a batch of data records only the dimension data of one dimension can be processed, so when multiple data dimensions need to be processed they can only be handled one after another. As a result, data processing efficiency is low.
Summary of the invention
In view of this, embodiments of the present application provide a data processing method and device to solve the prior-art problem of low data processing efficiency.
The embodiments of the present application are implemented through the following technical solutions:
An embodiment of the present application provides a data processing method, including:
Obtaining the dimension data of at least one to-be-processed dimension of a to-be-processed data record;
For each to-be-processed dimension, selecting, according to the dimension data of that dimension, the data processing unit that will process the to-be-processed data record from a plurality of preset data processing units corresponding to that dimension;
Distributing the to-be-processed data record to the selected data processing unit;
Processing, by the selected data processing unit, the dimension data of the to-be-processed dimension of the to-be-processed data record.
In the above data processing method provided by the embodiments of the present application, corresponding data processing units are preset for the different dimensions of the data records, so the dimension data of different dimensions can be processed in parallel by the data processing units of each dimension. In addition, because multiple data processing units are set for each dimension, the dimension data of the same dimension of multiple to-be-processed data records can also be processed in parallel, which improves data processing efficiency.
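The four steps above can be pictured with a minimal in-memory sketch. It assumes each dimension has a fixed number of units, models each unit as a list (its inbox) of received records, and uses CRC-32 as the hash function; `DimensionRouter`, `dispatch`, `process`, and `stable_hash` are hypothetical names, not part of the patent.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def stable_hash(value) -> int:
    # CRC-32 is deterministic across processes, unlike Python's hash() for str.
    return zlib.crc32(str(value).encode("utf-8"))


class DimensionRouter:
    """Hypothetical router: each dimension owns unit_count processing units,
    and each unit is modelled as a list (its inbox) of received records."""

    def __init__(self, dimensions, unit_count):
        self.inboxes = {dim: [[] for _ in range(unit_count)] for dim in dimensions}

    def dispatch(self, record: dict) -> None:
        # Steps 101-103: for every dimension, read the record's dimension data,
        # pick a unit by hash modulo unit count, and hand the record to it.
        for dim, units in self.inboxes.items():
            unit_id = stable_hash(record[dim]) % len(units)
            units[unit_id].append(record)

    def process(self, handler) -> dict:
        # Step 104: every (dimension, unit) inbox is independent of the others,
        # so each one is submitted to a pool and may run concurrently.
        jobs = [
            (dim, unit_id, inbox)
            for dim, units in self.inboxes.items()
            for unit_id, inbox in enumerate(units)
        ]
        with ThreadPoolExecutor() as pool:
            futures = {
                (dim, unit_id): pool.submit(handler, dim, inbox)
                for dim, unit_id, inbox in jobs
            }
            return {key: fut.result() for key, fut in futures.items()}
```

For example, `DimensionRouter(["payment", "postage"], unit_count=4)` followed by `process(lambda dim, inbox: sum(r[dim] for r in inbox))` yields one partial sum per (dimension, unit) pair. The thread pool only stands in for whatever parallel workers a real deployment would use; in CPython, threads do not speed up CPU-bound handlers because of the global interpreter lock.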
Further, selecting, according to the dimension data of the to-be-processed dimension, the data processing unit that will process the to-be-processed data record from the plurality of preset data processing units corresponding to that dimension specifically includes:
Determining the hash code of the dimension data of the to-be-processed dimension;
Taking the hash code of the dimension data modulo the number of data processing units corresponding to the to-be-processed dimension to obtain a remainder;
Selecting, from the plurality of data processing units, the data processing unit whose unit ID equals the remainder as the data processing unit that will process the to-be-processed data record.
In this way, the data processing unit that will process the to-be-processed data record can be selected accurately from the plurality of data processing units according to the hash code of the dimension data of the to-be-processed dimension.
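As a worked illustration of this rule (the numbers are invented for the example), write $H(d)$ for the hash code of dimension data $d$ and $N$ for the number of data processing units of the dimension:

$$\mathrm{unitID}(d) = H(d) \bmod N, \qquad 0 \le \mathrm{unitID}(d) \le N - 1$$

With $H(d) = 1234567$ and $N = 8$: $1234567 = 8 \times 154320 + 7$, so the record goes to the unit with ID 7. Because $H$ is deterministic, every record with the same dimension data lands on the same unit. This assumes $H(d)$ is non-negative; a signed hash code such as Java's `String.hashCode()` would first need normalizing, for example with `Math.floorMod`.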
Further, processing, by the selected data processing unit, the dimension data of the to-be-processed dimension of the to-be-processed data record specifically includes:
Determining, by the selected data processing unit, the hash code of the unique identifier of the to-be-processed data record;
Determining, according to the last preset number of bits of the hash code of the unique identifier, the data set corresponding to those bits from a plurality of data sets that store already-received data records, as the data set to be queried. Within each of the data sets, the hash codes of the unique identifiers of the stored data records share the same last preset number of bits, while different data sets correspond to different values of those bits;
When the to-be-processed data record is not present in the determined data set to be queried, processing the dimension data of the to-be-processed dimension of the to-be-processed data record.
In this way, before the dimension data of the to-be-processed data record is processed, deduplication is first performed using the multiple data sets that store the already-received data records. Deduplication no longer requires searching all received data records; only one of the data sets needs to be queried. This reduces the computation required for deduplication and further improves data processing efficiency.
Further, the above data processing method also includes:
Discarding, according to the timestamps of the data records stored in the plurality of data sets, the data records in those data sets that meet a preset discard condition, where the timestamp of a data record is the time at which it was saved to the data set.
In this way, storage space in the data sets is saved and the amount of data stored in them is reduced, which shortens query time during deduplication and improves lookup efficiency.
Further, the above data processing method also includes:
For the to-be-processed dimension, aggregating the processing results that the plurality of data processing units each obtain by processing the dimension data of that dimension of the data records they respectively received.
An embodiment of the present application also provides a data processing device, including:
An acquiring unit, configured to obtain the dimension data of at least one to-be-processed dimension of a to-be-processed data record;
A selecting unit, configured to, for each to-be-processed dimension, select, according to the dimension data of that dimension, the data processing unit that will process the to-be-processed data record from a plurality of preset data processing units corresponding to that dimension;
A dispatching unit, configured to distribute the to-be-processed data record to the selected data processing unit;
A data processing unit, configured to process the dimension data of the to-be-processed dimension of the to-be-processed data record distributed to it.
In the above data processing device provided by the embodiments of the present application, corresponding data processing units are preset for the different dimensions of the data records, so the dimension data of different dimensions can be processed in parallel by the data processing units of each dimension. In addition, because multiple data processing units are set for each dimension, the dimension data of the same dimension of multiple to-be-processed data records can also be processed in parallel, which improves data processing efficiency.
Further, the selecting unit is specifically configured to: determine the hash code of the dimension data of the to-be-processed dimension; take that hash code modulo the number of data processing units corresponding to the to-be-processed dimension to obtain a remainder; and select, from the plurality of data processing units, the data processing unit whose unit ID equals the remainder as the data processing unit that will process the to-be-processed data record.
In this way, the data processing unit that will process the to-be-processed data record can be selected accurately from the plurality of data processing units according to the hash code of the dimension data of the to-be-processed dimension.
Further, the data processing unit is specifically configured to: determine the hash code of the unique identifier of the to-be-processed data record; determine, according to the last preset number of bits of that hash code, the data set corresponding to those bits from a plurality of data sets that store already-received data records, as the data set to be queried, where within each data set the hash codes of the unique identifiers of the stored data records share the same last preset number of bits and different data sets correspond to different values of those bits; and, when the to-be-processed data record is not present in the determined data set to be queried, process the dimension data of the to-be-processed dimension of the to-be-processed data record.
In this way, before the dimension data of the to-be-processed data record is processed, deduplication is first performed using the multiple data sets that store the already-received data records. Deduplication no longer requires searching all received data records; only one of the data sets needs to be queried. This reduces the computation required for deduplication and further improves data processing efficiency.
Further, the above data processing device also includes:
A discarding unit, configured to discard, according to the timestamps of the data records stored in the plurality of data sets, the data records in those data sets that meet a preset discard condition, where the timestamp of a data record is the time at which it was saved to the data set.
In this way, storage space in the data sets is saved and the amount of data stored in them is reduced, which shortens query time during deduplication and improves lookup efficiency.
Further, the above data processing device also includes:
An aggregation unit, configured to aggregate, for the to-be-processed dimension, the processing results that the plurality of data processing units each obtain by processing the dimension data of that dimension of the data records they respectively received.
Other features and advantages will be set forth in the following description, and will in part become apparent from the description or be understood by implementing the present application. The objectives and other advantages of the present application can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the drawings.
Brief description of the drawings
The drawings are provided for a further understanding of the present application and constitute part of the description. Together with the embodiments, they serve to explain the present application and do not limit it. In the drawings:
Fig. 1 is a flowchart of the data processing method provided by an embodiment of the present application;
Fig. 2 is a flowchart of selecting the data processing unit that will process the to-be-processed data record in the data processing method provided by an embodiment of the present application;
Fig. 3 is a flowchart of the data processing unit processing the dimension data of the to-be-processed data record in the data processing method provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of the data processing device provided by an embodiment of the present application.
Detailed description of the embodiments
To provide an implementation that improves data processing efficiency, embodiments of the present application provide a data processing method and device. They can be applied to any process that handles data and can be implemented either as a method or as a device. Preferred embodiments of the present application are described below with reference to the drawings. It should be understood that the preferred embodiments described here only describe and explain the present application and do not limit it. Where no conflict arises, the embodiments and the features in the embodiments may be combined with one another.
An embodiment of the present application provides a data processing method which, as shown in Fig. 1, includes:
Step 101: obtain the dimension data of at least one to-be-processed dimension of a to-be-processed data record.
Step 102: for each to-be-processed dimension, select, according to the dimension data of that dimension, the data processing unit that will process the to-be-processed data record from a plurality of preset data processing units corresponding to that dimension.
Step 103: distribute the to-be-processed data record to the selected data processing unit.
Step 104: the selected data processing unit processes the dimension data of the to-be-processed dimension of the to-be-processed data record.
The method and device provided by the present application are described in detail below with reference to the drawings and specific embodiments.
In the embodiments of the present application, the to-be-processed data records obtained in step 101 may be transmitted continuously to the data processing device as streaming data. They may be data records of various types, for example records related to Internet technology, such as the transaction data records of an e-commerce website.
The to-be-processed dimensions can be set in advance according to actual data processing needs, and several of them can be configured, so that the data records can subsequently be processed in parallel for different dimensions, improving data processing efficiency. A to-be-processed dimension can be any data dimension of the data record. For a transaction data record, for example, it may be a buyer payment amount dimension, whose dimension data is the amount the buyer paid for the goods in the record; a seller receipt amount dimension, whose dimension data is the amount the seller received for selling the goods; or a postage amount dimension, whose dimension data is the postage the buyer pays when the seller ships the goods to the buyer.
Further, to reduce the computation required when the data records are processed later, before step 101 the original data records received as streaming data may be preprocessed to keep only the data needed for subsequent processing, yielding the to-be-processed data records.
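The preprocessing step can be sketched as a projection of each raw streaming record onto its unique identifier and the configured dimensions. This is a minimal sketch: the raw field names (`order_id`, `buyer_paid_amount`, and so on), `PendingRecord`, and `preprocess` are all hypothetical, since the patent names only the three dimensions.

```python
from dataclasses import dataclass, field
from decimal import Decimal

# Hypothetical mapping from dimension name to raw field name; the patent only
# names the three dimensions, not the fields that carry them.
DIMENSION_FIELDS = {
    "buyer_payment": "buyer_paid_amount",
    "seller_receipt": "seller_received_amount",
    "postage": "postage_amount",
}


@dataclass
class PendingRecord:
    record_id: str                                  # unique identifier, e.g. an order number
    dimensions: dict = field(default_factory=dict)  # dimension name -> dimension data


def preprocess(raw: dict) -> PendingRecord:
    """Keep only the unique identifier and the configured dimension data,
    dropping every other field of the raw streaming record."""
    dims = {
        dim: Decimal(str(raw[src]))  # Decimal avoids float rounding on money amounts
        for dim, src in DIMENSION_FIELDS.items()
        if src in raw
    }
    return PendingRecord(record_id=str(raw["order_id"]), dimensions=dims)
```

Fields outside `DIMENSION_FIELDS` (a buyer nickname, for instance) never reach the data processing units, which is where the saving in later computation comes from.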
In the embodiments of the present application, a corresponding plurality of data processing units is set in advance for each data dimension, so that the dimension data of the to-be-processed dimension of multiple to-be-processed data records can be processed in parallel, improving processing efficiency. Each data processing unit can also be assigned a unit ID; the unit IDs can be the integers from 0 up to, but not including, the number of data processing units, so that every possible remainder of the modulo operation below corresponds to exactly one unit.
Correspondingly, in step 102, selecting, according to the dimension data of a to-be-processed dimension, the data processing unit that will process the to-be-processed data record from the plurality of preset data processing units corresponding to that dimension may specifically include the following steps, as shown in Fig. 2:
Step 201: determine the hash code of the obtained dimension data of the to-be-processed dimension.
Step 202: take the hash code of the dimension data modulo the number of data processing units corresponding to the to-be-processed dimension to obtain a remainder.
Step 203: from the plurality of data processing units, select the unit whose unit ID equals the remainder as the data processing unit that will process the to-be-processed data record.
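Steps 201 to 203 reduce to a few lines. This minimal sketch assumes string dimension data, unit IDs 0 to `unit_count - 1`, and CRC-32 as the hash; `select_unit` is a hypothetical name, and the patent does not specify which hash function is used.

```python
import zlib


def select_unit(dimension_data: str, unit_count: int) -> int:
    """Return the ID of the unit that should process a record whose
    dimension data is `dimension_data` (unit IDs assumed 0..unit_count-1)."""
    if unit_count <= 0:
        raise ValueError("unit_count must be positive")
    # Step 201: hash the dimension data (CRC-32 is an assumed choice).
    hash_code = zlib.crc32(dimension_data.encode("utf-8"))
    # Steps 202-203: the remainder modulo the unit count is the unit ID.
    return hash_code % unit_count
```

Python's built-in `hash()` is avoided on purpose: for strings it is randomized per process, so two workers could route the same dimension data to different units. `zlib.crc32` returns an unsigned value, so the remainder is never negative.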
In the embodiments of the present application, other approaches similar to the unit selection shown in Fig. 2 may also be used to select, according to the obtained dimension data, the data processing unit that will process the to-be-processed data record from the plurality of data processing units; they are not described in detail here.
In the above method provided by the embodiments of the present application, after the to-be-processed data record is distributed to the selected data processing unit, the selected unit can process the dimension data of the to-be-processed dimension of the record in step 104, which may specifically include the following steps, as shown in Fig. 3:
Step 301: the selected data processing unit determines the hash code of the unique identifier of the to-be-processed data record.
The unique identifier distinguishes different to-be-processed data records; for transaction record data, for example, it may be the transaction order number.
Step 302: according to the last preset number of bits of the hash code of the unique identifier, determine, from the plurality of data sets that store already-received data records, the data set corresponding to those bits, as the data set to be queried.
Within each of the data sets, the hash codes of the unique identifiers of the stored data records share the same last preset number of bits, and different data sets correspond to different values of those bits.
In the embodiments of the present application, after receiving a to-be-processed data record, the data processing unit can save it to a data set. When saving, data records whose unique-identifier hash codes share the same last preset number of bits are saved to the same data set, so that newly received to-be-processed data records can later be deduplicated against the data records stored in the data sets.
The preset number of bits can be set flexibly according to actual needs, for example according to the total number of bits in the hash code of the unique identifier.
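A small worked example, assuming an unsigned 32-bit hash code $H(u)$ of the unique identifier $u$ and a preset number of bits $k$ (values chosen only for illustration):

$$\mathrm{setIndex}(u) = H(u) \bmod 2^{k} = H(u) \mathbin{\&} (2^{k} - 1), \qquad 0 \le \mathrm{setIndex}(u) < 2^{k}$$

With $k = 4$ there are $2^4 = 16$ data sets. If $H(u) = \texttt{0x1A2B3C4D}$, its last four bits are $\texttt{0xD} = 13$, so the record is stored in, and later checked against, data set 13. A larger $k$ gives more, smaller data sets, and $k$ cannot exceed the 32 bits of the hash code.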
Step 303: query the data set to be queried for the to-be-processed data record. If the record is not present in that data set, process the dimension data of the to-be-processed dimension of the record. If the record is present, it has already been received and does not need to be processed again; that is, processing of the record is cancelled, and the record may further be discarded.
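Steps 301 to 303 can be sketched as a store of `2**k` Python sets. This is a minimal sketch under stated assumptions: record IDs are strings, CRC-32 is the hash, and `DedupSets`, `accept`, and `k_bits` are hypothetical names.

```python
import zlib


class DedupSets:
    """Hypothetical store: 2**k_bits sets of already-received record IDs,
    where a record's set is chosen by the last k_bits bits of its hash code."""

    def __init__(self, k_bits: int):
        self.mask = (1 << k_bits) - 1
        self.sets = [set() for _ in range(1 << k_bits)]

    def _set_for(self, record_id: str) -> set:
        hash_code = zlib.crc32(record_id.encode("utf-8"))  # step 301
        return self.sets[hash_code & self.mask]            # step 302: last k bits pick the set

    def accept(self, record_id: str) -> bool:
        """Return True if the record is new (and remember it), False if it is a duplicate."""
        bucket = self._set_for(record_id)
        if record_id in bucket:  # step 303: only this one set is queried
            return False
        bucket.add(record_id)
        return True
```

A caller would process the record's dimension data only when `accept(...)` returns True. In this in-memory sketch the split is mostly illustrative, since a single Python set already offers average constant-time membership checks; partitioning pays off more when each data set is held or queried separately.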
In the above method provided by the embodiments of the present application, after the plurality of data processing units corresponding to the to-be-processed dimension have each processed the dimension data of that dimension of the data records they received and obtained corresponding processing results, these results may further be aggregated for that dimension. For example, for data summation the results can be added together, and for finding the data maximum the maximum can be taken over these results.
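A minimal sketch of this aggregation step, assuming each unit reports one partial result per dimension and reports `None` when it received no records; `COMBINERS` and `aggregate` are hypothetical names.

```python
from functools import reduce

# Hypothetical combiners: each merges two partial results of one dimension.
COMBINERS = {
    "sum": lambda a, b: a + b,
    "max": max,
    "min": min,
}


def aggregate(partials, op: str):
    """Combine the per-unit partial results of one dimension into a final result."""
    non_empty = [p for p in partials if p is not None]  # units with no records report None
    if not non_empty:
        return None
    return reduce(COMBINERS[op], non_empty)
```

For instance, `aggregate([120, 45, None, 300], "sum")` combines three units' partial sums into 465. This works because sum, maximum, and minimum are associative and commutative, so partial results can be merged in any order; an average would instead need each unit to report a (sum, count) pair.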
After the final result of the aggregation is obtained, the final result corresponding to each dimension may also be output to a preset storage system for storage.
In the above method provided by the embodiments of the present application, when a data record is saved to a data set, the time at which it is saved may also be recorded as its timestamp, so that data records in the data sets that meet a preset discard condition can be discarded according to their timestamps. For example, data records stored for longer than a predetermined period may be discarded, or data records whose timestamps precede a predetermined time may be discarded. This saves storage space in the data sets and reduces the amount of data stored in them, which shortens query time during deduplication and improves lookup efficiency.
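Both discard conditions mentioned above can be sketched on a single data set that records a save timestamp per record ID. This is a minimal sketch: timestamps are assumed to be seconds, and `TimedSet`, `discard_older_than`, and `discard_before` are hypothetical names.

```python
import time
from typing import Callable, Dict, Optional


class TimedSet:
    """One data set that remembers when each record ID was saved (its timestamp)."""

    def __init__(self) -> None:
        self.saved_at: Dict[str, float] = {}  # record_id -> save time in seconds

    def add(self, record_id: str, now: Optional[float] = None) -> None:
        self.saved_at[record_id] = time.time() if now is None else now

    def _drop(self, is_expired: Callable[[float], bool]) -> int:
        # Collect first, then delete, so the dict is not mutated while iterating.
        expired = [rid for rid, ts in self.saved_at.items() if is_expired(ts)]
        for rid in expired:
            del self.saved_at[rid]
        return len(expired)

    def discard_older_than(self, max_age: float, now: Optional[float] = None) -> int:
        """Drop records held longer than max_age seconds; return how many were dropped."""
        now = time.time() if now is None else now
        return self._drop(lambda ts: now - ts > max_age)

    def discard_before(self, cutoff: float) -> int:
        """Drop records whose timestamp precedes a fixed cutoff time."""
        return self._drop(lambda ts: ts < cutoff)
```

The optional `now` argument makes the behaviour testable with fixed times; in production it would normally be left to default to the current clock.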
Based on the same inventive concept, and corresponding to the data processing method provided by the above embodiments, another embodiment of the present application further provides a data processing device, whose structure is shown schematically in Fig. 4 and which specifically includes:
An acquiring unit 401, configured to obtain the dimension data of at least one to-be-processed dimension of a to-be-processed data record;
A selecting unit 402, configured to, for each to-be-processed dimension, select, according to the dimension data of that dimension, the data processing unit that will process the to-be-processed data record from a plurality of preset data processing units corresponding to that dimension;
A dispatching unit 403, configured to distribute the to-be-processed data record to the selected data processing unit;
A data processing unit 404, configured to process the dimension data of the to-be-processed dimension of the to-be-processed data record distributed to it.
Further, the selecting unit 402 is specifically configured to: determine the hash code of the dimension data of the to-be-processed dimension; take that hash code modulo the number of data processing units corresponding to the to-be-processed dimension to obtain a remainder; and select, from the plurality of data processing units, the data processing unit whose unit ID equals the remainder as the data processing unit that will process the to-be-processed data record.
Further, the data processing unit 404 is specifically configured to: determine the hash code of the unique identifier of the to-be-processed data record; determine, according to the last preset number of bits of that hash code, the data set corresponding to those bits from a plurality of data sets that store already-received data records, as the data set to be queried, where within each data set the hash codes of the unique identifiers of the stored data records share the same last preset number of bits and different data sets correspond to different values of those bits; and, when the to-be-processed data record is not present in the determined data set to be queried, process the dimension data of the to-be-processed dimension of the to-be-processed data record.
Further, the above data processing device also includes:
A discarding unit 405, configured to discard, according to the timestamps of the data records stored in the plurality of data sets, the data records in those data sets that meet a preset discard condition, where the timestamp of a data record is the time at which it was saved to the data set.
Further, the above data processing device also includes:
An aggregation unit 406, configured to aggregate, for the to-be-processed dimension, the processing results that the plurality of data processing units each obtain by processing the dimension data of that dimension of the data records they respectively received.
The functions of the above units correspond to the respective processing steps in the flows shown in Figs. 1 to 3 and are not repeated here.
In summary, the scheme provided by the embodiments of the present application includes: obtaining the dimension data of at least one to-be-processed dimension of a to-be-processed data record; for each to-be-processed dimension, selecting, according to the dimension data of that dimension, the data processing unit that will process the to-be-processed data record from a plurality of preset data processing units corresponding to that dimension; distributing the to-be-processed data record to the selected data processing unit; and processing, by the selected data processing unit, the dimension data of that dimension of the to-be-processed data record. This scheme improves data processing efficiency.
The data processing device provided by the embodiments of the present application can be implemented by a computer program. Those skilled in the art should understand that the above division into modules is only one of many possible divisions; whether the device is divided into other modules or not divided into modules at all, it falls within the protection scope of the present application as long as it has the above functions.
The present application is described with reference to flowcharts and/or block diagrams of the method, equipment (system), and computer program product according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular way, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps is performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, the computer device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
Obviously, those skilled in the art can carry out the application essence of the various modification and variations without departing from the application God and range.In this way, if these modifications and variations of the application belong to the range of the application claim and its equivalent technologies Within, then the application is also intended to include these modifications and variations.

Claims (10)

1. A data processing method, characterized by comprising:
Obtaining the dimension data of at least one pending dimension of a pending data record;
For each pending dimension, selecting, according to the dimension data of that pending dimension, from a plurality of preset data processing units corresponding to that pending dimension, the data processing unit that will process the pending data record;
Distributing the pending data record to the selected data processing unit;
Saving, by the data processing unit, the received pending data record into a data set, wherein, when saving, data records whose hash codes of unique identification data share the same last preset number of bits are saved into the same data set, so that deduplication can subsequently be performed on newly received pending data records based on the data records saved in the data sets;
Processing, by the selected data processing unit, the dimension data of the pending dimension of the pending data record.
2. The method according to claim 1, characterized in that selecting, according to the dimension data of the pending dimension, from the plurality of preset data processing units corresponding to the pending dimension, the data processing unit that will process the pending data record specifically comprises:
Determining the hash code of the dimension data of the pending dimension;
Taking the hash code of the dimension data modulo the number of data processing units corresponding to the pending dimension to obtain a remainder value;
Selecting, from the plurality of data processing units, the data processing unit whose unit ID equals the remainder value as the data processing unit that will process the pending data record.
3. The method according to claim 1, characterized in that processing, by the selected data processing unit, the dimension data of the pending dimension of the pending data record specifically comprises:
Determining, by the selected data processing unit, the hash code of the unique identification data of the pending data record;
Determining, according to the last preset number of bits of the hash code of the unique identification data, from a plurality of data sets that save already-received data records, the data set corresponding to those last bits as the data set to be checked, wherein all data records saved in any one of the plurality of data sets have identical last preset number of bits in the hash codes of their unique identification data, and these bits differ from one data set to another;
When the pending data record is not present in the determined data set to be checked, processing the dimension data of the pending dimension of the pending data record.
4. The method according to claim 3, characterized by further comprising:
Discarding, according to the timestamps of the data records saved in the plurality of data sets, the data records in the plurality of data sets that meet a preset discard condition, wherein the timestamp of a data record is the time at which the data record was saved into the data set.
5. The method according to any one of claims 1 to 4, characterized by further comprising:
For the pending dimension, aggregating the processing results obtained by each of the plurality of data processing units from processing the dimension data of that pending dimension in the data records it received.
6. A data processing device, characterized by comprising:
An acquiring unit, configured to obtain the dimension data of at least one pending dimension of a pending data record;
A selecting unit, configured to select, for each pending dimension and according to the dimension data of that pending dimension, from a plurality of preset data processing units corresponding to that pending dimension, the data processing unit that will process the pending data record;
A dispatching unit, configured to distribute the pending data record to the selected data processing unit;
A storage unit, configured for the data processing unit to save the received pending data record into a data set, wherein, when saving, data records whose hash codes of unique identification data share the same last preset number of bits are saved into the same data set, so that deduplication can subsequently be performed on newly received pending data records based on the data records saved in the data sets;
The data processing unit, configured to process the dimension data of the pending dimension of the pending data record distributed to it.
7. The device according to claim 6, characterized in that the selecting unit is specifically configured to: determine the hash code of the dimension data of the pending dimension; take the hash code of the dimension data modulo the number of data processing units corresponding to the pending dimension to obtain a remainder value; and select, from the plurality of data processing units, the data processing unit whose unit ID equals the remainder value as the data processing unit that will process the pending data record.
8. The device according to claim 6, characterized in that the data processing unit is specifically configured to: determine the hash code of the unique identification data of the pending data record; determine, according to the last preset number of bits of that hash code, from a plurality of data sets that save already-received data records, the data set corresponding to those last bits as the data set to be checked, wherein all data records saved in any one of the plurality of data sets have identical last preset number of bits in the hash codes of their unique identification data, and these bits differ from one data set to another; and, when the pending data record is not present in the determined data set to be checked, process the dimension data of the pending dimension of the pending data record.
9. The device according to claim 8, characterized by further comprising:
A discarding unit, configured to discard, according to the timestamps of the data records saved in the plurality of data sets, the data records in the plurality of data sets that meet a preset discard condition, wherein the timestamp of a data record is the time at which the data record was saved into the data set.
10. The device according to any one of claims 6 to 9, characterized by further comprising:
An aggregating unit, configured to aggregate, for the pending dimension, the processing results obtained by each of the plurality of data processing units from processing the dimension data of that pending dimension in the data records it received.
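The method of claims 1 to 5 (and the matching device of claims 6 to 10) can be illustrated with a minimal Python sketch. It is not the patentee's implementation, and several details are assumptions not stated in the claims: MD5 (first 8 bytes) as the hash function, 4 trailing bits to choose a data set, "saved for longer than a TTL" as the preset discard condition, and counting occurrences of each dimension value as the per-dimension processing. All names (`stable_hash`, `ProcessingUnit`, `Dispatcher`, `suffix_bits`, `ttl_seconds`) are hypothetical.

```python
from __future__ import annotations

import hashlib
import time
from collections import Counter, defaultdict
from dataclasses import dataclass, field


def stable_hash(value: str) -> int:
    # Assumption: the claims only say "hash code"; MD5 is used here because it is
    # deterministic across processes, unlike Python's built-in hash() for str.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


@dataclass
class ProcessingUnit:
    unit_id: int
    suffix_bits: int = 4          # hypothetical "preset number of last bits" (16 data sets)
    ttl_seconds: float = 3600.0   # hypothetical discard condition: saved longer than TTL
    # data set index (last bits of the ID hash) -> {record ID: save timestamp}
    data_sets: dict = field(default_factory=lambda: defaultdict(dict))
    counts: Counter = field(default_factory=Counter)  # example processing result

    def data_set_index(self, record_id: str) -> int:
        # Keep only the last `suffix_bits` bits of the unique-ID hash code.
        return stable_hash(record_id) & ((1 << self.suffix_bits) - 1)

    def process(self, record_id: str, dim_value: str, now: float) -> bool:
        # Claims 1 and 3: check only the data set selected by the ID's last bits.
        data_set = self.data_sets[self.data_set_index(record_id)]
        if record_id in data_set:
            return False              # duplicate record: skipped
        data_set[record_id] = now     # save with its timestamp (used by claim 4)
        self.counts[dim_value] += 1   # hypothetical per-dimension processing: counting
        return True

    def discard_expired(self, now: float) -> int:
        # Claim 4: drop saved records whose timestamp meets the discard condition.
        dropped = 0
        for data_set in self.data_sets.values():
            expired = [rid for rid, ts in data_set.items() if now - ts > self.ttl_seconds]
            for rid in expired:
                del data_set[rid]
            dropped += len(expired)
        return dropped


class Dispatcher:
    def __init__(self, units_per_dimension: dict[str, int]):
        # Preset processing units for each pending dimension; unit IDs are 0..n-1.
        self.units = {dim: [ProcessingUnit(i) for i in range(n)]
                      for dim, n in units_per_dimension.items()}

    def select_unit(self, dim: str, dim_value: str) -> ProcessingUnit:
        # Claim 2: unit ID = hash(dimension data) mod number of units for that dimension.
        units = self.units[dim]
        return units[stable_hash(dim_value) % len(units)]

    def dispatch(self, record: dict, now: float | None = None) -> None:
        # Claim 1: route the record once per pending dimension.
        now = time.time() if now is None else now
        for dim in self.units:
            value = record[dim]
            self.select_unit(dim, value).process(record["id"], value, now)

    def discard_expired(self, now: float) -> int:
        # Apply the claim 4 discard step on every unit of every dimension.
        return sum(u.discard_expired(now) for units in self.units.values() for u in units)

    def aggregate(self, dim: str) -> Counter:
        # Claim 5: combine the partial results of every unit serving this dimension.
        total = Counter()
        for unit in self.units[dim]:
            total.update(unit.counts)
        return total
```

Because a record is routed by its dimension value, a repeated record (same ID, same dimension value) always reaches the same unit, which is what lets each unit deduplicate locally. The trailing-bit data sets partition the saved IDs so that each duplicate check touches only one set.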
CN201310373788.6A 2013-08-23 2013-08-23 A kind of data processing method and device Active CN104424220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310373788.6A CN104424220B (en) 2013-08-23 2013-08-23 A kind of data processing method and device

Publications (2)

Publication Number Publication Date
CN104424220A CN104424220A (en) 2015-03-18
CN104424220B true CN104424220B (en) 2018-07-13

Family

ID=52973217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310373788.6A Active CN104424220B (en) 2013-08-23 2013-08-23 A kind of data processing method and device

Country Status (1)

Country Link
CN (1) CN104424220B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701201B (en) * 2016-01-12 2019-05-07 浪潮通用软件有限公司 A kind of method and device of data processing
CN107180017B (en) * 2016-03-11 2021-05-28 阿里巴巴集团控股有限公司 Sample serialization method and device
CN107229663B (en) * 2016-03-25 2022-05-27 阿里巴巴集团控股有限公司 Data processing method and device and data table processing method and device
CN107610468A (en) * 2017-09-28 2018-01-19 航天科技控股集团股份有限公司 Speed density Analysis System and method based on recorder management
CN112162859A (en) * 2020-09-24 2021-01-01 成都长城开发科技有限公司 Data processing method and device, computer readable medium and electronic equipment

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101539950A (en) * 2009-05-08 2009-09-23 成都市华为赛门铁克科技有限公司 Data storage method and device
CN102467458A (en) * 2010-11-05 2012-05-23 英业达股份有限公司 Method for establishing index of data block
CN103136217A (en) * 2011-11-24 2013-06-05 阿里巴巴集团控股有限公司 Distributed data flow processing method and system thereof

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US7457872B2 (en) * 2003-10-15 2008-11-25 Microsoft Corporation On-line service/application monitoring and reporting system

Also Published As

Publication number Publication date
CN104424220A (en) 2015-03-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211117

Address after: No. 699, Wangshang Road, Binjiang District, Hangzhou, Zhejiang

Patentee after: Alibaba (China) Network Technology Co.,Ltd.

Address before: P.O. Box 847, Fourth Floor, One Capital Place, Grand Cayman, Cayman Islands (British Overseas Territory)

Patentee before: ALIBABA GROUP HOLDING Ltd.