CN104424220B - Data processing method and device - Google Patents
- Publication number: CN104424220B (application number CN201310373788.6A)
- Authority: CN (China)
- Prior art keywords: data, pending, dimension, record, processing unit
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
Abstract
This application discloses a data processing method and device. The method includes: obtaining the dimension data of at least one pending dimension of a pending data record; for each pending dimension, selecting, according to the dimension data of that dimension and from a plurality of preset data processing units corresponding to that dimension, the data processing unit that will process the pending data record; distributing the pending data record to the selected data processing unit; and processing, by the selected data processing unit, the dimension data of the pending dimension of the pending data record. The scheme provided by the embodiments of this application improves the efficiency of data processing.
Description
Technical field
This application relates to the field of computer technology, in particular to the technical field of data processing, and more particularly to a data processing method and device.
Background
At present, practical applications of computer and Internet technology often require large amounts of data to be processed by statistics, aggregation, and analysis, for example summing data, de-duplicating data, and finding maximum and minimum values.
In the prior art, when streaming data is processed, the data source sends data records in batches through message-oriented middleware to a data processing device. The data processing device processes the dimension data of the pending dimension of the data records and obtains a processing result for each batch. Further, the multiple processing results obtained from multiple batches can be comprehensively accumulated, and the data records and the final results are stored in a database.
In this prior-art scheme, the data processing device processes data records serially: it must wait until the previous data record has been processed before processing the next one. Moreover, for a batch of data records, only the dimension data of one dimension can be processed at a time, so when several data dimensions need to be processed they can only be handled one after another. As a result, data processing is inefficient.
Summary
In view of this, the embodiments of this application provide a data processing method and device to solve the problem of low data processing efficiency in the prior art.
The embodiments of this application are implemented through the following technical solutions:
An embodiment of this application provides a data processing method, including:
Obtaining the dimension data of at least one pending dimension of a pending data record;
For each pending dimension, selecting, according to the dimension data of the pending dimension and from a plurality of preset data processing units corresponding to the pending dimension, the data processing unit that will process the pending data record;
Distributing the pending data record to the selected data processing unit;
Processing, by the selected data processing unit, the dimension data of the pending dimension of the pending data record.
In the above data processing method provided by the embodiments of this application, corresponding data processing units are preset for the different dimensions of a data record, so that the dimension data of different dimensions can be processed in parallel by the data processing units of each dimension. Moreover, because multiple data processing units are set for each dimension, the dimension data of that dimension from multiple pending data records can also be processed in parallel, which improves data processing efficiency.
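As a rough illustration of the parallelism described above, the following Python sketch runs every (dimension, unit) partition concurrently. It assumes, for illustration only, that a unit simply sums the values routed to it and that a thread pool stands in for independent units; the names `unit_process` and `process_in_parallel` are hypothetical and do not come from the patent. In a real stream system the units would more likely be separate processes or machines, because CPython threads do not execute CPU-bound Python code in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def unit_process(values):
    # Hypothetical data processing unit: here it only sums the dimension data it received.
    return sum(values)


def process_in_parallel(unit_inputs):
    """Run every (dimension, unit_id) partition concurrently and collect the results.

    unit_inputs maps (dimension, unit_id) -> list of dimension values routed to that unit.
    """
    with ThreadPoolExecutor() as pool:
        futures = {key: pool.submit(unit_process, values) for key, values in unit_inputs.items()}
        return {key: future.result() for key, future in futures.items()}
```

Because the partitions share no state, the collected results do not depend on the order in which the units finish.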
Further, selecting, according to the dimension data of the pending dimension and from the plurality of preset data processing units corresponding to the pending dimension, the data processing unit that will process the pending data record specifically includes:
Determining the hash code of the dimension data of the pending dimension;
Taking the hash code of the dimension data modulo the number of data processing units corresponding to the pending dimension, to obtain a remainder value;
Selecting, from the plurality of data processing units, the data processing unit whose unit ID equals the remainder value as the data processing unit that will process the pending data record.
In this way, the data processing unit that will process the pending data record can be selected accurately from the plurality of data processing units according to the hash code of the dimension data of the pending dimension.
Further, processing, by the selected data processing unit, the dimension data of the pending dimension of the pending data record specifically includes:
The selected data processing unit determines the hash code of the unique identification data of the pending data record;
According to the last preset number of bits of the hash code of the unique identification data, determining, from among multiple data sets that store already-received data records, the data set corresponding to those bits as the data set to be queried, where, within each of the multiple data sets, the hash codes of the unique identification data of the stored data records share the same last preset number of bits, and these bits differ between different data sets;
When the pending data record is not present in the determined data set to be queried, processing the dimension data of the pending dimension of the pending data record.
In this way, before the dimension data of a pending data record is processed, de-duplication is first performed against the multiple data sets that store received data records. De-duplication no longer requires querying all received data records; only one of the multiple data sets needs to be queried, which reduces the computation required for de-duplication and further improves data processing efficiency.
Further, the above data processing method also includes:
According to the timestamps of the data records stored in the multiple data sets, discarding the data records in the multiple data sets that meet a preset discarding condition, where the timestamp of a data record is the time at which the data record was saved to a data set.
In this way, the storage space of the data sets is saved and the volume of data records stored in them is reduced, which shortens query time during de-duplication and improves query efficiency.
Further, the above data processing method also includes:
For the pending dimension, comprehensively accumulating the processing results that the multiple data processing units each obtain by processing the dimension data of the pending dimension of the data records they respectively received.
An embodiment of this application also provides a data processing device, including:
An acquiring unit, configured to obtain the dimension data of at least one pending dimension of a pending data record;
A selecting unit, configured to select, for each pending dimension and according to the dimension data of the pending dimension, the data processing unit that will process the pending data record from a plurality of preset data processing units corresponding to the pending dimension;
A dispatching unit, configured to distribute the pending data record to the selected data processing unit;
A data processing unit, configured to process the dimension data of the pending dimension of the pending data record distributed to it.
In the above data processing device provided by the embodiments of this application, corresponding data processing units are preset for the different dimensions of a data record, so that the dimension data of different dimensions can be processed in parallel by the data processing units of each dimension. Moreover, because multiple data processing units are set for each dimension, the dimension data of that dimension from multiple pending data records can also be processed in parallel, which improves data processing efficiency.
Further, the selecting unit is specifically configured to: determine the hash code of the dimension data of the pending dimension; take that hash code modulo the number of data processing units corresponding to the pending dimension to obtain a remainder value; and select, from the plurality of data processing units, the data processing unit whose unit ID equals the remainder value as the data processing unit that will process the pending data record.
In this way, the data processing unit that will process the pending data record can be selected accurately from the plurality of data processing units according to the hash code of the dimension data of the pending dimension.
Further, the data processing unit is specifically configured to: determine the hash code of the unique identification data of the pending data record; according to the last preset number of bits of that hash code, determine, from among multiple data sets that store already-received data records, the data set corresponding to those bits as the data set to be queried, where, within each of the multiple data sets, the hash codes of the unique identification data of the stored data records share the same last preset number of bits, and these bits differ between different data sets; and, when the pending data record is not present in the determined data set to be queried, process the dimension data of the pending dimension of the pending data record.
In this way, before the dimension data of a pending data record is processed, de-duplication is first performed against the multiple data sets that store received data records. De-duplication no longer requires querying all received data records; only one of the multiple data sets needs to be queried, which reduces the computation required for de-duplication and further improves data processing efficiency.
Further, the above data processing device also includes:
A discarding unit, configured to discard, according to the timestamps of the data records stored in the multiple data sets, the data records in the multiple data sets that meet a preset discarding condition, where the timestamp of a data record is the time at which the data record was saved to a data set.
In this way, the storage space of the data sets is saved and the volume of data records stored in them is reduced, which shortens query time during de-duplication and improves query efficiency.
Further, the above data processing device also includes:
A comprehensive accumulation unit, configured to comprehensively accumulate, for the pending dimension, the processing results that the multiple data processing units each obtain by processing the dimension data of the pending dimension of the data records they respectively received.
Other features and advantages of this application will be set forth in the following description and will in part become apparent from the description or be understood by implementing the application. The objectives and other advantages of the application can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the drawings.
Description of the drawings
The accompanying drawings are provided for a further understanding of this application and constitute a part of the specification. Together with the embodiments of this application, they serve to explain the application and do not limit it. In the drawings:
Fig. 1 is a flowchart of the data processing method provided by the embodiments of this application;
Fig. 2 is a flowchart of selecting the data processing unit that will process a pending data record in the data processing method provided by the embodiments of this application;
Fig. 3 is a flowchart of processing, by the data processing unit, the dimension data of a pending data record in the data processing method provided by the embodiments of this application;
Fig. 4 is a schematic structural diagram of the data processing device provided by the embodiments of this application.
Detailed description of the embodiments
To provide an implementation that improves data processing efficiency, the embodiments of this application provide a data processing method and device. The scheme can be applied to the processing of data and can be implemented either as a method or as a device. Preferred embodiments of this application are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here only describe and explain the application and do not limit it. Where no conflict arises, the embodiments of this application and the features in them may be combined with one another.
An embodiment of this application provides a data processing method, which, as shown in Fig. 1, includes:
Step 101: obtain the dimension data of at least one pending dimension of a pending data record.
Step 102: for each pending dimension, select, according to the dimension data of the pending dimension and from a plurality of preset data processing units corresponding to the pending dimension, the data processing unit that will process the pending data record.
Step 103: distribute the pending data record to the selected data processing unit.
Step 104: the selected data processing unit processes the dimension data of the pending dimension of the pending data record.
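Steps 101 to 104 can be sketched end to end in Python as below. This is an illustrative sketch, not the patented implementation: the dimension names, the unit counts in `UNITS_PER_DIMENSION`, the use of CRC32 as the hash (the patent does not name a hash function), and summing as the per-unit processing are all assumptions.

```python
import zlib
from collections import defaultdict

# Hypothetical configuration: how many data processing units each pending dimension has.
UNITS_PER_DIMENSION = {"buyer_payment": 4, "postage": 2}


def select_unit(dimension_value, unit_count):
    # Stable 32-bit hash (CRC32, an assumption) of the dimension data, modulo the unit count.
    return zlib.crc32(str(dimension_value).encode("utf-8")) % unit_count


def run_steps_101_to_104(records):
    """Return {(dimension, unit_id): partial sum} for a batch of pending data records."""
    unit_queues = defaultdict(list)
    for record in records:
        for dimension, unit_count in UNITS_PER_DIMENSION.items():
            value = record[dimension]                         # Step 101: obtain the dimension data
            unit_id = select_unit(value, unit_count)          # Step 102: select a unit
            unit_queues[(dimension, unit_id)].append(record)  # Step 103: distribute the record
    # Step 104: each selected unit processes only its own dimension's data (summing here).
    return {
        (dimension, unit_id): sum(r[dimension] for r in queue)
        for (dimension, unit_id), queue in unit_queues.items()
    }
```

Because routing depends only on the dimension data, two records with equal dimension data always reach the same unit for that dimension.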
The method and device provided by this application are described in detail below with specific embodiments and with reference to the accompanying drawings.
In the embodiments of this application, the pending data records obtained in step 101 may be transmitted continuously to the data processing device as streaming data. The pending data records may be of various types, for example Internet-related data records such as the transaction records generated by an e-commerce website.
The pending dimensions can be set in advance according to the actual needs of data processing, and several of them may be set, so that the data records can subsequently be processed in parallel for different pending dimensions, improving data processing efficiency. A pending dimension can be any data dimension of a data record. For example, for a transaction record, the pending dimension may be the buyer payment amount, whose dimension data is the amount the buyer paid for the goods in the transaction record; the seller charge amount, whose dimension data is the amount the seller collected for selling the goods; or the postage amount, whose dimension data is the postage the buyer must pay when the seller ships the goods to the buyer.
Further, to reduce the computation in subsequent processing of the data records, the original data records received in streaming form can be preprocessed before step 101 to extract the data needed for subsequent processing, thereby obtaining the pending data records.
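A minimal sketch of this preprocessing for the transaction-record example follows. The field names (`transaction_id`, `buyer_payment`, `seller_charge`, `postage`) are hypothetical, since the patent names the dimensions but not a record layout, and skipping records that lack a needed field is an added assumption rather than something the source specifies.

```python
# Hypothetical field names; the patent lists the dimensions but not the record layout.
PENDING_DIMENSIONS = ("buyer_payment", "seller_charge", "postage")
ID_FIELD = "transaction_id"


def preprocess(raw_stream):
    """Yield pending data records holding only the fields later processing needs.

    Records missing any needed field are skipped (an assumption for this sketch).
    """
    wanted = (ID_FIELD,) + PENDING_DIMENSIONS
    for raw in raw_stream:
        if all(field in raw for field in wanted):
            yield {field: raw[field] for field in wanted}
```

Using a generator keeps the step streaming-friendly: each raw record is trimmed as it arrives rather than after a whole batch is buffered.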
In the embodiments of this application, multiple corresponding data processing units are set in advance for each data dimension, so that the dimension data of the pending dimension of multiple pending data records can be processed in parallel, improving processing efficiency. Each data processing unit can also be assigned a unit ID; the unit IDs can be the integers from 0 to the number of data processing units minus one, so that every possible remainder value in step 202 below corresponds to exactly one unit.
Accordingly, in step 102, selecting, according to the dimension data of a pending dimension and from the plurality of preset data processing units corresponding to that dimension, the data processing unit that will process the pending data record can specifically include the following steps, as shown in Fig. 2:
Step 201: determine the hash code of the obtained dimension data of the pending dimension.
Step 202: take the hash code of the dimension data modulo the number of data processing units corresponding to the pending dimension, to obtain a remainder value.
Step 203: from the plurality of data processing units, select the one whose unit ID equals the remainder value as the data processing unit that will process the pending data record.
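Steps 201 to 203 are sketched below. The patent does not name the hash function; purely as an assumption for illustration, the sketch emulates Java's `String.hashCode()` (a 31-based polynomial over UTF-16 code units, truncated to a signed 32-bit int), which shows why the sign of the hash code matters in step 202. `DataProcessingUnit` and `select_unit` are hypothetical names.

```python
from dataclasses import dataclass


def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode(): 31-based polynomial over UTF-16 code units, signed 32-bit."""
    data = s.encode("utf-16-be")
    h = 0
    for i in range(0, len(data), 2):
        code_unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + code_unit) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed Java int.
    return h - (1 << 32) if h & 0x80000000 else h


@dataclass
class DataProcessingUnit:
    dimension: str
    unit_id: int  # IDs run 0..N-1 so each possible remainder names exactly one unit


def select_unit(dimension_value, units):
    """Steps 201-203: hash the dimension data, reduce it modulo len(units), pick that unit ID."""
    hash_code = java_string_hash(str(dimension_value))       # Step 201
    remainder = hash_code % len(units)                        # Step 202: Python % is non-negative here
    return next(u for u in units if u.unit_id == remainder)  # Step 203
```

A Java version of step 202 would need care: Java's `%` returns a negative remainder for a negative `hashCode()`, which matches no unit ID, so `Math.floorMod` (or masking off the sign bit) would be required. Python's `%` with a positive divisor already yields a value in 0..N-1.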
In the embodiments of this application, other selection methods similar to the one shown in Fig. 2 may also be used to select, according to the obtained dimension data, the data processing unit that will process the pending data from the plurality of data processing units; these are not described in detail here.
In the above method provided by the embodiments of this application, after the pending data record is distributed to the selected data processing unit, the selected unit can, in step 104, process the dimension data of the pending dimension of the pending data record. As shown in Fig. 3, this can specifically include the following steps:
Step 301: the selected data processing unit determines the hash code of the unique identification data of the pending data record.
The unique identification data distinguishes different pending data records; for example, for transaction records, it can be the transaction order number.
Step 302: according to the last preset number of bits of the hash code of the unique identification data, determine, from among the multiple data sets that store already-received data records, the data set corresponding to those bits as the data set to be queried.
Within each of the multiple data sets, the hash codes of the unique identification data of the stored data records share the same last preset number of bits, and these bits differ between different data sets.
In the embodiments of this application, after receiving a pending data record, the data processing unit can save it into a data set. When saving, data records whose unique-identification hash codes have the same last preset number of bits are saved into the same data set, so that newly received pending data records can subsequently be de-duplicated against the data records stored in the data sets.
The preset number of bits can be set flexibly according to actual needs, for example according to the total number of bits of the hash code of the unique identification data.
Step 303: query the data set to be queried for the pending data record. If the pending data record is not present in the data set to be queried, process the dimension data of its pending dimension. If it is present, the pending data record has already been received and does not need to be processed again; that is, processing of the pending data record is cancelled, and further, the pending data record can be discarded.
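Steps 301 to 303 can be sketched as one data processing unit that keeps 2^k data sets, where k is the preset number of trailing bits. Several details are assumptions made for illustration: CRC32 stands in for the unspecified 32-bit hash, each data set stores only unique IDs rather than whole records, the ID field is called `transaction_id`, and "processing" means summing one dimension.

```python
import zlib


class DedupUnit:
    """Hypothetical data processing unit that de-duplicates before processing (Fig. 3)."""

    def __init__(self, dimension: str, tail_bits: int = 4):
        self.dimension = dimension
        self.mask = (1 << tail_bits) - 1
        # One data set per possible value of the trailing `tail_bits` bits of the hash.
        self.data_sets = [set() for _ in range(1 << tail_bits)]
        self.total = 0

    def _data_set_for(self, unique_id: str) -> set:
        hash_code = zlib.crc32(unique_id.encode("utf-8"))  # Step 301 (CRC32 is an assumption)
        return self.data_sets[hash_code & self.mask]       # Step 302: trailing bits pick the set

    def handle(self, record: dict, id_field: str = "transaction_id") -> bool:
        """Process the record unless already received; return True if it was processed."""
        unique_id = record[id_field]
        target = self._data_set_for(unique_id)
        if unique_id in target:          # Step 303: duplicate, so cancel processing
            return False
        target.add(unique_id)            # save so later duplicates are caught
        self.total += record[self.dimension]
        return True
```

Only the one data set selected by the trailing bits is consulted for each record, which is the reduction in query scope that the description attributes to this scheme.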
In the above method provided by the embodiments of this application, further, after the multiple data processing units corresponding to the pending dimension have each processed the dimension data of the pending dimension of the data records they received and obtained their processing results, these processing results can be comprehensively accumulated for the pending dimension. For example, for data summation the processing results can be added together, and for finding the data maximum the maximum can be taken over these processing results.
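A minimal sketch of the comprehensive accumulation step, assuming the per-unit results for one dimension arrive as a list of numbers; `MERGE_OPERATIONS` and `accumulate` are hypothetical names.

```python
# Sum, max and min are associative, so partial results from different units
# can be merged in any order and still give the same final result.
MERGE_OPERATIONS = {"sum": sum, "max": max, "min": min}


def accumulate(partial_results, operation):
    """Comprehensively accumulate the per-unit results for one pending dimension."""
    try:
        merge = MERGE_OPERATIONS[operation]
    except KeyError:
        raise ValueError(f"unsupported operation: {operation}") from None
    return merge(partial_results)
```

For example, per-unit sums of 30, 5 and 12 accumulate to 47, and their maximum is 30.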
After the final result of the comprehensive accumulation is obtained, the final result for each dimension can also be output to a preset storage system for storage.
In the above method provided by the embodiments of this application, further, when a data record is saved into a data set, the time at which it was saved can also be recorded as its timestamp, so that the data records in the multiple data sets that meet a preset discarding condition can be discarded according to their timestamps. For example, data records stored for longer than a predetermined duration can be discarded, or data records whose timestamps precede a predetermined time can be discarded. This saves data set storage space and reduces the volume of data records stored in the data sets, which shortens query time during de-duplication and improves query efficiency.
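Both discarding conditions mentioned above can be expressed with one cutoff time, as in this sketch. It assumes each data set is a dict mapping a unique ID to the time (in seconds) at which the record was saved; the function names are hypothetical.

```python
import time


def discard_before(data_sets, cutoff):
    """Remove entries whose save timestamp is earlier than `cutoff`; return how many were removed.

    data_sets: list of dicts mapping unique_id -> timestamp (seconds) when the record was saved.
    """
    removed = 0
    for data_set in data_sets:
        stale = [uid for uid, saved_at in data_set.items() if saved_at < cutoff]
        for uid in stale:
            del data_set[uid]
        removed += len(stale)
    return removed


def discard_older_than(data_sets, max_age_seconds, now=None):
    """'Stored longer than a predetermined duration', expressed as a cutoff time."""
    now = time.time() if now is None else now
    return discard_before(data_sets, now - max_age_seconds)
```

With `now=200.0` and a 60-second limit, entries saved at 100.0 and 50.0 are dropped while one saved at 190.0 is kept. The stale keys are collected first because a dict must not change size while it is being iterated.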
Based on the same inventive concept and corresponding to the data processing method provided in the above embodiments, another embodiment of this application provides a data processing device, whose structure is shown in Fig. 4 and which specifically includes:
Acquiring unit 401, configured to obtain the dimension data of at least one pending dimension of a pending data record;
Selecting unit 402, configured to select, for each pending dimension and according to the dimension data of the pending dimension, the data processing unit that will process the pending data record from a plurality of preset data processing units corresponding to the pending dimension;
Dispatching unit 403, configured to distribute the pending data record to the selected data processing unit;
Data processing unit 404, configured to process the dimension data of the pending dimension of the pending data record distributed to it.
Further, selecting unit 402 is specifically configured to: determine the hash code of the dimension data of the pending dimension; take that hash code modulo the number of data processing units corresponding to the pending dimension to obtain a remainder value; and select, from the plurality of data processing units, the data processing unit whose unit ID equals the remainder value as the data processing unit that will process the pending data record.
Further, data processing unit 404 is specifically configured to: determine the hash code of the unique identification data of the pending data record; according to the last preset number of bits of that hash code, determine, from among multiple data sets that store already-received data records, the data set corresponding to those bits as the data set to be queried, where, within each of the multiple data sets, the hash codes of the unique identification data of the stored data records share the same last preset number of bits, and these bits differ between different data sets; and, when the pending data record is not present in the determined data set to be queried, process the dimension data of the pending dimension of the pending data record.
Further, the above data processing device also includes:
Discarding unit 405, configured to discard, according to the timestamps of the data records stored in the multiple data sets, the data records in the multiple data sets that meet a preset discarding condition, where the timestamp of a data record is the time at which the data record was saved to a data set.
Further, the above data processing device also includes:
Comprehensive accumulation unit 406, configured to comprehensively accumulate, for the pending dimension, the processing results that the multiple data processing units each obtain by processing the dimension data of the pending dimension of the data records they respectively received.
The functions of the above units correspond to the respective processing steps in the flows shown in Fig. 1 to Fig. 3 and are not repeated here.
In summary, the scheme provided by the embodiments of this application includes: obtaining the dimension data of at least one pending dimension of a pending data record; for each pending dimension, selecting, according to the dimension data of the pending dimension and from a plurality of preset data processing units corresponding to the pending dimension, the data processing unit that will process the pending data record; distributing the pending data record to the selected data processing unit; and processing, by the selected data processing unit, the dimension data of the pending dimension of the pending data record. The scheme provided by the embodiments of this application improves the efficiency of data processing.
The data processing device provided by the embodiments of this application can be implemented by a computer program. Those skilled in the art should understand that the above division into modules is only one of many possible divisions; whether the device is divided into other modules or not divided into modules at all, it falls within the protection scope of this application as long as it has the above functions.
This application is described with reference to flowcharts and/or block diagrams of the method, device (system), and computer program product according to the embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in them, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus that implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include computer-readable media in forms such as volatile memory, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage can be implemented by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.
Obviously, those skilled in the art can make various modifications and variations to this application without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of this application and their equivalent technologies, this application is also intended to include them.
Claims (10)
1. A data processing method, characterized by comprising:
obtaining dimension data of at least one pending dimension of a pending data record;
for each pending dimension, selecting, according to the dimension data of the pending dimension and from a plurality of preset data processing units corresponding to the pending dimension, a data processing unit that is to process the pending data record;
distributing the pending data record to the selected data processing unit;
saving, by the data processing unit, the received pending data record into a data set, wherein, during saving, data records whose hash codes of unique identification data have the same last preset number of bits are saved into the same data set, so that deduplication can subsequently be performed on newly received pending data records based on the data records saved in the data sets; and
processing, by the selected data processing unit, the dimension data of the pending dimension of the pending data record.
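The method of claim 1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration. This includes the record fields (`id`, `city`, `item`), the hypothetical `ProcessingUnit` class and `dispatch` function, CRC32 as the hash function, 4 trailing bits as the preset number, and counting as the processing operation. Each record is sent to one unit per pending dimension. Each unit then files the record into a data set keyed by the trailing bits of the hash of its unique ID.

```python
import zlib
from collections import Counter, defaultdict

SUFFIX_BITS = 4  # assumed "preset number" of trailing hash bits (16 data sets)


def stable_hash(text: str) -> int:
    # CRC32 is an assumption; the claims do not name a hash function.
    return zlib.crc32(text.encode("utf-8"))


class ProcessingUnit:
    """Hypothetical data processing unit serving one pending dimension."""

    def __init__(self, dimension: str) -> None:
        self.dimension = dimension
        self.data_sets = defaultdict(list)  # trailing hash bits -> saved records
        self.results = Counter()            # dimension value -> count

    def accept(self, record: dict) -> None:
        # Save the record into the data set picked by the trailing bits of the
        # hash of its unique ID, so a later duplicate lands in the same set.
        suffix = stable_hash(record["id"]) & ((1 << SUFFIX_BITS) - 1)
        self.data_sets[suffix].append(record)
        # Process the dimension data; counting stands in for real processing.
        self.results[record[self.dimension]] += 1


def dispatch(record: dict, pools: dict) -> None:
    """Send the record to one unit in the pool of each pending dimension."""
    for dimension, units in pools.items():
        dimension_data = str(record[dimension])
        unit = units[stable_hash(dimension_data) % len(units)]
        unit.accept(record)


pools = {
    "city": [ProcessingUnit("city") for _ in range(3)],
    "item": [ProcessingUnit("item") for _ in range(2)],
}
dispatch({"id": "r1", "city": "Hangzhou", "item": "book"}, pools)
dispatch({"id": "r2", "city": "Hangzhou", "item": "pen"}, pools)
```

Both records share the city value, so they reach the same `city` unit, while their `item` values may be spread across the `item` pool. Whether a real implementation keeps separate storage per dimension pool, as this sketch does, is not stated in the claims.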
2. The method according to claim 1, characterized in that selecting, according to the dimension data of the pending dimension and from the plurality of preset data processing units corresponding to the pending dimension, the data processing unit that is to process the pending data record specifically comprises:
determining a hash code of the dimension data of the pending dimension;
taking the hash code of the dimension data modulo the number of data processing units corresponding to the pending dimension, to obtain a remainder value; and
selecting, from the plurality of data processing units, the data processing unit whose unit ID equals the remainder value as the data processing unit that is to process the pending data record.
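The selection rule of claim 2 uses the hash code modulo the number of units, with the remainder taken as the unit ID. It might look like the following Python sketch. The function name `select_unit` and the use of `zlib.crc32` are assumptions, since the claim does not name a hash function.

```python
import zlib


def select_unit(dimension_value: str, unit_count: int) -> int:
    """Return the unit ID for a dimension value: hash code modulo unit count.

    zlib.crc32 is an assumed hash function. Unlike Python's built-in hash()
    for str, it gives the same result in every process, so all dispatchers
    agree on which unit receives a given value.
    """
    if unit_count <= 0:
        raise ValueError("unit_count must be positive")
    hash_code = zlib.crc32(dimension_value.encode("utf-8"))
    return hash_code % unit_count  # remainder lies in [0, unit_count)


units = ["unit-0", "unit-1", "unit-2", "unit-3"]  # hypothetical unit IDs 0..3
for value in ["Hangzhou", "Beijing", "Hangzhou"]:
    print(value, "->", units[select_unit(value, len(units))])
```

A stable hash matters when several dispatchers run in separate processes. Python's built-in `hash()` for strings is salted per process unless `PYTHONHASHSEED` is fixed, so two dispatchers could route the same value differently. Plain modulo routing also reassigns most values whenever the number of units changes. Consistent hashing is a common alternative when units are added or removed, although the claims describe only the modulo rule.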
3. The method according to claim 1, characterized in that processing, by the selected data processing unit, the dimension data of the pending dimension of the pending data record specifically comprises:
determining, by the selected data processing unit, a hash code of the unique identification data of the pending data record;
determining, according to the last preset number of bits of the hash code of the unique identification data and from a plurality of data sets storing received data records, the data set corresponding to those last bits as the data set to be checked, wherein the data records saved in any one data set of the plurality of data sets share the same last preset number of bits of the hash codes of their unique identification data, and data records saved in different data sets differ in those bits; and
when the pending data record does not exist in the determined data set to be checked, processing the dimension data of the pending dimension of the pending data record.
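The duplicate check of claim 3 can be sketched as follows. The sketch assumes 8 trailing bits (256 data sets), CRC32 as the hash function, and data sets that store only unique IDs rather than whole records. The `DedupUnit` class and its `handle` method are hypothetical names.

```python
import zlib

SUFFIX_BITS = 8  # assumed "preset number" of trailing bits -> 256 data sets


class DedupUnit:
    """Hypothetical unit that skips records whose unique ID it has already saved."""

    def __init__(self, suffix_bits: int = SUFFIX_BITS) -> None:
        self.mask = (1 << suffix_bits) - 1
        # One data set per possible suffix; all IDs in a set share that suffix.
        self.data_sets = [set() for _ in range(1 << suffix_bits)]
        self.processed = []

    def handle(self, unique_id: str, dimension_value: str) -> bool:
        hash_code = zlib.crc32(unique_id.encode("utf-8"))
        to_check = self.data_sets[hash_code & self.mask]  # the data set to be checked
        if unique_id in to_check:
            return False  # duplicate: already saved, so it is not processed again
        to_check.add(unique_id)
        self.processed.append(dimension_value)  # stand-in for real processing
        return True


unit = DedupUnit()
print(unit.handle("order-1", "Hangzhou"))  # True
print(unit.handle("order-1", "Hangzhou"))  # False
```

Records that share the trailing bits always land in the same data set, so a lookup consults only one of the 256 sets. Python's set membership test is already constant time on average, so in this sketch the partitioning pays off mainly when data sets are persisted or expired independently.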
4. The method according to claim 3, characterized by further comprising:
discarding, according to timestamps of the data records saved in the plurality of data sets, data records in the plurality of data sets that meet a preset discard condition, wherein the timestamp of a data record is time information indicating when the data record was saved to a data set.
5. The method according to any one of claims 1 to 4, characterized by further comprising:
for the pending dimension, performing comprehensive accumulation on the processing results obtained by the plurality of data processing units, each after processing the dimension data of the pending dimension of the data records it received.
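The comprehensive accumulation of claim 5 combines the partial results of the units serving one dimension. The following is a minimal sketch. It assumes each unit's result is a `collections.Counter` mapping dimension value to count, and that accumulation means plain addition. The function name `combine_results` is hypothetical.

```python
from collections import Counter


def combine_results(partials):
    """Sum the per-unit results for one dimension into a single total."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts rather than replacing them
    return total


unit_results = [
    Counter({"Hangzhou": 3, "Beijing": 1}),  # result of unit 0
    Counter({"Shanghai": 2}),                # result of unit 1
    Counter({"Shenzhen": 4}),                # result of unit 2
]
print(combine_results(unit_results))
# Counter({'Shenzhen': 4, 'Hangzhou': 3, 'Shanghai': 2, 'Beijing': 1})
```

Under the hash-modulo routing of claim 2, a given dimension value is handled by exactly one unit, so the per-value counts from different units do not overlap. `Counter.update` would still add them correctly if they did.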
6. A data processing apparatus, characterized by comprising:
an acquiring unit, configured to obtain dimension data of at least one pending dimension of a pending data record;
a selecting unit, configured to, for each pending dimension, select, according to the dimension data of the pending dimension and from a plurality of preset data processing units corresponding to the pending dimension, a data processing unit that is to process the pending data record;
a dispatching unit, configured to distribute the pending data record to the selected data processing unit;
a storage unit, configured for the data processing unit to save the received pending data record into a data set, wherein, during saving, data records whose hash codes of unique identification data have the same last preset number of bits are saved into the same data set, so that deduplication can subsequently be performed on newly received pending data records based on the data records saved in the data sets; and
a data processing unit, configured to process the dimension data of the pending dimension of the pending data record distributed to it.
7. The apparatus according to claim 6, characterized in that the selecting unit is specifically configured to: determine a hash code of the dimension data of the pending dimension; take the hash code of the dimension data modulo the number of data processing units corresponding to the pending dimension, to obtain a remainder value; and select, from the plurality of data processing units, the data processing unit whose unit ID equals the remainder value as the data processing unit that is to process the pending data record.
8. The apparatus according to claim 6, characterized in that the data processing unit is specifically configured to: determine a hash code of the unique identification data of the pending data record; determine, according to the last preset number of bits of the hash code of the unique identification data and from a plurality of data sets storing received data records, the data set corresponding to those last bits as the data set to be checked, wherein the data records saved in any one data set of the plurality of data sets share the same last preset number of bits of the hash codes of their unique identification data, and data records saved in different data sets differ in those bits; and, when the pending data record does not exist in the determined data set to be checked, process the dimension data of the pending dimension of the pending data record.
9. The apparatus according to claim 8, characterized by further comprising:
a discarding unit, configured to discard, according to timestamps of the data records saved in the plurality of data sets, data records in the plurality of data sets that meet a preset discard condition, wherein the timestamp of a data record is time information indicating when the data record was saved to a data set.
10. The apparatus according to any one of claims 6 to 9, characterized by further comprising:
a comprehensive summing unit, configured to, for the pending dimension, perform comprehensive accumulation on the processing results obtained by the plurality of data processing units, each after processing the dimension data of the pending dimension of the data records it received.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310373788.6A CN104424220B (en) | 2013-08-23 | 2013-08-23 | A kind of data processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104424220A | 2015-03-18 |
CN104424220B | 2018-07-13 |
Family ID: 52973217
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310373788.6A Active CN104424220B (en) | 2013-08-23 | 2013-08-23 | A kind of data processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104424220B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105701201B (en) * | 2016-01-12 | 2019-05-07 | 浪潮通用软件有限公司 | A kind of method and device of data processing |
CN107180017B (en) * | 2016-03-11 | 2021-05-28 | 阿里巴巴集团控股有限公司 | Sample serialization method and device |
CN107229663B (en) * | 2016-03-25 | 2022-05-27 | 阿里巴巴集团控股有限公司 | Data processing method and device and data table processing method and device |
CN107610468A (en) * | 2017-09-28 | 2018-01-19 | 航天科技控股集团股份有限公司 | Speed density Analysis System and method based on recorder management |
CN112162859A (en) * | 2020-09-24 | 2021-01-01 | 成都长城开发科技有限公司 | Data processing method and device, computer readable medium and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101539950A (en) * | 2009-05-08 | 2009-09-23 | 成都市华为赛门铁克科技有限公司 | Data storage method and device |
CN102467458A (en) * | 2010-11-05 | 2012-05-23 | 英业达股份有限公司 | Method for establishing index of data block |
CN103136217A (en) * | 2011-11-24 | 2013-06-05 | 阿里巴巴集团控股有限公司 | Distributed data flow processing method and system thereof |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7457872B2 (en) * | 2003-10-15 | 2008-11-25 | Microsoft Corporation | On-line service/application monitoring and reporting system |
Also Published As
Publication number | Publication date |
---|---|
CN104424220A (en) | 2015-03-18 |
Similar Documents
Publication | Title |
---|---|
CN104424220B | A kind of data processing method and device |
CN110008248B | Data processing method and device |
CN108734575A | Financing method, system based on block chain and storage medium |
US20160267566A1 | Systems and methods for managing an inventory of digital gift card assets |
US20160035044A1 | Account processing method and apparatus |
Udayakumar et al. | Economic ordering policy for non‐instantaneous deteriorating items with price and advertisement dependent demand and permissible delay in payment under inflation |
US20070156578A1 | Method and system for reducing a number of financial transactions |
CN110363476B | Cargo warehousing distribution processing method and device |
CN107885788A | A kind of business datum check method |
WO2019196254A1 | Electronic resource packet processing method and apparatus, terminal device and medium |
CN108171488A | Data processing method, device and system |
CN107665463A | Method for processing resource, device, storage medium and the computer equipment of investment product |
JP6199958B2 | User recommended methods and equipment |
CN112927045A | Car rental credit data processing method, device and equipment |
CN113518117A | ETC transaction recommendation method, bank server, computer device and medium |
CN112200516A | Package loss processing method, device, medium and terminal equipment |
CN109191101B | Method, device and equipment for guaranteeing customer asset safety |
CN110969400A | Supply chain upstream and downstream data association method and device |
CN110083437A | Handle the method and device of block chain affairs |
CN107369093A | A kind of business determines method and apparatus |
CN112200686A | Package damage processing method and device, medium and terminal equipment |
CN111582905A | Target object acquisition method and device, electronic equipment and storage medium |
CN106611315B | Service associated information pre-estimation method and device |
CN108111484A | The method and system of credit information is provided for third-party registration |
CN109150994A | Hot spot data processing method, device and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | |
TR01 | Transfer of patent right | Effective date of registration: 2021-11-17. Address after: No. 699 Wangshang Road, Binjiang District, Hangzhou, Zhejiang. Patentee after: Alibaba (China) Network Technology Co.,Ltd. Address before: P.O. Box 847, Fourth Floor, One Capital Place, Grand Cayman, Cayman Islands (UK). Patentee before: ALIBABA GROUP HOLDING Ltd. |