CN104424220A - Data processing method and equipment - Google Patents

Info

Publication number
CN104424220A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310373788.6A
Other languages
Chinese (zh)
Other versions
CN104424220B (en)
Inventor
黄晓锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Network Technology Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Application filed by Alibaba Group Holding Ltd
Priority to CN201310373788.6A
Publication of CN104424220A
Application granted
Publication of CN104424220B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data processing method and equipment. The method comprises the steps: acquiring dimensional data of at least one to-be-processed dimension from a to-be-processed data record; with respect to each to-be-processed dimension and according to the dimensional data of the to-be-processed dimension, choosing a data processing unit for processing the to-be-processed data record from a plurality of preset data processing units corresponding to the to-be-processed dimension; distributing the to-be-processed record to the chosen data processing unit; and processing the dimensional data of the to-be-processed dimension of the to-be-processed data record by the chosen data processing unit. With adoption of the scheme, the data processing efficiency is improved.

Description

Data processing method and device
Technical field
The present application relates to data processing within the field of computer technology, and in particular to a data processing method and device.
Background
In practical applications of computer and Internet technology, large volumes of data frequently need to be aggregated and analyzed, for example by summing data, deduplicating data, or finding the maximum or minimum value of the data.
In the prior art, when streaming data is processed, a data source sends data records in batches to a data processing device through message middleware. The data processing device processes the dimension data of the pending dimension of the data records and obtains a result for that batch. The results obtained from multiple batches can further be combined and accumulated, and the data records and the final result are stored in a database.
In this prior-art scheme, the data processing device processes data records serially: it must finish processing one data record before it processes the next. In addition, for a batch of data records it can process the dimension data of only one dimension at a time, so when several dimensions need to be processed they must be handled one after another. Data processing efficiency is therefore low.
Summary of the invention
In view of this, the embodiments of the present application provide a data processing method and device to address the low data processing efficiency of the prior art.
The embodiments of the present application are realized through the following technical solutions.
An embodiment of the present application provides a data processing method, comprising:
Obtaining the dimension data of at least one pending dimension of a pending data record;
For each pending dimension, selecting, according to the dimension data of that pending dimension, the data processing unit that will process the pending data record from multiple preset data processing units corresponding to that pending dimension;
Distributing the pending data record to the selected data processing unit;
Processing, by the selected data processing unit, the dimension data of that pending dimension of the pending data record.
In the data processing method provided by the embodiment of the present application, corresponding data processing units are preset for the different dimensions of a data record, so the dimension data of different dimensions can be processed in parallel by the data processing units corresponding to each dimension. Moreover, because multiple data processing units are set for each dimension, the dimension data of that dimension of multiple pending data records can also be processed in parallel. Data processing efficiency is thereby improved.
Further, selecting, according to the dimension data of the pending dimension, the data processing unit that will process the pending data record from the multiple preset data processing units corresponding to that pending dimension specifically comprises:
Determining the hash code of the dimension data of the pending dimension;
Taking the hash code of the dimension data modulo the number of data processing units corresponding to the pending dimension to obtain a remainder;
Selecting, from the multiple data processing units, the data processing unit whose unit ID equals the remainder as the data processing unit that will process the pending data record.
In this way, the data processing unit that will process the pending data record can be accurately selected from the multiple data processing units according to the hash code of the dimension data of the pending dimension.
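The hash-and-remainder selection described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `stable_hash` and `select_unit_id` are hypothetical, and CRC32 is an assumed stand-in for the hash code, which the text does not specify. CRC32 is used here because Python's built-in `hash()` for strings varies between processes.

```python
import zlib


def stable_hash(value) -> int:
    """Deterministic, non-negative hash code (assumption: CRC32 of the text form)."""
    return zlib.crc32(str(value).encode("utf-8"))


def select_unit_id(dimension_value, unit_count: int) -> int:
    """Return the unit ID: hash code of the dimension data modulo the unit count."""
    if unit_count <= 0:
        raise ValueError("unit_count must be positive")
    return stable_hash(dimension_value) % unit_count


# The same dimension data always maps to the same unit.
unit_id = select_unit_id("199.00", 4)
```

Because the mapping is deterministic, records carrying the same dimension data are always routed to the same unit, while different values spread across the available units.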
Further, processing, by the selected data processing unit, the dimension data of the pending dimension of the pending data record specifically comprises:
Determining, by the selected data processing unit, the hash code of the unique identification data of the pending data record;
Determining, according to the last preset number of digits of the hash code of the unique identification data, the data set corresponding to those digits from multiple data sets that store received data records, as the data set to be queried, wherein the data records stored in any one data set share the same last preset number of digits of the hash code of their unique identification data, and the data records stored in different data sets differ in those digits;
When the pending data record does not exist in the determined data set to be queried, processing the dimension data of the pending dimension of the pending data record.
In this way, deduplication is performed against the multiple data sets that store received data records before the dimension data of a pending data record is processed. The deduplication no longer needs to query all received data records; it only needs to query one of the multiple data sets. This reduces the computation required for deduplication and further improves data processing efficiency.
Further, the above data processing method also comprises:
Discarding, according to the timestamps of the data records stored in the multiple data sets, the data records in those data sets that meet a preset discard condition, where the timestamp of a data record is the time at which the data record was saved to its data set.
In this way, the storage space of the data sets is saved and the number of data records stored in each data set is reduced, which shortens query time during deduplication and improves query efficiency.
Further, the above data processing method also comprises:
For the pending dimension, performing combined accumulation processing on the results that the multiple data processing units each obtain by processing the dimension data of the pending dimension of the data records they respectively received.
An embodiment of the present application also provides a data processing device, comprising:
An acquiring unit, configured to obtain the dimension data of at least one pending dimension of a pending data record;
A selection unit, configured to select, for each pending dimension and according to the dimension data of that pending dimension, the data processing unit that will process the pending data record from multiple preset data processing units corresponding to that pending dimension;
A distribution unit, configured to distribute the pending data record to the selected data processing unit;
A data processing unit, configured to process the dimension data of the pending dimension of the pending data record distributed to it.
In the data processing device provided by the embodiment of the present application, corresponding data processing units are preset for the different dimensions of a data record, so the dimension data of different dimensions can be processed in parallel by the data processing units corresponding to each dimension. Moreover, because multiple data processing units are set for each dimension, the dimension data of that dimension of multiple pending data records can also be processed in parallel. Data processing efficiency is thereby improved.
Further, the selection unit is specifically configured to: determine the hash code of the dimension data of the pending dimension; take the hash code of the dimension data modulo the number of data processing units corresponding to the pending dimension to obtain a remainder; and select, from the multiple data processing units, the data processing unit whose unit ID equals the remainder as the data processing unit that will process the pending data record.
In this way, the data processing unit that will process the pending data record can be accurately selected from the multiple data processing units according to the hash code of the dimension data of the pending dimension.
Further, the data processing unit is specifically configured to: determine the hash code of the unique identification data of the pending data record; determine, according to the last preset number of digits of the hash code of the unique identification data, the data set corresponding to those digits from multiple data sets that store received data records, as the data set to be queried, wherein the data records stored in any one data set share the same last preset number of digits of the hash code of their unique identification data, and the data records stored in different data sets differ in those digits; and, when the pending data record does not exist in the determined data set to be queried, process the dimension data of the pending dimension of the pending data record.
In this way, deduplication is performed against the multiple data sets that store received data records before the dimension data of a pending data record is processed. The deduplication no longer needs to query all received data records; it only needs to query one of the multiple data sets. This reduces the computation required for deduplication and further improves data processing efficiency.
Further, the above data processing device also comprises:
A discarding unit, configured to discard, according to the timestamps of the data records stored in the multiple data sets, the data records in those data sets that meet a preset discard condition, where the timestamp of a data record is the time at which the data record was saved to its data set.
In this way, the storage space of the data sets is saved and the number of data records stored in each data set is reduced, which shortens query time during deduplication and improves query efficiency.
Further, the above data processing device also comprises:
A combined accumulation unit, configured to perform, for the pending dimension, combined accumulation processing on the results that the multiple data processing units each obtain by processing the dimension data of the pending dimension of the data records they respectively received.
Further features and advantages of the present application will be set forth in the following description and will in part become apparent from the specification or be understood by implementing the application. The objects and other advantages of the application can be realized and obtained through the structure particularly pointed out in the written specification, the claims, and the accompanying drawings.
Brief description of the drawings
The accompanying drawings provide a further understanding of the present application and form part of the specification. Together with the embodiments of the application they serve to explain the application and do not limit it. In the drawings:
Fig. 1 is a flowchart of the data processing method provided by an embodiment of the present application;
Fig. 2 is a flowchart of selecting the data processing unit that will process a pending data record in the data processing method provided by an embodiment of the present application;
Fig. 3 is a flowchart of processing the dimension data of a pending data record by a data processing unit in the data processing method provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of the data processing device provided by an embodiment of the present application.
Detailed description of the embodiments
To improve the efficiency of data processing, the embodiments of the present application provide a data processing method and device. The technical solution applies to processes in which data is processed and can be implemented either as a method or as a device. Preferred embodiments of the application are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the application and do not limit it. Where no conflict arises, the embodiments of the application and the features of those embodiments may be combined with one another.
An embodiment of the present application provides a data processing method which, as shown in Fig. 1, comprises:
Step 101: obtain the dimension data of at least one pending dimension of a pending data record.
Step 102: for each pending dimension, select, according to the dimension data of that pending dimension, the data processing unit that will process the pending data record from multiple preset data processing units corresponding to that pending dimension.
Step 103: distribute the pending data record to the selected data processing unit.
Step 104: the selected data processing unit processes the dimension data of that pending dimension of the pending data record.
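As a rough illustration of how steps 101 to 104 fit together, the following Python sketch routes one record to a unit per dimension and then lets a unit process what it received. Everything named here is hypothetical: the `UNITS_PER_DIMENSION` configuration, the field names, the use of CRC32 as the hash code, and the choice of summation as the processing. The sketch runs sequentially; in the scheme described, the units would work in parallel.

```python
import zlib
from collections import defaultdict

# Hypothetical configuration: number of preset units for each pending dimension.
UNITS_PER_DIMENSION = {"buyer_payment": 3, "postage": 2}


def hash_code(value) -> int:
    """Assumed hash code: CRC32 of the value's text form."""
    return zlib.crc32(str(value).encode("utf-8"))


def dispatch(record: dict, inboxes: dict) -> list:
    """Steps 101-103: read each pending dimension, pick a unit, hand over the record."""
    routes = []
    for dimension, unit_count in UNITS_PER_DIMENSION.items():
        if dimension not in record:  # step 101: no dimension data to obtain
            continue
        unit_id = hash_code(record[dimension]) % unit_count  # step 102
        inboxes[(dimension, unit_id)].append(record)  # step 103
        routes.append((dimension, unit_id))
    return routes


def process_inbox(dimension: str, records: list) -> float:
    """Step 104 for one unit: here the processing is a plain sum."""
    return sum(r[dimension] for r in records)


inboxes = defaultdict(list)
record = {"order_id": "T1001", "buyer_payment": 199.0, "postage": 8.0}
routes = dispatch(record, inboxes)
```

Each `(dimension, unit_id)` key stands for one data processing unit, so the same record can sit in the inbox of one unit per dimension.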
The method and device provided by the present application are described in detail below with specific embodiments and with reference to the accompanying drawings.
In the embodiments of the present application, the pending data record obtained in step 101 may arrive at the data processing device continuously in the form of streaming data. The pending data record may be any type of data record, for example a data record related to Internet technology, such as a transaction data record of an e-commerce website.
The pending dimensions may be set in advance according to the actual needs of the data processing, and more than one may be set so that the data record can subsequently be processed in parallel for the different pending dimensions, which improves data processing efficiency. A pending dimension may be any data dimension of the data record. For a transaction data record, for example, the pending dimension may be a buyer payment amount dimension, whose dimension data is the amount the buyer pays when purchasing goods in the transaction data record; a seller receipt amount dimension, whose dimension data is the amount the seller receives when selling goods in the transaction data record; or a postage amount dimension, whose dimension data is the postage the buyer must pay when the seller ships goods to the buyer in the transaction data record.
Further, to reduce the computation needed when the data records are subsequently processed, the original data records received as streaming data may be preprocessed before step 101: the data required for subsequent processing is filtered out to obtain the pending data records.
In the embodiments of the present application, multiple data processing units are set in advance for each data dimension, so the dimension data of that pending dimension of multiple pending data records can be processed in parallel at the same time, which improves processing efficiency. Further, a unit ID may be set for each data processing unit. The unit IDs may be consecutive integers starting from 0, so that every remainder obtained by dividing by the number of units (0 up to that number minus one) identifies a unit.
Accordingly, in step 102, selecting the data processing unit that will process the pending data record from the multiple preset data processing units corresponding to a pending dimension, according to the dimension data of that pending dimension, may specifically comprise the following, as shown in Fig. 2:
Step 201: determine the hash code of the obtained dimension data of the pending dimension.
Step 202: take the hash code of the dimension data modulo the number of data processing units corresponding to the pending dimension to obtain a remainder.
Step 203: from the multiple data processing units, select the data processing unit whose unit ID equals the remainder as the data processing unit that will process the pending data record.
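Steps 201 to 203 amount to a single modular computation. Writing $h(d)$ for the hash code of dimension data $d$ and $N$ for the number of preset units (both symbols are notation introduced here for illustration, and $h(d)$ is assumed to be non-negative), the selected unit ID can be expressed as:

```latex
\[
  \mathrm{unitID}(d) = h(d) \bmod N, \qquad 0 \le \mathrm{unitID}(d) \le N - 1 .
\]
```

For instance, with an assumed hash code $h(d) = 1234567$ and $N = 4$ units, $1234567 = 4 \times 308641 + 3$, so the record would go to the unit whose ID is 3.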
In the embodiments of the present application, a processing approach similar to the selection shown in Fig. 2 may also be adopted to select, according to the obtained dimension data, the data processing unit that will process the pending data record from the multiple data processing units; this is not described in further detail here.
In the method provided by the embodiments of the present application, after the pending data record has been distributed to the selected data processing unit, the selected data processing unit processes the dimension data of the pending dimension of the pending data record in step 104. As shown in Fig. 3, this may specifically comprise the following steps:
Step 301: the selected data processing unit determines the hash code of the unique identification data of the pending data record.
The unique identification data is used to distinguish different pending data records. For a transaction data record, for example, the unique identification data may be the transaction order number.
Step 302: according to the last preset number of digits of the hash code of the unique identification data, determine, from multiple data sets that store received data records, the data set corresponding to those digits as the data set to be queried.
The data records stored in any one of the multiple data sets share the same last preset number of digits of the hash code of their unique identification data, and the data records stored in different data sets differ in those digits.
In the embodiments of the present application, after a data processing unit receives a pending data record, it may save the record to a data set. When saving, data records whose hash codes of unique identification data share the same last preset number of digits are saved to the same data set, so that newly received pending data records can subsequently be deduplicated against the data records stored in the data sets.
The preset number can be set flexibly according to actual needs, for example according to the total number of digits of the hash code of the unique identification data.
Step 303: query whether the pending data record exists in the data set to be queried. When it does not exist there, process the dimension data of the pending dimension of the pending data record. When it does exist there, the pending data record has already been received and does not need to be processed again, so processing of it is cancelled; further, the pending data record may be discarded.
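The deduplication flow of steps 301 to 303 can be sketched as follows. The class name `DedupUnit` and its fields are hypothetical, CRC32 again stands in for the unspecified hash code, and treating the "last preset number of digits" as trailing binary bits is an assumption made for this sketch. The processing itself is shown as a running sum purely for illustration.

```python
import zlib


class DedupUnit:
    """Stores received record IDs in data sets keyed by the tail of their hash code."""

    def __init__(self, tail_bits: int = 4):
        self.mask = (1 << tail_bits) - 1  # assumption: "last digits" = trailing bits
        self.data_sets = {}  # tail value -> set of unique IDs already received
        self.total = 0.0

    def _tail(self, unique_id: str) -> int:
        # Steps 301 and 302: hash the unique ID and keep only its trailing bits.
        return zlib.crc32(unique_id.encode("utf-8")) & self.mask

    def handle(self, unique_id: str, value: float) -> bool:
        """Step 303: process the value unless the record was already received."""
        data_set = self.data_sets.setdefault(self._tail(unique_id), set())
        if unique_id in data_set:
            return False  # duplicate: processing is cancelled
        data_set.add(unique_id)
        self.total += value  # the "processing" here is a running sum
        return True


unit = DedupUnit()
first = unit.handle("T1001", 10.0)
repeat = unit.handle("T1001", 10.0)
```

Only the one data set matching the record's hash tail is consulted, which is the point of the partitioning: the lookup never touches the records held in the other data sets.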
In the method provided by the embodiments of the present application, further, after the multiple data processing units corresponding to the pending dimension have each processed the dimension data of the pending dimension of the data records they received and obtained their respective results, combined accumulation processing may be performed on those results for the pending dimension. For example, for data summation the results can be added together; for finding the data maximum, the maximum can be taken from those results.
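A minimal sketch of this combining step is shown below, assuming each unit reports one partial result for the dimension. The `COMBINERS` table and the `combine` function are hypothetical names introduced for illustration.

```python
from functools import reduce

# Each combiner merges two partial results of the same kind of processing.
COMBINERS = {
    "sum": lambda a, b: a + b,
    "max": max,
    "min": min,
}


def combine(partials, operation: str):
    """Merge the partial results reported by the units of one dimension."""
    if not partials:
        raise ValueError("no partial results to combine")
    return reduce(COMBINERS[operation], partials)


# Partial sums from three units of the same dimension.
total = combine([10, 25, 7], "sum")
```

This works for operations such as sum, maximum, and minimum because merging partial results gives the same answer as processing all records in one place.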
After the final result of the combined accumulation processing is obtained, the final result for each dimension may also be output to a preset storage system and saved.
In the method provided by the embodiments of the present application, further, when a data record is saved to a data set, the time at which it is saved may be recorded as its timestamp. Data records in the multiple data sets that meet a preset discard condition can then be discarded according to those timestamps. For example, data records that have been stored for longer than a preset duration may be discarded, or data records whose timestamp is earlier than a preset time may be discarded. This saves the storage space of the data sets and reduces the number of data records they store, which shortens query time during deduplication and improves query efficiency.
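The first discard condition mentioned above (records stored longer than a preset duration) might look like the following sketch. The function name `evict_expired`, the layout of `data_sets` as a mapping from a data set key to `{unique_id: saved_at}`, and the use of seconds as the time unit are all assumptions for illustration.

```python
import time


def evict_expired(data_sets: dict, max_age_seconds: float, now: float = None) -> int:
    """Discard records whose save timestamp is older than max_age_seconds.

    data_sets maps a data set key to {unique_id: saved_at_timestamp}.
    Returns the number of discarded records.
    """
    now = time.time() if now is None else now
    removed = 0
    for data_set in data_sets.values():
        expired = [uid for uid, saved_at in data_set.items()
                   if now - saved_at > max_age_seconds]
        for uid in expired:
            del data_set[uid]
            removed += 1
    return removed


# Two data sets with save timestamps in seconds; "now" is fixed for the example.
data_sets = {0: {"T1": 100.0, "T2": 190.0}, 1: {"T3": 50.0}}
discarded = evict_expired(data_sets, max_age_seconds=60, now=200.0)
```

A record discarded this way can no longer be recognized as a duplicate, so the retention period would need to exceed the window in which repeated delivery is expected.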
Based on the same inventive concept as the data processing method provided by the above embodiments, another embodiment of the present application correspondingly provides a data processing device. Its schematic structure is shown in Fig. 4, and it specifically comprises:
An acquiring unit 401, configured to obtain the dimension data of at least one pending dimension of a pending data record;
A selection unit 402, configured to select, for each pending dimension and according to the dimension data of that pending dimension, the data processing unit that will process the pending data record from multiple preset data processing units corresponding to that pending dimension;
A distribution unit 403, configured to distribute the pending data record to the selected data processing unit;
A data processing unit 404, configured to process the dimension data of the pending dimension of the pending data record distributed to it.
Further, the selection unit 402 is specifically configured to: determine the hash code of the dimension data of the pending dimension; take the hash code of the dimension data modulo the number of data processing units corresponding to the pending dimension to obtain a remainder; and select, from the multiple data processing units, the data processing unit whose unit ID equals the remainder as the data processing unit that will process the pending data record.
Further, the data processing unit 404 is specifically configured to: determine the hash code of the unique identification data of the pending data record; determine, according to the last preset number of digits of the hash code of the unique identification data, the data set corresponding to those digits from multiple data sets that store received data records, as the data set to be queried, wherein the data records stored in any one data set share the same last preset number of digits of the hash code of their unique identification data, and the data records stored in different data sets differ in those digits; and, when the pending data record does not exist in the determined data set to be queried, process the dimension data of the pending dimension of the pending data record.
Further, the above data processing device also comprises:
A discarding unit 405, configured to discard, according to the timestamps of the data records stored in the multiple data sets, the data records in those data sets that meet a preset discard condition, where the timestamp of a data record is the time at which the data record was saved to its data set.
Further, the above data processing device also comprises:
A combined accumulation unit 406, configured to perform, for the pending dimension, combined accumulation processing on the results that the multiple data processing units each obtain by processing the dimension data of the pending dimension of the data records they respectively received.
The functions of the above units correspond to the respective processing steps in the flows shown in Fig. 1 to Fig. 3 and are not repeated here.
In summary, the scheme provided by the embodiments of the present application comprises: obtaining the dimension data of at least one pending dimension of a pending data record; for each pending dimension, selecting, according to the dimension data of that pending dimension, the data processing unit that will process the pending data record from multiple preset data processing units corresponding to that pending dimension; distributing the pending data record to the selected data processing unit; and processing, by the selected data processing unit, the dimension data of that pending dimension of the pending data record. Adopting the scheme provided by the embodiments of the present application improves data processing efficiency.
The data processing device provided by the embodiments of the present application can be implemented by a computer program. Those skilled in the art should understand that the module division described above is only one of many possible module divisions; a division into other modules, or no division into modules, also falls within the scope of protection of the application as long as the data processing device has the functions described above.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. It should be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
In a typical configuration, the computer device comprises one or more processors (CPUs), input/output interfaces, network interfaces, and memory. The memory may include volatile memory in a computer-readable medium, in forms such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its spirit and scope. Accordingly, if these changes and modifications fall within the scope of the claims of the present application and their technical equivalents, the present application is also intended to cover them.

Claims (10)

1. A data processing method, characterized by comprising:
obtaining dimension data of at least one pending dimension of a pending data record;
for each pending dimension, selecting, according to the dimension data of that pending dimension, from a plurality of preset data processing units corresponding to that pending dimension, the data processing unit that is to process the pending data record;
distributing the pending data record to the selected data processing unit; and
processing, by the selected data processing unit, the dimension data of that pending dimension of the pending data record.
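The four steps of claim 1 can be pictured with a minimal Python sketch. Everything concrete here is an assumption made for illustration and not taken from the patent: the record layout (a dict), the name `route_and_process`, the use of `collections.Counter` objects as stand-ins for data processing units, counting as the "processing", and CRC32 modulo the pool size as the selection rule.

```python
import zlib
from collections import Counter


def route_and_process(record, dimensions, units):
    """Send one record to one unit per dimension and process it there.

    record:     dict mapping dimension name -> dimension data (hypothetical layout)
    dimensions: names of the pending dimensions to handle
    units:      dict mapping dimension name -> list of Counter objects,
                each Counter standing in for one data processing unit
    """
    for dim in dimensions:
        value = str(record[dim])          # step 1: obtain the dimension data
        pool = units[dim]                 # preset units for this dimension
        # Step 2: pick a unit from the dimension data. CRC32 modulo the pool
        # size is an assumed rule; claim 1 does not fix how the choice is made.
        index = zlib.crc32(value.encode("utf-8")) % len(pool)
        # Steps 3 and 4: hand the record to that unit, which processes it.
        # "Processing" is assumed here to be counting occurrences.
        pool[index][value] += 1


units = {
    "city": [Counter() for _ in range(3)],
    "browser": [Counter() for _ in range(2)],
}
record = {"id": "r1", "city": "Hangzhou", "browser": "Firefox"}
route_and_process(record, ["city", "browser"], units)
```

Because the choice depends only on the dimension data, records that share a value for a dimension always reach the same unit for that dimension, so that unit sees every occurrence of the value.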
2. the method for claim 1, it is characterized in that, according to the dimension data of this pending dimension, from multiple data processing units corresponding with this pending dimension preset, the data processing unit that selection will process described pending data record, specifically comprises:
Determine the Hash codes of the dimension data of this pending dimension;
Use the quantity remainder of the Hash codes of this dimension data pair multiple data processing units corresponding with this pending dimension, obtain remainder values;
From described multiple data processing unit, selection unit ID is the data processing unit of described remainder values, as the data processing unit that will process described pending data record.
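The hash-and-remainder selection of claim 2 might look like the sketch below. The function name `select_unit` is hypothetical, and CRC32 is only a stand-in hash chosen because it gives the same value in every process; the claim does not name a hash function. Unit IDs are assumed to run from 0 to `unit_count - 1` so that every remainder is a valid ID.

```python
import zlib


def select_unit(dimension_value: str, unit_count: int) -> int:
    """Return the ID of the processing unit for one dimension value."""
    if unit_count <= 0:
        raise ValueError("unit_count must be positive")
    # Hash code of the dimension data (CRC32 is an assumed choice).
    hash_code = zlib.crc32(dimension_value.encode("utf-8"))
    # The remainder modulo the number of units is used as the unit ID.
    return hash_code % unit_count


unit_id = select_unit("Hangzhou", 4)  # always the same ID for "Hangzhou"
```

A stable hash matters here: Python's built-in `hash()` for strings is randomized per process by default, which would send the same value to different units on different machines.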
3. the method for claim 1, is characterized in that, is processed, specifically comprise by the dimension data of the data processing unit selected to this pending dimension of described pending data record:
The data processing unit selected determines the Hash codes of the unique identification data of described pending data record;
According to the rear predetermined number position of the Hash codes of described unique identification data, from multiple data centralizations of preserving data accepted record, determine the data set corresponding with the rear predetermined number position of the Hash codes of described unique identification data, as data set to be checked, the rear predetermined number position of the Hash codes of the unique identification data of the data record that each data centralization of described multiple data centralization is preserved is identical, and different pieces of information concentrates the rear predetermined number position of the Hash codes of the unique identification data of the data record of preservation different;
When the data centralization described to be checked determined does not exist described pending data record, the dimension data of this pending dimension of described pending data record is processed.
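The duplicate check of claim 3 can be sketched as a set of buckets keyed by the trailing digits of a hash code. This is an illustration under stated assumptions: the trailing digits are taken to be binary digits (bits), CRC32 stands in for the unspecified hash, each data set is a Python `set` of record IDs, and the names `Deduplicator`, `suffix_bits`, and `first_time` are hypothetical.

```python
import zlib


class Deduplicator:
    """Tracks received record IDs in buckets chosen by hash-code suffix."""

    def __init__(self, suffix_bits: int = 4):
        # One data set per possible suffix value (2 ** suffix_bits of them).
        self.mask = (1 << suffix_bits) - 1
        self.buckets = [set() for _ in range(1 << suffix_bits)]

    def first_time(self, record_id: str) -> bool:
        """Return True and remember the record if it has not been seen yet."""
        hash_code = zlib.crc32(record_id.encode("utf-8"))
        # Only the data set matching the last suffix_bits bits is queried.
        bucket = self.buckets[hash_code & self.mask]
        if record_id in bucket:
            return False  # duplicate: the caller skips processing
        bucket.add(record_id)
        return True


dedup = Deduplicator(suffix_bits=4)
if dedup.first_time("order-1001"):
    pass  # process the dimension data of the record here
```

Splitting the received records across several data sets keeps each lookup confined to one smaller set, which is presumably the point of choosing the set by hash suffix.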
4. The method of claim 3, characterized by further comprising:
discarding, according to the timestamps of the data records stored in the plurality of data sets, the data records in the plurality of data sets that satisfy a preset discard condition, wherein the timestamp of a data record is the time at which that data record was saved to its data set.
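One way to read the discard step of claim 4 is as age-based eviction. In the sketch below, the preset discard condition is assumed to be "saved longer ago than `max_age_seconds`"; the claim itself leaves the condition open. Each data set is assumed to be a dict mapping a record ID to the time it was saved, and `evict_expired` is a hypothetical name.

```python
def evict_expired(data_sets, now, max_age_seconds):
    """Drop records whose saved-at timestamp is older than the allowed age.

    data_sets: list of dicts mapping record ID -> time the record was saved
    Returns the number of records discarded.
    """
    removed = 0
    for data_set in data_sets:
        # Collect first, then delete, to avoid mutating a dict while iterating.
        expired = [rid for rid, saved_at in data_set.items()
                   if now - saved_at > max_age_seconds]
        for rid in expired:
            del data_set[rid]
        removed += len(expired)
    return removed


sets = [{"a": 0.0, "b": 50.0}, {"c": 90.0}]
evict_expired(sets, now=100.0, max_age_seconds=60.0)  # discards only "a"
```

Discarding old entries bounds the memory used by the data sets, at the cost that a duplicate arriving after its original was discarded would no longer be detected.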
5. The method of any one of claims 1 to 4, characterized by further comprising:
for the pending dimension, comprehensively accumulating the results that the plurality of data processing units respectively obtain by processing the dimension data of the pending dimension of the data records each unit received.
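The comprehensive accumulation of claim 5 amounts to merging the per-unit results for one dimension. The sketch below assumes each unit's result is a `collections.Counter` of occurrence counts, which is an illustrative choice and not stated in the patent; `accumulate` is a hypothetical name.

```python
from collections import Counter


def accumulate(unit_results):
    """Merge per-unit counts for one dimension into a single total."""
    total = Counter()
    for partial in unit_results:
        total.update(partial)  # Counter.update adds counts key by key
    return total


per_unit = [Counter({"Hangzhou": 3}), Counter({"Beijing": 2, "Shanghai": 1})]
totals = accumulate(per_unit)
```

If records are routed by a deterministic hash of the dimension data, each value lands on a single unit and the partial results do not overlap; summing is still safe when they do.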
6. A data processing device, characterized by comprising:
an acquiring unit, configured to obtain dimension data of at least one pending dimension of a pending data record;
a selection unit, configured to select, for each pending dimension and according to the dimension data of that pending dimension, from a plurality of preset data processing units corresponding to that pending dimension, the data processing unit that is to process the pending data record;
a distribution unit, configured to distribute the pending data record to the selected data processing unit; and
a data processing unit, configured to process the dimension data of that pending dimension of the pending data record distributed to it.
7. The device of claim 6, characterized in that the selection unit is specifically configured to: determine a hash code of the dimension data of the pending dimension; take the remainder of the hash code of the dimension data divided by the number of data processing units corresponding to the pending dimension, to obtain a remainder value; and select, from the plurality of data processing units, the data processing unit whose unit ID equals the remainder value as the data processing unit that is to process the pending data record.
8. The device of claim 6, characterized in that the data processing unit is specifically configured to: determine a hash code of unique identification data of the pending data record; determine, according to the last predetermined number of digits of the hash code of the unique identification data, from a plurality of data sets that store received data records, the data set corresponding to those digits as the data set to be queried, wherein the data records stored in any one of the plurality of data sets have hash codes of unique identification data whose last predetermined number of digits are identical, and data records stored in different data sets have hash codes whose last predetermined number of digits differ; and process the dimension data of the pending dimension of the pending data record when the pending data record does not exist in the determined data set to be queried.
9. The device of claim 8, characterized by further comprising:
a discarding unit, configured to discard, according to the timestamps of the data records stored in the plurality of data sets, the data records in the plurality of data sets that satisfy a preset discard condition, wherein the timestamp of a data record is the time at which that data record was saved to its data set.
10. The device of any one of claims 6 to 9, characterized by further comprising:
a comprehensive accumulation unit, configured to comprehensively accumulate, for the pending dimension, the results that the plurality of data processing units respectively obtain by processing the dimension data of the pending dimension of the data records each unit received.
CN201310373788.6A 2013-08-23 2013-08-23 Data processing method and device Active CN104424220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310373788.6A CN104424220B (en) 2013-08-23 2013-08-23 Data processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310373788.6A CN104424220B (en) 2013-08-23 2013-08-23 Data processing method and device

Publications (2)

Publication Number Publication Date
CN104424220A true CN104424220A (en) 2015-03-18
CN104424220B CN104424220B (en) 2018-07-13

Family

ID=52973217

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310373788.6A Active CN104424220B (en) 2013-08-23 2013-08-23 Data processing method and device

Country Status (1)

Country Link
CN (1) CN104424220B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050138111A1 (en) * 2003-10-15 2005-06-23 Microsoft Corporation On-line service/application monitoring and reporting system
CN101539950A (en) * 2009-05-08 2009-09-23 成都市华为赛门铁克科技有限公司 Data storage method and device
CN102467458A (en) * 2010-11-05 2012-05-23 英业达股份有限公司 Method for establishing index of data block
CN103136217A (en) * 2011-11-24 2013-06-05 阿里巴巴集团控股有限公司 Distributed data flow processing method and system thereof

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701201A (en) * 2016-01-12 2016-06-22 浪潮通用软件有限公司 Data processing method and device
CN105701201B (en) * 2016-01-12 2019-05-07 浪潮通用软件有限公司 A kind of method and device of data processing
WO2017152766A1 (en) * 2016-03-11 2017-09-14 阿里巴巴集团控股有限公司 Sample serialization method and device
TWI761331B (en) * 2016-03-11 2022-04-21 香港商阿里巴巴集團服務有限公司 Sample serialization method and apparatus
CN107229663A (en) * 2016-03-25 2017-10-03 阿里巴巴集团控股有限公司 Data processing method and device and tables of data treating method and apparatus
CN107610468A (en) * 2017-09-28 2018-01-19 航天科技控股集团股份有限公司 Speed density Analysis System and method based on recorder management
CN112162859A (en) * 2020-09-24 2021-01-01 成都长城开发科技有限公司 Data processing method and device, computer readable medium and electronic equipment

Also Published As

Publication number Publication date
CN104424220B (en) 2018-07-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20211117

Address after: No. 699, Wangshang Road, Binjiang District, Hangzhou, Zhejiang

Patentee after: Alibaba (China) Network Technology Co.,Ltd.

Address before: Box four, 847, capital building, Grand Cayman Island capital, Cayman Islands, UK

Patentee before: ALIBABA GROUP HOLDING Ltd.