CN104050291B - Method and system for parallel processing of account balance data
- Publication number: CN104050291B
- Application number: CN201410306448.6A
- Authority: CN (China)
- Prior art keywords: balance, task, account, record, output parameter
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2219—Large Object storage; Management thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24532—Query optimisation of parallel queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Computational Linguistics (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for parallel processing of account balance data. The method includes: one or more Map nodes executing a first task read different shards of the account balance detail data and generate a first output parameter and a second output parameter for each balance record in the shard they read, where the first output parameter includes at least the account ID and the second output parameter is set to the account status information, which includes at least the balance value, the transaction date, and the intraday transaction sequence number; one or more Reduce nodes executing the first task read the different balance records processed by the Map nodes executing the first task and, from the first and second output parameters of those records, generate an average daily balance record for each account, where balance records with the same first output parameter are read by the same Reduce node. The invention can quickly obtain statistics on the average daily balance of accounts over large data volumes. The invention also discloses a system for parallel processing of account balance data.
Description
Technical field
The present invention relates to the field of big data processing, and in particular to a method and system for parallel processing of account balance data over large data volumes.
Background art
Data is information that nearly all business activities, such as production, operations, and strategy, depend on and cannot do without. Data is like the eyes of a business operator: problems in operations show up in the data, much as a helmsman relies on navigation. As human society fully enters the information age, data has become a strategic resource as important as water and oil. Enterprises today face large-scale growth in data volume. For example, a recent IDC forecast claims that by 2020 the global data volume will have grown 50-fold. At present, the scale of big data is still a moving target, with single data sets ranging from tens of TB to several PB. In addition, data can come from all kinds of unexpected sources.
Traditional business data has acquired standard formats over time and can be recognized by standard business intelligence software. Compared with traditional business data, big data has a multi-layered structure, which means it can take many forms and types. Because big data is irregular and ambiguous, it is difficult or even impossible to analyze with traditional application software.
At present, the challenge enterprises face is extracting value from complex data in its various forms.
Summary of the invention
The technical problem to be solved by the invention is to provide a method and system for parallel processing of account balance data that quickly obtain statistics on the average daily balance of accounts over large data volumes.
To solve this technical problem, the invention provides a method for parallel processing of account balance data. The method includes:
One or more Map nodes executing a first task read different shards of the account balance detail data and generate a first output parameter and a second output parameter for each balance record in the shard they read, where the first output parameter includes at least the account ID and the second output parameter is set to the account status information, which includes at least the balance value, the transaction date, and the intraday transaction sequence number;
One or more Reduce nodes executing the first task read the different balance records processed by the Map nodes executing the first task and, from the first and second output parameters of those records, generate an average daily balance record for each account, where balance records with the same first output parameter are read by the same Reduce node.
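The Map step of the first task can be illustrated with a minimal in-memory sketch. It is not the patent's implementation: the names `BalanceRecord`, `map_balance`, and `shuffle` are hypothetical, and the grouping function only imitates what a MapReduce framework does between the Map and Reduce stages.

```python
from collections import defaultdict
from typing import NamedTuple


class BalanceRecord(NamedTuple):
    """One balance detail record (field names are assumptions)."""
    account_id: str
    balance: float
    trade_date: str  # ISO date string, e.g. "2014-06-01"
    seq: int         # intraday transaction sequence number


def map_balance(record):
    """Emit (first output parameter, second output parameter) for one record."""
    key = record.account_id
    value = (record.balance, record.trade_date, record.seq)
    return key, value


def shuffle(pairs):
    """Group values by key so that one reducer sees every record of an account."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


shard = [
    BalanceRecord("A", 100.0, "2014-06-01", 1),
    BalanceRecord("B", 50.0, "2014-06-01", 1),
    BalanceRecord("A", 80.0, "2014-06-01", 2),
]
grouped = shuffle(map_balance(r) for r in shard)
```

Because the account ID is the key, both records of account `A` end up in the same group, which is the property the method relies on when it says that records with the same first output parameter are read by the same Reduce node.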
Further, the method also includes the following feature:
Generating the average daily balance record of each account from the first and second output parameters of the balance records includes:
Traversing the balance records of the same account according to the account ID in the first output parameter; determining, from the second output parameter of those records, the account's balance for each day within the query period; averaging the daily balances over the query period to obtain the account's average daily balance; and generating the account's average daily balance record.
Further, the method also includes the following feature:
After the Reduce nodes executing the first task generate the average daily balance record of each account from the first and second output parameters of the balance records, the method also includes:
One or more Map nodes executing a second task read the average daily balance records of different accounts and generate a first output parameter and a second output parameter for each record read, where the first output parameter is set to the interval in which the average daily balance falls and the second output parameter is set to 1;
One or more Reduce nodes executing the second task read the different average daily balance records processed by the Map nodes executing the second task and, from the first and second output parameters of those records, count the number of accounts in each average daily balance interval. This includes: according to the interval in the first output parameter, traversing the average daily balance records in the same interval and summing their second output parameters to obtain the number of accounts in that interval. Average daily balance records with the same first output parameter are read by the same Reduce node.
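The second task is a counting job keyed by interval. The sketch below shows one way it could look, under stated assumptions: the interval boundaries in `BOUNDS` and the helper names `interval_of`, `map_interval`, and `reduce_count` are hypothetical, since the text does not specify how intervals are defined.

```python
from collections import defaultdict

# Hypothetical interval boundaries; the source text does not specify them.
BOUNDS = [0, 1_000, 10_000, 100_000]


def interval_of(avg_balance):
    """Return a label for the half-open interval containing avg_balance."""
    for low, high in zip(BOUNDS, BOUNDS[1:]):
        if low <= avg_balance < high:
            return f"[{low}, {high})"
    if avg_balance >= BOUNDS[-1]:
        return f"[{BOUNDS[-1]}, inf)"
    return f"(-inf, {BOUNDS[0]})"


def map_interval(avg_record):
    """Emit (interval, 1) for one (account_id, average daily balance) record."""
    _account_id, avg_balance = avg_record
    return interval_of(avg_balance), 1


def reduce_count(pairs):
    """Sum the second output parameter per interval to get account counts."""
    counts = defaultdict(int)
    for interval, one in pairs:
        counts[interval] += one
    return dict(counts)


averages = [("A", 250.0), ("B", 5_000.0), ("C", 999.99), ("D", 250_000.0)]
counts = reduce_count(map_interval(r) for r in averages)
```

Emitting a constant 1 as the value keeps the Reduce step a plain sum, so the count for an interval is simply the number of records routed to it.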
Further, the method also includes the following feature:
Determining, from the second output parameter of the balance records, the account's balance for each day within the query period includes:
For each day from the day of the query start time to the day of the query end time, checking whether a balance record exists for that day. If one exists, the balance value of the record with the largest intraday transaction sequence number is taken as that day's closing balance. If none exists, the method traces back to the nearest earlier date that has a balance record and takes the balance value of the record with the largest transaction sequence number on that date as the day's closing balance.
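The per-day rule above amounts to "last record of the day wins, otherwise carry the previous closing balance forward". A minimal sketch for a single account follows; the function name `daily_closing_balances` is hypothetical, and returning `None` for days with no record on or before them is an assumption, because the text does not cover that case.

```python
from datetime import date, timedelta


def daily_closing_balances(records, start, end):
    """Return {day: closing balance} for every day in [start, end].

    records: iterable of (balance, trade_date, seq) tuples for ONE account,
    where trade_date is a datetime.date.
    """
    # For each date keep the record with the largest intraday sequence number.
    last_of_day = {}
    for balance, trade_date, seq in records:
        if trade_date not in last_of_day or seq > last_of_day[trade_date][0]:
            last_of_day[trade_date] = (seq, balance)

    # Closing balance of the nearest date before the query start, if any.
    earlier = [d for d in last_of_day if d < start]
    carried = last_of_day[max(earlier)][1] if earlier else None

    result = {}
    day = start
    while day <= end:
        if day in last_of_day:
            carried = last_of_day[day][1]
        # Assumption: None marks a day with no record on or before it.
        result[day] = carried
        day += timedelta(days=1)
    return result


records = [
    (100.0, date(2014, 5, 30), 1),
    (70.0, date(2014, 6, 2), 1),
    (90.0, date(2014, 6, 2), 2),
]
closing = daily_closing_balances(records, date(2014, 6, 1), date(2014, 6, 3))
```

Walking the days in order means each missing day costs one dictionary lookup, rather than a backward search through earlier dates.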
Further, the method also includes the following feature:
Before the one or more Reduce nodes executing the first task read the different balance records processed by the Map nodes executing the first task, the method also includes:
Computing the hash value of the first parameter of each balance record and establishing a mapping between hash values of the first parameter and the Reduce nodes executing the first task, where each Reduce node executing the first task uses the mapping to read its corresponding balance records.
Further, the method also includes the following feature:
Before the one or more Reduce nodes executing the second task read the different average daily balance records processed by the Map nodes executing the second task, the method also includes:
Computing the hash value of the first parameter of each average daily balance record and establishing a mapping between hash values of the first parameter and the Reduce nodes executing the second task, where each Reduce node executing the second task uses the mapping to read its corresponding average daily balance records.
Further, the method also includes the following feature:
Before the one or more Map nodes executing the first task read different shards of the account balance detail data, the method also includes:
Determining the read range of the account balance detail data from the query start and end time, which includes taking the full balance detail data together with the incremental balance detail data for the day of the query end time as the read range;
Splitting the account balance detail data within the read range into shards and establishing a mapping between each shard and the Map nodes executing the first task, where each Map node executing the first task uses the mapping to read its corresponding shard.
To solve the above technical problem, the invention also provides a system for parallel processing of account balance data. The system includes:
A Map processing module, including one or more Map nodes executing a first task. Each Map node executing the first task reads a different shard of the account balance detail data and generates a first output parameter and a second output parameter for each balance record in the shard it reads, where the first output parameter includes at least the account ID and the second output parameter is set to the account status information, which includes at least the balance value, the transaction date, and the intraday transaction sequence number;
A Reduce processing module, including one or more Reduce nodes executing the first task. Each Reduce node executing the first task reads the different balance records processed by the Map nodes executing the first task and, from the first and second output parameters of those records, generates an average daily balance record for each account, where balance records with the same first output parameter are read by the same Reduce node.
Further, the system also includes the following feature:
The Reduce nodes executing the first task generate the average daily balance record of each account from the first and second output parameters of the balance records as follows: traversing the balance records of the same account according to the account ID in the first output parameter; determining, from the second output parameter of those records, the account's balance for each day within the query period; averaging the daily balances over the query period to obtain the account's average daily balance; and generating the account's average daily balance record.
Further, the system also includes the following feature:
The Map processing module also includes one or more Map nodes executing a second task, and the Reduce processing module also includes one or more Reduce nodes executing the second task;
Each Map node executing the second task reads the average daily balance records of different accounts and generates a first output parameter and a second output parameter for each record read, where the first output parameter is set to the interval in which the average daily balance falls and the second output parameter is set to 1;
Each Reduce node executing the second task reads the different average daily balance records processed by the Map nodes executing the second task and, from the first and second output parameters of those records, counts the number of accounts in each average daily balance interval. This includes: according to the interval in the first output parameter, traversing the average daily balance records in the same interval and summing their second output parameters to obtain the number of accounts in that interval. Average daily balance records with the same first output parameter are read by the same Reduce node.
Further, the system also includes the following feature:
The Reduce nodes executing the first task determine, from the second output parameter of the balance records, the account's balance for each day within the query period as follows: for each day from the day of the query start time to the day of the query end time, checking whether a balance record exists for that day. If one exists, the balance value of the record with the largest intraday transaction sequence number is taken as that day's closing balance. If none exists, the node traces back to the nearest earlier date that has a balance record and takes the balance value of the record with the largest transaction sequence number on that date as the day's closing balance.
Further, the system also includes the following feature:
The Reduce processing module also includes a first task routing module:
The first task routing module is configured to, before the one or more Reduce nodes executing the first task read the different balance records processed by the Map nodes executing the first task, compute the hash value of the first parameter of each balance record and establish a mapping between hash values of the first parameter and the Reduce nodes executing the first task, where each Reduce node executing the first task uses the mapping to read its corresponding balance records.
Further, the system also includes the following feature:
The Reduce processing module also includes a second task routing module:
The second task routing module is configured to, before the one or more Reduce nodes executing the second task read the different average daily balance records processed by the Map nodes executing the second task, compute the hash value of the first parameter of each average daily balance record and establish a mapping between hash values of the first parameter and the Reduce nodes executing the second task, where each Reduce node executing the second task uses the mapping to read its corresponding average daily balance records.
Further, the system also includes the following feature: the Map processing module also includes a sharding module:
The sharding module is configured to, before the one or more Map nodes executing the first task read different shards of the account balance detail data, determine the read range of the account balance detail data from the query start and end time, which includes taking the full balance detail data together with the incremental balance detail data for the day of the query end time as the read range; and to split the account balance detail data within the read range into shards and establish a mapping between each shard and the Map nodes executing the first task, where each Map node executing the first task uses the mapping to read its corresponding shard.
Compared with the prior art, the method and system provided by the invention use MapReduce to divide large-scale account balance detail data into several pieces and hand them to Map nodes for parallel processing. In the Map stage the data is classified by account; once that is done, the records are grouped by account ID and routed to multiple Reduce nodes for parallel processing. This makes it possible to quickly obtain statistics on the average daily balance of accounts over large data volumes, with high processing efficiency and strong scalability.
Brief description of the drawings
Fig. 1 is a flow chart of obtaining each user's average daily balance in a method for parallel processing of account balance data according to an embodiment of the invention.
Fig. 2 is a flow chart of counting the number of accounts in each average daily balance interval in a method for parallel processing of account balance data according to an embodiment of the invention.
Fig. 3 is a schematic structural diagram of a system for parallel processing of account balance data according to an embodiment of the invention.
Fig. 4 is a schematic diagram of the MapReduce-based processing architecture for account balance data in an application example of the invention.
Detailed description
To make the objects, technical solutions, and advantages of the invention clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings. Note that, where they do not conflict, the embodiments in this application and the features in those embodiments may be combined with one another.
As shown in Fig. 1, an embodiment of the invention provides a method for parallel processing of account balance data. The method includes:
S10: one or more Map nodes executing a first task read different shards of the account balance detail data and generate a first output parameter and a second output parameter for each balance record in the shard they read, where the first output parameter includes at least the account ID and the second output parameter is set to the account status information, which includes at least the balance value, the transaction date, and the intraday transaction sequence number;
S20: one or more Reduce nodes executing the first task read the different balance records processed by the Map nodes executing the first task and, from the first and second output parameters of those records, generate an average daily balance record for each account, where balance records with the same first output parameter are read by the same Reduce node.
The method may also include the following features:
Preferably, before the one or more Map nodes executing the first task read different shards of the account balance detail data, the method also includes:
Determining the read range of the account balance detail data from the query start and end time, which includes taking the full balance detail data together with the incremental balance detail data for the day of the query end time as the read range;
Splitting the account balance detail data within the read range into shards and establishing a mapping between each shard and the Map nodes executing the first task, where each Map node executing the first task uses the mapping to read its corresponding shard.
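As a rough illustration of the read range and shard mapping, the sketch below combines a full data set with the incremental data for the query end day, cuts the result into fixed-size shards, and assigns shards to Map nodes. The fixed shard size, the round-robin assignment, and the names `build_read_range`, `make_shards`, and `assign_shards` are all assumptions; the text only requires that a shard-to-node mapping exists.

```python
def build_read_range(full_detail, incremental_by_day, end_day):
    """Full detail data plus the incremental data for the query end day."""
    return list(full_detail) + list(incremental_by_day.get(end_day, []))


def make_shards(records, shard_size):
    """Cut the read range into consecutive shards of at most shard_size records."""
    return [records[i:i + shard_size] for i in range(0, len(records), shard_size)]


def assign_shards(shards, num_map_nodes):
    """Return {map node index: [shard indices]} using round-robin assignment."""
    mapping = {node: [] for node in range(num_map_nodes)}
    for shard_index in range(len(shards)):
        mapping[shard_index % num_map_nodes].append(shard_index)
    return mapping


full = ["r1", "r2", "r3", "r4"]
incremental = {"2014-06-30": ["r5"]}

read_range = build_read_range(full, incremental, "2014-06-30")
shards = make_shards(read_range, shard_size=2)
mapping = assign_shards(shards, num_map_nodes=2)
```

Each Map node then reads only the shards listed under its own index, so no record is processed twice.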
Preferably, generating the average daily balance record of each account from the first and second output parameters of the balance records includes:
Traversing the balance records of the same account according to the account ID in the first output parameter; determining, from the second output parameter of those records, the account's balance for each day within the query period; averaging the daily balances over the query period to obtain the account's average daily balance; and generating the account's average daily balance record.
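The averaging step itself is small once there is one balance per day. The following sketch assumes the caller supplies exactly one closing balance for each day of the query period; the function name `average_daily_balance` and the use of `Decimal` for monetary values are choices made for illustration, not taken from the text.

```python
from decimal import Decimal


def average_daily_balance(daily_balances):
    """Average a sequence of per-day closing balances, one entry per day."""
    if not daily_balances:
        raise ValueError("the query period must contain at least one day")
    return sum(daily_balances, Decimal(0)) / len(daily_balances)


# Three days in the query period: 100, 100 (carried forward), 130.
average = average_daily_balance([Decimal("100"), Decimal("100"), Decimal("130")])
```

Note that the divisor is the number of days in the query period, not the number of balance records, which is why days without transactions still need a carried-forward balance.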
Preferably, determining, from the second output parameter of the balance records, the account's balance for each day within the query period includes:
For each day from the day of the query start time to the day of the query end time, checking whether a balance record exists for that day. If one exists, the balance value of the record with the largest intraday transaction sequence number is taken as that day's closing balance. If none exists, the method traces back to the nearest earlier date that has a balance record and takes the balance value of the record with the largest transaction sequence number on that date as the day's closing balance.
Preferably, before the one or more Reduce nodes executing the first task read the different balance records processed by the Map nodes executing the first task, the method also includes:
Computing the hash value of the first parameter of each balance record and establishing a mapping between hash values of the first parameter and the Reduce nodes executing the first task, where each Reduce node executing the first task uses the mapping to read its corresponding balance records.
Preferably, the hash value of the first parameter of each balance record is the first parameter modulo the total number of Reduce nodes executing the first task.
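The modulo rule can be sketched as a small routing function. The text takes the first parameter modulo the number of Reduce nodes; for non-numeric account IDs this sketch first converts the key to an integer with CRC32, which is an assumption added here, and the name `reducer_index` is hypothetical.

```python
import zlib


def reducer_index(key, num_reducers):
    """Map a first output parameter to a Reduce node index in [0, num_reducers)."""
    if isinstance(key, int):
        numeric = key
    else:
        # Assumption: non-numeric keys are turned into an integer with CRC32,
        # which is stable across processes (unlike Python's built-in hash()).
        numeric = zlib.crc32(str(key).encode("utf-8"))
    return numeric % num_reducers


index_for_numeric_id = reducer_index(10, 4)
index_for_string_id = reducer_index("acct-1", 4)
```

The same key always yields the same index, so every record of one account reaches the same Reduce node regardless of which Map node emitted it.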
Preferably, as shown in Fig. 2, the method also includes, after step S20:
S30: determining the interval in which each account's average daily balance falls and counting the number of accounts in each interval.
Preferably, determining the interval in which each account's average daily balance falls and counting the number of accounts in each interval includes:
S301: one or more Map nodes executing a second task read the average daily balance records of different accounts and generate a first output parameter and a second output parameter for each record read, where the first output parameter is set to the interval in which the average daily balance falls and the second output parameter is set to 1;
S302: one or more Reduce nodes executing the second task read the different average daily balance records processed by the Map nodes executing the second task and, from the first and second output parameters of those records, count the number of accounts in each average daily balance interval. This includes: according to the interval in the first output parameter, traversing the average daily balance records in the same interval and summing their second output parameters to obtain the number of accounts in that interval. Average daily balance records with the same first output parameter are read by the same Reduce node.
Preferably, before the one or more Reduce nodes executing the second task read the different average daily balance records processed by the Map nodes executing the second task, the method also includes:
Computing the hash value of the first parameter of each average daily balance record and establishing a mapping between hash values of the first parameter and the Reduce nodes executing the second task, where each Reduce node executing the second task uses the mapping to read its corresponding average daily balance records.
Preferably, the hash value of the first parameter of each average daily balance record is the first parameter modulo the total number of Reduce nodes executing the second task.
The Map nodes executing the second task and the Map nodes executing the first task may be the same batch of nodes or different batches of nodes; that is, a Map node may go on to execute the second task after it has finished the first task. Similarly, the Reduce nodes executing the second task and the Reduce nodes executing the first task may be the same batch of nodes or different batches of nodes; that is, a Reduce node may go on to execute the second task after it has finished the first task.
As shown in Fig. 3, an embodiment of the invention provides a system for parallel processing of account balance data. The system includes:
A Map processing module, including one or more Map nodes executing a first task. Each Map node executing the first task reads a different shard of the account balance detail data and generates a first output parameter and a second output parameter for each balance record in the shard it reads, where the first output parameter includes at least the account ID and the second output parameter is set to the account status information, which includes at least the balance value, the transaction date, and the intraday transaction sequence number;
A Reduce processing module, including one or more Reduce nodes executing the first task. Each Reduce node executing the first task reads the different balance records processed by the Map nodes executing the first task and, from the first and second output parameters of those records, generates an average daily balance record for each account, where balance records with the same first output parameter are read by the same Reduce node.
The system may also include the following features.
Preferably, the Reduce node executing the first task generates the average daily balance record of each account from the first and second output parameters of the balance records as follows: using the account ID in the first output parameter, it traverses all balance records of the same account; using the second output parameter of those records, it determines the account's balance value for every day in the query period (from the query start time to the query end time); it then averages the daily balance values over the query period to obtain the account's average daily balance and generates the account's average daily balance record.
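The averaging step can be written as a formula. The notation is illustrative and not taken from the patent text: $d_1, \dots, d_N$ are the days in the query period, and $b(a, d_i)$ is the balance value determined for account $a$ on day $d_i$.

```latex
\bar{b}(a) = \frac{1}{N} \sum_{i=1}^{N} b(a, d_i)
```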
Preferably, the Map processing module also includes one or more Map nodes that execute a second task, and the Reduce processing module also includes one or more Reduce nodes that execute the second task.
Each Map node executing the second task reads the average daily balance records of different accounts and generates a first output parameter and a second output parameter for each record it reads. The first output parameter of an average daily balance record is set to the interval in which the average daily balance falls, and the second output parameter is set to 1.
Each Reduce node executing the second task reads the average daily balance records processed by the Map nodes executing the second task and, from the first and second output parameters of those records, counts the number of accounts in each average daily balance interval. Specifically, using the interval in the first output parameter, it traverses all average daily balance records of the same interval and sums their second output parameters to obtain the number of accounts in that interval. Average daily balance records with the same first output parameter are read by the same Reduce node.
Preferably, the Reduce node executing the first task determines the account's balance value for each day in the query period from the second output parameter of the balance records as follows: for each day from the day of the query start time to the day of the query end time, it checks whether a balance record exists for that day. If one exists, the balance value of the record with the largest intraday transaction sequence number is taken as that day's closing balance. If none exists, it traces back to the most recent earlier date that has a balance record and takes the balance value of the record with the largest transaction sequence number on that date as the day's closing balance.
Preferably, the Reduce processing module also includes a first task routing module.
The first task routing module is configured to, before the one or more Reduce nodes executing the first task read the balance records processed by the Map nodes executing the first task, compute the hash value of the first output parameter of each balance record and establish a mapping between those hash values and the Reduce nodes executing the first task. The Reduce nodes executing the first task use this mapping to read their corresponding balance records.
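A minimal sketch of such hash-based routing is shown below. The patent does not name a hash function; CRC32 and the function name `route_to_reducer` are assumptions made for illustration, and the modulo step follows the "hash value modulo the number of Reduce nodes" grouping mentioned in the application example.

```python
import zlib


def route_to_reducer(key: str, num_reducers: int) -> int:
    """Return the index of the Reduce node that should read records with this key."""
    if num_reducers <= 0:
        raise ValueError("num_reducers must be positive")
    # CRC32 is stable across processes. Python's built-in hash() for str is
    # salted per process, so it would not give a consistent mapping between nodes.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


reducer_for_001 = route_to_reducer("001", 2)
reducer_for_002 = route_to_reducer("002", 2)
```

Because the mapping depends only on the key, every record carrying the same first output parameter lands on the same Reduce node, regardless of which Map node produced it.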
Preferably, the Reduce processing module also includes a second task routing module.
The second task routing module is configured to, before the one or more Reduce nodes executing the second task read the average daily balance records processed by the Map nodes executing the second task, compute the hash value of the first output parameter of each average daily balance record and establish a mapping between those hash values and the Reduce nodes executing the second task. The Reduce nodes executing the second task use this mapping to read their corresponding average daily balance records.
Preferably, the Map processing module also includes a sharding module.
The sharding module is configured to, before the one or more Map nodes executing the first task read the different shards of the account balance detail data, determine the read range of the account balance detail data from the query start and end times. Specifically, the full balance detail data together with the incremental balance detail data up to the day of the query end time is taken as the read range. The sharding module then splits the account balance detail data within the read range into shards and establishes a mapping between each shard and the Map nodes executing the first task. The Map nodes executing the first task use this mapping to read their corresponding shards.
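The two steps of the sharding module (choosing the read range, then mapping shards to Map nodes) might look like the sketch below. Several things here are assumptions rather than details from the patent: the file names, the one-file-per-shard simplification, the round-robin assignment, and the function names `select_read_range` and `assign_shards`.

```python
from typing import Dict, List


def select_read_range(full_files: List[str], incremental_by_day: Dict[str, str],
                      query_end_day: str) -> List[str]:
    """Full-data files plus incremental files dated up to the query end day."""
    incremental = [path for day, path in sorted(incremental_by_day.items())
                   if day <= query_end_day]  # "YYYYMMDD" strings sort chronologically
    return list(full_files) + incremental


def assign_shards(files: List[str], num_map_nodes: int) -> Dict[int, List[str]]:
    """Round-robin mapping from Map node index to the shards it reads."""
    mapping: Dict[int, List[str]] = {i: [] for i in range(num_map_nodes)}
    for i, path in enumerate(files):
        mapping[i % num_map_nodes].append(path)
    return mapping


files = select_read_range(
    ["full_2013.dat"],
    {"20140101": "inc_20140101.dat", "20140108": "inc_20140108.dat",
     "20140109": "inc_20140109.dat"},
    query_end_day="20140108",
)
shard_map = assign_shards(files, num_map_nodes=2)
```

In this sketch the file dated after the query end day is left out of the read range, so no Map node spends time on records that cannot affect the result.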
Application example
An application example follows: compute each account's average daily balance from January 1 to January 8 of the current year, and the distribution of the accounts across average daily balance intervals. (Assume there are two accounts, id001 and id002.)
The original detail data of the two accounts is shown in Table 1. It includes part of the full data from the previous year (2013) and the incremental data from January 1 to January 8 of the current year (2014).
Date | Account ID | Balance (yuan) | Sequence number |
20140101 | 002 | 20 | 1 |
20140102 | 002 | 10 | 1 |
20140102 | 002 | 50 | 2 |
20140103 | 002 | 30 | 1 |
20140103 | 002 | 15 | 2 |
20140106 | 002 | 40 | 1 |
20140106 | 002 | 50 | 2 |
20140106 | 002 | 60 | 3 |
20140108 | 002 | 30 | 1 |
20131230 | 002 | 90 | 1 |
20140106 | 001 | 60 | 1 |
20140108 | 001 | 30 | 1 |
20131225 | 001 | 90 | 1 |
Table 1
As shown in Figure 4, two MapReduce-based tasks are started to process the account information in parallel.
The first task splits the large input data file into several shards and hands them to Map nodes for parallel processing. The Map stage outputs the data in the form <account id, account information>. This step classifies and preprocesses the massive data so that the detail data of one account is directed to the same Reduce node for merging. For example, records are grouped by the hash value of the account id (taken modulo the number of Reduce nodes) and routed to multiple Reduce nodes for parallel processing. The Reduce stage then processes the detail data of each account output by the Map stage. The steps are:
a) Read the data from last year's full file up to the incremental file for the day of the query end time.
b) The Map stage classifies all data and outputs <account id, account detail data>.
c) The Reduce stage collects and processes the detail data under the same account id, determines each account's balance value for every day in the statistical period from the query start time to the query end time, and then computes each account's average daily balance over the statistical period.
All data of account id001 and account id002 is pulled and processed by two Reduce nodes respectively. For each account, the Reduce node checks, for each day from the day of the query start time to the day of the query end time, whether a balance record exists. If one exists, the balance value of the record with the largest intraday transaction sequence number is taken as that day's closing balance. If none exists, the node traces back to the most recent earlier date that has a balance record and takes the balance value of the record with the largest transaction sequence number on that date as the day's closing balance.
For example, on the day of the query start time (January 1 of this year), account 002 has a balance record, so the balance value "20 yuan" of the record with transaction sequence number 1 on January 1, 2014 is taken as account 002's balance for January 1, 2014. Account 001 has no balance record on January 1, 2014, so the balance value "90 yuan" of the most recent record in the 2013 full data (December 25, 2013, transaction sequence number 1) is taken as account 001's balance for January 1, 2014.
For account 002, the balance values for each day in the query period, as determined by the first Reduce node, are shown in Table 2:
Query date | Balance (yuan) |
20140101 | 20 |
20140102 | 50 |
20140103 | 15 |
20140104 | 15 |
20140105 | 40 |
20140106 | 60 |
20140107 | 60 |
20140108 | 30 |
Table 2
For account 001, the balance values for each day in the query period, as determined by the second Reduce node, are shown in Table 3:
Query date | Balance (yuan) |
20140101 | 90 |
20140102 | 90 |
20140103 | 90 |
20140104 | 90 |
20140105 | 90 |
20140106 | 60 |
20140107 | 60 |
20140108 | 30 |
Table 3
The average daily balance of account 002 through January 8, 2014 is calculated as: (20+50+15+15+40+60+60+30)/8 = 36.25 (yuan).
The average daily balance of account 001 through January 8, 2014 is calculated as: (90+90+90+90+90+60+60+30)/8 = 75 (yuan).
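The carry-forward rule and the averaging step can be sketched for account 001, using its three records from Table 1. The function names and the tuple layout are illustrative assumptions. The patent also does not say what happens when no record exists on or before a day, so raising an error in that case is a choice made only for this sketch.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# (transaction date, balance in yuan, intraday sequence number) for account 001, from Table 1
RECORDS_001: List[Tuple[date, float, int]] = [
    (date(2014, 1, 6), 60.0, 1),
    (date(2014, 1, 8), 30.0, 1),
    (date(2013, 12, 25), 90.0, 1),
]


def daily_closing_balances(records, start: date, end: date) -> Dict[date, float]:
    """Closing balance for each day in [start, end], carried forward on days without records."""
    # For a day with records, keep the one with the largest sequence number.
    last_of_day: Dict[date, Tuple[int, float]] = {}
    for day, balance, seq in records:
        if day not in last_of_day or seq > last_of_day[day][0]:
            last_of_day[day] = (seq, balance)

    # Seed with the most recent record dated before the query start, if any.
    earlier = [d for d in last_of_day if d < start]
    current = last_of_day[max(earlier)][1] if earlier else None

    result: Dict[date, float] = {}
    day = start
    while day <= end:
        if day in last_of_day:
            current = last_of_day[day][1]
        if current is None:
            # Assumption of this sketch: the patent does not cover this case.
            raise ValueError(f"no balance record on or before {day}")
        result[day] = current
        day += timedelta(days=1)
    return result


def average_daily_balance(records, start: date, end: date) -> float:
    """Mean of the daily closing balances over the query period."""
    daily = daily_closing_balances(records, start, end)
    return sum(daily.values()) / len(daily)


daily_001 = daily_closing_balances(RECORDS_001, date(2014, 1, 1), date(2014, 1, 8))
avg_001 = average_daily_balance(RECORDS_001, date(2014, 1, 1), date(2014, 1, 8))
print(avg_001)  # 75.0
```

Carrying a running `current` value forward avoids searching backwards through the records for every day that has no entry.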
The second task computes the distribution of the accounts' average daily balances from the results of the first task.
In the Map stage of the second task, each average daily balance output by the first task is assigned to an interval. For example, suppose the intervals are [0,15), [15,50), and [50,100]. The average daily balance of account 002 then falls in [15,50), and that of account 001 falls in [50,100]. For account 002, the interval [15,50) is the first output parameter (key) and 1 is the second output parameter (value). For account 001, the interval [50,100] is the first output parameter (key) and 1 is the second output parameter (value). At the Reduce end of the second task, the values for each interval are summed and output. The result is the number of accounts in each interval.
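The second task can be sketched with the intervals and the two averages from this example. The function names and the in-process "shuffle" (simply passing the Map output straight to the Reduce function) are simplifications assumed for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

# (label, lower bound, upper bound); the last interval is closed on the right.
INTERVALS = [("[0,15)", 0, 15), ("[15,50)", 15, 50), ("[50,100]", 50, 100)]


def interval_of(avg: float) -> str:
    """Return the label of the interval that contains avg."""
    for i, (label, low, high) in enumerate(INTERVALS):
        is_last = i == len(INTERVALS) - 1
        if low <= avg < high or (is_last and avg == high):
            return label
    raise ValueError(f"average daily balance {avg} is outside all intervals")


def map_task2(averages: Iterable[Tuple[str, float]]) -> Iterator[Tuple[str, int]]:
    """Map stage: emit (interval, 1) for each account's average daily balance."""
    for _account_id, avg in averages:
        yield interval_of(avg), 1


def reduce_task2(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce stage: sum the values per interval to get the account count."""
    counts: Counter = Counter()
    for label, one in pairs:
        counts[label] += one
    return dict(counts)


counts = reduce_task2(map_task2([("002", 36.25), ("001", 75.0)]))
print(counts)  # {'[15,50)': 1, '[50,100]': 1}
```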
The parallel processing method and system for account balance data provided by the above embodiments use MapReduce to split large-scale account balance detail data into several shards that are processed by Map nodes in parallel. The Map stage classifies the data by account; once that is complete, the data is grouped by account id and routed to multiple Reduce nodes for parallel processing. Statistics on the average daily balance of accounts can therefore be obtained quickly even for large data volumes, with high processing efficiency and good scalability.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method may be completed by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Alternatively, all or some of the steps of the above embodiments may be implemented with one or more integrated circuits. Accordingly, each module or unit in the above embodiments may be implemented in hardware or as a software function module. The invention is not limited to any particular combination of hardware and software.
It should be noted that the invention may have various other embodiments. Without departing from the spirit and essence of the invention, those skilled in the art can make corresponding changes and variations according to the invention, and all such changes and variations fall within the scope of protection of the appended claims.
Claims (12)
1. A parallel processing method for account balance data, the method comprising:
one or more Map nodes executing a first task reading different shards of account balance detail data, and generating a first output parameter and a second output parameter for each balance record in the shard read; wherein the first output parameter includes at least an account ID, the second output parameter is set to account status information, and the account status information includes at least: a balance value, a transaction date, and an intraday transaction sequence number;
one or more Reduce nodes executing the first task reading the balance records processed by the Map nodes executing the first task, and generating an average daily balance record for each account from the first output parameter and the second output parameter of the balance records; wherein balance records with the same first output parameter are read by the same Reduce node;
after the Reduce nodes executing the first task generate the average daily balance record for each account from the first output parameter and the second output parameter of the balance records, the method further comprising:
one or more Map nodes executing a second task reading the average daily balance records of different accounts, and generating a first output parameter and a second output parameter for each average daily balance record read; wherein the first output parameter of the average daily balance record is set to the interval in which the average daily balance falls, and the second output parameter of the average daily balance record is set to 1;
one or more Reduce nodes executing the second task reading the average daily balance records processed by the Map nodes executing the second task, and counting the number of accounts in each average daily balance interval from the first output parameter and the second output parameter of the average daily balance records, including: according to the average daily balance interval in the first output parameter, traversing the average daily balance records of the same interval and summing the second output parameter of each record to obtain the number of accounts in that interval; wherein average daily balance records with the same first output parameter are read by the same Reduce node.
2. The method of claim 1, characterized in that:
generating the average daily balance record for each account from the first output parameter and the second output parameter of the balance records includes:
according to the account ID in the first output parameter, traversing the balance records of the same account; determining, from the second output parameter of the balance records, the balance value of the account for each day in the query period between the query start time and the query end time; averaging the daily balance values over the query period to obtain the average daily balance of the account; and generating the average daily balance record of the account.
3. The method of claim 2, characterized in that:
determining, from the second output parameter of the balance records, the balance value of the account for each day in the query period includes:
for each day from the day of the query start time to the day of the query end time, judging whether a balance record exists for that day; if so, taking the balance value of the balance record with the largest intraday transaction sequence number as the closing balance of that day; if not, tracing back to the most recent earlier date that has a balance record and taking the balance value of the balance record with the largest transaction sequence number on that date as the closing balance of that day.
4. The method of claim 1, characterized in that:
before the one or more Reduce nodes executing the first task read the balance records processed by the Map nodes executing the first task, the method further includes:
computing the hash value of the first output parameter of each balance record, and establishing a mapping between the hash values of the first output parameter and the Reduce nodes executing the first task; wherein the mapping is used by the Reduce nodes executing the first task to read their corresponding balance records.
5. The method of claim 1, characterized in that:
before the one or more Reduce nodes executing the second task read the average daily balance records processed by the Map nodes executing the second task, the method further includes:
computing the hash value of the first output parameter of each average daily balance record, and establishing a mapping between the hash values of the first output parameter and the Reduce nodes executing the second task; wherein the mapping is used by the Reduce nodes executing the second task to read their corresponding average daily balance records.
6. The method of claim 1, characterized in that:
before the one or more Map nodes executing the first task read the different shards of the account balance detail data, the method further includes:
determining the read range of the account balance detail data from the query start and end times, including: taking the full balance detail data and the incremental balance detail data up to the day of the query end time as the read range of the account balance detail data;
splitting the account balance detail data within the read range into shards, and establishing a mapping between each shard and the Map nodes executing the first task; wherein the mapping is used by the Map nodes executing the first task to read their corresponding shards.
7. A parallel processing system for account balance data, the system comprising:
a Map processing module, including one or more Map nodes executing a first task; each Map node executing the first task is configured to read a different shard of account balance detail data and to generate a first output parameter and a second output parameter for each balance record in the shard read; wherein the first output parameter includes at least an account ID, the second output parameter is set to account status information, and the account status information includes at least: a balance value, a transaction date, and an intraday transaction sequence number;
a Reduce processing module, including one or more Reduce nodes executing the first task; each Reduce node executing the first task is configured to read the balance records processed by the Map nodes executing the first task and to generate an average daily balance record for each account from the first output parameter and the second output parameter of the balance records; wherein balance records with the same first output parameter are read by the same Reduce node;
the Map processing module further includes one or more Map nodes executing a second task, and the Reduce processing module further includes one or more Reduce nodes executing the second task;
each Map node executing the second task is configured to read the average daily balance records of different accounts and to generate a first output parameter and a second output parameter for each average daily balance record read; wherein the first output parameter of the average daily balance record is set to the interval in which the average daily balance falls, and the second output parameter of the average daily balance record is set to 1;
each Reduce node executing the second task is configured to read the average daily balance records processed by the Map nodes executing the second task and to count the number of accounts in each average daily balance interval from the first output parameter and the second output parameter of the average daily balance records, including: according to the average daily balance interval in the first output parameter, traversing the average daily balance records of the same interval and summing the second output parameter of each record to obtain the number of accounts in that interval; wherein average daily balance records with the same first output parameter are read by the same Reduce node.
8. The system of claim 7, characterized in that:
the Reduce node executing the first task being configured to generate the average daily balance record for each account from the first output parameter and the second output parameter of the balance records includes: according to the account ID in the first output parameter, traversing the balance records of the same account; determining, from the second output parameter of the balance records, the balance value of the account for each day in the query period between the query start time and the query end time; averaging the daily balance values over the query period to obtain the average daily balance of the account; and generating the average daily balance record of the account.
9. The system of claim 8, characterized in that:
the Reduce node executing the first task being configured to determine, from the second output parameter of the balance records, the balance value of the account for each day in the query period includes: for each day from the day of the query start time to the day of the query end time, judging whether a balance record exists for that day; if so, taking the balance value of the balance record with the largest intraday transaction sequence number as the closing balance of that day; if not, tracing back to the most recent earlier date that has a balance record and taking the balance value of the balance record with the largest transaction sequence number on that date as the closing balance of that day.
10. The system of claim 7, characterized in that the Reduce processing module further includes a first task routing module:
the first task routing module is configured to, before the one or more Reduce nodes executing the first task read the balance records processed by the Map nodes executing the first task, compute the hash value of the first output parameter of each balance record and establish a mapping between the hash values of the first output parameter and the Reduce nodes executing the first task; wherein the mapping is used by the Reduce nodes executing the first task to read their corresponding balance records.
11. The system of claim 7, characterized in that the Reduce processing module further includes a second task routing module:
the second task routing module is configured to, before the one or more Reduce nodes executing the second task read the average daily balance records processed by the Map nodes executing the second task, compute the hash value of the first output parameter of each average daily balance record and establish a mapping between the hash values of the first output parameter and the Reduce nodes executing the second task; wherein the mapping is used by the Reduce nodes executing the second task to read their corresponding average daily balance records.
12. The system of claim 7, characterized in that the Map processing module further includes a sharding module:
the sharding module is configured to, before the one or more Map nodes executing the first task read the different shards of the account balance detail data, determine the read range of the account balance detail data from the query start and end times, including: taking the full balance detail data and the incremental balance detail data up to the day of the query end time as the read range of the account balance detail data; and to split the account balance detail data within the read range into shards and establish a mapping between each shard and the Map nodes executing the first task; wherein the mapping is used by the Map nodes executing the first task to read their corresponding shards.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410306448.6A CN104050291B (en) | 2014-06-30 | 2014-06-30 | A kind of method for parallel processing and system of account balance data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410306448.6A CN104050291B (en) | 2014-06-30 | 2014-06-30 | A kind of method for parallel processing and system of account balance data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104050291A CN104050291A (en) | 2014-09-17 |
CN104050291B true CN104050291B (en) | 2017-11-10 |
Family
ID=51503123
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410306448.6A Active CN104050291B (en) | 2014-06-30 | 2014-06-30 | A kind of method for parallel processing and system of account balance data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104050291B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105740063A (en) * | 2014-12-08 | 2016-07-06 | 杭州华为数字技术有限公司 | Data processing method and apparatus |
CN106022901A (en) * | 2015-03-17 | 2016-10-12 | 阿里巴巴集团控股有限公司 | Data processing method and device |
CN107357679A (en) * | 2016-05-10 | 2017-11-17 | 银联数据服务有限公司 | A kind of backup method and device |
CN110659265B (en) * | 2019-09-27 | 2020-11-24 | 广州峻林互联科技有限公司 | Distributed parallel database resource management method |
CN111680080A (en) * | 2020-04-16 | 2020-09-18 | 中邮消费金融有限公司 | Data processing method and data processing system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101799808A (en) * | 2009-02-10 | 2010-08-11 | 中国移动通信集团公司 | Data processing method and system thereof |
CN102467570A (en) * | 2010-11-17 | 2012-05-23 | 日电(中国)有限公司 | Connection query system and method for distributed data warehouse |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20100162230A1 (en) * | 2008-12-24 | 2010-06-24 | Yahoo! Inc. | Distributed computing system for large-scale data handling |
Non-Patent Citations (1)
Title |
---|
基于OGSA的网格记帐系统的研究与实现;阿都建华;《中国优秀硕士学位论文全文数据库 信息科技辑》;20061215(第12期);42-43 * |
Also Published As
Publication number | Publication date |
---|---|
CN104050291A (en) | 2014-09-17 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 20180817 Address after: 200436 Room 411, No. three, JIANGCHANG Road, Jingan District, Shanghai, 411 Patentee after: Shanghai wave Cloud Computing Service Co., Ltd. Address before: 100085 floor 1, C 2-1, No. 2, Shang Di Road, Haidian District, Beijing. Patentee before: Electronic information industry Co.,Ltd of the tide (Beijing) |