CN104050291B - Parallel processing method and system for account balance data - Google Patents


Info

Publication number
CN104050291B
Authority
CN
China
Prior art keywords
balance
task
account
record
output parameter
Legal status
Active
Application number
CN201410306448.6A
Other languages
Chinese (zh)
Other versions
CN104050291A (en)
Inventor
赵仁明
辛国茂
亓开元
房体盈
Current Assignee
Shanghai Wave Cloud Computing Service Co Ltd
Original Assignee
Inspur Beijing Electronic Information Industry Co Ltd
Application filed by Inspur Beijing Electronic Information Industry Co Ltd
Priority to CN201410306448.6A
Publication of CN104050291A
Application granted
Publication of CN104050291B
Current legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2219 Large Object storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a parallel processing method for account balance data. In the method, one or more Map nodes executing a first task each read a different fragment of the account balance detail data and generate, for each balance record in the fragment read, a first output parameter and a second output parameter. The first output parameter contains at least the account ID; the second output parameter is set to the account status information, which contains at least the balance value, the trading date and the intraday transaction sequence number. One or more Reduce nodes executing the first task read the different balance records that the Map nodes executing the first task have finished processing and, from the first and second output parameters of those records, generate an average daily balance record for each account. Balance records with the same first output parameter are read by the same Reduce node. The invention can quickly obtain statistics on the average daily balance of accounts at large data volumes. The invention also discloses a parallel processing system for account balance data.

Description

Parallel processing method and system for account balance data
Technical field
The present invention relates to the field of big data processing technology, and in particular to a parallel processing method and system for account balance data at large data volumes.
Background
Data is the indispensable information on which nearly all business activities of an enterprise, such as production, operations and strategy, depend. Data is like the eyes of an enterprise operator: problems in operations show up in the data, much as a helmsman relies on navigation. As human society fully enters the information age, data has become a strategic resource as important as water and oil. Enterprises currently face massive growth in data volume. For example, a recent IDC forecast claims that by 2020 the global volume of data will grow 50-fold. The scale of big data is still a moving target, with individual data sets ranging from tens of TB to several PB. In addition, data is produced by many unexpected sources.
Over time, traditional business data has acquired standard formats that standard business intelligence software can recognize. Compared with traditional business data, big data has a multi-layered structure, which means it can take many different forms and types. Because big data is irregular and ambiguous, it is difficult or even impossible to analyze with traditional application software.
The challenge enterprises currently face is extracting value from complex data in many different forms.
Summary of the invention
The technical problem to be solved by the invention is to provide a parallel processing method and system for account balance data that quickly obtains statistics on the average daily balance of accounts at large data volumes.
To solve the above technical problem, the invention provides a parallel processing method for account balance data. The method includes:
One or more Map nodes executing a first task each read a different fragment of the account balance detail data and generate a first output parameter and a second output parameter for each balance record in the fragment read. The first output parameter contains at least the account ID; the second output parameter is set to the account status information, which contains at least the balance value, the trading date and the intraday transaction sequence number;
One or more Reduce nodes executing the first task read the different balance records that the Map nodes executing the first task have finished processing, and generate an average daily balance record for each account from the first and second output parameters of those balance records. Balance records with the same first output parameter are read by the same Reduce node.
Further, the method also has the following features:
Generating the average daily balance record of each account from the first and second output parameters of the balance records includes:
According to the account ID in the first output parameter, traversing all balance records of the same account; determining, from the second output parameters of those records, the account's balance on every day within the query start and end dates; averaging the daily balances over the query date range to obtain the account's average daily balance; and generating the account's average daily balance record.
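A minimal Python sketch of the first-task Map function, assuming a hypothetical comma-separated detail format `account_id,balance,YYYY-MM-DD,seq` (the patent does not specify how detail records are stored). The names `AccountStatus` and `map_task1` are illustrative only; balances are held as `Decimal` to avoid binary floating-point rounding on monetary values.

```python
from datetime import date
from decimal import Decimal
from typing import Iterable, Iterator, NamedTuple, Tuple


class AccountStatus(NamedTuple):
    """Second output parameter: the account status information."""
    balance: Decimal   # balance value after the transaction
    trade_date: date   # trading date
    seq: int           # intraday transaction sequence number


def map_task1(fragment: Iterable[str]) -> Iterator[Tuple[str, AccountStatus]]:
    """Emit (account ID, account status) for every balance record in a fragment.

    Assumes the hypothetical line layout: account_id,balance,YYYY-MM-DD,seq
    """
    for line in fragment:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        account_id, balance, trade_date, seq = line.split(",")
        # First output parameter: account ID; second: account status.
        yield account_id, AccountStatus(
            balance=Decimal(balance),
            trade_date=date.fromisoformat(trade_date),
            seq=int(seq),
        )
```

For example, `list(map_task1(["A1,100.50,2014-06-01,3"]))` yields one pair keyed by `"A1"`.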
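The requirement that all balance records with the same first output parameter reach the same Reduce node can be illustrated with an in-memory sketch of the shuffle step. This is not Hadoop code: `shuffle` and the CRC32-based default partitioner are assumptions, chosen so that routing is deterministic across runs (Python's built-in `hash` for strings is randomized per process).

```python
import zlib
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, Iterable, List, Tuple

Partitioner = Callable[[str, int], int]


def crc32_partition(key: str, num_reducers: int) -> int:
    """Stable key -> reducer index mapping (hypothetical default)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs: Iterable[Tuple[str, object]], num_reducers: int,
            partition: Partitioner = crc32_partition) -> List[Dict[str, list]]:
    """Route Map output to reducers and group values by key within each reducer."""
    buckets: List[DefaultDict[str, list]] = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        # Same key -> same partition index -> same reducer.
        buckets[partition(key, num_reducers)][key].append(value)
    return [dict(b) for b in buckets]
```

Each reducer then iterates over its own dictionary one account at a time, which is what lets it traverse all records of an account without coordinating with other Reduce nodes.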
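Written as a formula (the symbols are introduced here for illustration and do not appear in the patent): let B_a(d) be the day-end balance of account a on day d, and let the query cover the dates t_s through t_e inclusive. The average daily balance is then the arithmetic mean over the D days of the range. For example, day-end balances of 100, 100 and 160 over a three-day range give (100 + 100 + 160) / 3 = 120.

```latex
\bar{B}_a = \frac{1}{D} \sum_{d = t_s}^{t_e} B_a(d), \qquad D = t_e - t_s + 1
```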
Further, the method also has the following features:
After the Reduce nodes executing the first task generate the average daily balance record of each account from the first and second output parameters of the balance records, the method further includes:
One or more Map nodes executing a second task read the average daily balance records of different accounts and generate a first output parameter and a second output parameter for each average daily balance record read. The first output parameter of an average daily balance record is set to the interval in which the average daily balance falls, and its second output parameter is set to 1;
One or more Reduce nodes executing the second task read the different average daily balance records that the Map nodes executing the second task have finished processing, and count the number of accounts in each average daily balance interval from the first and second output parameters of those records. This includes: according to the interval in the first output parameter, traversing all average daily balance records of the same interval and summing their second output parameters to obtain the number of accounts in that interval. Average daily balance records with the same first output parameter are read by the same Reduce node.
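A sketch of the second-task Map function. The patent does not define the interval boundaries, so `BOUNDARIES` below is a hypothetical choice; `bisect.bisect_right` locates the interval so that each lower bound is inclusive and each upper bound exclusive. The emitted value is always 1, matching the second output parameter described above.

```python
import bisect
from decimal import Decimal
from typing import Iterable, Iterator, List, Tuple

# Hypothetical interval boundaries; the patent leaves them unspecified.
BOUNDARIES: List[Decimal] = [Decimal("1000"), Decimal("10000"), Decimal("100000")]


def interval_label(avg: Decimal, bounds: List[Decimal] = BOUNDARIES) -> str:
    """Return the interval containing avg, e.g. '[1000, 10000)'."""
    i = bisect.bisect_right(bounds, avg)
    lower = f"[{bounds[i - 1]}" if i > 0 else "(-inf"
    upper = f"{bounds[i]})" if i < len(bounds) else "+inf)"
    return f"{lower}, {upper}"


def map_task2(records: Iterable[Tuple[str, Decimal]]) -> Iterator[Tuple[str, int]]:
    """Emit (interval, 1) for each (account ID, average daily balance) record."""
    for _account_id, avg in records:
        yield interval_label(avg), 1
```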
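The second-task Reduce step amounts to summing a group of ones. The sketch below (function names are illustrative) groups Map output by interval and sums each group.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def reduce_task2(interval: str, counts: Iterable[int]) -> Tuple[str, int]:
    """Sum the second output parameters (each 1, or a partial count) of one interval."""
    return interval, sum(counts)


def count_accounts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Group (interval, count) pairs by interval and reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for interval, count in pairs:
        groups[interval].append(count)
    return dict(reduce_task2(k, v) for k, v in groups.items())
```

Because addition is associative and commutative, the same summing function could also run as a Map-side combiner, pre-aggregating counts before they are shuffled to the Reduce nodes. This is an optimization the patent does not mention.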
Further, the method also has the following features:
Determining, from the second output parameters of the balance records, the account's balance on every day within the query start and end dates includes:
For each day from the query start date through the query end date, determining whether a balance record exists for that day. If one does, the balance value of the record with the largest intraday transaction sequence number is taken as that day's day-end balance. If none does, tracing back to the nearest earlier date that has balance records, and taking the balance value of the record with the largest transaction sequence number on that date as that day's day-end balance.
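The day-end balance rule can be sketched as one pass that keeps, for each trading date, the record with the largest sequence number. It then walks the query range and carries the last known balance forward over days without records. The sketch assumes the Reduce node also receives the account's records dated before the query start, which the full balance detail data should supply. A day with no record on or before it yields `None`. `AccountStatus` and `daily_balances` are illustrative names.

```python
from datetime import date, timedelta
from decimal import Decimal
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class AccountStatus(NamedTuple):
    balance: Decimal   # balance value after the transaction
    trade_date: date   # trading date
    seq: int           # intraday transaction sequence number


def daily_balances(records: Iterable[AccountStatus], start: date,
                   end: date) -> List[Tuple[date, Optional[Decimal]]]:
    """Day-end balance for every day in [start, end] for one account."""
    # For each trading date keep the record with the largest sequence number.
    last_of_day: Dict[date, AccountStatus] = {}
    for r in records:
        cur = last_of_day.get(r.trade_date)
        if cur is None or r.seq > cur.seq:
            last_of_day[r.trade_date] = r
    # Seed the carried value from the nearest date before the query start.
    earlier = [d for d in last_of_day if d < start]
    carried = last_of_day[max(earlier)].balance if earlier else None
    result: List[Tuple[date, Optional[Decimal]]] = []
    day = start
    while day <= end:
        if day in last_of_day:
            carried = last_of_day[day].balance  # day has records: use its last one
        result.append((day, carried))           # otherwise: nearest earlier date
        day += timedelta(days=1)
    return result
```

With a record of 100 on May 30 and records of 80 (seq 1) and 160 (seq 2) on June 2, a June 1 to June 3 query yields 100, 160, 160, for an average daily balance of 420 / 3 = 140.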
Further, the method also has the following features:
Before the one or more Reduce nodes executing the first task read the different balance records that the Map nodes executing the first task have finished processing, the method further includes:
Computing the hash value of the first output parameter of each balance record, and establishing a mapping between these hash values and the Reduce nodes executing the first task. The mapping is used by the Reduce nodes executing the first task to read their corresponding balance records.
Further, the method also has the following features:
Before the one or more Reduce nodes executing the second task read the different average daily balance records that the Map nodes executing the second task have finished processing, the method further includes:
Computing the hash value of the first output parameter of each average daily balance record, and establishing a mapping between these hash values and the Reduce nodes executing the second task. The mapping is used by the Reduce nodes executing the second task to read their corresponding average daily balance records.
Further, the method also has the following features:
Before the one or more Map nodes executing the first task read the different fragments of the account balance detail data, the method further includes:
Determining the read range of the account balance detail data from the query start and end dates, including: taking the full balance detail data together with the incremental balance detail data for the query end date as the read range of the account balance detail data;
Splitting the account balance detail data within the read range into fragments, and establishing a mapping between each fragment and a Map node executing the first task. The mapping is used by the Map nodes executing the first task to read their corresponding fragments.
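A sketch of the read-range and sharding steps. It assumes the full balance detail and the per-day incremental detail are stored as files (the paths in the usage are hypothetical), that fragments are fixed-size byte ranges, and that fragments are assigned round-robin to Map nodes. The patent specifies none of these choices. A real input format must also align fragments to record boundaries; Hadoop's `TextInputFormat`, for example, handles lines that straddle a split boundary.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class Fragment:
    path: str    # file holding balance detail data
    offset: int  # byte offset where the fragment starts
    length: int  # fragment length in bytes


def read_range(full_files: List[str], incremental: Dict[date, List[str]],
               query_end: date) -> List[str]:
    """Full balance detail plus the incremental detail for the query end date."""
    return list(full_files) + list(incremental.get(query_end, []))


def make_fragments(file_sizes: Dict[str, int], fragment_size: int) -> List[Fragment]:
    """Cut every file in the read range into fixed-size byte ranges."""
    fragments: List[Fragment] = []
    for path, size in file_sizes.items():
        for offset in range(0, size, fragment_size):
            fragments.append(Fragment(path, offset, min(fragment_size, size - offset)))
    return fragments


def assign_to_map_nodes(fragments: List[Fragment],
                        num_map_nodes: int) -> Dict[int, List[Fragment]]:
    """Fragment -> Map node mapping, assigned round-robin."""
    plan: Dict[int, List[Fragment]] = {n: [] for n in range(num_map_nodes)}
    for i, fragment in enumerate(fragments):
        plan[i % num_map_nodes].append(fragment)
    return plan
```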
To solve the above technical problem, the invention also provides a parallel processing system for account balance data. The system includes:
A Map processing module, including one or more Map nodes that execute a first task. Each Map node executing the first task reads a different fragment of the account balance detail data and generates a first output parameter and a second output parameter for each balance record in the fragment read. The first output parameter contains at least the account ID; the second output parameter is set to the account status information, which contains at least the balance value, the trading date and the intraday transaction sequence number;
A Reduce processing module, including one or more Reduce nodes that execute the first task. Each Reduce node executing the first task reads the different balance records that the Map nodes executing the first task have finished processing and generates an average daily balance record for each account from the first and second output parameters of those balance records. Balance records with the same first output parameter are read by the same Reduce node.
Further, the system also has the following features:
The Reduce nodes executing the first task generate the average daily balance record of each account from the first and second output parameters of the balance records by: according to the account ID in the first output parameter, traversing all balance records of the same account; determining, from the second output parameters of those records, the account's balance on every day within the query start and end dates; averaging the daily balances over the query date range to obtain the account's average daily balance; and generating the account's average daily balance record.
Further, the system also has the following features:
The Map processing module further includes one or more Map nodes that execute a second task, and the Reduce processing module further includes one or more Reduce nodes that execute the second task;
Each Map node executing the second task reads the average daily balance records of different accounts and generates a first output parameter and a second output parameter for each average daily balance record read. The first output parameter of an average daily balance record is set to the interval in which the average daily balance falls, and its second output parameter is set to 1;
Each Reduce node executing the second task reads the different average daily balance records that the Map nodes executing the second task have finished processing and counts the number of accounts in each average daily balance interval from the first and second output parameters of those records, by: according to the interval in the first output parameter, traversing all average daily balance records of the same interval and summing their second output parameters to obtain the number of accounts in that interval. Average daily balance records with the same first output parameter are read by the same Reduce node.
Further, the system also has the following features:
The Reduce nodes executing the first task determine, from the second output parameters of the balance records, the account's balance on every day within the query start and end dates by: for each day from the query start date through the query end date, determining whether a balance record exists for that day; if one does, taking the balance value of the record with the largest intraday transaction sequence number as that day's day-end balance; if none does, tracing back to the nearest earlier date that has balance records and taking the balance value of the record with the largest transaction sequence number on that date as that day's day-end balance.
Further, the system also has the following features:
The Reduce processing module further includes a first-task routing module:
The first-task routing module is configured to, before the one or more Reduce nodes executing the first task read the different balance records that the Map nodes executing the first task have finished processing, compute the hash value of the first output parameter of each balance record and establish a mapping between these hash values and the Reduce nodes executing the first task. The mapping is used by the Reduce nodes executing the first task to read their corresponding balance records.
Further, the system also has the following features:
The Reduce processing module further includes a second-task routing module:
The second-task routing module is configured to, before the one or more Reduce nodes executing the second task read the different average daily balance records that the Map nodes executing the second task have finished processing, compute the hash value of the first output parameter of each average daily balance record and establish a mapping between these hash values and the Reduce nodes executing the second task. The mapping is used by the Reduce nodes executing the second task to read their corresponding average daily balance records.
Further, the system also has the following features: the Map processing module further includes a sharding module:
The sharding module is configured to, before the one or more Map nodes executing the first task read the different fragments of the account balance detail data, determine the read range of the account balance detail data from the query start and end dates, by taking the full balance detail data together with the incremental balance detail data for the query end date as the read range; split the account balance detail data within the read range into fragments; and establish a mapping between each fragment and a Map node executing the first task. The mapping is used by the Map nodes executing the first task to read their corresponding fragments.
Compared with the prior art, the parallel processing method and system for account balance data provided by the invention are based on MapReduce. Large-scale account balance detail data is divided into several fragments that Map nodes process in parallel; in the Map stage the data is classified by account, and after processing it is grouped by account ID and routed to multiple Reduce nodes for parallel processing. Statistics on the average daily balance of accounts can thus be obtained quickly at large data volumes, with high processing efficiency and strong scalability.
Brief description of the drawings
Fig. 1 is a flowchart of obtaining the average daily balance of each user in a parallel processing method for account balance data according to an embodiment of the invention.
Fig. 2 is a flowchart of counting the number of accounts in each average daily balance interval in a parallel processing method for account balance data according to an embodiment of the invention.
Fig. 3 is a schematic structural diagram of a parallel processing system for account balance data according to an embodiment of the invention.
Fig. 4 is a schematic diagram of the MapReduce-based processing framework for account balance data used in an example of the invention.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the invention clearer, embodiments of the invention are described in detail below with reference to the accompanying drawings. Note that, where they do not conflict, the embodiments of this application and the features in them may be combined with one another.
As shown in Fig. 1, an embodiment of the invention provides a parallel processing method for account balance data. The method includes:
S10: one or more Map nodes executing a first task each read a different fragment of the account balance detail data and generate a first output parameter and a second output parameter for each balance record in the fragment read. The first output parameter contains at least the account ID; the second output parameter is set to the account status information, which contains at least the balance value, the trading date and the intraday transaction sequence number;
S20: one or more Reduce nodes executing the first task read the different balance records that the Map nodes executing the first task have finished processing, and generate an average daily balance record for each account from the first and second output parameters of those balance records. Balance records with the same first output parameter are read by the same Reduce node;
The method may also have the following features:
Preferably, before the one or more Map nodes executing the first task read the different fragments of the account balance detail data, the method further includes:
Determining the read range of the account balance detail data from the query start and end dates, including: taking the full balance detail data together with the incremental balance detail data for the query end date as the read range of the account balance detail data;
Splitting the account balance detail data within the read range into fragments, and establishing a mapping between each fragment and a Map node executing the first task. The mapping is used by the Map nodes executing the first task to read their corresponding fragments.
Preferably, generating the average daily balance record of each account from the first and second output parameters of the balance records includes:
According to the account ID in the first output parameter, traversing all balance records of the same account; determining, from the second output parameters of those records, the account's balance on every day within the query start and end dates; averaging the daily balances over the query date range to obtain the account's average daily balance; and generating the account's average daily balance record.
Preferably, determining, from the second output parameters of the balance records, the account's balance on every day within the query start and end dates includes:
For each day from the query start date through the query end date, determining whether a balance record exists for that day. If one does, the balance value of the record with the largest intraday transaction sequence number is taken as that day's day-end balance. If none does, tracing back to the nearest earlier date that has balance records, and taking the balance value of the record with the largest transaction sequence number on that date as that day's day-end balance.
Preferably, before the one or more Reduce nodes executing the first task read the different balance records that the Map nodes executing the first task have finished processing, the method further includes:
Computing the hash value of the first output parameter of each balance record, and establishing a mapping between these hash values and the Reduce nodes executing the first task. The mapping is used by the Reduce nodes executing the first task to read their corresponding balance records.
Preferably, the hash value of the first output parameter of each balance record is that first output parameter taken modulo the total number of Reduce nodes executing the first task;
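The modulo rule can be sketched as follows. Taking the key modulo the number of Reduce nodes works directly only for numeric keys, so this hypothetical `reducer_for` falls back to a stable CRC32 hash for non-numeric account IDs. Hadoop's default `HashPartitioner` follows the same idea, computing `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List


def reducer_for(account_id: str, num_reducers: int) -> int:
    """Index of the Reduce node that should receive this account's records."""
    if num_reducers <= 0:
        raise ValueError("num_reducers must be positive")
    if account_id.isdecimal():
        return int(account_id) % num_reducers  # key modulo Reduce node count
    # Hypothetical fallback for non-numeric IDs: stable hash, then modulo.
    return zlib.crc32(account_id.encode("utf-8")) % num_reducers


def routing_table(account_ids: Iterable[str], num_reducers: int) -> Dict[int, List[str]]:
    """Mapping from Reduce node index to the account IDs routed to it."""
    table: Dict[int, List[str]] = defaultdict(list)
    for account_id in account_ids:
        table[reducer_for(account_id, num_reducers)].append(account_id)
    return dict(table)
```

For example, with 4 Reduce nodes, account `1000003` goes to node 3, and accounts `4` and `8` share node 0.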
Preferably, as shown in Fig. 2, the method further includes, after step S20:
S30: determining the interval in which each account's average daily balance falls, and counting the number of accounts in each interval;
Preferably, determining the interval in which each account's average daily balance falls and counting the number of accounts in each interval includes:
S301: one or more Map nodes executing a second task read the average daily balance records of different accounts and generate a first output parameter and a second output parameter for each average daily balance record read. The first output parameter of an average daily balance record is set to the interval in which the average daily balance falls, and its second output parameter is set to 1;
S302: one or more Reduce nodes executing the second task read the different average daily balance records that the Map nodes executing the second task have finished processing, and count the number of accounts in each average daily balance interval from the first and second output parameters of those records. This includes: according to the interval in the first output parameter, traversing all average daily balance records of the same interval and summing their second output parameters to obtain the number of accounts in that interval. Average daily balance records with the same first output parameter are read by the same Reduce node.
Preferably, before the one or more Reduce nodes executing the second task read the different average daily balance records that the Map nodes executing the second task have finished processing, the method further includes:
Computing the hash value of the first output parameter of each average daily balance record, and establishing a mapping between these hash values and the Reduce nodes executing the second task. The mapping is used by the Reduce nodes executing the second task to read their corresponding average daily balance records.
Preferably, the hash value of the first output parameter of each average daily balance record is that first output parameter taken modulo the total number of Reduce nodes executing the second task;
The Map nodes executing the second task may be the same batch of nodes as the Map nodes executing the first task or a different batch; that is, a Map node can execute the second task after it has finished executing the first task. Similarly, the Reduce nodes executing the second task may be the same batch of nodes as the Reduce nodes executing the first task or a different batch; that is, a Reduce node can execute the second task after it has finished executing the first task.
As shown in Fig. 3, an embodiment of the invention provides a parallel processing system for account balance data. The system includes:
A Map processing module, including one or more Map nodes that execute a first task. Each Map node executing the first task reads a different fragment of the account balance detail data and generates a first output parameter and a second output parameter for each balance record in the fragment read. The first output parameter contains at least the account ID; the second output parameter is set to the account status information, which contains at least the balance value, the trading date and the intraday transaction sequence number;
A Reduce processing module, including one or more Reduce nodes that execute the first task. Each Reduce node executing the first task reads the different balance records that the Map nodes executing the first task have finished processing and generates an average daily balance record for each account from the first and second output parameters of those balance records. Balance records with the same first output parameter are read by the same Reduce node.
The system may also have the following features:
Preferably, the Reduce nodes for performing first task are used for the first output parameter recorded according to the remaining sum Generate the average daily remaining sum value record of each account respectively with the second output parameter, including:According to the account in first output parameter Family ID, each bar remaining sum record of same account is traveled through, determines that the account is looking into according to the second output parameter that the remaining sum records The remaining sum value of every day in the range of the beginning and ending time is ask, the remaining sum value of every day is made even in the range of the inquiry beginning and ending time The average daily remaining sum value of the account is obtained, generates the average daily remaining sum value record of the account.
Preferably, the Map processing module further comprises one or more Map nodes that execute a second task, and the Reduce processing module further comprises one or more Reduce nodes that execute the second task;
Each Map node executing the second task reads the average daily balance records of different accounts and generates a first output parameter and a second output parameter for each average daily balance record it has read. The first output parameter of an average daily balance record is the interval containing the average daily balance, and the second output parameter is set to 1;
Each Reduce node executing the second task reads different average daily balance records already processed by the second-task Map nodes and, from their first and second output parameters, counts the number of accounts in each average-daily-balance interval. This comprises: according to the interval in the first output parameter, traversing all average daily balance records in the same interval and summing their second output parameters to obtain the number of accounts in that interval. Average daily balance records with the same first output parameter are read by the same Reduce node.
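The second task is essentially a bucketed count. The sketch below shows a Map function that emits (interval, 1) and a Reduce function that sums the 1s per interval. The interval boundaries [0, 15), [15, 50) and [50, 100] are borrowed from the application example; treating 100 as inclusive follows that example, and dropping values outside every interval is an assumption of this sketch.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Interval lower bounds and labels from the application example (assumed).
LOWER_BOUNDS = [0, 15, 50]
LABELS = ["[0,15)", "[15,50)", "[50,100]"]
UPPER_LIMIT = 100  # inclusive upper end of the last interval


def map_task2(account_id: str, avg_balance: float) -> Optional[Tuple[str, int]]:
    """Emit (interval label, 1) for one average daily balance record."""
    if avg_balance < LOWER_BOUNDS[0] or avg_balance > UPPER_LIMIT:
        return None  # outside every configured interval
    idx = bisect_right(LOWER_BOUNDS, avg_balance) - 1
    return LABELS[idx], 1


def reduce_task2(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum the second output parameter (always 1) per interval key."""
    counts: Dict[str, int] = defaultdict(int)
    for label, one in pairs:
        counts[label] += one
    return dict(counts)
```

With these boundaries, account 001's average of 75 maps to `("[50,100]", 1)`. `bisect_right` places a value equal to a lower bound (such as 15) in the interval that starts at that bound, which matches the half-open intervals.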
Preferably, the first-task Reduce node determining, from the second output parameters of the balance records, the account's balance on each day within the query start/end time range comprises: for each day from the query start date to the query end date, checking whether a balance record exists for that day. If one exists, the balance value of that day's record with the highest transaction sequence number is taken as the day's end-of-day balance. If none exists, the nearest earlier date that has balance records is located, and the balance value of that date's record with the highest transaction sequence number is taken as the day's end-of-day balance.
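The per-day rule above (highest sequence number wins within a day; days without records carry forward the nearest earlier day) can be sketched for a single account as follows. Records are assumed to be `(trade date, sequence number, balance)` tuples already grouped by account; the function names are hypothetical, and raising an error when no balance exists on or before the start date is a choice made for this sketch.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple

# (trade date, intraday sequence number, balance) for one account
Record = Tuple[date, int, float]


def daily_balances(records: Iterable[Record], start: date, end: date) -> Dict[date, float]:
    """End-of-day balance for each day in [start, end]."""
    # Keep only the highest-sequence record of each trade date.
    last_of_day: Dict[date, Tuple[int, float]] = {}
    for d, seq, bal in records:
        if d not in last_of_day or seq > last_of_day[d][0]:
            last_of_day[d] = (seq, bal)

    # Seed with the latest day before `start` (e.g. taken from the full data).
    earlier = [d for d in last_of_day if d < start]
    current = last_of_day[max(earlier)][1] if earlier else None

    result: Dict[date, float] = {}
    day = start
    while day <= end:
        if day in last_of_day:
            current = last_of_day[day][1]
        if current is None:
            raise ValueError(f"no balance on or before {day}")
        result[day] = current  # carried forward when the day has no record
        day += timedelta(days=1)
    return result


def average_daily_balance(records: Iterable[Record], start: date, end: date) -> float:
    """Mean of the end-of-day balances over the query range."""
    balances = daily_balances(records, start, end)
    return sum(balances.values()) / len(balances)
```

Applied to account 001 in Table 1 over 1 to 8 January 2014, this returns 75.0. For account 002 it resolves 5 January, a day without records, to the 15-yuan balance of the highest-sequence record on 3 January.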
Preferably, the Reduce processing module further comprises a first-task routing module:
The first-task routing module is configured to, before the one or more first-task Reduce nodes read the different balance records processed by the first-task Map nodes, compute a hash value of the first output parameter of each balance record and establish a mapping between these hash values and the first-task Reduce nodes. The first-task Reduce nodes use this mapping to read their corresponding balance records.
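A common way to realize such a mapping, and the one mentioned in the application example (hash of the account id modulo the number of Reduce nodes), is sketched below. The choice of `zlib.crc32` as the hash is an assumption of this sketch: it is stable across processes, whereas Python's built-in `hash()` of a string is randomized per interpreter run and would route the same key differently on different nodes.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def reducer_for(key: str, num_reducers: int) -> int:
    """Route a key to a Reduce node index: hash(key) mod number of reducers."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition(pairs: Iterable[Tuple[str, object]],
              num_reducers: int) -> Dict[int, List[Tuple[str, object]]]:
    """Group Map output pairs by the Reduce node that must read them."""
    buckets: Dict[int, List[Tuple[str, object]]] = defaultdict(list)
    for key, value in pairs:
        buckets[reducer_for(key, num_reducers)].append((key, value))
    return dict(buckets)
```

Because the node index depends only on the key, all records with the same first output parameter land in the same bucket, which is the property the routing module needs. The same function serves the second task unchanged if the interval label is used as the key.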
Preferably, the Reduce processing module further comprises a second-task routing module:
The second-task routing module is configured to, before the one or more second-task Reduce nodes read the different average daily balance records processed by the second-task Map nodes, compute a hash value of the first output parameter of each average daily balance record and establish a mapping between these hash values and the second-task Reduce nodes. The second-task Reduce nodes use this mapping to read their corresponding average daily balance records.
Preferably, the Map processing module further comprises a sharding module:
The sharding module is configured to, before the one or more first-task Map nodes read the different shards of the account balance detail data, determine the read range of the account balance detail data according to the query start/end time. This comprises: defining the read range as the full balance detail data plus the incremental balance detail data up to and including the query end date; splitting the account balance detail data within the read range into shards; and establishing a mapping between each shard and a first-task Map node. The first-task Map nodes use this mapping to read their corresponding shards.
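A minimal sketch of the sharding module follows. It assumes the detail data is stored as one full (snapshot) file plus one incremental file per day, and, for simplicity, treats each selected file as one shard assigned round-robin to the Map nodes. The `DetailFile` type, the file names, and the one-file-per-shard policy are all hypothetical; a real deployment would typically split large files further by size.

```python
from datetime import date
from typing import List, NamedTuple


class DetailFile(NamedTuple):
    path: str       # hypothetical file location
    is_full: bool   # True for the full balance detail data
    as_of: date     # snapshot date, or the day an incremental file covers


def read_range(files: List[DetailFile], end: date) -> List[DetailFile]:
    """Full detail data plus incremental data up to and including `end`."""
    return [f for f in files if f.is_full or f.as_of <= end]


def assign_shards(files: List[DetailFile], num_mappers: int) -> List[List[str]]:
    """Map node i reads the paths in shards[i] (round-robin assignment)."""
    shards: List[List[str]] = [[] for _ in range(num_mappers)]
    for i, f in enumerate(files):
        shards[i % num_mappers].append(f.path)
    return shards
```

For a query ending on 8 January 2014, an incremental file for 9 January is excluded, while the 2013 full file is kept because days early in the query range may need to carry forward balances recorded before the start date.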
Application example
An example application follows: compute each account's average daily balance from 1 January to 8 January of this year, and the distribution of the accounts' average daily balances across intervals. (Assume there are two accounts, id001 and id002.)
The original detail data of the two accounts is shown in Table 1. It includes part of last year's (2013) full data and this year's (2014) incremental data from 1 January to 8 January.
Date Account ID Balance (yuan) Sequence number
20140101 002 20 1
20140102 002 10 1
20140102 002 50 2
20140103 002 30 1
20140103 002 15 2
20140106 002 40 1
20140106 002 50 2
20140106 002 60 3
20140108 002 30 1
20131230 002 90 1
20140106 001 60 1
20140108 001 30 1
20131225 001 90 1
Table 1
As shown in Figure 4, two MapReduce-based tasks are started to process and compute the account information in parallel.
In the first task, the large input data files are divided into shards that are processed in parallel by Map nodes. The Map stage outputs the data in the form <account id, account information>; this step classifies and preprocesses the massive data so that all detail data of the same account is directed to the same Reduce node for merging. For example, records are grouped by the hash value of the account id (taken modulo the number of Reduce nodes) and routed to multiple Reduce nodes for parallel processing. The Reduce stage then processes the detail data of each account output by the Map stage, as follows:
A) Read the data from last year's full file through the incremental file of the query end date;
B) The Map stage classifies all data and outputs <account id, account detail data>;
C) The Reduce stage collects the detail data with the same account id, organizes and processes it, determines each account's balance on every day of the statistical period from the query start time to the query end time, and then computes each account's average daily balance over the period.
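Steps A to C can be simulated in a single process to check the logic before running on a cluster. The sketch below is a generic stand-in for one MapReduce job, not the patent's implementation or any framework's API: it maps every input line, sorts the pairs by key to mimic the shuffle, and calls the reducer once per key. The names `run_job`, `mapper`, and `reducer` are hypothetical.

```python
from itertools import groupby
from operator import itemgetter
from typing import Callable, Dict, Iterable, List, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
R = TypeVar("R")


def run_job(
    inputs: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[K, V]]],
    reducer: Callable[[K, List[V]], R],
) -> Dict[K, R]:
    """Single-process stand-in for one MapReduce job (steps A to C).

    Keys must be sortable, because sorting is used to group them.
    """
    # Map: every input line may emit any number of (key, value) pairs.
    pairs = [pair for line in inputs for pair in mapper(line)]
    # Shuffle: bring equal keys together, as routing to one Reduce node does.
    pairs.sort(key=itemgetter(0))
    # Reduce: one call per key with all of that key's values.
    return {
        key: reducer(key, [v for _, v in group])
        for key, group in groupby(pairs, key=itemgetter(0))
    }
```

On a real cluster the sort-and-group step is replaced by hash routing to separate Reduce nodes, but the per-key contract seen by the reducer is the same.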
The data of account id001 and the data of account id002 are pulled and processed by two different Reduce nodes. For each account, the Reduce node checks every day from the query start date to the query end date for balance records. If records exist, the balance of that day's record with the highest transaction sequence number is the day's end-of-day balance; if not, the node looks back to the nearest earlier date with balance records and uses the balance of that date's record with the highest transaction sequence number as the day's end-of-day balance.
For example, on the query start date (1 January this year) account 002 has a balance record, so the balance of its 1 January 2014 record with transaction sequence number 1, 20 yuan, is account 002's balance on 1 January 2014. Account 001 has no balance record on 1 January 2014, so the balance of the most recent earlier record in the 2013 full data (the 25 December 2013 record with transaction sequence number 1), 90 yuan, is account 001's balance on 1 January 2014.
For account 002, the balances on each day within the query start/end range, as determined by the first Reduce node, are shown in Table 2;
Query date Balance (yuan)
20140101 20
20140102 50
20140103 15
20140104 15
20140105 15
20140106 60
20140107 60
20140108 30
Table 2
For account 001, the balances on each day within the query start/end range, as determined by the Reduce node that processes account 001, are shown in Table 3;
Query date Balance (yuan)
20140101 90
20140102 90
20140103 90
20140104 90
20140105 90
20140106 60
20140107 60
20140108 30
Table 3
Account 002's average daily balance up to 8 January 2014 is: (20+50+15+15+15+60+60+30)/8 = 265/8 = 33.125 (yuan);
Account 001's average daily balance up to 8 January 2014 is: (90+90+90+90+90+60+60+30)/8 = 600/8 = 75 (yuan);
The second task computes the distribution of the accounts' average daily balances from the results of the first task.
In the Map stage of the second task, each average daily balance output by the first task is classified into an interval. For example, with intervals [0, 15), [15, 50) and [50, 100], account 002's average daily balance falls in [15, 50) and account 001's falls in [50, 100]. For account 002, the interval [15, 50) is the first output parameter (key) and 1 is the second output parameter (value); for account 001, the interval [50, 100] is the key and 1 is the value. At the Reduce end of the second task, the values for each interval are summed and output; the result is the number of accounts in each interval.
The parallel processing method and system for account balance data provided by the above embodiments use MapReduce to divide large-scale account balance detail data into several parts processed in parallel by Map nodes. The Map stage classifies the data by account, and the processed data is then grouped by account id and routed to multiple Reduce nodes for parallel processing. Statistics on accounts' average daily balances can thus be obtained quickly for large data volumes, with high processing efficiency and strong scalability.
A person of ordinary skill in the art will understand that all or some of the steps of the above method can be performed by a program instructing the relevant hardware, and that the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk, or an optical disc. Alternatively, all or some of the steps of the above embodiments can be implemented with one or more integrated circuits. Accordingly, each module/unit in the above embodiments can be implemented in hardware or as a software functional module. The present invention is not limited to any particular combination of hardware and software.
It should be noted that the present invention may have various other embodiments. Without departing from the spirit and essence of the present invention, a person skilled in the art can make various corresponding changes and variations according to the present invention, and all such changes and variations shall fall within the protection scope of the appended claims.

Claims (12)

1. A parallel processing method for account balance data, the method comprising:
Reading, by one or more Map nodes executing a first task, different shards of account balance detail data, and generating a first output parameter and a second output parameter for each balance record in the shards read, wherein the first output parameter contains at least an account ID, the second output parameter is account status information, and the account status information contains at least a balance value, a trade date, and an intraday transaction sequence number;
Reading, by one or more Reduce nodes executing the first task, different balance records processed by the first-task Map nodes, and generating an average daily balance record for each account from the first and second output parameters of the balance records, wherein balance records with the same first output parameter are read by the same Reduce node;
After the first-task Reduce nodes generate the average daily balance record of each account from the first and second output parameters of the balance records, the method further comprises:
Reading, by one or more Map nodes executing a second task, the average daily balance records of different accounts, and generating a first output parameter and a second output parameter for each average daily balance record read, wherein the first output parameter of an average daily balance record is the interval containing the average daily balance, and the second output parameter is set to 1;
Reading, by one or more Reduce nodes executing the second task, different average daily balance records processed by the second-task Map nodes, and counting the number of accounts in each average-daily-balance interval from the first and second output parameters of those records, comprising: according to the interval in the first output parameter, traversing all average daily balance records in the same interval and summing their second output parameters to obtain the number of accounts in that interval, wherein average daily balance records with the same first output parameter are read by the same Reduce node.
2. The method according to claim 1, characterized in that:
Generating the average daily balance record of each account from the first and second output parameters of the balance records comprises:
According to the account ID in the first output parameter, traversing all balance records of the same account; determining, from the second output parameters of the balance records, the account's balance on each day within the query start/end time range; averaging the daily balances over that range to obtain the account's average daily balance; and generating the account's average daily balance record.
3. The method according to claim 2, characterized in that:
Determining, from the second output parameters of the balance records, the account's balance on each day within the query start/end time range comprises:
For each day from the query start date to the query end date, checking whether a balance record exists for that day; if one exists, taking the balance value of that day's record with the highest transaction sequence number as the day's end-of-day balance; if none exists, locating the nearest earlier date that has balance records and taking the balance value of that date's record with the highest transaction sequence number as the day's end-of-day balance.
4. The method according to claim 1, characterized in that:
Before the one or more first-task Reduce nodes read the different balance records processed by the first-task Map nodes, the method further comprises:
Computing a hash value of the first parameter of each balance record, and establishing a mapping between hash values of the first parameter and the first-task Reduce nodes, wherein the mapping is used by the first-task Reduce nodes to read the corresponding balance records.
5. The method according to claim 1, characterized in that:
Before the one or more second-task Reduce nodes read the different average daily balance records processed by the second-task Map nodes, the method further comprises:
Computing a hash value of the first parameter of each average daily balance record, and establishing a mapping between hash values of the first parameter and the second-task Reduce nodes, wherein the mapping is used by the second-task Reduce nodes to read the corresponding average daily balance records.
6. The method according to claim 1, characterized in that:
Before the one or more first-task Map nodes read the different shards of the account balance detail data, the method further comprises:
Determining the read range of the account balance detail data according to the query start/end time, comprising: defining the read range as the full balance detail data plus the incremental balance detail data up to and including the query end date;
Splitting the account balance detail data within the read range into shards, and establishing a mapping between each shard and a first-task Map node, wherein the mapping is used by the first-task Map nodes to read the corresponding shards.
7. A parallel processing system for account balance data, the system comprising:
A Map processing module, comprising one or more Map nodes executing a first task, each of which is configured to read a different shard of account balance detail data and generate a first output parameter and a second output parameter for each balance record in the shard read, wherein the first output parameter contains at least an account ID, the second output parameter is account status information, and the account status information contains at least a balance value, a trade date, and an intraday transaction sequence number;
A Reduce processing module, comprising one or more Reduce nodes executing the first task, each of which is configured to read different balance records processed by the first-task Map nodes and generate an average daily balance record for each account from the first and second output parameters of the balance records, wherein balance records with the same first output parameter are read by the same Reduce node;
The Map processing module further comprises one or more Map nodes executing a second task, and the Reduce processing module further comprises one or more Reduce nodes executing the second task;
Each second-task Map node is configured to read the average daily balance records of different accounts and generate a first output parameter and a second output parameter for each average daily balance record read, wherein the first output parameter is the interval containing the average daily balance and the second output parameter is set to 1;
Each second-task Reduce node is configured to read different average daily balance records processed by the second-task Map nodes and count the number of accounts in each average-daily-balance interval from their first and second output parameters, comprising: according to the interval in the first output parameter, traversing all average daily balance records in the same interval and summing their second output parameters to obtain the number of accounts in that interval, wherein average daily balance records with the same first output parameter are read by the same Reduce node.
8. The system according to claim 7, characterized in that:
The first-task Reduce node being configured to generate the average daily balance record of each account from the first and second output parameters of the balance records comprises: according to the account ID in the first output parameter, traversing all balance records of the same account; determining, from the second output parameters of the balance records, the account's balance on each day within the query start/end time range; averaging the daily balances over that range to obtain the account's average daily balance; and generating the account's average daily balance record.
9. The system according to claim 8, characterized in that:
The first-task Reduce node being configured to determine, from the second output parameters of the balance records, the account's balance on each day within the query start/end time range comprises: for each day from the query start date to the query end date, checking whether a balance record exists for that day; if one exists, taking the balance value of that day's record with the highest transaction sequence number as the day's end-of-day balance; if none exists, locating the nearest earlier date that has balance records and taking the balance value of that date's record with the highest transaction sequence number as the day's end-of-day balance.
10. The system according to claim 7, characterized in that the Reduce processing module further comprises a first-task routing module:
The first-task routing module is configured to, before the one or more first-task Reduce nodes read the different balance records processed by the first-task Map nodes, compute a hash value of the first parameter of each balance record and establish a mapping between hash values of the first parameter and the first-task Reduce nodes, wherein the mapping is used by the first-task Reduce nodes to read the corresponding balance records.
11. The system according to claim 7, characterized in that the Reduce processing module further comprises a second-task routing module:
The second-task routing module is configured to, before the one or more second-task Reduce nodes read the different average daily balance records processed by the second-task Map nodes, compute a hash value of the first parameter of each average daily balance record and establish a mapping between hash values of the first parameter and the second-task Reduce nodes, wherein the mapping is used by the second-task Reduce nodes to read the corresponding average daily balance records.
12. The system according to claim 7, characterized in that the Map processing module further comprises a sharding module:
The sharding module is configured to, before the one or more first-task Map nodes read the different shards of the account balance detail data, determine the read range of the account balance detail data according to the query start/end time, comprising: defining the read range as the full balance detail data plus the incremental balance detail data up to and including the query end date; splitting the account balance detail data within the read range into shards; and establishing a mapping between each shard and a first-task Map node, wherein the mapping is used by the first-task Map nodes to read the corresponding shards.
CN201410306448.6A 2014-06-30 2014-06-30 A kind of method for parallel processing and system of account balance data Active CN104050291B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410306448.6A CN104050291B (en) 2014-06-30 2014-06-30 A kind of method for parallel processing and system of account balance data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410306448.6A CN104050291B (en) 2014-06-30 2014-06-30 A kind of method for parallel processing and system of account balance data

Publications (2)

Publication Number Publication Date
CN104050291A CN104050291A (en) 2014-09-17
CN104050291B true CN104050291B (en) 2017-11-10

Family

ID=51503123

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410306448.6A Active CN104050291B (en) 2014-06-30 2014-06-30 A kind of method for parallel processing and system of account balance data

Country Status (1)

Country Link
CN (1) CN104050291B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105740063A (en) * 2014-12-08 2016-07-06 杭州华为数字技术有限公司 Data processing method and apparatus
CN106022901A (en) * 2015-03-17 2016-10-12 阿里巴巴集团控股有限公司 Data processing method and device
CN107357679A (en) * 2016-05-10 2017-11-17 银联数据服务有限公司 A kind of backup method and device
CN110659265B (en) * 2019-09-27 2020-11-24 广州峻林互联科技有限公司 Distributed parallel database resource management method
CN111680080A (en) * 2020-04-16 2020-09-18 中邮消费金融有限公司 Data processing method and data processing system

Citations (2)

Publication number Priority date Publication date Assignee Title
CN101799808A (en) * 2009-02-10 2010-08-11 中国移动通信集团公司 Data processing method and system thereof
CN102467570A (en) * 2010-11-17 2012-05-23 日电(中国)有限公司 Connection query system and method for distributed data warehouse

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20100162230A1 (en) * 2008-12-24 2010-06-24 Yahoo! Inc. Distributed computing system for large-scale data handling

Non-Patent Citations (1)

Title
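Research and Implementation of an OGSA-Based Grid Accounting System; 阿都建华; China Masters' Theses Full-text Database, Information Science and Technology; 2006-12-15 (No. 12); pp. 42-43 *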

Also Published As

Publication number Publication date
CN104050291A (en) 2014-09-17

Similar Documents

Publication Publication Date Title
CN104050291B (en) A kind of method for parallel processing and system of account balance data
CN105446991B (en) Date storage method, querying method and equipment
US20210326815A1 (en) Information storage and retrieval using an off-chain isomorphic database and a distributed ledger
CN103853820B (en) Data processing method and data processing system
CN106227800B (en) Storage method and management system for highly-associated big data
CN104123374B (en) The method and device of aggregate query in distributed data base
CN104317789B (en) The method for building passenger social network
CN103678590B (en) Report collecting device and report collecting method based on OLAP
CN106844477B (en) To synchronous method after block catenary system, block lookup method and block chain
CN108733713A (en) Data query method and device in data warehouse
CN103226618B (en) The related term extracting method excavated based on Data Mart and system
CN105989129A (en) Real-time data statistic method and device
CN110023925A (en) It generates, access and display follow metadata
CN107798038A (en) Data response method and data response apparatus
CN105843860B (en) A kind of microblogging concern recommended method based on parallel item-based collaborative filtering
CN105204920B (en) A kind of implementation method and device of the distributed computing operation based on mapping polymerization
CN108229728A (en) A kind of recommendation method of information of freight source and a kind of computer equipment
CN107870949A (en) Data analysis job dependence relation generation method and system
CN111382181A (en) Designated enterprise family affiliation analysis method and system based on stock right penetration
CN104036039A (en) Parallel processing method and system of data
CN103995886B (en) A kind of various dimensions product-design knowledge pushes framework and construction method
CN102208061A (en) Data cancel after verification processing device and method
CN105468728B (en) A kind of method and system obtaining cross-section data
CN106095952A (en) In space-time unique based on key assignments cloud storage, magnanimity crosses car record method for quickly querying
CN108470251A (en) Community based on Average Mutual divides quality evaluating method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20180817

Address after: 200436 Room 411, No. three, JIANGCHANG Road, Jingan District, Shanghai, 411

Patentee after: Shanghai Wave Cloud Computing Service Co., Ltd.

Address before: 100085 floor 1, C 2-1, No. 2, Shang Di Road, Haidian District, Beijing.

Patentee before: Inspur (Beijing) Electronic Information Industry Co., Ltd.