CN107508901A - Distributed data processing method, apparatus, server and system - Google Patents

Distributed data processing method, apparatus, server and system

Info

Publication number
CN107508901A
CN107508901A CN201710783415.4A CN201710783415A CN107508901A CN 107508901 A CN107508901 A CN 107508901A CN 201710783415 A CN201710783415 A CN 201710783415A CN 107508901 A CN107508901 A CN 107508901A
Authority
CN
China
Prior art keywords
data
shard
server
data processing
information
Legal status
Granted
Application number
CN201710783415.4A
Other languages
Chinese (zh)
Other versions
CN107508901B (en)
Inventor
黄世清
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710783415.4A
Publication of CN107508901A
Application granted
Publication of CN107508901B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application proposes a distributed data processing method, apparatus, server and system, relating to the technical field of data processing. The distributed data processing method includes: a task distribution server shards the data to be processed according to the number of data processing servers and obtains shard information for that data; the task distribution server sends each piece of shard information to the corresponding data processing server, so that the data processing server obtains and processes the data shard according to the shard information; and the task distribution server determines the data processing result from the feedback results of the data processing servers. With this method, data can be sharded based on the number of servers that process it and allocated to each server for processing. This reduces the operations other than data processing that distributed data processing involves, shortens the time those extra operations take, and thus improves the efficiency of distributed data processing.

Description

Distributed data processing method, apparatus, server and system
Technical field
The application relates to the technical field of data processing, and in particular to a distributed data processing method, apparatus, server and system.
Background
In everyday software development, computations on small amounts of data can usually be handled by a single computer, whereas large-scale data usually has to be processed by a dedicated big data service.
However, when the data volume is medium-scale, a single computer cannot process it quickly. Meanwhile, big data cluster services have so many users that resources are scarce: tasks are processed slowly or even have to wait in a queue. The waiting time then takes up too large a share of the whole data processing procedure, which reduces the efficiency of data processing.
Summary of the invention
The inventor found that big data cluster services process data inefficiently because of factors such as preparatory work and the overall planning of cluster resources, and that this becomes more pronounced the smaller the data volume is.
The purpose of the application is to improve the efficiency of distributed data processing.
According to one aspect of the application, a distributed data processing method is proposed, including: a task distribution server evenly shards the data to be processed according to the number of data processing servers and obtains shard information for that data; the task distribution server sends each piece of shard information to the corresponding data processing server, so that the data processing server obtains and processes the data shard according to the shard information; and the task distribution server determines the data processing result from the feedback results of the data processing servers.
Optionally, the step in which the task distribution server evenly shards the data according to the number of data processing servers and obtains the shard information includes: the task distribution server evenly divides the data according to the number of data processing servers to obtain the information of the data for each single server; and the data for each single server are sharded according to the predetermined number of threads of a single data processing server to obtain the shard information.
Optionally, the step of evenly dividing the data according to the number of data processing servers to obtain the information of the data for each single server includes: allocating data to each data processing server with a hash algorithm to obtain information on the initially allocated per-server data; and processing the initially allocated per-server data with a data balancing algorithm so that the data are allocated evenly, obtaining the information of the data for each single server.
Optionally, the step in which the task distribution server sends each piece of shard information to the corresponding data processing server includes: the task distribution server stores the shard information in a database according to a predetermined policy, so that a data processing server monitoring the database obtains the shard information from updates that satisfy the predetermined policy.
Optionally, multiple threads of a data processing server monitor the database simultaneously. When several threads obtain the same shard information, the thread that first obtains the corresponding data shard processes it, and the other threads continue monitoring the database. After finishing the data shard, that thread resumes monitoring the database. This process repeats until all data allocated to the data processing server have been processed.
Optionally, the method further includes: the task distribution server sends each data processing server, in advance, the algorithm or algorithm identifier for processing the data, so that each data processing server processes the data with the corresponding algorithm.
Optionally, the shard information includes one or more of: source information of the data to be processed, data table information, shard field information and filter condition information, as well as address information of the destination data processing server.
Optionally, the data to be processed include one or more of: data stored in a database, data from an external device, and data obtained over a network.
With this method, data can be sharded based on the number of servers that process it and allocated to each server for processing. This reduces the share of distributed data processing taken by operations other than data processing itself, and improves the efficiency of distributed data processing.
According to another aspect of the application, a distributed data processing apparatus is proposed, including: a data sharding unit, configured to evenly shard the data to be processed according to the number of data processing servers and obtain shard information for that data; a shard information distribution unit, configured to send each piece of shard information to the corresponding data processing server, so that the data processing server obtains and processes the data shard according to the shard information; and a result acquisition unit, configured to determine the data processing result from the feedback information of each data processing server.
Optionally, the data sharding unit includes: a first sharding subunit, configured to evenly divide the data according to the number of data processing servers to obtain the information of the data for each single server; and a second sharding subunit, configured to shard the data for each single server according to the predetermined number of threads of a single data processing server to obtain the shard information.
Optionally, the first sharding subunit is configured to allocate data to each data processing server with a hash algorithm to obtain information on the initially allocated per-server data, and to process the initially allocated per-server data with a data balancing algorithm to obtain the information of the data for each single server.
Optionally, the shard information distribution unit is configured to store the shard information in a database according to a predetermined policy, so that a data processing server monitoring the database obtains the shard information from updates that satisfy the predetermined policy.
Optionally, the apparatus further includes a data storage unit, configured to store the data to be processed in a database.
Optionally, the apparatus further includes an algorithm designation unit, configured to send each data processing server the algorithm or algorithm identifier for processing the data, so that each data processing server processes the data with the corresponding algorithm.
Optionally, the apparatus further includes: a data acquisition unit, configured to obtain shard information and obtain the data shard according to it; and a data processing unit, configured to process the data shard and feed back the processing result.
Optionally, the data acquisition unit uses multiple threads that are not currently processing a data shard to monitor the database simultaneously. The data processing unit is configured so that, when several threads obtain the same shard information, the thread that first obtains the corresponding data shard processes it. After finishing the data shard, that thread resumes monitoring the database.
Optionally, the shard information includes one or more of: source information of the data to be processed, data table information, shard field information and filter condition information, as well as address information of the destination data processing server.
Optionally, the data to be processed include one or more of: data stored in a database, data from an external device, and data obtained over a network.
Such an apparatus can shard data based on the number of servers that process it and allocate the data to each server for processing. This reduces the share of distributed data processing taken by operations other than data processing itself, and improves the efficiency of distributed data processing.
According to another aspect of the application, a distributed data processing apparatus is proposed, including a memory and a processor coupled to the memory. The processor is configured to execute any of the distributed data processing methods mentioned above based on instructions stored in the memory.
According to yet another aspect of the application, a computer-readable storage medium is proposed, on which computer program instructions are stored. When a processor executes these instructions, they implement the steps of any of the distributed data processing methods mentioned above.
According to yet another aspect of the application, a server is proposed, including an apparatus for performing any of the distributed data processing methods mentioned above.
Such a server can shard data based on the number of servers that process it and allocate the data to each processing server. This reduces the share of distributed data processing taken by operations other than data processing itself, and improves the efficiency of distributed data processing.
In addition, according to one aspect of the application, a distributed data processing system is proposed, including a plurality of the servers above.
In such a distributed data processing system, the servers can shard data based on the number of servers that process it and allocate the data to each server for processing. This reduces the share of distributed data processing taken by operations other than data processing itself, and improves the efficiency of distributed data processing.
Brief description of the drawings
The drawings described herein provide a further understanding of the application and form part of it. The illustrative embodiments of the application and their description serve to explain the application and do not unduly limit it. In the drawings:
Fig. 1 is a flow chart of one embodiment of the distributed data processing method of the application.
Fig. 2 is a flow chart of another embodiment of the distributed data processing method of the application.
Fig. 3 is a schematic diagram of one embodiment of the distributed data processing apparatus of the application.
Fig. 4 is a schematic diagram of another embodiment of the distributed data processing apparatus of the application.
Fig. 5 is a schematic diagram of yet another embodiment of the distributed data processing apparatus of the application.
Fig. 6 is a schematic diagram of a further embodiment of the distributed data processing apparatus of the application.
Fig. 7 is a schematic diagram of one embodiment of the distributed data processing system of the application.
Detailed description of embodiments
The technical solution of the application is described in further detail below with reference to the drawings and embodiments.
The prior art includes several platforms for large-scale data processing, such as MapReduce. MapReduce is a computation model, framework and platform for parallel processing of big data. It allows ordinary commercially available servers to form a distributed parallel computing cluster of tens, hundreds or even thousands of nodes.
The MapReduce computing framework is very powerful and is widely recognized in practical large-scale data processing. In practice, however, MapReduce has the following problems:
Maintainability: to use the computing service provided by MapReduce, a Hadoop environment has to be installed, because MapReduce is not a standalone framework and depends on the HDFS file system to run. This increases maintenance cost and workload.
Ease of use: writing MapReduce tasks requires learning its API. This takes some effort but can basically be done within a week. The subsequent maintenance workload of Hadoop cluster services, however, is very large and requires strong professional knowledge.
Time cost: during a computation, MapReduce goes through task loading, task monitoring, data sharding, shard computation, data shuffling, data merging and other stages, so a single computation is basically measured in hours. This is tolerable for large-scale data, but for small and medium-scale data the time spent on the framework's own functions is far too long.
Therefore, when the MapReduce framework is used to process small and medium-scale data sets, there is considerable waste of system resources, manpower and time.
A flow chart of one embodiment of the distributed data processing method of the application is shown in Fig. 1.
In step 101, the task distribution server evenly shards the data to be processed according to the number of data processing servers and obtains shard information for that data. For example, if the task distribution server finds that 10 data processing servers are available for processing the data, it shards the data, allocates at least one data shard to each data processing server, and generates shard information for each data shard.
In one embodiment, the shard information includes one or more of: source information of the data, data table information, shard field information and filter condition information, as well as address information of the destination data processing server that receives the shard information.
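The shard information fields above can be pictured as a small value object that the destination server turns into an extraction query. The following Java sketch is illustrative only: the class name ShardInfo, the assumption that a shard is a numeric half-open range [from, to) on the shard field, and the generated SELECT statement are not disclosed in the patent.

```java
// Illustrative sketch: field names mirror the shard information listed above; the
// numeric range on the shard field and the generated SQL are assumptions.
final class ShardInfo {
    final String source;        // source of the data, e.g. a data-source name
    final String table;         // data table information
    final String shardField;    // shard field information
    final long from;            // inclusive lower bound on shardField (assumed numeric)
    final long to;              // exclusive upper bound on shardField
    final String filter;        // filter condition information, may be null
    final String targetAddress; // address of the destination data processing server

    ShardInfo(String source, String table, String shardField, long from, long to,
              String filter, String targetAddress) {
        this.source = source;
        this.table = table;
        this.shardField = shardField;
        this.from = from;
        this.to = to;
        this.filter = filter;
        this.targetAddress = targetAddress;
    }

    /** True if this shard is addressed to the given data processing server. */
    boolean isFor(String address) {
        return targetAddress.equals(address);
    }

    /** Builds the extraction query a data processing server might run against `source`. */
    String toSql() {
        StringBuilder sql = new StringBuilder("SELECT * FROM ").append(table)
                .append(" WHERE ").append(shardField).append(" >= ").append(from)
                .append(" AND ").append(shardField).append(" < ").append(to);
        if (filter != null && !filter.isEmpty()) {
            sql.append(" AND (").append(filter).append(')');
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        ShardInfo info = new ShardInfo("orders_db", "orders", "order_id", 0, 1000,
                "status = 'PAID'", "10.0.0.12:8080");
        System.out.println(info.isFor("10.0.0.12:8080") + " " + info.toSql());
    }
}
```

Concatenating identifiers and a free-form filter into SQL is only reasonable for trusted, internally generated shard information; anything user-supplied would normally be bound through prepared-statement parameters.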
In step 102, the task distribution server sends the shard information to the corresponding data processing server. In one embodiment, the task distribution server can store the shard information in a database according to a predetermined policy, for example in a data table monitored by the destination data processing server. When the data processing server determines that the table it monitors has been updated, it obtains the shard information from the database. The database can be MySQL, a NoSQL database, or the like.
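Step 102 leaves the monitoring mechanism open. One simple reading, assumed here, is that each data processing server periodically polls the shard table for rows addressed to it whose id is above the highest id it has already seen. In this Java sketch the table is an in-memory list, ShardTablePoller, ShardRow and the high-water-mark scheme are hypothetical, and ScheduledExecutorService.scheduleWithFixedDelay supplies the polling interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Illustrative sketch: the shard table is simulated by an in-memory list; all names are hypothetical.
final class ShardTablePoller {
    /** One row written by the task distribution server; ids are assumed to increase on insert. */
    static final class ShardRow {
        final long id;
        final String target;   // address of the destination data processing server
        final String payload;  // the serialized shard information

        ShardRow(long id, String target, String payload) {
            this.id = id;
            this.target = target;
            this.payload = payload;
        }
    }

    private final String myAddress;
    private final List<ShardRow> table;
    private long lastSeenId = 0;   // high-water mark: rows up to this id were already fetched

    ShardTablePoller(String myAddress, List<ShardRow> table) {
        this.myAddress = myAddress;
        this.table = table;
    }

    /** One pass, comparable to SELECT ... WHERE target = ? AND id > ? ORDER BY id. */
    List<ShardRow> pollOnce() {
        List<ShardRow> fresh = new ArrayList<>();
        for (ShardRow r : table) {
            if (r.id > lastSeenId && r.target.equals(myAddress)) fresh.add(r);
        }
        for (ShardRow r : fresh) lastSeenId = Math.max(lastSeenId, r.id);
        return fresh;
    }

    /** Re-runs pollOnce() every `intervalSeconds`, handing each new row to `handler`. */
    void start(ScheduledExecutorService scheduler, long intervalSeconds, Consumer<ShardRow> handler) {
        scheduler.scheduleWithFixedDelay(() -> pollOnce().forEach(handler),
                0, intervalSeconds, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        List<ShardRow> table = new CopyOnWriteArrayList<>();
        table.add(new ShardRow(1, "10.0.0.12", "orders [0, 1000)"));
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        new ShardTablePoller("10.0.0.12", table).start(scheduler, 3, r -> System.out.println("got " + r.payload));
        Thread.sleep(500);   // let the first poll run
        scheduler.shutdown();
    }
}
```

A high-water mark only works if row ids grow with insertion order; where that cannot be guaranteed, filtering on a status column is the more robust choice.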
The data processing server can use the shard information it obtained to fetch the corresponding data shard and then process the fetched data. In one embodiment, the data to be processed can include one or more of: data stored in a database, data from an external device, and data obtained over a network. The data processing server can determine the data source from the shard information and extract the data accordingly.
In step 103, the task distribution server determines the data processing result from the feedback results of the data processing servers. In one embodiment, each data processing server can store its feedback result in a predetermined location of the database, or in a predetermined table and field. The task distribution server reads the database to obtain each data processing server's feedback result, and from these the data processing result.
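Step 103 only states that the task distribution server reads the feedback results from the database. The sketch below assumes one feedback row per shard carrying a status and a numeric partial result, and assumes the partial results are combined by summation. The class ResultCollector and its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

// Illustrative sketch: the feedback-row layout and the "sum" combination are assumptions.
final class ResultCollector {
    enum Status { NOT_CALCULATED, CALCULATING, COMPLETED, FAILED }

    /** One feedback row written by a data processing server (one row per shard assumed). */
    static final class Feedback {
        final int shardId;
        final Status status;
        final long partialResult;

        Feedback(int shardId, Status status, long partialResult) {
            this.shardId = shardId;
            this.status = status;
            this.partialResult = partialResult;
        }
    }

    /**
     * Combines the feedback read from the database: throws if any shard failed,
     * returns empty while shards are still outstanding, otherwise the summed result.
     */
    static OptionalLong combine(List<Feedback> feedback, int expectedShards) {
        long total = 0;
        int completed = 0;
        for (Feedback f : feedback) {
            if (f.status == Status.FAILED) {
                throw new IllegalStateException("shard " + f.shardId + " failed");
            }
            if (f.status == Status.COMPLETED) {
                total += f.partialResult;
                completed++;
            }
        }
        return completed == expectedShards ? OptionalLong.of(total) : OptionalLong.empty();
    }

    public static void main(String[] args) {
        List<Feedback> rows = Arrays.asList(
                new Feedback(0, Status.COMPLETED, 40),
                new Feedback(1, Status.CALCULATING, 0));
        System.out.println(combine(rows, 2).isPresent() ? "done" : "still running");
    }
}
```

The distribution server would call combine after each read of the feedback table and stop polling once a value is present or a failure is reported.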
With such a lightweight distributed computing method, data can be sharded based on the number of servers that process it and allocated to each server for processing. This reduces the share of distributed data processing taken by operations other than data processing itself and improves its efficiency; the gain is especially pronounced for medium data volumes. In addition, the method requires no special environment to be installed, and therefore no significant cost or effort to maintain one. This lowers maintenance cost and improves the user experience.
In one embodiment, the task distribution server can also write shard record information into the database. The shard record information can include one or more of the following:
Message count: the total number of shards, i.e. messages, generated by this computation.
Data set: the set of base data fields that each shard needs to compute.
Source host: the identifier of the task distribution server.
Computing host: the identifier of the destination data processing server for the shard.
Start time: the time at which the data processing server received the shard information.
End time: the time at which the data processing server finished processing the corresponding data shard.
State: the current processing state of the shard, such as not calculated, calculating, calculation complete or calculation failed.
Version number: used when data are fetched concurrently; defaults to 0.
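The shard record fields listed above map naturally onto a row object. This Java sketch is a hypothetical illustration: the enum names, the rule that a shard may only move from not calculated to calculating and then to completed or failed, and the choice to increment the version number on every update are assumptions rather than details from the patent.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch of one shard record row; the transition rules and the version
// increments are assumptions layered on the fields listed above.
final class ShardRecord {
    enum State { NOT_CALCULATED, CALCULATING, COMPLETED, FAILED }

    final int messageCount;      // total number of shards produced by this computation
    final List<String> dataSet;  // base data fields this shard has to compute
    final String sourceHost;     // identifier of the task distribution server
    final String computeHost;    // identifier of the destination data processing server
    Instant startTime;           // when the shard information was received
    Instant endTime;             // when processing of the shard finished
    State state = State.NOT_CALCULATED;
    int version = 0;             // defaults to 0; used for concurrent fetching

    ShardRecord(int messageCount, List<String> dataSet, String sourceHost, String computeHost) {
        this.messageCount = messageCount;
        this.dataSet = dataSet;
        this.sourceHost = sourceHost;
        this.computeHost = computeHost;
    }

    /** NOT_CALCULATED -> CALCULATING, recording when the shard was received. */
    void start(Instant now) {
        require(State.NOT_CALCULATED);
        state = State.CALCULATING;
        startTime = now;
        version++;
    }

    /** CALCULATING -> COMPLETED or FAILED, recording when processing ended. */
    void finish(Instant now, boolean success) {
        require(State.CALCULATING);
        state = success ? State.COMPLETED : State.FAILED;
        endTime = now;
        version++;
    }

    /** Processing time, available once the shard has finished. */
    Duration elapsed() {
        if (startTime == null || endTime == null) throw new IllegalStateException("shard not finished");
        return Duration.between(startTime, endTime);
    }

    private void require(State expected) {
        if (state != expected) {
            throw new IllegalStateException("expected " + expected + " but was " + state);
        }
    }

    public static void main(String[] args) {
        ShardRecord r = new ShardRecord(10, Arrays.asList("sku", "qty"), "dist-01", "calc-03");
        Instant t0 = Instant.now();
        r.start(t0);
        r.finish(t0.plusSeconds(90), true);
        System.out.println(r.state + " after " + r.elapsed().getSeconds() + " s, version " + r.version);
    }
}
```

Rejecting out-of-order transitions is one way to obtain the controllability the text mentions: a monitoring job can read each shard's state and start and end times and flag shards that have been calculating for too long.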
This enables effective monitoring of the data processing procedure and improves its controllability and reliability.
In one embodiment, each data processing server can use multi-threaded parallel processing to improve efficiency. The number of threads can be adjusted and configured manually. When distributing data, the task distribution server can first split the data into n parts according to the number n of data processing servers, obtaining the information of the data for each single server. It then shards each server's data according to that data processing server's predetermined number of threads, obtaining information on data shards that are each processed by a single thread, i.e. the shard information.
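The two-level split (first across n servers, then across each server's threads) can be sketched as follows, assuming the data to be processed are addressed by a contiguous ID range [0, total). The class TwoLevelSharding and the ID-range representation are illustrative assumptions. Each part receives either floor(total / parts) records or one more, so part sizes differ by at most one.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: names and the ID-range representation are assumptions.
final class TwoLevelSharding {
    /** A half-open ID range [start, end) handled by one thread on one server. */
    static final class Shard {
        final int server;
        final int thread;
        final long start;
        final long end;

        Shard(int server, int thread, long start, long end) {
            this.server = server;
            this.thread = thread;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "server " + server + " / thread " + thread + ": [" + start + ", " + end + ")";
        }
    }

    /** Splits [start, end) into `parts` contiguous ranges whose sizes differ by at most one. */
    static long[][] split(long start, long end, int parts) {
        if (parts < 1) throw new IllegalArgumentException("parts must be >= 1");
        long total = end - start;
        long base = total / parts;
        long extra = total % parts;            // the first `extra` ranges get one more record
        long[][] ranges = new long[parts][2];
        long cursor = start;
        for (int i = 0; i < parts; i++) {
            long size = base + (i < extra ? 1 : 0);
            ranges[i][0] = cursor;
            ranges[i][1] = cursor + size;
            cursor += size;
        }
        return ranges;
    }

    /** Level 1: one part per server (n parts); level 2: each part split per thread. */
    static List<Shard> plan(long totalRecords, int servers, int threadsPerServer) {
        List<Shard> shards = new ArrayList<>();
        long[][] perServer = split(0, totalRecords, servers);
        for (int s = 0; s < servers; s++) {
            long[][] perThread = split(perServer[s][0], perServer[s][1], threadsPerServer);
            for (int t = 0; t < threadsPerServer; t++) {
                shards.add(new Shard(s, t, perThread[t][0], perThread[t][1]));
            }
        }
        return shards;
    }

    public static void main(String[] args) {
        // 10 servers as in the step 101 example; 5 threads each is an assumed setting.
        for (Shard shard : plan(103, 10, 5)) {
            System.out.println(shard);
        }
    }
}
```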
In this way, each data processing server can process data with multiple threads in parallel, improving both data processing efficiency and server resource utilization.
In one embodiment, data can be allocated to each data processing server by a hash algorithm to obtain information on the initially allocated per-server data. A data balancing calculation is then performed on this initial allocation so that the amount of data allocated to each server is as balanced as possible. In one embodiment, an average balancing algorithm can be used: the average load is computed, and data from servers whose allocation exceeds the average are reassigned to servers below the average. A maximum-value algorithm can also be used: a maximum amount of data that each server can process is set, and balancing is performed only when a server's allocation exceeds this maximum. If the maximum is less than the average, the average is used instead.
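The hash-then-balance allocation and the two balancing variants can be sketched in Java as below. The names are hypothetical, String.hashCode stands in for whichever hash algorithm is actually used, and the average is rounded up so that servers × cap always covers the total. This is one concrete interpretation of the average and maximum-value algorithms, not necessarily the patent's own.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch: class and method names are hypothetical.
final class HashThenBalance {
    /** Initial allocation: each key goes to server floorMod(hash(key), n). */
    static List<List<String>> hashAssign(List<String> keys, int servers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < servers; i++) buckets.add(new ArrayList<>());
        for (String key : keys) {
            buckets.get(Math.floorMod(key.hashCode(), servers)).add(key);
        }
        return buckets;
    }

    /** Average load per server, rounded up so that servers * cap >= total. */
    static int averageCap(List<List<String>> buckets) {
        int total = 0;
        for (List<String> b : buckets) total += b.size();
        return (total + buckets.size() - 1) / buckets.size();
    }

    /**
     * Moves items from servers above the cap to servers below it, in place.
     * maxPerServer = 0 gives the "average" strategy; a positive value gives the
     * "maximum" strategy, falling back to the average when the maximum is lower.
     */
    static void rebalance(List<List<String>> buckets, int maxPerServer) {
        int cap = Math.max(maxPerServer, averageCap(buckets));
        Deque<String> overflow = new ArrayDeque<>();
        for (List<String> b : buckets) {
            while (b.size() > cap) overflow.push(b.remove(b.size() - 1));
        }
        // servers * cap >= total, so every overflowing item finds a free slot.
        for (List<String> b : buckets) {
            while (b.size() < cap && !overflow.isEmpty()) b.add(overflow.pop());
        }
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) keys.add("order-" + i);
        List<List<String>> buckets = hashAssign(keys, 7);
        rebalance(buckets, 0);
        for (List<String> b : buckets) System.out.println(b.size());
    }
}
```

Only the overflow is moved, so most items stay on the server the hash chose; that keeps the initial hash allocation largely intact while bounding every server's load.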
This balances resource utilization across servers and also shortens the time the task distribution server needs to determine the data processing result.
A flow chart of another embodiment of the distributed data processing method of the application is shown in Fig. 2.
In step 201, the task distribution server evenly divides the data to be processed according to the number of data processing servers, obtaining the information of the data for each single server. In one embodiment, this information can be obtained by the initial allocation followed by balancing described above, to ensure that the data allocated to each server are balanced.
In step 202, the data for each single server are sharded according to the predetermined number of threads of a single data processing server, yielding the shard information. In one embodiment, a time limit can be set for each data shard (for example, each task should finish within half an hour), and the computing time needed per record can be estimated (the per-record execution time, which can be obtained from practical experience). Then:
Records per shard = allowed computing time per shard / per-record execution time
Total number of shards = record count / records per shard
This determines the total number of shards, from which the shard information of each data shard is obtained.
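As a worked example of the two formulas, assume a 30-minute budget per shard (1,800,000 ms), an estimated 50 ms per record and 1,000,000 records. Each shard then holds 1,800,000 / 50 = 36,000 records, and 1,000,000 / 36,000 ≈ 27.8, so 28 shards are needed. The sketch below rounds the second division up so that the remainder still gets a shard. The rounding rule and the use of integer milliseconds are assumptions, and the names are hypothetical.

```java
// Illustrative sketch: names are hypothetical; times are in integer milliseconds.
final class ShardSizing {
    private ShardSizing() {}

    /** Records per shard = allowed time per shard / estimated time per record (rounded down, at least 1). */
    static long recordsPerShard(long allowedMillisPerShard, long millisPerRecord) {
        if (allowedMillisPerShard <= 0 || millisPerRecord <= 0) {
            throw new IllegalArgumentException("times must be positive");
        }
        return Math.max(1L, allowedMillisPerShard / millisPerRecord);
    }

    /** Total shards = record count / records per shard, rounded up so no record is left out. */
    static long totalShards(long recordCount, long recordsPerShard) {
        return (recordCount + recordsPerShard - 1) / recordsPerShard;
    }

    public static void main(String[] args) {
        long perShard = recordsPerShard(30 * 60 * 1000L, 50L);   // half an hour, 50 ms per record
        System.out.println(perShard + " records per shard, "
                + totalShards(1_000_000L, perShard) + " shards");
    }
}
```

Plain integer division would give 27 shards here and silently drop the last 28,000 records, which is why the ceiling matters.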
In step 203, multiple threads of the data processing server monitor the database simultaneously. In one embodiment, the number of threads on each data processing server and the monitoring interval can be configured; for example, 5 threads are started by default, with an interval of 3 seconds.
In step 204, optimistic locking can be used to prevent several threads from fetching the same data at the same time and computing it more than once. The optimistic lock uses the version number as its identifying field: when several threads fetch the same shard information, the thread that first updates the database processes the data shard, and the other threads no longer fetch or process that data shard. After processing completes, the data processing server can store the result in the database for the task distribution server to read. Once the thread that processed the data shard has finished, it can resume monitoring the database to obtain the next shard information.
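The optimistic lock in step 204 boils down to a conditional update on the version number: only the thread whose update still sees the version it read succeeds. The Java sketch below models a single shard-table row with AtomicReference.compareAndSet, which gives the same all-or-nothing behaviour in memory. The SQL statement in the comment, the column names and the class OptimisticClaim are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative sketch: an in-memory stand-in for one row of a shard table.
final class OptimisticClaim {
    /** Immutable snapshot of the row: state, version number and owning thread. */
    static final class Row {
        final String state;
        final int version;
        final String owner;

        Row(String state, int version, String owner) {
            this.state = state;
            this.version = version;
            this.owner = owner;
        }
    }

    final AtomicReference<Row> row = new AtomicReference<>(new Row("NOT_CALCULATED", 0, null));

    /**
     * Plays the role of an assumed SQL statement such as
     *   UPDATE shard_record SET state = 'CALCULATING', version = version + 1, owner = ?
     *   WHERE id = ? AND version = ? AND state = 'NOT_CALCULATED'
     * and returns true only for the caller whose update "affected one row".
     */
    boolean tryClaim(String threadName) {
        Row seen = row.get();
        if (!"NOT_CALCULATED".equals(seen.state)) return false;
        Row next = new Row("CALCULATING", seen.version + 1, threadName);
        return row.compareAndSet(seen, next);   // fails if another thread updated first
    }

    public static void main(String[] args) throws Exception {
        OptimisticClaim shard = new OptimisticClaim();
        AtomicInteger winners = new AtomicInteger();
        CountDownLatch go = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(5);   // 5 listener threads
        List<Future<?>> results = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            String name = "listener-" + i;
            results.add(pool.submit(() -> {
                go.await();                                       // release all threads together
                if (shard.tryClaim(name)) winners.incrementAndGet();
                return null;
            }));
        }
        go.countDown();
        for (Future<?> f : results) f.get();
        pool.shutdown();
        System.out.println("winners = " + winners.get() + ", owner = " + shard.row.get().owner);
    }
}
```

Against a real database, the same check is made by executing the UPDATE and treating an update count of 1 as a successful claim and 0 as a claim lost to another thread.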
In step 205, the task distribution server determines the data processing result from the feedback results of the data processing servers.
This prevents multiple threads of a data processing server from fetching and processing the same task, improving the stability of data processing while preserving its efficiency.
In one embodiment, data processing servers can obtain shard information in real time and start processing while the task distribution server is still writing shard information into the database. Data sharding and data processing therefore run concurrently, so that sharding does not take up a large block of time on its own, which further improves efficiency.
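Running sharding and processing concurrently is essentially a producer-consumer pipeline. In the sketch below, a LinkedBlockingQueue stands in for the shard table in the database. The distribution side puts each [start, end) shard as soon as it is computed, worker threads take shards immediately, and one end-of-stream marker per worker stops them. The class name, the queue and the record-counting placeholder for real processing are assumptions.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: a queue stands in for the shard table; names are hypothetical.
final class PipelinedSharding {
    private static final long[] POISON = new long[0];   // end-of-stream marker, compared by identity

    /** Writes [start, end) shards to the queue while worker threads already consume them. */
    static long run(long totalRecords, long shardSize, int workers) throws InterruptedException {
        BlockingQueue<long[]> queue = new LinkedBlockingQueue<>();
        AtomicLong processed = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int i = 0; i < workers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        long[] shard = queue.take();
                        if (shard == POISON) break;
                        // Placeholder for real processing: count the records in the shard.
                        processed.addAndGet(shard[1] - shard[0]);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // The distribution side keeps producing; workers start as soon as the first shard lands.
        for (long start = 0; start < totalRecords; start += shardSize) {
            queue.put(new long[] {start, Math.min(start + shardSize, totalRecords)});
        }
        for (int i = 0; i < workers; i++) queue.put(POISON);   // one stop marker per worker
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(1_000_000, 36_000, 5) + " records processed");
    }
}
```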
In one embodiment, any server in a server cluster can act as the task distribution server, while one or more other servers, as well as the task distribution server itself, act as data processing servers. The user can start a task from any server in the cluster, which improves server utilization and the user experience.
In one embodiment, the task distribution server can store the data to be processed in the database before sharding, so that the data processing servers can extract data according to the shard information. File-type data in particular require the data to be computed, together with related data, to be loaded into the database in advance. The database can be a relational or a non-relational database.
In one embodiment, at least one algorithm can be configured on each server, and the task distribution server can designate for each data processing server the algorithm currently used to process the data, which makes data processing more flexible.
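Designating an algorithm by identifier can be sketched as a registry on each data processing server that maps the identifier sent by the task distribution server to a locally configured function. The class AlgorithmRegistry, the identifiers "sum" and "max", and the List<Long> data type are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch: identifiers and the List<Long> data type are hypothetical.
final class AlgorithmRegistry {
    private final Map<String, Function<List<Long>, Long>> algorithms = new HashMap<>();

    /** Configures an algorithm on this server under an identifier. */
    void register(String id, Function<List<Long>, Long> algorithm) {
        algorithms.put(id, algorithm);
    }

    /** Runs the algorithm the task distribution server designated by identifier. */
    long run(String algorithmId, List<Long> shardData) {
        Function<List<Long>, Long> algorithm = algorithms.get(algorithmId);
        if (algorithm == null) {
            throw new IllegalArgumentException("unknown algorithm: " + algorithmId);
        }
        return algorithm.apply(shardData);
    }

    /** A registry preloaded with two example algorithms. */
    static AlgorithmRegistry withDefaults() {
        AlgorithmRegistry registry = new AlgorithmRegistry();
        registry.register("sum", xs -> xs.stream().mapToLong(Long::longValue).sum());
        registry.register("max", xs -> xs.stream().mapToLong(Long::longValue).max().orElse(0L));
        return registry;
    }

    public static void main(String[] args) {
        // The identifier would arrive from the task distribution server, e.g. with the shard information.
        System.out.println(withDefaults().run("sum", Arrays.asList(3L, 4L, 5L)));
    }
}
```

Sending only an identifier keeps the shard information small; sending the algorithm itself, the other option the text allows, would need a serialization or class-loading mechanism not shown here.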
A schematic diagram of one embodiment of the distributed data processing apparatus of the application is shown in Fig. 3. The data sharding unit 301 can shard the data to be processed according to the number of data processing servers and obtain the shard information for that data. For example, if the task distribution server finds that 10 data processing servers are available for processing the data, the data sharding unit 301 shards the data, allocates at least one data shard to each data processing server, and generates shard information for each data shard.
The shard information distribution unit 302 can send the shard information to the corresponding data processing server. In one embodiment, the shard information distribution unit 302 can store the shard information in a database according to a predetermined policy, for example in a data table monitored by the destination data processing server. When the data processing server determines that the table it monitors has been updated, it obtains the shard information from the database. The database can be MySQL, a NoSQL database, or the like.
The result acquisition unit 303 can determine the data processing result from the feedback results of the data processing servers. In one embodiment, each data processing server can store its feedback result in a predetermined location of the database, or in a predetermined table and field. The result acquisition unit 303 reads the database to obtain each data processing server's feedback result, and from these the data processing result.
Such an apparatus can shard data based on the number of servers that process it and allocate the data to each server for processing. This reduces the share of distributed data processing taken by operations other than data processing itself and improves its efficiency; the gain is especially pronounced for medium data volumes.
In one embodiment, the data sharding unit 301 can include a first sharding subunit and a second sharding subunit. The first sharding subunit can first split the data into n parts according to the number n of data processing servers (n being a positive integer not less than 1), obtaining the information of the data for each single server. The second sharding subunit then shards each server's data according to that data processing server's predetermined number of threads, obtaining information on data shards that are each processed by a single thread, i.e. the shard information.
Such an apparatus lets each data processing server process data with multiple threads in parallel, improving both data processing efficiency and server resource utilization.
In one embodiment, the first sharding subunit can first allocate data to each data processing server with a hash algorithm, obtaining information on the initially allocated per-server data. It then performs a balancing calculation on this initial allocation so that the amount of data allocated to each server is as balanced as possible. This keeps each server's load and resource utilization as balanced as possible, and also shortens the time the task distribution server needs to determine the data processing result.
In one embodiment, the distributed data processing apparatus can also include a data storage unit. It can store the data to be processed in the database before the data sharding unit shards the data, so that the data processing servers can extract data according to the shard information.
In one embodiment, the distributed data processing apparatus can also include an algorithm designation unit. It can designate for each data processing server the algorithm currently used to process the data, which makes data processing more flexible.
In one embodiment, the distributed data processing apparatus can include a data acquisition unit and a data processing unit.
The data acquisition unit can obtain shard information and fetch the data shard according to it. In one embodiment, the data acquisition unit can use multiple threads to monitor the database simultaneously.
The data processing unit can process the data shard and feed back the processing result. In one embodiment, when several threads obtain the same shard information, the data processing unit uses the thread that first obtained the corresponding data shard to process it, and the other threads stop fetching that data shard.
Such an apparatus prevents multiple threads of a data processing server from fetching and processing the same task, improving the stability of data processing while preserving its efficiency.
A schematic diagram of another embodiment of the distributed data processing apparatus of the application is shown in Fig. 4. The structure and function of the data sharding unit 401, the shard information distribution unit 402 and the result acquisition unit 403 are similar to those in the embodiment of Fig. 3; these units perform the steps executed by the task distribution server in the distributed data processing method above. The apparatus also includes a data acquisition unit 404 and a data processing unit 405, which perform the steps executed by the data processing server in that method.
With such a distributed data processing apparatus, any server in a server cluster can act as the task distribution server, while one or more other servers, as well as the task distribution server itself, act as data processing servers. The user can start a task from any server in the cluster, which improves server utilization and the user experience.
In one embodiment, the function of each unit may be implemented through defined interfaces. For example, the data acquisition unit 404 includes three interfaces, defined in the interface class com.jd.ipc.simulate.frame.JDDataCollectService:
Interface 1 method: public Map geSplitData(DataContext context) throws Exception
Interface 1 description: this method obtains the shard information. It returns the data set of shard information and may throw an exception on error.
Interface 2 method: public Map getCalcData(DataContext context) throws Exception
Interface 2 description: this method obtains the data shard to be processed. Necessary parameters can be passed in, such as the names of the main table and auxiliary tables, data configuration files, or filter conditions. It returns the data set to be computed and may throw an exception on error.
Interface 3 method: public Map getOutsideData(DataContext context) throws Exception
Interface 3 description: this method obtains external data. Necessary parameters can be passed in, such as configuration files for the external data or filter conditions. It returns the data set of external data and may throw an exception on error.
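A hedged sketch of the data acquisition interface follows. The text gives only the class name, the method signatures, and a raw Map return type, so DataContext, the generic Map<String, Object> type, and the InMemoryCollectService implementation are assumptions made for illustration. The method name geSplitData is reproduced exactly as it appears in the source.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the DataContext parameter type named in the text;
// its real fields are not disclosed, so it is modelled here as a parameter map.
final class DataContext {
    final Map<String, Object> params = new HashMap<>();

    DataContext with(String key, Object value) {
        params.put(key, value);
        return this;
    }
}

// Interface shape as described for com.jd.ipc.simulate.frame.JDDataCollectService.
// The text shows a raw Map return type; Map<String, Object> is an assumption.
interface JDDataCollectService {
    Map<String, Object> geSplitData(DataContext context) throws Exception;    // shard information
    Map<String, Object> getCalcData(DataContext context) throws Exception;    // data shard to process
    Map<String, Object> getOutsideData(DataContext context) throws Exception; // external data
}

// Toy in-memory implementation, for illustration only.
final class InMemoryCollectService implements JDDataCollectService {
    private final Map<String, List<Long>> mainTable; // shardId -> rows of that shard

    InMemoryCollectService(Map<String, List<Long>> mainTable) {
        this.mainTable = mainTable;
    }

    @Override
    public Map<String, Object> geSplitData(DataContext context) {
        Map<String, Object> result = new HashMap<>();
        result.put("shardIds", List.copyOf(mainTable.keySet()));
        return result;
    }

    @Override
    public Map<String, Object> getCalcData(DataContext context) throws Exception {
        Object shardId = context.params.get("shardId");
        List<Long> rows = mainTable.get(shardId);
        if (rows == null) {
            throw new Exception("Unknown shard: " + shardId); // error case, as the text allows
        }
        Map<String, Object> result = new HashMap<>();
        result.put("rows", rows);
        return result;
    }

    @Override
    public Map<String, Object> getOutsideData(DataContext context) {
        return new HashMap<>(); // no external source in this toy example
    }
}
```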
The data processing unit may include an interface whose class is named com.jd.ipc.simulate.module.JDDataCalcService. Its method is public boolean execute(CalcContext context) throws Exception. This method performs the calculation on the data; the concrete calculation logic is implemented by the user, and the method is provided with the data to be processed and the related configuration data. It returns a boolean value indicating whether processing succeeded or failed, and may throw an exception on error.
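Similarly, here is a minimal sketch of the calculation interface and one piece of user-supplied logic. CalcContext's fields (rows, config, output), the "threshold" configuration key, and ThresholdSumCalc are hypothetical; the text only states that the context supplies the data to be processed and related configuration, and that execute returns a boolean success flag.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical CalcContext: the text only says it provides the data to be
// processed and related configuration data, so both are modelled as plain fields.
final class CalcContext {
    final List<Long> rows;
    final Map<String, Object> config;
    final Map<String, Object> output = new HashMap<>(); // where results are placed (assumption)

    CalcContext(List<Long> rows, Map<String, Object> config) {
        this.rows = rows;
        this.config = config;
    }
}

// Interface shape as described for com.jd.ipc.simulate.module.JDDataCalcService.
interface JDDataCalcService {
    // Returns true on success, false on failure; may throw on error.
    boolean execute(CalcContext context) throws Exception;
}

// Example user-supplied logic: sum the rows at or above a configured threshold.
final class ThresholdSumCalc implements JDDataCalcService {
    @Override
    public boolean execute(CalcContext context) {
        Object threshold = context.config.get("threshold");
        if (!(threshold instanceof Long)) {
            return false; // missing or malformed configuration: report failure
        }
        long min = (Long) threshold;
        long sum = 0;
        for (long v : context.rows) {
            if (v >= min) {
                sum += v;
            }
        }
        context.output.put("sum", sum);
        return true;
    }
}
```

Returning false for bad configuration, and reserving exceptions for unexpected errors, is one reasonable reading of the "success or failure" flag; the application does not say which failures should use which channel.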
Fig. 5 is a schematic structural diagram of one embodiment of the distributed data processing device of the present application. The distributed data processing device includes a memory 510 and a processor 520. The memory 510 may be a magnetic disk, flash memory, or any other non-volatile storage medium, and stores the instructions of the embodiments corresponding to the distributed data processing method above. The processor 520 is coupled to the memory 510 and may be implemented as one or more integrated circuits, such as a microprocessor or microcontroller. The processor 520 executes the instructions stored in the memory, implementing distributed data processing and improving its efficiency.
In one embodiment, as shown in Fig. 6, a distributed data processing device 600 includes a memory 610 and a processor 620. The processor 620 is coupled to the memory 610 through a bus 630. The distributed data processing device 600 may also be connected to an external storage device 650 through a memory interface 640 to access external data, and may be connected to a network or another computer system (not shown) through a network interface 660. The details of data transmission and processing are not repeated here.
In this embodiment, instructions are stored in the memory and executed by the processor, which implements distributed data processing and improves its efficiency.
In another embodiment, the present application provides a computer-readable storage medium storing computer program instructions that, when executed by a processor, implement the steps of the method in the embodiments corresponding to the distributed data processing method. Those skilled in the art should understand that embodiments of the present application may be provided as a method, a device, or a computer program product. Accordingly, the application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the application may take the form of a computer program product implemented on one or more non-transitory computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
In one embodiment, the present application further provides a server configured with a device capable of performing any of the distributed data processing methods described above. The data is sharded according to the number of servers that process it and assigned to each server for processing, which reduces the share of operations other than data processing in distributed data processing and improves its efficiency. The efficiency gain is especially pronounced for moderate data volumes.
In one embodiment, after the application implementing the distributed data processing method above has been downloaded and saved to the lib directory of a J2EE server, the server is started and the interface is called, specifying in the interface parameters the main table to compute, the shard field, the filter conditions, the IP addresses of the servers participating in the computation, and so on. The data shards are determined according to the main table, shard field, and filter conditions specified in the interface, and the resulting shard information is distributed by server IP so that each server obtains its corresponding data shards. Each server listens for task distribution data and, when there is data for that server, fetches it. To improve computational efficiency, each server processes data with multiple threads, each thread independently handling the data of one shard. The threads invoke the task computation model to complete the calculation.
Once the application has been downloaded and saved, such a server can perform the distributed data processing method described above without configuring or maintaining a specific dependency environment, which reduces the user's workload and improves the user experience.
Fig. 7 is a schematic diagram of one embodiment of the distributed data processing system of the present application. The distributed processing system includes multiple servers, such as servers 701 to 705. Each server may be the server described above, configured with a device capable of performing any of the distributed data processing methods described above. Each server is connected to the database 710, through which the servers exchange data.
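The two-level sharding in this flow (data split across the participating servers, then each server's portion split across its threads), together with the hash allocation followed by a balancing step described in claim 3, can be sketched in Java as below. The balancing rule used here, repeatedly moving one key from the largest to the smallest server until sizes differ by at most one, is an assumed stand-in: the application does not name its data balancing algorithm. TwoLevelSharding is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of two-level sharding: keys -> servers (hash, then rebalance) -> per-thread shards.
final class TwoLevelSharding {

    // Level 1a: initial allocation of keys to servers by hash (serverCount >= 1 assumed).
    static List<List<String>> hashAllocate(List<String> keys, int serverCount) {
        List<List<String>> perServer = new ArrayList<>();
        for (int i = 0; i < serverCount; i++) {
            perServer.add(new ArrayList<>());
        }
        for (String key : keys) {
            // floorMod keeps the index non-negative even for negative hash codes.
            perServer.get(Math.floorMod(key.hashCode(), serverCount)).add(key);
        }
        return perServer;
    }

    // Level 1b: assumed balancing pass. Move keys from the fullest server to the
    // emptiest until all server sizes differ by at most one.
    static void rebalance(List<List<String>> perServer) {
        while (true) {
            List<String> largest = perServer.get(0), smallest = perServer.get(0);
            for (List<String> s : perServer) {
                if (s.size() > largest.size()) largest = s;
                if (s.size() < smallest.size()) smallest = s;
            }
            if (largest.size() - smallest.size() <= 1) {
                return;
            }
            smallest.add(largest.remove(largest.size() - 1));
        }
    }

    // Level 2: split one server's keys into at most threadCount contiguous shards.
    static List<List<String>> splitForThreads(List<String> serverKeys, int threadCount) {
        List<List<String>> shards = new ArrayList<>();
        int n = serverKeys.size();
        for (int t = 0; t < threadCount; t++) {
            int from = t * n / threadCount;
            int to = (t + 1) * n / threadCount;
            if (from < to) {
                shards.add(new ArrayList<>(serverKeys.subList(from, to)));
            }
        }
        return shards;
    }
}
```

With 1,000 keys and 5 servers, rebalance leaves every server with 200 keys, and splitForThreads then yields 4 shards of 50 keys each for a 4-thread server. A hash pass alone keeps assignment deterministic but can be uneven; the balancing pass trades some of that determinism for equal load.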
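Since the servers in Fig. 7 exchange data through database 710, the following Java sketch models that rendezvous with an in-memory table. Each record carries the fields listed in claims 7 and 15 (source, data table, shard field, filter condition, target server address); the NEW/CLAIMED/DONE status column and the claim-by-address rule are assumptions standing in for the unspecified "predetermined policy". ShardTable and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// In-memory stand-in for database 710 (hypothetical). The task distribution
// server inserts shard records; each server claims records addressed to it.
final class ShardTable {

    enum Status { NEW, CLAIMED, DONE }

    // Fields mirror the shard information listed in claims 7 and 15.
    static final class ShardRecord {
        final String source;
        final String table;
        final String shardField;
        final String filter;
        final String targetAddress;
        Status status = Status.NEW; // assumed policy column, guarded by ShardTable's lock

        ShardRecord(String source, String table, String shardField, String filter, String targetAddress) {
            this.source = source;
            this.table = table;
            this.shardField = shardField;
            this.filter = filter;
            this.targetAddress = targetAddress;
        }
    }

    private final List<ShardRecord> rows = new ArrayList<>();

    // Called by the task distribution server.
    synchronized void insert(ShardRecord record) {
        rows.add(record);
    }

    // Called by a data processing server when polling. Synchronization plays the role
    // of a conditional update (status NEW -> CLAIMED) in a real database.
    synchronized Optional<ShardRecord> claimNext(String myAddress) {
        for (ShardRecord r : rows) {
            if (r.status == Status.NEW && r.targetAddress.equals(myAddress)) {
                r.status = Status.CLAIMED;
                return Optional.of(r);
            }
        }
        return Optional.empty();
    }

    synchronized void markDone(ShardRecord record) {
        record.status = Status.DONE;
    }

    synchronized long count(Status status) {
        return rows.stream().filter(r -> r.status == status).count();
    }
}
```

A server polling with its own address only ever sees records addressed to it, and a record moves to CLAIMED at most once, so two pollers on the same server cannot both receive it.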
Such a distributed data processing system shards the data according to the number of servers that process it and assigns the data to each server for processing, which reduces the share of operations other than data processing and improves the efficiency of distributed data processing; the improvement is especially pronounced for moderate data volumes.
In one embodiment, some servers in the distributed data processing system can perform the steps performed by the task distribution server in the distributed data processing method above, and some servers can perform the steps performed by the data processing servers in that method.
In one embodiment, each server in the distributed data processing system can perform both the steps performed by the task distribution server and the steps performed by the data processing servers in the distributed data processing method above, so that a task can be started from any server in the server cluster, which improves server utilization and the user experience.
The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks therein, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The present application has now been described in detail. To avoid obscuring the concept of the application, some details well known in the art are not described. Based on the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.
The methods and devices of the present application may be implemented in many ways, for example by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the method steps is for illustration only; unless otherwise specifically stated, the steps of the method of the application are not limited to the order described above. In addition, in some embodiments the application may also be implemented as programs recorded on a recording medium, the programs including machine-readable instructions for implementing the method according to the application. Thus the application also covers a recording medium storing a program for executing the method according to the application.
Finally, it should be noted that the above embodiments are only intended to illustrate, not limit, the technical solutions of the present application. Although the application has been described in detail with reference to preferred embodiments, those of ordinary skill in the art should understand that the specific embodiments of the application may still be modified, or some technical features may be equivalently substituted, without departing from the spirit of the technical solutions of the application, and all such modifications and substitutions shall fall within the scope of the technical solutions claimed by the application.

Claims (19)

1. A distributed data processing method, comprising:
A task distribution server evenly shards data to be processed according to the number of data processing servers, obtaining shard information of the data to be processed;
The task distribution server sends each piece of shard information to a corresponding data processing server, so that the data processing server obtains and processes a data shard according to the shard information;
The task distribution server determines a data processing result according to the feedback results of the data processing servers.
2. The method according to claim 1, wherein the task distribution server evenly sharding the data to be processed according to the number of data processing servers to obtain the shard information of the data to be processed comprises:
The task distribution server evenly shards the data to be processed according to the number of data processing servers, obtaining information on the data to be processed by each single server;
The data to be processed by each single server is further sharded according to a predetermined number of threads of that data processing server, obtaining the shard information.
3. The method according to claim 2, wherein the task distribution server evenly sharding the data to be processed according to the number of data processing servers to obtain the information on the data to be processed by each single server comprises:
Allocating data to each data processing server by a hash algorithm, obtaining information on the initially allocated per-server data;
Processing the initially allocated per-server data with a data balancing algorithm so that the data is evenly distributed, obtaining the information on the data to be processed by each single server.
4. The method according to claim 2, wherein the task distribution server sending each piece of shard information to the corresponding data processing server comprises:
The task distribution server stores the shard information in a database according to a predetermined policy, so that each data processing server, while listening to the database, obtains the shard information from updated information that satisfies the predetermined policy.
5. The method according to claim 4, wherein:
Multiple threads of the data processing server listen to the database simultaneously;
When multiple threads obtain the shard information, the thread that first acquires the data shard corresponding to the shard information processes that data shard, and the other threads continue listening to the database;
After processing the data shard, the thread that acquired it continues listening to the database;
The above process is repeated until all data to be processed that was assigned to the data processing server has been processed.
6. The method according to claim 1, further comprising:
The task distribution server sends in advance, to each data processing server, an algorithm or an algorithm identifier for processing the data to be processed, so that each data processing server processes the data to be processed using the corresponding algorithm.
7. The method according to claim 1, wherein:
The shard information includes one or more of source information, data table information, shard field information, and filter condition information of the data to be processed, as well as address information of the destination data processing server;
The data to be processed includes one or more of data stored in a database, data from an external device, and data obtained over a network.
8. A distributed data processing device, comprising:
A data sharding unit, configured to evenly shard data to be processed according to the number of data processing servers, obtaining shard information of the data to be processed;
A shard information distribution unit, configured to send each piece of shard information to a corresponding data processing server, so that the data processing server obtains and processes a data shard according to the shard information;
A result acquisition unit, configured to determine a data processing result according to feedback information from the data processing servers.
9. The device according to claim 8, wherein the data sharding unit comprises:
A first sharding subunit, configured to evenly shard the data to be processed according to the number of data processing servers, obtaining information on the data to be processed by each single server;
A second sharding subunit, configured to further shard the data to be processed by each single server according to a predetermined number of threads of that data processing server, obtaining the shard information.
10. The device according to claim 9, wherein the first sharding subunit is configured to:
Allocate data to each data processing server by a hash algorithm, obtaining information on the initially allocated per-server data;
Process the initially allocated per-server data with a data balancing algorithm, obtaining the information on the data to be processed by each single server.
11. The device according to claim 9, wherein the shard information distribution unit is configured to:
Store the shard information in a database according to a predetermined policy, so that each data processing server, while listening to the database, obtains the shard information from updated information that satisfies the predetermined policy.
12. The device according to claim 8, further comprising:
An algorithm designation unit, configured to send, to each data processing server, an algorithm or an algorithm identifier for processing the data to be processed, so that each data processing server processes the data to be processed using the corresponding algorithm.
13. The device according to claim 8, further comprising:
A data acquisition unit, configured to obtain shard information and obtain the data shard according to the shard information;
A data processing unit, configured to process the data shard and feed back the processing result.
14. The device according to claim 13, wherein:
The data acquisition unit is configured to have multiple threads that are not currently processing a data shard listen to the database simultaneously;
The data processing unit is configured to, when multiple threads obtain the shard information, process the data shard with the thread that first acquires the data shard corresponding to the shard information;
After processing the data shard, the thread that acquired it continues listening to the database.
15. The device according to claim 8, wherein:
The shard information includes one or more of source information, data table information, shard field information, and filter condition information of the data to be processed, as well as address information of the destination data processing server;
The data to be processed includes one or more of data stored in a database, data from an external device, and data obtained over a network.
16. A distributed data processing device, comprising:
A memory; and
A processor coupled to the memory, the processor being configured to perform the method according to any one of claims 1 to 7 based on instructions stored in the memory.
17. A computer-readable storage medium having computer program instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 7.
18. A server, comprising a device for performing the distributed data processing method according to any one of claims 1 to 7.
19. A distributed data processing system, comprising a plurality of servers according to claim 18.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710783415.4A CN107508901B (en) 2017-09-04 2017-09-04 Distributed data processing method, device, server and system

Publications (2)

Publication Number Publication Date
CN107508901A true CN107508901A (en) 2017-12-22
CN107508901B CN107508901B (en) 2020-12-22

Family

ID=60695522

Country Status (1)

Country Link
CN (1) CN107508901B (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664660A (en) * 2018-05-21 2018-10-16 北京五八信息技术有限公司 Distributed implementation method, apparatus, equipment and the storage medium of time series database
CN109101394A (en) * 2018-07-09 2018-12-28 珠海格力电器股份有限公司 Data processing method and device
CN109117189A (en) * 2018-07-02 2019-01-01 杭州振牛信息科技有限公司 Data processing method, device and computer equipment
CN109240624A (en) * 2018-09-29 2019-01-18 郑州云海信息技术有限公司 A kind of data processing method and device
CN109660587A (en) * 2018-10-22 2019-04-19 平安科技(深圳)有限公司 Data push method, device, storage medium and server based on random number
CN109656694A (en) * 2018-11-02 2019-04-19 国网青海省电力公司 A kind of distributed approach and system of energy storage monitoring data
CN109670932A (en) * 2018-09-25 2019-04-23 平安科技(深圳)有限公司 Credit data calculate method, apparatus, system and computer storage medium
CN110008017A (en) * 2018-12-06 2019-07-12 阿里巴巴集团控股有限公司 A kind of distributed processing system(DPS) and method, a kind of calculating equipment and storage medium
CN110113387A (en) * 2019-04-17 2019-08-09 深圳前海微众银行股份有限公司 A kind of processing method based on distributed batch processing system, apparatus and system
CN110134326A (en) * 2018-02-09 2019-08-16 北京京东尚科信息技术有限公司 A kind of method and apparatus of fragment cutting
CN110443695A (en) * 2019-07-31 2019-11-12 中国工商银行股份有限公司 Data processing method and its device, electronic equipment and medium
CN110704183A (en) * 2019-09-18 2020-01-17 深圳前海大数金融服务有限公司 Data processing method, system and computer readable storage medium
CN110765179A (en) * 2019-10-18 2020-02-07 京东数字科技控股有限公司 Distributed account checking processing method, device, equipment and storage medium
CN111145028A (en) * 2019-12-31 2020-05-12 中国银行股份有限公司 Distributed text pre-check method and device
CN111176842A (en) * 2019-12-23 2020-05-19 中国平安财产保险股份有限公司 Data processing method and device, electronic equipment and storage medium
CN111782348A (en) * 2019-04-04 2020-10-16 北京沃东天骏信息技术有限公司 Application program processing method, device, system and computer readable storage medium
CN111951091A (en) * 2020-08-13 2020-11-17 金蝶软件(中国)有限公司 Transaction flow reconciliation method, system and related equipment
CN112231330A (en) * 2020-10-15 2021-01-15 中体彩科技发展有限公司 Control method and system for preventing lottery game from being repeated and rewarded
CN112468548A (en) * 2020-11-13 2021-03-09 苏州智加科技有限公司 Data processing method, device, system, server and readable storage medium
CN112667656A (en) * 2020-12-07 2021-04-16 南方电网数字电网研究院有限公司 Transaction data processing method and device, computer equipment and storage medium
CN113051103A (en) * 2019-12-27 2021-06-29 中国移动通信集团湖南有限公司 Data processing method and device and electronic equipment
CN115378889A (en) * 2022-08-18 2022-11-22 中国工商银行股份有限公司 Data flow control method and device

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101227460A (en) * 2007-01-19 2008-07-23 秦晨 Method for uploading and downloading distributed document and apparatus and system thereof
CN101753349A (en) * 2008-12-09 2010-06-23 中国移动通信集团公司 Upgrading method of data node, upgrade dispatching node as well as upgrading system
CN102495857A (en) * 2011-11-21 2012-06-13 北京新媒传信科技有限公司 Load balancing method for distributed database
CN102622209A (en) * 2011-11-28 2012-08-01 苏州奇可思信息科技有限公司 Parallel audio frequency processing method for multiple server nodes
US20120311395A1 (en) * 2011-06-06 2012-12-06 Cleversafe, Inc. Storing portions of data in a dispersed storage network
CN102882983A (en) * 2012-10-22 2013-01-16 南京云创存储科技有限公司 Rapid data memory method for improving concurrent visiting performance in cloud memory system
CN103092886A (en) * 2011-11-07 2013-05-08 中国移动通信集团公司 Achieving method, device and system for data query operation
CN103473334A (en) * 2013-09-18 2013-12-25 浙江中控技术股份有限公司 Data storage method, inquiry method and system
CN103577503A (en) * 2012-08-10 2014-02-12 鸿富锦精密工业(深圳)有限公司 Cloud file storage system and method
CN104102646A (en) * 2013-04-07 2014-10-15 腾讯科技(深圳)有限公司 Method, device and system for processing data
CN105373746A (en) * 2015-11-26 2016-03-02 深圳市金证科技股份有限公司 Distributed data processing method and device
CN106254470A (en) * 2016-08-08 2016-12-21 广州唯品会信息科技有限公司 Distributed job burst distribution method and device

Also Published As

Publication number Publication date
CN107508901B (en) 2020-12-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant