CN104980462B

CN104980462B - Distributed computing method, device and system

Info

Publication number: CN104980462B
Application number: CN201410136942.2A
Authority: CN
Inventors: 刘科峰; 蔡晨; 周明辉; 叶星
Original assignee: Shenzhen Tencent Computer Systems Co Ltd
Current assignee: Shenzhen Tencent Computer Systems Co Ltd
Priority date: 2014-04-04
Filing date: 2014-04-04
Publication date: 2018-03-30
Anticipated expiration: 2034-04-04
Also published as: CN104980462A

Abstract

The invention discloses a distributed computing method, device and system. The distributed computing method includes: obtaining aggregated data received by a computer node, where the aggregated data is data for an aggregation operation; storing the aggregated data in at least one key-value database; computing on the aggregated data in the at least one key-value database to obtain a calculation result; and returning the calculation result to the computer node. The invention improves the performance of a distributed computing system.

Description

Distributed computing method, device and system

Technical field

The present invention relates to the field of data processing, and in particular to a distributed computing method, device and system.

Background art

When aggregation analysis is performed on massive data, distributed computing is generally used: the data is first distributed to different computer nodes for calculation, and the calculation results of the individual computer nodes are then collected to obtain the final calculation result. Distributed computing divides a problem that requires enormous computing power into many small parts, distributes these parts to many computer nodes for parallel processing, and finally combines the partial calculation results into the final result. In distributed computing, distributing massive data to the computer nodes and collecting the calculation results of the computer nodes are both major difficulties.

In the prior art, distributed computing can be implemented with the Map/Reduce method. Map/Reduce is a programming model that expresses a large-scale distributed computation as a sequence of distributed operations on a set of key/value pairs. However, Map/Reduce is typically used in hadoop systems, and its open source community does not provide a framework that can be used for real-time data analysis. A real-time analysis framework similar to Map/Reduce would have to be developed in-house, which requires a large development effort. The inventors found that, during calculation, the computer nodes also need to exchange data with one another. This makes the calculation process complicated, increases the overhead of the computer nodes, and reduces the performance of the distributed computing system.

No effective solution has yet been proposed for the low performance of distributed computing systems in the prior art.

Summary of the invention

A primary object of the present invention is to provide a distributed computing method and device to solve the problem of low performance of distributed computing systems in the prior art.

To achieve the above object, according to one aspect of the present invention, a distributed computing method is provided. The distributed computing method according to the present invention includes: obtaining aggregated data received by a computer node, where the aggregated data is data for an aggregation operation; storing the aggregated data in at least one key-value database; computing on the aggregated data in the at least one key-value database to obtain a calculation result; and returning the calculation result to the computer node.

To achieve the above object, according to another aspect of the present invention, a distributed computing device is provided. The distributed computing device according to the present invention includes: a first acquisition unit for obtaining aggregated data received by a computer node, where the aggregated data is data for an aggregation operation; a deposit unit for storing the aggregated data in at least one key-value database; a first computing unit for computing on the aggregated data in the at least one key-value database to obtain a calculation result; and a returning unit for returning the calculation result to the computer node.

To achieve the above object, according to another aspect of the present invention, a distributed computing system is provided. The distributed computing system according to the present invention includes: a computer node for receiving aggregated data, the aggregated data being data for an aggregation operation; a router; and at least one key-value database that is connected to the computer node via the router and is configured to obtain, via the router, the aggregated data received by the computer node, store the aggregated data, compute on the aggregated data to obtain a calculation result, and return the calculation result to the computer node.

According to the embodiments of the present invention, aggregated data that requires an aggregation operation is stored in at least one key-value database, the aggregated data is computed in the at least one key-value database, and the calculation result is shared. The computer nodes do not need to exchange data with one another, which avoids the overly complicated distributed computing process that such data exchange would cause, solves the problem of low performance of the distributed computing system, and achieves the effect of improving the performance of the distributed computing system.

Brief description of the drawings

The accompanying drawings described herein provide a further understanding of the present invention and form a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not unduly limit it. In the drawings:

Fig. 1 is a flow chart of the distributed computing method according to the first embodiment of the present invention;

Fig. 2 is a flow chart of the distributed computing method according to the second embodiment of the present invention;

Fig. 3 is a schematic diagram of the distributed computing device according to the first embodiment of the present invention;

Fig. 4 is a schematic diagram of the distributed computing device according to the second embodiment of the present invention; and

Fig. 5 is a schematic diagram of the distributed computing system according to an embodiment of the present invention.

Detailed description of the embodiments

To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings of the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the invention described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variations of them are intended to cover non-exclusive inclusion; for example, a process, method, system, product or device that contains a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.

An embodiment of the present invention provides a distributed computing method. The method runs in a distributed computing system. Fig. 1 is a flow chart of the distributed computing method according to the first embodiment of the present invention. As shown in Fig. 1, the distributed computing method includes the following steps:

Step S102: obtain the aggregated data received by a computer node.

Aggregated data is data for an aggregation operation, and may be data on which aggregation analysis needs to be performed. After receiving the aggregated data, the computer node needs to calculate and process it accordingly. The computer node may include multiple computer nodes, which together perform distributed computing on the data. For example, a website under one domain name includes multiple servers. When network users visit the website, a large amount of data is produced, such as the IP addresses of the visitors, and this data is distributed to different servers for processing. Each server is equivalent to one computer node.

Step S104: store the aggregated data in at least one key-value database.

After the aggregated data is obtained, it can be stored in at least one key-value database, that is, in a single key-value database or in multiple key-value databases. The computer node may store the data that requires aggregation analysis in the key-value database and use the key-value database as shared memory, so that the data can be calculated iteratively by the key-value database. The key-value database can be used to perform calculation on data, and may be, for example, a redis database. Different computer nodes can also exchange data through the key-value database.

In the embodiment of the present invention, a computer node can also write the data to be exchanged into the at least one key-value database, where the data to be exchanged is data that different computer nodes need to exchange with one another. For example, the computer nodes include computer node A and computer node B. If computer node A needs data from computer node B while performing a calculation, computer node A can directly access the at least one key-value database, obtain the data that computer node B stored there in advance, and then perform the corresponding calculation. Because a computer node only needs to exchange data with the key-value database, once one computer node has written the data to be exchanged into the key-value database, the other computer nodes can quickly access it. The performance with which a computer node accesses the key-value database is limited only by the bandwidth between the computer node and the key-value database.
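The exchange between computer node A and computer node B described above can be sketched as follows. This is a minimal illustration, not the patented implementation: `KeyValueStore` is a hypothetical in-memory stand-in for a key-value database such as redis, and the class names, method names and key name are assumptions made for the example.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value database (hypothetical helper)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


class ComputerNode:
    """A node that only talks to the shared store, never to other nodes."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def publish(self, key, value):
        # Write data that other nodes may need into the shared store.
        self.store.put(key, value)

    def fetch(self, key):
        # Read data that another node stored earlier.
        return self.store.get(key)


store = KeyValueStore()
node_a = ComputerNode("A", store)
node_b = ComputerNode("B", store)

# Node B stores its data in advance; node A later reads it without contacting B.
node_b.publish("node_b:visitor_ips", ["10.0.0.1", "10.0.0.2"])
ips_seen_by_b = node_a.fetch("node_b:visitor_ips")
visitor_count = len(ips_seen_by_b)
```

In this sketch the only coupling between the two nodes is the agreed key name, which mirrors the idea that nodes exchange data solely through the key-value database.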

In the embodiment of the present invention, aggregated data may have different operation types, that is, the aggregation operations to be performed may differ. For example, the aggregation operation that counts the user IPs that visited the website in the last 10 minutes is of a different type from the aggregation operation that counts how many times the same user IP visited the website in the last 10 minutes. Because the operation types of aggregated data differ, the computer node needs to determine the operation type of the aggregated data before writing it into the key-value database, so that the corresponding data structure can be selected from the key-value database for storage.

Step S106: compute on the aggregated data in the at least one key-value database to obtain a calculation result.

Computing on the aggregated data may mean calculating it iteratively. After the aggregated data is stored in the key-value database, an iterator can be constructed in the key-value database, and the data structures and built-in operations of the key-value database are used to calculate the aggregated data iteratively and obtain the calculation result. The basic idea is to complete the aggregation calculation by iteration: f(n) = g(f(n-1)), where f(n) is the n-th calculation result and g describes the functional relation between the (n-1)-th calculation result and the n-th calculation result. For example, for summation with sum: sum(10) = sum(9) + 10, where sum(9) is the 9th calculation result, 10 is the increment on top of the 9th calculation result, and sum(10) is the 10th calculation result.
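The iteration f(n) = g(f(n-1)) and the sum(10) = sum(9) + 10 example can be written out as a short sketch. The function and variable names are illustrative assumptions; in the described system this step would be carried out by built-in operations of the key-value database rather than by application code.

```python
def next_sum(previous_result, increment):
    """g in f(n) = g(f(n-1)): derive the n-th result from the (n-1)-th result."""
    return previous_result + increment


running_sum = 0   # f(0), the result before any data has arrived
history = []      # history[n - 1] holds f(n)

for n in range(1, 11):
    # Only the previous result and the new increment are needed at each step.
    running_sum = next_sum(running_sum, n)
    history.append(running_sum)

sum_9 = history[8]    # 9th calculation result
sum_10 = history[9]   # 10th calculation result, equal to sum_9 + 10
```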

Specifically, an iterative formula can be constructed in the key-value database according to the operation type of the aggregated data. The iterative formula can be used to represent the intermediate result of each calculation in the key-value database. For the operation types of some common aggregated data, the corresponding iterative formula can be created in the key-value database in advance, which reduces the time and the overhead of the distributed computation. For example, to calculate which user IPs visited the website in the last 10 minutes, the following iterative formula can be created in advance: distinct(n) = distinct{distinct(n-1), n}.
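A minimal sketch of the distinct(n) = distinct{distinct(n-1), n} formula follows, using a Python `frozenset` as a stand-in for a set structure in the key-value database. The sample IP addresses and the name `distinct_step` are assumptions made for the example.

```python
def distinct_step(previous_distinct, new_ip):
    """distinct(n): merge the n-th item into the (n-1)-th distinct result."""
    return previous_distinct | {new_ip}


visits = ["1.1.1.1", "2.2.2.2", "1.1.1.1", "3.3.3.3"]

distinct_ips = frozenset()   # distinct(0): nothing seen yet
for ip in visits:
    # A repeated IP leaves the intermediate result unchanged.
    distinct_ips = distinct_step(distinct_ips, ip)

distinct_count = len(distinct_ips)
```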

Step S108: return the calculation result to the computer node.

After the calculation result is obtained, the key-value database can return it to the computer node, and the computer node then performs the corresponding calculation based on the calculation result. For example, when calculating which user IPs visited the website within 10 minutes, the computer node transmits the data related to the user IPs that visit the website to the at least one key-value database, and statistics are computed on the data in the key-value database: when a visiting user IP differs from the user IPs that have already visited the website within the 10 minutes, the statistical result is incremented by 1, and so on. The key-value database calculates the user IPs that visited the website every 10 minutes and returns the calculation result to the computer node. From the statistical results returned by the key-value database, the computer node can then calculate which user IPs visited the website within one hour. After the key-value database has computed the calculation result from the aggregated data, it can also store the calculation result, so that other computer nodes can obtain it or the same computer node can obtain it again.
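The per-window counting rule described above (increment the statistic only when a visiting IP has not yet been seen in the current 10-minute window) can be sketched as follows. The class `WindowedDistinctCounter`, its method names and the use of integer timestamps in seconds are assumptions for illustration; plain dictionaries stand in for the key-value database.

```python
WINDOW_SECONDS = 600  # one 10-minute window


class WindowedDistinctCounter:
    """Counts distinct visitor IPs per 10-minute window (illustrative only)."""

    def __init__(self):
        self._seen = {}    # window index -> set of IPs seen in that window
        self._count = {}   # window index -> number of distinct IPs

    def record(self, timestamp, ip):
        window = timestamp // WINDOW_SECONDS
        seen = self._seen.setdefault(window, set())
        if ip not in seen:
            # New IP for this window: the statistical result is incremented by 1.
            seen.add(ip)
            self._count[window] = self._count.get(window, 0) + 1
        return self._count[window]

    def result(self, window):
        # The stored result that would be returned to a computer node.
        return self._count.get(window, 0), frozenset(self._seen.get(window, ()))


counter = WindowedDistinctCounter()
counter.record(0, "a")
counter.record(30, "b")
after_repeat = counter.record(100, "a")       # repeated IP, count unchanged
first_in_next_window = counter.record(700, "a")
window_0_count, window_0_ips = counter.result(0)
```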

According to the embodiment of the present invention, aggregated data that requires an aggregation operation is stored in at least one key-value database, the aggregated data is computed in the at least one key-value database, and the calculation result is shared. The computer nodes do not need to exchange data with one another, which avoids the overly complicated distributed computing process that such data exchange would cause, solves the problem of low performance of the distributed computing system, and achieves the effect of improving the performance of the distributed computing system.

Fig. 2 is a flow chart of the distributed computing method according to the second embodiment of the present invention. The distributed computing method of this embodiment may be a preferred implementation of the distributed computing method of the above embodiment. As shown in Fig. 2, the distributed computing method includes the following steps:

Step S202: obtain the aggregated data received by a computer node.

Step S204: determine the operation type of the aggregation operation of the aggregated data.

Aggregated data may have different operation types, that is, the aggregation operations to be performed may differ. For example, the aggregation operation that counts the user IPs that visited the website in the last 10 minutes is of a different type from the aggregation operation that counts how many times each user IP visited the website in the last 10 minutes. When the number of user IPs that visited the website in the last 10 minutes is calculated, only the visiting user IPs need to be counted; every visit is accumulated regardless of whether the user IP is the same, so the operation type is summation of the data. When the number of times each user IP visited the website in the last 10 minutes is calculated, not only the user IPs but also the number of visits by each IP must be counted, so the operation type is classification of the data followed by counting. Because the operation types of aggregated data differ, after the aggregated data is obtained and before the computer node writes it into the at least one key-value database, the operation type of the aggregated data must first be determined, so that the corresponding data structure can be selected from the key-value database for storage.
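The difference between the two operation types can be seen in a small sketch: one accumulates every visit, the other classifies visits by IP before counting. The function names and sample data are assumptions for illustration.

```python
from collections import Counter

visits = ["10.0.0.1", "10.0.0.2", "10.0.0.1"]


def count_all_visits(visit_ips):
    """Summation type: every visit is accumulated, repeated IPs included."""
    return sum(1 for _ in visit_ips)


def count_visits_per_ip(visit_ips):
    """Classify-then-count type: group visits by IP, then count each group."""
    return Counter(visit_ips)


total_visits = count_all_visits(visits)
visits_per_ip = count_visits_per_ip(visits)
```

The two results have different shapes (a single number versus a mapping from IP to count), which is why the storage structure has to be chosen per operation type.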

Step S206: select, from the at least one key-value database and according to the operation type, the data structure corresponding to the operation type.

After the operation type of the aggregation operation of the aggregated data is determined, the corresponding data structure can be selected according to that operation type, and the aggregated data is stored in the selected data structure. Preferably, the key-value database of the embodiment of the present invention is a redis database. A redis database supports many value types for storage, including string, list (linked list), set, zset (sorted set) and hash. These data types support push/pop, add/remove, intersection, difference and richer operations, and all of these operations are atomic. For example, to calculate which user IPs visited the website in the last 10 minutes, the set structure of the redis database can be used to store the data; to calculate how many times each user IP visited the website in the last 10 minutes, the zset structure is used to store the data; and to calculate how many IPs visited the website in the last 10 minutes, the hash structure is used to store the data.
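The selection of a data structure by operation type can be sketched as a lookup table. The mapping follows the three examples in the text (set, zset, hash); the enum, the function names and the use of Python sets and dictionaries as stand-ins for the redis structures are assumptions for illustration, and no real redis client is called.

```python
from enum import Enum


class OperationType(Enum):
    WHICH_IPS_VISITED = "which_ips_visited"
    VISITS_PER_IP = "visits_per_ip"
    HOW_MANY_IPS_VISITED = "how_many_ips_visited"


# Structure names as given in the text for each example operation.
STRUCTURE_FOR_OPERATION = {
    OperationType.WHICH_IPS_VISITED: "set",
    OperationType.VISITS_PER_IP: "zset",
    OperationType.HOW_MANY_IPS_VISITED: "hash",
}


def select_structure(operation_type):
    return STRUCTURE_FOR_OPERATION[operation_type]


def write(store, operation_type, ip):
    """Write one visit using a Python stand-in for the selected structure."""
    structure = select_structure(operation_type)
    key = operation_type.value
    if structure == "set":
        store.setdefault(key, set()).add(ip)        # distinct members only
    elif structure == "zset":
        scores = store.setdefault(key, {})          # member -> score
        scores[ip] = scores.get(ip, 0) + 1
    else:  # "hash"
        store.setdefault(key, {})[ip] = 1           # field per IP


store = {}
for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    for operation_type in OperationType:
        write(store, operation_type, ip)
```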

Step S208: the computer node writes the aggregated data into the at least one key-value database in the selected data structure.

After the data structure for the aggregated data is selected, the computer node writes the aggregated data into the at least one key-value database in the selected data structure, so that the data structures and built-in operations of the key-value database can be used to compute the calculation result from the aggregated data. Because one key-value database can contain multiple data structures, aggregated data of different operation types can be stored in one key-value database or in different key-value databases. However, for the same calculation instance, the calculation results need to be stored in the same storage space. For example, to calculate which user IPs visited the website in the last 10 minutes as described above, the data related to the user IPs must be stored in the same key-value database instance so that statistics can be computed on the data.

Step S210: compute on the aggregated data in the at least one key-value database to obtain a calculation result.

After the aggregated data is stored in the at least one key-value database, an iterator can be constructed in the key-value database, and the data structures and built-in operations of the key-value database are used to calculate the aggregated data iteratively and obtain the calculation result. The basic idea is to complete the aggregation calculation by iteration: f(n) = g(f(n-1)), where f(n) is the n-th calculation result and g describes the functional relation between the (n-1)-th calculation result and the n-th calculation result. For example, for summation with sum: sum(10) = sum(9) + 10, where sum(9) is the 9th calculation result, 10 is the increment on top of the 9th calculation result, and sum(10) is the 10th calculation result.

Step S212: return the calculation result to the computer node.

After the calculation result is obtained, the key-value database returns it to the computer node, and the computer node then performs the corresponding calculation based on the calculation result. For example, when calculating which user IPs visited the website within 10 minutes, the computer node transmits the data related to the user IPs that visit the website to the key-value database, and statistics are computed on the data in the key-value database: when a visiting user IP differs from the user IPs that have already visited the website within the 10 minutes, the statistical result is incremented by 1, and so on. The key-value database calculates the user IPs that visited the website every 10 minutes and returns the calculation result to the computer node. From the statistical results returned by the key-value database, the computer node can then calculate which user IPs visited the website within one hour. After the key-value database has computed the calculation result from the aggregated data, it can also store the calculation result, so that other computer nodes can obtain it or the same computer node can obtain it again.

According to the embodiment of the present invention, different data structures are selected from the redis database, and aggregated data of different operation types is stored in the corresponding data structures, so that the calculation results can be collected through the built-in operations of redis, which improves the performance of the distributed computing system.

Preferably, storing the aggregated data in the at least one key-value database includes: the computer node converting the aggregated data into key-value pairs; determining, by a hash algorithm, the storage space of the key-value database that has the same operation type as the aggregated data; and writing the converted key-value pairs into the storage space.

After receiving aggregated data that requires an aggregation operation, the computer node can send the aggregated data to the key-value database in the form of key-value pairs. While the aggregated data is being sent in this form, the instance related to the aggregated data, that is, the storage space of the key-value database that has the same operation type as the aggregated data, can be determined by a hash algorithm. This ensures that data of the same type is stored in the same storage space of the key-value database. The hash algorithm distributes related key-value pairs to one instance for iterative calculation.

For example, "calculate how many user IPs visited the website in the last 10 minutes" is designated operation type 1. After receiving data, the computer node can determine whether the data corresponds to operation type 1. If it does, the data is converted into a key-value pair of the form used for operation type 1, and the key-value pair is written into the instance corresponding to operation type 1. The key-value database then performs an iterative calculation based on the previous calculation result, incrementing the calculation result by 1. This avoids the problem that the calculation result cannot be calculated iteratively because the data is stored in different locations, and achieves accurate iterative calculation of the aggregated data.
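Routing key-value pairs of the same operation type to the same instance can be sketched as follows. The text does not name a specific hash algorithm, so `zlib.crc32` is used here purely as an assumed example; the number of instances, the key format and the function names are also hypothetical, and dictionaries stand in for key-value database instances.

```python
import zlib

NUM_INSTANCES = 4  # hypothetical number of key-value database instances
instances = [{} for _ in range(NUM_INSTANCES)]


def to_key_value(operation_type, user_ip):
    """Convert one piece of aggregated data into a key-value pair."""
    return f"operation:{operation_type}", user_ip


def instance_for(key):
    # Deterministic hash: the same key always maps to the same instance.
    return zlib.crc32(key.encode("utf-8")) % NUM_INSTANCES


def write_pair(key, value):
    index = instance_for(key)
    instances[index].setdefault(key, set()).add(value)
    return index


key_1, value_1 = to_key_value(1, "10.0.0.1")
key_2, value_2 = to_key_value(1, "10.0.0.2")
index_1 = write_pair(key_1, value_1)
index_2 = write_pair(key_2, value_2)

# Both pairs of operation type 1 landed in one instance, so the iterative
# calculation can build on the previous result stored there.
distinct_count = len(instances[index_1][key_1])
```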

Preferably, computing on the aggregated data in the at least one key-value database to obtain the calculation result includes:

Step 1: create the iterative formula corresponding to the aggregated data according to the operation type.

Aggregated data of different operation types is stored in the key-value database in different data structures, and because the data structures differ, the representation of the intermediate results computed from them also differs. For example, when the key-value database is a redis database, consider the following three cases. Case one: to calculate which user IPs visited the website in the last 10 minutes, the set structure of the redis database can be used to store the data. Case two: to calculate how many times each user IP visited the website in the last 10 minutes, the zset structure is used to store the data. Case three: to calculate how many IPs visited the website in the last 10 minutes, the hash structure is used to store the data. For case one, the iterative formula distinct(n) = distinct{distinct(n-1), n} can be constructed to represent the intermediate result computed in the redis database. For cases two and three, corresponding key-value pairs can be constructed from the characteristics of the respective data structures to represent the computed intermediate results. For some commonly used data structures, the iterative formula corresponding to the aggregated data can be created in the redis database in advance, which improves the efficiency of the distributed computation.

Step 2: calculate the aggregated data iteratively in the at least one key-value database using the created iterative formula.

After the iterative formula corresponding to the aggregated data is created, the key-value database calculates the aggregated data iteratively based on the created iterative formula. For example, after the key-value database receives aggregated data from a computer node, it computes an intermediate result from the received aggregated data and outputs the intermediate result in the form given by the iterative formula. When the key-value database receives related aggregated data again, it retrieves the above intermediate result based on its form of representation, performs an iterative calculation on the newly received data on the basis of that intermediate result, and so on, until the final calculation result is obtained.
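The two steps above can be sketched as a small aggregator that keeps one intermediate result per key and applies an iteration function to each newly received piece of data. `IterativeAggregator` and its methods are hypothetical names, and a dictionary stands in for the key-value database.

```python
class IterativeAggregator:
    """Keeps one intermediate result per key and updates it iteratively."""

    def __init__(self):
        self._intermediate = {}   # key -> latest intermediate result

    def update(self, key, new_data, step, initial):
        # Locate the previous intermediate result (or start from `initial`).
        previous = self._intermediate.get(key, initial)
        # Apply the iterative formula to the previous result and the new data.
        current = step(previous, new_data)
        self._intermediate[key] = current
        return current

    def result(self, key):
        return self._intermediate[key]


aggregator = IterativeAggregator()
for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    # distinct(n) = distinct{distinct(n-1), n}
    aggregator.update("distinct_ips", ip,
                      lambda prev, new_ip: prev | {new_ip}, frozenset())
    # sum(n) = sum(n-1) + increment
    aggregator.update("visit_total", 1, lambda prev, inc: prev + inc, 0)

distinct_ips = aggregator.result("distinct_ips")
visit_total = aggregator.result("visit_total")
```

Because each update reads only the stored intermediate result and the new data, earlier raw data does not need to be reprocessed.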

According to the embodiment of the present invention, different iterative formulas are used to represent the intermediate results computed by the key-value database. During iterative calculation, the result of the previous calculation (that is, the intermediate result) can therefore be located quickly, which improves the accuracy of the iterative calculation.

Preferably, the computer node includes a first computer node and a second computer node, and obtaining the aggregated data received by the computer node includes: the at least one key-value database obtaining the aggregated data received by the first computer node. After the aggregated data is computed in the at least one key-value database to obtain the calculation result, the distributed computing method further includes: the second computer node obtaining the calculation result from the key-value database; and the second computer node performing data calculation based on the obtained calculation result.

The second computer node may be a computer node different from the first computer node. Specifically, the first computer node writes the data for the aggregation operation into the key-value database, and the key-value database uses its corresponding data structure and built-in operations to compute the calculation result, returns the calculation result to the first computer node, and at the same time stores the calculation result of the specific operation. The second computer node can obtain the calculation result from the key-value database and perform the corresponding data calculation based on it. For example, the first computer node writes aggregated data a into the key-value database, and the key-value database calculates from the written aggregated data a which user IPs visited the website in the last 10 minutes, obtaining the calculation result IP-1, IP-2 and IP-3. If the second computer node needs to calculate which user IPs visited the website in the last hour, it can obtain this calculation result from the key-value database and then perform its calculation based on it.
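The reuse of a stored calculation result by a second computer node can be sketched as follows. The IP-1, IP-2 and IP-3 result comes from the example in the text; the second window, the key names and the function names are assumptions added for illustration, and a dictionary stands in for the key-value database.

```python
shared_results = {}   # stand-in for results stored in the key-value database


def first_node_publish(window_key, visitor_ips):
    """First node: the 10-minute result is computed once and kept in the store."""
    result = set(visitor_ips)
    shared_results[window_key] = result
    return result


def second_node_hourly(window_keys):
    """Second node: builds the hourly result from stored 10-minute results."""
    hourly = set()
    for key in window_keys:
        hourly |= shared_results.get(key, set())
    return hourly


first_node_publish("visitors:window-1", ["IP-1", "IP-2", "IP-3", "IP-1"])
first_node_publish("visitors:window-2", ["IP-3", "IP-4"])   # hypothetical second window

hourly_visitors = second_node_hourly(["visitors:window-1", "visitors:window-2"])
```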

By using the key-value database as shared memory, the computer nodes can quickly access the key-value database, which reduces the data exchange between computer nodes and reduces the overhead of each computer node.

Preferably, the key-value database of the embodiment of the present invention is a redis database, which is a key-value in-memory storage system. A single redis database may be used, or multiple redis databases may be used, that is, a redis cluster. Using a redis cluster as the shared memory of the computer nodes increases the capacity of the shared memory.

It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should know, however, that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in other orders or simultaneously. Those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.

From the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus the necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device or the like) to perform the methods described in the embodiments of the present invention.

An embodiment of the present invention provides a distributed computing device, whose functions can be implemented by a distributed computing system. It should be noted that the distributed computing device of the embodiment of the present invention can be used to perform the distributed computing method provided by the embodiments of the present invention, and the distributed computing method of the embodiments of the present invention can also be performed by the distributed computing device provided by the embodiments of the present invention.

Fig. 3 is a schematic diagram of the distributed computing device according to the first embodiment of the present invention. As shown in Fig. 3, the distributed computing device includes a first acquisition unit 10, a deposit unit 20, a first computing unit 30 and a returning unit 40.

The first acquisition unit 10 is used to obtain the aggregated data received by a computer node.

Aggregated data is data for an aggregation operation, and may be data on which aggregation analysis needs to be performed. After receiving the aggregated data, the computer node needs to calculate and process it accordingly. The computer node may include multiple computer nodes, which together perform distributed computing on the data. For example, a website under one domain name includes multiple servers. When network users visit the website, a large amount of data is produced, such as the IP addresses of the visitors, and this data is distributed to different servers for processing. Each server is equivalent to one computer node. The first acquisition unit 10 obtains the aggregated data received by the computer node so that the data can be calculated accordingly in the key-value database.

The deposit unit 20 is used to store the aggregated data in at least one key-value database.

After the aggregated data is obtained, the deposit unit 20 can store it in at least one key-value database; that is, the aggregated data can be stored in one key-value database or in multiple key-value databases. The computer node may store the data requiring aggregation analysis in the key-value database and use the key-value database as shared memory, so that the data can be iteratively calculated through the key-value database. The key-value database can be used to perform calculation on the data, and may be a database such as a redis database. Different computer nodes can also exchange data through the key-value database.

The first computing unit 30 is used to calculate the aggregated data in the at least one key-value database to obtain a calculation result.

The returning unit 40 is used to return the calculation result to the computer node.

According to the embodiment of the present invention, the aggregated data requiring an aggregation operation is stored in at least one key-value database, iterative calculation is performed on the aggregated data in the at least one key-value database, and the calculation result is shared. The computer nodes do not need to exchange data with one another, which avoids the overly complex distributed computing process that such exchange would cause, solves the problem of low performance in distributed computing systems, and thereby improves the performance of the distributed computing system.

Fig. 4 is a schematic diagram of a distributed computing device according to a first embodiment of the present invention. The distributed computing device of this embodiment may serve as a preferred implementation of the distributed computing device of the above embodiment. As shown in Fig. 4, the distributed computing device includes a first acquisition unit 10, a deposit unit 20, a first computing unit 30, and a returning unit 40. The distributed computing device further includes a determining unit 50 and a selecting unit 60, and the deposit unit 20 includes a first writing module 201. The first acquisition unit 10, the first computing unit 30, and the returning unit 40 have the same functions as the first acquisition unit 10, the first computing unit 30, and the returning unit 40 shown in Fig. 3, respectively, and are not described again here.

The determining unit 50 is used to determine the operation type of the aggregation operation of the aggregated data before the aggregated data is stored in the at least one key-value database.

Aggregated data may have different operation types; that is, the aggregation operations to be performed differ. For example, the aggregation operation type for calculating the number of user IPs that accessed a website in the last 10 minutes differs from the aggregation operation type for calculating the number of times each user IP accessed the website in the last 10 minutes. When calculating the number of user IPs that accessed the website in the last 10 minutes, only the user IPs accessing the website need to be counted; every access is accumulated regardless of whether the user IP repeats, so the operation type is a cumulative sum over the data. When calculating the number of times each user IP accessed the website in the last 10 minutes, not only the user IPs but also the number of accesses by each IP must be counted, so the operation type is to classify the data first and then count. Because the operation types of aggregated data differ, after the aggregated data is obtained and before the computer node writes it into the at least one key-value database, the determining unit 50 first determines the operation type of the aggregated data, so that a corresponding data structure can be selected from the key-value database for storage.
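The difference between the two operation types described above can be sketched in a few lines of Python. This is only an illustration: the access log, the function names `accumulate` and `classify_then_count`, and the sample IP addresses are assumptions and do not appear in the embodiments.

```python
from collections import Counter

# Hypothetical access log: one entry per request, identified by visitor IP.
access_log = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.1"]


def accumulate(log):
    """Cumulative sum: every access is counted, even when the IP repeats."""
    total = 0
    for _ip in log:
        total += 1
    return total


def classify_then_count(log):
    """Classify first, then count: group accesses by IP and count each group."""
    return Counter(log)


total_accesses = accumulate(access_log)
accesses_per_ip = classify_then_count(access_log)
```

The first function needs only a single running value, while the second needs one value per IP, which is why the two operation types call for different storage structures.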

The selecting unit 60 is used to select, according to the operation type, the data structure corresponding to the operation type from the at least one key-value database.

The first writing module 201 is used to cause the computer node to write the aggregated data into the at least one key-value database using the selected data structure.

According to the embodiment of the present invention, different data structures are selected from the redis database, and aggregated data of different operation types is stored in the corresponding data structures, so that the summarization of calculation results can be completed through the built-in operations of redis.

Preferably, the deposit unit in the embodiment of the present invention includes a conversion module, a determining module, and a second writing module.

The conversion module is used to cause the computer node to convert the aggregated data into key-value pairs. The determining module is used to determine, by a hash algorithm, the storage space of the key-value database that matches the operation type of the aggregated data. The second writing module is used to cause the computer node to write the converted key-value pairs into the storage space.
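The three modules above can be sketched as follows. The embodiments do not name a particular hash algorithm or key layout, so everything here is an assumption made for illustration: CRC32 as the hash, a key of the form `operation_type:site`, four storage spaces, and the names `to_key_value`, `pick_slot`, and `write_record`.

```python
import zlib

SLOT_COUNT = 4  # hypothetical number of storage spaces

# Each storage space is modeled as a plain list of key-value pairs.
slots = [[] for _ in range(SLOT_COUNT)]


def to_key_value(operation_type, record):
    """Conversion step: turn a raw record into a key-value pair."""
    key = f"{operation_type}:{record['site']}"
    return key, record["ip"]


def pick_slot(key, slot_count=SLOT_COUNT):
    """Determination step: hash the key to choose a storage space."""
    return zlib.crc32(key.encode("utf-8")) % slot_count


def write_record(operation_type, record):
    """Write step: store the converted pair in the chosen storage space."""
    key, value = to_key_value(operation_type, record)
    index = pick_slot(key)
    slots[index].append((key, value))
    return index


first = write_record("count_per_ip", {"site": "example.com", "ip": "10.0.0.1"})
second = write_record("count_per_ip", {"site": "example.com", "ip": "10.0.0.2"})
```

Because the key includes the operation type, records of the same operation type for the same site hash to the same storage space, so they can be aggregated in one place.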

Preferably, the first computing unit of the embodiment of the present invention includes a creation module and a computing module.

The creation module is used to create the iterative formula corresponding to the aggregated data according to the operation type.

The computing module is used to perform iterative calculation on the aggregated data in the at least one key-value database using the created iterative formula.

Preferably, the computer node includes a first computer node and a second computer node. The first acquisition unit includes a first acquisition module, used to cause the at least one key-value database to obtain the aggregated data received by the first computer node. The distributed computing device further includes: a second acquisition unit, used to cause the second computer node to obtain the calculation result from the key-value database after the aggregated data has been calculated in the at least one key-value database and the calculation result obtained; and a second computing unit, used to cause the second computer node to perform data calculation based on the obtained calculation result.
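The interaction between the two computer nodes can be sketched with an in-memory stand-in for the key-value database. The class `KeyValueStore`, the key name `visits:total`, and the per-minute rate computed by the second node are all hypothetical and serve only to show that the nodes share results through the store instead of exchanging data directly.

```python
class KeyValueStore:
    """In-memory stand-in for a key-value database shared by all nodes."""

    def __init__(self):
        self._data = {}

    def incr_by(self, key, amount):
        # Aggregate inside the store and return the updated result.
        self._data[key] = self._data.get(key, 0) + amount
        return self._data[key]

    def get(self, key):
        return self._data.get(key)


def first_node_ingest(store, visit_counts):
    """First node: pushes the aggregated data it receives into the store."""
    for count in visit_counts:
        store.incr_by("visits:total", count)


def second_node_compute(store, window_minutes):
    """Second node: reads the shared result and continues the calculation."""
    total = store.get("visits:total") or 0
    return total / window_minutes


store = KeyValueStore()
first_node_ingest(store, [120, 80, 100])
rate = second_node_compute(store, 10)
```

The second node never contacts the first node; its only input is the calculation result held in the store.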

The embodiment of the present invention further provides a distributed computing system, which can be used to perform the distributed computing method of the above embodiments and can also implement the distributed computing device of the above embodiments.

Fig. 5 is a schematic diagram of a distributed computing system according to an embodiment of the present invention. As shown in Fig. 5, the distributed computing system includes a computer node, a router, and at least one key-value database. The computer node is used to receive aggregated data, the aggregated data being data for an aggregation operation. The at least one key-value database is connected to the computer node via the router and is used to: obtain, via the router, the aggregated data received by the computer node; store the aggregated data; calculate the aggregated data to obtain a calculation result; and return the calculation result to the computer node.

It should be noted that the key-value database of the embodiment of the present invention can realize its functions through the distributed computing device of the embodiment of the present invention, and the key-value database of the embodiment of the present invention can likewise be used to implement the distributed computing device of the embodiment of the present invention.

The distributed computing system of the embodiment of the present invention uses the in-memory data aggregation characteristics of redis to connect the computer nodes (Node) together, forming a supercomputer. A computer node can write the data to be exchanged into redis, and other computer nodes can quickly access these resources; the access performance is limited only by the bandwidth between the redis database and the computer nodes. Redis is a key-value database. The computer node assembles the data requiring aggregation analysis into key-value form and sends it to redis; redis returns the calculation result to the computer node through its built-in calculations, and at the same time stores the calculation result of the specific operation. The basic idea is to complete the aggregation calculation by iteration: f(n) = g(f(n-1)), where f(n) is the n-th calculation result and g(f(n-1)) is the functional relationship between the (n-1)-th calculation result and the n-th calculation result. For example, for summation: sum(10) = sum(9) + 10, where sum(9) is the 9th calculation result, 10 is the increment applied to the 9th calculation result, and sum(10) is the 10th calculation result.
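The iteration f(n) = g(f(n-1)) can be sketched in Python using the summation example. One assumption is made explicit here: the increment for each step is passed to g as a second argument, whereas the formula in the text leaves it implicit. The names `iterate` and `sum_step` are illustrative.

```python
def iterate(g, initial, increments):
    """Apply f(n) = g(f(n-1), increment_n) and keep every intermediate result."""
    results = []
    current = initial
    for increment in increments:
        current = g(current, increment)
        results.append(current)
    return results


def sum_step(previous, increment):
    """Summation: the new result is the previous result plus the increment."""
    return previous + increment


# Increments 1..10, so partial_sums[n - 1] holds sum(n).
partial_sums = iterate(sum_step, 0, range(1, 11))
sum_9 = partial_sums[8]
sum_10 = partial_sums[9]
```

Each step needs only the previous result and the new increment, so the store never has to revisit earlier raw data to produce sum(10) from sum(9).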

Specifically, before writing the aggregated data to redis, the computer node first selects a redis storage data structure according to the aggregation operation type. For example: in situation one, to calculate which user IPs accessed the website in the last 10 minutes, the data can be stored in the set structure of the redis database; in situation two, to calculate the number of times each user IP accessed the website in the last 10 minutes, the data needs to be stored in the zset structure; in situation three, to calculate how many IPs accessed the website in the last 10 minutes, the data is stored in the Hash structure.
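The selection among the three situations can be sketched with a small in-memory stand-in whose method names mirror the Redis commands SADD, ZINCRBY, HSET, and HLEN. The class `FakeRedis`, the operation-type labels, the key layout, and the use of one hash field per IP in situation three are assumptions for illustration; the text states only which structure is chosen in each situation.

```python
class FakeRedis:
    """Tiny in-memory stand-in; it does not talk to a real redis server."""

    def __init__(self):
        self.sets, self.zsets, self.hashes = {}, {}, {}

    def sadd(self, key, member):
        self.sets.setdefault(key, set()).add(member)

    def zincrby(self, key, amount, member):
        scores = self.zsets.setdefault(key, {})
        scores[member] = scores.get(member, 0) + amount

    def hset(self, key, field, value):
        self.hashes.setdefault(key, {})[field] = value

    def hlen(self, key):
        return len(self.hashes.get(key, {}))


# Hypothetical labels for the three situations and their structures.
STRUCTURE_FOR_OPERATION = {
    "which_ips": "set",      # situation one
    "count_per_ip": "zset",  # situation two
    "how_many_ips": "hash",  # situation three
}


def write(db, operation_type, ip):
    """Select the structure for the operation type, then write the record."""
    structure = STRUCTURE_FOR_OPERATION[operation_type]
    key = f"{operation_type}:last10min"
    if structure == "set":
        db.sadd(key, ip)
    elif structure == "zset":
        db.zincrby(key, 1, ip)
    else:
        db.hset(key, ip, 1)


db = FakeRedis()
for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    for operation_type in STRUCTURE_FOR_OPERATION:
        write(db, operation_type, ip)
```

In each case the structure itself carries the aggregate: the set deduplicates members, the sorted set accumulates a score per member, and the hash length gives the number of distinct fields.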

During iterative calculation in the redis database, a corresponding iterative formula is constructed according to the operation type. Taking the three situations above as an example, the iterative formula distinct(n) = distinct{distinct(n-1), n} can be constructed for situation one to represent the intermediate result calculated in the redis database. For situation two and situation three, corresponding key-value pairs can be constructed using the characteristics of their data structures to represent the intermediate results of the calculation.
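The formula distinct(n) = distinct{distinct(n-1), n} can be read as merging the n-th item into the previous distinct result. A minimal Python sketch, with the function names `distinct_step` and `distinct` chosen for illustration, is:

```python
def distinct_step(previous, item):
    """One iteration: merge the new item into the previous distinct result."""
    return previous | {item}


def distinct(items):
    """Fold the step over all items, starting from an empty result."""
    result = frozenset()
    for item in items:
        result = distinct_step(result, item)
    return result


visitors = distinct(["10.0.0.1", "10.0.0.2", "10.0.0.1"])
```

A repeated item leaves the intermediate result unchanged, which matches the behavior of adding a member to a set structure.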

As shown in Fig. 5, the distributed computing system of the embodiment of the present invention further includes a router (Router), and the computer node (Node) is connected to the redis database through the router.

The distributed computing system according to the embodiment of the present invention uses a redis cluster as the shared memory of the computer nodes, constructs the aggregation operation as a system that can be completed by iteration, selects different redis data structures to store the data, and completes the summarization of calculation results through the built-in operations of redis. Because the computer nodes do not need to exchange data with one another, the distributed computing system of the embodiment of the present invention has higher computing performance than prior-art distributed computing systems and can be applied to real-time analysis and early warning on massive data.

The above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.

In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the related descriptions of the other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed device can be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or take other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. On this understanding, the technical solution of the present invention in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, can be embodied as a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a mobile terminal, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. For those skilled in the art, the present invention may have various modifications and variations. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A distributed computing method, characterized by comprising:

obtaining aggregated data received by a computer node, wherein the aggregated data is data for an aggregation operation;

storing the aggregated data in at least one key-value database;

calculating the aggregated data in the at least one key-value database to obtain a calculation result;

returning the calculation result to the computer node;

wherein calculating the aggregated data comprises performing iterative calculation on the aggregated data;

wherein, before the aggregated data is stored in the at least one key-value database, the distributed computing method further comprises: determining an operation type of the aggregation operation of the aggregated data; and selecting, according to the operation type, a data structure corresponding to the operation type from the at least one key-value database; wherein storing the aggregated data in the at least one key-value database comprises: the computer node writing the aggregated data into the at least one key-value database using the selected data structure;

wherein performing iterative calculation on the aggregated data in the at least one key-value database to obtain the calculation result comprises: creating, according to the operation type, an iterative formula corresponding to the aggregated data; and performing iterative calculation on the aggregated data in the at least one key-value database using the created iterative formula.
2. The distributed computing method according to claim 1, characterized in that the computer node comprises a first computer node and a second computer node, wherein

obtaining the aggregated data received by the computer node comprises: the at least one key-value database obtaining the aggregated data received by the first computer node, and

after the aggregated data is calculated in the at least one key-value database and the calculation result is obtained, the distributed computing method further comprises: the second computer node obtaining the calculation result from the at least one key-value database; and the second computer node performing data calculation based on the obtained calculation result.
3. The distributed computing method according to claim 1, characterized in that storing the aggregated data in the at least one key-value database comprises:

the computer node converting the aggregated data into key-value pairs;

determining, by a hash algorithm, a storage space of the key-value database that matches the operation type of the aggregated data; and

writing the converted key-value pairs into the storage space.
4. The distributed computing method according to any one of claims 1 to 3, characterized in that the at least one key-value database comprises a redis database.
5. A distributed computing device, characterized by comprising:

a first acquisition unit, used to obtain aggregated data received by a computer node, wherein the aggregated data is data for an aggregation operation;

a deposit unit, used to store the aggregated data in at least one key-value database;

a first computing unit, used to calculate the aggregated data in the at least one key-value database to obtain a calculation result;

a returning unit, used to return the calculation result to the computer node;

wherein calculating the aggregated data comprises performing iterative calculation on the aggregated data;

wherein the distributed computing device further comprises: a determining unit, used to determine an operation type of the aggregation operation of the aggregated data before the aggregated data is stored in the at least one key-value database; and a selecting unit, used to select, according to the operation type, a data structure corresponding to the operation type from the at least one key-value database; wherein the deposit unit comprises: a first writing module, used to cause the computer node to write the aggregated data into the at least one key-value database using the selected data structure;

wherein the first computing unit comprises: a creation module, used to create, according to the operation type, an iterative formula corresponding to the aggregated data; and a computing module, used to perform iterative calculation on the aggregated data in the at least one key-value database using the created iterative formula.
6. The distributed computing device according to claim 5, characterized in that the computer node comprises a first computer node and a second computer node, wherein

the first acquisition unit comprises: a first acquisition module, used to cause the at least one key-value database to obtain the aggregated data received by the first computer node, and

the distributed computing device further comprises: a second acquisition unit, used to cause the second computer node to obtain the calculation result from the at least one key-value database after iterative calculation is performed on the aggregated data in the at least one key-value database and the calculation result is obtained; and a second computing unit, used to cause the second computer node to perform data calculation based on the obtained calculation result.
7. The distributed computing device according to claim 5, characterized in that the deposit unit comprises:

a conversion module, used to cause the computer node to convert the aggregated data into key-value pairs;

a determining module, used to determine, by a hash algorithm, a storage space of the key-value database that matches the operation type of the aggregated data; and

a second writing module, used to cause the computer node to write the converted key-value pairs into the storage space.
8. The distributed computing device according to any one of claims 5 to 7, characterized in that the at least one key-value database is at least one redis database.
9. A distributed computing system, characterized by comprising:

a computer node, used to receive aggregated data, the aggregated data being data for an aggregation operation;

a router;

at least one key-value database, connected to the computer node via the router and used to: obtain, via the router, the aggregated data received by the computer node; store the aggregated data; calculate the aggregated data to obtain a calculation result; and return the calculation result to the computer node;

wherein calculating the aggregated data comprises performing iterative calculation on the aggregated data;

wherein, before the aggregated data is stored in the at least one key-value database, the following is further performed: determining an operation type of the aggregation operation of the aggregated data; and selecting, according to the operation type, a data structure corresponding to the operation type from the at least one key-value database; wherein storing the aggregated data in the at least one key-value database comprises: the computer node writing the aggregated data into the at least one key-value database using the selected data structure;

wherein performing iterative calculation on the aggregated data in the at least one key-value database to obtain the calculation result comprises: creating, according to the operation type, an iterative formula corresponding to the aggregated data; and performing iterative calculation on the aggregated data in the at least one key-value database using the created iterative formula.