CN113434299B - Coding distributed computing method based on MapReduce framework
- Publication number
- CN113434299B (application CN202110756959.8A)
- Authority
- CN
- China
- Prior art keywords
- node
- distributed computing
- nodes
- intermediate values
- input file
- Prior art date
- Legal status: Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5083—Techniques for rebalancing the load in a distributed system
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
- G06F9/505—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a coded distributed computing method based on the MapReduce framework. First, N input files are divided into several parts and stored on different distributed computing nodes. Then, in the output-function allocation, a new output function set W_k is designed for each distributed computing node k, which greatly reduces the number of output functions required. Finally, each distributed computing node obtains, by random selection from other distributed computing nodes, the intermediate values of the input files it does not store; having thus gathered the intermediate values of all input files, it uses them to compute its assigned output functions, completing the distributed computing task. Through the new file-allocation and function-allocation scheme, the method reduces the number of input files and the number of output functions actually required at the cost of a small increase in communication load, so it solves practical problems better and can be widely applied in practice.
Description
Technical Field
The invention relates to the technical field of distributed computing, and in particular to a coded distributed computing method based on the MapReduce framework.
Background
Driven by the rapid development of machine learning and data science, modern computing paradigms have shifted from traditional single-processor systems to large distributed computing systems, and one popular framework for distributed computing is MapReduce. Distributed computing has shown strong advantages in processing large-scale data and has become a popular research direction in recent years.
While the MapReduce framework has become a popular framework for distributed computing, it suffers from a significant disadvantage: it requires a large amount of data exchange. For example, when running "SelfJoin" on Amazon EC2 clusters, 70% of the execution time is spent on data exchange. To alleviate this communication bottleneck, Ali et al. in 2018 proposed Coded Distributed Computing (CDC) based on the MapReduce framework and gave a general scheme that achieves the optimal communication load. Although that scheme attains the optimal communication load, the numbers of input files and output functions it requires grow exponentially with the number of nodes, which makes it difficult to apply to practical problems.
Disclosure of Invention
The invention aims to solve the problem that the existing coded distributed computing method based on the MapReduce framework requires large numbers of input files and output functions, and provides a coded distributed computing method based on the MapReduce framework.
To solve this problem, the invention is realized by the following technical scheme:
The coded distributed computing method based on the MapReduce framework comprises a Map stage, a Shuffle stage and a Reduce stage, and includes the following steps:
1) Map stage:
Step 1: divide the given input files evenly and without repetition into C(K′, r′) input file subsets, where r′ = rK′/K;
Step 2: assign each input file subset a distinct label consisting of r′ integers selected from the K′ integers 0 to K′−1;
Step 3: take each node's number modulo the node factor K′ to obtain the node's mark;
Step 4: based on the labels of the input file subsets and the marks of the nodes, store each input file subset on every node whose mark appears in the subset's label;
Step 5: each node computes, with the Map function, the intermediate value of every input file subset it stores;
2) The Shuffle stage:
Step 6: each node encodes the intermediate values of all its stored input file subsets into signals and transmits them to the other nodes;
Step 7: allocate to each node the set of output functions it must compute, where the node numbered k is assigned the output function set W_k = { ⌈(k + 1 + iK)/s⌉ : i = 0, 1, …, t−1 };
3) Reduce stage
Step 8: each node selects at random, from the intermediate values transmitted by the other nodes, the intermediate value of each input file subset it does not store; together with the intermediate values of the subsets it stores, this yields the intermediate values of all input file subsets, from which the node computes its allocated output function set, completing the distributed computation.
Here K is the total number of nodes, K′ is a node factor, and r is the number of times each input file is computed; ⌈·⌉ denotes rounding up, k ∈ {0, 1, …, K−1}, t is the number of output functions allocated to each node with t = s/gcd(K, s), where gcd(K, s) denotes the greatest common divisor of K and s, and s is the number of times each output function is computed.
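To make the parameter relationships concrete, the following Python sketch computes the quantities above for given K, K′, r and s. It is a minimal sketch under two assumptions reconstructed from the worked example below (K = 10, K′ = 5, r = 4, s = 6), not taken verbatim from the patent's lost equations: the label size is r′ = rK′/K, and W_k = { ⌈(k + 1 + iK)/s⌉ : i = 0, …, t−1 }.

```python
from math import comb, gcd, ceil

def scheme_parameters(K, K_prime, r, s):
    """Compute the parameters of the coded scheme (reconstruction).

    K        -- total number of nodes
    K_prime  -- node factor: a factor of K, with K_prime != 1 and K_prime != K
    r        -- number of times each input file is computed
    s        -- number of times each output function is computed
    """
    assert K % K_prime == 0 and K_prime not in (1, K)
    r_prime = r * K_prime // K          # size of each subset label (assumed)
    n_subsets = comb(K_prime, r_prime)  # number of input file subsets C(K', r')
    t = s // gcd(K, s)                  # output functions per node
    Q = K * t // s                      # total output functions (K*t = Q*s)
    return r_prime, n_subsets, t, Q

def output_functions(k, K, s, t):
    """Output function set W_k = { ceil((k + 1 + i*K)/s) : i = 0..t-1 }."""
    return {ceil((k + 1 + i * K) / s) for i in range(t)}

if __name__ == "__main__":
    r_prime, n_subsets, t, Q = scheme_parameters(K=10, K_prime=5, r=4, s=6)
    print(r_prime, n_subsets, t, Q)               # 2 10 3 5
    print(sorted(output_functions(2, 10, 6, 3)))  # [1, 3, 4], as in the example
```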
Compared with the prior art, the invention has the following characteristics:
1. Compared with the general MapReduce framework, the method uses a carefully designed file-allocation scheme so that each file block is computed by r different distributed computing nodes, and then exploits the redundant computation on the nodes to create coded multicast opportunities, allowing data to be transmitted to r nodes simultaneously and reducing the time required for data transmission.
2. Compared with the scheme proposed by Ali, we reduce the numbers of input files and output functions required. The reasons are: (1) in the Map-stage file allocation, we first take the node numbers modulo K′ and then allocate files to each distributed computing node according to the resulting marks, reducing the number of input files required; (2) in the Shuffle stage, we design a new output function set W_k for each distributed computing node, greatly reducing the number of output functions required; moreover, when s ≥ K′, our scheme does not require all K distributed computing nodes to participate in signal transmission — only K′ of them — which lightens the computing tasks on some distributed computing nodes; (3) with the new file-allocation and function-allocation scheme, the numbers of input files and output functions actually required can be reduced at the cost of a small increase in communication load, so practical problems can be solved better and the method can be widely applied in practice.
Drawings
Fig. 1 shows the execution process of the MapReduce framework.
Detailed Description
The present invention will be further described in detail with reference to specific examples in order to make the objects, technical solutions and advantages of the present invention more apparent.
As shown in Fig. 1, the MapReduce framework computes Q output functions of N input files on K distributed computing nodes. The coded distributed computing method based on the MapReduce framework first selects a new way of dividing the files, reducing the number of input files actually needed. Then, in the output-function allocation, a new output function set W_k is designed for each distributed computing node, which greatly reduces the number of output functions required. Finally, each distributed computing node obtains, by random selection from other nodes, the intermediate values of the input files it does not store, thereby gathering the intermediate values of all N input files. Each distributed computing node then uses the intermediate values of the N input files to compute its assigned output functions, completing the distributed computing task.
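For reference, the three stages can be pictured with a minimal single-process MapReduce sketch; the word-count example and all names in it are illustrative only, not from the patent.

```python
from collections import defaultdict

def map_reduce(files, map_fn, reduce_fn, Q):
    """Toy MapReduce: Q output functions over the given input files."""
    # Map: each file yields an intermediate value for every output function q.
    intermediates = defaultdict(list)   # q -> list of intermediate values
    for n, file in enumerate(files):
        for q in range(Q):
            intermediates[q].append(map_fn(q, n, file))
    # Shuffle is implicit here (single process); in the distributed setting,
    # this is where the coded signal exchange of the Shuffle stage happens.
    # Reduce: combine all intermediate values of each output function.
    return {q: reduce_fn(q, vals) for q, vals in intermediates.items()}

# Illustrative use: Q = 2 output functions, each counting one word across files.
words = ["map", "reduce"]
result = map_reduce(
    files=["map reduce map", "reduce reduce"],
    map_fn=lambda q, n, f: f.split().count(words[q]),
    reduce_fn=lambda q, vals: sum(vals),
    Q=2,
)
print(result)  # {0: 2, 1: 3}
```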
The coding distributed computing method based on the MapReduce framework comprises a Map stage, a Shuffle stage and a Reduce stage, and comprises the following steps:
1) Map stage
Step 1, carrying out repeated-free average division on a given input file to obtainA subset of the input files.
If the total number of the given input files is N, the number of the input files allocated to each input file subset isWherein (1)>Represents ∈K' taken from>Are combined, i.e.)>K is the total number of nodes, K ' is a node factor, K ' is a factor of K which is not equal to 1, and K ' noteqK, r is the number of times each input file is calculated.
In this embodiment, k=10, K' = 5,r =4, then
Step 2: assign each input file subset a distinct label consisting of r′ integers selected from the K′ integers 0 to K′−1.
In this embodiment, K = 10, K′ = 5 and r = 4, so r′ = 2; that is, 2 integers are selected from the 5 integers 0–4 as the label of each input file subset. For example, the labels of the N = 10 input file subsets are: {0,1}, {0,2}, {0,3}, {0,4}, {1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}.
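A small sketch of this labelling, assuming the labels are simply the C(K′, r′) two-element subsets in lexicographic order (the patent only requires the labels to be distinct):

```python
from itertools import combinations

K, K_prime, r = 10, 5, 4
r_prime = r * K_prime // K   # = 2 integers per label (assumed relation)
labels = list(combinations(range(K_prime), r_prime))
print(labels)
# [(0, 1), (0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```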
Step 3: take each node's number modulo the node factor K′ to obtain the node's mark.
In this embodiment, the marks of the 10 nodes are:
node 0: 0 mod 5 = 0, so its mark is 0;
node 1: 1 mod 5 = 1, so its mark is 1;
node 2: 2 mod 5 = 2, so its mark is 2;
node 3: 3 mod 5 = 3, so its mark is 3;
node 4: 4 mod 5 = 4, so its mark is 4;
node 5: 5 mod 5 = 0, so its mark is 0;
node 6: 6 mod 5 = 1, so its mark is 1;
node 7: 7 mod 5 = 2, so its mark is 2;
node 8: 8 mod 5 = 3, so its mark is 3;
node 9: 9 mod 5 = 4, so its mark is 4.
Step 4: based on the labels of the input file subsets and the marks of the nodes, store each input file subset on every node whose mark appears in its label.
In this embodiment, the input file subsets stored by the 10 nodes are:
since node 0 has mark 0, it stores the 4 input file subsets whose labels contain 0, namely {0,1}, {0,2}, {0,3}, {0,4};
since node 1 has mark 1, it stores the 4 input file subsets whose labels contain 1, namely {0,1}, {1,2}, {1,3}, {1,4};
since node 2 has mark 2, it stores the 4 input file subsets whose labels contain 2, namely {0,2}, {1,2}, {2,3}, {2,4};
since node 3 has mark 3, it stores the 4 input file subsets whose labels contain 3, namely {0,3}, {1,3}, {2,3}, {3,4};
since node 4 has mark 4, it stores the 4 input file subsets whose labels contain 4, namely {0,4}, {1,4}, {2,4}, {3,4};
since node 5 has mark 0, it stores the 4 input file subsets whose labels contain 0, namely {0,1}, {0,2}, {0,3}, {0,4};
since node 6 has mark 1, it stores the 4 input file subsets whose labels contain 1, namely {0,1}, {1,2}, {1,3}, {1,4};
since node 7 has mark 2, it stores the 4 input file subsets whose labels contain 2, namely {0,2}, {1,2}, {2,3}, {2,4};
since node 8 has mark 3, it stores the 4 input file subsets whose labels contain 3, namely {0,3}, {1,3}, {2,3}, {3,4};
since node 9 has mark 4, it stores the 4 input file subsets whose labels contain 4, namely {0,4}, {1,4}, {2,4}, {3,4}.
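The marks of Step 3 and the storage rule of Step 4 can be reproduced together; this sketch assumes the rule read off from the listing above, namely that a node stores every subset whose label contains the node's mark:

```python
from itertools import combinations

K, K_prime, r = 10, 5, 4
r_prime = r * K_prime // K
labels = list(combinations(range(K_prime), r_prime))

# Step 3: each node's mark is its number modulo the node factor K'.
marks = {k: k % K_prime for k in range(K)}

# Step 4: node k stores every input file subset whose label contains its mark.
stored = {k: [set(lb) for lb in labels if marks[k] in lb] for k in range(K)}

print(stored[0])  # [{0, 1}, {0, 2}, {0, 3}, {0, 4}]
print(stored[7])  # [{0, 2}, {1, 2}, {2, 3}, {2, 4}] -- same files as node 2
```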
Step 5: each node uses the Map function to compute the intermediate values of all the input file subsets it currently stores, obtaining the intermediate values of its stored files.
2) Shuffle stage
Step 6: each node encodes the intermediate values of its stored files into signals and transmits them to the other nodes.
Since each node in the MapReduce framework needs the intermediate values of all input files to compute its assigned function set, while in the Map stage each node has stored only part of the files — and thus holds only the intermediate values of those stored files — the other nodes must supply it with the intermediate values of the files it does not store.
Step 7: allocate to each node the set of output functions it must compute.
In the MapReduce framework, Q output functions of N input files are computed on K distributed computing nodes, and each distributed computing node is allocated an output function set W_k; the functions in W_k are those that node k must compute. The node numbered k is assigned the output function set W_k = { ⌈(k + 1 + iK)/s⌉ : i = 0, 1, …, t−1 },
where ⌈·⌉ denotes rounding up, k ∈ {0, 1, …, K−1}, t is the number of output functions assigned to each node, and s is the number of times each output function is computed.
In this embodiment, s = 6 and t = s/gcd(K, s) = 6/2 = 3, where gcd(K, s) denotes the greatest common divisor of K and s. Taking node 2 as an example, its output function set is W_2 = {⌈3/6⌉, ⌈13/6⌉, ⌈23/6⌉} = {1, 3, 4}; that is, node 2 must compute 3 output functions: output functions 1, 3 and 4.
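Under the reconstructed formula for W_k (an assumption consistent with both worked examples in this description), the function sets of all ten nodes can be generated and checked; note that each of the Q = 5 output functions lands on exactly s = 6 nodes:

```python
from math import ceil
from collections import Counter

K, s, t = 10, 6, 3
W = {k: sorted(ceil((k + 1 + i * K) / s) for i in range(t)) for k in range(K)}
print(W[2])   # [1, 3, 4] -- matches node 2 in the text
print(W[0])   # [1, 2, 4] -- matches node 0's functions in Table 1

# Each output function is computed by exactly s = 6 nodes.
counts = Counter(q for fns in W.values() for q in fns)
assert all(c == s for c in counts.values())
```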
3) Reduce stage
Step 8: each node selects at random, from the signals transmitted by the other nodes, the intermediate value of each input file subset it does not store; together with the intermediate values of the subsets it stores, this yields the intermediate values of all input file subsets, from which the node computes its allocated output function set.
Step 8.1: following the output function allocation of the Shuffle stage, each node holds the intermediate values of the files it stores; that is, the intermediate values already computed by node k are { v_{q,n} : q ∈ W_k, w_n ∈ M_k }, where M_k denotes the set of files stored by the node numbered k.
Step 8.2: in the Map stage, the other nodes computed all intermediate values of their own stored files; each node then encodes the intermediate values it has computed into signals and transmits them to the remaining nodes.
Step 8.3: from its own intermediate values { v_{q,n} : q ∈ W_k, w_n ∈ M_k } and the signals received from other nodes, each node solves for the intermediate values it still needs. The specific process is as follows:
Since each file is computed by r′ nodes, the intermediate values of that file are also held by r′ nodes. A node that must transmit intermediate values shares some intermediate values with the node that must compute the output function, so it encodes all the intermediate values it has computed into signals sent to that node. Because the two nodes hold some identical intermediate values, the shared values can be used to work out the remaining ones: each system of equations contains 3 unknowns, so the transmitting node sends 3 equations in the intermediate values it has computed, and the node computing the output function solves the system, one unknown per equation.
In this embodiment, each file is computed by r′ = 2 nodes, so the intermediate values of each file are also held by r′ = 2 nodes. Take node 2 as an example: it is assigned output functions 1, 3 and 4. Nodes 0, 1, 3 and 4 store the files {0,2}, {1,2}, {2,3} and {2,4}, respectively, in common with node 2, and each node stores 4 files; hence node 2 lacks 3 files relative to each of these nodes, i.e., 3 intermediate values per assigned output function, so each of these nodes needs to transmit 3 signals to node 2.
Since node 2 stores the 4 input file subsets {0,2}, {1,2}, {2,3}, {2,4}, it already holds part of the intermediate values for output functions 1, 3 and 4 — those of its stored files, namely:
v_{1,{0,2}}, v_{1,{1,2}}, v_{1,{2,3}}, v_{1,{2,4}}, v_{3,{0,2}}, v_{3,{1,2}}, v_{3,{2,3}}, v_{3,{2,4}}, v_{4,{0,2}}, v_{4,{1,2}}, v_{4,{2,3}}, v_{4,{2,4}};
but node 2 still lacks the intermediate values of the files it does not store for output functions 1, 3 and 4, namely:
v_{1,{0,1}}, v_{1,{0,3}}, v_{1,{0,4}}, v_{1,{1,3}}, v_{1,{1,4}}, v_{1,{3,4}}, v_{3,{0,1}}, v_{3,{0,3}}, v_{3,{0,4}}, v_{3,{1,3}}, v_{3,{1,4}}, v_{3,{3,4}}, v_{4,{0,1}}, v_{4,{0,3}}, v_{4,{0,4}}, v_{4,{1,3}}, v_{4,{1,4}}, v_{4,{3,4}};
The intermediate values of these non-stored files can be obtained from the other nodes, so the other nodes (node 0, node 1, node 3 and node 4) encode the intermediate values they computed and node 2 needs into signals sent to node 2. Specifically:
the signals transmitted from node 0 to node 2 are 3 linear combinations of the intermediate values node 0 has computed for the labels {0,1}, {0,3}, {0,4};
the signals transmitted from node 1 to node 2 are 3 linear combinations of the intermediate values node 1 has computed for the labels {0,1}, {1,3}, {1,4};
the signals transmitted from node 3 to node 2 are 3 linear combinations of the intermediate values node 3 has computed for the labels {0,3}, {1,3}, {3,4};
the signals transmitted from node 4 to node 2 are 3 linear combinations of the intermediate values node 4 has computed for the labels {0,4}, {1,4}, {3,4};
where α_1, α_2, α_3, α_4 and the remaining combination coefficients satisfy: α_1, α_2, α_3, α_4 are not all equal to 1, and the coefficient vectors of the different signals are linearly independent, which guarantees that the 3 unknowns of each such system of equations can be solved.
Node 2 can thus solve for the intermediate values it needs from these signals. Within each signal, ⊕ connects the intermediate values, which are represented in binary and combined by bitwise exclusive-or. That is, node 2 solves 9 of its required intermediate values from the system of equations sent by node 0: v_{1,{0,1}}, v_{1,{0,3}}, v_{1,{0,4}}, v_{3,{0,1}}, v_{3,{0,3}}, v_{3,{0,4}}, v_{4,{0,1}}, v_{4,{0,3}}, v_{4,{0,4}}; 6 from the system sent by node 1: v_{1,{1,3}}, v_{1,{1,4}}, v_{3,{1,3}}, v_{3,{1,4}}, v_{4,{1,3}}, v_{4,{1,4}}; and 3 from the system sent by node 3: v_{1,{3,4}}, v_{3,{3,4}}, v_{4,{3,4}}. (The solution is not unique, as long as each required intermediate value is solved from the signals of some transmitting node.)
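The decoding in this step rests on the fact that XOR-ing known values back out of a coded signal isolates an unknown. A minimal sketch with hypothetical byte strings standing in for binary-represented intermediate values:

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length binary intermediate values."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical intermediate values held by a transmitting node.
v_a, v_b, v_c = b"\x12\x34", b"\x56\x78", b"\x9a\xbc"

# The sender multicasts one coded signal instead of three separate values.
signal = xor(xor(v_a, v_b), v_c)

# A receiver that already knows v_b and v_c (from its own Map stage)
# recovers the one value it lacks by XOR-ing the known values back out.
recovered = xor(xor(signal, v_b), v_c)
assert recovered == v_a
```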
Since s = 6 > K′ = 5, once the K′ distributed computing nodes have recovered the intermediate values they need, the intermediate values required by all s = 6 distributed computing nodes have been recovered at the same time, because the files stored by 2 of them must be identical.
Table 1 shows the file and function allocation on the nodes. Each column of Table 1 lists the files stored on one node, the grey cells mark the output functions allocated to each node, and the rows of the table show at a glance all the intermediate values needed in this distributed system. The intermediate values required by each node are those in grey cells: the grey-background values that a node already holds were obtained in the Map phase, while the remaining grey-cell values must be transmitted to it by other nodes in K′.
Table 1: file and function allocation on nodes
Taking the third column of Table 1 (i.e., node 0) as an example, the table shows that node 0 already holds the intermediate values:
v_{1,{0,1}}, v_{1,{0,2}}, v_{1,{0,3}}, v_{1,{0,4}}, v_{2,{0,1}}, v_{2,{0,2}}, v_{2,{0,3}}, v_{2,{0,4}}, v_{4,{0,1}}, v_{4,{0,2}}, v_{4,{0,3}}, v_{4,{0,4}};
and still requires the intermediate values:
v_{1,{1,2}}, v_{1,{1,3}}, v_{1,{1,4}}, v_{1,{2,3}}, v_{1,{2,4}}, v_{1,{3,4}}, v_{2,{1,2}}, v_{2,{1,3}}, v_{2,{1,4}}, v_{2,{2,3}}, v_{2,{2,4}}, v_{2,{3,4}}, v_{4,{1,2}}, v_{4,{1,3}}, v_{4,{1,4}}, v_{4,{2,3}}, v_{4,{2,4}}, v_{4,{3,4}}.
table 2 shows the results of comparison with the Ali scheme. K in the table represents the number of nodes required; r represents the number of times each file is calculated; s represents the number of times each output function is calculated. N (Ali) represents the number of files needed in the article being compared; n (New) represents the number of files that we need in this approach; q (Ali) represents the number of functions required in the article being compared; q (New) represents the number of functions that we need in this approach; the final L/L represents the ratio to the traffic load of the article being compared. From table 2, we can intuitively see that using our method significantly reduces the number of files and the number of output functions, but the traffic load is not much increased, but is less than twice as much as the original traffic load.
Table 2: comparison results with Ali scheme
The innovations of the invention are the following:
In the Map stage, the method of the invention selects a node factor K′ of K, sets r′ = rK′/K, and divides the total set of files into C(K′, r′) blocks; when the file blocks are stored, each node's number is first taken modulo K′, and the result serves as the mark that decides which file blocks the node stores. The scheme proposed by Ali instead divides the files directly into C(K, r) blocks and, when storing them, simply places on each node the file blocks whose labels contain that node's number. The scheme of the invention therefore handles far fewer files than the scheme proposed by Ali, which gives the method of the invention wider application in practice.
In the Shuffle stage, when allocating output functions to the nodes, the method of the invention uses a new allocation rule to determine the output functions assigned to each node, whereas the scheme proposed by Ali directly divides all output functions into C(K, s) groups and, when allocating functions, stores on each node the output function sets whose labels contain that node's number. The method of the invention therefore requires fewer output functions than the method proposed by Ali.
When s > =k ', using our scheme does not require all K distributed computing nodes to participate, but only needs to be performed on K' distributed computing nodes, thus reducing the computing tasks on some distributed computing nodes.
In summary, the method of the invention reduces the numbers of files and functions required by the method proposed by Ali, and is therefore better suited to practical application.
It should be noted that although the examples described above are illustrative, they do not limit the invention, and the invention is therefore not restricted to the specific embodiments described. Other embodiments that those skilled in the art can derive from the disclosure of the invention without inventive effort are likewise within the scope of protection of the invention.
Claims (1)
1. A coded distributed computing method based on the MapReduce framework, characterized by comprising the following steps:
Step 1: divide the given input files evenly and without repetition into C(K′, r′) input file subsets, where C(K′, r′) denotes the number of combinations of r′ elements taken from K′ and r′ = rK′/K;
Step 2: assign each input file subset a distinct label consisting of r′ integers selected from the K′ integers 0 to K′−1;
Step 3: take each node's number modulo the node factor K′ to obtain the node's mark;
Step 4: based on the labels of the input file subsets and the marks of the nodes, store each input file subset on every node whose mark appears in the subset's label;
Step 5: each node computes, with the Map function, the intermediate value of every input file subset it stores;
Step 6: each node encodes the intermediate values of all its stored input file subsets into signals and transmits them to the other nodes;
Step 7: allocate to each node the set of output functions it must compute, where the node numbered k is assigned the output function set W_k = { ⌈(k + 1 + iK)/s⌉ : i = 0, 1, …, t−1 };
Step 8: each node selects at random, from the intermediate values transmitted by the other nodes, the intermediate value of each input file subset it does not store; together with the intermediate values of the subsets it stores, this yields the intermediate values of all input file subsets, from which the node computes its allocated output function set, completing the distributed computation;
where K is the total number of nodes, K′ is a node factor, r is the number of times each input file is computed, t is the number of output functions allocated to each node with t = s/gcd(K, s), s is the number of times each output function is computed, gcd(K, s) denotes the greatest common divisor of K and s, ⌈·⌉ denotes rounding up, and k ∈ {0, 1, …, K−1}.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110756959.8A CN113434299B (en) | 2021-07-05 | 2021-07-05 | Coding distributed computing method based on MapReduce framework |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113434299A CN113434299A (en) | 2021-09-24 |
CN113434299B (en) | 2024-02-06
Family
ID=77758959
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110756959.8A Active CN113434299B (en) | 2021-07-05 | 2021-07-05 | Coding distributed computing method based on MapReduce framework |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113434299B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2011134285A1 (en) * | 2010-04-29 | 2011-11-03 | 中科院成都计算机应用研究所 | Distributed self-adaptive coding and storage method |
US8738581B1 (en) * | 2012-02-15 | 2014-05-27 | Symantec Corporation | Using multiple clients for data backup |
CN103106253A (en) * | 2013-01-16 | 2013-05-15 | 西安交通大学 | Data balance method based on genetic algorithm in MapReduce calculation module |
CN111045843A (en) * | 2019-11-01 | 2020-04-21 | 河海大学 | Distributed data processing method with fault tolerance capability |
CN111490795A (en) * | 2020-05-25 | 2020-08-04 | 南京大学 | Intermediate value length isomerism-oriented encoding MapReduce method |
Non-Patent Citations (1)
Title |
---|
Wang Yan; Li Nianshuang; Wang Xiling; Zhong Fengyan. A survey of coding techniques for improving the performance of large-scale distributed machine learning. Journal of Computer Research and Development, No. 03 (full text) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |