CN113434299B - Coding distributed computing method based on MapReduce framework - Google Patents

Coding distributed computing method based on MapReduce framework

Info

Publication number
CN113434299B
CN113434299B
Authority
CN
China
Prior art keywords
node
distributed computing
nodes
intermediate values
input file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110756959.8A
Other languages
Chinese (zh)
Other versions
CN113434299A (en)
Inventor
周玲玲
蒋静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangxi Normal University
Original Assignee
Guangxi Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi Normal University
Priority to CN202110756959.8A
Publication of CN113434299A
Application granted
Publication of CN113434299B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5083 Techniques for rebalancing the load in a distributed system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a coding distributed computing method based on the MapReduce framework. First, N input files are divided into several parts and stored on different distributed computing nodes. Then, during output function allocation, a new output function set W_k is designed for each distributed computing node, which greatly reduces the number of output functions required. Finally, each distributed computing node obtains, by random selection, the intermediate values of the input files it does not store from the other distributed computing nodes, so that it holds the intermediate values of all input files; it then computes its allocated output functions using these intermediate values, completing the distributed computing task. Through the new file-allocation and function-allocation scheme, the method reduces the number of input files and output functions actually required at the cost of a small amount of additional communication load, so it solves practical problems better and can be widely applied in practice.

Description

Coding distributed computing method based on MapReduce framework
Technical Field
The invention relates to the technical field of distributed computing, in particular to a coding distributed computing method based on a MapReduce framework.
Background
Driven by the rapid development of machine learning and data science, modern computing paradigms have shifted from traditional single processor systems to large distributed computing systems, and one popular framework in distributed computing is the MapReduce framework. Distributed computing has shown its own strong advantages in processing large-scale data, and has become a popular research direction in recent years.
While the MapReduce framework has become a popular framework for distributed computing, it also has a significant disadvantage: it requires a large amount of data exchange. For example, when running "SelfJoin" on Amazon EC2 clusters, 70% of the execution time is spent on data exchange. To alleviate this communication bottleneck, Ali et al. proposed coded distributed computing ("Coded Distributed Computing", CDC) based on the MapReduce framework in 2018 and gave a general scheme that achieves the optimal communication load. Although that scheme attains the optimal communication load, the number of input files and the number of output functions it requires grow exponentially with the number of nodes, which makes it difficult to apply to practical problems.
Disclosure of Invention
The invention aims to solve the problem that existing coded distributed computing methods based on the MapReduce framework require a large number of input files and output functions, and provides a coding distributed computing method based on the MapReduce framework.
In order to solve the problems, the invention is realized by the following technical scheme:
the coding distributed computing method based on the MapReduce framework comprises a Map stage, a Shuffle stage and a Reduce stage, and comprises the following steps:
1) Map stage:
Step 1, divide the given input files evenly and without repetition to obtain C(K', r') input file subsets;
Step 2, randomly select r' integers from the integers 0 to K'-1 as the label of each input file subset, with no label repeated;
Step 3, take the number of each node modulo the node factor K' to obtain the label of each node;
Step 4, based on the label of each input file subset and the label of each node, allocate each input file subset to every node whose label appears in the subset's label, and store it there;
step 5, each node calculates the intermediate value of each stored input file subset by using a Map function;
2) The Shuffle stage:
step 6, each node encodes the intermediate values of all the stored subsets of the input files into signals and transmits the signals to other nodes;
Step 7, allocate to each node the set of output functions it must compute, wherein the node numbered k is assigned an output function set W_k containing t output functions;
3) Reduce stage
Step 8, each node randomly selects the intermediate value of each non-stored input file subset of the node from the intermediate values transmitted by other nodes; combining the node to store the intermediate values of the input file subsets to obtain the intermediate values of all the input file subsets; and calculating the output function set distributed by the node by utilizing the intermediate values of all the input file subsets to finish distributed calculation.
Here K is the total number of nodes; K' is a node factor; r is the number of times each input file is computed; r' = rK'/K; C(K', r') denotes the number of r'-element combinations of K' elements; [·] denotes the rounding function, i.e. rounding up; k ∈ {0, 1, ..., K-1}; t is the number of output functions allocated to each node, with t = s / gcd(K, s), where gcd(K, s) denotes the greatest common divisor of K and s, and s is the number of times each output function is computed.
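For concreteness, the parameter relations above can be checked numerically. The following is an illustrative Python sketch (not part of the claimed method) using the embodiment's values K = 10, K' = 5, r = 4, s = 6; the variable names are ours.

```python
from math import comb, gcd

K, K_prime, r, s = 10, 5, 4, 6   # values used in the embodiment below

r_prime = r * K_prime // K        # label size: each file is computed by r nodes,
                                  # i.e. by r' = rK'/K distinct node labels
N = comb(K_prime, r_prime)        # number of input file subsets, C(K', r')
t = s // gcd(K, s)                # output functions allocated to each node

print(r_prime, N, t)              # -> 2 10 3
```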
Compared with the prior art, the invention has the following characteristics:
1. Compared with the general MapReduce framework, the method uses a carefully designed file allocation so that each file block is computed by r different distributed computing nodes, and then exploits the redundant computation on the nodes to create coded multicast opportunities, whereby data can be transmitted to r nodes simultaneously, reducing the time required for data transmission.
2. Compared with the scheme proposed by Ali, the number of input files and the number of output functions required are reduced. The reasons are as follows: (1) during file allocation in the Map stage, the node numbers are first taken modulo K', and files are then allocated to each distributed computing node according to the resulting labels, which reduces the number of input files required; (2) during the Shuffle stage, a new output function set W_k is designed for each distributed computing node, which greatly reduces the number of output functions required; moreover, when s ≥ K', our scheme does not require all K distributed computing nodes to participate in signal transmission, only K' of them, which lightens the computing tasks on some distributed computing nodes; (3) with the new file-allocation and function-allocation scheme, the number of input files and output functions actually required can be reduced at the cost of a small amount of communication load, so practical problems can be solved better and the method can be widely applied in practice.
Drawings
Fig. 1 is an execution process of the MapReduce framework.
Detailed Description
The present invention will be further described in detail with reference to specific examples in order to make the objects, technical solutions and advantages of the present invention more apparent.
As shown in fig. 1, the MapReduce framework computes Q output functions of N input files on K distributed computing nodes. The coding distributed computing method based on the MapReduce framework first selects a new way of dividing the files, which reduces the number of input files actually needed. Then, during output function allocation, a new output function set W_k is designed for each distributed computing node, which greatly reduces the number of output functions required. Finally, each distributed computing node obtains, by random selection, the intermediate values of the input files it does not store from the other distributed computing nodes, so that it holds the intermediate values of all N input files. Each distributed computing node then computes its allocated output functions using the intermediate values of the N input files, completing the distributed computing task.
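For readers unfamiliar with the three phases named above, the following is a minimal, hypothetical Python sketch of the plain (uncoded) MapReduce flow of fig. 1; the word-count Map and Reduce functions are placeholders and are not the functions of the claimed scheme.

```python
from collections import defaultdict

def map_phase(files):
    """Map: emit (key, value) intermediate pairs for each input file."""
    intermediates = []
    for _name, text in files.items():
        for word in text.split():
            intermediates.append((word, 1))      # placeholder Map function
    return intermediates

def shuffle_phase(intermediates, num_reducers):
    """Shuffle: route each intermediate value to the node owning its key."""
    buckets = defaultdict(list)
    for key, value in intermediates:
        buckets[hash(key) % num_reducers].append((key, value))
    return buckets

def reduce_phase(bucket):
    """Reduce: combine all values that share a key."""
    totals = defaultdict(int)
    for key, value in bucket:
        totals[key] += value                      # placeholder Reduce function
    return dict(totals)

files = {"f0": "a b a", "f1": "b c"}
buckets = shuffle_phase(map_phase(files), num_reducers=2)
print([reduce_phase(b) for b in buckets.values()])
```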
The coding distributed computing method based on the MapReduce framework consists of a Map stage, a Shuffle stage and a Reduce stage, and comprises the following steps:
1) Map stage
Step 1, carrying out repeated-free average division on a given input file to obtainA subset of the input files.
If the total number of the given input files is N, the number of the input files allocated to each input file subset isWherein (1)>Represents ∈K' taken from>Are combined, i.e.)>K is the total number of nodes, K ' is a node factor, K ' is a factor of K which is not equal to 1, and K ' noteqK, r is the number of times each input file is calculated.
In this embodiment, k=10, K' = 5,r =4, then
Step 2, randomly selecting from 0-K' -1 integersThe integer numbers act as labels for each subset of input files.
In this embodiment, k=10, K '= 5,r =4, then K' -1=5-1=4,i.e. 2 integers are randomly selected from the 5 integers of 0-4 as the labels for each subset of input files. For example, the labels of the n=10 input file subsets are respectively: {0,1},{0,2},{0,3},{0,4},{1,2},{1,3},{1,4},{2,3},{2,4},{3,4}.
Step 3, take the number of each node modulo the node factor K' to obtain the label of each node.
In this embodiment, the labels of the 10 nodes are:
for the node numbered 0: 0 mod 5 = 0, so its label is 0;
for the node numbered 1: 1 mod 5 = 1, so its label is 1;
for the node numbered 2: 2 mod 5 = 2, so its label is 2;
for the node numbered 3: 3 mod 5 = 3, so its label is 3;
for the node numbered 4: 4 mod 5 = 4, so its label is 4;
for the node numbered 5: 5 mod 5 = 0, so its label is 0;
for the node numbered 6: 6 mod 5 = 1, so its label is 1;
for the node numbered 7: 7 mod 5 = 2, so its label is 2;
for the node numbered 8: 8 mod 5 = 3, so its label is 3;
for the node numbered 9: 9 mod 5 = 4, so its label is 4.
Step 4, based on the label of each input file subset and the label of each node, allocate each input file subset to every node whose label appears in the subset's label, and store it there.
In this embodiment, the input file subsets stored by the 10 nodes are:
since the node numbered 0 has label 0, it stores the 4 input file subsets whose labels contain 0, i.e. {0,1}, {0,2}, {0,3}, {0,4};
since the node numbered 1 has label 1, it stores the 4 input file subsets whose labels contain 1, i.e. {0,1}, {1,2}, {1,3}, {1,4};
since the node numbered 2 has label 2, it stores the 4 input file subsets whose labels contain 2, i.e. {0,2}, {1,2}, {2,3}, {2,4};
since the node numbered 3 has label 3, it stores the 4 input file subsets whose labels contain 3, i.e. {0,3}, {1,3}, {2,3}, {3,4};
since the node numbered 4 has label 4, it stores the 4 input file subsets whose labels contain 4, i.e. {0,4}, {1,4}, {2,4}, {3,4};
since the node numbered 5 has label 0, it stores the 4 input file subsets whose labels contain 0, i.e. {0,1}, {0,2}, {0,3}, {0,4};
since the node numbered 6 has label 1, it stores the 4 input file subsets whose labels contain 1, i.e. {0,1}, {1,2}, {1,3}, {1,4};
since the node numbered 7 has label 2, it stores the 4 input file subsets whose labels contain 2, i.e. {0,2}, {1,2}, {2,3}, {2,4};
since the node numbered 8 has label 3, it stores the 4 input file subsets whose labels contain 3, i.e. {0,3}, {1,3}, {2,3}, {3,4};
since the node numbered 9 has label 4, it stores the 4 input file subsets whose labels contain 4, i.e. {0,4}, {1,4}, {2,4}, {3,4}.
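Steps 1-4 of this embodiment can be reproduced with the short sketch below. It is an illustrative rendering under one assumption: the labels are assigned deterministically (all r'-element subsets in lexicographic order) rather than randomly, which yields exactly the allocation listed above.

```python
from itertools import combinations

K, K_prime, r = 10, 5, 4
r_prime = r * K_prime // K                    # 2

# Steps 1-2: one subset per r'-element label drawn from {0, ..., K'-1}
subset_labels = list(combinations(range(K_prime), r_prime))   # 10 labels

# Step 3: each node's label is its number modulo K'
node_labels = {k: k % K_prime for k in range(K)}

# Step 4: node k stores every subset whose label contains node k's label
stored = {k: [lab for lab in subset_labels if node_labels[k] in lab]
          for k in range(K)}

print(stored[2])   # -> [(0, 2), (1, 2), (2, 3), (2, 4)]
```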
Step 5, each node uses the Map function to compute the intermediate value of every input file subset it currently stores, thereby obtaining the intermediate values of the files stored on that node.
2) Shuffle stage
Step 6, each node encodes the intermediate values of the files it stores into signals and transmits them to the other nodes.
Since each node in the MapReduce framework needs the intermediate values of all input files to compute its assigned function set, and in the Map stage each node stored only part of the files, i.e. it obtained only the intermediate values of the files it stores, the other nodes must supply it with the intermediate values of the files it does not store.
Step 7, allocate to each node the set of output functions it must compute.
In the MapReduce framework, Q output functions of N input files are computed on K distributed computing nodes. Each distributed computing node is allocated a set of output functions W_k; that is, the output functions contained in W_k are exactly those that node k needs to compute. The node numbered k is assigned an output function set W_k containing t output functions.
Here [·] denotes the rounding function, i.e. rounding up; k ∈ {0, 1, ..., K-1}; t is the number of output functions assigned to each node; and s is the number of times each output function is computed.
In this embodiment, s = 6 and t = s / gcd(K, s) = 6 / gcd(10, 6) = 3, where gcd(K, s) denotes the greatest common divisor of K and s. Taking the node numbered 2 as an example, its output function set is W_2 = {1, 3, 4}; that is, node 2 needs to compute 3 output functions: output function 1, output function 3 and output function 4.
3) Reduce stage
Step 8, each node randomly selects the intermediate values of the input file subsets it does not store from the signals transmitted by the other nodes; combined with the intermediate values of the input file subsets it stores, it obtains the intermediate values of all input file subsets and computes the output function set allocated to it.
Step 8.1, after the allocation of output function sets in the Shuffle stage, each node already has the intermediate values of the files it stores; that is, the intermediate values computed by the node numbered k are { v_{q,n} : q ∈ W_k, w_n ∈ M_k }, where M_k denotes the set of files stored by the node numbered k.
Step 8.2, in the Map stage the other nodes have likewise computed all intermediate values of the files they store; each node then encodes the intermediate values it has computed into signals and transmits them to the remaining nodes.
Step 8.3, each node solves for the intermediate values it needs from its own computed intermediate values { v_{q,n} : q ∈ W_k, w_n ∈ M_k } and the signals received from the other nodes. The specific process is as follows:
Since each file is computed by r' node labels, the intermediate values of that file are owned by r' of the participating nodes. A node that transmits intermediate values always has one stored file in common with the node that must compute the output function; the transmitting node therefore encodes all the intermediate values it has computed into signals and sends them to the node computing the output function. Because the two nodes share common intermediate values, the receiver can eliminate the values it already knows from each equation; and because each system of equations contains 3 unknowns, the transmitting node sends 3 equations in its computed intermediate values, so that the node computing the output function can solve for the intermediate values it needs by recovering one unknown from each equation.
In this example implementation, each file is computed by r' = 2 node labels, so the intermediate values of each file are owned by r' = 2 of the participating nodes. Take the node numbered 2 as an example: it is assigned the output functions numbered 1, 3 and 4. Nodes 0, 1, 3 and 4 store the files {0,2}, {1,2}, {2,3} and {2,4}, respectively, in common with node 2, and each node stores 4 files; node 2 therefore lacks 3 files relative to each of these nodes, i.e. it lacks 3 intermediate values per assigned output function, and so each of these nodes needs to transmit 3 signals to node 2.
Since the node numbered 2 stores the 4 input file subsets {0,2}, {1,2}, {2,3}, {2,4}, node 2 already has the intermediate values of its stored files for output functions 1, 3 and 4, namely:
v_{1,{0,2}}, v_{1,{1,2}}, v_{1,{2,3}}, v_{1,{2,4}}, v_{3,{0,2}}, v_{3,{1,2}}, v_{3,{2,3}}, v_{3,{2,4}}, v_{4,{0,2}}, v_{4,{1,2}}, v_{4,{2,3}}, v_{4,{2,4}}.
But node 2 still lacks the intermediate values of the non-stored files for output functions 1, 3 and 4, namely:
v_{1,{0,1}}, v_{1,{0,3}}, v_{1,{0,4}}, v_{1,{1,3}}, v_{1,{1,4}}, v_{1,{3,4}}, v_{3,{0,1}}, v_{3,{0,3}}, v_{3,{0,4}},
v_{3,{1,3}}, v_{3,{1,4}}, v_{3,{3,4}}, v_{4,{0,1}}, v_{4,{0,3}}, v_{4,{0,4}}, v_{4,{1,3}}, v_{4,{1,4}}, v_{4,{3,4}}.
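The two lists above can be checked mechanically; the following illustrative sketch (the variable names are ours, not the patent's) enumerates the intermediate values node 2 has and lacks, confirming the counts of 12 and 18.

```python
from itertools import combinations

W_2 = [1, 3, 4]                                   # output functions of node 2
all_labels = list(combinations(range(5), 2))      # all file-subset labels
M_2 = [lab for lab in all_labels if 2 in lab]     # files node 2 stores

has   = [(q, lab) for q in W_2 for lab in M_2]
lacks = [(q, lab) for q in W_2 for lab in all_labels if lab not in M_2]
print(len(has), len(lacks))   # -> 12 18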
The intermediate values of these non-stored files are obtained from the other nodes: node 0, node 1, node 3 and node 4 each encode the intermediate values they have computed that node 2 needs into signals and send them to node 2, each node transmitting 3 coded signals. Each signal is a linear combination of intermediate values with coefficients α_1, α_2, α_3, α_4 (and corresponding coefficients for the other signals); the coefficients α_1, α_2, α_3, α_4 are not all equal to 1, and the coefficient vectors of different signals are not all equal, i.e. the coefficient vectors are linearly independent, which ensures that the 3 unknowns can be solved from such a system of equations.
Node 2 can therefore solve for the intermediate values it needs from these signals. Within each signal, the intermediate values connected by ⊕ are combined by bitwise exclusive-or, with each intermediate value represented in binary. That is, from the system of equations sent by node 0, node 2 solves for the 9 intermediate values it needs: v_{1,{0,1}}, v_{1,{0,3}}, v_{1,{0,4}}, v_{3,{0,1}}, v_{3,{0,3}}, v_{3,{0,4}}, v_{4,{0,1}}, v_{4,{0,3}}, v_{4,{0,4}}; from the system of equations sent by node 1, it solves for the 6 intermediate values it needs: v_{1,{1,3}}, v_{1,{1,4}}, v_{3,{1,3}}, v_{3,{1,4}}, v_{4,{1,3}}, v_{4,{1,4}}; and from the system of equations sent by node 3, it solves for the 3 intermediate values it needs: v_{1,{3,4}}, v_{3,{3,4}}, v_{4,{3,4}}. (The solution is not unique; it suffices that the required intermediate values be solved from the signals of the transmitting nodes.)
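To make the elimination step concrete, the following hypothetical sketch shows the special case in which all coefficients equal 1, so each coded signal is a bitwise XOR; the 8-bit values and the pairing of one unknown with one known value per signal are invented for illustration and are not the embodiment's actual signals.

```python
# A minimal sketch of XOR-coded multicast decoding, assuming each signal
# is the bitwise XOR of one unknown intermediate value with a value the
# receiver already holds (all values shown are made-up 8-bit integers).

known = {                      # intermediate values node 2 already computed
    ("v", 1, "{0,2}"): 0b1011_0010,
    ("v", 3, "{0,2}"): 0b0100_1110,
    ("v", 4, "{0,2}"): 0b1110_0001,
}

# Hypothetical signals from node 0: each XORs one unknown with one known value.
signals = [
    (("v", 1, "{0,1}"), ("v", 1, "{0,2}"), 0b1011_0010 ^ 0b0101_0101),
    (("v", 3, "{0,1}"), ("v", 3, "{0,2}"), 0b0100_1110 ^ 0b0011_1100),
    (("v", 4, "{0,1}"), ("v", 4, "{0,2}"), 0b1110_0001 ^ 0b1000_0111),
]

# Decoding: XOR the known value back out of each signal, one unknown per equation.
recovered = {unknown: coded ^ known[shared]
             for unknown, shared, coded in signals}

print(recovered)   # -> the three unknown intermediate values
```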
Since s = 6 > K', once the K' distributed computing nodes participating in the exchange have recovered the intermediate values they need, the intermediate values required by the s = 6 distributed computing nodes that compute each output function are recovered at the same time, because 2 of those nodes must store identical files.
Table 1 shows the file and function allocation on the nodes. It lists the files stored on each node; the grey cells mark the output functions allocated to each node, and the table shows at a glance the intermediate values needed in this distributed system, i.e. all the rows of the table. The intermediate values required by each node are those in grey cells; of these, each node already holds part of its required intermediate values after the Map stage (the grey cells with a shaded background), while the remaining grey-cell intermediate values must be transmitted by the other nodes among the K' participants.
Table 1: file and function allocation on nodes
Taking the third column of Table 1 (i.e. node 0) as an example, the table shows that the intermediate values node 0 already has include:
v_{1,{0,1}}, v_{1,{0,2}}, v_{1,{0,3}}, v_{1,{0,4}}, v_{2,{0,1}}, v_{2,{0,2}}, v_{2,{0,3}}, v_{2,{0,4}}, v_{4,{0,1}}, v_{4,{0,2}}, v_{4,{0,3}}, v_{4,{0,4}};
and the intermediate values it still requires include:
v_{1,{1,2}}, v_{1,{1,3}}, v_{1,{1,4}}, v_{1,{2,3}}, v_{1,{2,4}}, v_{1,{3,4}}, v_{2,{1,2}}, v_{2,{1,3}}, v_{2,{1,4}},
v_{2,{2,3}}, v_{2,{2,4}}, v_{2,{3,4}}, v_{4,{1,2}}, v_{4,{1,3}}, v_{4,{1,4}}, v_{4,{2,3}}, v_{4,{2,4}}, v_{4,{3,4}}.
table 2 shows the results of comparison with the Ali scheme. K in the table represents the number of nodes required; r represents the number of times each file is calculated; s represents the number of times each output function is calculated. N (Ali) represents the number of files needed in the article being compared; n (New) represents the number of files that we need in this approach; q (Ali) represents the number of functions required in the article being compared; q (New) represents the number of functions that we need in this approach; the final L/L represents the ratio to the traffic load of the article being compared. From table 2, we can intuitively see that using our method significantly reduces the number of files and the number of output functions, but the traffic load is not much increased, but is less than twice as much as the original traffic load.
Table 2: comparison results with Ali scheme
The innovations of the invention are as follows:
in the Map stage, the method of the invention selects a node factor K' of K to obtain the target valueAnd K' to divide the total file into +.>A block; when the file blocks are stored, the number of each node is firstly subjected to modulo K', and then the judgment of which file blocks are stored in the node is carried out according to the modulo result serving as a mark. The proposal proposed by Ali is to divide the file directly into +.>When storing the file blocks, the blocks only need to store the file blocks with the node numbers on the corresponding nodes. Therefore, the scheme of the invention can process fewer files, and the files processed by the scheme proposed by Ali are more, so that the method of the invention has wider application in practice.
In the Shuffle stage, when the method of the invention allocates output functions to each node, a new function allocation rule determines the output functions allocated to each node, whereas the scheme proposed by Ali directly divides all output functions into C(K, s) sets and, during allocation, stores on each node the output function sets whose labels contain that node's number. The method of the invention therefore requires a smaller number of output functions than the method proposed by Ali.
When s ≥ K', our scheme does not require all K distributed computing nodes to participate; signal transmission is performed only on K' distributed computing nodes, which reduces the computing tasks on some distributed computing nodes.
In summary, the method of the present invention reduces the number of files and functions required relative to the method proposed by Ali, so that it can be applied better in practice.
It should be noted that, although the examples described above are illustrative, this is not a limitation of the present invention, and thus the present invention is not limited to the above-described specific embodiments. Other embodiments, which are apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein, are considered to be within the scope of the invention as claimed.

Claims (1)

1. A coding distributed computing method based on the MapReduce framework, characterized by comprising the following steps:
Step 1, divide the given input files evenly and without repetition to obtain C(K', r') input file subsets, where C(K', r') denotes the number of r'-element combinations of K' elements;
Step 2, randomly select r' integers from the integers 0 to K'-1 as the label of each input file subset, with no label repeated;
Step 3, take the number of each node modulo the node factor K' to obtain the label of each node;
Step 4, based on the label of each input file subset and the label of each node, allocate each input file subset to every node whose label appears in the subset's label, and store it there;
step 5, each node calculates the intermediate value of each stored input file subset by using a Map function;
step 6, each node encodes the intermediate values of all the stored subsets of the input files into signals and transmits the signals to other nodes;
Step 7, allocate to each node the set of output functions it must compute, wherein the node numbered k is assigned an output function set W_k containing t output functions;
Step 8, each node randomly selects the intermediate values of the input file subsets it does not store from the intermediate values transmitted by the other nodes; combined with the intermediate values of the input file subsets it stores, it obtains the intermediate values of all input file subsets; it then computes the output function set allocated to it using the intermediate values of all input file subsets, completing the distributed computation;
wherein K is the total number of nodes; K' is a node factor, i.e. a factor of K with K' ≠ 1 and K' ≠ K; r is the number of times each input file is computed; r' = rK'/K; t is the number of output functions allocated to each node, with t = s / gcd(K, s), where s is the number of times each output function is computed and gcd(K, s) denotes the greatest common divisor of K and s; [·] denotes the rounding function, i.e. rounding up; and k ∈ {0, 1, ..., K-1}.
CN202110756959.8A 2021-07-05 2021-07-05 Coding distributed computing method based on MapReduce framework Active CN113434299B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110756959.8A CN113434299B (en) 2021-07-05 2021-07-05 Coding distributed computing method based on MapReduce framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110756959.8A CN113434299B (en) 2021-07-05 2021-07-05 Coding distributed computing method based on MapReduce framework

Publications (2)

Publication Number Publication Date
CN113434299A CN113434299A (en) 2021-09-24
CN113434299B true CN113434299B (en) 2024-02-06

Family

ID=77758959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110756959.8A Active CN113434299B (en) 2021-07-05 2021-07-05 Coding distributed computing method based on MapReduce framework

Country Status (1)

Country Link
CN (1) CN113434299B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011134285A1 (en) * 2010-04-29 2011-11-03 中科院成都计算机应用研究所 Distributed self-adaptive coding and storage method
CN103106253A (en) * 2013-01-16 2013-05-15 西安交通大学 Data balance method based on genetic algorithm in MapReduce calculation module
US8738581B1 (en) * 2012-02-15 2014-05-27 Symantec Corporation Using multiple clients for data backup
CN111045843A (en) * 2019-11-01 2020-04-21 河海大学 Distributed data processing method with fault tolerance capability
CN111490795A (en) * 2020-05-25 2020-08-04 南京大学 Intermediate value length isomerism-oriented encoding MapReduce method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2011134285A1 (en) * 2010-04-29 2011-11-03 中科院成都计算机应用研究所 Distributed self-adaptive coding and storage method
US8738581B1 (en) * 2012-02-15 2014-05-27 Symantec Corporation Using multiple clients for data backup
CN103106253A (en) * 2013-01-16 2013-05-15 西安交通大学 Data balance method based on genetic algorithm in MapReduce calculation module
CN111045843A (en) * 2019-11-01 2020-04-21 河海大学 Distributed data processing method with fault tolerance capability
CN111490795A (en) * 2020-05-25 2020-08-04 南京大学 Intermediate value length isomerism-oriented encoding MapReduce method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A survey of coding techniques for improving the performance of large-scale distributed machine learning; Wang Yan; Li Nianshuang; Wang Xiling; Zhong Fengyan; Journal of Computer Research and Development (Issue 03); full text *

Also Published As

Publication number Publication date
CN113434299A (en) 2021-09-24

Similar Documents

Publication Publication Date Title
CN111062472B (en) Sparse neural network accelerator based on structured pruning and acceleration method thereof
CN111382844B (en) Training method and device for deep learning model
CN101510781B (en) Method and device for filling dummy argument for interlace and de-interlace process as well as processing system
JPH11259441A (en) All-to-all communication method for parallel computer
CN111490795B (en) Intermediate value length isomerism-oriented encoding MapReduce method
WO2022134465A1 (en) Sparse data processing method for accelerating operation of re-configurable processor, and device
CN111104215A (en) Random gradient descent optimization method based on distributed coding
CN113434299B (en) Coding distributed computing method based on MapReduce framework
CN107800700B (en) Router and network-on-chip transmission system and method
CN116842998A (en) Distributed optimization-based multi-FPGA collaborative training neural network method
CN113505021B (en) Fault tolerance method and system based on multi-master-node master-slave distributed architecture
US20200242724A1 (en) Device and method for accelerating graphics processor units, and computer readable storage medium
CN112799852B (en) Multi-dimensional SBP distributed signature decision system and method for logic node
CN110766136B (en) Compression method of sparse matrix and vector
CN115103031B (en) Multistage quantization and self-adaptive adjustment method
US11297127B2 (en) Information processing system and control method of information processing system
CN112769522B (en) Partition structure-based encoding distributed computing method
CN117574966B (en) Model quantization method, device, electronic equipment and storage medium
CN113722666B (en) Application specific integrated circuit chip and method, block chain system and block generation method
CN111966404B (en) GPU-based regular sparse code division multiple access SCMA high-speed parallel decoding method
CN113704681B (en) Data processing method, device and super computing system
CN110598175B (en) Sparse matrix column vector comparison device based on graph computation accelerator
CN113052332B (en) Distributed model parallel equipment distribution optimization method based on equipment balance principle
JP3524430B2 (en) Reduction processing method for parallel computers
JP2019086976A (en) Information processing system, arithmetic processing unit, and control method for information processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant