CN112612601A - Intelligent model training method and system for distributed image recognition - Google Patents

Intelligent model training method and system for distributed image recognition Download PDF

Info

Publication number
CN112612601A
Authority
CN
China
Prior art keywords
matrix
task
edge
result
pool
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011419005.XA
Other languages
Chinese (zh)
Inventor
李领治
成聪
王进
谷飞
戴欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou University
CERNET Corp
Original Assignee
Suzhou University
CERNET Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou University, CERNET Corp filed Critical Suzhou University
Priority to CN202011419005.XA priority Critical patent/CN112612601A/en
Publication of CN112612601A publication Critical patent/CN112612601A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an intelligent model training method and system for distributed image recognition, which comprise the steps of creating a task pool and a result pool on an edge server; each edge working node randomly acquiring a task from the task pool; calculating the task, with the edge working node putting the result into the result pool; and the edge server taking all results from the result pool and integrating them into a final result. The invention has the advantages of short training time, high robustness and security, and the ability to run on edge embedded devices.

Description

Intelligent model training method and system for distributed image recognition
Technical Field
The invention relates to the technical field of artificial intelligence and image recognition, in particular to an intelligent model training method and system for distributed image recognition.
Background
Deep learning has been studied extensively in edge computing, and efficient image processing technologies are generally built on deep learning algorithms. How to deploy deep learning Convolutional Neural Networks (CNNs) on edge equipment and use edge embedded devices to train image classification models in a distributed manner therefore has broad application prospects.
Training deep learning models on edge embedded devices faces several challenges: first, communication delay affects the time required for the master node to transmit data to the edge nodes; second, communication bandwidth limits the amount of data transferred from the master node to the edge nodes per unit time; third, the computing power of the edge nodes bounds the computing power of the whole computing system; fourth, training a CNN model on a single edge embedded device is difficult to realize. In short, edge devices are less computationally powerful than cloud computing resources, and multiple edge embedded devices must be combined using distributed computing methods to increase their computational power.
Data parallelism and model parallelism are two common methods for improving the computational capability of distributed deep learning. Data parallelism copies an untrained model onto multiple computing devices, so that they all hold the same model, and then partitions the data set among them for parallel training. Data parallelism can speed up the training of CNN models, but requires that each worker device be able to train the entire model independently; typical edge embedded devices cannot meet this requirement. Model parallelism divides a large model into different parts deployed on multiple computing devices, so that a large model that cannot be trained on one computer can be trained in a distributed environment. However, edge devices are prone to straggling, which may cause the model-parallel training process to fail. Therefore, neither of these two distributed deep learning methods is suitable for edge embedded devices, and a new distributed computing method must be designed to train CNN models on edge embedded devices.
Disclosure of Invention
Therefore, the technical problem to be solved by the present invention is to overcome the fact that prior-art methods are not suitable for running on edge embedded devices, and to provide an intelligent model training method and system for distributed image recognition that has short training time, high robustness and security, and can run on edge embedded devices.
In order to solve the technical problem, the intelligent model training method for distributed image recognition comprises the following steps: creating a task pool and a result pool on an edge server; each edge working node randomly acquires a task from the task pool; calculating the task, and putting a result into the result pool by the edge working node; the edge server takes all results from the result pool and integrates all results into a final result.
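As a concrete illustration of these four steps, the following minimal Python sketch shows the task-pool / result-pool workflow under simplifying assumptions (a single process with in-memory queues and threads); all names such as make_pools, worker_loop and collect are illustrative and not taken from the patent.

    import queue
    import threading
    import numpy as np

    def make_pools(encoded_blocks):
        # Edge server: create a task pool holding matrix-block tasks and an empty result pool.
        task_pool, result_pool = queue.Queue(), queue.Queue()
        for task_id, block in enumerate(encoded_blocks):
            task_pool.put((task_id, block))
        return task_pool, result_pool

    def worker_loop(task_pool, result_pool, B):
        # Edge worker node: actively pull a task, compute it, and push the result.
        while True:
            try:
                task_id, block = task_pool.get_nowait()   # active acquisition, not passive assignment
            except queue.Empty:
                return
            result_pool.put((task_id, block @ B))         # compute the sub-multiplication

    def collect(result_pool, n_needed):
        # Edge server: take results from the result pool for later integration.
        results = {}
        while len(results) < n_needed:
            task_id, res = result_pool.get()
            results[task_id] = res
        return results

    blocks = [np.random.rand(4, 8) for _ in range(6)]   # stand-ins for encoded first-matrix blocks
    B = np.random.rand(8, 3)                            # stand-in for the second (weight) matrix
    tp, rp = make_pools(blocks)
    workers = [threading.Thread(target=worker_loop, args=(tp, rp, B)) for _ in range(3)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(len(collect(rp, len(blocks))), "results ready for integration")

In a real deployment the pools would live on the edge server and the workers would be separate edge embedded devices communicating over the network; the sketch only captures the pull-based task flow.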
In one embodiment of the present invention, the method for calculating the task is as follows: processing image data to convert the image data into a first matrix, converting weight parameters into a second matrix, uniformly dividing the first matrix into a plurality of first matrix blocks, and then coding the first matrix blocks by using a coding matrix; making a calculation task; distributing the computing task to the edge working node; the results returned from the edge worker nodes are collected and merged into the result of multiplying the first and second matrices.
In one embodiment of the invention, the first matrix block is redundantly encoded when the first matrix block is encoded using an encoding matrix.
In one embodiment of the present invention, a method for performing redundancy coding on a first matrix block is as follows: dividing the first matrix into n smaller first matrix blocks, then performing redundant coding on the first matrix blocks, and coding the n first matrix blocks by using the coding matrix; finally, m encoded second matrix blocks are obtained, where m > n.
In one embodiment of the invention, the second matrix block is assigned to the edge worker node for computation.
In an embodiment of the present invention, after the second matrix block is calculated, the calculation result is sent to the edge server, and the second matrix block is decoded.
In one embodiment of the invention, after decoding of the second matrix block is completed, all results are integrated into the final result.
In one embodiment of the present invention, the method for decoding the second matrix block comprises: the second matrix block is multiplied by a coding matrix, and the coding matrix is an inverse matrix of a matrix composed of encoded vectors of the n task results.
In one embodiment of the invention, the encoding matrix is stored on the edge server.
The invention also provides an intelligent model training system for distributed image recognition, which comprises: the pool creating module is used for creating a task pool and a result pool on the edge server; the task acquisition module is used for randomly acquiring a task from the task pool by each edge working node; the computing module is used for computing the task, and the edge working node puts the result into the result pool; and the integration module is used for taking all results from the result pool by the edge server and integrating all the results into a final result.
Compared with the prior art, the technical scheme of the invention has the following advantages:
the present invention creates two pools on the edge server: the computing tasks are distributed across a plurality of nodes, which reduces the training time; and the edge embedded devices are not passively assigned tasks but actively acquire tasks from the task pool. Under this mechanism, the more powerful an edge embedded device is, the more computation tasks it completes, so the training system designed by the invention runs efficiently, the return of computation results is accelerated, and the training time is reduced.
The system is easy to implement because it does not need to set tasks of different sizes for edge embedded devices of different performance, and does not need to determine which edge embedded device is a straggler node. In addition, because the edge embedded devices actively acquire tasks from the task pool, the original computing task only needs to be uniformly partitioned when the distributed computation starts, and the tasks in the task pool are all of the same size.
The operation of the system is not affected by straggling edge embedded devices, which improves the robustness of the system. Although a straggling edge embedded device takes longer than other nodes to return its computation result, its computation task does not have to be abandoned. In a new round of distributed computation, powerful nodes complete a large number of subtasks, while straggler nodes complete only a few. The design of the edge distributed computing system thus successfully exploits the computing resources of the straggler nodes and increases the computing resources of the whole system.
Because data transmission and task allocation are carried out in encoded form, the security of the system can be ensured. The invention protects the image data both while it is transmitted from the edge server to the edge embedded devices and while the edge embedded devices execute the computation tasks.
Drawings
In order that the present disclosure may be more readily and clearly understood, reference is now made to the following detailed description of the embodiments of the present disclosure taken in conjunction with the accompanying drawings, in which
FIG. 1 is a framework diagram of the intelligent model training method for distributed image recognition of the present invention;
FIG. 2 is a process diagram of the edge server authoring, distribution and collection of computing tasks in accordance with the present invention;
FIG. 3 is a diagram of the process of the present invention for redundantly encoding a large matrix, split into n blocks, using an m × n Vandermonde matrix;
FIG. 4 is a diagram of a process of the present invention for decoding n result matrix blocks;
FIGS. 5 a-5 c are graphs of experimental results of model training of 10 distributed processes with delay obeying uniform, exponential and normal distributions and with different ranges, different variances and different expectations using the MNIST data set in accordance with the present invention;
FIG. 6 is a block diagram of an intelligent model training system for distributed image recognition in accordance with the present invention;
FIG. 7 is a schematic diagram of the forward and backward propagation of the CNN of the present invention on an edge server;
FIG. 8 is an exemplary diagram of the convolution conversion to matrix multiplication of the present invention;
FIG. 9 is a flow chart of the LT-code strategy based matrix encoding algorithm of the present invention;
FIG. 10 is a diagram of the LT-code strategy based matrix decoding process of the present invention.
Detailed Description
Example one
As shown in fig. 1 and fig. 2, the present embodiment provides an intelligent model training method for distributed image recognition, comprising: step S1: creating a task pool and a result pool on an edge server; step S2: each edge working node randomly acquires a task from the task pool; step S3: calculating the task, and putting the result into the result pool by the edge working node; step S4: the edge server takes all results from the result pool and integrates all results into a final result.
In the intelligent model training method for distributed image recognition of this embodiment, in step S1, a Task Pool and a Result Pool are created on an Edge Server; providing two pools makes it easy to divide a round of computation into distinct steps. In step S2, each Edge Worker node randomly obtains a task from the task pool. Because each edge worker node is an edge embedded device, and the edge embedded devices are not passively assigned tasks but actively obtain them from the task pool, under this mechanism the more powerful an edge embedded device is, the more computation tasks it completes, so the training system designed by the present invention operates efficiently, the return of computation results is accelerated, and the training time is reduced. In addition, the original computing task only needs to be uniformly partitioned when the distributed computation starts, so the tasks in the task pool are all of the same size and there is no need to set tasks of different sizes for edge embedded devices of different performance, which makes the method easy to realize. In step S3, the task is calculated, and the edge working node puts the result into the result pool. In step S4, the edge server takes all the results from the result pool and integrates them into the final result. The method therefore not only has short training time, high robustness and security, but can also run on edge embedded devices.
The method for calculating the task is as follows: processing the image data to convert it into a first matrix, converting the weight parameters into a second matrix, uniformly dividing the first matrix into a plurality of first matrix blocks, and then encoding the first matrix blocks using a coding matrix; creating computation tasks; distributing the computation tasks to the edge working nodes; and collecting the results returned from the edge worker nodes and combining them into the result of multiplying the first matrix by the second matrix. In addition, the invention protects the image data both while it is transmitted from the edge server to the edge embedded devices and while the edge embedded devices execute the computation tasks.
Specifically, the edge server vectorizes the neural network to obtain a large number of matrix multiplications. In the forward propagation of the convolutional layer and the fully connected layer, the image data is converted into the first matrix A, and the weight parameters are converted into the second matrix B, so that protecting the first matrix A is equivalent to protecting the image data.
When the first matrix block is encoded using the encoding matrix, the first matrix block is redundantly encoded.
The method for performing redundancy coding on the first matrix block comprises the following steps: dividing the first matrix into n smaller first matrix blocks, then performing redundant coding on the first matrix blocks, and coding the n first matrix blocks by using the coding matrix; finally, m encoded second matrix blocks are obtained, where m > n.
Specifically, as shown in fig. 2, the multiplication of the first matrix A and the second matrix B is carried out as a distributed computation, as designed: the first matrix A is uniformly divided into a number of small matrix blocks, which are then encoded using a coding matrix. The size of the small matrix blocks is determined by the largest matrix multiplication that the worst-performing edge embedded device can carry out; computation tasks are created; the computation tasks are distributed to the edge working nodes; and the results returned from the edge worker nodes are collected and merged into the result of A × B.
Redundant coding has been shown to perform well in distributed computing. To deal with the straggler problem, the original matrix, i.e. the first matrix, is redundantly encoded here. Because the original matrix is encoded, the security of the data is ensured and the original matrix carries redundancy. The specific process of the MDS encoding strategy is shown in fig. 3. Under the MDS strategy a Vandermonde matrix can redundantly encode the original matrix, but the coding vectors in the Vandermonde matrix grow as the coding matrix grows, which also means the amount of computation becomes larger and larger. In order to reduce the amount of computation and the coding complexity, the original matrix is first divided uniformly into n smaller blocks, and redundant coding is then performed on these matrix blocks: an m × n (m > n) coding matrix is used to encode the n small matrix blocks, and finally m encoded matrix blocks are obtained.
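A minimal sketch of this redundant encoding step is given below, assuming a Vandermonde coding matrix as in the MDS strategy and assuming the row count of the first matrix is divisible by n; the helper names vandermonde and encode_blocks are illustrative, not from the patent.

    import numpy as np

    def vandermonde(m, n):
        # Rows are [x_i^0, x_i^1, ..., x_i^(n-1)] for distinct points x_i,
        # so any n rows form an invertible n x n matrix.
        x = np.arange(1, m + 1, dtype=float)
        return np.vander(x, n, increasing=True)

    def encode_blocks(A, n, m):
        blocks = np.split(A, n, axis=0)      # n equal first-matrix blocks
        M = vandermonde(m, n)                # m x n coding matrix, kept on the edge server
        # Each of the m encoded blocks is a linear combination of the n original blocks.
        encoded = [sum(M[i, j] * blocks[j] for j in range(n)) for i in range(m)]
        return M, encoded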
The invention protects the image data both while it is transmitted from the edge server to the edge embedded devices and while the edge embedded devices execute the computation tasks. As can be seen from fig. 2, the first matrix A to be protected is transformed from the image data. If the first matrix A were transmitted in plaintext from the edge server to the edge embedded devices, its content could easily be leaked during transmission and computation on the edge embedded devices. Therefore, the present invention uses the coding matrix to encode the matrix blocks of the first matrix A, thereby protecting the content of the encoded matrix blocks during transmission and computation. Even if an attacker collects all the encoded matrix blocks by intercepting data packets or attacking the edge worker nodes, the original matrix cannot be obtained, because the coding matrix is stored on the edge server and participates in neither the transmission nor the computation on the edge worker nodes. In this way, the present invention secures the first matrix A, i.e., protects the security of the image data.
The second matrix blocks are assigned to the edge worker nodes for computation, i.e., the m encoded matrix blocks are assigned to edge worker nodes. Once the edge server has received n calculation results from the edge embedded devices, the original matrix multiplication result can be decoded.
And after the second matrix block is calculated, the calculation result is sent to the edge server, and the second matrix block is decoded.
And after the second matrix block is decoded, integrating all the results into a final result.
The method for decoding the second matrix blocks is as follows: the returned results are multiplied by a decoding matrix, which is the inverse of the matrix composed of the coding vectors of the n returned task results. Specifically, the m encoded matrix blocks are assigned to the edge worker nodes for computation; once the edge server has received n calculation results from the edge embedded devices, the original matrix multiplication result can be decoded. As can be seen from fig. 4, the decoding matrix is the inverse of the matrix composed of the coding vectors of the n task results. Left-multiplying the returned calculation results by the decoding matrix yields the final result.
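Continuing the encoding sketch above, decoding from any n of the m returned results can be written as follows; the function name decode_results and the use of a plain matrix inverse are illustrative assumptions.

    import numpy as np

    def decode_results(M, returned):
        # returned: dict {task_id: encoded_block @ B}, with at least n = M.shape[1] entries.
        ids = sorted(returned)[: M.shape[1]]          # any n of the m tasks are enough
        D = np.linalg.inv(M[ids, :])                  # decoding matrix: inverse of the n coding rows
        R = [returned[i] for i in ids]
        # Block k of the original A @ B is recovered as sum_j D[k, j] * R[j].
        recovered = [sum(D[k, j] * R[j] for j in range(len(ids))) for k in range(len(ids))]
        return np.vstack(recovered)                   # equals A @ B

For example, with M, encoded = encode_blocks(A, n, m) and returned = {i: encoded[i] @ B for i in any n task ids}, decode_results(M, returned) reproduces A @ B up to floating-point error.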
In order to evaluate the performance of the secure distributed image classification model training system designed by the invention, a time delay model is set up to simulate the delay of edge embedded devices in the distributed computing process. A time delay T is added to each process using the Sleep() function, where T follows, in turn, a uniform, normal, and exponential distribution; then a VGGNet model is trained with the MNIST dataset on edge embedded devices whose 10 distributed processes have delays drawn from the three distributions with different ranges, different variances, and different expectations, and the time the system spends training on a set of 50 images is recorded. The results are shown in fig. 5a, 5b, and 5c.
In fig. 5a, the delays of the 10 distributed processes follow a uniform distribution. Whether the scheme is uncoded or coded, it can be seen from fig. 5a that as the delay range increases, the time taken by the system also increases. The MDS and LT-code strategies perform comparably, and both coding schemes reduce the time cost of the system. In fig. 5b, where the delay follows a normal distribution, the performance of the coding schemes is again better than that of the uncoded scheme. In fig. 5c, the delays of the 10 distributed processes follow an exponential distribution. As the expectation increases, the coding schemes operate more stably than the uncoded scheme. Compared with the uncoded and 2-copy strategies, the MDS and LT-code strategies reduce the time overhead of the system.
The coding schemes also improve the stability of the system. As the delay range increases, the time spent by the coding schemes in fig. 5a increases more slowly than the time spent by the uncoded scheme. Likewise, as can be seen from fig. 5c, as the delay expectation increases, the time-cost curves of the coding schemes fluctuate less and are smoother than that of the uncoded scheme. It follows that the coding schemes can improve the stability of the system when training the image classification model on heterogeneous edge worker nodes with different delays.
Example two
The embodiment provides an intelligent model training system for distributed image recognition, the problem solving principle of the intelligent model training system is similar to that of the intelligent model training method for distributed image recognition, and repeated parts are not repeated.
The intelligent model training system for distributed image recognition of the embodiment comprises:
the pool creating module is used for creating a task pool and a result pool on the edge server;
the task acquisition module is used for randomly acquiring a task from the task pool by each edge working node;
the computing module is used for computing the task, and the edge working node puts the result into the result pool;
and the integration module is used for taking all results from the result pool by the edge server and integrating all the results into a final result.
In this embodiment, matrix multiplication is performed using a distributed computation mode, and the edge server is responsible for distributing computation tasks to edge worker nodes and collecting computation results, in addition to maintaining forward and backward propagation of CNNs. The distribution and recovery scheme of the computing task plays an important role in a safe distributed image classification model training system. Because the operating performance of each edge embedded device is different, it is difficult for the edge server to actively contact all edge embedded devices. How to distribute computing tasks to edge embedded devices determines the performance of the overall system.
As shown in fig. 6, the intelligent model training system for distributed image recognition includes an edge server and a plurality of edge embedded devices.
The left graph is the physical network topology of the system, each edge embedded device communicates independently with the edge server from which it takes tasks and then returns the task results to the edge server in this network topology.
The right diagram shows the working process of the edge server. A CNN model is maintained on the edge server, which is responsible for updating the parameters of the convolutional layers and the fully connected layers. The edge server also creates coded computation tasks based on matrix multiplication and integrates the returned task results. The method of transforming convolution into matrix multiplication introduced by the invention helps the edge server extract computation tasks from the neural network; the computation tasks are then distributed to the edge embedded devices, and the edge server collects their computation results; finally, a round of coded distributed computation is carried out. The image data is passed through the Convolution Layer, Other Layers, and Fully Connected Layers; after the data is convolved, the Results are produced by means of Matrix Multiplication and Coded Distributed Computing, and are then fed back to the Convolution Layer.
Although convolutional neural networks have many different layers, their computational load is mainly concentrated on convolutional and fully-connected layers, and distributed training of CNNs can be translated into distributed execution of convolutional and fully-connected layer computational load. The vectorization of the convolutional neural network is to complete the vectorization of the convolutional layer and the fully-connected layer.
Fig. 7 shows the forward and backward propagation of the CNN on the edge server. In forward propagation, the input image is processed by the convolutional layers and fully connected layers to obtain a predicted value. The deviation between the predicted value and the label value becomes the gradient. In back propagation, the gradient is passed back to update the parameters in the CNN. The forward and backward propagation in the convolutional layer can be converted into forward and backward propagation in the fully connected layer by the method of converting convolution into matrix multiplication. Without this convolution transformation, it would be difficult to implement a distributed CNN training system. Converting convolution into matrix multiplication not only increases the speed of the image convolution operation but also reduces its difficulty.
Fig. 8 shows an example of converting an image convolution into a matrix multiplication using a matrix expansion method. It can be seen that after the convolution kernel window scans the image matrix with a stride of 1, four 2 × 2 matrices are obtained from the 3 × 3 image matrix; each of these is then flattened into a single row, and the convolution kernel is flattened into a single column. Finally, the four rows form a larger matrix, this matrix is multiplied by the flattened convolution kernel, and the result of the matrix multiplication is then reshaped into the result of the convolution operation.
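Under the same 3 × 3 image, 2 × 2 kernel, stride-1 setting, the conversion of fig. 8 can be sketched as below; the helper name im2col and the concrete kernel values are assumptions made only for illustration.

    import numpy as np

    def im2col(image, k, stride=1):
        h, w = image.shape
        rows = []
        for i in range(0, h - k + 1, stride):
            for j in range(0, w - k + 1, stride):
                rows.append(image[i:i + k, j:j + k].reshape(-1))   # each 2x2 window becomes one row
        return np.stack(rows)                                      # here: a 4 x 4 matrix

    image = np.arange(9, dtype=float).reshape(3, 3)
    kernel = np.array([[1., 0.], [0., -1.]])
    out = im2col(image, 2) @ kernel.reshape(-1, 1)   # convolution expressed as matrix multiplication
    conv = out.reshape(2, 2)                         # reshape back to the 2 x 2 convolution result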
The vectorization of forward propagation and backward propagation in CNNs, that is, the vectorization of forward propagation and backward propagation in the fully connected layer, is explained below. The forward propagation of CNN is to compute the prediction value of the training samples. The forward propagation vectorization process is as follows:
Z[l] = W[l] A[l-1] + b[l],  A[l] = g[l](Z[l])    (1)

In formula (1): W[l] is the parameter matrix of the l-th layer of the CNN, b[l] is the bias of the l-th layer, A[l-1] is the output of layer l-1 and also the input of layer l, and g[l] is the activation function of the l-th layer.
The back propagation of the CNN is to transmit the difference value between the predicted value and the true value of the training sample back to the convolutional layer and the full link layer of the network neural network and update the parameters therein. The vectorization process of back propagation is as follows:
dZ[l] = dA[l] * g[l]'(Z[l])    (2)

dW[l] = dZ[l] (A[l-1])^T    (3)

db[l] = sum(dZ[l])    (4)

W[l] := W[l] - α dW[l],  b[l] := b[l] - α db[l]    (5)

In the formulas: d[·] is the gradient of [·], g[l]' is the derivative of the activation function, and [·]^T denotes the transpose used in back propagation. db[l] in formula (4) is obtained by summing the entries of dZ[l] row by row. In formula (5), α represents the learning rate of the CNN.
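A compact NumPy sketch of the vectorized forward and backward pass of one fully connected layer, following formulas (1)-(5), is given below; the choice of ReLU as the activation g and the variable names are assumptions for illustration.

    import numpy as np

    def forward(W, b, A_prev):
        Z = W @ A_prev + b                  # (1): Z[l] = W[l] A[l-1] + b[l]
        A = np.maximum(Z, 0)                # (1): A[l] = g[l](Z[l]), with g = ReLU here
        return Z, A

    def backward(W, b, Z, A_prev, dA, lr):
        dZ = dA * (Z > 0)                   # (2): dZ[l] = dA[l] * g[l]'(Z[l])
        dW = dZ @ A_prev.T                  # (3): dW[l] = dZ[l] (A[l-1])^T
        db = dZ.sum(axis=1, keepdims=True)  # (4): db[l] = sum(dZ[l])
        W -= lr * dW                        # (5): W[l] := W[l] - alpha dW[l]
        b -= lr * db                        # (5): b[l] := b[l] - alpha db[l]
        return W, b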
In addition, in the invention, two strategies of MDS coding and LT coding are adopted for the distributed matrix multiplication.
In the (m, n) MDS encoding strategy, the matrices from the CNN are encoded using a Vandermonde matrix. Any n row vectors of the m × n Vandermonde matrix constitute an invertible n × n matrix. The matrix encoding and returned-result decoding processes are as follows:

E_{m×t} = M_{m×n} × A_{n×t}    (6)

R_{m×s} = E_{m×t} × B_{t×s}    (7)

A_{n×t} × B_{t×s} = (M_{n×n})^(-1) × R_{n×s}    (8)

The matrix M_{m×n} in formula (6) is a Vandermonde matrix; it redundantly encodes the matrix A_{n×t} into the redundantly encoded matrix E_{m×t}, and the encoded matrix E_{m×t} is then multiplied by the matrix B_{t×s} in the distributed environment. In formula (8), the matrix R_{n×s}, formed from any n block rows of the result matrix R_{m×s}, is decoded by the inverse of the corresponding n × n coding submatrix M_{n×n}, finally yielding the result of A_{n×t} × B_{t×s}.
In the LT coding strategy, Luby transform (LT) codes are rateless codes: the number of coded matrix blocks is not fixed and grows with the decoding end's need to decode the overall result. Compared with MDS codes, LT codes have a simple encoding method based on the exclusive-or (XOR, ⊕) operation. In this system, addition and subtraction operations are used instead of the exclusive-or operation, with practically the same performance.
In the encoding phase, a degree d is chosen, representing the number of original matrix blocks involved in producing one coded block: d original matrix blocks are randomly selected from the n original matrix blocks, where d takes values in {1, 2, …, n}. The degree d follows the robust soliton degree distribution ρ(d). The index set S_d of the d selected original matrix blocks satisfies S_d ⊆ {1, 2, …, n}.
The encoding process is shown in the algorithm presented in fig. 9. In the first step of the algorithm, a degree d is drawn from the robust soliton degree distribution ρ(d); for n original matrix blocks, the set of possible values of d is {1, 2, …, n}. Then, d matrix blocks are selected from the original matrix blocks and their sum is computed; the sum of the d matrix blocks is one coded matrix block. By repeating these operations, an endless stream of coded matrix blocks can in theory be obtained. In the algorithm, the number of coded matrix blocks actually generated is determined by a redundancy parameter r, finally giving r × n coded matrix blocks. Specifically, the inputs are: the original matrix blocks M; the number n_m of original matrix blocks; the robust soliton degree distribution ρ(d); and the redundancy parameter r. The outputs are: the coded matrix blocks E; and the number n_e of coded matrix blocks.
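The algorithm of fig. 9 can be sketched as follows; the degree sampler below is a simplified stand-in for the robust soliton distribution ρ(d), whose exact parameters are not given in this text, and the function name lt_encode is illustrative.

    import numpy as np

    def lt_encode(blocks, r, rng=np.random.default_rng(0)):
        # blocks: list of n original matrix blocks; r: redundancy parameter.
        n = len(blocks)
        coded = []
        for _ in range(int(r * n)):                       # generate r * n coded blocks
            d = rng.integers(1, n + 1)                    # degree d in {1, ..., n} (simplified rho(d))
            S_d = rng.choice(n, size=d, replace=False)    # index set S_d of the d chosen blocks
            coded.append((set(S_d.tolist()), sum(blocks[i] for i in S_d)))  # sum replaces XOR
        return coded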
In the decoding stage, decoding starts from the degree-1 blocks and their neighboring blocks. As shown in fig. 10, more and more blocks become degree-1 blocks as the decoding process proceeds. A block of degree 1 is in fact a decoded data block. When all the original blocks have been decoded, the decoding stage is complete.
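Paired with lt_encode above, the peeling decoder of fig. 10 can be sketched as follows; again this is an illustrative sketch rather than the patented implementation.

    def lt_decode(coded, n):
        # coded: list of (index set, coded block) pairs from lt_encode; n: number of original blocks.
        coded = [(set(S), blk.copy()) for S, blk in coded]
        recovered = {}
        progress = True
        while len(recovered) < n and progress:
            progress = False
            for S, blk in coded:                          # a degree-1 block is already decoded
                if len(S) == 1:
                    i = next(iter(S))
                    if i not in recovered:
                        recovered[i] = blk.copy()
                        progress = True
            for S, blk in coded:                          # peel recovered blocks out of the others
                for i in list(S & recovered.keys()):
                    blk -= recovered[i]
                    S.remove(i)
        return [recovered[i] for i in range(n)] if len(recovered) == n else None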
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. And are neither required nor exhaustive of all embodiments. And obvious variations or modifications therefrom are within the scope of the invention.

Claims (10)

1. A distributed image recognition intelligent model training method is characterized by comprising the following steps:
step S1: creating a task pool and a result pool on an edge server;
step S2: each edge working node randomly acquires a task from the task pool;
step S3: calculating the task, and putting a result into the result pool by the edge working node;
step S4: the edge server takes all results from the result pool and integrates all results into a final result.
2. The intelligent model training method for distributed image recognition according to claim 1, wherein: the task calculating method comprises the following steps: processing image data to convert the image data into a first matrix, converting weight parameters into a second matrix, uniformly dividing the first matrix into a plurality of first matrix blocks, and then coding the first matrix blocks by using a coding matrix; making a calculation task; distributing the computing task to the edge working node; the results returned from the edge worker nodes are collected and merged into the result of multiplying the first and second matrices.
3. The intelligent model training method for distributed image recognition according to claim 2, wherein: when the first matrix block is encoded using the encoding matrix, the first matrix block is redundantly encoded.
4. The intelligent model training method for distributed image recognition according to claim 3, wherein: the method for performing redundancy coding on the first matrix block comprises the following steps: dividing the first matrix into n smaller first matrix blocks, then performing redundant coding on the first matrix blocks, and coding the n first matrix blocks by using the coding matrix; finally, m encoded second matrix blocks are obtained, where m > n.
5. The intelligent model training method for distributed image recognition according to claim 4, wherein: the second matrix block is assigned to the edge worker node for computation.
6. The intelligent model training method for distributed image recognition according to claim 5, wherein: and after the second matrix block is calculated, the calculation result is sent to the edge server, and the second matrix block is decoded.
7. The intelligent model training method for distributed image recognition according to claim 6, wherein: and after the second matrix block is decoded, integrating all the results into a final result.
8. The intelligent model training method for distributed image recognition according to claim 6 or 7, wherein: the method for decoding the second matrix block comprises the following steps: the second matrix block is multiplied by a coding matrix, and the coding matrix is an inverse matrix of a matrix composed of encoded vectors of the n task results.
9. The intelligent model training method for distributed image recognition according to claim 1, wherein: the encoding matrix is stored on the edge server.
10. An intelligent model training system for distributed image recognition, characterized by comprising:
the pool creating module is used for creating a task pool and a result pool on the edge server;
the task acquisition module is used for randomly acquiring a task from the task pool by each edge working node;
the computing module is used for computing the task, and the edge working node puts the result into the result pool;
and the integration module is used for taking all results from the result pool by the edge server and integrating all the results into a final result.
CN202011419005.XA 2020-12-07 2020-12-07 Intelligent model training method and system for distributed image recognition Pending CN112612601A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011419005.XA CN112612601A (en) 2020-12-07 2020-12-07 Intelligent model training method and system for distributed image recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011419005.XA CN112612601A (en) 2020-12-07 2020-12-07 Intelligent model training method and system for distributed image recognition

Publications (1)

Publication Number Publication Date
CN112612601A true CN112612601A (en) 2021-04-06

Family

ID=75229567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011419005.XA Pending CN112612601A (en) 2020-12-07 2020-12-07 Intelligent model training method and system for distributed image recognition

Country Status (1)

Country Link
CN (1) CN112612601A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158243A (en) * 2021-04-16 2021-07-23 苏州大学 Distributed image recognition model reasoning method and system
WO2023016309A1 (en) * 2021-08-09 2023-02-16 International Business Machines Corporation Distributed machine learning in edge computing

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109889575A (en) * 2019-01-15 2019-06-14 北京航空航天大学 Cooperated computing plateform system and method under a kind of peripheral surroundings
CN110336643A (en) * 2019-07-05 2019-10-15 苏州大学 A kind of data processing method based on edge calculations environment
CN111125752A (en) * 2019-12-04 2020-05-08 苏州大学 Privacy protection method in edge computing environment
CN111381950A (en) * 2020-03-05 2020-07-07 南京大学 Task scheduling method and system based on multiple copies for edge computing environment
CN111510774A (en) * 2020-04-21 2020-08-07 济南浪潮高新科技投资发展有限公司 Intelligent terminal image compression algorithm combining edge calculation and deep learning
CN111552564A (en) * 2020-04-23 2020-08-18 中南大学 Task unloading and resource optimization method based on edge cache

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109889575A (en) * 2019-01-15 2019-06-14 北京航空航天大学 Cooperated computing plateform system and method under a kind of peripheral surroundings
CN110336643A (en) * 2019-07-05 2019-10-15 苏州大学 A kind of data processing method based on edge calculations environment
CN111125752A (en) * 2019-12-04 2020-05-08 苏州大学 Privacy protection method in edge computing environment
CN111381950A (en) * 2020-03-05 2020-07-07 南京大学 Task scheduling method and system based on multiple copies for edge computing environment
CN111510774A (en) * 2020-04-21 2020-08-07 济南浪潮高新科技投资发展有限公司 Intelligent terminal image compression algorithm combining edge calculation and deep learning
CN111552564A (en) * 2020-04-23 2020-08-18 中南大学 Task unloading and resource optimization method based on edge cache

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黄永忠 et al.: "面向对象方法与技术基础" (Fundamentals of Object-Oriented Methods and Technology), National Defense Industry Press, page 302 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113158243A (en) * 2021-04-16 2021-07-23 苏州大学 Distributed image recognition model reasoning method and system
WO2023016309A1 (en) * 2021-08-09 2023-02-16 International Business Machines Corporation Distributed machine learning in edge computing
US11770305B2 (en) 2021-08-09 2023-09-26 International Business Machines Corporation Distributed machine learning in edge computing

Similar Documents

Publication Publication Date Title
Reisizadeh et al. Coded computation over heterogeneous clusters
Li et al. Coding for distributed fog computing
Karakus et al. Straggler mitigation in distributed optimization through data encoding
Kiani et al. Exploitation of stragglers in coded computation
Lee et al. Coded computation for multicore setups
CN109993299B (en) Data training method and device, storage medium and electronic device
Das et al. C 3 LES: Codes for coded computation that leverage stragglers
Ozfatura et al. Gradient coding with clustering and multi-message communication
CN112612601A (en) Intelligent model training method and system for distributed image recognition
CN110298446A (en) The deep neural network compression of embedded system and accelerated method and system
CN113158243A (en) Distributed image recognition model reasoning method and system
CN116644804B (en) Distributed training system, neural network model training method, device and medium
CN114580636A (en) Neural network lightweight deployment method based on three-target joint optimization
CN104281636B (en) The concurrent distributed approach of magnanimity report data
Hong et al. Chebyshev polynomial codes: Task entanglement-based coding for distributed matrix multiplication
CN112799852B (en) Multi-dimensional SBP distributed signature decision system and method for logic node
US20220036190A1 (en) Neural network compression device
CN113505021A (en) Fault-tolerant method and system based on multi-master-node master-slave distributed architecture
CN116681127B (en) Neural network model training method and device, electronic equipment and storage medium
Brennsteiner et al. A real-time deep learning OFDM receiver
CN114844781B (en) Method and system for optimizing Shuffle performance for encoding MapReduce under Rack architecture
CN107015946A (en) Distributed high-order SVD and its incremental computations a kind of method
WO2009083110A1 (en) Workload distribution in parallel computing systems
CN113505881A (en) Distributed neural network training method, device and medium for heterogeneous equipment
CN115515181B (en) Distributed computing method and system based on network coding in wireless environment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210406