CN110928481A - Distributed deep neural network and storage method of parameters thereof - Google Patents

Distributed deep neural network and storage method of parameters thereof

Info

Publication number
CN110928481A
CN110928481A (application CN201811092458.9A)
Authority
CN
China
Prior art keywords
parameter
blocks
memory
neural network
deep neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811092458.9A
Other languages
Chinese (zh)
Inventor
何东杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Unionpay Co Ltd
Original Assignee
China Unionpay Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Unionpay Co Ltd filed Critical China Unionpay Co Ltd
Priority to CN201811092458.9A priority Critical patent/CN110928481A/en
Publication of CN110928481A publication Critical patent/CN110928481A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/067 - Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a method for storing parameters in a distributed deep neural network. The method comprises the following steps: a management node constructs the parameters required by the network according to the deep neural network and divides the parameters into M parameter blocks; M parameter service nodes are set, each parameter service node storing one of the M parameter blocks, wherein M is a natural number; the number of copies is set to N, and the memory on each parameter service node is divided into N memory blocks, of which 1 memory block stores one of the M parameter blocks as the main parameter data block and the remaining N-1 memory blocks respectively store copies of N-1 other parameter blocks, wherein N is a natural number. By adopting distributed multiple copies, the invention ensures efficient access to the parameter data while also providing high availability.

Description

Distributed deep neural network and storage method of parameters thereof
Technical Field
The invention relates to a computer technology, in particular to a distributed deep neural network and a parameter storage method thereof.
Background
In recent years, with the development of the mobile internet, mobile payment has grown rapidly. At the same time, with the rapid development of big data and artificial intelligence technology, more and more deep neural network algorithms are used by enterprises for customer analysis, marketing analysis, business decision-making and the like. Open-source deep learning frameworks represent the leading edge of this technology and have already been applied in many enterprises, including TensorFlow, Caffe, PyTorch and the like.
Under the deep neural network operating mechanism, the operation nodes are divided into parameter service nodes and gradient calculation nodes (workers). The gradient calculation nodes read training data and compute gradients, while the parameter service nodes are responsible for updating and distributing the parameters. Generally speaking, training a model requires many iterations. Because the parameters are large and data access must be efficient, each parameter service node stores part of the parameters and keeps the parameter data in memory. During operation, if a parameter service node fails, the parameter data may be lost, causing the model training task to fail.
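To make this mechanism concrete, the following is a minimal sketch; the plain SGD update rule, the class name, and the variable names are assumptions made for illustration and do not come from the patent:

```python
import numpy as np

# Minimal sketch of the parameter-server mechanism described above:
# workers compute gradients, the parameter service node applies updates
# to its in-memory shard and distributes parameters back to workers.

class ParameterServiceNode:
    def __init__(self, params: np.ndarray, lr: float = 0.01):
        self.params = params           # parameter shard kept in memory
        self.lr = lr

    def apply_gradient(self, grad: np.ndarray) -> None:
        self.params -= self.lr * grad  # update step (assumed plain SGD)

    def pull(self) -> np.ndarray:
        return self.params.copy()      # distribute parameters to workers

ps = ParameterServiceNode(np.zeros(4))
worker_grad = np.array([0.5, -0.2, 0.1, 0.0])  # gradient from one worker
ps.apply_gradient(worker_grad)
print(ps.pull())
```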
Disclosure of Invention
In view of the above problems, the present invention aims to provide a distributed deep neural network and a method for storing its parameters which, by optimizing and improving the parameter service nodes in a distributed deep learning framework, allow the model training task to continue running when a parameter service node fails.
The method for storing the parameters in the distributed deep neural network is characterized by comprising the following steps:
a parameter dividing step, in which a management node divides parameters required in a network constructed according to a deep neural network into M parameter blocks; and
a parameter service node setting step, setting M parameter service nodes, each parameter service node storing one of M parameter blocks,
wherein M is a natural number.
Optionally, before the parameter dividing step, the method further comprises:
and a parameter node number estimation step of calculating the memory space required for storage according to the scale of the deep neural network at run time, and combining the number of backup copies and the memory size of each node to obtain the minimum required number M of parameter service nodes, wherein the actual memory space must be larger than the memory space required for multi-copy storage.
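A minimal sketch of this estimate is shown below, using purely hypothetical figures for the parameter size, copy count, and per-node memory; none of these values appear in the disclosure:

```python
# Hypothetical estimate of the minimum number of parameter service nodes M.
# All figures below are illustrative assumptions, not values from the patent.

def min_parameter_nodes(param_bytes: int, copies_n: int, node_mem_bytes: int,
                        headroom: float = 1.1) -> int:
    """Smallest M such that M nodes can hold N copies of the parameters,
    with `headroom` keeping actual memory above what multi-copy storage
    strictly requires."""
    total_needed = param_bytes * copies_n * headroom
    m = -(-int(total_needed) // node_mem_bytes)  # ceiling division
    return max(m, copies_n)  # at least N nodes are needed for N distinct copies

# Example: ~40 GB of parameters, 3 copies, 32 GB usable memory per node.
print(min_parameter_nodes(40 * 2**30, 3, 32 * 2**30))  # -> 5
```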
Optionally, further comprising:
a copy of the N parameter service nodes is set,
the memory on each parameter service node is divided into N blocks of memory,
in the N block memory blocks, 1 block memory block stores one parameter number block in M block parameters as a main parameter data block, and the rest N-1 block memory blocks respectively store the copies of the rest N-1 parameter blocks, wherein N is a natural number.
Optionally, the parameters are divided into M equal parameter blocks, and the memory on each parameter service node is divided into N equal memory blocks.
Optionally, the N memory blocks store N non-repeating parameter blocks,
and the parameter service nodes are divided into N groups, wherein each group covers the M parameter blocks.
Optionally, the management node randomly distributes copies of the parameter blocks to the remaining parameter nodes according to the number of copies of the parameter blocks, so that the parameter blocks stored on each parameter node do not repeat.
Optionally, after updating the main parameter data block, each parameter service node synchronizes the parameters of the main parameter data block to the copies on the remaining N-1 parameter service nodes.
Optionally, each parameter service node persists the data of the main parameter data block to the shared storage while updating the main parameter data block.
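A hedged sketch of this update, synchronization, and persistence flow follows; the in-process Node objects, the dict standing in for shared storage, and all names are assumptions made for illustration rather than the patent's implementation:

```python
import numpy as np

# Illustrative flow: a node updates its primary block, persists it to shared
# storage, then pushes the new data to the N-1 replica holders.
# Simple in-process objects and dicts stand in for real nodes and storage.

class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.primary = None
        self.replicas = {}  # primary owner id -> copy of that block

def update_primary(node: Node, replica_holders: list, shared_storage: dict,
                   new_params: np.ndarray) -> None:
    node.primary = new_params                         # 1. update primary block
    shared_storage[node.node_id] = new_params.copy()  # 2. persist while updating
    for holder in replica_holders:                    # 3. sync to N-1 replicas
        holder.replicas[node.node_id] = new_params.copy()

nodes = [Node(i) for i in range(3)]  # example with M = 3, N = 3
storage = {}
update_primary(nodes[0], replica_holders=nodes[1:], shared_storage=storage,
               new_params=np.array([0.1, 0.2, 0.3]))
print(nodes[1].replicas[0], storage[0])
```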
Optionally, setting M distributed memory service nodes,
the parameter service nodes are only responsible for the update and access operations on the parameter blocks, while the parameters of the parameter blocks are stored distributedly in the M distributed memory service nodes.
Optionally, the parameter service node persists the parameters of the parameter block to a shared storage through the distributed memory service node.
The distributed deep neural network of an aspect of the present invention is characterized by comprising:
the management node is used for constructing parameters required in the network according to the deep neural network and dividing the parameters into M parameter blocks; and
M parameter service nodes, wherein each parameter service node stores one of the M parameter blocks, and M is a natural number.
Optionally, the memory space required for storage is calculated according to the scale of the deep neural network at run time, and the minimum required number of parameter service nodes is obtained by combining the number of backup copies and the memory size of each node, wherein the actual memory space must be larger than the memory space required for multi-copy storage.
Optionally, further comprising:
the number of copies of the parameter service nodes is N,
the M parameter service nodes are each provided with a memory, and the memory of each parameter service node is divided into N memory blocks,
and among the N memory blocks, 1 memory block stores one of the M parameter blocks as the main parameter data block, while the remaining N-1 memory blocks respectively store copies of N-1 other parameter blocks,
wherein N is a natural number.
Optionally, the N memory blocks store N non-repeating parameter blocks,
and the parameter service nodes are divided into N groups, wherein each group covers the M parameter blocks.
Optionally, the management node randomly distributes copies of the parameter blocks to the remaining parameter nodes according to the number of copies of the parameter blocks, so that the parameter blocks stored on each parameter node do not repeat.
Optionally, each parameter service node synchronizes the parameters of the main parameter data block to the copies of the remaining N-1 parameter service nodes after updating the main parameter data block.
Optionally, the method further comprises:
a shared storage for storing the data of the main parameter data blocks from the parameter service nodes.
Optionally, the method further comprises:
M distributed memory service nodes for storing, in a distributed manner, the parameters of the parameter blocks from the M parameter service nodes.
Optionally, the method further comprises:
a shared storage for storing the parameters of the parameter blocks from the M distributed memory service nodes.
A computer-readable storage medium of an aspect of the present invention on which a computer program is stored is characterized in that the program, when executed by a processor, implements the above-described storage method of parameters in a distributed deep neural network.
A computer device according to an aspect of the present invention includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and is characterized in that the processor implements the above-described method for storing parameters in the distributed deep neural network when executing the computer program.
As described above, according to the distributed deep neural network and the method for storing its parameters of an aspect of the present invention, using distributed multiple copies ensures efficient access to the parameter data while also providing high availability. By persisting the parameter data to local shared storage, data availability is further improved. Because the parameter data is divided into blocks, data synchronization is done entirely in units of data blocks, which improves synchronization efficiency. Furthermore, dividing the parameter data into N parts provides a concise data partitioning and efficient access, so that the system can tolerate failed nodes of up to 2/3 of the total, and at least N-1.
Drawings
Fig. 1 is a schematic diagram showing a storage method of parameters in a distributed deep neural network according to an embodiment of the present invention.
Fig. 2 is a schematic diagram showing a storage method of parameters in the distributed deep neural network according to still another embodiment of the present invention.
Detailed Description
The following description is of some of the several embodiments of the invention and is intended to provide a basic understanding of the invention. It is not intended to identify key or critical elements of the invention or to delineate the scope of the invention.
The main technical scheme of the parameter storage method in the distributed deep neural network in one aspect of the invention is as follows:
calculating the memory space required for storage according to the scale of the deep neural network at run time, then combining the number of backup copies and the memory size of each node to obtain the minimum required number of parameter service nodes, while ensuring that the actual memory space is slightly larger than the memory space required for multi-copy storage;
setting the number of the parameter service nodes as M and the number of the copies as N;
dividing the parameters into M approximately equal blocks, wherein each parameter service node stores one of the M parameter data blocks;
dividing the memory on each parameter service node into N approximately equal blocks, wherein 1 block stores the main parameter data block and the remaining space stores copies of N-1 other parameter blocks, that is, each parameter service node synchronizes N-1 parameter data blocks from other nodes;
distributing the parameter blocks according to a random distribution principle subject to the following constraints: the memory blocks can be regarded as an M x N matrix, each of the N rows holds the M parameter blocks without repetition, so that the parameter service nodes can be roughly divided into N groups, and each group essentially covers all M parameter blocks (a layout satisfying these constraints is sketched after this list);
each parameter service node updates the main parameter data block and simultaneously persists the parameter data into a shared storage;
after each parameter service node updates the main parameter data block, the main parameter data is synchronized to the rest N-1 replica nodes,
wherein M and N are natural numbers.
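As one concrete layout that satisfies these constraints, the following sketch uses a deterministic circular shift; this is an illustrative assumption, since the patent itself distributes the copies randomly:

```python
# Sketch of a memory layout satisfying the constraints above, viewed as an
# M x N matrix: column i is node i, row 0 holds the primary blocks, and each
# row contains all M parameter blocks without repetition. The circular shift
# is an assumption chosen for clarity; the patent uses random allocation.

def block_layout(m: int, n: int):
    assert n <= m, "each node can only hold N distinct blocks if N <= M"
    # layout[row][node] = index of the parameter block stored in that slot
    return [[(node + row) % m for node in range(m)] for row in range(n)]

M, N = 5, 3
for row_idx, row in enumerate(block_layout(M, N)):
    kind = "primary" if row_idx == 0 else f"replica {row_idx}"
    print(f"{kind:>9}: {row}")
# primary  : [0, 1, 2, 3, 4]
# replica 1: [1, 2, 3, 4, 0]
# replica 2: [2, 3, 4, 0, 1]
```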
Thus, according to the method for storing parameters in a distributed deep neural network, using distributed multiple copies ensures efficient access to the parameter data while also providing high availability. Secondly, by persisting the parameter data to local shared storage, data availability is further improved. In addition, because the parameter data is divided into blocks, data synchronization is done entirely in units of data blocks, which improves synchronization efficiency. Furthermore, dividing the parameter data into N parts provides a concise data partitioning and efficient access, so that the system can tolerate failed nodes of up to 2/3 of the total, and at least N-1.
Next, a method of storing parameters in a deep neural network and a storage system thereof according to an embodiment of the present invention will be described.
Fig. 1 is a schematic diagram showing a storage method of parameters in a distributed deep neural network according to an embodiment of the present invention.
As shown in fig. 1, the distributed deep neural network according to the first embodiment of the present invention includes a management node 100, M parameter service nodes (i.e., parameter service node 1, parameter service node 2 ... parameter service node M in fig. 1), and a shared storage 200, where M is a natural number.
The management node 100 divides the parameters required by the deep neural network for network construction into M parameter blocks of substantially equal size (i.e., parameter block 1, parameter block 2 ... parameter block M). As shown in fig. 1, each parameter service node stores one of the M parameter blocks; that is, in fig. 1, parameter block 1 is stored by parameter service node 1, and parameter block 2 is stored by parameter service node 2.
Moreover, the management node 100 randomly distributes the copies to the remaining nodes according to the number of copies, ensuring that the parameter blocks stored on each parameter node do not repeat. For example: parameter service node 1 stores parameter blocks a and b as copies in addition to parameter block 1, parameter service node 2 stores parameter blocks c and d as copies in addition to parameter block 2, and parameter service node M stores parameter blocks e and f as copies in addition to parameter block M. This is equivalent to dividing the memory on each parameter service node into 3 approximately equal blocks (i.e., N = 3), where 1 block stores the main parameter data block and the remaining space stores copies of 2 (N-1) parameter blocks; that is, each parameter service node synchronizes 2 (N-1) parameter data blocks from other nodes. The management node 100 coordinates the synchronization of the copy parameter blocks between the parameter service nodes according to the copy distribution.
In the present invention, all parameter blocks are stored in memory, which ensures the efficiency of updating and accessing the parameter data. Moreover, after the main parameter block stored by each parameter service node is updated, the update is synchronously persisted to the shared storage 200; for example, the update may be recorded in the form of a log, or the parameter data in memory may be copied directly.
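The log-based persistence option can be sketched as follows; the record fields, file path, and function name are assumptions chosen for illustration, not a format given in the patent:

```python
import json, time

# Illustrative append-only update log on shared storage; the format is assumed.
def persist_update(log_path: str, node_id: int, block_id: int, version: int,
                   data_ref: str) -> None:
    record = {
        "ts": time.time(),   # when the primary block was updated
        "node": node_id,     # parameter service node holding the primary block
        "block": block_id,   # which parameter block was updated
        "version": version,  # monotonically increasing update version
        "data": data_ref,    # reference to (or inline copy of) the block data
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(record) + "\n")

# Example: node 1 persists version 42 of its primary block 1.
persist_update("/tmp/param_updates.log", node_id=1, block_id=1,
               version=42, data_ref="block1-v42")
```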
As described above, according to the method for storing parameters in a distributed deep neural network of an embodiment of the present invention, using distributed multiple copies ensures efficient access to the parameter data while also providing high availability. Secondly, by persisting the parameter data to the local shared storage 200, data availability is further improved. In addition, because the parameter data is divided into blocks, data synchronization is done entirely in units of data blocks, which improves synchronization efficiency.
Next, a method of storing parameters in the distributed deep neural network according to still another embodiment of the present invention will be described.
Fig. 2 is a schematic diagram showing a storage method of parameters in the distributed deep neural network according to still another embodiment of the present invention.
As shown in fig. 2, the management node 110 configures M parameter service nodes (i.e., parameter service node 1, parameter service node 2 ... parameter service node M) and M distributed memory service nodes (i.e., memory service node 1, memory service node 2 ... memory service node M in fig. 2). The M parameter service nodes persist the parameters to the shared storage 210 through the M distributed memory service nodes.
For example, the management node 110 constructs the parameter information required by the network according to the deep neural network created by a task, and divides the parameters into M substantially equal parameter blocks according to the number of parameter service nodes; assume that parameter block 1 is stored by parameter service node 1 and parameter block 2 is stored by parameter service node 2.
Here, the M parameter service nodes are only responsible for the update and access operations on the parameter blocks, and the actual parameters are stored in the distributed memory service nodes 1 to M; that is, the parameter service and the memory storage can be separated.
The parameters of the parameter blocks of the M parameter service nodes are stored distributedly in the M distributed memory service nodes, and the M parameter service nodes persist the parameters of the parameter blocks to the shared storage 210 through these memory service nodes. The invention thus realizes distributed memory storage in which data is stored and updated in batches in units of data blocks, which can greatly improve storage and update efficiency.
For persistence to the shared storage 210, the parameter data in memory may, for example, be recorded in the form of a log or copied directly.
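The separation of the parameter service from the memory storage, with block-batched reads and writes, might look roughly like the following sketch; the class names, the dict-backed store, and the update rule are assumptions made for illustration only:

```python
import numpy as np

# Illustrative separation of roles: the memory service node only holds block
# data; the parameter service node only performs update/access operations and
# pushes whole blocks back in batch. All names here are assumptions.

class MemoryServiceNode:
    def __init__(self):
        self._blocks = {}

    def put_block(self, block_id: int, data: np.ndarray) -> None:
        self._blocks[block_id] = data           # batch write of a whole block

    def get_block(self, block_id: int) -> np.ndarray:
        return self._blocks[block_id].copy()    # reads hand out a copy

class ParameterService:
    def __init__(self, block_id: int, store: MemoryServiceNode,
                 size: int, lr: float = 0.01):
        self.block_id, self.store, self.lr = block_id, store, lr
        store.put_block(block_id, np.zeros(size))

    def update(self, grad: np.ndarray) -> None:
        block = self.store.get_block(self.block_id)
        block -= self.lr * grad                 # update; storage stays remote
        self.store.put_block(self.block_id, block)  # push the whole block back

store = MemoryServiceNode()
ps = ParameterService(block_id=1, store=store, size=4)
ps.update(np.array([1.0, 0.0, -1.0, 0.5]))
print(store.get_block(1))
```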
The present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the above-described method of storing parameters in a distributed deep neural network.
The invention also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the above-mentioned method for storing parameters in the distributed deep neural network when executing the computer program.
The above examples mainly illustrate the distributed deep neural network of the present invention and the storage method of parameters in the distributed deep neural network of the present invention. Although only a few embodiments of the present invention have been described in detail, those skilled in the art will appreciate that the present invention may be embodied in many other forms without departing from the spirit or scope thereof. Accordingly, the present examples and embodiments are to be considered as illustrative and not restrictive, and various modifications and substitutions may be made therein without departing from the spirit and scope of the present invention as defined by the appended claims.

Claims (21)

1. A method for storing parameters in a distributed deep neural network is characterized by comprising the following steps:
a parameter dividing step, in which a management node divides parameters required in a network constructed according to a deep neural network into M parameter blocks; and
a parameter service node setting step, setting M parameter service nodes, each parameter service node storing one of M parameter blocks,
wherein M is a natural number.
2. The method of storing parameters in a distributed deep neural network as claimed in claim 1, further comprising, before the parameter dividing step:
and a parameter node number estimation step of calculating the memory space required for storage according to the scale of the deep neural network at run time, and combining the number of backup copies and the memory size of each node to obtain the minimum required number M of parameter service nodes, wherein the actual memory space must be larger than the memory space required for multi-copy storage.
3. The method of storing parameters in a distributed deep neural network as claimed in claim 1, further comprising after said parameter service node setting step:
the number of copies of the parameter service nodes is set to N,
the memory on each parameter service node is divided into N memory blocks,
and among the N memory blocks, 1 memory block stores one of the M parameter blocks as the main parameter data block, while the remaining N-1 memory blocks respectively store copies of N-1 other parameter blocks,
wherein N is a natural number.
4. The method of storing parameters in a distributed deep neural network of claim 3,
the parameters are divided into M equal parameter blocks, and the memory on each parameter service node is divided into N equal memory blocks.
5. The method of storing parameters in a distributed deep neural network of claim 3,
the N memory blocks store N non-repeating parameter blocks,
and the parameter service nodes are divided into N groups, wherein each group covers the M parameter blocks.
6. The method of storing parameters in a distributed deep neural network of claim 3,
the management node randomly distributes copies of the parameter blocks to the remaining parameter nodes according to the number of copies of the parameter blocks, so that the parameter blocks stored on each parameter node do not repeat.
7. The method of storing parameters in a distributed deep neural network of claim 3,
after updating the main parameter data block, each parameter service node synchronizes the parameters of the main parameter data block to the copies of the remaining N-1 parameter service nodes.
8. The method of storing parameters in a distributed deep neural network of claim 3,
each parameter serving node persists data of the master parameter data block to shared storage while updating the master parameter data block.
9. The method of storing parameters in a distributed deep neural network of claim 2,
setting M distributed memory service nodes,
the parameter service nodes are only responsible for the update and access operations on the parameter blocks, and the parameters of the parameter blocks are stored distributedly in the M distributed memory service nodes.
10. The method of storing parameters in a distributed deep neural network of claim 9,
and the parameter service node persists the parameters of the parameter block to shared storage through the distributed memory service node.
11. A distributed deep neural network, comprising:
the management node is used for constructing parameters required in the network according to the deep neural network and dividing the parameters into M parameter blocks; and
the system comprises M parameter service nodes, wherein each parameter service node stores one of M parameter blocks, and M is a natural number.
12. The distributed deep neural network of claim 11,
the memory space required for storage is calculated according to the scale of the deep neural network at run time, and the minimum required number M of parameter service nodes is obtained by combining the number of backup copies and the memory size of each node, wherein the actual memory space must be larger than the memory space required for multi-copy storage.
13. The distributed deep neural network of claim 11, further comprising:
the number of copies of the parameter service nodes is N,
the M parameter service nodes are each provided with a memory, and the memory of each parameter service node is divided into N memory blocks,
and among the N memory blocks, 1 memory block stores one of the M parameter blocks as the main parameter data block, while the remaining N-1 memory blocks respectively store copies of N-1 other parameter blocks,
wherein N is a natural number.
14. The distributed deep neural network of claim 11,
the N memory blocks store N non-repeating parameter blocks,
and the parameter service nodes are divided into N groups, wherein each group covers the M parameter blocks.
15. The distributed deep neural network of claim 14,
the management node randomly distributes copies of the parameter blocks to the remaining parameter nodes according to the number of copies of the parameter blocks, so that the parameter blocks stored on each parameter node do not repeat.
16. The distributed deep neural network of claim 11,
after updating the main parameter data block, each parameter service node synchronizes the parameters of the main parameter data block to the copies of the remaining N-1 parameter service nodes.
17. The distributed deep neural network of claim 11, further comprising:
a shared storage for storing the data of the main parameter data blocks from the parameter service nodes.
18. The distributed deep neural network of claim 11, further comprising:
M distributed memory service nodes for storing, in a distributed manner, the parameters of the parameter blocks from the M parameter service nodes.
19. The distributed deep neural network of claim 18, further comprising:
a shared storage for storing the parameters of the parameter blocks from the M distributed memory service nodes.
20. A computer-readable storage medium on which a computer program is stored, which program, when executed by a processor, implements a method of storing parameters in a distributed deep neural network as claimed in any one of claims 1 to 10.
21. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the computer program implements a method of storing parameters in a distributed deep neural network as claimed in any one of claims 1 to 10.
CN201811092458.9A 2018-09-19 2018-09-19 Distributed deep neural network and storage method of parameters thereof Pending CN110928481A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811092458.9A CN110928481A (en) 2018-09-19 2018-09-19 Distributed deep neural network and storage method of parameters thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811092458.9A CN110928481A (en) 2018-09-19 2018-09-19 Distributed deep neural network and storage method of parameters thereof

Publications (1)

Publication Number Publication Date
CN110928481A true CN110928481A (en) 2020-03-27

Family

ID=69855103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811092458.9A Pending CN110928481A (en) 2018-09-19 2018-09-19 Distributed deep neural network and storage method of parameters thereof

Country Status (1)

Country Link
CN (1) CN110928481A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101753349A (en) * 2008-12-09 2010-06-23 中国移动通信集团公司 Upgrading method of data node, upgrade dispatching node as well as upgrading system
US20100274762A1 (en) * 2009-04-24 2010-10-28 Microsoft Corporation Dynamic placement of replica data
CN103034739A (en) * 2012-12-29 2013-04-10 天津南大通用数据技术有限公司 Distributed memory system and updating and querying method thereof
CN106156810A (en) * 2015-04-26 2016-11-23 阿里巴巴集团控股有限公司 General-purpose machinery learning algorithm model training method, system and calculating node
CN108073986A (en) * 2016-11-16 2018-05-25 北京搜狗科技发展有限公司 A kind of neural network model training method, device and electronic equipment
CN107395745A (en) * 2017-08-20 2017-11-24 长沙曙通信息科技有限公司 A kind of distributed memory system data disperse Realization of Storing
CN107578094A (en) * 2017-10-25 2018-01-12 济南浪潮高新科技投资发展有限公司 The method that the distributed training of neutral net is realized based on parameter server and FPGA

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112256440A (en) * 2020-12-23 2021-01-22 上海齐感电子信息科技有限公司 Memory management method and device for neural network inference
CN112256440B (en) * 2020-12-23 2021-03-09 上海齐感电子信息科技有限公司 Memory management method and device for neural network inference
CN115526302A (en) * 2022-08-19 2022-12-27 北京应用物理与计算数学研究所 Multilayer neural network computing method and device based on heterogeneous multi-core processor
CN117909418A (en) * 2024-03-20 2024-04-19 广东琴智科技研究院有限公司 Deep learning model storage consistency method, computing subsystem and computing platform
CN117909418B (en) * 2024-03-20 2024-05-31 广东琴智科技研究院有限公司 Deep learning model storage consistency method, computing subsystem and computing platform

Similar Documents

Publication Publication Date Title
US11461695B2 (en) Systems and methods for fault tolerance recover during training of a model of a classifier using a distributed system
US11442961B2 (en) Active transaction list synchronization method and apparatus
CN110389858B (en) Method and device for recovering faults of storage device
US20150178170A1 (en) Method and Apparatus for Recovering Data
US10496618B2 (en) Managing data replication in a data grid
CN110928481A (en) Distributed deep neural network and storage method of parameters thereof
CN109918229B (en) Database cluster copy construction method and device in non-log mode
CN109710586B (en) A kind of clustered node configuration file synchronous method and device
CN109522150B (en) Hypergraph-based self-adaptive decomposable part repeated code construction and fault repair method
US20210365300A9 (en) Systems and methods for dynamic partitioning in distributed environments
US20170371892A1 (en) Systems and methods for dynamic partitioning in distributed environments
US11307781B2 (en) Managing replicas of content in storage systems
CN102833273A (en) Data restoring method when meeting temporary fault and distributed caching system
US20180137055A1 (en) Log-Structured Storage Method and Server
CN114218193A (en) Data migration method and device, computer equipment and readable storage medium
KR20180012436A (en) The database management system and method for preventing performance degradation of transaction when table reconfiguring
CN112543920A (en) Data reconstruction method, device, computer equipment, storage medium and system
CN110298031B (en) Dictionary service system and model version consistency distribution method
CN109815047B (en) Data processing method and related device
CN113821362B (en) Data copying method and device
CN111240577B (en) MPP database-based data multi-fragment storage method and device
CN114528139A (en) Method, device, electronic equipment and medium for data processing and node deployment
CN112181974B (en) Identification information distribution method, system and storage device
CN113836238A (en) Batch processing method and device for data commands
CN108304370B (en) Data updating method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination