WO2020187041A1 - Neural network mapping method employing a many-core processor, and computing device

Neural network mapping method employing a many-core processor, and computing device

Info

Publication number: WO2020187041A1
Authority: WO (WIPO (PCT))
Prior art keywords: network, core, sub-network layer, many-core
Application number: PCT/CN2020/077973
Priority date: 2019-03-18
Filing date: 2020-03-05
Publication date: 2020-09-24
Other languages: English (en), Chinese (zh)
Inventors: 张伟豪, 李涵, 裴京
Original assignee: 北京灵汐科技有限公司
Application filed by 北京灵汐科技有限公司
Publication of WO2020187041A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management (Y: general tagging of new technological developments; Y02: technologies for mitigation or adaptation against climate change; Y02D: climate change mitigation technologies in information and communication technologies)

Definitions

  • The present invention relates to the technical field of processors, and in particular to a neural network mapping method and a computing device based on a many-core processor.
  • Neural network algorithms are mainstream artificial intelligence algorithms characterized by a high computational load and high parallelism. These characteristics make neural networks well suited to running on many-core architectures, which is why the many-core processor architecture is currently an important way to build neural network accelerators.
  • Given neural network algorithms and suitable many-core processors, how to map the algorithms onto the processors, and how to allocate resources such as computing, storage, and routing for each core of the many-core processor, is an urgent problem to be solved.
  • In view of this, the present invention provides a neural network mapping method and computing device based on a many-core processor that overcome, or at least partially solve, the above problems.
  • According to one aspect, a neural network mapping method based on a many-core processor includes: obtaining the neural network to be mapped, combining all of its network layers in order, and dividing them into multiple network layer groups;
  • splitting each of the network layers, so that each of the network layers includes multiple sub-network layers; and
  • fusing the sub-network layers belonging to the same network layer group according to preset rules to obtain multiple sub-network layer groups, and mapping the multiple sub-network layer groups respectively to multiple cores of a preset many-core processor.
  • Optionally, after the sub-network layers belonging to the same network layer group are fused according to the preset rules to obtain the multiple sub-network layer groups, and before the multiple sub-network layer groups are mapped to the multiple cores of the preset many-core processor, the method further includes:
  • determining whether the number of network layers included in at least one of the sub-network layer groups is greater than a first preset threshold, and if so, re-executing the intra-group splitting of the network layer group to which that sub-network layer group belongs.
  • After intra-group fusion, it is possible that a group was allocated too many layers during network grouping, so that the number of layers in the group exceeds the upper limit of the resource load that a single core can bear. In this case, regrouping reduces the number of network layers in the group and achieves load balancing across the cores.
  • Optionally, the method further includes:
  • screening out, among the multiple cores, a first core whose resource utilization is lower than a preset index; fusing the sub-network layer groups corresponding to the first core again to obtain at least one first sub-network layer group; and remapping the first sub-network layer group to a second core of the many-core processor.
  • Optionally, the method further includes:
  • determining whether there are remaining cores in the many-core processor, and transferring at least part of the sub-network layers mapped to a third core to the remaining cores.
  • In this way, an overall re-splitting strategy can be added: the computing tasks of cores with a larger load are redistributed to the remaining cores, which can not only improve the utilization of each core in the many-core processor but also improve its operating efficiency.
  • Optionally, the transferring of at least part of the sub-network layers mapped to the third core to the remaining cores includes: retaining part of the sub-network layers on the third core while the transferred part is mapped to the remaining cores, as described in detail below.
  • Optionally, all network layers in the same network layer group are sequentially connected in the neural network.
  • Optionally, the number of sub-network layers into which each network layer in the same network layer group is split is equal.
  • Optionally, for any network layer group, the sub-network layers with the same index in the network layer group are fused to obtain the sub-network layer group corresponding to that index.
  • According to another aspect, a computing device including a many-core processor is provided, characterized in that:
  • the many-core processor is configured to execute the relevant algorithms of a neural network mapped by the many-core processor-based neural network mapping method described above.
  • Optionally, the computing device further includes:
  • a storage device for storing a computer program which, when run in the computing device, is loaded and executed by the many-core processor.
  • The present invention provides a more balanced neural network mapping method based on a many-core processor. Through network grouping, intra-group splitting, intra-group fusion, and overall re-fusion, each network layer of the neural network to be mapped is reasonably mapped to the cores of the many-core processor, and the computing, storage, routing, and other resources of each core are allocated accordingly, so that the neural network runs more efficiently, the load across the cores of the many-core processor is more balanced than in traditional solutions, and the utilization of computing and storage resources is effectively improved.
  • Fig. 1 shows a schematic diagram of neural network mapping based on a many-core processor according to an embodiment of the present invention;
  • Fig. 2 shows a schematic diagram of neural network mapping based on a many-core processor according to another embodiment of the present invention;
  • Fig. 3 shows a schematic diagram of neural network mapping based on a many-core processor according to another embodiment of the present invention;
  • Fig. 4 shows a schematic flowchart of a neural network mapping method based on a many-core processor according to a preferred embodiment of the present invention;
  • Fig. 5 shows a schematic diagram of network grouping of a neural network based on a many-core processor according to a preferred embodiment of the present invention;
  • Fig. 6 shows a schematic diagram of intra-group splitting of a neural network based on a many-core processor according to a preferred embodiment of the present invention;
  • Fig. 7 shows a schematic diagram of intra-group fusion of a neural network based on a many-core processor according to a preferred embodiment of the present invention;
  • Fig. 8 shows a schematic diagram of routing between cores before and after intra-group fusion of a neural network based on a many-core processor according to a preferred embodiment of the present invention;
  • Fig. 9 shows a schematic diagram of overall re-fusion of a neural network based on a many-core processor according to a preferred embodiment of the present invention.
  • The algorithm responsible for allocating a parallel algorithm to a many-core processor is generally called a scheduling algorithm; scheduling is divided into static scheduling and dynamic scheduling.
  • Static scheduling means that a scheduling strategy is formulated before the parallel algorithm is executed, and at runtime the algorithm runs in accordance with the established strategy.
  • Dynamic scheduling is different: it decides how to schedule the next step according to the state of the algorithm itself and of the environment at runtime.
  • In a narrow sense, mapping refers to a static scheduling algorithm, emphasizing that a certain part of the parallel algorithm is mapped to a core of the many-core processor, and each core runs, and only runs, the part of the algorithm mapped to it.
  • There has been little research on mapping algorithms for neural networks on many-core processors, but the mapping of general parallel algorithms onto many-core processors has been studied extensively, and some universal methods have been formed.
  • The simplest general mapping algorithm maps each layer of the neural network to each core in turn until all the layers are allocated.
  • As shown in Fig. 1, Layer0-Layer5 respectively represent network layers 1-6, Core0-Core5 respectively represent cores 1-6, and Layer0-Layer5 can be mapped to Core0-Core5 respectively, as in the sketch below.
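  • As an illustration only (this sketch is not part of the original disclosure; all names are hypothetical), the one-layer-per-core strategy can be written as:

```python
# Naive general mapping: layer i goes to core i, one layer per core.
def naive_mapping(layers, num_cores):
    if len(layers) > num_cores:
        raise ValueError("more layers than cores; one-layer-per-core mapping fails")
    return {f"Core{i}": [layer] for i, layer in enumerate(layers)}

print(naive_mapping([f"Layer{i}" for i in range(6)], num_cores=6))
# {'Core0': ['Layer0'], 'Core1': ['Layer1'], ..., 'Core5': ['Layer5']}
```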
  • However, the computation and storage requirements of the various layers of a neural network may be extremely unbalanced.
  • Traditional parallel-algorithm mapping techniques are rarely optimized specifically for the characteristics of neural networks.
  • As a result, mapping through a simple universal strategy may cause an extremely unbalanced load between cores, resulting in a large waste of computing and storage resources, or in routing congestion.
  • A network layer with a high load can be split, that is, one layer is mapped to multiple cores, and multiple cores are used to compute that one layer. This is conducive to load balancing of the overall architecture.
  • The technology used in this process can be called splitting technology. As shown in Figure 2, Layer5 (layer 5) is split, and the two sub-network layers obtained are mapped to Core5 and Core6 respectively.
  • Layers with a relatively small load can be fused, with one core computing multiple layers, which can improve the resource utilization of these cores.
  • The technology used in this process can be called fusion technology. As shown in Figure 3, Layer0 and Layer1 are fused and mapped to Core0 together.
  • To this end, the embodiments of the present invention provide a more efficient and balanced neural network mapping method based on a many-core processor.
  • The method provided in this embodiment may include:
  • Step S401: Obtain the neural network to be mapped, combine all the network layers of the neural network to be mapped in order, and divide them into multiple network layer groups, wherein all network layers in the same network layer group are sequentially connected in the neural network.
  • The first step is network grouping. All the network layers of the neural network to be mapped are divided in order into different network layer groups, and the network layers in the same network layer group typically form a continuous segment of the neural network in terms of connection relationships.
  • Figure 5 shows a schematic diagram of the network grouping of this embodiment.
  • For example, the neural network can include Layer0-Layer5, with Layer0 and Layer1 divided into Group0 (group 0), Layer2-Layer4 into Group1 (group 1), and Layer5 alone as Group2 (group 2).
  • It should be noted that the network grouping shown in Figure 5 is only one of many possible groupings; in practical applications, all network layers of the neural network can be divided according to different needs, and the present invention is not limited in this respect. A sketch of this grouping is given below.
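  • A minimal sketch of such a grouping, assuming a simple list-of-lists representation that is not specified in the original:

```python
# Network grouping of Figure 5: consecutive layers form contiguous groups.
layers = [f"Layer{i}" for i in range(6)]
boundaries = [2, 5, 6]  # exclusive end index of Group0, Group1, Group2

groups, start = [], 0
for end in boundaries:
    groups.append(layers[start:end])
    start = end

print(groups)
# [['Layer0', 'Layer1'], ['Layer2', 'Layer3', 'Layer4'], ['Layer5']]
```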
  • Step S402: Split each of the network layers, so that each network layer includes multiple sub-network layers.
  • The second step is intra-group splitting. Because some network layers in a neural network involve a large amount of computation, one or more layers are selected for splitting by means of intra-group splitting. Specifically, splitting technology can be used on each network layer group. During intra-group splitting, the number of sub-network layers obtained from each network layer in a group can, for example, be made equal; that is, network layers in the same group preferably have the same number of splits. Taking the example shown in Figure 5, if Group0 is split into 2 parts, then Layer0 and Layer1 will both be split into 2 parts. Similarly, if Group1 is split into 3 parts, Layer2, Layer3, and Layer4 will all be split into 3 parts; and if Group2 is split into 3 parts, Layer5 is split into 3 parts.
  • Preferably, the splitting is as equal as possible; that is, the parts of the algorithm obtained by splitting have amounts of computation, storage, and/or routing that are as close to one another as possible.
  • An index can be used to indicate a part of the algorithm obtained after a network layer is split; that is, splitting Layer0 in 2 yields Layer0[0] and Layer0[1], where [0] and [1] are the indices of the sub-network layers. This process can be seen in Figure 6 and in the sketch below.
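  • A minimal sketch of this indexed splitting (the helper name and data layout are assumptions, not from the original):

```python
# Intra-group splitting: every layer of a group is split into the same
# number of indexed sub-network layers, e.g. Layer0 -> Layer0[0], Layer0[1].
def split_group(group, num_parts):
    return {layer: [f"{layer}[{i}]" for i in range(num_parts)] for layer in group}

print(split_group(["Layer0", "Layer1"], num_parts=2))
# {'Layer0': ['Layer0[0]', 'Layer0[1]'], 'Layer1': ['Layer1[0]', 'Layer1[1]']}
```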
  • Step S403: Fuse the sub-network layers belonging to the same network layer group according to preset rules to obtain multiple sub-network layer groups, and map the multiple sub-network layer groups respectively to multiple cores of a preset many-core processor.
  • The third step is intra-group fusion: the multiple sub-network layers in each network layer group are fused to obtain the multiple sub-network layer groups.
  • As mentioned above, an index can be used to represent a part of the algorithm obtained after a network layer is split.
  • Fusing the sub-network layers belonging to the same network layer group according to preset rules to obtain multiple sub-network layer groups further includes: for any network layer group, fusing the sub-network layers with the same index in that network layer group to obtain the sub-network layer group corresponding to the index. That is, within each group, fusion technology is used to fuse the sub-network layers with the same index onto one core.
  • In the example above, Layer0[0] and Layer1[0] will be fused and mapped to Core0, and Layer0[1] and Layer1[1] will be fused and mapped to Core1, as sketched below.
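  • Continuing the hypothetical sketch above, intra-group fusion collects the sub-network layers that share an index into one sub-network layer group per core:

```python
# Intra-group fusion: sub-network layers with the same index are fused
# and mapped to one core, giving one sub-network layer group per index.
def fuse_group(split_layers, num_parts):
    return [[subs[i] for subs in split_layers.values()] for i in range(num_parts)]

split_layers = {"Layer0": ["Layer0[0]", "Layer0[1]"],
                "Layer1": ["Layer1[0]", "Layer1[1]"]}
for core, fused in enumerate(fuse_group(split_layers, num_parts=2)):
    print(f"Core{core}: {fused}")
# Core0: ['Layer0[0]', 'Layer1[0]']
# Core1: ['Layer0[1]', 'Layer1[1]']
```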
  • After step S403 fuses the sub-network layers in the same network layer group to obtain the multiple sub-network layer groups, it can also be determined whether the number of network layers included in each sub-network layer group is greater than a first preset threshold. If the number of network layers included in at least one sub-network layer group is greater than the first preset threshold, the network layer group to which that sub-network layer group belongs is re-split within the group, thereby reducing the number of network layers in that group.
  • The first preset threshold can be set according to different actual needs; this is not limited in the present invention. A sketch of this check follows.
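  • A sketch of this check, where the threshold value and helper name are illustrative assumptions:

```python
# After intra-group fusion, find sub-network layer groups that would put
# more network layers on one core than the first preset threshold allows;
# their network layer groups are then regrouped (re-split within the group).
def groups_to_resplit(sub_network_layer_groups, first_preset_threshold):
    return [g for g in sub_network_layer_groups if len(g) > first_preset_threshold]

fused = [["Layer0[0]", "Layer1[0]", "Layer2[0]"], ["Layer3[0]"]]
print(groups_to_resplit(fused, first_preset_threshold=2))
# [['Layer0[0]', 'Layer1[0]', 'Layer2[0]']]
```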
  • Intra-group fusion can greatly reduce routing between cores. Take the intra-group fusion of Group0 as an example: as shown in Figure 8, before fusion, the total routing of Layer0 and Layer1 can be represented by two thicker arrows and two thinner arrows. Due to the data locality of neural network operations, the routing volume represented by the thick arrows is generally much larger than that represented by the thin arrows (except in fully connected neural networks). After intra-group fusion, the routing represented by the thick arrows becomes data transfer within a core; the real inter-core routing is only the amount represented by the two thin arrows, so the total routing volume is greatly reduced.
  • In addition, the embodiment of the present invention may further include step S404: screening out, among the multiple cores, first cores whose resource utilization is lower than a preset index; fusing the sub-network layer groups corresponding to the first cores again to obtain at least one first sub-network layer group; and remapping the first sub-network layer group to a second core of the many-core processor, where there is at least one first core.
  • After the preceding steps, a number of mapped cores are obtained.
  • Among these cores there may be some "small cores", that is, cores that do not occupy many resources or whose resource utilization is lower than the preset index.
  • The preset index can be set according to the particular many-core processor. If some of the cores can be re-fused without reaching the bottleneck of the original mapping (that is, the neural network under the original mapping scheme will not run slower or exceed the memory and routing limits), then overall re-fusion can be carried out, and the cores with lower resource utilization are re-fused.
  • Figure 9 shows an example of this process.
  • Originally, Core0 is responsible for Layer0[0] and Layer1[0], and Core5 is responsible for Layer5[0]. Since Core0 and Core5 have low utilization, they can be re-fused, and the fused Core0 is then responsible for Layer0[0], Layer1[0], and Layer5[0]. Similarly, Core6 was originally responsible for Layer5[1] and Core7 for Layer5[2]; after the two are fused, Core5 can be responsible for both Layer5[1] and Layer5[2]. A sketch of this re-fusion idea follows.
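  • One possible greedy form of this overall re-fusion is sketched below; the pairing rule and the load numbers are assumptions, and the original only requires that re-fusion must not create a new bottleneck:

```python
# Overall re-fusion: pair up low-utilization "small cores" as long as the
# combined load stays within the bottleneck core of the original mapping.
def refuse_small_cores(loads, preset_index):
    bottleneck = max(loads)
    small = [i for i, load in enumerate(loads) if load < preset_index]
    pairs, used = [], set()
    for a in small:
        if a in used:
            continue
        for b in small:
            if b > a and b not in used and loads[a] + loads[b] <= bottleneck:
                pairs.append((a, b))
                used.update((a, b))
                break
    return pairs

# Cores 1, 2 and 4 are under-utilized; cores 1 and 2 can be fused without
# exceeding the existing bottleneck (core 3 at 0.95).
print(refuse_small_cores([0.9, 0.3, 0.4, 0.95, 0.2], preset_index=0.5))  # [(1, 2)]
```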
  • In addition, the embodiment of the present invention may further include step S405: determining whether there are remaining cores in the many-core processor; if there are remaining cores, acquiring at least one third core in the many-core processor whose resource consumption rate is greater than a second preset threshold; and transferring at least part of the sub-network layers mapped to the third core to the remaining cores, while the other parts continue to be mapped to the third core.
  • For example, one half of the sub-network layers that have been mapped to the third core can be retained on the third core, and the other half can be transferred and mapped to a remaining core.
  • That is, the whole is split again: the core with the largest load is repeatedly selected and split.
  • For example, a one-for-two strategy can be used, splitting the most heavily loaded core in two each time, until all the cores are used, as in the sketch below.
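  • A sketch of the one-for-two re-splitting loop, assuming for illustration that splitting a core halves its load exactly:

```python
import heapq

# Overall re-splitting: while free cores remain, repeatedly split the most
# heavily loaded core in two (one-for-two), halving its load each time.
def resplit(loads, total_cores):
    heap = [(-load, i) for i, load in enumerate(loads)]
    heapq.heapify(heap)
    next_core = len(loads)
    while next_core < total_cores:
        neg_load, i = heapq.heappop(heap)   # current largest load
        half = -neg_load / 2.0
        heapq.heappush(heap, (-half, i))
        heapq.heappush(heap, (-half, next_core))
        next_core += 1
    return sorted((i, -neg) for neg, i in heap)

print(resplit([4.0, 1.0, 2.0], total_cores=5))
# [(0, 1.0), (1, 1.0), (2, 2.0), (3, 2.0), (4, 1.0)]
```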
  • Next, taking the convolutional neural network VGG19 with an input feature map size of 224×224×3 as an example, the solution of the above-mentioned embodiment is described.
  • The network structure of the convolutional neural network VGG19 is shown in Table 1.
  • A convolutional neural network (CNN) consists of an input layer, convolutional layers, activation functions, pooling layers, and fully connected layers, namely Input-Conv-ReLU-Pool-Fc, plus a prob layer (softmax classifier).
  • Here, only the convolutional layers and ReLU layers are considered, and each adjacent pair of a convolutional layer and a ReLU layer is regarded as one layer, denoted layer_i.
  • Table 2 evaluates the amount of calculation and storage for each layer.
  • The amount of computation is expressed by the number of MACs (multiply-accumulate operations).
  • For the ReLU layer, it is assumed that one MAC is recorded for each activation-function operation on a number.
  • The amount of storage is expressed by the total number of weights and feature-map values.
  • The unit of MACs is M and the unit of storage is K, where 1M means 1,000,000 and 1K means 1,000.
  • Assume that the storage size of each core is 4M (4000K). A hedged illustration of how such per-layer figures can be estimated is given below.
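  • As an illustration of how such per-layer figures can be estimated (the 3×3-convolution cost formula below is a standard estimate and an assumption here; it does not reproduce Table 2):

```python
# Rough MAC/storage estimate for one Conv+ReLU layer (3x3 kernel, stride 1,
# "same" padding): conv MACs plus one MAC per ReLU-activated value; storage
# is weights plus output feature-map values. Units: M = 1e6, K = 1e3.
def conv_relu_cost(h, w, c_in, c_out, k=3):
    macs = h * w * c_out * (c_in * k * k + 1)       # conv + ReLU
    storage = c_in * c_out * k * k + h * w * c_out  # weights + feature map
    return macs / 1e6, storage / 1e3

macs_M, storage_K = conv_relu_cost(224, 224, 3, 64)  # first VGG19 conv layer
print(f"{macs_M:.1f}M MACs, {storage_K:.0f}K storage")  # 89.9M MACs, 3213K storage
```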
  • With the traditional mapping scheme, the above mapping uses 16 cores. In the form consistent with the definitions that follow, the calculation utilization of each core is compute_rate_i = MAC_i / max_j(MAC_j), where compute_rate_i represents the calculation utilization of Corei, i represents Corei (core i), MAC_i represents the calculation amount of Corei, and MAC_j represents the calculation amount of Corej. Likewise, the storage utilization is memory_rate_i = Mem_i / Mem_core, where memory_rate_i represents the storage utilization of Corei, Mem_i represents the storage amount of Corei, and Mem_core is the 4M (4000K) storage size of each core.
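  • The following snippet applies these definitions to made-up per-core loads (only the formulas above are taken from the text):

```python
# compute_rate_i = MAC_i / max_j(MAC_j); memory_rate_i = Mem_i / 4000K.
macs = [120.4, 85.0, 103.2]   # MACs per core, in M (illustrative values)
mems = [500.0, 900.0, 640.0]  # storage per core, in K (illustrative values)
CORE_STORAGE_K = 4000.0       # each core stores 4M = 4000K

peak = max(macs)
compute_rates = [m / peak for m in macs]
memory_rates = [m / CORE_STORAGE_K for m in mems]

print("compute:", [f"{r:.2%}" for r in compute_rates])
print("memory: ", [f"{r:.2%}" for r in memory_rates])
print(f"average compute utilization: {sum(compute_rates) / len(compute_rates):.2%}")
```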
  • Under this traditional scheme, the average calculation utilization is 65.85% and the average storage utilization is 16.31%.
  • Using the method of the embodiment, the network grouping can be as follows:
  • Group0 = {layer_0, layer_1}
  • Group1 = {layer_2, layer_3}
  • Group2 = {layer_4, layer_5, layer_6, layer_7}
  • Group3 = {layer_8, layer_9, layer_10, layer_11}
  • Group4 = {layer_12, layer_13, layer_14, layer_15}
  • The number of splits of Group0 is 1, of Group1 is 2, of Group2 is 3, of Group3 is 3, and of Group4 is 1.
  • After intra-group splitting and fusion, 10 cores are obtained. Examining the result of this splitting shows that no small cores can be further fused, so the final optimized scheme is obtained. The core count can be checked as follows.
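  • The core count follows directly from the split numbers, since after intra-group fusion each group occupies exactly as many cores as it has splits:

```python
# Cores used = sum of split counts over the groups (one core per split index).
splits = {"Group0": 1, "Group1": 2, "Group2": 3, "Group3": 3, "Group4": 1}
print(sum(splits.values()))  # 10
```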
  • The calculation and storage utilization of each core under this scheme are shown in Table 4.
  • Under this scheme, the average calculation utilization is 90.43% and the average storage utilization is 26.09%. It can be seen that, compared with the traditional solution, the scheme provided by the embodiment of the present invention reduces the number of cores used and greatly improves resource utilization.
  • The optimization used in this example is mainly aimed at increasing calculation utilization, so the storage utilization remains at a lower level. Because of the different splitting schemes, the split algorithm may have slight storage redundancy; the calculation of storage redundancy is ignored here.
  • In addition, an embodiment of the present invention also provides a computing device including a many-core processor, wherein the many-core processor is configured to execute the relevant algorithms of a neural network mapped by any one of the many-core processor-based neural network mapping methods described above.
  • Optionally, the computing device further includes a storage device for storing a computer program which, when run in the computing device, is loaded and executed by the many-core processor.
  • In summary, the embodiments of the present invention provide a neural network mapping method based on a many-core processor with wider applicability and higher efficiency.
  • The neural network to be mapped is processed through the steps of network grouping, intra-group splitting, intra-group fusion, and overall re-fusion, so as to reasonably map the network layers of the neural network to the cores of the many-core processor and to allocate the computing, storage, routing, and other resources of each core. The neural network thus runs more efficiently, while the load of each core of the many-core processor is more balanced than in traditional solutions. In theory, the method can be applied to current mainstream neural network algorithms, including fully connected neural networks and convolutional neural networks, and it is especially suitable for convolutional neural networks.
  • The solutions provided by the embodiments of the present invention are particularly suitable for many-core accelerator architectures designed specifically for neural networks. Because the solution provided by the embodiments is a static scheduling solution, the overhead required for scheduling at runtime can be greatly reduced, and the neural network accelerator can devote its main computing power to the computation of the neural network itself.
  • The modules, units, or components in the embodiments can be combined into one module, unit, or component, and can furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.


Abstract

The present invention relates to a neural network mapping method employing a many-core processor, and a computing device. The method comprises the steps of: acquiring a neural network to be mapped, combining, in a certain order, all the network layers of the neural network, and dividing the combined network layers into multiple network layer groups; splitting each of the network layers, each network layer then comprising multiple sub-network layers; and fusing, according to a preconfigured rule, the sub-network layers belonging to the same network layer group to obtain multiple sub-network layer groups, and respectively mapping the multiple sub-network layer groups to multiple cores of a preconfigured many-core processor. Resources such as computing, storage, and routing resources of each core of the many-core processor are allocated on the basis of the method according to the present invention, so that the neural network operates efficiently, and the loads on each core of the many-core processor are more balanced in comparison with a conventional solution, thereby effectively improving the efficiency of computing and storage resources.
PCT/CN2020/077973 2019-03-18 2020-03-05 Neural network mapping method employing a many-core processor, and computing device WO2020187041A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910203167.0A CN111723900B (zh) 2019-03-18 2019-03-18 Neural network mapping method and computing device based on a many-core processor
CN201910203167.0 2019-03-18

Publications (1)

Publication Number Publication Date
WO2020187041A1 (fr) 2020-09-24

Family

Family ID: 72518948

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/077973 WO2020187041A1 (fr) 2019-03-18 2020-03-05 Neural network mapping method employing a many-core processor, and computing device

Country Status (2)

Country Link
CN (1) CN111723900B (fr)
WO (1) WO2020187041A1 (fr)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022171002A1 (fr) * 2021-02-10 2022-08-18 北京灵汐科技有限公司 Procédé et appareil de traitement de tâche, système à plusieurs noyaux et support lisible par ordinateur
CN112835718A (zh) * 2021-02-10 2021-05-25 北京灵汐科技有限公司 任务处理的方法和装置、众核系统、计算机可读介质
CN115098262B (zh) * 2022-06-27 2024-04-23 清华大学 一种多神经网络任务处理方法及装置


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9460012B2 (en) * 2014-02-18 2016-10-04 National University Of Singapore Fusible and reconfigurable cache architecture
CN106909971A (zh) * 2017-02-10 2017-06-30 华南理工大学 一种面向多核计算环境的bp神经网络并行化方法
CN109409513B (zh) * 2018-10-10 2021-03-12 广州市百果园信息技术有限公司 一种基于神经网络的任务处理方法及相关设备

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018154494A1 (fr) * 2017-02-23 2018-08-30 Cerebras Systems Inc. Apprentissage profond accéléré
EP3382989A1 (fr) * 2017-03-31 2018-10-03 Solarflare Communications Inc Dispositif d'interface de réseau
US20180307899A1 (en) * 2017-04-24 2018-10-25 Intel Corproation Recognition, reidentification and security enhancements using autonomous machines
WO2019001418A1 (fr) * 2017-06-26 2019-01-03 上海寒武纪信息科技有限公司 Système de partage de données et procédé de partage de données associé
CN110738316A (zh) * 2018-07-20 2020-01-31 北京三星通信技术研究有限公司 基于神经网络的操作方法、装置及电子设备
CN110515732A (zh) * 2019-08-23 2019-11-29 中国人民解放军国防科技大学 一种基于资源受限机器人深度学习推理的任务分配方法

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112884123A (zh) * 2021-02-23 2021-06-01 杭州海康威视数字技术股份有限公司 神经网络优化方法、装置、电子设备及可读存储介质
CN112884123B (zh) * 2021-02-23 2024-03-01 杭州海康威视数字技术股份有限公司 神经网络优化方法、装置、电子设备及可读存储介质
CN113485836A (zh) * 2021-07-21 2021-10-08 瀚博半导体(上海)有限公司 一种基于张量切分的张量处理方法和张量处理系统
CN113485836B (zh) * 2021-07-21 2024-03-19 瀚博半导体(上海)有限公司 一种基于张量切分的张量处理方法和张量处理系统
CN114418063A (zh) * 2021-12-27 2022-04-29 北京百度网讯科技有限公司 神经网络模型中网络层的分配方法与装置
CN114418063B (zh) * 2021-12-27 2023-01-06 北京百度网讯科技有限公司 神经网络模型中网络层的分配方法与装置
CN116167463A (zh) * 2023-04-26 2023-05-26 之江实验室 一种模型训练的方法、装置、存储介质及电子设备

Also Published As

Publication number Publication date
CN111723900B (zh) 2023-10-20
CN111723900A (zh) 2020-09-29


Legal Events

121 (EP): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 20773515; country of ref document: EP; kind code of ref document: A1.

NENP: Non-entry into the national phase. Ref country code: DE.

122 (EP): PCT application non-entry in European phase. Ref document number: 20773515; country of ref document: EP; kind code of ref document: A1.