CN116451757B - Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model - Google Patents


Info

Publication number
CN116451757B
CN116451757B (application CN202310722520.2A)
Authority
CN
China
Prior art keywords
target
acceleration
operation result
heterogeneous
heterogeneous acceleration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310722520.2A
Other languages
Chinese (zh)
Other versions
CN116451757A (en)
Inventor
李乐乐
张晖
赵鑫鑫
姜凯
李锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Scientific Research Institute Co Ltd
Original Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Scientific Research Institute Co Ltd filed Critical Shandong Inspur Scientific Research Institute Co Ltd
Priority to CN202310722520.2A
Publication of CN116451757A
Application granted
Publication of CN116451757B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/485 Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F 9/4856 Task life-cycle, resumption being on a different machine, e.g. task migration, virtual machine migration
    • G06F 9/4862 Task life-cycle, resumption being on a different machine, the task being a mobile agent, i.e. specifically designed to migrate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a heterogeneous acceleration method, apparatus, device, and medium for a neural network model, relating to the field of heterogeneous acceleration and comprising the following steps: after acquiring a model training task, the main heterogeneous acceleration node migrates training data to the data heterogeneous acceleration node, which performs format conversion on the training data to obtain feature data in a uniform format and returns the feature data to the main heterogeneous acceleration node; a target acceleration mode is determined based on the number of layers of the neural network model, together with the corresponding target heterogeneous acceleration node, so that the target heterogeneous acceleration node generates an intermediate operation result of the neural network model based on the feature data, and a final operation result is determined based on the intermediate operation result; the local CPU then performs full-connection processing on the final operation result to obtain a trained neural network model. The method selects different acceleration modes according to the number of model layers and rapidly completes format conversion of the data through the data heterogeneous acceleration node, so that model training efficiency is improved through the cooperative work of various heterogeneous acceleration nodes.

Description

Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model
Technical Field
The present application relates to the field of heterogeneous acceleration, and in particular, to a heterogeneous acceleration method, apparatus, device, and medium for a neural network model.
Background
With the continuous development and innovation of hardware technology, deep learning has gained new momentum. Among deep learning approaches, neural networks (NNs), as a connectionist model, have achieved good results on image classification, natural language processing, and fast object detection tasks. As the number of layers and neurons in NNs keeps growing, conventional software and hardware architectures such as the CPU (Central Processing Unit) and GPU (Graphics Processing Unit) find it difficult to balance training efficiency against hardware resources and energy consumption when training large models; meanwhile, software optimization alone can no longer meet the ever-increasing speed and energy-consumption requirements of NN acceleration.
Disclosure of Invention
Accordingly, the present application is directed to a heterogeneous acceleration method, apparatus, device, and medium for a neural network model, which select different acceleration modes according to the number of model layers and rapidly complete format conversion of the data through a dedicated data heterogeneous acceleration node, so that model training efficiency is improved through the cooperative work of various heterogeneous acceleration nodes. The specific scheme is as follows:
in a first aspect, the present application provides a heterogeneous acceleration method of a neural network model, applied to a master heterogeneous acceleration node, including:
after a neural network model training task is acquired, migrating training data to a data heterogeneous acceleration node so that the data heterogeneous acceleration node performs format conversion on the training data to obtain feature data in a uniform format, and returning the feature data to the main heterogeneous acceleration node;
determining a target acceleration mode based on the number of layers of the neural network model, determining a target heterogeneous acceleration node corresponding to the target acceleration mode, generating an intermediate operation result of the neural network model based on the characteristic data through the target heterogeneous acceleration node, and determining a final operation result based on the intermediate operation result;
and carrying out full connection processing on the final operation result by using a local CPU so as to obtain a trained neural network model.
Optionally, the migrating training data to the data heterogeneous acceleration node includes:
and caching the training data through a local CPU, and migrating the cached training data to the data heterogeneous acceleration node through XDMA.
Optionally, the determining the target acceleration mode based on the number of layers of the neural network model, and determining the target heterogeneous acceleration node corresponding to the target acceleration mode includes:
determining a target acceleration mode based on the number of layers of the neural network model;
and if the target acceleration mode is an independent acceleration mode, determining the master heterogeneous acceleration node itself as the target heterogeneous acceleration node corresponding to the target acceleration mode.
Optionally, the generating, by the target heterogeneous acceleration node, an intermediate operation result of the neural network model based on the feature data, and determining a final operation result based on the intermediate operation result, includes:
and carrying out serial processing on preset layer operation in the neural network model based on the characteristic data, caching the generated intermediate operation result into a local memory, and generating a final operation result based on the intermediate operation result after all the preset layer operation is completed.
Optionally, the determining a target acceleration mode based on the number of layers of the neural network model, and determining a target heterogeneous acceleration node corresponding to the target acceleration mode, so as to generate, by the target heterogeneous acceleration node, an intermediate operation result of the neural network model based on the feature data, includes:
determining a target acceleration mode based on the number of layers of the neural network model;
and if the target acceleration mode is a hybrid acceleration mode, decoupling the neural network model training task to obtain a task schedule, determining a branch heterogeneous acceleration node corresponding to a subtask in the task schedule as a target heterogeneous acceleration node, and then migrating the characteristic data corresponding to the subtask to the target heterogeneous acceleration node so as to generate an intermediate operation result of the neural network model based on the characteristic data through the target heterogeneous acceleration node.
Optionally, the generating, by the target heterogeneous acceleration node, an intermediate operation result of the neural network model based on the feature data, and determining a final operation result based on the intermediate operation result, includes:
if the number of the sub-tasks in the task schedule is one, corresponding preset layer operation in a neural network model is performed through the target heterogeneous acceleration node and based on the characteristic data, the generated intermediate operation result is stored locally, iterative operation is performed based on the intermediate operation result to generate a final operation result, and then the final operation result is returned to a CPU of the main heterogeneous acceleration node through XDMA.
Optionally, the generating, by the target heterogeneous acceleration node, an intermediate operation result of the neural network model based on the feature data, and determining a final operation result based on the intermediate operation result, includes:
if the number of subtasks in the task schedule is multiple, performing corresponding preset layer operation in a neural network model through the target heterogeneous acceleration node corresponding to the first subtask and based on the corresponding characteristic data to obtain a current intermediate operation result; forwarding the current intermediate operation result to the target heterogeneous acceleration node corresponding to the next subtask through XDMA, so that the target heterogeneous acceleration node corresponding to the next subtask performs corresponding preset layer operation in a neural network model based on the corresponding characteristic data and the current intermediate operation result to obtain a new current intermediate operation result, and then re-jumping to the step of forwarding the current intermediate operation result to the target heterogeneous acceleration node corresponding to the next subtask through XDMA until the next subtask is the last subtask to obtain a final operation result, and returning the final operation result to a CPU of the main heterogeneous acceleration node through XDMA.
In a second aspect, the present application provides a heterogeneous acceleration device of a neural network model, applied to a main heterogeneous acceleration node, including:
the training data migration module is used for migrating training data to the data heterogeneous acceleration nodes after acquiring a neural network model training task so that the data heterogeneous acceleration nodes convert the format of the training data to obtain characteristic data in a uniform format and return the characteristic data to the main heterogeneous acceleration nodes;
the acceleration mode determining module is used for determining a target acceleration mode based on the number of layers of the neural network model, determining a target heterogeneous acceleration node corresponding to the target acceleration mode, generating an intermediate operation result of the neural network model based on the characteristic data through the target heterogeneous acceleration node, and determining a final operation result based on the intermediate operation result;
and the result full-connection module is used for carrying out full-connection processing on the final operation result by utilizing a local CPU so as to obtain a trained neural network model.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the heterogeneous acceleration method of the neural network model.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the heterogeneous acceleration method of the neural network model described above.
After acquiring a neural network model training task, the main heterogeneous acceleration node migrates training data to the data heterogeneous acceleration node so that the data heterogeneous acceleration node performs format conversion on the training data to obtain feature data in a uniform format and returns the feature data to the main heterogeneous acceleration node; a target acceleration mode is determined based on the number of layers of the neural network model, together with the target heterogeneous acceleration node corresponding to that mode, so that an intermediate operation result of the neural network model is generated by the target heterogeneous acceleration node based on the feature data, and a final operation result is determined based on the intermediate operation result; full-connection processing is then carried out on the final operation result by a local CPU to obtain a trained neural network model. In this way, the training process of the neural network model is offloaded to heterogeneous acceleration nodes, and model training efficiency is improved through their distributed cooperative work. In addition, the application rapidly completes format conversion of the training data through an independent data heterogeneous acceleration node, which reduces the workload of the main heterogeneous acceleration node and achieves reasonable allocation of resources. Furthermore, the application selects different acceleration modes according to the number of layers of the neural network model, so that the model is operated by the different target heterogeneous acceleration nodes corresponding to the different acceleration modes, better realizing the reasonable utilization of resources.
Drawings
To more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. It is apparent that the following drawings show only some embodiments of the present application, and that a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a heterogeneous acceleration method of a neural network model disclosed by the application;
FIG. 2 is a heterogeneous acceleration flow chart of a neural network model according to the present disclosure;
FIG. 3 is a flowchart of a heterogeneous acceleration method of a specific neural network model disclosed in the present application;
FIG. 4 is a heterogeneous acceleration flow chart of a neural network model in an independent acceleration mode according to the present disclosure;
FIG. 5 is a heterogeneous acceleration flow chart of a neural network model in a hybrid acceleration mode according to the present disclosure;
FIG. 6 is a schematic diagram of a heterogeneous accelerator of a neural network model according to the present disclosure;
fig. 7 is a block diagram of an electronic device according to the present disclosure.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
At present, with the continuous increase in the number of layers and neurons of NNs, traditional software and hardware architectures such as the CPU and GPU find it difficult to balance training efficiency against hardware resources and energy consumption when training large models; meanwhile, software optimization alone cannot meet the growing speed and energy-consumption requirements of NN acceleration. Therefore, the heterogeneous acceleration method of the present application selects different acceleration modes according to the number of layers of the neural network model and rapidly completes format conversion of the data through an independent data heterogeneous acceleration node, so that model training efficiency is improved through the cooperative work of various heterogeneous acceleration nodes.
Referring to fig. 1, an embodiment of the present application discloses a heterogeneous acceleration method of a neural network model, which is applied to a main heterogeneous acceleration node, and includes:
and S11, after a neural network model training task is acquired, migrating training data to a data heterogeneous acceleration node so that the data heterogeneous acceleration node performs format conversion on the training data to obtain characteristic data in a uniform format, and returning the characteristic data to the main heterogeneous acceleration node.
In this embodiment, migrating the training data to the data heterogeneous acceleration node may include caching the training data via a local CPU and migrating the cached training data to the data heterogeneous acceleration node through XDMA (a DMA, i.e., Direct Memory Access, engine). It can be understood that, as shown in fig. 2, after the main heterogeneous acceleration node receives the neural network model training task, it caches the training data with the local CPU. Because the training data come in different formats, such as images and speech, and images in particular may be three-dimensional data, the cached training data are first sent to the next-stage data heterogeneous acceleration node through an XDMA channel, so that the independent data heterogeneous acceleration node performs format conversion to produce feature data in a uniform format, which are then returned to the main heterogeneous acceleration node for caching. In this way, this embodiment uses an independent data heterogeneous acceleration node to perform format conversion on the training data, which reduces the workload of the master heterogeneous acceleration node, controls resource consumption, and achieves reasonable allocation of resources.
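As a minimal sketch of the data heterogeneous acceleration node's role, the pure-Python fragment below resizes mixed-size two-dimensional samples to one assumed uniform shape. The names `migrate_and_convert`, `unify_format`, and `TARGET_SHAPE` are illustrative, not from the patent, and the real node would perform this work in FPGA logic after an XDMA transfer rather than in host Python.

```python
# Illustrative stand-in for the data heterogeneous acceleration node:
# convert mixed-format training samples into feature data of a uniform shape.
TARGET_SHAPE = (32, 32)  # assumed uniform pixel size; the patent does not fix one

def unify_format(sample):
    """Nearest-neighbour resize of a 2-D sample (list of rows) to TARGET_SHAPE."""
    h, w = len(sample), len(sample[0])
    th, tw = TARGET_SHAPE
    return [[sample[r * h // th][c * w // tw] for c in range(tw)]
            for r in range(th)]

def migrate_and_convert(training_data):
    # Stands in for: CPU cache -> XDMA transfer -> format conversion -> return.
    return [unify_format(s) for s in training_data]
```

A call such as `migrate_and_convert([img_2x2, img_7x5])` returns samples that all share the assumed 32x32 shape, which is the "feature data in a uniform format" the main node caches.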
And step S12, determining a target acceleration mode based on the layer number of the neural network model, determining a target heterogeneous acceleration node corresponding to the target acceleration mode, generating an intermediate operation result of the neural network model based on the characteristic data through the target heterogeneous acceleration node, and determining a final operation result based on the intermediate operation result.
In this embodiment, different acceleration modes are determined according to the number of layers of the neural network model, that is, according to the size of the model, and the different acceleration modes correspond to different target heterogeneous acceleration nodes. As shown in fig. 2, for a neural network model with few layers, i.e., a small model, the independent acceleration mode is selected; the corresponding target heterogeneous acceleration node is the main heterogeneous acceleration node itself, whose FPGA (Field Programmable Gate Array) performs the preset layer operations of the neural network model based on the feature data, and the final operation result is determined based on the generated intermediate operation results. For a neural network model with many layers, i.e., a large model, the hybrid acceleration mode is selected; the corresponding target heterogeneous acceleration nodes are the branch heterogeneous acceleration nodes, each of which independently completes its corresponding preset layer operations based on the corresponding feature data, and the final operation result is determined based on the generated intermediate operation results. The branch heterogeneous acceleration nodes comprise the A-layer, B-layer, C-layer, …, and N-layer acceleration nodes in fig. 2. The preset layer operations include, but are not limited to, reshaping, convolution, activation, and pooling, and may be any one or combination thereof, which is not limited herein.
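The mode-selection policy described above can be sketched as a simple threshold check. The threshold value and the function name below are assumptions for illustration; the patent only states that small models use the independent mode and large models the hybrid mode, without fixing a cut-off.

```python
# Hypothetical mode-selection policy; LAYER_THRESHOLD is an assumed cut-off.
LAYER_THRESHOLD = 16

def select_acceleration(num_layers, branch_nodes):
    """Return (mode, target nodes) for a model with num_layers layers."""
    if num_layers <= LAYER_THRESHOLD:
        # Small model: the master node's own FPGA runs every preset layer.
        return "independent", ["master"]
    # Large model: work is spread over the branch heterogeneous nodes.
    return "hybrid", list(branch_nodes)
```

For example, `select_acceleration(3, ["A-layer", "B-layer"])` yields the independent mode, while a deep model falls through to the hybrid mode with the branch node list.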
And S13, performing full connection processing on the final operation result by using a local CPU to obtain a trained neural network model.
In this embodiment, as shown in fig. 2, in either the independent acceleration mode or the hybrid acceleration mode, the master heterogeneous acceleration node performs full connection processing on the final operation result by using a local CPU, so as to obtain a trained neural network model, where the final operation result is stored in a global memory of the master heterogeneous acceleration node.
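The full-connection processing applied by the local CPU amounts to a dense layer, y_j = sum_i x_i * W[i][j] + b_j. The helper below is a pure-Python illustration only; the patent does not specify layer sizes, weights, or the CPU implementation.

```python
# Minimal dense (fully connected) layer standing in for the CPU-side
# full-connection processing of the final operation result.
def fully_connected(x, weights, bias):
    """x: input vector; weights: matrix W[i][j]; bias: vector b[j]."""
    # zip(*weights) iterates over the columns of W, one per output unit.
    return [sum(xi * wij for xi, wij in zip(x, col)) + bj
            for col, bj in zip(zip(*weights), bias)]
```

With identity weights and a unit bias, `fully_connected([1, 2], [[1, 0], [0, 1]], [1, 1])` simply shifts each input by one, which makes the formula easy to check by hand.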
In this embodiment, taking application to a target behavior detection scenario as an example, the training process of the target behavior detection model is offloaded to the various heterogeneous acceleration nodes, where the target behaviors include, but are not limited to, jumping jacks (open-and-close jumps), squats, sit-ups, and the like. After the master heterogeneous acceleration node receives the training task of the target behavior detection model, the local CPU caches an image training data set constructed from a plurality of specific behavior posture samples. Because the image data in this set have different formats, the cached image training data set is migrated to the independent data heterogeneous acceleration node through the XDMA channel, so that the data heterogeneous acceleration node performs format conversion on all the received image training data to unify their pixel size, thereby obtaining an image training data set with a uniform pixel size, which is returned to the main heterogeneous acceleration node. The main heterogeneous acceleration node then determines the corresponding target acceleration mode based on the number of layers of the target behavior detection model, together with the target heterogeneous acceleration node corresponding to that mode, so that operations such as reshaping, convolution, activation, and pooling are carried out by the target heterogeneous acceleration node based on the image training data set with the uniform pixel size, and a final operation result is generated from the intermediate operation results produced during the computation.
And finally, performing full connection processing on the final operation result by a CPU of the main heterogeneous acceleration node to obtain a final trained target behavior detection model, thereby detecting the specific behavior gesture by using the trained target behavior detection model.
Therefore, the training process of the neural network model is offloaded to heterogeneous acceleration nodes, and model training efficiency is improved through their distributed cooperative work. In addition, the application rapidly completes format conversion of the training data through an independent data heterogeneous acceleration node, which reduces the workload of the main heterogeneous acceleration node and achieves reasonable allocation of resources. Furthermore, the application selects different acceleration modes according to the number of layers of the neural network model, so that the model is operated by the different target heterogeneous acceleration nodes corresponding to the different acceleration modes, better realizing the reasonable utilization of resources.
Based on the previous embodiment, the present application describes the overall process of heterogeneous acceleration of the neural network model, and next, the present application will be described in detail on how to determine the target acceleration mode, and how to perform preset layer operation in the neural network model through the target heterogeneous acceleration node corresponding to the target acceleration mode. Referring to fig. 3, an embodiment of the present application discloses a heterogeneous acceleration process of a neural network model, which is applied to a master heterogeneous acceleration node, and includes:
and S21, determining a target acceleration mode based on the number of layers of the neural network model.
And S22, if the target acceleration mode is an independent acceleration mode, determining the master heterogeneous acceleration node itself as the target heterogeneous acceleration node corresponding to the target acceleration mode.
In this embodiment, the corresponding target acceleration mode is selected according to the number of layers of the neural network model, that is, the corresponding target acceleration mode is selected according to the size of the neural network model, and if the number of layers of the neural network model is small, that is, the neural network model is a small neural network model, the target acceleration mode is an independent acceleration mode, and the target heterogeneous acceleration node corresponding to the independent acceleration mode is the master heterogeneous acceleration node itself.
Step S23, carrying out serial processing on preset layer operation in the neural network model based on the characteristic data, caching the generated intermediate operation result into a local internal memory, and generating a final operation result based on the intermediate operation result after all the preset layer operation is completed.
In this embodiment, as shown in fig. 4, after the main heterogeneous acceleration node obtains the feature data in the uniform format returned by the data heterogeneous acceleration node, it determines, based on the number of layers of the neural network model, that the target acceleration mode is the independent acceleration mode and that the corresponding target heterogeneous acceleration node is the main heterogeneous acceleration node itself. Specifically, the master heterogeneous acceleration node determines the preset layer operations of the neural network model to be executed according to the training task. For example, if the preset layers are a reshaping layer, a convolution layer, an activation layer, and a pooling layer, these layers are connected through the AXI-Stream (Advanced eXtensible Interface Stream) bus protocol to form a serial pipeline, so that the reshaping, convolution, activation, and pooling operations are executed in sequence based on the feature data; the generated intermediate operation results are stored in local memory, and the final operation result generated from them is stored in the local global memory. Finally, the CPU in the main heterogeneous acceleration node performs full-connection processing on the final operation result, thereby obtaining a trained neural network model.
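The serial pipeline of preset layer operations can be sketched as a chain of stage functions with each intermediate result cached, as below. The stage implementations are toy one-dimensional stand-ins for the FPGA kernels (the real design streams data over AXI-Stream), and all names are illustrative.

```python
# Toy 1-D stand-ins for the convolution, activation and pooling stages.
def conv3(v, k=(1.0, 0.0, -1.0)):
    """Valid 1-D convolution with a fixed 3-tap kernel (assumed)."""
    return [sum(v[i + j] * k[j] for j in range(3)) for i in range(len(v) - 2)]

def relu(v):
    return [max(0.0, x) for x in v]

def pool2(v):
    """Non-overlapping max pooling with window 2."""
    return [max(v[i], v[i + 1]) for i in range(0, len(v) - 1, 2)]

def run_pipeline(features, cache):
    """Run the stages serially, caching each intermediate operation result."""
    x = features
    for name, stage in (("conv", conv3), ("act", relu), ("pool", pool2)):
        x = stage(x)
        cache[name] = x   # stands in for the intermediate result kept in local memory
    return x              # final operation result, handed to the CPU for full connection
```

Running `run_pipeline([1, 2, 3, 4, 5, 6], cache)` leaves each stage's output in `cache`, mirroring how the design buffers intermediate results before the final full-connection step.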
In this embodiment, in addition to the independent acceleration mode, there is a hybrid acceleration mode, specifically, a target acceleration mode is determined based on the number of layers of the neural network model; and if the target acceleration mode is a hybrid acceleration mode, decoupling the neural network model training task to obtain a task schedule, determining a branch heterogeneous acceleration node corresponding to a subtask in the task schedule as a target heterogeneous acceleration node, and then migrating the characteristic data corresponding to the subtask to the target heterogeneous acceleration node so as to generate an intermediate operation result of the neural network model based on the characteristic data through the target heterogeneous acceleration node. It can be understood that, as shown in fig. 5, after the main heterogeneous acceleration node acquires the feature data in the unified format returned by the data heterogeneous acceleration node, a corresponding target acceleration mode is selected according to the number of layers of the neural network model, and if the number of layers of the neural network model is large, that is, the neural network model is a large neural network model, the target acceleration mode is a hybrid acceleration mode. 
The main heterogeneous acceleration node then decouples the neural network model training task to obtain a task schedule containing subtasks, determines the branch heterogeneous acceleration nodes corresponding to the subtasks in the task schedule from among branch heterogeneous acceleration nodes such as the A-layer, B-layer, C-layer, …, N-layer acceleration nodes, takes these branch heterogeneous acceleration nodes as the target heterogeneous acceleration nodes, and then migrates the feature data corresponding to each subtask to the corresponding target heterogeneous acceleration node, so that the target heterogeneous acceleration node generates an intermediate operation result of the neural network model based on the feature data.
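As a rough illustration of this mode selection and task decoupling, the sketch below chooses a mode from the layer count and splits a training task into a per-subtask schedule mapped to branch nodes. The threshold value, node names and layer types are hypothetical assumptions; the patent does not fix them.

```python
LAYER_THRESHOLD = 8  # assumed boundary between "small" and "large" models

def choose_mode(num_layers):
    """Pick the target acceleration mode from the model's layer count."""
    return "independent" if num_layers <= LAYER_THRESHOLD else "hybrid"

def decouple(training_task):
    """Split a training task into per-layer subtasks (the task schedule),
    each mapped to a branch heterogeneous acceleration node."""
    return [{"subtask": t,
             "node": f"{chr(ord('A') + i)}-layer acceleration node"}
            for i, t in enumerate(training_task["layer_types"])]

task = {"layer_types": ["integer", "convolution", "activation", "pooling"]}
mode = choose_mode(num_layers=12)
schedule = decouple(task) if mode == "hybrid" else []
```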
In this embodiment, for the hybrid acceleration mode, generating an intermediate operation result of the neural network model based on the feature data through the target heterogeneous acceleration node and determining a final operation result based on the intermediate operation result may include: if the number of subtasks in the task schedule is one, performing, by the target heterogeneous acceleration node and based on the feature data, the corresponding preset layer operation in the neural network model, storing the generated intermediate operation result locally, performing an iterative operation based on the intermediate operation result to generate the final operation result, and then returning the final operation result through XDMA to the CPU of the main heterogeneous acceleration node. It can be understood that, as shown in fig. 5, in one specific embodiment, if the task schedule obtained by decoupling contains only one subtask, for example a convolution task, the target heterogeneous acceleration node corresponding to the convolution task performs convolution operations based on the corresponding feature data, stores the generated intermediate operation result locally, and then iterates the convolution based on the intermediate operation result until the iteration is completed, thereby generating the final operation result. After obtaining the final operation result, the target heterogeneous acceleration node returns it through XDMA to the main heterogeneous acceleration node, where the CPU performs full-connection processing on it to obtain the trained neural network model.
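The single-subtask case can be sketched as a node-local iteration loop; the iteration count, the convolution kernel and the use of np.convolve are assumptions standing in for the node's actual preset layer operation.

```python
import numpy as np

def iterate_convolution(features, kernel, iterations=3):
    """Repeat the node's preset layer operation until iteration completes,
    keeping the intermediate result on the node the whole time."""
    intermediate = features                 # intermediate result stored locally
    for _ in range(iterations):
        intermediate = np.convolve(intermediate, kernel, mode="same")
    return intermediate                     # final result, returned via XDMA

result = iterate_convolution(np.ones(8), np.array([0.25, 0.5, 0.25]))
```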
In this embodiment, for the hybrid acceleration mode, generating an intermediate operation result of the neural network model based on the feature data through the target heterogeneous acceleration nodes and determining a final operation result based on the intermediate operation result may include: if the number of subtasks in the task schedule is multiple, performing, by the target heterogeneous acceleration node corresponding to the first subtask and based on the corresponding feature data, the corresponding preset layer operation in the neural network model to obtain a current intermediate operation result; forwarding the current intermediate operation result through XDMA to the target heterogeneous acceleration node corresponding to the next subtask, so that this node performs its corresponding preset layer operation based on the corresponding feature data and the current intermediate operation result to obtain a new current intermediate operation result; and jumping back to the forwarding step until the next subtask is the last subtask, so that a final operation result is obtained and returned through XDMA to the CPU of the main heterogeneous acceleration node. It can be understood that, as shown in fig. 5, in another specific embodiment, if the task schedule obtained by decoupling contains multiple subtasks, for example an integer task, a convolution task, an activation task and a pooling task, the target heterogeneous acceleration node corresponding to the first subtask (the integer task) performs the integer operation on the corresponding feature data to obtain the current intermediate operation result, which is sent through XDMA to the target heterogeneous acceleration node corresponding to the next subtask (the convolution task); that node performs the convolution operation based on the corresponding feature data and the current intermediate operation result to obtain a new current intermediate operation result, and so on, until the next subtask is the last subtask (the pooling task) and the final operation result is obtained. The final operation result is returned through XDMA to the main heterogeneous acceleration node, where the CPU performs full-connection processing on it to obtain the trained neural network model.
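The multi-subtask chain described above can be emulated as follows, with simple stand-in operations per subtask and a plain function call replacing the XDMA hand-off between nodes; the stage implementations are illustrative only.

```python
import numpy as np

STAGES = {  # subtask -> stand-in preset layer operation on its branch node
    "integer":     lambda x: np.round(x),
    "convolution": lambda x: np.convolve(x, [0.5, 0.5], mode="valid"),
    "activation":  lambda x: np.maximum(x, 0),
    "pooling":     lambda x: x[::2],
}

def run_chain(schedule, features):
    """Pass the intermediate result node to node along the task schedule."""
    current = features                       # intermediate forwarded via "XDMA"
    for subtask in schedule:
        current = STAGES[subtask](current)   # next node's preset layer op
    return current                           # final result back to the main CPU

final = run_chain(["integer", "convolution", "activation", "pooling"],
                  np.linspace(-1.0, 1.0, 9))
```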
It can be seen that, in this method, the corresponding target acceleration mode is selected according to the number of layers of the neural network model, so that the preset layer operations of the neural network model are executed by the different target heterogeneous acceleration nodes corresponding to the different target acceleration modes: for the independent acceleration mode, the main heterogeneous acceleration node executes the preset layer operations; for the hybrid acceleration mode, the branch heterogeneous acceleration nodes execute them. In this way the energy consumption of the main heterogeneous acceleration node can be controlled, resources are reasonably allocated and utilized, and the training efficiency of the neural network model is improved.
Referring to fig. 6, an embodiment of the present application discloses a heterogeneous acceleration device of a neural network model, which is applied to a main heterogeneous acceleration node, and includes:
the training data migration module 11 is configured to migrate training data to the data heterogeneous acceleration node after a neural network model training task is acquired, so that the data heterogeneous acceleration node performs format conversion on the training data to obtain feature data in a unified format and returns the feature data to the main heterogeneous acceleration node;
an acceleration pattern determining module 12, configured to determine a target acceleration pattern based on a number of layers of the neural network model, and determine a target heterogeneous acceleration node corresponding to the target acceleration pattern, so as to generate an intermediate operation result of the neural network model by the target heterogeneous acceleration node based on the feature data, and determine a final operation result based on the intermediate operation result;
and the result full-connection module 13 is used for carrying out full-connection processing on the final operation result by utilizing a local CPU so as to obtain a trained neural network model.
Therefore, the present application offloads the training process of the neural network model to heterogeneous acceleration nodes and improves model training efficiency through their distributed cooperative work. In addition, the present application completes the format conversion of the training data quickly through an independent data heterogeneous acceleration node, reducing the workload of the main heterogeneous acceleration node and achieving a reasonable distribution of resources. Moreover, the present application selects different acceleration modes according to the number of layers of the neural network model, so that the neural network model is operated on by the different target heterogeneous acceleration nodes corresponding to the different acceleration modes, which better achieves the reasonable utilization of resources.
In some specific embodiments, the training data migration module 11 may specifically include:
the training data migration unit is used for caching training data through a local CPU and migrating the cached training data to the data heterogeneous acceleration node through XDMA.
In some specific embodiments, the acceleration mode determining module 12 may specifically include:
the first acceleration mode determining unit is used for determining a target acceleration mode based on the number of layers of the neural network model;
and the first acceleration node determining unit is used for determining, if the target acceleration mode is the independent acceleration mode, the main heterogeneous acceleration node itself as the target heterogeneous acceleration node corresponding to the target acceleration mode.
In some specific embodiments, the acceleration mode determining module 12 may specifically include:
the operation result generation unit is used for carrying out serial processing on preset layer operation in the neural network model based on the characteristic data, caching the generated intermediate operation result into a local internal memory, and generating a final operation result based on the intermediate operation result after all the preset layer operation is completed.
In some specific embodiments, the acceleration mode determining module 12 may specifically include:
the second acceleration mode determining unit is used for determining a target acceleration mode based on the number of layers of the neural network model;
and the second acceleration node determining unit is used for decoupling the neural network model training task to obtain a task schedule if the target acceleration mode is a hybrid acceleration mode, determining a branch heterogeneous acceleration node corresponding to a subtask in the task schedule as a target heterogeneous acceleration node, and then migrating the characteristic data corresponding to the subtask to the target heterogeneous acceleration node so as to generate an intermediate operation result of the neural network model based on the characteristic data through the target heterogeneous acceleration node.
Further, an embodiment of the present application also discloses an electronic device. Fig. 7 is a block diagram of an electronic device 20 according to an exemplary embodiment, and the content of the figure should not be construed as limiting the scope of use of the present application in any way.
Fig. 7 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program that is loaded and executed by the processor 21 to implement relevant steps in the heterogeneous acceleration method of the neural network model disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be specifically an electronic computer.
In this embodiment, the power supply 23 is configured to provide the operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol it follows may be any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting data to the outside, and its specific interface type may be selected according to specific application requirements, which is not limited herein.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary storage or permanent storage.
The operating system 221 is used for managing and controlling the various hardware devices on the electronic device 20 and the computer program 222, and may be Windows Server, Netware, Unix, Linux, etc. In addition to the computer program that can be used to perform the heterogeneous acceleration method of the neural network model executed by the electronic device 20 disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that can be used to perform other specific tasks.
Further, the application also discloses a computer readable storage medium for storing a computer program; wherein the computer program, when executed by the processor, implements the heterogeneous acceleration method of the neural network model disclosed previously. For specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing describes the present application in detail. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the above description of the embodiments is only intended to help understand the method and its core ideas. Meanwhile, since those skilled in the art may make changes to the specific embodiments and application scope in accordance with the ideas of the present application, the content of this description should not be construed as limiting the present application.

Claims (10)

1. The heterogeneous acceleration method of the neural network model is characterized by being applied to a main heterogeneous acceleration node and comprising the following steps of:
after a training task of a target behavior detection model is acquired, training data constructed based on a plurality of behavior gesture data are migrated to a data heterogeneous acceleration node so that the data heterogeneous acceleration node performs format conversion on the training data to obtain feature data in a uniform format, and the feature data are returned to the main heterogeneous acceleration node;
determining a target acceleration mode based on the number of layers of the target behavior detection model, determining a target heterogeneous acceleration node corresponding to the target acceleration mode, generating an intermediate operation result of the target behavior detection model based on the characteristic data through the target heterogeneous acceleration node, and determining a final operation result based on the intermediate operation result;
performing full-connection processing on the final operation result by using a local CPU (Central Processing Unit) to obtain a trained target behavior detection model, so as to detect target behaviors by using the trained target behavior detection model;
the determining the target acceleration mode based on the target behavior detection model layer number comprises the following steps:
if the number of layers of the target behavior detection model is small, that is, the model is a small target behavior detection model, determining the independent acceleration mode as the target acceleration mode;
and if the number of layers of the target behavior detection model is large, that is, the model is a large target behavior detection model, determining the hybrid acceleration mode as the target acceleration mode.
2. The heterogeneous acceleration method of a neural network model according to claim 1, wherein the migrating training data constructed based on a plurality of behavior gesture data to the data heterogeneous acceleration node comprises:
and caching training data constructed based on the behavior gesture data through a local CPU, and migrating the cached training data to the data heterogeneous acceleration node through XDMA.
3. The heterogeneous acceleration method of a neural network model according to claim 1 or 2, wherein the determining a target acceleration pattern based on a target behavior detection model layer number and determining a target heterogeneous acceleration node corresponding to the target acceleration pattern includes:
determining a target acceleration mode based on the number of layers of the target behavior detection model;
and if the target acceleration mode is the independent acceleration mode, determining the main heterogeneous acceleration node itself as the target heterogeneous acceleration node corresponding to the target acceleration mode.
4. The heterogeneous acceleration method of claim 3, wherein the generating, by the target heterogeneous acceleration node, an intermediate operation result of a target behavior detection model based on the feature data, and determining a final operation result based on the intermediate operation result, comprises:
and carrying out serial processing on preset layer operation in the target behavior detection model based on the characteristic data, caching the generated intermediate operation result into a local memory, and generating a final operation result based on the intermediate operation result after all the preset layer operation is completed.
5. The heterogeneous acceleration method of a neural network model according to claim 1 or 2, wherein the determining a target acceleration pattern based on a target behavior detection model layer number and determining a target heterogeneous acceleration node corresponding to the target acceleration pattern so as to generate an intermediate operation result of a target behavior detection model based on the feature data by the target heterogeneous acceleration node includes:
determining a target acceleration mode based on the number of layers of the target behavior detection model;
and if the target acceleration mode is a hybrid acceleration mode, decoupling the training task of the target behavior detection model to obtain a task schedule, determining a branch heterogeneous acceleration node corresponding to a subtask in the task schedule as a target heterogeneous acceleration node, and then migrating the characteristic data corresponding to the subtask to the target heterogeneous acceleration node so as to generate an intermediate operation result of the target behavior detection model based on the characteristic data through the target heterogeneous acceleration node.
6. The heterogeneous acceleration method of claim 5, wherein the generating, by the target heterogeneous acceleration node, an intermediate operation result of a target behavior detection model based on the feature data, and determining a final operation result based on the intermediate operation result, comprises:
if the number of the sub-tasks in the task schedule is one, corresponding preset layer operation in a target behavior detection model is carried out through the target heterogeneous acceleration node and based on the characteristic data, the generated intermediate operation result is stored locally, iterative operation is carried out based on the intermediate operation result to generate a final operation result, and then the final operation result is returned to the CPU of the main heterogeneous acceleration node through XDMA.
7. The heterogeneous acceleration method of claim 5, wherein the generating, by the target heterogeneous acceleration node, an intermediate operation result of a target behavior detection model based on the feature data, and determining a final operation result based on the intermediate operation result, comprises:
if the number of subtasks in the task schedule is multiple, performing corresponding preset layer operation in a target behavior detection model through the target heterogeneous acceleration node corresponding to the first subtask and based on the corresponding characteristic data to obtain a current intermediate operation result; forwarding the current intermediate operation result to the target heterogeneous acceleration node corresponding to the next subtask through XDMA, so that the target heterogeneous acceleration node corresponding to the next subtask performs corresponding preset layer operation in a target behavior detection model based on the corresponding characteristic data and the current intermediate operation result to obtain a new current intermediate operation result, and then re-jumping to the step of forwarding the current intermediate operation result to the target heterogeneous acceleration node corresponding to the next subtask through XDMA until the next subtask is the last subtask to obtain a final operation result, and returning the final operation result to a CPU of the main heterogeneous acceleration node through XDMA.
8. Heterogeneous acceleration device of neural network model, characterized in that is applied to main heterogeneous acceleration node, includes:
the training data migration module is used for migrating training data constructed based on a plurality of behavior gesture data to the data heterogeneous acceleration nodes after a training task of the target behavior detection model is acquired, so that the data heterogeneous acceleration nodes convert the training data in format to obtain characteristic data in a uniform format, and returning the characteristic data to the main heterogeneous acceleration nodes;
the acceleration mode determining module is used for determining a target acceleration mode based on the number of layers of the target behavior detection model, determining a target heterogeneous acceleration node corresponding to the target acceleration mode, generating an intermediate operation result of the target behavior detection model based on the characteristic data through the target heterogeneous acceleration node, and determining a final operation result based on the intermediate operation result;
the result full-connection module is used for carrying out full-connection processing on the final operation result by utilizing a local CPU so as to obtain a trained target behavior detection model, so that the target behavior is detected by utilizing the trained target behavior detection model;
the acceleration mode determining module is specifically configured to: if the number of layers of the target behavior detection model is small, that is, the model is a small target behavior detection model, determine the independent acceleration mode as the target acceleration mode; and if the number of layers of the target behavior detection model is large, that is, the model is a large target behavior detection model, determine the hybrid acceleration mode as the target acceleration mode.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the heterogeneous acceleration method of a neural network model according to any one of claims 1 to 7.
10. A computer readable storage medium for storing a computer program which, when executed by a processor, implements a heterogeneous acceleration method of a neural network model according to any one of claims 1 to 7.
CN202310722520.2A 2023-06-19 2023-06-19 Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model Active CN116451757B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310722520.2A CN116451757B (en) 2023-06-19 2023-06-19 Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model

Publications (2)

Publication Number Publication Date
CN116451757A CN116451757A (en) 2023-07-18
CN116451757B true CN116451757B (en) 2023-09-08

Family

ID=87127761


Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097186A (en) * 2019-04-29 2019-08-06 济南浪潮高新科技投资发展有限公司 A kind of neural network isomery quantization training method
EP3543917A1 (en) * 2018-03-19 2019-09-25 SRI International Inc. Dynamic adaptation of deep neural networks
CN112116084A (en) * 2020-09-15 2020-12-22 中国科学技术大学 Convolution neural network hardware accelerator capable of solidifying full network layer on reconfigurable platform
CN112988229A (en) * 2019-12-12 2021-06-18 上海大学 Convolutional neural network resource optimization configuration method based on heterogeneous computation
CN113313243A (en) * 2021-06-11 2021-08-27 海宁奕斯伟集成电路设计有限公司 Method, device and equipment for determining neural network accelerator and storage medium
CN113688734A (en) * 2021-08-25 2021-11-23 燕山大学 Old man falling detection method based on FPGA heterogeneous acceleration
CN114064278A (en) * 2021-11-18 2022-02-18 深圳致星科技有限公司 Heterogeneous acceleration engine and method for federal learning
CN114662661A (en) * 2022-03-22 2022-06-24 东南大学 Method for accelerating multi-outlet DNN reasoning of heterogeneous processor under edge calculation
CN114742225A (en) * 2022-04-07 2022-07-12 中国科学院合肥物质科学研究院 Neural network reasoning acceleration method based on heterogeneous platform
WO2022235251A1 (en) * 2021-05-03 2022-11-10 Google Llc Generating and globally tuning application-specific machine learning accelerators
CN115731441A (en) * 2022-11-29 2023-03-03 浙江大学 Target detection and attitude estimation method based on data cross-modal transfer learning
CN115983359A (en) * 2023-02-03 2023-04-18 展讯通信(上海)有限公司 Heterogeneous computing scheduling method and device and computer readable storage medium
CN116150191A (en) * 2023-02-22 2023-05-23 上海威固信息技术股份有限公司 Data operation acceleration method and system for cloud data architecture
WO2023103301A1 (en) * 2021-12-09 2023-06-15 苏州浪潮智能科技有限公司 Distributed heterogeneous acceleration platform communication method and system, and device and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11222256B2 (en) * 2017-10-17 2022-01-11 Xilinx, Inc. Neural network processing system having multiple processors and a neural network accelerator
CN109976903B (en) * 2019-02-22 2021-06-29 华中科技大学 Deep learning heterogeneous computing method and system based on layer width memory allocation
US20210390460A1 (en) * 2021-06-12 2021-12-16 Intel Corporation Compute and memory based artificial intelligence model partitioning using intermediate representation
US20230088676A1 (en) * 2021-09-20 2023-03-23 International Business Machines Corporation Graph neural network (gnn) training using meta-path neighbor sampling and contrastive learning

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3543917A1 (en) * 2018-03-19 2019-09-25 SRI International Inc. Dynamic adaptation of deep neural networks
CN110097186A (en) * 2019-04-29 2019-08-06 济南浪潮高新科技投资发展有限公司 A kind of neural network isomery quantization training method
CN112988229A (en) * 2019-12-12 2021-06-18 上海大学 Convolutional neural network resource optimization configuration method based on heterogeneous computation
CN112116084A (en) * 2020-09-15 2020-12-22 中国科学技术大学 Convolution neural network hardware accelerator capable of solidifying full network layer on reconfigurable platform
WO2022235251A1 (en) * 2021-05-03 2022-11-10 Google Llc Generating and globally tuning application-specific machine learning accelerators
CN113313243A (en) * 2021-06-11 2021-08-27 海宁奕斯伟集成电路设计有限公司 Method, device and equipment for determining neural network accelerator and storage medium
CN113688734A (en) * 2021-08-25 2021-11-23 燕山大学 Old man falling detection method based on FPGA heterogeneous acceleration
CN114064278A (en) * 2021-11-18 2022-02-18 深圳致星科技有限公司 Heterogeneous acceleration engine and method for federal learning
WO2023103301A1 (en) * 2021-12-09 2023-06-15 苏州浪潮智能科技有限公司 Distributed heterogeneous acceleration platform communication method and system, and device and medium
CN114662661A (en) * 2022-03-22 2022-06-24 东南大学 Method for accelerating multi-outlet DNN reasoning of heterogeneous processor under edge calculation
CN114742225A (en) * 2022-04-07 2022-07-12 中国科学院合肥物质科学研究院 Neural network reasoning acceleration method based on heterogeneous platform
CN115731441A (en) * 2022-11-29 2023-03-03 浙江大学 Target detection and pose estimation method based on cross-modal transfer learning
CN115983359A (en) * 2023-02-03 2023-04-18 展讯通信(上海)有限公司 Heterogeneous computing scheduling method and device and computer readable storage medium
CN116150191A (en) * 2023-02-22 2023-05-23 上海威固信息技术股份有限公司 Data operation acceleration method and system for cloud data architecture

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on convolutional neural network algorithms combined with transfer learning models; Qiu Ningjia; Wang Xiaoxia; Wang Peng; Zhou Sicheng; Wang Yanchun; Computer Engineering and Applications (No. 05); full text *

Also Published As

Publication number Publication date
CN116451757A (en) 2023-07-18

Similar Documents

Publication Publication Date Title
Chen et al. DNNOff: offloading DNN-based intelligent IoT applications in mobile edge computing
Li et al. DeepNFV: A lightweight framework for intelligent edge network functions virtualization
CN108122032A Neural network model training method, apparatus, chip and system
CN111522657B (en) Distributed equipment collaborative deep learning reasoning method
KR20210036226A (en) A distributed computing system including multiple edges and cloud, and method for providing model for using adaptive intelligence thereof
CN110069341A Dependency-aware task scheduling method with on-demand function configuration in edge computing
CN111443990A Edge computing task migration simulation system
CN108111335A Method and system for scheduling and linking virtual network functions
CN113592066A (en) Hardware acceleration method, apparatus, device, computer program product and storage medium
CN109657794A Instruction-queue-based distributed deep neural network performance modeling method
CN114546608A Task scheduling method based on edge computing
CN115586995A (en) Method and system for predicting maximum load of load machine, computer equipment and medium
CN115033359A (en) Internet of things agent multi-task scheduling method and system based on time delay control
Qiao et al. Analysis of Evolutionary Model of DIKW Based on Cloud Resource Allocation Management
Xu et al. A meta reinforcement learning-based virtual machine placement algorithm in mobile edge computing
CN116451757B (en) Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model
CN116582407A (en) Containerized micro-service arrangement system and method based on deep reinforcement learning
Jiang et al. Hierarchical deployment of deep neural networks based on fog computing inferred acceleration model
CN110012021B Adaptive computation offloading method in mobile edge computing
CN113222134B (en) Brain-like computing system, method and computer readable storage medium
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
Ahn et al. Scissionlite: Accelerating distributed deep neural networks using transfer layer
Cui et al. Resource-Efficient DNN Training and Inference for Heterogeneous Edge Intelligence in 6G
CN111527734B (en) Node traffic ratio prediction method and device
Sędziwy On acceleration of multi-agent system performance in large scale photometric computations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant