CN116451757B - Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model - Google Patents


Info

Publication number
CN116451757B
CN116451757B (application CN202310722520.2A)
Authority
CN
China
Prior art keywords
target
acceleration
operation result
heterogeneous
heterogeneous acceleration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310722520.2A
Other languages
Chinese (zh)
Other versions
CN116451757A (en)
Inventor
李乐乐
张晖
赵鑫鑫
姜凯
李锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Scientific Research Institute Co Ltd
Original Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Scientific Research Institute Co Ltd filed Critical Shandong Inspur Scientific Research Institute Co Ltd
Priority to CN202310722520.2A
Publication of CN116451757A
Application granted
Publication of CN116451757B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/485 Task life-cycle, e.g. stopping, restarting, resuming execution
    • G06F 9/4856 Task life-cycle, resumption being on a different machine, e.g. task migration, virtual machine migration
    • G06F 9/4862 Task life-cycle, resumption being on a different machine, the task being a mobile agent, i.e. specifically designed to migrate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources to service a request
    • G06F 9/5027 Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application discloses a heterogeneous acceleration method, apparatus, device, and medium for a neural network model, relating to the field of heterogeneous acceleration and comprising the following steps: after acquiring a model training task, the main heterogeneous acceleration node migrates training data to the data heterogeneous acceleration node, which performs format conversion on the training data to obtain feature data in a uniform format and returns the feature data to the main heterogeneous acceleration node; a target acceleration mode is determined based on the number of layers of the neural network model, together with the corresponding target heterogeneous acceleration node, so that the target heterogeneous acceleration node generates an intermediate operation result of the neural network model based on the feature data, and a final operation result is determined based on the intermediate operation result; the local CPU then performs full-connection processing on the final operation result to obtain a trained neural network model. The method selects different acceleration modes according to the number of model layers and rapidly completes format conversion of the data through the data heterogeneous acceleration node, so that model training efficiency is improved through the cooperative work of various heterogeneous acceleration nodes.

Description

Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model
Technical Field
The present application relates to the field of heterogeneous acceleration, and in particular, to a heterogeneous acceleration method, apparatus, device, and medium for a neural network model.
Background
With the continuous development and innovation of hardware technology, deep learning has gained new momentum. Among deep learning approaches, neural networks (NNs), as a connectionist model, have achieved good results on image classification, natural language processing, and fast object detection tasks. As the number of layers and neurons in NNs keeps growing, conventional software and hardware architectures such as the CPU (Central Processing Unit) and GPU (Graphics Processing Unit) find it difficult to balance training efficiency against hardware resources and energy consumption when training large models; meanwhile, software optimization alone can no longer meet the ever-increasing speed and energy-consumption requirements of NN acceleration.
Disclosure of Invention
Accordingly, the present application is directed to a heterogeneous acceleration method, apparatus, device, and medium for a neural network model, which select different acceleration modes according to the number of model layers and rapidly complete format conversion of the data through a dedicated data heterogeneous acceleration node, so that model training efficiency is improved through the cooperative work of various heterogeneous acceleration nodes. The specific scheme is as follows:
in a first aspect, the present application provides a heterogeneous acceleration method of a neural network model, applied to a master heterogeneous acceleration node, including:
after a neural network model training task is acquired, migrating training data to a data heterogeneous acceleration node so that the data heterogeneous acceleration node performs format conversion on the training data to obtain feature data in a uniform format, and returning the feature data to the main heterogeneous acceleration node;
determining a target acceleration mode based on the number of layers of the neural network model, determining a target heterogeneous acceleration node corresponding to the target acceleration mode, generating an intermediate operation result of the neural network model based on the characteristic data through the target heterogeneous acceleration node, and determining a final operation result based on the intermediate operation result;
and carrying out full connection processing on the final operation result by using a local CPU so as to obtain a trained neural network model.
Optionally, the migrating training data to the data heterogeneous acceleration node includes:
and caching the training data through a local CPU, and migrating the cached training data to the data heterogeneous acceleration node through XDMA.
Optionally, the determining the target acceleration mode based on the number of layers of the neural network model, and determining the target heterogeneous acceleration node corresponding to the target acceleration mode includes:
determining a target acceleration mode based on the number of layers of the neural network model;
and if the target acceleration mode is an independent acceleration mode, determining the master heterogeneous acceleration node itself as the target heterogeneous acceleration node corresponding to the target acceleration mode.
Optionally, the generating, by the target heterogeneous acceleration node, an intermediate operation result of the neural network model based on the feature data, and determining a final operation result based on the intermediate operation result, includes:
and carrying out serial processing on preset layer operation in the neural network model based on the characteristic data, caching the generated intermediate operation result into a local memory, and generating a final operation result based on the intermediate operation result after all the preset layer operation is completed.
Optionally, the determining a target acceleration mode based on the number of layers of the neural network model, and determining a target heterogeneous acceleration node corresponding to the target acceleration mode, so as to generate, by the target heterogeneous acceleration node, an intermediate operation result of the neural network model based on the feature data, includes:
determining a target acceleration mode based on the number of layers of the neural network model;
and if the target acceleration mode is a hybrid acceleration mode, decoupling the neural network model training task to obtain a task schedule, determining a branch heterogeneous acceleration node corresponding to a subtask in the task schedule as a target heterogeneous acceleration node, and then migrating the characteristic data corresponding to the subtask to the target heterogeneous acceleration node so as to generate an intermediate operation result of the neural network model based on the characteristic data through the target heterogeneous acceleration node.
Optionally, the generating, by the target heterogeneous acceleration node, an intermediate operation result of the neural network model based on the feature data, and determining a final operation result based on the intermediate operation result, includes:
if the number of the sub-tasks in the task schedule is one, corresponding preset layer operation in a neural network model is performed through the target heterogeneous acceleration node and based on the characteristic data, the generated intermediate operation result is stored locally, iterative operation is performed based on the intermediate operation result to generate a final operation result, and then the final operation result is returned to a CPU of the main heterogeneous acceleration node through XDMA.
Optionally, the generating, by the target heterogeneous acceleration node, an intermediate operation result of the neural network model based on the feature data, and determining a final operation result based on the intermediate operation result, includes:
if the number of subtasks in the task schedule is multiple, performing corresponding preset layer operation in a neural network model through the target heterogeneous acceleration node corresponding to the first subtask and based on the corresponding characteristic data to obtain a current intermediate operation result; forwarding the current intermediate operation result to the target heterogeneous acceleration node corresponding to the next subtask through XDMA, so that the target heterogeneous acceleration node corresponding to the next subtask performs corresponding preset layer operation in a neural network model based on the corresponding characteristic data and the current intermediate operation result to obtain a new current intermediate operation result, and then re-jumping to the step of forwarding the current intermediate operation result to the target heterogeneous acceleration node corresponding to the next subtask through XDMA until the next subtask is the last subtask to obtain a final operation result, and returning the final operation result to a CPU of the main heterogeneous acceleration node through XDMA.
In a second aspect, the present application provides a heterogeneous acceleration device of a neural network model, applied to a main heterogeneous acceleration node, including:
the training data migration module is used for migrating training data to the data heterogeneous acceleration nodes after acquiring a neural network model training task so that the data heterogeneous acceleration nodes convert the format of the training data to obtain characteristic data in a uniform format and return the characteristic data to the main heterogeneous acceleration nodes;
the acceleration mode determining module is used for determining a target acceleration mode based on the number of layers of the neural network model, determining a target heterogeneous acceleration node corresponding to the target acceleration mode, generating an intermediate operation result of the neural network model based on the characteristic data through the target heterogeneous acceleration node, and determining a final operation result based on the intermediate operation result;
and the result full-connection module is used for carrying out full-connection processing on the final operation result by utilizing a local CPU so as to obtain a trained neural network model.
In a third aspect, the present application provides an electronic device, comprising:
a memory for storing a computer program;
and the processor is used for executing the computer program to realize the heterogeneous acceleration method of the neural network model.
In a fourth aspect, the present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the heterogeneous acceleration method of the neural network model described above.
After acquiring a neural network model training task, the main heterogeneous acceleration node migrates training data to the data heterogeneous acceleration node so that the data heterogeneous acceleration node performs format conversion on the training data to obtain feature data in a uniform format and returns the feature data to the main heterogeneous acceleration node; a target acceleration mode is determined based on the number of layers of the neural network model, together with the target heterogeneous acceleration node corresponding to that mode, so that an intermediate operation result of the neural network model is generated by the target heterogeneous acceleration node based on the feature data, and a final operation result is determined based on the intermediate operation result; full-connection processing is then carried out on the final operation result by a local CPU to obtain a trained neural network model. In this way, the training process of the neural network model is offloaded to heterogeneous acceleration nodes, and model training efficiency is improved through their distributed cooperative work. In addition, the application rapidly completes format conversion of the training data through an independent data heterogeneous acceleration node, which reduces the workload of the main heterogeneous acceleration node and achieves reasonable allocation of resources. Furthermore, the application selects different acceleration modes according to the number of layers of the neural network model, so that the model is operated by the different target heterogeneous acceleration nodes corresponding to the different acceleration modes, better realizing the reasonable utilization of resources.
Drawings
To more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments or by the description of the prior art are briefly introduced below. It is apparent that the following drawings show only some embodiments of the present application, and that a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a heterogeneous acceleration method of a neural network model disclosed by the application;
FIG. 2 is a heterogeneous acceleration flow chart of a neural network model according to the present disclosure;
FIG. 3 is a flowchart of a heterogeneous acceleration method of a specific neural network model disclosed in the present application;
FIG. 4 is a heterogeneous acceleration flow chart of a neural network model in an independent acceleration mode according to the present disclosure;
FIG. 5 is a heterogeneous acceleration flow chart of a neural network model in a hybrid acceleration mode according to the present disclosure;
FIG. 6 is a schematic diagram of a heterogeneous accelerator of a neural network model according to the present disclosure;
fig. 7 is a block diagram of an electronic device according to the present disclosure.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
At present, with the continuous increase in the number of layers and neurons of NNs, traditional software and hardware architectures such as the CPU and GPU find it difficult to balance training efficiency against hardware resources and energy consumption when training large models; meanwhile, software optimization alone cannot meet the growing speed and energy-consumption requirements of NN acceleration. Therefore, the heterogeneous acceleration method of the present application selects different acceleration modes according to the number of layers of the neural network model and rapidly completes format conversion of the data through an independent data heterogeneous acceleration node, so that model training efficiency is improved through the cooperative work of various heterogeneous acceleration nodes.
Referring to fig. 1, an embodiment of the present application discloses a heterogeneous acceleration method of a neural network model, which is applied to a main heterogeneous acceleration node, and includes:
and S11, after a neural network model training task is acquired, migrating training data to a data heterogeneous acceleration node so that the data heterogeneous acceleration node performs format conversion on the training data to obtain characteristic data in a uniform format, and returning the characteristic data to the main heterogeneous acceleration node.
In this embodiment, migrating the training data to the data heterogeneous acceleration node may include caching the training data via a local CPU and migrating the cached training data to the data heterogeneous acceleration node through XDMA (a DMA, i.e., Direct Memory Access, engine). It can be understood that, as shown in fig. 2, after the main heterogeneous acceleration node receives the neural network model training task, it caches the training data with the local CPU. Because the training data come in different formats, such as images and speech, and images in particular may be three-dimensional data, the cached training data are first sent to the next-stage data heterogeneous acceleration node through an XDMA channel, so that the independent data heterogeneous acceleration node performs format conversion to produce feature data in a uniform format, which are then returned to the main heterogeneous acceleration node for caching. In this way, this embodiment uses an independent data heterogeneous acceleration node to perform format conversion on the training data, which reduces the workload of the master heterogeneous acceleration node, controls resource consumption, and achieves reasonable allocation of resources.
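As a minimal sketch of the data heterogeneous acceleration node's role, the pure-Python fragment below resizes mixed-size two-dimensional samples to one assumed uniform shape. The names `migrate_and_convert`, `unify_format`, and `TARGET_SHAPE` are illustrative, not from the patent, and the real node would perform this work in FPGA logic after an XDMA transfer rather than in host Python.

```python
# Illustrative stand-in for the data heterogeneous acceleration node:
# convert mixed-format training samples into feature data of a uniform shape.
TARGET_SHAPE = (32, 32)  # assumed uniform pixel size; the patent does not fix one

def unify_format(sample):
    """Nearest-neighbour resize of a 2-D sample (list of rows) to TARGET_SHAPE."""
    h, w = len(sample), len(sample[0])
    th, tw = TARGET_SHAPE
    return [[sample[r * h // th][c * w // tw] for c in range(tw)]
            for r in range(th)]

def migrate_and_convert(training_data):
    # Stands in for: CPU cache -> XDMA transfer -> format conversion -> return.
    return [unify_format(s) for s in training_data]
```

A call such as `migrate_and_convert([img_2x2, img_7x5])` returns samples that all share the assumed 32x32 shape, which is the "feature data in a uniform format" the main node caches.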
And step S12, determining a target acceleration mode based on the layer number of the neural network model, determining a target heterogeneous acceleration node corresponding to the target acceleration mode, generating an intermediate operation result of the neural network model based on the characteristic data through the target heterogeneous acceleration node, and determining a final operation result based on the intermediate operation result.
In this embodiment, different acceleration modes are determined according to the number of layers of the neural network model, that is, according to the size of the model, and the different acceleration modes correspond to different target heterogeneous acceleration nodes. As shown in fig. 2, for a neural network model with few layers, i.e., a small model, the independent acceleration mode is selected; the corresponding target heterogeneous acceleration node is the main heterogeneous acceleration node itself, whose FPGA (Field Programmable Gate Array) performs the preset layer operations of the neural network model based on the feature data, and the final operation result is determined based on the generated intermediate operation results. For a neural network model with many layers, i.e., a large model, the hybrid acceleration mode is selected; the corresponding target heterogeneous acceleration nodes are the branch heterogeneous acceleration nodes, each of which independently completes its corresponding preset layer operations based on the corresponding feature data, and the final operation result is determined based on the generated intermediate operation results. The branch heterogeneous acceleration nodes comprise the A-layer, B-layer, C-layer, …, and N-layer acceleration nodes in fig. 2. The preset layer operations include, but are not limited to, reshaping, convolution, activation, and pooling, and may be any one or combination thereof, which is not limited herein.
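The mode-selection policy described above can be sketched as a simple threshold check. The threshold value and the function name below are assumptions for illustration; the patent only states that small models use the independent mode and large models the hybrid mode, without fixing a cut-off.

```python
# Hypothetical mode-selection policy; LAYER_THRESHOLD is an assumed cut-off.
LAYER_THRESHOLD = 16

def select_acceleration(num_layers, branch_nodes):
    """Return (mode, target nodes) for a model with num_layers layers."""
    if num_layers <= LAYER_THRESHOLD:
        # Small model: the master node's own FPGA runs every preset layer.
        return "independent", ["master"]
    # Large model: work is spread over the branch heterogeneous nodes.
    return "hybrid", list(branch_nodes)
```

For example, `select_acceleration(3, ["A-layer", "B-layer"])` yields the independent mode, while a deep model falls through to the hybrid mode with the branch node list.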
And S13, performing full connection processing on the final operation result by using a local CPU to obtain a trained neural network model.
In this embodiment, as shown in fig. 2, in either the independent acceleration mode or the hybrid acceleration mode, the master heterogeneous acceleration node performs full connection processing on the final operation result by using a local CPU, so as to obtain a trained neural network model, where the final operation result is stored in a global memory of the master heterogeneous acceleration node.
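The full-connection processing applied by the local CPU amounts to a dense layer, y_j = sum_i x_i * W[i][j] + b_j. The helper below is a pure-Python illustration only; the patent does not specify layer sizes, weights, or the CPU implementation.

```python
# Minimal dense (fully connected) layer standing in for the CPU-side
# full-connection processing of the final operation result.
def fully_connected(x, weights, bias):
    """x: input vector; weights: matrix W[i][j]; bias: vector b[j]."""
    # zip(*weights) iterates over the columns of W, one per output unit.
    return [sum(xi * wij for xi, wij in zip(x, col)) + bj
            for col, bj in zip(zip(*weights), bias)]
```

With identity weights and a unit bias, `fully_connected([1, 2], [[1, 0], [0, 1]], [1, 1])` simply shifts each input by one, which makes the formula easy to check by hand.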
In this embodiment, taking application to a target behavior detection scenario as an example, the training process of the target behavior detection model is offloaded to the various heterogeneous acceleration nodes, where the target behaviors include, but are not limited to, jumping jacks (open-and-close jumps), squats, sit-ups, and the like. After the master heterogeneous acceleration node receives the training task of the target behavior detection model, the local CPU caches an image training data set constructed from a plurality of specific behavior posture samples. Because the image data in this set have different formats, the cached image training data set is migrated to the independent data heterogeneous acceleration node through the XDMA channel, so that the data heterogeneous acceleration node performs format conversion on all the received image training data to unify their pixel size, thereby obtaining an image training data set with a uniform pixel size, which is returned to the main heterogeneous acceleration node. The main heterogeneous acceleration node then determines the corresponding target acceleration mode based on the number of layers of the target behavior detection model, together with the target heterogeneous acceleration node corresponding to that mode, so that operations such as reshaping, convolution, activation, and pooling are carried out by the target heterogeneous acceleration node based on the image training data set with the uniform pixel size, and a final operation result is generated from the intermediate operation results produced during the computation.
And finally, performing full connection processing on the final operation result by a CPU of the main heterogeneous acceleration node to obtain a final trained target behavior detection model, thereby detecting the specific behavior gesture by using the trained target behavior detection model.
Therefore, the training process of the neural network model is offloaded to heterogeneous acceleration nodes, and model training efficiency is improved through their distributed cooperative work. In addition, the application rapidly completes format conversion of the training data through an independent data heterogeneous acceleration node, which reduces the workload of the main heterogeneous acceleration node and achieves reasonable allocation of resources. Furthermore, the application selects different acceleration modes according to the number of layers of the neural network model, so that the model is operated by the different target heterogeneous acceleration nodes corresponding to the different acceleration modes, better realizing the reasonable utilization of resources.
Based on the previous embodiment, the present application describes the overall process of heterogeneous acceleration of the neural network model, and next, the present application will be described in detail on how to determine the target acceleration mode, and how to perform preset layer operation in the neural network model through the target heterogeneous acceleration node corresponding to the target acceleration mode. Referring to fig. 3, an embodiment of the present application discloses a heterogeneous acceleration process of a neural network model, which is applied to a master heterogeneous acceleration node, and includes:
and S21, determining a target acceleration mode based on the number of layers of the neural network model.
And S22, if the target acceleration mode is an independent acceleration mode, determining the master heterogeneous acceleration node itself as the target heterogeneous acceleration node corresponding to the target acceleration mode.
In this embodiment, the corresponding target acceleration mode is selected according to the number of layers of the neural network model, that is, the corresponding target acceleration mode is selected according to the size of the neural network model, and if the number of layers of the neural network model is small, that is, the neural network model is a small neural network model, the target acceleration mode is an independent acceleration mode, and the target heterogeneous acceleration node corresponding to the independent acceleration mode is the master heterogeneous acceleration node itself.
Step S23, carrying out serial processing on preset layer operation in the neural network model based on the characteristic data, caching the generated intermediate operation result into a local internal memory, and generating a final operation result based on the intermediate operation result after all the preset layer operation is completed.
In this embodiment, as shown in fig. 4, after the main heterogeneous acceleration node obtains the feature data in the uniform format returned by the data heterogeneous acceleration node, it determines, based on the number of layers of the neural network model, that the target acceleration mode is the independent acceleration mode and that the corresponding target heterogeneous acceleration node is the main heterogeneous acceleration node itself. Specifically, the master heterogeneous acceleration node determines the preset layer operations of the neural network model to be executed according to the training task. For example, if the preset layers are a reshaping layer, a convolution layer, an activation layer, and a pooling layer, these layers are connected through the AXI-Stream (Advanced eXtensible Interface Stream) bus protocol to form a serial pipeline, so that the reshaping, convolution, activation, and pooling operations are executed in sequence based on the feature data; the generated intermediate operation results are stored in local memory, and the final operation result generated from them is stored in the local global memory. Finally, the CPU in the main heterogeneous acceleration node performs full-connection processing on the final operation result, thereby obtaining a trained neural network model.
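The serial pipeline of preset layer operations can be sketched as a chain of stage functions with each intermediate result cached, as below. The stage implementations are toy one-dimensional stand-ins for the FPGA kernels (the real design streams data over AXI-Stream), and all names are illustrative.

```python
# Toy 1-D stand-ins for the convolution, activation and pooling stages.
def conv3(v, k=(1.0, 0.0, -1.0)):
    """Valid 1-D convolution with a fixed 3-tap kernel (assumed)."""
    return [sum(v[i + j] * k[j] for j in range(3)) for i in range(len(v) - 2)]

def relu(v):
    return [max(0.0, x) for x in v]

def pool2(v):
    """Non-overlapping max pooling with window 2."""
    return [max(v[i], v[i + 1]) for i in range(0, len(v) - 1, 2)]

def run_pipeline(features, cache):
    """Run the stages serially, caching each intermediate operation result."""
    x = features
    for name, stage in (("conv", conv3), ("act", relu), ("pool", pool2)):
        x = stage(x)
        cache[name] = x   # stands in for the intermediate result kept in local memory
    return x              # final operation result, handed to the CPU for full connection
```

Running `run_pipeline([1, 2, 3, 4, 5, 6], cache)` leaves each stage's output in `cache`, mirroring how the design buffers intermediate results before the final full-connection step.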
In this embodiment, in addition to the independent acceleration mode, there is a hybrid acceleration mode, specifically, a target acceleration mode is determined based on the number of layers of the neural network model; and if the target acceleration mode is a hybrid acceleration mode, decoupling the neural network model training task to obtain a task schedule, determining a branch heterogeneous acceleration node corresponding to a subtask in the task schedule as a target heterogeneous acceleration node, and then migrating the characteristic data corresponding to the subtask to the target heterogeneous acceleration node so as to generate an intermediate operation result of the neural network model based on the characteristic data through the target heterogeneous acceleration node. It can be understood that, as shown in fig. 5, after the main heterogeneous acceleration node acquires the feature data in the unified format returned by the data heterogeneous acceleration node, a corresponding target acceleration mode is selected according to the number of layers of the neural network model, and if the number of layers of the neural network model is large, that is, the neural network model is a large neural network model, the target acceleration mode is a hybrid acceleration mode. 
The main heterogeneous acceleration node then decouples the neural network model training task to obtain a task schedule containing subtasks, determines the branch heterogeneous acceleration nodes corresponding to the subtasks in the task schedule from among branch heterogeneous acceleration nodes such as the A-layer, B-layer, C-layer, …, N-layer acceleration nodes, takes these branch heterogeneous acceleration nodes as the target heterogeneous acceleration nodes, and then migrates the feature data corresponding to each subtask to the corresponding target heterogeneous acceleration node, so that the target heterogeneous acceleration node generates an intermediate operation result of the neural network model based on the feature data.
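As a rough illustration of this mode selection and task decoupling, the sketch below chooses a mode from the layer count and splits a training task into a per-subtask schedule mapped to branch nodes. The threshold value, node names and layer types are hypothetical assumptions; the patent does not fix them.

```python
LAYER_THRESHOLD = 8  # assumed boundary between "small" and "large" models

def choose_mode(num_layers):
    """Pick the target acceleration mode from the model's layer count."""
    return "independent" if num_layers <= LAYER_THRESHOLD else "hybrid"

def decouple(training_task):
    """Split a training task into per-layer subtasks (the task schedule),
    each mapped to a branch heterogeneous acceleration node."""
    return [{"subtask": t,
             "node": f"{chr(ord('A') + i)}-layer acceleration node"}
            for i, t in enumerate(training_task["layer_types"])]

task = {"layer_types": ["integer", "convolution", "activation", "pooling"]}
mode = choose_mode(num_layers=12)
schedule = decouple(task) if mode == "hybrid" else []
```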
In this embodiment, for the hybrid acceleration mode, generating an intermediate operation result of the neural network model based on the feature data through the target heterogeneous acceleration node and determining a final operation result based on the intermediate operation result may include: if the number of subtasks in the task schedule is one, performing, by the target heterogeneous acceleration node and based on the feature data, the corresponding preset layer operation in the neural network model, storing the generated intermediate operation result locally, performing an iterative operation based on the intermediate operation result to generate the final operation result, and then returning the final operation result through XDMA to the CPU of the main heterogeneous acceleration node. It can be understood that, as shown in fig. 5, in one specific embodiment, if the task schedule obtained by decoupling contains only one subtask, for example a convolution task, the target heterogeneous acceleration node corresponding to the convolution task performs convolution operations based on the corresponding feature data, stores the generated intermediate operation result locally, and then iterates the convolution based on the intermediate operation result until the iteration is completed, thereby generating the final operation result. After obtaining the final operation result, the target heterogeneous acceleration node returns it through XDMA to the main heterogeneous acceleration node, where the CPU performs full-connection processing on it to obtain the trained neural network model.
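The single-subtask case can be sketched as a node-local iteration loop; the iteration count, the convolution kernel and the use of np.convolve are assumptions standing in for the node's actual preset layer operation.

```python
import numpy as np

def iterate_convolution(features, kernel, iterations=3):
    """Repeat the node's preset layer operation until iteration completes,
    keeping the intermediate result on the node the whole time."""
    intermediate = features                 # intermediate result stored locally
    for _ in range(iterations):
        intermediate = np.convolve(intermediate, kernel, mode="same")
    return intermediate                     # final result, returned via XDMA

result = iterate_convolution(np.ones(8), np.array([0.25, 0.5, 0.25]))
```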
In this embodiment, for the hybrid acceleration mode, generating an intermediate operation result of the neural network model based on the feature data through the target heterogeneous acceleration nodes and determining a final operation result based on the intermediate operation result may include: if the number of subtasks in the task schedule is multiple, performing, by the target heterogeneous acceleration node corresponding to the first subtask and based on the corresponding feature data, the corresponding preset layer operation in the neural network model to obtain a current intermediate operation result; forwarding the current intermediate operation result through XDMA to the target heterogeneous acceleration node corresponding to the next subtask, so that this node performs its corresponding preset layer operation based on the corresponding feature data and the current intermediate operation result to obtain a new current intermediate operation result; and jumping back to the forwarding step until the next subtask is the last subtask, so that a final operation result is obtained and returned through XDMA to the CPU of the main heterogeneous acceleration node. It can be understood that, as shown in fig. 5, in another specific embodiment, if the task schedule obtained by decoupling contains multiple subtasks, for example an integer task, a convolution task, an activation task and a pooling task, the target heterogeneous acceleration node corresponding to the first subtask (the integer task) performs the integer operation on the corresponding feature data to obtain the current intermediate operation result, which is sent through XDMA to the target heterogeneous acceleration node corresponding to the next subtask (the convolution task); that node performs the convolution operation based on the corresponding feature data and the current intermediate operation result to obtain a new current intermediate operation result, and so on, until the next subtask is the last subtask (the pooling task) and the final operation result is obtained. The final operation result is returned through XDMA to the main heterogeneous acceleration node, where the CPU performs full-connection processing on it to obtain the trained neural network model.
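The multi-subtask chain described above can be emulated as follows, with simple stand-in operations per subtask and a plain function call replacing the XDMA hand-off between nodes; the stage implementations are illustrative only.

```python
import numpy as np

STAGES = {  # subtask -> stand-in preset layer operation on its branch node
    "integer":     lambda x: np.round(x),
    "convolution": lambda x: np.convolve(x, [0.5, 0.5], mode="valid"),
    "activation":  lambda x: np.maximum(x, 0),
    "pooling":     lambda x: x[::2],
}

def run_chain(schedule, features):
    """Pass the intermediate result node to node along the task schedule."""
    current = features                       # intermediate forwarded via "XDMA"
    for subtask in schedule:
        current = STAGES[subtask](current)   # next node's preset layer op
    return current                           # final result back to the main CPU

final = run_chain(["integer", "convolution", "activation", "pooling"],
                  np.linspace(-1.0, 1.0, 9))
```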
It can be seen that, in this method, the corresponding target acceleration mode is selected according to the number of layers of the neural network model, so that the preset layer operations of the neural network model are executed by the different target heterogeneous acceleration nodes corresponding to the different target acceleration modes: for the independent acceleration mode, the main heterogeneous acceleration node executes the preset layer operations; for the hybrid acceleration mode, the branch heterogeneous acceleration nodes execute them. In this way the energy consumption of the main heterogeneous acceleration node can be controlled, resources are reasonably allocated and utilized, and the training efficiency of the neural network model is improved.
Referring to fig. 6, an embodiment of the present application discloses a heterogeneous acceleration device of a neural network model, which is applied to a main heterogeneous acceleration node, and includes:
the training data migration module 11 is configured to migrate training data to the data heterogeneous acceleration node after a neural network model training task is acquired, so that the data heterogeneous acceleration node performs format conversion on the training data to obtain feature data in a unified format and returns the feature data to the main heterogeneous acceleration node;
an acceleration pattern determining module 12, configured to determine a target acceleration pattern based on a number of layers of the neural network model, and determine a target heterogeneous acceleration node corresponding to the target acceleration pattern, so as to generate an intermediate operation result of the neural network model by the target heterogeneous acceleration node based on the feature data, and determine a final operation result based on the intermediate operation result;
and the result full-connection module 13 is used for carrying out full-connection processing on the final operation result by utilizing a local CPU so as to obtain a trained neural network model.
Therefore, the present application offloads the training process of the neural network model to heterogeneous acceleration nodes and improves model training efficiency through their distributed cooperative work. In addition, the present application completes the format conversion of the training data quickly through an independent data heterogeneous acceleration node, reducing the workload of the main heterogeneous acceleration node and achieving a reasonable distribution of resources. Moreover, the present application selects different acceleration modes according to the number of layers of the neural network model, so that the neural network model is operated on by the different target heterogeneous acceleration nodes corresponding to the different acceleration modes, which better achieves the reasonable utilization of resources.
In some specific embodiments, the training data migration module 11 may specifically include:
the training data migration unit is used for caching training data through a local CPU and migrating the cached training data to the data heterogeneous acceleration node through XDMA.
In some specific embodiments, the acceleration mode determining module 12 may specifically include:
the first acceleration mode determining unit is used for determining a target acceleration mode based on the number of layers of the neural network model;
and the first acceleration node determining unit is used for determining, if the target acceleration mode is the independent acceleration mode, the main heterogeneous acceleration node itself as the target heterogeneous acceleration node corresponding to the target acceleration mode.
In some specific embodiments, the acceleration mode determining module 12 may specifically include:
the operation result generation unit is used for carrying out serial processing on preset layer operation in the neural network model based on the characteristic data, caching the generated intermediate operation result into a local internal memory, and generating a final operation result based on the intermediate operation result after all the preset layer operation is completed.
In some specific embodiments, the acceleration mode determining module 12 may specifically include:
the second acceleration mode determining unit is used for determining a target acceleration mode based on the number of layers of the neural network model;
and the second acceleration node determining unit is used for decoupling the neural network model training task to obtain a task schedule if the target acceleration mode is a hybrid acceleration mode, determining a branch heterogeneous acceleration node corresponding to a subtask in the task schedule as a target heterogeneous acceleration node, and then migrating the characteristic data corresponding to the subtask to the target heterogeneous acceleration node so as to generate an intermediate operation result of the neural network model based on the characteristic data through the target heterogeneous acceleration node.
Further, an embodiment of the present application also discloses an electronic device. Fig. 7 is a block diagram of an electronic device 20 according to an exemplary embodiment, and the content of the figure should not be construed as limiting the scope of use of the present application in any way.
Fig. 7 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input output interface 25, and a communication bus 26. The memory 22 is configured to store a computer program that is loaded and executed by the processor 21 to implement relevant steps in the heterogeneous acceleration method of the neural network model disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in the present embodiment may be specifically an electronic computer.
In this embodiment, the power supply 23 is configured to provide the operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol it follows may be any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting data to the outside, and its specific interface type may be selected according to specific application requirements, which is not limited herein.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary storage or permanent storage.
The operating system 221 is used for managing and controlling the various hardware devices on the electronic device 20 and the computer program 222, and may be Windows Server, Netware, Unix, Linux, etc. In addition to the computer program that can be used to perform the heterogeneous acceleration method of the neural network model executed by the electronic device 20 disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs that can be used to perform other specific tasks.
Further, the application also discloses a computer readable storage medium for storing a computer program; wherein the computer program, when executed by the processor, implements the heterogeneous acceleration method of the neural network model disclosed previously. For specific steps of the method, reference may be made to the corresponding contents disclosed in the foregoing embodiments, and no further description is given here.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in Random Access Memory (RAM), flash memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing describes the present application in detail. Specific examples are used herein to illustrate the principles and embodiments of the present application, and the above description of the embodiments is only intended to help understand the method and its core ideas. Meanwhile, since those skilled in the art may make changes to the specific embodiments and application scope in accordance with the ideas of the present application, the content of this description should not be construed as limiting the present application.

Claims (10)

1. The heterogeneous acceleration method of the neural network model is characterized by being applied to a main heterogeneous acceleration node and comprising the following steps of:
after a training task of a target behavior detection model is acquired, training data constructed based on a plurality of behavior gesture data are migrated to a data heterogeneous acceleration node so that the data heterogeneous acceleration node performs format conversion on the training data to obtain feature data in a uniform format, and the feature data are returned to the main heterogeneous acceleration node;
determining a target acceleration mode based on the number of layers of the target behavior detection model, determining a target heterogeneous acceleration node corresponding to the target acceleration mode, generating an intermediate operation result of the target behavior detection model based on the characteristic data through the target heterogeneous acceleration node, and determining a final operation result based on the intermediate operation result;
performing full-connection processing on the final operation result by using a local CPU (Central Processing Unit) to obtain a trained target behavior detection model, so as to detect target behaviors by using the trained target behavior detection model;
the determining the target acceleration mode based on the target behavior detection model layer number comprises the following steps:
if the number of layers of the target behavior detection model is small, that is, the model is a small target behavior detection model, determining the independent acceleration mode as the target acceleration mode;
and if the number of layers of the target behavior detection model is large, that is, the model is a large target behavior detection model, determining the hybrid acceleration mode as the target acceleration mode.
2. The heterogeneous acceleration method of a neural network model according to claim 1, wherein the migrating training data constructed based on a plurality of behavior gesture data to the data heterogeneous acceleration node comprises:
and caching training data constructed based on the behavior gesture data through a local CPU, and migrating the cached training data to the data heterogeneous acceleration node through XDMA.
3. The heterogeneous acceleration method of a neural network model according to claim 1 or 2, wherein the determining a target acceleration pattern based on a target behavior detection model layer number and determining a target heterogeneous acceleration node corresponding to the target acceleration pattern includes:
determining a target acceleration mode based on the number of layers of the target behavior detection model;
and if the target acceleration mode is the independent acceleration mode, determining the main heterogeneous acceleration node itself as the target heterogeneous acceleration node corresponding to the target acceleration mode.
4. The heterogeneous acceleration method of claim 3, wherein the generating, by the target heterogeneous acceleration node, an intermediate operation result of a target behavior detection model based on the feature data, and determining a final operation result based on the intermediate operation result, comprises:
and carrying out serial processing on preset layer operation in the target behavior detection model based on the characteristic data, caching the generated intermediate operation result into a local memory, and generating a final operation result based on the intermediate operation result after all the preset layer operation is completed.
5. The heterogeneous acceleration method of a neural network model according to claim 1 or 2, wherein the determining a target acceleration pattern based on a target behavior detection model layer number and determining a target heterogeneous acceleration node corresponding to the target acceleration pattern so as to generate an intermediate operation result of a target behavior detection model based on the feature data by the target heterogeneous acceleration node includes:
determining a target acceleration mode based on the number of layers of the target behavior detection model;
and if the target acceleration mode is a hybrid acceleration mode, decoupling the training task of the target behavior detection model to obtain a task schedule, determining a branch heterogeneous acceleration node corresponding to a subtask in the task schedule as a target heterogeneous acceleration node, and then migrating the characteristic data corresponding to the subtask to the target heterogeneous acceleration node so as to generate an intermediate operation result of the target behavior detection model based on the characteristic data through the target heterogeneous acceleration node.
6. The heterogeneous acceleration method of claim 5, wherein the generating, by the target heterogeneous acceleration node, an intermediate operation result of a target behavior detection model based on the feature data, and determining a final operation result based on the intermediate operation result, comprises:
if the number of the sub-tasks in the task schedule is one, corresponding preset layer operation in a target behavior detection model is carried out through the target heterogeneous acceleration node and based on the characteristic data, the generated intermediate operation result is stored locally, iterative operation is carried out based on the intermediate operation result to generate a final operation result, and then the final operation result is returned to the CPU of the main heterogeneous acceleration node through XDMA.
7. The heterogeneous acceleration method of claim 5, wherein the generating, by the target heterogeneous acceleration node, an intermediate operation result of a target behavior detection model based on the feature data, and determining a final operation result based on the intermediate operation result, comprises:
if the number of subtasks in the task schedule is multiple, performing corresponding preset layer operation in a target behavior detection model through the target heterogeneous acceleration node corresponding to the first subtask and based on the corresponding characteristic data to obtain a current intermediate operation result; forwarding the current intermediate operation result to the target heterogeneous acceleration node corresponding to the next subtask through XDMA, so that the target heterogeneous acceleration node corresponding to the next subtask performs corresponding preset layer operation in a target behavior detection model based on the corresponding characteristic data and the current intermediate operation result to obtain a new current intermediate operation result, and then re-jumping to the step of forwarding the current intermediate operation result to the target heterogeneous acceleration node corresponding to the next subtask through XDMA until the next subtask is the last subtask to obtain a final operation result, and returning the final operation result to a CPU of the main heterogeneous acceleration node through XDMA.
8. Heterogeneous acceleration device of neural network model, characterized in that is applied to main heterogeneous acceleration node, includes:
the training data migration module is used for migrating training data constructed based on a plurality of behavior gesture data to the data heterogeneous acceleration nodes after a training task of the target behavior detection model is acquired, so that the data heterogeneous acceleration nodes convert the training data in format to obtain characteristic data in a uniform format, and returning the characteristic data to the main heterogeneous acceleration nodes;
the acceleration mode determining module is used for determining a target acceleration mode based on the number of layers of the target behavior detection model, determining a target heterogeneous acceleration node corresponding to the target acceleration mode, generating an intermediate operation result of the target behavior detection model based on the characteristic data through the target heterogeneous acceleration node, and determining a final operation result based on the intermediate operation result;
the result full-connection module is used for carrying out full-connection processing on the final operation result by utilizing a local CPU so as to obtain a trained target behavior detection model, so that the target behavior is detected by utilizing the trained target behavior detection model;
the acceleration mode determining module is specifically configured to: if the number of layers of the target behavior detection model is small, that is, the model is a small target behavior detection model, determine the independent acceleration mode as the target acceleration mode; and if the number of layers of the target behavior detection model is large, that is, the model is a large target behavior detection model, determine the hybrid acceleration mode as the target acceleration mode.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the heterogeneous acceleration method of a neural network model according to any one of claims 1 to 7.
10. A computer readable storage medium for storing a computer program which, when executed by a processor, implements a heterogeneous acceleration method of a neural network model according to any one of claims 1 to 7.
CN202310722520.2A 2023-06-19 2023-06-19 Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model Active CN116451757B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310722520.2A CN116451757B (en) 2023-06-19 2023-06-19 Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model

Publications (2)

Publication Number Publication Date
CN116451757A CN116451757A (en) 2023-07-18
CN116451757B true CN116451757B (en) 2023-09-08

Family

ID=87127761


Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110097186A (en) * 2019-04-29 2019-08-06 济南浪潮高新科技投资发展有限公司 A kind of neural network isomery quantization training method
EP3543917A1 (en) * 2018-03-19 2019-09-25 SRI International Inc. Dynamic adaptation of deep neural networks
CN112116084A (en) * 2020-09-15 2020-12-22 中国科学技术大学 Convolution neural network hardware accelerator capable of solidifying full network layer on reconfigurable platform
CN112988229A (en) * 2019-12-12 2021-06-18 上海大学 Convolutional neural network resource optimization configuration method based on heterogeneous computation
CN113313243A (en) * 2021-06-11 2021-08-27 海宁奕斯伟集成电路设计有限公司 Method, device and equipment for determining neural network accelerator and storage medium
CN113688734A (en) * 2021-08-25 2021-11-23 燕山大学 Old man falling detection method based on FPGA heterogeneous acceleration
CN114064278A (en) * 2021-11-18 2022-02-18 深圳致星科技有限公司 Heterogeneous acceleration engine and method for federal learning
CN114662661A (en) * 2022-03-22 2022-06-24 东南大学 Method for accelerating multi-outlet DNN reasoning of heterogeneous processor under edge calculation
CN114742225A (en) * 2022-04-07 2022-07-12 中国科学院合肥物质科学研究院 Neural network reasoning acceleration method based on heterogeneous platform
WO2022235251A1 (en) * 2021-05-03 2022-11-10 Google Llc Generating and globally tuning application-specific machine learning accelerators
CN115731441A (en) * 2022-11-29 2023-03-03 浙江大学 Target detection and attitude estimation method based on data cross-modal transfer learning
CN115983359A (en) * 2023-02-03 2023-04-18 展讯通信(上海)有限公司 Heterogeneous computing scheduling method and device and computer readable storage medium
CN116150191A (en) * 2023-02-22 2023-05-23 上海威固信息技术股份有限公司 Data operation acceleration method and system for cloud data architecture
WO2023103301A1 (en) * 2021-12-09 2023-06-15 苏州浪潮智能科技有限公司 Distributed heterogeneous acceleration platform communication method and system, and device and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11222256B2 (en) * 2017-10-17 2022-01-11 Xilinx, Inc. Neural network processing system having multiple processors and a neural network accelerator
CN109976903B (en) * 2019-02-22 2021-06-29 华中科技大学 Deep learning heterogeneous computing method and system based on layer width memory allocation
US20210390460A1 (en) * 2021-06-12 2021-12-16 Intel Corporation Compute and memory based artificial intelligence model partitioning using intermediate representation
US20230088676A1 (en) * 2021-09-20 2023-03-23 International Business Machines Corporation Graph neural network (gnn) training using meta-path neighbor sampling and contrastive learning

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3543917A1 (en) * 2018-03-19 2019-09-25 SRI International Inc. Dynamic adaptation of deep neural networks
CN110097186A (en) * 2019-04-29 2019-08-06 济南浪潮高新科技投资发展有限公司 A kind of neural network isomery quantization training method
CN112988229A (en) * 2019-12-12 2021-06-18 上海大学 Convolutional neural network resource optimization configuration method based on heterogeneous computation
CN112116084A (en) * 2020-09-15 2020-12-22 中国科学技术大学 Convolution neural network hardware accelerator capable of solidifying full network layer on reconfigurable platform
WO2022235251A1 (en) * 2021-05-03 2022-11-10 Google Llc Generating and globally tuning application-specific machine learning accelerators
CN113313243A (en) * 2021-06-11 2021-08-27 海宁奕斯伟集成电路设计有限公司 Method, device and equipment for determining neural network accelerator and storage medium
CN113688734A (en) * 2021-08-25 2021-11-23 燕山大学 Old man falling detection method based on FPGA heterogeneous acceleration
CN114064278A (en) * 2021-11-18 2022-02-18 深圳致星科技有限公司 Heterogeneous acceleration engine and method for federal learning
WO2023103301A1 (en) * 2021-12-09 2023-06-15 苏州浪潮智能科技有限公司 Distributed heterogeneous acceleration platform communication method and system, and device and medium
CN114662661A (en) * 2022-03-22 2022-06-24 东南大学 Method for accelerating multi-outlet DNN reasoning of heterogeneous processor under edge calculation
CN114742225A (en) * 2022-04-07 2022-07-12 中国科学院合肥物质科学研究院 Neural network reasoning acceleration method based on heterogeneous platform
CN115731441A (en) * 2022-11-29 2023-03-03 浙江大学 Target detection and pose estimation method based on cross-modal transfer learning
CN115983359A (en) * 2023-02-03 2023-04-18 展讯通信(上海)有限公司 Heterogeneous computing scheduling method and device and computer readable storage medium
CN116150191A (en) * 2023-02-22 2023-05-23 上海威固信息技术股份有限公司 Data operation acceleration method and system for cloud data architecture

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on convolutional neural network algorithms combined with transfer learning models; Qiu Ningjia; Wang Xiaoxia; Wang Peng; Zhou Sicheng; Wang Yanchun; Computer Engineering and Applications (No. 05); full text *

Also Published As

Publication number Publication date
CN116451757A (en) 2023-07-18

Similar Documents

Publication Publication Date Title
Chen et al. DNNOff: offloading DNN-based intelligent IoT applications in mobile edge computing
Li et al. DeepNFV: A lightweight framework for intelligent edge network functions virtualization
CN108122032A Neural network model training method, apparatus, chip and system
CN111522657B (en) Distributed equipment collaborative deep learning reasoning method
KR20210036226A (en) A distributed computing system including multiple edges and cloud, and method for providing model for using adaptive intelligence thereof
CN110069341A Dependency-aware task scheduling method with on-demand function configuration in edge computing
CN111443990A Edge computing task migration simulation system
CN108111335A Method and system for scheduling and linking virtual network functions
CN113592066A (en) Hardware acceleration method, apparatus, device, computer program product and storage medium
CN109657794A Instruction-queue-based distributed deep neural network performance modeling method
CN114546608A Task scheduling method based on edge computing
CN115586995A (en) Method and system for predicting maximum load of load machine, computer equipment and medium
CN115033359A (en) Internet of things agent multi-task scheduling method and system based on time delay control
Qiao et al. Analysis of Evolutionary Model of DIKW Based on Cloud Resource Allocation Management
Xu et al. A meta reinforcement learning-based virtual machine placement algorithm in mobile edge computing
CN116451757B (en) Heterogeneous acceleration method, heterogeneous acceleration device, heterogeneous acceleration equipment and heterogeneous acceleration medium for neural network model
CN116582407A (en) Containerized micro-service arrangement system and method based on deep reinforcement learning
Jiang et al. Hierarchical deployment of deep neural networks based on fog computing inferred acceleration model
CN110012021B Adaptive computation offloading method in mobile edge computing
CN113222134B (en) Brain-like computing system, method and computer readable storage medium
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
Ahn et al. Scissionlite: Accelerating distributed deep neural networks using transfer layer
Cui et al. Resource-Efficient DNN Training and Inference for Heterogeneous Edge Intelligence in 6G
CN111527734B (en) Node traffic ratio prediction method and device
Sędziwy On acceleration of multi-agent system performance in large scale photometric computations

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant