CN110929857A - Data processing method and device of neural network

Data processing method and device of neural network

Info

Publication number
CN110929857A
Authority
CN
China
Prior art keywords
channel
external processor
current channel
register
convolution processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811100682.8A
Other languages
Chinese (zh)
Other versions
CN110929857B (en)
Inventor
翟云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Jun Zheng Science And Technology Ltd
Original Assignee
Hefei Jun Zheng Science And Technology Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Jun Zheng Science And Technology Ltd filed Critical Hefei Jun Zheng Science And Technology Ltd
Priority to CN201811100682.8A priority Critical patent/CN110929857B/en
Publication of CN110929857A publication Critical patent/CN110929857A/en
Application granted granted Critical
Publication of CN110929857B publication Critical patent/CN110929857B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Advance Control (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a data processing method and device for a neural network, wherein the method comprises the following steps: a neural network processor performs convolution processing on the data of the current channel to obtain a convolution processing result of the current channel; the convolution processing result of the current channel is sent to an external processor, which performs a predetermined operation on it while the neural network processor performs convolution processing on the next channel; the predetermined operation result of the external processor for the current channel is then acquired and pooling processing is performed on it. This scheme avoids the low processing efficiency caused by triggering the next layer only after the data of all channels has been processed, and achieves the technical effect of effectively improving processing efficiency.

Description

Data processing method and device of neural network
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a data processing method and device of a neural network.
Background
Neural networks have been a research hotspot in the field of artificial intelligence since the 1980s. They are formed by abstracting the neuronal networks of the human brain from an information-processing perspective, establishing simple models, and then composing different networks according to different connection modes. In engineering and academia they are often referred to directly as neural networks or neural-like networks.
A neural network is a computational model formed by interconnecting a large number of nodes (neurons). Each node applies a particular output function, called the excitation function. Each connection between two nodes carries a weighted value, called a weight, applied to the signal passing through the connection; this is equivalent to the memory of an artificial neural network. The output of the network varies with its connection mode, weight values, and excitation functions. The network itself is usually an approximation of some algorithm or function in nature, or an expression of a logical strategy.
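In standard notation (a conventional formulation added here for clarity; the symbols are not taken from the patent text), the output y of a single node with inputs x_i, connection weights w_i, bias b, and excitation function f is:

    % conventional neuron model; the notation is an assumption, not from the patent
    y = f\Bigl(\sum_{i} w_i x_i + b\Bigr)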
Because the computation of a neural network is enormous, an NPU (Neural-network Processing Unit, i.e., a neural network processor or neural network acceleration engine) often needs dedicated digital logic circuits for acceleration. General-purpose processors such as CPUs, GPUs, and DSPs can run neural networks, but given the huge amount of computation their performance and power efficiency are poor, so a dedicated neural network accelerator is generally chosen for acceleration at the inference stage.
Although neural networks vary in shape, their computation is relatively regular and well suited to ASIC acceleration using coarse-grained instructions, for example: convolution, pooling, and fully connected operations.
In practice, however, convolution, pooling, and full connection alone are not enough; sometimes other calculations are needed, and new operation types appear as algorithms evolve. An accelerator that relies only on a limited set of fixed functions then has difficulty covering all cases, and its processing capability must be extended appropriately (for example, operations that cannot be supported are handed over to a CPU for processing). Because this requires data interaction with other processing resources, the interaction cost, the efficiency of data processing, and the like need to be considered.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a data processing method and device for a neural network, so as to achieve the technical effect of improving processing efficiency.
In one aspect, a data processing method of a neural network is provided, including:
the neural network processor performs convolution processing on the data of the current channel to obtain a convolution processing result of the current channel;
sending the convolution processing result of the current channel to an external processor, performing a predetermined operation on the convolution processing result of the current channel through the external processor, and at the same time performing convolution processing on the next channel after the current channel;
and acquiring a predetermined operation result of the external processor for the current channel, and performing pooling processing on the predetermined operation result of the current channel.
In one embodiment, sending the convolution processing result of the current channel to an external processor comprises:
writing the channel identifier of the current channel into a first register;
and when a change of the channel identifier in the first register is detected, triggering sending of the convolution processing result corresponding to the changed channel identifier to the external processor.
In one embodiment, triggering sending of a convolution processing result corresponding to the changed channel identifier to the external processor includes:
the neural network processor sends an interrupt signal to the external processor;
the external processor reads the channel identifier of the current channel from the first register in response to the interrupt signal;
and the external processor acquires the convolution processing result of the current channel according to the channel identifier of the current channel.
In one embodiment, obtaining a result of a predetermined operation of the external processor on the current channel includes:
detecting whether the channel identifier in a second register changes, wherein the channel identifier in the second register is written into the second register by the external processor after it performs the predetermined operation on the current channel;
and when it is determined that the channel identifier in the second register has changed, acquiring the predetermined operation result corresponding to that channel identifier from the external processor.
In one embodiment, the external processor is at least one of: CPU, GPU.
In another aspect, a data processing apparatus of a neural network is provided, which is located in a neural network processor, and includes:
the processing module is used for carrying out convolution processing on the data of the current channel to obtain a convolution processing result of the current channel;
the sending module is used for sending the convolution processing result of the current channel to an external processor, performing a predetermined operation on the convolution processing result of the current channel through the external processor, and at the same time performing convolution processing on the next channel after the current channel;
and the acquisition module is used for acquiring the predetermined operation result of the external processor for the current channel and performing pooling processing on the predetermined operation result of the current channel.
In one embodiment, the sending module comprises:
a writing unit for writing the channel identifier of the current channel into a first register;
and a triggering unit for triggering sending of the convolution processing result corresponding to the changed channel identifier to the external processor when a change of the channel identifier in the first register is detected.
In an embodiment, the triggering unit is specifically configured to send an interrupt signal to the external processor; the external processor reads the channel identifier of the current channel from the first register in response to the interrupt signal and acquires the convolution processing result of the current channel according to that channel identifier.
In one embodiment, the obtaining module comprises:
a detection unit for detecting whether the channel identifier in the second register changes, wherein the channel identifier in the second register is written into the second register by the external processor after it performs the predetermined operation on the current channel;
and an acquisition unit for acquiring the predetermined operation result corresponding to the channel identifier from the external processor when it is determined that the channel identifier in the second register has changed.
In one embodiment, the external processor is at least one of: CPU, GPU.
In the above example, after the neural network processor performs convolution processing on the data of the current channel and obtains the convolution processing result of that channel, the result is immediately provided to the external processor for processing rather than after all channels have been processed. This avoids the low processing efficiency caused by triggering the next layer only after the data of all channels has been processed, and achieves the technical effect of effectively improving processing efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the principles of the invention. In the drawings:
FIG. 1 is a timing diagram of a prior art neural network process;
FIG. 2 is a neural network processing timing diagram according to the present application;
FIG. 3 is an architectural diagram of a neural network system according to an embodiment of the present application;
FIG. 4 is a flow chart of a data processing method adapted to a neural network according to an embodiment of the present application;
fig. 5 is a block diagram of a data processing apparatus adapted to a neural network according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the following embodiments and accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
Some calculations in an existing neural network system cannot be completed by the NPU and require other processors, for example a CPU or GPU. There is therefore interaction between the NPU and other processors, and this data interaction raises problems of interaction cost and data processing efficiency.
Specifically, existing data interaction has the following problem. Consider a neural network in which one layer is a convolution (CONV1), the next layer performs a point-by-point negation (NEG1) of the feature maps generated by CONV1, and the layer after that performs pooling (POOL1). Assume in this example that CONV1, NEG1, and POOL1 each have N channels, and that the current NPU does not support the negation operation, so the negation must be sent to the CPU, which performs it.
Because of the data dependency, the NPU cannot compute CONV1 and POOL1 at the same time. In the existing processing manner, shown in fig. 1, the predetermined operation of NEG1 is performed only after all N channels of CONV1 have been processed, and the pooling operation is performed only after all N channels of NEG1 have been processed.
However, with CONV1 having N channels, it is clearly unnecessary to wait until all N channels have been computed before the CPU starts the NEG1 calculation. Similarly, the NPU need not wait until the CPU has completed the NEG calculation for all N channels before starting the POOL calculation. The idealized latency model below makes the benefit concrete.
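As a rough illustration, consider an idealized per-channel latency model (an assumption added here; the patent's figures show the two schedules only qualitatively and give no formulas):

    % t_c: CONV1 time per channel on the NPU
    % t_n: NEG1 time per channel on the CPU
    % t_p: POOL1 time per channel on the NPU
    % Serial scheme of fig. 1: each stage finishes all N channels before the next begins.
    T_{\mathrm{serial}} = N\,(t_c + t_n + t_p)
    % Channel-level pipeline of fig. 2: the CPU's NEG work overlaps the NPU's CONV and
    % POOL work, so, ignoring interaction overhead and pipeline fill/drain terms,
    T_{\mathrm{pipelined}} \approx N \cdot \max(t_c + t_p,\; t_n)

Whenever t_n is comparable to the NPU's per-channel work, the pipelined schedule hides most of the CPU time.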
Therefore, this example proposes an interaction mechanism between the NPU and the host CPU to improve their interaction performance, thereby improving the performance of the entire system.
Specifically, in this example, as shown in fig. 2, an sbox is designed in the NPU, and a task-in ID register and a task-out ID register are designed in the sbox.
When the NPU completes the convolution (CONV1) of one channel, it updates the task-out ID register with the ID number of that channel. When the sbox detects that a new task-out ID has been written, it sends an interrupt to the host CPU (for example via irq). After receiving the interrupt, the host CPU reads the task-out ID from the sbox, completes the NEG calculation for the corresponding channel, and after that writes the ID number of the channel into the task-in ID register in the sbox (for example via sbox_rw in fig. 2). When the sbox detects that a new task-in ID has been written, the POOL calculation of the corresponding channel is performed.
This forms the processing flow shown in fig. 3. Clearly, this processing manner produces a channel-level task pipeline effect as a whole, so that execution is accelerated. A software sketch of the handshake follows.
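A minimal single-file sketch of this handshake (an illustrative software model, not the patent's hardware: the registers and the irq line are modelled with atomic variables and polling, and all names such as task_out_id, task_in_id, conv, neg, and pool are placeholders assumed for illustration):

    /* NPU thread (main): CONV a channel, write the task-out ID (modelling the
     * irq to the host CPU), and POOL a channel once its task-in ID appears.
     * CPU thread: wait for a new task-out ID, run NEG, write the task-in ID. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define N_CHANNELS 4

    static atomic_int task_out_id = -1;  /* sbox register: CONV1 done for channel */
    static atomic_int task_in_id  = -1;  /* sbox register: NEG1 done for channel  */

    static void conv(int ch) { printf("NPU : CONV1 channel %d\n", ch); }
    static void neg (int ch) { printf("CPU : NEG1  channel %d\n", ch); }
    static void pool(int ch) { printf("NPU : POOL1 channel %d\n", ch); }

    static void *cpu_main(void *arg) {
        (void)arg;
        for (int ch = 0; ch < N_CHANNELS; ch++) {
            while (atomic_load(&task_out_id) != ch)
                ;                             /* wait for the "interrupt"          */
            neg(ch);                          /* NEG of the channel just convolved */
            atomic_store(&task_in_id, ch);    /* write task-in ID back to the sbox */
        }
        return NULL;
    }

    int main(void) {
        pthread_t cpu;
        pthread_create(&cpu, NULL, cpu_main, NULL);

        conv(0);
        atomic_store(&task_out_id, 0);        /* sbox raises irq for channel 0     */
        for (int ch = 1; ch <= N_CHANNELS; ch++) {
            if (ch < N_CHANNELS)
                conv(ch);                     /* overlaps the CPU's NEG of ch - 1  */
            while (atomic_load(&task_in_id) != ch - 1)
                ;                             /* sbox: wait for new task-in ID     */
            pool(ch - 1);
            if (ch < N_CHANNELS)
                atomic_store(&task_out_id, ch);
        }
        pthread_join(cpu, NULL);
        return 0;
    }

Compiled with -pthread and run, it shows conv of channel ch being issued while the CPU thread is still working on neg of channel ch-1, which is exactly the channel-level overlap of fig. 2.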
Neural network technology can be applied to fields including, but not limited to, pattern recognition, intelligent robotics, automatic control, prediction and estimation, biology, medicine, and economics.
The above is explained with a specific example; in an actual implementation, the external processor need not be a CPU, and the operations need not be pooling, convolution, or the like.
Based on this, in this example, a method for the cooperative work of a neural network processor is provided. As shown in fig. 4, the method may include the following steps:
Step 1: processing data of a first channel through a first network layer of a neural network to obtain a first processing result of the first channel, wherein the first network layer is provided with a plurality of channels;
Step 2: immediately providing the first processing result of the first channel of the first network layer to an external processor, so that the external processor performs processing according to that result to obtain a second processing result of the first channel;
Step 3: acquiring the second processing result of the first channel, and processing it through a second network layer of the neural network to obtain a third processing result of the first channel.
That is, when interaction with an external processor is needed, as soon as the first network layer finishes processing the data of one channel, that data is provided to the external processor, rather than only after the data of all channels has been processed. Likewise, as soon as the external processor finishes processing the channel, the result is provided to the second network layer, rather than only after all channels are done. This avoids the low processing efficiency caused by having to wait for the data of all channels before triggering the next layer, and achieves the technical effect of effectively improving processing efficiency.
Specifically, a first register and a second register may be set in the NPU. When the first network layer finishes processing the data of the current channel, the ID of that channel is written into the first register, from which the external processor knows that processing of the channel's data can be triggered. After the external processor finishes, it writes the channel ID into the second register, informing the NPU that the next network layer can process that channel's data. This forms a channel-level pipelining effect.
In this example, as shown in fig. 4, a data processing method of a neural network is provided, which includes the following steps:
Step 401: the neural network processor performs convolution processing on the data of the current channel to obtain a convolution processing result of the current channel;
Step 402: sending the convolution processing result of the current channel to an external processor, performing a predetermined operation on the convolution processing result of the current channel through the external processor, and at the same time performing convolution processing on the next channel after the current channel;
The predetermined operation may be a negation operation or another operation. Which operation is used can be selected according to actual needs; in an actual implementation, the operation type may be determined by the operations actually required in the neural network, which is not limited in this application.
Step 403: acquiring the predetermined operation result of the external processor for the current channel, and performing pooling processing on the predetermined operation result of the current channel.
In the above example, after the neural network processor performs convolution processing on the data of the current channel and obtains the convolution processing result of that channel, the result is immediately provided to the external processor for processing rather than after all channels have been processed. This avoids the low processing efficiency caused by triggering the next layer only after the data of all channels has been processed, and achieves the technical effect of effectively improving processing efficiency.
In one embodiment, a trigger controller (e.g., the sbox) may be provided, which contains the first register and the second register and monitors their data status in real time.
Accordingly, sending the convolution processing result of the current channel to the external processor may include:
S1: writing the channel identifier of the current channel into a first register;
S2: when a change of the channel identifier in the first register is detected, triggering sending of the convolution processing result corresponding to the changed channel identifier to the external processor.
Obtaining the result of the predetermined operation of the external processor on the current channel may include:
s1: detecting whether a channel identifier in a second register changes, wherein the channel identifier in the second register is written into the second register by an external processor after a predetermined operation on a current channel;
s2: and under the condition that the channel identification in the second register is determined to be changed, acquiring a predetermined operation result corresponding to the channel identification from an external processor.
Triggering sending of the convolution processing result corresponding to the changed channel identifier to the external processor may include: the neural network processor sends an interrupt signal to the external processor; the external processor reads the channel identifier of the current channel from the first register in response to the interrupt signal; and the external processor acquires the convolution processing result of the current channel according to that channel identifier. A driver-style sketch of this flow follows.
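On the external-processor side, this interrupt flow could look like the following Linux-driver-style sketch (a hedged illustration: the register offsets, the MMIO base, and the helper neg_channel are assumptions made for this example; the patent specifies only the protocol, not any particular driver API):

    #include <linux/interrupt.h>
    #include <linux/io.h>

    #define SBOX_TASK_OUT_ID 0x00            /* assumed register offsets */
    #define SBOX_TASK_IN_ID  0x04

    static void __iomem *sbox_base;          /* ioremap()ed sbox register block */

    static void neg_channel(int ch)
    {
        (void)ch;  /* point-by-point negation of channel ch's feature map goes here */
    }

    /* Runs when the NPU's sbox raises the interrupt after CONV of a channel. */
    static irqreturn_t sbox_irq_handler(int irq, void *dev_id)
    {
        int ch = readl(sbox_base + SBOX_TASK_OUT_ID); /* channel that finished CONV */
        neg_channel(ch);                     /* a real driver would defer this work */
        writel(ch, sbox_base + SBOX_TASK_IN_ID);      /* lets the sbox trigger POOL */
        return IRQ_HANDLED;
    }

    /* Registered once, e.g. in the driver's probe():
     *   request_irq(irq, sbox_irq_handler, 0, "npu-sbox", NULL);
     */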
The external processor may be, but is not limited to, at least one of: CPU, GPU.
Based on the same inventive concept, an embodiment of the present invention further provides a data processing apparatus of a neural network, as described in the following embodiments. Since the principle by which the apparatus solves the problem is similar to that of the data processing method, the implementation of the apparatus may refer to the implementation of the method, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. Fig. 5 is a block diagram of a data processing apparatus of a neural network according to an embodiment of the present invention. As shown in fig. 5, the apparatus may be located in a neural network processor and may include: a processing module 501, a sending module 502, and an obtaining module 503, which are described below.
The processing module 501 is configured to perform convolution processing on data of a current channel to obtain a convolution processing result of the current channel;
a sending module 502, configured to send the convolution processing result of the current channel to an external processor, perform a predetermined operation on the convolution processing result of the current channel through the external processor, and at the same time perform convolution processing on the next channel after the current channel;
an obtaining module 503, configured to obtain a predetermined operation result of the external processor on the current channel, and perform pooling processing on the predetermined operation result of the current channel.
In one embodiment, the sending module 502 may include: a writing unit for writing the channel identifier of the current channel into a first register; and a triggering unit for triggering sending of the convolution processing result corresponding to the changed channel identifier to the external processor when a change of the channel identifier in the first register is detected.
In an embodiment, the triggering unit may be specifically configured to send an interrupt signal to the external processor; the external processor reads the channel identifier of the current channel from the first register in response to the interrupt signal and acquires the convolution processing result of the current channel according to that channel identifier.
In one embodiment, the obtaining module 503 may include: a detection unit for detecting whether the channel identifier in the second register changes, wherein the channel identifier in the second register is written into the second register by the external processor after it performs the predetermined operation on the current channel; and an acquisition unit for acquiring the predetermined operation result corresponding to the channel identifier from the external processor when it is determined that the channel identifier in the second register has changed.
In one embodiment, the external processor may include, but is not limited to, at least one of: CPU, GPU.
In another embodiment, software is provided for executing the technical solutions described in the above embodiments and preferred embodiments.
In another embodiment, a storage medium is provided in which the above software is stored. The storage medium includes, but is not limited to: optical disks, floppy disks, hard disks, erasable memory, etc.
From the above description, it can be seen that the embodiments of the present invention achieve the following technical effects: after the neural network processor performs convolution processing on the data of the current channel and obtains the convolution processing result of that channel, the result is immediately provided to the external processor for processing rather than after all channels have been processed. This avoids the low processing efficiency caused by triggering the next layer only after the data of all channels has been processed, and achieves the technical effect of effectively improving processing efficiency.
In this specification, adjectives such as first and second may be used only to distinguish one element or action from another, without necessarily requiring or implying any actual such relationship or order. Where the context permits, a reference to an element, component, or step should not be construed as limited to only one of them, but may refer to one or more of them.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the invention described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device, and in some cases the steps shown or described may be performed in an order different from that described herein. They may also be separately fabricated as individual integrated circuit modules, or multiple of them may be fabricated as a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.
The above description concerns only preferred embodiments of the present invention and is not intended to limit the invention; those skilled in the art may make various modifications and changes to the embodiments. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A data processing method for a neural network, comprising:
the neural network processor performs convolution processing on the data of the current channel to obtain a convolution processing result of the current channel;
sending the convolution processing result of the current channel to an external processor, performing a predetermined operation on the convolution processing result of the current channel through the external processor, and at the same time performing convolution processing on the next channel after the current channel;
and acquiring a predetermined operation result of the external processor for the current channel, and performing pooling processing on the predetermined operation result of the current channel.
2. The method of claim 1, wherein sending the convolution processing result of the current channel to an external processor comprises:
writing the channel identifier of the current channel into a first register;
and when a change of the channel identifier in the first register is detected, triggering sending of the convolution processing result corresponding to the changed channel identifier to the external processor.
3. The method of claim 2, wherein triggering sending of the convolution processing result corresponding to the changed channel identifier to the external processor comprises:
the neural network processor sends an interrupt signal to the external processor;
the external processor reads the channel identifier of the current channel from the first register in response to the interrupt signal;
and the external processor acquires the convolution processing result of the current channel according to the channel identifier of the current channel.
4. The method of claim 1, wherein obtaining the result of the predetermined operation of the external processor on the current channel comprises:
detecting whether the channel identifier in a second register changes, wherein the channel identifier in the second register is written into the second register by the external processor after it performs the predetermined operation on the current channel;
and when it is determined that the channel identifier in the second register has changed, acquiring the predetermined operation result corresponding to that channel identifier from the external processor.
5. The method of any one of claims 1 to 4, wherein the external processor is at least one of: CPU, GPU.
6. A data processing apparatus of a neural network, located in a neural network processor, comprising:
the processing module is used for carrying out convolution processing on the data of the current channel to obtain a convolution processing result of the current channel;
the sending module is used for sending the convolution processing result of the current channel to an external processor, performing a predetermined operation on the convolution processing result of the current channel through the external processor, and at the same time performing convolution processing on the next channel after the current channel;
and the acquisition module is used for acquiring the predetermined operation result of the external processor for the current channel and performing pooling processing on the predetermined operation result of the current channel.
7. The apparatus of claim 6, wherein the sending module comprises:
a writing unit for writing the channel identifier of the current channel into a first register;
and a triggering unit for triggering sending of the convolution processing result corresponding to the changed channel identifier to the external processor when a change of the channel identifier in the first register is detected.
8. The apparatus according to claim 7, wherein the triggering unit is specifically configured to send an interrupt signal to the external processor; the external processor reads the channel identifier of the current channel from the first register in response to the interrupt signal and acquires the convolution processing result of the current channel according to that channel identifier.
9. The apparatus of claim 6, wherein the obtaining module comprises:
a detection unit for detecting whether the channel identifier in the second register changes, wherein the channel identifier in the second register is written into the second register by the external processor after it performs the predetermined operation on the current channel;
and an acquisition unit for acquiring the predetermined operation result corresponding to the channel identifier from the external processor when it is determined that the channel identifier in the second register has changed.
10. The apparatus of any one of claims 6 to 9, wherein the external processor is at least one of: CPU, GPU.
CN201811100682.8A 2018-09-20 2018-09-20 Data processing method and device of neural network Active CN110929857B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811100682.8A CN110929857B (en) 2018-09-20 2018-09-20 Data processing method and device of neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811100682.8A CN110929857B (en) 2018-09-20 2018-09-20 Data processing method and device of neural network

Publications (2)

Publication Number Publication Date
CN110929857A true CN110929857A (en) 2020-03-27
CN110929857B CN110929857B (en) 2023-04-18

Family

ID=69856310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811100682.8A Active CN110929857B (en) 2018-09-20 2018-09-20 Data processing method and device of neural network

Country Status (1)

Country Link
CN (1) CN110929857B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090165019A1 (en) * 2007-12-21 2009-06-25 Mediatek Inc. Data Flow Control
US20140316729A1 (en) * 2013-04-22 2014-10-23 Cypress Semiconductor Corporation Hardware de-convolution block for multi-phase scanning
CN107679621A (en) * 2017-04-19 2018-02-09 北京深鉴科技有限公司 Artificial neural network processing unit

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
初建朋; 李小进; 赖宗声: "Design and implementation of an embedded processor dedicated to wireless communication"
杨一晨; 梁峰; 张国和; 何平; 吴斌; 高震霆: "A convolutional neural network coprocessor design based on programmable logic devices"

Also Published As

Publication number Publication date
CN110929857B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
US20200249998A1 (en) Scheduling computation graph heterogeneous computer system
CN109376861B (en) Apparatus and method for performing full connectivity layer neural network training
US10872290B2 (en) Neural network processor with direct memory access and hardware acceleration circuits
CN109375951B (en) Device and method for executing forward operation of full-connection layer neural network
CN113537480B (en) Apparatus and method for performing LSTM neural network operation
CN111860812A (en) Apparatus and method for performing convolutional neural network training
JP7451614B2 (en) On-chip computational network
US11609792B2 (en) Maximizing resource utilization of neural network computing system
US11694075B2 (en) Partitioning control dependency edge in computation graph
CN108171328B (en) Neural network processor and convolution operation method executed by same
US11556756B2 (en) Computation graph mapping in heterogeneous computer system
CN110929856B (en) NPU and main CPU data interaction method and device
KR102407220B1 (en) Artificial intelligence chip and instruction execution method for artificial intelligence chip
EP3836030A1 (en) Method and apparatus with model optimization, and accelerator system
CN108470211B (en) Method and device for realizing convolution calculation and computer storage medium
US20190286971A1 (en) Reconfigurable prediction engine for general processor counting
US11941528B2 (en) Neural network training in a distributed system
US11275661B1 (en) Test generation of a distributed system
US20210256373A1 (en) Method and apparatus with accelerator
CN110929857B (en) Data processing method and device of neural network
US11113140B2 (en) Detecting error in executing computation graph on heterogeneous computing devices
CN110928675B (en) Method and device suitable for neural network processor cooperative work
CN110929855B (en) Data interaction method and device
CN113326137B (en) Deep learning calculation method, device, chip and medium
US11443163B2 (en) Method and system for executing neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant