CN111523652A - Processor, data processing method thereof and camera device - Google Patents


Info

Publication number
CN111523652A
Authority
CN
China
Prior art keywords
data
processor
neural network
processed
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910105542.8A
Other languages
Chinese (zh)
Other versions
CN111523652B (en)
Inventor
林伟
张健松
夏立雪
刁岚松
叶长安
窦顺利
孙猛
蒋昭
赵永科
梁昊
陈凯
丁力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910105542.8A priority Critical patent/CN111523652B/en
Publication of CN111523652A publication Critical patent/CN111523652A/en
Application granted granted Critical
Publication of CN111523652B publication Critical patent/CN111523652B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a processor, a data processing method thereof and a camera device. The processor comprises: a caching device, configured to acquire data to be processed, neural network parameters and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and the caching device and the convolution operation device are configured according to the operation mode; and a convolution operation device, in communication connection with the caching device and configured to operate on the data to be processed according to the neural network parameters using the operation mode. The invention solves the technical problem of poor universality of neural network model processing devices in the prior art.

Description

Processor, data processing method thereof and camera device
Technical Field
The invention relates to the field of processors, in particular to a processor, a data processing method thereof and a camera device.
Background
At present, convolutional neural network acceleration schemes based on FPGAs and ASICs, together with their hardware structure designs and performance optimization methods, are mainly oriented to embedded scenarios for specific applications, such as intelligent cameras, and most of these schemes remain at the level of fully customized design for a specific network structure. On the one hand, such schemes cannot support some emerging operators used in actual services; on the other hand, they cannot support simultaneous computation of several convolutional neural networks with different structures within the same scheme, so their universality is limited.
On the premise of ensuring universality, how to improve computing performance with the given hardware resources becomes the key question. In the extreme case, if a dedicated hardware computing module and control flow are reserved for every network connection relation and every operator, a large number of modules sit idle during actual operation and hardware resources are wasted; if, instead, all operators are decomposed into the most basic operations, the design degenerates into a CPU or GPU. Therefore, how to fully support all operations with a small number of high-level operators, and how to design the corresponding hardware computing modules, cache modules and control modules to complete those operations, is difficult to achieve at present.
For the problem of poor universality of neural network model processing devices in the prior art, no effective solution has been proposed so far.
Disclosure of Invention
The embodiments of the invention provide a processor, a data processing method thereof and a camera device, so as to at least solve the technical problem that neural network model processing devices in the prior art have poor universality.
According to an aspect of the embodiments of the present invention, a processor is provided, including: a caching device, configured to acquire data to be processed, neural network parameters and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and the caching device and the convolution operation device are configured according to the operation mode; and a convolution operation device, in communication connection with the caching device and configured to operate on the data to be processed according to the neural network parameters using the operation mode. The invention solves the technical problem of poor universality of neural network model processing devices in the prior art.
According to another aspect of the embodiments of the present invention, there is also provided a data processing method of a processor, including: the processor acquires data to be processed, neural network parameters and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and a cache device and a convolution operation device are configured according to the operation mode; and the processor operates on the data to be processed according to the neural network parameters using the operation mode.
According to another aspect of the embodiments of the present invention, there is also provided a data processing method of a processor, including: a compiler receives data to be processed, neural network parameters and an operation instruction; the compiler sends the acquired data to be processed, neural network parameters and operation instruction to the processor, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, the cache device and the convolution operation device are configured according to the operation mode, and the processor operates on the data to be processed according to the neural network parameters using the operation mode.
According to another aspect of the embodiments of the present invention, there is also provided an image pickup apparatus including the processor described above.
According to another aspect of the embodiments of the present invention, there is also provided a storage medium including a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to perform the following steps: the method comprises the steps that a processor obtains data to be processed, neural network parameters and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and a cache device and a convolution operation device are configured according to the operation mode; the processor uses an operation mode to operate the data to be processed according to the neural network parameters.
According to another aspect of the embodiments of the present invention, there is also provided a processor, configured to execute a program, where the program executes the following steps: the method comprises the steps that a processor obtains data to be processed, neural network parameters and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and a cache device and a convolution operation device are configured according to the operation mode; the processor uses an operation mode to operate the data to be processed according to the neural network parameters.
Conventional dedicated convolutional neural network accelerators do not support convolutional neural networks with different structures running simultaneously on the same hardware platform. In the above scheme of the present application, the data cache can be configured through instructions, so the cooperation relationship between the computing units can be dynamically configured. This provides great room for dynamic scheduling of various network tasks and thus solves the technical problem of poor universality of neural network model processing devices in the prior art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a schematic structural diagram of a processor according to embodiment 1 of the present application;
FIG. 2 is a schematic diagram of an alternative processor according to embodiment 1 of the present application;
FIG. 3 is a schematic diagram of a convolution operation unit according to embodiment 1 of the present application;
FIG. 4a is a schematic diagram of a skip write according to embodiment 1 of the present application;
FIG. 4b is a schematic diagram of a combined application of skip reading and skip writing according to embodiment 1 of the present application;
fig. 5 illustrates a hardware configuration block diagram of a computer terminal (or mobile device) for implementing a data processing method of a processor;
fig. 6 is a flowchart of a data processing method of a processor according to embodiment 2 of the present invention;
FIG. 7 is a schematic diagram of a compiler and a neural network processor processing task according to embodiment 2 of the present application;
FIG. 8a is a schematic diagram of a deconvolution operation;
FIG. 8b is a schematic diagram of a method for converting deconvolution into forward convolution operation according to embodiment 2 of the present application;
FIG. 8c is a schematic diagram of a method for converting a linear interpolation convolution into a convolution according to embodiment 2 of the present application;
fig. 9 is a diagram illustrating low bit quantization according to embodiment 2 of the present application;
fig. 10 is a flowchart of a data processing method of a processor according to embodiment 3 of the present invention;
FIG. 11 is a schematic diagram of a data processing apparatus of a processor according to embodiment 4 of the present application;
FIG. 12 is a schematic diagram of a data processing apparatus of a processor according to embodiment 5 of the present application; and
fig. 13 is a block diagram of a computer terminal according to embodiment 7 of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms or terms appearing in the description of the embodiments of the present application are applicable to the following explanations:
FPGA: the Field-Programmable Gate Array is Programmable hardware capable of modifying hardware function by burning program, and can be widely applied to the Field of hardware acceleration and also can be used for function verification before the design of a customized chip.
ASIC: application Specific Integrated Circuit, Application Specific chip, so called AI chip, bitcoin machine, and the like all belong to AISC.
Input Feature Map (IFM): the input data of each layer of neural network operation; for example, the input feature map of the first layer is the input picture to be recognized. The input feature map is a 3-dimensional matrix whose three dimensions are width (W), height (H) and number of channels.
Output Feature Map (OFM): the calculation result data of each layer of the neural network, likewise a 3-dimensional matrix whose three dimensions are width (W), height (H) and number of channels.
Filter: the weights or convolution kernels of the neural network, i.e. the parameters required for a convolution operation in the convolutional neural network. Each filter is a 3-dimensional matrix with width (W), height (H) and channel dimensions. The number of channels of a filter must match the number of channels of the input feature map, and one convolution layer contains multiple filters. The N-th filter is convolved with the data at a given position (w, h) of the input feature map to produce the pixel of the N-th channel of the output feature map at the corresponding position (w, h); therefore the number of filters equals the number of channels of the output feature map.
Deconv: deconvolution, the inverse of the convolution operation, can be viewed as adding 0 values between pixels in the convolved input feature map during execution, and when the convolution step (stride) of the forward convolution is s, (s-1) 0 values are added between every two pixels of the input feature map, thereby expanding the HxW-sized feature map into (s x H-s +1) x (s x W-s +1) -sized feature map.
Linear interpolation convolution: similar to deconvolution, (s-1) linearly interpolated samples are inserted as the interval between pixels in every 2 pixels of the input feature map, thereby expanding the feature map of the HxW size to a feature map of (s x H-s +1) x (s x W-s +1) size.
Scaled Conv: means that (s-1) 0 values are added as intervals between every 2 pixels of the convolved filter, thereby expanding the convolution kernel of WxH to a convolution kernel of (sxw-s +1) × (sxh-s + 1).
Concat: the feature map connection generally refers to that a plurality of feature maps with the same width and height but not necessarily the same channel number are spliced in the channel number dimension, for example, a feature map with the size w x h x ch1 is spliced with a feature map with the size w x h x ch2, and can be spliced into a feature map with the size w x h x (ch1+ ch 2).
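For illustration only, a short numpy sketch of the zero insertion used in the Deconv and Scaled Conv definitions, and of Concat, is given below; the array names and sizes are arbitrary examples, not taken from the embodiments:

```python
import numpy as np

# Deconv-style zero insertion: (s-1) zeros between neighbouring pixels
def zero_insert(fm: np.ndarray, s: int) -> np.ndarray:
    h, w = fm.shape
    out = np.zeros((s * h - s + 1, s * w - s + 1), dtype=fm.dtype)
    out[::s, ::s] = fm                         # original pixels land on a stride-s grid
    return out

x = np.arange(1.0, 10.0).reshape(3, 3)
print(zero_insert(x, 2).shape)                 # (5, 5), i.e. (2*3-2+1) x (2*3-2+1)

# Concat: splicing two feature maps along the channel dimension
a = np.zeros((8, 8, 16))                       # w x h x ch1
b = np.ones((8, 8, 32))                        # w x h x ch2
print(np.concatenate([a, b], axis=-1).shape)   # (8, 8, 48) == w x h x (ch1 + ch2)
```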
Example 1
According to an embodiment of the present invention, an embodiment of a processor is further provided. Fig. 1 is a schematic structural diagram of a processor according to embodiment 1 of the present application; as shown in fig. 1, the processor includes:
the cache device 10 acquires data to be processed, neural network parameters and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and the cache device and the convolution operation device are configured according to the operation mode.
Specifically, the data to be processed is data to be subjected to neural network operation, and may be image information or the like. For example, if the processor is used for performing image recognition, the data to be processed is input features corresponding to the image information to be recognized, and if the processor is applied to the monitoring field, the data to be processed may be input features corresponding to the image information acquired by the camera.
The neural network parameters may be neural network parameters of a preset neural network model, that is, filter data, and the operation instruction may be issued by a user or triggered by input data to be processed after configuration information is set by the user.
The configuration information in the operation instruction at least comprises an operation mode. The operation mode can be set by a user or determined by the processor according to an actual task. In an alternative embodiment, the configuration information may indicate the address of the cache device from which data is read, the size of the cache device, and the address of the cache device from which data is written, thereby enabling the cache device to adapt to various operating modes.
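Purely as an illustration of what such configuration information might carry, the following Python sketch groups the fields mentioned above into one record; all field names and values are assumptions introduced here, not part of the embodiment:

```python
from dataclasses import dataclass

@dataclass
class OperationConfig:
    """Hypothetical view of the configuration information in an operation instruction."""
    read_addr: int    # cache address the data to be processed is read from
    read_size: int    # size of the cache region to read
    write_addr: int   # cache address the operation result is written to
    op_mode: str      # e.g. "parallel_tasks" or "cooperative" (see the operation modes below)

cfg = OperationConfig(read_addr=0x0000, read_size=4096,
                      write_addr=0x2000, op_mode="cooperative")
```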
In an alternative embodiment, the processor is an FPGA module or an ASIC module.
And the convolution operation device 20 is in communication connection with the cache unit and is used for operating the data to be processed according to the neural network parameters by using the operation mode indicated by the operation instruction.
Specifically, the convolution operation device is connected with the cache device, acquires the data to be processed and the neural network parameters from the cache device, and processes the data to be processed according to the configuration information in the operation instruction after receiving the operation instruction.
In an alternative embodiment, the convolution operation device supports operators such as deconvolution, scaled Conv, and nonlinear interpolation convolution, so that more operations can be supported. The convolution operation can be configured according to configuration information in the operation instruction, so that various operation modes are supported.
It should be noted that the conventional dedicated convolutional neural network accelerator does not support a plurality of convolutional neural networks with different structures running on the same hardware platform at the same time. In the above scheme of the application, the data cache can be configured through the instruction, so that the cooperation relationship between the computing units can be dynamically configured, and a great space is provided for the dynamic scheduling of various network tasks. The plurality of convolution calculation modules cooperate to perform the same convolution neural network task or perform different convolution neural network tasks in groups, and the grouping mode can be dynamically adjusted according to instructions. The method provides a performance optimization space for task scheduling.
Therefore, the embodiment of the application solves the technical problem that the neural network model processing device in the prior art is poor in universality.
As an alternative embodiment, the operation mode includes: a plurality of convolution operation modules simultaneously executing a plurality of different tasks, or a plurality of convolution operation modules jointly executing the same task.
The scheme provides two operation modes, the operation mode can be set by a user, and the operation mode can also be determined by the task type.
In an alternative embodiment, the tasks received by the processor have priorities, and if the priorities of the tasks received by the processor are the same, the operation mode is determined to be that the plurality of convolution operation modules execute a plurality of different tasks at the same time, so that the plurality of tasks received by the processor can be executed in parallel; if one of the tasks received by the processor has the highest priority, the mode that the plurality of convolution operation modules execute the same task together can be selected, the task with the highest priority is executed firstly, and then the plurality of convolution operation modules execute other tasks simultaneously.
In another alternative embodiment, the operation module may also be determined according to the data amount in the task (the data amount of the data to be processed and/or the data amount of the neural network parameter). When the data volume of the task is larger than a first preset value, the operation mode can be that N (N >1) convolution operation modules jointly execute the same task, and N can be determined according to the data volume of the task; when the data amount of the task is smaller than the set data amount, the operation mode may be that a plurality of convolution operation modules simultaneously execute a plurality of different tasks.
In yet another alternative embodiment, the operation module may be determined according to the current utilization rate of the processor. When the utilization rate of the processor is greater than the first preset value, the tasks can be serially processed in a mode that a plurality of convolution operation modules jointly execute the same task.
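The following Python sketch summarizes these alternatives as one selection policy; the thresholds, argument names and the two-mode enumeration are illustrative assumptions only, not values given in the embodiments:

```python
from enum import Enum, auto

class OpMode(Enum):
    PARALLEL_TASKS = auto()   # several convolution modules run different tasks at once
    COOPERATIVE = auto()      # several convolution modules work on one task together

def choose_mode(task_priorities, task_data_volume, utilization,
                volume_threshold=1 << 20, util_threshold=0.8):
    """Pick an operation mode along the lines of the alternatives above."""
    if len(set(task_priorities)) > 1:
        # one task has the highest priority: execute it first with all modules
        return OpMode.COOPERATIVE
    if task_data_volume > volume_threshold or utilization > util_threshold:
        return OpMode.COOPERATIVE
    return OpMode.PARALLEL_TASKS

print(choose_mode([1, 1, 1], task_data_volume=4096, utilization=0.3))  # PARALLEL_TASKS
```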
In an optional embodiment, the processor further includes: a memory access device, configured to acquire the data to be processed and the operation instruction from the compiler, wherein the compiler communicates with the neural network model processing device; and a synchronous dynamic random access memory, configured to store the neural network parameters; the buffer acquires the data to be processed and the operation instruction from the direct memory access device through the bus, and acquires the neural network parameters from the synchronous dynamic random access memory through the bus.
Specifically, the memory access device is a Direct Memory Access (DMA) controller; DMA transfer copies data from one address space to another, allowing hardware devices of different speeds to communicate without imposing a heavy interrupt load on the CPU. The synchronous dynamic random access memory may be a DDR SDRAM (Double Data Rate SDRAM). In the above embodiment, the convolution operation device obtains the data to be processed, the neural network parameters and the operation instruction from the cache device, and the cache device obtains them from the DMA or the DDR through the bus.
Fig. 2 is a schematic structural diagram of an alternative processor according to embodiment 1 of the present application. Referring to fig. 2, the DMA module obtains the data to be processed and an operation instruction containing configuration information from the CPU; the configuration information is stored in a local register (not shown in the figure), and other information such as the neural network parameters is stored in an off-chip memory (external memory), for example a DDR memory, by the memory control module (memory controller) through the bus.
As an alternative embodiment, wherein the buffer comprises: the filter buffer is used for storing the neural network parameters; and the input image buffer is used for storing the data to be processed.
Specifically, the Filter Buffer (Filter Buffer) and the input image Buffer (IFM Buffer) are respectively in communication with a bus to obtain the neural network parameters and the data to be processed through the bus.
In an alternative embodiment, as shown in fig. 2, the processor reads the data to be processed, i.e., the input feature map and the neural network weights (i.e., the filters), from the DDR memory to the corresponding buffer via the memory control module and the bus according to the instruction.
As an alternative embodiment, the convolution operation device reads the data to be processed from the input image buffer in a skip-read mode.
In the above scheme, the convolution operation device can read the data to be processed either continuously or discontinuously. Continuous reading means that the convolution operation device reads the elements of the data to be processed one by one in their stored order; discontinuous reading means that the elements are not read in that order, i.e. skip reading.
Specifically, the skip mode is used to indicate that when the convolution operation device reads the data to be processed from the input image buffer, the data to be processed in the input image buffer is read according to the preset interval step number.
The convolution operation device in the embodiment of the present application reads the data to be processed from the image buffer in a skip-read mode, and realizes shuffling of data positions simply by configuring the read/write positions of the buffer. This achieves the same effect as inserting 0-value intervals into the filter, that is, the input feature map is sampled at intervals before the convolution operation, so the scaled Conv operator is supported. In this scheme, the scaled Conv operator is converted into an ordinary convolution that discontinuously reads the data to be processed with a reduced convolution filter, so the scaled Conv operator can be implemented efficiently with an ordinary convolution module. In addition, the scheme allows multiple subsequent modules with different data-storage-order requirements to read the same data through discontinuous reading, thereby reducing the hardware resource overhead introduced by data shuffling.
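A minimal Python sketch of reading a buffer at a fixed interval step; here the buffer is simply a numpy array and the step value is an arbitrary example:

```python
import numpy as np

def skip_read(buffer: np.ndarray, start: int, step: int, count: int) -> np.ndarray:
    """Read `count` elements from `buffer`, starting at `start` and skipping
    (step - 1) elements between consecutive reads."""
    return buffer[start : start + count * step : step]

ifm_buffer = np.arange(16)
print(skip_read(ifm_buffer, start=0, step=2, count=8))   # [ 0  2  4  6  8 10 12 14]
```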
As an alternative embodiment, the convolution operation means includes: and the vector multiplication array unit comprises a plurality of vector multipliers, wherein each vector multiplier carries out operation according to the input characteristics corresponding to the received data to be processed and the neural network parameters, and outputs an operation result.
Specifically, the convolution operation is realized by a vector multiplication operation, and the convolution operation means performs multiplication of a vector by a vector multiplication unit included therein.
Fig. 3 is a schematic diagram of the convolution operation unit according to embodiment 1 of the present application. As shown in fig. 3, a convolution operation module may read the input feature map from the input image buffer in a skip-read mode to perform the operation. Each convolution module comprises M×N vector multiplication units (VU); each vector multiplication unit completes the vector multiplication of an input feature map segment with a filter and outputs a calculation result.
As an optional embodiment, the vector multiplier is further configured to determine whether the received to-be-processed data and the neural network parameter are valid, and perform an operation on the input feature and the neural network weight corresponding to the received to-be-processed data and output an operation result when both the received to-be-processed data and the received neural network parameter are valid.
In the above scheme, before the vector multiplier performs the vector multiplication, it first determines whether the neural network parameters and the input feature data are valid. If either the neural network parameters or the input feature data are invalid, the vector multiplication is not performed; if both are valid, the multiplication of the neural network parameters with the input feature data is performed and a calculation result is output.
As an alternative embodiment, the vector multiplier closest to the buffer reads the neural network weights and the data to be processed from the buffer.
Still referring to fig. 3, the vector multiplier at the edge of each buffer device reads in the input feature map data or neural network parameters at the corresponding position.
As an alternative embodiment, the vector multiplier transmits the current input feature data to the vector multiplier on its right, and transmits the neural network parameters to the vector multiplier below it.
Still referring to fig. 3, after a vector multiplier performs the vector multiplication, it passes the current input feature data to the vector multiplier on its right and the filter data to the vector multiplier below it.
The following describes the procedure of the convolution operation device in detail:
step a, reading input characteristic diagram data or filter data of a corresponding position by a vector multiplication unit of each cache edge;
b, each vector multiplication unit judges whether the currently received input feature data and the filter data are valid, if so, the vector multiplication calculation is finished and a calculation result is output;
step c, each vector multiplication unit transmits the current input feature data to the vector multiplication unit on its right and transmits the filter data to the vector multiplication unit below it;
and circularly executing a, b and c until all vector multipliers have no input characteristic data.
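A Python sketch of one cycle of steps a to c is given below; the data layout (Python lists of vectors, with None marking invalid entries) is an assumption made purely for illustration, and refilling the edge units from the buffers (step a) is omitted:

```python
import numpy as np

def vu_array_step(ifm_regs, flt_regs, acc):
    """One cycle of the M x N vector-multiplier array: every unit holding valid
    data accumulates a dot product (step b), then feature vectors shift one
    column to the right and filter vectors one row down (step c)."""
    m, n = len(ifm_regs), len(ifm_regs[0])
    for i in range(m):
        for j in range(n):
            if ifm_regs[i][j] is not None and flt_regs[i][j] is not None:
                acc[i, j] += float(np.dot(ifm_regs[i][j], flt_regs[i][j]))
    for i in range(m):                          # input feature data moves right
        ifm_regs[i] = [None] + ifm_regs[i][:-1]
    for j in range(n):                          # filter data moves down
        col = [flt_regs[i][j] for i in range(m)]
        for i in range(m):
            flt_regs[i][j] = col[i - 1] if i > 0 else None
    return ifm_regs, flt_regs, acc
```

In the actual array, the left-most column and the top row would be refilled from the input image buffer and the filter buffer on every cycle (step a); the sketch only shows the internal propagation.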
As an optional embodiment, the above caching apparatus further includes: an output buffer device, configured to buffer the operation result output by the convolution operation unit, wherein the convolution operation unit writes the operation result into the output buffer device in a skip-write mode.
Specifically, the skip-write mode writes data into the output buffer device discontinuously. In an alternative embodiment, as shown in fig. 2, a subsequent computation module (Post) is further arranged between the convolution operation unit and the output buffer. Post completes all operations between the current convolutional layer and the next convolutional layer, for example: bias, Batch Normalization (BN), scale transformation (Scale), and nonlinear neurons (sigmoid, ReLU, etc.). The convolution operation device writes the calculation result into the output feature map buffer (OFM buffer) through this subsequent computation module (Post) in either a continuous or a discontinuous mode, and the data in the output feature map buffer is stored back into the DDR through the bus and the memory control module.
In this scheme, by using the hardware design's ability to write the output feature map cache discontinuously, multiple convolution operation results (the results finally output from the Post module) can be written directly into cache positions that already account for the Concat storage interval. The Concat operation is thus completed simply by controlling the cache write positions, without a separate hardware unit, achieving Concat elimination, removing the extra computation cost and improving computing performance.
Fig. 4a is a schematic diagram of skip writing according to embodiment 1 of the present application. If a continuous write mode is adopted, the OFMs of Conv1, Conv2, ..., Convn are written into the output buffer sequentially; if a skip write mode is adopted, as shown in fig. 4a (taking only Conv1 and Conv2 as an example), the corresponding OFMs are written into the output buffer at a preset interval step. In this example the preset interval step is 1: each time Conv1 writes one OFM into the output buffer, a slot is reserved for the OFM of Conv2, realizing the skip-write function. As a result, the OFMs written into the output buffer never need to have their positions exchanged, and the Concat operation is completed directly.
Meanwhile, fig. 4b is a schematic diagram of the combined application of skip reading and skip writing according to embodiment 1 of the present application. Conv1 and Conv2 are written into the output buffer in skip-write mode, directly realizing the Concat operation; when the OFM of Conv1 alone needs to be used as the IFM of Conv3, Conv3 performs skip reading from the Concat result and directly reads out the OFM of Conv1, without splitting the Concat. Thus, by exploiting the discontinuous reading of the input feature map buffer, a particular convolution operation result can still be used directly and independently as the input feature map of the next convolution layer, even though it is stored discontinuously in the buffer.
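The write/read pattern of figs. 4a and 4b can be illustrated with a small numpy sketch; the buffer here is one-dimensional and the values are arbitrary, so this is only an illustration of the interleaving, not of the actual buffer layout:

```python
import numpy as np

ofm1 = np.array([10, 11, 12, 13])   # outputs produced by Conv1
ofm2 = np.array([20, 21, 22, 23])   # outputs produced by Conv2

# skip write with an interval step of 1: each convolution leaves a slot free
# for the other, so the Concat result forms directly in the output buffer
concat_buf = np.empty(8, dtype=ofm1.dtype)
concat_buf[0::2] = ofm1
concat_buf[1::2] = ofm2
print(concat_buf)                    # [10 20 11 21 12 22 13 23]

# skip read: Conv3 can still consume Conv1's OFM alone, straight from the
# concatenated buffer, without splitting the Concat
print(concat_buf[0::2])              # [10 11 12 13]
```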
It should be noted that the existing dedicated convolutional neural network accelerator uses a special computing unit to perform data shuffling, i.e., data position exchange due to the difference of requirements of two cascaded computing units on data storage formats. In the application, by using the provided skip writing or skip reading cache hardware structure, the exchange of data positions can be directly completed during data writing, or a plurality of subsequent modules with different data storage sequence requirements can read the same data through a discontinuous reading method, thereby reducing the hardware resource overhead introduced by data shuffling.
In the solution of the above embodiment of the present application, the input feature map buffer supports reading data in a discontinuous manner, and the output feature map buffer supports writing the output feature map in a discontinuous manner. The processor in this embodiment may further execute the data processing method of the processor in embodiment 2 described below, and in combination with the data processing method of the processor in embodiment 2, the processor enables the convolution calculation module to support deconvolution and scaled Conv operations, thereby implementing merging of a plurality of different operators and improving hardware utilization rate on the premise of ensuring universality. The data processing method of the processor will be described in embodiment 2.
Example 2
There is also provided, in accordance with an embodiment of the present invention, an embodiment of a data processing method for a processor, it should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than here.
The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Fig. 5 shows a hardware configuration block diagram of a computer terminal (or mobile device) for implementing the data processing method of a processor. As shown in fig. 5, the computer terminal 50 (or mobile device 50) may include one or more processors 502 (shown as 502a, 502b, ..., 502n; the processor 502 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 504 for storing data, and a transmission module 506 for communication functions. In addition, it may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 5 is only an illustration and does not limit the structure of the electronic device. For example, the computer terminal 50 may also include more or fewer components than shown in fig. 5, or have a different configuration than shown in fig. 5.
It should be noted that the one or more processors 502 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuitry may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 50 (or mobile device). As referred to in the embodiments of the application, the data processing circuitry acts as a processor control (e.g. selection of a variable resistance termination path connected to the interface).
The memory 504 may be used to store software programs and modules of application software, such as the program instructions/data storage devices corresponding to the data processing method of the processor in the embodiment of the present invention; the processor 502 executes various functional applications and data processing by running the software programs and modules stored in the memory 504, that is, implementing the above data processing method of the processor. The memory 504 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 504 may further include memory located remotely from the processor 502, which may be connected to the computer terminal 50 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 506 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 50. In one example, the transmission device 506 includes a Network adapter (NIC) that can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 506 can be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
The display may be, for example, a touch-screen liquid crystal display (LCD) that enables a user to interact with the user interface of the computer terminal 50 (or mobile device).
It should be noted here that, in some alternative embodiments, the computer device (or mobile device) shown in fig. 5 above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should also be noted that fig. 5 is only one particular example, intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
Under the operating environment, the application provides a data processing method of the processor shown in fig. 6. Fig. 6 is a flowchart of a data processing method of a processor according to embodiment 2 of the present invention.
And step S61, the processor acquires data to be processed, neural network parameters and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and the cache device and the convolution operation device are configured according to the operation mode.
Specifically, the processor is configured to perform operations of the neural network model, and the processor includes a cache device; the cache device in the processor acquires the data to be processed, the neural network parameters and the operation instruction.
The data to be processed is data to be subjected to neural network operation, and may be image information or the like. For example, if the processor is used for performing image recognition, the data to be processed is input features corresponding to the image information to be recognized, and if the processor is applied to the monitoring field, the data to be processed may be input features corresponding to the image information acquired by the camera.
The neural network parameters may be neural network parameters of a preset neural network model, that is, filter data, and the operation instruction may be issued by a user or triggered by input data to be processed after configuration information is set by the user.
The configuration information in the operation instruction at least comprises an operation mode. The operation mode can be set by a user or determined by the processor according to an actual task. In an alternative embodiment, the configuration information may indicate the address of the cache device from which data is read, the size of the cache device, and the address of the cache device from which data is written, thereby enabling the cache device to adapt to various operating modes.
In an alternative embodiment, the processor is an FPGA module or an ASIC module.
And step S63, the processor uses the operation mode to operate the data to be processed according to the neural network parameters.
Specifically, the convolution operation device is connected with the cache device, acquires the data to be processed and the neural network parameters from the cache device, and processes the data to be processed according to the configuration information in the operation instruction after receiving the operation instruction.
In an alternative embodiment, the convolution operation device supports operators such as deconvolution, scaled Conv, and nonlinear interpolation convolution, so that more operations can be supported. The convolution operation can be configured according to configuration information in the operation instruction, so that various operation modes are supported.
It should be noted that the conventional dedicated convolutional neural network accelerator does not support a plurality of convolutional neural networks with different structures running on the same hardware platform at the same time. In the above scheme of the application, the data cache can be configured through the instruction, so that the cooperation relationship between the computing units can be dynamically configured, and a great space is provided for the dynamic scheduling of various network tasks.
Therefore, the embodiment of the application solves the technical problem that the neural network model processing device in the prior art is poor in universality.
In an alternative embodiment, the operation mode includes: a plurality of convolution operation modules simultaneously executing a plurality of different tasks, or a plurality of convolution operation modules jointly executing the same task.
The scheme provides two operation modes, the operation mode can be set by a user, and the operation mode can also be determined by the task type.
In an alternative embodiment, the tasks received by the processor have priorities, and if the priorities of the tasks received by the processor are the same, the operation mode is determined to be that the plurality of convolution operation modules execute a plurality of different tasks at the same time, so that the plurality of tasks received by the processor can be executed in parallel; if one of the tasks received by the processor has the highest priority, the mode that the plurality of convolution operation modules execute the same task together can be selected, the task with the highest priority is executed firstly, and then the plurality of convolution operation modules execute other tasks simultaneously.
In another alternative embodiment, the operation module may also be determined according to the data amount in the task (the data amount of the data to be processed and/or the data amount of the neural network parameter). When the data volume of the task is larger than a first preset value, the operation mode can be that N (N >1) convolution operation modules jointly execute the same task, and N can be determined according to the data volume of the task; when the data amount of the task is smaller than the set data amount, the operation mode may be that a plurality of convolution operation modules simultaneously execute a plurality of different tasks.
In yet another alternative embodiment, the operation module may be determined according to the current utilization rate of the processor. When the utilization rate of the processor is greater than the first preset value, the tasks can be serially processed in a mode that a plurality of convolution operation modules jointly execute the same task.
In an alternative embodiment, the processor includes an input image buffer and a convolution operation device, wherein the convolution operation device reads the data to be processed from the buffer in a skip mode.
In the above scheme, the convolution operation device can read the data to be processed either continuously or discontinuously. Continuous reading means that the elements of the data to be processed are read one by one in their stored order; discontinuous reading means that the elements are not read in that order, i.e. skip reading.
Specifically, the skipping mode is used to indicate that when the convolution operation device reads the data to be processed from the input image buffer, the data to be processed in the input image buffer is read according to the preset skipping step number.
The convolution operation device in the embodiment of the present application reads the data to be processed from the image buffer in a skip-read mode, and realizes shuffling of data positions simply by configuring the read/write positions of the buffer. This achieves the same effect as inserting 0-value intervals into the filter, that is, the input feature map is sampled at intervals before the convolution operation, so the scaled Conv operator is supported. In this scheme, the scaled Conv operator is converted into an ordinary convolution that discontinuously reads the data to be processed with a reduced convolution filter, so the scaled Conv operator can be implemented efficiently with an ordinary convolution module.
In an optional embodiment, the neural network model further includes an output buffer device, wherein the convolution operation device writes the operation result into the output buffer device in a skip-write manner.
Specifically, the skip-write mode writes data into the output buffer device discontinuously. In an alternative embodiment, as shown in fig. 2, a subsequent computation module (Post) is further arranged between the convolution operation unit and the output buffer. Post completes all operations between the current convolutional layer and the next convolutional layer, for example: bias, Batch Normalization (BN), scale transformation (Scale), and nonlinear neurons (sigmoid, ReLU, etc.). The convolution operation device writes the calculation result into the output feature map buffer (OFM buffer) through this subsequent computation module (Post) in either a continuous or a discontinuous mode, and the data in the output feature map buffer is stored back into the DDR through the bus and the memory control module.
In this scheme, by using the hardware design's ability to write the output feature map cache discontinuously, multiple convolution operation results (the results finally output from the Post module) can be written directly into cache positions that already account for the Concat storage interval. The Concat operation is thus completed simply by controlling the cache write positions, without a separate hardware unit, achieving Concat elimination, removing the extra computation cost and improving computing performance.
In an optional embodiment, the acquiring, by the processor, of the data to be processed and the neural network parameters includes: the processor acquires the split data to be processed, wherein the compiler is used for executing one or more of the following: in the case that the operation instruction is to perform a deconvolution operation on the data to be processed, converting the deconvolution operation in the operation instruction into a plurality of convolution operations, and splitting the data to be processed into data corresponding to the plurality of convolution operations; in the case that the operation instruction is to perform a convolution operation of nonlinear interpolation on the data to be processed, the compiler converts the convolution operation of nonlinear interpolation in the operation instruction into a deconvolution operation, converts the converted deconvolution operation into a plurality of convolution operations, and splits the data to be processed into data corresponding to the convolution operations; and in the case that the operation instruction is to perform a full connection operation on the data to be processed, the compiler converts the full connection operation in the operation instruction into a convolution operation and splits the data to be processed into data corresponding to the convolution operation.
Specifically, the compiler is run by a processor coupled to the neural network processor, which in an alternative embodiment may be a CPU, and the neural network processor may be an FPGA module or an ASIC module in communication with the CPU.
Fig. 7 is a schematic diagram of processing tasks of a Compiler and a neural network processor according to embodiment 2 of the present application, and with reference to fig. 7, the Compiler (Compiler) runs in a Software layer (Software) and is configured to compile and optimize a neural network model to be executed, and at run time (runtime), according to instructions and parameters compiled by the Compiler, data to be processed and operation instructions are scheduled and transmitted to a processor (CNN processor) in a Hardware layer (Hardware), so as to drive the processor to complete computations and return results to applications, thereby maintaining normal operation of Hardware.
The above scheme is explained below with reference to fig. 7. In this scheme, the compiler splits large convolution operations, which may include: deconvolution operations, convolution operations of nonlinear interpolation, and full connection operations. Splitting a large convolution operation into simple convolution operations greatly reduces the operation overhead.
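For the full-connection case, the conversion relies on the standard equivalence between a fully connected layer and a convolution whose kernel covers the whole input feature map; the following numpy check is an illustration with arbitrary sizes, not code from the embodiments:

```python
import numpy as np

h, w, c, out_dim = 4, 4, 8, 10
ifm = np.random.randn(h, w, c)
fc_weights = np.random.randn(out_dim, h * w * c)

# fully connected layer on the flattened input feature map
fc_out = fc_weights @ ifm.reshape(-1)

# the same computation as a convolution: out_dim filters, each of size h x w x c,
# applied at the single valid position of the input feature map
filters = fc_weights.reshape(out_dim, h, w, c)
conv_out = np.array([np.sum(f * ifm) for f in filters])

print(np.allclose(fc_out, conv_out))   # True
```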
Fig. 8a is a schematic diagram of a deconvolution operation. In this example, a deconvolution with a stride of 2 (a 0 is inserted between every two pixels) is performed on a 6×6 input feature map using a 3×3 convolution kernel. When the deconvolution is performed in the conventional way, and assuming the 4 operation cases shown in fig. 8a occur with equal frequency (the actual picture size causes slight variations, but the frequencies are close), the proportion of multiplications involving non-zero input feature pixels to the total multiplications is 9/36 = 25%. The convolution can be viewed as vector multiplications whose per-pixel results are accumulated, so the remaining 75% of the multiplications, which multiply by 0, are invalid. The proportion of invalid computations grows as the ratio of the stride to the convolution filter size grows, wasting significant computational resources and bandwidth.
In this scheme, the deconvolution operation is converted into several forward convolutions. Fig. 8b is a schematic diagram of converting a deconvolution into forward convolution operations according to embodiment 2 of the present application: the 4 position cases of the 3×3 deconvolution shown in fig. 8a are converted into the 4 forward convolution operations shown in fig. 8b, so that every multiplication is an effective operation, and the skip-write cache feature ensures that no extra data shuffling cost is introduced. The operation overhead is therefore greatly reduced; moreover, thanks to the hardware's ability to write the output feature map discontinuously, the calculation results of the four positions can be written directly into their corresponding cache positions without an additional data gathering step.
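The decomposition can be checked numerically. The sketch below uses one common formulation of stride-s deconvolution (zero insertion followed by an ordinary convolution) and rebuilds the same result from per-phase forward convolutions whose kernels keep only the taps that ever meet a non-zero pixel; the helper function, sizes and indexing scheme are assumptions made for this illustration:

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 'valid' cross-correlation, sufficient for this check."""
    oh, ow = x.shape[0] - k.shape[0] + 1, x.shape[1] - k.shape[1] + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + k.shape[0], j:j + k.shape[1]] * k)
    return out

s, H = 2, 6
x = np.random.randn(H, H)                   # input feature map
k = np.random.randn(3, 3)                   # deconvolution kernel

# reference: zero-insert the input (stride-s deconvolution), then one big convolution
xz = np.zeros((s * H - s + 1, s * H - s + 1))
xz[::s, ::s] = x
ref = conv2d_valid(xz, k)

# decomposition: one small forward convolution per output phase, keeping only the
# kernel taps that ever meet a non-zero input pixel
dec = np.zeros_like(ref)
for pi in range(s):
    for pj in range(s):
        ki, kj = (-pi) % s, (-pj) % s       # first kernel row/col hitting real data
        sub = conv2d_valid(x, k[ki::s, kj::s])
        oi, oj = (pi + ki) // s, (pj + kj) // s
        rows = len(range(pi, ref.shape[0], s))
        cols = len(range(pj, ref.shape[1], s))
        dec[pi::s, pj::s] = sub[oi:oi + rows, oj:oj + cols]

print(np.allclose(ref, dec))                # True: same result, no multiply-by-zero
```

For s = 2 and a 3×3 kernel the four sub-kernels contain 4 + 2 + 2 + 1 = 9 of the original 36 taps, matching the 25% figure above.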
Similarly, the linear interpolation convolution conversion method and the scaled Conv conversion method provided by the scheme can delete all invalid operations, so that the universality is ensured, and the calculation speed is increased by using limited hardware resources.
Such operator conversion allows the processor to be reused as efficiently as possible for a variety of computing tasks while keeping the algorithm support general, which improves hardware utilization and the computing performance of the processor.
In an alternative embodiment, the processor obtaining the neural network parameters includes: the processor acquires the split neural network parameters, where the compiler detects the data size of the neural network parameters and, if the size exceeds a second preset value, splits the neural network parameters.
In the above scheme, by splitting the neural network parameters, a large convolution filter is split into a plurality of filters and a large input feature map is split into a plurality of feature maps, so that a large convolution operation that a single processor cannot support is split into a plurality of small convolution operations.
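As a rough illustration of this splitting (not the application's actual algorithm; the tile size, the value of the preset threshold and the stitching logic below are assumptions), a large input feature map can be cut into tiles that each carry a halo of kernel-1 rows and columns, so that every tile fits the on-chip buffers and the partial results can be written back independently:

```python
import numpy as np

def conv2d(x, k):
    """Plain valid 2-D convolution (correlation), used as the reference."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    return np.array([[np.sum(x[i:i + kh, j:j + kw] * k) for j in range(ow)]
                     for i in range(oh)])

def conv2d_tiled(x, k, tile=8):
    """Same result, but the input feature map is processed tile by tile;
    each tile carries a halo of (kernel - 1) extra rows and columns."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i0 in range(0, oh, tile):
        for j0 in range(0, ow, tile):
            i1, j1 = min(i0 + tile, oh), min(j0 + tile, ow)
            patch = x[i0:i1 + kh - 1, j0:j1 + kw - 1]   # tile plus halo
            out[i0:i1, j0:j1] = conv2d(patch, k)
    return out

x, k = np.random.rand(20, 20), np.random.rand(3, 3)
assert np.allclose(conv2d(x, k), conv2d_tiled(x, k))
```

Splitting a large filter would follow the same pattern, with the partial results of the filter pieces accumulated into the final output.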
In an optional embodiment, the operation instruction further includes quantization information, and the method further includes: the neural network processor compresses the data to be processed and the neural network parameters according to the quantization information.
Specifically, the quantization information is used to perform low-bit quantization on the data to be processed and the neural network layer parameters, which compresses the data and thereby reduces the overhead required at run time.
Fig. 9 is a diagram illustrating low-bit quantization according to embodiment 2 of the present application. Referring to fig. 9, the quantization process is divided into three stages (stage 1, stage 2 and stage 3). Stage 1: the feature map is first kept in 32-bit floating point while the Weights of the filter are quantized; stage 2: the quantized filter is then held fixed while the Activation SF (Activation Scaling Factor) of the feature map is quantized; stage 3: finally, the quantized filter and feature map are fine-tuned together.
In this scheme, the quantization range is also used as a training parameter during the post-quantization fine-tune process, which further reduces the number of bits after quantization while preserving the accuracy of the algorithm.
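A minimal sketch of the three-stage flow, assuming simple symmetric quantization (the bit width, the scale initialization and the use of numpy are illustrative assumptions; the fine-tuning loop that treats the quantization range as a trainable parameter is omitted):

```python
import numpy as np

def quantize(x, scale, bits=8):
    """Symmetric low-bit quantization: round to integers in the representable
    range for `bits`, then map back to floats with the same scale."""
    qmax = 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

# stage 1: keep the feature map in 32-bit floating point, quantize the filter Weights
w = np.random.randn(3, 3).astype(np.float32)     # filter weights
w_scale = np.abs(w).max() / (2 ** 7 - 1)         # initial weight scaling factor
w_q = quantize(w, w_scale)

# stage 2: hold the quantized filter fixed, quantize the feature map via its
# Activation Scaling Factor (Activation SF)
a = np.random.randn(6, 6).astype(np.float32)     # feature map activations
act_sf = np.abs(a).max() / (2 ** 7 - 1)
a_q = quantize(a, act_sf)

# stage 3: fine-tune; in this scheme the quantization range (the scaling factors
# above) is itself treated as a trainable parameter, which is what allows the
# bit width to be pushed lower without losing accuracy (training loop omitted).
```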
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 3
According to an embodiment of the present invention, there is further provided an embodiment of a data processing method of a processor. Fig. 10 is a flowchart of a data processing method of a processor according to embodiment 3 of the present invention; as shown in fig. 10, the method includes:
Step S101: the compiler receives the data to be processed, the neural network parameters and the operation instruction.
Specifically, the compiler is configured to receive data to be processed input by a user, and generate a neural network parameter and an operation instruction according to a program written by the user.
Fig. 7 is a schematic diagram of the processing tasks of a Compiler and a neural network processor according to embodiment 2 of the present application. Referring to fig. 7, the Compiler runs in the Software layer and is configured to compile and optimize the neural network model to be executed. At runtime, the data to be processed and the operation instructions are scheduled and transmitted to the processor (CNN processor) in the Hardware layer according to the instructions and parameters produced by the compiler, driving the processor to complete the computations and return the results to the applications, so that the hardware keeps running normally.
Step S103: the compiler sends the acquired data to be processed, the neural network parameters and the operation instruction to the processor, where the operation instruction includes configuration information, the configuration information is used to determine an operation mode and to configure the cache device and the convolution operation device according to the operation mode, and the processor operates on the data to be processed according to the neural network parameters using the operation mode.
Specifically, the processor is configured to perform the operations of the neural network model, and the processor includes a cache device; the cache device in the processor acquires the data to be processed, the neural network parameters and the operation instruction.
The data to be processed is data to be subjected to neural network operation, and may be image information or the like. For example, if the processor is used for performing image recognition, the data to be processed is input features corresponding to the image information to be recognized, and if the processor is applied to the monitoring field, the data to be processed may be input features corresponding to the image information acquired by the camera.
The neural network parameters may be neural network parameters of a preset neural network model, that is, filter data, and the operation instruction may be issued by a user or triggered by input data to be processed after configuration information is set by the user.
The configuration information in the operation instruction at least comprises an operation mode. The operation mode can be set by a user or determined by the processor according to an actual task. In an alternative embodiment, the configuration information may indicate the address of the cache device from which data is read, the size of the cache device, and the address of the cache device from which data is written, thereby enabling the cache device to adapt to various operating modes.
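For illustration, the configuration information can be pictured as a small record attached to each operation. The structure and field names below are assumptions chosen for readability, not the instruction format actually used by the processor:

```python
from dataclasses import dataclass

@dataclass
class CacheConfig:
    """Illustrative only: field names are assumptions, not the processor's format."""
    read_base: int    # address in the cache device that data is read from
    write_base: int   # address that results are written back to
    size: int         # number of entries reserved in the cache device
    stride: int       # step used for skip-read / skip-write access

@dataclass
class OperationInstruction:
    mode: str                  # e.g. "multi_task" or "cooperative" (assumed labels)
    input_cfg: CacheConfig     # configures the input image buffer
    output_cfg: CacheConfig    # configures the output buffer

# a single instruction configuring both buffers for one convolution task
instr = OperationInstruction("multi_task",
                             CacheConfig(0x0000, 0x4000, 1024, 2),
                             CacheConfig(0x4000, 0x8000, 1024, 2))
```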
In an alternative embodiment, the processor is an FPGA module or an ASIC module.
The convolution operation device is connected with the cache device, obtains the data to be processed and the neural network parameters from the cache device, and processes the data to be processed according to the configuration information in the operation instruction after receiving the operation instruction.
In an alternative embodiment, the convolution operation device supports operators such as deconvolution, scaled Conv and nonlinear interpolation convolution, so that more operations can be supported. The convolution operation device can be configured according to the configuration information in the operation instruction, thereby supporting various operation modes.
It should be noted that the conventional dedicated convolutional neural network accelerator does not support a plurality of convolutional neural networks with different structures running on the same hardware platform at the same time. In the above scheme of the application, the data cache can be configured through the instruction, so that the cooperation relationship between the computing units can be dynamically configured, and a great space is provided for the dynamic scheduling of various network tasks.
Therefore, the embodiment of the application solves the technical problem that the neural network model processing device in the prior art is poor in universality.
As an alternative embodiment, before the compiler sends the data to be processed, the neural network parameters and the operation instruction to the processor, the method further includes one or more of the following: when the operation instruction is used for performing a deconvolution operation on the data to be processed, the compiler converts the deconvolution operation in the operation instruction into a plurality of convolution operations; when the operation instruction is a convolution operation with nonlinear interpolation on the data to be processed, the compiler converts the nonlinear interpolation convolution in the operation instruction into a deconvolution operation and converts the resulting deconvolution operation into a plurality of convolution operations; and when the operation instruction is used for performing a fully connected operation on the data to be processed, the compiler converts the fully connected operation in the operation instruction into a convolution operation (a short sketch of this last case is given below).
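The fully connected case is the simplest of the three conversions: a fully connected layer over a flattened C×H×W input is numerically identical to a convolution whose kernel covers the whole H×W map, so the compiler can hand it to the convolution operation device unchanged. A minimal check, assuming a single input and no bias (the shapes are arbitrary and purely illustrative):

```python
import numpy as np

C, H, W, N = 4, 5, 5, 10                   # channels, height, width, output neurons
x = np.random.randn(C, H, W)               # input feature map
fc_weight = np.random.randn(N, C * H * W)  # fully connected weight matrix

fc_out = fc_weight @ x.reshape(-1)         # ordinary fully connected output

# the same weights reshaped into one HxW-covering kernel per output neuron
conv_kernels = fc_weight.reshape(N, C, H, W)
conv_out = np.array([(k * x).sum() for k in conv_kernels])  # 1x1 output map per neuron

assert np.allclose(fc_out, conv_out)
```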
The above scheme is explained below with reference to fig. 7. In this scheme, the compiler splits large convolution operations, which may include deconvolution operations, convolution operations with nonlinear interpolation, and fully connected operations. Splitting a large convolution operation into simple convolution operations greatly reduces the operation overhead.
Fig. 8a is a schematic diagram of a deconvolution operation. In this example, a deconvolution operation is performed on a 6 × 6 input feature image using a 3 × 3 convolution kernel; when the calculation is performed in the conventional manner, about 75% of the multiplications act on zero-valued input feature pixels and a large amount of computational resources is wasted. In this scheme, the deconvolution operation is converted into a plurality of forward convolutions. Fig. 8b is a schematic diagram of converting a deconvolution operation into forward convolution operations according to embodiment 2 of the present application: the 4 different positions of the 3x3 deconvolution shown in fig. 8a are converted into the 4 forward convolution operations of fig. 8b, which greatly reduces the operation overhead; at the same time, by using the hardware's ability to write the output feature map discontinuously, the computation results of the four different positions can be written directly into their respective cache positions without an additional data summarizing step.
The above scheme can also transfer the linear interpolation part of the operation into the filter, converting the linear interpolation convolution into a deconvolution by expanding the size of the filter. Fig. 8c is a schematic diagram of converting a linear interpolation convolution into convolutions according to embodiment 2 of the present application. Taking the 3 × 3 linear interpolation convolution shown in fig. 8c as an example, it can first be converted into a 5 × 5 deconvolution and then further converted into a plurality of forward convolutions, so that the forward convolution operation device performs the linear interpolation convolution operation and the processor can support linear interpolation convolution.
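The text does not spell out how the 3 × 3 filter is expanded into the 5 × 5 deconvolution filter of fig. 8c, but the interpolation part on its own is the familiar bilinear kernel that turns a strided deconvolution into an interpolation. The sketch below covers that piece only; the kernel-size formula and the code are assumptions, not the application's construction:

```python
import numpy as np

def bilinear_kernel(size):
    """Fixed filter which, used as a strided deconvolution filter, performs
    bilinear interpolation: the interpolation weights live in the filter."""
    factor = (size + 1) // 2
    center = factor - 1 if size % 2 == 1 else factor - 0.5
    taps = 1 - np.abs(np.arange(size) - center) / factor
    return np.outer(taps, taps)

k = bilinear_kernel(5)   # a 5x5 interpolation filter, the size mentioned for fig. 8c
# once expressed as a deconvolution, it can be split into forward convolutions
# exactly as in the fig. 8b case, so the same hardware path handles interpolation
```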
As an alternative embodiment, the processor obtaining the neural network parameters includes: the compiler detects the data size of the neural network parameters; if the size of the neural network parameters exceeds a preset value, the neural network parameters are split, and the processor acquires the split neural network parameters.
In the above scheme, by splitting the neural network parameters, a large convolution filter is split into a plurality of filters and a large input feature map is split into a plurality of feature maps, so that a large convolution operation that a single processor cannot support is split into a plurality of small convolution operations.
Example 4
According to an embodiment of the present invention, there is further provided a data processing apparatus of a processor for implementing the data processing method of the processor in embodiment 2, and fig. 11 is a schematic diagram of the data processing apparatus of the processor according to embodiment 4 of the present application, as shown in fig. 11, the apparatus 1100 includes:
the obtaining module 1102 is configured to obtain, by the processor, to-be-processed data, a neural network parameter, and an operation instruction, where the operation instruction includes configuration information, and the configuration information is used to determine an operation mode and configure the cache device and the convolution operation device according to the operation mode.
And the operation module 1104 is used for the processor to operate the data to be processed according to the neural network parameters by using the operation mode.
It should be noted here that the obtaining module 1102 and the operation module 1104 correspond to steps S61 to S63 in embodiment 2; the two modules are the same as the corresponding steps in terms of implementation examples and application scenarios, but are not limited to the disclosure of embodiment 1. It should be noted that the above modules, as part of the apparatus, may run in the computer terminal 10 provided in embodiment 1.
As an alternative embodiment, the operation mode includes: a plurality of convolution operation modules are used for simultaneously executing a plurality of different tasks, or a plurality of convolution operation modules are used for jointly executing the same task.
As an alternative embodiment, the processor includes an input image buffer device and a convolution operation device, wherein the convolution operation device reads the data to be processed from the buffer device in a skip mode.
As an alternative embodiment, the cache device further includes an output buffer device, wherein the convolution operation device writes the operation result into the output buffer device in a skip-write manner.
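As a purely illustrative picture of skip reading and skip writing (numpy slicing standing in for the strided address generation of the buffers; the numbers are arbitrary):

```python
import numpy as np

in_buf = np.arange(64, dtype=np.float32).reshape(8, 8)   # input image buffer
out_buf = np.zeros((8, 8), dtype=np.float32)             # output buffer

stride, phase = 2, (1, 0)
tile = in_buf[phase[0]::stride, phase[1]::stride]    # skip read: one phase sub-map
result = tile * 2.0                                  # placeholder for a convolution
out_buf[phase[0]::stride, phase[1]::stride] = result # skip write: interleave back
```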
As an alternative embodiment, the obtaining module includes: a first obtaining submodule, configured for the processor to obtain the split data to be processed and the converted operation instruction, wherein the compiler is configured to perform one or more of the following: when the operation instruction is used for performing a deconvolution operation on the data to be processed, converting the deconvolution operation in the operation instruction into a plurality of convolution operations and splitting the data to be processed into data corresponding to the convolution operations; when the operation instruction is a convolution operation with nonlinear interpolation on the data to be processed, converting the nonlinear interpolation convolution in the operation instruction into a deconvolution operation, converting the resulting deconvolution operation into a plurality of convolution operations, and splitting the data to be processed into data corresponding to the convolution operations; and when the operation instruction is used for performing a fully connected operation on the data to be processed, converting the fully connected operation in the operation instruction into a convolution operation and splitting the data to be processed into data corresponding to the convolution operation.
As an alternative embodiment, the obtaining module includes: a second obtaining submodule, configured for the processor to obtain the split neural network parameters, wherein the compiler detects the data size of the neural network parameters and splits the neural network parameters if the size exceeds a second preset value.
As an optional embodiment, the operation instruction further includes quantization information, and the apparatus further includes: a compression module, configured for the neural network processor to compress the data to be processed and the neural network parameters according to the quantization information.
Example 5
According to an embodiment of the present invention, there is further provided a data processing apparatus of a processor for implementing the data processing method of the processor in embodiment 3, and fig. 12 is a schematic diagram of the data processing apparatus of the processor according to embodiment 5 of the present application, and as shown in fig. 12, the apparatus 1200 includes:
the receiving module 1202 is configured to receive data to be processed, neural network parameters, and an operation instruction by a compiler.
A sending module 1204, configured to send, by the compiler, the acquired to-be-processed data, the neural network parameter, and the operation instruction to the processor, where the operation instruction includes configuration information, the configuration information is used to determine an operation mode, and configure the cache device and the convolution operation device according to the operation mode, and the processor uses the operation mode to perform operation on the to-be-processed data according to the neural network parameter.
It should be noted here that the receiving module 1202 and the sending module 1204 correspond to steps S101 to S103 in embodiment 3; the two modules are the same as the corresponding steps in terms of implementation examples and application scenarios, but are not limited to the disclosure of embodiment 1. It should be noted that the above modules, as part of the apparatus, may run in the computer terminal 10 provided in embodiment 1.
As an alternative embodiment, the apparatus further comprises one or more of the following:
the first conversion module is used for converting the deconvolution operation in the operation instruction into a plurality of convolution operations under the condition that the operation instruction is used for performing deconvolution operation on the data to be processed before the compiler sends the acquired data to be processed, the neural network parameters and the operation instruction to the processor;
the second conversion module is used for converting the convolution operation of the nonlinear interpolation in the operation instruction into the deconvolution operation by the compiler under the condition that the operation instruction is the convolution operation for carrying out the nonlinear interpolation on the data to be processed, and converting the deconvolution operation obtained by conversion into a plurality of convolution operations;
and the third conversion module is used for converting the full-connection operation in the operation instruction into the convolution operation by the compiler under the condition that the operation instruction is used for performing the full-connection operation on the data to be processed.
As an alternative embodiment, the receiving module comprises: the detection module is used for detecting the data size of the neural network parameters by the compiler; and the splitting module is used for splitting the neural network parameters if the size of the neural network parameters exceeds a preset value, wherein the processor acquires the split neural network parameters.
Example 6
An embodiment of the present invention may provide an image pickup apparatus including the processor described in embodiment 1.
Conventional FPGA- and ASIC-based convolutional neural network acceleration schemes are mainly aimed at embedded scenarios such as smart cameras for one specific application, so most of them remain at the level of fully customizing one specific network structure. As a result, on the one hand these schemes do not support some emerging operators used in actual services, and on the other hand they cannot compute several convolutional neural networks with different structures simultaneously within the same scheme, which ultimately limits their generality. The image pickup device in the embodiment of the present application, which includes the processor of embodiment 1, has better generality and is particularly advantageous in scenarios where a cloud server needs to compute multiple different convolutional neural networks at the same time to handle data from different sensors.
Example 7
The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute the program code of the following steps of the data processing method of the processor: the processor acquires data to be processed, neural network parameters and an operation instruction, where the operation instruction includes configuration information, and the configuration information is used to determine an operation mode and to configure a cache device and a convolution operation device according to the operation mode; the processor operates on the data to be processed according to the neural network parameters using the operation mode.
Alternatively, fig. 13 is a block diagram of a computer terminal according to embodiment 7 of the present invention. As shown in fig. 13, the computer terminal A may include: one or more processors 1302 (only one of which is shown), a memory 1304, and a peripheral device 1306.
The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the data processing method and apparatus of the processor in the embodiments of the present invention; the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, implements the above data processing method of the processor. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located from the processor, and these remote memories may be connected to terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: the method comprises the steps that a processor obtains data to be processed, neural network parameters and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and a cache device and a convolution operation device are configured according to the operation mode; the processor uses an operation mode to operate the data to be processed according to the neural network parameters.
Optionally, the processor may further execute the program code of the following steps: the operation mode includes: a plurality of convolution operation modules are used for simultaneously executing a plurality of different tasks, or a plurality of convolution operation modules are used for jointly executing the same task.
Optionally, the processor may further execute the program code of the following steps: the processor comprises an input image buffer device and a convolution operation device, wherein the convolution operation device reads data to be processed from the buffer device in a skip mode.
Optionally, the processor may further execute the program code of the following steps: the cache device further comprises an output cache device, wherein the convolution operation device writes operation results into the output cache device in a skip-write manner.
Optionally, the processor may further execute the program code of the following steps: the processor acquires the split data to be processed and the converted operation instruction, wherein the compiler is configured to perform one or more of the following: when the operation instruction is used for performing a deconvolution operation on the data to be processed, converting the deconvolution operation in the operation instruction into a plurality of convolution operations and splitting the data to be processed into data corresponding to the convolution operations; when the operation instruction is a convolution operation with nonlinear interpolation on the data to be processed, converting the nonlinear interpolation convolution in the operation instruction into a deconvolution operation, converting the resulting deconvolution operation into a plurality of convolution operations, and splitting the data to be processed into data corresponding to the convolution operations; and when the operation instruction is used for performing a fully connected operation on the data to be processed, converting the fully connected operation in the operation instruction into a convolution operation and splitting the data to be processed into data corresponding to the convolution operation.
Optionally, the processor may further execute the program code of the following steps: and the processor acquires the split neural network parameters, wherein the compiler detects the data size of the neural network parameters, and if the size of the neural network parameters exceeds a second preset value, the neural network parameters are split.
Optionally, the processor may further execute the program code of the following steps: the operation instruction further includes quantization information, and the method further includes: the neural network processor compresses the data to be processed and the neural network parameters according to the quantization information.
The embodiment of the invention provides a data processing method of a processor. It should be noted that the conventional dedicated convolutional neural network accelerator does not support a plurality of convolutional neural networks with different structures running on the same hardware platform at the same time. In the above scheme of the application, the data cache can be configured through the instruction, so that the cooperation relationship between the computing units can be dynamically configured, and a great space is provided for the dynamic scheduling of various network tasks. Therefore, the embodiment of the application solves the technical problem that the neural network model processing device in the prior art is poor in universality.
It can be understood by those skilled in the art that the structure shown in fig. 13 is only an illustration, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 13 does not limit the structure of the above electronic device. For example, the computer terminal A may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 13, or have a different configuration from that shown in fig. 13.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 8
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store a program code executed by the data processing method of the processor provided in the first embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the method comprises the steps that a processor obtains data to be processed, neural network parameters and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and a cache device and a convolution operation device are configured according to the operation mode; the processor uses an operation mode to operate the data to be processed according to the neural network parameters.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (24)

1. A processor, comprising:
the device comprises a cache device and a convolution operation device, wherein the cache device is used for acquiring data to be processed, neural network parameters and an operation instruction, the operation instruction comprises configuration information, and the configuration information is used for determining an operation mode and configuring the cache device and the convolution operation device according to the operation mode;
and the convolution operation device is in communication connection with the cache device and is used for operating the data to be processed according to the neural network parameters by using the operation mode.
2. The processor of claim 1, wherein the operational mode comprises: a plurality of convolution operation modules are used for simultaneously executing a plurality of different tasks, or a plurality of convolution operation modules are used for jointly executing the same task.
3. The processor of claim 1, wherein the processor further comprises:
the memory access device is used for acquiring the data to be processed and the operation instruction from a compiler, wherein the compiler is communicated with the processor;
the synchronous dynamic random access memory is used for storing the neural network parameters;
the cache device acquires the data to be processed and the operation instruction from the memory access device through the bus, and acquires the neural network parameters from the synchronous dynamic random access memory through the bus.
4. The processor of claim 1, wherein the caching apparatus comprises:
the filter buffer device is used for storing the neural network parameters;
and the input image buffer device is used for storing the data to be processed.
5. The processor according to claim 4, wherein the convolution operation means reads the data to be processed in a skip manner from the input image buffer means.
6. The processor of claim 1, wherein the convolution operation means comprises:
and the vector multiplication array comprises a plurality of vector multipliers, wherein each vector multiplier carries out operation according to the input characteristics corresponding to the received data to be processed and the neural network parameters, and outputs an operation result.
7. The processor according to claim 6, wherein the vector multiplier is further configured to determine whether the received data to be processed and the neural network parameter are valid, and operate the input features and the neural network weights corresponding to the received data to be processed and output an operation result when both the received data to be processed and the neural network parameter are valid.
8. The processor according to claim 6, wherein a vector multiplier closest to the caching device reads in the neural network parameters and the input features corresponding to the data to be processed from the caching device.
9. The processor of claim 6, wherein the vector multiplier passes the current input feature data to the vector multiplier on its right and the neural network parameters to the vector multiplier below it.
10. The processor of claim 1, wherein the caching apparatus further comprises:
and the output buffer device is used for buffering the operation result output by the convolution operation device, wherein the convolution operation device writes the operation result into the output buffer device in a skip writing mode.
11. The processor of claim 1, wherein the processor is an FPGA or an ASIC.
12. A data processing method of a processor, comprising:
the method comprises the steps that a processor obtains data to be processed, neural network parameters and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and a cache device and a convolution operation device are configured according to the operation mode;
and the processor uses the operation mode to operate the data to be processed according to the neural network parameters.
13. The method of claim 12, wherein the operational mode comprises: a plurality of convolution operation modules are used for simultaneously executing a plurality of different tasks, or a plurality of convolution operation modules are used for jointly executing the same task.
14. The method of claim 12, wherein the processor comprises an input image buffer and a convolution operation, wherein the convolution operation reads the data to be processed from the buffer in a skip mode.
15. The method of claim 12, wherein the buffer device further comprises an output buffer device, wherein the convolution operation device writes operation results to the output buffer device in a skip-write manner.
16. The method of claim 12, wherein the processor fetching the data to be processed and the operation instruction comprises:
the processor obtains the split data to be processed and the converted operation instruction, wherein the compiler is used for executing one or more of the following items:
under the condition that the operation instruction is used for carrying out deconvolution operation on the data to be processed, converting the deconvolution operation in the operation instruction into a plurality of convolution operations, and splitting the data to be processed into data corresponding to the convolution operations;
under the condition that the operation instruction is convolution operation for carrying out nonlinear interpolation on the data to be processed, the compiler converts the convolution operation of the nonlinear interpolation in the operation instruction into deconvolution operation, converts the converted deconvolution operation into a plurality of convolution operations, and splits the data to be processed into data corresponding to the convolution operation;
and under the condition that the operation instruction is used for carrying out full connection operation on the data to be processed, the compiler converts the full connection operation in the operation instruction into convolution operation and splits the data to be processed into data corresponding to the convolution operation.
17. The method of claim 12, wherein the processor obtaining neural network parameters comprises:
the processor obtains the split neural network parameters, wherein the compiler detects the data size of the neural network parameters, and if the size of the neural network parameters exceeds a second preset value, the neural network parameters are split.
18. The method of claim 12, wherein the operation instruction further includes quantization information therein, the method further comprising: and the processor compresses the data to be processed and the neural network parameters according to the quantization information.
19. A data processing method of a processor, comprising:
the compiler receives data to be processed, neural network parameters and an operation instruction;
the compiler sends and acquires data to be processed, neural network parameters and an operation instruction to a processor, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode and configuring a cache device and a convolution operation device according to the operation mode, and the processor uses the operation mode to operate the data to be processed according to the neural network parameters.
20. The method of claim 19, wherein before the compiler sends the acquired data to be processed, the neural network parameters and the operation instruction to the processor, the method further comprises one or more of:
under the condition that the operation instruction is used for carrying out deconvolution operation on the data to be processed, the compiler converts the deconvolution operation in the operation instruction into a plurality of convolution operations;
under the condition that the operation instruction is convolution operation for carrying out nonlinear interpolation on the data to be processed, the compiler converts the convolution operation of the nonlinear interpolation in the operation instruction into deconvolution operation and converts the deconvolution operation obtained by conversion into a plurality of convolution operations;
and under the condition that the operation instruction is used for carrying out full connection operation on the data to be processed, the compiler converts the full connection operation in the operation instruction into convolution operation.
21. The method of claim 19, wherein the processor obtaining neural network parameters comprises:
detecting the data size of the neural network parameter by a compiler;
and if the size of the neural network parameter exceeds a preset value, splitting the neural network parameter, wherein the processor acquires the split neural network parameter.
22. A storage medium comprising a stored program, wherein the program, when executed, controls an apparatus on which the storage medium is located to perform the steps of:
the method comprises the steps that a processor obtains data to be processed, neural network parameters and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and a cache device and a convolution operation device are configured according to the operation mode;
and the processor uses the operation mode to operate the data to be processed according to the neural network parameters.
23. A processor for running a program, wherein the program when run performs the steps of:
the method comprises the steps that a processor obtains data to be processed, neural network parameters and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and a cache device and a convolution operation device are configured according to the operation mode;
and the processor uses the operation mode to operate the data to be processed according to the neural network parameters.
24. An image capture device comprising the processor of claim 1.
CN201910105542.8A 2019-02-01 2019-02-01 Processor, data processing method thereof and image pickup device Active CN111523652B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910105542.8A CN111523652B (en) 2019-02-01 2019-02-01 Processor, data processing method thereof and image pickup device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910105542.8A CN111523652B (en) 2019-02-01 2019-02-01 Processor, data processing method thereof and image pickup device

Publications (2)

Publication Number Publication Date
CN111523652A true CN111523652A (en) 2020-08-11
CN111523652B CN111523652B (en) 2023-05-02

Family

ID=71910188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910105542.8A Active CN111523652B (en) 2019-02-01 2019-02-01 Processor, data processing method thereof and image pickup device

Country Status (1)

Country Link
CN (1) CN111523652B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112152947A (en) * 2020-08-21 2020-12-29 北京百度网讯科技有限公司 Processor, implementation method, electronic device and storage medium
CN113011585A (en) * 2021-03-19 2021-06-22 上海西井信息科技有限公司 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator
CN113094118A (en) * 2021-04-26 2021-07-09 深圳思谋信息科技有限公司 Data processing system, method, apparatus, computer device and storage medium
CN113570034A (en) * 2021-06-18 2021-10-29 北京百度网讯科技有限公司 Processing device, neural network processing method and device
CN116152520A (en) * 2023-04-23 2023-05-23 深圳市九天睿芯科技有限公司 Data processing method for neural network accelerator, chip and electronic equipment

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000267682A (en) * 1999-03-19 2000-09-29 Victor Co Of Japan Ltd Convolutional arithmetic unit
JP2006245865A (en) * 2005-03-02 2006-09-14 Yamaha Corp Convolution arithmetic operation method and convolution arithmetic operation processing program
CN106815566A (en) * 2016-12-29 2017-06-09 天津中科智能识别产业技术研究院有限公司 A kind of face retrieval method based on multitask convolutional neural networks
CN107679621A (en) * 2017-04-19 2018-02-09 北京深鉴科技有限公司 Artificial neural network processing unit
CN108241890A (en) * 2018-01-29 2018-07-03 清华大学 A kind of restructural neural network accelerated method and framework
WO2018120989A1 (en) * 2016-12-29 2018-07-05 华为技术有限公司 Convolution operation chip and communication device
CN108388541A (en) * 2016-04-22 2018-08-10 北京中科寒武纪科技有限公司 Convolution algorithm device and method
CN109284822A (en) * 2017-07-20 2019-01-29 上海寒武纪信息科技有限公司 A kind of neural network computing device and method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000267682A (en) * 1999-03-19 2000-09-29 Victor Co Of Japan Ltd Convolutional arithmetic unit
JP2006245865A (en) * 2005-03-02 2006-09-14 Yamaha Corp Convolution arithmetic operation method and convolution arithmetic operation processing program
CN108388541A (en) * 2016-04-22 2018-08-10 北京中科寒武纪科技有限公司 Convolution algorithm device and method
CN106815566A (en) * 2016-12-29 2017-06-09 天津中科智能识别产业技术研究院有限公司 A kind of face retrieval method based on multitask convolutional neural networks
WO2018120989A1 (en) * 2016-12-29 2018-07-05 华为技术有限公司 Convolution operation chip and communication device
CN107679621A (en) * 2017-04-19 2018-02-09 北京深鉴科技有限公司 Artificial neural network processing unit
CN109284822A (en) * 2017-07-20 2019-01-29 上海寒武纪信息科技有限公司 A kind of neural network computing device and method
CN108241890A (en) * 2018-01-29 2018-07-03 清华大学 A kind of restructural neural network accelerated method and framework

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张榜; 来金梅: "Design and Implementation of a Convolutional Neural Network Accelerator Based on FPGA" *
杨一晨; 梁峰; 张国和; 何平; 吴斌; 高震霆: "A Convolutional Neural Network Coprocessor Design Based on Programmable Logic Devices" *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112152947A (en) * 2020-08-21 2020-12-29 北京百度网讯科技有限公司 Processor, implementation method, electronic device and storage medium
US20230179546A1 (en) 2020-08-21 2023-06-08 Beijing Baidu Netcom Science And Technology Co., Ltd. Processor and implementation method, electronic device, and storage medium
US11784946B2 (en) 2020-08-21 2023-10-10 Beijing Baidu Netcom Science And Technology Co., Ltd. Method for improving data flow and access for a neural network processor
CN113011585A (en) * 2021-03-19 2021-06-22 上海西井信息科技有限公司 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator
CN113011585B (en) * 2021-03-19 2023-09-26 上海西井科技股份有限公司 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator
CN113094118A (en) * 2021-04-26 2021-07-09 深圳思谋信息科技有限公司 Data processing system, method, apparatus, computer device and storage medium
CN113570034A (en) * 2021-06-18 2021-10-29 北京百度网讯科技有限公司 Processing device, neural network processing method and device
CN113570034B (en) * 2021-06-18 2022-09-27 北京百度网讯科技有限公司 Processing device, neural network processing method and device
CN116152520A (en) * 2023-04-23 2023-05-23 深圳市九天睿芯科技有限公司 Data processing method for neural network accelerator, chip and electronic equipment
CN116152520B (en) * 2023-04-23 2023-07-07 深圳市九天睿芯科技有限公司 Data processing method for neural network accelerator, chip and electronic equipment

Also Published As

Publication number Publication date
CN111523652B (en) 2023-05-02

Similar Documents

Publication Publication Date Title
CN111523652A (en) Processor, data processing method thereof and camera device
US11429852B2 (en) Convolution acceleration and computing processing method and apparatus, electronic device, and storage medium
CN111199273B (en) Convolution calculation method, device, equipment and storage medium
CN107679621B (en) Artificial neural network processing device
CN107704922B (en) Artificial neural network processing device
CN110263909B (en) Image recognition method and device
CN110780921A (en) Data processing method and device, storage medium and electronic device
CN109359556B (en) Face detection method and system based on low-power-consumption embedded platform
KR102545658B1 (en) Apparatus and Method for Convolutional Neural Network Quantization Inference
US11972348B2 (en) Texture unit circuit in neural network processor
CN110991619A (en) Neural network processor, chip and electronic equipment
CN111047036A (en) Neural network processor, chip and electronic equipment
CN107957977B (en) Calculation method and related product
CN112905530A (en) On-chip architecture, pooled computational accelerator array, unit and control method
CN109324984B (en) Method and apparatus for using circular addressing in convolution operations
CN109711540B (en) Computing device and board card
CN108108189B (en) Calculation method and related product
US10127040B2 (en) Processor and method for executing memory access and computing instructions for host matrix operations
CN111047035A (en) Neural network processor, chip and electronic equipment
CN104901651A (en) Realizing circuit and method of digital filter
CN114581952A (en) Pedestrian re-identification method, system, device, equipment and computer medium
CN113128673B (en) Data processing method, storage medium, neural network processor and electronic device
CN115081603A (en) Computing device, integrated circuit device and board card for executing Winograd convolution
CN115081600A (en) Conversion unit for executing Winograd convolution, integrated circuit device and board card
EP4201054A1 (en) Lookup table processing and programming for camera image signal processing

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant