CN111523652B - Processor, data processing method thereof and image pickup device - Google Patents

Processor, data processing method thereof and image pickup device

Info

Publication number
CN111523652B
CN111523652B (Application CN201910105542.8A)
Authority
CN
China
Prior art keywords
data
processor
neural network
processed
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910105542.8A
Other languages
Chinese (zh)
Other versions
CN111523652A (en)
Inventor
林伟
张健松
夏立雪
刁岚松
叶长安
窦顺利
孙猛
蒋昭
赵永科
梁昊
陈凯
丁力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910105542.8A priority Critical patent/CN111523652B/en
Publication of CN111523652A publication Critical patent/CN111523652A/en
Application granted granted Critical
Publication of CN111523652B publication Critical patent/CN111523652B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Processing (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a processor, a data processing method thereof, and an image pickup device. The processor includes: a buffer device, configured to acquire data to be processed, neural network parameters, and an operation instruction, where the operation instruction includes configuration information, the configuration information is used to determine an operation mode, and the buffer device and a convolution operation device are configured according to the operation mode; and the convolution operation device, communicatively connected to the buffer device and configured to operate on the data to be processed according to the neural network parameters using the operation mode. The invention solves the technical problem of poor versatility of neural network model processing devices in the prior art.

Description

Processor, data processing method thereof and image pickup device
Technical Field
The invention relates to the field of processors, in particular to a processor, a data processing method thereof and an image pickup device.
Background
At present, FPGA- and ASIC-based convolutional neural network acceleration schemes, hardware structure designs, and performance optimization methods are mainly oriented to embedded scenarios such as smart cameras and target a specific application, so most of them remain at the level of fully customized design for a specific network structure. As a result, these schemes neither support some emerging operators used in real business workloads nor allow multiple convolutional neural networks of different structures to be computed simultaneously within the same scheme, which limits their versatility.
On the premise of ensuring versatility, how to use given hardware resources to improve computing performance becomes the key issue. At one extreme, if a dedicated hardware calculation module and control flow are reserved for every network connection relation and operator, a large number of modules sit idle during actual operation and hardware resources are wasted; at the other extreme, if all operators are broken down into the most basic operations, the design degenerates into a CPU or GPU. Therefore, supporting all operations with a small number of high-level operators, and designing the corresponding hardware calculation modules, cache modules, and control modules to complete these operations, is currently difficult to achieve.
For the problem of poor versatility of neural network model processing devices in the prior art, no effective solution has been proposed so far.
Disclosure of Invention
The embodiment of the invention provides a processor, a data processing method thereof and an image pickup device, which are used for at least solving the technical problem of poor universality of a neural network model processing device in the prior art.
According to an aspect of an embodiment of the present invention, there is provided a processor including: a buffer device, configured to acquire data to be processed, neural network parameters, and an operation instruction, where the operation instruction includes configuration information, the configuration information is used to determine an operation mode, and the buffer device and a convolution operation device are configured according to the operation mode; and the convolution operation device, communicatively connected to the buffer device and configured to operate on the data to be processed according to the neural network parameters using the operation mode. This solves the technical problem of poor versatility of neural network model processing devices in the prior art.
According to another aspect of the embodiment of the present invention, there is also provided a data processing method of a processor, including: the method comprises the steps that a processor acquires data to be processed, a neural network parameter and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and a caching device and a convolution operation device are configured according to the operation mode; and the processor uses an operation mode to operate the data to be processed according to the neural network parameters.
According to another aspect of the embodiment of the present invention, there is also provided a data processing method of a processor, including: a compiler receives data to be processed, neural network parameters, and an operation instruction; the compiler sends the data to be processed, the neural network parameters, and the operation instruction to the processor, where the operation instruction includes configuration information, the configuration information is used to determine an operation mode, the caching device and the convolution operation device are configured according to the operation mode, and the processor uses the operation mode to operate on the data to be processed according to the neural network parameters.
According to another aspect of the embodiment of the present invention, there is also provided an image capturing apparatus including the above processor.
According to another aspect of the embodiment of the present invention, there is also provided a storage medium including a stored program, wherein the program controls a device in which the storage medium is located to execute the following steps when running: the method comprises the steps that a processor acquires data to be processed, a neural network parameter and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and a caching device and a convolution operation device are configured according to the operation mode; and the processor uses an operation mode to operate the data to be processed according to the neural network parameters.
According to another aspect of the embodiment of the present invention, there is also provided a processor for running a program, wherein the program executes the following steps: the method comprises the steps that a processor acquires data to be processed, a neural network parameter and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and a caching device and a convolution operation device are configured according to the operation mode; and the processor uses an operation mode to operate the data to be processed according to the neural network parameters.
Existing dedicated convolutional neural network accelerators do not support running multiple convolutional neural networks of different structures simultaneously on the same hardware platform. In the above scheme, the data cache can be configured through the instruction, so the cooperation relationship between the computing units can be dynamically configured, which leaves a large space for dynamic scheduling of various network tasks and thereby solves the technical problem of poor versatility of neural network model processing devices in the prior art.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a schematic diagram of a processor according to embodiment 1 of the present application;
FIG. 2 is a schematic diagram of an alternative processor according to embodiment 1 of the present application;
FIG. 3 is a schematic diagram of a convolution operation unit according to embodiment 1 of the present application;
FIG. 4a is a schematic diagram of a skip write according to embodiment 1 of the present application;
FIG. 4b is a schematic diagram of a combined skip-read and skip-write application in accordance with embodiment 1 of the present application;
FIG. 5 shows a block diagram of the hardware architecture of a computer terminal (or mobile device) for implementing a data processing method of a processor;
FIG. 6 is a flowchart of a data processing method of a processor according to embodiment 2 of the present invention;
FIG. 7 is a schematic diagram of a compiler and neural network processor processing tasks according to embodiment 2 of the present application;
FIG. 8a is a schematic diagram of a deconvolution operation;
FIG. 8b is a schematic diagram of converting a deconvolution operation into forward convolution operations in accordance with embodiment 2 of the present application;
FIG. 8c is a schematic diagram of a method of converting a linear interpolation convolution to a convolution according to embodiment 2 of the present application;
fig. 9 is a schematic diagram of a low bit quantization according to embodiment 2 of the present application;
FIG. 10 is a flowchart of a data processing method of a processor according to embodiment 3 of the present invention;
FIG. 11 is a schematic diagram of a data processing apparatus of a processor according to embodiment 4 of the present application;
FIG. 12 is a schematic diagram of a data processing apparatus of a processor according to embodiment 5 of the present application; and
fig. 13 is a block diagram of a computer terminal according to embodiment 7 of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings. It is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the terms or terminology appearing in the description of the embodiments of the present application are explained as follows:
FPGA: Field-Programmable Gate Array, programmable hardware whose functions can be modified by programming. It is widely used in the field of hardware acceleration and can also be used for functional verification before a customized chip is designed.
ASIC: Application Specific Integrated Circuit, a chip customized for a specific application; so-called AI chips, bitcoin mining machines, and the like are ASICs.
Input feature map (IFM, input feature map): the input data of each layer of neural network operation, for example, the input feature map of the first layer of neural network operation is the input picture to be identified. The input feature map can be a 3-dimensional matrix, and the three dimensions are width (W), height (H) and channel (channel) numbers respectively.
Output feature map (OFM, output Feature Map): the calculation result data of each layer of neural network can also be a 3-dimensional matrix, and the three dimensions are respectively width (W), height (H) and channel (channel) number.
Filter (Filter): i.e., the neural network weights or convolution kernels, parameters required for a convolution operation in the convolution neural network. Each filter may be a 3-dimensional matrix with three dimensions, width (W), height (H) and channel (channel) numbers, respectively. The number of channels of the filter needs to be consistent with the number of channels of the input feature map, and the same layer convolution contains multiple filters. The nth filter is convolved with the data at a certain position (w, h) of the input feature map to generate pixels of the nth channel of the output feature map at the corresponding position (w, h), so that the number of filters is equal to the number of channels of the output feature map.
Deconv: deconvolution, the reverse operation of the convolution operation, can be regarded as adding 0 values between pixels in the convolved input feature map in execution, and when the convolution step (stride) of the forward convolution is s, (s-1) 0 values are added between every two pixels of the input feature map, so that the feature map of HxW size is expanded into a feature map of (sxH-s+1) x (sxW-s+1) size.
Linear interpolation convolution: similarly to deconvolution, (s-1) linear interpolation samples are inserted as the inter-pixel spacing in every 2 pixels of the input feature map, thereby expanding the HxW-sized feature map to a (sxH-s+1) x (sxW-s+1) -sized feature map.
Dilated Conv: refers to adding (s-1) 0 values as intervals between every 2 pixels of the convolution filter, thereby expanding a WxH convolution kernel into a (s x W-s+1) x (s x H-s+1) convolution kernel.
Concat: the feature map connection generally refers to stitching a plurality of feature maps having the same width and height but not necessarily the same number of channels, in the dimension of the number of channels, for example, a feature map having a size w x h x ch1 and a feature map having a size w x h x ch2 may be stitched to a feature map having a size w x h x (ch1+ch2).
Example 1
According to an embodiment of the present invention, there is further provided an embodiment of a processor, and fig. 1 is a schematic structural diagram of a processor according to embodiment 1 of the present application, and in combination with fig. 1, the processor includes:
the caching device 10 acquires data to be processed, parameters of a neural network and operation instructions, wherein the operation instructions comprise configuration information, the configuration information is used for determining an operation mode, and the caching device and the convolution operation device are configured according to the operation mode.
Specifically, the data to be processed is data to be subjected to neural network operation, and may be image information or the like. For example, if the processor is used for performing image recognition, the data to be processed is the input feature corresponding to the image information to be recognized, and if the processor is applied to the monitoring field, the data to be processed can be the input feature corresponding to the image information collected by the camera.
The neural network parameters can be the neural network parameters of a preset neural network model, namely filter data, and the operation instruction can be sent by a user or triggered by input data to be processed after the configuration information is set by the user.
The configuration information in the operation instruction at least comprises an operation mode. The operation mode can be set by a user or can be determined by a processor according to actual tasks. In an alternative embodiment, the configuration information may indicate the address where the buffer device reads data, the size of the buffer device, and the address where the buffer device writes data, thereby enabling the buffer device to adapt to various operation modes.
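As a minimal sketch only, the configuration information carried by an operation instruction could be modeled as follows; the field names and encoding are assumptions for illustration, not the patent's actual instruction format.

```python
from dataclasses import dataclass

@dataclass
class OperationConfig:
    # Hypothetical fields; the real instruction encoding is not specified here.
    mode: str               # "joint" (modules share one task) or "independent"
    ifm_read_addr: int      # address the input-feature-map buffer reads from
    ifm_read_step: int      # >1 enables skip-read (non-contiguous read)
    ofm_write_addr: int     # address the output-feature-map buffer writes to
    ofm_write_step: int     # >1 enables skip-write (non-contiguous write)
    buffer_size: int        # size allocated to the buffer device

cfg = OperationConfig("joint", 0x0000, 1, 0x4000, 2, 64 * 1024)
```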
In an alternative embodiment, the processor is an FPGA module or an ASIC module.
The convolution operation device 20 is communicatively connected to the buffer unit, and is configured to perform an operation on the data to be processed according to the neural network parameters using the operation mode indicated by the operation instruction.
Specifically, the convolution operation device is connected with the buffer device, acquires data to be processed and neural network parameters from the buffer device, and processes the data to be processed according to configuration information in the operation instruction after receiving the operation instruction.
In an alternative embodiment, the convolution operation device supports operations such as deconvolution, dilated Conv, and linear interpolation convolution, so that more operations can be supported. The convolution operation device can be configured according to the configuration information in the operation instruction, thereby supporting multiple operation modes.
It should be noted that existing dedicated convolutional neural network accelerators do not support running multiple convolutional neural networks of different structures on the same hardware platform at the same time. In the above scheme, the data cache can be configured through the instruction, so the cooperation relationship between the computing units can be dynamically configured, leaving a large space for dynamic scheduling of various network tasks. The convolution calculation modules can cooperate on the same convolutional neural network task or be grouped to carry out different convolutional neural network tasks, and the grouping mode can be dynamically adjusted according to the instruction. This provides a performance optimization space for task scheduling.
Therefore, the embodiment of the application solves the technical problem that the neural network model processing device in the prior art is poor in universality.
As an alternative embodiment, the operation modes include: multiple convolution operation modules are used for simultaneously executing multiple different tasks or multiple convolution operation modules are used for jointly executing the same task.
The above scheme provides two operation modes, which can be set by a user or can be determined by task type.
In an alternative embodiment, the tasks received by the processor have priorities. If the priorities of the tasks received by the processor are the same, the operation mode is determined to be multiple convolution operation modules simultaneously executing multiple different tasks, so that the tasks received by the processor can be executed in parallel; if the tasks have different priorities, the mode in which multiple convolution operation modules jointly execute the same task can be selected, the task with the highest priority is executed first, and the remaining tasks are then executed in the mode in which multiple convolution operation modules simultaneously execute different tasks.
In a further alternative embodiment, the operation mode can also be determined according to the amount of data in the task (the amount of data to be processed and/or of the neural network parameters). When the data amount of the task is larger than a first preset value, the operation mode can be that N (N > 1) convolution operation modules jointly execute the same task, where N can be determined according to the data amount of the task; when the data amount of the task is smaller than the preset value, the operation mode can be that multiple convolution operation modules simultaneously execute multiple different tasks.
In yet another alternative embodiment, the operation mode may also be determined based on the current utilization of the processor. When the utilization rate of the processor is larger than a preset value, the tasks can be processed serially in the mode in which multiple convolution operation modules jointly execute the same task.
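A software sketch of this selection logic might look as follows. It is only an illustration under the assumptions above; the thresholds, field names, and mode labels are hypothetical, and the actual decision is made by the compiler or driver, not necessarily in this form.

```python
def choose_mode(tasks, utilization, data_threshold=1 << 20, util_threshold=0.8):
    """Pick an operation mode for the convolution modules.
    tasks: list of dicts with hypothetical 'priority' and 'data_size' fields."""
    if utilization > util_threshold:
        return "joint"                    # modules cooperate; tasks run serially
    if len({t["priority"] for t in tasks}) > 1:
        return "joint_then_independent"   # run the highest-priority task first
    if max(t["data_size"] for t in tasks) > data_threshold:
        return "joint"                    # large task: N modules share it
    return "independent"                  # modules execute different tasks at once

tasks = [{"priority": 1, "data_size": 2 << 20}, {"priority": 1, "data_size": 4 << 10}]
print(choose_mode(tasks, utilization=0.3))   # -> "joint" (one task is large)
```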
In an alternative embodiment, the processor further includes: a direct memory access device, used to acquire the data to be processed and the operation instruction from a compiler, where the compiler communicates with the neural network model processing device; a synchronous dynamic random access memory, used to store the neural network parameters; and a bus, through which the buffer device acquires the data to be processed and the operation instruction from the direct memory access device and acquires the neural network parameters from the synchronous dynamic random access memory.
Specifically, the memory access device is a DMA (Direct Memory Access) device; a DMA transfer copies data from one address space to another, allowing hardware devices of different speeds to communicate without imposing a large interrupt load on the CPU. The synchronous dynamic random access memory may be DDR memory (Double Data Rate SDRAM). The BUS is the common information rail over which the units in the processor transfer information. In the above embodiment, the convolution operation device obtains the data to be processed, the neural network parameters, and the operation instruction from the buffer device, and the buffer device obtains them from the DMA or the DDR through the BUS.
Fig. 2 is a schematic diagram of an alternative processor according to embodiment 1 of the present application, and in conjunction with fig. 2, the DMA module obtains data to be processed and an operation instruction including configuration information from the CPU, where the configuration information is stored in a local register (not shown), and other information such as parameters of the neural network is stored in off-chip storage (to external memory), for example, DDR memory, by the memory control module (Memory Controller) through BUS.
As an alternative embodiment, the buffer includes: the filter buffer is used for storing the neural network parameters; and the input image buffer is used for storing the data to be processed.
Specifically, the Filter Buffer (Filter Buffer) and the input image Buffer (IFM Buffer) are respectively in communication with the bus, so as to obtain the neural network parameters and the data to be processed through the bus.
In an alternative embodiment, as shown in connection with fig. 2, the processor reads the data to be processed, i.e., the input feature map, and the neural network weights (i.e., the filters) from the DDR memory into the corresponding buffers via the memory control module and the bus according to the instructions.
As an alternative embodiment, the convolution operator reads the data to be processed from the input image buffer in a skip-read manner.
In the above scheme, the convolution operation device can read the data to be processed in a continuous or discontinuous manner to perform the operation. Reading the data to be processed continuously means that the convolution operation device reads each element of the data to be processed in order; reading the data to be processed discontinuously means that the convolution operation device does not read the elements in their stored order, i.e., skip reading.
Specifically, the skip mode is used for indicating that when the convolution operation device reads the data to be processed from the input image buffer, the data to be processed in the input image buffer is read according to a preset interval step number.
The convolution operation device in the embodiment of the application reads the data to be processed from the input image buffer in a skip-reading mode, so data position shuffling (shuffle) is realized by directly configuring the buffer read/write positions. Adding 0-value intervals into the filter is equivalent to sampling the input feature map at intervals and then performing the convolution operation, which achieves support for the dilated Conv operator. This scheme thus converts the dilated Conv operator into an ordinary convolution that uses the reduced convolution filter and reads the data to be processed discontinuously, so the dilated Conv operator can be realized with the general convolution module. In this scheme, multiple subsequent modules with different data storage order requirements can also read the same data through discontinuous reading, which reduces the hardware resource overhead introduced by data shuffling.
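A minimal sketch of the skip-read idea, treating the buffer as a flat array (the addressing granularity and the step value are assumptions for illustration, not the actual hardware interface):

```python
import numpy as np

def skip_read(buffer, start, count, step):
    """Read `count` elements from `buffer` beginning at `start`,
    taking one element every `step` positions (step=1 is a contiguous read)."""
    return buffer[start:start + count * step:step]

ifm_buffer = np.arange(32)
print(skip_read(ifm_buffer, 0, 8, 1))   # contiguous read: 0..7
print(skip_read(ifm_buffer, 0, 8, 2))   # skip-read: 0, 2, 4, ..., 14
```

Reading the input feature map at an interval of s in this way is what allows the dilated convolution to be reduced to an ordinary convolution with the reduced filter.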
As an alternative embodiment, the convolution operation device includes: the vector multiplication array unit comprises a plurality of vector multipliers, wherein each vector multiplier carries out operation according to input features and neural network parameters corresponding to received data to be processed, and outputs operation results.
Specifically, the convolution operation is implemented by a vector multiplication operation, and the convolution operation device performs multiplication of vectors by a vector multiplication unit included therein.
FIG. 3 is a schematic diagram of a convolution operation unit according to embodiment 1 of the present application. As shown in FIG. 3, the convolution operation module may read the input feature map from the input image buffer device in a skip-reading manner to perform the operation. Each convolution module comprises M x N vector multiplication units (VU); each vector multiplication unit completes a vector multiplication of the input feature map and a filter and outputs the calculation result.
As an optional embodiment, the vector multiplier is further configured to determine whether the received data to be processed and the neural network parameter are valid, and operate the input feature and the neural network weight corresponding to the received data to be processed and output an operation result when the received data to be processed and the neural network parameter are both valid.
In the above scheme, before the vector multiplier performs the vector multiplication, it first determines whether the neural network parameters and the input feature data are valid. If the neural network parameters or the input feature data are invalid, the vector multiplication is not performed; if both are valid, the multiplication of the neural network parameters and the input feature data is performed and the calculation result is output.
As an alternative embodiment, the vector multiplier closest to the buffer reads in the neural network weights and the data to be processed from the buffer.
Still referring to fig. 3, the vector multiplier at the edge of each buffer reads in the input feature map data or neural network parameters at the corresponding location.
As an alternative embodiment, the vector multiplier passes the current input feature data to the vector multiplier on its right and passes the neural network parameters to the vector multiplier below it.
Still referring to fig. 3, after completing the vector multiplication, the vector multiplier passes the current input feature data to the vector multiplier on its right and passes the filter data to the vector multiplier below it.
Next, the procedure of the convolution operation device will be described in detail:
step a, the vector multiplication units at the edge of each buffer read in the input feature map data or filter data at the corresponding positions;
step b, each vector multiplication unit judges whether the input feature data and filter data currently received are valid; if so, it completes the vector multiplication and outputs the calculation result;
step c, each vector multiplication unit passes the current input feature data to the vector multiplication unit on its right and passes the filter data to the vector multiplication unit below it;
Steps a, b and c are performed in a loop until no vector multiplier has input feature data left. A software sketch of this data flow is given below.
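The sketch below models the M x N vector-multiplier array as a whole: unit (i, j) ends up computing the dot product of input-feature vector i (flowing rightward) and filter j (flowing downward). The sequential Python loops stand in for the systolic propagation of steps a to c and abstract away buffer widths and cycle timing; the sizes are illustrative assumptions.

```python
import numpy as np

def vector_mac_array(ifm_rows, filters):
    """Simplified model of the vector-multiplier array: unit (i, j) receives
    IFM vector i and filter vector j and outputs their dot product."""
    m, n = len(ifm_rows), len(filters)
    out = np.zeros((m, n))
    for i in range(m):          # in hardware these loops are realized by the
        for j in range(n):      # right/down systolic flow, not sequential code
            out[i, j] = np.dot(ifm_rows[i], filters[j])
    return out

ifm_rows = np.random.rand(4, 9)   # e.g. 4 window positions, each a flattened 3x3x1 window
filters  = np.random.rand(3, 9)   # 3 filters, each flattened to length 9
print(vector_mac_array(ifm_rows, filters).shape)  # (4, 3): 4 pixels x 3 OFM channels
```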
As an optional embodiment, the above-mentioned caching apparatus further includes: an output buffer device, used to buffer the operation results output by the convolution operation unit, where the convolution operation unit writes the operation results into the output buffer device in a skip-writing mode.
Specifically, the skip-write mode refers to writing data into the output buffer discontinuously. In an alternative embodiment, as shown in connection with fig. 2, a post-calculation module (Post) is further arranged between the convolution operation unit and the output buffer device. Post completes all operations between the current convolution layer and the next convolution layer, for example: bias, batch normalization (BN, Batch Normalization), scale transformation (Scale), and nonlinear neurons (sigmoid, ReLU, etc.). The convolution operation device writes the calculation result into the output feature map buffer (OFM buffer) in a continuous or discontinuous manner through the post-calculation module (Post), and the data in the output feature map buffer is stored back into the DDR through the bus and the memory control module.
In the above scheme, the Concat operator connects feature maps. By writing the output feature map buffer discontinuously in the hardware design, multiple convolution operation results (the results finally output from the Post module) can be written directly into buffer positions that already respect the storage interval required by the Concat. The Concat operation is thus completed simply by controlling the buffer write positions, without a separate hardware unit to execute it; this achieves Concat elimination, removes the extra computation cost, and improves computing performance.
FIG. 4a is a schematic diagram of skip writing according to embodiment 1 of the present application. With continuous writing, the OFMs of Conv1, Conv2, ..., Convn are written into the output buffer device one after another; with skip writing (taking only Conv1 and Conv2 as an example), their OFMs are written into the output buffer device at a preset interval. In this example the preset interval is 1, so when Conv1 is written into the output buffer device a space is reserved after every OFM element for the OFM of Conv2, realizing the skip function. The OFMs written into the output buffer device therefore complete the Concat operation directly, without exchanging positions.
Meanwhile, as shown in fig. 4b, which is a schematic diagram of a combined skip-read and skip-write application according to embodiment 1 of the present application, Conv1 and Conv2 are written into the output buffer device by skip writing, directly implementing the Concat operation. When the OFM of Conv1 needs to be used alone as the IFM of Conv3, Conv3 skip-reads from the Concat result, so the OFM of Conv1 can be read directly without splitting the Concat. Thus, thanks to the non-contiguous reading of the input feature map buffer, a particular convolution result can still be used directly and independently as the input feature map of the next convolution operation even though it is stored non-contiguously in the buffer.
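The addressing pattern behind this Concat elimination can be sketched as follows (a toy model with one value per slot; the buffer layout and step values are illustrative assumptions, not the real OFM buffer organization):

```python
import numpy as np

def skip_write(buffer, data, start, step):
    """Write `data` into `buffer` from `start`, leaving (step-1) slots between
    consecutive elements so another module's output can fill the gaps."""
    buffer[start:start + len(data) * step:step] = data

ofm_buffer = np.zeros(8)
conv1_out = np.array([1., 1., 1., 1.])   # OFM values produced by Conv1
conv2_out = np.array([2., 2., 2., 2.])   # OFM values produced by Conv2
skip_write(ofm_buffer, conv1_out, start=0, step=2)   # reserve every other slot
skip_write(ofm_buffer, conv2_out, start=1, step=2)   # Conv2 fills the gaps
print(ofm_buffer)         # [1. 2. 1. 2. 1. 2. 1. 2.] -> Concat done in place
print(ofm_buffer[0:8:2])  # [1. 1. 1. 1.] -> Conv3 skip-reads Conv1's OFM alone
```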
It should be noted that, the existing special convolutional neural network accelerator uses a special computing unit to perform data shuffling, that is, data location exchange caused by the difference of the requirements of two cascaded computing units on the data storage format. In the application, the proposed skip write or skip read cache hardware structure can directly complete the exchange of data positions during data writing, or a plurality of subsequent modules with different data storage sequence requirements can read the same data through a discontinuous reading method, so that the hardware resource cost introduced by data shuffling is reduced.
According to the scheme of this embodiment, the input feature map cache supports reading data in a discontinuous manner, and the output feature map cache supports writing the output feature map in a discontinuous manner. The processor in this embodiment may further execute the data processing method of the processor in embodiment 2 below; combined with that method, the processor allows the convolution calculation module to support deconvolution and dilated Conv operations, thereby merging multiple different operators and improving the hardware utilization rate on the premise of ensuring versatility. The data processing method of this processor is described below in example 2.
Example 2
There is also provided, in accordance with an embodiment of the present invention, an embodiment of a data processing method for a processor, it being noted that the steps shown in the flowchart of the figures may be performed in a computer system, such as a set of computer executable instructions, and that, although a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in an order other than that shown or described herein.
The method embodiment provided in this embodiment of the present application may be executed in a mobile terminal, a computer terminal or a similar computing device. Fig. 5 shows a block diagram of a hardware structure of a computer terminal (or mobile device) for implementing a data processing method of a processor. As shown in fig. 5, the computer terminal 50 (or mobile device 50) may include one or more processors 502 (shown in the figures as 502a, 502b, ..., 502n) (the processor 502 may include, but is not limited to, a microprocessor MCU, a programmable logic device FPGA, etc.), a memory 504 for storing data, and a transmission module 506 for communication functions. In addition, the computer terminal may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power supply, and/or a camera. It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 5 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the computer terminal 50 may also include more or fewer components than shown in FIG. 5, or have a different configuration than shown in FIG. 5.
It should be noted that the one or more processors 502 and/or other data processing circuits described above may be referred to herein generally as "data processing circuits". The data processing circuit may be embodied in whole or in part in software, hardware, firmware, or any other combination. Furthermore, the data processing circuitry may be a single stand-alone processing module, or incorporated, in whole or in part, into any of the other elements in the computer terminal 50 (or mobile device). As referred to in the embodiments of the present application, the data processing circuit acts as a kind of processor control (e.g., selection of a variable-resistance terminal path to interface with).
The memory 504 may be used to store software programs and modules of application software, such as the program instructions/data storage devices corresponding to the data processing method of the processor in the embodiment of the present invention. The processor 502 executes the software programs and modules stored in the memory 504, thereby executing various functional applications and data processing, that is, implementing the data processing method of the processor described above. The memory 504 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 504 may further comprise memory located remotely from the processor 502, which may be connected to the computer terminal 50 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 506 is used to receive or transmit data via a network. The specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 50. In one example, the transmission device 506 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 506 may be a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 50 (or mobile device).
It should be noted here that in some alternative embodiments, the computer device (or mobile device) shown in fig. 5 described above may include hardware elements (including circuitry), software elements (including computer code stored on a computer-readable medium), or a combination of both hardware and software elements. It should be noted that fig. 5 is only one specific example and is intended to illustrate the types of components that may be present in the computer device (or mobile device) described above.
In the above-described operating environment, the present application provides a data processing method of a processor as shown in fig. 6. Fig. 6 is a flowchart of a data processing method of a processor according to embodiment 2 of the present invention.
In step S61, the processor acquires data to be processed, parameters of the neural network, and an operation instruction, where the operation instruction includes configuration information, the configuration information is used to determine an operation mode, and the buffer device and the convolution operation device are configured according to the operation mode.
Specifically, the processor is used for performing the operation of the neural network model and includes a buffer device; the buffer device in the processor obtains the data to be processed, the neural network parameters, and the operation instruction.
The data to be processed is data to be subjected to neural network operation, and may be image information or the like. For example, if the processor is used for performing image recognition, the data to be processed is the input feature corresponding to the image information to be recognized, and if the processor is applied to the monitoring field, the data to be processed can be the input feature corresponding to the image information collected by the camera.
The neural network parameters can be the neural network parameters of a preset neural network model, namely filter data, and the operation instruction can be sent by a user or triggered by input data to be processed after the configuration information is set by the user.
The configuration information in the operation instruction at least comprises an operation mode. The operation mode can be set by a user or can be determined by a processor according to actual tasks. In an alternative embodiment, the configuration information may indicate the address where the buffer device reads data, the size of the buffer device, and the address where the buffer device writes data, thereby enabling the buffer device to adapt to various operation modes.
In an alternative embodiment, the processor is an FPGA module or an ASIC module.
In step S63, the processor uses the operation mode to operate on the data to be processed according to the neural network parameters.
Specifically, the convolution operation device is connected with the buffer device, acquires data to be processed and neural network parameters from the buffer device, and processes the data to be processed according to configuration information in the operation instruction after receiving the operation instruction.
In an alternative embodiment, the convolution operation device supports operations such as deconvolution, dilated Conv, and linear interpolation convolution, so that more operations can be supported. The convolution operation device can be configured according to the configuration information in the operation instruction, thereby supporting multiple operation modes.
It should be noted that existing dedicated convolutional neural network accelerators do not support running multiple convolutional neural networks of different structures on the same hardware platform at the same time. In the above scheme, the data cache can be configured through the instruction, so the cooperation relationship between the computing units can be dynamically configured, leaving a large space for dynamic scheduling of various network tasks.
Therefore, the embodiment of the application solves the technical problem that the neural network model processing device in the prior art is poor in universality.
In an alternative embodiment, the operation mode includes: multiple convolution operation modules are used for simultaneously executing multiple different tasks or multiple convolution operation modules are used for jointly executing the same task.
The above scheme provides two operation modes, which can be set by a user or can be determined by task type.
In an alternative embodiment, the tasks received by the processor have priorities. If the priorities of the tasks received by the processor are the same, the operation mode is determined to be multiple convolution operation modules simultaneously executing multiple different tasks, so that the tasks received by the processor can be executed in parallel; if the tasks have different priorities, the mode in which multiple convolution operation modules jointly execute the same task can be selected, the task with the highest priority is executed first, and the remaining tasks are then executed in the mode in which multiple convolution operation modules simultaneously execute different tasks.
In a further alternative embodiment, the operation mode can also be determined according to the amount of data in the task (the amount of data to be processed and/or of the neural network parameters). When the data amount of the task is larger than a first preset value, the operation mode can be that N (N > 1) convolution operation modules jointly execute the same task, where N can be determined according to the data amount of the task; when the data amount of the task is smaller than the preset value, the operation mode can be that multiple convolution operation modules simultaneously execute multiple different tasks.
In yet another alternative embodiment, the operation mode may also be determined based on the current utilization of the processor. When the utilization rate of the processor is larger than a preset value, the tasks can be processed serially in the mode in which multiple convolution operation modules jointly execute the same task.
In an alternative embodiment, the processor comprises an input image buffer and a convolution operation device, wherein the convolution operation device reads the data to be processed from the buffer in a skip mode.
In the above scheme, the convolution operation device can read the data to be processed in a continuous or discontinuous manner to perform the operation. Reading the data to be processed continuously means that the convolution operation device reads each element of the data to be processed in order; reading the data to be processed discontinuously means that the convolution operation device does not read the elements in their stored order, i.e., skip reading.
Specifically, the skip mode is used for indicating that when the convolution operation device reads the data to be processed from the input image buffer, the data to be processed in the input image buffer is read according to the preset skip step number.
The convolution operation device in the embodiment of the application reads the data to be processed from the input image buffer in a skip-reading mode, so data position shuffling (shuffle) is realized by directly configuring the buffer read/write positions. Adding 0-value intervals into the filter is equivalent to sampling the input feature map at intervals and then performing the convolution operation, which achieves support for the dilated Conv operator. This scheme thus converts the dilated Conv operator into an ordinary convolution that uses the reduced convolution filter and reads the data to be processed discontinuously, so the dilated Conv operator can be realized with the general convolution module.
In an alternative embodiment, the neural network model further includes an output buffer, where the convolution operation device writes the operation result to the output buffer in a skip-write manner.
Specifically, the skip-write mode refers to writing data into the output buffer discontinuously. In an alternative embodiment, as shown in connection with fig. 2, a post-calculation module (Post) is further arranged between the convolution operation unit and the output buffer device. Post completes all operations between the current convolution layer and the next convolution layer, for example: bias, batch normalization (BN, Batch Normalization), scale transformation (Scale), and nonlinear neurons (sigmoid, ReLU, etc.). The convolution operation device writes the calculation result into the output feature map buffer (OFM buffer) in a continuous or discontinuous manner through the post-calculation module (Post), and the data in the output feature map buffer is stored back into the DDR through the bus and the memory control module.
In the above scheme, the Concat operator connects feature maps. By writing the output feature map buffer discontinuously in the hardware design, multiple convolution operation results (the results finally output from the Post module) can be written directly into buffer positions that already respect the storage interval required by the Concat. The Concat operation is thus completed simply by controlling the buffer write positions, without a separate hardware unit to execute it; this achieves Concat elimination, removes the extra computation cost, and improves computing performance.
In an alternative embodiment, the processor acquiring the data to be processed and the neural network parameters includes: the processor obtains the split data to be processed, where the compiler is used to execute one or more of the following: when the operation instruction is a deconvolution operation on the data to be processed, converting the deconvolution operation in the operation instruction into multiple convolution operations and splitting the data to be processed into data corresponding to the multiple convolution operations; when the operation instruction is a linear interpolation convolution operation on the data to be processed, converting the linear interpolation convolution operation in the operation instruction into a deconvolution operation, converting the resulting deconvolution operation into multiple convolution operations, and splitting the data to be processed into data corresponding to the convolution operations; when the operation instruction is a fully connected operation on the data to be processed, converting the fully connected operation in the operation instruction into a convolution operation and splitting the data to be processed into data corresponding to the convolution operation.
In particular, the compiler runs on a processor coupled to the neural network processor; in an alternative embodiment this may be a CPU, and the neural network processor may be an FPGA module or an ASIC module in communication with the CPU.
Fig. 7 is a schematic diagram of a Compiler and a neural network processor processing task according to embodiment 2 of the present application, and in conjunction with fig. 7, the Compiler runs in a Software layer (Software) for compiling and optimizing a neural network model to be executed, and at runtime (run time), according to instructions and parameters compiled by the Compiler, data to be processed and operation instructions are sent to a processor (CNN processor) of a Hardware layer (Hardware) through scheduling, so that the processor is driven to complete calculation and return a result to an application, and normal operation of Hardware is maintained.
The above scheme is described below on the basis of fig. 7. In the above scheme, the compiler splits a large convolution operation, where the large convolution operation may include: deconvolution operations, linear interpolation convolution operations, and fully connected operations. Splitting the large convolution operation into simple convolution operations, as sketched below for the fully connected case, saves a large amount of operation cost.
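For instance, a fully connected layer is mathematically a convolution whose single window covers the whole input feature map, which is why it can be mapped onto the convolution hardware. The numpy check below verifies this equivalence; the sizes are arbitrary and the code is only an illustration of the idea, not the compiler's actual transformation.

```python
import numpy as np

h, w, c, num_out = 4, 4, 8, 10
ifm = np.random.rand(h, w, c)
fc_weights = np.random.rand(num_out, h * w * c)

fc_result = fc_weights @ ifm.reshape(-1)              # ordinary fully connected layer
conv_filters = fc_weights.reshape(num_out, h, w, c)   # each row becomes an h x w x c filter
conv_result = np.array([(f * ifm).sum() for f in conv_filters])  # one output pixel per filter

assert np.allclose(fc_result, conv_result)            # FC equals a full-size-kernel convolution
```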
FIG. 8a is a schematic diagram of a deconvolution operation. In this example a 3x3 convolution kernel is used for a deconvolution with stride 2 (one 0 inserted between every two pixels) on a 6x6 input feature map. When calculated in the conventional manner, multiplications involving non-0 input feature pixels account for only 9/36 = 25% of all multiplications, assuming the 4 kernel positions shown in fig. 8a occur equally often (the exact frequencies vary slightly with the picture size but are similar). Since the convolution can be regarded as vector multiplications whose results are accumulated for each output pixel, the remaining 75% of the multiplications, which multiply by 0, are all invalid operations. The proportion of invalid computation grows as the ratio of the stride to the convolution filter size grows, wasting a large amount of computing resources and bandwidth.
In this scheme the deconvolution operation is instead converted into several forward convolutions. FIG. 8b is a schematic diagram of converting a deconvolution into forward convolution operations according to embodiment 2 of the present application: the 3x3 deconvolution of fig. 8a at its 4 different positions is converted into the 4 forward convolution operations shown in fig. 8b, so that all operations are effective. Thanks to the skip-write caching characteristic, no extra data shuffling cost is introduced and the operation cost is greatly reduced; meanwhile, by writing the output feature map discontinuously in hardware, the calculation results at the four different positions can be written directly into their corresponding buffer positions without an extra data gathering step.
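The decomposition can be checked numerically in one dimension. The sketch below, written under the assumption of a 3-tap kernel, stride 2, and 'same' padding, computes the same transposed convolution twice: once by zero insertion followed by an ordinary correlation (with many multiplications by 0), and once as phase-wise forward convolutions whose outputs are interleaved the way a skip-write would place them. It illustrates the idea of fig. 8b only; the real 2-D conversion and its addressing are not reproduced here.

```python
import numpy as np

def deconv1d_naive(x, k, s=2):
    """Deconvolution as defined above: insert (s-1) zeros between pixels,
    then run an ordinary 'same'-padded correlation with kernel k."""
    up = np.zeros(s * len(x) - s + 1)
    up[::s] = x
    up = np.pad(up, len(k) // 2)
    return np.array([up[i:i + len(k)] @ k for i in range(len(up) - len(k) + 1)])

def deconv1d_as_forward_convs(x, k, s=2):
    """Same result as s small forward convolutions, interleaved (skip-written)
    into the output; no multiplications by 0 are performed."""
    assert s == 2 and len(k) == 3, "sketch covers only the 3-tap, stride-2 case"
    out = np.zeros(2 * len(x) - 1)
    out[0::2] = x * k[1]                          # phase 0: single-tap sub-kernel
    out[1::2] = x[:-1] * k[0] + x[1:] * k[2]      # phase 1: two-tap sub-kernel
    return out

x, k = np.random.rand(6), np.random.rand(3)
assert np.allclose(deconv1d_naive(x, k), deconv1d_as_forward_convs(x, k))
```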
Similarly, the linear interpolation convolution conversion method and the dilated Conv conversion method provided by this scheme can also eliminate all invalid operations, so that the calculation speed is improved with limited hardware resources while versatility is preserved.
The operator conversion can multiplex the processor to finish various different calculation tasks as efficiently as possible on the premise of ensuring the universality of algorithm support, thereby improving the hardware utilization rate and further improving the calculation performance of the processor.
In an alternative embodiment, the processor obtains the neural network parameters, including: the processor acquires the split neural network parameters, wherein the compiler detects the data size of the neural network parameters, and splits the neural network parameters if the size of the neural network parameters exceeds a second preset value.
In the scheme, the large-size convolution filter is split into a plurality of filters by splitting the neural network parameters, and the large-size input feature map is split into a plurality of feature maps, so that the large-size convolution operation which cannot be supported by one processor is split into a plurality of small convolution operations.
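One common form of this splitting, tiling an input feature map that is too large for the on-chip buffer, can be sketched in numpy as follows. The tile size, the halo handling and the helper names are illustrative assumptions, not the concrete splitting rule used by the compiler:

import numpy as np

def conv_valid(x, k):
    # plain 'valid' correlation, standing in for one small on-chip convolution
    H, W = x.shape
    kh, kw = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def conv_tiled(x, k, tile=8):
    # split an oversized feature map into tiles with a (kernel-1) pixel halo,
    # run the small convolution per tile, and stitch the partial outputs
    kh, kw = k.shape
    H, W = x.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i0 in range(0, out.shape[0], tile):
        for j0 in range(0, out.shape[1], tile):
            i1 = min(i0 + tile, out.shape[0])
            j1 = min(j0 + tile, out.shape[1])
            patch = x[i0:i1 + kh - 1, j0:j1 + kw - 1]    # tile plus halo
            out[i0:i1, j0:j1] = conv_valid(patch, k)
    return out

x = np.random.rand(32, 32)
k = np.random.rand(5, 5)
assert np.allclose(conv_tiled(x, k), conv_valid(x, k))

Splitting an oversized filter is analogous in spirit, with sub-filters producing partial results that are later combined, although the exact rule is not spelled out in this sketch.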
In an alternative embodiment, the operation instruction further includes quantization information, and the method further includes: the neural network model processor compresses the data to be processed and the neural network parameters according to the quantized information.
Specifically, the quantization information is used for carrying out low-bit quantization on the data to be processed and the neural network layer parameters, so that the purpose of compressing the data is achieved, and the cost required in operation is further reduced.
Fig. 9 is a schematic diagram of low-bit quantization according to embodiment 2 of the present application. As shown in fig. 9, the quantization process is divided into three stages (stage 1, stage 2 and stage 3). Stage 1: first keep the feature map in floating point 32 bit and quantize the weights of the filter; stage 2: then keep the quantized filter fixed, and quantize the activation scaling factor SF (Activation Scaling Factor) of the feature map and the feature map quantization interval; stage 3: finally, fine-tune the quantized filter and the feature map together.
In the fine-tuning process after quantization, the quantization range is also used as a training parameter, so that the number of bits after quantization is further reduced on the premise of ensuring the accuracy of an algorithm.
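The per-tensor, scaling-factor-based quantization that each of these stages relies on can be sketched as follows. This is a minimal symmetric linear quantizer in numpy, assuming the scaling factor is derived from the dynamic range of the tensor; the staged schedule of fig. 9 and the trainable quantization range are not reproduced here:

import numpy as np

def quantize(t, n_bits=8):
    # symmetric linear quantization: derive a scaling factor from the dynamic
    # range, round to integer levels, clip to the representable range
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.max(np.abs(t)) / qmax          # the per-tensor scaling factor
    q = np.clip(np.round(t / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(64, 3, 3).astype(np.float32)      # a filter tensor
q, s = quantize(w, n_bits=8)
print("mean abs quantization error:", np.mean(np.abs(w - dequantize(q, s))))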
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present invention is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present invention. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present invention.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by means of software plus the necessary general hardware platform, but of course also by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method of the various embodiments of the present invention.
Example 3
There is further provided, according to an embodiment of the present invention, an embodiment of a data processing method of a processor, and fig. 10 is a flowchart of a data processing method of a processor according to embodiment 3 of the present invention, and in combination with fig. 10, the method includes:
In step S101, the compiler receives the data to be processed, the neural network parameters and the operation instruction.
Specifically, the compiler is configured to receive data to be processed input by a user, and generate a neural network parameter and an operation instruction according to a program written by the user.
As shown in fig. 7 and described in embodiment 2, the compiler runs in the software layer (Software) and compiles and optimizes the neural network model to be executed; at runtime (run time), according to the instructions and parameters produced by the compiler, the data to be processed and the operation instructions are dispatched to the processor (CNN processor) in the hardware layer (Hardware), which completes the calculation, returns the result to the application and keeps the hardware operating normally.
Step S103, the compiler sends the data to be processed, the neural network parameters and the operation instruction to the processor, wherein the operation instruction includes configuration information, the configuration information is used for determining an operation mode and configuring the buffer device and the convolution operation device according to the operation mode, and the processor uses the operation mode to operate on the data to be processed according to the neural network parameters.
Specifically, the processor is used for performing the operation of the neural network model and includes a buffer device; the buffer device in the processor receives the data to be processed, the neural network parameters and the operation instruction.
The data to be processed is data to be subjected to neural network operation, and may be image information or the like. For example, if the processor is used for performing image recognition, the data to be processed is the input feature corresponding to the image information to be recognized, and if the processor is applied to the monitoring field, the data to be processed can be the input feature corresponding to the image information collected by the camera.
The neural network parameters can be the neural network parameters of a preset neural network model, namely filter data, and the operation instruction can be sent by a user or triggered by input data to be processed after the configuration information is set by the user.
The configuration information in the operation instruction at least comprises an operation mode. The operation mode can be set by a user or can be determined by a processor according to actual tasks. In an alternative embodiment, the configuration information may indicate the address where the buffer device reads data, the size of the buffer device, and the address where the buffer device writes data, thereby enabling the buffer device to adapt to various operation modes.
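For concreteness, the configuration information carried by an operation instruction might look like the record below. This is a hypothetical sketch: the field names and values are illustrative assumptions and are not taken from the patent.

from dataclasses import dataclass

@dataclass
class OpConfig:
    # hypothetical configuration record carried inside an operation instruction
    mode: str            # e.g. "independent_tasks" or "shared_task"
    read_addr: int       # address at which the buffer device reads input data
    write_addr: int      # address at which results are written back (skip-write capable)
    buffer_size: int     # buffer capacity allocated to this task

cfg = OpConfig(mode="independent_tasks", read_addr=0x0000,
               write_addr=0x4000, buffer_size=16 * 1024)

A runtime driver could translate such a record into the register writes that configure the buffer device and the convolution operation device before the data to be processed is streamed in.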
In an alternative embodiment, the processor is an FPGA module or an ASIC module.
The convolution operation device is connected with the buffer device, acquires the data to be processed and the neural network parameters from the buffer device, and processes the data to be processed according to the configuration information in the operation instruction after receiving the operation instruction.
In an alternative embodiment, the convolution operation device supports operations such as deconvolution, related Conv and nonlinear interpolation convolution, so that more kinds of operations can be supported. The convolution operation device can be configured according to the configuration information in the operation instruction, thereby supporting various operation modes.
It should be noted that existing dedicated convolutional neural network accelerators do not support running convolutional neural networks of multiple different structures on the same hardware platform at the same time. In the above scheme, the data cache can be configured through the instruction, so that the cooperation relationship between the computing units can be dynamically configured, leaving great space for dynamic scheduling of various network tasks.
Therefore, the embodiment of the application solves the technical problem that the neural network model processing device in the prior art is poor in universality.
As an alternative embodiment, before the compiler sends the data to be processed, the neural network parameters and the operation instructions to the processor, the method further includes one or more of the following: under the condition that the operation instruction is a deconvolution operation on the data to be processed, the compiler converts the deconvolution operation in the operation instruction into a plurality of convolution operations; under the condition that the operation instruction is a convolution operation with nonlinear interpolation on the data to be processed, the compiler converts the nonlinear interpolation convolution operation in the operation instruction into a deconvolution operation, and converts the deconvolution operation obtained by conversion into a plurality of convolution operations; and under the condition that the operation instruction is a fully connected operation on the data to be processed, the compiler converts the fully connected operation in the operation instruction into a convolution operation.
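The last of these conversions, turning a fully connected operation into a convolution, can be made concrete with a small numpy sketch; the shapes below are illustrative assumptions. A fully connected layer over a C x H x W input is numerically identical to N convolution filters of size C x H x W applied at a single 'valid' position:

import numpy as np

C, H, W, N = 8, 4, 4, 10                  # input channels/size, N output neurons
x = np.random.rand(C, H, W)
fc_w = np.random.rand(N, C * H * W)       # fully connected weight matrix

fc_out = fc_w @ x.reshape(-1)             # flatten, then matrix-vector product

conv_w = fc_w.reshape(N, C, H, W)         # the same weights viewed as N filters
conv_out = np.array([np.sum(conv_w[n] * x) for n in range(N)])

assert np.allclose(fc_out, conv_out)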
The above scheme is described below on the basis of fig. 7. In this scheme, the compiler splits a large convolution operation, where the large convolution operation may include: a deconvolution operation, a convolution operation with nonlinear interpolation, and a fully connected operation. Splitting the large convolution operation into simple convolution operations greatly reduces the operation overhead.
Fig. 8a is a schematic diagram of a deconvolution operation. In this example, a 3*3 convolution kernel is used to deconvolve a 6*6 input feature image; with the conventional calculation, about 75% of the computation is wasted on multiplying filter weights by zero-valued input pixels, wasting a large amount of computing resources. In this scheme, the deconvolution operation is converted into a plurality of forward convolutions. Fig. 8b is a schematic diagram of converting deconvolution into forward convolution operations according to embodiment 2 of the present application: the 3x3 deconvolution at the 4 different positions shown in fig. 8a is converted into the 4 forward convolution operations of fig. 8b, which greatly reduces the operation overhead; at the same time, by using the hardware's ability to write the output feature map discontinuously, the calculation results at the four different positions can be written directly into their corresponding cache positions without an extra data gathering step.
The above scheme also folds the linear interpolation part of the operation into the filter, thereby converting the linear interpolation convolution into a deconvolution with an enlarged filter size. Fig. 8c is a schematic diagram of converting a linear interpolation convolution into a convolution according to embodiment 2 of the present application. Taking the 3*3 linear interpolation convolution shown in fig. 8c as an example, the linear interpolation convolution may first be converted into a 5×5 deconvolution and then further converted into a plurality of forward convolutions, so that the forward convolution operation device performs the linear interpolation convolution operation and the processor can support linear interpolation convolution.
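Why the interpolation can be folded into an enlarged filter follows from the associativity of convolution, as the following numpy/scipy sketch illustrates. It assumes 2x bilinear interpolation expressed as zero insertion followed by a fixed 3x3 interpolation kernel, so folding that kernel into a 3x3 filter yields exactly the 5x5 deconvolution kernel mentioned above; the boundary handling of the real operator is not modelled here:

import numpy as np
from scipy.signal import convolve2d

x = np.random.rand(6, 6)
u = np.zeros((11, 11))
u[::2, ::2] = x                                        # stride-2 zero insertion

interp = np.array([[.25, .5, .25],                     # bilinear interpolation kernel
                   [.50, 1., .50],
                   [.25, .5, .25]])
k = np.random.rand(3, 3)                               # the original 3x3 filter

a = convolve2d(convolve2d(u, interp, mode="full"), k, mode="full")   # interpolate, then convolve
k5 = convolve2d(interp, k, mode="full")                # fold interpolation into a 5x5 kernel
b = convolve2d(u, k5, mode="full")                     # one 5x5 deconvolution
assert np.allclose(a, b)

The 5x5 deconvolution can in turn be decomposed into forward convolutions exactly as in the sketch given for fig. 8b.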
As an alternative embodiment, the processor obtains the neural network parameters, including: the compiler detects the data size of the neural network parameters; if the size of the neural network parameter exceeds the preset value, splitting the neural network parameter, wherein the processor acquires the split neural network parameter.
In the scheme, the large-size convolution filter is split into a plurality of filters by splitting the neural network parameters, and the large-size input feature map is split into a plurality of feature maps, so that the large-size convolution operation which cannot be supported by one processor is split into a plurality of small convolution operations.
Example 4
According to an embodiment of the present invention, there is further provided a data processing apparatus of a processor for implementing the data processing method of the processor in the above embodiment 2, and fig. 11 is a schematic diagram of a data processing apparatus of a processor according to embodiment 4 of the present application, and as shown in fig. 11, the apparatus 1100 includes:
the obtaining module 1102 is configured to obtain data to be processed, parameters of the neural network, and an operation instruction by using the processor, where the operation instruction includes configuration information, and the configuration information is used to determine an operation mode, and configure the buffer device and the convolution operation device according to the operation mode.
The operation module 1104 is configured to operate the data to be processed according to the neural network parameters by using the operation mode.
Here, it should be noted that the above-mentioned obtaining module 1102 and operation module 1104 correspond to steps S61 to S63 in embodiment 2; the two modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the disclosure of that embodiment. It should be noted that the above-described modules may be operated as a part of the apparatus in the computer terminal 10 provided in the first embodiment.
As an alternative embodiment, the operation modes include: multiple convolution operation modules are used for simultaneously executing multiple different tasks or multiple convolution operation modules are used for jointly executing the same task.
As an alternative embodiment, the processor comprises an input image buffer and a convolution operation device, wherein the convolution operation device reads the data to be processed from the buffer in a skip-read manner.
As an alternative embodiment, the neural network model further includes an output buffer, wherein the convolution operation device writes the operation result to the output buffer in a skip-write manner.
As an alternative embodiment, the obtaining module includes: the first acquisition submodule is used for acquiring the split data to be processed and the converted operation instruction by the processor, wherein the compiler is used for executing one or more of the following: under the condition that the operation instruction carries out deconvolution operation on the data to be processed, deconvolution operation in the operation instruction is converted into a plurality of convolution operations, and the data to be processed is split into data corresponding to the convolution operations; under the condition that the operation instruction is a convolution operation for carrying out nonlinear interpolation on the data to be processed, the compiler converts the nonlinear interpolation convolution operation in the operation instruction into deconvolution operation, converts the deconvolution operation obtained by conversion into a plurality of convolution operations, and splits the data to be processed into data corresponding to the convolution operation; under the condition that the operation instruction is full-connection operation of the data to be processed, the compiler converts the full-connection operation in the operation instruction into convolution operation and splits the data to be processed into data corresponding to the convolution operation.
As an alternative embodiment, the obtaining module includes: and the second acquisition sub-module is used for acquiring the split neural network parameters by the processor, wherein the compiler detects the data size of the neural network parameters, and splits the neural network parameters if the size of the neural network parameters exceeds a second preset value.
As an alternative embodiment, the arithmetic instruction further includes quantization information, and the apparatus further includes: and the compression module is used for compressing the data to be processed and the neural network parameters according to the quantized information by the neural network model processor.
Example 5
According to an embodiment of the present invention, there is further provided a data processing apparatus of a processor for implementing the data processing method of the processor in the above embodiment 3, and fig. 12 is a schematic diagram of the data processing apparatus of the processor according to embodiment 5 of the present application, and as shown in fig. 12, the apparatus 1200 includes:
the receiving module 1202 is configured to receive data to be processed, parameters of a neural network, and an operation instruction by using a compiler.
The sending module 1204 is configured to send, to the processor, an instruction for obtaining data to be processed, parameters of the neural network, and an operation instruction, where the operation instruction includes configuration information, the configuration information is used to determine an operation mode, and configure the buffer device and the convolution operation device according to the operation mode, and the processor uses the operation mode to perform an operation on the data to be processed according to the parameters of the neural network.
Here, it should be noted that the above-mentioned receiving module 1202 and sending module 1204 correspond to steps S101 to S103 in embodiment 3; the two modules implement the same examples and application scenarios as the corresponding steps, but are not limited to the disclosure of that embodiment. It should be noted that the above-described modules may be operated as a part of the apparatus in the computer terminal 10 provided in the first embodiment.
As an alternative embodiment, the apparatus further comprises one or more of the following:
the first conversion module is used for converting deconvolution operation in the operation instruction into a plurality of convolution operations under the condition that the operation instruction is deconvolution operation of the data to be processed before the compiler sends the data to be processed, the neural network parameters and the operation instruction to the processor;
the second conversion module is used for converting the convolution operation of the nonlinear interpolation in the operation instruction into deconvolution operation and converting the deconvolution operation obtained by conversion into a plurality of convolution operations under the condition that the operation instruction is the convolution operation of the nonlinear interpolation of the data to be processed;
and the third conversion module is used for converting the full-connection operation in the operation instruction into convolution operation under the condition that the operation instruction is the full-connection operation of the data to be processed.
As an alternative embodiment, the receiving module includes: the detection module is used for detecting the data size of the neural network parameters by the compiler; the splitting module is used for splitting the neural network parameters if the sizes of the neural network parameters exceed a preset value, wherein the processor acquires the split neural network parameters.
Example 6
An embodiment of the present invention can provide an image pickup apparatus including the processor described in embodiment 1.
Existing convolutional neural network acceleration schemes based on FPGA and ASIC are mainly oriented to embedded scenarios such as smart cameras and target a specific application, so most of them remain fully customized designs for a specific network structure. As a result, these schemes cannot support some emerging operators used in actual services, and cannot compute a plurality of convolutional neural networks with different structures simultaneously within the same scheme, which ultimately limits their universality. The image pickup device in the embodiment of the present application, which includes the processor in embodiment 1, has higher universality, and is particularly advantageous for scenarios in which a cloud server needs to run a plurality of different convolutional neural networks at the same time to process data from different sensors.
Example 7
Embodiments of the present invention may provide a computer terminal, which may be any one of a group of computer terminals. Alternatively, in the present embodiment, the above-described computer terminal may be replaced with a terminal device such as a mobile terminal.
Alternatively, in this embodiment, the above-mentioned computer terminal may be located in at least one network device among a plurality of network devices of the computer network.
In this embodiment, the computer terminal may execute the program code of the following steps of the data processing method of the processor: the processor acquires data to be processed, a neural network parameter and an operation instruction, wherein the operation instruction includes configuration information, the configuration information is used for determining an operation mode, and the caching device and the convolution operation device are configured according to the operation mode; and the processor uses the operation mode to operate on the data to be processed according to the neural network parameters.
Alternatively, fig. 13 is a block diagram of a computer terminal according to embodiment 7 of the present invention. As shown in fig. 13, the computer terminal A may include: one or more processors 1302 (only one is shown), a memory 1304, and an external device 1306.
The memory may be used to store software programs and modules, such as the program instructions/modules corresponding to the data processing method and apparatus of the processor in the embodiments of the present invention; the processor executes various functional applications and data processing by running the software programs and modules stored in the memory, thereby implementing the above-mentioned data processing method. The memory may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory may further include memory remotely located with respect to the processor, which may be connected to the computer terminal A through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor may call the information and the application program stored in the memory through the transmission device to perform the following steps: the method comprises the steps that a processor acquires data to be processed, a neural network parameter and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and a caching device and a convolution operation device are configured according to the operation mode; and the processor uses an operation mode to operate the data to be processed according to the neural network parameters.
Optionally, the above processor may further execute program code for: the operation modes include: multiple convolution operation modules are used for simultaneously executing multiple different tasks or multiple convolution operation modules are used for jointly executing the same task.
Optionally, the above processor may further execute program code for: the processor comprises an input image caching device and a convolution operation device, wherein the convolution operation device reads data to be processed from the caching device in a skip mode.
Optionally, the above processor may further execute program code for: the neural network model also comprises an output buffer device, wherein the convolution operation device writes the operation result into the output buffer device in a skip writing mode.
Optionally, the above processor may further execute program code for: the processor acquires the split data to be processed and the converted operation instruction, wherein the compiler is used for executing one or more of the following: under the condition that the operation instruction carries out deconvolution operation on the data to be processed, deconvolution operation in the operation instruction is converted into a plurality of convolution operations, and the data to be processed is split into data corresponding to the convolution operations; under the condition that the operation instruction is a convolution operation for carrying out nonlinear interpolation on the data to be processed, the compiler converts the nonlinear interpolation convolution operation in the operation instruction into deconvolution operation, converts the deconvolution operation obtained by conversion into a plurality of convolution operations, and splits the data to be processed into data corresponding to the convolution operation; under the condition that the operation instruction is full-connection operation of the data to be processed, the compiler converts the full-connection operation in the operation instruction into convolution operation and splits the data to be processed into data corresponding to the convolution operation.
Optionally, the above processor may further execute program code for: the processor acquires the split neural network parameters, wherein the compiler detects the data size of the neural network parameters, and splits the neural network parameters if the size of the neural network parameters exceeds a second preset value.
Optionally, the above processor may further execute program code for: the operation instruction also comprises quantization information, and the method further comprises the following steps: the neural network model processor compresses the data to be processed and the neural network parameters according to the quantized information.
By adopting the embodiment of the present invention, a data processing method of a processor is provided. It should be noted that existing dedicated convolutional neural network accelerators do not support running convolutional neural networks of multiple different structures on the same hardware platform at the same time. In the above scheme, the data cache can be configured through the instruction, so that the cooperation relationship between the computing units can be dynamically configured, leaving great space for dynamic scheduling of various network tasks. Therefore, the embodiment of the present application solves the technical problem that the neural network model processing device in the prior art has poor universality.
It will be appreciated by those skilled in the art that the configuration shown in fig. 13 is only illustrative, and the computer terminal may be a smart phone (such as an Android phone or an iOS phone), a tablet computer, a palmtop computer, a mobile internet device (Mobile Internet Devices, MID), a PAD, or the like. Fig. 13 does not limit the structure of the above electronic device. For example, the computer terminal A may also include more or fewer components (such as a network interface or a display device) than shown in fig. 13, or have a different configuration from that shown in fig. 13.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments may be implemented by a program for instructing a terminal device to execute in association with hardware, the program may be stored in a computer readable storage medium, and the storage medium may include: flash disk, read-Only Memory (ROM), random-access Memory (Random Access Memory, RAM), magnetic or optical disk, and the like.
Example 8
The embodiment of the invention also provides a storage medium. Alternatively, in this embodiment, the storage medium may be used to store program codes executed by the data processing method of the processor provided in the first embodiment.
Alternatively, in this embodiment, the storage medium may be located in any one of the computer terminals in the computer terminal group in the computer network, or in any one of the mobile terminals in the mobile terminal group.
Alternatively, in the present embodiment, the storage medium is configured to store program code for performing the steps of: the method comprises the steps that a processor acquires data to be processed, a neural network parameter and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, and a caching device and a convolution operation device are configured according to the operation mode; and the processor uses an operation mode to operate the data to be processed according to the neural network parameters.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The apparatus embodiments described above are merely exemplary; for example, the division of the units is merely a logical function division, and there may be another division manner in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied essentially or in part or all of the technical solution or in part in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a removable hard disk, a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that those skilled in the art may make several improvements and modifications without departing from the principles of the present invention, and such improvements and modifications shall also fall within the protection scope of the present invention.

Claims (22)

1. A processor, comprising:
a buffer device, configured to acquire data to be processed, a neural network parameter and an operation instruction, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, the buffer device and the convolution operation device are configured according to the operation mode, and the operation mode comprises: a plurality of convolution operation modules are used for simultaneously executing a plurality of different tasks, or a plurality of convolution operation modules are used for jointly executing the same task;
the convolution operation device is in communication connection with the buffer device and is used for operating the data to be processed according to the neural network parameters by using the operation mode.
2. The processor of claim 1, wherein the processor further comprises:
memory access means for obtaining the data to be processed and the operation instruction from a compiler, wherein the compiler is in communication with the processor;
The synchronous dynamic random access memory is used for storing the neural network parameters;
and the buffer device acquires the data to be processed and the operation instruction from the memory access device through the bus, and acquires the neural network parameter from the synchronous dynamic random access memory through the bus.
3. The processor of claim 1, wherein the caching means comprises:
the filter caching device is used for storing the neural network parameters;
and the input image caching device is used for storing the data to be processed.
4. A processor according to claim 3, wherein the convolution operation means reads the data to be processed from the input image buffer means in a skip-read manner.
5. The processor of claim 1, wherein the convolution operation means comprises:
the vector multiplication array comprises a plurality of vector multipliers, wherein each vector multiplier carries out operation according to the received input characteristics and neural network parameters corresponding to the data to be processed, and outputs operation results.
6. The processor of claim 5, wherein the vector multiplier is further configured to determine whether the received data to be processed and the neural network parameter are valid, and if the received data to be processed and the neural network parameter are both valid, perform an operation on input features and a neural network weight corresponding to the received data to be processed, and output an operation result.
7. The processor of claim 5, wherein a vector multiplier closest to the caching device reads in the neural network parameters and input features corresponding to the data to be processed from the caching device.
8. The processor of claim 5, wherein the vector multiplier passes the current input feature data to the vector multiplier on its right and the neural network parameters to the vector multiplier below it.
9. The processor of claim 1, wherein the caching apparatus further comprises:
and the output buffer device is used for buffering the operation result output by the convolution operation device, wherein the convolution operation device writes the operation result into the output buffer device in a skip writing mode.
10. The processor of claim 1, wherein the processor is an FPGA or an ASIC.
11. A data processing method of a processor, comprising:
the method comprises the steps that a processor acquires data to be processed, neural network parameters and operation instructions sent by a caching device, wherein the operation instructions comprise configuration information, the configuration information is used for determining an operation mode, the caching device and a convolution operation device are configured according to the operation mode, and the operation mode comprises the following steps: a plurality of convolution operation modules are used for simultaneously executing a plurality of different tasks, or a plurality of convolution operation modules are used for jointly executing the same task;
And the processor uses the operation mode to operate the data to be processed according to the neural network parameters.
12. The method of claim 11, wherein the processor comprises an input image caching device and a convolution operation device, wherein the convolution operation device reads the data to be processed from the caching device in a skip-read manner.
13. The method of claim 11, wherein the caching device further comprises an output caching device, wherein the convolution operation device writes the operation result to the output caching device in a skip-write manner.
14. The method of claim 11, wherein the processor obtains data to be processed and arithmetic instructions, comprising:
the processor acquires the split data to be processed and the converted operation instruction, wherein the compiler is used for executing one or more of the following:
under the condition that the operation instruction is deconvolution operation of the data to be processed, converting the deconvolution operation in the operation instruction into a plurality of convolution operations, and splitting the data to be processed into data corresponding to the convolution operations;
under the condition that the operation instruction is a convolution operation for carrying out nonlinear interpolation on the data to be processed, the compiler converts the convolution operation of the nonlinear interpolation in the operation instruction into deconvolution operation, converts the converted deconvolution operation into a plurality of convolution operations, and splits the data to be processed into data corresponding to the convolution operation;
And under the condition that the operation instruction is full-connection operation on the data to be processed, the compiler converts the full-connection operation in the operation instruction into convolution operation and splits the data to be processed into data corresponding to the convolution operation.
15. The method of claim 11, wherein the processor obtaining neural network parameters comprises:
the processor acquires the split neural network parameters, wherein a compiler detects the data size of the neural network parameters, and if the size of the neural network parameters exceeds a second preset value, the neural network parameters are split.
16. The method of claim 11, wherein the arithmetic instruction further includes quantization information therein, the method further comprising: and the processor compresses the data to be processed and the neural network parameters according to the quantized information.
17. A data processing method of a processor, comprising:
the compiler receives data to be processed, neural network parameters and operation instructions sent by the caching device;
the compiler sends the data to be processed, the neural network parameters and the operation instruction to a processor, wherein the operation instruction comprises configuration information, the configuration information is used for determining an operation mode, the buffer device and the convolution operation device are configured according to the operation mode, the processor uses the operation mode to operate on the data to be processed according to the neural network parameters, and the operation mode comprises: a plurality of convolution operation modules are used for simultaneously executing a plurality of different tasks, or a plurality of convolution operation modules are used for jointly executing the same task.
18. The method of claim 17, wherein before the compiler sends the data to be processed, the neural network parameters, and the operation instruction to the processor, the method further comprises one or more of the following:
the compiler converts the deconvolution operation in the operation instruction into a plurality of convolution operations under the condition that the operation instruction is deconvolution operation on the data to be processed;
when the operation instruction is a convolution operation for performing nonlinear interpolation on the data to be processed, the compiler converts the convolution operation of the nonlinear interpolation in the operation instruction into a deconvolution operation, and converts the deconvolution operation obtained by conversion into a plurality of convolution operations;
and under the condition that the operation instruction is full-connection operation on the data to be processed, the compiler converts the full-connection operation in the operation instruction into convolution operation.
19. The method of claim 17, wherein the processor obtaining neural network parameters comprises:
a compiler detects the data size of the neural network parameters;
and splitting the neural network parameters if the sizes of the neural network parameters exceed a preset value, wherein the processor acquires the split neural network parameters.
20. A storage medium comprising a stored program, wherein the program, when run, controls a device on which the storage medium resides to perform the steps of:
the method comprises the steps that a processor acquires data to be processed, neural network parameters and operation instructions sent by a caching device, wherein the operation instructions comprise configuration information, the configuration information is used for determining an operation mode, the caching device and a convolution operation device are configured according to the operation mode, and the operation mode comprises the following steps: a plurality of convolution operation modules are used for simultaneously executing a plurality of different tasks, or a plurality of convolution operation modules are used for jointly executing the same task;
and the processor uses the operation mode to operate the data to be processed according to the neural network parameters.
21. A processor for running a program, wherein the program when run performs the steps of:
the method comprises the steps that a processor acquires data to be processed, neural network parameters and operation instructions sent by a caching device, wherein the operation instructions comprise configuration information, the configuration information is used for determining an operation mode, the caching device and a convolution operation device are configured according to the operation mode, and the operation mode comprises the following steps: a plurality of convolution operation modules are used for simultaneously executing a plurality of different tasks, or a plurality of convolution operation modules are used for jointly executing the same task;
And the processor uses the operation mode to operate the data to be processed according to the neural network parameters.
22. An image pickup apparatus comprising the processor of claim 1.
CN201910105542.8A 2019-02-01 2019-02-01 Processor, data processing method thereof and image pickup device Active CN111523652B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910105542.8A CN111523652B (en) 2019-02-01 2019-02-01 Processor, data processing method thereof and image pickup device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910105542.8A CN111523652B (en) 2019-02-01 2019-02-01 Processor, data processing method thereof and image pickup device

Publications (2)

Publication Number Publication Date
CN111523652A CN111523652A (en) 2020-08-11
CN111523652B true CN111523652B (en) 2023-05-02

Family

ID=71910188

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910105542.8A Active CN111523652B (en) 2019-02-01 2019-02-01 Processor, data processing method thereof and image pickup device

Country Status (1)

Country Link
CN (1) CN111523652B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112152947B (en) 2020-08-21 2021-07-20 北京百度网讯科技有限公司 Processor, implementation method, electronic device and storage medium
CN113011585B (en) * 2021-03-19 2023-09-26 上海西井科技股份有限公司 Compiling optimization method, system, equipment and storage medium for eliminating splicing operator
CN113094118B (en) * 2021-04-26 2023-05-30 深圳思谋信息科技有限公司 Data processing system, method, apparatus, computer device, and storage medium
CN113570034B (en) * 2021-06-18 2022-09-27 北京百度网讯科技有限公司 Processing device, neural network processing method and device
CN116152520B (en) * 2023-04-23 2023-07-07 深圳市九天睿芯科技有限公司 Data processing method for neural network accelerator, chip and electronic equipment

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000267682A (en) * 1999-03-19 2000-09-29 Victor Co Of Japan Ltd Convolutional arithmetic unit
JP2006245865A (en) * 2005-03-02 2006-09-14 Yamaha Corp Convolution arithmetic operation method and convolution arithmetic operation processing program
CN108388541A (en) * 2016-04-22 2018-08-10 北京中科寒武纪科技有限公司 Convolution algorithm device and method
CN106815566A (en) * 2016-12-29 2017-06-09 天津中科智能识别产业技术研究院有限公司 A kind of face retrieval method based on multitask convolutional neural networks
WO2018120989A1 (en) * 2016-12-29 2018-07-05 华为技术有限公司 Convolution operation chip and communication device
CN107679621A (en) * 2017-04-19 2018-02-09 北京深鉴科技有限公司 Artificial neural network processing unit
CN109284822A (en) * 2017-07-20 2019-01-29 上海寒武纪信息科技有限公司 A kind of neural network computing device and method
CN108241890A (en) * 2018-01-29 2018-07-03 清华大学 A kind of restructural neural network accelerated method and framework

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Bang; Lai Jinmei. Design and Implementation of an FPGA-based Convolutional Neural Network Accelerator. Journal of Fudan University (Natural Science), 2018, (02), full text. *
Yang Yichen; Liang Feng; Zhang Guohe; He Ping; Wu Bin; Gao Zhenting. Design of a Convolutional Neural Network Coprocessor Based on Programmable Logic Devices. Journal of Xi'an Jiaotong University, 2018, (07), full text. *

Also Published As

Publication number Publication date
CN111523652A (en) 2020-08-11

Similar Documents

Publication Publication Date Title
CN111523652B (en) Processor, data processing method thereof and image pickup device
CN111199273B (en) Convolution calculation method, device, equipment and storage medium
EP3625756A1 (en) Configurable and programmable image processor unit
EP3855367A1 (en) Operation accelerator, processing method, and related device
CN108229360B (en) Image processing method, device and storage medium
CN107909537B (en) Image processing method based on convolutional neural network and mobile terminal
CN107680028B (en) Processor and method for scaling an image
US20200218777A1 (en) Signal Processing Method and Apparatus
CN105023241A (en) Fast image interpolation method for mobile terminal
US11816871B2 (en) Real-time low latency computer vision/machine learning compute accelerator with smart convolutional neural network scheduler
CN112905530A (en) On-chip architecture, pooled computational accelerator array, unit and control method
US20220188961A1 (en) Data processing method and sensor device for performing the same
US20220301278A1 (en) Image processing method and apparatus, storage medium, and electronic device
KR102092049B1 (en) SIMD sliding window operation
WO2023169369A1 (en) Pedestrian re-identification method, system, apparatus and device, and medium
CN113128673B (en) Data processing method, storage medium, neural network processor and electronic device
CN112016665B (en) Method and device for calculating running time of neural network on processor
CN111401522B (en) Pulsation array variable speed control method and variable speed pulsation array micro-frame system
CN113870091A (en) Convolution calculation method, system, device and storage medium
US8503793B2 (en) Correlation processing apparatus and medium readable by correlation processing apparatus
KR20220084845A (en) Npu device performing convolution calculation based on the number of channels and method of thereof
KR102400009B1 (en) Mobile device including multiple cameras
CN111654627B (en) Digital zooming method, device, equipment and storage medium
CN117291240B (en) Convolutional neural network accelerator and electronic device
CN110930290A (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant