CN111325327B - Universal convolution neural network operation architecture based on embedded platform and use method - Google Patents


Info

Publication number
CN111325327B
CN111325327B
Authority
CN
China
Prior art keywords
module
neural network
data
convolutional neural
network operation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010150285.2A
Other languages
Chinese (zh)
Other versions
CN111325327A (en)
Inventor
曾小华
魏新
王正伟
刘志刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Jiuzhou Electric Group Co Ltd
Original Assignee
Sichuan Jiuzhou Electric Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Jiuzhou Electric Group Co Ltd
Priority to CN202010150285.2A
Publication of CN111325327A
Application granted
Publication of CN111325327B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a general convolutional neural network operation architecture based on an embedded platform, and a method of using it. The architecture comprises a hardware part and a software part. The hardware part comprises an HPS, an FPGA connected to it, and a peripheral hardware circuit; the FPGA contains one or more convolutional neural network operation modules, and when several modules are present, each is independent of and parallel to the others. The software part is embedded software running in the HPS; it configures and schedules the hardware part and allocates memory space that the IP cores of the FPGA can access. Because the configurable, independent, parallel convolutional neural network operation modules are mounted in the FPGA, the software and hardware can be freely tailored, and convolutional neural networks of different complexity can be implemented quickly on embedded platforms with different amounts of resources.

Description

Universal convolution neural network operation architecture based on embedded platform and use method
Technical Field
The invention relates to a general convolutional neural network operation architecture based on an embedded platform and a method of using it.
Background
A Convolutional Neural Network (CNN) is an efficient neural-network-based recognition method. A CNN mainly comprises convolution layers, nonlinear activation layers, pooling layers, fully connected layers and an output layer. Deep learning with CNNs plays an important role in a growing number of fields, such as recognizing airplanes and tanks in military applications and recognizing audio, video and images in civilian applications. However, general-purpose computing platforms (CPU, GPU) cannot achieve a good balance of performance, power consumption and volume in an embedded environment, and applying convolutional-neural-network research results across industries requires implementing the networks on embedded platforms. The defining characteristic of an embedded platform is that its software and hardware can be tailored, and different platforms have different amounts of resources. Because CNNs are built from common functional modules while serving specialized functions, networks with different functions differ mainly in their network structures and weight parameters. Most existing FPGA acceleration platforms, however, suit only a fixed convolutional neural network and cannot quickly reconstruct the network; redesigning the functional modules for every convolutional network greatly increases the difficulty and time cost of development. A neural network operation architecture whose software and hardware can be tailored, and which can quickly realize networks of different complexity on different embedded platforms, would greatly accelerate the practical adoption of neural network technology.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the problems above, to provide a general convolutional neural network operation architecture based on an embedded platform and a method of using it.
The invention adopts a general convolutional neural network operation architecture based on an embedded platform, which comprises a hardware part and a software part;
the hardware part comprises an HPS and an FPGA which are connected, and a peripheral hardware circuit; the FPGA comprises one or more convolution neural network operation modules; when a plurality of convolutional neural network operation modules exist, each convolutional neural network operation module is independent and parallel to each other;
the software part is embedded software running in the HPS and used for configuring and scheduling the hardware part and allocating memory space which can be accessed by an IP core of the FPGA.
Furthermore, at the operating-system layer the embedded software accesses the virtual address to which the physical address of the FPGA's IP core is mapped, thereby accessing and scheduling the convolutional neural network operation module.
Furthermore, each convolutional neural network operation module comprises a storage unit and a plurality of functional modules, wherein the storage unit and each functional module are connected with the HPS through buses, and each functional module is configured and scheduled by the embedded software; the plurality of functional modules are a convolution module, a nonlinear activation module, a pooling module, an accumulation module, a full-connection module, an output module and a multiplexer module; the convolution module, the nonlinear activation module, the pooling module, the accumulation module, the full-connection module and the output module are mutually independent; and the multiplexer module is used for connecting each functional module with the storage unit in a time-sharing manner, so that each functional module performs data interaction through the storage unit.
Furthermore, a convolution module, a nonlinear activation module, a pooling module, an accumulation module, a full-connection module and an output module in the functional modules are composed of a state machine, a configuration module, a data reading module, a data writing module, an operation processing module and a response module;
the state machine is used for skipping the state of the functional module;
the response module is used for responding the embedded software to the state query of the functional module;
the configuration module is connected with the data reading module, the data writing module and the operation processing module and is used for the embedded software to perform parameter configuration on the functional module;
the data reading module is used for reading data from the storage unit;
the input end of the operation processing module is connected with the data reading module and is used for performing operation processing on the data read by the data reading module;
and the data writing module is connected with the output end of the operation processing module and is used for writing the operation processing result of the operation processing module into the storage unit.
Preferably, the storage unit includes:
the read-only data storage unit is used for reading data by the data reading module of each functional module;
and the write-only data storage unit is used for writing data into the data writing module of each functional module.
Further, each functional module has 5 operating states, including: an initial state, a configuration state, a processing state, an ending state and a response state; wherein:
the initial state, the configuration state, the processing state and the ending state form a cycle; these four states can only occur in cyclic order, and exactly one of them is active at any time;
the response state is independent of the other four states and may occur simultaneously with any of them.
Further, the content of the configuration of each functional module through the embedded software is as follows:
the configurable contents of the convolution module include: data input address, convolution kernel weight parameter address, data output address, convolution kernel size, input data size, whether to add bias, whether to enable internal nonlinear operation and the direction of reading and writing the storage unit;
the non-linear activation module configurable contents include: data input address, data output address, activation function selection and direction of reading and writing the storage unit;
the pooling module configurable content includes: selecting and reading and writing the direction of the storage unit by a data input address, a data output address, a pooling size and a pooling mode;
the configurable contents of the accumulation module include: data 1 input address, data 2 input address, data 3 input address, data output address, whether to enable internal nonlinear operation and the direction of reading and writing the memory cell;
the fully connected module configurable content includes: data input address, parameter input address, input layer neuron number, output layer neuron number, data output address, whether to add bias, whether to enable internal nonlinear operation, and the direction of reading and writing the memory cell.
The invention also provides a method of using the general convolutional neural network operation architecture based on the embedded platform, comprising the following steps:
(1) determining the number of convolutional neural network operation modules according to the designed neural network structure and the amount of resources of the embedded platform used;
(2) determining the types and numbers of functional modules in the convolutional neural network operation modules according to the designed neural network structure and the amount of resources of the embedded platform used;
(3) calculating the amount of parameters for each stage of operation processing according to the designed neural network structure, and dividing the storage space accordingly;
(4) saving the parameters of the trained neural network structure to a file, and copying the file to the embedded software directory as the weight parameter file of the embedded platform;
(5) writing the embedded software application program that schedules and configures each functional module in the convolutional neural network operation modules according to the designed neural network structure.
When in use, the operation flow of the embedded software is as follows:
(1) power-on initialization;
(2) the embedded software applies for virtual addresses for each functional module of the convolutional neural network operation module in the hardware part;
(3) the weights are written into the designated storage space;
(4) the fixed parameters used by each functional module during operation are configured;
(5) the functional modules are called in the designed order; the parameters that need to be modified are configured before each call, and the module enters the processing process once configuration is complete;
(6) while a functional module is processing, the parameters to be modified for the next functional module to be called are configured; after this configuration is complete, the software queries whether the previous functional module has finished, starts the current functional module once it has, and so on until all operations of the convolutional neural network are complete; finally, the operation result is read out and processed.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
the invention realizes the rapid reconstruction of the network by using the embedded platform through the design of the operation architecture under the conditions of low power consumption and small volume, and simultaneously, the configurable independent and parallel convolutional neural network operation modules are mounted in the FPGA to realize the free cutting of software and hardware, so that the convolutional neural networks with different complexity can be rapidly realized on the embedded platforms with different resource quantities.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic structural diagram of a general convolutional neural network operation architecture based on an embedded platform according to the present invention.
Fig. 2 is a schematic structural diagram of the convolutional neural network operation module according to the present invention.
Fig. 3 is a schematic structural diagram of functional modules of the convolutional neural network operation module according to the present invention.
Fig. 4 is a schematic diagram of state transition of functional modules of the convolutional neural network operation module according to the present invention.
FIG. 5 is a diagram illustrating a configuration register structure of the convolution module according to the present invention.
FIG. 6 is a flow chart of a method for using the embedded platform-based general convolutional neural network operation architecture according to the present invention.
FIG. 7 is a flowchart illustrating the operation of embedded software when the general convolutional neural network architecture based on an embedded platform is used.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The features and properties of the present invention are described in further detail below with reference to examples.
The general convolutional neural network operation architecture based on the embedded platform provided by the embodiment comprises a hardware part and a software part;
the hardware part comprises an HPS and an FPGA which are connected, and a peripheral hardware circuit; the FPGA comprises one or more convolution neural network operation modules; when a plurality of convolutional neural network operation modules exist, each convolutional neural network operation module is independent and parallel to each other;
the software part is embedded software running in the HPS and used for configuring and scheduling the hardware part and allocating memory space which can be accessed by an IP core of the FPGA. Specifically, the embedded software accesses a virtual address mapped by the physical address of the IP core of the FPGA on the operating system layer to realize the access and scheduling of the convolutional neural network operation module.
As shown in fig. 1, the peripheral hardware circuit includes DDR, USB, SD card, serial port, network port, and the like. The HPS and the FPGA communicate through an AXI bus, an operating system and an application program of embedded software are placed in the SD card, the HPS loads the operating system and the application program of the embedded software from the SD card into the DDR to run in the running process, and data interaction is carried out with the outside through a USB, a serial port, a network port and the like. The FPGA comprises a plurality of convolutional neural network operation modules, and each convolutional neural network operation module is independent and does not influence each other.
Through the design of this operation architecture, the invention achieves rapid network reconstruction on an embedded platform under conditions of low power consumption and small volume. At the same time, the configurable, independent, parallel convolutional neural network operation modules mounted in the FPGA allow the software and hardware to be freely tailored, so that convolutional neural networks of different complexity can be realized quickly on embedded platforms with different amounts of resources.
As shown in fig. 2, each convolutional neural network operation module comprises a storage unit and a plurality of functional modules, wherein the storage unit and each functional module are connected with the HPS through a bus, and each functional module is configured and scheduled by embedded software; the plurality of functional modules are a convolution module, a nonlinear activation module, a pooling module, an accumulation module, a full-connection module, an output module and a multiplexer module; the convolution module, the nonlinear activation module, the pooling module, the accumulation module, the full-connection module and the output module are mutually independent; and the multiplexer module is used for connecting each functional module with the storage unit in a time-sharing manner, so that each functional module performs data interaction through the storage unit. Specifically, the method comprises the following steps:
the convolution module is used for convolution operation;
the nonlinear activation module is used for nonlinear activation operation;
the pooling module is used for pooling operation;
the accumulation module is used for carrying out accumulation operation on input data;
the full-connection module is used for full-connection operation;
the multi-path selector module is used for connecting each functional module with the storage unit in a time-sharing manner;
the storage unit is used for data interaction of each functional module and can store input data, weight parameter data, process data, result data and the like.
For convenience of illustration, the storage unit in this embodiment employs two dual-port RAMs, RAM1 and RAM2; the corresponding multiplexer module accordingly includes multiplexer 1 and multiplexer 2, and all functional modules share the two storage units.
As shown in fig. 3, the convolution module, the nonlinear activation module, the pooling module, the accumulation module, the full-connection module and the output module in the functional module are composed of a state machine, a configuration module, a data reading module, a data writing module, an operation processing module and a response module;
a. The state machine controls the state transitions of the functional module. As shown in fig. 4, each functional module has 5 operating states: an initial state, a configuration state, a processing state, an ending state and a response state. The initial, configuration, processing and ending states form a cycle; these four states can only occur in cyclic order, and exactly one of them is active at any time. The response state is independent of the other four and may occur simultaneously with any of them. Specifically, the functional module enters the initial state after power-on initialization is complete, enters the configuration state when the embedded software configures it, enters the processing state once configuration is complete, enters the ending state once processing is complete, and then returns to the initial state for the next cycle. The response state can be entered at any time during this cycle without affecting the other operating states of the functional module.
b. The response module is used to answer the embedded software's status queries of the functional module, and corresponds to the functional module's response state;
c. The configuration module is connected with the data reading module, the data writing module and the operation processing module and is used by the embedded software for parameter configuration of the functional module; each functional module can have its parameters configured during operation. The contents configured through the embedded software for each functional module are as follows:
the configurable contents of the convolution module include: data input address, convolution kernel weight parameter address, data output address, convolution kernel size, input data size, whether to add bias, whether to enable internal nonlinear operation and the direction of reading and writing the storage unit; in this embodiment, the convolution operation with a convolution kernel size of 2-11 may be implemented, and the hardware parameters may also be modified so that the configured convolution kernel size is larger or smaller than 2-11.
The non-linear activation module configurable contents include: data input address, data output address, activation function selection and direction of reading and writing the storage unit;
the pooling module configurable content includes: selecting and reading and writing the direction of the storage unit by a data input address, a data output address, a pooling size and a pooling mode; in this embodiment, the pooling operation with a pooling window of 2 to 32 may be implemented, and the pooling operation of the maximum value or the average value may also be implemented by embedded software configuration.
The configurable contents of the accumulation module include: data 1 input address, data 2 input address, data 3 input address, data output address, whether to enable internal nonlinear operation and the direction of reading and writing the memory cell;
the fully connected module configurable content includes: data input address, parameter input address, input layer neuron number, output layer neuron number, data output address, whether to add bias, whether to enable internal nonlinear operation, and the direction of reading and writing the memory cell.
As described above, the nonlinear operation is integrated inside the convolution module, the accumulation module and the full-connection module, and the internal nonlinear operation of each module can be enabled as needed, saving hardware resources and shortening operation processing time.
As shown in fig. 5, the convolution module of this embodiment includes 32 8-bit configuration registers corresponding to different configuration contents; some configuration registers are reserved and can be used for subsequent upgrades. The other modules are similar to the convolution module: they also contain configuration registers, and different working modes are realized by setting the parameters of these registers.
d. The data reading module is used for reading data from the storage unit;
e. the input end of the operation processing module is connected with the data reading module and is used for performing operation processing on the data read by the data reading module;
f. and the data writing module is connected with the output end of the operation processing module and is used for writing the operation processing result of the operation processing module into the storage unit.
Preferably, the storage unit is divided into: a read-only data storage unit and a write-only data storage unit; wherein:
the read-only data storage unit is used for reading data by the data reading module of each functional module;
and the write-only data storage unit is used for writing data into the data writing module of each functional module.
Since this embodiment uses two storage units, RAM1 can serve as the read-only data storage unit and RAM2 as the write-only data storage unit. The data in RAM1 is read by the data reading module, processed by the operation processing module, and the result is written into RAM2 by the data writing module.
As shown in fig. 6, the method of using the general convolutional neural network operation architecture based on the embedded platform comprises the following steps:
(1) determining the number of convolutional neural network operation modules according to the designed neural network structure and the amount of resources of the embedded platform used;
(2) determining the types and numbers of functional modules in the convolutional neural network operation modules according to the designed neural network structure and the amount of resources of the embedded platform used;
(3) calculating the amount of parameters for each stage of operation processing according to the designed neural network structure, and dividing the storage space accordingly;
(4) saving the parameters of the trained neural network structure to a file, and copying the file to the embedded software directory as the weight parameter file of the embedded platform;
(5) writing the embedded software application program that schedules and configures each functional module in the convolutional neural network operation modules according to the designed neural network structure.
As this method of use shows, the general convolutional neural network operation architecture based on the embedded platform allows the software and hardware to be tailored as required, so that convolutional neural networks of different complexity can be realized quickly on embedded platforms with different amounts of resources.
As shown in fig. 7, when in use, the operation flow of the embedded software is as follows:
(1) power-on initialization;
(2) the embedded software applies for virtual addresses for each functional module of the convolutional neural network operation module in the hardware part;
(3) the weights are written into the designated storage space;
(4) the fixed parameters used by each functional module during operation are configured;
(5) the functional modules are called in the designed order; the parameters that need to be modified are configured before each call, and the module enters the processing process once configuration is complete;
(6) while a functional module is processing, the parameters to be modified for the next functional module to be called are configured; after this configuration is complete, the software queries whether the previous functional module has finished, starts the current functional module once it has, and so on until all operations of the convolutional neural network are complete; finally, the operation result is read out and processed.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. A general convolutional neural network operation architecture based on an embedded platform is characterized by comprising a hardware part and a software part;
the hardware part comprises an HPS and an FPGA which are connected, and a peripheral hardware circuit; the FPGA comprises one or more convolution neural network operation modules; when a plurality of convolutional neural network operation modules exist, each convolutional neural network operation module is independent and parallel to each other; each convolution neural network operation module comprises a storage unit and a plurality of functional modules, wherein the storage unit and each functional module are connected with the HPS through a bus, and each functional module is configured and scheduled by embedded software; the plurality of functional modules are a convolution module, a nonlinear activation module, a pooling module, an accumulation module, a full-connection module, an output module and a multiplexer module; the convolution module, the nonlinear activation module, the pooling module, the accumulation module, the full-connection module and the output module are mutually independent; the multiplexer module is used for connecting each functional module with the storage unit in a time-sharing manner, so that each functional module performs data interaction through the storage unit;
the software part is embedded software running in the HPS and used for configuring and scheduling the hardware part and allocating memory space which can be accessed by an IP core of the FPGA.
2. The embedded platform-based general convolutional neural network operation architecture of claim 1, wherein the embedded software implements access to and scheduling of the convolutional neural network operation module by accessing the virtual address to which the physical address of the IP core of the FPGA is mapped at the operating system layer.
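On a typical Linux-based HPS, the physical-to-virtual mapping described in claim 2 is commonly obtained by mmap-ing `/dev/mem`. The following is a minimal sketch, not part of the claimed subject matter; the register base address `IP_CORE_PHYS` is a hypothetical placeholder, since the real value comes from the FPGA design's address map:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical physical base address of an FPGA IP core's register block;
 * the real value is fixed by the FPGA project's address map. */
#define IP_CORE_PHYS 0xFF200040u
#define MAP_SPAN     0x1000u

/* mmap requires a page-aligned physical address: split the register
 * address into its page base and the residual offset inside that page. */
static uint32_t page_base(uint32_t phys, uint32_t page)   { return phys & ~(page - 1); }
static uint32_t page_offset(uint32_t phys, uint32_t page) { return phys &  (page - 1); }

/* Map the IP core's registers into this process's virtual address space;
 * returns NULL on failure (requires root privileges on a real system). */
static volatile uint32_t *map_ip_core(void)
{
    uint32_t page = (uint32_t)sysconf(_SC_PAGESIZE);
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    uint8_t *va = mmap(NULL, MAP_SPAN + page_offset(IP_CORE_PHYS, page),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                       page_base(IP_CORE_PHYS, page));
    close(fd);
    if (va == MAP_FAILED)
        return NULL;
    return (volatile uint32_t *)(va + page_offset(IP_CORE_PHYS, page));
}
```

The embedded software would then configure and query each functional module by reading and writing 32-bit registers through the returned pointer.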
3. The embedded platform-based general convolutional neural network operation architecture of claim 1, wherein each of the convolution module, the nonlinear activation module, the pooling module, the accumulation module, the full-connection module and the output module among the functional modules is composed of a state machine, a configuration module, a data reading module, a data writing module, an operation processing module and a response module;
the state machine is used for transitioning between the states of the functional module;
the response module is used for responding to state queries of the functional module from the embedded software;
the configuration module is connected with the data reading module, the data writing module and the operation processing module and is used for the embedded software to perform parameter configuration on the functional module;
the data reading module is used for reading data from the storage unit;
the input end of the operation processing module is connected with the data reading module and is used for performing operation processing on the data read by the data reading module;
and the data writing module is connected with the output end of the operation processing module and is used for writing the operation processing result of the operation processing module into the storage unit.
4. The embedded platform based general convolutional neural network operation architecture as claimed in claim 3, wherein said storage unit comprises:
the read-only data storage unit is used for reading data by the data reading module of each functional module;
and a write-only data storage unit, into which the data writing module of each functional module writes data.
5. The embedded platform-based general convolutional neural network operation architecture of claim 3, wherein each functional module has 5 operating states, including: an initial state, a configuration state, a processing state, an ending state and a response state; wherein,
the initial state, the configuration state, the processing state and the ending state form a cycle, meaning that these 4 states can only occur cyclically, and at any time there is one and only one running state;
the response state is independent of the other four states, and the response state may occur simultaneously with any of the other four operating states.
6. The embedded platform-based general convolutional neural network operation architecture of claim 3, wherein the configuration of each functional module through the embedded software is as follows:
the configurable contents of the convolution module include: data input address, convolution kernel weight parameter address, data output address, convolution kernel size, input data size, whether to add bias, whether to enable internal nonlinear operation, and the direction of reading and writing the storage unit;
the configurable contents of the nonlinear activation module include: data input address, data output address, activation function selection, and the direction of reading and writing the storage unit;
the configurable contents of the pooling module include: data input address, data output address, pooling size, pooling mode selection, and the direction of reading and writing the storage unit;
the configurable contents of the accumulation module include: data 1 input address, data 2 input address, data 3 input address, data output address, whether to enable internal nonlinear operation, and the direction of reading and writing the storage unit;
the configurable contents of the full-connection module include: data input address, parameter input address, number of input layer neurons, number of output layer neurons, data output address, whether to add bias, whether to enable internal nonlinear operation, and the direction of reading and writing the storage unit.
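The configurable contents of the convolution module listed in claim 6 map naturally onto a memory-mapped register block. The struct below is a hypothetical layout for illustration only; the field order, widths, and names are assumptions, since the real layout is fixed by the FPGA design:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout for the convolution module's configurable
 * contents (claim 6). Each field is a 32-bit register; offsets are
 * assumptions for illustration, not taken from the patent. */
typedef struct {
    uint32_t data_in_addr;  /* data input address in the storage unit     */
    uint32_t weight_addr;   /* convolution kernel weight parameter address */
    uint32_t data_out_addr; /* data output address                        */
    uint32_t kernel_size;   /* convolution kernel size, e.g. 3 for 3x3    */
    uint32_t input_size;    /* input data (feature map) size              */
    uint32_t add_bias;      /* 1: add bias, 0: skip bias                  */
    uint32_t enable_act;    /* 1: enable internal nonlinear operation     */
    uint32_t rw_direction;  /* direction of reading/writing storage unit  */
} conv_regs_t;
```

With such a layout, the embedded software configures the module by overlaying `conv_regs_t` on the virtual address obtained from the operating-system mapping and writing the fields in place.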
7. A use method of a general convolutional neural network operation architecture based on an embedded platform is characterized by comprising the following steps:
(1) determining the number of convolutional neural network operation modules according to the designed neural network structure and the available resources of the embedded platform;
(2) determining the types and the number of functional modules in the convolutional neural network operation module according to the designed neural network structure and the available resources of the embedded platform;
(3) calculating the parameter quantity of each operation processing process according to the designed neural network structure, and dividing the storage space;
(4) saving the parameters of the trained neural network structure as a file, and copying the file to the embedded software directory to be used as the weight parameter file of the embedded platform;
(5) and writing an application program of embedded software for scheduling and configuring each functional module in the convolutional neural network operation module according to the designed neural network structure.
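Step (3) above, computing the parameter quantity of each operation in order to divide the storage space, reduces to standard layer-shape arithmetic. A sketch, assuming the conventional convolution and fully-connected parameter formulas (not stated explicitly in the patent):

```c
#include <assert.h>
#include <stdint.h>

/* Parameter quantity of a convolution layer: one k*k kernel per
 * (input channel, output channel) pair, plus one bias value per
 * output channel when bias is enabled. The result sizes the weight
 * region to reserve in the storage unit. */
static uint64_t conv_param_count(uint32_t in_ch, uint32_t out_ch,
                                 uint32_t k, int has_bias)
{
    return (uint64_t)out_ch * in_ch * k * k + (has_bias ? out_ch : 0);
}

/* Parameter quantity of a fully-connected layer: one weight per
 * (input neuron, output neuron) pair, plus optional per-output bias. */
static uint64_t fc_param_count(uint32_t in_n, uint32_t out_n, int has_bias)
{
    return (uint64_t)in_n * out_n + (has_bias ? out_n : 0);
}
```

For example, a 3x3 convolution with 16 input channels, 32 output channels and bias needs 32*16*3*3 + 32 = 4640 parameters, which (times the chosen word width) is the space to carve out of the storage unit for that layer's weights.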
8. The use method of the embedded platform-based general convolutional neural network operation architecture as claimed in claim 7, wherein the operation flow of the embedded software is as follows:
(1) power-on initialization;
(2) the embedded software applies for virtual addresses for each functional module of the convolutional neural network operation module in the hardware part, and writes the weights into a designated storage space;
(3) configuring fixed parameters of each functional module in the operation process;
(4) calling according to the designed sequence of each functional module, configuring parameters to be modified before calling, and entering a processing process after completing configuration;
(5) while a functional module is processing, the parameters to be modified of the next functional module to be called are configured; after the configuration is completed, whether the previous functional module has finished is queried, the current functional module is started once it has finished, and so on until all operations of the convolutional neural network are completed; finally, the operation result is read out and processed.
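The overlapped scheduling in steps (4) and (5) can be sketched as a loop that configures module i while module i-1 is still processing, then polls the previous module before starting the current one. The three hardware operations are stubbed here so the call ordering can be inspected; the function names are illustrative, not from the patent:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stubs standing in for register writes/polls; each appends its name to
 * a log so the scheduling order produced by the loop can be checked. */
static char call_log[256];

static void configure(int i) { char b[16]; sprintf(b, "cfg%d ", i);   strcat(call_log, b); }
static void wait_done(int i) { char b[16]; sprintf(b, "wait%d ", i);  strcat(call_log, b); }
static void start(int i)     { char b[16]; sprintf(b, "start%d ", i); strcat(call_log, b); }

/* Run n functional modules in the designed order with overlapped
 * configuration: program module i's registers during module i-1's run,
 * poll module i-1's completion, then kick off module i. */
static void run_network(int n)
{
    call_log[0] = '\0';
    for (int i = 0; i < n; i++) {
        configure(i);                /* configure while the previous module runs */
        if (i > 0)
            wait_done(i - 1);        /* query whether the previous module finished */
        start(i);                    /* start the current functional module */
    }
    wait_done(n - 1);                /* wait for the last module, then read results */
}
```

Overlapping configuration with processing in this way hides the register-programming latency behind the hardware's compute time, which is the point of step (5).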
CN202010150285.2A 2020-03-06 2020-03-06 Universal convolution neural network operation architecture based on embedded platform and use method Active CN111325327B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010150285.2A CN111325327B (en) 2020-03-06 2020-03-06 Universal convolution neural network operation architecture based on embedded platform and use method

Publications (2)

Publication Number Publication Date
CN111325327A CN111325327A (en) 2020-06-23
CN111325327B true CN111325327B (en) 2022-03-08

Family

ID=71169493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010150285.2A Active CN111325327B (en) 2020-03-06 2020-03-06 Universal convolution neural network operation architecture based on embedded platform and use method

Country Status (1)

Country Link
CN (1) CN111325327B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581366B (en) * 2020-11-30 2022-05-20 黑龙江大学 Portable image super-resolution system and system construction method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940815A (en) * 2017-02-13 2017-07-11 西安交通大学 A kind of programmable convolutional neural networks Crypto Coprocessor IP Core
CN106951961A (en) * 2017-02-24 2017-07-14 清华大学 The convolutional neural networks accelerator and system of a kind of coarseness restructural
CN109711533A (en) * 2018-12-20 2019-05-03 西安电子科技大学 Convolutional neural networks module based on FPGA
CN109784489A (en) * 2019-01-16 2019-05-21 北京大学软件与微电子学院 Convolutional neural networks IP kernel based on FPGA
CN109934339A (en) * 2019-03-06 2019-06-25 东南大学 A kind of general convolutional neural networks accelerator based on a dimension systolic array

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090565A (en) * 2018-01-16 2018-05-29 电子科技大学 Accelerated method is trained in a kind of convolutional neural networks parallelization
CN109086867B (en) * 2018-07-02 2021-06-08 武汉魅瞳科技有限公司 Convolutional neural network acceleration system based on FPGA
US11151769B2 (en) * 2018-08-10 2021-10-19 Intel Corporation Graphics architecture including a neural network pipeline
GB2587032B (en) * 2019-09-16 2022-03-16 Samsung Electronics Co Ltd Method for designing accelerator hardware

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software;Jin Hee Kim等;《SOCC》;20171021;第1-6页 *
Scalable FPGA Accelerator for Deep Convolutional Neural Networks with Stochastic Streaming;Mohammed Alawad等;《IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS》;20181017;第4卷(第4期);第1-13页 *
The Design of Lightweight and Multi Parallel CNN Accelerator Based on FPGA;LI Zong-ling等;《ITAIC 2019》;20190805;第1521-1528页 *
Design of an FPGA Parallel Acceleration Scheme for Convolutional Neural Networks (in Chinese); Fang Rui; Computer Engineering and Applications (《计算机工程与应用》); 20150430; vol. 51, no. 8, pp. 32-36 *
Research and Implementation of FPGA-Based Convolutional Neural Network Acceleration Methods (in Chinese); Qiu Yue; China Masters' Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 20190115 (no. 2019-01); I140-269, section 3.2.1, figure 3-1 *
Research on Parallel Structures of FPGA-Based Convolutional Neural Networks (in Chinese); Lu Zhijian; China Doctoral Dissertations Full-text Database, Information Science and Technology (《中国博士学位论文全文数据库 信息科技辑》); 20140415 (no. 2014-04); I140-12 *
Design and Implementation of a Deep Convolutional Neural Network Acceleration System Based on Heterogeneous Processors (in Chinese); Jiang Diankun; China Masters' Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 20190115 (no. 2019-1); I138-934, section 4.1, figure 4-1 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant