CN111325327B - Universal convolution neural network operation architecture based on embedded platform and use method - Google Patents


Info

Publication number
CN111325327B
CN111325327B
Authority
CN
China
Prior art keywords
module
neural network
data
convolutional neural
network operation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010150285.2A
Other languages
Chinese (zh)
Other versions
CN111325327A (en)
Inventor
曾小华
魏新
王正伟
刘志刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Jiuzhou Electric Group Co Ltd
Original Assignee
Sichuan Jiuzhou Electric Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Jiuzhou Electric Group Co Ltd
Priority to CN202010150285.2A
Publication of CN111325327A
Application granted
Publication of CN111325327B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses a general convolutional neural network operation architecture based on an embedded platform, and a method of using it. The architecture comprises a hardware part and a software part. The hardware part comprises an HPS, an FPGA connected to it, and a peripheral hardware circuit; the FPGA contains one or more convolutional neural network operation modules, and when several modules are present, each is independent of and parallel to the others. The software part is embedded software running in the HPS; it configures and schedules the hardware part and allocates memory space that the IP cores of the FPGA can access. Because the configurable, independent, parallel convolutional neural network operation modules are mounted in the FPGA, the software and hardware can be freely tailored, and convolutional neural networks of different complexity can be implemented quickly on embedded platforms with different amounts of resources.

Description

Universal convolution neural network operation architecture based on embedded platform and use method
Technical Field
The invention relates to a general convolutional neural network operation architecture based on an embedded platform and a method of using it.
Background
A Convolutional Neural Network (CNN) is an efficient neural-network-based recognition method. A CNN mainly comprises convolution layers, nonlinear activation layers, pooling layers, fully connected layers and an output layer. Deep learning with CNNs plays an important role in a growing number of fields, such as recognizing airplanes and tanks in military applications and recognizing audio, video and images in civilian applications. However, general-purpose computing platforms (CPU, GPU) cannot achieve a good balance of performance, power consumption and volume in an embedded environment, and applying convolutional-neural-network research results across industries requires implementing the networks on embedded platforms. The defining characteristic of an embedded platform is that its software and hardware can be tailored, and different platforms have different amounts of resources. Because CNNs are built from common functional modules while serving specialized functions, networks with different functions differ mainly in their network structures and weight parameters. Most existing FPGA acceleration platforms, however, suit only a fixed convolutional neural network and cannot quickly reconstruct the network; redesigning the functional modules for every convolutional network greatly increases the difficulty and time cost of development. A neural network operation architecture whose software and hardware can be tailored, and which can quickly realize networks of different complexity on different embedded platforms, would greatly accelerate the practical adoption of neural network technology.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the problems above, to provide a general convolutional neural network operation architecture based on an embedded platform and a method of using it.
The invention adopts a general convolutional neural network operation architecture based on an embedded platform, which comprises a hardware part and a software part;
the hardware part comprises an HPS and an FPGA which are connected, and a peripheral hardware circuit; the FPGA comprises one or more convolution neural network operation modules; when a plurality of convolutional neural network operation modules exist, each convolutional neural network operation module is independent and parallel to each other;
the software part is embedded software running in the HPS and used for configuring and scheduling the hardware part and allocating memory space which can be accessed by an IP core of the FPGA.
Furthermore, at the operating-system layer the embedded software accesses the virtual address to which the physical address of the FPGA's IP core is mapped, thereby accessing and scheduling the convolutional neural network operation module.
Furthermore, each convolutional neural network operation module comprises a storage unit and a plurality of functional modules, wherein the storage unit and each functional module are connected with the HPS through buses, and each functional module is configured and scheduled by the embedded software; the plurality of functional modules are a convolution module, a nonlinear activation module, a pooling module, an accumulation module, a full-connection module, an output module and a multiplexer module; the convolution module, the nonlinear activation module, the pooling module, the accumulation module, the full-connection module and the output module are mutually independent; and the multiplexer module is used for connecting each functional module with the storage unit in a time-sharing manner, so that each functional module performs data interaction through the storage unit.
Furthermore, a convolution module, a nonlinear activation module, a pooling module, an accumulation module, a full-connection module and an output module in the functional modules are composed of a state machine, a configuration module, a data reading module, a data writing module, an operation processing module and a response module;
the state machine is used for skipping the state of the functional module;
the response module is used for responding the embedded software to the state query of the functional module;
the configuration module is connected with the data reading module, the data writing module and the operation processing module and is used for the embedded software to perform parameter configuration on the functional module;
the data reading module is used for reading data from the storage unit;
the input end of the operation processing module is connected with the data reading module and is used for performing operation processing on the data read by the data reading module;
and the data writing module is connected with the output end of the operation processing module and is used for writing the operation processing result of the operation processing module into the storage unit.
Preferably, the storage unit includes:
the read-only data storage unit is used for reading data by the data reading module of each functional module;
and the write-only data storage unit is used for writing data into the data writing module of each functional module.
Further, each functional module has 5 operating states, including: an initial state, a configuration state, a processing state, an ending state and a response state; wherein:
the initial state, the configuration state, the processing state and the ending state form a cycle; these four states can only occur in cyclic order, and exactly one of them is active at any time;
the response state is independent of the other four states and may occur simultaneously with any of them.
Further, the content of the configuration of each functional module through the embedded software is as follows:
the configurable contents of the convolution module include: data input address, convolution kernel weight parameter address, data output address, convolution kernel size, input data size, whether to add bias, whether to enable internal nonlinear operation and the direction of reading and writing the storage unit;
the non-linear activation module configurable contents include: data input address, data output address, activation function selection and direction of reading and writing the storage unit;
the pooling module configurable content includes: selecting and reading and writing the direction of the storage unit by a data input address, a data output address, a pooling size and a pooling mode;
the configurable contents of the accumulation module include: data 1 input address, data 2 input address, data 3 input address, data output address, whether to enable internal nonlinear operation and the direction of reading and writing the memory cell;
the fully connected module configurable content includes: data input address, parameter input address, input layer neuron number, output layer neuron number, data output address, whether to add bias, whether to enable internal nonlinear operation, and the direction of reading and writing the memory cell.
The invention also provides a method of using the general convolutional neural network operation architecture based on the embedded platform, comprising the following steps:
(1) determining the number of convolutional neural network operation modules according to the designed neural network structure and the amount of resources of the embedded platform used;
(2) determining the types and numbers of functional modules in the convolutional neural network operation modules according to the designed neural network structure and the amount of resources of the embedded platform used;
(3) calculating the amount of parameters for each stage of operation processing according to the designed neural network structure, and dividing the storage space accordingly;
(4) saving the parameters of the trained neural network structure to a file, and copying the file to the embedded software directory as the weight parameter file of the embedded platform;
(5) writing the embedded software application program that schedules and configures each functional module in the convolutional neural network operation modules according to the designed neural network structure.
When in use, the operation flow of the embedded software is as follows:
(1) power-on initialization;
(2) the embedded software applies for virtual addresses for each functional module of the convolutional neural network operation module in the hardware part;
(3) the weights are written into the designated storage space;
(4) the fixed parameters used by each functional module during operation are configured;
(5) the functional modules are called in the designed order; the parameters that need to be modified are configured before each call, and the module enters the processing process once configuration is complete;
(6) while a functional module is processing, the parameters to be modified for the next functional module to be called are configured; after this configuration is complete, the software queries whether the previous functional module has finished, starts the current functional module once it has, and so on until all operations of the convolutional neural network are complete; finally, the operation result is read out and processed.
In summary, due to the adoption of the technical scheme, the invention has the beneficial effects that:
the invention realizes the rapid reconstruction of the network by using the embedded platform through the design of the operation architecture under the conditions of low power consumption and small volume, and simultaneously, the configurable independent and parallel convolutional neural network operation modules are mounted in the FPGA to realize the free cutting of software and hardware, so that the convolutional neural networks with different complexity can be rapidly realized on the embedded platforms with different resource quantities.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic structural diagram of a general convolutional neural network operation architecture based on an embedded platform according to the present invention.
Fig. 2 is a schematic structural diagram of the convolutional neural network operation module according to the present invention.
Fig. 3 is a schematic structural diagram of functional modules of the convolutional neural network operation module according to the present invention.
Fig. 4 is a schematic diagram of state transition of functional modules of the convolutional neural network operation module according to the present invention.
FIG. 5 is a diagram illustrating a configuration register structure of the convolution module according to the present invention.
FIG. 6 is a flow chart of a method for using the embedded platform-based general convolutional neural network operation architecture according to the present invention.
FIG. 7 is a flowchart illustrating the operation of embedded software when the general convolutional neural network architecture based on an embedded platform is used.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The features and properties of the present invention are described in further detail below with reference to examples.
The general convolutional neural network operation architecture based on the embedded platform provided by the embodiment comprises a hardware part and a software part;
the hardware part comprises an HPS and an FPGA which are connected, and a peripheral hardware circuit; the FPGA comprises one or more convolution neural network operation modules; when a plurality of convolutional neural network operation modules exist, each convolutional neural network operation module is independent and parallel to each other;
the software part is embedded software running in the HPS and used for configuring and scheduling the hardware part and allocating memory space which can be accessed by an IP core of the FPGA. Specifically, the embedded software accesses a virtual address mapped by the physical address of the IP core of the FPGA on the operating system layer to realize the access and scheduling of the convolutional neural network operation module.
As shown in fig. 1, the peripheral hardware circuit includes DDR, USB, SD card, serial port, network port, and the like. The HPS and the FPGA communicate through an AXI bus, an operating system and an application program of embedded software are placed in the SD card, the HPS loads the operating system and the application program of the embedded software from the SD card into the DDR to run in the running process, and data interaction is carried out with the outside through a USB, a serial port, a network port and the like. The FPGA comprises a plurality of convolutional neural network operation modules, and each convolutional neural network operation module is independent and does not influence each other.
Through the design of this operation architecture, the invention achieves rapid network reconstruction on an embedded platform under conditions of low power consumption and small volume. At the same time, the configurable, independent, parallel convolutional neural network operation modules mounted in the FPGA allow the software and hardware to be freely tailored, so that convolutional neural networks of different complexity can be realized quickly on embedded platforms with different amounts of resources.
As shown in fig. 2, each convolutional neural network operation module comprises a storage unit and a plurality of functional modules, wherein the storage unit and each functional module are connected with the HPS through a bus, and each functional module is configured and scheduled by embedded software; the plurality of functional modules are a convolution module, a nonlinear activation module, a pooling module, an accumulation module, a full-connection module, an output module and a multiplexer module; the convolution module, the nonlinear activation module, the pooling module, the accumulation module, the full-connection module and the output module are mutually independent; and the multiplexer module is used for connecting each functional module with the storage unit in a time-sharing manner, so that each functional module performs data interaction through the storage unit. Specifically, the method comprises the following steps:
the convolution module is used for convolution operation;
the nonlinear activation module is used for nonlinear activation operation;
the pooling module is used for pooling operation;
the accumulation module is used for carrying out accumulation operation on input data;
the full-connection module is used for full-connection operation;
the multi-path selector module is used for connecting each functional module with the storage unit in a time-sharing manner;
the storage unit is used for data interaction of each functional module and can store input data, weight parameter data, process data, result data and the like.
For convenience of illustration, the storage unit in this embodiment employs two dual-port RAMs, RAM1 and RAM2; the corresponding multiplexer module accordingly includes multiplexer 1 and multiplexer 2, and all functional modules share the two storage units.
As shown in fig. 3, the convolution module, the nonlinear activation module, the pooling module, the accumulation module, the full-connection module and the output module in the functional module are composed of a state machine, a configuration module, a data reading module, a data writing module, an operation processing module and a response module;
a. The state machine controls the state transitions of the functional module. As shown in fig. 4, each functional module has 5 operating states: an initial state, a configuration state, a processing state, an ending state and a response state. The initial, configuration, processing and ending states form a cycle; these four states can only occur in cyclic order, and exactly one of them is active at any time. The response state is independent of the other four and may occur simultaneously with any of them. Specifically, the functional module enters the initial state after power-on initialization is complete, enters the configuration state when the embedded software configures it, enters the processing state once configuration is complete, enters the ending state once processing is complete, and then returns to the initial state for the next cycle. The response state can be entered at any time during this cycle without affecting the other operating states of the functional module.
b. The response module is used to answer the embedded software's status queries of the functional module, and corresponds to the functional module's response state;
c. The configuration module is connected with the data reading module, the data writing module and the operation processing module and is used by the embedded software for parameter configuration of the functional module; each functional module can have its parameters configured during operation. The contents configured through the embedded software for each functional module are as follows:
the configurable contents of the convolution module include: data input address, convolution kernel weight parameter address, data output address, convolution kernel size, input data size, whether to add bias, whether to enable internal nonlinear operation and the direction of reading and writing the storage unit; in this embodiment, the convolution operation with a convolution kernel size of 2-11 may be implemented, and the hardware parameters may also be modified so that the configured convolution kernel size is larger or smaller than 2-11.
The non-linear activation module configurable contents include: data input address, data output address, activation function selection and direction of reading and writing the storage unit;
the pooling module configurable content includes: selecting and reading and writing the direction of the storage unit by a data input address, a data output address, a pooling size and a pooling mode; in this embodiment, the pooling operation with a pooling window of 2 to 32 may be implemented, and the pooling operation of the maximum value or the average value may also be implemented by embedded software configuration.
The configurable contents of the accumulation module include: data 1 input address, data 2 input address, data 3 input address, data output address, whether to enable internal nonlinear operation and the direction of reading and writing the memory cell;
the fully connected module configurable content includes: data input address, parameter input address, input layer neuron number, output layer neuron number, data output address, whether to add bias, whether to enable internal nonlinear operation, and the direction of reading and writing the memory cell.
As described above, the nonlinear operation is integrated inside the convolution module, the accumulation module and the full-connection module, and the internal nonlinear operation of each module can be enabled as needed, saving hardware resources and shortening operation processing time.
As shown in fig. 5, the convolution module of this embodiment includes 32 8-bit configuration registers corresponding to different configuration contents; some configuration registers are reserved and can be used for subsequent upgrades. The other modules are similar to the convolution module: they also contain configuration registers, and different working modes are realized by setting the parameters of these registers.
d. The data reading module is used for reading data from the storage unit;
e. the input end of the operation processing module is connected with the data reading module and is used for performing operation processing on the data read by the data reading module;
f. and the data writing module is connected with the output end of the operation processing module and is used for writing the operation processing result of the operation processing module into the storage unit.
Preferably, the storage unit is divided into: a read-only data storage unit and a write-only data storage unit; wherein:
the read-only data storage unit is used for reading data by the data reading module of each functional module;
and the write-only data storage unit is used for writing data into the data writing module of each functional module.
Since this embodiment uses two storage units, RAM1 can serve as the read-only data storage unit and RAM2 as the write-only data storage unit. The data in RAM1 is read by the data reading module, processed by the operation processing module, and the result is written into RAM2 by the data writing module.
As shown in fig. 6, the method of using the general convolutional neural network operation architecture based on the embedded platform comprises the following steps:
(1) determining the number of convolutional neural network operation modules according to the designed neural network structure and the amount of resources of the embedded platform used;
(2) determining the types and numbers of functional modules in the convolutional neural network operation modules according to the designed neural network structure and the amount of resources of the embedded platform used;
(3) calculating the amount of parameters for each stage of operation processing according to the designed neural network structure, and dividing the storage space accordingly;
(4) saving the parameters of the trained neural network structure to a file, and copying the file to the embedded software directory as the weight parameter file of the embedded platform;
(5) writing the embedded software application program that schedules and configures each functional module in the convolutional neural network operation modules according to the designed neural network structure.
As this method of use shows, the general convolutional neural network operation architecture based on the embedded platform allows the software and hardware to be tailored as required, so that convolutional neural networks of different complexity can be realized quickly on embedded platforms with different amounts of resources.
As shown in fig. 7, when in use, the operation flow of the embedded software is as follows:
(1) power-on initialization;
(2) the embedded software applies for virtual addresses for each functional module of the convolutional neural network operation module in the hardware part;
(3) the weights are written into the designated storage space;
(4) the fixed parameters used by each functional module during operation are configured;
(5) the functional modules are called in the designed order; the parameters that need to be modified are configured before each call, and the module enters the processing process once configuration is complete;
(6) while a functional module is processing, the parameters to be modified for the next functional module to be called are configured; after this configuration is complete, the software queries whether the previous functional module has finished, starts the current functional module once it has, and so on until all operations of the convolutional neural network are complete; finally, the operation result is read out and processed.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (8)

1. A general convolutional neural network operation architecture based on an embedded platform is characterized by comprising a hardware part and a software part;
the hardware part comprises an HPS and an FPGA which are connected, and a peripheral hardware circuit; the FPGA comprises one or more convolution neural network operation modules; when a plurality of convolutional neural network operation modules exist, each convolutional neural network operation module is independent and parallel to each other; each convolution neural network operation module comprises a storage unit and a plurality of functional modules, wherein the storage unit and each functional module are connected with the HPS through a bus, and each functional module is configured and scheduled by embedded software; the plurality of functional modules are a convolution module, a nonlinear activation module, a pooling module, an accumulation module, a full-connection module, an output module and a multiplexer module; the convolution module, the nonlinear activation module, the pooling module, the accumulation module, the full-connection module and the output module are mutually independent; the multiplexer module is used for connecting each functional module with the storage unit in a time-sharing manner, so that each functional module performs data interaction through the storage unit;
the software part is embedded software running in the HPS and used for configuring and scheduling the hardware part and allocating memory space which can be accessed by an IP core of the FPGA.
2. The embedded platform-based general convolutional neural network operation architecture of claim 1, wherein the embedded software implements access to and scheduling of the convolutional neural network operation module by accessing the virtual address to which the physical address of the IP core of the FPGA is mapped at the operating system layer.
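On a typical Linux-based HPS, the physical-to-virtual mapping described in claim 2 is commonly obtained by mmap-ing `/dev/mem`. The following is a minimal sketch, not part of the claimed subject matter; the register base address `IP_CORE_PHYS` is a hypothetical placeholder, since the real value comes from the FPGA design's address map:

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical physical base address of an FPGA IP core's register block;
 * the real value is fixed by the FPGA project's address map. */
#define IP_CORE_PHYS 0xFF200040u
#define MAP_SPAN     0x1000u

/* mmap requires a page-aligned physical address: split the register
 * address into its page base and the residual offset inside that page. */
static uint32_t page_base(uint32_t phys, uint32_t page)   { return phys & ~(page - 1); }
static uint32_t page_offset(uint32_t phys, uint32_t page) { return phys &  (page - 1); }

/* Map the IP core's registers into this process's virtual address space;
 * returns NULL on failure (requires root privileges on a real system). */
static volatile uint32_t *map_ip_core(void)
{
    uint32_t page = (uint32_t)sysconf(_SC_PAGESIZE);
    int fd = open("/dev/mem", O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;
    uint8_t *va = mmap(NULL, MAP_SPAN + page_offset(IP_CORE_PHYS, page),
                       PROT_READ | PROT_WRITE, MAP_SHARED, fd,
                       page_base(IP_CORE_PHYS, page));
    close(fd);
    if (va == MAP_FAILED)
        return NULL;
    return (volatile uint32_t *)(va + page_offset(IP_CORE_PHYS, page));
}
```

The embedded software would then configure and query each functional module by reading and writing 32-bit registers through the returned pointer.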
3. The embedded platform-based general convolutional neural network operation architecture of claim 1, wherein each of the convolution module, the nonlinear activation module, the pooling module, the accumulation module, the full-connection module and the output module among the functional modules is composed of a state machine, a configuration module, a data reading module, a data writing module, an operation processing module and a response module;
the state machine is used for transitioning between the states of the functional module;
the response module is used for responding to state queries of the functional module from the embedded software;
the configuration module is connected with the data reading module, the data writing module and the operation processing module and is used for the embedded software to perform parameter configuration on the functional module;
the data reading module is used for reading data from the storage unit;
the input end of the operation processing module is connected with the data reading module and is used for performing operation processing on the data read by the data reading module;
and the data writing module is connected with the output end of the operation processing module and is used for writing the operation processing result of the operation processing module into the storage unit.
4. The embedded platform based general convolutional neural network operation architecture as claimed in claim 3, wherein said storage unit comprises:
the read-only data storage unit is used for reading data by the data reading module of each functional module;
and a write-only data storage unit, into which the data writing module of each functional module writes data.
5. The embedded platform-based general convolutional neural network operation architecture of claim 3, wherein each functional module has 5 operating states, including: an initial state, a configuration state, a processing state, an ending state and a response state; wherein,
the initial state, the configuration state, the processing state and the ending state form a cycle, meaning that these 4 states can only occur cyclically, and at any time there is one and only one running state;
the response state is independent of the other four states, and the response state may occur simultaneously with any of the other four operating states.
6. The embedded platform-based general convolutional neural network operation architecture of claim 3, wherein the configuration of each functional module through the embedded software is as follows:
the configurable contents of the convolution module include: data input address, convolution kernel weight parameter address, data output address, convolution kernel size, input data size, whether to add bias, whether to enable internal nonlinear operation, and the direction of reading and writing the storage unit;
the configurable contents of the nonlinear activation module include: data input address, data output address, activation function selection, and the direction of reading and writing the storage unit;
the configurable contents of the pooling module include: data input address, data output address, pooling size, pooling mode selection, and the direction of reading and writing the storage unit;
the configurable contents of the accumulation module include: data 1 input address, data 2 input address, data 3 input address, data output address, whether to enable internal nonlinear operation, and the direction of reading and writing the storage unit;
the configurable contents of the full-connection module include: data input address, parameter input address, number of input layer neurons, number of output layer neurons, data output address, whether to add bias, whether to enable internal nonlinear operation, and the direction of reading and writing the storage unit.
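The configurable contents of the convolution module listed in claim 6 map naturally onto a memory-mapped register block. The struct below is a hypothetical layout for illustration only; the field order, widths, and names are assumptions, since the real layout is fixed by the FPGA design:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout for the convolution module's configurable
 * contents (claim 6). Each field is a 32-bit register; offsets are
 * assumptions for illustration, not taken from the patent. */
typedef struct {
    uint32_t data_in_addr;  /* data input address in the storage unit     */
    uint32_t weight_addr;   /* convolution kernel weight parameter address */
    uint32_t data_out_addr; /* data output address                        */
    uint32_t kernel_size;   /* convolution kernel size, e.g. 3 for 3x3    */
    uint32_t input_size;    /* input data (feature map) size              */
    uint32_t add_bias;      /* 1: add bias, 0: skip bias                  */
    uint32_t enable_act;    /* 1: enable internal nonlinear operation     */
    uint32_t rw_direction;  /* direction of reading/writing storage unit  */
} conv_regs_t;
```

With such a layout, the embedded software configures the module by overlaying `conv_regs_t` on the virtual address obtained from the operating-system mapping and writing the fields in place.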
7. A use method of a general convolutional neural network operation architecture based on an embedded platform is characterized by comprising the following steps:
(1) determining the number of convolutional neural network operation modules according to the designed neural network structure and the available resources of the embedded platform;
(2) determining the types and the number of functional modules in the convolutional neural network operation module according to the designed neural network structure and the available resources of the embedded platform;
(3) calculating the parameter quantity of each operation processing process according to the designed neural network structure, and dividing the storage space;
(4) saving the parameters of the trained neural network structure as a file, and copying the file to the embedded software directory to be used as the weight parameter file of the embedded platform;
(5) and writing an application program of embedded software for scheduling and configuring each functional module in the convolutional neural network operation module according to the designed neural network structure.
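Step (3) above, computing the parameter quantity of each operation in order to divide the storage space, reduces to standard layer-shape arithmetic. A sketch, assuming the conventional convolution and fully-connected parameter formulas (not stated explicitly in the patent):

```c
#include <assert.h>
#include <stdint.h>

/* Parameter quantity of a convolution layer: one k*k kernel per
 * (input channel, output channel) pair, plus one bias value per
 * output channel when bias is enabled. The result sizes the weight
 * region to reserve in the storage unit. */
static uint64_t conv_param_count(uint32_t in_ch, uint32_t out_ch,
                                 uint32_t k, int has_bias)
{
    return (uint64_t)out_ch * in_ch * k * k + (has_bias ? out_ch : 0);
}

/* Parameter quantity of a fully-connected layer: one weight per
 * (input neuron, output neuron) pair, plus optional per-output bias. */
static uint64_t fc_param_count(uint32_t in_n, uint32_t out_n, int has_bias)
{
    return (uint64_t)in_n * out_n + (has_bias ? out_n : 0);
}
```

For example, a 3x3 convolution with 16 input channels, 32 output channels and bias needs 32*16*3*3 + 32 = 4640 parameters, which (times the chosen word width) is the space to carve out of the storage unit for that layer's weights.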
8. The use method of the embedded platform-based general convolutional neural network operation architecture as claimed in claim 7, wherein the operation flow of the embedded software is as follows:
(1) power-on initialization;
(2) the embedded software applies for virtual addresses for each functional module of the convolutional neural network operation module in the hardware part, and writes the weights into a designated storage space;
(3) configuring fixed parameters of each functional module in the operation process;
(4) calling according to the designed sequence of each functional module, configuring parameters to be modified before calling, and entering a processing process after completing configuration;
(5) while a functional module is processing, the parameters to be modified of the next functional module to be called are configured; after the configuration is completed, whether the previous functional module has finished is queried, the current functional module is started once it has finished, and so on until all operations of the convolutional neural network are completed; finally, the operation result is read out and processed.
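The overlapped scheduling in steps (4) and (5) can be sketched as a loop that configures module i while module i-1 is still processing, then polls the previous module before starting the current one. The three hardware operations are stubbed here so the call ordering can be inspected; the function names are illustrative, not from the patent:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stubs standing in for register writes/polls; each appends its name to
 * a log so the scheduling order produced by the loop can be checked. */
static char call_log[256];

static void configure(int i) { char b[16]; sprintf(b, "cfg%d ", i);   strcat(call_log, b); }
static void wait_done(int i) { char b[16]; sprintf(b, "wait%d ", i);  strcat(call_log, b); }
static void start(int i)     { char b[16]; sprintf(b, "start%d ", i); strcat(call_log, b); }

/* Run n functional modules in the designed order with overlapped
 * configuration: program module i's registers during module i-1's run,
 * poll module i-1's completion, then kick off module i. */
static void run_network(int n)
{
    call_log[0] = '\0';
    for (int i = 0; i < n; i++) {
        configure(i);                /* configure while the previous module runs */
        if (i > 0)
            wait_done(i - 1);        /* query whether the previous module finished */
        start(i);                    /* start the current functional module */
    }
    wait_done(n - 1);                /* wait for the last module, then read results */
}
```

Overlapping configuration with processing in this way hides the register-programming latency behind the hardware's compute time, which is the point of step (5).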
CN202010150285.2A 2020-03-06 2020-03-06 Universal convolution neural network operation architecture based on embedded platform and use method Active CN111325327B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010150285.2A CN111325327B (en) 2020-03-06 2020-03-06 Universal convolution neural network operation architecture based on embedded platform and use method

Publications (2)

Publication Number Publication Date
CN111325327A CN111325327A (en) 2020-06-23
CN111325327B true CN111325327B (en) 2022-03-08

Family

ID=71169493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010150285.2A Active CN111325327B (en) 2020-03-06 2020-03-06 Universal convolution neural network operation architecture based on embedded platform and use method

Country Status (1)

Country Link
CN (1) CN111325327B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112581366B (en) * 2020-11-30 2022-05-20 黑龙江大学 Portable image super-resolution system and system construction method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940815A (en) * 2017-02-13 2017-07-11 西安交通大学 A kind of programmable convolutional neural networks Crypto Coprocessor IP Core
CN106951961A (en) * 2017-02-24 2017-07-14 清华大学 The convolutional neural networks accelerator and system of a kind of coarseness restructural
CN109711533A (en) * 2018-12-20 2019-05-03 西安电子科技大学 Convolutional neural networks module based on FPGA
CN109784489A (en) * 2019-01-16 2019-05-21 北京大学软件与微电子学院 Convolutional neural networks IP kernel based on FPGA
CN109934339A (en) * 2019-03-06 2019-06-25 东南大学 A kind of general convolutional neural networks accelerator based on a dimension systolic array

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108090565A (en) * 2018-01-16 2018-05-29 电子科技大学 Accelerated method is trained in a kind of convolutional neural networks parallelization
CN109086867B (en) * 2018-07-02 2021-06-08 武汉魅瞳科技有限公司 Convolutional neural network acceleration system based on FPGA
US11151769B2 (en) * 2018-08-10 2021-10-19 Intel Corporation Graphics architecture including a neural network pipeline
GB2587032B (en) * 2019-09-16 2022-03-16 Samsung Electronics Co Ltd Method for designing accelerator hardware

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
FPGA-Based CNN Inference Accelerator Synthesized from Multi-Threaded C Software;Jin Hee Kim等;《SOCC》;20171021;第1-6页 *
Scalable FPGA Accelerator for Deep Convolutional Neural Networks with Stochastic Streaming;Mohammed Alawad等;《IEEE TRANSACTIONS ON MULTI-SCALE COMPUTING SYSTEMS》;20181017;第4卷(第4期);第1-13页 *
The Design of Lightweight and Multi Parallel CNN Accelerator Based on FPGA;LI Zong-ling等;《ITAIC 2019》;20190805;第1521-1528页 *
Design of an FPGA Parallel Acceleration Scheme for Convolutional Neural Networks (in Chinese); Fang Rui; Computer Engineering and Applications (《计算机工程与应用》); 20150430; vol. 51, no. 8, pp. 32-36 *
Research and Implementation of FPGA-Based Convolutional Neural Network Acceleration Methods (in Chinese); Qiu Yue; China Masters' Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 20190115 (no. 2019-01); I140-269, section 3.2.1, figure 3-1 *
Research on Parallel Structures of FPGA-Based Convolutional Neural Networks (in Chinese); Lu Zhijian; China Doctoral Dissertations Full-text Database, Information Science and Technology (《中国博士学位论文全文数据库 信息科技辑》); 20140415 (no. 2014-04); I140-12 *
Design and Implementation of a Deep Convolutional Neural Network Acceleration System Based on Heterogeneous Processors (in Chinese); Jiang Diankun; China Masters' Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》); 20190115 (no. 2019-1); I138-934, section 4.1, figure 4-1 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant