CN114428761A - Neural network music composition method and device based on FPGA - Google Patents

Neural network music composition method and device based on FPGA

Info

Publication number
CN114428761A
Authority
CN
China
Prior art keywords
neural network
module
fpga
hardware accelerator
chip
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210052457.1A
Other languages
Chinese (zh)
Inventor
凌味未
相博镪
赵良平
胡双
邹金成
李蠡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology
Priority to CN202210052457.1A
Publication of CN114428761A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/76: Architectures of general purpose stored program computers
    • G06F15/78: Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807: System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H: ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00: Details of electrophonic musical instruments
    • G10H1/0033: Recording/reproducing or transmission of music for electrophonic musical instruments
    • G10H1/0041: Recording/reproducing or transmission of music for electrophonic musical instruments in coded form
    • G10H2210/00: Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/101: Music composition or musical creation; Tools or processes therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Neurology (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Stored Programmes (AREA)

Abstract

The invention discloses an FPGA-based neural network music composition method and device. The device comprises a display module, a control key module, an audio decoding module, and an FPGA module carrying a neural network hardware accelerator soft core. The FPGA module controls the other modules and performs the artificial intelligence data operations, and carries a system on chip comprising an instruction-set-architecture-based neural network hardware accelerator, a data scheduling module, and a memory. The neural network hardware accelerator performs operations according to a built neural network model; during operation, the data scheduling module moves model weights from the memory to the neural network hardware accelerator, and after an operation result is obtained, it moves the audio waveform data corresponding to the inferred musical notes from the memory to the audio decoding module for playback. The invention addresses the limitations of existing neural network music composition approaches in computing power and reconfigurability.

Description

Neural network music composition method and device based on FPGA
Technical Field
The invention relates to the technical field of neural network hardware acceleration, and in particular to a neural network music composition method and device based on an FPGA (field programmable gate array).
Background
To improve the efficiency of music creation and achieve more novel musical effects, algorithmic composition has been applied to computer-aided composition systems to various degrees; genetic algorithms, artificial neural networks, Markov chains, and hybrid algorithms are the most widely used approaches. With the development of artificial intelligence technology in recent years, artificial neural networks have become common in music application systems. At present, companies such as AIVA, Google, and NetEase in China can achieve high-quality artificial intelligence audio processing and music creation on the server side; on the edge computing side, given cost and application-scenario constraints, trade-offs are often required among computing power, power consumption, and reconfigurability, leaving considerable room for improvement. Meanwhile, artificial intelligence composition models usually take a recurrent neural network as their core, whereas most existing neural network accelerators target convolutional neural networks and lack targeted optimization for recurrent neural networks.
Disclosure of Invention
To solve these problems, the invention provides an FPGA-based neural network music composition method and device that address the limitations of existing neural network composition approaches in computing power and reconfigurability. The following technical scheme is adopted:
an FPGA-based neural network warping device, comprising:
the display module is used for displaying playback status information;
the control key module is used for selecting different playback modes;
the audio decoding module is used for playing music automatically generated through artificial intelligence computation;
the FPGA module carries a neural network hardware accelerator soft core and is used for controlling the display module, the control key module, and the audio decoding module, and for performing the artificial intelligence data operations. The FPGA module carries a system on chip comprising an instruction-set-architecture-based neural network hardware accelerator, a data scheduling module, and a memory. The neural network hardware accelerator performs operations according to a built neural network model; during operation, the data scheduling module moves model weights from the memory to the neural network hardware accelerator, and after an operation result is obtained, the audio waveform data corresponding to the inferred musical notes are moved from the memory to the audio decoding module for playback.
Furthermore, the system on chip uses a soft-core CPU as the controller, uses a DDR3 SDRAM and a TF card as memories, and implements audio decoding through an audio CODEC chip; it carries a UART, a serial peripheral interface (SPI), I2C, I2S, and a DDR3 SDRAM controller, all connected by an AHB bus.
Further, the system on chip communicates with an upper computer and prints log information through the UART interface, reads and writes the TF card through the SPI interface, and configures and transmits data to the audio CODEC chip through the I2C and I2S interfaces.
Further, the system on chip obtains from the TF card the data produced by training the neural network model and the audio waveform data corresponding to the musical notes; these are read into the DDR3 SDRAM when the system on chip starts. The different types of weights in the neural network model are stored at addresses in the TF card and the DDR3 SDRAM according to fixed rules, and the user program controls the neural network hardware accelerator according to these addresses to move the weights.
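As an illustration of this address-rule scheme, the sketch below shows how a user program might derive weight addresses; the patent publishes no memory map, so every base address, stride, offset, and name here is a hypothetical assumption:

    /* Hypothetical weight layout: each GRU layer's weight blocks sit at
     * rule-derived offsets from an assumed DDR3 SDRAM base address. */
    #include <stdint.h>

    #define DDR3_WEIGHT_BASE 0x02000000u   /* assumed start of weight region */
    #define GRU_LAYER_STRIDE 0x00100000u   /* assumed spacing between layers */

    typedef struct {
        uint32_t w_ih;   /* input-to-hidden weights (r, z, h gates) */
        uint32_t w_hh;   /* hidden-to-hidden weights                */
        uint32_t bias;   /* gate biases                             */
    } gru_layer_addr_t;

    /* Derive one layer's weight addresses from the fixed layout rule. */
    static gru_layer_addr_t gru_layer_addresses(uint32_t layer)
    {
        uint32_t base = DDR3_WEIGHT_BASE + layer * GRU_LAYER_STRIDE;
        gru_layer_addr_t a = { base, base + 0x40000u, base + 0x80000u };
        return a;
    }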
Furthermore, the neural network hardware accelerator can receive instructions from the user program through CPU writes to bus-mapped registers, and according to these instructions perform memory read/write access operations, on-chip dedicated cache operations, and compute resource module operations.
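A minimal sketch of this register-driven control path follows; the register offsets, command codes, and polling protocol are assumptions for illustration, not the patent's actual interface:

    #include <stdint.h>

    #define NNA_BASE     0x40010000u      /* assumed AHB-mapped base address */
    #define NNA_REG_CMD  (*(volatile uint32_t *)(NNA_BASE + 0x0))
    #define NNA_REG_ARG  (*(volatile uint32_t *)(NNA_BASE + 0x4))
    #define NNA_REG_STAT (*(volatile uint32_t *)(NNA_BASE + 0x8))

    enum { NNA_CMD_LOAD_WEIGHT = 1, NNA_CMD_MATVEC = 2, NNA_CMD_ACTIVATE = 3 };

    /* Write one instruction into the accelerator's bus registers and
     * busy-wait on the status register until the done bit is set. */
    static void nna_issue(uint32_t cmd, uint32_t arg)
    {
        NNA_REG_ARG = arg;                /* e.g. a DDR3 SDRAM source address */
        NNA_REG_CMD = cmd;
        while ((NNA_REG_STAT & 0x1u) == 0) { /* poll until done */ }
    }

The data flow scheduling module described next can be driven in the same way, through its own block of bus-mapped registers.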
Further, the data flow scheduling module can receive instructions from the user program through CPU writes to bus-mapped registers, and according to these instructions configure data scheduling among the off-chip memory, the on-chip modules, and the on-chip caches.
Furthermore, the neural network hardware accelerator takes multi-path parallel multipliers as its compute core, with multiple distributed buffers temporarily storing intermediate operation data; the activation functions are implemented in hardware with piecewise function fitting, exploiting the symmetry of sigmoid and tanh.
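A sketch of this activation strategy, using the well-known PLAN piecewise-linear fit as a stand-in (the patent's own segment boundaries are not published): the function is evaluated on |x| only, negative inputs are recovered from the symmetry sigmoid(-x) = 1 - sigmoid(x), and tanh reuses the same fit via tanh(x) = 2*sigmoid(2x) - 1.

    /* Piecewise-linear sigmoid (PLAN approximation) exploiting symmetry. */
    float sigmoid_pw(float x)
    {
        float ax = x < 0.0f ? -x : x;            /* evaluate on |x| only */
        float y;
        if      (ax < 1.0f)   y = 0.25f    * ax + 0.5f;
        else if (ax < 2.375f) y = 0.125f   * ax + 0.625f;
        else if (ax < 5.0f)   y = 0.03125f * ax + 0.84375f;
        else                  y = 1.0f;          /* saturated region     */
        return x < 0.0f ? 1.0f - y : y;          /* apply the symmetry   */
    }

    /* tanh from the same fit via its relation to sigmoid. */
    float tanh_pw(float x) { return 2.0f * sigmoid_pw(2.0f * x) - 1.0f; }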
Further, the neural network hardware accelerator generates pseudo-random numbers with a linear feedback shift register (LFSR) and implements the exponential operation with a lookup table over multiple intervals; together these provide a hardware implementation of SOFTMAX.
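For the pseudo-random source, a 16-bit Fibonacci LFSR is one representative choice; the patent states neither the register width nor the taps, so the maximal-length polynomial below (x^16 + x^14 + x^13 + x^11 + 1) is an assumption:

    #include <stdint.h>

    static uint16_t lfsr_state = 0xACE1u;        /* any nonzero seed works */

    /* Advance the LFSR one step: XOR the tap bits into a new MSB. */
    static uint16_t lfsr_next(void)
    {
        uint16_t bit = ((lfsr_state >> 0) ^ (lfsr_state >> 2) ^
                        (lfsr_state >> 3) ^ (lfsr_state >> 5)) & 1u;
        lfsr_state = (uint16_t)((lfsr_state >> 1) | (bit << 15));
        return lfsr_state;
    }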
Further, the neural network model is built as follows: a word embedding model is configured through software programming to encode the musical notes, and the network is built from three GRU layers, one fully connected layer, and SOFTMAX.
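The patent does not write out its GRU gate equations; for reference, one standard formulation is given below (a common convention, assumed here). The reset gate r_t and update gate z_t are what the dedicated buffers Data_r and Data_z of the detailed description hold:

    r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)
    z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)
    \tilde{h}_t = \tanh\big(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\big)
    h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t

Here \sigma is the sigmoid and \odot is element-wise multiplication, which is why the accelerator's activation hardware covers exactly sigmoid and tanh.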
An FPGA-based neural network music composition method comprises the following steps:
s1, the neural network hardware accelerator performs operation according to a built neural network model, and in the operation process, the data scheduling module moves model weight from the memory to the neural network hardware accelerator for operation;
and S2, after the operation result is obtained, the data scheduling module moves the audio waveform data corresponding to the musical note obtained by inference from the memory to the audio decoding module for playing.
The invention has the beneficial effects that:
(1) The real-time neural network music composition method and device can accommodate, at the edge computing side, various artificial intelligence neural networks built around a recurrent neural network core, and the integrated hardware accelerator improves operation performance over a general-purpose processor.
(2) To deploy at the edge computing side a neural network composition device with low power consumption that still guarantees computing power and reconfigurability, the invention is designed on an FPGA hardware platform: a soft-core CPU serves as the main controller, a neural network hardware accelerator is responsible for computation, and several peripheral controller modules are integrated for user interaction.
(3) To accelerate neural network composition more efficiently, the neural network hardware accelerator is designed as the core. The accelerator is built on a single-instruction multiple-data instruction set optimized for recurrent neural networks, so that while guaranteeing operation speed it can adapt to recurrent neural networks of different layer counts and dimensions and to their derivative models.
Drawings
FIG. 1 is a schematic structural diagram of a neural network music composition device according to the present invention;
FIG. 2 is a schematic diagram of a system-on-chip architecture of the FPGA platform of the present invention;
FIG. 3 is a functional diagram of the data flow scheduling of the present invention;
FIG. 4 is a schematic diagram of real-time playback audio waveform data according to the present invention;
FIG. 5 is a schematic diagram of a user program deployment of the present invention;
FIG. 6 is a diagram of a neural network hardware accelerator architecture of the present invention.
Detailed Description
In order to more clearly understand the technical features, objects, and effects of the present invention, specific embodiments of the present invention will now be described. It should be understood that the detailed description and specific examples, while indicating the preferred embodiment of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present embodiment provides an FPGA-based neural network music composition method and device. The composition device includes an upper computer interface 1, a control key module 2, a display module 3, a TF card 4 (Trans-flash Card), an FPGA module 5 carrying a neural network hardware accelerator soft core, an audio decoding module 6, a DDR3 SDRAM 7, a debugging module 8, and a power management module 9. In addition, the display module 3 is connected to a display screen 10, the audio decoding module 6 is connected to a loudspeaker 11, and the power management module 9 is connected to a lithium battery.
In this embodiment, the system on chip is implemented on an FPGA hardware platform. Referring to fig. 2, a soft-core CPU 13 serves as the controller core, and AHB and APB buses carry the peripheral modules. The parameters obtained by neural network pre-training and the audio waveform data corresponding to the tones are stored in the TF card 4; when the system starts, the data scheduling module moves the data from the TF card 4 to the DDR3 SDRAM 7. During the operation of the neural network hardware accelerator 27, the data scheduling module moves the model weights from the DDR3 SDRAM 7 to the neural network hardware accelerator 27, and after an operation result is obtained, moves the audio data corresponding to the inferred musical notes from the DDR3 SDRAM 7 to the audio decoding module 6, with reference to figs. 3 and 4.
In this embodiment, developers are supported in adapting the neural network hardware accelerator 27 to recurrent neural network models of different scales through software programming. Referring to fig. 5, when the whole system is reset, the program first declares the variables required for post-processing, then initializes the UART (universal asynchronous receiver-transmitter) 26, the TIMER 25, the GPIO (general-purpose input/output port) 15, the SPI (serial peripheral interface) 16, the audio decoding module 6, the DDR3 SDRAM controller module 19, and the neural network hardware accelerator 27, and finally prints a welcome message over the serial port. After every module is initialized, the preparation stage before operation begins: the CPU 13 reads the neural network model parameter data and the sound source corresponding to each note from the TF card 4 and writes them into the DDR3 SDRAM 7, so that the data can be read at high speed in the subsequent operating state. Once the data are prepared, the program gives the neural network an initial value and operation begins. In this program the neural network hardware accelerator 27 does not run continuously; it serves the data playback function. The program polls the status register of the playback module, and when the data in the play buffer fall below a threshold it starts the neural network hardware accelerator 27 and, once the result is obtained, places the data to be played into the buffer of the audio decoding module 6, so that the whole system plays in real time.
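Condensed into C, the run loop just described might look as follows; the status register address and the threshold are hypothetical, and nna_run_one_step / audio_enqueue_note stand in for the accelerator-start and buffer-fill routines:

    #include <stdint.h>

    #define PLAY_REG_LEVEL (*(volatile uint32_t *)0x40020008u) /* assumed */
    #define PLAY_THRESHOLD 256u                                /* assumed */

    extern uint32_t nna_run_one_step(void);            /* returns inferred note */
    extern void     audio_enqueue_note(uint32_t note); /* DDR3 -> CODEC buffer  */

    void compose_and_play_forever(void)
    {
        for (;;) {
            if (PLAY_REG_LEVEL < PLAY_THRESHOLD) {
                uint32_t note = nna_run_one_step(); /* accelerator runs on demand */
                audio_enqueue_note(note);
            }
            /* otherwise keep polling; the accelerator stays idle */
        }
    }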
The architecture of the neural network hardware accelerator 27 of this embodiment is shown in fig. 6, and its workflow is as follows. First, the CPU 13 notifies the neural network hardware accelerator 27 of the SDRAM address of the word embedding vector corresponding to this round's input value, and the accelerator fetches it into the buffer Data_in 33. The biases and weights required for the current computation are extracted in turn into their corresponding buffers, and the controller performs parallel multiply-add operations in the mode selected by the instruction. All operation results are gathered into the buffer Data_out 34; in certain special operations, the buffers Data_tmp0 and Data_tmp1 automatically copy the operation results of Data_out 34. The buffers Data_r 28 and Data_z 29 are dedicated buffers required by the GRU computation flow, and Data_h 32 is the dedicated hidden-layer buffer, whose contents are updated after each GRU layer finishes computing. After both the GRU and the output layer have been computed for the current time step, the final hardware stage begins: each element of the result vector undergoes the exponential operation, the results are treated as the probabilities of the corresponding indices, a sample is drawn from the resulting multinomial distribution, the index of the drawn sample is taken as the hardware-side final result for the current time step, and the CPU 13 is notified by an interrupt that this time step's operation is complete.
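The final sampling stage can be pictured with the floating-point sketch below; the hardware replaces expf with the interval lookup table and rand with the LFSR from the earlier sketches, and all names here are illustrative:

    #include <math.h>
    #include <stddef.h>
    #include <stdlib.h>

    /* Exponentiate the output vector, treat it as an (unnormalized)
     * multinomial distribution, and draw one index by inverse CDF. */
    size_t sample_note(const float *logits, size_t n)
    {
        float probs[128];                /* assumes n <= 128 candidate notes */
        float sum = 0.0f;
        for (size_t i = 0; i < n; i++) {
            probs[i] = expf(logits[i]);  /* hardware: interval LUT           */
            sum += probs[i];
        }
        float r = sum * ((float)rand() / (float)RAND_MAX); /* hardware: LFSR */
        float acc = 0.0f;
        for (size_t i = 0; i < n; i++) {
            acc += probs[i];
            if (r <= acc) return i;
        }
        return n - 1;                    /* guard against rounding           */
    }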
The foregoing is illustrative of the preferred embodiments of this invention, and the invention is not limited to the precise form disclosed herein; various other combinations, modifications, and environments may be resorted to within the scope of the concept disclosed herein, as described above or as apparent to those skilled in the relevant art. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. An FPGA-based neural network music composition device, characterized in that it comprises:
the display module is used for displaying playback status information;
the control key module is used for selecting different playback modes;
the audio decoding module is used for playing music automatically generated through artificial intelligence computation;
the FPGA module carries a neural network hardware accelerator soft core and is used for controlling the display module, the control key module, and the audio decoding module, and for performing the artificial intelligence data operations; the FPGA module carries a system on chip comprising an instruction-set-architecture-based neural network hardware accelerator, a data scheduling module, and a memory; the neural network hardware accelerator performs operations according to a built neural network model; during operation, the data scheduling module moves model weights from the memory to the neural network hardware accelerator, and after an operation result is obtained, the audio waveform data corresponding to the inferred musical notes are moved from the memory to the audio decoding module for playback.
2. The FPGA-based neural network music composition device according to claim 1, wherein the system on chip uses a soft-core CPU as the controller, uses a DDR3 SDRAM and a TF card as memories, implements audio decoding through an audio CODEC chip, and carries a UART, a serial peripheral interface (SPI), I2C, I2S, and a DDR3 SDRAM controller, connected by an AHB bus.
3. The FPGA-based neural network music composition device according to claim 2, wherein the system on chip communicates with an upper computer and prints log information through the UART interface, reads and writes the TF card through the SPI interface, and configures and transmits data to the audio CODEC chip through the I2C and I2S interfaces.
4. The FPGA-based neural network music composition device according to claim 2, wherein the system on chip obtains from the TF card the data produced by training the neural network model and the audio waveform data corresponding to the musical notes, which are read into the DDR3 SDRAM when the system on chip starts; the different types of weights in the neural network model are stored at addresses in the TF card and the DDR3 SDRAM according to fixed rules, and the user program controls the neural network hardware accelerator according to these addresses to move the weights.
5. The FPGA-based neural network music composition device according to claim 2, wherein the neural network hardware accelerator is capable of receiving instructions from the user program through CPU writes to bus-mapped registers, and performing memory read/write access operations, on-chip dedicated cache operations, and compute resource module operations according to the instructions.
6. The FPGA-based neural network music composition device according to claim 2, wherein the data flow scheduling module is capable of receiving instructions from the user program through CPU writes to bus-mapped registers, and configuring data scheduling among the off-chip memory, the on-chip modules, and the on-chip caches according to the instructions.
7. The FPGA-based neural network music composition device according to claim 2, wherein the neural network hardware accelerator takes multi-path parallel multipliers as its compute core, with multiple distributed buffers temporarily storing intermediate operation data; the activation functions are implemented in hardware with piecewise function fitting, exploiting the symmetry of sigmoid and tanh.
8. The FPGA-based neural network music composition device according to claim 2, wherein the neural network hardware accelerator generates pseudo-random numbers with a linear feedback shift register (LFSR) and implements the exponential operation with a lookup table over multiple intervals, together providing a hardware implementation of SOFTMAX.
9. The FPGA-based neural network music composition device according to claim 1, wherein the neural network model is built as follows: a word embedding model is configured through software programming to encode the musical notes, and the network is built from three GRU layers, one fully connected layer, and SOFTMAX.
10. An FPGA-based neural network music composition method, applied to the FPGA-based neural network music composition device of any one of claims 1-9, characterized by comprising the following steps:
S1, the neural network hardware accelerator performs operations according to a built neural network model; during operation, the data scheduling module moves model weights from the memory to the neural network hardware accelerator;
S2, after the operation result is obtained, the data scheduling module moves the audio waveform data corresponding to the inferred musical notes from the memory to the audio decoding module for playback.
CN202210052457.1A 2022-01-18 2022-01-18 Neural network music composition method and device based on FPGA Pending CN114428761A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210052457.1A CN114428761A (en) 2022-01-18 2022-01-18 Neural network music composition method and device based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210052457.1A CN114428761A (en) 2022-01-18 2022-01-18 Neural network music composition method and device based on FPGA

Publications (1)

Publication Number Publication Date
CN114428761A true CN114428761A (en) 2022-05-03

Family

ID=81313015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210052457.1A Pending CN114428761A (en) 2022-01-18 2022-01-18 Neural network music composition method and device based on FPGA

Country Status (1)

Country Link
CN (1) CN114428761A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117271434A (en) * 2023-11-15 2023-12-22 成都维德青云电子有限公司 On-site programmable system-in-chip
CN117271434B (en) * 2023-11-15 2024-02-09 成都维德青云电子有限公司 On-site programmable system-in-chip

Similar Documents

Publication Publication Date Title
WO2018171717A1 (en) Automated design method and system for neural network processor
CN110310628B (en) Method, device and equipment for optimizing wake-up model and storage medium
CN107346351A (en) For designing FPGA method and system based on the hardware requirement defined in source code
CN101027633B (en) An apparatus and method for address generation using a hybrid adder
CN101231589B (en) System and method for developing embedded software in-situ
US8725486B2 (en) Apparatus and method for simulating a reconfigurable processor
CN103870335B (en) System and method for efficient resource management of signal flow programmed digital signal processor code
CN109146067A (en) A kind of Policy convolutional neural networks accelerator based on FPGA
CN113313247B (en) Operation method of sparse neural network based on data flow architecture
CN104915427B (en) A kind of figure processing optimization method based on breadth first traversal
CN113222133A (en) FPGA-based compressed LSTM accelerator and acceleration method
CN111563582A (en) Method for realizing and optimizing accelerated convolution neural network on FPGA (field programmable Gate array)
Fujii et al. A threshold neuron pruning for a binarized deep neural network on an FPGA
CN114428761A (en) Neural network music composition method and device based on FPGA
Vipin ZyNet: Automating deep neural network implementation on low-cost reconfigurable edge computing platforms
CN108805277A (en) Depth belief network based on more FPGA accelerates platform and its design method
CN116126354A (en) Model deployment method, device, electronic equipment and storage medium
WO2021031137A1 (en) Artificial intelligence application development system, computer device and storage medium
Hou et al. Architecting efficient multi-modal aiot systems
He et al. Design and Implementation of Embedded Real‐Time English Speech Recognition System Based on Big Data Analysis
Sen et al. Dataflow-based mapping of computer vision algorithms onto FPGAs
EP2956874A2 (en) Device and method for accelerating the update phase of a simulation kernel
Bai et al. An OpenCL-based FPGA accelerator with the Winograd’s minimal filtering algorithm for convolution neuron networks
CN113806431A (en) Method for transmitting simulation data, electronic system and storage medium
El-Shafei et al. Implementation of harmony search on embedded platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination