CN114495971A - Voice enhancement method for running neural network by adopting embedded hardware - Google Patents


Info

Publication number
CN114495971A
CN114495971A (application CN202210182933.1A)
Authority
CN
China
Prior art keywords
neural network
convolution
data
fpga
ced
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210182933.1A
Other languages
Chinese (zh)
Inventor
李恺旭
魏震益
杜怀云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Tianzhongxing Aviation Technology Co ltd
Original Assignee
Sichuan Tianzhongxing Aviation Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Tianzhongxing Aviation Technology Co ltd filed Critical Sichuan Tianzhongxing Aviation Technology Co ltd
Priority to CN202210182933.1A priority Critical patent/CN114495971A/en
Publication of CN114495971A publication Critical patent/CN114495971A/en
Pending legal-status Critical Current


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 - Processing in the frequency domain
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a voice enhancement method that runs a neural network on embedded hardware, relating to the field of voice signal processing and comprising the following steps: voice data are collected by a voice sensor, and an FPGA performs a Fourier transform on the voice data to obtain spectrogram data; an R-CED neural network is built from the logic units of the FPGA to obtain an R-CED neural network digital logic subsystem; the spectrogram data are denoised by the R-CED neural network digital logic subsystem; and the FPGA restores the denoised spectrogram data to the time domain to obtain the enhanced voice data. The invention is based on an embedded hardware platform and implements the neural network in FPGA logic. Compared with running the neural network on processors such as GPUs and CPUs, it fully exploits the parallelism of the FPGA, greatly improves processing speed, and guarantees the real-time performance of the voice enhancement processing.

Description

Voice enhancement method for running neural network by adopting embedded hardware
Technical Field
The invention relates to the field of voice signal processing, and in particular to a voice enhancement method that runs a neural network on embedded hardware.
Background
Voice enhancement is the technology of extracting, as far as possible, the clean target voice signal after it has been interfered with, or even submerged, by one or more noise sources in a complex environment, by suppressing and reducing the influence of the noise with a noise-reduction algorithm. It is widely applied in mobile communication, human-computer interaction, military communication and other fields to eliminate or weaken the negative effects of various noises.
With the development of Internet-of-Things technology, voice processing equipment is rapidly becoming more intelligent and more terminal-oriented, and voice enhancement technology is widely deployed on hardware platforms. However, the cloud computing model of the Internet of Things is ill-suited to terminal devices, because it consumes a large amount of network bandwidth and cannot provide real-time feedback. Edge computing models have emerged to compensate for these disadvantages of cloud computing.
Edge computing distributes computation tasks to lightweight devices close to the data source, so that part of the data is collected and processed locally and fed back to the user in real time. With advances in semiconductor manufacturing processes, semi-custom integrated-circuit chips such as the FPGA (Field Programmable Gate Array) and the system-on-chip (SoC) FPGA provide an application scenario for edge computing. Although such embedded devices offer local acquisition and local computation, their limited transmission bandwidth, storage resources and computing resources hinder large-scale applications.
Existing speech enhancement algorithms are usually based on machine-learning techniques, such as generative adversarial networks (GAN), GANs with an autoencoder structure, and Long Short-Term Memory (LSTM) networks. Most of these algorithms adopt neural network models with varied structures and deeper layers, trading a higher computational cost for a partial performance improvement, so such complex neural networks are difficult to realize on hardware platforms with limited resources.
Disclosure of Invention
In view of the above deficiencies in the prior art, the voice enhancement method of the present invention, which runs a neural network on embedded hardware, solves the problem that existing neural-network-based voice enhancement systems are difficult to implement on resource-limited embedded hardware platforms.
In order to achieve the purpose of the invention, the invention adopts the technical scheme that:
a speech enhancement method for operating a neural network by using embedded hardware comprises the following steps:
s1, voice data are collected through a voice sensor, and Fourier transform is carried out on the voice data through an FPGA to obtain spectrogram data;
s2, constructing an R-CED neural network by adopting logic units of the FPGA to obtain an R-CED neural network digital logic subsystem;
s3, denoising the spectrogram data through an R-CED neural network digital logic subsystem;
and S4, performing time domain restoration on the denoised spectrogram data through the FPGA to obtain voice enhancement data.
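The four steps above can be modeled in ordinary software (an illustrative Python sketch, not the FPGA implementation; the `denoise` function here is a trivial stand-in for the R-CED neural network subsystem, and the frame/hop sizes are arbitrary assumptions):

```python
import numpy as np

def stft(x, frame=256, hop=128):
    """S1: frame the voice signal and Fourier-transform each frame."""
    frames = [x[i:i + frame] * np.hanning(frame)
              for i in range(0, len(x) - frame + 1, hop)]
    return np.fft.rfft(np.asarray(frames), axis=1)   # spectrogram: (frames, bins)

def denoise(mag):
    """S2/S3: stand-in for the R-CED subsystem (a simple spectral floor)."""
    return np.maximum(mag - 0.1 * mag.mean(axis=0, keepdims=True), 0.0)

def istft(spec, frame=256, hop=128):
    """S4: overlap-add restoration of the denoised spectrogram to the time domain."""
    out = np.zeros(hop * (len(spec) - 1) + frame)
    for i, f in enumerate(spec):
        out[i * hop:i * hop + frame] += np.fft.irfft(f, frame)
    return out

x = np.random.randn(1024)                      # stands in for sensor samples
spec = stft(x)
clean_mag = denoise(np.abs(spec))
enhanced = istft(clean_mag * np.exp(1j * np.angle(spec)))  # reuse noisy phase
```

Note that only the magnitude spectrogram is denoised; the noisy phase is reused for the time-domain restoration, a common choice in spectrogram-domain enhancement.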
Further, in step S1, the Fourier transform of the voice data is performed on the programmable logic (PL) side of a Zynq7020 hardware platform FPGA;
in step S2, the R-CED neural network is built from logic units on the programmable logic (PL) side of the Zynq7020 hardware platform FPGA;
in step S4, the processing system (PS) side of the Zynq7020 hardware platform FPGA restores the denoised spectrogram data to the time domain.
Further, the step S2 includes the following sub-steps:
S21, constructing a neural network convolution module using logic units on the programmable logic (PL) side of the Zynq7020 hardware platform FPGA;
s22, building an R-CED neural network digital logic subsystem through a neural network convolution unit;
S23, constructing a neural network convolution parameter storage module using logic units on the programmable logic (PL) side of the Zynq7020 hardware platform FPGA;
and S24, storing the parameters of each convolution kernel module in the R-CED neural network digital logic subsystem through the neural network convolution parameter storage module.
Further, the neural network convolution module constructed in step S21 includes: the device comprises a shift register, at least one multiplication group unit, a convolution control unit and an accumulation unit;
the shift register is used for moving the input spectrogram data and the convolution weight parameters stored in the neural network convolution parameter storage module into the convolution operation module by shift operations, one step per FPGA clock cycle;
the multiplication group unit is used for carrying out multiplication operation on the input spectrogram data and the convolution operation weight parameters to obtain a convolution operation result;
the convolution control unit is used for controlling the time sequence of the shift register, the multiplication group unit and the accumulation unit through a preset convolution control finite state machine so as to realize convolution operation;
and the accumulation unit is used for accumulating the operation results of all the convolution control units.
Further, the R-CED neural network digital logic subsystem also includes an input spectrogram data padding module, configured to perform the zero-padding operation, in full-padding mode, while the neural network convolution module performs the convolution operation on the input spectrogram data.
The invention has the beneficial effects that:
1) The invention is based on an embedded hardware platform and implements the neural network in FPGA logic. Compared with running the neural network on processors such as GPUs and CPUs, it fully exploits the parallelism of the FPGA, greatly improves processing speed, and guarantees the real-time performance of the voice enhancement processing.
2) A Zynq7020 hardware platform FPGA with a built-in programmable logic (PL) side and processing system (PS) side is used. The method fully exploits their complementary characteristics: the programmable logic (PL) side readily builds digital logic modules for parallel data processing, while the processing system (PS) side readily executes complex serial program instructions. The work is divided accordingly, with neural-network noise reduction on the PL side and frequency-domain-to-time-domain conversion on the PS side.
3) For the characteristics of the Zynq7020 hardware platform FPGA, the basic building block of the R-CED neural network, the neural network convolution module, is designed so that the multi-cycle serial convolution operation is converted into pipelined parallel operation controlled by finite state machines, a mechanism unique to digital logic circuits, effectively increasing the processing speed of the R-CED neural network.
Drawings
FIG. 1 is a flowchart of a method for operating a neural network using embedded hardware according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a Zynq7020 type hardware platform FPGA structure according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a software program implementation of a neural network convolution module;
FIG. 4 is a block diagram of a neural network convolution module according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a shift register according to an embodiment of the present invention;
FIG. 6 is a diagram of a row accumulation structure of an accumulation unit according to an embodiment of the present invention;
FIG. 7 is a diagram of a column accumulation structure of an accumulation unit according to an embodiment of the present invention;
FIG. 8 is a layer state transition diagram of an embodiment of the present invention;
FIG. 9 is an operational state transition diagram of an embodiment of the present invention;
FIG. 10 is a data transport state transition diagram according to an embodiment of the present invention;
FIG. 11 is a state transition diagram of a fill operation according to an embodiment of the present invention;
fig. 12 is a schematic diagram illustrating a principle of a neural network convolution parameter storage module according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to facilitate understanding by those skilled in the art. It should be understood, however, that the invention is not limited to the scope of these embodiments; for those skilled in the art, various changes are possible without departing from the spirit and scope of the invention as defined in the appended claims, and all matter produced using the inventive concept is protected.
As shown in fig. 1, in one embodiment of the present invention, a speech enhancement method for operating a neural network by using embedded hardware comprises the following steps:
s1, performing Fourier transform on the voice data through a programmable logic PL terminal of the Zynq7020 type hardware platform FPGA to obtain spectrogram data.
S2, constructing an R-CED neural network by adopting logic units in a programmable logic PL (programmable logic) end of Zynq7020 type hardware platform FPGA to obtain the R-CED neural network digital logic subsystem.
The structure of the Zynq7020 hardware platform FPGA of this embodiment is shown in fig. 2. The programmable logic (PL) side provides a logic gate array whose interconnections can be controlled with a hardware description language (e.g. Verilog HDL), and the processing system (PS) side provides an ARM processor capable of running software programs. In fig. 2, dotted lines indicate the flow of control signals and solid lines the flow of data.
The core of the R-CED neural network digital logic subsystem is the convolution (CNN) realized in the FPGA. Step S2 of this embodiment includes the following sub-steps:
S21, constructing the neural network convolution module using logic units on the programmable logic (PL) side of the Zynq7020 hardware platform FPGA.
Typically, the neural network convolution module is implemented as a software program, shown in fig. 3. The convolution operation can be regarded as an accumulation of multiplications, comprising an output feature-map channel loop, an input feature-map channel loop, an input feature-map loop and a convolution kernel loop.
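As a concrete reference for the loop structure just described, the software form of fig. 3 can be sketched as follows (an illustrative Python version with hypothetical array shapes; the patent's hardware replaces these serial loops with parallel logic):

```python
import numpy as np

def conv_layer(fm, w, bias):
    """Nested-loop convolution as in Fig. 3: output-channel, input-channel,
    feature-map and kernel loops, with the convolution expressed as an
    accumulation of multiplications."""
    co, ci, kh, kw = w.shape                     # weights: (out ch, in ch, kh, kw)
    oh, ow = fm.shape[1] - kh + 1, fm.shape[2] - kw + 1
    out = np.zeros((co, oh, ow))
    for oc in range(co):                         # output feature-map channel loop
        for ic in range(ci):                     # input feature-map channel loop
            for y in range(oh):                  # input feature-map loop
                for x in range(ow):
                    for i in range(kh):          # convolution kernel loop
                        for j in range(kw):
                            out[oc, y, x] += fm[ic, y + i, x + j] * w[oc, ic, i, j]
        out[oc] += bias[oc]
    return out
```

A single 2 x 2 all-ones kernel over a 3 x 3 all-ones map yields 4.0 at every output position, which is a convenient sanity check for the loop bounds.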
The core idea of the invention is to realize the parallelization processing of the convolution operation by a digital logic gate circuit of hardware, thereby achieving the effect of hardware acceleration.
According to the parallelization analysis, in the convolution operation the layers execute serially while the operations within a single layer can execute in parallel, so the acceleration design targets the single-layer operation.
TABLE 1 convolution module parameters for each neural network
[Table 1 is reproduced only as an image in the original publication.]
As shown in the pseudo-code of fig. 3, a single-layer operation contains four nested loops, and different unrolling strategies are adopted for the different characteristics of these loops to accelerate the convolution operation:
Convolution kernel loop: fully unrolled; registers store all the feature-map data and convolution kernel data required for a single kernel operation. The convolution module parameters of each neural network layer are listed in Table 1.
Input feature-map loop: not unrolled; the data for the convolution operation come from RAM inside the FPGA chip, so multiple addresses cannot be read simultaneously to obtain the feature-map data of several different convolution windows as the kernel slides;
Input feature-map channel loop: partially unrolled; the feature map and the weights are divided by channel into 4 operation channels. Within an operation channel, reading and operating on the input feature map are serial; across operation channels they are parallel;
Output feature-map channel loop: partially unrolled; the weights are divided into 4 parts by output feature-map channel, each part being fully unrolled along the input feature-map channel loop, so each part contains 4 operation channels, 16 operation channels in total.
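The partitioning of the input channels into 4 operation channels can be sketched in software as follows (illustrative only; `lanes`, the single-window shapes and the function name are assumptions, and in hardware the lane loop runs in parallel rather than serially):

```python
import numpy as np

def conv_window_partial_unroll(fm, w, lanes=4):
    """One convolution window computed with the input channels partitioned into
    `lanes` operation channels: within a lane the channels are processed
    serially; across lanes the hardware works in parallel; the lane partial
    sums are then accumulated."""
    ci = fm.shape[0]
    partials = []
    for lane in range(lanes):                    # parallel lanes in hardware
        ics = range(lane, ci, lanes)             # input channels of this lane
        partials.append(sum(float(np.sum(fm[ic] * w[ic])) for ic in ics))
    return sum(partials)                         # final accumulation (adder tree)
```

Regardless of how the channels are split across lanes, the result equals the plain sum over all channels; the split only changes which terms a given lane accumulates.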
As shown in fig. 4, the neural network convolution module constructed in this embodiment includes: the device comprises a shift register, at least one multiplication group unit, a convolution control unit and an accumulation unit.
The shift register moves the input spectrogram data and the convolution weight parameters stored in the neural network convolution parameter storage module into the convolution operation module by shift operations, one step per FPGA clock cycle. Its working principle is shown in fig. 5.
If the maximum window over all convolution layers is I x J, then there are J - 1 shift registers, each of depth I, and the weights enter through a single register. The shift registers need not be filled with all I x J data; transfer to the REG registers can begin once I x (J - 1) + 1 data have arrived. Fig. 5 illustrates a maximum convolution window of 5 x 5, a current convolution window of 4 x 4, and weight data numbered 0-15. The positions occupied by the current convolution window lie inside the dotted frame; as shown by the arrows, the weight data are stored in a single register, and on each clock each shift register carries the data at the address to its right one step to the left. The leftmost address both outputs its data and stores it at the designated address of the last shift register.
On each clock the shift registers output data to the REG registers, and within the REG registers the data move in the arrow direction on each clock, until all the weight data of the convolution window have been gathered at the effective positions.
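A software model of this shift-register arrangement may clarify the data flow (a sketch under simplifying assumptions: row buffers of depth equal to the image width and a square k x k window, which differs slightly from the I x J description above):

```python
from collections import deque

def sliding_windows(pixels, width, k):
    """Line-buffer model of the shift registers: one pixel enters per clock;
    k-1 row buffers of depth `width` delay the stream by whole rows, and k
    window rows of length k (the REG registers) hold the current k x k
    window, so one complete convolution window is produced per clock once
    the buffers are primed and the window lies fully inside the image."""
    rows = [deque([0] * width, maxlen=width) for _ in range(k - 1)]  # shift registers
    win = [deque([0] * k, maxlen=k) for _ in range(k)]               # REG window rows
    out = []
    for n, p in enumerate(pixels):
        tap = p
        for j in range(k):
            win[j].append(tap)               # shift this window row by one
            if j < k - 1:
                delayed = rows[j][0]         # pixel from exactly one row earlier
                rows[j].append(tap)
                tap = delayed
        r, c = divmod(n, width)
        if r >= k - 1 and c >= k - 1:        # window fully inside the image
            out.append([list(win[k - 1 - j]) for j in range(k)])
    return out
```

Feeding a 5 x 5 image of the values 0..24 with k = 3 yields nine 3 x 3 windows, the first one covering the top-left corner of the image.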
And the multiplication group unit is used for performing multiplication operation on the input spectrogram data and the convolution operation weight parameters to obtain a convolution operation result.
The multiplication group unit of this embodiment adopts a pipelined design and outputs, on each clock, the products of all the elements in the convolution window.
The accumulation unit accumulates the operation results. To suit the hardware, the accumulation process is organized as a multi-stage adder tree, with the intermediate result of each stage buffered in a register, forming a pipeline structure.
The accumulation unit of this embodiment consists of two parts, Add_col (row accumulation) and Add_row (column accumulation), shown in figs. 6 and 7. The Add_col module accumulates the 4 lanes of synchronously input parallel data and outputs the sum; it is a pipelined computation and needs no extra parameter configuration. The Add_row module accumulates serially input data, buffering the first frame in RAM. During accumulation, the previous partial sum is read from the RAM by address and, after the addition, written back to the same address. After the last frame has been accumulated, the bias offset value is added. When the data input is finished, one frame of the output feature map is obtained and sent to the following module.
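The two-part accumulation can be sketched as follows (illustrative Python; the real Add_col is a pipelined adder tree in logic, and the `ram` list in Add_row stands in for the RAM read-add-write-back cycle):

```python
def add_col(lanes):
    """Add_col: sum the lane outputs of one clock with an adder tree;
    each while-iteration corresponds to one pipeline stage of the tree."""
    while len(lanes) > 1:
        pairs = [lanes[i] + lanes[i + 1] for i in range(0, len(lanes) - 1, 2)]
        if len(lanes) % 2:                   # odd element passes through a stage
            pairs.append(lanes[-1])
        lanes = pairs
    return lanes[0]

def add_row(frames, bias):
    """Add_row: serial accumulation across input frames; the previous partial
    sum is read, added and written back, and the bias is added after the
    last frame, yielding one frame of the output feature map."""
    ram = [0.0] * len(frames[0])
    for frame in frames:
        ram = [prev + cur for prev, cur in zip(ram, frame)]
    return [val + bias for val in ram]
```

For 4 lanes the adder tree needs two stages (4 -> 2 -> 1), which is why the hardware buffers one intermediate result per stage.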
The convolution control unit is used for controlling the time sequence of the shift register, the multiplication group unit and the accumulation unit through a preset convolution control finite state machine so as to realize convolution operation.
The convolution control unit contains two types of inputs: the overall controller command ctrl _ cmd and the input flag signal flag _ in. The output includes three categories: parameter configuration signal config, output flag signal flag _ out, and read data signal Rd.
The convolution control state machine comprises three nested state machines, wherein the outermost layer is in a layer state, the second layer is in an operation state, and the innermost layer is in a data transmission state. The state machines and state descriptions are shown in table 2.
The layer states correspond to the convolutional layers, with an added ready state IDLE so that signal initialization completes before the layer-0 operation begins. The layer state machine transition diagram is shown in fig. 8.
The initialization of the convolution-related signals is completed in the IDLE state, and the convolution operations of layers 0 to 15 are completed in the LAYER0-LAYER15 states.
TABLE 2 composition and description of convolution control State machine
[Table 2 is reproduced only as an image in the original publication.]
From the output flag signals of the Conv_ctrl module (up_fm, up_w, get_fm and PS), it can be determined when to input the weight data and feature-map data to the convolution operation module and when to read data from the output buffer of the convolution module.
The operating state is shown in fig. 9. In the CONFIG state, parameter configuration of the relevant module is mainly completed.
In the CAL state, the timing of reading the weight and feature-map data is determined, and the number of convolution kernels that have completed their operation is recorded.
In the DM state, the flag signals for updating data are generated, and the number of convolution kernels that have completed their operation is recorded.
The data transfer state is shown in fig. 10.
The data transfer state is a sub-state of the layer state DM. According to the different data-moving requirements, it comprises: the wait state WAIT, the move-feature-map state MOVE_M, the move-weights-and-feature-map state MOVE_MW, and the move-next-data state MOVE_NEXT.
And S22, constructing the R-CED neural network digital logic subsystem through the neural network convolution unit.
The R-CED neural network digital logic subsystem also comprises an input spectrogram data padding module, which performs the zero-padding operation, in full-padding mode, while the neural network convolution module performs the convolution operation on the input spectrogram data.
The padding module performs three steps:
A1, before the original feature map is written into the OutputRAM, the corresponding number of 0s is written into the reserved space;
A2, while the original feature map is being written, the corresponding number of 0s is written after each row of data;
A3, after the feature map has been written, the corresponding number of 0s is written.
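The three write phases can be sketched as follows (an illustrative model of the resulting OutputRAM contents; the symmetric left/right split of the in-row zeros is an assumption about the full-padding layout, and the function name is hypothetical):

```python
def pad_feature_map(fm_rows, pad):
    """Model of the padding module's write sequence:
    A1: write `pad` all-zero rows before the feature map (UP_ROW states),
    A2: surround each feature-map row with `pad` zeros (MAP_ROW/PAD0 states),
    A3: write `pad` all-zero rows after the feature map (DOWN_ROW states)."""
    width = len(fm_rows[0]) + 2 * pad
    out = [[0] * width for _ in range(pad)]            # A1
    for row in fm_rows:                                # A2
        out.append([0] * pad + list(row) + [0] * pad)
    out += [[0] * width for _ in range(pad)]           # A3
    return out
```

A 2 x 2 map with pad = 1 becomes a 4 x 4 map with the original values in the center, matching what the convolution module expects for full padding.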
The padding module is controlled by a padding operation state machine, whose transition diagram is shown in fig. 11.
The padding operation state machine contains five states: IDLE, UP_ROW, MAP_ROW, PAD0 and DOWN_ROW. The specific meaning of each state is given in Table 3.
Table 3 composition and description of the filling operation state machine
[Table 3 is reproduced only as an image in the original publication.]
S23, constructing the neural network convolution parameter storage module using logic units on the programmable logic (PL) side of the Zynq7020 hardware platform FPGA.
And S24, storing the parameters of each convolution kernel module in the R-CED neural network digital logic subsystem through the neural network convolution parameter storage module.
In this embodiment, the design of the neural network convolution parameter storage module is shown in fig. 12. For the 14-channel convolutional layers, FM_K_k (k = 0, 1, 2, 3, ...) and W_C_c (c = 0, 1, 2, 3, ...) denote the feature-map and weight data of the k-th and c-th channels respectively; since the 4 RAMs hold 16 channels in total, the redundant channels are filled with 0.
In the figure, Conv_x1_x2 (x1, x2 = 0, 1, 2, 3) indicates that the inputs of that convolution module come from the x1-th weight RAM and the x2-th input feature-map RAM, so a total of 16 convolution operation modules work simultaneously. The input feature-map buffering and transfer scheme of each convolutional layer is given in Table 4.
TABLE 4 input characteristic diagram buffer storage and transportation scheme of each convolution module stored by neural network convolution parameter storage module
[Table 4 is reproduced only as an image in the original publication.]
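The 4 x 4 arrangement of convolution modules can be sketched as follows (illustrative; each RAM is represented by a single array and each Conv module by a single dot-product, which abstracts away the windowing described earlier):

```python
import numpy as np

def conv_grid(weight_rams, fm_rams):
    """Conv_x1_x2 takes its weights from weight RAM x1 and its feature map
    from input feature-map RAM x2, so the 16 module outputs form a 4 x 4
    grid that the hardware computes simultaneously (serially here)."""
    return [[float(np.sum(weight_rams[x1] * fm_rams[x2])) for x2 in range(4)]
            for x1 in range(4)]
```

With the weight RAMs holding constant planes 0..3 and all feature-map RAMs holding ones, entry (x1, x2) is simply 9 * x1 for 3 x 3 planes, which makes the wiring pattern easy to verify.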
And S3, denoising the spectrogram data through the R-CED neural network digital logic subsystem.
And S4, the processing system (PS) side of the Zynq7020 hardware platform FPGA restores the denoised spectrogram data to the time domain to obtain the enhanced voice data.
In conclusion, the invention has the following beneficial effects:
1) The invention is based on an embedded hardware platform and implements the neural network in FPGA logic. Compared with running the neural network on processors such as GPUs and CPUs, it fully exploits the parallelism of the FPGA, greatly improves processing speed, and guarantees the real-time performance of the voice enhancement processing.
2) A Zynq7020 hardware platform FPGA with a built-in programmable logic (PL) side and processing system (PS) side is used. The method fully exploits their complementary characteristics: the programmable logic (PL) side readily builds digital logic modules for parallel data processing, while the processing system (PS) side readily executes complex serial program instructions. The work is divided accordingly, with neural-network noise reduction on the PL side and frequency-domain-to-time-domain conversion on the PS side.
3) For the characteristics of the Zynq7020 hardware platform FPGA, the basic building block of the R-CED neural network, the neural network convolution module, is designed so that the multi-cycle serial convolution operation is converted into pipelined parallel operation controlled by finite state machines, a mechanism unique to digital logic circuits, effectively increasing the processing speed of the R-CED neural network.
The principle and implementation of the present invention are explained herein using specific embodiments; the description of the embodiments is only intended to help understand the method and the core idea of the invention. Meanwhile, for those skilled in the art, there may be variations in specific implementation and application scope according to the idea of the invention. In summary, the content of this specification should not be construed as limiting the invention.
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to assist the reader in understanding the principles of the invention and are to be construed as being without limitation to such specifically recited embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from the spirit of the invention, and these changes and combinations are within the scope of the invention.

Claims (5)

1. A speech enhancement method for operating a neural network by using embedded hardware is characterized by comprising the following steps:
s1, voice data are collected through a voice sensor, and Fourier transform is carried out on the voice data through an FPGA to obtain spectrogram data;
s2, constructing an R-CED neural network by adopting a logic unit of the FPGA to obtain an R-CED neural network digital logic subsystem;
s3, denoising the spectrogram data through an R-CED neural network digital logic subsystem;
and S4, performing time domain restoration on the denoised spectrogram data through the FPGA to obtain voice enhancement data.
2. The speech enhancement method for running a neural network on embedded hardware according to claim 1, wherein in step S1 the Fourier transform of the voice data is performed on the programmable logic (PL) side of a Zynq7020 hardware platform FPGA;
in step S2, the R-CED neural network is built from logic units on the programmable logic (PL) side of the Zynq7020 hardware platform FPGA;
in step S4, the processing system (PS) side of the Zynq7020 hardware platform FPGA restores the denoised spectrogram data to the time domain.
3. The speech enhancement method for operating a neural network with embedded hardware according to claim 2, wherein the step S2 comprises the following sub-steps:
S21, constructing a neural network convolution module using logic units on the programmable logic (PL) side of the Zynq7020 hardware platform FPGA;
s22, building an R-CED neural network digital logic subsystem through a neural network convolution unit;
S23, constructing a neural network convolution parameter storage module using logic units on the programmable logic (PL) side of the Zynq7020 hardware platform FPGA;
and S24, storing the parameters of each convolution kernel module in the R-CED neural network digital logic subsystem through the neural network convolution parameter storage module.
4. The speech enhancement method according to claim 3, wherein the neural network convolution module constructed in step S21 comprises: the device comprises a shift register, at least one multiplication group unit, a convolution control unit and an accumulation unit;
the shift register is used for moving the input spectrogram data and the convolution weight parameters stored in the neural network convolution parameter storage module into the convolution operation module by shift operations, one step per FPGA clock cycle;
the multiplication group unit is used for carrying out multiplication operation on the input spectrogram data and the convolution operation weight parameters to obtain a convolution operation result;
the convolution control unit is used for controlling the time sequence of the shift register, the multiplication group unit and the accumulation unit through a preset convolution control finite state machine so as to realize convolution operation;
and the accumulation unit is used for accumulating the operation results of all the convolution control units.
5. The speech enhancement method for running a neural network on embedded hardware according to claim 3, wherein the R-CED neural network digital logic subsystem further comprises an input spectrogram data padding module for performing the zero-padding operation, in full-padding mode, during the convolution operation performed by the neural network convolution module on the input spectrogram data.
CN202210182933.1A 2022-02-25 2022-02-25 Voice enhancement method for running neural network by adopting embedded hardware Pending CN114495971A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210182933.1A CN114495971A (en) 2022-02-25 2022-02-25 Voice enhancement method for running neural network by adopting embedded hardware

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210182933.1A CN114495971A (en) 2022-02-25 2022-02-25 Voice enhancement method for running neural network by adopting embedded hardware

Publications (1)

Publication Number Publication Date
CN114495971A true CN114495971A (en) 2022-05-13

Family

ID=81484152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210182933.1A Pending CN114495971A (en) 2022-02-25 2022-02-25 Voice enhancement method for running neural network by adopting embedded hardware

Country Status (1)

Country Link
CN (1) CN114495971A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180068675A1 (en) * 2016-09-07 2018-03-08 Google Inc. Enhanced multi-channel acoustic models
CN112101372A (en) * 2019-06-17 2020-12-18 北京地平线机器人技术研发有限公司 Method and device for determining image characteristics
CN112397090A (en) * 2020-11-09 2021-02-23 电子科技大学 Real-time sound classification method and system based on FPGA
CN112786021A (en) * 2021-01-26 2021-05-11 东南大学 Lightweight neural network voice keyword recognition method based on hierarchical quantization
CN113838473A (en) * 2021-09-26 2021-12-24 科大讯飞股份有限公司 Voice processing method and device of equipment and equipment
CN114001816A (en) * 2021-12-30 2022-02-01 成都航空职业技术学院 Acoustic imager audio acquisition system based on MPSOC


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
魏磊 (Wei Lei): "Research on CNN-based speech enhancement algorithms and their FPGA implementation" (基于CNN的语音增强算法的研究与FPGA实现), China Master's Theses Full-text Database (Information Science and Technology), vol. 2022, no. 01, pages 2 - 4 *

Similar Documents

Publication Publication Date Title
CN110390385B (en) BNRP-based configurable parallel general convolutional neural network accelerator
CN109086867A (en) A kind of convolutional neural networks acceleration system based on FPGA
WO2020073211A1 (en) Operation accelerator, processing method, and related device
JP2019109896A (en) Method and electronic device for performing convolution calculations in neutral network
JP2019109895A (en) Method and electronic device for performing convolution calculations in neutral network
CN111832718B (en) Chip architecture
CN111414994A (en) FPGA-based Yolov3 network computing acceleration system and acceleration method thereof
CN113792621B (en) FPGA-based target detection accelerator design method
US20210350230A1 (en) Data dividing method and processor for convolution operation
US20220253668A1 (en) Data processing method and device, storage medium and electronic device
CN111582465A (en) Convolutional neural network acceleration processing system and method based on FPGA and terminal
CN113157638B (en) Low-power-consumption in-memory calculation processor and processing operation method
CN112559954B (en) FFT algorithm processing method and device based on software-defined reconfigurable processor
CN114003201A (en) Matrix transformation method and device and convolutional neural network accelerator
US11704535B1 (en) Hardware architecture for a neural network accelerator
CN112732638B (en) Heterogeneous acceleration system and method based on CTPN network
CN114495971A (en) Voice enhancement method for running neural network by adopting embedded hardware
CN115204373A (en) Design method for fast convolution and cache mode of convolutional neural network
CN116090518A (en) Feature map processing method and device based on systolic operation array and storage medium
CN112862079B (en) Design method of running water type convolution computing architecture and residual error network acceleration system
US11531869B1 (en) Neural-network pooling
Zhang et al. Yolov3-tiny Object Detection SoC Based on FPGA Platform
CN112101538A (en) Graph neural network hardware computing system and method based on memory computing
Wang et al. An FPGA-based reconfigurable CNN training accelerator using decomposable Winograd
CN112991382A (en) PYNQ frame-based heterogeneous visual target tracking system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination