CN112328957A - Fourier transform system and method implemented via a neural network hardware system - Google Patents
Fourier transform system and method implemented via a neural network hardware system
- Publication number: CN112328957A (application CN202011057903.5A)
- Authority: CN (China)
- Prior art keywords: matrix; two-dimensional; bits; input data; real part
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/14—Fourier, Walsh or analogous domain transformations, e.g. Laplace, Hilbert, Karhunen-Loeve, transforms
- G06F17/141—Discrete Fourier transforms
- G06F17/142—Fast Fourier transforms, e.g. using a Cooley-Tukey type algorithm
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
Abstract
The invention discloses a Fourier transform system implemented via a neural network hardware system, belonging to the technical field of neural network chips. The method comprises the following steps: the input data points are arranged into an input data matrix of a set two-dimensional length; a first real part matrix and a second real part matrix corresponding to the input data points and to the twiddle factor real part equation are acquired respectively; the input data matrix is arranged into a two-dimensional first matrix corresponding to the number of rows and columns of the computing unit array; a fourth matrix is acquired according to the two-dimensional third matrix, the second real part matrix and the second imaginary part matrix; the operation on the fourth matrix, the first real part matrix and the first imaginary part matrix is performed by the first fully connected layer computing unit to obtain a fifth matrix; and output Fourier real part data and output Fourier imaginary part data are acquired according to the fifth matrix. The invention addresses the low performance and large chip area of prior art artificial intelligence chips when computing the Fourier transform in application scenarios such as speech recognition and speech synthesis.
Description
Technical Field
The invention belongs to the technical field of neural network chips, and particularly relates to a Fourier transform system and method implemented via a neural network hardware system.
Background
The Fourier transform is an algorithm commonly used in speech processing and DSP (digital signal processing). Currently, most artificial intelligence chips integrate a dedicated ASIC (application-specific integrated circuit) block in hardware, and most algorithms compute the Fourier transform by FFT (fast Fourier transform). Taking 1024 points as an example, this approach needs about 20000 clock cycles, so the performance is difficult to meet requirements, and the area and power consumption are also hard to accept. The technical difficulty of computing the Fourier transform in an artificial intelligence chip for application scenarios such as speech recognition and speech synthesis is achieving high performance with a small area.
Disclosure of Invention
The invention aims to provide a Fourier transform system and method implemented via a neural network hardware system, so as to solve the problems of low performance and large area when an artificial intelligence chip computes the Fourier transform in application scenarios such as speech recognition and speech synthesis in the prior art.
In order to achieve the above purpose, the invention provides the following technical solution:
A Fourier transform system implemented via a neural network hardware system, comprising:
The neural network hardware system comprises a first fully connected layer computing unit and a second fully connected layer computing unit. The neural network hardware system has a plurality of computing unit arrays. Each computing unit array is a two-dimensional array. The number of elements of the two-dimensional array equals the number of rows multiplied by the number of columns. The Fourier transform method realized by the neural network hardware system comprises the following steps:
Step S101, a plurality of input data points are arranged into an input data matrix with a set two-dimensional length. The number of input data points is an integer multiple of the number of elements of the two-dimensional array. The two-dimensional length corresponds to the number of rows and the number of columns of the computing unit array.
Step S102, a real part equation and an imaginary part equation of the twiddle factor are obtained by combining the twiddle factor formula with Euler's formula.
Step S103, according to the plurality of input data points and the twiddle factor real part equation, a first real part matrix and a second real part matrix corresponding to them are acquired respectively. A first imaginary part matrix and a second imaginary part matrix corresponding to the plurality of input data points and the twiddle factor imaginary part equation are acquired respectively.
The first real part matrix and the first imaginary part matrix are arranged according to a first number; the first number is the number of rows or the number of columns. The second real part matrix and the second imaginary part matrix are arranged according to a second number; the second number is the number of elements of the two-dimensional array.
Step S104, the input data matrix is arranged into a two-dimensional first matrix corresponding to the number of rows and columns of the computing unit array. The first real part matrix and the first imaginary part matrix are each arranged into a two-dimensional second matrix corresponding to the number of rows and columns of the computing unit array.
Step S105, the row and column orders of the two-dimensional first matrix and the two-dimensional second matrix are exchanged.
Step S106, the two-dimensional first matrix and the two-dimensional second matrix are multiplied by the first fully connected layer computing unit to obtain a two-dimensional third matrix.
Step S107, a fourth matrix is acquired according to the two-dimensional third matrix, the second real part matrix and the second imaginary part matrix.
Step S108, the operation on the fourth matrix, the first real part matrix and the first imaginary part matrix is performed by the first fully connected layer computing unit to obtain a fifth matrix.
Step S109, output Fourier real part data and output Fourier imaginary part data are acquired according to the fifth matrix.
On the basis of the technical scheme, the invention can be further improved as follows:
Further, the number of input data points is 1024. The input data matrix is a 1 × 1024 matrix. The computing unit array is a 32 × 32 two-dimensional array.
Further, the first real part matrix and the first imaginary part matrix are 1 × 1024 matrices; the first number is 32. The second real part matrix and the second imaginary part matrix are 1 × 1024 matrices; the second number is 1024.
Further, the two-dimensional first matrix is a 32 × 32 matrix, and the two-dimensional second matrix is a 32 × 32 matrix.
Further, the two-dimensional fourth matrix is a 32 × 32 matrix, and the two-dimensional fifth matrix is a 32 × 32 matrix.
A neural network hardware system capable of realizing the Fourier transform comprises a first fully connected layer computing unit and a second fully connected layer computing unit. The neural network hardware system has a plurality of computing unit arrays. Each computing unit array is a two-dimensional array. The number of elements of the two-dimensional array equals the number of rows multiplied by the number of columns. The neural network hardware system capable of implementing the Fourier transform is configured to:
arrange a plurality of input data points into an input data matrix with a set two-dimensional length. The number of input data points is an integer multiple of the number of elements of the two-dimensional array. The two-dimensional length corresponds to the number of rows and the number of columns of the computing unit array.
A real part equation and an imaginary part equation of the twiddle factor are obtained by combining the twiddle factor formula with Euler's formula.
A first real part matrix and a second real part matrix corresponding to the plurality of input data points and the twiddle factor real part equation are acquired respectively. A first imaginary part matrix and a second imaginary part matrix corresponding to the plurality of input data points and the twiddle factor imaginary part equation are acquired respectively.
The first real part matrix and the first imaginary part matrix are arranged according to a first number; the first number is the number of rows or the number of columns. The second real part matrix and the second imaginary part matrix are arranged according to a second number; the second number is the number of elements of the two-dimensional array.
The input data matrix is arranged into a two-dimensional first matrix corresponding to the number of rows and columns of the computing unit array. The first real part matrix and the first imaginary part matrix are each arranged into a two-dimensional second matrix corresponding to the number of rows and columns of the computing unit array.
The row and column orders of the two-dimensional first matrix and the two-dimensional second matrix are exchanged.
The two-dimensional first matrix and the two-dimensional second matrix are multiplied by the first fully connected layer computing unit to obtain a two-dimensional third matrix.
A fourth matrix is acquired according to the two-dimensional third matrix, the second real part matrix and the second imaginary part matrix.
The operation on the fourth matrix, the first real part matrix and the first imaginary part matrix is performed by the first fully connected layer computing unit to obtain a fifth matrix.
Output Fourier real part data and output Fourier imaginary part data are acquired according to the fifth matrix.
Further, the number of input data points is 1024. The input data matrix is a 1 × 1024 matrix. The computing unit array is a 32 × 32 two-dimensional array.
Further, the first real part matrix and the first imaginary part matrix are 1 × 1024 matrices; the first number is 32. The second real part matrix and the second imaginary part matrix are 1 × 1024 matrices; the second number is 1024.
Further, the two-dimensional first matrix is a 32 × 32 matrix, and the two-dimensional second matrix is a 32 × 32 matrix.
Further, the two-dimensional fourth matrix is a 32 × 32 matrix, and the two-dimensional fifth matrix is a 32 × 32 matrix.
The invention has the following advantages:
1. Low hardware cost: the Fourier transform is computed with the neural network itself, so existing neural network hardware layers can be reused flexibly, no ASIC-FFT IP needs to be integrated separately, and the chip area can be greatly reduced.
2. Efficient computation: compared with the roughly 20000 clock cycles required by an ASIC, the invention completes one 1024-point Fourier transform with only 192 passes of 32 × 32 fully connected layers and 8 passes of 32 × 32 Eltwise layers. In speech recognition scenarios, this can greatly improve recognition efficiency.
3. Efficient storage: the twiddle factors required by the invention occupy only 2 × 32 entries; compared with a 1024 × 1024 DFT matrix, the required storage space is only 1/512 of that of the DFT.
4. High quantization precision: a conventional radix-2 decimation-in-time/decimation-in-frequency FFT has 10 stages for 1024 points, and the multiplications in each stage introduce precision loss. The invention uses only 3 computing stages (2 fully connected passes and 1 Eltwise pass), so the precision loss is greatly reduced.
5. Hardware friendliness: from a hardware design perspective, it is currently difficult to build a dedicated high-dimensional fully connected layer and Eltwise layer on the order of 1024 × 1024 for the Fourier transform in a neural network chip, whereas low-dimensional fully connected layers and Eltwise layers are common hardware designs. This also avoids the time-consuming search for an accurate phase spectrum in the prior art, which causes phase mismatch.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flow chart of a Fourier transform method of the present invention.
FIG. 2 is a schematic diagram of the arrangement of W32 according to the present invention.
FIG. 3 is a schematic diagram of a W1024 arrangement according to the present invention.
FIG. 4 is a flow chart of the Fourier transform system calculation of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings of the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in FIGS. 1-4, the embodiment of the present invention provides a Fourier transform system implemented via a neural network hardware system.
Compared with the roughly 20000 clock cycles required by an ASIC, the Fourier transform system of the invention completes one 1024-point Fourier transform with only 192 passes of 32 × 32 fully connected layers and 8 passes of 32 × 32 Eltwise layers. In speech recognition scenarios, this can greatly improve recognition efficiency.
The neural network hardware system comprises a first fully connected layer computing unit and a second fully connected layer computing unit. The neural network hardware system has a plurality of computing unit arrays. Each computing unit array is a two-dimensional array. The number of elements of the two-dimensional array equals the number of rows multiplied by the number of columns.
A conventional radix-2 decimation-in-time/decimation-in-frequency FFT has 10 stages for 1024 points, and the multiplications in each stage introduce precision loss. The invention uses only 3 computing stages (2 fully connected passes and 1 Eltwise pass), so the precision loss is greatly reduced.
The Fourier transform method realized by the neural network hardware system comprises the following steps:
Step S101, a plurality of input data points are arranged into an input data matrix with a set two-dimensional length.
In this step, a plurality of input data points are arranged into an input data matrix with a set two-dimensional length. The number of input data points is an integer multiple of the number of elements of the two-dimensional array. The two-dimensional length corresponds to the number of rows and the number of columns of the computing unit array.
Step S102, a real part equation and an imaginary part equation of the twiddle factor are obtained by combining the twiddle factor formula with Euler's formula.
In this step, the twiddle factor formula and Euler's formula are combined to obtain the real part equation and the imaginary part equation of the twiddle factor.
From the definition of the twiddle factor and Euler's formula:
the real part of the twiddle factor is obtained as W_N_Real(i) = cos(2π × i/N), where i = 0, 1, 2, …, N−1; the imaginary part of the twiddle factor is W_N_Im(i) = sin(2π × i/N), where i = 0, 1, 2, …, N−1.
The Fourier transform system of the present invention requires only 2 × 32 twiddle factors; compared with a 1024 × 1024 DFT, it requires only 1/512 of the storage space.
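As an illustrative sketch (not part of the patent text), the twiddle-factor real and imaginary parts can be generated as follows. Note that the forward DFT uses W_N^i = e^(−j2πi/N), so by Euler's formula the sine term carries a negative sign; the sign is written explicitly here, whereas the description folds it into later combining steps.

```python
import numpy as np

def twiddle_parts(N: int):
    """Real/imaginary parts of the N-point twiddle factors W_N^i = e^{-2j*pi*i/N}.

    By Euler's formula the real part is cos(2*pi*i/N) and the imaginary
    part is -sin(2*pi*i/N); the sign is kept explicit in this sketch.
    """
    i = np.arange(N)
    return np.cos(2 * np.pi * i / N), -np.sin(2 * np.pi * i / N)

w_real, w_imag = twiddle_parts(32)
# the two parts recombine to the complex twiddle factors
assert np.allclose(w_real + 1j * w_imag, np.exp(-2j * np.pi * np.arange(32) / 32))
```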
Step S103, according to the plurality of input data points and the twiddle factor real part equation, a first real part matrix and a second real part matrix corresponding to them are acquired respectively.
In this step, according to the plurality of input data points and the twiddle factor real part equation, a first real part matrix and a second real part matrix corresponding to them are acquired respectively. A first imaginary part matrix and a second imaginary part matrix corresponding to the plurality of input data points and the twiddle factor imaginary part equation are likewise acquired.
The first real part matrix and the first imaginary part matrix are arranged according to a first number; the first number is the number of rows or the number of columns. The second real part matrix and the second imaginary part matrix are arranged according to a second number; the second number is the number of elements of the two-dimensional array.
According to Formula 1, the real part matrix of the twiddle factor matrix W32 is formed as W32_R = {W32_R_0, W32_R_1, …, W32_R_1023}, where W32_R_i = cos(2 × π × i/32), i = 32 × p + q, i = 0, 1, 2, …, 1023, and p, q = 0, 1, 2, …, 31. W32_R is a 1 × 1024 matrix.
According to Formula 1, the imaginary part matrix of the twiddle factor matrix is formed as W1024_I = {W1024_I_0, W1024_I_1, …, W1024_I_1023}, where W1024_I_i = sin(2 × π × i/1024), i = 32 × p + q, i = 0, 1, 2, …, 1023. W1024_I is a 1 × 1024 matrix.
Step S104, the input data matrix is arranged into a two-dimensional first matrix corresponding to the number of rows and columns of the computing unit array.
In this step, the input data matrix is arranged into a two-dimensional first matrix corresponding to the number of rows and columns of the computing unit array, and the first real part matrix and the first imaginary part matrix are each arranged into a two-dimensional second matrix corresponding to the number of rows and columns of the computing unit array.
Similarly, W32_R, W32_I, W1024_R and W1024_I are each reordered into a 32 × 32 matrix.
Here p is taken as the row index and q as the column index, with p, q = 0, 1, 2, …, 31.
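A sketch of this construction and reordering (NumPy used purely for illustration; the names mirror the description's W32_R/W1024_I notation, and a row-major reshape realizes exactly the indexing i = 32p + q):

```python
import numpy as np

i = np.arange(1024)  # flat index i = 32*p + q, with p, q = 0..31

# 1 x 1024 rows of twiddle values, as described above
W32_R = np.cos(2 * np.pi * i / 32)
W32_I = np.sin(2 * np.pi * i / 32)
W1024_R = np.cos(2 * np.pi * i / 1024)
W1024_I = np.sin(2 * np.pi * i / 1024)

# reorder each 1 x 1024 row into 32 x 32 with p as the row index and
# q as the column index (row-major reshape gives i = 32*p + q)
W32_R, W32_I = W32_R.reshape(32, 32), W32_I.reshape(32, 32)
W1024_R, W1024_I = W1024_R.reshape(32, 32), W1024_I.reshape(32, 32)

# because cos is 2*pi-periodic, the reshaped W32 rows repeat:
# W32_R[p, q] = cos(2*pi*(32*p + q)/32) = cos(2*pi*q/32) for every p
assert np.allclose(W32_R[0], W32_R[17])
```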
Step S105, the row and column orders of the two-dimensional first matrix and the two-dimensional second matrix are exchanged.
In this step, the row and column orders of the two-dimensional first matrix and the two-dimensional second matrix are exchanged.
Reading the elements of the data_in matrix column by column is in effect a transpose operation; the result is recorded as the data_in_t matrix (the matrix after transposing).
Step S106, the two-dimensional first matrix and the two-dimensional second matrix are multiplied by the first fully connected layer computing unit to obtain a two-dimensional third matrix.
In this step, the two-dimensional first matrix and the two-dimensional second matrix are multiplied by the first fully connected layer computing unit to obtain a two-dimensional third matrix.
The fully connected operation is in essence a vector-matrix multiplication. If several vectors are stacked into a matrix, the product of two matrices can be viewed as a group of fully connected operations. Take a 32 × 32 matrix multiplied by a 32 × 32 matrix as an example.
First, a fully connected operation is performed between each row of data_in_t (32 × 32) and W32_R (32 × 32); the result is recorded as data_out_1_R (32 × 32), for 32 fully connected operations in total.
Second, a fully connected operation is performed between each row of data_in_t (32 × 32) and W32_I (32 × 32); the result is recorded as data_out_1_I (32 × 32), for 32 fully connected operations of size 32 × 32 in total.
In total there are 64 fully connected operations of size 32 × 32.
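A minimal sketch of this first fully connected stage. Assumptions not stated in the patent: a real-valued 1024-point input x, and 32-point DFT matrices C32[p, k] = cos(2π·p·k/32), S32[p, k] = sin(2π·p·k/32) as one plausible reading of W32_R/W32_I, with the sine sign written explicitly rather than folded into the stored factors.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)        # real input frame (e.g. audio samples)

# steps S104/S105: arrange into 32 x 32 (i = 32*p + q) and transpose
data_in = x.reshape(32, 32)
data_in_t = data_in.T

# 32-point DFT cosine/sine matrices (hypothetical reading of W32_R/W32_I)
p = np.arange(32)
C32 = np.cos(2 * np.pi * np.outer(p, p) / 32)
S32 = np.sin(2 * np.pi * np.outer(p, p) / 32)

# 32 + 32 fully connected passes, each one a vector-matrix product
data_out_1_R = data_in_t @ C32        # real parts of the row-wise DFTs
data_out_1_I = -(data_in_t @ S32)     # imaginary parts (sign from e^{-j...})

# each row of the result is the 32-point DFT of one column of data_in
ref = np.fft.fft(data_in[:, 0])
assert np.allclose(data_out_1_R[0], ref.real)
assert np.allclose(data_out_1_I[0], ref.imag)
```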
Step S107, a fourth matrix is acquired according to the two-dimensional third matrix, the second real part matrix and the second imaginary part matrix.
In this step, a fourth matrix is acquired according to the two-dimensional third matrix, the second real part matrix and the second imaginary part matrix.
Step S108, the operation on the fourth matrix, the first real part matrix and the first imaginary part matrix is performed by the first fully connected layer computing unit to obtain a fifth matrix.
In this step, the operation on the fourth matrix, the first real part matrix and the first imaginary part matrix is performed by the first fully connected layer computing unit to obtain the fifth matrix.
Step S109, output Fourier real part data and output Fourier imaginary part data are acquired according to the fifth matrix.
In this step, output Fourier real part data and output Fourier imaginary part data are acquired according to the fifth matrix.
The number of input data points is 1024. The input data matrix is a 1 × 1024 matrix. The computing unit array is a 32 × 32 two-dimensional array.
The Eltwise operation acts on corresponding elements of two matrices of the same shape, for example two 2 × 2 matrices:
Eltwise multiplication: the two matrices are multiplied element by element.
Eltwise addition: the two matrices are added element by element.
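For instance, with two 2 × 2 matrices (NumPy used purely for illustration):

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[5.0, 6.0], [7.0, 8.0]])

mul = a * b   # Eltwise multiplication: element-by-element product
add = a + b   # Eltwise addition: element-by-element sum

assert (mul == np.array([[5.0, 12.0], [21.0, 32.0]])).all()
assert (add == np.array([[6.0, 8.0], [10.0, 12.0]])).all()
```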
the data _ out _1_ R (32 × 32) and W1024_ R (32 × 32) are multiplied by corresponding elements of the Eltwise layer, and the result is denoted as data _ out _2_ R1(32 × 32).
And secondly, multiplying corresponding elements of the Eltwise layer by the data _ out _1_ I (32 × 32) and the W1024_ I (32 × 32), and marking the result as data _ out _2_ R2(32 × 32).
And thirdly, multiplying corresponding elements of the Eltwise layer by the data _ out _1_ R (32 × 32) and the W1024_ I (32 × 32), and marking the result as data _ out _2_ I1(32 × 32).
And fourthly, multiplying corresponding elements of the Eltwise layer by the data _ out _1_ I (32X 32) and the W1024_ R (32X 32), and recording the result as data _ out _2_ I2 (32X 32).
And fifthly, corresponding elements of the Eltwise layer are added to data _ out _2_ R1 (32X 32) and data _ out _2_ R2 (32X 32), and the result is marked as data _ out _2_ R (32X 32).
Sixthly, corresponding elements of the Eltwise layer are added to the data _ out _2_ I1(32 × 32) and the data _ out _2_ I2(32 × 32), and the result is recorded as data _ out _2_ I (32 × 32).
There are 4 total multiplications at the Eltwise layer of 32 x 32 and 2 additions at the Eltwise layer of 32 x 32.
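The six steps above can be sketched as follows (an illustrative reconstruction, self-contained so it runs standalone; the 32 × 32 twiddle matrices are taken as W1024^(q·k) = cos − j·sin with explicit signs, so one Eltwise combination appears here as a subtraction that the description folds into the stored factors):

```python
import numpy as np

# rebuild the stage-1 outputs so this sketch runs standalone
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
p = np.arange(32)
C32 = np.cos(2 * np.pi * np.outer(p, p) / 32)
S32 = np.sin(2 * np.pi * np.outer(p, p) / 32)
data_in_t = x.reshape(32, 32).T
data_out_1_R = data_in_t @ C32
data_out_1_I = -(data_in_t @ S32)

# 32 x 32 twiddle matrices W1024^(q*k), cosine and sine parts
W1024_R = np.cos(2 * np.pi * np.outer(p, p) / 1024)
W1024_I = np.sin(2 * np.pi * np.outer(p, p) / 1024)

# Eltwise stage: elementwise complex multiply by cos - j*sin; the signs
# are explicit here instead of being folded into the stored factors
data_out_2_R1 = data_out_1_R * W1024_R
data_out_2_R2 = data_out_1_I * W1024_I
data_out_2_R = data_out_2_R1 + data_out_2_R2   # Eltwise addition (real)

data_out_2_I1 = data_out_1_R * W1024_I
data_out_2_I2 = data_out_1_I * W1024_R
data_out_2_I = data_out_2_I2 - data_out_2_I1   # I*cos - R*sin (imaginary)

# cross-check against direct complex arithmetic
z = (data_out_1_R + 1j * data_out_1_I) * np.exp(-2j * np.pi * np.outer(p, p) / 1024)
assert np.allclose(data_out_2_R, z.real)
assert np.allclose(data_out_2_I, z.imag)
```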
Fully Connected (fully connected layer)
First, a fully connected operation is performed between each row of data_out_2_R (32 × 32) and W32_R (32 × 32); the result is recorded as data_out_3_R1 (32 × 32), for 32 fully connected operations in total.
Second, a fully connected operation is performed between each row of data_out_2_I (32 × 32) and W32_I (32 × 32); the result is recorded as data_out_3_R2 (32 × 32), for 32 fully connected operations of size 32 × 32 in total.
Third, a fully connected operation is performed between each row of data_out_2_R (32 × 32) and W32_I (32 × 32); the result is recorded as data_out_3_I1 (32 × 32), for 32 fully connected operations of size 32 × 32 in total.
Fourth, a fully connected operation is performed between each row of data_out_2_I (32 × 32) and W32_R (32 × 32); the result is recorded as data_out_3_I2 (32 × 32), for 32 fully connected operations of size 32 × 32 in total.
In total there are 128 fully connected operations of size 32 × 32.
Eltwise (corresponding element operation)
First, data_out_3_R1 (32 × 32) and data_out_3_R2 (32 × 32) are added element by element in the Eltwise layer; the result is recorded as data_out_R (32 × 32).
Second, data_out_3_I1 (32 × 32) and data_out_3_I2 (32 × 32) are added element by element in the Eltwise layer; the result is recorded as data_out_I (32 × 32).
In total there are 2 Eltwise-layer additions of size 32 × 32.
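Putting the two fully connected stages and the Eltwise stages together, the whole 1024-point transform can be sketched end to end and checked against NumPy's FFT. This is an illustrative reconstruction under the same assumptions as the earlier sketches: explicit sine signs (so one combination is a subtraction) and an extra transpose between the stages so the second batch of fully connected operations runs over the other axis.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1024)              # real 1024-point input frame

p = np.arange(32)
C32 = np.cos(2 * np.pi * np.outer(p, p) / 32)
S32 = np.sin(2 * np.pi * np.outer(p, p) / 32)

# first fully connected stage: 32-point DFTs of the transposed input
stage1 = x.reshape(32, 32).T @ (C32 - 1j * S32)
# Eltwise stage: elementwise multiply by the W1024 twiddle factors
stage2 = stage1 * np.exp(-2j * np.pi * np.outer(p, p) / 1024)

# second fully connected stage over the other axis (hence the transpose),
# written as the four batched products named in the description
A, B = stage2.real.T, stage2.imag.T
data_out_3_R1 = A @ C32
data_out_3_R2 = B @ S32
data_out_3_I1 = A @ S32
data_out_3_I2 = B @ C32

# final Eltwise combinations (one is a subtraction here because the sine
# signs are explicit instead of being folded into W32_I)
data_out_R = data_out_3_R1 + data_out_3_R2
data_out_I = data_out_3_I2 - data_out_3_I1

# flatten the [k1, k2] result as k = k1 + 32*k2: the full 1024-point DFT
X = (data_out_R + 1j * data_out_I).T.reshape(-1)
assert np.allclose(X, np.fft.fft(x))
```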
The first real part matrix and the first imaginary part matrix are 1 × 1024 matrices; the first number is 32. The second real part matrix and the second imaginary part matrix are 1 × 1024 matrices; the second number is 1024.
The two-dimensional first matrix is a 32 × 32 matrix, and the two-dimensional second matrix is a 32 × 32 matrix.
The two-dimensional fourth matrix is a 32 × 32 matrix, and the two-dimensional fifth matrix is a 32 × 32 matrix.
A neural network hardware system capable of realizing the Fourier transform comprises a first fully connected layer computing unit and a second fully connected layer computing unit. The neural network hardware system has a plurality of computing unit arrays. Each computing unit array is a two-dimensional array. The number of elements of the two-dimensional array equals the number of rows multiplied by the number of columns. The neural network hardware system capable of implementing the Fourier transform is configured to:
arrange a plurality of input data points into an input data matrix with a set two-dimensional length. The number of input data points is an integer multiple of the number of elements of the two-dimensional array. The two-dimensional length corresponds to the number of rows and the number of columns of the computing unit array.
A real part equation and an imaginary part equation of the twiddle factor are obtained by combining the twiddle factor formula with Euler's formula.
A first real part matrix and a second real part matrix corresponding to the plurality of input data points and the twiddle factor real part equation are acquired respectively. A first imaginary part matrix and a second imaginary part matrix corresponding to the plurality of input data points and the twiddle factor imaginary part equation are acquired respectively.
The first real part matrix and the first imaginary part matrix are arranged according to a first number; the first number is the number of rows or the number of columns. The second real part matrix and the second imaginary part matrix are arranged according to a second number; the second number is the number of elements of the two-dimensional array.
The input data matrix is arranged into a two-dimensional first matrix corresponding to the number of rows and columns of the computing unit array. The first real part matrix and the first imaginary part matrix are each arranged into a two-dimensional second matrix corresponding to the number of rows and columns of the computing unit array.
The row and column orders of the two-dimensional first matrix and the two-dimensional second matrix are exchanged.
The two-dimensional first matrix and the two-dimensional second matrix are multiplied by the first fully connected layer computing unit to obtain a two-dimensional third matrix.
A fourth matrix is acquired according to the two-dimensional third matrix, the second real part matrix and the second imaginary part matrix.
The operation on the fourth matrix, the first real part matrix and the first imaginary part matrix is performed by the first fully connected layer computing unit to obtain a fifth matrix.
Output Fourier real part data and output Fourier imaginary part data are acquired according to the fifth matrix.
The number of input data points is 1024. The input data matrix is a 1 × 1024 matrix. The computing unit array is a 32 × 32 two-dimensional array.
The first real part matrix and the first imaginary part matrix are 1 × 1024 matrices; the first number is 32. The second real part matrix and the second imaginary part matrix are 1 × 1024 matrices; the second number is 1024.
The two-dimensional first matrix is a 32 × 32 matrix, and the two-dimensional second matrix is a 32 × 32 matrix.
The two-dimensional fourth matrix is a 32 × 32 matrix, and the two-dimensional fifth matrix is a 32 × 32 matrix.
From a hardware design perspective, it is currently difficult to build a dedicated high-dimensional fully connected layer and Eltwise layer on the order of 1024 × 1024 for the Fourier transform in a neural network chip, whereas low-dimensional fully connected layers and Eltwise layers are common hardware designs.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the above embodiments can still be modified, or some of their technical features can be replaced by equivalents, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A Fourier transform method realized by a neural network hardware system, characterized in that the neural network hardware system comprises a first fully connected layer computing unit and a second fully connected layer computing unit; the neural network hardware system has a plurality of computing unit arrays; each computing unit array is a two-dimensional array; the number of elements of the two-dimensional array equals the number of rows multiplied by the number of columns; the Fourier transform method realized by the neural network hardware system comprises the following steps:
step S101, arranging a plurality of input data points into an input data matrix with a set two-dimensional length; the number of input data points is an integer multiple of the number of elements of the two-dimensional array; the two-dimensional length corresponds to the number of rows and the number of columns of the computing unit array;
step S102, combining the twiddle factor formula and Euler's formula to obtain a real part equation and an imaginary part equation of the twiddle factor;
step S103, respectively acquiring a first real part matrix and a second real part matrix corresponding to the plurality of input data points and the twiddle factor real part equation; respectively acquiring a first imaginary part matrix and a second imaginary part matrix corresponding to the plurality of input data points and the twiddle factor imaginary part equation;
the first real part matrix and the first imaginary part matrix are arranged according to a first number; the first number is the number of rows or the number of columns; the second real part matrix and the second imaginary part matrix are arranged according to a second number; the second number is the number of elements of the two-dimensional array;
step S104, arranging the input data matrix into a two-dimensional first matrix corresponding to the row number and the column number of the computing unit array; arranging the first real part matrix and the first imaginary part matrix into a two-dimensional second matrix corresponding to the row number and the column number of the computing unit array respectively;
step S105, exchanging the row and column orders of the two-dimensional first matrix and the two-dimensional second matrix;
step S106, the two-dimensional first matrix and the two-dimensional second matrix are multiplied through the first full-connection layer calculation unit to obtain a two-dimensional third matrix;
step S107, a fourth matrix is obtained according to the two-dimensional third matrix, the second real part matrix and the second imaginary part matrix;
step S108, the operation of the fourth matrix, the first real part matrix and the first imaginary part matrix is realized through the first full-connection layer calculation unit to obtain a fifth matrix;
and step S109, acquiring output Fourier real part data and output Fourier imaginary part data according to the fifth matrix.
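Step S102 of the claim above combines the twiddle factor formula with Euler's formula, W_N^(nk) = e^(-2*pi*i*nk/N) = cos(2*pi*nk/N) - i*sin(2*pi*nk/N), so that the complex products of steps S106-S109 reduce to real matrix multiplies. A minimal sketch of that real/imaginary split, assuming a plain NumPy model of the fully-connected passes (names are illustrative, not from the patent):

```python
import numpy as np

N = 32
n = np.arange(N)

# Twiddle factors split by Euler's formula into real and imaginary equations
Wr = np.cos(2 * np.pi * np.outer(n, n) / N)    # real part equation
Wi = -np.sin(2 * np.pi * np.outer(n, n) / N)   # imaginary part equation

xr = np.random.rand(N)    # real part of the input data points
xi = np.random.rand(N)    # imaginary part of the input data points

# One complex multiply becomes four real fully-connected passes:
# (xr + i*xi)(Wr + i*Wi) -> real = xr@Wr - xi@Wi, imag = xr@Wi + xi@Wr
yr = xr @ Wr - xi @ Wi
yi = xr @ Wi + xi @ Wr

# The recombined result equals a direct 32-point DFT
assert np.allclose(yr + 1j * yi, np.fft.fft(xr + 1j * xi))
```

Hardware that only multiplies real matrices can therefore produce the output Fourier real part data (yr) and imaginary part data (yi) separately, as in step S109.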
2. The Fourier transform method of claim 1, wherein the number of the plurality of input data points is 1024; the input data matrix is a 1 x 1024 matrix; and the computing unit array is a 32 x 32 two-dimensional matrix.
3. The Fourier transform method according to claim 1 or 2, wherein the first real part matrix and the first imaginary part matrix are 1 x 1024 matrices; the first number of bits is 32 bits; the second real part matrix and the second imaginary part matrix are 1 x 1024 matrices; and the second number of bits is 1024 bits.
4. The Fourier transform method according to claim 3, wherein the two-dimensional first matrix is a 32 x 32 matrix; and the two-dimensional second matrix is a 32 x 32 matrix.
5. The Fourier transform method according to claim 3, wherein the two-dimensional fourth matrix is a 32 x 32 matrix; and the two-dimensional fifth matrix is a 32 x 32 matrix.
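The dimension bookkeeping in claims 2-5 can be checked with a small sketch: the 1 x 1024 input is rearranged into a 32 x 32 matrix matching the computing unit array (step S104), and the row/column exchange of step S105 is modeled here as a transpose (an assumption for illustration; the patent does not name the operation):

```python
import numpy as np

rows = cols = 32
x = np.arange(rows * cols)       # 1024 input data points, an integral
                                 # multiple of the 32 x 32 array size

X = x.reshape(rows, cols)        # 1 x 1024 -> two-dimensional first matrix
Xt = X.T                         # exchange the row and column order

assert X.shape == (32, 32)
assert Xt[3, 5] == X[5, 3]       # element (r, c) moves to (c, r)
```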
6. A neural network hardware system capable of realizing Fourier transform is characterized by comprising a first full-connection layer computing unit and a second full-connection layer computing unit; the neural network hardware system has a plurality of computing unit arrays; the computing unit array is a two-dimensional array; the two-dimensional matrix bit number is equal to the row bit number multiplied by the column bit number; the neural network hardware system capable of realizing Fourier transform is configured to:
arranging a plurality of input data points into an input data matrix with a set two-dimensional length; the number of the plurality of input data points is an integral multiple of the number of bits of the two-dimensional array; the two-dimensional length corresponds to the number of row bits and the number of column bits of the computing unit;
obtaining a real part equation and an imaginary part equation of the twiddle factor by combining a twiddle factor formula and an Euler formula;
respectively acquiring a first real part matrix and a second real part matrix corresponding to the plurality of input data points and the real part equation of the twiddle factor; respectively acquiring a first imaginary matrix and a second imaginary matrix corresponding to the plurality of input data points and the twiddle factor imaginary equation;
the first real matrix and the first imaginary matrix are arranged according to a first number of bits; the first digit is the row digit or the column digit; the second real matrix and the second imaginary matrix are arranged according to a second number of bits; the second digit is the digit of the two-dimensional matrix;
arranging the input data matrix into a two-dimensional first matrix corresponding to the row number and the column number of the computing unit array; arranging the first real part matrix and the first imaginary part matrix into a two-dimensional second matrix corresponding to the row number and the column number of the computing unit array respectively;
exchanging the row and column orders of the two-dimensional first matrix and the two-dimensional second matrix;
the two-dimensional first matrix and the two-dimensional second matrix are multiplied through the first full-connection layer computing unit to obtain a two-dimensional third matrix;
acquiring a fourth matrix according to the two-dimensional third matrix, the second real part matrix and the second imaginary part matrix;
the operation of the fourth matrix, the first real part matrix and the first imaginary part matrix is realized through the first full-connection layer calculation unit to obtain a fifth matrix;
and acquiring output Fourier real part data and output Fourier imaginary part data according to the fifth matrix.
7. The neural network hardware system of claim 6, wherein the number of the plurality of input data points is 1024; the input data matrix is a 1 x 1024 matrix; and the computing unit array is a 32 x 32 two-dimensional matrix.
8. The neural network hardware system of claim 6 or 7, wherein the first real part matrix and the first imaginary part matrix are 1 x 1024 matrices; the first number of bits is 32 bits; the second real part matrix and the second imaginary part matrix are 1 x 1024 matrices; and the second number of bits is 1024 bits.
9. The neural network hardware system of claim 8, wherein the two-dimensional first matrix is a 32 x 32 matrix; the two-dimensional second matrix is a 32 x 32 matrix.
10. The neural network hardware system of claim 8, wherein the two-dimensional fourth matrix is a 32 x 32 matrix; the two-dimensional fifth matrix is a 32 x 32 matrix.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011057903.5A CN112328957A (en) | 2020-09-29 | 2020-09-29 | Fourier transform system and method based on implementation through neural network hardware system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112328957A true CN112328957A (en) | 2021-02-05 |
Family
ID=74314470
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011057903.5A Pending CN112328957A (en) | 2020-09-29 | 2020-09-29 | Fourier transform system and method based on implementation through neural network hardware system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112328957A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5508538A (en) * | 1993-04-19 | 1996-04-16 | California Institute Of Technology | Signal processing applications of massively parallel charge domain computing devices |
CN101347003A (en) * | 2005-12-22 | 2009-01-14 | 视瑞尔技术公司 | Method for the compensation of an inhomogeneous brightness perception in holographically reconstructed scenes |
CN103488610A (en) * | 2013-09-06 | 2014-01-01 | 南方电网科学研究院有限责任公司 | Sparse storage-based non-zero element traversal power grid network equation solving method |
CN103513234A (en) * | 2012-06-19 | 2014-01-15 | 中国科学院电子学研究所 | Moving object rapid detection method based on matrix recovery and system thereof |
CN109617847A (en) * | 2018-11-26 | 2019-04-12 | 东南大学 | A kind of non-cycle prefix OFDM method of reseptance based on model-driven deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||