CN111091187A - Discrete BAM neural network system based on FPGA


Info

Publication number: CN111091187A
Application number: CN201911171220.XA
Authority: CN (China)
Prior art keywords: module, matrix, vector, neural network, hadamard
Other languages: Chinese (zh)
Inventors: 汪木兰, 朱昊, 包永强, 刘婷婷, 宋宇飞, 蒋姝
Current Assignee: Nanjing Institute of Technology
Original Assignee: Nanjing Institute of Technology
Application filed by Nanjing Institute of Technology
Filing date: 2019-11-26
Publication date: 2020-05-01
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/08 Learning methods


Abstract

The invention discloses an FPGA-based discrete BAM neural network system comprising an address selection module AddSel, a weight Matrix storage and reading module SRAM, a HADAMARD preprocessing module hdm, and an associative recall diagnosis module Matrix, all encapsulated in the FPGA. The weight matrix storage and reading module SRAM serially stores and reads the weight matrix obtained by the BAM neural network system; the address selection module AddSel selects addresses during weight reading and writing; the HADAMARD preprocessing module hdm performs HADAMARD preprocessing of the input vector; and the associative recall diagnosis module Matrix carries out the online associative recall diagnosis process of the BAM neural network and normalizes its results. Compared with the traditional approach of computing a BAM neural network in software, the system does not occupy a CPU and offers higher speed and stronger anti-interference performance.

Description

Discrete BAM neural network system based on FPGA
Technical Field
The invention belongs to the technical field of discrete bidirectional associative memory (BAM) neural network model design, and particularly relates to a discrete BAM neural network system based on an FPGA (field programmable gate array).
Background
Research on associative memory networks is an important branch of neural networks. Among the various associative memory network models, the BAM network proposed by B. Kosko in 1988 is the most widely applied; it can realize bidirectional hetero-association. BAM networks are divided into discrete, continuous, adaptive and other types; the invention relates to the discrete BAM network.
The topology of the discrete BAM network is shown in fig. 1. It is a two-layer bidirectional network: when a signal is input to one layer, the other layer produces a corresponding output. Since the initial pattern can act on either layer of the network and information can propagate in both directions, there is no explicit input layer or output layer. One layer, called the X layer, has n neuron nodes, i.e., X = [x1, x2, ..., xn]; the other layer, called the Y layer, has m neuron nodes, i.e., Y = [y1, y2, ..., ym]. The node states may be unipolar {0, +1} or bipolar {-1, +1}. If the weight matrix from X to Y is W, the weight matrix from Y to X is the transpose W^T.
The process of implementing bidirectional hetero-association in a BAM network is the process of the network running from a dynamic state to a steady state. For a BAM network with an established weight matrix, when an input sample X_p acts on the X side, X(1) = X_p is weighted by the matrix W and transmitted to the Y side; after the nonlinear transformation of the Y-side node transfer function f_y, the output vector is

Y(1) = f_y(W X(1))    (1)

This output is then weighted by the transposed matrix W^T and transmitted back from the Y side to the X side as input; after the nonlinear transformation of the X-side node transfer function f_x, the output vector is

X(2) = f_x[W^T Y(1)] = f_x{W^T[f_y(W X(1))]}    (2)

This bidirectional round-trip process continues until the states of all neurons on both sides no longer change. The network state at this time is called the steady state, and the corresponding Y-side output vector Y_p is the result obtained by bidirectional association from the input pattern X_p. Similarly, if a pattern Y_p is input from the Y side, after the association process the X side outputs the association result X_p. This dynamic process of bidirectional association is illustrated in figs. 2 and 3.
Further, it can be deduced that the dynamic equations for X(t+1) and Y(t+1) are

X(t+1) = f_x{W^T f_y[W X(t)]}    (3)

Y(t+1) = f_y{W f_x[W^T Y(t)]}    (4)
It can be seen that, for a fully trained weight matrix, when a defective stored pattern is input on one side of the BAM network, the network can not only realize correct hetero-association on the other side within a finite number of iterations, but can also reconstruct the complete input pattern on the input side.
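The transfer functions f_x and f_y are not defined above. A common choice for the bipolar case, assumed here in line with Kosko's original discrete BAM rather than stated in this text, is a sign threshold that keeps a neuron's previous state on a zero sum, with the weight matrix built offline by the outer-product (Hebbian) rule over the P training pairs (X_p, Y_p):

    y_j(t+1) = +1        if sum_i w_ij * x_i(t) > 0
    y_j(t+1) = y_j(t)    if sum_i w_ij * x_i(t) = 0
    y_j(t+1) = -1        if sum_i w_ij * x_i(t) < 0

    W = sum_{p=1..P} X_p^T * Y_p    (X_p, Y_p bipolar row vectors)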
At present, neural networks are usually computed by software. Large processing systems can perform this computation on servers or high-performance computers. However, in small low-power systems such as handheld devices and mobile platforms, the embedded processor core has relatively limited computing power and speed, and the complex environment of a mobile platform makes the computation more susceptible to interference.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a discrete BAM neural network system based on an FPGA, aiming at the defects of the prior art.
In order to achieve the technical purpose, the technical scheme adopted by the invention is as follows:
a discrete BAM neural network system based on FPGA comprises an address selection module AddSel, a weight Matrix storage and reading module SRAM, a HADAMARD preprocessing module hdm and an associative memory-recall diagnosis module Matrix, wherein the address selection module AddSel, the weight Matrix storage and reading module SRAM, the HADAMARD preprocessing module hdm and the associative memory-recall diagnosis module Matrix are encapsulated in the FPGA;
the weight matrix storage and reading module SRAM is used for serially storing a weight matrix obtained by the BAM neural network system through offline learning, and when the online association recall diagnosis process is started, the weight is read out serially from the weight matrix storage and reading module SRAM;
the address selection module AddSel is used for selecting addresses in weight reading and writing processes;
the HADAMARD preprocessing module hdm is used for preprocessing HADAMARD of the input vector;
the association recall diagnosis module Matrix is used for the on-line association recall diagnosis process of the BAM neural network and the normalization processing of results.
In order to optimize the technical scheme, the specific measures adopted further comprise:
the weight matrix storage and reading module SRAM is generated by using a macro function module (LPM) in QuartusII, and the scale, word length, and initial value of the weight matrix storage and reading module SRAM are set by a user.
The BAM neural network system receives a 16-bit discrete input row vector signal, obtains an orthogonal or approximately orthogonal row vector after HADAMARD preprocessing, and normalizes it; corresponding weight column vectors are then read sequentially from the weight matrix and combined with the input row vector in the online associative recall diagnosis to obtain the corresponding discrete output vector, which is output after normalization.
The weight matrix storage and reading module SRAM stores the weight matrix W16×16 in vector form: the weight matrix is divided into 16 column vectors, each column vector contains 16 elements, and each element occupies at most 5 bits, so each column vector occupies 5 × 16 = 80 bits.
The BAM neural network system performs the associative calculation with a serial-parallel combination: the input row vector X1×16 is multiplied with one column of the weight matrix W16×16 in parallel to obtain one element of the output row vector Y1×16, and the other 15 elements are then calculated sequentially over 15 clock cycles.
The weight matrix storage and reading module SRAM also stores the HADAMARD matrix in vector form; the HADAMARD matrix H16×16 is a 16 × 16 square matrix whose elements take the value +1 or -1, so each element is represented by a 2-bit binary number;
the HADAMARD matrix has 16 column vectors, each containing 16 elements, so each column vector occupies 2 × 16 = 32 bits in total;
in the multiplication of the HADAMARD matrix with the input row vector signal, the elements of the multiplier and multiplicand matrices are +1, 0 or -1, so this multiplication is realized by conditional judgment statements.
The HADAMARD preprocessing module hdm comprises 16 row-column vector multiplication submodules controlled by the same clk clock signal; in parallel they complete the multiply-add operations of the input row vector with the 16 column vectors of the HADAMARD matrix H16×16, then normalize and output the results;
the row-column vector multiplication submodules are identical in structure and are realized by a VHDL program;
each row-column vector multiplication submodule comprises vector multiplication submodules element and a vector addition submodule addr16;
the 16 vector multiplication submodules element multiply the 16 elements of the input row vector with the 16 elements of a column vector of the HADAMARD matrix H16×16, realized in parallel through conditional statements;
the subsequent vector addition submodule addr16 completes the addition of the 16 products and performs normalization.
The associative recall diagnosis module Matrix receives 16 2-bit input signals, which come from the output of the HADAMARD preprocessing module hdm, i.e., the normalized vector after HADAMARD preprocessing, and an 80-bit weight matrix column vector carrying 16 weight values from the weight matrix storage and reading module SRAM;
the matrix multiplication in the associative recall diagnosis module Matrix takes the structural form of a state machine, and the multiply-add operations of the actual input row vector with the weight matrix column vectors, together with the normalization of the results, are realized through a parallel structure.
The invention has the following beneficial effects:
the BAM neural network is realized through FPGA hardening, and the method is suitable for an embedded system. The neural network calculation can be completed on the premise of not occupying an embedded CPU. Compared with the traditional method of running software program calculation by using a CPU, the hardening module provided by the invention has higher calculation speed, is not easy to be interfered, can not be cracked by decompilation software, and improves the protection degree of intellectual property.
Drawings
FIG. 1 is a schematic diagram of a discrete BAM neural network topology;
FIG. 2 is a schematic diagram of the dynamic process from X(t) to X(t+1);
FIG. 3 is a schematic diagram of the dynamic process from Y(t) to Y(t+1);
FIG. 4 is a block diagram of an FPGA-based discrete BAM neural network system of the present invention;
FIG. 5 is a schematic block diagram of the HADAMARD preprocessing module in the BAM neural network in an embodiment of the present invention;
FIG. 6 is a block diagram illustrating the row and column vector multiplication sub-module in the HADAMARD preprocessing in the BAM neural network according to an embodiment of the present invention;
FIG. 7 is a chip package for FPGA hardening implementation of BAM neural networks in an embodiment of the present invention;
FIG. 8 is a simulation waveform of the weight matrix data read by the SRAM read module in the BAM neural network according to the embodiment of the present invention;
FIG. 9 is a simulation waveform of the HADAMARD preprocessing row-column vector multiplication sub-module in the BAM neural network according to the embodiment of the present invention;
FIG. 10 is a simulation waveform of the HADAMARD preprocessing module hdm in the BAM neural network according to an embodiment of the present invention;
FIG. 11 is a BAM neural network associative memory recall diagnostic module Matrix hardening implementation waveform in an embodiment of the present invention;
FIG. 12 is a FPGA hardening implementation waveform of a BAM neural network in an embodiment of the invention.
Detailed Description
Embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
As shown in fig. 4, the FPGA-based discrete BAM neural network system of the present invention has an online associative recall diagnosis function;
the system comprises an address selection module AddSel, a weight Matrix storage and reading module SRAM, a HADAMARD preprocessing module hdm and an associative recall diagnosis module Matrix, which are packaged in an FPGA;
in the embodiment, the BAM neural network system receives a 16-bit discrete input row vector signal, obtains an orthogonal or approximately orthogonal row vector after HADAMARD preprocessing, and performs normalization processing; and then reading corresponding weight column vectors from the weight matrix in sequence, performing online association memory-recall diagnosis with the input row vector to obtain corresponding discrete output vectors, and outputting the discrete output vectors after normalization processing.
The weight matrix storage and reading module SRAM is used for serially storing a weight matrix obtained by the BAM neural network system through offline learning, and when the online association recall diagnosis process is started, the weight is read out serially from the weight matrix storage and reading module SRAM;
the address selection module AddSel is used for selecting addresses in weight reading and writing processes;
the HADAMARD preprocessing module hdm is used for preprocessing HADAMARD of the input vector;
the association recall diagnosis module Matrix is used for the on-line association recall diagnosis process of the BAM neural network and the normalization processing of results.
In the embodiment, the weight matrix storage and reading module SRAM is generated using a Library of Parameterized Modules (LPM) megafunction in Quartus II; its size, word length and initial values are set by the user.
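As a rough behavioral stand-in for what the LPM wizard generates (a minimal sketch only; the entity name, port names and registered read behavior are assumptions, not the actual megafunction interface, whose width, depth and initialization file are set in the wizard), the 16-word by 80-bit weight store can be described in VHDL as:

    -- Hypothetical behavioral equivalent of the LPM-generated SRAM:
    -- 16 words of 80 bits, one packed column of W16x16 per word.
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity weight_sram is
      port (
        clk     : in  std_logic;
        wren    : in  std_logic;                     -- write enable; held low for reading
        address : in  std_logic_vector(3 downto 0);  -- column index 0..15
        data    : in  std_logic_vector(79 downto 0); -- packed column to write
        q       : out std_logic_vector(79 downto 0)  -- packed column read out
      );
    end entity;

    architecture rtl of weight_sram is
      type ram_t is array (0 to 15) of std_logic_vector(79 downto 0);
      signal ram : ram_t := (others => (others => '0')); -- initial value set by the user
    begin
      process (clk)
      begin
        if rising_edge(clk) then
          if wren = '1' then
            ram(to_integer(unsigned(address))) <= data;
          end if;
          q <= ram(to_integer(unsigned(address)));   -- registered (synchronous) read
        end if;
      end process;
    end architecture;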
The weight matrix W in the system is a 16 × 16 signed integer matrix; if the bipolar vector form is adopted, the value range of each element is at most [-16, +16], i.e., each element requires at most a 5-bit binary representation. On the other hand, the online associative recall requires the matrix multiplication Y = XW, i.e., multiplying the input row vector X1×16 by the weight matrix W16×16, so completing one associative recall requires at most 16 × 16 = 256 signed integer multiplications and 15 × 16 = 240 signed integer additions.
If the FPGA performed the associative calculation in a fully parallel manner, the whole operation could theoretically be completed in one clk clock cycle, but current FPGA chips cannot provide 256 hardware multipliers at once, so fully parallel operation is ruled out by the chip's hardware resources. If the FPGA instead performed a fully serial multiply-add calculation, only one multiplier would theoretically be needed, but the calculation would take at least 256 clk clock cycles, which is too slow.
For these reasons, the BAM neural network system performs the associative calculation with a serial-parallel combination: the input row vector X1×16 is multiplied with one column of the weight matrix W16×16 in parallel to obtain one element of the output row vector Y1×16, and the other 15 elements are then calculated sequentially over 15 clock cycles. For this purpose the SRAM uses a vector storage form: the weight matrix is divided into 16 column vectors, each column vector contains 16 elements, and each element occupies at most 5 bits, so each column vector occupies 5 × 16 = 80 bits in total.
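Under that storage form, a packed 80-bit SRAM word can be unpacked element by element. The slicing convention below (element i in bits 5*i+4 downto 5*i, two's complement) is an assumption used consistently in the sketches that follow; note that a 5-bit two's-complement field strictly covers [-16, +15], so the +16 endpoint quoted above is an edge case the sketch does not treat specially:

    -- Hypothetical unpacking helper for the assumed packing convention.
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    package weight_pack is
      function get_w(col : std_logic_vector(79 downto 0);
                     i   : natural range 0 to 15) return signed;
    end package;

    package body weight_pack is
      -- extract weight element i from a packed 80-bit column word
      -- (16 elements x 5 bits, two's complement)
      function get_w(col : std_logic_vector(79 downto 0);
                     i   : natural range 0 to 15) return signed is
      begin
        return signed(col(5*i + 4 downto 5*i));
      end function;
    end package body;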
In the embodiment, the HADAMARD matrix H16×16 is likewise a 16 × 16 square matrix, but its elements take only the values +1 or -1, so each element can be represented by a 2-bit binary number.
Like the weight matrix, the HADAMARD matrix is stored in the weight matrix storage and reading module SRAM constructed with the LPM, in the same vector storage format: 16 column vectors of 16 elements each, so each column vector occupies 2 × 16 = 32 bits in total. Unlike the matrix multiplication Y = XW of the associative recall, in the multiplication of the HADAMARD matrix with the input row vector signal the elements of the multiplier and multiplicand matrices are only +1, 0 or -1, so the multiplication can be implemented by conditional judgment statements. On this basis the 256 multiplications can be completed in a fully parallel manner, avoiding the limitation of insufficient hardware multiplier resources in the FPGA.
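Multiplication restricted to {-1, 0, +1} reduces to sign selection, which is why no hardware multipliers are needed. A minimal sketch of one such conditional multiplier follows; the 2-bit encoding ("01" = +1, "11" = -1, "00" = 0) and the port names are assumptions, with the entity name chosen to match the submodule name element used below:

    -- Hypothetical 'element' multiplier: the product of two values in
    -- {-1, 0, +1} computed purely by conditional logic.
    library ieee;
    use ieee.std_logic_1164.all;

    entity element is
      port (
        a : in  std_logic_vector(1 downto 0);  -- input-vector element
        h : in  std_logic_vector(1 downto 0);  -- HADAMARD matrix element
        p : out std_logic_vector(1 downto 0)   -- product a * h
      );
    end entity;

    architecture comb of element is
    begin
      p <= "00" when a = "00" or h = "00" else  -- anything times 0 is 0
           a    when h = "01"              else  -- +1 * a = a
           "11" when a = "01"              else  -- -1 * (+1) = -1
           "01";                                 -- -1 * (-1) = +1
    end architecture;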
FIG. 5 shows the structure of the HADAMARD preprocessing module hdm, in which xin[15…0] is the system input row vector signal. The 16 row-column vector multiplication submodules (L1 to L16) are controlled by the same clk clock signal and complete, in parallel, the multiply-add operations of the input row vector with the 16 column vectors of H16×16, followed by normalization and output.
Each of the row-column vector multiplication submodules L1 to L16 consists of an SRAM read block for one H16×16 column vector, a row-column vector multiply-add block, and so on; that is, each submodule comprises vector multiplication submodules element and a vector addition submodule addr16. The submodules are identical in structure and are realized by a VHDL program, as detailed in fig. 6. For example, in the SRAM read block of L1 the input address is 0000B, corresponding to reading column 0 of matrix H16×16; in the SRAM read block of L2 the input address is 0001B, reading column 1 of H16×16; and so on.
As shown in fig. 6, the 16 vector multiplication submodules element multiply the 16 elements of the input row vector with the 16 elements of a column vector of the HADAMARD matrix H16×16, in parallel, through conditional statements; the subsequent vector addition submodule addr16 completes the addition of the 16 products and performs normalization. In the BAM neural network HADAMARD preprocessing, the row-column multiplication submodule therefore consists entirely of combinational circuitry, uses no internal FPGA hardware multiplier, occupies few resources and processes quickly.
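A combinational sketch of the addr16 accumulate-and-normalize step is given below. The packed 32-bit product bus, the 6-bit accumulator width and the sign-only normalization (non-negative sums mapping to +1) are assumptions; the patent does not spell out the normalization rule:

    -- Hypothetical addr16: sum 16 conditional products (2-bit encoding
    -- as above, packed into one 32-bit bus) and normalize to {-1, +1}.
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity addr16 is
      port (
        p       : in  std_logic_vector(31 downto 0); -- 16 packed 2-bit products
        lineout : out std_logic_vector(1 downto 0)   -- normalized sign of the sum
      );
    end entity;

    architecture comb of addr16 is
    begin
      process (p)
        variable acc : signed(5 downto 0);  -- sum of 16 values in [-16, +16]
      begin
        acc := (others => '0');
        for i in 0 to 15 loop
          acc := acc + resize(signed(p(2*i + 1 downto 2*i)), 6);
        end loop;
        if acc(5) = '1' then   -- negative sum
          lineout <= "11";     -- -1
        else
          lineout <= "01";     -- +1 (zero mapped to +1 by assumption)
        end if;
      end process;
    end architecture;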
In the embodiment, as shown in fig. 4, the associative recall diagnosis module Matrix implements the online associative recall diagnosis process of the BAM neural network and the normalization of its results. In the figure, the 16 2-bit input signals x1n to x16n[1…0] come from the output of the HADAMARD preprocessing module hdm, i.e., the normalized vector after HADAMARD preprocessing; win[79…0] is the 80-bit weight matrix column vector, receiving 16 weight values from the weight matrix storage and reading module SRAM, i.e., q[79…0].
The matrix multiplication in the associative recall diagnosis module Matrix takes the structural form of a state machine, and the multiply-add operations of the actual input row vector with the weight matrix column vectors, together with the normalization of the results, are realized through a parallel structure.
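A minimal sketch of such a state machine follows: one weight column is consumed per clock, the 16 element products of that column are formed in parallel (multiplying by a {-1, +1} input element is just an add or subtract), and the sign of the accumulated sum becomes one output bit. The entity and signal names, the one-column-per-cycle timing (which ignores the SRAM's one-cycle read latency) and the sign-based normalization are all assumptions:

    -- Hypothetical state machine for the Matrix module: 16 clk cycles,
    -- one column of W16x16 per cycle, products formed in parallel.
    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity matrix_recall is
      port (
        clk     : in  std_logic;
        start   : in  std_logic;
        xn      : in  std_logic_vector(31 downto 0);  -- 16 packed 2-bit inputs
        win     : in  std_logic_vector(79 downto 0);  -- current 80-bit weight column
        addout  : out std_logic_vector(3 downto 0);   -- SRAM column address
        dataout : out std_logic_vector(15 downto 0)   -- normalized output row vector
      );
    end entity;

    architecture fsm of matrix_recall is
      type state_t is (idle, run, done);
      signal state : state_t := idle;
      signal col   : unsigned(3 downto 0) := (others => '0');
    begin
      addout <= std_logic_vector(col);

      process (clk)
        variable acc : signed(9 downto 0);
      begin
        if rising_edge(clk) then
          case state is
            when idle =>
              col <= (others => '0');
              if start = '1' then
                state <= run;
              end if;
            when run =>
              -- y(col) = sum over i of x(i) * W(i, col); each product is
              -- +w or -w, so only adders and subtractors are needed
              acc := (others => '0');
              for i in 0 to 15 loop
                if xn(2*i + 1 downto 2*i) = "01" then     -- x(i) = +1
                  acc := acc + resize(signed(win(5*i + 4 downto 5*i)), 10);
                elsif xn(2*i + 1 downto 2*i) = "11" then  -- x(i) = -1
                  acc := acc - resize(signed(win(5*i + 4 downto 5*i)), 10);
                end if;
              end loop;
              dataout(to_integer(col)) <= not acc(9);     -- sign-normalized output bit
              if col = 15 then
                state <= done;
              else
                col <= col + 1;
              end if;
            when done =>
              state <= idle;
          end case;
        end if;
      end process;
    end architecture;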
Similarly, according to the FPGA hardening implementation block diagrams of figs. 4, 5 and 6, a corresponding BAM neural network FPGA chip package with 16 input variables can be obtained; the external input-output data bus interface is shown in fig. 7, with pin meanings as described above. Because the system fault pattern memory is obtained through offline learning and then transferred into the hardened module for online associative recall, the computational load is small, real-time performance is strong, and the module is easy to embed into other control systems, realizing a comprehensive system fault diagnosis function.
The specific embodiments are as follows:
Weight matrix storage and reading module SRAM operation example
Fig. 8 shows the control waveform of the weight matrix being read by the weight matrix storage and reading module SRAM during the FPGA hardening implementation of the BAM neural network. Here wren is the read-write control terminal of the module; it is held constantly low, indicating that the SRAM performs the weight column vector read function, and the data input bus data[79…0] is accordingly always 0. The address input signal address[4…0] changes with the clock clk, and its value corresponds to the column number of the BAM neural network weight matrix; q[79…0] is the column vector read from the weight matrix W16×16; each data word actually has 80 bits and, owing to display format limitations, can only be shown partially.
HADAMARD preprocessing module hdm operation example
After Quartus II synthesis, the row-column vector multiplication submodule uses 221 logic elements in total, less than 1% of the device total; the corresponding simulation waveform is shown in fig. 9. In the figure, xin[15…0] is the 16-bit input row vector signal; h1h to h16h[1…0] are the elements of the first column vector of the HADAMARD matrix H16×16; lineout[1…0] outputs the result value after the row-column multiply-accumulate operation and normalization.
Based on the implementation of the HADAMARD preprocessing module hdm of fig. 4, as shown in figs. 5 and 6, the output waveform of the module is obtained as shown in fig. 10. In the figure, xin[15…0] is the 16-bit actual input row vector signal; y1o to y16o[1…0] are the standard output signals after HADAMARD orthogonal (or approximately orthogonal) preprocessing and normalization. The waveform shows that the output signal reliably settles into a steady state within approximately 3 clk clock cycles.
Associative recall diagnosis module Matrix operation example
The associative recall diagnosis module Matrix was programmed in VHDL, with an execution time of about 16 clk clock cycles, yielding the matrix multiplication and normalization simulation waveform shown in fig. 11. In the figure, x1n to x16n[1…0] are the standard actual input row vector signals after orthogonalization and normalization; addout[3…0] is the SRAM read address, corresponding to the column number of the memorized BAM weight matrix W16×16; dataout[15…0] is the data output of the BAM neural network associative recall result; t8o and t16o are two intermediate values for monitoring execution: t8o is the product of the 8th element (2 bits) of the input row vector with the 8th element (5 bits) of the current weight matrix column vector, and t16o is the product of the 16th element (2 bits) of the input row vector with the 16th element (5 bits) of the current weight matrix column vector.
Overall BAM neural network operation waveform
For the complete FPGA hardening block diagram of the bidirectional associative memory (BAM) neural network shown in fig. 4, an Altera Cyclone II series EP2C35F672C6 chip was selected for the hardening implementation. Statistics after Quartus II synthesis show 3,782 logic elements used in total, 11% of the device's logic elements, and 2,240 memory bits used, less than 1% of the total memory. This FPGA chip's resources are therefore sufficient to harden the BAM neural network.
The resulting simulation waveform of the FPGA hardening implementation of the BAM neural network is shown in fig. 12. In the figure, wren is the SRAM read-write control signal; xin[15…0] is the 16-bit actual input row vector signal; addextn[3…0] is the address input signal used when updating the weight matrix after BAM offline learning (corresponding to the column number of W16×16); dataextn[79…0] is the data input signal used when updating the weight matrix after BAM offline learning (corresponding to a column vector of W16×16); dataout[15…0] is the data output of the BAM associative recall result; ramin[79…0] = q[79…0] is the 80-bit column vector signal of the weight matrix W16×16 read from the SRAM; addout[3…0] is the SRAM read address signal output by the associative recall diagnosis module Matrix; addin[3…0] is the address input signal of the weight matrix storage and reading module SRAM; hdm8[1…0] is the 8th element of the output vector of the HADAMARD preprocessing module hdm, corresponding to the y8o signal in fig. 10; hdm16[1…0] is the 16th element of the output vector of the HADAMARD preprocessing module hdm, corresponding to the y16o signal in fig. 10.
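For readers wanting to reproduce waveforms like figs. 11 and 12 in simulation, the two sketches above can be wired together as shown below. This is a hypothetical testbench fragment: all names and timing are assumptions, the weight store would need to be preloaded with the offline-learned W16×16, and the real design additionally contains the AddSel and hdm modules, omitted here:

    -- Hypothetical testbench wiring weight_sram and matrix_recall together.
    library ieee;
    use ieee.std_logic_1164.all;

    entity tb_recall is end entity;

    architecture sim of tb_recall is
      constant zero80 : std_logic_vector(79 downto 0) := (others => '0');
      signal clk     : std_logic := '0';
      signal start   : std_logic := '0';
      signal xn      : std_logic_vector(31 downto 0) := (others => '0');
      signal addr    : std_logic_vector(3 downto 0);
      signal column  : std_logic_vector(79 downto 0);
      signal dataout : std_logic_vector(15 downto 0);
    begin
      clk <= not clk after 10 ns;  -- 50 MHz clock

      -- weight store held in read mode (wren = '0')
      ram : entity work.weight_sram
        port map (clk => clk, wren => '0', address => addr,
                  data => zero80, q => column);

      dut : entity work.matrix_recall
        port map (clk => clk, start => start, xn => xn,
                  win => column, addout => addr, dataout => dataout);

      stim : process
      begin
        -- example stimulus: element 0 = +1 ("01"), element 1 = -1 ("11")
        xn(1 downto 0) <= "01";
        xn(3 downto 2) <= "11";
        wait for 25 ns;
        start <= '1';
        wait for 20 ns;
        start <= '0';
        wait for 400 ns;  -- more than 16 clk cycles, one full recall
        wait;
      end process;
    end architecture;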
The above is only a preferred embodiment of the present invention, and the protection scope of the invention is not limited to this example; all technical solutions within the idea of the invention belong to its protection scope. It should be noted that modifications and refinements made by those skilled in the art without departing from the principle of the invention also fall within the protection scope of the invention.

Claims (8)

1. An FPGA-based discrete BAM neural network system, characterized in that: the system comprises an address selection module AddSel, a weight Matrix storage and reading module SRAM, a HADAMARD preprocessing module hdm and an associative recall diagnosis module Matrix, which are packaged in an FPGA;
the weight matrix storage and reading module SRAM is used for serially storing the weight matrix obtained by the BAM neural network system through offline learning; when the online associative recall diagnosis process starts, the weights are read out of it serially;
the address selection module AddSel is used for selecting addresses during weight reading and writing;
the HADAMARD preprocessing module hdm is used for HADAMARD preprocessing of the input vector;
the associative recall diagnosis module Matrix is used for the online associative recall diagnosis process of the BAM neural network and the normalization of its results.
2. The FPGA-based discrete BAM neural network system of claim 1, characterized in that: the weight matrix storage and reading module SRAM is generated by a Library of Parameterized Modules (LPM) megafunction in Quartus II, and its size, word length and initial values are all set by the user.
3. The FPGA-based discrete BAM neural network system of claim 1 or 2, characterized in that: the BAM neural network system receives a 16-bit discrete input row vector signal, obtains an orthogonal or approximately orthogonal row vector after HADAMARD preprocessing, and normalizes it; corresponding weight column vectors are then read sequentially from the weight matrix and combined with the input row vector in the online associative recall diagnosis to obtain the corresponding discrete output vector, which is output after normalization.
4. The FPGA-based discrete BAM neural network system of claim 3, characterized in that: the weight matrix storage and reading module SRAM stores the weight matrix W16×16 in vector form: the weight matrix is divided into 16 column vectors, each column vector contains 16 elements, and each element occupies at most 5 bits, so each column vector occupies 5 × 16 = 80 bits.
5. The FPGA-based discrete BAM neural network system of claim 4, characterized in that: the BAM neural network system performs the associative calculation with a serial-parallel combination: the input row vector X1×16 is multiplied with one column of the weight matrix W16×16 in parallel to obtain one element of the output row vector Y1×16, and the other 15 elements are then calculated sequentially over 15 clock cycles.
6. The FPGA-based discrete BAM neural network system of claim 4, characterized in that: the weight matrix storage and reading module SRAM stores the HADAMARD matrix in vector form; the HADAMARD matrix H16×16 is a 16 × 16 square matrix whose elements take the value +1 or -1, so each element is represented by a 2-bit binary number;
the HADAMARD matrix has 16 column vectors, each containing 16 elements, so each column vector occupies 2 × 16 = 32 bits in total;
in the multiplication of the HADAMARD matrix with the input row vector signal, the elements of the multiplier and multiplicand matrices are +1, 0 or -1, so this multiplication is realized by conditional judgment statements.
7. The FPGA-based discrete BAM neural network system of claim 6, characterized in that: the HADAMARD preprocessing module hdm comprises 16 row-column vector multiplication submodules controlled by the same clk clock signal, which complete in parallel the multiply-add operations of the input row vector with the 16 column vectors of the HADAMARD matrix H16×16, then normalize and output the results;
the row-column vector multiplication submodules are identical in structure and are realized by a VHDL program;
each row-column vector multiplication submodule comprises vector multiplication submodules element and a vector addition submodule addr16;
the 16 vector multiplication submodules element multiply the 16 elements of the input row vector with the 16 elements of a column vector of the HADAMARD matrix H16×16, realized in parallel through conditional statements;
the subsequent vector addition submodule addr16 completes the addition of the 16 products and performs normalization.
8. The FPGA-based discrete BAM neural network system of claim 3, characterized in that: the associative recall diagnosis module Matrix receives 16 2-bit input signals, which come from the output of the HADAMARD preprocessing module hdm, i.e., the normalized vector after HADAMARD preprocessing, and an 80-bit weight matrix column vector carrying 16 weight values from the weight matrix storage and reading module SRAM;
the matrix multiplication in the associative recall diagnosis module Matrix takes the structural form of a state machine, and the multiply-add operations of the actual input row vector with the weight matrix column vectors, together with the normalization of the results, are realized through a parallel structure.
Priority Applications (1)

Application CN201911171220.XA, priority date 2019-11-26, filing date 2019-11-26: Discrete BAM neural network system based on FPGA (status: Pending).

Publications (1)

CN111091187A, published 2020-05-01.

Family ID: 70393643
Country: China (CN)

Citations (2)

* Cited by examiner, † Cited by third party

Patent Citations (2)

CN104915715A *, priority 2015-06-24, published 2015-09-16: Multi-method combination avionics system fault diagnosis method
US20170270245A1 *, priority 2016-01-11, published 2017-09-21, Edico Genome, Corp.: Bioinformatics systems, apparatuses, and methods for performing secondary and/or tertiary processing

Non-Patent Citations (1)

汪木兰: "神经网络硬化实现的共性技术在电力传动中应用研究" (Research on common technologies for the hardware implementation of neural networks applied in electric power drives), 《中国博士学位论文全文数据库 信息科技辑》 (China Doctoral Dissertations Full-text Database, Information Science and Technology). *


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2020-05-01)