CN220913655U

CN220913655U - Reconfigurable characteristic measurement circuit applicable to small-sample neural network hardware acceleration system

Info

Publication number: CN220913655U
Application number: CN202322458169.9U
Authority: CN
Inventors: 林奕侠; 王云峰
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2023-09-11
Filing date: 2023-09-11
Publication date: 2024-05-07
Anticipated expiration: 2033-09-11

Abstract

The utility model discloses a reconfigurable characteristic measurement circuit applicable to a small-sample neural network hardware acceleration system, and relates to the field of neural network hardware systems. The circuit comprises a signal input port, a subtracter, a first selector, a second selector, a first vector dot-multiply accumulation module, a third selector, a second vector dot-multiply accumulation module, a register, a square root module, a divider, an output result selection module, a configuration register, a signal output port and an effective signal control module. Through reconfigurable configuration, the circuit realizes Euclidean distance measurement or cosine distance measurement for different numbers of support image classes and different feature vector lengths. Reconfiguration is flexible, which can speed up the design of a small-sample neural network hardware acceleration system and makes such a system general-purpose.

Description

Reconfigurable characteristic measurement circuit applicable to small-sample neural network hardware acceleration system

Technical Field

The utility model relates to the field of neural network hardware systems, in particular to a reconfigurable characteristic measurement circuit applicable to a small-sample neural network hardware acceleration system.

Background

In recent years, deep learning has become a main technical approach in artificial intelligence, has achieved major breakthroughs in areas such as image recognition and speech recognition, and plays an important role in daily life. Current deep learning often needs at least hundreds or even thousands of samples to complete the learning process. In real life, however, a large number of samples may not be available when facing a new task. Small-sample learning networks were originally intended to mimic humans, who can learn new concepts from very few samples. Small-sample learning differs from traditional deep learning in that only a small number of samples are required to classify an unknown class.

Edge-side and embedded devices have great potential as platforms for small-sample neural network hardware acceleration systems. Typically, a small-sample neural network performs feature extraction with a convolutional neural network and classification with a metric module. Hardware acceleration of convolutional neural networks for feature extraction has been studied relatively extensively. Such accelerators generally multiplex computing resources and reuse data to parallelize computation across input feature maps, across output feature maps, within a convolution window, and across convolutions that share the same input, and their performance is relatively good.

The metric module first measures the similarity between the feature vector of the query image and the feature vector of each support-set image, and then uses that similarity to decide which support-set image the query image shares a class with. The metric algorithm affects the recognition performance of a small-sample neural network. The Euclidean distance and the cosine distance are the most widely used and give the best recognition results, but research on metric circuits suitable for small-sample neural networks is currently lacking. In particular, the number of support image classes, the dimension of the feature vectors, and the appropriate metric differ between application environments. If a small-sample neural network hardware acceleration system is built around a fixed-dimension Euclidean distance circuit or cosine distance circuit, its range of application is severely limited.

Disclosure of Invention

The utility model aims to solve the problem of inflexible feature measurement in the prior art. It provides a reconfigurable metric circuit suitable for small-sample neural networks that implements the mainstream metric methods of such networks in hardware. The reconfigurable design is more flexible and saves resources, and when the circuit is not operating its registers do not toggle, which reduces power consumption. The circuit can be combined with a convolutional neural network to implement a classifier based on non-parametric classification.

The technical scheme adopted for solving the technical problems is as follows: a reconfigurable feature metric circuit suitable for a small sample neural network hardware acceleration system is provided, comprising:

The signal input port is used for receiving a control signal and a data signal to be calculated;

A subtracter connected with the signal input port for receiving the input signal;

The input end of the configuration register is connected with the signal input port to receive the control signal, and the output end of the configuration register outputs the configuration signal;

The first selector is provided with two signal input ends and a control end, wherein one signal input end is connected with the signal input port to receive the data signal, the other signal input end is connected with the output end of the subtracter to receive the subtracter output signal, and the control end is connected with the configuration register to receive the configuration information;

The second selector is provided with two signal input ends and a control end, wherein one signal input end is connected with the signal input port to receive the data signal, the other signal input end is connected with the output end of the first selector, and the control end is connected with the configuration register to receive the configuration information;

The first vector dot-multiply accumulation module is provided with two signal input ends and a control end, wherein one signal input end is connected with the output end of the first selector, the other signal input end is connected with the output end of the second selector, and the control end is connected with the signal input port to receive a control signal;

The third selector is provided with two signal input ends and a control end, wherein one signal input end is connected with the signal input port to receive a data signal, the input of the other signal input end is constantly 0, and the control end is connected with the configuration register to receive configuration information;

The second vector dot-multiply accumulation module is provided with two signal input ends and a control end, wherein the two signal input ends are connected with the output end of the third selector, and the control end is connected with the signal input port to receive a control signal;

The register is connected with the output end of the first vector dot-multiply accumulation module;

The square root module is connected with the output end of the second vector dot-multiply accumulation module;

The divider is provided with two signal input ends, one input end is connected with the output end of the register, and the other input end is connected with the output end of the square root module;

The output result selection module is provided with two input ends and a control end, wherein one input end is connected with the output end of the first vector dot-multiply accumulation module, the other input end is connected with the output end of the divider, and the control end is connected with the configuration register to receive configuration information;

And the signal output port is connected with the output end of the output result selection module and is used for outputting the calculated data signal.

Preferably, the control signals include a chip select signal, a read-write signal, and an address signal.

Preferably, the first vector dot-multiply accumulation module and the second vector dot-multiply accumulation module have the same structure, and each module comprises:

The multiplier is provided with two signal input ends which are used as the two signal input ends of the vector dot-multiply accumulation module;

The fourth selector is provided with two signal input ends and a control end, wherein one signal input end is connected with the output end of the multiplier, the input of the other signal input end is constantly 0, and the control end receives control information and a comparison signal output by the first comparator;

The first D trigger is provided with a signal input end and a clock input end, wherein the signal input end is connected with the output end of the fourth selector, and the clock input end receives the clock signal required for the operation of the D trigger;

The first adder is provided with two signal input ends, one signal input end is connected with the output end of the first D trigger, and the other signal input end is connected with the output end of the second D trigger;

A fifth selector having two signal input terminals and a control terminal, wherein one signal input terminal is connected with the output terminal of the first adder, the input of the other signal input terminal is constantly 0, and the control terminal receives the control information and the comparison signal output by the first comparator;

The second D trigger is provided with a signal input end and a clock input end, wherein the signal input end is connected with the output end of the fifth selector, and the clock input end receives the clock signal required for the operation of the D trigger;

The first comparator is provided with two signal input ends, one signal input end is used for receiving a control signal, the other signal input end receives a preset initial signal addr_mult, and a comparison signal is output according to a comparison result of the initial signal and the control signal.

Preferably, the system further comprises an effective signal control module, wherein the effective signal control module is provided with two control ends and an output end, one control end is connected with the signal input port to receive control signals, the other control end is connected with the configuration register to receive configuration signals, and the output end outputs operation result effective signals or ineffective signals.

Preferably, the effective signal control module includes:

The second comparator is provided with two signal input ends, one signal input end is used for receiving a control signal, the other signal input end receives a preset initial signal addr_mult, and a comparison signal is output according to a comparison result of the initial signal and the control signal;

The second adder is provided with two signal input ends, one signal input end is connected with the output end of the third D trigger, and the other signal input end is constantly 1;

A sixth selector having three signal input terminals and a control terminal, wherein the first signal input terminal is connected with the output terminal of the second adder, the second signal input terminal is connected with the output terminal of the third D trigger, the input of the third signal input terminal is constantly 0, and the control terminal receives the control signal and the comparison signal;

The third D trigger is provided with a signal input end and a clock input end, wherein the signal input end is connected with the output end of the sixth selector, and the clock input end receives the clock signal required for the operation of the D trigger;

And the third adder is provided with two signal input ends, and the two signal input ends are used for receiving control signals.

A seventh selector having two signal input terminals and a control terminal, wherein one signal input terminal is connected to the output terminal of the third adder, and the other signal input terminal and the control terminal both receive control information;

And the third comparator is provided with two signal input ends, one signal input end is connected with the output end of the third D trigger, the other signal input end is connected with the output end of the seventh selector, and an operation result valid signal or an operation result invalid signal is output according to the comparison result of the two input signals.

Preferably, the bits of the configuration register are allocated as follows:

Bits 0-1: circuit function; these bits select the metric circuit, and more models can be supported in later extensions;

Bits 2-4: dimension of the vector in the small-sample classification, category range 1-8;

Bit 5: metric method, 0 for the Euclidean distance metric and 1 for the cosine distance metric;

Bit 6: measurement start signal, 1 for measurement start;

Bits 7-10: required additional delay, i.e. the additional delay introduced by the square root module and the divider in the cosine distance metric;

Bits 11-31: reserved.
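
The bit allocation above can be illustrated with a minimal Python sketch that packs and unpacks a 32-bit configuration word. The field names (`function`, `dimension`, `method`, `start`, `extra_delay`) and the helper functions are hypothetical and chosen only for illustration. The sketch stores raw field values and does not assume how a raw value of bits 2-4 maps onto the stated range 1-8, since the text does not specify it.

```python
# Field layout taken from the bit allocation above: name -> (lsb, width).
# The names themselves are illustrative assumptions, not from the source.
FIELDS = {
    "function": (0, 2),      # bits 0-1: circuit function
    "dimension": (2, 3),     # bits 2-4: vector dimension field
    "method": (5, 1),        # bit 5: 0 = Euclidean, 1 = cosine
    "start": (6, 1),         # bit 6: 1 = measurement start
    "extra_delay": (7, 4),   # bits 7-10: additional delay
}


def pack_ctrl(**values):
    """Build a configuration word from raw field values (bits 11-31 stay 0)."""
    word = 0
    for name, value in values.items():
        lsb, width = FIELDS[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        word |= value << lsb
    return word


def unpack_ctrl(word):
    """Split a configuration word back into its raw field values."""
    return {
        name: (word >> lsb) & ((1 << width) - 1)
        for name, (lsb, width) in FIELDS.items()
    }


ctrl = pack_ctrl(dimension=5, method=1, start=1, extra_delay=3)
fields = unpack_ctrl(ctrl)
```

Keeping the layout in one table means that an extension using the reserved bits 11-31 would only need a new entry in `FIELDS`.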

The utility model has the following beneficial effects:

1. The system can write to the configuration register in the circuit to configure it and gate different data paths, so that the circuit is reconfigured as a Euclidean distance metric or a cosine distance metric for various numbers of support-set image classes and different feature vector lengths. Reconfiguration is flexible and the system is general-purpose.

2. When the circuit is configured for the Euclidean distance metric, the square root step is omitted and the squared Euclidean distance is used for ranking, which shortens the measurement time and improves performance.

3. When the circuit is configured for the cosine distance metric, the norm of the query image's feature vector in the cosine distance formula is omitted, which shortens the measurement time and improves performance.

The present utility model will be described in further detail with reference to the drawings and examples, but the present utility model is not limited to the examples.

Drawings

FIG. 1 is a circuit block diagram of an embodiment of the present utility model;

FIG. 2 is a circuit diagram of a vector dot product accumulation module according to an embodiment of the present utility model;

FIG. 3 is a circuit diagram of an effective signal control module according to an embodiment of the utility model.

Detailed Description

Referring to FIG. 1, a circuit block diagram of an embodiment of the present utility model, the circuit includes:

The first selector is provided with two signal input ends and a control end, wherein one signal input end is connected with the signal input port to receive the data signal, the other signal input end is connected with the output end of the subtracter to receive the output signal of the subtracter, and the control end is connected with the configuration register to receive the configuration information and control the output result;

The second selector is provided with two signal input ends and a control end, wherein one signal input end is connected with the signal input port to receive the data signal, the other signal input end is connected with the output end of the first selector, and the control end is connected with the configuration register to receive the configuration information and control the output result;

The first vector dot-multiply accumulation module is provided with two signal input ends and a control end, wherein one signal input end is connected with the output end of the first selector, the other signal input end is connected with the output end of the second selector, and the control end is connected with the signal input port to receive a control signal and control an output result;

The third selector is provided with two signal input ends and a control end, wherein one signal input end is connected with the signal input port to receive the data signal, the input of the other signal input end is constantly 0, and the control end is connected with the configuration register to receive the configuration information and control the output result;

The second vector dot-multiply accumulation module is provided with two signal input ends and a control end, wherein the two signal input ends are connected with the output end of the third selector, and the control end is connected with the signal input port to receive a control signal and control an output result;

The output result selection module is provided with two input ends and a control end, wherein one input end is connected with the output end of the first vector dot-multiply accumulation module, the other input end is connected with the output end of the divider, and the control end is connected with the configuration register to receive configuration information and control the output result;

The signal output port is connected with the output end of the output result selection module and is used for outputting the calculated data signal;

The system also comprises an effective signal control module, wherein the effective signal control module is provided with two control ends and an output end, one control end is connected with the signal input port to receive control signals, the other control end is connected with the configuration register to receive configuration signals, and the output end outputs operation result effective signals or ineffective signals.

Specifically, referring to fig. 2, a circuit diagram of a vector dot product accumulation module according to an embodiment of the present utility model includes:

Specifically, referring to fig. 3, a circuit diagram of an effective signal control module according to an embodiment of the present utility model includes:

The second adder has two signal input ends, one of which is connected with the output end of the third D trigger, and the other of which is input with constant 1.

When the embodiment of the utility model operates, the specific meanings of the signals at the signal input port and the signal output port are shown in Table 1.

Table 1: Input/output signals

For the feature vectors $a = (a_0, a_1, \dots, a_n)$ and $b = (b_0, b_1, \dots, b_n)$, this embodiment provides two methods of realizing the feature metric: calculating the Euclidean distance and calculating the cosine distance.

The Euclidean distance is calculated as shown in formula (1):

$$d(a, b) = \sqrt{\sum_{i=0}^{n} (a_i - b_i)^2} \tag{1}$$

Since the calculated Euclidean distance is only used for comparison, the square root can be omitted without affecting the ordering of the distances, which reduces the time cost of the hardware calculation. The calculation of the Euclidean distance can therefore be reduced to formula (2):

$$d^2(a, b) = \sum_{i=0}^{n} (a_i - b_i)^2 \tag{2}$$
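
The claim that omitting the square root leaves the ranking unchanged can be checked with a short Python sketch. The support vectors and query below are made-up illustrative values, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance with the square root."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def squared_euclidean(a, b):
    """Squared Euclidean distance: the square root is omitted."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(query, supports, metric):
    """Index of the support vector with the smallest distance to the query."""
    return min(range(len(supports)), key=lambda i: metric(supports[i], query))


# Illustrative data only (not from the source).
supports = [[1, 2, 3], [4, 0, -1], [2, 2, 2]]
query = [2, 1, 2]

same_choice = nearest(query, supports, euclidean) == nearest(
    query, supports, squared_euclidean
)
```

The square root is monotonically increasing on non-negative values, so the smallest squared distance always belongs to the same support vector as the smallest distance.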

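The cosine distance is calculated as shown in formula (3):

$$\cos(a, b) = \frac{\sum_{i=0}^{n} a_i b_i}{\sqrt{\sum_{i=0}^{n} a_i^2}\,\sqrt{\sum_{i=0}^{n} b_i^2}} \tag{3}$$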

When the cosine distance between the query vector $b$ and the different support-set vectors $a$ is calculated, the norm of the query vector, $\sqrt{\sum_{i=0}^{n} b_i^2}$, is common to all of the results. This term can therefore be ignored during the comparison without affecting the ordering. The calculation of the cosine distance can therefore be reduced to formula (4):

$$\frac{\sum_{i=0}^{n} a_i b_i}{\sqrt{\sum_{i=0}^{n} a_i^2}} \tag{4}$$
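
A small Python sketch can illustrate why dropping the query norm does not change which support vector wins. It assumes that the quantity the text calls the cosine distance is the usual cosine similarity (larger means more similar) and that `a` is a support vector and `b` the query. The data and function names are illustrative.

```python
import math


def cosine_full(a, b):
    """Cosine similarity with both norms in the denominator."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def cosine_simplified(a, b):
    """Same score with the query norm (norm of b) omitted."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / math.sqrt(sum(x * x for x in a))


def best_match(query, supports, score):
    """Index of the support vector with the largest score against the query."""
    return max(range(len(supports)), key=lambda i: score(supports[i], query))


# Illustrative data only (not from the source).
supports = [[1, 2, 3], [4, 0, -1], [2, 2, 2]]
query = [2, 1, 2]

full_choice = best_match(query, supports, cosine_full)
simplified_choice = best_match(query, supports, cosine_simplified)
```

For a fixed non-zero query, every full score is the simplified score divided by the same positive constant, so the ordering is identical.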

Comparing the formulas for the cosine and Euclidean distances shows that vector dot-multiply accumulation is common to both, i.e. $\sum_{i=0}^{n} a_i b_i$ and $\sum_{i=0}^{n} a_i^2$, where the latter can be regarded as the special case $b_i = a_i$.

In addition to vector dot-multiply accumulation, the cosine distance also requires a square root and a division, so a square root module and a divider are provided for the cosine distance metric. The dividend (numerator) of the divider, $\sum_{i=0}^{n} a_i b_i$, is the output of the first vector dot-multiply accumulation module held in the register; the divisor (denominator), $\sqrt{\sum_{i=0}^{n} a_i^2}$, is the output of the square root module.

To improve the calculation speed, the two vector dot-multiply accumulation modules operate simultaneously. When the cosine distance metric is used, the dividend is delayed by the register so that it reaches the divider at the same time as the divisor from the square root module, and a valid result is output.

Because both the Euclidean distance and the cosine distance are supported, the metric method and the result to output must be selected. This embodiment therefore uses the configuration register to select the metric method and the calculation result, and to provide information such as the dimension of the vector in the small-sample classification and the measurement start signal.

The control signals comprise a chip select signal cs, a read-write signal r/w and an address signal addr. When addr is the address of the configuration register, data is written into the configuration register; when addr is the address of the data register, data is written into the data register in the vector dot-multiply accumulation module.

The bit allocation of the configuration register is shown in Table 2.

Table 2: Bit allocation of the configuration register

When the Euclidean distance metric is adopted, the configuration register outputs ctrl[5] = 0 and only the first vector dot-multiply accumulation module is used. The selectors route data[31:16] - data[15:0], i.e. a_i - b_i in the Euclidean distance metric, to both inputs of the module. The inputs of the second vector dot-multiply accumulation module are held at constant 0, so it takes no part in the operation; this reduces register toggling and therefore power consumption. The calculation flow comprises the following steps:

S401, writing configuration information into a configuration register;

S402, storing the multiplied result of the input data into a first D trigger in a first vector dot-multiply accumulation module through configuration registers and peripheral signal control;

S403, accumulating in a second D trigger of the first vector point multiplication accumulation module;

S404, an effective signal control module generates a result effective signal according to information of a configuration register;

S405, outputting a result and a result valid signal.
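
The Euclidean flow above can be mirrored by a minimal behavioural sketch in Python. It assumes that each 32-bit data word carries one element of each vector, a_i in data[31:16] and b_i in data[15:0], and that the 16-bit fields are signed two's-complement integers; the signedness and the helper names are assumptions, not stated in the text.

```python
def to_signed16(value):
    """Interpret the low 16 bits as a two's-complement integer (assumed format)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack_word(a_i, b_i):
    """Place a_i in data[31:16] and b_i in data[15:0]."""
    return ((a_i & 0xFFFF) << 16) | (b_i & 0xFFFF)


def euclidean_datapath(words):
    """Accumulate (a_i - b_i)^2 over the written words, one word per cycle."""
    acc = 0
    for word in words:
        a_i = to_signed16(word >> 16)
        b_i = to_signed16(word)
        diff = a_i - b_i      # subtracter output, routed to both multiplier inputs
        acc += diff * diff    # multiply, then accumulate
    return acc


words = [pack_word(3, 1), pack_word(-2, 4), pack_word(0, 5)]
squared_distance = euclidean_datapath(words)
```

The sketch ignores pipeline latency and the result valid signal; it only shows which arithmetic the selected data path performs.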

When the cosine distance metric is adopted, the configuration register outputs ctrl[5] = 1 and both vector dot-multiply accumulation modules are used. The selectors route data[31:16] and data[15:0] to the two inputs of the first vector dot-multiply accumulation module; these correspond to a_i and b_i in the dividend (numerator) term of the cosine distance metric. Both inputs of the second vector dot-multiply accumulation module are data[31:16], corresponding to a_i in the divisor (denominator) term. The calculation flow comprises the following steps:

S501, writing configuration information into a configuration register;

S502, using two vector dot-multiply accumulation modules, and storing the multiplied result of input data into the first D trigger in each of the two modules through configuration registers and peripheral signal control;

S503, using the two vector dot-multiply accumulation modules to perform accumulation simultaneously;

S504, dividing the dividend held in the register by the divisor generated by the square root module;

S505, the effective signal control module generates a result effective signal according to the information of the configuration register, with additional delay because the data passes through the square root module and the divider;

S506, outputting a result and a result valid signal.
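
The cosine flow above can be sketched behaviourally in Python under the same assumptions as before: a_i in data[31:16], b_i in data[15:0], and signed 16-bit fields (the signedness is an assumption). Floating-point `math.sqrt` and division stand in for the square root module and divider, whose number format the text does not specify.

```python
import math


def to_signed16(value):
    """Interpret the low 16 bits as a two's-complement integer (assumed format)."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def pack_word(a_i, b_i):
    """Place a_i in data[31:16] and b_i in data[15:0]."""
    return ((a_i & 0xFFFF) << 16) | (b_i & 0xFFFF)


def cosine_datapath(words):
    """Run both accumulations in step, then divide dot(a, b) by sqrt(dot(a, a))."""
    dot_ab = 0   # first vector dot-multiply accumulation module
    dot_aa = 0   # second vector dot-multiply accumulation module
    for word in words:
        a_i = to_signed16(word >> 16)
        b_i = to_signed16(word)
        dot_ab += a_i * b_i
        dot_aa += a_i * a_i
    # Square root module feeds the divider; dot_ab waits in the register.
    return dot_ab / math.sqrt(dot_aa)


words = [pack_word(3, 4), pack_word(4, 3)]
score = cosine_datapath(words)
```

A support vector of all zeros would make the divisor zero; the sketch does not handle that case, and the text does not say how the hardware does.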

Specifically, the vector dot-multiply accumulation module operates only when cs = 1, r/w = 0 and addr = addr_mult: it stores the value of data1 × data2 and then accumulates it. Otherwise 0 is added and out is unchanged. Thus, in the non-working state the registers do not toggle, which reduces power consumption. The two vector dot-multiply accumulation modules share one value of addr_mult, because addr being equal to this value means that data is being written to the vector dot-multiply accumulation modules, so both modules should enable writing. When the Euclidean distance is calculated, enabling the write does not affect the result, since the preceding selector has already set the input of the second module to 0.

When the operation is completed, if the result is not read by the following module, the output result out is kept unchanged, and after the result is read, out is set to 0 through the selector.
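
The gating behaviour described above can be sketched as a clocked Python model with two registers, one for the product and one for the running sum. The value of `ADDR_MULT` and the `clear` argument, which stands in for "the result has been read", are hypothetical; the text does not give the address value or name the signal that clears the output.

```python
ADDR_MULT = 0x4  # hypothetical address value, for illustration only


class VectorMac:
    """Two-register multiply-accumulate model: product register, then accumulator."""

    def __init__(self):
        self.product = 0  # first D trigger: latest product, or 0 when idle
        self.out = 0      # second D trigger: running sum

    def clock(self, cs, rw, addr, data1=0, data2=0, clear=False):
        """Advance one clock edge; both registers update from the current state."""
        enabled = cs == 1 and rw == 0 and addr == ADDR_MULT
        next_out = 0 if clear else self.out + self.product
        next_product = data1 * data2 if enabled else 0
        self.product, self.out = next_product, next_out
        return self.out


mac = VectorMac()
trace = [
    mac.clock(1, 0, ADDR_MULT, 2, 3),  # product 6 captured
    mac.clock(1, 0, ADDR_MULT, 4, 5),  # 6 accumulated, product 20 captured
    mac.clock(0, 0, 0),                # idle: 20 accumulated, 0 captured
    mac.clock(0, 0, 0),                # idle: 0 added, out unchanged
]
```

Once the module is idle the product register holds 0, so further clock edges add 0 and `out` keeps its value until it is cleared.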

The vector dot-multiply accumulation module consists mainly of a multiplier and an adder and calculates the dot product of n-dimensional vectors; it makes effective use of parallel computing resources and reduces the calculation time. The input is 32 bits wide, and the dot product of an n-dimensional vector is obtained after n calculation cycles.

Specifically, the effective signal control module counts only when cs = 1, r/w = 0, addr = addr_mult and ctrl[6] = 1; otherwise the count stays at 0. Thus, in the non-working state the register does not toggle, which reduces power consumption. The module compares its count with the number of cycles specified by the configuration register. When the two are equal, the preset number of cycles has elapsed and the calculation is complete, so the result valid signal enable is set to 1; otherwise enable outputs 0.

The number of cycles specified by the configuration register is selected from ctrl[4:2] and ctrl[4:2] + ctrl[10:7]. ctrl[4:2] is the number of operating cycles required for the Euclidean distance metric, and ctrl[4:2] + ctrl[10:7] additionally accounts for the square root and divider delays required for the cosine distance metric. Thus, when ctrl[5] = 0 the Euclidean distance metric is used and the specified number of cycles is ctrl[4:2]; otherwise the cosine distance metric is used and the specified number of cycles is ctrl[4:2] + ctrl[10:7].
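
The cycle selection just described can be written as a few lines of Python. The helper names `bits`, `target_cycles` and `result_valid` are hypothetical; only the comparison against the specified cycle count is modelled, not the counter itself.

```python
def bits(word, hi, lo):
    """Extract word[hi:lo] as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def target_cycles(ctrl):
    """Cycle count at which the result becomes valid for a given configuration."""
    base = bits(ctrl, 4, 2)     # ctrl[4:2]
    extra = bits(ctrl, 10, 7)   # ctrl[10:7]
    use_cosine = bits(ctrl, 5, 5) == 1
    return base + extra if use_cosine else base


def result_valid(count, ctrl):
    """1 when the internal count equals the specified number of cycles, else 0."""
    return int(count == target_cycles(ctrl))


cosine_ctrl = (5 << 2) | (1 << 5) | (1 << 6) | (3 << 7)   # ctrl[4:2]=5, ctrl[10:7]=3
euclid_ctrl = (5 << 2) | (1 << 6) | (3 << 7)              # same, with ctrl[5]=0
```

With these two words the Euclidean configuration reports a valid result at count 5 and the cosine configuration at count 8, reflecting the three extra cycles in ctrl[10:7].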

When the operation is finished, if the result is not read by the subsequent module, the effective enable of the output result is kept unchanged, namely the internal count is unchanged, and after the result is read, the count is set to 0 through a selector, and the enable is changed to 0.

Therefore, the reconfigurable characteristic measurement circuit suitable for the small-sample neural network hardware acceleration system provided by the utility model can reconstruct and measure Euclidean distance or cosine distance of characteristic vectors with different types and different lengths according to the system requirement, is flexible in reconstruction and has universality.

The foregoing is only illustrative of the present utility model and is not to be construed as limiting it; any modifications, equivalent substitutions, improvements, etc. made within the spirit and principles of the present utility model shall fall within its scope of protection.

Claims

1. A reconfigurable feature metric circuit suitable for a small sample neural network hardware acceleration system, comprising:

2. The reconfigurable feature metric circuit of claim 1, wherein the control signals include chip select signals, read-write signals, and address signals.

3. The reconfigurable feature metric circuit for a small sample neural network hardware acceleration system of claim 1, wherein the first and second vector dot-multiply-accumulate modules are identical in structure, each comprising:

The first D trigger is provided with a signal input end and a clock input end, wherein the signal input end is connected with the output end of the fourth selector, and the clock input end receives the clock signal required for the operation of the D trigger;

The second D trigger is provided with a signal input end and a clock input end, wherein the signal input end is connected with the output end of the fifth selector, and the clock input end receives the clock signal required for the operation of the D trigger;

4. The reconfigurable feature metric circuit of claim 1, further comprising an effective signal control module, the effective signal control module having two control terminals and an output terminal, wherein one control terminal is connected to the signal input port to receive the control signal, the other control terminal is connected to the configuration register to receive the configuration signal, and the output terminal outputs the operation result effective signal or ineffective signal.

5. The reconfigurable feature metric circuit for a small sample neural network hardware acceleration system of claim 4, wherein the effective signal control module comprises:

The third D trigger is provided with a signal input end and a clock input end, wherein the signal input end is connected with the output end of the sixth selector, and the clock input end receives the clock signal required for the operation of the D trigger;

The third adder is provided with two signal input ends, and the two signal input ends both receive control signals;