CN108764467B - Convolution operation and full-connection operation circuit for convolution neural network - Google Patents

Publication number: CN108764467B (application CN201810300523.6A; earlier published as CN108764467A)
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 谷江涛, 汪波, 王新安, 张超, 欧阳廷炳, 高立钊, 陈红英, 何春舅
Applicant and current assignee: Peking University Shenzhen Graduate School
Legal status: Active (granted)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Abstract

A convolution operation and fully-connected operation circuit for a convolutional neural network is built from a phase detector, a digital-to-time converter, a time amplifier and a time register in a time-domain calculation circuit. The proposed analog time-domain calculation circuit completes the multiply-accumulate and related operations required by convolutional neural network convolution in the time domain; it achieves high timing precision, saves additional storage area and the corresponding power consumption, and is fully compatible with a standard CMOS process.

Description

Convolution operation and full-connection operation circuit for convolution neural network
Technical Field
The invention relates to the technical field of electronic information and deep learning, and in particular to a convolution operation and fully-connected operation circuit for a convolutional neural network.
Background
With growing demand for neural-network-based artificial intelligence solutions, convolutional neural networks are being deployed on mobile platforms such as drones and robots, which are profoundly changing how people work and live. For dedicated convolutional neural network hardware, implementations based on CPUs, GPUs, FPGAs, ASICs, and novel devices such as RRAM have all been proposed. From the cloud to mobile terminals, different application scenarios place different demands on computing capability, and convolutional neural networks come in many structures with large data volumes and heavy computation, posing great challenges for hardware implementation of neural network algorithms. The core of a convolutional neural network hardware architecture is the hardware architecture of the convolution operation.
In the prior art, one approach is to implement convolutional neural network convolution with conventional digital circuits such as FPGAs, ASICs, GPUs, and CPUs. However, as process feature sizes shrink, circuit node leakage increases and the supply voltage decreases; at a given calculation precision, large amounts of computing and storage resources are consumed. That is, the overall circuit performance (power consumption, area, speed, and precision) is increasingly constrained. The other approach is to implement CNN hardware with novel devices such as RRAM. However, these device technologies are not fully compatible with the CMOS process, and the resolution of the computed quantities is limited.
Disclosure of Invention
Circuits for convolutional neural network convolution operations and fully-connected operations are provided. As Moore's-law scaling continues, the performance (speed, precision) of traditional digital and analog circuits that use voltage and current as the computed quantities is increasingly limited, while related research shows that analog time-domain operation circuits can offer performance advantages such as higher precision.
The specific implementation scheme of the circuit for convolution operation and full-connection operation of the convolutional neural network disclosed by the application is as follows:
according to a first aspect, an embodiment provides a convolution operation circuit based on time domain calculation, including:
the convolution weight input module, provided with a reference pulse signal input end, a convolution kernel weight value input end, a lead-lag control signal input end, a positive output end and a negative output end; the reference pulse signal input end is used for inputting a reference pulse signal, the convolution kernel weight value input end is used for inputting a signal representing a convolution kernel weight value, and the lead-lag control signal input end is used for inputting a lead-lag control signal; the convolution weight input module determines, according to the lead-lag control signal, the sign of the convolution kernel weight value represented by the signal received at the convolution kernel weight value input end, outputting the signal through the negative output end when the value is negative and through the positive output end when it is non-negative;
the convolution module comprises one or more independent convolution sub-modules; each convolution sub-module is provided with a kernel weight value positive input end, a kernel weight value negative input end, a value-to-be-convolved input end and an output end; the value-to-be-convolved input end is used for inputting a signal representing the value to be convolved; the convolution sub-module amplifies the signal received at the kernel weight value positive input end and accumulates the amplified signal as an addend; it amplifies the signal received at the kernel weight value negative input end and subtracts the amplified signal from the accumulation as a subtrahend, wherein the amplification factor applied to the signals is the value to be convolved; the convolution sub-module outputs a signal representing the final calculation result through its output end.
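The signed multiply-accumulate behavior described above can be captured in a minimal behavioral sketch (Python, not part of the claimed circuit; function names are illustrative): the weight input module routes the weight's magnitude to the positive or negative path according to its sign, the time amplifier scales each routed pulse width by the value to be convolved, and the register adds the positive path and subtracts the negative path.

```python
def route_by_sign(w):
    # convolution weight input module: pulse width |w| appears on the
    # positive output if w >= 0, on the negative output if w < 0
    return (w, 0.0) if w >= 0 else (0.0, -w)

def conv_submodule(weights, values):
    # each step: the time amplifier scales the routed pulse width by the
    # value to be convolved; the time register adds/subtracts the result
    acc = 0.0
    for w, x in zip(weights, values):
        pos, neg = route_by_sign(w)
        acc += pos * x   # addend path (kernel weight value positive input)
        acc -= neg * x   # subtrahend path (kernel weight value negative input)
    return acc
```

For example, `conv_submodule([2, -3, 1], [4, 5, 6])` models the dot product 2*4 - 3*5 + 1*6 = -1 carried out entirely as pulse-width arithmetic.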
Further, the convolution weight input module includes:
a first digital-to-time converter having an input terminal, a control terminal, and an output terminal; the input end of the first digital-to-time converter is connected with the reference pulse signal input end and used for receiving the reference pulse signal; the control end of the first digital-to-time converter is connected with the convolution kernel weight value input end and is used for inputting the signal representing the convolution kernel weight value; when the first digital-to-time converter receives the signal representing the convolution kernel weight value, outputting a first pulse signal through an output end of the first digital-to-time converter, wherein the time difference between the first pulse signal and the reference pulse signal is the convolution kernel weight value;
a zeroth digital-to-time converter having an input end, a control end, and an output end; the input end of the zeroth digital-to-time converter is connected with the reference pulse signal input end and used for receiving the reference pulse signal; the control end of the zeroth digital-to-time converter is connected with the lead-lag control signal input end and used for receiving the lead-lag control signal; the zeroth digital-to-time converter outputs a second pulse signal through its output end when receiving the lead-lag control signal, wherein the time difference between the second pulse signal and the reference pulse signal is the value of the lead-lag control signal;
a first edge detector having a leading positive input end, a lagging negative input end, a positive output end, and a negative output end; the leading positive input end of the first edge detector is connected with the output end of the first digital-to-time converter and used for receiving the first pulse signal; the lagging negative input end is connected with the output end of the zeroth digital-to-time converter and used for receiving the second pulse signal; the first edge detector outputs a pulse signal through its positive output end when it detects the edge of the first pulse signal first, and through its negative output end when it detects the edge of the second pulse signal first; the width of the pulse signal is equal to the time difference between the edge of the first pulse signal and the corresponding edge of the second pulse signal; the positive output end of the first edge detector is the positive output end of the convolution weight input module, and the negative output end of the first edge detector is the negative output end of the convolution weight input module.
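The pulse-timing scheme of the weight input module can be modeled numerically. In this hedged Python sketch, the exact weight-to-delay encoding (here `offset - w` against a fixed lead-lag offset) is an assumption for illustration, not taken from the patent: each digital-to-time converter delays the reference edge by its control code, and the edge detector turns the arrival order and the arrival-time difference into a signed pulse width.

```python
def dtc(t_ref, code):
    # digital-to-time converter: emit an edge delayed from the
    # reference edge by the control code
    return t_ref + code

def edge_detector(t_lead, t_lag):
    # output a pulse whose width is the edge-to-edge time difference:
    # on the positive output if the leading input fires first,
    # on the negative output otherwise
    if t_lead <= t_lag:
        return ("pos", t_lag - t_lead)
    return ("neg", t_lead - t_lag)

def weight_input_module(w, offset=8.0, t_ref=0.0):
    # assumed encoding: weight w mapped to delay (offset - w), compared
    # against the zeroth DTC programmed with the lead-lag offset, so that
    # both sign and magnitude of w are recovered as a routed pulse width
    t_w = dtc(t_ref, offset - w)
    t_0 = dtc(t_ref, offset)
    return edge_detector(t_w, t_0)
```

Under this encoding, a non-negative weight fires the leading input first and emerges on the positive output with width equal to the weight's magnitude; a negative weight emerges on the negative output.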
Further, the convolution sub-module includes:
a time amplifier having a positive input end, a negative input end, a control end, a positive output end and a negative output end; the control end of the time amplifier is connected with the value-to-be-convolved input end; the positive input end of the time amplifier is the kernel weight value positive input end, and the negative input end is the kernel weight value negative input end; the time amplifier amplifies the signal received at its positive input end and outputs it through its positive output end, and amplifies the signal received at its negative input end and outputs it through its negative output end; the amplification factor applied to the signals is the value corresponding to the signal received at the control end;
a time register having a positive input end, a negative input end, and an output end; the positive input end of the time register is connected with the positive output end of the time amplifier, the negative input end is connected with the negative output end of the time amplifier, and the output end of the time register is the output end of the convolution sub-module; the time register accumulates the signals received at its positive input end as addends, subtracts the signals received at its negative input end as subtrahends, and outputs a signal representing the final calculation result through its output end.
further, the convolution operation circuit further includes:
a convolutional layer bias input module having a reference pulse signal input end, a convolutional layer bias value input end, a lead-lag control signal input end, a positive output end, and a negative output end; the reference pulse signal input end of the convolutional layer bias input module is used for inputting the reference pulse signal; the convolutional layer bias value input end is used for inputting a signal representing the convolutional layer bias value; the lead-lag control signal input end of the convolutional layer bias input module is used for inputting a lead-lag control signal; the convolutional layer bias input module determines, according to the lead-lag control signal, the sign of the convolutional layer bias value represented by the signal received at the convolutional layer bias value input end, and outputs the value through the negative output end when it is negative and through the positive output end when it is non-negative.
The convolution sub-module is also provided with a convolutional layer bias positive input end and a convolutional layer bias negative input end;
the convolutional layer bias positive input end of the convolution sub-module receives the pulse signal output by the positive output end of the convolutional layer bias input module, and the convolution sub-module accumulates the signal received at this input end as an addend; the convolutional layer bias negative input end receives the pulse signal output by the negative output end of the convolutional layer bias input module, and the convolution sub-module subtracts the signal received at this input end as a subtrahend; the convolution sub-module outputs a signal representing the final calculation result through its output end.
Further, the convolutional layer bias input module includes:
a third digital-to-time converter having an input end, a control end, and an output end; the input end of the third digital-to-time converter is connected with the reference pulse signal input end and used for receiving the reference pulse signal; the control end is connected with the convolutional layer bias value input end and used for inputting the signal representing the convolutional layer bias value; the third digital-to-time converter outputs a third pulse signal through its output end when receiving the signal representing the convolutional layer bias value, wherein the time difference between the third pulse signal and the reference pulse signal is the convolutional layer bias value;
a second digital-to-time converter having an input end, a control end, and an output end; the input end of the second digital-to-time converter is connected with the reference pulse signal input end and used for receiving the reference pulse signal; the control end is connected with the lead-lag control signal input end and used for receiving the lead-lag control signal; the second digital-to-time converter outputs a fourth pulse signal through its output end when receiving the lead-lag control signal, wherein the time difference between the fourth pulse signal and the reference pulse signal is the value of the lead-lag control signal;
a second edge detector having a leading positive input end, a lagging negative input end, a positive output end, and a negative output end; the leading positive input end of the second edge detector is connected with the output end of the third digital-to-time converter and used for receiving the third pulse signal; the lagging negative input end is connected with the output end of the second digital-to-time converter and used for receiving the fourth pulse signal; the second edge detector outputs a pulse signal through its positive output end when it detects the edge of the third pulse signal first, and through its negative output end when it detects the edge of the fourth pulse signal first; the width of the pulse signal is equal to the time difference between the edge of the third pulse signal and the corresponding edge of the fourth pulse signal;
a first dual-channel selection switch having a positive input end, a negative input end, a positive output end, and a negative output end; the positive input end of the first dual-channel selection switch is connected with the positive output end of the second edge detector and used for receiving the signal output by that positive output end; the negative input end is connected with the negative output end of the second edge detector and used for receiving the signal output by that negative output end; when a signal arrives at the positive input end before one arrives at the negative input end, the received signal is output through the positive output end of the switch and the negative output end outputs no signal; when a signal arrives at the negative input end first, the received signal is output through the negative output end and the positive output end outputs no signal; the positive output end of the first dual-channel selection switch is the positive output end of the convolutional layer bias input module, and its negative output end is the negative output end of the convolutional layer bias input module.
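The first-arrival behavior of the dual-channel selection switch reduces to a small arbitration rule. An illustrative Python model (not the claimed circuit; `None` stands for a silent channel):

```python
def dual_channel_switch(t_pos, t_neg):
    # forward the earlier-arriving input to the matching output;
    # the other output stays silent (None)
    if t_neg is None or (t_pos is not None and t_pos < t_neg):
        return (t_pos, None)   # positive channel wins
    return (None, t_neg)       # negative channel wins
```

This mirrors the text: whichever input fires first is passed through on its own polarity, and the opposite output produces nothing.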
According to a second aspect, an embodiment provides a convolutional neural network fully-connected circuit based on time domain calculation, including:
the fully-connected module comprises one or more independent fully-connected sub-modules; each fully-connected sub-module is provided with a value-to-be-fully-connected input end, a fully-connected value input end and an output end; the value-to-be-fully-connected input end is used for inputting a signal representing the operation value to be fully connected; the fully-connected value input end is used for inputting a signal representing the fully-connected operation value; the fully-connected sub-module amplifies the signal input at its fully-connected value input end and accumulates the amplified signal as an addend, wherein the amplification factor applied to the signal is the operation value to be fully connected; the fully-connected sub-module outputs a signal representing the final result through its output end.
Further, the fully-connected sub-module includes:
a second time amplifier having an input end, a control end, and an output end; the control end of the second time amplifier is used for inputting a signal representing the operation value to be fully connected; the input end is used for inputting a signal representing the fully-connected operation value; the second time amplifier amplifies the signal received at its input end and outputs it through its output end, wherein the amplification factor applied to the signal is the value corresponding to the signal received at its control end;
a second time register having an input end and an output end; the input end of the second time register is connected with the output end of the second time amplifier, and the output end of the second time register is the output end of the fully-connected sub-module; the second time register accumulates the signals received at its input end as addends and outputs a signal representing the final calculation result through its output end.
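Since the fully-connected sub-module described above only accumulates addends, its behavior reduces to a plain weighted sum. A minimal Python sketch with illustrative names (not the claimed circuit):

```python
def fc_submodule(weights, pulses):
    """Fully-connected sub-module: the second time amplifier scales each
    input pulse by its weight (the operation value to be fully connected);
    the second time register accumulates the results as addends."""
    acc = 0.0
    for w, p in zip(weights, pulses):
        acc += w * p   # amplification factor w applied to pulse width p
    return acc
```

For example, `fc_submodule([1, 2, 3], [4, 5, 6])` models the accumulated sum 1*4 + 2*5 + 3*6 = 32 in pulse-width form.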
Further, the convolutional neural network fully-connected circuit further comprises:
a fully-connected layer bias input module having a reference pulse signal input end, a fully-connected layer bias value input end, a lead-lag control signal input end, a positive output end, and a negative output end; the reference pulse signal input end of the fully-connected layer bias input module is used for inputting the reference pulse signal; the fully-connected layer bias value input end is used for inputting a signal representing a fully-connected layer bias value; the lead-lag control signal input end of the fully-connected layer bias input module is used for inputting a lead-lag control signal; the fully-connected layer bias input module determines, according to the lead-lag control signal, the sign of the fully-connected layer bias value represented by the signal received at the fully-connected layer bias value input end, and outputs the value through the negative output end when it is negative and through the positive output end when it is non-negative.
The fully-connected sub-module is also provided with a fully-connected layer bias positive input end and a fully-connected layer bias negative input end;
the fully-connected layer bias positive input end of the fully-connected sub-module receives the pulse signal output by the positive output end of the fully-connected layer bias input module, and the fully-connected sub-module accumulates the signal received at this input end as an addend; the fully-connected layer bias negative input end receives the pulse signal output by the negative output end of the fully-connected layer bias input module, and the fully-connected sub-module subtracts the signal received at this input end as a subtrahend; the fully-connected sub-module outputs a signal representing the final calculation result through its output end.
Further, the fully-connected layer bias input module includes:
a fifth digital-to-time converter having an input end, a control end, and an output end; the input end of the fifth digital-to-time converter is connected with the reference pulse signal input end and used for receiving the reference pulse signal; the control end is connected with the fully-connected layer bias value input end and used for inputting the signal representing the fully-connected layer bias value; the fifth digital-to-time converter outputs a fifth pulse signal through its output end when receiving the signal representing the fully-connected layer bias value, wherein the time difference between the fifth pulse signal and the reference pulse signal is the fully-connected layer bias value;
a fourth digital-to-time converter having an input end, a control end, and an output end; the input end of the fourth digital-to-time converter is connected with the reference pulse signal input end and used for receiving the reference pulse signal; the control end is connected with the lead-lag control signal input end and used for receiving the lead-lag control signal; the fourth digital-to-time converter outputs a sixth pulse signal through its output end when receiving the lead-lag control signal, wherein the time difference between the sixth pulse signal and the reference pulse signal is the value of the lead-lag control signal;
a third edge detector having a leading positive input end, a lagging negative input end, a positive output end, and a negative output end; the leading positive input end of the third edge detector is connected with the output end of the fifth digital-to-time converter and used for receiving the fifth pulse signal; the lagging negative input end is connected with the output end of the fourth digital-to-time converter and used for receiving the sixth pulse signal; the third edge detector outputs a pulse signal through its positive output end when it detects the edge of the fifth pulse signal first, and through its negative output end when it detects the edge of the sixth pulse signal first; the width of the pulse signal is equal to the time difference between the edge of the fifth pulse signal and the corresponding edge of the sixth pulse signal;
a second dual-channel selection switch having a positive input end, a negative input end, a positive output end, and a negative output end; the positive input end of the second dual-channel selection switch is connected with the positive output end of the third edge detector and used for receiving the signal output by that positive output end; the negative input end is connected with the negative output end of the third edge detector and used for receiving the signal output by that negative output end; when a signal arrives at the positive input end before one arrives at the negative input end, the received signal is output through the positive output end of the switch and the negative output end outputs no signal; when a signal arrives at the negative input end first, the received signal is output through the negative output end and the positive output end outputs no signal; the positive output end of the second dual-channel selection switch is the positive output end of the fully-connected layer bias input module, and its negative output end is the negative output end of the fully-connected layer bias input module.
According to a third aspect, there is provided in one embodiment a circuit for convolutional neural network operation, comprising:
the convolution operation circuit according to the first aspect, wherein when the signal representing the final calculation result output by a convolution sub-module of the convolution operation circuit is a non-positive value, a pulse signal with a pulse width of 0 is output;
the fully-connected operation circuit for the convolutional neural network according to the second aspect;
and an activation pooling module, which receives the pulse signals representing the final calculation results output by the output ends of the convolution sub-modules of the convolution operation circuit, outputs the signal with the widest pulse width, and feeds it to the fully-connected value input end of each fully-connected sub-module in the convolutional neural network fully-connected operation circuit.
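The activation pooling module clamps non-positive results to a zero-width pulse (a ReLU-like behavior, per the first aspect) and forwards the widest pulse in each group (max pooling). A minimal Python sketch with illustrative names, not the claimed circuit:

```python
def relu_pulse(value):
    # a non-positive convolution result leaves the time register
    # as a pulse of width 0, which is exactly ReLU
    return max(value, 0.0)

def activation_pooling(conv_outputs):
    # forward the widest pulse among the sub-module outputs (max pooling)
    return max(relu_pulse(v) for v in conv_outputs)
```

For example, `activation_pooling([-1.0, 0.5, 2.0])` forwards the 2.0-wide pulse, and a group of all-negative results yields a zero-width pulse.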
According to the convolution operation circuit of the above embodiments, because the circuit is based on an analog time-domain calculation circuit, the multiply-accumulate and related operations of convolutional neural network convolution are completed in the time domain; the circuit is therefore fully compatible with a standard CMOS process, achieves high timing precision, and saves additional storage area along with the corresponding power consumption.
Drawings
FIG. 1 is a block diagram of a convolutional neural network;
FIG. 2 is a diagram of a single hidden layer convolutional neural network;
FIG. 3 is image data information for another embodiment;
FIG. 4 is a flow diagram of a convolutional neural network convolution calculation of another embodiment;
FIG. 5 is a graph of the data change after ReLU activation is applied to feature map data of a convolutional neural network according to another embodiment;
FIG. 6 is a diagram of a feature map data pooling process of a convolutional neural network according to another embodiment;
FIG. 7 is a calculation process for fully connecting convolutional neural networks of another embodiment;
FIG. 8 is a block diagram of a convolutional neural network architecture of another embodiment;
FIG. 9 is a circuit diagram of an alternative embodiment of the convolution operation in the analog time domain;
FIG. 10 is a block diagram of a digital to time converter of another embodiment;
FIG. 11 is a schematic diagram of a time register of another embodiment;
FIG. 12 is a schematic diagram of another embodiment of a convolutional layer bias circuit;
FIG. 13 is a diagram of a fully connected module and a fully connected bias circuit connection of another embodiment;
FIG. 14 is a circuit diagram of an alternative embodiment of the convolution operation in the analog time domain;
FIG. 15 is a position diagram of a convolutional neural network parameter write time domain circuit of another embodiment;
FIG. 16 is a diagram of rules for extracting convolutional neural network parameters according to another embodiment;
FIG. 17 is a diagram of a Cadence-AMS platform hardware simulation model according to another embodiment.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings, wherein like elements in different embodiments are given like reference numerals. In the following description, numerous details are set forth to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of these features may be omitted, or replaced by other elements, materials, or methods, in different instances. In some instances, certain operations related to the present application are not shown or described in detail in order to avoid obscuring the core of the application with excessive description; a detailed description of these operations is unnecessary for those skilled in the art, who can fully understand them from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Likewise, the steps or actions in the method descriptions may be reordered or adjusted in a manner apparent to those of ordinary skill in the art. Thus, the various sequences in the specification and drawings are for the purpose of describing certain embodiments only and do not imply a required order unless such an order is explicitly stated to be necessary.
The numbering of components, e.g., "first", "second", etc., is used herein only to distinguish the objects described and carries no sequential or technical meaning. The terms "connected" and "coupled", as used in this application, include both direct and indirect connections (couplings) unless otherwise indicated.
A convolutional neural network is a feedforward neural network whose artificial neurons respond to a portion of the surrounding units within their receptive field. It can generally be divided into an input layer, hidden layers, and an output layer, and the hidden layers can be further divided into convolutional layers and pooling layers.
The structure of a convolutional neural network is explained below with a specific example; FIG. 1 is a structural diagram of the convolutional neural network. The network takes an a × a resolution image as input, for example a 28 × 28 resolution image.
Convolutional layer C1 convolves the input image with M convolution kernels of size n × n to obtain M images of b × b resolution. Bias and activation operations are usually added as well; to keep the structure of the convolutional neural network easy to follow, these two steps are omitted here.
The sampling layer S2 performs a sampling operation on the M b × b-resolution images obtained by the convolutional layer C1 to obtain M b/2 × b/2-resolution images.
The convolution layer C3 convolves the 6 images of 12 × 12 resolution obtained by the sampling layer S2 with 12 convolution kernels of 5 × 5, to obtain 12 images of 8 × 8 resolution.
The sampling layer S4 performs a sampling operation on the 12 8 × 8-resolution images obtained by convolutional layer C3 to obtain 12 images of 4 × 4 resolution.
The output layer fully connects the 12 4 × 4-resolution images obtained by sampling layer S4 and outputs 12 pieces of feature information of the image.
The convolutional neural network in the above example uses two convolutional layers, and the fully-connected output of the output layer is itself a special convolution operation; the convolution operation is therefore the core of the computation of a convolutional neural network.
In the embodiment of the invention, a hardware circuit architecture implemented in the analog time domain performs the convolution operations of the convolutional neural network.
The first embodiment is as follows:
Please refer to FIG. 2, which is a structural diagram of a single-hidden-layer convolutional neural network.
The input layer receives MNIST data set image information of 7 × 7 pixels, quantized to 4 bits of which 2 bits are fractional. The MNIST data set comprises a number of gray-scale pictures; each picture can be represented by a digital array of 7 × 7 pixel points, where the value of each pixel point represents the intensity of that pixel in the picture. Each picture has a corresponding label, i.e., the digit shown in the picture; for example, FIG. 3 shows the pixel values of an image of the digit "0". The training labels of the MNIST data set are digits between 0 and 9 describing the digit represented in a given picture. A label is encoded as a one-hot vector whose dimensions are all 0 except for a single 1, so the digit N is represented as a 10-dimensional vector with a 1 only in the Nth dimension (counting from 0). For example, the label 0 shown in FIG. 3 is denoted [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]. The label set is a [10000, 10] numerical matrix. In this embodiment, 6 convolution kernels of size 4 × 4 are selected, and the image feature data (Yk) is obtained according to the following formula:
Yk = Σ(i=1 to m×m) Wc_ik · x_i + bc_k

where k represents the number of the convolution kernel (k = 1, 2, 3, 4, 5, 6), m represents the convolution kernel size (for example, m = 4), Wc_ik is the convolutional layer weight, bc_k is the convolution kernel offset, and x_i is the input image data.
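The formula above is a plain dot product plus an offset; a minimal Python sketch (the function name and sample values are illustrative, not from the patent):

```python
# Hedged sketch of Yk = sum_i Wc_ik * x_i + bc_k for one kernel k and one
# image window, with the window x and weights Wc flattened to m*m values.
def conv_feature(x, wc, bc):
    """Dot product of window and kernel weights, plus the kernel offset."""
    return sum(w * xi for w, xi in zip(wc, x)) + bc

# 4-bit quantization with 2 fractional bits gives values in steps of 0.25
print(conv_feature([1.0, 0.5], [0.25, -0.75], 0.5))  # 0.375
```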
The pooling layer performs maximum down-sampling with a factor of 2. The weights of the fully-connected layer are 2 × 10 × 40, quantized to 4 bits with 0 fractional bits; the bias of the fully-connected layer is a 1 × 10 array, quantized to 4 bits with 2 fractional bits. The convolution formula of the fully-connected layer is:
Zout_channel_i = Σ(k=1 to m) Wd_ik · fk(Yk) + bd_k

where Zout_channel_i represents the result of the full-connection operation, fk() represents the activation-pooling transform function, Yk is the output value of the preceding convolution calculation layer, k represents the bias number of the fully-connected layer (k a natural number), i represents the serial number of the fully-connected output classification channel, m represents the number of channels, Wd_ik is the full-connection weight, and bd_k is the fully-connected layer bias.
The following explains the process of convolutional neural network computation by a specific example, and inputs image data information shown in fig. 3.
FIG. 4 is a flowchart of the convolution calculation of the convolutional neural network. Each value in an image block of the size of the convolution kernel is multiplied by the value at the corresponding position of the convolution kernel matrix; the products are summed, and the convolution offset value is added to the sum to obtain one datum of the feature map corresponding to that image block. The position of the datum in the feature map corresponds to the position of the image block in the input image.
Specifically, in FIG. 4, b_ij represents the image data, a_ij represents the data in the convolution kernel, bc_k represents the convolution offset value, and t_ij represents the values in the feature map, where i, j, c, and k are natural numbers; there are N convolution kernels, generating N corresponding feature maps. For example, b32 is the image datum in the third row, second column, whose value is 1; a23 is the kernel datum in the second row, third column, whose value is 0.75; bc4 is the offset value of the fourth convolution kernel, whose value is 0; t33 is the feature-map datum in the third row, third column, whose value is -34.50.
Calculating t11, for example:

t11 = Σ b_ij · a_ij + bc1 = (b11·a11 + b12·a12 + b13·a13 + b14·a14 + b21·a21 + b22·a22 + b23·a23 + b24·a24 + b31·a31 + b32·a32 + b33·a33 + b34·a34 + b41·a41 + b42·a42 + b43·a43 + b44·a44) + bc1 = 2.25
By analogy, 16 image blocks are obtained from the input image, and each of the 16 blocks is convolved with the convolution kernel to obtain 16 data forming one feature map. Likewise, 6 feature maps corresponding to the 6 convolution kernels are obtained.
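The sliding-window computation described above can be sketched as follows (a hedged Python model assuming a valid convolution with stride 1; names are illustrative, not from the patent):

```python
def convolve_valid(image, kernel, bias):
    """Slide `kernel` over `image` with no padding; add `bias` to each sum."""
    n = len(kernel)                       # kernel size, 4 in this embodiment
    rows = len(image) - n + 1
    cols = len(image[0]) - n + 1
    feature = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            s = sum(image[r + i][c + j] * kernel[i][j]
                    for i in range(n) for j in range(n))
            feature[r][c] = s + bias      # one feature-map datum per block
    return feature
```

With a 7 × 7 image and a 4 × 4 kernel this yields the 4 × 4 feature map of 16 data described above.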
FIG. 5 shows the data change produced by applying ReLU activation to the data of the first feature map of the convolutional neural network. In this example, the ReLU activation operation is simply the removal of negative numbers: every value less than 0 in the first feature-map data is set to 0.
FIG. 6 shows the pooling process of the first feature-map data of the convolutional neural network. The pooling process equally divides the feature-map matrix into small blocks of even size, takes the maximum value in each block, and forms the pooled matrix. For example, in the feature-map data, the maxima of (t11, t12, t21, t22), (t31, t32, t41, t42), (t13, t14, t23, t24), and (t33, t34, t43, t44) form a 2 × 2 matrix.
FIG. 7 shows the calculation process of the full connection of the convolutional neural network. The pooled feature maps are first expanded in a specified order: the 6 feature maps are expanded into a 1 × 24 array. This array is convolved with a 10 × 24 convolution kernel whose values are all 1, and the bias of the fully-connected layer operation, a 1 × 10 array whose values are all 1, is added. The full-connection operation yields a 10 × 1 sequence; the values of all its items are output, and a logic classifier compares their sizes. The position of the maximum value (in this example the value 87.25 at position 0) gives the original label of the input picture, which here is the image of the digit "0".
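The ReLU–pool–full-connection pipeline above can be sketched as a minimal Python model (the helper names and the tiny test values are illustrative, not from the patent):

```python
def relu(feature):
    """Set every negative value in a feature map to 0."""
    return [[max(0.0, v) for v in row] for row in feature]

def max_pool_2x2(feature):
    """Take the maximum of each non-overlapping 2x2 block."""
    return [[max(feature[2*r][2*c], feature[2*r][2*c+1],
                 feature[2*r+1][2*c], feature[2*r+1][2*c+1])
             for c in range(len(feature[0]) // 2)]
            for r in range(len(feature) // 2)]

def fully_connect(flat, weights, biases):
    """One weight row per output channel; 10 rows of 24 in this embodiment."""
    return [sum(w * x for w, x in zip(row, flat)) + b
            for row, b in zip(weights, biases)]

def classify(scores):
    """Logic classifier: position of the maximum value is the label."""
    return max(range(len(scores)), key=lambda i: scores[i])
```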
Based on the above calculation steps of the convolutional neural network, a circuit architecture of the single-hidden-layer neural network computing in the analog time domain is designed.
Fig. 8 is a block diagram showing a convolution operation in the analog time domain. The convolutional neural network includes a convolutional weights input module 110, a convolutional module 120, a convolutional layer bias input module 130, an activation pooling module 140, a fully-connected module 150, and a fully-connected layer bias input module 160.
A convolution weight input module 110 having a reference pulse signal input terminal 1, a convolution kernel weight value input terminal 2, a lead-lag control signal input terminal 24, a positive output terminal 3, and a negative output terminal 4; the reference pulse signal input terminal 1 is used for inputting a reference pulse signal, the convolution kernel weight value input terminal 2 is used for inputting a signal representing a convolution kernel weight value, and the control signal input terminal 24 is used for inputting a lead-lag control signal; the convolution weight input module is used for judging the negativity and the positivity of a convolution kernel weight value represented by a signal received by the convolution kernel weight value input end according to the lead-lag control signal, outputting the convolution kernel weight value through a negative output end 4 when the convolution kernel weight value is judged to be negative, and outputting the convolution kernel weight value through a positive output end 3 when the convolution kernel weight value is judged to be non-negative;
a convolution module 120 including one or more independent convolution sub-modules 121; each convolution sub-module 121 has a positive kernel weight value input terminal 5, a negative kernel weight value input terminal 6, an input terminal 22 for a value to be convolved, and an output terminal 9; the value to be convolved input end 22 is used for inputting a signal representing a value to be convolved; the positive input end 5 of the kernel weight value is configured to receive a signal output by the positive output end 3 of the convolution weight input module, and the convolution sub-module 121 is configured to amplify the signal received by the positive input end 5 of the kernel weight value and perform accumulation calculation by taking the amplified signal as an addend; the kernel weight value negative input end 6 is configured to receive a signal output by the negative output end 4 of the convolution weight input module, the convolution submodule 121 is configured to amplify the signal received by the kernel weight value negative input end 6, and perform cumulative subtraction calculation by using the amplified signal as a subtraction number, where a magnification factor of the signal by the convolution submodule is the value to be convolved; the convolution sub-module 121 outputs a signal representing the final calculation result via its output 9.
A convolutional layer offset input module 130 having a reference pulse signal input 10, a convolutional layer offset value input 11, a lead-lag control signal input 25, a positive output 12, and a negative output 13; the reference pulse signal input terminal 10 of the convolutional layer bias input module 130 is used for inputting the reference pulse signal; the convolutional layer offset value input terminal 11 is used for inputting a signal representing a convolutional layer offset value; the lead-lag control signal input terminal 25 of the convolutional layer offset input module 130 is used for inputting a lead-lag control signal; the convolutional layer offset input module 130 is configured to determine, according to the lead-lag control signal, negative and positive of the convolutional layer offset value indicated by the signal received by the convolutional layer offset value input terminal 11, and output the convolutional layer offset value through the negative output terminal 13 when the convolutional layer offset value is determined to be negative, and output the convolutional layer offset value through the positive output terminal 12 when the convolutional layer offset value is determined to be non-negative.
The convolution submodule 121 further has a convolutional layer offset positive input terminal 7 and a convolutional layer offset negative input terminal 8;
the convolutional layer offset positive input end 7 of the convolution sub-module 121 receives the pulse signal output by the positive output end 12 of the convolutional layer offset input module, and the convolution sub-module 121 is further configured to perform accumulation calculation by taking the signal received by the convolutional layer offset positive input end 7 as an addend; the convolutional layer offset value negative input end 8 of the convolution sub-module 121 receives the pulse signal output by the negative output end 13 of the convolutional layer offset input module, and the convolution sub-module 121 is further configured to perform subtraction calculation by using the signal received by the convolutional layer offset value negative input end 8 as a subtraction number; the convolution sub-module 121 outputs a signal representing the final calculation result through its output.
When the final calculation result represented by the signal output by the convolution sub-module 121 of the convolution operation circuit is a non-positive value, a pulse signal with a pulse width of 0 is output.
a full-link module 150 comprising one or more independent full-link sub-modules 151; each path of fully-connected sub-module 151 has a value-to-be-fully-connected input 23, a fully-connected input 14, and an output 17; the input end 23 of the value to be fully connected is used for inputting a signal representing the operation value to be fully connected; the full-connection value input end 14 is used for inputting a signal of a full-connection operation value; the full-connection submodule 151 is configured to amplify a signal input by the full-connection value input end 14, and perform accumulation calculation by using the amplified signal as an addend, where a factor of amplifying the signal by the full-connection submodule 151 is the operation value to be fully connected; the fully connected sub-module 151 outputs a signal representing the final result via its output 17.
A full link layer bias input module 160 having a reference pulse signal input 18, a full link layer bias value input 19, a lead-lag control signal input 26, a positive output 20, and a negative output 21; the reference pulse signal input end 18 of the full link layer bias input module 160 is used for inputting the reference pulse signal; the full link layer bias value input terminal 19 is used for inputting a signal representing a full link layer bias value; the lead-lag control signal input 26 of the fully-connected layer bias input module 160 is used for inputting a lead-lag control signal; the full link layer offset input module 160 is configured to determine, according to the lead-lag control signal, the negative and positive values of the full link layer offset value indicated by the signal received at the full link layer offset value input terminal 19, and output the full link layer offset value through the negative output terminal 21 when the full link layer offset value is determined to be negative, and output the full link layer offset value through the positive output terminal 20 when the full link layer offset value is determined to be non-negative.
The full-link sub-module 151 further has a full-link layer bias value positive input terminal 15 and a full-link layer bias value negative input terminal 16;
the full-link layer offset positive input terminal 15 of the full-link sub-module 151 receives the pulse signal output by the positive output terminal 20 of the full-link layer offset input module 160, and the full-link sub-module 151 is further configured to perform accumulation calculation by taking the signal received by the full-link layer offset positive input terminal 15 as an addend; the fully-connected layer bias value negative input terminal 16 of the fully-connected sub-module 151 receives the pulse signal output by the negative output terminal 21 of the fully-connected layer bias input module 160, and the fully-connected sub-module 151 is further configured to perform subtraction calculation by taking the signal received by the fully-connected layer bias value negative input terminal 16 as a subtraction number; the fully connected submodule 151 outputs a signal representing the final calculation result via its output 17.
And an activation pooling module 140, configured to receive a pulse signal indicating a final calculation result output by each output end of the convolution submodule of the convolution operation circuit, and output a signal with a widest pulse width to the fully-connected value input end 14 of each fully-connected submodule of the convolution neural network fully-connected operation circuit.
FIG. 9 is a schematic diagram of the convolution operation circuit in the analog time domain. Based on the convolution computation structure of the single-hidden-layer convolutional neural network shown in FIG. 4, a hardware circuit is constructed. The hardware circuit is composed of digital-to-time converters (DTC), edge detectors (PFD), time amplifiers (TA), and time registers (TR).
The DTC has two inputs, one for the reference pulse signal and the other for a pulse signal representing a value. Its output is designated the lead control output when it outputs a pulse signal leading the reference pulse signal, and the lag control output when it outputs a pulse signal lagging the reference pulse signal.
The PFD includes two input terminals and two output terminals and outputs a pulse signal whose pulse width is the time difference between the two input signals. One of the two input terminals is set as the positive input terminal (the leading positive input terminal INP) and the other as the negative input terminal (the lagging negative input terminal INN); the output terminal corresponding to the positive input terminal is called the positive output terminal (UP terminal), and the output terminal corresponding to the negative input terminal is called the negative output terminal (DN terminal). The positive input terminal receives pulse signals representing non-negative values, and the pulse signals output at the corresponding positive output terminal represent non-negative outputs; the negative input terminal receives pulse signals representing negative values, and the pulse signals output at the corresponding negative output terminal represent negative outputs.
The TA comprises three input terminals and two output terminals and amplifies the input signals. Two of the input terminals correspond to the two output terminals: one is set as the positive input terminal, whose corresponding output is the positive output terminal, and the other as the negative input terminal, whose corresponding output is the negative output terminal. The third input terminal receives a digital pulse signal that sets the factor by which the signal at the positive or negative input terminal of the TA is amplified.
The TR comprises two input terminals and one output terminal and performs accumulation or subtraction on the input pulse-width signals, outputting the result at the output terminal. One input terminal is set as the positive input terminal, and the TR accumulates the pulse-width signals input there; the other is set as the negative input terminal, and the TR subtracts the pulse-width signals input there.
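Under one simple reading, the DTC–PFD–TA–TR chain described above can be modeled behaviorally, with each value carried as a pulse width plus an UP/DN channel (an assumed sketch, not the transistor-level circuit; DT and all names are hypothetical):

```python
DT = 1.0  # hypothetical minimum delay unit of the DTC outputs

def dtc_pfd(value):
    """DTC pair + PFD: encode a signed value as (pulse width, channel)."""
    channel = 'UP' if value >= 0 else 'DN'  # UP = non-negative, DN = negative
    return abs(value) * DT, channel

def ta(width, gain):
    """Time amplifier: multiply a pulse width by the digital gain."""
    return width * gain

class TR:
    """Time register: accumulate UP pulses, subtract DN pulses."""
    def __init__(self):
        self.total = 0.0
    def store(self, width, channel):
        self.total += width if channel == 'UP' else -width
    def read(self):
        return self.total

# Multiply-accumulate of two weight/pixel pairs: 0.75*2 - 0.5*2 = 0.5
tr = TR()
w, ch = dtc_pfd(0.75); tr.store(ta(w, 2), ch)
w, ch = dtc_pfd(-0.5); tr.store(ta(w, 2), ch)
print(tr.read())  # 0.5
```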
Fig. 9 shows that DTC0 is a first digital-to-time converter, DTC1 is a second digital-to-time converter, PFD1 is a first edge detector, TA1 is a first time amplifier, TA2 is a second time amplifier, TA3 is a third time amplifier, TA4 is a fourth time amplifier, TR1 is a first time register, TR2 is a second time register, TR3 is a third time register, and TR4 is a fourth time register.
Firstly, the a_ij pulse signals of the 4-bit quantized convolution kernel weights (the convolution kernel is a 4 × 4 number sequence) are input to DTC1 in sequence, and a control signal is input to DTC0. DTC1 and DTC0 each output a clock signal with the same pulse width but a relative time delay: DTC0 outputs its clock signal to the lagging negative input terminal INN of the following PFD1, and DTC1 outputs its clock signal to the leading positive input terminal INP of PFD1. DTC0 serves as the control circuit of the digital-to-time converter to implement control for the positive input when a_ij is positive or the negative input when a_ij is negative, i.e., the lead control and lag control indicated in the figure.
The specific working process is as follows:
For example, when the 4-bit quantized weight a_ij input is a sequence of positive numbers, assume the convolution kernel weights are a_ij ∈ (0.00, 0.25, 0.50, 0.75, 1.00, 1.25, 1.50, 1.75). Binary conversion first gives [0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111]. The control signal at the control terminal of DTC0 is set to [1000] and the control terminal of DTC1 to [[1000] − [a_ij]], generating a leading clock-edge difference of |[a_ij]| · Δt, where Δt is the minimum delay time unit of the clock outputs of DTC0 and DTC1. A pulse delay unit B is introduced at the negative lagging input terminal inside PFD1; the UP-terminal switch of PFD1 is gated on and outputs a time-domain pulse-width signal of (B + |[a_ij]| · Δt) carrying the positive weight information, while the DN-terminal switch of PFD1 is off and outputs 0.
When the 4-bit quantized weight a_ij input is a sequence of negative numbers, for example a_ij ∈ (−0.25, −0.50, −0.75, −1.00, −1.25, −1.50, −1.75), conversion first gives [1001, 1010, 1011, 1100, 1101, 1110, 1111]. The control signal at the control terminal of DTC0 is set to [1000] and the control terminal of DTC1 to [a_ij]; DTC1 then outputs a signal lagging DTC0 by |[a_ij] − [1000]| · Δt. A pulse delay unit B is introduced at the positive leading input terminal inside PFD1; the DN-terminal switch of PFD1 is gated on and outputs a time-domain pulse-width signal of (B + |[a_ij] − [1000]| · Δt) carrying the negative weight information, while the UP-terminal switch of PFD1 is off and outputs 0.
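The lead/lag encoding of the two cases above can be sketched as follows (an assumed behavioral model; B and DT are hypothetical constants standing in for the pulse delay unit and the minimum DTC delay step Δt):

```python
B = 0.5   # built-in pulse delay unit inside the PFD (assumed value)
DT = 1.0  # minimum DTC delay step Δt (assumed value)

def weight_to_pulse(code):
    """4-bit sign-magnitude code -> (pulse width, 'UP' or 'DN').

    Codes 0000-0111 are non-negative magnitudes; codes 1001-1111 carry a
    sign bit, so the magnitude is code - 8. DTC0 is fixed at [1000] = 8.
    """
    if code < 8:                       # DTC1 leads DTC0: UP terminal gated
        return B + code * DT, 'UP'
    else:                              # DTC1 lags DTC0: DN terminal gated
        return B + (code - 8) * DT, 'DN'

# Weight 0.75 quantized with 2 fractional bits -> code 0011 = 3
print(weight_to_pulse(0b0011))  # (3.5, 'UP')
print(weight_to_pulse(0b1011))  # -0.75 -> (3.5, 'DN')
```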
The PFD1 may be connected in parallel with one or more TAs. Specifically, the positive input terminal of each TA is connected to the UP output terminal of PFD1 to complete the multiplications of non-negative values in the convolution operation; the resulting pulse signal is output from the corresponding positive output terminal and added in the TR as an addend of the accumulation. The negative input terminal is connected to the DN output terminal of the PFD to complete the multiplications of negative values in the convolution operation; the resulting pulse signal is output from the corresponding negative output terminal and subtracted in the TR. That is, each TA used for a convolution multiplication outputs its multiplied pulse-width signal to the time register TR for convolution accumulation or subtraction. The final pulse-width signal output at the TR output terminal is the result of the convolution operation.
FIG. 10 shows the signal input/output diagram and circuit diagram of the digital-to-time converter DTC. The DTC block comprises two digital-to-time converters, DTC0 and DTC1. A digital-to-time converter (DTC0 or DTC1) receives a reference pulse signal and a digital control signal M1 and outputs a pulse signal delayed by |M1| relative to the reference pulse signal, so that the delay between the output pulse signal and the input reference pulse signal represents the value of the input digital signal M1. DTC0 includes a transistor T1 and a transistor T2 coupled in series, a first capacitor CD, a second capacitor CL, a first amplifier, and a second amplifier. The control terminals of T1 and T2 are coupled together to receive the reference pulse signal; the first terminal of T1 is coupled to the power supply, the second terminal of T1 is coupled at node a to the first terminal of T2, and the second terminal of T2 is coupled to ground. The first capacitor CD and the second capacitor CL are coupled between node a and ground, and the capacitance of the second capacitor CL is varied by the digital control signal M1. The input terminal of the first amplifier is coupled to node a, its output terminal is coupled to the input terminal of the second amplifier, and the output terminal of the second amplifier outputs a pulse signal delayed by |M1| relative to the reference pulse signal. DTC1 has a structure similar to DTC0 and is not described in detail here.
In this embodiment, the edge detector detects the edges of the two signals output by the DTC and outputs a time-domain pulse signal whose pulse width equals the time delay between the rising edges of the two signals, or between their falling edges; it thereby outputs a pulse width representing the magnitude of the digital signal M1. The edge detector may be implemented by a phase detector (PD) or a phase frequency detector (PFD); in this embodiment a PFD compares the time difference between the two signal edges output by the digital-to-time converters (DTC0, DTC1) and converts it into the pulse width of a square-wave signal. The input terminals of the PFD can be set to trigger on rising or falling edges, which adds flexibility for handling calculations with a sign bit. When a valid signal edge appears at one of the input ports, the corresponding output signal goes high; when a valid signal edge appears at the other input port, its output signal also goes high, but this level is unstable and falls low almost immediately, because the two output signals feed a NAND gate whose output is coupled to the reset port of the PFD. Thus, when the second valid edge arrives, the NAND output goes low (active), the PFD is reset, and the PFD outputs go low momentarily. The output square wave therefore has a width equal to the time difference between the first and the second valid edge. This yields a square-wave signal with an adjustable time width, the width between the two edges corresponding to the digital control signal of the digital-to-time converter DTC.
In this embodiment, each TA has two sub-time-amplifiers of identical structure; each sub-time-amplifier has two input terminals and two output terminals and amplifies the pulse widths of the signals at its two inputs before outputting them. The time amplifier TA receives the pulse signal output by the edge detector at its two input terminals and amplifies its pulse width by |M2|, where M2 is a digital control signal setting the amplification factor of the TA; the factor can be set to different values according to different requirements. Since the digital control signal M1 produces, after the digital-to-time converter DTC, a pulse signal delayed by |M1| from the reference pulse signal, and this delay is converted by the edge detector into a time-domain pulse of width |M1|, the pulse width after amplification by the TA becomes |M1 × M2|, realizing the multiplication of the two values M1 and M2. It should be noted that although M2 is called the amplification factor of the TA, this does not mean M2 is greater than 1; M2 may also equal 1, be less than 1, or even be a negative number with a sign bit.
In this embodiment, FIG. 11 shows the basic schematic diagram of pulse-width accumulation in the time register TR. Based on the principle of capacitor charge-and-discharge storage, the TR accumulates a plurality of pulse-width signals in the time domain, temporarily stores the pulse signal obtained after each accumulation, and outputs a signal carrying the total accumulated pulse width. Assume the maximum total pulse-width capacity of the TR is Tf. After the time-domain accumulation of the input pulse-width signals, TR1 outputs Tf − B·a, as can be seen from FIG. 11; the second sub-register TR2 then completes the time-domain operation Tf − (Tf − B·a), i.e., through the complementary operation of TR2 the TR outputs a pulse width of B·a, completing the time-domain accumulation of the time register TR and outputting the result.
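One reading of the complementary read-out described above can be sketched as follows (an assumed model; TF stands in for the maximum total pulse-width capacity Tf, and the function names are illustrative):

```python
TF = 16.0  # hypothetical maximum total pulse-width capacity Tf

def tr1_store(accumulated_width):
    """TR1 holds the complement Tf - B*a of the accumulated width B*a."""
    return TF - accumulated_width

def tr2_read(tr1_value):
    """TR2 recovers the width: Tf - (Tf - B*a) = B*a."""
    return TF - tr1_value

print(tr2_read(tr1_store(3.25)))  # 3.25
```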
FIG. 12 shows the circuit structure for the convolution offset parameter input in the convolution operation. It includes DTC2, DTC3, one PFD2, and one SW2 (a dual-channel selector switch). The offset parameters are input in the same way, and with the same structure, as the convolution kernel weights; the only difference is that the offset input circuit connects an SW2 to the output terminals of PFD2. The positive and negative output terminals of PFD2 are connected to the positive and negative input terminals of the SW, and the positive and negative output terminals of the SW can be connected to the positive and negative input terminals of one or more time registers TRn (n a natural number: 1, 2, 3, ...) used for the convolution operation, so as to realize the accumulation of the convolution offset parameters in the convolution operation. It is the positive and negative input terminals of the time registers TR used for the convolution operation that are connected, for the accumulation or subtraction of the convolution offset parameters.
The dual-channel selector switch SW2 includes two input terminals and two corresponding output terminals. When a signal is first detected at one of the input terminals, the corresponding output terminal outputs that input signal and the other output terminal outputs nothing. One input terminal is set as the positive input terminal, connected to the UP output of PFD2 to receive non-negative convolution offset parameter values; its corresponding positive output terminal outputs the non-negative offset value to the positive input terminal of TRn for accumulation of the offset value. The other input terminal is set as the negative input terminal, connected to the DN output of PFD2 to receive negative convolution offset parameter values; its corresponding negative output terminal outputs the negative offset value to the negative input terminal of TRn for subtraction of the offset value. For example, SW2 in FIG. 12 is connected in parallel with the TRn (TR1, TR2, TR3, and TR4) of FIG. 9.
The specific steps of the whole convolution operation are as follows:
1) DTC0-DTC1-PFD1 converts the convolution kernel weight parameter into the corresponding time-domain pulse-width signal, providing the time-domain signal input for the following time multiplier TA1.
2) The input-layer image data controls the amplification factor of TA1, which performs a time-domain multiplication with the weight pulse-width signal obtained in step 1; the amplified pulse-width signal is output to TR1 for accumulation or subtraction and storage. This realizes the multiply-accumulate (or multiply-subtract) operation of the convolution.
3) DTC2-DTC3-PFD2-SW2 converts the convolution kernel offset parameter into the corresponding time-domain pulse-width signal and outputs it to the time register TR, which completes the accumulation or subtraction in the time domain, thereby realizing the accumulation or subtraction of the convolution kernel offset parameter in the convolution operation.
4) Steps 1, 2 and 3 are repeated with a certain period, finally completing the whole convolution operation in the time-domain circuit.
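The four steps can be condensed into a behavioral Python sketch; pulse widths are modeled as plain numbers, and all names are illustrative rather than taken from the patent:

```python
def conv_mac(pixels, weights, bias):
    """One time-domain convolution: weights become pulse widths (sign routed to
    TR's add/subtract input), image pixels set the TA gain, TR accumulates."""
    acc = 0  # pulse width held in the time register TR
    for x, w in zip(pixels, weights):
        widened = x * abs(w)          # TA: amplify the weight pulse by the pixel value
        if w >= 0:
            acc += widened            # TR positive input: accumulate
        else:
            acc -= widened            # TR negative input: subtract
    acc += bias                       # offset pulse routed through SW2 to TR's bias inputs
    return acc

result = conv_mac([1, 2, 0, 3], [2, -1, 4, 1], bias=1)
# 1*2 - 2*1 + 0*4 + 3*1 + 1 = 4
```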
In this embodiment, the activation step of the convolutional neural network uses the ReLU function, which is simply a process of removing negative numbers: as shown in fig. 5, any value smaller than 0 in the first feature map data is set to 0. In hardware this is realized by the time registers TRn in fig. 11, where TRn (n a natural number such as 1, 2, 3, ...) denotes a plurality of time registers connected in parallel (in this embodiment, TR1, TR2, TR3 and TR4). Based on this embodiment, the ReLU activation is realized as follows: TA1-TR1 through TA4-TR4 work in parallel on the convolution operation, and TR1-TR4 output the positive multiply-accumulate results of the convolution kernel to the following pooling layer for sub-sampling; if the multiply-accumulate (or multiply-subtract) result held in a TRn is negative, that TRn outputs 0 to the following pooling layer, which realizes the ReLU activation.
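The TRn export behavior that realizes ReLU can be modeled in one line (a behavioral sketch, not the circuit itself):

```python
def trn_output(accumulated_width):
    """TRn export stage: a negative accumulated pulse width is exported as a
    zero-width pulse, i.e. ReLU(x) = max(0, x)."""
    return max(0, accumulated_width)

assert trn_output(-5) == 0   # negative result: zero-width pulse
assert trn_output(7) == 7    # non-negative result passes through
```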
In the convolutional neural network operation of this embodiment, the pooling operation is implemented in hardware by feeding the four parallel outputs of TA1-TR1 through TA4-TR4 into a four-input OR gate. As shown in fig. 6, this performs a sampling operation over a 2 × 2 region: the maximum pulse width among the outputs of TR1-TR4 is selected, which realizes the max-pooling operation of the convolutional neural network.
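That OR-ing aligned pulses equals max pooling can be checked with a small waveform simulation (a sketch under the assumption that all four pulses share an aligned rising edge):

```python
def pulses_to_waveforms(widths, horizon):
    """Rising edges aligned at t = 0; each waveform is 1 while t < width."""
    return [[1 if t < w else 0 for t in range(horizon)] for w in widths]

def or_gate(waves):
    """Four-input OR gate: output is high while any input is high."""
    return [int(any(col)) for col in zip(*waves)]

widths = [3, 1, 4, 2]                      # TR1-TR4 output pulse widths
pooled = or_gate(pulses_to_waveforms(widths, horizon=6))
assert sum(pooled) == max(widths)          # OR of aligned pulses = max pooling
```

The OR output stays high as long as any input is high, so its width equals the largest input width; this is why a plain OR gate suffices for max pooling in the pulse-width domain.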
The working process of the hardware circuit for convolution, ReLU activation and pooling is as follows. Data from specific positions of the image input layer are imported in four parallel paths in sequence and fed to the digital control terminals of the four parallel time amplifiers TA1-TA4 to control their amplification factors; a time-domain multiplication is performed with the time pulse-width signals corresponding to the convolution kernel weights serially output by the preceding DTC0-DTC1-PFD1; the results are input in parallel to the four time registers TR1-TR4 for storage, so the multiply-accumulate operation of 4 convolution kernels is carried out serially and repeatedly. Next, the time pulse-width signals corresponding to the convolution kernel offset, serially output by DTC2-DTC3-PFD2, are input to the offset inputs of TR1-TR4 to complete the time-domain accumulation. This completes the convolution operation of the convolutional layer in the time domain; finally, TR1-TR4 output the time pulse-width results in four parallel paths to the following four-input OR gate for the pooling operation. If a final result is negative, TR1-TR4 are set to output a pulse of width 0, realizing the ReLU activation.
In the convolutional neural network operation of this embodiment, the fully connected operation circuit and the fully connected bias parameter input circuit are as shown in fig. 13. The fully connected module comprises a plurality of fully connected sub-modules; each sub-module comprises one TA (10 in total, TA5-TA14 in the figure) and one TR (10 in total, TR5-TR14 in the figure). The input of each TA receives a signal representing a value to be fully connected, and the TA amplifies it with amplification factor Wn_j, where Wn_j represents a fully connected weight value. The TA outputs the multiplied signal to the TR for accumulation, and the accumulated result is finally output through the output terminal of the TR. The TR also adds or subtracts the fully connected offset bd1, which is obtained from the fully connected bias module circuit.
The fully connected bias module circuit comprises DTC4, DTC5, PFD3 and a plurality of dual-channel selection switches SW2m. The fully connected bias parameter is input in the same way, and with the same structure, as the convolution kernel weight; the only difference is that the bias-parameter input circuit adds M switches SW2m at the output of PFD3 (M equals the number of time registers TR in the fully connected module). The positive and negative outputs of PFD3 are connected to the positive and negative inputs of each SW2m, and the positive and negative outputs of each SW2m can be connected to the fully connected bias positive and negative inputs of one or more time registers TRn (n ranging over the TRs of the fully connected module), so as to realize the accumulation of the fully connected bias parameter in the convolutional neural network. The fully connected bias positive and negative inputs of the fully connected time registers TR are used for the accumulation or subtraction of the fully connected bias parameter.
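A minimal Python sketch of one pass through the fully connected module, under the assumption (mine) that each TA multiplies by the weight magnitude, the weight sign routes the result to the TR add or subtract input, and the bias is routed through SW2m; all names are illustrative:

```python
def fully_connected(x, W, b):
    """One output channel per (TA, TR) pair: channel j accumulates
    sum_n W[j][n] * x[n] plus the bias b[j]."""
    outputs = []
    for W_j, b_j in zip(W, b):
        acc = 0  # pulse width held in TR_j
        for x_n, w in zip(x, W_j):
            widened = x_n * abs(w)                    # TA amplification
            acc = acc + widened if w >= 0 else acc - widened
        outputs.append(acc + b_j)                     # bias via SW2m
    return outputs

result = fully_connected([1, 2], [[1, -1], [2, 0]], [1, 0])
# channel 0: 1*1 - 2*1 + 1 = 0; channel 1: 1*2 + 2*0 + 0 = 2
```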
The convolution operation and fully connected operation circuit for a convolutional neural network is composed of phase detectors, digital-to-time converters, time amplifiers and time registers in a time-domain computing circuit. The proposed analog time-domain computing circuit can complete, in the time domain, the multiply-accumulate and related operations required by convolutional neural network convolution; it has high timing precision, saves additional storage area, reduces the corresponding power consumption, and is fully compatible with the CMOS process.
Example two
As shown in fig. 14, the hardware framework structure diagram of the convolutional neural network includes a clock module, a network parameter importing module, a synchronization control module, a convolutional weight input module, a convolutional module, an activation pooling module, a full-link module, a convolutional offset input module, a full-link layer offset input module, and a logic classifier. Where CLKREF represents the reference pulse signal.
The clock module is used for a pulse generator of the clock input of the full circuit and can be realized by adopting a crystal oscillator circuit or a phase-locked loop circuit and the like. The network parameter importing module is used for importing the set network parameters into the time domain circuit, which is equivalent to completing the forward calculation of the CNN network. The synchronous control module is used for controlling the sequence of importing image digital information and network parameter information in the operation of the convolutional neural network, so that the synchronization of the export of the convolution operation result and the import of the full-connection layer and the synchronization of the stable export of the operation result of the full-connection layer and the import of the logic classifier are realized. The convolution weight input module is used for importing a convolution operation weight value. The convolution module is used for realizing convolution operation of the convolution neural network. And the activation pooling module is used for realizing the activation pooling operation of the convolutional neural network. And the full-connection module is used for realizing the weight multiplication accumulation operation of the full-connection layer of the convolutional neural network. The convolution offset input module is used for importing convolution layer offset parameters. The full-connection layer bias input module is used for importing full-connection layer bias parameters. The logic classifier is used for receiving the pulse width signal output by each output channel of the full-connection module, recording the pulse width of each output signal and the serial number of the corresponding output signal, and determining the maximum value and the serial number of the output maximum value channel.
The convolution weight input module comprises two digital-to-time converters (DTC) and a phase frequency detector (PFD). The input of one DTC receives the positive value of the convolution kernel weight, the input of the other DTC receives the negative value, and the outputs of the two DTCs are connected to the INN and INP terminals of the PFD respectively. The DTC for the positive input outputs a clock signal to the lagging negative input INN of the following PFD; the DTC for the negative input outputs a clock signal to the leading positive input INP. The import of the convolution kernel weight value is realized by controlling whether the PFD outputs on its INN side or its INP side.
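The sign-dependent weight import can be sketched as a behavioral model under my reading of the text (a positive weight delays the INN edge so the PFD pulses on one side, a negative weight delays the INP edge so it pulses on the other); this is not a transistor-level description:

```python
REF = 0  # reference edge time (CLKREF)

def weight_to_pulse(w):
    """Two DTCs delay the reference edge; the PFD converts the edge
    difference into a pulse whose width equals |w|, on the side
    corresponding to the sign of w."""
    inn = REF + (w if w >= 0 else 0)    # DTC driven by the positive value
    inp = REF + (-w if w < 0 else 0)    # DTC driven by the negative value
    up = max(0, inn - inp)              # pulse for non-negative weights
    dn = max(0, inp - inn)              # pulse for negative weights
    return up, dn

assert weight_to_pulse(5) == (5, 0)
assert weight_to_pulse(-3) == (0, 3)
```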
The convolution module comprises a plurality of time amplifiers TA and a plurality of corresponding time registers TR. One TA and one TR are in a group, each group of time register TR comprises two groups of positive and negative input ends and two output ends in a hardware structure, each group of positive and negative input ends comprises a positive input end and a negative input end, the positive input end of one group of positive and negative input ends is connected with the positive result output corresponding to the previous TA to accumulate the time pulse width signals of the positive multiplication results, and the negative input end of the group of positive and negative input ends is connected with the negative result output corresponding to the previous TA to accumulate the time pulse width signals of the negative multiplication results. And the other group of positive and negative input ends are used for the accumulation operation of the convolution offset parameters and are connected with the output end of a double-channel selection switch SW of the convolution offset parameter input module.
The active pooling module is a multi-path input OR gate, each input end of the OR gate is connected with the output end of each time register TR in the convolution module, and the output end of the OR gate outputs a signal with the maximum pulse width in each time register TR in the convolution module and outputs the signal to the full-connection module.
The fully connected module comprises a plurality of time amplifiers TA and a plurality of corresponding time registers TR. One TA and one TR are in a group, each group of time register TR comprises two groups of positive and negative input ends and two output ends in a hardware structure, each group of positive and negative input ends comprises a positive input end and a negative input end, the positive input end of one group of positive and negative input ends is connected with the positive result output corresponding to the previous TA to accumulate the time pulse width signals of the positive multiplication results, and the negative input end of the group of positive and negative input ends is connected with the negative result output corresponding to the previous TA to accumulate the time pulse width signals of the negative multiplication results. And the other group of positive and negative input ends are used for accumulation operation of the full-connection offset parameters and are connected with the output end of a double-channel selection switch SW of the full-connection offset parameter input module.
The output end of each time register TR in the full-connection module is respectively connected with the corresponding input end of the logic classifier.
The convolution offset input module comprises two digital-to-time converters (DTC), a phase frequency detector (PFD) and a dual-channel selection switch (SW). The offset parameter is input in the same way, and with the same structure, as the convolution kernel weight; the only difference is that the offset input circuit adds a dual-channel selection switch SW at the output of the PFD. The output of the PFD is connected to the input of the SW, and the output of the SW is connected to one group of positive and negative inputs of the time registers TR of the convolution module: the positive input of that group receives the positive result output by the SW, accumulating the time pulse-width signals of positive results, and the negative input of that group receives the negative result output by the SW, thereby realizing the accumulation of the convolutional layer offset parameter in the convolution calculation. The SW is connected to the second group of positive and negative inputs of the time registers TR in the convolution module, the group used for the accumulation of the convolution offset parameter.
The fully connected layer bias input module comprises two digital-to-time converters (DTC), a phase frequency detector (PFD) and a plurality of dual-channel selection switches (SW). The number of dual-channel selection switches SW equals the number of output paths of the fully connected operation; for example, 10 output paths require 10 switches SW. The structure is the same as that of the convolution offset input module, except that the output of the PFD is connected to the inputs of the plurality of dual-channel selection switches SW, and the output of each SW is connected to one group of positive and negative inputs of the corresponding time register TR in the fully connected module: the positive input of that group receives the positive result output by the SW, accumulating the time pulse-width signals of positive results, and the negative input of that group receives the negative result output by the SW, realizing the accumulation of the fully connected layer offset parameter in the fully connected calculation.
For example, for time amplifiers such as TA5-TA14, the amplification factor is controlled by introducing network parameters such as full link layer weight. The network parameters of the weight have positive and negative values, for example, when the network parameters of the weight of the full connection layer are positive, the network parameters are led into the gain control end of the positive time pulse width amplifier to adjust the amplification factor of the positive time pulse width amplifier, and the gain control end of the negative time pulse width amplifier is reset and does not amplify; when the weight network parameter of the full connection layer is negative, the network parameter is led into the gain control end of the negative time pulse width amplifier to adjust the amplification factor of the negative time pulse width amplifier, and the gain control end of the positive time pulse width amplifier is reset and does not amplify.
As shown in fig. 14, the hardware circuit operation process of the whole convolutional neural network is as follows: a digital time converter formed by DTC0-DTC1 converts the convolution kernel weight network parameters into clock signals of two paths of time delay signals and outputs the clock signals to a rear-stage PFD 1; the PFD1 further obtains a time domain pulse signal of a convolution kernel weight network parameter corresponding to certain pulse width information and outputs the time domain pulse signal to a post-stage four-path parallel working TA1-TA 4; the four TA1-TA4 work in parallel, data at four specific positions of an Image Map layer are respectively imported to control the time amplification times of the data, convolutional layer time domain multiplication operation of pulse width signals corresponding to the weight of a convolutional kernel is completed, and four paths of time multiplication results are output to TR1-TR 4; a digital time converter formed by DTC2-DTC3 converts the parameters of the convolution kernel bias network into clock signals of two paths of time delay signals and outputs the clock signals to a rear-stage PFD 2; the PFD2 further obtains time domain pulse signal output of a certain pulse width information corresponding to the convolution kernel bias network parameters, and the time domain pulse signal output is input to a rear-stage four-way parallel working TR1-TR4 through two-way two-channel switches; TR1-TR4 accumulates and sums the time multiplication result of the convolution kernel weight and the time pulse width signal corresponding to the convolution kernel offset to complete the time domain convolution operation of the convolution kernel; outputting convolution operation time pulse width results to a rear-stage four-input OR gate for maximum pooling operation by TR1-TR4 parallel four paths, and outputting the results to TA5-TA 14; ten paths of TA5-TA14 time amplifiers work in parallel, the amplification factor of 
the time amplifiers is controlled by introducing network parameters such as the weight of a full connection layer, the full connection time domain multiplication operation of the time amplifiers and the pulse width signals output by the upper level of pooling layer is completed, and ten paths of time multiplication results are output to TR5-TR 14; a digital time converter formed by DTC4-DTC5 converts the parameters of the full-connection layer bias network into clock signals of two paths of time delay signals and outputs the clock signals to a rear-stage PFD 3; the PFD3 further obtains time domain pulse signal output of the full connection layer bias network parameter corresponding to certain pulse width information, and the time domain pulse signal output is input to a rear-stage ten-path parallel working TR5-TR14 through ten-path two-channel switches; TR5-TR14 carries out accumulation summation on the time multiplication result of the weight of the full connection layer and the time pulse width signal corresponding to the bias of the full connection layer to complete the time domain multiplication accumulation operation of the full connection layer; and finally, outputting the full-connection operation time pulse width result to a post-stage logic classifier SoftMax by ten parallel channels TR5-TR14 to complete the classification and identification functions of the CNN neural network. Note that: wherein the clock generation module generates REF to provide a reference clock input for the digital-to-time converters DTC0-DTC 5; the network parameter control module mainly controls the import process of the 5-class convolutional neural network parameters; the global synchronization module mainly provides two synchronization signals to control the result output processes of TR1-TR4 and TR5-TR14 respectively.
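The whole chain above can be condensed into a toy behavioral model; every number and name here is illustrative, and the real circuit operates on pulse widths rather than Python integers:

```python
# Toy pass through the fig. 14 pipeline: convolution MAC -> ReLU at the TR
# output -> four-input OR-gate max pooling -> fully connected layer ->
# logic-classifier argmax.
def mac(xs, ws, b):
    return sum(x * w for x, w in zip(xs, ws)) + b

def forward(regions, kernel, kernel_bias, fc_weights, fc_bias):
    conv = [max(0, mac(r, kernel, kernel_bias)) for r in regions]   # TA1-TA4 / TR1-TR4 with ReLU
    pooled = max(conv)                                              # four-input OR gate = max pool
    scores = [pooled * w + b for w, b in zip(fc_weights, fc_bias)]  # TA5-TA14 / TR5-TR14
    return scores.index(max(scores))                                # SoftMax logic classifier

label = forward(
    regions=[[1, 0], [2, 1], [0, 3], [1, 1]],
    kernel=[2, -1], kernel_bias=0,
    fc_weights=[1, 3, 2], fc_bias=[0, -1, 1],
)
# conv = [2, 3, 0, 1], pooled = 3, scores = [3, 8, 7] -> label = 1
```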
The locations and steps for writing the convolutional neural network parameters into the time-domain circuit are shown in fig. 15. When the overall circuit performs the convolutional neural network calculation, the problem of signal input and output synchronization needs to be considered.
For the input of the network parameters and the image data, the signed decimal values are first converted to binary; the binary code is then converted back to decimal with the high-order bit treated as an ordinary data bit rather than a sign bit; finally, the conversion result is written to a file for storage and used to import and complete the control of the convolutional neural network operation in the relevant time-domain circuit.
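A small Python sketch of the described conversion, assuming (my reading) a 4-bit two's-complement encoding whose code is then read back as an unsigned value:

```python
def signed_to_unsigned_code(value, bits=4):
    """Encode a signed decimal as `bits`-bit two's complement, then read the
    code back as an unsigned decimal (the high bit treated as data, not sign)."""
    assert -(1 << (bits - 1)) <= value < (1 << (bits - 1))
    return value & ((1 << bits) - 1)

assert signed_to_unsigned_code(-3) == 13   # 0b1101 read as unsigned
assert signed_to_unsigned_code(5) == 5     # 0b0101
```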
The network parameters are input and controlled from the convolutional neural network algorithm to the input circuit according to certain reading rules. Taking this embodiment as an example, as shown in fig. 16, with respect to the CNN hardware structure of fig. 14, fig. 16 shows the extraction rules of two kinds of network parameters (the input-layer image data and the convolutional-layer convolution kernel weights) in the overall serial-convolution and locally parallel pooling process of the system circuit. The 7 × 7 image input layer is divided into 16 sub-regions of 4 × 4, each undergoing a convolution multiply-accumulate with the 4 × 4 convolution kernel, 16 operations in total; the specific positions marked by the four colors serve as sub-image areas and are read in 4 parallel paths to TA1-TA4, followed by the lower-level activation and pooling operations, wherein,
the first row dark portion is the partial character image data to be serially imported into the circuit by time amplifier TA1;
the second row dark portion is the partial character image data to be serially imported into the circuit by time amplifier TA2;
the third row dark portion is the partial character image data to be serially imported into the circuit by time amplifier TA3;
the fourth row dark portion is the partial character image data to be serially imported into the circuit by time amplifier TA4.
In the CNN convolution of one 7 × 7 picture, one convolution kernel needs 4 rounds of image-data reading, (1) -> (2) -> (3) -> (4); the 4 × 4 = 16 convolution kernel weight data are input to DTC0-DTC1 in series and repeatedly imported, 4 × 4 × 24 = 384 times in total; 4 local sub-image data at a time are repeatedly imported into TA1-TA2-TA3-TA4 in series, likewise 384 times in total; the convolution kernel offset data are repeatedly led into DTC2-DTC3 in series 24 times; the fully connected weight data, 10 × 24 = 240 in total, are divided into 10 parts and respectively led into TA5-TA14 in series, 24 times each; the fully connected bias data are led into DTC4-DTC5 in series through the 10 dual-channel switches and finally into TR5-TR14 in series.
In the CNN circuit structure of this embodiment, the signals of each circuit are imported serially under a global synchronous clock period count, and the serial import period constraint for each network parameter of the CNN circuit structure is designed with reference to the working characteristics and requirements of the corresponding time-domain computing circuit module. For example, each network parameter counting unit takes one global clock CLKREF period Tp as its unit, constrained mainly by the actual digital-to-time converter (DTC) circuit; the period of its input clock may be Tp = 10 ns, and the convolution kernel weight serial import period is Naij = 2 × (20 + 4) = 48 such units.
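As a hedged arithmetic check of the example figures above (Tp and Naij are the values quoted in the text; the conversion to nanoseconds is my own):

```python
Tp_ns = 10                      # global clock CLKREF period, example value from the text
Naij = 2 * (20 + 4)             # convolution kernel weight serial import period, in Tp units
import_period_ns = Naij * Tp_ns

assert Naij == 48               # 48 CLKREF periods
assert import_period_ns == 480  # i.e. 480 ns at Tp = 10 ns
```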
According to the circuit timing specification in the following table, the global signal SYC_Finish_bc synchronizes the stable export of the convolution results of TR1-TR4 with the fully connected layer import process, and the global signal SYC_Finish_bd synchronizes the stable export of the fully connected results of TR5-TR14 with the SoftMax logic classifier import process. The character image input data and the fully connected layer weights use 4-bit fixed-point quantization with 0 fractional bits, controlling the time amplification gain of TA1-TA14; the convolutional layer convolution kernel weights, the convolutional layer convolution offset and the fully connected layer offset use 4-bit fixed-point quantization with 2 fractional bits, controlling the digital-to-time conversion magnitude of DTC0-DTC5.
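The stated 4-bit fixed-point format with 2 fractional bits can be sketched as follows; the rounding and saturation behavior is my assumption, not stated in the text:

```python
def quantize_q2(value):
    """4-bit signed fixed point with 2 fractional bits: representable range
    [-2.0, 1.75] in steps of 0.25, saturating at the range limits."""
    code = round(value * 4)              # scale to integer code
    code = max(-8, min(7, code))         # saturate to the 4-bit signed range
    return code / 4.0

assert quantize_q2(0.6) == 0.5    # nearest representable step
assert quantize_q2(1.9) == 1.75   # saturates at the positive limit
```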
[Table: circuit timing specification for serial network-parameter import (rendered as an image, Figure BDA0001619629380000231, in the original publication)]
Applying the circuit framework of this embodiment, a hardware simulation model as in fig. 17 was constructed. Under the Cadence-AMS platform, Verilog-A modeling simulation verification was carried out on the proposed time-domain circuit structure, and the results of the two calculation methods are compared in the following table:
[Table: comparison of the algorithm and circuit-model recognition results (rendered as an image, Figure BDA0001619629380000241, in the original publication)]
From the above table it can be seen that the results obtained by recognizing the 7 × 7 mini character pictures with the algorithm and with the circuit model are consistent, and the maximum-channel results of the two fully connected layers agree completely; that is, the circuit implements the convolutional neural network algorithm with no deviation from the algorithm, which shows that the time-domain circuit design of the CNN provided by the present invention is realizable and accurate.
Through modeling and simulation of the specific circuit system, the feasibility of mapping the CNN convolutional neural network algorithm onto a hardware circuit based on time-domain operation modules is verified; the modeling and simulation confirm the consistency and accuracy between the algorithm computation and the time-domain circuit model simulation results.
The present invention has been described with reference to specific examples, which are intended only to aid understanding and not to limit the invention. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.

Claims (9)

1. A circuit for convolutional neural network convolutional operation, comprising:
the convolution weight input module is provided with a reference pulse signal input end, a convolution kernel weight value input end, a lead-lag control signal input end, a positive output end and a negative output end; the reference pulse signal input end is used for inputting a reference pulse signal, the convolution kernel weight value input end is used for inputting a signal representing a convolution kernel weight value, and the control signal input end is used for inputting a lead-lag control signal; the convolution weight input module is used for judging the negative and positive of a convolution kernel weight value represented by a signal received by the convolution kernel weight value input end according to the lead-lag control signal, outputting the signal through the negative output end when the signal is judged to be negative, and outputting the signal through the positive output end when the signal is judged to be non-negative;
the convolution module comprises one or more independent convolution sub-modules; each convolution sub-module is provided with a core weight value positive input end, a core weight value negative input end, a value to be convolved input end and an output end; the input end of the value to be convolved is used for inputting a signal representing the value to be convolved; the convolution submodule is used for amplifying the signals received by the positive input end of the kernel weight value and accumulating the amplified signals as addends; the convolution submodule is used for amplifying the signal received by the negative input end of the kernel weight value and performing accumulation and subtraction calculation by taking the amplified signal as a subtraction number, wherein the multiplication factor of the convolution submodule for amplifying the signal is the value to be convolved; the convolution sub-module outputs a signal representing the final calculation result through its output.
2. The circuit of claim 1, wherein the convolution weight input module comprises:
a first digital-to-time converter having an input terminal, a control terminal, and an output terminal; the input end of the first digital-to-time converter is connected with the reference pulse signal input end and used for receiving the reference pulse signal; the control end of the first digital-to-time converter is connected with the convolution kernel weight value input end and is used for inputting the signal representing the convolution kernel weight value; when the first digital-to-time converter receives the signal representing the convolution kernel weight value, outputting a first pulse signal through an output end of the first digital-to-time converter, wherein the time difference between the first pulse signal and the reference pulse signal is the convolution kernel weight value;
a zeroth digital-to-time converter having an input terminal, a control terminal, and an output terminal; the input end of the zeroth digital-to-time converter is connected with the reference pulse signal input end and used for receiving the reference pulse signal; the control end of the zeroth digital-to-time converter is connected with the lead-lag control signal input end and used for receiving the lead-lag control signal; the zero digital-to-time converter outputs a second pulse signal through an output end thereof when receiving the lead-lag control signal, wherein the time difference between the second pulse signal and the reference pulse signal is the value of the lead-lag control signal;
a first edge detector having a leading positive input terminal, a lagging negative input terminal, a positive output terminal, and a negative output terminal; the leading positive input end of the first edge detector is connected with the output end of the first digital-to-time converter and used for receiving the first pulse signal; the lagging negative input end of the first edge detector is connected with the output end of the zeroth digital-to-time converter and used for receiving the second pulse signal; the first edge detector outputs a pulse signal through a positive output end when detecting the edge of the first pulse signal firstly, and outputs a pulse signal through a negative output end when detecting the edge of the second pulse signal firstly; the width of the pulse signal is equal to the time difference between the edge of the first pulse signal and the corresponding edge of the second pulse signal, the positive output end of the first edge detector is the positive output end of the convolution weight input module, and the negative output end of the first edge detector is the negative output end of the convolution weight input module.
3. The circuit of claim 1, wherein the convolution sub-module comprises:
a time amplifier having a positive input terminal, a negative input terminal, a control terminal, a positive output terminal and a negative output terminal; the control end of the time amplifier is connected with the input end of the value to be convolved; the positive input end of the time amplifier is the core weight value positive input end, and the negative input end of the time amplifier is the core weight value negative input end; the time amplifier is used for amplifying the signal received by the positive input end of the time amplifier and outputting the signal through the positive output end of the time amplifier; the time amplifier is used for amplifying the signal received by the negative input end of the time amplifier and outputting the signal through the negative output end of the time amplifier; wherein, the amplification times of the signals are the values corresponding to the signals received by the control end;
a time register having a positive input terminal, a negative input terminal, and an output terminal; the positive input end of the time register is connected with the positive output end of the time amplifier, the negative input end of the time register is connected with the negative output end of the time amplifier, and the output end of the time register is the output end of the convolution sub-module; the time register is used for performing accumulation with the signal received at its positive input end as an addend, performing accumulated subtraction with the signal received at its negative input end as a subtrahend, and outputting a signal representing the final calculation result through its output end.
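In time-domain terms, the convolution sub-module of this claim amplifies each kernel-weight pulse width by the value at the time amplifier's control end, then accumulates positive-channel widths as addends and negative-channel widths as subtrahends. A minimal behavioral sketch follows; the function name and the list-based representation of the pulse streams are assumptions, not part of the claims.

```python
def convolution_submodule(x, pos_weights, neg_weights):
    """x: value at the to-be-convolved input (the time-amplifier gain).
    pos_weights / neg_weights: pulse widths arriving on the kernel-weight
    positive and negative inputs. Returns the accumulated result that the
    time register would hold."""
    acc = 0.0
    for w in pos_weights:   # amplified positive pulses are accumulated as addends
        acc += x * w
    for w in neg_weights:   # amplified negative pulses are subtracted as subtrahends
        acc -= x * w
    return acc

# e.g. gain 2 with positive widths [3, 1] and negative width [2]: 2*4 - 2*2 = 4
assert convolution_submodule(2.0, [3.0, 1.0], [2.0]) == 4.0
```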
4. The circuit of claim 1, further comprising:
a convolutional layer bias input module having a reference pulse signal input, a convolutional layer bias value input, a lead-lag control signal input, a positive output, and a negative output; the reference pulse signal input end of the convolutional layer bias input module is used for inputting the reference pulse signal; the convolutional layer bias value input end is used for inputting a signal representing the convolutional layer bias value; the lead-lag control signal input end of the convolutional layer bias input module is used for inputting a lead-lag control signal; the convolutional layer bias input module is used for determining, according to the lead-lag control signal, the sign of the convolutional layer bias value represented by the signal received at the convolutional layer bias value input end, outputting the convolutional layer bias value through the negative output end when the value is determined to be negative, and outputting it through the positive output end when the value is determined to be non-negative;
the convolution sub-module is also provided with a convolutional layer bias positive input end and a convolutional layer bias negative input end;
the convolutional layer bias positive input end of the convolution sub-module receives the pulse signal output by the positive output end of the convolutional layer bias input module, and the convolution sub-module is also used for performing accumulation with the signal received at the convolutional layer bias positive input end as an addend; the convolutional layer bias negative input end of the convolution sub-module receives the pulse signal output by the negative output end of the convolutional layer bias input module, and the convolution sub-module is also used for performing accumulated subtraction with the signal received at the convolutional layer bias negative input end as a subtrahend; the convolution sub-module outputs a signal representing the final calculation result through its output end.
5. The circuit of claim 4, wherein the convolutional layer bias input module comprises:
a third digital-to-time converter having an input terminal, a control terminal, and an output terminal; the input end of the third digital-to-time converter is connected with the reference pulse signal input end and is used for receiving the reference pulse signal; the control end of the third digital-to-time converter is connected with the convolutional layer bias value input end and is used for inputting the signal representing the convolutional layer bias value; upon receiving the signal representing the convolutional layer bias value, the third digital-to-time converter outputs a third pulse signal through its output end, wherein the time difference between the third pulse signal and the reference pulse signal is the convolutional layer bias value;
a second digital-to-time converter having an input terminal, a control terminal, and an output terminal; the input end of the second digital-to-time converter is connected with the reference pulse signal input end and is used for receiving the reference pulse signal; the control end of the second digital-to-time converter is connected with the lead-lag control signal input end and is used for receiving the lead-lag control signal; upon receiving the lead-lag control signal, the second digital-to-time converter outputs a fourth pulse signal through its output end, wherein the time difference between the fourth pulse signal and the reference pulse signal is the value of the lead-lag control signal;
a second edge detector having a leading positive input, a lagging negative input, a positive output, and a negative output; the leading positive input end of the second edge detector is connected with the output end of the third digital-to-time converter and is used for receiving the third pulse signal; the lagging negative input end of the second edge detector is connected with the output end of the second digital-to-time converter and is used for receiving the fourth pulse signal; the second edge detector outputs a pulse signal through its positive output end when it detects the edge of the third pulse signal first, and outputs a pulse signal through its negative output end when it detects the edge of the fourth pulse signal first; wherein the width of the pulse signal is equal to the time difference between the edge of the third pulse signal and the corresponding edge of the fourth pulse signal;
a first dual-channel selection switch having a positive input terminal, a negative input terminal, a positive output terminal, and a negative output terminal; the positive input end of the first dual-channel selection switch is connected with the positive output end of the second edge detector and is used for receiving the signal output by that positive output end; the negative input end of the first dual-channel selection switch is connected with the negative output end of the second edge detector and is used for receiving the signal output by that negative output end; when a signal arrives at the positive input end of the first dual-channel selection switch before any signal arrives at its negative input end, the received signal is output through the positive output end of the first dual-channel selection switch and the negative output end outputs no signal; when a signal arrives at the negative input end before any signal arrives at the positive input end, the received signal is output through the negative output end of the first dual-channel selection switch and the positive output end outputs no signal; the positive output end of the first dual-channel selection switch is the positive output end of the convolutional layer bias input module, and the negative output end of the first dual-channel selection switch is the negative output end of the convolutional layer bias input module.
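The bias input module of this claim encodes the bias value as a time offset (third digital-to-time converter), encodes the sign threshold as a second offset (second digital-to-time converter), and routes the resulting edge-detector pulse to the positive or negative output depending on which edge arrives first. A behavioral sketch of the whole chain follows; the function name and the representation of pulses by delays relative to the reference are illustrative assumptions.

```python
def bias_input_module(t_ref: float, bias_offset: float, lead_lag: float):
    """Behavioral sketch of the claimed bias input module.

    t_ref:       time of the reference pulse.
    bias_offset: delay of the third pulse relative to the reference
                 (encodes the bias value).
    lead_lag:    delay of the fourth pulse (the lead-lag sign threshold).
    Returns (pos_out, neg_out): the pulse width appears on the positive
    output when the bias pulse leads or coincides (non-negative bias),
    otherwise on the negative output.
    """
    t_bias = t_ref + bias_offset     # third digital-to-time converter output
    t_thresh = t_ref + lead_lag      # second digital-to-time converter output
    width = abs(t_bias - t_thresh)   # edge-detector pulse width
    if t_bias <= t_thresh:           # bias edge first: positive channel selected
        return (width, 0.0)
    return (0.0, width)              # threshold edge first: negative channel

assert bias_input_module(0.0, 1.0, 4.0) == (3.0, 0.0)  # non-negative bias
assert bias_input_module(0.0, 6.0, 4.0) == (0.0, 2.0)  # negative bias
```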
6. A circuit for convolutional neural network operations, comprising:
a convolution operation circuit for a convolutional neural network as claimed in any one of claims 1 to 5; wherein when the signal representing the final calculation result output by the convolution sub-module of the convolution operation circuit is a non-positive value, a pulse signal with a pulse width of 0 is output;
a full-connection operation circuit comprising a full-connection module; the full-connection module comprises one or more independent full-connection sub-modules; each full-connection sub-module has a to-be-fully-connected value input end, a fully-connected value input end, and an output end; the to-be-fully-connected value input end is used for inputting a signal representing a to-be-fully-connected operation value; the fully-connected value input end is used for inputting a signal representing a fully-connected operation value; the full-connection sub-module is used for amplifying the signal input at its fully-connected value input end and performing accumulation with the amplified signal as an addend, wherein the amplification factor applied to the signal is the to-be-fully-connected operation value; the full-connection sub-module outputs a signal representing the final result through its output end;
and an activation pooling module, which is used for receiving the pulse signal representing the final calculation result output by the output end of each convolution sub-module of the convolution operation circuit, outputting the signal with the widest pulse width, and supplying that signal to the fully-connected value input end of each full-connection sub-module of the full-connection operation circuit.
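Functionally, claim 6 describes an activation followed by pooling: emitting a zero-width pulse for non-positive results acts as a ReLU, and selecting the widest pulse among the convolution outputs is a max-pooling step. A behavioral sketch under those readings (function names are illustrative assumptions):

```python
def relu_pulse(value: float) -> float:
    """A non-positive convolution result yields a pulse of width 0."""
    return value if value > 0.0 else 0.0

def activation_pool(pulse_widths):
    """Select the widest pulse among the convolution sub-module outputs."""
    return max(pulse_widths)

# e.g. results [-1.5, 0.0, 2.5, 1.0] -> ReLU -> [0, 0, 2.5, 1.0] -> pool -> 2.5
assert activation_pool([relu_pulse(v) for v in [-1.5, 0.0, 2.5, 1.0]]) == 2.5
```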
7. The circuit of claim 6, wherein the fully connected sub-module comprises:
a second time amplifier having an input terminal, a control terminal, and an output terminal; the control end of the second time amplifier is used for inputting the signal representing the to-be-fully-connected operation value; the input end of the second time amplifier is used for inputting the signal representing the fully-connected operation value; the second time amplifier is used for amplifying the signal received at its input end and outputting it through its output end, wherein the amplification factor applied to the signal is the value corresponding to the signal received at its control end;
a second time register having an input and an output; the input end of the second time register is connected with the output end of the second time amplifier, and the output end of the second time register is the output end of the full-connection sub-module; the second time register is used for performing accumulation with the signals received at its input end as addends, and for outputting a signal representing the final calculation result through its output end.
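The fully-connected sub-module of this claim is a multiply-accumulate in the time domain: the second time amplifier scales each incoming pulse width by the weight at its control end, and the second time register sums the results. A minimal behavioral sketch (names and the list-based input representation are assumptions):

```python
def fully_connected_submodule(weight: float, inputs):
    """weight: the to-be-fully-connected operation value (amplifier gain).
    inputs: pulse widths arriving at the fully-connected value input.
    The second time register accumulates each amplified pulse as an addend."""
    acc = 0.0
    for w in inputs:
        acc += weight * w   # amplify by the control-end value, then accumulate
    return acc

# e.g. weight 3 applied to pulse widths [1, 2] accumulates to 9
assert fully_connected_submodule(3.0, [1.0, 2.0]) == 9.0
```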
8. The circuit of claim 6, further comprising:
a full-connection layer bias input module having a reference pulse signal input end, a full-connection layer bias value input end, a lead-lag control signal input end, a positive output end, and a negative output end; the reference pulse signal input end of the full-connection layer bias input module is used for inputting the reference pulse signal; the full-connection layer bias value input end is used for inputting a signal representing a full-connection layer bias value; the lead-lag control signal input end of the full-connection layer bias input module is used for inputting a lead-lag control signal; the full-connection layer bias input module is used for determining, according to the lead-lag control signal, the sign of the full-connection layer bias value represented by the signal received at the full-connection layer bias value input end, outputting the full-connection layer bias value through the negative output end when the value is determined to be negative, and outputting it through the positive output end when the value is determined to be non-negative;
the full-connection sub-module is also provided with a full-connection layer bias positive input end and a full-connection layer bias negative input end;
the full-connection layer bias positive input end of the full-connection sub-module receives the pulse signal output by the positive output end of the full-connection layer bias input module, and the full-connection sub-module is also used for performing accumulation with the signal received at the full-connection layer bias positive input end as an addend; the full-connection layer bias negative input end of the full-connection sub-module receives the pulse signal output by the negative output end of the full-connection layer bias input module, and the full-connection sub-module is also used for performing accumulated subtraction with the signal received at the full-connection layer bias negative input end as a subtrahend; the full-connection sub-module outputs a signal representing the final calculation result through its output end.
9. The circuit of claim 8, wherein the fully-connected layer bias input module comprises:
a fifth digital-to-time converter having an input terminal, a control terminal, and an output terminal; the input end of the fifth digital-to-time converter is connected with the reference pulse signal input end and is used for receiving the reference pulse signal; the control end of the fifth digital-to-time converter is connected with the full-connection layer bias value input end and is used for inputting the signal representing the full-connection layer bias value; upon receiving the signal representing the full-connection layer bias value, the fifth digital-to-time converter outputs a fifth pulse signal through its output end, wherein the time difference between the fifth pulse signal and the reference pulse signal is the full-connection layer bias value;
a fourth digital-to-time converter having an input terminal, a control terminal, and an output terminal; the input end of the fourth digital-to-time converter is connected with the reference pulse signal input end and is used for receiving the reference pulse signal; the control end of the fourth digital-to-time converter is connected with the lead-lag control signal input end and is used for receiving the lead-lag control signal; upon receiving the lead-lag control signal, the fourth digital-to-time converter outputs a sixth pulse signal through its output end, wherein the time difference between the sixth pulse signal and the reference pulse signal is the value of the lead-lag control signal;
a third edge detector having a leading positive input terminal, a lagging negative input terminal, a positive output terminal, and a negative output terminal; the leading positive input end of the third edge detector is connected with the output end of the fifth digital-to-time converter and is used for receiving the fifth pulse signal; the lagging negative input end of the third edge detector is connected with the output end of the fourth digital-to-time converter and is used for receiving the sixth pulse signal; the third edge detector outputs a pulse signal through its positive output end when it detects the edge of the fifth pulse signal first, and outputs a pulse signal through its negative output end when it detects the edge of the sixth pulse signal first; wherein the width of the pulse signal is equal to the time difference between the edge of the fifth pulse signal and the corresponding edge of the sixth pulse signal;
a second dual-channel selection switch having a positive input terminal, a negative input terminal, a positive output terminal, and a negative output terminal; the positive input end of the second dual-channel selection switch is connected with the positive output end of the third edge detector and is used for receiving the signal output by that positive output end; the negative input end of the second dual-channel selection switch is connected with the negative output end of the third edge detector and is used for receiving the signal output by that negative output end; when a signal arrives at the positive input end of the second dual-channel selection switch before any signal arrives at its negative input end, the received signal is output through the positive output end of the second dual-channel selection switch and the negative output end outputs no signal; when a signal arrives at the negative input end before any signal arrives at the positive input end, the received signal is output through the negative output end of the second dual-channel selection switch and the positive output end outputs no signal; the positive output end of the second dual-channel selection switch is the positive output end of the full-connection layer bias input module, and the negative output end of the second dual-channel selection switch is the negative output end of the full-connection layer bias input module.
CN201810300523.6A 2018-04-04 2018-04-04 Convolution operation and full-connection operation circuit for convolution neural network Active CN108764467B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810300523.6A CN108764467B (en) 2018-04-04 2018-04-04 Convolution operation and full-connection operation circuit for convolution neural network

Publications (2)

Publication Number Publication Date
CN108764467A CN108764467A (en) 2018-11-06
CN108764467B true CN108764467B (en) 2021-08-17

Family

ID=63980884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810300523.6A Active CN108764467B (en) 2018-04-04 2018-04-04 Convolution operation and full-connection operation circuit for convolution neural network

Country Status (1)

Country Link
CN (1) CN108764467B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109711538B (en) * 2018-12-14 2021-01-15 安徽寒武纪信息科技有限公司 Operation method, device and related product
CN109871950A (en) * 2019-02-01 2019-06-11 京微齐力(北京)科技有限公司 Unit has the chip circuit and System on Chip/SoC of the artificial intelligence module of bypass functionality
CN109978161B (en) * 2019-03-08 2022-03-04 吉林大学 Universal convolution-pooling synchronous processing convolution kernel system
CN110059818B (en) * 2019-04-28 2021-01-08 山东师范大学 Nerve convolution array circuit kernel with configurable convolution kernel parameters, processor and circuit
CN111144558B (en) * 2020-04-03 2020-08-18 深圳市九天睿芯科技有限公司 Multi-bit convolution operation module based on time-variable current integration and charge sharing
CN112836815A (en) * 2020-05-04 2021-05-25 神亚科技股份有限公司 Processing device and processing method for executing convolution neural network operation
CN111833888B (en) * 2020-07-24 2022-11-11 清华大学 Near sensor processing system, circuit and method for voice keyword recognition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203283A (en) * 2016-06-30 2016-12-07 重庆理工大学 Based on Three dimensional convolution deep neural network and the action identification method of deep video
CN106909970A (en) * 2017-01-12 2017-06-30 南京大学 A kind of two-value weight convolutional neural networks hardware accelerator computing module based on approximate calculation
WO2017152990A1 (en) * 2016-03-11 2017-09-14 Telecom Italia S.P.A. Convolutional neural networks, particularly for image analysis
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing sparse convolution neutral net accelerator
WO2017186829A1 (en) * 2016-04-27 2017-11-02 Commissariat A L'energie Atomique Et Aux Energies Alternatives Device and method for calculating convolution in a convolutional neural network

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9904874B2 (en) * 2015-11-05 2018-02-27 Microsoft Technology Licensing, Llc Hardware-efficient deep convolutional neural networks




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant