CN113762480B - Time sequence processing accelerator based on one-dimensional convolutional neural network - Google Patents


Info

Publication number
CN113762480B
CN113762480B (application CN202111065987.1A)
Authority
CN
China
Prior art keywords
processing module
data
register
row
reg
Prior art date
Legal status
Active
Application number
CN202111065987.1A
Other languages
Chinese (zh)
Other versions
CN113762480A (en
Inventor
刘冬生
朱令松
陆家昊
胡昂
魏来
成轩
Current Assignee
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN202111065987.1A
Publication of CN113762480A (application publication)
Application granted
Publication of CN113762480B (granted publication)


Classifications

    • G06N 3/045 (neural network architectures; combinations of networks)
    • G06F 7/50 (adding; subtracting)
    • G06F 7/523 (multiplying only)
    • G06N 3/061 (physical realisation of neural networks using biological neurons)
    • G06N 5/04 (inference or reasoning models)
    • Y02D 10/00 (energy efficient computing, e.g. low power processors, power management or thermal management)


Abstract

The invention discloses a time-series processing accelerator based on a one-dimensional convolutional neural network, belonging to the fields of artificial intelligence and integrated-circuit design. The accelerator comprises an input processing module containing N rows of register groups, where the first-row register group holds N registers and each subsequent row holds one register fewer. Under the cooperative control of a global control module, each item of inference data enters through register reg_1N, flows transversely through the first-row register group, and is output through register reg_11; the data in the first-row register group also flow longitudinally through the lower register rows and are output through registers reg_nn, n = 2, 3, ..., N. A convolution operation array performs convolution and activation on the data output by the input processing module; a pooling processing module pools the activation results and outputs them; and a fully connected processing module performs fully connected addition on the activation results and outputs them. The accelerator multiplexes the inference data, reduces the amount of inference-data movement, and improves the operating efficiency and configurability of the network.

Description

Time sequence processing accelerator based on one-dimensional convolutional neural network
Technical Field
The invention belongs to the fields of artificial intelligence and integrated-circuit design, and particularly relates to a time-series processing accelerator based on a one-dimensional convolutional neural network.
Background
Convolutional neural networks (CNNs) offer high prediction accuracy, good classification performance, and wide tolerance to data sets, and in recent years have been widely applied to recognition and classification tasks such as image recognition, speech recognition, text recognition, and information classification. As convolutional neural networks grow in scale and depth, they bring massive parallel multiply-add operations and massive data-movement operations, which have become major factors limiting the operating efficiency of convolutional neural networks.
Time-series problems such as electrocardiogram (ECG) classification and electroencephalogram (EEG) recognition can be handled well by one-dimensional convolutional neural network model algorithms. These tasks typically require continuous reading, computation, and output over long periods, placing high demands on the operating efficiency and power consumption of a neural-network accelerator. However, the computing power of existing embedded central-processor devices falls far short of supporting such workloads; against this background, convolutional neural network accelerators based on hardware circuits have emerged.
Accelerating convolutional neural network operations with hardware circuits is a current research direction. Existing hardware-based convolutional neural network accelerators are designed for two-dimensional convolutional neural networks; when applied directly to one-dimensional workloads such as ECG and EMG processing, they cannot fully exploit their high parallelism and efficiency.
Disclosure of Invention
In view of the shortcomings of the prior art and the need for improvement, the invention provides a time-series processing accelerator based on a one-dimensional convolutional neural network, aiming to improve the operating efficiency, configurability, and energy-efficiency ratio of one-dimensional convolutional neural networks.
To achieve the above purpose, the invention provides a time-series processing accelerator based on a one-dimensional convolutional neural network, comprising an input processing module, a convolution operation array, a pooling processing module, a fully connected processing module, and a global control module. The input processing module comprises N rows of register groups: the first-row register group holds N registers, each subsequent row holds one register fewer, the registers in the first-row register group are connected in sequence, and the register groups are connected column by column between rows, where N is the size of a convolution kernel in the convolution operation array. Under the cooperative control of the global control module, each item of inference data enters through register reg_1N, flows transversely through the first-row register group, and is output through register reg_11; the data in the first-row register group flow longitudinally through the lower register rows and are output through registers reg_nn, n = 2, 3, ..., N, where register reg_ij denotes the register in column N-j+1 of the i-th row register group. The convolution operation array performs convolution and activation on the data output by the input processing module; the pooling processing module pools the activation results and outputs them; and the fully connected processing module performs fully connected addition on the activation results and outputs them.
Still further, each convolution kernel in the time-series processing accelerator comprises N multipliers and N+1 adders. The two inputs of the i-th multiplier are the data in register reg_ii and the corresponding convolution-kernel weight, i = 1, 2, ..., N. The two inputs of the first adder are the convolution-kernel bias and the partial convolution sum at the corresponding position from the previous round's input feature map; the two inputs of the k-th adder are the output of the (k-1)-th adder and the output of the (k-1)-th multiplier, k = 2, 3, ..., N+1; and the output of the (N+1)-th adder is the partial convolution sum at the corresponding position of this round's input feature map.
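As a reading aid (not part of the claim language), the adder chain can be summarized in one update. Here S_t^(r) denotes the partial sum at position t after round r, b the convolution-kernel bias, and w_i the i-th kernel weight; pairing register reg_ii with weight w_i follows the embodiment of fig. 3 and is an interpretation, not claim text:

```latex
S_t^{(r)} \;=\; \underbrace{S_t^{(r-1)} + b}_{\text{adder } 1}
\;+\; \sum_{i=1}^{N} \underbrace{w_i \cdot \mathrm{reg}_{ii}}_{\text{adder } i+1}
```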
Still further, the accelerator comprises a first multiplexer. The inputs of the first multiplexer are connected to the outputs of the second through N-th adders; its FcMode port receives a mode-control signal s from the global control module and gates out the output of the corresponding s-th adder, s = 2, 3, ..., N. When the first multiplexer receives the mode-control signal s, the convolution kernel operates in fully-connected-layer mode; otherwise it operates in convolution mode.
Still further, the accelerator comprises an inference data storage module for storing the inference data originally input to the time-series processing accelerator and the inference data output by the convolution operation array, the pooling processing module, and the fully connected processing module, and for outputting the stored inference data to the input processing module.
Further, the inference data storage module comprises several partitions for storing different types of inference data. The input processing module further comprises a second multiplexer, which selects either the inference data in the corresponding partition or a zero-padding sequence according to the current layer state of the time-series processing accelerator and feeds it into the input processing module.
Still further, the accelerator comprises a convolution-kernel weight storage module and a convolution-kernel bias storage module, which store, respectively, the convolution-kernel weights and biases used in each layer's convolution operations.
Still further, the accelerator comprises a reset port, an enable port, and an output port. When the enable port receives a continuous enable signal and the reset port receives a reset signal lasting two or more clock cycles, the time-series processing accelerator performs one-dimensional convolutional neural network inference; after inference completes, the output port outputs a high-level pulse to indicate completion.
Furthermore, two signal lines connect the global control module and the pooling processing module; through them the global control module outputs a max-pooling-layer flag or a global-average-pooling-layer flag to the pooling processing module, controlling its pooling operation mode.
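As an illustrative behavioural sketch (the flag names below are assumptions, not taken from the patent), the two flag lines select between the two pooling modes like this:

```python
def pool(window, max_flag, gap_flag):
    """Pooling-module sketch: the global controller raises exactly one of
    two flag lines to pick max pooling or global average pooling.
    Flag names are hypothetical."""
    if max_flag:
        return max(window)                 # max-pooling layer
    if gap_flag:
        return sum(window) / len(window)   # global-average-pooling layer
    raise ValueError("no pooling mode selected")

print(pool([1, 5, 3, 2], max_flag=True, gap_flag=False))   # 5
print(pool([1, 5, 3, 2], max_flag=False, gap_flag=True))   # 2.75
```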
In general, the technical solutions conceived by the present invention can achieve the following beneficial effects:
(1) A new register-group structure for the input processing module supports transverse and longitudinal data flow in parallel, forming a new data-stream processing mode that generates a pipelined data stream for the convolution operation units. This stream keeps the multiply and add resources in the convolution multiply-add units busy simultaneously, markedly reducing the need for extra adder circuits; at the same time it multiplexes the inference data, reducing the overhead of inference-data movement;
(2) In the convolution operation array, the addition of the previous round's partial sums and the convolution-kernel bias proceeds concurrently with the current round's convolution, so the operation of the whole convolution kernel requires only one extra adder, further reducing the adder count;
(3) A multiplexer connected to the output of each multiply-add unit in the convolution kernel lets the convolution operation array also perform the multiplications of a fully connected layer, by selecting and outputting the result of the corresponding multiply-add unit, avoiding the resource overhead of a separate multiplication array.
Drawings
FIG. 1 is a schematic diagram of a time-series processing accelerator based on a one-dimensional convolutional neural network according to an embodiment of the present invention;
FIG. 2 is a schematic circuit diagram of an input processing module according to an embodiment of the present invention;
FIG. 3 is a schematic circuit diagram of a convolution operation array according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the inference workflow of a time-series processing accelerator based on a one-dimensional convolutional neural network according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and embodiments, in order to make its objects, technical solutions, and advantages more apparent. It should be understood that the specific embodiments described here are illustrative only and are not intended to limit the scope of the invention. In addition, the technical features of the embodiments of the present invention described below may be combined with one another as long as they do not conflict.
In the present invention, the terms "first," "second," and the like in the description and in the drawings, if any, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
Fig. 1 is a schematic diagram of a time-series processing accelerator based on a one-dimensional convolutional neural network according to an embodiment of the present invention. Referring to fig. 1, a time-series processing accelerator based on a one-dimensional convolutional neural network in this embodiment will be described in detail with reference to fig. 2 to 4.
Referring to fig. 1, the time-series processing accelerator based on the one-dimensional convolutional neural network comprises an input processing module, a convolution operation array, a pooling processing module, a fully connected processing module, and a global control module. The input processing module comprises N rows of register groups: the first-row register group holds N registers, each subsequent row holds one register fewer, the registers in the first-row register group are connected in sequence, and the register groups are connected column by column between rows, where N is the size of a convolution kernel in the convolution operation array, as shown in fig. 2.
Under the cooperative control of the global control module, the input processing module, convolution operation array, pooling processing module, and fully connected processing module operate as follows. Each item of inference data enters through register reg_1N, flows transversely through the first-row register group, and is output through register reg_11; the data in the first-row register group flow longitudinally through the lower register rows and are output through registers reg_nn, n = 2, 3, ..., N, where register reg_ij is the register in column N-j+1 of the i-th row register group. The convolution operation array performs convolution and activation on the data output by the input processing module, the pooling processing module pools the activation results and outputs them, and the fully connected processing module performs fully connected addition on the activation results and outputs them.
In accordance with an embodiment of the present invention, the time-series processing accelerator further comprises an inference data storage module, as shown in fig. 1. It stores the inference data originally input to the accelerator and the inference data output by the convolution operation array, the pooling processing module, and the fully connected processing module, and outputs the stored inference data to the input processing module, which processes it and passes it on. Further, the inference data storage module may be divided into several partitions for storing different types of inference data. For example, it may be divided into two partitions, PixelRAM1 and PixelRAM2, which store, respectively, the original inference data input to the time-series processing accelerator and the intermediate inference data produced during processing.
In the present embodiment, the circuit structure of the input processing module is described with N = 5 (convolution kernel size 5×1). Referring to fig. 2, reg_11 to reg_15, reg_22 to reg_25, reg_33 to reg_35, reg_44 to reg_45, and reg_55 together form the data-flow register groups. The input processing module further comprises a second multiplexer; port PixelInMode carries its input-data chip-select signal, sent by the global control module and determined by the current layer state of the accelerator, so that the second multiplexer selects either the inference data in the corresponding partition or a zero-padding sequence (the PADDING input mode) and feeds it into the register groups, completing chip selection between inference-data input and PADDING filling. Port MulAddEn is the enable signal that opens and closes the longitudinal data flow in the register groups, i.e. the shift-up operation. Ports PixelIn1 and PixelIn2 are inference-data read ports; ports PixelW1 to PixelW5 are inference data-stream output ports, connected respectively to the inference-data input ports MAC1 to MAC5 of the same 5×1 convolution kernel.
Specifically, when a round of input feature map begins, the second multiplexer performs chip selection according to PixelInMode and loads the selected datum into reg_15. On the next clock cycle, the datum in reg_15 shifts left into reg_14, and the next datum from the second multiplexer enters reg_15; on the following cycle, reg_14 shifts left into reg_13, reg_15 shifts left into reg_14, and on the next cycle new data from the second multiplexer again enter reg_15. This leftward data movement repeats until reg_11 to reg_15 are all filled.
Once reg_11 to reg_15 are filled, the global control module asserts the MulAddEn enable: the data in reg_11 are output through PixelW1, the data in reg_12 to reg_15 move up into reg_22 to reg_25, and at the same time reg_12 to reg_15 continue shifting left. On the next clock cycle, reg_11 to reg_15 move data as above, the data in reg_22 are output through PixelW2, and the data in reg_23 to reg_25 move up into reg_33 to reg_35. On the following cycle, reg_11 to reg_15 and reg_22 to reg_25 move data as above, the data in reg_33 are output through PixelW3, and the data in reg_34 and reg_35 move up into reg_44 and reg_45. On the next cycle, reg_11 to reg_15, reg_22 to reg_25, and reg_33 to reg_35 move data as above, the data in reg_44 are output through PixelW4, and the data in reg_45 move up into reg_55. On the cycle after that, reg_11 to reg_15, reg_22 to reg_25, reg_33 to reg_35, and reg_44 to reg_45 move data as above and the data in reg_55 are output through PixelW5. Subsequent clock cycles repeat this pattern, continuously feeding the pipelined inference data stream into the convolution operation array.
When the last datum of this round's input feature map has been output from reg_55 through the PixelW5 port, the global control module deasserts MulAddEn, closing the upward movement in the register groups and completing the data-stream processing of this round's input feature map. Meanwhile, the inference data of the next round's input feature map can continue to fill reg_11 to reg_15 by left shifting, and the formation of the input pipelined data stream repeats in this way.
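The fill, shift-left, and shift-up behaviour described above can be checked with a short cycle-level sketch. This is a behavioural model under one reading of figs. 2 and 3, not the patent's circuit; names such as `step` and `ports`, and the feeding of `None` bubbles after the last datum, are illustrative. Applying the chained multiply-add of the convolution kernel to the skewed port streams confirms that they reproduce a stride-1 sliding-window convolution:

```python
N = 5                                   # kernel size, as in the embodiment

def step(rows, new_val):
    """One MulAddEn clock cycle of the triangular register file:
    each lower row captures the pre-shift contents of positions 2..N of
    the row above it, then row 1 shifts left and takes new_val at reg_1N."""
    nxt = [rows[i - 1][1:] for i in range(1, N)]   # vertical move-up
    nxt.insert(0, rows[0][1:] + [new_val])         # horizontal shift-left
    return nxt

x = list(range(10))                     # inference data x0..x9
w = [1, 2, 3, 4, 5]                     # kernel weights Weight1..Weight5
bias = 7

# Fill phase: only the first row shifts left (MulAddEn not yet asserted).
rows = [[None] * (N - i) for i in range(N)]
for v in x[:N]:
    rows[0] = rows[0][1:] + [v]

# Streaming phase: record what each PixelW_k port (rows[k][0]) emits.
ports = [[] for _ in range(N)]
for nxt in x[N:] + [None] * N:          # None = bubble after the last datum
    for k in range(N):
        if rows[k][0] is not None:
            ports[k].append(rows[k][0])
    rows = step(rows, nxt)

# Port k emits the input delayed by k cycles, so the chained MACs compute
# y[m] = bias + sum_k w[k] * x[m + k]: a stride-1 "valid" convolution.
y = [bias + sum(w[k] * ports[k][m] for k in range(N))
     for m in range(len(x) - N + 1)]
print(y)   # [47, 62, 77, 92, 107, 122]
```

The skew between the port streams is exactly what lets MAC1 to MAC5 fire on successive cycles while every port still delivers one datum per cycle in steady state.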
The time-series processing accelerator further comprises a convolution-kernel weight storage module and a convolution-kernel bias storage module, as shown in fig. 1, which store, respectively, the convolution-kernel weights and biases used in each layer's convolution operations.
Each convolution kernel in the time-series processing accelerator comprises N multipliers and N+1 adders: the two inputs of the i-th multiplier are the data in register reg_ii and the corresponding convolution-kernel weight, i = 1, 2, ..., N; the two inputs of the first adder are the convolution-kernel bias and the partial convolution sum at the corresponding position from the previous round's input feature map; the two inputs of the k-th adder are the output of the (k-1)-th adder and the output of the (k-1)-th multiplier, k = 2, 3, ..., N+1; and the output of the (N+1)-th adder is the partial convolution sum at the corresponding position of this round's input feature map, as shown in fig. 3.
Referring to fig. 3 and taking N = 5 as an example, the convolution operation array involves the multiply-add units MAC1 to MAC5, a partial-sum storage FIFO, the pre-adder PartSumADD, the inference-data input ports PixelW1 to PixelW5, the input ports MulAddEn (multiply-add enable) and PartSumFIFO (partial-sum FIFO read), and the output ports FcDone (fully-connected-layer multiply output) and PartSumDone (partial-sum output).
Specifically, when the accelerator operates, the convolution-kernel weight and bias storage modules read out the Weight and Bias parameters needed for this round's input feature map. When the first inference datum is fed into MAC1 through PixelW1, the global control module asserts MulAddEn: MAC1 starts working, sending the weight Weight1 and the inference datum from PixelW1 into its multiplier, while the previous round's partial sum and the convolution-kernel bias are sent into PartSumADD for addition. On the next clock cycle, the MAC1 product and the PartSumADD sum are sent to the MAC1 adder, while the inference datum from PixelW2 and Weight2 are read into the MAC2 multiplier. On the following cycle, the MAC2 product and the MAC1 sum go to the MAC2 adder, while the datum from PixelW3 and Weight3 enter the MAC3 multiplier; then the MAC3 product and the MAC2 sum go to the MAC3 adder, while the datum from PixelW4 and Weight4 enter the MAC4 multiplier; then the MAC4 product and the MAC3 sum go to the MAC4 adder, while the datum from PixelW5 and Weight5 enter the MAC5 multiplier; and on the next cycle the MAC5 product and the MAC4 sum are added in the MAC5 adder. In this way MAC1 to MAC5 implement the one-dimensional convolution multiply-add operation as an efficient pipeline: the utilization of the MAC1 to MAC5 multiply-add units reaches 100%, and the operation of the whole convolution kernel requires only one additional adder, greatly reducing adder usage.
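The round-by-round accumulation through PartSumADD and the partial-sum FIFO can be sketched as follows for a multi-channel one-dimensional convolution. This is a behavioural model, not the patent's circuit; in particular, adding the bias only on the first round is an assumption made here so the bias is not counted once per channel:

```python
def conv_round(psums, xs, w, bias, first_round):
    """One round of the convolution array over one input channel.
    psums: partial sums read back from the partial-sum FIFO.
    PartSumADD merges the stored partial sum with the kernel bias
    (bias only on the first round -- an assumption); MAC_1..MAC_N then
    chain their products onto the running sum."""
    out = []
    for t, psum in enumerate(psums):
        acc = psum + (bias if first_round else 0)   # pre-adder PartSumADD
        for k in range(len(w)):                     # adder inside MAC_(k+1)
            acc += w[k] * xs[t + k]
        out.append(acc)                             # written back to the FIFO
    return out

# Two input channels, kernel size 3, four samples per channel.
xs_by_ch = [[1, 2, 3, 4], [5, 6, 7, 8]]
w_by_ch = [[1, 1, 1], [2, 0, 1]]
bias = 10

psums = [0, 0]                                      # FIFO initially zeroed
for ch, (xs, w) in enumerate(zip(xs_by_ch, w_by_ch)):
    psums = conv_round(psums, xs, w, bias, first_round=(ch == 0))
print(psums)   # [33, 39]
```

Because each round only adds onto the stored partial sums, the same N-wide multiply-add chain serves any number of input channels.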
According to an embodiment of the present invention, the time-series processing accelerator further includes a first multiplexer. The inputs of the first multiplexer are connected to the outputs of the second through N-th adders; the FcMode port receives the mode-control signal s from the global control module and gates out the output of the corresponding s-th adder, s = 2, 3, ..., N. When the first multiplexer receives the mode-control signal s, the convolution kernel operates in fully-connected-layer mode; otherwise it operates in convolution mode, as shown in fig. 3.
The circuit structure of the convolution operation array in this embodiment can be multiplexed as the operation array of a fully connected layer, with the fully connected layer performed as a 1×1 convolution. Take a fully connected layer with 4 input neurons as an example: if the current layer is a fully connected layer, the global control module asserts FcLayer, and the mode-control signal s on the FcMode port is synchronously set to 4, the number of input-layer neurons. During fully-connected-layer operation, MAC1 to MAC4 work exactly as in convolution; the difference is that when the MAC4 addition completes, the datum is sent directly through Fc4Done to FcMUX. Because the mode-control signal s received by FcMUX is 4, FcMUX gates Fc4Done and outputs the datum to the subsequent processing module through the FcDone port. This circuit structure lets the convolution operation array be reused as the fully connected operation array.
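The gating just described amounts to tapping the adder chain after the s-th multiply-add. A minimal sketch follows; the function name `mac_chain_outputs` and the way the tap index maps to FcMode are illustrative assumptions:

```python
def mac_chain_outputs(acts, w):
    """Running outputs of the chained adders in MAC_1..MAC_N; tap s-1
    (zero-based) is what FcMUX gates out when FcMode = s."""
    taps, acc = [], 0
    for a, wi in zip(acts, w):
        acc += a * wi                  # one multiply-add stage per neuron
        taps.append(acc)
    return taps

# Fully connected layer with 4 input neurons on a 5-wide kernel:
acts = [1, 2, 3, 4]            # activations of the 4 input neurons
w = [10, 20, 30, 40, 99]       # the fifth weight never reaches the output
fc_out = mac_chain_outputs(acts, w)[4 - 1]   # FcMode s = 4 gates Fc4Done
print(fc_out)   # 300
```

Selecting an earlier tap simply shortens the dot product, which is why one mode-control signal suffices to reuse the convolution hardware for fully connected layers of different widths.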
The ports of each module in the time-series processing accelerator based on the one-dimensional convolutional neural network, the connections between ports, and the data flow in this embodiment are described below.
The time-series processing accelerator has three I/O ports: two input ports, rst and ModelStart, and one output port, ModelDone. rst is the accelerator's reset port, ModelStart is its enable port, and ModelDone is the inference-completion flag. When the enable port ModelStart receives a continuous enable signal and the reset port rst receives a reset signal lasting two or more clock cycles, all registers in the accelerator are reset; after reset completes, the time-series processing accelerator performs one-dimensional convolutional neural network inference, and when inference completes, the output port ModelDone outputs a high-level pulse to indicate completion.
The global control module is connected to the storage module through eight groups of signal lines, all driven from the global control module to the storage module. The eight groups are: the 1-bit-wide inference-data read controls PixelRd1/PixelRd2, the 1-bit-wide inference-data write controls PixelWr1/PixelWr2, the 1-bit-wide convolution kernel weight read control WeightRd, the 1-bit-wide convolution kernel bias read control BiasRd, the 4-bit-wide convolution kernel bias read address BiasAddr, the 8-bit-wide convolution kernel weight read address WeightAddr, the 12-bit-wide inference-data read address PixelRdAddr, and the 12-bit-wide inference-data write address PixelWrAddr. They carry the access addresses and access-control signals to the inference-data storage, the convolution kernel weight storage, and the convolution kernel bias storage.
The global control module is connected to the input processing module through two groups of signal lines, both driven from the global control module to the input processing module: the 1-bit-wide data-stream processing enable MulAddEn, which controls the opening and closing of the input processing module, and the 2-bit-wide read chip select PixelInMode, which selects the source of the inference-data read. The storage module is connected to the input processing module through two groups of 64-bit-wide signal lines, both driven from the storage module to the input processing module: the inference-data outputs PixelIn1 and PixelIn2, which move the inference data from memory into the input processing module.
The global control module is connected to the convolution operation array through two groups of 1-bit-wide signal lines, both driven from the global control module to the convolution operation array: the multiply-add unit enable MulAddEn and the activation-function circuit enable ActEn, which respectively turn on the convolution array circuit and the activation-function circuit. The input processing module is connected to the convolution operation array through five groups of 64-bit-wide signal lines, the data-stream-processed inputs PixelW1 to PixelW5, driven from the input processing module to the convolution operation array and used to feed the inference data that has been formed into a data stream into the convolution operation array.
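The data-stream feed just described can be pictured as a sliding window over the inference data. The sketch below is an assumption for illustration (the function pixel_windows is hypothetical, not part of the patent), modeling PixelW1 to PixelW5 as one five-word window presented to the convolution operation array per cycle.

```python
def pixel_windows(sequence, kernel_size=5):
    """Yield successive kernel_size-wide windows (PixelW1..PixelW5) per cycle."""
    for start in range(len(sequence) - kernel_size + 1):
        yield tuple(sequence[start:start + kernel_size])

# Seven inference-data words produce three successive five-word windows.
windows = list(pixel_windows([10, 11, 12, 13, 14, 15, 16]))
```

Each yielded tuple corresponds to one cycle's worth of inputs on the PixelW1 to PixelW5 ports.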
The global control module is connected to the pooling processing module through two groups of signal lines, both driven from the global control module to the pooling processing module: the 1-bit-wide global average pooling layer flag GAP and the 1-bit-wide max pooling layer flag MaxPool, which inform the pooling processing module of the current pooling mode and thereby control its pooling operation. The convolution operation array is connected to the pooling processing module through one group of signal lines, which carries the activated partial convolution sum PartSumDone to the pooling processing module. The pooling processing module is connected to the storage module through two groups of 64-bit-wide signal lines, driven from the pooling processing module to the storage module: the convolution layer operation results ConvOut and PoolOut, which move the pooled results back to the inference-data storage.
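The pooling-mode control described above amounts to selecting between global average pooling and max pooling with the two 1-bit flags. A minimal sketch, assuming a hypothetical helper pool rather than the patented circuit:

```python
def pool(values, gap_flag, maxpool_flag):
    """Apply the pooling mode selected by the 1-bit GAP / MaxPool flags."""
    if gap_flag:                      # global average pooling over the feature map
        return sum(values) / len(values)
    if maxpool_flag:                  # max pooling over the window
        return max(values)
    raise ValueError("no pooling mode selected")

avg = pool([1, 2, 3, 4], gap_flag=True, maxpool_flag=False)   # 2.5
mx = pool([1, 2, 3, 4], gap_flag=False, maxpool_flag=True)    # 4
```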
The global control module is connected to the fully connected processing module through two groups of 1-bit-wide signal lines, both driven from the global control module to the fully connected processing module: the fully connected layer flag Fc and the fully connected layer adder enable FcAddEn, which respectively enable the fully connected processing module and the fully connected layer adder. The fully connected processing module is connected to the storage module through one group of 64-bit-wide signal lines, the fully connected layer output FcOut, driven from the fully connected processing module to the storage module and responsible for moving the fully connected layer operation results back to the inference-data storage.
Referring to fig. 4, the time-series processing accelerator in this embodiment operates as follows. The global control module cycles through the states and sends control-signal enables and address signals to the corresponding functional modules. The inference data read from the inference-data storage is processed by the input processing module, the convolution operation array, the pooling processing module, or the fully connected processing module, and is finally written back to the other group of inference-data storage; once a neural network layer completes, the input and output storages are exchanged, and the cycle repeats until all the neural network layers have been processed. The final result is then output to the designated location, and a high-level pulse is emitted at the output port ModelDone to indicate that the inference is complete.
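The layer-by-layer flow just described is a ping-pong scheme: each layer reads from one inference-data store and writes into the other, and the two are swapped between layers. A minimal software sketch, with run_model and the identity "layers" as hypothetical stand-ins for the real hardware modules:

```python
def run_model(input_data, layers):
    # Two inference-data stores used in ping-pong fashion: each layer reads
    # from one store, writes its results into the other, then the roles swap.
    buf_read = list(input_data)
    for layer in layers:              # global control module cycles the states
        buf_write = [layer(x) for x in buf_read]
        buf_read = buf_write          # exchange input and output stores
    model_done = True                 # ModelDone: high-level pulse when finished
    return buf_read, model_done

result, done = run_model([1, 2, 3], [lambda x: x + 1, lambda x: x * 2])
# result == [4, 6, 8], done is True
```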
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the invention and is not intended to limit the invention, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (6)

1. A time-series processing accelerator based on a one-dimensional convolutional neural network, characterized by comprising an input processing module, a convolution operation array, a pooling processing module, a fully connected processing module, and a global control module;
the input processing module comprises N rows of register groups; the first-row register group contains N registers, and the number of registers decreases by one with each successive row; the registers in the first-row register group are connected in sequence, and the registers in each column are connected in sequence across the row register groups, N being the size of the convolution kernel in the convolution operation array;
under the cooperative control of the global control module, each item of inference data enters through register reg1N, flows transversely through the first-row register group, and is output through register reg11; the data in the first-row register group also flows vertically down into each row register group and is output through register regnn, n = 2, 3, …, N, where register regij denotes the register in the (N-j+1)-th column of the i-th row register group;
the convolution operation array is configured to perform convolution operation and activation on the data output by the input processing module, the pooling processing module is configured to pool the activation results and output them, and the fully connected processing module is configured to perform fully connected addition operations on the activation results and output them;
the convolution kernel in the time-series processing accelerator includes:
N multipliers, wherein the two inputs of the i-th multiplier are the register regii and the corresponding convolution kernel weight, i = 1, 2, …, N;
N+1 adders, wherein the two inputs of the first adder are respectively the convolution kernel bias and the partial convolution sum at the corresponding position of the previous round's input feature map; the two inputs of the k-th adder are respectively the output of the (k-1)-th adder and the output of the (k-1)-th multiplier, k = 2, 3, …, N+1; and the output of the (N+1)-th adder is the partial convolution sum at the corresponding position of the current round's input feature map;
the time-series processing accelerator further comprises a first multiplexer, the inputs of which are connected to the outputs of the second through N-th adders; its FcMode port receives the mode control signal s from the global control module, and it outputs the output of the s-th adder corresponding to the mode control signal s, s = 2, 3, …, N;
when the first multiplexer receives the mode control signal s, the convolution kernel operates in fully connected layer mode; otherwise, the convolution kernel operates in convolution mode.
2. The time-series processing accelerator based on a one-dimensional convolutional neural network according to claim 1, further comprising:
an inference data storage module, configured to store the inference data originally input into the time-series processing accelerator and the inference data output by the convolution operation array, the pooling processing module, and the fully connected processing module, and to output the stored inference data to the input processing module.
3. The time-series processing accelerator based on a one-dimensional convolutional neural network according to claim 2, wherein the inference data storage module comprises a plurality of partitions for storing different types of inference data, respectively;
the input processing module further comprises a second multiplexer, which selects the inference data in the corresponding partition or a zero-padding sequence according to the current layer state of the time-series processing accelerator and sends it to the input processing module.
4. The time series processing accelerator based on one-dimensional convolutional neural network of claim 1, further comprising a convolutional kernel weight storage module and a convolutional kernel bias storage module for storing the convolutional kernel weights and the convolutional kernel biases in the convolutional operations of each layer, respectively.
5. The one-dimensional convolutional neural network-based time-series processing accelerator of claim 1, further comprising a reset port, an enable port, and an output port; when the enabling port receives continuous enabling signals and the resetting port receives resetting enabling signals with more than two clock cycles, the time sequence processing accelerator conducts one-dimensional convolutional neural network reasoning, and after the reasoning is completed, the output port outputs high-level pulse to represent completion.
6. The time-series processing accelerator based on the one-dimensional convolutional neural network according to claim 1, wherein two signal lines are connected between the global control module and the pooling processing module, and the global control module outputs a maximum pooling layer flag or a global average pooling layer flag to the pooling processing module through the two signal lines so as to control a pooling operation mode of the pooling processing module.
CN202111065987.1A 2021-09-10 2021-09-10 Time sequence processing accelerator based on one-dimensional convolutional neural network Active CN113762480B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111065987.1A CN113762480B (en) 2021-09-10 2021-09-10 Time sequence processing accelerator based on one-dimensional convolutional neural network

Publications (2)

Publication Number Publication Date
CN113762480A CN113762480A (en) 2021-12-07
CN113762480B true CN113762480B (en) 2024-03-19

Family

ID=78795041


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114781629B (en) * 2022-04-06 2024-03-05 合肥工业大学 Hardware accelerator of convolutional neural network based on parallel multiplexing and parallel multiplexing method

Citations (3)

Publication number Priority date Publication date Assignee Title
CN109032781A (en) * 2018-07-13 2018-12-18 重庆邮电大学 A kind of FPGA parallel system of convolutional neural networks algorithm
CN109784489A (en) * 2019-01-16 2019-05-21 北京大学软件与微电子学院 Convolutional neural networks IP kernel based on FPGA
CN110263925A (en) * 2019-06-04 2019-09-20 电子科技大学 A kind of hardware-accelerated realization framework of the convolutional neural networks forward prediction based on FPGA

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
CN109460817B (en) * 2018-09-11 2021-08-03 华中科技大学 Convolutional neural network on-chip learning system based on nonvolatile memory



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant