CN116151340A - Parallel random computing neural network system and hardware compression method and system thereof - Google Patents

Info

Publication number
CN116151340A
Authority
CN
China
Prior art keywords
bits
bit
neural network
input
compression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211677013.3A
Other languages
Chinese (zh)
Other versions
CN116151340B (en)
Inventor
贺光辉
鞠春晖
叶璐
岳大胜
林啸
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huixi Intelligent Technology Shanghai Co ltd
Original Assignee
Huixi Intelligent Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huixi Intelligent Technology Shanghai Co ltd filed Critical Huixi Intelligent Technology Shanghai Co ltd
Priority to CN202211677013.3A
Publication of CN116151340A
Application granted
Publication of CN116151340B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention provides a parallel random computing neural network and a hardware compression method and system thereof. The method targets parallel random computing neural networks implemented on an FPGA or ASIC and comprises any one or more of the following compression methods: a positive-negative separation compression method, which separates the weights into positive and negative groups, avoiding bit expansion of the sign bit of the input activation; a continuous bit compression method, which compresses homologous bits and sums them with weights, effectively reducing the hardware overhead of the adders without losing network accuracy; and an approximate compression method, which uses the idea of approximate computation and selects only part of the bits for addition, achieving a good compromise between accuracy and compression efficiency. A corresponding terminal and computer-readable storage medium are also provided. The compression methods provided by the invention offer an effective hardware resource optimization scheme for the parallel random computing neural network system, enabling higher area efficiency and energy efficiency.

Description

Parallel random computing neural network system and hardware compression method and system thereof
Technical Field
The invention relates to the technical field of very-large-scale integrated circuit architecture design, and in particular to a parallel random computing neural network system and a hardware compression method and system thereof; a corresponding terminal and computer-readable storage medium are also provided.
Background
Neural networks have achieved tremendous success in recent years in fields such as image classification, object detection, and natural language processing. A neural network model often contains a large number of multiply-accumulate operations, and traditional CPUs cannot meet their compute-intensive requirements, so research on neural network hardware accelerators based on FPGAs or ASICs has received much attention. Neural networks are inherently amenable to highly parallel computation, but hardware platforms offer only limited resources, so the design of highly parallel neural network hardware architectures remains a challenging problem.
Random computing, a novel computing paradigm for the post-Moore era, has attracted much attention in recent years. It uses the idea of approximate computation to realize common arithmetic units, such as multipliers, adders, and hyperbolic tangent functions, with very low hardware resource overhead at the expense of some computation accuracy. Because neural networks are fault tolerant, neural networks based on random computing can effectively reduce hardware resource cost while maintaining accuracy, and have been widely studied in academia. Conventional random computing converts a binary number into a serially encoded single-bit stream in which the probability of a bit being 1 equals the value being represented. This encoding makes the accuracy of random computing highly sensitive to the quality of the random numbers; to make the sequence distribution more controllable and improve accuracy, deterministic encodings have been proposed, in which the placement of 1s in the sequence is not random.
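For readers unfamiliar with this encoding, the following minimal Python sketch (illustrative only, not part of the patent; all names are ours) shows conventional stochastic encoding and the classic single-AND-gate multiplication it enables:

```python
import random

def sc_encode(value, length, seed=None):
    """Encode a value in [0, 1] as a serial bitstream in which each bit
    is 1 with probability equal to the value being represented."""
    rng = random.Random(seed)
    return [1 if rng.random() < value else 0 for _ in range(length)]

def sc_decode(stream):
    """Decode by the empirical frequency of ones in the stream."""
    return sum(stream) / len(stream)

# Multiplying two independent streams is a single AND gate per bit pair.
a = sc_encode(0.5, 4096, seed=1)
b = sc_encode(0.6, 4096, seed=2)
product = [x & y for x, y in zip(a, b)]
print(sc_decode(product))  # close to 0.5 * 0.6 = 0.3
```

The stream length needed grows quickly with the desired precision, which is exactly the latency problem the bit-expansion architecture discussed next addresses.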
A random computing neural network often needs a long sequence length to achieve good performance, which incurs high computation latency and severely reduces throughput. Bit-expansion-based random computing architectures address this low throughput: they adopt a fully parallel structure at the bit level and, using the idea of approximate computation, realize conventional binary multiply-accumulate operations through bit expansion, bit selection, and addition, successfully reducing hardware overhead. However, parallel stochastic-computing neural networks (PSC-NN) still require significant hardware resources, and as network depths continue to grow, deploying PSC-NN on resource-limited hardware platforms such as wearable devices remains a major challenge.
Disclosure of Invention
The invention provides a parallel random computing neural network system, a hardware compression method and a system thereof, and a corresponding terminal and medium.
The invention is realized by the following technical scheme.
According to one aspect of the invention, a hardware compression method of a parallel random computing neural network system is provided. It adopts any one or more of the following compression methods to compress the number of input bits of the addition trees in a parallel random neural network implemented on an FPGA or ASIC, thereby realizing hardware compression of the parallel random computing neural network system:
a positive-negative separation compression method, which separates the weights of the trained neural network into positive and negative groups, performs parallel random bit expansion, without a sign bit, on the inputs corresponding to the positive and negative weights, and uses the expanded bits as the inputs of the addition trees, reducing the input scale of the addition trees;
- a continuous bit compression method, which compresses homologous bits into grouped, weighted bits used as the inputs of the addition tree, reducing the number of homologous bits fed into the addition tree;
- an approximate compression method, which selects only part of the bits to be fed into the addition tree and multiplies their sum by a corresponding multiple factor, reducing the original addition tree to a smaller weighted addition tree.
Optionally, the positive-negative separation compression method includes:
separating the weights of each neuron of the neural network into positive and negative groups, and representing the input activation value of each neuron in sign-magnitude form;
performing parallel random bit expansion on the input activation values corresponding to the positive and negative weights, which comprises: expanding each bit with binary place weight w into w bits of weight 1, and randomly selecting WN bits from the N expanded bits according to the absolute value W of the neuron weight, completing the parallel random bit expansion; wherein W < 1; the sign bit does not participate in the bit expansion;
the parallel random bit expansion is realized by fanning out logic wires;
and using the expanded bits corresponding to the positive and negative weights as the inputs of the addition trees.
Optionally, the continuous bit compression method includes:
performing bit expansion on each bit of the binary number to obtain a series of bits with the same value, namely homologous bits; compressing the homologous bits to obtain a single bit value that represents them;
assigning the compressed single bit a corresponding weight equal to the number of homologous bits it was compressed from;
and, within each neuron of the neural network, adding the single bit values sharing the same weight, multiplying each sum by its weight, and adding the results corresponding to the different weights as the inputs of an addition tree to obtain the final neuron output.
Optionally, the approximate compression method includes:
partially selecting the bits, after parallel random bit expansion, that are to be fed into the addition tree, so that the number of selected bits is 1/S of the original number of bits, where S is a multiple factor;
summing the selected bits and multiplying the summation result by the multiple factor S as the input of the addition tree, obtaining the neuron output.
Optionally, the method for partially selecting the parallel-random-bit-expanded bits includes: a random selection method, a grouped intra-group random selection method, and a uniform selection method; wherein:
the random selection method comprises: randomly selecting 1/S of the original number of bits;
the grouped intra-group random selection method comprises: dividing the original bits into groups of S bits each, and randomly selecting 1 bit from each group;
the uniform selection method comprises: selecting one bit out of every S original bits.
Optionally, when multiple compression methods are used together for hardware compression of the parallel random computing neural network, they are applied in the order positive-negative separation compression, then approximate compression and/or continuous bit compression, yielding a new hardware architecture of the parallel random computing neural network.
Optionally, the method further comprises any one or more of:
after the inputs of the addition trees are obtained by the positive-negative separation compression method, the addition trees accumulate their inputs separately to obtain two non-negative numerical results; a subtractor subtracts the two results, and the final neuron output is obtained through an activation function unit;
- after the inputs of the addition tree are obtained by the continuous bit compression method, the addition tree adds the inputs to obtain the final neuron output;
- after the inputs of the addition tree are obtained by the approximate compression method, the addition tree adds the inputs to obtain the neuron output.
According to another aspect of the present invention, a hardware compression system of a parallel random computing neural network is provided, which uses any one or more of the following compression modules to compress the number of input bits of the addition trees in the parallel random neural network, realizing hardware compression of the parallel random computing neural network system:
a positive-negative separation compression module, which separates the weights of the trained neural network into positive and negative groups, performs parallel random bit expansion, without a sign bit, on the inputs corresponding to the positive and negative weights, and uses the expanded bits as the inputs of the addition trees, reducing the input scale of the addition trees;
- a continuous bit compression module, which compresses homologous bits into grouped, weighted bits used as the inputs of the addition tree, reducing the number of homologous bits fed into the addition tree;
- an approximate compression module, which selects only part of the bits to be fed into the addition tree and multiplies their sum by a corresponding multiple factor, reducing the original addition tree to a smaller weighted addition tree;
and hardware compression of the parallel random computing neural network system is completed by applying any one compression module alone or any several compression modules together.
According to a third aspect of the present invention, a parallel random computing neural network system based on an FPGA or ASIC is provided, in which the method of any one of the above, or the system described above, is used to compress the number of input bits of the addition trees in the parallel random neural network implemented on the FPGA or ASIC, realizing hardware compression of the parallel random computing neural network system.
According to a fourth aspect of the present invention there is provided a terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing the program being operable to perform a method of any one of the above, or to run a system as described above.
According to a fifth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor is operable to perform a method of any of the above, or to run a system as described above.
Due to the adoption of the technical scheme, compared with the prior art, the invention has at least one of the following beneficial effects:
the parallel random computing neural network system and the hardware compression method and system thereof provided by the invention adopt any one or any plurality of compression technologies of positive and negative separation compression, continuous bit compression and approximate compression, effectively reduce the input scale of the addition tree in the parallel random computing neural network system, apply the addition tree to the existing parallel random computing neural network architecture, obtain three corresponding hardware modules and effectively reduce the original hardware cost of the existing parallel random computing neural network system.
The parallel random computing neural network system and the hardware compression method and system thereof apply any one or more of the positive-negative separation compression, continuous bit compression, and approximate compression techniques to the traditional parallel random computing neural network system, compressing the number of input bits of the addition trees, which account for the largest share of hardware cost in the parallel random computing neural network; this reduces the hardware cost of the addition trees and yields a brand-new compressed parallel random computing neural network system.
The parallel random computing neural network system and the hardware compression method and system thereof provided by the invention are applied to the parallel random computing neural network realized by the ASIC or the FPGA, and can effectively reduce the logic resource cost of the ASIC or the FPGA.
The parallel random computing neural network system and the hardware compression method and system thereof provided by the invention are especially suitable for the technical fields of image classification, target detection, natural language processing and the like, and can effectively reduce the hardware cost of the neural network system and improve the energy efficiency.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a diagram of a parallel stochastic computing architecture of a positive-negative separation compression method in a preferred embodiment of the invention;
FIG. 2 is a schematic diagram of a method of sequential bit compression in accordance with a preferred embodiment of the present invention;
FIG. 3 is a schematic diagram of an approximate compression method in accordance with a preferred embodiment of the present invention;
FIG. 4 is a hardware architecture diagram of a neuron of a parallel stochastic-computing neural network (PSC-NN) employing the positive-negative separation, continuous bit, and approximate compression methods simultaneously, in a preferred embodiment of the present invention.
Detailed Description
The following describes embodiments of the present invention in detail. The embodiments are implemented on the premise of the technical scheme of the invention, and detailed implementation modes and specific operation processes are given. It should be noted that those skilled in the art can make variations and modifications without departing from the spirit of the invention, all of which fall within the scope of the invention.
The embodiment of the invention provides a hardware compression method of a parallel random computing neural network system, which aims at optimizing the hardware resource expense of the parallel random computing neural network system.
The hardware compression method of the parallel random computing neural network system provided by this embodiment adopts any one or more of the following compression methods to compress the number of input bits of the addition trees in a parallel random neural network implemented on an FPGA or ASIC, realizing hardware compression of the parallel random computing neural network system:
a positive-negative separation compression method, which separates the weights of the trained neural network into positive and negative groups and performs parallel random bit expansion, without a sign bit, on the inputs corresponding to the positive and negative weights; the expanded bits are used as the inputs of the addition trees, and by eliminating the bit expansion of the sign bit, the input scale of the addition trees is reduced;
- a continuous bit compression method, which compresses homologous bits into grouped, weighted bits used as the inputs of the addition tree, reducing the number of homologous bits fed into the addition tree;
- an approximate compression method, which selects only part of the bits to be fed into the addition tree and multiplies their sum by a corresponding multiple factor, turning the originally large-input addition tree into a weighted addition tree with a smaller input scale.
In a preferred embodiment, the positive-negative separation compression method may include:
separating the weights of each neuron of the neural network into positive and negative groups, and representing the input activation value of each neuron in sign-magnitude form;
performing parallel random bit expansion on the input activation values corresponding to the positive and negative weights, which comprises: expanding each bit with binary place weight w into w bits of weight 1, and randomly selecting WN bits from the N expanded bits according to the absolute value W of the neuron weight (W < 1), completing the parallel random bit expansion; the sign bit does not participate in the bit expansion;
the parallel random bit expansion is realized in hardware by fanning out logic wires, so no additional hardware resources are consumed;
and using the expanded bits corresponding to the positive and negative weights as the inputs of the addition trees.
In a specific application example, w is the binary place weight of a bit. For example, suppose the input activation corresponding to a neuron weight with absolute value W = 0.5 is 1010 (4 bits), whose binary place weights w are 8, 4, 2, and 1 respectively. This activation is expanded to N = 8 + 4 + 2 + 1 = 15 bits, from which WN = 0.5 × 15 ≈ 8 bits are randomly selected, as the sketch below illustrates.
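The following Python sketch (our own simplification, not from the patent; function names are hypothetical) performs the sign-free parallel random bit expansion and random selection described in this example:

```python
import random

def bit_expand(activation_bits):
    """Bit expansion 1: each bit with binary place weight w is fanned out
    into w bits of weight 1 (pure wiring in hardware, no logic cost)."""
    n = len(activation_bits)
    expanded = []
    for i, b in enumerate(activation_bits):
        w = 2 ** (n - 1 - i)      # place weight of this bit position
        expanded.extend([b] * w)  # w copies, all equal to the source bit
    return expanded

def random_select(expanded, W, seed=None):
    """Bit expansion 2: randomly keep round(W * N) of the N expanded bits,
    according to the weight magnitude W (W < 1)."""
    rng = random.Random(seed)
    return rng.sample(expanded, round(W * len(expanded)))

# Worked example from the text: activation 1010 (value 10), |weight| W = 0.5.
bits = [1, 0, 1, 0]
exp = bit_expand(bits)                 # N = 8 + 4 + 2 + 1 = 15 bits
sel = random_select(exp, 0.5, seed=0)  # round(0.5 * 15) = 8 bits kept
print(len(exp), len(sel), sum(sel))    # sum(sel) approximates 10 * 0.5 = 5
```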
Further, the expanded bits corresponding to the positive and negative weights are used as the inputs of two addition trees and accumulated separately, obtaining two non-negative numerical results; the two results are subtracted, and the final neuron output is obtained through the activation function.
In a preferred embodiment, the continuous bit compression method may comprise:
performing bit expansion on each bit of the binary number to obtain a series of bits with the same value, namely homologous bits, and compressing the homologous bits to obtain a single bit value that represents them;
assigning the compressed single bit a corresponding weight equal to the number of homologous bits it was compressed from;
within each neuron of the neural network, adding all the single bit values sharing the same weight, multiplying each sum by its weight, and using the results corresponding to the different weights as the inputs of an addition tree.
The above multiplication by the weight is realized in hardware by introducing a constant multiplier.
Further, the results corresponding to the different weights are used as the inputs of an addition tree, and the final neuron output is obtained by the addition of that tree. A sketch of this procedure follows.
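The following Python sketch (an illustrative simplification under our own naming, not the patent's implementation) models the continuous bit compression and weighted summation just described:

```python
def cbc_compress(groups):
    """Continuous bit compression: each group of homologous bits (copies of
    one source bit) becomes a single bit plus a weight equal to the
    group size."""
    return [(bits[0], len(bits)) for bits in groups]

def cbc_sum(compressed):
    """Weighted summation: bits sharing the same weight are added first,
    each partial sum is multiplied by its weight (a constant multiplier
    in hardware), and the per-weight results are accumulated."""
    by_weight = {}
    for bit, w in compressed:
        by_weight[w] = by_weight.get(w, 0) + bit
    return sum(w * s for w, s in by_weight.items())

# Three homologous groups: 4 copies of 1, 2 copies of 1, 1 copy of 0.
groups = [[1, 1, 1, 1], [1, 1], [0]]
print(cbc_sum(cbc_compress(groups)))  # 4*1 + 2*1 + 1*0 = 6, exact
```

Because homologous bits are identical by construction, this compression loses no information; only the adder-tree input count shrinks.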
In a preferred embodiment of the approximate compression method, it may include:
partially selecting the bits, after parallel random bit expansion, that are to be fed into the addition tree, so that the number of selected bits is 1/S of the original number of bits, where S is a multiple factor;
summing the selected bits, with the summation result multiplied by the multiple factor S, as the input of the addition tree.
The partial selection of bits incurs no hardware overhead, while a constant multiplier is introduced to realize the multiplication by the multiple factor S.
Further, the method for partially selecting the parallel-random-bit-expanded bits may include a random selection method, a grouped intra-group random selection method, and a uniform selection method (see the sketch after this list); wherein:
the random selection method may comprise: randomly selecting 1/S of the original number of bits;
the grouped intra-group random selection method may comprise: dividing the original bits into groups of S bits each, and randomly selecting 1 bit from each group;
the uniform selection method may comprise: selecting one bit out of every S original bits.
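A minimal Python sketch of the three selection schemes and the rescaling by S (illustrative only; the names and the bit-list representation are our assumptions):

```python
import random

def ac_select(bits, S, scheme="uniform", seed=None):
    """Approximate compression: keep 1/S of the expanded bits.
    'random'  - pick N/S bits anywhere (AC1);
    'grouped' - split into groups of S, pick one per group (AC2);
    'uniform' - take every S-th bit (AC3)."""
    rng = random.Random(seed)
    if scheme == "random":
        return rng.sample(bits, len(bits) // S)
    if scheme == "grouped":
        return [rng.choice(bits[i:i + S]) for i in range(0, len(bits), S)]
    return bits[::S]  # uniform selection

def ac_sum(bits, S, **kw):
    """Sum the kept bits and rescale by S (one constant multiplier)."""
    return S * sum(ac_select(bits, S, **kw))

bits = [1, 0, 1, 1, 0, 1, 1, 0]           # exact sum = 5
print(ac_sum(bits, 2, scheme="uniform"))  # 2 * 3 = 6, approximating 5
                                          # with half the adder inputs
```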
In a preferred embodiment, when multiple compression methods are used together for hardware compression of the parallel random computing neural network, they are applied in the order positive-negative separation compression, then approximate compression and/or continuous bit compression, yielding a new hardware architecture of the parallel random computing neural network.
In some embodiments of the invention:
for each neuron of the neural network, a weight positive-negative separation compression method is adopted, expansion bits corresponding to the positive and negative weights are respectively added to obtain two non-negative values, and then a final result is obtained through subtraction of a subtracter, so that the bit expansion requirement of a sign bit is omitted.
The adopted continuous bit compression method (consecutive bit compression, CBC) compresses the homologous bit to single bit, and gives corresponding weight, the sum of single bit addition of the same weight after compression is multiplied by the weight, and finally the results of different weights are added. The continuous bit compression method belongs to a weighted addition method, compared with the original direct addition method, the number of bits is compressed, the input scale of an addition tree is reduced at the cost of introducing a small amount of constant multipliers, and the hardware cost of an adder is reduced, so that the overall hardware cost is reduced.
The approximate compression method (approximate compression, AC) employed, employing schemes such as random selection, selects only a portion of the expanded bits, then inputs the adder for summation, and finally multiplies the multiple factor by a single constant multiplier. The method utilizes the idea of approximate calculation, and can obtain a better compromise of precision and hardware cost.
Further, the selection scheme adopted by the approximate compression method comprises random selection, random selection in groups after grouping and uniform selection. In the random selection scheme, the AC randomly selects 1/S bits of the original bit number; the group is randomly selected to divide each S of the original bits into a group, and each group is randomly selected to 1; the uniform selection scheme selects one every S bits.
Further, the positive and negative separation of the weights, CBC and AC are compatible with each other, and some homologous bits still exist after AC is used, so that the compression effect can be further improved by using CBC next.
The technical scheme provided by the embodiment of the invention is further described below with reference to the accompanying drawings.
A parallel random computing architecture converts binary numbers into a fully parallel sequence of bits, so that binary multiplication can be implemented by selecting bits. As shown in FIG. 1, each bit of a binary number is expanded, according to its place weight, into a parallel bit sequence: the bit with weight i is expanded into 2^i parallel bits whose value equals the value of the expanded bit. For example, the binary number 3 is expanded into the parallel bit sequence "0000111". These parallel bits are implemented in hardware as logic wires. To implement multiplication, these expanded bits must be selected, and unselected bits are not passed on to the next stage of computation. Since the bits expanded from the same source bit have the same value, a uniform selection method is adopted. For a weight W, multiplying the input by the weight requires uniformly selecting part of the expanded bits to feed into the following circuit: of the 2^i bits expanded from a source bit with weight i, ⟨2^i · W⟩ are selected and passed to the next part of the circuit, where ⟨x⟩ denotes rounding x. As shown in FIG. 1, the selection process for 3 × 0.5 selects only ⟨4 · 0.5⟩ + ⟨2 · 0.5⟩ + ⟨1 · 0.5⟩ = 4 bits. In the above process, the conversion of binary numbers into parallel bit sequences is denoted bit expansion 1, and the remaining selection part is denoted bit expansion 2. The bits output by bit expansion 2 are all added by an addition tree, completing the conversion of the bit sequence back into a binary number.
For a trained neural network, the weights of each layer are fixed, so to avoid expanding the sign bit, an architecture with positive-negative weight separation is adopted. On this basis, binary numbers are encoded in sign-magnitude form. The inputs corresponding to the positive weights are bit-expanded and added by one addition tree, the inputs corresponding to the negative weights are bit-expanded and added by another addition tree, and the two values are subtracted to obtain the final result. The subtraction is implemented in hardware by introducing a signed subtractor. In most neural networks the activation function is a ReLU and the input feature map pixel values are non-negative, which guarantees that the input corresponding to each weight is non-negative.
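A behavioral Python sketch of this sign-separated neuron (our own approximation of the architecture in FIG. 1; helper names are hypothetical, and the software random selection stands in for the hardware's fixed wiring):

```python
import random

def expand_select(activation_bits, W, seed=0):
    """Expand each activation bit by its place weight, then randomly keep
    round(W * N) of the expanded bits (bit expansion 1 and 2)."""
    n = len(activation_bits)
    expanded = []
    for i, b in enumerate(activation_bits):
        expanded.extend([b] * (2 ** (n - 1 - i)))
    rng = random.Random(seed)
    return rng.sample(expanded, round(W * len(expanded)))

def psn_neuron(activations, weights):
    """Sign-separated neuron: positive-weight and negative-weight inputs are
    accumulated by two separate adder trees; one signed subtractor and a
    ReLU produce the output."""
    pos = sum(sum(expand_select(a, w)) for a, w in zip(activations, weights) if w > 0)
    neg = sum(sum(expand_select(a, -w)) for a, w in zip(activations, weights) if w < 0)
    return max(0, pos - neg)  # ReLU keeps next-layer inputs non-negative

acts = [[1, 0, 1, 0], [0, 1, 1, 0]]    # activations 10 and 6
print(psn_neuron(acts, [0.5, -0.25]))  # approximates relu(10*0.5 - 6*0.25) = 3.5
```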
The continuous bit compression (CBC) method exploits the fact that homologous bits have the same value: they are compressed so that a single bit can represent them, but that bit must be assigned a corresponding weight. For X expansion bits expanded from the same source bit, the compressed single bit is given weight X. Over all the expansion bits, the single bits with the same weight are added first, the resulting sum is multiplied by X through a constant multiplier, and finally the results corresponding to the different weights are accumulated by an addition tree. FIG. 2 shows a schematic diagram of the CBC compression method: bits in the same gray box are expanded from the same source bit and have the same value, so they are compressed by the CBC and given corresponding weights. The compressed bits with the same weight are added first, then multiplied by the weight, and finally the results corresponding to the different weights are added. For example, the operation finally implemented in FIG. 2 takes the form
y = X1 · (sum of the compressed bits with weight X1) + X2 · (sum of the compressed bits with weight X2) + …,
wherein the additions are realized in hardware through an addition tree, and the multiplications are realized through constant multipliers.
For a neural network, the number of bits after expansion is large enough that the numerical distribution of the expanded bits can be approximately represented by only a part of them. AC uses this idea: only 1/S of the expansion bits are selected and fed into the following adder tree circuit, and the summation result is multiplied by S through a constant multiplier. The invention proposes 3 schemes for selecting part of the bits: random selection, grouped intra-group random selection, and uniform selection. In the random selection scheme, AC randomly selects 1/S of the expansion bits; grouped intra-group random selection divides the expansion bits into groups of S and randomly selects 1 from each group; the uniform selection scheme selects one out of every S expansion bits. AC under these 3 schemes is denoted AC1, AC2, and AC3 respectively, and FIG. 3 gives a schematic diagram for S = 2.
With the compression methods described above, FIG. 4 shows the hardware architecture of one PSC-NN neuron. Because CBC and AC are mutually compatible, after bit expansion 1 and bit expansion 2, AC compression is applied first; since some of the remaining bits are still homologous, CBC is then applied to increase the compression rate. Note that when AC and CBC are used together, the final multiplication by the factor S in AC must be performed by a constant multiplier after the CBC stage.
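The ordering constraint can be seen in the following Python sketch (a simplified behavioral model under our assumptions; real AC selection operates on the full expanded bit vector rather than group by group):

```python
def neuron_ac_cbc(homologous_groups, S):
    """Order of operations when AC and CBC are combined (FIG. 4):
    AC first thins each homologous group to 1/S of its bits, CBC then
    compresses what remains of each group to (bit, weight) pairs, and the
    multiply-by-S is deferred to a single constant multiplier at the end."""
    partial = 0
    for bits in homologous_groups:
        kept = bits[::S]                  # AC: uniform 1-in-S selection
        if kept:                          # CBC: one bit + group-size weight
            partial += len(kept) * kept[0]
    return S * partial                    # single constant multiplier last

groups = [[1] * 8, [1] * 4, [0] * 2]      # homologous groups after expansion
print(neuron_ac_cbc(groups, 2))           # 2 * (4 + 2 + 0) = 12; exact sum = 12
```

Deferring the single ×S multiplier past the CBC stage is what keeps the two techniques composable without adding a multiplier per group.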
The embodiment of the invention provides a hardware compression system of a parallel random computing neural network system, which uses any one or more of the following compression modules to compress the number of input bits of the addition trees in the parallel random neural network implemented on an FPGA or ASIC, realizing hardware compression of the parallel random computing neural network system:
a positive-negative separation compression module, which separates the weights of the trained neural network into positive and negative groups, performs parallel random bit expansion, without a sign bit, on the inputs corresponding to the positive and negative weights, and uses the expanded bits as the inputs of the addition trees, reducing the input scale of the addition trees;
- a continuous bit compression module, which compresses homologous bits into grouped, weighted bits used as the inputs of the addition tree, reducing the number of homologous bits fed into the addition tree;
- an approximate compression module, which selects only part of the bits to be fed into the addition tree and multiplies their sum by a corresponding multiple factor, reducing the original addition tree to a smaller weighted addition tree;
and hardware compression of the parallel random computing neural network system is completed by applying any one compression module alone or any several compression modules together.
It should be noted that the steps in the method provided by the invention can be implemented by the corresponding modules, devices, and units in the system; those skilled in the art can refer to the technical solution of the method to realize the composition of the system, i.e., the embodiments of the method can be understood as preferred examples for constructing the system, and details are not repeated here.
The technical scheme of the hardware compression method and system provided by the above embodiment of the invention is further described below with reference to a specific application example.
This specific application example takes the image classification task as an example. For the common MNIST dataset classification task, a neural network with layer sizes 784-480-240-10 can achieve classification accuracy above 98%. Applying the technical scheme of the embodiments of the invention to an existing parallel random computing neural network system, the resulting hardware architecture comprises three parts: bit expansion 1, bit expansion 2, and the addition tree. In bit expansion 1, each bit of the activation value is expanded into w identical bits according to its binary place weight w; the bit expansion is realized in hardware by fanning out logic wires, so no additional hardware resources are consumed. In bit expansion 2, these bits are randomly selected according to the absolute value W of the neuron weight: if the number of expansion bits output by bit expansion 1 is N, the neuron randomly selects WN bits to output to the addition tree via bit expansion 2. The addition tree sums the bits, and the sums of the addition trees corresponding to the positive and negative weights are subtracted by a subtractor to obtain the final neuron output. Compared with the traditional parallel random computing neural network hardware architecture, the compression techniques of the embodiments effectively reduce the input scale of the addition tree and thus the overall hardware cost. Experimental results show that, for the 784-480-240-10 parallel random computing neural network implemented in ASIC hardware, the compression modules provided by the invention reduce hardware overhead by 21.6% while improving energy efficiency by 40.1%.
An embodiment of the present invention provides a parallel random computing neural network system based on an FPGA or an ASIC, and the method of any one of the foregoing embodiments, or the system of any one of the foregoing embodiments, compresses the number of input bits of an addition tree in the parallel random computing neural network implemented by the FPGA or the ASIC, so as to implement hardware compression of the parallel random computing neural network system.
An embodiment of the present invention provides a terminal including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, may be used in the method of any of the above embodiments, or a system for running any of the above embodiments.
Optionally, a memory is provided for storing a program. The memory may include volatile memory, such as random-access memory (RAM), e.g., static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory is used to store computer programs (e.g., application programs and functional modules implementing the methods described above), computer instructions, etc., which may be stored in one or more memories in a partitioned manner and may be invoked by a processor.
A processor for executing the computer program stored in the memory to implement the steps in the method according to the above embodiment. Reference may be made in particular to the description of the embodiments of the method described above.
The processor and the memory may be separate structures or may be integrated structures that are integrated together. When the processor and the memory are separate structures, the memory and the processor may be connected by a bus coupling.
An embodiment of the present invention also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is operative to perform the method of any of the above embodiments, or to run the system of any of the above embodiments.
The parallel random computing neural network and the hardware compression method and system thereof provided by the embodiments of the invention adopt three hardware compression methods for the parallel stochastic-computing neural network (PSC-NN) system. PSC-NN adopts a fully bit-parallel hardware architecture, significantly improving throughput at the cost of high hardware overhead. The first compression method separates the weights into positive and negative groups, avoiding bit expansion of the sign bit of the input activation. The second compression method compresses homologous bits and sums them with weights, effectively reducing the hardware overhead of the adders without losing network accuracy. The third compression method uses the idea of approximate computation and selects only part of the bits for addition, achieving a good compromise between accuracy and compression efficiency. Moreover, the three compression methods are mutually compatible and can be used simultaneously, further improving the compression efficiency. The parallel random computing neural network and the hardware compression method and system thereof provide an effective hardware resource optimization scheme for PSC-NN, achieving higher area efficiency and energy efficiency.
Details not described in the foregoing embodiments of the present invention are well known in the art.
The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.

Claims (10)

1. A hardware compression method of a parallel random computing neural network system, characterized in that any one or more of the following compression methods are adopted to compress, based on an FPGA or ASIC, the number of input bits of the addition trees in the parallel random neural network implemented on the FPGA or ASIC, realizing hardware compression of the parallel random computing neural network system:
a positive-negative separation compression method, which separates the weights of the trained neural network into positive and negative groups, performs parallel random bit expansion, without a sign bit, on the inputs corresponding to the positive and negative weights, and uses the expanded bits as the inputs of the addition trees, reducing the input scale of the addition trees;
- a continuous bit compression method, which compresses homologous bits into grouped, weighted bits used as the inputs of the addition tree, reducing the number of homologous bits fed into the addition tree;
- an approximate compression method, which selects only part of the bits to be fed into the addition tree and multiplies their sum by a corresponding multiple factor, reducing the original addition tree to a smaller weighted addition tree.
2. The hardware compression method of the parallel random computing neural network system according to claim 1, wherein the positive-negative separation compression method comprises:
separating the weights of each neuron of the neural network into positive and negative groups, and representing the input activation value of each neuron in sign-magnitude form;
performing parallel random bit expansion on the input activation values corresponding to the positive and negative weights, which comprises: expanding each bit with binary place weight w into w bits of weight 1, and randomly selecting WN bits from the N expanded bits according to the absolute value W of the neuron weight, completing the parallel random bit expansion; wherein W < 1; the sign bit does not participate in the bit expansion;
the parallel random bit expansion is realized by fanning out logic wires;
and using the expanded bits corresponding to the positive and negative weights as the inputs of the addition trees.
3. The hardware compression method of a parallel random computing neural network system according to claim 1, wherein the sequential bit compression method comprises:
performing bit expansion on each bit of the binary number to obtain a series of bits with the same value, namely homologous bits; compressing the homologous bits to obtain a single bit value that represents them;
assigning the compressed single bit a corresponding weight equal to the number of homologous bits it was compressed from;
and, within each neuron of the neural network, adding the single bit values sharing the same weight, multiplying each sum by its weight, and adding the results corresponding to the different weights as the inputs of an addition tree to obtain the final neuron output.
4. The hardware compression method of the parallel random computing neural network system according to claim 1, wherein the approximate compression method includes:
partially selecting the bits, after parallel random bit expansion, that are to be fed into the addition tree, so that the number of selected bits is 1/S of the original number of bits, where S is a multiple factor;
summing the selected bits and multiplying the summation result by the multiple factor S as the input of the addition tree, obtaining the neuron output.
5. The hardware compression method of the parallel random computing neural network system according to claim 4, wherein the method for partially selecting the parallel-random-bit-expanded bits includes: a random selection method, a grouped intra-group random selection method, and a uniform selection method; wherein:
the random selection method comprises: randomly selecting 1/S of the original number of bits;
the grouped intra-group random selection method comprises: dividing the original bits into groups of S bits each, and randomly selecting 1 bit from each group;
the uniform selection method comprises: selecting one bit out of every S original bits.
6. The hardware compression method of the parallel random computing neural network system according to claim 1, wherein when multiple compression methods are used together for hardware compression of the parallel random computing neural network, they are applied in the order positive-negative separation compression, then approximate compression and/or continuous bit compression, yielding a new hardware architecture of the parallel random computing neural network.
7. The hardware compression method of a parallel random computing neural network system of any one of claims 1-6, further comprising any one or more of:
after the inputs of the addition trees are obtained by the positive-negative separation compression method, the addition trees accumulate their inputs separately to obtain two non-negative numerical results; a subtractor subtracts the two results, and the final neuron output is obtained through an activation function unit;
- after the inputs of the addition tree are obtained by the continuous bit compression method, the addition tree adds the inputs to obtain the final neuron output;
- after the inputs of the addition tree are obtained by the approximate compression method, the addition tree adds the inputs to obtain the neuron output.
8. A hardware compression system of a parallel random computing neural network system, characterized in that any one or more of the following compression modules are adopted to compress the number of input bits of the addition trees in the parallel random neural network, realizing hardware compression of the parallel random computing neural network system:
a positive-negative separation compression module, which separates the weights of the trained neural network into positive and negative groups, performs parallel random bit expansion, without a sign bit, on the inputs corresponding to the positive and negative weights, and uses the expanded bits as the inputs of the addition trees, reducing the input scale of the addition trees;
- a continuous bit compression module, which compresses homologous bits into grouped, weighted bits used as the inputs of the addition tree, reducing the number of homologous bits fed into the addition tree;
- an approximate compression module, which selects only part of the bits to be fed into the addition tree and multiplies their sum by a corresponding multiple factor, reducing the original addition tree to a smaller weighted addition tree;
and hardware compression of the parallel random computing neural network system is completed by applying any one compression module alone or any several compression modules together.
9. A parallel random computing neural network system based on an FPGA or an ASIC, wherein the method of any one of claims 1 to 7, or the system of claim 8, is used to compress the number of input bits of an addition tree in the parallel random neural network implemented by the FPGA or the ASIC, so as to implement hardware compression of the parallel random computing neural network system.
10. A terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is operable to perform the method of any one of claims 1-7 or to run the system of claim 8 when the program is executed by the processor.
CN202211677013.3A 2022-12-26 2022-12-26 Parallel random computing neural network system and hardware compression method and system thereof Active CN116151340B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211677013.3A CN116151340B (en) 2022-12-26 2022-12-26 Parallel random computing neural network system and hardware compression method and system thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211677013.3A CN116151340B (en) 2022-12-26 2022-12-26 Parallel random computing neural network system and hardware compression method and system thereof

Publications (2)

Publication Number Publication Date
CN116151340A (en) 2023-05-23
CN116151340B CN116151340B (en) 2023-09-01

Family

ID=86356666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211677013.3A Active CN116151340B (en) 2022-12-26 2022-12-26 Parallel random computing neural network system and hardware compression method and system thereof

Country Status (1)

Country Link
CN (1) CN116151340B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106909970A (en) * 2017-01-12 2017-06-30 南京大学 A kind of two-value weight convolutional neural networks hardware accelerator computing module based on approximate calculation
US20170286830A1 (en) * 2016-04-04 2017-10-05 Technion Research & Development Foundation Limited Quantized neural network training and inference
CN108090560A (en) * 2018-01-05 2018-05-29 中国科学技术大学苏州研究院 The design method of LSTM recurrent neural network hardware accelerators based on FPGA
CN111832719A (en) * 2020-07-28 2020-10-27 电子科技大学 Fixed point quantization convolution neural network accelerator calculation circuit
CN112698811A (en) * 2021-01-11 2021-04-23 湖北大学 Neural network random number generator sharing circuit, sharing method and processor chip
CN213934855U (en) * 2021-01-11 2021-08-10 湖北大学 Neural network random number generator sharing circuit based on random computation
CN113313244A (en) * 2021-06-17 2021-08-27 东南大学 Near-storage neural network accelerator facing to addition network and acceleration method thereof
WO2021174790A1 (en) * 2020-03-05 2021-09-10 重庆大学 Sparse quantization neural network coding mode identification method and system


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
季渊; 陈文栋; 冉峰; 张金艺; DAVID LILJA: "Stochastic logic with a two-dimensional state-transition structure and its application in neural networks", Journal of Electronics & Information Technology, no. 08
林志文; 林志贤; 郭太良; 林珊玲: "Convolutional neural network recognition system based on FPGA acceleration", Application of Electronic Technique, no. 02

Also Published As

Publication number Publication date
CN116151340B (en) 2023-09-01

Similar Documents

Publication Publication Date Title
CN111062472B (en) Sparse neural network accelerator based on structured pruning and acceleration method thereof
CN107340993B (en) Arithmetic device and method
CN109543140B (en) Convolutional neural network accelerator
CN111832719A (en) Fixed point quantization convolution neural network accelerator calculation circuit
US20210349692A1 (en) Multiplier and multiplication method
CN112200300B (en) Convolutional neural network operation method and device
CN110555516B (en) Method for realizing low-delay hardware accelerator of YOLOv2-tiny neural network based on FPGA
CN113283587B (en) Winograd convolution operation acceleration method and acceleration module
CN113033794B (en) Light weight neural network hardware accelerator based on deep separable convolution
US20230068450A1 (en) Method and apparatus for processing sparse data
CN115982528A (en) Booth algorithm-based approximate precoding convolution operation method and system
US7912891B2 (en) High speed low power fixed-point multiplier and method thereof
CN113902109A (en) Compression method and device for regular bit serial computation of neural network
CN116205244B (en) Digital signal processing structure
CN116151340B (en) Parallel random computing neural network system and hardware compression method and system thereof
CN111966323A (en) Approximate multiplier based on unbiased compressor and calculation method
CN110825346A (en) Low-logic-complexity unsigned approximate multiplier
CN112783473B (en) Method for performing multiplication operation on shaping data by using single DSP unit parallel computation
CN116257210A (en) Spatial parallel hybrid multiplier based on probability calculation and working method thereof
CN112906863B (en) Neuron acceleration processing method, device, equipment and readable storage medium
CN113986194A (en) Neural network approximate multiplier implementation method and device based on preprocessing
CN114021070A (en) Deep convolution calculation method and system based on micro-architecture processor
CN112215349A (en) Sparse convolution neural network acceleration method and device based on data flow architecture
CN112685001A (en) Booth multiplier and operation method thereof
CN110807479A (en) Neural network convolution calculation acceleration method based on Kmeans algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant