CN114707647B - Precision lossless calculation integrated device and method suitable for multi-precision neural network - Google Patents


Info

Publication number
CN114707647B
CN114707647B
Authority
CN
China
Prior art keywords
precision
neural network
multiply
accumulate
analog
Prior art date
Legal status
Active
Application number
CN202210227427.XA
Other languages
Chinese (zh)
Other versions
CN114707647A (en)
Inventor
周浩翔
刘定邦
刘俊
吴秋平
韩宇亮
罗少波
毛伟
余浩
Current Assignee
Southern University of Science and Technology
Original Assignee
Southern University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Southern University of Science and Technology
Priority to CN202210227427.XA
Publication of CN114707647A
Application granted
Publication of CN114707647B
Status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Analogue/Digital Conversion (AREA)

Abstract

The invention discloses a precision-lossless compute-in-memory device and method for multi-precision neural networks, wherein the method comprises the following steps: acquiring input data of a multi-precision neural network, splitting the input data by bit, and performing digital-to-analog conversion to obtain a plurality of analog signals; and, based on a selector and a processing element, performing multiply-accumulate and multi-precision recombination operations on the analog signals and preset weights in a space-time-multiplexed manner to obtain the output data of the multi-precision neural network. In the embodiment of the invention, because the multiply-accumulate and multi-precision recombination operations on the input data and preset weights are performed with space-time multiplexing, the compute-in-memory architecture supports mixed-precision neural network computation, avoids precision loss, improves computational accuracy, and greatly improves computational energy efficiency compared with a traditional system-on-chip architecture.

Description

Precision lossless calculation integrated device and method suitable for multi-precision neural network
Technical Field
The invention relates to the technical field of mixed-signal circuits, and in particular to a precision-lossless compute-in-memory device and method suitable for multi-precision neural networks.
Background
The core idea of the storage-computation integrated (compute-in-memory) architecture for multi-precision neural networks is to move part or all of the computation into the memory module, i.e., to integrate the computing units and memory units on the same chip. However, most existing chips based on this architecture suffer from two problems. First, they reduce the number of AD/DA converters by sacrificing some computational precision; this lowers AD/DA power consumption and improves computational energy efficiency, but generally degrades computational precision and hence inference accuracy. Second, existing compute-in-memory architectures cannot adequately support mixed-precision network computation.
Accordingly, there is a need for improvement and development in the art.
Disclosure of Invention
To address the above defects in the prior art, the invention provides a precision-lossless compute-in-memory device and method for multi-precision neural networks, aiming to solve the problems that existing compute-in-memory architectures for multi-precision neural networks have low computational energy efficiency and cannot adequately support mixed-precision network computation.
The technical scheme adopted by the invention for solving the problems is as follows:
in a first aspect, an embodiment of the present invention provides an integrated device for precision lossless computation applicable to a multi-precision neural network, where the device includes:
the digital-to-analog conversion module is used for converting the digital signal into an analog signal;
the selector is electrically connected with the digital-to-analog conversion module and is used for selecting among a plurality of multiply-accumulators;
and the processing element is electrically connected with the selector and is used for carrying out mixed precision calculation on the analog signals.
In one implementation, the processing element includes a multiply-accumulate array, an analog-to-digital conversion module electrically connected to the multiply-accumulate array, and a multi-precision shift-accumulate module electrically connected to the analog-to-digital conversion module.
In one implementation, the processing elements are arranged in a spatially multiplexed manner.
In one implementation, the multiply-accumulate array consists of p rows and q columns of multiply-accumulators, where p and q are both non-zero integers.
In one implementation, the multiply accumulator includes a memristor and a data processing module electrically connected with the memristor.
In a second aspect, an embodiment of the present invention further provides a method for a precision lossless computation integration apparatus applicable to a multi-precision neural network, where the method includes: acquiring input data of a multi-precision neural network, splitting the input data according to bits, and performing digital-to-analog conversion to obtain a plurality of analog signals;
based on the selector and the processing element, carrying out multiply-accumulate operation and multi-precision recombination operation on a plurality of analog signals and preset weights in a space-time multiplexing mode to obtain output data of the multi-precision neural network.
In one implementation manner, based on the selector and the processing element, performing multiply-accumulate operation and multi-precision recombination operation on the plurality of analog signals and preset weights in a space-time multiplexing manner, and obtaining output data of the multi-precision neural network includes:
acquiring a plurality of time slots, wherein the time slots are used for representing specific time intervals;
selecting a plurality of analog signals by the selector for each time slot to obtain a plurality of time slot signals;
and inputting a plurality of time slot signals into a processing element in a time multiplexing mode to carry out multiply-accumulate operation and multi-precision recombination operation, so as to obtain output data of the multi-precision neural network.
In one implementation manner, the inputting the plurality of time slot signals to the processing element in a time multiplexing manner to perform multiply-accumulate operation and multi-precision recombination operation, and obtaining the output data of the multi-precision neural network includes:
inputting a plurality of time slot signals into each row of a multiplication accumulation array in a time multiplexing mode to obtain a plurality of multiplication accumulation results;
and sequentially inputting a plurality of multiply-accumulate results to the analog-to-digital conversion module and the multi-precision shift accumulation module to carry out multi-precision recombination, so as to obtain output data of the multi-precision neural network.
In a third aspect, an embodiment of the present invention further provides an intelligent terminal, including a memory and one or more programs, where the one or more programs are stored in the memory and configured to be executed by one or more processors, and the one or more programs include instructions for executing the method of the precision-lossless compute-in-memory device suitable for the multi-precision neural network according to any one of the foregoing.
In a fourth aspect, embodiments of the present invention further provide a non-transitory computer-readable storage medium storing instructions which, when executed by a processor of an electronic device, enable the electronic device to perform the method of the precision-lossless compute-in-memory device for a multi-precision neural network as described in any one of the above.
The invention has the following beneficial effects. First, input data of a multi-precision neural network is acquired, split by bit, and converted from digital to analog to obtain a plurality of analog signals; then, based on a selector and a processing element, multiply-accumulate and multi-precision recombination operations are performed on the analog signals and preset weights in a space-time-multiplexed manner to obtain the output data of the multi-precision neural network. Because the multiply-accumulate and multi-precision recombination operations on the input data and preset weights use space-time multiplexing, the compute-in-memory architecture supports mixed-precision neural network computation, avoids precision loss, improves computational accuracy, and greatly improves computational energy efficiency compared with a traditional system-on-chip architecture.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic block diagram of an integrated device suitable for precision lossless computation of a multi-precision neural network according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a spatial multiplexing PE structure according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of MAC structure and convolution calculation according to an embodiment of the present invention.
Fig. 4 is a schematic diagram and design of an accuracy reorganization circuit according to an embodiment of the present invention.
Fig. 5 is a schematic flow chart of a method for a precision lossless calculation integrated device suitable for a multi-precision neural network according to an embodiment of the present invention.
Fig. 6 is a schematic block diagram of an internal structure of an intelligent terminal according to an embodiment of the present invention.
Detailed Description
To make the purposes, technical solutions, and effects of the invention clearer and more definite, the precision-lossless compute-in-memory device and method for multi-precision neural networks disclosed by the invention are described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the invention.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
It will be understood by those skilled in the art that all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs unless defined otherwise. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
With the rapid development of the artificial intelligence field, applications of artificial intelligence are growing quickly, and the demands on the computing power and data-processing speed of complex networks are becoming increasingly stringent. Under the current computing framework, how to efficiently use deep-learning neural networks to process data and how to develop a new generation of energy-efficient neural network accelerators are among the core problems of research and development in academia and industry.
The traditional von Neumann architecture separates data processing and storage, and data must be exchanged with memory frequently during deep-learning computation, consuming a large amount of energy. According to studies, the energy consumption of data movement is 4 to 1000 times that of floating-point computation. As semiconductor processes advance, although overall power consumption is decreasing, the share of power consumed by data movement is growing larger and larger. The compute-in-memory architecture for multi-precision neural networks is a key technology for breaking the memory-wall limitation and overcoming the AI computing energy-efficiency bottleneck. However, compute-in-memory neural network architectures face the following two problems:
in-memory computing architectures that use analog operations inevitably require corresponding digital-to-analog/analog conversions of the input and output data in order to process the digital information. The existing integrated design has the common bottleneck that the AD/DA module occupies the area of the whole system and the energy consumption is overlarge, and the area is about 70-90 percent. At present, the existing calculation work mostly reduces the number of AD/DA by a method sacrificing a certain calculation precision, so that the power consumption of the AD/DA is reduced, the calculation energy efficiency is improved, but the method generally brings about the reduction of the calculation precision and leads to the reduction of the reasoning accuracy.
In a conventional neural network, the input and weight bit widths take a single value, such as INT8. However, many computations in a neural network can reduce the data bit width without reducing inference accuracy, and computing with lower-bit-width input and weight data achieves higher energy efficiency and throughput than 8-bit computation. This is especially true for mixed-precision networks (i.e., networks with mixed data bit widths), which reduce bit width while preserving inference accuracy as much as possible. Existing compute-in-memory architectures cannot adequately support such mixed-precision network computation.
To solve the above problems in the prior art, this embodiment provides a precision-lossless compute-in-memory device and method for multi-precision neural networks. Multiply-accumulate and multi-precision recombination operations are performed on the input data of the multi-precision neural network and preset weights in a space-time-multiplexed manner, so that the compute-in-memory architecture supports mixed-precision neural network computation, avoids precision loss, improves computational accuracy, and greatly improves computational energy efficiency compared with a traditional system-on-chip architecture. In implementation, input data of the multi-precision neural network is first acquired, split by bit, and converted from digital to analog to obtain a plurality of analog signals; then, based on the selector and the processing element, multiply-accumulate and multi-precision recombination operations are performed on the analog signals and preset weights in a space-time-multiplexed manner to obtain the output data of the multi-precision neural network.
Exemplary apparatus
As shown in fig. 1, an embodiment of the present invention provides a precision lossless computation integration apparatus suitable for a multi-precision neural network, the apparatus including a digital-to-analog conversion module, a selector, and a processing element, wherein:
the digital-to-analog conversion module is used for converting the digital signal into an analog signal;
the selector is electrically connected with the digital-to-analog conversion module and is used for selecting a plurality of multiplication accumulators;
and the processing element is electrically connected with the selector and is used for carrying out mixed precision calculation on the analog signals.
In this embodiment, the digital-to-analog conversion module (DAC) converts the digital signals input to the neural network into analog signals; any suitable type of DAC may be used, and the specific model is not limited here. The selector may be an off-the-shelf device or a custom logic circuit; it is connected with the digital-to-analog conversion module through a circuit and is used for selecting the multiply-accumulators. The selector is a p-select-1 selector. The processing element (PE in this embodiment) is connected with the selector through a circuit and is used for performing mixed-precision computation on the analog signals.
In one implementation, a processing element includes a multiply-accumulate array, an analog-to-digital conversion module (ADC) electrically connected to the multiply-accumulate array, and a multi-precision shift-accumulate module electrically connected to the ADC.
In particular, as shown in fig. 2, the processing elements (PEs) are arranged in a spatially multiplexed manner, and the multiply-accumulate array is composed of p rows and q columns of multiply-accumulators (MACs), where p and q are non-zero integers, and the number of analog-to-digital conversion modules (ADCs) is q. Each column of MACs shares one ADC and the subsequent precision-recombination module in a spatially multiplexed manner. To achieve spatial multiplexing, only one row of MACs is allowed to be on at a time; a p-select-1 selector is therefore used to select which of the p MAC rows is turned on.
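As a minimal illustration of the spatial-multiplexing schedule described above, the sketch below models a p-select-1 row selector that enables exactly one MAC row per time slot, so the p rows of each column can share a single ADC. This is a hypothetical Python model; the function name and the round-robin order are assumptions, not from the patent.

```python
def schedule_rows(p: int, num_slots: int) -> list[int]:
    """Return, for each time slot, the index of the single active MAC row.

    A simple round-robin order is assumed here; the patent only requires
    that exactly one of the p rows be on at any time.
    """
    return [slot % p for slot in range(num_slots)]

# With p = 4 rows, 8 slots cycle through the rows one at a time,
# so each column's ADC serves all 4 rows in turn.
active = schedule_rows(p=4, num_slots=8)
assert all(0 <= r < 4 for r in active)
```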
In one implementation, the multiply-accumulator (MAC) includes a memristor and a data processing module electrically connected with the memristor.
Specifically, as shown in fig. 3, the MAC is composed of m×n 1T1R memristor devices and peripheral circuits (i.e., data processing modules) responsible for input-data separation and similar tasks. 1T1R refers to the memristor cell structure used in the invention: one transistor and one resistor (1-transistor-1-resistor).
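To clarify how such a memristor crossbar performs an analog multiply-accumulate, the following sketch models each column current as the dot product of the input voltage vector with that column's conductances (Ohm's law summed by Kirchhoff's current law). This is an idealized behavioral illustration with assumed names, not circuitry from the patent.

```python
def crossbar_mac(voltages, conductances):
    """Idealized crossbar: `voltages` is a length-m vector, `conductances`
    an m x n matrix; returns the n column currents I_j = sum_i V_i * G_ij."""
    m = len(conductances)
    n = len(conductances[0])
    return [sum(voltages[i] * conductances[i][j] for i in range(m))
            for j in range(n)]

# Two columns read out in parallel: each column current is one
# multiply-accumulate over the shared input vector.
currents = crossbar_mac([1, 0, 1], [[1, 2], [3, 4], [5, 6]])
```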
In this embodiment, as shown in fig. 4, an ADC with log2(m)-bit processing precision converts the analog voltage signal of each column in the MAC. The processing precision is determined as follows: m rows of data are accumulated simultaneously in the MAC, so the ADC must distinguish at most m states, which requires log2(m) bits of processing precision. For example, when 16 rows of data are computed in the MAC, an ADC with 4-bit processing precision can complete the conversion of the data without loss. Compared with precision-recombination schemes that use a current mirror, or that use a lower-precision ADC to convert signals with more active rows, the digital multi-precision shift-accumulate module provided by the invention introduces no additional precision loss. In addition, because power consumption is concentrated mainly in the ADC and the subsequent multi-precision shift-accumulate module, the invention adopts a time-multiplexed design to reduce power consumption and area overhead as much as possible: the clock frequency of the ADC and the multi-precision shift-accumulate module is set to n times the main clock frequency, so that for each input of data, the ADC and the subsequent multi-precision shift-accumulate module work n times in succession to complete the processing of n columns of data.
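The ADC-sizing rule above (m simultaneously accumulated rows require log2(m) bits of processing precision) can be checked with a short sketch; the helper name is an assumption:

```python
import math

def adc_bits(m_rows: int) -> int:
    """Processing precision in bits for an ADC that must distinguish
    m states, per the rule stated in the text (log2(m) bits)."""
    return math.ceil(math.log2(m_rows))

# The 16-row example from the text requires a 4-bit ADC.
assert adc_bits(16) == 4
```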
Exemplary method
This embodiment provides a method for the precision-lossless compute-in-memory device suitable for multi-precision neural networks, and the method can be applied to an intelligent terminal with mixed-signal circuits. As shown in fig. 5, the method includes:
step S100, obtaining input data of a multi-precision neural network, splitting the input data according to bits, and then performing digital-to-analog conversion to obtain a plurality of analog signals;
specifically, the processing method of the invention is applied to the calculation of the multi-precision neural network, so that the input data of the multi-precision neural network is firstly obtained, then the input data is split according to the bits and then digital-to-analog conversion is carried out, for example, when the calculation is carried out, the input data is split according to the bits and then sequentially sent into a 1-bit DAC (digital-to-analog converter) to be converted into voltage, namely a plurality of analog signals.
After obtaining a plurality of analog signals, the following steps as shown in fig. 5 can be executed, namely, S200, based on the selector and the processing element, multiply-accumulate operation and multi-precision recombination operation are carried out on the plurality of analog signals and preset weights in a space-time multiplexing mode, so that output data of the multi-precision neural network are obtained. Correspondingly, based on the selector and the processing element, the multiplying and accumulating operation and the multi-precision recombination operation are carried out on a plurality of analog signals and preset weights in a space-time multiplexing mode, and the output data of the multi-precision neural network are obtained, and the method comprises the following steps:
s201, acquiring a plurality of time slots, wherein the time slots are used for representing specific time intervals;
s202, selecting a plurality of analog signals by the selector for each time slot to obtain a plurality of time slot signals;
s203, inputting a plurality of time slot signals into a processing element in a time multiplexing mode to carry out multiply-accumulate operation and multi-precision recombination operation, and obtaining output data of the multi-precision neural network.
Specifically, a plurality of time slots are first acquired, where the time slots represent specific time intervals, for example time slots T1, T2, T3, T4. A plurality of analog signals are then selected through the selector to obtain a plurality of time slot signals. For example, a p-select-1 selector is used to select one of the p MAC rows to be turned on: the m-row time slot signal is first routed by the row selector into one of the p MAC rows, and the selected MAC row is held unchanged for U system clock cycles.
Step S203 includes the steps of: inputting a plurality of time slot signals into each row of a multiplication accumulation array in a time multiplexing mode to obtain a plurality of multiplication accumulation results; and sequentially inputting a plurality of multiply-accumulate results to the analog-to-digital conversion module and the multi-precision shift accumulation module to carry out multi-precision recombination, so as to obtain output data of the multi-precision neural network.
Specifically, the plurality of time slot signals are first input to each row of the multiply-accumulate array in a time-multiplexed manner to obtain a plurality of multiply-accumulate results. For example: the time slot signals undergo analog multiply-accumulate operations with the weight data pre-stored in the MAC; through time multiplexing, the ADC and shift-accumulate module below each column of MACs complete the conversion of n columns of data within n ADC clocks (i.e., one system clock) and output the results, yielding a plurality of multiply-accumulate results. After U system clocks, the time slot signals are selected into the MACs of another row for the same operation. The time slot signals are then input to the processing element in a time-multiplexed manner for multiply-accumulate and multi-precision recombination operations, giving the output data of the multi-precision neural network. In this embodiment, the multiply-accumulators (MACs) in the array support multiply-accumulate operations on 1-8 bit input and weight data. A U-bit weight is pre-stored across U memristor devices in the same row before computation, so an m×n array can store (m×n)/U weights in total; for example, in 2-bit computing mode the MAC array can store (m×n)/2 weights. During computation, the time slot signals are applied to the memristor array for analog multiply-accumulate computation, producing the plurality of multiply-accumulate results.
The multiply-accumulate results then enter an I/V amplifier in current mode to complete the current-to-voltage conversion; the voltage signal is held by a sample-and-hold circuit until the ADC converts it into a digital signal and outputs it to the multi-precision shift-accumulate module. The multi-precision shift-accumulate module uses two shift-accumulate units to complete the precision recombination of the weight bits and of the input bits (the digital signal), respectively, and a counter signal is used to generate an output flag signal. During computation, the counter counts repeatedly from 0 to 63, and the weight-bit shift-accumulate unit performs one shift-and-accumulate on each ADC clock. After n ADC clocks, the conversion of all n columns of data is complete; the flag signal is then pulled high, and the result in the weight-bit shift-accumulate unit is sent to the input-bit (digital-signal) shift-accumulate unit for one shift-and-accumulate, producing the output data of the multi-precision neural network. After n×U ADC clocks, i.e., U system clocks, the multiply-accumulate result with both the weight data and the digital signal at U-bit precision is obtained.
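The two nested shift-accumulates described above can be checked numerically: streaming U-bit inputs bit-serially against 1-bit weight slices, and shifting each partial sum by the combined bit position, reconstructs the exact full-precision multiply-accumulate, which is the precision-lossless property claimed. The sketch below is a behavioral model with assumed names, not the circuit itself.

```python
def bits_lsb_first(x: int, width: int) -> list[int]:
    """Bits of an unsigned integer, least-significant first."""
    return [(x >> k) & 1 for k in range(width)]

def multiprecision_mac(inputs, weights, u_bits):
    """Recombine 1-bit partial products: the inner loop plays the role of the
    weight-bit shift-accumulate, the outer loop the input-bit one."""
    acc = 0
    for ib in range(u_bits):            # input-bit recombination
        for wb in range(u_bits):        # weight-bit recombination
            partial = sum(bits_lsb_first(x, u_bits)[ib] *
                          bits_lsb_first(w, u_bits)[wb]
                          for x, w in zip(inputs, weights))
            acc += partial << (ib + wb)
    return acc

# Lossless: matches the direct integer multiply-accumulate.
inputs, weights = [3, 5, 7], [2, 4, 6]
assert multiprecision_mac(inputs, weights, 4) == sum(
    x * w for x, w in zip(inputs, weights))
```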
The invention uses a space-time-multiplexed ADC to reduce the power consumption of the AD/DA modules without any loss of computational precision, and uses a multi-bit-width precision recombination module to support multi-precision neural networks and improve computational energy efficiency and data-processing throughput.
Based on the above embodiment, the present invention also provides an intelligent terminal, whose functional block diagram may be as shown in fig. 6. The intelligent terminal comprises a processor, a memory, a network interface, a display screen, and a temperature sensor connected through a system bus. The processor of the intelligent terminal provides computing and control capabilities. The memory of the intelligent terminal comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program; the internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the intelligent terminal is used for communicating with external terminals through a network connection. The computer program, when executed by the processor, implements the method for the precision-lossless compute-in-memory device suitable for multi-precision neural networks. The display screen of the intelligent terminal may be a liquid-crystal display or an electronic-ink display, and the temperature sensor of the intelligent terminal is arranged inside the terminal in advance to detect the operating temperature of internal components.
It will be appreciated by those skilled in the art that the schematic diagram in fig. 6 is merely a block diagram of a portion of the structure associated with the present invention and is not intended to limit the smart terminal to which the present invention is applied, and that a particular smart terminal may include more or fewer components than shown, or may combine certain components, or may have a different arrangement of components.
In one embodiment, a smart terminal is provided that includes a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for: acquiring input data of a multi-precision neural network, splitting the input data according to bits, and performing digital-to-analog conversion to obtain a plurality of analog signals;
based on the selector and the processing element, carrying out multiply-accumulate operation and multi-precision recombination operation on a plurality of analog signals and preset weights in a space-time multiplexing mode to obtain output data of the multi-precision neural network.
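The space-time multiplexing of the conversion stage can likewise be sketched in software. This is a hypothetical model for illustration only; the function name `time_multiplexed_adc` and the 3-bit clamping quantizer are assumptions, not the invention's circuit. One shared ADC serves several crossbar columns, converting one column per time slot as the selector advances:

```python
# Software model (assumption): a single ADC time-multiplexed over
# several crossbar column currents via a selector.

def time_multiplexed_adc(column_currents, quantize):
    """Convert every column with one shared ADC, one column per time slot."""
    results = []
    for current in column_currents:    # selector advances one column per slot
        results.append(quantize(current))  # the shared ADC converts this slot
    return results

# Illustrative 3-bit quantizer clamped to a 0..7 code range
quantize = lambda i: min(7, max(0, round(i)))
print(time_multiplexed_adc([0.2, 3.7, 6.9, 8.4], quantize))  # [0, 4, 7, 7]
```

Sharing one ADC across columns in this way is what trades a small amount of conversion latency for the AD/DA power reduction claimed above.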
Those skilled in the art will appreciate that implementing all or part of the methods described above may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may comprise the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
In summary, the invention discloses a precision lossless computation integrated device and method suitable for a multi-precision neural network, wherein the method comprises: acquiring input data of a multi-precision neural network, splitting the input data according to bits, and performing digital-to-analog conversion to obtain a plurality of analog signals; and, based on the selector and the processing element, carrying out a multiply-accumulate operation and a multi-precision recombination operation on the plurality of analog signals and preset weights in a space-time multiplexing mode to obtain output data of the multi-precision neural network. By performing the multiply-accumulate operation and the multi-precision recombination operation on the input data and the preset weights in a space-time multiplexing mode, the embodiment of the invention enables the computing-in-memory architecture to support mixed-precision neural network calculation without precision loss, improving calculation accuracy, and can greatly improve calculation energy efficiency compared with the traditional system-on-chip architecture.
Based on the above embodiments, the present invention discloses a precision lossless computation integrated device and method suitable for a multi-precision neural network. It should be understood that the application of the present invention is not limited to the above examples; those skilled in the art can make modifications or changes in light of the above description, and all such modifications and changes shall fall within the protection scope of the appended claims.

Claims (8)

1. An integrated device for precision lossless computation applicable to a multi-precision neural network, characterized in that the device comprises:
the digital-to-analog conversion module is used for converting the digital signal into an analog signal;
the selector is electrically connected with the digital-to-analog conversion module and is used for selecting among a plurality of multiply-accumulators;
the processing element is electrically connected with the selector and is used for carrying out mixed precision calculation on the analog signals, wherein the mixed precision calculation refers to carrying out multiply-accumulate operation and multi-precision recombination operation on a plurality of analog signals and preset weights in a space-time multiplexing mode, so as to obtain output data of the multi-precision neural network;
wherein performing the multiply-accumulate operation and the multi-precision recombination operation on the plurality of analog signals and the preset weights in a space-time multiplexing mode to obtain the output data of the multi-precision neural network comprises the following steps:
acquiring a plurality of time slots, wherein the time slots are used for representing specific time intervals;
selecting a plurality of analog signals by the selector for each time slot to obtain a plurality of time slot signals;
inputting a plurality of time slot signals into a processing element in a time multiplexing mode to carry out multiply-accumulate operation and multi-precision recombination operation, so as to obtain output data of the multi-precision neural network;
wherein the step of inputting the plurality of time slot signals into the processing element in a time multiplexing mode to carry out the multiply-accumulate operation and the multi-precision recombination operation to obtain the output data of the multi-precision neural network comprises the following steps:
inputting the plurality of time slot signals into each row of a multiply-accumulate array in a time multiplexing mode to obtain a plurality of multiply-accumulate results;
sequentially inputting the plurality of multiply-accumulate results to an analog-to-digital conversion module and a multi-precision shift accumulation module for multi-precision recombination to obtain the output data of the multi-precision neural network, wherein the multi-precision recombination operation is the precision recombination of the weight bits and of the digital signal bits carried out in the multi-precision shift accumulation module.
2. The precision lossless computation integrated device suitable for a multi-precision neural network of claim 1, wherein the processing element comprises a multiply-accumulate array, an analog-to-digital conversion module electrically connected to the multiply-accumulate array, and a multi-precision shift accumulation module electrically connected to the analog-to-digital conversion module.
3. The precision lossless computation integrated device suitable for a multi-precision neural network of claim 1, wherein the processing elements are arranged in a spatially multiplexed manner.
4. The precision lossless computation integrated device suitable for a multi-precision neural network of claim 2, wherein the multiply-accumulate array is composed of p rows and q columns of multiply-accumulators, where p and q are non-zero integers.
5. The precision lossless computation integrated device suitable for a multi-precision neural network of claim 4, wherein the multiply-accumulator comprises a memristor and a data processing module electrically connected to the memristor.
6. A method of the precision lossless computation integrated device suitable for a multi-precision neural network according to any one of claims 1-5, the method comprising:
acquiring input data of a multi-precision neural network, splitting the input data according to bits, and performing digital-to-analog conversion to obtain a plurality of analog signals;
based on a selector and a processing element, carrying out multiply-accumulate operation and multi-precision recombination operation on a plurality of analog signals and preset weights in a space-time multiplexing mode to obtain output data of a multi-precision neural network;
wherein, based on the selector and the processing element, performing the multiply-accumulate operation and the multi-precision recombination operation on the plurality of analog signals and the preset weights in a space-time multiplexing mode to obtain the output data of the multi-precision neural network comprises the following steps:
acquiring a plurality of time slots, wherein the time slots are used for representing specific time intervals;
selecting a plurality of analog signals by the selector for each time slot to obtain a plurality of time slot signals;
inputting a plurality of time slot signals into a processing element in a time multiplexing mode to carry out multiply-accumulate operation and multi-precision recombination operation, so as to obtain output data of the multi-precision neural network;
wherein the step of inputting the plurality of time slot signals into the processing element in a time multiplexing mode to carry out the multiply-accumulate operation and the multi-precision recombination operation to obtain the output data of the multi-precision neural network comprises the following steps:
inputting the plurality of time slot signals into each row of a multiply-accumulate array in a time multiplexing mode to obtain a plurality of multiply-accumulate results;
sequentially inputting the plurality of multiply-accumulate results to an analog-to-digital conversion module and a multi-precision shift accumulation module for multi-precision recombination to obtain the output data of the multi-precision neural network, wherein the multi-precision recombination operation is the precision recombination of the weight bits and of the digital signal bits carried out in the multi-precision shift accumulation module.
7. An intelligent terminal comprising a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by one or more processors, the one or more programs comprising instructions for implementing the method of claim 6.
8. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to implement the method of claim 6.
CN202210227427.XA 2022-03-08 2022-03-08 Precision lossless calculation integrated device and method suitable for multi-precision neural network Active CN114707647B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210227427.XA CN114707647B (en) 2022-03-08 2022-03-08 Precision lossless calculation integrated device and method suitable for multi-precision neural network


Publications (2)

Publication Number Publication Date
CN114707647A CN114707647A (en) 2022-07-05
CN114707647B true CN114707647B (en) 2023-10-24

Family

ID=82168644

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210227427.XA Active CN114707647B (en) 2022-03-08 2022-03-08 Precision lossless calculation integrated device and method suitable for multi-precision neural network

Country Status (1)

Country Link
CN (1) CN114707647B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115906735B (en) * 2023-01-06 2023-05-05 上海后摩智能科技有限公司 Multi-bit number storage and calculation integrated circuit, chip and calculation device based on analog signals
CN115756388B (en) * 2023-01-06 2023-04-18 上海后摩智能科技有限公司 Multi-mode storage and calculation integrated circuit, chip and calculation device
CN116151343B (en) * 2023-04-04 2023-09-05 荣耀终端有限公司 Data processing circuit and electronic device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018109968A (en) * 2016-12-28 2018-07-12 株式会社半導体エネルギー研究所 Data processing device, electronic component, and electronic apparatus using neural network
CN112257844A (en) * 2020-09-29 2021-01-22 浙江大学 Convolutional neural network accelerator based on mixed precision configuration and implementation method thereof
CN112559046A (en) * 2020-12-09 2021-03-26 清华大学 Data processing device and artificial intelligence processor
CN112836813A (en) * 2021-02-09 2021-05-25 南方科技大学 Reconfigurable pulsation array system for mixed precision neural network calculation
CN113126953A (en) * 2019-12-30 2021-07-16 三星电子株式会社 Method and apparatus for floating point processing
CN113364462A (en) * 2021-04-27 2021-09-07 北京航空航天大学 Analog storage and calculation integrated multi-bit precision implementation structure
CN113741857A (en) * 2021-07-27 2021-12-03 北京大学 Multiply-accumulate operation circuit
CN114026573A (en) * 2019-06-25 2022-02-08 Arm有限公司 Compact mixed signal multiply-accumulate engine based on nonvolatile memory
CN114049530A (en) * 2021-10-20 2022-02-15 阿里巴巴(中国)有限公司 Hybrid precision neural network quantization method, device and equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11741362B2 (en) * 2018-05-08 2023-08-29 Microsoft Technology Licensing, Llc Training neural networks using mixed precision computations
KR20210154502A (en) * 2020-06-12 2021-12-21 삼성전자주식회사 Neural network apparatus performing floating point operation and operating method of the same


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Configurable Floating-Point Multiple-Precision Processing Element for HPC and AI Converged Computing; Wei Mao et al.; IEEE Transactions on Very Large Scale Integration (VLSI) Systems; Vol. 30, No. 2; pp. 213-226 *
A fine-grained reconfigurable deep neural network acceleration chip; Liu Yanchen et al.; Semiconductor Technology (01); pp. 25-30, 51 *


Similar Documents

Publication Publication Date Title
CN114707647B (en) Precision lossless calculation integrated device and method suitable for multi-precision neural network
US10936941B2 (en) Efficient data access control device for neural network hardware acceleration system
US20240168718A1 (en) Circuit based on digital domain in-memory computing
Chu et al. PIM-prune: Fine-grain DCNN pruning for crossbar-based process-in-memory architecture
US10496855B2 (en) Analog sub-matrix computing from input matrixes
US20200285605A1 (en) Systolic array and processing system
CN110442323B (en) Device and method for performing floating point number or fixed point number multiply-add operation
CN113743600B (en) Storage and calculation integrated architecture pulse array design method suitable for multi-precision neural network
CN111915001A (en) Convolution calculation engine, artificial intelligence chip and data processing method
CN112153139B (en) Control system and method based on sensor network and in-memory computing neural network
CN111611197A (en) Operation control method and device of software-definable storage and calculation integrated chip
CN212112470U (en) Matrix multiplication circuit
CN114945916A (en) Apparatus and method for matrix multiplication using in-memory processing
CN115906976A (en) Full-analog vector matrix multiplication memory computing circuit and application thereof
CN113870918A (en) In-memory sparse matrix multiplication method, equation solving method and solver
US11748100B2 (en) Processing in memory methods for convolutional operations
US11256503B2 (en) Computational memory
CN116306854A (en) Transformer neural network acceleration device and method based on photoelectric storage and calculation integrated device
CN111931938B (en) Cyclic neural network reasoning operation acceleration system and method based on structured sparsity
CN113741857A (en) Multiply-accumulate operation circuit
CN111988031A (en) Memristor memory vector matrix arithmetic device and arithmetic method
CN113326914A (en) Neural network computing method and neural network computing device
CN221200393U (en) Small chip device and artificial intelligent accelerator device
CN110717580B (en) Calculation array based on voltage modulation and oriented to binarization neural network
CN116386687B (en) Memory array for balancing voltage drop influence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant