CN114997388A - Linear programming-based neural network bias processing method for memory and computation integrated chip - Google Patents

Linear programming-based neural network bias processing method for memory and computation integrated chip

Info

Publication number
CN114997388A
CN114997388A
Authority
CN
China
Prior art keywords
data
bias
integrated chip
low
linear programming
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210764364.1A
Other languages
Chinese (zh)
Other versions
CN114997388B (en)
Inventor
胡剑超
刘俊麟
张爱飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Witinmem Technology Co ltd
Original Assignee
Beijing Witinmem Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Witinmem Technology Co ltd filed Critical Beijing Witinmem Technology Co ltd
Priority to CN202210764364.1A priority Critical patent/CN114997388B/en
Publication of CN114997388A publication Critical patent/CN114997388A/en
Application granted granted Critical
Publication of CN114997388B publication Critical patent/CN114997388B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02Addressing or allocation; Relocation
    • G06F12/0223User address space allocation, e.g. contiguous or non contiguous base addressing
    • G06F12/023Free address space management
    • G06F12/0238Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory
    • G06F12/0246Memory management in non-volatile memory, e.g. resistive RAM or ferroelectric memory in block erasable memory, e.g. flash memory
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)

Abstract

The embodiment of the invention provides a linear programming-based neural network bias processing method for a storage and computation integrated chip, which comprises the following steps: acquiring input sample data, weight data and bias data of a target neural network layer, and hardware parameters of a target storage and computation integrated chip; and inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model for solving to obtain bias high-bit data and bias low-bit data. The bias low-bit data is used for mapping to the flash memory cell array in the analog domain of the target storage and computation integrated chip to participate in analog operation; the bias high-bit data is used for being stored in the digital domain of the target storage and computation integrated chip to participate in digital-domain calculation and then being combined with the analog calculation result output by the analog domain. By adopting this technical scheme, truncation errors caused by saturation are reduced, and the operation precision of the chip is improved.

Description

Linear programming-based neural network bias processing method for memory and computation integrated chip
Technical Field
The invention relates to the technical field of semiconductors, in particular to a linear programming-based neural network bias processing method, device and equipment for a memory-computation-integrated chip and a storage medium.
Background
In recent years, with the continuous development of algorithms, computing power and data scale, machine learning has shown strong advantages in solving many problems. Among machine learning methods, the artificial neural network has attracted much attention due to its outstanding performance in fields such as image recognition, object detection and semantic segmentation. However, as neural networks grow larger, processing them on a CPU + GPU architecture has become increasingly limited by speed and power consumption. The root cause of this bottleneck is the separation of storage and computation in the von Neumann architecture: the data-centric neural network algorithm imposes excessive data-transfer overhead on the computing system, reducing speed while increasing power consumption.
In-memory computing technology solves the problems caused by the separation of storage and computation. The weights of a neural network are stored as the conductances of the flash memory cells of a flash memory cell array in a storage and computation integrated neural network processing (in-flash NPU) chip; a data source expressed as voltages is then applied to the flash memory cell array, and by Ohm's law the current output by the array is the product of voltage and conductance. The matrix multiply-add operation of the data source with the weights is thereby completed, and the computation is essentially analog rather than traditional digital computation.
Tool chain design is an important link in the whole process from design to production of a storage and computation integrated chip. In tool chain design for such a chip, in order to improve calculation precision, the value range of the weights can be widened: the weights are amplified, and part of the weight values are allowed to exceed the 8-bit representation range. In actual operation, the weights are divided into a high-bit part and a low-bit part. The multiply-add of the high-bit part with the input matrix, divided by a scaling coefficient, is carried out in the digital domain of the storage and computation integrated chip; the multiply-add of the low-bit part with the input matrix, the summation of that result with the bias, and the division of the sum by the scaling coefficient are carried out in the analog domain; finally the results of the high-bit and low-bit parts are summed in the digital domain. In such a scenario, the summation of the whole bias with the matrix multiply-add result is performed directly in the analog domain, and after the sum is divided by the scaling coefficient the result may exceed the bit width range preset by the chip, causing truncation errors and reducing the calculation accuracy of the chip.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a linear programming-based neural network bias processing method, device, equipment and storage medium for a storage and computation integrated chip, which can at least partially solve the problems in the prior art.
In order to achieve the purpose, the invention adopts the following technical scheme:
in a first aspect, a linear programming based neural network bias processing method for a storage and computation integrated chip is provided, including:
acquiring input sample data, weight data and bias data of a target neural network layer, and hardware parameters of a target storage and computation integrated chip;
inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model for solving to obtain bias high-bit data and bias low-bit data; wherein:
the bias low-bit data is used for mapping to a flash memory cell array in the analog domain of the target storage and computation integrated chip to participate in analog operation; and the bias high-bit data is used for being stored into the digital domain of the target storage and computation integrated chip and combined with the analog operation result output by the analog domain.
Further, the weight data includes: the weight array high-bit data, the weight array low-bit data and the scaling coefficient; the hardware parameters of the target storage and computation integrated chip comprise: the input-output bit width of the flash memory cell array, the bit width of the digital domain and the maximum row number of the bias array.
Further, the linear programming solution model objective function includes:
summing the multiply-add result of the input sample data and the weight array low-bit data with the bias low-bit data, such that the quotient of the sum divided by the scaling coefficient G_Scale undergoes saturation truncation the fewest total number of times, wherein truncation is performed if the quotient is lower than the saturation truncation lower limit or higher than the saturation truncation upper limit; the saturation truncation lower limit is -2^(n-1), the saturation truncation upper limit is 2^(n-1) - 1, and n is the input-output bit width;
summing the multiply-add result of the input sample data and the weight array low-bit data with the bias low-bit data, such that if the sum is not saturation-truncated after being divided by the scaling coefficient, the maximum value of the absolute value of the quotient is as small as possible;
the maximum value of the absolute value of the offset low-bit data is as small as possible;
wherein the bias upper limit is m × (2^(n-1) - 1) × K and the bias lower limit is m × (-2^(n-1)) × K, where m is the maximum row number of the bias array and K is the amplification factor of the bias data.
Further, the constraint conditions of the linear programming solution model include:
the bias low-bit data is located between the bias lower limit and the bias upper limit;
the bias high-bit data is located between the bias high-bit lower limit and the bias high-bit upper limit;
the sum of the bias high-bit data multiplied by the scaling coefficient and the bias low-bit data is equal to the bias data;
wherein the bias high-bit upper limit is 2^(w-1) - 1, the bias high-bit lower limit is -2^(w-1), and w is the bit width of the digital domain.
In a second aspect, a linear programming-based neural network bias processing apparatus for a storage and computation integrated chip is provided, including:
the parameter acquisition module is used for acquiring input sample data, weight data and bias data of the target neural network layer, and hardware parameters of the target storage and computation integrated chip;
the linear solving module is used for inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model for solving to obtain bias high-bit data and bias low-bit data; wherein:
the bias low-bit data is used for mapping to a flash memory cell array in the analog domain of the target storage and computation integrated chip to participate in analog operation; and the bias high-bit data is used for being stored into the digital domain of the target storage and computation integrated chip and combined with the analog operation result output by the analog domain.
In a third aspect, a storage and computation integrated chip is provided, comprising an analog domain and a digital domain. Weight low-bit data and bias low-bit data are pre-stored in the analog domain, which is used for executing the matrix multiply-add operation of input data with the weight low-bit data, the summation of the matrix multiply-add result with the bias low-bit data, and the division of the summation result by a scaling parameter. Bias high-bit data and the scaling parameter are pre-stored in the digital domain, which sums the division result of the weight high-bit data and the scaling parameter with the division result output by the analog domain. The bias low-bit data and the bias high-bit data are generated according to the linear programming-based neural network bias processing method described above.
In a fourth aspect, an electronic device is provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the steps of the linear programming based neural network bias processing method when executing the program.
In a fifth aspect, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, implements the steps of the above-mentioned linear programming-based neural network bias processing method.
The embodiment of the invention provides a linear programming-based neural network bias processing method for a storage and computation integrated chip, which comprises the following steps: acquiring input sample data, weight data and bias data of a target neural network layer, and hardware parameters of a target storage and computation integrated chip; and inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model for solving to obtain bias high-bit data and bias low-bit data. The bias low-bit data is used for mapping to the flash memory cell array in the analog domain of the target storage and computation integrated chip to participate in analog operation; the bias high-bit data is used for being stored in the digital domain of the target storage and computation integrated chip to participate in digital-domain calculation and then being combined with the analog calculation result output by the analog domain. By adopting this technical scheme, truncation errors caused by saturation are reduced, and the operation precision of the chip is improved.
In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts. In the drawings:
FIG. 1 is a flow chart illustrating a linear programming-based neural network bias processing method for a storage and computation integrated chip in an embodiment of the present invention;
FIG. 2 illustrates operational data in an embodiment of the present invention;
FIG. 3 is a block diagram of a linear programming-based neural network bias processing apparatus for a storage and computation integrated chip according to an embodiment of the present invention;
fig. 4 is a structural diagram of an electronic device according to an embodiment of the invention.
Detailed Description
In order to make the technical solutions of the present application better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
It should be noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of this application and the above-described drawings, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
In the prior art, the summation of the whole bias with the matrix multiply-add result is performed directly in the analog domain, and after the sum is divided by the scaling coefficient the result may exceed the bit width range preset by the storage and computation integrated chip, causing truncation errors and reducing the calculation accuracy of the chip. For example, suppose G_Scale is 1, the multiply-add results of the representative samples with the low-bit weights form 3 sets, respectively [-200, 0], [-100, 100] and [0, 200], and the original bias is [0, 0]. If the bias split is not considered, adding the bias to the three sets and dividing by G_Scale gives again [-200, 0], [-100, 100] and [0, 200]; truncation then occurs, and the final output is [-128, 0], [-100, 100] and [0, 127], resulting in low operation precision of the chip.
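The numeric example above can be reproduced in a few lines of NumPy; this is only an illustrative sketch of the saturation behaviour, not chip code:

```python
import numpy as np

g_scale = 1
mac = np.array([[-200, 0], [-100, 100], [0, 200]])  # representative multiply-add results
bias = np.array([0, 0])                              # original bias, no split applied

out = (mac + bias) / g_scale
# The analog output only represents INT8, so values outside [-128, 127] saturate:
print(np.clip(out, -128, 127))                       # [[-128, 0], [-100, 100], [0, 127]]
```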
FIG. 1 is a flow chart illustrating a linear programming-based neural network bias processing method for a storage and computation integrated chip in an embodiment of the present invention; as shown in FIG. 1, the method may include the following steps:
step S100: acquiring input sample data, weight data, bias data and hardware parameters of the target storage and computation integrated chip of the target neural network layer.
It is worth noting that the flash memory cell arrays for writing the weight array and for writing the bias were laid out for the target storage and computation integrated chip at the hardware design stage, so the hardware parameters of the target chip are known; for a specific trained neural network, the weight data and bias data of each layer are also known.
In addition, the input sample data may be a plurality of samples, which are typical samples corresponding to the target neural network application scenario.
Step S200: inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model for solving to obtain bias high-bit data and bias low-bit data.
It is worth noting that the bias low-bit data is used for mapping to the flash memory cell array in the analog domain of the target storage and computation integrated chip to participate in analog operation, and the bias high-bit data is used for being stored into the digital domain of the target chip and combined with the analog operation result output by the analog domain.
The embodiment of the invention provides a rigorous mathematical basis for bias splitting to search for an optimal split result. The bias data is split to reduce truncation errors caused by saturation, and the split is converted into the solving of a linear programming mathematical model. The solved bias low-bit data is mapped to the flash memory cell array in the analog domain of the target storage and computation integrated chip to participate in analog operation; the bias high-bit data is stored into the digital domain of the target chip and combined with the analog operation result output by the analog domain. Truncation errors caused by saturation are thereby reduced, and the operation precision of the chip is improved.
In an alternative embodiment, the weight data includes: the weight array high-bit data, the weight array low-bit data and the scaling coefficient; the hardware parameters of the target storage and computation integrated chip comprise: the input-output bit width of the flash memory cell array, the bit width of the digital domain and the maximum row number of the bias array.
It should be noted that the weight array high-bit data and the weight array low-bit data are obtained by splitting the weight array according to a preset method. Specifically, the split may truncate the neural network weight array, taking the array formed by the low bits as the weight array low-bit data and the array formed by the high bits as the weight array high-bit data; alternatively, the weight array may first be amplified or reduced as a whole, with the overflow bits truncated into the weight array high-bit data and the remaining data after truncation taken as the weight array low-bit data.
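As a concrete illustration, the following is a minimal sketch of the first split rule described above (truncating the weight array into low and high bits, here with x = 8); the exact rule used on a given chip is an implementation choice, and the function name is purely illustrative:

```python
import numpy as np

def split_weight(weight: np.ndarray, x: int = 8):
    # High part: the bits above the low x bits. Floor division keeps the
    # reconstruction identity valid for negative weights as well.
    weight_h = weight // 2 ** x
    weight_l = weight - weight_h * 2 ** x  # low x bits, in [0, 2**x)
    return weight_h, weight_l

w = np.array([[300, -50], [-400, 127]])
wh, wl = split_weight(w)
assert np.array_equal(wl + wh * 2 ** 8, w)  # WeightL + WeightH * 2**x == Weight
```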
The principles of embodiments of the present invention are described in the following formula:
(Weight × Input + Bias) / G_Scale = (WeightL × Input + BiasL) / G_Scale + (WeightH × Input) × 2^x / G_Scale + BiasH
wherein Input represents the input samples, each sample being a one-dimensional vector; WeightL represents the low x-bit part of the weight array and is a two-dimensional matrix; WeightH represents the high y-bit part of the weight array and is a two-dimensional matrix. For example, x may be 8 or 16, etc., and y may likewise be 8 or 16; if the hardware supports 8 bits, x and y may be designed to be equal, both being 8.
BiasL represents the low bits of Bias after splitting and is a one-dimensional vector; its data type can be a 32-bit integer (assuming x is 8). BiasH represents the high bits of Bias after splitting and is a one-dimensional vector; its data type can be INT8, with a value range of -128 to 127 (the output data type also supports INT16, with a corresponding range of -32768 to 32767). Wherein:
WeightL + WeightH × 2^x = Weight, where Weight is the weight array and x is the low bit width of the weight split;
BiasL + BiasH × G_Scale = Bias, where G_Scale is the scaling coefficient;
"+" is the matrix/vector addition.
(WeightL × Input + BiasL) / G_Scale is placed in the analog part of the chip for calculation; the output data type can be INT8, in which case data beyond the INT8 range is truncated to -128 or 127 (it may also be INT16, etc., which is not limited in the embodiment of the present invention). (WeightH × Input) × 2^x / G_Scale, and its sum with BiasH, are placed in the digital part of the chip for calculation.
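Putting these formulas together, a hedged NumPy sketch of the split computation follows (names mirror the symbols above; per-step hardware quantization is omitted, and n = 8 gives the INT8 saturation limits):

```python
import numpy as np

def split_forward(inp, weight_l, weight_h, bias_l, bias_h, g_scale, x=8, n=8):
    # Analog domain: (WeightL * Input + BiasL) / G_Scale, saturated to n bits.
    analog = np.clip((inp @ weight_l + bias_l) / g_scale,
                     -2 ** (n - 1), 2 ** (n - 1) - 1)
    # Digital domain: (WeightH * Input) * 2**x / G_Scale, then add BiasH,
    # and finally sum with the analog-domain result.
    digital = (inp @ weight_h) * 2 ** x / g_scale + bias_h
    return analog + digital
```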
In an alternative embodiment, the objective function of the linear programming solution model includes three terms, specifically:
(1) summing the multiply-add result of the input sample data and the weight array low-bit data with the bias low-bit data, such that the quotient of the sum divided by the scaling coefficient G_Scale undergoes saturation truncation the fewest total number of times.
Truncation is performed if the quotient is lower than the saturation truncation lower limit or higher than the saturation truncation upper limit; the saturation truncation lower limit is -2^(n-1), the saturation truncation upper limit is 2^(n-1) - 1, and n is the input-output bit width. Saturation truncation may also be understood as data overflow.
See in particular the following formula:
min Σ_{i=1..InputNum} Σ_{j=1..WeightColumn} [ ((Input_i × WeightL_j + BiasL_j) / G_Scale < -128) || ((Input_i × WeightL_j + BiasL_j) / G_Scale > 127) ]
wherein || represents logical OR, InputNum represents the number of typical samples, and WeightColumn represents the number of columns of the weight array. The formula takes hardware support for 8-bit output as an example: -128 and 127 correspond to the lower and upper limits of INT8. If the number of bits supported by the hardware changes, the corresponding limits in the formula change accordingly (see the NumPy sketch after the three objectives).
(2) summing the multiply-add result of the input sample data and the weight array low-bit data with the bias low-bit data, such that if the sum is not saturation-truncated after being divided by the scaling coefficient, the maximum value of the absolute value of the quotient is as small as possible;
see in particular the following formula:
min max_{(i,j) not saturated} | (Input_i × WeightL_j + BiasL_j) / G_Scale |
by adopting the technical scheme, the robustness of the model can be improved. The target may be understood as a second solution target that is a distance between the quotient of the sum of the weighted low-order data and the representative input sample and the sum of the biased low-order bits divided by the scaling factor and the overflow upper and lower bounds.
(3) the maximum value of the absolute value of the bias low-bit data is as small as possible;
see in particular the following formula:
m × (-2^(n-1)) × K ≤ BiasL_i ≤ m × (2^(n-1) - 1) × K
min( max_i( abs(BiasL_i) ) )
wherein the bias upper limit is m × (2^(n-1) - 1) × K and the bias lower limit is m × (-2^(n-1)) × K; m is the maximum row number of the bias array, and K is the amplification factor of the bias data, typically 128, which is fixed once the model has been trained.
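For intuition, objective (1) above can be evaluated for a candidate BiasL with a short NumPy helper; this is an illustration of the counting formula (n = 8 gives the -128/127 limits), not code from the patent:

```python
import numpy as np

def count_truncations(mac, bias_l, g_scale, n=8):
    # mac: (InputNum, WeightColumn) array holding the Input_i x WeightL_j results;
    # bias_l: candidate BiasL vector, broadcast across the sample rows.
    q = (mac + bias_l) / g_scale
    return int(np.sum((q < -2 ** (n - 1)) | (q > 2 ** (n - 1) - 1)))
```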
In an alternative embodiment, the constraints of the linear programming solution model include:
(1) the bias low-bit data is located between the bias lower limit and the bias upper limit;
(2) the bias high-bit data is located between the bias high-bit lower limit and the bias high-bit upper limit, where the upper limit is 2^(w-1) - 1, the lower limit is -2^(w-1), and w is the bit width of the digital domain;
For example, if the hardware supports 8 bits, the value of each element of the bias low-bit data cannot exceed the representable range of the bias array: -128 × (number of bias array rows) × 128 <= BiasL <= 127 × (number of bias array rows) × 128.
(3) The sum of the offset high-order data multiplied by the scaling coefficient and the offset low-order data is equal to the offset data;
by adopting the technical scheme, the bias is split according to the low-order weight and a typical input multiply-add result to reduce the truncation error problem caused by saturation, and the split bias is converted into a solving problem of a linear programming mathematical model; converting the low-order weight value of each layer of the neural network, the multiplication and addition result of typical input, the maximum row number of the offset array, the offset and the G _ Scale value into the constraint and target of linear programming for solving; the sum of the result of multiplication and addition of the weighted low bits and the sample and the bias low bits is divided by the scaling factor G _ Scale, the total amount of saturation is minimum, the maximum value of the absolute value of the split bias is required to be as small as possible (the space occupation of the bias when the bias is arranged in the bias array is reduced), and the maximum value of the absolute value when the bias is not saturated is required to be as small as possible (the possibility of saturation truncation when the bias is input in an atypical mode is reduced).
The scheme provided by the embodiment of the invention simultaneously considers several conditions, such as the minimum number of saturations, the minimum maximum absolute value of the bias, and the minimum maximum absolute value when unsaturated, and can obtain a theoretically optimal solution; the scheme is also easy to extend, with restrictions added or removed in the mapping.
It should be noted that the upper and lower bias limits refer to the maximum representation range of the hardware bias array. For example, for certain hardware the bias array has 16 rows and a single row represents -128 to 127, so the total representation range is 16 × (-128 to 127); multiplied by the amplification factor 128, the final representation range is 16 × (-128) × 128 to 16 × 127 × 128.
It should be noted that, in practical applications, the linear programming solution model may directly call an open-source linear programming solver from Python or the like; for example, Google's open-source solver suite (OR-Tools) may be used. To enable those skilled in the art to better understand the present application, FIG. 2 illustrates a split example of a bias array.
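Since the patent leaves the concrete solver call open, the following is a simplified sketch using Google's open-source OR-Tools (GLOP). It encodes the split constraint and the min-max objectives (2) and (3) via auxiliary variables; objective (1), which counts saturation events, would need integer indicator variables (a MILP) and is omitted here. The function and parameter names are illustrative assumptions, not from the patent:

```python
import numpy as np
from ortools.linear_solver import pywraplp

def split_bias(mac, bias, g_scale, n=8, w=8, m=16, K=128, lam=1e-3):
    # mac: (InputNum, WeightColumn) array of Input @ WeightL for typical samples;
    # bias: original bias vector; lam weights objective (3) against objective (2).
    num_cols = bias.shape[0]
    solver = pywraplp.Solver.CreateSolver("GLOP")
    lo_l, hi_l = m * (-2 ** (n - 1)) * K, m * (2 ** (n - 1) - 1) * K  # BiasL bounds
    lo_h, hi_h = -2 ** (w - 1), 2 ** (w - 1) - 1                      # BiasH bounds
    bias_l = [solver.NumVar(lo_l, hi_l, f"BiasL_{j}") for j in range(num_cols)]
    bias_h = [solver.NumVar(lo_h, hi_h, f"BiasH_{j}") for j in range(num_cols)]
    t = solver.NumVar(0, solver.infinity(), "max_abs_quotient")  # objective (2)
    s = solver.NumVar(0, solver.infinity(), "max_abs_bias_l")    # objective (3)
    for j in range(num_cols):
        # Split constraint: BiasL + BiasH * G_Scale == Bias
        solver.Add(bias_l[j] + bias_h[j] * g_scale == float(bias[j]))
        solver.Add(bias_l[j] - s <= 0)   # |BiasL_j| <= s
        solver.Add(bias_l[j] + s >= 0)
        for i in range(mac.shape[0]):
            # |(mac[i, j] + BiasL_j) / G_Scale| <= t, written without division
            # (assumes G_Scale > 0).
            solver.Add(float(mac[i, j]) + bias_l[j] - g_scale * t <= 0)
            solver.Add(float(mac[i, j]) + bias_l[j] + g_scale * t >= 0)
    solver.Minimize(t + lam * s)
    assert solver.Solve() == pywraplp.Solver.OPTIMAL
    # Round BiasH to integers, then recompute BiasL so the split identity
    # BiasL + BiasH * G_Scale == Bias still holds exactly.
    bh = np.round([v.solution_value() for v in bias_h]).astype(np.int64)
    bl = (np.asarray(bias) - bh * g_scale).astype(np.int64)
    return bl, bh
```

In practice the integer nature of the bias arrays makes this a mixed-integer program; the continuous relaxation plus rounding used here is one pragmatic simplification.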
In an alternative embodiment, an embodiment of the present invention further provides a storage and computation integrated chip. The chip comprises an analog domain and a digital domain. Weight low-bit data and bias low-bit data are pre-stored in the analog domain, which is used for executing the matrix multiply-add operation of input data with the weight low-bit data, the summation of the matrix multiply-add result with the bias low-bit data, and the division of the summation result by a scaling parameter. Bias high-bit data and the scaling parameter are pre-stored in the digital domain, which sums the division result of the weight high-bit data and the scaling parameter with the division result output by the analog domain. The bias low-bit data and the bias high-bit data are generated according to the linear programming-based neural network bias processing method described above.
It should be noted that the memory integrated chip provided in the embodiment of the present invention may be applied to various electronic devices, such as: smart phones, tablet electronic devices, network set-top boxes, portable computers, desktop computers, Personal Digital Assistants (PDAs), vehicle-mounted devices, smart wearable devices, toys, smart home control devices, pipeline device controllers, and the like. Wherein, intelligence wearing equipment can include intelligent glasses, intelligent wrist-watch, intelligent bracelet etc..
Based on the same inventive concept, the embodiment of the present application further provides a linear programming-based neural network bias processing apparatus for a storage and computation integrated chip, which can be used to implement the methods described in the above embodiments, as described in the following embodiments. Since the principle by which the apparatus solves the problem is similar to that of the method, the implementation of the apparatus can refer to the implementation of the method, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 3 is a block diagram of a linear programming-based neural network bias processing apparatus for a storage and computation integrated chip according to an embodiment of the present invention. The apparatus includes: a parameter acquisition module 10 and a linear solving module 20.
The parameter acquisition module 10 acquires input sample data, weight data and bias data of the target neural network layer, and hardware parameters of the target storage and computation integrated chip;
the linear solving module 20 is used for solving the input sample data, the weight data, the bias data and the hardware parameter input into a pre-established linear programming solving model to obtain bias high-order data and bias low-order data; wherein the content of the first and second substances,
the bias low-bit data is used for mapping to a flash memory cell array in the analog domain of the target storage and computation integrated chip to participate in analog operation; and the bias high-bit data is used for being stored into the digital domain of the target storage and computation integrated chip and combined with the analog operation result output by the analog domain.
The apparatuses, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. A typical implementation device is an electronic device, which may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In a typical example, the electronic device specifically includes a memory, a processor, and a computer program stored on the memory and executable on the processor; when executing the program, the processor implements the steps of the linear programming-based neural network bias processing method for a storage and computation integrated chip described above.
Referring now to FIG. 4, shown is a schematic diagram of an electronic device 600 suitable for use in implementing embodiments of the present application.
As shown in fig. 4, the electronic apparatus 600 includes a central processing unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the system 600 are also stored. The CPU 601, ROM 602 and RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse and the like; an output section 607 including a display such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 610 as necessary, so that a computer program read out therefrom is installed into the storage section 608 as necessary.
In particular, the processes described above with reference to the flowcharts may be implemented as a computer software program according to an embodiment of the present invention. For example, an embodiment of the present invention includes a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the above-described linear programming-based neural network bias processing method for storing a monolithic chip.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the various elements may be implemented in the same one or more pieces of software and/or hardware in the practice of the present application.
The present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
All the embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (8)

1. A linear programming-based neural network bias processing method for a storage and computation integrated chip, characterized by comprising the following steps:
acquiring input sample data, weight data and bias data of a target neural network layer, and hardware parameters of a target storage and computation integrated chip;
inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model for solving to obtain bias high-bit data and bias low-bit data; wherein:
the bias low-bit data is used for mapping to a flash memory cell array in the analog domain of the target storage and computation integrated chip to participate in analog operation; and the bias high-bit data is used for being stored into the digital domain of the target storage and computation integrated chip and combined with the analog operation result output by the analog domain.
2. The linear programming-based neural network bias processing method for a storage and computation integrated chip of claim 1, wherein the weight data includes: the weight array high-bit data, the weight array low-bit data and the scaling coefficient; and the hardware parameters of the target storage and computation integrated chip comprise: the input-output bit width of the flash memory cell array, the bit width of the digital domain, and the maximum row number of the bias array.
3. The linear programming-based neural network bias processing method for a storage and computation integrated chip according to claim 2, wherein the objective function of the linear programming solving model comprises:
summing the multiply-add result of the input sample data and the weight array low-bit data with the bias low-bit data, such that the quotient of the sum divided by the scaling coefficient undergoes saturation truncation the fewest total number of times, wherein truncation is performed if the quotient is lower than the saturation truncation lower limit or higher than the saturation truncation upper limit; the saturation truncation lower limit is -2^(n-1), the saturation truncation upper limit is 2^(n-1) - 1, and n is the input-output bit width;
summing the multiply-add result of the input sample data and the weight array low-bit data with the bias low-bit data, such that if the sum is not saturation-truncated after being divided by the scaling coefficient, the maximum value of the absolute value of the quotient is as small as possible;
the maximum value of the absolute value of the bias low-bit data is as small as possible;
wherein the bias upper limit is m × (2^(n-1) - 1) × K and the bias lower limit is m × (-2^(n-1)) × K, where m is the maximum row number of the bias array and K is the amplification factor of the bias data.
4. The linear programming-based neural network bias processing method for the storage-computation-integrated chip according to claim 3, wherein the constraints of the linear programming solution model include:
the bias low-bit data is located between the bias lower limit and the bias upper limit;
the bias high-bit data is located between the bias high-bit lower limit and the bias high-bit upper limit, where the upper limit is 2^(w-1) - 1, the lower limit is -2^(w-1), and w is the bit width of the digital domain;
the sum of the bias high-bit data multiplied by the scaling coefficient and the bias low-bit data is equal to the bias data.
5. A linear programming-based neural network bias processing apparatus for a storage and computation integrated chip, characterized by comprising:
the parameter acquisition module is used for acquiring input sample data, weight data and bias data of the target neural network layer, and hardware parameters of the target storage and computation integrated chip;
the linear solving module is used for inputting the input sample data, the weight data, the bias data and the hardware parameters into a pre-established linear programming solving model for solving to obtain bias high-bit data and bias low-bit data; wherein:
the bias low-bit data is used for mapping to a flash memory cell array in the analog domain of the target storage and computation integrated chip to participate in analog operation; and the bias high-bit data is used for being stored in the digital domain of the target storage and computation integrated chip to participate in digital operation and then combined with the analog operation result output by the analog domain.
6. A storage and computation integrated chip, comprising an analog domain and a digital domain, wherein weight low-bit data and bias low-bit data are pre-stored in the analog domain, the analog domain being used for executing the matrix multiply-add operation of input data with the weight low-bit data, the summation operation of the matrix multiply-add result with the bias low-bit data, and the division operation of the summation result by a scaling parameter; bias high-bit data and the scaling parameter are pre-stored in the digital domain, which sums the division result of the weight high-bit data and the scaling parameter with the division result output by the analog domain; and wherein the bias low-bit data and the bias high-bit data are generated according to the linear programming-based neural network bias processing method of any one of claims 1 to 4.
7. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 4 when executing the program.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the linear programming based neural network bias processing method of any one of claims 1 to 4.
CN202210764364.1A 2022-06-30 2022-06-30 Neural network bias processing method based on linear programming for memory and calculation integrated chip Active CN114997388B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210764364.1A CN114997388B (en) 2022-06-30 2022-06-30 Neural network bias processing method based on linear programming for memory and calculation integrated chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210764364.1A CN114997388B (en) 2022-06-30 2022-06-30 Neural network bias processing method based on linear programming for memory and calculation integrated chip

Publications (2)

Publication Number Publication Date
CN114997388A true CN114997388A (en) 2022-09-02
CN114997388B CN114997388B (en) 2024-05-07

Family

ID=83019499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210764364.1A Active CN114997388B (en) 2022-06-30 2022-06-30 Neural network bias processing method based on linear programming for memory and calculation integrated chip

Country Status (1)

Country Link
CN (1) CN114997388B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109861666A (en) * 2019-01-28 2019-06-07 山东大学 FRM filter design method and system based on Feedback Neural Network
CN111291876A (en) * 2020-01-21 2020-06-16 厦门星宸科技有限公司 Arithmetic device, arithmetic method, and arithmetic chip
US20200209813A1 (en) * 2018-12-26 2020-07-02 Fujitsu Limited Optimization device and control method of optimization device
CN111626414A (en) * 2020-07-30 2020-09-04 电子科技大学 Dynamic multi-precision neural network acceleration unit
CN111694544A (en) * 2020-06-02 2020-09-22 杭州知存智能科技有限公司 Multi-bit multiplexing multiply-add operation device, neural network operation system, and electronic apparatus
CN112395247A (en) * 2020-11-18 2021-02-23 北京灵汐科技有限公司 Data processing method and storage and calculation integrated chip
CN112825153A (en) * 2019-11-20 2021-05-21 华为技术有限公司 Data processing method in neural network system and neural network system
CN113033759A (en) * 2019-12-09 2021-06-25 南京惟心光电系统有限公司 Pulse convolution neural network algorithm, integrated circuit, arithmetic device, and storage medium
CN113467751A (en) * 2021-07-16 2021-10-01 东南大学 Analog domain in-memory computing array structure based on magnetic random access memory
CN113988277A (en) * 2021-10-11 2022-01-28 北京知存科技有限公司 Neural network mapping method, device and equipment for storage and computation integrated chip
CN114444688A (en) * 2022-01-14 2022-05-06 百果园技术(新加坡)有限公司 Neural network quantization method, apparatus, device, storage medium, and program product

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200209813A1 (en) * 2018-12-26 2020-07-02 Fujitsu Limited Optimization device and control method of optimization device
CN109861666A (en) * 2019-01-28 2019-06-07 山东大学 FRM filter design method and system based on Feedback Neural Network
CN112825153A (en) * 2019-11-20 2021-05-21 华为技术有限公司 Data processing method in neural network system and neural network system
CN113033759A (en) * 2019-12-09 2021-06-25 南京惟心光电系统有限公司 Pulse convolution neural network algorithm, integrated circuit, arithmetic device, and storage medium
CN111291876A (en) * 2020-01-21 2020-06-16 厦门星宸科技有限公司 Arithmetic device, arithmetic method, and arithmetic chip
CN111694544A (en) * 2020-06-02 2020-09-22 杭州知存智能科技有限公司 Multi-bit multiplexing multiply-add operation device, neural network operation system, and electronic apparatus
CN111626414A (en) * 2020-07-30 2020-09-04 电子科技大学 Dynamic multi-precision neural network acceleration unit
CN112395247A (en) * 2020-11-18 2021-02-23 北京灵汐科技有限公司 Data processing method and storage and calculation integrated chip
CN113467751A (en) * 2021-07-16 2021-10-01 东南大学 Analog domain in-memory computing array structure based on magnetic random access memory
CN113988277A (en) * 2021-10-11 2022-01-28 北京知存科技有限公司 Neural network mapping method, device and equipment for storage and computation integrated chip
CN114444688A (en) * 2022-01-14 2022-05-06 百果园技术(新加坡)有限公司 Neural network quantization method, apparatus, device, storage medium, and program product

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Jia, H., et al., "Scalable and programmable neural network inference accelerator based on in-memory computing", IEEE Journal of Solid-State Circuits, 1 January 2022 (2022-01-01), pages 198-211, XP093051677, DOI: 10.1109/JSSC.2021.3119018 *
Liu, Xuwen, "Research on resource scheduling algorithms for partially dynamically reconfigurable in-memory computing systems", China Master's Theses Full-text Database, Information Science and Technology, 15 April 2022 (2022-04-15), pages 9-52 *

Also Published As

Publication number Publication date
CN114997388B (en) 2024-05-07

Similar Documents

Publication Publication Date Title
US10691413B2 (en) Block floating point computations using reduced bit-width vectors
KR102342604B1 (en) Method and apparatus for generating neural network
US10860679B2 (en) Calculating device, calculation program, recording medium, and calculation method
US20200117981A1 (en) Data representation for dynamic precision in neural network cores
CN113344170A (en) Neural network weight matrix adjusting method, writing control method and related device
EP3676698B1 (en) Providing efficient floating-point operations using matrix processors in processor-based systems
Ilin et al. Fast integer approximations in convolutional neural networks using layer-by-layer training
US20200090066A1 (en) Calculating device, calculation program, recording medium, and calculation method
CN112686031A (en) Text feature extraction model quantification method, device, equipment and storage medium
US20220137924A1 (en) Dynamic bias analog vector-matrix multiplication operation circuit and operation control method therefor
CN114997388B (en) Neural network bias processing method based on linear programming for memory and calculation integrated chip
CN116306709A (en) Data processing method, medium and electronic equipment
CN116151961A (en) Credit risk prediction method, electronic device and readable storage medium
CN110874206A (en) Data processing method and device based on optical chip, storage medium and electronic equipment
CN114723024A (en) Linear programming-based neural network mapping method for storage and calculation integrated chip
CN112765936B (en) Training method and device for operation based on language model
Kim et al. Applying piecewise linear approximation for DNN non-linear activation functions to Bfloat16 MACs
JP7137648B2 (en) Calculation device, calculation program, recording medium and calculation method
CN112074806B (en) System, method and computer storage medium for block floating point computing
US20240211530A1 (en) Calculating device, calculation program, recording medium, and calculation method
CN111587441B (en) Generating output examples using regression neural networks conditioned on bit values
He et al. Deep neural network acceleration method based on sparsity
Schneider et al. Analog hardware implementation issues in deterministic Boltzmann machines
Han et al. Optimizing Deep-Learning Inference Operations
CN117973478A (en) Optimization method, device, equipment, medium and computer program product for large language model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
Address after: Room 213-175, 2nd Floor, Building 1, No. 180 Kecheng Street, Qiaosi Street, Linping District, Hangzhou City, Zhejiang Province, 311100 (Country or region after: China)
Applicant after: Hangzhou Zhicun Computing Technology Co.,Ltd.
Address before: 1707, 17th floor, shining building, No. 35, Xueyuan Road, Haidian District, Beijing 100083 (Country or region before: China)
Applicant before: BEIJING WITINMEM TECHNOLOGY Co.,Ltd.
GR01 Patent grant