CN111967588A - Quantization operation method and related product


Info

Publication number: CN111967588A
Application number: CN202010769615.6A
Authority: CN (China)
Other languages: Chinese (zh)
Inventor: not disclosed
Original and current assignee: Shanghai Cambricon Information Technology Co Ltd
Prior art keywords: data, operated, quantization, neural network, model
Legal status: Pending (the listed status is an assumption, not a legal conclusion)
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to CN202010769615.6A

Classifications

    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06F 17/153 Multidimensional correlation or convolution
    • G06N 3/045 Combinations of networks


Abstract

The present application discloses a quantization operation method and related products, applied to a combined processing device. The combined processing device includes an electronic device, an interface apparatus, other processing devices, and a storage device; the electronic device may include one or more computing devices, and the computing devices may be configured to execute the quantization operation method. The embodiments of the present application can improve the processing efficiency of the electronic device.

Description

Quantization operation method and related product
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a quantization operation method and a related product.
Background
With the continuous development of information technology and people's growing demands, requirements on the timeliness of information are increasingly high. Currently, electronic devices acquire and process information based on a general-purpose processor. In practice, it has been found that this way of processing information, in which a general-purpose processor runs a software program, is limited by the large amount of data the general-purpose processor must store, which reduces data processing efficiency.
Disclosure of Invention
The embodiments of the present application provide a quantization operation method and related products, which can improve the processing efficiency of an electronic device.
In a first aspect, an embodiment of the present application provides a quantization operation method, where the method includes:
acquiring data to be operated;
performing first quantization processing on the data to be operated to obtain specified data to be operated;
performing neural network operation on the specified data to be operated to obtain a first operation result;
and carrying out second quantization processing on the first operation result to obtain a second operation result.
In a second aspect, an embodiment of the present application further provides a quantization operation apparatus, where the apparatus includes:
the acquisition unit is used for acquiring data to be operated;
the first quantization unit is used for performing first quantization processing on the data to be operated to obtain specified data to be operated;
the operation unit is used for carrying out neural network operation on the specified data to be operated to obtain a first operation result;
and the second quantization unit is used for performing second quantization processing on the first operation result to obtain a second operation result.
In a third aspect, an embodiment of the present application further provides a neural network chip, configured to perform the method according to the first aspect.
In a fourth aspect, an embodiment of the present application further provides a board card, where the board card includes: a storage device, an interface apparatus, a control device, and the neural network chip as described in the third aspect;
wherein the neural network chip is connected to the storage device, the control device, and the interface apparatus, respectively;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the chip and external equipment;
and the control device is used for monitoring the state of the chip.
In a fifth aspect, embodiments of the present application further provide an electronic device, where the electronic device is configured to perform the method according to the first aspect, or the electronic device includes the neural network chip according to the third aspect, or the electronic device includes the board according to the fourth aspect.
In a sixth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program enables a computer to perform some or all of the steps as described in the first aspect of the embodiment of the present application.
In a seventh aspect, an embodiment of the present application provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to perform some or all of the steps described in the first aspect of the embodiments of the present application. The computer program product may be a software installation package.
By adopting the embodiments of the present application, the following beneficial effects are achieved:
It can be seen that, in the quantization operation method and related products according to the embodiments of the present application, data to be operated is obtained, first quantization processing is performed on the data to be operated to obtain specified data to be operated, a neural network operation is performed on the specified data to be operated to obtain a first operation result, and second quantization processing is performed on the first operation result to obtain a second operation result, so that data can be stored and transmitted in quantized form, which reduces storage overhead and improves the processing efficiency of the electronic device.
These and other aspects of the present application will be more readily apparent from the following description of the embodiments.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1A is a schematic flowchart of a quantization operation method according to an embodiment of the present disclosure;
FIG. 1B is a schematic diagram illustrating a comparison between a quantization model and a non-quantization model provided in an embodiment of the present application;
fig. 1C is a schematic flowchart of another quantization operation method according to an embodiment of the present disclosure;
fig. 1D is a schematic flowchart of another quantization operation method according to an embodiment of the present disclosure;
fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 3 is a block diagram illustrating functional units of a quantization operation device according to an embodiment of the present disclosure;
fig. 4 is a block diagram of functional units of a combined processing device according to an embodiment of the present disclosure;
fig. 5 is a block diagram of functional units of a board card according to an embodiment of the present disclosure.
Detailed Description
The embodiments of the present application are described in detail below.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and in the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The electronic devices may include various handheld devices having wireless communication functions, in-vehicle devices, wireless headsets, computing devices or other processing devices connected to wireless modems, as well as various forms of User Equipment (UE), Mobile Stations (MS), terminal devices (terminal device), and the like, and may be, for example, smart phones, tablets, earphone boxes, and the like. For convenience of description, the above-mentioned devices are collectively referred to as electronic devices.
The electronic device described above may be applied in the following (including but not limited to) scenarios: the system comprises various electronic products such as a data processing device, a robot, a computer, a printer, a scanner, a telephone, a tablet computer, an intelligent terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device and a wearable device; various vehicles such as airplanes, ships, vehicles, and the like; various household appliances such as televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas stoves, range hoods and the like; and various medical devices including nuclear magnetic resonance apparatuses, B-ultrasonic apparatuses, electrocardiographs and the like.
In order to make the technical solutions of the present application better understood, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The following describes embodiments of the present application in detail.
Referring to fig. 1A, fig. 1A is a schematic flowchart of a quantization operation method according to an embodiment of the present application. The method includes the following steps.
101. Acquire data to be operated.
Wherein the data to be calculated may be stored in a memory or an external device.
In one possible example, the data to be operated is at least one of the following: neuron data, gradient data, or weight data.
In a specific implementation, the data to be operated may be neuron data, weight data, gradient data, or a combination thereof (for example, neuron data and weight data), which are not exhaustively listed here. The data to be operated may be of at least one of the following data types: fixed-point data, integer data, discrete data, continuous data, power data, or floating-point data, with representation lengths such as 32-bit floating point, 16-bit fixed point, 16-bit floating point, 8-bit fixed point, or 4-bit fixed point.
102. Perform first quantization processing on the data to be operated to obtain specified data to be operated.
The electronic device may input the data to be operated into the quantization model for first quantization processing to obtain the specified data to be operated. The quantization model quantizes the data to be operated into data of the type corresponding to the quantization model, that is, the specified data to be operated.
In a possible example, in step 102, the data to be operated is subjected to a first quantization process to obtain specified data to be operated, which may be implemented as follows:
inputting part or all of the data to be operated into a quantization model to obtain the specified data to be operated, wherein the quantization model includes at least one quantization network layer, and the quantization network layer includes at least one of the following: a piecewise function, a quantization value function, or a preset-type network layer.
In this embodiment of the application, the electronic device may input part or all of the data in the data to be operated into the quantization model to obtain the specified data to be operated, for example, weight data in the data to be operated may be input into the quantization model, and for example, neuron data in the data to be operated may be input into the quantization model.
In an embodiment of the present application, the quantization model may include at least one quantization network layer, and the quantization network layer may include at least one of the following: a piecewise function, a quantization value function, or a preset-type network layer. The quantization function may be realized through the piecewise function and/or the quantization value function; in a specific implementation, the piecewise function may be denoted G(x) and the quantization value function T(x). The preset-type network layer may include at least one network layer, which may be at least one of: a convolutional layer, a fully connected layer, a deconvolution layer, and the like, without limitation.
In a specific implementation, G(x) is used to quantize the original data, and T(x) is used to restore quantized data to data of the same type as the original data. Data read from the memory can be restored through T(x) to facilitate subsequent neural network operations; an operation result can be quantized through G(x), and the quantized result is finally stored in the memory, which reduces the amount of data stored in the memory.
Several forms of the G(x) function are given below.

First:

G(x) = ceil(a*x + b)

where x is the input data, and a and b are constants.

Second:

G(x) = floor(c*log(a*x + b) + d)

where x is the input data, and a, b, c, and d are constants.

Third:

[The third form is shown as an equation image in the original and is not reproduced here; it defines G(x) in terms of ReluN(N, x).]

where x is the input data and L and N are constants. ReluN(N, x) clips its argument to the range [0, N]:

ReluN(N, x) = min(max(0, x), N)

Of course, in a specific implementation, the piecewise function is not limited to the above forms and may also include other types of piecewise functions, which are not limited here.
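As a minimal sketch, the first two G(x) forms and the clipped-ReLU helper can be written as follows; the constant values for a, b, c, and d are illustrative, not values fixed by the text:

```python
import math

def g_linear(x, a=4.0, b=0.5):
    """First form: G(x) = ceil(a*x + b)."""
    return math.ceil(a * x + b)

def g_log(x, a=1.0, b=1.0, c=2.0, d=0.0):
    """Second form: G(x) = floor(c*log(a*x + b) + d)."""
    return math.floor(c * math.log(a * x + b) + d)

def relu_n(n, x):
    """ReluN(N, x): clip x to the range [0, N]."""
    return min(max(0, x), n)
```

Both G(x) forms return integers, so the quantization result can be stored compactly and later restored through T(x).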
In addition, the functional forms of the T(x) function are given below.

First:

T(x) = x1

Second:

T(x) = x2

Third:

T(x) = (x1 + x2)/2

where x1 and x2 are the endpoints of the quantization interval [x1, x2]. Of course, T(x) can be determined in various ways, which are not limited here; the values of T(x) all lie in [x1, x2].
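The three choices of T(x) can be sketched directly; each returns a representative value inside the quantization interval [x1, x2]:

```python
def t_lower(x1, x2):
    """First form: T(x) = x1 (lower endpoint of the interval)."""
    return x1

def t_upper(x1, x2):
    """Second form: T(x) = x2 (upper endpoint of the interval)."""
    return x2

def t_mid(x1, x2):
    """Third form: T(x) = (x1 + x2) / 2 (interval midpoint)."""
    return (x1 + x2) / 2
```

Whichever form is chosen, the restored value lies in [x1, x2], so restoration never leaves the original interval.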
For example, as shown in fig. 1B, the original network (non-quantization model) includes a convolutional layer, an activation function layer, a fully connected layer, and an activation function layer, while the adjusted network (quantization model) includes G(x), T(x), a convolutional layer, an activation function layer, G(x), T(x), a fully connected layer, and an activation function layer.
In one possible example, the inputting part or all of the data to be operated into a quantization model to obtain the specified data to be operated includes:
a1, determining target model configuration parameters corresponding to the data to be operated;
a2, configuring the quantization model according to the target model configuration parameters to obtain a configured quantization model;
and A3, inputting the data to be operated into a configured quantization model to obtain the specified data to be operated.
In specific implementation, the electronic device may pre-store a mapping relationship between data to be operated and a model configuration parameter, determine a target model configuration parameter corresponding to the data to be operated according to the mapping relationship, configure the quantization model according to the target model configuration parameter to obtain a configured quantization model, and input the data to be operated into the configured quantization model to obtain specified data to be operated.
Further, in a possible example, step A1 of determining the target model configuration parameters corresponding to the data to be operated may include the following steps:
A11, determining target attribute information of the data to be operated;
A12, determining the target model configuration parameters corresponding to the data to be operated according to a mapping relationship between preset attribute information and model configuration parameters.
In this embodiment, the attribute information may be at least one of the following: the data type of the data to be operated, the operation precision of the data to be operated, the data amount of the data to be operated, and the like are not limited herein.
In specific implementation, the electronic device may pre-store a mapping relationship between preset attribute information and model configuration parameters, and further, the electronic device may determine target attribute information of the data to be operated, and determine target model configuration parameters corresponding to the data to be operated according to the mapping relationship between the preset attribute information and the model configuration parameters.
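A hypothetical sketch of steps A11-A12 follows. The attribute keys and configuration values below are invented for illustration; the text only specifies that a prestored mapping from attribute information to model configuration parameters is consulted:

```python
# Hypothetical prestored mapping: (data type, operation precision) -> parameters.
# All keys and values here are illustrative, not values given in the text.
PRESET_MAPPING = {
    ("float32", "high_precision"): {"num_intervals": 256, "g_type": "ceil_linear"},
    ("float32", "low_precision"):  {"num_intervals": 16,  "g_type": "floor_log"},
    ("int8",    "low_precision"):  {"num_intervals": 16,  "g_type": "ceil_linear"},
}

def target_config(data_type, precision):
    """A11: the attributes are given; A12: look up the configuration parameters."""
    return PRESET_MAPPING[(data_type, precision)]
```

The quantization model would then be configured with the returned parameters before the data to be operated is fed into it.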
In one possible example, the preset-type network layer includes at least one of the following network layers: a convolutional layer, a fully connected layer, a deconvolution layer, or a normalization layer.
For example, the preset-type network layer may be a convolutional layer.
In one possible example, the target model configuration parameters include at least one of the following: a piecewise function type and its corresponding function adjustment parameters, a quantization value function and its corresponding function adjustment parameters, the number of quantization intervals, and configuration parameters of the preset-type network layer.
In a specific implementation, the piecewise function types may include multiple types, for example the piecewise functions listed above or other types of piecewise functions. Each piecewise function type has corresponding function adjustment parameters, which are used to tune the function, for example:

G(x) = ceil(a*x + b)

where G(x) is a piecewise function, and a and b are its function adjustment parameters.

In addition, the quantization value function T(x) may also include corresponding function adjustment parameters. The number of quantization intervals may be, for example, N intervals. Each preset-type network layer includes corresponding configuration parameters; taking the convolutional layer as an example, the configuration parameters may be the convolution kernel size, the stride, and the like, which are not limited here.
For example, taking N intervals as an example, G(x) can be computed with a lookup table of the following form (shown as images in the original):

interval position:  L1   L2   ...   LN
quantized value:    A1   A2   ...   AN

According to which of the N interval positions L1 to LN the neuron data x falls in, an index of the table entry is obtained; looking up the table with this index then yields the quantization result (that is, one value selected from A1 to AN).
For example, taking N intervals as an example, T(x) is computed with a similar lookup table (shown as an image in the original) that maps each quantization result A1 to AN back to a representative neuron value. The index of the table entry is obtained from the quantization result, and looking up the table with this index yields the converted specified neuron data (real neuron data).
103. Perform a neural network operation on the specified data to be operated to obtain a first operation result.
The electronic device may input the specified data to be operated into the neural network model to perform the neural network operation, obtaining the first operation result.
In a specific implementation, step 103 may include the following cases:
1. When the data to be operated includes quantized neuron data, the neural network operation can be performed directly on the quantized neuron data;
2. When the data to be operated includes quantized neuron data and weights, the neural network operation can be performed on the quantized neuron data and the weights;
3. When the data to be operated includes neuron data and quantized weights, the neural network operation can be performed on the neuron data and the quantized weights;
4. When the data to be operated includes quantized neuron data and quantized weights, the neural network operation can be performed on the quantized neuron data and the quantized weights.
104. Perform second quantization processing on the first operation result to obtain a second operation result.
The first quantization process may be the same as or different from the second quantization process. For example, the quantization model of the first quantization process may be the same as or different from the quantization model of the second quantization process.
In a specific implementation, taking quantization of neuron data as an example, the quantization process is as follows. Assume the neurons are quantized into N kinds of values, and the value range of the neuron itself is [left, right]. A mapping G(x) from the real value of a neuron to its quantization result can be constructed as follows: G(x) divides the range of the input data x into N intervals, whose lengths may or may not be equidistant, that is, the interval lengths may or may not be the same. The values in each interval are mapped by G(x) to a fixed quantized value, which yields N possible quantized values in total: A1, A2, ..., AN.
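A minimal sketch of this construction, assuming the equidistant-interval option: [left, right] is split into N intervals, and every value in an interval maps to a fixed quantized value (here simply the interval index):

```python
def build_intervals(left, right, n):
    """Split [left, right] into n equidistant intervals; return the n+1 boundaries."""
    step = (right - left) / n
    return [left + i * step for i in range(n + 1)]

def g_interval(x, boundaries):
    """Map x to the index (0..n-1) of the interval it falls in; out-of-range
    values are clamped to the first or last interval."""
    n = len(boundaries) - 1
    if x <= boundaries[0]:
        return 0
    if x >= boundaries[-1]:
        return n - 1
    for i in range(n):  # linear scan; fine for small n
        if boundaries[i] <= x < boundaries[i + 1]:
            return i
```

The equidistant split is only one of the options mentioned above; a non-equidistant split would use hand-chosen boundaries with the same lookup logic.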
The mapping function G(x) can be chosen in many ways; several efficient functions are:

G(x) = ceil(a*x + b)

G(x) = floor(c*log(a*x + b) + d)

[Two further forms are shown as equation images in the original, including the ReluN-based form given earlier.]
by constructing the completion function g (x), each quantized interval and N quantized values corresponding to each interval can be obtained. In addition, a function T (x) for converting the quantized data into the real neuron data can be determined according to G (x).
Since the neuron values in an interval all correspond to one quantization result, T(x) can restore, from the quantization result, a value in the original interval [x1, x2]. T(x) can be determined in various ways, for example:

T(x) = x1

T(x) = x2

T(x) = (x1 + x2)/2
in a specific implementation, in this embodiment of the application, the above g (x) and t (x) may also be applied to a training process of a neural network to ensure the accuracy of the quantized neural network, for example, the dense computation layer in the original network includes a fully connected layer and a convolutional layer, and the neural network operation is performed, which may refer to steps 101 to 104.
In a specific implementation, for a network layer in the original neural network, for example a fully connected layer and/or a convolutional layer, the functions G(x) and T(x) defined in the embodiments of the present application are added before the network layer, and the entire neural network is then trained to convergence. The training method is similar to commonly used training methods, for example the back-propagation algorithm, stochastic gradient descent, and the like, which are not limited here.
For example, as shown in fig. 1C, the original network includes a convolutional layer, and the adjusted network introduces G1(x) and T1(x) as well as G2(x) and T2(x); the adjusted network thus has a quantization operation function. After the data to be operated is input into G1(x) and T1(x) for the first quantization processing, the specified data to be operated is obtained; the specified data to be operated is then input into the convolutional layer to obtain a first operation result, and the first operation result is input into G2(x) and T2(x) for the second quantization processing to obtain a second operation result. The functions corresponding to T1(x) and T2(x) may be the same or different, and likewise for G1(x) and G2(x).
For another example, as shown in fig. 1D, the original network includes a convolutional layer, and the adjusted network may include T(x). Quantized data is read from memory, and T(x) implements the inverse quantization operation, which saves storage resources; after the inverse quantization, the convolution operation is performed to obtain an operation result, and finally that result is input into G(x) for the second quantization processing, which further saves storage resources. Of course, multiple quantization processes may also be cascaded, saving memory resources even further.
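The flow of Fig. 1C (first quantization, network operation, second quantization) can be sketched end to end. The ceil-based G, its constant a, and the tiny 1-D convolution below are illustrative stand-ins for whatever functions and layers the real network uses:

```python
import math

def g(x, a=8.0):
    """Quantize: G(x) = ceil(a*x), i.e. the first form with b = 0 (illustrative)."""
    return math.ceil(a * x)

def t(q, a=8.0):
    """Dequantize: restore a representative real value (interval lower endpoint)."""
    return (q - 1) / a

def conv1d(xs, kernel):
    """Valid-mode 1-D sliding dot product, standing in for the network layer."""
    k = len(kernel)
    return [sum(xs[i + j] * kernel[j] for j in range(k))
            for i in range(len(xs) - k + 1)]

def quantized_forward(xs, kernel):
    # First quantization processing: G1 then T1 on the data to be operated.
    xq = [t(g(x)) for x in xs]
    # Neural network operation on the specified data to be operated.
    out = conv1d(xq, kernel)
    # Second quantization processing: G2 on the first operation result
    # (the result is stored in quantized form).
    return [g(y) for y in out]
```

Only quantized integers cross the memory boundary in this sketch, which is the stated source of the storage savings.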
It can be seen that, in the quantization operation method according to the embodiments of the present application, data to be operated is obtained, first quantization processing is performed on the data to be operated to obtain specified data to be operated, a neural network operation is performed on the specified data to be operated to obtain a first operation result, and second quantization processing is performed on the first operation result to obtain a second operation result; data can thus be stored and transmitted in quantized form.
Specifically, when the data to be operated is neuron data, the amount of neuron storage in the intermediate stages of the neural network computation can be greatly reduced, saving the storage overhead of the neural network and improving the processing efficiency of the electronic device. When the quantized data is weight data or gradient data, the corresponding weight or gradient storage amount can be optimized and the amount of stored data reduced, so that less data needs to be transmitted; this saves transmission bandwidth and improves transmission efficiency.
Referring to fig. 2, fig. 2 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure. The electronic device includes a processor, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the processor. In an embodiment of the present application, the programs include instructions for performing the following steps:
acquiring data to be operated;
performing first quantization processing on the data to be operated to obtain specified data to be operated;
performing neural network operation on the specified data to be operated to obtain a first operation result;
and carrying out second quantization processing on the first operation result to obtain a second operation result.
It can be seen that the electronic device according to the embodiments of the present application obtains data to be operated, performs first quantization processing on the data to be operated to obtain specified data to be operated, performs a neural network operation on the specified data to be operated to obtain a first operation result, and performs second quantization processing on the first operation result to obtain a second operation result.
Specifically, when the data to be operated is neuron data, the amount of neuron storage in the intermediate stages of the neural network computation can be greatly reduced, saving the storage overhead of the neural network and improving the processing efficiency of the electronic device. When the quantized data is weight data or gradient data, the corresponding weight or gradient storage amount can be optimized and the amount of stored data reduced, so that less data needs to be transmitted; this saves transmission bandwidth and improves transmission efficiency.
In one possible example, the data to be operated is at least one of:
neuron data, gradient data, or weight data.
In one possible example, in terms of performing the first quantization processing on the data to be operated to obtain the specified data to be operated, the program includes instructions for performing the following steps:
inputting part or all of the data to be operated into a quantization model to obtain the specified data to be operated, wherein the quantization model comprises at least one quantization network layer, and the quantization network layer comprises at least one of the following: a piecewise function, a quantization value function, or a preset-type network layer.
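As an illustration of a quantization network layer built from a piecewise function and a quantization value function, the sketch below (hypothetical — the text does not specify the functions themselves) uses the piecewise function to assign each value to a quantization interval and the value-taking function to replace each interval index with a representative value. The interval edges and representative values are invented for illustration.

```python
import numpy as np

def piecewise_interval(x, edges):
    """Piecewise function: map each value to the index of its
    quantization interval; edges are the interval boundaries."""
    return np.digitize(x, edges)

def quantize_values(idx, values):
    """Quantization value-taking function: replace each interval index
    with that interval's representative value."""
    return values[idx]

# Hypothetical configuration: 4 quantization intervals over [-1, 1].
edges = np.array([-0.5, 0.0, 0.5])             # 3 boundaries -> 4 intervals
values = np.array([-0.75, -0.25, 0.25, 0.75])  # one representative per interval

x = np.array([-0.9, -0.3, 0.1, 0.8])
idx = piecewise_interval(x, edges)  # interval indices 0..3
q = quantize_values(idx, values)    # quantized output
```

Chaining several such layers (or mixing them with preset-type network layers such as convolutional layers) yields a quantization model of the kind described above.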
In one possible example, in terms of inputting part or all of the data to be operated into a quantization model to obtain the specified data to be operated, the program includes instructions for performing the following steps:
determining target model configuration parameters corresponding to the data to be operated;
configuring the quantization model according to the target model configuration parameters to obtain a configured quantization model;
and inputting the data to be operated into a configured quantization model to obtain the specified data to be operated.
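The configure-then-quantize flow of the three steps above can be sketched as follows. The configuration keys (`piecewise_type`, `num_intervals`, `value_fn`) and the `build_quantizer` helper are invented for illustration; they merely mirror the kinds of target model configuration parameters listed later in the text.

```python
import numpy as np

# Hypothetical target model configuration parameters determined for the
# data to be operated.
config = {
    "piecewise_type": "uniform",  # how interval edges are placed
    "num_intervals": 8,           # number of quantization intervals
    "value_fn": "midpoint",       # representative value per interval
}

def build_quantizer(config, lo=-1.0, hi=1.0):
    """Configure the quantization model from the target model
    configuration parameters and return the configured quantizer."""
    if config["piecewise_type"] == "uniform":
        edges = np.linspace(lo, hi, config["num_intervals"] + 1)
    else:
        raise ValueError("unsupported piecewise function type")
    if config["value_fn"] == "midpoint":
        reps = (edges[:-1] + edges[1:]) / 2  # interval midpoints
    else:
        raise ValueError("unsupported quantization value function")
    def quantizer(x):
        # Assign each value to an interval, then take that interval's value.
        idx = np.clip(np.digitize(x, edges[1:-1]), 0, len(reps) - 1)
        return reps[idx]
    return quantizer

quantize = build_quantizer(config)
q = quantize(np.array([-0.95, 0.0, 0.3, 0.99]))
```

Determining a different configuration for different data to be operated (e.g., more intervals for gradients than for neurons) and rebuilding the quantizer is one way the configure step could adapt the model per input.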
In one possible example, the preset-type network layer includes at least one of the following network layers: a convolutional layer, a fully connected layer, a deconvolution layer, or a normalization layer.
In one possible example, the target model configuration parameters include at least one of: a piecewise function type and its corresponding function adjustment parameters, a quantization value function and its corresponding function adjustment parameters, the number of quantization intervals, and configuration parameters of the preset-type network layer.
The above description has introduced the solution of the embodiment of the present application mainly from the perspective of the method-side implementation process. It is understood that, in order to realize the above functions, the electronic device comprises corresponding hardware structures and/or software modules for performing each function. Those of skill in the art will readily appreciate that the units and algorithm steps of the various examples described in connection with the embodiments provided herein can be implemented in hardware, or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiment of the present application, the electronic device may be divided into functional units according to the above method example; for example, each functional unit may correspond to one function, or two or more functions may be integrated into one processing unit. The integrated unit can be realized in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiment of the present application is schematic and represents only a logical function division; other division manners are possible in actual implementation.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a quantization operation device according to an embodiment of the present application. The quantization operation device is applied to an electronic device, and may include: an acquisition unit 301, a first quantization unit 302, an operation unit 303, and a second quantization unit 304, wherein:
the acquiring unit 301 is configured to acquire data to be operated;
the first quantization unit 302 is configured to perform a first quantization process on the data to be operated to obtain specified data to be operated;
the operation unit 303 is configured to perform a neural network operation on the specified data to be operated to obtain a first operation result;
the second quantization unit 304 is configured to perform a second quantization on the first operation result to obtain a second operation result.
It can be seen that, in the quantization operation device according to the embodiment of the present application, data to be operated is obtained, first quantization processing is performed on the data to be operated to obtain specified data to be operated, a neural network operation is performed on the specified data to be operated to obtain a first operation result, and second quantization processing is performed on the first operation result to obtain a second operation result.
Specifically, when the data to be operated is neuron data, the amount of neuron data stored during the intermediate stages of the neural network computation can be greatly reduced, saving the storage overhead of the neural network and improving the processing efficiency of the electronic device. When the quantized data is weight data or gradient data, the corresponding weight or gradient storage can be optimized and the amount of stored data reduced, so that less data needs to be transmitted, saving transmission bandwidth and improving transmission efficiency.
In one possible example, the data to be operated is at least one of:
neuron data, gradient data, or weight data.
In a possible example, in terms of performing the first quantization processing on the data to be operated to obtain the specified data to be operated, the first quantization unit 302 is specifically configured to:
inputting part or all of the data to be operated into a quantization model to obtain the specified data to be operated, wherein the quantization model comprises at least one quantization network layer, and the quantization network layer comprises at least one of the following: a piecewise function, a quantization value function, or a preset-type network layer.
In a possible example, in terms of inputting part or all of the data to be operated into a quantization model to obtain the specified data to be operated, the first quantization unit 302 is specifically configured to:
determining target model configuration parameters corresponding to the data to be operated;
configuring the quantization model according to the target model configuration parameters to obtain a configured quantization model;
and inputting the data to be operated into a configured quantization model to obtain the specified data to be operated.
In one possible example, the preset-type network layer includes at least one of the following network layers: a convolutional layer, a fully connected layer, a deconvolution layer, or a normalization layer.
In one possible example, the target model configuration parameters include at least one of: a piecewise function type and its corresponding function adjustment parameters, a quantization value function and its corresponding function adjustment parameters, the number of quantization intervals, and configuration parameters of the preset-type network layer.
It is to be understood that the functions of each program module of the quantization operation device in this embodiment may be specifically implemented according to the method in the foregoing method embodiment, and the specific implementation process may refer to the related description of the foregoing method embodiment, which is not described herein again.
Fig. 4 is a block diagram illustrating a combined processing device 400 according to an embodiment of the present disclosure. As shown in fig. 4, the combined processing device 400 includes an electronic apparatus 402, an interface device 404, other processing devices 406, and a storage device 408. Depending on the application scenario, one or more computing devices 410 may be included in the electronic device and may be configured to perform the operations described herein in conjunction with fig. 1A-3.
In various embodiments, the electronic device of the present disclosure may be configured to perform user-specified operations. In an exemplary application, the electronic device may be implemented as a single-core or multi-core artificial intelligence processor. Similarly, one or more computing devices included within the electronic device may be implemented as an artificial intelligence processor core or as part of the hardware structure of an artificial intelligence processor core. When multiple computing devices are implemented as artificial intelligence processor cores or as parts of the hardware structure of such cores, the electronic device of the present disclosure may be viewed as having a single-core structure or a homogeneous multi-core structure.
In an exemplary operation, the electronic device of the present disclosure may interact with other processing devices through the interface device to collectively perform a user-specified operation. Depending on the implementation, the other processing devices of the present disclosure may include one or more types of general-purpose and/or special-purpose processors, such as central processing units (CPUs), graphics processing units (GPUs), and artificial intelligence processors. These processors may include, but are not limited to, digital signal processors (DSPs), application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic, and discrete hardware components, and their number may be determined based on actual needs. As previously mentioned, considered on its own, the electronic device of the present disclosure may be regarded as having a single-core structure or a homogeneous multi-core structure. However, when the electronic device and the other processing devices are considered together, they may be considered to form a heterogeneous multi-core structure.
In one or more embodiments, the other processing devices may serve as an interface between the electronic device of the present disclosure (which may be embodied as an artificial intelligence computing device, e.g., a computing device associated with neural network operations) and external data and controls, performing basic control functions including, but not limited to, data transfer and starting and/or stopping the computing device. In other embodiments, the other processing devices may cooperate with the electronic device to perform computational tasks together.
In one or more embodiments, the interface device may be used to transfer data and control instructions between the electronic device and the other processing devices. For example, the electronic device may obtain input data from the other processing devices via the interface device and write the input data into an on-chip storage device (or memory) of the electronic device. Further, the electronic device may obtain control instructions from the other processing devices via the interface device and write them into an on-chip control cache of the electronic device. Alternatively or optionally, the interface device may also read data from a storage device of the electronic device and transmit the data to the other processing devices.
Additionally or alternatively, the combined processing device of the present disclosure may further include a storage device. As shown in the figure, the storage means is connected to the electronic device and the other processing means, respectively. In one or more embodiments, the storage device may be used to store data for the electronic device and/or the other processing device. For example, the data may be data that is not fully retained within internal or on-chip storage of the electronic device or other processing apparatus.
In some embodiments, the present disclosure also discloses a chip (e.g., chip 502 shown in fig. 5). In one implementation, the chip is a system on chip (SoC) integrating one or more combined processing devices as shown in fig. 4. The chip may be connected to other associated components through an external interface device, such as the external interface device 506 shown in fig. 5. The relevant components may be, for example, a camera, a display, a mouse, a keyboard, a network card, or a Wi-Fi interface. In some application scenarios, other processing units (e.g., video codecs) and/or interface modules (e.g., DRAM interfaces) may be integrated on the chip. In some embodiments, the present disclosure also discloses a chip packaging structure including the chip. In some embodiments, the present disclosure also discloses a board card including the above chip packaging structure. The board card will be described in detail below with reference to fig. 5.
Fig. 5 is a schematic diagram illustrating the structure of a board card 500 according to an embodiment of the present disclosure. As shown in fig. 5, the board card includes a memory device 504 for storing data, which includes one or more memory units 510. The memory device may be connected to the control device 508 and the above-described chip 502, and transfer data to and from them, by means of, for example, a bus. Further, the board card includes an external interface device 506 configured for data relay or transfer between the chip (or the chip in the chip packaging structure) and an external device 512 (such as a server or a computer). For example, the data to be processed may be transferred to the chip by the external device through the external interface device. For another example, the calculation result of the chip may be transmitted back to the external device via the external interface device. According to different application scenarios, the external interface device may take different interface forms; for example, it may adopt a standard PCIe interface.
In one or more embodiments, the control device in the disclosed board card may be configured to regulate the state of the chip. For example, in an application scenario, the control device may include a microcontroller unit (MCU) for controlling the operating state of the chip.
From the above description in conjunction with fig. 4 and 5, it will be understood by those skilled in the art that the present disclosure also discloses an electronic device or apparatus, which may include one or more of the above boards, one or more of the above chips and/or one or more of the above combination processing devices.
According to different application scenarios, the electronic device or apparatus of the present disclosure may include a server, a cloud server, a server cluster, a data processing apparatus, a robot, a computer, a printer, a scanner, a tablet computer, a smart terminal, a PC device, a terminal of the internet of things, a mobile terminal, a mobile phone, a vehicle recorder, a navigator, a sensor, a camera, a video camera, a projector, a watch, an earphone, a mobile storage, a wearable device, a visual terminal, an autonomous driving terminal, a vehicle, a household appliance, and/or a medical device. The vehicles include an airplane, a ship, and/or a car; the household appliances include a television, an air conditioner, a microwave oven, a refrigerator, an electric rice cooker, a humidifier, a washing machine, an electric lamp, a gas stove, and a range hood; the medical equipment includes a nuclear magnetic resonance apparatus, a B-mode ultrasound apparatus, and/or an electrocardiograph. The electronic device or apparatus of the present disclosure may also be applied to the fields of the internet, the internet of things, data centers, energy, transportation, public management, manufacturing, education, power grids, telecommunications, finance, retail, construction sites, medical care, and the like. Further, the electronic device or apparatus disclosed herein may also be used in application scenarios related to artificial intelligence, big data, and/or cloud computing, such as the cloud, the edge, and the terminal. In one or more embodiments, a computationally powerful electronic device or apparatus according to the present disclosure may be applied to a cloud device (e.g., a cloud server), while a less power-hungry electronic device or apparatus may be applied to a terminal device and/or an edge device (e.g., a smartphone or a camera).
In one or more embodiments, the hardware information of the cloud device and the hardware information of the terminal device and/or the edge device are compatible with each other, so that, according to the hardware information of the terminal device and/or the edge device, appropriate hardware resources can be matched from the hardware resources of the cloud device to simulate those of the terminal device and/or the edge device, thereby completing unified management, scheduling, and cooperative work in end-cloud integration or cloud-edge-end integration.
It is noted that for the sake of brevity, the present disclosure describes some methods and embodiments thereof as a series of acts and combinations thereof, but those skilled in the art will appreciate that the aspects of the present disclosure are not limited by the order of the acts described. Accordingly, one of ordinary skill in the art will appreciate that certain steps may be performed in other sequences or simultaneously, in accordance with the disclosure or teachings of the present disclosure. Further, those skilled in the art will appreciate that the embodiments described in this disclosure are capable of alternative embodiments, in which acts or modules are involved, which are not necessarily required to practice one or more aspects of the disclosure. In addition, the present disclosure may focus on the description of some embodiments, depending on the solution. In view of the above, those skilled in the art will understand that portions of the disclosure that are not described in detail in one embodiment may also be referred to in the description of other embodiments.
In particular implementation, based on the disclosure and teachings of the present disclosure, one skilled in the art will appreciate that the several embodiments disclosed in the present disclosure may be implemented in other ways not disclosed herein. For example, as for the units in the foregoing embodiments of the electronic device or apparatus, the units are divided based on the logic functions, and there may be other dividing manners in actual implementation. Also for example, multiple units or components may be combined or integrated with another system or some features or functions in a unit or component may be selectively disabled. The connections discussed above in connection with the figures may be direct or indirect couplings between the units or components in terms of connectivity between the different units or components. In some scenarios, the aforementioned direct or indirect coupling involves a communication connection utilizing an interface, where the communication interface may support electrical, optical, acoustic, magnetic, or other forms of signal transmission.
In the present disclosure, units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units. The aforementioned components or units may be co-located or distributed across multiple network elements. In addition, according to actual needs, part or all of the units can be selected to achieve the purpose of the solution of the embodiment of the present disclosure. In addition, in some scenarios, multiple units in embodiments of the present disclosure may be integrated into one unit or each unit may exist physically separately.
In some implementation scenarios, the integrated units may be implemented in the form of software program modules. If implemented in the form of software program modules and sold or used as a stand-alone product, the integrated units may be stored in a computer-readable memory. In this regard, when aspects of the present disclosure are embodied in the form of a software product (e.g., a computer-readable storage medium), the software product may be stored in a memory and may include instructions for causing a computer device (e.g., a personal computer, a server, or a network device, etc.) to perform some or all of the steps of the methods described in embodiments of the present disclosure. The memory may include, but is not limited to, a USB disk, a flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
In other implementation scenarios, the integrated unit may also be implemented in hardware, that is, as a specific hardware circuit, which may include digital circuits and/or analog circuits, etc. The physical implementation of the hardware structure of the circuit may include, but is not limited to, physical devices, which may include, but are not limited to, transistors, memristors, and the like. In view of this, the various devices described herein (e.g., computing devices or other processing devices) may be implemented by suitable hardware processors, such as CPUs, GPUs, FPGAs, DSPs, and ASICs. Further, the aforementioned storage unit or storage device may be any suitable storage medium (including a magnetic storage medium or a magneto-optical storage medium, etc.), and may be, for example, a resistive random access memory (RRAM), a dynamic random access memory (DRAM), a static random access memory (SRAM), an enhanced dynamic random access memory (EDRAM), a high bandwidth memory (HBM), a hybrid memory cube (HMC), a ROM, a RAM, or the like.
Embodiments of the present application also provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, the computer program enabling a computer to execute part or all of the steps of any one of the methods described in the above method embodiments, and the computer includes an electronic device.
Embodiments of the present application also provide a computer program product comprising a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps of any of the methods as described in the above method embodiments. The computer program product may be a software installation package, the computer comprising an electronic device.
While various embodiments of the present disclosure have been shown and described herein, it will be obvious to those skilled in the art that such embodiments are provided by way of example only. Numerous modifications, changes, and substitutions will occur to those skilled in the art without departing from the spirit and scope of the present disclosure. It should be understood that various alternatives to the embodiments of the disclosure described herein may be employed in practicing the disclosure. It is intended that the following claims define the scope of the disclosure and that equivalents or alternatives within the scope of these claims be covered thereby.

Claims (11)

1. A method of quantization operations, the method comprising:
acquiring data to be operated;
performing first quantization processing on the data to be operated to obtain specified data to be operated;
performing neural network operation on the specified data to be operated to obtain a first operation result;
and carrying out second quantization processing on the first operation result to obtain a second operation result.
2. The method of claim 1, wherein the data to be operated is at least one of:
neuron data, gradient data, or weight data.
3. The method according to claim 1 or 2, wherein the performing a first quantization process on the data to be operated to obtain specified data to be operated includes:
inputting part or all of the data to be operated into a quantization model to obtain the specified data to be operated, wherein the quantization model comprises at least one quantization network layer, and the quantization network layer comprises at least one of the following: a piecewise function, a quantization value function, or a preset-type network layer.
4. The method according to claim 3, wherein the inputting part or all of the data to be operated into a quantization model to obtain the specified data to be operated comprises:
determining target model configuration parameters corresponding to the data to be operated;
configuring the quantization model according to the target model configuration parameters to obtain a configured quantization model;
and inputting the data to be operated into a configured quantization model to obtain the specified data to be operated.
5. The method according to claim 3 or 4, wherein the preset-type network layer comprises at least one of the following network layers: a convolutional layer, a fully connected layer, a deconvolution layer, or a normalization layer.
6. The method of claim 4, wherein the target model configuration parameters comprise at least one of: a piecewise function type and its corresponding function adjustment parameters, a quantization value function and its corresponding function adjustment parameters, the number of quantization intervals, and configuration parameters of the preset-type network layer.
7. A quantization operation apparatus, characterized in that the apparatus comprises:
the acquisition unit is used for acquiring data to be operated;
the first quantization unit is used for performing first quantization processing on the data to be operated to obtain specified data to be operated;
the operation unit is used for carrying out neural network operation on the specified data to be operated to obtain a first operation result;
and the second quantization unit is used for performing second quantization processing on the first operation result to obtain a second operation result.
8. A neural network chip for performing the method of any one of claims 1-6.
9. A board card, wherein the board card comprises: a storage device, an interface device, a control device, and the neural network chip of claim 8;
wherein the neural network chip is connected to the storage device, the control device, and the interface device, respectively;
the storage device is used for storing data;
the interface device is used for realizing data transmission between the chip and external equipment;
and the control device is used for monitoring the state of the chip.
10. An electronic device, characterized in that the electronic device is configured to perform the method according to any one of claims 1 to 6, or the electronic device comprises the neural network chip according to claim 8, or the electronic device comprises the board according to claim 9.
11. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to perform the method according to any one of claims 1-6.
CN202010769615.6A 2020-08-03 2020-08-03 Quantitative operation method and related product Pending CN111967588A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010769615.6A CN111967588A (en) 2020-08-03 2020-08-03 Quantitative operation method and related product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010769615.6A CN111967588A (en) 2020-08-03 2020-08-03 Quantitative operation method and related product

Publications (1)

Publication Number Publication Date
CN111967588A true CN111967588A (en) 2020-11-20

Family

ID=73363714

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010769615.6A Pending CN111967588A (en) 2020-08-03 2020-08-03 Quantitative operation method and related product

Country Status (1)

Country Link
CN (1) CN111967588A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113238976A (en) * 2021-06-08 2021-08-10 中科寒武纪科技股份有限公司 Cache controller, integrated circuit device and board card

Similar Documents

Publication Publication Date Title
CN109993296B (en) Quantitative implementation method and related product
CN109101273B (en) Neural network processing device and method for executing vector maximum value instruction
JP6761134B2 (en) Processor controllers, methods and devices
CN111488976B (en) Neural network computing device, neural network computing method and related products
CN111488963B (en) Neural network computing device and method
CN111967588A (en) Quantitative operation method and related product
CN113918221A (en) Operation module, flow optimization method and related product
CN112084023A (en) Data parallel processing method, electronic equipment and computer readable storage medium
CN114692824A (en) Quantitative training method, device and equipment of neural network model
CN115549854A (en) Cyclic redundancy check method, cyclic redundancy check device, storage medium and electronic device
CN112948001A (en) Method for setting tensor hardware configuration, readable storage medium and device
CN113238976A (en) Cache controller, integrated circuit device and board card
CN112817898A (en) Data transmission method, processor, chip and electronic equipment
CN113238975A (en) Memory, integrated circuit and board card for optimizing parameters of deep neural network
CN112801276A (en) Data processing method, processor and electronic equipment
CN112232498B (en) Data processing device, integrated circuit chip, electronic equipment, board card and method
WO2022001438A1 (en) Computing apparatus, integrated circuit chip, board card, device and computing method
CN114692864A (en) Quantization method, quantization device, storage medium, and electronic apparatus
CN114692825A (en) Quantitative training method, device and equipment of neural network model
CN113918222A (en) Assembly line control method, operation module and related product
CN113918220A (en) Assembly line control method, operation module and related product
CN111368985B (en) Neural network computing device and method
CN114444677A (en) Device, board card and method for sparse training and readable storage medium
CN113469333A (en) Artificial intelligence processor, method and related product for executing neural network model
WO2019165939A1 (en) Computing device, and related product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination