CN115037608A - Quantization method, device, equipment and readable storage medium - Google Patents
Quantization method, device, equipment and readable storage medium
- Publication number
- CN115037608A (application number CN202110240917.9A)
- Authority
- CN
- China
- Prior art keywords
- quantization
- parameter
- module
- neural network
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0893—Assignment of logical groups to network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0894—Policy-based network configuration management
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
Abstract
The application discloses a quantization method, apparatus, device, and readable storage medium. The method includes the following steps: determining a quantization strategy, a quantization level, and/or a quantization configuration parameter of a first module of a first communication device, where the first module is an AI module; and quantizing the parameters of the first module according to the quantization strategy, the quantization level, and/or the quantization configuration parameter. In the embodiments of the application, quantizing the AI module through a quantization strategy, a quantization level, and/or a quantization configuration parameter reduces the complexity of the AI module and improves system performance.
Description
Technical Field
The application belongs to the technical field of communication, and particularly relates to a quantization method, apparatus, device, and readable storage medium for an Artificial Intelligence (AI) module.
Background
Artificial intelligence is currently in wide use in a variety of fields. In a communication network, artificial intelligence may be implemented through an AI module. However, there is currently no procedure for quantizing the AI module, which leaves the complexity of the AI module high.
Disclosure of Invention
Embodiments of the present application provide a quantization method, apparatus, device, and readable storage medium, which address the problem of how to reduce the complexity of an AI module.
In a first aspect, a quantization method is provided, which is performed by a first communication device, and includes:
determining a quantization strategy, a quantization level and/or a quantization configuration parameter of a first module of the first communication device, wherein the first module is an Artificial Intelligence (AI) module;
and carrying out quantization processing on the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
In a second aspect, an apparatus for quantization is provided, which is applied to a first communication device, and includes:
a first determining module, configured to determine a quantization policy, a quantization level, and/or a quantization configuration parameter of a first module of the first communication device, where the first module is an AI module;
and the quantization module is used for performing quantization processing on the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
In a third aspect, a communication device is provided, comprising: a processor, a memory and a program stored on the memory and executable on the processor, which program, when executed by the processor, carries out the steps of the method according to the first aspect.
In a fourth aspect, a readable storage medium is provided, on which a program or instructions are stored, which when executed by a processor, implement the steps of the method according to the first aspect.
In a fifth aspect, a program product is provided, which is stored on a non-volatile storage medium, which program product is executable by at least one processor to implement the steps of the method according to the first aspect.
In a sixth aspect, a chip is provided, the chip comprising a processor and a communication interface, the communication interface being coupled to the processor, the processor being configured to execute a program or instructions to implement the method according to the first aspect.
In the embodiment of the application, the AI module is quantized through a quantization strategy, a quantization level and/or a quantization configuration parameter, so that the complexity of the AI module can be reduced, and the system performance is improved.
Drawings
Fig. 1 is a schematic diagram of a wireless communication system to which embodiments of the present application are applicable;
FIG. 2 is a flow chart of a method of quantization provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of an apparatus for quantization provided by an embodiment of the present application;
fig. 4 is a schematic diagram of a terminal according to an embodiment of the present application;
fig. 5 is a schematic diagram of a network-side device according to an embodiment of the present application.
Detailed Description
To facilitate understanding of the embodiments of the present application, the following concept is first introduced: artificial intelligence.
Artificial intelligence is currently in wide use in a variety of fields. The AI module for implementing artificial intelligence can be implemented in various ways, such as neural networks, decision trees, support vector machines, bayesian classifiers, and the like.
Taking a neural network as an example, the parameters of a neural network are optimized by an optimization algorithm. An optimization algorithm is a class of algorithms that minimizes or maximizes an objective function (sometimes called a loss function), where the objective function is typically a mathematical combination of model parameters and data. For example, given data X and its corresponding labels Y, a neural network model f() with weights W and biases b is constructed; with this model, a prediction f(X) can be obtained from the input X, and the difference between the predicted value and the true value, (f(X) - Y), defines a loss function. The objective is to find W and b that minimize the value of this loss function: the smaller the loss, the closer the model is to the real situation.
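The loss computation described here can be sketched as follows; the one-weight linear model and the mean-squared-error form are assumptions chosen to keep the example small, not the patent's formulation.

```python
def predict(W, b, x):
    # Prediction f(x) of a minimal model with weight W and bias b.
    return W * x + b

def mse_loss(W, b, xs, ys):
    # Loss: mean of the squared differences between f(X) and the labels Y.
    return sum((predict(W, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Training amounts to searching for the W and b that make this loss value as small as possible.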
Current common optimization algorithms are basically based on the error Back Propagation (BP) algorithm. The basic idea of the BP algorithm is that the learning process consists of two phases: forward propagation of signals and backward propagation of errors. In forward propagation, an input sample is passed in from the input layer, processed layer by layer through the hidden layers, and passed on to the output layer. If the actual output of the output layer does not match the expected output, the process turns to the error back-propagation phase. In error back-propagation, the output error is propagated backward in some form, layer by layer through the hidden layers to the input layer, and the error is apportioned to all units of each layer, thereby obtaining an error signal for each unit; this error signal serves as the basis for correcting each unit's weights. This cycle of forward signal propagation and backward error propagation, with weight adjustment at each layer, is repeated; the continual adjustment of the weights is precisely the network's learning and training process. The process continues until the error of the network output falls to an acceptable level or a predetermined number of learning cycles has been reached.
Common optimization algorithms include Stochastic Gradient Descent (SGD), mini-batch gradient descent, the momentum method (Momentum), Nesterov accelerated gradient (stochastic gradient descent with momentum, named after its inventor), adaptive gradient methods (AdaGrad, AdaDelta), Root Mean Square propagation (RMSProp), ADAptive Momentum estimation (Adam), and the like.
When errors are back-propagated, these optimization algorithms take the error/loss produced by the loss function, solve the derivatives/partial derivatives of the current neurons to obtain gradients, combine them with influences such as the learning rate and the previous gradients/derivatives/partial derivatives, and pass the resulting gradients on to the previous layer.
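As a minimal illustration of the weight-update loop that back-propagation drives (a single weight, a squared-error loss, and plain gradient descent are assumptions for the example):

```python
def train_linear(xs, ys, lr=0.05, epochs=200):
    # Fit y ~= w*x by gradient descent: the derivative of the squared
    # error (w*x - y)**2 with respect to w is 2*(w*x - y)*x, and each
    # round subtracts learning_rate * gradient from the weight.
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w
```

Each epoch plays the role of one forward-propagation/error-back-propagation cycle described above, with the weight adjusted from the gradient scaled by the learning rate.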
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that terms used in this way are interchangeable under appropriate circumstances, so that the embodiments of the application can be implemented in orders other than those illustrated or described herein. Objects distinguished by "first" and "second" are usually of one class, and the number of objects is not limited; for example, there may be one or more first objects. In addition, in the specification and claims, "and/or" represents at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
It is noted that the techniques described in the embodiments of the present application are not limited to Long Term Evolution (LTE)/LTE-Advanced (LTE-A) systems, but may also be used in other wireless communication systems, such as Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal Frequency Division Multiple Access (OFDMA), Single-Carrier Frequency-Division Multiple Access (SC-FDMA), and other systems. The terms "system" and "network" are often used interchangeably in the embodiments of the present application, and the described techniques may be used both for the above-mentioned systems and radio technologies and for other systems and radio technologies. However, the following description describes a New Radio (NR) system for purposes of example, and NR terminology is used in much of the description below; the techniques may also be applied to systems other than NR, such as a 6th Generation (6G) communication system.
Fig. 1 shows a block diagram of a wireless communication system to which embodiments of the present application are applicable. The wireless communication system includes a terminal 11 and a network-side device 12. The terminal 11 may also be called a terminal device or User Equipment (UE), and may be a mobile phone, a tablet personal computer, a laptop computer (also called a notebook computer), a Personal Digital Assistant (PDA), a palmtop computer, a netbook, an Ultra-Mobile Personal Computer (UMPC), a Mobile Internet Device (MID), a wearable device, a Vehicle User Equipment (VUE), a Pedestrian User Equipment (PUE), or another terminal-side device; wearable devices include bracelets, earphones, glasses, and the like. It should be noted that the embodiments of the present application do not limit the specific type of the terminal 11. The network-side device 12 may be a base station or a core-network device, where the base station may be referred to as a Node B, an evolved Node B (eNB), an access point, a Base Transceiver Station (BTS), a radio base station, a radio transceiver, a Basic Service Set (BSS), an Extended Service Set (ESS), a home Node B, a home evolved Node B, a WLAN access point, a WiFi node, a Transmission Reception Point (TRP), or by some other suitable term in the field; as long as the same technical effect is achieved, the base station is not limited to a specific technical vocabulary. It should be noted that the embodiments of the present application take the base station in an NR system as an example only and do not limit the specific type of base station.
A quantization method, apparatus, device and readable storage medium provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 2, an embodiment of the present application provides a quantization method, where an execution subject of the quantization method may be a first communication device, and the quantization method includes:
step 201: determining a quantization strategy, a quantization level and/or a quantization configuration parameter of a first module of the first communication device, wherein the first module is an AI module;
step 202: and carrying out quantization processing on the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
The quantization strategy may also be referred to as a quantization method, which is a method for quantizing the parameters of the AI module.
The quantization level may represent the precision of the AI module's parameter quantization: the higher the quantization level, the more accurate the quantized parameters of the AI module and the closer they are to the original parameters; the lower the quantization level, the coarser the quantized parameters and the further they are from the original parameters. For example, if quantization levels are expressed in bits, a quantization level of X bits means that each parameter of the AI module is quantized to X bits, so the larger the value of X, the more bits each parameter of the AI module occupies, where X is a positive integer. The single-precision type (float) commonly used in computers today occupies 32 bits and the double-precision type (double) occupies 64 bits; these can be regarded as very high-precision quantization.
The quantization configuration parameter is used to indicate a configuration for quantizing the AI module, for example, the quantization configuration parameter includes one or more of the following items: what quantization strategy the AI module adopts and how the details of the quantization strategy are configured, whether all parameters of the AI module use a uniform quantization level, whether the quantization level of the multiplicative coefficient of the AI module is the same as the quantization level of the additive coefficient, what the quantization level of the AI module is, how many bits the parameters of the AI module quantize, and the like.
For example, the quantization level is configured to be 8 bits, the quantization strategy is a direct quantization method, all parameters of the AI module are quantized from floating point numbers to 8 bits, and assuming that the AI module is a neural network, multiplicative coefficients and additive coefficients of all neurons are quantized to 8 bits.
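The 8-bit configuration described above can be sketched as follows; the uniform min/max scaling and the function name are illustrative assumptions, not the patent's prescribed procedure.

```python
def quantize_to_bits(params, bits=8):
    # Map each float to an integer code of `bits` bits by scaling the
    # [min, max] range of the parameters onto [0, 2**bits - 1].
    lo, hi = min(params), max(params)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((p - lo) / scale) for p in params]
    # De-quantization recovers an approximation of each original value.
    approx = [lo + c * scale for c in codes]
    return codes, approx
```

After quantization, each coefficient occupies 8 bits instead of the 32 bits of a single-precision float.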
In one embodiment of the present application, the quantization strategy may include one or more of:
(1) a direct quantization method;
the direct quantization method is to quantize each parameter of the AI module directly according to the quantization level and/or the quantization configuration parameter.
(2) Uniform quantization method;
the uniform quantization method is a quantization method in which parameters of the AI module (for example, a value-taking range of input parameters) are divided at equal intervals.
(3) A non-uniform quantization method;
the non-uniform quantization method is a quantization method in which quantization intervals are not equal in the dynamic range of parameters (e.g., input parameters) of the AI module.
For example, the quantization intervals/quantization levels of different input intervals are determined according to the input probability density, probability distribution, cumulative probability distribution, etc. For example, for an interval with a small input value, the quantization interval is also small; conversely, for an interval with a large input value, the quantization interval is large.
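One way to realize the probability-based interval sizing described above is to place quantization levels at equally spaced quantiles of the observed inputs; this quantile rule is an illustrative assumption, not the patent's specific method.

```python
def nonuniform_levels(samples, num_levels):
    # Dense regions of the sample distribution (high probability) get
    # closely spaced levels; sparse regions get widely spaced ones.
    s = sorted(samples)
    step = len(s) / num_levels
    return [s[min(len(s) - 1, int(i * step))] for i in range(num_levels)]
```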
(4) Weight sharing quantization method;
(5) a block quantization method;
in the weight-sharing quantization method and the grouping quantization method, parameters of the AI module may be divided into a plurality of sets, and elements in each set share one value.
(6) Transform domain quantization;
transform-domain quantization refers to transforming parameters (such as weights, offsets, convolution kernels, etc.) of an AI module into another domain, such as a frequency domain, an S domain, a Z domain, etc., performing quantization operation in another domain, and then performing inverse transformation.
Illustratively, the network convolution kernel is first transformed into the frequency domain, then randomly hashed in the frequency domain, and a lower number of hash bits is used for the less important high frequency portions to achieve higher compression.
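A rough sketch of the transform-quantize-invert flow, using a naive DCT as the frequency-domain transform and a fixed step size as the quantizer (both are illustrative choices, not the patent's):

```python
import math

def dct(x):
    # Naive DCT-II: transform parameters into the frequency domain.
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n)) for k in range(n)]

def idct(X):
    # Naive inverse transform (DCT-III with the standard scaling).
    n = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                            for k in range(1, n))) * 2 / n for i in range(n)]

def transform_quantize(params, step=0.5):
    # Quantize coarsely in the transform domain, then invert: the
    # "transform, quantize, inverse-transform" flow described above.
    coeffs = dct(params)
    q = [round(c / step) * step for c in coeffs]
    return idct(q)
```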
(7) A parametric coding quantization method;
the parameter coding quantization method is to code parameters of the AI module, and the coding method includes but is not limited to: lossy coding, lossless coding (e.g., huffman coding), and the like.
(8) Product quantization method.
The product quantization method is to divide the network weight into a plurality of subspaces and perform quantization operation on each subspace, for example, performing weight sharing quantization method on each subspace.
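A minimal sketch of product quantization over consecutive subspaces; the nearest-codeword rule and the codebook layout are assumptions for the example.

```python
def product_quantize(weights, sub_size, codebooks):
    # Split `weights` into consecutive subvectors of length `sub_size`
    # and replace each subvector by the nearest codeword (smallest
    # squared distance) in that subspace's codebook.
    out = []
    for j in range(0, len(weights), sub_size):
        sub = weights[j:j + sub_size]
        book = codebooks[j // sub_size]
        best = min(book, key=lambda c: sum((a - b) ** 2 for a, b in zip(sub, c)))
        out.extend(best)
    return out
```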
Alternatively, the above-mentioned quantization strategies may be cascaded or combined. Illustratively, the quantization strategy includes: the method comprises the steps of firstly carrying out uniform quantization on a network through the uniform quantization method, then carrying out quantization on the uniformly quantized weight by using the weight sharing quantization method, and then carrying out quantization on the weight according to the parameter coding quantization method.
In an embodiment of the present application, the step of quantizing the parameter of the first module includes:
in the network training stage, quantizing the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameters.
For example, a common gradient calculation method is used to obtain the gradient corresponding to each weight; the weights are grouped according to the existing grouping; the gradient values of the weights in the same group are accumulated to obtain the update amount of that cluster center for this round of training; and the product of the update amount and the learning rate is subtracted from the cluster center value to obtain the updated cluster center for the current round of training.
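The grouped update described here can be sketched as follows (the function name and dictionary layout are illustrative):

```python
def update_cluster_centers(centers, groups, grads, lr):
    # Accumulate the gradients of all weights in each group, then take
    # one gradient step per cluster centre:
    #   new_centre = centre - learning_rate * accumulated_gradient
    acc = {g: 0.0 for g in set(groups)}
    for g, grad in zip(groups, grads):
        acc[g] += grad
    return {g: centers[g] - lr * acc[g] for g in acc}
```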
In one embodiment of the present application, the parameter dividing method in the grouping quantization method includes:
(1) a random division mode;
in the random division manner, the parameters of the AI modules may be grouped in a random manner.
(2) Determining a set identifier where the parameter is located according to the identifier of the parameter;
the above-described mode (2) may also be referred to as a direct addressing method. For example, the parameters of the AI modules are sorted, the IDs of the respective parameters are determined, then the parameter IDs are input into a linear function, an N-order function, or other common functions to obtain a new value X, and the set ID of the network parameter is obtained through X. Wherein the linear function comprises a function having an output equal to an input.
In an embodiment of the present application, the determining, according to the identifier of the parameter, the identifier of the set where the parameter is located includes:
obtaining a first numerical value according to the identifier of the parameter;
determining a set identifier where the parameter is located according to the first numerical value;
wherein, according to the first value, determining the set identifier where the parameter is located includes one or more of:
(a) rounding the first numerical value to obtain a set identifier where the parameter is located;
(b) at least one bit is taken from the first numerical value and combined into a set identifier of the parameter;
(c) and dividing the first numerical value by a preset value, and using the obtained remainder as the set identifier of the parameter.
Optionally, the identifier of the parameter is input into a linear function or another common mathematical function to obtain the first value (X). Common mathematical functions include addition, subtraction, multiplication, division, the Nth power, the Nth root, logarithms, derivatives, partial derivatives, and other combinations of common mathematical operations, where N is any number; for example, N may be positive, negative, or 0, and real or complex.
Optionally, the obtaining, by X, a set ID where the network parameter is located includes:
a) X is rounded to obtain the set ID. Rounding includes rounding up, rounding down, and the like. For example, if X is 3.23, the set ID may be 3 or 4, where 3 corresponds to rounding down (truncation) and 4 corresponds to rounding up.
b) X takes at least one bit and combines into a set ID.
For example, if X is 3215217, taking the 2nd and 4th digits from the front gives a set ID of 25, while taking the 1st and 3rd digits from the back gives a set ID of 72 or 27.
For example, if X is 872351.1237, taking the 1st and 2nd digits after the decimal point gives a set ID of 12 or 21; taking the 1st and 2nd digits before the decimal point gives 51 or 15; taking the 2nd digit before the decimal point and the 3rd digit after it gives 53 or 35.
Further, by way of example:
(i) at least two bits are taken, and the values on the bits are arranged according to a certain rule to form a set ID.
For example, by digit position from front to back, or from back to front, or by numerical value from large to small, or from small to large. For example, if X is 67429815 and the 1st, 3rd and 5th digits from the back are taken, the values at these positions are 5, 8 and 2; arranged by digit position from front to back, the set ID is 285; from back to front, the set ID is 582; by numerical value from large to small, the set ID is 852; from small to large, the set ID is 258.
(ii) If a bit is not present, the value of the bit is 0, or some other default value.
For example, if X is 52 and the 1st, 3rd and 5th digits from the front are taken, the values at the corresponding positions are 5, 0 and 0.
c) And dividing X by a certain number to obtain the remainder.
For example, if X is 752 and the preset number is 11, the set ID is 752 mod 11 = 4.
d) The set IDs are randomly divided according to X.
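Methods a), b) and c) above can be sketched as follows (the function names are illustrative; method b) is shown with digit positions counted from the front):

```python
def set_id_round(x):
    # Method a): round X down to obtain the set ID.
    return int(x)

def set_id_digits(x, positions):
    # Method b): take the digits of X at the given 1-indexed positions
    # (counted from the front) and concatenate them into the set ID.
    s = str(x)
    return int("".join(s[p - 1] for p in positions))

def set_id_mod(x, m):
    # Method c): divide X by a preset value and keep the remainder.
    return x % m
```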
(3) Clustering division mode.
In the clustering division mode, the parameters of the AI module are grouped according to cluster centers.
For example, to divide the data into K groups, K objects are randomly selected as initial cluster centers; then the distance between each object and each cluster center is calculated, and each object is assigned to the nearest cluster center. A cluster center and the objects assigned to it represent one cluster. After the objects are assigned, the cluster center of each cluster is recalculated from the objects currently in that cluster. This process repeats until some termination condition is met, such as: no objects (or only a minimum number) are reassigned to different clusters, no cluster centers (or only a minimum number) change again, or the sum of squared errors reaches a local minimum.
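A minimal 1-D sketch of the clustering procedure just described, with a fixed iteration count standing in for the termination conditions (initialization from distinct sorted values is an assumption; the text describes random selection):

```python
def kmeans_1d(values, k, iters=20):
    # Assign each value to the nearest centre, recompute each centre as
    # the mean of its assigned members, and repeat.
    centers = sorted(set(values))[:k]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)
```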
In one embodiment of the present application, the quantization strategy and/or quantization configuration parameter is determined according to one or more of the following:
(1) reporting by a terminal;
that is, the network side may obtain the quantization strategy and/or the quantization configuration parameter according to the manner reported by the terminal.
(2) The capabilities of the terminal;
that is, the quantization strategy and/or quantization configuration parameters may be taken as capabilities of the terminal.
(3) And (5) network side configuration.
That is, the terminal side may obtain the quantization strategy and/or the quantization configuration parameter according to the configuration of the network side.
For example, the network side performs configuration, activation, or triggering through Radio Resource Control (RRC) signaling, a Medium Access Control Control Element (MAC CE), or Downlink Control Information (DCI).
In an embodiment of the present application, the quantization strategy is a direct quantization method, and the step of performing quantization processing on the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter includes:
and carrying out quantization processing on the parameter of the first module according to the quantization grade and/or the quantization configuration parameter of the first module.
In one embodiment of the present application, the quantization level is determined according to one or more of:
(1) information relating to a parameter of the first module;
optionally, the information related to the parameter of the first module includes: the size of the parameter;
for example, the larger the parameter, the higher the quantization level; alternatively, the larger the parameter, the lower the quantization level.
For another example, the smaller the parameter, the lower the quantization level, or the smaller the parameter, the higher the quantization level.
That is, different quantization levels may be determined according to the size of the parameters of the AI module. For example, the larger the parameters of the AI module, the finer the quantization, and the smaller the parameters of the AI module, the coarser the quantization; or, the larger the parameters of the AI module, the coarser the quantization, and the smaller the parameters of the AI module, the finer the quantization.
(2) Reporting by a terminal;
That is, the network side can obtain the quantization level from information reported by the terminal.
(3) The capabilities of the terminal;
That is, the quantization level may be treated as a capability of the terminal.
(4) Network side configuration;
(5) an output accuracy requirement of the first module;
for example, the higher the requirement of the output accuracy of the AI module, the higher the quantization level.
(6) Performance requirements of the first module.
For example, the performance requirements of the AI modules are divided into multiple levels, with different levels of performance requirements corresponding to different quantization levels.
In one embodiment of the present application, the higher the quantization level is, the more precise the parameter quantization of the first module is, or the lower the quantization level is, the coarser the parameter quantization of the first module is.
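The size-based determination in item (1), combined with the finer/coarser convention above, can be sketched as a tiered lookup. The tier boundaries (1e3, 1e5 parameters) and bit-widths are illustrative assumptions, not values from the text:

```python
def quantization_level(num_params, larger_is_finer=True):
    """Pick a quantization level (modeled as a bit-width) from the size
    of the AI module's parameters. Thresholds and widths are assumed.
    larger_is_finer=True: larger module -> higher (finer) level;
    larger_is_finer=False: larger module -> lower (coarser) level."""
    tiers = [(1_000, 4), (100_000, 8)]  # (upper bound on size, bit-width)
    level = 16
    for bound, bits in tiers:
        if num_params <= bound:
            level = bits
            break
    if not larger_is_finer:  # invert the mapping for the opposite rule
        level = {4: 16, 8: 8, 16: 4}[level]
    return level
```

Both conventions named in the text (larger parameters, finer quantization; or larger parameters, coarser quantization) are selectable through the flag.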
In one embodiment of the present application, the first module is of the type of a neural network;
wherein the quantization levels of neurons of different layers in the neural network are the same;
and/or,
the quantization levels of the neurons of the same layer in the neural network are the same;
and/or,
the quantization levels of multiplicative coefficients in the neural network are the same as the quantization levels of additive coefficients.
In one embodiment of the present application, the first module is of the type of a neural network;
wherein the quantization levels of neurons of different layers in the neural network are different;
and/or,
the quantization levels of the neurons of the same layer in the neural network are different;
and/or,
the quantization levels of multiplicative coefficients in the neural network are different from the quantization levels of additive coefficients.
In one embodiment of the present application, the first module is of a type of Recurrent Neural Network (RNN);
wherein the quantization level of the parameters (such as multiplicative coefficients and additive coefficients) of the memory units in the recurrent neural network is the same as the quantization level of the parameters of the non-memory neurons (including neurons and non-neuron units) in the recurrent neural network, or the same as the quantization level of the non-memory parameters of the neurons in the recurrent neural network,
or,
the quantization level of the parameter of the memory unit in the recurrent neural network is different from the quantization level of the parameter of the non-memory neuron in the recurrent neural network or different from the quantization level of the non-memory parameter of the neuron in the recurrent neural network.
In one embodiment of the present application, the type of the first module is Convolutional Neural Networks (CNN);
wherein,
the quantization level of the parameter of the convolution kernel of the convolutional neural network is the same as or different from the quantization level of the parameter of the non-convolution kernel in the convolutional neural network,
or,
the quantization level of pooled parameters (multiplicative coefficients, additive coefficients) of the convolutional neural network is the same as or different from the quantization level of non-pooled parameters in the convolutional neural network.
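The RNN and CNN embodiments above amount to looking up a quantization level by the role a parameter plays in the network. A minimal sketch; the role names, the bit-width encoding of levels, and the example widths are all illustrative assumptions:

```python
def bits_for(role, config):
    """Return the quantization bit-width for a parameter, keyed by its
    role in the network; roles not listed fall back to the default."""
    return config.get(role, config["default"])

# RNN embodiment: memory-unit parameters quantized differently (here,
# more finely) than non-memory parameters.
rnn_config = {"memory_cell": 16, "default": 8}

# CNN embodiment: convolution-kernel and pooled parameters quantized
# differently (here, more coarsely) than the remaining parameters.
cnn_config = {"conv_kernel": 8, "pooling": 8, "default": 16}
```

Setting every role to the same width recovers the "same quantization level" variants of these embodiments.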
In one embodiment of the present application, the input or output of the first module is first information;
wherein the first information comprises one or more of:
(1) a reference signal;
The reference signal is used for signal processing, including signal detection, filtering, equalization, etc., and includes, for example, the Demodulation Reference Signal (DMRS), Sounding Reference Signal (SRS), Synchronization Signal Block (SSB), Tracking Reference Signal (TRS), Phase-Tracking Reference Signal (PTRS), Channel State Information Reference Signal (CSI-RS), etc.
(2) A signal carried by a channel;
the channel may include one or more of: a Physical Downlink Control Channel (PDCCH), a Physical Downlink Shared Channel (PDSCH), a Physical Uplink Control Channel (PUCCH), a Physical Uplink Shared Channel (PUSCH), a Physical Random Access Channel (PRACH), a Physical Broadcast Channel (PBCH), and the like.
(3) Channel state information;
optionally, the channel state information includes channel state information feedback information and/or channel state information of uplink and downlink partial reciprocity in a Frequency Division Duplex (FDD) system.
Wherein the channel state information feedback information comprises one or more of: channel related information, Channel matrix related information, Channel characteristic information, Channel matrix characteristic information, Precoding Matrix Indicator (PMI), Rank Indicator (RI), CSI-RS Resource Indicator (CRI), Channel Quality Indicator (CQI), Layer Indicator (LI), and the like.
For an FDD system, based on partial reciprocity, the base station acquires angle and time-delay information from the uplink channel. The angle information and the time-delay information can be notified to the UE through CSI-RS precoding or through direct indication, and the UE reports according to the indication of the base station, or selects and reports within the range indicated by the base station, thereby reducing the computation load of the UE and the overhead of CSI reporting.
(4) Beam information;
The beam information includes one or more of: beam quality, indication information of a beam (reference signal ID), beam failure indication information, and new beam indication information in beam failure recovery. This information is used for beam management, including beam measurement, beam reporting, beam prediction, beam failure detection, beam failure recovery, and new beam indication in beam failure recovery.
(5) Channel prediction information;
the channel prediction information includes: prediction of channel state information, beam prediction.
(6) Interference information;
the interference information includes one or more of: intra-cell interference information, inter-cell interference information, out-of-band interference information, inter-modulation interference information, and the like.
(7) Positioning information (alternatively referred to as trajectory information);
That is, the estimated specific position (including horizontal position and/or vertical position) of the UE or its possible future trajectory, or information to assist position estimation or trajectory estimation, obtained through reference signals (e.g., the Sounding Reference Signal (SRS)).
(8) Prediction information of high-level services and/or parameters;
(9) management information of high-level services and/or parameters;
For example, the prediction information or management information may include throughput, required packet size, traffic demand, movement speed, and/or noise information, among others.
(10) Control signaling.
Such as signaling related to power control and signaling related to beam management.
In one embodiment of the present application, in a case that the output of the first module is first information, the method further includes:
and sending the first information to a second communication device, or sending the first information to a second module of the first communication device.
Here, the first communication device is a terminal and the second communication device is a network side device; or the first communication device is a network side device and the second communication device is a terminal; or the first communication device is a first terminal and the second communication device is a second terminal; or the first communication device is a first network side device and the second communication device is a second network side device.
In the embodiment of the application, the AI module is quantized through a quantization strategy, a quantization level and/or a quantization configuration parameter, so that the complexity of the AI module can be reduced, and the system performance is improved.
Referring to fig. 3, an embodiment of the present application provides an apparatus for quantization, which is applied to a first communication device, where the apparatus 300 includes:
a first determining module 301, configured to determine a quantization policy, a quantization level, and/or a quantization configuration parameter of a first module of the first communication device, where the first module is an AI module;
a quantization module 302, configured to perform quantization processing on the parameter of the first module according to the quantization policy, the quantization level, and/or the quantization configuration parameter.
In one embodiment of the present application, the quantization strategy includes one or more of:
(1) a direct quantization method;
(2) a uniform quantization method;
(3) a non-uniform quantization method;
(4) a weight sharing quantization method;
(5) a block quantization method;
(6) transform domain quantization;
(7) a parametric coding quantization method;
(8) product quantization method.
In one embodiment of the present application, the quantization strategy includes a uniform quantization method, a weight sharing quantization method, and a parameter coding quantization method, where the network is first quantized uniformly by the uniform quantization method, the uniformly quantized weights are then quantized by the weight sharing quantization method, and parameter coding quantization is finally performed on the weights.
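The three-stage strategy above can be sketched as follows. This is a hedged illustration: minimal fixed-width index packing stands in for the parameter coding stage, which in practice might be an entropy coder such as Huffman coding.

```python
import math

def uniform_quantize(weights, bits):
    """Stage 1: uniform quantization of the network weights."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]

def weight_share(weights):
    """Stage 2: weight sharing -- equal values share one codebook entry;
    each weight becomes an index into the codebook."""
    codebook = sorted(set(weights))
    index = {w: i for i, w in enumerate(codebook)}
    return codebook, [index[w] for w in weights]

def encode_indices(indices, codebook_size):
    """Stage 3: parameter coding -- here, minimal fixed-width packing."""
    width = max(1, math.ceil(math.log2(codebook_size)))
    return "".join(format(i, f"0{width}b") for i in indices)

def quantize_pipeline(weights, bits):
    """Apply the three stages in the order described in the text."""
    shared = uniform_quantize(weights, bits)
    codebook, indices = weight_share(shared)
    return codebook, encode_indices(indices, len(codebook))
```

The decoder only needs the codebook and the bit string to reconstruct the quantized weights.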
In an embodiment of the present application, the quantization module 302 is further configured to: and in the network training stage, carrying out quantization processing on the parameters of the first module according to the quantization strategy, the quantization grade and/or the quantization configuration parameters.
In one embodiment of the present application, the parameter dividing method in the block quantization method includes:
(1) a random division mode;
(2) determining a set identifier where the parameter is located according to the identifier of the parameter;
(3) Clustering division mode.
In an embodiment of the present application, the determining, according to the identifier of the parameter, an identifier of a set where the parameter is located includes:
obtaining a first numerical value according to the identifier of the parameter;
determining a set identifier where the parameter is located according to the first numerical value;
according to the first numerical value, determining the set identifier where the parameter is located includes one or more of the following items:
(1) rounding the first numerical value to obtain a set identifier where the parameter is located;
(2) at least one bit is taken from the first numerical value and combined into a set identifier of the parameter;
(3) Dividing the first numerical value by a preset value, and taking the obtained remainder as the set identifier of the parameter.
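The three derivations of the set identifier can be sketched as below. One assumption is made explicit: the "first numerical value" is taken to be the parameter identifier itself (possibly scaled), since the text leaves its exact derivation open, and modes (2) and (3) assume it is an integer.

```python
def set_id_by_rounding(first_value):
    """Mode (1): round the first numerical value to get the set id."""
    return round(first_value)

def set_id_by_bits(first_value, bit_positions):
    """Mode (2): take selected bits of the (integer) first numerical
    value and combine them into the set identifier."""
    return sum(((first_value >> b) & 1) << i
               for i, b in enumerate(bit_positions))

def set_id_by_modulo(first_value, preset):
    """Mode (3): divide by a preset value and keep the remainder."""
    return first_value % preset
```

Mode (3) with a preset value of K is the common way to scatter parameter identifiers evenly over K sets.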
In one embodiment of the present application, the quantization strategy and/or quantization configuration parameter is determined according to one or more of the following:
(1) reporting by a terminal;
(2) the capabilities of the terminal;
(3) Network side configuration.
In an embodiment of the application, the quantization strategy is a direct quantization method, and the quantization module 302 is further configured to: and carrying out quantization processing on the parameter of the first module according to the quantization grade and/or the quantization configuration parameter of the first module.
In one embodiment of the present application, the quantization level is determined according to one or more of:
(1) information relating to a parameter of the first module;
(2) reporting by a terminal;
(3) the capabilities of the terminal;
(4) network side configuration;
(5) an output accuracy requirement of the first module;
(6) performance requirements of the first module.
In one embodiment of the present application, the information related to the parameter of the first module includes: the size of the parameter; wherein the larger the parameter, the higher the quantization level, and the smaller the parameter, the lower the quantization level; alternatively, the larger the parameter, the lower the quantization level, and the smaller the parameter, the higher the quantization level.
In one embodiment of the present application, the higher the quantization level is, the more precise the parameter quantization of the first module is, or the lower the quantization level is, the coarser the parameter quantization of the first module is.
In one embodiment of the present application, the first module is of the type of neural network;
wherein the quantization levels of neurons of different layers in the neural network are the same;
and/or,
the quantization levels of the neurons in the same layer in the neural network are the same;
and/or,
the quantization levels of multiplicative coefficients in the neural network are the same as the quantization levels of additive coefficients.
In one embodiment of the present application, the first module is of the type of neural network;
wherein the quantization levels of neurons of different layers in the neural network are different;
and/or,
the quantization levels of the neurons in the same layer in the neural network are different;
and/or,
the quantization levels of multiplicative coefficients in the neural network are different from the quantization levels of additive coefficients.
In one embodiment of the present application, the first module is of the type of recurrent neural network;
wherein the quantization levels of the parameters of the memory cells in the recurrent neural network are the same as the quantization levels of the parameters of the non-memory neurons in the recurrent neural network or the quantization levels of the non-memory parameters of the neurons of the recurrent neural network,
or,
the quantization level of the parameter of the memory unit in the recurrent neural network is different from the quantization level of the parameter of the non-memory neuron in the recurrent neural network or different from the quantization level of the non-memory parameter of the neuron in the recurrent neural network.
In one embodiment of the present application, the first module is of the type of convolutional neural network;
wherein,
the quantization level of the parameter of the convolution kernel of the convolutional neural network is the same as or different from the quantization level of the parameter of the non-convolution kernel in the convolutional neural network,
or,
the quantization level of pooled parameters of the convolutional neural network is the same as or different from the quantization level of non-pooled parameters in the convolutional neural network.
In one embodiment of the present application, the input or output of the first module is first information;
wherein the first information comprises one or more of:
(1) a reference signal;
(2) a signal carried by a channel;
(3) channel state information;
(4) beam information;
(5) channel prediction information;
(6) interference information;
(7) positioning information;
(8) prediction information of high-level services and/or parameters;
(9) management information of high-level services and/or parameters;
(10) Control signaling.
In one embodiment of the present application, in a case where the output of the first module is first information, the apparatus further includes:
and the sending module is used for sending the first information to a second communication device, or sending the first information to a second module of the first communication device.
In an embodiment of the present application, the first communication device is a terminal, and the second communication device is a network side device, or the first communication device is a network side device, and the second communication device is a terminal; or, the first communication device is a first terminal, and the second communication device is a second terminal; or, the first communication device is a first network side device, and the second communication device is a second network side device.
The device provided in the embodiment of the present application can implement each process implemented by the method embodiment shown in fig. 2, and achieve the same technical effect, and for avoiding repetition, details are not described here again.
Fig. 4 is a schematic diagram of a hardware structure of a terminal for implementing an embodiment of the present application, where the terminal 400 includes, but is not limited to: radio unit 401, network module 402, audio output unit 403, input unit 404, sensor 405, display unit 406, user input unit 407, interface unit 408, memory 409, and processor 410.
Those skilled in the art will appreciate that the terminal 400 may further include a power source (e.g., a battery) for supplying power to various components, and the power source may be logically connected to the processor 410 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The terminal structure shown in fig. 4 does not constitute a limitation of the terminal, and the terminal may include more or less components than those shown, or combine some components, or have a different arrangement of components, and will not be described again here.
It should be understood that in the embodiment of the present application, the input Unit 404 may include a Graphics Processing Unit (GPU) 4041 and a microphone 4042, and the Graphics processor 4041 processes image data of a still picture or a video obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 406 may include a display panel 4061, and the display panel 4061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 407 includes a touch panel 4071 and other input devices 4072. A touch panel 4071, also referred to as a touch screen. The touch panel 4071 may include two parts, a touch detection device and a touch controller. Other input devices 4072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.
In this embodiment, the radio frequency unit 401 receives downlink data from a network side device and then sends the downlink data to the processor 410 for processing; in addition, it sends uplink data to the network side device. Typically, the radio frequency unit 401 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
The memory 409 may be used to store software programs or instructions as well as various data. The memory 409 may mainly include a program or instruction storage area and a data storage area, where the program or instruction storage area may store an operating system, an application program or instructions required for at least one function (such as a sound playing function, an image playing function, etc.), and the like. In addition, the memory 409 may include a high-speed random access memory, and may further include a nonvolatile memory, where the nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), or a flash memory, for example at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The terminal provided in the embodiment of the present application can implement each process implemented in the method embodiment shown in fig. 2, and achieve the same technical effect, and is not described here again to avoid repetition.
The embodiment of the application also provides network side equipment. As shown in fig. 5, the network side device 500 includes: antenna 501, radio frequency device 502, baseband device 503. The antenna 501 is connected to a radio frequency device 502. In the uplink direction, the rf device 502 receives information through the antenna 501, and sends the received information to the baseband device 503 for processing. In the downlink direction, the baseband device 503 processes information to be transmitted and transmits the information to the rf device 502, and the rf device 502 processes the received information and transmits the processed information through the antenna 501.
The above baseband processing means may be located in the baseband device 503, and the method performed by the network side device in the above embodiments may be implemented in the baseband device 503, where the baseband device 503 includes a processor 504 and a memory 505.
The baseband device 503 may include, for example, at least one baseband board, on which a plurality of chips are disposed, as shown in fig. 5, where one of the chips, for example, the processor 504, is connected to the memory 505 and calls the program in the memory 505 to perform the network device operations shown in the above method embodiments.
The baseband device 503 may further include a network interface 506, such as a Common Public Radio Interface (CPRI), for exchanging information with the radio frequency device 502.
Specifically, the network side device in the embodiment of the present application further includes: the instructions or programs stored in the memory 505 and capable of being executed on the processor 504, and the processor 504 calls the instructions or programs in the memory 505 to execute the method executed by each module shown in fig. 3, and achieve the same technical effect, and are not described herein in detail to avoid repetition.
Embodiments of the present application further provide a program product, which is stored in a non-volatile storage medium and executed by at least one processor to implement the steps of the method shown in fig. 2.
An embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the method embodiment shown in fig. 2, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
Wherein, the processor is the processor in the terminal described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a network-side device program or an instruction, to implement each process of the method embodiment shown in fig. 2, and can achieve the same technical effect, and details are not repeated here to avoid repetition.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as a system-on-chip, a system-on-chip or a system-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (20)
1. A method of quantization, performed by a first communication device, comprising:
determining a quantization strategy, a quantization level and/or a quantization configuration parameter of a first module of the first communication device, the first module being an Artificial Intelligence (AI) module;
and carrying out quantization processing on the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
2. The method of claim 1, wherein the quantization strategy comprises one or more of:
a direct quantization method;
a uniform quantization method;
a non-uniform quantization method;
weight sharing quantization method;
a block quantization method;
transform domain quantization;
a parametric coding quantization method;
and a product quantization method.
3. The method according to claim 1, wherein the step of quantizing the parameter of the first module according to the quantization strategy, quantization level and/or quantization configuration parameter comprises:
and in the network training stage, carrying out quantization processing on the parameters of the first module according to the quantization strategy, the quantization grade and/or the quantization configuration parameters.
4. The method of claim 2, wherein the dividing of the parameters in the block quantization method comprises:
a random division mode;
determining a set identifier where the parameter is located according to the identifier of the parameter;
and a clustering division mode.
5. The method according to claim 4, wherein the determining, according to the identifier of the parameter, the identifier of the set in which the parameter is located includes:
obtaining a first numerical value according to the identifier of the parameter;
determining a set identifier where the parameter is located according to the first numerical value;
wherein, according to the first value, determining the set identifier where the parameter is located includes one or more of:
rounding the first numerical value to obtain a set identifier where the parameter is located;
at least one bit is taken from the first numerical value and combined into a set identifier of the parameter;
and dividing the first numerical value by a preset value and taking the obtained remainder as the set identifier of the parameter.
6. The method of claim 1, wherein the quantization strategy and/or quantization configuration parameters are determined according to one or more of the following:
reporting by a terminal;
the capabilities of the terminal;
and network side configuration.
7. The method according to claim 2, wherein the quantization strategy is a direct quantization method, and the step of quantizing the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter comprises:
and carrying out quantization processing on the parameter of the first module according to the quantization grade and/or the quantization configuration parameter of the first module.
8. The method of claim 1, wherein the quantization level is determined based on one or more of:
information relating to a parameter of the first module;
reporting by a terminal;
the capabilities of the terminal;
network side configuration;
an output accuracy requirement of the first module;
performance requirements of the first module.
9. The method of claim 8, wherein the information related to the parameter of the first module comprises: the size of the parameter;
wherein the larger the parameter, the higher the quantization level; alternatively, the larger the parameter, the lower the quantization level.
10. The method of claim 1, wherein the higher the quantization level, the more precise the parameter quantization of the first module, or wherein the lower the quantization level, the coarser the parameter quantization of the first module.
11. The method of claim 1, wherein the first module is of a type of neural network;
wherein the quantization levels of neurons of different layers in the neural network are the same;
and/or,
the quantization levels of the neurons of the same layer in the neural network are the same;
and/or,
the quantization levels of multiplicative coefficients in the neural network are the same as the quantization levels of additive coefficients.
12. The method of claim 1, wherein the type of the first module is a neural network;
wherein the quantization levels of neurons of different layers in the neural network are different;
and/or,
the quantization levels of neurons of the same layer in the neural network are different;
and/or,
the quantization levels of multiplicative coefficients and additive coefficients in the neural network are different.
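Claims 11 and 12 together allow the quantization level to vary, or not, per layer and per coefficient type. One possible configuration under claim 12 can be sketched as follows; the two-layer network, the chosen bit widths, and the weight/bias naming (weights as multiplicative coefficients, biases as additive coefficients) are all hypothetical.

```python
def quantize(x, bits, lo=-1.0, hi=1.0):
    """Uniformly quantize one value onto 2**bits grid points in [lo, hi]."""
    n = 2 ** bits - 1
    clipped = min(max(x, lo), hi)
    return lo + round((clipped - lo) / (hi - lo) * n) / n * (hi - lo)

# Hypothetical two-layer network: weights are the multiplicative
# coefficients, biases the additive coefficients.
layers = [
    {"weights": [0.2, -0.6], "biases": [0.1]},
    {"weights": [0.7, 0.7],  "biases": [-0.3]},
]

# A different level per layer, and a different level for multiplicative
# vs. additive coefficients (illustrative choices only).
weight_bits = [8, 4]   # per-layer levels for multiplicative coefficients
bias_bits   = [6, 3]   # per-layer levels for additive coefficients

for layer, wb, bb in zip(layers, weight_bits, bias_bits):
    layer["weights"] = [quantize(w, wb) for w in layer["weights"]]
    layer["biases"]  = [quantize(b, bb) for b in layer["biases"]]

print(layers)
```

Setting all entries of `weight_bits` and `bias_bits` equal instead recovers the uniform configuration of claim 11.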
13. The method of claim 1, wherein the type of the first module is a recurrent neural network;
wherein the quantization level of the parameters of the memory cells in the recurrent neural network is the same as the quantization level of the parameters of the non-memory neurons in the recurrent neural network, or the same as the quantization level of the non-memory parameters of the neurons of the recurrent neural network;
or,
the quantization level of the parameters of the memory cells in the recurrent neural network is different from the quantization level of the parameters of the non-memory neurons in the recurrent neural network, or different from the quantization level of the non-memory parameters of the neurons of the recurrent neural network.
14. The method of claim 1, wherein the type of the first module is a convolutional neural network;
wherein,
the quantization level of the parameters of the convolution kernels of the convolutional neural network is the same as or different from the quantization level of the non-convolution-kernel parameters in the convolutional neural network;
or,
the quantization level of the pooling parameters of the convolutional neural network is the same as or different from the quantization level of the non-pooling parameters in the convolutional neural network.
15. The method of claim 1, wherein the input or output of the first module is first information;
wherein the first information comprises one or more of:
a reference signal;
a signal carried by a channel;
channel state information;
beam information;
channel prediction information;
interference information;
positioning information;
prediction information of a higher-layer service and/or parameter;
management information of a higher-layer service and/or parameter;
control signaling.
16. The method of claim 15, wherein in the case that the output of the first module is first information, the method further comprises:
sending the first information to a second communication device, or sending the first information to a second module of the first communication device.
17. The method according to claim 16, wherein the first communication device is a terminal, and the second communication device is a network side device;
or,
the first communication device is a network side device, and the second communication device is a terminal;
or,
the first communication device is a first terminal, and the second communication device is a second terminal;
or,
the first communication device is a first network side device, and the second communication device is a second network side device.
18. A quantization apparatus, applied to a first communication device, comprising:
a first determining module, configured to determine a quantization policy, a quantization level, and/or a quantization configuration parameter of a first module of the first communication device, where the first module is an AI module;
a quantization module, configured to quantize the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
19. A communication device, comprising: a processor, a memory, and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method according to any one of claims 1 to 17.
20. A readable storage medium, on which a program or instructions are stored, wherein the program or instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 17.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110240917.9A CN115037608B (en) | 2021-03-04 | 2021-03-04 | Quantization method, quantization device, quantization apparatus, and readable storage medium |
PCT/CN2022/078241 WO2022184009A1 (en) | 2021-03-04 | 2022-02-28 | Quantization method and apparatus, and device and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110240917.9A CN115037608B (en) | 2021-03-04 | 2021-03-04 | Quantization method, quantization device, quantization apparatus, and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115037608A true CN115037608A (en) | 2022-09-09 |
CN115037608B CN115037608B (en) | 2024-09-06 |
Family
ID=83118095
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110240917.9A Active CN115037608B (en) | 2021-03-04 | 2021-03-04 | Quantization method, quantization device, quantization apparatus, and readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115037608B (en) |
WO (1) | WO2022184009A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024104126A1 (en) * | 2022-11-14 | 2024-05-23 | 维沃移动通信有限公司 | Method and apparatus for updating ai network model, and communication device |
WO2024153039A1 (en) * | 2023-01-17 | 2024-07-25 | 维沃移动通信有限公司 | Ai model processing method, terminal, and network side device |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118525492A (en) * | 2022-12-20 | 2024-08-20 | 北京小米移动软件有限公司 | Information processing method and device, communication equipment and storage medium |
WO2024207317A1 (en) * | 2023-04-06 | 2024-10-10 | 富士通株式会社 | Information transceiving method and apparatus |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070078532A1 (en) * | 2003-10-29 | 2007-04-05 | Wolfgang Fick | Method for the operation of a technical system |
CN108647250A (en) * | 2018-04-19 | 2018-10-12 | 郑州科技学院 | A kind of talent's big data quantization fine matching method based on artificial intelligence |
CN109543826A (en) * | 2017-09-21 | 2019-03-29 | 杭州海康威视数字技术股份有限公司 | A kind of activation amount quantization method and device based on deep neural network |
CN110223105A (en) * | 2019-05-17 | 2019-09-10 | 知量科技(深圳)有限公司 | Trading strategies generation method and engine based on artificial intelligence model |
CN111160517A (en) * | 2018-11-07 | 2020-05-15 | 杭州海康威视数字技术股份有限公司 | Convolutional layer quantization method and device of deep neural network |
WO2020118553A1 (en) * | 2018-12-12 | 2020-06-18 | 深圳鲲云信息科技有限公司 | Method and device for quantizing convolutional neural network, and electronic device |
CN111582476A (en) * | 2020-05-09 | 2020-08-25 | 北京百度网讯科技有限公司 | Automatic quantization strategy searching method, device, equipment and storage medium |
CN111582432A (en) * | 2019-02-19 | 2020-08-25 | 北京嘉楠捷思信息技术有限公司 | Network parameter processing method and device |
CN111667054A (en) * | 2020-06-05 | 2020-09-15 | 北京百度网讯科技有限公司 | Method and device for generating neural network model, electronic equipment and storage medium |
CN111815458A (en) * | 2020-07-09 | 2020-10-23 | 四川长虹电器股份有限公司 | Dynamic investment portfolio configuration method based on fine-grained quantitative marking and integration method |
WO2020228655A1 (en) * | 2019-05-10 | 2020-11-19 | 腾讯科技(深圳)有限公司 | Method, apparatus, electronic device, and computer storage medium for optimizing quantization model |
CN112149266A (en) * | 2020-10-23 | 2020-12-29 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for determining network model quantization strategy |
CN112215331A (en) * | 2019-07-10 | 2021-01-12 | 华为技术有限公司 | Data processing method for neural network system and neural network system |
CN112287986A (en) * | 2020-10-16 | 2021-01-29 | 浪潮(北京)电子信息产业有限公司 | Image processing method, device and equipment and readable storage medium |
CN112288697A (en) * | 2020-10-23 | 2021-01-29 | 北京百度网讯科技有限公司 | Method and device for quantifying degree of abnormality, electronic equipment and readable storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112085182A (en) * | 2019-06-12 | 2020-12-15 | 安徽寒武纪信息科技有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN110659678B (en) * | 2019-09-09 | 2023-11-17 | 腾讯科技(深圳)有限公司 | User behavior classification method, system and storage medium |
- 2021-03-04: CN CN202110240917.9A patent/CN115037608B/en, status Active
- 2022-02-28: WO PCT/CN2022/078241 patent/WO2022184009A1/en, Application Filing
Non-Patent Citations (3)
Title |
---|
""R1-2007090 FL summary 1 for Potential UE complexity reduction features for RedCap"", 3GPP TSG_RAN\\WG1_RL1, 21 August 2020 (2020-08-21) * |
HU Qinghe: "Application of non-uniform quantization in electric energy measurement", Electrical Measurement & Instrumentation, no. 10, 10 October 1990 (1990-10-10) *
XING Junwen; CHI Baoshan; LIU Feng: "Research on a three-parameter quantization model of the technical risk of R&D projects", Systems Engineering Theory & Practice, no. 10, 15 October 2008 (2008-10-15) *
Also Published As
Publication number | Publication date |
---|---|
WO2022184009A1 (en) | 2022-09-09 |
CN115037608B (en) | 2024-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114390580B (en) | Beam reporting method, beam information determining method and related equipment | |
CN115037608B (en) | Quantization method, quantization device, quantization apparatus, and readable storage medium | |
CN110167176B (en) | Wireless network resource allocation method based on distributed machine learning | |
WO2023040887A1 (en) | Information reporting method and apparatus, terminal and readable storage medium | |
JP2024512358A (en) | Information reporting method, device, first device and second device | |
WO2022105913A1 (en) | Communication method and apparatus, and communication device | |
CN115379508A (en) | Carrier management method, resource allocation method and related equipment | |
US11411600B2 (en) | Processing of uplink data streams | |
US20230299910A1 (en) | Communications data processing method and apparatus, and communications device | |
CN114422380A (en) | Neural network information transmission method, device, communication equipment and storage medium | |
CN114531696A (en) | Method and device for processing partial input missing of AI (Artificial Intelligence) network | |
WO2023040888A1 (en) | Data transmission method and apparatus | |
CN115022172B (en) | Information processing method, apparatus, communication device, and readable storage medium | |
US12021665B2 (en) | Methods and wireless network for selecting pilot pattern for optimal channel estimation | |
CN109151882B (en) | Method, terminal, computer readable medium and system for reporting RSRP | |
CN114501353B (en) | Communication information sending and receiving method and communication equipment | |
EP4150861B1 (en) | Determining cell upgrade | |
CN115843045A (en) | Data acquisition method and device | |
WO2024041421A1 (en) | Measurement feedback processing method and apparatus, terminal, and network side device | |
WO2024041420A1 (en) | Measurement feedback processing method and apparatus, and terminal and network-side device | |
EP4270884A1 (en) | Channel estimation using neural networks | |
CN117834427A (en) | Method and device for updating AI model parameters and communication equipment | |
Wang et al. | Joint Computing and Radio Resource Allocation in C-RAN Systems Under Imperfect CSI | |
CN117676668A (en) | Information transmission method, device, terminal and network side equipment | |
CN118214667A (en) | AI model monitoring method, AI model performance measuring device and AI model performance measuring equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||