CN115037608A - Quantization method, device, equipment and readable storage medium - Google Patents
Quantization method, device, equipment and readable storage medium
- Publication number
- CN115037608A (application number CN202110240917.9A)
- Authority
- CN
- China
- Prior art keywords
- quantization
- parameter
- module
- neural network
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0893—Assignment of logical groups to network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0894—Policy-based network configuration management
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
Abstract
The application discloses a quantization method, apparatus, device, and readable storage medium. The method includes the following steps: determining a quantization strategy, a quantization level, and/or a quantization configuration parameter of a first module of a first communication device, where the first module is an AI module; and quantizing the parameters of the first module according to the quantization strategy, the quantization level, and/or the quantization configuration parameter. In the embodiments of the application, quantizing the AI module through a quantization strategy, a quantization level, and/or a quantization configuration parameter reduces the complexity of the AI module and improves system performance.
Description
Technical Field
The application belongs to the technical field of communication, and particularly relates to a quantization method, apparatus, device, and readable storage medium for an Artificial Intelligence (AI) module.
Background
Artificial intelligence is currently in wide use in a variety of fields. In a communication network, artificial intelligence may be implemented through an AI module. However, there is currently no procedure for quantizing the AI module, which leaves the complexity of the AI module high.
Disclosure of Invention
Embodiments of the present application provide a quantization method, apparatus, device, and readable storage medium, which address the problem of how to reduce the complexity of an AI module.
In a first aspect, a quantization method is provided, which is performed by a first communication device, and includes:
determining a quantization strategy, a quantization level and/or a quantization configuration parameter of a first module of the first communication device, wherein the first module is an Artificial Intelligence (AI) module;
and carrying out quantization processing on the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
In a second aspect, an apparatus for quantization is provided, which is applied to a first communication device, and includes:
a first determining module, configured to determine a quantization policy, a quantization level, and/or a quantization configuration parameter of a first module of the first communication device, where the first module is an AI module;
and the quantization module is used for performing quantization processing on the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
In a third aspect, a communication device is provided, comprising: a processor, a memory and a program stored on the memory and executable on the processor, which program, when executed by the processor, carries out the steps of the method according to the first aspect.
In a fourth aspect, a readable storage medium is provided, on which a program or instructions are stored, which when executed by a processor, implement the steps of the method according to the first aspect.
In a fifth aspect, a program product is provided, which is stored on a non-volatile storage medium, which program product is executable by at least one processor to implement the steps of the method according to the first aspect.
In a sixth aspect, a chip is provided, the chip comprising a processor and a communication interface, the communication interface being coupled to the processor, the processor being configured to execute a program or instructions to implement the method according to the first aspect.
In the embodiment of the application, the AI module is quantized through a quantization strategy, a quantization level and/or a quantization configuration parameter, so that the complexity of the AI module can be reduced, and the system performance is improved.
Drawings
Fig. 1 is a schematic diagram of a wireless communication system to which embodiments of the present application are applicable;
FIG. 2 is a flow chart of a method of quantization provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of an apparatus for quantization provided by an embodiment of the present application;
fig. 4 is a schematic diagram of a terminal according to an embodiment of the present application;
fig. 5 is a schematic diagram of a network-side device according to an embodiment of the present application.
Detailed Description
To facilitate understanding of the embodiments of the present application, the following concept is first introduced: artificial intelligence.
Artificial intelligence is currently in wide use in a variety of fields. The AI module for implementing artificial intelligence can be implemented in various ways, such as neural networks, decision trees, support vector machines, bayesian classifiers, and the like.
Taking a neural network as an example, the parameters of a neural network are optimized by an optimization algorithm. An optimization algorithm is a class of algorithms that minimizes or maximizes an objective function (sometimes called a loss function), where the objective function is typically a mathematical combination of model parameters and data. For example, given data X and its corresponding labels Y, a neural network model f() with weights W and biases b is constructed; with this model, a prediction f(X) can be obtained from the input X, and the difference between the predicted value and the true value, (f(X) - Y), defines a loss function. The objective is to find W and b that minimize the value of this loss function: the smaller the loss, the closer the model is to the real situation.
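The loss computation described here can be sketched as follows; the one-weight linear model and the mean-squared-error form are assumptions chosen to keep the example small, not the patent's formulation.

```python
def predict(W, b, x):
    # Prediction f(x) of a minimal model with weight W and bias b.
    return W * x + b

def mse_loss(W, b, xs, ys):
    # Loss: mean of the squared differences between f(X) and the labels Y.
    return sum((predict(W, b, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```

Training amounts to searching for the W and b that make this loss value as small as possible.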
Current common optimization algorithms are basically based on the error Back Propagation (BP) algorithm. The basic idea of the BP algorithm is that the learning process consists of two phases: forward propagation of signals and backward propagation of errors. In forward propagation, an input sample is passed in from the input layer, processed layer by layer through the hidden layers, and passed on to the output layer. If the actual output of the output layer does not match the expected output, the process turns to the error back-propagation phase. In error back-propagation, the output error is propagated backward in some form, layer by layer through the hidden layers to the input layer, and the error is apportioned to all units of each layer, thereby obtaining an error signal for each unit; this error signal serves as the basis for correcting each unit's weights. This cycle of forward signal propagation and backward error propagation, with weight adjustment at each layer, is repeated; the continual adjustment of the weights is precisely the network's learning and training process. The process continues until the error of the network output falls to an acceptable level or a predetermined number of learning cycles has been reached.
Common optimization algorithms include Stochastic Gradient Descent (SGD), mini-batch gradient descent, the momentum method (Momentum), Nesterov accelerated gradient (stochastic gradient descent with momentum, named after its inventor), adaptive gradient methods (AdaGrad, AdaDelta), Root Mean Square propagation (RMSProp), ADAptive Momentum estimation (Adam), and the like.
When errors are back-propagated, these optimization algorithms take the error/loss produced by the loss function, solve the derivatives/partial derivatives of the current neurons to obtain gradients, combine them with influences such as the learning rate and the previous gradients/derivatives/partial derivatives, and pass the resulting gradients on to the previous layer.
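As a minimal illustration of the weight-update loop that back-propagation drives (a single weight, a squared-error loss, and plain gradient descent are assumptions for the example):

```python
def train_linear(xs, ys, lr=0.05, epochs=200):
    # Fit y ~= w*x by gradient descent: the derivative of the squared
    # error (w*x - y)**2 with respect to w is 2*(w*x - y)*x, and each
    # round subtracts learning_rate * gradient from the weight.
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w
```

Each epoch plays the role of one forward-propagation/error-back-propagation cycle described above, with the weight adjusted from the gradient scaled by the learning rate.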
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second", and the like in the description and claims of the present application are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that terms used in this way are interchangeable under appropriate circumstances, so that the embodiments of the application can be implemented in orders other than those illustrated or described herein. Objects distinguished by "first" and "second" are usually of one class, and the number of objects is not limited; for example, there may be one or more first objects. In addition, in the specification and claims, "and/or" represents at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
It is noted that the techniques described in the embodiments of the present application are not limited to Long Term Evolution (LTE)/LTE-Advanced (LTE-A) systems, but may also be used in other wireless communication systems, such as Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal Frequency Division Multiple Access (OFDMA), Single-Carrier Frequency-Division Multiple Access (SC-FDMA), and other systems. The terms "system" and "network" are often used interchangeably in the embodiments of the present application, and the described techniques may be used both for the above-mentioned systems and radio technologies and for other systems and radio technologies. However, the following description describes a New Radio (NR) system for purposes of example, and NR terminology is used in much of the description below; the techniques may also be applied to systems other than NR, such as a 6th Generation (6G) communication system.
Fig. 1 shows a block diagram of a wireless communication system to which embodiments of the present application are applicable. The wireless communication system includes a terminal 11 and a network-side device 12. The terminal 11 may also be called a terminal device or User Equipment (UE), and may be a mobile phone, a tablet personal computer, a laptop computer (also called a notebook computer), a Personal Digital Assistant (PDA), a palmtop computer, a netbook, an Ultra-Mobile Personal Computer (UMPC), a Mobile Internet Device (MID), a wearable device, a Vehicle User Equipment (VUE), a Pedestrian User Equipment (PUE), or another terminal-side device; wearable devices include bracelets, earphones, glasses, and the like. It should be noted that the embodiments of the present application do not limit the specific type of the terminal 11. The network-side device 12 may be a base station or a core-network device, where the base station may be referred to as a Node B, an evolved Node B (eNB), an access point, a Base Transceiver Station (BTS), a radio base station, a radio transceiver, a Basic Service Set (BSS), an Extended Service Set (ESS), a home Node B, a home evolved Node B, a WLAN access point, a WiFi node, a Transmission Reception Point (TRP), or by some other suitable term in the field; as long as the same technical effect is achieved, the base station is not limited to a specific technical vocabulary. It should be noted that the embodiments of the present application take the base station in an NR system as an example only and do not limit the specific type of base station.
A quantization method, apparatus, device and readable storage medium provided by the embodiments of the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 2, an embodiment of the present application provides a quantization method, where an execution subject of the quantization method may be a first communication device, and the quantization method includes:
step 201: determining a quantization strategy, a quantization level and/or a quantization configuration parameter of a first module of the first communication device, wherein the first module is an AI module;
step 202: and carrying out quantization processing on the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
The quantization strategy may also be referred to as a quantization method, which is a method for quantizing the parameters of the AI module.
The quantization level may represent the precision of the AI module's parameter quantization: the higher the quantization level, the more accurate the quantized parameters of the AI module and the closer they are to the original parameters; the lower the quantization level, the coarser the quantized parameters and the further they are from the original parameters. For example, if quantization levels are expressed in bits, a quantization level of X bits means that each parameter of the AI module is quantized to X bits, so the larger the value of X, the more bits each parameter of the AI module occupies, where X is a positive integer. The single-precision type (float) commonly used in computers today occupies 32 bits and the double-precision type (double) occupies 64 bits; these can be regarded as very high-precision quantization.
The quantization configuration parameter is used to indicate a configuration for quantizing the AI module, for example, the quantization configuration parameter includes one or more of the following items: what quantization strategy the AI module adopts and how the details of the quantization strategy are configured, whether all parameters of the AI module use a uniform quantization level, whether the quantization level of the multiplicative coefficient of the AI module is the same as the quantization level of the additive coefficient, what the quantization level of the AI module is, how many bits the parameters of the AI module quantize, and the like.
For example, the quantization level is configured to be 8 bits, the quantization strategy is a direct quantization method, all parameters of the AI module are quantized from floating point numbers to 8 bits, and assuming that the AI module is a neural network, multiplicative coefficients and additive coefficients of all neurons are quantized to 8 bits.
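The 8-bit configuration described above can be sketched as follows; the uniform min/max scaling and the function name are illustrative assumptions, not the patent's prescribed procedure.

```python
def quantize_to_bits(params, bits=8):
    # Map each float to an integer code of `bits` bits by scaling the
    # [min, max] range of the parameters onto [0, 2**bits - 1].
    lo, hi = min(params), max(params)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((p - lo) / scale) for p in params]
    # De-quantization recovers an approximation of each original value.
    approx = [lo + c * scale for c in codes]
    return codes, approx
```

After quantization, each coefficient occupies 8 bits instead of the 32 bits of a single-precision float.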
In one embodiment of the present application, the quantization strategy may include one or more of:
(1) a direct quantization method;
the direct quantization method is to quantize each parameter of the AI module directly according to the quantization level and/or the quantization configuration parameter.
(2) Uniform quantization method;
the uniform quantization method is a quantization method in which parameters of the AI module (for example, a value-taking range of input parameters) are divided at equal intervals.
(3) A non-uniform quantization method;
the non-uniform quantization method is a quantization method in which quantization intervals are not equal in the dynamic range of parameters (e.g., input parameters) of the AI module.
For example, the quantization intervals/quantization levels of different input intervals are determined according to the input probability density, probability distribution, cumulative probability distribution, etc. For example, for an interval with a small input value, the quantization interval is also small; conversely, for an interval with a large input value, the quantization interval is large.
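One way to realize the probability-based interval sizing described above is to place quantization levels at equally spaced quantiles of the observed inputs; this quantile rule is an illustrative assumption, not the patent's specific method.

```python
def nonuniform_levels(samples, num_levels):
    # Dense regions of the sample distribution (high probability) get
    # closely spaced levels; sparse regions get widely spaced ones.
    s = sorted(samples)
    step = len(s) / num_levels
    return [s[min(len(s) - 1, int(i * step))] for i in range(num_levels)]
```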
(4) Weight sharing quantization method;
(5) a block quantization method;
in the weight-sharing quantization method and the grouping quantization method, parameters of the AI module may be divided into a plurality of sets, and elements in each set share one value.
(6) Transform domain quantization;
transform-domain quantization refers to transforming parameters (such as weights, offsets, convolution kernels, etc.) of an AI module into another domain, such as a frequency domain, an S domain, a Z domain, etc., performing quantization operation in another domain, and then performing inverse transformation.
Illustratively, the network convolution kernel is first transformed into the frequency domain, then randomly hashed in the frequency domain, and a lower number of hash bits is used for the less important high frequency portions to achieve higher compression.
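A rough sketch of the transform-quantize-invert flow, using a naive DCT as the frequency-domain transform and a fixed step size as the quantizer (both are illustrative choices, not the patent's):

```python
import math

def dct(x):
    # Naive DCT-II: transform parameters into the frequency domain.
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n)) for k in range(n)]

def idct(X):
    # Naive inverse transform (DCT-III with the standard scaling).
    n = len(X)
    return [(X[0] / 2 + sum(X[k] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                            for k in range(1, n))) * 2 / n for i in range(n)]

def transform_quantize(params, step=0.5):
    # Quantize coarsely in the transform domain, then invert: the
    # "transform, quantize, inverse-transform" flow described above.
    coeffs = dct(params)
    q = [round(c / step) * step for c in coeffs]
    return idct(q)
```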
(7) A parametric coding quantization method;
the parameter coding quantization method is to code parameters of the AI module, and the coding method includes but is not limited to: lossy coding, lossless coding (e.g., huffman coding), and the like.
(8) Product quantization method.
The product quantization method is to divide the network weight into a plurality of subspaces and perform quantization operation on each subspace, for example, performing weight sharing quantization method on each subspace.
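A minimal sketch of product quantization over consecutive subspaces; the nearest-codeword rule and the codebook layout are assumptions for the example.

```python
def product_quantize(weights, sub_size, codebooks):
    # Split `weights` into consecutive subvectors of length `sub_size`
    # and replace each subvector by the nearest codeword (smallest
    # squared distance) in that subspace's codebook.
    out = []
    for j in range(0, len(weights), sub_size):
        sub = weights[j:j + sub_size]
        book = codebooks[j // sub_size]
        best = min(book, key=lambda c: sum((a - b) ** 2 for a, b in zip(sub, c)))
        out.extend(best)
    return out
```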
Alternatively, the above-mentioned quantization strategies may be cascaded or combined. Illustratively, the quantization strategy includes: the method comprises the steps of firstly carrying out uniform quantization on a network through the uniform quantization method, then carrying out quantization on the uniformly quantized weight by using the weight sharing quantization method, and then carrying out quantization on the weight according to the parameter coding quantization method.
In an embodiment of the present application, the step of quantizing the parameter of the first module includes:
in the network training stage, quantizing the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameters.
For example, a common gradient calculation method is used to obtain the gradient corresponding to each weight; the weights are grouped according to the existing grouping; the gradient values of the weights in the same group are accumulated to obtain the update amount of that cluster center for this round of training; and the product of the update amount and the learning rate is subtracted from the cluster center value to obtain the updated cluster center for the current round of training.
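The grouped update described here can be sketched as follows (the function name and dictionary layout are illustrative):

```python
def update_cluster_centers(centers, groups, grads, lr):
    # Accumulate the gradients of all weights in each group, then take
    # one gradient step per cluster centre:
    #   new_centre = centre - learning_rate * accumulated_gradient
    acc = {g: 0.0 for g in set(groups)}
    for g, grad in zip(groups, grads):
        acc[g] += grad
    return {g: centers[g] - lr * acc[g] for g in acc}
```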
In one embodiment of the present application, the parameter dividing method in the grouping quantization method includes:
(1) a random division mode;
in the random division manner, the parameters of the AI modules may be grouped in a random manner.
(2) Determining a set identifier where the parameter is located according to the identifier of the parameter;
the above-described mode (2) may also be referred to as a direct addressing method. For example, the parameters of the AI modules are sorted, the IDs of the respective parameters are determined, then the parameter IDs are input into a linear function, an N-order function, or other common functions to obtain a new value X, and the set ID of the network parameter is obtained through X. Wherein the linear function comprises a function having an output equal to an input.
In an embodiment of the present application, the determining, according to the identifier of the parameter, the identifier of the set where the parameter is located includes:
obtaining a first numerical value according to the identifier of the parameter;
determining a set identifier where the parameter is located according to the first numerical value;
wherein, according to the first value, determining the set identifier where the parameter is located includes one or more of:
(a) rounding the first numerical value to obtain a set identifier where the parameter is located;
(b) at least one bit is taken from the first numerical value and combined into a set identifier of the parameter;
(c) and dividing the first numerical value by a preset value, and using the obtained remainder as the set identifier of the parameter.
Optionally, the identifier of the parameter is input into a linear function or another common mathematical function to obtain the first value (X). Common mathematical functions include addition, subtraction, multiplication, division, the Nth power, the Nth root, logarithms, derivatives, partial derivatives, and other combinations of common mathematical operations, where N is any number; for example, N may be positive, negative, or 0, and real or complex.
Optionally, the obtaining, by X, a set ID where the network parameter is located includes:
a) X is rounded to obtain the set ID. Rounding includes rounding up, rounding down, and the like. For example, if X is 3.23, the set ID may be 3 or 4, where 3 corresponds to rounding down (truncation) and 4 corresponds to rounding up.
b) X takes at least one bit and combines into a set ID.
For example, if X is 3215217, taking the 2nd and 4th digits from the front gives a set ID of 25, while taking the 1st and 3rd digits from the back gives a set ID of 72 or 27.
For example, if X is 872351.1237, taking the 1st and 2nd digits after the decimal point gives a set ID of 12 or 21; taking the 1st and 2nd digits before the decimal point gives 51 or 15; taking the 2nd digit before the decimal point and the 3rd digit after it gives 53 or 35.
Further, by way of example:
(i) at least two bits are taken, and the values on the bits are arranged according to a certain rule to form a set ID.
For example, by digit position from front to back, or from back to front, or by numerical value from large to small, or from small to large. For example, if X is 67429815 and the 1st, 3rd and 5th digits from the back are taken, the values at these positions are 5, 8 and 2; arranged by digit position from front to back, the set ID is 285; from back to front, the set ID is 582; by numerical value from large to small, the set ID is 852; from small to large, the set ID is 258.
(ii) If a bit is not present, the value of the bit is 0, or some other default value.
For example, if X is 52 and the 1st, 3rd and 5th digits from the front are taken, the values at the corresponding positions are 5, 0 and 0.
c) And dividing X by a certain number to obtain the remainder.
For example, if X is 752 and the preset number is 11, the set ID is 752 mod 11 = 4.
d) The set IDs are randomly divided according to X.
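Methods a), b) and c) above can be sketched as follows (the function names are illustrative; method b) is shown with digit positions counted from the front):

```python
def set_id_round(x):
    # Method a): round X down to obtain the set ID.
    return int(x)

def set_id_digits(x, positions):
    # Method b): take the digits of X at the given 1-indexed positions
    # (counted from the front) and concatenate them into the set ID.
    s = str(x)
    return int("".join(s[p - 1] for p in positions))

def set_id_mod(x, m):
    # Method c): divide X by a preset value and keep the remainder.
    return x % m
```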
(3) Clustering division mode.
In the clustering division mode, the parameters of the AI module are grouped according to cluster centers.
For example, to divide the data into K groups, K objects are randomly selected as initial cluster centers; then the distance between each object and each cluster center is calculated, and each object is assigned to the nearest cluster center. A cluster center and the objects assigned to it represent one cluster. After the objects are assigned, the cluster center of each cluster is recalculated from the objects currently in that cluster. This process repeats until some termination condition is met, such as: no objects (or only a minimum number) are reassigned to different clusters, no cluster centers (or only a minimum number) change again, or the sum of squared errors reaches a local minimum.
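A minimal 1-D sketch of the clustering procedure just described, with a fixed iteration count standing in for the termination conditions (initialization from distinct sorted values is an assumption; the text describes random selection):

```python
def kmeans_1d(values, k, iters=20):
    # Assign each value to the nearest centre, recompute each centre as
    # the mean of its assigned members, and repeat.
    centers = sorted(set(values))[:k]
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            clusters[nearest].append(v)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)
```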
In one embodiment of the present application, the quantization strategy and/or quantization configuration parameter is determined according to one or more of the following:
(1) reporting by a terminal;
that is, the network side may obtain the quantization strategy and/or the quantization configuration parameter according to the manner reported by the terminal.
(2) The capabilities of the terminal;
that is, the quantization strategy and/or quantization configuration parameters may be taken as capabilities of the terminal.
(3) And (5) network side configuration.
That is, the terminal side may obtain the quantization strategy and/or the quantization configuration parameter according to the configuration of the network side.
For example, the network side performs configuration, activation, or triggering through Radio Resource Control (RRC) signaling, a Medium Access Control Control Element (MAC CE), or Downlink Control Information (DCI).
In an embodiment of the present application, the quantization strategy is a direct quantization method, and the step of performing quantization processing on the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter includes:
and carrying out quantization processing on the parameter of the first module according to the quantization grade and/or the quantization configuration parameter of the first module.
In one embodiment of the present application, the quantization level is determined according to one or more of:
(1) information relating to a parameter of the first module;
optionally, the information related to the parameter of the first module includes: the size of the parameter;
for example, the larger the parameter, the higher the quantization level; alternatively, the larger the parameter, the lower the quantization level.
For another example, the smaller the parameter, the lower the quantization level, or the smaller the parameter, the higher the quantization level.
That is, different quantization levels may be determined according to the size of the parameters of the AI module. For example, the larger the parameters of the AI module, the finer the quantization, and the smaller the parameters of the AI module, the coarser the quantization; or, the larger the parameters of the AI module, the coarser the quantization, and the smaller the parameters of the AI module, the finer the quantization.
(2) Reporting by a terminal;
That is, the network side can obtain the quantization level from information reported by the terminal.
(3) The capabilities of the terminal;
That is, the quantization level may be treated as a capability of the terminal.
(4) Network side configuration;
(5) an output accuracy requirement of the first module;
for example, the higher the requirement of the output accuracy of the AI module, the higher the quantization level.
(6) Performance requirements of the first module.
For example, the performance requirements of the AI modules are divided into multiple levels, with different levels of performance requirements corresponding to different quantization levels.
In one embodiment of the present application, the higher the quantization level is, the more precise the parameter quantization of the first module is, or the lower the quantization level is, the coarser the parameter quantization of the first module is.
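The size-based determination in item (1), combined with the finer/coarser convention above, can be sketched as a tiered lookup. The tier boundaries (1e3, 1e5 parameters) and bit-widths are illustrative assumptions, not values from the text:

```python
def quantization_level(num_params, larger_is_finer=True):
    """Pick a quantization level (modeled as a bit-width) from the size
    of the AI module's parameters. Thresholds and widths are assumed.
    larger_is_finer=True: larger module -> higher (finer) level;
    larger_is_finer=False: larger module -> lower (coarser) level."""
    tiers = [(1_000, 4), (100_000, 8)]  # (upper bound on size, bit-width)
    level = 16
    for bound, bits in tiers:
        if num_params <= bound:
            level = bits
            break
    if not larger_is_finer:  # invert the mapping for the opposite rule
        level = {4: 16, 8: 8, 16: 4}[level]
    return level
```

Both conventions named in the text (larger parameters, finer quantization; or larger parameters, coarser quantization) are selectable through the flag.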
In one embodiment of the present application, the first module is of the type of a neural network;
wherein the quantization levels of neurons of different layers in the neural network are the same;
and/or,
the quantization levels of the neurons of the same layer in the neural network are the same;
and/or,
the quantization levels of multiplicative coefficients in the neural network are the same as the quantization levels of additive coefficients.
In one embodiment of the present application, the first module is of the type of a neural network;
wherein the quantization levels of neurons of different layers in the neural network are different;
and/or,
the quantization levels of the neurons of the same layer in the neural network are different;
and/or,
the quantization levels of multiplicative coefficients in the neural network are different from the quantization levels of additive coefficients.
In one embodiment of the present application, the first module is of a type of Recurrent Neural Network (RNN);
wherein the quantization level of the parameters (such as multiplicative coefficients and additive coefficients) of the memory units in the recurrent neural network is the same as the quantization level of the parameters of the non-memory neurons (including neurons and non-neuron units) in the recurrent neural network, or the same as the quantization level of the non-memory parameters of the neurons in the recurrent neural network,
or,
the quantization level of the parameter of the memory unit in the recurrent neural network is different from the quantization level of the parameter of the non-memory neuron in the recurrent neural network or different from the quantization level of the non-memory parameter of the neuron in the recurrent neural network.
In one embodiment of the present application, the type of the first module is Convolutional Neural Networks (CNN);
wherein,
the quantization level of the parameter of the convolution kernel of the convolutional neural network is the same as or different from the quantization level of the parameter of the non-convolution kernel in the convolutional neural network,
or,
the quantization level of pooled parameters (multiplicative coefficients, additive coefficients) of the convolutional neural network is the same as or different from the quantization level of non-pooled parameters in the convolutional neural network.
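The RNN and CNN embodiments above amount to looking up a quantization level by the role a parameter plays in the network. A minimal sketch; the role names, the bit-width encoding of levels, and the example widths are all illustrative assumptions:

```python
def bits_for(role, config):
    """Return the quantization bit-width for a parameter, keyed by its
    role in the network; roles not listed fall back to the default."""
    return config.get(role, config["default"])

# RNN embodiment: memory-unit parameters quantized differently (here,
# more finely) than non-memory parameters.
rnn_config = {"memory_cell": 16, "default": 8}

# CNN embodiment: convolution-kernel and pooled parameters quantized
# differently (here, more coarsely) than the remaining parameters.
cnn_config = {"conv_kernel": 8, "pooling": 8, "default": 16}
```

Setting every role to the same width recovers the "same quantization level" variants of these embodiments.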
In one embodiment of the present application, the input or output of the first module is first information;
wherein the first information comprises one or more of:
(1) a reference signal;
The reference signal is used for signal processing, including signal detection, filtering, equalization, etc., and includes, for example, the Demodulation Reference Signal (DMRS), Sounding Reference Signal (SRS), Synchronization Signal Block (SSB), Tracking Reference Signal (TRS), Phase-Tracking Reference Signal (PTRS), Channel State Information Reference Signal (CSI-RS), etc.
(2) A signal carried by a channel;
the channel may include one or more of: a Physical Downlink Control Channel (PDCCH), a Physical Downlink Shared Channel (PDSCH), a Physical Uplink Control Channel (PUCCH), a Physical Uplink Shared Channel (PUSCH), a Physical Random Access Channel (PRACH), a Physical Broadcast Channel (PBCH), and the like.
(3) Channel state information;
optionally, the channel state information includes channel state information feedback information and/or channel state information of uplink and downlink partial reciprocity in a Frequency Division Duplex (FDD) system.
Wherein the channel state information feedback information comprises one or more of: channel related information, Channel matrix related information, Channel characteristic information, Channel matrix characteristic information, Precoding Matrix Indicator (PMI), Rank Indicator (RI), CSI-RS Resource Indicator (CRI), Channel Quality Indicator (CQI), Layer Indicator (LI), and the like.
For an FDD system, based on partial reciprocity, the base station acquires angle and time-delay information from the uplink channel. The angle information and the time-delay information can be notified to the UE through CSI-RS precoding or through direct indication, and the UE reports according to the indication of the base station, or selects and reports within the range indicated by the base station, thereby reducing the computation load of the UE and the overhead of CSI reporting.
(4) Beam information;
The beam information includes one or more of: beam quality, indication information of a beam (reference signal ID), beam failure indication information, and new beam indication information in beam failure recovery. This information is used for beam management, including beam measurement, beam reporting, beam prediction, beam failure detection, beam failure recovery, and new beam indication in beam failure recovery.
(5) Channel prediction information;
the channel prediction information includes: prediction of channel state information, beam prediction.
(6) Interference information;
the interference information includes one or more of: intra-cell interference information, inter-cell interference information, out-of-band interference information, inter-modulation interference information, and the like.
(7) Positioning information (alternatively referred to as trajectory information);
That is, the estimated specific position (including horizontal position and/or vertical position) of the UE or its possible future trajectory, or information to assist position estimation or trajectory estimation, obtained through reference signals (e.g., the Sounding Reference Signal (SRS)).
(8) Prediction information of high-level services and/or parameters;
(9) management information of high-level services and/or parameters;
For example, the prediction information or management information may include throughput, required packet size, traffic demand, movement speed, and/or noise information, among others.
(10) Control signaling.
Such as signaling related to power control and signaling related to beam management.
In one embodiment of the present application, in a case that the output of the first module is first information, the method further includes:
and sending the first information to a second communication device, or sending the first information to a second module of the first communication device.
Here, the first communication device is a terminal and the second communication device is a network side device; or the first communication device is a network side device and the second communication device is a terminal; or the first communication device is a first terminal and the second communication device is a second terminal; or the first communication device is a first network side device and the second communication device is a second network side device.
In the embodiment of the application, the AI module is quantized through a quantization strategy, a quantization level and/or a quantization configuration parameter, so that the complexity of the AI module can be reduced, and the system performance is improved.
Referring to fig. 3, an embodiment of the present application provides an apparatus for quantization, which is applied to a first communication device, where the apparatus 300 includes:
a first determining module 301, configured to determine a quantization policy, a quantization level, and/or a quantization configuration parameter of a first module of the first communication device, where the first module is an AI module;
a quantization module 302, configured to perform quantization processing on the parameter of the first module according to the quantization policy, the quantization level, and/or the quantization configuration parameter.
In one embodiment of the present application, the quantization strategy includes one or more of:
(1) a direct quantization method;
(2) a uniform quantization method;
(3) a non-uniform quantization method;
(4) a weight sharing quantization method;
(5) a block quantization method;
(6) transform domain quantization;
(7) a parametric coding quantization method;
(8) product quantization method.
In one embodiment of the present application, the quantization strategy includes a uniform quantization method, a weight sharing quantization method, and a parameter coding quantization method, where the network is first quantized uniformly by the uniform quantization method, the uniformly quantized weights are then quantized by the weight sharing quantization method, and parameter coding quantization is finally performed on the weights.
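The three-stage strategy above can be sketched as follows. This is a hedged illustration: minimal fixed-width index packing stands in for the parameter coding stage, which in practice might be an entropy coder such as Huffman coding.

```python
import math

def uniform_quantize(weights, bits):
    """Stage 1: uniform quantization of the network weights."""
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((w - lo) / step) * step for w in weights]

def weight_share(weights):
    """Stage 2: weight sharing -- equal values share one codebook entry;
    each weight becomes an index into the codebook."""
    codebook = sorted(set(weights))
    index = {w: i for i, w in enumerate(codebook)}
    return codebook, [index[w] for w in weights]

def encode_indices(indices, codebook_size):
    """Stage 3: parameter coding -- here, minimal fixed-width packing."""
    width = max(1, math.ceil(math.log2(codebook_size)))
    return "".join(format(i, f"0{width}b") for i in indices)

def quantize_pipeline(weights, bits):
    """Apply the three stages in the order described in the text."""
    shared = uniform_quantize(weights, bits)
    codebook, indices = weight_share(shared)
    return codebook, encode_indices(indices, len(codebook))
```

The decoder only needs the codebook and the bit string to reconstruct the quantized weights.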
In an embodiment of the present application, the quantization module 302 is further configured to: and in the network training stage, carrying out quantization processing on the parameters of the first module according to the quantization strategy, the quantization grade and/or the quantization configuration parameters.
In one embodiment of the present application, the parameter dividing method in the block quantization method includes:
(1) a random division mode;
(2) determining a set identifier where the parameter is located according to the identifier of the parameter;
(3) Clustering division mode.
In an embodiment of the present application, the determining, according to the identifier of the parameter, an identifier of a set where the parameter is located includes:
obtaining a first numerical value according to the identifier of the parameter;
determining a set identifier where the parameter is located according to the first numerical value;
according to the first numerical value, determining the set identifier where the parameter is located includes one or more of the following items:
(1) rounding the first numerical value to obtain a set identifier where the parameter is located;
(2) at least one bit is taken from the first numerical value and combined into a set identifier of the parameter;
(3) Dividing the first numerical value by a preset value, and taking the obtained remainder as the set identifier of the parameter.
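The three derivations of the set identifier can be sketched as below. One assumption is made explicit: the "first numerical value" is taken to be the parameter identifier itself (possibly scaled), since the text leaves its exact derivation open, and modes (2) and (3) assume it is an integer.

```python
def set_id_by_rounding(first_value):
    """Mode (1): round the first numerical value to get the set id."""
    return round(first_value)

def set_id_by_bits(first_value, bit_positions):
    """Mode (2): take selected bits of the (integer) first numerical
    value and combine them into the set identifier."""
    return sum(((first_value >> b) & 1) << i
               for i, b in enumerate(bit_positions))

def set_id_by_modulo(first_value, preset):
    """Mode (3): divide by a preset value and keep the remainder."""
    return first_value % preset
```

Mode (3) with a preset value of K is the common way to scatter parameter identifiers evenly over K sets.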
In one embodiment of the present application, the quantization strategy and/or quantization configuration parameter is determined according to one or more of the following:
(1) reporting by a terminal;
(2) the capabilities of the terminal;
(3) Network side configuration.
In an embodiment of the application, the quantization strategy is a direct quantization method, and the quantization module 302 is further configured to: and carrying out quantization processing on the parameter of the first module according to the quantization grade and/or the quantization configuration parameter of the first module.
In one embodiment of the present application, the quantization level is determined according to one or more of:
(1) information relating to a parameter of the first module;
(2) reporting by a terminal;
(3) the capabilities of the terminal;
(4) network side configuration;
(5) an output accuracy requirement of the first module;
(6) performance requirements of the first module.
In one embodiment of the present application, the information related to the parameter of the first module includes: the size of the parameter; wherein the larger the parameter, the higher the quantization level, and the smaller the parameter, the lower the quantization level; alternatively, the larger the parameter, the lower the quantization level, and the smaller the parameter, the higher the quantization level.
In one embodiment of the present application, the higher the quantization level is, the more precise the parameter quantization of the first module is, or the lower the quantization level is, the coarser the parameter quantization of the first module is.
In one embodiment of the present application, the first module is of the type of neural network;
wherein the quantization levels of neurons of different layers in the neural network are the same;
and/or,
the quantization levels of the neurons in the same layer in the neural network are the same;
and/or,
the quantization levels of multiplicative coefficients in the neural network are the same as the quantization levels of additive coefficients.
In one embodiment of the present application, the first module is of the type of neural network;
wherein the quantization levels of neurons of different layers in the neural network are different;
and/or,
the quantization levels of the neurons in the same layer in the neural network are different;
and/or,
the quantization levels of multiplicative coefficients in the neural network are different from the quantization levels of additive coefficients.
In one embodiment of the present application, the first module is of the type of recurrent neural network;
wherein the quantization levels of the parameters of the memory cells in the recurrent neural network are the same as the quantization levels of the parameters of the non-memory neurons in the recurrent neural network or the quantization levels of the non-memory parameters of the neurons of the recurrent neural network,
or,
the quantization level of the parameter of the memory unit in the recurrent neural network is different from the quantization level of the parameter of the non-memory neuron in the recurrent neural network or different from the quantization level of the non-memory parameter of the neuron in the recurrent neural network.
In one embodiment of the present application, the first module is of the type of convolutional neural network;
wherein,
the quantization level of the parameter of the convolution kernel of the convolutional neural network is the same as or different from the quantization level of the parameter of the non-convolution kernel in the convolutional neural network,
or,
the quantization level of pooled parameters of the convolutional neural network is the same as or different from the quantization level of non-pooled parameters in the convolutional neural network.
In one embodiment of the present application, the input or output of the first module is first information;
wherein the first information comprises one or more of:
(1) a reference signal;
(2) a signal carried by a channel;
(3) channel state information;
(4) beam information;
(5) channel prediction information;
(6) interference information;
(7) positioning information;
(8) prediction information of high-level services and/or parameters;
(9) management information of high-level services and/or parameters;
(10) Control signaling.
In one embodiment of the present application, in a case where the output of the first module is first information, the apparatus further includes:
and the sending module is used for sending the first information to a second communication device, or sending the first information to a second module of the first communication device.
In an embodiment of the present application, the first communication device is a terminal, and the second communication device is a network side device, or the first communication device is a network side device, and the second communication device is a terminal; or, the first communication device is a first terminal, and the second communication device is a second terminal; or, the first communication device is a first network side device, and the second communication device is a second network side device.
The device provided in the embodiment of the present application can implement each process implemented by the method embodiment shown in fig. 2, and achieve the same technical effect, and for avoiding repetition, details are not described here again.
Fig. 4 is a schematic diagram of a hardware structure of a terminal for implementing an embodiment of the present application, where the terminal 400 includes, but is not limited to: radio unit 401, network module 402, audio output unit 403, input unit 404, sensor 405, display unit 406, user input unit 407, interface unit 408, memory 409, and processor 410.
Those skilled in the art will appreciate that the terminal 400 may further include a power source (e.g., a battery) for supplying power to various components, and the power source may be logically connected to the processor 410 through a power management system, so as to implement functions of managing charging, discharging, and power consumption through the power management system. The terminal structure shown in fig. 4 does not constitute a limitation of the terminal, and the terminal may include more or less components than those shown, or combine some components, or have a different arrangement of components, and will not be described again here.
It should be understood that in the embodiment of the present application, the input Unit 404 may include a Graphics Processing Unit (GPU) 4041 and a microphone 4042, and the Graphics processor 4041 processes image data of a still picture or a video obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 406 may include a display panel 4061, and the display panel 4061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 407 includes a touch panel 4071 and other input devices 4072. A touch panel 4071, also referred to as a touch screen. The touch panel 4071 may include two parts, a touch detection device and a touch controller. Other input devices 4072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein.
In this embodiment, the radio frequency unit 401 receives downlink data from a network side device and then sends the downlink data to the processor 410 for processing; in addition, it sends uplink data to the network side device. Typically, the radio frequency unit 401 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
The memory 409 may be used to store software programs or instructions as well as various data. The memory 409 may mainly include a program or instruction storage area and a data storage area, where the program or instruction storage area may store an operating system, an application program or instructions required for at least one function (such as a sound playing function, an image playing function, etc.), and the like. In addition, the memory 409 may include a high-speed random access memory, and may further include a nonvolatile memory, where the nonvolatile memory may be a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), or a flash memory, for example at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The terminal provided in the embodiment of the present application can implement each process implemented in the method embodiment shown in fig. 2, and achieve the same technical effect, and is not described here again to avoid repetition.
The embodiment of the application also provides network side equipment. As shown in fig. 5, the network side device 500 includes: antenna 501, radio frequency device 502, baseband device 503. The antenna 501 is connected to a radio frequency device 502. In the uplink direction, the rf device 502 receives information through the antenna 501, and sends the received information to the baseband device 503 for processing. In the downlink direction, the baseband device 503 processes information to be transmitted and transmits the information to the rf device 502, and the rf device 502 processes the received information and transmits the processed information through the antenna 501.
The above baseband processing means may be located in the baseband device 503, and the method performed by the network side device in the above embodiments may be implemented in the baseband device 503, where the baseband device 503 includes a processor 504 and a memory 505.
The baseband device 503 may include, for example, at least one baseband board, on which a plurality of chips are disposed, as shown in fig. 5, where one of the chips, for example, the processor 504, is connected to the memory 505 and calls the program in the memory 505 to perform the network device operations shown in the above method embodiments.
The baseband device 503 may further include a network interface 506, such as a Common Public Radio Interface (CPRI), for exchanging information with the radio frequency device 502.
Specifically, the network side device in the embodiment of the present application further includes: the instructions or programs stored in the memory 505 and capable of being executed on the processor 504, and the processor 504 calls the instructions or programs in the memory 505 to execute the method executed by each module shown in fig. 3, and achieve the same technical effect, and are not described herein in detail to avoid repetition.
Embodiments of the present application further provide a program product, which is stored in a non-volatile storage medium and executed by at least one processor to implement the steps of the method shown in fig. 2.
An embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the method embodiment shown in fig. 2, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
Wherein, the processor is the processor in the terminal described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a network-side device program or an instruction, to implement each process of the method embodiment shown in fig. 2, and can achieve the same technical effect, and details are not repeated here to avoid repetition.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as a system-on-chip, a system-on-chip or a system-on-chip, etc.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved, e.g., the methods described may be performed in an order different than that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (20)
1. A method of quantization, performed by a first communication device, comprising:
determining a quantization strategy, a quantization level and/or a quantization configuration parameter of a first module of the first communication device, the first module being an Artificial Intelligence (AI) module;
and carrying out quantization processing on the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
2. The method of claim 1, wherein the quantization strategy comprises one or more of:
a direct quantization method;
a uniform quantization method;
a non-uniform quantization method;
weight sharing quantization method;
a block quantization method;
transform domain quantization;
a parametric coding quantization method;
and a product quantization method.
3. The method according to claim 1, wherein the step of quantizing the parameter of the first module according to the quantization strategy, quantization level and/or quantization configuration parameter comprises:
and in the network training stage, carrying out quantization processing on the parameters of the first module according to the quantization strategy, the quantization grade and/or the quantization configuration parameters.
4. The method of claim 2, wherein the dividing of the parameters in the block quantization method comprises:
a random division mode;
determining a set identifier where the parameter is located according to the identifier of the parameter;
and a clustering division mode.
5. The method according to claim 4, wherein the determining, according to the identifier of the parameter, the identifier of the set in which the parameter is located includes:
obtaining a first numerical value according to the identifier of the parameter;
determining a set identifier where the parameter is located according to the first numerical value;
wherein, according to the first value, determining the set identifier where the parameter is located includes one or more of:
rounding the first numerical value to obtain a set identifier where the parameter is located;
at least one bit is taken from the first numerical value and combined into a set identifier of the parameter;
and dividing the first numerical value by a preset value and taking the obtained remainder as the set identifier of the parameter.
6. The method of claim 1, wherein the quantization strategy and/or quantization configuration parameters are determined according to one or more of the following:
reporting by a terminal;
the capabilities of the terminal;
and network side configuration.
7. The method according to claim 2, wherein the quantization strategy is a direct quantization method, and the step of quantizing the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter comprises:
and carrying out quantization processing on the parameter of the first module according to the quantization grade and/or the quantization configuration parameter of the first module.
8. The method of claim 1, wherein the quantization level is determined based on one or more of:
information relating to a parameter of the first module;
reporting by a terminal;
the capabilities of the terminal;
network side configuration;
an output accuracy requirement of the first module;
performance requirements of the first module.
9. The method of claim 8, wherein the information related to the parameter of the first module comprises: the size of the parameter;
wherein the larger the parameter, the higher the quantization level; alternatively, the larger the parameter, the lower the quantization level.
10. The method of claim 1, wherein the higher the quantization level, the more precise the parameter quantization of the first module, or wherein the lower the quantization level, the coarser the parameter quantization of the first module.
11. The method of claim 1, wherein the first module is of a type of neural network;
wherein the quantization levels of neurons of different layers in the neural network are the same;
and/or,
the quantization levels of the neurons of the same layer in the neural network are the same;
and/or,
the quantization levels of multiplicative coefficients in the neural network are the same as the quantization levels of additive coefficients.
12. The method of claim 1, wherein the type of the first module is a neural network;
wherein the quantization levels of neurons of different layers in the neural network are different;
and/or,
the quantization levels of neurons of the same layer in the neural network are different;
and/or,
the quantization levels of multiplicative coefficients and additive coefficients in the neural network are different.
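Claims 11 and 12 together allow the quantization level to vary, or not, per layer and per coefficient type. One possible configuration under claim 12 can be sketched as follows; the two-layer network, the chosen bit widths, and the weight/bias naming (weights as multiplicative coefficients, biases as additive coefficients) are all hypothetical.

```python
def quantize(x, bits, lo=-1.0, hi=1.0):
    """Uniformly quantize one value onto 2**bits grid points in [lo, hi]."""
    n = 2 ** bits - 1
    clipped = min(max(x, lo), hi)
    return lo + round((clipped - lo) / (hi - lo) * n) / n * (hi - lo)

# Hypothetical two-layer network: weights are the multiplicative
# coefficients, biases the additive coefficients.
layers = [
    {"weights": [0.2, -0.6], "biases": [0.1]},
    {"weights": [0.7, 0.7],  "biases": [-0.3]},
]

# A different level per layer, and a different level for multiplicative
# vs. additive coefficients (illustrative choices only).
weight_bits = [8, 4]   # per-layer levels for multiplicative coefficients
bias_bits   = [6, 3]   # per-layer levels for additive coefficients

for layer, wb, bb in zip(layers, weight_bits, bias_bits):
    layer["weights"] = [quantize(w, wb) for w in layer["weights"]]
    layer["biases"]  = [quantize(b, bb) for b in layer["biases"]]

print(layers)
```

Setting all entries of `weight_bits` and `bias_bits` equal instead recovers the uniform configuration of claim 11.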
13. The method of claim 1, wherein the type of the first module is a recurrent neural network;
wherein the quantization level of the parameters of the memory cells in the recurrent neural network is the same as the quantization level of the parameters of the non-memory neurons in the recurrent neural network, or the same as the quantization level of the non-memory parameters of the neurons of the recurrent neural network;
or,
the quantization level of the parameters of the memory cells in the recurrent neural network is different from the quantization level of the parameters of the non-memory neurons in the recurrent neural network, or different from the quantization level of the non-memory parameters of the neurons of the recurrent neural network.
14. The method of claim 1, wherein the type of the first module is a convolutional neural network;
wherein,
the quantization level of the parameters of the convolution kernels of the convolutional neural network is the same as or different from the quantization level of the non-convolution-kernel parameters in the convolutional neural network;
or,
the quantization level of the pooling parameters of the convolutional neural network is the same as or different from the quantization level of the non-pooling parameters in the convolutional neural network.
15. The method of claim 1, wherein the input or output of the first module is first information;
wherein the first information comprises one or more of:
a reference signal;
a signal carried by a channel;
channel state information;
beam information;
channel prediction information;
interference information;
positioning information;
prediction information of a higher-layer service and/or parameter;
management information of a higher-layer service and/or parameter;
control signaling.
16. The method of claim 15, wherein in the case that the output of the first module is first information, the method further comprises:
sending the first information to a second communication device, or sending the first information to a second module of the first communication device.
17. The method according to claim 16, wherein the first communication device is a terminal, and the second communication device is a network side device;
or,
the first communication device is a network side device, and the second communication device is a terminal;
or,
the first communication device is a first terminal, and the second communication device is a second terminal;
or,
the first communication device is a first network side device, and the second communication device is a second network side device.
18. A quantization apparatus, applied to a first communication device, comprising:
a first determining module, configured to determine a quantization policy, a quantization level, and/or a quantization configuration parameter of a first module of the first communication device, where the first module is an AI module;
a quantization module, configured to quantize the parameter of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
19. A communication device, comprising: a processor, a memory, and a program stored on the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the method according to any one of claims 1 to 17.
20. A readable storage medium, on which a program or instructions are stored, wherein the program or instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 17.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110240917.9A CN115037608B (en) | 2021-03-04 | 2021-03-04 | Quantization method, quantization device, quantization apparatus, and readable storage medium |
PCT/CN2022/078241 WO2022184009A1 (en) | 2021-03-04 | 2022-02-28 | Quantization method and apparatus, and device and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110240917.9A CN115037608B (en) | 2021-03-04 | 2021-03-04 | Quantization method, quantization device, quantization apparatus, and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115037608A true CN115037608A (en) | 2022-09-09 |
CN115037608B CN115037608B (en) | 2024-09-06 |
Family
ID=83118095
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110240917.9A Active CN115037608B (en) | 2021-03-04 | 2021-03-04 | Quantization method, quantization device, quantization apparatus, and readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115037608B (en) |
WO (1) | WO2022184009A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024104126A1 (en) * | 2022-11-14 | 2024-05-23 | 维沃移动通信有限公司 | Method and apparatus for updating ai network model, and communication device |
WO2024153039A1 (en) * | 2023-01-17 | 2024-07-25 | 维沃移动通信有限公司 | Ai model processing method, terminal, and network side device |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118525492A (en) * | 2022-12-20 | 2024-08-20 | 北京小米移动软件有限公司 | Information processing method and device, communication equipment and storage medium |
WO2024207317A1 (en) * | 2023-04-06 | 2024-10-10 | 富士通株式会社 | Information transceiving method and apparatus |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070078532A1 (en) * | 2003-10-29 | 2007-04-05 | Wolfgang Fick | Method for the operation of a technical system |
CN108647250A (en) * | 2018-04-19 | 2018-10-12 | 郑州科技学院 | A kind of talent's big data quantization fine matching method based on artificial intelligence |
CN109543826A (en) * | 2017-09-21 | 2019-03-29 | 杭州海康威视数字技术股份有限公司 | A kind of activation amount quantization method and device based on deep neural network |
CN110223105A (en) * | 2019-05-17 | 2019-09-10 | 知量科技(深圳)有限公司 | Trading strategies generation method and engine based on artificial intelligence model |
CN111160517A (en) * | 2018-11-07 | 2020-05-15 | 杭州海康威视数字技术股份有限公司 | Convolutional layer quantization method and device of deep neural network |
WO2020118553A1 (en) * | 2018-12-12 | 2020-06-18 | 深圳鲲云信息科技有限公司 | Method and device for quantizing convolutional neural network, and electronic device |
CN111582476A (en) * | 2020-05-09 | 2020-08-25 | 北京百度网讯科技有限公司 | Automatic quantization strategy searching method, device, equipment and storage medium |
CN111582432A (en) * | 2019-02-19 | 2020-08-25 | 北京嘉楠捷思信息技术有限公司 | Network parameter processing method and device |
CN111667054A (en) * | 2020-06-05 | 2020-09-15 | 北京百度网讯科技有限公司 | Method and device for generating neural network model, electronic equipment and storage medium |
CN111815458A (en) * | 2020-07-09 | 2020-10-23 | 四川长虹电器股份有限公司 | Dynamic investment portfolio configuration method based on fine-grained quantitative marking and integration method |
WO2020228655A1 (en) * | 2019-05-10 | 2020-11-19 | 腾讯科技(深圳)有限公司 | Method, apparatus, electronic device, and computer storage medium for optimizing quantization model |
CN112149266A (en) * | 2020-10-23 | 2020-12-29 | 北京百度网讯科技有限公司 | Method, device, equipment and storage medium for determining network model quantization strategy |
CN112215331A (en) * | 2019-07-10 | 2021-01-12 | 华为技术有限公司 | Data processing method for neural network system and neural network system |
CN112287986A (en) * | 2020-10-16 | 2021-01-29 | 浪潮(北京)电子信息产业有限公司 | Image processing method, device and equipment and readable storage medium |
CN112288697A (en) * | 2020-10-23 | 2021-01-29 | 北京百度网讯科技有限公司 | Method and device for quantifying degree of abnormality, electronic equipment and readable storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112085182A (en) * | 2019-06-12 | 2020-12-15 | 安徽寒武纪信息科技有限公司 | Data processing method, data processing device, computer equipment and storage medium |
CN110659678B (en) * | 2019-09-09 | 2023-11-17 | 腾讯科技(深圳)有限公司 | User behavior classification method, system and storage medium |
- 2021-03-04: CN CN202110240917.9A patent/CN115037608B/en, status Active
- 2022-02-28: WO PCT/CN2022/078241 patent/WO2022184009A1/en, Application Filing
Non-Patent Citations (3)
Title |
---|
""R1-2007090 FL summary 1 for Potential UE complexity reduction features for RedCap"", 3GPP TSG_RAN\\WG1_RL1, 21 August 2020 (2020-08-21) * |
HU Qinghe: "Application of non-uniform quantization in electric energy measurement", Electrical Measurement & Instrumentation, no. 10, 10 October 1990 (1990-10-10) *
XING Junwen; CHI Baoshan; LIU Feng: "Research on a three-parameter quantization model of the technical risk of R&D projects", Systems Engineering Theory & Practice, no. 10, 15 October 2008 (2008-10-15) *
Also Published As
Publication number | Publication date |
---|---|
WO2022184009A1 (en) | 2022-09-09 |
CN115037608B (en) | 2024-09-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114390580B (en) | Beam reporting method, beam information determining method and related equipment | |
CN115037608B (en) | Quantization method, quantization device, quantization apparatus, and readable storage medium | |
CN110167176B (en) | Wireless network resource allocation method based on distributed machine learning | |
WO2023040887A1 (en) | Information reporting method and apparatus, terminal and readable storage medium | |
JP2024512358A (en) | Information reporting method, device, first device and second device | |
WO2022105913A1 (en) | Communication method and apparatus, and communication device | |
CN115379508A (en) | Carrier management method, resource allocation method and related equipment | |
US11411600B2 (en) | Processing of uplink data streams | |
US20230299910A1 (en) | Communications data processing method and apparatus, and communications device | |
CN114422380A (en) | Neural network information transmission method, device, communication equipment and storage medium | |
CN114531696A (en) | Method and device for processing partial input missing of AI (Artificial Intelligence) network | |
WO2023040888A1 (en) | Data transmission method and apparatus | |
CN115022172B (en) | Information processing method, apparatus, communication device, and readable storage medium | |
US12021665B2 (en) | Methods and wireless network for selecting pilot pattern for optimal channel estimation | |
CN109151882B (en) | Method, terminal, computer readable medium and system for reporting RSRP | |
CN114501353B (en) | Communication information sending and receiving method and communication equipment | |
EP4150861B1 (en) | Determining cell upgrade | |
CN115843045A (en) | Data acquisition method and device | |
WO2024041421A1 (en) | Measurement feedback processing method and apparatus, terminal, and network side device | |
WO2024041420A1 (en) | Measurement feedback processing method and apparatus, and terminal and network-side device | |
EP4270884A1 (en) | Channel estimation using neural networks | |
CN117834427A (en) | Method and device for updating AI model parameters and communication equipment | |
Wang et al. | Joint Computing and Radio Resource Allocation in C-RAN Systems Under Imperfect CSI | |
CN117676668A (en) | Information transmission method, device, terminal and network side equipment | |
CN118214667A (en) | AI model monitoring method, AI model performance measuring device and AI model performance measuring equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||