WO2022184009A1 - Quantization method, apparatus, device, and readable storage medium - Google Patents

Quantization method, apparatus, device, and readable storage medium

Info

Publication number
WO2022184009A1
WO2022184009A1 (PCT/CN2022/078241)
Authority
WO
WIPO (PCT)
Prior art keywords
quantization
module
parameter
parameters
level
Prior art date
Application number
PCT/CN2022/078241
Other languages
English (en)
French (fr)
Inventor
Yang Ang (杨昂)
Original Assignee
Vivo Mobile Communication Co., Ltd. (维沃移动通信有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co., Ltd.
Publication of WO2022184009A1 publication Critical patent/WO2022184009A1/zh

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L 41/08 Configuration management of networks or network elements
    • H04L 41/0803 Configuration setting
    • H04L 41/0893 Assignment of logical groups to network elements
    • H04L 41/0894 Policy-based network configuration management
    • H04L 41/16 Arrangements for maintenance, administration or management of data switching networks using machine learning or artificial intelligence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks

Definitions

  • the present application belongs to the field of communication technologies, and specifically relates to a method, apparatus, device, and readable storage medium for artificial intelligence (Artificial Intelligence, AI) module quantization.
  • Embodiments of the present application provide a quantization method, apparatus, device, and readable storage medium to solve the problem of how to reduce the complexity of an AI module.
  • In a first aspect, a quantization method executed by a first communication device is provided, comprising:
  • determining a quantization strategy, a quantization level and/or a quantization configuration parameter of a first module of the first communication device, where the first module is an AI module;
  • performing quantization processing on the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
  • a quantization device applied to the first communication device, including:
  • a first determination module configured to determine a quantization strategy, a quantization level and/or a quantization configuration parameter of a first module of the first communication device, where the first module is an AI module;
  • a quantization module configured to perform quantization processing on the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
  • a communication device comprising: a processor, a memory, and a program stored on the memory and executable on the processor, the program being executed by the processor to implement the steps of the method according to the first aspect.
  • a readable storage medium is provided, and a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the method according to the first aspect are implemented.
  • a computer program product is provided, the computer program product being stored in a non-volatile storage medium, the program product being executed by at least one processor to implement the steps of the method according to the first aspect.
  • In a sixth aspect, a chip is provided, including a processor and a communication interface, the communication interface being coupled to the processor, and the processor being configured to run a program or an instruction to implement the method according to the first aspect.
  • the AI module is quantized through a quantization strategy, a quantization level, and/or a quantization configuration parameter, so that the complexity of the AI module can be reduced and the system performance can be improved.
  • FIG. 1 is a schematic diagram of a wireless communication system to which an embodiment of the present application can be applied;
  • FIG. 3 is a schematic diagram of a device for quantization provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a terminal according to an embodiment of the present application.
  • FIG. 5 is a schematic diagram of a network side device according to an embodiment of the present application.
  • AI modules implement artificial intelligence, for example through neural networks, decision trees, support vector machines, Bayesian classifiers, etc.
  • An optimization algorithm is a class of algorithms that minimize or maximize an objective function (sometimes called a loss function).
  • The objective function is often a mathematical combination of model parameters and data. For example, given data X and its corresponding label Y, construct a neural network model f(·); with the model, the predicted output f(x) can be obtained from the input x, and the gap between the predicted value and the actual value, (f(x) - Y), can be calculated. This is the loss function.
  • The purpose is to find suitable W and b that minimize the value of the above loss function; the smaller the loss value, the closer the model is to the real situation.
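The loss computation described above can be sketched with a toy linear model f(x) = W·x + b and a mean-squared-error loss; the data values, W, and b below are purely illustrative and not taken from the patent:

```python
import numpy as np

# Hypothetical data: inputs X and corresponding labels Y (illustrative values).
X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])

def loss(W, b):
    """Mean squared error between predictions f(x) = W*x + b and labels Y."""
    pred = W * X + b
    return np.mean((pred - Y) ** 2)

# A well-fitted W, b gives a small loss; a poor fit gives a large loss.
print(loss(2.0, 0.0))  # perfect fit -> 0.0
print(loss(0.0, 0.0))  # poor fit -> large loss
```

The optimization algorithm's job is then to search for the (W, b) that drives this value down.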
  • the current common optimization algorithms are basically based on the error back propagation (error Back Propagation, BP) algorithm.
  • the basic idea of the BP algorithm is that the learning process consists of two processes, the forward propagation of the signal and the back propagation of the error.
  • input samples are passed in from the input layer, processed layer by layer in each hidden layer, and then transmitted to the output layer. If the actual output of the output layer does not match the expected output, it goes to the back-propagation stage of the error.
  • The error back-propagation passes the output error back toward the input layer, layer by layer through the hidden layers in some form, and distributes the error to all units of each layer, so as to obtain the error signal of each layer's units; this error signal serves as the basis for correcting the unit weights.
  • the weight adjustment process of each layer of signal forward propagation and error back propagation is carried out repeatedly.
  • the process of continuously adjusting the weights is the learning and training process of the network. This process continues until the error of the network output is reduced to an acceptable level, or until a preset number of learning times is reached.
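The repeated forward-propagation / error-back-propagation / weight-adjustment loop can be illustrated with a minimal gradient-descent sketch on a one-parameter linear model; all values and the learning rate are hypothetical:

```python
import numpy as np

# Hypothetical training data.
X = np.array([1.0, 2.0, 3.0])
Y = np.array([2.0, 4.0, 6.0])
W, b, lr = 0.0, 0.0, 0.05

for _ in range(1000):
    pred = W * X + b               # forward propagation of the signal
    err = pred - Y                 # output error
    grad_W = 2 * np.mean(err * X)  # error propagated back to the weight
    grad_b = 2 * np.mean(err)      # error propagated back to the bias
    W -= lr * grad_W               # weight correction step
    b -= lr * grad_b

print(W, b)  # W approaches 2.0, b approaches 0.0
```

Training stops in practice once the output error is acceptably small or a preset number of iterations is reached, exactly as the description above states.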
  • The terms "first", "second", etc. in the description and claims of the present application are used to distinguish similar objects, not to describe a specific order or sequence. It is to be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be practiced in sequences other than those illustrated or described herein; moreover, objects distinguished by "first" and "second" are usually of one class, and the number of objects is not limited.
  • the first object may be one or multiple.
  • "And/or" in the description and claims indicates at least one of the connected objects, and the character "/" generally indicates that the associated objects are in an "or" relationship.
  • The techniques described are applicable to various wireless communication systems, such as Long Term Evolution (LTE), LTE-Advanced (LTE-A), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal Frequency Division Multiple Access (OFDMA), and Single-Carrier Frequency-Division Multiple Access (SC-FDMA) systems.
  • "System" and "network" in the embodiments of the present application are often used interchangeably, and the described technology can be used not only for the above-mentioned systems and radio technologies, but also for other systems and radio technologies.
  • The following description describes a New Radio (NR) system for example purposes and uses NR terminology in most of the description below, but these techniques are also applicable to systems other than NR, such as the 6th generation (6G) communication system.
  • FIG. 1 shows a block diagram of a wireless communication system to which the embodiments of the present application can be applied.
  • the wireless communication system includes a terminal 11 and a network-side device 12 .
  • The terminal 11 may also be called a terminal device or user equipment (UE). The terminal 11 may be a mobile phone, a tablet computer, a laptop or notebook computer, a personal digital assistant (PDA), a netbook, an ultra-mobile personal computer (UMPC), a mobile internet device (MID), a wearable device, a vehicle-mounted device (VUE), a pedestrian terminal (PUE), or another terminal-side device; wearable devices include bracelets, headphones, glasses, etc.
  • The network side device 12 may be a base station or a core network side device, where the base station may be referred to as a Node B, an evolved Node B (eNB), an access point, a base transceiver station (BTS), a radio base station, a radio transceiver, a basic service set (BSS), an extended service set (ESS), a home Node B, a home evolved Node B, a WLAN access point, a WiFi node, a transmitting/receiving point (TRP), or some other suitable term in the field; as long as the same technical effect is achieved, the base station is not limited to a specific technical vocabulary.
  • the base station is taken as an example, but the specific type of the base station is not limited.
  • An embodiment of the present application provides a quantization method. The execution subject of the method may be a first communication device, and the method includes:
  • Step 201 Determine a quantization strategy, a quantization level and/or a quantization configuration parameter of a first module of the first communication device, where the first module is an AI module;
  • Step 202 Perform quantization processing on the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
  • the above-mentioned quantization strategy can also be called a quantization method, which refers to which method is used to quantize the parameters of the AI module.
  • the above quantization level can represent the accuracy of parameter quantization of the AI module. For example, the higher the quantization level, the more accurate the parameters of the AI module are, and the closer to the original parameters; the lower the quantization level, the rougher the parameters of the AI module, and the farther away from the original parameters.
  • The quantization level is expressed in bits: a quantization level of X bits means that the parameters of the AI module are quantized to X bits. The larger the value of X, the more bits the parameters of the AI module occupy, where X is a positive integer.
  • For example, the common single-precision type (float) in computers occupies 32 bits, and the double-precision type (double) occupies 64 bits, which is in effect a very high-precision quantization.
  • the above-mentioned quantization configuration parameters are used to indicate the configuration for quantizing the AI module.
  • The quantization configuration parameters may include one or more of the following: which quantization strategy the AI module adopts and how the details of that strategy are configured; whether a uniform quantization level is adopted for all parameters of the AI module; whether the quantization level of the multiplicative coefficients of the AI module is the same as that of the additive coefficients; what the quantization level of the AI module is; how many bits are used for parameter quantization of the AI module; and so on.
  • For example, if the quantization level is configured as 8 bits and the quantization strategy is the direct quantization method, all parameters of the AI module are quantized from floating-point numbers to 8 bits. If the AI module is a neural network, the multiplicative and additive coefficients of all neurons are quantized to 8 bits.
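As an illustration of such an 8-bit configuration, here is a minimal sketch that maps float parameters onto 256 levels; the weight values are hypothetical, and the affine min-max mapping is just one common way to realize the configuration, not necessarily the patent's method:

```python
import numpy as np

def quantize_8bit(params):
    """Map float parameters onto 256 evenly spaced levels (8-bit quantization)."""
    lo, hi = params.min(), params.max()
    scale = (hi - lo) / 255
    q = np.round((params - lo) / scale).astype(np.uint8)  # stored in 8 bits
    return q, lo, scale

def dequantize(q, lo, scale):
    """Recover approximate float values from the 8-bit codes."""
    return q.astype(np.float32) * scale + lo

weights = np.array([-0.51, 0.03, 0.27, 0.92], dtype=np.float32)  # hypothetical coefficients
q, lo, scale = quantize_8bit(weights)
restored = dequantize(q, lo, scale)
# Each restored value differs from the original by at most half a quantization step.
print(np.max(np.abs(restored - weights)) <= scale / 2 + 1e-6)
```

The trade-off is exactly the one the text describes: fewer bits per parameter, at the cost of a bounded rounding error.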
  • The quantization strategy may include one or more of the following: a direct quantization method, a uniform quantization method, a non-uniform quantization method, a weight-sharing quantization method, a transform-domain quantization method, a parameter-encoding quantization method, and a product quantization method.
  • the above-mentioned direct quantization method refers to quantizing various parameters of the AI module directly according to the quantization level and/or the quantization configuration parameter.
  • the above-mentioned uniform quantization method refers to a quantization method in which the parameters of the AI module (for example, the value range of the input parameters) are divided into equal intervals.
  • the above-mentioned non-uniform quantization method is a quantization method in which the quantization intervals are not equal within the dynamic range of the parameters (eg, input parameters) of the AI module.
  • The quantization interval/quantization level can differ across input intervals. For example, for an interval with small input values, the quantization interval is also small; conversely, for an interval with large input values, the quantization interval is large.
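One common way to realize such a non-uniform scheme (offered only as an illustration, not necessarily the method intended here) is μ-law companding, where a logarithmic curve gives small inputs small quantization intervals and large inputs large ones:

```python
import numpy as np

def mu_law_quantize(x, mu=255, levels=256):
    """Non-uniform quantization: compress with a log curve, then quantize uniformly.
    Small input values get small quantization intervals; large values get large ones."""
    compressed = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)  # in [-1, 1]
    q = np.round((compressed + 1) / 2 * (levels - 1))
    return q.astype(int)

def mu_law_dequantize(q, mu=255, levels=256):
    """Invert the uniform step, then expand back through the inverse log curve."""
    compressed = q / (levels - 1) * 2 - 1
    return np.sign(compressed) * ((1 + mu) ** np.abs(compressed) - 1) / mu

x = np.array([0.01, 0.5])  # hypothetical small and large parameter values
r = mu_law_dequantize(mu_law_quantize(x))
# The reconstruction error near zero is much smaller than near 0.5.
print(abs(r[0] - 0.01) < abs(r[1] - 0.5))  # True
```

This matches the behavior described above: fine intervals where values are small, coarse intervals where they are large.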
  • In the weight-sharing quantization method, the parameters of the AI module can be divided into multiple sets, and the elements in each set share a value.
  • The transform-domain quantization method refers to transforming the parameters of the AI module (such as weights, biases, convolution kernels, etc.) to another domain, such as the frequency domain, S domain, or Z domain, performing the quantization operation in that domain, and then transforming back.
  • For example, the network convolution kernel is first transformed to the frequency domain, random hashing is then performed in the frequency domain, and fewer hash bits are used for the less important high-frequency parts to achieve higher compression.
  • the parameter encoding and quantization method refers to encoding the parameters of the AI module, and the encoding methods include but are not limited to: lossy encoding, lossless encoding (for example, Huffman encoding), and the like.
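A minimal sketch of the lossless-encoding option using Huffman coding over already-quantized weights; the weight values are hypothetical, and the point is only that frequent values get shorter bit strings:

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a Huffman code: shorter bit strings for more frequent symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: [total count, unique tiebreaker, {symbol: code-so-far}]
    heap = [[n, i, {s: ""}] for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tick = len(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in lo[2].items()}
        merged.update({s: "1" + c for s, c in hi[2].items()})
        heapq.heappush(heap, [lo[0] + hi[0], tick, merged])
        tick += 1
    return heap[0][2]

# Hypothetical quantized weights: the value 0 dominates after quantization.
weights = [0, 0, 0, 0, 0, 1, 1, 2]
code = huffman_code(weights)
bits = sum(len(code[w]) for w in weights)
print(bits)  # fewer bits than the 16 needed at a fixed 2 bits per weight
```

Because quantized AI-module parameters tend to have very uneven value frequencies, such entropy coding can shrink storage further without any additional loss.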
  • the product quantization method refers to dividing the network weights into multiple subspaces, and performing quantization operations on each subspace, such as weight sharing quantization method on each subspace.
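A rough sketch of product quantization under these assumptions: a hypothetical weight matrix is split into subspaces, and a small k-means acts as the per-subspace (weight-sharing) quantizer:

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(points, k, iters=20):
    """Tiny k-means: returns cluster centers and each point's assigned center index."""
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):  # keep empty clusters at their old center
                centers[j] = points[assign == j].mean(axis=0)
    return centers, assign

def product_quantize(weights, num_subspaces=2, k=4):
    """Split each weight vector into subspaces and quantize each subspace separately."""
    codebooks, codes = [], []
    for sub in np.split(weights, num_subspaces, axis=1):
        centers, assign = kmeans(sub, k)
        codebooks.append(centers)
        codes.append(assign)
    return codebooks, codes

weights = rng.normal(size=(32, 8))  # hypothetical weight matrix
codebooks, codes = product_quantize(weights)
# Each 8-dim row is now stored as 2 small indices into 4-entry codebooks.
print(len(codebooks), codes[0].shape)
```

Each row of weights is thereby compressed to two 2-bit indices plus the shared codebooks, which is the storage saving product quantization aims at.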
  • the quantization strategy includes: a uniform quantization method, a weight sharing quantization method, and a parameter coding quantization method.
  • For example, the network is first quantized by the uniform quantization method, the uniformly quantized weights are then quantized using the weight-sharing quantization method, and finally the weights are encoded according to the parameter-encoding quantization method.
  • Optionally, in the network training phase, the parameters of the first module are quantized according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
  • For example, use the ordinary gradient calculation method to obtain the gradient corresponding to each weight, and accumulate the weight gradient values of the same group (according to the previous weight grouping) to obtain the update amount of the cluster center in this round of network training; then subtract the product of the update amount and the learning rate from the cluster-center value to obtain the updated cluster center for this round of training.
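The per-group gradient accumulation and cluster-center update just described can be sketched as follows; the gradients, group assignments, and learning rate are illustrative values:

```python
import numpy as np

# Hypothetical per-weight gradients and their group (cluster) assignments.
weight_grads = np.array([0.2, -0.1, 0.4, 0.3, -0.2])
groups = np.array([0, 1, 0, 1, 1])  # weight i belongs to cluster groups[i]
centers = np.array([0.5, -0.3])     # current shared values (cluster centers)
lr = 0.1

# Accumulate the gradients of the weights in each group to get the
# update amount for that group's cluster center.
updates = np.zeros_like(centers)
np.add.at(updates, groups, weight_grads)

# Subtract the product of the update amount and the learning rate.
centers = centers - lr * updates
print(centers)  # [0.44, -0.3]
```

Only the cluster centers are trained; every weight in a group continues to share its group's single value, which is what keeps the quantized representation compact during training.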
  • the parameter division method in the grouping quantization method includes:
  • the parameters of the AI module can be grouped in a random manner.
  • The above method (2) can also be called a direct addressing method. For example, sort the parameters of the AI module, determine the ID of each parameter, and then input the parameter ID into a linear function, an N-th power function, or another common function to obtain a new value X; the set ID of the network parameter is then obtained through X.
  • linear functions include functions whose output is equal to the input.
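The direct addressing step can be sketched with a hypothetical linear function a·id + b followed by rounding down; the coefficients below are illustrative, not from the patent:

```python
def set_id(param_id, a=0.5, b=1.0):
    """Direct addressing: feed the parameter ID into a (hypothetical) linear
    function a*id + b, then round down to obtain the set ID."""
    x = a * param_id + b
    return int(x)  # truncation equals floor for non-negative x

# Parameters sorted by value get IDs 0..5; adjacent pairs land in the same set.
print([set_id(i) for i in range(6)])  # [1, 1, 2, 2, 3, 3]
```

With a = 0.5, every two consecutive parameter IDs map to one set, so the function itself encodes the grouping without storing any per-parameter table.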
  • the determining the set identifier where the parameter is located according to the identifier of the parameter includes:
  • the set identifier where the parameter is located includes one or more of the following:
  • The identifier of the parameter is input into a linear function or another common mathematical function to obtain the first value (X).
  • Common mathematical functions include combinations of addition, subtraction, multiplication, division, the N-th power, the N-th root, logarithms, derivatives, partial derivatives, and other common mathematical operations.
  • N is any number, for example, N can be positive or negative or 0, real or complex.
  • the method of obtaining the ID of the set where the network parameter is located by X includes:
  • X is rounded, and the result is the set ID. Rounding includes rounding up, rounding down, or rounding to the nearest integer. For example, if X is 3.23, the set ID can be 3 or 4, where 3 corresponds to rounding down (or rounding to the nearest integer) and 4 corresponds to rounding up.
  • Alternatively, at least one digit of X is taken, and the digits are combined into the set ID. For example, certain digit choices may give a set ID of 25; the 1st and 3rd digits from the back may give a set ID of 72 or 27; the 1st and 2nd decimal places may give 12 or 21; the 1st and 2nd digits before the decimal point may give 51 or 15; and the 2nd digit before the decimal point together with the 3rd decimal place may give 53 or 35. If the selected digits are 2, 8, and 5, then combining them in the order selected gives a set ID of 285; from back to front, 582; from large to small, 852; from small to large, 258. If X is 52 and the 1st, 3rd, and 5th digits from front to back are taken, the values of the corresponding digits are 5, 0, and 0.
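The digit-selection idea can be sketched as follows; padding missing positions with 0 follows the example above where X = 52 yields digits 5, 0, 0 (the helper name and the three-decimal padding are assumptions for illustration):

```python
def digits_of(x, positions, frac_places=3):
    """Take selected digits of X and combine them into a set ID.
    positions are 1-based, counted from the front of the digit string;
    the string is padded with zeros so every requested position exists."""
    s = f"{x:.{frac_places}f}".replace(".", "").lstrip("-")
    s = s.ljust(max(positions), "0")
    return int("".join(s[p - 1] for p in positions))

print(digits_of(3.23, [1]))      # first digit -> 3
print(digits_of(52, [1, 3, 5]))  # digits 5, 0, 0 -> 500
```

Reordering `positions` (front-to-back, back-to-front, sorted by size) reproduces the different combination orders listed above.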
  • Alternatively, the parameters of the AI module are grouped according to cluster centers.
  • For example, K objects are randomly selected as the initial cluster centers; the distance between each object and each seed cluster center is then calculated, and each object is assigned to the cluster center closest to it. A cluster center and the objects assigned to it represent a cluster. Once all objects are assigned, the cluster center of each cluster is recalculated based on the objects currently in the cluster. This process repeats until a termination condition is met, such as no (or a minimum number of) objects being reassigned to different clusters, no (or a minimum number of) cluster centers changing, or the sum of squared errors reaching a local minimum.
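The grouping procedure just described is essentially k-means; a minimal 1-D sketch with illustrative parameter values, terminating once no cluster center changes:

```python
import numpy as np

def group_parameters(params, k, seed=0, max_iters=100):
    """Group 1-D parameters by k-means: pick k random objects as initial
    cluster centers, assign each object to the nearest center, recompute
    the centers, and stop once no center changes."""
    rng = np.random.default_rng(seed)
    centers = np.sort(rng.choice(params, k, replace=False))
    for _ in range(max_iters):
        assign = np.abs(params[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = np.array([params[assign == j].mean() if np.any(assign == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):  # termination: centers unchanged
            break
        centers = new_centers
    return centers, assign

# Hypothetical AI-module parameters clustered around two values.
params = np.array([0.11, 0.09, 0.10, 0.89, 0.91, 0.90])
centers, assign = group_parameters(params, k=2)
print(np.round(np.sort(centers), 2))  # roughly [0.1, 0.9]
```

Each resulting group can then share its cluster-center value, which is exactly how this grouping feeds the weight-sharing quantization described earlier.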
  • The quantization strategy and/or the quantization configuration parameters are determined according to one or more of the following:
  • the network side can obtain the quantization strategy and/or the quantization configuration parameter according to the method reported by the terminal.
  • the quantization strategy and/or the quantization configuration parameters can be used as the capabilities of the terminal.
  • the terminal side can acquire the quantization strategy and/or the quantization configuration parameter according to the configuration of the network side.
  • the network side configures, activates or triggers through radio resource control (Radio Resource Control, RRC), media access control control element (Media Access Control Control Element, MAC CE) or downlink control information (Downlink Control Information, DCI).
  • the quantization strategy is a direct quantization method
  • the step of performing quantization processing on the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter includes:
  • the parameters of the first module are quantized according to the quantization level and/or the quantization configuration parameter of the first module.
  • the quantization level is determined according to one or more of the following:
  • the relevant information of the parameter of the first module includes: the size of the parameter;
  • Different quantization levels can be determined according to the size of the parameters of the AI module. For example, the larger the parameter of the AI module, the finer the quantization; the smaller the parameter, the coarser the quantization. Alternatively, the larger the parameter of the AI module, the coarser the quantization; the smaller the parameter, the finer the quantization.
  • the network side can obtain the quantization level according to the method reported by the terminal.
  • the quantization level can be used as the capability of the terminal.
  • The performance requirements of an AI module are divided into multiple levels, and different levels of performance requirements correspond to different quantization levels.
  • the type of the first module is a neural network
  • the quantization levels of neurons in different layers in the neural network are the same;
  • the quantization levels of neurons in the same layer in the neural network are the same;
  • the quantization level of the multiplicative coefficients in the neural network is the same as the quantization level of the additive coefficients.
  • the type of the first module is a neural network
  • the quantization levels of neurons in different layers in the neural network are different
  • the quantization levels of neurons in the same layer in the neural network are different;
  • the quantization level of the multiplicative coefficients in the neural network is different from the quantization level of the additive coefficients.
  • the type of the first module is a Recurrent Neural Network (RNN);
  • The quantization level of the parameters of the memory units in the RNN is the same as that of the non-memory units in the RNN (including neurons and non-neuron units), or the same as the quantization level of the non-memory parameters of the neurons of the RNN; or, the quantization level of the parameters of the memory units in the RNN is different from that of the non-memory neurons in the RNN, or different from the quantization level of the non-memory parameters of the neurons in the RNN.
  • the type of the first module is a convolutional neural network (Convolutional Neural Networks, CNN);
  • the quantization level of the parameters of the convolution kernel of the convolutional neural network is the same or different from the quantization level of the parameters of the non-convolution kernel in the convolutional neural network,
  • the quantization level of the pooled parameters (multiplicative coefficients, additive coefficients) of the convolutional neural network is the same or different from the quantization level of the non-pooled parameters in the convolutional neural network.
  • the input or output of the first module is first information
  • the first information includes one or more of the following:
  • The reference signal is used for signal processing, including signal detection, filtering, equalization, etc., such as the demodulation reference signal (DMRS), sounding reference signal (SRS), synchronization signal block (Synchronization Signal and PBCH block, SSB), tracking reference signal (TRS), phase-tracking reference signal (PTRS), channel state information reference signal (CSI-RS), and so on.
  • the channel may include one or more of the following: Physical Downlink Control Channel (PDCCH), Physical Downlink Shared Channel (PDSCH), Physical Uplink Control Channel (PUCCH) ), Physical Uplink Shared Channel (PUSCH), Physical Random Access Channel (PRACH), Physical Broadcast Channel (PBCH), etc.
  • The channel state information includes channel state information feedback information and/or channel state information based on partial uplink-downlink reciprocity in a frequency division duplex (FDD) system.
  • The channel state information feedback information includes one or more of the following: channel related information, channel matrix related information, channel feature information, channel matrix feature information, precoding matrix indicator (PMI), rank indicator (RI), CSI-RS resource indicator (CRI), channel quality indicator (CQI), layer indicator (LI), etc.
  • For example, the base station obtains angle and delay information from the uplink channel, and can notify the UE of the angle and delay information through CSI-RS precoding or direct indication; the UE then reports the information according to the base station's indication, or selects and reports within the range indicated by the base station, thereby reducing the computation burden on the UE and the overhead of CSI reporting.
  • The beam information includes one or more of the following: beam quality, beam indication information (reference signal ID), beam failure indication information, and new beam indication information in beam failure recovery. It is used for beam management, including beam measurement, beam reporting, beam prediction, beam failure detection, beam failure recovery, and new beam indication in beam failure recovery.
  • the channel prediction information includes: prediction of channel state information and beam prediction.
  • the interference information includes one or more of the following: intra-cell interference information, inter-cell interference information, out-of-band interference information, intermodulation interference information, etc.
  • Positioning information (or track information);
  • For example, positioning may be performed based on a reference signal, e.g. the sounding reference signal (SRS).
  • predictive information or management information may include throughput, required packet size, traffic requirements, movement speed, and/or noise information, etc.
  • Optionally, when the output of the first module is the first information, the method further includes:
  • the first information is sent to a second communication device, or the first information is sent to a second module of the first communication device.
  • Optionally, the first communication device is a terminal and the second communication device is a network side device; or, the first communication device is a network side device and the second communication device is a terminal; or, the first communication device is a first terminal and the second communication device is a second terminal; or, the first communication device is a first network side device and the second communication device is a second network side device.
  • the AI module is quantized through a quantization strategy, a quantization level, and/or a quantization configuration parameter, so that the complexity of the AI module can be reduced and the system performance can be improved.
  • an embodiment of the present application provides a quantization apparatus, which is applied to a first communication device, and the apparatus 300 includes:
  • a first determination module 301 configured to determine a quantization strategy, a quantization level and/or a quantization configuration parameter of a first module of the first communication device, where the first module is an AI module;
  • a quantization module 302 configured to perform quantization processing on the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
  • the quantization strategy includes one or more of the following:
  • The quantization strategy includes: a uniform quantization method, a weight-sharing quantization method, and a parameter-encoding quantization method, wherein the network is first uniformly quantized by the uniform quantization method, the uniformly quantized weights are then quantized using the weight-sharing quantization method, and finally the weights are encoded according to the parameter-encoding quantization method.
  • the quantization module 302 is further configured to: in the network training phase, perform quantization processing on the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameter.
  • the parameter division method in the grouping quantization method includes:
  • the determining the set identifier where the parameter is located according to the identifier of the parameter includes:
  • the set identifier where the parameter is located includes one or more of the following:
  • the quantization strategy and/or the quantization configuration parameters are determined according to one or more of the following:
  • the quantization strategy is a direct quantization method
  • The quantization module 302 is further configured to quantize the parameters of the first module according to the quantization level and/or quantization configuration parameters of the first module.
  • the quantization level is determined according to one or more of the following:
  • The relevant information of the parameter of the first module includes: the size of the parameter; wherein, the larger the parameter, the higher the quantization level, and the smaller the parameter, the lower the quantization level; or, the larger the parameter, the lower the quantization level, and the smaller the parameter, the higher the quantization level.
  • the type of the first module is a neural network
  • the quantization levels of neurons in different layers in the neural network are the same;
  • the quantization levels of neurons in the same layer in the neural network are the same;
  • the quantization level of the multiplicative coefficients in the neural network is the same as the quantization level of the additive coefficients.
  • the type of the first module is a neural network
  • the quantization levels of neurons in different layers in the neural network are different
  • the quantization levels of neurons in the same layer in the neural network are different;
  • the quantization level of the multiplicative coefficients in the neural network is different from the quantization level of the additive coefficients.
  • the type of the first module is a recurrent neural network
  • the quantization level of the parameters of the memory unit in the RNN is the same as the quantization level of the parameters of the non-memory neurons in the RNN, or the same as the quantization level of the non-memory parameters of the neurons in the RNN;
  • or, the quantization level of the parameters of the memory unit in the RNN is different from the quantization level of the parameters of the non-memory neurons in the RNN, or different from the quantization level of the non-memory parameters of the neurons in the RNN.
  • the type of the first module is a convolutional neural network
  • the quantization level of the parameters of the convolution kernel of the convolutional neural network is the same or different from the quantization level of the parameters of the non-convolution kernel in the convolutional neural network,
  • the quantization level of the pooled parameters in the convolutional neural network is the same or different from the quantization level of the non-pooled parameters in the convolutional neural network.
  • the input or output of the first module is first information
  • the first information includes one or more of the following: a reference signal; a signal carried on a channel; channel state information; beam information; channel prediction information; interference information; positioning information; prediction information of higher-layer services and/or parameters; management information of higher-layer services and/or parameters; control signaling.
  • when the output of the first module is the first information, the apparatus further includes:
  • a sending module configured to send the first information to a second communication device, or send the first information to a second module of the first communication device.
  • the first communication device is a terminal and the second communication device is a network-side device; or, the first communication device is a network-side device and the second communication device is a terminal; or, the first communication device is a first terminal and the second communication device is a second terminal; or, the first communication device is a first network-side device and the second communication device is a second network-side device.
  • the apparatus provided in the embodiment of the present application can implement each process implemented by the method embodiment shown in FIG. 2 , and achieve the same technical effect. To avoid repetition, details are not described here.
  • the terminal 400 includes but is not limited to: a radio frequency unit 401, a network module 402, an audio output unit 403, an input unit 404, a sensor 405, a display unit 406, a user input unit 407, an interface unit 408, a memory 409, a processor 410, and other components.
  • the terminal 400 may also include a power source (such as a battery) for supplying power to the various components; the power source may be logically connected to the processor 410 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system.
  • the terminal structure shown in FIG. 4 does not constitute a limitation on the terminal; the terminal may include more or fewer components than shown, combine some components, or adopt a different component arrangement, which will not be repeated here.
  • the input unit 404 may include a graphics processing unit (Graphics Processing Unit, GPU) 4041 and a microphone 4042; the graphics processing unit 4041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode.
  • the display unit 406 may include a display panel 4061, which may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 407 includes a touch panel 4071 and other input devices 4072.
  • the touch panel 4071 is also called a touch screen.
  • the touch panel 4071 may include two parts, a touch detection device and a touch controller.
  • Other input devices 4072 may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which are not described herein again.
  • the radio frequency unit 401 receives downlink data from a network-side device and then delivers it to the processor 410 for processing; in addition, it sends uplink data to the network-side device.
  • the radio frequency unit 401 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like.
  • Memory 409 may be used to store software programs or instructions as well as various data.
  • the memory 409 may mainly include a program or instruction storage area and a data storage area, wherein the program or instruction storage area may store an operating system, an application program or instructions required for at least one function (such as a sound playback function or an image playback function), and the like.
  • the memory 409 may include a high-speed random access memory, and may also include a non-volatile memory, wherein the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
  • the processor 410 may include one or more processing units; optionally, the processor 410 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and application programs or instructions, and the modem processor mainly handles wireless communication, for example a baseband processor. It can be understood that the modem processor may alternatively not be integrated into the processor 410.
  • the terminal provided in this embodiment of the present application can implement each process implemented by the method embodiment shown in FIG. 2 and achieve the same technical effect. To avoid repetition, details are not described here.
  • the network-side device 500 includes: an antenna 501 , a radio frequency device 502 , and a baseband device 503 .
  • the antenna 501 is connected to the radio frequency device 502 .
  • the radio frequency device 502 receives information through the antenna 501, and sends the received information to the baseband device 503 for processing.
  • the baseband device 503 processes the information to be sent and sends it to the radio frequency device 502
  • the radio frequency device 502 processes the received information and sends it out through the antenna 501 .
  • the foregoing frequency-band processing apparatus may be located in the baseband device 503; the method performed by the network-side device in the above embodiments may be implemented in the baseband device 503, which includes a processor 504 and a memory 505.
  • the baseband device 503 may include, for example, at least one baseband board on which multiple chips are arranged; as shown in FIG. 5, one of the chips is, for example, the processor 504, which is connected to the memory 505 to call the program in the memory 505 and perform the network-device operations shown in the above method embodiments.
  • the baseband device 503 may further include a network interface 506 for exchanging information with the radio frequency device 502, and the interface is, for example, a Common Public Radio Interface (CPRI).
  • the network-side device in this embodiment of the present application further includes: instructions or a program stored in the memory 505 and executable on the processor 504; the processor 504 invokes the instructions or program in the memory 505 to perform the method performed by each module shown in FIG. 3 and achieve the same technical effect; to avoid repetition, details are not repeated here.
  • Embodiments of the present application further provide a computer program product, where the computer program product is stored in a non-volatile storage medium and is executed by at least one processor to implement the steps of the method shown in FIG. 2 and achieve the same technical effect; to avoid repetition, details are not repeated here.
  • An embodiment of the present application further provides a readable storage medium, where a program or instructions are stored on the readable storage medium; when the program or instructions are executed by a processor, each process of the method embodiment shown in FIG. 2 is implemented and the same technical effect is achieved; to avoid repetition, details are not repeated here.
  • the processor is the processor in the terminal described in the foregoing embodiment.
  • the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
  • An embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a network-side device program or instructions to implement each process of the method embodiment shown in FIG. 2 and achieve the same technical effect; to avoid repetition, details are not repeated here.
  • the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip, or the like.
  • the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or a CD-ROM) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the various embodiments of this application.
  • the disclosed apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division; in actual implementation there may be other division manners, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part contributing to the related technology, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk and other mediums that can store program codes.
  • all or some of the processes in the methods of the above embodiments can be implemented by a computer program controlling relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods.
  • the storage medium may be a magnetic disk, an optical disk, a ROM or a RAM, and the like.


Abstract

A quantization method, apparatus, device, and readable storage medium. The method includes: determining a quantization strategy, a quantization level and/or quantization configuration parameters of a first module of a first communication device, the first module being an AI module; and quantizing the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameters.

Description

Quantization Method, Apparatus, Device, and Readable Storage Medium
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 202110240917.9, filed in China on March 4, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
This application belongs to the field of communication technology, and specifically relates to a quantization method, apparatus, device, and readable storage medium for an artificial intelligence (AI) module.
Background
Artificial intelligence is now widely applied in many fields. In a communication network, artificial intelligence can be realized by an AI module. However, there is currently no procedure for quantizing an AI module, which increases the complexity of the AI module.
Summary
Embodiments of the present application provide a quantization method, apparatus, device, and readable storage medium, to solve the problem of how to reduce the complexity of an AI module.
According to a first aspect, a quantization method is provided, performed by a first communication device, including:
determining a quantization strategy, a quantization level and/or quantization configuration parameters of a first module of the first communication device, the first module being an artificial intelligence (AI) module;
quantizing the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameters.
According to a second aspect, a quantization apparatus is provided, applied to a first communication device, including:
a first determination module, configured to determine a quantization strategy, a quantization level and/or quantization configuration parameters of a first module of the first communication device, the first module being an AI module;
a quantization module, configured to quantize the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameters.
According to a third aspect, a communication device is provided, including: a processor, a memory, and a program stored in the memory and executable on the processor, where the program, when executed by the processor, implements the steps of the method according to the first aspect.
According to a fourth aspect, a readable storage medium is provided, on which a program or instructions are stored, where the program or instructions, when executed by a processor, implement the steps of the method according to the first aspect.
According to a fifth aspect, a computer program product is provided, stored in a non-volatile storage medium and executed by at least one processor to implement the steps of the method according to the first aspect.
According to a sixth aspect, a chip is provided, including a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement the method according to the first aspect.
In the embodiments of the present application, an AI module is quantized through a quantization strategy, a quantization level and/or quantization configuration parameters, so that the complexity of the AI module can be reduced and system performance improved.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a wireless communication system to which embodiments of the present application are applicable;
FIG. 2 is a flowchart of a quantization method provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a quantization apparatus provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a terminal according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a network-side device according to an embodiment of the present application.
Detailed Description
To facilitate understanding of the embodiments of the present application, the following technical point is first introduced: artificial intelligence.
Artificial intelligence is now widely applied in many fields. An AI module that realizes artificial intelligence can be implemented in multiple ways, for example as a neural network, a decision tree, a support vector machine, a Bayesian classifier, and so on.
Taking a neural network as an example, the parameters of the neural network are optimized by an optimization algorithm. An optimization algorithm is a class of algorithms that minimize or maximize an objective function (sometimes called a loss function), where the objective function is often a mathematical combination of the model parameters and the data. For example, given data X and its corresponding label Y, a neural network model f(.) is constructed; with the model, a predicted output f(x) can be obtained from an input x, and the gap between the predicted value and the true value, (f(x)-Y), can be computed. This is the loss function. The goal is to find suitable W and b that minimize the value of the loss function; the smaller the loss value, the closer the model is to the real situation.
Currently common optimization algorithms are basically based on the error back propagation (BP) algorithm. The basic idea of the BP algorithm is that the learning process consists of two stages: forward propagation of the signal and backward propagation of the error. In forward propagation, an input sample enters from the input layer, is processed layer by layer through the hidden layers, and is passed to the output layer. If the actual output of the output layer does not match the expected output, the process turns to the backward propagation stage of the error. In error backpropagation, the output error is propagated back, in some form, layer by layer through the hidden layers to the input layer, and the error is apportioned to all units of each layer, so that the error signal of each layer's units is obtained; this error signal serves as the basis for correcting the weight of each unit. This per-layer weight adjustment process of forward signal propagation and backward error propagation is repeated over and over. The process of continually adjusting the weights is the learning and training process of the network. The process continues until the error of the network output is reduced to an acceptable level, or until a preset number of learning iterations is reached.
Common optimization algorithms include gradient descent (Gradient Descent), stochastic gradient descent (Stochastic Gradient Descent, SGD), mini-batch gradient descent, the momentum method (Momentum), Nesterov (named after its inventor; specifically, stochastic gradient descent with momentum), adaptive gradient descent (ADAptive GRADient descent, Adagrad), adaptive delta (ADAptive delta, Adadelta), root mean square prop (RMSprop), adaptive moment estimation (Adaptive Moment Estimation, Adam), and so on.
During error backpropagation, these optimization algorithms all take the error/loss obtained from the loss function, compute the derivative/partial derivative with respect to the current neuron, incorporate influences such as the learning rate and previous gradients/derivatives/partial derivatives to obtain the gradient, and pass the gradient to the previous layer.
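As a toy illustration of the update rule described above (all values and the two-parameter model f(x) = w*x + b are hypothetical, not taken from the application), one round of gradient descent on a squared-error loss can be sketched as:

```python
# Minimal sketch: repeated gradient-descent steps for f(x) = w*x + b
# on the squared-error loss (f(x) - y)^2. Hypothetical example values.

def sgd_step(w, b, x, y, lr=0.1):
    pred = w * x + b
    err = pred - y               # f(x) - Y, the prediction error
    grad_w = 2 * err * x         # d/dw of err^2
    grad_b = 2 * err             # d/db of err^2
    return w - lr * grad_w, b - lr * grad_b

w, b = 0.0, 0.0
for _ in range(100):             # repeated forward/backward passes
    w, b = sgd_step(w, b, x=1.0, y=3.0)
# w + b converges toward the target output 3.0
```

The loop drives the loss toward zero, mirroring the "adjust weights until the output error is acceptable" description above.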
The technical solutions in the embodiments of the present application will be described clearly below with reference to the accompanying drawings of the embodiments of the present application. Obviously, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The terms "first", "second", and the like in the specification and claims of the present application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be implemented in orders other than those illustrated or described here. The objects distinguished by "first" and "second" are usually of one type, and the number of objects is not limited; for example, there may be one first object or multiple first objects. In addition, "and" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
It is worth noting that the techniques described in the embodiments of the present application are not limited to Long Term Evolution (LTE)/LTE-Advanced (LTE-A) systems, and can also be used in other wireless communication systems, such as Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal Frequency Division Multiple Access (OFDMA), Single-carrier Frequency-Division Multiple Access (SC-FDMA), and other systems. The terms "system" and "network" in the embodiments of the present application are often used interchangeably, and the described techniques can be used both for the systems and radio technologies mentioned above and for other systems and radio technologies. However, the following description describes a New Radio (NR) system for exemplary purposes and uses NR terminology in most of the description below; these techniques can also be applied to applications other than NR system applications, such as 6th Generation (6G) communication systems.
FIG. 1 shows a block diagram of a wireless communication system to which embodiments of the present application are applicable. The wireless communication system includes a terminal 11 and a network-side device 12. The terminal 11 may also be called a terminal device or user equipment (UE); the terminal 11 may be a terminal-side device such as a mobile phone, a tablet personal computer, a laptop computer (also called a notebook computer), a personal digital assistant (PDA), a palmtop computer, a netbook, an ultra-mobile personal computer (UMPC), a mobile Internet device (MID), a wearable device, a vehicle-mounted device (VUE), or a pedestrian terminal (PUE), where wearable devices include wristbands, earphones, glasses, and the like. It should be noted that the embodiments of the present application do not limit the specific type of the terminal 11. The network-side device 12 may be a base station or a core-network-side device, where the base station may be called a Node B, an evolved Node B, an access point, a base transceiver station (BTS), a radio base station, a radio transceiver, a basic service set (BSS), an extended service set (ESS), a B node, an evolved B node (eNB), a home B node, a home evolved B node, a WLAN access point, a WiFi node, a transmitting receiving point (TRP), or some other suitable term in the field; as long as the same technical effect is achieved, the base station is not limited to a specific technical term. It should be noted that in the embodiments of the present application, only the base station in an NR system is taken as an example, but the specific type of the base station is not limited.
The quantization method, apparatus, device, and readable storage medium provided by the embodiments of the present application are described in detail below through some embodiments and their application scenarios with reference to the accompanying drawings.
Referring to FIG. 2, an embodiment of the present application provides a quantization method, which may be performed by a first communication device, including:
Step 201: determining a quantization strategy, a quantization level and/or quantization configuration parameters of a first module of the first communication device, the first module being an AI module;
Step 202: quantizing the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameters.
The above quantization strategy may also be called a quantization method, and refers to the manner in which the parameters of the AI module are quantized.
The above quantization level can represent the accuracy of the parameter quantization of the AI module. For example, the higher the quantization level, the more accurate the parameters of the AI module and the closer they are to the original parameters; the lower the quantization level, the coarser the parameters of the AI module and the further they are from the original parameters. For example, if quantization levels are divided by bits, a quantization level of X bits means that a parameter of the AI module is quantized to X bits; the larger the value of X, the more bits a parameter of the AI module occupies, where X is a positive integer. The single-precision type (float) common in computers today occupies 32 bits and the double-precision type (double) occupies 64 bits; these are in fact also very high-precision quantizations.
The above quantization configuration parameters are used to represent the configuration for quantizing the AI module. For example, the quantization configuration parameters include one or more of the following: which quantization strategy the AI module adopts and how the details of that strategy are configured; whether all parameters of the AI module use a unified quantization level; whether the quantization level of the multiplicative coefficients of the AI module is the same as that of the additive coefficients; what the quantization level of the AI module is; how many bits the parameter quantization of the AI module uses; and so on.
For example, if the quantization level is configured as 8 bits and the quantization strategy is the direct quantization method, all parameters of the AI module are quantized from floating-point numbers to 8 bits; assuming the AI module is a neural network, the multiplicative and additive coefficients of all neurons are quantized to 8 bits.
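A minimal sketch of what such an 8-bit quantization of floating-point coefficients could look like (the application does not fix a concrete formula; a uniform min-max scale-and-round scheme is assumed here for illustration):

```python
def quantize_8bit(params):
    """Uniformly map float parameters onto 256 integer levels (8 bits)."""
    lo, hi = min(params), max(params)
    rng = (hi - lo) or 1.0                 # guard: constant input
    codes = [round((p - lo) * 255 / rng) for p in params]  # 8-bit codes 0..255
    deq = [lo + c * rng / 255 for c in codes]              # reconstructed values
    return codes, deq

codes, deq = quantize_8bit([-1.0, -0.25, 0.0, 0.5, 1.0])
```

Each coefficient is thereafter stored as one byte; `deq` shows the (slightly coarsened) values the network would actually use.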
In an implementation of the present application, the quantization strategy may include one or more of the following:
(1) Direct quantization method.
The direct quantization method refers to quantizing each parameter of the AI module directly according to the quantization level and/or the quantization configuration parameters.
(2) Uniform quantization method.
The uniform quantization method refers to a quantization method that divides the parameters of the AI module (for example, the value range of an input parameter) at equal intervals.
(3) Non-uniform quantization method.
The non-uniform quantization method is a quantization method in which the quantization intervals within the dynamic range of the parameters (for example, input parameters) of the AI module are unequal.
For example, the quantization intervals/quantization levels of different input intervals are determined according to the probability density, probability distribution, cumulative probability distribution, or the like of the input. For instance, for an interval where the input values are small, the quantization interval is also small; conversely, for an interval where the input values are large, the quantization interval is large.
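One way to realize such density-adapted intervals (an illustrative quantile-based scheme; the application only requires unequal intervals, not this particular rule) is to place the quantization levels at equal-probability points of the empirical distribution, so densely populated value ranges automatically receive finer spacing:

```python
def nonuniform_levels(samples, n_levels):
    """Pick quantization levels at equal-probability (quantile) points:
    regions where samples are dense get closely spaced levels."""
    s = sorted(samples)
    # midpoint of each equal-probability bucket becomes its level
    return [s[(2 * i + 1) * len(s) // (2 * n_levels)] for i in range(n_levels)]

def quantize(x, levels):
    return min(levels, key=lambda v: abs(v - x))   # nearest level

# 90% of the mass below 1.0, a sparse tail above it (hypothetical data)
samples = [0.01 * i for i in range(90)] + [1.0 + 0.5 * i for i in range(10)]
levels = nonuniform_levels(samples, 4)
```

With this data, all four levels land in the dense low-value region, matching the "small values, small intervals" behavior described above.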
(4) Weight-sharing quantization method.
(5) Grouping quantization method.
In the weight-sharing quantization method and the grouping quantization method, the parameters of the AI module can be divided into multiple sets, and the elements in each set share one value.
(6) Transform-domain quantization method.
The transform-domain quantization method refers to transforming the parameters of the AI module (such as weights, biases, convolution kernels, etc.) into another domain, for example the frequency domain, the S domain, or the Z domain, performing the quantization operation in that domain, and then transforming back.
Exemplarily, the network convolution kernels are first transformed into the frequency domain, then random hashing is performed in the frequency domain, and fewer hash bits are used for the less important high-frequency part, to achieve higher compression.
(7) Parameter-coding quantization method.
The parameter-coding quantization method refers to coding the parameters of the AI module; the coding manner includes but is not limited to lossy coding, lossless coding (for example, Huffman coding), and the like.
(8) Product quantization method.
The product quantization method refers to dividing the network weights into multiple sub-spaces and performing a quantization operation in each sub-space, for example performing the weight-sharing quantization method in each sub-space.
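A compact sketch of the product-quantization idea just described (the sub-space length and the simple two-cluster weight-sharing rule inside each sub-space are illustrative assumptions, not prescribed by the application):

```python
def product_quantize(weights, sub_dim):
    """Split the weight vector into sub-spaces of length sub_dim and apply
    a simple two-level weight-sharing quantizer inside each sub-space."""
    out = []
    for i in range(0, len(weights), sub_dim):
        block = weights[i:i + sub_dim]
        mid = (min(block) + max(block)) / 2
        low = [w for w in block if w <= mid]
        high = [w for w in block if w > mid]
        lo_c = sum(low) / len(low) if low else 0.0    # shared value, lower half
        hi_c = sum(high) / len(high) if high else 0.0 # shared value, upper half
        out.extend(lo_c if w <= mid else hi_c for w in block)
    return out

q = product_quantize([0.1, 0.2, 0.9, 1.0, -0.5, -0.4, 0.5, 0.6], sub_dim=4)
```

After quantization, each 4-element sub-space contains at most two distinct shared values, so only the cluster index per weight plus two centers per sub-space need to be stored.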
Optionally, the above quantization strategies can be cascaded or combined. Exemplarily, the quantization strategy includes: the uniform quantization method, the weight-sharing quantization method, and the parameter-coding quantization method; the network is first uniformly quantized through the uniform quantization method, then the uniformly quantized weights are quantized again using the weight-sharing quantization method, and the weights are finally quantized according to the parameter-coding quantization method.
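The final parameter-coding stage of such a cascade could, for instance, assign Huffman code lengths to the (already uniformly quantized and weight-shared) symbol stream, so that frequent shared values get short codes. This is only an illustrative sketch assuming Huffman coding as the lossless code; the application leaves the coding method open:

```python
import heapq
from collections import Counter

def huffman_code_lengths(symbols):
    """Return the Huffman code length per symbol, from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) == 1:                       # degenerate: one symbol, one bit
        return {next(iter(freq)): 1}
    # heap entries: (count, tie-breaker, members); merge two smallest repeatedly
    heap = [(n, i, [s]) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    depth = {s: 0 for s in freq}
    tie = len(heap)
    while len(heap) > 1:
        n1, _, s1 = heapq.heappop(heap)
        n2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            depth[s] += 1                    # members sink one level deeper
        heapq.heappush(heap, (n1 + n2, tie, s1 + s2))
        tie += 1
    return depth

# codes as they might emerge from a preceding uniform + weight-sharing stage
lengths = huffman_code_lengths([0, 0, 0, 0, 0, 1, 1, 2])
```

The most frequent shared value (code 0) receives a 1-bit codeword, shrinking the stored parameter stream beyond what the earlier stages alone achieve.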
In an implementation of the present application, the step of quantizing the parameters of the first module includes:
in the network training phase, quantizing the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameters.
For example, an ordinary gradient computation method is used to obtain the gradient corresponding to each weight; according to the previous grouping of the weights, the gradient values of the weights in the same group are accumulated to obtain the update amount of the cluster center in this round of network training, and the product of the update amount and the learning rate is subtracted from the cluster-center value to obtain the cluster center updated in this round of training.
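The cluster-center update just described can be sketched as follows (group assignments, gradients, and the learning rate are hypothetical example values):

```python
def update_cluster_centers(centers, assign, grads, lr):
    """One training-round update of shared weights: accumulate the gradients
    of all weights assigned to a center, then subtract lr * accumulated sum."""
    acc = [0.0] * len(centers)
    for g, k in zip(grads, assign):
        acc[k] += g                          # sum gradients per group
    return [c - lr * a for c, a in zip(centers, acc)]

centers = [0.5, -0.5]                        # shared values (cluster centers)
assign = [0, 0, 1, 1]                        # which center each weight shares
grads = [0.1, 0.3, -0.2, -0.2]               # per-weight gradients this round
new_centers = update_cluster_centers(centers, assign, grads, lr=0.1)
```

Only the centers move; all weights in a group continue to share the updated center value in the next forward pass.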
In an implementation of the present application, the parameter division manners in the grouping quantization method include:
(1) Random division.
In random division, the parameters of the AI module can be grouped in a random manner.
(2) Determining, according to the identifier of a parameter, the identifier of the set where the parameter is located.
The above manner (2) may also be called the direct addressing method. For example, the parameters of the AI module are sorted and the ID of each parameter is fixed; then the parameter ID is input into a linear function, a function of degree N, or another common function to obtain a new value X, and the ID of the set where the network parameter is located is obtained from X. The linear functions include the function whose output equals its input.
In an implementation of the present application, the determining, according to the identifier of the parameter, the identifier of the set where the parameter is located includes:
obtaining a first value according to the identifier of the parameter;
determining, according to the first value, the identifier of the set where the parameter is located;
wherein determining the set identifier according to the first value includes one or more of the following:
(a) rounding the first value to obtain the set identifier of the parameter;
(b) taking at least one digit from the first value and combining the digits into the set identifier of the parameter;
(c) dividing the first value by a preset value and using the resulting remainder as the set identifier of the parameter.
Optionally, the identifier of the parameter is input into a linear function or another common mathematical function to obtain the first value (X). Common mathematical functions include combinations of various common mathematical operations such as addition, subtraction, multiplication, division, the N-th power, the N-th root, logarithms, derivatives, and partial derivatives. N is an arbitrary number; for example, N may be positive, negative, or 0, real or complex.
Optionally, the manners of obtaining the set ID of the network parameter from X include:
a) Rounding X to obtain the set ID. Rounding includes rounding up, rounding down, rounding to the nearest integer, and the like. For example, if X is 3.23, the set ID may be 3 or 4, where 3 corresponds to rounding down or rounding to the nearest integer, and 4 corresponds to rounding up.
b) Taking at least one digit of X and combining the digits into the set ID.
For example, if X is 3215217, taking the 2nd and 4th digits from the front gives a set ID of 25, while taking the 1st and 3rd digits from the back gives a set ID of 72 or 27.
For another example, if X is 872351.1237, taking the 1st and 2nd digits after the decimal point gives a set ID of 12 or 21; taking the 1st and 2nd digits before the decimal point gives a set ID of 51 or 15; taking the 2nd digit before the decimal point and the 3rd digit after the decimal point gives a set ID of 53 or 35.
Exemplarily:
(i) If at least two digits are taken, the values at these digit positions are arranged according to a certain rule to form the set ID.
For example, by digit position from front to back, by digit position from back to front, by value from large to small, or by value from small to large. For instance, if X is 67429815 and the 1st, 3rd, and 5th digits counting from the back are taken, the values at these positions are 5, 8, and 2; arranged by digit position from front to back, the set ID is 285; by digit position from back to front, the set ID is 582; by value from large to small, the set ID is 852; by value from small to large, the set ID is 258.
(ii) If a certain digit position does not exist, the value of that position is 0 or another default value.
For example, if X is 52 and the 1st, 3rd, and 5th digits from the front are taken, the values at the corresponding positions are 5, 0, and 0.
c) Dividing X by some number and taking the remainder.
For example, if X is 752 and the divisor is 11, the set ID is 4 = 752 mod(11).
d) Randomly dividing set IDs according to X.
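The three concrete set-ID rules above, rounding, digit extraction, and modulo, can be sketched directly (function names are illustrative; the digit rule here counts positions from the front, one of the orderings the text allows):

```python
def set_id_round(x):
    return round(x)                          # e.g. 3.23 -> 3

def set_id_digits(x, positions):
    """Take selected digits of X (1-indexed from the front) and concatenate."""
    digits = str(x).replace('.', '').lstrip('-')
    return int(''.join(digits[p - 1] for p in positions))

def set_id_mod(x, m):
    return x % m                             # remainder as set ID

ids = (set_id_round(3.23),                   # rounding rule
       set_id_digits(3215217, [2, 4]),       # 2nd and 4th digits -> 25
       set_id_mod(752, 11))                  # 752 mod 11 -> 4
```

The three results reproduce the worked examples in the text: 3, 25, and 4.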
(3) Clustering division.
In clustering division, the parameters of the AI module are grouped according to cluster centers.
For example, if the data are to be divided into K groups in advance, K objects are randomly selected as initial cluster centers; then the distance between each object and each seed cluster center is computed, and each object is assigned to the cluster center closest to it. The cluster centers and the objects assigned to them represent a cluster. Each time a sample is assigned, the cluster center is recomputed according to the objects currently in the cluster. This process is repeated until some termination condition is met. The termination condition may be that no (or a minimum number of) objects are reassigned to different clusters, that no (or a minimum number of) cluster centers change again, or that the sum of squared errors reaches a local minimum.
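The clustering division described above is plain k-means; a minimal sketch (initial centers and the fixed iteration count used as the termination condition are illustrative choices):

```python
def kmeans_groups(values, centers, iters=10):
    """Plain k-means: assign each parameter to its nearest center, then
    recompute each center as the mean of its assigned parameters."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for v in values:
            k = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[k].append(v)
        centers = [sum(g) / len(g) if g else c
                   for g, c in zip(groups, centers)]
    return centers, groups

centers, groups = kmeans_groups([0.1, 0.2, 0.15, 2.0, 2.1, 1.9],
                                centers=[0.0, 1.0])
```

Each resulting group would then share one quantized value (its center), as in the weight-sharing and grouping quantization methods above.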
In an implementation of the present application, the quantization strategy and/or the quantization configuration parameters are determined according to one or more of the following:
(1) Terminal reporting.
That is, the network side can obtain the quantization strategy and/or the quantization configuration parameters through reporting by the terminal.
(2) The capability of the terminal.
That is, the quantization strategy and/or the quantization configuration parameters can serve as a capability of the terminal.
(3) Network-side configuration.
That is, the terminal side can obtain the quantization strategy and/or the quantization configuration parameters according to the configuration of the network side.
For example, the network side performs configuration, activation, or triggering through Radio Resource Control (RRC), a Media Access Control Control Element (MAC CE), or Downlink Control Information (DCI).
In an implementation of the present application, the quantization strategy is the direct quantization method, and the step of quantizing the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameters includes:
quantizing the parameters of the first module according to the quantization level and/or the quantization configuration parameters of the first module.
In an implementation of the present application, the quantization level is determined according to one or more of the following:
(1) Relevant information of the parameters of the first module.
Optionally, the relevant information of the parameters of the first module includes: the size of the parameter.
For example, the larger the parameter, the higher the quantization level; or, the larger the parameter, the lower the quantization level.
For another example, the smaller the parameter, the lower the quantization level; or, the smaller the parameter, the higher the quantization level.
That is, different quantization levels can be determined according to the size of the parameters of the AI module. For example, the larger a parameter of the AI module, the more finely it is quantized, and the smaller the parameter, the more coarsely it is quantized; or, the larger a parameter of the AI module, the more coarsely it is quantized, and the smaller the parameter, the more finely it is quantized.
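A sketch of one such magnitude-based rule, larger parameters get more bits (the thresholds and bit widths are hypothetical; the text equally allows the opposite convention):

```python
def bits_for_param(p, thresholds=(0.01, 0.1, 1.0), bits=(4, 6, 8, 10)):
    """Map |parameter| to a bit width: larger magnitude -> finer quantization.
    Thresholds/bit widths are illustrative example values."""
    mag = abs(p)
    for t, b in zip(thresholds, bits):
        if mag < t:
            return b
    return bits[-1]

widths = [bits_for_param(p) for p in (0.005, 0.05, 0.5, 5.0)]
```

Each parameter thus carries its own quantization level, rather than the whole module sharing one.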
(2) Terminal reporting.
That is, the network side can obtain the quantization level through reporting by the terminal.
(3) The capability of the terminal.
That is, the quantization level can serve as a capability of the terminal.
(4) Network-side configuration.
(5) The output accuracy requirement of the first module.
For example, the higher the output accuracy requirement of the AI module, the higher the quantization level.
(6) The performance requirement of the first module.
For example, the performance requirements of the AI module are divided into multiple grades, and different grades of performance requirements correspond to different quantization levels.
In an implementation of the present application, the higher the quantization level, the more accurate the parameter quantization of the first module; or, the lower the quantization level, the coarser the parameter quantization of the first module.
In an implementation of the present application, the type of the first module is a neural network, wherein:
the quantization levels of the neurons in different layers of the neural network are the same;
and/or,
the quantization levels of the neurons in the same layer of the neural network are the same;
and/or,
the quantization level of the multiplicative coefficients in the neural network is the same as the quantization level of the additive coefficients.
In an implementation of the present application, the type of the first module is a neural network, wherein:
the quantization levels of the neurons in different layers of the neural network are different;
and/or,
the quantization levels of the neurons in the same layer of the neural network are different;
and/or,
the quantization level of the multiplicative coefficients in the neural network is different from the quantization level of the additive coefficients.
In an implementation of the present application, the type of the first module is a recurrent neural network (RNN), wherein:
the quantization level of the parameters (for example, multiplicative coefficients and additive coefficients) of the memory units in the recurrent neural network is the same as the quantization level of the parameters of the non-memory neurons (including neurons and non-neuron units) in the recurrent neural network, or the same as the quantization level of the non-memory parameters of the neurons of the recurrent neural network;
or,
the quantization level of the parameters of the memory units in the recurrent neural network is different from the quantization level of the parameters of the non-memory neurons in the recurrent neural network, or different from the quantization level of the non-memory parameters of the neurons in the recurrent neural network.
In an implementation of the present application, the type of the first module is a convolutional neural network (CNN), wherein:
the quantization level of the parameters of the convolution kernels of the convolutional neural network is the same as or different from the quantization level of the non-convolution-kernel parameters in the convolutional neural network;
or,
the quantization level of the pooling parameters (multiplicative coefficients, additive coefficients) of the convolutional neural network is the same as or different from the quantization level of the non-pooling parameters in the convolutional neural network.
In an implementation of the present application, the input or output of the first module is first information, where the first information includes one or more of the following:
(1) A reference signal.
The reference signal is used for signal processing, including signal detection, filtering, equalization, and the like; it includes, for example, a demodulation reference signal (DMRS), a sounding reference signal (SRS), a synchronization signal block (SSB), a tracking reference signal (TRS), a phase-tracking reference signal (PTRS), a channel state information reference signal (CSI-RS), and so on.
(2) A signal carried on a channel.
The channel may include one or more of the following: a physical downlink control channel (PDCCH), a physical downlink shared channel (PDSCH), a physical uplink control channel (PUCCH), a physical uplink shared channel (PUSCH), a physical random access channel (PRACH), a physical broadcast channel (PBCH), and so on.
(3) Channel state information.
Optionally, the channel state information includes channel state information feedback information and/or channel state information based on partial uplink-downlink reciprocity in a frequency division duplex (FDD) system.
The channel state information feedback information includes one or more of the following: channel-related information, channel-matrix-related information, channel feature information, channel-matrix feature information, a precoding matrix indicator (PMI), a rank indicator (RI), a CSI-RS resource indicator (CRI), a channel quality indicator (CQI), a layer indicator (LI), and so on.
For an FDD system, based on partial reciprocity, the base station obtains angle and delay information from the uplink channel and can notify the UE of the angle and delay information through CSI-RS precoding or direct indication; the UE reports according to the base station's indication, or selects and reports within the range indicated by the base station, thereby reducing the computation load of the UE and the overhead of CSI reporting.
(4) Beam information.
The beam information includes one or more of the following: beam quality, beam indication information (reference signal ID), beam failure indication information, and new beam indication information in beam failure recovery. It is used for beam management, including beam measurement, beam reporting, beam prediction, beam failure detection, beam failure recovery, and new beam indication in beam failure recovery.
(5) Channel prediction information.
The channel prediction information includes: prediction of channel state information and beam prediction.
(6) Interference information.
The interference information includes one or more of the following: intra-cell interference information, inter-cell interference information, out-of-band interference information, intermodulation interference information, and so on.
(7) Positioning information (also called trajectory information).
The specific position (including horizontal position and/or vertical position) of the UE, or its possible future trajectory, estimated through a reference signal (for example, a sounding reference signal (SRS)), or information assisting position estimation or trajectory estimation.
(8) Prediction information of higher-layer services and/or parameters.
(9) Management information of higher-layer services and/or parameters.
For example, the prediction information or the management information may include throughput, required packet size, service requirements, movement speed, and/or noise information, and so on.
(10) Control signaling.
For example, signaling related to power control or signaling related to beam management.
In an implementation of the present application, when the output of the first module is the first information, the method further includes:
sending the first information to a second communication device, or sending the first information to a second module of the first communication device.
The first communication device is a terminal and the second communication device is a network-side device; or, the first communication device is a network-side device and the second communication device is a terminal; or, the first communication device is a first terminal and the second communication device is a second terminal; or, the first communication device is a first network-side device and the second communication device is a second network-side device.
In the embodiments of the present application, an AI module is quantized through a quantization strategy, a quantization level and/or quantization configuration parameters, so that the complexity of the AI module can be reduced and system performance improved.
Referring to FIG. 3, an embodiment of the present application provides a quantization apparatus, applied to a first communication device. The apparatus 300 includes:
a first determination module 301, configured to determine a quantization strategy, a quantization level and/or quantization configuration parameters of a first module of the first communication device, the first module being an AI module;
a quantization module 302, configured to quantize the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameters.
In an implementation of the present application, the quantization strategy includes one or more of the following:
(1) a direct quantization method;
(2) a uniform quantization method;
(3) a non-uniform quantization method;
(4) a weight-sharing quantization method;
(5) a grouping quantization method;
(6) a transform-domain quantization method;
(7) a parameter-coding quantization method;
(8) a product quantization method.
In an implementation of the present application, the quantization strategy includes: the uniform quantization method, the weight-sharing quantization method, and the parameter-coding quantization method, wherein the network is uniformly quantized through the uniform quantization method, the uniformly quantized weights are then quantized again using the weight-sharing quantization method, and the parameter-coding quantization method is then applied to the weights.
In an implementation of the present application, the quantization module 302 is further configured to: in the network training phase, quantize the parameters of the first module according to the quantization strategy, the quantization level and/or the quantization configuration parameters.
In an implementation of the present application, the parameter division manners in the grouping quantization method include:
(1) random division;
(2) determining, according to the identifier of a parameter, the identifier of the set where the parameter is located;
(3) clustering division.
In an implementation of the present application, the determining, according to the identifier of the parameter, the identifier of the set where the parameter is located includes:
obtaining a first value according to the identifier of the parameter;
determining, according to the first value, the identifier of the set where the parameter is located;
wherein determining the set identifier according to the first value includes one or more of the following:
(1) rounding the first value to obtain the set identifier of the parameter;
(2) taking at least one digit from the first value and combining the digits into the set identifier of the parameter;
(3) dividing the first value by a preset value and using the resulting remainder as the set identifier of the parameter.
In an implementation of the present application, the quantization strategy and/or the quantization configuration parameters are determined according to one or more of the following:
(1) terminal reporting;
(2) the capability of the terminal;
(3) network-side configuration.
In an implementation of the present application, the quantization strategy is the direct quantization method, and the quantization module 302 is further configured to: quantize the parameters of the first module according to the quantization level and/or the quantization configuration parameters of the first module.
In an implementation of the present application, the quantization level is determined according to one or more of the following:
(1) relevant information of the parameters of the first module;
(2) terminal reporting;
(3) the capability of the terminal;
(4) network-side configuration;
(5) the output accuracy requirement of the first module;
(6) the performance requirement of the first module.
In an implementation of the present application, the relevant information of the parameters of the first module includes: the size of the parameter; wherein the larger the parameter, the higher the quantization level, and the smaller the parameter, the lower the quantization level; or, the larger the parameter, the lower the quantization level, and the smaller the parameter, the higher the quantization level.
In an implementation of the present application, the higher the quantization level, the more accurate the parameter quantization of the first module; or, the lower the quantization level, the coarser the parameter quantization of the first module.
In an implementation of the present application, the type of the first module is a neural network, wherein:
the quantization levels of the neurons in different layers of the neural network are the same;
and/or,
the quantization levels of the neurons in the same layer of the neural network are the same;
and/or,
the quantization level of the multiplicative coefficients in the neural network is the same as the quantization level of the additive coefficients.
In an implementation of the present application, the type of the first module is a neural network, wherein:
the quantization levels of the neurons in different layers of the neural network are different;
and/or,
the quantization levels of the neurons in the same layer of the neural network are different;
and/or,
the quantization level of the multiplicative coefficients in the neural network is different from the quantization level of the additive coefficients.
In an implementation of the present application, the type of the first module is a recurrent neural network, wherein:
the quantization level of the parameters of the memory units in the recurrent neural network is the same as the quantization level of the parameters of the non-memory neurons in the recurrent neural network, or the same as the quantization level of the non-memory parameters of the neurons of the recurrent neural network;
or,
the quantization level of the parameters of the memory units in the recurrent neural network is different from the quantization level of the parameters of the non-memory neurons in the recurrent neural network, or different from the quantization level of the non-memory parameters of the neurons in the recurrent neural network.
In an implementation of the present application, the type of the first module is a convolutional neural network, wherein:
the quantization level of the parameters of the convolution kernels of the convolutional neural network is the same as or different from the quantization level of the non-convolution-kernel parameters in the convolutional neural network;
or,
the quantization level of the pooling parameters of the convolutional neural network is the same as or different from the quantization level of the non-pooling parameters in the convolutional neural network.
In an implementation of the present application, the input or output of the first module is first information, where the first information includes one or more of the following:
(1) a reference signal;
(2) a signal carried on a channel;
(3) channel state information;
(4) beam information;
(5) channel prediction information;
(6) interference information;
(7) positioning information;
(8) prediction information of higher-layer services and/or parameters;
(9) management information of higher-layer services and/or parameters;
(10) control signaling.
In an implementation of the present application, when the output of the first module is the first information, the apparatus further includes:
a sending module, configured to send the first information to a second communication device, or send the first information to a second module of the first communication device.
In an implementation of the present application, the first communication device is a terminal and the second communication device is a network-side device; or, the first communication device is a network-side device and the second communication device is a terminal; or, the first communication device is a first terminal and the second communication device is a second terminal; or, the first communication device is a first network-side device and the second communication device is a second network-side device.
The apparatus provided in this embodiment of the present application can implement each process implemented by the method embodiment shown in FIG. 2 and achieve the same technical effect; to avoid repetition, details are not repeated here.
FIG. 4 is a schematic diagram of the hardware structure of a terminal implementing an embodiment of the present application. The terminal 400 includes but is not limited to components such as a radio frequency unit 401, a network module 402, an audio output unit 403, an input unit 404, a sensor 405, a display unit 406, a user input unit 407, an interface unit 408, a memory 409, and a processor 410.
A person skilled in the art can understand that the terminal 400 may also include a power source (such as a battery) for supplying power to the various components; the power source may be logically connected to the processor 410 through a power management system, so that functions such as charging management, discharging management, and power consumption management are implemented through the power management system. The terminal structure shown in FIG. 4 does not constitute a limitation on the terminal; the terminal may include more or fewer components than shown, combine some components, or adopt a different component arrangement, which will not be repeated here.
It should be understood that in this embodiment of the present application, the input unit 404 may include a graphics processing unit (GPU) 4041 and a microphone 4042; the graphics processing unit 4041 processes image data of still pictures or video obtained by an image capture device (such as a camera) in a video capture mode or an image capture mode. The display unit 406 may include a display panel 4061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like. The user input unit 407 includes a touch panel 4071 and other input devices 4072. The touch panel 4071 is also called a touch screen. The touch panel 4071 may include two parts: a touch detection device and a touch controller. Other input devices 4072 may include but are not limited to a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, and a joystick, which will not be repeated here.
In this embodiment of the present application, the radio frequency unit 401 receives downlink data from a network-side device and then delivers it to the processor 410 for processing; in addition, it sends uplink data to the network-side device. Generally, the radio frequency unit 401 includes but is not limited to an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like.
The memory 409 may be used to store software programs or instructions as well as various data. The memory 409 may mainly include a program or instruction storage area and a data storage area, wherein the program or instruction storage area may store an operating system, an application program or instructions required for at least one function (such as a sound playback function or an image playback function), and the like. In addition, the memory 409 may include a high-speed random access memory, and may also include a non-volatile memory, wherein the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory, for example at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The processor 410 may include one or more processing units; optionally, the processor 410 may integrate an application processor and a modem processor, wherein the application processor mainly handles the operating system, user interface, and application programs or instructions, and the modem processor mainly handles wireless communication, for example a baseband processor. It can be understood that the modem processor may alternatively not be integrated into the processor 410.
The terminal provided in this embodiment of the present application can implement each process implemented by the method embodiment shown in FIG. 2 and achieve the same technical effect; to avoid repetition, details are not repeated here.
An embodiment of the present application further provides a network-side device. As shown in FIG. 5, the network-side device 500 includes: an antenna 501, a radio frequency device 502, and a baseband device 503. The antenna 501 is connected to the radio frequency device 502. In the uplink direction, the radio frequency device 502 receives information through the antenna 501 and sends the received information to the baseband device 503 for processing. In the downlink direction, the baseband device 503 processes the information to be sent and sends it to the radio frequency device 502; the radio frequency device 502 processes the received information and then sends it out through the antenna 501.
The foregoing frequency-band processing apparatus may be located in the baseband device 503; the method performed by the network-side device in the above embodiments may be implemented in the baseband device 503, which includes a processor 504 and a memory 505.
The baseband device 503 may include, for example, at least one baseband board on which multiple chips are arranged; as shown in FIG. 5, one of the chips is, for example, the processor 504, which is connected to the memory 505 to call the program in the memory 505 and perform the network-device operations shown in the above method embodiments.
The baseband device 503 may further include a network interface 506 for exchanging information with the radio frequency device 502; the interface is, for example, a Common Public Radio Interface (CPRI).
Specifically, the network-side device of this embodiment of the present application further includes: instructions or a program stored in the memory 505 and executable on the processor 504; the processor 504 invokes the instructions or program in the memory 505 to perform the method performed by each module shown in FIG. 3 and achieve the same technical effect; to avoid repetition, details are not repeated here.
An embodiment of the present application further provides a computer program product, where the computer program product is stored in a non-volatile storage medium and is executed by at least one processor to implement the steps of the processing method shown in FIG. 2 and achieve the same technical effect; to avoid repetition, details are not repeated here.
An embodiment of the present application further provides a readable storage medium, on which a program or instructions are stored; when the program or instructions are executed by a processor, each process of the method embodiment shown in FIG. 2 is implemented and the same technical effect is achieved; to avoid repetition, details are not repeated here.
The processor is the processor in the terminal described in the above embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
An embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a network-side device program or instructions to implement each process of the method embodiment shown in FIG. 2 and achieve the same technical effect; to avoid repetition, details are not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be called a system-level chip, a system chip, a chip system, a system-on-chip, or the like.
It should be noted that, in this document, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or apparatus including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that includes that element. In addition, it should be pointed out that the scope of the methods and apparatuses in the embodiments of the present application is not limited to performing the functions in the order shown or discussed, and may also include performing the functions in a substantially simultaneous manner or in the reverse order according to the functions involved; for example, the described methods may be performed in an order different from that described, and various steps may also be added, omitted, or combined. In addition, features described with reference to some examples may be combined in other examples.
Through the description of the above embodiments, a person skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the various embodiments of the present application.
A person of ordinary skill in the art can realize that the units and algorithm steps of the examples described in combination with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. A skilled person may use different methods for each specific application to implement the described functions, but such implementation should not be considered beyond the scope of the present application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the embodiments provided in the present application, it should be understood that the disclosed apparatuses and methods may be implemented in other manners. For example, the apparatus embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and in actual implementation there may be other division manners; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the related technology, or a part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the various embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
A person of ordinary skill in the art can understand that all or some of the processes in the methods of the above embodiments can be implemented by a computer program controlling relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a ROM, a RAM, or the like.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above specific implementations, which are only illustrative rather than restrictive. Under the inspiration of the present application, a person of ordinary skill in the art can make many further forms without departing from the spirit of the present application and the scope protected by the claims, all of which fall within the protection of the present application.

Claims (23)

  1. 一种量化的方法,由第一通信设备执行,其中,所述方法包括:
    确定所述第一通信设备的第一模块的量化策略、量化等级和/或量化配置参数,所述第一模块为人工智能AI模块;
    根据所述量化策略、量化等级和/或量化配置参数,对所述第一模块的参数进行量化处理。
  2. The method according to claim 1, wherein the quantization strategy comprises one or more of the following:
    direct quantization;
    uniform quantization;
    non-uniform quantization;
    weight-sharing quantization;
    grouped quantization;
    transform-domain quantization;
    parameter-encoding quantization;
    product quantization.
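For illustration only (not part of the claims), the first two parameter-level strategies listed in claim 2 can be sketched as follows. The function names, 4-bit default, and the choice of μ-law companding as the non-uniform example are our own assumptions; the patent does not define the strategies in this detail.

```python
import math

def uniform_quantize(x, n_bits=4, x_min=-1.0, x_max=1.0):
    """Uniform quantization: map x to the nearest of 2**n_bits equally spaced levels."""
    levels = 2 ** n_bits
    step = (x_max - x_min) / (levels - 1)
    clipped = min(max(x, x_min), x_max)  # clip to the representable range
    return x_min + round((clipped - x_min) / step) * step

def mu_law_quantize(x, n_bits=4, mu=255.0):
    """Non-uniform quantization via mu-law companding: finer steps near zero."""
    sign = -1.0 if x < 0 else 1.0
    compressed = sign * math.log1p(mu * abs(x)) / math.log1p(mu)
    q = uniform_quantize(compressed, n_bits, -1.0, 1.0)
    return math.copysign(math.expm1(abs(q) * math.log1p(mu)) / mu, q)
```

The non-uniform variant spends its levels where small weights cluster, so a small value such as 0.02 is reproduced far more accurately than under uniform spacing.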
  3. The method according to claim 1, wherein the step of performing quantization processing on the parameters of the first module according to the quantization strategy, quantization level, and/or quantization configuration parameters comprises:
    in a network training phase, performing quantization processing on the parameters of the first module according to the quantization strategy, quantization level, and/or quantization configuration parameters.
  4. The method according to claim 2, wherein the manner of dividing parameters in the grouped quantization comprises:
    random division;
    determining, according to an identifier of a parameter, the identifier of the set to which the parameter belongs;
    cluster-based division.
  5. The method according to claim 4, wherein determining, according to the identifier of the parameter, the identifier of the set to which the parameter belongs comprises:
    obtaining a first value according to the identifier of the parameter;
    determining, according to the first value, the identifier of the set to which the parameter belongs;
    wherein determining, according to the first value, the identifier of the set to which the parameter belongs comprises one or more of the following:
    rounding the first value to obtain the identifier of the set to which the parameter belongs;
    taking at least one digit from the first value and combining the digits into the identifier of the set to which the parameter belongs;
    dividing the first value by a preset value and taking the resulting remainder as the identifier of the set to which the parameter belongs.
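The three derivations in claim 5 can be sketched as follows (for illustration only; the function names, the digit positions, and the preset value of 8 are our own assumptions, and the patent does not fix how the first value is obtained from the parameter identifier):

```python
def set_id_by_rounding(first_value):
    """Round the first value to the nearest integer to obtain the set identifier."""
    return round(first_value)

def set_id_by_digits(first_value, positions=(0, 2)):
    """Take selected digits of the first value and combine them into a set identifier."""
    digits = str(int(first_value))
    return int("".join(digits[p] for p in positions if p < len(digits)))

def set_id_by_modulo(first_value, preset=8):
    """Divide the first value by a preset value; the remainder is the set identifier."""
    return int(first_value) % preset
```

For a first value of 20713.6, rounding yields 20714; taking digits 0 and 2 of 20713 yields 27; and the remainder modulo 8 yields 1. The modulo variant in particular spreads parameters evenly over a fixed number of sets.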
  6. The method according to claim 1, wherein the quantization strategy and/or quantization configuration parameters are determined according to one or more of the following:
    reporting by a terminal;
    a capability of the terminal;
    network-side configuration.
  7. The method according to claim 2, wherein the quantization strategy is direct quantization, and the step of performing quantization processing on the parameters of the first module according to the quantization strategy, quantization level, and/or quantization configuration parameters comprises:
    performing quantization processing on the parameters of the first module according to the quantization level and/or quantization configuration parameters of the first module.
  8. The method according to claim 1, wherein the quantization level is determined according to one or more of the following:
    information related to the parameters of the first module;
    reporting by a terminal;
    a capability of the terminal;
    network-side configuration;
    an output precision requirement of the first module;
    a performance requirement of the first module.
  9. The method according to claim 8, wherein the information related to the parameters of the first module comprises the magnitude of a parameter;
    wherein the larger the parameter, the higher its quantization level; or, the larger the parameter, the lower its quantization level.
  10. The method according to claim 1, wherein the higher the quantization level, the finer the quantization of the parameters of the first module; or, the lower the quantization level, the coarser the quantization of the parameters of the first module.
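Claims 8 to 10 tie a higher quantization level to finer precision without fixing the mapping. One plausible, purely hypothetical mapping (our own assumption, not from the patent) is to let level L select an L-bit uniform quantizer, so a higher level implies a smaller quantization step:

```python
def step_size(level, value_range=2.0):
    """Quantization step when level L selects an L-bit uniform quantizer (hypothetical)."""
    return value_range / (2 ** level - 1)

def quantize_at_level(x, level, x_min=-1.0, x_max=1.0):
    """Quantize x with the step size implied by the given quantization level."""
    step = step_size(level, x_max - x_min)
    clipped = min(max(x, x_min), x_max)
    return x_min + round((clipped - x_min) / step) * step
```

Under this mapping, quantizing 0.3 at level 8 gives a much smaller error than at level 2, matching the finer-precision reading of claim 10.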
  11. The method according to claim 1, wherein the type of the first module is a neural network;
    wherein neurons in different layers of the neural network have the same quantization level;
    and/or,
    neurons in the same layer of the neural network have the same quantization level;
    and/or,
    the quantization level of the multiplicative coefficients in the neural network is the same as the quantization level of the additive coefficients.
  12. The method according to claim 1, wherein the type of the first module is a neural network;
    wherein neurons in different layers of the neural network have different quantization levels;
    and/or,
    neurons in the same layer of the neural network have different quantization levels;
    and/or,
    the quantization level of the multiplicative coefficients in the neural network is different from the quantization level of the additive coefficients.
  13. The method according to claim 1, wherein the type of the first module is a recurrent neural network;
    wherein the quantization level of the parameters of the memory cells in the recurrent neural network is the same as the quantization level of the parameters of the non-memory neurons in the recurrent neural network, or the same as the quantization level of the non-memory parameters of the neurons in the recurrent neural network,
    or,
    the quantization level of the parameters of the memory cells in the recurrent neural network is different from the quantization level of the parameters of the non-memory neurons in the recurrent neural network, or different from the quantization level of the non-memory parameters of the neurons in the recurrent neural network.
  14. The method according to claim 1, wherein the type of the first module is a convolutional neural network;
    wherein,
    the quantization level of the parameters of the convolution kernels of the convolutional neural network is the same as or different from the quantization level of the non-convolution-kernel parameters of the convolutional neural network,
    or,
    the quantization level of the pooling parameters of the convolutional neural network is the same as or different from the quantization level of the non-pooling parameters of the convolutional neural network.
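Claims 11 to 14 allow the quantization level to vary across layers, coefficient types, memory cells, convolution kernels, and pooling parameters. A minimal sketch of such a scheme (the group names and level values are illustrative assumptions, not taken from the patent) is a per-group lookup table:

```python
# Hypothetical per-group quantization levels for a convolutional network;
# claim 14 allows kernel and non-kernel parameters to differ in level.
LEVELS = {
    "conv_kernel": 8,  # convolution-kernel parameters quantized finely
    "bias": 4,         # additive coefficients quantized more coarsely
    "pooling": 2,      # pooling parameters coarser still
}

def level_for(param_group, default=4):
    """Look up the quantization level assigned to a parameter group."""
    return LEVELS.get(param_group, default)
```

A quantizer would then call `level_for` once per parameter group before quantizing, rather than applying one global level to the whole network.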
  15. The method according to claim 1, wherein the input or output of the first module is first information;
    wherein the first information comprises one or more of the following:
    a reference signal;
    a signal carried on a channel;
    channel state information;
    beam information;
    channel prediction information;
    interference information;
    positioning information;
    prediction information for higher-layer services and/or parameters;
    management information for higher-layer services and/or parameters;
    control signaling.
  16. The method according to claim 15, wherein, in a case where the output of the first module is the first information, the method further comprises:
    sending the first information to a second communication device, or sending the first information to a second module of the first communication device.
  17. The method according to claim 16, wherein the first communication device is a terminal and the second communication device is a network-side device;
    or,
    the first communication device is a network-side device and the second communication device is a terminal;
    or,
    the first communication device is a first terminal and the second communication device is a second terminal;
    or,
    the first communication device is a first network-side device and the second communication device is a second network-side device.
  18. A quantization apparatus, applied to a first communication device, wherein the apparatus comprises:
    a first determination module, configured to determine a quantization strategy, a quantization level, and/or quantization configuration parameters for a first module of the first communication device, the first module being an AI module;
    a quantization module, configured to perform quantization processing on the parameters of the first module according to the quantization strategy, quantization level, and/or quantization configuration parameters.
  19. A communication device, comprising a processor, a memory, and a program stored on the memory and runnable on the processor, wherein the program, when executed by the processor, implements the steps of the method according to any one of claims 1 to 17.
  20. A readable storage medium storing a program or instructions, wherein the program or instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 17.
  21. A chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement the steps of the method according to any one of claims 1 to 17.
  22. A computer program product, wherein the computer program product is stored in a non-volatile storage medium and is executed by at least one processor to implement the steps of the method according to any one of claims 1 to 17.
  23. A communication device configured to perform the steps of the method according to any one of claims 1 to 17.
PCT/CN2022/078241 2021-03-04 2022-02-28 Quantization method, apparatus, device, and readable storage medium WO2022184009A1 (zh)

Applications Claiming Priority (2)

- CN202110240917.9, priority date 2021-03-04
- CN202110240917.9A (CN115037608A), filed 2021-03-04 — Quantization method, apparatus, device, and readable storage medium

Publications (1)

- WO2022184009A1, published 2022-09-09

Family ID: 83118095

Family Applications (1)

- PCT/CN2022/078241 (WO2022184009A1), priority date 2021-03-04, filed 2022-02-28 — Quantization method, apparatus, device, and readable storage medium

Country Status (2)

- CN: CN115037608A (zh)
- WO: WO2022184009A1 (zh)
WO (1) WO2022184009A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party

- CN110659678A *, priority 2019-09-09, published 2020-01-07, Tencent Technology (Shenzhen) Co., Ltd. — User behavior classification method, system, and storage medium
- CN111160517A *, priority 2018-11-07, published 2020-05-15, Hangzhou Hikvision Digital Technology Co., Ltd. — Convolutional-layer quantization method and apparatus for a deep neural network
- CN111582432A *, priority 2019-02-19, published 2020-08-25, Beijing Canaan Jiesi Information Technology Co., Ltd. — Network parameter processing method and apparatus
- CN112085182A *, priority 2019-06-12, published 2020-12-15, Anhui Cambricon Information Technology Co., Ltd. — Data processing method, apparatus, computer device, and storage medium

Family Cites Families (10)

* Cited by examiner, † Cited by third party

- AU2003291924B2 (en) *, priority 2003-10-29, published 2009-05-28, Siemens Aktiengesellschaft — Method for the operation of a technical system
- CN109543826A (zh) *, priority 2017-09-21, published 2019-03-29, Hangzhou Hikvision Digital Technology Co., Ltd. — Activation quantization method and apparatus based on a deep neural network
- CN108647250A (zh) *, priority 2018-04-19, published 2018-10-12, Zhengzhou University of Science and Technology — Artificial-intelligence-based quantitative precise matching method for talent big data
- CN110223105B (zh) *, priority 2019-05-17, published 2020-12-01, Zhiliang Technology (Shenzhen) Co., Ltd. — Trading strategy generation method and engine based on an artificial intelligence model
- CN112215331A (zh) *, priority 2019-07-10, published 2021-01-12, Huawei Technologies Co., Ltd. — Data processing method in a neural network system, and neural network system
- CN111582476A (zh) *, priority 2020-05-09, published 2020-08-25, Beijing Baidu Netcom Science and Technology Co., Ltd. — Automatic quantization strategy search method, apparatus, device, and storage medium
- CN111667054B (zh) *, priority 2020-06-05, published 2023-09-01, Beijing Baidu Netcom Science and Technology Co., Ltd. — Method, apparatus, electronic device, and storage medium for generating a neural network model
- CN112287986B (zh) *, priority 2020-10-16, published 2023-07-18, Inspur (Beijing) Electronic Information Industry Co., Ltd. — Image processing method, apparatus, device, and readable storage medium
- CN112149266A (zh) *, priority 2020-10-23, published 2020-12-29, Beijing Baidu Netcom Science and Technology Co., Ltd. — Method, apparatus, device, and storage medium for determining a network model quantization strategy
- CN112288697B (zh) *, priority 2020-10-23, published 2023-07-28, Beijing Baidu Netcom Science and Technology Co., Ltd. — Method, apparatus, electronic device, and readable storage medium for quantifying the degree of anomaly


Also Published As

- CN115037608A (zh), published 2022-09-09


Legal Events

- 121 (EP): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 22762474; country of ref document: EP; kind code of ref document: A1.
- NENP: Non-entry into the national phase. Ref country code: DE.
- 122 (EP): PCT application non-entry in European phase. Ref document number: 22762474; country of ref document: EP; kind code of ref document: A1.