WO2024109907A1 - Quantization method and apparatus, and recommendation method and apparatus - Google Patents

Quantization method and apparatus, and recommendation method and apparatus

Info

Publication number
WO2024109907A1
Authority
WO
WIPO (PCT)
Prior art keywords
precision
representation
low
full
embedding
Prior art date
Application number
PCT/CN2023/133825
Other languages
French (fr)
Chinese (zh)
Inventor
Guo Huifeng (郭慧丰)
Li Shiwei (李世伟)
Hou Lu (侯璐)
Zhang Wei (章伟)
Tang Ruiming (唐睿明)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2024109907A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/042 Knowledge-based neural networks; Logical representations of neural networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • the present application relates to the field of computers, and in particular to a quantization method, a recommendation method, and corresponding apparatuses.
  • Machine learning systems, including personalized recommendation systems, train the parameters of machine learning models based on input data and labels through optimization methods such as gradient descent. After the model parameters converge, the model can be used to predict unknown data.
  • the model can usually include an embedding layer and a multilayer perceptron (MLP) layer.
  • the embedding layer is usually used to map high-dimensional sparse data to low-dimensional dense vectors
  • the MLP is usually used to fit the combination relationship between features, sequence information, or click rate, etc.
  • the input data volume of the recommendation model is usually very large, so the scale of the embedding layer is very large, resulting in a large amount of storage space required during storage and training.
  • the present application provides a quantization method, a recommendation method, and an apparatus for quantizing each feature in a full-precision embedded representation based on an adaptive step size, thereby improving quantization accuracy.
  • the present application provides a quantization method, comprising: first, obtaining a full-precision embedded representation that includes multiple features; next, determining an adaptive step size corresponding to each of the multiple features, where the step sizes corresponding to different features may be the same or different; and then quantizing the multiple features according to the adaptive step size corresponding to each feature to obtain a low-precision embedded representation. The precision of the features in the low-precision embedded representation is lower than that of the features in the full-precision embedded representation, so the storage or transmission resources required to save or transmit the low-precision embedded representation are lower than those required for the full-precision embedded representation, thereby reducing the storage space needed to save or transmit the embedded representation.
  • in the process of quantizing the full-precision embedded representation, the adaptive step size corresponding to each feature can be calculated, and quantization can be performed based on that adaptive step size, thereby improving quantization accuracy and avoiding the precision loss caused by a fixed step size. For example, when a certain feature is updated less frequently, a fixed step size may be too coarse for its small updates, reducing the quantization accuracy of that feature.
  • each feature has a corresponding adaptive step size, which matches the length of each feature or the amount of updated data, thereby avoiding data loss during quantization and improving quantization accuracy.
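The per-feature scheme above can be sketched in a few lines. This is an illustrative sketch, not the claimed implementation: the function names, the example step sizes, and the signed-integer grid are assumptions.

```python
def quantize(features, steps, bits=8):
    """Map each full-precision feature vector onto a signed integer grid
    using that feature's own adaptive step size."""
    qmax = 2 ** (bits - 1) - 1  # e.g. 127 for 8-bit quantization
    return [
        [max(-qmax, min(qmax, round(w / step))) for w in vec]
        for vec, step in zip(features, steps)
    ]

def dequantize(quantized, steps):
    """Restore a full-precision approximation: w ~= q * step."""
    return [
        [q * step for q in vec]
        for vec, step in zip(quantized, steps)
    ]

# A feature with small weights gets a small step, so it keeps precision
# that a single fixed step would destroy.
features = [[0.5, -0.25], [0.002, 0.004]]
steps = [0.01, 0.0001]
q = quantize(features, steps)       # [[50, -25], [20, 40]]
restored = dequantize(q, steps)     # close to the original features
```

With a single fixed step of 0.01, the second feature would quantize to [0, 0]; the per-feature step preserves it.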
  • a low-precision embedding representation vocabulary is applied to a neural network, and the aforementioned acquisition of the full-precision embedding representation may include: acquiring the representation corresponding to the input data of the current iteration from the low-precision embedding representation vocabulary to obtain a low-precision embedding representation of the current iteration; and dequantizing the low-precision embedding representation of the current iteration to obtain a full-precision embedding representation of the current iteration.
  • the quantization method provided in the present application can be applied to quantization in the process of neural network training.
  • a low-precision embedded representation is transmitted, and a full-precision embedded representation can be obtained by dequantizing it through the corresponding adaptive step size, thereby achieving full-precision restoration of the low-precision embedded representation and obtaining a lossless full-precision embedded representation, which can reduce the storage space occupied by the embedded representation during the neural network training process.
  • the aforementioned determination of the adaptive step size corresponding to each of the multiple features may include: using the full-precision embedding representation of the current iteration as the input of the neural network to obtain the full-precision gradient corresponding to the prediction result of the current iteration; updating the full-precision embedding representation based on the full-precision gradient to obtain an updated full-precision embedding representation; and obtaining the adaptive step size corresponding to each feature in the updated full-precision embedding representation based on the full-precision gradient.
  • the adaptive step size corresponding to each feature can be determined based on the full-precision gradient, so that the step size can be adaptively updated to obtain an adaptive step size that matches each feature. This can avoid reducing the quantization accuracy due to the small update amount in the embedded representation, and can improve the quantization accuracy.
  • the aforementioned quantizing of multiple features according to the adaptive step size corresponding to each feature includes: quantizing multiple features in the full-precision low-dimensional representation of the current iteration according to the adaptive step size corresponding to each feature to obtain a low-precision embedded representation.
  • the method provided in the present application may further include: updating a low-precision embedding representation vocabulary according to the low-precision embedding representation to obtain an updated low-precision embedding representation vocabulary.
  • the new low-precision embedding representation can be written back into the low-precision embedding representation vocabulary to facilitate subsequent low-precision storage or transmission.
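The training loop described above can be pictured with a toy sketch. This is not the claimed method: the loss, the learning rate, and the max-abs step-size rule are all assumptions used only to make the loop concrete. Each iteration dequantizes the stored low-precision rows, updates them at full precision using the gradient, recomputes each row's adaptive step, and writes the requantized rows back into the vocabulary.

```python
def train_step(vocab_q, steps, indices, lr=0.1, bits=8):
    """One toy iteration over the vocabulary rows selected by the input data."""
    qmax = 2 ** (bits - 1) - 1
    for i in indices:
        # Dequantize the stored low-precision row to full precision.
        w = [q * steps[i] for q in vocab_q[i]]
        # Toy loss L = sum(w_j^2), so the full-precision gradient is 2*w.
        grad = [2.0 * x for x in w]
        # Full-precision update against the gradient.
        w = [x - lr * g for x, g in zip(w, grad)]
        # Recompute the adaptive step for this row (assumed max-abs rule).
        new_step = max(abs(x) for x in w) / qmax or steps[i]
        # Requantize and write back into the low-precision vocabulary.
        vocab_q[i] = [max(-qmax, min(qmax, round(x / new_step))) for x in w]
        steps[i] = new_step

vocab_q, steps = [[100, -50]], [0.01]   # one stored row, step 0.01
train_step(vocab_q, steps, indices=[0])
```

Only the low-precision rows and their step sizes persist between iterations; full precision exists transiently inside the step.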
  • the aforementioned determination of the adaptive step size corresponding to each of the multiple features may include: calculating the adaptive step size corresponding to each feature by using a heuristic algorithm.
  • the adaptive step size can be calculated by a heuristic algorithm, which can be applicable to the scenario of storing a low-precision embedded representation vocabulary.
  • the aforementioned calculation of the adaptive step size corresponding to each feature by a heuristic algorithm may include: calculating the adaptive step size corresponding to each feature according to the absolute value of the weight in each feature. Therefore, the adaptive step size can be calculated based on the weight value of each feature itself without relying on external data.
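One concrete instance of such a heuristic (an assumption here, since the text does not fix the formula) is to divide the largest absolute weight in the feature by the top of the signed integer grid, so the feature's own magnitude sets its step:

```python
def heuristic_step(feature, bits=8):
    """Adaptive step from the feature's own weights: the largest absolute
    weight lands exactly on the edge of the signed `bits`-bit grid."""
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    return max(abs(w) for w in feature) / qmax

# A feature with small weights automatically gets a fine step.
print(heuristic_step([0.5, -1.27]))     # ~0.01
print(heuristic_step([0.002, -0.004]))  # ~3.15e-05
```

As the text notes, this needs only the weight values themselves, no gradients or external data.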
  • the aforementioned quantizing of multiple features according to the adaptive step size corresponding to each feature to obtain a low-precision embedded representation may also include: obtaining discrete features of each feature according to the adaptive step size corresponding to each feature; and truncating the discrete features of each feature by a random truncation algorithm to obtain the low-precision embedded representation.
  • each feature can be truncated by a random truncation algorithm, so that effective features can be adaptively retained and quantization accuracy can be improved.
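Random truncation can be sketched as stochastic rounding (an assumed interpretation of the text): a value is rounded up with probability equal to its fractional part, so small weights survive quantization in expectation instead of always truncating to zero.

```python
import math
import random

def stochastic_round(x, rng=random):
    """Round x down or up at random, in proportion to its fractional part,
    so that E[stochastic_round(x)] == x."""
    lo = math.floor(x)
    frac = x - lo
    return lo + (1 if rng.random() < frac else 0)

random.seed(0)
# 0.3 rounds to 1 about 30% of the time, to 0 otherwise.
samples = [stochastic_round(0.3) for _ in range(10000)]
print(sum(samples) / len(samples))  # close to 0.3
```

Deterministic rounding would map 0.3 to 0 every time, losing the weight entirely; the stochastic variant preserves it on average.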
  • the low-precision embedding representation vocabulary is applied to a language model or a recommendation model
  • the language model is used to obtain the semantic information of the corpus
  • the recommendation model is used to generate recommendation information based on the user's information. Therefore, the method provided in this application can be applied to natural language processing or recommendation scenarios, etc.
  • the present application provides a recommendation method, comprising: obtaining input data, the input data including data generated by a user for at least one behavior of a terminal; obtaining a low-precision embedded representation corresponding to the input data from a low-precision embedded representation vocabulary, the low-precision embedded representation including multiple features; dequantizing the multiple features according to an adaptive step size corresponding to each of the multiple features to obtain a full-precision embedded representation, and the adaptive step size may be an adaptive step size obtained when quantizing the full-precision embedded representation; outputting recommendation information based on the full-precision embedded representation as input to a neural network, and the recommendation information is used to make recommendations for at least one behavior of the user.
  • the low-precision embedded representation can be dequantized using the adaptive step size to obtain a full-precision embedded representation, so that the low-precision representation can be saved or transmitted during the inference process and losslessly restored to full precision using the adaptive step size when used. This can reduce the storage space occupied by the embedded representation vocabulary.
  • the neural network includes a language model or a recommendation model
  • the language model is used to obtain semantic information of the corpus
  • the recommendation model is used to generate recommendation information based on user information.
  • the present application provides a quantization device, comprising:
  • An acquisition module is used to acquire a full-precision embedded representation, where the embedded representation includes multiple features
  • a determination module used to determine the adaptive step size corresponding to each of the multiple features
  • the quantization module is used to quantize multiple features according to the adaptive step size corresponding to each feature to obtain a low-precision embedded representation, where the accuracy of the features in the low-precision embedded representation is lower than the accuracy of the features in the full-precision embedded representation.
  • a low-precision embedding representation vocabulary is applied to a neural network.
  • the acquisition module is specifically used to obtain the representation corresponding to the input data of the current iteration from the low-precision embedding representation vocabulary to obtain the low-precision embedding representation of the current iteration; dequantize the low-precision embedding representation of the current iteration to obtain the full-precision embedding representation of the current iteration.
  • the determination module is specifically used to: use the full-precision embedding representation of the current iteration as the input of the neural network to obtain the full-precision gradient corresponding to the prediction result of the current iteration; update the full-precision embedding representation according to the full-precision gradient to obtain an updated full-precision embedding representation; and obtain the adaptive step size corresponding to each feature in the updated full-precision embedding representation according to the full-precision gradient.
  • the quantization module is specifically configured to quantize the multiple features in the full-precision low-dimensional representation of the current iteration according to the adaptive step size corresponding to each feature, to obtain a low-precision embedded representation.
  • the acquisition module is further configured to update the low-precision embedding representation vocabulary according to the low-precision embedding representation to obtain an updated low-precision embedding representation vocabulary.
  • the determination module is specifically configured to calculate the adaptive step size corresponding to each feature by using a heuristic algorithm.
  • the determination module is specifically configured to calculate the adaptive step size corresponding to each feature according to the absolute value of the weight in each feature.
  • the quantization module is specifically used to: obtain a discrete feature of each feature according to an adaptive step size corresponding to each feature; and truncate the discrete feature of each feature by a random truncation algorithm to obtain a low-precision embedded representation.
  • the low-precision embedding representation vocabulary is applied to a language model or a recommendation model.
  • the language model is used to obtain semantic information of the corpus, and the recommendation model is used to generate recommendation information based on user information.
  • the present application provides a recommendation device, comprising:
  • An input module used to obtain input data, where the input data includes data generated by at least one behavior of a user on a terminal;
  • An acquisition module is used to acquire a low-precision embedding representation corresponding to the input data from a low-precision embedding representation vocabulary, where the low-precision embedding representation includes multiple features;
  • a dequantization module used to dequantize multiple features according to the adaptive step size corresponding to each of the multiple features to obtain a full-precision embedded representation
  • the recommendation module is used to output recommendation information based on the full-precision embedding representation as the input of the neural network, and the recommendation information is used to recommend at least one behavior of the user.
  • the neural network includes a language model or a recommendation model
  • the language model is used to obtain semantic information of the corpus
  • the recommendation model is used to generate recommendation information based on user information.
  • the present application provides a quantization device, which includes: a processor, a memory, an input/output device, and a bus; the memory stores computer instructions; and when the processor executes the computer instructions in the memory, the device is used to implement any one of the implementations of the first aspect.
  • the present application provides a recommendation device, comprising: a processor, a memory, an input/output device, and a bus; the memory stores computer instructions; and when the processor executes the computer instructions in the memory, the device is used to implement any one of the implementations of the second aspect.
  • an embodiment of the present application provides a chip system, which includes a processor and an input/output port, wherein the processor is used to implement the processing functions involved in the method described in the first aspect or the second aspect above, and the input/output port is used to implement the transceiver functions involved in the method described in the first aspect or the second aspect above.
  • the chip system also includes a memory, which is used to store program instructions and data for implementing the functions involved in the method described in the first aspect or the second aspect above.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • an embodiment of the present application provides a computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions; when the computer instructions are executed on a computer, the computer executes the method described in any possible implementation of the first aspect or the second aspect.
  • an embodiment of the present application provides a computer program product.
  • the computer program product includes a computer program or instructions, and when the computer program or instructions are executed on a computer, the computer executes the method described in any possible implementation of the first aspect or the second aspect.
  • FIG1 is a schematic diagram of an artificial intelligence main framework used in this application.
  • FIG2 is a schematic diagram of a system architecture provided by the present application.
  • FIG3 is a schematic diagram of another system architecture provided by the present application.
  • FIG4 is a schematic diagram of an application scenario provided by the present application.
  • FIG5A is a schematic diagram of another application scenario provided by the present application.
  • FIG5B is a schematic diagram of another application scenario provided by the present application.
  • FIG6 is a flowchart of a quantization method provided by the present application.
  • FIG7 is a flowchart of another quantization method provided by the present application.
  • FIG8 is a flowchart of another quantization method provided by the present application.
  • FIG9 is a schematic diagram of another application scenario provided by the present application.
  • FIG10 is a schematic diagram of another application scenario provided by the present application.
  • FIG11 is a schematic diagram of another application scenario provided by the present application.
  • FIG12 is a schematic flowchart of a recommendation method provided by the present application.
  • FIG13 is a schematic diagram of the structure of a quantization device provided by the present application.
  • FIG14 is a schematic diagram of the structure of a recommendation device provided by the present application.
  • FIG15 is a schematic diagram of the structure of a quantization device provided by the present application.
  • FIG16 is a schematic diagram of the structure of a recommendation device provided by the present application.
  • FIG17 is a schematic diagram of the structure of a chip provided in the present application.
  • AI: artificial intelligence
  • AI is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines so that machines have the functions of perception, reasoning and decision-making.
  • Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, etc.
  • Figure 1 shows a structural diagram of the main framework of artificial intelligence.
  • the following explains the above artificial intelligence framework from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • the "intelligent information chain" reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data undergoes a condensation process of "data - information - knowledge - wisdom".
  • the "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure and information (the provision and processing technology implementation) to the industrial ecology of the system.
  • the infrastructure provides computing power support for the artificial intelligence system, enables communication with the outside world, and is supported by the basic platform. It communicates with the outside world through sensors; computing power is provided by smart chips (CPU, NPU, GPU, ASIC, FPGA and other hardware acceleration chips); the basic platform includes distributed computing frameworks and networks and other related platform guarantees and support, which can include cloud storage and computing, interconnected networks, etc. For example, sensors communicate with the outside world to obtain data, and these data are provided to the smart chips in the distributed computing system provided by the basic platform for calculation.
  • the data on the upper layer of the infrastructure is used to represent the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, text, and IoT data of traditional devices, including business data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, and training.
  • Reasoning refers to the process of simulating human intelligent reasoning in computers or intelligent systems, using formalized information to perform machine thinking and solve problems based on reasoning control strategies. Typical functions are search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, usually providing functions such as classification, sorting, and prediction.
  • some general capabilities can be further formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, etc.
  • Smart products and industry applications refer to the products and applications of artificial intelligence systems in various fields. They are the encapsulation of the overall artificial intelligence solution, which productizes intelligent information decision-making and realizes practical applications. Its application areas mainly include: smart terminals, smart transportation, smart medical care, autonomous driving, smart cities, etc.
  • the embodiments of the present application involve related applications of neural networks.
  • the relevant terms and concepts of the neural networks that may be involved in the embodiments of the present application are first introduced below.
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • Convolutional neural network contains a feature extractor consisting of a convolution layer and a subsampling layer, which can be regarded as a filter.
  • Convolutional layer refers to the neuron layer in the convolutional neural network that performs convolution processing on the input signal.
  • a neuron can only be connected to some neurons in the adjacent layers.
  • a convolutional layer usually contains several feature planes, each of which can be composed of some rectangularly arranged neural units.
  • the neural units in the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as the way to extract features is independent of position.
  • Convolution kernels can be formalized as matrices of random sizes, and convolution kernels can obtain reasonable weights through learning during the training process of convolutional neural networks.
  • the direct benefit of shared weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • Graph neural network is a deep learning model that models and processes non-Euclidean spatial data (such as graph data). Its principle is to use pairwise message passing so that graph nodes iteratively update their corresponding representations by exchanging information with their neighbors.
  • a graph convolutional network (GCN) is similar to a CNN, except that the input of a CNN is usually two-dimensional structured data, while the input of a GCN is usually graph-structured data.
  • GCN has cleverly designed a method to extract features from graph data, so that these features can be used to perform node classification, graph classification, link prediction, and graph embedding.
  • the loss function, also called the objective function, is an important equation used to measure the difference between the predicted value and the target value.
  • the loss function can usually include loss functions such as squared error, cross entropy, logarithm, exponential, etc.
  • the squared error can be used as a loss function, defined for example as L = (y − ŷ)², where y is the target value and ŷ is the predicted value. The specific loss function can be selected according to the actual application scenario.
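For concreteness, the two most common losses mentioned above can be written out per sample. These are the standard forms, not definitions taken from this application:

```python
import math

def squared_error(y_true, y_pred):
    """Squared error between the target value and the predicted value."""
    return (y_true - y_pred) ** 2

def cross_entropy(y_true, y_pred):
    """Binary cross-entropy for a label in {0, 1} and a probability y_pred."""
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1 - y_pred))

print(squared_error(1.0, 0.5))   # 0.25
print(cross_entropy(1.0, 0.5))   # log(2) ~= 0.693
```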
  • the neural network can use the error back propagation (BP) algorithm to correct the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller.
  • BP: error back propagation
  • the forward transmission of the input signal to the output will generate error loss, and the parameters in the initial neural network model are updated by back propagating the error loss information, so that the error loss converges.
  • the back propagation algorithm is a back propagation movement dominated by error loss, which aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
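A minimal numeric sketch of this loop, fitting a single weight so that y ≈ w·x (the data, learning rate, and epoch count are illustrative assumptions): the error is propagated back as the gradient of the squared loss, and repeated updates drive the error loss toward convergence.

```python
def sgd_fit(samples, w=0.0, lr=0.1, epochs=50):
    """Fit y ~= w * x by back-propagating the squared-error loss."""
    for _ in range(epochs):
        for x, y in samples:
            grad = 2.0 * (w * x - y) * x  # d/dw of (w*x - y)^2
            w -= lr * grad                # move against the gradient
    return w

w = sgd_fit([(1.0, 2.0), (2.0, 4.0)])
print(w)  # converges to ~2.0, the weight that makes the loss smallest
```

The same gradient-descent update, applied layer by layer via the chain rule, is what trains the weight matrices of a full neural network.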
  • the BP algorithm can be used in the training stage to train the model and obtain the trained model.
  • Stochastic gradient: the number of samples in machine learning is very large, so the loss function is calculated each time on data obtained by random sampling, and the corresponding gradient is called the stochastic gradient.
  • Embedding refers to the feature representation of samples or word embedding representation.
  • the recommendation system uses machine learning algorithms to analyze and learn based on the user's historical click behavior data, then predicts the user's new requests and returns a personalized item recommendation list.
  • Model quantization: a model compression method that converts high-bit representations into low-bit representations.
  • the model compression technology that converts conventional 32-bit floating-point operations into low-bit integer operations can be called model quantization.
  • model quantization when the low bit is quantized to 8 bits, it can be called int8 quantization, that is, a weight originally needs to be represented by float32, but after quantization, it only needs to be represented by int8. In theory, it can achieve 4 times network acceleration.
  • 8-bit storage requires a quarter of the space of 32-bit storage, reducing storage space and computing time, thereby compressing and accelerating the model.
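The storage arithmetic can be checked directly on a toy table (the table size is illustrative):

```python
from array import array

# A toy embedding table: 1000 rows of dimension 64.
n, dim = 1000, 64
full = array('f', [0.0] * (n * dim))  # float32: 4 bytes per weight
quant = array('b', [0] * (n * dim))   # int8:    1 byte per weight

full_bytes = full.itemsize * len(full)
quant_bytes = quant.itemsize * len(quant)
print(full_bytes, quant_bytes, full_bytes // quant_bytes)  # 256000 64000 4
```

The factor of 4 applies to storage and transmission alike, which is why the embedding vocabulary is the main beneficiary of int8 quantization.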
  • Automatic machine learning refers to the design of a series of advanced control systems to operate machine learning models so that the models can automatically learn appropriate parameters and configurations without human intervention.
  • automatic machine learning mainly includes network architecture search and global parameter setting.
  • network architecture search is used to allow computers to generate the neural network architecture that best suits the problem based on data, which has the characteristics of high training complexity and great performance improvement.
  • Corpus: also known as free text, it can be words, phrases, sentences, fragments, articles, or any combination thereof. For example, “Today’s weather is really nice” is a piece of corpus.
  • Neural machine translation is a typical task in natural language processing. Given a sentence in a source language, the task is to output a corresponding sentence in a target language. In the commonly used neural machine translation model, the words in the sentences of the source language and the target language are encoded into vector representations, and the associations between words and sentences are calculated in the vector space to perform the translation task.
  • Pre-trained language model (PLM): a natural language sequence encoder that encodes each word in a natural language sequence into a vector representation for prediction tasks.
  • the training of PLM consists of two stages, namely the pre-training stage and the fine-tuning stage.
  • the pre-training stage the model is trained on language model tasks on large-scale unsupervised text to learn word representation.
  • the fine-tuning stage the model is initialized using the parameters learned in the pre-training stage and trained on downstream tasks such as text classification or sequence labeling with fewer steps, so that the semantic information obtained from pre-training can be successfully transferred to downstream tasks.
  • CTR: click-through rate, i.e., the probability that a user clicks a presented item.
  • Post-click conversion rate refers to the probability that a user converts a clicked item in a specific environment. For example, if a user clicks on the icon of an APP, conversion refers to downloading, installing, registering, etc.
  • An epoch can be considered the number of times the neural network is trained using the entire training set.
  • the recommendation method provided in the embodiment of the present application can be executed on a server or on a terminal device.
  • the terminal device can be a mobile phone with image processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD) or an autonomous driving vehicle, etc., and the embodiment of the present application does not limit this.
  • an embodiment of the present application provides a system architecture 200 .
  • a data acquisition device 260 can be used to collect training data.
  • the training data is stored in a database 230 , and the training device 220 trains the target model/rule 201 based on the training data maintained in the database 230 .
  • the training device 220 obtains the target model/rule 201 based on the training data.
  • The training device 220 processes multiple frames of sample images, outputs the corresponding predicted labels, calculates the loss between the predicted labels and the original labels of the samples, and updates the network based on this loss, until the predicted labels are close to the original labels or the difference between them is less than a threshold, thereby completing the training of the target model/rule 201.
  • For details, refer to the training method described in the following text.
  • the target model/rule 201 in the embodiment of the present application can specifically be a neural network.
  • the training data maintained in the database 230 does not necessarily all come from the collection of the data acquisition device 260, and may also be received from other devices.
  • the training device 220 does not necessarily train the target model/rule 201 entirely based on the training data maintained by the database 230, and may also obtain training data from the cloud or other places for model training. The above description should not be used as a limitation on the embodiments of the present application.
  • the target model/rule 201 obtained by training the training device 220 can be applied to different systems or devices, such as the execution device 210 shown in FIG. 2 .
  • the execution device 210 can be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, augmented reality (AR)/virtual reality (VR), a vehicle terminal, a television, etc., or a server or a cloud.
  • the execution device 210 is configured with a transceiver 212, which can include an input/output (I/O) interface or other wireless or wired communication interfaces, etc., for data interaction with external devices. Taking the I/O interface as an example, a user can input data to the I/O interface through the client device 240.
  • When the execution device 210 preprocesses the input data, or when the computing module 212 of the execution device 210 performs calculation and other related processing, the execution device 210 can call the data, code, etc. in the data storage system 250 for the corresponding processing, and can also store the data, instructions, etc. obtained from that processing into the data storage system 250.
  • the transceiver 212 returns the processing result to the client device 240 so as to provide it to the user.
  • the training device 220 can generate corresponding target models/rules 201 based on different training data for different goals or different tasks.
  • the corresponding target models/rules 201 can be used to achieve the above goals or complete the above tasks, thereby providing users with the desired results.
  • the user can manually give input data, and the manual giving can be operated through the interface provided by the transceiver 212.
  • The client device 240 can automatically send input data to the transceiver 212. If the client device 240 is required to send input data automatically, the user can set the corresponding permission in the client device 240. The user can view the results output by the execution device 210 on the client device 240, and the specific presentation form can be display, sound, action, etc.
  • The client device 240 can also serve as a data acquisition terminal, collecting the input data of the transceiver 212 and the output results of the transceiver 212 shown in the figure as new sample data and storing them in the database 230. Alternatively, instead of collecting through the client device 240, the transceiver 212 can directly store its input data and output results as new sample data in the database 230.
  • FIG2 is only a schematic diagram of a system architecture provided in an embodiment of the present application.
  • the positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 250 is an external memory relative to the execution device 210. In other cases, the data storage system 250 can also be placed in the execution device 210.
  • a target model/rule 201 is obtained through training by a training device 220 .
  • the target model/rule 201 may be a recommendation model in the present application.
  • the system architecture of the application of the neural network training method provided by the present application can be shown in Figure 3.
  • the server cluster 310 is implemented by one or more servers, and optionally, cooperates with other computing devices, such as data storage, routers, load balancers, etc.
  • the server cluster 310 can use the data in the data storage system 250, or call the program code in the data storage system 250 to implement the steps of the neural network training method provided by the present application.
  • Each local device can represent any computing device, such as a personal computer, a computer workstation, a smart phone, a tablet computer, a smart camera, a smart car or other type of cellular phone, a media consumption device, a wearable device, a set-top box, a game console, etc.
  • the local device of each user can interact with the server cluster 310 through a communication network of any communication mechanism/communication standard, and the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof.
  • the communication network may include a wireless network, a wired network, or a combination of a wireless network and a wired network, etc.
  • The wireless network includes, but is not limited to, any one or more combinations of: a fifth-generation mobile communication technology (5G) system, a long term evolution (LTE) system, a global system for mobile communication (GSM), a code division multiple access (CDMA) network, a wideband code division multiple access (WCDMA) network, wireless fidelity (WiFi), Bluetooth, the Zigbee protocol, radio frequency identification (RFID), long-range (LoRa) wireless communication, and near-field communication (NFC).
  • the wired network may include an optical fiber communication network or a network composed of coaxial cables, etc.
  • one or more aspects of the execution device 210 may be implemented by each local device.
  • the local device 301 may provide local data or feedback calculation results to the execution device 210 .
  • the local device 301 implements the functions of the execution device 210 and provides services to its own user, or provides services to the user of the local device 302.
  • a machine learning system can include a personalized recommendation system. Based on input data and labels, the parameters of the machine learning model can be trained through optimization methods such as gradient descent. After the model parameters converge, the model can be used to predict unknown data. Taking the click-through rate prediction in a personalized recommendation system as an example, its input data includes user features, item features, and context features. How to predict a personalized recommendation list based on user preferences has an important impact on improving the user experience of the recommendation system and the platform revenue.
  • the click rate prediction model in the recommendation system can generally include the Embedding and MLP layers, that is, the feature interaction layer, deep neural network layer and prediction layer shown in FIG4 .
  • The Embedding layer is used to map high-dimensional sparse data to low-dimensional dense vectors.
  • the MLP layer is generally used to fit the combination relationship and sequence information between features to approximate the actual click rate distribution.
  • Mainstream models represent features using the embedding parameters and learn explicit/implicit feature combination relationships based on those representations.
  • the recommendation model has many features, resulting in a large Embedding scale, such as TB level for Internet companies.
  • the embedding representation vocabulary (Embedding table) is too large, and the video memory of a single GPU or NPU computing card is not enough to store all parameters, and multiple nodes are required for distributed storage.
  • distributed storage brings new problems: more memory overhead is required; in the training/inference stage, the Embedding parameters need to be pulled through the network, which brings more communication overhead, increases the delay of model calculation, and ultimately affects the recommendation effect.
  • the Embedding table can usually be quantized, thereby compressing the Embedding table by reducing the precision.
  • pruning can be used for compression.
  • Parameter thresholds can be set and parameters in the Embedding table that are below the threshold can be pruned.
  • retraining can be performed based on the pruned Embedding.
  • However, pruning has drawbacks: only the memory in the inference phase is compressed, while the training memory is not; retraining is required, which increases the training cost; and the generated Embedding table is unstructured data that requires special storage.
  • Compression can also be performed based on AutoML, such as adjusting the number of features and the sizes of different features in the embedding table end to end based on reinforcement learning and the differentiable architecture search (DARTS) method.
  • high-frequency features are independently assigned embeddings, and low-frequency features are mapped using hash functions, thereby achieving the purpose of compressing the embedding parameters of low-frequency features.
  • All parameters in the training process are stored as low-precision parameters; fp32 full-precision parameters are obtained through dequantization, forward and backward computations are then performed to obtain full-precision gradients, and the fp32 full-precision parameters are updated according to the learning rate η to obtain the updated parameters.
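As a minimal sketch of this loop (the names, values, and nearest-rounding mode here are illustrative assumptions, not the exact scheme of the disclosure):

```python
import numpy as np

def dequantize(q, step):
    """Recover fp32 parameters from stored low-precision codes."""
    return q.astype(np.float32) * step

def quantize(w, step, bits=8):
    """Map fp32 parameters back to signed m-bit codes."""
    lo, hi = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    return np.clip(np.round(w / step), lo, hi).astype(np.int8)

step = np.float32(0.05)                    # quantization step size
q = np.array([10, -3, 7], dtype=np.int8)   # stored low-precision codes

w = dequantize(q, step)                    # fp32 view for computation
grad = np.array([0.2, -0.1, 0.4], dtype=np.float32)  # from backprop
eta = 0.01                                 # learning rate
w_new = w - eta * grad                     # full-precision update
q_new = quantize(w_new, step)              # write back as int8
```

Note that with this deterministic rounding the small update is entirely erased in this example (q_new equals q), which is exactly the failure mode described next.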
  • When the weight update becomes smaller and smaller, far smaller than the quantization step, deterministic rounding will erase the parameter update, making the network untrainable and thereby affecting the training accuracy.
  • the present application provides a quantization method for preserving more parameter information and improving quantization accuracy by setting an adaptive quantization step size.
  • The quantization method provided in this application can be applied to a language model or a recommendation model.
  • the language model may include a neural machine translation or PLM model.
  • the recommendation model may include a click-through rate prediction model, a conversion rate prediction model, etc.
  • the embedding table is used to extract the representation of the input corpus, and then the semantics corresponding to the representation is obtained, followed by further translation or semantic recognition. The subsequent steps can be carried out according to the tasks that the model needs to perform.
  • The application recommendation framework of the present application can be shown in FIG5A, and can be divided into a training part and an online inference part.
  • the training set includes input data and corresponding labels.
  • the training set can include products that the user clicks, collects or likes, and the products that are finally purchased.
  • the training set is input into the initial model, and the parameters of the machine learning model are trained by optimization methods such as gradient descent to obtain a recommendation model.
  • the recommendation model can be deployed on the recommendation platform, such as deployed in a server or terminal.
  • the server can be used to output a recommendation list for the user.
  • the information of the recommended products for the user can be displayed on the homepage of the user terminal, such as product icons or link titles, etc., or after the user clicks on a product, the icon or link title of the recommended product for the user can be displayed in the recommendation area.
  • the recommendation process can be shown in FIG5B, which may include display lists, logs, offline training, and online predictions. Users perform a series of actions in the front-end display list, such as browsing, clicking, commenting, downloading, etc., to generate behavioral data, which is stored in the log.
  • the recommendation system uses data including user behavior logs to perform offline model training, generates a prediction model after the training converges, deploys the model in an online service environment, and gives recommendation results based on user request access, product features, and contextual information. Then the user generates feedback on the recommendation results to form user data.
  • This application proposes an end-to-end Adaptive Low-Precision Training framework, which can be used to compress the memory of the Embedding table in the recommendation model, including training memory and inference memory, thereby reducing the storage overhead of saving, using, and training models.
  • The full-precision embedding representation may include multiple features, and each feature may be represented as one or more sets of feature vectors.
  • the full-precision embedding representation may include all or part of the features in the embedding table. If the full-precision embedding table is obtained, all or part of the data can be directly read from the full-precision embedding table to obtain the aforementioned full-precision embedding representation. If the low-precision embedding table is obtained, all or part of the features can be read from the low-precision embedding table, and the read features can be dequantized to obtain the full-precision embedding representation.
  • the embedding layer in a neural network can be used to map high-dimensional sparse data to low-dimensional dense vectors, specifically by querying the low-dimensional representation corresponding to the input data from the embedding table.
  • the embedding table stores low-dimensional representations of various data.
  • the input data is high-dimensional sparse data
  • the high-dimensional sparse data can be mapped to low-dimensional representations through the embedding table, which is equivalent to splitting the semantics of multiple dimensions included in the input data.
  • the representation corresponding to the input data of the current iteration can be obtained from the low-precision embedding representation vocabulary to obtain the low-precision embedding representation of the current iteration; the low-precision embedding representation of the current iteration is dequantized to obtain the full-precision embedding representation of the current iteration.
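A sketch of this lookup-and-dequantize step, assuming a per-feature adaptive step size is stored alongside the int8 table (all table values and ids are illustrative):

```python
import numpy as np

# Hypothetical low-precision embedding table (one int8 row per feature id)
# and the adaptive step size saved for each row.
table_q = np.array([[12, -5, 3],
                    [90,  4, -7]], dtype=np.int8)
table_step = np.array([0.02, 0.001], dtype=np.float32)

def lookup_dequantize(ids):
    """Gather the rows for the current batch and recover full precision."""
    q = table_q[ids]                    # low-precision batch embedding
    step = table_step[ids, None]        # broadcast the step over the dimension
    return q.astype(np.float32) * step  # full-precision batch embedding

batch = lookup_dequantize(np.array([1, 0]))  # feature ids of this batch
```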
  • the neural network may include a language model or a recommendation model.
  • the language model may include models such as neural machine translation or PLM.
  • the recommendation model may include a click-through rate prediction model, a conversion rate prediction model, etc. Therefore, the method provided in this application can be applied to language processing or recommendation scenarios.
  • the adaptive step size corresponding to each feature can be determined.
  • a heuristic algorithm may be used to calculate the adaptive step size corresponding to each feature, or the adaptive step size may be calculated by learning.
  • the heuristic algorithm may specifically include: calculating the adaptive step size corresponding to each feature according to the absolute value of the weight in each feature.
  • The adaptive quantization step size can be calculated from the maximum absolute value of the weights in each embedding vector: α = max(|e|) / (2^(m−1) − 1), where e is the embedding parameter vector and m is the number of quantization bits.
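A sketch of this heuristic, assuming the step size is chosen so that the largest absolute weight lands on the edge of the signed m-bit range (i.e. α = max|e| / (2^(m−1) − 1)):

```python
import numpy as np

def heuristic_step(e, bits=8):
    """Adaptive step from the max absolute weight of one embedding vector,
    so that the largest entry maps onto the edge of the signed int range."""
    return float(np.max(np.abs(e)) / (2 ** (bits - 1) - 1))

e = np.array([0.3, -1.27, 0.5], dtype=np.float32)
alpha = heuristic_step(e)  # 1.27 / 127 = 0.01
```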
  • The adaptive step size can also be obtained by learning during the training process of the neural network, for example by calculating the adaptive step size of the current iteration from the weights of the neural network updated in the current iteration and the step size updated in the previous training iteration, thus achieving higher training accuracy.
  • different methods can be used to calculate the adaptive step size in different scenarios. For example, in the scenario of training a neural network, you can choose a heuristic or learning method. For example, if the accuracy requirement is high and there are many training resources, you can choose a learning method to calculate the adaptive step size. If the computational efficiency requirement is high, you can choose a heuristic method for quantization. For example, when saving the Embedding table, you can use a heuristic algorithm to calculate the adaptive step size, so that the adaptive step size can be calculated efficiently without relying on the training-related parameters of the neural network.
  • the adaptive step size corresponding to each feature can be saved, so that when dequantization is performed later, the low-precision features can be losslessly dequantized based on the adaptive step size to obtain full-precision features.
  • the full-precision embedding representation of the current iteration can be used as the input of the neural network to obtain the full-precision gradient corresponding to the prediction result of the current iteration; the full-precision embedding representation is updated according to the full-precision gradient to obtain the updated full-precision embedding representation; the adaptive step size corresponding to each feature in the updated full-precision embedding representation is obtained according to the full-precision gradient. Therefore, during the training process, the adaptive step size adapted to the updated parameters can be updated in real time according to the updated parameters.
  • the calculation step size can be adaptively calculated based on the updated parameters, so that the parameters with less updates can be retained, which can reduce the loss of precision.
  • After determining the adaptive step size corresponding to each feature in the full-precision embedding representation, each feature can be quantized based on its corresponding adaptive step size to obtain a low-precision embedding representation. Therefore, the storage or transmission resources of the computing device required to save or transmit the low-precision embedding representation are lower than those required to save or transmit the full-precision embedding representation.
  • the computing device may include a device that executes the quantization method or recommended method provided in this application.
  • The corresponding adaptive step size is calculated and quantization is performed according to it, so each feature can be quantized with a matching adaptive step size; for features whose value ranges do not match the quantization bit width, the adaptive step size can still be used. Compared with quantization using a fixed step size, adaptive-step-size quantization reduces precision loss and improves quantization accuracy.
  • the low-precision embedding representation vocabulary is updated based on the low-precision embedding representation to obtain an updated low-precision embedding representation vocabulary, and the updated low-precision embedding representation is written back to the low-precision embedding table.
  • the method of the present application can be applied to various model preservation or model training processes. For example, when saving a model, a quantization method provided by the present application can be used to achieve lower precision quantization. Alternatively, in the process of training a model, the quantization method provided by the present application can be used to reduce the amount of data required to be transmitted during training and reduce the required cache space.
  • training can usually be performed in one or more epochs, and each epoch can be divided into multiple batches.
  • one of the batches is taken as an example for exemplary introduction.
  • the input data of the current batch training neural network can be used as the input of the embedding layer, and the input data can be mapped into a low-precision, low-dimensional embedding representation through a low-precision embedding table, that is, a low-precision batch Embedding.
  • the low-precision batch embedding can be dequantized, that is, the inverse operation of quantization, to obtain the full-precision batch embedding, so that the neural network can obtain the representation corresponding to the input sample based on the full-precision batch embedding.
  • The full-precision batch embedding to which the training samples are mapped can be used as the input of the neural network, and the prediction result is output. Then, based on the prediction result and the true labels of the input training samples, the value of the loss function is calculated, and the full-precision gradients of the parameters of the neural network in the current batch are calculated based on the value of the loss function.
  • the weights of the neural network can be updated based on the full-precision gradient to obtain the updated neural network for the current batch.
  • the parameters of the neural network can be updated through the back propagation algorithm.
  • the forward transmission of the input signal to the output will generate error loss, and the parameters in the initial neural network model are updated by back propagating the error loss information, so that the error loss converges.
  • the adaptive step size can be updated based on the full-precision gradient, and the full-precision batch Embedding can be quantized based on the adaptive step size to obtain a new low-precision batch Embedding, and the updated low-precision batch Embedding can be saved in the low-precision Embedding table to achieve low-precision storage and transmission of the Embedding table, thereby reducing the storage space required for saving and transmitting the Embedding table.
  • the adaptive step size can be calculated through learning and combined with the post-weight updated in each iteration, so that the Embedding table can be quantized in real time based on the update process of the neural network, thereby reducing the storage space occupied during training and saving.
  • the adaptive step size can also be calculated through heuristic algorithms, such as calculating the adaptive step size corresponding to each feature in the full-precision batch embedding according to the absolute value of the updated full-precision batch embedding weight, so that the adaptive step size can be calculated efficiently and accurately.
  • the updated full-precision batch embedding can be quantized based on the adaptive quantization step size to obtain a new low-precision batch embedding.
  • The discrete value of each feature, i.e., the discrete feature, can be obtained according to the adaptive step size corresponding to each feature. Subsequently, each discrete feature can be truncated by a random truncation algorithm to obtain the low-precision Embedding table. When truncating with the random truncation algorithm, the value of each discrete feature determines the truncation result, so that the truncation matches the update of the feature value; even if the parameter update amplitude is small, the updated part can still be reflected in the quantization, maintaining quantization accuracy.
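Random truncation here is essentially stochastic rounding: round up with probability equal to the fractional part, so that even an update much smaller than one quantization step survives in expectation. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x):
    """Round down or up at random; the probability of rounding up equals
    the fractional part, making the rounding unbiased in expectation."""
    floor = np.floor(x)
    return floor + (rng.random(np.shape(x)) < (x - floor))

# 6.92 rounds to 7 with probability 0.92 and to 6 with probability 0.08,
# so its mean over many draws stays close to 6.92.
samples = stochastic_round(np.full(10000, 6.92))
```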
  • step 707. Determine whether convergence has occurred. If so, terminate the iteration. If not, execute step 701.
  • If the neural network has converged, the iteration can be terminated, that is, the neural network trained in the current batch is output; if not, the neural network has not converged, and the iterative training can continue.
  • Determining whether the neural network has converged can involve judging whether the number of iterations reaches a preset number, whether the change in the loss value is less than a preset value, or whether the iteration duration reaches a preset duration, etc.; this can be determined according to the actual application scenario, and this application does not limit it.
  • In the process of training the neural network, the adaptive step size can be updated based on the calculated gradient, and quantization can be performed according to the adaptive step size adapted to each feature, so that the quantization accuracy of each feature is guaranteed as far as possible, lower-precision quantization is achieved, and the loss of information during quantization is reduced.
  • In the forward stage, the recommendation model takes a batch of high-dimensional sparse data as input, reads the feature ids in the batch data, reads the corresponding batch embedding from the low-precision embedding table, and then obtains, through dequantization, a full-precision batch embedding that can be used for subsequent neural network calculations. In the backward stage, the gradient of the current batch embedding is obtained from the upper network, and the batch embedding is updated based on the gradient. Since the embedding table stores low-precision parameters, the low-precision batch embedding must be obtained through quantization and finally written into the low-precision Embedding Table.
  • the specific steps may include the following.
  • the user's log data 801 is read, and the log data can be used as a training set for the recommendation model.
  • The user's log data may include information generated when the user uses a client; usually, different clients generate different information. For example, when a user uses a music app, the music played, clicked, collected or searched by the user can be saved in the user's log; when a user uses a shopping app, the items browsed, collected or purchased by the user can be saved in the user's log; when a user uses an application market, the apps clicked, downloaded, installed or collected by the user can be saved in the user's log, etc.
  • high-dimensional sparse batch data 802 of the current batch is read from the user log data.
  • part of the user's log data can be extracted as the high-dimensional sparse data of the current batch and used as the training data for the current iteration.
  • The corresponding low-precision batch embedding is read from the low-precision embedding table 803.
  • the user's log data is high-dimensional sparse data, so the high-dimensional sparse data can be mapped to low-dimensional features through the embedding table so that the model can recognize each feature and process it. That is, after reading the high-dimensional sparse batch data of the current batch from the log data, the high-dimensional sparse batch data can be mapped to low-dimensional representations through the low-precision embedding table, such as expressed as low-precision batch embedding.
  • Dequantization is performed to obtain the full-precision batch embedding 804.
  • the low-precision batch embedding is dequantized through the dequantization algorithm to obtain the full-precision batch embedding.
  • the full-precision batch embedding can be used as the input of the recommendation model 805 and the prediction result 806 can be output.
  • the full-precision gradient of the current batch is calculated based on the prediction result 806, and the batch embedding and quantization step size 807 are updated based on the full-precision gradient of the current batch.
  • the loss value between the prediction result and the true label of the input sample can be calculated, and back propagation is performed based on the loss value to calculate the full-precision gradient of each parameter in the current batch recommendation model.
  • the adaptive quantization step size may be calculated by a heuristic method or by a learning method.
  • The step of heuristically calculating the adaptive step size can be expressed as: calculate the adaptive quantization step size from the maximum absolute value of the weights in each embedding vector, α = max(|e|) / (2^(m−1) − 1), where e is the embedding parameter vector and m is the number of quantization bits.
  • The step of learning the adaptive quantization step size may include: after the weights are updated, training the updated weights and the not-yet-updated quantization step size in a quantization-aware manner, so that the quantization step size is updated end to end. For example, the quantization-aware representation can be expressed as: ê = α · R(clip(e/α, −2^(m−1), 2^(m−1) − 1)).
  • The adaptive step size is then updated by gradient descent on the loss, which can be expressed as: α ← α − η · ∂L/∂α.
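The disclosure does not spell out the exact learned update rule; one plausible sketch, in the style of learned-step-size quantization with a straight-through estimator (the gradient rule and all numbers below are assumptions for illustration):

```python
import numpy as np

def step_gradient(e, alpha, grad_e, bits=8):
    """d(loss)/d(alpha) under a straight-through estimator: inside the
    clip range the derivative of the fake-quantized weight w.r.t. alpha
    is round(e/alpha) - e/alpha; at saturation it is the boundary code."""
    lo, hi = -2 ** (bits - 1), 2 ** (bits - 1) - 1
    v = e / alpha
    d = np.where(v <= lo, lo, np.where(v >= hi, hi, np.round(v) - v))
    return float(np.sum(grad_e * d))

e = np.array([0.30, -0.12, 0.07], dtype=np.float32)    # updated weights
grad_e = np.array([0.5, -0.2, 0.1], dtype=np.float32)  # upstream gradient
alpha, eta = 0.05, 0.01
alpha_new = alpha - eta * step_gradient(e, alpha, grad_e)  # end-to-end update
```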
  • The updated parameter e can then be quantized.
  • The quantization process can be expressed as: q = R(clip(e/α, −2^(m−1), 2^(m−1) − 1)).
  • R() is the rounding function, which can usually be of multiple types, such as deterministic truncation rounding or random truncation rounding.
  • The clip function returns −2^(m−1) when e/α is less than −2^(m−1), and returns 2^(m−1) − 1 when e/α is greater than 2^(m−1) − 1.
  • In this way, a better quantization step size is selected for the Embedding parameters of each feature, retaining as much parameter information as possible and helping the model still converge during low-precision training.
  • The memory usage and communication overhead of the Embedding during training and inference can be reduced, so that the same memory can accommodate more parameters.
  • A random-truncation rounding function can be used to ensure that the gradient information in the low-precision training process is not lost due to deterministic truncation.
  • heuristic adaptive quantization step sizes and learning-based adaptive quantization step sizes are provided to adapt to different application scenarios, so as to avoid the need for manual selection of quantization step sizes for different features, thereby improving model training and quantization efficiency.
  • the recommendation model will model the user's multi-behavior interaction history, predict the products that the user may interact with based on the target behavior, and sort the products and display them to the user.
  • Click-through rate prediction can be performed in the manner provided by this application: the products can be sorted according to the predicted click-through rate and displayed on the recommendation page in that order; the predicted click-through rate values can be displayed in sorted order; only the top few click-through rates can be shown; or each object to be recommended can be scored, with the items sorted and displayed according to the score.
  • the method provided in the present application can be applied to an APP recommendation scenario.
  • the icon of the recommended app can be displayed in the display interface of the user's terminal, making it easy for the user to perform further operations such as clicking or downloading the recommended app, so that the user can quickly find the required app, improving the user experience.
  • the method provided in the present application can be applied to a product recommendation scenario.
  • the icon of the recommended product can be displayed in the display interface of the user's terminal, making it easy for the user to perform further operations such as clicking, adding to cart, or purchasing the recommended product, so that the user can view the required products, improving the user experience.
  • the method provided in the present application can be applied to a music recommendation scenario.
  • an icon of the recommended music can be displayed in the display interface of the user's terminal, making it easy for the user to perform further operations such as clicking, collecting, or playing the recommended music, so that the user can view more preferred music, improving the user experience.
  • the click-through rate prediction model can usually include two parts: embedding and MLP.
  • the recommended data is high-dimensional and sparse, and the embedding table is large, which will cause problems such as increased memory usage and increased training latency.
  • the commonly used pruning and AutoML methods cannot reduce training memory, and hash-based methods lose accuracy.
  • the traditional low-precision training method can only use INT16, and does not consider how to use adaptive quantization step size.
  • with the adaptive-step-size quantization method provided by this application, when the click-through rate prediction model is trained offline, the continuous features are first normalized and then automatically discretized.
  • the training flow is as follows: batch embeddings are read from the low-precision embedding table; the low-precision parameters are dequantized into full-precision values, which are used for the MLP-layer computation that finally outputs predicted values; in the training phase, the loss function is computed from the predicted values and the labels, and the full-precision gradient of the batch embeddings is obtained by backward gradient computation; the batch embeddings are updated based on the batch full-precision gradient, and the quantization step size is adaptively updated; the batch embeddings are quantized into low-precision parameters based on the adaptive quantization step size; and the low-precision batch embeddings are written back to the embedding table.
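The training flow above can be sketched end-to-end as follows. This is an illustrative sketch: `grad_fn` stands in for the MLP forward/backward pass, the heuristic per-row step-size update is assumed, and all names are hypothetical rather than the patent's exact implementation.

```python
import numpy as np

QMAX = 2 ** 7 - 1  # INT8: representable values in [-128, 127]

def train_step(table_q, alphas, ids, grad_fn, lr=0.1):
    # 1) read the low-precision batch embeddings and dequantize them
    e = table_q[ids].astype(np.float32) * alphas[ids, None]
    # 2) obtain the full-precision gradient from the model, then update
    e = e - lr * grad_fn(e)
    # 3) adaptively update the step size for each embedding row (heuristic)
    alphas[ids] = np.maximum(np.max(np.abs(e), axis=1), 1e-12) / QMAX
    # 4) quantize with the adaptive step and write back to the table
    q = np.clip(np.round(e / alphas[ids, None]), -QMAX - 1, QMAX)
    table_q[ids] = q.astype(np.int8)
    return table_q, alphas
```

Only the rows touched by the batch are dequantized, updated, and re-quantized, so the table itself stays in INT8 throughout training.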
  • the embedding corresponding to the input data can be read from the low-precision embedding table, and dequantized to obtain the full-precision embedding.
  • the full-precision embedding is used as the input of the click-through rate prediction model to output the prediction result.
  • some public data sets are used as examples to compare some existing quantization methods with the quantization method provided by the present application, such as using the Avazu data set and the Criteo data set.
  • the statistical information of the data sets can be shown in Table 1.
  • the training set and test set are split by user, with 90% of users in the training set and 10% of users in the test set.
  • Discrete features are one-hot encoded and continuous features are discretized.
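One simple way to discretize the continuous features is equal-width bucketing after min-max normalization; this is an assumed scheme for illustration, since the patent does not fix a particular discretization method.

```python
import numpy as np

def bucketize(x, n_buckets=10):
    # Min-max normalize to [0, 1], then map to equal-width bucket ids
    # in [0, n_buckets - 1]; the max value falls into the last bucket.
    x = np.asarray(x, dtype=np.float64)
    span = max(x.max() - x.min(), 1e-12)
    norm = (x - x.min()) / span
    return np.minimum((norm * n_buckets).astype(int), n_buckets - 1)
```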
  • Evaluation indicators include AUC (Area Under Curve).
  • Some existing quantization methods include: full precision method (Full Precision, FP), quantization-aware method (LSQ), quantization-aware method based on dynamic step size (PACT), INT8 low precision training method (LPT) and INT16 low precision training method (LPT-16), etc.
  • the quantization method provided in this application can be based on different adaptive step size calculation methods, such as: heuristic adaptive step size INT8 low precision training method (ALPT_H) and learnable adaptive step size INT8 low precision training method (ALPT_L).
  • the deterministic rounding function is used in the above Table 2, and the stochastic rounding function achieves better results in low-precision training, as shown in Table 3.
  • existing low-precision training methods adopt deterministic rounding and do not consider adaptive quantization step sizes; they can only train low-precision parameters in INT16, which makes it difficult for the model to converge at lower precision, and the embedding parameters compressed for the inference stage need to be retrained, which limits practicality. Although some quantization methods can compress parameters through hashing, accuracy is low due to unavoidable hash collisions; and although some quantization methods can train the model in INT16, training at lower precision is often difficult to converge.
  • this application proposes using a stochastic rounding function to ensure that gradient information is preserved in parameter updates during training, and proposes assigning an adaptive quantization step size to each feature so that a better step size is selected and as much parameter information as possible is retained.
  • the present application also provides a recommendation method, as shown in FIG12 , which may specifically include:
  • the input data may include data generated by at least one behavior of the user on the terminal.
  • for example, when a user clicks on a piece of music, information about the click can be collected; or when a user downloads or installs an app, information about the download or installation can be collected.
  • the input data can be converted into features that can be recognized by the neural network through the embedding table.
  • the low-precision embedding table usually stores the mapping relationship between the original data and the representation.
  • the embedding table can be used to convert the input data into features recognizable by the neural network, i.e., the stored mapping relation is used to map the input data into low-precision embeddings.
  • each feature can be dequantized according to the adaptive step size corresponding to each feature to obtain the full-precision embedding.
  • the inverse quantization step may refer to step 702 in FIG. 7 or step 804 in FIG. 8 , which will not be described in detail here.
  • the obtained full-precision embedding can be used as the input of the recommendation network to output the corresponding recommendation information.
  • the low-precision embedding representation can be dequantized using the adaptive step size to obtain a full-precision embedding representation, so that only the low-precision representation needs to be saved or transmitted during inference, and the full-precision embedding representation can be restored losslessly using the adaptive step size. This reduces the storage space occupied by the embedding representation vocabulary while allowing lossless restoration at use time.
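The inference-time lookup-and-dequantize step can be sketched as follows (illustrative names); only the INT8 table and the per-feature step sizes need to be stored.

```python
import numpy as np

def infer_embedding(table_q, alphas, ids):
    # Read the int8 rows for the input ids, then restore the full-precision
    # embedding with e = q * alpha, using each row's own adaptive step size.
    return table_q[ids].astype(np.float32) * alphas[ids, None]
```

The restored full-precision embedding is then fed to the MLP (or recommendation network) exactly as in full-precision inference.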
  • a schematic diagram of a quantization device provided by the present application includes:
  • An acquisition module 1301 is used to acquire a full-precision embedded representation, where the embedded representation includes multiple features;
  • a determination module 1302 is used to determine an adaptive step size corresponding to each of the multiple features;
  • the quantization module 1303 is used to quantize multiple features according to the adaptive step size corresponding to each feature to obtain a low-precision embedded representation, where the accuracy of the features in the low-precision embedded representation is lower than the accuracy of the features in the full-precision embedded representation.
  • a low-precision embedding representation vocabulary is applied to a neural network.
  • the acquisition module 1301 is specifically used to obtain the representation corresponding to the input data of the current iteration from the low-precision embedding representation vocabulary to obtain the low-precision embedding representation of the current iteration; dequantize the low-precision embedding representation of the current iteration to obtain the full-precision embedding representation of the current iteration.
  • the determination module 1302 is specifically used to: use the full-precision embedding representation of the current iteration as the input of the neural network to obtain the full-precision gradient corresponding to the prediction result of the current iteration; update the full-precision embedding representation according to the full-precision gradient to obtain the updated full-precision embedding representation; and obtain, according to the full-precision gradient, the adaptive step size corresponding to each feature in the updated full-precision embedding representation.
  • the quantization module 1303 is specifically configured to quantize the multiple features in the full-precision embedding representation of the current iteration according to the adaptive step size corresponding to each feature, so as to obtain a low-precision embedding representation.
  • the acquisition module is further configured to update the low-precision embedding representation vocabulary according to the low-precision embedding representation to obtain an updated low-precision embedding representation vocabulary.
  • the determination module 1302 is specifically configured to calculate the adaptive step size corresponding to each feature by using a heuristic algorithm.
  • the determination module 1302 is specifically configured to calculate the adaptive step size corresponding to each feature according to the absolute value of the weight in each feature.
  • the quantization module 1303 is specifically used to: obtain a discrete feature of each feature according to the adaptive step size corresponding to each feature; and truncate the discrete feature of each feature by a random truncation algorithm to obtain a low-precision embedded representation.
  • the low-precision embedding representation vocabulary is applied to a language model or a recommendation model.
  • the language model is used to obtain semantic information of the corpus, and the recommendation model is used to generate recommendation information based on user information.
  • a schematic diagram of a recommended device provided by the present application includes:
  • An input module 1401 is used to obtain input data, where the input data includes data generated by at least one behavior of a user on a terminal;
  • An acquisition module 1402 is used to acquire a low-precision embedding representation corresponding to the input data from a low-precision embedding representation vocabulary, where the low-precision embedding representation includes multiple features;
  • a dequantization module 1403, configured to dequantize the multiple features according to an adaptive step size corresponding to each of the multiple features to obtain a full-precision embedding representation;
  • the recommendation module 1404 is used to take the full-precision embedding representation as the input of the neural network and output recommendation information, where the recommendation information is used to make recommendations for at least one behavior of the user.
  • the neural network includes a language model or a recommendation model
  • the language model is used to obtain semantic information of the corpus
  • the recommendation model is used to generate recommendation information based on user information.
  • FIG. 15 is a schematic diagram of the structure of another quantization device provided in the present application, as described below.
  • the quantization device may include a processor 1501 and a memory 1502.
  • the processor 1501 and the memory 1502 are interconnected via a line.
  • the memory 1502 stores program instructions and data.
  • the memory 1502 stores program instructions and data corresponding to the steps in the aforementioned FIGS. 6 to 8 .
  • the processor 1501 is used to execute the method steps performed by the quantization device shown in any of the embodiments in FIG. 6 to FIG. 8 .
  • the quantization device may further include a transceiver 1503 for receiving or sending data.
  • a computer-readable storage medium is also provided in an embodiment of the present application.
  • the computer-readable storage medium stores a program, which, when executed on a computer, enables the computer to execute the steps of the method described in the embodiments shown in the aforementioned Figures 6 to 8.
  • the quantization device shown in the aforementioned FIG. 15 is a chip.
  • FIG. 16 is a schematic diagram of the structure of another recommendation device provided by the present application, as described below.
  • the recommendation device may include a processor 1601 and a memory 1602.
  • the processor 1601 and the memory 1602 are interconnected via a line.
  • the memory 1602 stores program instructions and data.
  • the memory 1602 stores program instructions and data corresponding to the steps in FIG. 12 .
  • the processor 1601 is used to execute the method steps performed by the recommendation device shown in FIG. 12 .
  • the recommendation device may further include a transceiver 1603 for receiving or sending data.
  • a computer-readable storage medium is also provided in an embodiment of the present application.
  • the computer-readable storage medium stores a program, which, when executed on a computer, enables the computer to execute the steps of the method described in the embodiment shown in FIG. 12 above.
  • the recommendation device shown in the aforementioned FIG. 16 is a chip.
  • An embodiment of the present application also provides a recommendation device, which can also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit.
  • the processing unit is used to execute the method steps of the aforementioned Figure 11.
  • An embodiment of the present application also provides a recommendation device, which can also be called a digital processing chip or chip.
  • the chip includes a processing unit and a communication interface.
  • the processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit.
  • the processing unit is used to execute the method steps of the aforementioned Figure 12.
  • the embodiment of the present application also provides a digital processing chip.
  • the digital processing chip integrates a circuit and one or more interfaces for implementing the functions of the above-mentioned processor 1501 and/or processor 1601.
  • the digital processing chip can complete the method steps of any one or more of the above-mentioned embodiments.
  • when the digital processing chip does not integrate a memory, it can be connected to an external memory through a communication interface.
  • the digital processing chip implements, according to the program code stored in the external memory, the actions performed by the quantization device or the recommendation device in the above-mentioned embodiments.
  • An embodiment of the present application also provides a computer program product, which, when executed on a computer, enables the computer to execute the steps of the method described in the embodiments shown in the aforementioned Figures 6 to 12.
  • the quantization device or recommendation device provided in the embodiments of the present application may be a chip, and the chip includes: a processing unit and a communication unit, wherein the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit, etc.
  • the processing unit may execute the computer execution instructions stored in the storage unit so that the chip in the server executes the method steps described in the embodiments shown in the above-mentioned Figures 6 to 12.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device end, such as a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM), etc.
  • the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application specific integrated circuit (ASIC) or a field programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general-purpose processor may be a microprocessor or any conventional processor, etc.
  • FIG. 17 is a schematic diagram of a structure of a chip provided in an embodiment of the present application.
  • the chip can be a neural network processor NPU 170, which is mounted on the host CPU as a coprocessor and assigned tasks by the host CPU.
  • the core part of the NPU is the operation circuit 1703, which is controlled by the controller 1704 to extract matrix data from the memory and perform multiplication operations.
  • the operation circuit 1703 includes multiple processing units (process engines, PEs) inside.
  • the operation circuit 1703 is a two-dimensional systolic array.
  • the operation circuit 1703 can also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition.
  • the operation circuit 1703 is a general-purpose matrix processor.
  • the operation circuit takes the corresponding data of matrix B from the weight memory 1702 and caches it on each PE in the operation circuit.
  • the operation circuit takes the matrix A data from the input memory 1701 and performs matrix operation with matrix B, and the partial result or final result of the matrix is stored in the accumulator 1708.
  • the unified memory 1706 is used to store input data and output data.
  • the weight data is directly transferred to the weight memory 1702 through the direct memory access controller (DMAC) 1705.
  • the input data is also transferred to the unified memory 1706 through the DMAC.
  • the bus interface unit (BIU) 1710 is used for the interaction between the AXI bus and the DMAC and instruction fetch buffer (IFB) 1709.
  • the bus interface unit 1710 (BIU) is used for the instruction fetch memory 1709 to obtain instructions from the external memory, and is also used for the storage unit access controller 1705 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • DMAC is mainly used to transfer input data in the external memory DDR to the unified memory 1706 or to transfer weight data to the weight memory 1702 or to transfer input data to the input memory 1701.
  • the vector calculation unit 1707 includes multiple operation processing units, which further process the output of the operation circuit when necessary, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc. It is mainly used for non-convolutional/fully connected layer network calculations in neural networks, such as batch normalization, pixel-level summation, upsampling of feature planes, etc.
  • the vector calculation unit 1707 can store the processed output vector to the unified memory 1706.
  • the vector calculation unit 1707 can apply a linear function and/or a nonlinear function to the output of the operation circuit 1703, for example performing linear interpolation on the feature plane extracted by the convolution layer, or accumulating vectors of values to generate activation values.
  • the vector calculation unit 1707 generates a normalized value, a pixel-level summed value, or both.
  • the processed output vector can be used as an activation input to the operation circuit 1703, for example, for use in a subsequent layer in a neural network.
  • An instruction fetch buffer 1709 connected to the controller 1704, for storing instructions used by the controller 1704;
  • Unified memory 1706, input memory 1701, weight memory 1702 and instruction fetch memory 1709 are all on-chip memories; the external memory is private to the NPU hardware architecture.
  • each layer in the recurrent neural network can be performed by the operation circuit 1703 or the vector calculation unit 1707.
  • the processor mentioned in any of the above places may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the methods of Figures 6 to 12 above.
  • the device embodiments described above are merely schematic, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment.
  • the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • the technical solution of the present application, or the part that contributes to the prior art, can essentially be embodied in the form of a software product, which is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, mobile hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk or optical disk, and includes a number of instructions for enabling a computer device (which can be a personal computer, server, or network device, etc.) to execute the methods described in the various embodiments of the present application.
  • all or part of the embodiments may be implemented by software, hardware, firmware or any combination thereof.
  • all or part of the embodiments may be implemented in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server or data center integrating one or more available media.
  • the available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state drive (SSD)), etc.


Abstract

Provided in the present application are a quantization method and apparatus, and a recommendation method and apparatus, which are used for quantizing each feature in a full-precision embedding representation on the basis of an adaptive step size, so as to improve the quantization precision. The method comprises: first acquiring a full-precision embedding representation, the embedding representation comprising a plurality of features; determining an adaptive step size separately corresponding to each feature amongst the plurality of features, wherein the step sizes corresponding to the plurality of features may be the same or different; and then, according to the adaptive step size corresponding to each feature, respectively quantizing the plurality of features to obtain a low-precision embedding representation, wherein the precision of the features in the low-precision embedding representation is lower than the precision of the features in the full-precision embedding representation, thus reducing a storage space required for storing or transmitting the embedding representation.

Description

A quantization method, a recommendation method, and an apparatus

This application claims priority to the Chinese patent application filed with the China Patent Office on November 25, 2022, with application number 202211490535.2 and entitled "A quantization method, recommendation method and apparatus", the entire contents of which are incorporated herein by reference.
Technical Field

The present application relates to the field of computers, and in particular to a quantization method, a recommendation method, and an apparatus.
Background

Machine learning systems, including personalized recommendation systems, train the parameters of machine learning models based on input data and labels through optimization methods such as gradient descent. After the model parameters converge, the model can be used to make predictions on unseen data.

For example, taking the click-through rate prediction model in a recommendation system as an example, the model usually includes an embedding layer and a multilayer perceptron (MLP) layer. The embedding layer is typically used to map high-dimensional sparse data to low-dimensional dense vectors, and the MLP is typically used to fit combination relationships between features, sequence information, click-through rates, and so on. However, in some large-scale data scenarios, the input data volume of the recommendation model is very large, so the embedding layer is very large, which leads to a large amount of storage space being required during storage and training.
发明内容Summary of the invention
本申请提供一种量化方法、推荐方法以及装置,用于基于自适应步长对全精度嵌入表征中每种特征进行量化,从而提高量化精度。The present application provides a quantization method, a recommended method and a device for quantizing each feature in a full-precision embedded representation based on an adaptive step size, thereby improving the quantization accuracy.
有鉴于此,第一方面,本申请提供一种量化方法,包括:首先,获取全精度嵌入表征,嵌入表征包括多种特征;确定多种特征中每种特征分别对应的自适应步长,该多种特征对应的步长可能相同也可能不相同;随后根据每种特征对应的自适应步长分别对多种特征进行量化,得到低精度嵌入表征,该低精度嵌入表征中的特征的精度低于全精度嵌入表征中特征的精度,因此保存或者传输该低精度嵌入表征所需的存储资源或者传输资源低于保存或者传输全精度嵌入表征所需的存储资源,从而降低保存或者传输该嵌入表征所需的存储空间。In view of this, in a first aspect, the present application provides a quantization method, comprising: first, obtaining a full-precision embedded representation, the embedded representation including multiple features; determining an adaptive step size corresponding to each of the multiple features, the step sizes corresponding to the multiple features may be the same or different; then quantizing the multiple features according to the adaptive step size corresponding to each feature, to obtain a low-precision embedded representation, the accuracy of the features in the low-precision embedded representation is lower than the accuracy of the features in the full-precision embedded representation, so that the storage resources or transmission resources required to save or transmit the low-precision embedded representation are lower than the storage resources required to save or transmit the full-precision embedded representation, thereby reducing the storage space required to save or transmit the embedded representation.
本申请实施方式中,在对全精度嵌入表征进行量化的过程中,可计算每种特征分别对应的自适应步长,并基于每种特征对应的自适应步长进行量化,从而提高量化精度,可以避免因固定步长而导致的精度损失。如当某种特征的更新较少时,若使用固定步长,将可能使更新较少部分因步长而导致降低量化精度。而通过本申请提供的量化方法,每种特征具有对应的自适应步长,该自适应步长与每种特征的长度或者更新数据量匹配,从而在量化时可以避免数据丢失,提高量化精度。In the implementation manner of the present application, in the process of quantizing the full-precision embedded representation, the adaptive step size corresponding to each feature can be calculated, and quantization can be performed based on the adaptive step size corresponding to each feature, thereby improving the quantization accuracy and avoiding the loss of accuracy caused by the fixed step size. For example, when a certain feature is updated less frequently, if a fixed step size is used, the quantization accuracy of the less updated part may be reduced due to the step size. However, through the quantization method provided by the present application, each feature has a corresponding adaptive step size, which matches the length of each feature or the amount of updated data, thereby avoiding data loss during quantization and improving quantization accuracy.
In a possible implementation, the low-precision embedding representation vocabulary is applied to a neural network, and the aforementioned obtaining of the full-precision embedding representation may include: obtaining, from the low-precision embedding representation vocabulary, the representation corresponding to the input data of the current iteration to obtain a low-precision embedding representation of the current iteration; and dequantizing the low-precision embedding representation of the current iteration to obtain a full-precision embedding representation of the current iteration.
Therefore, the quantization method provided in the present application can be applied to quantization during neural network training. In each iteration, a low-precision embedding representation is transmitted, and the full-precision embedding representation is obtained by dequantizing it with the corresponding adaptive step size. This achieves full-precision, lossless restoration of the low-precision embedding representation and reduces the storage space occupied by embedding representations during neural network training.
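The lossless-restoration property relies on dequantization being the exact inverse of quantization for values that are multiples of the step size. A minimal sketch, using the same hypothetical per-row layout as an illustrative quantization step:

```python
import numpy as np

def dequantize(q, steps):
    """Restore a full-precision embedding from a low-precision table by
    scaling each feature (row) with its stored adaptive step size."""
    return q.astype(np.float32) * steps[:, None]

q = np.array([[50, -25], [20, 10]], dtype=np.int8)
steps = np.array([0.01, 0.0001], dtype=np.float32)
full = dequantize(q, steps)   # recovers [[0.5, -0.25], [0.002, 0.001]]
```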
In a possible implementation, the aforementioned determining of the adaptive step size corresponding to each of the multiple features may include: using the full-precision embedding representation of the current iteration as the input of the neural network to obtain a full-precision gradient corresponding to the prediction result of the current iteration; updating the full-precision embedding representation according to the full-precision gradient to obtain an updated full-precision embedding representation; and obtaining, according to the full-precision gradient, the adaptive step size corresponding to each feature in the updated full-precision embedding representation.
In the implementations of the present application, during neural network training, the adaptive step size corresponding to each feature can be determined based on the full-precision gradient, so the step size is adaptively updated to match each feature. This avoids reduced quantization precision for parts of the embedding representation that receive small updates, and thus improves quantization precision.
In a possible implementation, the aforementioned quantizing of the multiple features according to the adaptive step size corresponding to each feature includes: quantizing the multiple features in the full-precision low-dimensional representation of the current iteration according to the adaptive step size corresponding to each feature, to obtain the low-precision embedding representation.
Therefore, in the implementations of the present application, the adaptive step size computed from the full-precision gradient can be used for quantization, so that the embedding representation is quantized synchronously during training.
In a possible implementation, the method provided in the present application may further include: updating the low-precision embedding representation vocabulary according to the low-precision embedding representation, to obtain an updated low-precision embedding representation vocabulary.
After quantization produces a new low-precision embedding representation, it can be written back into the low-precision embedding representation vocabulary for subsequent low-precision storage or transmission.
In a possible implementation, the aforementioned determining of the adaptive step size corresponding to each of the multiple features may include: calculating the adaptive step size corresponding to each feature by a heuristic algorithm.
In the implementations of the present application, the adaptive step size can also be calculated by a heuristic algorithm, which is applicable to scenarios where a low-precision embedding representation vocabulary is stored.
In a possible implementation, the aforementioned calculating of the adaptive step size corresponding to each feature by a heuristic algorithm may include: calculating the adaptive step size corresponding to each feature according to the absolute values of the weights in that feature. The adaptive step size can therefore be computed from each feature's own weight values, without relying on external data.
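One common heuristic consistent with this description sets each feature's step so that its largest absolute weight maps to the top of the low-precision range. The exact formula is not fixed in this application, so the max-abs scaling below is an assumption for illustration:

```python
import numpy as np

def heuristic_steps(emb, bits=8):
    """Per-feature step from the feature's own weight magnitudes:
    step = max(|w|) / qmax, so the largest weight maps to qmax."""
    qmax = 2 ** (bits - 1) - 1
    return np.abs(emb).max(axis=1) / qmax

emb = np.array([[1.27, -0.635], [0.0127, 0.00635]], dtype=np.float32)
steps = heuristic_steps(emb)    # for int8: approximately [0.01, 0.0001]
```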
In a possible implementation, the aforementioned quantizing of the multiple features according to the adaptive step size corresponding to each feature to obtain the low-precision embedding representation vocabulary may further include: obtaining discrete features of each feature according to the adaptive step size corresponding to that feature; and truncating the discrete features of each feature by a random truncation algorithm to obtain the low-precision embedding representation.
In the implementations of the present application, each feature can be truncated by a random truncation algorithm, so that effective features are adaptively retained and quantization precision is improved.
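Random truncation can be sketched as rounding down or up with probability given by the fractional remainder, so small components survive in expectation instead of always being rounded away. This is illustrative code under that stochastic-rounding assumption; the application does not fix this exact scheme:

```python
import numpy as np

def stochastic_truncate(emb, steps, bits=8, rng=None):
    """Discretize each feature with its adaptive step, round the
    fractional part up with probability equal to that fraction, then
    clip to the representable low-precision range."""
    rng = rng if rng is not None else np.random.default_rng(0)
    qmax = 2 ** (bits - 1) - 1
    scaled = emb / steps[:, None]
    floor = np.floor(scaled)
    q = floor + (rng.random(scaled.shape) < (scaled - floor))
    return np.clip(q, -qmax - 1, qmax).astype(np.int8)

# 0.75 / 0.25 = 3.0 exactly, so this case is deterministic; a value of
# 0.5 step-units would round to 0 or 1 with ~50% probability each.
q = stochastic_truncate(np.array([[0.75]], dtype=np.float32),
                        np.array([0.25], dtype=np.float32))
```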
In a possible implementation, the low-precision embedding representation vocabulary is applied to a language model or a recommendation model, where the language model is used to obtain semantic information of a corpus, and the recommendation model is used to generate recommendation information based on a user's information. The method provided in the present application can therefore be applied to natural language processing, recommendation scenarios, and the like.
In a second aspect, the present application provides a recommendation method, including: obtaining input data, the input data including data generated by at least one behavior of a user on a terminal; obtaining, from a low-precision embedding representation vocabulary, a low-precision embedding representation corresponding to the input data, the low-precision embedding representation including multiple features; dequantizing the multiple features according to an adaptive step size corresponding to each of the multiple features to obtain a full-precision embedding representation, where the adaptive step size may be the adaptive step size obtained when the full-precision embedding representation was quantized; and using the full-precision embedding representation as the input of a neural network to output recommendation information, the recommendation information being used to make recommendations for at least one behavior of the user.
In the implementations of the present application, during neural network inference, the low-precision embedding representation can be dequantized with the adaptive step size to obtain the full-precision embedding representation. Low-precision representations can therefore be saved or transmitted during inference and losslessly restored to full precision via the adaptive step size. This reduces the storage space occupied by the embedding representation vocabulary while allowing lossless restoration at the time of use.
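The inference path can be sketched end to end: look up the low-precision rows for the input, dequantize them with the stored per-feature steps, and feed the full-precision result to the network. This is a hedged sketch; the `model` callable, table names, and shapes are illustrative assumptions:

```python
import numpy as np

def recommend(input_ids, low_table, steps, model):
    """Look up low-precision embeddings, restore them to full precision
    with the per-feature adaptive steps, and score with the model."""
    rows = low_table[input_ids].astype(np.float32)   # low-precision lookup
    full = rows * steps[input_ids][:, None]          # dequantize per feature
    return model(full)                               # recommendation scores

low_table = np.array([[50, -25], [20, 10]], dtype=np.int8)
steps = np.array([0.01, 0.0001], dtype=np.float32)
# A trivial stand-in model that just sums each embedding row.
scores = recommend(np.array([0]), low_table, steps, lambda x: x.sum(axis=1))
```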
In a possible implementation, the neural network includes a language model or a recommendation model, where the language model is used to obtain semantic information of a corpus, and the recommendation model is used to generate recommendation information based on user information.
In a third aspect, the present application provides a quantization apparatus, including:
an obtaining module, configured to obtain a full-precision embedding representation, the embedding representation including multiple features;
a determining module, configured to determine an adaptive step size corresponding to each of the multiple features; and
a quantization module, configured to quantize the multiple features according to the adaptive step size corresponding to each feature to obtain a low-precision embedding representation, where the precision of the features in the low-precision embedding representation is lower than that of the features in the full-precision embedding representation.
In a possible implementation, the low-precision embedding representation vocabulary is applied to a neural network, and
the obtaining module is specifically configured to: obtain, from the low-precision embedding representation vocabulary, the representation corresponding to the input data of the current iteration to obtain a low-precision embedding representation of the current iteration; and dequantize the low-precision embedding representation of the current iteration to obtain a full-precision embedding representation of the current iteration.
In a possible implementation, the determining module is specifically configured to: use the full-precision embedding representation of the current iteration as the input of the neural network to obtain a full-precision gradient corresponding to the prediction result of the current iteration; update the full-precision embedding representation according to the full-precision gradient to obtain an updated full-precision embedding representation; and obtain, according to the full-precision gradient, the adaptive step size corresponding to each feature in the updated full-precision embedding representation.
In a possible implementation, the quantization module is specifically configured to quantize the multiple features in the full-precision low-dimensional representation of the current iteration according to the adaptive step size corresponding to each feature, to obtain the low-precision embedding representation.
In a possible implementation, the obtaining module is further configured to update the low-precision embedding representation vocabulary according to the low-precision embedding representation, to obtain an updated low-precision embedding representation vocabulary.
In a possible implementation, the determining module is specifically configured to calculate the adaptive step size corresponding to each feature by a heuristic algorithm.
In a possible implementation, the determining module is specifically configured to calculate the adaptive step size corresponding to each feature according to the absolute values of the weights in that feature.
In a possible implementation, the quantization module is specifically configured to: obtain discrete features of each feature according to the adaptive step size corresponding to that feature; and truncate the discrete features of each feature by a random truncation algorithm to obtain the low-precision embedding representation.
In a possible implementation, the low-precision embedding representation vocabulary is applied to a language model or a recommendation model, where the language model is used to obtain semantic information of a corpus, and the recommendation model is used to generate recommendation information based on user information.
In a fourth aspect, the present application provides a recommendation apparatus, including:
an input module, configured to obtain input data, the input data including data generated by at least one behavior of a user on a terminal;
an obtaining module, configured to obtain, from a low-precision embedding representation vocabulary, a low-precision embedding representation corresponding to the input data, the low-precision embedding representation including multiple features;
a dequantization module, configured to dequantize the multiple features according to an adaptive step size corresponding to each of the multiple features, to obtain a full-precision embedding representation; and
a recommendation module, configured to use the full-precision embedding representation as the input of a neural network and output recommendation information, the recommendation information being used to make recommendations for at least one behavior of the user.
In a possible implementation, the neural network includes a language model or a recommendation model, where the language model is used to obtain semantic information of a corpus, and the recommendation model is used to generate recommendation information based on user information.
In a fifth aspect, the present application provides a quantization apparatus, including: a processor, a memory, an input/output device, and a bus, where the memory stores computer instructions, and the processor, when executing the computer instructions in the memory, is configured to implement any implementation of the first aspect.
In a sixth aspect, the present application provides a recommendation apparatus, including: a processor, a memory, an input/output device, and a bus, where the memory stores computer instructions, and the processor, when executing the computer instructions in the memory, is configured to implement any implementation of the second aspect.
In a seventh aspect, an embodiment of the present application provides a chip system, including a processor and an input/output port, where the processor is configured to implement the processing functions involved in the method of the first or second aspect, and the input/output port is configured to implement the transceiving functions involved in the method of the first or second aspect.
In a possible design, the chip system further includes a memory, configured to store program instructions and data for implementing the functions involved in the method of the first or second aspect.
The chip system may consist of chips, or may include chips and other discrete devices.
In an eighth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer instructions; when the computer instructions are run on a computer, the computer is caused to perform the method of any possible implementation of the first or second aspect.
In a ninth aspect, an embodiment of the present application provides a computer program product including a computer program or instructions; when the computer program or instructions are run on a computer, the computer is caused to perform the method of any possible implementation of the first or second aspect.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of an artificial intelligence framework to which the present application applies;
FIG. 2 is a schematic diagram of a system architecture provided by the present application;
FIG. 3 is a schematic diagram of another system architecture provided by the present application;
FIG. 4 is a schematic diagram of an application scenario provided by the present application;
FIG. 5A is a schematic diagram of another application scenario provided by the present application;
FIG. 5B is a schematic diagram of another application scenario provided by the present application;
FIG. 6 is a schematic flowchart of a quantization method provided by the present application;
FIG. 7 is a schematic flowchart of another quantization method provided by the present application;
FIG. 8 is a schematic flowchart of another quantization method provided by the present application;
FIG. 9 is a schematic diagram of another application scenario provided by the present application;
FIG. 10 is a schematic diagram of another application scenario provided by the present application;
FIG. 11 is a schematic diagram of another application scenario provided by the present application;
FIG. 12 is a schematic flowchart of a recommendation method provided by the present application;
FIG. 13 is a schematic structural diagram of a quantization apparatus provided by the present application;
FIG. 14 is a schematic structural diagram of a recommendation apparatus provided by the present application;
FIG. 15 is a schematic structural diagram of a quantization apparatus provided by the present application;
FIG. 16 is a schematic structural diagram of a recommendation apparatus provided by the present application;
FIG. 17 is a schematic structural diagram of a chip provided by the present application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
The recommendation method provided in the present application can be applied in artificial intelligence (AI) scenarios. AI is the theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can respond in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making. Research in the field of artificial intelligence includes robotics, natural language processing, computer vision, decision-making and reasoning, human-computer interaction, recommendation and search, basic AI theory, and so on.
First, the overall workflow of an artificial intelligence system is described. FIG. 1 is a schematic structural diagram of an artificial intelligence framework, which is elaborated below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the sequence of processes from data acquisition to data processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, data undergoes a refinement from "data" to "information" to "knowledge" to "wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of intelligence and information (providing and processing technology implementations) to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing-power support for the artificial intelligence system, enables communication with the external world, and is supported by the basic platform. Communication with the outside is through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the basic platform includes platform guarantees and support such as distributed computing frameworks and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to obtain data, and the data is provided to the intelligent chips in the distributed computing system provided by the basic platform for computation.
(2) Data
The data at the layer above the infrastructure indicates the data sources in the field of artificial intelligence. The data involves graphics, images, speech, and text, as well as Internet-of-Things data from conventional devices, including business data of existing systems and perception data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and so on, on data.
Reasoning is the process of simulating human intelligent reasoning in a computer or an intelligent system, performing machine thinking and problem solving with formalized information according to reasoning control strategies; typical functions are searching and matching.
Decision-making is the process of making decisions after intelligent information has been reasoned about, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data processing mentioned above, some general capabilities can be further formed based on the results of the data processing, for example an algorithm or a general-purpose system, such as translation, text analysis, computer-vision processing, speech recognition, and image recognition.
(5) Intelligent products and industry applications
Intelligent products and industry applications are the products and applications of artificial intelligence systems in various fields. They encapsulate the overall artificial intelligence solution, productizing intelligent information decision-making and implementing practical applications. The application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, and the like.
The embodiments of the present application involve applications of neural networks. For a better understanding of the solutions of the embodiments of the present application, terms and concepts related to neural networks that may be involved are first introduced below.
(1) Convolutional neural network
A convolutional neural network (CNN) is a deep neural network with a convolutional structure. It contains a feature extractor consisting of convolutional layers and subsampling layers, which can be regarded as a filter. A convolutional layer is a layer of neurons in a CNN that convolves the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in adjacent layers. A convolutional layer usually contains several feature planes, and each feature plane may be composed of rectangularly arranged neural units. Neural units in the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as making the manner of feature extraction independent of position. A convolution kernel can be initialized as a matrix of random size, and during training of the convolutional neural network the kernel can learn reasonable weights. In addition, a direct benefit of weight sharing is that it reduces the connections between layers of the convolutional neural network while also lowering the risk of overfitting.
(2) Graph neural network (graph convolutional network, GCN)
A graph neural network is a deep learning model for modeling and processing non-Euclidean data (such as graph data). Its principle is pairwise message passing: graph nodes iteratively update their representations by exchanging information with their neighbors.
A GCN is similar to a CNN; the difference is that the input of a CNN is usually two-dimensional structured data, whereas the input of a GCN is usually graph-structured data. A GCN elegantly designs a method for extracting features from graph data, so that these features can be used for node classification, graph classification, and link prediction, and a graph embedding can also be obtained.
(3) Loss function
In training a deep neural network, because it is desired that the network's output be as close as possible to the value actually to be predicted, the current predicted value can be compared with the desired target value, and the weight vector of each layer of the neural network can then be updated according to the difference between the two (of course, there is usually an initialization process before the first update, i.e., parameters are pre-configured for each layer of the deep neural network). For example, if the network's predicted value is too high, the weight vectors are adjusted so that it predicts lower, and the adjustment continues until the deep neural network can predict the desired target value or a value very close to it. It is therefore necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the loss function or objective function, an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training a deep neural network becomes the process of reducing this loss as much as possible. Common loss functions include mean squared error, cross entropy, logarithmic, and exponential losses. For example, the mean squared error can be used as a loss function, defined as $L = \frac{1}{N}\sum_{i=1}^{N}(\hat{y}_i - y_i)^2$, where $\hat{y}_i$ is the predicted value and $y_i$ the target value. The specific loss function can be chosen according to the actual application scenario.
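For instance, the mean squared error mentioned above can be computed as:

```python
import numpy as np

def mse_loss(pred, target):
    """Mean squared error: the average of the squared prediction errors."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    return np.mean((pred - target) ** 2)

loss = mse_loss([1.0, 2.0], [1.0, 4.0])   # (0 + 4) / 2 = 2.0
```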
(4)反向传播算法(4) Back propagation algorithm
一种计算根据损失函数计算模型参数梯度、更新模型参数的算法。神经网络可以采用误差反向传播(back propagation,BP)算法在训练过程中修正初始的神经网络模型中参数的大小,使得神经网络模型的重建误差损失越来越小。具体地,前向传递输入信号直至输出会产生误差损失,通过反向传播误差损失信息来更新初始的神经网络模型中参数,从而使误差损失收敛。反向传播算法是以误差损失为主导的反向传播运动,旨在得到最优的神经网络模型的参数,例如权重矩阵。An algorithm that calculates the gradient of model parameters based on the loss function and updates the model parameters. The neural network can use the error back propagation (BP) algorithm to correct the size of the parameters in the initial neural network model during the training process, so that the reconstruction error loss of the neural network model becomes smaller and smaller. Specifically, the forward transmission of the input signal to the output will generate error loss, and the parameters in the initial neural network model are updated by back propagating the error loss information, so that the error loss converges. The back propagation algorithm is a back propagation movement dominated by error loss, which aims to obtain the optimal parameters of the neural network model, such as the weight matrix.
本申请实施方式中，在训练阶段或者推理阶段，都可以采用BP算法来对模型进行训练，得到训练后的模型。In the implementation mode of the present application, in the training stage or the inference stage, the BP algorithm can be used to train the model to obtain the trained model.
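上述基于梯度的反向传播更新可以用一个一维示例简化示意如下（示例性实现）：The gradient-based back-propagation update described above can be sketched with a one-dimensional example as follows (illustrative):

```python
# One weight, one sample: loss L(w) = (w*x - y)**2, gradient dL/dw = 2*(w*x - y)*x.
x, y = 2.0, 4.0      # training sample; the optimal weight is y / x = 2
w, lr = 0.0, 0.05    # initial weight and learning rate
for _ in range(100):
    grad = 2 * (w * x - y) * x  # gradient of the loss w.r.t. the weight
    w -= lr * grad              # update the weight against the gradient
# w converges toward 2.0, driving the error loss toward zero
```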
(5)梯度:损失函数关于参数的导数向量。(5) Gradient: The derivative vector of the loss function with respect to the parameters.
(6)随机梯度:机器学习中样本数量很大,所以每次计算的损失函数都由随机采样得到的数据计算,相应的梯度称作随机梯度。(6) Stochastic gradient: The number of samples in machine learning is very large, so the loss function is calculated each time based on data obtained by random sampling, and the corresponding gradient is called stochastic gradient.
(7)Embedding:指样本的特征表示或者词嵌入表征。(7)Embedding: refers to the feature representation of samples or word embedding representation.
(8)推荐系统:推荐系统根据用户的历史点击行为数据,采用机器学习算法进行分析和学习,然后对用户的新请求进行预测,返回个性化物品推荐列表。(8) Recommendation system: The recommendation system uses machine learning algorithms to analyze and learn based on the user's historical click behavior data, then predicts the user's new requests and returns a personalized item recommendation list.
(9)模型量化:是一种由高比特转换为低比特的模型压缩方式。例如,将常规32位浮点运算转换为低bit整型运算的模型压缩技术,即可称为模型量化。如当低bit量化为8bit时,可以称之为int8量化,即原来表示一个权重需要float32表示,量化后只需要用int8表示,理论上能够获得4倍的网络加速,同时8位相较于32位能够减少4倍存储空间,减少了存储空间和运算时间,从而达到了压缩模型和加速的目的。(9) Model quantization: This is a model compression method that converts high bits into low bits. For example, the model compression technology that converts conventional 32-bit floating-point operations into low-bit integer operations can be called model quantization. For example, when the low bit is quantized to 8 bits, it can be called int8 quantization, that is, a weight originally needs to be represented by float32, but after quantization, it only needs to be represented by int8. In theory, it can achieve 4 times network acceleration. At the same time, 8 bits can reduce 4 times the storage space compared to 32 bits, reducing storage space and computing time, thereby achieving the purpose of compressing the model and accelerating.
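以int8量化为例，下面给出一个简化的对称量化示意（量化方案为示例性假设）：Taking int8 quantization as an example, a minimal sketch of a symmetric quantization scheme (the scheme itself is an illustrative assumption):

```python
# Symmetric int8 quantization: the largest |weight| is mapped to 127, so each
# float32 weight is stored as one signed byte (1/4 of the 32-bit storage).
def quantize_int8(weights):
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    return [qi * scale for qi in q]

q, scale = quantize_int8([0.2, -0.4, 1.0])   # -> [25, -51, 127]
restored = dequantize_int8(q, scale)         # close to the original weights
```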
(10)自动机器学习(AutoML):是指设计一系列高级的控制系统去操作机器学习模型,使得模型可以自动化地学习到合适的参数和配置而无需人工干预。在基于深度神经网络的学习模型中,自动计算学习主要包括网络架构搜索与全局参数设定。其中,网络架构搜索用于根据数据让计算机生成最适应问题的神经网络架构,具有训练复杂度高,性能提升大的特点。(10) Automatic machine learning (AutoML): refers to the design of a series of advanced control systems to operate machine learning models so that the models can automatically learn appropriate parameters and configurations without human intervention. In learning models based on deep neural networks, automatic computational learning mainly includes network architecture search and global parameter setting. Among them, network architecture search is used to allow computers to generate the neural network architecture that best suits the problem based on data, which has the characteristics of high training complexity and great performance improvement.
(11)语料(Corpus):也称为自由文本,其可以是字、词语、句子、片段、文章及其任意组合。例如,“今天天气真好”即为一段语料。(11) Corpus: Also known as free text, it can be words, phrases, sentences, fragments, articles, or any combination thereof. For example, “Today’s weather is really nice” is a piece of corpus.
(12)神经机器翻译(neural machine translation):神经机器翻译是自然语言处理的一个典型任务。该任务是给定一个源语言的句子,输出其对应的目标语言句子的技术。在常用的神经机器翻译模型中,源语言和目标语言的句子中的词均会编码成为向量表示,在向量空间进行计算词与词以及句子与句子之间的关联,从而进行翻译任务。(12) Neural machine translation: Neural machine translation is a typical task in natural language processing. Given a sentence in a source language, the task is to output a corresponding sentence in a target language. In the commonly used neural machine translation model, the words in the sentences of the source language and the target language are encoded into vector representations, and the associations between words and sentences are calculated in the vector space to perform the translation task.
(13)预训练语言模型(pre-trained language model,PLM):是一种自然语言序列编码器,将自然语言序列中的每个词进行编码为一个向量表示,从而进行预测任务。PLM的训练包含两个阶段,即预训练(pre-training)阶段和微调(finetuning)阶段。在预训练阶段,该模型在大规模无监督文本上进行语言模型任务的训练,从而学习到词表示方式。在微调阶段,该模型利用预训练阶段学到的参数做初始化,在文本分类(text classification)或序列标注(sequence labeling)等下游任务(Downstream Task)上进行较少步骤的训练,就可以成功把预训练得到的语义信息成功迁移到下游任务上来。(13) Pre-trained language model (PLM): It is a natural language sequence encoder that encodes each word in a natural language sequence into a vector representation for prediction tasks. The training of PLM consists of two stages, namely the pre-training stage and the fine-tuning stage. In the pre-training stage, the model is trained on language model tasks on large-scale unsupervised text to learn word representation. In the fine-tuning stage, the model is initialized using the parameters learned in the pre-training stage and trained on downstream tasks such as text classification or sequence labeling with fewer steps, so that the semantic information obtained from pre-training can be successfully transferred to downstream tasks.
(14)点击率(Click Through Rate,CTR):指用户在特定环境下点击某个展示物品的概率。(14) Click Through Rate (CTR): refers to the probability that a user clicks on a displayed item in a specific environment.
(15)转化率(Post-click conversion rate,CVR):指用户在特定环境下对已点击的某个展示物品转化的概率,例如,若用户点击了某个APP的图标,转化即指下载、安装、注册等行为。(15) Post-click conversion rate (CVR): refers to the probability that a user converts a clicked item in a specific environment. For example, if a user clicks on the icon of an APP, conversion refers to downloading, installing, registering, etc.
(16)Epoch(16)Epoch
定义了学习算法在整个训练集上的工作次数，一个epoch可以认为是使用整个训练集对神经网络进行的一次完整训练。Defines the number of times the learning algorithm works on the entire training set. One epoch can be regarded as one complete pass of training the neural network over the entire training set.
(17)batch;(17) batch;
与epoch的定义紧密相关，一个epoch包含使用整个数据集对神经网络进行训练，而一个batch代表一个epoch中的其中一个批次的数据，具体表现为batch_size×batch数量=一个epoch的训练样本总数，可以理解为每个epoch分为了一个或者多个batch，每个batch可以使用训练集中的部分数据对神经网络进行训练。Closely related to the definition of epoch, an epoch involves training the neural network using the entire dataset, and a batch represents one of the batches of data in an epoch, specifically expressed as batch_size × number of batches = total number of training samples in one epoch. It can be understood that each epoch is divided into one or more batches, and each batch can use part of the data in the training set to train the neural network.
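epoch与batch的关系可以简化示意如下（示例性实现）：The relationship between epoch and batch can be sketched as follows (illustrative):

```python
# One epoch = one full pass over the training set, split into batches.
def iter_batches(dataset, batch_size):
    for i in range(0, len(dataset), batch_size):
        yield dataset[i:i + batch_size]

data = list(range(10))                # 10 training samples
batches = list(iter_batches(data, 4)) # 3 batches per epoch: sizes 4, 4, 2
```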
本申请实施例提供的推荐方法可以在服务器上被执行,还可以在终端设备上被执行。其中该终端设备可以是具有图像处理功能的移动电话、平板个人电脑(tablet personal computer,TPC)、媒体播放器、智能电视、笔记本电脑(laptop computer,LC)、个人数字助理(personal digital assistant,PDA)、个人计算机(personal computer,PC)、照相机、摄像机、智能手表、可穿戴式设备(wearable device,WD)或者自动驾驶的车辆等,本申请实施例对此不作限定。The recommendation method provided in the embodiment of the present application can be executed on a server or on a terminal device. The terminal device can be a mobile phone with image processing function, a tablet personal computer (TPC), a media player, a smart TV, a laptop computer (LC), a personal digital assistant (PDA), a personal computer (PC), a camera, a video camera, a smart watch, a wearable device (WD) or an autonomous driving vehicle, etc., and the embodiment of the present application does not limit this.
下面介绍本申请实施例提供的系统架构。 The following introduces the system architecture provided by the embodiments of the present application.
参见图2,本申请实施例提供了一种系统架构200。如系统架构200所示,数据采集设备260可以用于采集训练数据。在数据采集设备260采集到训练数据之后,将这些训练数据存入数据库230,训练设备220基于数据库230中维护的训练数据训练得到目标模型/规则201。Referring to FIG. 2 , an embodiment of the present application provides a system architecture 200 . As shown in the system architecture 200 , a data acquisition device 260 can be used to collect training data. After the data acquisition device 260 collects the training data, the training data is stored in a database 230 , and the training device 220 trains the target model/rule 201 based on the training data maintained in the database 230 .
下面对训练设备220基于训练数据得到目标模型/规则201进行描述。示例性地，训练设备220对多帧样本图像进行处理，输出对应的预测标签，并计算预测标签和样本的原始标签之间的损失，基于该损失对分类网络进行更新，直到预测标签接近样本的原始标签或者预测标签和原始标签之间的差异小于阈值，从而完成目标模型/规则201的训练。具体描述详见后文中的训练方法。The following describes how the training device 220 obtains the target model/rule 201 based on the training data. Exemplarily, the training device 220 processes multiple frames of sample images and outputs corresponding predicted labels, calculates the loss between the predicted labels and the original labels of the samples, and updates the classification network based on the loss until the predicted labels are close to the original labels of the samples or the difference between the predicted labels and the original labels is less than a threshold, thereby completing the training of the target model/rule 201. For a detailed description, please refer to the training method in the following text.
本申请实施例中的目标模型/规则201具体可以为神经网络。需要说明的是,在实际的应用中,数据库230中维护的训练数据不一定都来自于数据采集设备260的采集,也有可能是从其他设备接收得到的。另外需要说明的是,训练设备220也不一定完全基于数据库230维护的训练数据进行目标模型/规则201的训练,也有可能从云端或其他地方获取训练数据进行模型训练,上述描述不应该作为对本申请实施例的限定。The target model/rule 201 in the embodiment of the present application can specifically be a neural network. It should be noted that in actual applications, the training data maintained in the database 230 does not necessarily all come from the collection of the data acquisition device 260, and may also be received from other devices. It should also be noted that the training device 220 does not necessarily train the target model/rule 201 entirely based on the training data maintained by the database 230, and may also obtain training data from the cloud or other places for model training. The above description should not be used as a limitation on the embodiments of the present application.
根据训练设备220训练得到的目标模型/规则201可以应用于不同的系统或设备中,如应用于图2所示的执行设备210,所述执行设备210可以是终端,如手机终端,平板电脑,笔记本电脑,增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR),车载终端,电视等,还可以是服务器或者云端等。在图2中,执行设备210配置有收发器212,该收发器可以包括输入/输出(input/output,I/O)接口或者其他无线或者有线的通信接口等,用于与外部设备进行数据交互,以I/O接口为例,用户可以通过客户设备240向I/O接口输入数据。The target model/rule 201 obtained by training the training device 220 can be applied to different systems or devices, such as the execution device 210 shown in FIG. 2 . The execution device 210 can be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, augmented reality (AR)/virtual reality (VR), a vehicle terminal, a television, etc., or a server or a cloud. In FIG. 2 , the execution device 210 is configured with a transceiver 212, which can include an input/output (I/O) interface or other wireless or wired communication interfaces, etc., for data interaction with external devices. Taking the I/O interface as an example, a user can input data to the I/O interface through the client device 240.
在执行设备210对输入数据进行预处理,或者在执行设备210的计算模块212执行计算等相关的处理过程中,执行设备210可以调用数据存储系统250中的数据、代码等以用于相应的处理,也可以将相应处理得到的数据、指令等存入数据存储系统250中。When the execution device 210 preprocesses the input data, or when the computing module 212 of the execution device 210 performs calculations and other related processing, the execution device 210 can call the data, code, etc. in the data storage system 250 for corresponding processing, and can also store the data, instructions, etc. obtained from the corresponding processing into the data storage system 250.
最后,收发器212将处理结果返回给客户设备240,从而提供给用户。Finally, the transceiver 212 returns the processing result to the client device 240 so as to provide it to the user.
值得说明的是,训练设备220可以针对不同的目标或称不同的任务,基于不同的训练数据生成相应的目标模型/规则201,该相应的目标模型/规则201即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。It is worth noting that the training device 220 can generate corresponding target models/rules 201 based on different training data for different goals or different tasks. The corresponding target models/rules 201 can be used to achieve the above goals or complete the above tasks, thereby providing users with the desired results.
在附图2中所示情况下,用户可以手动给定输入数据,该手动给定可以通过收发器212提供的界面进行操作。另一种情况下,客户设备240可以自动地向收发器212发送输入数据,如果要求客户设备240自动发送输入数据需要获得用户的授权,则用户可以在客户设备240中设置相应权限。用户可以在客户设备240查看执行设备210输出的结果,具体的呈现形式可以是显示、声音、动作等具体方式。客户设备240也可以作为数据采集端,采集如图所示输入收发器212的输入数据及输出收发器212的输出结果作为新的样本数据,并存入数据库230。当然,也可以不经过客户设备240进行采集,而是由收发器212直接将如图所示输入收发器212的输入数据及输出收发器212的输出结果,作为新的样本数据存入数据库230。In the case shown in FIG. 2 , the user can manually give input data, and the manual giving can be operated through the interface provided by the transceiver 212. In another case, the client device 240 can automatically send input data to the transceiver 212. If the client device 240 is required to automatically send input data, the user can set the corresponding authority in the client device 240. The user can view the results output by the execution device 210 on the client device 240, and the specific presentation form can be a specific method such as display, sound, action, etc. The client device 240 can also be used as a data acquisition terminal to collect the input data of the input transceiver 212 and the output result of the output transceiver 212 as shown in the figure as new sample data, and store it in the database 230. Of course, it is also possible not to collect through the client device 240, but the transceiver 212 directly stores the input data of the input transceiver 212 and the output result of the output transceiver 212 as new sample data in the database 230.
值得注意的是,附图2仅是本申请实施例提供的一种系统架构的示意图,图中所示设备、器件、模块等之间的位置关系不构成任何限制,例如,在图2中,数据存储系统250相对执行设备210是外部存储器,在其它情况下,也可以将数据存储系统250置于执行设备210中。It is worth noting that FIG2 is only a schematic diagram of a system architecture provided in an embodiment of the present application. The positional relationship between the devices, components, modules, etc. shown in the figure does not constitute any limitation. For example, in FIG2, the data storage system 250 is an external memory relative to the execution device 210. In other cases, the data storage system 250 can also be placed in the execution device 210.
如图2所示,根据训练设备220训练得到目标模型/规则201,该目标模型/规则201在本申请实施例中可以是本申请中的推荐模型。As shown in FIG. 2 , a target model/rule 201 is obtained through training by a training device 220 . In an embodiment of the present application, the target model/rule 201 may be a recommendation model in the present application.
示例性地,本申请提供的神经网络训练方法的应用的系统架构可以如图3所示。在该系统架构300中,服务器集群310由一个或多个服务器实现,可选的,与其它计算设备配合,例如:数据存储、路由器、负载均衡器等设备。服务器集群310可以使用数据存储系统250中的数据,或者调用数据存储系统250中的程序代码实现本申请提供的神经网络训练方法的步骤。Exemplarily, the system architecture of the application of the neural network training method provided by the present application can be shown in Figure 3. In the system architecture 300, the server cluster 310 is implemented by one or more servers, and optionally, cooperates with other computing devices, such as data storage, routers, load balancers, etc. The server cluster 310 can use the data in the data storage system 250, or call the program code in the data storage system 250 to implement the steps of the neural network training method provided by the present application.
用户可以操作各自的用户设备(例如本地设备301和本地设备302)与服务器集群310进行交互。每个本地设备可以表示任何计算设备,例如个人计算机、计算机工作站、智能手机、平板电脑、智能摄像头、智能汽车或其他类型蜂窝电话、媒体消费设备、可穿戴设备、机顶盒、游戏机等。 Users can operate their respective user devices (e.g., local device 301 and local device 302) to interact with server cluster 310. Each local device can represent any computing device, such as a personal computer, a computer workstation, a smart phone, a tablet computer, a smart camera, a smart car or other type of cellular phone, a media consumption device, a wearable device, a set-top box, a game console, etc.
每个用户的本地设备可以通过任何通信机制/通信标准的通信网络与服务器集群310进行交互,通信网络可以是广域网、局域网、点对点连接等方式,或它们的任意组合。具体地,该通信网络可以包括无线网络、有线网络或者无线网络与有线网络的组合等。该无线网络包括但不限于:第五代移动通信技术(5th-Generation,5G)系统,长期演进(long term evolution,LTE)系统、全球移动通信系统(global system for mobile communication,GSM)或码分多址(code division multiple access,CDMA)网络、宽带码分多址(wideband code division multiple access,WCDMA)网络、无线保真(wireless fidelity,WiFi)、蓝牙(bluetooth)、紫蜂协议(Zigbee)、射频识别技术(radio frequency identification,RFID)、远程(Long Range,Lora)无线通信、近距离无线通信(near field communication,NFC)中的任意一种或多种的组合。该有线网络可以包括光纤通信网络或同轴电缆组成的网络等。The local device of each user can interact with the server cluster 310 through a communication network of any communication mechanism/communication standard, and the communication network can be a wide area network, a local area network, a point-to-point connection, etc., or any combination thereof. Specifically, the communication network may include a wireless network, a wired network, or a combination of a wireless network and a wired network, etc. The wireless network includes, but is not limited to: a fifth-generation mobile communication technology (5th-Generation, 5G) system, a long-term evolution (long term evolution, LTE) system, a global system for mobile communication (global system for mobile communication, GSM) or a code division multiple access (code division multiple access, CDMA) network, a wideband code division multiple access (wideband code division multiple access, WCDMA) network, wireless fidelity (wireless fidelity, WiFi), Bluetooth (bluetooth), Zigbee protocol (Zigbee), radio frequency identification technology (radio frequency identification, RFID), long-range (Lora) wireless communication, and near-field wireless communication (NFC) Any one or more combinations. The wired network may include an optical fiber communication network or a network composed of coaxial cables, etc.
在另一种实现中,执行设备210的一个方面或多个方面可以由每个本地设备实现,例如,本地设备301可以为执行设备210提供本地数据或反馈计算结果。In another implementation, one or more aspects of the execution device 210 may be implemented by each local device. For example, the local device 301 may provide local data or feedback calculation results to the execution device 210 .
需要注意的,执行设备210的所有功能也可以由本地设备实现。例如,本地设备301实现执行设备210的功能并为自己的用户提供服务,或者为本地设备302的用户提供服务。It should be noted that all functions of the execution device 210 can also be implemented by the local device. For example, the local device 301 implements the functions of the execution device 210 and provides services to its own user, or provides services to the user of the local device 302.
通常,机器学习系统可以包括个性化推荐系统,可以基于输入数据和标签,通过梯度下降等优化方法训练机器学习模型的参数,当模型参数收敛之后,可利用该模型来完成未知数据的预测。以个性化推荐系统中的点击率预测为例,其输入数据包括用户特征、物品特征和上下文特征等。如何根据用户的偏好,预测出个性化的推荐列表,对提升推荐系统的用户体验和平台收入有着重要的影响。Generally, a machine learning system can include a personalized recommendation system. Based on input data and labels, the parameters of the machine learning model can be trained through optimization methods such as gradient descent. After the model parameters converge, the model can be used to predict unknown data. Taking the click-through rate prediction in a personalized recommendation system as an example, its input data includes user features, item features, and context features. How to predict a personalized recommendation list based on user preferences has an important impact on improving the user experience of the recommendation system and the platform revenue.
示例性地,以推荐系统中的点击率预测模型为例,如图4所示,通常可以包括Embedding和MLP层,即如图4中所示出的特征交互层、深度神经网络层和预测层,Embedding用于将高维稀疏的数据映射至低维稠密的向量,MLP层一般用于拟合特征之间的组合关系、序列信息以逼近真实的点击率分布。主流模型均基于embedding参数表征特征,并基于该表征学习特征的显式/隐式组合关系,而推荐模型特征较多,导致Embedding规模大,如互联网公司可以达到TB级。嵌入表征词表(Embedding table)过大,单个GPU或NPU计算卡的显存不足以存储所有参数,需要多个节点来分布式存储。然而分布式存储带来了新的问题:需要更多的内存开销;在训练/推理阶段,Embedding参数需要通过网络拉取,带来了更多的通信开销,增加了模型计算的时延,最终影响推荐效果。For example, taking the click rate prediction model in the recommendation system as an example, as shown in FIG4 , it can generally include the Embedding and MLP layers, that is, the feature interaction layer, deep neural network layer and prediction layer shown in FIG4 . The Embedding is used to map high-dimensional sparse data to low-dimensional dense vectors, and the MLP layer is generally used to fit the combination relationship and sequence information between features to approximate the actual click rate distribution. Mainstream models are based on the representation of features based on the embedding parameters, and the explicit/implicit combination relationship of features is learned based on the representation. However, the recommendation model has many features, resulting in a large Embedding scale, such as TB level for Internet companies. The embedding representation vocabulary (Embedding table) is too large, and the video memory of a single GPU or NPU computing card is not enough to store all parameters, and multiple nodes are required for distributed storage. However, distributed storage brings new problems: more memory overhead is required; in the training/inference stage, the Embedding parameters need to be pulled through the network, which brings more communication overhead, increases the delay of model calculation, and ultimately affects the recommendation effect.
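Embedding查表将高维稀疏的特征id映射为低维稠密向量，可以简化示意如下（数据与名称均为示例性假设）：The embedding lookup that maps high-dimensional sparse feature ids to low-dimensional dense vectors can be sketched as follows (data and names are illustrative assumptions):

```python
import random

# A toy embedding table: one dense low-dimensional row per feature id.
vocab_size, dim = 1000, 8
random.seed(0)
embedding_table = [[random.uniform(-0.1, 0.1) for _ in range(dim)]
                   for _ in range(vocab_size)]

def lookup(feature_ids):
    # Each sparse feature id indexes one dense row of the table.
    return [embedding_table[i] for i in feature_ids]

dense = lookup([3, 42, 999])  # three sparse ids -> three 8-dimensional vectors
```

当特征规模达到数亿甚至更多时，该词表即成为模型内存的主要开销，这也是下文量化压缩的对象。When the number of features reaches hundreds of millions or more, this table becomes the main memory overhead of the model, which is the target of the quantization compression described below.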
为了降低Embedding table的内存占用,通常可以对Embedding table进行量化,从而通过降低精度的方式对Embedding table进行压缩。In order to reduce the memory usage of the Embedding table, the Embedding table can usually be quantized, thereby compressing the Embedding table by reducing the precision.
例如,可以采用剪枝的方式进行压缩,设定参数阈值,对Embedding table中低于阈值的参数进行剪枝。裁剪Embedding参数后,再基于裁剪后的Embedding进行重训练。然而,仅压缩推理阶段内存,训练内存不会压缩;需要重训练,增加了训练成本;且生成的Embedding table为非结构化数据,需要特殊存储。For example, pruning can be used for compression. Parameter thresholds can be set and parameters in the Embedding table that are below the threshold can be pruned. After pruning the Embedding parameters, retraining can be performed based on the pruned Embedding. However, only the memory in the inference phase is compressed, and the training memory is not compressed; retraining is required, which increases the training cost; and the generated Embedding table is unstructured data and requires special storage.
又例如,可以采用基于AutoML的方式进行压缩,如可以基于强化学习、可微分架构学习方法(DARTS)方法端到端的调整Embedding table中特征的个数和不同特征的尺寸。当模型收敛之后,再对模型进行重训练。然而搜索时间长,实用性较差。For another example, compression can be performed based on AutoML, such as adjusting the number of features and the size of different features in the embedding table end-to-end based on the reinforcement learning and differentiable architecture learning method (DARTS) method. After the model converges, the model is retrained. However, the search time is long and the practicality is poor.
还例如,基于hash的方式进行压缩,高频特征独立分配embedding,低频特征分别使用hash函数映射,从而达到压缩低频特征embedding参数的目的。然而可能存在特征冲突,带来精度损失。For example, in the hash-based compression, high-frequency features are independently assigned embeddings, and low-frequency features are mapped using hash functions, thereby achieving the purpose of compressing the embedding parameters of low-frequency features. However, there may be feature conflicts, resulting in precision loss.
还例如，一些低精度训练方式中，训练过程中所有的参数存储的为低精度参数，通过反量化得到fp32全精度参数，然后进行前向和反向计算得到全精度梯度，然后按照学习率步长η更新fp32全精度参数，得到更新后的参数。然而，当权重更新幅度较小，远小于量化步长时，确定性舍入会抹去参数的更新，导致网络无法得到训练，从而影响训练精度。For example, in some low-precision training methods, all parameters in the training process are stored as low-precision parameters, fp32 full-precision parameters are obtained through dequantization, forward and backward calculations are then performed to obtain full-precision gradients, and the fp32 full-precision parameters are updated according to the learning rate step η to obtain updated parameters. However, when the weight update is small, much smaller than the quantization step, deterministic rounding will erase the parameter update, causing the network to be unable to be trained, thereby affecting the training accuracy.
因此，本申请提供一种量化方法，用于通过设置自适应量化步长的方式，来保留更多的参数信息，提高量化准确度。Therefore, the present application provides a quantization method for retaining more parameter information and improving quantization accuracy by setting an adaptive quantization step size.
首先,为便于理解,对本申请提供的方法的应用场景进行介绍。First, to facilitate understanding, the application scenarios of the method provided in this application are introduced.
通常,本申请提供的量化方法可以应用于语言模型或者推荐模型中,该语言模型可以包括神经机器翻译或者PLM等模型,该推荐模型可以包括点击率预测模型,转化率预测模型等。如可以在模型中设置 Embedding table来提取输入语料的表征,然后获取表征对应的语义,随后进一步进行翻译或者语义识别等,具体可以根据模型所需执行的任务来进行后续步骤。Generally, the quantification method provided in this application can be applied to a language model or a recommendation model. The language model may include a neural machine translation or PLM model. The recommendation model may include a click-through rate prediction model, a conversion rate prediction model, etc. For example, The embedding table is used to extract the representation of the input corpus, and then the semantics corresponding to the representation is obtained, followed by further translation or semantic recognition. The subsequent steps can be carried out according to the tasks that the model needs to perform.
示例性地,以推荐场景为例,本申请应用推荐框架可以如图5A所示,可以分为训练部分和在线推理部分。其中,在训练部分,训练集中包括输入数据和对应的标签,如在用户商品推荐场景中,该训练集可以包括用户点击、收藏或喜欢的商品以及最终购买的商品。将训练集输入至初始模型,通过梯度下降等优化方法训练机器学习模型的参数,得到推荐模型。在线推理部分中,即可将推荐模型部署于推荐平台,如部署于服务器或者终端中,此处以服务器为例,即可通过服务器来输出针对用户的推荐列表,如在商品推荐场景中,即可在用户终端的主页展示为用户推荐的商品的信息,如商品图标或者链接标题等,或者在用户点击了某个商品后,即可在推荐区域展示为用户推荐的商品的图标或者链接标题等。Exemplarily, taking the recommendation scenario as an example, the application recommendation framework of the present application can be shown in FIG5A, which can be divided into a training part and an online reasoning part. Among them, in the training part, the training set includes input data and corresponding labels. For example, in the user product recommendation scenario, the training set can include products that the user clicks, collects or likes, and the products that are finally purchased. The training set is input into the initial model, and the parameters of the machine learning model are trained by optimization methods such as gradient descent to obtain a recommendation model. In the online reasoning part, the recommendation model can be deployed on the recommendation platform, such as deployed in a server or terminal. Here, taking the server as an example, the server can be used to output a recommendation list for the user. For example, in the product recommendation scenario, the information of the recommended products for the user can be displayed on the homepage of the user terminal, such as product icons or link titles, etc., or after the user clicks on a product, the icon or link title of the recommended product for the user can be displayed in the recommendation area.
在一些应用场景中,推荐流程可以如图5B所示,其中可以包括展示列表、日志、离线训练以及线上预测等部分。用户在前端展示列表中进行一系列的行为,如浏览、点击、评论、下载等,产生行为数据,存储于日志中。推荐系统利用包括用户行为日志在内的数据进行离线的模型训练,在训练收敛后产生预测模型,将模型部署在线上服务环境并基于用户的请求访问、商品特征和上下文信息给出推荐结果,然后用户对该推荐结果产生反馈形成用户数据。In some application scenarios, the recommendation process can be shown in FIG5B, which may include display lists, logs, offline training, and online predictions. Users perform a series of actions in the front-end display list, such as browsing, clicking, commenting, downloading, etc., to generate behavioral data, which is stored in the log. The recommendation system uses data including user behavior logs to perform offline model training, generates a prediction model after the training converges, deploys the model in an online service environment, and gives recommendation results based on user request access, product features, and contextual information. Then the user generates feedback on the recommendation results to form user data.
其中,在离线训练以及线上预测部分,当模型的Embedding table变大,都会导致训练内存的增大和计算时延的升高。为了同时降低训练和推理阶段的Embedding table内存占用,本申请提出了一种端到端的自适应低精度训练(Adaptive Low-Precision Training)框架,该框架可用于压缩推荐模型中Embedding table的内存,包括训练内存和推理内存,从而降低保存、使用以及训练模型的存储开销。In the offline training and online prediction parts, when the model's Embedding table becomes larger, it will lead to an increase in training memory and an increase in computing latency. In order to reduce the memory usage of the Embedding table in both the training and reasoning stages, this application proposes an end-to-end Adaptive Low-Precision Training framework, which can be used to compress the memory of the Embedding table in the recommendation model, including training memory and reasoning memory, thereby reducing the storage overhead of saving, using, and training models.
下面对本申请提供的量化方法的流程进行介绍。The process of the quantification method provided in this application is introduced below.
参阅图6,本申请提供的一种量化方法的流程示意图,如下所述。Referring to FIG6 , a flowchart of a quantification method provided by the present application is described as follows.
601、获取全精度嵌入表征。601. Obtain full-precision embedding representation.
其中,该全精度嵌入表征中可以包括多种特征。每种特征可以表示为一组或者多组特征向量。The full-precision embedded representation may include multiple features, and each feature may be represented as one or more sets of feature vectors.
该全精度嵌入表征可以包括embedding table中的全部或者部分特征。若获取到全精度embedding table,则可以直接从全精度embedding table中读取全部或者部分数据,得到前述的全精度嵌入表征。若获取到低精度embedding table,则可以从该低精度embedding table中读取全部或者部分特征,并对读取的特征进行反量化,得到全精度嵌入表征。The full-precision embedding representation may include all or part of the features in the embedding table. If the full-precision embedding table is obtained, all or part of the data can be directly read from the full-precision embedding table to obtain the aforementioned full-precision embedding representation. If the low-precision embedding table is obtained, all or part of the features can be read from the low-precision embedding table, and the read features can be dequantized to obtain the full-precision embedding representation.
通常,神经网络中的embedding层可以用于将高维稀疏的数据映射至低维稠密的向量,具体可以是从embedding table中查询与输入数据对应的低维度表征。可以理解为embedding table中存储了多种数据的低维度表征,通常输入数据为高维的稀疏数据,可以通过embedding table将高维稀疏数据映射为低维表征,相当于对输入数据中所包括的多个维度的语义进行了拆分。Generally, the embedding layer in a neural network can be used to map high-dimensional sparse data to low-dimensional dense vectors, specifically by querying the low-dimensional representation corresponding to the input data from the embedding table. It can be understood that the embedding table stores low-dimensional representations of various data. Usually, the input data is high-dimensional sparse data, and the high-dimensional sparse data can be mapped to low-dimensional representations through the embedding table, which is equivalent to splitting the semantics of multiple dimensions included in the input data.
可选地,在神经网络的训练过程中,可以从低精度嵌入表征词表中获取与当前次迭代的输入数据对应的表征,得到当前次迭代的低精度嵌入表征;对当前次迭代的低精度嵌入表征进行反量化,得到当前次迭代的全精度嵌入表征。Optionally, during the training process of the neural network, the representation corresponding to the input data of the current iteration can be obtained from the low-precision embedding representation vocabulary to obtain the low-precision embedding representation of the current iteration; the low-precision embedding representation of the current iteration is dequantized to obtain the full-precision embedding representation of the current iteration.
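上述“从低精度嵌入表征词表取表征并反量化”的过程可以简化示意如下（数据布局为示例性假设）：The above process of fetching representations from the low-precision embedding vocabulary and dequantizing them can be sketched as follows (the data layout is an illustrative assumption):

```python
# Low-precision rows plus a stored per-feature step size; dequantization
# restores each element as q * step.
low_precision_table = {7: [64, -127, 32], 9: [10, 0, -5]}  # e.g. int8 rows
step_size = {7: 1.0 / 127, 9: 0.02}                        # per-feature steps

def fetch_full_precision(feature_ids):
    return {fid: [qi * step_size[fid] for qi in low_precision_table[fid]]
            for fid in feature_ids}

emb = fetch_full_precision([7, 9])  # full-precision input for this iteration
```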
可选地,该神经网络可以包括语言模型或者推荐模型中,该语言模型可以包括神经机器翻译或者PLM等模型,该推荐模型可以包括点击率预测模型,转化率预测模型等,因此本申请提供的方法可以应用于语言处理或者推荐场景中。Optionally, the neural network may include a language model or a recommendation model. The language model may include models such as neural machine translation or PLM. The recommendation model may include a click-through rate prediction model, a conversion rate prediction model, etc. Therefore, the method provided in this application can be applied to language processing or recommendation scenarios.
602、确定多种特征中每种特征对应的自适应步长。602. Determine an adaptive step size corresponding to each of the multiple features.
在对embedding进行量化之前,可以确定每种特征对应的自适应步长。Before quantizing the embedding, the adaptive step size corresponding to each feature can be determined.
可选地,可以采用启发式算法计算所述每种特征对应的自适应步长,或者通过学习式计算自适应步长。Optionally, a heuristic algorithm may be used to calculate the adaptive step size corresponding to each feature, or the adaptive step size may be calculated by learning.
其中，采用启发式算法具体可以包括：根据每种特征中权重绝对值来计算每种特征对应的自适应步长。例如，可以根据每个embedding向量中权重绝对值的最大值计算自适应量化步长：Δ=max(|e|)/2^(m-1)，其中e为embedding参数向量，max(|e|)为向量元素绝对值的最大值，即取当前向量绝对值的最大值做2^(m-1)等分，m为bit数。The heuristic algorithm may specifically include: calculating the adaptive step size corresponding to each feature according to the absolute values of the weights in the feature. For example, the adaptive quantization step size may be calculated according to the maximum absolute value of the weights in each embedding vector: Δ = max(|e|)/2^(m-1), where e is the embedding parameter vector, max(|e|) is the maximum absolute value of the vector elements, that is, the maximum absolute value of the current vector is divided into 2^(m-1) equal parts, and m is the number of bits.
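上述启发式自适应步长及相应的量化可以简化示意如下（示例性实现）：The heuristic adaptive step size and the corresponding quantization can be sketched as follows (illustrative):

```python
# Per-vector adaptive step: step = max(|e|) / 2**(m-1), i.e. the largest
# absolute weight of the vector is divided into 2**(m-1) equal parts.
def adaptive_step(e, m):
    return max(abs(w) for w in e) / (2 ** (m - 1))

def quantize(e, m):
    step = adaptive_step(e, m)
    # under this partitioning the largest weight maps to 2**(m-1)
    return [round(w / step) for w in e], step

e = [0.8, -0.2, 0.1]
q, step = quantize(e, 8)   # m = 8 bits -> step = 0.8 / 128 = 0.00625
```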
通过学习计算自适应步长的方式应用于训练神经网络的过程中进行量化，如根据当前次迭代更新后的神经网络中的权重以及上一次迭代训练神经网络过程中更新的步长来计算当前次迭代中的自适应步长，从而可以实现更高的训练精度。The learning-based calculation of the adaptive step size is applied to quantization during the training of the neural network. For example, the adaptive step size in the current iteration is calculated according to the weights in the neural network updated in the current iteration and the step size updated during the previous training iteration, so that higher training accuracy can be achieved.
此外,在计算每种特征对应的自适应步长后,可以保存每种特征对应的自适应步长,以便于后续进行反量化时,可以基于自适应步长对低精度特征进行无损反量化,得到全精度特征。In addition, after calculating the adaptive step size corresponding to each feature, the adaptive step size corresponding to each feature can be saved, so that when dequantization is performed later, the low-precision features can be losslessly dequantized based on the adaptive step size to obtain full-precision features.
可选地,在神经网络的训练过程中,可以将当前次迭代的全精度嵌入表征作为神经网络的输入,得到当前次迭代的预测结果对应的全精度梯度;根据全精度梯度获取更新全精度嵌入表征,得到更新后的全精度嵌入表征;根据全精度梯度获取更新后的全精度嵌入表征中每种特征分别对应的自适应步长。因此,在训练过程中,可以根据更新的参数实时更新与更新后的参数适配的自适应步长。通常若按照固定步长进行量化,对于参数更新小于量化步长的场景,将可能直接截断导致数据丢失,而本申请提供的方法中,当参数更新较少时,可以基于更新的参数自适应的计算步长,从而可以保留更新较少的参数,可以减少精度损失。Optionally, during the training of the neural network, the full-precision embedding representation of the current iteration can be used as the input of the neural network to obtain the full-precision gradient corresponding to the prediction result of the current iteration; the full-precision embedding representation is updated according to the full-precision gradient to obtain the updated full-precision embedding representation; the adaptive step size corresponding to each feature in the updated full-precision embedding representation is obtained according to the full-precision gradient. Therefore, during the training process, the adaptive step size adapted to the updated parameters can be updated in real time according to the updated parameters. Usually, if quantization is performed according to a fixed step size, for scenarios where the parameter update is less than the quantization step size, it may be directly truncated to cause data loss. In the method provided in the present application, when the parameter update is less, the calculation step size can be adaptively calculated based on the updated parameters, so that the parameters with less updates can be retained, which can reduce the loss of precision.
603. Quantize the multiple features according to the adaptive step size corresponding to each feature to obtain a low-precision embedding representation.

After the adaptive step size corresponding to each feature in the full-precision embedding representation is determined, each feature can be quantized based on its adaptive step size to obtain a low-precision embedding representation. The storage or transmission resources that a computing device needs to save or transmit the low-precision embedding representation are therefore lower than those needed for the full-precision embedding representation; the computing device may include a device that executes the quantization method or the recommendation method provided in this application.

In the embodiments of this application, a corresponding adaptive step size is calculated for each feature in the full-precision embedding table, and quantization is performed according to that step size. Quantization is thus performed with a matching step size; for features whose magnitudes do not match the quantization bits, the adaptive step size can still be used. Compared with quantization using a fixed step size, quantization with an adaptive step size reduces the precision loss and improves the quantization accuracy.
In addition, if the foregoing steps 601 to 603 form one iteration of updating the neural network, then after the low-precision embedding representation is obtained by quantization, the low-precision embedding vocabulary is updated based on it to obtain an updated low-precision embedding vocabulary; that is, the updated low-precision embedding representation is written back into the low-precision embedding table.

The method of this application can be applied to model saving as well as model training. For example, when a model is saved, the quantization method provided in this application achieves lower-precision quantization; during model training, it reduces the amount of data that needs to be transmitted and the cache space required.

For the scenario of quantizing before saving a model, refer to the steps in FIG. 6 above. The following uses the flow of quantization during model training as an example.
Taking a training scenario as an example: during training, all or some of the features in the embedding table can be quantized in each training iteration. Taking one iteration as an example, the flow of the quantization method provided in this application may be as shown in FIG. 7.

It should be understood that iterative training is usually performed over one or more epochs, and each epoch can be divided into multiple batches. In the embodiments of this application, one batch is used as an example.
701. Determine the low-precision batch embedding from the low-precision embedding table.

Within a batch, the input data for training the neural network in the current batch can be used as the input of the embedding layer, and the low-precision embedding table maps the input data to a low-precision, low-dimensional embedding representation, that is, the low-precision batch embedding.

702. De-quantize the low-precision batch embedding to obtain the full-precision batch embedding.
After the low-precision batch embedding is obtained, it can be de-quantized, that is, the inverse operation of quantization is applied, to obtain the full-precision batch embedding, so that the neural network can derive the representation corresponding to the input samples from it.
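De-quantization is a simple rescaling by the saved step size. A minimal sketch (the function name and the use of NumPy are illustrative assumptions):

```python
import numpy as np

def dequantize(e_q: np.ndarray, delta: float) -> np.ndarray:
    """Inverse of quantization: scale the stored integer codes
    back to a full-precision (fp32) representation."""
    return e_q.astype(np.float32) * delta
```

For example, integer codes [3, -2] stored with step size 0.5 recover the full-precision values [1.5, -1.0].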
703. Obtain, through the full-precision batch embedding, the full-precision gradient corresponding to the prediction results of the neural network for the current batch.

After the full-precision batch embedding is obtained, during the training of the current batch the full-precision batch embedding corresponding to the training samples is used as the input of the neural network, which outputs prediction results. The value of the loss function is then calculated from the prediction results and the true labels of the input training samples, and the full-precision gradient of the parameters of the neural network for the current batch is calculated from that loss value.

704. Update the weights of the neural network according to the full-precision gradient to obtain an updated neural network.

After the full-precision gradient is obtained, the weights of the neural network can be updated based on it to obtain the neural network updated by the current batch.

For example, the parameters of the neural network can be updated by the back-propagation algorithm. Forward propagation of the input signal to the output produces an error loss, and the parameters of the initial neural network model are updated by propagating the error loss information backwards, so that the error loss converges.
705. Update the full-precision batch embedding according to the full-precision gradient to obtain a new full-precision batch embedding and quantization step size.

After the full-precision gradient is obtained, the adaptive step size can be updated based on it, and the full-precision batch embedding can be quantized with the adaptive step size to obtain a new low-precision batch embedding, which is saved back into the low-precision embedding table. This achieves low-precision storage and transmission of the embedding table, reducing the storage space required to save and transmit it.

Specifically, the adaptive step size can be calculated in the learned manner, combining the weights updated in each iteration, so that the embedding table can be quantized in real time as the neural network is updated, reducing the storage space occupied during training and saving.

Of course, the adaptive step size can also be calculated by the heuristic algorithm, for example by computing the adaptive step size of each feature in the full-precision batch embedding from the absolute values of its updated weights, so that the adaptive step size is computed efficiently and accurately.
706. Quantize the new batch embedding according to the adaptive quantization step size to obtain a new low-precision batch embedding.

After the adaptive quantization step size is obtained, the updated full-precision batch embedding can be quantized based on it to obtain the new low-precision batch embedding.
Optionally, in the specific quantization process, the discrete values of each feature (also called discrete features) can be obtained according to the adaptive step size corresponding to that feature, and the discrete features can then be rounded by a stochastic truncation algorithm to obtain the low-precision embedding table. In stochastic truncation, the value of each discrete feature determines the rounding outcome, so that the rounding matches the update of the feature value: even when the magnitude of a parameter update is small, the updated part can still be quantized, preserving quantization accuracy.
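The point can be illustrated numerically (the step size, update value, and variable names below are illustrative): an update far smaller than the step size is erased by deterministic rounding but preserved in expectation by stochastic rounding.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.1          # quantization step size
update = 0.004       # parameter update, much smaller than delta

# Deterministic rounding: the small update is truncated away entirely.
det = np.rint(update / delta) * delta          # 0.0 -- the update is lost

# Stochastic rounding: round up with probability equal to the fractional
# part, so the update survives in expectation.
x = update / delta
lo = np.floor(x)
samples = lo + (rng.random(100_000) < (x - lo))
stoch = samples.mean() * delta                 # close to 0.004 on average
```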
707. Determine whether convergence has been reached; if so, terminate the iteration; if not, go to step 701.

After each batch of training, it can be determined whether the neural network has converged. If so, the iteration can be terminated and the neural network trained by the current batch is output; if not, iterative training continues.

Determining whether the neural network has converged may consist of judging whether the number of iterations has reached a preset number, whether the change in the loss value is smaller than a preset value, whether the training duration has reached a preset duration, and so on. This can be determined according to the specific application scenario and is not limited in this application.
Therefore, in the embodiments of this application, during the training of the neural network the adaptive step size can be updated based on the computed gradient, and quantization is performed with a step size adapted to each feature. This preserves the quantization accuracy of each feature as far as possible, enables quantization at lower precision, and reduces the information lost during quantization.
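Steps 701 to 707 can be sketched as one training step over a low-precision table. This is a toy illustration under several assumptions: the function names are invented, the step size is updated heuristically rather than learned, and deterministic rounding stands in for the stochastic rounding described above.

```python
import numpy as np

def train_step(table_q, deltas, ids, grad_fn, lr=0.1, m=8):
    """One batch: de-quantize, compute gradient, update, re-quantize."""
    delta = deltas[ids]                                        # per-feature step sizes
    emb = table_q[ids].astype(np.float32) * delta[:, None]     # 702: de-quantize
    grad = grad_fn(emb)                                        # 703: full-precision gradient
    emb = emb - lr * grad                                      # 705: update embedding
    new_delta = np.abs(emb).max(axis=1) / 2 ** (m - 1)         # heuristic step-size update
    q = np.rint(emb / new_delta[:, None])                      # 706: quantize
    table_q[ids] = np.clip(q, -2 ** (m - 1), 2 ** (m - 1) - 1).astype(np.int8)
    deltas[ids] = new_delta                                    # write back table and steps
    return table_q, deltas
```

Only the rows of the table touched by the batch are de-quantized, updated, and written back, which is what keeps the resident table at low precision throughout training.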
The foregoing describes the flow of applying the quantization method provided in this application to the training of a neural network. For ease of understanding, the quantization method is described below with reference to a more specific recommendation scenario.

Refer to FIG. 8, a schematic flowchart of another quantization method provided in this application, described as follows.

In the forward phase, the input of the recommendation model is a batch of high-dimensional sparse data. The feature ids in the batch data are read, the corresponding batch embedding is read from the low-precision embedding table, and de-quantization then yields a full-precision representation of the low-precision batch embedding on which subsequent computations such as the neural network can operate. In the backward phase, the gradient of the current batch embedding is obtained from the upper network and the batch embedding is updated based on the gradient; since the embedding table stores low-precision parameters, the low-precision batch embedding must be obtained by quantization and finally written into the low-precision embedding table. The specific steps may include the following.
First, the user's log data 801 is read; this log data can serve as the training set of the recommendation model.

The user's log data may include information generated when the user uses a client, and different clients typically generate different information. For example, when the user uses a music app, the music the user plays, clicks, favorites, or searches for can be saved in the user's log; when the user uses a shopping app, the items the user browses, adds to favorites, or purchases can be saved in the user's log; and when the user uses an application market, the apps the user clicks, downloads, installs, or favorites can be saved in the user's log, and so on.

Subsequently, the high-dimensional sparse batch data 802 of the current batch is read from the user log data.

In each batch, a portion of the user's log data can be extracted as the high-dimensional sparse data of the current batch, serving as the training data of the current iteration.
Then, the corresponding low-precision batch embedding is read from the low-precision embedding table 803.

User log data is usually high-dimensional and sparse, so the embedding table can map the high-dimensional sparse data to low-dimensional features, allowing the model to recognize and process each feature. That is, after the high-dimensional sparse batch data of the current batch is read from the log data, the low-precision embedding table maps it to a low-dimensional representation, expressed as the low-precision batch embedding.

Subsequently, de-quantization is performed to obtain the full-precision batch embedding 804.

After the low-precision batch embedding is obtained, it is de-quantized by the de-quantization algorithm to obtain the full-precision batch embedding.
For example, the fp32 full-precision parameters can be obtained through the de-quantization function ω = Δ · e_q, where e_q is the stored low-precision code and Δ is the adaptive step size corresponding to the batch embedding.
Subsequently, the full-precision batch embedding can be used as the input of the recommendation model 805, which outputs the prediction results 806.

The full-precision gradient of the current batch is then calculated according to the prediction results 806, and the batch embedding and quantization step size 807 are updated based on the full-precision gradient of the current batch.

After the prediction results are obtained, the loss between the prediction results and the true labels of the input samples can be calculated, and back-propagation based on this loss yields the full-precision gradient of each parameter of the recommendation model for the current batch. The fp32 full-precision parameters in the batch embedding are then updated with learning-rate step size η to obtain ω and the quantization step size Δ.

Specifically, the adaptive quantization step size can be calculated either heuristically or in the learned manner.
The steps of the heuristic calculation of the adaptive step size can be expressed as: the adaptive quantization step size is calculated from the maximum absolute weight in each embedding vector, Δ = max(|e|) / 2^(m-1), where e is the embedding parameter vector and max(|e|) is its maximum absolute value. The physical meaning of this method is to divide the maximum absolute value of the current vector into 2^(m-1) equal levels, with m the number of bits.
The steps of the learned calculation of the adaptive quantization step size may include: after the weights are updated, training the updated weights together with the not-yet-updated quantization step size in a quantization-aware manner, so that the quantization step size is updated end-to-end. This may be expressed as follows.

First, the weight parameters are updated with the learning rate η and the full-precision gradient of the loss L:

ω ← ω − η · ∂L/∂ω

The adaptive step size is then updated with its quantization-aware gradient:

Δ ← Δ − η · ∂L/∂Δ

Subsequently, the updated embedding parameters ω, the updated adaptive step size Δ, and the updated parameters of the recommendation model are output.
Quantization is then performed to obtain the low-precision batch embedding 808, which is written back into the embedding table.

After the adaptive step size corresponding to each feature is obtained, the updated parameter ω can be quantized. The quantization process can be expressed as:

e_q = clip(R(ω/Δ), −2^(m−1), 2^(m−1) − 1)
Here m is the number of bits, and R(·) is a truncation rounding function, of which there are usually several kinds, such as deterministic rounding and stochastic rounding. When a weight update is small in magnitude, far smaller than the quantization step size, deterministic rounding would erase the update, which may leave the network unable to train. This application therefore rounds by stochastic truncation, which can be expressed as:

R(x) = ⌊x⌋ with probability ⌈x⌉ − x, and R(x) = ⌈x⌉ with probability x − ⌊x⌋
The clip function returns −2^(m−1) when ω/Δ is less than −2^(m−1), and returns 2^(m−1) − 1 when ω/Δ is greater than 2^(m−1) − 1.
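Putting the rounding and clipping together, the quantizer can be sketched as follows (the function names and the NumPy implementation are illustrative; the signed range [−2^(m−1), 2^(m−1)−1] is assumed so that codes fit in an m-bit integer):

```python
import numpy as np

def stochastic_round(x: np.ndarray, rng=np.random.default_rng()) -> np.ndarray:
    """R(x): round down with probability ceil(x)-x, up with probability x-floor(x)."""
    lo = np.floor(x)
    return lo + (rng.random(x.shape) < (x - lo))

def quantize(w: np.ndarray, delta: float, m: int = 8,
             rng=np.random.default_rng()) -> np.ndarray:
    """e_q = clip(R(w / delta), -2^(m-1), 2^(m-1)-1), stored as int8 for m = 8."""
    codes = stochastic_round(w / delta, rng)
    return np.clip(codes, -2 ** (m - 1), 2 ** (m - 1) - 1).astype(np.int8)
```

Values that are exact multiples of the step size round deterministically (the fractional part is zero), while out-of-range values saturate at the clip bounds.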
In the embodiments of this application, the quantization step size is better chosen for the embedding parameters of each feature so as to retain as much parameter information as possible, helping the model to still converge under low-precision training. Lower-precision training reduces the memory footprint and communication overhead of the embedding during training and inference, so that the same memory can hold more parameters. In addition, a stochastic truncation rounding function can be used to ensure that gradient information is not lost to deterministic truncation during low-precision training. Furthermore, when the adaptive step size is updated, both a heuristic and a learned adaptive quantization step size are provided to suit different application scenarios, avoiding the manual selection of quantization step sizes for different features and improving the efficiency of model training and quantization.
For ease of understanding, the effects of the quantization method provided in this application are described below through some specific application scenarios.

In a large number of personalized service scenarios, there are interaction records between users and items based on different types of behavior. The recommendation model models the user's multi-behavior interaction history, predicts the items with which the user may interact under the target behavior, and ranks the items before presenting them to the user. Click-through rate prediction can be performed in the manner provided in this application, with items ranked by the predicted click-through rate and displayed in that order on the recommendation page; or ranked and displayed by the value of the predicted click-through rate; or only the top results ranked; or each candidate object scored and the objects ranked and displayed by score.

For example, the method provided in this application can be applied to an app recommendation scenario. As shown in FIG. 9, icons of recommended apps can be displayed on the display interface of the user's terminal, so that the user can further click or download the recommended apps and quickly find the desired apps, improving the user experience.

As another example, the method provided in this application can be applied to a product recommendation scenario. As shown in FIG. 10, icons of recommended products can be displayed on the display interface of the user's terminal, so that the user can further click, add to cart, or purchase the recommended products and view the desired products, improving the user experience.

As yet another example, the method provided in this application can be applied to a music recommendation scenario. As shown in FIG. 11, icons of recommended music can be displayed on the display interface of the user's terminal, so that the user can further click, favorite, or play the recommended music and view the music they prefer, improving the user experience.
Taking click-through rate prediction in the app recommendation scenario as an example: a click-through rate prediction model usually includes two parts, an embedding and an MLP. The recommendation data is high-dimensional and sparse and the embedding table is large, which leads to problems such as increased memory usage and higher training latency. Common pruning and AutoML methods cannot compress training memory, hash-based methods lose accuracy, and traditional low-precision training methods can only use INT16 and do not consider how to use an adaptive quantization step size. In the quantization method based on the adaptive quantization step size provided in this application, when the click-through rate prediction model is trained offline, continuous features are first normalized and then automatically discretized.

During offline training, in each batch the batch embedding is taken from the low-precision embedding table; de-quantization yields the full-precision representation of the low-precision parameters, which is used for the MLP-layer computation that finally outputs the predicted values. In the training phase, the loss function is computed from the predicted values and the true labels, and backward gradient computation yields the full-precision gradient of the batch embedding; the batch embedding module is updated based on the full-precision batch gradient and the quantization step size is adaptively updated; the batch embedding is quantized into low-precision parameters based on the adaptive quantization step size; and the low-precision batch embedding is then written back into the embedding table.
In the online inference phase, the embedding corresponding to the input data is read from the low-precision embedding table and de-quantized into the full-precision embedding, which is used as the input of the click-through rate prediction model to output the prediction result.
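The online inference path can be sketched as follows (a toy illustration: the table layout, per-feature step sizes, and the stand-in `toy_mlp` scoring function are assumptions, not this application's model):

```python
import numpy as np

def predict_ctr(table_q, deltas, feature_ids, mlp):
    """Look up low-precision embeddings, de-quantize, and score with the model."""
    delta = deltas[feature_ids]
    emb = table_q[feature_ids].astype(np.float32) * delta[:, None]  # de-quantize
    return mlp(emb.reshape(-1))                                     # model forward pass

# A stand-in for the MLP part of the CTR model: a sigmoid over the summed features.
toy_mlp = lambda x: 1.0 / (1.0 + np.exp(-x.sum()))
```

Only the rows for the incoming feature ids are de-quantized, so the table itself stays at low precision in serving memory.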
Illustratively, some public datasets, such as the Avazu dataset and the Criteo dataset, are used to compare several existing quantization approaches with the quantization approach provided in this application. The statistics of the datasets are shown in Table 1.

Table 1
In these datasets, the training set and the test set are split by user: 90% of the users form the training set and 10% the test set. Discrete features are one-hot encoded and continuous features are discretized. The evaluation metric is AUC (Area Under Curve).

The existing quantization approaches include, for example, the full-precision method (FP), the quantization-aware method LSQ, the quantization-aware method with dynamic step size PACT, the INT8 low-precision training method (LPT), and the INT16 low-precision training method (LPT-16). The approach provided in this application can be based on different ways of computing the adaptive step size, denoted the heuristic adaptive-step-size INT8 low-precision training method (ALPT_H) and the learnable adaptive-step-size INT8 low-precision training method (ALPT_L).
The comparison results are shown in Table 2.

Table 2
Table 2 above uses the deterministic rounding function; the stochastic rounding function achieves better results in low-precision training, as shown in Table 3.

Table 3
Comparing Tables 2 and 3 above: existing low-precision training approaches use deterministic truncation and do not consider an adaptive quantization step size, and can only train low-precision parameters based on INT16, which makes model convergence difficult at lower precision. Some approaches compress the embedding parameters only at the inference stage and require retraining, limiting their practicality. Some quantization approaches can compress parameters by hashing, but the unavoidable collisions of the hash function reduce accuracy. Others can train the model with INT16, but training at lower precision often fails to converge. To enable lower-precision training end-to-end, this application proposes a stochastic truncation rounding function that preserves the parameter updates carried by the gradient information during training, and proposes assigning an adaptive quantization step size to each feature so that the step size is better chosen and as much parameter information as possible is retained.
此外,基于前述的量化方法,本申请还提供一种推荐方法,如图12所示,具体可以包括:In addition, based on the aforementioned quantification method, the present application also provides a recommendation method, as shown in FIG12 , which may specifically include:
1201、获取输入数据。1201. Get input data.
其中,该输入数据可以包括用户针对终端的至少一种行为产生的数据。The input data may include data generated by at least one behavior of the user on the terminal.
例如,用户点击或者播放某个音乐时,可以采集用户点击该音乐的信息,或者用户下载或者安装某个app时,可以采集用户下载或者安装该app的信息。For example, when a user clicks on or plays a piece of music, information about the user clicking on the music can be collected; or when a user downloads or installs an app, information about the user downloading or installing the app can be collected.
1202、从低精度embedding table中获取与输入数据对应的低精度embedding。1202. Get the low-precision embedding corresponding to the input data from the low-precision embedding table.
在得到输入数据后,可以通过embedding table将输入数据转换为神经网络可识别的特征。低精度embedding Table中通常保存了原始数据和表征之间的映射关系,在得到输入数据后,即可基于该映射关系,将输入数据映射为低精度embedding。After the input data is obtained, it can be converted through the embedding table into features that the neural network can recognize. The low-precision embedding table usually stores the mapping relationship between the original data and the representations; once the input data is obtained, it can be mapped to a low-precision embedding based on this mapping relationship.
1203、根据每种特征对应的自适应步长对多种特征进行反量化,得到全精度embedding。1203. Dequantize multiple features according to the adaptive step size corresponding to each feature to obtain full-precision embedding.
在得到低精度embedding后,可以根据每种特征对应的自适应步长对每种特征进行反量化,从而可以得到全精度embedding。After obtaining the low-precision embedding, each feature can be dequantized according to the adaptive step size corresponding to each feature to obtain the full-precision embedding.
其中,反量化步骤可以参阅前述图7中的步骤702或者前述图8中的步骤804,此处不再赘述。The inverse quantization step may refer to step 702 in FIG. 7 or step 804 in FIG. 8 , which will not be described in detail here.
1204、根据全精度embedding作为神经网络的输入,输出推荐信息。1204. Use the full-precision embedding as the input of the neural network and output recommendation information.
在得到全精度embedding后,即可将得到的全精度embedding作为推荐网络的输入,输出对应的推荐信息。After obtaining the full-precision embedding, the obtained full-precision embedding can be used as the input of the recommendation network to output the corresponding recommendation information.
本申请实施方式中,在神经网络的推理过程中,可以使用自适应步长对低精度嵌入表征进行反量化得到全精度嵌入表征,因此在推理过程中可以保存或者传输低精度,通过自适应步长进行无损还原,得到全精度嵌入表征。从而可以降低嵌入表征词表所占用的存储空间,并在使用时进行无损还原。In the embodiments of this application, during inference of the neural network, the low-precision embedded representation can be dequantized with the adaptive step size to obtain the full-precision embedded representation. Therefore, the low-precision representation can be stored or transmitted during inference and losslessly restored through the adaptive step size to obtain the full-precision embedded representation. This reduces the storage space occupied by the embedded representation vocabulary while allowing lossless restoration at use time.
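Steps 1201 to 1204 can be sketched end-to-end as below. The table layout, the key names, and the `predict` stand-in for the recommendation network are illustrative assumptions; the point is that only the integer codes plus one step size per feature need to be stored, and full precision is restored only when the embeddings are used.

```python
def recommend(input_keys, table_q, steps, predict):
    # 1201/1202: look up the low-precision (e.g. INT8) codes for each
    # behavior-derived feature key in the low-precision embedding table.
    # 1203: dequantize each feature with its own adaptive step size.
    full_embeddings = [[c * steps[key] for c in table_q[key]]
                       for key in input_keys]
    # 1204: feed the full-precision embeddings to the recommendation
    # network (represented here by the `predict` callable).
    return predict(full_embeddings)
```

As a usage example, a table row `[64, -32, 127]` stored with step `0.01` is restored to roughly `[0.64, -0.32, 1.27]` before being handed to the network.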
前述对本申请提供的方法流程进行了介绍,下面基于前述的方法流程,对本申请提供的装置进行介绍。The above is an introduction to the method flow provided by the present application. Based on the above method flow, the following is an introduction to the device provided by the present application.
参阅图13,本申请提供的一种量化装置的结构示意图,包括:Referring to FIG. 13 , a schematic diagram of a quantization device provided by the present application includes:
获取模块1301,用于获取全精度嵌入表征,嵌入表征包括多种特征;An acquisition module 1301 is used to acquire a full-precision embedded representation, where the embedded representation includes multiple features;
确定模块1302,用于确定多种特征中每种特征分别对应的自适应步长;A determination module 1302 is used to determine an adaptive step size corresponding to each of the multiple features;
量化模块1303,用于根据每种特征对应的自适应步长分别对多种特征进行量化,得到低精度嵌入表征,低精度嵌入表征中的特征的精度低于全精度嵌入表征中特征的精度。The quantization module 1303 is used to quantize multiple features according to the adaptive step size corresponding to each feature to obtain a low-precision embedded representation, where the accuracy of the features in the low-precision embedded representation is lower than the accuracy of the features in the full-precision embedded representation.
在一种可能的实施方式中,低精度嵌入表征词表应用于神经网络,In one possible implementation, a low-precision embedding representation vocabulary is applied to a neural network.
获取模块1301,具体用于从低精度嵌入表征词表中获取与当前次迭代的输入数据对应的表征,得到当前次迭代的低精度嵌入表征;对当前次迭代的低精度嵌入表征进行反量化,得到当前次迭代的全精度嵌入表征。The acquisition module 1301 is specifically used to obtain the representation corresponding to the input data of the current iteration from the low-precision embedding representation vocabulary to obtain the low-precision embedding representation of the current iteration; dequantize the low-precision embedding representation of the current iteration to obtain the full-precision embedding representation of the current iteration.
在一种可能的实施方式中,确定模块1302,具体用于:将当前次迭代的全精度嵌入表征作为神经网络的输入,得到当前次迭代的预测结果对应的全精度梯度;根据全精度梯度更新全精度嵌入表征,得到更新后的全精度嵌入表征;根据全精度梯度获取更新后的全精度嵌入表征中每种特征分别对应的自适应步长。In one possible implementation, the determination module 1302 is specifically configured to: use the full-precision embedded representation of the current iteration as the input of the neural network to obtain the full-precision gradient corresponding to the prediction result of the current iteration; update the full-precision embedded representation according to the full-precision gradient to obtain an updated full-precision embedded representation; and obtain, according to the full-precision gradient, the adaptive step size corresponding to each feature in the updated full-precision embedded representation.
在一种可能的实施方式中,量化模块1303,具体用于根据每种特征分别对应的自适应步长,对当前次迭代的全精度低维表征中的多种特征进行量化,得到低精度嵌入表征。In a possible implementation, the quantization module 1303 is specifically configured to quantize multiple features in the full-precision low-dimensional representation of the current iteration according to the adaptive step size corresponding to each feature, so as to obtain a low-precision embedded representation.
在一种可能的实施方式中,获取模块,还用于根据低精度嵌入表征更新低精度嵌入表征词表,得到更新后的低精度嵌入表征词表。In a possible implementation, the acquisition module is further configured to update the low-precision embedding representation vocabulary according to the low-precision embedding representation to obtain an updated low-precision embedding representation vocabulary.
在一种可能的实施方式中,确定模块1302,具体用于通过启发式算法计算每种特征对应的自适应步长。In a possible implementation, the determination module 1302 is specifically configured to calculate the adaptive step size corresponding to each feature by using a heuristic algorithm.
在一种可能的实施方式中,确定模块1302,具体用于根据每种特征中权重绝对值计算每种特征对应的自适应步长。In a possible implementation, the determination module 1302 is specifically configured to calculate the adaptive step size corresponding to each feature according to the absolute value of the weight in each feature.
在一种可能的实施方式中,量化模块1303,具体用于:根据每种特征对应的自适应步长,得到每种特征的离散特征;通过随机截断算法对每种特征的离散特征进行截断,得到低精度嵌入表征。In a possible implementation, the quantization module 1303 is specifically used to: obtain a discrete feature of each feature according to the adaptive step size corresponding to each feature; and truncate the discrete feature of each feature by a random truncation algorithm to obtain a low-precision embedded representation.
在一种可能的实施方式中,低精度嵌入表征词表应用于语言模型或者推荐模型,语言模型用于获取语料的语义信息,推荐模型用于根据用户的信息生成推荐信息。In a possible implementation, the low-precision embedding representation vocabulary is applied to a language model or a recommendation model. The language model is used to obtain semantic information of the corpus, and the recommendation model is used to generate recommendation information based on user information.
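One training iteration combining the acquisition, determination, and quantization modules described above might look like the following sketch. The SGD update rule, the heuristic step recomputation, and all names are assumptions for illustration; the full-precision row exists only transiently, while the table persistently stores the low-precision codes plus one step size per feature.

```python
import math
import random

def train_step(table_q, steps, key, grad, lr=0.01, bits=8):
    qmax = 2 ** (bits - 1) - 1
    # Acquisition module: dequantize the stored row back to full precision.
    full = [c * steps[key] for c in table_q[key]]
    # Full-precision gradient update (plain SGD as a stand-in for the
    # optimizer; `grad` is the full-precision gradient for this row).
    full = [w - lr * g for w, g in zip(full, grad)]
    # Determination module: recompute the feature's adaptive step size
    # (heuristic: largest absolute weight mapped to qmax; assumes the
    # updated row is not all zeros).
    new_step = max(abs(w) for w in full) / qmax
    # Quantization module: stochastic rounding, then clip to the INT8 range.
    codes = []
    for w in full:
        x = w / new_step
        lo = math.floor(x)
        codes.append(max(-qmax - 1,
                         min(qmax, lo + (1 if random.random() < x - lo else 0))))
    table_q[key] = codes
    steps[key] = new_step
```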
参阅图14,本申请提供的一种推荐装置的结构示意图,包括:Referring to FIG. 14 , a schematic structural diagram of a recommendation device provided by the present application includes:
输入模块1401,用于获取输入数据,输入数据包括用户针对终端的至少一种行为产生的数据;An input module 1401 is used to obtain input data, where the input data includes data generated by at least one behavior of a user on a terminal;
获取模块1402,用于从低精度嵌入表征词表中获取与输入数据对应的低精度嵌入表征,低精度嵌入表征中包括多种特征;An acquisition module 1402 is used to acquire a low-precision embedding representation corresponding to the input data from a low-precision embedding representation vocabulary, where the low-precision embedding representation includes multiple features;
反量化模块1403,用于根据多种特征中每种特征对应的自适应步长对多种特征进行反量化,得到全精度嵌入表征;A dequantization module 1403, configured to dequantize the multiple features according to an adaptive step size corresponding to each of the multiple features to obtain a full-precision embedded representation;
推荐模块1404,用于根据全精度嵌入表征作为神经网络的输入,输出推荐信息,推荐信息用于针对用户的至少一种行为进行推荐。The recommendation module 1404 is configured to use the full-precision embedded representation as the input of the neural network and output recommendation information, where the recommendation information is used to make a recommendation for at least one behavior of the user.
在一种可能的实施方式中,神经网络包括语言模型或者推荐模型,语言模型用于获取语料的语义信息,推荐模型用于根据用户的信息生成推荐信息。In a possible implementation, the neural network includes a language model or a recommendation model, the language model is used to obtain semantic information of the corpus, and the recommendation model is used to generate recommendation information based on user information.
请参阅图15,本申请提供的另一种量化装置的结构示意图,如下所述。Please refer to FIG. 15 , which is a schematic diagram of the structure of another quantization device provided in the present application, as described below.
该量化装置可以包括处理器1501和存储器1502。该处理器1501和存储器1502通过线路互联。其中,存储器1502中存储有程序指令和数据。The quantization device may include a processor 1501 and a memory 1502. The processor 1501 and the memory 1502 are interconnected via a line. The memory 1502 stores program instructions and data.
存储器1502中存储了前述图6-图8中的步骤对应的程序指令以及数据。The memory 1502 stores program instructions and data corresponding to the steps in the aforementioned FIGS. 6 to 8 .
处理器1501用于执行前述图6-图8中任一实施例所示的量化装置执行的方法步骤。The processor 1501 is used to execute the method steps performed by the quantization device shown in any of the embodiments in FIG. 6 to FIG. 8 .
可选地,该量化装置还可以包括收发器1503,用于接收或者发送数据。Optionally, the quantization device may further include a transceiver 1503 for receiving or sending data.
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有程序,当其在计算机上运行时,使得计算机执行如前述图6-图8所示实施例描述的方法中的步骤。A computer-readable storage medium is also provided in an embodiment of the present application. The computer-readable storage medium stores a program, which, when executed on a computer, enables the computer to execute the steps of the method described in the embodiments shown in the aforementioned Figures 6 to 8.
可选地,前述的图15中所示的量化装置为芯片。Optionally, the quantization device shown in the aforementioned FIG. 15 is a chip.
请参阅图16,本申请提供的另一种推荐装置的结构示意图,如下所述。Please refer to FIG. 16 , which is a schematic structural diagram of another recommendation device provided by the present application, as described below.
该推荐装置可以包括处理器1601和存储器1602。该处理器1601和存储器1602通过线路互联。其中,存储器1602中存储有程序指令和数据。The recommendation device may include a processor 1601 and a memory 1602. The processor 1601 and the memory 1602 are interconnected via a line. The memory 1602 stores program instructions and data.
存储器1602中存储了前述图12中的步骤对应的程序指令以及数据。The memory 1602 stores program instructions and data corresponding to the steps in FIG. 12 .
处理器1601用于执行前述图12所示的推荐装置执行的方法步骤。The processor 1601 is used to execute the method steps performed by the recommendation device shown in FIG. 12 .
可选地,该推荐装置还可以包括收发器1603,用于接收或者发送数据。Optionally, the recommendation device may further include a transceiver 1603 for receiving or sending data.
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有程序,当其在计算机上运行时,使得计算机执行如前述图12所示实施例描述的方法中的步骤。A computer-readable storage medium is also provided in an embodiment of the present application. The computer-readable storage medium stores a program, which, when executed on a computer, enables the computer to execute the steps of the method described in the embodiment shown in FIG. 12 above.
可选地,前述的图16中所示的推荐装置为芯片。Optionally, the recommendation device shown in the aforementioned FIG. 16 is a chip.
本申请实施例还提供了一种推荐装置,该推荐装置也可以称为数字处理芯片或者芯片,芯片包括处理单元和通信接口,处理单元通过通信接口获取程序指令,程序指令被处理单元执行,处理单元用于执行前述图11的方法步骤。An embodiment of the present application also provides a recommendation device, which can also be called a digital processing chip or chip. The chip includes a processing unit and a communication interface. The processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit. The processing unit is used to execute the method steps of the aforementioned Figure 11.
本申请实施例还提供了一种推荐装置,该推荐装置也可以称为数字处理芯片或者芯片,芯片包括处理单元和通信接口,处理单元通过通信接口获取程序指令,程序指令被处理单元执行,处理单元用于执行前述图12的方法步骤。An embodiment of the present application also provides a recommendation device, which can also be called a digital processing chip or chip. The chip includes a processing unit and a communication interface. The processing unit obtains program instructions through the communication interface, and the program instructions are executed by the processing unit. The processing unit is used to execute the method steps of the aforementioned Figure 12.
本申请实施例还提供一种数字处理芯片。该数字处理芯片中集成了用于实现上述处理器1501、处理器1601,或者处理器1501、处理器1601的功能的电路和一个或者多个接口。当该数字处理芯片中集成了存储器时,该数字处理芯片可以完成前述实施例中的任一个或多个实施例的方法步骤。当该数字处理芯片中未集成存储器时,可以通过通信接口与外置的存储器连接。该数字处理芯片根据外置的存储器中存储的程序代码来实现上述实施例中推荐装置或者推荐装置执行的动作。The embodiment of the present application also provides a digital processing chip. The digital processing chip integrates a circuit and one or more interfaces for implementing the functions of the above-mentioned processor 1501, processor 1601, or processor 1501, processor 1601. When the digital processing chip integrates a memory, the digital processing chip can complete the method steps of any one or more embodiments in the above-mentioned embodiments. When the digital processing chip does not integrate a memory, it can be connected to an external memory through a communication interface. The digital processing chip implements the recommendation device or the action performed by the recommendation device in the above-mentioned embodiment according to the program code stored in the external memory.
本申请实施例中还提供一种计算机程序产品,当其在计算机上运行时,使得计算机执行如前述图6-图12所示实施例描述的方法的步骤。An embodiment of the present application also provides a computer program product which, when run on a computer, causes the computer to execute the steps of the methods described in the embodiments shown in the aforementioned FIG. 6 to FIG. 12.
本申请实施例提供的推荐装置或者推荐装置可以为芯片,芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使服务器内的芯片执行上述图6-图12所示实施例描述的方法步骤。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述无线接入设备端内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。The recommendation device or recommendation device provided in the embodiment of the present application may be a chip, and the chip includes: a processing unit and a communication unit, wherein the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin or a circuit, etc. The processing unit may execute the computer execution instructions stored in the storage unit so that the chip in the server executes the method steps described in the embodiments shown in the above-mentioned Figures 6 to 12. Optionally, the storage unit is a storage unit in the chip, such as a register, a cache, etc. The storage unit may also be a storage unit located outside the chip in the wireless access device end, such as a read-only memory (ROM) or other types of static storage devices that can store static information and instructions, a random access memory (RAM), etc.
具体地,前述的处理单元或者处理器可以是中央处理器(central processing unit,CPU)、网络处理器(neural-network processing unit,NPU)、图形处理器(graphics processing unit,GPU)、数字信号处理器(digital signal processor,DSP)、专用集成电路(application specific integrated circuit,ASIC)或现场可编程逻辑门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者也可以是任何常规的处理器等。Specifically, the aforementioned processing unit or processor may be a central processing unit (CPU), a neural-network processing unit (NPU), a graphics processing unit (GPU), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor may be a microprocessor or any conventional processor.
示例性地,请参阅图17,图17为本申请实施例提供的芯片的一种结构示意图,所述芯片可以表现为神经网络处理器NPU 170,NPU 170作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路1703,通过控制器1704控制运算电路1703提取存储器中的矩阵数据并进行乘法运算。For example, please refer to FIG. 17, which is a schematic diagram of a structure of a chip provided in an embodiment of the present application. The chip can be a neural network processor NPU 170, which is mounted on the host CPU (Host CPU) as a coprocessor and assigned tasks by the Host CPU. The core part of the NPU is the operation circuit 1703, which is controlled by the controller 1704 to extract matrix data from the memory and perform multiplication operations.
在一些实现中,运算电路1703内部包括多个处理单元(process engine,PE)。在一些实现中,运算电路1703是二维脉动阵列。运算电路1703还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路1703是通用的矩阵处理器。In some implementations, the operation circuit 1703 includes multiple processing units (process engines, PEs) inside. In some implementations, the operation circuit 1703 is a two-dimensional systolic array. The operation circuit 1703 can also be a one-dimensional systolic array or other electronic circuits capable of performing mathematical operations such as multiplication and addition. In some implementations, the operation circuit 1703 is a general-purpose matrix processor.
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器1702中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器1701中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)1708中。For example, assume there is an input matrix A, a weight matrix B, and an output matrix C. The operation circuit takes the corresponding data of matrix B from the weight memory 1702 and caches it on each PE in the operation circuit. The operation circuit takes the matrix A data from the input memory 1701 and performs matrix operation with matrix B, and the partial result or final result of the matrix is stored in the accumulator 1708.
统一存储器1706用于存放输入数据以及输出数据。权重数据直接通过存储单元访问控制器(direct memory access controller,DMAC)1705被搬运到权重存储器1702中。输入数据也通过DMAC被搬运到统一存储器1706中。The unified memory 1706 is used to store input data and output data. The weight data is transferred directly to the weight memory 1702 through the direct memory access controller (DMAC) 1705. The input data is also transferred to the unified memory 1706 through the DMAC.
总线接口单元(bus interface unit,BIU)1710,用于AXI总线与DMAC和取指存储器(instruction fetch buffer,IFB)1709的交互。The bus interface unit (BIU) 1710 is used for the interaction between the AXI bus and the DMAC and instruction fetch buffer (IFB) 1709.
总线接口单元1710(bus interface unit,BIU),用于取指存储器1709从外部存储器获取指令,还用于存储单元访问控制器1705从外部存储器获取输入矩阵A或者权重矩阵B的原数据。The bus interface unit 1710 (BIU) is used for the instruction fetch memory 1709 to obtain instructions from the external memory, and is also used for the storage unit access controller 1705 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器1706,或将权重数据搬运到权重存储器1702中,或将输入数据搬运到输入存储器1701中。The DMAC is mainly used to transfer input data from the external memory DDR to the unified memory 1706, to transfer weight data to the weight memory 1702, or to transfer input data to the input memory 1701.
向量计算单元1707包括多个运算处理单元,在需要的情况下,对运算电路的输出做进一步处理,如向量乘,向量加,指数运算,对数运算,大小比较等等。主要用于神经网络中非卷积/全连接层网络计算,如批归一化(batch normalization),像素级求和,对特征平面进行上采样等。The vector calculation unit 1707 includes multiple operation processing units, which further process the output of the operation circuit when necessary, such as vector multiplication, vector addition, exponential operation, logarithmic operation, size comparison, etc. It is mainly used for non-convolutional/fully connected layer network calculations in neural networks, such as batch normalization, pixel-level summation, upsampling of feature planes, etc.
在一些实现中,向量计算单元1707能将经处理的输出的向量存储到统一存储器1706。例如,向量计算单元1707可以将线性函数和/或非线性函数应用到运算电路1703的输出,例如对卷积层提取的特征平面进行线性插值,再例如累加值的向量,用以生成激活值。在一些实现中,向量计算单元1707生成归一化的值、像素级求和的值,或二者均有。在一些实现中,处理过的输出的向量能够用作到运算电路1703的激活输入,例如用于在神经网络中的后续层中的使用。In some implementations, the vector calculation unit 1707 can store the processed output vector to the unified memory 1706. For example, the vector calculation unit 1707 can apply a linear function and/or a nonlinear function to the output of the operation circuit 1703, such as linear interpolation of the feature plane extracted by the convolution layer, and then, for example, a vector of accumulated values to generate an activation value. In some implementations, the vector calculation unit 1707 generates a normalized value, a pixel-level summed value, or both. In some implementations, the processed output vector can be used as an activation input to the operation circuit 1703, for example, for use in a subsequent layer in a neural network.
控制器1704连接的取指存储器(instruction fetch buffer)1709,用于存储控制器1704使用的指令;An instruction fetch buffer 1709 connected to the controller 1704, for storing instructions used by the controller 1704;
统一存储器1706,输入存储器1701,权重存储器1702以及取指存储器1709均为On-Chip存储器。外部存储器私有于该NPU硬件架构。Unified memory 1706, input memory 1701, weight memory 1702 and instruction fetch memory 1709 are all on-chip memories. External memories are private to the NPU hardware architecture.
其中,循环神经网络中各层的运算可以由运算电路1703或向量计算单元1707执行。Among them, the operations of each layer in the recurrent neural network can be performed by the operation circuit 1703 or the vector calculation unit 1707.
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述图6-图12的方法的程序执行的集成电路。The processor mentioned in any of the above places may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits for controlling the execution of the programs of the methods of Figures 6 to 12 above.
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。It should also be noted that the device embodiments described above are merely schematic, wherein the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the scheme of this embodiment. In addition, in the drawings of the device embodiments provided by the present application, the connection relationship between the modules indicates that there is a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、只读存储器(read only memory,ROM)、随机存取存储器(random access memory,RAM)、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述的方法。From the description of the above implementations, those skilled in the art can clearly understand that the present application can be implemented by software plus the necessary general-purpose hardware, and of course also by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, etc. In general, any function performed by a computer program can easily be implemented with corresponding hardware, and the specific hardware structures used to implement the same function can be diverse, such as analog circuits, digital circuits, or dedicated circuits. However, for the present application, a software implementation is in most cases the better implementation. Based on this understanding, the technical solution of the present application, in essence the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a computer floppy disk, USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods described in the embodiments of the present application.
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。In the above embodiments, all or part of the embodiments may be implemented by software, hardware, firmware or any combination thereof. When implemented by software, all or part of the embodiments may be implemented in the form of a computer program product.
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质,(例如,软盘、硬盘、磁带)、光介质(例如,DVD)、或者半导体介质(例如固态硬盘(solid state disk,SSD))等。The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the process or function described in the embodiment of the present application is generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.) means. The computer-readable storage medium may be any available medium that a computer can store or a data storage device such as a server or data center that includes one or more available media integrated. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state drive (SSD)), etc.
本申请的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的实施例能够以除了在这里图示或描述的内容以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。 The terms "first", "second", "third", "fourth", etc. (if any) in the specification and claims of the present application and the above-mentioned drawings are used to distinguish similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that the data used in this way can be interchangeable where appropriate, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusions, for example, a process, method, system, product or device that includes a series of steps or units is not necessarily limited to those steps or units that are clearly listed, but may include other steps or units that are not clearly listed or inherent to these processes, methods, products or devices.

Claims (26)

  1. 一种量化方法,其特征在于,包括:A quantification method, characterized by comprising:
    获取全精度嵌入表征,所述嵌入表征包括多种特征;Obtaining a full-precision embedding representation, wherein the embedding representation includes multiple features;
    确定所述多种特征中每种特征分别对应的自适应步长;Determine an adaptive step size corresponding to each of the multiple features;
    根据所述每种特征对应的自适应步长分别对所述多种特征进行量化,得到低精度嵌入表征,所述低精度嵌入表征中的特征的精度低于所述全精度嵌入表征中特征的精度。The multiple features are quantized respectively according to the adaptive step size corresponding to each feature to obtain a low-precision embedded representation, and the accuracy of the features in the low-precision embedded representation is lower than the accuracy of the features in the full-precision embedded representation.
  2. 根据权利要求1所述的方法,其特征在于,所述低精度嵌入表征词表应用于神经网络,The method according to claim 1, characterized in that the low-precision embedding representation vocabulary is applied to a neural network,
    所述获取全精度嵌入表征词表,包括:The step of obtaining a full-precision embedding representation vocabulary includes:
    从低精度嵌入表征词表中获取与当前次迭代的输入数据对应的表征,得到当前次迭代的低精度嵌入表征;Obtaining a representation corresponding to the input data of the current iteration from the low-precision embedding representation vocabulary to obtain a low-precision embedding representation of the current iteration;
    对所述当前次迭代的低精度嵌入表征进行反量化,得到当前次迭代的所述全精度嵌入表征。The low-precision embedded representation of the current iteration is dequantized to obtain the full-precision embedded representation of the current iteration.
  3. 根据权利要求2所述的方法,其特征在于,所述确定所述多种特征中每种特征分别对应的自适应步长,包括:The method according to claim 2, characterized in that the step of determining the adaptive step size corresponding to each of the multiple features comprises:
    将所述当前次迭代的全精度嵌入表征作为所述神经网络的输入,得到当前次迭代的预测结果对应的全精度梯度;Using the full-precision embedding representation of the current iteration as the input of the neural network, and obtaining the full-precision gradient corresponding to the prediction result of the current iteration;
    根据所述全精度梯度获取更新所述全精度嵌入表征,得到更新后的全精度嵌入表征;Acquire and update the full-precision embedding representation according to the full-precision gradient to obtain an updated full-precision embedding representation;
    根据所述全精度梯度获取所述更新后的全精度嵌入表征中每种特征分别对应的自适应步长。The adaptive step size corresponding to each feature in the updated full-precision embedding representation is obtained according to the full-precision gradient.
  4. 根据权利要求3所述的方法,其特征在于,所述根据所述每种特征对应的自适应步长分别对所述多种特征进行量化,包括:The method according to claim 3, characterized in that the quantizing of the multiple features respectively according to the adaptive step size corresponding to each feature comprises:
    根据所述每种特征分别对应的自适应步长,对所述当前次迭代的全精度低维表征中的多种特征进行量化,得到所述低精度嵌入表征。According to the adaptive step size corresponding to each feature, multiple features in the full-precision low-dimensional representation of the current iteration are quantized to obtain the low-precision embedded representation.
  5. 根据权利要求2-4中任一项所述的方法,其特征在于,所述方法还包括:The method according to any one of claims 2 to 4, characterized in that the method further comprises:
    根据所述低精度嵌入表征更新所述低精度嵌入表征词表,得到更新后的低精度嵌入表征词表。The low-precision embedding representation vocabulary is updated according to the low-precision embedding representation to obtain an updated low-precision embedding representation vocabulary.
  6. 根据权利要求1所述的方法,其特征在于,所述确定所述多种特征中每种特征对应的自适应步长,包括:The method according to claim 1, characterized in that the step of determining the adaptive step size corresponding to each of the multiple features comprises:
    通过启发式算法计算所述每种特征对应的自适应步长。The adaptive step size corresponding to each feature is calculated by a heuristic algorithm.
  7. The method according to claim 6, wherein the calculating the adaptive step size corresponding to each feature by using a heuristic algorithm further comprises:
    calculating the adaptive step size corresponding to each feature according to the absolute values of the weights in each feature.
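A concrete instance of this heuristic might look as follows. The claim does not specify the exact rule; the sketch below assumes the common choice of dividing a feature's largest absolute weight by the largest representable signed integer level.

```python
import numpy as np

def heuristic_step(embedding_row: np.ndarray, bits: int = 8) -> float:
    """Adaptive step size for one feature, computed from the absolute
    values of its weights: the largest |w| is mapped onto the largest
    representable signed integer level (an assumed, illustrative rule)."""
    q_max = 2 ** (bits - 1) - 1               # e.g. 127 for int8
    return float(np.max(np.abs(embedding_row))) / q_max

row = np.array([0.5, -1.27, 0.03])
step = heuristic_step(row, bits=8)            # 1.27 / 127 ≈ 0.01
```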
  8. The method according to any one of claims 1 to 7, wherein the quantizing the multiple features respectively according to the adaptive step size corresponding to each feature to obtain a low-precision embedding representation vocabulary further comprises:
    obtaining discrete features of each feature according to the adaptive step size corresponding to each feature; and
    truncating the discrete features of each feature by using a stochastic truncation algorithm to obtain the low-precision embedding representation.
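The two steps of the preceding claim (discretizing each feature by its adaptive step, then truncating stochastically) can be sketched as below. The unbiased "stochastic rounding" variant shown here is an assumption; the claim only requires some random truncation algorithm.

```python
import numpy as np

def stochastic_round(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Round down or up with probability given by the fractional part,
    so the result is unbiased in expectation (E[round(x)] = x)."""
    floor = np.floor(x)
    return floor + (rng.random(x.shape) < (x - floor))

def quantize(row: np.ndarray, step: float, bits: int = 8, rng=None) -> np.ndarray:
    """Discretize a feature row by its adaptive step size, then truncate
    the codes into the signed b-bit integer range."""
    rng = rng if rng is not None else np.random.default_rng(0)
    q_max = 2 ** (bits - 1) - 1
    codes = stochastic_round(row / step, rng)          # discrete features
    return np.clip(codes, -q_max - 1, q_max).astype(np.int8)
```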
  9. The method according to any one of claims 1 to 8, wherein the low-precision embedding representation vocabulary is applied to a language model or a recommendation model, the language model is configured to obtain semantic information of a corpus, and the recommendation model is configured to generate recommendation information according to information of a user.
  10. A recommendation method, comprising:
    obtaining input data, wherein the input data includes data generated by at least one behavior of a user on a terminal;
    obtaining, from a low-precision embedding representation vocabulary, a low-precision embedding representation corresponding to the input data, wherein the low-precision embedding representation includes multiple features;
    dequantizing the multiple features respectively according to an adaptive step size corresponding to each of the multiple features to obtain a full-precision embedding representation; and
    using the full-precision embedding representation as an input of a neural network to output recommendation information, wherein the recommendation information is used to make a recommendation for the at least one behavior of the user.
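The dequantization step above is the inverse mapping: each stored integer code is multiplied by its feature's adaptive step size to recover an approximate full-precision value, which is then fed to the neural network. A minimal sketch, assuming the linear grid `w ≈ step * q`:

```python
import numpy as np

def dequantize(codes: np.ndarray, step: float) -> np.ndarray:
    """Recover an approximate full-precision row from its integer
    codes: w ≈ step * q (assumed linear quantization grid)."""
    return step * codes.astype(np.float32)

step = 0.01                                   # adaptive step for this feature
codes = np.array([50, -127, 3], dtype=np.int8)
restored = dequantize(codes, step)            # ≈ [0.5, -1.27, 0.03]
# `restored` would then serve as the neural network's input.
```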
  11. The method according to claim 10, wherein the neural network includes a language model or a recommendation model, the language model is configured to obtain semantic information of a corpus, and the recommendation model is configured to generate recommendation information according to information of a user.
  12. A quantization apparatus, comprising:
    an obtaining module, configured to obtain a full-precision embedding representation, wherein the embedding representation includes multiple features;
    a determining module, configured to determine an adaptive step size corresponding to each of the multiple features; and
    a quantization module, configured to quantize the multiple features respectively according to the adaptive step size corresponding to each feature to obtain a low-precision embedding representation, wherein the precision of the features in the low-precision embedding representation is lower than the precision of the features in the full-precision embedding representation.
  13. The apparatus according to claim 12, wherein the low-precision embedding representation vocabulary is applied to a neural network, and
    the obtaining module is specifically configured to:
    obtain, from the low-precision embedding representation vocabulary, a representation corresponding to input data of a current iteration, to obtain a low-precision embedding representation of the current iteration; and
    dequantize the low-precision embedding representation of the current iteration to obtain the full-precision embedding representation of the current iteration.
  14. The apparatus according to claim 13, wherein the determining module is specifically configured to:
    use the full-precision embedding representation of the current iteration as an input of the neural network to obtain a full-precision gradient corresponding to a prediction result of the current iteration;
    update the full-precision embedding representation according to the full-precision gradient to obtain an updated full-precision embedding representation; and
    obtain, according to the full-precision gradient, an adaptive step size corresponding to each feature in the updated full-precision embedding representation.
  15. The apparatus according to claim 14, wherein
    the quantization module is specifically configured to quantize, according to the adaptive step size corresponding to each feature, multiple features in the full-precision low-dimensional representation of the current iteration to obtain the low-precision embedding representation.
  16. The apparatus according to any one of claims 13 to 15, wherein the obtaining module is further configured to update the low-precision embedding representation vocabulary according to the low-precision embedding representation to obtain an updated low-precision embedding representation vocabulary.
  17. The apparatus according to claim 12, wherein
    the determining module is specifically configured to calculate the adaptive step size corresponding to each feature by using a heuristic algorithm.
  18. The apparatus according to claim 17, wherein
    the determining module is specifically configured to calculate the adaptive step size corresponding to each feature according to the absolute values of the weights in each feature.
  19. The apparatus according to any one of claims 12 to 18, wherein the quantization module is specifically configured to:
    obtain discrete features of each feature according to the adaptive step size corresponding to each feature; and
    truncate the discrete features of each feature by using a stochastic truncation algorithm to obtain the low-precision embedding representation.
  20. The apparatus according to any one of claims 12 to 19, wherein the low-precision embedding representation vocabulary is applied to a language model or a recommendation model, the language model is configured to obtain semantic information of a corpus, and the recommendation model is configured to generate recommendation information according to information of a user.
  21. A recommendation apparatus, comprising:
    an input module, configured to obtain input data, wherein the input data includes data generated by at least one behavior of a user on a terminal;
    an obtaining module, configured to obtain, from a low-precision embedding representation vocabulary, a low-precision embedding representation corresponding to the input data, wherein the low-precision embedding representation includes multiple features;
    a dequantization module, configured to dequantize the multiple features according to an adaptive step size corresponding to each of the multiple features to obtain a full-precision embedding representation; and
    a recommendation module, configured to use the full-precision embedding representation as an input of a neural network to output recommendation information, wherein the recommendation information is used to make a recommendation for the at least one behavior of the user.
  22. The apparatus according to claim 21, wherein the neural network includes a language model or a recommendation model, the language model is configured to obtain semantic information of a corpus, and the recommendation model is configured to generate recommendation information according to information of a user.
  23. A quantization apparatus, wherein the quantization apparatus comprises a processor, the processor being coupled to a memory;
    the memory is configured to store a computer program; and
    the processor is configured to execute the computer program stored in the memory, so that the quantization apparatus performs the method according to any one of claims 1 to 9.
  24. A recommendation apparatus, wherein the recommendation apparatus comprises a processor, the processor being coupled to a memory;
    the memory is configured to store a computer program; and
    the processor is configured to execute the computer program stored in the memory, so that the recommendation apparatus performs the recommendation method according to claim 10 or 11.
  25. A computer program product comprising instructions, wherein when the computer program product runs on a computer, the computer is caused to perform the method according to any one of claims 1 to 11.
  26. A computer-readable storage medium, comprising instructions, wherein when the instructions are run on a computer, the computer is caused to perform the method according to any one of claims 1 to 11.
PCT/CN2023/133825 2022-11-25 2023-11-24 Quantization method and apparatus, and recommendation method and apparatus WO2024109907A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211490535.2A CN115983362A (en) 2022-11-25 2022-11-25 Quantization method, recommendation method and device
CN202211490535.2 2022-11-25

Publications (1)

Publication Number Publication Date
WO2024109907A1 true WO2024109907A1 (en) 2024-05-30

Family

ID=85971185

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/133825 WO2024109907A1 (en) 2022-11-25 2023-11-24 Quantization method and apparatus, and recommendation method and apparatus

Country Status (2)

Country Link
CN (1) CN115983362A (en)
WO (1) WO2024109907A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115983362A (en) * 2022-11-25 2023-04-18 华为技术有限公司 Quantization method, recommendation method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190012559A1 (en) * 2017-07-06 2019-01-10 Texas Instruments Incorporated Dynamic quantization for deep neural network inference system and method
CN110069715A (en) * 2019-04-29 2019-07-30 腾讯科技(深圳)有限公司 A kind of method of information recommendation model training, the method and device of information recommendation
CN112085176A (en) * 2019-06-12 2020-12-15 安徽寒武纪信息科技有限公司 Data processing method, data processing device, computer equipment and storage medium
CN112085151A (en) * 2019-06-12 2020-12-15 安徽寒武纪信息科技有限公司 Data processing method, data processing device, computer equipment and storage medium
CN115983362A (en) * 2022-11-25 2023-04-18 华为技术有限公司 Quantization method, recommendation method and device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190012559A1 (en) * 2017-07-06 2019-01-10 Texas Instruments Incorporated Dynamic quantization for deep neural network inference system and method
CN110069715A (en) * 2019-04-29 2019-07-30 腾讯科技(深圳)有限公司 A kind of method of information recommendation model training, the method and device of information recommendation
CN112085176A (en) * 2019-06-12 2020-12-15 安徽寒武纪信息科技有限公司 Data processing method, data processing device, computer equipment and storage medium
CN112085151A (en) * 2019-06-12 2020-12-15 安徽寒武纪信息科技有限公司 Data processing method, data processing device, computer equipment and storage medium
CN115983362A (en) * 2022-11-25 2023-04-18 华为技术有限公司 Quantization method, recommendation method and device

Also Published As

Publication number Publication date
CN115983362A (en) 2023-04-18

Similar Documents

Publication Publication Date Title
WO2021047593A1 (en) Method for training recommendation model, and method and apparatus for predicting selection probability
EP4145308A1 (en) Search recommendation model training method, and search result sorting method and device
CN112257858B (en) Model compression method and device
WO2023221928A1 (en) Recommendation method and apparatus, and training method and apparatus
WO2022156561A1 (en) Method and device for natural language processing
CN116415654A (en) Data processing method and related equipment
WO2022253074A1 (en) Data processing method and related device
WO2023284716A1 (en) Neural network searching method and related device
CN110781686B (en) Statement similarity calculation method and device and computer equipment
WO2024109907A1 (en) Quantization method and apparatus, and recommendation method and apparatus
WO2024213099A1 (en) Data processing method and apparatus
WO2023020613A1 (en) Model distillation method and related device
CN113434683B (en) Text classification method, device, medium and electronic equipment
WO2024083121A1 (en) Data processing method and apparatus
WO2024067373A1 (en) Data processing method and related apparatus
WO2024041483A1 (en) Recommendation method and related device
WO2024212648A1 (en) Method for training classification model, and related apparatus
WO2024199409A1 (en) Data processing method and apparatus thereof
WO2024199404A1 (en) Consumption prediction method and related device
WO2024114659A1 (en) Summary generation method and related device
WO2024179485A1 (en) Image processing method and related device thereof
CN117217284A (en) Data processing method and device
WO2024175079A1 (en) Model quantization method and related device
WO2024109910A1 (en) Generative model training method and apparatus and data conversion method and apparatus
WO2024067779A1 (en) Data processing method and related apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23893996

Country of ref document: EP

Kind code of ref document: A1