WO2022246986A1 - Data processing method, apparatus, device, and computer-readable storage medium

Data processing method, apparatus, device, and computer-readable storage medium

Info

Publication number
WO2022246986A1
WO2022246986A1 · PCT/CN2021/106602 · CN2021106602W
Authority
WO
WIPO (PCT)
Prior art keywords: model, data, network layer, quantization, quantized
Application number
PCT/CN2021/106602
Other languages
English (en)
French (fr)
Inventor
顾佳昕
吴佳祥
沈鹏程
李绍欣
Original Assignee
Tencent Cloud Computing (Beijing) Co., Ltd.
Application filed by Tencent Cloud Computing (Beijing) Co., Ltd.
Publication of WO2022246986A1
Priority to US18/300,071 (published as US20230252294A1)

Classifications

    • G06N3/0495 — Quantised networks; Sparse networks; Compressed networks
    • G06N3/08 — Learning methods (neural networks)
    • G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/09 — Supervised learning
    • G06V10/454 — Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V10/82 — Arrangements for image or video recognition or understanding using neural networks
    • G06V40/172 — Human faces: classification, e.g. identification

Definitions

  • the present application relates to the field of artificial intelligence, and in particular to a data processing method, device, equipment, and computer-readable storage medium.
  • neural network models are applied to various businesses; for example, face recognition models are applied to face detection, and noise optimization models are applied to noise reduction.
  • the representation ability of a neural network model is strongly positively correlated with its scale (parameter amount and calculation amount), and a larger scale generally yields more accurate prediction results from the network model.
  • a larger-scale neural network places higher demands on a device's configuration parameters, such as requiring more storage space, a higher operating speed, and so on. Therefore, in order to deploy a large-scale neural network on a device with limited storage space or limited power consumption, the large-scale neural network needs to be quantized.
  • how to quantize the neural network model has therefore become one of the hot research issues.
  • Embodiments of the present application provide a data processing method, apparatus, device, and computer-readable storage medium, which realize model quantization.
  • the embodiment of the present application provides a data processing method, including:
  • a first data set is used to train the first model, the first data set includes first data and training labels of the first data, the first data is unprocessed data, and the first model includes N network layers, N is a positive integer;
  • the second data set includes second data and a training label of the second data, and the second data is quantized data;
  • a first target network layer is determined from the N network layers, the first target network layer being an unquantized network layer, and the first target network layer is quantized;
  • the quantized first model is trained using the second data set, and a second target network layer is determined from the N network layers, the second target network layer being an unquantized network layer; the second target network layer is quantized, and this continues until there is no unquantized network layer among the N network layers, so as to obtain the second model.
  • the embodiment of the present application provides a data processing device, including:
  • an acquisition unit, configured to train the first model using a first data set, where the first data set includes first data and a training label of the first data, the first data is unprocessed data, and the first model includes N network layers, N being a positive integer;
  • a processing unit, configured to train the first model using the second data set, where the second data set includes second data and training labels corresponding to the second data, and the second data is quantized data; and configured to determine a first target network layer from the N network layers, the first target network layer being an unquantized network layer, and to quantize the first target network layer; and configured to train the quantized first model using the second data set and determine a second target network layer from the N network layers, the second target network layer being an unquantized network layer, and to quantize the second target network layer until there is no unquantized network layer among the N network layers, so as to obtain the second model.
  • an embodiment of the present application also provides a data processing device, including: a storage device and a processor; a computer program is stored in the storage device; and a processor executes the computer program to implement the above data processing method.
  • an embodiment of the present application further provides a computer-readable storage medium, the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the above-mentioned data processing method is realized.
  • the present application provides a computer program product or computer program
  • the computer program product or computer program includes computer instructions
  • the computer instructions are stored in a computer-readable storage medium
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium and executes them, so that the computer device performs the above-mentioned data processing method.
  • the first data set is used to train the first model
  • the second data set is used to train the first model
  • the first target network layer is determined from the N network layers
  • the first target network layer is quantized
  • the quantized first model is trained using the second data set, a second target network layer is determined from the N network layers, and the second target network layer is quantized until there is no unquantized network layer among the N network layers, so as to obtain the second model. It can be seen that, during the iterative training of the first model, updating the first model by quantizing target network layers reduces the scale of the neural network model, thereby realizing model quantization.
  • Figure 1a is a schematic structural diagram of a model quantization system provided by an embodiment of the present application.
  • Figure 1b is a schematic structural diagram of another model quantization system provided by an embodiment of the present application.
  • FIG. 2 is a flow chart of a data processing method provided by an embodiment of the present application.
  • FIG. 3 is a flow chart of another data processing method provided in the embodiment of the present application.
  • Fig. 4a is an update flow chart of a pre-training model provided by the embodiment of the present application.
  • Fig. 4b is an application scenario diagram of a quantization model provided by the embodiment of the present application.
  • Fig. 4c is an application scenario diagram of another quantization model provided by the embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the embodiment of the present application relates to the neural network model.
  • a model to be converted is obtained by inserting pseudo-quantization operators, in stages, into multiple network layers to be quantized in the model being trained; the model to be converted is then converted, the converted model is trained, and finally the quantized model corresponding to the model being trained is obtained, so as to reduce the scale of the neural network model.
  • the representation ability of a neural network model is strongly positively correlated with its scale (such as the amount of parameters and the amount of calculation); deeper and wider models often perform better than smaller ones.
  • blindly expanding the size of the model can improve the accuracy of face recognition, but it creates great obstacles to the practical application and deployment of the model, especially on mobile devices with limited computing power and power consumption. Therefore, after a full-precision pre-training model is obtained through training, each device that deploys the model compresses the pre-training model according to its own situation before deployment. Compressing the model can be understood as quantizing the model.
  • in the course of research on model quantization, the embodiments of this application propose the following model quantization schemes:
  • Post-quantization scheme: the post-quantization scheme first uses the traditional deep neural network training method to train a full-precision model for a specific model structure and loss function; the full-precision model is an unquantized model. A specific quantization method is then used to quantize the parameters of the model to an agreed number of bits, for example to int8 (integer). Next, a small batch of training data is used, for example 2,000 images, or a data volume much smaller than the size of the training set, to obtain the output range of each layer in the model, that is, the value range of the activations, and the output of each network layer in the model is then quantized; the resulting model is the quantized model. At this point, for a given network layer, the model parameters involved in the calculation and the activation output of the previous layer are quantized fixed-point numbers, the activation output of the previous layer being the input of this layer.
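  • As an illustrative, hedged sketch of this post-quantization flow (not code from the patent; `model.layers`, `forward_collect_activations`, and the calibration loader are assumed stand-ins), the snippet below quantizes trained weights to int8 and then calibrates per-layer activation ranges on a small calibration set:

```python
import numpy as np

def quantize_tensor_int8(x):
    # Symmetric int8 quantization: return fixed-point values and the scale.
    scale = np.abs(x).max() / 127.0          # 127 = 2**(8-1) - 1
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def post_quantize(model, calibration_batches):
    # 1) Quantize the trained full-precision parameters to the agreed bit number (int8).
    for layer in model.layers:
        layer.q_weight, layer.w_scale = quantize_tensor_int8(layer.weight)
    # 2) Run a small calibration set (e.g. ~2000 images) to record each layer's
    #    output (activation) range.
    max_abs = {layer: 0.0 for layer in model.layers}
    for batch in calibration_batches:
        activations = model.forward_collect_activations(batch)  # assumed helper
        for layer, act in activations.items():
            max_abs[layer] = max(max_abs[layer], float(np.abs(act).max()))
    # 3) Use the recorded ranges to quantize each layer's output.
    for layer in model.layers:
        layer.act_scale = max_abs[layer] / 127.0
    return model
```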
  • Quantization-aware training (QAT) scheme
  • in the post-quantization scheme, the model parameters are only quantized after training, so the accuracy loss caused by quantization cannot be taken into account during the training process, nor can the model parameters be adjusted for the quantization itself; that is, the impact of quantization on the accuracy of the model is not considered.
  • in quantization-aware training, pseudo-quantization nodes are inserted after the model parameters and after the activation functions to simulate the quantization process. This scheme can simulate post-quantization during training, and the quantized model is obtained after training, so the loss of recognition accuracy caused by quantization can be greatly reduced.
  • Staged, layer-by-layer model quantization training scheme: during quantization-aware training, instead of inserting all pseudo-quantization nodes at once, pseudo-quantization nodes are inserted in stages, layer by layer, following a shallow-to-deep rule. That is, each time one network layer in the model is quantized, the model is trained again, i.e., the parameters of the model are adjusted. Finally, when all the network layers that need to be quantized have been quantized and the model converges, an updated model is obtained.
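  • A minimal sketch of this staged, layer-by-layer scheme, under assumptions: `insert_fake_quant`, `needs_quantization`, and the training-step API are illustrative names, not the patent's own interfaces. Every `steps_per_stage` training steps, the shallowest not-yet-quantized layer receives a pseudo-quantization node, and training continues so the remaining layers adapt:

```python
def staged_quantization_aware_training(model, data_loader, steps_per_stage):
    # Layers that still need a pseudo-quantization node, ordered shallow -> deep.
    pending = [l for l in model.layers if l.needs_quantization]
    step = 0
    while pending or not model.converged():
        for batch, labels in data_loader:
            # Insert the next fake-quant node only every `steps_per_stage` steps.
            if pending and step % steps_per_stage == 0:
                insert_fake_quant(pending.pop(0))    # shallowest unquantized layer
            loss = model.loss(model.forward(batch), labels)
            model.backward_and_update(loss)          # parameters stay full precision
            step += 1
            if not pending and model.converged():
                return model
    return model
```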
  • the post-quantization scheme performs post-quantization directly on the full-precision model, which cannot guarantee that the quantized model achieves a good recognition effect, because the error caused by quantization is not considered during training of the full-precision model. Yet such models often require extremely high accuracy; the error introduced by model quantization can lead to wrong recognition results, and the resulting loss is immeasurable.
  • the quantization-aware training scheme can adjust the quantized model parameters to a certain extent and reduce the error caused by the quantization operation as much as possible.
  • inserting all pseudo-quantization operators at once destroys the stability of training, making the model unable to converge to the optimum. This is because the pseudo-quantization operators corresponding to the quantization operation reduce the representation ability of the model, and an excessively large jump in representation ability makes the model leave the optimum it originally converged to and fall into other suboptimal points.
  • the staged, layer-by-layer quantization training scheme splits this single "big jump" in model representation ability into several "small jumps".
  • the subsequent layers still retain the full-precision processing flow, can gradually adapt to the error caused by quantization, and gradually adjust their own parameters.
  • This "mild" model quantization-aware training method can greatly reduce the interference of quantization errors on model training, so that the quantized model trained by this method can still achieve the benefits of model size reduction and inference speed improvement. It can maintain a high recognition accuracy and meet the actual requirements of the model application.
  • FIG. 1a is a schematic structural diagram of a model quantization system provided by an embodiment of the application.
  • the model quantization system shown in Figure 1a includes a data processing device 101 and a model storage device 102. Optionally, the data processing device 101 and the model storage device 102 are both terminals, such as smart phones, tablet computers, portable personal computers, mobile Internet devices (MID), and the like; for example, the smartphones are Android phones, iOS phones, etc. Alternatively, the data processing device 101 and the model storage device 102 are both servers, such as independent physical servers, server clusters or distributed systems composed of multiple physical servers, or cloud servers providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
  • the data processing device 101 is used as a terminal and the model storage device 102 is used as a server as an example for illustration.
  • the model storage device 102 is mainly used to store the first model that has been trained.
  • the first model is trained by the model storage device 102 using the first data set, or uploaded to the model storage device by other devices after training using the first data set.
  • the first data set includes full-precision first data and training labels of the first data, and the full-precision first data refers to unprocessed first data.
  • the model storage device 102 is a node in the blockchain network, capable of storing the first model in the blockchain.
  • the blockchain is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. It is essentially a decentralized database consisting of a series of data blocks linked by cryptographic methods.
  • the distributed ledger maintained by the blockchain allows multiple parties to record transactions effectively and to verify the transactions permanently (they cannot be tampered with).
  • the data in the blockchain cannot be tampered with, and storing the first model in the blockchain can ensure the security of the first model.
  • when the first model needs to be deployed on the data processing device 101, the data processing device 101 first obtains its own configuration parameters, such as storage space, running memory, power consumption, etc., and then judges whether these configuration parameters match the deployment conditions of the first model. If they match, the first model is obtained directly from the model storage device 102 and deployed on the data processing device. If the configuration parameters of the data processing device do not match the deployment conditions of the first model, the data processing device 101 applies the staged, layer-by-layer quantization training scheme proposed above to the first model acquired from the model storage device 102 to obtain a quantized model.
  • the deployment conditions of the quantized model match the configuration parameters of the data processing device, and the quantized model is then deployed on the data processing device 101.
  • the data processing device 101 collects the data to be processed, calls the quantized model to recognize the data to be processed, and outputs the recognition result.
  • the quantized model is a face recognition model
  • the data processing device 101 collects face data to be recognized (ie, data to be processed), calls the quantized model to perform recognition processing on the face data to be recognized, and outputs a recognition result.
  • the embodiment of the present application also provides a schematic structural diagram of another model quantization system, as shown in FIG. 1b.
  • the model quantization system includes a training data module, a full-precision model training module, a staged quantization-aware training module, a quantized model conversion module, a quantized model execution module, and a model application module.
  • the training data module is mainly responsible for preprocessing the data required by the full-precision model training module and the staged quantization-aware training module.
  • in the full-precision training phase, the training data module provides original training data in a preprocessed and normalized full-precision form; in the staged quantization-aware training phase, the training data module provides quantized training data in a preprocessed and normalized quantized form.
  • the data preprocessing form required by the quantization training module needs to refer to some limitations of the subsequent quantization model execution module.
  • the commonly used TNN (a mobile deep learning inference framework) quantized-model execution framework, for example, only supports inputs in a symmetric quantized form ranging from -1 to +1, so the module needs to process the training data into the corresponding symmetric quantized form in the range -1 to +1.
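  • For instance, a small sketch of such preprocessing (the exact TNN input contract may differ; this just shows the symmetric mapping into [-1, +1]):

```python
import numpy as np

def to_symmetric_range(image_u8):
    # Map uint8 pixels in [0, 255] onto the symmetric range [-1.0, +1.0].
    return image_u8.astype(np.float32) / 127.5 - 1.0   # 0 -> -1.0, 255 -> +1.0
```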
  • the full-precision model training module is a neural network training module, which is used to provide a high-precision pre-training model for the subsequent staged quantization perception training module.
  • the full-precision model training steps are: 0) initialize the model parameters; 1) obtain training data of a specific size and its corresponding labels; 2) obtain prediction results through full-precision model inference and determine the model loss with a pre-designed loss function; 3) determine the gradient of each parameter according to the loss; 4) update the model parameters according to the pre-specified method; 5) repeat 1) to 4) until the model converges; 6) obtain the first model at full precision, where the first model is an unquantized model.
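  • Steps 0) to 6) above correspond to a standard training loop; a minimal PyTorch-flavored sketch under assumed names (the model, loader, loss function, and hyperparameters are placeholders):

```python
import torch

def train_full_precision(model, loader, loss_fn, epochs=10, lr=1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # step 0: params initialized elsewhere
    for _ in range(epochs):                                 # step 5: repeat until convergence
        for data, labels in loader:                         # step 1: batch and labels
            preds = model(data)                             # step 2: full-precision inference
            loss = loss_fn(preds, labels)                   #         and model loss
            optimizer.zero_grad()
            loss.backward()                                 # step 3: per-parameter gradients
            optimizer.step()                                # step 4: parameter update
    return model                                            # step 6: unquantized first model
```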
  • the staged quantization-aware training module is used to quantize the network layers that need to be quantized in the first model: pseudo-quantization nodes are inserted in stages, layer by layer, from shallow to deep according to the rules, and the updated first model is obtained.
  • the quantized model conversion module is used to perform model conversion on the updated first model to obtain a quantized model. Since the updated first model obtained in the staged quantization-aware training module contains pseudo-quantization operators, and the model parameters are still full precision, further processing is required.
  • the quantized model execution module is used to process the input data to be predicted and obtain the prediction result. Compared with full-precision floating-point computation, quantized fixed-point computation requires the support of corresponding low-level processor instructions.
  • the quantized model execution module uses the quantized model obtained from the quantized model conversion module to run inference on the input data and obtain the prediction result.
  • frameworks such as open source projects TNN and NCNN (a neural network forward computing framework) can provide special underlying support and optimization for int8 numerical calculations, so as to truly leverage the advantages of model quantization.
  • the model application module is used to deploy the quantized model to the data processing device.
  • the staged quantization-aware training module obtains the first model from the full-precision model training module; the first model includes N network layers and is obtained by iteratively training the initial model with the first data set. Optionally, the first data set is provided by the training data module and includes full-precision first data and the training labels of the first data.
  • full-precision data refers to unprocessed raw data, where "unprocessed" means no quantization, compression, blurring, or cropping has been applied.
  • the staged quantization-aware module obtains the second data set from the training data module and uses it to iteratively train the first model; the second data set includes the quantized second data and the training labels corresponding to the second data. For signals, quantization can be understood as converting continuous signals into discrete signals; for images, as reducing image clarity; for data, as converting high-precision data into low-precision data.
  • the target network layer is a network layer that has not been quantized in the set of network layers composed of the convolutional layers and fully connected layers in the first model. The target network layer is then quantized, for example by operating on the parameters in the target network layer with a pseudo-quantization operator, and the quantized target network layer is used to update the first model. The updated first model is then trained with the second data set: the second data is input into the updated first model, and the parameters of its N network layers are updated according to the output results of the updated first model and the training labels of the second data, so as to obtain the second model.
  • in this way, the network layers that need to be quantized in the first model can be quantized gradually, that is, in stages, until all network layers that need to be quantized in the first model are quantized and the first model converges, yielding the second model.
  • the second model is then converted by the quantized model conversion module; optionally, the network parameters in the second model are quantized and converted based on the quantization coefficients to obtain the final quantized model.
  • the quantized model execution module calls the quantized model converted by the quantized model conversion module to process the data to be processed and obtain a processing result. For example, if the converted quantized model is a face recognition model, the quantized model execution module calls the face recognition model to recognize the face data to be recognized and obtain a face recognition result; here the face data to be recognized is the data to be processed, and the face recognition result is the processing result.
  • the quantized model converted by the quantized model conversion module can also be deployed through the model application module to the execution device, that is, the data processing device.
  • FIG. 2 is a flowchart of a data processing method provided by an embodiment of the present application. The method is performed by a data processing device, and the method in the embodiment of the present application includes the following steps:
  • the first model refers to the model obtained by training the initial model to completion with full-precision training data.
  • the initial model is a face recognition model, a noise recognition model, a text recognition model, a disease prediction model, and the like.
  • the first model is obtained by iteratively training the initial model with the first data set; the first data set includes the full-precision first data and the training labels of the first data. Full-precision data refers to unprocessed raw data, where "unprocessed" means no quantization, compression, blurring, cropping, or similar processing has been applied.
  • the training label of the first data is used to optimize the parameters in the first model; optionally, the first model is trained to convergence
  • the training process of the first model includes: 1) obtaining training data of a specific size, that is, obtaining the first data in the first data set and its corresponding labels; 2) obtaining prediction results through full-precision model inference, and determining the model loss from the training labels according to the pre-designed loss function;
  • the first model includes N network layers, and N is a positive integer.
  • the second data set includes quantized second data and training labels corresponding to the second data, and the training labels corresponding to the second data are used to optimize parameters in the first model.
  • for signals, quantization can be understood as converting continuous signals into discrete signals; for images, as reducing image clarity; for data, as converting high-precision data into low-precision data, such as converting floating-point data to integer data.
  • using the second data set to train the first model means: inputting the second data into the first model, and optimizing the parameters of the N network layers of the first model according to the results output by the first model and the training labels of the second data, so that the prediction results of the optimized model are closer to the training labels of the second data than before optimization.
  • each round of training includes a forward operation and a reverse operation (the reverse operation is also called the backward operation);
  • the forward operation means that, after the training data is input into the first model, the neurons in the N network layers weight the input data and output the prediction result of the training data according to the weighted result;
  • the reverse operation means that the model loss is determined according to the prediction result, the training label corresponding to the training data, and the loss function of the first model; the gradient of each parameter is determined from the loss; and the parameters of the first model are then updated so that the prediction result of the updated first model is closer to the training label corresponding to the training data than before the update.
  • the second data set is obtained after performing quantization processing on the first data set.
  • in the quantization processing, the limitations of the quantized model at execution time also need to be considered; for example, the commonly used TNN quantized-model execution framework only supports inputs in a symmetric quantized form ranging from -1 to +1, so the training data needs to be processed into the corresponding symmetric quantized form in the range -1 to +1.
  • the data processing device uses the first data set to train the first model, and then uses the second data set to train the first model.
  • the first data set includes the first data and the training label of the first data
  • the first data is unprocessed data
  • the second data set includes the second data and the training label of the second data
  • the second data is quantized data.
  • using the first data set to train the first model refers to using the first data set to perform multiple iterative training on the first model to obtain the trained first model.
  • the target condition is a condition that needs to be satisfied to determine the target network layer.
  • the target condition is specified by the user; for example, the user specifies that the target network layer is selected and quantized when the number of iterations is the 3rd, 5th, 11th, 19th, or 23rd.
  • alternatively, the target condition is set by the developer so that the number of iterations satisfies a certain rule; for example, the developer sets the rule that a target network layer is determined and quantized every P iterations, P being a positive integer. As another example, if the current number of iterations satisfies a target rule, such as falling on a geometric or arithmetic sequence, a target network layer is selected and quantized. The target condition can also be that the data processing device selects and quantizes a target network layer when it detects that the first model has converged.
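  • These trigger rules can be expressed as a simple predicate; the sketch below is illustrative only (the iteration set and rule names are examples, not mandated by the application):

```python
def is_quantization_step(iteration, rule="every_p", p=1000,
                         user_steps=frozenset({3, 5, 11, 19, 23})):
    # Decide whether this iteration should trigger selecting and quantizing a target layer.
    if rule == "user":          # user-specified iteration numbers
        return iteration in user_steps
    if rule == "every_p":       # every P iterations
        return iteration > 0 and iteration % p == 0
    if rule == "geometric":     # iteration numbers on a geometric sequence (powers of 2)
        return iteration > 0 and (iteration & (iteration - 1)) == 0
    return False
```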
  • the first target network layer refers to an unquantized network layer.
  • the target network layer is specified by the user; for example, the user specifies that the 3rd, 10th, and 15th network layers of the first model are to be quantized one by one.
  • the target network layer is determined by the data processing device from the first model according to judgment conditions; for example, the data processing device checks the network layers one by one in order from shallow to deep.
  • suppose the layer currently being checked is the j-th network layer, meaning that the first j-1 layers do not meet the judgment conditions of the target network layer, where j is a positive integer and j is less than or equal to N; if the j-th network layer belongs to the target layer type and has not been quantized,
  • the j-th network layer is determined as the target network layer; the target layer type is, for example, a convolutional layer or a fully connected layer.
  • the process of quantizing the target network layer by the data processing device includes: obtaining a quantization coefficient, and determining a pseudo-quantization operator based on the quantization coefficient and a first parameter.
  • the first parameter refers to a parameter in the target network layer.
  • optionally, the first parameter refers to the parameter with the largest absolute value in the target network layer; the target operation is performed on the first parameter and the pseudo-quantization operator, and the result of the target operation is used to replace the parameters in the target network layer.
  • the target operation result refers to the parameters obtained by the target operation.
  • the first model is updated according to the quantized target network layer, for example, the target network layer before quantization in the first model is replaced with the quantized target network layer, so as to update the first model.
  • the parameters in one or more network layers other than the target network layer in the first model also need to be updated accordingly, so that the prediction results of the updated first model are closer to the actual results, where the actual results refer to the training labels of the second data.
  • the process of quantizing the target network layer by the data processing device is to obtain the quantization coefficient, construct a pseudo-quantization operator based on the quantization coefficient, use the pseudo-quantization operator to operate on the first parameter, and use the operation result to replace the first parameter.
  • the first parameter refers to a parameter in the first target network layer.
  • the pseudo-quantization operator is a function including a quantization coefficient, and it is used to operate on any parameter so as to pseudo-quantize that parameter.
  • the pseudo-quantization operator includes a quantization operator and an inverse quantization operator.
  • the data processing device inputs the second data into the updated first model, and updates the parameters of the network layers of the updated first model according to the output results of the updated first model and the training labels of the second data, so that the prediction results of the updated first model are closer to the actual results, thereby obtaining the quantized model.
  • the actual result refers to the training label of the second data.
  • the data processing device gradually quantizes the network layers that need to be quantized in the network model, i.e., quantizes in stages: each time, one network layer that needs to be quantized is selected and quantized, until all such network layers are quantized and the first model converges, yielding the final quantized model. Practice has shown that processing the model with the data processing method provided by this application can reduce the scale of the neural network model while retaining its representation ability, and can reduce the loss of recognition accuracy caused by directly quantizing all network layers in the neural network model.
  • the data processing device performs multiple iterations to obtain the second model: the second data set is used to train the first model, and the first target network layer is determined from the N network layers.
  • the first target network layer is an unquantized network layer.
  • the data processing device quantizes the first target network layer, trains the quantized first model with the second data set, and determines the second target network layer from the N network layers; the second target network layer is an unquantized network layer.
  • the data processing device quantizes the second target network layer, and this continues until there is no unquantized network layer among the N network layers, yielding the second model.
  • the data processing device uses the second data set to train the first model, and then quantizes the target network layer to obtain the quantized first model.
  • the condition for stopping the iterative process is that there is no unquantized network layer among the N network layers. Therefore, in each iteration, the data processing device selects at least one target network layer from the N network layers for quantization, performing quantization in multiple stages and alternating quantization with training, gradually quantizing all the network layers among the N network layers so that the model gradually adapts to the error caused by quantization. Compared with quantizing all the network layers at once, the scheme of this embodiment preserves the representation ability of the model and reduces the error caused by quantization.
  • the first model and the second data set are obtained, and the first model is trained using the second data set; the first target network layer is determined from the N network layers and quantized; the quantized first model is then trained with the second data set, and the second target network layer is determined from the N network layers and quantized, until there is no unquantized network layer among the N network layers and the second model is obtained. It can be seen that, during the iterative training of the first model, updating the first model by quantizing target network layers reduces the scale of the neural network model, thereby realizing model quantization.
  • FIG. 3 is a flowchart of another data processing method provided by an embodiment of the present application. The method is performed by a data processing device, and the method in the embodiment of the present application includes the following steps:
  • in response to a request to deploy the first model on the data processing device, the data processing device acquires the first model and then judges, according to its own configuration parameters, whether it meets the deployment conditions of the first model; the configuration parameters of the data processing device include storage space, processing capacity, power consumption, etc. In response to the configuration parameters of the data processing device not matching the deployment conditions of the first model, steps S302 to S308 (or steps S202 to S204) are executed to obtain the quantized model corresponding to the first model, and the quantized model is deployed in response to its deployment conditions matching the configuration parameters of the data processing device. Correspondingly, if the configuration parameters of the data processing device match the deployment conditions of the first model, the data processing device deploys the first model directly.
  • in the process of deploying the model on the data processing device, in response to the configuration parameters of the data processing device not matching the deployment conditions of the first model, the second data set is obtained, the unquantized first target network layer is determined from the N network layers and quantized to obtain the updated first model; the updated first model continues to be trained with the second data set, the unquantized second target network layer continues to be determined from the N network layers and is quantized, until there is no unquantized network layer among the N network layers, and the second model is obtained.
  • the data processing device quantizes and converts the network parameters in the second model based on the quantization coefficients to obtain a quantized model, whose deployment conditions match the configuration parameters of the data processing device.
  • the data processing device deploys the quantized model on the data processing device.
  • the process of quantizing and converting the network parameters in the second model based on the quantization coefficients is detailed in step S307 below and is not described here.
  • step S301 and step S302 reference may be made to the implementation manners of step S201 and step S202 in FIG. 2 , which will not be repeated here.
  • the N network layers include M convolutional layers and W fully connected layers connected in sequence, M and W are positive integers, and both M and W are smaller than N.
  • the data processing device selects an unquantized network layer from the M convolutional layers and W fully connected layers in order, and takes the selected network layer as the first target network layer. For example, in the first model, layers 3-7 are convolutional layers, layers 21-23 are fully connected layers, and layers 3 and 4 have already been quantized; the data processing device, following the shallow-to-deep order, then determines the 5th layer as the target network layer to be quantized.
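  • A sketch of that shallow-to-deep selection (layer attributes such as `kind` and `quantized` are assumed for illustration):

```python
def select_first_target_layer(layers):
    # Return the shallowest convolutional or fully connected layer not yet quantized.
    for j, layer in enumerate(layers, start=1):      # shallow -> deep over the N layers
        if layer.kind in ("conv", "fc") and not layer.quantized:
            return j, layer                          # e.g. layer 5 in the example above
    return None, None                                # nothing left to quantize
```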
  • the number of the first parameter is at least one, and the first parameter is a parameter in the first target network layer.
  • the process of obtaining the quantization coefficient by the data processing device includes: determining the number of quantization bits, which is set by the user according to the quantization requirement or preset by the developer; and determining, from the at least one first parameter, the target first parameter whose absolute value meets the requirement.
  • optionally, the target first parameter is the first parameter with the largest absolute value among the at least one first parameter.
  • the data processing device substitutes the target first parameter and the number of quantization bits into the quantization-coefficient calculation rule to obtain the quantization coefficient.
  • the data processing device determines a pseudo-quantization operator based on the quantization coefficient and the first parameter.
  • the data processing device divides the first parameter by the quantization coefficient, rounds the result of the division with a rounding function, and then multiplies the rounded result by the quantization coefficient to obtain the pseudo-quantization operator. Optionally, the determination method is shown in the following Formula 1:
  • Q = D × round(R / D)    (Formula 1)
  • where Q represents the pseudo-quantization operator, R is the first parameter, D represents the quantization coefficient, and the round() function denotes rounding to the nearest integer (a fractional part greater than or equal to 0.5 is carried up, otherwise it is discarded); MAX = max(abs(R)), where abs() is the absolute-value function, abs(R) is the absolute value of R, and max(abs(R)) is the target first parameter, that is, the first parameter with the largest absolute value.
  • the pseudo-quantization operator is thus constructed based on the quantization coefficient. Moreover, the data processing device determines the quantization coefficient from the target first parameter and the number of quantization bits: the quantization coefficient is positively correlated with the target first parameter and negatively correlated with the number of quantization bits.
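  • A hedged numpy sketch of Formula 1. The exact expression for D is not reproduced above; the common symmetric form D = MAX / (2^(b-1) - 1), which matches the stated correlations (positive with MAX, negative with the bit number), is assumed here:

```python
import numpy as np

def fake_quantize(weights, num_bits=8):
    # Pseudo-quantization Q = D * round(R / D); storage stays full precision.
    max_abs = np.abs(weights).max()            # target first parameter MAX = max(abs(R))
    d = max_abs / (2 ** (num_bits - 1) - 1)    # assumed form of quantization coefficient D
    return d * np.round(weights / d)           # Formula 1 (assumes max_abs > 0)
```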
  • after obtaining the pseudo-quantization operator, the data processing device operates on the first parameter with the pseudo-quantization operator to obtain an operation result, which includes the quantized parameter corresponding to each parameter in the first target network layer; this operation includes multiplication or division, etc. The first parameter is a parameter in the first target network layer; the parameters in the first target network layer are replaced with the quantized parameters, yielding the quantized first target network layer.
  • here, operating on the first parameter with the pseudo-quantization operator refers to applying the pseudo-quantization operator to the first parameter.
  • the above step S305 is to use the pseudo-quantization operator to operate on the first parameter, and use the operation result to replace the first parameter.
  • the data processing device updates the first model according to the quantized target network layer to obtain the updated first model; after the target network layer is updated, the second data set is used to train the updated first model, that is, the parameters of the updated first model are adjusted to obtain the second model. In other words, when the data processing device updates the parameters of one network layer of the first model according to the pseudo-quantization operator, other network layers may be affected; therefore, each time the parameters of a network layer are updated, the second data set is used to train the updated first model, adjusting the parameters in the first model so that the prediction results of the updated first model are closer to the actual results, where the actual results refer to the training labels of the second data.
  • the data processing device trains the updated first model with the second data set; if the current number of iterations satisfies the target condition and there is still a network layer to be quantized among the N network layers, that network layer is determined as the target network layer, and the step of quantizing the target network layer is triggered.
  • in this way, the data processing device can gradually quantize the network layers that need to be quantized in the network model, that is, quantize in stages: each time, one network layer that needs to be quantized is selected and quantized, until all network layers that need to be quantized have been quantized and the first model converges, yielding the final quantized model. Practice has shown that processing the model with the data processing method provided by this application can reduce the scale of the neural network model while retaining its representation ability, and can reduce the loss of recognition accuracy caused by directly quantizing all network layers in the neural network model.
  • step S306, that is, continue to use the second data set to train the quantized first model, and determine the second target network layer from the N network layers.
  • the second target network layer is an unquantized network layer.
  • the second target network layer is quantized until there is no unquantized network layer in the N network layers, and the second model is obtained.
  • Fig. 4a is a flow chart of updating a first model provided by the embodiment of the present application. As shown in Figure 4a, the update process of the first model includes Step1-Step7:
  • Step1 The data processing device acquires the first model.
  • the parameters of the first model are obtained by the full-precision model training module using the full-precision data set in the training data module to pre-train the initial model.
  • the full-precision data set is the first data set.
  • Step2 The data processing device determines the insertion timing and insertion position of the pseudo-quantization node according to the staged quantization rules.
  • the insertion timing refers to the target condition that triggers the determination of the target network layer and quantifies the target network layer.
  • the example rule corresponding to the staged, layer-by-layer quantization scheme proposed in this application is: from shallow layers to deep layers, insert a pseudo-quantization operator at the associated position of a network layer that needs to be quantized every N steps, to simulate the actual quantization operation; for example, a pseudo-quantization operator is inserted between two network layers.
  • one step refers to performing a round of forward and reverse operations on the model, that is, inputting training data into the model to obtain prediction results, and updating the model according to the prediction results and the labels of the training data.
  • Step3 When the data processing device determines in Step2 that a pseudo-quantization operator needs to be inserted into the current network layer, it inserts the pseudo-quantization operator corresponding to the current network layer according to Formula 1 above, that is, it uses the pseudo-quantization operator to update the parameters of the current network layer. For the implementation, refer to step S304 and step S305, which are not repeated here.
  • Step4 The data processing device acquires training data.
  • the training data is the training data provided by the training data module, for example, the training data is obtained after the training data module quantizes the full-precision data.
  • Step5 The data processing device performs forward processing through the first model containing the pseudo-quantization operators to determine the model loss.
  • Step6 The data processing device determines the gradient of each parameter in the pre-training model according to the loss function, and updates the parameters of the first model. It should be noted that the data processed at this time is still in the form of full precision, and the pseudo-quantization operator only simulates the quantization operation.
  • Step7 To ensure that all network layers in the first model that need quantization have been quantized, determine whether there are unquantized network layers in the first model. If there is no unquantized network layer and the first model has converged, stop the iterative updating of the first model and output the resulting second model; if there are unquantized network layers, repeat Step2 to Step6 until there is no unquantized network layer in the first model and the first model has converged, so as to obtain the second model.
  • the data processing device acquires the quantization coefficient of the pseudo-quantization operator corresponding to each quantized network layer in the second model, together with the parameters of the quantized network layer, and converts the second model according to these quantization coefficients and quantized-layer parameters to obtain the quantized model.
  • for example, each full-precision parameter R is converted into the fixed-point number Z = round(R / D), where Z is an L-bit fixed-point number and the quantization coefficient D is a full-precision number.
  • optionally, the data processing device converts the second model into the quantized model through a model conversion framework.
  • the model conversion framework includes frameworks such as tflite (a lightweight inference library) or onnx (Open Neural Network Exchange).
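  • A sketch of the per-layer conversion under the same assumed form of D: each full-precision tensor R becomes an L-bit fixed-point tensor Z plus a full-precision scale D (serialization into tflite/onnx/TNN formats is outside this sketch):

```python
import numpy as np

def convert_layer_weights(weights, num_bits=8):
    # Convert full-precision weights R into fixed-point Z with scale D, Z = round(R / D).
    d = np.abs(weights).max() / (2 ** (num_bits - 1) - 1)  # quantization coefficient D
    z = np.round(weights / d).astype(np.int8)              # L-bit fixed-point number Z
    return z, d                                            # at inference: R ≈ D * Z
```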
  • after the data processing device obtains the quantized model, it judges according to its own configuration parameters whether the quantized model meets the deployment conditions, and deploys the quantized model if it does. If the quantized model does not meet the deployment conditions, the scale of the quantized model is further reduced by adjusting the number of quantization bits, so as to obtain a quantized model that meets the deployment conditions: the smaller the number of quantization bits, the smaller the scale of the model and the lower its demands on storage space, computing power, power consumption, and so on. The data processing device can therefore adjust the deployment conditions of the resulting quantized model by adjusting the number of quantization bits used to quantize the first model, so that the deployment conditions of the quantized model match the configuration parameters of the data processing device.
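  • That matching logic can be sketched as a loop over decreasing bit widths (`quantize_model`, `storage`, and `power` are illustrative names for the quantization entry point and the deployment conditions):

```python
def fit_to_device(first_model, device, bit_widths=(8, 6, 4)):
    # Lower the quantization bit number until the model meets the deployment conditions.
    for bits in bit_widths:                 # fewer bits -> smaller model scale
        candidate = quantize_model(first_model, num_bits=bits)   # assumed entry point
        if candidate.storage <= device.storage and candidate.power <= device.power:
            return candidate                # deployment conditions now match
    raise RuntimeError("no bit width satisfies the deployment conditions")
```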
  • after the data processing device deploys the quantized model, it obtains the data to be predicted, quantizes the data to be predicted (for example, through the training data module), and calls the quantized model to process the quantized data to be predicted.
  • the quantized model is a face recognition model
  • the data processing device includes a device with an image acquisition function, such as a camera, and the data to be predicted is face data to be processed.
  • the data processing device collects the face data to be processed through the device with the image collection function, and quantizes the face data to be processed to obtain quantized face data.
  • the quantized face data is the quantized data to be predicted.
  • the data processing device determines the face area from the quantized face data, for example by cropping the quantized face data to obtain the face area, calls the face recognition model to perform face recognition on the quantized face area, and outputs the recognition result.
  • the quantization model is a speech recognition model
  • the data processing device includes a speech collection device, such as a microphone, and the data to be predicted is speech data to be recognized.
  • the data processing device collects the voice data to be recognized through the voice collection device, and quantizes the voice data to be recognized to obtain quantized voice data.
  • the quantized voice data is the quantized data to be predicted; the data processing device calls the speech recognition model to perform speech recognition on it and outputs the recognition result.
  • the quantized model can also be a predictive model, for example predicting products and videos that users may like, or a classification model, for example classifying short videos.
  • the first model and the second data set are acquired, and the first model is trained using the second data set; the unquantized first target network layer is determined from the N network layers and quantized to obtain the updated first model; the updated first model continues to be trained with the second data set, the unquantized second target network layer continues to be determined from the N network layers and is quantized, until there is no unquantized network layer among the N network layers, and the second model is obtained.
  • updating the first model by quantizing target network layers reduces the size of the neural network model; practice has shown that this not only yields a compact and efficient recognition model but also significantly reduces the interference of quantization errors with the training process, thereby improving the performance of the quantized model, for example its recognition speed and recognition accuracy.
  • the embodiment of the present application provides an application scenario of a quantization model; see FIG. 4b, which is a diagram of an application scenario of a quantization model provided in the embodiment of the present application.
  • the data processing device 401 is a camera deployed with a face recognition model.
  • the camera stores the target face to be found, such as a photo of a lost child. The camera collects the face data of people passing through the camera collection area 402, and compares these faces with the target face.
  • the data processing device 401 quantizes the face data collected in the area 402 to obtain quantized face data; for example, the face data is a face picture, and performing quantization processing on the face picture refers to adjusting its definition.
  • the data processing device 401 determines a quantized face area from the quantized face data, calls a face recognition model to perform face recognition on the quantized face area, and outputs a face recognition result.
  • performing face recognition on the quantized face area refers to detecting the similarity between the quantized face area and the target face.
  • the data processing device 403 is an access control device deployed with a face recognition model, which stores the face of the target user with the permission to open the door; in response to detecting a door-opening request, the access control device collects the face of the requesting user; if the face of the requesting user matches the face of the target user, the door is opened, and if there is no match, prompt information is output, which is used to prompt that the requesting user does not have the permission to open the door.
  • the data processing device 403 quantizes the face data collected in the camera collection area 404 to obtain quantized face data; for example, the face data is a face picture, and performing quantization processing on the face picture refers to adjusting the sharpness of the face picture.
  • the data processing device 403 determines the face area from the quantized face data and calls the face recognition model to perform face recognition on the quantized face area; if the face recognition passes, the door is opened; if it does not pass (the similarity is lower than the threshold), the device prompts that the requesting user does not have the permission to open the door.
  • performing face recognition on the quantized face area refers to detecting the similarity between the quantized face area and the target user's face; if the similarity is higher than the threshold, the face recognition passes, and if the similarity is not higher than the threshold, the face recognition fails; a sketch of this check is given below.
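A minimal sketch of this threshold check follows; representing faces as embedding vectors, using cosine similarity, and the 0.6 threshold are all assumptions, since this application only states that recognition passes when the similarity exceeds a threshold:

```python
import numpy as np

def face_recognition_passes(face_embedding, target_embedding, threshold=0.6):
    """Compare a quantized face region's embedding with the target user's.

    Recognition passes only if the similarity is higher than the threshold,
    mirroring the pass/fail rule described above.
    """
    a = face_embedding / np.linalg.norm(face_embedding)
    b = target_embedding / np.linalg.norm(target_embedding)
    similarity = float(np.dot(a, b))  # cosine similarity in [-1, 1]
    return similarity > threshold
```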
  • FIG. 5 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the device can be mounted on the data processing device 101 shown in FIG. 1a or the model storage device 102 .
  • the data processing apparatus shown in FIG. 5 can be used to execute some or all of the functions in the method embodiments described above in FIG. 2 and FIG. 3 . Among them, the detailed description of each unit is as follows:
  • the obtaining unit 501 is configured to train the first model by using a first data set, the first data set includes first data and a training label of the first data, the first data is unprocessed data,
  • the first model includes N network layers, where N is a positive integer;
  • the processing unit 502 is configured to use the second data set to train the first model, the second data set including second data and a training label corresponding to the second data, the second data being quantized data; to determine the first target network layer from the N network layers, the first target network layer being an unquantized network layer, and quantize the first target network layer; and to use the second data set to train the quantized first model, determine a second target network layer from the N network layers, the second target network layer being an unquantized network layer, and quantize the second target network layer, until there is no unquantized network layer in the N network layers, so that the second model is obtained.
  • the processing unit 502 is configured to:
  • a pseudo-quantization operator is used to operate on the first parameter, and an operation result is used to replace the first parameter, where the first parameter refers to a parameter in the first target network layer.
  • the quantity of the first parameter is at least one; the processing unit 502 is configured to:
  • a quantization coefficient is determined according to the target first parameter and the quantization bit, the quantization coefficient is positively correlated with the target first parameter, and the quantization coefficient is negatively correlated with the quantization bit.
  • the processing unit 502 is configured to:
  • the result of the rounding operation is multiplied by the quantization coefficient to obtain the operation result.
  • the N network layers include M convolutional layers and W fully connected layers connected in sequence, M and W are positive integers, and both M and W are less than N; the processing unit 502 is configured to:
  • processing unit 502 is further configured to:
  • the target condition includes: the current number of iterations is divisible by P, and P is a positive integer.
  • the processing unit 502 is configured to:
  • the network parameters in the second model are quantized and converted based on the quantization coefficients to obtain a quantized model.
  • the processing unit 502 is configured to:
  • the second model is converted according to the quantization coefficient of the pseudo-quantization operator corresponding to the quantized network layer and the parameters of the quantized network layer to obtain a quantized model.
  • processing unit 502 is further configured to:
  • the quantization model is deployed in the data processing device.
  • the quantization model is a face recognition model; the processing unit 502 is also used for:
  • step S201 and step S202 shown in FIG. 2 may be executed by the obtaining unit 501 shown in FIG. 5
  • step S203 and step S204 may be executed by the processing unit 502 shown in FIG. 5
  • step S301 and step S302 shown in FIG. 3 may be executed by the acquiring unit 501 shown in FIG. 5
  • steps S303 to S308 may be executed by the processing unit 502 shown in FIG. 5 .
  • each unit in the data processing apparatus shown in FIG. 5 may be separately or wholly combined into one or several other units, or one or some of the units may be further split into multiple functionally smaller units, achieving the same operations without affecting the technical effects of the embodiments of the present application.
  • the above units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units may be realized by one unit. In other embodiments of the present application, the data processing apparatus includes other units; in practical applications, these functions may also be implemented with the assistance of other units and cooperatively by multiple units.
  • according to another embodiment of the present application, the data processing apparatus shown in FIG. 5 can be constructed, and the data processing method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the corresponding methods shown in FIG. 2 and FIG. 3 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM).
  • the computer program can be recorded on, for example, a computer-readable recording medium, loaded into the above-mentioned computing device via the computer-readable recording medium, and executed therein.
  • the problem-solving principle and beneficial effects of the data processing apparatus provided in the embodiment of the present application are similar to those of the data processing apparatus in the method embodiments of the present application; refer to the principle and beneficial effects of the implementation of the method, which, for the sake of brevity, are not repeated here.
  • FIG. 6 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the data processing device includes at least a processor 601 , a communication interface 602 and a memory 603 .
  • the processor 601, the communication interface 602 and the memory 603 may be connected through a bus or in other ways.
  • the processor 601 (or Central Processing Unit, CPU) is the computing core and control core of the terminal, which can parse various instructions in the terminal and process various data of the terminal; for example, the CPU can be used to parse power-on/off instructions sent by the user to the terminal and control the terminal to perform power-on/off operations; as another example, the CPU can transmit various interaction data among the internal structures of the terminal, and so on.
  • the communication interface 602 includes a standard wired interface and wireless interfaces (such as WI-FI and mobile communication interfaces), and is used to send and receive data under the control of the processor 601; the communication interface 602 can also be used for internal data transmission and interaction within the terminal.
  • the memory 603 is a storage device in the terminal, and is used to store programs and data. It can be understood that the memory 603 here may include not only a built-in memory of the terminal, but also an extended memory supported by the terminal.
  • the memory 603 provides a storage space, which stores the operating system of the terminal, which may include but not limited to: Android system, iOS system, Windows Phone system, etc., which is not limited in this application.
  • the processor 601 executes the following operations by running the executable program code in the memory 603:
  • a first data set is used to train the first model, the first data set includes first data and training labels of the first data, the first data is unprocessed data, and the first model includes N network layers, N is a positive integer;
  • the second data set includes second data and training labels of the second data, the second data is quantized data;
  • the first target network layer is an unquantized network layer, and the first target network layer is quantized
  • the second target network layer is an unquantized network layer; the second target network layer is quantized until there is no unquantized network layer in the N network layers, and the second model is obtained.
  • processor 601 is further configured to perform the following operations:
  • a pseudo-quantization operator is used to operate on the first parameter, and an operation result is used to replace the first parameter, where the first parameter refers to a parameter in the first target network layer.
  • the number of the first parameter is at least one, and the processor 601 is further configured to perform the following operations:
  • a quantization coefficient is determined according to the target first parameter and the quantization bit, the quantization coefficient is positively correlated with the target first parameter, and the quantization coefficient is negatively correlated with the quantization bit.
  • processor 601 is further configured to perform the following operations:
  • the result of the rounding operation is multiplied by the quantization coefficient to obtain the operation result.
  • the N network layers include M convolutional layers and W fully connected layers connected in sequence, M and W are positive integers, and both M and W are smaller than N; the processor 601 also Used to perform the following operations:
  • processor 601 is further configured to perform the following operations:
  • the target condition includes: the current number of iterations is divisible by P, and P is a positive integer.
  • processor 601 is further configured to perform the following operations:
  • the network parameters in the second model are quantized and converted based on the quantization coefficients to obtain a quantized model.
  • processor 601 is further configured to perform the following operations:
  • the second model is converted according to the quantization coefficient of the pseudo-quantization operator corresponding to the quantized network layer and the parameters of the quantized network layer to obtain a quantized model.
  • processor 601 is further configured to perform the following operations:
  • the quantization model is deployed in the data processing device.
  • the quantization model is a face recognition model; the processor 601 is also configured to perform the following operations:
  • the problem-solving principle and beneficial effects of the data processing device provided in the embodiment of the present application are similar to those of the data processing method in the method embodiments of the present application; refer to the principle and beneficial effects of the implementation of the method, which, for the sake of brevity, are not repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium, wherein one or more instructions are stored in the computer-readable storage medium, and the one or more instructions are used to be loaded by a processor to perform the following operations:
  • a first data set is used to train the first model, the first data set includes first data and training labels of the first data, the first data is unprocessed data, and the first model includes N network layers, N is a positive integer;
  • the second data set includes second data and a training label of the second data, and the second data is quantized data;
  • the first target network layer is an unquantized network layer, and the first target network layer is quantized
  • the second target network layer is an unquantized network layer; the second target network layer is quantized until there is no unquantized network layer in the N network layers, and the second model is obtained.
  • one or more instructions are also used to be loaded by the processor to perform the following operations:
  • a pseudo-quantization operator is used to operate on the first parameter, and an operation result is used to replace the first parameter, where the first parameter refers to a parameter in the first target network layer.
  • the number of the first parameter is at least one, and one or more instructions are also used to be loaded by the processor to perform the following operations:
  • a quantization coefficient is determined according to the target first parameter and the quantization bit, the quantization coefficient is positively correlated with the target first parameter, and the quantization coefficient is negatively correlated with the quantization bit.
  • one or more instructions are also used to be loaded by the processor to perform the following operations:
  • the result of the rounding operation is multiplied by the quantization coefficient to obtain the operation result.
  • the N network layers include M convolutional layers and W fully connected layers connected in sequence, M and W are positive integers, and both M and W are less than N; one or more Instructions are also used to be loaded by the processor to:
  • one or more instructions are also used to be loaded by the processor to perform the following operations:
  • the target condition includes: the current number of iterations is divisible by P, and P is a positive integer.
  • one or more instructions are also used to be loaded by the processor to perform the following operations:
  • the network parameters in the second model are quantized and converted based on the quantization coefficients to obtain a quantized model.
  • one or more instructions are also used to be loaded by the processor to perform the following operations:
  • the second model is converted according to the quantization coefficient of the pseudo-quantization operator corresponding to the quantized network layer and the parameters of the quantized network layer to obtain a quantized model.
  • one or more instructions are also used to be loaded by the processor to perform the following operations:
  • the quantization model is deployed in the data processing device.
  • the quantized model is a face recognition model; one or more instructions are also used to be loaded by the processor to perform the following operations:
  • the embodiment of the present application also provides a computer program product including instructions, which, when run on a computer, causes the computer to execute the data processing method of the above method embodiment.
  • the embodiment of the present application also provides a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the following operations:
  • a first data set is used to train the first model, the first data set includes first data and training labels of the first data, the first data is unprocessed data, and the first model includes N network layers, N is a positive integer;
  • the second data set includes second data and a training label of the second data, and the second data is quantized data;
  • the first target network layer is an unquantized network layer, and the first target network layer is quantized
  • the second target network layer is an unquantized network layer; the second target network layer is quantized until there is no unquantized network layer in the N network layers, and the second model is obtained.
  • the processor also executes the computer instruction, so that the computer device performs the following operations:
  • a pseudo-quantization operator is used to operate on the first parameter, and an operation result is used to replace the first parameter, where the first parameter refers to a parameter in the first target network layer.
  • the number of the first parameter is at least one, and the processor further executes the computer instruction, so that the computer device performs the following operations:
  • a quantization coefficient is determined according to the target first parameter and the quantization bit, the quantization coefficient is positively correlated with the target first parameter, and the quantization coefficient is negatively correlated with the quantization bit.
  • the processor also executes the computer instruction, so that the computer device performs the following operations:
  • the result of the rounding operation is multiplied by the quantization coefficient to obtain the operation result.
  • the N network layers include M convolutional layers and W fully connected layers connected in sequence, M and W are positive integers, and both M and W are less than N; the processor also executes The computer instructions cause the computer device to perform the following operations:
  • the processor also executes the computer instruction, so that the computer device performs the following operations:
  • the target condition includes: the current number of iterations is divisible by P, and P is a positive integer.
  • the processor also executes the computer instruction, so that the computer device performs the following operations:
  • the network parameters in the second model are quantized and converted based on the quantization coefficients to obtain a quantized model.
  • the processor also executes the computer instruction, so that the computer device performs the following operations:
  • the second model is converted according to the quantization coefficient of the pseudo-quantization operator corresponding to the quantized network layer and the parameters of the quantized network layer to obtain a quantized model.
  • the processor also executes the computer instruction, so that the computer device performs the following operations:
  • the quantization model is deployed in the data processing device.
  • the quantization model is a face recognition model
  • the processor also executes the computer instructions, so that the computer device performs the following operations:
  • the modules in the device of the embodiment of the present application can be combined, divided and deleted according to actual needs.


Abstract

A data processing method, apparatus, device, and computer-readable storage medium. The method includes: training a first model with a first data set, and training the first model with a second data set; determining a first target network layer from the N network layers and quantizing the first target network layer; training the quantized first model with the second data set, continuing to determine a second target network layer from the N network layers and quantizing the second target network layer, until no unquantized network layer exists among the N network layers, to obtain a second model. Thus, during iterative training of the first model, updating the first model by quantizing target network layers achieves better model quantization.

Description

Data processing method, apparatus, device, and computer-readable storage medium
This application claims priority to Chinese Patent Application No. 202110583709.9, filed on May 27, 2021 and entitled "Data processing method, apparatus, device and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of artificial intelligence, and in particular to a data processing method, apparatus, device, and computer-readable storage medium.
Background
With the continuous development of computer technology, more and more neural network models are applied in various services; for example, face recognition models are applied to face detection, and noise optimization models are applied to noise reduction. Research has found that the representation capability of a neural network model is strongly positively correlated with its scale (parameter count and computation); simply put, the prediction accuracy of a larger neural network model is better than that of a smaller one. However, the larger the neural network, the higher the requirements it places on a device's configuration parameters at deployment time, such as more storage space and a higher running speed. Therefore, to deploy a large neural network on a device with limited storage space or limited power consumption, the large-scale neural network must be quantized. At present, how to quantize neural network models has become one of the hot research topics in the field of artificial intelligence.
Summary of the Invention
The embodiments of this application provide a data processing method, apparatus, device, and computer-readable storage medium, which implement model quantization.
In one aspect, an embodiment of this application provides a data processing method, including:
training a first model with a first data set, the first data set including first data and training labels of the first data, the first data being unprocessed data, the first model including N network layers, N being a positive integer;
training the first model with a second data set, the second data set including second data and training labels of the second data, the second data being quantized data;
determining a first target network layer from the N network layers, the first target network layer being an unquantized network layer, and quantizing the first target network layer;
training the quantized first model with the second data set, determining a second target network layer from the N network layers, the second target network layer being an unquantized network layer, and quantizing the second target network layer, until no unquantized network layer exists among the N network layers, to obtain a second model.
In one aspect, an embodiment of this application provides a data processing apparatus, including:
an obtaining unit, configured to train a first model with a first data set, the first data set including first data and training labels of the first data, the first data being unprocessed data, the first model including N network layers, N being a positive integer;
a processing unit, configured to train the first model with the second data set, the second data set including second data and training labels corresponding to the second data, the second data being quantized data; to determine a first target network layer from the N network layers, the first target network layer being an unquantized network layer, and quantize the first target network layer; and to train the quantized first model with the second data set, determine a second target network layer from the N network layers, the second target network layer being an unquantized network layer, and quantize the second target network layer, until no unquantized network layer exists among the N network layers, to obtain a second model.
Correspondingly, an embodiment of this application further provides a data processing device, including a storage apparatus and a processor; the storage apparatus stores a computer program; the processor executes the computer program to implement the above data processing method.
Correspondingly, an embodiment of this application further provides a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the above data processing method is implemented.
Correspondingly, this application provides a computer program product or computer program, the computer program product or computer program including computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the above data processing method.
In the embodiments of this application, the first model is trained with the first data set and then trained with the second data set; the first target network layer is determined from the N network layers and quantized; the quantized first model is trained with the second data set, the second target network layer is determined from the N network layers and quantized, until no unquantized network layer exists among the N network layers, and the second model is obtained. Thus, during iterative training of the first model, updating the first model by quantizing the target network layers can reduce the scale of the neural network model, thereby implementing model quantization.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or the prior art more clearly, the accompanying drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely some embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1a is a schematic structural diagram of a model quantization system provided by an embodiment of this application;
FIG. 1b is a schematic structural diagram of another model quantization system provided by an embodiment of this application;
FIG. 2 is a flowchart of a data processing method provided by an embodiment of this application;
FIG. 3 is a flowchart of another data processing method provided by an embodiment of this application;
FIG. 4a is a flowchart of updating a pre-trained model provided by an embodiment of this application;
FIG. 4b is a diagram of an application scenario of a quantization model provided by an embodiment of this application;
FIG. 4c is a diagram of another application scenario of a quantization model provided by an embodiment of this application;
FIG. 5 is a schematic structural diagram of a data processing apparatus provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of a data processing device provided by an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings in the embodiments of this application.
The embodiments of this application relate to neural network models. During iterative training, pseudo-quantization operators are inserted, stage by stage, into multiple to-be-quantized network layers of the model to be trained, yielding a model to be converted; the model to be converted is then converted, and the converted model is trained, finally producing the quantization model corresponding to the model to be trained, so as to reduce the scale of the neural network model.
The representation capability of a neural network model is strongly positively correlated with its scale (e.g., parameter count and computation); deeper and wider models usually outperform smaller ones. However, blindly enlarging a model, although it can raise face recognition accuracy, greatly obstructs practical application and deployment, especially on mobile devices with limited computing power and power consumption. Therefore, after a full-precision pre-trained model is obtained, each device deploying the model compresses it according to its own situation before deployment; compressing a model can be understood as quantizing it. In the course of research on model quantization, the embodiments of this application considered the following model quantization methods:
1) Post-quantization: a full-precision model (i.e., an unquantized model) is first trained with a conventional deep neural network training method for a specific model structure and loss function. The model parameters are then quantized to an agreed bit width with a specific quantization method, e.g., to int8 (integerization). Next, a small batch of training data, for example 2,000 images, or an amount of data far smaller than the training set, is used to obtain the output range of each layer in the model, i.e., the value range of the activation function, so that the output of each network layer can be quantized. The resulting model is the quantized model; at this point, for any network layer, both the model parameters involved in computation and the activation output of the previous layer (which is this layer's input) are quantized fixed-point numbers.
2) Quantization Aware Training (QAT): in the quantization step of post-quantization, the model parameters are merely quantized; the accuracy loss caused by quantization cannot be taken into account during training, and the parameters cannot be adjusted for quantization itself, i.e., the effect of quantization on model accuracy is not considered. In QAT, pseudo-quantization nodes are inserted after the model parameters and after the activation functions to simulate the quantization process. Since quantized processing is simulated during training and the quantized model is obtained directly when training finishes, the recognition accuracy loss caused by quantization can be greatly reduced.
3) Staged layer-by-layer quantization training: during quantization-aware training, instead of inserting all pseudo-quantization nodes at once, they are inserted from shallow layers to deep layers, stage by stage and layer by layer, according to a rule. That is, each time one network layer of the model is quantized, the model is trained, i.e., its parameters are adjusted. Finally, when all the network layers that need quantization have been quantized and the model has converged, the updated model is obtained.
Practice shows that, among the three schemes above, post-quantization quantizes the full-precision model directly and cannot guarantee that the quantized model achieves good recognition performance, because the error introduced by quantization is not considered during full-precision training. Models often demand extremely high accuracy; the error introduced by model quantization can lead to wrong recognition results, causing immeasurable losses.
QAT can adjust the quantized model parameters to some extent and minimize the error introduced by quantization, but in practice, inserting all pseudo-quantization operators at once destabilizes training and prevents the model from converging to the optimum. This is because the pseudo-quantization operators corresponding to quantization reduce the model's representation capability; an overly abrupt jump in representation capability makes the model jump out of the optimum it had converged to and fall into some other sub-optimum.
In the staged layer-by-layer quantization training scheme, staged insertion, compared with one-shot insertion, splits the "drastic change" in model representation capability into several "small jumps". After a pseudo-quantization node is inserted, subsequent layers still keep a full-precision processing flow and can gradually adapt to the quantization error and progressively adjust their own parameters. This "gentle" quantization-aware training greatly reduces the interference of quantization error with model training, so that the quantized model trained this way still maintains high recognition accuracy and meets the practical requirements of model application, while gaining the benefits of reduced model size and faster inference.
The above analysis shows that the staged layer-by-layer quantization training scheme works better in practice, so this application mainly describes that scheme in detail. Based on it, this application provides a model quantization system. FIG. 1a is a schematic structural diagram of a model quantization system provided by an embodiment of this application; the system shown in FIG. 1a includes a data processing device 101 and a model storage device 102. Optionally, both are terminals, such as smartphones (e.g., Android or iOS phones), tablets, portable personal computers, or Mobile Internet Devices (MID); or both are servers, such as independent physical servers, server clusters or distributed systems composed of multiple physical servers, or cloud servers providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
In FIG. 1a, the data processing device 101 is illustrated as a terminal and the model storage device 102 as a server. The model storage device 102 mainly stores the trained first model, which is trained with the first data set either by the model storage device 102 itself or by another device that uploads it to the model storage device 102 after training. The first data set includes full-precision first data and training labels of the first data; full-precision first data means unprocessed first data. Optionally, the model storage device 102 is a node in a blockchain network and can store the first model on a blockchain. A blockchain is a new application mode of computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and cryptographic algorithms. It is essentially a decentralized database, a chain of data blocks linked by cryptographic methods; the distributed ledger formed by the chain lets multiple parties record transactions effectively and verify them permanently (tamper-proof). Since data on a blockchain cannot be tampered with, storing the first model on the blockchain guarantees the security of the first model.
When the first model needs to be deployed on the data processing device 101, the device 101 first obtains its own configuration parameters, such as storage space, running memory, and power consumption, and then judges whether these configuration parameters match the deployment conditions of the first model. If they match, it fetches the first model from the model storage device 102 and deploys it directly on the data processing device. If they do not match, the data processing device 101 quantizes the first model fetched from the model storage device 102 using the staged layer-by-layer quantization training scheme proposed above to obtain a quantization model whose deployment conditions match the device's configuration parameters, and then deploys the quantization model on the data processing device 101.
Subsequently, the data processing device 101 collects data to be processed, invokes the quantization model to recognize the data to be processed, and outputs the recognition result. For example, if the quantization model is a face recognition model, the data processing device 101 collects face data to be recognized (i.e., the data to be processed), invokes the quantization model to recognize it, and outputs the recognition result.
Based on the above model quantization system, an embodiment of this application further provides a schematic structural diagram of another model quantization system, as shown in FIG. 1b. In FIG. 1b, the model quantization system includes a training data module, a full-precision model training module, a staged quantization-aware training module, a quantization model conversion module, a quantization model execution module, and a model application module. The training data module is mainly responsible for preprocessing the data required by the full-precision model module and the staged quantization-aware training module. Optionally, in the full-precision model training stage it provides the original training data in preprocessed, normalized full-precision form; in the staged quantization-aware training stage it provides quantized training data in preprocessed, normalized quantized form. Note that the data preprocessing form required by the quantization training module must respect the constraints of the downstream quantization model execution module; for example, the commonly used TNN (a mobile deep learning inference framework) quantized-model execution framework only supports inputs in symmetric quantized form in the range -1 to +1, so the module must process the training data into that symmetric quantized form.
The full-precision model training module is a neural network training module that provides a high-precision pre-trained model for the subsequent staged quantization-aware training module. Optionally, full-precision model training consists of: 0) initialize the model parameters; 1) fetch training data of a specific size and its corresponding labels; 2) run full-precision model inference to obtain predictions and determine the model loss with the labels according to a pre-designed loss function; 3) determine the gradient of each parameter from the loss; 4) update the model parameters in a predetermined way; 5) repeat 1)-4) until the model converges; 6) obtain the full-precision first model, which is an unquantized model.
The staged quantization-aware training module quantizes the network layers of the first model that need quantization, inserting pseudo-quantization nodes from shallow to deep, stage by stage and layer by layer according to a rule, to obtain the updated first model.
The quantization model conversion module converts the updated first model into a quantization model. Because the updated first model produced by the staged quantization-aware training module contains pseudo-quantization operators and its parameters are still full precision, further processing is needed. The quantization model execution module processes input data to be predicted to obtain prediction results. Compared with full-precision floating-point computation, quantized fixed-point computation requires corresponding low-level processor instruction support. The execution module uses the quantization model produced by the conversion module to run inference on input data and obtain prediction results. Taking int8 quantization as an example, open-source frameworks such as TNN and NCNN (a neural network forward computation framework) provide dedicated low-level support and optimization for int8 numerical computation, so that the advantages of model quantization are truly realized. The model application module deploys the quantization model to the data processing device.
In summary, when the model quantization system shown in FIG. 1b quantizes a model, the process is as follows: (1) the staged quantization-aware training module obtains the first model from the full-precision model training module; the first model includes N network layers and is obtained by iteratively training an initial model with the first data set, which, optionally, is provided by the data module and includes full-precision first data and its training labels; full-precision data means raw unprocessed data, where unprocessed includes not having undergone quantization, compression, blurring, cropping, and the like. (2) The staged quantization-aware module obtains the second data set from the data module and iteratively trains the first model with it; the second data set includes quantized second data and the corresponding training labels. For a signal, quantization can be understood as converting a continuous signal into a discrete one; for an image, as lowering the image's definition; for data, as converting high-precision data into low-precision data. (3) During iterative training, if the current iteration count is detected to satisfy a target condition, for example being divisible by P, P being a positive integer, an unquantized target network layer is determined from the N network layers; in one embodiment, the target network layer is a network layer, from the set of convolutional and fully connected layers of the first model, that has not been quantized; the target network layer is then quantized, for example by operating on its parameters with a pseudo-quantization operator, and the first model is updated with the quantized target network layer. (4) The updated first model is trained with the second data set: the second data is fed into the updated first model, and the parameters of its N network layers are updated according to the model's outputs and the training labels of the second data, yielding the second model. It can be understood that, by repeating steps (3) and (4) during iterative training, the layers of the first model that need quantization are quantized step by step, i.e., in stages, until all of them are quantized and the first model converges, giving the second model. Further, the quantization model conversion module performs quantization conversion on the second model, optionally converting the network parameters in the second model based on quantization coefficients, to obtain the final quantization model. The quantization model execution module invokes the converted quantization model to process data to be processed and obtain a processing result; for example, if the converted quantization model is a face recognition model, the execution module invokes it to recognize face data to be recognized and obtain a face recognition result, where the face data to be recognized is the data to be processed and the face recognition result is the processing result. In addition, the converted quantization model can be deployed to a data processing device through the model application module; for example, a face recognition model (the quantization model) is deployed through the model application module to a camera (the data processing device).
Referring to FIG. 2, FIG. 2 is a flowchart of a data processing method provided by an embodiment of this application. The method is performed by a data processing device and includes the following steps:
S201. Obtain a first model.
The first model is a model obtained by training an initial model with full-precision training data; the initial model is a face recognition model, a noise recognition model, a text recognition model, a disease prediction model, or the like. The first model is obtained by iteratively training the initial model with the first data set, which includes full-precision first data and training labels of the first data; full-precision data means raw unprocessed data, where unprocessed includes not having undergone quantization, compression, blurring, cropping, and the like, and the training labels of the first data are used to optimize the parameters of the first model. Optionally, the first model is a full-precision model trained to convergence; its training process includes: 1) fetch training data of a specific size, i.e., the first data in the first data set and its corresponding labels; 2) run full-precision model inference to obtain predictions, and determine the model loss with the training labels according to a pre-designed loss function; 3) determine the gradients of the parameters from the loss; 4) update the model parameters in a target way so that the optimized model's predictions are closer to the training labels of the first data than before optimization; 5) repeat 1)-4) until the model converges; 6) obtain the full-precision first model.
The first model includes N network layers, N being a positive integer. A sketch of the full-precision training loop described above is given below.
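A minimal sketch of this full-precision training loop follows, assuming a PyTorch-style model and data loader; SGD and a fixed epoch count stand in for the unspecified update rule and convergence test:

```python
import torch

def pretrain_full_precision(model, loader, loss_fn, epochs=10, lr=1e-3):
    opt = torch.optim.SGD(model.parameters(), lr=lr)  # 4) predetermined update rule (assumed SGD)
    for _ in range(epochs):                           # 5) repeat until convergence (approximated)
        for x, y in loader:                           # 1) training data and labels
            pred = model(x)                           # 2) full-precision inference
            loss = loss_fn(pred, y)                   #    loss from the pre-designed loss function
            opt.zero_grad()
            loss.backward()                           # 3) gradients of each parameter
            opt.step()
    return model                                      # 6) the full-precision first model
```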
S202. Obtain a second data set, and train the first model with the second data set.
The second data set includes quantized second data and training labels corresponding to the second data, which are used to optimize the parameters of the first model. For a signal, quantization can be understood as converting a continuous signal into a discrete one; for an image, as lowering the image's definition; for data, as converting high-precision data into low-precision data, e.g., converting floating-point data into integer data.
Training the first model with the second data set means: feeding the second data into the first model and optimizing the parameters of its N network layers according to the model's outputs and the training labels of the second data, so that the optimized model's predictions are closer to the training labels of the second data than before optimization. Optionally, each training pass includes a forward operation and a reverse operation (also called a backward operation). The forward operation means that, after the training data enters the first model, the neurons in the N network layers apply weighted processing to the input data and output a prediction for the training data based on the weighted results. The reverse operation means determining the model loss from the prediction, the training labels corresponding to the training data, and the loss function corresponding to the first model, determining the gradient of each parameter from the loss, and then updating the parameters of the first model so that the updated model's predictions are closer to the training labels than before the update.
Optionally, the second data set is obtained by quantizing the first data set; the quantization must also take into account the constraints of the quantization model at execution time. For example, the commonly used TNN quantized-model execution framework only supports inputs in symmetric quantized form in the range -1 to +1, so the training data must be processed into that symmetric quantized form. A sketch of this input preparation is given below.
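A minimal sketch of preparing such input data follows; per-tensor symmetric quantization mapped into [-1, +1] is an assumption, since the exact preprocessing is left to the executor's constraints:

```python
import numpy as np

def quantize_input_symmetric(x, num_bits=8):
    levels = 2 ** (num_bits - 1) - 1                  # 127 integer levels for int8
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / levels if max_abs > 0 else 1.0  # guard against an all-zero input
    q = np.clip(np.round(x / scale), -levels, levels)
    return q / levels                                 # symmetric quantized values in [-1, +1]
```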
From steps S201 and S202 above, the data processing device trains the first model with the first data set and then trains the first model with the second data set. The first data set includes the first data and its training labels, the first data being unprocessed data; the second data set includes the second data and its training labels, the second data being quantized data. Training the first model with the first data set means performing multiple training iterations on the first model with the first data set to obtain the trained first model.
S203. When the current iteration count satisfies a target condition, determine a first target network layer from the N network layers, quantize the first target network layer, and update the first model according to the quantized target network layer.
The target condition is the condition that must be satisfied for determining a target network layer. Optionally, the target condition is specified by the user; for example, the user specifies that a target network layer is selected and quantized when the iteration count is 3, 5, 11, 19, or 23. Optionally, the target condition is set by developers so that the iteration counts follow a certain rule; for example, the developers set that a target network layer is selected and quantized every P iterations, P being a positive integer; or a target network layer is selected and quantized when the current iteration count follows a target pattern, such as a geometric or arithmetic sequence. The target condition may also be that a target network layer is selected and quantized when the data processing device detects that the first model has converged. The first target network layer is an unquantized network layer.
In one implementation, the target network layer is specified by the user; for example, the user specifies that the 3rd, 10th, and 15th network layers of the first model are quantized one by one. Optionally, the target network layer is determined by the data processing device from the first model according to a judgment condition; for example, the device checks the layers one by one in shallow-to-deep order. Suppose the device is currently checking the j-th network layer, i.e., none of the first j-1 layers satisfies the judgment condition for the target network layer, j being a positive integer not greater than N; if the j-th layer belongs to the target layer type (e.g., a convolutional layer or a fully connected layer) and has not been quantized, the j-th layer is determined as the target network layer.
Further, the process by which the data processing device quantizes the target network layer includes: obtaining a quantization coefficient and determining a pseudo-quantization operator based on the quantization coefficient and a first parameter, the first parameter being a parameter in the target network layer; in one embodiment, the first parameter is the parameter with the largest absolute value in the target network layer. A target operation is performed between the first parameter and the pseudo-quantization operator, and the parameters of the target network layer are replaced with the target operation result, the target operation result being the parameters obtained from the target operation. The first model is updated according to the quantized target network layer, for example by replacing the pre-quantization target network layer in the first model with the quantized target network layer.
After the first model is updated according to the quantized target network layer, the parameters of one or more network layers other than the target network layer also need to be updated accordingly, so that the updated first model's predictions are closer to the actual results, the actual results being the training labels of the second data.
From the above, the process by which the data processing device quantizes the target network layer amounts to obtaining a quantization coefficient, constructing a pseudo-quantization operator based on the quantization coefficient, operating on the first parameter with the pseudo-quantization operator, and replacing the first parameter with the operation result, the first parameter being a parameter in the first target network layer.
The pseudo-quantization operator is a function that includes the quantization coefficient and is used to operate on any parameter so as to pseudo-quantize it. Optionally, the pseudo-quantization operator includes a quantization operator and a dequantization operator.
S204. Train the updated first model with the second data set to obtain the quantization model.
In one implementation, the data processing device feeds the second data into the updated first model and updates the parameters of the updated first model's network layers according to the model's outputs and the training labels of the second data, so that the updated model's predictions are closer to the actual results (the training labels of the second data), thereby obtaining the quantization model.
It can be understood that, during iterative training, by repeating steps S203 and S204, the data processing device gradually quantizes the network layers of the to-be-quantized network model that need quantization, i.e., quantizes in stages, selecting one layer that needs quantization each time, until all layers that need quantization have been quantized and the first model converges, giving the final quantization model. Practice shows that processing a model with the data processing method provided by this application can reduce the scale of the neural network model, preserve its representation capability, and reduce the recognition accuracy loss caused by quantizing all network layers of the neural network model at once.
From the above, the data processing device obtains the second model through multiple iterations: it trains the first model with the second data set and determines the first target network layer, an unquantized network layer, from the N network layers; it quantizes the first target network layer, trains the quantized first model with the second data set, and determines a second target network layer, an unquantized network layer, from the N network layers; it quantizes the second target network layer, until no unquantized network layer remains among the N network layers, giving the second model.
In each iteration, the data processing device trains the first model with the second data set and then quantizes a target network layer to obtain the quantized first model; the condition for stopping the iteration is that no unquantized network layer remains among the N network layers. Thus, in each iteration the device selects at least one target network layer from the N network layers to quantize, performing multiple quantizations in stages and alternating quantization with training, progressively quantizing all of the N network layers so that the model gradually adapts to the error introduced by quantization. Compared with quantizing all network layers at once, the scheme of this embodiment preserves the model's representation capability and reduces the error caused by quantization.
In this embodiment of this application, the first model and the second data set are obtained and the first model is trained with the second data set; the first target network layer is determined from the N network layers and quantized; the quantized first model is trained with the second data set, the second target network layer is determined from the N network layers and quantized, until no unquantized network layer remains among the N network layers, giving the second model. Thus, during iterative training of the first model, updating the first model by quantizing target network layers can reduce the scale of the neural network model, thereby implementing model quantization. A sketch of this alternating schedule is given below.
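A minimal sketch of this alternating quantize/train schedule follows; `quantize_layer` is a hypothetical helper standing in for the pseudo-quantization step S203, the layer list is assumed to be ordered shallow-to-deep, and the final convergence check is omitted:

```python
import torch

def staged_quantization_training(model, loader, loss_fn, quantize_layer,
                                 layers_to_quantize, steps_per_stage):
    opt = torch.optim.SGD(model.parameters(), lr=1e-4)
    pending = list(layers_to_quantize)      # unquantized layers, shallow to deep
    step = 0
    while pending:                          # stop once no unquantized layer remains
        for x, y in loader:
            step += 1
            if step % steps_per_stage == 0 and pending:  # target condition: divisible by P
                quantize_layer(model, pending.pop(0))    # quantize one more target layer
            loss = loss_fn(model(x), y)     # training on the quantized second data set
            opt.zero_grad()
            loss.backward()
            opt.step()
            if not pending:
                break
    return model                            # the second model (convergence check omitted)
```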
Referring to FIG. 3, FIG. 3 is a flowchart of another data processing method provided by an embodiment of this application. The method is performed by a data processing device and includes the following steps:
S301. Obtain a first model.
In one implementation, in response to a request to deploy the first model in the data processing device, the device obtains the first model and then judges, according to its own configuration parameters, whether it satisfies the deployment conditions for deploying the first model; the device's configuration parameters include storage space, processing capability, power consumption, and so on. In response to the configuration parameters of the data processing device not matching the deployment conditions of the first model, steps S302-S308 (or steps S202-S204) are executed to obtain the quantization model corresponding to the first model; in response to the deployment conditions of the quantization model matching the device's configuration parameters, the quantization model is deployed. Correspondingly, when the device's configuration parameters match the deployment conditions of the first model, the device deploys the first model directly.
From the above, the process of deploying a model in the data processing device is as follows: in response to the configuration parameters of the device not matching the deployment conditions of the first model, the device obtains the second data set, determines an unquantized first target network layer from the N network layers, quantizes the first target network layer to obtain the updated first model, continues to train the updated first model with the second data set, continues to determine an unquantized second target network layer from the N network layers and quantizes it, until no unquantized network layer remains among the N network layers, giving the second model. The device performs quantization conversion on the network parameters in the second model based on quantization coefficients to obtain a quantization model whose deployment conditions match the device's configuration parameters, and deploys the quantization model in the device.
The quantization conversion of the network parameters in the second model based on quantization coefficients is detailed in step S307 below and is not described here.
S302. Obtain a second data set, and train the first model with the second data set.
For the implementation of steps S301 and S302, refer to steps S201 and S202 in FIG. 2; details are not repeated here.
S303. When the current iteration count satisfies the target condition, determine the first target network layer from the N network layers.
In one implementation, the N network layers include M convolutional layers and W fully connected layers connected in order, M and W being positive integers both smaller than N. The data processing device selects, in order, an unquantized network layer from the M convolutional layers and W fully connected layers, and takes the selected network layer as the first target network layer. For example, if layers 3-7 of the first model are convolutional layers, layers 21-23 are fully connected layers, and layers 3 and 4 have already been quantized, the device, following the shallow-to-deep order, determines layer 5 as the target layer to be quantized. A sketch of this selection is given below.
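A sketch of this in-order selection follows, assuming the convolutional and fully connected layers are identified by their layer indices and `quantized` records the indices already processed:

```python
def select_first_target_layer(conv_layer_indices, fc_layer_indices, quantized):
    # scan the M convolutional layers, then the W fully connected layers,
    # from shallow to deep, and return the first unquantized one
    for idx in list(conv_layer_indices) + list(fc_layer_indices):
        if idx not in quantized:
            return idx
    return None  # every layer that needs quantization has been quantized
```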
S304. Obtain a quantization coefficient, and determine the pseudo-quantization operator based on the quantization coefficient and the first parameter.
In one implementation, there is at least one first parameter, a first parameter being a parameter of the first target network layer. The process by which the data processing device obtains the quantization coefficient includes: determining the quantization bit width, which is set by the user according to quantization requirements or preset by developers; and determining, from the at least one first parameter, a target first parameter that satisfies an absolute-value requirement. In one embodiment, the target first parameter is the first parameter with the largest absolute value among the at least one first parameter. Further, the device substitutes the target first parameter and the quantization bit width into the quantization-coefficient rule to compute the quantization coefficient, as sketched below.
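A minimal sketch of this computation follows, matching the relation D = MAX / (2^(L-1) - 1) given in Formula 1 below:

```python
def quantization_coefficient(params, num_bits=8):
    # target first parameter: the largest absolute value among the layer's parameters
    max_abs = max(abs(p) for p in params)
    # D grows with the target first parameter and shrinks as the bit width grows
    return max_abs / (2 ** (num_bits - 1) - 1)
```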
After obtaining the quantization coefficient, the data processing device determines the pseudo-quantization operator based on the quantization coefficient and the first parameter. In one embodiment, the device divides the first parameter by the quantization coefficient, rounds the division result with a rounding function, and multiplies the rounded result by the quantization coefficient to obtain the pseudo-quantization operator; optionally, the determination is as shown in Formula 1 below.
Formula 1:
Q = round(R / D) × D
where Q denotes the pseudo-quantization operator, R the first parameter, D the quantization coefficient, and round() rounding to the nearest integer, i.e., a fractional part greater than or equal to 0.5 is carried upward and otherwise discarded. In one embodiment,
D = MAX / (2^(L-1) - 1)
where MAX = max(abs(R)); abs() is the absolute-value function, abs(R) is the absolute value of R, and max(abs(R)) is the target first parameter, i.e., the first parameter with the largest absolute value; L is the quantization bit width; for integerization, L = 8, i.e., the quantization bit width is 8 bits.
As can be seen from Formula 1, the pseudo-quantization operator is constructed based on the quantization coefficient. Moreover, the formula for the quantization coefficient shows that the data processing device determines the quantization coefficient from the target first parameter and the quantization bit width: the quantization coefficient is positively correlated with the target first parameter and negatively correlated with the quantization bit width. A sketch of the operator is given below.
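A sketch of the pseudo-quantization operation of Formula 1 follows, reusing `quantization_coefficient` from the sketch above; round-half-up is written out explicitly because Python's built-in round() rounds half to even:

```python
import math

def fake_quantize(params, num_bits=8):
    d = quantization_coefficient(params, num_bits)  # assumes params are not all zero
    # Q = round(R / D) * D, with fractions >= 0.5 carried upward as the text specifies
    return [math.floor(r / d + 0.5) * d for r in params]
```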
S305. Operate the first parameter with the pseudo-quantization operator, and replace the first parameter in the first target network layer with the operation result.
In one implementation, after obtaining the pseudo-quantization operator, the data processing device operates the pseudo-quantization operator on the first parameter to obtain the operation result, which includes the quantized counterpart of each parameter in the first target network layer; the operation includes multiplication, division, and the like. The first parameter is a parameter in the first target network layer; the parameters of the first target network layer are replaced with the quantized parameters, giving the quantized first target network layer.
Operating the first parameter with the pseudo-quantization operator means operating on the first parameter with the pseudo-quantization operator. Step S305 therefore amounts to operating on the first parameter with the pseudo-quantization operator and replacing the first parameter with the operation result.
S306. Train the updated first model with the second data set to obtain the second model.
In one implementation, the data processing device updates the first model according to the quantized target network layer to obtain the updated first model. That is, after updating the target network layer, it trains the updated first model with the second data set, i.e., adjusts the parameters of the updated first model, to obtain the second model. In other words, after the device updates the parameters of one network layer of the first model according to the pseudo-quantization operator, the other network layers may be affected; therefore, each time one layer's parameters are updated, the updated first model must be trained with the second data set to adjust the parameters in the first model, so that the updated model's predictions are closer to the actual results, the actual results here being the training labels of the second data.
Further, while the device trains the updated first model with the second data set, when the current iteration count satisfies the target condition and a to-be-quantized network layer exists among the N network layers, the device determines the to-be-quantized layer as the target network layer and triggers the step of quantizing the target network layer.
That is, during iterative training, by repeating steps S303-S306, the data processing device can gradually quantize the network layers of the to-be-quantized network model that need quantization, i.e., quantize in stages, selecting one layer that needs quantization each time, until all such layers have been quantized and the first model converges, giving the final quantization model. Practice shows that processing a model with the data processing method provided by this application reduces the scale of the neural network model, preserves its representation capability, and reduces the recognition accuracy loss that directly quantizing all network layers at once would cause.
Step S306 thus amounts to continuing to train the quantized first model with the second data set, determining the second target network layer, an unquantized network layer, from the N network layers, and quantizing the second target network layer, until no unquantized network layer remains among the N network layers, giving the second model.
FIG. 4a is a flowchart of updating a first model provided by an embodiment of this application. As shown in FIG. 4a, the update flow of the first model includes Step1-Step7:
Step1: the data processing device obtains the first model; optionally, the parameters of the first model are obtained by the full-precision model training module pre-training an initial model with the full-precision data set from the training data module, the full-precision data set being the first data set.
Step2: the device determines, according to the staged quantization rule, the timing and position for inserting pseudo-quantization nodes; the insertion timing refers to the target condition that triggers determining a target network layer and quantizing it. An example rule corresponding to the staged layer-by-layer quantization scheme proposed in this application is: from shallow layers to deep layers, every N steps, insert a pseudo-quantization operator at the associated position of a network layer that needs quantization, to simulate the actual quantization operation; for example, insert a pseudo-quantization operator between two network layers. One step means one round of forward and backward computation over the model, i.e., feeding the training data into the model, obtaining a prediction, and updating the model according to the prediction and the labels of the training data.
Step3: when the device determines in Step2 that a pseudo-quantization operator needs to be inserted in the current network layer, it inserts the pseudo-quantization operator corresponding to the current layer according to Formula 1 above, i.e., updates the current layer's parameters through the pseudo-quantization operator; for the implementation, refer to steps S304 and S305, not repeated here.
Step4: the device obtains training data; optionally, the training data is provided by the training data module, for example obtained by the module quantizing full-precision data.
Step5: the device runs forward processing in the first model carrying the pseudo-quantization operators and determines the loss function.
Step6: the device determines the gradients of the parameters of the pre-trained model from the loss function and updates the parameters of the first model. Note that the data processed at this point is still in full-precision form; the pseudo-quantization operators merely simulate the quantization operation.
Step7: to ensure that all network layers of the first model have been quantized, judge whether any unquantized network layer remains in the first model. If no unquantized layer remains and the first model has converged, stop iteratively updating the first model and output the resulting second model; if an unquantized layer remains, continue repeating Step2-Step6 until no unquantized layer remains in the first model and the first model has converged, giving the second model.
S307. Perform quantization conversion on the network parameters in the second model based on the quantization coefficients to obtain the quantization model.
In one implementation, the data processing device obtains the quantization coefficients of the pseudo-quantization operators corresponding to the quantized network layers in the second model, together with the parameters of the quantized layers, and converts the second model according to those quantization coefficients and parameters to obtain the quantization model. The device extracts, from each pseudo-quantization operator, the quantization coefficient D of the corresponding network layer and the layer's quantized parameters Z = round(R/D); at this point Z is an L-bit fixed-point number and the quantization coefficient D is a full-precision number. For the quantization operators of activation outputs, besides extracting the quantization coefficient D, the corresponding pseudo-quantization operator is retained. After extracting these parameters, the device converts the second model into the quantization model through a model conversion framework, such as tflite (a lightweight inference library) or onnx (Open Neural Network Exchange). A sketch of the parameter extraction is given below.
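A sketch of this extraction follows, assuming each quantized layer is available as a (parameters, D) pair recorded during training; the handling of activation operators and the actual tflite/onnx export are omitted:

```python
import math

def extract_fixed_point_layers(quantized_layers):
    converted = []
    for params, d in quantized_layers:
        # Z = round(R / D): L-bit fixed-point weights; the scale D stays full precision
        z = [int(math.floor(r / d + 0.5)) for r in params]
        converted.append({"weights": z, "scale": d})
    return converted
```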
In another implementation, after obtaining the quantization model, the data processing device judges, according to its own configuration parameters, whether the quantization model satisfies the deployment conditions; if so, it deploys the quantization model. If not, it further shrinks the scale of the quantization model by adjusting the quantization bit width, so as to obtain a quantization model that satisfies the deployment conditions: the smaller the bit width, the smaller the model's scale, and the model's scale relates to the storage space, computing power, power consumption, and so on that the model requires. The device can therefore adjust the quantization bit width used to quantize the first model so as to adjust the deployment conditions of the resulting quantization model, making them match the device's configuration parameters. A rough size estimate is sketched below.
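A rough sketch of this bit-width/size trade-off follows; the linear estimate ignores scales, activations, and other overheads:

```python
def quantized_model_bytes(param_count, num_bits):
    # fewer quantization bits -> a smaller model, hence lower storage demands
    return param_count * num_bits // 8

# e.g. a 10M-parameter model: ~40 MB at 32 bits, ~10 MB after int8 quantization
assert quantized_model_bytes(10_000_000, 32) == 4 * quantized_model_bytes(10_000_000, 8)
```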
In one implementation, after deploying the quantization model, the data processing device obtains data to be predicted, quantizes the data to be predicted, for example through the training data module, and invokes the quantization model to perform data processing on the quantized data to be predicted. Optionally, the quantization model is a face recognition model, the device includes equipment with an image acquisition function, such as a camera, and the data to be predicted is face data to be processed. The device collects the face data to be processed through the image acquisition equipment and quantizes it to obtain quantized face data, which is the quantized data to be predicted; the device determines the face region from the quantized face data, for example by cropping the quantized face data to obtain the face region, invokes the face recognition model to perform face recognition on the quantized face region, and outputs the recognition result. It can be understood that determining the face region from the quantized face data further reduces the computation of the face recognition model and improves its recognition efficiency. Optionally, the quantization model is a speech recognition model, the device includes speech acquisition equipment, such as a microphone, and the data to be predicted is speech data to be recognized. The device collects the speech data to be recognized through the speech acquisition equipment and quantizes it to obtain quantized speech data, which is the quantized data to be predicted; the device invokes the speech recognition model to perform speech recognition on the quantized speech data and outputs the recognition result. Optionally, the quantization model may also be a prediction model, for example predicting goods or videos a user may like, or a classification model, for example classifying short videos. A sketch of the face recognition pipeline is given below.
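A sketch of the deployed face recognition pipeline described above follows; `crop_face` and `face_model` are hypothetical callables, and `quantize_input_symmetric` is the input-quantization sketch given earlier:

```python
def recognize_face(image, face_model, crop_face, num_bits=8):
    quantized = quantize_input_symmetric(image, num_bits)  # quantize the data to be predicted
    face_region = crop_face(quantized)                     # smaller input, less computation
    return face_model(face_region)                         # recognition result
```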
In this embodiment of this application, the first model and the second data set are obtained, and the first model is trained with the second data set; an unquantized first target network layer is determined from the N network layers and quantized to obtain the updated first model; the updated first model continues to be trained with the second data set, an unquantized second target network layer continues to be determined from the N network layers and is quantized, until no unquantized network layer remains among the N network layers, giving the second model. Thus, during iterative training of the first model, updating the first model by quantizing target network layers can reduce the scale of the neural network model. Practice shows that this progressive optimization not only yields a compact and efficient recognition model, but also markedly reduces the interference of quantization error with the training process, thereby improving the performance of the quantization model, such as its recognition speed and recognition accuracy.
Based on the above data processing method, an embodiment of this application provides an application scenario of the quantization model; see FIG. 4b, a diagram of an application scenario of a quantization model provided by an embodiment of this application. In FIG. 4b, the data processing device 401 is a camera deployed with a face recognition model; for the deployment of the face recognition model, refer to steps S201-S204 or steps S301-S307, not repeated here. The camera stores a target face to be found, such as a photo of a lost child; the camera collects the face data of people passing through the camera acquisition area 402 and compares these faces with the target face. When a face matching the target face is detected in the collected face data, prompt information is output; a face matching the target face means a face whose similarity to the target face is higher than a threshold. Optionally, the device 401 quantizes the face data collected in area 402 to obtain quantized face data; for example, the face data is a face picture, and quantizing the face picture refers to adjusting its definition. The device 401 determines the quantized face region from the quantized face data, invokes the face recognition model to perform face recognition on the quantized face region, and outputs the face recognition result. Optionally, performing face recognition on the quantized face region refers to detecting the similarity between the quantized face region and the target face.
See FIG. 4c, a diagram of another application scenario of a quantization model provided by an embodiment of this application. In FIG. 4c, the data processing device 403 is an access control device deployed with a face recognition model; it stores the face of the target user with door-opening permission. In response to detecting a door-opening request, the access control device collects the face of the user currently requesting to open the door; if the requesting user's face matches the target user's face, the door opens; if not, prompt information is output, which is used to prompt that the requesting user does not have door-opening permission. Optionally, the device 403 quantizes the face data collected in the camera acquisition area 404 to obtain quantized face data; for example, the face data is a face picture, and quantizing the face picture refers to adjusting its definition. The device 403 determines the face region from the quantized face data and invokes the face recognition model to perform face recognition on the quantized face region: if recognition passes, the door opens; if recognition fails (the similarity is below the threshold), the device prompts that the requesting user does not have door-opening permission. Optionally, performing face recognition on the quantized face region refers to detecting the similarity between the quantized face region and the target user's face: similarity above the threshold means the recognition passes, and similarity not above the threshold means the recognition fails.
The method of the embodiments of this application has been described in detail above; to facilitate better implementation of the above solutions of the embodiments of this application, the apparatus of the embodiments of this application is correspondingly provided below.
Referring to FIG. 5, FIG. 5 is a schematic structural diagram of a data processing apparatus provided by an embodiment of this application; the apparatus can be mounted on the data processing device 101 or the model storage device 102 shown in FIG. 1a. The data processing apparatus shown in FIG. 5 can be used to perform some or all of the functions in the method embodiments described above in FIG. 2 and FIG. 3. The units are described in detail as follows:
an obtaining unit 501, configured to train a first model with a first data set, the first data set including first data and training labels of the first data, the first data being unprocessed data, the first model including N network layers, N being a positive integer;
a processing unit 502, configured to train the first model with the second data set, the second data set including second data and training labels corresponding to the second data, the second data being quantized data; to determine a first target network layer from the N network layers, the first target network layer being an unquantized network layer, and quantize the first target network layer; and to train the quantized first model with the second data set, determine a second target network layer from the N network layers, the second target network layer being an unquantized network layer, and quantize the second target network layer, until no unquantized network layer remains among the N network layers, to obtain a second model.
In one embodiment, the processing unit 502 is configured to:
obtain a quantization coefficient, and construct a pseudo-quantization operator based on the quantization coefficient;
operate on a first parameter with the pseudo-quantization operator, and replace the first parameter with the operation result, the first parameter being a parameter in the first target network layer.
In one embodiment, there is at least one first parameter; the processing unit 502 is configured to:
determine a quantization bit width, and determine a target first parameter from the at least one first parameter, the target first parameter satisfying an absolute-value requirement;
determine the quantization coefficient according to the target first parameter and the quantization bit width, the quantization coefficient being positively correlated with the target first parameter and negatively correlated with the quantization bit width.
In one embodiment, the processing unit 502 is configured to:
divide the first parameter by the quantization coefficient, and round the division result with a rounding function;
multiply the rounded result by the quantization coefficient to obtain the operation result.
In one embodiment, the N network layers include M convolutional layers and W fully connected layers connected in order, M and W being positive integers both smaller than N; the processing unit 502 is configured to:
select, in order, an unquantized network layer from the M convolutional layers and the W fully connected layers;
take the selected network layer as the first target network layer.
In one embodiment, the processing unit 502 is further configured to:
when the current iteration count satisfies the target condition and an unquantized network layer exists among the N network layers, determine the unquantized network layer as the first target network layer.
In one embodiment, the target condition includes: the current iteration count is divisible by P, P being a positive integer.
In one embodiment, the processing unit 502 is configured to:
perform quantization conversion on the network parameters in the second model based on quantization coefficients to obtain the quantization model.
In one embodiment, the processing unit 502 is configured to:
obtain the quantization coefficients of the pseudo-quantization operators corresponding to the quantized network layers in the second model, and the parameters of the quantized network layers;
convert the second model according to the quantization coefficients of the pseudo-quantization operators corresponding to the quantized network layers and the parameters of the quantized network layers, to obtain the quantization model.
In one embodiment, the processing unit 502 is further configured to:
in response to a request to deploy the first model in a data processing device, obtain configuration parameters of the data processing device;
in response to the configuration parameters of the data processing device not matching the deployment conditions of the first model, perform the step of training the first model with the second data set;
perform quantization conversion on the network parameters in the second model based on quantization coefficients to obtain a quantization model, the deployment conditions of the quantization model matching the configuration parameters of the data processing device;
deploy the quantization model in the data processing device.
In one embodiment, the quantization model is a face recognition model; the processing unit 502 is further configured to:
collect face data to be recognized;
quantize the face data to be recognized to obtain quantized face data;
determine a face region from the quantized face data;
invoke the quantization model to recognize the face region, and output the recognition result.
According to an embodiment of this application, some of the steps involved in the data processing methods shown in FIG. 2 and FIG. 3 may be performed by the units of the data processing apparatus shown in FIG. 5. For example, steps S201 and S202 shown in FIG. 2 may be performed by the obtaining unit 501 shown in FIG. 5, and steps S203 and S204 by the processing unit 502 shown in FIG. 5; steps S301 and S302 shown in FIG. 3 may be performed by the obtaining unit 501, and steps S303-S308 by the processing unit 502. Each unit of the apparatus shown in FIG. 5 may be separately or wholly combined into one or several other units, or one or some of the units may be further split into multiple functionally smaller units, achieving the same operations without affecting the technical effects of the embodiments of this application. The above units are divided based on logical functions; in practical applications, the function of one unit may be realized by multiple units, or the functions of multiple units by one unit. In other embodiments of this application, the data processing apparatus includes other units; in practical applications, these functions may also be realized with the assistance of other units and cooperatively by multiple units.
According to another embodiment of this application, the data processing apparatus shown in FIG. 5 can be constructed, and the data processing method of the embodiments of this application implemented, by running a computer program (including program code) capable of executing the steps of the corresponding methods shown in FIG. 2 and FIG. 3 on a general-purpose computing device, such as a computer, that includes processing elements and storage elements such as a central processing unit (CPU), a random access memory (RAM), and a read-only memory (ROM). The computer program may be recorded on, for example, a computer-readable recording medium, loaded into the above computing device via the computer-readable recording medium, and run there.
Based on the same inventive concept, the problem-solving principle and beneficial effects of the data processing apparatus provided in the embodiments of this application are similar to those of the data processing apparatus in the method embodiments of this application; refer to the principle and beneficial effects of the implementation of the method, which, for the sake of brevity, are not repeated here.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a data processing device provided by an embodiment of this application. The data processing device includes at least a processor 601, a communication interface 602, and a memory 603, which may be connected by a bus or in other ways. The processor 601 (or Central Processing Unit, CPU) is the computing and control core of the terminal; it can parse all kinds of instructions in the terminal and process all kinds of terminal data. For example, the CPU can parse power-on/off instructions sent by the user to the terminal and control the terminal to perform power-on/off operations; as another example, the CPU can transmit all kinds of interaction data among the terminal's internal structures, and so on. Optionally, the communication interface 602 includes standard wired interfaces and wireless interfaces (such as WI-FI and mobile communication interfaces) and is used, under the control of the processor 601, to send and receive data; the communication interface 602 can also be used for internal data transmission and interaction within the terminal. The memory 603 is the storage device in the terminal, used to store programs and data. It can be understood that the memory 603 here may include both the terminal's built-in memory and the extended memory supported by the terminal. The memory 603 provides storage space that stores the terminal's operating system, which may include but is not limited to Android, iOS, Windows Phone, and the like; this application does not limit this.
In this embodiment of this application, the processor 601 performs the following operations by running the executable program code in the memory 603:
training a first model with a first data set, the first data set including first data and training labels of the first data, the first data being unprocessed data, the first model including N network layers, N being a positive integer;
training the first model with a second data set, the second data set including second data and training labels of the second data, the second data being quantized data;
determining a first target network layer from the N network layers, the first target network layer being an unquantized network layer, and quantizing the first target network layer;
training the quantized first model with the second data set, determining a second target network layer from the N network layers, the second target network layer being an unquantized network layer, and quantizing the second target network layer, until no unquantized network layer remains among the N network layers, to obtain a second model.
As an optional embodiment, the processor 601 is further configured to perform the following operations:
obtaining a quantization coefficient, and constructing a pseudo-quantization operator based on the quantization coefficient;
operating on a first parameter with the pseudo-quantization operator, and replacing the first parameter with the operation result, the first parameter being a parameter in the first target network layer.
As an optional embodiment, there is at least one first parameter, and the processor 601 is further configured to perform the following operations:
determining a quantization bit width, and determining a target first parameter from the at least one first parameter, the target first parameter satisfying an absolute-value requirement;
determining the quantization coefficient according to the target first parameter and the quantization bit width, the quantization coefficient being positively correlated with the target first parameter and negatively correlated with the quantization bit width.
As an optional embodiment, the processor 601 is further configured to perform the following operations:
dividing the first parameter by the quantization coefficient, and rounding the division result with a rounding function;
multiplying the rounded result by the quantization coefficient to obtain the operation result.
As an optional embodiment, the N network layers include M convolutional layers and W fully connected layers connected in order, M and W being positive integers both smaller than N; the processor 601 is further configured to perform the following operations:
selecting, in order, an unquantized network layer from the M convolutional layers and the W fully connected layers;
taking the selected network layer as the first target network layer.
As an optional embodiment, the processor 601 is further configured to perform the following operations:
when the current iteration count satisfies the target condition and an unquantized network layer exists among the N network layers, determining the unquantized network layer as the first target network layer.
As an optional embodiment, the target condition includes: the current iteration count is divisible by P, P being a positive integer.
As an optional embodiment, the processor 601 is further configured to perform the following operations:
performing quantization conversion on the network parameters in the second model based on quantization coefficients to obtain a quantization model.
As an optional embodiment, the processor 601 is further configured to perform the following operations:
obtaining the quantization coefficients of the pseudo-quantization operators corresponding to the quantized network layers in the second model, and the parameters of the quantized network layers;
converting the second model according to the quantization coefficients of the pseudo-quantization operators corresponding to the quantized network layers and the parameters of the quantized network layers, to obtain the quantization model.
As an optional embodiment, the processor 601 is further configured to perform the following operations:
in response to a request to deploy the first model in a data processing device, obtaining configuration parameters of the data processing device;
in response to the configuration parameters of the data processing device not matching the deployment conditions of the first model, performing the step of training the first model with the second data set;
performing quantization conversion on the network parameters in the second model based on quantization coefficients to obtain a quantization model, the deployment conditions of the quantization model matching the configuration parameters of the data processing device;
deploying the quantization model in the data processing device.
As an optional embodiment, the quantization model is a face recognition model; the processor 601 is further configured to perform the following operations:
collecting face data to be recognized;
quantizing the face data to be recognized to obtain quantized face data;
determining a face region from the quantized face data;
invoking the quantization model to recognize the face region, and outputting the recognition result.
Based on the same inventive concept, the problem-solving principle and beneficial effects of the data processing device provided in the embodiments of this application are similar to those of the data processing method in the method embodiments of this application; refer to the principle and beneficial effects of the implementation of the method, which, for the sake of brevity, are not repeated here.
An embodiment of this application further provides a computer-readable storage medium storing one or more instructions, the one or more instructions being adapted to be loaded by a processor to perform the following operations:
training a first model with a first data set, the first data set including first data and training labels of the first data, the first data being unprocessed data, the first model including N network layers, N being a positive integer;
training the first model with a second data set, the second data set including second data and training labels of the second data, the second data being quantized data;
determining a first target network layer from the N network layers, the first target network layer being an unquantized network layer, and quantizing the first target network layer;
training the quantized first model with the second data set, determining a second target network layer from the N network layers, the second target network layer being an unquantized network layer, and quantizing the second target network layer, until no unquantized network layer remains among the N network layers, to obtain a second model.
As an optional embodiment, the one or more instructions are further adapted to be loaded by the processor to perform the following operations:
obtaining a quantization coefficient, and constructing a pseudo-quantization operator based on the quantization coefficient;
operating on a first parameter with the pseudo-quantization operator, and replacing the first parameter with the operation result, the first parameter being a parameter in the first target network layer.
As an optional embodiment, there is at least one first parameter, and the one or more instructions are further adapted to be loaded by the processor to perform the following operations:
determining a quantization bit width, and determining a target first parameter from the at least one first parameter, the target first parameter satisfying an absolute-value requirement;
determining the quantization coefficient according to the target first parameter and the quantization bit width, the quantization coefficient being positively correlated with the target first parameter and negatively correlated with the quantization bit width.
As an optional embodiment, the one or more instructions are further adapted to be loaded by the processor to perform the following operations:
dividing the first parameter by the quantization coefficient, and rounding the division result with a rounding function;
multiplying the rounded result by the quantization coefficient to obtain the operation result.
As an optional embodiment, the N network layers include M convolutional layers and W fully connected layers connected in order, M and W being positive integers both smaller than N; the one or more instructions are further adapted to be loaded by the processor to perform the following operations:
selecting, in order, an unquantized network layer from the M convolutional layers and the W fully connected layers;
taking the selected network layer as the first target network layer.
As an optional embodiment, the one or more instructions are further adapted to be loaded by the processor to perform the following operations:
when the current iteration count satisfies the target condition and an unquantized network layer exists among the N network layers, determining the unquantized network layer as the first target network layer.
As an optional embodiment, the target condition includes: the current iteration count is divisible by P, P being a positive integer.
As an optional embodiment, the one or more instructions are further adapted to be loaded by the processor to perform the following operations:
performing quantization conversion on the network parameters in the second model based on quantization coefficients to obtain a quantization model.
As an optional embodiment, the one or more instructions are further adapted to be loaded by the processor to perform the following operations:
obtaining the quantization coefficients of the pseudo-quantization operators corresponding to the quantized network layers in the second model, and the parameters of the quantized network layers;
converting the second model according to the quantization coefficients of the pseudo-quantization operators corresponding to the quantized network layers and the parameters of the quantized network layers, to obtain the quantization model.
As an optional embodiment, the one or more instructions are further adapted to be loaded by the processor to perform the following operations:
in response to a request to deploy the first model in a data processing device, obtaining configuration parameters of the data processing device;
in response to the configuration parameters of the data processing device not matching the deployment conditions of the first model, performing the step of training the first model with the second data set;
performing quantization conversion on the network parameters in the second model based on quantization coefficients to obtain a quantization model, the deployment conditions of the quantization model matching the configuration parameters of the data processing device;
deploying the quantization model in the data processing device.
As an optional embodiment, the quantization model is a face recognition model; the one or more instructions are further adapted to be loaded by the processor to perform the following operations:
collecting face data to be recognized;
quantizing the face data to be recognized to obtain quantized face data;
determining a face region from the quantized face data;
invoking the quantization model to recognize the face region, and outputting the recognition result.
An embodiment of this application further provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the data processing method of the above method embodiments.
An embodiment of this application further provides a computer program product or computer program, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the following operations:
training a first model with a first data set, the first data set including first data and training labels of the first data, the first data being unprocessed data, the first model including N network layers, N being a positive integer;
training the first model with a second data set, the second data set including second data and training labels of the second data, the second data being quantized data;
determining a first target network layer from the N network layers, the first target network layer being an unquantized network layer, and quantizing the first target network layer;
training the quantized first model with the second data set, determining a second target network layer from the N network layers, the second target network layer being an unquantized network layer, and quantizing the second target network layer, until no unquantized network layer remains among the N network layers, to obtain a second model.
As an optional embodiment, the processor further executes the computer instructions, causing the computer device to perform the following operations:
obtaining a quantization coefficient, and constructing a pseudo-quantization operator based on the quantization coefficient;
operating on a first parameter with the pseudo-quantization operator, and replacing the first parameter with the operation result, the first parameter being a parameter in the first target network layer.
As an optional embodiment, there is at least one first parameter, and the processor further executes the computer instructions, causing the computer device to perform the following operations:
determining a quantization bit width, and determining a target first parameter from the at least one first parameter, the target first parameter satisfying an absolute-value requirement;
determining the quantization coefficient according to the target first parameter and the quantization bit width, the quantization coefficient being positively correlated with the target first parameter and negatively correlated with the quantization bit width.
As an optional embodiment, the processor further executes the computer instructions, causing the computer device to perform the following operations:
dividing the first parameter by the quantization coefficient, and rounding the division result with a rounding function;
multiplying the rounded result by the quantization coefficient to obtain the operation result.
As an optional embodiment, the N network layers include M convolutional layers and W fully connected layers connected in order, M and W being positive integers both smaller than N; the processor further executes the computer instructions, causing the computer device to perform the following operations:
selecting, in order, an unquantized network layer from the M convolutional layers and the W fully connected layers;
taking the selected network layer as the first target network layer.
As an optional embodiment, the processor further executes the computer instructions, causing the computer device to perform the following operations:
when the current iteration count satisfies the target condition and an unquantized network layer exists among the N network layers, determining the unquantized network layer as the first target network layer.
As an optional embodiment, the target condition includes: the current iteration count is divisible by P, P being a positive integer.
As an optional embodiment, the processor further executes the computer instructions, causing the computer device to perform the following operations:
performing quantization conversion on the network parameters in the second model based on quantization coefficients to obtain a quantization model.
As an optional embodiment, the processor further executes the computer instructions, causing the computer device to perform the following operations:
obtaining the quantization coefficients of the pseudo-quantization operators corresponding to the quantized network layers in the second model, and the parameters of the quantized network layers;
converting the second model according to the quantization coefficients of the pseudo-quantization operators corresponding to the quantized network layers and the parameters of the quantized network layers, to obtain the quantization model.
As an optional embodiment, the processor further executes the computer instructions, causing the computer device to perform the following operations:
in response to a request to deploy the first model in a data processing device, obtaining configuration parameters of the data processing device;
in response to the configuration parameters of the data processing device not matching the deployment conditions of the first model, performing the step of training the first model with the second data set;
performing quantization conversion on the network parameters in the second model based on quantization coefficients to obtain a quantization model, the deployment conditions of the quantization model matching the configuration parameters of the data processing device;
deploying the quantization model in the data processing device.
As an optional embodiment, the quantization model is a face recognition model; the processor further executes the computer instructions, causing the computer device to perform the following operations:
collecting face data to be recognized;
quantizing the face data to be recognized to obtain quantized face data;
determining a face region from the quantized face data;
invoking the quantization model to recognize the face region, and outputting the recognition result.
The steps in the methods of the embodiments of this application can be reordered, combined, and deleted according to actual needs.
The modules in the apparatuses of the embodiments of this application can be combined, divided, and deleted according to actual needs.
A person of ordinary skill in the art can understand that all or some of the steps of the various methods in the above embodiments can be completed by a program instructing the relevant hardware; the program is stored in a computer-readable storage medium, and the readable storage medium includes a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
What is disclosed above is merely an optional embodiment of this application and certainly cannot be used to limit the scope of the rights of this application; a person of ordinary skill in the art can understand all or part of the processes implementing the above embodiments, and equivalent changes made according to the claims of this application still fall within the scope covered by the invention.

Claims (14)

  1. A data processing method, applied to a data processing device, the method comprising:
    training a first model with a first data set, the first data set comprising first data and a training label of the first data, the first data being unprocessed data, the first model comprising N network layers, N being a positive integer;
    training the first model with a second data set, the second data set comprising second data and a training label of the second data, the second data being quantized data;
    determining a first target network layer from the N network layers, the first target network layer being an unquantized network layer, and quantizing the first target network layer;
    training the quantized first model with the second data set, determining a second target network layer from the N network layers, the second target network layer being an unquantized network layer, and quantizing the second target network layer, until no unquantized network layer exists among the N network layers, to obtain a second model.
  2. The method according to claim 1, wherein the quantizing the first target network layer comprises:
    obtaining a quantization coefficient, and constructing a pseudo-quantization operator based on the quantization coefficient;
    operating on a first parameter with the pseudo-quantization operator, and replacing the first parameter with an operation result, the first parameter being a parameter in the first target network layer.
  3. The method according to claim 2, wherein there is at least one first parameter, and the obtaining a quantization coefficient comprises:
    determining a quantization bit width, and determining a target first parameter from the at least one first parameter, the target first parameter satisfying an absolute-value requirement;
    determining the quantization coefficient according to the target first parameter and the quantization bit width, the quantization coefficient being positively correlated with the target first parameter and negatively correlated with the quantization bit width.
  4. The method according to claim 2, wherein the operating on a first parameter with the pseudo-quantization operator comprises:
    dividing the first parameter by the quantization coefficient, and rounding the division result with a rounding function;
    multiplying the rounded result by the quantization coefficient to obtain the operation result.
  5. The method according to claim 1, wherein the N network layers comprise M convolutional layers and W fully connected layers connected in order, M and W being positive integers both smaller than N, and the determining a first target network layer from the N network layers comprises:
    selecting, in order, an unquantized network layer from the M convolutional layers and the W fully connected layers;
    taking the selected network layer as the first target network layer.
  6. The method according to claim 1, wherein the determining a first target network layer from the N network layers comprises:
    when a current iteration count satisfies a target condition and an unquantized network layer exists among the N network layers, determining the unquantized network layer as the first target network layer.
  7. The method according to claim 6, wherein the target condition comprises: the current iteration count is divisible by P, P being a positive integer.
  8. The method according to claim 1, wherein after the quantizing the second target network layer until no unquantized network layer exists among the N network layers to obtain a second model, the method further comprises:
    performing quantization conversion on network parameters in the second model based on quantization coefficients to obtain a quantization model.
  9. The method according to claim 8, wherein the performing quantization conversion on network parameters in the second model based on quantization coefficients to obtain a quantization model comprises:
    obtaining quantization coefficients of pseudo-quantization operators corresponding to quantized network layers in the second model, and parameters of the quantized network layers;
    converting the second model according to the quantization coefficients of the pseudo-quantization operators corresponding to the quantized network layers and the parameters of the quantized network layers, to obtain the quantization model.
  10. The method according to claim 1, wherein before the training the first model with a second data set, the method further comprises:
    in response to a request to deploy the first model in the data processing device, obtaining configuration parameters of the data processing device;
    in response to the configuration parameters of the data processing device not matching deployment conditions of the first model, performing the step of training the first model with the second data set;
    and after the quantizing the second target network layer until no unquantized network layer exists among the N network layers to obtain a second model, the method further comprises:
    performing quantization conversion on the network parameters in the second model based on quantization coefficients to obtain a quantization model, deployment conditions of the quantization model matching the configuration parameters of the data processing device;
    deploying the quantization model in the data processing device.
  11. The method according to claim 10, wherein the quantization model is a face recognition model, and after the quantization model is deployed in the data processing device, the method further comprises:
    collecting face data to be recognized;
    quantizing the face data to be recognized to obtain quantized face data;
    determining a face region from the quantized face data;
    invoking the quantization model to recognize the face region, and outputting a recognition result.
  12. A data processing apparatus, comprising:
    an obtaining unit, configured to train a first model with a first data set, the first data set comprising first data and a training label of the first data, the first data being unprocessed data, the first model comprising N network layers, N being a positive integer;
    a processing unit, configured to train the first model with the second data set, the second data set comprising second data and a training label corresponding to the second data, the second data being quantized data; to determine a first target network layer from the N network layers, the first target network layer being an unquantized network layer, and quantize the first target network layer; and to train the quantized first model with the second data set, determine a second target network layer from the N network layers, the second target network layer being an unquantized network layer, and quantize the second target network layer, until no unquantized network layer exists among the N network layers, to obtain a second model.
  13. A data processing device, comprising a storage apparatus and a processor;
    the storage apparatus storing a computer program;
    the processor being configured to load and execute the computer program to implement the data processing method according to any one of claims 1-11.
  14. A computer-readable storage medium storing a computer program, the computer program being adapted to be loaded by a processor to perform the data processing method according to any one of claims 1-11.
PCT/CN2021/106602 2021-05-27 2021-07-15 Data processing method, apparatus, device, and computer-readable storage medium WO2022246986A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/300,071 US20230252294A1 (en) 2021-05-27 2023-04-13 Data processing method, apparatus, and device, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110583709.9 2021-05-27
CN202110583709.9A CN113762503A (zh) 2021-05-27 Data processing method, apparatus, device, and computer-readable storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/300,071 Continuation US20230252294A1 (en) 2021-05-27 2023-04-13 Data processing method, apparatus, and device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022246986A1 true WO2022246986A1 (zh) 2022-12-01

Family

ID=78787214

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/106602 WO2022246986A1 (zh) 2021-05-27 2021-07-15 数据处理方法、装置、设备及计算机可读存储介质

Country Status (3)

Country Link
US (1) US20230252294A1 (zh)
CN (1) CN113762503A (zh)
WO (1) WO2022246986A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11928762B2 (en) * 2021-09-03 2024-03-12 Adobe Inc. Asynchronous multi-user real-time streaming of web-based image edits using generative adversarial network(s)
CN117540677A (zh) * 2022-07-26 2024-02-09 ZTE Corporation Method and apparatus for obtaining a power amplifier model, and power amplifier model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110188880A (zh) * 2019-06-03 2019-08-30 四川长虹电器股份有限公司 一种深度神经网络的量化方法及装置
CN110969251A (zh) * 2019-11-28 2020-04-07 中国科学院自动化研究所 基于无标签数据的神经网络模型量化方法及装置
CN111598237A (zh) * 2020-05-21 2020-08-28 上海商汤智能科技有限公司 量化训练、图像处理方法及装置、存储介质
US20200320392A1 (en) * 2019-04-08 2020-10-08 Alibaba Group Holding Limited Optimization processing for neural network model
CN112101543A (zh) * 2020-07-29 2020-12-18 北京迈格威科技有限公司 神经网络模型确定方法、装置、电子设备及可读存储介质

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180107926A1 (en) * 2016-10-19 2018-04-19 Samsung Electronics Co., Ltd. Method and apparatus for neural network quantization
US10491897B2 (en) * 2018-04-13 2019-11-26 Google Llc Spatially adaptive quantization-aware deblocking filter
US11562208B2 (en) * 2018-05-17 2023-01-24 Qualcomm Incorporated Continuous relaxation of quantization for discretized deep neural networks
CN111340226B (zh) * 2020-03-06 2022-01-25 北京市商汤科技开发有限公司 一种量化神经网络模型的训练及测试方法、装置及设备
CN111626402A (zh) * 2020-04-22 2020-09-04 中国人民解放军国防科技大学 一种卷积神经网络量化方法及装置、计算机可读存储介质
CN111695688B (zh) * 2020-06-11 2024-01-12 腾讯科技(深圳)有限公司 一种模型训练方法、装置、设备及存储介质
CN111612147A (zh) * 2020-06-30 2020-09-01 上海富瀚微电子股份有限公司 深度卷积网络的量化方法
CN112132219A (zh) * 2020-09-24 2020-12-25 天津锋物科技有限公司 一种基于移动端的深度学习检测模型的通用部署方案
CN112508125A (zh) * 2020-12-22 2021-03-16 无锡江南计算技术研究所 一种图像检测模型的高效全整数量化方法
CN112766307A (zh) * 2020-12-25 2021-05-07 北京迈格威科技有限公司 图像处理方法、装置、电子设备及可读存储介质
CN112613604A (zh) * 2021-01-07 2021-04-06 江苏禹盛科技有限公司 神经网络的量化方法及装置
CN112712068B (zh) * 2021-03-19 2021-07-06 腾讯科技(深圳)有限公司 一种关键点检测方法、装置、电子设备及存储介质

Also Published As

Publication number Publication date
CN113762503A (zh) 2021-12-07
US20230252294A1 (en) 2023-08-10


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21942555

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 23/01/2024)