WO2021135707A1 - Machine learning model search method, and related apparatus and device - Google Patents

Machine learning model search method, and related apparatus and device

Info

Publication number
WO2021135707A1
WO2021135707A1 · PCT/CN2020/130043 · CN2020130043W
Authority
WO
WIPO (PCT)
Prior art keywords
model
target
layer structure
candidate
target model
Prior art date
Application number
PCT/CN2020/130043
Other languages
English (en)
French (fr)
Inventor
俞清华
刘默翰
隋志成
周力
白立勋
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to US 17/758,166 (published as US20230042397A1)
Priority to EP 20909071.1A (published as EP4068169A4)
Publication of WO2021135707A1

Classifications

    • G06N 20/00: Machine learning
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/0495: Quantised networks; sparse networks; compressed networks
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/08: Learning methods
    • G06N 3/09: Supervised learning
    • G06N 5/04: Inference or reasoning models
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates to the field of terminal technology, and in particular to a machine learning model search method and a related apparatus and device.
  • deep neural networks are widely used in the classification and recognition of images, speech, and text.
  • the network structure of a deep neural network is usually complex, its inference time is long, and running it requires a large amount of memory. Because the processor computing power and memory resources of a mobile terminal are limited, such deep neural networks often cannot be applied on the mobile terminal. How to deploy ever-larger deep learning models on mobile terminals is therefore a problem that urgently needs to be solved.
  • Hybrid bit quantization for deep neural networks is an efficient solution to this problem.
  • Hybrid bit quantization means that model parameters originally stored as 32-bit floating point are instead stored in various low-bit (for example, 1-bit, 2-bit, or 4-bit) fixed-point formats, thereby reducing the memory footprint, inference time, and power consumption of the deep neural network when it runs on the mobile terminal.
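As a rough illustration of why mixed-bit storage helps, the sketch below compares the storage of a toy three-layer model in 32-bit floating point against a hypothetical mixed-bit assignment (all layer sizes and bit choices are invented for this example, not taken from the publication):

```python
# Illustrative only: memory footprint of a toy 3-layer model when its
# parameters are stored in 32-bit floating point versus mixed low-bit
# fixed point.

layer_param_counts = [1000, 2000, 500]   # parameters per layer (hypothetical)

def model_size_bits(param_counts, bits_per_layer):
    """Total storage in bits when layer k uses bits_per_layer[k] bits per parameter."""
    return sum(n * b for n, b in zip(param_counts, bits_per_layer))

full_float = model_size_bits(layer_param_counts, [32, 32, 32])   # 112000 bits
mixed_bit = model_size_bits(layer_param_counts, [8, 4, 2])       # 17000 bits
```

Here the mixed-bit assignment shrinks storage by roughly a factor of 6.5, which is the kind of saving that makes on-terminal deployment feasible.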
  • the prior art proposes a method for accelerating the model by combining hybrid bit quantization and neural architecture search (neural architecture search, NAS).
  • the principle is as follows. Step S1: starting from the first layer of the model to be quantized, a reinforcement learning decision maker predicts the quantization scheme of each layer, layer by layer (the optional schemes are 1-bit to 8-bit; in practice, 2-bit to 8-bit are used).
  • Step S2: once the quantization scheme of the k-th layer has been predicted, the inference time (or ROM size, or power consumption) of the currently quantized model is measured on the hardware.
  • Step S3: the model obtained in step S2 is trained, a reward function based on the accuracy rate is computed and fed back into the reinforcement learning decision maker of step S1, and the quantization scheme of the next layer is selected. This repeats until the whole model is quantized, yielding the final mixed bit quantization model.
  • in this method, the device needs to communicate continuously with the mobile terminal during training to obtain the current model performance, and quantization proceeds layer by layer. This increases the time consumed by the model search and makes the quantization and search of the model inefficient.
  • the embodiments of the present invention provide a machine learning model search method and a related apparatus and device, to solve the problem of low efficiency in the quantization and search of the model.
  • in a first aspect, an embodiment of the present invention provides a machine learning model search method. A computing device generates M pure bit models according to the model to be quantized and obtains N evaluation parameters of each layer structure of the M pure bit models; the N evaluation parameters of each layer structure in the M pure bit models are measured by the mobile terminal while running the M pure bit models. The computing device then performs at least one model search and outputs a model whose N evaluation parameters and accuracy rate all meet the requirements. The process of one model search includes: training and testing a candidate model selected from the candidate set on the first data set to obtain the target model and the accuracy of the target model; and, in the case that at least one of the N evaluation parameters of the target model does not meet the requirements and the accuracy of the target model is greater than the target threshold, obtaining the N evaluation parameters of each layer structure in the target model from the N evaluation parameters of each layer structure in the M pure bit models, and determining the quantization weight of each layer structure in the target model according to the network structure of the target model and the N evaluation parameters of each layer structure in the target model.
  • the above-mentioned pure bit model and the model to be quantized are deep neural networks with the same network structure, and M is a positive integer greater than 1;
  • the candidate set includes at least one candidate model;
  • the candidate model is a mixed bit model with the same network structure as the model to be quantized;
  • the first data set includes multiple samples for training and testing candidate models in the candidate set.
  • the foregoing computing device may be a server, a cloud, or a distributed computing system.
  • in this method of model quantization and model search, multiple pure bit models are generated from the model to be quantized, and the evaluation parameters of each layer structure of these pure bit models are obtained once. A candidate model is then selected from the candidate set, trained, and tested; after the target model is obtained, the quantization weight of each layer structure in the target model can be determined from the network structure of the target model and the evaluation parameters of each layer structure in the target model.
  • the layer structure with the largest quantization weight is quantized, and the quantized model is added to the candidate set. This reduces frequent information interaction with the terminal and improves the efficiency of model search and model quantization.
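The search loop described above can be sketched as follows. All names (`train_and_test`, `meets_requirements`, and so on), the toy bit budget, and the toy quantization weights are inventions of this sketch; real training, testing, and on-device measurement are replaced by stand-ins:

```python
import random

def train_and_test(model):
    # Stand-in: would train/test on the first data set and return accuracy.
    return random.random()

def meets_requirements(model):
    # Stand-in: would compare measured inference time and parameter quantity
    # against the targets; here a toy bit budget plays that role.
    return sum(model["bits"]) <= 8

def quantize_largest(model):
    # Quantize the layer with the largest quantization weight; in this toy,
    # a layer's weight is simply its current bit width.
    i = max(range(len(model["bits"])), key=lambda k: model["bits"][k])
    new_bits = list(model["bits"])
    new_bits[i] //= 2                     # move to the next lower bit width
    return {"bits": new_bits}

random.seed(0)
candidates = [{"bits": [8, 8, 8]}]        # search starts from the highest-bit model
target_threshold = 0.1
result = None
for _ in range(20):                       # bounded number of search rounds
    model = candidates.pop()
    accuracy = train_and_test(model)
    if meets_requirements(model):
        result = model                    # all requirements met: output this model
        break
    if accuracy > target_threshold:       # otherwise quantize further and re-add
        candidates.append(quantize_largest(model))
```

Note that the device-side measurements enter only through the per-layer evaluation parameters gathered once at the start, which is what removes the per-step round trips of the prior art.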
  • the N evaluation parameters include the inference time and the parameter quantity.
  • one way for the computing device to determine the quantization weight of each layer structure in the target model, according to the network structure of the target model and the N evaluation parameters of each layer structure in the target model, is as follows:
  • if the inference time of the target model is greater than the target inference time and its parameter quantity meets the requirement, the quantization weight of the layer structure i in the target model is determined according to the inference time of the layer structure i and the weight of the layer structure i;
  • if the parameter quantity of the target model is greater than the target parameter quantity and its inference time meets the requirement, the quantization weight of the layer structure i is determined according to the parameter quantity of the layer structure i and the weight of the layer structure i;
  • if the inference time of the target model is greater than the target inference time and the parameter quantity of the target model is greater than the target parameter quantity, the quantization weight of the layer structure i is determined according to the inference time of the layer structure i, the parameter quantity of the layer structure i, and the weight of the layer structure i.
  • i is the index of the layer structure in the target model
  • i is a positive integer
  • i is not greater than the total number of layers in the target model
  • the total number of layers in the target model is the same as the total number of layers in the model to be quantized.
  • in the above method, when the inference time of the target model does not meet the requirement but its parameter quantity does, the quantization weight of a layer structure mainly considers the inference time of that layer; when the inference time meets the requirement but the parameter quantity does not, the quantization weight mainly considers the parameter quantity of the layer;
  • and when neither the inference time nor the parameter quantity of the target model meets the requirements, the quantization weight of a layer structure considers its inference time and its parameter quantity at the same time. Quantization can thus proceed in the direction that brings the inference time and the parameter quantity toward the requirements, further improving the efficiency of search and quantization.
  • in a possible implementation, the quantization weight P_i of the layer structure i in the target model is determined by the following quantities:
  • α and β are the weights of the inference time and of the parameter quantity, respectively;
  • O_i is the weight of the layer structure i;
  • L_i is the inference time of the layer structure i, or the ratio of the inference time of the layer structure i to the inference time of the target model;
  • R_i is the parameter quantity of the layer structure i, or the ratio of the parameter quantity of the layer structure i to the parameter quantity of the target model;
  • T is the ratio of the inference time of the target model to the target inference time;
  • M is the ratio of the parameter quantity of the target model to the target parameter quantity.
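In the original publication the equation for P_i is an image and did not survive text extraction, so it is not reproduced above. Purely as an illustration, one form consistent with the variables just listed (an assumption of this note, not the patent's actual formula) would be:

$$P_i = O_i\,\bigl(\alpha\, T\, L_i + \beta\, M\, R_i\bigr)$$

Here the inference-time term grows when the target model's inference time exceeds the target inference time (T > 1), and the parameter term grows when its parameter quantity exceeds the target parameter quantity (M > 1), so the layer with the largest P_i is the most profitable one to quantize next.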
  • the inference time of the target model is the sum of the inference times of each layer structure in the target model;
  • the parameter quantity of the target model is the sum of the parameter quantities of each layer structure in the target model.
  • the weight of the layer structure i is related to the position of the layer structure i in the target model.
  • the closer a layer structure is to the input layer of the target model, the smaller its weight; the closer a layer structure is to the output layer of the target model, the larger its weight.
  • when determining the quantization weight of a layer structure in the target model, the above method takes into account how important the position of the layer structure in the model is to the accuracy of the model, and tries to avoid lower-bit quantization of layer structures close to the input layer. This preserves the accuracy of the model and improves the accuracy of search and quantization.
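The case analysis above, combined with the position-dependent layer weights, might be sketched as follows. The exact weighting formula is not reproduced in this text, so the particular combination used here (position weight times normalized cost terms) is an assumption for illustration:

```python
# Sketch of layer-wise quantization weights under the three cases above.
# All numbers and the combining formula are illustrative assumptions.

def layer_position_weights(num_layers):
    # Layers near the input get smaller weights, near the output larger ones.
    return [(i + 1) / num_layers for i in range(num_layers)]

def quantization_weight(i, latencies, params, target_latency, target_params,
                        alpha=0.5, beta=0.5):
    total_latency, total_params = sum(latencies), sum(params)
    o = layer_position_weights(len(latencies))[i]   # weight of layer i
    L = latencies[i] / total_latency                # layer share of inference time
    R = params[i] / total_params                    # layer share of parameter count
    latency_over = total_latency > target_latency
    params_over = total_params > target_params
    if latency_over and not params_over:            # only inference time exceeds target
        return o * L
    if params_over and not latency_over:            # only parameter count exceeds target
        return o * R
    return o * (alpha * L + beta * R)               # both exceed their targets

lat = [10.0, 30.0, 60.0]   # per-layer inference times (made up)
par = [100, 400, 500]      # per-layer parameter counts (made up)
weights = [quantization_weight(i, lat, par, target_latency=50, target_params=2000)
           for i in range(3)]
```

With these numbers only the inference time exceeds its target, so the slow layer near the output ends up with the largest quantization weight and would be quantized first.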
  • in a possible implementation, before the candidate model selected from the candidate set is trained and tested on the first data set, a model may be selected from the candidate set as follows.
  • the computing device trains and tests each candidate model in the candidate set on the second data set and obtains the test accuracy rate of each candidate model in the candidate set.
  • the number of samples in the second data set is less than the number of samples in the first data set; a candidate model is then selected from the candidate set according to the test accuracy of each candidate model and the weight of each candidate model.
  • the weight of a candidate model is determined by the total number of model searches performed when the candidate model was added to the candidate set and the current total number of model searches.
  • the probability/weight Q_j of selecting the j-th candidate model in the candidate set can be expressed in terms of:
  • a_j, the test accuracy of the j-th candidate model;
  • w_j, the weight of the j-th candidate model.
  • when selecting a model from the candidate set, it is also possible to select a candidate model directly according to the test accuracy of each candidate model, for example by selecting the candidate model with the highest test accuracy.
  • the model is selected based on the accuracy rate, which ensures that the selected model has the best accuracy and, in turn, improves the accuracy of the finally output target model.
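A sketch of accuracy-and-weight-based candidate selection. The exact form of Q_j is not shown in this text; taking Q_j proportional to a_j times w_j, with a recency-based weight w_j, is an assumption of this sketch:

```python
# Illustrative candidate selection: combine each candidate's test accuracy
# a_j with a weight w_j derived from when it entered the candidate set.

def recency_weight(added_at_search, current_search):
    # Models added at a later search round get a larger weight
    # (one plausible choice, not the patent's definition).
    return (added_at_search + 1) / (current_search + 1)

def select_candidate(accuracies, added_at, current_search):
    scores = [a * recency_weight(t, current_search)
              for a, t in zip(accuracies, added_at)]
    total = sum(scores)
    probs = [s / total for s in scores]     # Q_j as a selection probability
    best = max(range(len(probs)), key=lambda j: probs[j])
    return best, probs

idx, probs = select_candidate([0.90, 0.85, 0.88], added_at=[0, 3, 5],
                              current_search=5)
```

In this example the newest candidate wins even though an older one has slightly higher raw accuracy, which keeps the search moving toward more aggressively quantized models.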
  • in a possible implementation, the computing device may quantize the layer structure with the largest quantization weight in the target model as follows: the model parameters of that layer structure are converted into model parameters represented by at least one lower bit number, where the at least one bit number is taken from the bit number set and is lower than the current bit number of the model parameters of that layer structure.
  • the bit number set includes M values, which respectively indicate the bit numbers of the model parameters in the M pure bit models.
  • in this way, the selected layer structure in the target model is quantized with lower bits, and each quantized model is used as a candidate model.
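The expansion step described above (converting the chosen layer to every lower bit number in the bit number set, one new candidate per bit number) can be sketched as follows, with a hypothetical bit set standing in for the M pure bit models:

```python
# Sketch of expanding the chosen layer into lower-bit candidates.
# The bit set mirrors the M pure bit models (values here are hypothetical).

BIT_SET = [8, 4, 2, 1]   # one value per pure bit model (assumed)

def lower_bit_options(current_bits):
    """Bit numbers in the set strictly lower than the layer's current bits."""
    return [b for b in BIT_SET if b < current_bits]

def expand(model_bits, layer_index):
    """One new candidate model per lower bit number for the selected layer."""
    out = []
    for b in lower_bit_options(model_bits[layer_index]):
        new = list(model_bits)
        new[layer_index] = b
        out.append(new)
    return out

candidates = expand([8, 8, 4], layer_index=0)   # quantize the first layer
```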
  • in a possible implementation, the model search process may further include: when the accuracy of the target model is less than the target threshold, the computing device reselects a model from the candidate set and performs the above model search again.
  • when the accuracy of the target model does not meet the requirement, the target model is discarded directly, which reduces the search space and improves search efficiency.
  • the N evaluation parameters include inference time
  • one way for the computing device to obtain the N evaluation parameters of each layer structure in the M pure bit models is: the computing device sends the M pure bit models to the mobile terminal, so that the mobile terminal runs the M pure bit models and measures the inference time of each layer structure in them; the computing device then receives the inference time of each layer structure in the M pure bit models sent by the mobile terminal.
  • the mobile terminal may also measure the parameter quantity of each layer structure in the M pure bit models, and the computing device may further receive the parameter quantity of each layer structure in the M pure bit models sent by the mobile terminal.
  • in the first model search, the candidate set includes the pure bit model with the highest bit number among the M pure bit models, so that the search starts from the highest-bit pure bit model.
  • in a second aspect, an embodiment of the present application also provides an image recognition method, including: a terminal obtains an image to be recognized, inputs the image to be recognized into a second image recognition model to obtain the category of the image to be recognized, and then outputs the category of the image to be recognized.
  • the second image recognition model is the target model output by implementing the search method of the machine learning model in the first aspect or any one of its possible implementations, with the first image recognition model as the model to be quantized.
  • the first image recognition model is a trained deep neural network capable of recognizing image categories; it is a full floating-point model or a mixed bit model.
  • the image to be recognized may be an image in the current scene that the terminal can obtain through a camera.
  • the aforementioned terminal may be a mobile phone, a tablet computer, a desktop computer, a digital camera, a smart watch, a smart bracelet, a camera, a TV, etc., which are not limited here.
  • through the model quantized as above, the deep neural network can be run on terminals with limited memory and processing resources.
  • an embodiment of the present application also provides a search device for a machine learning model, including:
  • a generating module configured to generate M pure bit models according to the model to be quantized, wherein the pure bit model and the model to be quantized are deep neural networks with the same network structure, and M is a positive integer greater than 1;
  • the parameter acquisition module is used to acquire N evaluation parameters of each layer structure in the M pure bit models, the N evaluation parameters of each layer structure in the M pure bit models being measured by the mobile terminal when running the M pure bit models;
  • the execution module is configured to perform at least one model search, and output a model whose N evaluation parameters and the accuracy rate all meet the requirements;
  • the execution module includes a training and testing unit, an acquisition unit, a weight unit, a quantization unit, and an adding unit.
  • the execution module executes the model search process:
  • the training and testing unit is used to train and test the candidate model selected from the candidate set through the first data set to obtain the target model and the accuracy of the target model;
  • the candidate set includes at least one candidate model;
  • the candidate model is a mixed bit model with the same network structure as the model to be quantized;
  • the first data set includes a plurality of samples for training and testing candidate models in the candidate set;
  • the acquiring unit is configured to, in the case that at least one of the N evaluation parameters of the target model does not meet the requirements, and the accuracy of the target model is greater than the target threshold, according to each of the M pure bit models N evaluation parameters of a layer structure to obtain N evaluation parameters of each layer structure in the target model;
  • the weight unit is used to determine the quantitative weight of each layer structure in the target model according to the network structure of the target model and the N evaluation parameters of each layer structure in the target model;
  • the quantization unit is used to quantify the layer structure with the largest quantization weight in the target model
  • the adding unit is used to add the quantized model to the candidate set.
  • for the specific implementation of each module/unit in the search device of the machine learning model, reference may be made to the relevant description in the first aspect or any one of its possible implementations; the search device may also include other modules/units for implementing the search method of the machine learning model of the first aspect or any one of its possible implementations, which will not be repeated here.
  • an embodiment of the present application also provides a search device for a machine learning model, including a processor and a memory. The memory is used to store a program; the processor executes the program stored in the memory, and when the program stored in the memory is executed, the search device implements the search method of the machine learning model described in the first aspect or any one of its possible implementations.
  • for each device/unit in this search device of the machine learning model, reference may be made to the relevant description in the first aspect or any one of its possible implementations; the search device may also include other modules/units for implementing the search method of the machine learning model described in the first aspect or any one of its possible implementations, which will not be repeated here.
  • an embodiment of the present application also provides an image recognition device, including:
  • the acquiring unit is used to acquire the image to be recognized
  • the recognition unit is used to input the image to be recognized into the second image recognition model to obtain the category of the image to be recognized;
  • the output unit is used to output the category of the image to be recognized.
  • the second image recognition model is the target model output by implementing the search method of the machine learning model in the first aspect or any one of its possible implementations, with the first image recognition model as the model to be quantized.
  • the first image recognition model is a trained deep neural network capable of recognizing image categories; it is a full floating-point model or a mixed bit model.
  • the image to be recognized may be an image in the current scene that the terminal can obtain through a camera.
  • the above-mentioned image recognition device may be a mobile phone, a tablet computer, a desktop computer, a digital camera, a smart watch, a smart bracelet, a camera, a TV, etc., which are not limited here.
  • an embodiment of the present application also provides an image recognition device, including a processor and a memory. The memory is used to store a program; the processor executes the program stored in the memory, and when the program stored in the memory is executed, the image recognition device implements the image recognition method described in the second aspect or any one of its possible implementations.
  • for each device/unit in the image recognition device, reference may be made to the relevant description in the second aspect or any one of its possible implementations; the image recognition device may also include other modules/units for implementing the image recognition method described in the second aspect or any one of its possible implementations, which will not be repeated here.
  • the embodiments of the present application also provide a computer-readable storage medium for storing computer-executable instructions which, when invoked by a computer, cause the computer to implement the search method of the machine learning model described in the first aspect or any one of its possible implementations.
  • the embodiments of the present application also provide a computer program product containing instructions which, when run on a terminal, cause the terminal to execute the search method of the machine learning model described in the first aspect or any one of its possible implementations.
  • the embodiments of the present application also provide a computer-readable storage medium for storing computer-executable instructions which, when invoked by a computer, cause the computer to implement the image recognition method described in the second aspect or any one of its possible implementations.
  • the embodiments of the present application also provide a computer program product containing instructions which, when run on a terminal, cause the terminal to execute the image recognition method described in the second aspect or any one of its possible implementations.
  • FIG. 1 is a schematic diagram of the architecture of a system provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a search method for a machine learning model provided by an embodiment of the present application
  • FIGS. 3A-3C are schematic explanatory diagrams of a search process of a machine learning model provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of an image recognition method provided by an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of a search device for a machine learning model provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an image recognition device provided by an embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of another search device for a machine learning model provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of another image recognition device provided by an embodiment of the present application.
  • a neural network can be composed of neural units.
  • a neural unit can refer to an arithmetic unit that takes x_s (s = 1, 2, ..., n) and an intercept of 1 as inputs.
  • the output of the arithmetic unit can be: f(W_1x_1 + W_2x_2 + ... + W_nx_n + b), that is, f applied to the weighted sum of the inputs plus a bias, where:
  • s = 1, 2, ..., n, and n is a natural number greater than 1;
  • W_s is the weight of x_s;
  • b is the bias of the neural unit.
  • f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal in the neural unit into an output signal.
  • the output signal of the activation function can be used as the input of the next convolutional layer.
  • the activation function can be a sigmoid function.
  • a neural network is a network formed by connecting many of the above-mentioned single neural units together, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the characteristics of the local receptive field.
  • the local receptive field can be a region composed of several neural units.
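The neural unit described above can be computed directly; the example below uses a sigmoid activation (as mentioned above) and arbitrary example inputs and weights:

```python
import math

def sigmoid(z):
    # A common activation function f, mapping any real z into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neural_unit(xs, ws, b):
    # Weighted sum of inputs plus bias, passed through the activation f.
    return sigmoid(sum(w * x for w, x in zip(ws, xs)) + b)

# Arbitrary example values for the inputs x_s, weights W_s, and bias b.
y = neural_unit(xs=[0.5, -1.0, 2.0], ws=[0.4, 0.3, 0.1], b=0.0)
```

The output lies in (0, 1) and can serve as the input signal of the next layer, exactly as described above.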
  • A deep neural network (DNN) is also known as a multi-layer neural network.
  • a DNN can be understood as a neural network with many hidden layers; there is no special metric for "many" here. Dividing a DNN by the position of its layers, the layers can be grouped into three categories: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. The layers are fully connected; that is, any neuron in the i-th layer is connected to every neuron in the (i+1)-th layer.
  • although a DNN looks complicated, the work of each individual layer is not complicated.
  • the parameters in a DNN are defined as follows, taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as W^3_{24}, where the superscript 3 represents the layer in which the coefficient W is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.
  • in summary, the coefficient from the k-th neuron of the (L-1)-th layer to the j-th neuron of the L-th layer is defined as W^L_{jk}. Note that the input layer has no W parameters. In deep neural networks, more hidden layers enable the network to portray more complex situations in the real world.
  • a model with more parameters is more complex and has a greater "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is also the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).
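A tiny fully connected forward pass illustrating the indexing above, where `W[j][k]` holds the coefficient from neuron k of the previous layer to neuron j of the current layer (layer sizes and values are arbitrary examples, with tanh standing in for the activation):

```python
import math

def forward(x, weights, biases):
    # Propagate an input vector through fully connected layers; each layer's
    # weight matrix W has one row per output neuron (index j) and one column
    # per input neuron (index k), matching the W^L_{jk} notation above.
    a = x
    for W, b in zip(weights, biases):
        a = [math.tanh(sum(W[j][k] * a[k] for k in range(len(a))) + b[j])
             for j in range(len(W))]
    return a

# 2 inputs -> 3 hidden neurons -> 1 output (values are made up)
W2 = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]   # coefficients into layer 2 (hidden)
W3 = [[0.7, 0.8, 0.9]]                       # coefficients into layer 3 (output)
y = forward([1.0, -1.0], [W2, W3], [[0.0, 0.0, 0.0], [0.0]])
```

Training would adjust every entry of W2 and W3 (and the biases), which is exactly the "learning the weight matrix of all layers" described above.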
  • in the embodiments of this application, each model is a deep neural network;
  • the layer structures of the model are the above-mentioned hidden layers;
  • the model parameters of the model are the weight matrices in the hidden layers;
  • the model parameters of a layer structure are the weight matrix in that layer structure.
  • Both fixed-point and floating-point are data types used by computers to store data.
  • the fundamental difference between the two lies in the position of the decimal point.
  • fixed-point numbers have a fixed number of digits before and after the decimal point;
  • floating-point numbers have no fixed number of digits before and after the decimal point; that is, the position of the decimal point of a floating-point number can move relative to the significant digits of the number.
  • floating-point numbers provide a larger value range and higher precision.
  • a fixed-point number reserves a fixed number of digits.
  • the digits to the left of the decimal point form the integer part of the fixed-point number, and the digits to the right form the fractional part.
  • with two digits reserved on each side, for example, the maximum value that can be represented is 99.99 and the minimum positive value is 00.01 (a decimal example is used here; a computer actually uses binary, with common widths such as 4, 8, 16, or 32 bits). Because a fixed-point number has a fixed window, it can represent neither very large numbers nor very small numbers, and its precision is limited.
  • Floating-point numbers use scientific notation to represent real numbers. Compared with the fixed window of a fixed-point number, a floating-point number uses a floating window, so it can represent real numbers over a larger precision range. For example, 123.456 can be expressed as 1.23456 × 10^2.
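As a quick, self-contained illustration of the fixed window described above (our own sketch, not code from the patent), the following snippet models a decimal fixed-point format that reserves two digits on each side of the decimal point:

```python
# Illustrative sketch: a decimal fixed-point format reserving 4 digits,
# 2 before and 2 after the decimal point (window 00.01 .. 99.99).
def to_fixed(x, int_digits=2, frac_digits=2):
    """Snap x onto the fixed-point grid; return None if x falls
    outside the representable fixed window."""
    scale = 10 ** frac_digits
    q = round(x * scale) / scale
    if q > 10 ** int_digits - 1 / scale or q < 1 / scale:
        return None  # outside the fixed window
    return q

print(to_fixed(12.344))    # fits the window, but is rounded to the grid
print(to_fixed(123.456))   # too large for the fixed window -> None
print(to_fixed(0.001))     # too small for the fixed window -> None
```

Floating point, by contrast, would represent all three values, trading a fixed digit budget for a movable decimal point.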
  • the full-float model is a deep neural network, and the data types of all the model parameters of the layer structure (that is, the weight matrix) are expressed in floating-point.
  • the pure bit model also known as the pure bit quantization model, is a deep neural network.
  • in a pure bit model, the data types of all the model parameters of the layer structures (that is, the weight matrices) are expressed in fixed point with the same number of bits (digits).
  • the mixed bit model also known as the mixed bit quantization model, is a deep neural network.
  • in a mixed bit model, the data types of the model parameters (that is, the weight matrices) of different layer structures are expressed in fixed point with the same or different numbers of bits (digits).
  • the pure bit model is a kind of mixed bit model, and the data types of all the model parameters of the layer structure (that is, the weight matrix) are expressed by the same fixed number of bits (digits).
  • the quantization of the model in the embodiment of this application is neural network quantization, which is a model compression technique that converts floating-point storage (operations) into integer storage (operations). For example:
  • the model parameters of the original model are represented by float32 (32-bit floating point);
  • after quantization, the model parameters of the model are represented by int8 (8-bit fixed point).
  • the essence of the quantization of the model is the conversion/mapping between the two data types.
  • floating-point data: data whose data type is floating point
  • fixed-point data: data whose data type is fixed point
  • R is the input floating-point data
  • Q is the fixed-point data obtained after quantizing the floating-point data
  • Z is the zero point value (Zero Point)
  • S is the scale
  • R max represents the maximum value of the input floating point data
  • R min represents the minimum value of the input floating point data
  • Q max represents the maximum value of the fixed point data
  • Q min represents the minimum value of the fixed-point data
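The conversion formulas themselves appear as images in the published patent and did not survive extraction. The sketch below therefore assumes the standard asymmetric (affine) quantization scheme, which is consistent with the symbols S, Z, R_max, R_min, Q_max, and Q_min defined above; treat the rounding details as illustrative rather than the patent's literal formulas:

```python
# Hedged reconstruction of the float <-> fixed-point mapping: standard
# asymmetric quantization with scale S and zero point Z.
def make_quantizer(r_min, r_max, num_bits=8):
    q_min, q_max = 0, 2 ** num_bits - 1          # e.g. 0..255 for 8 bits
    s = (r_max - r_min) / (q_max - q_min)        # scale S
    z = round(q_max - r_max / s)                 # zero point Z
    def quantize(r):
        q = round(r / s + z)                     # Q = R/S + Z
        return max(q_min, min(q_max, q))         # clamp to fixed-point range
    def dequantize(q):
        return (q - z) * s                       # R ≈ (Q - Z) * S
    return quantize, dequantize

quantize, dequantize = make_quantizer(-1.0, 3.0, num_bits=8)
q = quantize(0.5)
print(q, round(dequantize(q), 3))
```

Round-tripping through `dequantize` recovers the input only approximately; the gap is the precision lost by the fixed-point representation.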
  • the conversion between fixed-point data with different numbers of bits can refer to the above-mentioned conversion method between floating-point data and fixed-point data, or to other conversion methods in the prior art, which will not be repeated here.
  • conversion between floating-point data and 4-bit or 8-bit fixed point can be performed with reference to the above conversion method; one implementation of conversion between floating-point data and 2-bit (or 1-bit) fixed point uses the following rule:
  • 2 bits can be expressed as three numbers -1,0,1.
  • T is the threshold.
  • when the floating-point data is greater than T, it is converted to 1;
  • when the floating-point data is less than −T, it is converted to −1;
  • when the floating-point data takes any other value, it is converted to 0.
  • the conversion method of 1 bit is similar to that of 2 bits, but its fixed-point value is only -1 and 1, where the T value is 0.
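The threshold rule above can be sketched directly (the treatment of exactly 0 in the 1-bit case is our assumption, since the text only states that the fixed-point values are −1 and 1 with T = 0):

```python
# Threshold-based ternary (2-bit) and binary (1-bit) quantization.
def quantize_2bit(r, t):
    """Map a float to {-1, 0, 1} using threshold t."""
    if r > t:
        return 1
    if r < -t:
        return -1
    return 0

def quantize_1bit(r):
    """Map a float to {-1, 1} with threshold T = 0.
    Assumption: r == 0 maps to 1 (the text does not specify)."""
    return 1 if r >= 0 else -1

print([quantize_2bit(x, 0.3) for x in (0.7, -0.5, 0.1)])  # [1, -1, 0]
print([quantize_1bit(x) for x in (0.7, -0.5)])            # [1, -1]
```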
  • Fig. 1 is a schematic diagram of the architecture of a system provided by an embodiment of the present application, in which:
  • the client device 11 can send the first data set and the model to be quantized to the computing device 12.
  • the model to be quantized can be a full-floating point model, a deep neural network trained through the first data set, or a constructed unquantized model.
  • the trained deep neural network can also be a deep neural network obtained through automatic machine learning (AutoML).
  • AutoML automatic machine learning
  • the client device 11 can request the computing device 12 to quantize the model to be quantized, so as to obtain a mixed bit model whose accuracy, inference time, parameter amount, and so on all meet the customer's requirements.
  • the client device 11 may also send to the computing device 12 its standards for the quantized mixed bit model, such as the target accuracy, target inference time, and target parameter amount.
  • the first data set may also be derived from the database 13, and the first data set includes a plurality of samples.
  • the sample can be an image labeled as an object type.
  • in this scenario, the model to be quantized and the mixed bit model expected by the customer are both deep neural networks with image recognition capability: after receiving an image to be recognized, they can identify the image category of that image.
  • the application is not limited to the above scenario and can also be applied to other scenarios.
  • the sample can also be an image labeled as a gesture type.
  • in that case, the model to be quantized and the mixed bit model expected by the customer are deep neural networks with gesture recognition capability: after receiving an image to be recognized, the gesture in that image can be recognized.
  • after the computing device 12 receives the model to be quantized, it can quantize the model according to the customer's requirements to obtain a hybrid quantization model whose accuracy is greater than the target accuracy, whose inference time is less than the target inference time, and whose parameter amount is less than the target parameter amount.
  • the computing device 12 may first quantize the model to be quantized to obtain multiple pure bit models. Each pure bit model has the same network structure as the model to be quantized, but the data type and the number of bits of the model parameters differ. The obtained multiple pure bit models are then sent to the mobile terminal test platform 14.
  • the mobile terminal 14 includes a test system. Each pure bit model is run on the mobile terminal 14.
  • through the test system, the mobile terminal can obtain the inference time, parameter amount, and other parameters of each layer structure in each pure bit model; that is, the mobile terminal obtains the inference time and parameter amount of each layer structure expressed in fixed point with different numbers of bits.
  • the mobile terminal 14 can send to the computing device 12 the reasoning time and parameter amount of each layer structure in each pure bit model obtained by the test.
  • the computing device 12 can perform at least one model search process.
  • the process includes: the computing device 12 trains the candidate models in the candidate set with a small number of samples to obtain the test accuracy of each candidate model, and then selects a candidate model from the candidate set based on the test accuracies; the selected candidate model is trained and tested through the first data set to obtain the trained candidate model (the target model) and the accuracy of the target model. Further, if the target model does not meet the customer's requirements, the layer structure that needs to be quantized is selected according to the inference time and/or parameter amount of the layer structures in the target model, and that layer structure is then quantized to obtain one or more quantized models.
  • the computing device 12 adds the quantized models to the candidate set and repeats the next model search process. It should be understood that the target model is output only when it meets the user's requirements, and the output target model is the mixed bit model required by the client.
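The search loop described above can be sketched as a self-contained toy. Everything numeric here (the accuracy and inference-time stand-ins, the thresholds) is invented for illustration; only the control flow mirrors the steps in the text:

```python
# Toy sketch of the iterative search loop run by computing device 12.
# A "model" is a tuple of per-layer bit widths; accuracy and inference
# time are invented stand-ins for real training/testing and benchmarks.
BIT_SET = (1, 2, 4, 8, 16)

def accuracy(model):          # stand-in for train-and-test (S14)
    return sum(model) / (16 * len(model))   # more bits -> "more accurate"

def inference_time(model):    # stand-in for the per-layer benchmark (S12)
    return sum(model)         # more bits -> slower

def low_bit_quantize(model, layer):
    """Replace one layer's bit width with every lower width (S19)."""
    lower = [b for b in BIT_SET if b < model[layer]]
    return [model[:layer] + (b,) + model[layer + 1:] for b in lower]

def search(target_acc=0.5, target_time=24, num_layers=3):
    candidates = [(16,) * num_layers]       # start from the pure 16-bit model
    while candidates:
        model = max(candidates, key=accuracy)   # S13: pick a candidate
        candidates.remove(model)
        if accuracy(model) < target_acc:
            continue                            # accuracy fails: try another
        if inference_time(model) <= target_time:
            return model                        # S16: all requirements met
        layer = model.index(max(model))         # S17/S18: "heaviest" layer
        candidates.extend(low_bit_quantize(model, layer))  # S19/S20
    return None

print(search())
```

The loop always terminates because each quantization strictly lowers some layer's bit width within a finite set.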
  • the computing device 12 may send the output target model to the client device 11.
  • the output target model can further be deployed on the user equipment 15 (a mobile terminal).
  • although this target model sacrifices some accuracy compared with the model to be quantized, it greatly improves computation speed and enables complex deep neural networks to be applied on mobile terminals with lower memory capacity and memory bandwidth.
  • the computing device 12 may include multiple modules/nodes.
  • the computing device 12 may be a distributed computing system, and the multiple modules/nodes included in the computing device 12 may be computer devices with computing capabilities; on the other hand, the computing device 12 may be a device, and the multiple modules/nodes included in it may be functional modules/devices in the computing device 12, etc.
  • the model augmentation module 121 is used to generate multiple pure bit models according to the model to be quantified.
  • the model reasoning module 122 is used to exchange information with the mobile terminal to obtain the reasoning time and parameter values of each layer structure in each pure bit model.
  • the model selection module 123 is used to select a candidate model from the candidate set.
  • the model training module 124 is used to train the selected candidate models to obtain the target model.
  • the model testing module 125 is used to test the target model to obtain the accuracy of the target model.
  • the processing module 126 is used to determine whether the target model meets the customer's requirements and, once it does, to output the target model; the quantization structure selection module 127 is used, when the target model does not meet the customer's requirements but the accuracy of the target model does, to select the layer structure with the largest quantization weight in the target model as the layer structure to be quantized, based on the quantization weight of each layer structure in the target model.
  • the quantization module 128 is used to quantify the selected layer structure in the target model to obtain a quantized model, and add the quantized model to the candidate set.
  • the foregoing computing device 12 and each module in the computing device 12 may be a cloud server, a server, a computer device, a terminal device, etc., which will not be repeated here.
  • the aforementioned client equipment 11 or user equipment 15 may be a mobile phone, a tablet computer, a personal computer, a vehicle, an on-board unit, a point of sales (POS), a personal digital assistant (PDA), a drone, a smart watch , Smart glasses, VR equipment, etc., are not limited here.
  • the client device 11 may also be a server.
  • the client equipment 11, the user equipment 15, and the database 13 in the system are also not necessary equipment for the system, and the system does not include the above-mentioned equipment, or may also include other equipment or functional units, which is not limited in the embodiment of the present application.
  • step S11 can be executed or implemented by the model augmentation module 121; step S12 by the model inference module 122; step S13 by the model selection module 123; step S14 by the model training module 124 and the model testing module 125; steps S15 and S16 by the processing module 126; steps S17 and S18 by the quantization structure selection module 127; and steps S19 and S20 by the quantization module 128.
  • the method 60, or each step in the method, may be processed by the CPU alone or jointly by the CPU and GPU; alternatively, a GPU may not be used and another processor suitable for neural network computation, for example a neural network processor, may be used instead, which is not limited here.
  • the embodiment of the present application takes a computing device as an example for illustration. FIG. 2 is a schematic flowchart of a search method for a machine learning model, and FIGS. 3A-3C are schematic diagrams of the search process of a machine learning model. The method may include, but is not limited to, some or all of the following steps:
  • the model to be quantized can be a full floating-point model or a hybrid model.
  • the full-float model refers to the model in which the data types of the model parameters are all floating-point
  • the mixed model refers to the model in which the data types of some model parameters are floating-point and the data types of some model parameters are fixed-point
  • the pure bit model refers to a model in which the data types of the model parameters are all fixed point and the numbers of bits of the model parameters are the same.
  • the pure bit model and the model to be quantified are deep neural networks with the same network structure, and the data types of the model parameters of the two are different.
  • the model to be quantified may be a trained deep learning model, or a constructed untrained machine learning model.
  • M pure bit models are generated, that is, the data types of the model parameters in the model to be quantized are respectively converted into fixed points with different bit numbers.
  • the layer structure is coded.
  • the layer structure i in the pure m-bit model can also be called the m-bit layer structure i, denoted F_i,m; the layer structure F_i,m means that the data type of the model parameters of layer structure i is fixed point with m bits.
  • m and i are positive integers; m is usually not greater than the number of bits of the model parameters in the model to be quantized, and i is not greater than the total number of layers of the model to be quantized/pure bit model.
  • for example, the full floating-point model can be converted into 5 pure bit models: a pure 1-bit model, a pure 2-bit model, a pure 4-bit model, a pure 8-bit model, and a pure 16-bit model. It should be noted that these five pure bit models are exemplary; in another embodiment, the model to be quantized can also be converted into pure bit models with other numbers of bits, such as 3, 5-7, 9-16, or 17-32. Here 1, 2, 4, 8, and 16 are taken as an example for illustration.
  • S12 Run M pure bit models through the mobile terminal, and obtain N evaluation parameters of each layer structure in the M pure bit models, where the N evaluation parameters include reasoning time and/or parameter amount.
  • the computing device can send the M pure bit models to the mobile terminal, and the mobile terminal can benchmark the M pure bit models, for example by inputting each pure bit model into a model performance evaluator, in order to obtain the inference time and parameter amount of each layer structure in the pure bit model. Further, the mobile terminal may send the inference time and parameter amount of each layer structure in the M pure bit models obtained by the test to the computing device. It should be understood that for two layer structures with the same structure, if the numbers of bits of their model parameters differ, their inference times generally differ, and the layer structure with the larger number of bits has the longer inference time. It should also be understood that benchmarking is a method of testing code performance, and the model performance evaluator is an algorithm/program used to test the inference time and/or parameter amount of each layer of a deep neural model.
  • through step S12, it is possible to obtain the inference time of each layer structure in the model to be quantized at different numbers of bits, and the parameter amount of each layer structure; that is to say, the inference time and parameter amount of each layer structure F_i,m can be obtained, where 1 ≤ i ≤ H and m ∈ the bit number set.
  • H is the total number of layers in the layer structure of the model to be quantized/pure bit model;
  • the bit number set includes M values, which are the numbers of bits of the model parameters in the M pure bit models; in the examples shown in FIGS. 3A-3C, the bit number set can be {1, 2, 4, 8, 16}.
  • the parameter quantity of each layer structure in the model to be quantified may also be obtained by a computing device.
  • the parameter amount is used to indicate the data amount of the model parameter in the model or layer structure.
  • the number of model parameters of each layer structure remains unchanged, but the number of bits of each model parameter changes.
  • the parameter amount of the layer structure is not only the same as that of the model parameters in the layer structure. The number is related to the number of model parameters. The more the number of model parameters and the greater the number of model parameters, the greater the number of parameters of the layer structure. That is to say, for the same layer structure, in a pure bit model with a lower number of bits, the parameter amount of the layer structure is smaller.
  • the models in the candidate set in this application are also called candidate models.
  • the candidate set may initially include a candidate model.
  • the candidate model may be the pure bit model with the highest number of bits among the M pure bit models; in this case, the model selected is that pure bit model.
  • the candidate model in the candidate set is a mixed bit model with the same network structure as the model to be quantized.
  • the candidate set may initially include one or more hybrid bit models, and the network structure of the hybrid bit model is the same as the network structure of the model to be quantized.
  • the candidate models in the candidate set are constantly changing during the model search process.
  • the model selected during each model search process will be removed from the candidate set, and the model obtained by low-bit quantization during each model search process will be added to the candidate set.
  • the computing device selects a candidate model from the candidate set can include but is not limited to the following three implementation methods:
  • in the first implementation manner, the computing device randomly selects a candidate model from the candidate set.
  • the computing device can select a model according to the accuracy of the candidate model in the candidate set.
  • the candidate models in the candidate set are all models that have not been trained and tested.
  • the embodiment of this application can perform lightweight training and testing of the candidate models with a small number of samples (also referred to in this application as the second data set): each candidate model in the candidate set is trained and tested separately through the second data set to obtain the accuracy of each candidate model after training.
  • to distinguish it from the accuracy obtained by training and testing a candidate model with a large number of samples (also referred to in this application as the first data set), the accuracy obtained with the small number of samples is called the test accuracy. It should be understood that the number of samples in the first data set is greater than the number of samples in the second data set, and the second data set may be a subset of the samples in the first data set.
  • the computing device may comprehensively select candidate models based on the test accuracy of the candidate models and the weight of the candidate models.
  • the weight of the candidate model is related to the total number of model searches when the candidate model is added to the candidate set.
  • the weight of a candidate model can be determined from the total number of model searches at the time the candidate model was added to the candidate set and the current total number of model searches. For example, the smaller the difference between these two totals, the higher the weight of the candidate model, so the quantized candidate models selected during the most recent model searches have a greater probability of being selected.
  • for example, suppose a first candidate model is obtained by quantizing the selected candidate model and is added to the candidate set during the third model search process (that is, when the first candidate model is added to the candidate set, the total number of model searches is 3).
  • during the fourth model search process, since the first candidate model was added to the candidate set during the most recent model search, the first candidate model is given a greater weight than the candidate models that were already in the candidate set during the third model search, so that it is preferentially selected.
  • as the number of model searches increases, the weight of the first candidate model becomes smaller and smaller; in a later model search, its weight is less than the weights of the candidate models obtained in the 5th, 6th, and 7th model searches.
  • even so, if the test accuracy of the first candidate model is higher than that of the candidate models obtained in the 5th, 6th, and 7th model searches, it may still be selected.
  • in this way, the method for selecting a candidate model can preferentially select the candidate models obtained by the most recent model searches while still preferentially selecting candidate models with high test accuracy.
  • the probability/weight Q j of the j-th candidate model selected in the candidate set can be expressed as:
  • a j is the test accuracy of the j-th candidate model
  • w j is the weight of the j-th candidate model
  • the computing device may sharpen the probability/weight Q j of each candidate model being selected; for example, the probability/weight Q j of the j-th candidate model in the candidate set is processed by the Sharpen algorithm to obtain the processed probability/weight D j, which can be expressed as:
  • C is a constant
  • j is a positive integer
  • j ⁇ S is the total number of candidate models in the candidate set.
  • in the third implementation manner, the computing device may select the candidate model with the highest test accuracy among the candidate models.
  • the probability/weight Q j of the j-th candidate model selected in the candidate set can be expressed as:
  • a j is the test accuracy rate of the j-th candidate model.
  • the probability/weight Q j of the j-th candidate model selected in the candidate set can be processed by the Sharpen algorithm to obtain the processed probability/weight D j .
  • the k-th model search process is taken as an example to illustrate.
  • the candidate set may include at least one model obtained by low-bit quantization in the previous k-1 model search process.
  • in the embodiment of this application, the candidate model is selected based on accuracy, so that models with higher accuracy are selected and the models obtained in each round may be selected, preventing the model search from falling into a local optimum and improving the accuracy of the searched model.
  • in addition, the closer a search round is to the current search, the greater the probability/weight of the candidate models obtained in that round, so that the candidate models obtained in the most recent rounds are selected first.
  • in another embodiment, before the kth model search is performed, the candidate set includes only the at least one model obtained by low-bit quantization during the (k−1)th model search.
  • the model search process does not consider the model obtained by low-bit quantization during the first k-2 model searches, which can reduce the search space, reduce the workload of experimental training, and speed up the process of model search.
  • in different model search processes, the implementation of selecting a candidate model from the candidate set may be the same or different. It should also be noted that after a candidate model in the candidate set is selected, the selected candidate model (that is, the target model) is removed from the candidate set to prevent it from being selected multiple times.
  • S14 Train and test the candidate model selected from the candidate set through the first data set to obtain the target model and the accuracy of the target model.
  • the first data set includes multiple samples.
  • the first data set can be divided into a first training data set and a first test data set.
  • the candidate model selected from the candidate set is trained through the first training data set to obtain the trained model (that is, the target model in this application), and the target model is then tested through the first test data set to obtain the accuracy of the target model.
  • S15: Determine whether the N evaluation parameters of the target model and the accuracy of the target model meet the requirements. If the N evaluation parameters and the accuracy of the target model all meet the requirements, perform S16. If at least one of the N evaluation parameters of the target model does not meet the requirements but the accuracy of the target model does, the target model can be quantized, that is, S17-S20 are executed. If the accuracy of the target model does not meet the requirements, S13 can be executed again to select a new target model from the candidate set.
  • the N evaluation parameters may include inference time and/or parameter amount.
  • the inference time of the target model can be calculated from the inference times of the layer structures of the M pure bit models tested on the mobile terminal; in the same way, the parameter amount of the target model can be calculated from the parameter amounts of the layer structures in the M pure bit models tested on the mobile terminal.
  • for example, the model to be quantized/target model includes H layer structures, and the layer structures of the target model are respectively denoted F_1,8, F_2,4, …, F_i,8, …, F_H,16.
  • the inference time of the target model is the sum of the inference times of the layer structures F_1,8, F_2,4, …, F_i,8, …, F_H,16;
  • the parameter amount of the target model is the sum of the parameter amounts of the layer structures F_1,8, F_2,4, …, F_i,8, …, F_H,16.
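The summation is straightforward; a small sketch with invented per-layer measurements:

```python
# The target model's inference time and parameter amount are the sums of
# the per-layer values measured on the mobile terminal in step S12.
# layer_stats[(i, m)] = (inference_time_ms, param_amount_bits) for
# layer structure i at m bits. All numbers here are invented examples.
layer_stats = {(1, 8): (2.0, 4000), (2, 4): (1.5, 1000), (3, 16): (3.0, 16000)}
target_layers = [(1, 8), (2, 4), (3, 16)]   # e.g. F_1,8, F_2,4, F_3,16

total_time = sum(layer_stats[k][0] for k in target_layers)
total_params = sum(layer_stats[k][1] for k in target_layers)
print(total_time, total_params)   # 6.5 21000
```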
  • one realization (1) of judging whether the N evaluation parameters of the target model meet the requirements can be: judge whether the inference time of the target model is less than the target time threshold and whether the parameter amount of the target model is less than the target parameter amount. If the inference time of the target model is less than the target time threshold and the parameter amount of the target model is less than the target parameter amount, the N evaluation parameters of the target model meet the requirements; when the inference time of the target model is not less than the target time threshold or the parameter amount of the target model is not less than the target parameter amount, at least one of the N evaluation parameters of the target model does not meet the requirements.
  • in another implementation, the N evaluation parameters may include the inference time or the parameter amount; for the specific realization of judging whether the inference time or parameter amount of the target model meets the requirements, refer to the relevant description of realization (1) above, which will not be repeated here.
  • judging whether the accuracy of the target model meets the requirements can be achieved by judging whether the accuracy of the target model is greater than a target threshold, such as 0.76, 0.8, or 0.9; if so, the accuracy of the target model is determined to meet the requirements; otherwise, it does not.
  • target reasoning time, target parameter amount, and target threshold may be values set by the customer or the user, indicating the standard achieved by the target model desired by the user.
  • when the N evaluation parameters of the target model meet the requirements and the accuracy of the target model meets the requirements, the current target model meets the customer's requirements; at this time, the target model can be output and, further, sent to the client device or user terminal.
  • the inference time and parameter amount of each layer structure in the target model can be obtained from the inference time and parameter amount of each layer structure F_i,m obtained in step S12 above, where 1 ≤ i ≤ H and m ∈ the bit number set.
  • S18 Determine the quantitative weight of each layer structure in the target model according to the network structure of the target model and the N evaluation parameters of each layer structure in the target model.
  • the quantization weight is used for the selection of the low-bit quantized layer structure in the target model.
  • for the determination of the quantization weight of each layer structure, refer to the specific realization of the quantization weight P i of the layer structure i in the target model described below, which will not be repeated here.
  • in a first implementation of step S18: if the inference time of the target model is greater than the target inference time and the parameter amount of the target model is not greater than the target parameter amount, the quantization weight of the layer structure i in the target model is determined from the inference time of the layer structure i and the weight of the layer structure i; if the inference time of the target model is not greater than the target inference time and the parameter amount of the target model is greater than the target parameter amount, the quantization weight of the layer structure i is determined from the parameter amount of the layer structure i and the weight of the layer structure i; if the inference time of the target model is greater than the target inference time and the parameter amount of the target model is greater than the target parameter amount, the quantization weight of the layer structure i is determined from the inference time of the layer structure i, the parameter amount of the layer structure i, and the weight of the layer structure i.
  • the quantitative weight P i of the layer structure i in the target model is:
  • ⁇ , ⁇ are parameters, and the right amount of time inference weight, may be constant, as experience; i O i is the weight of the weight of the layer structure, i L i to the layer structure of the layer of inference time or The ratio of the inference time of structure i to the inference time of the target model; R i is the ratio of the parameter quantity of the layer structure i or the parameter quantity of the layer structure i to the parameter quantity of the target model; T is the ratio of the parameter quantity of the layer structure i to the parameter quantity of the target model; The ratio of the inference time of the target model to the target inference time, and M is the ratio of the parameter quantity of the target model to the target parameter quantity.
  • the position of a layer structure in the target model can also be considered in its quantization weight: by setting the weight of the layer structure according to its position, the layer structures that have little influence on the accuracy of the model can be quantized preferentially.
  • the weight of the layer structure is related to the position of the layer structure in the model, and can be a preset value. Generally, the layer structure close to the input data has a smaller weight, and the layer structure close to the output data has a larger weight.
  • the position of the above-mentioned layer structure may not be considered, and the weight of the layer structure may not be set.
  • the quantized weight P i of the layer structure i in the target model can be expressed as:
  • each parameter in formula (6) can refer to the related description in the first implementation manner of step S18, which will not be repeated here.
  • the quantitative weight of each layer structure in the target model can be obtained.
• when the reasoning time of the target model is less than the target reasoning time and the parameter quantity of the target model is not less than the target parameter quantity, the reasoning time of the target model meets the requirement but its parameter quantity does not.
• in this case, the quantization weight P i of layer structure i in the target model mainly considers the parameter quantity of that layer structure; when the reasoning time of the target model is not less than the target reasoning time and the parameter quantity of the target model is less than the target parameter quantity, the reasoning time of the target model does not meet the requirement but its parameter quantity does, and the quantization weight P i of layer structure i mainly considers the inference time of that layer structure.
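As a hedged illustration of how such a quantization weight could be computed (the patent's formulas (5) and (6) are given elsewhere in the description; the additive combination below, the names `alpha`/`beta`, and the `time_ok`/`params_ok` flags are assumptions of this sketch, not the exact formula):

```python
def quantization_weight(o_i, l_i, r_i, t, m,
                        alpha=1.0, beta=1.0,
                        time_ok=False, params_ok=False):
    """Illustrative quantization weight P_i for layer structure i.

    o_i: position weight of layer i; l_i: inference-time ratio of layer i;
    r_i: parameter-quantity ratio of layer i; t: target-model inference time
    over target inference time; m: target-model parameter quantity over
    target parameter quantity.
    """
    p = 0.0
    if not time_ok:    # inference time exceeds the target: favor slow layers
        p += alpha * o_i * l_i * t
    if not params_ok:  # parameter quantity exceeds the target: favor large layers
        p += beta * o_i * r_i * m
    return p
```

With both constraints violated, the weight combines both terms; with only one violated, only the corresponding term remains, matching the three cases described above.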
  • quantization is low-bit quantization
  • low-bit quantization refers to converting the number of bits of the model parameter of the selected layer structure in the target model into a number of bits lower than the number of bits of the current model parameter.
• the specific implementation includes: converting the model parameters in the layer structure with the largest quantization weight in the target model into model parameters represented by at least one bit number, where the at least one bit number is a bit number in the bit number set that is lower than the current bit number of the model parameters of the layer structure with the largest quantization weight in the target model; the bit number set includes M values, and the M values are respectively used to indicate the numbers of bits of the model parameters in the M pure bit models.
• for example, the bit number set is {1, 2, 4, 8, 16}.
• assume that the layer structure selected in the target model (F1,8, F2,4, …, Fi,8, …, FH,16) is FH,16.
• the layer structure FH,16 can be quantized to obtain FH,1, FH,2, FH,4, and FH,8; the models obtained after low-bit quantization are the model (F1,8, F2,4, …, Fi,8, …, FH,1), the model (F1,8, F2,4, …, Fi,8, …, FH,2), the model (F1,8, F2,4, …, Fi,8, …, FH,4), and the model (F1,8, F2,4, …, Fi,8, …, FH,8).
• the computing device may also determine the bit numbers to which the selected layer structure FH,16 is quantized; for example, it may choose to quantize the layer structure FH,16 to obtain only FH,4 and FH,8.
• in this way, the number of quantized models added to the candidate set can be reduced, the model search space can be reduced, and the model search process can be accelerated.
• all the models obtained by the above quantization, that is, the model (F1,8, F2,4, …, Fi,8, …, FH,1), the model (F1,8, F2,4, …, Fi,8, …, FH,2), the model (F1,8, F2,4, …, Fi,8, …, FH,4), and the model (F1,8, F2,4, …, Fi,8, …, FH,8), may be added to the candidate set.
  • part of the candidate models obtained by quantization may also be added to the candidate set.
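The quantization step above can be sketched as follows; representing a model simply as a tuple of per-layer bit numbers is an assumption of this illustration, and `quantize_selected_layer` is a hypothetical helper name:

```python
BIT_NUMBER_SET = (1, 2, 4, 8, 16)  # bit numbers of the M pure bit models

def quantize_selected_layer(model_bits, layer_idx, bit_set=BIT_NUMBER_SET):
    """Convert the selected layer to every bit number in the set lower than
    its current one, yielding one candidate model per new bit number."""
    current = model_bits[layer_idx]
    candidates = []
    for bits in sorted(bit_set):
        if bits < current:               # only lower bit numbers qualify
            new_model = list(model_bits)
            new_model[layer_idx] = bits
            candidates.append(tuple(new_model))
    return candidates

# Quantizing the last layer (an F_{H,16} analogue) of (8, 4, 8, 16)
# yields four candidates ending in 1, 2, 4, and 8 bits respectively.
```

To add only part of the candidates, the caller could filter the returned list (e.g. keep only the 4- and 8-bit variants) before extending the candidate set.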
• after the computing device executes step S20 to update the candidate set, it can re-execute some or all of the above steps S13-S20.
• the above steps S13-S20 can be called one round of model search; after multiple rounds of model search, a target model whose reasoning time, parameter quantity, and accuracy all meet the requirements can be obtained.
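The repeated rounds of steps S13-S20 can be sketched as the loop below; the callables `select`, `train_and_test`, `meets_requirements`, and `quantize` are placeholders for the units described in this application, and the exact stopping logic is an assumption of this sketch:

```python
def model_search(candidate_set, select, train_and_test,
                 meets_requirements, accuracy_threshold,
                 quantize, max_rounds=100):
    """Repeat rounds of S13-S20 until a target model whose evaluation
    parameters and accuracy all meet the requirements is found."""
    for _ in range(max_rounds):
        candidate = select(candidate_set)            # S13: pick a candidate
        target, accuracy = train_and_test(candidate)  # train/test on data set
        if meets_requirements(target) and accuracy > accuracy_threshold:
            return target                             # search succeeds
        if accuracy > accuracy_threshold:
            # quantize the layer with the largest quantization weight and
            # add the resulting models to the candidate set (S20)
            candidate_set.extend(quantize(target))
    return None                                       # no satisfactory model
```

For example, with stub callables where a model is just a size that must shrink below a threshold, the loop quantizes round by round until the requirement is met.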
  • the search method of the machine learning model shown in FIG. 2 can be applied to a variety of scenarios.
  • the model to be quantified may be a first image classification model, and the first image classification model may classify the input image.
  • Both the first data set and the second data set include multiple images, and each image is labeled with its category.
• for example, the first data set includes images labeled "chrysanthemum", images labeled "lotus", images labeled "wheat", images labeled "corn", images labeled "peony", and other images labeled with various plants.
  • the first image classification model can identify the types of plants in the image.
  • the first image classification model is a full floating-point model trained on the first data set.
  • the first image classification model is quantified through the search method of the machine learning model shown in FIG. 2 to obtain a target model whose reasoning time and parameter amount meet the requirements.
  • the second image classification model is also an image classification model, which is a hybrid bit model.
  • the second image classification model and the first image classification model have the same model structure, but the data types of the model parameters are different.
• the second image recognition model is a target model output by the search method of the machine learning model shown in FIG. 2, with the first image classification model as the model to be quantified.
  • the terminal may add the category of the image to be recognized to the image to be recognized. For example, if the current scene includes peony flowers, the second image recognition model can recognize that the category of the image to be recognized is "peony flower". At this time, the text "peony flower" can be added to the image to be recognized.
• after the first image classification model is quantized through the search method of the machine learning model shown in FIG. 2, the obtained second machine learning model takes up less memory and computing resources and recognizes images faster; the terminal can recognize the category of an image while the camera is acquiring it, so as to output the recognition result to the user in real time.
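A minimal sketch of this inference path on the terminal; the `model` callable and the label list are stand-ins for the quantized second image classification model and its category vocabulary, and the drawing step is reduced to returning the predicted label:

```python
def recognize_image(image, model, labels):
    """Run the quantized image classification model on the image to be
    recognized and return the predicted category, which the terminal
    could then draw onto the image (e.g. the text "peony flower")."""
    scores = model(image)  # per-category scores from the mixed bit model
    best = max(range(len(scores)), key=lambda k: scores[k])
    return labels[best]
```

For instance, with labels ["chrysanthemum", "lotus", "peony flower"] and a stub model returning the highest score for the third class, the predicted category is "peony flower".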
  • the machine learning model search method provided in the embodiments of the present application can process other models to be quantified to obtain a target model that meets the requirements and apply it to the terminal.
• the apparatus 500 may be the computing device 12 in the system shown in FIG. 1, and the apparatus 500 may include but is not limited to the following functional units:
  • the generating module 510 is configured to generate M pure bit models according to the model to be quantized, where the pure bit model and the model to be quantized are deep neural networks with the same network structure, and M is a positive integer greater than 1;
• the parameter acquisition module 520 is configured to acquire N evaluation parameters of each layer structure in the M pure bit models, where the N evaluation parameters of each layer structure in the M pure bit models are measured by the mobile terminal when running the M pure bit models;
  • the execution module 530 is configured to perform at least one model search, and output a model whose N evaluation parameters and the accuracy rate all meet the requirements;
  • the execution module 530 includes a training and testing unit 531, an acquisition unit 532, a weight unit 533, a quantization unit 534, and an adding unit 535.
  • the execution module 530 executes the model search process:
  • the training and testing unit 531 is used to train and test the candidate model selected from the candidate set through the first data set to obtain the target model and the accuracy of the target model;
  • the candidate set includes at least one candidate model;
  • the candidate model is a mixed bit model with the same network structure as the model to be quantized;
  • the first data set includes a plurality of samples for training and testing candidate models in the candidate set;
• the obtaining unit 532 is configured to: when at least one of the N evaluation parameters of the target model does not meet the requirements and the accuracy of the target model is greater than the target threshold, obtain the N evaluation parameters of each layer structure in the target model according to the N evaluation parameters of each layer structure in the M pure bit models;
• the weight unit 533 is configured to determine the quantization weight of each layer structure in the target model according to the network structure of the target model and the N evaluation parameters of each layer structure in the target model;
  • the quantization unit 534 is configured to quantize the layer structure with the largest quantization weight in the target model
  • the adding unit 535 is configured to add the quantized model to the candidate set.
  • the N evaluation parameters include inference time and parameter amount
  • the weight unit 533 is specifically configured to:
• if the inference time of the target model is greater than the target inference time and the parameter amount of the target model is not greater than the target parameter amount, the quantization weight of layer structure i in the target model is determined according to the inference time of layer structure i in the target model and the weight of layer structure i;
• if the reasoning time of the target model is not greater than the target reasoning time and the parameter quantity of the target model is greater than the target parameter quantity, the quantization weight of layer structure i in the target model is determined according to the parameter quantity of layer structure i in the target model and the weight of layer structure i;
• if the reasoning time of the target model is greater than the target reasoning time and the parameter amount of the target model is greater than the target parameter amount, the quantization weight of layer structure i in the target model is determined according to the inference time of layer structure i in the target model, the parameter amount of layer structure i in the target model, and the weight of layer structure i.
  • the execution module 530 further includes:
• the selection unit 536 is configured to: before the training and testing unit trains and tests the candidate model selected from the candidate set through the first data set, train and test each candidate model in the candidate set through the second data set to obtain the test accuracy rate of each candidate model in the candidate set, where the number of samples in the second data set is less than the number of samples in the first data set; and select one candidate model from the candidate set according to the test accuracy rate of each candidate model and the weight of each candidate model.
  • the weight of the candidate model is determined according to the total number of model searches when the candidate model is added to the candidate set and the total number of current model searches.
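One plausible form of this round-based weighting, combined with the small-set test accuracy, is sketched below; the linear weighting that favors recently added candidates is an assumption of this illustration, not the patent's exact formula:

```python
def candidate_weight(round_added, current_round):
    """Assumed weighting: candidates added to the candidate set in later
    model-search rounds get a larger weight than older ones."""
    return (round_added + 1) / (current_round + 1)

def select_candidate(candidates, current_round):
    """candidates: list of (model, test_accuracy, round_added) tuples.
    Pick the candidate maximizing test_accuracy * round-based weight."""
    return max(
        candidates,
        key=lambda c: c[1] * candidate_weight(c[2], current_round),
    )[0]
```

Under this assumed form, a slightly less accurate but newer candidate can outrank an older one, keeping the search moving toward recently quantized models.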
  • the quantization unit 534 is specifically configured to:
• the model parameters in the layer structure with the largest quantization weight in the target model are respectively converted into model parameters represented by at least one bit number, where the at least one bit number is a bit number in the bit number set that is lower than the current bit number of the model parameters of the layer structure with the largest quantization weight in the target model; the bit number set includes M numerical values, and the M numerical values are respectively used to indicate the numbers of bits of the model parameters in the M pure bit models.
  • execution module 530 is further configured to:
  • a new model is selected from the candidate set, and the model search is performed.
  • the N evaluation parameters include inference time
  • the parameter acquisition module 520 is specifically configured to:
• in the first model search, the candidate set includes the pure bit model with the highest number of bits among the M pure bit models.
  • FIG. 6 shows an image recognition apparatus provided by an embodiment of the application.
  • the apparatus 600 may be the user equipment 15 in the system shown in FIG. 1, and the apparatus 600 may include, but is not limited to, the following functional units:
  • the acquiring unit 610 is configured to acquire the image to be recognized
  • the recognition unit 620 is configured to input the image to be recognized into the second image recognition model to obtain the category of the image to be recognized;
  • the output unit 630 is configured to output the category of the image to be recognized.
  • the second image recognition model is a target model output by the search method of the machine learning model described in FIG. 2 by using the first image classification model as the model to be quantified.
  • the first image recognition model is a trained deep neural network capable of recognizing image categories, and the first image recognition model is a full floating point model or a hybrid model.
  • the image to be recognized may be an image in the current scene that the terminal can obtain through a camera.
  • FIG. 7 is a schematic diagram of the hardware structure of a search device for a machine learning model provided by an embodiment of the present application.
• the neural network training device 700 shown in FIG. 7 may include a memory 701, a processor 702, a communication interface 703, and a bus 704, where the memory 701, the processor 702, and the communication interface 703 are communicatively connected to each other through the bus 704.
  • the memory 701 may be a read only memory (Read Only Memory, ROM), a static storage device, a dynamic storage device, or a random access memory (Random Access Memory, RAM).
  • the memory 701 may store a program. When the program stored in the memory 701 is executed by the processor 702, the processor 702 and the communication interface 703 are used to execute all or part of the steps in the machine learning model search method of the embodiment of the present application.
• the processor 702 may be a general-purpose central processing unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), a graphics processing unit (GPU), or one or more integrated circuits, and is configured to execute related programs to realize the functions required by the units in the neural network training device of the embodiment of the present application, or to perform all or part of the steps in the search method of the machine learning model of the method embodiment of the present application.
  • the processor 702 may also be an integrated circuit chip with signal processing capability. In the implementation process, each step of the neural network training method of the present application can be completed by the integrated logic circuit of hardware in the processor 702 or instructions in the form of software.
• the aforementioned processor 702 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
• the software module may be located in a mature storage medium in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
• the storage medium is located in the memory 701; the processor 702 reads the information in the memory 701 and, in combination with its hardware, completes the functions required by the units included in the search device of the machine learning model of the embodiment of the present application, or executes all or part of the steps in the search method of the machine learning model of the method embodiment of the present application.
  • the communication interface 703 uses a transceiver device such as but not limited to a transceiver to implement communication between the device 700 and other devices or communication networks.
• for example, the data set (the first data set and/or the second data set) and the model to be quantified can be obtained through the communication interface 703.
  • the bus 704 may include a path for transferring information between various components of the apparatus 700 (for example, the memory 701, the processor 702, and the communication interface 703).
  • parameter acquisition module 520 in the search device 500 of the machine learning model may be equivalent to the communication interface 703 in the neural network search device 700, and the generation module 510 and the execution module 530 may be equivalent to the processor 702.
  • FIG. 8 shows a schematic structural diagram of a terminal 800.
• the terminal 800 is taken as an example to describe the embodiment in detail. It should be understood that the terminal 800 shown in FIG. 8 is only an example; the terminal 800 may have more or fewer components than those shown in FIG. 8, may combine two or more components, or may have a different component configuration.
  • the various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software including one or more signal processing and/or application specific integrated circuits.
  • the terminal 800 may include: a processor 810, an external memory interface 820, an internal memory 821, a universal serial bus (USB) interface 830, a charging management module 840, a power management module 841, a battery 842, antenna 1, antenna 2 , Mobile communication module 850, wireless communication module 860, audio module 870, speaker 870A, receiver 870B, microphone 870C, earphone jack 870D, sensor module 880, buttons 890, motor 891, indicator 892, camera 893, display 894, and Subscriber identification module (subscriber identification module, SIM) card interface 895, etc.
• the sensor module 880 can include a pressure sensor 880A, a gyroscope sensor 880B, an air pressure sensor 880C, a magnetic sensor 880D, an acceleration sensor 880E, a distance sensor 880F, a proximity light sensor 880G, a fingerprint sensor 880H, a temperature sensor 880J, a touch sensor 880K, an ambient light sensor 880L, a bone conduction sensor 880M, etc.
  • the structure illustrated in the embodiment of the present invention does not constitute a specific limitation on the terminal 800.
  • the terminal 800 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 810 may include one or more processing units.
• the processor 810 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • the controller may be the nerve center and command center of the terminal 800.
  • the controller can generate operation control signals according to the instruction operation code and timing signals to complete the control of fetching instructions and executing instructions.
  • a memory may also be provided in the processor 810 for storing instructions and data.
  • the memory in the processor 810 is a cache memory.
  • the memory can store instructions or data that have just been used or recycled by the processor 810. If the processor 810 needs to use the instruction or data again, it can be directly called from the memory. Repeated accesses are avoided, the waiting time of the processor 810 is reduced, and the efficiency of the system is improved.
  • the processor 810 may include one or more interfaces.
• the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the I2C interface is a bidirectional synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • the processor 810 may include multiple sets of I2C buses.
  • the processor 810 may be coupled to the touch sensor 880K, charger, flash, camera 893, etc., respectively through different I2C bus interfaces.
  • the processor 810 may couple the touch sensor 880K through an I2C interface, so that the processor 810 and the touch sensor 880K communicate through the I2C bus interface to implement the touch function of the terminal 800.
  • the I2S interface can be used for audio communication.
  • the processor 810 may include multiple sets of I2S buses.
  • the processor 810 may be coupled with the audio module 870 through an I2S bus to implement communication between the processor 810 and the audio module 870.
  • the audio module 870 may transmit audio signals to the wireless communication module 860 through the I2S interface, so as to realize the function of answering calls through the Bluetooth headset.
  • the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
  • the audio module 870 and the wireless communication module 860 may be coupled through a PCM bus interface.
  • the audio module 870 may also transmit audio signals to the wireless communication module 860 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication.
  • the UART interface is a universal serial data bus used for asynchronous communication.
  • the bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
  • the UART interface is usually used to connect the processor 810 and the wireless communication module 860.
  • the processor 810 communicates with the Bluetooth module in the wireless communication module 860 through the UART interface to implement the Bluetooth function.
  • the audio module 870 may transmit audio signals to the wireless communication module 860 through a UART interface, so as to realize the function of playing music through a Bluetooth headset.
  • the MIPI interface can be used to connect the processor 810 with the display screen 894, the camera 893 and other peripheral devices.
  • the MIPI interface includes a camera serial interface (camera serial interface, CSI), a display serial interface (display serial interface, DSI), and so on.
  • the processor 810 and the camera 893 communicate through a CSI interface to implement the shooting function of the terminal 800.
  • the processor 810 and the display screen 894 communicate through a DSI interface to realize the display function of the terminal 800.
  • the GPIO interface can be configured through software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface can be used to connect the processor 810 with the camera 893, the display screen 894, the wireless communication module 860, the audio module 870, the sensor module 880, and so on.
  • the GPIO interface can also be configured as an I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 830 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
  • the USB interface 830 can be used to connect a charger to charge the terminal 800, and can also be used to transfer data between the terminal 800 and peripheral devices. It can also be used to connect earphones and play audio through earphones. This interface can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiment of the present invention is merely a schematic description, and does not constitute a structural limitation of the terminal 800.
  • the terminal 800 may also adopt different interface connection modes in the foregoing embodiments, or a combination of multiple interface connection modes.
  • the charging management module 840 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the charging management module 840 may receive the charging input of the wired charger through the USB interface 830.
  • the charging management module 840 may receive the wireless charging input through the wireless charging coil of the terminal 800. While the charging management module 840 charges the battery 842, it can also supply power to the electronic device through the power management module 841.
  • the power management module 841 is used to connect the battery 842, the charging management module 840 and the processor 810.
  • the power management module 841 receives input from the battery 842 and/or the charge management module 840, and supplies power to the processor 810, the internal memory 821, the external memory, the display screen 894, the camera 893, and the wireless communication module 860.
  • the power management module 841 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
  • the power management module 841 may also be provided in the processor 810.
  • the power management module 841 and the charging management module 840 may also be provided in the same device.
  • the wireless communication function of the terminal 800 can be realized by the antenna 1, the antenna 2, the mobile communication module 850, the wireless communication module 860, the modem processor, and the baseband processor.
  • the antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the terminal 800 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
  • antenna 1 can be multiplexed as a diversity antenna of a wireless local area network.
  • the antenna can be used in combination with a tuning switch.
  • the mobile communication module 850 may provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the terminal 800.
  • the mobile communication module 850 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like.
  • the mobile communication module 850 can receive electromagnetic waves by the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 850 can also amplify the signal modulated by the modem processor, and convert it to electromagnetic wave radiation via the antenna 1.
  • at least part of the functional modules of the mobile communication module 850 may be provided in the processor 810.
  • at least part of the functional modules of the mobile communication module 850 and at least part of the modules of the processor 810 may be provided in the same device.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the application processor outputs a sound signal through an audio device (not limited to a speaker 870A, a receiver 870B, etc.), or displays an image or video through the display screen 894.
  • the modem processor may be an independent device.
  • the modem processor may be independent of the processor 810 and be provided in the same device as the mobile communication module 850 or other functional modules.
• the wireless communication module 860 can provide wireless communication solutions applied to the terminal 800, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology.
  • the wireless communication module 860 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 860 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 810.
  • the wireless communication module 860 may also receive the signal to be sent from the processor 810, perform frequency modulation, amplify it, and convert it into electromagnetic wave radiation via the antenna 2.
  • the antenna 1 of the terminal 800 is coupled with the mobile communication module 850, and the antenna 2 is coupled with the wireless communication module 860, so that the terminal 800 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), broadband Code division multiple access (wideband code division multiple access, WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC , FM, and/or IR technology, etc.
  • the GNSS may include global positioning system (GPS), global navigation satellite system (GLONASS), Beidou navigation satellite system (BDS), quasi-zenith satellite system (quasi -zenith satellite system, QZSS) and/or satellite-based augmentation systems (SBAS).
  • the terminal 800 implements a display function through a GPU, a display screen 894, and an application processor.
  • the GPU is an image processing microprocessor, which connects the display 894 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
  • the processor 810 may include one or more GPUs, which execute program instructions to generate or change display information.
  • the display screen 894 is used to display images, videos, and so on.
  • the display screen 894 includes a display panel.
  • the display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
  • the terminal 800 may include one or N display screens 894, and N is a positive integer greater than one.
  • the terminal 800 can realize shooting functions through an ISP, a camera 893, a video codec, a GPU, a display screen 894, and an application processor.
  • the ISP is used to process the data fed back by the camera 893. For example, when a photo is taken, the shutter is opened, light is transmitted to the photosensitive element of the camera through the lens, and the optical signal is converted into an electrical signal; the photosensitive element of the camera transmits the electrical signal to the ISP for processing, where it is converted into an image visible to the naked eye.
  • the ISP can also optimize the noise, brightness, and skin color of the image, and can also optimize parameters such as the exposure and color temperature of the shooting scene.
  • the ISP may be provided in the camera 893.
  • the camera 893 is used to capture still images or videos.
  • the object generates an optical image through the lens and is projected to the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • the DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
  • the terminal 800 may include 1 or N cameras 893, and N is a positive integer greater than 1.
  • the digital signal processor is used to process digital signals. In addition to digital image signals, it can also process other digital signals. For example, when the terminal 800 selects a frequency point, the digital signal processor is used to perform Fourier transform on the energy of the frequency point.
  • Video codecs are used to compress or decompress digital video.
  • the terminal 800 may support one or more video codecs. In this way, the terminal 800 can play or record videos in a variety of encoding formats, such as: moving picture experts group (MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • NPU is a neural-network (NN) computing processor.
  • through the NPU, intelligent cognition applications of the terminal 800 can be implemented, such as image recognition, face recognition, speech recognition, and text understanding.
  • the external memory interface 820 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 800.
  • the external memory card communicates with the processor 810 through the external memory interface 820 to implement the data storage function, for example, to save files such as music and videos in the external memory card.
  • the internal memory 821 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 810 executes various functional applications and data processing of the terminal 800 by running instructions stored in the internal memory 821.
  • the internal memory 821 may include a storage program area and a storage data area.
  • the storage program area can store an operating system, an application program (such as a sound playback function, an image playback function, etc.) required by at least one function, and the like.
  • the data storage area can store data (such as audio data, phone book, etc.) created during the use of the terminal 800.
  • the internal memory 821 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
  • the terminal 800 can implement audio functions through an audio module 870, a speaker 870A, a receiver 870B, a microphone 870C, a headphone interface 870D, and an application processor. For example, music playback, recording, etc.
  • the audio module 870 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 870 can also be used to encode and decode audio signals.
  • the audio module 870 may be provided in the processor 810, or part of the functional modules of the audio module 870 may be provided in the processor 810.
  • the speaker 870A, also called a "horn", is used to convert an audio electrical signal into a sound signal.
  • the terminal 800 can listen to music through the speaker 870A, or listen to a hands-free call.
  • the receiver 870B, also called an "earpiece", is used to convert an audio electrical signal into a sound signal.
  • when the terminal 800 answers a call or receives a voice message, the voice can be heard by bringing the receiver 870B close to the human ear.
  • the microphone 870C, also called a "mike" or "mic", is used to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 870C to input the sound signal into the microphone 870C.
  • the terminal 800 may be provided with at least one microphone 870C. In other embodiments, the terminal 800 may be provided with two microphones 870C, which can implement noise reduction functions in addition to collecting sound signals. In other embodiments, the terminal 800 may also be provided with three, four or more microphones 870C to collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
  • the earphone interface 870D is used to connect wired earphones.
  • the earphone interface 870D may be a USB interface 830, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the pressure sensor 880A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the pressure sensor 880A may be provided on the display screen 894.
  • the capacitive pressure sensor may include at least two parallel plates made of a conductive material. When a force is applied to the pressure sensor 880A, the capacitance between the electrodes changes, and the terminal 800 determines the strength of the pressure according to the change in capacitance. When a touch operation acts on the display screen 894, the terminal 800 detects the intensity of the touch operation through the pressure sensor 880A.
  • the terminal 800 may also calculate the touched position according to the detection signal of the pressure sensor 880A.
  • touch operations that act on the same touch position but have different touch operation strengths may correspond to different operation instructions. For example: when a touch operation whose intensity of the touch operation is less than the first pressure threshold is applied to the short message application icon, an instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
  • the gyro sensor 880B may be used to determine the motion posture of the terminal 800.
  • in some embodiments, the angular velocities of the terminal 800 around three axes (i.e., the x, y, and z axes) can be determined by the gyro sensor 880B.
  • the gyro sensor 880B can be used for shooting anti-shake.
  • the gyro sensor 880B detects the shake angle of the terminal 800, calculates, according to the angle, the distance that the lens module needs to compensate for, and allows the lens to counteract the shake of the terminal 800 through reverse motion, thereby achieving image stabilization.
  • the gyro sensor 880B can also be used for navigation and somatosensory game scenes.
  • the air pressure sensor 880C is used to measure air pressure.
  • the terminal 800 uses the air pressure value measured by the air pressure sensor 880C to calculate the altitude to assist positioning and navigation.
  • the magnetic sensor 880D includes a Hall sensor.
  • the terminal 800 can use the magnetic sensor 880D to detect the opening and closing of a flip holster.
  • in some embodiments, the terminal 800 can detect the opening and closing of a flip cover according to the magnetic sensor 880D, and features such as automatic unlocking of the flip cover are set accordingly.
  • the acceleration sensor 880E can detect the magnitude of the acceleration of the terminal 800 in various directions (generally along three axes). When the terminal 800 is stationary, the magnitude and direction of gravity can be detected. The sensor can also be used to identify the posture of the electronic device and is applied to applications such as landscape/portrait switching and pedometers.
  • the distance sensor 880F is used to measure distance. The terminal 800 can measure distance by infrared or laser. In some embodiments, when shooting a scene, the terminal 800 may use the distance sensor 880F to measure distance to achieve fast focusing.
  • the proximity light sensor 880G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode.
  • the terminal 800 emits infrared light to the outside through the light emitting diode.
  • the terminal 800 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 800. When insufficient reflected light is detected, the terminal 800 can determine that there is no object near the terminal 800.
  • the terminal 800 can use the proximity light sensor 880G to detect that the user holds the terminal 800 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor 880G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • the ambient light sensor 880L is used to sense the brightness of the ambient light.
  • the terminal 800 can adaptively adjust the brightness of the display screen 894 according to the perceived brightness of the ambient light.
  • the ambient light sensor 880L can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor 880L can also cooperate with the proximity light sensor 880G to detect whether the terminal 800 is in the pocket to prevent accidental touch.
  • the fingerprint sensor 880H is used to collect fingerprints.
  • the terminal 800 can use the collected fingerprint characteristics to realize fingerprint unlocking, access application locks, fingerprint photographs, fingerprint answering calls, and so on.
  • the temperature sensor 880J is used to detect temperature.
  • the terminal 800 executes a temperature processing policy using the temperature detected by the temperature sensor 880J. For example, when the temperature reported by the temperature sensor 880J exceeds a threshold, the terminal 800 reduces the performance of a processor located near the temperature sensor 880J, so as to reduce power consumption and implement thermal protection.
  • in some other embodiments, when the temperature is lower than another threshold, the terminal 800 heats the battery 842 to avoid abnormal shutdown of the terminal 800 caused by low temperature.
  • in some other embodiments, when the temperature is lower than still another threshold, the terminal 800 boosts the output voltage of the battery 842 to avoid abnormal shutdown caused by low temperature.
  • the touch sensor 880K is also called “touch panel”.
  • the touch sensor 880K can be arranged on the display screen 894, and the touch screen is composed of the touch sensor 880K and the display screen 894, which is also called a “touch screen”.
  • the touch sensor 880K is used to detect touch operations acting on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • the visual output related to the touch operation may be provided through the display screen 894.
  • the touch sensor 880K may also be disposed on the surface of the terminal 800, which is different from the position of the display screen 894.
  • the bone conduction sensor 880M can acquire vibration signals.
  • the bone conduction sensor 880M can acquire the vibration signal of the vibrating bone mass of the human voice.
  • the bone conduction sensor 880M can also contact the human pulse and receive the blood pressure pulse signal.
  • the bone conduction sensor 880M may also be provided in an earphone to form a bone conduction earphone.
  • the audio module 870 can parse the voice signal based on the vibration signal of the vibrating bone block of the voice obtained by the bone conduction sensor 880M, and realize the voice function.
  • the application processor can analyze the heart rate information based on the blood pressure beating signal obtained by the bone conduction sensor 880M, and realize the heart rate detection function.
  • the button 890 includes a power button, a volume button, and so on.
  • the button 890 may be a mechanical button or a touch button.
  • the terminal 800 may receive key input, and generate key signal input related to user settings and function control of the terminal 800.
  • the motor 891 can generate vibration prompts.
  • the motor 891 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • touch operations applied to different applications can correspond to different vibration feedback effects.
  • for touch operations acting on different areas of the display screen 894, the motor 891 can also provide different vibration feedback effects.
  • different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also support customization.
  • the indicator 892 can be an indicator light, which can be used to indicate the charging status, power change, and can also be used to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 895 is used to connect to the SIM card.
  • the SIM card can be connected to and separated from the terminal 800 by inserting the SIM card interface 895 or pulling out from the SIM card interface 895.
  • the terminal 800 may support 1 or N SIM card interfaces, and N is a positive integer greater than 1.
  • the SIM card interface 895 can support Nano SIM cards, Micro SIM cards, SIM cards, etc.
  • the same SIM card interface 895 can insert multiple cards at the same time. The types of the multiple cards can be the same or different.
  • the SIM card interface 895 can also be compatible with different types of SIM cards.
  • the SIM card interface 895 can also be compatible with external memory cards.
  • the terminal 800 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the terminal 800 uses an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the terminal 800 and cannot be separated from the terminal 800.
  • the processor 14021 reads the information in the memory 1401 and, in combination with its hardware, completes the functions required by the units included in the image recognition apparatus 600 of the embodiments of this application, or performs the image recognition method of the method embodiments of this application.
  • the terminal 800 may take an image of the current scene through the camera 893, and then obtain the image to be recognized.
  • the terminal 800 may output the to-be-recognized image and/or the category of the to-be-recognized image through the display.
  • the computer-readable medium may include a computer-readable storage medium, which corresponds to a tangible medium, such as a data storage medium, or a communication medium that includes any medium that facilitates the transfer of a computer program from one place to another (for example, according to a communication protocol).
  • a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium, such as a signal or carrier wave.
  • Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, codes, and/or data structures for implementing the techniques described in this application.
  • the computer program product may include a computer-readable medium.
  • such computer-readable storage media may include RAM, ROM, EEPROM, CD-ROM or other optical disk storage devices, magnetic disk storage devices or other magnetic storage devices, flash memory, or any other medium that can be used to store the desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium.
  • for example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
  • the computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transitory media, but are instead directed to non-transitory, tangible storage media.
  • disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
  • the instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general-purpose microprocessors, application-specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuits.
  • accordingly, the term "processor" as used herein may refer to any of the foregoing structures or any other structure suitable for implementing the techniques described herein.
  • the functions described in the various illustrative logical blocks, modules, and steps described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated into a combined codec.
  • the technology may be fully implemented in one or more circuits or logic elements.
  • the techniques of this application can be implemented in a variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC), or a set of ICs (for example, a chip set).
  • Various components, modules, or units are described in this application to emphasize the functional aspects of the device for implementing the disclosed technology, but they do not necessarily need to be implemented by different hardware units.
  • as described above, the various units may be combined in a codec hardware unit in conjunction with suitable software and/or firmware, or provided by a collection of interoperating hardware units, including one or more processors as described above.


Abstract

Embodiments of the present invention disclose a machine learning model search method and a related apparatus and device, and specifically relate to the field of artificial intelligence technologies. In the method, before model search and quantization are performed, a plurality of pure-bit models are generated according to a to-be-quantized model, and evaluation parameters of each layer structure in the plurality of pure-bit models are obtained. Then, after a candidate model selected from a candidate set is trained and tested to obtain a target model, a quantization weight of each layer structure in the target model can be determined based on the network structure of the target model and the evaluation parameters of each layer structure in the target model, so that the layer structure with the largest quantization weight in the target model is quantized and the model obtained through quantization is added to the candidate set. This can reduce frequent information exchange with the terminal and improve the efficiency of model search and model quantization.

Description

Machine Learning Model Search Method and Related Apparatus and Device

This application claims priority to Chinese Patent Application No. 201911419960.0, filed with the China National Intellectual Property Administration on December 31, 2019 and entitled "Machine Learning Model Search Method and Related Apparatus and Device", which is incorporated herein by reference in its entirety.
Technical Field

The present invention relates to the field of terminal technologies, and in particular, to a machine learning model search method and a related apparatus and device.
Background

With the rapid development of deep learning, deep neural networks are widely used in fields such as the classification and recognition of images, speech, and text. A deep neural network usually has a complex network structure, a long inference time, and a large memory footprint when running. Due to the limited processor computing power and memory storage resources of a mobile terminal, such deep neural networks cannot be applied on mobile terminals. Therefore, how to deploy increasingly computation-heavy deep learning models on mobile terminals is an urgent problem to be solved.

Mixed-bit quantization of a deep neural network is an efficient solution to this problem. Mixed-bit quantization means converting model parameters originally stored as 32-bit floating-point numbers into fixed-point representations with different low bit widths (including 1 bit, 2 bits, 4 bits, and so on), thereby reducing the memory size, inference time, power consumption, and the like of the deep neural network running on a mobile terminal.

The prior art proposes a method for accelerating a model by combining mixed-bit quantization with neural architecture search (NAS). Its principle is as follows. Step S1: Starting from the first layer of the to-be-quantized model, a reinforcement learning decision maker predicts, layer by layer, the quantization scheme of each layer (the optional schemes are 1 to 8 bits; in practice, 2 to 8 bits are used). Step S2: After the quantization scheme of the k-th layer is determined by prediction, the inference time (ROM size or power consumption) of the current quantized model is computed on hardware. If the requirement is not met, the bit widths of the already quantized layers are reduced sequentially from the first layer onward (for example, if the first layer uses a 7-bit quantization scheme, it is reduced to 6 bits) until the requirement is met. Step S3: The model obtained in step S2 is trained to obtain an accuracy-based reward, which is fed into the reinforcement learning decision maker in step S1 to select the quantization scheme for the next layer, until the entire model is quantized and the final mixed-bit quantized model is obtained.

However, in the foregoing quantization and search process of the to-be-quantized model, the device needs to continuously communicate with the mobile terminal during training to obtain the current model performance, and quantization is performed layer by layer. This increases the time consumed by model search and results in low model quantization and search efficiency.
Summary

Embodiments of the present invention provide a machine learning model search method and a related apparatus and device, to solve the problem of low efficiency in the model quantization and search process.

According to a first aspect, an embodiment of the present invention provides a machine learning model search method, including: a computing device generates M pure-bit models according to a to-be-quantized model, and obtains N evaluation parameters of each layer structure in the M pure-bit models, where the N evaluation parameters of each layer structure in the M pure-bit models are measured by a mobile terminal when running the M pure-bit models; and then performs model search at least once, and outputs a model whose N evaluation parameters and accuracy all meet the requirements. The model search process includes: training and testing a candidate model selected from a candidate set by using a first data set, to obtain a target model and the accuracy of the target model; and when at least one of the N evaluation parameters of the target model does not meet a requirement and the accuracy of the target model is greater than a target threshold, obtaining the N evaluation parameters of each layer structure in the target model according to the N evaluation parameters of each layer structure in the M pure-bit models, determining a quantization weight of each layer structure in the target model according to the network structure of the target model and the N evaluation parameters of each layer structure in the target model, then quantizing the layer structure with the largest quantization weight in the target model, and adding the model obtained through quantization to the candidate set. The pure-bit models and the to-be-quantized model are deep neural networks with the same network structure, and M is a positive integer greater than 1. The candidate set includes at least one candidate model. A candidate model is a mixed-bit model with the same network structure as the to-be-quantized model. The first data set includes a plurality of samples and is used to train and test the candidate models in the candidate set.

It should be understood that the foregoing computing device may be a server, a cloud, a distributed computing system, or the like.

In the foregoing method, before model quantization and model search, a plurality of pure-bit models are generated according to the to-be-quantized model, and the evaluation parameters of each layer structure in the plurality of pure-bit models are obtained. Then, after a candidate model selected from the candidate set is trained and tested to obtain a target model, the quantization weight of each layer structure in the target model can be determined based on the network structure of the target model and the evaluation parameters of each layer structure in the target model, so that the layer structure with the largest quantization weight in the target model is quantized and the model obtained through quantization is added to the candidate set. This can reduce frequent information exchange with the terminal and improve the efficiency of model search and model quantization.
With reference to the first aspect, in a possible implementation, the N evaluation parameters include an inference time and a parameter count, and an implementation in which the computing device determines the quantization weight of each layer structure in the target model according to the network structure of the target model and the N evaluation parameters of each layer structure in the target model may be as follows:

if the inference time of the target model is greater than a target inference time and the parameter count of the target model is not greater than a target parameter count, the quantization weight of layer structure i in the target model is determined according to the inference time of layer structure i in the target model and the weight of layer structure i;

if the inference time of the target model is not greater than the target inference time and the parameter count of the target model is greater than the target parameter count, the quantization weight of layer structure i in the target model is determined according to the parameter count of layer structure i in the target model and the weight of layer structure i;

if the inference time of the target model is greater than the target inference time and the parameter count of the target model is greater than the target parameter count, the quantization weight of layer structure i in the target model is determined according to the inference time of layer structure i in the target model, the parameter count of layer structure i in the target model, and the weight of layer structure i.

It should be understood that i is the index of a layer structure in the target model, i is a positive integer, and i is not greater than the total number of layer structures in the target model; the total number of layer structures in the target model is the same as that in the to-be-evaluated model.

In the foregoing method, when the inference time of the target model does not meet the requirement but the parameter count does, the inference time of a layer structure is mainly considered when determining its quantization weight; when the inference time meets the requirement but the parameter count does not, the parameter count of a layer structure is mainly considered; and when neither the inference time nor the parameter count meets the requirement, both the inference time and the parameter count of a layer structure are considered. In this way, quantization proceeds in a direction in which both the inference time and the parameter count meet the requirements, further improving search and quantization efficiency.
Optionally, the quantization weight P_i of layer structure i in the target model is:

P_i = softmax{O_i[α·L_i·f(T) + β·R_i·f(M)]}

[Formula image: Figure PCTCN2020130043-appb-000001]

where α and β are the weights of the inference time and the parameter count, respectively; O_i is the weight of layer structure i; L_i is the inference time of layer structure i, or the ratio of the inference time of layer structure i to the inference time of the target model; R_i is the parameter count of layer structure i, or the ratio of the parameter count of layer structure i to the parameter count of the target model; T is the ratio of the inference time of the target model to the target inference time; and M is the ratio of the parameter count of the target model to the target parameter count.

The inference time of the target model is the sum of the inference times of all layer structures in the target model, and the parameter count of the target model is the sum of the parameter counts of all layer structures in the target model.

Optionally, the weight of layer structure i is related to the position of layer structure i in the target model: a layer structure closer to the input layer of the target model has a smaller weight, and a layer structure closer to the output layer of the target model has a larger weight.

In the foregoing method, when determining the quantization weight of a layer structure in the target model, the importance of the layer structure's position in the model to the model's accuracy is considered, and lower-bit quantization of layer structures close to the input layer is avoided as much as possible, ensuring the model's accuracy and improving search and quantization efficiency.
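As a concrete illustration of the quantization-weight formula P_i = softmax{O_i[α·L_i·f(T) + β·R_i·f(M)]}, the following Python sketch computes P_i over the layers of a target model. The gating function f(·) is defined in an equation image that is not reproduced here, so this sketch assumes f is a 0/1 indicator that activates a term only when the corresponding constraint ratio exceeds 1; all function and variable names are illustrative, not taken from the patent.

```python
import math

def quantization_weights(layer_times, layer_params, layer_weights,
                         total_time, total_params,
                         target_time, target_params,
                         alpha=0.5, beta=0.5):
    """Sketch of P_i = softmax{O_i[alpha*L_i*f(T) + beta*R_i*f(M)]}.

    L_i and R_i are taken as per-layer ratios to the model totals;
    f(.) is assumed to gate a term on only when its constraint is
    violated (ratio > 1) -- an assumption, since the patent defines
    f in an equation image.
    """
    T = total_time / target_time        # inference-time ratio
    M = total_params / target_params    # parameter-count ratio
    f_T = 1.0 if T > 1 else 0.0
    f_M = 1.0 if M > 1 else 0.0

    scores = [
        O * (alpha * (L / total_time) * f_T + beta * (R / total_params) * f_M)
        for L, R, O in zip(layer_times, layer_params, layer_weights)
    ]
    # softmax over layers (numerically stabilized)
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

The layer with the largest returned weight would then be the one selected for lower-bit quantization.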
With reference to the first aspect, in another possible implementation, before the candidate model selected from the candidate set is trained and tested by using the first data set, a model may first be selected from the candidate set. A specific implementation of selecting a model from the candidate set may be as follows: the computing device trains and tests each candidate model in the candidate set by using a second data set, to obtain a test accuracy of each candidate model, where the number of samples in the second data set is smaller than the number of samples in the first data set; and then selects a candidate model from the candidate set according to the test accuracy of each candidate model and the weight of each candidate model.

Optionally, the weight of a candidate model is determined according to the total number of model searches performed when the candidate model was added to the candidate set and the current total number of model searches.

Optionally, the probability/weight Q_j of the j-th candidate model in the candidate set being selected may be expressed as:

Q_j = softmax(w_j·A_j)

where A_j is the test accuracy of the j-th candidate model, and w_j is the weight of the j-th candidate model.

In another implementation of selecting a model from the candidate set, a candidate model may also be selected directly according to the test accuracy of each candidate model, for example, by selecting the candidate model with the highest test accuracy from the candidate set.

In the foregoing method, model selection based on accuracy ensures that the selected model has optimal accuracy, thereby improving the accuracy of the finally output target model.
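A minimal sketch of the accuracy-and-weight-based candidate selection described above, assuming Q_j = softmax(w_j·A_j) is used as a sampling distribution over the candidate set (the function names and the roulette-wheel sampling step are illustrative choices, not mandated by the text):

```python
import math
import random

def select_candidate(accuracies, weights, rng=None):
    """Sample one candidate index with probability Q_j = softmax(w_j * A_j),
    where A_j is the candidate's test accuracy and w_j its weight."""
    rng = rng or random.Random(0)
    scores = [w * a for w, a in zip(weights, accuracies)]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    # roulette-wheel sampling according to probs
    r = rng.random()
    cum = 0.0
    for j, p in enumerate(probs):
        cum += p
        if r <= cum:
            return j, probs
    return len(probs) - 1, probs
```

Sampling (rather than always taking the argmax) keeps lower-scoring candidates reachable, which matches the use of a probability distribution in the text; the argmax variant described as an alternative would simply pick the candidate with the highest test accuracy.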
With reference to the first aspect, in yet another possible implementation, an implementation in which the computing device quantizes the layer structure with the largest quantization weight in the target model may be as follows: the computing device converts the model parameters in the layer structure with the largest quantization weight in the target model into model parameters represented by at least one bit width, where the at least one bit width is a bit width in a bit-width set that is lower than the current bit width of the model parameters of that layer structure, and the bit-width set includes M values respectively indicating the bit widths of the model parameters in the M pure-bit models.

In the foregoing method, the selected layer structure in the target model is quantized to a lower bit width, and the quantized model is used as a candidate model.

With reference to the first aspect, in yet another possible implementation, the model search process may further include: when the accuracy of the target model is less than the target threshold, the computing device reselects a model from the candidate set and performs the foregoing model search.

In the foregoing method, when the accuracy of the target model does not meet the requirement, the target model is directly discarded, which reduces the search space and improves search efficiency.

With reference to the first aspect, in yet another possible implementation, the N evaluation parameters include an inference time, and an implementation in which the computing device obtains the N evaluation parameters of each layer structure in the M pure-bit models may be as follows: the computing device sends the M pure-bit models to the mobile terminal, so that the mobile terminal runs the M pure-bit models and measures the inference time of each layer structure in the M pure-bit models; and then receives the inference time of each layer structure in the M pure-bit models sent by the mobile terminal.

In the foregoing method, only one information exchange between the computing device and the mobile terminal is required to estimate the inference time of any mixed-bit model, thereby improving search and quantization efficiency.

Optionally, the mobile terminal may also measure the parameter count of each layer structure in the M pure-bit models, and the computing device may also receive the parameter count of each layer structure in the M pure-bit models sent by the mobile terminal.
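The single-exchange measurement scheme above works because a mixed-bit model's inference time (and parameter size) can be estimated by summing, layer by layer, the values measured once on the terminal for the pure-bit model of that layer's bit width. A hedged sketch with an assumed data layout (the dictionary format and function name are illustrative):

```python
def estimate_mixed_model(per_layer_stats, bit_assignment):
    """Estimate a mixed-bit model's inference time and parameter size
    from the per-layer measurements of the pure-bit models.

    per_layer_stats: {bits: [(time_ms, size_kb) for each layer]},
        measured once on the mobile terminal per pure-bit model.
    bit_assignment: bit width chosen for each layer of the mixed model.
    """
    total_time = 0.0
    total_size = 0.0
    for layer, bits in enumerate(bit_assignment):
        t, s = per_layer_stats[bits][layer]
        total_time += t      # layer's time at its assigned bit width
        total_size += s      # layer's size at its assigned bit width
    return total_time, total_size
```

This is why the computing device never needs to send intermediate mixed-bit candidates back to the terminal during the search.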
With reference to the first aspect, in yet another possible implementation, in the first model search, the candidate set includes the pure-bit model with the highest bit width among the M pure-bit models, so that the search starts from the highest-bit pure-bit model.
According to a second aspect, an embodiment of this application further provides an image recognition method, including: a terminal obtains a to-be-recognized image, inputs the to-be-recognized image into a second image recognition model to obtain the category of the to-be-recognized image, and then outputs the category of the to-be-recognized image.

The second image recognition model is a target model output, with the first image classification model used as the to-be-quantized model, by the machine learning model search method described in the first aspect or any implementation of the first aspect. The first image recognition model is a trained deep neural network capable of recognizing image categories, and the first image recognition model is a full-floating-point model or a mixed model.

Optionally, the to-be-recognized image may be an image of the current scene captured by the terminal through a camera.

The terminal may be a mobile phone, a tablet computer, a desktop computer, a digital camera, a smart watch, a smart band, a camera, a television, or the like, which is not limited herein.

Through the foregoing method, by running the quantized model, a terminal with limited memory and processing resources can run a deep neural network.
According to a third aspect, an embodiment of this application further provides a machine learning model search apparatus, including:

a generation module, configured to generate M pure-bit models according to a to-be-quantized model, where the pure-bit models and the to-be-quantized model are deep neural networks with the same network structure, and M is a positive integer greater than 1;

a parameter obtaining module, configured to obtain N evaluation parameters of each layer structure in the M pure-bit models, where the N evaluation parameters of each layer structure in the M pure-bit models are measured by a mobile terminal when running the M pure-bit models; and

an execution module, configured to perform model search at least once and output a model whose N evaluation parameters and accuracy all meet the requirements.

The execution module includes a training and testing unit, an obtaining unit, a weight unit, a quantization unit, and an adding unit. When the execution module performs the model search process:

the training and testing unit is configured to train and test a candidate model selected from a candidate set by using a first data set, to obtain a target model and the accuracy of the target model, where the candidate set includes at least one candidate model, a candidate model is a mixed-bit model with the same network structure as the to-be-quantized model, and the first data set includes a plurality of samples and is used to train and test the candidate models in the candidate set;

the obtaining unit is configured to: when at least one of the N evaluation parameters of the target model does not meet a requirement and the accuracy of the target model is greater than a target threshold, obtain the N evaluation parameters of each layer structure in the target model according to the N evaluation parameters of each layer structure in the M pure-bit models;

the weight unit is configured to determine a quantization weight of each layer structure in the target model according to the network structure of the target model and the N evaluation parameters of each layer structure in the target model;

the quantization unit is configured to quantize the layer structure with the largest quantization weight in the target model; and

the adding unit is configured to add the model obtained through quantization to the candidate set.
Optionally, for specific implementations of the modules/units in the machine learning model search apparatus, refer to the related descriptions in the first aspect or any possible implementation of the first aspect. The machine learning model search apparatus may further include other modules/units for implementing the machine learning model search method of the first aspect or any possible implementation of the first aspect, and details are not described herein again.

According to a fourth aspect, an embodiment of this application further provides a machine learning model search apparatus, including a processor and a memory, where the memory is configured to store a program, and the processor executes the program stored in the memory; when the program stored in the memory is executed, the machine learning model search apparatus is enabled to implement the machine learning model search method according to the first aspect or any possible implementation of the first aspect.

Optionally, for specific implementations of the devices/units in the machine learning model search apparatus, refer to the related descriptions in the first aspect or any possible implementation of the first aspect. The machine learning model search apparatus may further include other modules/units for implementing the machine learning model search method described in the first aspect or any possible implementation of the first aspect, and details are not described herein again.
According to a fifth aspect, an embodiment of this application further provides an image recognition apparatus, including:

an obtaining unit, configured to obtain a to-be-recognized image;

a recognition unit, configured to input the to-be-recognized image into a second image recognition model to obtain the category of the to-be-recognized image; and

an output unit, configured to output the category of the to-be-recognized image.

The second image recognition model is a target model output, with the first image classification model used as the to-be-quantized model, by the machine learning model search method described in the first aspect or any implementation of the first aspect. The first image recognition model is a trained deep neural network capable of recognizing image categories, and the first image recognition model is a full-floating-point model or a mixed model.

Optionally, the to-be-recognized image may be an image of the current scene captured by the terminal through a camera.

The image recognition apparatus may be a mobile phone, a tablet computer, a desktop computer, a digital camera, a smart watch, a smart band, a camera, a television, or the like, which is not limited herein.
According to a sixth aspect, an embodiment of this application further provides a machine learning model search apparatus, including a processor and a memory, where the memory is configured to store a program, and the processor executes the program stored in the memory; when the program stored in the memory is executed, the machine learning model search apparatus is enabled to implement the image recognition method according to the second aspect or any possible implementation of the second aspect.

Optionally, for specific implementations of the devices/units in the image recognition apparatus, refer to the related descriptions in the second aspect or any possible implementation of the second aspect. The image recognition apparatus may further include other modules/units for implementing the image recognition method described in the second aspect or any possible implementation of the second aspect, and details are not described herein again.

According to a seventh aspect, an embodiment of this application further provides a computer-readable storage medium, where the computer-readable medium is configured to store computer-executable instructions, and when invoked by a computer, the computer-executable instructions cause the computer to implement the machine learning model search method according to the first aspect or any possible implementation of the first aspect.

According to an eighth aspect, an embodiment of this application further provides a computer program product including instructions, and when the computer program product runs on an electronic device, the terminal is caused to perform the machine learning model search method according to the first aspect or any possible implementation of the first aspect.

According to a ninth aspect, an embodiment of this application further provides a computer-readable storage medium, where the computer-readable medium is configured to store computer-executable instructions, and when invoked by a computer, the computer-executable instructions cause the computer to implement the image recognition method according to the second aspect or any possible implementation of the second aspect.

According to a tenth aspect, an embodiment of this application further provides a computer program product including instructions, and when the computer program product runs on an electronic device, the terminal is caused to perform the image recognition method according to the second aspect or any possible implementation of the second aspect.
Brief Description of the Drawings

To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show merely some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a schematic architectural diagram of a system according to an embodiment of this application;

FIG. 2 is a schematic flowchart of a machine learning model search method according to an embodiment of this application;

FIG. 3A to FIG. 3C are schematic explanatory diagrams of a machine learning model search process according to an embodiment of this application;

FIG. 4 is a schematic flowchart of an image recognition method according to an embodiment of this application;

FIG. 5 is a schematic structural diagram of a machine learning model search apparatus according to an embodiment of this application;

FIG. 6 is a schematic structural diagram of an image recognition apparatus according to an embodiment of this application;

FIG. 7 is a schematic structural diagram of another machine learning model search apparatus according to an embodiment of this application;

FIG. 8 is a schematic structural diagram of another image recognition apparatus according to an embodiment of this application.
Detailed Description

First, the technical terms and concepts involved in this application are introduced.

(1) Neural network

A neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s and an intercept 1 as inputs, and the output of the operation unit may be:

[Formula image: Figure PCTCN2020130043-appb-000002]

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer. The activation function may be a sigmoid function. A neural network is a network formed by connecting many of the foregoing single neural units together, that is, the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to the local receptive field of the previous layer to extract the features of the local receptive field, and the local receptive field may be a region composed of several neural units.
(2) Deep neural network

A deep neural network (DNN), also called a multilayer neural network, can be understood as a neural network with many hidden layers; there is no particular metric for "many" here. Divided by the positions of the layers, the neural network inside a DNN can be classified into three types: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer is necessarily connected to any neuron in the (i+1)-th layer. Although a DNN looks complex, the work of each layer is actually not complex; it is simply the following linear relational expression: y = α(Wx + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called the coefficients), and α() is the activation function. Each layer merely performs this simple operation on the input vector x to obtain the output vector y. Because a DNN has many layers, there are also many coefficients W and offset vectors b. These parameters are defined in the DNN as follows, taking the coefficient W as an example: suppose that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as

[Formula image: Figure PCTCN2020130043-appb-000003]

The superscript 3 represents the layer number of the coefficient W, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4. In summary, the coefficient from the k-th neuron of the (L−1)-th layer to the j-th neuron of the L-th layer is defined as

[Formula image: Figure PCTCN2020130043-appb-000004]

Note that the input layer has no W parameters. In a deep neural network, more hidden layers enable the network to better characterize complex real-world situations. In theory, a model with more parameters has higher complexity and a larger "capacity", which means it can accomplish more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors W of the many layers).
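The per-layer computation y = α(Wx + b) described above can be sketched as follows, using the sigmoid activation mentioned earlier (a toy illustration of the formula, not the patent's implementation):

```python
import math

def dense_layer(W, x, b):
    """One DNN layer: y = alpha(W x + b), with a sigmoid activation alpha.

    W is a list of rows (one per output neuron), x the input vector,
    and b the per-neuron bias vector.
    """
    y = []
    for row, bias in zip(W, b):
        z = sum(w * xi for w, xi in zip(row, x)) + bias  # W x + b
        y.append(1.0 / (1.0 + math.exp(-z)))             # sigmoid
    return y
```

Stacking such layers, with the output of one layer fed as the input of the next, gives the fully connected DNN structure described in the text.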
In this application, each model is a deep neural network; the layer structures of a model are the foregoing hidden layers; the model parameters of a model are the weight matrices in the hidden layers; and the model parameters in a layer structure are the weight matrix in that layer structure.

(3) Fixed point and floating point

Fixed point and floating point are both data types used by computers to store data. The fundamental difference between the two lies in the position of the decimal point: a fixed-point number has a fixed number of digits before and after the decimal point, whereas a floating-point number does not, that is, the position of the decimal point of a floating-point number can vary relative to the significant digits of the number. Compared with fixed-point numbers, floating-point numbers provide a larger value range and higher precision.

A fixed-point number has a certain number of reserved digits; the digits to the left of the decimal point are the integer part of the fixed-point number, and the digits to the right of the decimal point are its fractional part. For example, if the number length is 4 and the decimal point is in the middle, the maximum value the fixed-point number can represent is 99.99 and the minimum value is 00.01 (decimal digits are used here as an example; computers actually use binary digits). It can be seen that the fixed-window form of a fixed-point number means it can represent neither very large numbers nor very small numbers, and precision is lost outside that window.

A floating-point number represents a real number in scientific notation. In contrast to the fixed window of a fixed-point number, it uses a floating window and can therefore represent a real number over a larger precision range. For example, 123.456 can be represented as 1.23456 × 10².
(4) Full-floating-point model, pure-bit model, and mixed-bit model

A full-floating-point model is a deep neural network in which the data types of the model parameters (that is, the weight matrices) of all layer structures use floating-point representation.

A pure-bit model, also called a pure-bit quantized model, is a deep neural network in which the data types of the model parameters (that is, the weight matrices) of all layer structures use fixed-point representation with the same fixed bit width (number of bits).

A mixed-bit model, also called a mixed-bit quantized model, is a deep neural network in which the data types of the model parameters (that is, the weight matrices) of different layer structures use fixed-point representation with the same or different fixed bit widths (numbers of bits). A pure-bit model is a special case of a mixed-bit model in which the model parameters of all layer structures use fixed-point representation with the same fixed bit width.
(5) Model quantization

Model quantization in the embodiments of this application, namely neural network quantization, is a model compression technique that converts floating-point storage (operations) into integer storage (operations). For example, the model parameters of a model originally represented by float32 (32-bit floating point) are represented by int8 (8-bit fixed point) after quantization. Through the model quantization operation, the operation speed of the model is increased at the cost of a small loss of precision.

The essence of model quantization is the conversion/mapping between data of two data types. In one implementation of converting floating-point data (data whose data type is floating point) into fixed-point data (data whose data type is fixed point), the following formula may be used:

[Formula image: Figure PCTCN2020130043-appb-000005]

where R is the input floating-point data, Q is the fixed-point data obtained after quantization of the floating-point data R, Z represents the zero point value (Zero Point), and S represents the scale. It can be seen that after S and Z are determined, the conversion between these two kinds of data can be performed. There are many ways to determine S and Z, for example:

[Formula image: Figure PCTCN2020130043-appb-000006]

Z = Q_max − R_max / S

where R_max represents the maximum value of the input floating-point data, R_min represents the minimum value of the input floating-point data, Q_max represents the maximum value of the fixed-point data, and Q_min represents the minimum value of the fixed-point data.
For conversion between fixed-point data of different bit widths (numbers of bits, 1 bit = 1 binary digit), refer to the foregoing conversion manner between floating-point data and fixed-point data, or to other conversion manners in the prior art; details are not described herein again.

In one implementation, 4-bit and 8-bit conversion may be performed in the foregoing manner, while one implementation of converting floating-point data to 2-bit (1-bit) data may use the following formula:

[Formula image: Figure PCTCN2020130043-appb-000007]

where 2 bits can represent three numbers: −1, 0, and 1. T is a threshold: when the floating-point data is greater than or equal to T, the converted 2-bit fixed-point data is 1; when the floating-point data is less than −T, its value is converted to −1; when the floating-point data takes any other value, its value is converted to 0. The 1-bit conversion manner is similar to the 2-bit one, but its fixed-point values are only −1 and 1, and the value of T is 0.
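The conversion rules above can be sketched as follows. The affine mapping uses S = (R_max − R_min)/(Q_max − Q_min) and Z = Q_max − R_max/S as given in the text (the exact quantization formula appears as an equation image, so the rounding and clamping here are illustrative assumptions), and the ternary function implements the 2-bit thresholding rule:

```python
def make_affine_quantizer(r_min, r_max, q_min, q_max):
    """Affine float-to-fixed quantizer, assumed form Q = round(R / S + Z),
    with S = (R_max - R_min) / (Q_max - Q_min) and Z = Q_max - R_max / S."""
    S = (r_max - r_min) / (q_max - q_min)
    Z = q_max - r_max / S

    def quantize(r):
        q = round(r / S + Z)
        return max(q_min, min(q_max, q))  # clamp to the fixed-point range

    return quantize, S, Z

def ternary_quantize(r, threshold):
    """2-bit (ternary) quantization: 1 if r >= T, -1 if r < -T, else 0."""
    if r >= threshold:
        return 1
    if r < -threshold:
        return -1
    return 0
```

For the 1-bit case described in the text, the same thresholding with T = 0 and no zero level reduces the output to −1 and 1.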
下面介绍本申请实施例涉及的系统架构。如图1所示,图1是本申请实施例提供的一种系统的架构示意图,其中:
客户设备11可以将第一数据集和待量化模型发送至计算设备12,该待量化模型可以是全浮点模型,可以是通过第一数据集训练得到的深度神经网络,也可以是构建的未经过训练的深度神经网络,还也可以是通过自动机器学习(automatic machine learning,AutoML)得到的深度神经网络,客户设备11可以请求计算设备12对待量化模型进行量化,以得到准确率、推理时间、参数量等都符合客户要求的混合比特模型。客户设备12也可以将其对量化后的混合比特模型的目标准确率、目标推理时间、目标参数量等标准发送至计算设备12。
可选地,第一数据集也可以来源于数据库13,第一数据集包括多个样本。在一种场景中,样本可以是标签为物体类型的图像,待量化模型以及客户期望得到的混合比特模型都是具备图像识别能力的深度神经网络,其在接收到待识别图像后,可以识别该待识别图像的图像类别。不限于上述场景,还可以应用于其他场景,如样本还可以是标签为手势类型的图像,待量化模型以及客户期望得到的混合比特模型都是具备手势识别能力的深度神经网络,其在接收到待识别图像后,可以识别该待识别图像的手势。
计算设备12在接收到待量化模型后，可以根据客户的要求对该待量化模型进行量化，以得到准确率大于目标准确率、推理时间小于目标推理时间、参数量小于目标参数量的混合比特模型。在具体实现中，计算设备12可以首先将待量化模型进行量化，得到多个纯比特模型，各个纯比特模型与待量化模型的网络结构相同，但其模型参数的数据类型和模型参数的比特数不同，进而，将得到的多个纯比特模型发送至移动终端测试平台14。
移动终端14包括测试系统,在移动终端14运行各个纯比特模型,移动终端可以通过测试系统获取到各个纯比特模型中每一层层结构的推理时间、参数量等参数,也就是说,移动终端获取了由不同比特数的定点表达的各个层结构的推理时间和参数量。移动终端14可以将测试得到的各个纯比特模型中每一层层结构的推理时间、参数量等发送至计算设备12。
计算设备12可以执行至少一次模型搜索的过程，该过程包括：计算设备12可以通过少量的样本对候选集中的候选模型进行训练，得到各个候选模型的测验准确率，进而基于测验准确率从候选集中选取一个候选模型，并通过第一数据集对该候选模型进行训练和测试，得到训练后的候选模型（即目标模型）和目标模型的准确率；进而，若该目标模型不满足客户的要求，则根据目标模型中层结构的推理时间和/或参数量等来选择需要进行量化的层结构，进而将目标模型中被选中的层结构进行量化，得到一个或多个量化后的模型。应理解，目标模型在量化前后，只有被选中的层结构的模型参数的表达发生变化，其他层结构保持不变；进一步地，计算设备12将量化得到的模型添加至候选集中，并重复执行下一次模型搜索的过程。应理解，只有当目标模型满足客户要求时，才输出目标模型，输出的目标模型即为客户需要的混合比特模型。
计算设备12可以将输出的目标模型发送至客户设备11。用户设备15(移动终端)可以向客户设备11或计算设备12下载目标模型,以使用目标模型。该目标模型虽然相对于待量化模型牺牲部分精度,但大大提高了运算速度,实现在具有较低内存容量和内存带宽的移动终端上应用复杂的深度神经网络。
计算设备12可以包括多个模块/节点。一方面，计算设备12可以是分布式计算系统，计算设备12包括的多个模块/节点可以分别为具备计算能力的计算机设备；另一方面，计算设备12可以是一个设备，其包括的多个模块/节点可以是计算设备12中的功能模块/器件等。其中，模型增广模块121用于根据待量化模型生成多个纯比特模型。模型推理模块122用于与移动终端进行信息交互，得到各个纯比特模型中每一层层结构的推理时间和参数量。模型选择模块123用于从候选集中选取一个候选模型。模型训练模块124用于对筛选出的候选模型进行训练，得到目标模型。模型测试模块125用于对目标模型进行测试，以得到目标模型的准确率。处理模块126用于判断目标模型是否满足客户的要求，在目标模型满足客户的要求后，输出该目标模型。量化结构选择模块127用于在目标模型不满足客户要求，但目标模型的准确率满足客户要求时，基于目标模型中各个层结构的量化权重，选择目标模型中量化权重最大的层结构作为需要量化的层结构。量化模块128用于对目标模型中被选中的层结构进行量化，得到量化后的模型，并将量化后的模型添加至候选集。
上述计算设备12、计算设备12中的各个模块可以是云服务器、服务器、计算机设备、终端设备等,此处不再赘述。
上述客户设备11或用户设备15可以是手机、平板电脑、个人计算机、车辆、车载单元、销售终端(point of sales,POS)、个人数字助理(personal digital assistant,PDA)、无人机、智能手表、智能眼镜、VR设备等,此处不作限定。客户设备11还可以是服务器。
系统中客户设备11、用户设备15、数据库13也不是系统必须的设备,系统不包括上述设备,或还可以包括其他设备或功能单元,本申请实施例不作限定。
下面介绍本申请实施例提供的一种机器学习模型的搜索方法，该方法可以由图1中计算设备执行。可选地，步骤S11也可以由模型增广模块121执行或实现；步骤S12也可以由模型推理模块122执行或实现；步骤S13可以由模型选择模块123执行或实现；步骤S14可以由模型训练模块124和模型测试模块125来实现；步骤S15、S16可以由处理模块126来实现；步骤S17、S18可以由量化结构选择模块127执行或实现；步骤S19、S20也可以由量化模块128来实现。可选的，所述方法或方法中的各个步骤可以分别由CPU处理，也可以由CPU和GPU共同处理，也可以不用GPU，而使用其他适合用于神经网络计算的处理器，例如神经网络处理器，此处不作限定。本申请实施例以执行主体为计算设备为例来说明，参见图2所示的一种机器学习模型的搜索方法的流程示意图和图3A-图3C所示的一种机器学习模型的搜索过程的示意性说明图，该方法可以包括但不限于如下部分或全部步骤：
S11:根据待量化模型生成M个纯比特模型。
其中，待量化模型可以是全浮点模型或混合模型。这里，全浮点模型是指模型参数的数据类型都为浮点的模型；混合模型是指部分模型参数的数据类型为浮点、部分模型参数的数据类型为定点的模型；纯比特模型是指模型参数的数据类型都为定点且模型参数的位数都相同的模型。
其中,纯比特模型和待量化模型为网络结构相同的深度神经网络,两者的模型参数的数据类型不同。可选地,该待量化模型可以是训练好的深度学习模型,也可以是构建的未经训练的机器学习模型。
根据待量化模型生成M个纯比特模型，即将待量化模型中模型参数的数据类型分别转换为不同比特数的定点。对层结构进行编码，纯m比特模型中的层结构i也可以称为m比特的层结构i，可以表示为F i,m，层结构F i,m指代层结构i的模型参数的数据类型为m比特（位）的定点。m、i为正整数，m通常不大于待量化模型中的模型参数的位数；i不大于待量化模型/纯比特模型中层结构的总层数。
例如，参见图3A所示的全浮点模型量化为5个纯比特模型的示意性说明图，待量化模型为32比特的全浮点模型，包括H层层结构，即层结构i，i=1、2、…、H，H为待量化模型中层结构的总层数，H为正整数。可以将该全浮点模型转换为5个纯比特模型，分别是纯1比特模型、纯2比特模型、纯4比特模型、纯8比特模型、纯16比特模型。需要说明的是，上述转换得到的5个纯比特模型为示例性说明，在另一实施例中，待量化模型还可以转换为比特数为3、5-7、9-16、17-32等的纯比特模型，这里以比特数为1、2、4、8、16为例来说明。
S12:通过移动终端运行M个纯比特模型,获取该M个纯比特模型中每一层层结构的N个评价参数,该N个评价参数包括推理时间和/或参数量。
在S12的一种具体实现中，计算设备可以将M个纯比特模型发送至移动终端，移动终端可以对M个纯比特模型进行基准测试（benchmark），例如将纯比特模型输入到模型性能评估器，以得到该纯比特模型中每一层层结构的推理时间、参数量。进一步地，移动终端可以将测试得到的M个纯比特模型中每一层层结构的推理时间、参数量发送至计算设备。应理解，对于结构相同的两个层结构，若它们的模型参数的比特数不同，则它们的推理时间一般不同，比特数更大的层结构具有更长的推理时间。应理解，基准测试是一种测试代码性能的方法，模型性能评估器是用于测试深度神经网络模型中每一层层结构的推理时间和/或参数量的算法/程序。
通过上述步骤S12可以得到待量化模型中每一层层结构的模型参数分别处于不同比特数时的推理时间，以及每一层层结构的参数量，也就是说可以得到层结构F i,m的推理时间、参数量，其中，1≤i≤H，m∈比特数集合。其中，H为待量化模型/纯比特模型中层结构的总层数；比特数集合包括M个数值，分别为M个纯比特模型中模型参数的比特数，在图3A-图3C所示的实例中，比特数集合可以为{1,2,4,8,16}。
在另一些实施例中，待量化模型中每一层层结构的参数量也可以由计算设备来获取。
应理解，参数量用于指示模型或层结构中模型参数的数据量。待量化模型在进行数据类型的转换前后，各个层结构的模型参数的个数不变，但各个模型参数的位数发生改变。可以理解，层结构的参数量不仅与该层结构中模型参数的个数有关，也与模型参数的位数有关，模型参数的个数越多、模型参数的位数越大，则该层结构的参数量越大。也就是说，针对同一层结构，比特数越低的纯比特模型中，该层结构的参数量越小。
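按上述说明，层结构的参数量可以示意性地按“参数个数×位数”估算，以下函数名与数值均为举例：

```python
def layer_param_size_bits(num_params, bit_width):
    """层结构的参数量（比特）= 模型参数的个数 × 每个参数的位数"""
    return num_params * bit_width

# 同一层结构（假设含 1000 个参数）在不同比特数的纯比特模型中的参数量
for m in (1, 2, 4, 8, 16):
    print(m, layer_param_size_bits(1000, m))
```

可以看到，参数个数不变时，比特数越低，该层结构的参数量越小。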
S13:从候选集中选择一个候选模型。
应理解,本申请中候选集中的模型也称为候选模型,候选集初始时可以包括一个候选模型,该候选模型可以是M个纯比特模型中比特数最高的纯比特模型,此时,选择出的模型即为该比特数最高的纯比特模型。应理解,候选集中候选模型是与待量化模型的网络结构相同的混合比特模型。在另一种实现中,候选集初始时可以包括一个或多个混合比特模型,该混合比特模型的网络结构与待量化模型的网络结构相同。
候选集中的候选模型在模型搜索的过程中不断变化,在每次模型搜索过程中被选择出的模型会从候选集中去除,而每次模型搜索过程中进行低比特量化得到的模型会添加至候选集。
计算设备从候选集中选择一个候选模型的实现方式可以包括但不限于如下三种：
第一实现方式:计算设备从候选集中随机选择一个候选模型。
计算设备可以根据候选集中候选模型的准确率来选择模型。候选集中的候选模型都是未经过训练和测试的模型，为减少模型训练和测试占用的计算资源、时间等，本申请实施例中，可以通过少量样本（本申请中也称为第二数据集）对候选模型进行轻量化的训练和测试，即通过第二数据集对候选集中每一个候选模型分别进行训练、测试，得到训练后的各个候选模型的准确率。为与通过大量样本（本申请中也称第一数据集）对候选模型进行训练和测试得到的准确率进行区别，这里将通过少量样本得到的准确率称为测验准确率。应理解，第一数据集中样本的个数大于第二数据集中样本的个数，第二数据集可以是第一数据集中的部分样本。
第二实现方式:如图3B所示,计算设备可以基于候选模型的测验准确率和候选模型的权重综合选择候选模型。其中,在一种实现中,候选模型的权重与该候选模型被添加至候选集时模型搜索的总次数有关,可选地,候选模型的权重可以根据该候选模型被添加至候选集时模型搜索的总次数和当前模型搜索的总次数确定。例如,候选模型被添加至候选集时模型搜索的总次数与当前模型搜索的总次数之差越小,则该候选模型的权重越高,进而使得最近一次模型搜索过程中量化得到候选模型被选中的概率越大。
例如，在第3次模型搜索过程中，通过对选中的候选模型进行量化得到了第一候选模型，该第一候选模型在第3次模型搜索过程中被添加至候选集（即，第一候选模型被添加至候选集时模型搜索的总次数为3），那么，在第4次模型搜索过程中，由于该第一候选模型是最近一次模型搜索过程中被添加至候选集中的，为该第一候选模型设置比第3次模型搜索之前已被添加至候选集中的候选模型更大的权重，使得该第一候选模型被优先选中。然而，若该第一候选模型在第4次模型搜索过程中未被选中，那么随着模型搜索的进行，第一候选模型的权重越来越小，例如，在第8次模型搜索过程中，该第一候选模型的权重小于第5、6、7次模型搜索得到的候选模型的权重；然而，在该第一候选模型的测验准确率高于第5、6、7次模型搜索得到的候选模型的情况下，仍保留该第一候选模型被选中的可能。该选择候选模型的方法可以在优先选择测验准确率高的候选模型的基础上尽量选取最近模型搜索得到的候选模型。
在另一种实现中，候选集中第j个候选模型被选择的概率/权重Q j可以表示为：
Q j=softmax(w jA j)    (1)
其中,A j为第j个候选模型的测验准确率,w j为第j个候选模型的权重。
为增大各个候选模型被选择的概率/权重Q j之间的区分度，计算设备可以对各个候选模型被选择的概率/权重Q j进行锐化处理，例如，候选集中第j个候选模型被选择的概率/权重Q j通过Sharpen算法进行处理，得到处理后的概率/权重D j，可以表示为：
D j=Q j^C/(Q 1^C+Q 2^C+…+Q S^C)    (2)
其中,C为常数,j为正整数,j≤S,S为候选集中候选模型的总数。
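公式(1)、(2)所述的候选模型选择可以用如下Python代码示意，其中候选模型权重w j按“加入候选集的轮次与当前轮次之差越小权重越大”的思路构造，具体的衰减函数与常数C的取值均为示例性假设：

```python
import math

def softmax(xs):
    """标准 softmax：把一组分数归一化为概率"""
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def select_probs(accs, added_rounds, current_round, c=2.0):
    """accs: 各候选模型的测验准确率 A j；
    added_rounds: 各模型被添加至候选集时模型搜索的总次数"""
    # 假设的权重构造：轮次差越小，w j 越大
    ws = [1.0 / (1 + current_round - r) for r in added_rounds]
    q = softmax([w * a for w, a in zip(ws, accs)])  # 公式(1)
    d = [x ** c for x in q]                         # 公式(2)：Sharpen 锐化
    total = sum(d)
    return [x / total for x in d]

probs = select_probs([0.90, 0.85, 0.88], [3, 5, 7], current_round=8)
print(probs)  # 概率之和为 1，锐化后各模型被选中概率的差距被放大
```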
第三实现方式：计算设备也可以选择候选集中测验准确率最高的候选模型。
在一种实现中，候选集中第j个候选模型被选择的概率/权重Q j可以表示为：
Q j=softmax(A j)    (3)
其中,A j为第j个候选模型的测验准确率。
同第二种实现方式，可以对候选集中第j个候选模型被选择的概率/权重Q j通过Sharpen算法进行处理，得到处理后的概率/权重D j，具体实现参见上述第二种实现方式，此处不再赘述。
在一些实施例中，以第k次模型搜索过程为例来说明，在进行第k次模型搜索之前，候选集中可以包括前k-1次模型搜索过程中低比特量化得到的至少一个模型。通过上述方法，基于准确率进行候选模型的选择，使得准确率更高的模型被选中，各个轮次得到的模型都有可能被选中，避免模型的搜索进入局部最优，提高搜索得到的模型的准确率。进一步地，在上述第二实现方式中，基于轮次设定候选模型的权重，使得搜索轮次越接近本次搜索，该搜索轮次得到的候选模型的概率/权重越大，从而使得最近轮次得到的候选模型被优先选中。
在另一些实施例中,在进行第k次模型搜索之前,候选集中仅包括第k-1次模型搜索过程中低比特量化得到的至少一个模型。此时,模型的搜索过程中不考虑前k-2次模型搜索过程中低比特量化得到的模型,可以减少搜索空间,减少实验训练的工作量,加快模型搜索的进程。
需要说明的是,在不同轮次的搜索过程中,从候选集中选择一个候选模型的实现方式可以相同或不同。还需要说明的是,候选集中的候选模型被选择后,被选择的候选模型(即目标模型)从候选集中去除,以避免该目标模型多次被选中。
S14:通过第一数据集对从候选集中选择出的候选模型进行训练和测试,得到目标模型和该目标模型的准确率。
其中，第一数据集包括多个样本，第一数据集可以划分为第一训练数据集和第一测试数据集，通过第一训练数据集对从候选集中选择出的候选模型进行训练，得到训练后的模型（即本申请中的目标模型），进一步地，通过第一测试数据集对目标模型进行测试，以得到该目标模型的准确率。
S15：判断目标模型的N个评价参数和目标模型的准确率是否都满足要求。若目标模型的N个评价参数和准确率都满足要求，则执行S16；若目标模型的N个评价参数中存在至少一个评价参数不满足要求且目标模型的准确率满足要求，则可对该目标模型进行量化，即执行S17-S20；若目标模型的准确率不满足要求，则可以重新执行S13，从候选集中选择新的目标模型。
其中，N个评价参数可以包括推理时间和/或参数量。目标模型的推理时间可以根据在移动终端测试得到的M个纯比特模型中每一层层结构的推理时间计算得到；同理，目标模型的参数量也可以根据在移动终端测试得到的M个纯比特模型中每一层层结构的参数量计算得到。例如，待量化模型/目标模型包括H层层结构，目标模型的层结构分别表示为F 1,8、F 2,4、…、F i,8、…、F H,16。那么，目标模型的推理时间即为上述层结构F 1,8、F 2,4、…、F i,8、…、F H,16的推理时间之和；目标模型的参数量即为上述层结构F 1,8、F 2,4、…、F i,8、…、F H,16的参数量之和。
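“目标模型的推理时间为各层推理时间之和”这一计算可以示意如下，其中各层测得的推理时间为虚构数据：

```python
# 每层层结构在不同比特数下测得的推理时间（ms），数据为虚构示例
layer_latency = {
    (1, 8): 1.2,   # 层1，8比特
    (2, 4): 0.8,   # 层2，4比特
    (3, 16): 2.5,  # 层3，16比特
}

def model_latency(structure):
    """structure: [(层索引 i, 比特数 m), ...]；模型推理时间为各层推理时间之和"""
    return sum(layer_latency[(i, m)] for i, m in structure)

print(model_latency([(1, 8), (2, 4), (3, 16)]))  # 4.5
```

目标模型的参数量可以按同样的求和方式计算。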
其中，判断目标模型的N个评价参数是否都满足要求的实现（一）可以是：判断目标模型的推理时间是否小于目标推理时间以及判断目标模型的参数量是否小于目标参数量。若目标模型的推理时间小于目标推理时间且目标模型的参数量小于目标参数量，则目标模型的N个评价参数都满足要求；当目标模型的推理时间不小于目标推理时间或目标模型的参数量不小于目标参数量时，则目标模型的N个评价参数中存在至少一个评价参数不满足要求。在另一实现中，N个评价参数可以包括推理时间或参数量，进而，判断目标模型的推理时间或参数量是否满足要求的具体实现可以参见上述实现（一）中相关描述，此处不再赘述。
其中，判断目标模型的准确率是否满足要求的一种实现可以是：判断目标模型的准确率是否大于目标阈值，如0.76、0.8、0.9等，如果是，则判断目标模型的准确率满足要求，否则，不满足要求。
应理解,上述目标推理时间、目标参数量、目标阈值可以是客户或用户设定的值,指示用户期望的目标模型达到的标准。
S16:输出该目标模型。
当目标模型的N个评价参数都满足要求且目标模型的准确率满足要求,说明当前的目标模型满足客户要求,此时可以输出该目标模型,进一步地,可以将该目标模型发送至客户设备或用户终端。
S17:根据M个纯比特模型中每一层层结构的N个评价参数获取目标模型中每一层层结构的N个评价参数。
其中，目标模型中每一层层结构的推理时间和参数量可以从上述步骤S12中得到的各个层结构F i,m的推理时间、参数量中获取，其中，1≤i≤H，m∈比特数集合。
S18:根据目标模型的网络结构和目标模型中每一层层结构的N个评价参数,确定目标模型中每一层层结构的量化权重。其中,量化权重用于目标模型中被低比特量化的层结构的选择,确定层结构的量化权重的具体实现可以参见下述目标模型中层结构i的量化权重P i的具体实现,此处不再赘述。
在步骤S18的一种实现方式中,若所述目标模型的推理时间大于目标推理时间且所述目标模型的参数量不大于目标参数量,则根据所述目标模型中的层结构i的推理时间和所述层结构i的权重确定所述目标模型中的层结构i的量化权重;若所述目标模型的推理时间不大于目标推理时间且所述目标模型的参数量大于目标参数量,则根据所述目标模型中的层结构i的参数量和所述层结构i的权重确定所述目标模型中的层结构i的量化权重;若所述目标模型的推理时间大于目标推理时间且所述目标模型的参数量大于目标参数量,则根据所述目标模型中的层结构i的推理时间、所述目标模型中的层结构i的参数量和所述层结构i的权重确定所述目标模型中的层结构i的量化权重。
以确定目标模型中层结构i的量化权重P i为例来说明目标模型中各个层结构量化权重的确定方法,目标模型中层结构i的量化权重P i为:
P i=softmax{O i[αL i*f(T)+βR i*f(M)]}     (4)
f(x)=x（x≥1）；f(x)=0（x<1）     (5)
其中,α、β分别为推理时间和参数量的权重,可以是常数,如经验值;O i为所述层结构i的权重,L i为所述层结构i的推理时间或为所述层结构i的推理时间与所述目标模型的推理时间的比值;R i为所述层结构i的参数量或所述层结构i的参数量与所述目标模型的参数量的比值;T为所述目标模型的推理时间与所述目标推理时间的比值,M为所述目标模型的参数量与所述目标参数量的比值。
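结合公式(4)、(5)，下面给出计算各层量化权重P i的一个Python示意，其中α、β与各层权重O i的取值均为假设的经验值：

```python
import math

def softmax(xs):
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def f(x):
    """公式(5)：仅当比值不小于 1（即该指标不满足要求）时才计入该项"""
    return x if x >= 1 else 0.0

def quant_weights(latencies, sizes, o, t_ratio, m_ratio, alpha=1.0, beta=1.0):
    """latencies/sizes: 各层的推理时间与参数量（或其占模型总量的比值）；
    o: 各层权重 O i；t_ratio/m_ratio: 公式中的 T、M"""
    scores = [o_i * (alpha * l * f(t_ratio) + beta * r * f(m_ratio))
              for o_i, l, r in zip(o, latencies, sizes)]
    return softmax(scores)  # 公式(4)：各层的量化权重 P i

# 推理时间超标（T>1）、参数量达标（M<1）时，量化权重主要由各层推理时间决定
p = quant_weights([1.0, 3.0, 2.0], [5.0, 1.0, 1.0], [0.5, 1.0, 1.0],
                  t_ratio=1.4, m_ratio=0.9)
print(p, p.index(max(p)))  # 推理时间最长的层（索引1）量化权重最大
```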
可选地，通常模型中越接近输入数据的层结构的精度对模型的准确度影响越大，为最大程度保持量化后的模型的精度，可以在目标模型的层结构的量化权重中考虑层结构的位置这一因素：通过设置层结构的权重，可以优先选择对模型的准确度影响小的层结构进行量化。层结构的权重与层结构在模型中的位置有关，可以是预先设定的值，通常靠近输入数据的层结构具有较小的权重，而靠近输出数据的层结构具有较大的权重。
在步骤S18的第二种实现方式中,也可以不考虑上述层结构的位置,不设置层结构的权重。此时,目标模型中层结构i的量化权重P i可以表示为:
P i=softmax[αL i*f(T)+βR i*f(M)]    (6)
其中,公式(6)中各个参数的含义可以参见上述步骤S18的第一种实现方式中相关描述,此处不再赘述。
可见，通过上述公式(4)/公式(6)、公式(5)，可以得到目标模型中每一层层结构的量化权重，在目标模型的推理时间小于目标推理时间且目标模型的参数量不小于目标参数量时，则说明目标模型的推理时间满足要求但其参数量不满足要求，此时，f(T)=0，f(M)=M，目标模型中层结构i的量化权重P i主要考虑该层结构的参数量；在目标模型的推理时间不小于目标推理时间且目标模型的参数量小于目标参数量时，则说明目标模型的推理时间不满足要求但其参数量满足要求，此时，f(T)=T，f(M)=0，目标模型中的层结构i的量化权重P i主要考虑该层结构的推理时间；在目标模型的推理时间不小于目标推理时间且目标模型的参数量不小于目标参数量时，则说明目标模型的推理时间和参数量都不满足要求，此时f(T)=T，f(M)=M，目标模型中的层结构i的量化权重P i同时考虑该层结构的推理时间和参数量。
S19:对目标模型中量化权重最大的层结构进行量化,得到量化后的模型。
这里，量化为低比特量化，低比特量化是指将目标模型中被选中的层结构的模型参数的比特数转换为比当前模型参数的比特数更低的比特数。具体实现包括：将所述目标模型中量化权重最大的层结构中模型参数分别转换为由至少一个比特数表示的模型参数，所述至少一个比特数为比特数集合中比所述目标模型中量化权重最大的层结构的模型参数的当前比特数低的比特数，所述比特数集合包括M个数值，所述M个数值分别用于指示所述M个纯比特模型中模型参数的比特数。
例如，如图3C所示，比特数集合={1,2,4,8,16}，目标模型(F 1,8、F 2,4、…、F i,8、…、F H,16)中被选中的层结构为F H,16，则可以将该层结构F H,16分别量化为F H,1、F H,2、F H,4、F H,8，得到的低比特量化后的模型分别是模型(F 1,8、F 2,4、…、F i,8、…、F H,1)、模型(F 1,8、F 2,4、…、F i,8、…、F H,2)、模型(F 1,8、F 2,4、…、F i,8、…、F H,4)和模型(F 1,8、F 2,4、…、F i,8、…、F H,8)。
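将被选中的层结构分别量化为比特数集合中所有更低的比特数、从而得到多个量化后模型的过程，可以示意如下（这里用各层比特数的列表近似表示模型结构）：

```python
BIT_SET = (1, 2, 4, 8, 16)  # 比特数集合

def quantize_layer(structure, layer_idx):
    """structure: 各层比特数列表；把第 layer_idx 层分别量化为所有更低的比特数，
    其余层保持不变，返回所有量化后的模型结构"""
    cur = structure[layer_idx]
    models = []
    for m in BIT_SET:
        if m < cur:  # 只取比当前比特数更低的比特数
            new = list(structure)
            new[layer_idx] = m
            models.append(new)
    return models

# 目标模型 (F 1,8、F 2,4、F H,16)，选中最后一层（16比特）
print(quantize_layer([8, 4, 16], 2))
# [[8, 4, 1], [8, 4, 2], [8, 4, 4], [8, 4, 8]]
```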
可选地，在步骤S18之后、S19之前，计算设备还可以确定该被选中的层结构F H,16量化的比特数。例如，可以选择将该层结构F H,16仅量化为F H,4、F H,8。此时，相对于上述图3C，可以减少添加到候选集中的量化后的模型，缩小模型的搜索空间，加速模型的搜索过程。
S20:将量化得到的模型添加至候选集中。
在一些实施例中,可以将上述量化得到的所有模型,例如模型(F 1,8、F 2,4、…、F i,8、…、F H,1)、模型(F 1,8、F 2,4、…、F i,8、…、F H,2)、模型(F 1,8、F 2,4、…、F i,8、…、F H,4)和模型(F 1,8、F 2,4、…、F i,8、…、F H,8)都添加至候选集合中。
在另一些实施例中，也可以将量化得到的部分模型添加至候选集合中。
在计算设备执行步骤S20,对候选集进行更新后,可以重新执行上述步骤S13-S20中的部分或全部步骤,上述步骤S13-S20可以称为一次/轮模型搜索,在经过多次的模型搜索后,可以得到推理时间、参数量、准确率满足要求的目标模型。
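步骤S13-S20构成的一轮模型搜索，其控制流可以用下面的Python伪实现示意。其中训练、测试、量化均用注入的桩函数代替，数值仅用于演示流程，并非真实模型：

```python
def model_search(candidates, meets_accuracy, meets_metrics, pick, quantize_best_layer):
    """一轮轮执行模型搜索：选模型 -> 训练测试 -> 判断 -> 量化并回填候选集（示意）
    candidates: 候选集（列表）；其余参数为注入的桩函数"""
    while candidates:
        target = pick(candidates)        # S13：从候选集选出候选模型
        candidates.remove(target)        # 被选中的模型从候选集中去除
        if not meets_accuracy(target):   # 准确率不达标：重新选择
            continue
        if meets_metrics(target):        # S15/S16：评价参数与准确率都满足要求，输出
            return target
        # S17-S20：对量化权重最大的层结构量化，量化得到的模型加入候选集
        candidates.extend(quantize_best_layer(target))
    return None

# 用整数模拟“模型”：数值越小代表整体比特数越低
found = model_search(
    [16],
    meets_accuracy=lambda m: m >= 2,        # 假设比特数过低时准确率不达标
    meets_metrics=lambda m: m <= 4,         # 假设比特数足够低时推理时间/参数量达标
    pick=max,                               # 简化的选择策略
    quantize_best_layer=lambda m: [m // 2]  # 简化的“量化”：比特数减半
)
print(found)  # 4
```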
上述图2所示的机器学习模型的搜索方法可以应用于多种场景。例如,图像的分类:
待量化模型可以是第一图像分类模型,该第一图像分类模型可以对输入的图像进行分类。第一数据集和第二数据集均包括多个图像,每一个图像均被标注其类别,例如,第一数据集中包括标注“菊花”的图像、标注“荷花”的图像、标注“麦子”的图像、标注“玉米”的图像、标注“牡丹花”的图像等标注了各种植物的图像,此时,第一图像分类模型可以识别出图像中植物的类别。第一图像分类模型为该第一数据集训练得到的全浮点模型。为了使得该第一图像分类模型可以被应用于终端,通过上述图2所示的机器学习模型的搜索方法对该第一图像分类模型进行量化,以得到推理时间、参数量都满足要求的目标模型(即第二图像分类模型),该第二图像分类模型也是一个图像分类模型,其为混合比特模型。第二图像分类模型和第一图像分类模型的模型结构相同,但模型参数的数据类型不同。
将上述第二图像分类模型应用于终端，可以实现对图像的分类。如图4所示，为本申请实施例涉及的一种图像识别方法，该方法由终端执行，该方法可以包括但不限于如下步骤：
S41：获取待识别图像，其中，待识别图像可以是终端通过摄像头获取的当前场景中的图像。
S42:将待识别图像输入到第二图像识别模型,得到该待识别图像的类别。其中,第二图像识别模型是以第一图像分类模型作为待量化模型通过上述图2所示的机器学习模型的搜索方法输出的目标模型。
S43:输出该待识别图像的类别。
在一种实现中,终端可以将该待识别图像的类别添加到待识别图像中。例如,当前场景中包括牡丹花,则可以通过第二图像识别模型可以识别到该待识别图像的类别为“牡丹花”,此时,可以将文本“牡丹花”添加至待识别图像中。
通过图2所示的机器学习模型的搜索方法对该第一图像分类模型进行量化，得到的第二图像分类模型可以占用较少的内存和计算资源，具备较快的图像识别速度，可以在通过摄像头实时获取图像的过程中，识别该图像的类别，以向用户实时输出识别结果。
不限于上述场景,本申请实施例提供的机器学习模型的搜索方法可以对其他待量化模型进行处理,得到满足要求的目标模型,以应用于终端。
下面介绍本申请实施例涉及的装置、设备。
如图5所示,为本申请实施例提供的一种机器学习模型的搜索装置,该装置500可以是图1所示的系统中的计算设备12,该装置500可以包括但不限于如下功能单元:
生成模块510,用于根据待量化模型生成M个纯比特模型,其中,所述纯比特模型和所述待量化模型为网络结构相同的深度神经网络,M为大于1的正整数;
参数获取模块520,用于获取所述M个纯比特模型中每一层层结构的N个评价参数,所述M个纯比特模型中每一层层结构的N个评价参数是由移动终端在运行所述M个纯比特模型时测量得到的;
执行模块530,用于执行至少一次模型搜索,输出所述N个评价参数和所述准确率都满足要求的模型;
其中,所述执行模块530包括训练测试单元531、获取单元532、权重单元533、量化单元534和添加单元535,所述执行模块530执行所述模型搜索的过程时:
所述训练测试单元531用于通过第一数据集对从候选集中选择出的候选模型进行训练和测试,得到目标模型和所述目标模型的准确率;所述候选集包括至少一个候选模型;所述候选模型是与所述待量化模型的网络结构相同的混合比特模型;所述第一数据集包括多个样本,用于训练和测试所述候选集中的候选模型;
所述获取单元532用于在所述目标模型的N个评价参数中存在至少一个评价参数不满足要求且所述目标模型的准确率大于目标阈值的情况下,根据所述M个纯比特模型中每一层层结构的N个评价参数获取所述目标模型中每一层层结构的N个评价参数;
所述权重单元533用于根据所述目标模型的网络结构和所述目标模型中每一层层结构的N个评价参数确定所述目标模型中每一层层结构的量化权重;
所述量化单元534用于对所述目标模型中量化权重最大的层结构进行量化;
所述添加单元535用于将量化得到的模型添加至所述候选集中。
在一种可能的实现中,所述N个评价参数包括推理时间和参数量,所述权重单元533具体用于:
若所述目标模型的推理时间大于目标推理时间且所述目标模型的参数量不大于目标参数量,则根据所述目标模型中的层结构i的推理时间和所述层结构i的权重确定所述目标模型中的层结构i的量化权重;
若所述目标模型的推理时间不大于目标推理时间且所述目标模型的参数量大于目标参数量,则根据所述目标模型中的层结构i的参数量和所述层结构i的权重确定所述目标模型中的层结构i的量化权重;
若所述目标模型的推理时间大于目标推理时间且所述目标模型的参数量大于目标参数量,则根据所述目标模型中的层结构i的推理时间、所述目标模型中的层结构i的参数量和所述层结构i的权重确定所述目标模型中的层结构i的量化权重。
在又一种可能的实现中,所述执行模块530还包括:
选择单元536,用于在所述训练测试单元执行所述通过第一数据集对从候选集中选择出的候选模型进行训练和测试之前,通过第二数据集对所述候选集中每一个候选模型进行训练和测试,得到所述候选集中每一个候选模型的测验准确率,所述第二数据集中的样本的数量小于所述第一数据集中的样本的数量;根据所述每一个候选模型的测验准确率和所述每一个候选模型的权重从所述候选集中选择一个候选模型。
在又一种可能的实现中,所述候选模型的权重是根据所述候选模型被添加到候选集时模型搜索的总次数和当前模型搜索的总次数确定的。
在又一种可能的实现中,所述量化单元534具体用于:
将所述目标模型中量化权重最大的层结构中模型参数分别转换为由至少一个比特数表示的模型参数，所述至少一个比特数为比特数集合中比所述目标模型中量化权重最大的层结构的模型参数的当前比特数低的比特数，所述比特数集合包括M个数值，所述M个数值分别用于指示所述M个纯比特模型中模型参数的比特数。
在又一种可能的实现中,所述执行模块530还用于:
在所述目标模型的准确率小于所述目标阈值的情况下,从所述候选集中重新选择一个模型,执行所述模型搜索。
在又一种可能的实现中,所述N个评价参数包括推理时间,所述参数获取模块520具体用于:
将所述M个纯比特模型发送至移动终端,以使所述移动终端运行所述M个纯比特模型和测量所述M个纯比特模型中每一层层结构的推理时间;
接收所述移动终端发送的所述M个纯比特模型中每一层层结构的推理时间。
在又一种可能的实现中，在第一次模型搜索时，所述候选集包括所述M个纯比特模型中比特数最高的纯比特模型。
需要说明的是,上述各个单元的具体实现可以参见上述方法实施例图2所示的机器学习模型的搜索方法中相关描述,此处不再赘述。
如图6所示为本申请实施例提供的一种图像识别装置,该装置600可以是图1所示的系统中的用户设备15,该装置600可以包括但不限于如下功能单元:
获取单元610,用于获取待识别图像;
识别单元620,用于将待识别图像输入到第二图像识别模型,得到该待识别图像的类别;
输出单元630,用于输出该待识别图像的类别。
其中，第二图像识别模型是以第一图像分类模型作为待量化模型通过上述图2所述的机器学习模型的搜索方法输出的目标模型。第一图像分类模型是训练后的可识别图像的类别的深度神经网络，第一图像分类模型为全浮点模型或混合模型。
可选地，待识别图像可以是终端通过摄像头获取的当前场景中的图像。
需要说明的是,上述各个单元的具体实现可以参见上述方法实施例图4所示的图像识别方法中相关描述,此处不再赘述。
图7是本申请实施例提供的一种机器学习模型的搜索装置的硬件结构示意图。图7所示的机器学习模型的搜索装置700（该装置700具体可以是一种计算机设备）可以包括存储器701、处理器702、通信接口703以及总线704。其中，存储器701、处理器702、通信接口703通过总线704实现彼此之间的通信连接。
存储器701可以是只读存储器(Read Only Memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(Random Access Memory,RAM)。存储器701可以存储程序,当存储器701中存储的程序被处理器702执行时,处理器702和通信接口703用于执行本申请实施例的机器学习模型的搜索方法中的全部或部分步骤。
处理器702可以采用通用的中央处理器（Central Processing Unit，CPU），微处理器，应用专用集成电路（Application Specific Integrated Circuit，ASIC），图形处理器（graphics processing unit，GPU）或者一个或多个集成电路，用于执行相关程序，以实现本申请实施例的机器学习模型的搜索装置中的单元所需执行的功能，或者执行本申请方法实施例的机器学习模型的搜索方法中的全部或部分步骤。
处理器702还可以是一种集成电路芯片，具有信号的处理能力。在实现过程中，本申请的机器学习模型的搜索方法的各个步骤可以通过处理器702中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器702还可以是通用处理器、数字信号处理器（Digital Signal Processing，DSP）、专用集成电路（ASIC）、现成可编程门阵列（Field Programmable Gate Array，FPGA）或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件，可以实现或者执行本申请实施例中公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成，或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器、闪存、只读存储器、可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器701，处理器702读取存储器701中的信息，结合其硬件完成本申请实施例的机器学习模型的搜索装置中包括的单元所需执行的功能，或者执行本申请方法实施例的机器学习模型的搜索方法中的全部或部分步骤。
通信接口703使用例如但不限于收发器一类的收发装置,来实现装置700与其他设备或通信网络之间的通信。例如,可以通过通信接口703获取数据集(第一数据集和/或第二数据集、待量化模型)。
总线704可包括在装置700各个部件(例如,存储器701、处理器702、通信接口703)之间传送信息的通路。
应理解，机器学习模型的搜索装置500中的参数获取模块520可以相当于机器学习模型的搜索装置700中的通信接口703，生成模块510、执行模块530可以相当于处理器702。
图8示出了终端800的结构示意图。
下面以终端800为例对实施例进行具体说明。应该理解的是，图8所示终端800仅是一个范例，并且终端800可以具有比图8中所示的更多的或者更少的部件，可以组合两个或多个的部件，或者可以具有不同的部件配置。图中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
终端800可以包括:处理器810,外部存储器接口820,内部存储器821,通用串行总线(universal serial bus,USB)接口830,充电管理模块840,电源管理模块841,电池842,天线1,天线2,移动通信模块850,无线通信模块860,音频模块870,扬声器870A,受话器870B,麦克风870C,耳机接口870D,传感器模块880,按键890,马达891,指示器892,摄像头893,显示屏894,以及用户标识模块(subscriber identification module,SIM)卡接口895等。其中传感器模块880可以包括压力传感器880A,陀螺仪传感器880B,气压传感器880C,磁传感器880D,加速度传感器880E,距离传感器880F,接近光传感器880G,指纹传感器880H,温度传感器880J,触摸传感器880K,环境光传感器880L,骨传导传感器880M等。
可以理解的是,本发明实施例示意的结构并不构成对终端800的具体限定。在本申请另一些实施例中,终端800可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。图示的部件可以以硬件,软件或软件和硬件的组合实现。
处理器810可以包括一个或多个处理单元,例如:处理器810可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
其中,控制器可以是终端800的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。
处理器810中还可以设置存储器，用于存储指令和数据。在一些实施例中，处理器810中的存储器为高速缓冲存储器。该存储器可以保存处理器810刚用过或循环使用的指令或数据。如果处理器810需要再次使用该指令或数据，可从所述存储器中直接调用，避免了重复存取，减少了处理器810的等待时间，因而提高了系统的效率。
在一些实施例中,处理器810可以包括一个或多个接口。接口可以包括集成电路(inter-integrated circuit,I2C)接口,集成电路内置音频(inter-integrated circuit sound,I2S)接口,脉冲编码调制(pulse code modulation,PCM)接口,通用异步收发传输器(universal asynchronous receiver/transmitter,UART)接口,移动产业处理器接口(mobile industry processor interface,MIPI),通用输入输出(general-purpose input/output,GPIO)接口,用户标识模块(subscriber identity module,SIM)接口,和/或通用串行总线(universal serial bus,USB)接口等。
I2C接口是一种双向同步串行总线，包括一根串行数据线（serial data line，SDA）和一根串行时钟线（serial clock line，SCL）。在一些实施例中，处理器810可以包含多组I2C总线。处理器810可以通过不同的I2C总线接口分别耦合触摸传感器880K，充电器，闪光灯，摄像头893等。例如：处理器810可以通过I2C接口耦合触摸传感器880K，使处理器810与触摸传感器880K通过I2C总线接口通信，实现终端800的触摸功能。
I2S接口可以用于音频通信。在一些实施例中,处理器810可以包含多组I2S总线。处理器810可以通过I2S总线与音频模块870耦合,实现处理器810与音频模块870之间的通信。在一些实施例中,音频模块870可以通过I2S接口向无线通信模块860传递音频信号,实现通过蓝牙耳机接听电话的功能。
PCM接口也可以用于音频通信,将模拟信号抽样,量化和编码。在一些实施例中,音频模块870与无线通信模块860可以通过PCM总线接口耦合。在一些实施例中,音频模块870也可以通过PCM接口向无线通信模块860传递音频信号,实现通过蓝牙耳机接听电话的功能。所述I2S接口和所述PCM接口都可以用于音频通信。
UART接口是一种通用串行数据总线,用于异步通信。该总线可以为双向通信总线。它将要传输的数据在串行通信与并行通信之间转换。在一些实施例中,UART接口通常被用于连接处理器810与无线通信模块860。例如:处理器810通过UART接口与无线通信模块860中的蓝牙模块通信,实现蓝牙功能。在一些实施例中,音频模块870可以通过UART接口向无线通信模块860传递音频信号,实现通过蓝牙耳机播放音乐的功能。
MIPI接口可以被用于连接处理器810与显示屏894,摄像头893等外围器件。MIPI接口包括摄像头串行接口(camera serial interface,CSI),显示屏串行接口(display serial interface,DSI)等。在一些实施例中,处理器810和摄像头893通过CSI接口通信,实现终端800的拍摄功能。处理器810和显示屏894通过DSI接口通信,实现终端800的显示功能。
GPIO接口可以通过软件配置。GPIO接口可以被配置为控制信号,也可被配置为数据信号。在一些实施例中,GPIO接口可以用于连接处理器810与摄像头893,显示屏894,无线通信模块860,音频模块870,传感器模块880等。GPIO接口还可以被配置为I2C接口,I2S接口,UART接口,MIPI接口等。
USB接口830是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口830可以用于连接充电器为终端800充电,也可以用于终端800与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。该接口还可以用于连接其他电子设备,例如AR设备等。
可以理解的是,本发明实施例示意的各模块间的接口连接关系,只是示意性说明,并不构成对终端800的结构限定。在本申请另一些实施例中,终端800也可以采用上述实施例中不同的接口连接方式,或多种接口连接方式的组合。
充电管理模块840用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块840可以通过USB接口830接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块840可以通过终端800的无线充电线圈接收无线充电输入。充电管理模块840为电池842充电的同时,还可以通过电源管理模块841为电子设备供电。
电源管理模块841用于连接电池842,充电管理模块840与处理器810。电源管理模块841接收电池842和/或充电管理模块840的输入,为处理器810,内部存储器821,外部存储器,显示屏894,摄像头893,和无线通信模块860等供电。电源管理模块841还可以用于监测电池容量,电池循环次数,电池健康状态(漏电,阻抗)等参数。在其他一些实施例中,电源管理模块841也可以设置于处理器810中。在另一些实施例中,电源管理模块841和充电管理模块840也可以设置于同一个器件中。
终端800的无线通信功能可以通过天线1,天线2,移动通信模块850,无线通信模块860,调制解调处理器以及基带处理器等实现。
天线1和天线2用于发射和接收电磁波信号。终端800中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块850可以提供应用在终端800上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块850可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块850可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块850还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块850的至少部分功能模块可以被设置于处理器810中。在一些实施例中,移动通信模块850的至少部分功能模块可以与处理器810的至少部分模块被设置在同一个器件中。
调制解调处理器可以包括调制器和解调器。其中,调制器用于将待发送的低频基带信号调制成中高频信号。解调器用于将接收的电磁波信号解调为低频基带信号。随后解调器将解调得到的低频基带信号传送至基带处理器处理。低频基带信号经基带处理器处理后,被传递给应用处理器。应用处理器通过音频设备(不限于扬声器870A,受话器870B等)输出声音信号,或通过显示屏894显示图像或视频。在一些实施例中,调制解调处理器可以是独立的器件。在另一些实施例中,调制解调处理器可以独立于处理器810,与移动通信模块850或其他功能模块设置在同一个器件中。
无线通信模块860可以提供应用在终端800上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络)，蓝牙(bluetooth,BT)，全球导航卫星系统(global navigation satellite system,GNSS)，调频(frequency modulation,FM)，近距离无线通信技术(near field communication,NFC)，红外技术(infrared,IR)等无线通信的解决方案。无线通信模块860可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块860经由天线2接收电磁波，将电磁波信号调频以及滤波处理，将处理后的信号发送到处理器810。无线通信模块860还可以从处理器810接收待发送的信号，对其进行调频，放大，经天线2转为电磁波辐射出去。
在一些实施例中,终端800的天线1和移动通信模块850耦合,天线2和无线通信模块860耦合,使得终端800可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
终端800通过GPU,显示屏894,以及应用处理器等实现显示功能。GPU为图像处理的微处理器,连接显示屏894和应用处理器。GPU用于执行数学和几何计算,用于图形渲染。处理器810可包括一个或多个GPU,其执行程序指令以生成或改变显示信息。
显示屏894用于显示图像,视频等。显示屏894包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode的,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Miniled,MicroLed,Micro-oLed,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,终端800可以包括1个或N个显示屏894,N为大于1的正整数。
终端800可以通过ISP,摄像头893,视频编解码器,GPU,显示屏894以及应用处理器等实现拍摄功能。
ISP用于处理摄像头893反馈的数据。例如,拍照时,打开快门,光线通过镜头被传递到摄像头感光元件上,光信号转换为电信号,摄像头感光元件将所述电信号传递给ISP处理,转化为肉眼可见的图像。ISP还可以对图像的噪点,亮度,肤色进行算法优化。ISP还可以对拍摄场景的曝光,色温等参数优化。在一些实施例中,ISP可以设置在摄像头893中。
摄像头893用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,终端800可以包括1个或N个摄像头893,N为大于1的正整数。
数字信号处理器用于处理数字信号，除了可以处理数字图像信号，还可以处理其他数字信号。例如，当终端800在频点选择时，数字信号处理器用于对频点能量进行傅里叶变换等。
视频编解码器用于对数字视频压缩或解压缩。终端800可以支持一种或多种视频编解码器。这样,终端800可以播放或录制多种编码格式的视频,例如:动态图像专家组(moving picture experts group,MPEG)1,MPEG2,MPEG3,MPEG4等。
NPU为神经网络(neural-network,NN)计算处理器,通过借鉴生物神经网络结构,例如借鉴人脑神经元之间传递模式,对输入信息快速处理,还可以不断的自学习。通过NPU可以实现终端800的智能认知等应用,例如:图像识别,人脸识别,语音识别,文本理解等。
外部存储器接口820可以用于连接外部存储卡,例如Micro SD卡,实现扩展终端800的存储能力。外部存储卡通过外部存储器接口820与处理器810通信,实现数据存储功能。例如将音乐,视频等文件保存在外部存储卡中。
内部存储器821可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器810通过运行存储在内部存储器821的指令,从而执行终端800的各种功能应用以及数据处理。内部存储器821可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,至少一个功能所需的应用程序(比如声音播放功能,图像播放功能等)等。存储数据区可存储终端800使用过程中所创建的数据(比如音频数据,电话本等)等。此外,内部存储器821可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
终端800可以通过音频模块870,扬声器870A,受话器870B,麦克风870C,耳机接口870D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
音频模块870用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块870还可以用于对音频信号编码和解码。在一些实施例中,音频模块870可以设置于处理器810中,或将音频模块870的部分功能模块设置于处理器810中。
扬声器870A,也称“喇叭”,用于将音频电信号转换为声音信号。终端800可以通过扬声器870A收听音乐,或收听免提通话。
受话器870B,也称“听筒”,用于将音频电信号转换成声音信号。当终端800接听电话或语音信息时,可以通过将受话器870B靠近人耳接听语音。
麦克风870C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风870C发声,将声音信号输入到麦克风870C。终端800可以设置至少一个麦克风870C。在另一些实施例中,终端800可以设置两个麦克风870C,除了采集声音信号,还可以实现降噪功能。在另一些实施例中,终端800还可以设置三个,四个或更多麦克风870C,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。
耳机接口870D用于连接有线耳机。耳机接口870D可以是USB接口830,也可以是3.5mm的开放移动电子设备平台(open mobile terminal platform,OMTP)标准接口,美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口。
压力传感器880A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器880A可以设置于显示屏894。压力传感器880A的种类很多,如电阻式压力传感器,电感式压力传感器,电容式压力传感器等。电容式压力传感器可以是包括至少两个具有导电材料的平行板。当有力作用于压力传感器880A,电极之间的电容改变。终端800根据电容的变化确定压力的强度。当有触摸操作作用于显示屏894,终端800根据压力传感器880A检测所述触摸操作强度。终端800也可以根据压力传感器880A的检测信号计算触摸的位置。在一些实施例中,作用于相同触摸位置,但不同触摸操作强度的触摸操作,可以对应不同的操作指令。例如:当有触摸操作强度小于第一压力阈值的触摸操作作用于短消息应用图标时,执行查看短消息的指令。当有触摸操作强度大于或等于第一压力阈值的触摸操作作用于短消息应用图标时,执行新建短消息的指令。
陀螺仪传感器880B可以用于确定终端800的运动姿态。在一些实施例中,可以通过陀螺仪传感器880B确定终端800围绕三个轴(即,x,y和z轴)的角速度。陀螺仪传感器880B可以用于拍摄防抖。示例性的,当按下快门,陀螺仪传感器880B检测终端800抖动的角度,根据角度计算出镜头模组需要补偿的距离,让镜头通过反向运动抵消终端800的抖动,实现防抖。陀螺仪传感器880B还可以用于导航,体感游戏场景。
气压传感器880C用于测量气压。在一些实施例中,终端800通过气压传感器880C测得的气压值计算海拔高度,辅助定位和导航。
磁传感器880D包括霍尔传感器。终端800可以利用磁传感器880D检测翻盖皮套的开合。在一些实施例中,当终端800是翻盖机时,终端800可以根据磁传感器880D检测翻盖的开合。进而根据检测到的皮套的开合状态或翻盖的开合状态,设置翻盖自动解锁等特性。
加速度传感器880E可检测终端800在各个方向上(一般为三轴)加速度的大小。当终端800静止时可检测出重力的大小及方向。还可以用于识别电子设备姿态,应用于横竖屏切换,计步器等应用。
距离传感器880F,用于测量距离。终端800可以通过红外或激光测量距离。在一些实施例中,拍摄场景,终端800可以利用距离传感器880F测距以实现快速对焦。
接近光传感器880G可以包括例如发光二极管(LED)和光检测器,例如光电二极管。发光二极管可以是红外发光二极管。终端800通过发光二极管向外发射红外光。终端800使用光电二极管检测来自附近物体的红外反射光。当检测到充分的反射光时,可以确定终端800附近有物体。当检测到不充分的反射光时,终端800可以确定终端800附近没有物体。终端800可以利用接近光传感器880G检测用户手持终端800贴近耳朵通话,以便自动熄灭屏幕达到省电的目的。接近光传感器880G也可用于皮套模式,口袋模式自动解锁与锁屏。
环境光传感器880L用于感知环境光亮度。终端800可以根据感知的环境光亮度自适应调节显示屏894亮度。环境光传感器880L也可用于拍照时自动调节白平衡。环境光传感器880L还可以与接近光传感器880G配合,检测终端800是否在口袋里,以防误触。
指纹传感器880H用于采集指纹。终端800可以利用采集的指纹特性实现指纹解锁,访问应用锁,指纹拍照,指纹接听来电等。
温度传感器880J用于检测温度。在一些实施例中,终端800利用温度传感器880J检测的温度,执行温度处理策略。例如,当温度传感器880J上报的温度超过阈值,终端800执行降低位于温度传感器880J附近的处理器的性能,以便降低功耗实施热保护。在另一些实施例中,当温度低于另一阈值时,终端800对电池842加热,以避免低温导致终端800异常关机。在其他一些实施例中,当温度低于又一阈值时,终端800对电池842的输出电压执行升压,以避免低温导致的异常关机。
触摸传感器880K,也称“触控面板”。触摸传感器880K可以设置于显示屏894,由触摸传感器880K与显示屏894组成触摸屏,也称“触控屏”。触摸传感器880K用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏894提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器880K也可以设置于终端800的表面,与显示屏894所处的位置不同。
骨传导传感器880M可以获取振动信号。在一些实施例中,骨传导传感器880M可以获取人体声部振动骨块的振动信号。骨传导传感器880M也可以接触人体脉搏,接收血压跳动信号。在一些实施例中,骨传导传感器880M也可以设置于耳机中,结合成骨传导耳机。音频模块870可以基于所述骨传导传感器880M获取的声部振动骨块的振动信号,解析出语音信号,实现语音功能。应用处理器可以基于所述骨传导传感器880M获取的血压跳动信号解析心率信息,实现心率检测功能。
按键890包括开机键,音量键等。按键890可以是机械按键。也可以是触摸式按键。终端800可以接收按键输入,产生与终端800的用户设置以及功能控制有关的键信号输入。
马达891可以产生振动提示。马达891可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。作用于显示屏894不同区域的触摸操作,马达891也可对应不同的振动反馈效果。不同的应用场景(例如:时间提醒,接收信息,闹钟,游戏等)也可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。
指示器892可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。
SIM卡接口895用于连接SIM卡。SIM卡可以通过插入SIM卡接口895,或从SIM卡接口895拔出,实现和终端800的接触和分离。终端800可以支持1个或N个SIM卡接口,N为大于1的正整数。SIM卡接口895可以支持Nano SIM卡,Micro SIM卡,SIM卡等。同一个SIM卡接口895可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口895也可以兼容不同类型的SIM卡。SIM卡接口895也可以兼容外部存储卡。终端800通过SIM卡和网络交互,实现通话以及数据通信等功能。在一些实施例中,终端800采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在终端800中,不能和终端800分离。
本申请实施例中，处理器810读取内部存储器821中的信息，结合其硬件完成本申请实施例的图像识别装置600中包括的单元所需执行的功能，或者执行本申请方法实施例的图像识别方法。
本申请实施例中,终端800可以通过摄像头893拍摄当前场景的图像,进而得到待识别图像。终端800可以通过显示器输出该待识别图像和/或待识别图像的类别。
上述图8所述的各个功能单元的具体实现可以参见上述图4所示的图像识别方法的实施例中相关描述,本申请实施例不再赘述。
本领域技术人员能够领会,结合本文公开描述的各种说明性逻辑框、模块和算法步骤所描述的功能可以硬件、软件、固件或其任何组合来实施。如果以软件来实施,那么各种说明性逻辑框、模块、和步骤描述的功能可作为一或多个指令或代码在计算机可读媒体上存储或传输,且由基于硬件的处理单元执行。计算机可读媒体可包含计算机可读存储媒体,其对应于有形媒体,例如数据存储媒体,或包括任何促进将计算机程序从一处传送到另一处的媒体(例如,根据通信协议)的通信媒体。以此方式,计算机可读媒体大体上可对应于(1)非暂时性的有形计算机可读存储媒体,或(2)通信媒体,例如信号或载波。数据存储媒体可为可由一或多个计算机或一或多个处理器存取以检索用于实施本申请中描述的技术的指令、代码和/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒体。
作为实例而非限制,此类计算机可读存储媒体可包括RAM、ROM、EEPROM、CD-ROM或其它光盘存储装置、磁盘存储装置或其它磁性存储装置、快闪存储器或可用来存储指令或数据结构的形式的所要程序代码并且可由计算机存取的任何其它媒体。并且,任何连接被恰当地称作计算机可读媒体。举例来说,如果使用同轴缆线、光纤缆线、双绞线、数字订户线(DSL)或例如红外线、无线电和微波等无线技术从网站、服务器或其它远程源传输指令,那么同轴缆线、光纤缆线、双绞线、DSL或例如红外线、无线电和微波等无线技术包含在媒体的定义中。但是,应理解,所述计算机可读存储媒体和数据存储媒体并不包括连接、载波、信号或其它暂时媒体,而是实际上针对于非暂时性有形存储媒体。如本文中所使用,磁盘和光盘包含压缩光盘(CD)、激光光盘、光学光盘、数字多功能光盘(DVD)和蓝光光盘,其中磁盘通常以磁性方式再现数据,而光盘利用激光以光学方式再现数据。以上各项的组合也应包含在计算机可读媒体的范围内。
可通过例如一或多个数字信号处理器(DSP)、通用微处理器、专用集成电路(ASIC)、现场可编程逻辑阵列(FPGA)或其它等效集成或离散逻辑电路等一或多个处理器来执行指令。因此,如本文中所使用的术语“处理器”可指前述结构或适合于实施本文中所描述的技术的任一其它结构中的任一者。另外,在一些方面中,本文中所描述的各种说明性逻辑框、模块、和步骤所描述的功能可以提供于经配置以用于编码和解码的专用硬件和/或软件模块内,或者并入在组合编解码器中。而且,所述技术可完全实施于一或多个电路或逻辑元件中。
本申请的技术可在各种各样的装置或设备中实施,包含无线手持机、集成电路(IC)或一组IC(例如,芯片组)。本申请中描述各种组件、模块或单元是为了强调用于执行所揭示的技术的装置的功能方面,但未必需要由不同硬件单元实现。实际上,如上文所描述,各种单元可结合合适的软件和/或固件组合在编码解码器硬件单元中,或者通过互操作硬件单元(包含如上文所描述的一或多个处理器)来提供。
以上所述,仅为本申请示例性的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应该以权利要求的保护范围为准。

Claims (18)

  1. 一种机器学习模型的搜索方法,其特征在于,包括:
    根据待量化模型生成M个纯比特模型,其中,所述纯比特模型和所述待量化模型为网络结构相同的深度神经网络,M为大于1的正整数;
    获取所述M个纯比特模型中每一层层结构的N个评价参数,所述M个纯比特模型中每一层层结构的N个评价参数是由移动终端在运行所述M个纯比特模型时测量得到的;
    执行至少一次模型搜索,输出所述N个评价参数和所述准确率都满足要求的模型;
    其中,所述模型搜索的过程包括:
    通过第一数据集对从候选集中选择出的候选模型进行训练和测试,得到目标模型和所述目标模型的准确率;所述候选集包括至少一个候选模型;所述候选模型是与所述待量化模型的网络结构相同的混合比特模型;所述第一数据集包括多个样本,用于训练和测试所述候选集中的候选模型;
    在所述目标模型的N个评价参数中存在至少一个评价参数不满足要求且所述目标模型的准确率大于目标阈值的情况下,根据所述M个纯比特模型中每一层层结构的N个评价参数获取所述目标模型中每一层层结构的N个评价参数,根据所述目标模型的网络结构和所述目标模型中每一层层结构的N个评价参数确定所述目标模型中每一层层结构的量化权重,对所述目标模型中量化权重最大的层结构进行量化,将量化得到的模型添加至所述候选集中。
  2. 如权利要求1所述的方法,其特征在于,所述N个评价参数包括推理时间和参数量,所述根据所述目标模型的网络结构和所述目标模型中每一层层结构的N个评价参数,确定所述目标模型中每一层层结构的量化权重,具体包括:
    若所述目标模型的推理时间大于目标推理时间且所述目标模型的参数量不大于目标参数量,则根据所述目标模型中的层结构i的推理时间和所述层结构i的权重确定所述目标模型中的层结构i的量化权重,i为目标模型中层结构的索引,i为正整数;
    若所述目标模型的推理时间不大于目标推理时间且所述目标模型的参数量大于目标参数量,则根据所述目标模型中的层结构i的参数量和所述层结构i的权重确定所述目标模型中的层结构i的量化权重;
    若所述目标模型的推理时间大于目标推理时间且所述目标模型的参数量大于目标参数量,则根据所述目标模型中的层结构i的推理时间、所述目标模型中的层结构i的参数量和所述层结构i的权重确定所述目标模型中的层结构i的量化权重。
  3. 如权利要求1所述的方法,其特征在于,所述通过第一数据集对从候选集中选择出的候选模型进行训练和测试之前,所述模型搜索的过程还包括:
    通过第二数据集对所述候选集中每一个候选模型进行训练和测试,得到所述候选集中每一个候选模型的测验准确率,所述第二数据集中的样本的数量小于所述第一数据集中的样本的数量;
    根据所述每一个候选模型的测验准确率和所述每一个候选模型的权重从所述候选集中选择一个候选模型。
  4. 如权利要求3所述的方法,其特征在于,所述候选模型的权重是根据所述候选模型被添加到候选集时模型搜索的总次数和当前模型搜索的总次数确定的。
  5. 如权利要求1-4任一项所述的方法,其特征在于,所述对所述目标模型中量化权重最大的层结构进行量化,具体包括:
    将所述目标模型中量化权重最大的层结构中模型参数分别转换为由至少一个比特数表示的模型参数，所述至少一个比特数为比特数集合中比所述目标模型中量化权重最大的层结构的模型参数的当前比特数低的比特数，所述比特数集合包括M个数值，所述M个数值分别用于指示所述M个纯比特模型中模型参数的比特数。
  6. 如权利要求1-5任一项所述的方法,其特征在于,所述模型搜索的过程还包括:
    在所述目标模型的准确率小于所述目标阈值的情况下,从所述候选集中重新选择一个模型,执行所述模型搜索。
  7. 如权利要求1-6任一项所述的方法,其特征在于,所述N个评价参数包括推理时间,所述获取所述M个纯比特模型中每一层层结构的N个评价参数,包括:
    将所述M个纯比特模型发送至移动终端,以使所述移动终端运行所述M个纯比特模型和测量所述M个纯比特模型中每一层层结构的推理时间;
    接收所述移动终端发送的所述M个纯比特模型中每一层层结构的推理时间。
  8. 如权利要求1-7任一项所述的方法,其特征在于,在第一次模型搜索时,所述候选集包括所述M个纯比特模型中比特数最高的纯比特模型。
  9. 一种机器学习模型的搜索装置,其特征在于,包括:
    生成模块,用于根据待量化模型生成M个纯比特模型,其中,所述纯比特模型和所述待量化模型为网络结构相同的深度神经网络,M为大于1的正整数;
    参数获取模块,用于获取所述M个纯比特模型中每一层层结构的N个评价参数,所述M个纯比特模型中每一层层结构的N个评价参数是由移动终端在运行所述M个纯比特模型时测量得到的;
    执行模块,用于执行至少一次模型搜索,输出所述N个评价参数和所述准确率都满足要求的模型;
    其中,所述执行模块包括训练测试单元、获取单元、权重单元、量化单元和添加单元,所述执行模块执行所述模型搜索的过程时:
    所述训练测试单元用于通过第一数据集对从候选集中选择出的候选模型进行训练和测试，得到目标模型和所述目标模型的准确率；所述候选集包括至少一个候选模型；所述候选模型是与所述待量化模型的网络结构相同的混合比特模型；所述第一数据集包括多个样本，用于训练和测试所述候选集中的候选模型；
    所述获取单元用于在所述目标模型的N个评价参数中存在至少一个评价参数不满足要求且所述目标模型的准确率大于目标阈值的情况下,根据所述M个纯比特模型中每一层层结构的N个评价参数获取所述目标模型中每一层层结构的N个评价参数;
    所述权重单元用于根据所述目标模型的网络结构和所述目标模型中每一层层结构的N个评价参数确定所述目标模型中每一层层结构的量化权重;
    所述量化单元用于对所述目标模型中量化权重最大的层结构进行量化;
    所述添加单元用于将量化得到的模型添加至所述候选集中。
  10. 如权利要求9所述的装置,其特征在于,所述N个评价参数包括推理时间和参数量,所述权重单元具体用于:
    若所述目标模型的推理时间大于目标推理时间且所述目标模型的参数量不大于目标参数量,则根据所述目标模型中的层结构i的推理时间和所述层结构i的权重确定所述目标模型中的层结构i的量化权重;
    若所述目标模型的推理时间不大于目标推理时间且所述目标模型的参数量大于目标参数量,则根据所述目标模型中的层结构i的参数量和所述层结构i的权重确定所述目标模型中的层结构i的量化权重;
    若所述目标模型的推理时间大于目标推理时间且所述目标模型的参数量大于目标参数量,则根据所述目标模型中的层结构i的推理时间、所述目标模型中的层结构i的参数量和所述层结构i的权重确定所述目标模型中的层结构i的量化权重。
  11. 如权利要求9所述的装置,其特征在于,所述执行模块还包括:
    选择单元,用于在所述训练测试单元执行所述通过第一数据集对从候选集中选择出的候选模型进行训练和测试之前,通过第二数据集对所述候选集中每一个候选模型进行训练和测试,得到所述候选集中每一个候选模型的测验准确率,所述第二数据集中的样本的数量小于所述第一数据集中的样本的数量;根据所述每一个候选模型的测验准确率和所述每一个候选模型的权重从所述候选集中选择一个候选模型。
  12. 如权利要求11所述的装置,其特征在于,所述候选模型的权重是根据所述候选模型被添加到候选集时模型搜索的总次数和当前模型搜索的总次数确定的。
  13. 如权利要求9-12任一项所述的装置,其特征在于,所述量化单元具体用于:
    将所述目标模型中量化权重最大的层结构中模型参数分别转换为由至少一个比特数表示的模型参数，所述至少一个比特数为比特数集合中比所述目标模型中量化权重最大的层结构的模型参数的当前比特数低的比特数，所述比特数集合包括M个数值，所述M个数值分别用于指示所述M个纯比特模型中模型参数的比特数。
  14. 如权利要求9-13任一项所述的装置,其特征在于,所述执行模块还用于:
    在所述目标模型的准确率小于所述目标阈值的情况下,从所述候选集中重新选择一个模型,执行所述模型搜索。
  15. 如权利要求9-14任一项所述的装置,其特征在于,所述N个评价参数包括推理时间,所述参数获取模块具体用于:
    将所述M个纯比特模型发送至移动终端,以使所述移动终端运行所述M个纯比特模型和测量所述M个纯比特模型中每一层层结构的推理时间;
    接收所述移动终端发送的所述M个纯比特模型中每一层层结构的推理时间。
  16. 如权利要求9-15任一项所述的装置,其特征在于,在第一次模型搜索时,所述候选集包括所述M个纯比特模型中比特数最高的纯比特模型。
  17. 一种机器学习模型的搜索装置,其特征在于,包括处理器和存储器,所述存储器用于存储程序,所述处理器执行所述存储器存储的程序,当所述存储器存储的程序被执行时,使得所述机器学习模型的搜索装置实现如权利要求1-8任一项所述的方法。
  18. 一种计算机可读存储介质,其特征在于,所述计算机可读介质用于存储有计算机可执行指令,所述计算机可执行指令在被所述计算机调用时用于使所述计算机实现如权利要求1-8任一项所述的方法。
PCT/CN2020/130043 2019-12-31 2020-11-19 机器学习模型的搜索方法及相关装置、设备 WO2021135707A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/758,166 US20230042397A1 (en) 2019-12-31 2020-11-19 Machine learning model search method, related apparatus, and device
EP20909071.1A EP4068169A4 (en) 2019-12-31 2020-11-19 SEARCH METHOD FOR MACHINE LEARNING MODEL, AND ASSOCIATED APPARATUS AND DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911419960.0A CN111178546B (zh) 2019-12-31 2019-12-31 机器学习模型的搜索方法及相关装置、设备
CN201911419960.0 2019-12-31

Publications (1)

Publication Number Publication Date
WO2021135707A1 true WO2021135707A1 (zh) 2021-07-08

Family

ID=70652434

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/130043 WO2021135707A1 (zh) 2019-12-31 2020-11-19 机器学习模型的搜索方法及相关装置、设备

Country Status (4)

Country Link
US (1) US20230042397A1 (zh)
EP (1) EP4068169A4 (zh)
CN (1) CN111178546B (zh)
WO (1) WO2021135707A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178546B (zh) * 2019-12-31 2023-05-23 华为技术有限公司 机器学习模型的搜索方法及相关装置、设备
CN111667055A (zh) * 2020-06-05 2020-09-15 北京百度网讯科技有限公司 用于搜索模型结构的方法和装置
CN111667056B (zh) * 2020-06-05 2023-09-26 北京百度网讯科技有限公司 用于搜索模型结构的方法和装置
CN113779366B (zh) * 2020-06-10 2023-06-27 北京超星未来科技有限公司 用于自动驾驶的神经网络架构自动优化部署方法及装置
CN111860472A (zh) * 2020-09-24 2020-10-30 成都索贝数码科技股份有限公司 电视台标检测方法、系统、计算机设备及存储介质
US20220188663A1 (en) * 2020-12-10 2022-06-16 International Business Machines Corporation Automated machine learning model selection
CN114004334A (zh) * 2021-10-28 2022-02-01 中兴通讯股份有限公司 模型压缩方法、模型压缩系统、服务器及存储介质
WO2024103352A1 (zh) * 2022-11-17 2024-05-23 华为技术有限公司 一种通信方法、装置及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108734266A (zh) * 2017-04-21 2018-11-02 展讯通信(上海)有限公司 Deep neural network model compression method and apparatus, terminal, and storage medium
CN109190754A (zh) * 2018-08-30 2019-01-11 北京地平线机器人技术研发有限公司 Quantization model generation method, apparatus, and electronic device
US20190347550A1 (en) * 2018-05-14 2019-11-14 Samsung Electronics Co., Ltd. Method and apparatus with neural network parameter quantization
CN110555508A (zh) * 2018-05-31 2019-12-10 北京深鉴智能科技有限公司 Artificial neural network adjustment method and apparatus
CN111178546A (zh) * 2019-12-31 2020-05-19 华为技术有限公司 Machine learning model search method, related apparatus, and device

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11222263B2 (en) * 2016-07-28 2022-01-11 Samsung Electronics Co., Ltd. Neural network method and apparatus
US10210860B1 (en) * 2018-07-27 2019-02-19 Deepgram, Inc. Augmented generalized deep learning with special vocabulary
CN109840589B (zh) * 2019-01-25 2021-09-24 深兰人工智能芯片研究院(江苏)有限公司 Method and apparatus for running a convolutional neural network on an FPGA
CN110222606B (zh) * 2019-05-24 2022-09-06 电子科技大学 Early fault prediction method for electronic systems based on a tree-search extreme learning machine
CN110222821B (zh) * 2019-05-30 2022-03-25 浙江大学 Low-bit-width quantization method for convolutional neural networks based on weight distribution
CN110598763A (zh) * 2019-08-27 2019-12-20 南京云计趟信息技术有限公司 Image recognition method, apparatus, and terminal device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108734266A (zh) * 2017-04-21 2018-11-02 展讯通信(上海)有限公司 Deep neural network model compression method and apparatus, terminal, and storage medium
US20190347550A1 (en) * 2018-05-14 2019-11-14 Samsung Electronics Co., Ltd. Method and apparatus with neural network parameter quantization
CN110555508A (zh) * 2018-05-31 2019-12-10 北京深鉴智能科技有限公司 Artificial neural network adjustment method and apparatus
CN109190754A (zh) * 2018-08-30 2019-01-11 北京地平线机器人技术研发有限公司 Quantization model generation method, apparatus, and electronic device
CN111178546A (zh) * 2019-12-31 2020-05-19 华为技术有限公司 Machine learning model search method, related apparatus, and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4068169A4

Also Published As

Publication number Publication date
EP4068169A4 (en) 2023-01-25
CN111178546A (zh) 2020-05-19
EP4068169A1 (en) 2022-10-05
US20230042397A1 (en) 2023-02-09
CN111178546B (zh) 2023-05-23

Similar Documents

Publication Publication Date Title
WO2021135707A1 (zh) Machine learning model search method, related apparatus, and device
WO2020238775A1 (zh) Scene recognition method, scene recognition apparatus, and electronic device
CN111563466B (zh) Face detection method and related products
CN114242037A (zh) Virtual character generation method and apparatus
WO2022007895A1 (zh) Method and apparatus for implementing super-resolution of image frames
CN110557740A (zh) Electronic device control method and electronic device
WO2020042112A1 (zh) Method for evaluating a terminal's AI task support capability, and terminal
CN113838490A (zh) Video synthesis method, apparatus, electronic device, and storage medium
CN112651510A (zh) Model update method, worker node, and model update system
WO2022022319A1 (zh) Image processing method, electronic device, image processing system, and chip system
CN114880251A (zh) Storage unit access method, access apparatus, and terminal device
CN114444705A (zh) Model update method and apparatus
CN113468929A (zh) Motion state recognition method, apparatus, electronic device, and storage medium
WO2022007757A1 (zh) Cross-device voiceprint registration method, electronic device, and storage medium
CN114547616A (zh) Method, apparatus, and electronic device for detecting junk software
CN115546248A (zh) Event data processing method, apparatus, and system
CN113099734B (zh) Antenna switching method and apparatus
CN115393676A (zh) Gesture control optimization method, apparatus, terminal, and storage medium
CN115480250A (zh) Speech recognition method, apparatus, electronic device, and storage medium
CN114466238A (zh) Frame demultiplexing method, electronic device, and storage medium
CN114238554A (zh) Text annotation extraction method
WO2022156654A1 (zh) Text data processing method and apparatus
CN114064571A (zh) Method, apparatus, and terminal for determining a file storage location
CN112885328B (zh) Text data processing method and apparatus
CN115600653B (zh) Neural network model deployment method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20909071

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020909071

Country of ref document: EP

Effective date: 20220627

NENP Non-entry into the national phase

Ref country code: DE