CN111767833A - Model generation method and device, electronic equipment and storage medium

Info

Publication number: CN111767833A
Application number: CN202010599118.6A
Authority: CN (China)
Prior art keywords: model, quantization, ith, loss function, code generator
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 希滕, 张刚, 温圣召
Current and original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010599118.6A
Publication of CN111767833A

Classifications

    • G06V40/171: Local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships (under G06V40/168 Feature extraction; face representation, within G06V40/16 Human faces)
    • G06N3/045: Combinations of networks (under G06N3/04 Architecture, e.g. interconnection topology, within G06N3/02 Neural networks)
    • G06N3/08: Learning methods (within G06N3/02 Neural networks)
    • G06V40/172: Classification, e.g. identification (within G06V40/16 Human faces)

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a model generation method and apparatus, an electronic device and a storage medium, relating to the fields of deep learning, cloud computing and computer vision within artificial intelligence, and in particular to face detection for mask-wearing scenarios. The specific implementation scheme is as follows: acquiring a first model; obtaining a target model by performing a search of N iteration operations, where N is an integer greater than or equal to 2. The ith of the N iteration operations comprises: determining an ith quantization strategy based on a quantization search space and an ith quantization code generator, and performing the ith iteration operation on a second model based on the ith quantization strategy and the first model, where the network complexity of the second model is lower than that of the first model. If the number of iteration operations reaches a preset number threshold N, the quantized model of the second model obtained in the ith iteration operation is taken as the target model.

Description

Model generation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and especially to the fields of deep learning, cloud computing and computer vision within artificial intelligence, specifically to face detection in mask-wearing scenarios.
Background
In the related art, face recognition models are widely used. However, conventional face recognition models cannot handle face recognition in mask-wearing scenes, and even if a model is trained with masked-face data, its recognition capability in such scenes remains limited because the model lacks specificity to the mask scenario. Improving a model's recognition capability for mask-wearing scenes requires a large model structure, but a very large model can hardly meet the requirement of real-time masked-face recognition. Therefore, how to process a model so that it meets both the real-time requirement and a certain precision requirement has become a problem to be solved.
Disclosure of Invention
The disclosure provides a model generation method, a model generation device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided a model generation method including:
acquiring a first model;
executing N times of iterative operation search to obtain a target model; wherein N is an integer greater than or equal to 2; the quantization strategies adopted by different iteration operations in the N times of iteration operations are different;
wherein an ith iteration operation of the N iteration operations comprises:
determining an ith quantization strategy based on a quantization search space and an ith quantization code generator, and performing an ith iteration operation on a second model based on the ith quantization strategy and the first model; wherein i is an integer of 1 or more and N or less; wherein the network complexity of the second model is lower than the network complexity of the first model;
and if the number of times of the iterative operation reaches a preset number threshold N, taking a quantized model of the second model obtained in the ith iterative operation as the target model.
According to another aspect of the present disclosure, there is provided a model generation apparatus including:
an obtaining module, configured to obtain a first model;
the model generation module is used for executing N times of iterative operation search to obtain a target model; wherein N is an integer greater than or equal to 2; the quantization strategies adopted by different iteration operations in the N times of iteration operations are different;
wherein the model generation module is specifically configured to determine an ith quantization strategy based on the quantization search space and the ith quantization code generator, and perform an ith iteration on the second model based on the ith quantization strategy and the first model; wherein i is an integer of 1 or more and N or less; wherein the network complexity of the second model is lower than the network complexity of the first model; and if the number of times of the iterative operation reaches a preset number threshold N, taking a quantized model of the second model obtained in the ith iterative operation as the target model.
According to an aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the aforementioned method.
According to an aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the aforementioned method.
According to the technology of the application, the second model can be distilled based on the first model, and a better quantization strategy can be found, over multiple iteration operations, by searching the quantization search space to train the second model. A quantized target model that is both sufficiently accurate and sufficiently small can thus be obtained, making the method well suited to scenarios that require real-time processing.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a first flowchart of a model generation method according to an embodiment of the present application;
FIG. 2 is a second flowchart of a model generation method according to an embodiment of the present application;
FIG. 3 is a first schematic diagram of the composition structure of a model generation apparatus according to an embodiment of the present application;
FIG. 4 is a second schematic diagram of the composition structure of a model generation apparatus according to an embodiment of the present application;
FIG. 5 is a block diagram of an electronic device for implementing the model generation method of the embodiments of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
An embodiment of the present application provides a model generation method, as shown in fig. 1, including:
S101: acquiring a first model;
S102: obtaining a target model by performing a search of N iteration operations; wherein N is an integer greater than or equal to 2; the quantization strategies adopted by different iteration operations in the N iteration operations are different;
wherein the obtaining of the target model by performing the search of the N iterative operations includes:
determining an ith quantization strategy based on a quantization search space and an ith quantization code generator, performing an ith iterative operation on the second model based on the ith quantization strategy and the first model; wherein i is an integer of 1 or more and N or less; wherein the network complexity of the second model is lower than the network complexity of the first model;
and if the number of times of the iterative operation reaches a preset number threshold N, taking a quantized model of the second model obtained in the ith iterative operation as the target model.
The scheme provided by this embodiment may be applied to an electronic device, for example, a server or a terminal device, which is not limited herein.
The first model obtained in S101 may be a model trained in advance. During the execution of S101-S102, the parameters of the first model are all fixed; that is, the first model serves as a teacher model and no longer changes. The first model may be a masked-face recognition model.
Specifically, the input of the trained first model may be a face image in which part of the area is occluded, and its output may be a recognition result, for example a corresponding face image in an unoccluded state, or related information about the recognized target person, such as a tag representing the identity information of the target person.
The training of the first model may include: training the first model using mask data to obtain a converged first model. The mask data may include: at least one group of masked-face images and the corresponding unmasked face images; or at least one group of masked-face images and corresponding label information (the label information may be related information of a person, etc.). The method for training the first model is not described in detail in this embodiment.
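As a concrete illustration of this teacher-training step, the following is a minimal sketch in PyTorch (the patent does not prescribe a framework; the optimizer, learning rate and the use of identity labels as supervision are assumptions):

```python
import torch

def train_teacher(teacher, mask_loader, epochs=10, lr=1e-3):
    """Train the first model to convergence on mask data: batches of
    (masked-face image, identity label), per the second data form above."""
    opt = torch.optim.Adam(teacher.parameters(), lr=lr)
    ce = torch.nn.CrossEntropyLoss()
    teacher.train()
    for _ in range(epochs):
        for images, labels in mask_loader:
            loss = ce(teacher(images), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
    # Freeze the converged parameters: the teacher no longer changes.
    for p in teacher.parameters():
        p.requires_grad_(False)
    return teacher
```

Freezing at the end mirrors the fixed-parameter teacher described above.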
In addition, before performing the aforementioned S101 and S102, it is also necessary to prepare a second model, a quantization search space, and an initialized quantization encoding generator.
For example, the method may further include: preparing a second model; setting a quantization search space based on the second model; and designing a quantization code generator based on the quantization search space and initializing it.
Here, the second model is a model whose network complexity is lower than that of the first model (i.e., the teacher model). Lower network complexity may mean that the number of network parameters, the computation amount of the network and the number of network layers are all smaller relative to the teacher model.
The second model may be a structure of a neural network model for performing a deep learning task. Here, the deep learning task may be an information processing task that is completed using a deep neural network. In practice, the deep learning task may be, for example: image recognition, speech synthesis, text translation, natural language understanding, image processing, trend prediction, target detection and tracking, and the like. In this embodiment, the second model is preset mainly for a scene of image recognition.
At least one quantization strategy may be included in the quantization search space.
In some optional implementations of the present embodiment, the corresponding model structure search space may be constructed in advance for different quantization strategies.
Here, the quantization strategy may specify a quantization method used by each network structure unit (e.g., each layer) of the neural network model, and the quantization method may include a quantization bit width and may further include a mathematical conversion method used for converting a parameter into data of a corresponding quantization bit width. That is, the quantization strategy may include a quantization method adopted by each network structure unit stacked to form the neural network model. For example, for some specified low quantization bit widths, the corresponding model structure search space does not contain network structural elements or layer structures that have high requirements on accuracy, so that the search space can be constrained with respect to the quantization strategy.
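For illustration only (the patent does not prescribe a concrete data structure), such a search space might be represented as a set of per-layer choices of bit width and conversion method; the candidate values below are assumptions:

```python
from itertools import product

# Candidate quantization bit widths and float-to-fixed-point conversion
# methods that each network layer may choose from (illustrative values).
QUANT_SEARCH_SPACE = {
    "bit_widths": [2, 4, 8],
    "transforms": ["linear", "log", "sin", "tan"],
}

def enumerate_strategies(num_layers, space=QUANT_SEARCH_SPACE):
    """Yield every quantization strategy: one (bit_width, transform) pair
    for each network structure unit (here, each layer)."""
    per_layer = list(product(space["bit_widths"], space["transforms"]))
    yield from product(per_layer, repeat=num_layers)

# Materializing the strategies gives each one an implicit integer label,
# which a quantization code can later be decoded against.
strategies = list(enumerate_strategies(num_layers=3))
```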
The initialized quantization code generator may be a quantization code generator preset according to actual conditions, whose output can be decoded to obtain a certain quantization strategy. This embodiment is not limited in this respect.
Description is made with respect to S102: the obtaining of the target model by performing N iterative operation searches includes:
determining an ith quantization strategy based on a quantization search space and an ith quantization code generator, performing an ith iterative operation on the second model based on the ith quantization strategy and the first model; wherein i is an integer of 1 or more and N or less; wherein the network complexity of the second model is lower than the network complexity of the first model;
and if the number of times of the iterative operation reaches a preset number threshold N, taking a quantized model of the second model obtained in the ith iterative operation as the target model.
In addition, the method may further include the following steps: if the number of iteration operations has not reached the preset number threshold N, quantizing the second model obtained by the ith iteration operation; performing performance evaluation on the quantized second model to obtain an evaluation result; and adjusting the quantization code generator based on the evaluation result to obtain the (i+1)th quantization code generator, which is used for subsequent processing in the (i+1)th iteration operation; the details are not repeated here.
That is, each iteration operation is processed based on the quantization strategy corresponding to that iteration; in addition, each iteration operation includes multiple update steps in which the second model is distilled based on the first model and the current quantization strategy; and within one iteration operation, the second model obtained after this distillation is the second model obtained in the current iteration operation.
Firstly, how to adjust or generate the quantization code generator and how to generate the quantization strategy of the current iteration processing based on the quantization code generator are explained:
the ith quantization code generator is an initialized quantization code generator for when the i is equal to 1.
That is, for the 1st iteration, the 1st quantization strategy is determined directly from the quantization search space and the initialized quantization code generator.
Specifically, a quantization code is generated by the 1st quantization code generator (the initialized quantization code generator), and the quantization code is decoded based on the quantization search space to obtain the 1st quantization strategy.
The quantization code generator may be used to generate a code; based on that code, the quantization strategy to be used this time can be selected from the quantization strategies contained in the quantization search space. For example, if the code generated by the quantization code generator is 99, the quantization search space is searched for the quantization strategy labeled 99, which is then used as the quantization strategy this time. In other words, the quantization search space can be understood as containing at least one quantization strategy together with its corresponding label.
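Continuing the search-space sketch above, decoding by label can be as simple as an index lookup (a sketch; the modulo guard is an assumption to keep arbitrary codes in range):

```python
def decode_quantization_code(code, strategies):
    """Map a code emitted by the quantization code generator to the strategy
    carrying that label, e.g. code 99 -> the strategy labeled 99."""
    return strategies[code % len(strategies)]

# With `strategies` as built in the earlier sketch:
ith_strategy = decode_quantization_code(99, strategies)
```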
If i is 1 and N is greater than 2, the 1st iteration operation is performed on the second model based on the 1st quantization strategy and the first model; it is then determined that the number of iteration operations has not reached the preset number threshold N, so the 2nd iteration operation is performed, that is, i is set to i+1 and the processing above is carried out again.
When i is greater than 1, the method further comprises:
quantizing the second model obtained by the (i-1)th iteration operation;
performing performance evaluation on the quantized second model to obtain an evaluation result;
and adjusting the quantization code generator based on the evaluation result to obtain the ith quantization code generator.
That is, if i is greater than 1, the quantization code generator is updated according to the result of the last iteration processing, and the quantization code generator to be used in the current iteration processing is obtained.
Specifically, the performance evaluation of the quantized second model may be performed on the quantized model using an evaluation set.
The evaluation set may include sample data and the annotation information corresponding to the deep learning task; for example, an image data set includes image samples and the target category annotations corresponding to a target recognition task. That is, in this application the evaluation set may include at least one masked-face image and its corresponding label (or the corresponding unmasked face image).
The performance evaluation of the quantized model may be: inputting the masked-face images of the evaluation set into the quantized model to obtain output recognition results; judging the similarity between the recognition results output by the quantized model and the labels of the evaluation set; and taking this similarity as the performance evaluation result.
For example, assuming that the similarity between the recognition result output by the model and the labels in the evaluation set is 80%, the evaluation result may be 80%.
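As a sketch of this evaluation step (assuming, as one instantiation, that "similarity" is measured as top-1 identification accuracy over the evaluation set):

```python
import torch

def evaluate(quantized_model, eval_loader):
    """Fraction of masked-face images whose predicted identity matches
    the label; e.g. a return value of 0.80 is the 80% in the text."""
    quantized_model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in eval_loader:
            preds = quantized_model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.numel()
    return correct / max(total, 1)
```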
Further, the evaluation result may be fed back directly to the quantization code generator as a reward value (reward) with which the generator is updated.
It should be noted that updating the quantization code generator based on the reward value can be understood as updating the generator, given the current reward value, so that the quantization strategies it subsequently searches from the search space lead to a finally obtained model with better performance. This process can be understood as imposing the constraint that the model finally obtained by the iterative operations performs better; in other words, adjusting or updating the quantization code generator can be regarded as adjusting the constraint conditions under which quantization strategies are searched.
The determining of an ith quantization strategy based on a quantization search space and an ith quantization code generator includes: generating a quantization code by the ith quantization code generator, and decoding the quantization code based on the quantization search space to obtain the ith quantization strategy. That is, the (adjusted) quantization code generator is used to generate a code, and that code is decoded according to the quantization search space to determine the quantization strategy used by the current iteration operation. The description of the quantization strategy is the same as above and is not repeated.
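Putting the two preceding points together, code generation and reward-driven adjustment, one plausible stand-in is a small policy updated with a REINFORCE-style gradient; the patent leaves the generator's internals and update rule open, so everything below is an assumption:

```python
import torch

class QuantizationCodeGenerator(torch.nn.Module):
    """Minimal stand-in: a categorical policy over strategy labels."""
    def __init__(self, num_strategies):
        super().__init__()
        self.logits = torch.nn.Parameter(torch.zeros(num_strategies))

    def sample(self):
        """Generate a quantization code and keep its log-probability."""
        dist = torch.distributions.Categorical(logits=self.logits)
        code = dist.sample()
        return code.item(), dist.log_prob(code)

def update_generator(generator, optimizer, log_prob, reward, baseline=0.0):
    """Reinforce codes whose decoded strategies evaluated well."""
    loss = -(reward - baseline) * log_prob
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Here the evaluation result plays the role of the reward, matching the reward-value feedback described above.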
In the above method, determining the ith quantization strategy based on the quantization search space and the ith quantization code generator, and performing the ith iteration operation on the second model based on the ith quantization strategy and the first model, can be understood as follows: in the ith of the N iteration operations, the ith quantization strategy is determined based on the quantization search space and the ith quantization code generator, and the second model is trained based on the ith quantization strategy and the first model to obtain the result of the ith iteration operation; this result includes the second model obtained by the current training.
Based on the foregoing, the following describes how the second model is trained in the ith iteration operation based on the ith quantization strategy and the first model.
In the present application, performing the ith iteration operation on the second model based on the aforementioned ith quantization strategy and the first model includes:
carrying out forward propagation on the first model and the second model by adopting training data to obtain a first loss function;
quantizing the second model based on an ith quantization strategy to obtain a second loss function;
and updating the second model based on the first loss function and/or the second loss function until the updating times of the second model reach a preset threshold value, so as to obtain the second model of the ith iteration.
That is, in each iteration, the second model is updated multiple times based on the same quantization strategy, and each update operation may be the same as described above.
In addition, it should be noted that each iteration operation may start subsequent processing from the initially preset second model; alternatively, each iteration operation may continue from the second model obtained in the previous iteration operation.
Updating the second model based on the first loss function and/or the second loss function may specifically be updating the parameters of the second model based on the first loss function and/or the second loss function; then, based on the second model with updated parameters, the same ith quantization strategy and the first model are applied again to obtain the next parameter update; and so on, until the number of parameter updates of the second model reaches the preset threshold, yielding the second model trained in the ith iteration operation.
The preset threshold value may be set according to actual conditions, for example, hundreds of times or even thousands of times.
The training data may be a data set, which may include annotation information of the sample data corresponding to the deep learning task, for example, the image data set includes image samples and object class annotation information corresponding to the object recognition task, and so on.
In an example, quantizing the second model based on the ith quantization strategy may be parameter quantization; how the ith quantization strategy is obtained has been described in the foregoing embodiments and is not repeated here. Parameter quantization is an important means of compressing a model. Typically, the parameters of a convolutional neural network model are floating-point numbers of different lengths, and floating-point numbers can be converted to fixed-point numbers. There are many methods for quantizing floating-point numbers into fixed-point numbers, such as LOG logarithmic transformation, sine function transformation, tangent function transformation, linear quantization transformation, and so on. For example, if a parameter of the second model is a 32-bit floating-point number before parameter quantization and is converted to and stored as an 8-bit fixed-point number, the second model is compressed to a quarter of its original size. Quantizing the second model therefore greatly reduces both the computing resources and the memory that the model occupies.
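A minimal sketch of one such method, symmetric linear quantization (the per-tensor scale choice and int8 storage are assumptions; the LOG, sine and tangent transforms mentioned above would replace the scaling step):

```python
import torch

def linear_quantize(w: torch.Tensor, bits: int = 8):
    """Map float weights onto 2^bits fixed-point levels, as in the
    32-bit-float to 8-bit example above (int8 storage assumes bits <= 8)."""
    qmax = 2 ** (bits - 1) - 1                     # e.g. 127 for 8 bits
    scale = w.abs().max().clamp(min=1e-8) / qmax   # one scale per tensor
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)
    return q.to(torch.int8), scale                 # compact storage + scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    """Recover approximate float weights for computation."""
    return q.to(torch.float32) * scale
```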
Wherein the first loss function comprises:
a distillation loss function for characterizing a difference between the features extracted by the first model and the second model, respectively;
and/or the presence of a gas in the gas,
a task loss function for characterizing a difference between results of execution of the task by the first model and the second model.
The distillation loss function may characterize the difference between the outputs of intermediate layers of the second model and the first model. For example, in a classification task, the soft targets are the class probabilities produced by the teacher network, and the difference between the class probabilities obtained by the student network and those obtained by the teacher network can be represented by the cross entropy of the two.
For example, the distillation loss function may be constructed based on the difference between the features extracted by the second model and the last feature extraction layer (e.g., the last convolution layer or the last pooling layer) of the first model. Alternatively, the output of the last fully-connected layer of the first model and the output of the last fully-connected layer of the second model may be subjected to nonlinear processing using the same nonlinear function (e.g., softmax, sigmoid, etc.), and then the difference between the two (e.g., calculating the L2 norm characterizing the difference between the two) may be calculated as the distillation loss function.
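For instance, the softmax-then-L2 construction just described might look like this (a sketch; which layer's outputs are compared is a design choice):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_out, teacher_out):
    """L2 norm between softmax-normalized outputs of the last fully-connected
    layers; the teacher is detached so gradients reach only the student."""
    s = F.softmax(student_out, dim=1)
    t = F.softmax(teacher_out.detach(), dim=1)
    return torch.norm(s - t, p=2, dim=1).mean()
```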
Updating the second model based on the first loss function and/or the second loss function, several processing scenarios may exist:
in case 1, only the first loss function is used.
In this case, the first loss function may only include the above-mentioned distillation loss function, and the second model is updated for multiple times based on the distillation loss function until convergence or until the number of updates reaches a preset threshold value, so as to obtain the second model in the ith iteration.
Alternatively, the first loss function may include only the task loss function. The second model is updated multiple times based on the task loss function until convergence, or until the number of updates reaches the preset threshold, to obtain the second model of the ith iteration operation.
The first loss function may further include both the distillation loss function and the task loss function; in this case, the two may be added (for example, as a weighted sum) to form the first loss function. The second model is updated multiple times based on this first loss function until convergence, or until the number of updates reaches the preset threshold, to obtain the second model of the ith iteration operation.
Alternatively, the distillation loss function, the task loss function, and a first loss function formed by superimposing the two may each be used in different iterations. For example, the first iteration is processed with the distillation loss function; the second iteration with the task loss function; and the third iteration with the superposition of the distillation loss function and the task loss function.
Of course, the description is only given as an example, and in the actual processing, the first loss function specifically adopted in the different iterations may be set (or configured) according to the actual situation, which is not exhaustive here.
In case 2, only the second loss function is used; that is, the multiple updates of the second model are based only on the quantization loss function.
In case 3, both the first loss function and the second loss function are used. For example, the first loss function may be the distillation loss function, or the task loss function, or a superposition of the distillation loss function and the task loss function.
In this case, the ith iteration operation may include multiple updates (or iterative trainings) of the second model, and each update (or iterative training) of the second model may be processed using a superposition of at least one of the distillation loss function, the task loss function and the quantization loss function obtained in the preceding processing. The possible combinations are not exhaustively listed in this embodiment.
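Drawing the above together, a sketch of the multiple update steps inside one iteration operation (the optimizer, the equal loss weights and the quant_loss_fn helper are assumptions; distillation_loss is the sketch given earlier):

```python
import torch

def run_ith_iteration(teacher, student, strategy, train_loader, quant_loss_fn,
                      num_updates=500, lr=1e-3):
    """Hold the same ith quantization strategy fixed while updating the
    second model num_updates times. quant_loss_fn is a hypothetical callable
    returning the quantization loss of the strategy-quantized student."""
    opt = torch.optim.SGD(student.parameters(), lr=lr)
    ce = torch.nn.CrossEntropyLoss()
    data = iter(train_loader)
    for _ in range(num_updates):
        try:
            images, labels = next(data)
        except StopIteration:              # recycle the loader if it runs out
            data = iter(train_loader)
            images, labels = next(data)
        with torch.no_grad():
            t_out = teacher(images)                    # teacher stays frozen
        s_out = student(images)
        loss_a = ce(s_out, labels)                     # task loss
        loss_b = distillation_loss(s_out, t_out)       # distillation loss
        loss_c = quant_loss_fn(student, strategy, images, labels)  # quantization loss
        (loss_a + loss_b + loss_c).backward()          # superposed update (case 3)
        opt.step()
        opt.zero_grad()
    return student
```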
In this way, within one iteration operation (for example, the ith iteration operation), the same quantization strategy is used to quantize the second model, and forward propagation is performed on the first model and the second model, so that during the training of the second model its parameters can be iteratively adjusted based on the second model and the first model (i.e., the teacher model). Here, when adjusting the parameters of the second model, if the number of updates to the second model in the current iterative training reaches the preset threshold, or the performance of the second model reaches a certain convergence condition, the training of the second model in the current (i.e., ith) iteration operation may be stopped, yielding the second model of the current iteration operation.
If the accumulated number of iteration operations has not reached the preset number threshold, the next iteration operation is performed based on the updated quantization code generator (for example, i may be set to i+1 and the ith iteration operation performed). The iteration operations are repeated until their accumulated number reaches the preset number threshold, or until the performance evaluation result of the quantized second model after some iteration operation reaches a preset convergence condition, at which point the iteration stops and the search is finished; the second model obtained by the last iteration operation is quantized based on the quantization strategy used in that operation, and the quantized second model is taken as the target model.
According to this scheme, the second model can be distilled based on the first model, and a better quantization strategy can be found, over multiple iteration operations, by searching the quantization search space to train the second model. A quantized target model that is both sufficiently accurate and sufficiently small can thus be obtained, making the method well suited to scenarios that require real-time processing.
The target model provided by the application is applicable to masked-face recognition scenes, where the precision requirements on the model are very high, so that during an epidemic, or in other occluded scenes such as mask wearing, the face recognition model retains both a fast recognition speed and a high recognition precision.
With continued reference to FIG. 2, a flow of an embodiment of the model generation method of the present application applied in a masked-face recognition scenario is shown, comprising the following steps:
s1, acquiring a second model; the second model can be a prepared mask face recognition model to be distilled and quantized.
And S2, setting the quantization search space based on the second model (the quantization bits of each layer are searched, e.g., 2 bits, 4 bits, 8 bits).
And S3, determining a quantization code generator according to the quantization search space. For example, a quantization code generator can be designed for the masked-face recognition model.
And S4, initializing the quantization coding generator.
S5, acquiring a first model; the first model may be a prepared teacher model of sufficiently high precision, to be used for distillation during the search.
And S6, training the first model with the training data to obtain a converged first model. For example, the first model may be trained using mask data to obtain a converged teacher model for distillation during the search.
S7, freezing the parameters of the converged first model.
Based on the processing of S5-S7 above, a trained first model can be obtained, that is, a trained teacher model for distillation during the search. It should be understood that S5-S7 and S1-S4 need not be executed in a fixed order: S5-S7 may be performed first and then S1-S4; or S1-S4 first and then S5-S7; or S1-S4 and S5-S7 may be performed simultaneously. Then S8 is executed.
And S8, generating the quantization code based on the quantization code generator. For example, the quantization code of the mask face recognition model (i.e., the second model) may be generated from the initialized quantization code generator in S4.
And S9, decoding the quantization coding into a quantization strategy according to the quantization search space. The quantization strategy can also be called a mask face recognition model quantization strategy.
The processes of S8 and S9 may be performed after S4, that is, S1-S4 and S8-S9 may be sequentially performed; as shown in FIG. 2, S5-S7 may be performed out of order with S1-S4 and S8-S9, and will not be described herein.
And S10, carrying out forward propagation on the second model and the first model using the training data, extracting features, and recording the features and the loss function a. The training data may be mask data; the first model is the converged, parameter-frozen first model obtained in the aforementioned S7; and the loss function a may be the task loss function.
And S11, calculating the distance between the features extracted by the first model and those extracted by the second model to obtain the loss function b. The distance between features may be a norm, for example the L2 norm; the loss function b may be the distillation loss function.
And S12, quantizing the second model based on the quantization strategy, and then training the second model with the training data to obtain the loss function c. The loss function c may be the quantization loss function, that is, the second loss function.
S13, superimposing the loss functions a, b and c and updating the parameters of the second model.
And S14, judging whether the number of iterations for the same quantization strategy has reached the number threshold; if not, executing S10; otherwise, executing S15. In this step, the number of model updates is judged; for example, it may be judged whether the number of updates of the second model under the same quantization strategy has reached the number threshold (i.e., the preset threshold).
And S15, evaluating with the evaluation set the performance of the second model quantized by the quantization strategy, to obtain an evaluation result.
And S16, updating the quantization code generator with the evaluation result as the reward (reward value).
And S17, judging whether the iteration number of the quantization code generator reaches a preset number threshold, if not, returning to S8, otherwise, executing S18.
And S18, outputting the quantized model of the searched second model and taking the quantized model as the target model.
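Assembled from the sketches above, the whole S1-S18 flow might read as follows (a composition of assumptions, not the patent's prescribed implementation; quantize_model is a hypothetical helper that applies a strategy to the student's parameters, e.g. via linear_quantize):

```python
import torch

def search_target_model(teacher, student, strategies, train_loader, eval_loader,
                        quant_loss_fn, quantize_model,
                        N=50, updates_per_strategy=500):
    """N iteration operations of quantization-strategy search with distillation."""
    generator = QuantizationCodeGenerator(len(strategies))       # S3-S4
    gen_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
    quantized = None
    for i in range(1, N + 1):                                    # S17: loop N times
        code, log_prob = generator.sample()                      # S8
        strategy = decode_quantization_code(code, strategies)    # S9
        student = run_ith_iteration(teacher, student, strategy,  # S10-S14
                                    train_loader, quant_loss_fn,
                                    num_updates=updates_per_strategy)
        quantized = quantize_model(student, strategy)            # quantize the result
        reward = evaluate(quantized, eval_loader)                # S15
        update_generator(generator, gen_opt, log_prob, reward)   # S16
    return quantized                                             # S18: target model
```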
On the basis of the above flow, the embodiments provided by the present application may further include:
acquiring a face image to be recognized; wherein, part of the face area in the face image to be recognized is in a shielding state;
and obtaining a recognition result of the face image based on the target model and the face image to be recognized.
In a specific scenario, the face image to be recognized may be a masked-face image.
Then, based on the target model, the input face image to be recognized is recognized, and a corresponding face image is finally obtained as the recognition result. It should be noted that the final face recognition result contains no occluded region of the face.
It should be understood that the face image recognition process provided by this embodiment may be implemented in the same device as the target model generation process, or in different devices; for example, the target model may be generated on one terminal device (such as a personal computer) or a server, and recognition using the target model may be performed on another terminal device (such as a mobile phone). In that case, the method may further include: the mobile phone acquires the target model from the server (or the other terminal device) and performs face recognition based on the target model.
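A deployment sketch of this split (the file names, the 112x112 input size and the use of TorchScript are assumptions; only the searched target model needs to be present on the phone):

```python
import torch
from PIL import Image
from torchvision import transforms

# Load the target model previously fetched from the server.
model = torch.jit.load("target_model_quantized.pt")
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((112, 112)),     # a common face-crop input size
    transforms.ToTensor(),
])
face = preprocess(Image.open("masked_face.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    identity = model(face).argmax(dim=1).item()   # label of the recognized person
```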
In summary, the scheme provided by the application is applicable to image recognition processing, and in particular can recognize any face image with an occluded area, especially masked-face images. When a terminal performs image processing, particularly face recognition, a recognition result can be obtained efficiently and accurately simply by deploying the target model on the terminal.
For example, in a scenario where a mobile phone needs to be unlocked by face recognition, face unlocking can be realized simply by deploying on the phone the faster-running target model obtained by the application, without deploying a more complicated model.
An embodiment of the present application further provides a model generating apparatus, as shown in fig. 3, including:
an obtaining module 31, configured to obtain a first model;
the model generation module 32 is configured to perform N iterative operations to search for a target model; wherein N is an integer greater than or equal to 2; the quantization strategies adopted by different iteration operations in the N times of iteration operations are different;
wherein the model generation module 32 is specifically configured for:
determining an ith quantization strategy based on a quantization search space and an ith quantization code generator, and performing an ith iteration operation on a second model based on the ith quantization strategy and the first model; wherein i is an integer of 1 or more and N or less; wherein the network complexity of the second model is lower than the network complexity of the first model;
and if the number of times of the iterative operation reaches a preset number threshold N, taking a quantized model of the second model obtained in the ith iterative operation as the target model.
The model generation module 32 is configured to generate a quantization code based on the ith quantization code generator, and decode the quantization code based on the quantization search space to obtain the ith quantization strategy.
The model generation module 32 is configured to determine that the ith quantization code generator is an initialized quantization code generator when i is equal to 1;
when i is greater than 1, quantizing the second model obtained by the (i-1)th iteration operation based on the (i-1)th quantization strategy; performing performance evaluation on the quantized second model to obtain an evaluation result; and adjusting the quantization code generator based on the evaluation result to obtain the ith quantization code generator.
The model generating module 32 is configured to perform forward propagation on the first model and the second model by using training data to obtain a first loss function; quantizing the second model based on an ith quantization strategy to obtain a second loss function; and updating the second model based on the first loss function and/or the second loss function until the updating times of the second model reach a preset threshold value, so as to obtain the second model of the ith iteration.
The first loss function includes:
a distillation loss function characterizing a difference between features extracted by the first model and the second model, respectively;
and/or the presence of a gas in the gas,
a task loss function characterizing differences between results of execution of the deep learning task by the first model and the second model, respectively.
As shown in fig. 4, the apparatus further includes:
the image recognition module 33 is used for acquiring a face image to be recognized; wherein, part of the face area in the face image to be recognized is in a shielding state; and obtaining a recognition result of the face image based on the target model and the face image to be recognized.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device according to the model generation method of the embodiment of the present application. The electronic device may be the aforementioned deployment device or proxy device. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors 801, memory 802, and interfaces for connecting the various components, including a high speed interface and a low speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing portions of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, a processor 801 is taken as an example.
The memory 802 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the model generation methods provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the model generation method provided herein.
The memory 802, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the model generation methods in the embodiments of the present application (e.g., the acquisition module, the model generation module, the image recognition module shown in fig. 3 or 4). The processor 801 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 802, that is, implements the model generation method in the above-described method embodiments.
The memory 802 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 802 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 802 optionally includes memory located remotely from the processor 801, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the model generation method may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or other means, as exemplified by the bus connection in fig. 5.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 804 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system that overcomes the defects of difficult management and weak service scalability present in traditional physical hosts and VPS services.
According to the technology of the application, the second model can be distilled based on the first model, and a better quantization strategy can be obtained by searching from a quantization search space based on multiple iteration operations to train the second model, so that a quantized target model with high enough precision and small enough can be obtained, and the target model can meet the requirements of precision and small enough, so that the method is more suitable for scenes needing real-time processing.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders; this application is not limited in this respect, as long as the desired results of the technical solutions disclosed herein can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A model generation method, comprising:
acquiring a first model;
executing N times of iterative operation search to obtain a target model; wherein N is an integer greater than or equal to 2; the quantization strategies adopted by different iteration operations in the N times of iteration operations are different;
wherein an ith iteration operation of the N iteration operations comprises:
determining an ith quantization strategy based on a quantization search space and an ith quantization code generator, and performing an ith iteration operation on a second model based on the ith quantization strategy and the first model; wherein i is an integer of 1 or more and N or less; wherein the network complexity of the second model is lower than the network complexity of the first model;
and if the number of times of the iterative operation reaches a preset number threshold N, taking a quantized model of the second model obtained in the ith iterative operation as the target model.
2. The method of claim 1, wherein the determining an ith quantization strategy based on a quantization search space and an ith quantization code generator comprises:
generating a quantization code based on the i-th quantization code generator;
and decoding the quantization coding based on the quantization search space to obtain an ith quantization strategy.
3. The method of claim 2, wherein,
when i is equal to 1, the ith quantization code generator is an initialized quantization code generator;
when i is greater than 1, the method further comprises:
quantizing the second model obtained by the (i-1)th iteration operation based on the (i-1)th quantization strategy;
performing performance evaluation on the quantized second model to obtain an evaluation result;
and adjusting the quantization code generator based on the evaluation result to obtain the ith quantization code generator.
4. The method of claim 1, wherein the performing an ith iteration on a second model based on the ith quantization strategy and the first model comprises:
carrying out forward propagation on the first model and the second model by adopting training data to obtain a first loss function;
quantizing the second model based on an ith quantization strategy to obtain a second loss function;
and updating the second model based on the first loss function and/or the second loss function until the updating times of the second model reach a preset threshold value, so as to obtain the second model of the ith iteration.
5. The method of claim 4, wherein the first loss function comprises: a distillation loss function characterizing a difference between features extracted by the first model and the second model, respectively;
and/or,
a task loss function characterizing differences between results of execution of the deep learning task by the first model and the second model, respectively.
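To make claims 4-5 concrete, the following is a hedged PyTorch sketch of one iteration operation's inner training loop (the framework, layer sizes, loss weighting, and the uniform fake-quantizer with a straight-through gradient are all assumptions): both models are forward-propagated on training data for the first loss (a distillation term plus a task term), the second model is quantized under the ith strategy for the second loss, and updates continue until a preset count is reached.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)

    # First model (teacher): larger, fixed. Second model (student): smaller, trained.
    teacher = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU(),
                                  torch.nn.Linear(128, 10))
    W1 = torch.randn(16, 64, requires_grad=True)   # student parameters
    b1 = torch.zeros(16, requires_grad=True)
    W2 = torch.randn(10, 16, requires_grad=True)
    b2 = torch.zeros(10, requires_grad=True)
    opt = torch.optim.SGD([W1, b1, W2, b2], lr=1e-2)

    def fake_quant(x, bits):
        # Uniform symmetric fake-quantization with a straight-through gradient.
        scale = x.detach().abs().max().clamp(min=1e-8) / (2 ** (bits - 1) - 1)
        q = torch.round(x / scale).clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale
        return x + (q - x).detach()

    def student(x, bits=None):
        w1, w2 = (W1, W2) if bits is None else (fake_quant(W1, bits[0]), fake_quant(W2, bits[1]))
        return F.linear(F.relu(F.linear(x, w1, b1)), w2, b2)

    strategy = [8, 4]                        # the ith quantization strategy (per-layer bits)
    x, labels = torch.randn(32, 64), torch.randint(0, 10, (32,))

    for step in range(100):                  # until the preset update count is reached
        opt.zero_grad()
        with torch.no_grad():
            t_out = teacher(x)
        s_out = student(x)
        first_loss = F.mse_loss(s_out, t_out) + F.cross_entropy(s_out, labels)
        second_loss = F.cross_entropy(student(x, bits=strategy), labels)
        (first_loss + second_loss).backward()
        opt.step()

Here the mse_loss term plays the role of the distillation loss (difference between the two models' outputs) and cross_entropy the task loss; their unweighted sum is only one of the combinations the claims permit via "and/or".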
6. The method of any of claims 1-5, wherein the method further comprises:
acquiring a face image to be recognized; wherein a part of the face region in the face image to be recognized is occluded;
and obtaining a recognition result of the face image based on the target model and the face image to be recognized.
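A hypothetical usage sketch for this claim (the 112x112 input size, embedding dimension, and cosine-similarity matching are assumptions; target_model stands in for the quantized second model returned by the search):

    import torch
    import torch.nn.functional as F

    # Stand-in for the searched, quantized target model.
    target_model = torch.nn.Sequential(torch.nn.Flatten(),
                                       torch.nn.Linear(3 * 112 * 112, 128))
    face = torch.rand(1, 3, 112, 112)   # preprocessed face image, partially occluded

    with torch.no_grad():
        embedding = F.normalize(target_model(face), dim=-1)

    # The recognition result would then be obtained by comparing the embedding
    # against a gallery of enrolled face embeddings, e.g. by cosine similarity.
    gallery = F.normalize(torch.rand(100, 128), dim=-1)   # hypothetical enrolled faces
    identity = torch.argmax(gallery @ embedding.squeeze(0)).item()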
7. A model generation apparatus comprising:
an obtaining module, configured to obtain a first model;
a model generation module, configured to perform N iteration operations to search for a target model; wherein N is an integer greater than or equal to 2, and different iteration operations of the N iteration operations adopt different quantization strategies;
wherein the model generation module is specifically configured to:
determine an ith quantization strategy based on a quantization search space and an ith quantization code generator, and perform an ith iteration operation on a second model based on the ith quantization strategy and the first model; wherein i is an integer greater than or equal to 1 and less than or equal to N, and the network complexity of the second model is lower than that of the first model; and if the number of iteration operations reaches a preset number threshold N, take the quantized model of the second model obtained in the ith iteration operation as the target model.
8. The apparatus of claim 7, wherein the model generation module is configured to generate a quantization code based on the ith quantization code generator, and decode the quantization code based on the quantization search space to obtain the ith quantization strategy.
9. The apparatus of claim 8, wherein the model generation module is configured to determine that the ith quantization code generator is an initialized quantization code generator when i equals 1; and, when i is greater than 1, quantize the second model obtained by the (i-1)th iteration operation based on the (i-1)th quantization strategy, perform performance evaluation on the quantized second model to obtain an evaluation result, and adjust the quantization code generator based on the evaluation result to obtain the ith quantization code generator.
10. The apparatus of claim 7, wherein the model generation module is configured to perform forward propagation on the first model and the second model using training data to obtain a first loss function; quantize the second model based on the ith quantization strategy to obtain a second loss function; and update the second model based on the first loss function and/or the second loss function until the number of updates of the second model reaches a preset threshold, so as to obtain the second model of the ith iteration operation.
11. The apparatus of claim 10, wherein the first loss function comprises:
a distillation loss function characterizing a difference between features extracted by the first model and the second model, respectively;
and/or,
a task loss function characterizing differences between results of execution of the deep learning task by the first model and the second model, respectively.
12. The apparatus of any of claims 7-11, wherein the apparatus further comprises:
an image recognition module, configured to acquire a face image to be recognized, wherein a part of the face region in the face image to be recognized is occluded; and obtain a recognition result of the face image based on the target model and the face image to be recognized.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
CN202010599118.6A 2020-06-28 2020-06-28 Model generation method and device, electronic equipment and storage medium Pending CN111767833A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010599118.6A CN111767833A (en) 2020-06-28 2020-06-28 Model generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111767833A (en) 2020-10-13

Family

ID=72722172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010599118.6A Pending CN111767833A (en) 2020-06-28 2020-06-28 Model generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111767833A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050096950A1 (en) * 2003-10-29 2005-05-05 Caplan Scott M. Method and apparatus for creating and evaluating strategies
WO2020037937A1 (en) * 2018-08-20 2020-02-27 深圳壹账通智能科技有限公司 Facial recognition method and apparatus, terminal, and computer readable storage medium
CN110348562A (en) * 2019-06-19 2019-10-18 北京迈格威科技有限公司 The quantization strategy of neural network determines method, image-recognizing method and device
CN110457503A (en) * 2019-07-31 2019-11-15 北京大学 A kind of rapid Optimum depth hashing image coding method and target image search method
CN110766142A (en) * 2019-10-30 2020-02-07 北京百度网讯科技有限公司 Model generation method and device
CN110852438A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Model generation method and device
CN110852421A (en) * 2019-11-11 2020-02-28 北京百度网讯科技有限公司 Model generation method and device
CN111340219A (en) * 2020-02-25 2020-06-26 北京百度网讯科技有限公司 Neural network model searching method and device, image processing method and processor

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
甄俊杰; 应自炉; 赵毅鸿; 黄尚安: "Research on the Application of Deep Learning and Iterative Quantization in Image Retrieval" [深度学习和迭代量化在图像检索中的应用研究], Signal Processing (信号处理), no. 05, 25 May 2019 (2019-05-25) *
葛仕明; 赵胜伟; 刘文瑜; 李晨钰: "Face Recognition Based on Deep Feature Distillation" [基于深度特征蒸馏的人脸识别], Journal of Beijing Jiaotong University (北京交通大学学报), no. 06, 15 December 2017 (2017-12-15) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085010A (en) * 2020-10-28 2020-12-15 成都信息工程大学 Mask detection and deployment system and method based on image recognition
CN112507197A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Model searching method, model searching apparatus, electronic device, storage medium, and program product
CN112507197B (en) * 2020-12-18 2024-01-19 北京百度网讯科技有限公司 Model searching method, device, electronic equipment, storage medium and program product
CN113554097A (en) * 2021-07-26 2021-10-26 北京市商汤科技开发有限公司 Model quantization method and device, electronic equipment and storage medium
CN113554097B (en) * 2021-07-26 2023-03-24 北京市商汤科技开发有限公司 Model quantization method and device, electronic equipment and storage medium
CN113762403A (en) * 2021-09-14 2021-12-07 杭州海康威视数字技术股份有限公司 Image processing model quantization method and device, electronic equipment and storage medium
CN113762403B (en) * 2021-09-14 2023-09-05 杭州海康威视数字技术股份有限公司 Image processing model quantization method, device, electronic equipment and storage medium
CN114239792A (en) * 2021-11-01 2022-03-25 荣耀终端有限公司 Model quantization method, device and storage medium
CN114239792B (en) * 2021-11-01 2023-10-24 荣耀终端有限公司 System, apparatus and storage medium for image processing using quantization model

Similar Documents

Publication Publication Date Title
CN111639710B (en) Image recognition model training method, device, equipment and storage medium
JP7166322B2 (en) Methods, apparatus, electronics, storage media and computer programs for training models
CN111767833A (en) Model generation method and device, electronic equipment and storage medium
CN111259671B (en) Semantic description processing method, device and equipment for text entity
CN111667056B (en) Method and apparatus for searching model structures
CN110795569B (en) Method, device and equipment for generating vector representation of knowledge graph
CN111753761B (en) Model generation method, device, electronic equipment and storage medium
CN114549935B (en) Information generation method and device
CN111582479B (en) Distillation method and device for neural network model
CN111563593B (en) Training method and device for neural network model
CN111104514A (en) Method and device for training document label model
CN111079945B (en) End-to-end model training method and device
CN112580822B (en) Countermeasure training method device for machine learning model, electronic equipment and medium
CN112149829B (en) Method, device, equipment and storage medium for determining pruning strategy of network model
US20220101642A1 (en) Method for character recognition, electronic device, and storage medium
CN111241838B (en) Semantic relation processing method, device and equipment for text entity
CN111695698A (en) Method, device, electronic equipment and readable storage medium for model distillation
CN112380855B (en) Method for determining statement smoothness, method and device for determining probability prediction model
CN112149266B (en) Method, device, equipment and storage medium for determining network model quantization strategy
CN112561056A (en) Neural network model training method and device, electronic equipment and storage medium
CN111967591B (en) Automatic pruning method and device for neural network and electronic equipment
CN112529180A (en) Method and apparatus for model distillation
CN111325000B (en) Language generation method and device and electronic equipment
CN113361523A (en) Text determination method and device, electronic equipment and computer readable storage medium
CN111753759A (en) Model generation method and device, electronic equipment and storage medium

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination