CN111767832A - Model generation method and device, electronic equipment and storage medium - Google Patents

Model generation method and device, electronic equipment and storage medium

Info

Publication number
CN111767832A
CN111767832A (Application CN202010598998.5A)
Authority
CN
China
Prior art keywords
ith
model
code generator
delay
reward value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010598998.5A
Other languages
Chinese (zh)
Inventor
希滕
张刚
温圣召
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010598998.5A priority Critical patent/CN111767832A/en
Publication of CN111767832A publication Critical patent/CN111767832A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a model generation method and device, an electronic device and a storage medium, relating to the fields of deep learning, cloud computing and computer vision within artificial intelligence, and in particular to face detection for mask-wearing scenarios. The specific implementation scheme is as follows: in the process of executing the ith iteration operation, an ith model to be trained is obtained based on an ith model code generator, and an ith delay penalty strategy is obtained based on an ith delay penalty code generator, wherein i is an integer greater than or equal to 1; the ith model to be trained is trained to obtain a converged model, and it is determined whether the converged model is used as the target model obtained by the search.

Description

Model generation method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular to the fields of deep learning, cloud computing and computer vision within artificial intelligence, specifically to face detection for mask-wearing scenarios.
Background
In the related art, face recognition models are widely used. However, a conventional face recognition model cannot handle face recognition in a mask-wearing scenario, and even if the model is trained with masked-face data, its recognition capability in such a scenario remains limited because the model lacks pertinence to the mask scenario. Improving the model's face recognition capability for mask-wearing scenarios requires a large model structure, but an oversized model can hardly meet the real-time requirement of masked-face recognition. Therefore, how to process the model so that it meets both the real-time requirement and a certain accuracy requirement becomes a problem to be solved.
Disclosure of Invention
The disclosure provides a model generation method, a model generation device, an electronic device and a storage medium.
According to an aspect of the present disclosure, there is provided a model generation method including:
in the process of executing the ith iteration operation, an ith model to be trained is obtained based on an ith model code generator, and an ith delay penalty strategy is obtained based on an ith delay penalty code generator; wherein i is an integer greater than or equal to 1;
training the ith model to be trained to obtain a converged model, and determining whether the converged model is used as a target model obtained by searching;
determining whether to use the converged model as a target model obtained by searching comprises the following steps:
if the accumulated times of the iterative operation reach a preset threshold value N, determining the converged model as a target model obtained by searching; wherein N is an integer greater than or equal to 2;
if the cumulative number of iterative operations does not reach a preset threshold value N, determining an ith individual performance reward value corresponding to the converged model, and determining an ith delay reward value of the converged model based on the ith delay punishment strategy; updating the ith model code generator and/or the ith delay penalty code generator based on the ith individual performance reward value and the ith delay reward value to obtain the (i + 1) th model code generator and/or the (i + 1) th delay penalty code generator.
According to another aspect of the present disclosure, there is provided a model generation apparatus including:
the acquisition module is used for acquiring an ith model to be trained based on an ith model code generator and acquiring an ith delay penalty strategy based on an ith delay penalty code generator in the process of executing the ith iteration operation; wherein i is an integer greater than or equal to 1;
the model generation module is used for training the ith model to be trained to obtain a converged model and determining whether the converged model is used as a target model obtained by searching;
wherein the model generation module is specifically used for
If the accumulated times of the iterative operation reach a preset threshold value N, determining the converged model as a target model obtained by searching; wherein N is an integer greater than or equal to 2;
if the cumulative number of iterative operations does not reach a preset threshold value N, determining an ith individual performance reward value corresponding to the converged model, and determining an ith delay reward value of the converged model based on the ith delay punishment strategy; updating the ith model code generator and/or the ith delay penalty code generator based on the ith individual performance reward value and the ith delay reward value to obtain the (i + 1) th model code generator and/or the (i + 1) th delay penalty code generator.
According to an aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the aforementioned method.
According to an aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the aforementioned method.
According to the technology of the present application, the delay reward value of the model is determined in combination with a delay penalty strategy over multiple iteration operations, and the model can then be searched in the next iteration operation in combination with the delay condition, so that by adding a delay limiting factor to the model search process, a target model that is highly accurate and runs fast enough can finally be obtained. In addition, the target model can meet both the accuracy requirement and the requirement of being small enough, making it more suitable for scenarios that require real-time processing.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a first flowchart illustrating a method for generating a model according to an embodiment of the present disclosure;
FIG. 2 is a second flowchart illustrating a method for generating a model according to an embodiment of the present application;
FIG. 3 is a first schematic diagram of a model generation apparatus according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a second exemplary composition structure of a model generation apparatus according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device for implementing the model generation method of the embodiments of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of those embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An embodiment of the present application provides a model generation method, as shown in fig. 1, including:
s101: in the process of executing the ith iteration operation, an ith model to be trained is obtained based on an ith model code generator, and an ith delay penalty strategy is obtained based on an ith delay penalty code generator; wherein i is an integer greater than or equal to 1;
s102: training the ith model to be trained to obtain a converged model, and determining whether the converged model is used as a target model obtained by searching;
determining whether to use the converged model as a target model obtained by searching comprises the following steps:
if the accumulated times of the iterative operation reach a preset threshold value N, determining the converged model as a target model obtained by searching; wherein N is an integer greater than or equal to 2;
if the cumulative number of iterative operations does not reach a preset threshold value N, determining an ith individual performance reward value corresponding to the converged model, and determining an ith delay reward value of the converged model based on the ith delay punishment strategy; updating based on the ith individual performance reward value and the ith delay reward value to obtain an (i + 1) th model code generator and/or an (i + 1) th delay penalty code generator so as to carry out (i + 1) th iteration operation.
The scheme provided by this embodiment may be applied to an electronic device, for example, a server or a terminal device, which is not limited herein.
In addition, before the aforementioned S101 is executed, a model search space and an initialized model code generator need to be prepared; a delay penalty code search space and an initialized delay penalty code generator also need to be prepared.
For example, it may include: the model code generator is designed based on the model search space and initialized. The method can also comprise the following steps: and designing a delay penalty code generator based on the delay penalty code search space, and initializing the delay penalty code generator.
Here, the model search space includes at least one model (or model structure); the models it contains have low network complexity, for example a small number of network parameters, a small amount of network computation and a small number of network layers.
The models contained in the model search space may be structures of neural network models for performing deep learning tasks. Here, a deep learning task is an information processing task completed using a deep neural network. In practice, the deep learning task may be, for example: image recognition, speech synthesis, text translation, natural language understanding, image processing, trend prediction, target detection and tracking, and the like. In this embodiment, the converged model is mainly intended for image recognition scenarios.
The model code generator can also be understood as a constraint condition to be satisfied; this constraint may be adjusted at each iteration operation, and it is satisfied in order to obtain a converged model with a better performance evaluation result.
Similarly, the delay penalty code generator can be understood as a constraint condition, and this constraint is used so that each iteration operation adopts a better delay penalty strategy.
In addition, the delay penalty coding search space may be a search space containing a plurality of delay penalty policies, where different delay penalty policies may correspond to different tags. A delay penalty policy can be understood as the processing policy to be taken when the computed delay duration of the model exceeds a preset duration. For example, it may be a policy that derives a corresponding delay reward value from the delay of the current model's processing, or a policy that adjusts the parameters of the model so that the delay of the adjusted model meets the preset duration requirement, and so on. Here, the preset duration may be 10 ms, but it may also be longer or shorter and is designed according to the actual situation. The preset duration may also be designed according to the target scenario; for example, face recognition scenarios that require faster computation, such as face-recognition unlocking, face-recognition payment or face-recognition security check, may use a smaller preset duration such as 5 ms. Note that this is only an exemplary description and does not limit the actual processing.
The initialized delay penalty code generator and the initialized model code generator may be code generators preset according to actual conditions, whose codes can be decoded into a certain model structure or a certain delay penalty; this embodiment does not limit them.
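As a minimal sketch of this preparation step (the class, dictionary fields and seed below are illustrative assumptions, not structures defined by the application), the two search spaces and their initialized code generators could look like this:

    import random

    # Hypothetical model search space: each entry describes a low-complexity
    # network structure (few layers, few parameters, small compute).
    MODEL_SEARCH_SPACE = [
        {"layers": 8,  "channels": 32},
        {"layers": 12, "channels": 48},
        {"layers": 16, "channels": 64},
    ]

    # Hypothetical delay penalty search space: each entry is a delay penalty
    # strategy, reduced here to a delay boundary (ms) and a penalty weight.
    DELAY_PENALTY_SEARCH_SPACE = [
        {"boundary_ms": 10.0, "weight": 0.5},
        {"boundary_ms": 10.0, "weight": 1.0},
        {"boundary_ms": 5.0,  "weight": 1.0},
    ]

    class CodeGenerator:
        """Generates an integer code that indexes an entry of a search space."""
        def __init__(self, space_size, seed=0):
            self.space_size = space_size
            self.rng = random.Random(seed)

        def generate_code(self):
            return self.rng.randrange(self.space_size)

    # Initialized generators used for the 1st iteration operation.
    model_code_generator = CodeGenerator(len(MODEL_SEARCH_SPACE))
    delay_penalty_code_generator = CodeGenerator(len(DELAY_PENALTY_SEARCH_SPACE))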
Any one of the N iteration operations in S101 may be referred to as an ith iteration operation; correspondingly, the next iteration operation corresponding to the iteration operation is called an i +1 th iteration operation, and the previous iteration operation is called an i-1 th iteration operation.
The obtaining of the ith model to be trained based on the ith model code generator in S101 includes:
generating an ith model code based on the ith model code generator;
and decoding the ith model code based on the model search space to obtain the ith model to be trained.
The obtaining of the ith delay penalty policy based on the ith delay penalty code generator in S101 includes:
generating an ith delay penalty code based on the ith delay penalty code generator;
and decoding the ith delay penalty code based on the delay penalty code search space to obtain an ith delay penalty strategy.
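As an illustrative sketch (assuming, purely for illustration, that a code is an integer index into the corresponding search space), the decoding in S101 amounts to a lookup:

    def decode_model_code(model_code, model_search_space):
        # Decode the ith model code into the ith model structure to be trained.
        return model_search_space[model_code % len(model_search_space)]

    def decode_delay_penalty_code(penalty_code, delay_penalty_search_space):
        # Decode the ith delay penalty code into the ith delay penalty strategy.
        return delay_penalty_search_space[penalty_code % len(delay_penalty_search_space)]

    # Example: code 1 selects the second entry of a three-entry model search space.
    space = [{"layers": 8}, {"layers": 12}, {"layers": 16}]
    print(decode_model_code(1, space))  # {'layers': 12}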
The description of the delay penalty coding search space and the model search space is the same as the above description, and is not repeated.
In addition, when i is 1, the ith delay penalty code generator may be the aforementioned initialized delay penalty code generator; the ith model code generator may be the initialized model code generator described above.
And when i is larger than 1, the ith delay penalty code generator and the ith model code generator are updated based on the reward value obtained by the last iteration operation.
In the foregoing S102, after the ith to-be-trained model of the ith iterative operation is obtained, the ith to-be-trained model may be trained until convergence based on preset training data, so as to obtain a converged model.
The training data may be a data set, which may include at least one sample data and annotation information corresponding to the deep learning task, for example, the image data set includes an image sample and object class annotation information corresponding to the object recognition task, and so on.
The aforementioned training of the ith model to be trained until convergence can be understood as performing multiple parameter updates on the model to be trained until the updated model satisfies the preset convergence condition. Here, the convergence condition may be set according to actual conditions, and for example, may include: the parameter updating times of the ith model to be trained reach a preset updating threshold value (for example, hundreds of times or thousands of times, etc.); and/or whether the reward feedback value of the ith model to be trained reaches a preset convergence condition, for example, the change rate of the reward feedback value in the last continuous updating operations is lower than a preset change rate threshold value, and the like, which are not exhaustive.
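A convergence check along these lines could be sketched as follows (the thresholds, window size and function name are assumptions chosen for illustration):

    def has_converged(update_count, reward_history,
                      max_updates=1000, rate_threshold=0.01, window=5):
        # Condition 1: the number of parameter updates reached the preset threshold.
        if update_count >= max_updates:
            return True
        # Condition 2: the reward feedback value changed at a rate below the preset
        # change-rate threshold over the last few consecutive update operations.
        if len(reward_history) >= window:
            recent = reward_history[-window:]
            change_rate = (max(recent) - min(recent)) / (abs(recent[0]) + 1e-8)
            return change_rate < rate_threshold
        return False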
Furthermore, in S102, after the converged model of the ith iteration operation is obtained, it is determined whether the cumulative number of iteration operations reaches the preset threshold N; if so, the converged model is determined to be the target model obtained by the search, wherein N is an integer greater than or equal to 2;
if the cumulative number of iterative operations does not reach a preset threshold value N, determining an ith individual performance reward value corresponding to the converged model, and determining an ith delay reward value of the converged model based on the ith delay punishment strategy; updating based on the ith individual performance reward value and the ith delay reward value to obtain an (i + 1) th model code generator and/or an (i + 1) th delay penalty code generator so as to carry out (i + 1) th iteration operation.
That is, after each iteration operation is executed, whether i is equal to N or not is judged, if so, the iteration operation can be considered to be completed, and a model obtained by current search is output; otherwise, the (i + 1) th iteration operation is performed.
It should be noted that, the i +1 th iteration operation may be regarded as setting i to i +1, that is, the next iteration operation is also regarded as the i-th iteration operation, and the executed processing is the same as that of the foregoing S101 to S102, and therefore, the present embodiment will be described in detail only for a certain iteration operation.
Still further, if the cumulative number of iterations does not reach the preset threshold N, an analysis of the reward value is also performed for each iteration, specifically:
the updating is carried out based on the ith individual performance reward value and the ith delay reward value to obtain the (i + 1) th model code generator and/or the (i + 1) th delay penalty code generator so as to carry out the (i + 1) th iteration operation, and the method further comprises the following steps:
if i is equal to 1, overlapping the ith individual performance reward value and the ith delay reward value to obtain an ith reward value overlapping result;
updating the ith model code generator based on the ith reward value superposition result to obtain an (i + 1) th model code generator; wherein the (i + 1) th model code generator is used for executing the (i + 1) th iteration operation.
Here, i = 1 refers to the processing of the 1st iteration operation.
In the 1 st iteration operation, only the performance reward value and the delay punishment reward value corresponding to the model of the current iteration operation need to be analyzed.
Specifically, the determining of the ith individual performance reward value corresponding to the converged model means that the performance of the converged model is evaluated to obtain a corresponding evaluation value as the performance reward value, and the corresponding performance reward value is referred to as the ith individual performance reward value because of the ith iteration operation.
The performance reward value may represent the similarity, evaluated by the converged model, between images in an evaluation set and the labels corresponding to preset images; for example, the similarity may be 80%. This is not exhaustive here.
In addition, the ith delay reward value of the converged model is determined based on the ith delay penalty policy, which can be understood as a policy for obtaining a delay reward value. For example, if the current model runs for 11 ms to produce a recognition result during performance evaluation, the delay duration may be 11 ms. Alternatively, when a delay boundary (e.g., a delay threshold) is set, the delay may be the difference between the measured delay and the delay boundary; for example, with a delay boundary of 10 ms the corresponding delay is 1 ms, and with a delay boundary of 12 ms the delay is -1 ms. This may be set according to the actual situation and is not exhaustive.
Correspondingly, the delay reward value corresponding to the converged model in the iteration operation is determined based on the corresponding delay value and the ith delay penalty strategy, so that the delay reward value is called as the ith delay reward value, and the delay reward value can be a positive number or a negative number.
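For example, a simple linear delay penalty of this kind could be sketched as follows (the sign convention and the weight parameter are assumptions; the application only requires that the delay reward value may be positive or negative):

    def delay_reward(measured_latency_ms, boundary_ms=10.0, weight=1.0):
        # Negative when the converged model is slower than the delay boundary,
        # positive when it is faster.
        return weight * (boundary_ms - measured_latency_ms)

    print(delay_reward(11.0))  # -1.0: 1 ms over the 10 ms boundary
    print(delay_reward(9.0))   #  1.0: 1 ms under the 10 ms boundary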
Correspondingly, the ith individual performance reward value and the ith delay reward value are superposed to obtain the ith reward value superposition result, which may be done as follows:
if the 1st delay reward value is negative, the 1st delay reward value and the 1st individual performance reward value are added to obtain the 1st reward value superposition result;
if the 1st delay reward value is positive, the 1st delay reward value and the 1st individual performance reward value are multiplied to obtain the 1st reward value superposition result.
In addition, if the 1st delay reward value is 0, the 1st reward value superposition result may be equal to the 1st individual performance reward value.
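The superposition rule described above can be written directly as a small helper (a sketch; the example values assume a performance reward expressed as a similarity score):

    def superpose_rewards(performance_reward, delay_reward_value):
        # Negative delay reward: added to the individual performance reward.
        if delay_reward_value < 0:
            return performance_reward + delay_reward_value
        # Positive delay reward: multiplied with the individual performance reward.
        if delay_reward_value > 0:
            return performance_reward * delay_reward_value
        # Zero delay reward: the superposition result equals the performance reward.
        return performance_reward

    print(superpose_rewards(0.8, -0.1))  # approximately 0.7
    print(superpose_rewards(0.8, 1.5))   # approximately 1.2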
Furthermore, the 1st model code generator is updated based on the 1st reward value superposition result to obtain the (i + 1)th model code generator (namely, the 2nd model code generator).
Note that in the case of i being 1, only the model code generator needs to be updated, and the delay penalty code generator does not need to be updated.
In other words, the 2 nd delay penalty code generator used in the 2 nd iteration operation is an initialized delay penalty code generator.
Updating the model code generator may be understood as readjusting the model code generator (or the constraint condition) based on the current reward value superposition result, and then searching the model search space for a model to be trained that is better than that of the current iteration operation, for use in the next iteration operation.
Further, on the basis of the iterative operation where i is 1, for a scenario where i is greater than 1, the following method may be used, specifically:
if i is larger than 1, superposing the ith individual performance reward value and the ith delay reward value to obtain an ith reward value superposition result;
updating the ith model code generator based on the ith reward value superposition result to obtain an (i + 1) th model code generator;
updating the ith delay penalty code generator based on the ith reward value superposition result and the (i-1) th reward value superposition result to obtain an (i + 1) th delay penalty code generator;
the (i + 1) th model code generator and the (i + 1) th delay penalty code generator are used for executing (i + 1) th iteration operation.
Where i is greater than 1 refers to the processing of the 2 nd to nth iteration operations. Regarding the delay penalty code generator used in the 2 nd iteration operation, the delay penalty code generator may be initialized as described above; however, the model code generator of the 2 nd iteration is an updated model code generator and may be different from the initialized model code generator. And the 3 rd to nth iteration operations may include the process of updating both the delay penalty code generator and the model code generator.
In the processing of the ith iteration operation, only the performance reward value and the delay penalty reward value corresponding to the model of the current iteration operation need to be analyzed. The process of analyzing the performance reward value and the delay penalty reward value at this time is the same as the process of the iterative operation with i equal to 1, and is not described again here.
In the iterative operation where i is greater than 1, the process of updating the model code generator is also the same as the process of the iterative operation where i is 1, and the description thereof is omitted.
What differs from the iteration operation where i = 1 is that, in an iteration operation where i is greater than 1, the delay penalty code generator also needs to be updated using both the current reward value superposition result and the previous reward value superposition result. Specifically, the ith delay penalty code generator is updated based on the smoothness between the ith reward value superposition result and the (i-1)th reward value superposition result. That is, the delay penalty code generator may be updated according to the smoothness of two adjacent reward value superposition results.
By updating the delay penalty code generator, a better delay penalty strategy can be searched from the delay penalty search space for the next iteration operation.
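The application does not spell out the exact update rule, but one way to picture the role of the smoothness signal is the following sketch (the threshold, the resampling behaviour and the function names are assumptions):

    def smoothness(curr_superposed, prev_superposed, eps=1e-8):
        # Smaller values mean the two adjacent reward superposition results are
        # closer, i.e. the reward changed smoothly from iteration i-1 to i.
        return abs(curr_superposed - prev_superposed) / (abs(prev_superposed) + eps)

    def update_delay_penalty_code(curr_code, sample_new_code, curr_r, prev_r,
                                  smooth_threshold=0.05):
        # Keep the current delay penalty strategy when the change is smooth,
        # otherwise let the generator search for a better strategy.
        if smoothness(curr_r, prev_r) <= smooth_threshold:
            return curr_code
        return sample_new_code()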
In an example, when determining the target model, the method may further include:
and quantizing the converged model, and taking the quantized converged model as the target model.
That is, the finally output target model may use an activation function obtained by the search after the iterations are completed, and be a quantized version of the converged model that uses this activation function.
The quantization of the converged model may be parameter quantization. Parameter quantization is an important means of compressing a model. Typically, the parameters of a convolutional neural network model are floating-point numbers of different lengths, and floating-point numbers can be converted into fixed-point numbers through fixed-point conversion.
There are many methods for quantizing floating-point numbers into fixed-point numbers, such as LOG logarithmic transformation, sine function transformation, tangent function transformation, linear quantization transformation, and the like. For example, if the parameters of the converged model are 32-bit floating-point numbers before parameter quantization, converting them into 8-bit fixed-point numbers for storage compresses the converged model to one quarter of its original size.
Of course, other quantization strategies may be used to quantize the converged model, which is not exhaustive in this embodiment.
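A minimal sketch of such a linear float-to-fixed-point conversion (per-tensor scaling; the function names and the symmetric scheme are assumptions, not the application's prescribed method):

    import numpy as np

    def quantize_linear(weights_fp32, num_bits=8):
        # Map 32-bit floating-point weights onto signed 8-bit fixed-point values,
        # storing the int8 tensor plus a single per-tensor scale factor.
        qmax = 2 ** (num_bits - 1) - 1                      # 127 for int8
        max_abs = float(np.abs(weights_fp32).max())
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q = np.clip(np.round(weights_fp32 / scale), -qmax - 1, qmax).astype(np.int8)
        return q, scale

    def dequantize_linear(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(4, 4).astype(np.float32)
    q, s = quantize_linear(w)
    # The int8 tensor occupies one quarter of the storage of the float32 tensor.
    print(w.nbytes, q.nbytes)  # 64 16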
According to the above scheme, the searched model can be trained over multiple iteration operations, the delay reward value of the model is determined in combination with a delay penalty strategy in these iteration operations, and the model can then be searched in the next iteration operation in combination with the delay condition. By adding a delay limiting factor to the model search, a target model that is highly accurate and runs fast can finally be obtained. In addition, the target model can meet both the accuracy requirement and the requirement of being small enough, making it more suitable for scenarios that require real-time processing.
The target model provided by the present application is applicable to masked-face recognition scenarios, in which the accuracy requirements on the model are very high, so that during an epidemic or in other occluded scenarios such as mask wearing, the face recognition model still offers both fast recognition speed and high recognition accuracy.
With continuing reference to fig. 2, a flow chart of an embodiment of the model generation method of the present application applied in a mask face recognition scenario is shown, comprising the following steps:
s1, designing a model search space; in one example, a space may be searched for a design mask face recognition model.
S2, designing a delay penalty search space; in one example, search space may be punished with a delay for designing a mask face recognition model.
S3, designing a model code generator according to the model searching space; in one example, if the recognition of the face image of the wearing mask is to be finally performed, this step may be a design mask face recognition model structure code generator.
S4, designing a delay penalty code generator according to the delay penalty search space; in one example, if the recognition of the face image of the wearing mask is to be finally performed, this step may be a delay penalty code generator for designing a mask face recognition model.
And S5, initializing a model code generator (or called an initialized mask model code generator).
And S6, initializing a delay penalty code generator, or initializing a mask face recognition model delay penalty code generator.
The processing sequence of S1-S6 may be the execution order S1 to S6. Alternatively, S1, S3 and S5 may be performed sequentially while S2, S4 and S6 are performed sequentially in parallel. Still alternatively, S1, S3 and S5 may be performed first and then S2, S4 and S6, or S2, S4 and S6 may be performed first and then S1, S3 and S5.
And S7, generating the model code according to the model code generator. For example, the code of the mask face recognition model may be generated according to a mask model code generator.
And S8, generating a delay penalty code according to the delay penalty code generator. For example, the code of the mask face recognition model delay penalty can be generated according to the mask delay penalty code generator.
S9, coding the model according to the model searching space, and decoding the model into a model to be trained; the model to be trained can be a mask face recognition model structure.
And S10, decoding the delay penalty codes into delay penalties according to the delay penalty search space.
And S11, training the model to be trained by using the mask data until convergence.
And S12, taking the performance of the converged model as a performance reward value.
And S13, determining a delay reward value based on the delay penalty and the delay of the converged model (if negative, it is added to the performance reward value; if positive, it is multiplied with it).
And S14, overlapping the performance reward value and the delay reward value to obtain a reward value overlapping result so as to update the model code generator.
Further, if not the first superposition, the method may further include: and updating the delay punishment code generator according to the smoothness degree of the current bonus value superposition result and the previous bonus value superposition result.
And S15, judging whether the iteration number of the model code generator reaches a preset threshold value, if not, executing S7, otherwise, executing S16.
And S16, outputting the searched optimal target model. The target model may be a mask face recognition model.
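Putting S7 to S15 together, the outer search loop can be pictured as follows; every training, evaluation, latency-measurement and generator-update function here is a stand-in stub chosen only to make the sketch runnable, not part of the application:

    import random

    rng = random.Random(0)

    # --- stand-in stubs (illustrative only) ------------------------------------
    def train_until_convergence(model):        # S11: train with mask data
        return dict(model, trained=True)

    def evaluate_performance(model):           # S12: similarity on an evaluation set
        return rng.uniform(0.6, 0.95)

    def measure_latency_ms(model):             # measured inference delay of the model
        return rng.uniform(6.0, 14.0)

    def update_model_code(code, superposed, size):
        # Placeholder update of the model code generator from the superposed reward.
        return (code + 1) % size if superposed < 0.8 else code

    def update_penalty_code(code, curr, prev, size):
        # Placeholder update by smoothness of two adjacent superposition results.
        return code if abs(curr - prev) < 0.05 else (code + 1) % size
    # ----------------------------------------------------------------------------

    def search_target_model(model_space, penalty_space, n_iterations=10):
        model_code = rng.randrange(len(model_space))                  # S5, S7
        penalty_code = rng.randrange(len(penalty_space))              # S6, S8
        prev_superposed = None
        for i in range(1, n_iterations + 1):
            model = model_space[model_code]                           # S9: decode model code
            penalty = penalty_space[penalty_code]                     # S10: decode delay penalty
            converged = train_until_convergence(model)                # S11
            perf = evaluate_performance(converged)                    # S12
            delay_r = penalty["weight"] * (penalty["boundary_ms"]
                                           - measure_latency_ms(converged))   # S13
            superposed = (perf + delay_r if delay_r < 0
                          else perf * delay_r if delay_r > 0 else perf)       # S14
            if i == n_iterations:                                     # S15 threshold reached -> S16
                return converged
            model_code = update_model_code(model_code, superposed, len(model_space))
            if prev_superposed is not None:                           # not the first superposition
                penalty_code = update_penalty_code(penalty_code, superposed,
                                                   prev_superposed, len(penalty_space))
            prev_superposed = superposed

    target = search_target_model(
        model_space=[{"layers": 8}, {"layers": 12}, {"layers": 16}],
        penalty_space=[{"boundary_ms": 10.0, "weight": 1.0},
                       {"boundary_ms": 5.0, "weight": 1.0}])
    print(target)  # the searched target model (e.g. a mask face recognition model)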
On the basis of the above flow, the embodiments provided by the present application may further include:
acquiring a face image to be recognized; wherein, part of the face area in the face image to be recognized is in a shielding state;
and obtaining a recognition result of the face image based on the target model and the face image to be recognized.
The specific scene can be the face image in the image which is the face image wearing the mask.
Then, based on the target model, the input face image to be recognized is recognized, and a corresponding face image is finally obtained as the recognition result. It should be noted that the final face recognition result does not include any occluded region of the face.
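At inference time, using the searched target model can be as simple as the following sketch (the model is represented by an arbitrary callable and the identity-matching step is an assumption; a real deployment would load the actual searched network):

    import numpy as np

    def recognize_masked_face(target_model, face_image):
        # face_image: H x W x 3 array in which part of the face is occluded by a mask.
        scores = target_model(face_image)
        # Return the index of the best-matching enrolled identity.
        return int(np.argmax(scores))

    # Stand-in "model": any callable mapping an image to per-identity scores.
    dummy_model = lambda img: np.array([0.1, 0.7, 0.2])
    image = np.zeros((112, 112, 3), dtype=np.float32)
    print(recognize_masked_face(dummy_model, image))  # 1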
It should be understood that the process of recognizing the face image provided by this embodiment may be implemented in the same device as the process of generating the target model, or in different devices; for example, the target model may be generated in a terminal device (such as a personal computer) or a server, and used for recognition in another terminal device (such as a mobile phone). In the latter case, the method may further include: the mobile phone acquires the target model from the server (or the other terminal device) and performs face recognition based on the target model.
In summary, the scheme provided by the present application is applicable to image recognition processing, and in particular can be used to recognize any face image with an occluded area, especially a masked face image. When a terminal performs image processing, particularly face recognition, a recognition result can be obtained efficiently and accurately simply by deploying the target model on the terminal.
For example, in a scenario where a mobile phone needs to be unlocked by face recognition, face unlocking can be performed on the phone simply by deploying the fast-running target model obtained by the present application, without deploying a more complicated model.
An embodiment of the present invention further provides a model generating apparatus, as shown in fig. 3, including:
an obtaining module 31, configured to obtain an ith model to be trained based on an ith model code generator and obtain an ith delay penalty policy based on an ith delay penalty code generator in an ith iteration operation; wherein i is an integer greater than or equal to 1;
the model generation module 32 is configured to train the ith model to be trained to obtain a converged model, and determine whether to use the converged model as a target model obtained by the search;
wherein the model generation module 32 is specifically configured for
If the accumulated times of the iterative operation reach a preset threshold value N, determining the converged model as a target model obtained by searching; wherein N is an integer greater than or equal to 2;
if the cumulative number of iterative operations does not reach a preset threshold value N, determining an ith individual performance reward value corresponding to the converged model, and determining an ith delay reward value of the converged model based on the ith delay punishment strategy; updating the ith model code generator and/or the ith delay penalty code generator based on the ith individual performance reward value and the ith delay reward value to obtain the (i + 1) th model code generator and/or the (i + 1) th delay penalty code generator.
The obtaining module 31 is configured to generate an ith model code based on the ith model code generator, and to decode the ith model code based on the model search space to obtain the ith model to be trained.
The obtaining module 31 is configured to generate an ith delay penalty code based on the ith delay penalty code generator, and to decode the ith delay penalty code based on the delay penalty code search space to obtain the ith delay penalty strategy.
The model generating module 32 is specifically configured to, if i is equal to 1, superimpose the ith individual performance reward value and the ith delay reward value to obtain an ith reward value superimposing result; updating the ith model code generator based on the ith reward value superposition result to obtain an (i + 1) th model code generator; wherein the (i + 1) th model code generator is used for executing the (i + 1) th iteration operation.
The model generating module 32 is specifically configured to, if i is greater than 1, superimpose the ith individual performance reward value and the ith delay reward value to obtain an ith reward value superimposing result; updating the ith model code generator based on the ith reward value superposition result to obtain an (i + 1) th model code generator; updating the ith delay penalty code generator based on the ith reward value superposition result and the (i-1) th reward value superposition result to obtain an (i + 1) th delay penalty code generator;
the (i + 1) th model code generator and the (i + 1) th delay penalty code generator are used for executing (i + 1) th iteration operation.
As shown in fig. 4, the apparatus further includes:
the image recognition module 33 is used for acquiring a face image to be recognized; wherein, part of the face area in the face image to be recognized is in a shielding state; and obtaining a recognition result of the face image based on the target model and the face image to be recognized.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 5 is a block diagram of an electronic device according to the model generation method of the embodiment of the present application. The electronic device may be the aforementioned deployment device or proxy device. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 5, the electronic apparatus includes: one or more processors 801, a memory 802, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 5, one processor 801 is taken as an example.
The memory 802 is a non-transitory computer readable storage medium as provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the model generation methods provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the model generation method provided herein.
The memory 802, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the model generation methods in the embodiments of the present application. The processor 801 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 802, that is, implements the model generation method in the above-described method embodiments.
The memory 802 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device, and the like. Further, the memory 802 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 802 optionally includes memory located remotely from the processor 801, which may be connected to the electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the model generation method may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or other means, as exemplified by the bus connection in fig. 5.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function controls of the electronic device, such as a touch screen, keypad, mouse, track pad, touch pad, pointer stick, one or more mouse buttons, track ball, joystick, or other input device. The output devices 804 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of difficult management and weak service scalability found in traditional physical hosts and VPS services.
According to the technology of the present application, the delay reward value of the model is determined in combination with a delay penalty strategy over multiple iteration operations, and the model can then be searched in the next iteration operation in combination with the delay condition, so that by adding a delay limiting factor to the model search process, a target model that is highly accurate and runs fast enough can finally be obtained. In addition, the target model can meet both the accuracy requirement and the requirement of being small enough, making it more suitable for scenarios that require real-time processing.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A model generation method, comprising:
in the process of executing the ith iteration operation, an ith model to be trained is obtained based on an ith model code generator, and an ith delay penalty strategy is obtained based on an ith delay penalty code generator; wherein i is an integer greater than or equal to 1;
training the ith model to be trained to obtain a converged model, and determining whether the converged model is used as a target model obtained by searching;
determining whether to use the converged model as a target model obtained by searching comprises the following steps:
if the accumulated times of the iterative operation reach a preset threshold value N, determining the converged model as a target model obtained by searching; wherein N is an integer greater than or equal to 2;
if the cumulative number of iterative operations does not reach a preset threshold value N, determining an ith individual performance reward value corresponding to the converged model, and determining an ith delay reward value of the converged model based on the ith delay punishment strategy; updating the ith model code generator and/or the ith delay penalty code generator based on the ith individual performance reward value and the ith delay reward value to obtain the (i + 1) th model code generator and/or the (i + 1) th delay penalty code generator.
2. The method of claim 1, wherein the deriving an ith model to be trained based on an ith model code generator comprises:
generating an ith model code based on the ith model code generator;
and decoding the ith model code based on the model search space to obtain the ith model to be trained.
3. The method of claim 1, wherein the deriving an ith latency penalty policy based on an ith latency penalty code generator comprises:
generating an ith delay penalty code based on the ith delay penalty code generator;
and decoding the ith delay penalty code based on the delay penalty code search space to obtain an ith delay penalty strategy.
4. The method according to claim 1, wherein the updating the ith model code generator and/or the ith delay penalty code generator based on the ith individual performance reward value and the ith delay reward value to obtain an (i + 1) th model code generator and/or an (i + 1) th delay penalty code generator further comprises:
if i is equal to 1, overlapping the ith individual performance reward value and the ith delay reward value to obtain an ith reward value overlapping result;
updating the ith model code generator based on the ith reward value superposition result to obtain an (i + 1) th model code generator; wherein the (i + 1) th model code generator is used for executing the (i + 1) th iteration operation.
5. The method according to claim 1, wherein the updating the ith model code generator and/or the ith delay penalty code generator based on the ith individual performance reward value and the ith delay reward value to obtain an (i + 1) th model code generator and/or an (i + 1) th delay penalty code generator further comprises:
if i is larger than 1, superposing the ith individual performance reward value and the ith delay reward value to obtain an ith reward value superposition result;
updating the ith model code generator based on the ith reward value superposition result to obtain an (i + 1) th model code generator;
updating the ith delay penalty code generator based on the ith reward value superposition result and the (i-1) th reward value superposition result to obtain an (i + 1) th delay penalty code generator;
the (i + 1) th model code generator and the (i + 1) th delay penalty code generator are used for executing (i + 1) th iteration operation.
6. The method of any of claims 1-5, wherein the method further comprises:
acquiring a face image to be recognized; wherein, part of the face area in the face image to be recognized is in a shielding state;
and obtaining a recognition result of the face image based on the target model and the face image to be recognized.
7. A model generation apparatus comprising:
the acquisition module is used for acquiring an ith model to be trained based on an ith model code generator and acquiring an ith delay penalty strategy based on an ith delay penalty code generator in the process of executing the ith iteration operation; wherein i is an integer greater than or equal to 1;
the model generation module is used for training the ith model to be trained to obtain a converged model and determining whether the converged model is used as a target model obtained by searching;
wherein the model generation module is specifically used for
If the accumulated times of the iterative operation reach a preset threshold value N, determining the converged model as a target model obtained by searching; wherein N is an integer greater than or equal to 2;
if the cumulative number of iterative operations does not reach a preset threshold value N, determining an ith individual performance reward value corresponding to the converged model, and determining an ith delay reward value of the converged model based on the ith delay punishment strategy; updating the ith model code generator and/or the ith delay penalty code generator based on the ith individual performance reward value and the ith delay reward value to obtain the (i + 1) th model code generator and/or the (i + 1) th delay penalty code generator.
8. The apparatus of claim 7, wherein the obtaining module is configured to generate an ith model code based on the ith model code generator; and decoding the ith model code based on the model search space to obtain the ith model to be trained.
9. The apparatus of claim 7, wherein the obtaining module is configured to generate an ith delay penalty code based on the ith delay penalty code generator; and decoding the ith delay penalty code based on the delay penalty code search space to obtain an ith delay penalty strategy.
10. The device according to claim 7, wherein the model generating module is specifically configured to, if i is equal to 1, superimpose the ith individual performance bonus value and the ith delay bonus value to obtain an ith bonus value superimposition result; updating the ith model code generator based on the ith reward value superposition result to obtain an (i + 1) th model code generator; wherein the (i + 1) th model code generator is used for executing the (i + 1) th iteration operation.
11. The apparatus of claim 7, wherein the model generation module is specifically configured to: if i is greater than 1, superpose the ith individual performance reward value and the ith delay reward value to obtain an ith reward value superposition result; update the ith model code generator based on the ith reward value superposition result to obtain an (i + 1) th model code generator; and update the ith delay penalty code generator based on the ith reward value superposition result and the (i-1) th reward value superposition result to obtain an (i + 1) th delay penalty code generator;
wherein the (i + 1) th model code generator and the (i + 1) th delay penalty code generator are used for executing the (i + 1) th iteration operation.
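For illustration only and not part of the claims: a minimal Python sketch of the update step in claims 10 and 11. The additive superposition and the simple scalar update rule (the delay penalty code generator seeing the change between the ith and (i-1)th superposition results) are assumptions about one plausible realization, not the method as disclosed.

class ToyGenerator:
    """Toy code generator holding a single scalar parameter."""
    def __init__(self, lr: float = 0.1):
        self.param, self.lr = 0.0, lr

    def update(self, signal: float) -> None:
        self.param += self.lr * signal   # move the parameter along the reward signal

def update_step(i, model_gen, delay_gen, perf_reward, delay_reward, prev_superposed):
    superposed = perf_reward + delay_reward          # i-th reward value superposition result
    model_gen.update(superposed)                     # i-th -> (i+1)-th model code generator
    if i > 1:
        # The delay penalty code generator is updated from both the i-th and the
        # (i-1)-th superposition results; here, via their difference.
        delay_gen.update(superposed - prev_superposed)
    return superposed                                # kept as prev_superposed for iteration i+1

model_gen, delay_gen = ToyGenerator(), ToyGenerator()
r1 = update_step(1, model_gen, delay_gen, perf_reward=0.80, delay_reward=-0.10, prev_superposed=None)
r2 = update_step(2, model_gen, delay_gen, perf_reward=0.85, delay_reward=-0.05, prev_superposed=r1)
print(model_gen.param, delay_gen.param)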
12. The apparatus of any of claims 7-11, wherein the apparatus further comprises:
an image recognition module, configured to acquire a face image to be recognized, wherein a part of the face area in the face image to be recognized is occluded; and obtain a recognition result of the face image based on the target model and the face image to be recognized.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
CN202010598998.5A 2020-06-28 2020-06-28 Model generation method and device, electronic equipment and storage medium Pending CN111767832A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010598998.5A CN111767832A (en) 2020-06-28 2020-06-28 Model generation method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010598998.5A CN111767832A (en) 2020-06-28 2020-06-28 Model generation method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111767832A true CN111767832A (en) 2020-10-13

Family

ID=72722423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010598998.5A Pending CN111767832A (en) 2020-06-28 2020-06-28 Model generation method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111767832A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018188453A1 (en) * 2017-04-11 2018-10-18 腾讯科技(深圳)有限公司 Method for determining human face area, storage medium, and computer device
CN107609481A (en) * 2017-08-14 2018-01-19 百度在线网络技术(北京)有限公司 The method, apparatus and computer-readable storage medium of training data are generated for recognition of face
US20190171908A1 (en) * 2017-12-01 2019-06-06 The University Of Chicago Image Transformation with a Hybrid Autoencoder and Generative Adversarial Network Machine Learning Architecture
CN108427939A (en) * 2018-03-30 2018-08-21 百度在线网络技术(北京)有限公司 Model generating method and device
CN108960232A (en) * 2018-06-08 2018-12-07 Oppo广东移动通信有限公司 Model training method, device, electronic equipment and computer readable storage medium
CN110766142A (en) * 2019-10-30 2020-02-07 北京百度网讯科技有限公司 Model generation method and device
CN110807515A (en) * 2019-10-30 2020-02-18 北京百度网讯科技有限公司 Model generation method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507197A (en) * 2020-12-18 2021-03-16 北京百度网讯科技有限公司 Model searching method, model searching apparatus, electronic device, storage medium, and program product
CN112580723A (en) * 2020-12-18 2021-03-30 北京百度网讯科技有限公司 Multi-model fusion method and device, electronic equipment and storage medium
CN112580723B (en) * 2020-12-18 2023-09-22 北京百度网讯科技有限公司 Multi-model fusion method, device, electronic equipment and storage medium
CN112507197B (en) * 2020-12-18 2024-01-19 北京百度网讯科技有限公司 Model searching method, device, electronic equipment, storage medium and program product

Similar Documents

Publication Publication Date Title
CN111709248B (en) Training method and device for text generation model and electronic equipment
CN111539223B (en) Language model training method and device, electronic equipment and readable storage medium
CN111737994B (en) Method, device, equipment and storage medium for obtaining word vector based on language model
CN111737996B (en) Method, device, equipment and storage medium for obtaining word vector based on language model
CN111539227B (en) Method, apparatus, device and computer storage medium for training semantic representation model
CN111753761B (en) Model generation method, device, electronic equipment and storage medium
CN111275190B (en) Compression method and device of neural network model, image processing method and processor
CN111737995A (en) Method, device, equipment and medium for training language model based on multiple word vectors
CN111767833A (en) Model generation method and device, electronic equipment and storage medium
CN110807331B (en) Polyphone pronunciation prediction method and device and electronic equipment
CN111079945B (en) End-to-end model training method and device
CN112163405A (en) Question generation method and device
CN110717340B (en) Recommendation method, recommendation device, electronic equipment and storage medium
CN112580822B (en) Countermeasure training method device for machine learning model, electronic equipment and medium
CN112001190A (en) Training method, device and equipment of natural language processing model and storage medium
CN110782871B (en) Rhythm pause prediction method and device and electronic equipment
CN111950291A (en) Semantic representation model generation method and device, electronic equipment and storage medium
CN111259671A (en) Semantic description processing method, device and equipment for text entity
CN111144108A (en) Emotion tendency analysis model modeling method and device and electronic equipment
CN111144507A (en) Emotion analysis model pre-training method and device and electronic equipment
CN111680517A (en) Method, apparatus, device and storage medium for training a model
CN111539224A (en) Pruning method and device of semantic understanding model, electronic equipment and storage medium
CN111611808B (en) Method and apparatus for generating natural language model
CN111666387A (en) Dialog generation method and device, electronic equipment and storage medium
CN111767832A (en) Model generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination