CN116958725A - Model training method and device based on mask image and storage medium - Google Patents


Info

Publication number
CN116958725A
CN116958725A
Authority
CN
China
Prior art keywords
mask
image
initial
training
parameters
Prior art date
Legal status
Pending
Application number
CN202310233464.6A
Other languages
Chinese (zh)
Inventor
李昱希
张博深
涂远鹏
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202310233464.6A priority Critical patent/CN116958725A/en
Publication of CN116958725A publication Critical patent/CN116958725A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a mask-image-based model training method, apparatus and storage medium. The method includes: occluding a first image region of a training image with the mask region of an initial image mask to obtain a first mask image; performing model training on a first recognition model with the first mask image so as to update the model parameters of the first recognition model and obtain a second recognition model; adjusting the parameter values of the image mask parameters based on the loss of the second recognition model on the first mask image to obtain a target image mask, and occluding a second image region of the training image with the mask region of the target image mask to obtain a second mask image, where the loss of the second recognition model on the first mask image is lower than its loss on the second mask image; and performing model training on the second recognition model with the second mask image so as to update the model parameters of the second recognition model and obtain a third recognition model.

Description

Model training method and device based on mask image and storage medium
Technical Field
The present application relates to the field of computers, and in particular, to a model training method, apparatus and storage medium based on mask images.
Background
Because a deep neural network has numerous model parameters, it tends to overfit during training: it performs well on the training set but poorly in real-world tests, lacking generalization capability. To address this, the training data in the training set can be augmented by data enhancement, thereby changing the distribution of the data in the training set.
When the training data are images, a commonly adopted enhancement technique is mask-based data enhancement: a mask of a specific shape is manually placed on the training image and the image content inside the mask region is altered, so that the model sees more varied images during training and thereby gains stronger generalization capability.
However, in such mask-based training, mask selection and filling are largely random, so the masked and filled portions may appear in any region of the original image, with no connection to the goals of reducing the risk of overfitting and improving the generalization of the model. The mask-based training methods of the related art therefore suffer from weak model generalization caused by the randomness of mask selection.
Disclosure of Invention
The embodiments of the present application provide a mask-image-based model training method, apparatus and storage medium, which at least solve the problem in the related art that the randomness of mask selection leads to weak model generalization.
According to an aspect of an embodiment of the present application, there is provided a model training method based on mask images, including: occluding a first image region of a training image with the mask region of an initial image mask to obtain a first mask image; performing model training on a first recognition model with the first mask image so as to update the model parameters of the first recognition model and obtain a second recognition model; adjusting the parameter values of image mask parameters based on the loss of the second recognition model on the first mask image to obtain a target image mask, and occluding a second image region of the training image with the mask region of the target image mask to obtain a second mask image, where the image mask parameters represent the mask region of the image mask corresponding to the training image, and the loss of the second recognition model on the first mask image is lower than its loss on the second mask image; and performing model training on the second recognition model with the second mask image so as to update the model parameters of the second recognition model and obtain a third recognition model.
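The four claimed steps form an alternating loop: minimisation of the loss over the model parameters, then maximisation of the same loss over the mask parameters. The following is a minimal, self-contained Python sketch of that loop, assuming a toy one-weight model, a 1-D "image", a per-pixel sigmoid mask, and numerical gradients; all names and numeric choices are illustrative and do not come from the patent.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_mask(theta):
    # one mask value in (0, 1) per pixel, parameterised by theta
    return [sigmoid(t) for t in theta]

def occlude(image, mask):
    # occlusion: a mask value near 1 blanks the pixel, near 0 keeps it
    return [p * (1.0 - m) for p, m in zip(image, mask)]

def loss(w, image, label):
    # toy "recognition model": a single weight, squared error
    return (w * sum(image) - label) ** 2

def num_grad(f, x, eps=1e-5):
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

image, label = [1.0, 2.0, 3.0], 3.0
w, theta = 0.1, [-2.0, -2.0, -2.0]

# steps 1-2: occlude with the initial mask, train the model (minimise loss)
m1 = occlude(image, soft_mask(theta))
for _ in range(5):
    w -= 0.01 * num_grad(lambda w_: loss(w_, m1, label), w)

# step 3: move the mask parameters in the direction that *increases*
# the trained model's loss (gradient ascent on the mask parameters)
for i in range(len(theta)):
    g = num_grad(lambda t: loss(
        w, occlude(image, soft_mask(theta[:i] + [t] + theta[i + 1:])), label),
        theta[i])
    theta[i] += 1.0 * g
m2 = occlude(image, soft_mask(theta))
assert loss(w, m1, label) <= loss(w, m2, label)  # the new sample is harder

# step 4: continue training the model on the harder masked image
for _ in range(50):
    w -= 0.01 * num_grad(lambda w_: loss(w_, m2, label), w)
```

In a real implementation the two gradients would come from automatic differentiation over a neural network, but the alternation itself — descend on the model, ascend on the mask, then descend again — is the structure the claim describes.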
According to another aspect of the embodiments of the present application, there is also provided a model training apparatus based on mask images, including: a first processing unit, configured to occlude a first image region of a training image with the mask region of an initial image mask to obtain a first mask image; a first training unit, configured to perform model training on a first recognition model with the first mask image so as to update the model parameters of the first recognition model and obtain a second recognition model; an adjusting unit, configured to adjust the parameter values of image mask parameters based on the loss of the second recognition model on the first mask image to obtain a target image mask, where the image mask parameters represent the mask region of the image mask corresponding to the training image; a second processing unit, configured to occlude a second image region of the training image with the mask region of the target image mask to obtain a second mask image, where the loss of the second recognition model on the first mask image is lower than its loss on the second mask image; and a second training unit, configured to perform model training on the second recognition model with the second mask image so as to update the model parameters of the second recognition model and obtain a third recognition model.
As an alternative, the adjusting unit includes: a first input module, configured to input the first mask image into the second recognition model to obtain a first recognition result output by the second recognition model; a first determining module, configured to determine a first function value of a preset loss function for the first recognition result and a preset recognition result, where the preset recognition result is the labeled recognition result corresponding to the training image, and the first function value represents the error between the first recognition result and the preset recognition result; and a first updating module, configured to update the parameter values of the image mask parameters in the ascending direction of a first parameter gradient based on the first function value, to obtain the target image mask, where the input parameters of the preset loss function include the image mask parameters, and the first parameter gradient is the gradient of the preset loss function with respect to the image mask parameters.
As an alternative, the apparatus further comprises: a determining unit, configured to determine, as the first parameter gradient, the product of the derivative of the preset loss function with respect to the image mask corresponding to the training image and the derivative of that image mask with respect to the image mask parameters.
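The paragraph above is the chain rule: the gradient of the loss with respect to the mask parameters is (dLoss/dMask) x (dMask/dTheta). A scalar Python sketch, with an illustrative loss and sigmoid-parameterised mask (neither taken from the patent), verified against a numerical derivative:

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

# toy scalar case: mask m = sigmoid(theta), loss = (1 - m)**2
theta = 0.3
m = sigmoid(theta)

dloss_dmask = -2.0 * (1.0 - m)      # derivative of the loss w.r.t. the mask
dmask_dtheta = m * (1.0 - m)        # derivative of the mask w.r.t. theta
chain = dloss_dmask * dmask_dtheta  # the "first parameter gradient"

# numerical check of the chain-rule product
def loss(t):
    return (1.0 - sigmoid(t)) ** 2

eps = 1e-6
num = (loss(theta + eps) - loss(theta - eps)) / (2.0 * eps)
assert abs(chain - num) < 1e-6
```

This decomposition is what makes the mask "drift" trainable at all: the mask itself must be a differentiable function of its parameters, which is why the following paragraphs insist on a differentiable activation.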
As an alternative, the apparatus further comprises: a sampling unit, configured to randomly sample the image mask parameters to obtain their initial parameter values before the first image region of the training image is occluded with the mask region of the initial image mask to obtain the first mask image; and an execution unit, configured to perform an image mask generation operation using a differentiable activation function based on the initial parameter values of the image mask parameters, to obtain the initial image mask, where the differentiable activation function is a monotonically increasing function.
As an alternative, the sampling unit includes: the first sampling module is used for randomly sampling the regional shape parameters in the image mask parameters to obtain initial shape parameter values, wherein the regional shape parameters are used for representing the regional shape of the mask region of the image mask corresponding to the training image; and the second sampling module is used for randomly sampling the regional position parameters in the image mask parameters to obtain initial position parameter values, wherein the regional position parameters are used for representing the regional positions of mask regions of the image mask corresponding to the training images.
As an alternative, the first sampling module includes: a first sampling submodule, configured to randomly sample a region size parameter among the image mask parameters to obtain an initial size parameter value, where the region size parameter represents the region size of the mask region of the image mask corresponding to the training image; and a second sampling submodule, configured to randomly sample a rotation angle parameter among the image mask parameters to obtain an initial rotation angle value, where the rotation angle parameter represents the angle by which that mask region is rotated in a preset direction about its region center point.
As an alternative, the second sampling module includes: and the third sampling sub-module is used for randomly sampling the central point position parameter in the image mask parameters to obtain the initial position parameter value, wherein the central point position parameter is used for representing the position of the regional central point of the mask region of the image mask corresponding to the training image.
As an alternative, the mask region indicated by the initial parameter values of the image mask parameters is an initial mask region, and the initial parameter values include: an initial region length value for the region length of the initial mask region, an initial region width value for its region width, an initial center point abscissa and initial center point ordinate for the position of its region center point, and an initial rotation angle value representing its rotation angle, where the rotation angle of the initial mask region is the angle rotated in a preset direction about the region center point of the initial mask region. The execution unit includes: a first execution module, configured to obtain the initial image mask by performing the following operation for each pixel position in the initial image mask as the current pixel position, the current pixel position comprising a current pixel abscissa and a current pixel ordinate: determining, as the pixel value at the current pixel position, the product of the differentiable activation function evaluated at a first reference value and the differentiable activation function evaluated at a second reference value. The first reference value is the product of a first coordinate difference and the cosine of the initial rotation angle value, plus the product of a second coordinate difference and the sine of the initial rotation angle value, minus half of the initial region length value; the second reference value is the product of the second coordinate difference and the cosine of the initial rotation angle value, minus the product of the first coordinate difference and the sine of the initial rotation angle value, minus half of the initial region width value. The first coordinate difference is the difference between the current pixel abscissa and the initial center point abscissa, and the second coordinate difference is the difference between the initial center point ordinate and the current pixel ordinate.
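Read literally, the paragraph above computes each mask pixel as the product of two activation values taken at rotated, center-relative coordinates offset by the half-extents of the region. The Python transcription below follows that literal reading; the function name, the choice of sigmoid as the activation, and the absence of any sharpness or scaling factor are my interpretation of the translated claim, not the patent's exact formula.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mask_pixel(x, y, cx, cy, length, width, alpha):
    # first / second coordinate differences, as defined in the text
    dx = x - cx   # current pixel abscissa minus initial center abscissa
    dy = cy - y   # initial center ordinate minus current pixel ordinate
    # first / second reference values: rotated coordinates minus half-extents
    r1 = dx * math.cos(alpha) + dy * math.sin(alpha) - length / 2.0
    r2 = dy * math.cos(alpha) - dx * math.sin(alpha) - width / 2.0
    # pixel value: product of the activation at both reference values
    return sigmoid(r1) * sigmoid(r2)
```

Because the activation is differentiable and monotonically increasing, every pixel of the generated mask is differentiable with respect to the five parameters (center abscissa, center ordinate, length, width, rotation angle), which is what permits the gradient-based mask update described earlier.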
As an alternative, the execution unit includes: and the second execution module is used for executing image mask generation operation according to the resolution of the training image by using the differentiable activation function based on the initial parameter value of the image mask parameter to obtain the initial image mask, wherein the resolution of the initial image mask is equal to the resolution of the training image.
As an alternative, the execution unit includes one of: the third execution module is used for executing image mask generation operation by using a logistic function based on the initial parameter value of the image mask parameter to obtain the initial image mask; and a fourth execution module, configured to execute an image mask generating operation by using a preset hyperbolic tangent function based on an initial parameter value of the image mask parameter, to obtain the initial image mask, where the preset hyperbolic tangent function is a hyperbolic tangent function with offset and scaling terms.
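The two candidate activations named above are in fact the same function once the offset and scaling terms are applied: sigma(z) = (tanh(z/2) + 1) / 2. A quick numerical check (illustrative code, not from the patent):

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def shifted_tanh(z):
    # hyperbolic tangent with offset and scaling so the output lies in (0, 1)
    return 0.5 * (math.tanh(z / 2.0) + 1.0)

# the logistic function and the offset-and-scaled tanh coincide everywhere
for z in (-3.0, -0.5, 0.0, 2.0):
    assert abs(logistic(z) - shifted_tanh(z)) < 1e-12
```

Either choice therefore yields the same monotonically increasing, differentiable mask values in (0, 1); the alternative modules differ only in how the function is written.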
As an alternative, the first training unit includes: a second input module, configured to input the first mask image into the first recognition model to obtain a second recognition result output by the first recognition model; a second determining module, configured to determine a second function value of the preset loss function for the second recognition result and the preset recognition result, where the preset recognition result is the labeled recognition result corresponding to the training image, and the second function value represents the error between the second recognition result and the preset recognition result; and a second updating module, configured to update the model parameters of the first recognition model in the descending direction of a second parameter gradient based on the second function value, to obtain the second recognition model, where the input parameters of the preset loss function include the model parameters of the first recognition model, and the second parameter gradient is the gradient of the preset loss function with respect to the model parameters of the first recognition model.
According to a further aspect of embodiments of the present application, there is also provided a computer readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the above mask image based model training method when run.
According to yet another aspect of embodiments of the present application, there is provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the model training method based on mask images as above.
According to yet another aspect of the embodiments of the present application, there is also provided an electronic device, including a memory, and a processor, where the memory stores a computer program, and the processor is configured to execute the mask image-based model training method described above by using the computer program.
In the embodiments of the present application, the image mask is guided to drift over the training image based on model loss. First, a corresponding image region of the training image is occluded with the mask region of an initial image mask to obtain a first mask image, and a first recognition model is trained with the first mask image to obtain a second recognition model; this training updates the model parameters of the recognition model with the goal of minimizing the loss. Then, the image mask parameters of the initial image mask are updated with the goal of increasing the loss, yielding a target image mask; a corresponding image region of the training image is occluded with the mask region of the target image mask to obtain a second mask image, and the second recognition model is trained with the second mask image to obtain a third recognition model; this training again updates the model parameters of the recognition model so as to minimize the loss.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic illustration of an application environment of an alternative mask image-based model training method according to an embodiment of the present application;
FIG. 2 is a flow chart of an alternative mask image based model training method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative mask-based data enhancement method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another alternative mask-based data enhancement method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of yet another alternative mask-based data enhancement method according to an embodiment of the present application;
FIG. 6 is a schematic diagram of yet another alternative mask-based data enhancement method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of yet another alternative mask-based data enhancement method according to an embodiment of the present application;
FIG. 8 is a schematic diagram of yet another alternative mask-based data enhancement method according to an embodiment of the present application;
FIG. 9 is a schematic diagram of yet another alternative mask-based data enhancement method according to an embodiment of the present application;
FIG. 10 is a flow chart of another alternative mask image based model training method according to an embodiment of the present application;
FIG. 11 is a flow chart of yet another alternative mask image based model training method in accordance with an embodiment of the present application;
FIG. 12 is a schematic diagram of yet another alternative mask-based data enhancement method according to an embodiment of the present application;
FIG. 13 is a block diagram of an alternative mask image based model training apparatus in accordance with an embodiment of the present application;
FIG. 14 is a schematic diagram of an alternative electronic device in accordance with an embodiment of the application;
FIG. 15 is a block diagram of the architecture of a computer system of an alternative electronic device in accordance with an embodiment of the present application.
Detailed Description
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of the embodiments of the present application, a model training method based on mask images is provided. As an optional implementation, the method may be, but is not limited to being, applied to the environment shown in FIG. 1. The environment includes, but is not limited to, a model training device 102, a network 110 and a server 112, where the model training device 102 may include, but is not limited to, a display 108, a processor 106 and a memory 104. The specific process includes the following steps:
In step S102, the server 112 transmits the initial recognition model to be trained and the training image set for model training of the initial recognition model to the model training apparatus 102 via the network 110.
The recognition model to be trained may be trained on the server 112 or the model training device 102 (e.g., the terminal device of a developer or other relevant person). For a scenario in which model training is performed on model training device 102, the initial recognition model to be trained and a labeled training image set for model training the initial recognition model may be stored in database 114 of server 112. Based on the data pull request of model training device 102, processing engine 116 of server 112 may send the initial recognition model stored in database 114 of server 112, along with the training image set, to model training device 102 over network 110.
In step S104, the model training device 102 trains the initial recognition model by using the training images in the training image set, and obtains an intermediate recognition model.
The model training device 102 may perform multiple rounds of model training on the initial recognition model using the training images in the training image set until an iteration end condition is met. The recognition model obtained after this training can be used as the final recognition model; alternatively, to improve the generalization of the model, data enhancement (data augmentation) can be applied to the training images to change the distribution of the data in the training image set, and the intermediate recognition model can then be trained with the enhanced training images. Here, the data enhancement technique used may be a region-mask-based enhancement technique.
In step S106, the model training device 102 performs data enhancement on the training image based on the region mask to obtain a mask image, and performs model training on the intermediate recognition model by using the mask image to obtain the target recognition model.
Model training device 102 may perform data enhancement on the training image based on the region mask, resulting in an augmented training image (i.e., mask image). After obtaining the intermediate recognition model, the model training device 102 may perform model training on the intermediate recognition model using the augmented training image to obtain a target recognition model.
In step S108, the model training apparatus 102 transmits the model parameters of the object recognition model to the server 112 via the network 110.
After obtaining the target recognition model, model training device 102 may send model parameters of the target recognition model to server 112 via network 110. The processing engine 116 of the server 112 may save the received model parameters of the object recognition model to the database 114 of the server 112.
Optionally, the model training device 102 includes, but is not limited to, at least one of: a desktop computer, a workstation, or a virtual reality device such as an AR (Augmented Reality) or VR (Virtual Reality) device. Portable devices, smart home appliances, vehicle-mounted devices and the like that have ample computing resources may also perform the training of the recognition model, either as model training devices or as auxiliary devices of a model training device (for example, devices for transmitting instructions, displaying augmented training images, or displaying recognition results). Such portable devices may include, but are not limited to, at least one of: a mobile phone (e.g., an Android phone or iOS phone), a notebook computer, a tablet computer, a palmtop computer, a MID (Mobile Internet Device), a PAD, etc. The network 110 may include, but is not limited to, a wired network or a wireless network, where the wired network includes local area networks, metropolitan area networks and wide area networks, and the wireless network includes Bluetooth, WIFI (Wireless Fidelity) and other networks enabling wireless communication. The server 112 may be a single server, a server cluster including a plurality of servers, or a cloud server. The above is merely an example and is not limited in any way in this embodiment.
Alternatively, the model training method based on the mask image may be performed by the model training device 102 or the server 112 alone, or may be performed by the model training device 102 and the server 112 cooperatively, or may be performed by a processing device other than the model training device 102 and the server 112, so long as the obtained model parameters may be applied to the recognition model.
As an alternative implementation manner, taking the model training device 102 as an example to execute the mask image based model training method in this embodiment, fig. 2 is a schematic flow chart of an alternative mask image based model training method according to an embodiment of the present application, as shown in fig. 2, the flow chart of the mask image based model training method may include the following steps:
In step S202, the mask region of the initial image mask is used to occlude a first image region of the training image to obtain a first mask image.
The mask-image-based model training method in this embodiment may be applied to a process of training a neural network model using training data, where the training data may be training images. A mask image is an image obtained by performing region masking on an image to be masked (that is, an image obtained by occluding the image to be masked with an image mask). The neural network model may be a recognition model (for example, a classification model) or another type of neural network model; in this embodiment, training a recognition model using training images is taken as an example for description.
A neural network model is a model that performs a specified task using a neural network. Because a deep neural network has numerous parameters, it easily faces the risk of overfitting during training, so that the trained neural network model performs well on the training set but poorly in actual scene tests, lacking generalization capability. Therefore, data enhancement techniques can be applied to the training of the neural network to improve the generalization capability of the trained model and achieve a better test effect. Here, data enhancement means that the data in the training set is amplified by specific means, so that the distribution of the data in the training set is changed. Data enhancement achieves stronger generalization capability by letting the model see more images during training; more deeply, data enhancement can be regarded as a regularization of the model output, preventing the neural network from focusing on outliers that deviate from the data distribution during training, thereby achieving a better generalization effect.
As a commonly used data enhancement technique, the region mask is not only simple and intuitive to implement, but also has a very obvious effect on the training and regularization of neural networks. Currently, region masks are typically applied by manually placing a mask of a particular shape (e.g., a rectangular area) over the image and then modifying the image content within the mask region (e.g., randomly filling it with noise or with a portion of another image) to create a new training image.
Here, region masking means that a specific region (i.e., the mask region) of the original image is occluded using a black-and-white binary image (i.e., a binary mask): the portion of the original image corresponding to the black portion of the binary image becomes black, and the other portions are unchanged, thereby extracting the portion of interest in the original image. Illustratively, data enhancement based on the region mask can be expressed using equation (1):

Ĩ = I ⊙ M + Δ ⊙ (1 − M)    (1)

where Ĩ is the augmented image (i.e., the mask image), I is the original image (i.e., the training image), M is a binarized mask with the same resolution as the original image (each pixel of the image mask takes only the value 0 or 1), ⊙ denotes element-wise multiplication, and Δ is the filled content; by default the augmented image Ĩ shares the same label as I. The region mask typically selects a rectangle as the mask region, i.e., one rectangular region in M is 0 and the remaining region is 1. Different region mask modes differ in the choice of the filled content Δ. For example, CutOut fills the mask region with Gaussian random noise; CutMix crops a region of the same area as the mask from another image in the training set as the fill; ResizeMix (resized mixing) scales the entire content of another image in the training set to the same area and aspect ratio as the mask as the fill.
As shown in fig. 3, for the original image (a), a rectangular mask is first placed on the image, and then the mask region is randomly filled with noise to obtain mask image (b), or randomly filled with part of another image to obtain mask image (c).
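As a minimal sketch (not the patented method itself), the region-mask augmentation of equation (1) with a rectangular mask region can be implemented as follows; the function name, array shapes, and the two fill strategies (CutOut-style noise, CutMix-style second image) are illustrative assumptions:

```python
import numpy as np

def region_mask_augment(image, top, left, height, width, fill):
    """Equation (1): I_aug = I * M + fill * (1 - M), with M = 0 inside the rectangle."""
    mask = np.ones_like(image)
    mask[top:top + height, left:left + width] = 0  # mask region of M is 0
    return image * mask + fill * (1 - mask)

rng = np.random.default_rng(0)
img = rng.random((32, 32))            # stand-in training image
noise = rng.normal(size=(32, 32))     # CutOut-style fill: Gaussian random noise
other = rng.random((32, 32))          # CutMix-style fill: another training image

cutout = region_mask_augment(img, 8, 8, 10, 10, noise)
cutmix = region_mask_augment(img, 8, 8, 10, 10, other)
```

Outside the rectangle the augmented image equals the original; inside, it equals the fill, which matches the convention above that the mask region of M is 0.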
However, the above data enhancement method based on the region mask relies on randomness to augment the training data: the mask selection and content filling are random, the masked and filled parts may appear in any region of the original image, and the process lacks guidance; meanwhile, the augmentation lacks a coupling relation with the final training objective of enhancing the generalization capability of the trained model. Here, lack of guidance means that the mask region is random, and what it occludes may be a non-critical region of the original image (e.g., a region irrelevant or of low relevance to the recognition task of the recognition model), so the efficiency of using mask images in model training to improve the model generalization capability is reduced.
To at least partially solve the above problem, this embodiment improves the region-mask-based data enhancement technique by optimizing the mask region on the basis of a randomly selected mask, so that the region covered by the optimized mask is more challenging to recognize (for example, covering the head of the animal target shown in fig. 3, or covering a distinctive skin texture). Adopting the idea of adversarial learning, the geometric parameters of the mask are updated with the goal of increasing the network loss (which may be maximizing the network loss): after a mask image is generated using the image mask and the recognition model is trained using the generated mask image, the image mask is updated with the objective of increasing the network loss, so that the image mask can more effectively occlude critical regions in the training image. This realizes guided and targeted data enhancement, improves the model generalization capability, and reduces the risk of overfitting.
When training the recognition model using the training image, the mask region of an image mask may be used to occlude an image region of the training image to obtain a mask image. For a training image, the image mask used to perform the region occlusion may be an initial image mask: the model training device may use the mask region of the initial image mask to occlude the corresponding image region in the training image (i.e., the first image region) to obtain the first mask image, and the resolution of the initial image mask may be the same as that of the training image. Here, the image mask may be a binary mask (binary image) used to perform region occlusion on the training image, and the mask region in the image mask may be configured through image mask parameters. When the initial image mask is used to occlude the first image region of the training image, the mask region of the initial image mask may be randomly selected, and the shape of the mask region of the initial image mask may be preset, for example, a rectangular region or another shape.
For example, as shown in fig. 4, the mask region in the image mask 1 is a rectangular region, and the mask region of the image mask 1 is used to block the image region corresponding to the training image, so that the mask image 1 can be obtained.
In step S204, the first recognition model is trained using the first mask image to update the model parameters of the first recognition model, thereby obtaining a second recognition model.
After the first mask image is obtained, it may be applied to the model training process of the recognition model: the first recognition model is trained using the first mask image to update the model parameters of the first recognition model. The first recognition model may be a recognition model at any training stage between the initial recognition model and the target recognition model, for example, the aforementioned intermediate recognition model, a recognition model obtained after the intermediate recognition model has been trained using one or more mask images, or a recognition model at another stage.
Here, the training process of a deep neural network model is a process of optimizing a loss function: the loss function determines the performance of the recognition model by comparing the deviation between the predicted output and the expected output of the recognition model, thereby finding the direction of optimization. If the deviation between the two is larger, the loss value is larger; if the deviation is smaller, the loss value is smaller. To achieve the goal of minimizing the training loss, the model parameters of the recognition model can be adjusted to reduce the deviation between the predicted output and the expected output of the recognition model. Taking a deep neural network classifier as an example, the labeled original training dataset may be represented as the set D_t = {(x, c)}, where x ∈ R^(H×W) is a two-dimensional data array with height H and width W; each training sample corresponds to a label c ∈ {0,1}^K, a K-dimensional one-hot vector (one dimension is 1, the rest are 0), where K is the number of categories in the training set. A training sample (training image) is input into the deep neural network classifier f(·;φ): R^(H×W) → R^K to be trained, which maps the training sample into a K-dimensional score vector, where each entry represents the discrimination score of the corresponding category, so that the deep neural network classifier can be trained using the cross entropy loss:

L_CE(φ) = −Σ_k c_k · log softmax(f(x; φ))_k

where φ denotes the weights of the neural network (i.e., the model parameters), and the training process is the process of minimizing the cross entropy loss.
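As an illustrative sketch (not the patented training procedure), a linear softmax classifier trained by minimizing the cross entropy loss with plain gradient descent might look as follows; the shapes, learning rate, and iteration count are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W_img, K = 8, 8, 10                    # image size and number of classes (illustrative)
X = rng.random((4, H * W_img))            # batch of flattened stand-in training images
labels = np.array([0, 1, 2, 3])           # class indices
C = np.eye(K)[labels]                     # one-hot labels c in {0,1}^K

Wt = np.zeros((H * W_img, K))             # weights phi of a linear classifier f(x; phi) = x @ Wt

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(Wt):
    p = softmax(X @ Wt)
    return -np.mean(np.sum(C * np.log(p), axis=1))

lr = 0.5
loss_before = cross_entropy(Wt)
for _ in range(50):                       # minimize the cross entropy loss
    p = softmax(X @ Wt)
    grad = X.T @ (p - C) / len(X)         # dL/dWt for softmax cross entropy
    Wt -= lr * grad                       # step opposite the gradient
loss_after = cross_entropy(Wt)
```

With zero-initialized weights the softmax is uniform, so the initial loss is log K; gradient descent then drives it down.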
In step S206, the parameter values of the image mask parameters are adjusted based on the loss of the second recognition model corresponding to the first mask image to obtain a target image mask, and the mask region of the target image mask is used to occlude a second image region of the training image to obtain a second mask image.
To avoid the problem that model training based on mask images contributes little to improving model generalization capability because the mask region occludes a non-critical region of the training image, in this embodiment the mask may be guided to drift over the image by means of re-parameterization and adversarial optimization: the parameter values of the image mask parameters are adjusted based on the loss of the second recognition model corresponding to the first mask image so as to update the image mask and obtain the target image mask, and the mask region of the target image mask is used to occlude the corresponding image region of the training image (i.e., the second image region) to obtain the second mask image. The region position of the second image region in the second mask image drifts relative to the region position of the first image region in the first mask image, and the goal of updating the image mask parameters is to increase the loss, that is, the loss of the second recognition model corresponding to the first mask image is lower than the loss of the second recognition model corresponding to the second mask image.
It should be noted that the image mask parameters are adjusted in order to construct data of greater training difficulty to assist in training the neural network. For example, the loss of the second recognition model corresponding to the second mask image is higher than its loss corresponding to the first mask image; in this case, the second mask image is more difficult to recognize, and training the recognition model with mask images of greater recognition difficulty can improve the generalization capability of the recognition model.
Here, the image mask parameters are used to represent the mask region of the image mask corresponding to the training image. There may be one or more parameter types among the image mask parameters, for example, a region shape parameter, a region position parameter, and the like. When the image mask parameters are adjusted, an adjustment of the mask parameters may be attempted according to a preset adjustment policy, and whether to accept the adjustment is determined based on the loss of the second recognition model corresponding to the adjusted mask image and its loss corresponding to the first mask image. There may be one or more types of preset adjustment policies, such as adjusting the parameter values of the image mask parameters along a specified direction or by a specified distance, or other adjustment policies (for example, selecting an adjusted parameter value from a set of candidate parameter values), which is not limited in this embodiment.
For example, as shown in fig. 5, after the parameter values of the mask parameters of the image mask 1 are adjusted, an image mask 2 is obtained, and a mask image corresponding to the image mask 2 is the mask image 2. Because the mask region in the mask image 2 shields the head of the animal object, which is representative, the loss of the identification model corresponding to the mask image 2 is higher than the loss of the identification model corresponding to the mask image 1.
Step S208, performing model training on the second recognition model by using the second mask image to update model parameters of the second recognition model, so as to obtain a third recognition model.
After the second mask image is obtained, the second mask image may be applied to a model training process that identifies the model: the second recognition model may be model trained using the second mask image to update model parameters of the second recognition model. The recognition model obtained after the model parameter update is a third recognition model, and the object of model training on the second recognition model is to minimize the loss, that is, the loss of the third recognition model corresponding to the second mask image is lower than the loss of the second recognition model corresponding to the second mask image.
Here, the training image set may include a set of training images, and for each training image, model training may be performed on the recognition model by using a model training manner based on the area mask, which has already been described and will not be described herein.
Optionally, model training of the recognition model using the training images and the augmented training images may be performed sequentially, that is, after the initial recognition model is trained using the training images in the training image set to obtain an intermediate recognition model, the intermediate recognition model is trained using the augmented training images to obtain the target recognition model. Alternatively, the training images and the augmented training images may be used alternately for model training to obtain the target recognition model, where the training images and the augmented training images are input in turn; or the training image set may be updated by mixing the training images and the augmented training images, and the initial recognition model is trained using the training images in the updated training image set to obtain the target recognition model, where the input order of the training images and the augmented training images is not fixed.
For example, as shown in fig. 6, when training the initial recognition model by using the training image set, the initial recognition model may be model-trained by sequentially using each training image (training image 1, training image 2 … …) in the training image set to obtain an intermediate recognition model, then model-trained by sequentially using the mask enhanced image (mask image 1, mask image 2 … …) obtained by data augmentation to obtain an updated intermediate recognition model, and finally model-trained by sequentially using the mask enhanced image (mask image 1', mask image 2' … …) obtained by re-data augmentation to obtain the target recognition model.
For another example, as shown in fig. 7, when training the initial recognition model by using the training image set, the initial recognition model may be model-trained by sequentially using each training image in the training image set to obtain an intermediate recognition model, and then model-trained by alternately using the mask enhanced image obtained by data augmentation and the mask enhanced image obtained by data augmentation again to obtain the target recognition model.
For another example, as shown in fig. 8, when training the initial recognition model by using the training image set, each training image in the training image set, the mask enhanced image obtained by data augmentation, and the mask enhanced image obtained by re-data augmentation may be used alternately to model the initial recognition model to obtain the target recognition model.
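The training schedules of figs. 6–8 can be sketched as a batch-ordering helper; the function name, the "sequential"/"alternating" labels, and the round-robin interleaving (in the spirit of fig. 8) are illustrative assumptions, not the patent's exact scheduling:

```python
def training_order(originals, augmented_rounds, mode):
    """Return the order in which batches are fed to the model.

    originals: list of training images; augmented_rounds: list of lists of
    mask-augmented images (one list per augmentation round);
    mode: 'sequential' (fig. 6 style) or 'alternating' (fig. 8 style).
    """
    if mode == "sequential":                 # originals first, then each round in turn
        order = list(originals)
        for rnd in augmented_rounds:
            order.extend(rnd)
        return order
    if mode == "alternating":                # round-robin over originals and each round
        streams = [list(originals)] + [list(r) for r in augmented_rounds]
        order, i = [], 0
        while any(streams):
            if streams[i % len(streams)]:
                order.append(streams[i % len(streams)].pop(0))
            i += 1
        return order
    raise ValueError(mode)

seq = training_order(["t1", "t2"], [["m1", "m2"], ["m1'", "m2'"]], "sequential")
alt = training_order(["t1", "t2"], [["m1", "m2"], ["m1'", "m2'"]], "alternating")
```

`seq` feeds all original images before any augmented ones; `alt` interleaves the original stream with both augmentation rounds.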
According to the embodiments provided by the present application, the mask region of the initial image mask is used to occlude the first image region of the training image to obtain a first mask image; the first recognition model is trained using the first mask image to update its model parameters and obtain a second recognition model; the parameter values of the image mask parameters are adjusted based on the loss of the second recognition model corresponding to the first mask image to obtain a target image mask, and the mask region of the target image mask is used to occlude a second image region of the training image to obtain a second mask image, where the image mask parameters represent the mask region of the image mask corresponding to the training image, and the loss of the second recognition model corresponding to the first mask image is lower than its loss corresponding to the second mask image; and the second recognition model is trained using the second mask image to update its model parameters and obtain a third recognition model. This solves the problem in the related art that the model generalization capability is weak due to the randomness of mask selection in mask-image-based model training methods, and improves the model generalization capability.
As an alternative, model training is performed on the first recognition model by using the first mask image to update model parameters of the first recognition model to obtain a second recognition model, including:
s11, inputting the first mask image into a first recognition model to obtain a second recognition result output by the first recognition model;
s12, determining a second function value corresponding to the first recognition result and the preset recognition result of the preset loss function, wherein the preset recognition result is a marked recognition result corresponding to the training image, and the second function value is used for representing an error between the second recognition result and the preset recognition result;
and S13, updating the model parameters of the first recognition model along the descending direction of the second parameter gradient based on the second function value to obtain the second recognition model, where the input parameters of the preset loss function include the model parameters of the first recognition model, and the second parameter gradient is the gradient of the preset loss function with respect to the model parameters of the first recognition model.
The input parameters of the preset loss function include the model parameters of the recognition model. To improve model training efficiency, the model parameters of the recognition model can be updated along the descending direction of the gradient corresponding to those model parameters. Here, the gradient is formed by taking the partial derivative of a multivariate function with respect to each parameter and expressing the resulting partial derivatives as a vector. Gradient descent is mainly used for weight (model parameter) updates in neural network models, that is, the parameters of the model are updated and adjusted in the direction that minimizes the loss function.
The gradient of the loss function with respect to a parameter points in the direction in which the loss function rises most rapidly; to minimize the loss function, the parameter is updated in the direction opposite to the gradient, so that the loss function decreases. That is, by propagating the associated error back during back propagation of the network, gradient descent can be used to update the model parameters in the direction opposite to the gradient of the loss function so as to minimize the loss function.
In this embodiment, the first mask image is input into the first recognition model to obtain the second recognition result output by the first recognition model, where the second recognition result is the predicted output of the recognition model and the expected output corresponding to the first mask image is the preset recognition result (i.e., the labeled recognition result corresponding to the training image serves as the expected recognition result); a deviation exists between the second recognition result and the preset recognition result. After the second recognition result is obtained, the second function value of the preset loss function corresponding to the second recognition result and the preset recognition result may be determined, where the second function value represents the error between the second recognition result and the preset recognition result. The preset loss function may be a cross entropy loss function or another loss function; the type of loss function is not limited in this embodiment.
Based on the obtained second function value, the model parameters of the first recognition model can be updated along the gradient descending direction corresponding to the model parameters of the first recognition model by the preset loss function, and the second recognition model can be obtained. Here, the input parameters of the preset loss function include model parameters of the first recognition model, in other words, the preset loss function is a function of the model parameters of the first recognition model, the gradient of the preset loss function with respect to the model parameters of the first recognition model is the direction in which the preset loss function rises most rapidly, and updating the model parameters of the first recognition model along the opposite direction of the gradient can achieve the minimization of model loss.
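A minimal numeric sketch of this gradient descent update, with a toy one-parameter loss standing in for the network loss (the quadratic form, learning rate, and step count are arbitrary assumptions):

```python
def loss(w):                 # toy loss with minimum at w = 3 (stand-in for the model loss)
    return (w - 3.0) ** 2

def grad(w):                 # dL/dw, the direction of steepest ascent
    return 2.0 * (w - 3.0)

w, lr = 0.0, 0.1
history = [loss(w)]
for _ in range(100):
    w -= lr * grad(w)        # step OPPOSITE the gradient to minimize the loss
    history.append(loss(w))
```

Each step moves the parameter against the gradient, so the loss shrinks geometrically toward the minimum at w = 3.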
According to the embodiment provided by the application, the model parameters of the recognition model are updated along the gradient descending direction corresponding to the model parameters of the loss function and the recognition model based on the errors of the recognition result output by the recognition model and the expected recognition result, so that the model parameters can be optimized, and the model training efficiency is improved.
As an alternative, adjusting parameter values of image mask parameters based on a loss of the second recognition model corresponding to the first mask image to obtain a target image mask includes:
S21, inputting the first mask image into a second recognition model to obtain a first recognition result output by the second recognition model;
s22, determining a first function value corresponding to a preset loss function, a first recognition result and a preset recognition result, wherein the preset recognition result is a marked recognition result corresponding to a training image, and the first function value is used for representing an error between the first recognition result and the preset recognition result;
s23, updating parameter values of image mask parameters along the rising direction of the first parameter gradients based on the first function values to obtain a target image mask, wherein input parameters of a preset loss function comprise the image mask parameters, and the second parameter gradients are gradients of the preset loss function and the image mask parameters.
To improve the accuracy of image mask parameter adjustment, the image mask parameters may be used as input parameters of the preset loss function, and the mask parameters of the image mask of the training image may be updated based on gradient ascent to obtain an updated image mask, namely the target image mask. Here, the objective of optimizing the mask region is to make the region occluded by the new mask more challenging in terms of recognition after the mask optimization. Therefore, the mask parameters of the image mask of the training image are updated along the ascending direction of the gradient of the preset loss function with respect to those mask parameters, so that the model loss can be increased. At this point, the mask image generated with the updated image mask is more difficult for the second recognition model to recognize, so overfitting can be avoided and the generalization capability of the recognition model improved.
In this embodiment, after the second recognition model is obtained, the first mask image may be input into the second recognition model to obtain the first recognition result, and the first function value of the preset loss function corresponding to the first recognition result and the preset recognition result may be determined. The first recognition result may be the recognition result corresponding to the first mask image output by the second recognition model, and the error of the first recognition result relative to the preset recognition result is smaller than that of the second recognition result, that is, the first function value is smaller than the second function value. Based on the obtained first function value, the mask parameters of the initial image mask can be updated along the ascending direction of the gradient of the preset loss function with respect to the image mask parameters to obtain the target image mask; updating the mask parameters of the initial image mask along the ascending direction of the gradient aims at loss maximization.
It should be noted that although gradient ascent/descent is adopted for the updates in the embodiments of the present application, other update strategies may also be adopted, for example, gradient updates with momentum, an Adam (adaptive moment estimation) optimizer, and the like.
According to the embodiment provided by the application, based on the error of the recognition result output by the recognition model and the expected recognition result, the mask parameters of the image mask of the training image are updated along the gradient rising direction corresponding to the image mask parameters of the loss function, so that more guiding and targeted data enhancement can be realized.
As an alternative, the method further includes:
s31, determining the product of the derivative of the preset loss function on the image mask corresponding to the training image and the derivative of the image mask corresponding to the training image on the image mask parameter as a first parameter gradient.
In this embodiment, the loss function may be expressed as a function of the mask image and of the model parameters of the recognition model, and the goal of adjusting the image mask parameters is to maximize the training loss function of the recognition model, i.e.,

Φ* = argmax over Φ_geo of L(f(I ⊙ M(Φ_geo) + Δ ⊙ (1 − M(Φ_geo)); φ), c)

where Φ_geo denotes the image mask parameters, M(Φ_geo) is the image mask, and I ⊙ M(Φ_geo) + Δ ⊙ (1 − M(Φ_geo)) is the mask image.
The gradient of the loss L with respect to the image mask parameters can be obtained through gradient inversion and the chain rule; that is, the product of the derivative of the preset loss function with respect to the image mask corresponding to the training image and the derivative of that image mask with respect to the image mask parameters is determined as the first parameter gradient, so that the parameter value of the image mask parameters can be updated based on the first parameter gradient. For example, the parameter values of the image mask parameters may be updated as shown in equation (2):

Φ_1 = Φ_0 + γ · (∂L/∂Ĩ) · (∂Ĩ/∂Φ) evaluated at Φ = Φ_0    (2)

where Φ_0 is the current parameter value of the image mask parameters (e.g., the initial parameter value), Ĩ is the current mask image (e.g., the initialized mask-enhanced image), and γ denotes the update step size.
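The gradient ascent step of equation (2) can be sketched on a one-dimensional toy example; the "importance" weighting standing in for the network loss, the Gaussian soft-occlusion profile, and all numbers are illustrative assumptions, not the patented parameterization:

```python
import numpy as np

n = 50
I = np.ones(n)                      # stand-in 1-D "image"
w = np.zeros(n); w[35] = 1.0        # importance of each position (proxy for the network loss)
s = 3.0                             # mask width

def occ(phi):                       # soft occlusion profile, differentiable in phi
    i = np.arange(n)
    return np.exp(-((i - phi) ** 2) / (2 * s * s))

def masked_image(phi):              # I ⊙ M(phi) with fill Δ = 0, where M = 1 - occ
    return I * (1.0 - occ(phi))

def loss(phi):                      # toy "network loss": high when important content is occluded
    return -np.sum(w * masked_image(phi))

def dloss_dphi(phi):                # chain rule: (∂L/∂Ĩ) · (∂Ĩ/∂Φ)
    i = np.arange(n)
    dI_dphi = -I * occ(phi) * (i - phi) / (s * s)   # ∂Ĩ/∂Φ
    return np.sum(-w * dI_dphi)                     # ∂L/∂Ĩ = -w

gamma, phi = 2.0, 30.0              # update step γ and initial parameter Φ0
before = loss(phi)
for _ in range(200):
    phi += gamma * dloss_dphi(phi)  # equation (2): gradient ASCENT on the mask parameter
after = loss(phi)
```

Repeated ascent steps drift the mask center toward the "important" position (index 35), where occluding the content maximizes the toy loss, mirroring how the optimized mask is driven to cover critical regions.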
According to the embodiment provided by the present application, the parameter values of the image mask parameters are updated through gradient inversion and the chain rule, which can improve the accuracy and efficiency of updating the image mask parameters.
As an alternative, before the masking region of the initial image mask is used to mask the first image region of the training image, the method further includes:
s41, randomly sampling the image mask parameters to obtain initial parameter values of the image mask parameters;
s42, performing image mask generation operation by using a differentiable activation function based on the initial parameter value of the image mask parameter to obtain an initial image mask, wherein the differentiable activation function is a monotonically increasing function.
In this embodiment, in order to obtain the initial image mask, the image mask parameters may be randomly sampled to obtain the initial parameter values of the image mask parameters. The image mask parameters may include one or more mask parameters; when they include multiple mask parameters, all of them may be randomly sampled, or at least some of them may be randomly sampled while the parameter values of the others remain unchanged. For example, if the image mask parameters include five mask parameters, three specified mask parameters may be randomly sampled while the parameter values of the other two remain unchanged.
Based on the initial parameter values of the image mask parameters, an image mask generation operation may be performed to obtain the initial image mask. In order to enable the geometric parameters of the mask to be optimized along with the network training objective, the image mask can be configured as a differentiable structure so that the gradient of the preset loss function with respect to the image mask parameters can be computed. To this end, when generating an image mask, the image mask generation operation may be performed using a differentiable activation function to obtain the corresponding image mask. For the initial image mask, the image mask generation operation may be performed using the differentiable activation function based on the initial parameter values of the image mask parameters.
Here, the differentiable activation function δ (·) may be any differentiable activation function that satisfies the following condition:
lim x→+∞ δ(x) = 1,  lim x→−∞ δ(x) = 0
It may be a monotonically increasing function, for example a sigmoid function, or another monotonically increasing function with the same mathematical properties; the differentiable activation function is not limited in this embodiment.
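As a sketch of such a differentiable mask construction, using the sigmoid as δ(·) (the rectangle parameterization and the sharpness constant are illustrative assumptions, not the exact construction of this application):

```python
import numpy as np

def sigmoid(x):                      # a differentiable, monotonically increasing activation
    return 1.0 / (1.0 + np.exp(-x))

def soft_rect_mask(h, w, top, left, height, width, k=4.0):
    """Differentiable stand-in for a binary rectangle mask.

    Inside the rectangle the mask tends to 0 (occluded), outside it tends to 1,
    matching the convention that the mask region of M is 0. The sharpness k
    controls how closely the soft mask approximates a binary one; because the
    geometric parameters enter only through sigmoids, the mask is
    differentiable with respect to top/left/height/width.
    """
    ys = np.arange(h)[:, None]
    xs = np.arange(w)[None, :]
    inside_y = sigmoid(k * (ys - top)) * sigmoid(k * (top + height - ys))
    inside_x = sigmoid(k * (xs - left)) * sigmoid(k * (left + width - xs))
    return 1.0 - inside_y * inside_x  # ~0 inside the rectangle, ~1 outside

M = soft_rect_mask(32, 32, top=8.0, left=8.0, height=10.0, width=10.0)
```

Because each sigmoid satisfies δ(x) → 1 as x → +∞ and δ(x) → 0 as x → −∞, the soft mask approaches the binary rectangle as the sharpness grows, while remaining differentiable in the geometric parameters.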
According to the embodiment provided by the application, the image mask parameters are randomly sampled, so that the convenience of generating the initial image mask can be improved; by executing the image mask generation operation by using the monotonically increasing differentiable activation function, the gradient of the loss function relative to the image mask parameters can be conveniently determined, and the convenience of image mask adjustment is improved.
As an alternative, the random sampling of the image mask parameters to obtain initial parameter values of the image mask parameters includes:
s51, randomly sampling area shape parameters in the image mask parameters to obtain initial shape parameter values, wherein the area shape parameters are used for representing the area shape of a mask area of the image mask corresponding to the training image;
s52, randomly sampling the regional position parameters in the image mask parameters to obtain initial position parameter values, wherein the regional position parameters are used for representing the regional positions of the mask regions of the image mask corresponding to the training images.
In this embodiment, the image mask parameters may include a variety of mask parameters, which may include, but are not limited to, at least one of: and a region shape parameter for representing a region shape of a mask region of the image mask corresponding to the training image, and a region position parameter for representing a region position of the mask region of the image mask corresponding to the training image. Here, adjusting the area shape of the mask area in the image mask may change the area shape of the image area in the training image that is blocked by the mask area of the image mask to update the mask image; adjusting the region position of the mask region in the image mask may change the region position of the image region in the training image that is blocked by the mask region of the image mask to update the mask image.
As an alternative embodiment, the area shape parameter of the image mask parameters may be randomly sampled to obtain the initial shape parameter value, and the random sampling of the area shape parameter may be performed within a preset area shape parameter range, where the preset area shape parameter range may be set according to at least one of an image shape and an area size of the training image, so as to avoid an influence on recognition accuracy of the recognition model due to an excessively large or excessively small image area blocked by the mask area of the image mask.
As another alternative embodiment, the area position parameter in the image mask parameter may be randomly sampled to obtain the initial position parameter value, where the random sampling of the area position parameter may be performed within a preset area position parameter range, and the preset area position parameter range may be set according to at least one of an image shape and an area size of the training image, so as to avoid an influence on recognition accuracy of the recognition model due to over-deviation of the image area blocked by the mask area of the image mask.
According to the embodiment provided by the application, the initialization flexibility of the image mask can be realized by randomly sampling at least one of the area shape parameter and the area position parameter to obtain the initialized random mask.
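The random sampling of shape and position parameters described above can be sketched as follows. The concrete sampling ranges (for example, bounding the region size to 10%–50% of the corresponding image side) are illustrative assumptions; the embodiment only requires that the ranges be set according to the image shape and area size:

```python
import numpy as np

def sample_initial_mask_params(img_h, img_w, rng=None):
    """Randomly sample initial values for the mask's shape parameters
    (w, h, theta) and position parameters (x, y). The ranges below are
    assumptions for illustration: the size is kept to a fraction of the
    image so the occluded area is neither too large nor too small, and
    the center point is kept inside the image."""
    rng = np.random.default_rng() if rng is None else rng
    w = rng.uniform(0.1 * img_w, 0.5 * img_w)   # region length
    h = rng.uniform(0.1 * img_h, 0.5 * img_h)   # region width
    theta = rng.uniform(0.0, np.pi)             # rotation angle
    x = rng.uniform(0.0, img_w)                 # center point abscissa
    y = rng.uniform(0.0, img_h)                 # center point ordinate
    return np.array([x, y, w, h, theta])

params = sample_initial_mask_params(224, 224, rng=np.random.default_rng(0))
```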
As an alternative, randomly sampling the region shape parameter in the image mask parameter to obtain an initial shape parameter value, including:
s61, randomly sampling area size parameters in the image mask parameters to obtain initial size parameter values, wherein the area size parameters are used for representing the area size of a mask area of the image mask corresponding to the training image;
s62, randomly sampling rotation angle parameters in the image mask parameters to obtain an initial rotation angle value, wherein the rotation angle parameters are used for representing the rotation angle of the mask region of the image mask corresponding to the training image along the preset direction by taking the region center point of the mask region of the image mask corresponding to the training image as the center of a circle.
In this embodiment, the region shape parameters may include one or more, and may include, but are not limited to, at least one of: the region size parameter is used for representing the region size of the mask region of the image mask corresponding to the training image, and the rotation angle parameter is used for representing the rotation angle of the mask region of the image mask corresponding to the training image along the preset direction by taking the region center point of the mask region of the image mask corresponding to the training image as the center of a circle.
As an alternative embodiment, the region shape parameter may include a region size parameter. The types of the corresponding region size parameters may be different for different shaped mask regions, for example, if the mask region is a rectangular region (as shown on the left side of fig. 9), the region size parameters may include at least one of a region length and a region width, if the mask region is a circular region (as shown on the right side of fig. 9), the region size parameters may include a region radius, and for other shaped mask regions (for example, oval, triangular or other shapes), similar region size parameters may be configured, which is not limited in this embodiment.
As another alternative, the region shape parameter may include a rotation angle parameter. The rotation angle indicated by the rotation angle parameter is an angle that the mask region rotates along a preset direction by taking the region center point of the mask region as a circle center, and the preset direction can be either a clockwise direction or a counterclockwise direction. For mask regions (e.g., rectangular regions) where the region shape will change with rotation of the mask region, the region shape parameter may include a rotation angle parameter, while for mask regions (e.g., circular regions) where the region shape will not change with rotation of the mask region, the region shape parameter may not include a rotation angle parameter.
According to the embodiment provided by the application, the initialization flexibility of the image mask can be realized by randomly sampling at least one of the region size parameter and the rotation angle parameter to obtain the initialized random mask.
As an alternative, the random sampling of the regional position parameter in the image mask parameter to obtain the initial position parameter value includes:
s71, randomly sampling a central point position parameter in the image mask parameters to obtain initial position parameter values, wherein the central point position parameter is used for representing the position of the regional central point of the mask region of the image mask corresponding to the training image.
The region position parameters may include parameters for identifying arbitrary positions in the mask region, e.g., an upper left vertex, an upper right vertex, a lower left vertex, a lower right vertex, etc., of the mask region. In this embodiment, the region position parameter may be a center point position parameter that may include a position of a region center point of a mask region of the image mask corresponding to the training image, and the mask region in the image mask may be determined by combining the region center point position of the mask region with a region shape parameter value of the mask region.
Alternatively, the center point position parameter may include at least one of an abscissa of the region center point and an ordinate of the region center point, and if the abscissa of the region center point of the mask region is fixed, the center point position parameter may include only the ordinate of the region center point, and if the ordinate of the region center point of the mask region is fixed, the center point position parameter may include only the abscissa of the region center point.
Taking a rectangular image mask as an example, the rectangular mask has five geometric parameters Φ_geo = [x, y, w, h, θ]^T, where (x, y) denotes the center point coordinates, (w, h) denotes the length and width, and θ denotes the clockwise rotation angle of the rectangular mask with respect to the horizontal axis, with the center point (x, y) as the center of rotation.
According to the embodiment provided by the application, the initialized random mask is obtained by randomly sampling the central point position parameter, so that the flexibility of initializing the image mask can be realized.
As an alternative, the mask area may be a rectangular area, and the image mask parameters may include an area length parameter, an area width parameter, a center point position parameter, and a rotation angle parameter, where the meaning of each parameter is similar to that of the foregoing embodiment and will not be repeated here. The mask region indicated by the initial parameter values of the image mask parameters is an initial mask region, and the initial parameter values of the image mask parameters include: an initial region length value for the region length of the initial mask region, an initial region width value for the region width of the initial mask region, an initial center point abscissa and an initial center point ordinate for the region center point position of the initial mask region, and an initial rotation angle value representing the rotation angle of the initial mask region, where the rotation angle of the initial mask region refers to the angle rotated by the initial mask region along a preset direction with the region center point of the initial mask region as the center of a circle.
Correspondingly, based on the initial parameter values of the image mask parameters, performing an image mask generation operation using a differentiable activation function to obtain an initial image mask, comprising:
s81, respectively executing pixel value determining operation by taking each pixel position in the initial image mask as the current pixel position to obtain the initial image mask.
For the current pixel position, which includes a current pixel abscissa and a current pixel ordinate, the image mask generation operation using the differentiable activation function may determine the function value of the differentiable activation function at a first reference value and its function value at a second reference value, and take the product of the two function values as the pixel value of the current pixel position. Here, the first reference value is the product of the first coordinate difference and the cosine of the initial rotation angle value, plus the product of the second coordinate difference and the sine of the initial rotation angle value, minus half of the initial region length value; the second reference value is the product of the second coordinate difference and the cosine of the initial rotation angle value, minus the product of the first coordinate difference and the sine of the initial rotation angle value, minus half of the initial region width value. The first coordinate difference is the difference between the current pixel abscissa and the initial center point abscissa, and the second coordinate difference is the difference between the initial center point ordinate and the current pixel ordinate.
For example, with the five geometric parameters of the rectangular mask, the pixel value of the mask M at each pixel position (u, v) can be as shown in equation (3):

M(u, v) = δ((u − x)·cos θ + (y − v)·sin θ − w/2) · δ((y − v)·cos θ − (u − x)·sin θ − h/2)   (3)
Here δ(·) can be any differentiable activation function; it can be instantiated as the sigmoid function shown in equation (4):

δ(x) = 1 / (1 + e^(−x))   (4)
according to the embodiment provided by the application, the pixel value of each pixel position in the image mask is determined by combining the differential activation function with the regional parameter of the rectangular mask region, so that the convenience of image mask generation can be realized.
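To illustrate the idea of a differentiable rectangular mask, the sketch below soft-bounds the rotated pixel coordinates on both sides with sigmoids. It is an illustrative construction consistent with the general description (mask values near 1 inside the rectangle, near 0 outside, differentiable everywhere in the geometric parameters), not a verbatim implementation of equation (3); the `sharpness` factor is an added assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def soft_rect_mask(params, height, width, sharpness=2.0):
    """Differentiable rotated-rectangle mask (illustrative sketch).
    params = [x, y, w, h, theta]: center, length/width, rotation angle.
    Each pixel value is a product of sigmoids of the rotated offsets,
    so every pixel is differentiable in all five geometric parameters.
    The mask is generated at the same resolution as the image."""
    x, y, w, h, theta = params
    v, u = np.meshgrid(np.arange(height, dtype=float),
                       np.arange(width, dtype=float), indexing="ij")
    du = u - x            # first coordinate difference
    dv = y - v            # second coordinate difference
    # Rotate the pixel offsets into the rectangle's own frame.
    up = du * np.cos(theta) + dv * np.sin(theta)
    vp = dv * np.cos(theta) - du * np.sin(theta)
    s = sharpness
    return (sigmoid(s * (up + w / 2)) * sigmoid(s * (w / 2 - up)) *
            sigmoid(s * (vp + h / 2)) * sigmoid(s * (h / 2 - vp)))

# A 10x6 axis-aligned rectangle centered in a 32x32 mask:
M = soft_rect_mask(np.array([16.0, 16.0, 10.0, 6.0, 0.0]), 32, 32)
```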
As an alternative, performing an image mask generation operation using a differentiable activation function based on initial parameter values of image mask parameters, resulting in an initial image mask, comprising:
s91, performing image mask generation operation according to the resolution of the training image by using the differentiable activation function based on the initial parameter value of the image mask parameter to obtain an initial image mask, wherein the resolution of the initial image mask is equal to the resolution of the training image.
In this embodiment, the resolution of the generated initial image mask and the resolution of the training image may have an association relationship, for example, the resolution may be the same or may be proportional to each other. If the two are in a certain proportion, the generated initial image mask can be scaled so that the resolution of the initial image mask is the same as that of the training image.
In order to simplify the flow of the region mask processing and improve the efficiency of the region mask, the resolution of the generated initial image mask can be controlled to be equal to the resolution of the training image, that is, the image mask generating operation is performed according to the resolution of the training image by using the differentiable activation function based on the initial parameter values of the image mask parameters, so as to obtain the initial image mask. In this case, the region mask processing can be performed using the generated initial image mask without performing other processing on the generated initial image mask.
According to the embodiment provided by the application, the image mask generating operation is performed according to the resolution of the training image by using the differentiable activation function, so that the flow of the area mask processing can be simplified, and the efficiency of the area mask is improved.
As an alternative, performing an image mask generation operation using a differentiable activation function based on initial parameter values of image mask parameters, resulting in an initial image mask, comprising one of:
s101, performing an image mask generation operation using a logistic function based on the initial parameter values of the image mask parameters to obtain an initial image mask;
s103, based on initial parameter values of image mask parameters, performing image mask generation operation by using a preset hyperbolic tangent function to obtain an initial image mask, wherein the preset hyperbolic tangent function is a hyperbolic tangent function with offset and scaling items.
In this embodiment, the differentiable activation function may be any differentiable monotonically increasing function, including but not limited to one of the following: a logistic function (sigmoid function), or a hyperbolic tangent function (tanh function) with bias and scaling terms, i.e., the preset hyperbolic tangent function. Other monotonically increasing functions with the same mathematical properties can likewise be applied as the differentiable activation function when generating the image mask.
When generating the initial image mask, the image mask generating operation may be performed using a logistic function based on the initial parameter values of the image mask parameters to obtain the initial image mask, or the image mask generating operation may be performed using a preset hyperbolic tangent function based on the initial parameter values of the image mask parameters to obtain the initial image mask.
By adopting the embodiment provided by the application, the flexibility of generating the image mask can be improved by generating the image mask by using the differentiable monotonically increasing function.
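The two candidate activation functions are closely related: with one particular choice of bias and scaling terms, the shifted hyperbolic tangent coincides exactly with the logistic function. A minimal sketch (the specific bias/scale values here are one illustrative choice, not mandated by the embodiment):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def shifted_tanh(x):
    """Hyperbolic tangent with bias and scaling terms, so that its
    range becomes (0, 1) like the logistic function: since
    tanh(x/2) = 2*sigmoid(x) - 1, we have 0.5*(tanh(x/2)+1) = sigmoid(x)."""
    return 0.5 * (np.tanh(0.5 * x) + 1.0)

# With this bias/scale choice the two functions agree everywhere:
xs = np.linspace(-10.0, 10.0, 201)
assert np.allclose(sigmoid(xs), shifted_tanh(xs))
```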
The mask image-based model training method in the embodiment of the present application is explained below with reference to an alternative example. In this alternative example, the mask area is a rectangular area, and the image mask parameters Φ_geo are the five geometric parameters [x, y, w, h, θ]^T.
This alternative example provides a data enhancement scheme based on gradient drift: on the basis of random mask selection, the selection of the mask's geometric parameters is optimized using the gradient of the network's training loss function under a gradient optimization strategy, so as to achieve a better data augmentation effect and improve model generalization. Since the resolution of the mask M is consistent with that of the image x, and each position takes a discrete value in {0, 1}, it is difficult to optimize the mask directly with gradient updates. M can therefore be represented as a differentiable structure in a reparameterized manner, as shown in formula (2). Through this reparameterization, the mask can be optimized by optimizing the geometric parameters Φ_geo, realizing the drift of the mask's geometric region.
Since the mask parameters Φ_geo and the neural network weights Φ are updated alternately, the optimization objective described above is achieved in this alternative example by means of gradient inversion and alternate updating. Referring to fig. 10 and 11, for one training sample (I, c), one training iteration can be divided into the following steps:
Step S1102, randomly sample Φ_0 = [x_0, y_0, w_0, h_0, θ_0]^T as the initialization mask parameters (yielding the initial image mask), and obtain the initialized mask-enhanced image (i.e., the initial mask image); the mask-enhanced image can be represented by formula (5):
Step S1104, input the mask-enhanced image into the neural network for training, use the back-propagated parameter gradients to update the model parameters Φ, and obtain the gradient of the loss L with respect to the geometric parameters Φ_geo through gradient inversion and the chain rule, so as to update Φ_geo. The update of the model parameters Φ and of the geometric parameters Φ_geo can be as shown in formula (6) and formula (2):
as shown in fig. 12, after the parameter values of the mask parameters of the image mask 1 are adjusted, an image mask 3 is obtained, and a mask image corresponding to the image mask 3 is the mask image 3. The image mask 3 is updated with respect to the image mask 1 in the area size, the area position and the relative angle to the horizontal axis of the rectangular area.
Step S1106, the updated geometric parameters Φ_geo are reused for mask generation and data enhancement; the enhanced image is fed into the neural network to train again and update the model parameters of the neural network. The update of the model parameters Φ can be as shown in formula (7):
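The alternating iteration of steps S1102–S1106 can be sketched on a deliberately tiny example: a one-parameter "network" phi and a one-parameter soft mask m, with analytic gradients standing in for backpropagation. All names, learning rates, and the toy squared loss are assumptions for illustration only; what the sketch preserves is the scheme itself, in which the model parameter descends the loss gradient while the mask parameter ascends it (gradient inversion):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(phi, m, x, t, lr_model=0.05, lr_mask=0.01):
    """One alternate-update iteration on a toy problem (hypothetical):
    the 'network' is a single weight phi; the 'mask' is a single
    parameter m whose sigmoid acts as a soft keep-rate on the input x.
    The model descends the loss gradient; the mask ascends it
    (gradient inversion), drifting toward harder occlusions."""
    s = sigmoid(m)                      # soft mask value in (0, 1)
    pred = phi * (s * x)                # forward pass on the masked input
    err = pred - t                      # residual of the loss L = err**2
    d_phi = 2.0 * err * s * x                     # dL/dphi
    d_m = 2.0 * err * phi * x * s * (1.0 - s)     # dL/dm via chain rule
    phi -= lr_model * d_phi             # model update: gradient descent
    m += lr_mask * d_m                  # mask update: gradient ascent
    return phi, m

phi, m = 0.1, 0.0
for _ in range(500):
    phi, m = train_step(phi, m, x=1.0, t=1.0)
# The model ends up fitting the masked input, while the mask parameter
# has drifted toward a harder (smaller) keep-rate, i.e. m has decreased.
```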
by the alternative example, the application which is newly increased continuously can be classified and the application of different types but the same audience group can be discovered by using the embedded vector generation modes with different granularities.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
According to another aspect of the embodiment of the present application, there is also provided a mask image-based model training apparatus for implementing the above mask image-based model training method. FIG. 13 is a block diagram of an alternative mask image based model training apparatus according to an embodiment of the present application, as shown in FIG. 13, the apparatus may include:
a first processing unit 1302, configured to perform occlusion processing on a first image area of the training image by using a mask area of the initial image mask, so as to obtain a first mask image;
a first training unit 1304, coupled to the first processing unit 1302, for performing model training on the first recognition model using the first mask image to update model parameters of the first recognition model to obtain a second recognition model;
An adjusting unit 1306, connected to the first training unit 1304, configured to adjust parameter values of image mask parameters based on a loss corresponding to the first mask image by the second recognition model, to obtain a target image mask, where the image mask parameters are used to represent a mask region of the image mask corresponding to the training image;
a second processing unit 1308, connected to the adjusting unit 1306, configured to perform occlusion processing on a second image area of the training image by using a mask area of the target image mask, so as to obtain a second mask image, where a loss corresponding to the first mask image by the second recognition model is lower than a loss corresponding to the second mask image by the second recognition model;
a second training unit 1310, coupled to the second processing unit 1308, is configured to perform model training on the second recognition model using the second mask image, so as to update model parameters of the second recognition model, and obtain a third recognition model.
It should be noted that, the first processing unit 1302 in this embodiment may be used to perform the above-mentioned step S202, the first training unit 1304 in this embodiment may be used to perform the above-mentioned step S204, the adjusting unit 1306 and the second processing unit 1308 in this embodiment may be used to perform the above-mentioned step S206, and the second training unit 1310 in this embodiment may be used to perform the above-mentioned step S208.
According to the embodiment provided by the application, the mask region of the initial image mask is used for shielding the first image region of the training image, so that a first mask image is obtained; performing model training on the first recognition model by using the first mask image so as to update model parameters of the first recognition model and obtain a second recognition model; adjusting parameter values of image mask parameters based on loss corresponding to the first mask image by the second recognition model to obtain a target image mask, and carrying out shielding treatment on a second image area of the training image by using a mask area of the target image mask to obtain a second mask image, wherein the image mask parameters are used for representing the mask area of the image mask corresponding to the training image, and the loss corresponding to the second recognition model and the first mask image is lower than the loss corresponding to the second recognition model and the second mask image; and performing model training on the second recognition model by using the second mask image to update model parameters of the second recognition model to obtain a third recognition model, so that the problem that the model generalization capability is weak due to the randomness of mask selection in the model training method based on the mask image in the related technology is solved, and the model generalization capability is improved.
As an alternative, the adjusting unit includes:
the first input module is used for inputting the first mask image into the second recognition model to obtain a first recognition result output by the second recognition model;
the first determining module is used for determining a first function value of the preset loss function corresponding to the first recognition result and the preset recognition result, wherein the preset recognition result is a marked recognition result corresponding to the training image, and the first function value is used for representing an error between the first recognition result and the preset recognition result;
the first updating module is used for updating the parameter value of the image mask parameter along the rising direction of the first parameter gradient based on the first function value to obtain the target image mask, wherein the input parameter of the preset loss function comprises the image mask parameter, and the first parameter gradient is the gradient corresponding to the preset loss function and the image mask parameter.
An optional example of this embodiment may refer to an example shown in the above mask image-based model training method, and will not be described herein.
As an alternative, the apparatus further includes:
and the determining unit is used for determining the product of the derivative of the preset loss function on the image mask corresponding to the training image and the derivative of the image mask corresponding to the training image on the image mask parameter as a first parameter gradient.
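The determining unit's product rule, in which the gradient of the loss with respect to the mask parameters equals the derivative of the loss with respect to the mask times the derivative of the mask with respect to those parameters, can be checked numerically on a one-parameter example. The specific loss, mask form, and parameter value below are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical one-parameter illustration: a single mask "pixel"
# M = sigmoid(phi) and an arbitrary loss L(M) = (M - 0.3)**2.
def grad_via_chain_rule(phi):
    M = sigmoid(phi)
    dL_dM = 2.0 * (M - 0.3)      # derivative of the loss w.r.t. the mask
    dM_dphi = M * (1.0 - M)      # derivative of the mask w.r.t. its parameter
    return dL_dM * dM_dphi       # their product is the parameter gradient

# Verify against a central finite difference of the composed loss:
phi0, eps = 0.7, 1e-6
L = lambda p: (sigmoid(p) - 0.3) ** 2
fd = (L(phi0 + eps) - L(phi0 - eps)) / (2.0 * eps)
```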
An optional example of this embodiment may refer to an example shown in the above mask image-based model training method, and will not be described herein.
As an alternative, the apparatus further includes:
the sampling unit is used for randomly sampling the image mask parameters to obtain initial parameter values of the image mask parameters before the mask region of the initial image mask is used for carrying out shielding treatment on the first image region of the training image to obtain a first mask image;
and the execution unit is used for executing image mask generation operation by using the differentiable activation function based on the initial parameter value of the image mask parameter to obtain the initial image mask, wherein the differentiable activation function is a monotonically increasing function.
An optional example of this embodiment may refer to an example shown in the above mask image-based model training method, and will not be described herein.
As an alternative, the sampling unit includes:
the first sampling module is used for randomly sampling the regional shape parameters in the image mask parameters to obtain initial shape parameter values, wherein the regional shape parameters are used for representing the regional shape of the mask region of the image mask corresponding to the training image;
And the second sampling module is used for randomly sampling the regional position parameters in the image mask parameters to obtain initial position parameter values, wherein the regional position parameters are used for representing the regional positions of the mask regions of the image mask corresponding to the training images.
An optional example of this embodiment may refer to an example shown in the above mask image-based model training method, and will not be described herein.
As an alternative, the first sampling module includes:
the first sampling submodule is used for randomly sampling the region size parameter in the image mask parameters to obtain an initial size parameter value, wherein the region size parameter is used for representing the region size of the mask region of the image mask corresponding to the training image;
the second sampling submodule is used for randomly sampling rotation angle parameters in the image mask parameters to obtain initial rotation angle values, wherein the rotation angle parameters are used for representing angles of rotation of the mask region of the image mask corresponding to the training image along a preset direction by taking the region center point of the mask region of the image mask corresponding to the training image as the center of a circle.
An optional example of this embodiment may refer to an example shown in the above mask image-based model training method, and will not be described herein.
As an alternative, the second sampling module includes:
and the third sampling submodule is used for randomly sampling the central point position parameter in the image mask parameters to obtain initial position parameter values, wherein the central point position parameter is used for representing the position of the regional central point of the mask region of the image mask corresponding to the training image.
An optional example of this embodiment may refer to an example shown in the above mask image-based model training method, and will not be described herein.
As an alternative, the mask area indicated by the initial parameter values of the image mask parameters is an initial mask region, and the initial parameter values of the image mask parameters include: an initial region length value for the region length of the initial mask region, an initial region width value for the region width of the initial mask region, an initial center point abscissa and an initial center point ordinate for the region center point position of the initial mask region, and an initial rotation angle value representing the rotation angle of the initial mask region, where the rotation angle of the initial mask region refers to the angle rotated by the initial mask region along a preset direction with the region center point of the initial mask region as the center of a circle. The execution unit includes:
The first execution module is used for respectively executing the following operations by taking each pixel position in the initial image mask as a current pixel position to obtain the initial image mask, wherein the current pixel position comprises a current pixel abscissa and a current pixel ordinate:
determining the product of the function value of the differentiable activation function at a first reference value and the function value of the differentiable activation function at a second reference value as the pixel value of the current pixel position, wherein the first reference value is the product of the first coordinate difference value and the cosine of the initial rotation angle value, plus the product of the second coordinate difference value and the sine of the initial rotation angle value, minus half of the initial region length value; and the second reference value is the product of the second coordinate difference value and the cosine of the initial rotation angle value, minus the product of the first coordinate difference value and the sine of the initial rotation angle value, minus half of the initial region width value;
the first coordinate difference value is a coordinate difference value between the abscissa of the current pixel and the abscissa of the initial center point, and the second coordinate difference value is a coordinate difference value between the ordinate of the initial center point and the ordinate of the current pixel.
An optional example of this embodiment may refer to an example shown in the above mask image-based model training method, and will not be described herein.
As an alternative, the execution unit includes:
and the second execution module is used for executing image mask generation operation according to the resolution of the training image by using the differentiable activation function based on the initial parameter value of the image mask parameter to obtain an initial image mask, wherein the resolution of the initial image mask is equal to the resolution of the training image.
An optional example of this embodiment may refer to an example shown in the above mask image-based model training method, and will not be described herein.
As an alternative, the execution unit comprises one of:
the third execution module is used for performing an image mask generation operation using a logistic function based on the initial parameter values of the image mask parameters to obtain an initial image mask;
and the fourth execution module is used for executing image mask generation operation by using a preset hyperbolic tangent function based on the initial parameter value of the image mask parameter to obtain an initial image mask, wherein the preset hyperbolic tangent function is a hyperbolic tangent function with offset and scaling items.
An optional example of this embodiment may refer to an example shown in the above mask image-based model training method, and will not be described herein.
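The two activation choices named above (a logistic function, or a hyperbolic tangent with offset and scaling terms) can be sketched as follows. This is an illustrative reconstruction; the exact offset/scaling values are not given in the patent, so the form of `shifted_tanh` is an assumption that merely maps tanh into (0, 1).

```python
import numpy as np

def logistic(z):
    # Standard logistic (sigmoid) function: differentiable, monotonically increasing.
    return 1.0 / (1.0 + np.exp(-z))

def shifted_tanh(z, scale=1.0, offset=0.0):
    # A hyperbolic tangent with offset and scaling terms, shifted into (0, 1)
    # so it can serve directly as a soft mask value.
    return 0.5 * (np.tanh(scale * z + offset) + 1.0)
```

Note that the two are closely related: with `scale=0.5` and `offset=0.0`, `shifted_tanh` coincides with the logistic function, via the identity sigmoid(z) = (1 + tanh(z/2)) / 2.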
As an alternative, the first training unit includes:
the first input module is used for inputting the first mask image into the first recognition model to obtain a second recognition result output by the first recognition model;
the first determining module is used for determining a second function value of a preset loss function corresponding to the second recognition result and a preset recognition result, wherein the preset recognition result is a labeled recognition result corresponding to the training image, and the second function value is used for representing an error between the second recognition result and the preset recognition result;
the first updating module is used for updating the model parameters of the first identification model along the descending direction of the second parameter gradient based on the second function value to obtain a second identification model, wherein the input parameters of the preset loss function comprise the model parameters of the first identification model, and the second parameter gradient is a gradient corresponding to the model parameters of the preset loss function and the first identification model.
An optional example of this embodiment may refer to an example shown in the above mask image-based model training method, and will not be described herein.
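The loss evaluation and the update along the descending direction of the parameter gradient can be sketched as follows. This is an illustrative sketch, not the patent's implementation: cross-entropy is assumed as one possible preset loss function, and a plain gradient-descent step is assumed for the updating module.

```python
import numpy as np

def cross_entropy(pred, target, eps=1e-12):
    # Function value representing the error between the model's recognition
    # result (pred) and the labeled (preset) recognition result (target).
    return -float(np.sum(target * np.log(pred + eps)))

def update_model_params(params, grads, lr=0.01):
    # Update along the *descending* direction of the parameter gradient:
    # theta <- theta - lr * dL/dtheta, for each model parameter.
    return {k: v - lr * grads[k] for k, v in params.items()}
```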
According to still another aspect of the embodiment of the present application, there is further provided an electronic device for implementing the mask image-based model training method described above, where the electronic device may be a terminal device or a server as shown in fig. 1. The present embodiment is described taking the electronic device as a terminal device as an example. As shown in fig. 14, the electronic device comprises a memory 1402 and a processor 1404, the memory 1402 having stored therein a computer program, the processor 1404 being arranged to perform the steps of any of the method embodiments described above by means of the computer program.
Alternatively, in this embodiment, the electronic device may be located in at least one network device of a plurality of network devices of the computer network.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, carrying out shielding treatment on a first image area of a training image by using a mask area of an initial image mask to obtain a first mask image;
S2, performing model training on the first recognition model by using the first mask image so as to update model parameters of the first recognition model and obtain a second recognition model;
S3, adjusting parameter values of image mask parameters based on a loss of the second recognition model corresponding to the first mask image to obtain a target image mask, and shielding a second image area of the training image by using a mask area of the target image mask to obtain a second mask image, wherein the image mask parameters are used for representing the mask area of the image mask corresponding to the training image, and the loss of the second recognition model corresponding to the first mask image is lower than the loss of the second recognition model corresponding to the second mask image;
S4, performing model training on the second recognition model by using the second mask image so as to update model parameters of the second recognition model and obtain a third recognition model.
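One round of the alternating procedure S1-S4 can be sketched as follows. This is an illustrative skeleton only: the callables `apply_mask`, `train_step`, and `mask_ascent_step` are assumed placeholders for the occlusion, model-training, and mask-adjustment operations, none of which are specified as code in the patent.

```python
def train_round(train_image, mask_params, model_params,
                apply_mask, train_step, mask_ascent_step):
    # S1: occlude the first image region with the initial image mask.
    first_mask_image = apply_mask(train_image, mask_params)
    # S2: train the first recognition model on the first mask image
    # (yielding the second recognition model).
    model_params = train_step(model_params, first_mask_image)
    # S3: adjust the image mask parameters (e.g. by gradient ascent on the
    # loss) to obtain the target image mask, then occlude the second region.
    mask_params = mask_ascent_step(mask_params, model_params, train_image)
    second_mask_image = apply_mask(train_image, mask_params)
    # S4: train the second recognition model on the second mask image
    # (yielding the third recognition model).
    model_params = train_step(model_params, second_mask_image)
    return model_params, mask_params
```

The point of the alternation is that the mask is pushed toward regions the updated model finds harder (higher loss), while the model is then trained against that harder occlusion.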
Alternatively, it will be understood by those skilled in the art that the structure shown in fig. 14 is only schematic, and the electronic device may also be a smart phone (such as an Android mobile phone or an iOS mobile phone), a tablet computer, a palm computer, or another terminal device such as a mobile Internet device (MID) or a PAD. Fig. 14 does not limit the structure of the electronic device described above. For example, the electronic device may further include more or fewer components (such as a network interface) than shown in fig. 14, or have a configuration different from that shown in fig. 14.
The memory 1402 may be used to store software programs and modules, such as program instructions/modules corresponding to the mask image-based model training method and apparatus in the embodiments of the present application, and the processor 1404 executes the software programs and modules stored in the memory 1402 to perform various functional applications and data processing, i.e., implement the mask image-based model training method described above. The memory 1402 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1402 may further include memory located remotely from the processor 1404, which may be connected to the terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof. The memory 1402 may be used, in particular but not limited to, for storing information such as recognition models, model parameters, image masks, training images, and the like. As an example, as shown in fig. 14, the memory 1402 may include, but is not limited to, the first processing unit 1302, the first training unit 1304, the adjusting unit 1306, the second processing unit 1308, and the second training unit 1310 in the mask image-based model training apparatus. In addition, the memory may further include, but is not limited to, other module units in the mask image-based model training apparatus, which are not described in detail in this example.
Optionally, the transmission device 1406 is used to receive or transmit data via a network. Specific examples of the network described above may include wired networks and wireless networks. In one example, the transmission device 1406 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices and routers via a network cable to communicate with the internet or a local area network. In one example, the transmission device 1406 is a Radio Frequency (RF) module that is used to communicate wirelessly with the internet.
In addition, the electronic device further includes: a display 1408 for displaying training images, mask images, etc.; and a connection bus 1410 for connecting the respective module parts in the above-described electronic device.
In other embodiments, the terminal device or the server may be a node in a distributed system, where the distributed system may be a blockchain system, and the blockchain system may be a distributed system formed by connecting a plurality of nodes through network communication. The nodes may form a Peer-To-Peer (P2P) network, and any type of computing device, such as a server or a terminal, may become a node in the blockchain system by joining the peer-to-peer network.
According to one aspect of the present application, there is provided a computer program product comprising a computer program/instructions containing program code for executing the method shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1509, and/or installed from the removable medium 1511. When executed by the central processing unit 1501, the computer program performs the various functions provided by the embodiments of the present application. The foregoing embodiment numbers of the present application are merely for description, and do not represent the superiority or inferiority of the embodiments.
FIG. 15 is a block diagram of the architecture of a computer system of an alternative electronic device in accordance with an embodiment of the present application. As shown in fig. 15, the computer system 1500 includes a central processing unit 1501 (Central Processing Unit, CPU) which can execute various appropriate actions and processes according to a program stored in a Read-Only Memory 1502 (ROM) or a program loaded from a storage section 1508 into a Random Access Memory 1503 (RAM). In the random access memory 1503, various programs and data necessary for the operation of the system are also stored. The central processing unit 1501, the read-only memory 1502, and the random access memory 1503 are connected to each other via a bus 1504. An Input/Output interface 1505 (i.e., an I/O interface) is also connected to the bus 1504.
The following components are connected to the input/output interface 1505: an input section 1506 including a keyboard, a mouse, and the like; an output section 1507 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1508 including a hard disk and the like; and a communication section 1509 including a network interface card such as a LAN card or a modem. The communication section 1509 performs communication processing via a network such as the internet. A drive 1510 is also connected to the input/output interface 1505 as needed. A removable medium 1511, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1510 as needed, so that a computer program read therefrom is installed into the storage section 1508 as needed.
In particular, the processes described in the various method flowcharts may be implemented as computer software programs according to embodiments of the application. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1509, and/or installed from the removable medium 1511. The computer programs, when executed by the central processor 1501, perform the various functions defined in the system of the present application.
It should be noted that, the computer system 1500 of the electronic device shown in fig. 15 is only an example, and should not impose any limitation on the functions and the application scope of the embodiments of the present application.
According to one aspect of the present application, there is provided a computer-readable storage medium storing computer instructions; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the methods provided in the various alternative implementations of the above embodiments.
Alternatively, in the present embodiment, the above-described computer-readable storage medium may be configured to store a computer program for performing the steps of:
S1, carrying out shielding treatment on a first image area of a training image by using a mask area of an initial image mask to obtain a first mask image;
S2, performing model training on the first recognition model by using the first mask image so as to update model parameters of the first recognition model and obtain a second recognition model;
S3, adjusting parameter values of image mask parameters based on a loss of the second recognition model corresponding to the first mask image to obtain a target image mask, and shielding a second image area of the training image by using a mask area of the target image mask to obtain a second mask image, wherein the image mask parameters are used for representing the mask area of the image mask corresponding to the training image, and the loss of the second recognition model corresponding to the first mask image is lower than the loss of the second recognition model corresponding to the second mask image;
S4, performing model training on the second recognition model by using the second mask image so as to update model parameters of the second recognition model and obtain a third recognition model.
Alternatively, in this embodiment, it will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be performed by a program instructing hardware related to a terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: a flash disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and the like.
The integrated units in the above embodiments may be stored in the above-described computer-readable storage medium if implemented in the form of software functional units and sold or used as separate products. Based on such understanding, the technical solution of the present application may be embodied in essence or a part contributing to the prior art or all or part of the technical solution in the form of a software product stored in a storage medium, comprising several instructions for causing one or more computer devices (which may be personal computers, servers or network devices, etc.) to perform all or part of the steps of the method described in the embodiments of the present application.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In several embodiments provided by the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described apparatus embodiments are merely exemplary; for example, the division of the units is merely a logical function division, and another division manner may be adopted in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units or modules, and may be in electrical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or at least two units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The foregoing is merely a preferred embodiment of the present application, and it should be noted that several improvements and modifications may be made by those skilled in the art without departing from the principles of the present application, and these improvements and modifications shall also fall within the protection scope of the present application.

Claims (15)

1. A model training method based on mask images, comprising:
using a mask region of the initial image mask to conduct shielding treatment on a first image region of the training image to obtain a first mask image;
performing model training on a first recognition model by using the first mask image so as to update model parameters of the first recognition model and obtain a second recognition model;
adjusting parameter values of image mask parameters based on loss corresponding to the second recognition model and the first mask image to obtain a target image mask, and performing shielding treatment on a second image area of the training image by using a mask area of the target image mask to obtain a second mask image, wherein the image mask parameters are used for representing the mask area of the image mask corresponding to the training image, and the loss corresponding to the second recognition model and the first mask image is lower than the loss corresponding to the second recognition model and the second mask image;
And performing model training on the second recognition model by using the second mask image so as to update model parameters of the second recognition model and obtain a third recognition model.
2. The method according to claim 1, wherein the adjusting parameter values of image mask parameters based on the loss of the second recognition model corresponding to the first mask image to obtain a target image mask comprises:
inputting the first mask image into the second recognition model to obtain a first recognition result output by the second recognition model;
determining a first function value of a preset loss function corresponding to the first recognition result and a preset recognition result, wherein the preset recognition result is a labeled recognition result corresponding to the training image, and the first function value is used for representing an error between the first recognition result and the preset recognition result;
and updating the parameter value of the image mask parameter along the rising direction of the first parameter gradient based on the first function value to obtain the target image mask, wherein the input parameter of the preset loss function comprises the image mask parameter, and the first parameter gradient is a gradient corresponding to the preset loss function and the image mask parameter.
3. The method according to claim 2, wherein the method further comprises:
and determining the product of the derivative of the preset loss function on the image mask corresponding to the training image and the derivative of the image mask corresponding to the training image on the image mask parameter as the first parameter gradient.
4. The method of claim 2, wherein prior to the masking the first image region of the training image with the mask region of the initial image mask to obtain the first mask image, the method further comprises:
randomly sampling the image mask parameters to obtain initial parameter values of the image mask parameters;
and performing image mask generation operation by using a differentiable activation function based on the initial parameter value of the image mask parameter to obtain the initial image mask, wherein the differentiable activation function is a monotonically increasing function.
5. The method of claim 4, wherein randomly sampling the image mask parameters to obtain initial parameter values for the image mask parameters comprises:
randomly sampling the regional shape parameters in the image mask parameters to obtain initial shape parameter values, wherein the regional shape parameters are used for representing the regional shape of the mask region of the image mask corresponding to the training image;
And randomly sampling the regional position parameters in the image mask parameters to obtain initial position parameter values, wherein the regional position parameters are used for representing the regional positions of mask regions of the image mask corresponding to the training images.
6. The method of claim 5, wherein randomly sampling the region shape parameters of the image mask parameters to obtain initial shape parameter values comprises:
randomly sampling the region size parameter in the image mask parameters to obtain an initial size parameter value, wherein the region size parameter is used for representing the region size of a mask region of the image mask corresponding to the training image;
and randomly sampling the rotation angle parameter in the image mask parameters to obtain an initial rotation angle value, wherein the rotation angle parameter is used for representing the rotation angle of the mask region of the image mask corresponding to the training image along the preset direction by taking the region center point of the mask region of the image mask corresponding to the training image as the center of a circle.
7. The method according to claim 5, wherein randomly sampling the region position parameter in the image mask parameters to obtain the initial position parameter value comprises:
And randomly sampling a central point position parameter in the image mask parameters to obtain the initial position parameter value, wherein the central point position parameter is used for representing the position of the regional central point of the mask region of the image mask corresponding to the training image.
8. The method of claim 4, wherein the mask region indicated by the initial parameter values of the image mask parameters is an initial mask region, the initial parameter values of the image mask parameters comprising: an initial region length value for a region length of the initial mask region, an initial region width value for a region width of the initial mask region, an initial center point abscissa and an initial center point ordinate for a region center point position of the initial mask region, and an initial rotation angle value for representing a rotation angle of the initial mask region, wherein the rotation angle of the initial mask region refers to an angle rotated in a preset direction by taking the region center point of the initial mask region as the center;
the performing an image mask generating operation using a differentiable activation function based on the initial parameter values of the image mask parameters to obtain the initial image mask includes:
The following operations are respectively executed by taking each pixel position in the initial image mask as a current pixel position to obtain the initial image mask, wherein the current pixel position comprises a current pixel abscissa and a current pixel ordinate:
determining, as a pixel value of the current pixel position, the product of the function value of the differentiable activation function for a first reference value and the function value of the differentiable activation function for a second reference value, wherein the first reference value is obtained by adding the product of a first coordinate difference value and a cosine value of the initial rotation angle value to the product of a second coordinate difference value and a sine value of the initial rotation angle value, and then subtracting half of the initial region length value, and the second reference value is obtained by subtracting the product of the first coordinate difference value and the sine value of the initial rotation angle value from the product of the second coordinate difference value and the cosine value of the initial rotation angle value, and then subtracting half of the initial region width value;
the first coordinate difference value is a coordinate difference value between the current pixel abscissa and the initial center point abscissa, and the second coordinate difference value is a coordinate difference value between the initial center point ordinate and the current pixel ordinate.
9. The method of claim 4, wherein performing an image mask generation operation using a differentiable activation function based on the initial parameter values of the image mask parameters to obtain the initial image mask comprises:
and executing image mask generation operation according to the resolution of the training image by using the differentiable activation function based on the initial parameter value of the image mask parameter to obtain the initial image mask, wherein the resolution of the initial image mask is equal to the resolution of the training image.
10. The method of claim 4, wherein performing an image mask generation operation using a differentiable activation function based on the initial parameter values of the image mask parameters results in the initial image mask, comprising one of:
performing an image mask generation operation using a logistic function based on the initial parameter values of the image mask parameters, to obtain the initial image mask;
and executing image mask generation operation by using a preset hyperbolic tangent function based on the initial parameter value of the image mask parameter to obtain the initial image mask, wherein the preset hyperbolic tangent function is a hyperbolic tangent function with offset and scaling items.
11. The method according to any one of claims 1 to 10, wherein the model training a first recognition model using the first mask image to update model parameters of the first recognition model to obtain a second recognition model, comprises:
inputting the first mask image into the first recognition model to obtain a second recognition result output by the first recognition model;
determining a second function value of the preset loss function corresponding to the second recognition result and the preset recognition result, wherein the preset recognition result is a labeled recognition result corresponding to the training image, and the second function value is used for representing an error between the second recognition result and the preset recognition result;
updating the model parameters of the first recognition model along the descending direction of a second parameter gradient based on the second function value to obtain the second recognition model, wherein the input parameters of the preset loss function comprise the model parameters of the first recognition model, and the second parameter gradient is a gradient corresponding to the model parameters of the preset loss function and the first recognition model.
12. A mask image-based model training apparatus, comprising:
The first processing unit is used for carrying out shielding treatment on a first image area of the training image by using a mask area of the initial image mask to obtain a first mask image;
the first training unit is used for carrying out model training on the first recognition model by using the first mask image so as to update model parameters of the first recognition model and obtain a second recognition model;
an adjusting unit, configured to adjust parameter values of image mask parameters based on a loss of the second recognition model corresponding to the first mask image, to obtain a target image mask, where the image mask parameters are used to represent a mask region of the image mask corresponding to the training image;
the second processing unit is used for carrying out shielding treatment on a second image area of the training image by using the mask area of the target image mask to obtain a second mask image, wherein the loss corresponding to the second recognition model and the first mask image is lower than the loss corresponding to the second recognition model and the second mask image;
and the second training unit is used for carrying out model training on the second recognition model by using the second mask image so as to update the model parameters of the second recognition model and obtain a third recognition model.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium comprises a stored program, wherein the program, when run, performs the method of any one of claims 1 to 11.
14. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 11.
15. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1-11 by means of the computer program.
CN202310233464.6A 2023-02-28 2023-02-28 Model training method and device based on mask image and storage medium Pending CN116958725A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310233464.6A CN116958725A (en) 2023-02-28 2023-02-28 Model training method and device based on mask image and storage medium


Publications (1)

Publication Number Publication Date
CN116958725A 2023-10-27

Family

ID=88451765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310233464.6A Pending CN116958725A (en) 2023-02-28 2023-02-28 Model training method and device based on mask image and storage medium

Country Status (1)

Country Link
CN (1) CN116958725A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination