CN115661564A - Training method and device of image processing model, electronic equipment and storage medium - Google Patents

Training method and device of image processing model, electronic equipment and storage medium

Info

Publication number
CN115661564A
Authority
CN
China
Prior art keywords
image
model
sample
image sample
target
Prior art date
Legal status (assumed; not a legal conclusion)
Pending
Application number
CN202211094251.1A
Other languages
Chinese (zh)
Inventor
李剑飞 (Li Jianfei)
Current Assignee
Hangzhou Hikrobot Co Ltd
Original Assignee
Hangzhou Hikrobot Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikrobot Co Ltd
Priority claimed from application CN202211094251.1A
Publication of CN115661564A
Legal status: Pending

Landscapes

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a training method and device of an image processing model, an electronic device, and a storage medium. The method comprises the following steps: acquiring an image sample, where the image sample comprises a target image sample carrying a calibration label and a background image sample without a calibration label; inputting the image sample into a model to be trained, and extracting the image features of the target image sample and the image features of the background image sample; determining a prediction label corresponding to the image sample based on those image features; and adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label until the model to be trained converges, to obtain the image processing model. Because the model is trained on both target image samples and background image samples, the false detection rate of the image processing model is greatly reduced. And because no additional image samples need to be generated from the existing ones, model training is more efficient.

Description

Training method and device of image processing model, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of deep learning model training, and in particular to a training method and device of an image processing model, an electronic device, and a storage medium.
Background
Training a network model on a single scene is simple, but the trained network model is prone to overfitting. After the scene changes, the background changes as well, and the network model may detect interfering objects in the new background, which differs from the original background, as targets; that is, false detections occur, and the generalization ability of the network model is poor.
To improve the generalization ability of a network model, a large number of image samples with rich backgrounds and targets are needed for training. In the related art, a large number of image samples with different backgrounds are generated from existing image samples by an image-sample generation method.
However, because the above method requires an additional image-generation step before model training, the efficiency of model training is relatively low. Moreover, a certain difference exists between a generated image sample and a real image sample, so a network model trained on generated image samples has a high false detection rate and relatively poor detection accuracy.
Disclosure of Invention
The embodiment of the invention aims to provide a training method and device of an image processing model, an electronic device, and a storage medium, so as to improve the efficiency of model training and reduce the false detection rate of the image processing model. The specific technical solution is as follows:
in a first aspect, an embodiment of the present invention provides a method for training an image processing model, where the method includes:
acquiring an image sample, wherein the image sample comprises a target image sample with a calibration label and a background image sample without the calibration label;
inputting the image sample into a model to be trained, and extracting the image characteristics of the target image sample and the image characteristics of the background image sample;
determining a prediction label corresponding to the image sample based on the image characteristics of the target image sample and the image characteristics of the background image sample;
and adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label until the model to be trained converges, to obtain an image processing model.
Optionally, the step of inputting the image sample into a model to be trained, and extracting the image features of the target image sample and the image features of the background image sample includes:
if the image processing model is a model for a mixed scene, inputting the target image sample and the background image sample into a first image feature extractor, and extracting a first image feature of the target image sample and a second image feature of the background image sample;
and if the image processing model is used for a specific scene, inputting the target image sample into a second image feature extractor, extracting a third image feature of the target image sample, inputting the background image sample into a third image feature extractor, and extracting a fourth image feature of the background image sample, wherein the target image sample is an image acquired under a background corresponding to the background image sample.
Optionally, the image processing model is a model for a mixed scene, and the step of adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label includes:
if the current input image is a background image sample without a calibration label, adjusting the learning weight corresponding to a target position in the first image feature extractor in a preset manner so as to reduce the learning weight, wherein the target position is the position at which the first image feature extractor extracts the second image feature from the background image sample;
determining category loss and coordinate regression loss corresponding to the current input image based on the difference between the calibration label and the prediction label and a preset loss function;
determining a back propagation weight corresponding to the category loss based on the adjusted learning weight and the category loss;
and adjusting the model parameters of the model to be trained according to the back propagation weight corresponding to the category loss and the first preset back propagation weight corresponding to the coordinate regression loss.
Optionally, the step of adjusting the learning weight corresponding to the target position in the first image feature extractor according to a preset manner includes:
determining the learning weight α corresponding to the target position in the first image feature extractor according to the following formula:
α = f(n, batch, β)   [formula image BDA0003838238220000021 is not reproduced in the text]
wherein n is the number of background image samples among the image samples, batch is the total number of image samples, and β is a preset user parameter.
Optionally, the image processing model is a model for a specific scene, and the step of determining the prediction label corresponding to the image sample based on the image feature of the target image sample and the image feature of the background image sample includes:
fusing the third image characteristic and the fourth image characteristic to obtain a fused image characteristic;
and determining a prediction label corresponding to the image sample based on the fused image characteristics.
Optionally, the image processing model is a model for a specific scene, and the step of adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label includes:
determining category loss and coordinate regression loss corresponding to the target image sample according to the difference between the calibration label and the prediction label and a preset loss function;
and adjusting the model parameters of the model to be trained on the basis of a second preset back propagation weight corresponding to the category loss and a third preset back propagation weight corresponding to the coordinate regression loss.
Optionally, the method further includes:
acquiring an image to be processed;
inputting the image to be processed into the image processing model to obtain a processing result output by the image processing model; or, alternatively,
and acquiring a background image corresponding to the image to be processed, and inputting the image to be processed and the background image into the image processing model to obtain a processing result output by the image processing model.
In a second aspect, an embodiment of the present invention provides an apparatus for training an image processing model, where the apparatus includes:
the image sample acquisition module is used for acquiring an image sample, wherein the image sample comprises a target image sample with a calibration label and a background image sample without the calibration label;
the image characteristic extraction module is used for inputting the image sample into a model to be trained and extracting the image characteristics of the target image sample and the image characteristics of the background image sample;
the prediction label determining module is used for determining a prediction label corresponding to the image sample based on the image characteristics of the target image sample and the image characteristics of the background image sample;
and the model parameter adjusting module is used for adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label until the model to be trained converges, to obtain the image processing model.
Optionally, the image feature extraction module includes:
a first image feature extraction unit, configured to, when the image processing model is a model for a mixed scene, input the target image sample and the background image sample into a first image feature extractor, and extract a first image feature of the target image sample and a second image feature of the background image sample;
and the second image feature extraction unit is used for inputting the target image sample into a second image feature extractor, extracting a third image feature of the target image sample, inputting the background image sample into the third image feature extractor, and extracting a fourth image feature of the background image sample when the image processing model is a model for a specific scene, wherein the target image sample is an image acquired under a background corresponding to the background image sample.
Optionally, the model parameter adjusting module includes:
a learning weight adjusting unit, configured to, when the image processing model is a model for a mixed scene and a current input image is a background image sample without a calibration label, adjust a learning weight corresponding to a target position in the first image feature extractor according to a preset manner to reduce the learning weight, where the target position is a position at which the first image feature extractor extracts a second image feature from the background image sample;
a first loss determining unit, configured to determine a category loss and a coordinate regression loss corresponding to the current input image based on a difference between the calibration tag and the prediction tag and a preset loss function;
a back propagation weight determination unit configured to determine a back propagation weight corresponding to the category loss based on the adjusted learning weight and the category loss;
and the first model parameter adjusting unit is used for adjusting the model parameters of the model to be trained according to the back propagation weight corresponding to the category loss and the first preset back propagation weight corresponding to the coordinate regression loss.
Optionally, the learning weight adjusting unit is specifically configured to:
determine the learning weight α corresponding to the target position in the first image feature extractor according to the following formula:
α = f(n, batch, β)   [formula image BDA0003838238220000041 is not reproduced in the text]
wherein n is the number of background image samples among the image samples, batch is the total number of image samples, and β is a preset user parameter.
Optionally, the prediction tag determining module includes:
an image feature fusion unit, configured to fuse the third image feature and the fourth image feature to obtain a fused image feature when the image processing model is a model for a specific scene;
and the prediction label determining unit is used for determining a prediction label corresponding to the image sample based on the fused image characteristics.
Optionally, the model parameter adjusting module includes:
a second loss determining unit, configured to determine a category loss and a coordinate regression loss corresponding to the target image sample according to a difference between the calibration label and the prediction label and a preset loss function when the image processing model is a model for a specific scene;
and the second model parameter adjusting unit is used for adjusting the model parameters of the model to be trained on the basis of a second preset back propagation weight corresponding to the category loss and a third preset back propagation weight corresponding to the coordinate regression loss.
Optionally, the apparatus further comprises:
the image to be processed acquisition module is used for acquiring an image to be processed;
the processing result determining module is used for inputting the image to be processed into the image processing model to obtain a processing result output by the image processing model; or for obtaining a background image corresponding to the image to be processed and inputting the image to be processed and the background image into the image processing model to obtain a processing result output by the image processing model.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the steps of the method for training an image processing model according to any one of the first aspect when executing a program stored in the memory.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method for training an image processing model according to any one of the above first aspects.
The embodiment of the invention has the following beneficial effects:
in the solution provided by the embodiment of the invention, the electronic device can acquire an image sample, where the image sample comprises a target image sample with a calibration label and a background image sample without a calibration label; input the image sample into a model to be trained and extract the image features of the target image sample and the image features of the background image sample; determine a prediction label corresponding to the image sample based on those image features; and adjust the model parameters of the model to be trained according to the difference between the calibration label and the prediction label until the model to be trained converges, to obtain the image processing model. According to the solution of the embodiment of the invention, when the electronic device trains the model to be trained, a target image sample with a calibration label and a background image sample without a calibration label are used together, so the trained image processing model can distinguish targets from backgrounds well; the false detection rate of the image processing model is greatly reduced, and its accuracy in image processing is improved. Moreover, existing image samples can be used for training, and no additional image samples need to be generated, so model training is relatively efficient. Of course, not all of the advantages described above need to be achieved at the same time by any one product or method embodying the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and those skilled in the art can obtain other embodiments from these drawings.
FIG. 1 is a flowchart of a method for training an image processing model according to an embodiment of the present invention;
FIG. 2 is a detailed flowchart of step S104 in the embodiment shown in FIG. 1;
FIG. 3 is a detailed flowchart of step S103 in the embodiment shown in FIG. 1;
FIG. 4 is another detailed flowchart of step S104 in the embodiment shown in FIG. 1;
FIG. 5 is a flow chart of an image processing method according to the embodiment shown in FIG. 1;
FIG. 6 is a flowchart illustrating a method for training an image processing model according to the embodiment shown in FIG. 1;
FIG. 7 is a schematic structural diagram of an apparatus for training an image processing model according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived from the embodiments given herein by one of ordinary skill in the art, are within the scope of the invention.
In order to improve the efficiency of model training and reduce the false detection rate of an image processing model, embodiments of the present invention provide a training method and apparatus for an image processing model, an electronic device, a computer-readable storage medium, and a computer program product. First, a training method of an image processing model according to an embodiment of the present invention is described below.
The training method for an image processing model provided by the embodiment of the invention can be applied to any electronic device that needs to train an image processing model, for example a server or a processing device; this is not specifically limited here. For clarity of description, it is hereinafter referred to as the electronic device.
As shown in fig. 1, a method for training an image processing model, the method comprising:
s101, acquiring an image sample.
Wherein the image sample comprises a target image sample with a calibration label and a background image sample without a calibration label.
S102, inputting the image sample into a model to be trained, and extracting the image characteristics of the target image sample and the image characteristics of the background image sample.
S103, determining a prediction label corresponding to the image sample based on the image characteristics of the target image sample and the image characteristics of the background image sample.
S104, adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label until the model to be trained converges, to obtain an image processing model.
Therefore, in the solution provided by the embodiment of the invention, the electronic device can acquire an image sample comprising a target image sample with a calibration label and a background image sample without a calibration label; input the image sample into a model to be trained and extract the image features of the target image sample and the background image sample; determine a prediction label corresponding to the image sample based on those image features; and adjust the model parameters of the model to be trained according to the difference between the calibration label and the prediction label until the model to be trained converges, to obtain the image processing model. Because the target image sample with a calibration label and the background image sample without one are used together to train the model, the trained image processing model can distinguish targets from backgrounds well, its false detection rate is greatly reduced, and its accuracy in image processing is improved. Moreover, existing image samples can be used for training, and no additional image samples need to be generated, so model training is relatively efficient.
When a deep learning model for image processing needs to be trained, the electronic device may acquire target image samples carrying calibration labels and background image samples without calibration labels as the image samples for training the model to be trained, that is, perform step S101 described above. In this way, existing image samples can be used for training and no additional image samples need to be generated, which improves training efficiency. The image processing model obtained by training may be an image classification model, a target detection model, a semantic segmentation model, an instance segmentation model, a panorama segmentation model, or the like; this is not specifically limited here.
After the electronic device obtains the image samples, it may use them to train the model to be trained, that is, perform steps S102 to S104 above until the model to be trained converges, to obtain the image processing model. Because the image processing model is trained on both target image samples and background image samples, it can distinguish targets from backgrounds well, which greatly reduces cases where interfering objects in the background are detected as targets; that is, the false detection rate of the image processing model is reduced and its accuracy in image processing is improved.
Specifically, after acquiring the image sample, the electronic device may input it into the model to be trained and extract its image features, that is, perform step S102. Because the image sample includes a target image sample with a calibration label and a background image sample without one, the extracted features correspondingly include the image features of the target image sample and the image features of the background image sample.
After acquiring the image features of the target image sample and the background image sample, the electronic device may determine the prediction label corresponding to the image sample based on those features, that is, perform step S103. The content of the prediction label depends on the type of the image processing model. For an image classification model, it may include the category of the target in the image sample; for a target detection model, a rectangular box identifying the target to be detected; for a semantic segmentation model, the target segmented from the image sample; and for an instance segmentation model or a panorama segmentation model, the target segmented from the image sample together with its annotation.
After determining the prediction label corresponding to the image sample, the electronic device can judge from the difference between the calibration label and the prediction label whether the current model's processing results meet the preset requirement; if not, training must continue. The electronic device then adjusts the model parameters of the model to be trained according to this difference until the model to be trained converges, to obtain the image processing model, that is, performs step S104.
In an embodiment, the electronic device may judge whether the difference between the calibration label and the prediction label is no greater than a preset threshold. If it is not greater, the model to be trained has converged; training may end, and the model to be trained is used as the image processing model. If it is greater, the current model's processing results do not meet the preset requirement, and the model to be trained needs further training.
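As an illustration only, the following PyTorch-style sketch shows how steps S101 to S104 and the threshold-based convergence check could be organized; the model, loss function, data loader, and threshold value are hypothetical placeholders, not the patent's concrete implementation.

```python
import torch

def train(model, loss_fn, optimizer, data_loader, threshold=0.01, max_epochs=100):
    """Hypothetical sketch of steps S101-S104; every component is a placeholder."""
    for epoch in range(max_epochs):
        total_diff = 0.0
        for images, labels, is_background in data_loader:   # S101: image samples
            predictions = model(images)                      # S102/S103: features and prediction labels
            loss = loss_fn(predictions, labels, is_background)  # S104: label difference
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            total_diff += loss.item()
        # Convergence check described in the text: stop once the average
        # difference between calibration and prediction labels is small enough.
        if total_diff / len(data_loader) <= threshold:
            break
    return model
```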
In the solution provided by the embodiment of the invention, the electronic device trains the model to be trained with both target image samples and background image samples. On the one hand, the image-generation step can be omitted, which improves training efficiency; on the other hand, an image processing model trained on both kinds of samples can distinguish targets from backgrounds well, which greatly reduces its false detection rate and improves its accuracy in image processing.
As an implementation manner of the embodiment of the present invention, the step of inputting the image sample into a model to be trained and extracting the image features of the target image sample and the image features of the background image sample may include:
if the image processing model is a model for a mixed scene, inputting the target image sample and the background image sample into a first image feature extractor, and extracting a first image feature of the target image sample and a second image feature of the background image sample;
and if the image processing model is used for a specific scene, inputting the target image sample into a second image feature extractor, extracting a third image feature of the target image sample, inputting the background image sample into a third image feature extractor, and extracting a fourth image feature of the background image sample.
The image processing model in the embodiment of the present invention may be an image classification model, a target detection model, a semantic segmentation model, an instance segmentation model, or a panorama segmentation model, and may be divided into a model for a mixed scene and a model for a specific scene on the basis of a specific type of model. For example, when the image processing model is an object detection model, the object detection model may be further divided into an object detection model for a mixed scene and an object detection model for a specific scene.
The specific-scene approach trains the target detection model by taking a labeled image and its corresponding background image as an image sample pair, inputting the pair into the target detection model through two inputs, and fusing the image features of the labeled image with those of its corresponding background image inside the model. The mixed-scene approach inputs labeled images and rich background images into the target detection model through a single input; it requires rich background images, but does not require the labeled images and background images to come from the same scene.
That is to say, mixed scene and specific scene actually refer to two methods of solving the background false-detection problem. It is not the case that only the mixed-scene target detection model can detect targets in images from multiple scenes while the specific-scene model can only handle images from one specific scene. In fact, a specific-scene target detection model can also detect targets in images from any scene: it suffices to collect a target-free background image of the new scene and input it, paired with a target image collected in that scene, into the target detection model. Depending on the actual application requirements, either the mixed-scene method or the specific-scene method can be chosen to solve the background false-detection problem.
In one implementation, when the image processing model is a model for mixed scenes, it can be understood as a general-purpose model. In the solution provided by the embodiment of the invention, the electronic device may train the model with target-free background images, and the background image samples and target image samples may share one feature extractor. After acquiring the target image sample and the background image sample, the electronic device therefore inputs both into the first image feature extractor, which extracts the features of the target image sample as the first image feature and the features of the background image sample as the second image feature.
In another embodiment, when the image processing model is a model for a specific scene, after acquiring a background image sample and a target image sample collected against the background corresponding to that background image sample, the electronic device may input them through two branch input networks into their respective feature extractors: the target image sample into the second image feature extractor and the background image sample into the third image feature extractor, so that different feature extractors handle the two kinds of samples. The second image feature extractor extracts the features of the target image sample as the third image feature, and the third image feature extractor extracts the features of the background image sample as the fourth image feature.
In the solution provided by the embodiment of the invention, depending on the type of image processing model, the electronic device can either use one general-purpose image feature extractor for all image samples or use dedicated extractors for the target image samples and background image samples, which makes the scheme flexible. When the image processing model is a model for mixed scenes, the input image samples may come from many scenes, so the trained model is applicable to image processing in different scenes. When the image processing model is a model for a specific scene, although the input image samples come from that scene, different feature extractors are used for the target image samples and the background image samples, and each target image sample is collected against the background corresponding to its background image sample, so the trained model can still be applied to image processing in different scenes.
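The two extractor arrangements described above can be sketched as follows; the module names, the element-wise fusion, and the tensor interfaces are assumptions for illustration, not the patent's concrete network design.

```python
import torch.nn as nn

class MixedSceneModel(nn.Module):
    """Mixed scene: target and background samples share one feature extractor."""
    def __init__(self, extractor: nn.Module, head: nn.Module):
        super().__init__()
        self.extractor = extractor  # the "first image feature extractor"
        self.head = head            # produces the prediction label
    def forward(self, image):
        return self.head(self.extractor(image))

class SpecificSceneModel(nn.Module):
    """Specific scene: target image and paired background image enter separate branches."""
    def __init__(self, target_extractor: nn.Module,
                 background_extractor: nn.Module, head: nn.Module):
        super().__init__()
        self.target_extractor = target_extractor          # the "second" extractor
        self.background_extractor = background_extractor  # the "third" extractor
        self.head = head
    def forward(self, target_image, background_image):
        third_feature = self.target_extractor(target_image)
        fourth_feature = self.background_extractor(background_image)
        # Element-wise fusion is one of the options named later in the text.
        return self.head(third_feature + fourth_feature)
```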
As an implementation manner of the embodiment of the present invention, as shown in fig. 2, when the image processing model is a model for a mixed scene, the step of adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label may include:
s201, if the current input image is a background image sample without a calibration label, adjusting the learning weight corresponding to the target position in the first image feature extractor according to a preset mode to reduce the learning weight.
Because the electronic device uses both target image samples and background image samples, and because for a mixed-scene model both are input into one general-purpose feature extractor (the first image feature extractor), there is an imbalance between the positive and negative samples of the foreground class and the background class.
Specifically, a target image sample contains both foreground samples and background samples, while a background image sample contains only background samples. When both are input into the general-purpose feature extractor, foreground samples are few and background samples are many; that is, the positive and negative samples of the foreground and background classes are unbalanced.
Therefore, when the electronic device determines that the current input image is a background image sample without a calibration label, it can adjust, in a preset manner, the learning weight corresponding to the target position in the first image feature extractor, that is, the weight at the positions where the first image feature extractor extracts the second image feature from the background image sample. This reduces the model's learning weight for background samples, keeping the positive and negative samples of the foreground and background classes balanced during training.
S202, determining category loss and coordinate regression loss corresponding to the current input image based on the difference between the calibration label and the prediction label and a preset loss function;
Where the loss is divided into a category loss and a coordinate regression loss, the electronic device may determine both for the current input image based on the difference between the calibration label and the prediction label and a preset loss function. The category loss represents the difference between the target categories contained in the calibration label and the prediction label; the coordinate regression loss represents the coordinate difference between the targets, segmented from the image sample, contained in those labels.
When calculating the category loss corresponding to the current input image, the preset loss function may be a 0-1 loss function; when calculating the coordinate regression loss, it may be a square loss function or an absolute-value loss function. Both are reasonable and are not specifically limited here.
S203, determining a back propagation weight corresponding to the category loss based on the adjusted learning weight and the category loss;
Because the current input image is a background image sample without a calibration label, it contains no target box, so when the adjusted learning weight is used to update the back-propagation weight of a loss, the coordinate regression loss does not need updating. Therefore, after determining the adjusted learning weight, the category loss, and the coordinate regression loss through steps S201 and S202, the electronic device updates only the back-propagation weight corresponding to the category loss according to the adjusted learning weight.
In one embodiment, after determining the adjusted learning weight and the category loss, the electronic device may update the original back-propagation weight corresponding to the category loss according to the adjusted learning weight, for example by multiplying the adjusted learning weight by the original back-propagation weight; the product is then used as the back-propagation weight corresponding to the category loss.
S204, adjusting the model parameters of the model to be trained according to the back propagation weight corresponding to the category loss and the first preset back propagation weight corresponding to the coordinate regression loss.
After determining, that is, updating, the back-propagation weight corresponding to the category loss in step S203, the electronic device may adjust the model parameters of the model to be trained according to the updated back-propagation weight for the category loss and the original back-propagation weight for the coordinate regression loss, so as to reduce the false detection rate of the image processing model and improve its accuracy in processing images. Steps S201 to S204 are performed repeatedly to keep adjusting the model parameters until the model to be trained converges, yielding the image processing model.
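A minimal sketch of the weighting described in steps S201 to S204, assuming the adjusted learning weight simply multiplies the original class-loss back-propagation weight; the weight values and function signature are placeholders, not the patent's concrete scheme.

```python
def mixed_scene_loss(cls_loss, reg_loss, is_background, alpha, w_cls=1.0, w_reg=1.0):
    """Assumed weighting for the mixed-scene mode (all values are placeholders)."""
    if is_background:
        # A background sample has no target box, so only the class loss
        # propagates, with its weight scaled down by alpha (steps S201/S203).
        return (alpha * w_cls) * cls_loss
    # Target samples keep the original class weight and the first preset
    # back-propagation weight for the coordinate regression loss (step S204).
    return w_cls * cls_loss + w_reg * reg_loss
```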
In the solution provided by the embodiment of the invention, when the image processing model is a model for mixed scenes and the current input image is a background image sample without a calibration label, the electronic device can adjust the learning weight corresponding to the target position in the image feature extractor in a preset manner and use the adjusted weight to update the back-propagation weight corresponding to the category loss. This lowers the model's learning weight for background samples and thereby resolves the imbalance between positive and negative samples of the foreground and background classes that arises when the model is trained with both target image samples and background image samples.
As an implementation manner of the embodiment of the present invention, the step of adjusting the learning weight corresponding to the target position in the first image feature extractor in a preset manner may include:
when the electronic device adjusts the learning weight α corresponding to the target position in the first image feature extractor, it may do so according to a relation of the form
α = f(β, n, batch),
that is, the learning weight is determined jointly by the preset user parameter, the number of background image samples, and the total number of image samples; these quantities are considered together, not simply added.
In one embodiment, the electronic device may specifically determine the learning weight α corresponding to the target position in the first image feature extractor according to the following formula:
α = f(n, batch, β)   [formula image BDA0003838238220000121 is not reproduced in the text]
wherein n is the number of background image samples among the image samples, batch is the total number of image samples, and β is a preset user parameter.
In an embodiment, the preset user parameter β in the above formula may be set to a number smaller than 1, for example 0.999, according to actual requirements; those skilled in the art can adjust β accordingly, and it is not specifically limited here.
In the solution provided by the embodiment of the present invention, the electronic device may adjust the learning weight corresponding to the target position in the first image feature extractor according to the above formula. The higher the proportion of background image samples among the image samples, that is, the larger the ratio n/batch, the more serious the imbalance between the positive and negative samples of the foreground and background classes, and the smaller the learning weight α calculated by the formula; in other words, the lower the learning weight of background-class samples. The positive and negative samples of the foreground and background classes are thus kept balanced during training, which resolves the imbalance problem well.
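Because the formula image is not reproduced in this text, the exact expression cannot be recovered; the sketch below uses one assumed form, alpha = 1 - beta * (n / batch), chosen only because it matches the stated behavior (α shrinks as the background ratio n/batch grows, with β a user parameter close to 1).

```python
def background_learning_weight(n: int, batch: int, beta: float = 0.999) -> float:
    """Assumed form of the learning-weight formula; the patent's own formula
    image is not reproduced here.  alpha = 1 - beta * (n / batch) merely
    reproduces the described behavior: the larger the share of background
    image samples n/batch, the smaller the learning weight alpha."""
    return 1.0 - beta * (n / batch)
```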
As an implementation manner of the embodiment of the present invention, as shown in fig. 3, when the image processing model is a model for a specific scene, the step of determining the prediction label corresponding to the image sample based on the image feature of the target image sample and the image feature of the background image sample may include:
s301, fusing the third image characteristic and the fourth image characteristic to obtain a fused image characteristic;
When the image processing model is a model for a specific scene, the electronic device extracts the features of the target image sample and the background image sample with the second and third image feature extractors, respectively. Therefore, before determining the prediction label corresponding to the image sample from the image features, the electronic device fuses the third image feature and the fourth image feature to obtain the fused image feature.
For example, the electronic device may add the elements of the third image feature and the fourth image feature using a network structure such as FPN (Feature Pyramid Network), ResNet (deep residual network), or SENet (Squeeze-and-Excitation Network) to obtain the fused image feature, or may concatenate the feature map of the third image feature with that of the fourth image feature using a network structure such as DenseNet (Dense Convolutional Network). Both are reasonable and are not specifically limited here.
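A sketch of the two fusion options named above (element-wise addition versus channel-wise concatenation); the helper function and its signature are illustrative assumptions, not the patent's concrete fusion network.

```python
import torch

def fuse_features(third_feature: torch.Tensor, fourth_feature: torch.Tensor,
                  mode: str = "add") -> torch.Tensor:
    """Illustrative fusion of the third and fourth image features."""
    if mode == "add":
        # Element-wise addition, as in FPN/ResNet/SENet-style merging.
        return third_feature + fourth_feature
    # Channel-wise concatenation, as in DenseNet-style merging (NCHW layout).
    return torch.cat([third_feature, fourth_feature], dim=1)
```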
S302, based on the fused image features, determining a prediction label corresponding to the image sample.
After obtaining the fused image feature, the electronic device may determine the prediction label corresponding to the image sample based on it. As before, the content of the prediction label depends on the type of the image processing model, for example the category of the target in the image sample, a rectangular box around the target to be detected, or the target segmented from the image sample; the label content can be set according to actual requirements and the specific model type, and is not specifically limited here.
In the solution provided by the embodiment of the invention, when the image processing model is a model for a specific scene, the electronic device can fuse the image features of the target image sample and the background image sample and determine the prediction label corresponding to the image sample based on the fused image features, which reduces the false detection rate of the image processing model and improves its accuracy in image processing.
As an implementation manner of the embodiment of the present invention, as shown in fig. 4, when the image processing model is a model for a specific scene, the step of adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label includes:
s401, determining category loss and coordinate regression loss corresponding to the target image sample according to the difference between the calibration label and the prediction label and a preset loss function;
Because, when the image processing model is a model for a specific scene, the electronic device processes the target image sample and the background image sample separately and the background image sample carries no calibration label, the electronic device may compute only the loss corresponding to the target image sample when determining the loss of the image sample from the difference between the calibration label and the prediction label and the preset loss function.
Where the loss is divided into a category loss and a coordinate regression loss, the electronic device may determine both for the target image sample based on the difference between the calibration label and the prediction label and the preset loss function. The category loss represents the difference between the target categories contained in the calibration label and the prediction label; the coordinate regression loss represents the coordinate difference between the targets, segmented from the target image sample, contained in those labels.
When calculating the category loss corresponding to the target image sample, the preset loss function may be a 0-1 loss function, and when calculating the coordinate regression loss corresponding to the target image sample, the preset loss function may be a square loss function or an absolute value loss function, which is reasonable and not specifically limited herein.
S402, adjusting the model parameters of the model to be trained based on a second preset back propagation weight corresponding to the category loss and a third preset back propagation weight corresponding to the coordinate regression loss.
When the image processing model is a model for a specific scene, the electronic device inputs the target image sample and the background image sample into different feature extractors, and the background image sample and the target image collected against its corresponding background are input to the model in pairs. Balanced learning over the positive and negative samples of the foreground and background classes can therefore be achieved by presetting different learning weights for the different feature extractors. As a result, after obtaining the category loss and coordinate regression loss corresponding to the target image sample, the electronic device may directly adjust the model parameters of the model to be trained based on the second preset back-propagation weight for the category loss and the third preset back-propagation weight for the coordinate regression loss, without updating the back-propagation weights.
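For comparison with the mixed-scene case, a one-line sketch of the fixed weighting described here; the second and third preset back-propagation weights are placeholder values, not the patent's concrete settings.

```python
def specific_scene_loss(cls_loss, reg_loss, w_cls=1.0, w_reg=1.0):
    """Assumed fixed weighting for the specific-scene mode; no per-batch weight
    update is needed because sample balance is handled by the two separate
    feature extractors (the weights here are placeholders)."""
    return w_cls * cls_loss + w_reg * reg_loss
```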
In the solution provided by the embodiment of the invention, the electronic device can determine the category loss and the coordinate regression loss corresponding to the target image sample from the difference between the calibration label and the prediction label and the preset loss function, and then adjust the model parameters of the model to be trained based on the second preset back-propagation weight corresponding to the category loss and the third preset back-propagation weight corresponding to the coordinate regression loss. This reduces the false detection rate of the image processing model, improves its accuracy in processing images, and allows the model to be trained to converge rapidly.
As an implementation manner of the embodiment of the present invention, as shown in fig. 5, the method may further include:
s501, acquiring an image to be processed.
After completing the model training process and obtaining the image processing model, the electronic device may acquire an image to be processed and process it with the image processing model. For example, when the image processing model is an image classification model, the electronic device classifies the target in the image to be processed; when it is a target detection model, the electronic device detects the specific target in the image to be processed. This is not specifically limited here.
S502, inputting the image to be processed into the image processing model to obtain a processing result output by the image processing model.
S503, obtaining a background image corresponding to the image to be processed, and inputting the image to be processed and the background image into the image processing model to obtain a processing result output by the image processing model.
Because the image processing model may be a model for mixed scenes or a model for a specific scene, what is input to the image processing model when processing the image to be processed depends on the specific type of the model.
When the image processing model is a model for mixed scenes, the background image samples and target image samples shared one feature extractor during training and could be input separately into the model to be trained. For such a model, the electronic device can therefore directly input the image to be processed into the image processing model and obtain the processing result it outputs.
When the image processing model is a model for a specific scene, different feature extractors were used for the background image samples and target image samples during training, and a background image sample and a target image sample collected against the corresponding background were input into the model to be trained together as an image sample pair. For such a model, the electronic device therefore obtains the background image corresponding to the image to be processed and inputs the image to be processed and the background image into the image processing model together as an image pair, obtaining the processing result output by the model.
In the solution provided by the embodiment of the invention, after finishing training the model to be trained and obtaining the image processing model, the electronic device can acquire an image to be processed and process it with the image processing model. When the image processing model is a model for mixed scenes, the input image samples came from many scenes, so the trained model is applicable to image processing in different scenes. When the image processing model is a model for a specific scene, although it was trained on image samples from one scene, different feature extractors extract the image features of the target image sample and the background image sample, and the target image sample is collected against the background corresponding to the background image sample, so the trained model can still be applied to image processing in various scenes.
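A sketch of the two inference paths in steps S502 and S503, assuming the model classes from the earlier sketch; the dispatch logic is illustrative only.

```python
def run_inference(model, image, background_image=None):
    """S502: a mixed-scene model takes the image to be processed alone.
    S503: a specific-scene model takes the image paired with its background."""
    if background_image is None:
        return model(image)
    return model(image, background_image)
```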
Fig. 6 is an example of a training method for an image processing model according to an embodiment of the present invention.
The electronic device may perform steps S601 and S602 to collect target image samples and background image samples, then perform step S603 to use them as the image sample set for training the model to be trained, and then perform step S604 to judge whether the model to be trained is a model for mixed scenes.
If it is a model for mixed scenes, step S605 is performed: a general-purpose feature extractor extracts the image features of the image sample. Step S606 then judges whether the image sample currently undergoing feature extraction is a background image sample. If so, step S607 reduces the background learning weight and step S608 adjusts the parameters according to a preset loss function; if not, step S608 is performed directly. Step S612 then judges whether the number of training iterations has reached the preset count or whether the model has converged. If so, training ends and the trained model is used as the image processing model; if not, the flow returns to step S605 to continue extracting image features and training the model.
If it is not a model for mixed scenes, that is, it is a model for a specific scene, step S609 is performed: the target image sample and background image sample are input to the network through two branches. Step S610 then extracts their image features with two different feature extractors, step S611 adjusts the parameters according to a preset loss function, and step S612 judges whether the number of training iterations has reached the preset count or whether the model has converged. If so, training ends and the trained model is used as the image processing model; if not, the flow returns to step S610 to continue extracting features with the different extractors and training the model.
In the scheme provided by the embodiment of the invention, the electronic device can combine target image samples with labels and background image samples without labels into an image sample set, and no image samples need to be additionally generated, so the method is simple to implement. By allowing background image samples without targets to serve as training samples, setting a training-weight adaptive adjustment loop, and adaptively adjusting the training learning weight, the electronic device not only eliminates false detections but also alleviates the imbalance between positive and negative samples and ensures a high recall rate. In addition, two training modes are provided: mixed scene and specific scene. In the mixed-scene training mode, image samples are input to the network without distinguishing scenes, the target image samples and the background image samples share one network feature extractor, and the background learning weight is adaptively adjusted, which suits the training of a general model. In the specific-scene training mode, target image samples and background image samples are input in pairs, each background image sample being a target-free image from the same scene as its target image sample; this suits scenes with a single background, and when the scene changes the model need not be retrained, since only a new background image needs to be acquired.
Corresponding to the above training method of an image processing model, an embodiment of the present invention further provides a training apparatus of an image processing model, which is described below.
As shown in fig. 7, an apparatus for training an image processing model, the apparatus comprising:
an image sample acquiring module 710, configured to acquire an image sample, where the image sample includes a target image sample with a calibration label and a background image sample without the calibration label;
an image feature extraction module 720, configured to input the image sample into a model to be trained, and extract an image feature of the target image sample and an image feature of the background image sample;
a prediction label determining module 730, configured to determine a prediction label corresponding to the image sample based on the image feature of the target image sample and the image feature of the background image sample;
and a model parameter adjusting module 740, configured to adjust a model parameter of the model to be trained according to a difference between the calibration label and the prediction label until the model to be trained converges, so as to obtain an image processing model.
Therefore, in the scheme provided by the embodiment of the invention, the electronic equipment can obtain the image sample, wherein the image sample comprises the target image sample with the calibration label and the background image sample without the calibration label; inputting the image sample into a model to be trained, and extracting the image characteristics of the target image sample and the image characteristics of the background image sample; determining a prediction label corresponding to the image sample based on the image characteristics of the target image sample and the image characteristics of the background image sample; and adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label until the model to be trained is converged to obtain the image processing model. According to the scheme of the embodiment of the invention, when the electronic equipment trains the model to be trained, the target image sample with the calibration label and the background image sample without the calibration label are adopted to train the model to be trained together, so that the image processing model obtained by training can well distinguish the target from the background, the false detection rate of the image processing model is greatly reduced, and the accuracy of the image processing model in image processing is improved. And because the image samples used for training can use the existing image samples, and the image samples do not need to be additionally generated, the efficiency of model training is higher.
As an implementation manner of the embodiment of the present invention, the image feature extraction module 720 may include:
a first image feature extraction unit, configured to, when the image processing model is a model for a mixed scene, input the target image sample and the background image sample into a first image feature extractor, and extract a first image feature of the target image sample and a second image feature of the background image sample;
and the second image feature extraction unit is used for inputting the target image sample into a second image feature extractor, extracting a third image feature of the target image sample, inputting the background image sample into a third image feature extractor, and extracting a fourth image feature of the background image sample when the image processing model is a model for a specific scene, wherein the target image sample is an image acquired under a background corresponding to the background image sample.
As an implementation manner of the embodiment of the present invention, the model parameter adjusting module 740 may include:
a learning weight adjusting unit, configured to, when the image processing model is a model for a mixed scene and a current input image is a background image sample without a calibration label, adjust a learning weight corresponding to a target position in the first image feature extractor according to a preset manner to reduce the learning weight, where the target position is a position at which the first image feature extractor extracts a second image feature from the background image sample;
a first loss determining unit, configured to determine a category loss and a coordinate regression loss corresponding to the current input image based on a difference between the calibration tag and the prediction tag and a preset loss function;
a back propagation weight determination unit configured to determine a back propagation weight corresponding to the category loss based on the adjusted learning weight and the category loss;
and the first model parameter adjusting unit is used for adjusting the model parameters of the model to be trained according to the back propagation weight corresponding to the category loss and the first preset back propagation weight corresponding to the coordinate regression loss.
As an implementation manner of the embodiment of the present invention, the learning weight adjusting unit may be specifically configured to:
determining the learning weight α corresponding to the target position in the first image feature extractor according to the following formula:

(formula image BDA0003838238220000181 not reproduced; the text specifies only that α is a function of n, batch, and β)

wherein n is the number of background image samples included in the image samples, batch is the total number of image samples, and β is a preset user parameter.
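Because only the inputs of the formula survive in this text, the sketch below illustrates just the role of the adjusted weight: it scales the class loss relative to the coordinate-regression loss during back-propagation. `alpha_fn` is a hypothetical placeholder for the expression above, not the embodiment's actual formula:

```python
def weighted_backprop_loss(cls_loss, reg_loss, n, batch, beta, alpha_fn, reg_weight=1.0):
    # alpha_fn stands in for the α formula, known only to take n (background
    # samples in the batch), batch (total samples), and β (user parameter).
    alpha = alpha_fn(n, batch, beta)
    # The class loss is back-propagated with weight α, while the coordinate
    # regression loss keeps its first preset back-propagation weight.
    return alpha * cls_loss + reg_weight * reg_loss
```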
As an implementation manner of the embodiment of the present invention, the prediction tag determining module 730 may include:
the image feature fusion unit is used for fusing the third image feature and the fourth image feature to obtain a fused image feature under the condition that the image processing model is a model for a specific scene;
and the prediction label determining unit is used for determining a prediction label corresponding to the image sample based on the fused image characteristics.
As an implementation manner of the embodiment of the present invention, the model parameter adjusting module 740 may further include:
a second loss determining unit, configured to determine a category loss and a coordinate regression loss corresponding to the target image sample according to a difference between the calibration label and the prediction label and a preset loss function when the image processing model is a model for a specific scene;
and the second model parameter adjusting unit is used for adjusting the model parameters of the model to be trained on the basis of a second preset back propagation weight corresponding to the category loss and a third preset back propagation weight corresponding to the coordinate regression loss.
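In the specific-scene branch, by contrast, both back-propagation weights are fixed hyperparameters rather than adaptively adjusted. A minimal sketch, with default values that are illustrative assumptions rather than values from the embodiment:

```python
def specific_scene_loss(cls_loss, reg_loss, cls_weight=1.0, reg_weight=1.0):
    # cls_weight and reg_weight play the role of the second and third preset
    # back-propagation weights; the defaults here are placeholders.
    return cls_weight * cls_loss + reg_weight * reg_loss
```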
As an implementation manner of the embodiment of the present invention, the apparatus may further include:
the image to be processed acquisition module is used for acquiring an image to be processed;
the processing result determining module is used for inputting the image to be processed into the image processing model to obtain a processing result output by the image processing model; or, the image processing module is configured to obtain a background image corresponding to the image to be processed, and input the image to be processed and the background image into the image processing model to obtain a processing result output by the image processing model.
An embodiment of the present invention further provides an electronic device, as shown in Fig. 8, which includes a processor 801, a communication interface 802, a memory 803, and a communication bus 804, where the processor 801, the communication interface 802, and the memory 803 communicate with one another through the communication bus 804,
a memory 803 for storing a computer program;
the processor 801 is configured to implement the steps of the image processing model training method according to any one of the embodiments when executing the program stored in the memory 803.
Therefore, in the scheme provided by the embodiment of the invention, the electronic equipment can obtain the image sample, wherein the image sample comprises the target image sample with the calibration label and the background image sample without the calibration label; inputting the image sample into a model to be trained, and extracting the image characteristics of the target image sample and the image characteristics of the background image sample; determining a prediction label corresponding to the image sample based on the image characteristics of the target image sample and the image characteristics of the background image sample; and adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label until the model to be trained is converged to obtain the image processing model. According to the scheme of the embodiment of the invention, when the electronic equipment trains the model to be trained, the target image sample with the calibration label and the background image sample without the calibration label are adopted to train the model to be trained together, so that the image processing model obtained by training can well distinguish the target from the background, the false detection rate of the image processing model is greatly reduced, and the accuracy of the image processing model in processing the image is improved. Moreover, the existing image samples can be used for training, and the image samples do not need to be additionally generated, so that the efficiency of model training is relatively high.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this is not intended to represent only one bus or type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In a further embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the training method for the image processing model according to any of the above embodiments.
In a further embodiment of the present invention, there is also provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method steps of training an image processing model as described in any of the above embodiments.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or another programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that incorporates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the apparatus, the electronic device, the computer-readable storage medium, and the computer program product embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiments.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (11)

1. A method of training an image processing model, the method comprising:
acquiring an image sample, wherein the image sample comprises a target image sample with a calibration label and a background image sample without the calibration label;
inputting the image sample into a model to be trained, and extracting the image characteristics of the target image sample and the image characteristics of the background image sample;
determining a prediction label corresponding to the image sample based on the image characteristics of the target image sample and the image characteristics of the background image sample;
and adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label until the model to be trained is converged to obtain an image processing model.
2. The method of claim 1, wherein the step of inputting the image sample into a model to be trained and extracting the image features of the target image sample and the image features of the background image sample comprises:
if the image processing model is a model for a mixed scene, inputting the target image sample and the background image sample into a first image feature extractor, and extracting a first image feature of the target image sample and a second image feature of the background image sample;
and if the image processing model is a model for a specific scene, inputting the target image sample into a second image feature extractor, extracting a third image feature of the target image sample, inputting the background image sample into a third image feature extractor, and extracting a fourth image feature of the background image sample, wherein the target image sample is an image acquired under a background corresponding to the background image sample.
3. The method of claim 2, wherein the image processing model is a model for a hybrid scene, and the step of adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label comprises:
if the current input image is a background image sample without a calibration label, adjusting the learning weight corresponding to a target position in the first image feature extractor according to a preset mode to reduce the learning weight, wherein the target position is a position at which the first image feature extractor extracts a second image feature from the background image sample;
determining category loss and coordinate regression loss corresponding to the current input image based on the difference between the calibration label and the prediction label and a preset loss function;
determining a back propagation weight corresponding to the category loss based on the adjusted learning weight and the category loss;
and adjusting the model parameters of the model to be trained according to the back propagation weight corresponding to the category loss and the first preset back propagation weight corresponding to the coordinate regression loss.
4. The method according to claim 3, wherein the step of adjusting the learning weight corresponding to the target position in the first image feature extractor according to a preset manner comprises:
determining the learning weight α corresponding to the target position in the first image feature extractor according to the following formula:

(formula image FDA0003838238210000021 not reproduced; the text specifies only that α is a function of n, batch, and β)

wherein n is the number of background image samples included in the image samples, batch is the total number of image samples, and β is a preset user parameter.
5. The method of claim 2, wherein the image processing model is a model for a specific scene, and the step of determining the prediction label corresponding to the image sample based on the image features of the target image sample and the image features of the background image sample comprises:
fusing the third image characteristic and the fourth image characteristic to obtain a fused image characteristic;
and determining a prediction label corresponding to the image sample based on the fused image characteristics.
6. The method of claim 5, wherein the image processing model is a model for a specific scene, and the step of adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label comprises:
determining category loss and coordinate regression loss corresponding to the target image sample according to the difference between the calibration label and the prediction label and a preset loss function;
and adjusting the model parameters of the model to be trained based on a second preset back propagation weight corresponding to the category loss and a third preset back propagation weight corresponding to the coordinate regression loss.
7. The method according to any one of claims 1-6, further comprising:
acquiring an image to be processed;
inputting the image to be processed into the image processing model to obtain a processing result output by the image processing model; or,
and acquiring a background image corresponding to the image to be processed, and inputting the image to be processed and the background image into the image processing model to obtain a processing result output by the image processing model.
8. An apparatus for training an image processing model, the apparatus comprising:
the image sample acquisition module is used for acquiring an image sample, wherein the image sample comprises a target image sample with a calibration label and a background image sample without the calibration label;
the image characteristic extraction module is used for inputting the image sample into a model to be trained and extracting the image characteristics of the target image sample and the image characteristics of the background image sample;
the prediction label determining module is used for determining a prediction label corresponding to the image sample based on the image characteristics of the target image sample and the image characteristics of the background image sample;
and the model parameter adjusting module is used for adjusting the model parameters of the model to be trained according to the difference between the calibration label and the prediction label until the model to be trained is converged to obtain the image processing model.
9. The apparatus of claim 8, wherein the image feature extraction module comprises:
a first image feature extraction unit, configured to, when the image processing model is a model for a mixed scene, input the target image sample and the background image sample into a first image feature extractor, and extract a first image feature of the target image sample and a second image feature of the background image sample;
a second image feature extraction unit, configured to, when the image processing model is a model for a specific scene, input the target image sample into a second image feature extractor, extract a third image feature of the target image sample, input the background image sample into a third image feature extractor, and extract a fourth image feature of the background image sample, where the target image sample is an image acquired under a background corresponding to the background image sample; and/or,
the model parameter adjustment module comprises:
a learning weight adjusting unit, configured to, when the image processing model is a model for a mixed scene and a current input image is a background image sample without a calibration label, adjust a learning weight corresponding to a target position in the first image feature extractor according to a preset manner to reduce the learning weight, where the target position is a position at which the first image feature extractor extracts a second image feature from the background image sample;
a first loss determining unit, configured to determine a category loss and a coordinate regression loss corresponding to the current input image based on a difference between the calibration tag and the prediction tag and a preset loss function;
a back propagation weight determination unit configured to determine a back propagation weight corresponding to the category loss based on the adjusted learning weight and the category loss;
the first model parameter adjusting unit is used for adjusting the model parameters of the model to be trained according to the back propagation weight corresponding to the category loss and the first preset back propagation weight corresponding to the coordinate regression loss; and/or,
the learning weight adjustment unit is specifically configured to:
determining the learning weight α corresponding to the target position in the first image feature extractor according to the following formula:

(formula image FDA0003838238210000031 not reproduced; the text specifies only that α is a function of n, batch, and β)

wherein n is the number of background image samples included in the image samples, batch is the total number of image samples, and β is a preset user parameter; and/or,
the predictive tag determination module includes:
the image feature fusion unit is used for fusing the third image feature and the fourth image feature to obtain a fused image feature under the condition that the image processing model is a model for a specific scene;
a prediction label determining unit, configured to determine a prediction label corresponding to the image sample based on the fused image feature; and/or,
the model parameter adjustment module comprises:
a second loss determining unit, configured to determine a category loss and a coordinate regression loss corresponding to the target image sample according to a difference between the calibration label and the prediction label and a preset loss function when the image processing model is a model for a specific scene;
a second model parameter adjusting unit, configured to adjust the model parameters of the model to be trained based on a second preset back propagation weight corresponding to the category loss and a third preset back propagation weight corresponding to the coordinate regression loss; and/or,
the device further comprises:
the image to be processed acquisition module is used for acquiring an image to be processed;
the processing result determining module is used for inputting the image to be processed into the image processing model to obtain a processing result output by the image processing model; or, the image processing module is configured to obtain a background image corresponding to the image to be processed, and input the image to be processed and the background image into the image processing model to obtain a processing result output by the image processing model.
10. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 7 when executing a program stored in the memory.
11. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 7.
CN202211094251.1A 2022-09-08 2022-09-08 Training method and device of image processing model, electronic equipment and storage medium Pending CN115661564A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211094251.1A CN115661564A (en) 2022-09-08 2022-09-08 Training method and device of image processing model, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211094251.1A CN115661564A (en) 2022-09-08 2022-09-08 Training method and device of image processing model, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115661564A true CN115661564A (en) 2023-01-31

Family

ID=84984283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211094251.1A Pending CN115661564A (en) 2022-09-08 2022-09-08 Training method and device of image processing model, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115661564A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664966A (en) * 2023-03-27 2023-08-29 北京鹰之眼智能健康科技有限公司 Infrared image processing system
CN116664966B (en) * 2023-03-27 2024-02-20 北京鹰之眼智能健康科技有限公司 Infrared image processing system

Similar Documents

Publication Publication Date Title
WO2020239015A1 (en) Image recognition method and apparatus, image classification method and apparatus, electronic device, and storage medium
CN109344908B (en) Method and apparatus for generating a model
CN111079570B (en) Human body key point identification method and device and electronic equipment
WO2020253127A1 (en) Facial feature extraction model training method and apparatus, facial feature extraction method and apparatus, device, and storage medium
CN109492128B (en) Method and apparatus for generating a model
CN109308490B (en) Method and apparatus for generating information
CN110321845B (en) Method and device for extracting emotion packets from video and electronic equipment
KR102576344B1 (en) Method and apparatus for processing video, electronic device, medium and computer program
CN111460155B (en) Knowledge graph-based information credibility assessment method and device
CN111476863B (en) Method and device for coloring black-and-white cartoon, electronic equipment and storage medium
CN109885796B (en) Network news matching detection method based on deep learning
CN110909663B (en) Human body key point identification method and device and electronic equipment
CN111931859B (en) Multi-label image recognition method and device
CN113326821B (en) Face driving method and device for video frame image
CN112149545B (en) Sample generation method, device, electronic equipment and storage medium
CN111144215A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114996511A (en) Training method and device for cross-modal video retrieval model
CN115661564A (en) Training method and device of image processing model, electronic equipment and storage medium
CN110633717A (en) Training method and device for target detection model
US20190108416A1 (en) Methods for more effectively moderating one or more images and devices thereof
CN114758199A (en) Training method, device, equipment and storage medium for detection model
CN115063656A (en) Image detection method and device, computer readable storage medium and electronic equipment
CN110069651B (en) Picture screening method and device and storage medium
CN114091551A (en) Pornographic image identification method and device, electronic equipment and storage medium
WO2020135054A1 (en) Method, device and apparatus for video recommendation and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination