CN113538441A - Image segmentation model processing method, image processing method and device


Publication number
CN113538441A
Authority
CN
China
Prior art keywords
segmentation
image
sample image
trained
model
Prior art date
Legal status
Pending
Application number
CN202110012485.6A
Other languages
Chinese (zh)
Inventor
甘振业
姚亮
冯云龙
章吴浩
李剑
朱俊伟
邰颖
汪铖杰
李季檩
黄飞跃
黄小明
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110012485.6A
Publication of CN113538441A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/13 Edge detection
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning

Abstract

The application relates to a processing method of an image segmentation model, an image processing method, and corresponding apparatus, involving image segmentation technology in the field of computer vision. The method includes: acquiring a sample image of a target scene; performing image segmentation on the sample image through a lightweight segmentation model to be trained to obtain a first segmentation result of the sample image in the lightweight segmentation model; performing image segmentation on the sample image through a trained teacher segmentation model to obtain a second segmentation result of the sample image in the trained teacher segmentation model; constructing a knowledge distillation loss function according to the first segmentation result and the second segmentation result; and, when the sample image is an unlabeled sample image, updating the model parameters of the lightweight segmentation model according to the knowledge distillation loss function and continuing training until a lightweight segmentation model for the target scene is obtained. The method saves labeling cost, frees the optimization of the lightweight segmentation model from the limitations of labeling, and greatly speeds up model iteration.

Description

Image segmentation model processing method, image processing method and device
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to a method and an apparatus for processing an image segmentation model, and a method and an apparatus for processing an image.
Background
With the development of deep learning, computer vision technology has advanced rapidly and new models emerge constantly. Computer vision is a challenging and important research area in engineering and science. Image segmentation is a classic problem in computer vision research: it divides an image into several regions according to image features; put simply, it separates the objects in an image from the background.
At present, deep learning models achieve increasingly good results in image segmentation, but high-performing deep models keep growing in network size, structural complexity, and number of parameters, and the hardware resources they require for prediction and training grow accordingly, so they can only run on machines with high computing power. Because of limits on hardware resources and computing power, practical applications more often require a lightweight segmentation model that is smaller and faster.
Compared with large models, a lightweight segmentation model is smaller and has fewer parameters, so its generalization performance is poorer. At present, to improve the effect of a lightweight segmentation model in different scenes, model optimization training is usually performed on it using labeled data of the corresponding scene. Every round of optimization training is therefore constrained by the labeled data of a specific scene, the labeling cost is high, and model optimization is very inefficient.
Disclosure of Invention
In view of the above, it is necessary to provide a processing method, apparatus, computer device, and storage medium for an image segmentation model, as well as an image processing method, apparatus, computer device, and storage medium, that can improve the optimization efficiency of a lightweight segmentation model and reduce its optimization cost.
A method of processing an image segmentation model, the method comprising:
acquiring a sample image of a target scene;
performing image segmentation on the sample image through a lightweight segmentation model to be trained to obtain a first segmentation result of the sample image in the lightweight segmentation model to be trained;
performing image segmentation on the sample image through a trained teacher segmentation model to obtain a second segmentation result of the sample image in the trained teacher segmentation model;
constructing a knowledge distillation loss function according to the first segmentation result and the second segmentation result;
and when the sample image is an unlabeled sample image, updating the model parameters of the lightweight segmentation model to be trained according to the knowledge distillation loss function and continuing training until a training stop condition is met, to obtain a trained lightweight segmentation model suitable for the target scene.
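For illustration, a minimal PyTorch-style sketch of one such training step on unlabeled sample images follows; the module names, the optimizer, and the use of a plain K-L divergence term in place of the full knowledge distillation loss function described below are assumptions, not prescriptions of the application.

```python
# Hedged sketch of one training step: the student is the lightweight
# segmentation model to be trained, the teacher is the trained teacher
# segmentation model. Both are hypothetical stand-in modules.
import torch
import torch.nn.functional as F

def distillation_step(student, teacher, optimizer, images):
    student_logits = student(images)          # first segmentation result
    with torch.no_grad():                     # teacher parameters stay fixed
        teacher_logits = teacher(images)      # second segmentation result

    # Knowledge distillation loss: K-L divergence between the teacher's
    # per-pixel class distribution (soft labels) and the student's.
    p_teacher = F.softmax(teacher_logits, dim=1)
    log_q_student = F.log_softmax(student_logits, dim=1)
    loss = F.kl_div(log_q_student, p_teacher, reduction="batchmean")

    optimizer.zero_grad()
    loss.backward()                           # only the student is updated
    optimizer.step()
    return loss.item()
```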
In one embodiment, the performing image segmentation on the sample image through the trained teacher segmentation model to obtain a second segmentation result of the sample image in the trained teacher segmentation model includes:
inputting the sample image into each of a plurality of trained teacher segmentation models;
performing image segmentation on the sample image through each of the trained teacher segmentation models to obtain multiple sets of segmentation results of the sample image in the trained teacher segmentation models;
and integrating the multiple sets of segmentation results output by the trained teacher segmentation models to obtain the second segmentation result.
In one embodiment, the integrating the multiple sets of segmentation results output by the trained teacher segmentation models to obtain the second segmentation result includes:
calculating, for each pixel, the mean of its classification probabilities across the trained teacher segmentation models according to the classification probability of each pixel in the sample image represented by each set of segmentation results in the corresponding teacher segmentation model;
and obtaining a second segmentation result according to the classification probability mean value corresponding to each pixel in the sample image.
In one embodiment, said constructing a knowledge distillation loss function from said first and second segmentation results comprises:
taking the first segmentation result as prediction information of the sample image, and taking the second segmentation result as soft labeling information of the sample image;
respectively calculating a weighted divergence loss and a soft-label region mutual information loss corresponding to the sample image according to the prediction information and the soft labeling information;
and performing a weighted summation of the weighted divergence loss and the soft-label region mutual information loss to obtain the knowledge distillation loss function corresponding to the sample image.
In one embodiment, the calculating the corresponding weighted divergence loss of the sample image according to the prediction information and the soft labeling information includes:
determining edge pixels and non-edge pixels in the sample image according to the second segmentation result;
acquiring a first weight corresponding to the edge pixel and a second weight corresponding to the non-edge pixel;
and calculating the corresponding weighted divergence loss of the sample image according to the first weight, the second weight, the prediction information and the soft labeling information.
In one embodiment, the method further comprises:
when the sample image is a labeled sample image, calculating a real annotation loss function according to the first segmentation result and the real annotation information of the labeled sample image;
constructing a total loss function according to the knowledge distillation loss function and the real annotation loss function;
and updating the model parameters of the lightweight segmentation model to be trained according to the total loss function and then continuing training until the training stopping condition is met, thereby obtaining the trained lightweight segmentation model suitable for the target scene.
In one embodiment, the calculating a real annotation loss function according to the first segmentation result and the real annotation information of the labeled sample image includes:
respectively calculating a real cross-entropy loss and a real region mutual information loss corresponding to the labeled sample image according to the first segmentation result and the real annotation information of the labeled sample image;
and performing a weighted summation of the real cross-entropy loss and the real region mutual information loss to obtain the real annotation loss function corresponding to the labeled sample image.
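A minimal sketch of this weighted summation, assuming illustrative weights and a hypothetical region_mutual_info_loss helper (the application fixes neither):

```python
# Hedged sketch: real annotation loss for a labeled sample image as a
# weighted sum of the real cross-entropy loss and the real region mutual
# information loss. `region_mutual_info_loss` is a hypothetical helper,
# not defined here; alpha and beta are assumed weighting factors.
import torch
import torch.nn.functional as F

def real_annotation_loss(student_logits, labels, alpha=1.0, beta=0.5):
    ce = F.cross_entropy(student_logits, labels)           # real cross-entropy loss
    rmi = region_mutual_info_loss(student_logits, labels)  # real region MI loss
    return alpha * ce + beta * rmi                         # weighted summation
```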
In one embodiment, the method further comprises:
when the sample image is an unlabeled sample image, performing a geometric transformation on the unlabeled sample image to obtain a geometrically transformed image corresponding to the unlabeled sample image;
performing image segmentation on the geometrically transformed image through the lightweight segmentation model to be trained to obtain a third segmentation result of the geometrically transformed image in the lightweight segmentation model;
constructing an inconsistency (non-uniformity) loss function according to the first segmentation result and a fourth segmentation result obtained by applying the inverse geometric transformation to the third segmentation result;
constructing a total loss function according to the knowledge distillation loss function and the inconsistency loss function;
and updating the model parameters of the lightweight segmentation model to be trained according to the total loss function and then continuing training until the training stopping condition is met, thereby obtaining the trained lightweight segmentation model suitable for the target scene.
In one embodiment, prior to said acquiring a sample image of a target scene, the method further comprises:
acquiring a test sample image of a target scene and real annotation information corresponding to the test sample image;
inputting the test sample image into the teacher segmentation model, performing image segmentation on the test sample image through the teacher segmentation model, and outputting a segmentation result corresponding to the test sample image;
and determining the segmentation accuracy of the teacher segmentation model in the target scene according to the segmentation result corresponding to each test sample image and the real annotation information.
In one embodiment, the method further comprises:
acquiring a test sample image of a target scene and real annotation information corresponding to the test sample image;
inputting the test sample image into the trained lightweight segmentation model, performing image segmentation on the test sample image through the lightweight segmentation model, and outputting a segmentation result corresponding to the test sample image;
and determining the segmentation accuracy of the lightweight segmentation model in the target scene according to the segmentation result corresponding to each test sample image and the real annotation information.
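The application does not fix a particular accuracy metric; the following sketch assumes mean intersection-over-union between the segmentation results and the real annotation information:

```python
# Hedged sketch: segmentation accuracy of a model on test sample images of
# the target scene, measured here (an assumption) as mean IoU over classes.
import torch

@torch.no_grad()
def segmentation_accuracy(model, test_images, test_labels, num_classes=2):
    preds = model(test_images).argmax(dim=1)   # per-pixel predicted class
    ious = []
    for c in range(num_classes):
        inter = ((preds == c) & (test_labels == c)).sum().float()
        union = ((preds == c) | (test_labels == c)).sum().float()
        if union > 0:
            ious.append(inter / union)
    return torch.stack(ious).mean().item()     # mean IoU over present classes
```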
In one embodiment, the method comprises:
acquiring an image to be processed including a target object;
inputting the image to be processed into the trained lightweight segmentation model;
outputting a segmentation result about a target object in the image to be processed through the trained lightweight segmentation model;
and replacing a background area except the target object in the image to be processed with a virtual background according to the segmentation result to obtain a fused image in which the target object and the virtual background are fused.
An apparatus for processing an image segmentation model, the apparatus comprising:
the acquisition module is used for acquiring a sample image of a target scene;
the first segmentation module is used for performing image segmentation on the sample image through a lightweight segmentation model to be trained to obtain a first segmentation result of the sample image in the lightweight segmentation model to be trained;
the second segmentation module is used for performing image segmentation on the sample image through a trained teacher segmentation model to obtain a second segmentation result of the sample image in the trained teacher segmentation model;
the loss construction module is used for constructing a knowledge distillation loss function according to the first segmentation result and the second segmentation result;
and the iteration module is used for, when the sample image is an unlabeled sample image, updating the model parameters of the lightweight segmentation model to be trained according to the knowledge distillation loss function and continuing training until a training stop condition is met, to obtain a trained lightweight segmentation model suitable for the target scene.
A method of image processing, the method comprising:
acquiring an image to be processed including a target object;
carrying out image segmentation on the image to be processed through a trained lightweight segmentation model to obtain a segmentation result of a target object in the image to be processed;
replacing a background area other than the target object in the image to be processed with a virtual background according to the segmentation result to obtain a fused image in which the target object and the virtual background are fused;
wherein the lightweight segmentation model is obtained by updating model parameters based on a knowledge distillation loss function, and the knowledge distillation loss function is constructed from a first segmentation result and a second segmentation result that are obtained by performing image segmentation on an unlabeled sample image of the target scene through the lightweight segmentation model to be trained and through the trained teacher segmentation model, respectively.
In one embodiment, the image to be processed is an online conference image and the target object is a participant, and the replacing a background region other than the target object in the image to be processed with a virtual background according to the segmentation result to obtain a fused image in which the target object and the virtual background are fused includes:
replacing the background area other than the participant in the online conference image with a virtual background according to the segmentation result to obtain a fused image in which the participant and the virtual background are fused;
the method further comprises the following steps:
and displaying the fused image, and sending the fused image to terminals participating in the online conference.
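For illustration, a minimal sketch of the background replacement itself, assuming the segmentation result is given as a per-pixel foreground probability map and that alpha blending is used for the fusion:

```python
# Hedged sketch: fuse the target object (e.g. a participant) with a virtual
# background, using the segmentation result as a soft foreground mask.
import torch

def replace_background(image, fg_prob, virtual_bg):
    """image, virtual_bg: (3, H, W) tensors in [0, 1]; fg_prob: (H, W) in [0, 1]."""
    mask = fg_prob.unsqueeze(0)                       # broadcast over channels
    return mask * image + (1.0 - mask) * virtual_bg   # fused image
```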
An image processing apparatus, the apparatus comprising:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring an image to be processed comprising a target object;
the segmentation module is used for performing image segmentation on the image to be processed through a trained lightweight segmentation model to obtain a segmentation result of a target object in the image to be processed, wherein the lightweight segmentation model is obtained by updating model parameters based on a knowledge distillation loss function, and the knowledge distillation loss function is constructed from a first segmentation result and a second segmentation result that are obtained by performing image segmentation on an unlabeled sample image of the target scene through the lightweight segmentation model to be trained and through the trained teacher segmentation model, respectively;
and the fusion module is used for replacing the background area other than the target object in the image to be processed with a virtual background according to the segmentation result to obtain a fused image in which the target object and the virtual background are fused.
A computer device comprising a memory storing a computer program and a processor that, when executing the computer program, implements the steps of the above processing method of an image segmentation model or of the above image processing method.
A computer-readable storage medium on which a computer program is stored, the computer program, when executed by a processor, implementing the steps of the above processing method of an image segmentation model or of the above image processing method.
A computer program comprising computer instructions stored in a computer-readable storage medium; a processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the steps of the above processing method of an image segmentation model or of the above image processing method.
When the segmentation effect of the lightweight segmentation model in the target scene needs optimization training, unlabeled sample images need only be input into the lightweight segmentation model to be optimized and into the trained teacher segmentation model, and the second segmentation result output by the trained teacher segmentation model serves as guidance information for the optimization training of the lightweight segmentation model.
On the one hand, training with unlabeled sample images means that the optimization training of the lightweight segmentation model does not depend on annotation data for large numbers of sample images. This saves labeling cost, removes the limitation that labeling speed places on optimizing the lightweight segmentation model, and greatly increases the speed of iterative model optimization, which is especially valuable when the model for the target scene is iterated frequently to meet new scene requirements. On the other hand, manual annotation cannot truly be accurate at the pixel level, so a large amount of erroneous annotation data appears at segmentation edges and error information is inevitably introduced; moreover, in manual annotation the class of each pixel is either 0 or 1, which degrades the supervision signal. In contrast, the second segmentation result output by the teacher segmentation model is more accurate at segmentation edges and takes probability values between 0 and 1, so it provides stronger supervision. Using the second segmentation result as guidance information therefore yields a more accurate trained lightweight segmentation model.
According to the image processing method, apparatus, computer device, and storage medium, when an image to be processed of the target scene needs to be segmented, the segmentation result of the target object can be obtained quickly and accurately simply by inputting the image into a trained, small lightweight segmentation model suited to the target scene. The image segmentation model can therefore run on mobile terminals and embedded devices, and the process of replacing the background area other than the target object with a virtual background according to the segmentation result becomes more efficient.
The segmentation effect of the lightweight segmentation model in the target scene is obtained through optimization training with unlabeled sample images: the unlabeled sample images need only be input into the lightweight segmentation model to be optimized and into a trained teacher segmentation model, and the second segmentation result output by the trained teacher segmentation model serves as guidance information for the optimization training. Specifically, a knowledge distillation loss function is constructed from the first segmentation result output by the lightweight segmentation model and the second segmentation result output by the trained teacher segmentation model, and the model parameters of the lightweight segmentation model are updated with this loss function, giving a trained lightweight segmentation model suitable for the target scene.
Drawings
FIG. 1 is a diagram illustrating an exemplary embodiment of a method for processing an image segmentation model;
FIG. 2 is a diagram illustrating segmentation of an online conference image and replacement of a background in one embodiment;
FIG. 3 is a flowchart illustrating a method for processing an image segmentation model according to one embodiment;
FIG. 4 is a flow diagram illustrating a second segmentation result of image segmentation performed on a sample image by a trained teacher segmentation model in one embodiment;
FIG. 5 is a schematic flow chart of the construction of the knowledge distillation loss function in one embodiment;
FIG. 6 is a diagram illustrating training using unlabeled sample images in one embodiment;
FIG. 7 is a diagram illustrating training using annotated sample images, in accordance with an embodiment;
FIG. 8 is a flow diagram that illustrates an optimization process for the lightweight segmentation model in one embodiment;
FIG. 9 is a flowchart illustrating a method for processing an image segmentation model in accordance with an exemplary embodiment;
FIG. 10 is a flow diagram that illustrates a method for image processing, according to one embodiment;
FIG. 11 is a block diagram showing a configuration of a processing device of an image segmentation model in one embodiment;
FIG. 12 is a block diagram showing the configuration of an image processing apparatus according to an embodiment;
FIG. 13 is an internal block diagram of a server in one embodiment;
FIG. 14 is an internal block diagram of a terminal in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The embodiments of the application provide a processing method of an image segmentation model and an image processing method that relate to Artificial Intelligence (AI) technology. AI is the theory, method, technology, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that machines can perceive, reason, and make decisions.
Artificial intelligence is a comprehensive discipline covering a wide range of fields, spanning both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operation/interaction systems, mechatronics, and the like. AI software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.
The embodiments of the application mainly relate to Computer Vision (CV) technology within artificial intelligence. Computer vision is the science of how to make machines "see": it uses cameras and computers in place of human eyes to identify, track, and measure targets, and performs further image processing so that the computer produces images better suited to human observation or to transmission to instruments for detection. As a scientific discipline, computer vision studies theories and techniques that attempt to build artificial intelligence systems capable of capturing information from images or multidimensional data. Computer vision technology generally includes image processing, image recognition, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, simultaneous localization and mapping, and other technologies, as well as common biometric technologies such as face recognition and fingerprint recognition.
The embodiments of the application provide a processing method of an image segmentation model and an image processing method that mainly relate to image segmentation technology within computer vision. An image is composed of many pixels, and image segmentation means segmenting the pixels of an image according to the different semantics they express. For example, in the embodiments of the application, a teacher segmentation model is used to guide the image segmentation capability of a lightweight segmentation model in a target scene. Specifically, when the segmentation effect of the lightweight segmentation model in the target scene needs optimization training, unlabeled sample images are input into the lightweight segmentation model to be optimized and into a trained teacher segmentation model, and the second segmentation result output by the trained teacher segmentation model is used as guidance information for the optimization training, producing a trained lightweight segmentation model suitable for the target scene. This training process saves labeling cost, removes the limitation that labeling speed places on optimizing the lightweight segmentation model, and greatly increases the speed of iterative model optimization. The target object can then be segmented from the image to be processed with the trained lightweight segmentation model.
The processing method of the image segmentation model provided by the application can be applied to the application environment shown in fig. 1. Wherein the terminal 102 communicates with the server 104 via a network. Server 104 may obtain a sample image of a target scene; carrying out image segmentation on the sample image through the lightweight segmentation model to be trained to obtain a first segmentation result in the lightweight segmentation model to be trained in the sample image; performing image segmentation on the sample image through the trained teacher segmentation model to obtain a second segmentation result in the trained teacher segmentation model in the sample image; constructing a knowledge distillation loss function according to the first segmentation result and the second segmentation result; and when the sample image is a non-labeled sample image, updating the model parameters of the lightweight segmentation model to be trained according to the knowledge distillation loss function, and continuing training until the training stopping condition is met, so as to obtain the trained lightweight segmentation model suitable for the target scene.
The image processing method provided by the present application may also be applied to the application environment shown in fig. 1, where the server 104 may obtain an image to be processed that is sent by the terminal 102 and includes a target object; inputting an image to be processed into a trained lightweight segmentation model; outputting a segmentation result about a target object in the image to be processed through a trained lightweight segmentation model; and replacing the background area except the target object in the image to be processed with the virtual background according to the segmentation result to obtain a fused image of the target object and the virtual background. The server 104 may also return the fused image to the terminal 102, which the terminal 102 may display.
The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices, and the server 104 may be implemented by an independent server or a server cluster formed by a plurality of servers.
In the method for processing the image segmentation model provided by the embodiment of the present application, the execution subject may be the processing apparatus of the image segmentation model provided by the embodiment of the present application, or a computer device integrated with the processing apparatus of the image segmentation model, where the processing apparatus of the image segmentation model may be implemented in a hardware or software manner. The computer device may be the terminal 102 or the server 104 shown in fig. 1.
The processing method of the image segmentation model provided by the embodiments of the application can also be applied to training other lightweight models in computer vision, such as a lightweight image classification model, a lightweight target localization model, or a lightweight target detection model. Such lightweight models make millisecond-level responses possible with convolutional neural networks and can be widely deployed on mobile terminals.
In the image processing method provided by the embodiment of the present application, an execution subject may be the image processing apparatus provided by the embodiment of the present application, or a computer device integrated with the image processing apparatus, where the image processing apparatus may be implemented in a hardware or software manner. The computer device may be the terminal 102 or the server 104 shown in fig. 1.
The image processing method provided by the embodiments of the application can be applied to online video call scenes, online conference scenes, online live broadcast scenes, online education scenes, and the like. It can also be applied to instant content generation scenarios.
For example, consider an online conference scene. FIG. 2 is a schematic diagram of segmenting an online conference image and replacing the background in one embodiment. Referring to FIG. 2, after the terminal acquires an image to be processed that includes participants, the image is segmented with the lightweight segmentation model, and according to the segmentation result the background area other than the participants is replaced with a virtual background. Through this processing, everything in the image except the participants themselves, namely the background of the environment the participants are in, is hidden, so participants can join an online conference from any location without worrying about privacy disclosure, about joining from a cluttered environment, or about disturbing other participants. The virtual background may be a preset default background or a custom background selected according to the user's needs.
For another example, in the online education scene, after the terminal acquires the image to be processed including the teacher or the teaching tool, the terminal uses the lightweight segmentation model to segment the image to be processed to obtain the segmentation result, the background area of the image to be processed except the teacher or the teaching tool can be replaced by a virtual background, such as a virtual platform or a virtual book, and the online teaching content of the teacher can be displayed on the virtual background in a superimposed manner.
For another example, in live shopping, after acquiring an image to be processed including a main broadcast or a commodity, a terminal performs image segmentation on the image to be processed by using a lightweight segmentation model to obtain a segmentation result, so that a background area except the main broadcast or the commodity in the image to be processed can be replaced by a virtual background, and commodity information, such as a commodity link, can be displayed on the virtual background in an overlapping manner.
For another example, in a video content generation scene, when a user generates video content through a terminal, the lightweight segmentation model performs image segmentation on each acquired image to be processed, and based on the segmentation result the background region in the video content can be hidden or replaced, so that the video content uploaded to an instant content generation platform hides the environment the user is currently in.
In one embodiment, as shown in fig. 3, a method for processing an image segmentation model is provided, which is described by taking the method as an example applied to a computer device (such as the terminal 102 or the server 104 shown in fig. 1), and includes the following steps:
Step 302: a sample image of a target scene is acquired.
The target scene can be an online video call scene, an online conference scene, an online live broadcast scene, an online education scene, or an instant content generation scene. Specifically, once the need for a lightweight segmentation model for the target scene arises, sample images of the target scene need to be collected. The computer device may download sample images of the target scene from a network, for example from an image sample database; it may obtain sample images of the target scene transmitted by another computer device, for example the server 104 in FIG. 1 receiving images transmitted by the terminal 102; and it may obtain sample images of the target scene generated locally, for example new sample images generated by applying a series of image transformations to existing sample images of the target scene.
The lightweight segmentation model can learn from sample images of the target scene so that it acquires the capability of performing image segmentation on images of that scene, where image segmentation refers to segmenting the pixels of an image according to the different semantics they express.
In this embodiment, the sample images used for training the lightweight segmentation model are unlabeled sample images, i.e., sample images without annotation information. The annotation information of a sample image refers to the category information of each pixel in the sample image; for example, if a pixel belongs to the foreground it may be labeled 1, and if it belongs to the background it may be labeled 0. Optionally, the sample images obtained by the computer device may also include a small proportion of sample images with annotation information.
Step 304: image segmentation is performed on the sample image through the lightweight segmentation model to be trained to obtain a first segmentation result of the sample image in the lightweight segmentation model to be trained.
The lightweight segmentation model here is a lightweight image segmentation model to be trained; it is trained on sample images of the target scene to learn to segment images of that scene. For example, the lightweight segmentation model may be trained on sample images of an online conference scene to learn to segment images of online conference scenes.
The lightweight segmentation model is an image segmentation model with a small model volume, few model parameters, and a small amount of computation; it can adopt a neural network model, such as a convolutional neural network model. In one embodiment, the computer device may set the model structure of an initial lightweight segmentation model in advance and perform model training on it with sample images of the target scene to obtain the model parameters. When images of the target scene need to be segmented, the computer device may obtain the pre-trained model parameters and import them into the initial lightweight segmentation model, obtaining a lightweight segmentation model capable of segmenting images of the target scene.
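For illustration, a sketch of this parameter-import step, assuming a small stand-in convolutional structure and a hypothetical parameter file name (the application fixes neither):

```python
# Hedged sketch: set up the initial lightweight segmentation model, then
# import pre-trained model parameters. The architecture and the file name
# are assumptions for illustration only.
import torch
import torch.nn as nn

model = nn.Sequential(                           # initial lightweight structure
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 2, kernel_size=1),             # 2 classes: foreground / background
)
state = torch.load("lightweight_seg_params.pt")  # hypothetical parameter file
model.load_state_dict(state)                     # import the trained parameters
model.eval()                                     # ready to segment target-scene images
```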
The first segmentation result is the prediction obtained by the lightweight segmentation model when it segments the input sample image. It can be understood that the first segmentation result may be inaccurate when training starts; as iterative training on sample images proceeds, the prediction the lightweight segmentation model obtains by segmenting a sample image becomes gradually more accurate. The first segmentation result represents the classification probability of each pixel in the sample image as predicted by the lightweight segmentation model; for example, a certain pixel may have probability 0.8 of belonging to the foreground and probability 0.2 of belonging to the background.
In one embodiment, the computer device may obtain a sample set including a large number of sample images, where the sample images in the sample set may all be unmarked sample images, and may also include a small number of marked sample images, and the small number of marked sample images may also be used to test the accuracy of the lightweight segmentation model after training is completed. Each training of the computer device may obtain a batch of sample images, for example, 30 sample images, from the sample set, input the batch of sample images into the lightweight segmentation model, and obtain a first segmentation result of each sample image in the batch of sample images in the lightweight segmentation model. Optionally, the computer device may also perform preprocessing on the sample images before inputting the batch of sample images into the lightweight segmentation model, for example, adjusting the sample images to the same size and inputting the sample images into the lightweight segmentation model for processing.
Step 306: image segmentation is performed on the sample image through the trained teacher segmentation model to obtain a second segmentation result of the sample image in the trained teacher segmentation model.
The lightweight segmentation model to be trained can also be called the student model, and the teacher segmentation model is a trained image segmentation model. Compared with the lightweight segmentation model, the teacher segmentation model is an image segmentation model with a large model volume, many model parameters, and a large amount of computation; it has good generalization performance and good accuracy across different application scenes. The teacher segmentation model can therefore guide the lightweight segmentation model so that the lightweight segmentation model learns the image segmentation capability for the target scene.
Optionally, in order to make the guidance capability of the teacher segmentation model stronger, a small part of labeled sample images may be used to verify the image segmentation capability of the teacher segmentation model in the target scene, and the model is used to guide the learning of the lightweight segmentation model after the verification is passed.
The second segmentation result is the prediction obtained by the teacher segmentation model when it segments the input sample image; this prediction can serve as annotation information for the unlabeled sample image to guide the lightweight segmentation model in learning the image segmentation capability for the target scene. There may be one or more teacher segmentation models, for example two or more; when there are multiple teacher segmentation models, multiple segmentation results for the sample image are obtained and integrated into the second segmentation result. The second segmentation result represents the classification probability of each pixel in the sample image as predicted by the teacher segmentation model; for example, a certain pixel may have probability 0.9 of belonging to the foreground and probability 0.1 of belonging to the background.
Unlike manual annotation information, in which the classification of each pixel is either 0 or 1, the second segmentation result is a predicted value output by the teacher segmentation model that takes probability values between 0 and 1, so it can play a greater role as supervision information. In addition, because manual annotation cannot truly be accurate at the pixel level, a large amount of erroneous and inconsistently standardized annotation exists at segmentation edges, inevitably introducing error information; the second segmentation result output by the teacher segmentation model is more accurate at segmentation edges, so the lightweight segmentation model trained under the teacher segmentation model's guidance is more accurate.
Specifically, after the sample image is input to the lightweight segmentation model to obtain the first segmentation result, the computer device also inputs the sample image to the trained teacher segmentation model to obtain the second segmentation result of the sample image in the trained teacher segmentation model.
It is understood that the execution order of step 304 and step 306 may be changed, and the step of performing image segmentation on the sample image through the trained teacher segmentation model to obtain the second segmentation result in the teacher segmentation model in the sample image may be performed first, and then performing image segmentation on the sample image through the light-weight segmentation model to be trained to obtain the first segmentation result in the light-weight segmentation model to be trained in the sample image.
Step 308: a knowledge distillation loss function is constructed according to the first segmentation result and the second segmentation result.
In this embodiment, knowledge distillation means that the second segmentation result output by the trained, highly accurate teacher segmentation model is used to guide the training of the lightweight segmentation model and thereby transfer knowledge. The computer device may therefore construct a knowledge distillation loss function from the first segmentation result output by the lightweight segmentation model and the second segmentation result output by the trained teacher segmentation model, and update the parameters of the lightweight segmentation model using this loss function.
Step 310: when the sample image is an unlabeled sample image, the model parameters of the lightweight segmentation model to be trained are updated according to the knowledge distillation loss function and training continues until a training stop condition is met, giving a trained lightweight segmentation model suitable for the target scene.
As mentioned above, without increasing training cost or affecting the iteration speed of the lightweight segmentation model, the training sample set may include a small number of labeled sample images; both labeled and unlabeled sample images are processed through steps 302 to 308 to obtain the corresponding knowledge distillation loss function. For an unlabeled sample image, the only guidance information comes from the teacher segmentation model, so the computer device constructs the knowledge distillation loss function from the first segmentation result obtained by segmenting the unlabeled sample image with the lightweight segmentation model and the second segmentation result obtained with the teacher segmentation model, obtains model parameters by minimizing the knowledge distillation loss function, updates the model parameters of the lightweight segmentation model, and continues training.
When the segmentation effect of the lightweight segmentation model in the target scene needs optimization training, unlabeled sample images need only be input into the lightweight segmentation model to be optimized and into the trained teacher segmentation model, and the second segmentation result output by the trained teacher segmentation model serves as guidance information for the optimization training of the lightweight segmentation model.
On the one hand, training with unlabeled sample images means that the optimization training of the lightweight segmentation model does not depend on annotation data for large numbers of sample images. This saves labeling cost, removes the limitation that labeling speed places on optimizing the lightweight segmentation model, and greatly increases the speed of iterative model optimization, which is especially valuable when the model for the target scene is iterated frequently to meet new scene requirements. On the other hand, manual annotation cannot truly be accurate at the pixel level, so a large amount of erroneous annotation data appears at segmentation edges and error information is inevitably introduced; moreover, in manual annotation the class of each pixel is either 0 or 1, which degrades the supervision signal. The second segmentation result output by the trained teacher segmentation model is more accurate at segmentation edges and takes probability values between 0 and 1, so it provides stronger supervision. Using the second segmentation result as guidance information therefore yields a more accurate trained lightweight segmentation model.
In one embodiment, as shown in FIG. 4, performing image segmentation on the sample image through the trained teacher segmentation model to obtain the second segmentation result of the sample image in the trained teacher segmentation model includes:
Step 402: the sample image is input into each of a plurality of trained teacher segmentation models.
In particular, the computer device may employ a plurality of teacher segmentation models, each teacher segmentation model employing a different network structure. Each teacher segmentation model may employ a known network structure.
Step 404: image segmentation is performed on the sample image through each trained teacher segmentation model to obtain multiple sets of segmentation results of the sample image in the trained teacher segmentation models.
Step 406: the multiple sets of segmentation results output by the trained teacher segmentation models are integrated to obtain the second segmentation result.
In one embodiment, integrating the multiple sets of segmentation results output by the trained teacher segmentation models to obtain the second segmentation result includes: calculating, for each pixel, the mean of its classification probabilities across the trained teacher segmentation models according to the classification probability of each pixel in the sample image represented by each set of segmentation results in the corresponding teacher segmentation model; and obtaining the second segmentation result according to the classification probability mean corresponding to each pixel in the sample image.
The segmentation result output by each teacher segmentation model represents the classification probability of each pixel of the sample image for each class under that teacher segmentation model, and the computer device can calculate the classification probability mean for each pixel from its classification probabilities in the several teacher segmentation models. The classification probability mean may be an equal-weight average of the classification probabilities output by the teacher segmentation models, or a weighted average with different weights.
In one embodiment, the mean classification probability of each pixel across the plurality of teacher segmentation models may be calculated by the following formulas:

$$\bar{Z}_{x,y,i} = \frac{1}{|T|}\sum_{t \in T} Z_{t,x,y,i}, \qquad p_{x,y,i} = \frac{\exp(\bar{Z}_{x,y,i})}{\sum_{j}\exp(\bar{Z}_{x,y,j})}$$

where $Z_{t,x,y,i}$ denotes the classification probability that the pixel at $(x, y)$ in the sample image corresponds to the $i$-th class in the $t$-th teacher model, $T$ denotes the set of all teacher segmentation models, $\bar{Z}_{x,y,i}$ integrates the segmentation results of all teacher segmentation models in $T$ to give the average classification probability that the pixel at $(x, y)$ corresponds to the $i$-th class, $\exp$ denotes the exponential function with the natural constant $e$ as its base, and $p_{x,y,i}$, obtained by normalizing the average classification probabilities, is the second segmentation result that the pixel at $(x, y)$ in the sample image corresponds to the $i$-th class across the teacher models.
For example, suppose there are 3 teacher segmentation models in a foreground-background segmentation scene. For pixel A in the sample image, the classification probability for the foreground in the first teacher segmentation model is m1 and for the background is n1; in the second teacher segmentation model they are m2 and n2; and in the third teacher segmentation model they are m3 and n3. The means of the classification probabilities of pixel A over the 3 teacher segmentation models are then (m1+m2+m3)/3 for the foreground and (n1+n2+n3)/3 for the background, and these average classification probabilities are then used to calculate the second segmentation results of pixel A for the foreground and the background respectively.
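A sketch of this integration under equal weights, matching the formulas above; the function and variable names are illustrative assumptions:

```python
# Hedged sketch: average the per-pixel, per-class scores of several teacher
# segmentation models, then normalize to obtain the second segmentation result.
import torch
import torch.nn.functional as F

@torch.no_grad()
def ensemble_second_result(teachers, images):
    # each teacher returns (B, C, H, W) scores Z_t for its classes
    scores = torch.stack([t(images) for t in teachers])  # (T, B, C, H, W)
    mean_scores = scores.mean(dim=0)                     # classification probability mean
    return F.softmax(mean_scores, dim=1)                 # normalized second result
```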
In this embodiment, integrating several teacher segmentation models ensures that the integrated segmentation result improves further on the output of any single teacher model. Using teacher segmentation models with different network structures in the integration also ensures the diversity of the second segmentation result.
In one embodiment, as shown in fig. 5, constructing a knowledge distillation loss function from the first and second segmentation results comprises:
Step 502: the first segmentation result is used as the prediction information of the sample image, and the second segmentation result is used as the soft labeling information of the sample image.
Step 504: the weighted divergence loss and the soft-label region mutual information loss corresponding to the sample image are respectively calculated according to the prediction information and the soft labeling information.
Specifically, for a given class, the computer device may construct a multi-dimensional vector from the first segmentation results, in the lightweight segmentation model, of a pixel and several of its neighboring pixels, construct another multi-dimensional vector from the second segmentation results of the same pixels in the teacher segmentation model, and build the soft-label region mutual information loss from these two multi-dimensional vectors; during training, the mutual information between the two region vectors is maximized.
Because the second segmentation result serves as the soft labeling information for the region mutual information, each element of the generated multi-dimensional vector can be a value between 0 and 1. Compared with manual annotation information, in which each element is either 0 or 1, this improves the guiding effect of the second segmentation result as soft labeling information during training.
For example, if the classification probability of pixel A for the foreground in the teacher segmentation model is m1, and the classification probabilities of its 8 neighboring pixels for the foreground are m2, m3, …, m9 in sequence, then the constructed 9-dimensional vector serves as the soft labeling information for the soft-label region mutual information loss.
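For illustration, a sketch of just this neighborhood-vector construction, assuming a 3x3 neighborhood; the full region mutual information loss computed from the paired student and teacher vectors is omitted:

```python
# Hedged sketch: for every pixel, gather its class probability together with
# those of its 8 neighbors into a 9-dimensional vector (per class). Applied
# to the student output it gives the prediction vectors; applied to the
# teacher output it gives the soft labeling vectors.
import torch
import torch.nn.functional as F

def region_vectors(prob_map, radius=1):
    """prob_map: (B, C, H, W) probabilities -> (B, C, (2r+1)^2, H*W) vectors."""
    k = 2 * radius + 1
    patches = F.unfold(prob_map, kernel_size=k, padding=radius)  # (B, C*k*k, H*W)
    b, _, n = patches.shape
    return patches.view(b, prob_map.shape[1], k * k, n)
```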
The divergence loss (K-L divergence loss) measures the difference between the first segmentation result and the second segmentation result for each pixel; the weighted divergence loss assigns different weights to the K-L divergence losses of different pixels. Specifically, pixels located at segmentation edges are given larger weights.
In one embodiment, the weighted K-L divergence loss for the pixel at (x, y) can be calculated by the following formula:

$$\mathcal{L}_{KL}(x, y) = w_{x,y} \sum_{i} p_{x,y,i} \log \frac{p_{x,y,i}}{q_{x,y,i}}$$

where $w_{x,y}$ is the weight corresponding to the pixel located at (x, y) in the sample image; $p_{x,y,i}$ is the second segmentation result, i.e. the probability that the pixel at (x, y) in the sample image corresponds to the i-th category according to the plurality of teacher models; and $q_{x,y,i}$ is the first segmentation result, i.e. the probability that the pixel at (x, y) corresponds to the i-th category according to the lightweight segmentation model.
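Read literally, this formula could be implemented as in the following sketch; the (C, H, W) probability maps and the (H, W) weight map are assumed shapes for illustration:

```python
import torch

def weighted_kl_loss(p_teacher, q_student, weights, eps=1e-8):
    """Weighted K-L divergence between teacher and student predictions.

    p_teacher, q_student: (C, H, W) per-pixel class probabilities;
    weights: (H, W) per-pixel weights w_{x,y}, larger at segmentation edges.
    """
    # Sum over classes of p * log(p / q), per pixel; eps avoids log(0).
    kl = (p_teacher * torch.log((p_teacher + eps) / (q_student + eps))).sum(dim=0)
    return (weights * kl).sum()
```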
In one embodiment, calculating the weighted divergence loss corresponding to the sample image according to the prediction information and the soft labeling information includes: determining edge pixels and non-edge pixels in the sample image according to the second segmentation result; acquiring a first weight corresponding to an edge pixel and a second weight corresponding to a non-edge pixel; and calculating the corresponding weighted divergence loss of the sample image according to the first weight, the second weight, the prediction information and the soft labeling information.
In this embodiment, the second segmentation result output by the trained teacher segmentation model is used to determine the edge pixels and non-edge pixels in the sample image, where edge pixels are the pixels located at segmentation edges; corresponding weights may then be assigned to the pixels in the sample image. For example, the first weight assigned to edge pixels may be greater than the second weight assigned to non-edge pixels.
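One way the edge pixels might be located, sketched below, is to compare each pixel's teacher-predicted label with its neighbours; the 3×3 window and the weight values 2.0/1.0 are illustrative assumptions, not values given in this application:

```python
import torch
import torch.nn.functional as F

def edge_weight_map(p_teacher, edge_weight=2.0, base_weight=1.0):
    """Per-pixel weight map with larger weights at segmentation edges.

    p_teacher: (C, H, W) teacher probabilities. A pixel counts as an edge
    pixel when its argmax label differs from a neighbour within a 3x3 window.
    """
    labels = p_teacher.argmax(dim=0).float()[None, None]   # (1, 1, H, W)
    # Morphological dilation/erosion of the label map; they disagree at edges.
    dilated = F.max_pool2d(labels, 3, stride=1, padding=1)
    eroded = -F.max_pool2d(-labels, 3, stride=1, padding=1)
    edges = (dilated != eroded)[0, 0]                      # (H, W) bool mask
    return torch.where(edges,
                       torch.tensor(edge_weight),
                       torch.tensor(base_weight))
```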
And step 506, carrying out weighted summation on the weighted divergence loss and the soft labeling area mutual information loss to obtain a knowledge distillation loss function corresponding to the sample image.
Specifically, the computer device may sum the weighted divergence losses of all pixels in the sample image to obtain the total weighted divergence loss of the whole sample image, sum the soft labeling area mutual information losses of all pixels to obtain the total soft labeling area mutual information loss of the whole sample image, and then weight and sum the two to obtain the total knowledge distillation loss function. Each time the computer device trains on a batch of sample images, the knowledge distillation loss functions of all sample images in the batch are summed to obtain the total loss for that training step, and the model parameters of the lightweight segmentation model are updated by minimizing this total loss.
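A compact sketch of the combination step; the coefficients alpha and beta are assumptions, since the text only states that the two losses are weighted and summed:

```python
def knowledge_distillation_loss(weighted_kl, rmi, alpha=1.0, beta=0.5):
    """Per-image knowledge distillation loss (sketch).

    weighted_kl and rmi are the two totals summed over all pixels; the
    mutual-information term is maximized, so it is subtracted from the
    minimized objective. alpha and beta are assumed weighting coefficients.
    """
    return alpha * weighted_kl - beta * rmi
```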
In one embodiment, the method further comprises: when the sample image is an unmarked sample image, performing geometric transformation on the unmarked sample image to obtain a geometric transformation image corresponding to the unmarked sample image; carrying out image segmentation on the geometric transformation image through a lightweight segmentation model to be trained to obtain a third segmentation result in the lightweight segmentation model in the geometric transformation image; constructing a non-uniformity loss function according to the first segmentation result and a fourth segmentation result obtained by carrying out geometric inverse transformation on the third segmentation result; constructing a total loss function according to the knowledge distillation loss function and the non-uniformity loss function; and updating the model parameters of the lightweight segmentation model to be trained according to the total loss function and then continuing training until the training stopping condition is met, thereby obtaining the trained lightweight segmentation model suitable for the target scene.
In this embodiment, in order to increase the accuracy of the lightweight segmentation model, a non-uniformity loss function is introduced, which improves the image segmentation capability of the lightweight segmentation model under different image transformations. Specifically, the computer device applies a reversible geometric transformation to the unlabeled sample image, then constructs the non-uniformity loss function from the difference between the first segmentation result and the fourth segmentation result, the latter being obtained by applying the inverse geometric transformation to the third segmentation result produced by segmenting the geometrically transformed image with the lightweight segmentation model. This loss function and the knowledge distillation loss function jointly guide the update of the lightweight segmentation model. Alternatively, the non-uniformity loss function may be constructed based on the KL divergence or the mean square error loss. The reversible geometric transformation comprises at least one of horizontal flipping, vertical flipping, rotation, scaling, and other processing modes.
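A sketch of this consistency term using a horizontal flip, one of the reversible transforms listed above; the mean-squared-error variant is chosen here, though the text also allows a KL-divergence form, and the model is assumed to return (1, C, H, W) probability maps:

```python
import torch

def non_uniformity_loss(model, image, first_result):
    """Consistency between the original and flip-transformed predictions.

    image: (1, 3, H, W); first_result: (1, C, H, W), the student's
    prediction on the untransformed image.
    """
    flipped = torch.flip(image, dims=[3])               # reversible transform
    third_result = model(flipped)                       # segment transformed image
    fourth_result = torch.flip(third_result, dims=[3])  # inverse transform
    return torch.mean((first_result - fourth_result) ** 2)
```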
In one embodiment, the computer device may further use the teacher segmentation model to perform offline prediction on the unlabeled sample image, store the second segmentation result obtained by the prediction as the annotation information of the unlabeled sample image, and add such unlabeled sample images together with the corresponding annotation information into the optimization training of the lightweight segmentation model.
In one embodiment, the computer device may further use the teacher segmentation model to perform offline prediction on the unlabeled sample image to obtain a prediction result in the original state, use the teacher segmentation model to predict transformed versions of the same unlabeled sample image after different geometric transformations to obtain transformed prediction results, integrate the original-state prediction result with the inverse-transformed versions of the transformed prediction results into a final result, solidify the final result into the annotation information of the unlabeled sample image, and add it into the optimization training of the lightweight segmentation model.
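A sketch of this offline pseudo-labeling with a horizontal flip as the geometric transformation; the flip and the simple averaging are illustrative choices, not the only integration the text allows:

```python
import torch

@torch.no_grad()
def offline_pseudo_label(teachers, image):
    """Offline pseudo-label for one unlabeled sample image (sketch).

    For each teacher: predict the original image, predict its horizontal
    flip, inverse-transform the latter, and average; then average over
    teachers. The result is stored as the image's annotation information.
    """
    preds = []
    for teacher in teachers:
        p = teacher(image)                                        # original state
        p_t = torch.flip(teacher(torch.flip(image, dims=[3])), dims=[3])
        preds.append((p + p_t) / 2)
    return torch.stack(preds).mean(dim=0)
```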
FIG. 6 is a diagram illustrating training with unlabeled sample images in one embodiment. Referring to FIG. 6, an unlabeled sample image is input to the lightweight segmentation model to be trained to obtain the first segmentation result output by the lightweight segmentation model. The same unlabeled sample image is also passed through a plurality of teacher segmentation models to obtain multiple sets of segmentation results, which are integrated into a second segmentation result with overall high confidence. A knowledge distillation loss function is then constructed from the first segmentation result and this high-confidence second segmentation result, the gradients of the model parameters of the lightweight segmentation model are obtained by chain-rule differentiation based on the knowledge distillation loss function, and the model parameters of the lightweight segmentation model are updated according to the gradients.
In one embodiment, the method further comprises: when the sample image is the marked sample image, calculating a real marking loss function according to the first segmentation result and the real marking information of the marked sample image; constructing a total loss function according to the knowledge distillation loss function and the real marked loss function; and updating the model parameters of the lightweight segmentation model to be trained according to the total loss function and then continuing training until the training stopping condition is met, thereby obtaining the trained lightweight segmentation model suitable for the target scene.
As mentioned above, a small number of labeled sample images may be included in the training sample set, and these labeled sample images may also serve as a test set. When the sample image acquired by the computer device from the training sample set is a labeled sample image, the computer device may construct a real annotation loss from the real annotation information of the sample image and the first segmentation result output by the lightweight segmentation model, sum this loss with the knowledge distillation loss function corresponding to the sample image to obtain a total loss function, and update the parameters of the lightweight segmentation model using the total loss function.
In one embodiment, calculating a true annotation loss function according to the first segmentation result and the true annotation information of the annotated sample image comprises: respectively calculating the real cross entropy loss and the real area mutual information loss corresponding to the marked sample image according to the first segmentation result and the real marking information of the marked sample image; and carrying out weighted summation on the real cross entropy loss and the real area mutual information loss to obtain a real annotation loss function corresponding to the annotated sample image.
Specifically, the computer device may construct a true weighted cross-entropy loss from the difference between the first segmentation result of each pixel and the real annotation information. The computer device may also construct one multi-dimensional vector from the probabilities, in the first segmentation result of the lightweight segmentation model, that a pixel and several of its adjacent pixels are classified into a certain class, and another multi-dimensional vector from the probabilities that the corresponding pixels in the real annotation information belong to that class; since each such probability in the real annotation information is either 0 or 1, the elements of this second vector are all either 0 or 1. The computer device then constructs the real area mutual information loss from the two multi-dimensional vectors and maximizes it during training so as to maximize the mutual information between the two regions.
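The following sketch combines the two terms for a labeled sample; rmi_loss stands for the real area mutual information estimate built from the neighbourhood vectors above, and beta is an assumed weighting coefficient:

```python
import torch
import torch.nn.functional as F

def true_annotation_loss(student_logits, gt_mask, rmi_loss, beta=0.5):
    """Real-annotation loss for a labeled sample image (sketch).

    student_logits: (1, C, H, W) raw outputs of the lightweight model;
    gt_mask: (1, H, W) long tensor of ground-truth class ids;
    rmi_loss: real area mutual information estimate (maximized, hence
    the negative sign); beta is an assumed weighting coefficient.
    """
    ce = F.cross_entropy(student_logits, gt_mask)  # true cross-entropy term
    return ce - beta * rmi_loss
```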
In one embodiment, prior to acquiring the sample image of the target scene, the method further comprises: acquiring a test sample image of a target scene and real annotation information corresponding to the test sample image; inputting the test sample image into a teacher segmentation model, performing image segmentation on the test sample image through the teacher segmentation model, and outputting a segmentation result corresponding to the test sample image; and determining the segmentation accuracy of the teacher segmentation model in the target scene according to the segmentation result and the real annotation information corresponding to each test sample image.
As mentioned above, the training sample set may include a small number of labeled sample images, for example, 200 to 500, and these may also serve as a test set used to verify the segmentation accuracy of the teacher segmentation model in the target scene before the lightweight segmentation model is trained. If the verification result on the test set indicates that the accuracy of the teacher segmentation model is good, the teacher segmentation model can be used directly to guide the training of the lightweight segmentation model; if the result indicates that the accuracy is not good, the teacher segmentation model can first be trained separately to improve its image segmentation capability in the target scene. Since the teacher segmentation model has a large volume and many model parameters, and is not subject to the constraints that the lightweight image segmentation model to be deployed must satisfy, its accuracy in the target scene can be optimized with a variety of methods without restriction.
In one embodiment, the method further comprises: acquiring a test sample image of the target scene and real annotation information corresponding to the test sample image; inputting the test sample image into the trained lightweight segmentation model, performing image segmentation on the test sample image through the trained lightweight segmentation model, and outputting a segmentation result corresponding to the test sample image; and determining the segmentation accuracy of the lightweight segmentation model in the target scene according to the segmentation result corresponding to each test sample image and the real annotation information.
Similarly, after training is finished, the test set formed by the labeled sample images can also be used to verify the image segmentation capability of the lightweight segmentation model in the target scene. Before training the lightweight segmentation model, the computer device can use the test set to verify the image segmentation capability of the teacher segmentation model in the target scene; after training of the lightweight segmentation model is finished, the computer device can use the same test set to verify the image segmentation capability of the lightweight segmentation model, so fewer labeled sample images are needed and the annotation cost is further reduced.
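Both verification steps reduce to scoring predictions against the real annotation information on the test set; a minimal pixel-accuracy sketch (the exact metric is not fixed here) could be:

```python
import torch

@torch.no_grad()
def pixel_accuracy(model, test_images, test_masks):
    """Fraction of correctly classified pixels over the labeled test set.

    test_images: iterable of (1, 3, H, W) tensors; test_masks: matching
    (H, W) long tensors of ground-truth class ids. The same routine can
    score either the teacher or the lightweight segmentation model.
    """
    correct = total = 0
    for image, mask in zip(test_images, test_masks):
        pred = model(image).argmax(dim=1)[0]      # (H, W) predicted ids
        correct += (pred == mask).sum().item()
        total += mask.numel()
    return correct / total
```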
FIG. 7 is a diagram illustrating training with labeled sample images in one embodiment. Referring to FIG. 7, a labeled sample image is input to the lightweight segmentation model to be trained to obtain the first segmentation result output by the lightweight segmentation model. The same labeled sample image is also passed through a plurality of teacher segmentation models to obtain multiple sets of segmentation results, which are integrated into a second segmentation result with overall high confidence. A knowledge distillation loss function is constructed from the first segmentation result and this high-confidence second segmentation result, and a real annotation loss function is constructed from the first segmentation result of the lightweight segmentation model and the real annotation information of the sample image. The knowledge distillation loss function and the real annotation loss function are added to obtain a total loss function; the gradients of the model parameters of the lightweight segmentation model are obtained by chain-rule differentiation based on the total loss function, and the model parameters of the lightweight segmentation model are updated according to the gradients.
FIG. 8 is a flow diagram illustrating the optimization process of the lightweight segmentation model in one embodiment. Referring to FIG. 8, when a demand arises for a lightweight segmentation model for a target scene, a large number of unlabeled sample images of the target scene are collected. A small number of labeled sample images of the target scene are then used to verify the segmentation accuracy of the teacher segmentation model in that scene. If the accuracy is too low to meet the requirement, the teacher segmentation model is first optimized with existing methods until its segmentation effect is satisfactory; once the accuracy meets the requirement, the teacher segmentation model and the large number of unlabeled sample images of the target scene are used to train the lightweight segmentation model. Finally, the small number of labeled sample images of the target scene are used to verify the segmentation effect of the lightweight segmentation model.
According to the scheme provided by the embodiments of this application, the lightweight segmentation model can be optimized for the target scene directly with unlabeled training samples, without relying on manual annotation; combined with the knowledge distillation loss designed for semantic segmentation models, this greatly shortens the model optimization training cycle and helps expand the application range of real-time lightweight segmentation technology. The main technical effects include, but are not limited to:
1. The accuracy of real-time semantic segmentation in the target field is improved: without relying on manual annotation, the training method provided by the embodiments of this application can significantly improve accuracy in the target scene compared with the original model.
2. Labor cost is saved: with the training method provided by the embodiments of this application, only the collection of sample images of the target scene still requires labor, and the high labor cost of semantic segmentation annotation is avoided entirely.
3. Sample consistency is enhanced: all unlabeled and labeled data use the prediction results of the teacher segmentation models as the supervision information for training the lightweight segmentation model, so consistency is inherent; at the same time, the ensemble of multiple teacher models further improves the stability of the predictions.
4. Using the prediction results integrated from multiple teacher models as guidance carries information beyond the category itself, such as latent relationships between different categories, and thus provides more effective guidance for the lightweight segmentation network than one-hot-encoded annotation information.
With the method provided by this application, one round of model optimization can use 10 times as many sample images of the target scene as a method based on manual annotation, and the final evaluation metric is significantly improved, as shown in Table 1 below:
Training method                                  Accuracy (evaluation metric)
Training based on manual annotation              93.28%
Training method provided by this application     94.16%

Table 1
Fig. 9 is a schematic flowchart of a processing method of an image segmentation model in a specific embodiment. Referring to fig. 9, the method includes the steps of:
and step 902a, acquiring an image of a sample without annotation.
And step 902b, acquiring a labeled sample image.
And 904, performing image segmentation on the sample image through the lightweight segmentation model to be trained to obtain a first segmentation result in the lightweight segmentation model to be trained in the sample image.
Step 906, the sample images are respectively input into a plurality of trained teacher segmentation models.
And 908, performing image segmentation on the sample image through each trained teacher segmentation model to obtain a plurality of groups of segmentation results of the sample image in the trained teacher segmentation models.
And step 910, calculating the mean value of the classification probability of each pixel in the trained teacher segmentation models according to the classification probability of each pixel in the sample image represented by each group of segmentation results in the corresponding teacher segmentation model.
And 912, obtaining a second segmentation result according to the classification probability mean value corresponding to each pixel in the sample image.
Step 914, the first segmentation result is used as the prediction information of the sample image, and the second segmentation result is used as the soft label information of the sample image.
And step 916, respectively calculating the weighted divergence loss and the soft labeling area mutual information loss corresponding to the sample image according to the prediction information and the soft labeling information.
And 918, carrying out weighted summation on the weighted divergence loss and the soft labeling area mutual information loss to obtain a knowledge distillation loss function corresponding to the sample image.
And 920, performing geometric transformation on the non-labeled sample image to obtain a geometric transformation image corresponding to the non-labeled sample image.
And step 922, carrying out image segmentation on the geometric transformation image through the lightweight segmentation model to be trained, and obtaining a third segmentation result in the lightweight segmentation model in the geometric transformation image.
And 924, constructing a non-uniformity loss function according to the first segmentation result and a fourth segmentation result obtained by performing inverse geometric transformation on the third segmentation result.
And step 926, constructing a total loss function of the unmarked sample image according to the knowledge distillation loss function and the non-uniformity loss function.
Step 928, updating the model parameters of the lightweight segmentation model according to the total loss function.
And 930, respectively calculating the real cross entropy loss and the real area mutual information loss corresponding to the marked sample image according to the first segmentation result and the real marking information of the marked sample image.
Step 932, performing weighted summation on the real cross entropy loss and the real area mutual information loss to obtain a real annotation loss function corresponding to the annotated sample image.
And step 934, constructing a total loss function of the labeled sample image according to the knowledge distillation loss function and the real labeled loss function.
Step 936, updating model parameters of the lightweight segmentation model according to the total loss function.
It should be understood that, although the steps in the flowchart of fig. 9 are shown in sequence as indicated by the arrows, they are not necessarily performed in that sequence. Unless explicitly stated otherwise herein, there is no strict order restriction on the execution of these steps, and they may be performed in other orders. Moreover, at least a portion of the steps in fig. 9 may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times, and not necessarily in sequence; they may be executed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 10, an image processing method is provided, which is described by taking the method as an example applied to a computer device (such as the terminal 102 or the server 104 shown in fig. 1), and includes the following steps:
step 1002, acquiring an image to be processed including a target object.
And step 1004, inputting the image to be processed into the trained lightweight segmentation model.
And step 1006, outputting a segmentation result about the target object in the image to be processed through the trained lightweight segmentation model.
The trained lightweight segmentation model is obtained by updating model parameters based on a knowledge distillation loss function. The knowledge distillation loss function is constructed from a first segmentation result and a second segmentation result, which are obtained by performing image segmentation on an unlabeled sample image of the target scene through the lightweight segmentation model to be trained and through the trained teacher segmentation model, respectively.
And step 1008, replacing the background area except the target object in the image to be processed with a virtual background according to the segmentation result to obtain a fused image of the target object and the virtual background.
The virtual background may be a preset default background or a custom background selected according to the user requirement.
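A sketch of the replacement step, using the foreground probability directly as a soft alpha matte; this soft blending is an assumption, and thresholding to a hard mask would work as well:

```python
import torch

def composite_virtual_background(image, fg_prob, background):
    """Blend the segmented target object onto a virtual background.

    image, background: (3, H, W) float tensors in [0, 1];
    fg_prob: (H, W) foreground probability from the lightweight model,
    used here as a soft alpha matte.
    """
    alpha = fg_prob[None]                  # (1, H, W), broadcasts over RGB
    return alpha * image + (1.0 - alpha) * background
```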
In one embodiment, the image to be processed includes a participant of an online conference, and replacing the background region other than the target object in the image to be processed with a virtual background according to the segmentation result to obtain a fused image in which the target object and the virtual background are fused comprises: replacing the background region other than the participant in the online conference image with a virtual background according to the segmentation result, to obtain a fused image in which the participant and the virtual background are fused. The method further comprises: displaying the fused image, and sending the fused image to the terminals participating in the online conference.
According to the image processing method, when an image to be processed from the target scene needs to be segmented, the segmentation result of the target object in the image can be obtained quickly and accurately simply by inputting the image into the trained, small-volume lightweight segmentation model adapted to the target scene. This makes it feasible to run the image segmentation model on mobile terminals and embedded devices, and makes the process of replacing the background region other than the target object with a virtual background according to the segmentation result more efficient.
In one embodiment, as shown in fig. 11, an apparatus 1100 for processing an image segmentation model is provided, which may be a part of a computer device using a software module or a hardware module, or a combination of the two modules, and specifically includes: an acquisition module 1102, a first segmentation module 1104, a second segmentation module 1106, a loss construction module 1108, and an iteration module 1110, wherein:
an obtaining module 1102, configured to obtain a sample image of a target scene;
a first segmentation module 1104, configured to perform image segmentation on the sample image through the lightweight segmentation model to be trained, and obtain a first segmentation result in the lightweight segmentation model to be trained in the sample image;
a second segmentation module 1106, configured to perform image segmentation on the sample image through the trained teacher segmentation model, to obtain a second segmentation result in the trained teacher segmentation model in the sample image;
a loss construction module 1108 for constructing a knowledge distillation loss function based on the first and second segmentation results;
and the iteration module 1110 is configured to, when the sample image is an unmarked sample image, update the model parameters of the lightweight segmentation model to be trained according to the knowledge distillation loss function, and then continue training until a training stop condition is met, to obtain a trained lightweight segmentation model suitable for the target scene.
In one embodiment, the second segmentation module 1106 is further configured to input the sample images into a plurality of trained teacher segmentation models, respectively; perform image segmentation on the sample image through each trained teacher segmentation model to obtain multiple sets of segmentation results of the sample image in the trained teacher segmentation models; and integrate the multiple sets of segmentation results output by the trained teacher segmentation models to obtain the second segmentation result.
In one embodiment, the second segmentation module 1106 is further configured to calculate a mean value of the classification probabilities of the pixels in the trained teacher segmentation models according to the classification probabilities of the pixels in the sample image represented by each set of segmentation results in the corresponding teacher segmentation model; and obtaining a second segmentation result according to the classification probability mean value corresponding to each pixel in the sample image.
In one embodiment, the loss construction module 1108 is further configured to use the first segmentation result as the prediction information of the sample image and the second segmentation result as the soft labeling information of the sample image; respectively calculate the weighted divergence loss and the soft labeling area mutual information loss corresponding to the sample image according to the prediction information and the soft labeling information; and perform weighted summation on the weighted divergence loss and the soft labeling area mutual information loss to obtain the knowledge distillation loss function corresponding to the sample image.
In one embodiment, the loss construction module 1108 is further configured to determine edge pixels and non-edge pixels in the sample image according to the second segmentation result; acquiring a first weight corresponding to an edge pixel and a second weight corresponding to a non-edge pixel; and calculating the corresponding weighted divergence loss of the sample image according to the first weight, the second weight, the prediction information and the soft labeling information.
In one embodiment, the loss constructing module 1108 is further configured to, when the sample image is an annotated sample image, calculate a true annotation loss function according to the first segmentation result and the true annotation information of the annotated sample image; constructing a total loss function according to the knowledge distillation loss function and the real marked loss function; the iteration module 1110 is further configured to update the model parameters of the lightweight segmentation model to be trained according to the total loss function, and then continue training until the training stop condition is met, so as to obtain the trained lightweight segmentation model suitable for the target scene.
In one embodiment, the loss constructing module 1108 is further configured to calculate, according to the first segmentation result and the real annotation information of the annotated sample image, a real cross entropy loss and a real area mutual information loss corresponding to the annotated sample image respectively; and carrying out weighted summation on the real cross entropy loss and the real area mutual information loss to obtain a real annotation loss function corresponding to the annotated sample image.
In one embodiment, the loss constructing module 1108 is further configured to, when the sample image is an unmarked sample image, perform geometric transformation on the unmarked sample image to obtain a geometric transformation image corresponding to the unmarked sample image; carrying out image segmentation on the geometric transformation image through a lightweight segmentation model to be trained to obtain a third segmentation result in the lightweight segmentation model in the geometric transformation image; constructing a non-uniformity loss function according to the first segmentation result and a fourth segmentation result obtained by carrying out geometric inverse transformation on the third segmentation result; constructing a total loss function according to the knowledge distillation loss function and the non-uniformity loss function; the iteration module 1110 is further configured to update the model parameters of the lightweight segmentation model to be trained according to the total loss function, and then continue training until the training stop condition is met, so as to obtain the trained lightweight segmentation model suitable for the target scene.
In one embodiment, the apparatus further includes a verification module, configured to obtain a test sample image of the target scene and real annotation information corresponding to the test sample image; inputting the test sample image into a teacher segmentation model, performing image segmentation on the test sample image through the teacher segmentation model, and outputting a segmentation result corresponding to the test sample image; and determining the segmentation accuracy of the teacher segmentation model in the target scene according to the segmentation result and the real annotation information corresponding to each test sample image.
In one embodiment, the apparatus further includes a verification module, configured to obtain a test sample image of the target scene and real annotation information corresponding to the test sample image; inputting the test sample image into a trained light weight segmentation model, carrying out image segmentation on the test sample image through the light weight segmentation model, and outputting a segmentation result corresponding to the test sample image; and determining the segmentation accuracy of the lightweight segmentation model in the target scene according to the segmentation result corresponding to each test sample image and the real annotation information.
In one embodiment, the apparatus further includes a real-time segmentation module, configured to obtain an image to be processed including the target object; inputting an image to be processed into a trained lightweight segmentation model; outputting a segmentation result about a target object in the image to be processed through a trained lightweight segmentation model; and replacing the background area except the target object in the image to be processed with the virtual background according to the segmentation result to obtain a fused image of the target object and the virtual background.
When the segmentation effect of the lightweight segmentation model in the target scene needs to be optimized, the processing apparatus 1100 for image segmentation models only needs to input the unlabeled sample image into the lightweight segmentation model to be optimized and the trained teacher segmentation model, and use the second segmentation result output by the trained teacher segmentation model as guidance information for the optimization training of the lightweight segmentation model. On one hand, training with unlabeled sample images means that the optimization of the lightweight segmentation model does not depend on annotation data for a large number of sample images, which saves annotation cost and removes the limitation that annotation speed places on model optimization, greatly increasing the speed of iterative optimization; the effect is especially pronounced when the target scene iterates frequently to meet new scene requirements. On the other hand, manual annotation cannot truly label every pixel accurately, so segmentation edges contain a large amount of erroneous annotation data and inevitably introduce error information; in contrast, the second segmentation result output by the trained teacher segmentation model is more accurate at segmentation edges. Furthermore, the classification of each pixel in manual annotation information is either 0 or 1, which degrades the supervision information, whereas the second segmentation result output by the teacher segmentation model is a probability value between 0 and 1 and therefore provides stronger supervision. Using the second segmentation result as guidance information thus yields a trained lightweight segmentation model with higher accuracy.
For specific limitations of the image segmentation model processing apparatus 1100, reference may be made to the above limitations on the processing method of the image segmentation model, which will not be described herein again. The respective modules in the processing device of the image segmentation model can be wholly or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, as shown in fig. 12, an image processing apparatus 1200 is provided, which may be a part of a computer device using a software module or a hardware module, or a combination of the two modules, and specifically includes: an obtaining module 1202, a segmenting module 1204, and a fusing module 1206, wherein:
an obtaining module 1202, configured to obtain an image to be processed including a target object;
a segmentation module 1204, configured to perform image segmentation on the image to be processed through the trained lightweight segmentation model to obtain a segmentation result for the target object in the image to be processed, where the trained lightweight segmentation model is obtained by updating model parameters based on a knowledge distillation loss function, and the knowledge distillation loss function is constructed from a first segmentation result and a second segmentation result obtained by performing image segmentation on an unlabeled sample image of the target scene through the lightweight segmentation model to be trained and through the trained teacher segmentation model, respectively;
and the fusion module 1206 is configured to replace the background region except the target object in the image to be processed with the virtual background according to the segmentation result, so as to obtain a fusion image in which the target object and the virtual background are fused.
In one embodiment, the image to be processed includes participants participating in the online conference, and the fusion module 1206 is further configured to replace, according to the segmentation result, a background area in the online conference image, other than the participants, with a virtual background, to obtain a fusion image in which the participants and the virtual background are fused; the device also comprises a display module used for displaying the fused image and a sending module used for sending the fused image to the terminals participating in the online conference.
When an image to be processed from the target scene needs to be segmented, the image processing apparatus 1200 can obtain the segmentation result of the target object quickly and accurately simply by inputting the image into the trained, small-volume lightweight segmentation model adapted to the target scene. This makes it feasible to run the image segmentation model on mobile terminals and embedded devices, and makes the process of replacing the background region other than the target object with a virtual background according to the segmentation result more efficient.
The segmentation capability of the lightweight segmentation model in the target scene is obtained through optimization training with unlabeled sample images: the unlabeled sample images only need to be input into the lightweight segmentation model to be optimized and a trained teacher segmentation model, and the second segmentation result output by the teacher segmentation model is used as guidance information for the optimization training of the lightweight segmentation model. Specifically, a knowledge distillation loss function is constructed from the first segmentation result output by the lightweight segmentation model and the second segmentation result output by the teacher segmentation model, and the model parameters of the lightweight segmentation model are updated with this knowledge distillation loss function, yielding a trained lightweight segmentation model suitable for the target scene.
For specific limitations of the image processing apparatus 1200, the above limitations on the image processing method can be referred to, and are not repeated herein. The respective modules in the image processing apparatus described above may be wholly or partially implemented by software, hardware, and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 13. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of processing an image segmentation model.
In one embodiment, a computer device is provided, which may be a terminal, and its internal structure diagram may be as shown in fig. 14. The computer equipment comprises a processor, a memory, a communication interface, a display screen and an image acquisition device which are connected through a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless communication can be realized through WIFI, an operator network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement an image processing method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the image acquisition device of the computer equipment can be a camera and the like.
It will be appreciated by those skilled in the art that the configurations shown in fig. 13 or fig. 14 are only block diagrams of part of the structures related to the solution of the present application and do not constitute a limitation on the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown in the figures, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.
In an embodiment, a computer-readable storage medium is provided, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.
In one embodiment, a computer program product or computer program is provided that includes computer instructions stored in a computer-readable storage medium. The computer instructions are read by a processor of a computer device from a computer-readable storage medium, and the computer instructions are executed by the processor to cause the computer device to perform the steps in the above-mentioned method embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the above method embodiments. Any reference to memory, storage, database, or other medium used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, any combination of these technical features should be considered within the scope of this specification as long as it contains no contradiction.
The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (15)

1. A method for processing an image segmentation model, the method comprising:
acquiring a sample image of a target scene;
performing image segmentation on the sample image through a to-be-trained lightweight segmentation model to obtain a first segmentation result in the to-be-trained lightweight segmentation model in the sample image;
performing image segmentation on the sample image through a trained teacher segmentation model to obtain a second segmentation result in the trained teacher segmentation model in the sample image;
constructing a knowledge distillation loss function according to the first segmentation result and the second segmentation result;
and when the sample image is an unmarked sample image, updating the model parameters of the lightweight segmentation model to be trained according to the knowledge distillation loss function, and continuing training until the training stopping condition is met, so as to obtain the trained lightweight segmentation model suitable for the target scene.
2. The method of claim 1, wherein the image segmenting the sample image through the trained teacher segmentation model to obtain a second segmentation result in the trained teacher segmentation model in the sample image comprises:
respectively inputting the sample images into a plurality of trained teacher segmentation models;
carrying out image segmentation on the sample images through the trained teacher segmentation models to obtain a plurality of groups of segmentation results of the sample images in the trained teacher segmentation models;
and integrating the multiple sets of segmentation results output by the trained teacher segmentation models to obtain the second segmentation result.
3. The method of claim 2, wherein said integrating the multiple sets of segmentation results output by the trained teacher segmentation models to obtain the second segmentation result comprises:
calculating the classification probability mean value of each pixel in the trained teacher segmentation models according to the classification probability of each pixel in the sample image represented by each set of segmentation results in the corresponding teacher segmentation model;
and obtaining a second segmentation result according to the classification probability mean value corresponding to each pixel in the sample image.
4. The method of claim 1, wherein constructing a knowledge distillation loss function from the first and second split results comprises:
taking the first segmentation result as prediction information of the sample image, and taking the second segmentation result as soft labeling information of the sample image;
respectively calculating the weighted divergence loss and the soft labeling area mutual information loss corresponding to the sample image according to the prediction information and the soft labeling information;
and carrying out weighted summation on the weighted divergence loss and the soft labeling area mutual information loss to obtain a knowledge distillation loss function corresponding to the sample image.
5. The method of claim 4, wherein the calculating the weighted divergence loss corresponding to the sample image according to the prediction information and the soft labeling information comprises:
determining edge pixels and non-edge pixels in the sample image according to the second segmentation result;
acquiring a first weight corresponding to the edge pixel and a second weight corresponding to the non-edge pixel;
and calculating the corresponding weighted divergence loss of the sample image according to the first weight, the second weight, the prediction information and the soft labeling information.
6. The method of claim 1, further comprising:
when the sample image is the marked sample image, calculating a real marking loss function according to the first segmentation result and the real marking information of the marked sample image;
constructing a total loss function according to the knowledge distillation loss function and the real labeled loss function;
and updating the model parameters of the lightweight segmentation model to be trained according to the total loss function and then continuing training until the training stopping condition is met, thereby obtaining the trained lightweight segmentation model suitable for the target scene.
7. The method of claim 6, wherein the calculating a true annotation loss function based on the first segmentation result and true annotation information of the annotated sample image comprises:
respectively calculating the real cross entropy loss and the real area mutual information loss corresponding to the marked sample image according to the first segmentation result and the real marking information of the marked sample image;
and carrying out weighted summation on the real cross entropy loss and the real area mutual information loss to obtain a real annotation loss function corresponding to the annotated sample image.
8. The method of claim 1, further comprising:
when the sample image is an unmarked sample image, performing geometric transformation on the unmarked sample image to obtain a geometric transformation image corresponding to the unmarked sample image;
performing image segmentation on the geometric transformation image through the to-be-trained lightweight segmentation model to obtain a third segmentation result in the to-be-trained lightweight segmentation model in the geometric transformation image;
constructing a non-uniformity loss function according to the first segmentation result and a fourth segmentation result obtained by carrying out geometric inverse transformation on the third segmentation result;
constructing a total loss function according to the knowledge distillation loss function and the non-uniformity loss function;
and updating the model parameters of the lightweight segmentation model to be trained according to the total loss function and then continuing training until the training stopping condition is met, thereby obtaining the trained lightweight segmentation model suitable for the target scene.
9. The method of claim 1, wherein prior to said obtaining a sample image of a target scene, the method further comprises:
acquiring a test sample image of a target scene and real annotation information corresponding to the test sample image;
inputting the test sample image into the trained teacher segmentation model, performing image segmentation on the test sample image through the trained teacher segmentation model, and outputting a segmentation result corresponding to the test sample image;
and determining the segmentation accuracy of the trained teacher segmentation model in the target scene according to the segmentation result corresponding to each test sample image and the real annotation information.
10. The method of claim 1, further comprising:
acquiring a test sample image of a target scene and real annotation information corresponding to the test sample image;
inputting the test sample image into the trained lightweight segmentation model, performing image segmentation on the test sample image through the trained lightweight segmentation model, and outputting a segmentation result corresponding to the test sample image;
and determining the segmentation accuracy of the trained lightweight segmentation model in the target scene according to the segmentation result corresponding to each test sample image and the real annotation information.
11. The method according to any one of claims 1 to 10, further comprising:
acquiring an image to be processed including a target object;
inputting the image to be processed into the trained lightweight segmentation model;
outputting a segmentation result about a target object in the image to be processed through the trained lightweight segmentation model;
and replacing a background area except the target object in the image to be processed with a virtual background according to the segmentation result to obtain a fused image in which the target object and the virtual background are fused.
12. An image processing method, characterized in that the method comprises:
acquiring an image to be processed including a target object;
carrying out image segmentation on the image to be processed through a trained lightweight segmentation model to obtain a segmentation result of a target object in the image to be processed;
replacing a background area except the target object in the image to be processed with a virtual background according to the segmentation result to obtain a fused image fused by the target object and the virtual background;
the trained lightweight segmentation model is obtained by updating model parameters based on a knowledge distillation loss function, wherein the knowledge distillation loss function is constructed according to a first segmentation result and a second segmentation result, the first segmentation result being obtained by performing image segmentation on an unlabeled sample image of a target scene through the lightweight segmentation model to be trained, and the second segmentation result being obtained by performing image segmentation on the unlabeled sample image through a trained teacher segmentation model.
13. The method according to claim 12, wherein the to-be-processed image includes participants participating in an online conference, and the replacing a background region except the target object in the to-be-processed image with a virtual background according to the segmentation result to obtain a fused image in which the target object and the virtual background are fused comprises:
replacing the background area except the participant in the online conference image with a virtual background according to the segmentation result to obtain a fused image fused by the participant and the virtual background;
the method further comprises the following steps:
and displaying the fused image, and sending the fused image to terminals participating in the online conference.
14. An apparatus for processing an image segmentation model, the apparatus comprising:
the acquisition module is used for acquiring a sample image of a target scene;
the first segmentation module is used for carrying out image segmentation on the sample image through a lightweight segmentation model to be trained, to obtain a first segmentation result output by the lightweight segmentation model to be trained for the sample image;
the second segmentation module is used for carrying out image segmentation on the sample image through a trained teacher segmentation model, to obtain a second segmentation result output by the trained teacher segmentation model for the sample image;
the loss construction module is used for constructing a knowledge distillation loss function according to the first segmentation result and the second segmentation result;
and the iteration module is used for, when the sample image is an unlabeled sample image, updating the model parameters of the lightweight segmentation model to be trained according to the knowledge distillation loss function and continuing training until a training stop condition is met, to obtain a trained lightweight segmentation model suitable for the target scene.
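To make the interplay of the modules in claim 14 concrete, the following hypothetical training step wires them together, reusing the `knowledge_distillation_loss` sketch given after claim 12; the optimizer, the epoch-budget stop condition, and the data-loader interface are all assumptions.

```python
import torch

def distill_student(student, teacher, unlabeled_loader, epochs=10, lr=1e-3):
    """One possible realization of the iteration module: update the
    lightweight student on unlabeled target-scene images only."""
    teacher.eval()                                 # teacher stays fixed
    student.train()
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)
    for _ in range(epochs):                        # stop condition: epoch budget
        for images in unlabeled_loader:            # no annotations needed
            with torch.no_grad():
                teacher_logits = teacher(images)   # second segmentation result
            student_logits = student(images)       # first segmentation result
            loss = knowledge_distillation_loss(student_logits, teacher_logits)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```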
15. An image processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring an image to be processed including a target object;
the segmentation module is used for carrying out image segmentation on the image to be processed through a trained lightweight segmentation model to obtain a segmentation result for a target object in the image to be processed; the trained lightweight segmentation model is obtained by updating model parameters based on a knowledge distillation loss function; the knowledge distillation loss function is constructed from a first segmentation result output by the lightweight segmentation model to be trained and a second segmentation result output by the trained teacher segmentation model, the two results being obtained by performing image segmentation on an unlabeled sample image of a target scene through the respective models;
and the fusion module is used for replacing the background area other than the target object in the image to be processed with a virtual background according to the segmentation result, to obtain a fused image in which the target object and the virtual background are fused.
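A brief usage sketch of how the three modules of claim 15 might chain at inference time, reusing the `replace_background` sketch given after claim 11; the single-channel sigmoid output head and the uint8 frame layout are assumptions, not features of the claimed apparatus.

```python
import numpy as np
import torch

def process_frame(student, frame, virtual_bg):
    """Chain the acquisition, segmentation and fusion modules for one
    (H, W, 3) uint8 frame; assumes a single-channel sigmoid output head."""
    x = torch.from_numpy(frame).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        fg_prob = torch.sigmoid(student(x))[0, 0].numpy()   # (H, W) in [0, 1]
    return replace_background(frame, fg_prob, virtual_bg)   # sketch from claim 11
```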
CN202110012485.6A 2021-01-06 2021-01-06 Image segmentation model processing method, image processing method and device Pending CN113538441A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110012485.6A CN113538441A (en) 2021-01-06 2021-01-06 Image segmentation model processing method, image processing method and device

Publications (1)

Publication Number Publication Date
CN113538441A true CN113538441A (en) 2021-10-22

Family

ID=78094336

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110012485.6A Pending CN113538441A (en) 2021-01-06 2021-01-06 Image segmentation model processing method, image processing method and device

Country Status (1)

Country Link
CN (1) CN113538441A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023082870A1 (en) * 2021-11-10 2023-05-19 腾讯科技(深圳)有限公司 Training method and apparatus for image segmentation model, image segmentation method and apparatus, and device
CN113781491A (en) * 2021-11-11 2021-12-10 阿里巴巴达摩院(杭州)科技有限公司 Training of image segmentation model, image segmentation method and device
CN114067119A (en) * 2022-01-17 2022-02-18 深圳市海清视讯科技有限公司 Training method of panorama segmentation model, panorama segmentation method and device
CN114067119B (en) * 2022-01-17 2022-05-24 深圳市海清视讯科技有限公司 Training method of panorama segmentation model, panorama segmentation method and device
CN114359563A (en) * 2022-03-21 2022-04-15 深圳思谋信息科技有限公司 Model training method and device, computer equipment and storage medium
CN114359563B (en) * 2022-03-21 2022-06-28 深圳思谋信息科技有限公司 Model training method, device, computer equipment and storage medium
CN114708286A (en) * 2022-06-06 2022-07-05 珠海横琴圣澳云智科技有限公司 Cell instance segmentation method and device based on pseudo-label dynamic update
CN115690592A (en) * 2023-01-05 2023-02-03 阿里巴巴(中国)有限公司 Image processing method and model training method
CN116402895A (en) * 2023-06-05 2023-07-07 未来机器人(深圳)有限公司 Safety verification method, unmanned forklift and storage medium
CN117036790A (en) * 2023-07-25 2023-11-10 中国科学院空天信息创新研究院 Instance segmentation multi-classification method under small sample condition
CN117036790B (en) * 2023-07-25 2024-03-22 中国科学院空天信息创新研究院 Instance segmentation multi-classification method under small sample condition

Similar Documents

Publication Publication Date Title
CN112766244B (en) Target object detection method and device, computer equipment and storage medium
CN113538441A (en) Image segmentation model processing method, image processing method and device
WO2020228446A1 (en) Model training method and apparatus, and terminal and storage medium
CN111950638B (en) Image classification method and device based on model distillation and electronic equipment
CN110033023B (en) Image data processing method and system based on picture book recognition
CN111310731A (en) Video recommendation method, device and equipment based on artificial intelligence and storage medium
CN110929622A (en) Video classification method, model training method, device, equipment and storage medium
CN112052839A (en) Image data processing method, apparatus, device and medium
CN111581414B (en) Method, device, equipment and storage medium for identifying, classifying and searching clothes
CN111667001B (en) Target re-identification method, device, computer equipment and storage medium
CN110889450B (en) Super-parameter tuning and model construction method and device
CN111680678B (en) Target area identification method, device, equipment and readable storage medium
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN113704531A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN112801236B (en) Image recognition model migration method, device, equipment and storage medium
CN110598638A (en) Model training method, face gender prediction method, device and storage medium
CN111898735A (en) Distillation learning method, distillation learning device, computer equipment and storage medium
CN110866469A (en) Human face facial features recognition method, device, equipment and medium
CN115050064A (en) Face living body detection method, device, equipment and medium
CN114358109A (en) Feature extraction model training method, feature extraction model training device, sample retrieval method, sample retrieval device and computer equipment
CN113505797A (en) Model training method and device, computer equipment and storage medium
CN117033609B (en) Text visual question-answering method, device, computer equipment and storage medium
Wang et al. TPSN: Transformer-based multi-Prototype Search Network for few-shot semantic segmentation
CN111444957B (en) Image data processing method, device, computer equipment and storage medium
CN112183303A (en) Transformer equipment image classification method and device, computer equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code: HK (legal event code: DE; document number: 40054003)
SE01 Entry into force of request for substantive examination