CN111260679B - Image processing method, image segmentation model training method and related device - Google Patents

Image processing method, image segmentation model training method and related device Download PDF

Info

Publication number
CN111260679B
Authority
CN
China
Prior art keywords
image
training
information
segmentation model
mask information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010014372.5A
Other languages
Chinese (zh)
Other versions
CN111260679A (en)
Inventor
叶海佳
何帅
王文斓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd filed Critical Guangzhou Huya Technology Co Ltd
Priority to CN202010014372.5A priority Critical patent/CN111260679B/en
Publication of CN111260679A publication Critical patent/CN111260679A/en
Priority to PCT/CN2021/070167 priority patent/WO2021139625A1/en
Application granted granted Critical
Publication of CN111260679B publication Critical patent/CN111260679B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30204 Marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing method, an image segmentation model training method and a related device, and relates to the technical field of artificial intelligence. Two training images that are adjacent in time sequence, together with the optical flow information between the two training images, are obtained as a training image set, and the training annotation information corresponding to the training image set is obtained; the two training images are then input into an image segmentation model to obtain two pieces of training mask information; and the model parameters of the image segmentation model are updated according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition. Compared with the prior art, the image segmentation model can learn the motion information between images from the optical flow information, so that it can extract the mask information of each image in combination with the motion information between that image and other adjacent images, thereby ensuring consistency between consecutive images when image fusion is performed.

Description

Image processing method, image segmentation model training method and related device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to an image processing method, an image segmentation model training method and a related device.
Background
The matting technology is used for separating foreground information and background information in one image and then applying the obtained foreground information to other background information; by utilizing the matting technology, the extracted foreground information can be fused with any background information, for example, in the live broadcast field, the extracted portrait information can be fused with any background picture or video, and therefore the live broadcast experience of a user is improved.
However, current matting techniques only separate the pixels of the foreground information from those of the background information to obtain a mask containing only 0s and 1s; when fusion is performed, the consistency between adjacent image frames is poor, so the object information in the video image may exhibit jitter and similar artifacts.
Disclosure of Invention
An object of the present application is to provide an image processing method, an image segmentation model training method, and a related apparatus, which can ensure consistency between consecutive images when performing image fusion.
In order to achieve the purpose, the technical scheme adopted by the application is as follows:
in a first aspect, the present application provides a method for training an image segmentation model, where the method includes:
obtaining a training image set and training annotation information corresponding to the training image set; wherein the training image set includes two training images adjacent in time sequence, and optical flow information between the two training images;
inputting the two training images into the image segmentation model to obtain two training mask information;
and updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition.
In a second aspect, the present application provides an image processing method, the method comprising:
receiving an image to be processed and a background to be fused;
inputting the image to be processed into an image segmentation model trained to be convergent by the method provided by the application to obtain target mask information corresponding to the image to be processed;
and processing the image to be processed and the background to be fused by using the target mask information to obtain a fused image.
In a third aspect, the present application provides an image segmentation model training apparatus, including:
the first processing module is used for obtaining a training image set and training annotation information corresponding to the training image set; wherein the training image set comprises two training images adjacent in time sequence and optical flow information between the two training images;
the first processing module is further configured to input the two training images into the image segmentation model to obtain two training mask information;
and the updating module is used for updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches the set convergence condition.
In a fourth aspect, the present application provides an image processing apparatus comprising:
the receiving module is used for receiving the image to be processed and the background to be fused;
the second processing module is used for inputting the image to be processed into an image segmentation model which is trained to be convergent by using the method provided by the application, and obtaining target mask information corresponding to the image to be processed;
the second processing module is further configured to process the image to be processed and the background to be fused by using the target mask information, so as to obtain a fused image.
In a fifth aspect, the present application provides an electronic device comprising a memory for storing one or more programs; a processor; the one or more programs, when executed by the processor, implement the image segmentation model training method or the image processing method described above.
In a sixth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the above-described image segmentation model training method or image processing method.
According to the image processing method, the image segmentation model training method and the related device, two training images adjacent in time sequence and the optical flow information between the two training images are obtained as a training image set, and the training annotation information corresponding to the training image set is obtained; the two training images are then input into an image segmentation model to obtain two pieces of training mask information; and the model parameters of the image segmentation model are updated according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition. Compared with the prior art, when the image segmentation model is trained it can learn the motion information between images from the optical flow information, so that it can extract the mask information of each image in combination with the motion information between that image and other adjacent images, and consistency between consecutive images can be ensured when image fusion is performed.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly explain the technical solutions of the present application, the drawings needed for the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also derive other related drawings from these drawings without inventive effort.
FIG. 1 is a block diagram illustrating a schematic configuration of an electronic device provided herein;
FIG. 2 illustrates a schematic flow diagram of an image segmentation model training method provided herein;
FIG. 3 shows a schematic block diagram of an image segmentation model;
FIG. 4 shows a schematic flow diagram of sub-steps of step 206 in FIG. 2;
FIG. 5 illustrates another schematic flow chart diagram of an image segmentation model training method provided herein;
FIG. 6 is a schematic diagram illustrating one way of extracting optical flow information;
FIG. 7 illustrates yet another schematic flow chart of an image segmentation model training method provided herein;
FIG. 8 shows a schematic flow chart of an image processing method provided by the present application;
FIG. 9 illustrates a schematic diagram of an image before and after fusion;
FIG. 10 is a block diagram illustrating an exemplary structure of an image segmentation model training apparatus provided in the present application;
fig. 11 shows a schematic block diagram of an image processing apparatus provided in the present application.
In the figure: 100-an electronic device; 101-a memory; 102-a processor; 103-a communication interface; 400-image segmentation model training means; 401-a first processing module; 402-an update module; 500-an image processing apparatus; 501-a receiving module; 502-second processing module.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application will be clearly and completely described below with reference to the accompanying drawings in some embodiments of the present application, and it is obvious that the described embodiments are some, but not all embodiments of the present application. The components of the present application, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on a part of the embodiments in the present application without any creative effort belong to the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In the live broadcast field, for example, foreground information such as a portrait of a main broadcast may be separated from background information by using a matting technique, and then the separated foreground information may be merged with other background information.
Wherein, assuming that the foreground information separated by matting is represented as F and the background information to be fused is represented as B, the fused image I can be represented as: I = mF + (1 - m)B.
In the formula, m represents mask information (mask) corresponding to the foreground information F.
From the fusion formula for image I, it can be seen that the fusion effect of image I is mainly determined by the value of the mask information, since the foreground information F and the background information B are both fixed inputs.
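To make the formula concrete, the following minimal sketch composites a foreground over a background with a per-pixel mask using NumPy; the helper name fuse and the array names are illustrative assumptions introduced here, not terms from this application.

    import numpy as np

    def fuse(foreground: np.ndarray, background: np.ndarray, mask: np.ndarray) -> np.ndarray:
        """Compute I = m*F + (1 - m)*B for one image.

        foreground, background: H x W x 3 float arrays
        mask:                   H x W float array with values in [0, 1]
        """
        m = mask[..., None]  # add a channel axis so the mask broadcasts over the color channels
        return m * foreground + (1.0 - m) * background

With a hard 0/1 mask this reduces to a plain cut-and-paste of the foreground; a soft mask blends the two along object boundaries.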
In some matting schemes, such as portrait binary semantic segmentation, mask information is obtained by understanding the image at the semantic level: the information in the image is classified into foreground pixels and background pixels according to semantic category, yielding a mask containing only 0s and 1s.
However, in scenes such as live webcasting, when fusing foreground information such as the anchor's portrait with other background information, the broadcast is a continuous video stream, so it is necessary to consider not only the fusion between the foreground information and the background information, but also that the segmentation results of two consecutive frames must not deviate greatly from each other. The portrait binary semantic segmentation scheme described above, however, only considers the segmentation of a single frame and does not consider the temporal consistency of the segmentation results of adjacent image frames; when such portrait segmentation is applied to scenes such as live webcasting, the object information in the fused video may jitter after the segmented foreground information is fused with other background information, which degrades the user experience.
Therefore, in view of the above drawbacks, the present application provides a possible implementation as follows: obtain two training images adjacent in time sequence and the optical flow information between the two training images as a training image set, and obtain the training annotation information corresponding to the training image set; then input the two training images into an image segmentation model to obtain two pieces of training mask information; and update the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition. In this way, the image segmentation model can extract the mask information of each image in combination with the motion information between that image and other adjacent images, so that consistency among consecutive images can be ensured when image fusion is performed.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic block diagram illustrating an electronic device 100 provided in the present application, in which an untrained image segmentation model may be stored in the electronic device 100, so as to execute the image segmentation model training method provided in the present application to complete training of the image segmentation model; alternatively, the electronic device 100 may store therein an image segmentation model trained to converge by the image segmentation model training method provided by the present application, and implement the image processing method provided by the present application by using the image segmentation model trained to converge.
The electronic device 100 includes a memory 101, a processor 102, and a communication interface 103, wherein the memory 101, the processor 102, and the communication interface 103 are electrically connected to each other directly or indirectly to enable data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The memory 101 may be used to store software programs and modules, such as program instructions/modules corresponding to the image segmentation model training apparatus or the image processing apparatus provided in the present application, and the processor 102 executes various functional applications and data processing by executing the software programs and modules stored in the memory 101, so as to execute the image segmentation model training method or the steps corresponding to the image processing method provided in the present application. The communication interface 103 may be used for communicating signaling or data with other node devices.
The Memory 101 may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor 102 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It will be appreciated that the configuration shown in FIG. 1 is merely illustrative and that electronic device 100 may include more or fewer components than shown in FIG. 1 or may have a different configuration than shown in FIG. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
The following describes an exemplary training method of an image segmentation model provided in the present application, taking the electronic device 100 shown in fig. 1 as an exemplary implementation subject.
Referring to fig. 2, fig. 2 shows a schematic flow chart of the image segmentation model training method provided in the present application, which may include the following steps:
step 202, obtaining a training image set and training annotation information corresponding to the training image set;
step 204, inputting the two training images into an image segmentation model to obtain two training mask information;
and step 206, updating model parameters of the image segmentation model according to the two pieces of training mask information, the training label information and the optical flow information until the image segmentation model reaches a set convergence condition.
In one embodiment, an image segmentation model as shown in fig. 3 is stored in the electronic device; the image segmentation model can process an input image and output the mask information corresponding to the image. The network structure adopted by the image segmentation model may be a U-Net, or another segmentation network such as DeepLabv3 or SegNet; the network structure of the image segmentation model is not limited in the present application.
When the electronic device trains the image segmentation model, a training image set and the training annotation information corresponding to the training image set may be obtained first, where the training image set includes two training images adjacent in time sequence, such as I0 and I1 in fig. 3, and the optical flow information between the two training images; the optical flow information characterizes the motion cues between the two training images, i.e., the correlation between I0 and I1.
Then, as shown in fig. 3, the electronic device may input the two training images I0 and I1 into the image segmentation model to obtain two pieces of training mask information; for example, in the scenario shown in fig. 3, the mask information corresponding to I0 may be Mask0, and the mask information corresponding to I1 may be Mask1.
Finally, the electronic device may update the model parameters of the image segmentation model according to the two pieces of training mask information, such as Mask0 and Mask1, the training annotation information and the optical flow information, for example by using a back-propagation (BP) algorithm, until the image segmentation model reaches the set convergence condition. Since the optical flow information represents the motion information between the two training images, the mask information corresponding to each of the two training images should likewise carry the motion information represented by the optical flow information; in this way, when the model parameters of the image segmentation model are updated, the model can learn the motion information between the two training images from the optical flow information, so that when extracting the mask information of a target image it can do so in combination with the mask information of other images adjacent to the target image, thereby maintaining consistency between adjacent images.
Based on the above design, the image segmentation model training method provided by the present application obtains two training images adjacent in time sequence and the optical flow information between the two training images as a training image set, and obtains the training annotation information corresponding to the training image set; the two training images are then input into the image segmentation model to obtain two pieces of training mask information; and the model parameters of the image segmentation model are updated according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition. Compared with the prior art, when the image segmentation model is trained it can learn the motion information between images from the optical flow information, so that it can extract the mask information of each image in combination with the motion information between that image and other adjacent images, and consistency between consecutive images can be ensured when image fusion is performed.
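For illustration only, the following sketch outlines one training iteration corresponding to steps 202 through 206 in PyTorch; it assumes the image segmentation model is an ordinary torch.nn.Module and that content_loss and timing_loss follow the formulas given later in this description. All names here are assumptions, not identifiers from this application.

    import torch

    def train_step(model, optimizer, img0, img1, flow01, mask_gt, content_loss, timing_loss):
        # Step 204: input the two time-adjacent training images to obtain two pieces of mask information
        mask0_pre = model(img0)   # first training mask information
        mask1_pre = model(img1)   # second training mask information

        # Step 206: total loss L = Lc + Lst, then update the model parameters
        loss = (content_loss(mask_gt, mask1_pre, img1)
                + timing_loss(flow01, img0, img1, mask0_pre, mask1_pre))
        optimizer.zero_grad()
        loss.backward()           # back-propagation (BP) of the total loss
        optimizer.step()
        return loss.item()

The step would be repeated over training image sets until the set convergence condition is reached.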
It should be noted that the two training images that are adjacent in time sequence and obtained by the electronic device include a first image that is earlier in the time sequence and a second image that is later in the time sequence; for example, I0 in fig. 3 may serve as the first image and I1 as the second image.
Accordingly, the two pieces of training mask information output by the image segmentation model include the first training mask information corresponding to the first image and the second training mask information corresponding to the second image; for example, Mask0 corresponding to I0 in fig. 3 may serve as the first training mask information, and Mask1 corresponding to I1 may serve as the second training mask information.
In addition, optical flow is the motion between two consecutive frames of images caused by the movement of objects, the scene, or the camera; optical flow information is vector information, and optical flow is generally divided into forward optical flow and backward optical flow. For the two frames shown in fig. 3, image I0 precedes image I1; thus, with respect to image I1, the optical flow from image I0 to image I1 is the backward optical flow, recording the motion direction and motion rate from image I0 to image I1, while the optical flow from image I1 to image I0 is the forward optical flow, recording the motion direction and motion rate from image I1 to image I0.
In the above description, for illustration purposes only, the temporally earlier image is used as the first image and the temporally later image as the second image; in other possible embodiments of the present application, the temporally later image may be used as the first image and the temporally earlier image as the second image, which is not limited in this application.
Optionally, in an embodiment, the training annotation information obtained by the electronic device may be the annotation mask information of the second image; that is, in the scenario shown in fig. 3, the electronic device obtains the annotation mask information of image I1. Accordingly, the optical flow information obtained by the electronic device may be the backward optical flow with respect to image I1.
Thus, referring to fig. 4 on the basis of fig. 2, fig. 4 shows a schematic flow chart of the sub-steps of step 206 in fig. 2, and as a possible implementation, step 206 may include the following sub-steps:
step 206-1, obtaining the content loss of the image segmentation model according to the labeling mask information, the second training mask information and the second image;
step 206-2, obtaining the time sequence loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information and the second training mask information;
and step 206-3, updating model parameters of the image segmentation model based on the content loss and the time sequence loss.
The electronic device, in performing step 206 to update the model parameters of the image segmentation model, may split the loss function of the image segmentation model into two parts, content loss and timing loss.
For example, the loss function of the image segmentation model may satisfy the following formula:
L=Lc+Lst
in the formula, L represents the total loss of the image segmentation model, Lc represents the content loss, and Lst represents the time-series loss.
The content loss constrains the second training mask information output by the image segmentation model against the actual mask information of the second image, and thereby guarantees the accuracy of the segmentation result.
In this way, when the electronic device executes step 206, the content loss of the image segmentation model can be obtained according to the annotation mask information, the second training mask information and the second image, that is, the difference between the second training mask information and the annotation mask information is calculated.
For example, as a possible implementation, the calculation formula of the content loss may satisfy the following:
Lc = ||mask_gt - mask_1pre||_2 + ||I1 * (mask_gt - mask_1pre)||_2
where Lc represents the content loss, mask_gt represents the annotation mask information, mask_1pre represents the second training mask information, and I1 represents the second image.
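A possible PyTorch reading of this content loss is sketched below; interpreting the ||·||_2 terms as norms over all pixels is an assumption, since the exact reduction is not spelled out in the text.

    import torch

    def content_loss(mask_gt, mask1_pre, img1):
        # Lc = ||mask_gt - mask_1pre||_2 + ||I1 * (mask_gt - mask_1pre)||_2
        diff = mask_gt - mask1_pre                          # N x 1 x H x W
        return torch.norm(diff) + torch.norm(img1 * diff)   # second term weights the error by the image content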
On the other hand, the time sequence loss applies a constraint using the motion information between the two frames of images, so that the mask information corresponding to the two frames can be kept consistent in time sequence.
In this way, when the electronic device executes step 206, the electronic device obtains the timing loss of the image segmentation model based on the optical flow information, the first image, the second image, the first training mask information, and the second training mask information.
For example, as a possible implementation, the timing loss calculation formula may satisfy the following:
Lst = exp(-α*M) * ||mask_1pre - warp_01(mask_0pre)||_2
where Lst denotes the timing loss, α denotes a set parameter, M = ||I1 - warp_01(I0)||_2, I0 represents the first image, warp_01 represents the optical flow information, mask_0pre represents the first training mask information, warp_01(mask_0pre) represents the mask information obtained by transforming the first training mask information to the moment of the second image based on the optical flow information, and warp_01(I0) represents the image obtained by transforming the first image to the moment of the second image based on the optical flow information.
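The sketch below is one possible PyTorch reading of the timing loss; the bilinear warp driven by a per-pixel displacement field, the whole-image norms, and the value of α are assumptions made for illustration, since the text does not fix these details.

    import torch
    import torch.nn.functional as F

    def warp(x, flow):
        """Warp x (N x C x H x W) with a displacement field flow (N x 2 x H x W, in pixels)."""
        n, _, h, w = x.shape
        ys, xs = torch.meshgrid(torch.arange(h, device=x.device),
                                torch.arange(w, device=x.device), indexing="ij")
        base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # 1 x 2 x H x W pixel grid
        grid = base + flow                                         # sampling positions in pixels
        gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0                # normalize to [-1, 1] for grid_sample
        gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
        return F.grid_sample(x, torch.stack((gx, gy), dim=-1), align_corners=True)

    def timing_loss(flow01, img0, img1, mask0_pre, mask1_pre, alpha=50.0):
        # Lst = exp(-alpha * M) * ||mask_1pre - warp_01(mask_0pre)||_2, with M = ||I1 - warp_01(I0)||_2
        m = torch.norm(img1 - warp(img0, flow01))
        return torch.exp(-alpha * m) * torch.norm(mask1_pre - warp(mask0_pre, flow01))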
In this way, based on the obtained content loss and timing loss, the electronic device may take their sum as the total loss of the image segmentation model, and then update the model parameters of the image segmentation model using, for example, the BP algorithm; iterative training continues until the image segmentation model reaches the set convergence condition.
It should be noted that the above formulas for calculating the content loss, the timing loss, and the total loss of the image segmentation model are only illustrative, and in some other possible embodiments of the present application, other formulas may be used to calculate the above losses, for example.
In addition, in the above scheme provided by the present application, step 206-1 may be executed first to obtain the content loss and then step 206-2 to obtain the timing loss; alternatively, step 206-2 may be executed first to obtain the timing loss and then step 206-1 to obtain the content loss; the order in which step 206-1 and step 206-2 are executed is not limited in the present application. For example, in another possible implementation, step 206-1 may be performed together with step 206-2.
Furthermore, it should be noted that optical flow is a two-dimensional vector field over the image: it uses the two-dimensional image to represent the velocity field of object points moving in three dimensions, and reflects the image changes caused by motion within a tiny time interval, so that the motion direction and motion rate at each image point can be determined; optical flow can therefore provide cues for recovering the motion in the images.
When the image segmentation model is trained, optical flow information between two training images can be obtained in an online extraction mode, so that the workload of a user when the image segmentation model is trained is reduced.
To this end, referring to fig. 5 on the basis of fig. 2, fig. 5 shows another schematic flowchart of the image segmentation model training method provided in the present application, and before performing step 202, the image segmentation model training method may further include the following steps:
in step 201, inter-frame optical flow between two training images is extracted to obtain optical flow information.
In one embodiment, as shown in FIG. 6, the electronic device may extract the inter-frame optical flow between the two training images using, for example, the SelFlow algorithm, to obtain the optical flow information.
For example, continuing the example above in which the backward optical flow with respect to image I1 is used as the optical flow information, the electronic device may take image I0 and image I1 as input and extract the backward inter-frame optical flow with respect to image I1 using the SelFlow algorithm, thereby obtaining the optical flow information.
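The SelFlow network itself is not reproduced here; purely as a stand-in to illustrate the shape of the operation (two adjacent frames in, a dense per-pixel flow field out), the sketch below uses OpenCV's Farneback estimator. Substituting Farneback for SelFlow, and the direction convention of the returned flow, are assumptions for illustration only.

    import cv2
    import numpy as np

    def interframe_flow(img0_bgr: np.ndarray, img1_bgr: np.ndarray) -> np.ndarray:
        """Dense inter-frame optical flow between two adjacent frames (H x W x 2, in pixels)."""
        g0 = cv2.cvtColor(img0_bgr, cv2.COLOR_BGR2GRAY)
        g1 = cv2.cvtColor(img1_bgr, cv2.COLOR_BGR2GRAY)
        # arguments: prev, next, flow, pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
        return cv2.calcOpticalFlowFarneback(g0, g1, None, 0.5, 3, 15, 3, 5, 1.2, 0)

For the offline variant described next, this extraction would be run once per pair of training images and the result stored alongside the pair.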
However, in the embodiment that obtains the optical flow information by the above online extraction, the step of extracting the optical flow information has to be performed online, which prolongs the training time of the image segmentation model; furthermore, when the image segmentation model is trained iteratively, step 201 needs to be executed repeatedly, so the optical flow extraction may be performed repeatedly on the same pair of training images.
Therefore, in another embodiment, the optical flow information can also be obtained by offline extraction, namely: step 201 may be executed first, and after the optical flow information of each pair of training images is obtained, each pair of training images and the corresponding optical flow information are used as the input of the image segmentation model to carry out the training process. In this case, when the image segmentation model is trained, step 201 does not need to be performed again to obtain the optical flow information, which reduces the training time of the image segmentation model and avoids repeatedly extracting the optical flow for the same pair of training images.
Moreover, in an actual training scenario, open-source data is scarce; to obtain sufficient training images for the image segmentation model, the amount of training data can be increased by capturing, for example, a portrait matting data set from a live broadcast scene and extracting the mask information of the corresponding images as the annotation mask information.
However, it should be noted that capturing such a matting data set still requires manual operation by the user, which increases the workload of the user when training the image segmentation model.
To this end, referring to fig. 7 on the basis of fig. 2, fig. 7 shows a further schematic flowchart of the image segmentation model training method provided by the present application, and before performing step 202, the image segmentation model training method may further include the following steps:
step 200, fusing the obtained two object information with background information respectively to generate two training images.
In one embodiment, a user can extract the object information in two images adjacent in time sequence and transmit the two pieces of object information to the electronic device; the electronic device may then fuse the two pieces of object information with one piece of background information to generate two training images, that is, one training image set, thereby increasing the amount of training data.
Of course, it should be understood that the above describes only one way of generating two training images, taking the fusion of the obtained object information with a single piece of background information as an example; when a large number of training images need to be generated, the electronic device may fuse the two pieces of object information with different pieces of background information, so as to generate multiple training image sets, where each set includes two training images, as sketched below.
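As a sketch of this data-generation step, the helper below fuses two temporally adjacent foreground objects (with their masks) onto one background using the fuse() compositing helper sketched earlier; all names are illustrative assumptions.

    def make_training_pair(fg0, mask0, fg1, mask1, background):
        """Build one training image set (I0, I1) from two adjacent object crops and one background."""
        img0 = fuse(fg0, background, mask0)   # earlier frame of the pair
        img1 = fuse(fg1, background, mask1)   # later frame of the pair
        return img0, img1                     # mask1 can also serve as the annotation mask of I1

Reusing the same pair of object crops with several different backgrounds yields several training image sets, as described above.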
In addition, as described above, in a scene where the foreground information F separated by matting is fused with background information B, the fused image I can be represented as: I = mF + (1 - m)B.
In the expression, m represents mask information corresponding to the foreground information F. As can be seen, based on the expression, as long as the mask information m corresponding to the foreground information F can be obtained, the foreground information F can be fused with any background information B.
Thus, an image segmentation model trained to convergence by the image segmentation model training method provided by the present application can be used, for example, for the fusion of foreground information F and background information B in a live broadcast scene.
Referring to fig. 8, fig. 8 shows a schematic flow chart of an image processing method provided by the present application, which may include the following steps:
step 302, receiving an image to be processed and a background to be fused;
step 304, inputting the image to be processed into an image segmentation model trained to be converged by using an image segmentation model training method to obtain target mask information corresponding to the image to be processed;
and step 306, processing the image to be processed and the background to be fused by using the target mask information to obtain a fused image.
In an embodiment, for example in a live broadcast scene, the electronic device may take each received frame of the live video as an image to be processed and receive a background to be fused, the purpose being to replace the background information of each frame of the live video with the background to be fused.
Then, taking one frame of the live video as the image to be processed as an example, the electronic device may input the image to be processed into the image segmentation model trained to convergence by the image segmentation model training method provided by the present application, so that the image segmentation model outputs the target mask information Mask_m corresponding to the image to be processed.
Finally, the electronic device may take the obtained target mask information Mask_m as the parameter m in the fusion formula, and substitute the image to be processed and the background to be fused into the fusion formula to obtain the fused image I; the effect before and after fusion may be as shown in FIG. 9. As can be seen, after learning the motion information between images using the optical flow information, the image segmentation model can extract the mask information of each image in combination with the motion information between that image and other adjacent images, so that consistency between consecutive images can be ensured when image fusion is performed.
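The sketch below ties steps 302 to 306 together for a single frame: run the converged segmentation model to obtain the target mask information Mask_m, then substitute it as the parameter m in the fusion formula, reusing the fuse() helper sketched earlier. The tensor layout and the [0, 1] range of the model output are assumptions.

    import numpy as np
    import torch

    def process_frame(model, frame_rgb: np.ndarray, new_background: np.ndarray) -> np.ndarray:
        x = torch.from_numpy(frame_rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        with torch.no_grad():
            mask_m = model(x)[0, 0].cpu().numpy()          # target mask information, H x W in [0, 1]
        return fuse(frame_rgb.astype(np.float32),
                    new_background.astype(np.float32), mask_m)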
In addition, based on the same inventive concept as the above-mentioned image segmentation model training method provided in the present application, please refer to fig. 10, and fig. 10 shows a schematic structural block diagram of an image segmentation model training apparatus 400 provided in the present application, where the image segmentation model training apparatus 400 may include a first processing module 401 and an updating module 402. Wherein:
a first processing module 401, configured to obtain a training image set and training annotation information corresponding to the training image set; the training image set comprises two training images which are adjacent in time sequence and optical flow information between the two training images;
the first processing module 401 is further configured to input the two training images into an image segmentation model to obtain two training mask information;
and an updating module 402, configured to update the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information until the image segmentation model reaches a set convergence condition.
Optionally, as a possible implementation manner, the two training images include a first image that is earlier in the time sequence and a second image that is later in the time sequence, and the training annotation information is the annotation mask information of the second image;
the two training mask information comprise first training mask information corresponding to the first image and second training mask information corresponding to the second image;
the updating module 402 is specifically configured to, when updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information:
obtaining the content loss of the image segmentation model according to the labeling mask information, the second training mask information and the second image;
obtaining the time sequence loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information and the second training mask information;
and updating the model parameters of the image segmentation model based on the content loss and the time sequence loss.
Optionally, as a possible implementation manner, the calculation formula of the content loss satisfies the following:
Lc = ||mask_gt - mask_1pre||_2 + ||I1 * (mask_gt - mask_1pre)||_2
where Lc represents the content loss, mask_gt represents the annotation mask information, mask_1pre represents the second training mask information, and I1 represents the second image;
the calculation formula of the timing loss satisfies the following:
Lst = exp(-α*M) * ||mask_1pre - warp_01(mask_0pre)||_2
where Lst denotes the timing loss, α denotes a set parameter, M = ||I1 - warp_01(I0)||_2, I0 represents the first image, warp_01 represents the optical flow information, mask_0pre represents the first training mask information, warp_01(mask_0pre) represents the mask information obtained by transforming the first training mask information to the moment of the second image based on the optical flow information, and warp_01(I0) represents the image obtained by transforming the first image to the moment of the second image based on the optical flow information.
Optionally, as a possible implementation manner, before obtaining the training image set and the training annotation information corresponding to the training image set, the first processing module 401 is further configured to:
an inter-frame optical flow between two training images is extracted to obtain optical flow information.
Optionally, as a possible implementation manner, before obtaining the training image set and the training annotation information corresponding to the training image set, the first processing module 401 is further configured to:
and fusing the obtained two object information with background information respectively to generate two training images.
Referring to fig. 11, fig. 11 shows a schematic block diagram of an image processing apparatus 500 according to the present application, where the image processing apparatus 500 may include a receiving module 501 and a second processing module 502. Wherein:
a receiving module 501, configured to receive an image to be processed and a background to be fused;
a second processing module 502, configured to input an image to be processed into an image segmentation model trained to converge by using the image segmentation model provided in the present application, so as to obtain target mask information corresponding to the image to be processed;
the second processing module 502 is further configured to process the image to be processed and the background to be fused by using the target mask information, so as to obtain a fused image.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to some embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in some embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, or the portion thereof that substantially contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to some embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
The above description is only a few examples of the present application and is not intended to limit the present application, and those skilled in the art will appreciate that various modifications and variations can be made in the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (9)

1. An image segmentation model training method, characterized in that the method comprises:
obtaining a training image set and training annotation information corresponding to the training image set; wherein the training image set comprises two training images adjacent in time sequence and optical flow information between the two training images;
inputting the two training images into the image segmentation model to obtain two training mask information;
updating model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition;
the two training images comprise a first image in front of a time sequence and a second image in back of the time sequence, and the training annotation information is annotation mask information of the second image;
the two pieces of training mask information comprise first training mask information corresponding to the first image and second training mask information corresponding to the second image;
updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information and the optical flow information, wherein the step comprises the following steps:
obtaining the content loss of the image segmentation model according to the labeling mask information, the second training mask information and the second image;
obtaining a time sequence loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information and the second training mask information;
updating model parameters of the image segmentation model based on the content loss and the timing loss.
2. The method of claim 1, wherein the calculation formula of the content loss satisfies the following:
Lc = ||mask_gt - mask_1pre||_2 + ||I1 * (mask_gt - mask_1pre)||_2
wherein Lc represents the content loss, mask_gt represents the annotation mask information, mask_1pre represents the second training mask information, and I1 represents the second image;
the calculation formula of the time sequence loss satisfies the following conditions:
Lst = exp(-α*M) * ||mask_1pre - warp_01(mask_0pre)||_2
wherein Lst represents the timing loss, α represents a set parameter, M = ||I1 - warp_01(I0)||_2, I0 represents the first image, warp_01 represents the optical flow information, mask_0pre represents the first training mask information, warp_01(mask_0pre) represents the mask information obtained by transforming the first training mask information to the moment of the second image based on the optical flow information, and warp_01(I0) represents the image obtained by transforming the first image to the moment of the second image based on the optical flow information.
3. The method of claim 1, wherein prior to obtaining the training image set and the training annotation information corresponding to the training image set, the method further comprises:
and extracting inter-frame optical flow between the two training images to obtain the optical flow information.
4. The method of claim 1, wherein prior to obtaining the training image set and the training annotation information corresponding to the training image set, the method further comprises:
and fusing the obtained two object information with background information respectively to generate the two training images.
5. An image processing method, characterized in that the method comprises:
receiving an image to be processed and a background to be fused;
inputting the image to be processed into an image segmentation model trained to be converged by the method according to any one of claims 1 to 4, and obtaining target mask information corresponding to the image to be processed;
and processing the image to be processed and the background to be fused by using the target mask information to obtain a fused image.
6. An apparatus for training an image segmentation model, the apparatus comprising:
the first processing module is used for obtaining a training image set and training annotation information corresponding to the training image set; wherein the training image set comprises two training images adjacent in time sequence and optical flow information between the two training images;
the first processing module is further configured to input the two training images into the image segmentation model to obtain two training mask information;
the updating module is used for updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition;
the two training images comprise a first image in front of a time sequence and a second image in back of the time sequence, and the training annotation information is annotation mask information of the second image;
the two pieces of training mask information comprise first training mask information corresponding to the first image and second training mask information corresponding to the second image;
the update module is specifically configured to:
obtaining the content loss of the image segmentation model according to the labeling mask information, the second training mask information and the second image;
obtaining a time sequence loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information and the second training mask information;
updating model parameters of the image segmentation model based on the content loss and the timing loss.
7. An image processing apparatus, characterized in that the apparatus comprises:
the receiving module is used for receiving the image to be processed and the background to be fused;
a second processing module, configured to input the image to be processed into an image segmentation model trained to converge by using the method according to any one of claims 1 to 4, and obtain target mask information corresponding to the image to be processed;
the second processing module is further configured to process the image to be processed and the background to be fused by using the target mask information, so as to obtain a fused image.
8. An electronic device, comprising:
a memory for storing one or more programs;
a processor;
the one or more programs, when executed by the processor, implement the method of any of claims 1-5.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN202010014372.5A 2020-01-07 2020-01-07 Image processing method, image segmentation model training method and related device Active CN111260679B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010014372.5A CN111260679B (en) 2020-01-07 2020-01-07 Image processing method, image segmentation model training method and related device
PCT/CN2021/070167 WO2021139625A1 (en) 2020-01-07 2021-01-04 Image processing method, image segmentation model training method and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010014372.5A CN111260679B (en) 2020-01-07 2020-01-07 Image processing method, image segmentation model training method and related device

Publications (2)

Publication Number Publication Date
CN111260679A (en) 2020-06-09
CN111260679B (en) 2022-02-01

Family

ID=70923869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010014372.5A Active CN111260679B (en) 2020-01-07 2020-01-07 Image processing method, image segmentation model training method and related device

Country Status (2)

Country Link
CN (1) CN111260679B (en)
WO (1) WO2021139625A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260679B (en) * 2020-01-07 2022-02-01 广州虎牙科技有限公司 Image processing method, image segmentation model training method and related device
CN112351291A (en) * 2020-09-30 2021-02-09 深圳点猫科技有限公司 Teaching interaction method, device and equipment based on AI portrait segmentation
CN112560583A (en) * 2020-11-26 2021-03-26 复旦大学附属中山医院 Data set generation method and device
CN112669324B (en) * 2020-12-31 2022-09-09 中国科学技术大学 Rapid video target segmentation method based on time sequence feature aggregation and conditional convolution
CN113051430B (en) * 2021-03-26 2024-03-26 北京达佳互联信息技术有限公司 Model training method, device, electronic equipment, medium and product
CN113393465A (en) * 2021-05-26 2021-09-14 浙江吉利控股集团有限公司 Image generation method and device
US20240153038A1 (en) * 2021-07-15 2024-05-09 Boe Technology Group Co., Ltd. Image processing method and device, and training method of image processing model and training method thereof
CN113610865B (en) * 2021-07-27 2024-03-29 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and computer readable storage medium
CN113570689B (en) * 2021-07-28 2024-03-01 杭州网易云音乐科技有限公司 Portrait cartoon method, device, medium and computing equipment
CN114782460B (en) * 2022-06-21 2022-10-18 阿里巴巴达摩院(杭州)科技有限公司 Image segmentation model generation method, image segmentation method and computer equipment
CN115457119B (en) * 2022-09-21 2023-10-27 正泰集团研发中心(上海)有限公司 Bus bar labeling method, device, computer equipment and readable storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7676081B2 (en) * 2005-06-17 2010-03-09 Microsoft Corporation Image segmentation of foreground from background layers
CN103942794B (en) * 2014-04-16 2016-08-31 南京大学 A kind of image based on confidence level is collaborative scratches drawing method
US10475186B2 (en) * 2016-06-23 2019-11-12 Intel Corportation Segmentation of objects in videos using color and depth information
CN109697689B (en) * 2017-10-23 2023-09-01 北京京东尚科信息技术有限公司 Storage medium, electronic device, video synthesis method and device
CN107808389B (en) * 2017-10-24 2020-04-17 上海交通大学 Unsupervised video segmentation method based on deep learning
CN109978893B (en) * 2019-03-26 2023-06-20 腾讯科技(深圳)有限公司 Training method, device, equipment and storage medium of image semantic segmentation network
CN110176027B (en) * 2019-05-27 2023-03-14 腾讯科技(深圳)有限公司 Video target tracking method, device, equipment and storage medium
CN110472593B (en) * 2019-08-20 2021-02-09 重庆紫光华山智安科技有限公司 Training image acquisition method, model training method and related device
CN111260679B (en) * 2020-01-07 2022-02-01 广州虎牙科技有限公司 Image processing method, image segmentation model training method and related device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875900A (en) * 2017-11-02 2018-11-23 北京旷视科技有限公司 Method of video image processing and device, neural network training method, storage medium
CN110060264A (en) * 2019-04-30 2019-07-26 北京市商汤科技开发有限公司 Neural network training method, video frame processing method, apparatus and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Image segmentation based on optical flow field; Jiao Chunlin et al.; Journal of Northwestern Polytechnical University; 2006-04-30; Vol. 24, No. 2; pp. 265-269 *

Also Published As

Publication number Publication date
CN111260679A (en) 2020-06-09
WO2021139625A1 (en) 2021-07-15

Similar Documents

Publication Publication Date Title
CN111260679B (en) Image processing method, image segmentation model training method and related device
Zhang et al. Uncertainty inspired RGB-D saliency detection
Manen et al. Pathtrack: Fast trajectory annotation with path supervision
CN112967212A (en) Virtual character synthesis method, device, equipment and storage medium
JP7247327B2 (en) Techniques for Capturing and Editing Dynamic Depth Images
CN103608847A (en) Method and arrangement for image model construction
CN112308770B (en) Portrait conversion model generation method and portrait conversion method
CN112597824A (en) Behavior recognition method and device, electronic equipment and storage medium
CN114339450A (en) Video comment generation method, system, device and storage medium
CN114187392B (en) Virtual even image generation method and device and electronic equipment
Sun et al. Modnet-v: Improving portrait video matting via background restoration
WO2022164680A1 (en) Simultaneously correcting image degradations of multiple types in an image of a face
Mukhopadhyay et al. Diff2lip: Audio conditioned diffusion models for lip-synchronization
CN116091955A (en) Segmentation method, segmentation device, segmentation equipment and computer readable storage medium
Ye et al. Real3d-portrait: One-shot realistic 3d talking portrait synthesis
WO2021070004A1 (en) Object segmentation in video stream
KR102514807B1 (en) Method and Apparatus for 3D Hand Mesh Recovery in Motion Blur RGB Image
CN115147516A (en) Virtual image video generation method and device, computer equipment and storage medium
Ni Application of motion tracking technology in movies, television production and photography using big data
Menapace et al. Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
JP2014149788A (en) Object area boundary estimation device, object area boundary estimation method, and object area boundary estimation program
Gomes Graph-based network for dynamic point cloud prediction
CN111340101A (en) Stability evaluation method and device, electronic equipment and computer readable storage medium
Chopra et al. Source-Free Domain Adaptation with Diffusion-Guided Source Data Generation
CN113609960B (en) Face driving method and device for target picture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant