CN111260679B - Image processing method, image segmentation model training method and related device - Google Patents

Image processing method, image segmentation model training method and related device Download PDF

Info

Publication number
CN111260679B
Authority
CN
China
Prior art keywords
image
training
information
segmentation model
mask information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010014372.5A
Other languages
Chinese (zh)
Other versions
CN111260679A (en)
Inventor
叶海佳
何帅
王文斓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huya Technology Co Ltd
Original Assignee
Guangzhou Huya Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huya Technology Co Ltd filed Critical Guangzhou Huya Technology Co Ltd
Priority to CN202010014372.5A priority Critical patent/CN111260679B/en
Publication of CN111260679A publication Critical patent/CN111260679A/en
Priority to PCT/CN2021/070167 priority patent/WO2021139625A1/en
Application granted granted Critical
Publication of CN111260679B publication Critical patent/CN111260679B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30204 Marker

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing method, an image segmentation model training method and a related device, and relates to the technical field of artificial intelligence. Two training images that are adjacent in time sequence, together with the optical flow information between the two training images, are obtained as a training image set, and the training annotation information corresponding to the training image set is obtained; the two training images are then input into an image segmentation model to obtain two pieces of training mask information; and the model parameters of the image segmentation model are updated according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition. Compared with the prior art, the image segmentation model can learn the motion information between images from the optical flow information, so that it can extract the mask information of each image in combination with the motion information between that image and other adjacent images, thereby ensuring consistency between consecutive images when image fusion is performed.

Description

Image processing method, image segmentation model training method and related device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to an image processing method, an image segmentation model training method and a related device.
Background
The matting technology is used for separating foreground information and background information in one image and then applying the obtained foreground information to other background information; by utilizing the matting technology, the extracted foreground information can be fused with any background information, for example, in the live broadcast field, the extracted portrait information can be fused with any background picture or video, and therefore the live broadcast experience of a user is improved.
However, current matting techniques only separate the pixels of the foreground information from those of the background information to obtain a mask containing only 0s and 1s; when fusion is performed, the consistency between adjacent image frames is poor, so the object information in the video image may exhibit jitter and similar artifacts.
Disclosure of Invention
An object of the present application is to provide an image processing method, an image segmentation model training method, and a related apparatus, which can ensure consistency between consecutive images when performing image fusion.
In order to achieve the purpose, the technical scheme adopted by the application is as follows:
in a first aspect, the present application provides a method for training an image segmentation model, where the method includes:
obtaining a training image set and training annotation information corresponding to the training image set; wherein the training image set includes two training images adjacent in time sequence, and optical flow information between the two training images;
inputting the two training images into the image segmentation model to obtain two training mask information;
and updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition.
In a second aspect, the present application provides an image processing method, the method comprising:
receiving an image to be processed and a background to be fused;
inputting the image to be processed into an image segmentation model trained to be convergent by the method provided by the application to obtain target mask information corresponding to the image to be processed;
and processing the image to be processed and the background to be fused by using the target mask information to obtain a fused image.
In a third aspect, the present application provides an image segmentation model training apparatus, including:
the first processing module is used for obtaining a training image set and training annotation information corresponding to the training image set; wherein the training image set comprises two training images adjacent in time sequence and optical flow information between the two training images;
the first processing module is further configured to input the two training images into the image segmentation model to obtain two training mask information;
and the updating module is used for updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches the set convergence condition.
In a fourth aspect, the present application provides an image processing apparatus comprising:
the receiving module is used for receiving the image to be processed and the background to be fused;
the second processing module is used for inputting the image to be processed into an image segmentation model which is trained to be convergent by using the method provided by the application, and obtaining target mask information corresponding to the image to be processed;
the second processing module is further configured to process the image to be processed and the background to be fused by using the target mask information, so as to obtain a fused image.
In a fifth aspect, the present application provides an electronic device comprising a memory for storing one or more programs; a processor; the one or more programs, when executed by the processor, implement the image segmentation model training method or the image processing method described above.
In a sixth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the above-described image segmentation model training method or image processing method.
According to the image processing method, the image segmentation model training method and the related device, two training images adjacent in time sequence and the optical flow information between the two training images are obtained as a training image set, and the training annotation information corresponding to the training image set is obtained; the two training images are then input into an image segmentation model to obtain two pieces of training mask information; and the model parameters of the image segmentation model are updated according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition. Compared with the prior art, when the image segmentation model is trained it can learn the motion information between images from the optical flow information, so that it can extract the mask information of each image in combination with the motion information between that image and other adjacent images, and consistency between consecutive images can be ensured when image fusion is performed.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly explain the technical solutions of the present application, the drawings needed for the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and that those skilled in the art can also derive other related drawings from these drawings without inventive effort.
FIG. 1 is a block diagram illustrating a schematic configuration of an electronic device provided herein;
FIG. 2 illustrates a schematic flow diagram of an image segmentation model training method provided herein;
FIG. 3 shows a schematic block diagram of an image segmentation model;
FIG. 4 shows a schematic flow diagram of sub-steps of step 206 in FIG. 2;
FIG. 5 illustrates another schematic flow chart diagram of an image segmentation model training method provided herein;
FIG. 6 is a schematic diagram illustrating one way of extracting optical flow information;
FIG. 7 illustrates yet another schematic flow chart of an image segmentation model training method provided herein;
FIG. 8 shows a schematic flow chart of an image processing method provided by the present application;
FIG. 9 illustrates a schematic diagram of an image before and after fusion;
FIG. 10 is a block diagram illustrating an exemplary structure of an image segmentation model training apparatus provided in the present application;
fig. 11 shows a schematic block diagram of an image processing apparatus provided in the present application.
In the figure: 100-an electronic device; 101-a memory; 102-a processor; 103-a communication interface; 400-image segmentation model training means; 401-a first processing module; 402-an update module; 500-an image processing apparatus; 501-a receiving module; 502-second processing module.
Detailed Description
To make the purpose, technical solutions and advantages of the present application clearer, the technical solutions in the present application will be clearly and completely described below with reference to the accompanying drawings in some embodiments of the present application, and it is obvious that the described embodiments are some, but not all embodiments of the present application. The components of the present application, as generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art based on a part of the embodiments in the present application without any creative effort belong to the protection scope of the present application.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
In the live broadcast field, for example, foreground information such as a portrait of a main broadcast may be separated from background information by using a matting technique, and then the separated foreground information may be merged with other background information.
Wherein, assuming that the foreground information separated by matting is represented as F and the background information to be fused is represented as B, the fused image I can be represented as: I = mF + (1 - m)B.
In the formula, m represents mask information (mask) corresponding to the foreground information F.
From the fusion formula for image I, it can be seen that the fusion effect of image I is mainly determined by the value of the mask information, since the foreground information F and the background information B are both fixed inputs.
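To make the formula concrete, the following minimal sketch composites a foreground over a background with a per-pixel mask using NumPy; the helper name fuse and the array names are illustrative assumptions introduced here, not terms from this application.

    import numpy as np

    def fuse(foreground: np.ndarray, background: np.ndarray, mask: np.ndarray) -> np.ndarray:
        """Compute I = m*F + (1 - m)*B for one image.

        foreground, background: H x W x 3 float arrays
        mask:                   H x W float array with values in [0, 1]
        """
        m = mask[..., None]  # add a channel axis so the mask broadcasts over the color channels
        return m * foreground + (1.0 - m) * background

With a hard 0/1 mask this reduces to a plain cut-and-paste of the foreground; a soft mask blends the two along object boundaries.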
In some matting schemes, such as portrait binary semantic segmentation, mask information is obtained by understanding the image at the semantic level: the information in the image is classified into foreground pixels and background pixels according to semantic category, yielding a mask containing only 0s and 1s.
However, in scenes such as live webcasting, when fusing foreground information such as the anchor's portrait with other background information, the broadcast is a continuous video stream, so it is necessary to consider not only the fusion between the foreground information and the background information, but also that the segmentation results of two consecutive frames must not deviate greatly from each other. The portrait binary semantic segmentation scheme described above, however, only considers the segmentation of a single frame and does not consider the temporal consistency of the segmentation results of adjacent image frames; when such portrait segmentation is applied to scenes such as live webcasting, the object information in the fused video may jitter after the segmented foreground information is fused with other background information, which degrades the user experience.
Therefore, in view of the above drawbacks, the present application provides a possible implementation as follows: obtain two training images adjacent in time sequence and the optical flow information between the two training images as a training image set, and obtain the training annotation information corresponding to the training image set; then input the two training images into an image segmentation model to obtain two pieces of training mask information; and update the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition. In this way, the image segmentation model can extract the mask information of each image in combination with the motion information between that image and other adjacent images, so that consistency among consecutive images can be ensured when image fusion is performed.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic block diagram illustrating an electronic device 100 provided in the present application, in which an untrained image segmentation model may be stored in the electronic device 100, so as to execute the image segmentation model training method provided in the present application to complete training of the image segmentation model; alternatively, the electronic device 100 may store therein an image segmentation model trained to converge by the image segmentation model training method provided by the present application, and implement the image processing method provided by the present application by using the image segmentation model trained to converge.
The electronic device 100 includes a memory 101, a processor 102, and a communication interface 103, wherein the memory 101, the processor 102, and the communication interface 103 are electrically connected to each other directly or indirectly to enable data transmission or interaction. For example, the components may be electrically connected to each other via one or more communication buses or signal lines.
The memory 101 may be used to store software programs and modules, such as program instructions/modules corresponding to the image segmentation model training apparatus or the image processing apparatus provided in the present application, and the processor 102 executes various functional applications and data processing by executing the software programs and modules stored in the memory 101, so as to execute the image segmentation model training method or the steps corresponding to the image processing method provided in the present application. The communication interface 103 may be used for communicating signaling or data with other node devices.
The Memory 101 may be, but is not limited to, a Random Access Memory (RAM), a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
The processor 102 may be an integrated circuit chip having signal processing capabilities. The processor 102 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It will be appreciated that the configuration shown in FIG. 1 is merely illustrative and that electronic device 100 may include more or fewer components than shown in FIG. 1 or may have a different configuration than shown in FIG. 1. The components shown in fig. 1 may be implemented in hardware, software, or a combination thereof.
The following describes an exemplary training method of an image segmentation model provided in the present application, taking the electronic device 100 shown in fig. 1 as an exemplary implementation subject.
Referring to fig. 2, fig. 2 shows a schematic flow chart of the image segmentation model training method provided in the present application, which may include the following steps:
step 202, obtaining a training image set and training annotation information corresponding to the training image set;
step 204, inputting the two training images into an image segmentation model to obtain two training mask information;
and step 206, updating model parameters of the image segmentation model according to the two pieces of training mask information, the training label information and the optical flow information until the image segmentation model reaches a set convergence condition.
In one embodiment, an image segmentation model as shown in fig. 3 is stored in the electronic device; the image segmentation model can process an input image and output the mask information corresponding to the image. The network structure adopted by the image segmentation model may be a U-Net, or another segmentation network such as DeepLabv3 or SegNet; the network structure of the image segmentation model is not limited in the present application.
When the electronic device trains the image segmentation model, a training image set and the training annotation information corresponding to the training image set may be obtained first, where the training image set includes two training images adjacent in time sequence, such as I0 and I1 in fig. 3, and the optical flow information between the two training images; the optical flow information characterizes the motion cues between the two training images, i.e., the correlation between I0 and I1.
Then, as shown in fig. 3, the electronic device may input the two training images I0 and I1 into the image segmentation model to obtain two pieces of training mask information; for example, in the scenario shown in fig. 3, the mask information corresponding to I0 may be Mask0, and the mask information corresponding to I1 may be Mask1.
Finally, the electronic device may update the model parameters of the image segmentation model according to the two pieces of training mask information, such as Mask0 and Mask1, the training annotation information and the optical flow information, for example by using a back-propagation (BP) algorithm, until the image segmentation model reaches the set convergence condition. Since the optical flow information represents the motion information between the two training images, the mask information corresponding to each of the two training images should likewise carry the motion information represented by the optical flow information; in this way, when the model parameters of the image segmentation model are updated, the model can learn the motion information between the two training images from the optical flow information, so that when extracting the mask information of a target image it can do so in combination with the mask information of other images adjacent to the target image, thereby maintaining consistency between adjacent images.
Based on the above design, the image segmentation model training method provided by the present application obtains two training images adjacent in time sequence and the optical flow information between the two training images as a training image set, and obtains the training annotation information corresponding to the training image set; the two training images are then input into the image segmentation model to obtain two pieces of training mask information; and the model parameters of the image segmentation model are updated according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition. Compared with the prior art, when the image segmentation model is trained it can learn the motion information between images from the optical flow information, so that it can extract the mask information of each image in combination with the motion information between that image and other adjacent images, and consistency between consecutive images can be ensured when image fusion is performed.
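For illustration only, the following sketch outlines one training iteration corresponding to steps 202 through 206 in PyTorch; it assumes the image segmentation model is an ordinary torch.nn.Module and that content_loss and timing_loss follow the formulas given later in this description. All names here are assumptions, not identifiers from this application.

    import torch

    def train_step(model, optimizer, img0, img1, flow01, mask_gt, content_loss, timing_loss):
        # Step 204: input the two time-adjacent training images to obtain two pieces of mask information
        mask0_pre = model(img0)   # first training mask information
        mask1_pre = model(img1)   # second training mask information

        # Step 206: total loss L = Lc + Lst, then update the model parameters
        loss = (content_loss(mask_gt, mask1_pre, img1)
                + timing_loss(flow01, img0, img1, mask0_pre, mask1_pre))
        optimizer.zero_grad()
        loss.backward()           # back-propagation (BP) of the total loss
        optimizer.step()
        return loss.item()

The step would be repeated over training image sets until the set convergence condition is reached.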
It should be noted that the two training images that are adjacent in time sequence and obtained by the electronic device include a first image that is earlier in the time sequence and a second image that is later in the time sequence; for example, I0 in fig. 3 may serve as the first image and I1 as the second image.
Accordingly, the two pieces of training mask information output by the image segmentation model include the first training mask information corresponding to the first image and the second training mask information corresponding to the second image; for example, Mask0 corresponding to I0 in fig. 3 may serve as the first training mask information, and Mask1 corresponding to I1 may serve as the second training mask information.
In addition, optical flow is the motion between two consecutive frames of images caused by the movement of objects, the scene, or the camera; optical flow information is vector information, and optical flow is generally divided into forward optical flow and backward optical flow. For the two frames shown in fig. 3, image I0 precedes image I1; thus, with respect to image I1, the optical flow from image I0 to image I1 is the backward optical flow, recording the motion direction and motion rate from image I0 to image I1, while the optical flow from image I1 to image I0 is the forward optical flow, recording the motion direction and motion rate from image I1 to image I0.
In the above description, for illustration purposes only, the temporally earlier image is used as the first image and the temporally later image as the second image; in other possible embodiments of the present application, the temporally later image may be used as the first image and the temporally earlier image as the second image, which is not limited in this application.
Optionally, in an embodiment, the training annotation information obtained by the electronic device may be the annotation mask information of the second image; that is, in the scenario shown in fig. 3, the electronic device obtains the annotation mask information of image I1. Accordingly, the optical flow information obtained by the electronic device may be the backward optical flow with respect to image I1.
Thus, referring to fig. 4 on the basis of fig. 2, fig. 4 shows a schematic flow chart of the sub-steps of step 206 in fig. 2, and as a possible implementation, step 206 may include the following sub-steps:
step 206-1, obtaining the content loss of the image segmentation model according to the labeling mask information, the second training mask information and the second image;
step 206-2, obtaining the time sequence loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information and the second training mask information;
and step 206-3, updating model parameters of the image segmentation model based on the content loss and the time sequence loss.
The electronic device, in performing step 206 to update the model parameters of the image segmentation model, may split the loss function of the image segmentation model into two parts, content loss and timing loss.
For example, the loss function of the image segmentation model may satisfy the following formula:
L=Lc+Lst
in the formula, L represents the total loss of the image segmentation model, Lc represents the content loss, and Lst represents the time-series loss.
The content loss constrains the second training mask information output by the image segmentation model against the actual mask information of the second image, and thereby guarantees the accuracy of the segmentation result.
In this way, when the electronic device executes step 206, the content loss of the image segmentation model can be obtained according to the annotation mask information, the second training mask information and the second image, that is, the difference between the second training mask information and the annotation mask information is calculated.
For example, as a possible implementation, the calculation formula of the content loss may satisfy the following:
Lc = ||mask_gt - mask_1pre||_2 + ||I1 * (mask_gt - mask_1pre)||_2
where Lc represents the content loss, mask_gt represents the annotation mask information, mask_1pre represents the second training mask information, and I1 represents the second image.
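A possible PyTorch reading of this content loss is sketched below; interpreting the ||·||_2 terms as norms over all pixels is an assumption, since the exact reduction is not spelled out in the text.

    import torch

    def content_loss(mask_gt, mask1_pre, img1):
        # Lc = ||mask_gt - mask_1pre||_2 + ||I1 * (mask_gt - mask_1pre)||_2
        diff = mask_gt - mask1_pre                          # N x 1 x H x W
        return torch.norm(diff) + torch.norm(img1 * diff)   # second term weights the error by the image content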
On the other hand, the time sequence loss applies a constraint using the motion information between the two frames of images, so that the mask information corresponding to the two frames can be kept consistent in time sequence.
In this way, when the electronic device executes step 206, the electronic device obtains the timing loss of the image segmentation model based on the optical flow information, the first image, the second image, the first training mask information, and the second training mask information.
For example, as a possible implementation, the timing loss calculation formula may satisfy the following:
Lst = exp(-α*M) * ||mask_1pre - warp_01(mask_0pre)||_2
where Lst denotes the timing loss, α denotes a set parameter, M = ||I1 - warp_01(I0)||_2, I0 represents the first image, warp_01 represents the optical flow information, mask_0pre represents the first training mask information, warp_01(mask_0pre) represents the mask information obtained by transforming the first training mask information to the moment of the second image based on the optical flow information, and warp_01(I0) represents the image obtained by transforming the first image to the moment of the second image based on the optical flow information.
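The sketch below is one possible PyTorch reading of the timing loss; the bilinear warp driven by a per-pixel displacement field, the whole-image norms, and the value of α are assumptions made for illustration, since the text does not fix these details.

    import torch
    import torch.nn.functional as F

    def warp(x, flow):
        """Warp x (N x C x H x W) with a displacement field flow (N x 2 x H x W, in pixels)."""
        n, _, h, w = x.shape
        ys, xs = torch.meshgrid(torch.arange(h, device=x.device),
                                torch.arange(w, device=x.device), indexing="ij")
        base = torch.stack((xs, ys), dim=0).float().unsqueeze(0)   # 1 x 2 x H x W pixel grid
        grid = base + flow                                         # sampling positions in pixels
        gx = 2.0 * grid[:, 0] / max(w - 1, 1) - 1.0                # normalize to [-1, 1] for grid_sample
        gy = 2.0 * grid[:, 1] / max(h - 1, 1) - 1.0
        return F.grid_sample(x, torch.stack((gx, gy), dim=-1), align_corners=True)

    def timing_loss(flow01, img0, img1, mask0_pre, mask1_pre, alpha=50.0):
        # Lst = exp(-alpha * M) * ||mask_1pre - warp_01(mask_0pre)||_2, with M = ||I1 - warp_01(I0)||_2
        m = torch.norm(img1 - warp(img0, flow01))
        return torch.exp(-alpha * m) * torch.norm(mask1_pre - warp(mask0_pre, flow01))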
In this way, based on the obtained content loss and timing loss, the electronic device may take their sum as the total loss of the image segmentation model, and then update the model parameters of the image segmentation model using, for example, the BP algorithm; iterative training continues until the image segmentation model reaches the set convergence condition.
It should be noted that the above formulas for calculating the content loss, the timing loss, and the total loss of the image segmentation model are only illustrative, and in some other possible embodiments of the present application, other formulas may be used to calculate the above losses, for example.
In addition, in the above scheme provided by the present application, step 206-1 may be executed first to obtain the content loss and then step 206-2 to obtain the timing loss; alternatively, step 206-2 may be executed first to obtain the timing loss and then step 206-1 to obtain the content loss; the order in which step 206-1 and step 206-2 are executed is not limited in the present application. For example, in another possible implementation, step 206-1 may be performed together with step 206-2.
Furthermore, it should be noted that optical flow is a two-dimensional vector field over the image: it uses the two-dimensional image to represent the velocity field of object points moving in three dimensions, and reflects the image changes caused by motion within a tiny time interval, so that the motion direction and motion rate at each image point can be determined; optical flow can therefore provide cues for recovering the motion in the images.
When the image segmentation model is trained, optical flow information between two training images can be obtained in an online extraction mode, so that the workload of a user when the image segmentation model is trained is reduced.
To this end, referring to fig. 5 on the basis of fig. 2, fig. 5 shows another schematic flowchart of the image segmentation model training method provided in the present application, and before performing step 202, the image segmentation model training method may further include the following steps:
in step 201, inter-frame optical flow between two training images is extracted to obtain optical flow information.
In one embodiment, as shown in FIG. 6, the electronic device may extract the inter-frame optical flow between the two training images using, for example, the SelFlow algorithm, to obtain the optical flow information.
For example, continuing the example above in which the backward optical flow with respect to image I1 is used as the optical flow information, the electronic device may take image I0 and image I1 as input and extract the backward inter-frame optical flow with respect to image I1 using the SelFlow algorithm, thereby obtaining the optical flow information.
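The SelFlow network itself is not reproduced here; purely as a stand-in to illustrate the shape of the operation (two adjacent frames in, a dense per-pixel flow field out), the sketch below uses OpenCV's Farneback estimator. Substituting Farneback for SelFlow, and the direction convention of the returned flow, are assumptions for illustration only.

    import cv2
    import numpy as np

    def interframe_flow(img0_bgr: np.ndarray, img1_bgr: np.ndarray) -> np.ndarray:
        """Dense inter-frame optical flow between two adjacent frames (H x W x 2, in pixels)."""
        g0 = cv2.cvtColor(img0_bgr, cv2.COLOR_BGR2GRAY)
        g1 = cv2.cvtColor(img1_bgr, cv2.COLOR_BGR2GRAY)
        # arguments: prev, next, flow, pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
        return cv2.calcOpticalFlowFarneback(g0, g1, None, 0.5, 3, 15, 3, 5, 1.2, 0)

For the offline variant described next, this extraction would be run once per pair of training images and the result stored alongside the pair.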
However, in the embodiment that obtains the optical flow information by the above online extraction, the step of extracting the optical flow information has to be performed online, which prolongs the training time of the image segmentation model; furthermore, when the image segmentation model is trained iteratively, step 201 needs to be executed repeatedly, so the optical flow extraction may be performed repeatedly on the same pair of training images.
Therefore, in another embodiment, the optical flow information can also be obtained by offline extraction, namely: step 201 may be executed first, and after the optical flow information of each pair of training images is obtained, each pair of training images and the corresponding optical flow information are used as the input of the image segmentation model to carry out the training process. In this case, when the image segmentation model is trained, step 201 does not need to be performed again to obtain the optical flow information, which reduces the training time of the image segmentation model and avoids repeatedly extracting the optical flow for the same pair of training images.
Moreover, in an actual training scenario, open-source data is scarce; to obtain sufficient training images for the image segmentation model, the amount of training data can be increased by capturing, for example, a portrait matting data set from a live broadcast scene and extracting the mask information of the corresponding images as the annotation mask information.
However, it should be noted that capturing such a matting data set still requires manual operation by the user, which increases the workload of the user when training the image segmentation model.
To this end, referring to fig. 7 on the basis of fig. 2, fig. 7 shows a further schematic flowchart of the image segmentation model training method provided by the present application, and before performing step 202, the image segmentation model training method may further include the following steps:
step 200, fusing the obtained two object information with background information respectively to generate two training images.
In one embodiment, a user can extract the object information in two images adjacent in time sequence and transmit the two pieces of object information to the electronic device; the electronic device may then fuse the two pieces of object information with one piece of background information to generate two training images, that is, one training image set, thereby increasing the amount of training data.
Of course, it should be understood that the above describes only one way of generating two training images, taking the fusion of the obtained object information with a single piece of background information as an example; when a large number of training images need to be generated, the electronic device may fuse the two pieces of object information with different pieces of background information, so as to generate multiple training image sets, where each set includes two training images, as sketched below.
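As a sketch of this data-generation step, the helper below fuses two temporally adjacent foreground objects (with their masks) onto one background using the fuse() compositing helper sketched earlier; all names are illustrative assumptions.

    def make_training_pair(fg0, mask0, fg1, mask1, background):
        """Build one training image set (I0, I1) from two adjacent object crops and one background."""
        img0 = fuse(fg0, background, mask0)   # earlier frame of the pair
        img1 = fuse(fg1, background, mask1)   # later frame of the pair
        return img0, img1                     # mask1 can also serve as the annotation mask of I1

Reusing the same pair of object crops with several different backgrounds yields several training image sets, as described above.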
In addition, as described above, in a scene where the foreground information F separated by matting is fused with background information B, the fused image I can be represented as: I = mF + (1 - m)B.
In the expression, m represents mask information corresponding to the foreground information F. As can be seen, based on the expression, as long as the mask information m corresponding to the foreground information F can be obtained, the foreground information F can be fused with any background information B.
Thus, an image segmentation model trained to convergence by the image segmentation model training method provided by the present application can be used, for example, for the fusion of foreground information F and background information B in a live broadcast scene.
Referring to fig. 8, fig. 8 shows a schematic flow chart of an image processing method provided by the present application, which may include the following steps:
step 302, receiving an image to be processed and a background to be fused;
step 304, inputting the image to be processed into an image segmentation model trained to be converged by using an image segmentation model training method to obtain target mask information corresponding to the image to be processed;
and step 306, processing the image to be processed and the background to be fused by using the target mask information to obtain a fused image.
In an embodiment, for example in a live broadcast scene, the electronic device may take each received frame of the live video as an image to be processed and receive a background to be fused, the purpose being to replace the background information of each frame of the live video with the background to be fused.
Then, taking one frame of the live video as the image to be processed as an example, the electronic device may input the image to be processed into the image segmentation model trained to convergence by the image segmentation model training method provided by the present application, so that the image segmentation model outputs the target mask information Mask_m corresponding to the image to be processed.
Finally, the electronic device may take the obtained target mask information Mask_m as the parameter m in the fusion formula, and substitute the image to be processed and the background to be fused into the fusion formula to obtain the fused image I; the effect before and after fusion may be as shown in FIG. 9. As can be seen, after learning the motion information between images using the optical flow information, the image segmentation model can extract the mask information of each image in combination with the motion information between that image and other adjacent images, so that consistency between consecutive images can be ensured when image fusion is performed.
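The sketch below ties steps 302 to 306 together for a single frame: run the converged segmentation model to obtain the target mask information Mask_m, then substitute it as the parameter m in the fusion formula, reusing the fuse() helper sketched earlier. The tensor layout and the [0, 1] range of the model output are assumptions.

    import numpy as np
    import torch

    def process_frame(model, frame_rgb: np.ndarray, new_background: np.ndarray) -> np.ndarray:
        x = torch.from_numpy(frame_rgb).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        with torch.no_grad():
            mask_m = model(x)[0, 0].cpu().numpy()          # target mask information, H x W in [0, 1]
        return fuse(frame_rgb.astype(np.float32),
                    new_background.astype(np.float32), mask_m)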
In addition, based on the same inventive concept as the above-mentioned image segmentation model training method provided in the present application, please refer to fig. 10, and fig. 10 shows a schematic structural block diagram of an image segmentation model training apparatus 400 provided in the present application, where the image segmentation model training apparatus 400 may include a first processing module 401 and an updating module 402. Wherein:
a first processing module 401, configured to obtain a training image set and training annotation information corresponding to the training image set; the training image set comprises two training images which are adjacent in time sequence and optical flow information between the two training images;
the first processing module 401 is further configured to input the two training images into an image segmentation model to obtain two training mask information;
and an updating module 402, configured to update the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information until the image segmentation model reaches a set convergence condition.
Optionally, as a possible implementation manner, the two training images include a first image that is earlier in the time sequence and a second image that is later in the time sequence, and the training annotation information is the annotation mask information of the second image;
the two training mask information comprise first training mask information corresponding to the first image and second training mask information corresponding to the second image;
the updating module 402 is specifically configured to, when updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information:
obtaining the content loss of the image segmentation model according to the labeling mask information, the second training mask information and the second image;
obtaining the time sequence loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information and the second training mask information;
and updating the model parameters of the image segmentation model based on the content loss and the time sequence loss.
Optionally, as a possible implementation manner, the calculation formula of the content loss satisfies the following:
Lc = ||mask_gt - mask_1pre||_2 + ||I1 * (mask_gt - mask_1pre)||_2
where Lc represents the content loss, mask_gt represents the annotation mask information, mask_1pre represents the second training mask information, and I1 represents the second image;
the calculation formula of the timing loss satisfies the following:
Lst = exp(-α*M) * ||mask_1pre - warp_01(mask_0pre)||_2
where Lst denotes the timing loss, α denotes a set parameter, M = ||I1 - warp_01(I0)||_2, I0 represents the first image, warp_01 represents the optical flow information, mask_0pre represents the first training mask information, warp_01(mask_0pre) represents the mask information obtained by transforming the first training mask information to the moment of the second image based on the optical flow information, and warp_01(I0) represents the image obtained by transforming the first image to the moment of the second image based on the optical flow information.
Optionally, as a possible implementation manner, before obtaining the training image set and the training annotation information corresponding to the training image set, the first processing module 401 is further configured to:
an inter-frame optical flow between two training images is extracted to obtain optical flow information.
Optionally, as a possible implementation manner, before obtaining the training image set and the training annotation information corresponding to the training image set, the first processing module 401 is further configured to:
and fusing the obtained two object information with background information respectively to generate two training images.
Referring to fig. 11, fig. 11 shows a schematic block diagram of an image processing apparatus 500 according to the present application, where the image processing apparatus 500 may include a receiving module 501 and a second processing module 502. Wherein:
a receiving module 501, configured to receive an image to be processed and a background to be fused;
a second processing module 502, configured to input an image to be processed into an image segmentation model trained to converge by using the image segmentation model provided in the present application, so as to obtain target mask information corresponding to the image to be processed;
the second processing module 502 is further configured to process the image to be processed and the background to be fused by using the target mask information, so as to obtain a fused image.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative, and for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to some embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in some embodiments of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application, or the portion thereof that substantially contributes over the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to perform all or part of the steps of the method according to some embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disk.
The above description is only a few examples of the present application and is not intended to limit the present application, and those skilled in the art will appreciate that various modifications and variations can be made in the present application. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims (9)

1. An image segmentation model training method, characterized in that the method comprises:
obtaining a training image set and training annotation information corresponding to the training image set; wherein the training image set comprises two training images adjacent in time sequence and optical flow information between the two training images;
inputting the two training images into the image segmentation model to obtain two training mask information;
updating model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition;
the two training images comprise a first image in front of a time sequence and a second image in back of the time sequence, and the training annotation information is annotation mask information of the second image;
the two pieces of training mask information comprise first training mask information corresponding to the first image and second training mask information corresponding to the second image;
updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information and the optical flow information, wherein the step comprises the following steps:
obtaining the content loss of the image segmentation model according to the labeling mask information, the second training mask information and the second image;
obtaining a time sequence loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information and the second training mask information;
updating model parameters of the image segmentation model based on the content loss and the timing loss.
2. The method of claim 1, wherein the calculation formula of the content loss satisfies the following:
Lc = ||mask_gt - mask_1pre||_2 + ||I1 * (mask_gt - mask_1pre)||_2
wherein Lc represents the content loss, mask_gt represents the annotation mask information, mask_1pre represents the second training mask information, and I1 represents the second image;
the calculation formula of the time sequence loss satisfies the following conditions:
Lst = exp(-α*M) * ||mask_1pre - warp_01(mask_0pre)||_2
wherein Lst represents the timing loss, α represents a set parameter, M = ||I1 - warp_01(I0)||_2, I0 represents the first image, warp_01 represents the optical flow information, mask_0pre represents the first training mask information, warp_01(mask_0pre) represents the mask information obtained by transforming the first training mask information to the moment of the second image based on the optical flow information, and warp_01(I0) represents the image obtained by transforming the first image to the moment of the second image based on the optical flow information.
3. The method of claim 1, wherein prior to obtaining the training image set and the training annotation information corresponding to the training image set, the method further comprises:
and extracting inter-frame optical flow between the two training images to obtain the optical flow information.
4. The method of claim 1, wherein prior to obtaining the training image set and the training annotation information corresponding to the training image set, the method further comprises:
and fusing the obtained two object information with background information respectively to generate the two training images.
5. An image processing method, characterized in that the method comprises:
receiving an image to be processed and a background to be fused;
inputting the image to be processed into an image segmentation model trained to be converged by the method according to any one of claims 1 to 4, and obtaining target mask information corresponding to the image to be processed;
and processing the image to be processed and the background to be fused by using the target mask information to obtain a fused image.
6. An apparatus for training an image segmentation model, the apparatus comprising:
the first processing module is used for obtaining a training image set and training annotation information corresponding to the training image set; wherein the training image set comprises two training images adjacent in time sequence and optical flow information between the two training images;
the first processing module is further configured to input the two training images into the image segmentation model to obtain two training mask information;
the updating module is used for updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information and the optical flow information until the image segmentation model reaches a set convergence condition;
the two training images comprise a first image in front of a time sequence and a second image in back of the time sequence, and the training annotation information is annotation mask information of the second image;
the two pieces of training mask information comprise first training mask information corresponding to the first image and second training mask information corresponding to the second image;
the update module is specifically configured to:
obtaining the content loss of the image segmentation model according to the labeling mask information, the second training mask information and the second image;
obtaining a time sequence loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information and the second training mask information;
updating model parameters of the image segmentation model based on the content loss and the timing loss.
7. An image processing apparatus, characterized in that the apparatus comprises:
the receiving module is used for receiving the image to be processed and the background to be fused;
a second processing module, configured to input the image to be processed into an image segmentation model trained to converge by using the method according to any one of claims 1 to 4, and obtain target mask information corresponding to the image to be processed;
the second processing module is further configured to process the image to be processed and the background to be fused by using the target mask information, so as to obtain a fused image.
8. An electronic device, comprising:
a memory for storing one or more programs;
a processor;
the one or more programs, when executed by the processor, implement the method of any of claims 1-5.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-5.
CN202010014372.5A 2020-01-07 2020-01-07 Image processing method, image segmentation model training method and related device Active CN111260679B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010014372.5A CN111260679B (en) 2020-01-07 2020-01-07 Image processing method, image segmentation model training method and related device
PCT/CN2021/070167 WO2021139625A1 (en) 2020-01-07 2021-01-04 Image processing method, image segmentation model training method and related apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010014372.5A CN111260679B (en) 2020-01-07 2020-01-07 Image processing method, image segmentation model training method and related device

Publications (2)

Publication Number Publication Date
CN111260679A (en) 2020-06-09
CN111260679B (en) 2022-02-01

Family

ID=70923869

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010014372.5A Active CN111260679B (en) 2020-01-07 2020-01-07 Image processing method, image segmentation model training method and related device

Country Status (2)

Country Link
CN (1) CN111260679B (en)
WO (1) WO2021139625A1 (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260679B (en) * 2020-01-07 2022-02-01 广州虎牙科技有限公司 Image processing method, image segmentation model training method and related device
CN112351291A (en) * 2020-09-30 2021-02-09 深圳点猫科技有限公司 Teaching interaction method, device and equipment based on AI portrait segmentation
CN112560583A (en) * 2020-11-26 2021-03-26 复旦大学附属中山医院 Data set generation method and device
CN112669324B (en) * 2020-12-31 2022-09-09 中国科学技术大学 Rapid video target segmentation method based on time sequence feature aggregation and conditional convolution
CN113051430B (en) * 2021-03-26 2024-03-26 北京达佳互联信息技术有限公司 Model training method, device, electronic equipment, medium and product
CN113393465A (en) * 2021-05-26 2021-09-14 浙江吉利控股集团有限公司 Image generation method and device
US20240153038A1 (en) * 2021-07-15 2024-05-09 Boe Technology Group Co., Ltd. Image processing method and device, and training method of image processing model and training method thereof
CN113610865B (en) * 2021-07-27 2024-03-29 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and computer readable storage medium
CN113570689B (en) * 2021-07-28 2024-03-01 杭州网易云音乐科技有限公司 Portrait cartoon method, device, medium and computing equipment
CN114782460B (en) * 2022-06-21 2022-10-18 阿里巴巴达摩院(杭州)科技有限公司 Image segmentation model generation method, image segmentation method and computer equipment
CN115457119B (en) * 2022-09-21 2023-10-27 正泰集团研发中心(上海)有限公司 Bus bar labeling method, device, computer equipment and readable storage medium

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7676081B2 (en) * 2005-06-17 2010-03-09 Microsoft Corporation Image segmentation of foreground from background layers
CN103942794B (en) * 2014-04-16 2016-08-31 南京大学 A kind of image based on confidence level is collaborative scratches drawing method
US10475186B2 (en) * 2016-06-23 2019-11-12 Intel Corportation Segmentation of objects in videos using color and depth information
CN109697689B (en) * 2017-10-23 2023-09-01 北京京东尚科信息技术有限公司 Storage medium, electronic device, video synthesis method and device
CN107808389B (en) * 2017-10-24 2020-04-17 上海交通大学 Unsupervised video segmentation method based on deep learning
CN109978893B (en) * 2019-03-26 2023-06-20 腾讯科技(深圳)有限公司 Training method, device, equipment and storage medium of image semantic segmentation network
CN110176027B (en) * 2019-05-27 2023-03-14 腾讯科技(深圳)有限公司 Video target tracking method, device, equipment and storage medium
CN110472593B (en) * 2019-08-20 2021-02-09 重庆紫光华山智安科技有限公司 Training image acquisition method, model training method and related device
CN111260679B (en) * 2020-01-07 2022-02-01 广州虎牙科技有限公司 Image processing method, image segmentation model training method and related device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875900A (en) * 2017-11-02 2018-11-23 北京旷视科技有限公司 Method of video image processing and device, neural network training method, storage medium
CN110060264A (en) * 2019-04-30 2019-07-26 北京市商汤科技开发有限公司 Neural network training method, video frame processing method, apparatus and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Image segmentation based on optical flow field; Jiao Chunlin et al.; Journal of Northwestern Polytechnical University; 2006-04-30; Vol. 24, No. 2; pp. 265-269 *

Also Published As

Publication number Publication date
CN111260679A (en) 2020-06-09
WO2021139625A1 (en) 2021-07-15

Similar Documents

Publication Publication Date Title
CN111260679B (en) Image processing method, image segmentation model training method and related device
Zhang et al. Uncertainty inspired RGB-D saliency detection
Manen et al. Pathtrack: Fast trajectory annotation with path supervision
CN112967212A (en) Virtual character synthesis method, device, equipment and storage medium
JP7247327B2 (en) Techniques for Capturing and Editing Dynamic Depth Images
CN103608847A (en) Method and arrangement for image model construction
CN112308770B (en) Portrait conversion model generation method and portrait conversion method
CN112597824A (en) Behavior recognition method and device, electronic equipment and storage medium
CN114339450A (en) Video comment generation method, system, device and storage medium
CN114187392B (en) Virtual even image generation method and device and electronic equipment
Sun et al. Modnet-v: Improving portrait video matting via background restoration
WO2022164680A1 (en) Simultaneously correcting image degradations of multiple types in an image of a face
Mukhopadhyay et al. Diff2lip: Audio conditioned diffusion models for lip-synchronization
CN116091955A (en) Segmentation method, segmentation device, segmentation equipment and computer readable storage medium
Ye et al. Real3d-portrait: One-shot realistic 3d talking portrait synthesis
WO2021070004A1 (en) Object segmentation in video stream
KR102514807B1 (en) Method and Apparatus for 3D Hand Mesh Recovery in Motion Blur RGB Image
CN115147516A (en) Virtual image video generation method and device, computer equipment and storage medium
Ni Application of motion tracking technology in movies, television production and photography using big data
Menapace et al. Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
JP2014149788A (en) Object area boundary estimation device, object area boundary estimation method, and object area boundary estimation program
Gomes Graph-based network for dynamic point cloud prediction
CN111340101A (en) Stability evaluation method and device, electronic equipment and computer readable storage medium
Chopra et al. Source-Free Domain Adaptation with Diffusion-Guided Source Data Generation
CN113609960B (en) Face driving method and device for target picture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant