WO2021139625A1 - Image processing method, image segmentation model training method and related apparatus - Google Patents

Image processing method, image segmentation model training method and related apparatus

Info

Publication number
WO2021139625A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
training
information
segmentation model
image segmentation
Prior art date
Application number
PCT/CN2021/070167
Other languages
French (fr)
Chinese (zh)
Inventor
叶海佳
何帅
王文斓
Original Assignee
广州虎牙科技有限公司 (Guangzhou Huya Technology Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 广州虎牙科技有限公司 (Guangzhou Huya Technology Co., Ltd.)
Publication of WO2021139625A1 publication Critical patent/WO2021139625A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/10 - Segmentation; Edge detection
    • G06T7/194 - Segmentation; Edge detection involving foreground-background segmentation
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T5/00 - Image enhancement or restoration
    • G06T5/50 - Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T7/20 - Analysis of motion
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20212 - Image combination
    • G06T2207/20221 - Image fusion; Image merging
    • G06T2207/30 - Subject of image; Context of image processing
    • G06T2207/30204 - Marker

Definitions

  • This application relates to the field of artificial intelligence technology. Specifically, it provides an image processing method, an image segmentation model training method, and related devices.
  • Matting refers to separating the foreground information and the background information in an image and then applying the extracted foreground information to other background information. With matting, the extracted foreground information can be composited with arbitrary background information; in the field of live broadcast, for example, the extracted portrait information can be fused with any background picture or video, thereby enhancing the user's experience of watching the live broadcast.
  • However, current matting technology merely separates the pixels of the portrait information from those of the background information and obtains a mask containing only 0 and 1; when merging, the consistency between adjacent image frames is poor, so the object information in the resulting video picture may jitter.
  • The purpose of this application is to provide an image processing method, an image segmentation model training method, and related devices that can ensure consistency between consecutive images when performing image fusion.
  • An embodiment of the application provides an image segmentation model training method. The method includes:
  • obtaining a training image set and training annotation information corresponding to the training image set, where the training image set includes two training images that are adjacent in time sequence and the optical flow information between the two training images;
  • inputting the two training images into the image segmentation model to obtain two pieces of training mask information; and updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information until the image segmentation model reaches a set convergence condition.
  • An embodiment of the present application also provides an image processing method. The method includes:
  • receiving an image to be processed and a background to be fused; inputting the image to be processed into an image segmentation model trained to convergence using the above training method to obtain target mask information corresponding to the image to be processed; and using the target mask information to process the image to be processed and the background to be fused to obtain a fused image.
  • An embodiment of the present application also provides an image segmentation model training device, the device includes:
  • the first processing module is configured to obtain a training image set and training annotation information corresponding to the training image set; wherein the training image set includes two training images that are adjacent in time sequence, and the optical flow information between the two training images;
  • the first processing module is further configured to input the two training images into the image segmentation model to obtain two training mask information;
  • an update module, configured to update the model parameters of the image segmentation model according to the two training mask information, the training annotation information, and the optical flow information, until the image segmentation model reaches a set convergence condition.
  • An embodiment of the present application also provides an image processing device, which includes:
  • the receiving module is configured to receive the image to be processed and the background to be fused;
  • the second processing module is configured to input the image to be processed into the image segmentation model trained to converge using the image segmentation model training method provided in this application to obtain target mask information corresponding to the image to be processed;
  • the second processing module is further configured to use the target mask information to process the to-be-processed image and the to-be-fused background to obtain a fused image.
  • An embodiment of the present application also provides an electronic device, including:
  • a memory configured to store one or more programs; and
  • a processor; when the one or more programs are executed by the processor, the image segmentation model training method or the image processing method provided in this application is implemented.
  • An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the image segmentation model training method or the image processing method provided in this application is implemented.
  • FIG. 1 shows a schematic structural block diagram of an electronic device provided by the present application;
  • FIG. 2 shows a schematic flowchart of the image segmentation model training method provided by the present application;
  • FIG. 3 shows a schematic structural diagram of an image segmentation model;
  • FIG. 4 shows a schematic flowchart of the sub-steps of step 206 in FIG. 2;
  • FIG. 5 shows another schematic flowchart of the image segmentation model training method provided by the present application;
  • FIG. 6 shows a schematic diagram of a method of extracting optical flow information;
  • FIG. 7 shows still another schematic flowchart of the image segmentation model training method provided by the present application;
  • FIG. 8 shows a schematic flowchart of the image processing method provided by the present application;
  • FIG. 9 shows a schematic comparison of an image before and after fusion;
  • FIG. 10 shows a schematic structural block diagram of the image segmentation model training device provided by the present application;
  • FIG. 11 shows a schematic structural block diagram of the image processing device provided by the present application.
  • In the figures: 100 - electronic device; 101 - memory; 102 - processor; 103 - communication interface; 400 - image segmentation model training device; 401 - first processing module; 402 - update module; 500 - image processing device; 501 - receiving module; 502 - second processing module.
  • In a live broadcast scene, for example, matting can be used to separate foreground information such as the host's portrait from the background information, and the separated foreground information can then be merged with other background information. Denoting the foreground information separated by matting as F and the fused background information as B, the fused image I can be expressed as: I = mF + (1 - m)B,
  • where m represents the mask information (mask) corresponding to the foreground information F.
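  • To make the role of the soft mask concrete, the following is a minimal Python sketch of the fusion expression I = mF + (1 - m)B; the helper name `composite` and the array shapes are illustrative assumptions, not part of the patent.

```python
import numpy as np

def composite(foreground: np.ndarray, background: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Fuse a foreground onto a background with a soft mask: I = m*F + (1 - m)*B.

    foreground, background: HxWx3 float arrays in [0, 1].
    mask: HxW float array in [0, 1]; 1 keeps the foreground, 0 keeps the background.
    """
    m = mask[..., None]  # broadcast the HxW mask over the color channels
    return m * foreground + (1.0 - m) * background

# Toy usage: a bright foreground composited onto a dark background with a soft mask.
F = np.full((4, 4, 3), 0.9)
B = np.zeros((4, 4, 3))
m = np.zeros((4, 4))
m[1:3, 1:3] = 0.8  # soft values strictly between 0 and 1, unlike a binary mask
I = composite(F, B, m)
```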
  • In some matting schemes, such as portrait binary semantic segmentation, the mask information is obtained by understanding the image at the semantic level and classifying the information in the image into foreground pixels and background pixels by semantic category; the result is a mask with a value range between 0 and 1.
  • Some possible implementations provided by this application are as follows: obtain two training images that are adjacent in time sequence, together with the optical flow information between them, as a training image set, and obtain the training annotation information corresponding to the training image set; input the two training images into the image segmentation model to obtain two pieces of training mask information; then update the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information until the image segmentation model reaches the set convergence condition. This enables the image segmentation model to combine the motion information between each image and its adjacent images when extracting the mask information of the corresponding image, which ensures consistency between consecutive images when performing image fusion.
  • FIG. 1 shows a schematic structural block diagram of an electronic device 100 provided in this application.
  • The electronic device 100 can store an untrained image segmentation model and execute the image segmentation model training method provided in this application to complete the training of the image segmentation model; alternatively, the electronic device 100 may store an image segmentation model trained to convergence using the image segmentation model training method provided in this application, and use that model to implement the image processing method provided in this application.
  • the electronic device 100 may include a memory 101, a processor 102, and a communication interface 103.
  • The memory 101, the processor 102, and the communication interface 103 are directly or indirectly electrically connected to one another to realize data transmission or interaction.
  • For example, these components can be electrically connected to each other through one or more communication buses or signal lines.
  • The memory 101 may be configured to store software programs and modules, such as the program instructions/modules corresponding to the image segmentation model training device or the image processing device provided in this application. The processor 102 executes the software programs and modules stored in the memory 101, thereby performing various functional applications and data processing, and in turn carrying out the steps of the image segmentation model training method or the image processing method provided in this application.
  • the communication interface 103 may be configured to perform signaling or data communication with other node devices.
  • The memory 101 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and so on.
  • the processor 102 may be an integrated circuit chip with signal processing capabilities.
  • The processor 102 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • It can be understood that FIG. 1 is only for illustration; the electronic device 100 may include more or fewer components than those shown in FIG. 1, or have a configuration different from that shown in FIG. 1.
  • the components shown in FIG. 1 can be implemented by hardware, software, or a combination thereof.
  • Hereinafter, the electronic device 100 shown in FIG. 1 is used as an exemplary execution subject to describe the image segmentation model training method provided in the present application.
  • FIG. 2 shows a schematic flowchart of the image segmentation model training method provided by the present application.
  • the image segmentation model training method may include the following steps:
  • Step 202: Obtain a training image set and training annotation information corresponding to the training image set.
  • Step 204: Input the two training images into the image segmentation model to obtain two pieces of training mask information.
  • Step 206: Update the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information, until the image segmentation model reaches the set convergence condition.
  • an image segmentation model as shown in FIG. 3 may be stored in the electronic device.
  • The image segmentation model can process an input image and output the mask information of the corresponding image; the network structure adopted by the image segmentation model may be a UNet network or a segmentation network such as DeepLabv3 or SegNet, and this application does not limit the network structure of the image segmentation model.
  • In the process of training the image segmentation model, the electronic device can first obtain the training image set and the training annotation information corresponding to the training image set, where the training image set includes two training images that are adjacent in time sequence, such as I0 and I1 in FIG. 3, together with the optical flow information between them.
  • The optical flow information represents the motion cues between the two training images, that is, the correlation between I0 and I1.
  • Then, the electronic device can input the two training images I0 and I1 into the image segmentation model to obtain two pieces of training mask information; for example, in the scene shown in FIG. 3, the mask information corresponding to I0 may be Mask0 and the mask information corresponding to I1 may be Mask1.
  • Finally, the electronic device can update the model parameters of the image segmentation model according to the two pieces of training mask information Mask0 and Mask1, the training annotation information, and the optical flow information, using, for example, the backpropagation (BP) algorithm, until the image segmentation model reaches the set convergence condition. Since the optical flow information represents the motion information between the two training images, the mask information corresponding to the two training images also carries that motion; in this way, in the process of updating the model parameters, the image segmentation model can use the optical flow information to learn the motion information between the two training images, so that when extracting the mask information of a target image it can draw on the mask information of the images adjacent to the target image, thereby maintaining consistency between adjacent images.
  • In the image segmentation model training method provided by this application, two training images adjacent in time sequence and the optical flow information between the two training images are obtained as a training image set, together with the training annotation information corresponding to the training image set; the two training images are then input into the image segmentation model to obtain two pieces of training mask information; and the model parameters of the image segmentation model are updated according to the two pieces of training mask information, the training annotation information, and the optical flow information until the image segmentation model reaches the set convergence condition. In this way, in the process of training the image segmentation model, the model can use the optical flow information to learn the motion information between images, so that it can combine the motion information between each image and its adjacent images to extract the mask information of the corresponding image, ensuring consistency between consecutive images when performing image fusion.
  • The two time-adjacent training images obtained by the electronic device generally include a first image that is earlier in the time sequence and a second image that is later; for example, the earlier image I0 in FIG. 3 can be used as the first image and the later image I1 as the second image.
  • The two pieces of training mask information output by the image segmentation model include the first training mask information corresponding to the first image and the second training mask information corresponding to the second image; for example, Mask0 corresponding to I0 in FIG. 3 can be used as the first training mask information, and Mask1 corresponding to I1 as the second training mask information.
  • In some embodiments, optical flow is the apparent motion of a target between two consecutive frames, caused by the movement of the target, the scene, or the camera; optical flow information is vector-valued, and optical flow is generally divided into forward optical flow and backward optical flow.
  • For example, when image I0 precedes image I1 in the time sequence, the optical flow information from image I0 to image I1 is the backward optical flow, and it records the direction and rate of motion from image I0 to image I1; the optical flow information from image I1 to image I0 is the forward optical flow, and it records the direction and rate of motion from image I1 to image I0.
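  • As an illustration of how a flow field relates the two frames, the sketch below warps an image (or its mask) with a dense flow field using backward sampling; the helper name `warp_with_flow` and the per-pixel (dx, dy) convention are assumptions for illustration only.

```python
import numpy as np
import cv2

def warp_with_flow(image: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Warp `image` with a dense flow field of shape HxWx2 holding per-pixel (dx, dy).

    Each output pixel (x, y) is sampled from `image` at (x + dx, y + dy),
    the usual backward-warping convention.
    """
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(image, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```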
  • The foregoing is only for illustration, with the image earlier in the time sequence used as the first image and the image later in the time sequence used as the second image; in some other possible embodiments of the present application, the later image may instead be regarded as the first image and the earlier image as the second image. This application does not limit this.
  • In some embodiments, the training annotation information obtained by the electronic device may be the annotation mask information of the second image. Continuing the above example, the training annotation information obtained by the electronic device may be the annotation mask information of image I1; accordingly, the optical flow information obtained by the electronic device may be the backward optical flow with respect to image I1.
  • FIG. 4 shows a schematic flowchart of the sub-steps of step 206 in FIG. 2.
  • step 206 may include the following sub-steps:
  • Step 206-1: Obtain the content loss of the image segmentation model according to the annotation mask information, the second training mask information, and the second image.
  • Step 206-2: Obtain the timing loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information, and the second training mask information.
  • Step 206-3: Update the model parameters of the image segmentation model based on the content loss and the timing loss.
  • the electronic device may divide the loss function of the image segmentation model into two parts: content loss and timing loss.
  • In some embodiments, the loss function of the image segmentation model may satisfy the following formula: L = Lc + Lst, where L represents the total loss of the image segmentation model, Lc represents the content loss, and Lst represents the timing loss.
  • The content loss constrains the second training mask information output by the image segmentation model against the actual mask information of the second image, and thereby guarantees the accuracy of the segmentation result.
  • In some embodiments, the electronic device can obtain the content loss of the image segmentation model according to the annotation mask information, the second training mask information, and the second image, that is, by calculating the difference between the second training mask information and the annotation mask information.
  • The calculation formula for the content loss may satisfy the following, where Lc represents the content loss, mask_gt represents the annotation mask information, mask1_pre represents the second training mask information, and I1 represents the second image.
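  • The patent renders the content-loss formula as an image that is not reproduced in this text, so the sketch below is only a hedged stand-in: it assumes a simple L1 penalty between the predicted second mask and the annotation mask, and it does not reconstruct the exact role of the second image I1 in the original formula.

```python
import torch

def content_loss(mask1_pre: torch.Tensor, mask_gt: torch.Tensor) -> torch.Tensor:
    """Assumed content loss: an L1 penalty pulling the second training mask
    toward the annotated mask, which constrains segmentation accuracy."""
    return torch.mean(torch.abs(mask1_pre - mask_gt))
```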
  • The timing loss constrains the motion information between the two frames, ensuring that the mask information corresponding to the two frames remains consistent in time sequence.
  • the electronic device obtains the timing loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information, and the second training mask information.
  • The timing loss calculation formula may satisfy the following, where Lst represents the timing loss, a set parameter serves as a weighting term, I0 represents the first image, warp01 represents the optical flow information, and mask0_pre represents the first training mask information.
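  • The timing-loss formula is likewise not reproduced here, so the following is a hedged sketch of a common temporal-consistency form: the first mask is warped toward the second with the optical flow, and the remaining difference is penalized under a set weight. The helper names and the normalized-coordinate flow convention are assumptions.

```python
import torch
import torch.nn.functional as F

def warp_mask(mask0_pre: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp a Bx1xHxW mask with a BxHxWx2 flow field given in normalized
    [-1, 1] coordinates, using bilinear sampling."""
    b, _, h, w = mask0_pre.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    return F.grid_sample(mask0_pre, base + flow, align_corners=True)

def timing_loss(mask0_pre, mask1_pre, flow, weight: float = 1.0) -> torch.Tensor:
    """Assumed timing loss: weight * |mask1_pre - warp(mask0_pre)|, which keeps
    the two masks consistent in time sequence."""
    warped = warp_mask(mask0_pre, flow)
    return weight * torch.mean(torch.abs(mask1_pre - warped))
```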
  • In some embodiments, the electronic device can compute the sum of the content loss and the timing loss as the total loss of the image segmentation model, and then update the model parameters of the image segmentation model with that total loss based on, for example, the BP algorithm, iterating the training until the image segmentation model reaches the set convergence condition.
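  • Putting the pieces together, a minimal training step under the same assumptions as the two loss sketches above might look as follows; `model` and `loader` are hypothetical placeholders for the segmentation network and a data source yielding one training image set per batch.

```python
import torch

def train(model, loader, epochs: int = 10, lr: float = 1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for i0, i1, flow, mask_gt in loader:
            mask0_pre = model(i0)   # first training mask information
            mask1_pre = model(i1)   # second training mask information
            # Total loss is the sum of the content loss and the timing loss.
            loss = content_loss(mask1_pre, mask_gt) + timing_loss(mask0_pre, mask1_pre, flow)
            opt.zero_grad()
            loss.backward()         # backpropagation (BP) update
            opt.step()
```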
  • In addition, the electronic device may first perform step 206-1 to obtain the content loss and then perform step 206-2 to obtain the timing loss, or it may first perform step 206-2 to obtain the timing loss and then perform step 206-1 to obtain the content loss; this application does not limit the order of execution of step 206-1 and step 206-2. For example, in another possible implementation, step 206-1 and step 206-2 may also be executed together.
  • Optical flow is a two-dimensional vector field arising from image translation: it uses a two-dimensional image to represent the velocity field of object points in three-dimensional motion, reflecting the image change produced by motion within a small time interval and determining the direction and rate of motion at each image point, so that optical flow can be configured to provide clues for recovering image motion.
  • In some embodiments, the electronic device can obtain the optical flow information between the two training images through online extraction, thereby reducing the user's workload in the process of training the image segmentation model.
  • Referring to FIG. 5, which shows another schematic flowchart of the image segmentation model training method provided by the present application, before step 202 is performed the image segmentation model training method may further include the following step:
  • Step 201: Extract the inter-frame optical flow between the two training images to obtain the optical flow information.
  • In some embodiments, the electronic device may use, for example, the SelFlow algorithm to extract the inter-frame optical flow between the two training images to obtain the optical flow information.
  • For example, as shown in FIG. 6, the electronic device can take image I0 and image I1 as input and use the SelFlow algorithm to extract the backward inter-frame optical flow of image I1, thereby obtaining the optical flow information.
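  • SelFlow is a research model without a standard packaged API, so the sketch below substitutes OpenCV's Farneback estimator as a stand-in dense-flow extractor; only the overall role (two frames in, one flow field out) matches the patent's description.

```python
import cv2

def extract_flow(frame0_bgr, frame1_bgr):
    """Estimate a dense HxWx2 flow field (per-pixel dx, dy) between two frames."""
    g0 = cv2.cvtColor(frame0_bgr, cv2.COLOR_BGR2GRAY)
    g1 = cv2.cvtColor(frame1_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.calcOpticalFlowFarneback(g0, g1, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
```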
  • However, with online extraction, the training time of the image segmentation model is lengthened because the step of extracting optical flow information has to be performed during training, and the operation of extracting optical flow information may be repeated on the same pair of training images.
  • Therefore, in some other embodiments, the optical flow information can also be obtained by offline extraction; that is, step 201 is performed first, and after the optical flow information of each pair of training images is obtained and stored, it can simply be read when needed. Obtaining the optical flow information offline in step 201 can reduce the training time of the image segmentation model and avoid repeatedly extracting optical flow information from the same pair of training images.
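  • A hedged sketch of such offline extraction, reusing the `extract_flow` stand-in above: the flow for each training pair is computed once, cached to disk, and read back during training, so no pair is processed twice. The cache layout is an assumption.

```python
import os
import numpy as np

def precompute_flows(pairs, cache_dir="flow_cache"):
    """Extract and cache the flow for every (frame0, frame1) training pair once."""
    os.makedirs(cache_dir, exist_ok=True)
    for idx, (frame0, frame1) in enumerate(pairs):
        path = os.path.join(cache_dir, f"pair_{idx:06d}.npy")
        if not os.path.exists(path):  # skip pairs whose flow was already extracted
            np.save(path, extract_flow(frame0, frame1))

def load_flow(idx, cache_dir="flow_cache"):
    return np.load(os.path.join(cache_dir, f"pair_{idx:06d}.npy"))
```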
  • Referring to FIG. 7, which shows still another schematic flowchart of the image segmentation model training method provided by the present application, before step 202 is performed the image segmentation model training method may further include the following step:
  • Step 200: Fuse each of the two pieces of obtained object information with one piece of background information to generate the two training images.
  • For example, in a live broadcast scene, the user can extract the object information from two images that are adjacent in time sequence and transmit the two pieces of object information to the electronic device; the electronic device can then fuse each of the two pieces of object information with the same piece of background information to generate two training images, that is, one training image set, thereby increasing the amount of training data.
  • In some embodiments, the same two pieces of object information can be fused with different background information to generate multiple training image sets, each training image set including two training images.
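  • A sketch of this data synthesis, reusing the `composite` helper from the fusion example above; the function names are hypothetical, and the foreground/mask inputs stand for the two pieces of time-adjacent object information described in step 200.

```python
def make_training_pair(fg0, fg1, mask0, mask1, background):
    """Composite two time-adjacent foregrounds onto one shared background,
    yielding the two training images of a single training image set."""
    i0 = composite(fg0, background, mask0)  # first training image
    i1 = composite(fg1, background, mask1)  # second training image
    return i0, i1

def make_training_sets(fg0, fg1, mask0, mask1, backgrounds):
    """Reuse the same object information with different backgrounds to
    generate multiple training image sets."""
    return [make_training_pair(fg0, fg1, mask0, mask1, bg) for bg in backgrounds]
```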
  • As noted above, in the fusion expression I = mF + (1 - m)B, m represents the mask information corresponding to the foreground information F; it can be seen that, based on this expression, as long as the mask information m corresponding to the foreground information F can be obtained, the foreground information F can be fused with any background information B.
  • In a live scene, for example, the trained image segmentation model is configured to provide the mask information used to fuse the foreground information F with the chosen background.
  • FIG. 8 shows a schematic flowchart of the image processing method provided by the present application.
  • the image processing method may include the following steps:
  • Step 302: Receive the image to be processed and the background to be fused.
  • Step 304: Input the image to be processed into the image segmentation model trained to convergence using the image segmentation model training method, to obtain target mask information corresponding to the image to be processed.
  • Step 306: Use the target mask information to process the image to be processed and the background to be fused to obtain a fused image.
  • Taking a live broadcast scene as an example, the electronic device may take each frame of the received live video as an image to be processed, and receive a background to be fused.
  • The purpose is to replace the background information of each frame of the live video image with the background to be fused.
  • In this way, the electronic device can input the image to be processed into the image segmentation model trained to convergence using the image segmentation model training method provided in this application, and the image segmentation model outputs the target mask information Mask_m corresponding to the image to be processed.
  • Then, the electronic device can use the obtained target mask information Mask_m as the parameter m in the above fusion formula, and substitute the image to be processed and the background to be fused into the fusion formula to obtain the fused image I.
  • The effect before and after fusion can be as shown in FIG. 9. In this way, after the image segmentation model has used optical flow information to learn the motion information between images, it can, in the process of image fusion, combine the motion information between each image and its adjacent images to extract the mask information of the corresponding image, which ensures consistency between consecutive images when performing image fusion.
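  • An end-to-end sketch of steps 302 through 306 under the same assumptions as above; the tensor layout and the `process_frame` name are illustrative, and the fusion step is the formula I = mF + (1 - m)B with the predicted mask as m.

```python
import torch

@torch.no_grad()
def process_frame(model, frame: torch.Tensor, new_background: torch.Tensor) -> torch.Tensor:
    """frame, new_background: 1x3xHxW float tensors in [0, 1]."""
    mask_m = model(frame)  # target mask information, 1x1xHxW in [0, 1]
    return mask_m * frame + (1.0 - mask_m) * new_background
```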
  • FIG. 10 shows a schematic structural block diagram of the image segmentation model training device 400 provided by this application.
  • In some embodiments, the image segmentation model training device 400 may include a first processing module 401 and an update module 402, where:
  • the first processing module 401 may be configured to obtain a training image set and training annotation information corresponding to the training image set, where the training image set includes two training images that are adjacent in time sequence and the optical flow information between the two training images;
  • the first processing module 401 may also be configured to input two training images into the image segmentation model to obtain two training mask information;
  • the update module 402 may be configured to update the model parameters of the image segmentation model according to the two training mask information, the training annotation information, and the optical flow information, until the image segmentation model reaches the set convergence condition.
  • the two training images include a first image that is earlier in time sequence and a second image that is later in time sequence, and the training annotation information is the annotation mask information of the second image;
  • the two training mask information include first training mask information corresponding to the first image and second training mask information corresponding to the second image;
  • In some embodiments, the update module 402 can be configured to: obtain the content loss of the image segmentation model according to the annotation mask information, the second training mask information, and the second image; obtain the timing loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information, and the second training mask information; and update the model parameters of the image segmentation model based on the content loss and the timing loss.
  • In some embodiments, in the process of updating the model parameters of the image segmentation model based on the content loss and the timing loss, the update module 402 can be configured to:
  • calculate the sum of the content loss and the timing loss as the total loss of the image segmentation model, so as to update the model parameters of the image segmentation model with the total loss.
  • The calculation formula for the content loss satisfies the following, where Lc represents the content loss, mask_gt represents the annotation mask information, mask1_pre represents the second training mask information, and I1 represents the second image.
  • The timing loss calculation formula satisfies the following, where Lst represents the timing loss, a set parameter serves as a weighting term, I0 represents the first image, warp01 represents the optical flow information, and mask0_pre represents the first training mask information.
  • In some embodiments, the annotation mask information is the mask information annotated for the captured live picture.
  • In some embodiments, the first processing module 401 may also be configured to extract the inter-frame optical flow between the two training images to obtain the optical flow information.
  • In some embodiments, the first processing module 401 may be configured to use the SelFlow algorithm to extract the inter-frame optical flow between the two training images to obtain the optical flow information.
  • In some embodiments, the first processing module 401 may also be configured to fuse each of the two pieces of obtained object information with one piece of background information to generate the two training images.
  • FIG. 11 shows a schematic structural block diagram of the image processing apparatus 500 provided by the present application.
  • The image processing apparatus 500 may include a receiving module 501 and a second processing module 502, where:
  • the receiving module 501 may be configured to receive the image to be processed and the background to be merged;
  • the second processing module 502 may be configured to input the image to be processed into an image segmentation model trained to convergence using the above-mentioned image segmentation model training method provided in this application, to obtain target mask information corresponding to the image to be processed;
  • the second processing module 502 may also be configured to use the target mask information to process the image to be processed and the background to be fused to obtain a fused image.
  • In some embodiments, the receiving module 501 may be configured to take each frame of a received live video as the image to be processed and to receive the background to be fused.
  • Each block in the flowchart or block diagram may represent a module, program segment, or part of code, and that module, program segment, or part of code contains one or more executable instructions configured to implement the prescribed logical function.
  • It should also be noted that the functions marked in the blocks may occur in an order different from the order marked in the drawings.
  • For example, two consecutive blocks can actually be executed substantially in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • Each block in the block diagram and/or flowchart, and each combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
  • the functional modules in some embodiments of the present application may be integrated together to form an independent part, or each module may exist alone, or two or more modules may be integrated to form an independent part.
  • If the function is implemented in the form of a software function module and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • Based on this understanding, the technical solution of the present application, in essence, or the part that contributes to the existing technology, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in some embodiments of the present application.
  • The aforementioned storage media include media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
  • In summary, in the process of training the image segmentation model, two training images adjacent in time sequence and the optical flow information between the two training images are obtained as the training image set, together with the corresponding training annotation information; the two training images are input into the image segmentation model to obtain two pieces of training mask information; and the model parameters of the image segmentation model are updated according to the two pieces of training mask information, the training annotation information, and the optical flow information until the image segmentation model reaches the set convergence condition. In this way, the image segmentation model can use the optical flow information to learn the motion information between images, combine the motion information between each image and its adjacent images to extract the mask information of the corresponding image, and thereby ensure consistency between consecutive images when performing image fusion.

Abstract

Embodiments of the present application relate to the technical field of artificial intelligence and provide an image processing method, an image segmentation model training method, and a related apparatus. The method comprises: obtaining two training images that are adjacent in time sequence and the optical flow information between the two training images as a training image set, and obtaining training annotation information corresponding to the training image set; inputting the two training images into the image segmentation model to obtain two pieces of training mask information; and then, according to the two pieces of training mask information, the training annotation information, and the optical flow information, updating the model parameters of the image segmentation model until the image segmentation model reaches the set convergence condition. As such, the image segmentation model can use the optical flow information to learn motion information between the images, so that the model can combine the motion information between each image and other adjacent images, extract the mask information of the corresponding image, and ensure consistency between consecutive images when performing image fusion.

Description

Image processing method, image segmentation model training method and related apparatus
Cross-reference to related applications
This application claims priority to the Chinese patent application No. 2020100143725, titled "Image Processing Method, Image Segmentation Model Training Method and Related Apparatus", filed with the Chinese Patent Office on January 7, 2020, the entire contents of which are incorporated into this application by reference.
Technical field
This application relates to the field of artificial intelligence technology, and specifically provides an image processing method, an image segmentation model training method, and a related apparatus.
Background
Matting refers to separating the foreground information and the background information in an image and then applying the extracted foreground information to other background information. With matting, the extracted foreground information can be composited with arbitrary background information; for example, in the field of live broadcast, the extracted portrait information can be fused with any background picture or video, thereby enhancing the user's experience of watching the live broadcast.
However, current matting technology merely separates the pixels of the portrait information from those of the background information and obtains a mask containing only 0 and 1; when merging, the consistency between adjacent image frames is poor, so the object information in the resulting video picture may jitter.
Summary of the invention
The purpose of this application is to provide an image processing method, an image segmentation model training method, and a related apparatus that can ensure consistency between consecutive images when performing image fusion.
In order to achieve at least one of the above objectives, the technical solutions adopted in this application are as follows:
An embodiment of the application provides an image segmentation model training method. The method includes:
obtaining a training image set and training annotation information corresponding to the training image set, where the training image set includes two training images that are adjacent in time sequence and the optical flow information between the two training images;
inputting the two training images into the image segmentation model to obtain two pieces of training mask information; and
updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information until the image segmentation model reaches a set convergence condition.
An embodiment of the present application also provides an image processing method. The method includes:
receiving an image to be processed and a background to be fused;
inputting the image to be processed into an image segmentation model trained to convergence using the above image segmentation model training method provided in this application to obtain target mask information corresponding to the image to be processed; and
using the target mask information to process the image to be processed and the background to be fused to obtain a fused image.
An embodiment of the present application also provides an image segmentation model training device. The device includes:
a first processing module configured to obtain a training image set and training annotation information corresponding to the training image set, where the training image set includes two training images that are adjacent in time sequence and the optical flow information between the two training images;
the first processing module being further configured to input the two training images into the image segmentation model to obtain two pieces of training mask information; and
an update module configured to update the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information until the image segmentation model reaches a set convergence condition.
An embodiment of the present application also provides an image processing device. The device includes:
a receiving module configured to receive an image to be processed and a background to be fused;
a second processing module configured to input the image to be processed into an image segmentation model trained to convergence using the above image segmentation model training method provided in this application to obtain target mask information corresponding to the image to be processed;
the second processing module being further configured to use the target mask information to process the image to be processed and the background to be fused to obtain a fused image.
An embodiment of the present application also provides an electronic device, including:
a memory configured to store one or more programs; and
a processor;
when the one or more programs are executed by the processor, the above image segmentation model training method or image processing method provided in this application is implemented.
An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the above image segmentation model training method or image processing method provided in this application is implemented.
Brief description of the drawings
In order to explain the technical solutions of the present application more clearly, the drawings needed in the embodiments are briefly introduced below. It should be understood that the following drawings show only certain embodiments of the present application and therefore should not be regarded as limiting the scope; for those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative work.
FIG. 1 shows a schematic structural block diagram of an electronic device provided by the present application;
FIG. 2 shows a schematic flowchart of the image segmentation model training method provided by the present application;
FIG. 3 shows a schematic structural diagram of an image segmentation model;
FIG. 4 shows a schematic flowchart of the sub-steps of step 206 in FIG. 2;
FIG. 5 shows another schematic flowchart of the image segmentation model training method provided by the present application;
FIG. 6 shows a schematic diagram of a method of extracting optical flow information;
FIG. 7 shows still another schematic flowchart of the image segmentation model training method provided by the present application;
FIG. 8 shows a schematic flowchart of the image processing method provided by the present application;
FIG. 9 shows a schematic comparison of an image before and after fusion;
FIG. 10 shows a schematic structural block diagram of the image segmentation model training device provided by the present application;
FIG. 11 shows a schematic structural block diagram of the image processing device provided by the present application.
In the figures: 100 - electronic device; 101 - memory; 102 - processor; 103 - communication interface; 400 - image segmentation model training device; 401 - first processing module; 402 - update module; 500 - image processing device; 501 - receiving module; 502 - second processing module.
Detailed description of embodiments
To make the purpose, technical solutions, and advantages of this application clearer, the technical solutions in this application are described clearly and completely below in conjunction with the drawings in some embodiments of this application. Obviously, the described embodiments are some, but not all, of the embodiments of this application. The components of the present application generally described and shown in the drawings herein may be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments of the present application provided in the accompanying drawings is not intended to limit the scope of the claimed application but merely represents selected embodiments of the present application. Based on some of the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative work fall within the protection scope of this application.
It should be noted that similar reference numerals and letters indicate similar items in the following figures; therefore, once an item is defined in one figure, it does not need to be further defined or explained in subsequent figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are only used to distinguish descriptions and cannot be understood as indicating or implying relative importance.
It should also be noted that, in this document, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to the process, method, article, or device. Without further restrictions, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
In the live broadcast field mentioned above, for example, matting can be used to separate foreground information such as the host's portrait from the background information, and the separated foreground information can then be merged with other background information.
Assuming that the foreground information separated by matting is denoted as F and the fused background information as B, the fused image I can be expressed as: I = mF + (1 - m)B.
In the formula, m represents the mask information (mask) corresponding to the foreground information F.
From the above fusion formula for image I, since the foreground information F and the background information B are both fixed inputs, the fusion effect of image I is mainly affected by the value of the mask information.
In some matting schemes, such as portrait binary semantic segmentation, the mask information is obtained by understanding the image at the semantic level and classifying the information in the image into foreground pixels and background pixels by semantic category; the result is a mask with a value range between 0 and 1.
However, in scenes such as webcasting, in the process of fusing foreground information such as the host's portrait with other background information, because the webcast plays a continuous video stream, it is necessary to consider not only the fusion between the foreground information and the background information but also that the segmentation results of two consecutive frames must not deviate too much from each other.
The portrait binary semantic segmentation scheme described above, however, considers only the segmentation of a single frame and does not take into account the temporal consistency of the segmentation results of adjacent frames; when it is applied to portrait segmentation in scenes such as webcasts, after the segmented foreground information is fused with other background information, the object information in the resulting video picture may jitter, affecting the user experience.
Therefore, in order to solve at least some of the shortcomings of the above related solutions, some possible implementations provided by this application are as follows: obtain two training images that are adjacent in time sequence, together with the optical flow information between them, as a training image set, and obtain the training annotation information corresponding to the training image set; input the two training images into the image segmentation model to obtain two pieces of training mask information; then update the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information until the image segmentation model reaches the set convergence condition. In this way, the image segmentation model can combine the motion information between each image and its adjacent images to extract the mask information of the corresponding image, which ensures consistency between consecutive images when performing image fusion.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. Where there is no conflict, the following embodiments and the features in the embodiments can be combined with each other.
Referring to FIG. 1, which shows a schematic structural block diagram of an electronic device 100 provided in this application, the electronic device 100 can store an untrained image segmentation model and execute the image segmentation model training method provided in this application to complete the training of the image segmentation model; alternatively, the electronic device 100 may store an image segmentation model trained to convergence using the image segmentation model training method provided in this application and use that model to implement the image processing method provided in this application.
In some embodiments, the electronic device 100 may include a memory 101, a processor 102, and a communication interface 103. The memory 101, the processor 102, and the communication interface 103 are directly or indirectly electrically connected to one another to realize data transmission or interaction. For example, these components can be electrically connected to each other through one or more communication buses or signal lines.
The memory 101 may be configured to store software programs and modules, such as the program instructions/modules corresponding to the image segmentation model training device or the image processing device provided in this application; the processor 102 executes the software programs and modules stored in the memory 101, thereby performing various functional applications and data processing and, in turn, the steps of the image segmentation model training method or the image processing method provided in this application. The communication interface 103 may be configured to perform signaling or data communication with other node devices.
The memory 101 may be, but is not limited to, random access memory (RAM), read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), and so on.
The processor 102 may be an integrated circuit chip with signal processing capabilities. The processor 102 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It can be understood that the structure shown in FIG. 1 is only for illustration; the electronic device 100 may include more or fewer components than those shown in FIG. 1 or have a configuration different from that shown in FIG. 1. The components shown in FIG. 1 can be implemented in hardware, software, or a combination thereof.
Hereinafter, the electronic device 100 shown in FIG. 1 is used as an exemplary execution subject to describe the image segmentation model training method provided in the present application.
Please refer to FIG. 2, which shows a schematic flowchart of the image segmentation model training method provided in this application. In some embodiments, the image segmentation model training method may include the following steps:
Step 202: obtain a training image set and the training annotation information corresponding to the training image set.
Step 204: input the two training images into the image segmentation model to obtain two pieces of training mask information.
Step 206: update the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information, until the image segmentation model reaches a set convergence condition.
In some embodiments, the electronic device may store an image segmentation model as shown in FIG. 3. The image segmentation model can process an input image and output the mask information of that image. The network structure adopted by the image segmentation model may be a Unet network, or a segmentation network such as Deeplabv3 or SEGNET; this application does not limit the network structure of the image segmentation model.
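By way of illustration only (the application does not prescribe any particular implementation), such a segmentation network can be instantiated in a few lines of PyTorch, here using torchvision's DeepLabv3 as a stand-in; the single-channel output head is an assumption made for mask prediction:

    import torch
    from torchvision.models.segmentation import deeplabv3_resnet50

    # DeepLabv3 with a 1-channel head, standing in for the image segmentation model.
    model = deeplabv3_resnet50(weights=None, num_classes=1)
    logits = model(torch.rand(1, 3, 224, 224))["out"]  # (1, 1, 224, 224) mask logits
    mask = torch.sigmoid(logits)                       # mask information in [0, 1]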
In the process of training the image segmentation model, the electronic device may first obtain a training image set and the training annotation information corresponding to it. The training image set includes two training images that are adjacent in time sequence, such as I0 and I1 in FIG. 3, together with the optical flow information between the two training images; the optical flow information characterizes the motion cues between the two training images, that is, the correlation between I0 and I1.
Then, as shown in FIG. 3, the electronic device may input the two training images I0 and I1 into the image segmentation model to obtain two pieces of training mask information; for example, in the scene shown in FIG. 3, the mask information corresponding to I0 may be Mask0, and the mask information corresponding to I1 may be Mask1.
Finally, the electronic device may update the model parameters of the image segmentation model according to the two pieces of training mask information (for example, Mask0 and Mask1), the training annotation information, and the optical flow information, using, for example, the backpropagation (BP) algorithm, until the image segmentation model reaches the set convergence condition. Since the optical flow information characterizes the motion information between the two training images, the mask information corresponding to each of the two training images also carries that motion information. In this way, while the model parameters are being updated, the image segmentation model can use the optical flow information to learn the motion information between the two training images, so that when extracting the mask information of a target image, the model can draw on the mask information of the images adjacent to the target image, thereby maintaining consistency between adjacent images.
It can be seen that, based on the above design, the image segmentation model training method provided in this application obtains two training images that are adjacent in time sequence and the optical flow information between them as a training image set, together with the training annotation information corresponding to the training image set; it then inputs the two training images into the image segmentation model to obtain two pieces of training mask information; and it further updates the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information, until the image segmentation model reaches the set convergence condition. In this way, during training, the image segmentation model can use the optical flow information to learn the motion information between images, so that it can combine the motion information between each image and its adjacent images when extracting the mask information of the corresponding image, thereby ensuring consistency between consecutive images during image fusion.
It should be noted that the two training images adjacent in time sequence obtained by the electronic device generally include a first image that is earlier in the sequence and a second image that is later in the sequence; for example, I0 in FIG. 3, which is earlier in the sequence, may serve as the first image, and I1, which is later, may serve as the second image.
Correspondingly, the two pieces of training mask information output by the image segmentation model include the first training mask information corresponding to the first image and the second training mask information corresponding to the second image; for example, Mask0 corresponding to I0 in FIG. 3 may serve as the first training mask information, and Mask1 corresponding to I1 may serve as the second training mask information.
In addition, it should be noted that optical flow is the motion of a target caused by the movement of the target, the scene, or the camera between two consecutive frames; optical flow information is vector information, and optical flow is generally divided into forward optical flow and backward optical flow. For example, between the two frames shown in FIG. 3, image I0 precedes image I1 in the time sequence; for image I1, the optical flow from image I0 to image I1 is the backward optical flow, which records the motion direction and rate from image I0 to image I1, while the optical flow from image I1 to image I0 is the forward optical flow, which records the motion direction and rate from image I1 to image I0.
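To make the role of this flow information concrete, the following is a minimal sketch of warping a tensor (for example, a predicted mask) along a given flow with bilinear sampling in PyTorch; the helper name warp_with_flow and the (N, 2, H, W) pixel-displacement layout are illustrative assumptions, not part of the original disclosure:

    import torch
    import torch.nn.functional as F

    def warp_with_flow(x, flow):
        # x: (N, C, H, W) tensor; flow: (N, 2, H, W) displacements in pixels.
        n, _, h, w = x.shape
        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        grid = torch.stack((xs, ys), dim=0).float().to(x.device)  # identity grid (2, H, W)
        coords = grid.unsqueeze(0) + flow                         # displaced positions
        # Normalize coordinates to [-1, 1], as grid_sample expects.
        cx = 2.0 * coords[:, 0] / (w - 1) - 1.0
        cy = 2.0 * coords[:, 1] / (h - 1) - 1.0
        sample_grid = torch.stack((cx, cy), dim=-1)               # (N, H, W, 2)
        return F.grid_sample(x, sample_grid, align_corners=True)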
Moreover, the foregoing is only illustrative: of the two adjacent training images, the image earlier in the time sequence is taken as the first image and the one later in the sequence as the second image. In some other possible embodiments of this application, the later image may instead be taken as the first image and the earlier one as the second image; this application does not limit this.
Optionally, in some embodiments, the training annotation information obtained by the electronic device may be the annotation mask information of the second image. For example, in the scenario shown in FIG. 3, the training annotation information obtained by the electronic device may be the annotation mask information of image I1; correspondingly, the optical flow information obtained by the electronic device may be the backward optical flow with respect to image I1.
On this basis, please refer to FIG. 4, which shows a schematic flowchart of the sub-steps of step 206 in FIG. 2. In some possible implementations, step 206 may include the following sub-steps:
Step 206-1: obtain the content loss of the image segmentation model according to the annotation mask information, the second training mask information, and the second image.
Step 206-2: obtain the timing loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information, and the second training mask information.
Step 206-3: update the model parameters of the image segmentation model based on the content loss and the timing loss.
When performing step 206 to update the model parameters of the image segmentation model, the electronic device may divide the loss function of the image segmentation model into two parts: a content loss and a timing loss.
For example, in some embodiments, the loss function of the image segmentation model may satisfy the following formula:
L = Lc + Lst
where L denotes the total loss of the image segmentation model, Lc denotes the content loss, and Lst denotes the timing loss.
The content loss constrains the second training mask information output by the image segmentation model against the actual mask information of the second image; it guarantees the accuracy of the segmentation result.
Thus, when performing step 206, the electronic device may obtain the content loss of the image segmentation model according to the annotation mask information, the second training mask information, and the second image, that is, compute the difference between the second training mask information and the annotation mask information.
For example, in some possible implementations, the calculation formula for the content loss may satisfy the following:
[Content-loss formula, published as an image (PCTCN2021070167-appb-000001) in the original document.]
where Lc denotes the content loss, mask_gt denotes the annotation mask information, mask1_pre denotes the second training mask information, and I1 denotes the second image.
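The exact expression is published only as an image. As an assumed reconstruction consistent with the variables defined above, offered purely for orientation and not as the formula of record, an L1 difference between the predicted and annotated masks applied to the second image would read:

    % Assumed form only; the formula of record is the image referenced above.
    L_c = \left\| \mathrm{mask1}_{\mathrm{pre}} \odot I_1 - \mathrm{mask}_{\mathrm{gt}} \odot I_1 \right\|_1

where \odot denotes the element-wise product.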
The timing loss, on the other hand, constrains the motion information between the two frames, ensuring that the mask information corresponding to each of the two frames remains consistent in time sequence.
Thus, when performing step 206, the electronic device obtains the timing loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information, and the second training mask information.
For example, in some possible implementations, the calculation formula for the timing loss may satisfy the following:
[Timing-loss formula, published as an image (PCTCN2021070167-appb-000002) in the original document.]
where Lst denotes the timing loss, α denotes a set parameter, [an intermediate term published as an image (PCTCN2021070167-appb-000003) in the original document], I0 denotes the first image, warp01 denotes the optical flow information, and mask0_pre denotes the first training mask information.
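This expression, too, is published only as an image. A common temporal-consistency form that fits the variables above, stated here as an assumption rather than as the published formula, penalizes the difference between the second predicted mask and the first predicted mask warped by the optical flow:

    % Assumed form only; the formula of record is the image referenced above.
    L_{st} = \alpha \left\| \mathrm{mask1}_{\mathrm{pre}} - \mathcal{W}_{01}(\mathrm{mask0}_{\mathrm{pre}}) \right\|_1

where \mathcal{W}_{01}(\cdot) denotes warping by the flow warp01; the image-rendered intermediate term may additionally involve a visibility weight derived from warping I0, which this sketch omits.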
Based on the content loss and timing loss obtained above, the electronic device may sum the two, taking the sum of the content loss and the timing loss as the total loss of the image segmentation model, and then update the model parameters of the image segmentation model using the computed total loss, based for example on the BP algorithm; training is iterated continuously until the image segmentation model reaches the set convergence condition.
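Putting the pieces together, the following is a minimal sketch of one training iteration under the assumed loss forms above, reusing the warp_with_flow helper sketched earlier; model, optimizer, and alpha are placeholders, and the model is assumed to return a mask tensor directly:

    import torch

    def train_step(model, optimizer, i0, i1, mask_gt, flow01, alpha=1.0):
        # Step 204: predict masks for both adjacent frames.
        mask0_pre = model(i0)
        mask1_pre = model(i1)
        # Content loss: predicted vs. annotated mask on the second image (assumed L1 form).
        lc = torch.abs(mask1_pre * i1 - mask_gt * i1).mean()
        # Timing loss: second mask vs. first mask warped by the flow (assumed form).
        lst = alpha * torch.abs(mask1_pre - warp_with_flow(mask0_pre, flow01)).mean()
        # Total loss L = Lc + Lst, followed by a BP update (step 206).
        loss = lc + lst
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()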
It should be noted that the above formulas for calculating the content loss, the timing loss, and the total loss of the image segmentation model are only illustrative; in some other possible embodiments of this application, other formulas may be used to calculate these losses, and this application does not limit this.
In addition, in the above solution provided in this application, the electronic device may first perform step 206-1 to obtain the content loss and then perform step 206-2 to obtain the timing loss, or it may first perform step 206-2 to obtain the timing loss and then perform step 206-1 to obtain the content loss; this application does not limit the execution order of step 206-1 and step 206-2. For example, in another possible implementation, step 206-1 and step 206-2 may also be executed together.
Furthermore, it should be noted that optical flow is the two-dimensional vector field of an image under translation: it uses a two-dimensional image to represent the velocity field of object points moving in three dimensions, reflecting the image changes formed by motion within a tiny time interval, so as to determine the motion direction and motion rate at each image point. Optical flow can thus be configured to provide cues for recovering image motion.
In the process of training the image segmentation model, the electronic device may obtain the optical flow information between the two training images through online extraction, so as to reduce the user's workload during training.
Therefore, on the basis of FIG. 2, please refer to FIG. 5, which shows another schematic flowchart of the image segmentation model training method provided in this application. Before step 202 is performed, the image segmentation model training method may further include the following step:
Step 201: extract the inter-frame optical flow between the two training images to obtain the optical flow information.
In some embodiments, as shown in FIG. 6, the electronic device may use, for example, the SelFlow algorithm to extract the inter-frame optical flow between the two training images to obtain the optical flow information.
For example, in the above example where the backward optical flow with respect to image I1 is taken as the optical flow information, the electronic device may take image I0 and image I1 as input and use the SelFlow algorithm to extract the backward inter-frame optical flow of image I1, thereby obtaining the optical flow information.
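Since SelFlow ships no standard packaged API, the sketch below substitutes the pretrained RAFT estimator from torchvision purely as an illustrative stand-in for the flow-extraction step; the substitution, the helper name, and the frame-ordering convention are assumptions:

    import torch
    from torchvision.models.optical_flow import raft_large, Raft_Large_Weights

    def extract_backward_flow(i0, i1):
        # i0, i1: (N, 3, H, W) float tensors in [0, 1]; H and W should be multiples of 8.
        weights = Raft_Large_Weights.DEFAULT
        model = raft_large(weights=weights).eval()
        i0, i1 = weights.transforms()(i0, i1)  # normalize as RAFT expects
        with torch.no_grad():
            flow_iterates = model(i0, i1)      # flow from I0 to I1: backward flow for I1
        return flow_iterates[-1]               # (N, 2, H, W) pixel displacements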
However, it should be noted that in embodiments that obtain the optical flow information through the online extraction described above, the training time of the image segmentation model is lengthened because the optical flow extraction step must be performed online; moreover, when the image segmentation model is trained iteratively, step 201 needs to be executed repeatedly, so the optical flow extraction may be performed repeatedly on the same pair of training images.
Therefore, in another possible embodiment of this application, the optical flow information may instead be obtained by offline extraction. That is, step 201 may be performed first, and after the optical flow information of the two training images in each group has been obtained, the two training images of each group and the corresponding optical flow information are taken as the input of the image segmentation model, and the training process is then performed. In this case, since step 201 no longer needs to be executed during training, the training time of the image segmentation model can be reduced, and repeated extraction of optical flow information for the same pair of training images can be avoided.
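A sketch of this offline variant, assuming the extract_backward_flow helper above and an illustrative on-disk layout, runs step 201 once over the whole data set before any training begins:

    import numpy as np

    def precompute_flows(pairs, out_dir):
        # pairs: iterable of (i0, i1) frame tensors; executed once, before training.
        for idx, (i0, i1) in enumerate(pairs):
            flow = extract_backward_flow(i0, i1)
            np.save(f"{out_dir}/flow_{idx:06d}.npy", flow.cpu().numpy())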
Moreover, it should be noted that in actual training scenarios there is little open-source data. To obtain enough training images for training the image segmentation model, a portrait matting data set can be captured from, for example, live-streaming scenes, and the mask information of the corresponding live-streaming frames can be extracted as the annotation mask information, thereby increasing the amount of training data.
However, capturing a portrait matting data set in this way still requires manual operation by the user, which increases the user's workload in training the image segmentation model.
To this end, on the basis of FIG. 2, please refer to FIG. 7, which shows yet another schematic flowchart of the image segmentation model training method provided in this application. Before step 202 is performed, the image segmentation model training method may further include the following step:
Step 200: fuse each of two pieces of obtained object information with one piece of background information to generate the two training images.
In some embodiments, the user may extract the object information from two images that are adjacent in time sequence and transmit the two pieces of object information to the electronic device; the electronic device may then fuse each of the two pieces of object information with one piece of background information to generate two training images, that is, one training image set, thereby increasing the amount of training data.
Of course, it can be understood that the above only takes fusing the obtained object information with one piece of background information as an example of one way of generating two training images; when a large number of training images need to be generated, the electronic device may fuse the two pieces of object information with different pieces of background information, thereby generating multiple training image sets, each of which includes two training images.
In addition, as can be seen from the foregoing, in a scene where the foreground information F separated by matting is fused with the background information B, the fused image I can be expressed as: I = mF + (1 - m)B.
In this expression, m denotes the mask information corresponding to the foreground information F. It can be seen that, based on this expression, as long as the mask information m corresponding to the foreground information F can be obtained, the foreground information F can be fused with any background information B.
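A minimal sketch of this fusion, assuming the foreground, background, and mask are float arrays in [0, 1] with matching spatial size (the function name composite is illustrative):

    import numpy as np

    def composite(foreground, background, mask):
        # I = m * F + (1 - m) * B, applied per pixel.
        # foreground, background: (H, W, 3); mask: (H, W) or (H, W, 1).
        if mask.ndim == 2:
            mask = mask[..., None]  # broadcast the mask over the color channels
        return mask * foreground + (1.0 - mask) * background

Generating one training pair from two adjacent foreground frames f0 and f1 with masks m0 and m1 and a shared background b would then be composite(f0, b, m0) and composite(f1, b, m1).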
Thus, building on the image segmentation model training method provided above, an image segmentation model trained to convergence with that method can be configured to perform, for example, the fusion of the foreground information F with the background information B in a live-streaming scene.
Please refer to FIG. 8, which shows a schematic flowchart of the image processing method provided in this application. In some embodiments, the image processing method may include the following steps:
Step 302: receive an image to be processed and a background to be fused.
Step 304: input the image to be processed into the image segmentation model trained to convergence using the image segmentation model training method, to obtain the target mask information corresponding to the image to be processed.
Step 306: process the image to be processed and the background to be fused using the target mask information, to obtain a fused image.
In some embodiments, for example in a live-streaming scene, the electronic device may take each received frame of the live video as the image to be processed and receive a background to be fused; the aim is to replace the background information of each frame of the live video with that background to be fused.
Then, taking one frame of the live video as the image to be processed as an example, the electronic device may input the image to be processed into the image segmentation model trained to convergence using the image segmentation model training method provided in this application, so that the image segmentation model outputs the target mask information Mask_m corresponding to the image to be processed.
Finally, the electronic device may take the obtained target mask information Mask_m as the parameter m in the above fusion formula, and substitute the image to be processed and the background to be fused into that formula to obtain the fused image I; the effect before and after fusion may be as shown in FIG. 9. In this way, after the image segmentation model has used the optical flow information to learn the motion information between images, it can, during image fusion, combine the motion information between each image and its adjacent images to extract the mask information of the corresponding image, thereby ensuring consistency between consecutive images.
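End to end, per-frame processing in a live-streaming scene might then look like the sketch below, reusing the composite helper above; model stands for the converged segmentation model (assumed to return a mask tensor), and the frame source is a placeholder:

    import torch

    def process_stream(model, frames, background):
        # frames: iterable of (H, W, 3) float arrays in [0, 1]; background: same shape.
        model.eval()
        for frame in frames:
            x = torch.from_numpy(frame).permute(2, 0, 1).unsqueeze(0).float()
            with torch.no_grad():
                mask_m = model(x)[0, 0].numpy()         # target mask information Mask_m
            yield composite(frame, background, mask_m)  # I = m*F + (1 - m)*B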
In addition, based on the same inventive concept as the image segmentation model training method provided above, please refer to FIG. 10, which shows a schematic structural block diagram of the image segmentation model training apparatus 400 provided in this application. The image segmentation model training apparatus 400 may include a first processing module 401 and an update module 402. Specifically:
The first processing module 401 may be configured to obtain a training image set and the training annotation information corresponding to the training image set, where the training image set includes two training images that are adjacent in time sequence and the optical flow information between the two training images.
The first processing module 401 may be further configured to input the two training images into the image segmentation model to obtain two pieces of training mask information.
The update module 402 may be configured to update the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information, until the image segmentation model reaches the set convergence condition.
Optionally, as a possible implementation, the two training images include a first image that is earlier in the time sequence and a second image that is later in the time sequence, and the training annotation information is the annotation mask information of the second image;
the two pieces of training mask information include the first training mask information corresponding to the first image and the second training mask information corresponding to the second image;
in the process of updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information, the update module 402 may be configured to:
obtain the content loss of the image segmentation model according to the annotation mask information, the second training mask information, and the second image;
obtain the timing loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information, and the second training mask information;
update the model parameters of the image segmentation model based on the content loss and the timing loss.
Optionally, as a possible implementation, in the process of updating the model parameters of the image segmentation model based on the content loss and the timing loss, the update module 402 may be configured to:
calculate the sum of the content loss and the timing loss as the total loss of the image segmentation model, so as to update the model parameters of the image segmentation model using the total loss.
Optionally, as a possible implementation, the calculation formula for the content loss satisfies the following:
[Content-loss formula, published as an image (PCTCN2021070167-appb-000004) in the original document.]
where Lc denotes the content loss, mask_gt denotes the annotation mask information, mask1_pre denotes the second training mask information, and I1 denotes the second image;
the calculation formula for the timing loss satisfies the following:
[Timing-loss formula, published as an image (PCTCN2021070167-appb-000005) in the original document.]
where Lst denotes the timing loss, α denotes a set parameter, [an intermediate term published as an image (PCTCN2021070167-appb-000006) in the original document], I0 denotes the first image, warp01 denotes the optical flow information, and mask0_pre denotes the first training mask information.
Optionally, as a possible implementation, the annotation mask information is the mask information of a captured live-streaming frame.
Optionally, as a possible implementation, before obtaining the training image set and the training annotation information corresponding to the training image set, the first processing module 401 may be further configured to:
extract the inter-frame optical flow between the two training images to obtain the optical flow information.
Optionally, as a possible implementation, in the process of extracting the inter-frame optical flow between the two training images to obtain the optical flow information, the first processing module 401 may be configured to:
use the SelFlow algorithm to extract the inter-frame optical flow between the two training images to obtain the optical flow information.
Optionally, as a possible implementation, before obtaining the training image set and the training annotation information corresponding to the training image set, the first processing module 401 may be further configured to:
fuse each of two pieces of obtained object information with one piece of background information to generate the two training images.
Furthermore, based on the same inventive concept as the image processing method provided above, please refer to FIG. 11, which shows a schematic structural block diagram of the image processing apparatus 500 provided in this application. The image processing apparatus 500 may include a receiving module 501 and a second processing module 502. Specifically:
The receiving module 501 may be configured to receive an image to be processed and a background to be fused.
The second processing module 502 may be configured to input the image to be processed into the image segmentation model trained to convergence using the image segmentation model training method provided in this application, to obtain the target mask information corresponding to the image to be processed.
The second processing module 502 may be further configured to process the image to be processed and the background to be fused using the target mask information, to obtain a fused image.
Optionally, as a possible implementation, in the process of receiving the image to be processed, the receiving module 501 may be configured to:
take each received frame of the live video as the image to be processed.
In the embodiments provided in this application, it should be understood that the disclosed apparatus and methods may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the drawings show the possible architecture, functionality, and operation of apparatus, methods, and computer program products according to some embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which contains one or more executable instructions configured to implement the specified logical function.
It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved.
It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in some embodiments of this application may be integrated together to form an independent part, each module may exist alone, or two or more modules may be integrated to form an independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in some embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The foregoing is only a part of the embodiments of this application and is not intended to limit this application. For those skilled in the art, this application may have various modifications and variations. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of this application shall be included within the protection scope of this application.
It is obvious to those skilled in the art that this application is not limited to the details of the above exemplary embodiments, and that this application can be implemented in other specific forms without departing from the spirit or essential characteristics of this application. Therefore, from whatever point of view, the embodiments should be regarded as exemplary and non-limiting. The scope of this application is defined by the appended claims rather than by the above description, and it is therefore intended that all changes falling within the meaning and scope of equivalents of the claims be embraced in this application. Any reference signs in the claims shall not be construed as limiting the claims concerned.
Industrial Applicability
During the training of the image segmentation model, two training images adjacent in time sequence and the optical flow information between them are obtained as the training image set, together with the training annotation information corresponding to the training image set; the two training images are then input into the image segmentation model to obtain two pieces of training mask information; the model parameters of the image segmentation model are then updated according to the two pieces of training mask information, the training annotation information, and the optical flow information, until the image segmentation model reaches the set convergence condition. In this way, the image segmentation model can use the optical flow information to learn the motion information between images, so that it can combine the motion information between each image and its adjacent images to extract the mask information of the corresponding image, thereby ensuring consistency between consecutive images during image fusion.

Claims (15)

  1. An image segmentation model training method, characterized in that the method comprises:
    obtaining a training image set and training annotation information corresponding to the training image set, wherein the training image set comprises two training images that are adjacent in time sequence, and optical flow information between the two training images;
    inputting the two training images into the image segmentation model to obtain two pieces of training mask information;
    updating model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information, until the image segmentation model reaches a set convergence condition.
  2. The method according to claim 1, characterized in that the two training images comprise a first image that is earlier in the time sequence and a second image that is later in the time sequence, and the training annotation information is annotation mask information of the second image;
    the two pieces of training mask information comprise first training mask information corresponding to the first image and second training mask information corresponding to the second image;
    the step of updating the model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information comprises:
    obtaining a content loss of the image segmentation model according to the annotation mask information, the second training mask information, and the second image;
    obtaining a timing loss of the image segmentation model according to the optical flow information, the first image, the second image, the first training mask information, and the second training mask information;
    updating the model parameters of the image segmentation model based on the content loss and the timing loss.
  3. The method according to claim 2, characterized in that updating the model parameters of the image segmentation model based on the content loss and the timing loss comprises:
    calculating the sum of the content loss and the timing loss as the total loss of the image segmentation model, so as to update the model parameters of the image segmentation model using the total loss.
  4. The method according to claim 2 or 3, characterized in that the calculation formula for the content loss satisfies the following:
    [Content-loss formula, published as an image (PCTCN2021070167-appb-100001) in the original document.]
    wherein Lc denotes the content loss, mask_gt denotes the annotation mask information, mask1_pre denotes the second training mask information, and I1 denotes the second image;
    the calculation formula for the timing loss satisfies the following:
    [Timing-loss formula, published as an image (PCTCN2021070167-appb-100002) in the original document.]
    wherein Lst denotes the timing loss, α denotes a set parameter, [an intermediate term published as an image (PCTCN2021070167-appb-100003) in the original document], I0 denotes the first image, warp01 denotes the optical flow information, and mask0_pre denotes the first training mask information.
  5. The method according to any one of claims 2-4, characterized in that the annotation mask information is mask information of a captured live-streaming frame.
  6. The method according to any one of claims 1-5, characterized in that before obtaining the training image set and the training annotation information corresponding to the training image set, the method further comprises:
    extracting inter-frame optical flow between the two training images to obtain the optical flow information.
  7. The method according to claim 6, characterized in that extracting the inter-frame optical flow between the two training images to obtain the optical flow information comprises:
    using the SelFlow algorithm to extract the inter-frame optical flow between the two training images to obtain the optical flow information.
  8. The method according to any one of claims 1-7, characterized in that before obtaining the training image set and the training annotation information corresponding to the training image set, the method further comprises:
    fusing each of two pieces of obtained object information with one piece of background information to generate the two training images.
  9. The method according to any one of claims 1-8, characterized in that the network structure of the image segmentation model is a Unet network, Deeplabv3, or a SEGNET network.
  10. An image processing method, characterized in that the method comprises:
    receiving an image to be processed and a background to be fused;
    inputting the image to be processed into an image segmentation model trained to convergence using the method according to any one of claims 1-9, to obtain target mask information corresponding to the image to be processed;
    processing the image to be processed and the background to be fused using the target mask information, to obtain a fused image.
  11. The method according to claim 10, characterized in that receiving the image to be processed comprises:
    taking each received frame of a live video as the image to be processed.
  12. An image segmentation model training apparatus, characterized in that the apparatus comprises:
    a first processing module configured to obtain a training image set and training annotation information corresponding to the training image set, wherein the training image set comprises two training images that are adjacent in time sequence, and optical flow information between the two training images;
    the first processing module being further configured to input the two training images into the image segmentation model to obtain two pieces of training mask information; and
    an update module configured to update model parameters of the image segmentation model according to the two pieces of training mask information, the training annotation information, and the optical flow information, until the image segmentation model reaches a set convergence condition.
  13. An image processing apparatus, characterized in that the apparatus comprises:
    a receiving module configured to receive an image to be processed and a background to be fused;
    a second processing module configured to input the image to be processed into an image segmentation model trained to convergence using the method according to any one of claims 1-9, to obtain target mask information corresponding to the image to be processed;
    the second processing module being further configured to process the image to be processed and the background to be fused using the target mask information, to obtain a fused image.
  14. An electronic device, characterized by comprising:
    a memory configured to store one or more programs; and
    a processor;
    wherein, when the one or more programs are executed by the processor, the method according to any one of claims 1-11 is implemented.
  15. A computer-readable storage medium on which a computer program is stored, characterized in that, when executed by a processor, the computer program implements the method according to any one of claims 1-11.
PCT/CN2021/070167 2020-01-07 2021-01-04 Image processing method, image segmentation model training method and related apparatus WO2021139625A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010014372.5A CN111260679B (en) 2020-01-07 2020-01-07 Image processing method, image segmentation model training method and related device
CN202010014372.5 2020-01-07

Publications (1)

Publication Number Publication Date
WO2021139625A1 true WO2021139625A1 (en) 2021-07-15

Family

ID=70923869

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/070167 WO2021139625A1 (en) 2020-01-07 2021-01-04 Image processing method, image segmentation model training method and related apparatus

Country Status (2)

Country Link
CN (1) CN111260679B (en)
WO (1) WO2021139625A1 (en)

Families Citing this family (8)

Publication number Priority date Publication date Assignee Title
CN111260679B (en) * 2020-01-07 2022-02-01 广州虎牙科技有限公司 Image processing method, image segmentation model training method and related device
CN112351291A (en) * 2020-09-30 2021-02-09 深圳点猫科技有限公司 Teaching interaction method, device and equipment based on AI portrait segmentation
CN112560583A (en) * 2020-11-26 2021-03-26 复旦大学附属中山医院 Data set generation method and device
CN112669324B (en) * 2020-12-31 2022-09-09 中国科学技术大学 Rapid video target segmentation method based on time sequence feature aggregation and conditional convolution
CN113051430B (en) * 2021-03-26 2024-03-26 北京达佳互联信息技术有限公司 Model training method, device, electronic equipment, medium and product
CN113393465A (en) * 2021-05-26 2021-09-14 浙江吉利控股集团有限公司 Image generation method and device
CN115836319A (en) * 2021-07-15 2023-03-21 京东方科技集团股份有限公司 Image processing method and device
CN114782460B (en) * 2022-06-21 2022-10-18 阿里巴巴达摩院(杭州)科技有限公司 Image segmentation model generation method, image segmentation method and computer equipment

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US7676081B2 (en) * 2005-06-17 2010-03-09 Microsoft Corporation Image segmentation of foreground from background layers
CN109978893B (en) * 2019-03-26 2023-06-20 腾讯科技(深圳)有限公司 Training method, device, equipment and storage medium of image semantic segmentation network
CN110472593B (en) * 2019-08-20 2021-02-09 重庆紫光华山智安科技有限公司 Training image acquisition method, model training method and related device

Patent Citations (8)

Publication number Priority date Publication date Assignee Title
CN103942794A (en) * 2014-04-16 2014-07-23 南京大学 Image collaborative cutout method based on confidence level
US20170372479A1 (en) * 2016-06-23 2017-12-28 Intel Corporation Segmentation of objects in videos using color and depth information
CN109697689A (en) * 2017-10-23 2019-04-30 北京京东尚科信息技术有限公司 Storage medium, electronic equipment, image synthesizing method and device
CN107808389A (en) * 2017-10-24 2018-03-16 上海交通大学 Unsupervised methods of video segmentation based on deep learning
CN108875900A (en) * 2017-11-02 2018-11-23 北京旷视科技有限公司 Method of video image processing and device, neural network training method, storage medium
CN110060264A (en) * 2019-04-30 2019-07-26 北京市商汤科技开发有限公司 Neural network training method, video frame processing method, apparatus and system
CN110176027A (en) * 2019-05-27 2019-08-27 腾讯科技(深圳)有限公司 Video target tracking method, device, equipment and storage medium
CN111260679A (en) * 2020-01-07 2020-06-09 广州虎牙科技有限公司 Image processing method, image segmentation model training method and related device

Cited By (7)

Publication number Priority date Publication date Assignee Title
CN113610865A (en) * 2021-07-27 2021-11-05 Oppo广东移动通信有限公司 Image processing method, image processing device, electronic equipment and computer readable storage medium
CN113610865B (en) * 2021-07-27 2024-03-29 Oppo广东移动通信有限公司 Image processing method, device, electronic equipment and computer readable storage medium
CN113570689A (en) * 2021-07-28 2021-10-29 杭州网易云音乐科技有限公司 Portrait cartoon method, apparatus, medium and computing device
CN113570689B (en) * 2021-07-28 2024-03-01 杭州网易云音乐科技有限公司 Portrait cartoon method, device, medium and computing equipment
CN115457119A (en) * 2022-09-21 2022-12-09 正泰集团研发中心(上海)有限公司 Bus bar labeling method and device, computer equipment and readable storage medium
CN115457119B (en) * 2022-09-21 2023-10-27 正泰集团研发中心(上海)有限公司 Bus bar labeling method, device, computer equipment and readable storage medium
CN117237397A (en) * 2023-07-13 2023-12-15 天翼爱音乐文化科技有限公司 Portrait segmentation method, system, equipment and storage medium based on feature fusion

Also Published As

Publication number Publication date
CN111260679A (en) 2020-06-09
CN111260679B (en) 2022-02-01

Similar Documents

Publication Publication Date Title
WO2021139625A1 (en) Image processing method, image segmentation model training method and related apparatus
Sindagi et al. Multi-level bottom-top and top-bottom feature fusion for crowd counting
WO2019114405A1 (en) Video recognition and training method and apparatus, electronic device and medium
Shivakumar et al. Dfusenet: Deep fusion of rgb and sparse depth information for image guided dense depth completion
Wang et al. Deep online video stabilization with multi-grid warping transformation learning
US11722727B2 (en) Special effect processing method and apparatus for live broadcasting, and server
CN109859295B (en) Specific cartoon face generation method, terminal device and storage medium
Manen et al. Pathtrack: Fast trajectory annotation with path supervision
CN111783647B (en) Training method of face fusion model, face fusion method, device and equipment
KR20220006657A (en) Remove video background using depth
US9881207B1 (en) Methods and systems for real-time user extraction using deep learning networks
Zhu et al. Cross-modality 3d object detection
WO2022156622A1 (en) Sight correction method and apparatus for face image, device, computer-readable storage medium, and computer program product
WO2022156626A1 (en) Image sight correction method and apparatus, electronic device, computer-readable storage medium, and computer program product
US20210158008A1 (en) UAV Video Aesthetic Quality Evaluation Method Based On Multi-Modal Deep Learning
CN111402399A (en) Face driving and live broadcasting method and device, electronic equipment and storage medium
JP2016110653A (en) Method for dividing and tracking content in video stream
Tang et al. Tafnet: A three-stream adaptive fusion network for rgb-t crowd counting
WO2022120997A1 (en) Distributed slam system and learning method therefor
WO2022218012A1 (en) Feature extraction method and apparatus, device, storage medium, and program product
CN111815595A (en) Image semantic segmentation method, device, equipment and readable storage medium
CN112308770A (en) Portrait conversion model generation method and portrait conversion method
Kim et al. End-to-end lip synchronisation based on pattern classification
CN101945299B (en) Camera-equipment-array based dynamic scene depth restoring method
Shen et al. RGBT tracking based on cooperative low-rank graph model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21738740; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 21738740; Country of ref document: EP; Kind code of ref document: A1)