WO2019080685A1 - Video image segmentation method and apparatus, storage medium, and electronic device - Google Patents

Video image segmentation method and apparatus, storage medium, and electronic device

Info

Publication number
WO2019080685A1
WO2019080685A1 (PCT/CN2018/107388)
Authority
WO
WIPO (PCT)
Prior art keywords
image
frame
segmented
divided
image segmentation
Prior art date
Application number
PCT/CN2018/107388
Other languages
English (en)
French (fr)
Inventor
安山
朱兆琪
陈宇
翁志
Original Assignee
北京京东尚科信息技术有限公司
北京京东世纪贸易有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东尚科信息技术有限公司, 北京京东世纪贸易有限公司
Priority to US16/757,760 priority Critical patent/US11227393B2/en
Publication of WO2019080685A1 publication Critical patent/WO2019080685A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/174Segmentation; Edge detection involving the use of two or more images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/73Deblurring; Sharpening
    • G06T5/75Unsharp masking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Definitions

  • the present disclosure relates to the field of video image segmentation technologies, and in particular, to a video image segmentation method, a video image segmentation device, a computer readable storage medium, and an electronic device.
  • a video image segmentation method including:
  • the image to be segmented is segmented using the adjusted image segmentation model.
  • adjusting the image segmentation model by using a first frame of the image to be segmented, a previous frame of the frame to be segmented, and a mask image of the previous frame includes:
  • determining whether the number of frames to be segmented of the image to be segmented exceeds a preset threshold; and, when it is determined that the number does not exceed the preset threshold, adjusting the image segmentation model by using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame.
  • the video image segmentation method further includes:
  • when it is determined that the number of frames to be segmented of the image to be segmented exceeds the preset threshold, the first frame of the image to be segmented, the preset frame, the previous frame of the frame to be segmented, and the mask image of the previous frame are used to adjust the image segmentation model.
  • the video image segmentation method further includes:
  • extracting the preset frame from the image to be segmented.
  • extracting the preset frame from the image to be segmented includes: multiplying the number of frames to be segmented of the image to be segmented by a preset value to obtain an extracted frame number, and extracting the frame corresponding to the extracted frame number as the preset frame.
  • the preset value is any value within 0.6 to 0.9.
  • a video image segmentation apparatus including:
  • a machine learning module configured to perform machine learning using a historical video image and a mask image of the historical video image to obtain an image segmentation model
  • a model adjustment module configured to adjust the image segmentation model by using a first frame of the image to be segmented, a previous frame of the frame to be segmented, and a mask image of the previous frame;
  • an image segmentation module configured to segment the image to be segmented by using the adjusted image segmentation model.
  • adjusting the image segmentation model by using a first frame of the image to be segmented, a previous frame of the frame to be segmented, and a mask image of the previous frame includes:
  • determining whether the number of frames to be segmented of the image to be segmented exceeds a preset threshold; and, when it is determined that the number does not exceed the preset threshold, adjusting the image segmentation model by using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame.
  • after determining whether the number of frames to be segmented of the image to be segmented exceeds a preset threshold, the video image segmentation apparatus further includes:
  • when it is determined that the number of frames to be segmented exceeds the preset threshold, the first frame of the image to be segmented, the preset frame, the previous frame of the frame to be segmented, and the mask image of the previous frame are used to adjust the image segmentation model.
  • the video image segmentation apparatus further includes:
  • an extraction module configured to extract the preset frame from the image to be segmented.
  • extracting the preset frame from the image to be segmented includes: multiplying the number of frames to be segmented of the image to be segmented by a preset value to obtain an extracted frame number, and extracting the frame corresponding to the extracted frame number as the preset frame.
  • the preset value is any value within 0.6 to 0.9.
  • a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the video image segmentation method according to any one of the above.
  • an electronic device including:
  • a memory for storing executable instructions of the processor
  • the processor is configured to perform the video image segmentation method according to any one of the above by executing the executable instructions.
  • FIG. 1 schematically shows a flowchart of a video image segmentation method.
  • FIG. 2 schematically shows a flow block diagram of a video image segmentation method.
  • FIG. 3 schematically shows a flowchart of a method for adjusting an image segmentation model.
  • FIG. 4 schematically shows a block diagram of a video image segmentation apparatus.
  • FIG. 5 schematically shows an example diagram of an electronic device for implementing the above video image segmentation method.
  • FIG. 6 schematically shows a computer readable storage medium for implementing the above video image segmentation method.
  • Video object segmentation may also be referred to as VOS (Video Object Segmentation), which requires video frames to be extracted from the video frame by frame and the objects within each frame to be segmented.
  • the following two schemes may be included:
  • One is the one-shot video object segmentation (OSVOS) method, which fine-tunes an already-trained model by using the first frame of the video image to be segmented, and then uses the fine-tuned model to generate the image segmentation results of subsequent frames.
  • The other is the on-line adaptive video object segmentation (OnAVOS) method, which uses the result generated for the previous frame and the 0th frame to fine-tune the already-trained model, as a guide for the generation of the next frame.
  • Compared with a segmentation method guided only by the first frame, this method improves the segmentation effect.
  • However, for problems such as object occlusion and object rotation, the segmentation result of the previous frame cannot properly guide the generation of the image segmentation result of the next frame, which leads to worse and worse segmentation results.
  • In one such method, the video object segmentation may include the following steps: first, a segmentation model is established, and then model training is performed with video frames and the corresponding mask images; further, during video segmentation, the 0th frame is needed as the guide frame. The network is first fine-tuned with the 0th frame; in the process of generating subsequent frames, an online adaptive approach is adopted, that is, the result generated for the previous frame and the 0th frame are used to fine-tune the network to guide the generation of the next frame.
  • The above video object segmentation method fine-tunes the network parameters using only the 0th frame and the frame preceding the generated frame; adjacent frames of a multi-frame video are strongly correlated, while later frames differ greatly from the 0th frame.
  • When the object being segmented within a frame is occluded, using only the adjacent frame and the 0th frame may prevent the occluded object from being recovered well in the following frames.
  • a video image segmentation method is first provided in the present exemplary embodiment.
  • the video image segmentation method may include the following steps:
  • Step S110 Performing machine learning using the historical video image and the mask image of the historical video image to obtain an image segmentation model.
  • Step S120 Adjusting the image segmentation model by using a first frame of the image to be segmented, a previous frame of the frame to be divided, and a mask image of the previous frame.
  • Step S130 Segment the image to be segmented by using the adjusted image segmentation model.
  • In the above video image segmentation method, on the one hand, the image segmentation model is adjusted by using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame, and the adjusted image segmentation model is used to segment the multiple frames to be segmented of the image to be segmented. This avoids the problem in the prior art that, because only the 0th frame and the frame preceding the generated frame are used for adjustment, and because adjacent frames of a multi-frame video are strongly correlated while later frames differ greatly from the 0th frame, an occluded object cannot be recovered well in the following frames; the quality of the segmented images is thus improved.
  • On the other hand, adjusting the image segmentation model in this way and segmenting the multiple frames to be segmented with the adjusted model solves the problem that adjusting the image segmentation model by using only the first frame of the image to be segmented cannot predict large changes in subsequent frames, which would lead to fragmented segmentation of the subsequent frames; this further improves the quality of the image segmentation.
  • In step S110, machine learning is performed using the historical video images and the mask images of the historical video images to obtain an image segmentation model. In detail:
  • The image segmentation model 201 is obtained by machine learning (training) using the historical video images 202 and the mask images of the historical video images. Further, original video frames and manually annotated object segmentation masks may be used as the training data of the segmentation model; the network weights used for segmentation are obtained through network training, and these network weights can then be used to predict large changes in subsequent frames, avoiding the fragmentation problem in the segmentation of subsequent frames.
  • The network used for the above training may adopt VGGNet (a convolutional neural network model) and a residual network as its basic structure, may include 38 hidden layers, and may use ImageNet as the pre-training data of the network to obtain richer object feature parameters.
  • In step S120, the image segmentation model is adjusted by using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame. Referring to FIG. 3, the adjustment of the image segmentation model may include steps S1202 to S1206, in which:
  • In step S1202, it is determined whether the number of frames to be segmented of the image to be segmented exceeds a preset threshold.
  • The number of frames to be segmented of the image to be segmented is calculated, and it is determined whether this number exceeds a preset threshold; the preset threshold may be 10 or 20, or another value such as 25, 30, or 40, and this example imposes no particular limitation on this.
  • In step S1204, when it is determined that the number of frames to be segmented of the image to be segmented does not exceed the preset threshold, the image segmentation model is adjusted by using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame.
  • In this case, the first frame of the image to be segmented, the image of the frame preceding the frame to be segmented, and the mask image of that preceding frame may be used directly to adjust the image segmentation model.
  • To adjust the image segmentation model by using the first frame of the image to be segmented, the first frame also needs to be extracted; specifically, the first frame of the image to be segmented may be extracted manually, or other processing software may be used to extract it, and this example imposes no particular limitation on this.
  • In step S1206, when it is determined that the number of frames to be segmented of the image to be segmented exceeds the preset threshold, the image segmentation model is adjusted by using the first frame of the image to be segmented, the preset frame, the previous frame of the frame to be segmented, and the mask image of the previous frame.
  • Referring to FIG. 2, when the number of frames to be segmented of the image to be segmented 203 exceeds the preset threshold (more than 10 frames), the image segmentation model 201 needs to be adjusted by using the first frame image 204 of the image to be segmented, the preset frame image 206, the image 205 of the frame preceding the frame to be segmented (the output frame), and the mask image of the preceding frame.
  • The video image segmentation method further includes: extracting the preset frame from the image to be segmented, where extracting the preset frame image may include: multiplying the number of frames to be segmented of the image to be segmented by a preset value to obtain an extracted frame number, and extracting the frame corresponding to the extracted frame number as the preset frame.
  • The preset value may be any value within 0.6 to 0.9.
  • Let the number of video frames be n.
  • When 10 < n ≤ 15, the image information of the 10th frame is added as a learning frame, on the basis of the first frame of the image to be segmented, the image of the frame preceding the frame to be segmented, and the mask image of the preceding frame, and the image segmentation model is fine-tuned to prevent drift of the object features after occlusion occurs in the video;
  • when 15 < n ≤ 20, the image information of the 15th frame is added as the preset frame on the same basis, and the image segmentation model is fine-tuned to prevent drift of the object features after occlusion occurs in the video;
  • when 20 < n ≤ 25, the image information of the 20th frame is added on the same basis, and the image segmentation model is fine-tuned to prevent drift of the object features after occlusion occurs in the video;
  • when 25 < n ≤ 30, the image information of the 25th frame is added on the same basis, and the image segmentation model is fine-tuned to prevent drift of the object features after occlusion occurs in the video.
  • In step S130, the image to be segmented is segmented by using the adjusted image segmentation model. In detail:
  • When the number of frames to be segmented of the image to be segmented is less than the preset threshold, the image segmentation model is adjusted by using the first frame of the image to be segmented, the image of the frame preceding the frame to be segmented, and the mask image of the preceding frame, and the adjusted image segmentation model is used to segment the image to be segmented; when the number of frames to be segmented is greater than the preset threshold, the image segmentation model is adjusted by using the first frame of the image to be segmented, the preset frame image, the image of the frame preceding the frame to be segmented, and the mask image of the preceding frame, and the adjusted image segmentation model is used to segment the image to be segmented.
  • the present disclosure also provides a video image segmentation apparatus.
  • the video image segmentation apparatus includes a machine learning module 410, a model adjustment module 420, and an image segmentation module 430, in which:
  • the machine learning module 410 can be configured to perform machine learning using the historical video image and the mask image of the historical video image to obtain an image segmentation model.
  • the model adjustment module 420 can be configured to adjust the image segmentation model by using a first frame of the image to be segmented, a previous frame of the frame to be divided, and a mask image of the previous frame.
  • the image segmentation module 430 can be configured to segment the image to be segmented by using the adjusted image segmentation model.
  • adjusting the image segmentation model by using a first frame of the image to be segmented, a previous frame of the frame to be segmented, and a mask image of the previous frame includes:
  • determining whether the number of frames to be segmented of the image to be segmented exceeds a preset threshold; and, when it is determined that the number does not exceed the preset threshold, adjusting the image segmentation model by using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame.
  • after determining whether the number of frames to be segmented of the image to be segmented exceeds a preset threshold, the video image segmentation apparatus further includes:
  • when it is determined that the number of frames to be segmented exceeds the preset threshold, the first frame of the image to be segmented, the preset frame, the previous frame of the frame to be segmented, and the mask image of the previous frame are used to adjust the image segmentation model.
  • the video image segmentation apparatus further includes:
  • an extraction module configured to extract the preset frame from the image to be segmented.
  • extracting the preset frame from the image to be segmented includes:
  • the extracted frame number is obtained by multiplying the number of frames to be segmented of the image to be segmented by a preset value, and the frame corresponding to the extracted frame number is extracted as the preset frame.
  • the preset value is any value within 0.6 to 0.9.
  • the technical solution according to an embodiment of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) or on a network.
  • a number of instructions are included to cause a computing device (which may be a personal computer, server, mobile terminal, or network device, etc.) to perform a method in accordance with an embodiment of the present disclosure.
  • an electronic device capable of implementing the above method is also provided.
  • An electronic device 600 according to such an embodiment of the present disclosure is described below with reference to FIG. 5. The electronic device 600 shown in FIG. 5 is merely an example and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • electronic device 600 is embodied in the form of a general purpose computing device.
  • the components of the electronic device 600 may include, but are not limited to, the at least one processing unit 610, the at least one storage unit 620, and a bus 630 that connects different system components (including the storage unit 620 and the processing unit 610).
  • the storage unit stores program code, which can be executed by the processing unit 610, such that the processing unit 610 performs various exemplary embodiments according to the present disclosure described in the "Exemplary Method" section of the present specification.
  • the processing unit 610 may perform step S110 as shown in FIG. 1: performing machine learning using historical video images and mask images of the historical video images to obtain an image segmentation model; step S120: adjusting the image segmentation model by using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame; and step S130: segmenting the image to be segmented by using the adjusted image segmentation model.
  • the storage unit 620 can include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 6201 and/or a cache storage unit 6202, and can further include a read only storage unit (ROM) 6203.
  • the storage unit 620 can also include a program/utility 6204 having a set of (at least one) program modules 6205; such program modules 6205 include, but are not limited to, an operating system, one or more applications, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment.
  • Bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, the processing unit, or a local bus using any of a variety of bus structures.
  • the electronic device 600 can also communicate with one or more external devices 700 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable the user to interact with the electronic device 600, and/or with any device (e.g., a router or a modem) that enables the electronic device 600 to communicate with one or more other computing devices. This communication can take place via an input/output (I/O) interface 650. Also, the electronic device 600 can communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 660. As shown, the network adapter 660 communicates with the other modules of the electronic device 600 via the bus 630.
  • the technical solution according to an embodiment of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a mobile hard disk, etc.) or on a network.
  • a number of instructions are included to cause a computing device (which may be a personal computer, server, terminal device, or network device, etc.) to perform a method in accordance with an embodiment of the present disclosure.
  • a computer readable storage medium having stored thereon a program product capable of implementing the above method of the present specification.
  • various aspects of the present disclosure may also be embodied in the form of a program product including program code; when the program product runs on a terminal device, the program code causes the terminal device to perform the steps according to various exemplary embodiments of the present disclosure described in the "Exemplary Method" section of this specification.
  • Referring to FIG. 6, a program product 800 for implementing the above method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer.
  • the program product of the present disclosure is not limited thereto, and in this document, the readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device.
  • the program product can employ any combination of one or more readable media.
  • the readable medium can be a readable signal medium or a readable storage medium.
  • the readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • the computer readable signal medium may include a data signal that is propagated in the baseband or as part of a carrier, carrying readable program code. Such propagated data signals can take a variety of forms including, but not limited to, electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • the readable signal medium can also be any readable medium other than a readable storage medium that can transmit, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a readable medium can be transmitted using any suitable medium, including but not limited to wireless, wireline, optical cable, RF, etc., or any suitable combination of the foregoing.
  • Program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • the program code can execute entirely on the user computing device, partially on the user device, as a stand-alone software package, partially on the user computing device and partially on a remote computing device, or entirely on the remote computing device or server.
  • the remote computing device can be connected to the user computing device via any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (for example, through the Internet using an Internet service provider).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to a video image segmentation method and apparatus, belonging to the technical field of video image segmentation. The method includes: performing machine learning using historical video images and mask images of the historical video images to obtain an image segmentation model; adjusting the image segmentation model using a first frame of an image to be segmented, a previous frame of a frame to be segmented, and a mask image of the previous frame; and segmenting the image to be segmented using the adjusted image segmentation model. The method can improve the accuracy of video image segmentation.

Description

Video image segmentation method and apparatus, storage medium, and electronic device
Technical Field
The present disclosure relates to the field of video image segmentation technologies, and in particular, to a video image segmentation method, a video image segmentation apparatus, a computer-readable storage medium, and an electronic device.
Background
With the rapid development of e-commerce, competition among major e-commerce platforms has become increasingly fierce. Therefore, in order to improve their competitiveness and provide users with more comprehensive product information, most e-commerce platforms record videos of products and obtain views of the products from various angles through video segmentation, so that users can obtain more comprehensive product information.
It should be noted that the information disclosed in the Background section above is only intended to enhance understanding of the background of the present disclosure, and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Summary
An object of the present disclosure is to provide a video image segmentation method, a video image segmentation apparatus, a computer-readable storage medium, and an electronic device, thereby overcoming, at least to some extent, one or more problems caused by the limitations and defects of the related art.
According to one aspect of the present disclosure, a video image segmentation method is provided, including:
performing machine learning using historical video images and mask images of the historical video images to obtain an image segmentation model;
adjusting the image segmentation model using a first frame of an image to be segmented, a previous frame of a frame to be segmented, and a mask image of the previous frame; and
segmenting the image to be segmented using the adjusted image segmentation model.
In an exemplary embodiment of the present disclosure, adjusting the image segmentation model using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame includes:
determining whether the number of frames to be segmented of the image to be segmented exceeds a preset threshold; and
when it is determined that the number of frames to be segmented of the image to be segmented does not exceed the preset threshold, adjusting the image segmentation model using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame.
In an exemplary embodiment of the present disclosure, after determining whether the number of frames to be segmented of the image to be segmented exceeds the preset threshold, the video image segmentation method further includes:
when it is determined that the number of frames to be segmented of the image to be segmented exceeds the preset threshold, adjusting the image segmentation model using the first frame of the image to be segmented, a preset frame, the previous frame of the frame to be segmented, and the mask image of the previous frame.
In an exemplary embodiment of the present disclosure, the video image segmentation method further includes:
extracting the preset frame from the image to be segmented.
In an exemplary embodiment of the present disclosure, extracting the preset frame from the image to be segmented includes:
multiplying the number of frames to be segmented of the image to be segmented by a preset value to obtain an extracted frame number; and
extracting the frame corresponding to the extracted frame number as the preset frame.
In an exemplary embodiment of the present disclosure, the preset value is any value within 0.6 to 0.9.
According to one aspect of the present disclosure, a video image segmentation apparatus is provided, including:
a machine learning module configured to perform machine learning using historical video images and mask images of the historical video images to obtain an image segmentation model;
a model adjustment module configured to adjust the image segmentation model using a first frame of an image to be segmented, a previous frame of a frame to be segmented, and a mask image of the previous frame; and
an image segmentation module configured to segment the image to be segmented using the adjusted image segmentation model.
In an exemplary embodiment of the present disclosure, adjusting the image segmentation model using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame includes:
determining whether the number of frames to be segmented of the image to be segmented exceeds a preset threshold; and
when it is determined that the number of frames to be segmented of the image to be segmented does not exceed the preset threshold, adjusting the image segmentation model using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame.
In an exemplary embodiment of the present disclosure, after determining whether the number of frames to be segmented of the image to be segmented exceeds the preset threshold, the video image segmentation apparatus is further configured such that:
when it is determined that the number of frames to be segmented of the image to be segmented exceeds the preset threshold, the image segmentation model is adjusted using the first frame of the image to be segmented, a preset frame, the previous frame of the frame to be segmented, and the mask image of the previous frame.
In an exemplary embodiment of the present disclosure, the video image segmentation apparatus further includes:
an extraction module configured to extract the preset frame from the image to be segmented.
In an exemplary embodiment of the present disclosure, extracting the preset frame from the image to be segmented includes:
multiplying the number of frames to be segmented of the image to be segmented by a preset value to obtain an extracted frame number; and
extracting the frame corresponding to the extracted frame number as the preset frame.
In an exemplary embodiment of the present disclosure, the preset value is any value within 0.6 to 0.9.
According to one aspect of the present disclosure, a computer-readable storage medium is provided, having a computer program stored thereon, where the computer program, when executed by a processor, implements the video image segmentation method according to any one of the above.
According to one aspect of the present disclosure, an electronic device is provided, including:
a processor; and
a memory configured to store executable instructions of the processor,
where the processor is configured to perform the video image segmentation method according to any one of the above by executing the executable instructions.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings herein are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure, and together with the specification serve to explain the principles of the present disclosure. Obviously, the drawings in the following description are only some embodiments of the present disclosure, and a person of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
FIG. 1 schematically shows a flowchart of a video image segmentation method.
FIG. 2 schematically shows a flow block diagram of a video image segmentation method.
FIG. 3 schematically shows a flowchart of a method for adjusting an image segmentation model.
FIG. 4 schematically shows a block diagram of a video image segmentation apparatus.
FIG. 5 schematically shows an example diagram of an electronic device for implementing the above video image segmentation method.
FIG. 6 schematically shows a computer-readable storage medium for implementing the above video image segmentation method.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in many forms and should not be construed as being limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be more thorough and complete, and the concepts of the example embodiments will be fully conveyed to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a full understanding of the embodiments of the present disclosure. However, those skilled in the art will realize that the technical solutions of the present disclosure may be practiced while omitting one or more of the specific details, or that other methods, components, devices, steps, and the like may be employed. In other cases, well-known technical solutions are not shown or described in detail to avoid obscuring aspects of the present disclosure.
In addition, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and repeated description thereof will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
Video object segmentation, also referred to as VOS (Video Object Segmentation), requires video frames to be extracted from a video frame by frame and the objects within each frame to be segmented. Existing video image segmentation methods may include the following two schemes:
One is the one-shot video object segmentation (OSVOS, One-shot Video Object Segmentation) method, which fine-tunes an already-trained model using the first frame of the video image to be segmented, and then uses the fine-tuned model to generate the image segmentation results of subsequent frames. However, this method fine-tunes the model using only the parameters learned from the first frame, and cannot predict large changes in subsequent frames, which leads to fragmented segmentation of the subsequent frames.
The other is the online adaptive video object segmentation (OnAVOS, On-line Adaptive Video Object Segmentation) method, which fine-tunes the already-trained model using the result generated for the previous frame together with the 0th frame, to guide the generation of the next frame. Compared with segmentation methods guided only by the first frame, this method improves the segmentation effect; however, in cases such as object occlusion or object rotation, the segmentation result of the previous frame cannot properly guide the generation of the segmentation result of the next frame, so the segmentation results become worse and worse.
In one online adaptive video object segmentation method (OnAVOS), the method may include the following steps: first, a segmentation model is established, and then the model is trained using video frames and the corresponding mask images; further, during video segmentation, the 0th frame is needed as the guide frame. The network is first fine-tuned with the 0th frame, and in the process of generating subsequent frames, an online adaptive approach is adopted, that is, the network is fine-tuned with the result generated for the previous frame and the 0th frame to guide the generation of the next frame.
However, the above video object segmentation method fine-tunes the network parameters using only the 0th frame and the frame preceding the frame being generated. In a multi-frame video, two adjacent frames are strongly correlated, while later frames differ greatly from the 0th frame; when the object being segmented within a frame is occluded, using only the adjacent frame and the 0th frame may prevent the occluded object from being recovered well in the following frames.
A video image segmentation method is first provided in this example embodiment. Referring to FIG. 1, the video image segmentation method may include the following steps:
Step S110. Perform machine learning using historical video images and mask images of the historical video images to obtain an image segmentation model.
Step S120. Adjust the image segmentation model using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame.
Step S130. Segment the image to be segmented using the adjusted image segmentation model.
In the above video image segmentation method, on the one hand, by adjusting the image segmentation model using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame, and then segmenting the multiple frames to be segmented of the image to be segmented with the adjusted image segmentation model, the method avoids the problem in the prior art that, because the adjustment uses only the 0th frame and the frame preceding the generated frame, and because adjacent frames of a multi-frame video are strongly correlated while later frames differ greatly from the 0th frame, an occluded object within a frame cannot be recovered well in the following frames; this improves the quality of the segmented images so that users can view clearer images, improving the user experience. On the other hand, by adjusting the image segmentation model in this way and then segmenting the multiple frames to be segmented with the adjusted model, the method solves the problem that adjusting the image segmentation model using only the first frame of the image to be segmented cannot predict large changes in subsequent frames and thus leads to fragmented segmentation of subsequent frames, which further improves the quality of the segmented images.
The steps of the above video image segmentation method in this example embodiment will be explained and described in detail below.
In step S110, machine learning is performed using historical video images and mask images of the historical video images to obtain an image segmentation model. In detail:
Referring to FIG. 2, machine learning (training) is performed using historical video images 202 and the mask images of the historical video images to obtain an image segmentation model 201. Further, original video frames and manually annotated object segmentation masks may be used as the training data of the segmentation model, and the network weights used for segmentation are obtained through network training; these network weights can then be used to predict large changes in subsequent frames, avoiding the fragmentation problem in the segmentation of subsequent frames. It should be added here that the network used for the above training may adopt VGGNet (a convolutional neural network model) and a residual network as the basic structure of the network, may include 38 hidden layers, and may use ImageNet as the pre-training data of the network to obtain richer object feature parameters.
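To make the training step concrete, the following is a minimal PyTorch-style sketch of such a model: an ImageNet-pretrained VGG backbone with a small per-pixel decoder, trained on pairs of historical video frames and their annotated mask images. It is an illustration under stated assumptions rather than the patented implementation; the names SegmentationNet and train_segmentation_model, the single-convolution decoder, and the hyperparameters are all illustrative choices, and the residual connections and 38 hidden layers mentioned above are not reproduced here.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision import models

    class SegmentationNet(nn.Module):
        """Illustrative binary-segmentation network: VGG16 features plus a 1x1 decoder."""
        def __init__(self):
            super().__init__()
            # ImageNet-pretrained VGG16 convolutional features as the backbone
            self.backbone = models.vgg16(pretrained=True).features
            self.head = nn.Conv2d(512, 1, kernel_size=1)  # per-pixel foreground logit

        def forward(self, x):
            h, w = x.shape[-2:]
            feats = self.backbone(x)          # (N, 512, h/32, w/32)
            logits = self.head(feats)         # (N, 1, h/32, w/32)
            # upsample the logits back to the input resolution
            return F.interpolate(logits, size=(h, w), mode="bilinear", align_corners=False)

    def train_segmentation_model(frames, masks, epochs=10, lr=1e-4):
        """frames: (N, 3, H, W) float tensor; masks: (N, 1, H, W) float tensor in {0, 1}."""
        model = SegmentationNet()
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = nn.BCEWithLogitsLoss()
        model.train()
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = criterion(model(frames), masks)
            loss.backward()
            optimizer.step()
        return model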
在步骤S120中,利用待分割图像的第一帧、待分割帧的前一帧以及所述前一帧的掩模图像对所述图像分割模型进行调整。参考图3所示,对图像分割模型进行调整可以包括步骤S1202-步骤S1206。其中:
在步骤S1202中,判断所述待分割图像的待分割帧数是否超过预设阈值。详细而言:
首先,计算待分割图像的待分割帧数并判断该待分割图像的待分割帧数是否超过预设阈值;其中,该预设阈值可以为10或者20,也可以为其他值,例如可以是25或者30以及40等等,本示例对此不做特殊限制。
在步骤S1204中,在判断所述待分割图像的待分割帧数未超过预设阈值时,利用待分割图像的第一帧、待分割帧的前一帧以及所述前一帧的掩模图像对所述图像分割模型进行调整。详细而言:
当待分割图像的待分割帧数未超过预设阈值(小于等于10帧时),可以判断该待分割图像的待分割帧数未超过上述预设阈值,可以直接利用待分割图像的待分割图像第一帧、待分割帧的前一帧图像以前一帧图像的掩模图像对图像分割模型进行调整。此处需要补充说明的是,为了可以利用待分割图像第一帧对图像分割模型进行调整,还需要对 待分割图像第一帧进行提取,具体的可以包括:可以利用人工进行待分割图像第一帧的提取,也可以利用其他的处理软件对待分割图像第一帧进行提取,本示例对此不做特殊限制。
在步骤S1206中,在判断所述待分割图像的待分割帧数超过预设阈值时,利用待分割图像的第一帧、预设帧、待分割帧的前一帧以及所述前一帧的掩模图像对所述图像分割模型进行调整。详细而言:
参考图2所示,当待分割图像203的待分割帧数超过预设阈值(大于10帧时),可以判断该待分割图像的待分割帧数超过上述预设阈值,需要利用待分割图像第一帧图像204、预设帧图像206、待分割帧的前一帧(输出帧)205图像以及前一帧的掩模图像对图像分割模型201进行调整。通过增加该预设帧图像对图像分割模型进行调整,在保证物体基本特征的前提下,保证了视频中运动物体在被遮挡时再次出现时也能被很好的分割,并防止了物体特征的偏移现象,进一步的提升了图像分割的精度。
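The branching just described (first frame plus previous frame for short videos; additionally the preset frame for long videos) can be summarized by the small helpers below. This is a hedged sketch: the function names, fine-tuning step count, and learning rate are assumptions, the threshold of 10 frames is the example value used above, and adjust_model simply reuses the kind of training loop shown in the earlier sketch on the selected frame/mask pairs.

    import torch
    import torch.nn as nn

    def select_adjustment_frames(num_frames, current_index, preset_index, threshold=10):
        """Indices of the frames used to adjust the model before segmenting frame `current_index`."""
        selected = [0]                           # the first (annotated) frame is always used
        if current_index - 1 > 0:
            selected.append(current_index - 1)   # previous, already-segmented frame
        if num_frames > threshold and preset_index not in selected:
            selected.append(preset_index)        # long video: also use the preset frame
        return selected

    def adjust_model(model, frame_mask_pairs, steps=50, lr=1e-5):
        """Fine-tune (adjust) the pretrained segmentation model on the selected frame/mask pairs."""
        optimizer = torch.optim.Adam(model.parameters(), lr=lr)
        criterion = nn.BCEWithLogitsLoss()
        model.train()
        for _ in range(steps):
            for frame, mask in frame_mask_pairs:          # frame: (3, H, W), mask: (1, H, W)
                optimizer.zero_grad()
                loss = criterion(model(frame.unsqueeze(0)), mask.unsqueeze(0))
                loss.backward()
                optimizer.step()
        return model

    # Example of the selection rule:
    # select_adjustment_frames(8, 5, 6)   -> [0, 4]       (short video: no preset frame)
    # select_adjustment_frames(28, 5, 22) -> [0, 4, 22]   (long video: preset frame added)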
Further, the above video image segmentation method further includes: extracting the preset frame from the image to be segmented, where extracting the preset frame image may include: multiplying the number of frames to be segmented of the image to be segmented by a preset value to obtain an extracted frame number; and extracting the frame corresponding to the extracted frame number as the preset frame. In detail:
First, the number of frames to be segmented of the image to be segmented is multiplied by a preset value to obtain an extracted frame number, and then the frame corresponding to the extracted frame number is extracted as the above preset frame image, where the preset value may be any value within 0.6 to 0.9. For example:
Let the number of video frames be n. When 10 < n ≤ 15, the image information of the 10th frame is added as a learning frame, on the basis of using the first frame of the image to be segmented, the image of the frame preceding the frame to be segmented, and the mask image of the preceding frame, and the image segmentation model is fine-tuned to prevent drift of the object features after occlusion occurs in the video;
when 15 < n ≤ 20, the image information of the 15th frame is added as the preset frame on the same basis, and the image segmentation model is fine-tuned to prevent drift of the object features after occlusion occurs in the video;
when 20 < n ≤ 25, the image information of the 20th frame is added on the same basis, and the image segmentation model is fine-tuned to prevent drift of the object features after occlusion occurs in the video;
when 25 < n ≤ 30, the image information of the 25th frame is added on the same basis, and the image segmentation model is fine-tuned to prevent drift of the object features after occlusion occurs in the video.
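The extraction rule itself ("number of frames to be segmented multiplied by a preset value within 0.6 to 0.9") is a one-line computation; the sketch below is one possible reading of it, with 0.8 used only as an arbitrary example of the preset value. The fixed frames 10, 15, 20, and 25 in the bands above are the text's own worked example and need not coincide exactly with a single multiplier.

    def preset_frame_index(num_frames, preset_value=0.8):
        """Extracted frame number = number of frames to be segmented x preset value,
        with the preset value chosen anywhere in [0.6, 0.9]."""
        assert 0.6 <= preset_value <= 0.9
        return int(num_frames * preset_value)

    # e.g. preset_frame_index(28) == 22, preset_frame_index(13, 0.75) == 9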
In step S130, the image to be segmented is segmented using the adjusted image segmentation model. In detail:
Referring to FIG. 2, when the number of frames to be segmented of the image to be segmented is less than the preset threshold, the image segmentation model is adjusted using the first frame of the image to be segmented, the image of the frame preceding the frame to be segmented, and the mask image of the preceding frame, and the image to be segmented is segmented using the adjusted image segmentation model; when the number of frames to be segmented of the image to be segmented is greater than the preset threshold, the image segmentation model is adjusted using the first frame of the image to be segmented, the preset frame image, the image of the frame preceding the frame to be segmented, and the mask image of the preceding frame, and the image to be segmented is segmented using the adjusted image segmentation model.
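Read as a whole, the flow of FIG. 2 amounts to the per-frame loop sketched below. It reuses the helper functions from the earlier sketches (train_segmentation_model for the base model, preset_frame_index, select_adjustment_frames, adjust_model) and introduces a hypothetical predict_mask wrapper that binarizes the model output. How a mask is obtained for the preset frame before that frame has itself been segmented is not spelled out in the text, so this sketch simply skips frames whose masks are not yet available.

    import copy
    import torch

    def predict_mask(model, frame, threshold=0.5):
        """Hypothetical helper: run the adjusted model on one frame and binarize the output."""
        model.eval()
        with torch.no_grad():
            probs = torch.sigmoid(model(frame.unsqueeze(0)))[0]
        return (probs > threshold).float()

    def segment_video(base_model, frames, first_frame_mask, threshold=10, preset_value=0.8):
        """Segment every frame of a video, re-adjusting the model before each frame."""
        num_frames = len(frames)
        preset_index = preset_frame_index(num_frames, preset_value)
        masks = {0: first_frame_mask}            # frame 0 comes with its annotated mask
        for current in range(1, num_frames):
            indices = select_adjustment_frames(num_frames, current, preset_index, threshold)
            pairs = [(frames[i], masks[i]) for i in indices if i in masks]
            model = adjust_model(copy.deepcopy(base_model), pairs)   # fine-tune a fresh copy
            masks[current] = predict_mask(model, frames[current])    # segment the current frame
        return masks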
The present disclosure further provides a video image segmentation apparatus. Referring to FIG. 4, the video image segmentation apparatus includes a machine learning module 410, a model adjustment module 420, and an image segmentation module 430, in which:
the machine learning module 410 may be configured to perform machine learning using historical video images and mask images of the historical video images to obtain an image segmentation model;
the model adjustment module 420 may be configured to adjust the image segmentation model using a first frame of an image to be segmented, a previous frame of a frame to be segmented, and a mask image of the previous frame; and
the image segmentation module 430 may be configured to segment the image to be segmented using the adjusted image segmentation model.
In an example embodiment of the present disclosure, adjusting the image segmentation model using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame includes:
determining whether the number of frames to be segmented of the image to be segmented exceeds a preset threshold; and
when it is determined that the number of frames to be segmented of the image to be segmented does not exceed the preset threshold, adjusting the image segmentation model using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame.
In an example embodiment of the present disclosure, after determining whether the number of frames to be segmented of the image to be segmented exceeds the preset threshold, the video image segmentation apparatus is further configured such that:
when it is determined that the number of frames to be segmented of the image to be segmented exceeds the preset threshold, the image segmentation model is adjusted using the first frame of the image to be segmented, a preset frame, the previous frame of the frame to be segmented, and the mask image of the previous frame.
In an example embodiment of the present disclosure, the video image segmentation apparatus further includes:
an extraction module configured to extract the preset frame from the image to be segmented.
In an example embodiment of the present disclosure, extracting the preset frame from the image to be segmented includes:
multiplying the number of frames to be segmented of the image to be segmented by a preset value to obtain an extracted frame number; and
extracting the frame corresponding to the extracted frame number as the preset frame.
In an example embodiment of the present disclosure, the preset value is any value within 0.6 to 0.9.
The specific details of each module in the above video image segmentation apparatus have been described in detail in the corresponding video image segmentation method, and therefore will not be repeated here.
It should be noted that although several modules or units of the device for action execution are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by multiple modules or units.
In addition, although the steps of the method of the present disclosure are described in the drawings in a particular order, this does not require or imply that these steps must be performed in that particular order, or that all of the illustrated steps must be performed to achieve the desired result. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be broken down into multiple steps for execution, and so on.
Through the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, or the like) or on a network, and includes several instructions to cause a computing device (which may be a personal computer, a server, a mobile terminal, a network device, or the like) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will appreciate that various aspects of the present disclosure may be implemented as a system, a method, or a program product. Therefore, various aspects of the present disclosure may be embodied in the following forms: a complete hardware implementation, a complete software implementation (including firmware, microcode, and the like), or an implementation combining hardware and software, which may be collectively referred to herein as a "circuit", "module", or "system".
An electronic device 600 according to this embodiment of the present disclosure is described below with reference to FIG. 5. The electronic device 600 shown in FIG. 5 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 5, the electronic device 600 is embodied in the form of a general-purpose computing device. The components of the electronic device 600 may include, but are not limited to, the above-mentioned at least one processing unit 610, the above-mentioned at least one storage unit 620, and a bus 630 connecting different system components (including the storage unit 620 and the processing unit 610).
The storage unit stores program code, which can be executed by the processing unit 610, so that the processing unit 610 performs the steps according to various exemplary embodiments of the present disclosure described in the "Exemplary Method" section of this specification. For example, the processing unit 610 may perform step S110 shown in FIG. 1: performing machine learning using historical video images and mask images of the historical video images to obtain an image segmentation model; step S120: adjusting the image segmentation model using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame; and step S130: segmenting the image to be segmented using the adjusted image segmentation model.
The storage unit 620 may include a readable medium in the form of a volatile storage unit, such as a random access memory (RAM) 6201 and/or a cache storage unit 6202, and may further include a read-only memory (ROM) 6203.
The storage unit 620 may also include a program/utility 6204 having a set of (at least one) program modules 6205. Such program modules 6205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment.
The bus 630 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, the processing unit, or a local bus using any of a variety of bus structures.
The electronic device 600 may also communicate with one or more external devices 700 (such as a keyboard, a pointing device, a Bluetooth device, and the like), may also communicate with one or more devices that enable a user to interact with the electronic device 600, and/or may communicate with any device (such as a router or a modem) that enables the electronic device 600 to communicate with one or more other computing devices. Such communication can take place via an input/output (I/O) interface 650. In addition, the electronic device 600 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 660. As shown in the figure, the network adapter 660 communicates with the other modules of the electronic device 600 via the bus 630. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the electronic device 600, including, but not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
Through the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solutions according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, or the like) or on a network, and includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, or the like) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium is also provided, on which a program product capable of implementing the above method of this specification is stored. In some possible embodiments, various aspects of the present disclosure may also be implemented in the form of a program product including program code; when the program product runs on a terminal device, the program code is used to cause the terminal device to perform the steps according to various exemplary embodiments of the present disclosure described in the "Exemplary Method" section of this specification.
Referring to FIG. 6, a program product 800 for implementing the above method according to an embodiment of the present disclosure is described, which may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present disclosure is not limited thereto; in this document, a readable storage medium may be any tangible medium containing or storing a program that can be used by or in combination with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electric, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination thereof.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries readable program code. Such a propagated data signal may take a variety of forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program used by or in combination with an instruction execution system, apparatus, or device.
The program code contained on the readable medium may be transmitted by any suitable medium, including, but not limited to, wireless, wired, optical cable, RF, and the like, or any suitable combination thereof.
Program code for performing the operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user computing device, partly on the user device, as a stand-alone software package, partly on the user computing device and partly on a remote computing device, or entirely on the remote computing device or server. In cases involving a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
In addition, the above drawings are merely schematic illustrations of the processes included in the methods according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It is easy to understand that the processes shown in the above drawings do not indicate or limit the chronological order of these processes. In addition, it is also easy to understand that these processes may be performed, for example, synchronously or asynchronously in multiple modules.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow the general principles of the present disclosure and include common general knowledge or customary technical means in the technical field not disclosed in the present disclosure. The specification and embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are indicated by the claims.

Claims (14)

  1. A video image segmentation method, comprising:
    performing machine learning using historical video images and mask images of the historical video images to obtain an image segmentation model;
    adjusting the image segmentation model using a first frame of an image to be segmented, a previous frame of a frame to be segmented, and a mask image of the previous frame; and
    segmenting the image to be segmented using the adjusted image segmentation model.
  2. The video image segmentation method according to claim 1, wherein adjusting the image segmentation model using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame comprises:
    determining whether the number of frames to be segmented of the image to be segmented exceeds a preset threshold; and
    when it is determined that the number of frames to be segmented of the image to be segmented does not exceed the preset threshold, adjusting the image segmentation model using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame.
  3. The video image segmentation method according to claim 2, wherein after determining whether the number of frames to be segmented of the image to be segmented exceeds the preset threshold, the video image segmentation method further comprises:
    when it is determined that the number of frames to be segmented of the image to be segmented exceeds the preset threshold, adjusting the image segmentation model using the first frame of the image to be segmented, a preset frame, the previous frame of the frame to be segmented, and the mask image of the previous frame.
  4. The video image segmentation method according to claim 3, wherein the video image segmentation method further comprises:
    extracting the preset frame from the image to be segmented.
  5. The video image segmentation method according to claim 4, wherein extracting the preset frame from the image to be segmented comprises:
    multiplying the number of frames to be segmented of the image to be segmented by a preset value to obtain an extracted frame number; and
    extracting the frame corresponding to the extracted frame number as the preset frame.
  6. The video image segmentation method according to claim 5, wherein the preset value is any value within 0.6 to 0.9.
  7. A video image segmentation apparatus, comprising:
    a machine learning module configured to perform machine learning using historical video images and mask images of the historical video images to obtain an image segmentation model;
    a model adjustment module configured to adjust the image segmentation model using a first frame of an image to be segmented, a previous frame of a frame to be segmented, and a mask image of the previous frame; and
    an image segmentation module configured to segment the image to be segmented using the adjusted image segmentation model.
  8. The video image segmentation apparatus according to claim 7, wherein adjusting the image segmentation model using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame comprises:
    determining whether the number of frames to be segmented of the image to be segmented exceeds a preset threshold; and
    when it is determined that the number of frames to be segmented of the image to be segmented does not exceed the preset threshold, adjusting the image segmentation model using the first frame of the image to be segmented, the previous frame of the frame to be segmented, and the mask image of the previous frame.
  9. The video image segmentation apparatus according to claim 8, wherein after determining whether the number of frames to be segmented of the image to be segmented exceeds the preset threshold, the video image segmentation apparatus is further configured such that:
    when it is determined that the number of frames to be segmented of the image to be segmented exceeds the preset threshold, the image segmentation model is adjusted using the first frame of the image to be segmented, a preset frame, the previous frame of the frame to be segmented, and the mask image of the previous frame.
  10. The video image segmentation apparatus according to claim 9, wherein the video image segmentation apparatus further comprises:
    an extraction module configured to extract the preset frame from the image to be segmented.
  11. The video image segmentation apparatus according to claim 10, wherein extracting the preset frame from the image to be segmented comprises:
    multiplying the number of frames to be segmented of the image to be segmented by a preset value to obtain an extracted frame number; and
    extracting the frame corresponding to the extracted frame number as the preset frame.
  12. The video image segmentation apparatus according to claim 10, wherein the preset value is any value within 0.6 to 0.9.
  13. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the video image segmentation method according to any one of claims 1 to 6.
  14. An electronic device, comprising:
    a processor; and
    a memory configured to store executable instructions of the processor,
    wherein the processor is configured to perform the video image segmentation method according to any one of claims 1 to 6 by executing the executable instructions.
PCT/CN2018/107388 2017-10-24 2018-09-25 Video image segmentation method and apparatus, storage medium, and electronic device WO2019080685A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/757,760 US11227393B2 (en) 2017-10-24 2018-09-25 Video image segmentation method and apparatus, storage medium and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711003830.XA CN109697724B (zh) 2017-10-24 Video image segmentation method and apparatus, storage medium, and electronic device
CN201711003830.X 2017-10-24

Publications (1)

Publication Number Publication Date
WO2019080685A1 true WO2019080685A1 (zh) 2019-05-02

Family

ID=66228169

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/107388 WO2019080685A1 (zh) 2017-10-24 2018-09-25 Video image segmentation method and apparatus, storage medium, and electronic device

Country Status (3)

Country Link
US (1) US11227393B2 (zh)
CN (1) CN109697724B (zh)
WO (1) WO2019080685A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340813A (zh) * 2020-02-25 2020-06-26 北京字节跳动网络技术有限公司 Image instance segmentation method and apparatus, electronic device, and storage medium
CN112862005A (zh) * 2021-03-19 2021-05-28 北京百度网讯科技有限公司 Video classification method and apparatus, electronic device, and storage medium
CN113066092A (zh) * 2021-03-30 2021-07-02 联想(北京)有限公司 Video object segmentation method and apparatus, and computer device

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110782469A (zh) * 2019-10-25 2020-02-11 北京达佳互联信息技术有限公司 Video frame image segmentation method and apparatus, electronic device, and storage medium
CN110992367B (zh) * 2019-10-31 2024-02-02 北京交通大学 Method for semantic segmentation of images containing occluded regions
CN112884794B (zh) * 2019-11-29 2022-12-02 北京航空航天大学 Image generation method and apparatus, electronic device, and computer-readable medium
CN111223114B (zh) * 2020-01-09 2020-10-30 北京达佳互联信息技术有限公司 Image region segmentation method and apparatus, and electronic device
CN112906551A (zh) * 2021-02-09 2021-06-04 北京有竹居网络技术有限公司 Video processing method and apparatus, storage medium, and electronic device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102833492A (zh) * 2012-08-01 2012-12-19 天津大学 Video scene segmentation method based on color similarity
CN103325112A (zh) * 2013-06-07 2013-09-25 中国民航大学 Fast moving object detection method for dynamic scenes
CN105741269A (zh) * 2016-01-25 2016-07-06 中国科学院深圳先进技术研究院 Video segmentation method and apparatus
EP3128485A1 (en) * 2015-08-05 2017-02-08 Thomson Licensing Method and apparatus for hierarchical motion estimation using dfd-based image segmentation
US20170161905A1 (en) * 2015-12-07 2017-06-08 Avigilon Analytics Corporation System and method for background and foreground segmentation
CN107248162A (zh) * 2017-05-18 2017-10-13 杭州全景医学影像诊断有限公司 Method for obtaining an acute cerebral ischemia image segmentation model and method for acute cerebral ischemia image segmentation

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9445713B2 (en) * 2013-09-05 2016-09-20 Cellscope, Inc. Apparatuses and methods for mobile imaging and analysis
US9584814B2 (en) * 2014-05-15 2017-02-28 Intel Corporation Content adaptive background foreground segmentation for video coding
CN105205834A (zh) * 2015-07-09 2015-12-30 湖南工业大学 Object detection and extraction method based on a Gaussian mixture and shadow detection model
CN105654508B (zh) * 2015-12-24 2018-06-01 武汉大学 Method and system for tracking moving objects in surveillance video based on adaptive background segmentation
EP3223237B1 (en) * 2016-03-22 2020-05-27 Tata Consultancy Services Limited Systems and methods for detecting and tracking a marker
CN106295716A (zh) * 2016-08-23 2017-01-04 广东工业大学 Traffic moving object classification method and apparatus based on video information

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102833492A (zh) * 2012-08-01 2012-12-19 天津大学 Video scene segmentation method based on color similarity
CN103325112A (zh) * 2013-06-07 2013-09-25 中国民航大学 Fast moving object detection method for dynamic scenes
EP3128485A1 (en) * 2015-08-05 2017-02-08 Thomson Licensing Method and apparatus for hierarchical motion estimation using dfd-based image segmentation
US20170161905A1 (en) * 2015-12-07 2017-06-08 Avigilon Analytics Corporation System and method for background and foreground segmentation
CN105741269A (zh) * 2016-01-25 2016-07-06 中国科学院深圳先进技术研究院 Video segmentation method and apparatus
CN107248162A (zh) * 2017-05-18 2017-10-13 杭州全景医学影像诊断有限公司 Method for obtaining an acute cerebral ischemia image segmentation model and method for acute cerebral ischemia image segmentation

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111340813A (zh) * 2020-02-25 2020-06-26 北京字节跳动网络技术有限公司 Image instance segmentation method and apparatus, electronic device, and storage medium
CN111340813B (zh) * 2020-02-25 2023-09-01 北京字节跳动网络技术有限公司 Image instance segmentation method and apparatus, electronic device, and storage medium
CN112862005A (zh) * 2021-03-19 2021-05-28 北京百度网讯科技有限公司 Video classification method and apparatus, electronic device, and storage medium
CN112862005B (zh) * 2021-03-19 2023-08-01 北京百度网讯科技有限公司 Video classification method and apparatus, electronic device, and storage medium
CN113066092A (zh) * 2021-03-30 2021-07-02 联想(北京)有限公司 Video object segmentation method and apparatus, and computer device

Also Published As

Publication number Publication date
US11227393B2 (en) 2022-01-18
CN109697724B (zh) 2021-02-26
CN109697724A (zh) 2019-04-30
US20200320712A1 (en) 2020-10-08

Similar Documents

Publication Publication Date Title
WO2019080685A1 (zh) Video image segmentation method and apparatus, storage medium, and electronic device
US11831566B2 (en) Method and apparatus for transmitting scene image of virtual scene, computer device, and computer-readable storage medium
WO2021004232A1 (zh) Machine translation method and apparatus, electronic device, and storage medium
US20180253648A1 (en) Connectionist temporal classification using segmented labeled sequence data
US11521038B2 (en) Electronic apparatus and control method thereof
US10810993B2 (en) Sample-efficient adaptive text-to-speech
US9215539B2 (en) Sound data identification
US11748389B1 (en) Delegated decision tree evaluation
CN107281753B (zh) Scene sound effect reverberation control method and apparatus, storage medium, and electronic device
CN107369453B (zh) Method and apparatus for decoding a speech/audio code stream
WO2022001027A1 (zh) Method and apparatus for adaptive screen-casting display in online teaching
US20160232914A1 (en) Sound Enhancement through Deverberation
US20210256226A1 (en) Interactive machine translation method, electronic device, and computer-readable storage medium
US20160154727A1 (en) System, method, and computer program to improve the productivity of unit testing
US10832129B2 (en) Transfer of an acoustic knowledge to a neural network
CN111862987B (zh) Speech recognition method and apparatus
CN111327926A (zh) Video frame interpolation method and apparatus, electronic device, and storage medium
WO2018086427A1 (zh) Method and apparatus for controlling a smart device
CN107278315B (zh) Fast adaptive estimation of motion blur for coherent rendering
WO2020131594A1 (en) Combined forward and backward extrapolation of lost network data
CN107924398A (zh) System and method for providing a comment-centric news reader
CN106663123A (zh) Comment-centric news reader
US10152507B2 (en) Finding of a target document in a spoken language processing
CN111970560A (zh) Video acquisition method and apparatus, electronic device, and storage medium
CN114926322A (zh) Image generation method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18871413

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 19.08.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18871413

Country of ref document: EP

Kind code of ref document: A1