CN105719297A - Object cutting method and device based on video

Object cutting method and device based on video

Info

Publication number
CN105719297A
CN105719297A
Authority
CN
China
Prior art keywords
cutting
video
image
pixel
frame
Prior art date
Legal status
Pending
Application number
CN201610041711.2A
Other languages
Chinese (zh)
Inventor
陈世峰 (Chen Shifeng)
Current Assignee
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN201610041711.2A
Publication of CN105719297A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 - Image acquisition modality
    • G06T 2207/10016 - Video; Image sequence
    • G06T 2207/20 - Special algorithmic details
    • G06T 2207/20081 - Training; Learning

Landscapes

  • Image Analysis (AREA)

Abstract

The invention is applicable to the technical field of video processing and provides a video-based object cutting method and device. The method comprises the steps of: extracting one frame of the video and cutting a designated object out of that image through a graph cutting algorithm; performing feature learning on the cut object to obtain statistical characteristics of the object region, the non-object region and the boundary of the object; and, based on the obtained statistical characteristics of the object, cutting the object in the other images of the video through a conditional random field model. In the method and device, the statistical characteristics of the cut object are learned from the cutting result of the first image, and the cutting of the object in the other frames of the video clip is then achieved through the conditional random field model, so that automatic cutting of any object in any video can be completed. Object cutting is therefore not limited by conditions such as a stationary background, a stationary camera, a moving foreground or a known background, and the processing capability of the object cutting algorithm is improved.

Description

Object cutting method and device based on video
Technical Field
The invention belongs to the technical field of video processing, and particularly relates to a video-based object cutting method and device.
Background
The object segmentation technology in video plays an extremely important role in many application fields of computer vision, including video surveillance, video editing, video retrieval and the like. In order to improve the precision of object cutting, most algorithms in the prior art impose assumptions on the video data, such as the assumption that the video background is still or the assumption that the camera remains still.
Disclosure of Invention
In view of this, embodiments of the present invention provide a video-based object cutting method and apparatus, so as to solve the prior-art problem that object cutting is limited to particular application scenarios and cannot be completed for an arbitrary video.
In a first aspect, a video-based object cutting method is provided, including:
extracting one frame of image of a video, and cutting out a specified object in the image through a graph cutting algorithm;
performing feature learning on the cut object to obtain statistical features of an object region, a non-object region and a boundary of the object;
and based on the acquired statistical characteristics of the object, cutting the object in other frame images of the video through a conditional random field model.
In a second aspect, there is provided a video-based object cutting apparatus comprising:
the first cutting unit is used for extracting one frame of image of the video and cutting out a specified object in the image through a graph cutting algorithm;
the characteristic learning unit is used for carrying out characteristic learning on the cut object and acquiring the statistical characteristics of an object region, a non-object region and a boundary of the object;
and the second cutting unit is used for cutting the object in other frame images of the video through the conditional random field model based on the acquired statistical characteristics of the object.
According to the embodiment of the invention, an object in the first frame image of a video segment is first cut using a simple graph cutting algorithm; the statistical characteristics of the cut object are then learned based on the cutting result of that first frame, and the object in the other frames of the video segment is cut using a conditional random field model. Automatic cutting of any object in any video can thus be completed, so that object cutting is not limited by conditions such as a stationary background, a stationary camera, a moving foreground or a known background, and the processing capability of the object cutting algorithm is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings based on these drawings without creative effort.
Fig. 1 is a flowchart of an implementation of a video-based object cutting method according to an embodiment of the present invention;
fig. 2 is a block diagram of a video-based object cutting apparatus according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
Fig. 1 shows an implementation flow of the video-based object cutting method provided by the embodiment of the present invention, which is detailed as follows:
in S101, one frame of image of the video is extracted, and a designated object in the image is cut out by a graph cutting algorithm.
Preferably, the first frame image of the video may be extracted according to the playing sequence of the video image frames, and the object designated through user interaction may be cut out of that image by the graph cutting algorithm.
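As an illustration only (the embodiment does not prescribe a particular graph cutting implementation), this first-frame cut could be sketched with OpenCV's GrabCut, a graph-cut-based algorithm; the video path and the rectangle standing in for the user interaction are assumptions of this sketch.
```python
# Sketch: cutting a user-designated object out of the first frame with
# OpenCV's GrabCut (one graph-cut-based algorithm; assumed, not mandated).
import cv2
import numpy as np

cap = cv2.VideoCapture("input.mp4")        # hypothetical video file
ok, first_frame = cap.read()               # first frame in playing order
assert ok, "could not read the first frame"

# Stand-in for the user interaction that designates the object:
rect = (50, 50, 200, 300)                  # (x, y, width, height), assumed

mask = np.zeros(first_frame.shape[:2], np.uint8)
bgd_model = np.zeros((1, 65), np.float64)  # GrabCut's internal model buffers
fgd_model = np.zeros((1, 65), np.float64)
cv2.grabCut(first_frame, mask, rect, bgd_model, fgd_model, 5,
            cv2.GC_INIT_WITH_RECT)

# Definite or probable foreground pixels form the cut-out object.
object_mask = np.where(
    (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0).astype(np.uint8)
```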
In S102, feature learning is performed on the cut object, and statistical features of an object region, a non-object region, and a boundary of the object are acquired.
In the embodiment of the present invention, a Support Vector Machine (SVM) may be used as a classifier, and the color or brightness value of a pixel and the color or brightness value of an image block centered on the pixel are selected as image statistical features, so as to learn the statistical features of the object region, the non-object region, and the boundary of the object, respectively.
The object cutting result of the first frame image of the video is taken as training data and learned with an SVM. Suppose $f_o$ is the learned classification function, where $\operatorname{sgn}[f_o(o_i)] = +1$ indicates that pixel $i$ belongs to the object region and $\operatorname{sgn}[f_o(o_i)] = -1$ indicates that pixel $i$ belongs to a non-object region, $\operatorname{sgn}$ denoting the sign function.
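A minimal sketch of this learning step, assuming scikit-learn's SVM and the first_frame and object_mask variables of the previous sketch; the patch size and the linear kernel are likewise assumptions:
```python
# Sketch: learning f_o from the first-frame cut with an SVM
# (scikit-learn is assumed; any SVM implementation would do).
import numpy as np
from sklearn.svm import LinearSVC

def pixel_features(img, half=1):
    """Color of each pixel plus the colors of the (2*half+1)^2 image
    block centered on it, as the statistical features described above."""
    h, w, c = img.shape
    pad = np.pad(img, ((half, half), (half, half), (0, 0)), mode="edge")
    blocks = [pad[dy:dy + h, dx:dx + w].reshape(h * w, c)
              for dy in range(2 * half + 1)
              for dx in range(2 * half + 1)]
    return np.hstack(blocks).astype(np.float32) / 255.0

X = pixel_features(first_frame)                 # one feature row per pixel
y = np.where(object_mask.ravel() == 1, 1, -1)   # +1 object, -1 non-object

f_o = LinearSVC(C=1.0).fit(X, y)                # learned classifier f_o
labels = np.sign(f_o.decision_function(X))      # sgn[f_o(o_i)]
```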
In S103, based on the obtained statistical features of the object, the object in other frame images of the video is cut through a conditional random field model.
Let $o = \{o_i\}_{i \in I}$ denote an image frame to be processed and $r = \{r_i\}_{i \in I}$ the segmentation result of that frame, where $I$ is the set of pixels in the image frame, $o_i$ is the statistical feature (i.e., luminance, color, etc.) of pixel $i$, and $r_i$ is the label of pixel $i$, taking the value $+1$ or $-1$ ($r_i = +1$ means that pixel $i$ belongs to the object region, $r_i = -1$ means that pixel $i$ belongs to a non-object region). The object cutting problem can thereby be described as solving for the optimal labeling of all pixels in the video image frame.
With a Conditional Random Field (CRF) model, the optimal labeling of the pixels in a video image frame can be obtained by maximizing the following posterior probability $p(r \mid o)$:

$$p(r \mid o) = \frac{1}{Z} \exp\Big( -\sum_{i \in I} u(r_i, o) - \sum_{i \in I} \sum_{j \in \mathcal{N}_i} v(r_i, r_j) \Big), \qquad (1)$$

where $\mathcal{N}_i$ is the neighborhood of pixel $i$, comprising a spatial neighborhood $\mathcal{N}_i^{s}$ and a temporal neighborhood $\mathcal{N}_i^{t}$; for each pixel, 8 spatial neighbors and 18 temporal neighbors are employed, and $Z$ is the partition (normalization) function. The maximization of the above posterior probability can be translated into the minimization of the following energy function:

$$E(r) = \sum_{i \in I} u(r_i, o) + \sum_{i \in I} \sum_{j \in \mathcal{N}_i} v(r_i, r_j). \qquad (2)$$
in order to define u in equation (2), a classifier needs to be selected, and in the embodiment of the present invention, a Support Vector Machine (SVM) may be used as the classifier, and the color of the pixel and the color of the image block centered on the pixel are selected as the image statistical feature.
The object cutting result of the first frame image of the video is taken as training data and learned with an SVM. Suppose $f_o$ is the learned classification function; in classification, $\operatorname{sgn}[f_o(o_i)] = +1$ indicates that pixel $i$ belongs to the object region and $\operatorname{sgn}[f_o(o_i)] = -1$ indicates that pixel $i$ belongs to a non-object region, $\operatorname{sgn}$ denoting the sign function. Through the SVM, a hyperplane can be found that has the largest distance to the closest data points of the two classes in the training data. Letting $m_{+1}$ and $m_{-1}$ be the maximum distances from the hyperplane within the two classes of training data respectively, $u$ is defined as

$$u(r_i, o) = -\frac{r_i \, f_o(o_i)}{m_{r_i}}, \qquad (3)$$

where $m_{r_i} = m_{+1}$ when $r_i = +1$ and $m_{r_i} = m_{-1}$ when $r_i = -1$.
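Under the normalized form of equation (3) reconstructed above (itself an assumption, as the original equation is not reproduced in this text), the unary term could be computed as follows, reusing f_o, X and y from the SVM sketch:
```python
# Sketch: unary term u normalized by the margins m_{+1} and m_{-1},
# following the reconstructed equation (3); the exact form is an assumption.
import numpy as np

scores = f_o.decision_function(X)       # f_o(o_i) for every pixel i
m_pos = scores[y == 1].max()            # m_{+1}: largest distance, object side
m_neg = -scores[y == -1].min()          # m_{-1}: largest distance, other side

u_object = -(+1) * scores / m_pos       # u(r_i = +1, o): cost of "object"
u_background = -(-1) * scores / m_neg   # u(r_i = -1, o): cost of "non-object"
```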
in formula (2), v is defined as the sum of three terms:
$$v(r_i, r_j) = v_{ij}^{c} + v_{ij}^{b} + v_{ij}^{t}, \qquad (4)$$
wherein:
$$v_{ij}^{c} = \alpha \cdot \exp\Big( -\frac{\| c_i - c_j \|}{\sigma} \Big) \cdot | r_i - r_j |, \qquad (5)$$
where $i$ and $j$ are pixel pairs in the spatial neighborhood and $\alpha$ and $\sigma$ are control parameters: $\alpha$ is obtained by cutting 100 randomly selected images, and $\sigma$ is automatically set to $2\langle \| c_i - c_j \| \rangle$, where $\langle \cdot \rangle$ denotes the mean operation over the entire image.
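A sketch of how the term of equation (5) could be evaluated for horizontally adjacent pixel pairs, with sigma set from the image mean as just described; the value of alpha here is a stand-in for the tuned parameter:
```python
# Sketch: spatial contrast term v^c of equation (5) for horizontal
# neighbor pairs; alpha is a stand-in (the tuned value is not given).
import numpy as np

frame = first_frame.astype(np.float64)
diff = np.linalg.norm(frame[:, 1:] - frame[:, :-1], axis=2)  # ||c_i - c_j||
sigma = 2.0 * diff.mean()      # sigma = 2 * <||c_i - c_j||> over the image
alpha = 1.0                    # assumed control parameter

def v_c(r_i, r_j, d):
    """Paid only when the two labels differ (|r_i - r_j| is 0 or 2)."""
    return alpha * np.exp(-d / sigma) * abs(r_i - r_j)
```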
Based on the initial image cutting result, pixel pairs with different label values are selected in all neighborhoods, and from the selected data three 2-dimensional histogram tables $H_r$, $H_g$ and $H_b$ are computed, corresponding to the R, G and B color channels respectively. Define:
$$v_{ij}^{b} = \alpha' \cdot \exp\Big( -\frac{H_{ij}}{\sigma'} \Big) \cdot | r_i - r_j |, \qquad (6)$$
where $H_{ij} = \max\{ H_r(c_i - c_j),\ H_g(c_i - c_j),\ H_b(c_i - c_j) \}$, $\alpha' = \alpha$, and $\sigma' = N_p$, where $N_p$ is the number of pixel pairs selected from the initial result.
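One possible way to compute the three histogram tables and $H_{ij}$; the bin layout and the pair list are assumptions of this sketch:
```python
# Sketch: per-channel 2-D histograms H_r, H_g, H_b of equation (6),
# built from neighbor pairs whose initial labels differ. `pairs` is an
# assumed list of ((y1, x1), (y2, x2)) coordinates from the initial cut.
import numpy as np

def channel_histograms(frame, pairs, bins=32):
    hists = []
    for ch in range(3):                               # R, G, B channels
        ci = np.array([frame[p[0], p[1], ch] for p, q in pairs], float)
        cj = np.array([frame[q[0], q[1], ch] for p, q in pairs], float)
        h, _, _ = np.histogram2d(ci, cj, bins=bins,
                                 range=[[0, 256], [0, 256]])
        hists.append(h / max(len(pairs), 1))          # normalized table
    return hists                                      # [H_r, H_g, H_b]

def h_ij(hists, c_i, c_j, bins=32):
    """H_ij: the maximum of the three channel histograms for one pair."""
    idx = lambda v: min(int(v) * bins // 256, bins - 1)
    return max(h[idx(c_i[ch]), idx(c_j[ch])] for ch, h in enumerate(hists))
```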
The temporal term $v_{ij}^{t}$ is defined by a contrast similar to that of $v_{ij}^{c}$, with $i$ and $j$ being pixel pairs in the temporal neighborhood.
By combining the above formulas, the final energy function is obtained, and the optimal label of each pixel in each image frame of the video is determined by minimizing this energy function, thereby completing the cutting of the object. The minimization of the energy function can be realized by a graph cutting algorithm.
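A minimal sketch of this minimization using the PyMaxflow max-flow/min-cut library; the library choice, the scalar pairwise weight and the source/sink convention are assumptions rather than part of the embodiment:
```python
# Sketch: minimizing the final energy by graph cut, using PyMaxflow as
# one possible solver; a single scalar weight stands in for the full
# pairwise term v, and the label convention is a choice of this sketch.
import maxflow
import numpy as np

h, w = object_mask.shape
u_obj = u_object.reshape(h, w)          # cost of labeling a pixel +1
u_bgd = u_background.reshape(h, w)      # cost of labeling a pixel -1

# Shift both unary maps by the same constant so capacities are >= 0;
# a per-pixel constant shift does not change the minimizing labeling.
offset = min(u_obj.min(), u_bgd.min())

g = maxflow.Graph[float]()
nodes = g.add_grid_nodes((h, w))
g.add_grid_edges(nodes, weights=1.0)                  # simplified pairwise term
g.add_grid_tedges(nodes, u_bgd - offset, u_obj - offset)

g.maxflow()
segments = g.get_grid_segments(nodes)   # True where the pixel is on the sink side
r = np.where(segments, -1, 1)           # source-side pixels take label +1 here
```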
According to the embodiment of the invention, an object in the first frame image of a video segment is first cut using a simple graph cutting algorithm; the statistical characteristics of the cut object are then learned based on the cutting result of that first frame, and the object in the other frames of the video segment is cut using a conditional random field model. Automatic cutting of any object in any video can thus be completed, so that object cutting is not limited by conditions such as a stationary background, a stationary camera, a moving foreground or a known background, and the processing capability of the object cutting algorithm is improved.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Corresponding to the video-based object cutting method described in the above embodiments, fig. 2 shows a structural block diagram of the video-based object cutting apparatus provided in the embodiments of the present invention, and for convenience of description, only the parts related to the present embodiment are shown.
Referring to fig. 2, the apparatus includes:
the first cutting unit 21 is used for extracting one frame of image of the video and cutting out a specified object in the image through a graph cutting algorithm;
a feature learning unit 22 configured to perform feature learning on the cut object, and acquire statistical features of an object region, a non-object region, and a boundary of the object;
and the second cutting unit 23 is configured to cut the object in the other frame images of the video through the conditional random field model based on the acquired statistical characteristics of the object.
Optionally, the second cutting unit 23 is configured to:
and minimizing a preset energy function to determine an optimal label of each pixel, wherein the label is used for indicating whether the corresponding pixel belongs to the object or not.
Optionally, the first cutting unit 21 is specifically configured to:
a first frame image of the video is extracted.
Optionally, the feature learning unit 22 is specifically configured to:
and performing feature learning on the cut object through a support vector machine.
Optionally, the statistical features include a color or a luminance value of a pixel, and a color or a luminance value of an image block centered on the pixel.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be implemented in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A video-based object cutting method, comprising:
extracting one frame of image of a video, and cutting out a specified object in the image through a graph cutting algorithm;
performing feature learning on the cut object to obtain statistical features of an object region, a non-object region and a boundary of the object;
and based on the acquired statistical characteristics of the object, cutting the object in other frame images of the video through a conditional random field model.
2. The method of claim 1, wherein said cutting the object in other frame images of the video through a conditional random field model based on the obtained statistical features of the object comprises:
and minimizing a preset energy function to determine an optimal label of each pixel, wherein the label is used for indicating whether the corresponding pixel belongs to the object or not.
3. The method of claim 1, wherein the extracting one of the frames of the video comprises:
a first frame image of the video is extracted.
4. The method of claim 1, wherein the feature learning of the cut object comprises:
and performing feature learning on the cut object through a support vector machine.
5. The method of claim 1, wherein the statistical features comprise color or luminance values of a pixel and color or luminance values of image blocks centered on the pixel.
6. A video-based object cutting apparatus, comprising:
the first cutting unit is used for extracting one frame of image of the video and cutting out a specified object in the image through a graph cutting algorithm;
the characteristic learning unit is used for carrying out characteristic learning on the cut object and acquiring the statistical characteristics of an object region, a non-object region and a boundary of the object;
and the second cutting unit is used for cutting the object in other frame images of the video through the conditional random field model based on the acquired statistical characteristics of the object.
7. The apparatus of claim 6, wherein the second cutting unit is to:
and minimizing a preset energy function to determine an optimal label of each pixel, wherein the label is used for indicating whether the corresponding pixel belongs to the object or not.
8. The apparatus of claim 6, wherein the first cutting unit is specifically configured to:
a first frame image of the video is extracted.
9. The apparatus as claimed in claim 6, wherein the feature learning unit is specifically configured to:
and performing feature learning on the cut object through a support vector machine.
10. The apparatus of claim 6, wherein the statistical features comprise color or luminance values of a pixel and color or luminance values of an image block centered on the pixel.
CN201610041711.2A 2016-01-21 2016-01-21 Object cutting method and device based on video Pending CN105719297A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610041711.2A CN105719297A (en) 2016-01-21 2016-01-21 Object cutting method and device based on video


Publications (1)

Publication Number Publication Date
CN105719297A true CN105719297A (en) 2016-06-29

Family

ID=56154842

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610041711.2A Pending CN105719297A (en) 2016-01-21 2016-01-21 Object cutting method and device based on video

Country Status (1)

Country Link
CN (1) CN105719297A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101689305A (en) * 2007-06-05 2010-03-31 微软公司 Learning object cutout from a single example
US20110200230A1 (en) * 2008-10-10 2011-08-18 Adc Automotive Distance Control Systems Gmbh Method and device for analyzing surrounding objects and/or surrounding scenes, such as for object and scene class segmenting
CN102044151A (en) * 2010-10-14 2011-05-04 吉林大学 Night vehicle video detection method based on illumination visibility identification
JP2013080433A (en) * 2011-10-05 2013-05-02 Nippon Telegr & Teleph Corp <Ntt> Gesture recognition device and program for the same
CN102902978A (en) * 2012-08-31 2013-01-30 电子科技大学 Object-oriented high-resolution remote-sensing image classification method
CN103810704A (en) * 2014-01-23 2014-05-21 西安电子科技大学 SAR (synthetic aperture radar) image change detection method based on support vector machine and discriminative random field
CN104751492A (en) * 2015-04-17 2015-07-01 中国科学院自动化研究所 Target area tracking method based on dynamic coupling condition random fields

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
PEI YIN et al.: "Tree-based Classifiers for Bilayer Video Segmentation", 2007 IEEE Conference on Computer Vision and Pattern Recognition *
SHIFENG CHEN et al.: "Learning Boundary and Appearance for Video Object Cutout", IEEE Signal Processing Letters *
DING Mingyue: "Internet of Things Identification Technology" (《物联网识别技术》), China Railway Publishing House, 31 July 2012 *
LI Lisha: "Research on Ontology-based Image Retrieval Technology" (基于本体的图像检索技术研究), China Master's Theses Full-text Database, Information Science and Technology *
WANG Hongqiao et al.: "Multi-kernel Methods for Pattern Analysis and Their Applications" (《模式分析的多核方法及其应用》), National Defense Industry Press, 31 March 2014 *
GUO Lei et al.: "MR Image Segmentation Based on Support Vector Machine and Conditional Random Field" (基于支持向量机和条件随机场的MR图像分割), Journal of Beijing Institute of Technology *
CHEN Houqun et al.: "Seismic Safety of High Arch Dams" (《高拱坝抗震安全》), China Electric Power Press, 31 January 2012 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112085025A (en) * 2019-06-14 2020-12-15 阿里巴巴集团控股有限公司 Object segmentation method, device and equipment
CN112085025B (en) * 2019-06-14 2024-01-16 阿里巴巴集团控股有限公司 Object segmentation method, device and equipment


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160629