CN105719297A - Object cutting method and device based on video
- Publication number
- CN105719297A CN105719297A CN201610041711.2A CN201610041711A CN105719297A CN 105719297 A CN105719297 A CN 105719297A CN 201610041711 A CN201610041711 A CN 201610041711A CN 105719297 A CN105719297 A CN 105719297A
- Authority
- CN
- China
- Prior art keywords
- cutting
- video
- image
- pixel
- frame
- Prior art date
- 2016-01-21
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; G06T2207/00—Indexing scheme for image analysis or image enhancement; G06T2207/10—Image acquisition modality; G06T2207/10016—Video; Image sequence
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; G06T2207/00—Indexing scheme for image analysis or image enhancement; G06T2207/20—Special algorithmic details; G06T2207/20081—Training; Learning
Landscapes
- Image Analysis (AREA)
Abstract
The invention is applicable to the technical field of video processing and provides a video-based object cutting method and device. The method comprises: extracting one frame of image of a video and cutting a designated object out of that image with a graph cutting algorithm; performing feature learning on the cut-out object to obtain statistical features of the object region, the non-object region, and the boundary of the object; and, based on the obtained statistical features of the object, cutting the object in the other images of the video with a conditional random field model. Because the statistical features of the object are learned from the cutting result of the first image and the conditional random field model then cuts the object in the other frames of the video clip, automatic cutting of any object in any video can be completed, so that object cutting is not limited by conditions such as a static background, a static camera, a moving foreground, or a known background, and the processing capability of the object cutting algorithm is improved.
Description
Technical Field
The invention belongs to the technical field of video processing, and particularly relates to a video-based object cutting method and device.
Background
Object segmentation in video plays an extremely important role in many application fields of computer vision, including video monitoring, video editing, and video retrieval. To improve the precision of object cutting, most prior-art algorithms impose assumptions on the video data, such as the assumption that the video background is still or that the camera remains still.
Disclosure of Invention
In view of this, embodiments of the present invention provide a video-based object cutting method and apparatus, so as to solve the prior-art problem that object cutting is limited to particular application scenes and that the object cutting operation cannot be completed for an arbitrary video.
In a first aspect, a video-based object cutting method is provided, including:
extracting one frame of image of a video, and cutting out a specified object in the image through a graph cutting algorithm;
performing feature learning on the cut object to obtain statistical features of an object region, a non-object region and a boundary of the object;
and based on the acquired statistical characteristics of the object, cutting the object in other frame images of the video through a conditional random field model.
In a second aspect, there is provided a video-based object cutting apparatus comprising:
the first cutting unit is used for extracting one frame of image of the video and cutting out a specified object in the image through a graph cutting algorithm;
the characteristic learning unit is used for carrying out characteristic learning on the cut object and acquiring the statistical characteristics of an object region, a non-object region and a boundary of the object;
and the second cutting unit is used for cutting the object in other frame images of the video through the conditional random field model based on the acquired statistical characteristics of the object.
In the embodiment of the invention, an object in the first frame image of a video clip is first cut out with a simple graph cutting algorithm; the statistical features of the cut-out object are then learned from the cutting result of that first frame; and the object in the other frames of the video clip is finally cut out with a conditional random field model. Automatic cutting of any object in any video can thereby be completed, so that object cutting is not limited by conditions such as a static background, a static camera, a moving foreground, or a known background, and the processing capability of the object cutting algorithm is improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. The drawings described below obviously show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive effort.
Fig. 1 is a flowchart of an implementation of a video-based object cutting method according to an embodiment of the present invention;
fig. 2 is a block diagram of a video-based object cutting apparatus according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
Fig. 1 shows an implementation flow of the video-based object cutting method provided by the embodiment of the present invention, which is detailed as follows:
in S101, one frame of image of the video is extracted, and a designated object in the image is cut out by a graph cutting algorithm.
Preferably, the first frame image of the video may be extracted according to the playing sequence of the video image frames, and the object specified by user interaction in that image may be cut out by the graph cutting algorithm.
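As an illustration only (the patent does not prescribe a particular graph cut implementation), this first-frame step could be sketched with OpenCV's GrabCut; the rectangle `rect` marking the user-designated object, and GrabCut itself as the concrete graph cutting algorithm, are assumptions of the sketch:

```python
import cv2
import numpy as np

def cut_first_frame(video_path, rect, n_iter=5):
    """Extract the first frame of a video and cut out the object inside
    the user-specified rectangle with a graph cut (GrabCut)."""
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()  # first frame in playing order
    cap.release()
    if not ok:
        raise IOError("could not read a frame from " + video_path)

    mask = np.zeros(frame.shape[:2], np.uint8)
    bgd_model = np.zeros((1, 65), np.float64)  # internal GrabCut state
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(frame, mask, rect, bgd_model, fgd_model,
                n_iter, cv2.GC_INIT_WITH_RECT)

    # Definite or probable foreground pixels form the object region.
    object_mask = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                           1, 0).astype(np.uint8)
    return frame, object_mask
```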
In S102, feature learning is performed on the cut object, and statistical features of an object region, a non-object region, and a boundary of the object are acquired.
In the embodiment of the present invention, a Support Vector Machine (SVM) may be used as the classifier, and the color or luminance value of a pixel together with the colors or luminance values of the image block centered on that pixel may be selected as the image statistical features, so as to learn the statistical features of the object region, the non-object region, and the boundary of the object, respectively.
The object cutting result of the first frame image of the video is taken as training data and learned with an SVM. Suppose $f_o$ is the learned classification function, where $\mathrm{sgn}[f_o(o_i)] = +1$ indicates that pixel $i$ belongs to the object region and $\mathrm{sgn}[f_o(o_i)] = -1$ indicates that pixel $i$ belongs to the non-object region, $\mathrm{sgn}$ denoting the sign function.
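A minimal sketch of this learning step, assuming scikit-learn's `SVC`, a 3×3 patch as the image block, and random sub-sampling of the training pixels (the patch size and the sub-sampling are assumptions of the sketch, not taken from the patent):

```python
import numpy as np
from sklearn.svm import SVC

def pixel_features(img, patch=3):
    """Stack each pixel's color with the colors of the patch around it."""
    pad = patch // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    h, w = img.shape[:2]
    feats = [padded[y:y + h, x:x + w].reshape(h * w, -1)
             for y in range(patch) for x in range(patch)]
    return np.hstack(feats).astype(np.float64)  # (h*w, patch*patch*3)

def learn_object_classifier(frame, object_mask, n_samples=5000):
    """Train f_o on the first-frame cutting result (labels +1 / -1)."""
    X = pixel_features(frame)
    y = np.where(object_mask.ravel() > 0, 1, -1)
    idx = np.random.choice(len(y), size=min(n_samples, len(y)),
                           replace=False)  # sub-sample for tractability
    clf = SVC(kernel="rbf")
    clf.fit(X[idx], y[idx])
    return clf  # sgn[clf.decision_function(x)] labels a pixel
```

Boundary statistics could be gathered analogously from pixel pairs straddling the cut-out contour (see the histogram construction later in S103).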
In S103, based on the obtained statistical features of the object, the object in other frame images of the video is cut through a conditional random field model.
Let $o = \{o_i\}_{i \in I}$ be an image frame to be processed and $r = \{r_i\}_{i \in I}$ the segmentation result of that frame, where $I$ is the set of pixels in the frame, $o_i$ is the statistical feature of pixel $i$ (i.e., its luminance or color characteristic), and $r_i$ is the label of pixel $i$, taking the value $+1$ or $-1$ ($r_i = +1$ means pixel $i$ belongs to the object region, $r_i = -1$ means pixel $i$ belongs to the non-object region). The object cutting problem can thereby be described as solving for the optimal label of every pixel in the video image frame.
With a Conditional Random Field (CRF) model, the optimal labeling of the pixels in a video image frame is obtained by maximizing the posterior probability $p(r \mid o)$:

$$p(r \mid o) = \frac{1}{Z} \exp\Big( \sum_{i \in I} u(r_i, o) + \sum_{i \in I} \sum_{j \in \mathcal{N}_i} v(r_i, r_j, o) \Big) \qquad (1)$$

where $\mathcal{N}_i$ is the neighborhood of pixel $i$, comprising a spatial neighborhood $\mathcal{N}_i^s$ and a temporal neighborhood $\mathcal{N}_i^t$ (for each pixel, 8 spatial neighbors and 18 temporal neighbors are employed), and $Z$ is the partition (normalization) function. Maximizing this posterior probability is equivalent to minimizing the following energy function:

$$E(r) = -\sum_{i \in I} u(r_i, o) - \sum_{i \in I} \sum_{j \in \mathcal{N}_i} v(r_i, r_j, o) \qquad (2)$$
in order to define u in equation (2), a classifier needs to be selected, and in the embodiment of the present invention, a Support Vector Machine (SVM) may be used as the classifier, and the color of the pixel and the color of the image block centered on the pixel are selected as the image statistical feature.
The object cutting result of the first frame image of the video is taken as training data and learned with an SVM. Suppose $f_o$ is the learned classification function; in classification, $\mathrm{sgn}[f_o(o_i)] = +1$ indicates that pixel $i$ belongs to the object region and $\mathrm{sgn}[f_o(o_i)] = -1$ indicates that pixel $i$ belongs to the non-object region, $\mathrm{sgn}$ denoting the sign function. The SVM finds the hyperplane with the largest distance to the closest data points of the two classes in the training data. Letting $m_{+1}$ and $m_{-1}$ be the maximum distances from the hyperplane within the two classes of training data, $u$ is defined by normalizing the classifier response $f_o(o_i)$ by the margin $m_{+1}$ or $m_{-1}$ of the corresponding class.
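The patent's exact formula for $u$ is not reproduced here, so the following sketch shows only one plausible reading: the SVM decision value scaled by the per-class margin and clipped to $[-1, 1]$; both the scaling and the clipping are assumptions of the sketch:

```python
import numpy as np

def unary_term(clf, X, m_pos, m_neg):
    """One plausible unary potential u: the SVM response f_o(o_i)
    normalized by the margin of the predicted class and clipped.
    m_pos, m_neg: maximum distances from the hyperplane within the
    object / non-object training data (m_{+1}, m_{-1})."""
    f = clf.decision_function(X)            # signed distance to hyperplane
    scale = np.where(f >= 0, m_pos, m_neg)  # per-class normalization
    return np.clip(f / scale, -1.0, 1.0)    # u(o_i) in [-1, 1]
```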
in formula (2), v is defined as the sum of three terms:
wherein:
here, i and j are pixel pairs in the spatial neighborhood, α and σ are control parameters, α is obtained by dividing a random 100-piece graph, and σ is automatically set to 2<||ci-cj||>Wherein<·>is a mean operation for the entire image.
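The spatial term's formula is likewise not reproduced; the sketch below assumes the common exponential contrast form $\alpha\exp(-\|c_i-c_j\|^2/\sigma)$ and, for brevity, only horizontally adjacent pixel pairs:

```python
import numpy as np

def spatial_contrast_weights(img, alpha):
    """Contrast weights between horizontally adjacent pixels, assuming
    the exponential form alpha * exp(-||ci - cj||^2 / sigma)."""
    c = img.astype(np.float64)
    diff = np.linalg.norm(c[:, 1:] - c[:, :-1], axis=2)  # ||ci - cj||
    sigma = 2.0 * diff.mean()  # sigma = 2 * <||ci - cj||> over the image
    return alpha * np.exp(-diff ** 2 / sigma)
```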
Based on the initial image cutting result, pixel pairs with different label values are selected in all neighborhoods, and from the selected data three 2-dimensional histogram tables $H_r$, $H_g$ and $H_b$ are computed, corresponding to the R, G, B color channels respectively. The boundary histogram term is defined over these histograms with parameters $\alpha' = \alpha$ and $\sigma' = N_p$, where $N_p$ is the number of pixel pairs selected from the initial result.
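A sketch of building the three 2-D histograms, assuming 32 bins per channel and, again for brevity, only horizontally adjacent pixel pairs (both are assumptions of the sketch):

```python
import numpy as np

def boundary_histograms(frame, labels, bins=32):
    """Build 2-D histograms H_r, H_g, H_b over the color pairs of
    adjacent pixels whose labels differ (boundary pairs)."""
    c = frame.astype(np.float64)
    boundary = labels[:, 1:] != labels[:, :-1]  # differing label pairs
    left, right = c[:, :-1][boundary], c[:, 1:][boundary]
    n_pairs = int(boundary.sum())               # this is N_p
    hists = []
    for ch in range(3):                         # R, G, B channels
        h, _, _ = np.histogram2d(left[:, ch], right[:, ch],
                                 bins=bins, range=[[0, 256], [0, 256]])
        hists.append(h)
    return hists, n_pairs
```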
The temporal contrast term is defined analogously to the spatial contrast term, with $i$ and $j$ being pixel pairs in the temporal neighborhood.
Combining the above formulas yields the final energy function; the optimal label of each pixel in each image frame of the video is determined by minimizing this energy function, thereby completing the cutting of the object. The minimization of the energy function can be realized by a graph cutting algorithm.
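As a sketch only, the minimization could be carried out with the PyMaxflow graph cut library; the conversion of $u$ and a single $v$ term into terminal and edge capacities below is one standard construction assumed for illustration, not the patent's own formulation:

```python
import numpy as np
import maxflow

def cut_frame(unary, pair_h):
    """Minimize a binary-label energy with a graph cut (PyMaxflow).
    unary:  (h, w) array with u(o_i) in [-1, 1]; positive favors object.
    pair_h: (h, w-1) array of contrast weights between horizontally
            adjacent pixels (one v term, for illustration)."""
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(unary.shape)

    # Pairwise (smoothness) edges to each pixel's right-hand neighbor.
    weights = np.zeros(unary.shape)
    weights[:, :-1] = pair_h
    structure = np.array([[0, 0, 0],
                          [0, 0, 1],
                          [0, 0, 0]])
    g.add_grid_edges(nodes, weights=weights,
                     structure=structure, symmetric=True)

    # Terminal edges encode the unary term.
    g.add_grid_tedges(nodes, np.maximum(unary, 0), np.maximum(-unary, 0))
    g.maxflow()
    segments = g.get_grid_segments(nodes)  # True for sink-side nodes
    return np.where(segments, -1, 1)       # labels r_i in {+1, -1}
```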
In the embodiment of the invention, an object in the first frame image of a video clip is first cut out with a simple graph cutting algorithm; the statistical features of the cut-out object are then learned from the cutting result of that first frame; and the object in the other frames of the video clip is finally cut out with a conditional random field model. Automatic cutting of any object in any video can thereby be completed, so that object cutting is not limited by conditions such as a static background, a static camera, a moving foreground, or a known background, and the processing capability of the object cutting algorithm is improved.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
Corresponding to the video-based object cutting method described in the above embodiments, fig. 2 shows a structural block diagram of the video-based object cutting apparatus provided in the embodiments of the present invention, and for convenience of description, only the parts related to the present embodiment are shown.
Referring to fig. 2, the apparatus includes:
the first cutting unit 21 is used for extracting one frame of image of the video and cutting out a specified object in the image through a graph cutting algorithm;
a feature learning unit 22 configured to perform feature learning on the cut object, and acquire statistical features of an object region, a non-object region, and a boundary of the object;
and the second cutting unit 23 is configured to cut the object in the other frame images of the video through the conditional random field model based on the acquired statistical characteristics of the object.
Optionally, the second cutting unit 23 is configured to:
and minimizing a preset energy function to determine an optimal label of each pixel, wherein the label is used for indicating whether the corresponding pixel belongs to the object or not.
Optionally, the first cutting unit 21 is specifically configured to:
a first frame image of the video is extracted.
Optionally, the feature learning unit 22 is specifically configured to:
and performing feature learning on the cut object through a support vector machine.
Optionally, the statistical features include a color or a luminance value of a pixel, and a color or a luminance value of an image block centered on the pixel.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present invention may be implemented in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor (processor) to execute all or part of the steps of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-only memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit and principle of the present invention are intended to be included within the scope of the present invention.
Claims (10)
1. A video-based object cutting method, comprising:
extracting one frame of image of a video, and cutting out a specified object in the image through a graph cutting algorithm;
performing feature learning on the cut object to obtain statistical features of an object region, a non-object region and a boundary of the object;
and based on the acquired statistical characteristics of the object, cutting the object in other frame images of the video through a conditional random field model.
2. The method of claim 1, wherein the cutting the object in other frame images of the video through a conditional random field model based on the acquired statistical features of the object comprises:
and minimizing a preset energy function to determine an optimal label of each pixel, wherein the label is used for indicating whether the corresponding pixel belongs to the object or not.
3. The method of claim 1, wherein the extracting one frame of image of the video comprises:
a first frame image of the video is extracted.
4. The method of claim 1, wherein the feature learning of the cut object comprises:
and performing feature learning on the cut object through a support vector machine.
5. The method of claim 1, wherein the statistical features comprise color or luminance values of a pixel and color or luminance values of image blocks centered on the pixel.
6. A video-based object cutting apparatus, comprising:
the first cutting unit is used for extracting one frame of image of the video and cutting out a specified object in the image through a graph cutting algorithm;
the characteristic learning unit is used for carrying out characteristic learning on the cut object and acquiring the statistical characteristics of an object region, a non-object region and a boundary of the object;
and the second cutting unit is used for cutting the object in other frame images of the video through the conditional random field model based on the acquired statistical characteristics of the object.
7. The apparatus of claim 6, wherein the second cutting unit is to:
and minimizing a preset energy function to determine an optimal label of each pixel, wherein the label is used for indicating whether the corresponding pixel belongs to the object or not.
8. The apparatus of claim 6, wherein the first cutting unit is specifically configured to:
a first frame image of the video is extracted.
9. The apparatus of claim 6, wherein the feature learning unit is specifically configured to:
and performing feature learning on the cut object through a support vector machine.
10. The apparatus of claim 6, wherein the statistical features comprise color or luminance values of a pixel and color or luminance values of an image block centered on the pixel.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610041711.2A CN105719297A (en) | 2016-01-21 | 2016-01-21 | Object cutting method and device based on video |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610041711.2A CN105719297A (en) | 2016-01-21 | 2016-01-21 | Object cutting method and device based on video |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105719297A true CN105719297A (en) | 2016-06-29 |
Family
ID=56154842
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610041711.2A Pending CN105719297A (en) | 2016-01-21 | 2016-01-21 | Object cutting method and device based on video |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105719297A (en) |
- 2016-01-21: application CN201610041711.2A filed in China; published as CN105719297A (status: Pending)
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101689305A (en) * | 2007-06-05 | 2010-03-31 | 微软公司 | Learning object cutout from a single example |
US20110200230A1 (en) * | 2008-10-10 | 2011-08-18 | Adc Automotive Distance Control Systems Gmbh | Method and device for analyzing surrounding objects and/or surrounding scenes, such as for object and scene class segmenting |
CN102044151A (en) * | 2010-10-14 | 2011-05-04 | 吉林大学 | Night vehicle video detection method based on illumination visibility identification |
JP2013080433A (en) * | 2011-10-05 | 2013-05-02 | Nippon Telegr & Teleph Corp <Ntt> | Gesture recognition device and program for the same |
CN102902978A (en) * | 2012-08-31 | 2013-01-30 | 电子科技大学 | Object-oriented high-resolution remote-sensing image classification method |
CN103810704A (en) * | 2014-01-23 | 2014-05-21 | 西安电子科技大学 | SAR (synthetic aperture radar) image change detection method based on support vector machine and discriminative random field |
CN104751492A (en) * | 2015-04-17 | 2015-07-01 | 中国科学院自动化研究所 | Target area tracking method based on dynamic coupling condition random fields |
Non-Patent Citations (7)
Title |
---|
PEI YIN et al.: "Tree-based Classifiers for Bilayer Video Segmentation", 2007 IEEE Conference on Computer Vision and Pattern Recognition * |
SHIFENG CHEN et al.: "Learning Boundary and Appearance for Video Object Cutout", IEEE Signal Processing Letters * |
DING Mingyue: "Internet of Things Identification Technology" (《物联网识别技术》), China Railway Publishing House, 31 July 2012 * |
LI Lisha: "Research on Ontology-Based Image Retrieval Technology" (基于本体的图像检索技术研究), China Master's Theses Full-text Database, Information Science and Technology * |
WANG Hongqiao et al.: "Multiple Kernel Methods for Pattern Analysis and Their Applications" (《模式分析的多核方法及其应用》), National Defense Industry Press, 31 March 2014 * |
GUO Lei et al.: "MR Image Segmentation Based on Support Vector Machines and Conditional Random Fields" (基于支持向量机和条件随机场的MR图像分割), Journal of Beijing Institute of Technology * |
CHEN Houqun et al.: "Seismic Safety of High Arch Dams" (《高拱坝抗震安全》), China Electric Power Press, 31 January 2012 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112085025A (en) * | 2019-06-14 | 2020-12-15 | 阿里巴巴集团控股有限公司 | Object segmentation method, device and equipment |
CN112085025B (en) * | 2019-06-14 | 2024-01-16 | 阿里巴巴集团控股有限公司 | Object segmentation method, device and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20160629 |