WO2012162981A1 - Video character separation method and device - Google Patents


Info

Publication number
WO2012162981A1
WO2012162981A1 · PCT/CN2011/079751 · CN2011079751W
Authority
WO
WIPO (PCT)
Prior art keywords
video
foreground
background
image
probability
Prior art date
Application number
PCT/CN2011/079751
Other languages
French (fr)
Chinese (zh)
Inventor
刘志
史冉
丁保焱
薛银珠
杨胜
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to PCT/CN2011/079751 priority Critical patent/WO2012162981A1/en
Priority to CN201180001853.1A priority patent/CN103119625B/en
Publication of WO2012162981A1 publication Critical patent/WO2012162981A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/167Detection; Localisation; Normalisation using comparisons between temporally consecutive images

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a method and apparatus for video character segmentation.
  • the object segmentation technique refers to separating the object of interest to the user on the video or image at the pixel level from the background, and the segmented object can be synthesized into a new background.
  • a Gaussian mixture model is used to establish a background color model, and then the video frame image is subtracted from the established background color model, and threshold segmentation is performed to obtain a color model of the foreground object.
  • the object segmentation is automatically segmented by the graph cut, and the cut image is smoothed by the morphological opening and closing operation to optimize the segmentation result.
  • the HSV (Hue, Saturation, Value) color space is used in place of the original RGB (Red, Green, Blue) color space to reduce the effect of brightness changes on segmentation quality.
  • Embodiments of the present invention provide a method and apparatus for video character segmentation, which can be applied to segmentation of various video character objects, and can segment a complete person object in real time.
  • a method for segmenting video characters including:
  • a map is constructed according to the respective probabilities, and a graph cut is performed to obtain a person object.
  • a device for video character segmentation comprising:
  • a first acquiring unit configured to perform face detection on the first frame video image to be processed, to obtain a human face region;
  • a second acquiring unit configured to acquire a foreground seed pixel point and a background seed pixel point according to the character face area;
  • a calculating unit configured to calculate, according to the foreground seed pixel point and the background seed pixel point, a probability that each pixel in the video image is a foreground or a background;
  • An embodiment of the present invention provides a method and an apparatus for video character segmentation.
  • the face image of a person is obtained by performing face detection on a video image of a first frame to be processed, and acquiring a foreground seed pixel according to the face region of the character.
  • a background seed pixel according to the foreground seed pixel point and the background seed pixel point, respectively calculating a probability that each pixel point in the video image is a foreground or a background, constructing a graph according to the respective probabilities, and performing graph cut acquisition Character object.
  • FIG. 1 is a flowchart of a method for video character segmentation according to Embodiment 1 of the present invention
  • FIG. 2 is a block diagram of a device for video character segmentation according to Embodiment 1 of the present invention
  • FIG. 4 is a schematic diagram of video character segmentation according to Embodiment 2 of the present invention;
  • FIG. 5 is a schematic diagram of graph cutting according to Embodiment 2 of the present invention;
  • FIG. 6 is a schematic diagram of determining a contour according to Embodiment 2 of the present invention.
  • FIG. 7 is a schematic diagram of luminance change detection according to Embodiment 2 of the present invention.
  • FIG. 8 is a block diagram of an apparatus for video character segmentation according to Embodiment 2 of the present invention.

Detailed Description
  • An embodiment of the present invention provides a method for segmenting a video character. As shown in FIG. 1, the method includes:
  • Step 101: Perform face detection on the first frame of the video image to be processed to obtain a face region of a person. Only the first frame requires this full processing; when the image is not the first frame, the video image can be quickly segmented using the correlation between adjacent video frames.
  • Step 102: Obtain foreground seed pixel points and background seed pixel points according to the face region of the character.
  • Step 103: Calculate, according to the foreground seed pixel points and the background seed pixel points, the probability that each pixel in the video image is foreground or background.
  • Step 104: Construct a graph according to the respective probabilities, and perform a graph cut to obtain the character object.
  • An embodiment of the present invention provides a method for segmenting a video character.
  • the face image of a person is obtained by performing face detection on a first frame of the video image to be processed, and acquiring a foreground seed pixel and a background seed according to the face region of the character.
  • The prior art used when segmenting video characters is not adaptable to various types of video, and when the object segmentation result is optimized by opening and closing operations, the complete character object cannot be segmented; the solution provided by the embodiment of the present invention is suitable for the segmentation of various video characters and can segment complete character objects in real time.
  • the embodiment of the present invention provides a device for video character segmentation.
  • the device includes: a first acquiring unit 201, a second acquiring unit 202, a calculating unit 203, and a processing unit 204.
  • the first acquiring unit 201 is configured to perform face detection on the first frame video image to be processed to obtain a face region of the person;
  • a second acquiring unit 202 configured to acquire a foreground seed pixel point and a background seed pixel point according to the character face area
  • the calculating unit 203 is configured to separately calculate a probability that each pixel in the video image is a foreground or a background according to the foreground seed pixel point and the background seed pixel point;
  • the processing unit 204 is configured to construct a map according to the respective probabilities, and perform a graph cut to obtain a person.
  • The embodiment of the present invention provides an apparatus for video character segmentation: the first acquiring unit performs face detection on the first frame of the video image to be processed to obtain a face region of a person; the second acquiring unit acquires foreground seed pixel points and background seed pixel points according to the face region; and the calculating unit calculates the probability that each pixel in the video image is foreground or background.
  • the processing unit constructs a map according to the respective probabilities, and performs graph cutting to obtain a character object.
  • the number of components of the Gaussian mixture model is manually set, the adaptability to various types of video is not strong, and the complete character object cannot be segmented, which is provided by the embodiment of the present invention.
  • the scheme can be applied to the segmentation of various video characters, and the complete character object can be segmented in real time.
  • An embodiment of the present invention provides a method for segmenting a video character. As shown in FIG. 3, the method includes: Step 301: Determine whether a video frame image to be processed is a first frame.
  • the purpose of determining whether the current video frame image to be processed is the first frame is that when the current video frame image is not the first frame, the current frame image may be processed according to the result of the segmented video character object of the previous frame, that is, according to the adjacent video frame.
  • the correlation of the images is processed, which speeds up the processing.
  • Step 302: When the video frame image to be processed is the first frame, perform face detection on it to obtain the face region of the character. Specifically, the AdaBoost algorithm is used for face detection.
  • AdaBoost is an iterative algorithm. The core idea is to train different classifiers (weak classifiers) on the same training set, and then combine these weak classifiers into a stronger final classifier (a strong classifier).
  • Performing face detection means training a group of classifiers using face images as positive samples and non-face images as negative samples, searching each region of the input image to be processed, and judging face regions with the group of classifiers; the detected face region is shown as the rectangular area in Fig. 4(a).
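As an illustration of the boosting idea behind AdaBoost (not the patent's actual Haar-feature cascade, which operates on image features), the following minimal sketch trains threshold stumps on 1-D feature values; the function names and toy data are illustrative assumptions:

```python
import math

def train_adaboost(xs, ys, rounds=5):
    """Train a boosted ensemble of threshold stumps on 1-D samples.

    xs: feature values; ys: labels in {+1, -1} (+1 = face, -1 = non-face).
    Returns a list of (threshold, polarity, alpha) weak classifiers.
    """
    n = len(xs)
    w = [1.0 / n] * n          # uniform sample weights to start
    ensemble = []
    for _ in range(rounds):
        # pick the stump (threshold, polarity) with the lowest weighted error
        best = None
        for t in sorted(set(xs)):
            for pol in (+1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if (pol if x >= t else -pol) != y)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = max(err, 1e-10)                      # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # weight of this weak classifier
        ensemble.append((t, pol, alpha))
        # re-weight samples: boost the ones this stump misclassified
        w = [wi * math.exp(-alpha * y * (pol if x >= t else -pol))
             for wi, x, y in zip(w, xs, ys)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    """Strong classifier: sign of the alpha-weighted vote of the stumps."""
    score = sum(alpha * (pol if x >= t else -pol) for t, pol, alpha in ensemble)
    return 1 if score >= 0 else -1
```

In the patent's setting, the weak classifiers act on rectangular image features rather than a single scalar, but the weight-update and voting scheme is the same.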
  • Step 303: Obtain foreground seed pixel points and background seed pixel points according to the face region of the character.
  • The face region is moderately adjusted to generate a foreground sample model and a background sample model.
  • Specifically, the face region is slightly shrunk; the distance between the face region and the upper-body region is then determined from the height of the face region, and the width of the upper-body region is determined from the person's head-to-shoulder width ratio, so that the foreground model can be generated.
  • The pixels in the area enclosed by the light-colored line in Fig. 4(b) are the foreground seed pixels;
  • a background sample model is also generated, and the pixels between the dark dotted line in Fig. 4(c) and the image boundary are the background seed pixels.
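A minimal sketch of deriving seed rectangles from a detected face box along the lines above; the shrink factor, shoulder ratio, neck gap, and border width are illustrative assumptions, not values specified by the patent:

```python
def seed_regions(face, img_w, img_h):
    """Derive rough foreground/background seed rectangles from a detected face.

    face = (x, y, w, h) of the face rectangle.  Returns the shrunken face
    rectangle and the upper-body rectangle (both foreground seeds) plus the
    width of a background seed band along the image border.
    """
    x, y, w, h = face
    # shrink the face box slightly so foreground seeds stay safely inside the face
    fx, fy = x + w // 8, y + h // 8
    fw, fh = w * 3 // 4, h * 3 // 4
    face_seed = (fx, fy, fw, fh)
    # upper-body box: wider than the head (assumed shoulder width = 2 face widths),
    # placed below the face with a small neck gap derived from the face height
    tw = w * 2
    tx = max(0, x + w // 2 - tw // 2)
    ty = min(img_h - 1, y + h + h // 4)
    torso_seed = (tx, ty, min(tw, img_w - tx), min(h * 2, img_h - ty))
    # background seeds: a thin band along the image border
    border = max(2, min(img_w, img_h) // 50)
    return face_seed, torso_seed, border
```

Any pixel inside `face_seed` or `torso_seed` would feed the foreground sample model, and pixels in the border band would feed the background sample model.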
  • Step 304: Determine, according to the foreground seed pixel points, three sets of sample values of the foreground seed pixel points on the three color components L, a, and b, and determine, according to the background seed pixel points, three sets of sample values of the background seed pixel points on the three color components L, a, and b;
  • the solution provided by the embodiment of the present invention converts the video image from the RGB (Red, Green, Blue) space to the Lab space.
  • Lab consists of three channels: the L channel is a luminance channel, and the a and b channels are color channels.
  • a represents the range from magenta to green
  • b represents the range from yellow to blue.
  • the three color components L, a, and b are independent of each other.
  • From the foreground seed pixel points, three sets of sample values are obtained on the three color components L, a, b, e.g. {a_1^F, a_2^F, ..., a_n^F} for the a component of the foreground (and likewise for the other components and for the background seed pixel points).
  • Step 305: Calculate, according to the sample values of the foreground seed pixel points and the background seed pixel points, a first foreground probability and a first background probability for each pixel in the video image.
  • The foreground density functions f_L^F(x), f_a^F(x), f_b^F(x) and the background density functions f_L^B(x), f_a^B(x), f_b^B(x) are calculated from the sample values of the foreground and background seed pixel points;
  • x_i denotes the i-th foreground seed pixel point or the i-th background seed pixel point, and x denotes any pixel in the video image.
  • Step 306: Normalize the first foreground probability and the first background probability, and calculate the probability that each pixel in the video image is foreground or background.
  • Each pixel point in the first frame of the video image is processed in this way to obtain the probability that each pixel is foreground or background.
  • In the resulting probability map, the brighter a pixel, the greater the probability that it is foreground; the darker a pixel, the greater the probability that it is background.
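The non-parametric model of steps 304-306 can be sketched as a per-channel kernel density estimate whose foreground and background densities are normalized into probabilities; the Gaussian kernel and the bandwidth value are assumptions, since the patent does not fix the kernel:

```python
import math

def kde(samples, x, h=5.0):
    """Gaussian kernel density estimate at x from a list of 1-D channel samples."""
    n = len(samples)
    return sum(math.exp(-((x - s) / h) ** 2 / 2)
               for s in samples) / (n * h * math.sqrt(2 * math.pi))

def fg_bg_probability(pixel, fg_samples, bg_samples):
    """Normalized foreground/background probabilities of one pixel.

    pixel: (L, a, b) tuple; fg_samples/bg_samples: dict mapping channel name
    ("L", "a", "b") to its list of seed sample values.  Channel densities are
    multiplied, since the L, a, b components are treated as independent.
    """
    f = b = 1.0
    for ch, v in zip("Lab", pixel):
        f *= kde(fg_samples[ch], v)
        b *= kde(bg_samples[ch], v)
    total = f + b
    if total == 0:
        return 0.5, 0.5            # no evidence either way
    return f / total, b / total    # (P_foreground, P_background), sums to 1
```

The normalization at the end is the step-306 operation: the two raw densities are rescaled so they can be read directly as the per-pixel foreground and background probabilities.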
  • Step 307: Construct a graph according to the respective probabilities, and perform graph cutting to obtain the character object;
  • A graph G = (V, E) is used, where V (vertex) is the set of vertices;
  • v_i denotes the i-th vertex;
  • v_j denotes the j-th vertex;
  • E (edge) refers to the lines joining associated vertices in graph G;
  • e_ij denotes the edge linking vertices i and j;
  • W (weight) refers to the value assigned to the edge connecting two vertices, which represents how closely the two vertices are related;
  • w_ij denotes the weight of the edge connecting vertices i and j.
  • The solution provided by the embodiment of the present invention performs the graph cutting by using the max-flow/min-cut algorithm.
  • all the vertices of the graph are divided into two subsets, and the edges between the two subsets constitute a cut of the graph. This is shown by the dotted line in Figure 5 (b).
  • The two subsets respectively contain a virtual source point and a virtual sink point; the source corresponds to the foreground seed pixels and the sink to the background seed pixels. Among all cuts separating the source from the sink, the cut with the smallest total weight is called the minimum cut.
  • A basic way to find the minimum cut is to find the maximum flow from the source to the sink: each edge connecting two vertices is regarded as a water pipe whose capacity is the weight of the edge.
  • The so-called maximum flow is the maximum water flow that can pass from the source to the sink.
  • When the maximum flow is reached, the completely filled pipes form the minimum cut separating the source from the sink.
  • A graph is constructed from energy terms between pixels: each pixel in the video frame image corresponds to a vertex of the graph, an edge of the graph connects each pair of adjacent pixels, and each edge is assigned a weight.
  • The weight indicates the relationship between the two pixels connected by the edge, such as the degree of similarity between their colors, or the relationship between a pixel and the source or sink.
  • The foreground probability of a pixel indicates its relationship with the source, and its background probability indicates its relationship with the sink.
  • the source point and the sink point respectively represent the foreground seed pixel point and the background seed pixel point.
  • B is a binary variable (the foreground/background label assigned to each pixel).
  • the problem of segmenting a person object in a video frame image may be converted into a problem of segmentation of a graph to be constructed.
  • The max-flow/min-cut algorithm may then be used to perform the graph cut, thereby obtaining the character object.
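The max-flow/min-cut duality described above can be sketched with a plain Edmonds-Karp implementation on a toy graph; production graph-cut segmenters use faster specialized algorithms (e.g., Boykov-Kolmogorov), so this is only a conceptual sketch:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow.  capacity: dict[u][v] -> edge capacity.

    Returns (flow value, set of vertices on the source side of the min cut).
    """
    flow = {u: dict(vs) for u, vs in capacity.items()}
    # ensure reverse edges exist with 0 residual capacity
    for u in list(flow):
        for v in list(flow[u]):
            flow.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = {source: None}
        q = deque([source])
        while q and sink not in parent:
            u = q.popleft()
            for v, c in flow[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if sink not in parent:
            break
        # bottleneck capacity along the found path
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(flow[u][v] for u, v in path)
        for u, v in path:
            flow[u][v] -= bottleneck   # push flow forward
            flow[v][u] += bottleneck   # add residual capacity backward
        total += bottleneck
    # min cut = vertices still reachable from the source in the residual graph
    return total, set(parent)
```

In the segmentation graph, `source`/`sink` are the virtual foreground/background terminals, and every vertex left on the source side of the returned cut is labeled foreground.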
  • Steps 302 to 307 are processes for processing the image of the first frame.
  • When the image to be processed is not the first frame, performing the processing of steps 302 to 307 for every frame would be quite time-consuming; therefore, when processing the entire video sequence, the following process is used for subsequent frames.
  • Step 308: When the video frame image to be processed is not the first frame, perform brightness change detection on it to obtain the brightness difference distance between the current video frame image and the previous frame video image.
  • Whether the non-parametric model of the foreground/background needs to be updated depends mainly on the change of the scene.
  • One of the main factors is the change of brightness.
  • The change of brightness may be caused by a change in the surrounding environment, or by the video capture device; it can result in a foreground/background probability, calculated with the current non-parametric model, that no longer fits the current video frame well.
  • The brightness change detection mainly uses the Bhattacharyya distance to compare the luminance histogram of the current frame with that of the previous frame.
  • H_1(i) is the value of the current frame's luminance histogram at gray level i;
  • H_0(i) is the value of the previous frame's luminance histogram at gray level i.
  • Step 309: Determine whether the brightness difference distance is less than a preset threshold.
  • The preset threshold is determined experimentally and can be 0.1.
  • When the brightness difference distance is not less than the preset threshold, the current image is processed in the same way as the first frame video image; as shown in FIG. 7, the luminance histograms of two consecutive frames whose brightness difference distance exceeds the preset threshold differ markedly, so processing proceeds according to steps 302-307.
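The Bhattacharyya-distance comparison of luminance histograms can be sketched as follows; the bin count is an illustrative assumption:

```python
import math

def luminance_histogram(pixels, bins=32):
    """Normalized luminance histogram of a sequence of 0-255 gray values."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    n = len(pixels)
    return [h / n for h in hist]

def bhattacharyya_distance(h1, h2):
    """Distance between two normalized histograms: 0 = identical, 1 = disjoint."""
    bc = sum(math.sqrt(p * q) for p, q in zip(h1, h2))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))
```

With the threshold of 0.1 mentioned above, a distance below 0.1 would allow the fast contour-tracking path (steps 310-312), and a larger distance would trigger full reprocessing as a first frame.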
  • Step 310: When the brightness difference distance is less than the preset threshold, determine the contour of the character object of the current video frame image according to the contour of the character object of the previous frame video image;
  • At least one key point on the contour of the character object of the previous frame video image is extracted: feature points where the contour direction changes abruptly are extracted and then proportionally sampled to obtain a suitable number of key points; the starting point, the ending point, and feature points closer to the bottom of the image are also selected as key points. The dozens of dark dots in the gray banded area in Fig. 6 are the key points.
  • For each key point x of the previous frame, the energy function depends on the position of x and the motion vector estimated for pixel x.
  • The candidate motion vectors range over a (2×4+1) × (2×4+1) = 81-element search window.
  • For each candidate, an energy value E is calculated, and the motion vector with the smallest E is selected as the motion vector of pixel x; in this way, the key point corresponding to pixel x in the current frame is obtained.
  • Such a corresponding point is a target key point.
  • the at least one target key point is connected to obtain a character object outline of the current video frame image.
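The 81-candidate motion search for one key point can be sketched as exhaustive block matching; the SAD (sum of absolute differences) energy used here stands in for the patent's unspecified energy function E, which is an assumption:

```python
def best_motion_vector(prev, curr, x, y, radius=4, patch=2):
    """Exhaustive block-matching motion estimation for one key point.

    prev/curr: 2-D lists of gray values.  Searches all (2*radius+1)^2 = 81
    candidate displacements of the (2*patch+1)^2 patch around (x, y) and
    returns the (dx, dy) with minimal SAD energy.
    """
    h, w = len(prev), len(prev[0])

    def sad(dx, dy):
        total = 0
        for j in range(-patch, patch + 1):
            for i in range(-patch, patch + 1):
                px, py = x + i, y + j          # pixel in the previous frame
                qx, qy = px + dx, py + dy      # candidate match in the current frame
                if 0 <= px < w and 0 <= py < h and 0 <= qx < w and 0 <= qy < h:
                    total += abs(prev[py][px] - curr[qy][qx])
                else:
                    total += 255               # penalize out-of-frame candidates
        return total

    candidates = [(dx, dy) for dy in range(-radius, radius + 1)
                           for dx in range(-radius, radius + 1)]
    return min(candidates, key=lambda v: sad(*v))
```

Applying this at every key point of the previous contour and connecting the displaced points yields the approximate contour of step 310.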
  • Step 311: Update, according to the contour of the character object, the probability that each pixel in the current video frame image is foreground or background.
  • The currently determined character object contour is approximate: as shown in FIG. 6, the white area is the foreground, the black area is the background, and the gray strip area is the uncertain region, i.e., a pixel in the gray strip area may be either foreground or background.
  • Based on the foreground/background non-parametric model of the previous frame video image, the foreground/background probability of each pixel in the current video frame image is updated; specifically, the probability of each pixel being foreground or background is calculated according to the methods of steps 305 and 306.
  • Step 312: Construct a graph according to the respective probabilities, and perform a graph cut to obtain the character object of the current video frame image.
  • Specifically, the character object of the current video frame image is obtained by graph cutting according to the method of step 307.
  • When the prior art is used for character object segmentation, the adaptability to various types of video is not strong, and a complete character object cannot be segmented when the segmentation result is optimized by opening and closing operations.
  • The video character segmentation method provided by this embodiment of the present invention can be applied to the segmentation of various video characters, can segment complete character objects in real time, and can quickly segment the entire video based on the correlation between adjacent video frames.
  • An embodiment of the present invention provides a device for video character segmentation.
  • the device includes: a determining unit 801, a first obtaining unit 802, a second obtaining unit 803, a calculating unit 804 (with a determining module 805, a first calculating module 806, and a second computing module 809), a processing unit 812, a detection acquiring unit 813, a determining unit 814, and an updating unit 819.
  • the determining unit 801 is configured to determine whether the image of the video frame to be processed is the first frame
  • The first acquiring unit 802 is configured to perform face detection on the first frame of the video image to be processed to obtain a face region of the person; the AdaBoost algorithm is used for face detection: a group of classifiers is trained using face images as positive samples and non-face images as negative samples, each region of the input image to be processed is searched, and the group of classifiers determines the face region.
  • the second acquiring unit 803 is configured to acquire foreground seed pixel points and background seed pixel points according to the face region;
  • the facial region of the person acquired by the first obtaining unit 802 is moderately adjusted, that is, the facial region of the person is appropriately reduced, and then the distance between the facial region and the upper body region of the human is determined according to the height of the facial region of the human.
  • the area of the upper body is determined according to the ratio of the width of the head and the shoulder of the person, so that the foreground model is generated, wherein the pixel included in the foreground model is the foreground seed pixel;
  • a background sample model is generated, wherein the pixel points included in the background sample model are background seed pixels.
  • the calculating unit 804 is configured to separately calculate a probability that each pixel in the video image is a foreground or a background according to the foreground seed pixel point and the background seed pixel point;
  • the determining module 805 of the calculating unit 804 is configured to respectively determine three sets of sample values of the foreground seed pixel points on the three color components of L, a, b according to the foreground seed pixel points, according to the background seed The pixel points respectively determine three sets of sample values of the background seed pixel points on the three color components of L, a, b;
  • a first calculating module 806, configured to calculate a first foreground probability and a first background probability of each pixel in the video image according to the sample values of the foreground seed pixel points and the background seed pixel points;
  • the first calculating submodule 807 in the first calculating module 806 is configured to calculate, from the sample values of the foreground and background seed pixel points, the foreground density functions f_L^F(x), f_a^F(x), f_b^F(x) and the background density functions f_L^B(x), f_a^B(x), f_b^B(x), where x denotes any pixel in the video image; the foreground functions represent the foreground probability of the pixel on the L, a, b color components, and the background functions represent its background probability on the same components;
  • a second calculation sub-module 808, configured to calculate the first foreground probability of any pixel in the video image;
  • a second computing module 809, configured to normalize the first foreground probability and the first background probability and calculate the probability that each pixel in the video image is foreground or background;
  • After the foreground/background probabilities of the pixels in the current video frame image are determined, the processing unit 812 constructs a graph according to the respective probabilities and performs graph cutting to obtain the character object;
  • the detection acquiring unit 813 is configured to perform brightness change detection on the video frame image to obtain the brightness difference distance between the current video frame image and the previous frame video image;
  • the brightness change detection mainly uses the Bhattacharyya distance to compare the luminance histogram of the current frame with that of the previous frame, where H_1(i) is the value of the current frame's histogram at gray level i and H_0(i) is the value of the previous frame's histogram at gray level i.
  • the determining unit 814 is configured to determine the character object contour of the current video frame image according to the character object contour of the previous frame video image;
  • the extracting module 815 in the determining unit 814 is configured to extract at least one key point on the contour of the character object of the video image of the previous frame according to the binary image of the segmentation result of the video image of the previous frame. ;
  • the first determining module 816 is configured to determine the key point corresponding to each of the key points in the current video frame image;
  • the second determining module 817 is configured to determine at least one target key point according to the distance and slope change between two adjacent key points;
  • An obtaining module 818 configured to connect the at least one target key point to obtain a character object contour of the current video frame image
  • the updating unit 819 is configured to update the probability that each pixel point in the current video frame image is foreground or background; the processing unit 812 is further configured to construct a graph according to the updated probabilities and perform a graph cut to obtain the character object of the current video frame image.
  • An embodiment of the present invention provides a device for video character segmentation: the first acquiring unit performs face detection on the first frame of the video image to be processed to obtain a face region of a person; the second acquiring unit obtains foreground seed pixels and background seed pixels according to the face region; the calculating unit calculates the probability that each pixel in the video image is foreground or background; and the processing unit constructs a graph according to the respective probabilities and performs a graph cut to obtain the character object.

Abstract

The present invention relates to the technical field of communication. Disclosed are a video character separation method and device, applicable to separation of various video character objects and capable of separating a complete character object in real time. In the technical solution provided by the embodiments of the present invention, human face detection is performed on a first frame of video image to be processed, so as to obtain the character face area, foreground seed pixels and background seed pixels are obtained according to the character face area, the probability of each pixel being the foreground or the background in the video image is calculated respectively according to the foreground seed pixels and the background seed pixels, an image is constructed according to each probability, and image cutting is performed to obtain the character object. The solution provided by the embodiments of the present invention is applicable to video object separation.

Description

一种视频人物分割的方法及装置 技术领域  Method and device for video character segmentation
本发明涉及通信技术领域, 尤其涉及一种视频人物分割的方法及装置。 背景技术  The present invention relates to the field of communications technologies, and in particular, to a method and apparatus for video character segmentation. Background technique
对象分割技术是指将视频或者图像上用户感兴趣的对象在像素级上与背 景实现分离, 分割出来的对象可以合成到新的背景中去。 现有的对人物对象 自动分割的技术中, 釆用高斯混合模型建立背景的颜色模型, 然后将视频帧 图像与建立的背景的颜色模型相减, 并进行阈值分割, 获得前景对象的颜色 模型。 利用前景 /背景的颜色模型以及相邻像素之间的颜色差异构造图, 通过 图切割来实现人物对象的自动分割, 釆用形态学开闭运算对切割出来的图像 进行平滑处理以优化分割结果。在对人物对象自动分割的技术中,釆用了 HSV ( Hue, Saturation, Value, 色相、饱和度、 色明度)颜色空间取代原始的 RGB ( Red, Green, Blue, 红、 绿、 蓝)颜色空间, 以减弱亮度变化对于分割质 量的影响。  The object segmentation technique refers to separating the object of interest to the user on the video or image at the pixel level from the background, and the segmented object can be synthesized into a new background. In the existing technique for automatically segmenting a human object, a Gaussian mixture model is used to establish a background color model, and then the video frame image is subtracted from the established background color model, and threshold segmentation is performed to obtain a color model of the foreground object. Using the foreground/background color model and the color difference structure between adjacent pixels, the object segmentation is automatically segmented by the graph cut, and the cut image is smoothed by the morphological opening and closing operation to optimize the segmentation result. In the technique of automatic segmentation of character objects, the original RGB (Red, Green, Blue, Red, Green, Blue) color space is replaced by the HSV (Hue, Saturation, Value) color space. To reduce the effect of brightness changes on the quality of the segmentation.
然而, 釆用现有技术对视频人物进行分割时, 高斯混合模型的分量数目 由人工设定, 对各类视频的适应性不强, 并且釆用开闭运算优化对象分割结 果时, 不能分割出完整的人物对象。 发明内容  However, when the video character is segmented by the prior art, the number of components of the Gaussian mixture model is manually set, and the adaptability to various types of video is not strong, and when the opening and closing operation is used to optimize the segmentation result of the object, the segmentation result cannot be segmented. Complete character object. Summary of the invention
本发明的实施例提供一种视频人物分割的方法及装置, 可以适用于各类 视频人物对象的分割, 并且可以实时地分割出完整人物对象。  Embodiments of the present invention provide a method and apparatus for video character segmentation, which can be applied to segmentation of various video character objects, and can segment a complete person object in real time.
为达到上述目的, 本发明的实施例釆用如下技术方案:  In order to achieve the above object, embodiments of the present invention use the following technical solutions:
一种视频人物分割的方法, 包括:  A method for segmenting video characters, including:
将待处理的第一帧视频图像进行人脸检测, 获取人物脸部区域; 根据所述人物脸部区域, 获取前景种子像素点和背景种子像素点; 根据所述前景种子像素点和所述背景种子像素点, 分别计算所述视频图 像中各个像素点为前景或者背景的概率; Performing face detection on the first frame of the video image to be processed to obtain a face region of the person; acquiring a foreground seed pixel point and a background seed pixel point according to the face region of the person; according to the foreground seed pixel point and the background Seed pixel points, respectively calculating the video map The probability that each pixel in the image is foreground or background;
根据所述各个概率构建图, 并进行图切割获取人物对象。  A map is constructed according to the respective probabilities, and a graph cut is performed to obtain a person object.
An apparatus for video character segmentation includes:

a first obtaining unit, configured to perform face detection on a first frame of a video image to be processed to obtain a face region of a person;

a second obtaining unit, configured to obtain foreground seed pixels and background seed pixels according to the face region;

a calculating unit, configured to calculate, according to the foreground seed pixels and the background seed pixels, the probability that each pixel in the video image belongs to the foreground or to the background;

a processing unit, configured to construct a graph according to these probabilities and to perform a graph cut to obtain the character object.

Embodiments of the present invention provide a method and an apparatus for video character segmentation. Face detection is performed on the first frame of the video image to be processed to obtain the face region; foreground and background seed pixels are obtained from that region; the probability that each pixel belongs to the foreground or to the background is calculated from the seed pixels; and a graph is constructed from these probabilities and cut to obtain the character object. In the prior art, the number of components of the Gaussian mixture model is set manually, adaptability to different kinds of video is poor, and a complete character object cannot be segmented when opening and closing operations are used to refine the result; by contrast, the solution provided by the embodiments of the present invention is applicable to various kinds of video characters and can segment a complete character object in real time.

BRIEF DESCRIPTION OF THE DRAWINGS

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a method for video character segmentation according to Embodiment 1 of the present invention;

FIG. 2 is a block diagram of an apparatus for video character segmentation according to Embodiment 1 of the present invention;

FIG. 3 is a flowchart of a method for video character segmentation according to Embodiment 2 of the present invention;

FIG. 4 is a schematic diagram of video character segmentation according to Embodiment 2 of the present invention;

FIG. 5 is a schematic diagram of graph cutting according to Embodiment 2 of the present invention;

FIG. 6 is a schematic diagram of contour determination according to Embodiment 2 of the present invention;

FIG. 7 is a schematic diagram of luminance change detection according to Embodiment 2 of the present invention;

FIG. 8 is a block diagram of an apparatus for video character segmentation according to Embodiment 2 of the present invention.

DETAILED DESCRIPTION
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiment 1
An embodiment of the present invention provides a method for video character segmentation. As shown in FIG. 1, the method includes:

Step 101: Perform face detection on a first frame of a video image to be processed to obtain a face region of a person.

Only the first frame of the video to be processed is handled in this way; when the current frame is not the first frame, the video image can be segmented quickly by exploiting the correlation between adjacent video frames.

Step 102: Obtain foreground seed pixels and background seed pixels according to the face region.

Step 103: Calculate, according to the foreground seed pixels and the background seed pixels, the probability that each pixel in the video image belongs to the foreground or to the background.

Step 104: Construct a graph according to these probabilities, and perform a graph cut to obtain the character object.
This embodiment thus provides a method for video character segmentation: face detection is performed on the first frame to obtain the face region; foreground and background seed pixels are obtained from that region; the probability of each pixel belonging to the foreground or to the background is calculated from the seed pixels; and a graph is constructed from these probabilities and cut to obtain the character object. The prior art adapts poorly to different kinds of video and, when opening and closing operations are used to refine the segmentation result, cannot segment a complete character object; by contrast, the solution provided by this embodiment is applicable to various kinds of video characters and can segment a complete character object in real time.
An embodiment of the present invention provides an apparatus for video character segmentation. As shown in FIG. 2, the apparatus includes a first obtaining unit 201, a second obtaining unit 202, a calculating unit 203, and a processing unit 204.

The first obtaining unit 201 is configured to perform face detection on a first frame of a video image to be processed to obtain a face region of a person.

The second obtaining unit 202 is configured to obtain foreground seed pixels and background seed pixels according to the face region.

The calculating unit 203 is configured to calculate, according to the foreground seed pixels and the background seed pixels, the probability that each pixel in the video image belongs to the foreground or to the background.

The processing unit 204 is configured to construct a graph according to these probabilities and to perform a graph cut to obtain the character object. With this apparatus, the first obtaining unit performs face detection on the first frame to obtain the face region; the second obtaining unit obtains foreground and background seed pixels from that region; the calculating unit computes the foreground/background probability of each pixel; and the processing unit constructs a graph from these probabilities and cuts it to obtain the character object. Compared with the prior art, in which the number of Gaussian mixture components is set manually, adaptability to different kinds of video is poor, and a complete character object cannot be segmented, the solution of this embodiment is applicable to various kinds of video characters and can segment a complete character object in real time.
Embodiment 2
An embodiment of the present invention provides a method for video character segmentation. As shown in FIG. 3, the method includes:

Step 301: Determine whether the video frame image to be processed is the first frame.

The purpose of this check is that, when the current frame is not the first frame, it can be processed according to the segmentation result of the previous frame, that is, by exploiting the correlation between adjacent video frame images, which speeds up processing.
Step 302: When the video frame image to be processed is the first frame, perform face detection on it to obtain the face region of the person.

Specifically, the AdaBoost algorithm is used for face detection. AdaBoost is an iterative algorithm whose core idea is to train different classifiers, called weak classifiers, on the same training set and then combine these weak classifiers into a stronger final classifier (a strong classifier). For face detection, a group of classifiers is trained on face images (positive samples) and non-face images (negative samples); every region of the input image to be processed is searched, and the trained classifiers decide whether it is a face region. The detected face region is shown as the rectangular area in FIG. 4(a).
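The weak-to-strong combination at the heart of AdaBoost can be sketched as follows. This is only a minimal illustration of the weighted-voting scheme, not the patent's trained face detector; the decision stumps, their training errors, and the two sample feature vectors are invented for the example.

```python
import math

def adaboost_predict(stumps, alphas, x):
    """Strong classifier: sign of the alpha-weighted vote of the weak classifiers."""
    score = sum(a * h(x) for h, a in zip(stumps, alphas))
    return 1 if score >= 0 else -1  # +1: face region, -1: non-face region

# Hypothetical weak classifiers (decision stumps on single feature responses).
stumps = [
    lambda x: 1 if x[0] > 0.5 else -1,          # e.g. a Haar-like feature response
    lambda x: 1 if x[1] > 0.2 else -1,
    lambda x: 1 if x[0] + x[1] > 0.9 else -1,
]
# Standard AdaBoost weights alpha_t = 0.5*ln((1-err_t)/err_t), assumed errors.
errors = [0.30, 0.40, 0.35]
alphas = [0.5 * math.log((1 - e) / e) for e in errors]

print(adaboost_predict(stumps, alphas, (0.8, 0.6)))  # a region that looks like a face
print(adaboost_predict(stumps, alphas, (0.1, 0.1)))  # a region that does not
```

In the real detector each stump thresholds one Haar-like feature, and a cascade of such strong classifiers scans every region of the input image.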
Step 303: Obtain foreground seed pixels and background seed pixels according to the face region.
The face region is adjusted moderately to generate a foreground sampling model and a background sampling model. Specifically, as shown in FIG. 4(b), the face region is slightly shrunk; the distance between the face region and the upper-body region is then determined from the height of the face region, and the upper-body region is determined from the ratio between the head width and the shoulder width. This yields the foreground sampling model: the pixels inside the region enclosed by the light polyline in FIG. 4(b) are the foreground seed pixels.

As shown in FIG. 4(c), the foreground sampling model is enlarged to generate the background sampling model: the pixels in the region between the dark polyline in FIG. 4(c) and the image boundary are the background seed pixels.
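The geometric construction of the two sampling regions can be sketched as below. The shrink factor, face-to-body gap ratio, head-to-shoulder ratio, and expansion factor are assumed values for illustration; the patent only states that they are derived from the height of the face region and the head/shoulder width ratio.

```python
def seed_regions(face, img_w, img_h, shrink=0.8, gap_ratio=0.2,
                 shoulder_ratio=2.0, expand=1.3):
    """Derive sampling rectangles (x, y, w, h) from a detected face rectangle.
    shrink, gap_ratio, shoulder_ratio and expand are assumed values."""
    x, y, w, h = face
    # Moderately shrink the face rectangle (foreground seeds, part 1).
    fw, fh = w * shrink, h * shrink
    face_fg = (x + (w - fw) / 2, y + (h - fh) / 2, fw, fh)
    # Upper-body rectangle (foreground seeds, part 2): its distance below the
    # face comes from the face height, its width from the head/shoulder ratio.
    gap = h * gap_ratio
    bw = min(w * shoulder_ratio, img_w)
    by = min(face_fg[1] + fh + gap, img_h)
    body_fg = (max(x + w / 2 - bw / 2, 0), by, bw, img_h - by)
    # Background seeds: pixels between an enlarged copy of the foreground
    # region and the image boundary.
    ew = min(bw * expand, img_w)
    ey = max(y - h * 0.2, 0)
    bg_box = (max(x + w / 2 - ew / 2, 0), ey, ew, img_h - ey)
    return face_fg, body_fg, bg_box

face_fg, body_fg, bg_box = seed_regions((100, 50, 60, 60), 320, 240)
print(face_fg)  # (106.0, 56.0, 48.0, 48.0): the shrunken face rectangle
```

Background seed pixels would then be sampled outside `bg_box` up to the image border, matching FIG. 4(c).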
Step 304: Determine, from the foreground seed pixels, three groups of sample values on the L, a, and b color components, and determine, from the background seed pixels, three groups of sample values on the L, a, and b color components.

The solution of this embodiment converts the video image from the RGB (Red, Green, Blue) space to the Lab space. Lab consists of three channels: L is a luminance channel, and a and b are two color channels, where a ranges from magenta to green and b ranges from yellow to blue. The three color components L, a, and b are mutually independent. For the n sampled foreground seed pixels, three groups of sample values are obtained on the three components: {l1^F, l2^F, ..., ln^F}, {a1^F, a2^F, ..., an^F}, and {b1^F, b2^F, ..., bn^F}. Likewise, for the n sampled background seed pixels, three groups of sample values {l1^B, l2^B, ..., ln^B}, {a1^B, a2^B, ..., an^B}, and {b1^B, b2^B, ..., bn^B} are obtained, where the superscripts F and B denote foreground and background, respectively.
Step 305: Calculate, from the sample values of the foreground seed pixels and the background seed pixels, a first foreground probability and a first background probability for each pixel in the video image.

Specifically, a kernel density estimation method is used to construct non-parametric foreground/background models. For any pixel of the first video frame to be processed, the non-parametric kernel density estimate on one color component is

f(x) = (1/n) * sum_{i=1..n} (1 / (sqrt(2*pi) * sigma)) * exp(-(x - xi)^2 / (2 * sigma^2)),

where xi denotes the sample value of the i-th foreground (or background) seed pixel on that component, x denotes the value of the pixel under consideration on the same component, and sigma is the kernel bandwidth.

From f(x), the quantities fL^F(x), fa^F(x), fb^F(x) and fL^B(x), fa^B(x), fb^B(x) are calculated, where fL^F(x), fa^F(x), fb^F(x) denote the foreground probabilities of the pixel on the L, a, and b components, and fL^B(x), fa^B(x), fb^B(x) denote its background probabilities on the same components.

Because the three color components are mutually independent, the first foreground probability of any pixel in the video image is calculated as f^F(x) = fL^F(x) * fa^F(x) * fb^F(x), and the first background probability as f^B(x) = fL^B(x) * fa^B(x) * fb^B(x).
Step 306: Normalize the first foreground probability and the first background probability to obtain the probability that each pixel in the video image belongs to the foreground or to the background.

Specifically, the foreground and background probabilities of a pixel x are normalized: the probability that the pixel belongs to the foreground is calculated as P_F(x) = f^F(x) / [f^F(x) + f^B(x)], and the probability that it belongs to the background as P_B(x) = 1 - P_F(x).

Each pixel of the first video frame is processed by the methods of step 305 and step 306 to obtain its probability of belonging to the foreground or to the background. As shown in FIG. 4(d), the higher the value, the brighter the pixel and the larger its probability of being foreground; likewise, the darker the pixel, the larger its probability of being background.
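Steps 305 and 306 can be sketched as follows for a single pixel. The Gaussian kernel bandwidth and the seed sample values are assumptions for illustration; each Lab channel is modeled independently, and the per-channel densities are multiplied and then normalized exactly as in the text.

```python
import math

def kde(x, samples, sigma=8.0):
    """Gaussian kernel density estimate of one color component at value x."""
    n = len(samples)
    norm = n * math.sqrt(2 * math.pi) * sigma
    return sum(math.exp(-(x - xi) ** 2 / (2 * sigma ** 2)) for xi in samples) / norm

def fg_bg_probability(pixel_lab, fg_seeds, bg_seeds):
    """pixel_lab: (L, a, b) values; *_seeds: per-channel seed sample lists."""
    f_fg, f_bg = 1.0, 1.0
    for i, ch in enumerate("Lab"):
        f_fg *= kde(pixel_lab[i], fg_seeds[ch])  # channels are independent
        f_bg *= kde(pixel_lab[i], bg_seeds[ch])
    p_fg = f_fg / (f_fg + f_bg)                  # normalization of step 306
    return p_fg, 1.0 - p_fg

# Hypothetical seed samples: foreground skin-like, background darker/neutral.
fg = {"L": [60, 62, 58], "a": [20, 22, 18], "b": [15, 14, 16]}
bg = {"L": [20, 22, 18], "a": [0, 2, -2], "b": [0, 1, -1]}
p_f, p_b = fg_bg_probability((59, 19, 15), fg, bg)
print(round(p_f, 3))  # close to 1: this pixel resembles the foreground seeds
```

In the full method this computation is repeated for every pixel of the first frame to produce the probability map of FIG. 4(d).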
Step 307: Construct a graph according to these probabilities, and perform a graph cut to obtain the character object.

It should be noted that, as shown in FIG. 5(a), a graph G = {V, E, W} consists of: V (vertices), the basic elements of G, representing the interrelated individuals in the graph, where v_i denotes the i-th vertex and v_j the j-th vertex; E (edges), the connections between associated vertices, where E_ij denotes the edge connecting vertices i and j; and W (weights), where a weight is a value assigned to an edge to express how closely its two vertices are related, with w_ij the weight of the edge connecting vertices i and j.

The solution of this embodiment performs the graph cut with the max-flow/min-cut algorithm. As shown in FIG. 5(b), all vertices of the graph are partitioned into two subsets, and the edges between the two subsets form a cut of the graph, shown as the dashed line in FIG. 5(b). The two subsets contain, respectively, a virtual source and a virtual sink; the source corresponds to the foreground seed pixels and the sink to the background seed pixels. Among all cuts separating the source from the sink, the one with the smallest total weight is the minimum cut. A basic way to find it is to find the maximum flow from the source to the sink: regard each edge as a water pipe whose capacity is the edge weight. The maximum flow is the largest amount of water that can pass from the source to the sink; when the flow reaches this maximum, the pipes that are completely saturated form exactly the minimum cut separating the source and the sink.
In this embodiment, the graph is built by constructing energy terms between pixels. Specifically, each pixel of the video frame image corresponds to a vertex of the graph, and the edges of the graph connect adjacent pixels accordingly; each edge is assigned a weight expressing the relationship between the two pixels it connects, such as their color similarity and their relationship to the source and the sink. The foreground probability of a pixel expresses its relationship to the source, and its background probability expresses its relationship to the sink. The source and the sink represent the foreground and background seed pixels, respectively: the larger the foreground probability of a pixel, the closer its relationship to the source and the more likely it belongs to the character object; the larger its background probability, the closer its relationship to the sink and the more likely it belongs to the background.

Specifically, the data energy term of any pixel x in the video image is calculated as

E_d(x) = P_F(x) if B(x) = 0, and E_d(x) = P_B(x) if B(x) = 1,

and the smoothing energy term between any pixel x and an adjacent pixel y as

E_s(x, y) = alpha if B(x) != B(y), and E_s(x, y) = 0 if B(x) = B(y),

where B is a binary variable, B(x) = 0 means the pixel x belongs to the background, and B(x) = 1 means it belongs to the foreground; E_d(x) is the data energy term; E_s(x, y) is the smoothing energy term over adjacent pixels; the pixel y is any pixel in the 4-neighborhood of x; and alpha is a parameter, which may be 1.5.

The graph is constructed from the data energy term of every pixel and the smoothing energy terms between every pixel and its adjacent pixels.

In this embodiment, the problem of segmenting the character object in the video frame image is thereby converted into the problem of partitioning the constructed graph; specifically, the max-flow/min-cut algorithm can be used to perform the graph cut, so that the character object is obtained.
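The energy model of step 307 can be illustrated on a tiny strip of pixels. For clarity this sketch minimizes the energy by brute force over all labelings instead of by max-flow/min-cut; the minimizer is the same, since the graph cut finds exactly the labeling with minimum total energy. The foreground probabilities are assumed values, and the data term follows the reconstruction above (cost P_F for labeling a pixel background, P_B for labeling it foreground).

```python
from itertools import product

ALPHA = 1.5  # smoothness weight, the value suggested in the text

def total_energy(labels, p_fg):
    """Data term: P_F(x) when B(x)=0, P_B(x)=1-P_F(x) when B(x)=1.
    Smoothness term: ALPHA for every adjacent pair with different labels."""
    data = sum(p if b == 0 else (1.0 - p) for b, p in zip(labels, p_fg))
    smooth = ALPHA * sum(1 for i in range(len(labels) - 1)
                         if labels[i] != labels[i + 1])
    return data + smooth

# A 1-D "image" of six pixels with assumed foreground probabilities.
p_fg = [0.9, 0.8, 0.7, 0.2, 0.1, 0.1]
best = min(product([0, 1], repeat=len(p_fg)),
           key=lambda ls: total_energy(ls, p_fg))
print(best)  # (1, 1, 1, 0, 0, 0): foreground left, background right, one boundary
```

On a real image the 2^n enumeration is infeasible, which is why the max-flow/min-cut algorithm is used to find the same minimum-energy cut in polynomial time.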
Steps 302 to 307 form the processing flow for the first frame image. When the image to be processed is not the first frame, running steps 302 to 307 on every frame would be quite time-consuming, so the following procedure is further adopted when processing the whole video sequence.

Step 308: When the video frame image to be processed is not the first frame, perform luminance change detection on it to obtain the luminance difference distance between the current video frame image and the previous one.

Whether the non-parametric foreground/background models need updating depends mainly on scene changes, and one major factor is luminance change. A luminance change may be caused by the surrounding environment or by the video capture device; such a change can make the foreground/background probabilities computed with the current non-parametric models fit the current video frame poorly.
Specifically, the luminance change detection uses the Bhattacharyya distance to compare the luminance histogram H1 of the current frame with the luminance histogram H0 of the previous frame, that is,

D(H1, H0) = sqrt(1 - sum_i sqrt(H1(i) * H0(i))),

where H1(i) is the value of histogram H1 at gray level i, H0(i) is the value of histogram H0 at gray level i, and both histograms are normalized so that their bins sum to 1.
Step 309: Determine whether the luminance difference distance is smaller than a preset threshold.

The preset threshold is determined experimentally and may be 0.1.

When the luminance difference distance is not smaller than the preset threshold, the current frame is processed as if it were a first frame. FIG. 7 shows two consecutive frames, together with their histograms, whose luminance difference distance exceeds the preset threshold; such frames are processed according to steps 302 to 307.
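The luminance-change test of steps 308 and 309 can be sketched as follows. The histograms are assumed to be normalized to sum to 1; the 4-bin toy histograms are invented for the example, and the 0.1 threshold follows the text.

```python
import math

def bhattacharyya_distance(h1, h0):
    """Bhattacharyya distance between two normalized luminance histograms."""
    coeff = sum(math.sqrt(a * b) for a, b in zip(h1, h0))
    return math.sqrt(max(0.0, 1.0 - coeff))  # clamp against rounding error

def needs_full_resegmentation(h1, h0, threshold=0.1):
    """True when the luminance changed enough to rerun steps 302-307."""
    return bhattacharyya_distance(h1, h0) >= threshold

prev = [0.25, 0.25, 0.25, 0.25]          # previous frame, 4-bin toy histogram
curr_similar = [0.24, 0.26, 0.25, 0.25]  # small drift: keep the current models
curr_changed = [0.70, 0.10, 0.10, 0.10]  # strong luminance change
print(needs_full_resegmentation(curr_similar, prev))  # False
print(needs_full_resegmentation(curr_changed, prev))  # True
```

Identical histograms give a distance of 0 and disjoint ones give 1, so the 0.1 threshold only triggers a full re-segmentation on a clear scene change.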
Step 310: When the luminance difference distance is smaller than the preset threshold, determine the character-object contour of the current video frame image from the character-object contour of the previous frame.

Specifically, at least one key point on the character-object contour of the previous frame is extracted from the binary map of the previous frame's segmentation result. Feature points where the direction of the contour changes abruptly can be extracted and then sampled at equal intervals to obtain a suitable number of key points; the start point and end point of the contour, as well as feature points close to the bottom of the image, are also selected as key points. The dozens of dark dots inside the gray band region shown in FIG. 6 are the key points.
According to the at least one key point, the corresponding key point of each key point in the current video frame image is determined.

Specifically, for each key point x, its motion vector is estimated by minimizing the energy function E = L + r * G, where the parameter r controls the weighting between the energy terms L and G.

The energy term L is defined as

L = sum_{x' in W} M(x') * ||I0(z_x') - I1(z_x' + mv_x)||,

where z_x denotes the position of x, mv_x denotes the motion vector estimated for the pixel x, W denotes a window centered on the pixel x in the previous frame, I0 and I1 denote the previous and the current frame, respectively, M(x') = 1 if the pixel at the same position as x' belonged to the foreground region in the previous frame, and M(x') = 0 otherwise.

The energy term G depends on the gradient information of the current frame:

G = exp(-max_{c in {L, a, b}} ||g_c(z_x + mv_x)||),

where g_c denotes the gradient value on color component c.

The range of the motion vector of the pixel x can be set to 4 pixels up, down, left, and right, giving (2*4+1) * (2*4+1) = 81 candidate motion vectors. For each candidate, a value of the energy function E can be computed, and the motion vector with the smallest E is selected as the motion vector of the pixel x; in this way, the key point corresponding to x in the current frame is obtained.
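The key-point tracking of step 310 can be sketched as a search over the 81 candidate motion vectors. The toy frames, the 3x3 window, the energy weight r, and the one-dimensional gradient used as a stand-in for the term G are all assumptions for illustration.

```python
import math

def estimate_motion_vector(prev, curr, key, mask, search=4, win=1, r=0.5):
    """Brute-force search over the (2*search+1)^2 = 81 candidate motion vectors
    for key point `key`, minimizing E = L + r*G as described in the text.
    prev/curr: 2-D grayscale frames; mask[y][x] = 1 where prev was foreground."""
    h, w = len(prev), len(prev[0])
    kx, ky = key
    best_mv, best_e = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            L = 0.0
            for wy in range(-win, win + 1):      # window W in the previous frame
                for wx in range(-win, win + 1):
                    y0, x0 = ky + wy, kx + wx
                    y1, x1 = y0 + dy, x0 + dx
                    if 0 <= y0 < h and 0 <= x0 < w and 0 <= y1 < h and 0 <= x1 < w:
                        L += mask[y0][x0] * abs(prev[y0][x0] - curr[y1][x1])
                    else:
                        L += 255.0               # penalize leaving the frame
            # Toy stand-in for G: small (good) where the landing point is an edge.
            yy, xx = ky + dy, kx + dx
            if 0 <= yy < h and 0 < xx < w - 1:
                G = math.exp(-abs(curr[yy][xx + 1] - curr[yy][xx - 1]))
            else:
                G = 1.0
            if L + r * G < best_e:
                best_e, best_mv = L + r * G, (dx, dy)
    return best_mv

# A bright square moves 2 px to the right between two 12x12 toy frames.
prev = [[255 if 3 <= x <= 6 and 3 <= y <= 6 else 0 for x in range(12)] for y in range(12)]
curr = [[255 if 5 <= x <= 8 and 3 <= y <= 6 else 0 for x in range(12)] for y in range(12)]
mask = [[1 if v else 0 for v in row] for row in prev]
print(estimate_motion_vector(prev, curr, key=(6, 4), mask=mask))  # (2, 0)
```

The mask term M keeps the match anchored on foreground pixels, and the gradient term G breaks ties in favor of landing the key point on an edge of the current frame, as a contour point should.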
At least one target key point is then determined according to the changes in distance and slope between adjacent key points.

Specifically, after all key points of the current frame have been obtained, key points that contribute little to the shape of the character-object contour are discarded according to the changes in distance and slope between adjacent key points, and the remaining key points are taken as the target key points.

The at least one target key point is connected to obtain the character-object contour of the current video frame image.
Connecting the at least one target key point yields an approximate character contour; an erosion and dilation operation is performed on this contour to obtain the uncertain region, namely the gray band region shown in FIG. 6.
Step 311: Update, according to the character-object contour, the probability that each pixel in the current video frame image belongs to the foreground or to the background.

Specifically, the currently determined contour is only an approximate character-object contour. As shown in FIG. 6, the white region is the foreground, the black region is the background, and the gray band is the uncertain region, which may turn out to be either foreground or background. The non-parametric foreground/background models of the pixels in the current frame are updated from the non-parametric foreground/background models of the previous video frame; that is, the probability of each pixel belonging to the foreground or to the background is determined, specifically by the methods of steps 305 and 306.
It should be noted that, because most of the foreground/background regions of the current video frame image have already been determined from the previous frame, the segmentation of the character object over the whole video is accelerated.
Step 312: Construct a graph according to these probabilities, and perform a graph cut to obtain the character object of the current video frame image.

Specifically, the character object of the video frame image is cut out by the method of step 307.
本发明实施例提供的一种视频人物分割的方法, 通过对视频帧图像中人物对象的分割, 当采用现有技术时, 对各类视频的适应性不强, 并且采用开闭运算优化对象分割结果时, 不能分割出完整的人物对象, 本发明实施例提供的方案可以适用于各类视频人物的分割, 并且可以实时地分割出完整人物对象, 基于相邻视频帧之间的相关性, 可以快速地分割整个视频。  The embodiment of the present invention provides a video person segmentation method that segments the person object in video frame images. The prior art adapts poorly to the various types of video, and when the segmentation result is optimized by morphological opening and closing operations, a complete person object cannot be segmented. The scheme provided by the embodiment of the present invention is applicable to the segmentation of persons in various types of video, can segment a complete person object in real time, and, based on the correlation between adjacent video frames, can quickly segment the entire video.
本发明实施例提供一种视频人物分割的装置, 如图 8所示, 该装置包括: 判断单元 801 , 第一获取单元 802 , 第二获取单元 803 , 计算单元 804 , 确定模块 805 , 第一计算模块 806 , 第一计算子模块 807 , 第二计算子模块 808 , 第二计算模块 809 , 第一计算子模块 810, 第二计算子模块 811 , 处理单元 812 , 检测获取单元 813 , 确定单元 814 , 提取模块 815 , 第一确定模块 816 , 第二确定模块 817 , 获取模块 818 , 更新单元 819。  An embodiment of the present invention provides a device for video person segmentation. As shown in FIG. 8, the device includes: a determining unit 801, a first acquiring unit 802, a second acquiring unit 803, a calculating unit 804, a determining module 805, a first calculating module 806, a first calculating sub-module 807, a second calculating sub-module 808, a second calculating module 809, a first calculating sub-module 810, a second calculating sub-module 811, a processing unit 812, a detection acquiring unit 813, a determining unit 814, an extracting module 815, a first determining module 816, a second determining module 817, an acquiring module 818, and an updating unit 819.
判断单元 801 , 用于判断待处理的视频帧图像是否为第一帧;  The determining unit 801 is configured to determine whether the image of the video frame to be processed is the first frame;
当待处理的视频帧图像为第一帧时, 第一获取单元 802 , 用于将待处理的第一帧视频图像进行人脸检测, 获取人物脸部区域; 可以采用 AdaBoost 算法进行人脸检测, 利用人脸图像和非人脸图像, 训练一组分类器, 其中, 人脸图像为正样本, 非人脸图像为负样本; 搜索输入的待处理的图像的每个区域, 利用这组分类器判断出人脸区域。  When the video frame image to be processed is the first frame, the first acquiring unit 802 is configured to perform face detection on the first frame of the video image to be processed to obtain the face region of the person. The AdaBoost algorithm may be used for face detection: a group of classifiers is trained with face images as positive samples and non-face images as negative samples; each region of the input image to be processed is then searched, and the group of classifiers determines the face region.
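The train-on-positive/negative-samples idea behind AdaBoost can be illustrated with a toy implementation. Note this is a simplified decision-stump booster on plain feature vectors, not the Haar-feature cascade typically used for face detection; the round count and stump search are illustrative choices only.

```python
import numpy as np

def train_adaboost_stumps(X, y, rounds=10):
    """AdaBoost with depth-1 decision stumps.
    X: (n, d) features; y: labels in {-1, +1} (+1 = face / positive).
    Returns a list of weak learners (feature, threshold, polarity, alpha)."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)           # sample weights, re-weighted each round
    learners = []
    for _ in range(rounds):
        best = None
        for f in range(d):            # exhaustive stump search
            for t in np.unique(X[:, f]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, t, pol, pred)
        err, f, t, pol, pred = best
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # learner weight
        w *= np.exp(-alpha * y * pred)          # boost the misclassified
        w /= w.sum()
        learners.append((f, t, pol, alpha))
        if err < 1e-9:
            break
    return learners

def predict(learners, X):
    """Sign of the alpha-weighted vote of all stumps."""
    s = np.zeros(len(X))
    for f, t, pol, alpha in learners:
        s += alpha * np.where(pol * (X[:, f] - t) >= 0, 1, -1)
    return np.where(s >= 0, 1, -1)
```

In the patent's setting, the rows of `X` would be features extracted from face (positive) and non-face (negative) image windows, and `predict` would be run on every candidate region of the input frame.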
根据所述人物脸部区域, 第二获取单元 803 获取前景种子像素点和背景 种子像素点;  According to the face area of the person, the second acquiring unit 803 acquires a foreground seed pixel point and a background seed pixel point;
具体地, 对第一获取单元 802 获取的人物脸部区域进行适度的调整, 即对人物脸部区域进行适度缩小, 然后根据人物脸部区域的高度确定人物脸部区域和上半身区域之间的距离, 根据人物头部和肩部的宽度的比例确定上半身的区域, 这样即可生成前景采样模型, 其中前景采样模型内包含的像素点即为前景种子像素点; 在确定的前景采样模型的基础上进行扩大, 生成背景采样模型, 其中背景采样模型内包含的像素点即为背景种子像素点。  Specifically, the face region acquired by the first acquiring unit 802 is moderately adjusted, that is, moderately shrunk; the distance between the face region and the upper-body region is then determined according to the height of the face region, and the upper-body region is determined according to the ratio of the widths of the head and the shoulders, thereby generating the foreground sampling model, in which the contained pixels are the foreground seed pixels. The foreground sampling model is then enlarged to generate the background sampling model, in which the contained pixels are the background seed pixels.
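The box geometry described above can be sketched as follows. The patent does not give concrete numbers, so every ratio here (`shrink`, `shoulder_ratio`, the 0.2/1.5 torso offsets, `margin`) is a hypothetical placeholder chosen only to make the idea concrete.

```python
def seed_regions(face, img_w, img_h, shrink=0.8, shoulder_ratio=2.0, margin=0.5):
    """face = (x, y, w, h) rectangle from the face detector.
    Returns (fg_boxes, bg_box), each box as (x, y, w, h).
    All ratios are illustrative assumptions, not values from the patent."""
    x, y, w, h = face
    # moderately shrink the detected face rectangle
    fw, fh = w * shrink, h * shrink
    fx, fy = x + (w - fw) / 2, y + (h - fh) / 2
    # torso box below the face: the gap scales with the face height,
    # the shoulders are wider than the head by `shoulder_ratio`
    tw = fw * shoulder_ratio
    tx = fx + (fw - tw) / 2
    ty = fy + fh + 0.2 * fh
    th = 1.5 * fh
    fg_boxes = [(fx, fy, fw, fh), (tx, ty, tw, th)]
    # background sampling box: the union of the foreground boxes, enlarged
    # and clipped to the image; pixels inside it but outside the foreground
    # boxes can serve as background seeds
    x0, y0 = min(fx, tx), fy
    x1, y1 = max(fx + fw, tx + tw), ty + th
    mx, my = (x1 - x0) * margin, (y1 - y0) * margin
    bx, by = max(0.0, x0 - mx), max(0.0, y0 - my)
    bg_box = (bx, by,
              min(float(img_w), x1 + mx) - bx,
              min(float(img_h), y1 + my) - by)
    return fg_boxes, bg_box
```

The foreground boxes supply the foreground seed pixels; the enlarged `bg_box` plays the role of the expanded background sampling model.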
计算单元 804 , 用于根据所述前景种子像素点和所述背景种子像素点, 分 别计算所述视频图像中各个像素点为前景或者背景的概率;  The calculating unit 804 is configured to separately calculate a probability that each pixel in the video image is a foreground or a background according to the foreground seed pixel point and the background seed pixel point;
具体地, 在计算所述视频图像中各个像素点为前景或者背景的概率时, 所述计算单元 804 中的确定模块 805, 用于根据所述前景种子像素点分别确定所述前景种子像素点在 L、 a、 b 三种颜色分量上的三组样本值, 根据所述背景种子像素点分别确定所述背景种子像素点在 L、 a、 b 三种颜色分量上的三组样本值;  Specifically, when calculating the probability that each pixel in the video image is foreground or background, the determining module 805 in the calculating unit 804 is configured to determine, from the foreground seed pixels, three sets of sample values of the foreground seed pixels on the three color components L, a, and b, and to determine, from the background seed pixels, three sets of sample values of the background seed pixels on the three color components L, a, and b;
第一计算模块 806, 用于根据所述前景种子像素点和所述背景种子像素点的样本值, 分别计算所述视频图像中各个像素点的第一前景概率和第一背景概率;  The first calculating module 806 is configured to calculate the first foreground probability and the first background probability of each pixel in the video image according to the sample values of the foreground seed pixels and the background seed pixels;
进一步地, 所述第一计算模块 806 中的第一计算子模块 807, 用于根据所述前景种子像素点和所述背景种子像素点的样本值, 分别计算 f_L^F(x)、 f_a^F(x)、 f_b^F(x) 和 f_L^B(x)、 f_a^B(x)、 f_b^B(x); 其中, x 表示所述视频图像中任一个像素点, f_L^F(x)、 f_a^F(x)、 f_b^F(x) 分别表示所述像素点在 L、 a、 b 三种颜色分量上的前景概率, f_L^B(x)、 f_a^B(x)、 f_b^B(x) 分别表示所述像素点在 L、 a、 b 三种颜色分量上的背景概率;  Further, the first calculating sub-module 807 in the first calculating module 806 is configured to calculate f_L^F(x), f_a^F(x), f_b^F(x) and f_L^B(x), f_a^B(x), f_b^B(x) according to the sample values of the foreground seed pixels and the background seed pixels, where x denotes any pixel in the video image, f_L^F(x), f_a^F(x), f_b^F(x) denote the foreground probabilities of the pixel on the three color components L, a, and b, and f_L^B(x), f_a^B(x), f_b^B(x) denote the background probabilities of the pixel on the three color components L, a, and b;
第二计算子模块 808, 用于根据 f^F(x) = f_L^F(x) * f_a^F(x) * f_b^F(x), 计算得到所述视频图像中任一个像素点的第一前景概率;  The second calculating sub-module 808 is configured to calculate the first foreground probability of any pixel in the video image according to f^F(x) = f_L^F(x) * f_a^F(x) * f_b^F(x);
所述第二计算子模块 808 还用于, 根据 f^B(x) = f_L^B(x) * f_a^B(x) * f_b^B(x), 计算得到所述视频图像中任一个像素点的第一背景概率。  The second calculating sub-module 808 is further configured to calculate the first background probability of any pixel in the video image according to f^B(x) = f_L^B(x) * f_a^B(x) * f_b^B(x).
将所述第一前景概率和所述第一背景概率进行归一化处理, 第二计算模 块 809, 计算得到所述视频图像中各个像素点为前景或者背景的概率;  The first foreground probability and the first background probability are normalized, and the second computing module 809 calculates a probability that each pixel in the video image is a foreground or a background;
具体地, 所述第二计算模块 809 中的第一计算子模块 810, 用于根据 p_F(x) = f^F(x) / [f^F(x) + f^B(x)], 计算得到所述视频图像中任一个像素点为前景的概率;  Specifically, the first calculating sub-module 810 in the second calculating module 809 is configured to calculate the probability that any pixel in the video image is foreground according to p_F(x) = f^F(x) / [f^F(x) + f^B(x)];
第二计算子模块 811, 用于根据 p_B(x) = 1 - p_F(x), 计算得到所述视频图像中任一个像素点为背景的概率。  The second calculating sub-module 811 is configured to calculate the probability that any pixel in the video image is background according to p_B(x) = 1 - p_F(x).
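The per-channel non-parametric model, the per-pixel product over the L, a, b components, and the normalisation step can be sketched together. The patent does not specify the density estimator; this sketch assumes a 1-D Gaussian kernel density per channel with an arbitrary bandwidth, which is one common non-parametric choice, not necessarily the patent's.

```python
import numpy as np

def channel_density(samples, values, bandwidth=8.0):
    """1-D Gaussian kernel density of one L*a*b* channel, evaluated at
    each pixel value. The bandwidth is an assumed illustrative choice."""
    d = values[:, None] - samples[None, :]
    return np.exp(-0.5 * (d / bandwidth) ** 2).mean(axis=1)

def fg_bg_probability(pix_lab, fg_seeds, bg_seeds):
    """pix_lab: (n, 3) pixel colours; fg_seeds / bg_seeds: (m, 3) seed
    samples in L*a*b*. Returns (p_fg, p_bg), normalised so that
    p_fg + p_bg == 1 for every pixel."""
    fF = np.ones(len(pix_lab))
    fB = np.ones(len(pix_lab))
    for c in range(3):                 # product over the L, a, b channels
        fF *= channel_density(fg_seeds[:, c], pix_lab[:, c])
        fB *= channel_density(bg_seeds[:, c], pix_lab[:, c])
    p_fg = fF / (fF + fB)              # normalisation step
    return p_fg, 1.0 - p_fg
```

Pixels coloured like the foreground seeds come out with p_fg near 1, pixels coloured like the background seeds near 0; these are the per-pixel probabilities fed into the graph construction.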
当确定当前视频帧图像中的像素点的前景 /背景概率后, 处理单元 812 根据所述各个概率构建图, 并进行图切割获取人物对象;  After the foreground/background probabilities of the pixels in the current video frame image are determined, the processing unit 812 constructs a graph according to the respective probabilities and performs graph cutting to obtain the person object;
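The graph construction and cut can be illustrated on a toy scale: each pixel becomes a node, terminal links to a virtual source/sink carry the foreground/background probabilities, and neighbour links carry a smoothness cost; the minimum s-t cut then labels the pixels. This is a generic sketch with illustrative weights and a plain Edmonds–Karp max-flow, not the patent's graph-cut implementation (production systems use specialised max-flow solvers).

```python
from collections import deque

def max_flow_min_cut(n, edges, s, t):
    """Edmonds-Karp max-flow on a small dense graph.
    edges: list of (u, v, capacity), directed; add both directions for
    undirected neighbour links. Returns the source side of the min cut."""
    cap = [[0.0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:      # BFS for a shortest augmenting path
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break
        f, v = float('inf'), t            # bottleneck along the path
        while v != s:
            f = min(f, cap[parent[v]][v]); v = parent[v]
        v = t
        while v != s:                     # push flow, update residuals
            cap[parent[v]][v] -= f
            cap[v][parent[v]] += f
            v = parent[v]
    seen, q = {s}, deque([s])             # nodes still reachable from s
    while q:
        u = q.popleft()
        for v in range(n):
            if v not in seen and cap[u][v] > 1e-12:
                seen.add(v); q.append(v)
    return seen
```

For example, three pixels in a row with foreground probabilities 0.9, 0.8, 0.1, source t-link capacity p, sink t-link capacity 1 − p, and neighbour capacity 0.2, cut so that the first two pixels land on the foreground side.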
当待处理的视频帧图像不是第一帧时, 检测获取单元 813 对所述视频帧图像进行亮度变化检测, 获得当前所述视频帧图像与前一帧视频图像之间的亮度差异距离;  When the video frame image to be processed is not the first frame, the detection acquiring unit 813 performs brightness change detection on the video frame image to obtain the brightness difference distance between the current video frame image and the previous video frame image.
具体地, 亮度变化检测主要利用 Bhattacharyya 距离来计算当前帧的亮度直方图 H1 与前一帧的亮度直方图 H0 之间的差异, 即根据
d(H1, H0) = sqrt(1 - Σ_i sqrt(H1(i) * H0(i)))
其中, H1(i) 是直方图 H1 在灰度阶 i 的值, H0(i) 是直方图 H0 在灰度阶 i 的值。  Specifically, the brightness change detection uses the Bhattacharyya distance to measure the difference between the luminance histogram H1 of the current frame and the luminance histogram H0 of the previous frame, that is, d(H1, H0) = sqrt(1 - Σ_i sqrt(H1(i) * H0(i))), where H1(i) is the value of histogram H1 at gray level i and H0(i) is the value of histogram H0 at gray level i.
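The histogram comparison can be sketched directly in NumPy. This uses the standard Bhattacharyya form d = sqrt(1 − Σ_i sqrt(H1(i)·H0(i))) on histograms normalised to sum to 1 (an assumption, since the original formula survives only as an image placeholder), with 256 gray levels.

```python
import numpy as np

def luminance_histogram(gray, bins=256):
    """Normalised luminance histogram of a frame (pixel values in 0..255)."""
    h, _ = np.histogram(gray, bins=bins, range=(0, 256))
    return h / h.sum()

def bhattacharyya_distance(h1, h0):
    """Distance between two normalised histograms:
    0 = identical distributions, 1 = completely disjoint."""
    bc = np.sum(np.sqrt(h1 * h0))        # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))   # clamp guards tiny float negatives
```

If the distance stays below the preset threshold, the lighting is considered stable and the previous frame's contour can be propagated; a large distance would instead trigger re-detection.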
当所述亮度差异距离小于预设阈值时, 确定单元 814 根据所述前一帧视频图像的人物对象轮廓, 确定当前所述视频帧图像的人物对象轮廓;  When the brightness difference distance is less than the preset threshold, the determining unit 814 determines the person object contour of the current video frame image according to the person object contour of the previous video frame image;
进一步地, 所述确定单元 814中的提取模块 815 , 用于根据所述前一帧视 频图像的分割结果的二值图, 提取所述前一帧视频图像的人物对象轮廓上的 至少一个关键点;  Further, the extracting module 815 in the determining unit 814 is configured to extract at least one key point on the contour of the character object of the video image of the previous frame according to the binary image of the segmentation result of the video image of the previous frame. ;
根据至少一个所述关键点, 第一确定模块 816 确定每个所述关键点在当前所述视频帧图像中对应的关键点; 根据相邻的两个所述关键点之间的距离及斜率变化, 第二确定模块 817 确定至少一个目标关键点;  According to the at least one key point, the first determining module 816 determines the key point corresponding to each key point in the current video frame image; according to the distance and slope change between two adjacent key points, the second determining module 817 determines at least one target key point;
获取模块 818 , 用于将所述至少一个目标关键点连接起来, 获得当前所述 视频帧图像的人物对象轮廓;  An obtaining module 818, configured to connect the at least one target key point to obtain a character object contour of the current video frame image;
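The distance-and-slope filtering of matched key points can be sketched as follows. The patent does not give the thresholds or the exact rule, so this is one plausible reading: a matched point is kept as a target key point only if its distance and direction relative to the previous kept point change smoothly; both thresholds are assumed values.

```python
import math

def filter_target_keypoints(points, max_dist=30.0, max_slope_change=1.0):
    """points: [(x, y), ...] key points matched into the current frame,
    in contour order. Returns the target key points, dropping points
    whose distance or slope to the previous kept point jumps abruptly."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    prev_slope = None
    for p in points[1:]:
        q = kept[-1]
        dx, dy = p[0] - q[0], p[1] - q[1]
        dist = math.hypot(dx, dy)
        slope = math.atan2(dy, dx)   # angle form avoids infinite slopes
        ok = dist <= max_dist
        if ok and prev_slope is not None:
            ok = abs(slope - prev_slope) <= max_slope_change
        if ok:
            kept.append(p)
            prev_slope = slope
    return kept
```

Connecting the surviving points in order yields the propagated contour of the current frame, as the acquiring module 818 describes.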
然后根据所述人物对象轮廓, 更新单元 819 更新当前所述视频帧图像中各个像素点为前景或者背景的概率; 所述处理单元 812 还用于, 根据更新的所述各个概率构建图, 并进行图切割获取当前所述视频帧图像的人物对象。  Then, according to the person object contour, the updating unit 819 updates the probability of each pixel in the current video frame image being foreground or background; the processing unit 812 is further configured to construct a graph according to the updated probabilities and perform graph cutting to obtain the person object of the current video frame image.
本发明实施例提供一种视频人物分割的装置, 通过第一获取单元将待处理的第一帧视频图像进行人脸检测, 获取人物脸部区域, 根据所述人物脸部区域, 第二获取单元获取前景种子像素点和背景种子像素点, 然后计算单元分别计算所述视频图像中各个像素点为前景或者背景的概率, 处理单元根据所述各个概率构建图, 并进行图切割获取人物对象。 与现有技术中对视频人物进行分割时, 高斯混合模型的分量数目由人工设定, 对各类视频的适应性不强, 并且不能分割出完整的人物对象相比, 本发明实施例提供的方案可以适用于各类视频人物的分割, 并且可以实时地分割出完整人物对象。 以上所述, 仅为本发明的具体实施方式, 但本发明的保护范围并不局限于此, 任何熟悉本技术领域的技术人员在本发明揭露的技术范围内, 可轻易想到变化或替换, 都应涵盖在本发明的保护范围之内。 因此, 本发明的保护范围应以权利要求的保护范围为准。  An embodiment of the present invention provides a device for video person segmentation. The first acquiring unit performs face detection on the first frame of the video image to be processed to obtain the face region of the person; the second acquiring unit acquires foreground seed pixels and background seed pixels according to the face region; the calculating unit then calculates the probability of each pixel in the video image being foreground or background; and the processing unit constructs a graph according to the respective probabilities and performs graph cutting to obtain the person object. In the prior art, when segmenting persons in video, the number of components of the Gaussian mixture model is set manually, the adaptability to various types of video is weak, and a complete person object cannot be segmented; in contrast, the scheme provided by the embodiment of the present invention is applicable to the segmentation of persons in various types of video and can segment a complete person object in real time. The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

权利 要 求 书 Claim
1、 一种视频人物分割的方法, 其特征在于, 包括: A method for segmenting a video character, comprising:
将待处理的第一帧视频图像进行人脸检测, 获取人物脸部区域;  Performing face detection on the first frame of the video image to be processed to obtain a face region of the person;
根据所述人物脸部区域, 获取前景种子像素点和背景种子像素点; 根据所述前景种子像素点和所述背景种子像素点, 分别计算所述视频图像 中各个像素点为前景或者背景的概率;  Obtaining a foreground seed pixel point and a background seed pixel point according to the face area of the person; calculating a probability of each pixel point in the video image as a foreground or a background according to the foreground seed pixel point and the background seed pixel point respectively ;
根据所述各个概率构建图, 并进行图切割获取人物对象。  A graph is constructed according to the respective probabilities, and graph cutting is performed to obtain the person object.
2、 根据权利要求 1所述的视频人物分割的方法, 其特征在于, 在所述将待 处理的第一帧视频图像进行人脸检测, 获取人物脸部区域之前, 还包括:  The method for segmenting a video character according to claim 1, wherein before the performing the face detection on the first frame video image to be processed to obtain the face region of the person, the method further includes:
判断待处理的视频帧图像是否为第一帧;  Determining whether the image of the video frame to be processed is the first frame;
当待处理的视频帧图像不是第一帧时, 对所述视频帧图像进行亮度变化检 测, 获得当前所述视频帧图像与前一帧视频图像之间的亮度差异距离;  When the video frame image to be processed is not the first frame, performing brightness change detection on the video frame image to obtain a brightness difference distance between the current video frame image and the previous frame video image;
当所述亮度差异距离小于预设阈值时, 根据所述前一帧视频图像的人物对 象轮廓, 确定当前所述视频帧图像的人物对象轮廓;  Determining, according to the person object contour of the video image of the previous frame, a character object contour of the current video frame image, when the brightness difference distance is less than a preset threshold;
根据所述人物对象轮廓更新当前所述视频帧图像中各个像素点为前景或者 背景的概率;  Updating, according to the contour of the human object, a probability that each pixel in the current video frame image is a foreground or a background;
根据所述各个概率构建图, 并进行图切割获取当前所述视频帧图像的人物对象。  A graph is constructed according to the respective probabilities, and graph cutting is performed to obtain the person object of the current video frame image.
3、 根据权利要求 2所述的视频人物分割的方法, 其特征在于, 所述根据所 述前一帧视频图像的人物对象轮廓, 确定当前所述视频帧图像的人物对象轮廓 包括:  The method for segmenting a video character according to claim 2, wherein determining the contour of the person object of the current video frame image according to the contour of the human object of the video image of the previous frame comprises:
根据所述前一帧视频图像的分割结果的二值图, 提取所述前一帧视频图像 的人物对象轮廓上的至少一个关键点;  Extracting at least one key point on the contour of the character object of the video image of the previous frame according to the binary image of the segmentation result of the video image of the previous frame;
根据至少一个所述关键点, 确定每个所述关键点在当前所述视频帧图像中 对应的关键点;  Determining, according to at least one of the key points, a key point corresponding to each of the key points in the current video frame image;
根据相邻的两个所述关键点之间的距离及斜率变化, 确定至少一个目标关 键点; 将所述至少一个目标关键点连接起来, 获得当前所述视频帧图像的人物对 象轮廓。 Determining at least one target key point according to a change in distance and slope between two adjacent key points; The at least one target key point is connected to obtain a character object contour of the current video frame image.
4、 根据权利要求 1所述的视频人物分割的方法, 其特征在于, 所述根据所 述前景种子像素点和所述背景种子像素点, 分别计算所述视频图像中各个像素 点为前景或者背景的概率包括:  The video character segmentation method according to claim 1, wherein the calculating, according to the foreground seed pixel point and the background seed pixel point, each pixel point in the video image as a foreground or a background The probability includes:
根据所述前景种子像素点分别确定所述前景种子像素点在 L、 a、 b 三种颜色分量上的三组样本值, 根据所述背景种子像素点分别确定所述背景种子像素点在 L、 a、 b 三种颜色分量上的三组样本值;  determining, from the foreground seed pixels, three sets of sample values of the foreground seed pixels on the three color components L, a, and b, and determining, from the background seed pixels, three sets of sample values of the background seed pixels on the three color components L, a, and b;
根据所述前景种子像素点和所述背景种子像素点的样本值, 分别计算所述 视频图像中各个像素点的第一前景概率和第一背景概率;  Calculating, according to sample values of the foreground seed pixel point and the background seed pixel point, a first foreground probability and a first background probability of each pixel point in the video image;
将所述第一前景概率和所述第一背景概率进行归一化处理, 计算得到所述 视频图像中各个像素点为前景或者背景的概率。  The first foreground probability and the first background probability are normalized to calculate a probability that each pixel in the video image is foreground or background.
5、 根据权利要求 4所述的视频人物分割的方法, 其特征在于, 所述根据所 述前景种子像素点和所述背景种子像素点的样本值, 分别计算所述视频图像中 各个像素点的第一前景概率和第一背景概率包括:  The video character segmentation method according to claim 4, wherein the calculating, according to the sample values of the foreground seed pixel point and the background seed pixel point, respectively, calculating respective pixel points in the video image The first foreground probability and the first background probability include:
根据所述前景种子像素点和所述背景种子像素点的样本值, 分别计算 f_L^F(x)、 f_a^F(x)、 f_b^F(x) 和 f_L^B(x)、 f_a^B(x)、 f_b^B(x); 其中, x 表示所述视频图像中任一个像素点, f_L^F(x)、 f_a^F(x)、 f_b^F(x) 分别表示所述像素点在 L、 a、 b 三种颜色分量上的前景概率, f_L^B(x)、 f_a^B(x)、 f_b^B(x) 分别表示所述像素点在 L、 a、 b 三种颜色分量上的背景概率;  calculating f_L^F(x), f_a^F(x), f_b^F(x) and f_L^B(x), f_a^B(x), f_b^B(x) according to the sample values of the foreground seed pixels and the background seed pixels, where x denotes any pixel in the video image, f_L^F(x), f_a^F(x), f_b^F(x) denote the foreground probabilities of the pixel on the three color components L, a, and b, and f_L^B(x), f_a^B(x), f_b^B(x) denote the background probabilities of the pixel on the three color components L, a, and b;
根据 f^F(x) = f_L^F(x) * f_a^F(x) * f_b^F(x), 计算得到所述视频图像中任一个像素点的第一前景概率;  calculating the first foreground probability of any pixel in the video image according to f^F(x) = f_L^F(x) * f_a^F(x) * f_b^F(x);
根据 f^B(x) = f_L^B(x) * f_a^B(x) * f_b^B(x), 计算得到所述视频图像中任一个像素点的第一背景概率。  calculating the first background probability of any pixel in the video image according to f^B(x) = f_L^B(x) * f_a^B(x) * f_b^B(x).
6、 根据权利要求 4所述的视频人物分割的方法, 其特征在于, 所述将所述第一前景概率和所述第一背景概率进行归一化处理, 计算得到所述视频图像中各个像素点为前景或者背景的概率包括: 根据 p_F(x) = f^F(x) / [f^F(x) + f^B(x)], 计算得到所述视频图像中任一个像素点为前景的概率;  The video person segmentation method according to claim 4, wherein normalizing the first foreground probability and the first background probability to calculate the probability of each pixel in the video image being foreground or background comprises: calculating the probability that any pixel in the video image is foreground according to p_F(x) = f^F(x) / [f^F(x) + f^B(x)];
根据 p_B(x) = 1 - p_F(x), 计算得到所述视频图像中任一个像素点为背景的概率。  calculating the probability that any pixel in the video image is background according to p_B(x) = 1 - p_F(x).
7、 一种视频人物分割的装置, 其特征在于, 包括:  7. A device for segmenting a video character, comprising:
第一获取单元, 用于将待处理的第一帧视频图像进行人脸检测, 获取人物 脸部区域;  a first acquiring unit, configured to perform face detection on the first frame video image to be processed, and obtain a face region of the character;
第二获取单元, 用于根据所述人物脸部区域, 获取前景种子像素点和背景 种子像素点;  a second acquiring unit, configured to acquire a foreground seed pixel point and a background seed pixel point according to the character face area;
计算单元, 用于根据所述前景种子像素点和所述背景种子像素点, 分别计 算所述视频图像中各个像素点为前景或者背景的概率;  a calculating unit, configured to calculate, according to the foreground seed pixel point and the background seed pixel point, a probability that each pixel point in the video image is a foreground or a background;
处理单元, 用于根据所述各个概率构建图, 并进行图切割获取人物对象。  a processing unit, configured to construct a graph according to the respective probabilities and perform graph cutting to obtain the person object.
8、 根据权利要求 7所述的视频人物分割的装置, 其特征在于, 所述装置还 包括: The device for dividing a video character according to claim 7, wherein the device further comprises:
判断单元, 用于判断待处理的视频帧图像是否为第一帧;  a determining unit, configured to determine whether the image of the video frame to be processed is the first frame;
检测获取单元, 用于当待处理的视频帧图像不是第一帧时, 对所述视频帧 图像进行亮度变化检测, 获得当前所述视频帧图像与前一帧视频图像之间的亮 度差异距离;  a detection acquiring unit, configured to: when the video frame image to be processed is not the first frame, perform brightness change detection on the video frame image to obtain a brightness difference distance between the current video frame image and the previous frame video image;
确定单元, 用于当所述亮度差异距离小于预设阈值时, 根据所述前一帧视 频图像的人物对象轮廓, 确定当前所述视频帧图像的人物对象轮廓;  a determining unit, configured to: when the brightness difference distance is less than a preset threshold, determine a character object contour of the current video frame image according to the character object contour of the previous frame video image;
更新单元, 用于根据所述人物对象轮廓更新当前所述视频帧图像中各个像 素点为前景或者背景的概率;  And an updating unit, configured to update, according to the contour of the character object, a probability that each pixel point in the current video frame image is a foreground or a background;
所述处理单元还用于, 根据更新的所述各个概率构建图, 并进行图切割获取当前所述视频帧图像的人物对象。  The processing unit is further configured to construct a graph according to the updated respective probabilities and perform graph cutting to obtain the person object of the current video frame image.
9、 根据权利要求 8所述的视频人物分割的装置, 其特征在于, 所述确定单 元包括:  9. The apparatus for video character segmentation according to claim 8, wherein the determining unit comprises:
提取模块, 用于根据所述前一帧视频图像的分割结果的二值图, 提取所述 前一帧视频图像的人物对象轮廓上的至少一个关键点; An extracting module, configured to extract, according to a binary image of a segmentation result of the video image of the previous frame At least one key point on the contour of the character object of the previous frame of the video image;
第一确定模块, 用于根据至少一个所述关键点, 确定每个所述关键点在当 前所述视频帧图像中对应的关键点;  a first determining module, configured to determine, according to the at least one of the key points, a corresponding key point of each of the key points in the current video frame image;
第二确定模块, 用于根据相邻的两个所述关键点之间的距离及斜率变化, 确定至少一个目标关键点;  a second determining module, configured to determine at least one target key point according to a distance and a slope change between two adjacent key points;
获取模块, 用于将所述至少一个目标关键点连接起来, 获得当前所述视频 帧图像的人物对象轮廓。  And an obtaining module, configured to connect the at least one target key point to obtain a character object contour of the current video frame image.
10、 根据权利要求 7 所述的视频人物分割的装置, 其特征在于, 所述计算 单元包括:  The device for dividing a video character according to claim 7, wherein the calculating unit comprises:
确定模块, 用于根据所述前景种子像素点分别确定所述前景种子像素点在 L、 a、 b 三种颜色分量上的三组样本值, 根据所述背景种子像素点分别确定所述背景种子像素点在 L、 a、 b 三种颜色分量上的三组样本值;  a determining module, configured to determine, from the foreground seed pixels, three sets of sample values of the foreground seed pixels on the three color components L, a, and b, and to determine, from the background seed pixels, three sets of sample values of the background seed pixels on the three color components L, a, and b;
第一计算模块, 用于根据所述前景种子像素点和所述背景种子像素点的样 本值, 分别计算所述视频图像中各个像素点的第一前景概率和第一背景概率; 第二计算模块, 用于将所述第一前景概率和所述第一背景概率进行归一化 处理, 计算得到所述视频图像中各个像素点为前景或者背景的概率。  a first calculation module, configured to respectively calculate a first foreground probability and a first background probability of each pixel point in the video image according to sample values of the foreground seed pixel point and the background seed pixel point; And normalizing the first foreground probability and the first background probability, and calculating a probability that each pixel in the video image is a foreground or a background.
11、 根据权利要求 10所述的视频人物分割的装置, 其特征在于, 所述第一 计算模块包括:  The device for dividing a video character according to claim 10, wherein the first calculating module comprises:
第一计算子模块, 用于根据所述前景种子像素点和所述背景种子像素点的样本值, 分别计算 f_L^F(x)、 f_a^F(x)、 f_b^F(x) 和 f_L^B(x)、 f_a^B(x)、 f_b^B(x); 其中, x 表示所述视频图像中任一个像素点, f_L^F(x)、 f_a^F(x)、 f_b^F(x) 分别表示所述像素点在 L、 a、 b 三种颜色分量上的前景概率, f_L^B(x)、 f_a^B(x)、 f_b^B(x) 分别表示所述像素点在 L、 a、 b 三种颜色分量上的背景概率;  a first calculating sub-module, configured to calculate f_L^F(x), f_a^F(x), f_b^F(x) and f_L^B(x), f_a^B(x), f_b^B(x) according to the sample values of the foreground seed pixels and the background seed pixels, where x denotes any pixel in the video image, f_L^F(x), f_a^F(x), f_b^F(x) denote the foreground probabilities of the pixel on the three color components L, a, and b, and f_L^B(x), f_a^B(x), f_b^B(x) denote the background probabilities of the pixel on the three color components L, a, and b;
第二计算子模块, 用于根据 f^F(x) = f_L^F(x) * f_a^F(x) * f_b^F(x), 计算得到所述视频图像中任一个像素点的第一前景概率;  a second calculating sub-module, configured to calculate the first foreground probability of any pixel in the video image according to f^F(x) = f_L^F(x) * f_a^F(x) * f_b^F(x);
所述第二计算子模块还用于, 根据 f^B(x) = f_L^B(x) * f_a^B(x) * f_b^B(x), 计算得到所述视频图像中任一个像素点的第一背景概率。  The second calculating sub-module is further configured to calculate the first background probability of any pixel in the video image according to f^B(x) = f_L^B(x) * f_a^B(x) * f_b^B(x).
12、 根据权利要求 10所述的视频人物分割的装置, 其特征在于, 所述第二 计算模块包括: The device for dividing a video character according to claim 10, wherein the second calculating module comprises:
第一计算子模块, 用于根据 p_F(x) = f^F(x) / [f^F(x) + f^B(x)], 计算得到所述视频图像中任一个像素点为前景的概率;  a first calculating sub-module, configured to calculate the probability that any pixel in the video image is foreground according to p_F(x) = f^F(x) / [f^F(x) + f^B(x)];
第二计算子模块, 用于根据 p_B(x) = 1 - p_F(x), 计算得到所述视频图像中任一个像素点为背景的概率。  a second calculating sub-module, configured to calculate the probability that any pixel in the video image is background according to p_B(x) = 1 - p_F(x).
PCT/CN2011/079751 2011-09-16 2011-09-16 Video character separation method and device WO2012162981A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2011/079751 WO2012162981A1 (en) 2011-09-16 2011-09-16 Video character separation method and device
CN201180001853.1A CN103119625B (en) 2011-09-16 2011-09-16 Video character separation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/079751 WO2012162981A1 (en) 2011-09-16 2011-09-16 Video character separation method and device

Publications (1)

Publication Number Publication Date
WO2012162981A1 true WO2012162981A1 (en) 2012-12-06

Family

ID=47258310

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/079751 WO2012162981A1 (en) 2011-09-16 2011-09-16 Video character separation method and device

Country Status (2)

Country Link
CN (1) CN103119625B (en)
WO (1) WO2012162981A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507997A (en) * 2020-04-22 2020-08-07 腾讯科技(深圳)有限公司 Image segmentation method, device, equipment and computer storage medium
CN111583292A (en) * 2020-05-11 2020-08-25 浙江大学 Self-adaptive image segmentation method for two-photon calcium imaging video data

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230252B (en) * 2017-01-24 2022-02-01 深圳市商汤科技有限公司 Image processing method and device and electronic equipment
CN106846336B (en) * 2017-02-06 2022-07-15 腾讯科技(上海)有限公司 Method and device for extracting foreground image and replacing image background
CN106997599B (en) * 2017-04-17 2019-08-30 华东理工大学 A kind of video moving object subdivision method of light sensitive
CN107221058A (en) * 2017-05-25 2017-09-29 刘萍 Intelligent channel barrier system
CN107766803B (en) * 2017-09-29 2021-09-28 北京奇虎科技有限公司 Video character decorating method and device based on scene segmentation and computing equipment
CN109035257B (en) * 2018-07-02 2021-08-31 百度在线网络技术(北京)有限公司 Portrait segmentation method, device and equipment
CN113673270B (en) * 2020-04-30 2024-01-26 北京达佳互联信息技术有限公司 Image processing method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101588459A (en) * 2009-06-26 2009-11-25 北京交通大学 A kind of video keying processing method
CN101710418A (en) * 2009-12-22 2010-05-19 上海大学 Interactive mode image partitioning method based on geodesic distance
CN102129691A (en) * 2011-03-22 2011-07-20 北京航空航天大学 Video object tracking cutting method using Snake profile model
CN102156995A (en) * 2011-04-21 2011-08-17 北京理工大学 Video movement foreground dividing method in moving camera

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4639271B2 (en) * 2005-12-27 2011-02-23 三星電子株式会社 camera
JP2008123086A (en) * 2006-11-09 2008-05-29 Matsushita Electric Ind Co Ltd Image processor and image processing method
CN100580691C (en) * 2007-03-16 2010-01-13 上海博康智能信息技术有限公司 Interactive human face identificiating system and method of comprehensive utilizing human face and humanbody auxiliary information
CN101587541B (en) * 2009-06-18 2011-02-02 上海交通大学 Character recognition method based on human body contour outline

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101588459A (en) * 2009-06-26 2009-11-25 北京交通大学 A kind of video keying processing method
CN101710418A (en) * 2009-12-22 2010-05-19 上海大学 Interactive mode image partitioning method based on geodesic distance
CN102129691A (en) * 2011-03-22 2011-07-20 北京航空航天大学 Video object tracking cutting method using Snake profile model
CN102156995A (en) * 2011-04-21 2011-08-17 北京理工大学 Video movement foreground dividing method in moving camera

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507997A (en) * 2020-04-22 2020-08-07 腾讯科技(深圳)有限公司 Image segmentation method, device, equipment and computer storage medium
CN111583292A (en) * 2020-05-11 2020-08-25 浙江大学 Self-adaptive image segmentation method for two-photon calcium imaging video data
CN111583292B (en) * 2020-05-11 2023-07-07 浙江大学 Self-adaptive image segmentation method for two-photon calcium imaging video data

Also Published As

Publication number Publication date
CN103119625B (en) 2015-06-03
CN103119625A (en) 2013-05-22

Similar Documents

Publication Publication Date Title
WO2012162981A1 (en) Video character separation method and device
CN108520219B (en) Multi-scale rapid face detection method based on convolutional neural network feature fusion
KR101023733B1 (en) Intra-mode region-of-interest video object segmentation
KR100997064B1 (en) Multi-mode region-of-interest video object segmentation
CN108804578B (en) Unsupervised video abstraction method based on consistency segment generation
US7680342B2 (en) Indoor/outdoor classification in digital images
WO2017084204A1 (en) Method and system for tracking human body skeleton point in two-dimensional video stream
US8265392B2 (en) Inter-mode region-of-interest video object segmentation
Li et al. Saliency model-based face segmentation and tracking in head-and-shoulder video sequences
US20110299774A1 (en) Method and system for detecting and tracking hands in an image
US20150125074A1 (en) Apparatus and method for extracting skin area to block harmful content image
CN108447068B (en) Ternary diagram automatic generation method and foreground extraction method using ternary diagram
CN104050471A (en) Natural scene character detection method and system
US9418426B1 (en) Model-less background estimation for foreground detection in video sequences
JP4098021B2 (en) Scene identification method, apparatus, and program
CN116309607B (en) Ship type intelligent water rescue platform based on machine vision
CN111815528A (en) Bad weather image classification enhancement method based on convolution model and feature fusion
Zhu et al. Automatic object detection and segmentation from underwater images via saliency-based region merging
CN114359323A (en) Image target area detection method based on visual attention mechanism
CN109784216B (en) Vehicle-mounted thermal imaging pedestrian detection Rois extraction method based on probability map
Arsic et al. Improved lip detection algorithm based on region segmentation and edge detection
CN107704864B (en) Salient object detection method based on image object semantic detection
JP2000348173A (en) Lip extraction method
Zafarifar et al. Blue sky detection for picture quality enhancement
Jyothisree et al. Shadow detection using tricolor attenuation model enhanced with adaptive histogram equalization

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201180001853.1

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11866721

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11866721

Country of ref document: EP

Kind code of ref document: A1