WO2012162981A1 - Video character separation method and device - Google Patents


Info

Publication number
WO2012162981A1
WO2012162981A1 · PCT/CN2011/079751 · CN2011079751W
Authority
WO
WIPO (PCT)
Prior art keywords
video
foreground
background
image
probability
Prior art date
Application number
PCT/CN2011/079751
Other languages
French (fr)
Chinese (zh)
Inventor
刘志
史冉
丁保焱
薛银珠
杨胜
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to PCT/CN2011/079751 priority Critical patent/WO2012162981A1/en
Priority to CN201180001853.1A priority patent/CN103119625B/en
Publication of WO2012162981A1 publication Critical patent/WO2012162981A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/167Detection; Localisation; Normalisation using comparisons between temporally consecutive images

Definitions

  • the present invention relates to the field of communications technologies, and in particular, to a method and apparatus for video character segmentation.
  • the object segmentation technique refers to separating the object of interest to the user on the video or image at the pixel level from the background, and the segmented object can be synthesized into a new background.
  • a Gaussian mixture model is used to establish a background color model, and then the video frame image is subtracted from the established background color model, and threshold segmentation is performed to obtain a color model of the foreground object.
  • the object segmentation is automatically segmented by the graph cut, and the cut image is smoothed by the morphological opening and closing operation to optimize the segmentation result.
  • the HSV (Hue, Saturation, Value) color space is used in place of the original RGB (Red, Green, Blue) color space to reduce the effect of brightness changes on segmentation quality.
  • Embodiments of the present invention provide a method and apparatus for video character segmentation, which can be applied to segmentation of various video character objects, and can segment a complete person object in real time.
  • a method for segmenting video characters including:
  • a map is constructed according to the respective probabilities, and a graph cut is performed to obtain a person object.
  • a device for video character segmentation comprising:
  • a first acquiring unit configured to perform face detection on the first frame video image to be processed, to obtain a human face region;
  • a second acquiring unit configured to acquire a foreground seed pixel point and a background seed pixel point according to the character face area;
  • a calculating unit configured to calculate, according to the foreground seed pixel point and the background seed pixel point, a probability that each pixel in the video image is a foreground or a background;
  • An embodiment of the present invention provides a method and an apparatus for video character segmentation.
  • the face image of a person is obtained by performing face detection on a video image of a first frame to be processed, and acquiring a foreground seed pixel according to the face region of the character.
  • a background seed pixel according to the foreground seed pixel point and the background seed pixel point, respectively calculating a probability that each pixel point in the video image is a foreground or a background, constructing a graph according to the respective probabilities, and performing graph cut acquisition Character object.
  • FIG. 1 is a flowchart of a method for video character segmentation according to Embodiment 1 of the present invention
  • FIG. 2 is a block diagram of a device for video character segmentation according to Embodiment 1 of the present invention
  • FIG. 4 is a schematic diagram of video character segmentation according to Embodiment 2 of the present invention;
  • FIG. 5 is a schematic diagram of graph cutting according to Embodiment 2 of the present invention;
  • FIG. 6 is a schematic diagram of determining a contour according to Embodiment 2 of the present invention.
  • FIG. 7 is a schematic diagram of luminance change detection according to Embodiment 2 of the present invention.
  • FIG. 8 is a block diagram of an apparatus for video character segmentation according to Embodiment 2 of the present invention.

Detailed Description
  • An embodiment of the present invention provides a method for segmenting a video character. As shown in FIG. 1, the method includes:
  • Step 101: Perform face detection on the first frame of the video image to be processed to obtain a face region of a person. Only the first frame requires this full processing; when the image is not the first frame, the video image can be quickly segmented using the correlation between adjacent video frames.
  • Step 102: Obtain foreground seed pixel points and background seed pixel points according to the face region of the character.
  • Step 103: Calculate, according to the foreground seed pixel points and the background seed pixel points, the probability that each pixel in the video image is foreground or background.
  • Step 104: Construct a graph according to the respective probabilities, and perform a graph cut to obtain the character object.
  • An embodiment of the present invention provides a method for segmenting a video character.
  • the face image of a person is obtained by performing face detection on a first frame of the video image to be processed, and acquiring a foreground seed pixel and a background seed according to the face region of the character.
  • The prior art used when segmenting video characters is not adaptable to various types of video, and when the object segmentation result is optimized by opening and closing operations, the complete character object cannot be segmented; the solution provided by the embodiment of the present invention is suitable for the segmentation of various video characters and can segment complete character objects in real time.
  • the embodiment of the present invention provides a device for video character segmentation.
  • the device includes: a first acquiring unit 201, a second acquiring unit 202, a calculating unit 203, and a processing unit 204.
  • the first acquiring unit 201 is configured to perform face detection on the first frame video image to be processed to obtain a face region of the person;
  • a second acquiring unit 202 configured to acquire a foreground seed pixel point and a background seed pixel point according to the character face area
  • the calculating unit 203 is configured to separately calculate a probability that each pixel in the video image is a foreground or a background according to the foreground seed pixel point and the background seed pixel point;
  • the processing unit 204 is configured to construct a map according to the respective probabilities, and perform a graph cut to obtain a person.
  • The embodiment of the present invention provides an apparatus for video character segmentation: the first acquiring unit performs face detection on the first frame of the video image to be processed to obtain a face region of a person; the second acquiring unit acquires foreground seed pixel points and background seed pixel points according to the face region; and the calculating unit calculates the probability that each pixel in the video image is foreground or background.
  • the processing unit constructs a map according to the respective probabilities, and performs graph cutting to obtain a character object.
  • the number of components of the Gaussian mixture model is manually set, the adaptability to various types of video is not strong, and the complete character object cannot be segmented, which is provided by the embodiment of the present invention.
  • the scheme can be applied to the segmentation of various video characters, and the complete character object can be segmented in real time.
  • An embodiment of the present invention provides a method for segmenting a video character. As shown in FIG. 3, the method includes: Step 301: Determine whether a video frame image to be processed is a first frame.
  • the purpose of determining whether the current video frame image to be processed is the first frame is that when the current video frame image is not the first frame, the current frame image may be processed according to the result of the segmented video character object of the previous frame, that is, according to the adjacent video frame.
  • the correlation of the images is processed, which speeds up the processing.
  • Step 302: When the video frame image to be processed is the first frame, perform face detection on it to obtain the face region of the character. Specifically, the AdaBoost algorithm is used for face detection.
  • AdaBoost is an iterative algorithm. The core idea is to train different classifiers (weak classifiers) on the same training set, and then combine these weak classifiers into a stronger final classifier (a strong classifier).
  • Performing face detection means training a group of classifiers using face images as positive samples and non-face images as negative samples, searching each region of the input image to be processed, and judging face regions with the group of classifiers; the detected face region is shown as the rectangular area in Fig. 4(a).
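As an illustration of the boosting idea behind AdaBoost (not the patent's actual Haar-feature cascade, which operates on image features), the following minimal sketch trains threshold stumps on 1-D feature values; the function names and toy data are illustrative assumptions:

```python
import math

def train_adaboost(xs, ys, rounds=5):
    """Train a boosted ensemble of threshold stumps on 1-D samples.

    xs: feature values; ys: labels in {+1, -1} (+1 = face, -1 = non-face).
    Returns a list of (threshold, polarity, alpha) weak classifiers.
    """
    n = len(xs)
    w = [1.0 / n] * n          # uniform sample weights to start
    ensemble = []
    for _ in range(rounds):
        # pick the stump (threshold, polarity) with the lowest weighted error
        best = None
        for t in sorted(set(xs)):
            for pol in (+1, -1):
                err = sum(wi for wi, x, y in zip(w, xs, ys)
                          if (pol if x >= t else -pol) != y)
                if best is None or err < best[0]:
                    best = (err, t, pol)
        err, t, pol = best
        err = max(err, 1e-10)                      # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # weight of this weak classifier
        ensemble.append((t, pol, alpha))
        # re-weight samples: boost the ones this stump misclassified
        w = [wi * math.exp(-alpha * y * (pol if x >= t else -pol))
             for wi, x, y in zip(w, xs, ys)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble

def predict(ensemble, x):
    """Strong classifier: sign of the alpha-weighted vote of the stumps."""
    score = sum(alpha * (pol if x >= t else -pol) for t, pol, alpha in ensemble)
    return 1 if score >= 0 else -1
```

In the patent's setting, the weak classifiers act on rectangular image features rather than a single scalar, but the weight-update and voting scheme is the same.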
  • Step 303: Obtain foreground seed pixel points and background seed pixel points according to the face region of the character.
  • The face region is moderately adjusted to generate a foreground sample model and a background sample model.
  • Specifically, the face region is slightly shrunk; the distance between the face region and the upper-body region is then determined from the height of the face region, and the width of the upper-body region is determined from the person's head-to-shoulder width ratio, so that the foreground model can be generated.
  • The pixels in the area enclosed by the light-colored line in Fig. 4(b) are the foreground seed pixels;
  • a background sample model is also generated, and the pixels between the dark dotted line in Fig. 4(c) and the image boundary are the background seed pixels.
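A minimal sketch of deriving seed rectangles from a detected face box along the lines above; the shrink factor, shoulder ratio, neck gap, and border width are illustrative assumptions, not values specified by the patent:

```python
def seed_regions(face, img_w, img_h):
    """Derive rough foreground/background seed rectangles from a detected face.

    face = (x, y, w, h) of the face rectangle.  Returns the shrunken face
    rectangle and the upper-body rectangle (both foreground seeds) plus the
    width of a background seed band along the image border.
    """
    x, y, w, h = face
    # shrink the face box slightly so foreground seeds stay safely inside the face
    fx, fy = x + w // 8, y + h // 8
    fw, fh = w * 3 // 4, h * 3 // 4
    face_seed = (fx, fy, fw, fh)
    # upper-body box: wider than the head (assumed shoulder width = 2 face widths),
    # placed below the face with a small neck gap derived from the face height
    tw = w * 2
    tx = max(0, x + w // 2 - tw // 2)
    ty = min(img_h - 1, y + h + h // 4)
    torso_seed = (tx, ty, min(tw, img_w - tx), min(h * 2, img_h - ty))
    # background seeds: a thin band along the image border
    border = max(2, min(img_w, img_h) // 50)
    return face_seed, torso_seed, border
```

Any pixel inside `face_seed` or `torso_seed` would feed the foreground sample model, and pixels in the border band would feed the background sample model.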
  • Step 304: Determine, according to the foreground seed pixel points, three sets of sample values of the foreground seed pixel points on the three color components L, a, and b, and determine, according to the background seed pixel points, three sets of sample values of the background seed pixel points on the three color components L, a, and b;
  • the solution provided by the embodiment of the present invention converts the video image from the RGB (Red, Green, Blue) space to the Lab space.
  • Lab consists of three channels: the L channel is a luminance channel, and the a and b channels are color channels.
  • a represents the range from magenta to green
  • b represents the range from yellow to blue.
  • the three color components L, a, and b are independent of each other.
  • From the foreground seed pixel points, three sets of sample values are obtained on the three color components L, a, b, e.g. {a_1^F, a_2^F, ..., a_n^F} for the a component of the foreground (and likewise for the other components and for the background seed pixel points).
  • Step 305: Calculate, according to the sample values of the foreground seed pixel points and the background seed pixel points, a first foreground probability and a first background probability for each pixel in the video image.
  • The foreground density functions f_L^F(x), f_a^F(x), f_b^F(x) and the background density functions f_L^B(x), f_a^B(x), f_b^B(x) are calculated from the sample values of the foreground and background seed pixel points;
  • x_i denotes the i-th foreground seed pixel point or the i-th background seed pixel point, and x denotes any pixel in the video image.
  • Step 306: Normalize the first foreground probability and the first background probability, and calculate the probability that each pixel in the video image is foreground or background.
  • Each pixel point in the first frame of the video image is processed in this way to obtain the probability that each pixel is foreground or background.
  • In the resulting probability map, the brighter a pixel, the greater the probability that it is foreground; the darker a pixel, the greater the probability that it is background.
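The non-parametric model of steps 304-306 can be sketched as a per-channel kernel density estimate whose foreground and background densities are normalized into probabilities; the Gaussian kernel and the bandwidth value are assumptions, since the patent does not fix the kernel:

```python
import math

def kde(samples, x, h=5.0):
    """Gaussian kernel density estimate at x from a list of 1-D channel samples."""
    n = len(samples)
    return sum(math.exp(-((x - s) / h) ** 2 / 2)
               for s in samples) / (n * h * math.sqrt(2 * math.pi))

def fg_bg_probability(pixel, fg_samples, bg_samples):
    """Normalized foreground/background probabilities of one pixel.

    pixel: (L, a, b) tuple; fg_samples/bg_samples: dict mapping channel name
    ("L", "a", "b") to its list of seed sample values.  Channel densities are
    multiplied, since the L, a, b components are treated as independent.
    """
    f = b = 1.0
    for ch, v in zip("Lab", pixel):
        f *= kde(fg_samples[ch], v)
        b *= kde(bg_samples[ch], v)
    total = f + b
    if total == 0:
        return 0.5, 0.5            # no evidence either way
    return f / total, b / total    # (P_foreground, P_background), sums to 1
```

The normalization at the end is the step-306 operation: the two raw densities are rescaled so they can be read directly as the per-pixel foreground and background probabilities.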
  • Step 307: Construct a graph according to the respective probabilities, and perform graph cutting to obtain the character object;
  • A graph G = (V, E) is used, where V (vertex) is the set of vertices;
  • v_i denotes the i-th vertex;
  • v_j denotes the j-th vertex;
  • E (edge) refers to the lines joining associated vertices in graph G;
  • e_ij denotes the edge linking vertices i and j;
  • W (weight) refers to the value assigned to the edge connecting two vertices, which represents how closely the two vertices are related;
  • w_ij denotes the weight of the edge connecting vertices i and j.
  • The solution provided by the embodiment of the present invention performs the graph cutting by using the max-flow/min-cut algorithm.
  • all the vertices of the graph are divided into two subsets, and the edges between the two subsets constitute a cut of the graph. This is shown by the dotted line in Figure 5 (b).
  • The two subsets respectively contain a virtual source point and a virtual sink point; the source corresponds to the foreground seed pixels and the sink to the background seed pixels. Among all cuts separating the source from the sink, the cut with the smallest total weight is called the minimum cut.
  • A basic way to find the minimum cut is to find the maximum flow from the source to the sink: each edge connecting two vertices is regarded as a water pipe whose capacity is the weight of the edge.
  • The so-called maximum flow is the maximum water flow that can pass from the source to the sink.
  • When the maximum flow is reached, the completely filled pipes form the minimum cut separating the source from the sink.
  • A graph is constructed from energy terms between pixels: each pixel in the video frame image corresponds to a vertex of the graph, an edge of the graph connects each pair of adjacent pixels, and each edge is assigned a weight.
  • The weight indicates the relationship between the two pixels connected by the edge, such as the degree of similarity between their colors, or the relationship between a pixel and the source or sink.
  • The foreground probability of a pixel indicates its relationship with the source, and its background probability indicates its relationship with the sink.
  • the source point and the sink point respectively represent the foreground seed pixel point and the background seed pixel point.
  • B is a binary variable (the foreground/background label assigned to each pixel).
  • the problem of segmenting a person object in a video frame image may be converted into a problem of segmentation of a graph to be constructed.
  • The max-flow/min-cut algorithm may then be used to perform the graph cut, thereby obtaining the character object.
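The max-flow/min-cut duality described above can be sketched with a plain Edmonds-Karp implementation on a toy graph; production graph-cut segmenters use faster specialized algorithms (e.g., Boykov-Kolmogorov), so this is only a conceptual sketch:

```python
from collections import deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp max-flow.  capacity: dict[u][v] -> edge capacity.

    Returns (flow value, set of vertices on the source side of the min cut).
    """
    flow = {u: dict(vs) for u, vs in capacity.items()}
    # ensure reverse edges exist with 0 residual capacity
    for u in list(flow):
        for v in list(flow[u]):
            flow.setdefault(v, {}).setdefault(u, 0)
    total = 0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = {source: None}
        q = deque([source])
        while q and sink not in parent:
            u = q.popleft()
            for v, c in flow[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if sink not in parent:
            break
        # bottleneck capacity along the found path
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(flow[u][v] for u, v in path)
        for u, v in path:
            flow[u][v] -= bottleneck   # push flow forward
            flow[v][u] += bottleneck   # add residual capacity backward
        total += bottleneck
    # min cut = vertices still reachable from the source in the residual graph
    return total, set(parent)
```

In the segmentation graph, `source`/`sink` are the virtual foreground/background terminals, and every vertex left on the source side of the returned cut is labeled foreground.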
  • Steps 302 to 307 are processes for processing the image of the first frame.
  • When the image to be processed is not the first frame, performing the processing of steps 302 to 307 for every frame would be quite time-consuming; therefore, when processing the entire video sequence, the following process is used for subsequent frames.
  • Step 308: When the video frame image to be processed is not the first frame, perform brightness change detection on it to obtain the brightness difference distance between the current video frame image and the previous frame video image.
  • Whether the non-parametric model of the foreground/background needs to be updated depends mainly on the change of the scene.
  • One of the main factors is the change of brightness.
  • The change of brightness may be caused by a change in the surrounding environment, or by the video capture device; it can result in a foreground/background probability, calculated with the current non-parametric model, that no longer fits the current video frame well.
  • The brightness change detection mainly uses the Bhattacharyya distance to compare the luminance histogram of the current frame with that of the previous frame.
  • H_1(i) is the value of the current frame's luminance histogram at gray level i;
  • H_0(i) is the value of the previous frame's luminance histogram at gray level i.
  • Step 309: Determine whether the brightness difference distance is less than a preset threshold.
  • The preset threshold is determined experimentally and can be 0.1.
  • When the brightness difference distance is not less than the preset threshold, the current image is processed in the same way as the first frame video image; as shown in FIG. 7, the luminance histograms of two consecutive frames whose brightness difference distance exceeds the preset threshold differ markedly, so processing proceeds according to steps 302-307.
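The Bhattacharyya-distance comparison of luminance histograms can be sketched as follows; the bin count is an illustrative assumption:

```python
import math

def luminance_histogram(pixels, bins=32):
    """Normalized luminance histogram of a sequence of 0-255 gray values."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    n = len(pixels)
    return [h / n for h in hist]

def bhattacharyya_distance(h1, h2):
    """Distance between two normalized histograms: 0 = identical, 1 = disjoint."""
    bc = sum(math.sqrt(p * q) for p, q in zip(h1, h2))  # Bhattacharyya coefficient
    return math.sqrt(max(0.0, 1.0 - bc))
```

With the threshold of 0.1 mentioned above, a distance below 0.1 would allow the fast contour-tracking path (steps 310-312), and a larger distance would trigger full reprocessing as a first frame.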
  • Step 310: When the brightness difference distance is less than the preset threshold, determine the contour of the character object of the current video frame image according to the contour of the character object of the previous frame video image;
  • At least one key point on the contour of the character object of the previous frame video image is extracted: feature points where the contour direction changes abruptly are extracted and then proportionally sampled to obtain a suitable number of key points; the starting point, the ending point, and feature points closer to the bottom of the image are also selected as key points. The dozens of dark dots in the gray banded area in Fig. 6 are the key points.
  • For each key point x of the previous frame, the energy function depends on the position of x and the motion vector estimated for pixel x.
  • The candidate motion vectors range over a (2×4+1) × (2×4+1) = 81-element search window.
  • For each candidate, an energy value E is calculated, and the motion vector with the smallest E is selected as the motion vector of pixel x; in this way, the key point corresponding to pixel x in the current frame is obtained.
  • Such a corresponding point is a target key point.
  • the at least one target key point is connected to obtain a character object outline of the current video frame image.
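The 81-candidate motion search for one key point can be sketched as exhaustive block matching; the SAD (sum of absolute differences) energy used here stands in for the patent's unspecified energy function E, which is an assumption:

```python
def best_motion_vector(prev, curr, x, y, radius=4, patch=2):
    """Exhaustive block-matching motion estimation for one key point.

    prev/curr: 2-D lists of gray values.  Searches all (2*radius+1)^2 = 81
    candidate displacements of the (2*patch+1)^2 patch around (x, y) and
    returns the (dx, dy) with minimal SAD energy.
    """
    h, w = len(prev), len(prev[0])

    def sad(dx, dy):
        total = 0
        for j in range(-patch, patch + 1):
            for i in range(-patch, patch + 1):
                px, py = x + i, y + j          # pixel in the previous frame
                qx, qy = px + dx, py + dy      # candidate match in the current frame
                if 0 <= px < w and 0 <= py < h and 0 <= qx < w and 0 <= qy < h:
                    total += abs(prev[py][px] - curr[qy][qx])
                else:
                    total += 255               # penalize out-of-frame candidates
        return total

    candidates = [(dx, dy) for dy in range(-radius, radius + 1)
                           for dx in range(-radius, radius + 1)]
    return min(candidates, key=lambda v: sad(*v))
```

Applying this at every key point of the previous contour and connecting the displaced points yields the approximate contour of step 310.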
  • Step 311: Update, according to the contour of the character object, the probability that each pixel in the current video frame image is foreground or background.
  • The currently determined character object contour is approximate: as shown in FIG. 6, the white area is the foreground, the black area is the background, and the gray strip area is the uncertain region, i.e., a pixel in the gray strip area may be either foreground or background.
  • Based on the foreground/background non-parametric model of the previous frame video image, the foreground/background probability of each pixel in the current video frame image is updated; specifically, the probability of each pixel being foreground or background is calculated according to the methods of steps 305 and 306.
  • Step 312: Construct a graph according to the respective probabilities, and perform a graph cut to obtain the character object of the current video frame image.
  • Specifically, the character object of the current video frame image is obtained by graph cutting according to the method of step 307.
  • When the prior art is used for character object segmentation, the adaptability to various types of video is not strong, and a complete character object cannot be segmented when the segmentation result is optimized by opening and closing operations.
  • The video character segmentation method provided by this embodiment of the present invention can be applied to the segmentation of various video characters, can segment complete character objects in real time, and can quickly segment the entire video based on the correlation between adjacent video frames.
  • An embodiment of the present invention provides a device for video character segmentation.
  • the device includes: a determining unit 801, a first obtaining unit 802, a second obtaining unit 803, a calculating unit 804 (with a determining module 805, a first calculating module 806, and a second computing module 809), a processing unit 812, a detection acquiring unit 813, a determining unit 814, and an updating unit 819.
  • the determining unit 801 is configured to determine whether the image of the video frame to be processed is the first frame
  • The first acquiring unit 802 is configured to perform face detection on the first frame of the video image to be processed to obtain a face region of the person; the AdaBoost algorithm is used for face detection: a group of classifiers is trained using face images as positive samples and non-face images as negative samples, each region of the input image to be processed is searched, and the group of classifiers determines the face region.
  • the second acquiring unit 803 is configured to acquire foreground seed pixel points and background seed pixel points according to the face region;
  • the facial region of the person acquired by the first obtaining unit 802 is moderately adjusted, that is, the facial region of the person is appropriately reduced, and then the distance between the facial region and the upper body region of the human is determined according to the height of the facial region of the human.
  • the area of the upper body is determined according to the ratio of the width of the head and the shoulder of the person, so that the foreground model is generated, wherein the pixel included in the foreground model is the foreground seed pixel;
  • a background sample model is generated, wherein the pixel points included in the background sample model are background seed pixels.
  • the calculating unit 804 is configured to separately calculate a probability that each pixel in the video image is a foreground or a background according to the foreground seed pixel point and the background seed pixel point;
  • the determining module 805 of the calculating unit 804 is configured to respectively determine three sets of sample values of the foreground seed pixel points on the three color components of L, a, b according to the foreground seed pixel points, according to the background seed The pixel points respectively determine three sets of sample values of the background seed pixel points on the three color components of L, a, b;
  • a first calculating module 806, configured to calculate a first foreground probability and a first background probability of each pixel in the video image according to the sample values of the foreground seed pixel points and the background seed pixel points;
  • the first calculating submodule 807 in the first calculating module 806 is configured to calculate, from the sample values of the foreground and background seed pixel points, the foreground density functions f_L^F(x), f_a^F(x), f_b^F(x) and the background density functions f_L^B(x), f_a^B(x), f_b^B(x), where x denotes any pixel in the video image; the foreground functions represent the foreground probability of the pixel on the L, a, b color components, and the background functions represent its background probability on the same components;
  • a second calculation sub-module 808, configured to calculate the first foreground probability of any pixel in the video image;
  • a second computing module 809, configured to normalize the first foreground probability and the first background probability and calculate the probability that each pixel in the video image is foreground or background;
  • After the foreground/background probabilities of the pixels in the current video frame image are determined, the processing unit 812 constructs a graph according to the respective probabilities and performs graph cutting to obtain the character object;
  • the detection acquiring unit 813 is configured to perform brightness change detection on the video frame image to obtain the brightness difference distance between the current video frame image and the previous frame video image;
  • the brightness change detection mainly uses the Bhattacharyya distance to compare the luminance histogram of the current frame with that of the previous frame, where H_1(i) is the value of the current frame's histogram at gray level i and H_0(i) is the value of the previous frame's histogram at gray level i.
  • the determining unit 814 is configured to determine the character object contour of the current video frame image according to the character object contour of the previous frame video image;
  • the extracting module 815 in the determining unit 814 is configured to extract at least one key point on the contour of the character object of the video image of the previous frame according to the binary image of the segmentation result of the video image of the previous frame. ;
  • the first determining module 816 is configured to determine the key point corresponding to each of the key points in the current video frame image;
  • the second determining module 817 is configured to determine at least one target key point according to the distance and slope change between two adjacent key points;
  • An obtaining module 818 configured to connect the at least one target key point to obtain a character object contour of the current video frame image
  • the updating unit 819 is configured to update the probability that each pixel point in the current video frame image is foreground or background; the processing unit 812 is further configured to construct a graph according to the updated probabilities and perform a graph cut to obtain the character object of the current video frame image.
  • An embodiment of the present invention provides a device for video character segmentation: the first acquiring unit performs face detection on the first frame of the video image to be processed to obtain a face region of a person; the second acquiring unit obtains foreground seed pixels and background seed pixels according to the face region; the calculating unit calculates the probability that each pixel in the video image is foreground or background; and the processing unit constructs a graph according to the respective probabilities and performs a graph cut to obtain the character object.

Abstract

The present invention relates to the technical field of communication. Disclosed are a video character separation method and device, applicable to separation of various video character objects and capable of separating a complete character object in real time. In the technical solution provided by the embodiments of the present invention, human face detection is performed on a first frame of video image to be processed, so as to obtain the character face area, foreground seed pixels and background seed pixels are obtained according to the character face area, the probability of each pixel being the foreground or the background in the video image is calculated respectively according to the foreground seed pixels and the background seed pixels, an image is constructed according to each probability, and image cutting is performed to obtain the character object. The solution provided by the embodiments of the present invention is applicable to video object separation.

Description

一种视频人物分割的方法及装置 技术领域  Method and device for video character segmentation
本发明涉及通信技术领域, 尤其涉及一种视频人物分割的方法及装置。 背景技术  The present invention relates to the field of communications technologies, and in particular, to a method and apparatus for video character segmentation. Background technique
对象分割技术是指将视频或者图像上用户感兴趣的对象在像素级上与背 景实现分离, 分割出来的对象可以合成到新的背景中去。 现有的对人物对象 自动分割的技术中, 釆用高斯混合模型建立背景的颜色模型, 然后将视频帧 图像与建立的背景的颜色模型相减, 并进行阈值分割, 获得前景对象的颜色 模型。 利用前景 /背景的颜色模型以及相邻像素之间的颜色差异构造图, 通过 图切割来实现人物对象的自动分割, 釆用形态学开闭运算对切割出来的图像 进行平滑处理以优化分割结果。在对人物对象自动分割的技术中,釆用了 HSV ( Hue, Saturation, Value, 色相、饱和度、 色明度)颜色空间取代原始的 RGB ( Red, Green, Blue, 红、 绿、 蓝)颜色空间, 以减弱亮度变化对于分割质 量的影响。  The object segmentation technique refers to separating the object of interest to the user on the video or image at the pixel level from the background, and the segmented object can be synthesized into a new background. In the existing technique for automatically segmenting a human object, a Gaussian mixture model is used to establish a background color model, and then the video frame image is subtracted from the established background color model, and threshold segmentation is performed to obtain a color model of the foreground object. Using the foreground/background color model and the color difference structure between adjacent pixels, the object segmentation is automatically segmented by the graph cut, and the cut image is smoothed by the morphological opening and closing operation to optimize the segmentation result. In the technique of automatic segmentation of character objects, the original RGB (Red, Green, Blue, Red, Green, Blue) color space is replaced by the HSV (Hue, Saturation, Value) color space. To reduce the effect of brightness changes on the quality of the segmentation.
然而, 釆用现有技术对视频人物进行分割时, 高斯混合模型的分量数目 由人工设定, 对各类视频的适应性不强, 并且釆用开闭运算优化对象分割结 果时, 不能分割出完整的人物对象。 发明内容  However, when the video character is segmented by the prior art, the number of components of the Gaussian mixture model is manually set, and the adaptability to various types of video is not strong, and when the opening and closing operation is used to optimize the segmentation result of the object, the segmentation result cannot be segmented. Complete character object. Summary of the invention
本发明的实施例提供一种视频人物分割的方法及装置, 可以适用于各类 视频人物对象的分割, 并且可以实时地分割出完整人物对象。  Embodiments of the present invention provide a method and apparatus for video character segmentation, which can be applied to segmentation of various video character objects, and can segment a complete person object in real time.
为达到上述目的, 本发明的实施例釆用如下技术方案:  In order to achieve the above object, embodiments of the present invention use the following technical solutions:
一种视频人物分割的方法, 包括:  A method for segmenting video characters, including:
将待处理的第一帧视频图像进行人脸检测, 获取人物脸部区域; 根据所述人物脸部区域, 获取前景种子像素点和背景种子像素点; 根据所述前景种子像素点和所述背景种子像素点, 分别计算所述视频图 像中各个像素点为前景或者背景的概率; Performing face detection on the first frame of the video image to be processed to obtain a face region of the person; acquiring a foreground seed pixel point and a background seed pixel point according to the face region of the person; according to the foreground seed pixel point and the background Seed pixel points, respectively calculating the video map The probability that each pixel in the image is foreground or background;
根据所述各个概率构建图, 并进行图切割获取人物对象。  A map is constructed according to the respective probabilities, and a graph cut is performed to obtain a person object.
An apparatus for video character segmentation includes:

a first obtaining unit, configured to perform face detection on a first frame of a video image to be processed to obtain a face region of a person;

a second obtaining unit, configured to obtain foreground seed pixels and background seed pixels according to the face region;

a calculating unit, configured to calculate, according to the foreground seed pixels and the background seed pixels, the probability that each pixel in the video image belongs to the foreground or to the background;

a processing unit, configured to construct a graph according to these probabilities and to perform a graph cut to obtain the character object.

Embodiments of the present invention provide a method and an apparatus for video character segmentation. Face detection is performed on the first frame of the video image to be processed to obtain the face region; foreground and background seed pixels are obtained from that region; the probability that each pixel belongs to the foreground or to the background is calculated from the seed pixels; and a graph is constructed from these probabilities and cut to obtain the character object. In the prior art, the number of components of the Gaussian mixture model is set manually, adaptability to different kinds of video is poor, and a complete character object cannot be segmented when opening and closing operations are used to refine the result; by contrast, the solution provided by the embodiments of the present invention is applicable to various kinds of video characters and can segment a complete character object in real time.

BRIEF DESCRIPTION OF THE DRAWINGS

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments are briefly introduced below. Apparently, the drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a method for video character segmentation according to Embodiment 1 of the present invention;

FIG. 2 is a block diagram of an apparatus for video character segmentation according to Embodiment 1 of the present invention;

FIG. 3 is a flowchart of a method for video character segmentation according to Embodiment 2 of the present invention;

FIG. 4 is a schematic diagram of video character segmentation according to Embodiment 2 of the present invention;

FIG. 5 is a schematic diagram of graph cutting according to Embodiment 2 of the present invention;

FIG. 6 is a schematic diagram of contour determination according to Embodiment 2 of the present invention;

FIG. 7 is a schematic diagram of luminance change detection according to Embodiment 2 of the present invention;

FIG. 8 is a block diagram of an apparatus for video character segmentation according to Embodiment 2 of the present invention.

DETAILED DESCRIPTION
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Embodiment 1
An embodiment of the present invention provides a method for video character segmentation. As shown in FIG. 1, the method includes:

Step 101: Perform face detection on a first frame of a video image to be processed to obtain a face region of a person.

Only the first frame of the video to be processed is handled in this way; when the current frame is not the first frame, the video image can be segmented quickly by exploiting the correlation between adjacent video frames.

Step 102: Obtain foreground seed pixels and background seed pixels according to the face region.

Step 103: Calculate, according to the foreground seed pixels and the background seed pixels, the probability that each pixel in the video image belongs to the foreground or to the background.

Step 104: Construct a graph according to these probabilities, and perform a graph cut to obtain the character object.
This embodiment thus provides a method for video character segmentation: face detection is performed on the first frame to obtain the face region; foreground and background seed pixels are obtained from that region; the probability of each pixel belonging to the foreground or to the background is calculated from the seed pixels; and a graph is constructed from these probabilities and cut to obtain the character object. The prior art adapts poorly to different kinds of video and, when opening and closing operations are used to refine the segmentation result, cannot segment a complete character object; by contrast, the solution provided by this embodiment is applicable to various kinds of video characters and can segment a complete character object in real time.
An embodiment of the present invention provides an apparatus for video character segmentation. As shown in FIG. 2, the apparatus includes a first obtaining unit 201, a second obtaining unit 202, a calculating unit 203, and a processing unit 204.

The first obtaining unit 201 is configured to perform face detection on a first frame of a video image to be processed to obtain a face region of a person.

The second obtaining unit 202 is configured to obtain foreground seed pixels and background seed pixels according to the face region.

The calculating unit 203 is configured to calculate, according to the foreground seed pixels and the background seed pixels, the probability that each pixel in the video image belongs to the foreground or to the background.

The processing unit 204 is configured to construct a graph according to these probabilities and to perform a graph cut to obtain the character object. With this apparatus, the first obtaining unit performs face detection on the first frame to obtain the face region; the second obtaining unit obtains foreground and background seed pixels from that region; the calculating unit computes the foreground/background probability of each pixel; and the processing unit constructs a graph from these probabilities and cuts it to obtain the character object. Compared with the prior art, in which the number of Gaussian mixture components is set manually, adaptability to different kinds of video is poor, and a complete character object cannot be segmented, the solution of this embodiment is applicable to various kinds of video characters and can segment a complete character object in real time.
Embodiment 2
An embodiment of the present invention provides a method for video character segmentation. As shown in FIG. 3, the method includes:

Step 301: Determine whether the video frame image to be processed is the first frame.

The purpose of this check is that, when the current frame is not the first frame, it can be processed according to the segmentation result of the previous frame, that is, by exploiting the correlation between adjacent video frame images, which speeds up processing.
Step 302: When the video frame image to be processed is the first frame, perform face detection on it to obtain the face region of the person.

Specifically, the AdaBoost algorithm is used for face detection. AdaBoost is an iterative algorithm whose core idea is to train different classifiers, called weak classifiers, on the same training set and then combine these weak classifiers into a stronger final classifier (a strong classifier). For face detection, a group of classifiers is trained on face images (positive samples) and non-face images (negative samples); every region of the input image to be processed is searched, and the trained classifiers decide whether it is a face region. The detected face region is shown as the rectangular area in FIG. 4(a).
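The weak-to-strong combination at the heart of AdaBoost can be sketched as follows. This is only a minimal illustration of the weighted-voting scheme, not the patent's trained face detector; the decision stumps, their training errors, and the two sample feature vectors are invented for the example.

```python
import math

def adaboost_predict(stumps, alphas, x):
    """Strong classifier: sign of the alpha-weighted vote of the weak classifiers."""
    score = sum(a * h(x) for h, a in zip(stumps, alphas))
    return 1 if score >= 0 else -1  # +1: face region, -1: non-face region

# Hypothetical weak classifiers (decision stumps on single feature responses).
stumps = [
    lambda x: 1 if x[0] > 0.5 else -1,          # e.g. a Haar-like feature response
    lambda x: 1 if x[1] > 0.2 else -1,
    lambda x: 1 if x[0] + x[1] > 0.9 else -1,
]
# Standard AdaBoost weights alpha_t = 0.5*ln((1-err_t)/err_t), assumed errors.
errors = [0.30, 0.40, 0.35]
alphas = [0.5 * math.log((1 - e) / e) for e in errors]

print(adaboost_predict(stumps, alphas, (0.8, 0.6)))  # a region that looks like a face
print(adaboost_predict(stumps, alphas, (0.1, 0.1)))  # a region that does not
```

In the real detector each stump thresholds one Haar-like feature, and a cascade of such strong classifiers scans every region of the input image.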
Step 303: Obtain foreground seed pixels and background seed pixels according to the face region.
The face region is adjusted moderately to generate a foreground sampling model and a background sampling model. Specifically, as shown in FIG. 4(b), the face region is slightly shrunk; the distance between the face region and the upper-body region is then determined from the height of the face region, and the upper-body region is determined from the ratio between the head width and the shoulder width. This yields the foreground sampling model: the pixels inside the region enclosed by the light polyline in FIG. 4(b) are the foreground seed pixels.

As shown in FIG. 4(c), the foreground sampling model is enlarged to generate the background sampling model: the pixels in the region between the dark polyline in FIG. 4(c) and the image boundary are the background seed pixels.
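The geometric construction of the two sampling regions can be sketched as below. The shrink factor, face-to-body gap ratio, head-to-shoulder ratio, and expansion factor are assumed values for illustration; the patent only states that they are derived from the height of the face region and the head/shoulder width ratio.

```python
def seed_regions(face, img_w, img_h, shrink=0.8, gap_ratio=0.2,
                 shoulder_ratio=2.0, expand=1.3):
    """Derive sampling rectangles (x, y, w, h) from a detected face rectangle.
    shrink, gap_ratio, shoulder_ratio and expand are assumed values."""
    x, y, w, h = face
    # Moderately shrink the face rectangle (foreground seeds, part 1).
    fw, fh = w * shrink, h * shrink
    face_fg = (x + (w - fw) / 2, y + (h - fh) / 2, fw, fh)
    # Upper-body rectangle (foreground seeds, part 2): its distance below the
    # face comes from the face height, its width from the head/shoulder ratio.
    gap = h * gap_ratio
    bw = min(w * shoulder_ratio, img_w)
    by = min(face_fg[1] + fh + gap, img_h)
    body_fg = (max(x + w / 2 - bw / 2, 0), by, bw, img_h - by)
    # Background seeds: pixels between an enlarged copy of the foreground
    # region and the image boundary.
    ew = min(bw * expand, img_w)
    ey = max(y - h * 0.2, 0)
    bg_box = (max(x + w / 2 - ew / 2, 0), ey, ew, img_h - ey)
    return face_fg, body_fg, bg_box

face_fg, body_fg, bg_box = seed_regions((100, 50, 60, 60), 320, 240)
print(face_fg)  # (106.0, 56.0, 48.0, 48.0): the shrunken face rectangle
```

Background seed pixels would then be sampled outside `bg_box` up to the image border, matching FIG. 4(c).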
Step 304: Determine, from the foreground seed pixels, three groups of sample values on the L, a, and b color components, and determine, from the background seed pixels, three groups of sample values on the L, a, and b color components.

The solution of this embodiment converts the video image from the RGB (Red, Green, Blue) space to the Lab space. Lab consists of three channels: L is a luminance channel, and a and b are two color channels, where a ranges from magenta to green and b ranges from yellow to blue. The three color components L, a, and b are mutually independent. For the n sampled foreground seed pixels, three groups of sample values are obtained on the three components: {l1^F, l2^F, ..., ln^F}, {a1^F, a2^F, ..., an^F}, and {b1^F, b2^F, ..., bn^F}. Likewise, for the n sampled background seed pixels, three groups of sample values {l1^B, l2^B, ..., ln^B}, {a1^B, a2^B, ..., an^B}, and {b1^B, b2^B, ..., bn^B} are obtained, where the superscripts F and B denote foreground and background, respectively.
Step 305: Calculate, from the sample values of the foreground seed pixels and the background seed pixels, a first foreground probability and a first background probability for each pixel in the video image.

Specifically, a kernel density estimation method is used to construct non-parametric foreground/background models. For any pixel of the first video frame to be processed, the non-parametric kernel density estimate on one color component is

f(x) = (1/n) * sum_{i=1..n} (1 / (sqrt(2*pi) * sigma)) * exp(-(x - xi)^2 / (2 * sigma^2)),

where xi denotes the sample value of the i-th foreground (or background) seed pixel on that component, x denotes the value of the pixel under consideration on the same component, and sigma is the kernel bandwidth.

From f(x), the quantities fL^F(x), fa^F(x), fb^F(x) and fL^B(x), fa^B(x), fb^B(x) are calculated, where fL^F(x), fa^F(x), fb^F(x) denote the foreground probabilities of the pixel on the L, a, and b components, and fL^B(x), fa^B(x), fb^B(x) denote its background probabilities on the same components.

Because the three color components are mutually independent, the first foreground probability of any pixel in the video image is calculated as f^F(x) = fL^F(x) * fa^F(x) * fb^F(x), and the first background probability as f^B(x) = fL^B(x) * fa^B(x) * fb^B(x).
Step 306: Normalize the first foreground probability and the first background probability to obtain the probability that each pixel in the video image belongs to the foreground or to the background.

Specifically, the foreground and background probabilities of a pixel x are normalized: the probability that the pixel belongs to the foreground is calculated as P_F(x) = f^F(x) / [f^F(x) + f^B(x)], and the probability that it belongs to the background as P_B(x) = 1 - P_F(x).

Each pixel of the first video frame is processed by the methods of step 305 and step 306 to obtain its probability of belonging to the foreground or to the background. As shown in FIG. 4(d), the higher the value, the brighter the pixel and the larger its probability of being foreground; likewise, the darker the pixel, the larger its probability of being background.
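Steps 305 and 306 can be sketched as follows for a single pixel. The Gaussian kernel bandwidth and the seed sample values are assumptions for illustration; each Lab channel is modeled independently, and the per-channel densities are multiplied and then normalized exactly as in the text.

```python
import math

def kde(x, samples, sigma=8.0):
    """Gaussian kernel density estimate of one color component at value x."""
    n = len(samples)
    norm = n * math.sqrt(2 * math.pi) * sigma
    return sum(math.exp(-(x - xi) ** 2 / (2 * sigma ** 2)) for xi in samples) / norm

def fg_bg_probability(pixel_lab, fg_seeds, bg_seeds):
    """pixel_lab: (L, a, b) values; *_seeds: per-channel seed sample lists."""
    f_fg, f_bg = 1.0, 1.0
    for i, ch in enumerate("Lab"):
        f_fg *= kde(pixel_lab[i], fg_seeds[ch])  # channels are independent
        f_bg *= kde(pixel_lab[i], bg_seeds[ch])
    p_fg = f_fg / (f_fg + f_bg)                  # normalization of step 306
    return p_fg, 1.0 - p_fg

# Hypothetical seed samples: foreground skin-like, background darker/neutral.
fg = {"L": [60, 62, 58], "a": [20, 22, 18], "b": [15, 14, 16]}
bg = {"L": [20, 22, 18], "a": [0, 2, -2], "b": [0, 1, -1]}
p_f, p_b = fg_bg_probability((59, 19, 15), fg, bg)
print(round(p_f, 3))  # close to 1: this pixel resembles the foreground seeds
```

In the full method this computation is repeated for every pixel of the first frame to produce the probability map of FIG. 4(d).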
Step 307: Construct a graph according to these probabilities, and perform a graph cut to obtain the character object.

It should be noted that, as shown in FIG. 5(a), a graph G = {V, E, W} consists of: V (vertices), the basic elements of G, representing the interrelated individuals in the graph, where v_i denotes the i-th vertex and v_j the j-th vertex; E (edges), the connections between associated vertices, where E_ij denotes the edge connecting vertices i and j; and W (weights), where a weight is a value assigned to an edge to express how closely its two vertices are related, with w_ij the weight of the edge connecting vertices i and j.

The solution of this embodiment performs the graph cut with the max-flow/min-cut algorithm. As shown in FIG. 5(b), all vertices of the graph are partitioned into two subsets, and the edges between the two subsets form a cut of the graph, shown as the dashed line in FIG. 5(b). The two subsets contain, respectively, a virtual source and a virtual sink; the source corresponds to the foreground seed pixels and the sink to the background seed pixels. Among all cuts separating the source from the sink, the one with the smallest total weight is the minimum cut. A basic way to find it is to find the maximum flow from the source to the sink: regard each edge as a water pipe whose capacity is the edge weight. The maximum flow is the largest amount of water that can pass from the source to the sink; when the flow reaches this maximum, the pipes that are completely saturated form exactly the minimum cut separating the source and the sink.
In this embodiment, the graph is built by constructing energy terms between pixels. Specifically, each pixel of the video frame image corresponds to a vertex of the graph, and the edges of the graph connect adjacent pixels accordingly; each edge is assigned a weight expressing the relationship between the two pixels it connects, such as their color similarity and their relationship to the source and the sink. The foreground probability of a pixel expresses its relationship to the source, and its background probability expresses its relationship to the sink. The source and the sink represent the foreground and background seed pixels, respectively: the larger the foreground probability of a pixel, the closer its relationship to the source and the more likely it belongs to the character object; the larger its background probability, the closer its relationship to the sink and the more likely it belongs to the background.

Specifically, the data energy term of any pixel x in the video image is calculated as

E_d(x) = P_F(x) if B(x) = 0, and E_d(x) = P_B(x) if B(x) = 1,

and the smoothing energy term between any pixel x and an adjacent pixel y as

E_s(x, y) = alpha if B(x) != B(y), and E_s(x, y) = 0 if B(x) = B(y),

where B is a binary variable, B(x) = 0 means the pixel x belongs to the background, and B(x) = 1 means it belongs to the foreground; E_d(x) is the data energy term; E_s(x, y) is the smoothing energy term over adjacent pixels; the pixel y is any pixel in the 4-neighborhood of x; and alpha is a parameter, which may be 1.5.

The graph is constructed from the data energy term of every pixel and the smoothing energy terms between every pixel and its adjacent pixels.

In this embodiment, the problem of segmenting the character object in the video frame image is thereby converted into the problem of partitioning the constructed graph; specifically, the max-flow/min-cut algorithm can be used to perform the graph cut, so that the character object is obtained.
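The energy model of step 307 can be illustrated on a tiny strip of pixels. For clarity this sketch minimizes the energy by brute force over all labelings instead of by max-flow/min-cut; the minimizer is the same, since the graph cut finds exactly the labeling with minimum total energy. The foreground probabilities are assumed values, and the data term follows the reconstruction above (cost P_F for labeling a pixel background, P_B for labeling it foreground).

```python
from itertools import product

ALPHA = 1.5  # smoothness weight, the value suggested in the text

def total_energy(labels, p_fg):
    """Data term: P_F(x) when B(x)=0, P_B(x)=1-P_F(x) when B(x)=1.
    Smoothness term: ALPHA for every adjacent pair with different labels."""
    data = sum(p if b == 0 else (1.0 - p) for b, p in zip(labels, p_fg))
    smooth = ALPHA * sum(1 for i in range(len(labels) - 1)
                         if labels[i] != labels[i + 1])
    return data + smooth

# A 1-D "image" of six pixels with assumed foreground probabilities.
p_fg = [0.9, 0.8, 0.7, 0.2, 0.1, 0.1]
best = min(product([0, 1], repeat=len(p_fg)),
           key=lambda ls: total_energy(ls, p_fg))
print(best)  # (1, 1, 1, 0, 0, 0): foreground left, background right, one boundary
```

On a real image the 2^n enumeration is infeasible, which is why the max-flow/min-cut algorithm is used to find the same minimum-energy cut in polynomial time.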
Steps 302 to 307 form the processing flow for the first frame image. When the image to be processed is not the first frame, running steps 302 to 307 on every frame would be quite time-consuming, so the following procedure is further adopted when processing the whole video sequence.

Step 308: When the video frame image to be processed is not the first frame, perform luminance change detection on it to obtain the luminance difference distance between the current video frame image and the previous one.

Whether the non-parametric foreground/background models need updating depends mainly on scene changes, and one major factor is luminance change. A luminance change may be caused by the surrounding environment or by the video capture device; such a change can make the foreground/background probabilities computed with the current non-parametric models fit the current video frame poorly.
Specifically, the luminance change detection uses the Bhattacharyya distance to compare the luminance histogram H1 of the current frame with the luminance histogram H0 of the previous frame, that is,

D(H1, H0) = sqrt(1 - sum_i sqrt(H1(i) * H0(i))),

where H1(i) is the value of histogram H1 at gray level i, H0(i) is the value of histogram H0 at gray level i, and both histograms are normalized so that their bins sum to 1.
Step 309: Determine whether the luminance difference distance is smaller than a preset threshold.

The preset threshold is determined experimentally and may be 0.1.

When the luminance difference distance is not smaller than the preset threshold, the current frame is processed as if it were a first frame. FIG. 7 shows two consecutive frames, together with their histograms, whose luminance difference distance exceeds the preset threshold; such frames are processed according to steps 302 to 307.
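The luminance-change test of steps 308 and 309 can be sketched as follows. The histograms are assumed to be normalized to sum to 1; the 4-bin toy histograms are invented for the example, and the 0.1 threshold follows the text.

```python
import math

def bhattacharyya_distance(h1, h0):
    """Bhattacharyya distance between two normalized luminance histograms."""
    coeff = sum(math.sqrt(a * b) for a, b in zip(h1, h0))
    return math.sqrt(max(0.0, 1.0 - coeff))  # clamp against rounding error

def needs_full_resegmentation(h1, h0, threshold=0.1):
    """True when the luminance changed enough to rerun steps 302-307."""
    return bhattacharyya_distance(h1, h0) >= threshold

prev = [0.25, 0.25, 0.25, 0.25]          # previous frame, 4-bin toy histogram
curr_similar = [0.24, 0.26, 0.25, 0.25]  # small drift: keep the current models
curr_changed = [0.70, 0.10, 0.10, 0.10]  # strong luminance change
print(needs_full_resegmentation(curr_similar, prev))  # False
print(needs_full_resegmentation(curr_changed, prev))  # True
```

Identical histograms give a distance of 0 and disjoint ones give 1, so the 0.1 threshold only triggers a full re-segmentation on a clear scene change.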
Step 310: When the luminance difference distance is smaller than the preset threshold, determine the character-object contour of the current video frame image from the character-object contour of the previous frame.

Specifically, at least one key point on the character-object contour of the previous frame is extracted from the binary map of the previous frame's segmentation result. Feature points where the direction of the contour changes abruptly can be extracted and then sampled at equal intervals to obtain a suitable number of key points; the start point and end point of the contour, as well as feature points close to the bottom of the image, are also selected as key points. The dozens of dark dots inside the gray band region shown in FIG. 6 are the key points.
According to the at least one key point, the corresponding key point of each key point in the current video frame image is determined.

Specifically, for each key point x, its motion vector is estimated by minimizing the energy function E = L + r * G, where the parameter r controls the weighting between the energy terms L and G.

The energy term L is defined as

L = sum_{x' in W} M(x') * ||I0(z_x') - I1(z_x' + mv_x)||,

where z_x denotes the position of x, mv_x denotes the motion vector estimated for the pixel x, W denotes a window centered on the pixel x in the previous frame, I0 and I1 denote the previous and the current frame, respectively, M(x') = 1 if the pixel at the same position as x' belonged to the foreground region in the previous frame, and M(x') = 0 otherwise.

The energy term G depends on the gradient information of the current frame:

G = exp(-max_{c in {L, a, b}} ||g_c(z_x + mv_x)||),

where g_c denotes the gradient value on color component c.

The range of the motion vector of the pixel x can be set to 4 pixels up, down, left, and right, giving (2*4+1) * (2*4+1) = 81 candidate motion vectors. For each candidate, a value of the energy function E can be computed, and the motion vector with the smallest E is selected as the motion vector of the pixel x; in this way, the key point corresponding to x in the current frame is obtained.
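The key-point tracking of step 310 can be sketched as a search over the 81 candidate motion vectors. The toy frames, the 3x3 window, the energy weight r, and the one-dimensional gradient used as a stand-in for the term G are all assumptions for illustration.

```python
import math

def estimate_motion_vector(prev, curr, key, mask, search=4, win=1, r=0.5):
    """Brute-force search over the (2*search+1)^2 = 81 candidate motion vectors
    for key point `key`, minimizing E = L + r*G as described in the text.
    prev/curr: 2-D grayscale frames; mask[y][x] = 1 where prev was foreground."""
    h, w = len(prev), len(prev[0])
    kx, ky = key
    best_mv, best_e = (0, 0), float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            L = 0.0
            for wy in range(-win, win + 1):      # window W in the previous frame
                for wx in range(-win, win + 1):
                    y0, x0 = ky + wy, kx + wx
                    y1, x1 = y0 + dy, x0 + dx
                    if 0 <= y0 < h and 0 <= x0 < w and 0 <= y1 < h and 0 <= x1 < w:
                        L += mask[y0][x0] * abs(prev[y0][x0] - curr[y1][x1])
                    else:
                        L += 255.0               # penalize leaving the frame
            # Toy stand-in for G: small (good) where the landing point is an edge.
            yy, xx = ky + dy, kx + dx
            if 0 <= yy < h and 0 < xx < w - 1:
                G = math.exp(-abs(curr[yy][xx + 1] - curr[yy][xx - 1]))
            else:
                G = 1.0
            if L + r * G < best_e:
                best_e, best_mv = L + r * G, (dx, dy)
    return best_mv

# A bright square moves 2 px to the right between two 12x12 toy frames.
prev = [[255 if 3 <= x <= 6 and 3 <= y <= 6 else 0 for x in range(12)] for y in range(12)]
curr = [[255 if 5 <= x <= 8 and 3 <= y <= 6 else 0 for x in range(12)] for y in range(12)]
mask = [[1 if v else 0 for v in row] for row in prev]
print(estimate_motion_vector(prev, curr, key=(6, 4), mask=mask))  # (2, 0)
```

The mask term M keeps the match anchored on foreground pixels, and the gradient term G breaks ties in favor of landing the key point on an edge of the current frame, as a contour point should.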
At least one target key point is then determined according to the changes in distance and slope between adjacent key points.

Specifically, after all key points of the current frame have been obtained, key points that contribute little to the shape of the character-object contour are discarded according to the changes in distance and slope between adjacent key points, and the remaining key points are taken as the target key points.

The at least one target key point is connected to obtain the character-object contour of the current video frame image.
Connecting the at least one target key point yields an approximate character contour; an erosion and dilation operation is performed on this contour to obtain the uncertain region, namely the gray band region shown in FIG. 6.
Step 311: Update, according to the character-object contour, the probability that each pixel in the current video frame image belongs to the foreground or to the background.

Specifically, the currently determined contour is only an approximate character-object contour. As shown in FIG. 6, the white region is the foreground, the black region is the background, and the gray band is the uncertain region, which may turn out to be either foreground or background. The non-parametric foreground/background models of the pixels in the current frame are updated from the non-parametric foreground/background models of the previous video frame; that is, the probability of each pixel belonging to the foreground or to the background is determined, specifically by the methods of steps 305 and 306.
It should be noted that, because most of the foreground/background regions of the current video frame image have already been determined from the previous frame, the segmentation of the character object over the whole video is accelerated.
Step 312: Construct a graph according to these probabilities, and perform a graph cut to obtain the character object of the current video frame image.

Specifically, the character object of the video frame image is cut out by the method of step 307.
本发明实施例提供的一种视频人物分割的方法, 通过对视频帧图像中人物对象的分割, 当采用现有技术时, 对各类视频的适应性不强, 并且采用开闭运算优化对象分割结果时, 不能分割出完整的人物对象, 本发明实施例提供的方案可以适用于各类视频人物的分割, 并且可以实时地分割出完整人物对象, 基于相邻视频帧之间的相关性, 可以快速地分割整个视频。  The embodiment of the present invention provides a video person segmentation method that segments the person object in video frame images. The prior art adapts poorly to the various types of video, and when the segmentation result is optimized by morphological opening and closing operations, a complete person object cannot be segmented. The scheme provided by the embodiment of the present invention is applicable to the segmentation of persons in various types of video, can segment a complete person object in real time, and, based on the correlation between adjacent video frames, can quickly segment the entire video.
本发明实施例提供一种视频人物分割的装置, 如图 8所示, 该装置包括: 判断单元 801 , 第一获取单元 802 , 第二获取单元 803 , 计算单元 804 , 确定模块 805 , 第一计算模块 806 , 第一计算子模块 807 , 第二计算子模块 808 , 第二计算模块 809 , 第一计算子模块 810, 第二计算子模块 811 , 处理单元 812 , 检测获取单元 813 , 确定单元 814 , 提取模块 815 , 第一确定模块 816 , 第二确定模块 817 , 获取模块 818 , 更新单元 819。  An embodiment of the present invention provides a device for video person segmentation. As shown in FIG. 8, the device includes: a determining unit 801, a first acquiring unit 802, a second acquiring unit 803, a calculating unit 804, a determining module 805, a first calculating module 806, a first calculating sub-module 807, a second calculating sub-module 808, a second calculating module 809, a first calculating sub-module 810, a second calculating sub-module 811, a processing unit 812, a detection acquiring unit 813, a determining unit 814, an extracting module 815, a first determining module 816, a second determining module 817, an acquiring module 818, and an updating unit 819.
判断单元 801 , 用于判断待处理的视频帧图像是否为第一帧;  The determining unit 801 is configured to determine whether the image of the video frame to be processed is the first frame;
当待处理的视频帧图像为第一帧时, 第一获取单元 802 , 用于将待处理的第一帧视频图像进行人脸检测, 获取人物脸部区域; 可以采用 AdaBoost 算法进行人脸检测, 利用人脸图像和非人脸图像, 训练一组分类器, 其中, 人脸图像为正样本, 非人脸图像为负样本; 搜索输入的待处理的图像的每个区域, 利用这组分类器判断出人脸区域。  When the video frame image to be processed is the first frame, the first acquiring unit 802 is configured to perform face detection on the first frame of the video image to be processed to obtain the face region of the person. The AdaBoost algorithm may be used for face detection: a group of classifiers is trained with face images as positive samples and non-face images as negative samples; each region of the input image to be processed is then searched, and the group of classifiers determines the face region.
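The train-on-positive/negative-samples idea behind AdaBoost can be illustrated with a toy implementation. Note this is a simplified decision-stump booster on plain feature vectors, not the Haar-feature cascade typically used for face detection; the round count and stump search are illustrative choices only.

```python
import numpy as np

def train_adaboost_stumps(X, y, rounds=10):
    """AdaBoost with depth-1 decision stumps.
    X: (n, d) features; y: labels in {-1, +1} (+1 = face / positive).
    Returns a list of weak learners (feature, threshold, polarity, alpha)."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)           # sample weights, re-weighted each round
    learners = []
    for _ in range(rounds):
        best = None
        for f in range(d):            # exhaustive stump search
            for t in np.unique(X[:, f]):
                for pol in (1, -1):
                    pred = np.where(pol * (X[:, f] - t) >= 0, 1, -1)
                    err = w[pred != y].sum()
                    if best is None or err < best[0]:
                        best = (err, f, t, pol, pred)
        err, f, t, pol, pred = best
        err = max(err, 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)   # learner weight
        w *= np.exp(-alpha * y * pred)          # boost the misclassified
        w /= w.sum()
        learners.append((f, t, pol, alpha))
        if err < 1e-9:
            break
    return learners

def predict(learners, X):
    """Sign of the alpha-weighted vote of all stumps."""
    s = np.zeros(len(X))
    for f, t, pol, alpha in learners:
        s += alpha * np.where(pol * (X[:, f] - t) >= 0, 1, -1)
    return np.where(s >= 0, 1, -1)
```

In the patent's setting, the rows of `X` would be features extracted from face (positive) and non-face (negative) image windows, and `predict` would be run on every candidate region of the input frame.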
根据所述人物脸部区域, 第二获取单元 803 获取前景种子像素点和背景 种子像素点;  According to the face area of the person, the second acquiring unit 803 acquires a foreground seed pixel point and a background seed pixel point;
具体地, 对第一获取单元 802 获取的人物脸部区域进行适度的调整, 即对人物脸部区域进行适度缩小, 然后根据人物脸部区域的高度确定人物脸部区域和上半身区域之间的距离, 根据人物头部和肩部的宽度的比例确定上半身的区域, 这样即可生成前景采样模型, 其中前景采样模型内包含的像素点即为前景种子像素点; 在确定的前景采样模型的基础上进行扩大, 生成背景采样模型, 其中背景采样模型内包含的像素点即为背景种子像素点。  Specifically, the face region acquired by the first acquiring unit 802 is moderately adjusted, that is, moderately shrunk; the distance between the face region and the upper-body region is then determined according to the height of the face region, and the upper-body region is determined according to the ratio of the widths of the head and the shoulders, thereby generating the foreground sampling model, in which the contained pixels are the foreground seed pixels. The foreground sampling model is then enlarged to generate the background sampling model, in which the contained pixels are the background seed pixels.
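The box geometry described above can be sketched as follows. The patent does not give concrete numbers, so every ratio here (`shrink`, `shoulder_ratio`, the 0.2/1.5 torso offsets, `margin`) is a hypothetical placeholder chosen only to make the idea concrete.

```python
def seed_regions(face, img_w, img_h, shrink=0.8, shoulder_ratio=2.0, margin=0.5):
    """face = (x, y, w, h) rectangle from the face detector.
    Returns (fg_boxes, bg_box), each box as (x, y, w, h).
    All ratios are illustrative assumptions, not values from the patent."""
    x, y, w, h = face
    # moderately shrink the detected face rectangle
    fw, fh = w * shrink, h * shrink
    fx, fy = x + (w - fw) / 2, y + (h - fh) / 2
    # torso box below the face: the gap scales with the face height,
    # the shoulders are wider than the head by `shoulder_ratio`
    tw = fw * shoulder_ratio
    tx = fx + (fw - tw) / 2
    ty = fy + fh + 0.2 * fh
    th = 1.5 * fh
    fg_boxes = [(fx, fy, fw, fh), (tx, ty, tw, th)]
    # background sampling box: the union of the foreground boxes, enlarged
    # and clipped to the image; pixels inside it but outside the foreground
    # boxes can serve as background seeds
    x0, y0 = min(fx, tx), fy
    x1, y1 = max(fx + fw, tx + tw), ty + th
    mx, my = (x1 - x0) * margin, (y1 - y0) * margin
    bx, by = max(0.0, x0 - mx), max(0.0, y0 - my)
    bg_box = (bx, by,
              min(float(img_w), x1 + mx) - bx,
              min(float(img_h), y1 + my) - by)
    return fg_boxes, bg_box
```

The foreground boxes supply the foreground seed pixels; the enlarged `bg_box` plays the role of the expanded background sampling model.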
计算单元 804 , 用于根据所述前景种子像素点和所述背景种子像素点, 分 别计算所述视频图像中各个像素点为前景或者背景的概率;  The calculating unit 804 is configured to separately calculate a probability that each pixel in the video image is a foreground or a background according to the foreground seed pixel point and the background seed pixel point;
具体地, 在计算所述视频图像中各个像素点为前景或者背景的概率时, 所述计算单元 804 中的确定模块 805, 用于根据所述前景种子像素点分别确定所述前景种子像素点在 L、 a、 b 三种颜色分量上的三组样本值, 根据所述背景种子像素点分别确定所述背景种子像素点在 L、 a、 b 三种颜色分量上的三组样本值;  Specifically, when calculating the probability that each pixel in the video image is foreground or background, the determining module 805 in the calculating unit 804 is configured to determine, from the foreground seed pixels, three sets of sample values of the foreground seed pixels on the three color components L, a, and b, and to determine, from the background seed pixels, three sets of sample values of the background seed pixels on the three color components L, a, and b;
第一计算模块 806, 用于根据所述前景种子像素点和所述背景种子像素点的样本值, 分别计算所述视频图像中各个像素点的第一前景概率和第一背景概率;  The first calculating module 806 is configured to calculate the first foreground probability and the first background probability of each pixel in the video image according to the sample values of the foreground seed pixels and the background seed pixels;
进一步地, 所述第一计算模块 806 中的第一计算子模块 807, 用于根据所述前景种子像素点和所述背景种子像素点的样本值, 分别计算 f_L^F(x)、 f_a^F(x)、 f_b^F(x) 和 f_L^B(x)、 f_a^B(x)、 f_b^B(x); 其中, x 表示所述视频图像中任一个像素点, f_L^F(x)、 f_a^F(x)、 f_b^F(x) 分别表示所述像素点在 L、 a、 b 三种颜色分量上的前景概率, f_L^B(x)、 f_a^B(x)、 f_b^B(x) 分别表示所述像素点在 L、 a、 b 三种颜色分量上的背景概率;  Further, the first calculating sub-module 807 in the first calculating module 806 is configured to calculate f_L^F(x), f_a^F(x), f_b^F(x) and f_L^B(x), f_a^B(x), f_b^B(x) according to the sample values of the foreground seed pixels and the background seed pixels, where x denotes any pixel in the video image, f_L^F(x), f_a^F(x), f_b^F(x) denote the foreground probabilities of the pixel on the three color components L, a, and b, and f_L^B(x), f_a^B(x), f_b^B(x) denote the background probabilities of the pixel on the three color components L, a, and b;
第二计算子模块 808, 用于根据 f^F(x) = f_L^F(x) * f_a^F(x) * f_b^F(x), 计算得到所述视频图像中任一个像素点的第一前景概率;  The second calculating sub-module 808 is configured to calculate the first foreground probability of any pixel in the video image according to f^F(x) = f_L^F(x) * f_a^F(x) * f_b^F(x);
所述第二计算子模块 808 还用于, 根据 f^B(x) = f_L^B(x) * f_a^B(x) * f_b^B(x), 计算得到所述视频图像中任一个像素点的第一背景概率。  The second calculating sub-module 808 is further configured to calculate the first background probability of any pixel in the video image according to f^B(x) = f_L^B(x) * f_a^B(x) * f_b^B(x).
将所述第一前景概率和所述第一背景概率进行归一化处理, 第二计算模 块 809, 计算得到所述视频图像中各个像素点为前景或者背景的概率;  The first foreground probability and the first background probability are normalized, and the second computing module 809 calculates a probability that each pixel in the video image is a foreground or a background;
具体地, 所述第二计算模块 809 中的第一计算子模块 810, 用于根据 p_F(x) = f^F(x) / [f^F(x) + f^B(x)], 计算得到所述视频图像中任一个像素点为前景的概率;  Specifically, the first calculating sub-module 810 in the second calculating module 809 is configured to calculate the probability that any pixel in the video image is foreground according to p_F(x) = f^F(x) / [f^F(x) + f^B(x)];
第二计算子模块 811, 用于根据 p_B(x) = 1 - p_F(x), 计算得到所述视频图像中任一个像素点为背景的概率。  The second calculating sub-module 811 is configured to calculate the probability that any pixel in the video image is background according to p_B(x) = 1 - p_F(x).
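The per-channel non-parametric model, the per-pixel product over the L, a, b components, and the normalisation step can be sketched together. The patent does not specify the density estimator; this sketch assumes a 1-D Gaussian kernel density per channel with an arbitrary bandwidth, which is one common non-parametric choice, not necessarily the patent's.

```python
import numpy as np

def channel_density(samples, values, bandwidth=8.0):
    """1-D Gaussian kernel density of one L*a*b* channel, evaluated at
    each pixel value. The bandwidth is an assumed illustrative choice."""
    d = values[:, None] - samples[None, :]
    return np.exp(-0.5 * (d / bandwidth) ** 2).mean(axis=1)

def fg_bg_probability(pix_lab, fg_seeds, bg_seeds):
    """pix_lab: (n, 3) pixel colours; fg_seeds / bg_seeds: (m, 3) seed
    samples in L*a*b*. Returns (p_fg, p_bg), normalised so that
    p_fg + p_bg == 1 for every pixel."""
    fF = np.ones(len(pix_lab))
    fB = np.ones(len(pix_lab))
    for c in range(3):                 # product over the L, a, b channels
        fF *= channel_density(fg_seeds[:, c], pix_lab[:, c])
        fB *= channel_density(bg_seeds[:, c], pix_lab[:, c])
    p_fg = fF / (fF + fB)              # normalisation step
    return p_fg, 1.0 - p_fg
```

Pixels coloured like the foreground seeds come out with p_fg near 1, pixels coloured like the background seeds near 0; these are the per-pixel probabilities fed into the graph construction.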
当确定当前视频帧图像中的像素点的前景 /背景概率后, 处理单元 812 根据所述各个概率构建图, 并进行图切割获取人物对象;  After the foreground/background probabilities of the pixels in the current video frame image are determined, the processing unit 812 constructs a graph according to the respective probabilities and performs graph cutting to obtain the person object;
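The graph construction and cut can be illustrated on a toy scale: each pixel becomes a node, terminal links to a virtual source/sink carry the foreground/background probabilities, and neighbour links carry a smoothness cost; the minimum s-t cut then labels the pixels. This is a generic sketch with illustrative weights and a plain Edmonds–Karp max-flow, not the patent's graph-cut implementation (production systems use specialised max-flow solvers).

```python
from collections import deque

def max_flow_min_cut(n, edges, s, t):
    """Edmonds-Karp max-flow on a small dense graph.
    edges: list of (u, v, capacity), directed; add both directions for
    undirected neighbour links. Returns the source side of the min cut."""
    cap = [[0.0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
    while True:
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q and parent[t] == -1:      # BFS for a shortest augmenting path
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            break
        f, v = float('inf'), t            # bottleneck along the path
        while v != s:
            f = min(f, cap[parent[v]][v]); v = parent[v]
        v = t
        while v != s:                     # push flow, update residuals
            cap[parent[v]][v] -= f
            cap[v][parent[v]] += f
            v = parent[v]
    seen, q = {s}, deque([s])             # nodes still reachable from s
    while q:
        u = q.popleft()
        for v in range(n):
            if v not in seen and cap[u][v] > 1e-12:
                seen.add(v); q.append(v)
    return seen
```

For example, three pixels in a row with foreground probabilities 0.9, 0.8, 0.1, source t-link capacity p, sink t-link capacity 1 − p, and neighbour capacity 0.2, cut so that the first two pixels land on the foreground side.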
当待处理的视频帧图像不是第一帧时, 检测获取单元 813 对所述视频帧图像进行亮度变化检测, 获得当前所述视频帧图像与前一帧视频图像之间的亮度差异距离;  When the video frame image to be processed is not the first frame, the detection acquiring unit 813 performs brightness change detection on the video frame image to obtain the brightness difference distance between the current video frame image and the previous video frame image.
具体地, 亮度变化检测主要利用 Bhattacharyya 距离来计算当前帧的亮度直方图 H1 与前一帧的亮度直方图 H0 之间的差异, 即根据
d(H1, H0) = sqrt(1 - Σ_i sqrt(H1(i) * H0(i)))
其中, H1(i) 是直方图 H1 在灰度阶 i 的值, H0(i) 是直方图 H0 在灰度阶 i 的值。  Specifically, the brightness change detection uses the Bhattacharyya distance to measure the difference between the luminance histogram H1 of the current frame and the luminance histogram H0 of the previous frame, that is, d(H1, H0) = sqrt(1 - Σ_i sqrt(H1(i) * H0(i))), where H1(i) is the value of histogram H1 at gray level i and H0(i) is the value of histogram H0 at gray level i.
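The histogram comparison can be sketched directly in NumPy. This uses the standard Bhattacharyya form d = sqrt(1 − Σ_i sqrt(H1(i)·H0(i))) on histograms normalised to sum to 1 (an assumption, since the original formula survives only as an image placeholder), with 256 gray levels.

```python
import numpy as np

def luminance_histogram(gray, bins=256):
    """Normalised luminance histogram of a frame (pixel values in 0..255)."""
    h, _ = np.histogram(gray, bins=bins, range=(0, 256))
    return h / h.sum()

def bhattacharyya_distance(h1, h0):
    """Distance between two normalised histograms:
    0 = identical distributions, 1 = completely disjoint."""
    bc = np.sum(np.sqrt(h1 * h0))        # Bhattacharyya coefficient
    return np.sqrt(max(0.0, 1.0 - bc))   # clamp guards tiny float negatives
```

If the distance stays below the preset threshold, the lighting is considered stable and the previous frame's contour can be propagated; a large distance would instead trigger re-detection.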
当所述亮度差异距离小于预设阈值时, 确定单元 814 根据所述前一帧视频图像的人物对象轮廓, 确定当前所述视频帧图像的人物对象轮廓;  When the brightness difference distance is less than the preset threshold, the determining unit 814 determines the person object contour of the current video frame image according to the person object contour of the previous video frame image;
进一步地, 所述确定单元 814中的提取模块 815 , 用于根据所述前一帧视 频图像的分割结果的二值图, 提取所述前一帧视频图像的人物对象轮廓上的 至少一个关键点;  Further, the extracting module 815 in the determining unit 814 is configured to extract at least one key point on the contour of the character object of the video image of the previous frame according to the binary image of the segmentation result of the video image of the previous frame. ;
根据至少一个所述关键点, 第一确定模块 816 确定每个所述关键点在当前所述视频帧图像中对应的关键点; 根据相邻的两个所述关键点之间的距离及斜率变化, 第二确定模块 817 确定至少一个目标关键点;  According to the at least one key point, the first determining module 816 determines the key point corresponding to each key point in the current video frame image; according to the distance and slope change between two adjacent key points, the second determining module 817 determines at least one target key point;
获取模块 818 , 用于将所述至少一个目标关键点连接起来, 获得当前所述 视频帧图像的人物对象轮廓;  An obtaining module 818, configured to connect the at least one target key point to obtain a character object contour of the current video frame image;
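The distance-and-slope filtering of matched key points can be sketched as follows. The patent does not give the thresholds or the exact rule, so this is one plausible reading: a matched point is kept as a target key point only if its distance and direction relative to the previous kept point change smoothly; both thresholds are assumed values.

```python
import math

def filter_target_keypoints(points, max_dist=30.0, max_slope_change=1.0):
    """points: [(x, y), ...] key points matched into the current frame,
    in contour order. Returns the target key points, dropping points
    whose distance or slope to the previous kept point jumps abruptly."""
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    prev_slope = None
    for p in points[1:]:
        q = kept[-1]
        dx, dy = p[0] - q[0], p[1] - q[1]
        dist = math.hypot(dx, dy)
        slope = math.atan2(dy, dx)   # angle form avoids infinite slopes
        ok = dist <= max_dist
        if ok and prev_slope is not None:
            ok = abs(slope - prev_slope) <= max_slope_change
        if ok:
            kept.append(p)
            prev_slope = slope
    return kept
```

Connecting the surviving points in order yields the propagated contour of the current frame, as the acquiring module 818 describes.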
然后根据所述人物对象轮廓, 更新单元 819 更新当前所述视频帧图像中各个像素点为前景或者背景的概率; 所述处理单元 812 还用于, 根据更新的所述各个概率构建图, 并进行图切割获取当前所述视频帧图像的人物对象。  Then, according to the person object contour, the updating unit 819 updates the probability of each pixel in the current video frame image being foreground or background; the processing unit 812 is further configured to construct a graph according to the updated probabilities and perform graph cutting to obtain the person object of the current video frame image.
本发明实施例提供一种视频人物分割的装置, 通过第一获取单元将待处理的第一帧视频图像进行人脸检测, 获取人物脸部区域, 根据所述人物脸部区域, 第二获取单元获取前景种子像素点和背景种子像素点, 然后计算单元分别计算所述视频图像中各个像素点为前景或者背景的概率, 处理单元根据所述各个概率构建图, 并进行图切割获取人物对象。 与现有技术中对视频人物进行分割时, 高斯混合模型的分量数目由人工设定, 对各类视频的适应性不强, 并且不能分割出完整的人物对象相比, 本发明实施例提供的方案可以适用于各类视频人物的分割, 并且可以实时地分割出完整人物对象。 以上所述, 仅为本发明的具体实施方式, 但本发明的保护范围并不局限于此, 任何熟悉本技术领域的技术人员在本发明揭露的技术范围内, 可轻易想到变化或替换, 都应涵盖在本发明的保护范围之内。 因此, 本发明的保护范围应以权利要求的保护范围为准。  An embodiment of the present invention provides a device for video person segmentation. The first acquiring unit performs face detection on the first frame of the video image to be processed to obtain the face region of the person; the second acquiring unit acquires foreground seed pixels and background seed pixels according to the face region; the calculating unit then calculates the probability of each pixel in the video image being foreground or background; and the processing unit constructs a graph according to the respective probabilities and performs graph cutting to obtain the person object. In the prior art, when segmenting persons in video, the number of components of the Gaussian mixture model is set manually, the adaptability to various types of video is weak, and a complete person object cannot be segmented; in contrast, the scheme provided by the embodiment of the present invention is applicable to the segmentation of persons in various types of video and can segment a complete person object in real time. The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

权利 要 求 书 Claim
1、 一种视频人物分割的方法, 其特征在于, 包括: A method for segmenting a video character, comprising:
将待处理的第一帧视频图像进行人脸检测, 获取人物脸部区域;  Performing face detection on the first frame of the video image to be processed to obtain a face region of the person;
根据所述人物脸部区域, 获取前景种子像素点和背景种子像素点; 根据所述前景种子像素点和所述背景种子像素点, 分别计算所述视频图像 中各个像素点为前景或者背景的概率;  Obtaining a foreground seed pixel point and a background seed pixel point according to the face area of the person; calculating a probability of each pixel point in the video image as a foreground or a background according to the foreground seed pixel point and the background seed pixel point respectively ;
根据所述各个概率构建图, 并进行图切割获取人物对象。  A graph is constructed according to the respective probabilities, and graph cutting is performed to obtain the person object.
2、 根据权利要求 1所述的视频人物分割的方法, 其特征在于, 在所述将待 处理的第一帧视频图像进行人脸检测, 获取人物脸部区域之前, 还包括:  The method for segmenting a video character according to claim 1, wherein before the performing the face detection on the first frame video image to be processed to obtain the face region of the person, the method further includes:
判断待处理的视频帧图像是否为第一帧;  Determining whether the image of the video frame to be processed is the first frame;
当待处理的视频帧图像不是第一帧时, 对所述视频帧图像进行亮度变化检 测, 获得当前所述视频帧图像与前一帧视频图像之间的亮度差异距离;  When the video frame image to be processed is not the first frame, performing brightness change detection on the video frame image to obtain a brightness difference distance between the current video frame image and the previous frame video image;
当所述亮度差异距离小于预设阈值时, 根据所述前一帧视频图像的人物对 象轮廓, 确定当前所述视频帧图像的人物对象轮廓;  Determining, according to the person object contour of the video image of the previous frame, a character object contour of the current video frame image, when the brightness difference distance is less than a preset threshold;
根据所述人物对象轮廓更新当前所述视频帧图像中各个像素点为前景或者 背景的概率;  Updating, according to the contour of the human object, a probability that each pixel in the current video frame image is a foreground or a background;
根据所述各个概率构建图, 并进行图切割获取当前所述视频帧图像的人物对象。  A graph is constructed according to the respective probabilities, and graph cutting is performed to obtain the person object of the current video frame image.
3、 根据权利要求 2所述的视频人物分割的方法, 其特征在于, 所述根据所 述前一帧视频图像的人物对象轮廓, 确定当前所述视频帧图像的人物对象轮廓 包括:  The method for segmenting a video character according to claim 2, wherein determining the contour of the person object of the current video frame image according to the contour of the human object of the video image of the previous frame comprises:
根据所述前一帧视频图像的分割结果的二值图, 提取所述前一帧视频图像 的人物对象轮廓上的至少一个关键点;  Extracting at least one key point on the contour of the character object of the video image of the previous frame according to the binary image of the segmentation result of the video image of the previous frame;
根据至少一个所述关键点, 确定每个所述关键点在当前所述视频帧图像中 对应的关键点;  Determining, according to at least one of the key points, a key point corresponding to each of the key points in the current video frame image;
根据相邻的两个所述关键点之间的距离及斜率变化, 确定至少一个目标关 键点; 将所述至少一个目标关键点连接起来, 获得当前所述视频帧图像的人物对 象轮廓。 Determining at least one target key point according to a change in distance and slope between two adjacent key points; The at least one target key point is connected to obtain a character object contour of the current video frame image.
4、 根据权利要求 1所述的视频人物分割的方法, 其特征在于, 所述根据所 述前景种子像素点和所述背景种子像素点, 分别计算所述视频图像中各个像素 点为前景或者背景的概率包括:  The video character segmentation method according to claim 1, wherein the calculating, according to the foreground seed pixel point and the background seed pixel point, each pixel point in the video image as a foreground or a background The probability includes:
根据所述前景种子像素点分别确定所述前景种子像素点在 L、 a、 b 三种颜色分量上的三组样本值, 根据所述背景种子像素点分别确定所述背景种子像素点在 L、 a、 b 三种颜色分量上的三组样本值;  determining, from the foreground seed pixels, three sets of sample values of the foreground seed pixels on the three color components L, a, and b, and determining, from the background seed pixels, three sets of sample values of the background seed pixels on the three color components L, a, and b;
根据所述前景种子像素点和所述背景种子像素点的样本值, 分别计算所述 视频图像中各个像素点的第一前景概率和第一背景概率;  Calculating, according to sample values of the foreground seed pixel point and the background seed pixel point, a first foreground probability and a first background probability of each pixel point in the video image;
将所述第一前景概率和所述第一背景概率进行归一化处理, 计算得到所述 视频图像中各个像素点为前景或者背景的概率。  The first foreground probability and the first background probability are normalized to calculate a probability that each pixel in the video image is foreground or background.
5、 根据权利要求 4所述的视频人物分割的方法, 其特征在于, 所述根据所 述前景种子像素点和所述背景种子像素点的样本值, 分别计算所述视频图像中 各个像素点的第一前景概率和第一背景概率包括:  The video character segmentation method according to claim 4, wherein the calculating, according to the sample values of the foreground seed pixel point and the background seed pixel point, respectively, calculating respective pixel points in the video image The first foreground probability and the first background probability include:
根据所述前景种子像素点和所述背景种子像素点的样本值, 分别计算 f_L^F(x)、 f_a^F(x)、 f_b^F(x) 和 f_L^B(x)、 f_a^B(x)、 f_b^B(x); 其中, x 表示所述视频图像中任一个像素点, f_L^F(x)、 f_a^F(x)、 f_b^F(x) 分别表示所述像素点在 L、 a、 b 三种颜色分量上的前景概率, f_L^B(x)、 f_a^B(x)、 f_b^B(x) 分别表示所述像素点在 L、 a、 b 三种颜色分量上的背景概率;  calculating f_L^F(x), f_a^F(x), f_b^F(x) and f_L^B(x), f_a^B(x), f_b^B(x) according to the sample values of the foreground seed pixels and the background seed pixels, where x denotes any pixel in the video image, f_L^F(x), f_a^F(x), f_b^F(x) denote the foreground probabilities of the pixel on the three color components L, a, and b, and f_L^B(x), f_a^B(x), f_b^B(x) denote the background probabilities of the pixel on the three color components L, a, and b;
根据 f^F(x) = f_L^F(x) * f_a^F(x) * f_b^F(x), 计算得到所述视频图像中任一个像素点的第一前景概率;  calculating the first foreground probability of any pixel in the video image according to f^F(x) = f_L^F(x) * f_a^F(x) * f_b^F(x);
根据 f^B(x) = f_L^B(x) * f_a^B(x) * f_b^B(x), 计算得到所述视频图像中任一个像素点的第一背景概率。  calculating the first background probability of any pixel in the video image according to f^B(x) = f_L^B(x) * f_a^B(x) * f_b^B(x).
6、 根据权利要求 4所述的视频人物分割的方法, 其特征在于, 所述将所述第一前景概率和所述第一背景概率进行归一化处理, 计算得到所述视频图像中各个像素点为前景或者背景的概率包括: 根据 p_F(x) = f^F(x) / [f^F(x) + f^B(x)], 计算得到所述视频图像中任一个像素点为前景的概率;  The video person segmentation method according to claim 4, wherein normalizing the first foreground probability and the first background probability to calculate the probability of each pixel in the video image being foreground or background comprises: calculating the probability that any pixel in the video image is foreground according to p_F(x) = f^F(x) / [f^F(x) + f^B(x)];
根据 p_B(x) = 1 - p_F(x), 计算得到所述视频图像中任一个像素点为背景的概率。  calculating the probability that any pixel in the video image is background according to p_B(x) = 1 - p_F(x).
7、 一种视频人物分割的装置, 其特征在于, 包括:  7. A device for segmenting a video character, comprising:
第一获取单元, 用于将待处理的第一帧视频图像进行人脸检测, 获取人物 脸部区域;  a first acquiring unit, configured to perform face detection on the first frame video image to be processed, and obtain a face region of the character;
第二获取单元, 用于根据所述人物脸部区域, 获取前景种子像素点和背景 种子像素点;  a second acquiring unit, configured to acquire a foreground seed pixel point and a background seed pixel point according to the character face area;
计算单元, 用于根据所述前景种子像素点和所述背景种子像素点, 分别计 算所述视频图像中各个像素点为前景或者背景的概率;  a calculating unit, configured to calculate, according to the foreground seed pixel point and the background seed pixel point, a probability that each pixel point in the video image is a foreground or a background;
处理单元, 用于根据所述各个概率构建图, 并进行图切割获取人物对象。  a processing unit, configured to construct a graph according to the respective probabilities and perform graph cutting to obtain the person object.
8、 根据权利要求 7所述的视频人物分割的装置, 其特征在于, 所述装置还 包括: The device for dividing a video character according to claim 7, wherein the device further comprises:
判断单元, 用于判断待处理的视频帧图像是否为第一帧;  a determining unit, configured to determine whether the image of the video frame to be processed is the first frame;
检测获取单元, 用于当待处理的视频帧图像不是第一帧时, 对所述视频帧 图像进行亮度变化检测, 获得当前所述视频帧图像与前一帧视频图像之间的亮 度差异距离;  a detection acquiring unit, configured to: when the video frame image to be processed is not the first frame, perform brightness change detection on the video frame image to obtain a brightness difference distance between the current video frame image and the previous frame video image;
确定单元, 用于当所述亮度差异距离小于预设阈值时, 根据所述前一帧视 频图像的人物对象轮廓, 确定当前所述视频帧图像的人物对象轮廓;  a determining unit, configured to: when the brightness difference distance is less than a preset threshold, determine a character object contour of the current video frame image according to the character object contour of the previous frame video image;
更新单元, 用于根据所述人物对象轮廓更新当前所述视频帧图像中各个像 素点为前景或者背景的概率;  And an updating unit, configured to update, according to the contour of the character object, a probability that each pixel point in the current video frame image is a foreground or a background;
所述处理单元还用于, 根据更新的所述各个概率构建图, 并进行图切割获取当前所述视频帧图像的人物对象。  The processing unit is further configured to construct a graph according to the updated respective probabilities and perform graph cutting to obtain the person object of the current video frame image.
9、 根据权利要求 8所述的视频人物分割的装置, 其特征在于, 所述确定单 元包括:  9. The apparatus for video character segmentation according to claim 8, wherein the determining unit comprises:
提取模块, 用于根据所述前一帧视频图像的分割结果的二值图, 提取所述 前一帧视频图像的人物对象轮廓上的至少一个关键点; An extracting module, configured to extract, according to a binary image of a segmentation result of the video image of the previous frame At least one key point on the contour of the character object of the previous frame of the video image;
第一确定模块, 用于根据至少一个所述关键点, 确定每个所述关键点在当 前所述视频帧图像中对应的关键点;  a first determining module, configured to determine, according to the at least one of the key points, a corresponding key point of each of the key points in the current video frame image;
第二确定模块, 用于根据相邻的两个所述关键点之间的距离及斜率变化, 确定至少一个目标关键点;  a second determining module, configured to determine at least one target key point according to a distance and a slope change between two adjacent key points;
获取模块, 用于将所述至少一个目标关键点连接起来, 获得当前所述视频 帧图像的人物对象轮廓。  And an obtaining module, configured to connect the at least one target key point to obtain a character object contour of the current video frame image.
10、 根据权利要求 7 所述的视频人物分割的装置, 其特征在于, 所述计算 单元包括:  The device for dividing a video character according to claim 7, wherein the calculating unit comprises:
确定模块, 用于根据所述前景种子像素点分别确定所述前景种子像素点在 L、 a、 b 三种颜色分量上的三组样本值, 根据所述背景种子像素点分别确定所述背景种子像素点在 L、 a、 b 三种颜色分量上的三组样本值;  a determining module, configured to determine, from the foreground seed pixels, three sets of sample values of the foreground seed pixels on the three color components L, a, and b, and to determine, from the background seed pixels, three sets of sample values of the background seed pixels on the three color components L, a, and b;
第一计算模块, 用于根据所述前景种子像素点和所述背景种子像素点的样 本值, 分别计算所述视频图像中各个像素点的第一前景概率和第一背景概率; 第二计算模块, 用于将所述第一前景概率和所述第一背景概率进行归一化 处理, 计算得到所述视频图像中各个像素点为前景或者背景的概率。  a first calculation module, configured to respectively calculate a first foreground probability and a first background probability of each pixel point in the video image according to sample values of the foreground seed pixel point and the background seed pixel point; And normalizing the first foreground probability and the first background probability, and calculating a probability that each pixel in the video image is a foreground or a background.
11、 根据权利要求 10所述的视频人物分割的装置, 其特征在于, 所述第一 计算模块包括:  The device for dividing a video character according to claim 10, wherein the first calculating module comprises:
第一计算子模块, 用于根据所述前景种子像素点和所述背景种子像素点的样本值, 分别计算 f_L^F(x)、 f_a^F(x)、 f_b^F(x) 和 f_L^B(x)、 f_a^B(x)、 f_b^B(x); 其中, x 表示所述视频图像中任一个像素点, f_L^F(x)、 f_a^F(x)、 f_b^F(x) 分别表示所述像素点在 L、 a、 b 三种颜色分量上的前景概率, f_L^B(x)、 f_a^B(x)、 f_b^B(x) 分别表示所述像素点在 L、 a、 b 三种颜色分量上的背景概率;  a first calculating sub-module, configured to calculate f_L^F(x), f_a^F(x), f_b^F(x) and f_L^B(x), f_a^B(x), f_b^B(x) according to the sample values of the foreground seed pixels and the background seed pixels, where x denotes any pixel in the video image, f_L^F(x), f_a^F(x), f_b^F(x) denote the foreground probabilities of the pixel on the three color components L, a, and b, and f_L^B(x), f_a^B(x), f_b^B(x) denote the background probabilities of the pixel on the three color components L, a, and b;
第二计算子模块, 用于根据 f^F(x) = f_L^F(x) * f_a^F(x) * f_b^F(x), 计算得到所述视频图像中任一个像素点的第一前景概率;  a second calculating sub-module, configured to calculate the first foreground probability of any pixel in the video image according to f^F(x) = f_L^F(x) * f_a^F(x) * f_b^F(x);
所述第二计算子模块还用于, 根据 f^B(x) = f_L^B(x) * f_a^B(x) * f_b^B(x), 计算得到所述视频图像中任一个像素点的第一背景概率。  The second calculating sub-module is further configured to calculate the first background probability of any pixel in the video image according to f^B(x) = f_L^B(x) * f_a^B(x) * f_b^B(x).
12、 根据权利要求 10所述的视频人物分割的装置, 其特征在于, 所述第二 计算模块包括: The device for dividing a video character according to claim 10, wherein the second calculating module comprises:
第一计算子模块, 用于根据 p_F(x) = f^F(x) / [f^F(x) + f^B(x)], 计算得到所述视频图像中任一个像素点为前景的概率;  a first calculating sub-module, configured to calculate the probability that any pixel in the video image is foreground according to p_F(x) = f^F(x) / [f^F(x) + f^B(x)];
第二计算子模块, 用于根据 p_B(x) = 1 - p_F(x), 计算得到所述视频图像中任一个像素点为背景的概率。  a second calculating sub-module, configured to calculate the probability that any pixel in the video image is background according to p_B(x) = 1 - p_F(x).
PCT/CN2011/079751 2011-09-16 2011-09-16 Video character separation method and device WO2012162981A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2011/079751 WO2012162981A1 (en) 2011-09-16 2011-09-16 Video character separation method and device
CN201180001853.1A CN103119625B (en) 2011-09-16 2011-09-16 Video character separation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2011/079751 WO2012162981A1 (en) 2011-09-16 2011-09-16 Video character separation method and device

Publications (1)

Publication Number Publication Date
WO2012162981A1 true WO2012162981A1 (en) 2012-12-06

Family

ID=47258310

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2011/079751 WO2012162981A1 (en) 2011-09-16 2011-09-16 Video character separation method and device

Country Status (2)

Country Link
CN (1) CN103119625B (en)
WO (1) WO2012162981A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507997A (en) * 2020-04-22 2020-08-07 腾讯科技(深圳)有限公司 Image segmentation method, device, equipment and computer storage medium
CN111583292A (en) * 2020-05-11 2020-08-25 浙江大学 Self-adaptive image segmentation method for two-photon calcium imaging video data

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230252B (en) * 2017-01-24 2022-02-01 深圳市商汤科技有限公司 Image processing method and device and electronic equipment
CN106846336B (en) * 2017-02-06 2022-07-15 腾讯科技(上海)有限公司 Method and device for extracting foreground image and replacing image background
CN106997599B (en) * 2017-04-17 2019-08-30 华东理工大学 A kind of video moving object subdivision method of light sensitive
CN107221058A (en) * 2017-05-25 2017-09-29 刘萍 Intelligent channel barrier system
CN107766803B (en) * 2017-09-29 2021-09-28 北京奇虎科技有限公司 Video character decorating method and device based on scene segmentation and computing equipment
CN109035257B (en) * 2018-07-02 2021-08-31 百度在线网络技术(北京)有限公司 Portrait segmentation method, device and equipment
CN113673270B (en) * 2020-04-30 2024-01-26 北京达佳互联信息技术有限公司 Image processing method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101588459A (en) * 2009-06-26 2009-11-25 北京交通大学 A kind of video keying processing method
CN101710418A (en) * 2009-12-22 2010-05-19 上海大学 Interactive mode image partitioning method based on geodesic distance
CN102129691A (en) * 2011-03-22 2011-07-20 北京航空航天大学 Video object tracking cutting method using Snake profile model
CN102156995A (en) * 2011-04-21 2011-08-17 北京理工大学 Video movement foreground dividing method in moving camera

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4639271B2 (en) * 2005-12-27 2011-02-23 三星電子株式会社 camera
JP2008123086A (en) * 2006-11-09 2008-05-29 Matsushita Electric Ind Co Ltd Image processor and image processing method
CN100580691C (en) * 2007-03-16 2010-01-13 上海博康智能信息技术有限公司 Interactive human face identificiating system and method of comprehensive utilizing human face and humanbody auxiliary information
CN101587541B (en) * 2009-06-18 2011-02-02 上海交通大学 Character recognition method based on human body contour outline

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101588459A (en) * 2009-06-26 2009-11-25 北京交通大学 A kind of video keying processing method
CN101710418A (en) * 2009-12-22 2010-05-19 上海大学 Interactive mode image partitioning method based on geodesic distance
CN102129691A (en) * 2011-03-22 2011-07-20 北京航空航天大学 Video object tracking cutting method using Snake profile model
CN102156995A (en) * 2011-04-21 2011-08-17 北京理工大学 Video movement foreground dividing method in moving camera

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507997A (en) * 2020-04-22 2020-08-07 腾讯科技(深圳)有限公司 Image segmentation method, device, equipment and computer storage medium
CN111583292A (en) * 2020-05-11 2020-08-25 浙江大学 Self-adaptive image segmentation method for two-photon calcium imaging video data
CN111583292B (en) * 2020-05-11 2023-07-07 浙江大学 Self-adaptive image segmentation method for two-photon calcium imaging video data

Also Published As

Publication number Publication date
CN103119625B (en) 2015-06-03
CN103119625A (en) 2013-05-22

Similar Documents

Publication Publication Date Title
WO2012162981A1 (en) Video character separation method and device
CN108520219B (en) Multi-scale rapid face detection method based on convolutional neural network feature fusion
KR101023733B1 (en) Intra-mode region-of-interest video object segmentation
KR100997064B1 (en) Multi-mode region-of-interest video object segmentation
CN108804578B (en) Unsupervised video abstraction method based on consistency segment generation
US7680342B2 (en) Indoor/outdoor classification in digital images
WO2017084204A1 (en) Method and system for tracking human body skeleton point in two-dimensional video stream
US8265392B2 (en) Inter-mode region-of-interest video object segmentation
Li et al. Saliency model-based face segmentation and tracking in head-and-shoulder video sequences
US20110299774A1 (en) Method and system for detecting and tracking hands in an image
US20150125074A1 (en) Apparatus and method for extracting skin area to block harmful content image
CN108447068B (en) Ternary diagram automatic generation method and foreground extraction method using ternary diagram
CN104050471A (en) Natural scene character detection method and system
US9418426B1 (en) Model-less background estimation for foreground detection in video sequences
JP4098021B2 (en) Scene identification method, apparatus, and program
CN116309607B (en) Ship type intelligent water rescue platform based on machine vision
CN111815528A (en) Bad weather image classification enhancement method based on convolution model and feature fusion
Zhu et al. Automatic object detection and segmentation from underwater images via saliency-based region merging
CN114359323A (en) Image target area detection method based on visual attention mechanism
CN109784216B (en) Vehicle-mounted thermal imaging pedestrian detection Rois extraction method based on probability map
Arsic et al. Improved lip detection algorithm based on region segmentation and edge detection
CN107704864B (en) Salient object detection method based on image object semantic detection
JP2000348173A (en) Lip extraction method
Zafarifar et al. Blue sky detection for picture quality enhancement
Jyothisree et al. Shadow detection using tricolor attenuation model enhanced with adaptive histogram equalization

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 201180001853.1

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 11866721

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 11866721

Country of ref document: EP

Kind code of ref document: A1