WO2020062898A1 - Video foreground target extraction method and apparatus - Google Patents

Video foreground target extraction method and apparatus Download PDF

Info

Publication number
WO2020062898A1
WO2020062898A1 (PCT/CN2019/088278, published as WO 2020/062898 A1)
Authority
WO
WIPO (PCT)
Prior art keywords
image
pixel
transparency
foreground
pixels
Prior art date
Application number
PCT/CN2019/088278
Other languages
French (fr)
Chinese (zh)
Inventor
蔡昭权
蔡映雪
陈伽
胡松
黄思博
李慧
胡辉
陈明阳
Original Assignee
惠州学院
Priority date
Filing date
Publication date
Application filed by 惠州学院 filed Critical 惠州学院
Publication of WO2020062898A1 publication Critical patent/WO2020062898A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/10: Terrestrial scenes
    • G06T7/194: Segmentation; Edge detection involving foreground-background segmentation
    • G06T7/20: Analysis of motion
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V40/174: Facial expression recognition
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06T2207/10016: Video; Image sequence
    • G06T2207/20036: Morphological image processing
    • G06T2207/20084: Artificial neural networks [ANN]
    • G06T2207/30196: Human being; Person
    • G06T2207/30232: Surveillance
    • G06T2207/30241: Trajectory

Definitions

  • the present disclosure belongs to the field of image processing, and particularly relates to a method and a device for extracting video foreground objects.
  • a transparency mask is generated by selecting a color range, and then the foreground target of the video is extracted by using the transparency mask.
  • the present disclosure provides a video foreground target extraction method, including the following steps:
  • where I_k is the RGB color value of the unknown pixel Z_k;
  • the foreground pixels F_i are the m foreground pixels closest to the unknown pixel Z_k;
  • the background pixels B_j are likewise the m background pixels closest to the unknown pixel Z_k, so the foreground-background pixel pairs (F_i, B_j) total m² groups;
  • σ takes the value 0.1, and the foreground-background pixel pair with the highest credibility MAX(n_ij) is selected as (F_iMAX, B_jMAX);
  • S600 Superimpose grayscale information on the first image to generate a second image, and divide the second image into all its foreground pixel sets, all background pixel sets, and all unknown pixel sets;
  • for the second image, perform steps S200 to S500 to determine a first transparency mask of the second image, and use the first transparency mask of the second image as a second transparency mask of the first image;
  • a first division module, configured to: for a first image in a video, divide all foreground pixel sets F, all background pixel sets B, and all unknown pixel sets Z in the image, where the first image is a frame extracted from the video;
  • a first metric module configured to: given certain foreground and background pixel pairs (F i , B j ), measure the transparency of each unknown pixel Z k according to the following formula
  • where I_k is the RGB color value of the unknown pixel Z_k;
  • the foreground pixels F_i are the m foreground pixels closest to the unknown pixel Z_k;
  • the background pixels B_j are likewise the m background pixels closest to the unknown pixel Z_k, so the foreground-background pixel pairs (F_i, B_j) total m² groups;
  • a second metric module, configured to: for each of the m² foreground-background pixel pairs (F_i, B_j) and its corresponding transparency estimate, measure the credibility n_ij of the pair (F_i, B_j) according to the following formula:
  • σ takes the value 0.1, and the foreground-background pixel pair with the highest credibility MAX(n_ij) is selected as (F_iMAX, B_jMAX);
  • a calculation module for calculating an estimated transparency value of each unknown pixel Z k according to the following formula
  • a determination module, configured to: initially determine a first transparency mask of the first image according to the transparency estimate of each unknown pixel Z_k;
  • a second division module configured to superpose the grayscale information on the first image to generate a second image, and divide the second image into all its foreground pixel sets, all background pixel sets, and all unknown pixel sets;
  • a recalling module, configured to: for the second image, call the first metric module, the second metric module, the calculation module, and the determination module again to determine the first transparency mask of the second image, and use the first transparency mask of the second image as the second transparency mask of the first image;
  • a correction module configured to: use the second transparency mask of the first image to modify the first transparency mask of the first image
  • An extraction module is configured to extract a foreground object in the first image of the video according to a first transparency mask of the first image obtained by the correction module.
  • FIG. 1 is a schematic diagram of a method according to an embodiment of the present disclosure
  • FIG. 1 is a schematic flowchart of a video foreground target extraction method according to an embodiment of the present disclosure. As shown, the method includes the following steps:
  • when the video foreground target is extracted, the first image may be obtained as follows: while the video is playing, in response to a user operation, pause playback and immediately capture the current frame of the paused picture as the first image; alternatively, while the video is not playing, in response to a user operation, randomly select one or several frames of the video and use one of them as the first image.
  • this method can be used for foreground target extraction of each frame of image in the video.
  • the first image is a first frame image in a video.
  • where I_k is the RGB color value of the unknown pixel Z_k;
  • the foreground pixels F_i are the m foreground pixels closest to the unknown pixel Z_k;
  • the background pixels B_j are likewise the m background pixels closest to the unknown pixel Z_k, so the foreground-background pixel pairs (F_i, B_j) total m² groups;
  • the selection of m may make the corresponding foreground-background pixel pairs a partial sample or exhaust the entire image; step S200 is intended to estimate the transparency of the unknown pixels through the color relationship between the unknown pixels and the foreground-background pixel pairs;
  • the selection of m can further combine the characteristics of neighbor pixels and unknown pixels in terms of color, texture, grayscale, brightness, and spatial distance;
  • σ takes the value 0.1, and the foreground-background pixel pair with the highest credibility MAX(n_ij) is selected as (F_iMAX, B_jMAX);
  • step S300 uses the credibility to further filter the foreground-background pixel pairs, and the filtered pairs are used in the subsequent steps to estimate the transparency of the unknown pixels;
  • this embodiment thereby naturally determines the first transparency mask of the first image; it is natural in the sense that the transparency mask can be viewed as the set of pixels selected according to a certain transparency value (or value range);
  • S600 Superimpose grayscale information on the first image to generate a second image, and divide the second image into all its foreground pixel sets, all background pixel sets, and all unknown pixel sets;
  • after step S900, the method further includes the following steps:
  • S11001. Binarize the corrected first transparency mask of the previous frame's first image with a threshold of 0.5 to obtain a first binary image of the foreground target;
  • the pixels corresponding to true in the second binary image are used as the set of all foreground pixels F_c;
  • the pixels corresponding to false in the third binary image are used as the set of all background pixels B_C;
  • the remaining pixels are used as the set of all unknown pixels Z_C;
  • the corrected first transparency mask of the previous frame's first image thus divides all foreground pixel sets F_c, all background pixel sets B_C, and all unknown pixel sets Z_C in the first image corresponding to the current frame, striking a balance between accuracy and efficiency in image processing; that is, this embodiment is inheritive: it inherits the transparency mask of the previous frame and uses it to divide the foreground, background, and unknown pixel sets of the next frame. Given the continuity and similarity of the picture content, this division relies not only on the transparency mask of the previous frame but also on morphological erosion and morphological dilation, which is an innovation of the present disclosure.
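The inheritance scheme just described (threshold at 0.5, five erosions with a 3x3 circular element to get the sure-foreground set, dilation to get the sure-background complement) can be sketched as follows. Two assumptions are made: the number of dilation passes is not stated in this excerpt and is taken equal to the erosion count, and a cross-shaped structuring element stands in for the "circular" 3x3 element.

```python
import numpy as np

def _shift_stack(mask, pad_value):
    """Stack the 4-neighbourhood plus centre of a boolean mask."""
    p = np.pad(mask, 1, constant_values=pad_value)
    return np.stack([p[1:-1, 1:-1],               # centre
                     p[:-2, 1:-1], p[2:, 1:-1],   # up, down
                     p[1:-1, :-2], p[1:-1, 2:]])  # left, right

def erode(mask):
    # 3x3 "circular" (here: cross-shaped) structuring element
    return _shift_stack(mask, True).all(axis=0)

def dilate(mask):
    return _shift_stack(mask, False).any(axis=0)

def inherit_trimap(prev_alpha_mask, threshold=0.5, n_erode=5, n_dilate=5):
    """Derive the current frame's F_c / B_C / Z_C split from the previous
    frame's corrected transparency mask (steps S11001 onward).

    The dilation count is an assumption (chosen symmetric with the
    five stated erosions)."""
    binary = prev_alpha_mask >= threshold   # first binary image
    fg = binary.copy()
    for _ in range(n_erode):                # shrink -> sure foreground F_c
        fg = erode(fg)
    grown = binary.copy()
    for _ in range(n_dilate):               # grow -> outside is sure background
        grown = dilate(grown)
    background = ~grown                     # B_C
    unknown = ~fg & ~background             # Z_C
    return fg, background, unknown
```

On a 40x40 frame whose previous mask is a 20x20 foreground block, the eroded sure-foreground shrinks to a 10x10 core while the far corners remain sure background, with an unknown band in between.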
  • in step S600, the grayscale information is superimposed on the first image to generate the second image in the following manner:
  • the first image and the third image generate a second image by using the following formula:
  • IM 2 represents the gray value of the k-th pixel on the second image after superimposition
  • x r represents a neighborhood pixel of the k-th pixel x k on the first image
  • N_k represents the number of pixels in the neighborhood centered on x_k;
  • the weighting coefficient is taken as 0.5.
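The superposition formula itself appears only as an image in the source. Given the stated ingredients (a mean-filtered third image, a neighborhood of N_k pixels x_r around x_k, and a coefficient of 0.5), one plausible reading, sketched below as an assumption rather than the patent's literal formula, is an equal-weight blend of each gray value with its local mean:

```python
import numpy as np

def mean_filter3(img):
    """3x3 average filter with edge replication (the 'third image')."""
    p = np.pad(np.asarray(img, dtype=float), 1, mode="edge")
    h, w = np.asarray(img).shape
    # sum of the 9 shifted views = per-pixel neighborhood sum
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def superimpose_gray(first_gray, weight=0.5):
    """Second image IM2: blend each gray value with its neighborhood mean.

    Only the stated ingredients are used (neighborhood pixels x_r around
    x_k, their count N_k, coefficient 0.5); the blend form is assumed."""
    first_gray = np.asarray(first_gray, dtype=float)
    third = mean_filter3(first_gray)        # mean-filtered third image
    return weight * first_gray + (1.0 - weight) * third
```

A constant image passes through unchanged, while an isolated bright pixel is pulled halfway toward its neighborhood mean.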
  • step S800 further includes:
  • the present disclosure also discloses a video foreground target extraction device in another embodiment, including:
  • a first division module, configured to: for a first image in a video, divide all foreground pixel sets F, all background pixel sets B, and all unknown pixel sets Z in the image, where the first image is a frame extracted from the video;
  • a first metric module configured to: given certain foreground and background pixel pairs (F i , B j ), measure the transparency of each unknown pixel Z k according to the following formula
  • where I_k is the RGB color value of the unknown pixel Z_k;
  • the foreground pixels F_i are the m foreground pixels closest to the unknown pixel Z_k;
  • the background pixels B_j are likewise the m background pixels closest to the unknown pixel Z_k, so the foreground-background pixel pairs (F_i, B_j) total m² groups;
  • a second metric module, configured to: for each of the m² foreground-background pixel pairs (F_i, B_j) and its corresponding transparency estimate, measure the credibility n_ij of the pair (F_i, B_j) according to the following formula:
  • a second division module configured to superpose the grayscale information on the first image to generate a second image, and divide the second image into all its foreground pixel sets, all background pixel sets, and all unknown pixel sets;
  • a recalling module, configured to: for the second image, call the first metric module, the second metric module, the calculation module, and the determination module again to determine the first transparency mask of the second image, and use the first transparency mask of the second image as the second transparency mask of the first image;
  • a correction module configured to: use the second transparency mask of the first image to modify the first transparency mask of the first image
  • An extraction module is configured to extract a foreground object in the first image of the video according to a first transparency mask of the first image obtained by the correction module.
  • each module may be combined with a processor and a memory to form a system for implementation; this, however, does not limit FIG. 2: each module may also have its own processing unit to provide data processing capability.
  • the inheritance calling module is configured to extract each remaining frame image from the video, use it as the first image, and input it to a third division module, where the third division module divides all foreground pixel sets F_c, all background pixel sets B_C, and all unknown pixel sets Z_C in the first image corresponding to the current frame according to the corrected first transparency mask of the previous frame's first image; then,
  • the inheritance calling module sequentially calls the first metric module, the second metric module, the calculation module, the determination module, the second division module, the recalling module, the correction module, and the extraction module, in order to extract all foreground targets of the video, where
  • the third division module includes:
  • a second binary image initial unit configured to: use the first binary image as an initial value of the second binary image
  • a second binary image processing unit configured to: perform a morphological erosion operation on the second binary image by using a circular structural element with a size of 3x3, and update the second binary image with the obtained result;
  • a first repeated calling unit, configured to repeatedly call the second binary image processing unit five times;
  • a third binary image processing unit, configured to: perform a morphological dilation operation on the third binary image using a circular structuring element of size 3x3, and update the third binary image with the obtained result;
  • a true/false division unit, configured to: use the pixels corresponding to true in the second binary image last updated by the second binary image processing unit as the set of all foreground pixels F_c; use the pixels corresponding to false in the third binary image as the set of all background pixels B_C; and use the remaining pixels as the set of all unknown pixels Z_C.
  • An average filtering unit configured to: perform average filtering on the first image to obtain a third image
  • a second image generating unit is configured to generate the second image by using the following formula:
  • IM 2 represents the gray value of the k-th pixel on the second image after superimposition
  • x r represents a neighborhood pixel of the k-th pixel x k on the first image
  • N_k represents the number of pixels in the neighborhood centered on x_k;
  • the weighting coefficient is taken as 0.5.
  • the correction module further includes:
  • an edge-finding unit, configured to: find the edge of the second transparency mask and the edge of the first transparency mask, respectively, according to the second transparency mask of the first image and the first transparency mask of the first image;
  • a position determining unit, configured to: obtain the positions of all pixels at the edge of the second transparency mask and the positions of all pixels at the edge of the first transparency mask, determine the area where these two sets of positions coincide, and then determine the pixels Z_sp having the same position;
  • a first correction unit, configured to: find, respectively, the transparency estimate of the pixel Z_sp in the first transparency mask of the first image and its transparency estimate in the second transparency mask of the first image, and use their average as the corrected transparency estimate of the pixel Z_sp;
  • a second correction unit, configured to: correct the first transparency mask of the first image by using the corrected transparency estimate of the pixel Z_sp.
  • the location determining unit further includes:
  • a different-position subunit, configured to further determine the pixels Z_dp whose positions differ, based on the positions where the pixels at the edge of the second transparency mask and the pixels at the edge of the first transparency mask do not coincide, including: pixels Z_dp2 located at the edge of the second transparency mask and pixels Z_dp1 located at the edge of the first transparency mask;
  • a complex correction subunit, configured to: correct the first transparency mask of the first image by combining the corrected transparency estimates of the pixels Z_dp1 and the pixels Z_dp2.
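The first correction unit's averaging rule at coincident edge pixels Z_sp can be sketched as below. The edge detector (thresholding the mask at 0.5 and keeping foreground pixels that border the background) is an assumption, and the complex correction for Z_dp1 / Z_dp2 is deliberately not implemented:

```python
import numpy as np

def mask_edges(mask, level=0.5):
    """Foreground-side edge pixels of a transparency mask.

    Assumed edge definition: threshold at `level`, mark both sides of
    every horizontal/vertical transition, keep the foreground side."""
    b = np.asarray(mask, dtype=float) >= level
    edge = np.zeros_like(b)
    dv = b[:-1, :] ^ b[1:, :]       # vertical transitions
    edge[:-1, :] |= dv
    edge[1:, :] |= dv
    dh = b[:, :-1] ^ b[:, 1:]       # horizontal transitions
    edge[:, :-1] |= dh
    edge[:, 1:] |= dh
    return edge & b

def correct_first_mask(first_mask, second_mask):
    """First correction unit: at coincident edge pixels Z_sp, replace
    the transparency estimate with the average of the two masks'
    estimates; non-coincident pixels Z_dp1 / Z_dp2 are left untouched."""
    first_mask = np.asarray(first_mask, dtype=float)
    second_mask = np.asarray(second_mask, dtype=float)
    z_sp = mask_edges(first_mask) & mask_edges(second_mask)
    out = first_mask.copy()
    out[z_sp] = 0.5 * (first_mask[z_sp] + second_mask[z_sp])
    return out
```

For two masks sharing the same vertical edge with values 1.0 and 0.8, the corrected edge pixels become 0.9 while interior pixels are untouched.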
  • each functional unit may be integrated into one processing unit, or each unit may exist alone, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit. When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of the present disclosure, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a smart phone, a personal digital assistant, a wearable device, a notebook computer, or a tablet computer) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage media include media that can store program code, such as USB flash drives, Read-Only Memory (ROM), Random Access Memory (RAM), removable hard disks, magnetic disks, and optical disks.

Abstract

Disclosed are a video foreground target extraction method and apparatus. In the method, a transparency estimate is first recalculated by measuring the credibility of foreground-background pixel pairs to obtain a first transparency mask of a first image; then a new picture is generated by superimposing grayscale information, a second transparency mask of the first image is obtained, and the first transparency mask of the first image is further corrected; finally, the foreground target of a certain frame of the video is extracted by using the corrected first transparency mask. The present disclosure can comprehensively use the credibility of foreground-background pixel pairs and the grayscale information of a certain frame of a video, providing a new video foreground target extraction solution.

Description

Video foreground target extraction method and device
Technical Field
The present disclosure belongs to the field of image processing, and particularly relates to a method and a device for extracting video foreground targets.
Background Art
In the prior art in the image field, for a certain frame of a video, a transparency mask is generated by selecting a color range, and the foreground target of the video is then extracted by using the transparency mask.
However, although the prior art contains plenty of schemes for extracting video foreground targets, there is as yet no novel implementation of extracting video foreground targets by using foreground-background pixel pairs and grayscale information.
Summary of the Invention
The present disclosure provides a video foreground target extraction method, including the following steps:
S100. For a first image in a video, divide all foreground pixel sets F, all background pixel sets B, and all unknown pixel sets Z in the image, where the first image is a frame extracted from the video;
S200. Given certain foreground-background pixel pairs (F_i, B_j), measure the transparency estimate of each unknown pixel Z_k according to the following formula (rendered as images PCTCN2019088278-appb-000001 and PCTCN2019088278-appb-000002 in the original publication),
where I_k is the RGB color value of the unknown pixel Z_k, the foreground pixels F_i are the m foreground pixels closest to the unknown pixel Z_k, the background pixels B_j are likewise the m background pixels closest to the unknown pixel Z_k, and the foreground-background pixel pairs (F_i, B_j) total m² groups;
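The transparency formula of step S200 is published only as embedded images, so it cannot be quoted directly. The sketch below is an assumed reconstruction based on the standard estimate used by sampling-based matting, which projects the unknown color I_k onto the color line between a candidate pair (F_i, B_j); the function name and the clipping to [0, 1] are additions for illustration, not the patent's text:

```python
import numpy as np

def estimate_alpha(I_k, F_i, B_j):
    """Alpha of unknown pixel I_k for a candidate pair (F_i, B_j).

    Assumed reconstruction (the patent's formula is an image):
    a = (I_k - B_j) . (F_i - B_j) / ||F_i - B_j||^2, clipped to [0, 1].
    """
    I_k, F_i, B_j = (np.asarray(v, dtype=float) for v in (I_k, F_i, B_j))
    d = F_i - B_j
    denom = float(np.dot(d, d))
    if denom == 0.0:  # degenerate pair: F and B colors coincide
        return 0.0
    alpha = float(np.dot(I_k - B_j, d)) / denom
    return float(np.clip(alpha, 0.0, 1.0))
```

For example, a mid-gray pixel between a white foreground sample and a black background sample yields an estimate of roughly one half.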
S300. For each of the m² foreground-background pixel pairs (F_i, B_j) and its corresponding transparency estimate (image PCTCN2019088278-appb-000003 in the original publication), measure the credibility n_ij of the pair (F_i, B_j) according to the following formula (image PCTCN2019088278-appb-000004),
where σ takes the value 0.1, and the foreground-background pixel pair with the highest credibility MAX(n_ij) is selected as (F_iMAX, B_jMAX);
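The credibility formula of step S300 is likewise an image in the source; the only recoverable details are σ = 0.1 and the selection of the pair with the highest n_ij. The sketch below assumes one common form: credibility decays exponentially with the compositing residual of the pair, with colors on a [0, 1] scale. Both the residual form and the color scale are assumptions:

```python
import numpy as np

def pair_confidence(I_k, F_i, B_j, sigma=0.1):
    """Credibility n_ij of a foreground-background pair for pixel I_k.

    Assumed form: fit alpha for the pair, composite a*F + (1 - a)*B,
    and map the residual through exp(-r^2 / sigma^2). Colors in [0, 1].
    """
    I_k, F_i, B_j = (np.asarray(v, dtype=float) for v in (I_k, F_i, B_j))
    d = F_i - B_j
    denom = float(np.dot(d, d))
    a = 0.0 if denom == 0.0 else float(np.clip(np.dot(I_k - B_j, d) / denom, 0.0, 1.0))
    residual = float(np.linalg.norm(I_k - (a * F_i + (1.0 - a) * B_j)))
    return float(np.exp(-residual ** 2 / sigma ** 2))

def best_pair(I_k, fg_samples, bg_samples):
    """Score all m*m pairs and keep the most credible one, (F_iMAX, B_jMAX)."""
    pairs = [(F, B) for F in fg_samples for B in bg_samples]
    return max(pairs, key=lambda p: pair_confidence(I_k, *p))
```

A pair that explains the unknown color exactly gets credibility 1.0; with σ = 0.1, even modest residuals drive the credibility toward zero, which is what makes the MAX(n_ij) selection sharp.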
S400. Calculate the transparency estimate of each unknown pixel Z_k according to the following formula (images PCTCN2019088278-appb-000005 and PCTCN2019088278-appb-000006 in the original publication);
S500. According to the transparency estimate of each unknown pixel Z_k (image PCTCN2019088278-appb-000007 in the original publication), initially determine a first transparency mask of the first image;
S600. Superimpose grayscale information on the first image to generate a second image, and divide the second image into all its foreground pixel sets, all its background pixel sets, and all its unknown pixel sets;
S700. For the second image, perform steps S200 to S500 to determine a first transparency mask of the second image, and use that mask as a second transparency mask of the first image;
S800. Use the second transparency mask of the first image to correct the first transparency mask of the first image;
S900. Extract the foreground target in the first image of the video according to the first transparency mask of the first image corrected in step S800.
In addition, the present disclosure also discloses a video foreground target extraction device, including:
a first division module, configured to: for a first image in a video, divide all foreground pixel sets F, all background pixel sets B, and all unknown pixel sets Z in the image, where the first image is a frame extracted from the video;
a first metric module, configured to: given certain foreground-background pixel pairs (F_i, B_j), measure the transparency estimate of each unknown pixel Z_k according to the following formula (images PCTCN2019088278-appb-000008 and PCTCN2019088278-appb-000009 in the original publication), where I_k is the RGB color value of the unknown pixel Z_k, the foreground pixels F_i are the m foreground pixels closest to Z_k, the background pixels B_j are likewise the m background pixels closest to Z_k, and the foreground-background pixel pairs (F_i, B_j) total m² groups;
a second metric module, configured to: for each of the m² foreground-background pixel pairs (F_i, B_j) and its corresponding transparency estimate (image PCTCN2019088278-appb-000010 in the original publication), measure the credibility n_ij of the pair according to the following formula (image PCTCN2019088278-appb-000011), where σ takes the value 0.1, and the pair with the highest credibility MAX(n_ij) is selected as (F_iMAX, B_jMAX);
a calculation module, configured to: calculate the transparency estimate of each unknown pixel Z_k according to the following formula (images PCTCN2019088278-appb-000012 and PCTCN2019088278-appb-000013 in the original publication);
a determination module, configured to: initially determine a first transparency mask of the first image according to the transparency estimate of each unknown pixel Z_k (image PCTCN2019088278-appb-000014 in the original publication);
a second division module, configured to: superimpose grayscale information on the first image to generate a second image, and divide the second image into all its foreground pixel sets, all its background pixel sets, and all its unknown pixel sets;
a recalling module, configured to: for the second image, call the first metric module, the second metric module, the calculation module, and the determination module again to determine a first transparency mask of the second image, and use that mask as a second transparency mask of the first image;
a correction module, configured to: use the second transparency mask of the first image to correct the first transparency mask of the first image;
an extraction module, configured to: extract the foreground target in the first image of the video according to the first transparency mask of the first image obtained by the correction module.
Through the method and device, the present disclosure can comprehensively use the credibility of foreground-background pixel pairs and grayscale information, providing a new video foreground target extraction scheme.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of the method according to one embodiment of the present disclosure;
FIG. 2 is a schematic diagram of the device according to another embodiment of the present disclosure.
Detailed Description
To enable those skilled in the art to understand the technical solutions disclosed herein, the technical solutions of the embodiments are described below with reference to the embodiments and the accompanying drawings; the described embodiments are only some, not all, of the embodiments of the present disclosure. The terms "first", "second", and the like used in this disclosure are for distinguishing different objects, not for describing a specific order. In addition, "including" and "having", and any variations thereof, are intended to cover non-exclusive inclusion: a process, method, system, product, or device that comprises a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or other steps or units inherent to the process, method, system, product, or device.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the present disclosure. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to independent or alternative embodiments mutually exclusive with other embodiments; those skilled in the art will understand that the embodiments described herein may be combined with other embodiments.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of a video foreground target extraction method according to an embodiment of the present disclosure. As shown in the figure, the method includes the following steps:
S100: for a first image in a video, divide the image into the set F of all foreground pixels, the set B of all background pixels, and the set Z of all unknown pixels, where the first image is a frame extracted from the video;
It can be understood that there are many ways to divide an image into foreground pixels, background pixels, and unknown pixels: they may be labeled manually, obtained through machine learning or data-driven methods, or separated according to corresponding foreground and background thresholds; once the foreground and background pixels and their sets have been divided, the unknown pixels and their set follow naturally;
In addition, when the video foreground target is extracted, the first image may be obtained as follows: while the video is playing, in response to a user operation, playback is paused and the current frame of the paused picture is captured immediately to obtain the first image; alternatively, when the video is not playing, in response to a user operation, one or several frames of the video are selected at random, and one of them is taken as the first image. In any case, it can be understood that the method can be applied to foreground target extraction for every frame of the video. Preferably, the first image is the first frame of the video.
S200: given certain foreground-background pixel pairs (F_i, B_j), measure the transparency
Figure PCTCN2019088278-appb-000015
of each unknown pixel Z_k according to the following formula:
Figure PCTCN2019088278-appb-000016
where I_k is the RGB color value of the unknown pixel Z_k, the foreground pixels F_i are the m foreground pixels closest to the unknown pixel Z_k, the background pixels B_j are likewise the m background pixels closest to the unknown pixel Z_k, and the foreground-background pixel pairs (F_i, B_j) total m² groups;
For those skilled in the art, in theory, m may be chosen so that the corresponding foreground-background pixel pairs form a partial sample, or so that they exhaust the entire image. Step S200 is intended to estimate the transparency of an unknown pixel from the color relationship between that pixel and the foreground-background pixel pairs. In addition, the choice of m may further take into account features such as color, texture, gray level, brightness, and spatial distance between the neighborhood pixels and the unknown pixel;
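For illustration only: the published per-pair formula is reproduced in this text only as an image reference, so the sketch below assumes the standard compositing model I = αF + (1−α)B, which gives the least-squares estimate α̂ = ((I_k − B_j)·(F_i − B_j)) / ‖F_i − B_j‖². All names and sample values are illustrative:

```python
import numpy as np

def alpha_estimate(I_k, F_i, B_j, eps=1e-8):
    """Per-pair transparency of unknown pixel Z_k under the compositing
    model I = alpha*F + (1-alpha)*B: project (I_k - B_j) onto (F_i - B_j)."""
    fb = F_i - B_j
    alpha = np.dot(I_k - B_j, fb) / (np.dot(fb, fb) + eps)
    return float(np.clip(alpha, 0.0, 1.0))

# m nearest foreground and m nearest background candidates give m*m pairs
F = np.array([[255.0, 255.0, 255.0], [250.0, 250.0, 250.0]])  # m = 2 foreground colors
B = np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]])           # m = 2 background colors
I = np.array([127.5, 127.5, 127.5])                           # unknown pixel's RGB value

alphas = [[alpha_estimate(I, f, b) for b in B] for f in F]    # m^2 = 4 estimates
```

A mid-gray pixel between pure black and pure white yields an estimate of 0.5, as the compositing model predicts.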
S300: for each of the m² foreground-background pixel pairs (F_i, B_j) and its corresponding
Figure PCTCN2019088278-appb-000017
measure the confidence n_ij of the pair (F_i, B_j) according to the following formula:
Figure PCTCN2019088278-appb-000018
where σ takes the value 0.1, and the pair with the highest confidence MAX(n_ij) is selected as (F_iMAX, B_jMAX);
It can be understood that the value of σ is an empirical, statistical, or simulated value. Step S300 uses the confidence to further screen the foreground-background pixel pairs, and the screened pairs are used in the subsequent steps to estimate the transparency of the unknown pixels;
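The confidence formula likewise appears in this text only as an image reference. A common form consistent with the description (σ = 0.1, higher confidence when the pair explains the pixel's color) is n_ij = exp(−‖I_k − (α̂F_i + (1−α̂)B_j)‖² / σ²) on colors normalized to [0, 1]; the sketch below assumes that form and selects (F_iMAX, B_jMAX):

```python
import numpy as np

def pair_confidence(I_k, F_i, B_j, alpha, sigma=0.1):
    """Assumed confidence n_ij: close to 1 when I_k is well explained as a
    mix of F_i and B_j with weight alpha; colors normalized to [0, 1]."""
    residual = I_k - (alpha * F_i + (1.0 - alpha) * B_j)
    return float(np.exp(-np.dot(residual, residual) / sigma ** 2))

def best_pair(I_k, fg_candidates, bg_candidates, sigma=0.1):
    """Step S300: evaluate all m^2 pairs and keep (F_iMAX, B_jMAX)."""
    best = None
    for F_i in fg_candidates:
        for B_j in bg_candidates:
            fb = F_i - B_j
            alpha = float(np.clip(np.dot(I_k - B_j, fb) / (np.dot(fb, fb) + 1e-8), 0.0, 1.0))
            n_ij = pair_confidence(I_k, F_i, B_j, alpha, sigma)
            if best is None or n_ij > best[0]:
                best = (n_ij, F_i, B_j, alpha)
    return best  # (confidence, F_iMAX, B_jMAX, alpha of the best pair)

I_k = np.array([0.5, 0.5, 0.5])
fgs = [np.array([1.0, 1.0, 1.0]), np.array([1.0, 0.0, 0.0])]
bgs = [np.array([0.0, 0.0, 0.0]), np.array([0.2, 0.0, 0.0])]
n, F_best, B_best, a = best_pair(I_k, fgs, bgs)
```

For a mid-gray pixel, the white/black pair reconstructs the color exactly, so it wins with confidence near 1.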
S400: calculate the transparency estimate
Figure PCTCN2019088278-appb-000019
of each unknown pixel Z_k according to the following formula:
Figure PCTCN2019088278-appb-000020
S500: initially determine a first transparency mask of the first image according to the transparency estimate
Figure PCTCN2019088278-appb-000021
of each unknown pixel Z_k;
That is to say, once the transparency estimate of each unknown pixel has been obtained, this embodiment naturally makes an initial determination of the first transparency mask of the first image; it is called natural because the transparency mask can be regarded as composed of those pixels whose estimates
Figure PCTCN2019088278-appb-000022
fall within a certain value or range of values;
S600: superimpose grayscale information on the first image to generate a second image, and divide the second image into its set of all foreground pixels, set of all background pixels, and set of all unknown pixels;
As far as this step is concerned, this embodiment takes the view that, in addition to the role of its RGB color, the influence of grayscale information on each pixel should be considered; therefore, after the grayscale information is superimposed, the transparency mask is corrected by the following steps.
S700: for the second image, perform steps S200 to S500 to determine a first transparency mask of the second image, and use the first transparency mask of the second image as a second transparency mask of the first image;
S800: correct the first transparency mask of the first image by using the second transparency mask of the first image;
S900: extract the foreground target in the first image of the video according to the first transparency mask of the first image as corrected in step S800.
So far, the present disclosure provides a new video foreground target extraction scheme that makes combined use of the confidence of foreground-background pixel pairs and grayscale information. It can be understood that extracting a video foreground target is a process of successive approximation: because of the color and grayscale transitions in a video frame, it is difficult to say that the transparency mask obtained by any one method is the only correct one. In theory, the above embodiment fuses more information and considers more factors, which allows the images in the video to be examined more comprehensively and a relatively satisfactory foreground target to be extracted. It can also be understood that, in the above embodiment, when the foreground target in the first image of the video is extracted according to the first transparency mask, relevant means in the prior art may be drawn upon and combined. In other words, the key of the above embodiment lies in how to obtain the transparency mask in a new way, not in how to extract the video foreground target from the transparency mask.
In another embodiment, the following steps are further included after step S900:
S1000: extract each remaining frame from the video, take each in turn as the first image, and repeat the foregoing steps S100 to S900 to extract all foreground targets of the video; or
S1100: extract each remaining frame from the video, take each in turn as the first image, and divide the set F_c of all foreground pixels, the set B_c of all background pixels, and the set Z_c of all unknown pixels in the first image corresponding to the current frame according to the corrected first transparency mask of the previous frame's first image, then repeat the foregoing steps S200 to S900 to extract all foreground targets of the video, where dividing the sets F_c, B_c, and Z_c in the first image corresponding to the current frame specifically includes the following steps:
S11001: binarize the corrected first transparency mask of the previous frame's first image with a threshold of 0.5 to obtain a first binary image of the foreground target;
S11002: use the first binary image as the initial value of a second binary image;
S11003: perform a morphological erosion operation on the second binary image using a circular structuring element of size 3x3, and update the second binary image with the result;
S11004: repeat step S11003 five times;
S11005: use the first binary image as the initial value of a third binary image;
S11006: perform a morphological dilation operation on the third binary image using a circular structuring element of size 3x3, and update the third binary image with the result;
S11007: repeat step S11006 five times;
S11008: take the pixels that are true in the second binary image as the set F_c of all foreground pixels, take the pixels that are false in the third binary image as the set B_c of all background pixels, and take the remaining pixels as the set Z_c of all unknown pixels.
It can be understood that repeating the above steps S100 to S900 for every frame in the video will extract all foreground targets in the video. However, considering that each frame of a video and the frame that follows it usually exhibit continuity and similarity in picture content, and in order to make full use of that continuity and similarity, the above embodiment may also divide the set F_c of all foreground pixels, the set B_c of all background pixels, and the set Z_c of all unknown pixels in the first image corresponding to the current frame according to the corrected first transparency mask of the previous frame's first image, thereby striking a balance between accuracy and efficiency in image processing. In other words, this embodiment has an inheritance property: it inherits the transparency mask of the previous frame and uses it to divide the foreground, background, and unknown pixel sets of the next frame. Given the continuity and similarity of picture content, this division not only relies on the previous frame's transparency mask but also employs morphological erosion and morphological dilation, which is one of the innovations of the present disclosure.
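Steps S11001 to S11008 are fully specified (threshold 0.5, 3x3 circular structuring element, five erosions and five dilations) and can be sketched as follows; the naive morphology helpers are illustrative stand-ins for any standard implementation:

```python
import numpy as np

CROSS = np.array([[0, 1, 0],
                  [1, 1, 1],
                  [0, 1, 0]], dtype=bool)  # 3x3 "circular" element (disk of radius 1)

def _morph(img, struct, op):
    """Naive binary erosion/dilation; pixels outside the image count as background."""
    H, W = img.shape
    out = np.zeros_like(img, dtype=bool)
    offs = np.argwhere(struct) - np.array(struct.shape) // 2
    for y in range(H):
        for x in range(W):
            vals = [bool(img[y + dy, x + dx]) if 0 <= y + dy < H and 0 <= x + dx < W else False
                    for dy, dx in offs]
            out[y, x] = all(vals) if op == "erode" else any(vals)
    return out

def trimap_from_mask(alpha_mask, iters=5):
    """S11001-S11008: previous frame's corrected mask -> current frame's trimap."""
    first = alpha_mask > 0.5                 # S11001: binarize at threshold 0.5
    second = first.copy()                    # S11002
    for _ in range(iters):                   # S11003/S11004: erode five times
        second = _morph(second, CROSS, "erode")
    third = first.copy()                     # S11005
    for _ in range(iters):                   # S11006/S11007: dilate five times
        third = _morph(third, CROSS, "dilate")
    foreground = second                      # S11008: true pixels -> F_c
    background = ~third                      # false pixels -> B_c
    unknown = ~foreground & ~background      # the rest -> Z_c
    return foreground, background, unknown

mask = np.zeros((21, 21))
mask[5:16, 5:16] = 1.0                       # an 11x11 foreground block
fg, bg, unk = trimap_from_mask(mask)
```

Five erosions of an 11x11 block with the radius-1 disk leave a single center pixel, while five dilations push the background boundary outward, so the unknown band sits between the two.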
In another embodiment, in step S600, the grayscale information is superimposed on the first image to generate the second image in the following manner:
S601: perform mean filtering on the first image to obtain a third image;
S602: generate the second image from the first image and the third image by the following formula:
Figure PCTCN2019088278-appb-000023
where IM_2 denotes the gray value of the k-th pixel on the second image after superposition, x_r denotes a neighborhood pixel of the k-th pixel x_k on the first image, N_k denotes the number of pixels in the neighborhood centered on x_k,
Figure PCTCN2019088278-appb-000024
denotes the pixel value of the k-th pixel on the third image obtained by mean filtering the first image, and β takes the value 0.5.
For the above embodiment, a specific way of superimposing grayscale information is given through empirical values and the relevant formulas.
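The S602 formula is given in this text only as an image reference; a sketch consistent with the surrounding description (a neighborhood mean over N_k pixels, blending weight β = 0.5) is IM_2(k) = β·g(k) + (1−β)·mean(g, k), where g is a gray version of the first image. The luminance conversion and the assumed blend form are illustrative:

```python
import numpy as np

def mean_filter(gray, radius=1):
    """Third image (S601): mean over the neighborhood centered on each pixel;
    N_k shrinks at the image borders."""
    H, W = gray.shape
    out = np.empty_like(gray, dtype=float)
    for y in range(H):
        for x in range(W):
            y0, y1 = max(0, y - radius), min(H, y + radius + 1)
            x0, x1 = max(0, x - radius), min(W, x + radius + 1)
            out[y, x] = gray[y0:y1, x0:x1].mean()
    return out

def superimpose_gray(image_rgb, beta=0.5, radius=1):
    """S602 (assumed form): blend each pixel's gray value with its
    mean-filtered value, IM_2(k) = beta*g_k + (1-beta)*mean_k."""
    gray = image_rgb.mean(axis=2)  # simple luminance; a weighted RGB sum also works
    return beta * gray + (1.0 - beta) * mean_filter(gray, radius)

out = superimpose_gray(np.full((4, 4, 3), 100.0))  # constant image stays constant
```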
In another embodiment, step S800 further includes:
S801: according to the second transparency mask of the first image and the first transparency mask of the first image, find the edge of the second transparency mask and the edge of the first transparency mask, respectively;
S802: obtain the positions of all pixels on the edge of the second transparency mask and the positions of all pixels on the edge of the first transparency mask, determine the region where the positions of the pixels on the edge of the second transparency mask coincide with the positions of the pixels on the edge of the first transparency mask, and thereby determine the pixels Z_sp whose positions are the same;
S803: look up the transparency estimate of the pixel Z_sp in the first transparency mask of the first image and its transparency estimate in the second transparency mask of the first image, and take the average of the two as the corrected transparency estimate of the pixel Z_sp;
S804: correct the first transparency mask of the first image with the corrected transparency estimate of the pixel Z_sp.
The above embodiment is intended to find and compare the pixels that occupy the same positions in the two transparency masks, and to use the transparency estimates of those pixels in the respective masks, averaged, to correct the first transparency mask of the first image.
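Steps S801 to S804 can be sketched as follows; the edge definition used here (a binarized pixel with a 4-neighbor of the opposite class) is an illustrative assumption, since the text does not fix a particular edge detector:

```python
import numpy as np

def mask_edges(alpha_mask, thresh=0.5):
    """Edge pixels of a transparency mask: binarized pixels with at least
    one 4-neighbor of the opposite class (an illustrative edge definition)."""
    binary = alpha_mask > thresh
    H, W = binary.shape
    edge = np.zeros_like(binary)
    for y in range(H):
        for x in range(W):
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W and binary[yy, xx] != binary[y, x]:
                    edge[y, x] = True
                    break
    return edge

def correct_first_mask(mask1, mask2):
    """S801-S804: at positions Z_sp where the two mask edges coincide,
    replace mask1's value with the average of the two transparency estimates."""
    z_sp = mask_edges(mask1) & mask_edges(mask2)   # S802: coincident edge pixels
    corrected = mask1.copy()
    corrected[z_sp] = 0.5 * (mask1[z_sp] + mask2[z_sp])  # S803/S804: average
    return corrected

m1 = np.zeros((5, 5)); m1[1:4, 1:4] = 1.0   # first transparency mask
m2 = np.zeros((5, 5)); m2[1:4, 1:4] = 0.6   # second transparency mask
c = correct_first_mask(m1, m2)
```

Here both masks binarize to the same 3x3 square, so the coincident edge pixels get the averaged value (1.0 + 0.6) / 2 = 0.8 while interior pixels are left untouched.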
In another embodiment, step S802 further includes:
S8021: according to the determined region where the positions of all pixels on the edge of the second transparency mask coincide with the positions of all pixels on the edge of the first transparency mask, further determine the pixels Z_dp whose positions differ, which fall into two cases: pixels Z_dp2 on the edge of the second transparency mask and pixels Z_dp1 on the edge of the first transparency mask;
Unlike the previous embodiment, this embodiment additionally attends to the pixels that occupy different positions on the edges determined by the two transparency masks, and finds those mutually non-coincident pixels;
S8022: using the pixels Z_dp with differing positions and the pixels Z_sp with the same positions, obtain what the edge of the second transparency mask and the edge of the first transparency mask jointly determine: the closed region enclosed between the two edges, as well as the positions of all enclosed pixels of that closed region;
As far as this step is concerned, since the edge corresponding to each mask can to some extent be regarded as a connected or closed curve, then regardless of how the closed curves corresponding to the two masks overlap or fail to overlap, the pixels on the two edges whose positions do not correspond (that is, whose positions differ, or do not coincide) jointly determine the closed region enclosed between the edges of the two masks, as well as the positions of all enclosed pixels of that region;
S8023: perform the following sub-steps:
(1) look up the transparency estimate, in the first transparency mask of the first image, of the pixel corresponding to the position of the pixel Z_dp1, look up the transparency value of that corresponding pixel in the second image, and take the average of the two as the corrected transparency estimate of the pixel Z_dp1;
(2) look up the transparency estimate, in the second transparency mask of the first image, of the pixel corresponding to the position of the pixel Z_dp2, look up the transparency value of that corresponding pixel in the first image, and take the average of the two as the corrected transparency estimate of the pixel Z_dp2;
For this step, the intention is to find, for each pixel in the aforementioned closed region, its transparency estimate or transparency value under the two different systems, and to take the average of the two as the corrected transparency estimate of the corresponding pixel, which is then used in the next step S8024 to correct the first transparency mask of the first image. In other words, this embodiment follows the correction idea of the previous embodiment, except that it deals with the region jointly enclosed by the edges corresponding to the two masks. Taking the pixel Z_dp1 as an example: it belongs to the first transparency mask of the first image and has a transparency estimate there; in addition, the pixel in the second image corresponding to the position of Z_dp1 has a transparency value in the second image, and this embodiment takes the average of that transparency estimate and that transparency value as the corrected transparency estimate of the corresponding pixel Z_dp1. The pixel Z_dp2 is treated similarly.
S8024: correct the first transparency mask of the first image by combining the corrected transparency estimate of the pixel Z_dp1 and the corrected transparency estimate of the pixel Z_dp2. For example, the corrected transparency estimates of the pixels Z_dp1 and Z_dp2 are taken as the transparency values of the pixels at the corresponding positions of the first transparency mask.
The steps in the methods of the embodiments of the present disclosure may be reordered, combined, and pruned according to actual needs.
It should be noted that, for the sake of simple description, each of the foregoing method embodiments is expressed as a series of action combinations; however, those skilled in the art should know that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously.
In addition, referring to FIG. 2, in another embodiment the present disclosure further discloses a video foreground target extraction apparatus, including:
a first division module, configured to: for a first image in a video, divide the image into the set F of all foreground pixels, the set B of all background pixels, and the set Z of all unknown pixels, where the first image is a frame extracted from the video;
a first metric module, configured to: given certain foreground-background pixel pairs (F_i, B_j), measure the transparency
Figure PCTCN2019088278-appb-000025
of each unknown pixel Z_k according to the following formula:
Figure PCTCN2019088278-appb-000026
where I_k is the RGB color value of the unknown pixel Z_k, the foreground pixels F_i are the m foreground pixels closest to the unknown pixel Z_k, the background pixels B_j are likewise the m background pixels closest to the unknown pixel Z_k, and the foreground-background pixel pairs (F_i, B_j) total m² groups;
a second metric module, configured to: for each of the m² foreground-background pixel pairs (F_i, B_j) and its corresponding
Figure PCTCN2019088278-appb-000027
measure the confidence n_ij of the pair (F_i, B_j) according to the following formula:
Figure PCTCN2019088278-appb-000028
where σ takes the value 0.1, and the pair with the highest confidence MAX(n_ij) is selected as (F_iMAX, B_jMAX);
a calculation module, configured to calculate the transparency estimate
Figure PCTCN2019088278-appb-000029
of each unknown pixel Z_k according to the following formula:
Figure PCTCN2019088278-appb-000030
a determination module, configured to initially determine a first transparency mask of the first image according to the transparency estimate
Figure PCTCN2019088278-appb-000031
of each unknown pixel Z_k;
a second division module, configured to superimpose grayscale information on the first image to generate a second image, and divide the second image into its set of all foreground pixels, set of all background pixels, and set of all unknown pixels;
a re-invocation module, configured to: for the second image, invoke the first metric module, the second metric module, the calculation module, and the determination module again to determine a first transparency mask of the second image, and use the first transparency mask of the second image as a second transparency mask of the first image;
a correction module, configured to correct the first transparency mask of the first image by using the second transparency mask of the first image;
an extraction module, configured to extract the foreground target in the first image of the video according to the first transparency mask of the first image obtained by the correction module.
As for this embodiment, as shown in FIG. 2, the above modules may form a system together with a processor and a memory for implementation; however, FIG. 2 does not preclude each module from having its own processing unit to provide data processing capability.
In another embodiment, the apparatus further includes the following modules:
a sequential invocation module, configured to: extract each remaining frame from the video, take each in turn as the first image, and sequentially invoke the first division module, the first metric module, the second metric module, the calculation module, the determination module, the second division module, the re-invocation module, the correction module, and the extraction module, to extract all foreground targets of the video; or the apparatus includes:
an inheritance invocation module, configured to: extract each remaining frame from the video, take each in turn as the first image, and input it to a third division module, where the third division module is configured to divide the set F_c of all foreground pixels, the set B_c of all background pixels, and the set Z_c of all unknown pixels in the first image corresponding to the current frame according to the corrected first transparency mask of the previous frame's first image; the inheritance invocation module then sequentially invokes the first metric module, the second metric module, the calculation module, the determination module, the second division module, the re-invocation module, the correction module, and the extraction module, to extract all foreground targets of the video, where the third division module includes:
a first binary image processing unit, configured to binarize the corrected first transparency mask of the previous frame's first image with a threshold of 0.5 to obtain a first binary image of the foreground target;
a second binary image initialization unit, configured to use the first binary image as the initial value of a second binary image;
a second binary image processing unit, configured to perform a morphological erosion operation on the second binary image using a circular structuring element of size 3x3 and update the second binary image with the result;
a first repeated invocation unit, configured to invoke the second binary image processing unit five times;
a third binary image initialization unit, configured to use the first binary image as the initial value of a third binary image;
a third binary image processing unit, configured to perform a morphological dilation operation on the third binary image using a circular structuring element of size 3x3 and update the third binary image with the result;
a second repeated invocation unit, configured to invoke the third binary image processing unit five times;
a true/false division unit, configured to: take the pixels that are true in the second binary image as last updated by the second binary image processing unit as the set F_c of all foreground pixels, take the pixels that are false in the third binary image as last updated by the third binary image processing unit as the set B_c of all background pixels, and take the remaining pixels as the set Z_c of all unknown pixels.
In another embodiment, the second division module further includes:
a mean filtering unit, configured to perform mean filtering on the first image to obtain a third image;
a second image generating unit, configured to generate the second image from the first image and the third image by the following formula:
Figure PCTCN2019088278-appb-000032
where IM_2 denotes the gray value of the k-th pixel on the second image after superposition, x_r denotes a neighborhood pixel of the k-th pixel x_k on the first image, N_k denotes the number of pixels in the neighborhood centered on x_k,
Figure PCTCN2019088278-appb-000033
denotes the pixel value of the k-th pixel on the third image obtained by mean filtering the first image, and β takes the value 0.5.
In another embodiment, the correction module further includes:
an edge finding unit, configured to find, according to the second transparency mask of the first image and the first transparency mask of the first image, the edge of the second transparency mask and the edge of the first transparency mask, respectively;
a position determination unit, configured to: obtain the positions of all pixels on the edge of the second transparency mask and the positions of all pixels on the edge of the first transparency mask, determine the region where the positions of the pixels on the edge of the second transparency mask coincide with the positions of the pixels on the edge of the first transparency mask, and thereby determine the pixels Z_sp whose positions are the same;
a first correction unit, configured to look up the transparency estimate of the pixel Z_sp in the first transparency mask of the first image and its transparency estimate in the second transparency mask of the first image, and take the average of the two as the corrected transparency estimate of the pixel Z_sp;
a second correction unit, configured to correct the first transparency mask of the first image with the corrected transparency estimate of the pixel Z_sp.
It can be understood that the apparatus is capable of implementing the method described in the first embodiment above.
In another embodiment, the position-determining unit further includes:
a different-position subunit, configured to: based on the determined region where the edge-pixel positions of the second transparency mask coincide with those of the first transparency mask, further identify the pixels Z_dp whose positions differ, including the pixels Z_dp2 located on the edge of the second transparency mask and the pixels Z_dp1 located on the edge of the first transparency mask;
a closing subunit, configured to: use the different-position pixels Z_dp and the same-position pixels Z_sp to obtain the closed region enclosed between the edge of the second transparency mask and the edge of the first transparency mask, together with the positions of all pixels enclosed by that region;
a multiple-lookup subunit, configured to:
(1) look up the transparency estimate, in the first transparency mask of the first image, of the pixel corresponding to the position of pixel Z_dp1, look up the transparency value of that corresponding pixel in the second image, and take the average of the two as the corrected transparency estimate of pixel Z_dp1;
(2) look up the transparency estimate, in the second transparency mask of the first image, of the pixel corresponding to the position of pixel Z_dp2, look up the transparency value of that corresponding pixel in the first transparency mask of the first image, and take the average of the two as the corrected transparency estimate of pixel Z_dp2;
a composite correction subunit, configured to: correct the first transparency mask of the first image by combining the corrected transparency estimates of the pixels Z_dp1 with the corrected transparency estimates of the pixels Z_dp2.
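The two-way averaging performed by the multiple-lookup and composite correction subunits can be sketched as one function. The array names are illustrative, not from the patent; in particular, `alpha_via_second_image` stands in for "the transparency value of the corresponding pixel in the second image", however it is obtained:

```python
import numpy as np

def correct_on_disjoint_edges(first_mask, second_mask, alpha_via_second_image,
                              z_dp1, z_dp2):
    # z_dp1 / z_dp2: boolean maps of edge pixels lying only on the first /
    # second mask's edge. For Z_dp1 the first mask's estimate is averaged
    # with the value found via the second image; for Z_dp2 the second
    # mask's estimate is averaged with the first mask's value.
    corrected = first_mask.copy()
    corrected[z_dp1] = (first_mask[z_dp1] + alpha_via_second_image[z_dp1]) / 2.0
    corrected[z_dp2] = (second_mask[z_dp2] + first_mask[z_dp2]) / 2.0
    return corrected
```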
Those skilled in the art should also appreciate that the embodiments described in this specification are all preferred embodiments, and that the actions, modules, and units involved are not necessarily required by the present invention.
In the above embodiments, the description of each embodiment has its own emphasis; for a part not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present disclosure, it should be understood that the disclosed method may be implemented as corresponding functional units, processors, or even a system, where the parts of the system may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of the embodiment. In addition, the functional units may be integrated into one processing unit, each unit may exist separately, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a smartphone, a personal digital assistant, a wearable device, a notebook computer, or a tablet computer) to perform all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned storage media include various media that can store program code, such as USB flash drives, read-only memory (ROM), random access memory (RAM), removable hard disks, magnetic disks, and optical discs.
The above embodiments are intended only to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.
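The pixel-pair transparency estimation at the core of the method (steps S200 to S400 of the embodiments and claims below) can be sketched as one loop over the m^2 candidate pairs. Because the formulas themselves are figure placeholders in this text, the projection-based alpha and the exponential confidence with σ = 0.1 used here are the standard shared-matting forms and are assumptions, not taken verbatim from the patent:

```python
import numpy as np

def estimate_alpha(I_k, fg_pixels, bg_pixels, sigma=0.1):
    # For each of the m^2 (F_i, B_j) pairs: project I_k onto the F-B line
    # to get a candidate alpha (cf. S200), score the pair by how well that
    # alpha reconstructs I_k (cf. S300), and keep the alpha of the most
    # credible pair (cf. S400).
    best_alpha, best_conf = 0.0, -1.0
    for F in fg_pixels:            # m nearest foreground candidates
        for B in bg_pixels:        # m nearest background candidates
            d = F - B
            denom = float(np.dot(d, d))
            if denom == 0.0:
                continue
            alpha = float(np.clip(np.dot(I_k - B, d) / denom, 0.0, 1.0))
            residual = I_k - (alpha * F + (1.0 - alpha) * B)
            conf = float(np.exp(-np.dot(residual, residual) / sigma ** 2))
            if conf > best_conf:
                best_alpha, best_conf = alpha, conf
    return best_alpha
```

A pixel whose RGB color lies exactly midway between a foreground and a background candidate receives alpha 0.5 under this projection, which matches the intuition behind the pair-confidence selection.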

Claims (10)

  1. A video foreground target extraction method, comprising the following steps:
    S100: for a first image in a video, divide the image into the set F of all foreground pixels, the set B of all background pixels, and the set Z of all unknown pixels, wherein the first image is a frame extracted from the video;
    S200: given certain foreground-background pixel pairs (F_i, B_j), measure the transparency
    Figure PCTCN2019088278-appb-100001
    of each unknown pixel Z_k according to the following formula:
    Figure PCTCN2019088278-appb-100002
    wherein I_k is the RGB color value of the unknown pixel Z_k, the foreground pixels F_i are the m foreground pixels closest to the unknown pixel Z_k, the background pixels B_j are likewise the m background pixels closest to the unknown pixel Z_k, and the foreground-background pixel pairs (F_i, B_j) total m^2 groups;
    S300: for each foreground-background pixel pair (F_i, B_j) in the m^2 groups and its corresponding
    Figure PCTCN2019088278-appb-100003
    measure the confidence n_ij of the foreground-background pixel pair (F_i, B_j) according to the following formula:
    Figure PCTCN2019088278-appb-100004
    wherein σ takes the value 0.1, and the group of foreground-background pixel pairs corresponding to the highest-confidence MAX(n_ij) is selected as (F_iMAX, B_jMAX);
    S400: calculate the transparency estimate
    Figure PCTCN2019088278-appb-100005
    of each unknown pixel Z_k according to the following formula:
    Figure PCTCN2019088278-appb-100006
    S500: preliminarily determine the first transparency mask of the first image according to the transparency estimate
    Figure PCTCN2019088278-appb-100007
    of each unknown pixel Z_k;
    S600: superimpose grayscale information on the first image to generate a second image, and divide the second image into its set of all foreground pixels, set of all background pixels, and set of all unknown pixels;
    S700: for the second image, perform steps S200 to S500 to determine a first transparency mask of the second image, and use the first transparency mask of the second image as a second transparency mask of the first image;
    S800: correct the first transparency mask of the first image using the second transparency mask of the first image;
    S900: extract the foreground target in the first image of the video according to the first transparency mask of the first image corrected in step S800.
  2. The method according to claim 1, wherein the following steps are further included after step S900:
    S1000: extract each remaining frame from the video, take each in turn as the first image, and repeat the foregoing steps S100 to S900 to extract all foreground targets of the video; or
    S1100: extract each remaining frame from the video, take each in turn as the first image, divide the set F_c of all foreground pixels, the set B_C of all background pixels, and the set Z_C of all unknown pixels in the first image corresponding to the current frame according to the corrected first transparency mask of the first image of the previous frame, and repeat the foregoing steps S200 to S900 to extract all foreground targets of the video, wherein dividing the sets F_c, B_C, and Z_C in the first image corresponding to the current frame specifically comprises the following steps:
    S11001: binarize the corrected first transparency mask of the first image of the previous frame with a threshold of 0.5 to obtain a first binary image of the foreground target;
    S11002: use the first binary image as the initial value of a second binary image;
    S11003: perform a morphological erosion operation on the second binary image using a circular structuring element of size 3x3, and update the second binary image with the obtained result;
    S11004: repeat step S11003 five times;
    S11005: use the first binary image as the initial value of a third binary image;
    S11006: perform a morphological dilation operation on the third binary image using a circular structuring element of size 3x3, and update the third binary image with the obtained result;
    S11007: repeat step S11006 five times;
    S11008: take the corresponding pixels that are true in the second binary image as the set F_c of all foreground pixels, the corresponding pixels that are false in the third binary image as the set B_C of all background pixels, and the remaining pixels as the set Z_C of all unknown pixels.
  3. The method according to claim 1, wherein, in step S600, grayscale information is superimposed on the first image to generate the second image in the following manner:
    S601: perform mean filtering on the first image to obtain a third image;
    S602: generate the second image from the first image and the third image by the following formula:
    Figure PCTCN2019088278-appb-100008
    wherein IM_2 represents the gray value of the k-th pixel on the second image after superposition, x_r represents a neighborhood pixel of the k-th pixel x_k on the first image, N_k represents the number of pixels in the neighborhood centered on x_k,
    Figure PCTCN2019088278-appb-100009
    represents the pixel value of the k-th pixel on the third image obtained by mean-filtering the first image, and β takes the value 0.5.
  4. The method according to claim 1, wherein step S800 further comprises:
    S801: from the second transparency mask of the first image and the first transparency mask of the first image, find the edge of the second transparency mask and the edge of the first transparency mask, respectively;
    S802: obtain the positions of all pixels on the edge of the second transparency mask and the positions of all pixels on the edge of the first transparency mask, determine the region where the two sets of edge-pixel positions coincide, and thereby identify the pixels Z_sp whose positions are the same;
    S803: look up, for each pixel Z_sp, its transparency estimate in the first transparency mask of the first image and its transparency estimate in the second transparency mask of the first image, and take the average of the two as the corrected transparency estimate of the pixel Z_sp;
    S804: correct the first transparency mask of the first image with the corrected transparency estimates of the pixels Z_sp.
  5. The method according to claim 4, wherein step S802 further comprises:
    S8021: based on the determined region where the edge-pixel positions of the second transparency mask coincide with those of the first transparency mask, further identify the pixels Z_dp whose positions differ, including the pixels Z_dp2 located on the edge of the second transparency mask and the pixels Z_dp1 located on the edge of the first transparency mask;
    S8022: use the different-position pixels Z_dp and the same-position pixels Z_sp to obtain the closed region enclosed between the edge of the second transparency mask and the edge of the first transparency mask, together with the positions of all pixels enclosed by that region;
    S8023: perform the following sub-steps:
    (1) look up the transparency estimate, in the first transparency mask of the first image, of the pixel corresponding to the position of pixel Z_dp1, look up the transparency value of that corresponding pixel in the second image, and take the average of the two as the corrected transparency estimate of pixel Z_dp1;
    (2) look up the transparency estimate, in the second transparency mask of the first image, of the pixel corresponding to the position of pixel Z_dp2, look up the transparency value of that corresponding pixel in the first image, and take the average of the two as the corrected transparency estimate of pixel Z_dp2;
    S8024: combine the corrected transparency estimates of the pixels Z_dp1 with the corrected transparency estimates of the pixels Z_dp2 to correct the first transparency mask of the first image.
  6. A video foreground target extraction apparatus, comprising:
    a first dividing module, configured to: for a first image in a video, divide the image into the set F of all foreground pixels, the set B of all background pixels, and the set Z of all unknown pixels, wherein the first image is a frame extracted from the video;
    a first measurement module, configured to: given certain foreground-background pixel pairs (F_i, B_j), measure the transparency
    Figure PCTCN2019088278-appb-100010
    of each unknown pixel Z_k according to the following formula:
    Figure PCTCN2019088278-appb-100011
    wherein I_k is the RGB color value of the unknown pixel Z_k, the foreground pixels F_i are the m foreground pixels closest to the unknown pixel Z_k, the background pixels B_j are likewise the m background pixels closest to the unknown pixel Z_k, and the foreground-background pixel pairs (F_i, B_j) total m^2 groups;
    a second measurement module, configured to: for each foreground-background pixel pair (F_i, B_j) in the m^2 groups and its corresponding
    Figure PCTCN2019088278-appb-100012
    measure the confidence n_ij of the foreground-background pixel pair (F_i, B_j) according to the following formula:
    Figure PCTCN2019088278-appb-100013
    wherein σ takes the value 0.1, and the group of foreground-background pixel pairs corresponding to the highest-confidence MAX(n_ij) is selected as (F_iMAX, B_jMAX);
    a calculation module, configured to: calculate the transparency estimate
    Figure PCTCN2019088278-appb-100014
    of each unknown pixel Z_k according to the following formula:
    Figure PCTCN2019088278-appb-100015
    a determining module, configured to: preliminarily determine the first transparency mask of the first image according to the transparency estimate
    Figure PCTCN2019088278-appb-100016
    of each unknown pixel Z_k;
    a second dividing module, configured to: superimpose grayscale information on the first image to generate a second image, and divide the second image into its set of all foreground pixels, set of all background pixels, and set of all unknown pixels;
    a re-calling module, configured to: for the second image, call the first measurement module, the second measurement module, the calculation module, and the determining module again to determine a first transparency mask of the second image, and use the first transparency mask of the second image as a second transparency mask of the first image;
    a correction module, configured to: correct the first transparency mask of the first image using the second transparency mask of the first image;
    an extraction module, configured to: extract the foreground target in the first image of the video according to the first transparency mask of the first image obtained by the correction module.
  7. The apparatus according to claim 6, further comprising:
    a sequential calling module, configured to: extract each remaining frame from the video, take each in turn as the first image, and call in sequence the first dividing module, the first measurement module, the second measurement module, the calculation module, the determining module, the second dividing module, the re-calling module, the correction module, and the extraction module to extract all foreground targets of the video; or comprising:
    an inheritance calling module, configured to: extract each remaining frame from the video, take each in turn as the first image, and input it to a third dividing module, wherein the third dividing module is configured to divide the set F_c of all foreground pixels, the set B_C of all background pixels, and the set Z_C of all unknown pixels in the first image corresponding to the current frame according to the corrected first transparency mask of the first image of the previous frame; the inheritance calling module then calls in sequence the first measurement module, the second measurement module, the calculation module, the determining module, the second dividing module, the re-calling module, the correction module, and the extraction module to extract all foreground targets of the video, wherein the third dividing module includes:
    a first binary image processing unit, configured to binarize the corrected first transparency mask of the first image of the previous frame with a threshold of 0.5 to obtain a first binary image of the foreground target;
    a second binary image initialization unit, configured to: use the first binary image as the initial value of a second binary image;
    a second binary image processing unit, configured to: perform a morphological erosion operation on the second binary image using a circular structuring element of size 3x3, and update the second binary image with the obtained result;
    a first repeated calling unit, configured to repeatedly call the second binary image processing unit five times;
    a third binary image initialization unit, configured to: use the first binary image as the initial value of a third binary image;
    a third binary image processing unit, configured to: perform a morphological dilation operation on the third binary image using a circular structuring element of size 3x3, and update the third binary image with the obtained result;
    a second repeated calling unit, configured to repeatedly call the third binary image processing unit five times;
    a true/false dividing unit, configured to: take the corresponding pixels that are true in the second binary image last updated by the second binary image processing unit as the set F_c of all foreground pixels, the corresponding pixels that are false in the third binary image last updated by the third binary image processing unit as the set B_C of all background pixels, and the remaining pixels as the set Z_C of all unknown pixels.
  8. The apparatus according to claim 6, wherein the second dividing module further comprises:
    a mean filtering unit, configured to: perform mean filtering on the first image to obtain a third image;
    a second image generation unit, configured to generate the second image from the first image and the third image by the following formula:
    Figure PCTCN2019088278-appb-100017
    wherein IM_2 represents the gray value of the k-th pixel on the second image after superposition, x_r represents a neighborhood pixel of the k-th pixel x_k on the first image, N_k represents the number of pixels in the neighborhood centered on x_k,
    Figure PCTCN2019088278-appb-100018
    represents the pixel value of the k-th pixel on the third image obtained by mean-filtering the first image, and β takes the value 0.5.
  9. The apparatus according to claim 6, wherein the correction module further comprises:
    an edge-finding unit, configured to: find, from the second transparency mask of the first image and the first transparency mask of the first image, the edge of the second transparency mask and the edge of the first transparency mask, respectively;
    a position-determining unit, configured to: obtain the positions of all pixels on the edge of the second transparency mask and the positions of all pixels on the edge of the first transparency mask, determine the region where the two sets of edge-pixel positions coincide, and thereby identify the pixels Z_sp whose positions are the same;
    a first correction unit, configured to: look up, for each pixel Z_sp, its transparency estimate in the first transparency mask of the first image and its transparency estimate in the second transparency mask of the first image, and take the average of the two as the corrected transparency estimate of the pixel Z_sp;
    a second correction unit, configured to: correct the first transparency mask of the first image with the corrected transparency estimates of the pixels Z_sp.
  10. The apparatus according to claim 9, wherein the position-determining unit further comprises:
    a different-position subunit, configured to: based on the determined region where the edge-pixel positions of the second transparency mask coincide with those of the first transparency mask, further identify the pixels Z_dp whose positions differ, including the pixels Z_dp2 located on the edge of the second transparency mask and the pixels Z_dp1 located on the edge of the first transparency mask;
    a closing subunit, configured to: use the different-position pixels Z_dp and the same-position pixels Z_sp to obtain the closed region enclosed between the edge of the second transparency mask and the edge of the first transparency mask, together with the positions of all pixels enclosed by that region;
    a multiple-lookup subunit, configured to:
    (1) look up the transparency estimate, in the first transparency mask of the first image, of the pixel corresponding to the position of pixel Z_dp1, look up the transparency value of that corresponding pixel in the second image, and take the average of the two as the corrected transparency estimate of pixel Z_dp1;
    (2) look up the transparency estimate, in the second transparency mask of the first image, of the pixel corresponding to the position of pixel Z_dp2, look up the transparency value of that corresponding pixel in the first image, and take the average of the two as the corrected transparency estimate of pixel Z_dp2;
    a composite correction subunit, configured to: correct the first transparency mask of the first image by combining the corrected transparency estimates of the pixels Z_dp1 with the corrected transparency estimates of the pixels Z_dp2.
PCT/CN2019/088278 2018-09-26 2019-05-24 Video foreground target extraction method and apparatus WO2020062898A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CNPCT/CN2018/107514 2018-09-26
CN2018107514 2018-09-26

Publications (1)

Publication Number Publication Date
WO2020062898A1 true WO2020062898A1 (en) 2020-04-02

Family

ID=68140393

Family Applications (5)

Application Number Title Priority Date Filing Date
PCT/CN2019/088278 WO2020062898A1 (en) 2018-09-26 2019-05-24 Video foreground target extraction method and apparatus
PCT/CN2019/088279 WO2020062899A1 (en) 2018-09-26 2019-05-24 Method for obtaining transparency masks by means of foreground and background pixel pairs and grayscale information
PCT/CN2019/101273 WO2020063189A1 (en) 2018-09-26 2019-08-19 Video target trajectory extraction method and device
PCT/CN2019/105028 WO2020063321A1 (en) 2018-09-26 2019-09-10 Video processing method based on semantic analysis and device
PCT/CN2019/106616 WO2020063436A1 (en) 2018-09-26 2019-09-19 Method and apparatus for analysing deep learning (dnn) based classroom learning behaviour

Family Applications After (4)

Application Number Title Priority Date Filing Date
PCT/CN2019/088279 WO2020062899A1 (en) 2018-09-26 2019-05-24 Method for obtaining transparency masks by means of foreground and background pixel pairs and grayscale information
PCT/CN2019/101273 WO2020063189A1 (en) 2018-09-26 2019-08-19 Video target trajectory extraction method and device
PCT/CN2019/105028 WO2020063321A1 (en) 2018-09-26 2019-09-10 Video processing method based on semantic analysis and device
PCT/CN2019/106616 WO2020063436A1 (en) 2018-09-26 2019-09-19 Method and apparatus for analysing deep learning (dnn) based classroom learning behaviour

Country Status (2)

Country Link
CN (5) CN110378867A (en)
WO (5) WO2020062898A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989962B (en) * 2021-02-24 2024-01-05 上海商汤智能科技有限公司 Track generation method, track generation device, electronic equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101529495A (en) * 2006-09-19 2009-09-09 奥多比公司 Image mask generation
CN101588459A (en) * 2009-06-26 2009-11-25 北京交通大学 A kind of video keying processing method
CN102163216A (en) * 2010-11-24 2011-08-24 广州市动景计算机科技有限公司 Picture display method and device thereof
CN104680482A (en) * 2015-03-09 2015-06-03 华为技术有限公司 Method and device for image processing
WO2017213923A1 (en) * 2016-06-09 2017-12-14 Lytro, Inc. Multi-view scene segmentation and propagation
CN107516319A (en) * 2017-09-05 2017-12-26 中北大学 High-accuracy simple interactive image matting method, storage device and terminal

Family Cites Families (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6361910B1 (en) * 2000-02-03 2002-03-26 Applied Materials, Inc Straight line defect detection
US6870945B2 (en) * 2001-06-04 2005-03-22 University Of Washington Video object tracking by estimating and subtracting background
US7466842B2 (en) * 2005-05-20 2008-12-16 Mitsubishi Electric Research Laboratories, Inc. Modeling low frame rate videos with Bayesian estimation
US8520972B2 (en) * 2008-09-12 2013-08-27 Adobe Systems Incorporated Image decomposition
CN101686338B (en) * 2008-09-26 2013-12-25 索尼株式会社 System and method for partitioning foreground and background in video
CN101621615A (en) * 2009-07-24 2010-01-06 南京邮电大学 Adaptive background modeling and moving target detection method
US8625888B2 (en) * 2010-07-21 2014-01-07 Microsoft Corporation Variable kernel size image matting
US8386964B2 (en) * 2010-07-21 2013-02-26 Microsoft Corporation Interactive image matting
CN102456212A (en) * 2010-10-19 2012-05-16 北大方正集团有限公司 Separation method and system for visible watermarks in digital images
CN102236901B (en) * 2011-06-30 2013-06-05 南京大学 Target tracking method based on graph-theory clustering and color-invariant space
US8744123B2 (en) * 2011-08-29 2014-06-03 International Business Machines Corporation Modeling of temporarily static objects in surveillance video data
US8731315B2 (en) * 2011-09-12 2014-05-20 Canon Kabushiki Kaisha Image compression and decompression for image matting
US9305357B2 (en) * 2011-11-07 2016-04-05 General Electric Company Automatic surveillance video matting using a shape prior
CN102651135B (en) * 2012-04-10 2015-06-17 电子科技大学 Optimized direction sampling-based natural image matting method
US8792718B2 (en) * 2012-06-29 2014-07-29 Adobe Systems Incorporated Temporal matte filter for video matting
CN102999892B (en) * 2012-12-03 2015-08-12 东华大学 Intelligent fusion method for depth images and RGB images based on region masks
CN103366364B (en) * 2013-06-07 2016-06-29 太仓中科信息技术研究院 Image matting method based on color distortion
AU2013206597A1 (en) * 2013-06-28 2015-01-22 Canon Kabushiki Kaisha Depth constrained superpixel-based depth map refinement
WO2015048694A2 (en) * 2013-09-27 2015-04-02 Pelican Imaging Corporation Systems and methods for depth-assisted perspective distortion correction
US20150091891A1 (en) * 2013-09-30 2015-04-02 Dumedia, Inc. System and method for non-holographic teleportation
CN104112144A (en) * 2013-12-17 2014-10-22 深圳市华尊科技有限公司 Person and vehicle identification method and device
WO2015134996A1 (en) * 2014-03-07 2015-09-11 Pelican Imaging Corporation System and methods for depth regularization and semiautomatic interactive matting using rgb-d images
CN104952089B (en) * 2014-03-26 2019-02-15 腾讯科技(深圳)有限公司 Image processing method and system
CN103903230A (en) * 2014-03-28 2014-07-02 哈尔滨工程大学 Sea fog removal and clearing method for video images
CN105590307A (en) * 2014-10-22 2016-05-18 华为技术有限公司 Transparency-based matting method and apparatus
CN104573688B (en) * 2015-01-19 2017-08-25 电子科技大学 Mobile platform tobacco laser code intelligent identification method and device based on deep learning
CN104935832B (en) * 2015-03-31 2019-07-12 浙江工商大学 Video matting method using depth information
CN105100646B (en) * 2015-08-31 2018-09-11 北京奇艺世纪科技有限公司 Method for processing video frequency and device
CN105243670B (en) * 2015-10-23 2018-04-06 北京航空航天大学 Sparse and accurate video foreground object extraction method based on low-rank joint representation
CN105809679B (en) * 2016-03-04 2019-06-18 李云栋 Mountain railway side slope rockfall detection method based on visual analysis
CN106204567B (en) * 2016-07-05 2019-01-29 华南理工大学 Natural background video matting method
CN117864918A (en) * 2016-07-29 2024-04-12 奥的斯电梯公司 Monitoring system for passenger conveyor, passenger conveyor and monitoring method thereof
CN107872644B (en) * 2016-09-23 2020-10-09 亿阳信通股份有限公司 Video monitoring method and device
CN106778810A (en) * 2016-11-23 2017-05-31 北京联合大学 Original image layer fusion method and system based on RGB features and depth features
US10198621B2 (en) * 2016-11-28 2019-02-05 Sony Corporation Image-processing device and method for foreground mask correction for object segmentation
CN106952276A (en) * 2017-03-20 2017-07-14 成都通甲优博科技有限责任公司 Image matting method and device
CN107194867A (en) * 2017-05-14 2017-09-22 北京工业大学 Image matting and compositing method based on CUDA
CN107273905B (en) * 2017-06-14 2020-05-08 电子科技大学 Target active contour tracking method combined with motion information
CN107230182B (en) * 2017-08-03 2021-11-09 腾讯科技(深圳)有限公司 Image processing method and device and storage medium
CN108399361A (en) * 2018-01-23 2018-08-14 南京邮电大学 Pedestrian detection method based on a convolutional neural network (CNN) and semantic segmentation
CN108391118B (en) * 2018-03-21 2020-11-06 惠州学院 Display system for realizing 3D image based on projection mode
CN108320298B (en) * 2018-04-28 2022-01-28 亮风台(北京)信息科技有限公司 Visual target tracking method and equipment

Also Published As

Publication number Publication date
WO2020063321A1 (en) 2020-04-02
CN110378867A (en) 2019-10-25
CN110335288A (en) 2019-10-15
CN110659562A (en) 2020-01-07
CN110363788A (en) 2019-10-22
CN110516534A (en) 2019-11-29
WO2020063189A1 (en) 2020-04-02
WO2020062899A1 (en) 2020-04-02
WO2020063436A1 (en) 2020-04-02

Similar Documents

Publication Publication Date Title
US11610082B2 (en) Method and apparatus for training neural network model used for image processing, and storage medium
JP2022528294A (en) Video background subtraction method using depth
US20200143171A1 (en) Segmenting Objects In Video Sequences
CN109816666B (en) Symmetrical full convolution neural network model construction method, fundus image blood vessel segmentation device, computer equipment and storage medium
CN111383232B (en) Matting method, matting device, terminal equipment and computer readable storage medium
WO2020001222A1 (en) Image processing method, apparatus, computer readable medium, and electronic device
WO2020194254A1 (en) Quality assessment of a video
CN111598777A (en) Sky cloud image processing method, computer device and readable storage medium
JP7328096B2 (en) Image processing device, image processing method, and program
CN110288560A (en) Image blur detection method and device
WO2020062898A1 (en) Video foreground target extraction method and apparatus
CN113935934A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
JP7312026B2 (en) Image processing device, image processing method and program
CN110738625B (en) Image resampling method, device, terminal and computer readable storage medium
Hua et al. Low-light image enhancement based on joint generative adversarial network and image quality assessment
WO2021042641A1 (en) Image segmentation method and apparatus
CN111161299B (en) Image segmentation method, storage medium and electronic device
CN113191376A (en) Image processing method, image processing device, electronic equipment and readable storage medium
CN114764839A (en) Dynamic video generation method and device, readable storage medium and terminal equipment
CN111754411B (en) Image noise reduction method, image noise reduction device and terminal equipment
JP2016162421A (en) Information processor, information processing method and program
JP6468642B2 (en) Information terminal equipment
Borraa An Efficient Shape Adaptive Techniques for the Digital Image Denoising
Shajahan et al. Direction oriented block based inpainting using morphological operations
CN112995634B (en) Image white balance processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19867914

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19867914

Country of ref document: EP

Kind code of ref document: A1