CN111612811B - Video foreground information extraction method and system - Google Patents

Video foreground information extraction method and system

Info

Publication number
CN111612811B
CN111612811B (application CN202010506381.6A)
Authority
CN
China
Prior art keywords
pixel
pixel position
gray
value
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010506381.6A
Other languages
Chinese (zh)
Other versions
CN111612811A (en
Inventor
乔智
常超
刘熹
柴宇明
舒友生
黄崟东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Defense Technology Innovation Institute PLA Academy of Military Science
Original Assignee
National Defense Technology Innovation Institute PLA Academy of Military Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Defense Technology Innovation Institute PLA Academy of Military Science filed Critical National Defense Technology Innovation Institute PLA Academy of Military Science
Priority to CN202010506381.6A priority Critical patent/CN111612811B/en
Publication of CN111612811A publication Critical patent/CN111612811A/en
Application granted granted Critical
Publication of CN111612811B publication Critical patent/CN111612811B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a method and a system for extracting video foreground information, which require little computation, are simple to implement, and work well on high-frame-rate, low-resolution target videos. In addition, in the embodiment of the invention, the gray-scale change information of each pixel at each pixel position is processed independently, so the method is compatible with accelerated parallel computation, which greatly improves computation speed, and information crosstalk between pixels is reduced. Compared with the frame difference method, the video foreground information extraction method provided by the embodiment of the invention avoids the inaccurate extraction the frame difference method suffers from when the sampling rate is low and the foreground moves fast. Compared with the optical flow method, the method provided by the embodiment of the invention is simpler to compute and takes less computation time. The static background value obtained by the normal-distribution analysis is more accurate than that of simple background subtraction, is less affected by manual judgment, and adapts better to different conditions.

Description

Video foreground information extraction method and system
Technical Field
The invention relates to the technical field of video image processing, in particular to a method and a system for extracting video foreground information.
Background
The extraction of video foreground information refers to separating a foreground object in a video from the background; in general, the foreground object is in motion. For example, to accurately analyze the behavior of zebrafish, the swinging tail of the zebrafish needs to be separated from the background information in a video containing the zebrafish.
At present, commonly used methods for extracting video foreground information include the frame difference method, the optical flow method and the background subtraction method. The frame difference method obtains the moving region of an image from the difference between two or more adjacent frames, and thereby the video foreground information. The method is fast and simple to operate, but produces large errors for fast-moving foreground objects. For example, its extraction of the zebrafish tail is not ideal, because the tail swings fast and the motion blur in the video is severe. The optical flow method estimates the movement of the foreground object in the video from the spatio-temporal gradient of the image by calculating the optical flow field, and extracts the foreground object accordingly. Its disadvantages are a large amount of computation and a high sensitivity to noise. The background subtraction method pre-constructs a background image from image information and then differences the current frame against the background image, thereby distinguishing the moving region from the background and extracting the video foreground information. This method places high demands on the quality of the artificially constructed background, and the choice of threshold is critical: a threshold that is too high or too low prevents moving foreground objects from being extracted completely, so the method presents certain difficulties in operation.
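For concreteness, the frame difference method can be sketched in a few lines of Python; this sketch is illustrative only (the file name and the difference threshold of 25 are assumptions, not taken from the prior art it summarizes):

    import cv2

    cap = cv2.VideoCapture("zebrafish.avi")  # hypothetical input video
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(gray, prev)       # change between adjacent frames
        # Pixels whose gray value changed by more than the threshold are
        # treated as the moving region, i.e. the foreground candidate.
        _, motion = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
        prev = gray
    cap.release()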
Therefore, it is urgently needed to provide a method and a system for extracting video foreground information.
Disclosure of Invention
To overcome the foregoing problems or at least partially solve the foregoing problems, embodiments of the present invention provide a method and system for extracting video foreground information.
In a first aspect, an embodiment of the present invention provides a method for extracting video foreground information, including:
acquiring a target video, and converting a preset number of frames of images in the target video into gray images;
for each pixel position in the target video, determining a pixel gray distribution matrix at the pixel position based on the gray images, determining the interquartile range of all gray values in each pixel gray distribution matrix, and performing normal fitting on the gray values within the interquartile range; determining a static threshold value at the pixel position based on the normal distribution obtained by normal fitting, and determining the pixel category at the pixel position and the time period information at the pixel position based on the static threshold value;
determining a static background value of each pixel position based on the pixel category of each pixel position in the target video and the period information of each pixel position, and extracting foreground information of each frame of image in the target video based on the static background value of each pixel position.
Preferably, the determining the static threshold at the pixel position based on the normal distribution obtained by normal fitting specifically includes:
calculating a first gray value when the cumulative probability value of the normal distribution is a first preset threshold value and a second gray value when the cumulative probability value is a second preset threshold value; the first preset threshold is smaller than the second preset threshold;
determining the static threshold based on the first grayscale value and the second grayscale value.
Preferably, the static threshold specifically includes a first static threshold and a second static threshold, and the first static threshold is smaller than the second static threshold;
correspondingly, the determining the pixel category at the pixel position based on the static threshold specifically includes:
determining a first number of frames of grayscale images at the pixel location having grayscale values above the second static threshold and a second number of frames of grayscale images having grayscale values below the first static threshold;
determining a pixel classification at the pixel location based on the first frame number and the second frame number.
Preferably, the determining the period information at the pixel position based on the static threshold specifically includes:
for each frame of gray level image, determining the gray level value of the gray level image at the pixel position, and if the gray level value of the gray level image at the pixel position is judged to be greater than or equal to the first static threshold value and less than or equal to the second static threshold value, marking the gray level image at the pixel position as 1; otherwise, marking the grayscale image at the pixel location as 0;
the time set formed by all the frame gray level images marked as 1 at the pixel position is the first-class time period at the pixel position, and the time set formed by all the frame gray level images marked as 0 at the pixel position is the second-class time period at the pixel position.
Preferably, the pixel classes at the pixel positions comprise a first class of pixels and a second class of pixels; correspondingly, the determining the static background value of each pixel position based on the pixel category of each pixel position in the target video and the period information of each pixel position specifically includes:
for each pixel position in the target video, if the pixel type at the pixel position is judged and known to be the first type of pixel, determining a static background value of the pixel position based on the minimum value of the gray value at the pixel position in the second type of time period;
otherwise, determining a static background value for the pixel location based on a maximum value of the gray value at the pixel location in the first class period.
Preferably, the ratio of the preset frame number to the total frame number in the target video is less than 0.05.
Preferably, before determining the interquartile range of all gray values in the pixel gray distribution matrix, the method further includes: adding random noise into the pixel gray distribution matrix.
In a second aspect, an embodiment of the present invention provides a video foreground information extraction system, comprising an acquisition module, a determining module and an extraction module, wherein:
the acquisition module is used for acquiring a target video and converting a preset number of frames of images in the target video into gray images;
the determining module is used for, for each pixel position in the target video, determining a pixel gray distribution matrix at the pixel position based on the gray images, determining the interquartile range of all gray values in each pixel gray distribution matrix, and performing normal fitting on the gray values within the interquartile range; determining a static threshold value at the pixel position based on the normal distribution obtained by normal fitting, and determining the pixel category at the pixel position and the time period information at the pixel position based on the static threshold value;
the extraction module is used for determining a static background value of each pixel position based on the pixel category of each pixel position in the target video and the period information of each pixel position, and extracting foreground information of each frame of image in the target video based on the static background value of each pixel position.
In a third aspect, an embodiment of the present invention provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the video foreground information extraction method according to the first aspect when executing the program.
In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the video foreground information extraction method according to the first aspect.
The method and the system for extracting video foreground information provided by the embodiment of the invention require little computation, are simple to implement, and work well on high-frame-rate, low-resolution target videos. In addition, in the embodiment of the invention, the gray-scale change information of each pixel at each pixel position is processed independently, so the method is compatible with accelerated parallel computation, which greatly improves computation speed, and information crosstalk between pixels is reduced. Compared with the frame difference method, the video foreground information extraction method provided by the embodiment of the invention avoids the inaccurate extraction the frame difference method suffers from when the sampling rate is low and the foreground moves fast. Compared with the optical flow method, the method provided by the embodiment of the invention is simpler to compute and takes less computation time. The static background value obtained by the normal-distribution analysis is more accurate than that of simple background subtraction, is less affected by manual judgment, and adapts better to different conditions.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flow chart of a video foreground information extraction method according to an embodiment of the present invention;
fig. 2a is a schematic diagram of a frame of image in a target video in a video foreground information extraction method according to an embodiment of the present invention;
fig. 2b is a schematic diagram of an image after binarization processing in a video foreground information extraction method according to an embodiment of the present invention;
fig. 3a is a schematic diagram of a frame of image in a target video in a video foreground information extraction method according to an embodiment of the present invention;
fig. 3b is a schematic diagram of an image after binarization processing in the method for extracting video foreground information according to the embodiment of the present invention;
fig. 4a is a schematic diagram of a frame of image in a target video in a video foreground information extraction method according to an embodiment of the present invention;
fig. 4b is a schematic diagram of an image after binarization processing in the method for extracting video foreground information according to the embodiment of the present invention;
fig. 5 is a schematic diagram of gray scale changes in a static period and a dynamic period when a pixel type at a pixel position i in the video foreground information extraction method provided in the embodiment of the present invention is a first type pixel;
fig. 6 is a schematic diagram of gray scale changes in a static period and a dynamic period when the pixel type at the pixel position i in the video foreground information extraction method provided in the embodiment of the present invention is the second type of pixel;
fig. 7 is a schematic view of a complete flow of a video foreground information extraction method according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a video foreground information extraction system according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the embodiments of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience in describing the embodiments of the present invention and simplifying the description, but do not indicate or imply that the referred devices or elements must have specific orientations, be configured in specific orientations, and operate, and thus, should not be construed as limiting the embodiments of the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the embodiments of the present invention, it should be noted that, unless explicitly stated or limited otherwise, the terms "mounted," "connected," and "connected" are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. Specific meanings of the above terms in the embodiments of the present invention can be understood in specific cases by those of ordinary skill in the art.
Zebrafish, as a novel model organism, shares high homology with human genes, and its biological structure and physiological functions are highly similar to those of mammals. Meanwhile, zebrafish have the advantages of small size, high transparency, short growth cycle and low feeding cost, and have been widely used as a model organism for disease research. Behavioral analysis of zebrafish is therefore becoming a very important technical means in the biological field. To accurately analyze zebrafish behavior, tail movement information is critical. The zebrafish tail swings fast, and its foreground information is easily confused with the background information during the swing, which makes analyzing the tail movement information very difficult. The embodiment of the invention therefore provides a video foreground information extraction method.
As shown in fig. 1, the method for extracting video foreground information according to the embodiment of the present invention includes:
s1, acquiring a target video, and changing an image with a preset frame number in the target video into a gray image;
s2, for each pixel position in the target video, determining a pixel gray distribution matrix at the pixel position based on the gray image, determining a four-bit distance of all gray values in each pixel gray distribution matrix, and performing normal fitting on the gray values in the four-bit distance; determining a static threshold value at the pixel position based on normal distribution obtained by normal fitting, and determining the pixel category at the pixel position and the time period information at the pixel position based on the static threshold value;
s3, determining a static background value of each pixel position based on the pixel category of each pixel position in the target video and the period information of each pixel position, and extracting foreground information of each frame of image in the target video based on the static background value of each pixel position.
Specifically, in the video foreground information extraction method provided in the embodiment of the present invention, the execution subject is a processor. It may be a local processor, such as that of a computer, a tablet computer or a smartphone, or a cloud processor, which the embodiment of the present invention does not specifically limit.
Step S1 is performed first. The target video D is the video from which foreground information needs to be extracted; the foreground information is the information about the foreground object in the video, as distinct from the background. In general, the foreground object may be a moving object and the background a static background. The target video includes multiple frames of images; the total number F of images it contains may be set as needed, which the embodiment of the present invention does not specifically limit. Each frame of image in the target video may specifically be an RGB color image of size M × N × 3, where M is the number of rows of pixels in each frame, N is the number of columns of pixels in each frame, and 3 is the number of channels, the 3 channels being R, G and B respectively. Converting a preset number P of frames of the target video D into gray images means extracting P frames from the F frames and converting them into gray images of size M × N. The resulting P frames of gray images constitute the test set T. The gray images may specifically be 8-bit grayscale images.
It should be noted that, in the embodiment of the present invention, a P frame image is extracted from an F frame image, and a value of P may be set as needed, where P is required to be less than or equal to F. The extraction operation may be equal-interval extraction or unequal-interval extraction as necessary, and for example, several consecutive frames of images may be extracted from the F-frame image as necessary.
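As an illustrative sketch of step S1 (the helper name, the use of OpenCV and NumPy, and the choice of equal-interval sampling are our assumptions; the embodiment prescribes no particular implementation), the test set T can be built as follows:

    import cv2
    import numpy as np

    def build_test_set(path, P):
        """Extract P frames at equal intervals from the target video D,
        convert each to an 8-bit gray image, and stack them into the
        M x N x P test set T."""
        cap = cv2.VideoCapture(path)
        F = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))   # total frame number F
        frames = []
        for j in np.linspace(0, F - 1, P).astype(int):
            cap.set(cv2.CAP_PROP_POS_FRAMES, int(j))
            ok, frame = cap.read()
            if ok:
                frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
        cap.release()
        return np.stack(frames, axis=-1)             # shape (M, N, P)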
Then, step S2 is executed. Since each frame of image in the target video has a size of M × N, that is, each frame includes M × N pixels, the target video includes M × N pixel positions; each pixel position has P pixels, one from each of the P frames of the preset frame number, and the P pixels belong respectively to the P frame images.
For each pixel position i in the target video, the pixel gray distribution matrix at pixel position i is determined based on the P frames of gray images. The elements of the pixel gray distribution matrix are the gray values of the P pixels at pixel position i, so the matrix has size 1 × P. Pixel positions i correspond one-to-one with pixel gray distribution matrices, that is, each pixel position corresponds to one pixel gray distribution matrix, so the P frames of gray images yield M × N pixel gray distribution matrices of size 1 × P. The process of determining the pixel gray distribution matrix at pixel position i may specifically be: the test set T is combined into an M × N × P gray matrix, in which each element represents the gray value of a pixel at a certain moment, the moment being the one corresponding to the gray image containing that pixel. The gray matrix is divided into M × N matrices of size 1 × P, and each 1 × P matrix constitutes the pixel gray distribution matrix E of the corresponding pixel position, representing the gray value variation of the P pixels at that position.
The pixel gray distribution matrix E contains the gray values of P pixels, that is, P gray values. To determine the interquartile range of all the gray values in the pixel gray distribution matrix, the P gray values are first sorted, in either descending or ascending order. After sorting, the first quartile Q1 and the third quartile Q3 can be calculated, and from the first quartile Q1 and the third quartile Q3 the interquartile range QE = Q3 − Q1 is obtained.
The gray values within the interquartile range QE can then be fitted normally; specifically, a standard normal distribution model may be used to fit the gray values within the interquartile range QE together with their corresponding cumulative probability values. The fitting method may specifically be: align Q1 with the first quartile of the standard normal distribution model and Q3 with the third quartile of the standard normal distribution model, which yields the normal distribution. The static threshold at pixel position i is then determined according to this normal distribution. In the embodiment of the present invention, the static threshold may specifically include a first static threshold TH1 and a second static threshold TH2, with TH1 < TH2. The values of TH1 and TH2 bear on the subsequent determination of the pixel category and the period information, and strictly maintaining the size relationship between TH1 and TH2 makes it possible to analyze the pixel category and the period information later. The required static thresholds can be calculated simply and efficiently by adopting a standard normal distribution model.
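A minimal per-pixel sketch of this fitting step, assuming SciPy's normal distribution utilities (the function name is ours, and the random noise described later in this document is folded in here so the sketch is complete):

    import numpy as np
    from scipy.stats import norm

    Z75 = norm.ppf(0.75)  # third quartile of the standard normal, about 0.6745

    def static_thresholds(E, p_lo=0.001, p_hi=0.999, rng=np.random.default_rng(0)):
        """Fit a normal distribution to the 1 x P gray values E of one pixel
        position by aligning Q1/Q3 with the standard normal's quartiles, then
        return the static thresholds TH1, TH2 at cumulative probabilities
        p_lo and p_hi."""
        E = E + rng.random(E.shape)          # random noise with amplitude in (0, 1)
        q1, q3 = np.percentile(E, [25, 75])  # first and third quartiles
        mu = 0.5 * (q1 + q3)                 # quartiles are symmetric about the mean
        sigma = max((q3 - q1) / (2.0 * Z75), 1e-6)  # IQR fixes the scale; guard 0
        return norm.ppf(p_lo, loc=mu, scale=sigma), norm.ppf(p_hi, loc=mu, scale=sigma)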
The pixel category at pixel position i may specifically include a first class of pixels and a second class of pixels. The first class of pixels may specifically be bright-area pixels, indicating that most of the pixels at pixel position i represent foreground information, and the second class of pixels may specifically be dark-area pixels, indicating that most of the pixels at pixel position i represent background information. The period information at pixel position i may specifically include a first-class period and a second-class period: the first-class period may specifically be a static period, i.e., the pixel at pixel position i represents background information, and the second-class period may specifically be a non-static period, i.e., the pixel at pixel position i represents foreground information. Gray values from different period information are selected as the static background values of the corresponding pixel positions according to the different pixel categories.
Finally, step S3 is performed. The static background value of each pixel position i refers to a background gray value determined according to the pixel category at each pixel position i and the time period information at the pixel position i, and is an equivalent value. The static background values of all M × N pixel positions are combined in the original position, and a static background model B with a size of M × N can be formed. And extracting foreground information of each frame of image in the target video D according to the static background value of each pixel position i, specifically, subtracting the static background model B from each frame of image in the target video D to obtain an image only containing the foreground information, and further obtaining a video only containing the foreground information.
The video foreground information extraction method provided by the embodiment of the invention is in fact a video foreground information extraction method based on the normal distribution; it requires little computation and is simple, and it works particularly well on high-frame-rate, low-resolution target videos. In addition, in the embodiment of the invention, the gray-scale change information of each pixel at each pixel position is processed independently, so the method is compatible with accelerated parallel computation, which greatly improves computation speed, and information crosstalk between pixels is reduced. Compared with the frame difference method, it avoids the inaccurate extraction the frame difference method suffers from when the sampling rate is low and the foreground moves fast. Compared with the optical flow method, it is simpler to compute and takes less computation time. The static background value obtained by the normal-distribution analysis is more accurate than that of simple background subtraction, is less affected by manual judgment, and adapts better to different conditions.
On the basis of the above embodiment, in the embodiment of the present invention, after subtracting the static background model B from each frame of image in the target video D to obtain an image containing only foreground information, the method further includes: and carrying out binarization processing on the image only containing the foreground information to obtain the image with highlighted foreground information. For example, as shown in fig. 2a, fig. 3a, and fig. 4a, each of the images is a frame image in the target video, and after the video foreground information is extracted by the video foreground information extraction method provided in the embodiment of the present invention, the binarized image shown in fig. 2b, fig. 3b, and fig. 4b is obtained.
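A sketch of this subtraction-and-binarization step (cv2.absdiff is used here as a sign-safe stand-in for the subtraction, and the binarization threshold of 30 is an assumed value, not one taken from the embodiment):

    import cv2
    import numpy as np

    def foreground_frames(frames, B, bin_thresh=30):
        """Subtract the M x N static background model B from each gray frame
        of the target video and binarize the result to highlight the
        foreground information."""
        out = []
        for frame in frames:                              # frame: M x N gray image
            fg = cv2.absdiff(frame, B.astype(np.uint8))   # remove static background
            _, fg_bin = cv2.threshold(fg, bin_thresh, 255, cv2.THRESH_BINARY)
            out.append(fg_bin)
        return out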
On the basis of the foregoing embodiment, the determining the static threshold at the pixel position based on the normal distribution obtained by normal fitting specifically includes:
calculating a first gray value when the cumulative probability value of the normal distribution is a first preset threshold value and a second gray value when the cumulative probability value is a second preset threshold value; the first preset threshold is smaller than the second preset threshold;
determining the static threshold based on the first grayscale value and the second grayscale value.
Specifically, in the embodiment of the present invention, when determining the static threshold at pixel position i according to the normal distribution, the first gray value at which the cumulative probability value of the normal distribution equals a first preset threshold, and the second gray value at which it equals a second preset threshold, are calculated first. The first preset threshold is smaller than the second preset threshold, and their specific values may be set as required; for example, the first preset threshold may be set to a value below 10%, and the second preset threshold to a value above 90%. Preferably, the first preset threshold is 0.1% and the second preset threshold is 99.9%. The static threshold is then determined from the first gray value and the second gray value. Specifically, the first gray value may be taken as the first static threshold TH1 at pixel position i, and the second gray value as the second static threshold TH2 at pixel position i.
In the embodiment of the invention, the first static threshold and the second static threshold are determined by setting the first preset threshold and the second preset threshold, so that the selection of the first static threshold and the second static threshold is more flexible, and the applicability of the method to different videos is improved.
On the basis of the above embodiment, the first static threshold and the second static threshold are also related to the extraction manner of the P-frame image, and the extraction manner of the P-frame image is different, which results in that the finally determined first static threshold and the second static threshold are different. Therefore, the applicability of the method to different videos can be improved by changing the extraction mode of the P frame image.
On the basis of the above embodiment, the static threshold specifically includes a first static threshold and a second static threshold, and the first static threshold is smaller than the second static threshold;
correspondingly, the determining the pixel category at the pixel position based on the static threshold specifically includes:
determining a first number of frames of grayscale images at the pixel location having grayscale values above the second static threshold and a second number of frames of grayscale images having grayscale values below the first static threshold;
determining a pixel classification at the pixel location based on the first frame number and the second frame number.
Specifically, in the embodiment of the present invention, when determining the pixel category at pixel position i according to the static threshold, the first frame number NL of gray images whose gray value at pixel position i is higher than the second static threshold TH2, and the second frame number ND of gray images whose gray value is lower than the first static threshold TH1, are determined first. Then, the pixel category at pixel position i is determined based on the first frame number NL and the second frame number ND. Specifically, the pixel category may be determined from the relative sizes of the first frame number NL and the second frame number ND, dividing the pixels at pixel position i into two classes, dark-area pixels and bright-area pixels. If the first frame number NL is greater than or equal to the second frame number ND, the pixel category at pixel position i is determined to be the first class of pixels, namely bright-area pixels; otherwise, if NL is smaller than ND, the pixel category at pixel position i is determined to be the second class of pixels, namely dark-area pixels.
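The classification rule can be written directly from the two frame counts; a sketch under the same assumptions as above (names are ours):

    import numpy as np

    def pixel_class(E, th1, th2):
        """Classify one pixel position: first class ('bright') when the number
        of frames above TH2 (NL) is at least the number below TH1 (ND)."""
        NL = int(np.sum(E > th2))   # first frame number
        ND = int(np.sum(E < th1))   # second frame number
        return "bright" if NL >= ND else "dark"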
In the embodiment of the invention, the pixel category at each pixel position is determined according to the size relationship between the first frame number and the second frame number, so that different types of pixels can be accurately distinguished, and different static background values are selected for the different types of pixels.
On the basis of the foregoing embodiment, the determining, based on the static threshold, the period information at the pixel position specifically includes:
for each frame of gray level image, determining the gray level value of the gray level image at the pixel position, and if the gray level value of the gray level image at the pixel position is judged to be greater than or equal to the first static threshold value and less than or equal to the second static threshold value, marking the gray level image at the pixel position as 1; otherwise, marking the grayscale image at the pixel location as 0;
the time set formed by all the frame gray level images marked as 1 at the pixel position is the first-class time period at the pixel position, and the time set formed by all the frame gray level images marked as 0 at the pixel position is the second-class time period at the pixel position.
Specifically, in the embodiment of the present invention, the static period Sa may be defined as:

Sa(j) = 1 if TH1 < E(j) < TH2, and Sa(j) = 0 otherwise,

where j indexes the frames of the test set T at pixel position i, i.e., the jth frame image in the test set T; Sa(j) represents the mark value of the jth frame image at pixel position i; and E(j) represents the gray value of the pixel of the jth frame image at pixel position i. When E(j) satisfies TH1 < E(j) < TH2, Sa(j) is set to 1, i.e., the jth frame image is marked as 1 at pixel position i; otherwise the jth frame image is marked as 0 at pixel position i. Finally, the time set formed by all frame images marked as 1 at pixel position i is taken as the first-class period, namely the static period, at pixel position i, and the time set formed by all frame images marked as 0 at pixel position i is taken as the second-class period, i.e., the non-static period, at pixel position i, which can be understood as a dynamic period.
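Vectorized, the marking rule is a one-liner (a sketch using the strict inequalities of the formula above):

    import numpy as np

    def mark_static_period(E, th1, th2):
        """Sa(j) = 1 when TH1 < E(j) < TH2 (static period), 0 otherwise."""
        return ((E > th1) & (E < th2)).astype(np.uint8)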
In the embodiment of the invention, different time periods can be more quickly distinguished by marking a certain pixel position in each frame of image and further determining the first type time period and the second type time period according to the marking value.
On the basis of the above embodiment, the pixel category at the pixel position includes a first type of pixel and a second type of pixel;
correspondingly, the determining the static background value of each pixel position based on the pixel category of each pixel position in the target video and the period information of each pixel position specifically includes:
for each pixel position in the target video, if the pixel type at the pixel position is judged and known to be the first type of pixel, determining a static background value of the pixel position based on the minimum value of the gray value at the pixel position in the second type of time period;
otherwise, determining a static background value for the pixel location based on a maximum value of the gray value at the pixel location in the first class period.
Specifically, in the embodiment of the present invention, when determining the static background value of each pixel position i: for each pixel position i, if the pixel category at pixel position i is the first class of pixels (i.e., bright-area pixels), the static background value of pixel position i is determined from the minimum gray value at pixel position i in the second-class period; that is, the minimum gray value at pixel position i in the second-class period is taken as the static background value of pixel position i. This is because the gray value of a first-class pixel is very high while the foreground object covers it and drops when the foreground object moves away, so the gray value of the background itself is low; the low gray values revealed during the non-static, i.e., dynamic, period therefore correspond to the background, and the minimum gray value at pixel position i in the second-class period is taken as the static background value. As shown in fig. 5, where the static period is 51 and the dynamic period is 52, when the pixel category at pixel position i is the first class of pixels (i.e., bright-area pixels), the minimum value over the dynamic period is selected as the static background value.
If the pixel category at pixel position i is the second class of pixels (i.e., dark-area pixels), the static background value of pixel position i is determined from the maximum gray value at pixel position i in the first-class period; that is, the maximum gray value at pixel position i in the first-class period is taken as the static background value of pixel position i. This is because a second-class pixel has a low gray value most of the time and its gray value increases when the foreground object moves across it, so the gray value of the background itself is low; for such a low-gray background, the maximum gray value at pixel position i in the first-class period, i.e., the static period, is taken as the static background value. As shown in fig. 6, where the dynamic period is 61 and the static period is 62, when the pixel category at pixel position i is the second class of pixels, the maximum value over the static period is selected as the static background value.
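Combining the marking and classification sketches above, the static background value of one pixel position can be selected as follows (a sketch; the fallback for an empty period is our own guard and is not specified by the embodiment):

    import numpy as np

    def static_background_value(E, Sa, cls):
        """Bright (first-class) pixels take the minimum gray value over the
        non-static period; dark (second-class) pixels take the maximum gray
        value over the static period."""
        if cls == "bright":
            dynamic = E[Sa == 0]                 # second-class (dynamic) period
            return dynamic.min() if dynamic.size else E.min()
        static = E[Sa == 1]                      # first-class (static) period
        return static.max() if static.size else E.max()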
In the embodiment of the invention, by determining the static background value of each pixel position from the minimum gray value at that position in the second-class period or the maximum gray value at that position in the first-class period, the background information can be subtracted to the greatest extent while the foreground information is retained.
On the basis of the above embodiment, the ratio of the preset frame number to the total frame number in the target video is less than 0.05.
Specifically, if the preset frame number is P and the total frame number in the target video is F, P/F is less than 0.05. Therefore, the sparsity of the test set T can be enhanced, the foreground information is highlighted, the extraction of the foreground information is facilitated, meanwhile, the calculation amount can be greatly reduced, and the calculation speed is increased.
On the basis of the foregoing embodiment, before determining the interquartile range of all gray values in the pixel gray distribution matrix, the method further includes: adding random noise into the pixel gray distribution matrix.
Specifically, in the embodiment of the invention, random noise R is added into the pixel gray distribution matrix E, and the amplitude of the random noise R is distributed between 0 and 1, so that the fitting adaptability of pixels with weak or unchanged brightness can be improved, the adaptability of a static threshold to different images is enhanced, and the robustness of the video foreground information extraction method is improved.
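For example (a toy sketch; we read "amplitude distributed between 0 and 1" as uniform noise in [0, 1), and the seed is an assumption for reproducibility):

    import numpy as np

    rng = np.random.default_rng(0)                 # assumed seed
    E = np.array([120.0, 121.0, 119.0, 200.0])     # toy 1 x P pixel gray distribution
    E_noisy = E + rng.random(E.shape)              # uniform noise in [0, 1) added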
On the basis of the above embodiments, fig. 7 shows a complete flow diagram of the video foreground information extraction method provided in the embodiment of the present invention.
(1) Reading a target video D, wherein the target video D is composed of an F-frame image sequence, and each frame image is an M × N × 3 RGB color image.
(2) P frames of images are extracted at equal intervals from the F frames of the target video D, where P/F < 0.05.
(3) Each frame image is converted into an 8-bit gray image of size M × N pixels.
(4) The P frames of gray images form a test set T of size M × N × P.
(5) The test set T data is an M × N × P gray matrix, and each element represents a gray value of each pixel at a certain time. The test set T is divided into M × N1 × P matrices, and each 1 × P matrix constitutes a pixel grayscale distribution matrix E corresponding to a pixel position.
(6) A pixel position i is taken, and random noise R is added to its pixel gray distribution matrix Ei to obtain the noise-added pixel gray distribution matrix Ei; the amplitude of the random noise R is distributed between 0 and 1.
(7) After all elements of the pixel gray distribution matrix Ei are sorted in ascending order, its first quartile Q1 and third quartile Q3 are calculated, giving the interquartile range QE = Q3 − Q1.
(8) The gray values of the pixel gray distribution matrix Ei within the interquartile range QE and their corresponding cumulative probability values are fitted normally, the model used being a standard normal distribution model. According to the first quartile Q1 and the third quartile Q3 and their corresponding cumulative probability values, Q1 is aligned with the first quartile of the standard normal distribution and Q3 with the third quartile of the standard normal distribution. From the fitting result, i.e., the normal distribution, the gray values at cumulative probabilities of 0.1% and 99.9% are calculated and defined as TH1 and TH2 respectively, serving as the static thresholds for the static period Sa of the pixel brightness distribution.
(9) The pixel gray distribution matrix Ei is compared against the static thresholds TH1 and TH2, and the variation trend of the gray values is judged: the number of frames whose gray value at pixel position i is higher than the second static threshold TH2 is NL, and the number of frames whose gray value is lower than the first static threshold TH1 is ND. The pixel category is judged from the relative sizes of NL and ND, dividing the pixels into two classes, dark-area pixels and bright-area pixels: if NL ≥ ND, the pixel at pixel position i is a bright-area pixel; otherwise it is a dark-area pixel. Consistent with the embodiments and claims above, the static background value of a bright-area pixel is the minimum value over the non-static period, i.e., the dynamic period, and the static background value of a dark-area pixel is the maximum value over the static period.
(10) If the static background values of all pixel positions have been calculated, the static background values of all pixel positions are finally combined at their original positions to form a static background model B of size M × N; otherwise, the flow returns to step (6) to calculate the static background value of the next pixel position.
(11) The static background model B is subtracted from the target video D frame by frame to obtain each frame image containing only the foreground information, and each frame image is then binarized to obtain the highlighted foreground-information images.
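Putting steps (6) through (10) together, a compact vectorized sketch (NumPy/SciPy; all names are our own, and the logic mirrors the per-pixel sketches given earlier) computes the static background model B from the test set T in one pass:

    import numpy as np
    from scipy.stats import norm

    def static_background_model(T, p_lo=0.001, p_hi=0.999, seed=0):
        """T is the M x N x P test set; returns the M x N static background
        model B per steps (6)-(10)."""
        rng = np.random.default_rng(seed)
        E = T.astype(np.float64) + rng.random(T.shape)        # step (6): noise
        q1 = np.percentile(E, 25, axis=2)                     # step (7): Q1
        q3 = np.percentile(E, 75, axis=2)                     #           Q3
        mu = 0.5 * (q1 + q3)                                  # step (8): fit
        sigma = np.maximum((q3 - q1) / (2 * norm.ppf(0.75)), 1e-6)
        th1 = mu + sigma * norm.ppf(p_lo)                     # TH1 at 0.1%
        th2 = mu + sigma * norm.ppf(p_hi)                     # TH2 at 99.9%
        t1, t2 = th1[..., None], th2[..., None]
        sa = (E > t1) & (E < t2)                              # static period Sa
        bright = (E > t2).sum(axis=2) >= (E < t1).sum(axis=2) # step (9): NL >= ND
        dyn_min = np.where(~sa, E, np.inf).min(axis=2)        # min over dynamic period
        sta_max = np.where(sa, E, -np.inf).max(axis=2)        # max over static period
        B = np.where(bright, dyn_min, sta_max)                # step (10): assemble B
        return np.clip(B, 0, 255)                             # guard empty periods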
As shown in fig. 8, on the basis of the above embodiments, an embodiment of the present invention provides a video foreground information extraction system, including: an acquisition module 81, a determining module 82 and an extraction module 83, wherein:
the acquiring module 81 is configured to acquire a target video and convert an image with a preset frame number in the target video into a grayscale image;
the determining module 82 is configured to, for each pixel position in the target video, determine a pixel gray distribution matrix at the pixel position based on the gray images, determine the interquartile range of all gray values in each pixel gray distribution matrix, and perform normal fitting on the gray values within the interquartile range; determine a static threshold value at the pixel position based on the normal distribution obtained by normal fitting, and determine the pixel category at the pixel position and the time period information at the pixel position based on the static threshold value;
the extracting module 83 is configured to determine a static background value of each pixel position based on the pixel category at each pixel position in the target video and the period information at each pixel position, and extract foreground information of each frame of image in the target video based on the static background value of each pixel position.
Specifically, the functions of the modules in the video foreground information extraction system provided in the embodiment of the present invention correspond to the operation flows of the steps in the method embodiments one to one, and the implementation effect is also consistent.
As shown in fig. 9, on the basis of the above embodiments, an embodiment of the present invention provides an electronic device, including: a processor 901, a memory 902, a communication interface (Communications Interface) 903 and a communication bus 904, wherein:
the processor 901, the memory 902 and the communication interface 903 are communicated with each other through a communication bus 904. The memory 902 stores program instructions executable by the processor 901, and the processor 901 is configured to call the program instructions in the memory 902 to execute the video foreground information extraction method provided by the above-mentioned embodiments of the methods.
It should be noted that, when being implemented specifically, the electronic device in this embodiment may be a server, a PC, or another device, as long as the structure includes a processor 901, a communication interface 903, a memory 902, and a communication bus 904 shown in fig. 9, where the processor 901, the communication interface 903, and the memory 902 complete mutual communication through the communication bus 904, and the processor 901 may call a logic instruction in the memory 902 to execute the above method. The embodiment does not limit the specific implementation form of the electronic device.
The logic instructions in memory 902 may be implemented in software functional units and stored in a computer readable storage medium when sold or used as a stand-alone article of manufacture. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Further, the present invention discloses a computer program product, which includes a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to execute the video foreground information extraction method provided by the above method embodiments.
On the basis of the foregoing embodiments, the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program is implemented to execute the video foreground information extraction method provided by the foregoing embodiments when executed by a processor.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A method for extracting video foreground information is characterized by comprising the following steps:
acquiring a target video, and converting a preset number of frames of images in the target video into gray images;
for each pixel position in the target video, determining a pixel gray distribution matrix at the pixel position based on the gray images, determining the interquartile range of all gray values in each pixel gray distribution matrix, and performing normal fitting on the gray values within the interquartile range; determining a static threshold value at the pixel position based on the normal distribution obtained by normal fitting, and determining the pixel category at the pixel position and the time period information at the pixel position based on the static threshold value; the pixel category comprises a first class of pixels and a second class of pixels, wherein the first class of pixels are bright-area pixels and the second class of pixels are dark-area pixels;
determining a static background value of each pixel position based on the pixel category of each pixel position in the target video and the time period information of each pixel position, and extracting foreground information of each frame of image in the target video based on the static background value of each pixel position;
the static threshold specifically includes a first static threshold and a second static threshold, and the first static threshold is smaller than the second static threshold;
correspondingly, the determining the pixel category at the pixel position based on the static threshold specifically includes:
determining a first number of frames of grayscale images at the pixel location having grayscale values above the second static threshold and a second number of frames of grayscale images having grayscale values below the first static threshold;
determining a pixel classification at the pixel location based on the first frame number and the second frame number.
2. The method according to claim 1, wherein the determining the static threshold at the pixel position based on a normal distribution obtained by normal fitting specifically includes:
calculating a first gray value when the cumulative probability value of the normal distribution is a first preset threshold value and a second gray value when the cumulative probability value is a second preset threshold value; the first preset threshold is smaller than the second preset threshold;
determining the static threshold based on the first grayscale value and the second grayscale value.
3. The method according to claim 1, wherein the determining the period information at the pixel position based on the static threshold specifically includes:
for each frame of gray level image, determining the gray level value of the gray level image at the pixel position, and if the gray level value of the gray level image at the pixel position is judged to be greater than or equal to the first static threshold value and less than or equal to the second static threshold value, marking the gray level image at the pixel position as 1; otherwise, marking the grayscale image at the pixel location as 0;
the time set formed by all the frame gray level images marked as 1 at the pixel position is the first-class time period at the pixel position, and the time set formed by all the frame gray level images marked as 0 at the pixel position is the second-class time period at the pixel position.
4. The method according to claim 3, wherein the pixel classes at the pixel position comprise a first class of pixels and a second class of pixels;
correspondingly, the determining the static background value of each pixel position based on the pixel category of each pixel position in the target video and the period information of each pixel position specifically includes:
for each pixel position in the target video, if the pixel category at the pixel position is determined to be the first type of pixels, determining the static background value of the pixel position based on the minimum gray value at the pixel position in the second-class time period;
otherwise, determining the static background value of the pixel position based on the maximum gray value at the pixel position in the first-class time period.
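Claim 4 then reduces to a per-pixel min/max lookup; a sketch under the same assumptions as the previous blocks (both time periods assumed non-empty):

    def static_background(gray_series, pixel_category,
                          first_class_period, second_class_period):
        # Bright-area (first-type) pixels: darkest value seen while the
        # pixel was outside [t1, t2]; dark-area pixels: brightest value
        # seen while the pixel stayed inside [t1, t2].
        if pixel_category == 'bright':
            return gray_series[second_class_period].min()
        return gray_series[first_class_period].max()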
5. The method according to any one of claims 1 to 4, wherein a ratio of the preset frame number to a total frame number in the target video is less than 0.05.
6. The method according to any one of claims 1 to 4, wherein before determining the interquartile range of all gray values in the pixel gray distribution matrix, the method further comprises: adding random noise to the pixel gray distribution matrix.
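The noise of claim 6 plausibly serves to break ties so that quantiles of near-constant pixel series are well defined; a sketch assuming sub-quantization uniform jitter (the claim discloses neither the noise distribution nor its amplitude):

    import numpy as np

    gray_series = np.array([100, 100, 101, 100, 99, 100], dtype=np.uint8)  # example pixel series
    noisy_series = gray_series.astype(np.float64) \
                   + np.random.uniform(-0.5, 0.5, size=gray_series.shape)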
7. A video foreground information extraction system, comprising:
an acquisition module, configured to acquire a target video and convert images of a preset frame number in the target video into grayscale images;
a determining module, configured to: for each pixel position in the target video, determine a pixel gray distribution matrix at the pixel position based on the grayscale images, determine the interquartile range of all gray values in each pixel gray distribution matrix, and perform normal fitting on the gray values within the interquartile range; determine a static threshold at the pixel position based on the normal distribution obtained by the normal fitting, and determine the pixel category at the pixel position and the time period information at the pixel position based on the static threshold; the pixel category comprises a first type of pixels and a second type of pixels, wherein the first type of pixels are bright-area pixels and the second type of pixels are dark-area pixels;
an extraction module, configured to determine a static background value of each pixel position based on the pixel category at each pixel position in the target video and the time period information at each pixel position, and extract foreground information of each frame of image in the target video based on the static background value of each pixel position;
the static threshold specifically includes a first static threshold and a second static threshold, and the first static threshold is smaller than the second static threshold;
correspondingly, the determining module is further configured to:
determining a first frame number of grayscale images whose gray values at the pixel position are higher than the second static threshold, and a second frame number of grayscale images whose gray values are lower than the first static threshold;
determining the pixel category at the pixel position based on the first frame number and the second frame number.
8. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the video foreground information extraction method according to any one of claims 1 to 6 when executing the program.
9. A non-transitory computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the video foreground information extraction method according to any one of claims 1 to 6.
CN202010506381.6A 2020-06-05 2020-06-05 Video foreground information extraction method and system Active CN111612811B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010506381.6A CN111612811B (en) 2020-06-05 2020-06-05 Video foreground information extraction method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010506381.6A CN111612811B (en) 2020-06-05 2020-06-05 Video foreground information extraction method and system

Publications (2)

Publication Number Publication Date
CN111612811A CN111612811A (en) 2020-09-01
CN111612811B CN111612811B (en) 2021-02-19

Family

ID=72205677

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010506381.6A Active CN111612811B (en) 2020-06-05 2020-06-05 Video foreground information extraction method and system

Country Status (1)

Country Link
CN (1) CN111612811B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295716A * 2016-08-23 2017-01-04 广东工业大学 Traffic moving object classification method and device based on video information
CN107833241A * 2017-10-20 2018-03-23 东华大学 Real-time visual object detection method robust to ambient lighting changes
CN109993767B (en) * 2017-12-28 2021-10-12 北京京东尚科信息技术有限公司 Image processing method and system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458816A * 2008-12-19 2009-06-17 西安电子科技大学 Target matching method in digital video target tracking
CN103325112A * 2013-06-07 2013-09-25 中国民航大学 Fast detection method for moving objects in dynamic scenes
CN103473561A * 2013-09-09 2013-12-25 南京理工大学 Adaptive hyperspectral classification method based on Gaussian distribution
CN104268874A * 2014-09-26 2015-01-07 中国民航科学技术研究院 Non-coherent radar image background modeling method based on normal distribution function

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Non-parametric Model for Background Subtraction; Elgammal A. et al.; Proceedings of the 6th European Conference on Computer Vision; 20001231; pp. 1-17 *
A New Kirsch Edge Detection Algorithm; Li Dongyue et al.; Chinese Journal of Medical Physics; 20170731; Vol. 34, No. 7; Section 1.2.4 *
Moving Object Detection Based on Background Modeling; Wang Hongqun, Sun Hongwei; Computer Knowledge and Technology; 20080831; Vol. 3, No. 4; Section 1 *

Also Published As

Publication number Publication date
CN111612811A (en) 2020-09-01

Similar Documents

Publication Publication Date Title
CN108229526B (en) Network training method, network training device, image processing method, image processing device, storage medium and electronic equipment
CN110516201B (en) Image processing method, image processing device, electronic equipment and storage medium
CN108154105B (en) Underwater biological detection and identification method and device, server and terminal equipment
CN109389129B (en) Image processing method, electronic device and storage medium
CN109886335B (en) Classification model training method and device
CN107871316B (en) Automatic X-ray film hand bone interest area extraction method based on deep neural network
CN110517246B (en) Image processing method and device, electronic equipment and storage medium
US20210118144A1 (en) Image processing method, electronic device, and storage medium
CN110738160A (en) human face quality evaluation method combining with human face detection
CN105427275B Method and device for counting wheat heads in a crop field environment
CN108229300B (en) Video classification method and device, computer-readable storage medium and electronic equipment
CN113298023B (en) Insect dynamic behavior identification method based on deep learning and image technology
CN112926652B (en) Fish fine granularity image recognition method based on deep learning
CN110703215B (en) Airborne SAR imaging quality evaluation method based on support vector machine
CN110706196B (en) Clustering perception-based no-reference tone mapping image quality evaluation algorithm
CN109189965A (en) Pictograph search method and system
CN110874835A (en) Crop leaf disease resistance identification method and system, electronic equipment and storage medium
CN114155241A (en) Foreign matter detection method and device and electronic equipment
CN111612811B (en) Video foreground information extraction method and system
CN112749696A (en) Text detection method and device
CN113177397A (en) Table adjusting method, device, equipment and storage medium
CN110782392B (en) Image processing method, device, electronic equipment and storage medium
CN109978916B (en) Vibe moving target detection method based on gray level image feature matching
CN116912674A (en) Target detection method and system based on improved YOLOv5s network model under complex water environment
CN111626186A (en) Driver distraction detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant