WO2021136001A1 - Codebook principle-based efficient video moving object detection method - Google Patents

Codebook principle-based efficient video moving object detection method

Info

Publication number
WO2021136001A1
WO2021136001A1 (PCT/CN2020/137988, CN2020137988W)
Authority
WO
WIPO (PCT)
Prior art keywords
pixel
image
histogram
channel
codebook
Prior art date
Application number
PCT/CN2020/137988
Other languages
French (fr)
Chinese (zh)
Inventor
许野平
井焜
刘辰飞
陈英鹏
朱爱红
Original Assignee
神思电子技术股份有限公司
Priority date
Filing date
Publication date
Application filed by 神思电子技术股份有限公司
Publication of WO2021136001A1 publication Critical patent/WO2021136001A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image

Definitions

  • The invention belongs to the field of machine vision and particularly relates to an efficient video moving target detection method based on the Codebook principle.
  • The Codebook moving target detection method can effectively overcome video background interference.
  • Its main disadvantages are: (1) as the video picture changes, memory must be frequently allocated and released, and on unattended equipment memory reclamation affects system reliability and real-time performance; (2) when the video background drifts because of factors such as lighting, the Codebook method gradually fails, the background information has to be relearned, and moving targets cannot be detected during that period; (3) the Codebook method is slow, which makes it hard to run on low-end hardware.
  • "Image processing method, device and computer-readable storage medium" (Publication No. 109427067A) discloses an image processing method comprising: establishing a Codebook model in RGB space based on the codebook algorithm; using the established Codebook model to detect whether each pixel of the image under test belongs to the foreground or the background, obtaining a detection result; using a belief propagation algorithm to sum the message values passed to each pixel from multiple neighborhood directions and normalizing the sum to obtain a probability value, where a message value characterizes the continuity between a pixel and its neighboring pixels; and using the probability value to correct the detection result.
  • That invention can reduce the noise of the Codebook method and improve detection accuracy; neither the problem it solves nor the method it adopts is the same as in the present invention.
  • "Monitoring area intrusion method based on multi-layer Codebook" discloses a monitoring area intrusion method based on a multi-layer Codebook. The video image used for background modeling serves as a temporary background model. When the training time reaches a given value Tm, the eight-neighborhoods of the background pixels in the temporary background model are searched to form connected domains; when a connected domain satisfies the area threshold Sm and the access frequency Fm, all pixels of the connected domain are added to the permanent background model and deleted from the temporary background model. Each pixel of the image under test is then looked up in the permanent background model; if no corresponding pixel exists, the pixel is determined to be foreground.
  • That invention can effectively prevent isolated noise from being added to the permanent background model and effectively handles false alarms caused by sudden light changes from lightning and train headlights. It can reduce the noise of the Codebook method and improve detection accuracy; neither the problem it solves nor the method it adopts is the same as in the present invention.
  • "Multi-level dictionary set-based no-reference image quality evaluation method" (Application No. 201610273831.5) discloses a no-reference quality evaluation method based on multi-level dictionary coding, which mainly addresses the mismatch between computer evaluation of noisy images and human visual perception.
  • Its implementation steps are: 1. divide the image database; 2. extract the feature vector of a single experimental sample; 3. compute the feature-vector quality value of one degraded image in the training set; 4. compute the feature vectors of all training samples; 5. compute the quality values of the feature vectors of all degraded images in the training set; 6. build the first-level dictionary set from the feature vectors of the training-set reference images; 7. build the second-level dictionary set from the feature vectors of the training-set degraded images; 8. compute the quality value of each cluster center in the second-level dictionary set; 9. project the test sample onto the second-level dictionary set to compute its quality value; 10. judge sample quality from that quality value.
  • The evaluation results of that invention agree with human perception and can be used for image screening, transmission and compression on the Internet. It can reduce the noise of the Codebook method and improve detection accuracy; neither the problem it solves nor the method it adopts is the same as in the present invention.
  • "A foreground detection method fusing superpixels and background models" (Publication No. 105825234A) discloses a foreground detection method that fuses superpixels with a Codebook background model. Superpixel segmentation groups the pixels of the video image into superpixel blocks, and a Codebook background model is built only for the clustering center of each block, so no separate model is needed for every pixel, which effectively saves the memory required by the background model.
  • In the foreground detection stage only the clustering centers are examined, which greatly shortens detection time and meets the requirements of a real-time monitoring platform. Because it detects only clustering centers, however, that invention increases the possibility of missed targets and lowers the detection accuracy of the Codebook method.
  • The present invention provides an efficient video moving target detection method based on the Codebook principle, which mainly solves the following problems: (1) how to use memory of a fixed size, avoiding frequent allocation and release and eliminating the system lag caused by memory management; (2) how to cope with background model failure caused by gradual lighting changes over time, so that the device can work continuously for long periods without relearning the background; (3) how to simplify the calculation of the Codebook method and increase its running speed.
  • The invention discloses an efficient video moving target detection method based on the Codebook principle, comprising:
  • A video frame is composed of pixels, and each pixel is composed of several channel components; each channel of each pixel has a histogram of fixed size. For an image of resolution W x L with C channels per pixel, a statistical histogram H[W][L][C][D] is constructed and initialized to 0, where W is the image width, L is the image height, C is the number of channels per pixel, and D is the total number of channel brightness levels.
  • When the pixel histograms of a new image frame are updated, an increment factor serves as the histogram increment unit; specifically: H[x][y][c][d] = H[x][y][c][d] + T, where (x, y) is the coordinate of the pixel in the image, c is the channel number of the pixel, d is the brightness value of pixel (x, y) in channel c, T is the brightness increment factor, and R is the forgetting factor. T may be initialized to a small real value; R is determined by the desired forgetting speed: if the weight that the current image contributes to the histogram should drop to 1/m after n frames, then R^n = m, i.e. R = m^(1/n).
  • The codebook data structure of the Codebook method is replaced with a histogram of fixed memory size, which avoids frequent memory allocation and release operations.
  • The present invention adopts a 16x3-dimensional histogram structure per pixel, achieving direct addressing at low cost and higher running efficiency.
  • Figure 1 is a schematic flow diagram of the present invention.
  • The device hardware is a PC running Windows 7. The PC is connected to a network camera through a network cable, and the camera's video stream uses the H.264 encoding format.
  • A video frame is composed of pixels, and each pixel is composed of several channel components; each channel of each pixel has a histogram of fixed size.
  • A VGA-resolution black-and-white video image, such as that of a thermal imaging camera, has a resolution of 640x480 pixels; each pixel consists of a single brightness channel whose component value usually ranges from 0 to 255.
  • A color high-definition video image has a resolution of 1920x1080 pixels; each pixel consists of red, green and blue primary color channels, each of whose component values usually ranges from 0 to 255.
  • For the VGA black-and-white signal the histogram structure is H[640][480][1][256], which can be simplified to H[640][480][256]: image width 640 pixels, image height 480 pixels, 1 pixel channel, 256 channel brightness levels.
  • For the color high-definition signal the histogram structure is H[1920][1080][3][256]: image width 1920 pixels, image height 1080 pixels, 3 pixel channels, 256 channel brightness levels.
  • Each time a frame is received, the brightness value of every channel of every pixel is accumulated into the corresponding histogram cell: H[x][y][c][d] = H[x][y][c][d] + T, followed by T = T * R, where (x, y) is the coordinate of the pixel in the image, c is the channel number of the pixel, d is the brightness value of pixel (x, y) in channel c, T is the brightness accumulation factor, and R is the forgetting factor.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A Codebook principle-based efficient video moving object detection method, mainly intended to eliminate the system delay caused by memory management, simplify the calculation of the Codebook method, and increase its running speed. The method comprises: acquiring video frames in real time from a video signal source, where a video frame consists of pixels, each pixel consists of several channel components, and each channel of each pixel has a histogram of fixed size; when the pixel histograms of a new image frame are updated, using an increment factor as the histogram increment unit; for each newly received image frame, determining whether each pixel of the image belongs to the foreground or the background; and, before the next image frame is received, multiplying the increment factor by a forgetting factor, i.e. T = T*R. During detection the histograms are compared directly, which is more efficient.

Description

An efficient video moving target detection method based on the Codebook principle

Technical Field

The invention belongs to the field of machine vision and particularly relates to an efficient video moving target detection method based on the Codebook principle.

Background Art

The Codebook moving target detection method can effectively overcome video background interference. Its main disadvantages are: (1) as the video picture changes, memory must be frequently allocated and released, and on unattended equipment memory reclamation affects system reliability and real-time performance; (2) when the video background drifts gradually because of factors such as lighting, the Codebook method gradually fails, the background information has to be relearned, and moving targets cannot be detected during that period; (3) the Codebook method is slow, which makes it hard to run on low-end hardware.
"Image processing method, device and computer-readable storage medium" (Publication No. 109427067A) discloses an image processing method comprising: establishing a Codebook model in RGB space based on the codebook algorithm; using the established Codebook model to detect whether each pixel of the image under test belongs to the foreground or the background, obtaining a detection result; using a belief propagation algorithm to sum the message values passed to each pixel from multiple neighborhood directions and normalizing the sum to obtain a probability value, where a message value characterizes the continuity between a pixel and its neighboring pixels; and using the probability value to correct the detection result. That invention can reduce the noise of the Codebook method and improve detection accuracy; neither the problem it solves nor the method it adopts is the same as in the present invention.
"Monitoring area intrusion method based on multi-layer Codebook" (Publication No. 107341816A) discloses a monitoring area intrusion method based on a multi-layer Codebook. The video image used for background modeling serves as a temporary background model. When the training time reaches a given value Tm, the eight-neighborhoods of the background pixels in the temporary background model are searched to form connected domains; when a connected domain satisfies the area threshold Sm and the access frequency Fm, all pixels of the connected domain are added to the permanent background model and deleted from the temporary background model. Each pixel of the image under test is then looked up in the permanent background model; if no corresponding pixel exists, the pixel is determined to be foreground. That invention can effectively prevent isolated noise from being added to the permanent background model and effectively handles false alarms caused by sudden light changes from lightning and train headlights. It can reduce the noise of the Codebook method and improve detection accuracy; neither the problem it solves nor the method it adopts is the same as in the present invention.
"An image processing method based on improved Codebook foreground detection" (Application No. 201610452894.7) discloses an image processing method based on improved Codebook foreground detection, characterized by converting the RGB color space to the YCbCr color space, improving the Codebook foreground detection algorithm, and applying the improved algorithm for foreground detection. The method separates foreground from background well while reducing the influence of illumination changes on detection, reducing memory consumption and improving performance. Its computational load, however, is far higher than that of the normal Codebook method, so its hardware requirements are excessive.
"Multi-level dictionary set-based no-reference image quality evaluation method" (Application No. 201610273831.5) discloses a no-reference quality evaluation method based on multi-level dictionary coding, which mainly addresses the mismatch between computer evaluation of noisy images and human visual perception. Its implementation steps are: 1. divide the image database; 2. extract the feature vector of a single experimental sample; 3. compute the feature-vector quality value of one degraded image in the training set; 4. compute the feature vectors of all training samples; 5. compute the quality values of the feature vectors of all degraded images in the training set; 6. build the first-level dictionary set from the feature vectors of the training-set reference images; 7. build the second-level dictionary set from the feature vectors of the training-set degraded images; 8. compute the quality value of each cluster center in the second-level dictionary set; 9. project the test sample onto the second-level dictionary set to compute its quality value; 10. judge sample quality from that quality value. The evaluation results of that invention agree with human perception and can be used for image screening, transmission and compression on the Internet. It can reduce the noise of the Codebook method and improve detection accuracy; neither the problem it solves nor the method it adopts is the same as in the present invention.
"A foreground detection method fusing superpixels and background models" (Publication No. 105825234A) discloses a foreground detection method that fuses superpixels with a Codebook background model. Superpixel segmentation groups the pixels of the video image into superpixel blocks, and a Codebook background model is built only for the clustering center of each block, so no separate model is needed for every pixel, which effectively saves the memory required by the background model. In the foreground detection stage only the clustering centers are examined, which greatly shortens detection time and meets the requirements of a real-time monitoring platform. Because it detects only clustering centers, however, that invention increases the possibility of missed targets and lowers the detection accuracy of the Codebook method.
Summary of the Invention

The present invention provides an efficient video moving target detection method based on the Codebook principle, which mainly solves the following problems: (1) how to use memory of a fixed size, avoiding frequent allocation and release and eliminating the system lag caused by memory management; (2) how to cope with background model failure caused by gradual lighting changes over time, so that the device can work continuously for long periods without relearning the background; (3) how to simplify the calculation of the Codebook method and increase its running speed.

The present invention is achieved through the following technical solutions.
The invention discloses an efficient video moving target detection method based on the Codebook principle, comprising:

Acquiring video frames in real time from a video signal source.

A video frame is composed of pixels, and each pixel is composed of several channel components; each channel of each pixel has a histogram of fixed size. For an image of resolution W x L with C channels per pixel, a statistical histogram H[W][L][C][D] is constructed and initialized to 0, where W is the image width, L is the image height, C is the number of channels per pixel, and D is the total number of channel brightness levels.

When the pixel histograms of a new image frame are updated, an increment factor serves as the histogram increment unit. Specifically, each time a frame is received, the brightness value of every channel of every pixel is accumulated into the corresponding histogram cell: H[x][y][c][d] = H[x][y][c][d] + T, where (x, y) is the coordinate of the pixel in the image, c is the channel number of the pixel, d is the brightness value of pixel (x, y) in channel c, T is the brightness increment factor, and R is the forgetting factor. T may be initialized to a small real value; R is determined by the desired forgetting speed: if the weight that the current image contributes to the histogram should drop to 1/m after n frames, then R^n = m, i.e. R = m^(1/n).

For a newly received image frame, each pixel is judged to be foreground or background as follows: let the brightness of pixel (x, y) in channel c be d; given a threshold P, if H[x][y][c][d] < P the pixel (x, y) is judged to be a foreground pixel, and if H[x][y][c][d] >= P for every channel c of pixel (x, y) the pixel is judged to be a background pixel.

Specifically, the decision threshold P of pixel (x, y) in channel c is set as P = max(H[x][y][c]) * 0.5, i.e. half of the maximum of the statistical histogram of pixel (x, y) in channel c.

Before the next image frame is received, the increment factor is multiplied by the forgetting factor, i.e. T = T * R. (An illustrative code sketch of one such processing cycle is given below.)
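The processing cycle just described (accumulate the frame into the histograms, classify each pixel, apply the forgetting factor) can be written as a short per-frame routine. The Python/NumPy sketch below is purely illustrative and not part of the patent text; the array shapes, the float32 cell type and the function names are assumptions made for the example.

```python
import numpy as np

def init_histograms(W, L, C, D):
    # Statistical histogram H[W][L][C][D], initialized to 0.
    return np.zeros((W, L, C, D), dtype=np.float32)

def process_frame(H, frame, T, R):
    """One processing cycle: accumulate, classify, forget.

    H     : histograms of shape (W, L, C, D)
    frame : integer image of shape (W, L, C) with brightness values in [0, D)
    T     : current brightness increment factor
    R     : forgetting factor
    Returns (foreground_mask, updated_T).
    """
    W, L, C, D = H.shape
    foreground = np.zeros((W, L), dtype=bool)
    for x in range(W):
        for y in range(L):
            is_background = True
            for c in range(C):
                d = int(frame[x, y, c])
                H[x, y, c, d] += T              # H[x][y][c][d] = H[x][y][c][d] + T
                P = H[x, y, c].max() * 0.5      # threshold: half the histogram maximum
                if H[x, y, c, d] < P:
                    is_background = False       # below threshold in this channel -> foreground
            foreground[x, y] = not is_background
    return foreground, T * R                    # T = T * R before the next frame
```

For example, for the VGA black-and-white configuration described below one would call init_histograms(640, 480, 1, 256) with T = 1.0 and R = 2**(1/1500), feeding each grayscale frame reshaped to (640, 480, 1).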
In the above method based on the Codebook principle, preferably, for a VGA-resolution black-and-white video signal the histogram accumulation is: H[x][y][d] = H[x][y][d] + T, T = T * R, with 0 <= x < 640, 0 <= y < 480, 0 <= d < 256; the initial value of T is 1.0 and R = 2^(1/1500) = 1.0004622.
In the above method based on the Codebook principle, preferably, for a color high-definition video signal the histogram accumulation is: H[x][y][c][d] = H[x][y][c][d] + T, T = T * R, with 0 <= x < 1920, 0 <= y < 1080, 0 <= c < 3, 0 <= d < 256; the initial value of T is 1.0 and R = 2^(1/1500) = 1.0004622.
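The preferred value of R follows directly from the relation R = m^(1/n) given above, with m = 2 and n = 1500: the current frame's relative weight halves after 1500 frames (roughly one minute of video at 25 fps, though the frame rate is not stated in the text). A one-line check, offered only as an illustrative calculation:

```python
n, m = 1500, 2
R = m ** (1 / n)          # R = m^(1/n)
print(round(R, 7))        # 1.0004622, the value used in the preferred embodiments
```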
Compared with the prior art, the beneficial effects of the present invention are:

(1) According to [2000], the codebook data structure of the Codebook method is replaced with a histogram of fixed memory size, avoiding frequent memory allocation and release operations.

(2) Updating the histogram with a forgetting factor automatically ages out historical data, avoiding the frequent background re-initialization required by the Codebook method.

(3) Updating a histogram is simpler than updating a codebook, and the detection step compares histograms directly, so the method is more efficient.

(4) Following the method of [3005], the present invention adopts a 16x3-dimensional histogram structure per pixel, achieving direct addressing at low cost and higher running efficiency (an illustrative sketch of one possible such layout follows).
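Paragraph [3005] is not reproduced on this page, so the exact meaning of the "16x3-dimensional histogram structure per pixel" is not spelled out here. One plausible reading is that each of the three color channels is quantized from 256 brightness levels into 16 bins, so that a bin index is obtained by direct addressing (a 4-bit shift) and per-pixel storage shrinks from 3x256 to 3x16 cells. The sketch below illustrates that reading only; the quantization step and all names in it are assumptions, not statements from the text shown here.

```python
import numpy as np

BINS = 16        # assumed quantization: 256 brightness levels -> 16 bins per channel
SHIFT = 4        # 256 / 16 = 2**4, so bin = brightness >> 4 (direct addressing, no search)

def init_quantized_histograms(W, L, C=3):
    # Fixed-size storage of W * L * C * 16 cells instead of W * L * C * 256.
    return np.zeros((W, L, C, BINS), dtype=np.float32)

def accumulate_quantized(H, frame, T):
    # frame: (W, L, C) uint8 image; each channel value maps to one of 16 bins.
    bins = frame >> SHIFT
    x, y, c = np.indices(frame.shape)
    H[x, y, c, bins] += T        # each pixel/channel hits exactly one bin per frame
```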
Description of the Drawings

Figure 1 is a schematic flow diagram of the present invention.

Detailed Description

The invention is described in detail below with reference to Figure 1.

(1) The device hardware is a PC running Windows 7. The PC is connected to a network camera through a network cable, and the camera's video stream uses the H.264 encoding format.
(2) Video frames are acquired in real time from the video signal source.

A video frame is composed of pixels, and each pixel is composed of several channel components; each channel of each pixel has a histogram of fixed size.

Specifically, a VGA-resolution black-and-white video image, such as that of a thermal imaging camera, has a resolution of 640x480 pixels; each pixel consists of a single brightness channel whose component value usually ranges from 0 to 255.

Specifically, a color video image, such as that of a high-definition camera, has a resolution of 1920x1080 pixels; each pixel consists of red, green and blue primary color channels, each of whose component values usually ranges from 0 to 255.
(3) For an image of resolution W x L with C channels per pixel, a statistical histogram H[W][L][C][D] is constructed and initialized to 0, where W is the image width, L is the image height, C is the number of channels per pixel, and D is the total number of channel brightness levels.

Specifically, for the VGA black-and-white video signal the histogram structure is H[640][480][1][256], which can be simplified to H[640][480][256]: image width 640 pixels, image height 480 pixels, 1 pixel channel, 256 channel brightness levels.

Specifically, for the color high-definition video signal the histogram structure is H[1920][1080][3][256]: image width 1920 pixels, image height 1080 pixels, 3 pixel channels, 256 channel brightness levels.
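Because the histogram dimensions are fixed by the resolution, the channel count and the number of brightness levels, the memory footprint is known in advance and never changes at run time. A back-of-the-envelope check, assuming 4-byte (float32) cells, which the text does not specify:

```python
vga_cells = 640 * 480 * 1 * 256       # H[640][480][256]      ->    78,643,200 cells
hd_cells  = 1920 * 1080 * 3 * 256     # H[1920][1080][3][256] -> 1,592,524,800 cells
print(vga_cells * 4 / 2**20)          # 300.0  -> about 300 MiB for the VGA model
print(hd_cells * 4 / 2**30)           # ~5.93  -> about 5.9 GiB for the full-HD color model
```

The footprint is large at full 256-level resolution, which is presumably what motivates the reduced 16x3-per-pixel structure mentioned in the beneficial effects; either way the allocation happens once at start-up, so there is no run-time allocation or release.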
Each time a frame is received, the brightness value of every channel of every pixel is accumulated into the corresponding histogram cell: H[x][y][c][d] = H[x][y][c][d] + T, T = T * R, where (x, y) is the coordinate of the pixel in the image, c is the channel number of the pixel, d is the brightness value of pixel (x, y) in channel c, T is the brightness accumulation factor, and R is the forgetting factor. T may be initialized to a small real value; R is determined by the desired forgetting speed: if the weight that the current image contributes to the histogram should drop to 1/m after n frames, then R^n = m, i.e. R = m^(1/n).

Specifically, for the VGA black-and-white video signal the histogram accumulation is: H[x][y][d] = H[x][y][d] + T, T = T * R, with 0 <= x < 640, 0 <= y < 480, 0 <= d < 256; the initial value of T is 1.0 and R = 2^(1/1500) = 1.0004622.

Specifically, for the color high-definition video signal the histogram accumulation is: H[x][y][c][d] = H[x][y][c][d] + T, T = T * R, with 0 <= x < 1920, 0 <= y < 1080, 0 <= c < 3, 0 <= d < 256; the initial value of T is 1.0 and R = 2^(1/1500) = 1.0004622.
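For completeness, the accumulation step for the two concrete configurations can also be written without explicit pixel loops. The NumPy sketch below is an illustrative vectorization under the same assumptions as earlier (float32 cells, width-first array ordering to match H[x][y]); it is not part of the patent text.

```python
import numpy as np

def accumulate_gray(H, frame, T, R):
    # H: (640, 480, 256) histograms; frame: (640, 480) uint8 grayscale image.
    x, y = np.indices(frame.shape)
    H[x, y, frame] += T                 # H[x][y][d] = H[x][y][d] + T
    return T * R                        # T = T * R

def accumulate_color(H, frame, T, R):
    # H: (1920, 1080, 3, 256) histograms; frame: (1920, 1080, 3) uint8 color image.
    x, y, c = np.indices(frame.shape)
    H[x, y, c, frame] += T              # H[x][y][c][d] = H[x][y][c][d] + T
    return T * R
```

Each pixel (and channel) contributes to exactly one histogram cell per frame, so the fancy-indexed in-place addition is safe; if several increments ever had to land in the same cell, np.add.at would be the safer choice.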
(4) For a newly received image frame, each pixel is judged to be foreground or background as follows: let the brightness of pixel (x, y) in channel c be d; given a threshold P, if H[x][y][c][d] < P the pixel (x, y) is judged to be a foreground pixel, and if H[x][y][c][d] >= P for every channel c of pixel (x, y) the pixel is judged to be a background pixel.

Specifically, the decision threshold P of pixel (x, y) in channel c is set as P = max(H[x][y][c]) * 0.5, i.e. half of the maximum of the statistical histogram of pixel (x, y) in channel c.
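The decision in step (4) also vectorizes naturally: the threshold P is half the maximum of each pixel/channel histogram, and a pixel is foreground as soon as any of its channels falls below that threshold. Again an illustrative sketch under the same assumptions as above, not patent text:

```python
import numpy as np

def classify(H, frame):
    # H: (W, L, C, D) histograms; frame: (W, L, C) integers in [0, D).
    # Returns a (W, L) boolean mask that is True for foreground pixels.
    x, y, c = np.indices(frame.shape)
    counts = H[x, y, c, frame]          # H[x][y][c][d] for the observed brightness d
    P = 0.5 * H.max(axis=3)             # per pixel/channel threshold, shape (W, L, C)
    return (counts < P).any(axis=2)     # foreground if below threshold in any channel
```

For the simplified grayscale model the same routine applies with C = 1, e.g. classify(H.reshape(640, 480, 1, 256), gray[:, :, None]).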

Claims (3)

  1. An efficient video moving target detection method based on the Codebook principle, characterized by comprising:
    [1001] acquiring video frames in real time from a video signal source;
    [2001] a video frame is composed of pixels, and each pixel is composed of several channel components; each channel of each pixel has a histogram of fixed size; for an image of resolution W x L with C channels per pixel, a statistical histogram H[W][L][C][D] is constructed and initialized to 0, where W is the image width, L is the image height, C is the number of channels per pixel, and D is the total number of channel brightness levels;
    [3001] when the pixel histograms of a new image frame are updated, an increment factor serves as the histogram increment unit; specifically, each time a frame is received, the brightness value of every channel of every pixel is accumulated into the corresponding histogram cell: H[x][y][c][d] = H[x][y][c][d] + T, where (x, y) is the coordinate of the pixel in the image, c is the channel number of the pixel, d is the brightness value of pixel (x, y) in channel c, T is the brightness increment factor, and R is the forgetting factor; T may be initialized to a small real value, and R is determined by the desired forgetting speed: if the weight that the current image contributes to the histogram should drop to 1/m after n frames, then R^n = m, i.e. R = m^(1/n);
    [4001] for a newly received image frame, each pixel is judged to be foreground or background as follows: the brightness of pixel (x, y) in channel c being d, given a threshold P, if H[x][y][c][d] < P the pixel (x, y) is judged to be a foreground pixel, and if H[x][y][c][d] >= P for every channel c of pixel (x, y) the pixel is judged to be a background pixel;
    the decision threshold P of pixel (x, y) in channel c is set as P = max(H[x][y][c]) * 0.5, i.e. half of the maximum of the statistical histogram of pixel (x, y) in channel c;
    before the next image frame is received, the increment factor is multiplied by the forgetting factor, i.e. T = T * R.
  2. The efficient video moving target detection method based on the Codebook principle according to claim 1, characterized in that, for a VGA-resolution black-and-white video signal, the histogram accumulation is: H[x][y][d] = H[x][y][d] + T, T = T * R; 0 <= x < 640, 0 <= y < 480, 0 <= d < 256; the initial value of T is 1.0 and R = 2^(1/1500) = 1.0004622.
  3. The efficient video moving target detection method based on the Codebook principle according to claim 1, characterized in that, for a color high-definition video signal, the histogram accumulation is: H[x][y][c][d] = H[x][y][c][d] + T, T = T * R; 0 <= x < 1920, 0 <= y < 1080, 0 <= c < 3, 0 <= d < 256; the initial value of T is 1.0 and R = 2^(1/1500) = 1.0004622.
PCT/CN2020/137988 2019-12-31 2020-12-21 Codebook principle-based efficient video moving object detection method WO2021136001A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911406901.X 2019-12-31
CN201911406901.XA CN111145219B (en) 2019-12-31 2019-12-31 Efficient video moving target detection method based on Codebook principle

Publications (1)

Publication Number Publication Date
WO2021136001A1 true WO2021136001A1 (en) 2021-07-08

Family

ID=70522427

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/137988 WO2021136001A1 (en) 2019-12-31 2020-12-21 Codebook principle-based efficient video moving object detection method

Country Status (2)

Country Link
CN (1) CN111145219B (en)
WO (1) WO2021136001A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111145219B (en) * 2019-12-31 2022-06-17 神思电子技术股份有限公司 Efficient video moving target detection method based on Codebook principle

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110200238A1 (en) * 2010-02-16 2011-08-18 Texas Instruments Incorporated Method and system for determining skinline in digital mammogram images
CN104182957A (en) * 2013-05-21 2014-12-03 北大方正集团有限公司 Traffic video detection information method and device
CN104820435A (en) * 2015-02-12 2015-08-05 武汉科技大学 Quadrotor moving target tracking system based on smart phone and method thereof
CN111145219A (en) * 2019-12-31 2020-05-12 神思电子技术股份有限公司 Efficient video moving target detection method based on Codebook principle

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101216942A (en) * 2008-01-14 2008-07-09 浙江大学 An increment type characteristic background modeling algorithm of self-adapting weight selection
EP2641401B1 (en) * 2010-11-15 2017-04-05 Huawei Technologies Co., Ltd. Method and system for video summarization
CN104067272A (en) * 2011-11-21 2014-09-24 诺基亚公司 Method for image processing and an apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110200238A1 (en) * 2010-02-16 2011-08-18 Texas Instruments Incorporated Method and system for determining skinline in digital mammogram images
CN104182957A (en) * 2013-05-21 2014-12-03 北大方正集团有限公司 Traffic video detection information method and device
CN104820435A (en) * 2015-02-12 2015-08-05 武汉科技大学 Quadrotor moving target tracking system based on smart phone and method thereof
CN111145219A (en) * 2019-12-31 2020-05-12 神思电子技术股份有限公司 Efficient video moving target detection method based on Codebook principle

Also Published As

Publication number Publication date
CN111145219B (en) 2022-06-17
CN111145219A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
WO2018130016A1 (en) Parking detection method and device based on monitoring video
CN108734107B (en) Multi-target tracking method and system based on human face
CN107085714B (en) Forest fire detection method based on video
CN109635758B (en) Intelligent building site video-based safety belt wearing detection method for aerial work personnel
WO2017000465A1 (en) Method for real-time selection of key frames when mining wireless distributed video coding
WO2022027931A1 (en) Video image-based foreground detection method for vehicle in motion
CN110191320B (en) Video jitter and freeze detection method and device based on pixel time sequence motion analysis
CN103729858B (en) A kind of video monitoring system is left over the detection method of article
CN108564052A (en) Multi-cam dynamic human face recognition system based on MTCNN and method
US8553086B2 (en) Spatio-activity based mode matching
WO2006008944A1 (en) Image processor, image processing method, image processing program, and recording medium on which the program is recorded
KR20210006276A (en) Image processing method for flicker mitigation
CN112017445B (en) Pedestrian violation prediction and motion trail tracking system and method
CN112528861A (en) Foreign matter detection method and device applied to track bed in railway tunnel
WO2021136001A1 (en) Codebook principle-based efficient video moving object detection method
CN111460964A (en) Moving target detection method under low-illumination condition of radio and television transmission machine room
CN112887587B (en) Self-adaptive image data fast transmission method capable of carrying out wireless connection
JP3883250B2 (en) Surveillance image recording device
CN106339995A (en) Space-time multiple feature based vehicle shadow eliminating method
CN116342644A (en) Intelligent monitoring method and system suitable for coal yard
CN115830513A (en) Method, device and system for determining image scene change and storage medium
CN112532938B (en) Video monitoring system based on big data technology
Li et al. Image object detection algorithm based on improved Gaussian mixture model
RU2777883C1 (en) Method for highly efficient detection of a moving object on video, based on the principles of codebook
Chai et al. Fpga-based ROI encoding for HEVC video bitrate reduction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20911022

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20911022

Country of ref document: EP

Kind code of ref document: A1