WO2021012757A1

WO2021012757A1 - Real-time target detection and tracking method based on panoramic multichannel 4k video images

Info

Publication number: WO2021012757A1
Application number: PCT/CN2020/090155
Authority: WO
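Inventors: Zhu Wei (朱伟); Wang Yanghong (王扬红); Miao Feng (苗锋); Qiu Wenjia (邱文嘉); Wang Shoufeng (王寿峰); Ma Hao (马浩); Bai Junqi (白俊奇)
Original assignee: Nanjing LES Electronic Equipment Co., Ltd. (南京莱斯电子设备有限公司)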
Priority date: 2019-07-23
Filing date: 2020-05-14
Publication date: 2021-01-28
Also published as: CN110517288B; CN110517288A

Abstract

A real-time target detection and tracking method based on panoramic multi-channel 4k video images. It is mainly intended to solve three problems of the prior art: panoramic multi-channel 4k images are processed slowly, targets crossing between cameras are falsely detected or missed, and target detection and tracking are unstable. The method comprises: first, collecting long-term target occurrence statistics over the panoramic video image to divide it into regions of different importance and set a background modeling parameter threshold for each; next, performing adaptive background modeling on the panoramic video image to obtain the foreground target candidate regions of the scene; then, fusing and processing the foreground target candidate regions to form candidate target point traces (plots); and finally, performing dynamic track management to achieve stable multi-target tracking in the panoramic video. The method can be used for remote monitoring of airport towers, panoramic video enhancement, road traffic vehicle detection and similar applications, and provides excellent target detection and tracking performance.

Description

Real-time target detection and tracking method based on panoramic multi-channel 4k video images

Technical field

The invention relates to the technical field of digital image processing, in particular to a real-time target detection and tracking method based on panoramic multi-channel 4k video images.

Background art

Target detection extracts targets of interest from images using computer vision algorithms. As an important branch of image processing, it is widely applied in many fields. In real detection scenes, the external environment is complex and unstable and there is much interference, which causes many problems for target detection. Achieving accurate, stable and real-time target detection and tracking is therefore of great research significance.

Zhang Tianyu proposed a multi-scale target detection method in the patent "Spatio-temporal multi-scale moving target detection method", which divides the image into blocks and uses the optimal difference interval within the moving area to achieve target detection and tracking. This method has poor robustness in complex scenes, and its criterion for determining significant differences is difficult to adapt to multiple scenarios. Zdenek Kalal, Krystian Mikolajczyk and others proposed, in "Tracking-Learning-Detection", a method for detecting and tracking a single target in video that uses inter-frame information to combine detection with tracking and learn target samples online. Its median optical flow tracker requires target initialization, and it is difficult to keep the tracker synchronized with the detector when the tracking correction is fixed. Yang Yanshuang and Pu Baoming proposed, in "Moving Vehicle Detection Based on Improved SUSAN Algorithm", an adaptive-threshold SUSAN method for detecting vehicle target boundaries, in which histogram transformation and the Hough transform are used to extract the connected domain of the target and separate the vehicle target from the background. This method has poor real-time performance, and its adaptive threshold has difficulty segmenting targets effectively in complex scenes.

Summary of the invention

In view of the shortcomings of the existing technology, namely the poor real-time performance and insufficient stability of existing target detection and tracking techniques, the present invention proposes a real-time target detection and tracking method based on panoramic multi-channel 4k video images. The method offers excellent target detection and tracking performance and is easy to implement in engineering.

The real-time target detection and tracking method based on panoramic multi-channel 4k video images provided by the present invention includes the following steps:

Step 1. Divide the panoramic multi-channel 4k video image into n regions, collect multi-frame target statistics for each region, grade each region of the panoramic video according to its target statistical probability, and set the background modeling parameter threshold of each region according to its level;

Step 2. Perform median filtering on the panoramic video image, initialize the background model, adaptively adjust the background modeling parameter threshold according to the degree of dynamic change of the background to update the background, then process flickering pixels to generate the background image, and finally use a frame difference operation to generate the foreground candidate target region image;

Step 3. Perform median filtering on the candidate target region image, use morphological operations to extract enhanced candidate target regions, compute the connected domains of the enhanced candidate target regions and the minimum bounding rectangle of each connected domain, and eliminate false candidate target frames using target shape features to form target point traces;

Step 4. Perform detection on consecutive frames of the panoramic video image to obtain target point traces; perform dynamic target track management by evaluating the absolute distance between each target point trace and the target tracks and the multi-channel video cross-coverage state; and apply data correction to the track information of consecutive frames to achieve stable multi-target tracking.

Step 1 includes:

Step 1-1. According to the panoramic video image size and scene coverage, divide the panoramic video image into n regions, the nth of which is denoted S_n. The division criterion is that a single region does not exceed 1920 × 1080 pixels: each region is at most 1920 pixels wide and at most 1080 pixels high, so that the 4-channel 4k panorama is divided into exactly 16 regions;

Step 1-2. Use the frame difference method (ZHOU Y, JI J, SONG K A. Moving Target Detection Method Based on Improved Frame Difference Background Modeling[J]. Open Cybernetics & Systemics Journal, 2014) to count how often moving targets appear in each region over K frames of panoramic video, and divide the n regions into four levels, A, B, C and D, by that frequency: a region in which moving targets appear in more than K₁ frames is a level-A image region; more than K₂ but fewer than K₁ frames, level B; more than K₃ but fewer than K₂ frames, level C; and more than K₄ but fewer than K₃ frames, level D;

Step 1-3. Merge adjacent regions of the same level and record the panoramic position coordinates of each region: the nth region S_n has panoramic position coordinates (x_n, y_n, w_n, h_n), where (x_n, y_n) is the upper-left corner of S_n and w_n and h_n are its width and height.

Step 1-4. Set a background modeling parameter threshold for each of the n regions; the threshold corresponding to the nth region S_n is T_n.

Step 2 includes:

Step 2-1. Perform fast median filtering (ZHANG Li, CHEN Zhi-qiang, GAO Wen-huan, et al. Mean-based fast median filter[J]. Journal of Tsinghua University: Science and Technology, 2004, 44(9): 1157-1159.) on the panoramic video image to eliminate the influence of background noise;
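Median filtering is used here, and again in Step 3-1, to suppress noise. The cited mean-based fast median filter is not reproduced; as a stand-in, the following minimal Python sketch applies a plain 3×3 sliding-window median to a grayscale image stored as a list of rows. The 3×3 window and the replicate handling of border pixels are assumptions made for illustration.

```python
from statistics import median


def median3x3(img):
    """Return a 3x3 median-filtered copy of a grayscale image (list of rows).

    Neighbours outside the image are taken from the nearest edge pixel
    (replicate padding), an assumption made for this sketch.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Gather the 9 window values, clamping coordinates at the border.
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            out[y][x] = median(window)
    return out


# A single bright "salt" pixel is removed.
noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(median3x3(noisy))  # [[10, 10, 10], [10, 10, 10], [10, 10, 10]]
```

A plain median costs one small sort per pixel; the fast variant cited above exists precisely to avoid that cost on 4k frames.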

Step 2-2. Initialize the background model of the panoramic video image. The background model uses ViBe (Visual Background Extractor; BARNICH O, VAN DROOGENBROECK M. ViBe: A universal background subtraction algorithm for video sequences[J]. IEEE Transactions on Image Processing, 2011, 20(6): 1709-1724.), and the background modeling parameter threshold T_n is used as the Euclidean distance threshold of the ViBe algorithm.

Step 2-3. Adaptively adjust the background modeling parameter threshold T_n according to the degree of dynamic change of the background to complete the background model update. T_n determines whether a pixel belongs to the background; a value that is too large or too small degrades the quality of background modeling. To describe the motion state of the target accurately, the threshold is adjusted according to the degree of dynamic change, for which the background transformation parameter φ(x,y) is defined as:

where f(i,j) is the pixel value of the current frame at position (i,j), d(i,j) is the pixel value of the background model at position (i,j), M is the width of the current frame image and N is its height.

Set the background transformation factor parameter μ. When the current pixel value is successfully matched with the background model, compute φ(x,y): in a static scene φ(x,y) tends toward a stable value, while in a dynamic scene it becomes larger. The background modeling parameter threshold T_n is then adaptively updated according to the following formula:

where T_n' is the adaptively adjusted threshold, β is the dynamic adjustment factor, and μ and β are both fixed parameters.
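φ(x,y) is built from f(i,j), d(i,j), M and N, but its formula is not reproduced in this text. A common measure made of exactly these quantities is the mean absolute difference between the current frame and the background model, so the minimal sketch below assumes that form; treat it as an illustration, not as the patented formula. It stops at φ and does not attempt the T_n update rule.

```python
def background_change_degree(frame, background):
    """Assumed form of phi: mean absolute difference between the current frame
    f(i, j) and the background model d(i, j) over an M x N image.

    Images are lists of rows: N rows (height), each with M pixels (width).
    """
    n_rows, m_cols = len(frame), len(frame[0])
    total = 0
    for f_row, d_row in zip(frame, background):
        for f, d in zip(f_row, d_row):
            total += abs(f - d)
    return total / (m_cols * n_rows)


static = [[100, 100], [100, 100]]
moving = [[100, 180], [100, 100]]
print(background_change_degree(static, static))  # 0.0
print(background_change_degree(moving, static))  # 20.0
```

Under this assumption a static scene drives φ toward a small stable value, and any moving content raises it, matching the qualitative behaviour described above.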

Step 2-4. Process the flickering pixels in the background model to generate the background image. In the background image produced by background modeling, a pixel often jumps back and forth between background point and foreground point. To handle such flickering pixels, a flicker level table is constructed: if the pixel belongs to the edge contour points of the background image (Kass M, Witkin A, Terzopoulos D. Snakes: Active contour models[J]. International Journal of Computer Vision, 1988, 1(4): 321-331.) but differs from the edge contour point in the background image of the previous frame, its flicker level is increased:

Otherwise, its flicker level is decreased:

If the flicker level of a pixel over K consecutive background image frames is greater than S_NK, the pixel is judged to be a flickering pixel and is removed from the updated background image.

Step 2-5. Compute the difference between the panoramic video image and the background image obtained in Step 2-4 to generate the candidate target image Im_obj, which contains the candidate target regions.
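Step 2-5 differences each frame against the background image. Because the later dilation, AND and closing steps treat Im_obj as a binary image, the minimal sketch below assumes the absolute difference is binarized with a threshold; `diff_threshold` is a hypothetical parameter that the text does not specify.

```python
def candidate_target_image(frame, background, diff_threshold):
    """Return a binary Im_obj: 1 where |frame - background| > diff_threshold.

    diff_threshold is a hypothetical parameter introduced for this sketch.
    """
    return [
        [1 if abs(f - b) > diff_threshold else 0 for f, b in zip(f_row, b_row)]
        for f_row, b_row in zip(frame, background)
    ]


frame = [[10, 200], [12, 10]]
background = [[10, 10], [10, 10]]
print(candidate_target_image(frame, background, 20))  # [[0, 1], [0, 0]]
```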

Step 3 includes:

Step 3-1. Perform fast median filtering (ZHANG Li, CHEN Zhi-qiang, GAO Wen-huan, et al. Mean-based fast median filter[J]. Journal of Tsinghua University: Science and Technology, 2004, 44(9): 1157-1159.) on the candidate target image Im_obj to generate the image Im_mf;

Step 3-2. Perform a morphological dilation operation (Haralick R M, Sternberg S R, Zhuang X. Image analysis using mathematical morphology[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1987, 9(4): 532-550.) on the filtered image Im_mf to generate the image Im_do, and then perform an AND operation between Im_do and the candidate target image Im_obj to generate the enhanced candidate target image Im_obj2;

Step 3-3. Perform a morphological closing operation (Haralick R M, Sternberg S R, Zhuang X. Image analysis using mathematical morphology[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1987, 9(4): 532-550.) on the image Im_obj2, extract the connected domains of the candidate targets, compute the minimum bounding rectangle of each connected domain, and extract the candidate target frames;
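Steps 3-2 and 3-3 can be sketched on binary images as follows. This is a minimal pure-Python illustration under stated assumptions: a 3×3 square structuring element, out-of-image neighbours ignored, 8-connectivity for connected domains, and upright (axis-aligned) bounding rectangles (x, y, w, h) standing in for the "minimum bounding rectangle". All function names are hypothetical.

```python
from collections import deque


def _window(img, y, x):
    """Values of the in-bounds 3x3 neighbourhood around (y, x)."""
    h, w = len(img), len(img[0])
    return [img[yy][xx]
            for yy in range(max(y - 1, 0), min(y + 2, h))
            for xx in range(max(x - 1, 0), min(x + 2, w))]


def dilate(img):
    # A pixel is set if any pixel in its 3x3 neighbourhood is set.
    return [[1 if any(_window(img, y, x)) else 0 for x in range(len(img[0]))]
            for y in range(len(img))]


def erode(img):
    # A pixel stays set only if every in-bounds neighbour is set.
    return [[1 if all(_window(img, y, x)) else 0 for x in range(len(img[0]))]
            for y in range(len(img))]


def close(img):
    # Morphological closing: dilation followed by erosion.
    return erode(dilate(img))


def bounding_boxes(img):
    """Label 8-connected components; return one (x, y, w, h) box per component."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y0 in range(h):
        for x0 in range(w):
            if not img[y0][x0] or seen[y0][x0]:
                continue
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            boxes.append((min_x, min_y, max_x - min_x + 1, max_y - min_y + 1))
    return boxes


def extract_candidate_boxes(im_mf, im_obj):
    im_do = dilate(im_mf)                                # Step 3-2: dilation
    im_obj2 = [[a & b for a, b in zip(r_do, r_obj)]      # Step 3-2: AND with Im_obj
               for r_do, r_obj in zip(im_do, im_obj)]
    return bounding_boxes(close(im_obj2))                # Step 3-3


im = [[0] * 11 for _ in range(6)]
for y in (2, 3):
    for x in (2, 3, 7, 8):
        im[y][x] = 1
print(extract_candidate_boxes(im, im))  # [(2, 2, 2, 2), (7, 2, 2, 2)]
```

The AND with Im_obj keeps only pixels that were already candidates, so dilation of the filtered image acts as a mask that restores detail the median filter may have eroded, rather than growing the targets.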

Step 3-4. Compute the shape features of each candidate target frame, namely its width obj_w, height obj_h and aspect ratio obj_wh, and check whether they satisfy obj_w > w₀, obj_h > h₀ and wh₀ ≤ obj_wh ≤ wh₁. A candidate target frame that fails these conditions is judged to be a false target and deleted; a frame that satisfies them becomes a target point trace. Here w₀ is the target frame width threshold, h₀ is the target frame height threshold, and wh₁ and wh₀ are the upper and lower aspect ratio thresholds, respectively. The target point trace includes the frame number, target position coordinates, target width, target height, target aspect ratio and target area.

Step 4 includes:

Step 4-1. Generate a target track Tr_i from each target point trace Po_i extracted from the first frame of the panoramic video image: the batch number BN of the target track structure is generated automatically and the track is put into the target track structure vector. BN is incremented automatically and satisfies 1 ≤ BN ≤ 9999. The target track includes the frame number, target position coordinates, target width, target height, target aspect ratio and target area;

Step 4-2. For each target point trace Po_{i+1} extracted from the next frame of the panoramic video image, compute its absolute distance D_{i+1} to the target track Tr_i, where D_{i+1} is calculated as:

where Po_{i+1}(x) and Po_{i+1}(y) are the abscissa and ordinate of the target point trace, and Tr_i(x) and Tr_i(y) are the abscissa and ordinate of the target track;

If D_{i+1} ≤ DT, add the target point trace Po_{i+1} to the target track Tr_i; if D_{i+1} > DT, generate a new target track Tr_{i+1} from Po_{i+1} according to Step 4-1. Here DT is the absolute distance judgment threshold;

Step 4-3. Determine from the track information whether the current target is in the multi-channel video cross-coverage state, and use the fast kernelized correlation filtering method (Henriques J F, Caseiro R, Martins P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596.) to manage the tracks of targets that span multiple screens.

In Step 4-3, the cross-coverage state is judged from the track information as follows: when the horizontal position of the target in the ith panoramic video frame I_i is greater than the threshold w₁ and its horizontal track speed is positive, and at the same time the horizontal position of the target in the (i+1)th panoramic video frame I_{i+1} is less than the threshold w₂ and its horizontal track speed is negative, the target track is judged to have reached the image edge, that is, the target is in the multi-channel video cross-coverage state. The panoramic video images I_i and I_{i+1} are adjacent consecutive images.

Step 4-4. Apply data correction to the track information of consecutive frames to achieve stable multi-target tracking.

Step 4-4 includes: storing the track data of N_k consecutive panoramic video frames, and computing a weighted average of the track data of the current frame

and the predicted track data of its previous N_k - 1 frames

to generate the corrected track data

The specific operations are as follows:

where x is the horizontal position coordinate of the target in the track data, y is its vertical position coordinate, w is the target width and h is the target height, and σ₁ and σ₂ are weighting factors satisfying σ₁ + σ₂ = 1.

Beneficial effects: The present invention discloses a real-time target detection and tracking method based on panoramic multi-channel 4k ultra-high-definition video images, which solves the problems of high false alarm rate and low robustness in panoramic target detection and tracking. Region-based block processing is used to set the background modeling thresholds, adaptive background modeling then extracts candidate target regions and point traces, and dynamic track management finally achieves stable multi-target tracking of the panoramic video. Verification tests in multiple scenarios show excellent target detection and tracking performance: the target detection rate is greater than 90% and the average processing time is less than 40 ms, which verifies the effectiveness of the invention.

Description of the drawings

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments, from which the above and other advantages of the invention will become clearer.

Figure 1 is a flow chart of the method according to the invention.

Detailed description of the embodiments

The present invention will be further described below in conjunction with the drawings and embodiments.

As shown in FIG. 1, according to an embodiment of the present invention, a real-time target detection and tracking method based on panoramic multi-channel 4k video images includes the following steps:

Step 1. Divide the panoramic 4-channel 4k video image into 16 regions, collect multi-frame target statistics for each of the 16 regions, grade each region of the panoramic video according to its target statistical probability, and set the background modeling parameter threshold of each of the 16 regions according to its level;

Step 2. Perform fast median filtering on the panoramic video image, initialize the background model, adaptively adjust the background modeling parameter threshold according to the dynamic change of the background to update the background, then process flickering pixels to generate the background image, and finally use a frame difference operation to extract the foreground target candidate regions;

Step 3. Perform fast median filtering on the candidate target region image, use morphological operations to extract enhanced target regions, compute the connected domains of the enhanced candidate target regions and the minimum bounding rectangle of each connected domain, and eliminate false candidate target frames using target shape features to form target point traces;

Step 4. Perform detection on consecutive frames of the panoramic video to obtain target point traces; perform dynamic target track management by evaluating the absolute distance between each target point trace and the target tracks and the multi-channel video cross-coverage state; and apply data correction to the track information of consecutive frames to achieve stable multi-target tracking.

In the present invention, step 1 includes:

Step 1-1. According to the size and scene coverage of the panoramic 4-channel 4k video image, divide the panoramic video image into 16 regions of size W_n × H_n, where the region width W_n ≤ 1920 and the region height H_n ≤ 1080;
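The 16 regions of the embodiment follow if the four 3840×2160 channels are assumed to be stitched side by side into a 15360×2160 panorama and tiled with 1920×1080 blocks (8 columns by 2 rows). The stitching layout is an assumption; the minimal sketch below returns each region as (x, y, w, h), the same form as the panoramic position coordinates of Step 1-3.

```python
def partition_panorama(pano_w, pano_h, max_w=1920, max_h=1080):
    """Split a pano_w x pano_h panorama into a grid of regions no larger than
    max_w x max_h, returned as (x, y, w, h) tuples in row-major order."""
    cols = -(-pano_w // max_w)  # ceiling division
    rows = -(-pano_h // max_h)
    regions = []
    for r in range(rows):
        for c in range(cols):
            x, y = c * max_w, r * max_h
            # Edge regions are clipped if the panorama is not an exact multiple.
            regions.append((x, y, min(max_w, pano_w - x), min(max_h, pano_h - y)))
    return regions


regions = partition_panorama(4 * 3840, 2160)
print(len(regions), regions[0], regions[-1])  # 16 (0, 0, 1920, 1080) (13440, 1080, 1920, 1080)
```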

Step 1-2. Use the frame difference method (ZHOU Y, JI J, SONG K A. Moving Target Detection Method Based on Improved Frame Difference Background Modeling[J]. Open Cybernetics & Systemics Journal, 2014) to count how often moving targets appear in each region over 200,000 frames of panoramic video, and divide the regions S_n into four levels, A, B, C and D, by that frequency: a region in which moving targets appear in more than 20,000 frames is a level-A image region; more than 10,000 but fewer than 20,000 frames, level B; more than 5,000 but fewer than 10,000 frames, level C; and more than 1,000 but fewer than 5,000 frames, level D. Here n ranges over [1, 16]. Each region has exactly one level and each level corresponds to one threshold value, so the 16 regions have 16 thresholds;

Step 1-3. Merge adjacent regions of the same level and record the panoramic position coordinates (x_n, y_n, w_n, h_n) of each region S_n, where (x_n, y_n) is the upper-left corner of S_n and (w_n, h_n) are its width and height.

Step 1-4. Set the background modeling parameter threshold T_n of each region S_n according to its level. Typical values are T_nA = 30, T_nB = 25, T_nC = 20 and T_nD = 15, where T_nA, T_nB, T_nC and T_nD are the thresholds for regions S_n of level A, B, C and D, respectively. For example, if a moving target appears in 22,000 of the 200,000 frames of region S₁, S₁ is a level-A region and T₁ = 30.
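Steps 1-2 and 1-4 of the embodiment amount to a lookup from a region's motion-frame count (out of 200,000 frames) to a level and then to a threshold. In this minimal sketch, counts exactly equal to a bound fall into the lower level, and regions with 1,000 or fewer motion frames, which the text does not classify, fall back to the ViBe default of 20 from Step 2-2; both choices are assumptions.

```python
# Embodiment values: lower frame-count bounds K1..K4 and per-level thresholds.
LEVEL_BOUNDS = (("A", 20_000), ("B", 10_000), ("C", 5_000), ("D", 1_000))
LEVEL_THRESHOLD = {"A": 30, "B": 25, "C": 20, "D": 15}
DEFAULT_THRESHOLD = 20  # ViBe default of Step 2-2, used for unclassified regions (assumption)


def region_level(frames_with_motion):
    """Return 'A'..'D', or None when the count is at or below K4."""
    for level, bound in LEVEL_BOUNDS:
        if frames_with_motion > bound:
            return level
    return None


def region_threshold(frames_with_motion):
    level = region_level(frames_with_motion)
    return LEVEL_THRESHOLD.get(level, DEFAULT_THRESHOLD)


print(region_level(22_000), region_threshold(22_000))  # A 30
```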

In the present invention, step 2 includes:

Step 2-2. Initialize the background model of the panoramic video. The background model uses ViBe (Visual Background Extractor; BARNICH O, VAN DROOGENBROECK M. ViBe: A universal background subtraction algorithm for video sequences[J]. IEEE Transactions on Image Processing, 2011, 20(6): 1709-1724.), the background modeling parameter threshold T_n is used as the Euclidean distance threshold of the ViBe algorithm, and the default value of T_n is 20.

Step 2-3. Adaptively adjust the background modeling parameter threshold T_n according to the degree of dynamic change of the background to complete the background model update. T_n determines whether a pixel belongs to the background; a value that is too large or too small degrades the quality of background modeling. To describe the motion state of the target accurately, the threshold is adjusted according to the degree of dynamic change, for which the background transformation parameter φ(x,y) is defined as:

where f(i,j) is the pixel value of the current frame at (i,j), d(i,j) is the pixel value of the background model at (i,j), M is the width of the current frame image and N is its height, with M = 3840 and N = 2160. Set the background transformation factor parameter μ. When the current pixel value is successfully matched with the background model, compute φ(x,y): in a static scene φ(x,y) tends toward a stable value, while in a dynamic scene it becomes larger. The background modeling parameter threshold T_n is then adaptively updated according to the following formula:

where T_n' is the adaptively adjusted threshold, β is the dynamic adjustment factor, and μ and β are fixed parameters; μ is generally 0.8 and β is generally 0.2.

Step 2-4. Process the flickering pixels in the background model to generate the background image. In the background image produced by background modeling, a pixel often jumps back and forth between background point and foreground point. To handle such flickering pixels, a flicker level table is constructed: when a pixel belongs to the edge contour points of the background image (Kass M, Witkin A, Terzopoulos D. Snakes: Active contour models[J]. International Journal of Computer Vision, 1988, 1(4): 321-331.) but differs from the edge contour point in the background image of the previous frame, its flicker level is increased:

If it is the same as the edge contour point of the previous frame, its flicker level is decreased:

If the flicker level of a pixel over K consecutive background images is greater than S_NK, the pixel is judged to be a flickering pixel and is removed from the updated background image. Here K = 50 and S_NK = 10.
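The flicker level table of Step 2-4 can be sketched with binary edge-contour maps of K consecutive background images. Following the embodiment's wording, a pixel's level rises when it is an edge point in the current frame but not in the previous one, falls when it is an edge point in both, and is otherwise left unchanged. The clamp at zero and the use of pre-computed edge maps are assumptions of this minimal sketch; S_NK = 10 and K = 50 are the embodiment's values.

```python
def flicker_mask(edge_maps, s_nk=10):
    """edge_maps: K binary edge-contour maps (lists of rows) of consecutive
    background images. Returns a mask of pixels whose flicker level exceeds s_nk."""
    h, w = len(edge_maps[0]), len(edge_maps[0][0])
    level = [[0] * w for _ in range(h)]
    for prev, cur in zip(edge_maps, edge_maps[1:]):
        for y in range(h):
            for x in range(w):
                if cur[y][x] and not prev[y][x]:
                    level[y][x] += 1  # edge point that differs from the previous frame
                elif cur[y][x] and prev[y][x]:
                    level[y][x] = max(level[y][x] - 1, 0)  # stable edge point (clamp is assumed)
    return [[1 if v > s_nk else 0 for v in row] for row in level]


# K = 50 frames of a 1 x 2 image: pixel 0 toggles every frame, pixel 1 is a stable edge.
maps = [[[k % 2, 1]] for k in range(50)]
print(flicker_mask(maps))  # [[1, 0]]
```

With this rule a toggling pixel gains a level on every rising transition (25 times in 50 frames here), while a stable edge decays to zero, which is the separation Step 2-4 relies on.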

Step 2-5. Compute the difference between the original image and the background image of the current frame to generate the candidate target image Im_obj, completing candidate target extraction.

In the present invention, step 3 includes:

Step 3-2. Perform a morphological dilation operation on the filtered image Im_mf to generate the image Im_do, and then perform an AND operation between Im_do and the candidate target image Im_obj to generate the enhanced candidate target image Im_obj2;

Step 3-3. Perform a morphological closing operation on the image Im_obj2, extract the connected domains of the candidate targets, compute the minimum bounding rectangle of each connected domain, and extract the candidate target frames;

Step 3-4. Compute the shape features of each candidate target frame, namely its width obj_w, height obj_h and aspect ratio obj_wh, and check whether they satisfy obj_w > w₀, obj_h > h₀ and wh₀ ≤ obj_wh ≤ wh₁. A candidate target frame that fails these conditions is judged to be a false target; a frame that satisfies them generates a target point trace. Here w₀ is the target frame width threshold, h₀ is the target frame height threshold, and wh₁ and wh₀ are the upper and lower aspect ratio thresholds; typically w₀ = 10, h₀ = 10, wh₁ = 5 and wh₀ = 1. The target point trace includes the frame number, target position coordinates, target width, target height, target aspect ratio and target area.
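The shape test of Step 3-4 and the resulting point trace can be sketched as follows, using the embodiment's thresholds w₀ = h₀ = 10, wh₀ = 1 and wh₁ = 5. Taking the aspect ratio as width divided by height and the target area as the box area w × h are assumptions; the dictionary layout of a point trace is hypothetical.

```python
def is_valid_target(obj_w, obj_h, w0=10, h0=10, wh0=1.0, wh1=5.0):
    """Shape test: obj_w > w0, obj_h > h0 and wh0 <= obj_wh <= wh1."""
    if obj_w <= w0 or obj_h <= h0:
        return False
    obj_wh = obj_w / obj_h  # aspect ratio assumed to be width / height
    return wh0 <= obj_wh <= wh1


def make_point_traces(frame_no, boxes):
    """Turn candidate boxes (x, y, w, h) into point traces, dropping false targets."""
    traces = []
    for x, y, w, h in boxes:
        if is_valid_target(w, h):
            traces.append({"frame": frame_no, "x": x, "y": y, "w": w, "h": h,
                           "aspect": w / h, "area": w * h})
    return traces


# Only the 40 x 20 box passes; 8 x 20 is too narrow and 20 x 40 has aspect ratio 0.5.
print(make_point_traces(7, [(100, 50, 40, 20), (300, 80, 8, 20), (500, 90, 20, 40)]))
```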

In the present invention, step 4 includes:

Step 4-1. Generate a target track Tr_i from each target point trace Po_i extracted from the first video frame: the batch number BN of the target track structure is generated automatically and the track is put into the target track structure vector. BN is incremented automatically and satisfies 1 ≤ BN ≤ 9999. The target track includes the frame number, target position coordinates, target width, target height, target aspect ratio and target area.

Step 4-2. For each target point trace Po_{i+1} extracted from the next video frame, compute its absolute distance D_{i+1} to the target track Tr_i, where D_{i+1} is calculated as:

where Po_{i+1}(x) and Po_{i+1}(y) are the x and y coordinates of the target point trace, and Tr_i(x) and Tr_i(y) are the x and y coordinates of the target track.

If D_{i+1} ≤ DT, add the target point trace Po_{i+1} to the target track Tr_i; if D_{i+1} > DT, generate a new target track Tr_{i+1} from Po_{i+1} according to Step 4-1. Here DT is the absolute distance judgment threshold, generally 15;
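Steps 4-1 and 4-2 can be sketched as a small track manager with DT = 15. The distance formula is not reproduced in this text, so Euclidean distance between a point trace and the last point of a track is assumed. Greedy nearest-track matching, the rule that a track accepts at most one point trace per frame, and wrapping BN from 9999 back to 1 are further assumptions; the class and method names are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Track:
    bn: int                                      # batch number, 1..9999
    points: list = field(default_factory=list)   # (x, y) point traces, oldest first


class TrackManager:
    def __init__(self, dt=15.0):
        self.dt = dt          # absolute distance judgment threshold DT
        self.tracks = []
        self._next_bn = 1

    def _new_bn(self):
        bn = self._next_bn
        self._next_bn = bn % 9999 + 1  # wrap 9999 -> 1 (assumption)
        return bn

    def update(self, points):
        """Associate one frame's point traces with existing tracks."""
        updated = set()  # indices of tracks already extended in this frame
        for x, y in points:
            best_i, best_d = None, None
            for i, tr in enumerate(self.tracks):
                if i in updated:
                    continue
                tx, ty = tr.points[-1]
                d = math.hypot(x - tx, y - ty)  # Euclidean distance (assumption)
                if best_d is None or d < best_d:
                    best_i, best_d = i, d
            if best_i is not None and best_d <= self.dt:
                self.tracks[best_i].points.append((x, y))  # D <= DT: extend track
                updated.add(best_i)
            else:
                self.tracks.append(Track(self._new_bn(), [(x, y)]))  # D > DT: new track
                updated.add(len(self.tracks) - 1)


manager = TrackManager(dt=15)
manager.update([(100, 100), (500, 300)])  # first frame: tracks BN 1 and 2
manager.update([(108, 104), (900, 50)])   # (108, 104) joins BN 1; (900, 50) starts BN 3
print([(t.bn, t.points) for t in manager.tracks])
```

A production tracker would normally use a global assignment (for example the Hungarian algorithm) instead of greedy matching; the greedy loop is kept here only for brevity.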

Step 4-3. Determine from the track information whether the current target is in the multi-channel video cross-coverage state, and use the fast kernelized correlation filtering method (Henriques J F, Caseiro R, Martins P, et al. High-speed tracking with kernelized correlation filters[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(3): 583-596.) to manage the tracks of targets that span multiple screens. The cross-coverage state is determined as follows: when the horizontal position of the target in image I₁ is greater than w₁ and its horizontal track speed is positive, the target track is judged to have reached the image edge; at the same time, when the horizontal position of the target in image I₂ is less than w₂ and its horizontal track speed is negative, the target track is also judged to have reached the image edge. w₁ is generally 3800 and w₂ is generally 50.
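The cross-coverage test of Step 4-3 can be sketched with the embodiment's values w₁ = 3800 and w₂ = 50. Here x is assumed to be the horizontal position within each 3840-pixel-wide channel image, and the horizontal track speed is assumed to be the last frame-to-frame change in x; the function names are hypothetical.

```python
def reaches_right_edge(x, vx, w1=3800):
    # Target beyond w1 and moving toward larger x.
    return x > w1 and vx > 0


def reaches_left_edge(x, vx, w2=50):
    # Target before w2 and moving toward smaller x.
    return x < w2 and vx < 0


def horizontal_speed(xs):
    """Track speed estimated as the last frame-to-frame change in x (assumption)."""
    return xs[-1] - xs[-2] if len(xs) >= 2 else 0


def in_cross_coverage(xs_i, xs_next, w1=3800, w2=50):
    """Cross coverage when the track is at the right edge of image I_i and, at
    the same time, at the left edge of the adjacent image I_{i+1}."""
    return (reaches_right_edge(xs_i[-1], horizontal_speed(xs_i), w1)
            and reaches_left_edge(xs_next[-1], horizontal_speed(xs_next), w2))


print(in_cross_coverage([3790, 3805, 3820], [60, 45, 30]))  # True
```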

Step 4-4. Apply data correction to the track information of consecutive frames to achieve stable multi-target tracking. The data correction method stores the track data of N_k consecutive video frames and computes a weighted average of the track data of the current frame

and the predicted track data of its previous N_k - 1 frames

to generate the corrected track data

The specific operations are as follows:

Here, the result of these operations is the corrected track data; x is the horizontal position coordinate of the target in the track data, y is its vertical position coordinate, w is the target width and h is the target height; σ₁ and σ₂ are weighting factors satisfying σ₁ + σ₂ = 1. N_k is generally 25, σ₁ is generally 0.3 and σ₂ is generally 0.7.
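The correction of Step 4-4 can be sketched with the embodiment's values N_k = 25, σ₁ = 0.3 and σ₂ = 0.7. Because the weighting formula is not reproduced in this text, the minimal sketch assumes that σ₁ weights the current frame's (x, y, w, h) and σ₂ weights the mean of the predicted values of the previous N_k - 1 frames, which are kept in a `deque` with `maxlen=24`. The weight assignment and the names are assumptions.

```python
from collections import deque

N_K = 25


def correct_track(current, predicted_history, sigma1=0.3, sigma2=0.7):
    """Weighted average of the current (x, y, w, h) and the mean of the
    predicted (x, y, w, h) of the previous N_k - 1 frames (assumed form)."""
    if abs(sigma1 + sigma2 - 1.0) > 1e-9:
        raise ValueError("sigma1 + sigma2 must equal 1")
    if not predicted_history:
        return tuple(float(v) for v in current)  # nothing to blend with yet
    n = len(predicted_history)
    mean_pred = [sum(p[k] for p in predicted_history) / n for k in range(4)]
    return tuple(sigma1 * c + sigma2 * m for c, m in zip(current, mean_pred))


history = deque(maxlen=N_K - 1)  # predicted track data of the previous N_k - 1 frames
for _ in range(30):
    history.append((100.0, 50.0, 20.0, 10.0))
print(len(history))                               # 24
print(correct_track((110, 50, 20, 10), history))  # x is pulled toward the predictions
```

Giving the larger weight to the prediction history damps frame-to-frame jitter in position and size, at the cost of some lag when the target manoeuvres.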

The present invention provides a real-time target detection and tracking method based on panoramic multi-channel 4k video images. There are many methods and ways to implement this technical solution, and the above is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as falling within the protection scope of the invention. Any component not specified in this embodiment can be implemented using existing technology.

Claims

A real-time target detection and tracking method based on panoramic multi-channel 4k video images is characterized in that it includes the following steps:

Step 1. Divide the panoramic multi-channel 4k video image into n regions, collect multi-frame target statistics for each region, grade each region of the panoramic video according to its target statistical probability, and set the background modeling parameter threshold of each region according to its level;

Step 2. Perform median filtering on the panoramic video image, initialize the background model, adaptively adjust the background modeling parameter threshold according to the degree of dynamic change of the background to update the background, then process flickering pixels to generate the background image, and finally use a frame difference operation to generate the foreground candidate target region image;

Step 3. Perform median filtering on the candidate target region image, use morphological operations to extract enhanced candidate target regions, compute the connected domains of the enhanced candidate target regions and the minimum bounding rectangle of each connected domain, and eliminate false candidate target frames using target shape features to form target point traces;

Step 4. Perform detection on consecutive frames of the panoramic video image to obtain target point traces; perform dynamic target track management by evaluating the absolute distance between each target point trace and the target tracks and the multi-channel video cross-coverage state; and apply data correction to the track information of consecutive frames to achieve stable multi-target tracking.
The method of claim 1, wherein step 1 comprises the following steps:

Step 1-1: according to the panoramic video image size and scene coverage, divide the panoramic video image into n areas, the n-th area being denoted $S_n$; the width of each area is less than or equal to 1920, and the height of each area is greater than or equal to 1080;

Step 1-2: use the frame difference method to count, over K frames of the panoramic video, the number of frames in which a moving target appears in each area, and divide the n areas into four levels A, B, C and D according to this frequency: an area in which moving targets appear in more than $K_1$ frames is a level-A image area; an area in which moving targets appear in more than $K_2$ and fewer than $K_1$ frames is a level-B image area; an area in which moving targets appear in more than $K_3$ and fewer than $K_2$ frames is a level-C image area; and an area in which moving targets appear in more than $K_4$ and fewer than $K_3$ frames is a level-D image area;

Step 1-3: merge adjacent image areas and record the panoramic position coordinates of each area; the n-th area $S_n$ corresponds to the panoramic position coordinates $(x_n, y_n, w_n, h_n)$, where $(x_n, y_n)$ is the upper-left corner of $S_n$, and $w_n$ and $h_n$ are the width and height of $S_n$ respectively;

Step 1-4: set a background modeling parameter threshold for each of the n regions; the threshold corresponding to the n-th region $S_n$ is $T_n$.
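Steps 1-2 and 1-4 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: a region counts as "containing a moving target" in a frame when the absolute difference to the previous frame exceeds a hypothetical `diff_thr` on at least `min_pixels` pixels; counts equal to a boundary value $K_j$ are assigned to the lower level because the claim leaves equality unspecified; regions with $K_4$ or fewer motion frames are left unclassified (`None`). The level-to-threshold table is supplied by the caller, since the claims give no values for $T_n$.

```python
import numpy as np


def count_motion_frames(frames, regions, diff_thr=25, min_pixels=50):
    """Step 1-2 sketch: for each region (x, y, w, h), count the consecutive-frame
    pairs whose absolute frame difference exceeds diff_thr on >= min_pixels pixels."""
    counts = [0] * len(regions)
    for prev, cur in zip(frames, frames[1:]):
        moving = np.abs(cur.astype(np.int16) - prev.astype(np.int16)) > diff_thr
        for n, (x, y, w, h) in enumerate(regions):
            if np.count_nonzero(moving[y:y + h, x:x + w]) >= min_pixels:
                counts[n] += 1
    return counts


def classify_regions(counts, k1, k2, k3, k4):
    """Map motion-frame counts to levels A/B/C/D (None below K4)."""
    if not k1 > k2 > k3 > k4 >= 0:
        raise ValueError("expected K1 > K2 > K3 > K4 >= 0")
    levels = []
    for c in counts:
        if c > k1:
            levels.append("A")
        elif c > k2:
            levels.append("B")
        elif c > k3:
            levels.append("C")
        elif c > k4:
            levels.append("D")
        else:
            levels.append(None)
    return levels


def region_thresholds(levels, level_to_threshold, default):
    """Step 1-4 sketch: look up each region's threshold T_n from its level."""
    return [level_to_threshold.get(level, default) for level in levels]
```

For example, a 10×10 block that toggles on every frame inside the left region of a two-region image yields a motion count of 4 over 5 frames, which classifies as level A for $K_1 = 3$.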
The method of claim 2, wherein step 2 includes the following steps:

Step 2-1: perform fast median filtering on the panoramic video image to suppress background noise;

Step 2-2: initialize the background model of the panoramic video image using the ViBe algorithm, with the background modeling parameter threshold $T_n$ serving as the Euclidean distance threshold in ViBe;

Step 2-3: adaptively adjust the background modeling parameter threshold $T_n$ according to the degree of dynamic change of the background to complete the background model update;

Step 2-4: process the flickering pixels in the background model to generate the background image;

Step 2-5: compute the difference between the panoramic video image and the background image obtained in step 2-4 to generate the candidate target image $Im_{obj}$, which constitutes the candidate target region.
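The core of steps 2-1, 2-2 and 2-5 can be sketched on a single grayscale channel, where the ViBe Euclidean distance reduces to an absolute difference. The assumptions here: a plain 3×3 median stands in for the unspecified "fast" median; ViBe's sample set is initialized from each pixel's 3×3 neighbourhood with 20 samples and a match count of 2 (common ViBe defaults, not values from the claims); per-region thresholds $T_n$ are painted into a per-pixel map; and the background image used in step 2-5 is passed in directly, since the claims do not say how it is read out of the model. ViBe's random conservative update is omitted.

```python
import numpy as np


def median3x3(img):
    """Step 2-1 stand-in: plain 3x3 median filter with edge replication."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    stack = np.stack([padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)])
    return np.median(stack, axis=0).astype(img.dtype)


def vibe_init(first_frame, n_samples=20, rng=None):
    """Step 2-2: fill each pixel's sample set with values from its 3x3 neighbourhood."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = first_frame.shape
    padded = np.pad(first_frame, 1, mode="edge")
    rows = np.arange(h)[:, None] + 1
    cols = np.arange(w)[None, :] + 1
    samples = np.empty((n_samples, h, w), dtype=first_frame.dtype)
    for k in range(n_samples):
        dy = rng.integers(-1, 2, size=(h, w))
        dx = rng.integers(-1, 2, size=(h, w))
        samples[k] = padded[rows + dy, cols + dx]
    return samples


def threshold_map(shape, regions, thresholds, default=20):
    """Per-pixel T_n map built from regions (x, y, w, h) and their thresholds."""
    tmap = np.full(shape, default, dtype=np.int32)
    for (x, y, w, h), t_n in zip(regions, thresholds):
        tmap[y:y + h, x:x + w] = t_n
    return tmap


def vibe_foreground(frame, samples, tmap, min_matches=2):
    """A pixel is background if at least min_matches samples lie within T_n."""
    dist = np.abs(samples.astype(np.int32) - frame.astype(np.int32))
    matches = np.count_nonzero(dist < tmap, axis=0)
    return matches < min_matches


def candidate_target_image(frame, background, diff_thr=30):
    """Step 2-5: binary candidate image Im_obj from |frame - background|."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    return np.where(diff > diff_thr, 255, 0).astype(np.uint8)
```

Raising $T_n$ inside a region makes that region more tolerant: a pixel that is foreground under $T_n = 20$ becomes background under $T_n = 150$, which is how the level-dependent thresholds of step 1 take effect.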
The method of claim 3, wherein step 2-3 comprises:

The background modeling parameter threshold $T_n$ is used to determine whether a pixel belongs to the background, and the background transformation parameter $\varphi(x, y)$ is defined as:

where $f(i,j)$ is the pixel value of the current frame at position $(i,j)$, $d(i,j)$ is the pixel value of the background model at position $(i,j)$, $M$ is the width of the current frame image, and $N$ is the height of the current frame image;

Set the background transformation factor parameter $\mu$. When the current pixel value is successfully matched with the background model, compute $\varphi(x, y)$: in a static scene $\varphi(x, y)$ tends to a stable value, whereas in a dynamic scene it is larger. The background modeling parameter threshold $T_n$ is then adaptively updated according to the following formula:

where $T_n'$ is the threshold after adaptive adjustment, $\beta$ is the dynamic adjustment factor, and $\mu$ and $\beta$ are both fixed parameters.
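Because the formulas for $\varphi(x, y)$ and $T_n'$ are not reproduced in this text, the following is only a sketch of one plausible reading. It assumes $\varphi$ is the mean absolute difference between $f$ and $d$ over the $M \times N$ image, and it borrows the decision-threshold update of the Pixel-Based Adaptive Segmenter (PBAS): lower $T_n$ by a factor $(1-\beta)$ when $T_n > \mu\varphi$, otherwise raise it by $(1+\beta)$. The clamping bounds `t_min` and `t_max` and all default values are hypothetical.

```python
import numpy as np


def background_variation(frame, background):
    """Assumed form of phi: mean |f(i, j) - d(i, j)| over the M x N image."""
    diff = np.abs(frame.astype(np.float64) - background.astype(np.float64))
    return float(diff.mean())


def adapt_threshold(t_n, phi, mu=5.0, beta=0.05, t_min=5.0, t_max=60.0):
    """Assumed PBAS-style rule: a dynamic scene (large phi) raises T_n,
    a static scene (small phi) lowers it; the result is clamped."""
    if t_n > mu * phi:
        t_new = t_n * (1.0 - beta)
    else:
        t_new = t_n * (1.0 + beta)
    return min(max(t_new, t_min), t_max)
```

With $\mu = 5$, a scene with $\varphi = 10$ pushes $T_n = 20$ up to about 21, while a near-static scene with $\varphi = 1$ pulls it down to about 19, matching the qualitative behaviour the claim describes.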
The method of claim 4, wherein step 2-4 comprises:

For each pixel in the background image generated by background modeling: if the pixel is an edge contour point of the background image but differs from the edge contour points of the previous frame's background image, its flicker frequency level is increased; otherwise, its flicker frequency level is decreased. If the flicker frequency level over K consecutive background frames is greater than $S_{NK}$, the pixel is determined to be a flickering pixel and is removed from the updated background image.
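A per-pixel flicker counter for step 2-4 might look like the sketch below, in the spirit of blinking-pixel handling in ViBe variants. The assumptions: the edge-contour map of each background frame is supplied by some edge extractor the claims do not name; "differs from the previous frame" is read as a newly appearing edge point; the step sizes `inc` and `dec` are hypothetical and deliberately asymmetric so that a pixel toggling every other frame still accumulates level; and the detector only reports a mask once K frames have been seen.

```python
import numpy as np


class FlickerDetector:
    """Hypothetical per-pixel flicker-level counter for background edge points."""

    def __init__(self, shape, k_frames, s_nk, inc=15, dec=1):
        self.level = np.zeros(shape, dtype=np.int32)
        self.prev_edges = None
        self.k_frames = k_frames
        self.s_nk = s_nk
        self.inc, self.dec = inc, dec
        self.frames_seen = 0

    def update(self, edges):
        """edges: boolean edge-contour map of the current background image.
        Returns a boolean mask of pixels judged to be flickering."""
        if self.prev_edges is None:
            changed = np.zeros_like(edges, dtype=bool)
        else:
            changed = edges & ~self.prev_edges  # edge point not present last frame
        self.level = np.where(changed, self.level + self.inc,
                              np.maximum(self.level - self.dec, 0))
        self.prev_edges = edges.copy()
        self.frames_seen += 1
        if self.frames_seen < self.k_frames:
            return np.zeros_like(edges, dtype=bool)
        return self.level > self.s_nk
```

Pixels flagged by the returned mask would then be excluded from the updated background image; how they are filled is not specified in the claims.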
The method of claim 5, wherein step 3 includes the following steps:

Step 3-1: perform median filtering on the candidate target image $Im_{obj}$ to generate the image $Im_{mf}$;

Step 3-2: perform a morphological dilation on the image $Im_{mf}$ to generate the image $Im_{do}$, then perform an AND operation between $Im_{do}$ and the candidate target image $Im_{obj}$ to generate the enhanced candidate target image $Im_{obj2}$;

Step 3-3: perform a morphological closing on the image $Im_{obj2}$, extract the connected domains of the candidate targets, compute the minimum bounding rectangle of each connected domain, and extract the candidate target frames;

Step 3-4: calculate the shape features of each candidate target frame, namely its width obj_w, height obj_h and aspect ratio obj_wh, and determine whether they satisfy obj_w > $w_0$, obj_h > $h_0$ and $wh_0$ ≤ obj_wh ≤ $wh_1$. A candidate target frame that does not meet these requirements is judged to be a false target and deleted; a candidate target frame that meets them is output as a target point trace. Here $w_0$ is the target frame width threshold, $h_0$ is the target frame height threshold, and $wh_1$ and $wh_0$ are the high and low target aspect ratio thresholds respectively. The target point trace includes the frame number, target position coordinates, target width, target height, target aspect ratio and target area.
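Steps 3-1 to 3-4 can be sketched with NumPy alone. The choices below are assumptions rather than details from the claims: a zero-padded 3×3 median, a 3×3 square structuring element, 8-connectivity, an axis-aligned bounding rectangle instead of a possibly rotated minimum-area one, aspect ratio taken as width divided by height, and area taken as the box area. The pure-Python component labelling is far too slow for 4k frames; in practice OpenCV's `cv2.morphologyEx` and `cv2.connectedComponentsWithStats` would do this work.

```python
from collections import deque
from dataclasses import dataclass

import numpy as np


def _neighbourhood(img, pad_value):
    """The nine 3x3-shifted views of a 2-D array, padded with pad_value."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="constant", constant_values=pad_value)
    return [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]


def median3x3(img):
    return np.median(np.stack(_neighbourhood(img, 0)), axis=0).astype(img.dtype)


def dilate(mask):
    return np.logical_or.reduce(_neighbourhood(mask, False))


def erode(mask):
    return np.logical_and.reduce(_neighbourhood(mask, True))


def enhance(im_obj):
    """Steps 3-1 to 3-3: Im_mf, Im_do, Im_obj2 = Im_do AND Im_obj, then closing."""
    im_mf = median3x3(im_obj)
    im_do = dilate(im_mf > 0)
    im_obj2 = im_do & (im_obj > 0)
    return erode(dilate(im_obj2))  # morphological closing


def connected_boxes(mask):
    """Axis-aligned bounding boxes (x, y, w, h) of 8-connected components."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    boxes = []
    for r in range(h):
        for c in range(w):
            if not mask[r, c] or seen[r, c]:
                continue
            seen[r, c] = True
            queue = deque([(r, c)])
            top, bottom, left, right = r, r, c, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny in range(max(y - 1, 0), min(y + 2, h)):
                    for nx in range(max(x - 1, 0), min(x + 2, w)):
                        if mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            boxes.append((left, top, right - left + 1, bottom - top + 1))
    return boxes


@dataclass
class TargetTrace:
    frame_no: int
    x: int
    y: int
    width: int
    height: int
    aspect_ratio: float  # assumed: width / height
    area: int            # assumed: box area width * height


def shape_filter(boxes, frame_no, w0, h0, wh0, wh1):
    """Step 3-4: keep boxes with w > w0, h > h0 and wh0 <= w/h <= wh1."""
    traces = []
    for x, y, w, h in boxes:
        wh = w / h
        if w > w0 and h > h0 and wh0 <= wh <= wh1:
            traces.append(TargetTrace(frame_no, x, y, w, h, wh, w * h))
    return traces
```

On a synthetic image containing an 8×4 blob and one isolated noise pixel, the median removes the noise, the dilate-and-AND step restores the blob's corners, and a single 8×4 box survives the shape filter.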
The method of claim 6, wherein step 4 includes the following steps:

Step 4-1: generate a target track $Tr_i$ from each target point trace $Po_i$ extracted from the first panoramic video frame. Specifically, the target point trace structure, together with an automatically generated batch number BN, is pushed into the target track structure vector; BN is incremented automatically and satisfies 1 ≤ BN ≤ 9999. The target track includes the frame number, target position coordinates, target width, target height, target aspect ratio and target area;

Step 4-2: for each target point trace $Po_{i+1}$ extracted from the next panoramic video frame, calculate its absolute distance $D_{i+1}$ to the target track $Tr_i$, where $D_{i+1}$ is calculated as:

where $Po_{i+1}(x)$ and $Po_{i+1}(y)$ are the abscissa and ordinate of the target point trace, and $Tr_i(x)$ and $Tr_i(y)$ are the abscissa and ordinate of the target track;

If $D_{i+1} \le DT$, the target point trace $Po_{i+1}$ is added to the target track $Tr_i$; if $D_{i+1} > DT$, a new target track $Tr_{i+1}$ is generated from $Po_{i+1}$ according to step 4-1, where $DT$ is the absolute distance judgment threshold;

Step 4-3: determine from the track information whether the current target is in the multi-channel video cross-coverage state, and manage the tracks of targets that span multiple screens;

Step 4-4: apply data correction to the track information of consecutive frames to achieve stable multi-target tracking.
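Steps 4-1 and 4-2 amount to gated nearest-track association with new-track creation. The sketch below is hedged in several ways: the distance formula is not reproduced in this text, so Euclidean distance via `math.hypot` is assumed; when several tracks exist, each point trace is compared against every track's latest point and joins the nearest one within $DT$ (the claims describe only the pairwise test); BN is assumed to wrap from 9999 back to 1; and a point trace is reduced to `(frame_no, x, y)` for brevity.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[int, float, float]  # hypothetical reduced trace: (frame_no, x, y)


@dataclass
class Track:
    bn: int
    points: List[Point] = field(default_factory=list)


def next_batch_number(bn):
    """Assumed wrap-around keeping 1 <= BN <= 9999."""
    return 1 if bn >= 9999 else bn + 1


def init_tracks(first_points, first_bn=1):
    """Step 4-1: one track per point trace of the first frame."""
    tracks, bn = [], first_bn
    for p in first_points:
        tracks.append(Track(bn, [p]))
        bn = next_batch_number(bn)
    return tracks, bn


def associate(tracks, detections, dt, next_bn):
    """Step 4-2: join each point trace to its nearest existing track if D <= DT,
    otherwise start a new track. Returns the next unused batch number."""
    existing = list(tracks)  # tracks created this frame are not matched against
    for det in detections:
        _, px, py = det
        best, best_d = None, math.inf
        for tr in existing:
            _, tx, ty = tr.points[-1]
            d = math.hypot(px - tx, py - ty)
            if d < best_d:
                best, best_d = tr, d
        if best is not None and best_d <= dt:
            best.points.append(det)
        else:
            tracks.append(Track(next_bn, [det]))
            next_bn = next_batch_number(next_bn)
    return next_bn
```

For instance, with one track at (10, 10) and $DT = 5$, a new trace at (12, 11) extends that track, while a trace at (100, 100) opens a new track with the next batch number.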
The method according to claim 7, wherein, in step 4-3, determining whether the current target is in the multi-channel video cross-coverage state according to the track information comprises:

When the horizontal position of the target in the i-th panoramic video frame $I_i$ is greater than the threshold $w_1$ and the target's horizontal track speed is positive, and the horizontal position of the target in the (i+1)-th panoramic video frame $I_{i+1}$ is less than the threshold $w_2$ and its horizontal track speed is negative, the target track is determined to have reached the image edge, i.e., the target is in the multi-channel video cross-coverage state; $I_i$ and $I_{i+1}$ are consecutive frames.
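The cross-coverage test reduces to a predicate over three consecutive horizontal positions of one track. In this sketch the track speed is assumed to be the frame-to-frame displacement; under that assumption a target that leaves the right edge of a wrap-around panorama and reappears at the left produces exactly the "positive, then negative" speed pattern the claim describes. The panorama width and thresholds in the example are hypothetical.

```python
def in_cross_coverage(xs, w1, w2):
    """xs: horizontal positions (x_{i-1}, x_i, x_{i+1}) of one track.
    Speed is assumed to be the frame-to-frame displacement."""
    x_prev, x_i, x_next = xs
    v_i = x_i - x_prev       # horizontal track speed in frame I_i
    v_next = x_next - x_i    # horizontal track speed in frame I_{i+1}
    return x_i > w1 and v_i > 0 and x_next < w2 and v_next < 0
```

With a hypothetical 7680-pixel panorama, $w_1 = 7500$ and $w_2 = 200$, positions (7400, 7600, 100) satisfy the test, while (7400, 7450, 7480) do not.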
The method of claim 8, wherein step 4-4 comprises:

Store the track data of $N_k$ consecutive panoramic video frames, and compute a weighted average of the track data of the current frame and the track data predicted from its previous $N_k - 1$ frames to generate the corrected track data, where $x$ is the target's horizontal position coordinate, $y$ is the target's vertical position coordinate, $w$ is the target width and $h$ is the target height in the track data, and $\sigma_1$ and $\sigma_2$ are weighting factors satisfying $\sigma_1 + \sigma_2 = 1$.
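The correction in step 4-4 can be sketched as a two-term weighted average over $(x, y, w, h)$. The claims do not say how the prediction from the previous $N_k - 1$ frames is formed or which weight applies to which term, so this sketch assumes a constant-velocity extrapolation (average per-frame change across the stored history) and assigns $\sigma_1$ to the current measurement and $\sigma_2$ to the prediction.

```python
import numpy as np


def correct_track(history, sigma1=0.6, sigma2=0.4):
    """history: (N_k, 4) array-like of (x, y, w, h), oldest first; the last row
    is the current frame. Returns sigma1 * current + sigma2 * predicted."""
    if not np.isclose(sigma1 + sigma2, 1.0):
        raise ValueError("sigma1 + sigma2 must equal 1")
    history = np.asarray(history, dtype=float)
    if len(history) < 2:
        raise ValueError("need at least one previous frame")
    current, previous = history[-1], history[:-1]
    if len(previous) >= 2:
        # Assumed constant-velocity prediction from the previous N_k - 1 frames.
        velocity = (previous[-1] - previous[0]) / (len(previous) - 1)
        predicted = previous[-1] + velocity
    else:
        predicted = previous[-1]
    return sigma1 * current + sigma2 * predicted
```

For a target moving 2 pixels per frame whose current measurement jumps to x = 7 instead of the predicted 6, equal weights give a corrected x of 6.5, damping the jump while leaving the unchanged width and height intact.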