CN111539987A - Occlusion detection system and method based on discrimination model - Google Patents


Info

Publication number
CN111539987A
CN111539987A
Authority
CN
China
Prior art keywords
target
template
similarity
background small
shielded
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010251627.XA
Other languages
Chinese (zh)
Other versions
CN111539987B (en)
Inventor
Qiao Yu (乔宇)
Gu Yueyang (谷月阳)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN202010251627.XA
Publication of CN111539987A
Application granted
Publication of CN111539987B
Legal status: Active

Classifications

    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10024 Color image
    • G06T2207/20076 Probabilistic image processing

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an occlusion detection system and method based on a discriminant model, comprising the following steps: step S1: after the tracker outputs the position of the target in the current frame, the occlusion detector generates closely arranged background patches around the target; step S2: in the next frame, if a point of the target box falls within the search range of a background patch, a candidate region of the same scale is generated with that point as its center; step S3: the similarity of the candidate region to the background patch and to the corresponding region of the target template is calculated; if the similarity to the background patch is higher, the candidate region is occluded, otherwise it is not occluded; whether the similarity to the background patch is higher is judged and the judgment result information is obtained; step S4: a mask with the same scale as the target is preset and used for subsequent template updating and position prediction. The present invention overcomes the shortcomings of the context-based occlusion detection algorithms described herein.

Description

Occlusion detection system and method based on discrimination model
Technical Field
The invention relates to the technical field of occlusion detection systems and methods, in particular to an occlusion detection system and method based on a discriminant model.
Background
Video tracking is one of the most important research areas in computer vision and has many widespread applications. A video tracking system can generally be divided into five parts: a Motion Model generates a number of candidate regions that may contain the target, based on the tracking result of the previous frame; a Feature Extractor performs feature extraction on the candidate regions so that they can be classified accurately; an Observation Model uses the extracted features to compute the probability that each candidate region is the target, and takes the candidate region with the maximum probability as the target's position; a Model Updater updates the target template using the output of the current frame; an Ensemble Post-processor integrates multiple output results, where present, into the final output. Video tracking algorithms fall mainly into two broad categories: generative algorithms and discriminative algorithms. A generative algorithm assumes a generative process for the target's appearance and searches the image for the most similar candidate region; a discriminative algorithm, by contrast, usually trains a classifier to distinguish the target from the background. Because background information is exploited, the discriminative approach outperforms the generative one. There is no shortage of classical discriminative algorithms: S. Avidan proposed the Support Vector Tracking algorithm based on support vector machines; Kalal et al. combined target tracking with target detection to propose the Tracking-Learning-Detection algorithm; Babenko used a Multiple Instance Learning algorithm to put positive and negative samples into "bags," which are used to train a classifier.
Two indicators are generally used to evaluate tracking performance: the Overlap Ratio and the Center Location Error. The overlap ratio is the ratio of the intersection to the union of the tracker's output box and the target's true position; the center location error is the Euclidean distance between their center points. Since target scales differ across sequences, the center location error cannot accurately reflect the quality of a tracking result: for a large target, an output box shifted a few pixels from the true position still means good tracking, whereas for a small target a shift of a few pixels likely indicates the tracker has drifted far from the target. Moreover, when a target is occluded the trackers may lose it; in that case, even if two trackers output boxes at different positions, the quantitative evaluation result should be the same, yet the center location error yields different values. Because the overlap ratio is normalized, neither problem arises when it is used as the evaluation index. The accuracy of the target template is one of the important factors affecting the performance of a video tracking algorithm, and template updating is affected by occlusion, appearance change, and the like. Occlusion means that part of the target's appearance information is covered by an occluder, while appearance changes of the target are caused by illumination variation or deformation.
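As a concrete illustration of the two evaluation indicators, the overlap ratio and center location error for axis-aligned (x, y, w, h) boxes can be computed as follows (a minimal sketch; the function names are ours, not the patent's):

```python
import math

def overlap_ratio(box_a, box_b):
    """Overlap Ratio: intersection over union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def center_location_error(box_a, box_b):
    """Center Location Error: Euclidean distance between box centers (pixels)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return math.hypot((ax + aw / 2) - (bx + bw / 2),
                      (ay + ah / 2) - (by + bh / 2))
```

A 10-pixel center error means very different things for a 200-pixel target and a 15-pixel one, whereas the overlap ratio is already normalized by the box sizes.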
Therefore, the update strategy for the target template should differ in these two cases: when the target is occluded, the template should stop updating the occluded region; when the target's appearance changes, the template should learn the change promptly and update accordingly. However, most current algorithms for the occlusion problem treat occlusion as just another kind of appearance change and use the same template update strategy: stop updating when the tracking confidence is low. Such a strategy is clearly unsuitable and may cause the tracking system to lose the target due to occlusion. Context-based occlusion detection algorithms attempt to detect and handle occlusion directly. Such an algorithm considers the target occluded if background patches around the target enter and cover the target region in the next frame. It therefore not only tracks the target with a Kernelized Correlation Filter but also tracks the background patches around the target to obtain the positions of target and background in the next frame, and then computes their overlap ratio; patches whose overlap ratio and tracking confidence both exceed thresholds are considered to occlude the target. This algorithm can fail in complex situations, mainly for two reasons: first, the tracker's result for a background patch is not necessarily accurate, as the patch's actual position may not be at the maximum response peak; second, preset criteria and thresholds are used to decide whether the background covers the target, which is not suitable for all sequences.
Patent document CN109635723A discloses an occlusion detection method and device which obtains a depth map of an image to be detected; when a target bin exists in the histogram of the depth map, the camera collecting the image is determined to be occluded, where the target bin is the histogram bin with the maximum neighborhood gradient and a height greater than a threshold, and the neighborhood gradient is the height difference between a bin and its adjacent bins. This occlusion detection is thus based on the depth map and its histogram without a model trained on samples, avoiding dependence on training samples. The patent still leaves room for improvement in the optimization algorithm.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide an occlusion detection system and method based on a discriminant model.
The occlusion detection method based on the discriminant model provided by the invention comprises the following steps: step S1: after the tracker outputs the position of the target in the current frame, the occlusion detector generates closely arranged background patches around the target; step S2: in the next frame, if a point of the target box falls within the search range of a background patch, generating a candidate region of the same scale with that point as its center; step S3: calculating the similarity of the candidate region to the background patch and to the corresponding region of the target template; if the similarity to the background patch is higher, the candidate region is occluded, otherwise it is not occluded; judging whether the similarity to the background patch is higher and obtaining the judgment result information; step S4: presetting a mask with the same scale as the target, used for subsequent template updating and position prediction.
Preferably, step S3 includes: step S3.1: judging whether the similarity to the background patch is higher; if so, obtaining the judgment result information that the candidate region is occluded.
Preferably, step S3 includes: step S3.2: judging whether the similarity to the background patch is higher; if not, obtaining the judgment result information that the candidate region is not occluded.
Preferably, step S4 includes: step S4.1: setting all values in the mask to 1; after occlusion detection, setting the elements of the mask corresponding to occluded candidate regions to 0; the mask is used for subsequent template updating and position prediction.
The invention provides an occlusion detection system based on a discriminant model, comprising:
module M1: after the tracker outputs the position of the target in the current frame, the occlusion detector generates closely arranged background patches around the target;
module M2: in the next frame, if a point of the target box falls within the search range of a background patch, a candidate region of the same scale is generated with that point as its center;
module M3: calculating the similarity of the candidate region to the background patch and to the corresponding region of the target template; if the similarity to the background patch is higher, the candidate region is occluded, otherwise it is not occluded; judging whether the similarity to the background patch is higher and obtaining the judgment result information;
module M4: a mask with the same scale as the target is preset and used for subsequent template updating and position prediction.
Preferably, the module M3 includes: module M3.1: judging whether the similarity to the background patch is higher; if so, obtaining the judgment result information that the candidate region is occluded.
Preferably, the module M3 includes: module M3.2: judging whether the similarity to the background patch is higher; if not, obtaining the judgment result information that the candidate region is not occluded.
Preferably, the module M4 includes: module M4.1: setting all values in the mask to 1; after occlusion detection, setting the elements of the mask corresponding to occluded candidate regions to 0; the mask is used for subsequent template updating and position prediction.
Preferably, the system comprises: an input image module, a target tracker, an occlusion detector, a local template updater, a target predictor, and a target modeler; the input image module is connected with the target tracker; the target tracker is connected with the occlusion detector; the occlusion detector is connected with the target modeler and with the local template updater; the local template updater is connected with the target modeler and the target predictor; the target predictor is connected with the target modeler.
Preferably, the target tracker can send tracking result information to the occlusion detector; the occlusion detector can send updated mask information to the local template updater; the local template updater can send occlusion-ratio information to the target predictor; the occlusion detector can send occluder template information to the target modeler; the local template updater can send first target template information to the target modeler; the target predictor can send search range information to the target modeler; the target modeler can send second target template information to the occlusion detector; and the target modeler can send target model information to the target tracker.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention provides an occlusion detection method based on a discriminant model, which overcomes the shortcomings of the context-based occlusion detection algorithm described above;
2. the method uses a discriminant model to classify the candidate regions within the tracker's output box: regions belonging to the target are considered unoccluded, and regions belonging to the background are considered occluded;
3. the invention compares the candidate regions in the output box with the target template and with the background respectively, uses the discriminant model to judge whether the target is occluded, and adopts a local template update strategy so that appearance changes of the target can still be captured while the target is occluded, tracking the target more accurately.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
fig. 1 is a schematic diagram of the working principle of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but do not limit it in any way. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention, all of which fall within the scope of the present invention.
As shown in fig. 1, the occlusion detection method based on the discriminant model according to the present invention includes: step S1: after the tracker outputs the position of the target in the current frame, the occlusion detector generates closely arranged background patches around the target; step S2: in the next frame, if a point of the target box falls within the search range of a background patch, generating a candidate region of the same scale with that point as its center; step S3: calculating the similarity of the candidate region to the background patch and to the corresponding region of the target template; if the similarity to the background patch is higher, the candidate region is occluded, otherwise it is not occluded; judging whether the similarity to the background patch is higher and obtaining the judgment result information; step S4: presetting a mask with the same scale as the target, used for subsequent template updating and position prediction.
Preferably, step S3 includes: step S3.1: judging whether the similarity to the background patch is higher; if so, obtaining the judgment result information that the candidate region is occluded.
Preferably, step S3 includes: step S3.2: judging whether the similarity to the background patch is higher; if not, obtaining the judgment result information that the candidate region is not occluded.
Preferably, step S4 includes: step S4.1: setting all values in the mask to 1; after occlusion detection, setting the elements of the mask corresponding to occluded candidate regions to 0; the mask is used for subsequent template updating and position prediction.
The invention provides an occlusion detection system based on a discriminant model, comprising: module M1: after the tracker outputs the position of the target in the current frame, the occlusion detector generates closely arranged background patches around the target; module M2: in the next frame, if a point of the target box falls within the search range of a background patch, a candidate region of the same scale is generated with that point as its center; module M3: calculating the similarity of the candidate region to the background patch and to the corresponding region of the target template; if the similarity to the background patch is higher, the candidate region is occluded, otherwise it is not occluded; judging whether the similarity to the background patch is higher and obtaining the judgment result information; module M4: a mask with the same scale as the target is preset and used for subsequent template updating and position prediction.
Preferably, the module M3 includes: module M3.1: judging whether the similarity to the background patch is higher; if so, obtaining the judgment result information that the candidate region is occluded.
Preferably, the module M3 includes: module M3.2: judging whether the similarity to the background patch is higher; if not, obtaining the judgment result information that the candidate region is not occluded.
Preferably, the module M4 includes: module M4.1: setting all values in the mask to 1; after occlusion detection, setting the elements of the mask corresponding to occluded candidate regions to 0; the mask is used for subsequent template updating and position prediction.
Preferably, the system comprises: an input image module, a target tracker, an occlusion detector, a local template updater, a target predictor, and a target modeler; the input image module is connected with the target tracker; the target tracker is connected with the occlusion detector; the occlusion detector is connected with the target modeler and with the local template updater; the local template updater is connected with the target modeler and the target predictor; the target predictor is connected with the target modeler.
Preferably, the target tracker can send tracking result information to the occlusion detector; the occlusion detector can send updated mask information to the local template updater; the local template updater can send occlusion-ratio information to the target predictor; the occlusion detector can send occluder template information to the target modeler; the local template updater can send first target template information to the target modeler; the target predictor can send search range information to the target modeler; the target modeler can send second target template information to the occlusion detector; and the target modeler can send target model information to the target tracker.
Specifically, in one embodiment, a discriminant-model-based occlusion detection system comprises: an occlusion detector, which uses the discriminant model to judge whether each candidate region in the output box belongs to an occluder or to the target, thereby detecting whether the patch is occluded; its output is a mask with the same scale as the target, whose elements represent the occlusion state of the corresponding positions on the target. A local template updater, which updates the target template according to the mask output by occlusion detection; the update strategy is not simply "update" or "do not update," but updates the unoccluded regions of the target while leaving the occluded regions unchanged. A target predictor: in a video tracking test sequence the target generally does not move far between two adjacent frames, so the tracker's search range is usually small; when the target is heavily occluded, the output may drift away from it, so that the target could fall outside the search range once it emerges from occlusion. The target predictor enlarges the search range under heavy occlusion, ensuring that the target can be sampled even if it lies far from the output position. A target modeler: tracking the target in subsequent frames requires a target template and a search range, and occlusion detection compares candidate regions with the background and with patches that already occlude the target; the modeler therefore caches the occluder template, the target template, and the search range for later target tracking and occlusion detection.
Preferably, occlusion detection does not rely on preset criteria and thresholds; instead, it calculates the similarity between a candidate region in the target box and the corresponding region of the target template and the background regions, and classifies the candidate region with the discriminant model to determine whether it is occluded.
Preferably, the local template updater updates the template locally using the output of the current frame.
Specifically, in one embodiment, an occlusion detection method based on a discriminant model includes the following steps: after the tracker outputs the position of the target in the current frame, the occlusion detector generates closely arranged background patches around the target; in the next frame, if a point of the target box falls within the search range of a background patch, a candidate region of the same scale is generated with that point as its center; the similarity between the candidate region and the background patch, and between the candidate region and the corresponding region of the target template, is calculated; if the similarity to the background patch is higher, the candidate region is occluded, otherwise it is not. A mask with the same scale as the target is preset with all values set to 1; after occlusion detection, the elements of the mask corresponding to occluded candidate regions are set to 0, and the mask is used for subsequent template updating and position prediction.
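The mask construction described above can be sketched as follows (an illustrative implementation; the (row, col, height, width) region layout is our assumption, not specified by the patent):

```python
import numpy as np

def build_update_mask(target_shape, occluded_regions):
    """Preset a mask of ones at the target's scale, then zero the elements
    corresponding to candidate regions judged occluded.
    occluded_regions: iterable of (row, col, height, width) in mask coordinates."""
    mask = np.ones(target_shape, dtype=np.float32)
    for r, c, h, w in occluded_regions:
        mask[r:r + h, c:c + w] = 0.0
    return mask
```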
Specifically, in one embodiment, as shown in fig. 1, the discriminant-model-based video occlusion detection system comprises:
A target tracker, which acquires the initial position of the target in the first frame and tracks the target in subsequent frames. Specifically: the tracker first samples around the position output in the previous frame to construct a circulant matrix, which has the property that it can be diagonalized by the discrete Fourier transform, namely:
X = F · diag(x̂) · F^H    (1)
where F is the discrete Fourier matrix, x̂ is the discrete Fourier transform of the first row of the circulant matrix X, H denotes the conjugate transpose, and diag denotes a diagonal matrix. The tracker then performs detection using the trained target model:
f̂(z) = k̂^{xz} ⊙ α̂    (2)
where z represents the image to be tracked, x represents the image of the first frame, k^{xz} is the first row of the kernel matrix of x and z, ^ denotes the discrete Fourier transform, and α represents the coefficient vector in the dual space, which can be calculated by:
α̂ = ŷ / (k̂^{xx} + λ)    (3)
where y represents a Gaussian-distributed label, x represents the image of the first frame, λ represents a regularization coefficient, and ^ denotes the discrete Fourier transform. Choosing a Gaussian kernel gives:
k^{xx'} = exp( −(1/σ²) · ( ‖x‖² + ‖x'‖² − 2·F^{−1}(x̂* ⊙ x̂') ) )    (4)
where x and x' represent the two input images, exp denotes the exponential function with the natural constant as its base, σ represents the standard deviation, ^ denotes the discrete Fourier transform, F^{−1} denotes the inverse discrete Fourier transform, ‖·‖ denotes the modulus, * denotes conjugation, and ⊙ denotes element-wise multiplication. This yields a response matrix, and the position of the response peak is taken as the target's position in the current frame.
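The KCF training and detection steps of equations (1)–(4) can be sketched for a single-channel image as follows (a simplified illustration assuming raw pixels rather than the multi-channel features a practical tracker would use; function names are ours):

```python
import numpy as np

def gaussian_kernel_correlation(x, z, sigma):
    """Kernel correlation k^{xz} of eq. (4), computed with the FFT trick.
    Single-channel sketch; the distance is normalized by the number of pixels."""
    cross = np.real(np.fft.ifft2(np.conj(np.fft.fft2(x)) * np.fft.fft2(z)))
    d2 = np.sum(x ** 2) + np.sum(z ** 2) - 2.0 * cross
    return np.exp(-np.maximum(d2, 0.0) / (sigma ** 2 * x.size))

def kcf_train(x, y, sigma, lam):
    """Eq. (3): alpha_hat = y_hat / (k_hat^{xx} + lambda)."""
    k = gaussian_kernel_correlation(x, x, sigma)
    return np.fft.fft2(y) / (np.fft.fft2(k) + lam)

def kcf_detect(alpha_hat, x, z, sigma):
    """Eq. (2): response = F^{-1}(k_hat^{xz} * alpha_hat); the response peak
    gives the target's position in the current frame."""
    k = gaussian_kernel_correlation(x, z, sigma)
    return np.real(np.fft.ifft2(np.fft.fft2(k) * alpha_hat))
```

Cyclically shifting the search image shifts the response peak by the same amount, which is what makes the peak location usable as the target position.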
An occlusion detector, which uses the discriminant model to judge whether a candidate region in the target box belongs to the target or to the background. First, patches of equal size, closely arranged, are generated around the target, and similar regions are searched for in the next frame, with a search range twice the patch size. In the next frame, if a point of the target box falls within the search range of some background patch, a candidate region of the same size is generated with that point as its center; the discriminant model then calculates the similarity between the candidate region and the background patch, and between the candidate region and the target template, using the following similarity formula:
ρ(r₁, r₂) = ⟨φ(r₁) − mean(φ(r₁)), φ(r₂) − mean(φ(r₂))⟩ / ( ‖φ(r₁) − mean(φ(r₁))‖ · ‖φ(r₂) − mean(φ(r₂))‖ )    (5)
where φ(·) represents a feature extractor that aggregates Histogram of Oriented Gradient (HOG) features and Color Name features, r₁ and r₂ represent image blocks, ‖·‖ represents the modulus, mean(·) represents the mean value, and ρ represents the similarity. The same candidate region may correspond to several background patches; it is compared with each of them, and the maximum similarity is taken as the probability that the candidate region is background.
When calculating the similarity between the candidate region and the target template, considering that the target may rotate or otherwise change pose between two adjacent frames, the similarity is computed against a template block of somewhat larger scale at the corresponding position. Specifically, the candidate region is slid over the template block, the similarity is computed at each offset, and the maximum similarity is taken as the probability that the candidate region is the target. Once the probabilities that the candidate region belongs to the background and to the target are obtained, the discriminant model makes the judgment: if the probability of belonging to the background is greater than the probability of belonging to the target, the candidate region is occluded; otherwise it is not. After all candidate regions have been judged, a template-update mask of the same size as the target scale is set, in which elements corresponding to unoccluded regions are set to 1 and elements corresponding to occluded regions are set to 0. The mask is used for local updates to the target template.
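The similarity comparison and discriminant judgment above can be sketched as follows (an illustrative version in which the feature extractor φ is the identity rather than HOG + Color Name; names and structure are our assumptions):

```python
import numpy as np

def similarity(f1, f2):
    """Zero-mean normalized correlation between two feature maps, in the
    spirit of eq. (5); here the feature extractor is the identity."""
    a = f1 - f1.mean()
    b = f2 - f2.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def is_occluded(candidate, background_patches, template_blocks):
    """Discriminant judgment: the candidate region is occluded iff its best
    similarity to the background patches beats its best similarity to the
    (slid) template blocks."""
    p_background = max(similarity(candidate, p) for p in background_patches)
    p_target = max(similarity(candidate, t) for t in template_blocks)
    return p_background > p_target
```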
A local template updater: when occlusion occurs, the tracker should learn the features of the unoccluded part of the image in the output box and ignore the occluded part. Accordingly, the template updater locally updates the template using the template-update mask. After the tracker outputs the tracking result, the new target template is obtained by the following formula:
x_t ← x_{t−1}·M·(1−β) + x_c·M·β + x_{t−1}·(1−M)    (6)
where M denotes the template-update mask, x_{t−1} the template of the previous frame, x_c the image inside the output box of the current frame, x_t the template of the current frame, and β the learning rate. Clearly, for unoccluded regions the new template is the weighted sum of the current-frame target and the previous-frame template:
x_t ← x_{t−1}·(1−β) + x_c·β    (7)
where x_{t−1} denotes the template of the previous frame, x_c the image inside the output box of the current frame, x_t the template of the current frame, and β the learning rate.
For regions where the target is occluded, the corresponding elements of the mask are 0, so the template remains unchanged:
x_t ← x_{t−1}·(1−M) = x_{t−1}    (8)
where M denotes the template-update mask, x_{t−1} the template of the previous frame, and x_t the template of the current frame.
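Equation (6), together with its special cases (7) and (8), can be sketched directly:

```python
import numpy as np

def update_template(x_prev, x_cur, mask, beta):
    """Eq. (6): blend the current frame into the template only where mask == 1;
    where mask == 0 (occluded), the old template is kept, as in eq. (8)."""
    return x_prev * mask * (1.0 - beta) + x_cur * mask * beta + x_prev * (1.0 - mask)
```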
A target predictor: in a video tracking test sequence the target generally does not move far between two adjacent frames, so the tracker's search range is usually small; when the target is heavily occluded, the output may drift away from it, so that the target could fall outside the search range once the occlusion ends. When the target tracker is initialized, n correlation filters of different scales are trained; when heavy occlusion of the target is detected, the target predictor enlarges the tracker's search range and applies the filter of the matching scale trained at initialization; after the target emerges from occlusion, the target predictor shrinks the search range again to avoid oversampling useless background information.
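The predictor's expand/shrink behaviour can be sketched as follows (the 0.5 occlusion-ratio threshold and the scale values are illustrative assumptions; the patent specifies neither):

```python
import numpy as np

def occlusion_ratio(mask):
    """Fraction of the template-update mask marked occluded (zeros)."""
    return 1.0 - float(np.asarray(mask).mean())

def choose_search_scale(mask, scales, threshold=0.5):
    """Use the widest pre-trained filter scale under heavy occlusion,
    the narrowest otherwise."""
    return max(scales) if occlusion_ratio(mask) > threshold else min(scales)
```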
The target modeler caches the target template, the occluder template, and the search range of the previous frame for tracking the target in the next frame. The target template reflects the appearance of the target: in the first frame it is the image at the ground-truth position, and in subsequent frames it is continuously updated. The occluder template is the background patch found to be more similar to a candidate region during occlusion detection, and reflects the appearance of the object occluding the target; in subsequent occlusion detection, the occluder template is treated as a background patch and its similarity to the candidate regions is computed. The search range takes a fixed value under normal conditions; when the target is severely occluded, the search range is enlarged so that the target stays within it even if the output result drifts away from the target, and after the occlusion ends the tracking system can recapture the target.
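The per-frame state cached by the target modeler can be represented as a small container; the class and field names are illustrative assumptions:

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class TargetModel:
    """State cached by the target modeler between frames."""
    target_template: np.ndarray                              # appearance of the target
    occluder_templates: list = field(default_factory=list)   # background patches matched to candidates
    search_range: float = 1.0                                # enlarged while the target is occluded

    def add_occluder(self, patch):
        # store the occluder's appearance for later similarity checks
        self.occluder_templates.append(patch)
```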
The invention provides an occlusion detection method based on a discriminant model, which overcomes the shortcomings of context-based occlusion detection algorithms. The method classifies the candidate regions in the tracker's output frame with a discriminant model: regions classified as belonging to the target are considered unoccluded, while regions classified as belonging to the background are considered occluded. The candidate regions in the output frame are compared with the target template and with the background respectively, the discriminant model judges whether the target is occluded, and the local template update strategy captures the appearance changes of the target while it is occluded, so that the target is tracked more accurately.
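The per-candidate decision rule described above (more similar to the background patch than to the template region → occluded) can be sketched as follows. Normalized cross-correlation is used here as an illustrative similarity measure; the patent does not prescribe a specific metric:

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation between two equally sized patches."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float((a * b).mean())

def is_occluded(candidate, template_region, background_patch):
    """A candidate region is judged occluded when it is more similar to
    the background patch than to the corresponding template region."""
    return ncc(candidate, background_patch) > ncc(candidate, template_region)
```

Running this test over all candidate regions yields the judgment result information; elements of the update mask corresponding to occluded candidates are then set to 0.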
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (10)

1. An occlusion detection method based on a discriminant model, comprising:
step S1: after the tracker outputs the position of the target in the current frame, the occlusion detector generates closely arranged background patches around it;
step S2: in the next frame, if a point of the target box falls within the search range of a background patch, generating a candidate region of the same scale centered on that point;
step S3: calculating the similarity of the candidate region to the background patch and to the corresponding region of the target template, wherein if the similarity to the background patch is higher, the candidate region is occluded, and otherwise it is not occluded; judging whether the similarity to the background patch is higher, and obtaining judgment result information;
step S4: presetting a mask of the same scale as the target for subsequent template updating and position prediction.
2. The occlusion detection method based on a discriminant model according to claim 1, wherein step S3 comprises: step S3.1: judging whether the similarity to the background patch is higher, and if so, obtaining judgment result information that the candidate region is occluded.
3. The occlusion detection method based on a discriminant model according to claim 1, wherein step S3 comprises: step S3.2: judging whether the similarity to the background patch is higher, and if not, obtaining judgment result information that the candidate region is not occluded.
4. The occlusion detection method based on a discriminant model according to claim 1, wherein step S4 comprises: step S4.1: setting all values in the mask to 1; after occlusion detection, setting the elements of the mask corresponding to occluded candidate regions to 0, and using the mask for subsequent template updating and position prediction.
5. An occlusion detection system based on a discriminant model, comprising:
module M1: after the tracker outputs the position of the target in the current frame, the occlusion detector generates closely arranged background patches around it;
module M2: in the next frame, if a point of the target box falls within the search range of a background patch, generating a candidate region of the same scale centered on that point;
module M3: calculating the similarity of the candidate region to the background patch and to the corresponding region of the target template, wherein if the similarity to the background patch is higher, the candidate region is occluded, and otherwise it is not occluded; judging whether the similarity to the background patch is higher, and obtaining judgment result information;
module M4: presetting a mask of the same scale as the target for subsequent template updating and position prediction.
6. The discriminative model-based occlusion detection system of claim 5, wherein module M3 comprises:
module M3.1: judging whether the similarity to the background patch is higher, and if so, obtaining judgment result information that the candidate region is occluded.
7. The discriminative model-based occlusion detection system of claim 5, wherein module M3 comprises:
module M3.2: judging whether the similarity to the background patch is higher, and if not, obtaining judgment result information that the candidate region is not occluded.
8. The discriminative model-based occlusion detection system of claim 5, wherein module M4 comprises: module M4.1: setting all values in the mask to 1; after occlusion detection, setting the elements of the mask corresponding to occluded candidate regions to 0, and using the mask for subsequent template updating and position prediction.
9. The discriminative model-based occlusion detection system of any of claims 5-8, comprising: an input image module, a target tracker, an occlusion detector, a local template updater, a target predictor and a target modeler;
the input image module is connected with the target tracker;
the target tracker is connected with the occlusion detector;
the occlusion detector is connected with the target modeler and the local template updater respectively;
the local template updater is connected with the target modeler and the target predictor;
the object predictor is connected with the object modeler.
10. The discriminative model-based occlusion detection system of claim 9, wherein the target tracker can send tracking result information to the occlusion detector;
the occlusion detector can send updated mask information to the local template updater;
the local template updater can send occlusion ratio information to the target predictor;
the occlusion detector can send occluder template information to the target modeler;
the local template updater can send first target template information to the target modeler;
the target predictor can send search range information to the target modeler;
the target modeler can send second target template information to the occlusion detector;
the target modeler can send target model information to the target tracker.
CN202010251627.XA 2020-04-01 2020-04-01 Occlusion detection system and method based on discrimination model Active CN111539987B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010251627.XA CN111539987B (en) 2020-04-01 2020-04-01 Occlusion detection system and method based on discrimination model


Publications (2)

Publication Number Publication Date
CN111539987A true CN111539987A (en) 2020-08-14
CN111539987B CN111539987B (en) 2022-12-09

Family

ID=71952115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010251627.XA Active CN111539987B (en) 2020-04-01 2020-04-01 Occlusion detection system and method based on discrimination model

Country Status (1)

Country Link
CN (1) CN111539987B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112927262A (en) * 2021-03-22 2021-06-08 瓴盛科技有限公司 Camera lens shielding detection method and system based on video
CN115082526A (en) * 2022-07-26 2022-09-20 复亚智能科技(太仓)有限公司 Target tracking method and device
CN117876970A (en) * 2024-03-11 2024-04-12 青岛三诚众合智能设备科技有限公司 Workshop intelligent management method and system based on image processing and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102324030A (en) * 2011-09-09 2012-01-18 广州灵视信息科技有限公司 Target tracking method and system based on image block characteristics
CN105405151A (en) * 2015-10-26 2016-03-16 西安电子科技大学 Anti-occlusion target tracking method based on particle filtering and weighting Surf
CN109886994A (en) * 2019-01-11 2019-06-14 上海交通大学 Adaptive sheltering detection system and method in video tracking
CN110414439A (en) * 2019-07-30 2019-11-05 武汉理工大学 Anti- based on multi-peak detection blocks pedestrian tracting method
CN110443829A (en) * 2019-08-05 2019-11-12 北京深醒科技有限公司 It is a kind of that track algorithm is blocked based on motion feature and the anti-of similarity feature


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Dapeng Chen et al., "Constructing Adaptive Complex Cells for Robust Visual Tracking", Proceedings of the IEEE International Conference on Computer Vision (ICCV) *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant