CN105989614B

CN105989614B - Dangerous object detection method fusing multi-source visual information

Info

Publication number: CN105989614B
Application number: CN201510080128.8A
Authority: CN
Inventors: 袁媛; 王�琦; 房建武
Original assignee: XiAn Institute of Optics and Precision Mechanics of CAS
Current assignee: XiAn Institute of Optics and Precision Mechanics of CAS
Priority date: 2015-02-13
Filing date: 2015-02-13
Publication date: 2020-09-01
Anticipated expiration: 2035-02-13
Also published as: CN105989614A

Abstract

The invention provides a dangerous object detection method fusing multi-source visual information, which comprises the following steps: 1, multi-source visual image acquisition; incremental motion consistency considerations; 3, fusing multi-source visual information; and 4, calculating the detection rate. The method for detecting the dangerous object fusing the multi-source visual information solves the technical problems that the detection category of the dangerous object is limited and various information is not effectively utilized in the prior art.

Description

Dangerous object detection method fusing multi-source visual information

Technical Field

The invention belongs to the field of computer vision and image understanding, and particularly relates to a dangerous object detection method fusing multi-source visual information in video monitoring.

Background

The automatic prediction of dangerous objects which may appear during driving is a key technology in video monitoring. In general, detection of dangerous objects becomes difficult due to complicated object types, variable monitoring environments, and severe camera shake. At present, detection methods for dangerous objects are mainly divided into two main categories:

one is a detector-based approach that trains pedestrian or vehicle detectors in advance using manually collected pedestrian or vehicle samples, and then detects corresponding targets in the surveillance video. Xu et al, in the references "Y.xu, D.xu, S.Lin, T.Han, X.Cao, and X.Li.detection of Sudden Pedestrian crossing for driving Assistance Systems IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics,42(3): 729-. This document is novel to train with partial pedestrian samples so that pedestrians can be detected as soon as they are discovered. Rezaei and Terrauchi propose a vehicle Detection method that combines multiple features and the Dempster-craft Technology in the literature "M.Rezaei and M.Terrauchi. vehicle Detection base on Multi-feature reagents and Dempster-craft Fusion theory. in Proceedings of Pacific-RimSymposum on Image and Video Technology,2013, pp.60-72". While these methods may provide some level of detection of dangerous objects, they have the disadvantage of requiring additional training samples and of not covering all objects present in front of the vehicle.

The other is a method based on the fusion of significance and color features, which introduces an attention selection mechanism in psychology into danger detection by means of significance detection. For example, Alonso et al, in the documents "J.Alonso, E.R.Vidal, A.rotter, and M.Muhlenberg.Lane-change precision air System Based on motion-drive Vehicle tracking. IEEE Transactions on Vehicle Technology,57(5): 2736-. The method has the disadvantages that the motion significance of one side of the visual field is only considered, and the position of the dangerous object is uncertain in the real driving process, namely the dangerous object can appear in the left and right visual fields which are shot.

Disclosure of Invention

The invention provides a novel dangerous object detection method fusing multi-source visual information, and solves the technical problems that in the prior art, the detection category of dangerous objects is limited, and a plurality of kinds of information are not effectively utilized.

The technical solution of the invention is as follows:

a dangerous object detection method fusing multi-source visual information comprises the following steps:

multi-source visual image acquisition:

1.1, acquiring a color video image and a near-infrared video image in real time by using a multispectral camera;

1.2, obtaining a depth image corresponding to the color video image by using a single-sequence depth recovery method;

1.3, obtaining a motion image corresponding to the color video image by using a correlation optical flow method;

1.4, segmenting each frame image in the moving image by utilizing a linear iterative clustering method to obtain a superpixel grid;

1.5, overlapping the super pixel grids on the color video image, the near infrared video image and the depth image;

incremental motion consistency considerations:

2.1, dividing the moving image into a left moving video frame and a right moving video frame, wherein the dividing line is a central axis of the moving image;

2.2 training a normal motion mode base A including a left normal motion mode base A by using a super-pixel motion mode obtained by initial F frame segmentation^lAnd the right normal movement mode base A^r(ii) a Wherein F is a value rangeThe circumference is 5-20;

2.3 at time t, the images of all channels are divided into N superpixels, and the characteristics y corresponding to the N superpixels in the moving image are calculatedⁱWherein i is 1: N;

2.4, constructing a graph regular minimum soft threshold mean square objective formula model:

wherein U is the constructed Gauss-Laplace error term and Y is all YⁱThe matrix is formed, X is the sparse coefficient to be solved, L is Laplace matrix, lambda₁Constraint coefficients for the gaussian-laplacian noise sparse term,

constraint coefficients for geometric prevalence regularization terms;

2.5, obtaining a calculation result of the danger confidence value of the motion image:

and (3) integrating the calculation results of the left side and the right side:

wherein,

β is the error balance coefficient of the left and right sides;

2.6 obtaining the confidence of danger under the consideration of the motion information of the whole motion image

Will be provided with

Is normalized to [0,1 ] using a max-min normalization method]To (c) to (d);

multi-source visual information fusion:

3.1 calculating the significance results of the color video image, the near-infrared video image and the depth image respectively by using a significance calculation method based on image evaluation to obtain the danger confidence coefficient S of the color video image_t ^CAnd the danger confidence of the near-infrared video image

And depth image risk confidence

3.2 Bayes model with significance

Fusing to obtain dangerous confidence maps of motion images, color video images, near-infrared video images and depth images

3.2.1 ] calculating the prior probability Pr (O):

obtaining an element distribution diagram according to the occurrence frequency of the superpixel features in the image space

Wherein the OPT is an element distribution diagram index; then the prior probability is

3.2.2 ] calculating likelihood probability Pr (S (z) | O):

will obtain

Binarization and calculating the target areaFalling of a domain on a corresponding original visual image

Number of pixels corresponding to the numerical value

Calculating the fall-in of the background region on the corresponding original visual image

Number of pixels corresponding to the numerical value

The likelihood probability of the target and the background is

3.3. calculating the fusion probability of all visual information after Bayes:

wherein,

and 4, calculating the detection rate:

repeating the steps 2-3 for each frame of image until the whole video image is processed; marking the real dangerous object area in the t-th frame of the video as G_tThe detection rate is as follows:

TPR＝TP/P,

FPR＝FP/N.

wherein TP is the number of pixels with correct detection, FP is the number of pixels with wrong detection, and P is G_tThe number of target pixel points in (1) is G_tAnd the number of the middle background pixel points.

Lambda of above₁A value of 0.05, said

The value is 0.005, and the value of β is 0.4.

The element distribution map index OPT in step 3.2.1 above is:

and F in the step 2.2 is 10.

The invention has the advantages that:

the invention simultaneously and synergistically considers the complementarity and selectivity of the multi-source visual information, and the obtained road dangerous object detection result is obviously superior to other methods.

Drawings

FIG. 1 is a flow chart of a dangerous object detection method fusing multi-source visual information according to the present invention.

Detailed Description

Referring to fig. 1, the steps implemented by the present invention are as follows:

step 1, a multi-source visual image acquisition module.

(1a) The method comprises the steps of utilizing a multispectral camera to obtain color and near-infrared video images in real time, then utilizing a single-sequence depth recovery method to obtain depth images corresponding to the color videos, and utilizing a correlation optical flow method to obtain motion images corresponding to the color videos. The motion image is divided into a specified number of super pixel images by a Linear Iterative Clustering (SLIC) method, and super pixel grids are superposed on the near infrared, depth and color images so as to facilitate super pixel feature calculation.

And step 2, an incremental motion consistency consideration module.

(2a) Dividing the motion image into a left part and a right part, wherein the dividing line is a central axis of the motion image;

(2b) respectively training a left normal motion mode base A by using a super-pixel motion mode obtained by initial 10-frame segmentation^lAnd the right normal movement mode base A^r. Method for minimum soft threshold mean square incremental expression by graph regularizationConsider dangerous motion information within the driver's field of view. Since the motion consistency is considered in the same manner for both sides, for the sake of brevity, A^lAnd A^rDenoted by a. Suppose that at time t, the images of all channels are divided into N superpixels, and the features y corresponding to the N superpixels in the moving image are calculated_iAnd i is 1: N, and then constructing a graph regular minimum soft threshold mean square target formula model:

u is the constructed Gauss-Laplace error term and Y is all Y_iThe matrix is formed, X is the sparse coefficient to be solved, L is Laplace matrix, lambda₁(0.05 in all examples) is the constraint coefficient of the sparse term of the laplacian of gaussian noise,

(0.005 in all examples) is sparse constrained by geometric prevalence regularities.

(2c) Obtaining a risk confidence value calculation result of the moving image:

wherein

β is the balance coefficient of the errors on the left and right sides, 0.4 is taken in all the examples, and the danger confidence coefficient under the consideration of the whole motion image motion information is obtained

Will be provided with

Is normalized to [0,1 ] using a max-min normalization method]In the meantime.

And step 3, a multi-source visual information fusion module.

(3a) The saliency results of color, near-infrared and depth channel images, respectively expressed as C.Yang, L.Zhang, H.Lu, X.Ruan, and M.Yang.Saliency detection via Graph-based Artificial ranking, in Proceedings of IEEE Conference on computer Vision and Pattern Recognition,2013: 3166-

These results may also be referred to as a risk confidence map for color, near infrared and depth images.

(3b) Motion, color, near infrared and depth image danger confidence coefficient graph obtained by utilizing significance Bayes model fusion

The significant Bayesian model is as follows:

1) the prior probability pr (o) is calculated. Unlike previous significance fusion methods, the present invention utilizes a more efficient elemental distribution map

To estimate a priori probabilities of including dangerous objects in the visual images. Wherein the elemental distribution map utilizes the literature "f.

Y.Pritch,A.Hornung.Saliency filters:ContrastBased Filtering for Salient Region Detection.In Proceedings of IEEEConf.Computer Vision and Pattern Recognition,2012:733740' are calculated. The method is characterized in that the spatial variance change of the super-pixel feature to be detected in the image is calculated, namely the occurrence probability of the super-pixel feature to be detected at other positions of the image is calculated. The larger the value of the elemental distribution map, the less physical. Then the prior probability of the object

And the optimal element profile index OPT is:

2) calculating the similarity probability Pr (S (z) | O). Will obtain

Binarization and then calculating the falling of the target region on the corresponding original visual image

Number of pixels corresponding to the numerical value

Simultaneously calculating the falling of the background area on the corresponding original visual image

Number of pixels corresponding to the numerical value

The likelihood probabilities of the target and the background are:

(3c) calculating the fusion probability of all visual information after Bayesian:

wherein,

and 4, calculating the detection rate.

And (5) executing the step (2) and the step (3) in each frame in the image until the whole video is processed. The real dangerous object area in the t-th frame of the marked video is Gt. The detection rate is the ROC curve calculation formula:

TPR＝TP/P,FPR＝FP/N.

wherein TP is the number of pixel points with correct detection, FP is the number of pixel points with wrong detection, and P is G_tThe number of target pixel points in (1) is G_tAnd the number of the middle background pixel points.

A process flow for 12 segments of multi-source video images comprises the following processing steps:

1. determining simulation conditions

The invention is in the central processing unit

And (3) simulating by using MATLAB software on an i 3-32403.3 GHz CPU and a memory 4G, WINDOWS 7 operating system.

The data used in the simulation is an autonomously acquired video sequence of 12 real road scenes.

2. Emulated content

The method of the invention is used for dangerous target detection according to the following steps:

firstly, the calculated motion image, the original color image, the near-infrared image and the restored depth image are simultaneously input into the system, and step 2 and step 3 are executed.

Next, the results of the test results obtained for each frame and the results of the real markers are calculated from the acceptable test curve of the subject and the area value under the curve, and the results are shown in table 1.

The method without PAD in table 1 is a result of direct multiplication of multiple information points, and all combinations of information are exhaustive in this example, and they are single motion information (M), motion-color information point Multiplication (MC), motion-near infrared information point Multiplication (MI), motion-depth information point Multiplication (MD), motion-color-near infrared information point Multiplication (MCI), motion-color-depth information point Multiplication (MID), motion-near infrared-depth information point Multiplication (MID), and motion-color-near infrared-depth information point Multiplication (MCID). PAD-MCID denotes the method according to the invention. Obtaining a visualization result image, wherein (a) is an original color image frame; (b) is a real result; (c) - (k) correspond to M, MC, MI, MD, MCI, MCD, MID, MCID, PAD in Table 1, respectively. As can be seen from Table 1, the recognition rate of the present invention is significantly higher than that of the simple multiplicative information fusion method.

TABLE 1 comparison of area values under ROC curve for hazardous object detection

Claims

1. A dangerous object detection method fusing multi-source visual information is characterized by comprising the following steps: the method comprises the following steps:

multi-source visual image acquisition:

incremental motion consistency considerations:

2.2 training Normal motion mode base A, including left side Normal motion mode, with superpixel motion mode obtained from initial F frame segmentationGroup A of the formula^lAnd the right normal movement mode base A^r(ii) a Wherein the value range of F is 5-20;

constraint coefficients for geometric prevalence regularization terms;

wherein,

β is the error balance coefficient of the left and right sides;

Will be provided with

Is normalized to [0,1 ] using a max-min normalization method]To (c) to (d);

multi-source visual information fusion:

3.1 calculating the significance results of the color video image, the near-infrared video image and the depth image respectively by using a significance calculation method based on image evaluation to obtain the danger confidence coefficient of the color video image

Near-infrared video image danger confidence

And depth image risk confidence

3.2 Bayes model with significance

3.2.1 ] calculating the prior probability Pr (O):

3.2.2 ] calculating likelihood probability Pr (S (z) | O):

will obtain

Binarization, calculating the falling of the target region on the corresponding original visual image

Number of pixels corresponding to the numerical value

Number of pixels corresponding to the numerical value

The likelihood probability of the target and the background is

3.3. calculating the fusion probability of all visual information after Bayes:

wherein,

and 4, calculating the detection rate:

repeating the steps 2-3 for each frame of image until the whole video image is processed; marking real dangerous object area in t frame of videoIs G_tThe detection rate is as follows:

TPR＝TP/P,

FPR＝FP/N

2. The method for detecting a dangerous object fusing multi-source visual information according to claim 1, wherein: said lambda₁A value of 0.05, said

The value is 0.005, and the value of β is 0.4.

3. The method for detecting a dangerous object fusing multi-source visual information according to claim 1 or 2, wherein: the element distribution map index OPT in the step 3.2.1 is:

4. the method for detecting a dangerous object fusing multi-source visual information according to claim 3, wherein: and F in the step 2.2 is 10.