CN111489330B - Weak and small target detection method based on multi-source information fusion

Info

Publication number: CN111489330B
Application number: CN202010215165.6A
Authority: CN
Inventors: 韩振军; 韩许盟; 余学辉; 宫宇琦; 蒋楠; 彭潇珂; 王岿然; 焦建彬; 叶齐祥; 万方
Original assignee: University of Chinese Academy of Sciences
Current assignee: University of Chinese Academy of Sciences
Priority date: 2020-03-24
Filing date: 2020-03-24
Publication date: 2021-06-22
Anticipated expiration: 2040-03-24
Also published as: CN111489330A

Abstract

The invention provides a method for detecting a small and weak target, which is realized by the following steps: preprocessing an image for target detection, and enhancing the contrast of the image; carrying out image saliency analysis on the image to obtain a saliency map highlighting the area where the foreground target is located; and based on the saliency map of the image, carrying out segmentation processing on the saliency target in the image, and positioning to obtain the target position in the image. When the target detection simultaneously adopts a plurality of time and space aligned images collected by a multi-source channel, the detection method also comprises the step of fusing the images processed by the steps. The method has very important significance for monitoring and analyzing the remote target, and effectively realizes the extraction of the salient target in the image by using methods such as salient analysis, image segmentation, image fusion and the like.

Description

Weak and small target detection method based on multi-source information fusion

Technical Field

The invention belongs to the field of computer vision and image processing, relates to the detection of weak and small targets, and particularly relates to a weak and small target detection method based on multi-source information fusion.

Background

In the field of target detection, detecting weak and small targets has always been a challenging problem, yet it is very important for remote target monitoring and analysis. Because the sensor is far from the target, the target occupies few pixels in the image, features such as shape and contour are indistinct, and the target carries little information. The image is also affected by various kinds of noise, so the target is easily submerged in it, which makes fast and accurate detection very difficult.

Weak and small target detection complements existing general target detection in terms of scale; only when it is solved does target detection become truly general across all scales. In existing general target detection, both the relative scale and the absolute scale of targets are usually large, whereas in many practical scenes the relative and absolute scales of targets are far smaller. Small-scale target detection is therefore closer to these practical tasks, which existing general target detection methods do not cover well.

With the maturing and commercialization of unmanned aerial vehicle (UAV) technology, UAV-based techniques are receiving more and more attention, and automatic monitoring based on aerial target detection is one of them. Aerial platforms generally carry very high-resolution cameras to obtain high-quality images, but in high-altitude photography many targets (such as pedestrians and vehicles) still occupy few pixels, producing targets with weak information and small relative and absolute scales; aerial target detection is therefore inherently weak and small target detection. UAV aerial photography now has many application scenarios: photographers and amateurs use UAVs to record daily activities or carry out creative projects; solar power plants use them to detect defective solar panels; they are used for early detection of plant diseases; and they can even be used for shark detection in the public safety field. In addition, weak and small target detection almost always accompanies national defense and military tasks such as precise defense, precise strike, and border security monitoring. The targets in these application scenarios are likewise weak, complex, and diverse.

At present, there is relatively little research at home and abroad on detecting small targets in images, and even less on weak and small targets. Existing weak and small target detection methods mostly locate the target using deep learning. Anchor-based network frameworks are currently the detectors with the best experimental performance, but their anchor settings are unfriendly to targets with small absolute scale, mainly in terms of anchor scale and stride. For example, the minimum anchor size in Faster R-CNN is 32 × 32, so for a small target with an absolute size of 15 × 15 the best intersection over union (IoU) is only about 0.22. In the default setting of Faster R-CNN, anchors are assigned to targets by the maximum-IoU matching principle, so small targets receive a large number of low-quality (low-IoU) matches. In addition, the anchor stride in Faster R-CNN is 16, which is larger than the side length of a 15 × 15 target, so the matched IoU drops sharply when a small target is displaced. The more anchors that can become the best IoU match of the same target under displacement, the more matching modes the target has, which increases the regression difficulty. Most experiments show that the initial IoU of a target match is strongly positively correlated with the regression result, so the anchor settings of existing detection frameworks are unfavorable for targets with small absolute scale. Furthermore, deep learning requires a long model training time and is typically run on a GPU, which raises real-time and generality issues.
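The 0.22 figure can be checked with a few lines of arithmetic. The sketch below is illustrative only: the box coordinates and the helper name `iou` are assumptions chosen for the example, not part of Faster R-CNN or of the invention.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


anchor = (0, 0, 32, 32)     # smallest anchor: 32 x 32 (hypothetical position)
inside = (8, 8, 23, 23)     # 15 x 15 target lying fully inside the anchor
shifted = (24, 8, 39, 23)   # the same target moved 16 px to the right

best_iou = iou(anchor, inside)       # 225 / 1024
shifted_iou = iou(anchor, shifted)   # only 8 of the 15 columns still overlap
```

When the target lies entirely inside the anchor, the IoU is simply the ratio of the two areas, 225 / 1024, which is the best value any 32 × 32 anchor can reach for a 15 × 15 target.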

Therefore, in view of the above problems in the prior art, the invention provides a weak and small target detection method, in particular a weak and small target detection method based on multi-source information fusion.

Disclosure of Invention

In order to overcome the above problems, the present inventors have conducted intensive studies and proposed a simple and effective method for detecting a small and weak target, which mainly uses an image processing method to autonomously analyze and process a scene image and extract a target region that may exist in a picture for a detection task without specific target limitation. The detection method comprises the following specific implementation processes: firstly, preprocessing an original picture by using an image equalization method to achieve a better image effect, so that subsequent operation processing is facilitated; secondly, extracting a salient region in the scene image by using a visual saliency analysis algorithm based on data driving; thirdly, based on the saliency map of the image, using an adaptive threshold and a GrabCut segmentation algorithm to realize fine segmentation processing on the saliency target in the image and locate the target in the image. When the target detection adopts a plurality of time and space aligned images collected by a multi-source channel, the method further comprises a step of fusing the images processed by the steps, thereby completing the invention.

The invention aims to provide the following technical scheme:

the invention aims to provide a weak and small target detection method, which comprises the following steps:

step 1), preprocessing an image for target detection, and enhancing the contrast of the image;

step 2), carrying out image saliency analysis on the image to obtain a saliency map highlighting the area where the foreground target is located;

step 3), based on the saliency map of the image, carrying out segmentation processing on the salient target in the image, and locating the target position in the image.

When the target detection simultaneously adopts a plurality of time and space aligned images collected by a multi-source channel, the detection method also comprises the step of fusing the images processed in the steps 1-3).

According to the weak and small target detection method based on multi-source information fusion, the method has the following beneficial effects:

(1) the method for detecting the weak and small targets does not need a training process of a model, and can detect the targets to be detected in real time.

(2) According to the method for detecting weak and small targets, a bottom-up visual attention mechanism is adopted for image saliency analysis. It is data-driven and independent of any specific task, which helps provide a comprehensive set of candidate targets and supports human judgment of the specified targets;

(3) according to the method for detecting weak and small targets, targets can be segmented and located quickly by adaptive threshold segmentation alone, or finely by adaptive threshold segmentation followed by GrabCut segmentation, so that weak and small targets can be segmented and located in different modes as required.

(4) The method for detecting the weak and small targets can fuse the multi-source information, comprehensively utilize the multi-source information and comprehensively and accurately position the targets.

Drawings

FIG. 1 shows a visible light image for object detection in an embodiment;

FIG. 2 shows a graph of the results of an HSV-based color image model equalization of FIG. 1;

FIG. 3 shows a graph of the results of a saliency analysis of a visible light image;

FIG. 4 shows a graph of the results of a saliency analysis of a visible light image temporally and spatially aligned with an infrared image;

FIG. 5 is a graph showing the result of adaptive threshold segmentation of a saliency map of a visible light image using the Otsu method and superposition of the segmentation result with an original image;

FIG. 6 is a result diagram of adaptive threshold segmentation of a saliency map of an infrared image by an Otsu method and superposition of the segmentation result with an original image;

FIG. 7 shows a graph of the results of the GrabCut method segmentation of FIG. 5;

FIG. 8 shows a graph of the results of the GrabCut method segmentation of FIG. 6;

FIG. 9 is a diagram showing the result of fusing the segmentation results of the visible light image and the infrared image using Bayesian decision;

FIG. 10a shows a picture to be detected, and FIG. 10b shows the detection result of FIG. 10a;

FIG. 11a shows a picture to be detected, and FIG. 11b shows the detection result of FIG. 11a;

FIG. 12a shows a picture to be detected, and FIG. 12b shows the detection result of FIG. 12a;

FIG. 13a shows a picture to be detected, and FIG. 13b shows the detection result of FIG. 13a;

FIG. 14a shows a picture to be detected, and FIG. 14b shows the detection result of FIG. 14a;

FIG. 15a shows a picture to be detected, and FIG. 15b shows the detection result of FIG. 15a;

FIG. 16a shows a picture to be detected, and FIG. 16b shows the detection result of FIG. 16a;

FIG. 17a shows a picture to be detected, and FIG. 17b shows the detection result of FIG. 17a;

FIG. 18a shows a picture to be detected, and FIG. 18b shows the detection result of FIG. 18a.

Detailed Description

The invention is explained in further detail below with reference to the drawings. The features and advantages of the present invention will become more apparent from the description.

The invention aims to provide a weak and small target detection method, which places no specific limitation on the weak and small targets in an image and comprises the following steps:

step 1), preprocessing an image for target detection, and enhancing the contrast of the image;

step 2), carrying out image saliency analysis on the image to obtain a saliency map highlighting the area where the foreground target is located;

step 3), carrying out segmentation processing on the salient target in the image based on the saliency map of the image, and locating the target position in the image.

In the present invention, the weak and small target refers to an interested target observed from a long distance, such as a human body, a detection, a vehicle, etc.

In the step 1) of the invention, the image for target detection is preprocessed, and the contrast of the image is enhanced.

The original image may be affected by complex and variable environmental factors, so that the information and features required by the image are not significant enough, and the operation directly on the original image often cannot achieve a satisfactory effect, so that the original image needs to be preprocessed first to enhance the contrast of the image. The image enhancement can highlight the interesting features in the image and inhibit the uninteresting features, thereby improving the image effect and enriching the image information. In this step, image enhancement will be achieved using a histogram equalization method.

The traditional histogram equalization method is directed at gray level images, and carries out nonlinear transformation on the gray level of an original image, so that the gray level histogram of the original image is uniformly expanded to the whole gray level range from an original area, thereby expanding the gray level difference between a foreground object and a background area and enhancing the contrast of the image. Most of the scene images shot today are colored. It is therefore necessary to extend the application of image equalization methods to color images.

In the present invention, the histogram equalization method includes a global histogram equalization method and a local histogram equalization method, and preferably, the local histogram equalization method.

The processing of the global histogram equalization method is global, with the pixels being modified by a transformation function based on the gray scale distribution of the entire image. This method is suitable for enhancement of the whole image. When a global histogram equalization method is adopted, the inventor finds that details of small areas in an image are often ignored in global calculation, so that an ideal effect cannot be obtained. A solution to this problem is to use a local histogram equalization method. In this case, the entire image is divided into a grid of sub-windows in the shape of m × n (8 × 8 is typically used), and then each part is processed separately. In each sub-window, the image histogram is concentrated on a certain small gray level area, so that each window is equalized, the gray level of a local area is expanded, and the influence of the image overall gray level histogram is avoided.
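The difference between global and local equalization can be sketched in plain Python. This is a minimal illustration under stated assumptions: the function names `equalize` and `equalize_local` are hypothetical, images are nested lists of integer gray levels, and each sub-window is equalized independently with no blending across sub-window borders (practical implementations often interpolate between neighbouring windows).

```python
def equalize(gray, levels=256):
    """Global histogram equalization of a 2-D list of ints in [0, levels)."""
    flat = [p for row in gray for p in row]
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(flat)
    if n == cdf_min:                      # constant region: nothing to stretch
        return [row[:] for row in gray]
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in gray]


def equalize_local(gray, m=8, n=8, levels=256):
    """Split the image into an m x n grid of sub-windows and equalize each one."""
    h, w = len(gray), len(gray[0])
    out = [row[:] for row in gray]
    row_edges = [h * i // m for i in range(m + 1)]
    col_edges = [w * j // n for j in range(n + 1)]
    for r0, r1 in zip(row_edges, row_edges[1:]):
        for c0, c1 in zip(col_edges, col_edges[1:]):
            if r0 == r1 or c0 == c1:      # skip empty sub-windows on tiny images
                continue
            tile = [row[c0:c1] for row in gray[r0:r1]]
            for i, row in enumerate(equalize(tile, levels)):
                out[r0 + i][c0:c1] = row
    return out


# Two low-contrast regions side by side (made-up values for illustration).
image = [[100, 101, 200, 201],
         [102, 103, 202, 203]]
local = equalize_local(image, m=1, n=2)
```

In this toy image each half spans only four adjacent gray levels; equalizing the halves separately stretches each of them over the full gray range, which a single global transformation cannot do for both at once.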

In the invention, the histogram equalization method is expanded and applied to the color image and comprises an equalization method based on an RGB color image model and an equalization method based on an HSV color image model:

An equalization method based on an RGB color image model is as follows: the three channels of the image are separated using the RGB color image model to obtain three single-channel images, histogram equalization is performed on each image independently, and finally the processed single-channel images are merged to restore a color image. Since R, G, and B are processed separately, the processed image may show color distortion, but its contrast is still enhanced after equalization.

An equalization method based on an HSV color image model is as follows: using the HSV color image model, histogram equalization is performed only on the V (brightness) channel while the H (hue) and S (saturation) channels are kept unchanged. This processing mode does not affect the hue or saturation of the image, so color distortion does not occur.
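A minimal sketch of the HSV variant, using only Python's standard `colorsys` module. The function name `equalize_v_channel`, the flat list-of-pixels input format, and the sample colors are assumptions made for this illustration.

```python
import colorsys


def equalize_v_channel(pixels, levels=256):
    """Equalize only the V channel of a list of (r, g, b) ints in 0..255."""
    hsv = [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0) for r, g, b in pixels]
    v_levels = [round(v * (levels - 1)) for _, _, v in hsv]   # quantize V
    hist = [0] * levels
    for v in v_levels:
        hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:                       # all pixels equally bright
        return list(pixels)
    lut = [max(0.0, (c - cdf_min) / (n - cdf_min)) for c in cdf]  # new V in [0, 1]
    out = []
    for (h, s, _), v in zip(hsv, v_levels):
        r, g, b = colorsys.hsv_to_rgb(h, s, lut[v])            # H and S untouched
        out.append((round(r * 255), round(g * 255), round(b * 255)))
    return out


# Four dark reddish pixels with identical hue and saturation (made-up values).
pixels = [(40, 20, 20), (60, 30, 30), (80, 40, 40), (100, 50, 50)]
enhanced = equalize_v_channel(pixels)
```

Because only V is remapped, the pixels keep their hue and saturation while their brightness is spread over the full range.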

In the invention, when the image for target detection is an achromatic image (or called a gray image), the histogram equalization method is directly used to realize image contrast enhancement; when the image for target detection is a color image, an equalization method based on an RGB color image model or an equalization method based on an HSV color image model is adopted for image contrast enhancement, and the equalization method based on the HSV color image model is preferably adopted.

In the step 2), the image (especially the image after contrast enhancement) is subjected to image saliency analysis to obtain a saliency map highlighting the area where the foreground object is.

In this step, the inventors determined that salient regions in the scene image were extracted using a data-driven visual attention mechanism analysis algorithm. The visual attention mechanism has two processing modes, namely a bottom-up type and a top-down type. The bottom-up visual attention mechanism is unconscious, data-driven based, and independent of the specific task. Therefore, the attention model has no specific task limitation, and is suitable for a detection method without specific limitation on weak and small targets in an image; the method is free of the constraint of prior knowledge, free of manual control and high in calculation speed. The strategy utilizes the bottom layer information of the image such as brightness, color, texture and the like to calculate the difference between pixel points, thereby judging the saliency area.

Under the guidance of the above idea, the extraction of salient regions in an image is realized by using the Image Signature (IS) method. The image signature is a simple image descriptor that spatially approximates the foreground information of the image, and is therefore useful for detecting salient regions of the image.

Consider an image of a scene, which has the following structure:

$$x = f + b, \quad x, f, b \in \mathbb{R}^{N}$$ formula (2-1)

where f represents the foreground signal, which is assumed to be sparse in the standard spatial basis; b represents the background, which is assumed to be sparse in the discrete cosine transform basis; and R represents the parameter value space. In other words, both f and DCT(b) have only a few non-zero components. In general, given only x and these sparsity assumptions, it is very difficult to separate f and b. For the salient region extraction problem, only the support of the foreground signal f (the set of non-zero pixels in f) matters. f can be approximately isolated by taking the sign of the mixed signal x in the transform domain and then inverse transforming it into the spatial domain, i.e. computing the reconstructed image:

$$\bar{x} = \mathrm{IDCT}(\mathrm{sign}(\mathrm{DCT}(x)))$$

wherein DCT(·) and IDCT(·) are the discrete cosine transform and inverse discrete cosine transform, respectively. Formally, the image signature (Image Signature) is defined as follows:

$$\mathrm{ImageSignature}(x) = \mathrm{sign}(\mathrm{DCT}(x))$$ formula (2-2)

Further, the reconstructed image $\bar{x}$ is smoothed to obtain the final saliency map s:

$$s = g * (\bar{x} \circ \bar{x})$$ formula (2-3)

wherein g is a Gaussian kernel, $*$ is the convolution operator, and $\circ$ is the Hadamard product operator. Simple Gaussian smoothing is necessary because some salient objects in the reconstructed image appear as isolated points, whereas in practice salient objects are not only spatially sparse but also occupy contiguous regions. Gaussian smoothing blurs the saliency map, which helps obtain contiguous salient targets within a region.
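The reconstruction described above (take the DCT, keep only the sign of each coefficient, invert, then square element-wise) can be sketched for one small channel in plain Python. This is an illustrative sketch, not the inventors' implementation: the helper names are hypothetical, the DCT is a naive orthonormal DCT-II suitable only for tiny images, the Gaussian smoothing is omitted, and a small tolerance `eps` is an added assumption so that floating-point round-off is treated as zero.

```python
import math


def dct_matrix(n):
    """Orthonormal DCT-II matrix C: C @ v is the DCT of v, C^T inverts it."""
    rows = []
    for u in range(n):
        scale = math.sqrt((1.0 if u == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * r + 1) * u / (2 * n))
                     for r in range(n)])
    return rows


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def sign(v, eps=1e-12):
    # eps is an assumption: coefficients that are zero up to round-off map to 0.
    return 1.0 if v > eps else (-1.0 if v < -eps else 0.0)


def image_signature_saliency(channel):
    """Return the element-wise square of IDCT(sign(DCT(channel)))."""
    h, w = len(channel), len(channel[0])
    ch, cw = dct_matrix(h), dct_matrix(w)
    coeffs = matmul(matmul(ch, channel), transpose(cw))      # 2-D DCT
    signature = [[sign(v) for v in row] for row in coeffs]   # image signature
    recon = matmul(matmul(transpose(ch), signature), cw)     # 2-D inverse DCT
    return [[v * v for v in row] for row in recon]           # Hadamard square


# Flat background (sparse in the DCT basis) plus one bright foreground pixel.
image = [[0.5] * 8 for _ in range(8)]
image[2][5] = 1.0
saliency = image_signature_saliency(image)
peak = max((v, r, c) for r, row in enumerate(saliency) for c, v in enumerate(row))[1:]
```

In this toy case the constant background contributes only to the DC coefficient, so the sign pattern is dictated by the single foreground pixel and the unsmoothed map peaks at its position.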

Specifically, the image saliency analysis comprises the following sub-steps:

step 2.1), based on the RGB color image model, each single-channel image I_i of the input image I is normalized such that:

$$0 \le I_i(x, y) \le 1, \quad i = 1, 2, \dots, N$$ formula (2-4)

wherein N is the number of image channels;

step 2.2), the width of the single-channel image is scaled to a set size (such as 512 pixels), and meanwhile, the image height is scaled in an equal proportion for subsequent processing;

step 2.3), after the preprocessing, performing the following operations on the single-channel image:

$$S_i = \mathrm{IDCT}(\mathrm{sign}(\mathrm{DCT}(I_i))), \quad i = 1, 2, \dots, N$$ formula (2-5)

where N is the number of image channels, and DCT(·) and IDCT(·) are the discrete cosine transform and inverse discrete cosine transform, respectively. The sign function sign(x) is defined as follows:

$$\mathrm{sign}(x) = \begin{cases} 1, & x > 0 \\ 0, & x = 0 \\ -1, & x < 0 \end{cases}$$ formula (2-6)

after this operation, a plurality of single-channel saliency maps S_i are obtained;

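step 2.4), averaging the multiple single-channel saliency maps and combining them to obtain a grayscale saliency map, namely:

$$S(x, y) = \frac{1}{N} \sum_{i=1}^{N} S_i(x, y), \quad 1 \le x \le w_s, \ 1 \le y \le h_s$$ formula (2-7)

wherein h_s and w_s are the height and width of the grayscale saliency map, respectively.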

By performing the above steps 2.1) to 2.4), the foreground object in the image can be obtained. Since salient objects are not only spatially sparse, but also localized on a continuous area, a simple gaussian smoothing is necessary for this purpose.
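Steps 2.1), 2.2) and 2.4) are bookkeeping around the per-channel transform of step 2.3). The sketch below illustrates them with hypothetical helper names; dividing by 255 assumes 8-bit input, which the text does not state, and the actual resampling of the image is left out.

```python
def normalize_channel(channel, max_value=255.0):
    """Step 2.1: map pixel values into [0, 1] (assumes 8-bit input by default)."""
    return [[p / max_value for p in row] for row in channel]


def scaled_size(height, width, target_width=512):
    """Step 2.2: target (height, width) after scaling the width to target_width."""
    return max(1, round(height * target_width / width)), target_width


def average_maps(maps):
    """Step 2.4: pixel-wise mean of equally sized single-channel maps."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[y][x] for m in maps) / n for x in range(cols)] for y in range(rows)]


new_h, new_w = scaled_size(1080, 1920)        # a 1920 x 1080 frame, for illustration
gray = average_maps([[[0.0, 1.0]], [[1.0, 1.0]]])
```

Scaling every input to a fixed width keeps the cost of the DCT and of the later blur independent of the camera resolution.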

Thus, the image saliency analysis also includes step 2.5), blurring the grayscale saliency map using a Gaussian kernel whose width and height are both:

$$ksize = \mathrm{int}(4 \times w_s \times \eta)$$ formula (2-8)

wherein η is a blurring parameter. The standard deviation of the Gaussian function in both the X and Y directions is σ = w_s × η.
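As a worked example of the kernel size and standard deviation, the sketch below evaluates both for a 512-pixel-wide map. The value η = 0.045 is a hypothetical choice made only for this example, and rounding the size up to an odd number is an added assumption (common blur routines require an odd kernel so that it has a center pixel); neither comes from the text.

```python
import math


def gaussian_blur_params(map_width, eta, force_odd=True):
    """Kernel size int(4 * w_s * eta) and standard deviation w_s * eta."""
    ksize = max(1, int(4 * map_width * eta))
    sigma = map_width * eta
    if force_odd and ksize % 2 == 0:      # assumption: keep a center pixel
        ksize += 1
    return ksize, sigma


def gaussian_kernel_1d(ksize, sigma):
    """Normalized 1-D Gaussian; applying it along rows, then columns, blurs in 2-D."""
    centre = (ksize - 1) / 2.0
    weights = [math.exp(-((i - centre) ** 2) / (2.0 * sigma ** 2)) for i in range(ksize)]
    total = sum(weights)
    return [w / total for w in weights]


ksize, sigma = gaussian_blur_params(map_width=512, eta=0.045)
kernel = gaussian_kernel_1d(ksize, sigma)
```

With these definitions the kernel spans roughly two standard deviations on each side of its center, whatever the map width.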

In step 3), based on the saliency map of the image, the saliency target in the image is segmented, and the target position in the image is obtained by positioning. Namely, the grayscale saliency map obtained in step 2) is used to complete image segmentation, so as to realize extraction of a salient object in an image.

In the invention, fine segmentation of the salient targets in the image is implemented by using adaptive threshold segmentation and GrabCut segmentation. Adaptive threshold segmentation of the saliency map is preferably achieved using the Otsu method (also known as the maximum between-class variance method).

The Otsu method uses the inter-class variance to measure the difference between the two classes of pixels produced by a threshold; the segmentation is optimal when the inter-class variance is largest. The Otsu method is computed from the gray histogram of the input image and obtains the optimal segmentation threshold automatically. The specific steps of the Otsu method are as follows:

A gray histogram of the image is calculated (only the most common 8-bit image, i.e. a 256-level gray map, is considered) and normalized. Let the segmentation threshold be j; it divides the image pixels into two classes. Pixels whose gray value lies in the interval [0, j] are marked as class C₀, representing the background region; pixels whose gray value lies in the interval [j, 255] are marked as class C₁, representing the foreground region. The proportion of pixels in class C₀ is denoted ω₀ and their average gray value μ₀; similarly, the proportion of pixels in class C₁ is denoted ω₁ and their average gray value μ₁. The inter-class variance of classes C₀ and C₁ can then be expressed as:

$$g = \omega_0 \omega_1 (\mu_0 - \mu_1)^2$$ formula (3-1)

The segmentation threshold j that maximizes the inter-class variance g is found by traversing all gray levels; this j is the desired threshold.
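The threshold search can be sketched directly from a histogram. The function name `otsu_threshold` is hypothetical. The text lists gray level j in both intervals; this sketch assumes that pixels equal to j belong to the background class C₀ and pixels above j to the foreground class C₁.

```python
def otsu_threshold(hist):
    """Return the gray level j that maximizes w0 * w1 * (mu0 - mu1) ** 2."""
    total = sum(hist)
    prob = [count / total for count in hist]           # normalized histogram
    total_mean = sum(level * p for level, p in enumerate(prob))
    best_j, best_g = 0, -1.0
    w0, sum0 = 0.0, 0.0                                # running class-0 statistics
    for j, p in enumerate(prob):
        w0 += p
        sum0 += j * p
        w1 = 1.0 - w0
        if w0 <= 0.0 or w1 <= 0.0:                     # one class is empty
            continue
        mu0 = sum0 / w0
        mu1 = (total_mean - sum0) / w1
        g = w0 * w1 * (mu0 - mu1) ** 2
        if g > best_g:
            best_g, best_j = g, j
    return best_j


# Bimodal toy histogram: half the pixels at level 50, half at level 200.
hist = [0] * 256
hist[50] = 100
hist[200] = 100
threshold = otsu_threshold(hist)
foreground_levels = [level for level in (50, 200) if level > threshold]
```

Keeping running sums for class C₀ makes the search a single pass over the 256 gray levels rather than a recomputation of both class means for every candidate threshold.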

Because adaptive threshold segmentation can only obtain the approximate region of the salient target, the invention uses the GrabCut image segmentation algorithm, on the basis of the adaptive threshold segmentation, to segment the salient targets in the image more precisely.

The GrabCut algorithm works in the RGB color space and models the foreground object and the background region each with a full-covariance GMM (Gaussian mixture model) containing K Gaussian components (in the present invention, K is taken to be 5). Let the vector k = {k₁, …, k_n, …, k_N}, where k_n is the Gaussian component corresponding to the nth pixel, k_n ∈ {1, …, K}. The energy of the whole image is:

$$E(\alpha, k, \theta, z) = U(\alpha, k, \theta, z) + V(\alpha, z)$$ formula (3-2)

wherein z is the image pixel value; the parameter α ∈ {0, 1}, where 0 represents the background region and 1 represents the foreground object; and θ is the set of GMM parameters. The core goal of the GrabCut algorithm is to optimize the energy function E.

The region term U is defined as follows:

$$U(\alpha, k, \theta, z) = \sum_{n} D(\alpha_n, k_n, \theta, z_n)$$ formula (3-3)

wherein $D(\alpha_n, k_n, \theta, z_n) = -\log p(z_n \mid \alpha_n, k_n, \theta) - \log \pi(\alpha_n, k_n)$, p(·) represents a Gaussian probability distribution, and π(·) is a mixture weight coefficient, then (up to an additive constant):

$$D(\alpha_n, k_n, \theta, z_n) = -\log \pi(\alpha_n, k_n) + \frac{1}{2} \log \det \Sigma(\alpha_n, k_n) + \frac{1}{2} [z_n - \mu(\alpha_n, k_n)]^{T} \Sigma(\alpha_n, k_n)^{-1} [z_n - \mu(\alpha_n, k_n)]$$ formula (3-4)

At this point, the parameters of the model are determined as:

$$\theta = \{\pi(\alpha, k), \mu(\alpha, k), \Sigma(\alpha, k), \ \alpha = 0, 1, \ k = 1, \dots, K\}$$ formula (3-5)

i.e., the weights π, mean vectors μ, and covariance matrices Σ of the 2K Gaussian components (the foreground object and the background region each contain K components). All three sets of parameters are obtained through learning. Once they are determined, the color value (R, G, B) of each pixel is fed into the GMM, and the probabilities that it belongs to the foreground and to the background are calculated. The region term U, i.e. the weight of the connection between each pixel and the foreground and background, can thus be computed.

The boundary term V is defined as follows:

$$V(\alpha, z) = \gamma \sum_{(m, n) \in C} [\alpha_n \ne \alpha_m] \exp(-\beta \| z_m - z_n \|^2)$$

where C is the set of pairs of adjacent pixels, [·] equals 1 when the enclosed condition holds and 0 otherwise, γ is a constant (generally, γ = 50), and β is a contrast balance factor. The boundary term V represents a penalty for discontinuities between adjacent pixels m and n. In the RGB color space, the Euclidean distance is used to measure the distance between two pixels, i.e. ‖z_m − z_n‖. The parameter β is determined by the image contrast: when the image contrast is low, a larger β is used to amplify the differences between pixels; conversely, when the image contrast is high, a smaller β is used to reduce them. The balance factor β thus allows the boundary term V to adapt to images of different contrast.
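The role of β can be illustrated numerically. The text does not state how β is computed; the sketch below assumes the choice used in the original GrabCut formulation, β = 1 / (2 · mean ‖z_m − z_n‖²) over neighbouring pixel pairs, and a pairwise weight of the form γ · exp(−β ‖z_m − z_n‖²). Function names, the 4-neighbour pairing, and the toy images are all assumptions for illustration.

```python
import math


def sq_dist(p, q):
    """Squared Euclidean distance between two RGB triples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def neighbour_pairs(h, w):
    """Horizontal and vertical neighbour pairs (4-connectivity assumed)."""
    for y in range(h):
        for x in range(w):
            if x + 1 < w:
                yield (y, x), (y, x + 1)
            if y + 1 < h:
                yield (y, x), (y + 1, x)


def contrast_beta(img):
    """Assumed rule: beta = 1 / (2 * mean squared color difference of neighbours)."""
    h, w = len(img), len(img[0])
    dists = [sq_dist(img[a[0]][a[1]], img[b[0]][b[1]]) for a, b in neighbour_pairs(h, w)]
    mean = sum(dists) / len(dists)
    return 1.0 / (2.0 * mean) if mean > 0 else 0.0


def boundary_weight(z_m, z_n, beta, gamma=50.0):
    """Penalty for giving different labels to neighbouring pixels m and n."""
    return gamma * math.exp(-beta * sq_dist(z_m, z_n))


low = [[(10, 10, 10), (12, 12, 12)], [(10, 10, 10), (12, 12, 12)]]
high = [[(10, 10, 10), (200, 200, 200)], [(10, 10, 10), (200, 200, 200)]]
beta_low, beta_high = contrast_beta(low), contrast_beta(high)
```

Under this assumed rule a low-contrast image yields a larger β than a high-contrast one, matching the behaviour described above, and cutting between two identical pixels costs the full γ.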

The optimal value of E is obtained by using a maximum flow algorithm, the iteration times are set, GMM model parameters are updated after each iteration is finished, and then the process of optimizing the energy function E is continuously repeated, so that a better target segmentation result is finally obtained.

In the initial stage of the GrabCut algorithm, a pixel set of the image background region and an initial pixel set of the foreground target need to be provided, and the Otsu algorithm supplies exactly such a preliminary segmentation result. The foreground target and the background region obtained by the Otsu method are fed to the GrabCut algorithm, and the image used for target detection is segmented iteratively to obtain a finer target segmentation result. The Otsu algorithm and the GrabCut algorithm are thus effectively combined to realize the final fine segmentation.

When the target detection simultaneously adopts a plurality of time and space aligned images collected by a multi-source channel, in order to comprehensively utilize the plurality of images and comprehensively and accurately position the target, the method for detecting the weak and small target further comprises the step of fusing the images processed by the steps.

The image fusion refers to that the time and space aligned images collected by the multi-source channel are subjected to fusion algorithm, information which is beneficial to outputting results in each image is extracted, and the images which are more convenient to observe and process are comprehensively formed. The image fusion can be divided into three levels, namely pixel level fusion, feature level fusion and decision level fusion. The invention realizes the fusion of the segmentation results of the multi-source images such as the visible light images and the infrared images at the decision level.

The decision-level fusion is a process of performing comprehensive analysis and decision after independently processing multi-source information to obtain a result, and is high-level fusion based on image understanding. The invention realizes the decision-level fusion of the multi-source image segmentation result by using Bayesian decision.

The fusion process comprises the following substeps:

step 4.1), fusing the images acquired based on the multi-source channels to obtain the conditional probability of the category of each pixel point under the scene image;

Let the pixel class label be ε = {ε₀, ε₁}, where ε₀ indicates that the pixel belongs to the background region and ε₁ indicates that the pixel belongs to the foreground object. Then:

$$P(z_{x,y} \mid \varepsilon_1) = \sum_{n=1}^{c} \alpha_n P_n(z_{x,y}), \qquad P(z_{x,y} \mid \varepsilon_0) = 1 - P(z_{x,y} \mid \varepsilon_1)$$

wherein P(z_{x,y}|ε₁) represents the conditional probability of the pixel (x, y) given class ε₁; likewise, P(z_{x,y}|ε₀) represents the conditional probability of the pixel (x, y) given class ε₀; c represents the number of fused images, here c = 2; P_n(z_{x,y}) represents the probability that the pixel (x, y) belongs to ε₁ in the nth image; and α_n represents the fusion weight of the nth image and satisfies $\sum_{n=1}^{c} \alpha_n = 1$.

Step 4.2), fusing to obtain the prior probability of the category of each pixel point under the scene image through the saliency map;

the prior probability of the category to which the pixel (x, y) belongs can be calculated from the saliency map obtained in step 2), specifically:

$$P(\varepsilon_1) = \frac{S(x, y)}{255}, \qquad P(\varepsilon_0) = 1 - P(\varepsilon_1)$$

wherein S(x, y) represents the gray value of the pixel (x, y) in the saliency map, and 255 is the maximum gray level.

Step 4.3), from the prior probability and the fused conditional probability of each pixel point obtained above, the posterior probability is obtained by the Bayesian formula:

$$P(\varepsilon_i \mid z_{x,y}) = \frac{P(z_{x,y} \mid \varepsilon_i) P(\varepsilon_i)}{\sum_{j=0}^{1} P(z_{x,y} \mid \varepsilon_j) P(\varepsilon_j)}, \quad i = 0, 1$$

According to the Bayes decision rule, when the posterior probability P(ε₁|z_{x,y}) > P(ε₀|z_{x,y}), the pixel (x, y) is judged to belong to class ε₁; otherwise, it is judged to belong to class ε₀. The target is thereby located after image fusion.
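The per-pixel decision can be sketched as follows. The sketch rests on assumptions that should be checked against the original formulas: the fused foreground likelihood is taken as the weighted sum of the per-image foreground probabilities (with weights summing to 1), the background likelihood as its complement, and the foreground prior as the saliency value divided by 255. The function name `fuse_pixel`, the equal weights, and the handling of the degenerate zero-evidence case are likewise illustrative choices.

```python
def fuse_pixel(mask_probs, weights, saliency_value, max_gray=255):
    """Return (label, posterior_fg) for one pixel; label 1 means foreground.

    mask_probs: per-image probability that the pixel is foreground
                (0 or 1 when taken from binary segmentation results).
    weights:    fusion weights of the images; they must sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("fusion weights must sum to 1")
    like_fg = sum(a * p for a, p in zip(weights, mask_probs))  # assumed P(z | fg)
    like_bg = 1.0 - like_fg                                    # assumed P(z | bg)
    prior_fg = saliency_value / max_gray                       # assumed P(fg)
    prior_bg = 1.0 - prior_fg
    evidence = like_fg * prior_fg + like_bg * prior_bg
    if evidence == 0.0:            # contradictory inputs: fall back to background
        return 0, 0.0
    post_fg = like_fg * prior_fg / evidence
    # With two classes, post_fg > post_bg is equivalent to post_fg > 0.5.
    return (1 if post_fg > 0.5 else 0), post_fg


# Visible-light mask says foreground, infrared mask says background.
label_salient, _ = fuse_pixel((1.0, 0.0), (0.5, 0.5), saliency_value=200)
label_dim, _ = fuse_pixel((1.0, 0.0), (0.5, 0.5), saliency_value=50)
```

When the two sources disagree and are weighted equally, the likelihoods cancel and the saliency prior decides the label, so a disputed pixel is kept only where the saliency map is bright.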

Examples

Example 1

1. Image pre-processing enhancement

The input original image is shown in fig. 1, and is subjected to local histogram equalization based on an HSV color image model, and the result is shown in fig. 2. It can be seen that the contrast of the processed image is enhanced, so that the details in the image are displayed more clearly, and meanwhile, the adverse condition of color distortion does not occur.

2. Saliency analysis

The results of the image saliency analysis are shown in fig. 3 and 4. It can be seen that the areas where pedestrians and vehicles are located in the original image are brighter in the saliency map, which indicates that these areas have higher saliency, and foreground targets of interest to a person are most likely to appear in areas of high saliency. The result of the saliency analysis can therefore be combined with an image segmentation algorithm to detect salient targets.

3. Adaptive threshold segmentation

The saliency map is adaptively threshold-segmented using the Otsu method, and the segmentation result (a binary image) is superimposed on the original image; the results are shown in fig. 5 and 6. The pure black area in each figure is background, binarized to 0; the remaining area, where the image content is visible, is the foreground target, binarized to 1. It can be seen that the regions with higher brightness (higher saliency) in the saliency map are largely segmented out. The adaptive threshold segmentation result can serve as a preliminary segmentation, after which the GrabCut algorithm is applied to segment the salient targets more finely.

4. GrabCut segmentation

The Otsu algorithm gives a preliminary segmentation result and generates a binary template map, in which the label value of the foreground area is 1 and the label value of the background area is 0. The GrabCut algorithm requires a pixel set for the background area and an initial pixel set for the foreground object, so the pixels with label value 0 in the binary image are used as the background pixel set, and the pixels with label value 1 are used as the initial foreground pixel set. The number of iterations is then set (to 3 here), and the segmentation model is iteratively optimized to obtain the final target segmentation result.

The results of the GrabCut segmentation are shown in fig. 7 and 8. It can be seen that, after GrabCut, the salient targets in the foreground area of the template map are segmented and the outlines of most targets are well delineated, which achieves the expected effect.

5. Image decision level fusion

As shown in fig. 9, the segmentation results of the visible light image and the infrared image are fused using a Bayesian decision, which determines whether each pixel point in the image belongs to a foreground target or to the background region and thereby yields the final target detection result.
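A minimal sketch of the pixel-wise Bayesian decision is given below. The formulas for the conditional and prior probabilities are not reproduced in this text, so two assumptions are made and marked in the code: the conditional probability of the foreground class is taken as the weighted sum of the per-image foreground probabilities P_n with weights α_n normalized to sum to 1, and the prior is taken as the mean saliency over the images divided by 255. `bayes_fuse` is a hypothetical name.

```python
import numpy as np


def bayes_fuse(fg_probs, saliency_maps, weights) -> np.ndarray:
    """Fuse C aligned images into one boolean foreground mask.

    fg_probs:      (C, H, W) per-image foreground probabilities P_n(z_xy)
    saliency_maps: (C, H, W) saliency gray values S_n(x, y) in [0, 255]
    weights:       length-C fusion weights alpha_n (normalized here)
    """
    fg_probs = np.asarray(fg_probs, dtype=np.float64)
    saliency_maps = np.asarray(saliency_maps, dtype=np.float64)
    alpha = np.asarray(weights, dtype=np.float64)
    alpha = alpha / alpha.sum()

    # ASSUMPTION: P(z | eps1) is the weighted sum of the per-image probabilities.
    like_fg = np.tensordot(alpha, fg_probs, axes=1)
    like_bg = 1.0 - like_fg                          # P(z | eps0) = 1 - P(z | eps1)

    # ASSUMPTION: P(eps1) is the mean saliency over the images, divided by 255.
    prior_fg = saliency_maps.mean(axis=0) / 255.0
    prior_bg = 1.0 - prior_fg                        # P(eps0) = 1 - P(eps1)

    # Both posteriors share the same evidence term, so comparing the
    # likelihood-prior products is equivalent to comparing the posteriors.
    return like_fg * prior_fg > like_bg * prior_bg


# Hypothetical 1 x 2 example with a visible-light and an infrared channel.
mask = bayes_fuse(
    fg_probs=[[[0.9, 0.2]], [[0.8, 0.1]]],
    saliency_maps=[[[200, 30]], [[220, 50]]],
    weights=[0.5, 0.5],
)
```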

Example 2

By using the method disclosed by the invention, the detection speed and accuracy of the weak and small targets in the image are verified.

Wherein, FPS (number of pictures processed per second) is adopted as an evaluation criterion for the detection speed, and the following evaluation criteria are adopted for the detection precision:

wherein: True Positives (TP): the number of targets detected as foreign targets that are actually foreign targets; False Positives (FP): the number of targets detected as foreign targets that are actually non-foreign targets; True Negatives (TN): the number of targets detected as non-foreign targets that are actually non-foreign targets; False Negatives (FN): the number of targets detected as non-foreign targets that are actually foreign targets. N = TN + FN and P = TP + FP. In this embodiment, precision and recall are mainly used as evaluation indexes. Recall reflects the missed-detection rate: the larger the recall, the lower the missed-detection rate. Precision reflects the false-alarm rate: the larger the precision, the lower the false-alarm rate.
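As a small illustration of the two indexes, the sketch below assumes the standard definitions precision = TP / (TP + FP) and recall = TP / (TP + FN); the formula itself is not reproduced in this text, so these definitions and the function name `precision_recall` are assumptions. The counts in the usage line are made up for illustration and are not results from this embodiment.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall); a zero denominator yields 0.0."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical counts, for illustration only.
p, r = precision_recall(tp=8, fp=2, fn=2)
```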

In this embodiment, the detection speed (FPS) is measured on a CPU by processing all the pictures in the data set (infrared and visible light images are processed separately) and recording the execution time, as shown in the following table.

Model detection speed table
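One way to obtain such an FPS figure is sketched below: time a full pass over the data set and divide the number of pictures by the elapsed time. `measure_fps` and the stand-in detector are hypothetical; the real detection pipeline would be passed in as `detect`.

```python
import time
from typing import Callable, Iterable


def measure_fps(detect: Callable[[object], object], images: Iterable[object]) -> float:
    """Run `detect` on every image and return pictures processed per second."""
    count = 0
    start = time.perf_counter()
    for image in images:
        detect(image)
        count += 1
    elapsed = time.perf_counter() - start
    return count / elapsed if elapsed > 0 else float("inf")


# Hypothetical stand-in detector and data; replace with the real pipeline.
fps = measure_fps(lambda img: sum(img), [[1, 2, 3]] * 100)
```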

In the present embodiment, weak and small target detection is performed for targets of different sizes, and the specific detection results are shown in fig. 10-14; wherein fig. 10a is a picture to be detected, and fig. 10b is the target detection result obtained by detecting fig. 10a; fig. 11a is a picture to be detected, and fig. 11b is the target detection result obtained by detecting fig. 11a; fig. 12a is a picture to be detected, and fig. 12b is the target detection result obtained by detecting fig. 12a; fig. 13a is a picture to be detected, and fig. 13b is the target detection result obtained by detecting fig. 13a; fig. 14a is a picture to be detected, and fig. 14b is the target detection result obtained by detecting fig. 14a. The corresponding detection accuracy is shown in the following table:

Weak and small target detection is also performed for targets against backgrounds of different complexity, and the specific detection results are shown in fig. 15-18; wherein fig. 15a is a picture to be detected, and fig. 15b is the target detection result obtained by detecting fig. 15a; fig. 16a is a picture to be detected, and fig. 16b is the target detection result obtained by detecting fig. 16a; fig. 17a is a picture to be detected, and fig. 17b is the target detection result obtained by detecting fig. 17a; fig. 18a is a picture to be detected, and fig. 18b is the target detection result obtained by detecting fig. 18a. The corresponding detection accuracy is shown in the following table:

According to Example 2, the weak and small target detection method based on multi-source information fusion provides high detection efficiency and good detection precision.

The present invention has been described above in connection with preferred embodiments, but these embodiments are merely exemplary and merely illustrative. On the basis of the above, the invention can be subjected to various substitutions and modifications, and the substitutions and the modifications are all within the protection scope of the invention.

Claims

1. A weak and small target detection method comprises the following steps:

step 3), based on the saliency map of the image, carrying out segmentation processing on the saliency target in the image, and positioning to obtain the target position in the image;

in step 1), for an achromatic image, enhancing the contrast of the image by using a histogram equalization method,

the histogram equalization method comprises a global histogram equalization method and a local histogram equalization method, wherein the local histogram equalization method comprises the following steps:

dividing the whole image into 8 x 8 sub-window grids, and then processing each part independently, wherein in each sub-window, the histogram of the image is concentrated on a certain small gray level area, so that each window is equalized, the gray level of a local area is expanded, but the local area is not influenced by the whole gray level histogram of the image;

for a color image, an equalization method based on an HSV color image model is adopted to enhance the contrast of the image, and specifically:

using an HSV color image model to independently perform histogram equalization processing on a V channel of the HSV color image model, and simultaneously keeping H and S channels unchanged;

in the step 3), firstly, carrying out preliminary segmentation on the salient target by using adaptive threshold segmentation, then carrying out fine segmentation on the salient target in the image by using GrabCut segmentation on the basis of the preliminary segmentation, and positioning to obtain the salient target;

using an Otsu method, performing adaptive threshold segmentation on the saliency map;

the Otsu method comprises the following specific steps:

calculating a gray level histogram of the image and carrying out normalization processing; setting the segmentation threshold as h, which divides the image pixel points into two classes: pixels whose gray value lies in the interval [0, h] are marked as class C₀, representing the background region; pixels whose gray value lies in the interval [h, 255] are marked as class C₁, representing the foreground region; counting the proportion of pixels of class C₀, denoted ω₀, and calculating their average gray value, denoted μ₀; similarly, counting the proportion of pixels of class C₁, denoted ω₁, and calculating their average gray value, denoted μ₁; the inter-class variance of class C₀ and class C₁ can then be expressed as:

g = ω₀ω₁(μ₀ − μ₁)²    formula (3-1)

Finding a segmentation threshold h which enables the inter-class variance g to be maximum by traversing all gray levels, wherein the segmentation threshold h is the required threshold;

when a plurality of time and space aligned images collected by a multi-source channel are adopted for target detection at the same time, the detection method further comprises the step of fusing the images processed in the steps 1) to 3);

the fusion process comprises the following substeps:

setting the pixel class label ε = {ε₀, ε₁}, wherein ε₀ indicates that the pixel belongs to the background region and ε₁ indicates that the pixel belongs to a foreground target; let

P(z_{x,y}|ε₀) = 1 − P(z_{x,y}|ε₁)

wherein P(z_{x,y}|ε₁) represents the conditional probability of the pixel point (x, y) given class ε₁; likewise, P(z_{x,y}|ε₀) represents the conditional probability of the pixel point (x, y) given class ε₀; C represents the number of fused images; P_n(z_{x,y}) represents the probability that the pixel (x, y) belongs to ε₁ in the n-th image; α_n represents the fusion weight of the n-th image and satisfies

Step 4.2), fusing the saliency map obtained in the step 2) to obtain the prior probability of the category to which each pixel point belongs under the scene image;

the method specifically comprises the following steps:

P_{x,y}(ε₀) = 1 − P_{x,y}(ε₁)

wherein S_n(x, y) represents the gray value of the pixel point (x, y) in the saliency map of the n-th image; 255 is the maximum gray level;

step 4.3), based on the prior probability and the conditional probability of the pixel points, obtaining the posterior probability of the pixel points by using a Bayes algorithm, and judging whether the pixel points belong to the significance target or not according to the posterior probability;

specifically, the method comprises the following steps:

when the posterior probability P(ε₁|z_{x,y}) > P(ε₀|z_{x,y}), the pixel point (x, y) is judged to belong to class ε₁; otherwise, the pixel point (x, y) is judged to belong to class ε₀; the target is thereby located after image fusion.

2. The method according to claim 1, characterized in that in step 1), for color images, an equalization method based on an RGB color image model is used to enhance the contrast of the image, in particular:

and separating three channels of the image by using an RGB color image model to obtain three single-channel images, independently performing histogram equalization processing on each image, and finally merging the processed single-channel images to restore the processed single-channel images into a color image form.

3. The method according to claim 1, characterized in that in step 2), the image saliency analysis comprises the following sub-steps:

step 2.1), based on the RGB color image model, normalizing each single-channel image I_m of the input image I such that 0 ≤ I_m(x, y) ≤ 1, m = 1, 2, …, N, wherein N is the number of image channels;

step 2.2), the following operations are carried out on the single-channel image to obtain a plurality of single-channel saliency maps S_m：

S_m = IDCT(sign(DCT(I_m))), m = 1, 2, …, N

wherein DCT(·) and IDCT(·) are the discrete cosine transform and the inverse discrete cosine transform, respectively;

step 2.3), calculating an average value of the multiple single-channel saliency maps, and combining to obtain a gray saliency map, namely:

4. A method according to claim 3, characterized in that before step 2.2), it further comprises the step of resizing the image:

the width of the single-channel image is scaled to a set size, and the image height is scaled proportionally.

5. The method according to claim 3, wherein the image saliency analysis further comprises a step 2.4) of blurring the grayscale saliency map using a Gaussian kernel whose width and height are both ksize = int(4 × w_s × η), where η is the blur parameter;

the standard deviation of the Gaussian function in the X and Y directions is σ = w_s × η.
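The blur parameters of step 2.4) can be illustrated with the sketch below, which builds a separable Gaussian kernel with ksize = int(4 × w_s × η) and σ = w_s × η. Two assumptions are made here: w_s is taken to be the width of the (resized) saliency map, and an even ksize is bumped to the next odd value so that the kernel has a center pixel. The names `gaussian_kernel_1d` and `blur_saliency`, the reflective border handling, and the value η = 0.045 in the usage line are illustrative choices, not values stated in this text.

```python
import numpy as np


def gaussian_kernel_1d(ksize: int, sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel of length ksize (sigma must be > 0)."""
    x = np.arange(ksize) - (ksize - 1) / 2.0
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def blur_saliency(saliency: np.ndarray, eta: float) -> np.ndarray:
    """Blur a 2-D saliency map with ksize = int(4 * w_s * eta), sigma = w_s * eta."""
    w_s = saliency.shape[1]                  # ASSUMPTION: w_s is the map width
    ksize = max(int(4 * w_s * eta), 1)
    ksize += (ksize + 1) % 2                 # ASSUMPTION: force an odd kernel size
    sigma = w_s * eta
    k = gaussian_kernel_1d(ksize, sigma)
    pad = ksize // 2
    padded = np.pad(saliency.astype(np.float64), pad, mode="reflect")
    # Separable filtering: convolve every row, then every column.
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


# Hypothetical usage: a single bright pixel spread by the blur.
sal = np.zeros((64, 64))
sal[32, 32] = 255.0
blurred = blur_saliency(sal, eta=0.045)
```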