CN110598560A - Night monitoring and identifying method and system based on neural network enhancement - Google Patents

Night monitoring and identifying method and system based on neural network enhancement

Info

Publication number
CN110598560A
CN110598560A (application CN201910754820.2A)
Authority
CN
China
Prior art keywords
image
target
region
self
enhancement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910754820.2A
Other languages
Chinese (zh)
Inventor
罗洪燕
沈玺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Terminus Technology Co Ltd
Original Assignee
Chongqing Terminus Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Terminus Technology Co Ltd
Priority to CN201910754820.2A
Publication of CN110598560A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a night monitoring and identifying method based on neural network enhancement. The method first obtains the neighborhood information of pixels in a night monitoring image and adaptively enhances the image based on a second-order Taylor series to obtain a self-adaptive enhanced image. Region features and edge features are then extracted from the self-adaptive enhanced image and input into corresponding neural networks; saliency calculation is performed on the feature recognition results output by the neural networks, and the calculated salient region maps are fused to obtain a comprehensive saliency map. Finally, the comprehensive saliency map is segmented by the maximum entropy method to obtain a binary image, and a target image is extracted from the self-adaptive enhanced image based on the binary image. The method improves image contrast, performs adaptive image enhancement to cope with uneven illumination across the image, and generates a saliency map that effectively covers the boundary of the target region while suppressing the saliency of the background region.

Description

Night monitoring and identifying method and system based on neural network enhancement
Technical Field
The application relates to the technical field of video monitoring, in particular to a night monitoring identification method and system based on neural network enhancement.
Background
Video surveillance plays an important role in daily life. For example, monitoring cameras are installed at road intersections and along roads to detect violations such as speeding or failing to wear a seat belt in running vehicles, and to record the course of traffic accidents as evidence for determining accident liability. Surveillance cameras are also installed in places such as office-area roads, office buildings, residential-area roads and residential buildings for security monitoring, so that events such as property theft and fights can be dealt with in time. As a further example, unmanned aerial vehicles carry cameras for aerial patrol.
Current video monitoring systems can perform automatic image recognition on surveillance footage and extract targets of interest from the images; for example, a vehicle running a red light can be recognized in the picture and its license plate information recorded so that the corresponding penalty can be applied. Video monitoring usually runs uninterrupted for 24 hours, so surveillance pictures must be processed both in the daytime and at night, and picture quality also varies with weather conditions. On clear days, natural illumination makes edge extraction and target identification from the monitoring picture relatively easy, and the image recognition equipment can identify targets directly from the daytime picture. At night, and in rain, snow, fog or haze, however, the monitoring picture suffers from low object brightness, reduced contrast and object ghosting caused by prolonged exposure time; as a result, the overall gray value of the image is low, the target blends into the background, or highlights appear. This significantly increases the difficulty of edge extraction and target identification from the video picture and hinders the image recognition equipment in extracting and identifying targets from the monitoring picture.
If monitoring is done manually, with surveillance pictures reviewed by hand, for example by night-shift personnel watching the monitoring pictures shown on multiple display screens, the same recognition difficulties arise; in addition, labor and time costs increase and work efficiency drops.
Therefore, an efficient and accurate image enhancement method is needed that, given the poor brightness, contrast and definition characteristic of night monitoring images, can enhance the monitored image and improve its quality to facilitate image recognition.
Disclosure of Invention
Object of the application
Based on this, in order to overcome the poor brightness, contrast and definition of night monitoring pictures, and to perform image enhancement on night monitoring pictures efficiently and accurately so as to facilitate target identification, the application discloses the following technical solutions.
(II) technical scheme
As a first aspect of the present application, the present application discloses a night monitoring identification method based on neural network enhancement, including:
an image enhancement step: acquiring neighborhood information of pixels in a night monitoring image, and performing self-adaptive enhancement on the image based on a second-order Taylor series to obtain a self-adaptive enhanced image;
a saliency map generation step: extracting region features and edge features from the self-adaptive enhanced image, respectively inputting the region features and the edge features into corresponding neural networks, performing significance calculation on feature recognition results output by the neural networks, and fusing calculated significant region images to obtain a comprehensive significant image;
a target image extraction step: and segmenting the comprehensive saliency map by using a maximum entropy method to obtain a binary image, and extracting a target image from the self-adaptive enhanced image based on the binary image.
In a possible implementation, the image enhancement step specifically comprises:
converting the night monitoring image into a gray image and carrying out normalization processing;
enhancing the gray level image by using an exponential nonlinear enhancement function;
performing convolution operation on the gray level image by using a Gaussian bilateral filter function to calculate the mean value information of the gray level image;
performing a second-order Taylor series expansion on the exponential nonlinear enhancement function, convolving the mean value information with the expansion result, and obtaining a self-adaptive enhanced image of the gray image based on local output brightness obtained after convolution;
converting the adaptively enhanced image to a color image.
In a possible implementation, the saliency map generation step specifically includes:
performing superpixel segmentation on the self-adaptive enhanced image to obtain region information of a target;
extracting edge information of a target from the self-adaptive enhanced image by using a bilateral filter;
inputting the area information and the edge information into corresponding convolutional neural networks respectively to obtain area characteristics and edge characteristics of the target;
and inputting the region features and the edge features into a significance detection model for calculation, and combining the obtained significant region images to obtain a comprehensive significant image.
In a possible implementation manner, the target image extracting step specifically includes:
calculating the occurrence probability of pixels with different gray values in the comprehensive saliency map, and dividing the comprehensive saliency map into a target region and a background region by using a threshold;
calculating entropies of the target area and the background area, and determining an optimal threshold value based on the entropies of the target area and the background area;
and segmenting the comprehensive saliency map based on the optimal threshold value to obtain a binary image, and superposing the binary image and the self-adaptive enhanced image to obtain a target image.
In one possible embodiment, the method further comprises: a target image recognition step; the target image recognition step includes:
extracting the depth features of the target image by using a DCNN two-channel convolutional neural network;
and classifying the depth features by using a random forest classifier so as to realize target class identification.
As a second aspect of the present application, the present application further discloses a night monitoring and identification system based on neural network enhancement, including:
the image enhancement module is used for acquiring neighborhood information of pixels in the night monitoring image and carrying out self-adaptive enhancement on the image based on a second-order Taylor series to obtain a self-adaptive enhanced image;
the saliency map generation module is used for extracting region features and edge features from the self-adaptive enhanced image, respectively inputting the region features and the edge features into corresponding neural networks, performing saliency calculation on feature recognition results output by the neural networks, and fusing the calculated saliency region maps to obtain a comprehensive saliency map;
the target image extraction module is used for segmenting the comprehensive saliency map by utilizing a maximum entropy method to obtain a binary image and extracting a target image from the self-adaptive enhanced image based on the binary image;
and the target image identification module is used for identifying the target image by utilizing a neural network.
In one possible embodiment, the image enhancement module comprises:
the gray level conversion unit is used for converting the night monitoring image into a gray level image and carrying out normalization processing;
the image enhancement unit is used for enhancing the gray level image by utilizing an exponential nonlinear enhancement function;
the mean value calculating unit is used for performing convolution operation on the gray level image by utilizing a Gaussian bilateral filter function to calculate mean value information of the gray level image;
the convolution unit is used for performing a second-order Taylor series expansion on the exponential nonlinear enhancement function, convolving the mean value information with the expansion result, and obtaining a self-adaptive enhanced image of the gray image based on local output brightness obtained after convolution;
a color conversion unit for converting the adaptively enhanced image into a color image.
In one possible implementation, the saliency map generation module comprises:
the region information acquisition unit is used for performing super-pixel segmentation on the self-adaptive enhanced image to obtain region information of a target;
an edge information acquisition unit, configured to extract edge information of a target from the adaptive enhanced image using a bilateral filter;
the characteristic identification unit is used for respectively inputting the area information and the edge information into corresponding convolutional neural networks to obtain the area characteristic and the edge characteristic of the target;
and the saliency map generation unit is used for inputting the region features and the edge features into a saliency detection model for calculation, and combining the obtained saliency region maps to obtain a comprehensive saliency map.
In one possible implementation, the target image extraction module includes:
the saliency map dividing unit is used for calculating the occurrence probability of pixels with different gray values in the comprehensive saliency map and dividing the comprehensive saliency map into a target region and a background region by utilizing a threshold;
a threshold determination unit, configured to calculate entropies of the target region and the background region, and determine an optimal threshold based on the entropies of the target region and the background region;
and the target image acquisition unit is used for segmenting the comprehensive saliency map based on the optimal threshold value to obtain a binary image, and superposing the binary image and the self-adaptive enhanced image to obtain a target image.
In one possible embodiment, the system further comprises: a target image recognition module; the target image recognition module includes:
the depth feature extraction unit is used for extracting the depth features of the target image by utilizing a DCNN two-channel convolutional neural network;
and the target class identification unit is used for classifying the depth features by utilizing a random forest classifier so as to realize target class identification.
(III) advantageous effects
The application discloses a night monitoring and identifying method and system based on neural network enhancement. For backlit images with insufficient bright-area exposure and low contrast, or night images with low brightness and low contrast, a second-order Taylor series is introduced to improve image quality, an exponential function is used to nonlinearly enhance the image so as to improve the contrast of backlit or night scene images, and adaptive image enhancement is performed to handle uneven illumination across the image.
Meanwhile, to address the blurred target boundaries produced by existing target detection methods, the region information of the target is obtained with a superpixel segmentation method and the edge information of the target is extracted with a bilateral filter; the extracted region and edge information is learned by two independent CNNs; finally, the learned confidences are fused into a conditional random field, an energy value is obtained, and saliency versus non-saliency is decided by minimizing this energy, completing target detection. The generated saliency map effectively covers the target boundary, the contrast between the target region and the background region is high, and the saliency of the background region is well suppressed.
In addition, for the problem that there are many target types with similar appearance, which makes target identification difficult, a combination of a dual-channel convolutional neural network and a random forest classifier is provided. The DCNN dual-channel convolutional neural network extracts features of the target image through two independent convolutional neural networks; the resulting features are more abstract and more discriminative, and better reflect the essential characteristics of the image.
Drawings
The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining and illustrating the present application and should not be construed as limiting the scope of the present application.
Fig. 1 is a schematic flow chart of an embodiment of a night monitoring and identifying method disclosed in the present application.
Fig. 2 is a block diagram of an embodiment of the night monitoring and identification system disclosed in the present application.
Detailed Description
In order to make the implementation objects, technical solutions and advantages of the present application clearer, the technical solutions in the embodiments of the present application will be described in more detail below with reference to the drawings in the embodiments of the present application.
An embodiment of the night time monitoring and identification method disclosed in the present application is described in detail below with reference to fig. 1. As shown in fig. 1, the method disclosed in this embodiment mainly includes the following steps 100 to 400.
Step 100, image enhancement: the image enhancement module acquires neighborhood information of pixels in the night monitoring image and performs adaptive enhancement on the image based on a second-order Taylor series to obtain a self-adaptive enhanced image.
In the image enhancement step, a nonlinear adaptive enhancement method based on a second-order Taylor series is used to address the low contrast, low overall gray value and uneven illumination of the night monitoring image. The method improves image quality by introducing a second-order Taylor series and acquiring the neighborhood information of each pixel for nonlinear enhancement. For backlit or night scene images with insufficient bright-area exposure or with low brightness and low contrast, it effectively raises the overall gray value and contrast of the image; for images with uneven illumination, it performs adaptive enhancement, adaptively adjusting the gray values of bright regions to achieve adaptive image enhancement.
Step 200, a saliency map generation step: the saliency map generation module extracts region features and edge features from the self-adaptive enhanced image, respectively inputs the region features and the edge features into corresponding neural networks, performs saliency calculation on feature recognition results output by the neural networks, and fuses the calculated saliency region maps to obtain a comprehensive saliency map.
If certain features in an image particularly attract attention, this property is referred to as saliency. Saliency computation is commonly used in image segmentation to locate objects and boundaries in images. Specifically, image segmentation is the process of assigning a label to each pixel in an image so that pixels with the same label share certain features. A saliency map is an image showing the uniqueness of each pixel; its purpose is to simplify or change the representation of an image into a form that is easier to analyze.
Aiming at the situation that the boundary of a target detected by the existing target detection method is fuzzy, the saliency map generation step introduces a visual saliency model of a convolutional neural network, and provides a target detection method based on the model. The method extracts the area information and the edge information of the target, learns the extracted area information and the extracted edge information by using two independent convolutional neural networks to obtain the area and edge characteristics of the target, obtains the significance and the non-significance of a detection area, and finally obtains a significance map highlighting the target.
Step 300, target image extraction: the target image extraction module segments the comprehensive saliency map using the maximum entropy method to obtain a binary image, and extracts the target image from the self-adaptive enhanced image based on the binary image.
After the comprehensive saliency map is obtained, the maximum entropy method is first used to segment it into a black-and-white binary image; mathematical morphology operations are then applied to the binary image to remove discontinuous regions; finally, the binary image is superimposed on the self-adaptive enhanced image to obtain the target.
In one embodiment, the image enhancement step of step 100 specifically includes the following steps 110 to 150.
And step 110, converting the night monitoring image into a gray image by a gray conversion unit and carrying out normalization processing.
The night monitoring image is first converted to a grayscale image I(x, y) according to formula (1), where I_r(x, y), I_g(x, y) and I_b(x, y) are, respectively, the red, green and blue component values at pixel (x, y) in the image.
After the grayscale image is obtained, it is normalized, where I(x, y) is the gray value of the pixel and I_in(x, y) is the normalized gray value of the pixel, with I_in(x, y) ∈ [0, 1].
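A minimal sketch of this conversion and normalization is given below; the luminance weighting and the division by 255 are assumptions, since formula (1) is not reproduced in this text.

```python
# Sketch of step 110: grayscale conversion and normalization.
# The luminance weights and the /255 normalization are assumptions.
import numpy as np

def to_normalized_gray(img_bgr: np.ndarray) -> np.ndarray:
    """Convert a color frame (H x W x 3, uint8, BGR order) to a normalized gray image I_in in [0, 1]."""
    img = img_bgr.astype(np.float64)
    b, g, r = img[..., 0], img[..., 1], img[..., 2]   # I_b, I_g, I_r component values
    gray = 0.299 * r + 0.587 * g + 0.114 * b          # assumed weighting for formula (1)
    return gray / 255.0                               # normalization to [0, 1]
```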
In step 120, the image enhancement unit enhances the grayscale image using an exponential nonlinear enhancement function.
The exponential nonlinear enhancement function is
T(I_in(x, y)) = I_in(x, y)^p    (2)
where I_in(x, y) ∈ [0, 1], T(I_in(x, y)) is the image gray value after nonlinear enhancement, and T(I_in(x, y)) ∈ [0, 1]. The exponent p is the key parameter; it is computed adaptively from the mean pixel value I_ave(x, y) ∈ [0, 1] at position (x, y) in the image (see formula (5) below) and the constants c1 and c2, which are determined by testing, with ε taking the value 0.001.
The image is processed by the exponential nonlinear enhancement function, so that the image contrast is improved.
And step 130, the mean value calculating unit performs convolution operation on the gray level image by using a Gaussian bilateral filter function to calculate mean value information of the gray level image.
In order to perform adaptive enhancement processing on an image, neighborhood information of pixels, that is, information of pixels around the pixels at positions in the image, needs to be considered. This embodiment selects a Gaussian Bilateral Filtering (GBF) function to calculate the mean information of the image. The Gaussian bilateral filtering is an optimization method of Gaussian filtering, and the main principle of the method is to filter by utilizing a Gaussian weight coefficient and image information through convolution operation, wherein the weight coefficient is the product of a Gaussian function and image brightness information. Besides using the geometric proximity between the pixels, the bilateral filter also considers the luminosity/color difference between the pixels, so that the bilateral filter can effectively remove the noise on the image and simultaneously save the edge information on the image.
The Gaussian bilateral filter weight w_mn (formula (3)) of a pixel (m, n) in the neighborhood Ω of pixel (x, y) is the product of a spatial Gaussian with standard deviation σ_d and a range Gaussian on the brightness difference with standard deviation σ_r, divided by a normalization factor K, so that the weights satisfy

Σ_{(m,n)∈Ω} w_mn = 1    (4)

The normalized image is convolved with the weights of formula (3) to obtain the mean value information of the pixel at position (x, y) in the image. The convolution operation is defined as

I_ave(x, y) = Σ_{(m,n)∈Ω} w_mn · I_in(m, n)    (5)

where (x, y) is the position of the center pixel of the neighborhood window and I_in(m, n) is the normalized pixel value at position (m, n) in the image.
The edge blurring problem generated during Gaussian filtering can be effectively avoided through Gaussian bilateral filtering, so that clear edge information is reserved, and the image edge is smoother.
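Since the bilateral-filtered value of a pixel is exactly the weighted neighborhood mean of formula (5), this step can be sketched directly with OpenCV; the parameter values below are illustrative, not the ones fixed by this embodiment.

```python
# Sketch of step 130: Gaussian bilateral filtering as the weighted local mean I_ave.
import cv2
import numpy as np

def local_mean_gbf(i_in: np.ndarray, radius: int = 2,
                   sigma_d: float = 1.0, sigma_r: float = 0.5) -> np.ndarray:
    """I_ave(x, y) = sum over (m, n) in Omega of w_mn * I_in(m, n), with Gaussian bilateral weights."""
    d = 2 * radius + 1                              # diameter of the neighborhood Omega
    return cv2.bilateralFilter(i_in.astype(np.float32), d,
                               sigmaColor=sigma_r, sigmaSpace=sigma_d)
```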
In step 140, the convolution unit performs a second-order Taylor series expansion of the exponential nonlinear enhancement function, convolves the mean value information with the expansion result, and obtains the self-adaptive enhanced image of the gray image based on the local output brightness obtained after convolution.
After the neighborhood information of the image has been obtained, Non-linear Adaptive Enhancement (NAE) is applied to the image. So that the local contrast of the output image is consistent with the convolution of formula (5), formula (6) defines the relation between g_out(x, y) and g_ave(x, y), where g_out(x, y) and g_ave(x, y) are, respectively, the output brightness and the local mean of the output brightness of the pixel at position (x, y) after convolution with the second-order Taylor series expansion.
Formula (2) is expanded as a second-order Taylor series, giving formula (7). Analogously to formula (5), formulas (4) and (7) are combined by convolution to give formula (8), in which T'[I_in(x, y)] and T''[I_in(x, y)] are the first and second derivatives of the enhancement function of formula (2). From formulas (6) and (8), the local output brightness of the adjusted image is obtained (formula (9)) as the sum of three components,
g_out(x, y) = g_out1 + g_out2 + g_out3,
where g_out1, g_out2 and g_out3 are the three components of g_out(x, y). For any λ > 0, the term λ[I_in(x, y) - I_ave(x, y)] is a high-frequency component of the input image and serves to improve local contrast.
To protect image edge information, formula (9) is rewritten as formula (10), which expresses the output as the sum of a brightness adjustment component Intensity_adjustment, a contrast enhancement component Contrast_enhancement and an edge protection component Edge_protection, with values constrained to the range [0, 1]. If g_out3 > 0, Edge_protection is used to enhance the image edge information; otherwise Edge_protection protects the image edge information.
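Formulas (6) to (10) are not reproduced in this text, so the following is only a hedged sketch of the core idea: the enhancement function T(I) = I^p of formula (2) is evaluated through its second-order Taylor expansion around the local mean I_ave, combining the neighborhood information of formula (5) with the nonlinear enhancement. The function name and the clipping are illustrative choices.

```python
# Hedged sketch of the second-order Taylor based enhancement: T(I) = I**p is
# approximated by its expansion around the local mean I_ave. The clipping and
# the function name are illustrative; p is the adaptive exponent of formula (2).
import numpy as np

def taylor_enhance(i_in: np.ndarray, i_ave: np.ndarray, p) -> np.ndarray:
    """Second-order Taylor approximation of T(I_in) = I_in**p expanded at I_ave (all values in [0, 1])."""
    eps = 1e-3
    base = np.clip(i_ave, eps, 1.0)
    t0 = base ** p                              # T(I_ave)
    t1 = p * base ** (p - 1)                    # T'(I_ave)
    t2 = p * (p - 1) * base ** (p - 2)          # T''(I_ave)
    d = i_in - i_ave                            # local high-frequency component
    g_out = t0 + t1 * d + 0.5 * t2 * d ** 2
    return np.clip(g_out, 0.0, 1.0)
```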
In step 150, the color conversion unit converts the adaptively enhanced image into a color image.
The grayscale enhancement result is then restored to a color image, where the constant ε in the restoration formula takes the value 0.01.
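The restoration formula itself is not reproduced above; a common choice, assumed in the sketch below, is to scale each original color channel by the ratio of the enhanced gray value to the original gray value, with ε = 0.01 stabilizing the division.

```python
# Hedged sketch of step 150: per-channel ratio scaling is an assumed form of
# the restoration formula, with eps = 0.01 as stated above.
import numpy as np

def restore_color(img_bgr: np.ndarray, gray_in: np.ndarray, gray_out: np.ndarray,
                  eps: float = 0.01) -> np.ndarray:
    ratio = gray_out / (gray_in + eps)                          # per-pixel gain from the enhanced gray image
    out = img_bgr.astype(np.float64) / 255.0 * ratio[..., None]
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)
```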
By testing the values of the neighborhood radius R, the constants c1 and c2 of the parameter p in formula (2), and the standard deviations σ_d and σ_r of the Gaussian bilateral filter in formula (3), the following parameter settings were obtained:
R is 2 or 3, c1 ∈ (0.4, 0.6], c2 ∈ [0.4, 0.6], σ_d ∈ (0, 1], σ_r ∈ (0.5, 1).
In this embodiment, the Gaussian bilateral filter function acquires the neighborhood information of the pixels and the exponential function nonlinearly enhances the image. When this adaptive enhancement method is applied to a low-brightness night monitoring image, the enhanced image has higher overall brightness and contrast and better-defined object edges; when it is applied to an image that is partly dark and partly over-exposed, the enhanced image has higher overall brightness and stronger global contrast, the over-exposed parts of the object are repaired, and details in the dark regions are displayed more clearly.
Comparison tests show that, relative to the original image, the image processed by this embodiment has clearly improved contrast and sharper details.
In one embodiment, the saliency map generation step of step 200 comprises in particular the following steps 210 to 250.
In step 210, the region information obtaining unit performs superpixel segmentation on the adaptive enhanced image obtained in step 100 to obtain the region information of the target.
The super pixel (superpixel) segmentation algorithm divides adjacent pixels with similar texture, color, brightness and other characteristics into image blocks through clustering, and redundant information of the images can be obtained by adopting a pixel grouping method, so that the processing process of the images is simplified. Specifically, a Simple Linear Iterative Clustering (SLIC) algorithm may be selected to perform superpixel segmentation on the adaptive enhanced image.
After the image is divided, the image is divided into a plurality of area blocks with different sizes. Since the convolutional neural network requires the input images to be uniform in size, the centroid of the superpixel block is acquired, and a window with the size of τ × τ is acquired with the centroid as the center.
The centroid of each superpixel block is extracted by a centroid formula in which (z_i, z_j) is the centroid of the i-th region block, i.e., the pixel position of the centroid in the image; Γ_r(z_i, z_j) is the result of taking a window around the centroid (z_i, z_j) with the windowing operation; x_i is the i-th region block; and F_SLIC(x_i) is the superpixel processing result of the SLIC algorithm.
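A sketch of this step using scikit-image's SLIC implementation; the number of superpixels and the window size τ are illustrative values.

```python
# Sketch of step 210 using scikit-image's SLIC; n_segments and tau are illustrative.
import numpy as np
from skimage.segmentation import slic

def superpixel_windows(enhanced_rgb: np.ndarray, n_segments: int = 200, tau: int = 51):
    """Segment the enhanced image into superpixels and crop a tau x tau window around each centroid."""
    labels = slic(enhanced_rgb, n_segments=n_segments, start_label=0)
    h, w = labels.shape
    half = tau // 2
    windows = []
    for lab in np.unique(labels):
        ys, xs = np.nonzero(labels == lab)
        zi, zj = int(ys.mean()), int(xs.mean())                  # centroid (z_i, z_j) of region block x_i
        y0 = int(np.clip(zi - half, 0, h - tau))
        x0 = int(np.clip(zj - half, 0, w - tau))
        windows.append(enhanced_rgb[y0:y0 + tau, x0:x0 + tau])   # fixed-size CNN input window
    return windows
```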
In step 220, the edge information obtaining unit extracts the edge information of the target from the adaptive enhanced image by using the bilateral filter.
The bilateral filter is used to extract the target boundary, after which a contour-extraction algorithm is applied to obtain the image contour information and the edge interest points; this preserves the high-frequency information of the image and keeps its contours. After the interest points are extracted, region windowing can be performed with a region operation operator.
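The specific contour and interest-point algorithm is not named above; the sketch below pairs bilateral filtering with a Canny detector purely as an illustration of extracting edge information while suppressing noise.

```python
# Illustrative sketch of step 220: bilateral filtering to suppress noise while
# keeping boundaries, followed by Canny as a stand-in contour extractor.
import cv2
import numpy as np

def edge_information(enhanced_gray_u8: np.ndarray) -> np.ndarray:
    smoothed = cv2.bilateralFilter(enhanced_gray_u8, 9, 75, 75)   # edge-preserving smoothing
    return cv2.Canny(smoothed, 50, 150)                           # binary edge information of the target
```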
in step 230, the feature identification unit inputs the area information and the edge information into corresponding convolutional neural networks, respectively, to obtain the area feature and the edge feature of the target.
The boundary and the region have different characteristics, so two separate Convolutional Neural Networks (CNNs) can be used to train on the two kinds of features; for example, a GoogLeNet model can be selected to learn the region features and the boundary features of the image. The convolutional neural network used here comprises six layers: convolutional layer 1, pooling layer 1, convolutional layer 2, pooling layer 2, fully-connected layer 1 and fully-connected layer 2. An input image first undergoes two rounds of convolution and downsampling and then two fully-connected layers, yielding the features of the image.
The convolution kernel size is odd; according to the pixel range of the features to be extracted from the image, this embodiment selects an 11 × 11 convolution kernel for feature extraction. After convolution, a Dropout step is used instead of an activation function: during training, neural nodes are randomly set to zero to sparsify the features and speed up the algorithm. The pooling kernel size is 2 × 2 with a stride of 1; overlapping pooling reduces the recognition error rate and prevents overfitting.
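A PyTorch sketch of the six-layer structure just described; the channel widths and the output feature dimension are assumptions, while the layer order, the 11 × 11 kernels, the 2 × 2 stride-1 overlapping pooling and the use of Dropout follow the text.

```python
# Sketch of one of the two region/edge CNNs (conv-pool-conv-pool-fc-fc).
import torch
import torch.nn as nn

class RegionOrEdgeCNN(nn.Module):
    def __init__(self, in_channels: int = 3, feature_dim: int = 256):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=11, padding=5),   # convolutional layer 1
            nn.Dropout(p=0.5),                                       # sparsification in place of an activation
            nn.MaxPool2d(kernel_size=2, stride=1),                   # pooling layer 1 (overlapping)
            nn.Conv2d(32, 64, kernel_size=11, padding=5),            # convolutional layer 2
            nn.Dropout(p=0.5),
            nn.MaxPool2d(kernel_size=2, stride=1),                   # pooling layer 2 (overlapping)
            nn.Flatten(),
        )
        self.fc1 = nn.LazyLinear(512)                                # fully-connected layer 1
        self.fc2 = nn.Linear(512, feature_dim)                       # fully-connected layer 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc2(self.fc1(self.features(x)))
```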
And 240, inputting the region characteristics and the edge characteristics into a saliency detection model for calculation by a saliency map generation unit, and combining the obtained saliency region maps to obtain a comprehensive saliency map.
For local blocks x_i ∈ R^p, define the label set L = {y_i | y_i ∈ {0, 1}}, where 1 denotes salient, 0 denotes non-salient, and y_i is the estimated label. This embodiment constructs a saliency detection model based on a Conditional Random Field (CRF):

P(L | X; w) = exp(-E(L, X; w)) / Z(w)

where w is the conditional random field weight vector, Z(w) is the normalizing partition function, and E(·) is the energy function.
A conditional random field is a conditional probability distribution model P(Y | X) of a set of output random variables Y given a set of input random variables X, under the assumption that the output random variables form a Markov random field. The Markov property means that, for a sequence of random variables ordered in time, the distribution at time N + 1 is independent of the values of the random variables before time N. A random field is the ensemble obtained after each position is randomly assigned a value from its phase space according to some distribution; adding the Markov property to a random field yields a Markov random field.
The saliency value of a local block is s_i = p(y_i = 1 | x_i; w); when m local blocks are salient, the salient region of the image is S = {s_1, s_2, s_3, ..., s_m}. When the saliency maps are merged, the energy function is decomposed into the superposition of two energy terms, a data energy term and a smoothing energy term, where y_i' is the estimate of the i-th local block label and y_i and y_j refer to adjacent local image blocks; the data energy term is the confidence of the pixels in the region map, and the smoothing energy term is the local similarity of the salient region map. The data energy term is in turn decomposed into a region (region) component and a boundary (boundary) component, which jointly determine the data energy term. In formula (11), α and β_x are constants, <·> is the average contrast of the image, and φ_p2(x) is a spatial jump function that reflects the brightness differences between pixels of different local blocks. The data energy term of the conditional random field is learned by two independent convolutional neural networks to obtain confidences, and the region and boundary energies of each node are defined from P_CNN1(y_i' | x_b) and P_CNN2(y_i' | x_b), the region and edge confidences obtained with two separate GoogLeNet networks.
Because there may be multiple targets in the image, the comprehensive saliency map is segmented into salient regions; each salient target is then selected in turn, from left to right and from top to bottom, as the focus of attention. The area of each salient target is computed, and targets whose area is too small are excluded from subsequent processing.
In this embodiment, a visual attention mechanism is introduced into target detection. A superpixel segmentation method clusters the regions belonging to a target to obtain the region information of the target, a bilateral filter extracts the edge information of the target, and the saliency and non-saliency of the detection region are obtained; finally, the learned confidences are fused into a conditional random field to obtain an energy value, and saliency versus non-saliency is decided by minimizing this energy, completing target detection.
Compared with the existing visual saliency model, the saliency map generated by the model provided by the embodiment can effectively cover the boundary of the target, the contrast ratio of the target area and the background area is high, and the saliency of the background area is well inhibited.
In one embodiment, the target image extraction step of step 300 specifically includes the following steps 310 to 330, applied to the comprehensive saliency map S with 256 gray levels.
In step 310, the saliency map dividing unit calculates the occurrence probability of pixels with different gray values in the integrated saliency map, and divides the integrated saliency map into a target region and a background region by using a threshold.
The probability P_i of a pixel with gray value i appearing in the image is calculated as

P_i = n_i / (H × W)

where n_i is the number of pixels with gray value i, and H and W are the height and width of the image, respectively.
Given a threshold t, the image is divided into two parts, a target region O and a background region B. The probability of the target region O is defined as P_O(t) = Σ_{i=0}^{t} P_i, the probability of the background region B as P_B(t) = Σ_{i=t+1}^{255} P_i, and P_O(t) + P_B(t) = 1.
In step 320, the threshold determination unit calculates entropies of the target region and the background region, and determines an optimal threshold based on the entropies of the target region and the background region.
The entropy of the target region is

H_O(t) = -Σ_{i=0}^{t} (P_i / P_O(t)) · ln(P_i / P_O(t))

and the entropy of the background region is

H_B(t) = -Σ_{i=t+1}^{255} (P_i / P_B(t)) · ln(P_i / P_B(t)).

The entropy function is H(t) = H_O(t) + H_B(t), and the gray value at which the entropy function attains its maximum is the optimal threshold:

t* = argmax_{0 ≤ t ≤ 255} H(t).
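A sketch of the maximum entropy threshold selection described in steps 310 and 320, operating on an 8-bit comprehensive saliency map:

```python
# Sketch of steps 310-320: maximum entropy threshold selection on the
# comprehensive saliency map S (8-bit, 256 gray levels).
import numpy as np

def max_entropy_threshold(saliency_u8: np.ndarray) -> int:
    hist = np.bincount(saliency_u8.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()                              # P_i = n_i / (H * W)
    best_t, best_h = 0, -np.inf
    for t in range(256):
        p_o, p_b = p[:t + 1].sum(), p[t + 1:].sum()    # P_O(t), P_B(t)
        if p_o == 0.0 or p_b == 0.0:
            continue
        po, pb = p[:t + 1] / p_o, p[t + 1:] / p_b
        h_o = -np.sum(po[po > 0] * np.log(po[po > 0]))  # entropy of the target region
        h_b = -np.sum(pb[pb > 0] * np.log(pb[pb > 0]))  # entropy of the background region
        if h_o + h_b > best_h:
            best_t, best_h = t, h_o + h_b
    return best_t                                       # t* = argmax H(t)
```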
In step 330, the target image acquisition unit segments the comprehensive saliency map based on the optimal threshold to obtain a binary image, and superimposes the binary image on the self-adaptive enhanced image to obtain the target image.
The comprehensive saliency map is segmented with the obtained optimal threshold t* to obtain a binary image. Morphological operations such as erosion, dilation and opening are applied to the binary image to remove isolated discontinuous white regions, and the binary image after these morphological operations is superimposed on the self-adaptive enhanced image obtained in step 100 to obtain the target in the image.
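A sketch of step 330; the structuring element size and the particular pair of morphological operations used here are illustrative choices.

```python
# Sketch of step 330: threshold, clean up with morphology, and overlay on the
# self-adaptive enhanced image.
import cv2
import numpy as np

def extract_target(saliency_u8: np.ndarray, enhanced_bgr: np.ndarray, t_star: int) -> np.ndarray:
    _, binary = cv2.threshold(saliency_u8, t_star, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)    # erosion then dilation: drop isolated white specks
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)   # fill small discontinuities inside the target
    return cv2.bitwise_and(enhanced_bgr, enhanced_bgr, mask=binary)  # superposition: keep only target pixels
```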
The method provided by the embodiment comprehensively considers the regional characteristics and the boundary characteristics of the target, and learns by adopting the deep convolutional neural network to obtain the regional characteristics and the edge characteristics of the target, so that the significance of the target region can be effectively improved, the significance of a background region can be inhibited, the position information of the target can be accurately positioned, and the boundary of the target can be effectively covered.
In one embodiment, the night monitoring and identification method further includes a target image recognition step. The target image recognition step includes: a depth feature extraction unit extracts the depth features of the target image using a DCNN dual-channel convolutional neural network; and a target class identification unit classifies the selected depth features with a random forest classifier to realize target identification. For the problem that there are many target types with similar appearance, which makes target identification in the monitoring process difficult, a combination of a dual-channel convolutional neural network (Double-channel CNN, DCNN) and a random forest classifier is provided. The DCNN extracts features of the target image through two independent convolutional neural networks; the resulting features are highly abstract and discriminative and better reflect the essential characteristics of the image.
The dual-channel convolutional neural network consists of two independent, parallel convolutional neural network models, CNNa and CNNb, which have different inputs but the same model structure. The DCNN performs feature learning on the image through CNNa and CNNb separately, and finally performs a cross-mixing operation on the learned features at the top of the model to obtain the final target image features.
Specifically, CNNa and CNNb are both 9-layer network structures, including 5 convolutional layers and 4 fully-connected layers. In order to ensure that the features extracted by two groups of CNNs are different and increase the robustness of the features, the image of the CNN input is appropriately transformed, so that the CNNa and the CNNb have difference on the input. The method specifically comprises the following steps: the input of the CNNa is an image with a size of 256 × 256 after the original image is subjected to normalization processing, and the input of the CNNb is a V (brightness) channel component extracted after HSV (hue, saturation, brightness) conversion is performed on the original image.
At its 10th layer, after obtaining the feature data of the last fully-connected layers of CNNa and CNNb, the DCNN performs a two-stage cross-mixing operation on the two groups of feature data. The DCNN first cross-connects the 9th layer of CNNa with the 9th layer of CNNb and takes the result as the input of the 10th layer; in the 10th layer the crossed result is decomposed into two blocks of 512 neurons each; in the 11th layer a second mixing operation is performed on the CNN features extracted from the two streams, yielding a 256-dimensional feature vector, which is the depth feature computed by the DCNN.
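A hedged PyTorch sketch of this cross-mixing structure: the two branch backbones are passed in abstractly and their internal layer widths are not specified here; only the two-stream input, the two mixing operations and the 256-dimensional output follow the description above.

```python
# Hedged sketch of the DCNN cross-mixing head. CNNa sees the normalized
# 256 x 256 color image, CNNb the HSV V channel; each is assumed to emit a
# branch_dim-dimensional layer-9 feature.
import torch
import torch.nn as nn

class DCNN(nn.Module):
    def __init__(self, branch_a: nn.Module, branch_b: nn.Module, branch_dim: int = 512):
        super().__init__()
        self.branch_a = branch_a                               # CNNa: 9-layer stream
        self.branch_b = branch_b                               # CNNb: 9-layer stream
        self.mix1 = nn.Linear(2 * branch_dim, 2 * branch_dim)  # layer 10: first cross mixing (two 512-d blocks)
        self.mix2 = nn.Linear(2 * branch_dim, 256)             # layer 11: second mixing -> 256-d depth feature

    def forward(self, rgb: torch.Tensor, v_channel: torch.Tensor) -> torch.Tensor:
        fa = self.branch_a(rgb)                    # layer-9 features of CNNa
        fb = self.branch_b(v_channel)              # layer-9 features of CNNb
        mixed = torch.cat([fa, fb], dim=1)         # cross connection of the two streams
        mixed = torch.relu(self.mix1(mixed))
        return self.mix2(mixed)                    # 256-dimensional depth feature vector
```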
A Random Forest (RF) classifier forms a decision "forest" from decision trees, each grown on a randomly selected sample subset and a random feature subspace, and obtains the classification result by voting in the classification stage. A random forest decides the prediction result by voting over its trees and has high prediction accuracy, strong noise resistance and good fitting performance, so the depth features identified by the DCNN are classified with a random forest classifier.
The classification process of the target image identification step is divided into two stages: a training phase and a testing phase.
In the training stage, firstly, images are randomly selected from a database by using DCNN (distributed computing network) on an image database and image features are extracted; the learned features are then analyzed based on the fitness of the random forest classification, and feature selection is performed based on the results of the analysis. Finally, the selected features are used to train a random forest.
In the testing stage, firstly, the DCNN is used for calculating the depth features of the images, then the feature subset selected in the training stage is used as the final features of the images, and finally the trained random forest is used for classifying the input images.
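A sketch of the training and testing flow with scikit-learn's random forest on the depth features produced by the DCNN (after the feature selection described below):

```python
# Sketch of the train/test flow: depth features from the DCNN, classified with
# a random forest (Gini criterion, majority voting).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_and_predict(train_feats: np.ndarray, train_labels: np.ndarray,
                      test_feats: np.ndarray) -> np.ndarray:
    rf = RandomForestClassifier(n_estimators=100, criterion="gini")
    rf.fit(train_feats, train_labels)      # training stage
    return rf.predict(test_feats)          # testing stage: class with the most votes wins
```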
Random forest training process: when training a tree in a random forest, the input space is recursively partitioned into a set of disjoint partitions, starting with a root node corresponding to the entire input space. At each node, a set of segmentation rules and prediction models are determined for each partition in order to minimize the loss. And selecting a constant segmentation model when training the random forest so as to adapt to the condition that the image feature dimension extracted by the DCNN is higher.
Suppose the random forest F = {T_t} is a set of trees, and each tree T_t in F is trained on a set of randomly selected training samples S = {s_i = (X_i, y_i)}, where X_i ∈ R^d is the feature vector of training sample s_i and y_i is the class label of the corresponding image. Given the feature X_i, the following partition function is defined at each node:

h(X_i; j, θ) = [ X_i^(j) < θ ]

where X_i^(j) denotes the j-th dimension of the vector X_i. Using the selected dimension and threshold, each sample in S is sent to the left or right subtree, partitioning S into S_l and S_r. Training continues to split the samples until all samples have been tested.
For each node, multiple hypothesis tests are generated by randomly selecting dimensions and thresholds. The Gini coefficient criterion, which is widely used in decision tree algorithms for selecting the splitting attribute, is applied: the candidate with the minimum score determines how samples are sent to the left or right child node.
Suppose the samples in S come from m different classes C_i (i = 1, ..., m). The Gini coefficient of the set S is defined as

Gini(S) = 1 - Σ_{i=1}^{m} P_i²

where P_i is the ratio of the number of samples of class C_i to the number of samples in S.
The Gini coefficient is an impurity measure: it reaches its minimum when all samples in the set belong to one class, and its maximum when the samples are evenly distributed over the classes. If, in a hypothesis test, the set S is divided into two subsets S_l and S_r, the Gini coefficient of the split becomes

Gini_split(S) = (|S_l| / |S|) · Gini(S_l) + (|S_r| / |S|) · Gini(S_r).

Dimensions and thresholds are tested randomly at each node, and the split with the minimum Gini_split is chosen.
Feature selection process: for each node of the random forest tree, depth features need to be tested and the optimal features selected as the feature representation of the image. When training the random forest, the dimension used to split the samples is selected according to the Gini coefficient criterion, i.e., the dimension that splits the samples into purer sets is chosen, so that samples of different classes become easy to distinguish.
The present embodiment employs linear discriminant analysis to evaluate the depth features. Linear discriminant analysis, also known as Fisher linear discriminant analysis, finds linear combinations of features that characterize or separate two or more classes; it ensures that features of different classes are well separated while features of the same class remain compactly distributed. When selecting features, each dimension of the image representation is processed independently and the effectiveness of each feature is evaluated in a manner similar to the Fisher criterion. For each dimension k of the image representation, the intra-class scatter of all samples is computed, where m is the number of classes, X is the image feature vector, D_i is the sample set of class i, and n_i is the number of samples of class i. The intra-class scatter gives the variance, in the tested dimension, of samples within the same class. The inter-class scatter is also computed for the tested dimension, where n is the total number of samples over all classes and D is the full sample set; the inter-class scatter gives the dissimilarity, in dimension k, between samples of different classes. For dimensions in which the intra-class variance is small while the inter-class dissimilarity is large, it is easier to separate the samples into purer subsets. A score f(k) is therefore assigned to each tested dimension, following the Fisher criterion, as the ratio of the inter-class scatter to the intra-class scatter in dimension k; this score measures the discriminative power of the dimension.
the dimensions of the different classes that are best separated can be obtained by taking the maximum value of f (k). The higher a feature dimension score, the higher the likelihood of being selected for image classification recognition. However, if feature selection is performed directly based on the above criteria, it may cause unstable classification performance of random forests. To reduce the correlation between the selected dimensions, the present embodiment performs feature selection sequentially so that each newly selected dimension is least correlated with the previously selected dimension.
Specifically, let K be the set of selected dimensions. To add a new dimension to K, a subset of candidate dimensions L is drawn, with each unselected dimension sampled with probability weighted by its score. For each candidate dimension l in L, its correlation with the dimensions previously selected into K is calculated, and the dimension l* that is least correlated with the previously selected subset is chosen from L and added to K.
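The scatter formulas themselves are not reproduced above; the sketch below scores each dimension with a standard Fisher-style ratio of inter-class to intra-class scatter, as an assumed concrete form of f(k).

```python
# Hedged sketch of per-dimension scoring: a Fisher-style ratio of inter-class
# to intra-class scatter, assumed here as the concrete form of f(k).
import numpy as np

def dimension_scores(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return f(k) for every feature dimension k; higher means more discriminative."""
    overall_mean = X.mean(axis=0)
    within = np.zeros(X.shape[1])
    between = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        within += Xc.var(axis=0) * len(Xc)                           # intra-class scatter in each dimension
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2   # inter-class scatter in each dimension
    return between / (within + 1e-12)                                # f(k)
```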
Target class identification: after the random forest F has been trained, each tree has a set of leaves. For an input image to be recognized, the DCNN first extracts its depth features; the depth features from layers 6 to 11 of the DCNN are then concatenated into an image representation; the representation is reformulated by discarding the unselected dimensions, and the reconstructed representation is passed through each random tree, descending from the root node according to the split functions until it reaches a leaf node, where a predicted value for the image to be identified is obtained. Finally, the class with the largest number of votes from the random forest is taken as the label of the image to be identified.
An embodiment of the night surveillance identification system disclosed herein is described in detail below with reference to fig. 2. The present embodiment is an embodiment for implementing the night monitoring and identifying method. As shown in fig. 2, the system disclosed in the present embodiment includes:
the image enhancement module is used for acquiring neighborhood information of pixels in the night monitoring image and carrying out self-adaptive enhancement on the image based on a second-order Taylor series to obtain a self-adaptive enhanced image;
the salient map generation module is used for extracting the regional characteristics and the edge characteristics from the self-adaptive enhanced image, respectively inputting the regional characteristics and the edge characteristics into the corresponding neural networks, performing saliency calculation on the characteristic recognition result output by the neural networks, and fusing the calculated salient regional maps to obtain a comprehensive salient map;
the target image extraction module is used for segmenting the comprehensive saliency map by utilizing a maximum entropy method to obtain a binary image and extracting a target image from the self-adaptive enhanced image based on the binary image;
and the target image identification module is used for identifying the target image by utilizing the neural network.
In one embodiment, an image enhancement module comprises:
the gray level conversion unit is used for converting the night monitoring image into a gray level image and carrying out normalization processing;
the image enhancement unit is used for enhancing the gray level image by utilizing an exponential nonlinear enhancement function;
the mean value calculating unit is used for performing convolution operation on the gray level image by utilizing a Gaussian bilateral filter function to calculate mean value information of the gray level image;
the convolution unit is used for performing a second-order Taylor series expansion on the exponential nonlinear enhancement function, convolving the mean value information with the expansion result, and obtaining a self-adaptive enhanced image of the gray level image based on local output brightness obtained after the convolution;
and the color conversion unit is used for converting the self-adaptive enhanced image into a color image.
In one embodiment, the saliency map generation module comprises:
the region information acquisition unit is used for performing super-pixel segmentation on the self-adaptive enhanced image to obtain region information of a target;
the edge information acquisition unit is used for extracting the edge information of the target from the self-adaptive enhanced image by using the bilateral filter;
the characteristic identification unit is used for respectively inputting the area information and the edge information into the corresponding convolutional neural networks to obtain the area characteristic and the edge characteristic of the target;
and the saliency map generation unit is used for inputting the region features and the edge features into the saliency detection model for calculation, and combining the obtained saliency region maps to obtain a comprehensive saliency map.
In one embodiment, the target image extraction module comprises:
the saliency map dividing unit is used for calculating the occurrence probability of pixels with different gray values in the comprehensive saliency map and dividing the comprehensive saliency map into a target region and a background region by utilizing a threshold;
the threshold value determining unit is used for calculating the entropies of the target area and the background area and determining an optimal threshold value based on the entropies of the target area and the background area;
and the target image acquisition unit is used for segmenting the comprehensive saliency map based on the optimal threshold value to obtain a binary image, and superposing the binary image and the self-adaptive enhanced image to obtain a target image.
In one embodiment, the system further comprises: a target image recognition module; the target image recognition module includes:
the depth feature extraction unit is used for extracting the depth features of the target image by utilizing a DCNN two-channel convolutional neural network;
and the target class identification unit is used for classifying the depth features by utilizing a random forest classifier so as to realize target class identification.
The division into modules and units described here is only one division of logical functions; other divisions are possible in actual implementations, for example several modules and/or units may be combined or integrated into another system. Modules and units described as separate parts may or may not be physically separate, and components shown as units may or may not be physical units: they may be located in one place or distributed over multiple network units. Some or all of the units may therefore be selected according to actual needs to implement the scheme of this embodiment.
The above description covers only specific embodiments of the present application; the scope of the present application is not limited to them, and any change or substitution readily conceived by those skilled in the art within the technical scope of the present application falls within that scope. The protection scope of the present application shall therefore be defined by the claims.

Claims (10)

1. A night monitoring and identification method based on neural network enhancement, characterized by comprising the following steps:
an image enhancement step: acquiring neighborhood information of the pixels in a night monitoring image, and adaptively enhancing the image based on a second-order Taylor series to obtain an adaptively enhanced image;
a saliency map generation step: extracting region features and edge features from the adaptively enhanced image, feeding them into their respective neural networks, performing saliency calculation on the feature recognition results output by the neural networks, and fusing the calculated salient-region maps to obtain a comprehensive saliency map;
a target image extraction step: segmenting the comprehensive saliency map with a maximum entropy method to obtain a binary image, and extracting a target image from the adaptively enhanced image based on the binary image.
2. The method according to claim 1, wherein the image enhancement step specifically comprises:
converting the night monitoring image into a grayscale image and normalizing it;
enhancing the grayscale image with an exponential nonlinear enhancement function;
convolving the grayscale image with a Gaussian bilateral filter function to obtain mean value information of the grayscale image;
performing a second-order Taylor series expansion of the exponential nonlinear enhancement function, convolving the mean value information with the expansion result, and obtaining an adaptively enhanced image of the grayscale image from the local output brightness produced by the convolution;
converting the adaptively enhanced image into a color image.
3. The method of claim 1, wherein the saliency map generation step specifically comprises:
performing superpixel segmentation on the adaptively enhanced image to obtain region information of the target;
extracting edge information of the target from the adaptively enhanced image with a bilateral filter;
feeding the region information and the edge information into their respective convolutional neural networks to obtain region features and edge features of the target;
and feeding the region features and the edge features into a saliency detection model for calculation, and fusing the resulting salient-region maps into a comprehensive saliency map.
4. The method according to claim 1, wherein the target image extraction step specifically comprises:
calculating the occurrence probability of pixels of each gray value in the comprehensive saliency map, and dividing the comprehensive saliency map into a target region and a background region with a threshold;
calculating the entropies of the target region and the background region, and determining an optimal threshold from those entropies;
and segmenting the comprehensive saliency map with the optimal threshold to obtain a binary image, and overlaying the binary image on the adaptively enhanced image to obtain a target image.
5. The method of claim 1, further comprising: a target image recognition step; the target image recognition step includes:
extracting depth features of the target image with a dual-channel deep convolutional neural network (DCNN);
and classifying the depth features with a random forest classifier to identify the target class.
6. A night monitoring and identification system based on neural network enhancement, characterized by comprising:
the image enhancement module is used for acquiring neighborhood information of the pixels in a night monitoring image and adaptively enhancing the image based on a second-order Taylor series to obtain an adaptively enhanced image;
the saliency map generation module is used for extracting region features and edge features from the adaptively enhanced image, feeding them into their respective neural networks, performing saliency calculation on the feature recognition results output by the neural networks, and fusing the calculated salient-region maps to obtain a comprehensive saliency map;
the target image extraction module is used for segmenting the comprehensive saliency map with a maximum entropy method to obtain a binary image and extracting a target image from the adaptively enhanced image based on the binary image;
and the target image recognition module is used for recognizing the target image with a neural network.
7. The system of claim 6, wherein the image enhancement module comprises:
the gray level conversion unit is used for converting the night monitoring image into a grayscale image and normalizing it;
the image enhancement unit is used for enhancing the grayscale image with an exponential nonlinear enhancement function;
the mean value calculating unit is used for convolving the grayscale image with a Gaussian bilateral filter function to obtain mean value information of the grayscale image;
the convolution unit is used for performing a second-order Taylor series expansion of the exponential nonlinear enhancement function, convolving the mean value information with the expansion result, and obtaining an adaptively enhanced image of the grayscale image from the local output brightness produced by the convolution;
and the color conversion unit is used for converting the adaptively enhanced image into a color image.
8. The system of claim 6, wherein the saliency map generation module comprises:
the region information acquisition unit is used for performing superpixel segmentation on the adaptively enhanced image to obtain region information of the target;
the edge information acquisition unit is used for extracting edge information of the target from the adaptively enhanced image with a bilateral filter;
the feature recognition unit is used for feeding the region information and the edge information into their respective convolutional neural networks to obtain the region features and edge features of the target;
and the saliency map generation unit is used for feeding the region features and the edge features into a saliency detection model for calculation, and fusing the resulting salient-region maps into a comprehensive saliency map.
9. The system of claim 6, wherein the target image extraction module comprises:
the saliency map dividing unit is used for calculating the occurrence probability of pixels of each gray value in the comprehensive saliency map and dividing the comprehensive saliency map into a target region and a background region with a threshold;
the threshold determination unit is used for calculating the entropies of the target region and the background region and determining an optimal threshold from those entropies;
and the target image acquisition unit is used for segmenting the comprehensive saliency map with the optimal threshold to obtain a binary image, and overlaying the binary image on the adaptively enhanced image to obtain a target image.
10. The system of claim 6, further comprising: a target image recognition module; the target image recognition module includes:
the depth feature extraction unit is used for extracting depth features of the target image with a dual-channel deep convolutional neural network (DCNN);
and the target class identification unit is used for classifying the depth features with a random forest classifier to identify the target class.
CN201910754820.2A 2019-08-15 2019-08-15 Night monitoring and identifying method and system based on neural network enhancement Pending CN110598560A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910754820.2A CN110598560A (en) 2019-08-15 2019-08-15 Night monitoring and identifying method and system based on neural network enhancement

Publications (1)

Publication Number Publication Date
CN110598560A true CN110598560A (en) 2019-12-20

Family

ID=68854437

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910754820.2A Pending CN110598560A (en) 2019-08-15 2019-08-15 Night monitoring and identifying method and system based on neural network enhancement

Country Status (1)

Country Link
CN (1) CN110598560A (en)

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409384A (en) * 2018-09-30 2019-03-01 内蒙古科技大学 Image-recognizing method, device, medium and equipment based on fine granularity image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李军锋 (Li Junfeng): "Research on Image Recognition of Power Equipment Based on Deep Learning and Its Applications", China Doctoral Dissertations Full-text Database, Engineering Science and Technology II *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160194B (en) * 2019-12-23 2022-06-24 浙江理工大学 Static gesture image recognition method based on multi-feature fusion
CN111160194A (en) * 2019-12-23 2020-05-15 浙江理工大学 Static gesture image recognition method based on multi-feature fusion
CN111223110A (en) * 2020-01-06 2020-06-02 陈根生 Microscopic image enhancement method and device and computer equipment
CN111223110B (en) * 2020-01-06 2023-07-18 陈根生 Microscopic image enhancement method and device and computer equipment
CN111161196A (en) * 2020-02-26 2020-05-15 上海电机学院 Adaptive enhancement method for aerial image of power transmission line
CN111538571A (en) * 2020-03-20 2020-08-14 重庆特斯联智慧科技股份有限公司 Method and system for scheduling task of edge computing node of artificial intelligence Internet of things
CN111538571B (en) * 2020-03-20 2021-06-29 重庆特斯联智慧科技股份有限公司 Method and system for scheduling task of edge computing node of artificial intelligence Internet of things
CN111723749A (en) * 2020-06-23 2020-09-29 广东电网有限责任公司 Method, system and equipment for identifying wearing of safety helmet
CN111476224B (en) * 2020-06-28 2020-10-09 杭州鸿泉物联网技术股份有限公司 Safety belt detection method and device, electronic equipment and system
CN111476224A (en) * 2020-06-28 2020-07-31 杭州鸿泉物联网技术股份有限公司 Safety belt detection method and device, electronic equipment and system
CN113449735A (en) * 2021-07-15 2021-09-28 北京科技大学 Semantic segmentation method and device for superpixel segmentation
CN113449735B (en) * 2021-07-15 2023-10-31 北京科技大学 Semantic segmentation method and device for super-pixel segmentation
CN113628196A (en) * 2021-08-16 2021-11-09 广东艾檬电子科技有限公司 Image content extraction method, device, terminal and storage medium
CN113901929A (en) * 2021-10-13 2022-01-07 河北汉光重工有限责任公司 Dynamic target detection and identification method and device based on significance
CN116580290A (en) * 2023-07-11 2023-08-11 成都庆龙航空科技有限公司 Unmanned aerial vehicle identification method, unmanned aerial vehicle identification device and storage medium
CN116580290B (en) * 2023-07-11 2023-10-20 成都庆龙航空科技有限公司 Unmanned aerial vehicle identification method, unmanned aerial vehicle identification device and storage medium

Similar Documents

Publication Publication Date Title
CN110598560A (en) Night monitoring and identifying method and system based on neural network enhancement
CN109977812B (en) Vehicle-mounted video target detection method based on deep learning
US20230289979A1 (en) A method for video moving object detection based on relative statistical characteristics of image pixels
CN111738064B (en) Haze concentration identification method for haze image
Agrawal et al. Grape leaf disease detection and classification using multi-class support vector machine
CN109684922B (en) Multi-model finished dish identification method based on convolutional neural network
CN107273832B (en) License plate recognition method and system based on integral channel characteristics and convolutional neural network
CN109918971B (en) Method and device for detecting number of people in monitoring video
Le et al. Real time traffic sign detection using color and shape-based features
CN104598924A (en) Target matching detection method
CN111027475A (en) Real-time traffic signal lamp identification method based on vision
CN113011357A (en) Depth fake face video positioning method based on space-time fusion
CN110008899B (en) Method for extracting and classifying candidate targets of visible light remote sensing image
CN107590500A (en) A kind of color recognizing for vehicle id method and device based on color projection classification
CN110706235A (en) Far infrared pedestrian detection method based on two-stage cascade segmentation
CN105046218A (en) Multi-feature traffic video smoke detection method based on serial parallel processing
CN112990282B (en) Classification method and device for fine-granularity small sample images
CN106503638A (en) For the image procossing of colour recognition, vehicle color identification method and system
CN109255326A (en) A kind of traffic scene smog intelligent detecting method based on multidimensional information Fusion Features
CN108921857A (en) A kind of video image focus area dividing method towards monitoring scene
CN111274964B (en) Detection method for analyzing water surface pollutants based on visual saliency of unmanned aerial vehicle
Ghazali et al. Pedestrian detection in infrared outdoor images based on atmospheric situation estimation
CN115861308B (en) Acer truncatum disease detection method
CN112307894A (en) Pedestrian age identification method based on wrinkle features and posture features in community monitoring scene
CN107341456B (en) Weather sunny and cloudy classification method based on single outdoor color image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191220)