WO2024037408A1 - A pedestrian detection method for underground coal mines based on image fusion and feature enhancement - Google Patents


Info

Publication number
WO2024037408A1
WO2024037408A1 (application PCT/CN2023/112201; CN2023112201W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
fusion
cornernet
pedestrian
squeeze
Prior art date
Application number
PCT/CN2023/112201
Other languages
English (en)
French (fr)
Inventor
邹盛
周李兵
陈晓晶
季亮
叶柏松
郝大彬
邱云香
于政乾
蒋雪利
王天宇
黄小明
张清
Original Assignee
天地(常州)自动化股份有限公司
中煤科工集团常州研究院有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 天地(常州)自动化股份有限公司, 中煤科工集团常州研究院有限公司 filed Critical 天地(常州)自动化股份有限公司
Publication of WO2024037408A1 publication Critical patent/WO2024037408A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/20Image enhancement or restoration using local operators
    • G06T5/30Erosion or dilatation, e.g. thinning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10048Infrared image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • The present invention relates to the technical field of pedestrian detection in coal mines, and in particular to a pedestrian detection method for underground coal mines based on image fusion and feature enhancement.
  • Pedestrian detection methods based on machine vision use camera devices to obtain video images, detect and analyze target information through image processing algorithms, and use them for subsequent tracking tasks. They play an important role in video surveillance, driverless vehicles, intelligent robots and other fields.
  • In intelligent mine construction, machine vision technology is used to detect pedestrians in dangerous areas such as along long-distance conveyor belts, at closed tunnel entrances, and in inclined tunnels, which is of great significance for improving coal mine safety production management and preventing casualties.
  • However, the underground video environment is complex, with dim light and heavy noise interference, and underground surveillance cameras are generally installed high up, so pedestrians in the video images are small, low in resolution, vary in scale, and overlap one another.
  • Traditional pedestrian detection algorithms such as HOG+SVM, ICF+AdaBoost, and DPM rely mainly on hand-designed features, which are one-sided and subjective, generalize poorly, and are difficult to apply to multi-scale pedestrian target detection under special underground conditions such as low illumination and dust. With the continuous development of deep learning artificial intelligence algorithms, features are extracted automatically through training on large-scale data sets, which solves the poor model robustness caused by the manual feature extraction of traditional methods.
  • Deep learning pedestrian target detection algorithms are mainly divided into two categories: two-stage and one-stage.
  • Two-stage methods first generate region-based target candidate boxes and then perform classification and regression.
  • Represented by R-CNN, Fast R-CNN, and Faster R-CNN, two-stage methods achieve better results than traditional detection methods; although their detection accuracy is higher, their detection efficiency is lower. One-stage methods instead use an end-to-end training network that outputs results directly, with no need to generate candidate boxes; they mainly include the SSD, YOLO, and CornerNet families. The CornerNet family has detection accuracy comparable to two-stage detectors while avoiding the problems of the SSD and YOLO families, whose anchor-box mechanism introduces too many hyperparameters and increases computation, and it converts target detection into the detection of target key points.
  • The hourglass feature extraction network (Hourglass) is used as the backbone network; the bounding box position is determined from the target's top-left and bottom-right corner points, and the step of generating anchor boxes is omitted.
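As a minimal illustration of the corner-point formulation (the function name is ours, not from the patent), pairing a predicted top-left and bottom-right corner fixes the bounding box with no anchor generation at all:

```python
def corners_to_box(top_left, bottom_right):
    """Build an axis-aligned bounding box (x, y, width, height) from a
    matched pair of predicted corner key points."""
    (x1, y1), (x2, y2) = top_left, bottom_right
    return (x1, y1, x2 - x1, y2 - y1)
```

For example, corners (10, 20) and (50, 80) yield the box (10, 20, 40, 60).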
  • the present invention aims to solve at least one of the technical problems existing in the prior art.
  • the present invention proposes a pedestrian detection method in coal mines based on image fusion and feature enhancement to improve the detection capability of multi-scale pedestrian targets in underground low-illumination complex environments.
  • Step 1. Fusion processing of the depth image and infrared image:
  • The fusion of the depth image and the infrared image uses the TIF algorithm and is achieved in three steps: image decomposition, image fusion, and image reconstruction; the fused image is then subjected to morphological processing.
  • Step 2. Construct a CornerNet-Squeeze pedestrian target detection network with enhanced target edge features: the CornerNet-Squeeze network combines the CornerNet network with the SqueezeNet network, using the fire module from SqueezeNet to replace the Res residual module in CornerNet; octave convolution (OctConv) is introduced into the CornerNet-Squeeze network after the backbone to process the high- and low-frequency feature information, forming a feature enhancement module and an improved CornerNet-Squeeze pedestrian target detection network.
  • Step 3. Establish an underground infrared-depth fused pedestrian detection data set and train the edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model: the infrared camera and depth camera are installed on the roof of an explosion-proof trackless rubber-tired vehicle to fully collect data on pedestrians underground in the coal mine; the collected depth and infrared images are registered and aligned, and the fusion method of Step 1 forms the fused images. Annotation software is used to manually annotate the depth images, the infrared images, and the fused images, yielding three data sets: a depth-image training data set, an infrared-image training data set, and a fused-image training data set. Each data set is divided into a training set and a test set, and the edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model is trained.
  • Step 4. The edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model is deployed on an intrinsically safe edge computing device and verified on the test set: the improved edge-feature-enhanced CornerNet-Squeeze algorithm and the original CornerNet-Squeeze algorithm are each trained on the depth-image, infrared-image, and fused-image training data sets, and the resulting models are deployed, tested, and verified on the intrinsically safe edge computing equipment.
  • The beneficial effects of the present invention are: (1) for underground low-illumination application scenarios, fusing the infrared image and the depth image combines the advantages of both, and subsequent morphological processing effectively reduces background interference, yielding pedestrian target features with richer edge contours; (2) deep learning extracts target features automatically, and, based on the CornerNet-Squeeze target network model, introducing an octave convolution (OctConv) connection after the hourglass backbone network effectively processes the high- and low-frequency information in the image features, enhances image edge features, and improves the detection of small pedestrian targets.
  • The image decomposition applies a mean filter to the aligned infrared and depth images of the same size to obtain the base layer and detail layer of each image.
  • an arithmetic mean strategy is used for fusion of base layer images.
  • a weighted average strategy is used for fusion of detail layer images.
  • In image reconstruction, the fused base layer of the depth and infrared images and their fused detail layer are added directly to obtain the final fused image of the depth image and the infrared image.
  • In the morphological processing of the fused image, the fused image is processed with a morphological opening operation: erosion first, then dilation.
  • the feature enhancement module of octave convolution OctConv is introduced after the CornerNet-Squeeze backbone network.
  • the specific processing steps are as follows:
  • Step 2.1 Perform a convolution operation on the feature map extracted by the backbone network to reduce its dimensionality;
  • Step 2.2 The dimensionally reduced feature map uses OctConv to separate and fuse high- and low-frequency feature information;
  • Step 2.3 Perform a deconvolution operation on the output high-frequency information to restore the feature size.
  • the depth image, the infrared image and the fused image after the fusion of the two are manually annotated using the annotation software LabelImg.
  • Figure 1 is an algorithm flow chart of the present invention
  • Figure 2 is a schematic diagram of image fusion processing
  • Figure 3 is an infrared image during image fusion processing
  • Figure 4 is the depth image during image fusion processing
  • Figure 5 is the fused image during the image fusion process
  • Figure 6 is the fused image after morphological processing during the image fusion process
  • Figure 7 is a schematic diagram of a single module structure of an hourglass network
  • Figure 8 is a schematic diagram of the Res residual module in the backbone network
  • Figure 9 is a schematic diagram of the fire module in the SqueezeNet network
  • Figure 10 is a schematic diagram of the improved CornerNet-Squeeze-Oct network structure
  • Figure 11 is a schematic diagram of the OctConv operation process
  • Figure 12 is the data set production flow chart
  • Figure 13 is a schematic diagram of the CornerNet-Squeeze detection results
  • Figure 14 is a schematic diagram of the CornerNet-Squeeze detection results with target edge enhancement.
  • CornerNet-Squeeze is based on the CornerNet network and combines the ideas of the SqueezeNet network. It makes lightweight improvements to the residual module of the stacked hourglass backbone network, greatly reducing network parameters and improving the model's inference speed.
  • the CornerNet-Squeeze network only performs lightweight processing on the backbone hourglass network.
  • visible light cameras are used to collect visible light images
  • infrared cameras are used to collect infrared images
  • depth cameras are used to collect depth images.
  • the advantage of visible light images lies in their high resolution and rich background details.
  • their disadvantages are that they are easily affected by external factors and have poor imaging quality in complex environments such as low illumination, making them unable to meet actual detection needs.
  • In infrared images, the pedestrian target area is prominent and is not affected by lighting conditions.
  • its disadvantage is low resolution and less detailed feature information.
  • Pedestrians in depth images have clear outlines and are not easily affected by the environment, but the imaging distance is short.
  • Therefore, the present invention proposes a pedestrian detection method for underground coal mines based on image fusion and feature enhancement, specifically a coal mine pedestrian target detection method based on image fusion and CornerNet-Squeeze.
  • First, the TIF algorithm fuses the images collected by the infrared camera and the depth camera at the pixel level, fully combining the advantages of the two, and morphological processing then reduces background interference. Next, based on the CornerNet-Squeeze target network model, an octave convolution (OctConv) connection is introduced after the hourglass backbone network to process the high- and low-frequency information in the image features and enhance image edge features, which effectively improves the detection of multi-scale pedestrian targets in low-illumination, complex underground environments.
  • a pedestrian detection method in coal mines based on image fusion and feature enhancement of the present invention includes the following steps:
  • The fusion of depth images and infrared images uses the TIF (Two-Scale Image Fusion) algorithm, accomplished through three steps: image decomposition, image fusion, and image reconstruction.
  • The morphological method is then used to process the fused image, that is, morphological processing is performed on the fused image.
  • the processing steps are as follows:
  • Image decomposition is to use the mean filter on the aligned infrared image and depth image of the same size to obtain the base layer and detail layer of the image respectively.
  • Image decomposition first convolves the aligned original infrared image f1(x, y) and original depth image f2(x, y) of the same size with the mean filter μ(x, y) to obtain the infrared-image base layer b1(x, y) and the depth-image base layer b2(x, y); after the base layers are obtained, the detail layers are obtained as the difference between each original image and its base layer: di(x, y) = fi(x, y) − bi(x, y), i = 1, 2.
  • Step 1.2 image fusion
  • fb(x, y) represents the fused base layer, obtained as the arithmetic mean of the base layers of the depth image and the infrared image produced by image decomposition: fb(x, y) = (b1(x, y) + b2(x, y)) / 2.
  • For the detail layers, a visually salient image ψ(x, y) is obtained for each source by computing the Euclidean distance between the image obtained by mean filtering and the image obtained by median filtering of the RGB three-channel data of the original infrared and depth images respectively.
  • δ1(x, y) represents the infrared detail-layer fusion coefficient and δ2(x, y) the depth detail-layer fusion coefficient, where ψ1(x, y) and ψ2(x, y) are the visually salient images of the original infrared image and the original depth image: δ1(x, y) = ψ1(x, y) / (ψ1(x, y) + ψ2(x, y)), δ2(x, y) = 1 − δ1(x, y).
  • fd(x, y) represents the fused detail layer, the weighted average of the infrared-image detail layer and the depth-image detail layer: fd(x, y) = δ1(x, y)d1(x, y) + δ2(x, y)d2(x, y).
  • Step 1.3 image reconstruction.
  • In image reconstruction, the fused base layer and the fused detail layer are added directly to obtain the final fused image F(x, y) of the depth image and the infrared image: F(x, y) = fb(x, y) + fd(x, y).
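The three TIF steps above (decomposition, fusion, reconstruction) can be sketched in NumPy as follows. This is an illustrative sketch, not the patent's implementation: the kernel sizes and the saliency formula (distance between mean- and median-filtered images) follow the published TIF scheme, and all function names are our assumptions.

```python
import numpy as np

def mean_filter(img, k=3):
    """Box (mean) filter with edge padding; the kernel size is an
    illustrative assumption, not a value taken from the patent."""
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)

def median_filter(img, k=3):
    """Median filter over a k x k neighbourhood with edge padding."""
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    h, w = img.shape
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return np.median(np.stack(windows), axis=0)

def tif_fuse(f1, f2):
    """Fuse an infrared image f1 and a depth image f2 with the TIF scheme:
    mean-filter decomposition, arithmetic-mean base fusion, saliency-weighted
    detail fusion, and additive reconstruction."""
    f1 = f1.astype(float)
    f2 = f2.astype(float)
    # Step 1.1: base layers by mean filtering, detail layers by subtraction.
    b1, b2 = mean_filter(f1), mean_filter(f2)
    d1, d2 = f1 - b1, f2 - b2
    # Step 1.2: arithmetic mean of the base layers ...
    fb = (b1 + b2) / 2.0
    # ... and a saliency-weighted average of the detail layers, with saliency
    # taken as the distance between mean- and median-filtered images.
    psi1 = np.abs(mean_filter(f1) - median_filter(f1))
    psi2 = np.abs(mean_filter(f2) - median_filter(f2))
    delta1 = psi1 / (psi1 + psi2 + 1e-12)
    fd = delta1 * d1 + (1.0 - delta1) * d2
    # Step 1.3: reconstruction by direct addition of base and detail.
    return fb + fd
```

A quick sanity check of the scheme: fusing an image with itself returns the image unchanged, since the fused base and detail layers reduce to the original ones.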
  • Step 1.4 morphological processing.
  • Dilation and erosion convolve a two-dimensional image (or part of an image) with a template (that is, the kernel). They can eliminate small high-brightness areas, remove isolated dots and burrs, eliminate small objects, and smooth the boundaries of larger objects.
  • Dilation is the operation of finding the local maximum of the image (x, y) under the convolution kernel (x′, y′); erosion is the operation of finding the local minimum of the image (x, y) under the convolution kernel (x′, y′).
  • The present invention processes the fused image with a morphological opening operation, erosion first and then dilation, to reduce background interference and highlight pedestrian contour features.
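A minimal NumPy sketch of the opening operation described above (erosion as a local minimum, dilation as a local maximum, opening as erosion followed by dilation); the kernel size and function names are illustrative assumptions:

```python
import numpy as np

def _window_stack(img, k):
    """All k x k neighbourhoods of img as a stacked array (edge padding)."""
    p = k // 2
    padded = np.pad(img, p, mode="edge")
    h, w = img.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(k) for dx in range(k)])

def erode(img, k=3):
    """Grayscale erosion: local minimum over a k x k window."""
    return _window_stack(img, k).min(axis=0)

def dilate(img, k=3):
    """Grayscale dilation: local maximum over a k x k window."""
    return _window_stack(img, k).max(axis=0)

def opening(img, k=3):
    """Morphological opening, erosion followed by dilation: removes isolated
    bright dots and burrs while preserving larger structures."""
    return dilate(erode(img, k), k)
```

Applied to a binary image, opening deletes a single bright pixel (a "burr") but leaves a large bright block intact, which is exactly the background-suppression behaviour the text describes.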
  • the original depth image and infrared image are shown in Figure 3 and Figure 4.
  • The fused image obtained after steps 1.1, 1.2, and 1.3 is shown in Figure 5; the result shows that the fused image combines the pedestrian grayscale features of the infrared image with the contour edges of the depth image. The result of the morphological processing of step 1.4 is shown in Figure 6: much unnecessary environmental information is suppressed and the pedestrian features are highlighted, which helps to improve the accuracy of pedestrian detection.
  • Step 2. Construct a CornerNet-Squeeze pedestrian target detection network with enhanced target edge features: the CornerNet-Squeeze network combines the CornerNet network with the SqueezeNet network, using the fire module from SqueezeNet to replace the Res residual module in CornerNet; the OctConv feature enhancement module is introduced into the CornerNet-Squeeze network after the backbone to process the high- and low-frequency feature information, forming an improved CornerNet-Squeeze pedestrian target detection network.
  • The core idea of the CornerNet network is to obtain, through the convolution and pooling of the hourglass backbone network (Hourglass), probability maps for two sets of corner points, the target's top-left and bottom-right corners, also called heat maps; the predicted corner points in each set of heat maps are then aggregated to form the target detection box.
  • CornerNet-Squeeze is based on the CornerNet network and combines the idea of the SqueezeNet network, and makes lightweight improvements to the residual module of the stacked hourglass backbone network.
  • the structure of a single module of the hourglass network is shown in Figure 7.
  • the Res residual module in the backbone network is shown in Figure 8.
  • The fire module in the SqueezeNet network (shown in Figure 9) is used to replace the Res residual module in the hourglass network.
  • Each original residual module contains two 3×3 kernel convolution layers, while the fire module first uses a 1×1 kernel convolution layer for data dimensionality reduction and then expands the output with a combination of a 1×1 kernel convolution layer and a 3×3 kernel convolution layer, greatly reducing network parameters and improving the model's inference speed.
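A rough parameter count shows why the fire module is lighter than the two 3×3 convolutions of a Res module. The squeeze ratio and the even split between the 1×1 and 3×3 expand branches below are illustrative assumptions, not values taken from the patent:

```python
def res_block_params(c_in, c_out):
    """Weights in a Res module's two 3x3 convolutions (biases ignored)."""
    return 3 * 3 * c_in * c_out + 3 * 3 * c_out * c_out

def fire_module_params(c_in, c_out, squeeze_ratio=0.25):
    """Weights in a fire module: a 1x1 squeeze layer, then parallel 1x1 and
    3x3 expand branches whose outputs concatenate to c_out channels."""
    s = int(c_in * squeeze_ratio)   # squeeze channels (illustrative ratio)
    e = c_out // 2                  # each expand branch contributes half
    return c_in * s + s * e + 3 * 3 * s * e
```

With 256 input and output channels, the Res module needs about 1.18 M weights while the fire module under these assumptions needs roughly a tenth of that, which is the source of the parameter and speed savings described above.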
  • The CornerNet-Squeeze network only performs lightweight processing on the backbone hourglass network; when the corner points of the bounding box are predicted later, any incompleteness in the feature information extracted by the hourglass network directly affects the judgment of target positions in the heat map, resulting in incorrect positioning of the target box.
  • This invention introduces octave convolution (OctConv) into the CornerNet-Squeeze network to process the high- and low-frequency feature information after the backbone network and enhance image edge features. This benefits the detection of corner positions in the heat map, facilitates the differentiation and positioning of targets, reduces the misdetection of similar targets and small targets with small spatial separations caused by missed corner detection in the CornerNet-Squeeze network, and improves target recognition accuracy.
  • the improved CornerNet-Squeeze network structure is shown in Figure 10.
  • Step 2.1 Perform convolution and dimensionality reduction on the feature map extracted by the backbone network: a 1×1 convolution is applied to the extracted feature map;
  • Step 2.2 The dimensionally reduced feature map uses OctConv to separate and fuse high- and low-frequency feature information: the feature map after dimensionality reduction is filtered by OctConv to separate and fuse the high- and low-frequency components.
  • the OctConv operation process is shown in Figure 11.
  • X, Y ∈ ℝ^(c×h×w) are the convolution feature tensors, where h and w represent the spatial dimensions of the feature tensor and c represents the number of channels.
  • X^L undergoes convolution and upsampling, and X^H undergoes average pooling and convolution; the output fused high- and low-frequency feature components Y^H and Y^L are obtained as Y^H = F(X^H) + U(F(X^L)) and Y^L = F(X^L) + F(P(X^H)).
  • F ∈ ℝ^(c×k×k) is the k×k convolution kernel, * represents the convolution operation, P represents the pooling operation, and U represents the upsampling operation.
  • The adjustment coefficient α controls the proportion of the high- and low-frequency fusion components in obtaining the final fused feature information Y, together with an amplitude coefficient that scales the components.
  • The OctConv module is used to enhance high-frequency information and integrate low-frequency information, outputting more high-frequency components on the basis of effective exchange between the high- and low-frequency component features.
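The high/low-frequency exchange of OctConv can be sketched with 1×1 channel mixing standing in for the k×k kernels F: pooling P carries information from X^H to the low-frequency path, and upsampling U carries information from X^L back to the high-frequency path. This is an illustrative sketch under those simplifications, not the patent's implementation:

```python
import numpy as np

def avg_pool2(x):
    """2x2 average pooling on a (c, h, w) tensor (h and w even)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    """Nearest-neighbour 2x upsampling on a (c, h, w) tensor."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def conv1x1(x, w):
    """1x1 convolution, i.e. per-pixel channel mixing with weights w of
    shape (c_out, c_in); it stands in for the k x k kernels F."""
    return np.einsum("oc,chw->ohw", w, x)

def octconv(xh, xl, w_hh, w_hl, w_lh, w_ll):
    """One octave convolution step: X^L lives at half the spatial resolution
    of X^H.  Y^H = F(X^H) + U(F(X^L)) and Y^L = F(X^L) + F(P(X^H)), so the
    two frequency paths exchange information via pooling and upsampling."""
    yh = conv1x1(xh, w_hh) + upsample2(conv1x1(xl, w_lh))
    yl = conv1x1(xl, w_ll) + conv1x1(avg_pool2(xh), w_hl)
    return yh, yl
```

The output shapes match the inputs: the high-frequency component keeps the full resolution and the low-frequency component keeps the half resolution, so the module can be dropped between backbone and heads without changing the feature geometry.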
  • In the corner heat-map loss, N denotes the number of targets in the image, C the number of channels, and H and W the spatial dimensions; p cij denotes the score at position (i, j) of the c-th channel of the heat map, and y cij the correctly labeled ground truth of the c-th channel for the target; α and β are the hyperparameters controlling the corner points, and the (1 − y cij) term strengthens the constraint on the target ground truth.
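The parameters listed above match the variant focal loss over corner heat maps used in the published CornerNet work; since the patent text does not reproduce the formula, the sketch below follows that published form and is an assumption rather than a quotation of the patent:

```python
import numpy as np

def corner_focal_loss(p, y, alpha=2.0, beta=4.0, n=1):
    """Variant focal loss over a corner heat map.  p holds predicted corner
    probabilities and y the Gaussian-softened ground truth (y == 1 exactly at
    annotated corners); n is the number of targets N.  The (1 - y)**beta
    factor down-weights negatives that lie close to a true corner."""
    eps = 1e-12                       # numerical guard for log()
    pos = (y == 1.0)
    pos_term = ((1.0 - p) ** alpha * np.log(p + eps))[pos].sum()
    neg_term = ((1.0 - y) ** beta * p ** alpha * np.log(1.0 - p + eps))[~pos].sum()
    return -(pos_term + neg_term) / max(n, 1)
```

A perfect heat map gives a loss of essentially zero, while predicting the opposite of the ground truth gives a large positive loss, which is the behaviour the α, β hyperparameters shape.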
  • Step 2.3 Perform a deconvolution operation on the output high-frequency information to restore the feature size.
  • Step 3 Establish an underground infrared depth image fusion pedestrian detection data set, and train the CornerNet-Squeeze pedestrian target detection model with enhanced target edge features:
  • The infrared camera and the depth camera are installed on the roof of the explosion-proof trackless rubber-tired vehicle to fully collect data on pedestrians underground in the coal mine.
  • the original data collected is saved in the form of video.
  • the depth image and infrared image are obtained by extracting frames from the video.
  • The resolution of the infrared image is 1080×720 pixels, and the resolution of the depth image is 640×360 pixels.
  • the collected depth images and infrared images are registered and aligned based on the scale-invariant feature transformation algorithm.
  • The resolutions of the registered depth images and infrared images are both 640×360 pixels.
  • The images are then center-cropped to eliminate alignment errors at the edges, finally yielding 1,000 pairs of infrared and depth images with a resolution of 480×360 pixels, covering special scene samples such as occlusion, dense crowds, small targets, low illumination, water mist, and dust in underground mines, totaling approximately 2,000 pedestrian targets.
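The center-cropping step (trimming the registered 640×360 images to 480×360) can be sketched as follows; the function is our illustration, not the patent's code:

```python
import numpy as np

def center_crop(img, out_h, out_w):
    """Center-crop an (h, w[, c]) image, trimming the misaligned border
    left over from registration."""
    h, w = img.shape[:2]
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return img[top:top + out_h, left:left + out_w]
```

For a registered 640×360 image, `center_crop(img, 360, 480)` keeps the central 480×360 region where the two modalities are well aligned.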
  • The fusion processing method of the first step is used to form the fused images; the depth images, the infrared images, and the fused images are manually annotated using the labeling software LabelImg to obtain three training data sets: the depth-image training data set, the infrared-image training data set, and the fused-image training data set.
  • the data set production process is shown in Figure 12.
  • The training platform of the pedestrian target detection model is an NVIDIA GeForce GTX 2080Ti with 32 GB of memory, the operating system is Ubuntu 18.04 LTS, and the PyTorch deep learning framework is used.
  • the learning rate is set to 0.001
  • the batch size is 8, and the number of training iterations is 500.
  • the training set and verification set contained 700 and 100 image samples respectively, and the test set contained 200 image samples.
  • Step 4. The edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model is deployed on an intrinsically safe edge computing device and verified on the test set: the improved edge-feature-enhanced CornerNet-Squeeze algorithm and the original CornerNet-Squeeze algorithm are each trained on the depth-image, infrared-image, and fused-image training data sets, and the trained models are deployed, tested, and verified on a ZJB18-Z mining intrinsically safe edge computing device.
  • The device has 14 TOPS of computing power.
  • the performance indicators obtained on the test set are shown in Table 1.
  • mAP: mean average precision
  • FPS: frames per second
  • The mAP of the test results obtained by training on the fused-image data set improved on all three models, indicating that depth-image and infrared fusion can fully combine the advantages of both and improve the detection accuracy of the model.
  • Among the three models, the improved edge-enhanced CornerNet-Squeeze model of the present invention improved mAP significantly on all three data sets, while, owing to the image-fusion computation, its FPS declined slightly compared with the model before improvement. It can be seen that the present invention improves pedestrian detection accuracy while basically maintaining the detection speed of the original algorithm.
  • The target confidence of pedestrian detection using the fused-image data is higher on both models than with either the infrared images or the depth images; the improved CornerNet-Squeeze of the present invention can also detect distant small targets that CornerNet-Squeeze misses, so its detection effect is more satisfactory.
  • This invention is mainly used in the fields of underground unmanned driving and security monitoring.
  • In such scenes, pedestrians in the images have few edge and texture details and a low signal-to-noise ratio, and are disturbed by background information.
  • A major difficulty is effectively identifying pedestrian targets at multiple scales.
  • a pedestrian detection method in coal mines based on image fusion and feature enhancement is proposed.
  • The octave convolution (OctConv) connection is introduced into the CornerNet-Squeeze hourglass backbone network to enhance image edge features, overcoming the above problems and improving the detection of underground pedestrians under low illumination and at multiple scales.
  • the present invention is a method for pedestrian detection in coal mines based on image fusion and feature enhancement.
  • The infrared image and depth image fusion method combines the advantages of both, and subsequent morphological processing effectively reduces background interference and yields pedestrian target features with richer edge contours; deep learning is used to extract target features automatically. Based on the CornerNet-Squeeze target network model, the octave convolution (OctConv) connection introduced after the hourglass backbone network can effectively process the high- and low-frequency information in the image features, enhance image edge features, and improve the detection of small pedestrian targets.


Abstract

The invention discloses a pedestrian detection method for underground coal mines based on image fusion and feature enhancement, comprising the following steps: Step 1, fusion processing of the depth image and the infrared image; Step 2, construction of a target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection network; Step 3, establishment of an underground infrared/depth fused pedestrian detection data set and training of the target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model; Step 4, deployment of the target-edge-enhanced CornerNet-Squeeze pedestrian detection model on an intrinsically safe edge-computing device and validation on the test set. This pedestrian detection method for underground coal mines, based on image fusion and target-edge feature enhancement, improves the detection of multi-scale pedestrian targets in complex low-illumination underground environments.

Description

Pedestrian detection method for underground coal mines based on image fusion and feature enhancement

Technical Field

The present invention relates to the technical field of pedestrian detection in underground coal mines, and in particular to a pedestrian detection method for underground coal mines based on image fusion and feature enhancement.

Background Art

Machine-vision-based pedestrian detection uses cameras to acquire video images, detects and analyzes target information with image-processing algorithms, and supports subsequent tracking tasks; it plays an important role in video surveillance, driverless vehicles, intelligent robots, and other fields. In intelligent mine construction, applying machine-vision technology to pedestrian detection in hazardous areas such as long belt-conveyor lines, sealed roadway entrances, and inclined roadways is of great significance for improving coal-mine safety management and preventing casualties. The underground imaging environment, however, is complex, with dim lighting and heavy noise interference, and underground surveillance cameras are generally mounted high up, so pedestrians in the video appear small, at low resolution, at varying scales, and often overlapping. Owing to these special underground conditions, pedestrian detection faces the common challenges of multiple scales, occlusion, and low illumination. Robust recognition of multi-scale pedestrian targets in complex low-illumination underground environments is therefore an urgent problem, and solving it is of great significance and practical value for underground production safety.

Traditional pedestrian detection algorithms such as HOG+SVM, ICF+AdaBoost, and DPM rely mainly on hand-crafted features, which are limited, subjective, and generalize poorly, making them ill-suited to multi-scale pedestrian detection under special underground conditions such as low illumination and dust. With the continuing development of deep-learning algorithms, features are extracted automatically through training on large-scale data sets, solving the poor model robustness caused by the manual feature extraction of traditional methods.

Deep-learning pedestrian target detection algorithms fall into two categories, two-stage and one-stage. Two-stage methods first generate candidate target boxes from region proposals and then perform classification and regression; represented by R-CNN, Fast R-CNN, and Faster R-CNN, they achieve better results than traditional detection methods and higher detection accuracy, but lower detection efficiency. One-stage methods instead train end-to-end and output results from a single network without generating candidate boxes; they mainly include the SSD, YOLO, and CornerNet families. The CornerNet family achieves detection accuracy comparable to two-stage detectors while avoiding the extra hyper-parameters and computation introduced by the anchor-box mechanism of the SSD and YOLO families: target detection is recast as detection of target keypoints, an hourglass feature-extraction network (Hourglass) serves as the backbone, and the bounding-box position is determined from the target's top-left and bottom-right corner points, omitting the anchor-generation step.
Summary of the Invention

The present invention aims to solve at least one of the technical problems existing in the prior art.

To this end, the invention proposes a pedestrian detection method for underground coal mines based on image fusion and feature enhancement, so as to improve the detection of multi-scale pedestrian targets in complex low-illumination underground environments.

According to an embodiment of the invention, a pedestrian detection method for underground coal mines based on image fusion and feature enhancement comprises the following steps:

Step 1, fusion processing of the depth image and the infrared image: the depth image and the infrared image are fused with the TIF algorithm in three stages, namely image decomposition, image fusion, and image reconstruction; the fused image is then morphologically processed;

Step 2, construction of the target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection network: the CornerNet-Squeeze network combines the CornerNet network with the SqueezeNet network, replacing the Res residual modules of CornerNet with the fire modules of SqueezeNet; a feature-enhancement module based on octave convolution (OctConv), which processes the high- and low-frequency feature information after the backbone, is introduced into the CornerNet-Squeeze network, yielding the improved CornerNet-Squeeze pedestrian target detection network;

Step 3, establishment of the underground infrared/depth fused pedestrian detection data set and training of the target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model: an infrared camera and a depth camera are mounted on the roof of an explosion-proof trackless rubber-tyred vehicle to fully collect underground pedestrian data; the collected depth and infrared images are registered and aligned, and fused with the fusion method of Step 1; the depth images, the infrared images, and the fused images are manually annotated with labeling software to obtain three data sets, namely a depth-image training data set, an infrared-image training data set, and a fused-image training data set; the three data sets are divided into training and test sets, and the target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model is trained;

Step 4, deployment of the target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model on an intrinsically safe edge-computing device and validation on the test set: the improved target-edge-feature-enhanced CornerNet-Squeeze algorithm and the original CornerNet-Squeeze algorithm are each trained on the depth-image, infrared-image, and fused-image training data sets, and the resulting models are deployed, tested, and validated on the intrinsically safe edge-computing device.

The beneficial effects of the invention are: (1) for low-illumination underground scenes, infrared and depth images are fused to combine the advantages of both and then morphologically processed, effectively reducing background interference and obtaining pedestrian target features with richer edge contours; (2) using deep learning to extract target features autonomously, octave-convolution (OctConv) connections introduced after the hourglass backbone of the CornerNet-Squeeze target network model effectively process the high- and low-frequency information in the image features, enhance image edge features, and improve the detection of small pedestrian targets.
According to one embodiment of the invention, the image decomposition applies a mean filter to the aligned, equally sized infrared and depth images to obtain the base layer and the detail layer of each image.

According to one embodiment of the invention, in the image fusion, the base-layer images are fused with an arithmetic-mean strategy.

According to one embodiment of the invention, in the image fusion, the detail-layer images are fused with a weighted-average strategy.

According to one embodiment of the invention, in the image reconstruction, the fused base layer of the depth and infrared images and the fused detail layer of the depth and infrared images are added directly to obtain the final fused image of the depth and infrared images.

According to one embodiment of the invention, in the morphological processing of the fused image, a morphological opening, erosion followed by dilation, is applied to the fused image.

According to one embodiment of the invention, the octave-convolution (OctConv) feature-enhancement module introduced after the CornerNet-Squeeze backbone operates as follows:

Step 2.1, a convolution operation reduces the dimensionality of the feature maps extracted by the backbone;

Step 2.2, OctConv separates and fuses the high- and low-frequency feature information of the reduced feature maps;

Step 2.3, a deconvolution operation restores the feature size of the output high-frequency information.

According to one embodiment of the invention, in Step 1, the depth images, the infrared images, and the fused images are manually annotated with the labeling software LabelImg.

Other features and advantages of the invention are set forth in the following description and in part become apparent from it, or are understood by practicing the invention. The objects and other advantages of the invention are realized and attained by the structures particularly pointed out in the description, the claims, and the accompanying drawings.

To make the above objects, features, and advantages of the invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings

To explain the technical solutions in the embodiments of the invention or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments recorded in this application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is the algorithm flow chart of the invention;

Fig. 2 is a schematic diagram of the image-fusion processing;

Fig. 3 is the infrared image used in the image-fusion processing;

Fig. 4 is the depth image used in the image-fusion processing;

Fig. 5 is the fused image produced in the image-fusion processing;

Fig. 6 is the fused image after morphological processing in the image-fusion processing;

Fig. 7 is a schematic diagram of a single hourglass-network module;

Fig. 8 is a schematic diagram of the Res residual module in the backbone;

Fig. 9 is a schematic diagram of the fire module in the SqueezeNet network;

Fig. 10 is a schematic diagram of the improved CornerNet-Squeeze-Oct network structure;

Fig. 11 is a schematic diagram of the OctConv operation;

Fig. 12 is the data-set construction flow chart;

Fig. 13 shows CornerNet-Squeeze detection results;

Fig. 14 shows detection results of the target-edge-enhanced CornerNet-Squeeze.
Detailed Description of the Embodiments

To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the invention without creative effort fall within the scope of protection of the invention.

Current deep-learning pedestrian detection algorithms achieve high accuracy and real-time performance in surface/visible-light scenes. In complex underground environments with low illumination, however, pedestrian detection faces highly variable pedestrian poses and scales, loss of pedestrian features caused by the complex environment, and strict real-time requirements on the network model, so a high detection rate and a low false-alarm rate are hard to achieve simultaneously. CornerNet-Squeeze combines the CornerNet network with ideas from SqueezeNet, lightening the residual modules of the stacked-hourglass backbone; this greatly reduces the network parameters and speeds up model inference. However, CornerNet-Squeeze only lightens the hourglass backbone: when predicting bounding-box corner points later, incomplete feature extraction in the hourglass network directly misleads the heatmap's judgment of target positions, mislocating the target boxes and reducing recognition accuracy.

Image-acquisition devices mainly include visible-light cameras, infrared cameras, and depth cameras, which acquire visible-light, infrared, and depth images respectively. Visible-light images offer high resolution and rich background detail, but they are easily degraded by external factors: in complex low-illumination environments their imaging quality is poor and cannot meet practical detection requirements. In infrared images the pedestrian target region stands out and is unaffected by lighting conditions, but resolution is low and detail features are sparse. In depth images the pedestrian contour is clear and robust to the environment, but the imaging range is short. Given the low-illumination underground application scenario, robust recognition of multi-scale small pedestrian targets in such complex environments is difficult, and a single image source from a visible-light, infrared, or depth camera alone cannot meet the requirements of accurate underground pedestrian detection.

The invention therefore proposes a pedestrian detection method for underground coal mines based on image fusion and feature enhancement, and specifically a pedestrian target detection method for underground coal mines based on image fusion and CornerNet-Squeeze. First, the TIF algorithm fuses the images acquired by the infrared camera and the depth camera at the pixel level, fully combining the advantages of both, after which morphological processing reduces background interference. Then, on the basis of the CornerNet-Squeeze target network model, octave-convolution (OctConv) connections are introduced after the hourglass backbone to process the high- and low-frequency information in the image features and enhance image edge features, effectively improving the detection of multi-scale pedestrian targets in complex low-illumination underground environments.

The pedestrian detection method for underground coal mines based on image fusion and feature enhancement according to embodiments of the invention is described in detail below with reference to the drawings.

As shown in Fig. 1, the method of the invention comprises the following steps:

As shown in Fig. 2, Step 1, fusion processing of the depth image and the infrared image: the two images are fused with the TIF (Two-Scale Image Fusion) algorithm in three stages, namely image decomposition, image fusion, and image reconstruction. In addition, to improve the quality of the fused image, highlight the pedestrians' texture details and gray-level features, and eliminate redundant background interference, the fused image is processed morphologically. The specific steps are as follows:
Step 1.1, image decomposition: a mean filter is applied to the aligned, equally sized infrared and depth images to obtain the base layer and the detail layer of each image.

Specifically, the aligned original infrared image f1(x,y) and original depth image f2(x,y), of equal size, are each filtered with the mean filter μ(x,y) to obtain the infrared base layer fb1(x,y) and the depth base layer fb2(x,y). After the base layers are obtained, the detail-layer images are the differences between the original infrared and depth images and their base-layer images, giving the infrared detail layer fd1(x,y) and the depth detail layer fd2(x,y).

The infrared base layer is computed as:

fb1(x,y)=f1(x,y)*μ(x,y)      (1)

The depth base layer is computed as:

fb2(x,y)=f2(x,y)*μ(x,y)      (2)

The infrared detail layer is computed as:

fd1(x,y)=f1(x,y)-fb1(x,y)      (3)

The depth detail layer is computed as:

fd2(x,y)=f2(x,y)-fb2(x,y)      (4)

where * denotes the filtering (convolution) operation.
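The two-scale decomposition of equations (1)-(4) can be sketched as follows. This is an illustrative NumPy implementation, not the patent's code; the mean-filter window size (here 3×3) and the edge-padding behavior are assumptions, since the patent does not specify them.

```python
import numpy as np

def mean_filter(img, k=3):
    """Box (mean) filter with edge padding; output has the same size as the input."""
    pad = k // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    out = np.empty_like(img, dtype=np.float64)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

def two_scale_decompose(img, k=3):
    """Split an image into a base layer (mean-filtered) and a detail layer
    (residual), as in equations (1)-(4)."""
    base = mean_filter(img, k)                 # f_b = f * mu
    detail = img.astype(np.float64) - base     # f_d = f - f_b
    return base, detail
```

By construction, base + detail reconstructs the input exactly, which is what makes the later additive reconstruction of equation (12) lossless for each source image.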
Step 1.2, image fusion.

The base-layer images are fused with an arithmetic-mean strategy:

fb(x,y)=(fb1(x,y)+fb2(x,y))/2      (5)

where, in formula (5), fb(x,y) denotes the fused base layer obtained by arithmetically averaging the base layers of the decomposed depth and infrared images.
For the detail-layer images, a visual-saliency image ε(x,y) is computed as the Euclidean distance between the mean-filtered and median-filtered versions of the three RGB channels of the original infrared and depth images:

ε(x,y)=sqrt((Rμ(x,y)-Rm(x,y))^2+(Gμ(x,y)-Gm(x,y))^2+(Bμ(x,y)-Bm(x,y))^2)      (6)

where, in formula (6):

Rμ(x,y), Gμ(x,y), and Bμ(x,y) are the results of mean-filtering the red, green, and blue channels of the image;

Rm(x,y), Gm(x,y), and Bm(x,y) are the results of median-filtering the red, green, and blue channels of the image.

That is, the visual-saliency image ε1(x,y) of the original infrared image is:

ε1(x,y)=sqrt((R1μ(x,y)-R1m(x,y))^2+(G1μ(x,y)-G1m(x,y))^2+(B1μ(x,y)-B1m(x,y))^2)      (7)

where, in formula (7), R1μ, G1μ, B1μ are the mean-filtered RGB channels of the original infrared image, and R1m, G1m, B1m are its median-filtered RGB channels.

Likewise, the visual-saliency image ε2(x,y) of the original depth image is:

ε2(x,y)=sqrt((R2μ(x,y)-R2m(x,y))^2+(G2μ(x,y)-G2m(x,y))^2+(B2μ(x,y)-B2m(x,y))^2)      (8)

where, in formula (8), R2μ, G2μ, B2μ are the mean-filtered RGB channels of the original depth image, and R2m, G2m, B2m are its median-filtered RGB channels.
Performing the above computation on the original infrared image f1(x,y) and the original depth image f2(x,y) yields ε1(x,y) and ε2(x,y), from which the detail-layer fusion coefficient matrices are obtained:

δ1(x,y)=ε1(x,y)/(ε1(x,y)+ε2(x,y))      (9)

where, in formula (9):

δ1(x,y) is the infrared detail-layer fusion coefficient;

ε1(x,y) is the visual-saliency image of the original infrared image;

ε2(x,y) is the visual-saliency image of the original depth image.

δ2(x,y)=ε2(x,y)/(ε1(x,y)+ε2(x,y))      (10)

where, in formula (10), δ2(x,y) is the depth detail-layer fusion coefficient.

The detail-layer images are fused with a weighted-average strategy:

fd(x,y)=δ1(x,y)·fd1(x,y)+δ2(x,y)·fd2(x,y)      (11)

where, in formula (11):

fd1(x,y) is the detail layer of the infrared image;

fd2(x,y) is the detail layer of the depth image;

fd(x,y) is the fusion of the infrared detail layer and the depth detail layer.
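The saliency computation of equation (6) and the weighted detail fusion of equations (9)-(11) can be sketched as follows. A minimal NumPy sketch assuming the mean- and median-filtered RGB images are already available; the small epsilon guarding against division by zero is an added safeguard not stated in the patent.

```python
import numpy as np

def saliency(rgb_mean, rgb_median):
    """Visual-saliency map: per-pixel Euclidean distance between the
    mean-filtered and median-filtered RGB channels, as in equation (6).
    Inputs have shape (h, w, 3); output has shape (h, w)."""
    diff = rgb_mean.astype(np.float64) - rgb_median.astype(np.float64)
    return np.sqrt((diff ** 2).sum(axis=-1))

def fuse_detail(d1, d2, eps1, eps2):
    """Weighted-average detail-layer fusion, equations (9)-(11):
    d1, d2 are the infrared/depth detail layers, eps1, eps2 their saliency maps."""
    denom = eps1 + eps2
    denom = np.where(denom == 0, 1e-12, denom)  # guard against division by zero
    w1 = eps1 / denom                           # delta_1
    w2 = eps2 / denom                           # delta_2
    return w1 * d1 + w2 * d2
```

The weights sum to one at every pixel, so the fused detail layer is a convex per-pixel mix that favors whichever source is locally more salient.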
Step 1.3, image reconstruction.

In the image reconstruction, the fused base layer and the fused detail layer of the depth and infrared images are added directly to obtain the final fused image ρ(x,y) of the depth and infrared images:

ρ(x,y)=fb(x,y)+fd(x,y)      (12)
Step 1.4, morphological processing.

The most basic morphological operations are dilation and erosion, which convolve a two-dimensional image (or part of one) with a template (the kernel); they eliminate small high-brightness regions, remove isolated points and burrs, delete small objects, and smooth the boundaries of larger objects.

Dilation is expressed mathematically as:

dst(x,y)=max_(x′,y′) src(x+x′,y+y′)      (13)

i.e., dilation takes the local maximum of the image (x,y) over the kernel (x′,y′).

Erosion is expressed mathematically as:

dst(x,y)=min_(x′,y′) src(x+x′,y+y′)      (14)

i.e., erosion takes the local minimum of the image (x,y) over the kernel (x′,y′).

The invention applies a morphological opening (erosion followed by dilation) to the fused image, reducing background interference and highlighting the pedestrians' contour features.
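The opening operation described above (equations (13)-(14): erosion, then dilation) can be sketched in NumPy as a local-minimum filter followed by a local-maximum filter. A square k×k kernel is assumed for illustration; the patent does not fix the kernel shape or size.

```python
import numpy as np

def erode(img, k=3):
    """Grayscale erosion: local minimum over a k x k square kernel (eq. (14))."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    return np.array([[p[i:i + k, j:j + k].min() for j in range(w)]
                     for i in range(h)])

def dilate(img, k=3):
    """Grayscale dilation: local maximum over a k x k square kernel (eq. (13))."""
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    h, w = img.shape
    return np.array([[p[i:i + k, j:j + k].max() for j in range(w)]
                     for i in range(h)])

def opening(img, k=3):
    """Morphological opening: erosion followed by dilation."""
    return dilate(erode(img, k), k)
```

Opening removes bright specks smaller than the kernel while leaving larger bright regions (such as pedestrian silhouettes) essentially intact, which is exactly the background-cleanup behavior the text describes.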
The original depth and infrared images are shown in Figs. 3 and 4. The fused image after the three stages of Steps 1.1, 1.2, and 1.3 is shown in Fig. 5; the result shows that the fused image combines the pedestrians' gray-level features from the infrared image with the contour edges from the depth image. The fused image after the morphological processing of Step 1.4 is shown in Fig. 6; much unnecessary environmental interference is removed and the pedestrian features are highlighted, which helps improve pedestrian-detection accuracy.
Step 2, construction of the target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection network: the CornerNet-Squeeze network combines the CornerNet network with the SqueezeNet network, replacing the Res residual modules of CornerNet with the fire modules of SqueezeNet; a feature-enhancement module based on octave convolution (OctConv), which processes the high- and low-frequency feature information after the backbone, is introduced into the CornerNet-Squeeze network, yielding the improved CornerNet-Squeeze pedestrian target detection network.

The core idea of the CornerNet network is to obtain, through the convolution and pooling of the hourglass backbone (Hourglass), probability maps (heatmaps) for two groups of corner points: the target's top-left and bottom-right corners. According to the number of model classes, the predicted corner points of each heatmap group are aggregated to form the target's detection box. CornerNet-Squeeze applies the ideas of SqueezeNet to CornerNet, lightening the residual modules of the stacked-hourglass backbone. A single hourglass-network module is shown in Fig. 7; the network uses a large number of residual (Res) modules, so the CornerNet backbone has up to 187 million parameters for a 256×256-pixel input, and its computational complexity grows exponentially with the input image size. The Res residual module of the backbone is shown in Fig. 8. For higher real-time performance, the network model is simplified by replacing the Res residual modules of the hourglass network with SqueezeNet fire modules (the fire module is shown in Fig. 9). Each original residual module contains two 3×3-kernel convolution layers, whereas a fire module first reduces the data dimension with one 1×1-kernel convolution layer and then expands the output with a combination of separate 1×1-kernel and 3×3-kernel convolution layers, greatly reducing network parameters and speeding up model inference.
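The parameter saving from replacing a residual module (two 3×3 convolutions) with a fire module can be illustrated with a back-of-the-envelope count. The squeeze ratio (0.25) and the even 1×1/3×3 expand split below are illustrative assumptions, since the patent does not give SqueezeNet's exact channel configuration; biases are ignored.

```python
def res_block_params(c):
    """Weights in two 3x3 conv layers with c input and output channels."""
    return 2 * (c * c * 3 * 3)

def fire_module_params(c, squeeze_ratio=0.25):
    """SqueezeNet fire module: a 1x1 squeeze layer down to s channels,
    then parallel 1x1 and 3x3 expand layers back to c channels in total
    (assumed split evenly, c/2 per branch)."""
    s = int(c * squeeze_ratio)   # squeeze channels (assumed ratio)
    expand = c // 2              # channels per expand branch (assumed split)
    return c * s * 1 * 1 + s * expand * 1 * 1 + s * expand * 3 * 3

res = res_block_params(256)
fire = fire_module_params(256)
print(res, fire, fire / res)  # the fire module uses far fewer weights
```

Under these assumptions a 256-channel residual module costs 1,179,648 weights against 98,304 for the fire module, roughly a 12× reduction per module, which is the mechanism behind the lightweighting described above.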
Improvement of the CornerNet-Squeeze model: CornerNet-Squeeze only lightens the hourglass backbone; when predicting bounding-box corner points later, incomplete feature extraction in the hourglass network directly misleads the heatmap's judgment of target positions and mislocates the target boxes. The invention introduces octave convolution (Octave Convolution, OctConv) into the CornerNet-Squeeze network to process the high- and low-frequency feature information after the backbone and enhance image edge features. This benefits corner-point detection on the heatmaps, eases target discrimination and localization, reduces the missed corner detections that cause CornerNet-Squeeze to misdetect closely spaced same-class targets and small targets, and improves target-recognition accuracy. The improved CornerNet-Squeeze network structure is shown in Fig. 10.

In image processing, the detail features represented by the high-frequency components, such as image contour edges, deserve attention and aid saliency detection and object recognition. By contrast, low-frequency feature maps carry less information; if the high- and low-frequency components of an image are treated equally, the benefit of the former far exceeds that of the latter. Likewise, in convolutional neural networks, the feature maps produced by convolution also contain high- and low-frequency parts; by separating the feature maps and increasing the high-frequency output, more contour features of the targets of interest can be extracted, enhancing target edges and improving the recognition rate. The invention introduces an OctConv feature-enhancement module after the CornerNet-Squeeze backbone, with the following steps:

Step 2.1, dimensionality reduction of the backbone feature maps by convolution: a 1×1 Conv is applied to the feature maps extracted by the backbone;

Step 2.2, separation and fusion of the high- and low-frequency feature information with OctConv: the reduced feature maps are filtered by OctConv, which separates and then fuses the high- and low-frequency feature information; the OctConv operation is shown in Fig. 11. The feature map extracted by the backbone is first decomposed along the channel dimension, using a coefficient α, into a high-frequency component XH∈M^((1-α)·c×h×w) and a low-frequency component XL∈M^(α·c×(h/2)×(w/2)), with α∈[0,1]; X,Y∈M^(c×h×w) are convolutional feature tensors, where h and w are the spatial dimensions of the feature tensor and c is the number of channels. XL undergoes convolution and upsampling, and XH undergoes average pooling and convolution, giving the fused output components YL and YH and finally the fused feature information Y=[YH,YL]. The fused high- and low-frequency output components YL and YH are obtained as follows:
YL is computed as:

YL=(XL×F)+(P(XH)×F)      (15)

where, in formula (15):

F∈M^(c×k×k) is a k×k convolution kernel;

× denotes the convolution operation;

P denotes the pooling operation.
YH is computed as:

YH=(XH×F)+U(XL×F)      (16)

where, in formula (16):

F∈M^(c×k×k) is a k×k convolution kernel;

× denotes the convolution operation;

U denotes the upsampling operation.

Adjusting the coefficient α controls the proportion of the high- and low-frequency fusion components, giving the final fused feature information Y.
The final fused feature information Y is computed as:

Y=[αYL+(1-α)YH]·ρ      (17)

where, in formula (17):

α is the adjustment coefficient;

ρ is the amplitude coefficient, with ρ∈(0,1).
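The cross-frequency exchange of equations (15)-(16) can be sketched as follows. This toy NumPy version uses 1×1 kernels only (so convolution reduces to channel mixing) and a fixed α=0.25 split into 6 high-frequency and 2 low-frequency channels; a production OctConv uses k×k kernels and re-splits its output into high- and low-frequency branches, so this is a structural sketch, not the module as deployed.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1x1(x, w):
    """1x1 convolution as channel mixing: x is (c_in, h, w), w is (c_out, c_in)."""
    return np.tensordot(w, x, axes=([1], [0]))

def avg_pool2(x):
    """2x2 average pooling (the P operation)."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample2(x):
    """2x nearest-neighbour upsampling (the U operation)."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def octconv(xh, xl, wh, wl):
    """Octave convolution, eqs. (15)-(16), with 1x1 kernels:
    Y_H = conv(X_H) + U(conv(X_L));  Y_L = conv(X_L) + conv(P(X_H))."""
    yh = conv1x1(xh, wh) + upsample2(conv1x1(xl, wl))
    yl = conv1x1(xl, wl) + conv1x1(avg_pool2(xh), wh)
    return yh, yl

# toy tensors: alpha = 0.25 of c = 8 channels are low frequency (at half resolution)
xh = rng.standard_normal((6, 8, 8))
xl = rng.standard_normal((2, 4, 4))
wh = rng.standard_normal((6, 6))   # kernel acting on high-frequency channels
wl = rng.standard_normal((6, 2))   # kernel acting on low-frequency channels
yh, yl = octconv(xh, xl, wh, wl)
print(yh.shape, yl.shape)
```

Note how each output band mixes in the other band after a resolution change (pooling down for Y_L, upsampling for Y_H), which is the "effective communication between high- and low-frequency components" the text refers to.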
Since high-frequency processing highlights edge information and thus aids the enhanced display of feature edges, the OctConv module strengthens the high-frequency information and fuses in the low-frequency information, outputting more high-frequency components while achieving effective communication between the high- and low-frequency component features. The loss function L is obtained:

L = -(1/N) Σ(c=1..C) Σ(i=1..H) Σ(j=1..W) { (1-pcij)^α·log(pcij),               if ycij = 1
                                            (1-ycij)^β·(pcij)^α·log(1-pcij),    otherwise }      (18)

where, in formula (18):

N is the number of targets in the image;

C is the number of channels;

H and W are the spatial dimensions;

pcij is position (i,j) of the c-th channel of the heatmap;

ycij is the ground-truth label of the target corresponding to the c-th channel;

α and β are hyper-parameters controlling the corner points;

the (1-ycij) term strengthens the constraint toward the target ground truth.
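The corner focal loss of equation (18) can be sketched as follows. An illustrative NumPy version; the defaults α=2 and β=4 follow common CornerNet practice and are an assumption, as the patent does not state values, and the clipping of predictions is an added numerical safeguard.

```python
import numpy as np

def corner_focal_loss(p, y, alpha=2.0, beta=4.0, n_objects=1):
    """CornerNet-style focal loss over corner heatmaps, following eq. (18).
    p: predicted heatmap probabilities in (0, 1), shape (C, H, W).
    y: ground truth in [0, 1] (1 at corner locations, decayed elsewhere)."""
    p = np.clip(p, 1e-6, 1.0 - 1e-6)           # numerical safeguard
    pos = (y == 1)
    pos_loss = ((1 - p[pos]) ** alpha) * np.log(p[pos])
    neg_loss = ((1 - y[~pos]) ** beta) * (p[~pos] ** alpha) * np.log(1 - p[~pos])
    return -(pos_loss.sum() + neg_loss.sum()) / n_objects
```

The (1-p)^α factor down-weights easy positives, and the (1-y)^β factor relaxes the penalty near true corners, so the loss concentrates on confident mistakes around corner locations.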
Step 2.3, deconvolution of the output high-frequency information to restore the feature size.
Step 3, establishment of the underground infrared/depth fused pedestrian detection data set and training of the target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model: to combine the features, strengths, and technical advantages of depth and infrared images, an infrared camera and a depth camera are mounted on the roof of an explosion-proof trackless rubber-tyred vehicle to fully collect underground pedestrian data. The raw data are saved as video, and frames are extracted to obtain the depth and infrared images; the infrared images have a resolution of 1080×720 pixels and the depth images 640×360 pixels. The collected depth and infrared images are registered and aligned with an algorithm based on the scale-invariant feature transform (SIFT), after which both have a resolution of 640×360 pixels; the images are then center-cropped to eliminate alignment errors at the edges, finally giving 1000 pairs of infrared and depth images at 480×360 pixels, containing samples of occlusion, dense crowds, and small targets in special underground scenes such as low illumination, water mist, and dust, with about 2000 pedestrian targets in total. The fusion method of Step 1 produces the fused images; the depth images, the infrared images, and the fused images are manually annotated with labeling software to obtain three data sets, namely the depth-image, infrared-image, and fused-image training data sets; the three data sets are divided into training and test sets, and the target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model is trained.

The depth and infrared images are fused into the fused images, and the depth, infrared, and fused images are manually annotated with the labeling software LabelImg, giving three training data sets: the depth-image, infrared-image, and fused-image training data sets. The data-set construction flow is shown in Fig. 12.

The training platform for the pedestrian target detection model is an NVIDIA GeForce GTX 2080Ti with 32 GB of memory running Ubuntu 18.04 LTS, using the PyTorch deep-learning framework; the learning rate is set to 0.001, the batch size to 8, and the number of training iterations to 500. In the experiments, the training and validation sets contain 700 and 100 image samples respectively, and the test set contains 200 image samples.
Step 4, deployment of the target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model on an intrinsically safe edge-computing device and validation on the test set: the improved target-edge-feature-enhanced CornerNet-Squeeze algorithm and the original CornerNet-Squeeze algorithm are each trained on the three data sets, namely the depth-image, infrared-image, and fused-image training data sets, and the resulting models are deployed, tested, and validated on the intrinsically safe edge-computing device. Specifically, the trained models are deployed for testing and validation on a ZJB18-Z mine intrinsically safe edge-computing device with 14 TOPS of computing power. The performance metrics obtained on the test set are shown in Table 1.

The performance metrics used are mean average precision (mAP) and frame rate (FPS, frames per second). mAP measures detection accuracy; it is a combined measure of precision P and recall R, and equals the area under the PR curve. FPS measures algorithm speed as the number of images the algorithm can process per second; for the fused images, the measured time covers the entire process of image fusion plus pedestrian detection.
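AP, the area under the precision-recall curve that mAP averages over classes, can be computed from ranked detections as follows. A minimal sketch that treats each detection as already judged correct or incorrect; a full evaluator would also need the IoU matching step between detections and ground-truth boxes, which is omitted here.

```python
import numpy as np

def average_precision(scores, labels):
    """AP as the area under the precision-recall curve (all-point
    interpolation). scores: detection confidences; labels: 1 if the
    detection matches a ground-truth pedestrian, else 0."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    labels = np.asarray(labels, dtype=float)[order]
    tp = np.cumsum(labels)                 # true positives at each rank
    fp = np.cumsum(1 - labels)             # false positives at each rank
    recall = tp / max(labels.sum(), 1)
    precision = tp / (tp + fp)
    # make precision monotonically non-increasing before integrating over recall
    prec = np.maximum.accumulate(precision[::-1])[::-1]
    r = np.concatenate(([0.0], recall))
    return float(np.sum((r[1:] - r[:-1]) * prec))
```

mAP is then simply the mean of this AP over all classes (here, the single pedestrian class).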
Table 1. Pedestrian-detection performance of the different models on the different data sets

As the table shows, models trained on the fused-image data set all achieve a higher test mAP, indicating that fusing depth and infrared images fully combines the advantages of both and improves the models' detection accuracy. Across the three data sets, the improved target-edge-enhanced CornerNet-Squeeze model of the invention significantly improves mAP on all three; in terms of the FPS metric, the FPS of the improved model is slightly lower than before the improvement owing to the image-fusion computation. The invention thus improves pedestrian-detection accuracy while essentially maintaining the detection speed of the original algorithm.

Figs. 13 and 14 show pedestrian target detection results for some test-set images.

As shown in Fig. 13, from left to right are the test results of the infrared image, the depth image, and the fused image on CornerNet-Squeeze; the number on each target box is the confidence.

In Fig. 13, from left to right, the confidences of the three sub-images are 0.69, 0.73, and 0.79.

As shown in Fig. 14, from left to right are the test results of the infrared image, the depth image, and the fused image on the improved CornerNet-Squeeze of the invention; the number on each target box is the confidence.

In Fig. 14, from left to right, the confidences of the sub-images are 0.42, 0.69, 0.75, 0.45, and 0.82.

As Figs. 13 and 14 show, pedestrian-detection target confidence with the fused image data is higher on both models than with the infrared or depth images alone; the improved CornerNet-Squeeze of the invention can also detect distant small targets that the original CornerNet-Squeeze misses, giving a more satisfactory detection result.

The invention is mainly applied to underground unmanned driving, security monitoring, and related fields. In underground coal mines affected by special working conditions such as low illumination and dust, pedestrians in images have little edge-texture detail, a low signal-to-noise ratio, and strong interference from background information, making pedestrian targets at multiple scales hard to recognize effectively. The proposed pedestrian detection method for underground coal mines, based on image fusion and feature enhancement, fuses infrared and depth images with the TIF method augmented by morphological processing, and introduces octave-convolution (OctConv) connections after the CornerNet-Squeeze hourglass backbone to enhance image edge features, overcoming the above problems and improving the detection of underground pedestrians under low illumination and at multiple scales.

For low-illumination underground application scenarios, the pedestrian detection method of the invention fuses infrared and depth images to combine the advantages of both and then applies morphological processing, effectively reducing background interference and obtaining pedestrian target features with richer edge contours; using deep learning to extract target features autonomously, octave-convolution (OctConv) connections introduced after the hourglass backbone of the CornerNet-Squeeze target network model effectively process the high- and low-frequency information in the image features, enhance image edge features, and improve the detection of small pedestrian targets.

The above are only preferred specific embodiments of the invention, but the scope of protection of the invention is not limited to them. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the invention, according to its technical solution and inventive concept, shall be covered by the scope of protection of the invention.

Claims (7)

  1. A pedestrian detection method for underground coal mines based on image fusion and feature enhancement, characterized by comprising the following steps:
    Step 1, fusion processing of the depth image and the infrared image: the depth image and the infrared image are fused with the TIF algorithm in three stages, namely image decomposition, image fusion, and image reconstruction; the fused image is then morphologically processed;
    Step 2, construction of the target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection network: the CornerNet-Squeeze network combines the CornerNet network with the SqueezeNet network, replacing the Res residual modules of CornerNet with the fire modules of SqueezeNet; a feature-enhancement module based on octave convolution (OctConv), which processes the high- and low-frequency feature information after the backbone, is introduced into the CornerNet-Squeeze network, yielding the improved CornerNet-Squeeze pedestrian target detection network;
    Step 3, establishment of the underground infrared/depth fused pedestrian detection data set and training of the target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model: an infrared camera and a depth camera are mounted on the roof of an explosion-proof trackless rubber-tyred vehicle to fully collect underground pedestrian data; the collected depth and infrared images are registered and aligned, and fused with the fusion method of Step 1; the depth images, the infrared images, and the fused images are manually annotated with labeling software to obtain three data sets, namely a depth-image training data set, an infrared-image training data set, and a fused-image training data set; the three data sets are divided into training and test sets, and the target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model is trained;
    Step 4, deployment of the target-edge-feature-enhanced CornerNet-Squeeze pedestrian target detection model on an intrinsically safe edge-computing device and validation on the test set: the improved target-edge-feature-enhanced CornerNet-Squeeze algorithm and the original CornerNet-Squeeze algorithm are each trained on the depth-image, infrared-image, and fused-image training data sets, and the resulting models are deployed, tested, and validated on the intrinsically safe edge-computing device.
  2. The method according to claim 1, characterized in that the image decomposition applies a mean filter to the aligned, equally sized infrared and depth images to obtain the base layer and the detail layer of each image.
  3. The method according to claim 1, characterized in that, in the image fusion, the base-layer images are fused with an arithmetic-mean strategy.
  4. The method according to claim 1, characterized in that, in the image fusion, the detail-layer images are fused with a weighted-average strategy.
  5. The method according to claim 1, characterized in that, in the image reconstruction, the fused base layer of the depth and infrared images and the fused detail layer of the depth and infrared images are added directly to obtain the final fused image of the depth and infrared images.
  6. The method according to claim 1, characterized in that, in the morphological processing of the fused image, a morphological opening, erosion followed by dilation, is applied to the fused image.
  7. The method according to claim 1, characterized in that the octave-convolution (OctConv) feature-enhancement module introduced after the CornerNet-Squeeze backbone operates as follows:
    Step 2.1, a convolution operation reduces the dimensionality of the feature maps extracted by the backbone;
    Step 2.2, OctConv separates and fuses the high- and low-frequency feature information of the reduced feature maps;
    Step 2.3, a deconvolution operation restores the feature size of the output high-frequency information.
PCT/CN2023/112201 2022-08-16 2023-08-10 Pedestrian detection method for underground coal mines based on image fusion and feature enhancement WO2024037408A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210980531.6A CN115311241B (zh) 2022-08-16 2022-08-16 Pedestrian detection method for underground coal mines based on image fusion and feature enhancement
CN202210980531.6 2022-08-16

Publications (1)

Publication Number Publication Date
WO2024037408A1 true WO2024037408A1 (zh) 2024-02-22

Family

ID=83861943

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/112201 WO2024037408A1 (zh) 2022-08-16 2023-08-10 Pedestrian detection method for underground coal mines based on image fusion and feature enhancement

Country Status (2)

Country Link
CN (1) CN115311241B (zh)
WO (1) WO2024037408A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117783051A (zh) 2024-02-28 2024-03-29 Methane gas leakage detection method based on multi-sensor data fusion
CN117876836A (zh) 2024-03-11 2024-04-12 Image fusion method based on multi-scale feature extraction and target reconstruction
CN117876836B (zh) 2024-03-11 2024-05-24 Image fusion method based on multi-scale feature extraction and target reconstruction

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311241B (zh) 2022-08-16 2024-04-23 Pedestrian detection method for underground coal mines based on image fusion and feature enhancement
CN117556978A (zh) 2023-12-29 2024-02-13 Underground coal-mine operation and maintenance method and system based on big-data analysis

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111582080A (zh) 2020-04-24 2020-08-25 Method and device for implementing 360-degree surround-view vehicle monitoring
CN111814595A (zh) 2020-06-19 2020-10-23 Low-light pedestrian detection method and system based on multi-task learning
CN112364883A (zh) 2020-09-17 2021-02-12 US-style license-plate recognition method based on single-stage object detection and a DeepText recognition network
EP3838427A1 (en) * 2019-12-20 2021-06-23 IHP Systems A/S A method for sorting objects travelling on a conveyor belt
CN114359838A (zh) 2022-01-14 2022-04-15 Cross-modal pedestrian detection method based on a Gaussian cross-attention network
CN115311241A (zh) 2022-08-16 2022-11-08 Pedestrian detection method for underground coal mines based on image fusion and feature enhancement

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458133A (zh) 2019-08-19 2019-11-15 Lightweight face detection method based on generative adversarial networks
CN110795991B (zh) 2019-09-11 2023-03-31 Pedestrian detection method for mining locomotives based on multi-information fusion
CN111986225A (zh) 2020-08-14 2020-11-24 Multi-target tracking method and device based on corner detection and a Siamese network
CN112115871B (zh) 2020-09-21 2024-04-19 High/low-frequency interleaved edge-feature enhancement method for pedestrian target detection
CN112434715B (zh) 2020-12-10 2022-07-22 Artificial-intelligence-based target recognition method, device, and storage medium
CN113408593A (zh) 2021-06-05 2021-09-17 Diabetic-retinopathy image classification method based on an improved ResNeSt convolutional neural network model
CN114241511B (zh) 2021-10-21 2024-05-03 Weakly supervised pedestrian detection method, system, medium, device, and processing terminal


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LI XUE-MENG, YANG DA-WEI, MAO LIN: "Object Edge Feature Enhancement Detection Algorithm", JOURNAL OF DALIAN MINZU UNIVERSITY., vol. 22, no. 1, 1 January 2020 (2020-01-01), pages 47 - 50, XP093140274, DOI: 10.13744/j.cnki.cn21-1431/g4.2020.01.010 *


Also Published As

Publication number Publication date
CN115311241A (zh) 2022-11-08
CN115311241B (zh) 2024-04-23

Similar Documents

Publication Publication Date Title
WO2024037408A1 (zh) Pedestrian detection method for underground coal mines based on image fusion and feature enhancement
CN110232380B (zh) Fire night-scene restoration method based on a Mask R-CNN neural network
WO2019196130A1 (zh) Classifier training method and device for vehicle-mounted thermal-imaging pedestrian detection
CN108875608B (zh) Deep-learning-based motor-vehicle traffic-signal recognition method
CN113052210B (zh) Fast low-light object detection method based on convolutional neural networks
WO2019196131A1 (zh) Region-of-interest filtering method and device for vehicle-mounted thermal-imaging pedestrian detection
WO2021238019A1 (zh) Real-time traffic-flow detection system and method based on a Ghost-convolution feature-fusion neural network
CN111709416B (zh) License-plate locating method, device, system, and storage medium
US20060067562A1 (en) Detection of moving objects in a video
Xu et al. Fast vehicle and pedestrian detection using improved Mask R-CNN
Zhong et al. Multi-scale feature fusion network for pixel-level pavement distress detection
CN104978567A (zh) Vehicle detection method based on scene classification
CN110532937B (zh) Method for accurate forward-target recognition for trains based on a recognition model and a classification model
CN103324958B (zh) License-plate locating method based on the projection method and SVM under complex backgrounds
Chen et al. YOLOv5-based vehicle detection method for high-resolution UAV images
CN104517095A (zh) Depth-image-based human-head segmentation method
CN111915583A (zh) Vehicle and pedestrian detection method based on a vehicle-mounted infrared thermal imager in complex scenes
Yao et al. Coupled multivehicle detection and classification with prior objectness measure
CN113486712B (zh) Deep-learning-based multi-face recognition method, system, and medium
CN113177439A (zh) Detection method for pedestrians climbing over road guardrails
CN111127355A (zh) Method for fine completion of defective optical-flow maps and its application
CN112115767B (zh) Tunnel foreign-object detection method based on Retinex and the YOLOv3 model
CN104077566B (zh) Face detection method for checkpoint images based on color difference
CN106920398A (zh) Intelligent license-plate recognition system
Zhang et al. Chinese license plate recognition using machine and deep learning models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23854310

Country of ref document: EP

Kind code of ref document: A1