CN116052120A - Excavator night object detection method based on image enhancement and multi-sensor fusion - Google Patents

Excavator night object detection method based on image enhancement and multi-sensor fusion

Info

Publication number
CN116052120A
Authority
CN
China
Prior art keywords
image
pedestrian
data set
detection
vehicle data
Prior art date
Legal status: Pending
Application number
CN202310039503.9A
Other languages
Chinese (zh)
Inventor
迟文政 (Chi Wenzheng)
邹美塬 (Zou Meiyuan)
余嘉杰 (Yu Jiajie)
陆波 (Lu Bo)
孙立宁 (Sun Lining)
Current Assignee
Suzhou University
Original Assignee
Suzhou University
Priority date
Filing date
Publication date
Application filed by Suzhou University
Priority to CN202310039503.9A
Publication of CN116052120A

Classifications

    • G06V 20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06V 10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/803: Fusion of input or preprocessed data, i.e. combining data from various sources at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning with neural networks
    • G06V 20/70: Labelling scene content, e.g. deriving syntactic or semantic representations
    • G06T 2207/10028: Range image; depth image; 3D point clouds
    • G06T 2207/30196: Human being; person
    • G06T 2207/30252: Vehicle exterior; vicinity of vehicle
    • G06V 2201/07: Target detection
    • G06V 2201/08: Detecting or categorising vehicles
    • Y02T 10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an excavator night object detection method based on image enhancement and multi-sensor fusion. The method first collects images of pedestrians and vehicles in an excavator night scene and constructs a pedestrian and vehicle data set. The data set is preprocessed, and the preprocessed data set is annotated with coordinates, generating an xml file containing the coordinate information of the pedestrian and vehicle boxes. The data in this file are trained, and object recognition detection and semantic segmentation are performed on the trained data. A camera and a laser radar are jointly calibrated, and the three-dimensional radar point cloud is projected onto the image plane to obtain the two-dimensional position coordinates of the objects. Finally, according to the correspondence between the radar and image coordinate systems, the pedestrians or vehicles in a detection box are mapped back into the three-dimensional radar point cloud and marked, distinguishing the pedestrians from the background within the detection box. This improves the accuracy of object recognition and segmentation while providing accurate spatial localization of objects.

Description

Excavator night object detection method based on image enhancement and multi-sensor fusion
Technical Field
The invention relates to the technical field of object detection, in particular to an excavator night object detection method based on image enhancement and multi-sensor fusion.
Background
The excavator is one of the most widely used pieces of heavy equipment in the construction machinery industry, and its application plays an immeasurable role in saving manpower and improving working efficiency. At present, the operation of a traditional excavator still depends on manual observation, which increases the risk of construction in an unstructured environment. Particularly when the excavator works in a relatively dim environment such as at night, there are potential risks for both the cab and surrounding pedestrians or vehicles.
Object detection is the first step toward excavator intelligence, and two main approaches exist. The first is based on computer vision. The traditional HOG+SVM method has clear limitations: it achieves good detection only when pedestrians are upright and unoccluded, and it cannot cope with the multi-scale, multi-pose pedestrians found under excavator working conditions. The YOLO series of algorithms, currently popular in robotics, detects quickly but lacks depth information, which is a disadvantage for night detection. Although computer vision detection technology is mature, such methods depend largely on the image quality of the camera, and a conventional RGB camera cannot function properly at night, which sharply reduces the detection efficiency of these algorithms after dark. The second approach uses a laser radar sensor to detect the point cloud of an object, from which anti-collision and obstacle avoidance functions can be designed. Although laser radar has the advantages of being unaffected by illumination and providing accurate three-dimensional depth information, such methods lack image information and therefore cannot accurately distinguish pedestrians from other obstacles.
Disclosure of Invention
Therefore, the invention aims to solve the technical problems of the prior art that the detection accuracy for various objects at night is low and the accurate position of an object in three-dimensional space cannot be obtained.
In order to solve the technical problems, the invention provides an excavator night object detection method based on image enhancement and multi-sensor fusion, which comprises the following steps:
S1, collecting images of pedestrians and vehicles in an excavator night scene, and constructing a pedestrian and vehicle data set according to the collected images;
S2, preprocessing the pedestrian and vehicle data set, marking the preprocessed data set, and constructing a complete pedestrian and vehicle data set;
S3, training the data in the complete pedestrian and vehicle data set;
S4, performing object recognition detection and semantic segmentation on the trained data to obtain a prediction feature map of the complete pedestrian and vehicle data set, and detecting and recognizing the various objects in the complete pedestrian and vehicle data set according to the prediction feature map;
S5, locating the detected objects through joint calibration of a camera and a laser radar, and projecting the three-dimensional radar point cloud onto the image plane to obtain the two-dimensional position coordinates of the objects;
and S6, reversely mapping the pedestrians or vehicles in the detection frame back into the three-dimensional radar point cloud according to the two-dimensional position coordinates and the correspondence between the radar and image coordinate systems, marking them in the point cloud, and distinguishing pedestrians from background obstacles in the detection frame.
In one embodiment of the present invention, the images of pedestrians and vehicles collected in the excavator night scene in step S1 specifically include:
multi-pose, multi-scale pedestrian images, covering scenes such as multiple pedestrians, occluded pedestrians, and squatting pedestrians.
In one embodiment of the present invention, the method for preprocessing the pedestrian and vehicle data set in step S2 includes:
performing histogram equalization on the original infrared images of the pedestrian and vehicle data sets, changing the gray scale of each pixel in the images by changing the histograms of the images, and improving the image contrast;
convolving the original images of the pedestrian and vehicle data sets with a two-dimensional Gaussian function, performing weighted average on the images, and removing noise, wherein the used functions are as follows:
$$G_\sigma(x,y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left( -\frac{x^{2}+y^{2}}{2\sigma^{2}} \right)$$
where $G_\sigma(x, y)$ is the two-dimensional Gaussian function and $\sigma \in \mathbb{R}$ is the standard deviation of the Gaussian normal distribution;
setting different standard deviations, subtracting images of two adjacent Gaussian scale spaces to obtain a feature detection image, obtaining a Gaussian difference response value image, enhancing the image, and constructing a high-definition pedestrian and vehicle data set, wherein the formula is as follows:
$$D(x,y) = \left[ g_1(x,y) - g_2(x,y) \right] * f(x,y)$$
where $g_1(x, y)$ and $g_2(x, y)$ are two-dimensional Gaussian functions with different standard deviations, $f(x, y)$ is the input image, and $*$ denotes convolution.
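For illustration, this preprocessing chain (histogram equalization, Gaussian smoothing, difference of Gaussians) can be sketched with OpenCV; the σ values and the way the response image is folded back into the result are illustrative assumptions, not parameters given by the patent.

```python
import cv2

def enhance_infrared(img_gray, sigma1=1.0, sigma2=1.6):
    # Histogram equalization: redistribute gray levels to raise contrast
    equalized = cv2.equalizeHist(img_gray)
    # Two Gaussian scale-space images with different standard deviations
    g1 = cv2.GaussianBlur(equalized, (0, 0), sigma1)
    g2 = cv2.GaussianBlur(equalized, (0, 0), sigma2)
    # Difference of adjacent Gaussian scales: the feature/response image
    dog = cv2.subtract(g1, g2)
    # Fold the response back into the equalized image (illustrative choice)
    return cv2.addWeighted(equalized, 1.0, dog, 1.5, 0)
```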
In one embodiment of the present invention, the method for labeling the preprocessed dataset in step S2 includes:
labeling pedestrians and vehicles in the high-definition pedestrian and vehicle data set by using a labeling tool, annotating the X-Y coordinates of the upper-left and lower-right corners of each pedestrian and vehicle box as shown in fig. 2, generating an xml file containing the coordinate information of the pedestrian and vehicle boxes, and constructing the complete pedestrian and vehicle data set.
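As a sketch, an annotation in the Pascal VOC xml layout (the format written by common labeling tools such as LabelImg; the patent does not name the tool or schema) can be read back like this:

```python
import xml.etree.ElementTree as ET

def read_boxes(xml_path):
    # Returns (label, xmin, ymin, xmax, ymax) for each annotated object
    boxes = []
    for obj in ET.parse(xml_path).getroot().iter("object"):
        label = obj.find("name").text  # e.g. "pedestrian" or "vehicle"
        bb = obj.find("bndbox")
        boxes.append((label,
                      int(bb.find("xmin").text), int(bb.find("ymin").text),
                      int(bb.find("xmax").text), int(bb.find("ymax").text)))
    return boxes
```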
In one embodiment of the present invention, the training method for the data in the complete pedestrian and vehicle dataset in step S3 is as follows:
training the data in the xml file by utilizing the YOLO-v5 target detection algorithm, and performing iterative training for a set number of rounds to obtain trained data.
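One practical detail worth noting: YOLO-v5 trains on normalized (x_center, y_center, width, height) text labels rather than corner coordinates, so the xml boxes are typically converted first; a minimal sketch:

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    # Corner coordinates -> normalized center/size, as YOLO-v5 expects
    xc = (xmin + xmax) / 2.0 / img_w
    yc = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / float(img_w)
    h = (ymax - ymin) / float(img_h)
    return xc, yc, w, h
```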
In one embodiment of the present invention, the method for obtaining the prediction feature map of the complete pedestrian and vehicle dataset in step S4 includes:
extracting features by using the Backbone network of the YOLO-v5 algorithm to obtain an original input feature map;
extracting context information with the pyramid pooling module of the PSPNet pyramid scene parsing network to perform semantic segmentation, dividing the pyramid pooling module into several levels, and fusing the feature maps extracted at each level into a global feature;
splicing the original input feature map and the global feature, and extracting a feature map carrying local and global context information at the same time;
a predictive feature map is generated by a layer of convolution.
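A minimal PyTorch sketch of this fusion, assuming the bin sizes (1, 2, 3, 6) of the original PSPNet paper; the channel arithmetic and the final 1×1 convolution are illustrative choices, not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        out_ch = in_ch // len(bins)  # assumes in_ch divisible by len(bins)
        self.stages = nn.ModuleList([
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, out_ch, 1, bias=False),
                          nn.ReLU(inplace=True))
            for b in bins])
        # 1x1 convolution producing the prediction feature map
        self.head = nn.Conv2d(in_ch * 2, in_ch, 1)

    def forward(self, x):
        h, w = x.shape[2:]
        # Pool at several scales, upsample back, concatenate with the input
        feats = [x] + [F.interpolate(stage(x), size=(h, w), mode="bilinear",
                                     align_corners=False)
                       for stage in self.stages]
        return self.head(torch.cat(feats, dim=1))
```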
In one embodiment of the present invention, the method for obtaining the two-dimensional position coordinates of the object in the step S5 is:
coordinate information of the center point of each detected object in the pixel plane is obtained through the joint calibration of the camera and the laser radar; the mapping relations between the image coordinate system and the camera coordinate system, and between the image coordinate system and the radar coordinate system, are shown in fig. 4 and are as follows:

$$\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = R \begin{bmatrix} x_r \\ y_r \\ z_r \end{bmatrix} + T, \qquad z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix}$$

where $(x_r, y_r, z_r)$ are the coordinates of the object center point in the radar coordinate system, $(x_c, y_c, z_c)$ the coordinates of the object center point in the camera coordinate system, and $(u, v)$ the coordinates of the object center point in the image coordinate system;
the conversion relationship from the three-dimensional radar point cloud to the two-dimensional position coordinates is as follows:
$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \left( R \begin{bmatrix} x_r \\ y_r \\ z_r \end{bmatrix} + T \right)$$
where $f_x$ is the pixel length of the focal length in the x-axis direction, $f_y$ the pixel length of the focal length in the y-axis direction, and $c_x$, $c_y$ the translation of the camera origin (principal point); $f_x$, $f_y$, $c_x$, $c_y$ are the camera intrinsics, $R$ is a 3×3 rotation matrix, and $T$ is a 3×1 translation vector;
according to the conversion relation, the point cloud in the radar is projected onto an image, and an error calculation formula is as follows:
$$\mathrm{error} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{\left( x_{a,i} - x_{u,i} \right)^{2} + \left( y_{a,i} - y_{u,i} \right)^{2}}$$
where $(x_{a,i}, y_{a,i})$ are the actual pixel coordinates in the image and $(x_{u,i}, y_{u,i})$ are the pixel coordinates of the point cloud projected into the image.
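Under the pinhole model above, the radar-to-pixel projection and the mean reprojection error can be sketched with NumPy; here K is the 3×3 intrinsic matrix assembled from f_x, f_y, c_x, c_y, and R, T are the extrinsics from the joint calibration.

```python
import numpy as np

def project_to_image(points, K, R, T):
    # points: (N, 3) radar-frame coordinates; R: 3x3; T: (3,); K: 3x3
    cam = points @ R.T + T                 # radar frame -> camera frame
    front = cam[:, 2] > 0                  # keep points in front of camera
    uv = (K @ cam[front].T).T              # apply the intrinsics
    return uv[:, :2] / uv[:, 2:3], front   # perspective divide by z_c

def mean_reprojection_error(actual_uv, projected_uv):
    # Mean Euclidean distance between actual and projected pixel positions
    return np.linalg.norm(actual_uv - projected_uv, axis=1).mean()
```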
In one embodiment of the present invention, the method for distinguishing the pedestrian from the background obstacle in the detection frame in the step S6 is as follows:
all original laser radar point clouds are reserved;
extracting and storing, by means of a set distance, the points of the original laser radar point cloud that fall within the field of view of the infrared image;
projecting the point cloud onto the two-dimensional image plane through the rotation matrix R and translation vector T obtained by joint calibration, so that it coincides with the object positions in the image, and recording the index of each point;
running the recognition and segmentation based on the YOLO-v5 backbone network, obtaining the coordinate range of each object in the image coordinate system, recording the index of every point falling inside the object detection frame, and realizing the two-dimensional to three-dimensional inverse mapping by looking up the corresponding indices in the radar point cloud to obtain the depth information of the object;
and color-marking the points recovered by the inverse mapping, distinguishing them from irrelevant background obstacle points.
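The two-dimensional to three-dimensional inverse mapping then reduces to bookkeeping over point indices: every retained point is projected with its index remembered, and the indices whose pixels fall inside a detection box lead back to the corresponding 3D points. A sketch, continuing the projection function above:

```python
import numpy as np

def indices_in_box(uv, point_indices, box):
    # box: (xmin, ymin, xmax, ymax) from the YOLO-v5 detection frame
    xmin, ymin, xmax, ymax = box
    inside = ((uv[:, 0] >= xmin) & (uv[:, 0] <= xmax) &
              (uv[:, 1] >= ymin) & (uv[:, 1] <= ymax))
    # Indices back into the original radar cloud give the 3D points,
    # whose range values carry the depth information of the object
    return point_indices[inside]
```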
In one embodiment of the present invention, after the step S6 is completed, the method further includes:
different warning thresholds are set for the distinguished pedestrians and background obstacles in the detection frame, and audible and visual alarms of different intensities are triggered as the distance between the object and the excavator crosses these thresholds, realizing staged early warning by distance.
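The staged early warning can be expressed as a simple threshold ladder; the distances below are illustrative assumptions, since the patent does not state the threshold values.

```python
def alarm_level(distance_m, warn=10.0, danger=5.0, critical=2.0):
    # Nearer objects trigger progressively stronger audible-visual alarms
    if distance_m <= critical:
        return "critical"
    if distance_m <= danger:
        return "danger"
    if distance_m <= warn:
        return "warning"
    return "safe"
```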
The invention also provides an excavator night object detection device, comprising:
the information acquisition module is used for acquiring images of pedestrians and vehicles in the night scene of the excavator and constructing a pedestrian and vehicle data set according to the acquired images;
the image processing module is used for preprocessing the pedestrian and vehicle data set, labeling the preprocessed data set and constructing a complete pedestrian and vehicle data set;
the data training module is used for training the data in the complete pedestrian and vehicle data set;
the detection and identification module is used for carrying out object identification detection and semantic segmentation on the trained data to obtain a prediction feature map of the complete pedestrian and vehicle data set, and carrying out detection and identification on various objects in the complete pedestrian and vehicle data set according to the prediction feature map;
the acquisition module is used for locating the detected objects through joint calibration of a camera and a laser radar, and projecting the three-dimensional radar point cloud onto the image plane to obtain the two-dimensional position coordinates of the objects;
and the position determining module is used for reversely mapping the pedestrians or vehicles in the detection frame back into the three-dimensional radar point cloud according to the two-dimensional position coordinates and the correspondence between the radar and image coordinate systems, marking them in the point cloud, and distinguishing pedestrians from background obstacles in the detection frame.
Correspondingly, the embodiment of the invention also provides a detection device, which comprises:
a memory for storing a computer program;
and the processor is used for calling the computer program stored in the memory and executing the excavator night object detection method based on image enhancement and multi-sensor fusion according to the program.
Correspondingly, the embodiment of the invention also provides a computer readable nonvolatile storage medium, which comprises computer readable instructions, and when the computer reads and executes the computer readable instructions, the computer is caused to execute the excavator night object detection method based on image enhancement and multi-sensor fusion.
Compared with the prior art, the technical scheme of the invention has the following advantages:
the invention discloses an excavator night object detection method based on image enhancement and multi-sensor fusion, which comprises the steps of firstly collecting images of pedestrians and vehicles in an excavator night scene, constructing a pedestrian and vehicle data set, preprocessing the data set, carrying out coordinate marking on the preprocessed data set to generate an xml file containing coordinate information of pedestrians and vehicle frames, training data in the file to obtain trained data, carrying out object recognition detection and semantic segmentation on the trained data, carrying out joint calibration of a camera and a laser radar, projecting three-dimensional radar point cloud on an image plane to obtain two-dimensional position coordinates of an object, and finally reversely mapping pedestrians or vehicles in a detection frame back to the three-dimensional Lei Dadian cloud and marking according to the corresponding relation between the radar and the image coordinate system to distinguish pedestrians and backgrounds in the detection frame, thereby not only improving the accuracy of object recognition and segmentation, but also providing accurate object space position positioning.
Drawings
In order that the invention may be more readily understood, a more particular description of the invention will be rendered by reference to specific embodiments thereof which are illustrated in the appended drawings, in which:
FIG. 1 is a flow chart provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of a labeling calibration tool provided by an embodiment of the present invention;
FIG. 3 is a graph comparing enhanced results provided by embodiments of the present invention;
FIG. 4 is a diagram of the conversion relationship between an image and a camera and a radar coordinate system according to an embodiment of the present invention;
fig. 5 is a schematic diagram of a detection device according to an embodiment of the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings and specific examples, which are not intended to be limiting, so that those skilled in the art will better understand the invention and practice it.
Example 1
As shown in fig. 1, the method for detecting an excavator night object based on image enhancement and multi-sensor fusion according to the embodiment specifically includes:
S1, collecting images of pedestrians and vehicles in an excavator night scene, and constructing a pedestrian and vehicle data set according to the collected images; S2, preprocessing the pedestrian and vehicle data set, marking the preprocessed data set, and constructing a complete pedestrian and vehicle data set; S3, training the data in the complete pedestrian and vehicle data set; S4, performing object recognition detection and semantic segmentation on the trained data to obtain a prediction feature map of the complete pedestrian and vehicle data set, and detecting and recognizing the various objects in the complete pedestrian and vehicle data set according to the prediction feature map; S5, locating the detected objects through joint calibration of a camera and a laser radar, and projecting the three-dimensional radar point cloud onto the image plane to obtain the two-dimensional position coordinates of the objects; and S6, reversely mapping the pedestrians or vehicles in the detection frame back into the three-dimensional radar point cloud according to the two-dimensional position coordinates and the correspondence between the radar and image coordinate systems, marking them in the point cloud, and distinguishing pedestrians from background obstacles in the detection frame.
According to the excavator night object detection method based on image enhancement and multi-sensor fusion, preprocessing the pedestrian and vehicle data set removes noise and redundant information and improves image contrast, making the data set clearer and effectively improving the accuracy of object recognition and segmentation. Training on the complete pedestrian and vehicle data set markedly improves recognition accuracy. Object recognition detection and semantic segmentation on the trained data yield the prediction feature map of the complete data set, from which the various objects can be marked and annotated with their class names, realizing detection and recognition of the objects in the image. By obtaining the two-dimensional position coordinates of each object, pedestrians and background obstacles in the detection frame are distinguished and accurate spatial position information is provided, which assists the excavator operator's judgment.
The method for preprocessing the pedestrian and vehicle data set in the step S2 is as follows: performing histogram equalization on the original infrared images of the pedestrian and vehicle data sets, changing the gray scale of each pixel in the images by changing the histograms of the images, and improving the image contrast; convolving the original images of the pedestrian and vehicle data sets with a two-dimensional Gaussian function, performing weighted average on the images, and removing noise, wherein the used functions are as follows:
$$G_\sigma(x,y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left( -\frac{x^{2}+y^{2}}{2\sigma^{2}} \right)$$
where $G_\sigma(x, y)$ is the two-dimensional Gaussian function and $\sigma \in \mathbb{R}$ is the standard deviation of the Gaussian normal distribution;
setting different standard deviations, subtracting images of two adjacent Gaussian scale spaces to obtain a feature detection image, obtaining a Gaussian difference response value image, enhancing the image, and constructing a high-definition pedestrian and vehicle data set, wherein the formula is as follows:
$$D(x,y) = \left[ g_1(x,y) - g_2(x,y) \right] * f(x,y)$$
where $g_1(x, y)$ and $g_2(x, y)$ are two-dimensional Gaussian functions with different standard deviations, $f(x, y)$ is the input image, and $*$ denotes convolution.
Preprocessing the pedestrian and vehicle data set removes noise and redundant information and improves image contrast, making the data set clearer; object detection performance is thus less affected by poor illumination, the quality of the infrared images is improved, and the accuracy of object recognition and segmentation rises.
The method for labeling the preprocessed data set in the step S2 includes: and labeling pedestrians and vehicles in the high-definition pedestrian and vehicle data set by using a labeling tool, and labeling the coordinates X-Y of the upper left corner and the lower right corner of the pedestrian and vehicle frame as shown in fig. 2, so as to generate an xml file containing the coordinate information of the pedestrian and vehicle frame, and constructing a complete pedestrian and vehicle data set. The method for training the data in the complete pedestrian and vehicle data set in step S3 includes: training data in the xml file by utilizing the YOLO-v5 target detection algorithm, and performing iterative training of set rounds to obtain trained data.
The accuracy of image recognition detection is improved by labeling the preprocessed data set and training on the complete pedestrian and vehicle data set. The comparison of results before and after enhancement is shown in fig. 3, where the horizontal axis (Epochs) covers 300 training iterations and the vertical axis (Precision) is the accuracy; Raw Images denotes the curve for the original images and Enhanced Images the curve for the enhanced images. The recognition accuracy after training is markedly improved.
The method for obtaining the prediction feature map of the complete pedestrian and vehicle data set in step S4 is as follows: extracting features by using the Backbone network of the YOLO-v5 target detection algorithm to obtain the original input feature map; extracting context information with the pyramid pooling module of the PSPNet pyramid scene parsing network to perform semantic segmentation, dividing the pyramid pooling module into several levels and fusing the feature maps extracted at each level into a global feature; splicing the original input feature map with the global feature to extract a feature map carrying both local and global context information; and generating the prediction feature map through one layer of convolution.
Through the complete predictive feature graphs of the pedestrian and vehicle data sets, various objects are marked in the form of rectangular frames and names of the corresponding objects are annotated, so that detection and identification of various objects in the image are realized.
The method for obtaining the two-dimensional position coordinates of the object in step S5 is as follows: coordinate information of the center point of each detected object in the pixel plane is obtained through the joint calibration of the camera and the laser radar; the mapping relations between the image coordinate system and the camera coordinate system, and between the image coordinate system and the radar coordinate system, are shown in fig. 4 and are as follows:

$$\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = R \begin{bmatrix} x_r \\ y_r \\ z_r \end{bmatrix} + T, \qquad z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix}$$

where $(x_r, y_r, z_r)$ are the coordinates of the object center point in the radar coordinate system, $(x_c, y_c, z_c)$ the coordinates of the object center point in the camera coordinate system, and $(u, v)$ the coordinates of the object center point in the image coordinate system;
the conversion relationship from the three-dimensional radar point cloud to the two-dimensional position coordinates is as follows:
$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \left( R \begin{bmatrix} x_r \\ y_r \\ z_r \end{bmatrix} + T \right)$$
where $f_x$ is the pixel length of the focal length in the x-axis direction, $f_y$ the pixel length of the focal length in the y-axis direction, and $c_x$, $c_y$ the translation of the camera origin (principal point); $f_x$, $f_y$, $c_x$, $c_y$ are the camera intrinsics, $R$ is a 3×3 rotation matrix, and $T$ is a 3×1 translation vector;
according to the conversion relation, the point cloud in the radar is projected onto the image, and the error calculation formula is as follows:
$$\mathrm{error} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{\left( x_{a,i} - x_{u,i} \right)^{2} + \left( y_{a,i} - y_{u,i} \right)^{2}}$$
where $(x_{a,i}, y_{a,i})$ are the actual pixel coordinates in the image and $(x_{u,i}, y_{u,i})$ are the pixel coordinates of the point cloud projected into the image.
The method for distinguishing the pedestrian from the background obstacle in step S6 is as follows: all original laser radar point clouds are reserved; the points of the original laser radar point cloud that fall within the field of view of the infrared image are extracted and stored by means of a set distance; the point cloud is projected onto the two-dimensional image plane through the rotation matrix R and translation vector T obtained by joint calibration, so that it coincides with the object positions in the image, and the index of each point is recorded; the index of every point falling inside an object detection frame is recorded, and the two-dimensional to three-dimensional inverse mapping is realized by looking up the corresponding indices in the radar point cloud, obtaining the depth information of the object; the points recovered by the inverse mapping are color-marked, distinguishing them from irrelevant background obstacle points. By obtaining the two-dimensional position coordinates of the objects, pedestrians and background obstacles in the detection frame are distinguished, accurate spatial position information is provided, and the excavator operator's judgment is assisted.
In the excavator night object detection method based on image enhancement and multi-sensor fusion, different warning thresholds are set for the distinguished pedestrians and background obstacles in the detection frame, and audible and visual alarms of different intensities are triggered as the distance between the object and the excavator crosses these thresholds, realizing staged early warning by distance.
The detection accuracy of the method is discussed below with reference to the data. Table 1 compares detection accuracy before and after image enhancement: the average accuracy over the whole data set and the detection accuracy for the two typical object classes in the excavator scene, pedestrians and vehicles, are all improved. After enhancement, the overall accuracy increases by 10.2%, the night pedestrian detection accuracy by 10.8%, and the vehicle detection accuracy by 9.6%.
TABLE 1

              Original images    Enhanced images
All objects   76.2%              86.4%
Pedestrian    76.9%              87.7%
Vehicle       75.4%              85%
Table 2 shows the experimental data: the detection accuracy for pedestrians in various states remains around 95%, and the average detection time is only about 0.014 s, which meets the detection requirements of the excavator under working conditions.
TABLE 2 (reproduced as an image in the original publication; it tabulates the detection accuracy and detection time for pedestrians in the various states)
Example two
Based on the same inventive concept, this embodiment provides an excavator night object detection device based on image enhancement and multi-sensor fusion. Its problem-solving principle is similar to that of the excavator night object detection method based on image enhancement and multi-sensor fusion, so repeated description is omitted.
Fig. 5 shows an excavator night object detection device based on image enhancement and multi-sensor fusion, which comprises:
the information acquisition module is used for acquiring images of pedestrians and vehicles in the night scene of the excavator and constructing a pedestrian and vehicle data set according to the acquired images;
the image processing module is used for preprocessing the pedestrian and vehicle data set, labeling the preprocessed data set and constructing a complete pedestrian and vehicle data set;
the data training module is used for training the data in the complete pedestrian and vehicle data set;
the detection and identification module is used for carrying out object identification detection and semantic segmentation on the trained data to obtain a prediction feature map of the complete pedestrian and vehicle data set, and carrying out detection and identification on various objects in the complete pedestrian and vehicle data set according to the prediction feature map;
the acquisition module is used for locating the detected objects through joint calibration of the camera and the laser radar, and projecting the three-dimensional radar point cloud onto the image plane to obtain the two-dimensional position coordinates of the objects;
and the position determining module is used for reversely mapping the pedestrians or vehicles in the detection frame back into the three-dimensional radar point cloud according to the two-dimensional position coordinates and the correspondence between the radar and image coordinate systems, marking them in the point cloud, and distinguishing pedestrians from background obstacles in the detection frame.
Example III
The embodiment also provides a detection device, including:
a memory for storing a computer program;
a processor, configured to implement the steps of the method for detecting an excavator night object based on image enhancement and multi-sensor fusion according to the first embodiment when executing the computer program.
The invention also provides a computer readable nonvolatile storage medium, which comprises computer readable instructions, wherein when the computer reads and executes the computer readable instructions, the computer is caused to execute the method for detecting the night object of the excavator based on image enhancement and multi-sensor fusion.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It is apparent that the above examples are given by way of illustration only and are not limiting of the embodiments. Other variations and modifications of the present invention will be apparent to those of ordinary skill in the art in light of the foregoing description. It is not necessary here nor is it exhaustive of all embodiments. And obvious variations or modifications thereof are contemplated as falling within the scope of the present invention.

Claims (10)

1. An excavator night object detection method based on image enhancement and multi-sensor fusion is characterized by comprising the following steps:
S1, collecting images of pedestrians and vehicles in an excavator night scene, and constructing a pedestrian and vehicle data set according to the collected images;
S2, preprocessing the pedestrian and vehicle data set, marking the preprocessed data set, and constructing a complete pedestrian and vehicle data set;
S3, training the data in the complete pedestrian and vehicle data set;
S4, performing object recognition detection and semantic segmentation on the trained data to obtain a prediction feature map of the complete pedestrian and vehicle data set, and detecting and recognizing the various objects in the complete pedestrian and vehicle data set according to the prediction feature map;
S5, locating the detected objects through joint calibration of a camera and a laser radar, and projecting the three-dimensional radar point cloud onto the image plane to obtain the two-dimensional position coordinates of the objects;
and S6, reversely mapping the pedestrians or vehicles in the detection frame back into the three-dimensional radar point cloud according to the two-dimensional position coordinates and the correspondence between the radar and image coordinate systems, marking them in the point cloud, and distinguishing pedestrians from background obstacles in the detection frame.
2. The method for detecting the night object of the excavator based on the image enhancement and the multi-sensor fusion according to claim 1, wherein the method for preprocessing the pedestrian and the vehicle data set in the step S2 is as follows:
performing histogram equalization on the original infrared images of the pedestrian and vehicle data sets, changing the gray scale of each pixel in the images by changing the histograms of the images, and improving the image contrast;
convolving the original images of the pedestrian and vehicle data sets with a two-dimensional Gaussian function, performing weighted average on the images, and removing noise, wherein the used functions are as follows:
$$G_\sigma(x,y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left( -\frac{x^{2}+y^{2}}{2\sigma^{2}} \right)$$
where $G_\sigma(x, y)$ is the two-dimensional Gaussian function and $\sigma \in \mathbb{R}$ is the standard deviation of the Gaussian normal distribution;
setting different standard deviations, subtracting images of two adjacent Gaussian scale spaces to obtain a feature detection image, obtaining a Gaussian difference response value image, enhancing the image, and constructing a high-definition pedestrian and vehicle data set, wherein the formula is as follows:
$$D(x,y) = \left[ g_1(x,y) - g_2(x,y) \right] * f(x,y)$$
where $g_1(x, y)$ and $g_2(x, y)$ are two-dimensional Gaussian functions with different standard deviations, $f(x, y)$ is the input image, and $*$ denotes convolution.
3. The method for detecting the night object of the excavator based on the image enhancement and the multi-sensor fusion according to claim 1, wherein the method for labeling the preprocessed data in the step S2 is as follows:
labeling pedestrians and vehicles in the high-definition pedestrian and vehicle data set by using a labeling tool, labeling X-Y coordinates of the upper left corner and the lower right corner of a pedestrian and vehicle frame, generating an xml file containing coordinate information of the pedestrian and vehicle frame, and constructing the complete pedestrian and vehicle data set.
4. The method for detecting the night object of the excavator based on the image enhancement and the multi-sensor fusion according to claim 3, wherein the training of the data in the complete pedestrian and vehicle data set in the step S3 is as follows:
training the data in the xml file by utilizing a YOLO-v5 target detection algorithm, and performing iterative training for a set number of rounds to obtain trained data.
5. The method for detecting the night object of the excavator based on the image enhancement and the multi-sensor fusion according to claim 1, wherein the method for obtaining the prediction feature map of the complete pedestrian and vehicle data set in the step S4 comprises the following steps:
extracting features by using the Backbone network of the YOLO-v5 target detection algorithm to obtain an original input feature map;
extracting context information with the pyramid pooling module of the PSPNet pyramid scene parsing network to perform semantic segmentation, dividing the pyramid pooling module into several levels, and fusing the feature maps extracted at each level into a global feature;
splicing the original input feature map and the global feature, and extracting a feature map carrying local and global context information at the same time;
a predictive feature map is generated by a layer of convolution.
6. The method for detecting the night object of the excavator based on the image enhancement and the multi-sensor fusion according to claim 1, wherein the method for obtaining the two-dimensional position coordinates of the object in the step S5 is as follows:
coordinate information of the center point of each detected object in the pixel plane is obtained through the joint calibration of the camera and the laser radar; the mapping relations between the image coordinate system and the camera coordinate system, and between the image coordinate system and the radar coordinate system, are as follows:

$$\begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = R \begin{bmatrix} x_r \\ y_r \\ z_r \end{bmatrix} + T, \qquad z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix}$$

where $(x_r, y_r, z_r)$ are the coordinates of the object center point in the radar coordinate system, $(x_c, y_c, z_c)$ the coordinates of the object center point in the camera coordinate system, and $(u, v)$ the coordinates of the object center point in the image coordinate system;
the conversion relationship from the three-dimensional radar point cloud to the two-dimensional position coordinates is as follows:
$$z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix} \left( R \begin{bmatrix} x_r \\ y_r \\ z_r \end{bmatrix} + T \right)$$
where $f_x$ is the pixel length of the focal length in the x-axis direction, $f_y$ the pixel length of the focal length in the y-axis direction, and $c_x$, $c_y$ the translation of the camera origin (principal point); $f_x$, $f_y$, $c_x$, $c_y$ are the camera intrinsics, $R$ is a 3×3 rotation matrix, and $T$ is a 3×1 translation vector;
according to the conversion relation, the point cloud in the radar is projected onto the image, and the error calculation formula is as follows:
$$\mathrm{error} = \frac{1}{N} \sum_{i=1}^{N} \sqrt{\left( x_{a,i} - x_{u,i} \right)^{2} + \left( y_{a,i} - y_{u,i} \right)^{2}}$$
where $(x_{a,i}, y_{a,i})$ are the actual pixel coordinates in the image and $(x_{u,i}, y_{u,i})$ are the pixel coordinates of the point cloud projected into the image.
7. The method for detecting the night object of the excavator based on the image enhancement and the multi-sensor fusion according to claim 1, wherein the method for distinguishing the pedestrian from the background obstacle in the detection frame in the step S6 is as follows:
all original laser radar point clouds are reserved;
the points of the original laser radar point cloud that fall within the field of view of the infrared image are extracted and stored by means of a set distance;
the point cloud is projected onto the two-dimensional image plane through the rotation matrix R and translation vector T obtained by joint calibration, so that it coincides with the object positions in the image, and the index of each point is recorded;
the index of every point falling inside an object detection frame is recorded, and the two-dimensional to three-dimensional inverse mapping is realized by looking up the corresponding indices in the radar point cloud, obtaining the depth information of the object;
and the points recovered by the inverse mapping are color-marked, distinguishing them from irrelevant background obstacle points.
8. The method for detecting an excavator night object based on image enhancement and multi-sensor fusion according to claim 1, further comprising, after the step S6 is completed:
different warning thresholds are set for the distinguished pedestrians and background obstacles in the detection frame, and audible and visual alarms of different intensities are triggered as the distance between the object and the excavator crosses these thresholds, realizing staged early warning by distance.
9. An excavator night object detection device based on image enhancement and multi-sensor fusion, comprising:
the information acquisition module is used for acquiring images of pedestrians and vehicles in the night scene of the excavator and constructing a pedestrian and vehicle data set according to the acquired images;
the image processing module is used for preprocessing the pedestrian and vehicle data set, labeling the preprocessed data set and constructing a complete pedestrian and vehicle data set;
the data training module is used for training the data in the complete pedestrian and vehicle data set;
the detection and identification module is used for carrying out object identification detection and semantic segmentation on the trained data to obtain a prediction feature map of the complete pedestrian and vehicle data set, and carrying out detection and identification on various objects in the complete pedestrian and vehicle data set according to the prediction feature map;
the acquisition module is used for locating the detected objects through joint calibration of a camera and a laser radar, and projecting the three-dimensional radar point cloud onto the image plane to obtain the two-dimensional position coordinates of the objects;
and the position determining module is used for reversely mapping the pedestrians or vehicles in the detection frame back into the three-dimensional radar point cloud according to the two-dimensional position coordinates and the correspondence between the radar and image coordinate systems, marking them in the point cloud, and distinguishing pedestrians from background obstacles in the detection frame.
10. A detection apparatus, characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of an excavator night object detection method based on image enhancement and multi-sensor fusion as claimed in any one of claims 1 to 8 when executing the computer program.
CN202310039503.9A 2023-01-12 2023-01-12 Excavator night object detection method based on image enhancement and multi-sensor fusion Pending CN116052120A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310039503.9A CN116052120A (en) 2023-01-12 2023-01-12 Excavator night object detection method based on image enhancement and multi-sensor fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310039503.9A CN116052120A (en) 2023-01-12 2023-01-12 Excavator night object detection method based on image enhancement and multi-sensor fusion

Publications (1)

Publication Number Publication Date
CN116052120A (en) 2023-05-02

Family

ID=86117769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310039503.9A Pending CN116052120A (en) 2023-01-12 2023-01-12 Excavator night object detection method based on image enhancement and multi-sensor fusion

Country Status (1)

Country Link
CN (1) CN116052120A (en)


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111583337A (en) * 2020-04-25 2020-08-25 华南理工大学 Omnibearing obstacle detection method based on multi-sensor fusion
US20220277557A1 (en) * 2020-05-08 2022-09-01 Quanzhou equipment manufacturing research institute Target detection method based on fusion of vision, lidar, and millimeter wave radar

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HENGSHUANG ZHAO et al.: "Pyramid Scene Parsing Network", arXiv, pages 1-11
MEIYUAN ZOU et al.: "Active Pedestrian Detection for Excavator Robots based on Multi-Sensor Fusion", IEEE, pages 255-260

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116663761A (en) * 2023-06-25 2023-08-29 Kunming University of Science and Technology Pseudo-ginseng Chinese medicinal material low-loss excavation system
CN116663761B (en) * 2023-06-25 2024-04-23 Kunming University of Science and Technology Pseudo-ginseng Chinese medicinal material low-loss excavation system

Similar Documents

Publication Publication Date Title
Dhiman et al. Pothole detection using computer vision and learning
CN111178236B (en) Parking space detection method based on deep learning
CN112967283B (en) Target identification method, system, equipment and storage medium based on binocular camera
US11379963B2 (en) Information processing method and device, cloud-based processing device, and computer program product
CN115049700A (en) Target detection method and device
CN116188999B (en) Small target detection method based on visible light and infrared image data fusion
CN112883790A (en) 3D object detection method based on monocular camera
CN112950725A (en) Monitoring camera parameter calibration method and device
CN111967396A (en) Processing method, device and equipment for obstacle detection and storage medium
CN114565675A (en) Method for removing dynamic feature points at front end of visual SLAM
Petrovai et al. A stereovision based approach for detecting and tracking lane and forward obstacles on mobile devices
CN111488808A (en) Lane line detection method based on traffic violation image data
CN116052120A (en) Excavator night object detection method based on image enhancement and multi-sensor fusion
CN111191482B (en) Brake lamp identification method and device and electronic equipment
CN114549542A (en) Visual semantic segmentation method, device and equipment
CN110197104B (en) Distance measurement method and device based on vehicle
CN111881752B (en) Guardrail detection classification method and device, electronic equipment and storage medium
Giosan et al. Superpixel-based obstacle segmentation from dense stereo urban traffic scenarios using intensity, depth and optical flow information
CN109101874B (en) Library robot obstacle identification method based on depth image
CN114648639B (en) Target vehicle detection method, system and device
CN113052118A (en) Method, system, device, processor and storage medium for realizing scene change video analysis and detection based on high-speed dome camera
Burlacu et al. Stereo vision based environment analysis and perception for autonomous driving applications
CN111539279A (en) Road height limit height detection method, device, equipment and storage medium
Zhu et al. Toward the ghosting phenomenon in a stereo-based map with a collaborative RGB-D repair
US20230419522A1 (en) Method for obtaining depth images, electronic device, and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination