CN116092042A - Mesh obstacle recognition method, mesh obstacle recognition device, electronic equipment and computer storage medium


Info

Publication number
CN116092042A
CN116092042A (application CN202210907845.3A)
Authority
CN
China
Prior art keywords: image; mesh; obstacle; identified; semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210907845.3A
Other languages
Chinese (zh)
Inventor
魏翼鹰
姜一阳
江澳
张渝沄
杨训鑑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed): 2022-07-29
Filing date: 2022-07-29
Publication date: 2023-05-09
Application filed by Wuhan University of Technology WUT
Priority to CN202210907845.3A
Publication of CN116092042A
Legal status: Pending

Classifications

    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06N3/08 Learning methods
    • G06T7/55 Depth or shape recovery from multiple images
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/764 Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V10/803 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level, of input or preprocessed data
    • G06V10/82 Image or video recognition or understanding using pattern recognition or machine learning, using neural networks
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a mesh obstacle identification method and device, an electronic device, and a computer storage medium. The method comprises: acquiring an image to be identified, wherein the image to be identified contains a mesh obstacle; inputting the image to be identified into a fully trained semantic segmentation prediction model, and outputting a semantic segmentation map of the image to be identified; acquiring a depth map of the image to be identified; and fusing the semantic segmentation map with the depth map, and performing pixel analysis on the fused image to determine the depth information of the mesh obstacle. By fusing the semantic segmentation map with the depth map and exploiting the accurate classification provided by the semantic segmentation map, accurate depth information for the mesh obstacle is obtained, improving the accuracy with which unmanned equipment recognizes mesh obstacles and safeguarding driving safety.

Description

Mesh obstacle recognition method, mesh obstacle recognition device, electronic equipment and computer storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a method and apparatus for identifying a mesh obstacle, an electronic device, and a computer storage medium.
Background
Unmanned driving technology can be divided into three modules: perception, cognition, and control. The environment is first perceived accurately, the resulting information is then processed, and finally instructions are sent to the vehicle's control system to realize specific functions.
In the perception module, a large number of sensors work in coordination, acquiring as much useful information as possible so that the vehicle can follow the correct path: lidar, millimeter-wave radar, ultrasonic radar, cameras, inertial navigation units (IMU), wheel odometers, and so on. Among these, the most flexible and extensible sensor is the camera. Because it comes closest to the principle by which the human eye perceives the environment, it is used very widely in unmanned driving and has attracted a large number of students and engineers to study it.
In general, unmanned vehicles are expected to run in daytime or under sufficient light, and as hardware has developed, computers have become powerful enough that on-board cameras can satisfy the environment perception task under most conditions. For mesh obstacles, however, the mesh targets are too fine and the meshes are usually distributed horizontally, so the mesh seen by the left and right cameras of a binocular camera shows no obvious parallax. The mesh obstacle is therefore difficult to recognize, which creates a potential safety hazard.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a mesh obstacle recognition method, apparatus, electronic device, and computer storage medium that address the collision risk posed to unmanned equipment by the low accuracy of mesh obstacle recognition in the prior art.
To solve the above problems, in a first aspect the present invention provides a mesh obstacle recognition method, comprising:
acquiring an image to be identified, wherein the image to be identified contains a mesh obstacle;
inputting the image to be identified into a fully trained semantic segmentation prediction model, and outputting a semantic segmentation map of the image to be identified;
acquiring a depth map of the image to be identified;
and fusing the semantic segmentation map with the depth map, and performing pixel analysis on the fused image to determine the depth information of the mesh obstacle.
Further, the fully trained semantic segmentation prediction model is trained on a PSPNet neural network;
the PSPNet neural network structure comprises a feature extraction sub-network, a pooling sub-network, and a convolution sub-network.
Further, the training process of the semantic segmentation prediction model includes:
acquiring a picture set containing mesh obstacles, and labeling the picture set with classification labels to obtain a classification result set;
forming a data set from the picture set and the classification result corresponding to each picture, wherein the data set comprises a training set, a test set, and a prediction set;
inputting the training set into the PSPNet neural network for training, acquiring the trained model parameters once a preset loss condition is reached, and loading the trained model parameters into the PSPNet neural network to finish training the semantic segmentation prediction model;
wherein inputting the training set into the PSPNet neural network for training specifically comprises:
extracting a first picture feature layer from the training set using the feature extraction sub-network;
performing pooling operations at different scales on the first picture feature layer using the pooling sub-network to obtain a second picture feature layer;
and adjusting the feature layers and the number of channels of the second picture feature layer using the convolution sub-network, so that the output picture and the input picture are the same size.
Further, obtaining the depth map of the image to be identified includes:
acquiring calibration parameters of a depth camera, and correcting the image to be identified according to the calibration parameters;
and matching the corrected images, and calculating the depth of each pixel in the image to be identified from the matching result, so as to obtain the depth map of the image to be identified.
Further, fusing the semantic segmentation map with the depth map includes:
constructing a camera coordinate system based on the depth camera;
and fusing the semantic segmentation map with the depth map in the camera coordinate system to obtain a fused image.
Further, performing pixel analysis on the fused image to determine the depth information of the mesh obstacle includes:
acquiring a histogram corresponding to the fused image, performing pixel statistics on the histogram, and determining the depth information of the mesh obstacle based on the statistical result.
Further, the method further comprises:
filling and repairing the depth map based on the depth information.
In a second aspect, the present invention also provides a mesh obstacle identification apparatus, comprising:
a first acquisition module, configured to acquire an image to be identified, wherein the image to be identified contains a mesh obstacle;
an output module, configured to input the image to be identified into a fully trained semantic segmentation prediction model and output a semantic segmentation map of the image to be identified;
a second acquisition module, configured to acquire the depth map of the image to be identified;
and a determination module, configured to fuse the semantic segmentation map with the depth map and perform pixel analysis on the fused image to determine the depth information of the mesh obstacle.
In a third aspect, the present invention also provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the mesh obstacle recognition method described above.
In a fourth aspect, the present invention also provides a computer storage medium storing a computer program which, when executed by a processor, implements the steps of the mesh obstacle recognition method described above.
The beneficial effects of adopting these embodiments are as follows:
the image to be identified is acquired in real time by a camera, and a semantic segmentation map of the image is obtained with semantic segmentation technology, so that the mesh obstacle is detected accurately and in real time; the depth map generated by binocular stereoscopic imaging is then corrected and supplemented according to the semantic segmentation map, improving the accuracy with which unmanned equipment recognizes mesh obstacles. A robot or unmanned vehicle can thus perceive its environment comprehensively even in complex surroundings, avoid obstacles accurately, and further improve driving safety.
Drawings
Fig. 1 is a flowchart of an embodiment of a mesh obstacle recognition method according to the present invention;
Fig. 2 is a reference diagram of an image to be identified according to an embodiment of the present invention;
Fig. 3 is a reference diagram of the semantic segmentation map of an image to be identified according to an embodiment of the present invention;
Fig. 4 is an overall framework diagram of PSPNet according to an embodiment of the present invention;
Fig. 5 is a diagram of the label-production effect according to an embodiment of the present invention;
Fig. 6 is a depth map of an image to be identified according to an embodiment of the present invention;
Fig. 7 is a fused image of the semantic segmentation map and the depth map of an image to be identified according to an embodiment of the present invention;
Fig. 8 is a histogram corresponding to a partial region of the fused image according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of an embodiment of a mesh obstacle recognition device according to the present invention;
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Preferred embodiments of the present invention will now be described in detail with reference to the accompanying drawings, which form a part hereof, and together with the description serve to explain the principles of the invention, and are not intended to limit the scope of the invention.
In the description of the present invention, "a plurality" means two or more unless explicitly defined otherwise. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those skilled in the art will appreciate, both explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
Vehicles using unmanned driving technology are generally expected to run in daytime or under sufficient light, and with the development of hardware, computers have become increasingly powerful, so the depth cameras used in unmanned driving can satisfy the environment perception task under most conditions. For mesh obstacles, however, the mesh target is too fine and the mesh is usually distributed horizontally, so the mesh seen by the depth camera shows no obvious parallax, and the mesh obstacle is difficult to identify. A semantic segmentation map can represent the area occupied by each object in the image, so combining the depth map with the semantic segmentation map allows the mesh obstacle to be identified well.
The invention provides a mesh obstacle recognition method, a mesh obstacle recognition device, electronic equipment and a computer storage medium, which are respectively described below.
Referring to fig. 1, fig. 1 is a flow chart of an embodiment of a mesh obstacle recognition method according to the present invention, and a mesh obstacle recognition method according to an embodiment of the present invention is disclosed, including:
step S101: acquiring an image to be identified, wherein the image to be identified comprises a net-shaped obstacle;
step S102: inputting the image to be identified into a complete semantic segmentation prediction model, and outputting a semantic segmentation graph of the image to be identified;
step S103: acquiring a depth map of an image to be identified;
step S104: and carrying out image fusion on the semantic segmentation map and the depth map, and carrying out pixel analysis on the fused image to determine the depth information of the mesh obstacle.
The image to be identified is an image containing a mesh obstacle. It will be appreciated that during automatic driving, unmanned equipment (including but not limited to robots and unmanned vehicles) can satisfy the environment perception task under most conditions. For mesh obstacles, however, the mesh target is too fine and the mesh is usually distributed horizontally and therefore hard to identify, so mesh obstacles in the field of view of the unmanned equipment must be handled specially for the equipment to run autonomously.
Specifically, the image to be identified can be obtained through the depth camera of the unmanned equipment, input into a fully trained semantic segmentation prediction model, and the semantic segmentation map of the image output. Referring to Figs. 2 and 3, Fig. 2 is a reference diagram of an image to be identified according to an embodiment of the present invention, and Fig. 3 is a reference diagram of its semantic segmentation map. Semantics here moves from the concrete to the abstract: semantic segmentation means having the computer segment an image according to its semantics, and in the image domain, semantics refers to the content of the image. The semantic segmentation map of the image to be identified is thus a presentation of the image with its content classified.
The depth information of the mesh obstacle obtained by the depth camera alone is inaccurate, because the mesh seen by the depth camera shows no obvious parallax, whereas the classified region contents are presented well in the semantic segmentation map. The depth map captured by the depth camera can therefore be fused with the semantic segmentation map, the fused image analyzed pixel by pixel, and the depth information of the mesh obstacle determined, so that the unmanned equipment can perceive its surroundings comprehensively according to that depth information and avoid obstacles accurately.
In this method, the image to be identified is acquired in real time by a camera, and a semantic segmentation map of the image is obtained with semantic segmentation technology, so that the mesh obstacle is detected accurately and in real time; the depth map generated by binocular stereoscopic imaging is then corrected and supplemented according to the semantic segmentation map, improving the accuracy with which unmanned equipment recognizes mesh obstacles. A robot or unmanned vehicle can thus perceive its environment comprehensively even in complex surroundings, avoid obstacles accurately, and further improve driving safety.
In one embodiment of the present application, the fully trained semantic segmentation prediction model is trained on a PSPNet neural network;
the structure of the PSPNet neural network comprises a feature extraction sub-network, a pooling sub-network, and a convolution sub-network.
First, it should be noted that PSPNet (Pyramid Scene Parsing Network) is used as the neural network model for identifying mesh obstacles. Its core module is the pyramid pooling module, which divides a feature layer into grids of different sizes and aggregates the context information of different regions, improving the network's ability to capture global information. Referring to Fig. 4, Fig. 4 is an overall framework diagram of PSPNet according to an embodiment of the present invention.
Because the technical solution of the present invention will run on an embedded system or another mobile platform, its demand on computing performance cannot be too high, yet a certain accuracy is still required. After balancing performance against speed, the feature extraction sub-network of the present invention uses a ResNet-50 network, whose backbone produces one feature layer after another to serve as the input to the subsequent processing stages.
Turning to the pooling sub-network (part (c) of Fig. 4), features at four scales are fused together: the top row is the coarsest, global pooling, and the rows below pool at different, finer scales. After this processing, the pooled features are up-sampled to restore the original image size and stacked together, forming the overall framework of PSPNet.
The two steps above yield the features of the input picture. To obtain a picture of the same dimensions as the input, a final channel adjustment is required, which is the role of the convolution sub-network: for example, a 3x3 convolution adjusts the feature layer, a 1x1 convolution adjusts the number of channels, and a final resizing step makes the picture consistent with the input picture, producing the final semantic segmentation map.
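To make the pyramid pooling module concrete, the following is a minimal sketch in PyTorch (the patent does not specify a framework, so this is an assumption); the pool scales (1, 2, 3, 6) follow the original PSPNet paper, and all names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Sketch of PSPNet-style pyramid pooling: pool the feature layer into
    grids of several sizes, compress channels, and fuse with the input."""
    def __init__(self, in_channels, pool_scales=(1, 2, 3, 6)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(scale),  # pool into a scale x scale grid
                nn.Conv2d(in_channels, in_channels // len(pool_scales), 1),
                nn.ReLU(inplace=True),
            )
            for scale in pool_scales
        ])

    def forward(self, x):
        h, w = x.shape[2:]
        # Up-sample each pooled branch back to the feature-map size, then
        # stack everything along the channel axis (part (c) of Fig. 4).
        pooled = [
            F.interpolate(b(x), size=(h, w), mode='bilinear',
                          align_corners=False)
            for b in self.branches
        ]
        return torch.cat([x] + pooled, dim=1)
```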
In one embodiment of the present application, the training process of the semantic segmentation prediction model includes:
acquiring a picture set containing mesh obstacles, and labeling the picture set with classification labels to obtain a classification result set;
forming a data set from the picture set and the classification result corresponding to each picture, wherein the data set comprises a training set, a test set, and a prediction set;
and inputting the training set into the PSPNet neural network for training, acquiring the trained model parameters once a preset loss condition is reached, and loading the trained model parameters into the PSPNet neural network to finish training the semantic segmentation prediction model.
Unmanned equipment such as automated transport vehicles is used increasingly in factory logistics, where obstacles such as fence nets are common, so the picture set containing mesh obstacles mainly comes from factory environments. After a sufficient number of photographs has been taken, the picture set can be labeled with classification labels; specifically, the labels can be produced manually with labeling-tool software, with the effect shown in Fig. 5, a label-production effect diagram provided by an embodiment of the invention. Producing the labels manually amounts to manually classifying the content of the picture set, which yields a data set composed of the pictures and their corresponding classification results for use in subsequent training.
To speed up training and achieve a better prediction effect, transfer learning is adopted. The pre-trained model outputs 20 classes in total; to simplify the model, the output classification is reduced to 5 classes during training, namely background, fence, ground, nylon net, and person. This greatly accelerates training and also substantially improves the later prediction speed.
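A hedged sketch of this transfer-learning step, assuming a PyTorch PSPNet implementation whose final classifier is a 1x1 convolution; the class name, attribute name, and checkpoint path are illustrative, not the patent's actual code:

```python
import torch
import torch.nn as nn

# Load pre-trained weights (20-class head), then swap the head for the
# 5 classes used here: background, fence, ground, nylon net, person.
model = PSPNet(backbone='resnet50', num_classes=20)  # assumed model class
state = torch.load('pspnet_pretrained.pth')          # assumed checkpoint
model.load_state_dict(state, strict=False)
model.classifier = nn.Conv2d(model.classifier.in_channels, 5, kernel_size=1)
```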
The training set is then input into the constructed PSPNet neural network for training. In one embodiment of the present invention, inputting the training set into the PSPNet neural network for training specifically comprises the following steps:
extracting a first picture feature layer from the training set using the feature extraction sub-network;
performing pooling operations at different scales on the first picture feature layer using the pooling sub-network to obtain a second picture feature layer;
and adjusting the feature layers and the number of channels of the second picture feature layer using the convolution sub-network, so that the output picture and the input picture are the same size.
After balancing performance against speed, the feature extraction sub-network uses a ResNet-50 network, which yields the first picture feature layer corresponding to the training set as the input to the subsequent processing stages. The pooling sub-network then performs pooling at different scales on the first picture feature layer to obtain the second picture feature layer: it divides the first feature layer into grids of different sizes and aggregates the context information of different regions into global information. Compared with the first picture feature layer, the second picture feature layer carries more detail, so the semantic content of a picture can be identified accurately. These two steps yield the features of the input picture; to obtain a picture of the same dimensions as the input, a final channel adjustment is needed, i.e. the convolution sub-network adjusts the feature layer (for example with a 3x3 convolution), adjusts the number of channels with a 1x1 convolution, and finally resizes the picture to be consistent with the input, producing the final semantic segmentation map.
After the preset training condition is reached, for example 100 iterations of training, the model file with the minimum combined loss on the training set and the test set is selected from the model files generated in each generation to fix the parameters of the neural network model, and the trained parameters are loaded into the constructed PSPNet model, completing the training of the semantic segmentation prediction model.
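The loop below sketches this procedure in the same assumed PyTorch setting; the 100 iterations and the minimum-combined-loss selection come from the text, while the optimizer, learning rate, and loader names (train_loader, test_loader) are illustrative:

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # lr illustrative

best_loss = float('inf')
for epoch in range(100):                    # 100 iterations of training
    model.train()
    train_loss = 0.0
    for images, labels in train_loader:     # assumed DataLoader (training set)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        train_loss += loss.item()
    model.eval()
    with torch.no_grad():                   # evaluate on the test set
        test_loss = sum(criterion(model(x), y).item()
                        for x, y in test_loader)
    if train_loss + test_loss < best_loss:  # keep the minimum combined loss
        best_loss = train_loss + test_loss
        torch.save(model.state_dict(), 'best_pspnet.pth')
```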
After training is completed, prediction follows. This process requires no back-propagation and no updating or learning of the neural network parameters: once the original picture is input into the fully trained semantic segmentation prediction model, the semantic segmentation picture is output.
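Prediction then reduces to a single forward pass with gradients disabled; a minimal sketch, where image_tensor is assumed to be a preprocessed (1, 3, H, W) tensor:

```python
model.eval()
with torch.no_grad():                # no back-propagation, no parameter updates
    logits = model(image_tensor)     # shape (1, 5, H, W): one score map per class
    seg_map = logits.argmax(dim=1)   # per-pixel class index, shape (1, H, W)
```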
In one embodiment of the present invention, after training of the semantic segmentation prediction model is completed, the method further includes:
and evaluating the completely trained semantic segmentation prediction model by using a preset evaluation index to obtain an evaluation result.
The preset evaluation index includes the mIoU index, which computes the ratio of the intersection to the union of the set of ground-truth values and the set of predicted values. Evaluating the semantic segmentation prediction model trained above with mIoU gives a result of 76.03%, indicating that the model's prediction effect is good.
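For reference, the mIoU index can be computed as below; a sketch assuming the prediction and ground truth are integer label maps of per-pixel class indices:

```python
import numpy as np

def mean_iou(pred, target, num_classes=5):
    """Mean intersection-over-union across classes for two label maps."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                 # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```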
In one embodiment of the present application, obtaining a depth map of an image to be identified includes:
acquiring calibration parameters of the depth camera, and correcting an image to be identified according to the calibration parameters;
and matching the corrected images, and calculating the depth of each pixel point in the image to be identified according to the matching result so as to obtain a depth map of the image to be identified.
It should be noted that there are currently three main depth-camera technologies: structured light, binocular vision, and time-of-flight (TOF). The present invention adopts the binocular vision scheme, i.e. the depth camera is a binocular camera. A binocular camera imitates the ranging principle of human eyes, using the parallax between the images acquired by the left and right cameras to recover the depth information of the picture. A binocular camera resists strong light interference, can work in outdoor environments, has the lowest manufacturing cost, and can be combined with deep learning to further optimize imaging.
The calibration parameters of the depth camera comprise the intrinsic and extrinsic parameters of the two cameras of the binocular camera and the homography matrix between them. Calibrating a camera can be understood as establishing the mapping from world coordinates to pixel coordinates; once this mapping relation is obtained by calibration, the world coordinates of a pixel can be inferred back from its pixel coordinates. The intrinsic parameters are those tied to the camera's own characteristics, such as its focal length and pixel size; the extrinsic parameters describe the camera in the world coordinate system, such as its position and rotation; and the homography between the two cameras describes the mapping between two planes, i.e. the transformation between the two images of points lying on a common plane.
When calculating the depth of a pixel, besides the focal length and baseline among the camera parameters, the parallax between the two cameras must be known, i.e. the correspondence between each pixel of the left camera and its counterpart in the right camera. The two corrected images are therefore matched pixel by pixel, either using the homography matrix of the two cameras or using the epipolar constraint. Once matching is complete, the parallax between the two cameras is available, the depth of every pixel can be calculated, and the depth map of the image to be recognized is obtained. Referring to Fig. 6, Fig. 6 is a depth map of an image to be identified according to an embodiment of the present invention. The boxed region is the depth map of a mesh obstacle, and the effect is clearly not ideal: the mesh targets are too fine and the mesh is distributed horizontally, so the mesh seen by the left and right cameras of the binocular camera shows no obvious parallax, making the mesh obstacle difficult to identify.
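As a sketch of this pipeline, OpenCV's semi-global block matcher can stand in for the matching step (the patent does not name a matcher, and the focal length and baseline below are illustrative values, not the actual calibration). Depth then follows from Z = f * B / d:

```python
import cv2
import numpy as np

left = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)    # rectified left image
right = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)  # rectified right image

matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=128,
                                blockSize=5)
# SGBM returns fixed-point disparity scaled by 16.
disparity = matcher.compute(left, right).astype(np.float32) / 16.0

focal_px = 700.0    # focal length in pixels, from calibration (illustrative)
baseline_m = 0.12   # baseline in meters (illustrative)

depth = np.zeros_like(disparity)
valid = disparity > 0                                    # unmatched pixels stay 0
depth[valid] = focal_px * baseline_m / disparity[valid]  # Z = f * B / d
```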
In one embodiment of the present application, image fusion of a semantic segmentation map with a depth map includes:
constructing a camera coordinate system based on a depth camera;
and carrying out image fusion on the semantic segmentation map and the depth map in a camera coordinate system to obtain a fusion image.
Since the depth map and the semantic segmentation map are both obtained by further processing the image captured by the depth camera, and no change of coordinate system is involved in that processing, the depth map and the semantic segmentation map share the same coordinate system; the two can therefore be fused in the camera coordinate system of the depth camera. Referring to Fig. 7, Fig. 7 is the fused image obtained by fusing the semantic segmentation map and the depth map of an image to be identified according to an embodiment of the present invention.
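Because the two maps are pixel-aligned in the shared camera coordinate system, fusion for visualization can be a per-pixel overlay; a minimal sketch, with the blending weight as an assumption:

```python
import cv2
import numpy as np

def fuse(seg_map, depth_map, alpha=0.5):
    """Blend a color (BGR, uint8) segmentation map with a depth map of the
    same resolution; depth is normalized to 8 bits for display first."""
    depth_vis = cv2.normalize(depth_map, None, 0, 255, cv2.NORM_MINMAX)
    depth_vis = cv2.cvtColor(depth_vis.astype(np.uint8), cv2.COLOR_GRAY2BGR)
    return cv2.addWeighted(seg_map, alpha, depth_vis, 1 - alpha, 0)
```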
In one embodiment of the present application, performing pixel analysis on the fused image to determine depth information of the mesh obstacle includes:
and acquiring a histogram corresponding to the fusion image, carrying out pixel statistics on the histogram, and determining the depth information of the mesh obstacle based on a statistical result.
When performing pixel analysis on the fused image, analysis efficiency can be improved by taking the histogram of the region in which the mesh obstacle lies: that region can be cut out of the fused image as the region of interest and histogram pixel statistics computed on it. Referring to Fig. 8, Fig. 8 is the histogram corresponding to a partial region of the fused image according to an embodiment of the present invention.
The histogram shows the pixel values distributed mainly around two peaks. The region of peak P1 has the highest pixel values and corresponds to the closest obstacle distance, i.e. the distance of the mesh obstacle; peak P2 and the regions of lower pixel values correspond to other objects behind the mesh. The depth information of the mesh obstacle can then be recalculated from the pixel values of the region around peak P1.
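One way to realize this statistic, assuming the segmentation label map and the depth map are available side by side; MESH_CLASS is an illustrative label id, and taking the dominant histogram bin as peak P1 is an assumption about how the peak is picked:

```python
import numpy as np

MESH_CLASS = 3  # illustrative id of the "nylon net" class in the 5-class labeling

def mesh_depth(depth_map, label_map, bins=64):
    """Histogram the depth values of mesh-labeled pixels and return the
    center of the dominant bin as the mesh obstacle's depth (peak P1)."""
    values = depth_map[(label_map == MESH_CLASS) & (depth_map > 0)]
    hist, edges = np.histogram(values, bins=bins)
    peak = np.argmax(hist)
    return 0.5 * (edges[peak] + edges[peak + 1])
```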
In one embodiment of the present application, the method further includes:
and filling and repairing the depth map based on the depth information.
After the depth information of the mesh obstacle is obtained, the depth map can be filled and repaired according to it; specifically, the affected depth values are replaced by the recalculated depth information of the mesh obstacle.
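A sketch of this fill-and-repair step, reusing the assumed MESH_CLASS label from the previous sketch:

```python
def repair_depth(depth_map, label_map, mesh_depth_value):
    """Overwrite the depth of every pixel the segmentation map classifies
    as mesh with the recalculated mesh obstacle depth."""
    repaired = depth_map.copy()
    repaired[label_map == MESH_CLASS] = mesh_depth_value
    return repaired
```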
By correcting and supplementing the depth map with the mesh obstacle depth information generated from binocular stereoscopic imaging, a robot or unmanned vehicle can perceive its environment comprehensively even in complex surroundings and avoid obstacles accurately.
To better implement the mesh obstacle recognition method of the embodiments of the present invention, reference is made to Fig. 9, a schematic structural diagram of an embodiment of a mesh obstacle recognition device according to the present invention. The mesh obstacle recognition device 900 includes:
a first acquisition module 901, configured to acquire an image to be identified, wherein the image to be identified contains a mesh obstacle;
an output module 902, configured to input the image to be identified into a fully trained semantic segmentation prediction model and output the semantic segmentation map of the image to be identified;
a second acquisition module 903, configured to acquire the depth map of the image to be identified;
and a determination module 904, configured to fuse the semantic segmentation map with the depth map and perform pixel analysis on the fused image to determine the depth information of the mesh obstacle.
It should be noted that the apparatus 900 provided in the foregoing embodiment can implement the technical solutions described in the method embodiments above; the specific implementation principles of each module or unit may be found in the corresponding content of those method embodiments and are not repeated here.
Based on the above mesh obstacle recognition method, the embodiment of the invention further provides an electronic device, which includes: a processor and a memory, and a computer program stored in the memory and executable on the processor; the steps in the mesh obstacle recognition method of the embodiments described above are implemented when the processor executes a computer program.
A schematic structural diagram of an electronic device 1000 suitable for use in implementing embodiments of the present invention is shown in fig. 10. The electronic device in the embodiment of the present invention may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a car-mounted terminal (e.g., car navigation terminal), etc., and a stationary terminal such as a digital TV, a desktop computer, etc. The electronic device shown in fig. 10 is merely an example, and should not impose any limitation on the functionality and scope of use of embodiments of the present invention.
The electronic device includes a memory and a processor, where the processor may be referred to below as a processing device 1001, and the memory may include at least one of a read-only memory (ROM) 1002, a random access memory (RAM) 1003, and a storage device 1008, as described below:
As shown in Fig. 10, the electronic device 1000 may include a processing device (e.g. a central processing unit or a graphics processor) 1001, which can perform various appropriate actions and processes according to a program stored in the read-only memory (ROM) 1002 or a program loaded from the storage device 1008 into the random access memory (RAM) 1003. The RAM 1003 also stores various programs and data necessary for the operation of the electronic device 1000. The processing device 1001, the ROM 1002, and the RAM 1003 are connected to each other by a bus 1004, and an input/output (I/O) interface 1005 is also connected to the bus 1004.
In general, the following devices may be connected to the I/O interface 1005: input devices 1006 including, for example, a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, and the like; an output device 1007 including, for example, a Liquid Crystal Display (LCD), speaker, vibrator, etc.; storage 1008 including, for example, magnetic tape, hard disk, etc.; and communication means 1009. The communication means 1009 may allow the electronic device 1000 to communicate wirelessly or by wire with other devices to exchange data. While fig. 10 shows an electronic device 1000 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided. More or fewer devices may be implemented or provided instead.
In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication device 1009, installed from the storage device 1008, or installed from the ROM 1002. When executed by the processing device 1001, the computer program performs the above-described functions defined in the method of the embodiments of the present invention.
Based on the mesh obstacle recognition method, the embodiment of the present invention further provides a computer readable storage medium storing one or more programs, where the one or more programs may be executed by one or more processors to implement the steps in the mesh obstacle recognition method according to the foregoing embodiments.
Those skilled in the art will appreciate that all or part of the flow of the methods of the embodiments described above may be accomplished by a computer program instructing the associated hardware, where the program may be stored on a computer-readable storage medium such as a magnetic disk, an optical disk, a read-only memory, or a random access memory.
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention.

Claims (10)

1. A mesh obstacle recognition method, comprising:
acquiring an image to be identified, wherein the image to be identified contains a mesh obstacle;
inputting the image to be identified into a fully trained semantic segmentation prediction model, and outputting a semantic segmentation map of the image to be identified;
acquiring a depth map of the image to be identified;
and fusing the semantic segmentation map with the depth map, and performing pixel analysis on the fused image to determine the depth information of the mesh obstacle.
2. The mesh obstacle recognition method of claim 1, wherein the fully trained semantic segmentation prediction model is trained on a PSPNet neural network;
the PSPNet neural network structure comprises a feature extraction sub-network, a pooling sub-network, and a convolution sub-network.
3. The mesh obstacle recognition method of claim 2, wherein the training process of the semantic segmentation prediction model comprises:
acquiring a picture set containing mesh obstacles, and labeling the picture set with classification labels to obtain a classification result set;
forming a data set from the picture set and the classification result corresponding to each picture, wherein the data set comprises a training set, a test set, and a prediction set;
inputting the training set into the PSPNet neural network for training, acquiring the trained model parameters once a preset loss condition is reached, and loading the trained model parameters into the PSPNet neural network to finish training the semantic segmentation prediction model;
wherein inputting the training set into the PSPNet neural network for training specifically comprises:
extracting a first picture feature layer from the training set using the feature extraction sub-network;
performing pooling operations at different scales on the first picture feature layer using the pooling sub-network to obtain a second picture feature layer;
and adjusting the feature layers and the number of channels of the second picture feature layer using the convolution sub-network, so that the output picture and the input picture are the same size.
4. The mesh obstacle recognition method according to claim 1, wherein acquiring the depth map of the image to be identified comprises:
acquiring calibration parameters of a depth camera, and correcting the image to be identified according to the calibration parameters;
and matching the corrected images, and calculating the depth of each pixel point in the image to be identified according to the matching result so as to obtain a depth map of the image to be identified.
5. The mesh obstacle recognition method of claim 1, wherein fusing the semantic segmentation map with the depth map comprises:
constructing a camera coordinate system based on a depth camera;
and fusing the semantic segmentation map with the depth map in the camera coordinate system to obtain a fused image.
6. The mesh obstacle recognition method according to claim 1 or 5, wherein performing pixel analysis on the fused image to determine the depth information of the mesh obstacle comprises:
acquiring a histogram corresponding to the fused image, performing pixel statistics on the histogram, and determining the depth information of the mesh obstacle based on the statistical result.
7. The mesh obstacle recognition method of claim 6, further comprising:
filling and repairing the depth map based on the depth information.
8. A mesh obstacle recognition device, comprising:
a first acquisition module, configured to acquire an image to be identified, wherein the image to be identified contains a mesh obstacle;
an output module, configured to input the image to be identified into a fully trained semantic segmentation prediction model and output a semantic segmentation map of the image to be identified;
a second acquisition module, configured to acquire the depth map of the image to be identified;
and a determination module, configured to fuse the semantic segmentation map with the depth map and perform pixel analysis on the fused image to determine the depth information of the mesh obstacle.
9. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program, and the processor, coupled to the memory, is configured to execute the program stored in the memory to implement the steps of the mesh obstacle recognition method of any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program or instructions which, when executed by a processor, implement the steps of the mesh obstacle recognition method of any one of claims 1 to 7.
CN202210907845.3A, filed 2022-07-29 (priority date 2022-07-29): Mesh obstacle recognition method, mesh obstacle recognition device, electronic equipment and computer storage medium. Status: Pending.

Priority Applications (1)

CN202210907845.3A (priority date 2022-07-29, filing date 2022-07-29): Mesh obstacle recognition method, mesh obstacle recognition device, electronic equipment and computer storage medium

Applications Claiming Priority (1)

CN202210907845.3A (priority date 2022-07-29, filing date 2022-07-29): Mesh obstacle recognition method, mesh obstacle recognition device, electronic equipment and computer storage medium

Publications (1)

CN116092042A, published 2023-05-09

Family

ID=86206915

Family Applications (1)

CN202210907845.3A (priority date 2022-07-29, filing date 2022-07-29): Mesh obstacle recognition method, mesh obstacle recognition device, electronic equipment and computer storage medium (pending)

Country Status (1)

CN (1): CN116092042A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination