CN118097624A - Vehicle environment sensing method and device - Google Patents
Vehicle environment sensing method and device
- Publication number
- CN118097624A (application number CN202410488944.1A)
- Authority
- CN
- China
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/582—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/588—Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road
Abstract
The application provides a vehicle environment sensing method and device. The method comprises the following steps: acquiring vehicle environment data and a pre-constructed vehicle-mounted multi-task environment perception model; processing the vehicle environment data through a backbone network architecture to obtain a focusing feature map carrying fusion features; processing the focusing feature map through a bottleneck layer to obtain a fusion feature vector diagram and an enhancement feature vector diagram; processing the fusion feature vector diagram and the enhancement feature vector diagram through a target detection model to obtain a road information target detection result; processing the fusion feature vector diagram and the enhancement feature vector diagram through an image segmentation model to obtain an image segmentation result; and carrying out information integration processing on the road information target detection result and the image segmentation result to obtain hybrid perception information. The method and the device can therefore realize vehicle environment sensing quickly and accurately, alleviate the difficulty of image segmentation, and improve environment sensing precision.
Description
Technical Field
The application relates to the technical field of automatic driving, in particular to a vehicle environment sensing method and device.
Background
An unmanned vehicle is an intelligent mobile vehicle that can replace a human driver in completing a series of driving behaviors. Unmanned vehicle research spans scientific fields such as environment perception, navigation and positioning, and decision control. An existing vehicle environment sensing method is typically built on Faster R-CNN: a pedestrian detection model is constructed by improving the convolution modules of the neural network with an SE-Net structure, and road pedestrians are then tracked with a particle filtering algorithm and a multi-feature fusion tracking strategy. In practice, however, it has been found that the existing method struggles with image segmentation, which reduces environmental perception accuracy.
Disclosure of Invention
The embodiment of the application aims to provide a vehicle environment sensing method and device, which can quickly and accurately realize vehicle environment sensing, solve the problem of difficult image segmentation and improve environment sensing precision.
The first aspect of the present application provides a vehicle environment sensing method, comprising:
Acquiring vehicle environment data and a pre-constructed vehicle-mounted multitasking environment perception model; the vehicle-mounted multi-task environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model;
Processing the vehicle environment data through the backbone network architecture to obtain a focusing characteristic diagram carrying fusion characteristics;
Processing the focusing feature map through the bottleneck layer to obtain a fusion feature vector map and an enhancement feature vector map;
Processing the fusion feature vector diagram and the enhancement feature vector diagram through the target detection model to obtain a road information target detection result;
processing the fusion feature vector diagram and the enhancement feature vector diagram through the image segmentation model to obtain an image segmentation result;
And carrying out information integration processing on the road information target detection result and the image segmentation result to obtain mixed perception information.
Further, before the acquiring the vehicle environment data and the pre-constructed vehicle-mounted multi-task environment awareness model, the method further includes:
Collecting original video data for training a model; wherein the raw video data includes driving scenario data under different geographic, environmental and weather conditions;
labeling the original video data to obtain a training data set;
pre-constructing an original environment perception model;
And training the original environment perception model through the training data set to obtain a trained vehicle-mounted multi-task environment perception model.
Further, the backbone network architecture comprises a lightweight convolution module and a bidirectional pyramid structure;
The bottleneck layer comprises a convolution attention mechanism module; the convolution attention mechanism module comprises a channel attention module and a space attention module;
the target detection model comprises a target detection deep learning model based on Faster R-CNN and a target detection decoder;
The image segmentation model comprises an image segmentation deep learning model based on SCNN and an image segmentation decoder.
Further, the processing the vehicle environment data through the backbone network architecture to obtain a focusing feature map carrying fusion features includes:
Extracting image features of the vehicle environment data through the lightweight convolution module to obtain feature map data;
And classifying and positioning the feature map data through the bidirectional pyramid structure to obtain a focusing feature map carrying fusion features.
Further, the processing the focusing feature map through the bottleneck layer to obtain a fusion feature vector map and an enhancement feature vector map includes:
processing the focusing characteristic map through the channel attention module to obtain a channel focusing characteristic map;
Processing the focusing characteristic map through the spatial attention module to obtain a spatial focusing characteristic map;
Carrying out fusion processing on the channel focusing feature map and the space focusing feature map to obtain a fusion feature vector map;
And sequentially inputting the focusing characteristic images into the channel attention module and the space attention module which are sequentially arranged for processing to obtain an enhanced characteristic vector image.
Further, the processing the fusion feature vector diagram and the enhancement feature vector diagram through the target detection model to obtain a road information target detection result includes:
acquiring backbone network weights of the backbone network architecture;
based on the backbone network weight and the target detection deep learning model based on Faster R-CNN, extracting secondary target detection features of the fusion feature vector diagram to obtain a secondary target detection feature diagram;
performing channel direction splicing processing on the secondary target detection feature map and the enhanced feature vector map to obtain a coded target detection feature map;
And processing the target detection feature map through the target detection decoder to obtain a road information target detection result.
Further, the processing the fused feature vector diagram and the enhanced feature vector diagram through the image segmentation model to obtain an image segmentation result includes:
Performing secondary image segmentation feature extraction on the fusion feature vector image based on the backbone network weight and the SCNN-based image segmentation deep learning model to obtain a secondary image segmentation feature image;
Performing channel direction splicing processing on the secondary image segmentation feature map and the enhancement feature vector map to obtain an encoded image segmentation feature map;
and processing the image segmentation feature map through an image segmentation decoder to obtain an image segmentation result.
A second aspect of the present application provides a vehicle environment sensing device including:
The acquisition unit is used for acquiring vehicle environment data and a pre-constructed vehicle-mounted multi-task environment perception model; the vehicle-mounted multi-task environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model;
The first processing unit is used for processing the vehicle environment data through the backbone network architecture to obtain a focusing characteristic diagram carrying fusion characteristics;
The second processing unit is used for processing the focusing characteristic image through the bottleneck layer to obtain a fusion characteristic vector image and an enhancement characteristic vector image;
The third processing unit is used for processing the fusion feature vector diagram and the enhancement feature vector diagram through the target detection model to obtain a road information target detection result;
the fourth processing unit is used for processing the fusion feature vector diagram and the enhancement feature vector diagram through the image segmentation model to obtain an image segmentation result;
and the information integration unit is used for carrying out information integration processing on the road information target detection result and the image segmentation result to obtain mixed perception information.
Further, the vehicle environment sensing device further includes:
the acquisition unit is used for acquiring original video data for training the model; wherein the raw video data includes driving scenario data under different geographic, environmental and weather conditions;
the marking unit is used for marking the original video data to obtain a training data set;
The construction unit is used for pre-constructing an original environment perception model;
The training unit is used for training the original environment perception model through the training data set to obtain a trained vehicle-mounted multi-task environment perception model.
Further, the backbone network architecture comprises a lightweight convolution module and a bidirectional pyramid structure;
The bottleneck layer comprises a convolution attention mechanism module; the convolution attention mechanism module comprises a channel attention module and a space attention module;
the target detection model comprises a target detection deep learning model based on Faster R-CNN and a target detection decoder;
The image segmentation model comprises an image segmentation deep learning model based on SCNN and an image segmentation decoder.
Further, the first processing unit includes:
The first extraction subunit is used for extracting image features of the vehicle environment data through the lightweight convolution module to obtain feature map data;
and the classification positioning subunit is used for performing classification positioning processing on the feature map data through the bidirectional pyramid structure to obtain a focusing feature map carrying fusion features.
Further, the second processing unit includes:
the first processing subunit is used for processing the focusing characteristic diagram through the channel attention module to obtain a channel focusing characteristic diagram;
The first processing subunit is further configured to process the focusing feature map through the spatial attention module to obtain a spatial focusing feature map;
The fusion subunit is used for carrying out fusion processing on the channel focusing feature map and the space focusing feature map to obtain a fusion feature vector map;
The first processing subunit is further configured to sequentially input the focusing feature map into the channel attention module and the spatial attention module that are sequentially arranged for processing, so as to obtain an enhanced feature vector map.
Further, the third processing unit includes:
an acquisition subunit, configured to acquire a backbone network weight of the backbone network architecture;
The second extraction subunit is used for extracting the secondary target detection features of the fusion feature vector diagram based on the backbone network weight and the target detection deep learning model based on the fast R-CNN to obtain a secondary target detection feature diagram;
The first splicing subunit is used for carrying out channel direction splicing processing on the secondary target detection feature image and the enhancement feature vector image to obtain an encoded target detection feature image;
and the second processing subunit is used for processing the target detection feature map through the target detection decoder to obtain a road information target detection result.
Further, the fourth processing unit includes:
The third extraction subunit is used for extracting secondary image segmentation features of the fusion feature vector image based on the backbone network weight and the SCNN-based image segmentation deep learning model to obtain a secondary image segmentation feature image;
The second splicing subunit is used for carrying out channel direction splicing processing on the secondary image segmentation feature map and the enhancement feature vector map to obtain an encoded image segmentation feature map;
And the third processing subunit is used for processing the image segmentation feature map through an image segmentation decoder to obtain an image segmentation result.
A third aspect of the present application provides an electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the vehicle context awareness method of any of the first aspects of the present application.
A fourth aspect of the application provides a computer readable storage medium storing computer program instructions which, when read and executed by a processor, perform the vehicle environment awareness method of any of the first aspects of the application.
The beneficial effects of the application are as follows: the method and the device can quickly and accurately realize the sensing of the vehicle environment, solve the problem of difficult image segmentation and improve the sensing precision of the environment.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a vehicle environment sensing method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a vehicle-mounted multitasking environment sensing flow provided in an embodiment of the present application;
FIG. 3 is a general block diagram of a vehicle-mounted multi-task environment awareness method according to an embodiment of the present application;
FIG. 4 is a flowchart of another vehicle environment awareness method according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a backbone network based on cross-stage deep residual convolution according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a bottleneck layer architecture according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a vehicle environment sensing device according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of another vehicle environment sensing device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
Example 1
Referring to fig. 1, fig. 1 is a flowchart of a vehicle environment sensing method according to the present embodiment. The vehicle environment sensing method comprises the following steps:
S101, acquiring vehicle environment data and a pre-constructed vehicle-mounted multi-task environment perception model; the vehicle-mounted multitasking environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model.
In this embodiment, the backbone network architecture includes a lightweight convolution module and a bidirectional pyramid structure;
the bottleneck layer comprises a convolution attention mechanism module; the convolution attention mechanism module comprises a channel attention module and a space attention module;
The target detection model comprises a target detection deep learning model based on the fast R-CNN and a target detection decoder;
the image segmentation model includes an SCNN-based image segmentation deep learning model and an image segmentation decoder.
S102, processing the vehicle environment data through a backbone network architecture to obtain a focusing characteristic diagram carrying fusion characteristics.
S103, processing the focusing feature images through a bottleneck layer to obtain a fusion feature vector image and an enhancement feature vector image.
S104, processing the fusion feature vector diagram and the enhancement feature vector diagram through the target detection model to obtain a road information target detection result.
S105, processing the fusion feature vector diagram and the enhancement feature vector diagram through the image segmentation model to obtain an image segmentation result.
S106, carrying out information integration processing on the road information target detection result and the image segmentation result to obtain the hybrid perception information.
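To make the overall flow of steps S101 to S106 concrete, a minimal PyTorch-style sketch is given below; all class names and dictionary keys (MultiTaskPerceptionModel, det_head, seg_head, "detections", "segmentation") are illustrative assumptions and not the exact implementation of this embodiment.

```python
import torch.nn as nn

class MultiTaskPerceptionModel(nn.Module):
    """Illustrative skeleton of the vehicle-mounted multi-task environment perception model."""
    def __init__(self, backbone, bottleneck, det_head, seg_head):
        super().__init__()
        self.backbone = backbone      # S102: cross-stage deep residual convolution backbone
        self.bottleneck = bottleneck  # S103: bottleneck layer with convolution attention
        self.det_head = det_head      # S104: Faster R-CNN style target detection branch
        self.seg_head = seg_head      # S105: SCNN style image segmentation branch

    def forward(self, image):
        focus_map = self.backbone(image)                  # focusing feature map with fusion features
        fused_map, enhanced_map = self.bottleneck(focus_map)
        detections = self.det_head(fused_map, enhanced_map)
        segmentation = self.seg_head(fused_map, enhanced_map)
        # S106: integrate both results into hybrid perception information
        return {"detections": detections, "segmentation": segmentation}
```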
In this embodiment, the method provides a vehicle-mounted multitasking environment sensing method based on a lightweight neural network. The method applies a vehicle-mounted multi-task environment perception model, and a specific vehicle-mounted multi-task environment perception flow diagram can be shown by referring to fig. 2.
In this embodiment, on the basis of fig. 2, reference may be made to the general block diagram of the vehicle-mounted multi-task environment awareness method shown in fig. 3, which further shows specific flow steps of the vehicle-mounted multi-task environment awareness method, so as to assist in explaining this embodiment.
In this embodiment, the execution subject of the method may be a computing device such as a computer or a server, which is not limited in this embodiment.
In this embodiment, the execution body of the method may be an intelligent device such as a smart phone or a tablet computer, which is not limited in this embodiment.
Therefore, by implementing the vehicle environment sensing method described in the embodiment, the camera carrying the corresponding algorithm can be used to obtain the important road information, so that the obtaining cost of the important road information is reduced; meanwhile, the problems that lane line segmentation is difficult and multi-task learning cannot be completed in the environment perception of the unmanned automobile can be solved; finally, the network model structure required in the unmanned automobile environment perception can be simplified, so that the corresponding vehicle-mounted computing resource cost is reduced.
Example 2
Referring to fig. 4, fig. 4 is a flowchart of a vehicle environment sensing method according to the present embodiment. The vehicle environment sensing method comprises the following steps:
S201, collecting original video data for training a model; wherein the raw video data includes driving scenario data under different geographic, environmental and weather conditions.
In this embodiment, the method obtains the original video data through a front-mounted high-resolution camera of the unmanned automobile, which helps ensure the accuracy of the collected data. The data collection aims at reproducing real driving scenarios, thereby providing a broad environment-perception data set for training and evaluating the corresponding model.
In this embodiment, the unmanned vehicle may cover different kinds of roads during the collection process, including city streets, expressways, and rural roads. Thereby ensuring that the data set has rich scene diversity, thereby facilitating better training of the autopilot system to accommodate various environments.
S202, labeling the original video data to obtain a training data set.
In this embodiment, after labeling, each element in the images of the acquired video data can be understood. Labeling generally includes the identification and localization of vehicles, pedestrians, traffic signs, road boundaries, and so on.
For example, the training dataset may include 3-channel images x_{h×w×c} from a 1280×720-pixel vehicle front-end camera, where h is the height, w the width, and c the number of channels. The center point of the target frame of category class is located at (x0, y0) in the image, and the distances from the center point to the boundaries of the target frame are (w0, h0); the label information is therefore label = (class, x0, y0, w0, h0), with category labels such as vehicle, pedestrian, road boundary and lane line.
In this embodiment, the training data set contains driving scenarios from different geographies, environments and weather conditions. The training set, the validation set and the test set comprise 70000, 10000 and 20000 samples respectively.
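As a concrete illustration of the annotation format described above, one training sample could be organized as follows; all field names and the mask file are assumptions, and only the (class, x0, y0, w0, h0) label tuple follows the description in this embodiment.

```python
# Hypothetical structure of one annotated sample from the training data set.
sample = {
    "image": "frame_000123.png",   # 1280x720, 3-channel front-camera frame (h*w*c)
    "boxes": [
        {"class": "vehicle",    "x0": 640.0, "y0": 360.0, "w0": 120.0, "h0": 80.0},
        {"class": "pedestrian", "x0": 980.0, "y0": 400.0, "w0": 40.0,  "h0": 110.0},
    ],
    "lane_mask": "frame_000123_mask.png",   # pixel-level lane line / road annotation
}
```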
S203, an original environment perception model is built in advance.
S204, training the original environment perception model through a training data set to obtain a trained vehicle-mounted multi-task environment perception model.
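A minimal sketch of how the joint training of step S204 could look is shown below; the loss functions, the loss weighting and the dictionary keys are assumptions and are not prescribed by this embodiment.

```python
def train_step(model, batch, det_loss_fn, seg_loss_fn, optimizer, seg_weight=1.0):
    # One illustrative multi-task optimization step over a mini-batch.
    images, det_targets, seg_targets = batch
    outputs = model(images)
    loss = det_loss_fn(outputs["detections"], det_targets) \
           + seg_weight * seg_loss_fn(outputs["segmentation"], seg_targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```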
S205, acquiring vehicle environment data and a pre-constructed vehicle-mounted multi-task environment perception model; the vehicle-mounted multitasking environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model.
In this embodiment, steps S206 to S207 provide the process of extracting feature information from an image with the backbone network based on cross-stage deep residual convolution (fig. 5) and transmitting the feature information to the different task branches for processing. The backbone network adopts a cross-stage deep residual convolution structure, the traditional convolution modules in it are replaced by lightweight convolution modules, and it is combined with a bidirectional feature pyramid structure, so that the consumption of computing resources is significantly reduced while the detection accuracy of the original network is maintained. Meanwhile, through backbone network sharing, the multi-task neural network can perform the target detection and lane segmentation tasks simultaneously, which saves computing resources and improves the efficiency of intelligent-vehicle environment sensing.
S206, extracting image features of the vehicle environment data through the lightweight convolution module to obtain feature map data.
In this embodiment, for processing image features and passing task information through the backbone network, the method requires the following input in advance: a vehicle live road image dataset (input sample: live road image x_{h×w×c}; labels: vehicle, pedestrian, lane line, etc.).
Then, the method extracts features of the input images through a lightweight convolution module. In order to build a more efficient network architecture and reduce the computing-resource consumption of the network model, the method adopts a Ghost module to replace the ordinary convolutions in the backbone network. Specifically, the Ghost module generates part of the real feature layers through a convolution operation, obtains the remaining phantom feature layers from the real feature layers through cheap linear operations, and then splices the real and phantom feature layers together to form a complete feature layer. Compared with the original ordinary convolution model, the Ghost module therefore achieves better compression while the feature extraction quality is well maintained.
In this embodiment, the Ghost module and the normal convolution are specifically compared as follows:
The input image size is h×w×c, the output feature map size is h'×w'×n, the convolution kernel size is k×k, and s is a hyper-parameter that is far smaller than c. The computation cost of an ordinary convolution is given by formula (1) and that of the Ghost module by formula (2), where the linear operations are taken to use k×k kernels as well:

C_conv = n · h' · w' · c · k · k (1)

C_ghost = (n/s) · h' · w' · c · k · k + (s − 1) · (n/s) · h' · w' · k · k (2)

The ratio r_c between the two is given by formula (3):

r_c = C_conv / C_ghost = (s · c) / (c + s − 1) ≈ s (3)

It can be seen that using the Ghost module saves roughly a factor of s in training time and cuts the model parameters by roughly a factor of s.
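A sketch of a Ghost convolution block in the spirit described above is given below, assuming the common GhostNet-style formulation (a cheap primary convolution plus depthwise "phantom" maps); the parameter names and the choice of 3×3 depthwise kernels are assumptions.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Primary convolution generates the real feature layers; a cheap depthwise
    (linear) operation generates the phantom layers; both are spliced together."""
    def __init__(self, in_ch, out_ch, kernel=1, ratio=2, dw_kernel=3, stride=1):
        super().__init__()
        primary_ch = out_ch // ratio          # real feature layers (n / s)
        cheap_ch = out_ch - primary_ch        # phantom feature layers ((s - 1) * n / s)
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, kernel, stride, kernel // 2, bias=False),
            nn.BatchNorm2d(primary_ch),
            nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_ch, cheap_ch, dw_kernel, 1, dw_kernel // 2,
                      groups=primary_ch, bias=False),
            nn.BatchNorm2d(cheap_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        real = self.primary(x)
        phantom = self.cheap(real)
        return torch.cat([real, phantom], dim=1)   # complete feature layer
```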
S207, classifying and positioning the feature map data through a bidirectional pyramid structure to obtain a focusing feature map carrying fusion features.
In this embodiment, the method adopts a bidirectional feature pyramid structure to transmit task information. The bidirectional feature pyramid structure decodes image feature information and consists of a feature pyramid network and a pyramid attention network. The feature pyramid network is top-down, delivering strong semantic feature information from higher levels into the entire pyramid, but it only enhances semantic information, not positioning information. To compensate for this shortcoming, a pyramid attention structure is appended behind the feature pyramid structure, conveying strongly localized feature information from lower levels in a bottom-up fashion. Using this structure enhances the accuracy of target detection and reduces the rate at which small target objects are missed by the network.
With this implementation, the shallow and deep feature information learned by the deep neural network can be fully utilized for the target detection task and for the classification and positioning tasks in image segmentation. For classification tasks, deep features tend to matter more, whereas for localization tasks shallow and deep features are equally important. Using the bidirectional feature pyramid structure improves the accuracy of target detection and image segmentation and reduces the miss rate for small targets and small image regions, thereby realizing the transfer of task information from the backbone network to the bottleneck layer.
Finally, feature extraction is performed on the input image through the backbone structure based on the cross-stage deep residual convolution network, and a focusing feature map carrying fusion features is obtained.
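The bidirectional pyramid idea can be sketched as below: a top-down pass injects semantic information and a bottom-up pass adds localization information. The layer names, the assumption that all inputs share one channel width, and the use of max pooling for downsampling are illustrative choices, not the exact structure of this embodiment.

```python
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalPyramid(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.smooth = nn.ModuleList(
            [nn.Conv2d(channels, channels, 3, padding=1) for _ in range(3)])

    def forward(self, c3, c4, c5):
        # c3 (shallow) ... c5 (deep); each level is assumed to be half the size of the previous one
        p5 = c5
        p4 = c4 + F.interpolate(p5, size=c4.shape[-2:], mode="nearest")   # top-down: semantics
        p3 = c3 + F.interpolate(p4, size=c3.shape[-2:], mode="nearest")
        n3 = self.smooth[0](p3)
        n4 = self.smooth[1](p4 + F.max_pool2d(n3, kernel_size=2))         # bottom-up: localization
        n5 = self.smooth[2](p5 + F.max_pool2d(n4, kernel_size=2))
        return n3, n4, n5
```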
In this embodiment, because multi-task environment sensing generally requires an additional intermediate module to perform multi-level feature fusion and compression for the downstream tasks, the method introduces a bottleneck layer (as shown in fig. 6) as an intermediate layer between the backbone network and the downstream task modules, so that feature fusion and feature enhancement between different tasks are unified in the bottleneck layer. The bottleneck layer fuses and combines features from different levels of the backbone network to generate feature maps that are richer and carry more semantic information. To this end, the method specifically proposes steps S208 to S211.
S208, the focusing characteristic diagram is processed through the channel attention module, and the channel focusing characteristic diagram is obtained.
In this embodiment, the bottleneck-layer multi-task feature fusion and feature enhancement flow requires the following input in advance: the focusing feature map (of size h'×w'×n) output by the backbone network.
S209, processing the focusing characteristic map through a spatial attention module to obtain a spatial focusing characteristic map.
S210, carrying out fusion processing on the channel focusing feature map and the space focusing feature map to obtain a fusion feature vector map.
In this embodiment, the method may use a convolution attention mechanism module to perform feature fusion. To overcome the high miss rate of current neural networks when detecting distant, small vehicle targets in actual driving scenes, the method fuses a convolution attention mechanism module into the multi-task neural network to improve small-target detection and reduce the miss rate. This module is an attention mechanism for neural networks that addresses the information-overload problem under limited computing power by allocating computational resources to the more important tasks. By introducing the attention mechanism, the method can concentrate attention on the information most critical to the current task, reduce attention to other information, and even filter out irrelevant information, thereby improving the efficiency and accuracy of task processing.
In this embodiment, the convolution attention mechanism module mainly comprises a channel attention module and a spatial attention module, converts the input image information extracted by the initial feature into low-level features, and can capture more accurate high-level feature information by adding the channel and the spatial attention module. The convolution attention mechanism module is a lightweight module, and because of its versatility, the added overhead of integrating it into any convolution neural network is negligible.
Specifically, the focusing feature map is processed by the channel attention block and then cascaded with the original focusing feature map through a residual network module to obtain the channel focusing feature map; the channel focusing feature map is then processed by the spatial attention block and cascaded with itself through the residual network module to generate the spatial focusing feature map. Finally, the channel focusing feature map and the spatial focusing feature map are cascaded to generate the fusion feature vector map.
S211, sequentially inputting the focusing feature images into a channel attention module and a space attention module which are sequentially arranged for processing, and obtaining an enhanced feature vector image.
In this embodiment, the method may employ the convolution attention mechanism module for feature enhancement. The operation is similar to feature fusion except that the residual module is removed: the focusing feature map is fed sequentially into the channel attention module and then the spatial attention module of a sequential model, finally generating the enhanced feature vector map.
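A compact sketch of the channel and spatial attention modules, and of how the fusion and enhancement paths differ, is shown below. It assumes the standard CBAM formulation; the residual cascades of the fusion path described above are only indicated in the comments.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))            # global average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))             # global max pooling branch
        return x * torch.sigmoid(avg + mx).view(b, c, 1, 1)   # channel focusing feature map

class SpatialAttention(nn.Module):
    def __init__(self, kernel=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel, padding=kernel // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return x * torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # spatial focusing feature map

def bottleneck_outputs(x, ca, sa):
    # Fusion path: parallel channel / spatial branches cascaded along the channel axis
    # (the residual connections of this embodiment are omitted for brevity).
    fused = torch.cat([ca(x), sa(x)], dim=1)
    # Enhancement path: the same modules applied sequentially, without residuals.
    enhanced = sa(ca(x))
    return fused, enhanced
```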
With this implementation, the feature maps provide more accurate target position and category information, improving the accuracy of target detection; they also provide more accurate image segmentation boundaries and regions, improving the accuracy of image segmentation. In the target detection task, the bottleneck layer connects the backbone network with the target detection module and improves detection performance through feature fusion and feature enhancement; in the lane line segmentation task, the bottleneck layer connects the backbone network with the lane line segmentation module and improves image segmentation performance in the same way.
S212, acquiring backbone network weights of the backbone network architecture.
S213, performing secondary target detection feature extraction on the fusion feature vector diagram based on the backbone network weight and the target detection deep learning model based on the Faster R-CNN to obtain a secondary target detection feature diagram.
S214, performing channel direction splicing processing on the secondary target detection feature map and the enhanced feature vector map to obtain the encoded target detection feature map.
S215, processing the target detection feature map through a target detection decoder to obtain a road information target detection result.
S216, performing secondary image segmentation feature extraction on the fusion feature vector image based on the backbone network weight and the SCNN-based image segmentation deep learning model to obtain a secondary image segmentation feature image.
S217, performing channel direction splicing processing on the secondary image segmentation feature map and the enhancement feature vector map to obtain an encoded image segmentation feature map.
S218, processing the image segmentation feature map through an image segmentation decoder to obtain an image segmentation result.
In this embodiment, steps S212 to S218 provide the target detection and road-marking segmentation methods. After the enhanced features and fusion features output by the network are obtained, the method performs simple secondary feature processing for each downstream task, finally obtaining the corresponding vehicle-mounted perception information.
In this embodiment, the flow and method of different downstream tasks may be as follows:
First, the input data: the enhanced feature vector diagram and the fusion feature vector diagram (note: the input features are the same for the different tasks).
And secondly, realizing target detection by adopting a multi-task neural network. On the basis of sharing the previous backbone network weight, a target detection deep learning model based on Faster R-CNN is added in a subsequent task, and the function of the model is to analyze and predict the positions and the categories of road information target frames such as vehicles, pedestrians and the like from a convolution characteristic diagram.
Specifically, the target detection deep learning model firstly performs secondary target detection feature extraction on the fusion feature vector diagram to obtain a secondary target detection feature diagram, then performs channel direction splicing operation on the secondary target detection feature diagram and the enhancement feature vector diagram to obtain an encoded target detection feature diagram, and takes the encoded target detection feature diagram as input of a target detection decoder part. The decoder is responsible for adjusting and correcting the location of the bounding box proposed by the RPN by applying a regression operation to each candidate region in the convolution feature map. In this way, the final prediction box can more accurately fit the actual position of the target. The decoder is also responsible for predicting for each candidate region the class of the object it contains. By applying the softmax activation function (corresponding to equation (4)), each possible target class is classified and the class with the highest probability is selected.
softmax(x_i) = exp(x_i) / Σ_{j=1..n} exp(x_j) (4)

The formula rescales the n-dimensional input tensor so that every element lies in the range [0, 1] and the elements sum to 1. Finally, the framed road target information results, such as vehicles, pedestrians and obstacles, are obtained.
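The data flow of this detection branch can be sketched as follows; the extractor and decoder are passed in as modules, and the assumption that the secondary feature map and the enhanced feature vector map share the same spatial size is required for the channel-direction splice.

```python
import torch
import torch.nn.functional as F

def detection_branch(fused_map, enhanced_map, det_extractor, det_decoder):
    # Secondary target detection features from the fusion feature vector map.
    secondary = det_extractor(fused_map)
    # Channel-direction splice -> encoded target detection feature map.
    encoded = torch.cat([secondary, enhanced_map], dim=1)
    # Decoder proposes box corrections and class logits for each candidate region.
    box_deltas, class_logits = det_decoder(encoded)
    class_probs = F.softmax(class_logits, dim=-1)    # formula (4)
    return box_deltas, class_probs
```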
Then, the method completes image segmentation with the multi-task neural network. On the basis of sharing the previous backbone network weights, an SCNN-based image segmentation deep learning model is added to the subsequent tasks; its function is to generate a pixel-level segmentation result from the convolution feature map and assign different regions of the image to specific categories, such as road and non-road. The segmentation of the vehicle's driving route is then completed.
Specifically, the image segmentation deep learning model likewise extracts secondary image segmentation features from the fusion feature vector diagram to obtain a secondary image segmentation feature map, then splices this map with the enhanced feature vector map along the channel direction, and feeds the encoded image segmentation feature map into the image segmentation decoder. The decoder restores the high-level abstract feature map to the original image resolution by up-sampling and analyzing the convolution feature map; the decoder also classifies each pixel to determine the class to which it belongs.
Finally, the decoder generates a matrix of the same size as the original image, in which each pixel is marked with its category; this matrix is the final image segmentation result.
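A minimal sketch of such a pixel-level decoder is shown below, assuming a simple convolution-plus-upsampling design; the channel widths and the bilinear upsampling are illustrative choices.

```python
import torch.nn as nn
import torch.nn.functional as F

class SegDecoder(nn.Module):
    """Restore the encoded segmentation feature map to image resolution and
    classify every pixel."""
    def __init__(self, in_ch, num_classes):
        super().__init__()
        self.refine = nn.Conv2d(in_ch, in_ch // 2, 3, padding=1)
        self.classify = nn.Conv2d(in_ch // 2, num_classes, 1)

    def forward(self, feat, out_size):
        x = F.relu(self.refine(feat))
        x = F.interpolate(x, size=out_size, mode="bilinear", align_corners=False)
        logits = self.classify(x)
        return logits.argmax(dim=1)   # per-pixel category map, same size as the original image
```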
S219, carrying out information integration processing on the road information target detection result and the image segmentation result to obtain mixed perception information.
In this embodiment, the method may send the image segmentation result and the road information target detection result obtained above to the information integration system of the unmanned vehicle and perform the cascade mixing operation of formula (5):

F_mix = Concat(road information target detection result, image segmentation result) (5)

where F_mix is the mixed perception information feature vector. Finally, the cascade mixing operation yields the mixed perception information in the multi-task environment, including the positions and categories of the detected vehicles and pedestrians and the road marking information.
In this embodiment, the execution subject of the method may be a computing device such as a computer or a server, which is not limited in this embodiment.
In this embodiment, the execution body of the method may be an intelligent device such as a smart phone or a tablet computer, which is not limited in this embodiment.
Therefore, by implementing the vehicle environment sensing method described in this embodiment, a lightweight Ghost convolution module can be introduced, which greatly reduces the parameter count of the backbone network and improves the utilization efficiency of the computing resources on the unmanned automobile. The method also introduces the full use of deep and shallow feature information by the feature pyramid and the capture of more local information by the convolution attention mechanism, strengthening both semantic and positioning information and ultimately improving the detection accuracy of road information targets for the unmanned automobile. In addition, the bottleneck layer added on top of the cross-stage deep residual convolution backbone network realizes weight sharing and provides fusion features for the downstream tasks, so that unmanned-driving perception can execute multiple tasks under the same model architecture.
Example 3
Referring to fig. 7, fig. 7 is a schematic structural diagram of a vehicle environment sensing device according to the present embodiment. As shown in fig. 7, the vehicle environment sensing device includes:
An acquiring unit 310, configured to acquire vehicle environment data and a pre-constructed vehicle-mounted multitasking environment perception model; the vehicle-mounted multitasking environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model;
The first processing unit 320 is configured to process the vehicle environment data through a backbone network architecture to obtain a focusing feature map carrying the fusion feature;
A second processing unit 330, configured to process the focusing feature map through a bottleneck layer to obtain a fusion feature vector map and an enhancement feature vector map;
The third processing unit 340 is configured to process the fused feature vector diagram and the enhanced feature vector diagram through a target detection model, so as to obtain a road information target detection result;
a fourth processing unit 350, configured to process the fused feature vector diagram and the enhanced feature vector diagram through an image segmentation model, so as to obtain an image segmentation result;
The information integrating unit 360 is configured to perform information integration processing on the road information target detection result and the image segmentation result, so as to obtain hybrid perception information.
In this embodiment, the explanation of the vehicle environment sensing device may refer to the description in embodiment 1 or embodiment 2, and the description is not repeated in this embodiment.
Therefore, the vehicle environment sensing device described in the embodiment can adopt the camera carrying the corresponding algorithm to acquire the important road information, so that the acquisition cost of the important road information is reduced; meanwhile, the problems that lane line segmentation is difficult and multi-task learning cannot be completed in the environment perception of the unmanned automobile can be solved; finally, the network model structure required in the unmanned automobile environment perception can be simplified, so that the corresponding vehicle-mounted computing resource cost is reduced.
Example 4
Referring to fig. 8, fig. 8 is a schematic structural diagram of a vehicle environment sensing device according to the present embodiment. As shown in fig. 8, the vehicle environment sensing device includes:
An acquiring unit 310, configured to acquire vehicle environment data and a pre-constructed vehicle-mounted multitasking environment perception model; the vehicle-mounted multitasking environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model;
The first processing unit 320 is configured to process the vehicle environment data through a backbone network architecture to obtain a focusing feature map carrying the fusion feature;
A second processing unit 330, configured to process the focusing feature map through a bottleneck layer to obtain a fusion feature vector map and an enhancement feature vector map;
The third processing unit 340 is configured to process the fused feature vector diagram and the enhanced feature vector diagram through a target detection model, so as to obtain a road information target detection result;
a fourth processing unit 350, configured to process the fused feature vector diagram and the enhanced feature vector diagram through an image segmentation model, so as to obtain an image segmentation result;
The information integrating unit 360 is configured to perform information integration processing on the road information target detection result and the image segmentation result, so as to obtain hybrid perception information.
As an alternative embodiment, the vehicle environment sensing device further includes:
an acquisition unit 370 for acquiring raw video data for training a model; wherein the raw video data includes driving scene data under different geographic, environmental and weather conditions;
The labeling unit 380 is configured to perform labeling processing on the original video data to obtain a training data set;
a construction unit 390 for pre-constructing an original environment awareness model;
The training unit 400 is configured to train the original environment perception model through the training data set, and obtain a trained vehicle-mounted multi-task environment perception model.
In this embodiment, the backbone network architecture includes a lightweight convolution module and a bidirectional pyramid structure;
the bottleneck layer comprises a convolution attention mechanism module; the convolution attention mechanism module comprises a channel attention module and a space attention module;
The target detection model comprises a target detection deep learning model based on the fast R-CNN and a target detection decoder;
the image segmentation model includes an SCNN-based image segmentation deep learning model and an image segmentation decoder.
As an alternative embodiment, the first processing unit 320 includes:
The first extraction subunit 321 is configured to perform image feature extraction on the vehicle environment data through the lightweight convolution module, so as to obtain feature map data;
the classifying and positioning subunit 322 is configured to perform classifying and positioning processing on the feature map data through the bidirectional pyramid structure, so as to obtain a focused feature map carrying the fusion feature.
As an alternative embodiment, the second processing unit 330 includes:
the first processing subunit 331 is configured to process the focusing feature map through the channel attention module to obtain a channel focusing feature map;
The first processing subunit 331 is further configured to process the focusing feature map through a spatial attention module to obtain a spatial focusing feature map;
the fusion subunit 332 is configured to perform fusion processing on the channel focusing feature map and the spatial focusing feature map to obtain a fusion feature vector map;
the first processing subunit 331 is further configured to sequentially input the focusing feature map into the channel attention module and the spatial attention module that are sequentially arranged for processing, so as to obtain an enhanced feature vector map.
As an alternative embodiment, the third processing unit 340 includes:
An obtaining subunit 341, configured to obtain a backbone network weight of the backbone network architecture;
The second extraction subunit 342 is configured to perform secondary target detection feature extraction on the fused feature vector diagram based on the backbone network weight and the target detection deep learning model based on the fast R-CNN to obtain a secondary target detection feature diagram;
A first splicing subunit 343, configured to perform channel direction splicing processing on the secondary target detection feature map and the enhanced feature vector map, so as to obtain a coded target detection feature map;
the second processing subunit 344 is configured to process the target detection feature map through the target detection decoder to obtain a road information target detection result.
As an alternative embodiment, the fourth processing unit 350 includes:
A third extraction subunit 351, configured to perform secondary image segmentation feature extraction on the fused feature vector diagram based on the backbone network weight and the SCNN-based image segmentation depth learning model, to obtain a secondary image segmentation feature diagram;
the second stitching subunit 352 is configured to perform channel direction stitching on both the secondary image segmentation feature map and the enhancement feature vector map, so as to obtain an encoded image segmentation feature map;
the third processing subunit 353 is configured to process the image segmentation feature map by using an image segmentation decoder, so as to obtain an image segmentation result.
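For the segmentation branch, the SCNN idea of passing messages slice by slice across the feature map (helpful for long, thin structures such as lane lines) is sketched below with a single top-to-bottom pass; the original SCNN uses four directional passes, and the splicing and decoder stand-ins follow the same assumptions as the detection sketch above.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SCNNDownPass(nn.Module):
        # One top-to-bottom SCNN-style pass: each row receives a convolved
        # message from the row above before being passed downward.
        def __init__(self, channels, kernel_width=9):
            super().__init__()
            self.conv = nn.Conv2d(channels, channels, (1, kernel_width),
                                  padding=(0, kernel_width // 2))

        def forward(self, x):
            rows = list(x.split(1, dim=2))                 # H slices of shape (B, C, 1, W)
            for i in range(1, len(rows)):
                rows[i] = rows[i] + F.relu(self.conv(rows[i - 1]))
            return torch.cat(rows, dim=2)

    scnn = SCNNDownPass(64)
    decoder = nn.Conv2d(128, 2, 1)                         # stand-in for the image segmentation decoder
    fusion_map, enhanced_map = torch.randn(1, 64, 16, 32), torch.randn(1, 64, 16, 32)
    secondary = scnn(fusion_map)                           # secondary image segmentation feature map
    seg_out = decoder(torch.cat([secondary, enhanced_map], dim=1))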
In this embodiment, for the explanation of the vehicle environment sensing device, reference may be made to the description in embodiment 1 or embodiment 2, which is not repeated here.
Therefore, the vehicle environment sensing device described in this embodiment can acquire important road information using a camera running the corresponding algorithm, which reduces the cost of acquiring that information; it can also address the difficulty of lane line segmentation and the inability to complete multi-task learning in the environment perception of driverless vehicles; finally, it can simplify the network model structure required for driverless vehicle environment perception, thereby reducing the corresponding vehicle-mounted computing resource overhead.
An embodiment of the present application provides an electronic device, including a memory and a processor, where the memory is configured to store a computer program, and the processor is configured to execute the computer program to cause the electronic device to execute a vehicle environment sensing method in embodiment 1 or embodiment 2 of the present application.
Embodiments of the present application provide a computer readable storage medium storing computer program instructions that, when read and executed by a processor, perform the vehicle environment sensing method of embodiment 1 or embodiment 2 of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative, for example, of the flowcharts and block diagrams in the figures that illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Claims (10)
1. A vehicle environment sensing method, comprising:
Acquiring vehicle environment data and a pre-constructed vehicle-mounted multitasking environment perception model; the vehicle-mounted multi-task environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model;
Processing the vehicle environment data through the backbone network architecture to obtain a focusing feature map carrying fusion features;
Processing the focusing feature map through the bottleneck layer to obtain a fusion feature vector map and an enhancement feature vector map;
Processing the fusion feature vector map and the enhancement feature vector map through the target detection model to obtain a road information target detection result;
processing the fusion feature vector map and the enhancement feature vector map through the image segmentation model to obtain an image segmentation result;
And carrying out information integration processing on the road information target detection result and the image segmentation result to obtain mixed perception information.
2. The vehicle environment sensing method of claim 1, wherein before the acquiring of the vehicle environment data and the pre-constructed vehicle-mounted multi-task environment perception model, the method further comprises:
Collecting original video data for training a model; wherein the original video data includes driving scene data under different geographic, environmental and weather conditions;
labeling the original video data to obtain a training data set;
pre-constructing an original environment perception model;
And training the original environment perception model through the training data set to obtain a trained vehicle-mounted multi-task environment perception model.
3. The vehicle environment sensing method of claim 1, wherein the backbone network architecture comprises a lightweight convolution module and a bidirectional pyramid structure;
The bottleneck layer comprises a convolution attention mechanism module; the convolution attention mechanism module comprises a channel attention module and a space attention module;
the target detection model comprises a target detection deep learning model based on Faster R-CNN and a target detection decoder;
The image segmentation model comprises an image segmentation deep learning model based on SCNN and an image segmentation decoder.
4. The vehicle environment sensing method according to claim 3, wherein the processing the vehicle environment data through the backbone network architecture to obtain a focusing feature map carrying fusion features comprises:
Extracting image features of the vehicle environment data through the lightweight convolution module to obtain feature map data;
And classifying and positioning the feature map data through the bidirectional pyramid structure to obtain a focusing feature map carrying fusion features.
5. The vehicle environment sensing method according to claim 3, wherein the processing the focusing feature map through the bottleneck layer to obtain a fusion feature vector map and an enhancement feature vector map comprises:
processing the focusing feature map through the channel attention module to obtain a channel focusing feature map;
Processing the focusing feature map through the spatial attention module to obtain a spatial focusing feature map;
Carrying out fusion processing on the channel focusing feature map and the spatial focusing feature map to obtain a fusion feature vector map;
And sequentially inputting the focusing feature map into the channel attention module and the spatial attention module which are sequentially arranged for processing, to obtain an enhancement feature vector map.
6. The vehicle environment sensing method according to claim 3, wherein the processing the fusion feature vector map and the enhancement feature vector map through the target detection model to obtain a road information target detection result comprises:
acquiring backbone network weights of the backbone network architecture;
based on the backbone network weight and the target detection deep learning model based on Faster R-CNN, performing secondary target detection feature extraction on the fusion feature vector map to obtain a secondary target detection feature map;
performing channel direction splicing processing on the secondary target detection feature map and the enhancement feature vector map to obtain an encoded target detection feature map;
And processing the target detection feature map through the target detection decoder to obtain a road information target detection result.
7. The vehicle environment sensing method according to claim 6, wherein the processing the fusion feature vector map and the enhancement feature vector map through the image segmentation model to obtain an image segmentation result comprises:
Performing secondary image segmentation feature extraction on the fusion feature vector map based on the backbone network weight and the SCNN-based image segmentation deep learning model to obtain a secondary image segmentation feature map;
Performing channel direction splicing processing on the secondary image segmentation feature map and the enhancement feature vector map to obtain an encoded image segmentation feature map;
and processing the image segmentation feature map through the image segmentation decoder to obtain an image segmentation result.
8. A vehicular environment sensing device, characterized by comprising:
The acquisition unit is used for acquiring vehicle environment data and a pre-constructed vehicle-mounted multi-task environment perception model; the vehicle-mounted multi-task environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model;
The first processing unit is used for processing the vehicle environment data through the backbone network architecture to obtain a focusing feature map carrying fusion features;
The second processing unit is used for processing the focusing feature map through the bottleneck layer to obtain a fusion feature vector map and an enhancement feature vector map;
The third processing unit is used for processing the fusion feature vector map and the enhancement feature vector map through the target detection model to obtain a road information target detection result;
the fourth processing unit is used for processing the fusion feature vector map and the enhancement feature vector map through the image segmentation model to obtain an image segmentation result;
and the information integration unit is used for carrying out information integration processing on the road information target detection result and the image segmentation result to obtain mixed perception information.
9. An electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the vehicle environment sensing method of any one of claims 1 to 7.
10. A readable storage medium having stored therein computer program instructions which, when read and executed by a processor, perform the vehicle environment sensing method of any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410488944.1A CN118097624B (en) | 2024-04-23 | 2024-04-23 | Vehicle environment sensing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118097624A (en) | 2024-05-28
CN118097624B (en) | 2024-07-19
Family
ID=91163272
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410488944.1A Active CN118097624B (en) | 2024-04-23 | 2024-04-23 | Vehicle environment sensing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118097624B (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111178253A (en) * | 2019-12-27 | 2020-05-19 | 深圳佑驾创新科技有限公司 | Visual perception method and device for automatic driving, computer equipment and storage medium |
CN113887588A (en) * | 2021-09-17 | 2022-01-04 | 北京科技大学 | Vehicle detection method and device based on attention mechanism and feature weighting fusion |
CN114821357A (en) * | 2022-04-24 | 2022-07-29 | 中国人民解放军空军工程大学 | Optical remote sensing target detection method based on transformer |
CN114998255A (en) * | 2022-05-31 | 2022-09-02 | 南京工业大学 | Lightweight deployment method based on aeroengine hole detection crack detection |
US20230406366A1 (en) * | 2022-06-15 | 2023-12-21 | Beihang University | Active perception system for double-axle steering cab-less mining vehicle |
CN115546769A (en) * | 2022-12-02 | 2022-12-30 | 广汽埃安新能源汽车股份有限公司 | Road image recognition method, device, equipment and computer readable medium |
CN115984815A (en) * | 2022-12-29 | 2023-04-18 | 上海涵润汽车电子有限公司 | Multitask perception method and device based on neural network |
Also Published As
Publication number | Publication date |
---|---|
CN118097624B (en) | 2024-07-19 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||