CN118097624A - Vehicle environment sensing method and device

Vehicle environment sensing method and device

Info

Publication number
CN118097624A
Authority
CN
China
Prior art keywords
processing
image segmentation
target detection
map
feature vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410488944.1A
Other languages
Chinese (zh)
Other versions
CN118097624B (en)
Inventor
吕忠鹏
周赖栏
沈烈
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GAC Aion New Energy Automobile Co Ltd
Original Assignee
GAC Aion New Energy Automobile Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by GAC Aion New Energy Automobile Co Ltd filed Critical GAC Aion New Energy Automobile Co Ltd
Priority to CN202410488944.1A
Publication of CN118097624A
Application granted
Publication of CN118097624B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V 20/582: Recognition of traffic signs
    • G06V 20/588: Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a vehicle environment sensing method and a device, wherein the method comprises the following steps: acquiring vehicle environment data and a pre-constructed vehicle-mounted multi-task environment perception model; processing the vehicle environment data through a backbone network architecture to obtain a focusing feature map carrying fusion features; processing the focusing feature map through a bottleneck layer to obtain a fusion feature vector map and an enhancement feature vector map; processing the fusion feature vector map and the enhancement feature vector map through a target detection model to obtain a road information target detection result; processing the fusion feature vector map and the enhancement feature vector map through an image segmentation model to obtain an image segmentation result; and carrying out information integration processing on the road information target detection result and the image segmentation result to obtain mixed perception information. Therefore, the method and the device can quickly and accurately realize vehicle environment sensing, solve the problem of difficult image segmentation, and improve environment sensing precision.

Description

Vehicle environment sensing method and device
Technical Field
The application relates to the technical field of automatic driving, in particular to a vehicle environment sensing method and device.
Background
An unmanned vehicle is an intelligent mobile vehicle that can replace a human driver in completing a series of driving behaviors. Unmanned vehicle research involves scientific fields such as environment perception, navigation and positioning, and decision control. Existing vehicle environment sensing methods are generally based on Faster R-CNN: a pedestrian detection model is constructed by improving the convolution modules of the neural network with an SE-Net structure, and road pedestrians are then tracked based on a particle filtering algorithm and a multi-feature fusion strategy for road pedestrian tracking. However, in practice it has been found that the existing methods have difficulty with image segmentation, which reduces environment perception accuracy.
Disclosure of Invention
The embodiment of the application aims to provide a vehicle environment sensing method and device, which can quickly and accurately realize vehicle environment sensing, solve the problem of difficult image segmentation and improve environment sensing precision.
The first aspect of the present application provides a vehicle environment sensing method, comprising:
Acquiring vehicle environment data and a pre-constructed vehicle-mounted multitasking environment perception model; the vehicle-mounted multi-task environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model;
Processing the vehicle environment data through the backbone network architecture to obtain a focusing characteristic diagram carrying fusion characteristics;
Processing the focusing feature map through the bottleneck layer to obtain a fusion feature vector map and an enhancement feature vector map;
Processing the fusion feature vector diagram and the enhancement feature vector diagram through the target detection model to obtain a road information target detection result;
processing the fusion feature vector diagram and the enhancement feature vector diagram through the image segmentation model to obtain an image segmentation result;
And carrying out information integration processing on the road information target detection result and the image segmentation result to obtain mixed perception information.
Further, before the acquiring the vehicle environment data and the pre-constructed vehicle-mounted multi-task environment awareness model, the method further includes:
Collecting original video data for training a model; wherein the raw video data includes driving scenario data under different geographic, environmental and weather conditions;
labeling the original video data to obtain a training data set;
pre-constructing an original environment perception model;
And training the original environment perception model through the training data set to obtain a trained vehicle-mounted multi-task environment perception model.
Further, the backbone network architecture comprises a lightweight convolution module and a bidirectional pyramid structure;
The bottleneck layer comprises a convolution attention mechanism module; the convolution attention mechanism module comprises a channel attention module and a space attention module;
the target detection model comprises a target detection deep learning model based on Faster R-CNN and a target detection decoder;
The image segmentation model comprises an image segmentation deep learning model based on SCNN and an image segmentation decoder.
Further, the processing the vehicle environment data through the backbone network architecture to obtain a focusing feature map carrying fusion features includes:
Extracting image features of the vehicle environment data through the lightweight convolution module to obtain feature map data;
And classifying and positioning the feature map data through the bidirectional pyramid structure to obtain a focusing feature map carrying fusion features.
Further, the processing the focusing feature map through the bottleneck layer to obtain a fusion feature vector map and an enhancement feature vector map includes:
processing the focusing characteristic map through the channel attention module to obtain a channel focusing characteristic map;
Processing the focusing characteristic map through the spatial attention module to obtain a spatial focusing characteristic map;
Carrying out fusion processing on the channel focusing feature map and the space focusing feature map to obtain a fusion feature vector map;
And sequentially inputting the focusing characteristic images into the channel attention module and the space attention module which are sequentially arranged for processing to obtain an enhanced characteristic vector image.
Further, the processing the fusion feature vector diagram and the enhancement feature vector diagram through the target detection model to obtain a road information target detection result includes:
acquiring backbone network weights of the backbone network architecture;
based on the backbone network weight and the target detection deep learning model based on Faster R-CNN, extracting secondary target detection features of the fusion feature vector diagram to obtain a secondary target detection feature diagram;
performing channel direction splicing processing on the secondary target detection feature map and the enhanced feature vector map to obtain a coded target detection feature map;
And processing the target detection feature map through the target detection decoder to obtain a road information target detection result.
Further, the processing the fused feature vector diagram and the enhanced feature vector diagram through the image segmentation model to obtain an image segmentation result includes:
Performing secondary image segmentation feature extraction on the fusion feature vector image based on the backbone network weight and the SCNN-based image segmentation deep learning model to obtain a secondary image segmentation feature image;
Performing channel direction splicing processing on the secondary image segmentation feature map and the enhancement feature vector map to obtain an encoded image segmentation feature map;
and processing the image segmentation feature map through an image segmentation decoder to obtain an image segmentation result.
A second aspect of the present application provides a vehicle environment sensing device including:
The acquisition unit is used for acquiring vehicle environment data and a pre-constructed vehicle-mounted multi-task environment perception model; the vehicle-mounted multi-task environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model;
The first processing unit is used for processing the vehicle environment data through the backbone network architecture to obtain a focusing characteristic diagram carrying fusion characteristics;
The second processing unit is used for processing the focusing characteristic image through the bottleneck layer to obtain a fusion characteristic vector image and an enhancement characteristic vector image;
The third processing unit is used for processing the fusion feature vector diagram and the enhancement feature vector diagram through the target detection model to obtain a road information target detection result;
the fourth processing unit is used for processing the fusion feature vector diagram and the enhancement feature vector diagram through the image segmentation model to obtain an image segmentation result;
and the information integration unit is used for carrying out information integration processing on the road information target detection result and the image segmentation result to obtain mixed perception information.
Further, the vehicle environment sensing device further includes:
the acquisition unit is used for acquiring original video data for training the model; wherein the raw video data includes driving scenario data under different geographic, environmental and weather conditions;
the marking unit is used for marking the original video data to obtain a training data set;
The construction unit is used for pre-constructing an original environment perception model;
The training unit is used for training the original environment perception model through the training data set to obtain a trained vehicle-mounted multi-task environment perception model.
Further, the backbone network architecture comprises a lightweight convolution module and a bidirectional pyramid structure;
The bottleneck layer comprises a convolution attention mechanism module; the convolution attention mechanism module comprises a channel attention module and a space attention module;
the target detection model comprises a target detection deep learning model based on Faster R-CNN and a target detection decoder;
The image segmentation model comprises an image segmentation deep learning model based on SCNN and an image segmentation decoder.
Further, the first processing unit includes:
The first extraction subunit is used for extracting image features of the vehicle environment data through the lightweight convolution module to obtain feature map data;
and the classification positioning subunit is used for performing classification positioning processing on the feature map data through the bidirectional pyramid structure to obtain a focusing feature map carrying fusion features.
Further, the second processing unit includes:
the first processing subunit is used for processing the focusing characteristic diagram through the channel attention module to obtain a channel focusing characteristic diagram;
The first processing subunit is further configured to process the focusing feature map through the spatial attention module to obtain a spatial focusing feature map;
The fusion subunit is used for carrying out fusion processing on the channel focusing feature map and the space focusing feature map to obtain a fusion feature vector map;
The first processing subunit is further configured to sequentially input the focusing feature map into the channel attention module and the spatial attention module that are sequentially arranged for processing, so as to obtain an enhanced feature vector map.
Further, the third processing unit includes:
an acquisition subunit, configured to acquire a backbone network weight of the backbone network architecture;
The second extraction subunit is used for extracting the secondary target detection features of the fusion feature vector diagram based on the backbone network weight and the target detection deep learning model based on Faster R-CNN to obtain a secondary target detection feature diagram;
The first splicing subunit is used for carrying out channel direction splicing processing on the secondary target detection feature image and the enhancement feature vector image to obtain an encoded target detection feature image;
and the second processing subunit is used for processing the target detection feature map through the target detection decoder to obtain a road information target detection result.
Further, the fourth processing unit includes:
The third extraction subunit is used for extracting secondary image segmentation features of the fusion feature vector image based on the backbone network weight and the SCNN-based image segmentation deep learning model to obtain a secondary image segmentation feature image;
The second splicing subunit is used for carrying out channel direction splicing processing on the secondary image segmentation feature map and the enhancement feature vector map to obtain an encoded image segmentation feature map;
And the third processing subunit is used for processing the image segmentation feature map through an image segmentation decoder to obtain an image segmentation result.
A third aspect of the present application provides an electronic device comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the vehicle context awareness method of any of the first aspects of the present application.
A fourth aspect of the application provides a computer readable storage medium storing computer program instructions which, when read and executed by a processor, perform the vehicle environment awareness method of any of the first aspects of the application.
The beneficial effects of the application are as follows: the method and the device can quickly and accurately realize the sensing of the vehicle environment, solve the problem of difficult image segmentation and improve the sensing precision of the environment.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of a vehicle environment sensing method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a vehicle-mounted multitasking environment sensing flow provided in an embodiment of the present application;
FIG. 3 is a general block diagram of a vehicle-mounted multi-task environment awareness method according to an embodiment of the present application;
FIG. 4 is a flowchart of another vehicle environment awareness method according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a backbone network based on cross-stage deep residual convolution according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a bottleneck layer architecture according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a vehicle environment sensing device according to an embodiment of the present application;
Fig. 8 is a schematic structural diagram of another vehicle environment sensing device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
Example 1
Referring to fig. 1, fig. 1 is a flowchart of a vehicle environment sensing method according to the present embodiment. The vehicle environment sensing method comprises the following steps:
S101, acquiring vehicle environment data and a pre-constructed vehicle-mounted multi-task environment perception model; the vehicle-mounted multitasking environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model.
In this embodiment, the backbone network architecture includes a lightweight convolution module and a bidirectional pyramid structure;
the bottleneck layer comprises a convolution attention mechanism module; the convolution attention mechanism module comprises a channel attention module and a space attention module;
The target detection model comprises a target detection deep learning model based on the fast R-CNN and a target detection decoder;
the image segmentation model includes an SCNN-based image segmentation deep learning model and an image segmentation decoder.
S102, processing the vehicle environment data through a backbone network architecture to obtain a focusing characteristic diagram carrying fusion characteristics.
S103, processing the focusing feature images through a bottleneck layer to obtain a fusion feature vector image and an enhancement feature vector image.
S104, processing the fusion feature vector diagram and the enhancement feature vector diagram through the target detection model to obtain a road information target detection result.
S105, processing the fusion feature vector diagram and the enhancement feature vector diagram through the image segmentation model to obtain an image segmentation result.
S106, carrying out information integration processing on the road information target detection result and the image segmentation result to obtain the hybrid perception information.
In this embodiment, the method provides a vehicle-mounted multitasking environment sensing method based on a lightweight neural network. The method applies a vehicle-mounted multi-task environment perception model, and a specific vehicle-mounted multi-task environment perception flow diagram can be shown by referring to fig. 2.
In this embodiment, on the basis of fig. 2, reference may be made to the general block diagram of the vehicle-mounted multi-task environment awareness method shown in fig. 3, which further shows specific flow steps of the vehicle-mounted multi-task environment awareness method, so as to assist in explaining this embodiment.
In this embodiment, the execution subject of the method may be a computing device such as a computer or a server, which is not limited in this embodiment.
In this embodiment, the execution body of the method may be an intelligent device such as a smart phone or a tablet computer, which is not limited in this embodiment.
Therefore, by implementing the vehicle environment sensing method described in the embodiment, the camera carrying the corresponding algorithm can be used to obtain the important road information, so that the obtaining cost of the important road information is reduced; meanwhile, the problems that lane line segmentation is difficult and multi-task learning cannot be completed in the environment perception of the unmanned automobile can be solved; finally, the network model structure required in the unmanned automobile environment perception can be simplified, so that the corresponding vehicle-mounted computing resource cost is reduced.
Example 2
Referring to fig. 4, fig. 4 is a flowchart of a vehicle environment sensing method according to the present embodiment. The vehicle environment sensing method comprises the following steps:
S201, collecting original video data for training a model; wherein the raw video data includes driving scenario data under different geographic, environmental and weather conditions.
In this embodiment, the method obtains the original video data through a front-mounted high-resolution camera of the unmanned automobile, which ensures the accuracy of the collected data. The data sets are collected with the aim of simulating real driving scenarios, thereby providing a broad environment-perception data set for training and evaluating the corresponding training model.
In this embodiment, the unmanned vehicle may cover different kinds of roads during the collection process, including city streets, expressways and rural roads. This ensures that the data set has rich scene diversity, which facilitates better training of the autopilot system to accommodate various environments.
S202, labeling the original video data to obtain a training data set.
In this embodiment, after labeling, each element in the acquired video images can be understood. The labeling generally includes the identity and location of vehicles, pedestrians, traffic signs, road boundaries, etc.
For example, the training dataset may include 3-channel images x of size h×w×c from a 1280×720-pixel vehicle front camera, where h is the height, w the width and c the number of channels. The center point of the target frame of category class in the image is located at (x0, y0), and the distances from the center point to the edges of the target frame are (w0, h0); the label information is therefore (class, x0, y0, w0, h0), with category labels such as vehicle, pedestrian, road boundary and lane line.
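As an illustration of the label format described above, a single annotated sample might be organized as in the following sketch. The file names, field names and values are hypothetical and are not taken from the patent.

```python
# Hypothetical layout of one labeled training sample (illustrative only).
sample = {
    "image": "front_camera_000123.png",   # 1280x720, 3-channel image (h x w x c)
    "targets": [
        # (class, x0, y0, w0, h0): category, box center, distances to the box edges (pixels)
        {"class": "vehicle",    "x0": 640.0, "y0": 410.0, "w0": 85.0, "h0": 60.0},
        {"class": "pedestrian", "x0": 912.0, "y0": 388.0, "w0": 22.0, "h0": 55.0},
    ],
    "lane_mask": "front_camera_000123_lanes.png",  # pixel-level lane / road-boundary mask
}
```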
In this embodiment, the training data set contains driving scenarios from different geographies, environments and weather conditions; the training set, validation set and test set comprise 70,000, 10,000 and 20,000 samples respectively.
S203, an original environment perception model is built in advance.
S204, training the original environment perception model through a training data set to obtain a trained vehicle-mounted multi-task environment perception model.
S205, acquiring vehicle environment data and a pre-constructed vehicle-mounted multi-task environment perception model; the vehicle-mounted multitasking environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model.
In this embodiment, steps S206 to S207 provide a process of extracting feature information from an image based on a backbone network (fig. 5) of cross-stage deep residual convolution, and transmitting the feature information to different task branches for processing. The backbone network adopts a cross-stage deep residual convolution network structure, a traditional convolution module in the backbone network is replaced by a lightweight convolution module, and the backbone network is combined with a bidirectional feature pyramid structure, so that the consumption of computing resources is obviously reduced on the premise of keeping the detection accuracy of an original network. Meanwhile, through backbone network sharing, the multi-task neural network can simultaneously perform target detection and lane segmentation tasks, so that computing resources are saved, and the intelligent vehicle environment sensing efficiency is improved.
S206, extracting image features of the vehicle environment data through the lightweight convolution module to obtain feature map data.
In this embodiment, in the process of processing image features and transferring task information through the backbone network, the method requires as input the vehicle live-road image dataset (input sample: live road image x of size h×w×c; labels: vehicle, pedestrian, lane line, etc.).
Then, the method extracts features from the input images through the lightweight convolution module. In order to build a more efficient network architecture and reduce the consumption of network model computing resources, the method adopts a Ghost module to replace the ordinary convolution in the backbone network. Specifically, the Ghost module generates a part of the real feature layer through a convolution operation, obtains the remaining phantom feature layer from the real feature layer through a cheap linear operation, and then splices the real feature layer and the phantom feature layer together to form a complete feature layer. Compared with the original ordinary convolution model, the Ghost module therefore has a better compression effect while the feature extraction effect is well maintained.
In this embodiment, the Ghost module and the normal convolution are specifically compared as follows:
The input image size is h×w×c, the output feature map size is h'×w'×n, the convolution kernel size is k×k, d×d is the kernel size of the cheap linear operation used by the Ghost module, and s is a hyper-parameter far smaller than c. The computational cost of ordinary convolution is given by formula (1), and the computational cost of the Ghost module by formula (2):
cost_conv = n · h' · w' · c · k · k    (1)
cost_ghost = (n/s) · h' · w' · c · k · k + (s-1) · (n/s) · h' · w' · d · d    (2)
The ratio between the two is r_c, calculated as shown in formula (3):
r_c = cost_conv / cost_ghost = (c · k · k) / ((c · k · k)/s + ((s-1)/s) · d · d) ≈ s · c / (s + c - 1) ≈ s    (3)
where the approximation takes d ≈ k and uses s ≪ c. It can be seen that using the Ghost module can save roughly s times the training time and cut the model parameters by roughly a factor of s.
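For concreteness, the following is a minimal PyTorch-style sketch of a Ghost block of the kind described above. It is not the patent's implementation: the kernel sizes (k = 1 for the primary convolution, d = 3 for the cheap linear operation, realized here as a depthwise convolution) and the assumption that the output channel count is divisible by s are illustrative choices.

```python
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    """Sketch of a Ghost convolution block (illustrative, assumes out_ch % s == 0)."""
    def __init__(self, in_ch, out_ch, s=2, k=1, d=3):
        super().__init__()
        primary_ch = out_ch // s               # "real" feature maps from ordinary convolution
        cheap_ch = out_ch - primary_ch         # "phantom" feature maps from the cheap operation
        self.primary = nn.Sequential(
            nn.Conv2d(in_ch, primary_ch, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(primary_ch),
            nn.ReLU(inplace=True),
        )
        # A depthwise convolution plays the role of the cheap linear operation.
        self.cheap = nn.Sequential(
            nn.Conv2d(primary_ch, cheap_ch, d, padding=d // 2, groups=primary_ch, bias=False),
            nn.BatchNorm2d(cheap_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        real = self.primary(x)                    # real feature layer
        phantom = self.cheap(real)                # phantom feature layer
        return torch.cat([real, phantom], dim=1)  # spliced complete feature layer
```

With s = 2, roughly half of the output channels come from the ordinary convolution and half from the cheap operation, which is where the factor-of-s saving in formula (3) comes from.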
S207, classifying and positioning the feature map data through a bidirectional pyramid structure to obtain a focusing feature map carrying fusion features.
In this embodiment, the method adopts a bidirectional feature pyramid structure to transfer task information. The bidirectional feature pyramid structure is a structure for decoding image feature information, and consists of a feature pyramid network and a pyramid attention network. The feature pyramid network is top-down, delivering strong semantic feature information from higher levels into the entire pyramid, but it only enhances semantic information and not positioning information. To compensate for this shortcoming of the feature pyramid network, a pyramid attention structure is added behind it, conveying strongly localized feature information from lower levels in a bottom-up fashion. Using this structure can enhance the accuracy of target detection and reduce the rate at which the network misses small target objects.
By implementing this embodiment, the shallow and deep feature information learned by the deep neural network can be fully utilized to handle the target detection task and the classification and positioning tasks in image segmentation. For classification tasks, deep features may be more important, while for localization tasks, shallow and deep features are equally important. Using the bidirectional feature pyramid structure can improve the accuracy of target detection and image segmentation and reduce the network's missed-detection rate for small target objects and small image regions, thereby realizing task information transfer from the backbone network to the bottleneck layer.
Finally, feature extraction is performed on the input image through the backbone structure based on the cross-stage deep residual convolution network, and a focusing feature map carrying fusion features is obtained.
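The bidirectional (top-down plus bottom-up) fusion can be pictured with the following sketch, written for three backbone levels. The level count, channel widths and use of nearest-neighbour up-sampling are assumptions for illustration, not details taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BidirectionalPyramidSketch(nn.Module):
    """Top-down semantic pass followed by a bottom-up localization pass (illustrative)."""
    def __init__(self, chans=(128, 256, 512), out_ch=128):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_ch, 1) for c in chans)
        self.smooth = nn.ModuleList(nn.Conv2d(out_ch, out_ch, 3, padding=1) for _ in chans)
        self.down = nn.ModuleList(nn.Conv2d(out_ch, out_ch, 3, stride=2, padding=1)
                                  for _ in range(len(chans) - 1))

    def forward(self, c3, c4, c5):
        # Top-down: push strong semantic information from high levels to low levels.
        p5 = self.lateral[2](c5)
        p4 = self.lateral[1](c4) + F.interpolate(p5, scale_factor=2, mode="nearest")
        p3 = self.lateral[0](c3) + F.interpolate(p4, scale_factor=2, mode="nearest")
        p3, p4, p5 = self.smooth[0](p3), self.smooth[1](p4), self.smooth[2](p5)
        # Bottom-up: push strong localization information from low levels back up.
        n3 = p3
        n4 = p4 + self.down[0](n3)
        n5 = p5 + self.down[1](n4)
        return n3, n4, n5
```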
In this embodiment, multi-task environment sensing generally requires an additional intermediate module to perform multi-level feature fusion and compression for the downstream tasks. The method therefore introduces a bottleneck layer (as shown in fig. 6) as an intermediate layer between the backbone network and the downstream task modules, so that feature fusion and feature enhancement between the different tasks are unified in the bottleneck layer. By fusing and combining features from different levels of the backbone network, the bottleneck layer generates feature maps that are richer and carry more semantic information. To this end, the method specifically proposes steps S208 to S211.
S208, the focusing characteristic diagram is processed through the channel attention module, and the channel focusing characteristic diagram is obtained.
In this embodiment, in the flow of bottleneck-layer multi-task feature fusion and feature enhancement, the required input data is the focusing feature map (of feature size h'×w'×n).
S209, processing the focusing characteristic map through a spatial attention module to obtain a spatial focusing characteristic map.
S210, carrying out fusion processing on the channel focusing feature map and the space focusing feature map to obtain a fusion feature vector map.
In this embodiment, the method may use a convolution attention mechanism module to perform feature fusion. In order to overcome the high miss rate of current neural networks when detecting distant, small vehicle targets in actual driving scenes, the method proposes to fuse a convolution attention mechanism module into the multi-task neural network so as to improve small-target detection and reduce the miss rate. Specifically, this module is an attention mechanism for neural networks that addresses the information-overload problem under limited computing power by allocating computational resources to the more important tasks. By introducing the attention mechanism, the method can concentrate attention on the information most critical to the current task, reduce attention to other information and even filter out irrelevant information, thereby improving the efficiency and accuracy of task processing.
In this embodiment, the convolution attention mechanism module mainly consists of a channel attention module and a spatial attention module. It converts the initially extracted input image information into low-level features, and by adding the channel and spatial attention modules it can capture more accurate high-level feature information. The convolution attention mechanism module is a lightweight module and, because of its versatility, the overhead added by integrating it into any convolutional neural network is negligible.
Specifically, the focusing feature map is passed through the channel attention block and then cascaded with itself via a residual network module to obtain the channel focusing feature map; the channel focusing feature map is then passed through the spatial attention block and cascaded with itself via the residual network module to generate the spatial focusing feature map. Finally, the channel focusing feature map and the spatial focusing feature map are cascaded to generate the fusion feature vector map.
S211, sequentially inputting the focusing feature images into a channel attention module and a space attention module which are sequentially arranged for processing, and obtaining an enhanced feature vector image.
In this embodiment, the method may employ the convolution attention mechanism module for feature enhancement. The operation is similar to feature fusion, except that the residual module is not applied: the focusing feature map is sequentially input into the channel attention module and the spatial attention module of a sequential model, and the enhanced feature vector map is finally generated.
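One way to read the fusion and enhancement steps above is sketched below. The channel and spatial attention modules follow the usual CBAM pattern; interpreting "cascaded with a residual module" as an additive skip connection and the final "cascade" as channel concatenation is an assumption made for illustration, not a detail confirmed by the patent.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(),
                                 nn.Linear(ch // reduction, ch))

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3)))                 # average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))                  # max-pooled descriptor
        w = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * w                                        # channel focusing feature map

class SpatialAttention(nn.Module):
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x):
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))         # spatial focusing feature map

def bottleneck_fuse_and_enhance(f, ca, sa):
    """f: focusing feature map; ca, sa: the attention modules defined above."""
    # Feature fusion: each attention output is cascaded with its input via a residual.
    f_channel = ca(f) + f
    f_spatial = sa(f_channel) + f_channel
    fused = torch.cat([f_channel, f_spatial], dim=1)        # fusion feature vector map
    # Feature enhancement: the same modules applied sequentially, without residuals.
    enhanced = sa(ca(f))                                     # enhanced feature vector map
    return fused, enhanced
```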
By implementing this embodiment, the feature maps can provide more accurate target position and category information, thereby improving the accuracy of target detection; they can likewise provide more accurate image segmentation boundaries and regions, thereby improving the accuracy of image segmentation. In the target detection task, the bottleneck layer connects the backbone network and the target detection module and improves detection performance through feature fusion and feature enhancement; in the lane line segmentation task, the bottleneck layer connects the backbone network and the lane line segmentation module and improves image segmentation performance through feature fusion and feature enhancement.
S212, acquiring backbone network weights of the backbone network architecture.
S213, performing secondary target detection feature extraction on the fusion feature vector diagram based on the backbone network weight and the target detection deep learning model based on the Faster R-CNN to obtain a secondary target detection feature diagram.
S214, performing channel direction splicing processing on the secondary target detection feature map and the enhanced feature vector map to obtain the encoded target detection feature map.
S215, processing the target detection feature map through a target detection decoder to obtain a road information target detection result.
S216, performing secondary image segmentation feature extraction on the fusion feature vector image based on the backbone network weight and the SCNN-based image segmentation deep learning model to obtain a secondary image segmentation feature image.
S217, performing channel direction splicing processing on the secondary image segmentation feature map and the enhancement feature vector map to obtain an encoded image segmentation feature map.
S218, processing the image segmentation feature map through an image segmentation decoder to obtain an image segmentation result.
In this embodiment, steps S212 to S218 provide the methods for target detection and road-marking segmentation detection. After obtaining the enhanced features and fusion features output by the backbone network and bottleneck layer, the method performs simple secondary feature processing for each downstream task, finally obtaining the corresponding vehicle-mounted perception information.
In this embodiment, the flow and method of different downstream tasks may be as follows:
first, input data: enhanced feature vector diagram And fusion feature vector diagram/>(Note: input features for different tasks are consistent).
And secondly, realizing target detection by adopting a multi-task neural network. On the basis of sharing the previous backbone network weight, a target detection deep learning model based on Faster R-CNN is added in a subsequent task, and the function of the model is to analyze and predict the positions and the categories of road information target frames such as vehicles, pedestrians and the like from a convolution characteristic diagram.
Specifically, the target detection deep learning model firstly performs secondary target detection feature extraction on the fusion feature vector diagram to obtain a secondary target detection feature diagram, then performs channel direction splicing operation on the secondary target detection feature diagram and the enhancement feature vector diagram to obtain an encoded target detection feature diagram, and takes the encoded target detection feature diagram as input of a target detection decoder part. The decoder is responsible for adjusting and correcting the location of the bounding box proposed by the RPN by applying a regression operation to each candidate region in the convolution feature map. In this way, the final prediction box can more accurately fit the actual position of the target. The decoder is also responsible for predicting for each candidate region the class of the object it contains. By applying the softmax activation function (corresponding to equation (4)), each possible target class is classified and the class with the highest probability is selected.
softmax(x_i) = exp(x_i) / Σ_{j=1}^{n} exp(x_j)    (4)
This formula rescales the n-dimensional input tensor so that each element lies in the range [0, 1] and the elements sum to 1. Finally, the road target information results after box selection, such as vehicles, pedestrians and obstacles, are obtained.
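The splicing and classification steps of the detection branch can be illustrated as follows. This is a simplified, per-region sketch: box_regressor and classifier stand in for small fully connected heads, the RPN and ROI machinery of Faster R-CNN is omitted, and the names and shapes are assumptions rather than the patent's implementation.

```python
import torch
import torch.nn.functional as F

def detection_branch_step(sec_det_feat, enhanced_feat, box_regressor, classifier):
    """sec_det_feat: secondary target detection feature map (Faster R-CNN branch);
    enhanced_feat: enhanced feature vector map from the bottleneck layer."""
    # Channel-direction splicing produces the encoded target detection feature map.
    encoded = torch.cat([sec_det_feat, enhanced_feat], dim=1)
    pooled = F.adaptive_avg_pool2d(encoded, 1).flatten(1)   # one descriptor per region
    box_deltas = box_regressor(pooled)                       # regression refines proposal boxes
    class_probs = F.softmax(classifier(pooled), dim=1)       # equation (4)
    labels = class_probs.argmax(dim=1)                       # class with highest probability
    return box_deltas, labels
```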
Next, the method completes image segmentation using the multi-task neural network. On the basis of sharing the previous backbone network weights, an image segmentation deep learning model based on SCNN is added in the subsequent tasks; its function is to generate a pixel-level segmentation result from the convolution feature map and to assign the different regions in the image to specific categories, such as road and non-road, thereby completing the segmentation of the vehicle's driving course.
Specifically, the image segmentation deep learning model likewise extracts secondary image segmentation features from the fusion feature vector map to obtain a secondary image segmentation feature map, then performs a channel-direction splicing operation on the secondary image segmentation feature map and the enhanced feature vector map, and the resulting encoded image segmentation feature map is used as the input of the image segmentation decoder. The decoder is responsible for restoring the high-level abstract feature maps to the original image resolution by up-sampling and analyzing the convolution feature map; the decoder also classifies each pixel to determine the class to which it belongs.
Finally, the decoder generates a matrix of the same size as the original image in which each pixel is marked with its category; this matrix is the final image segmentation result.
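The decoder behaviour described above (splice, up-sample, label every pixel) can be sketched as follows; the 1×1 classification head and the use of bilinear up-sampling are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegDecoderSketch(nn.Module):
    """Illustrative image segmentation decoder: per-pixel classification after up-sampling."""
    def __init__(self, in_ch, num_classes):
        super().__init__()
        self.head = nn.Conv2d(in_ch, num_classes, 1)          # per-pixel class logits

    def forward(self, sec_seg_feat, enhanced_feat, out_hw):
        # Channel-direction splicing produces the encoded image segmentation feature map.
        encoded = torch.cat([sec_seg_feat, enhanced_feat], dim=1)
        logits = self.head(encoded)
        # Restore the original image resolution, then assign each pixel its category.
        logits = F.interpolate(logits, size=out_hw, mode="bilinear", align_corners=False)
        return logits.argmax(dim=1)                            # matrix of per-pixel categories
```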
S219, carrying out information integration processing on the road information target detection result and the image segmentation result to obtain mixed perception information.
In this embodiment, the method sends the image segmentation result P_seg obtained above and the road information target detection result P_det to the information integration system of the unmanned vehicle, which performs a cascade mixing operation as in formula (5).
P_mix = Concat(P_seg, P_det)    (5)
where P_mix is the mixed perception information feature vector. The cascade mixing operation finally yields the mixed perception information for the multi-task environment, including the positions and categories of the detected vehicles and pedestrians and the road marking information.
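Read as a concatenation, the cascade mixing of formula (5) amounts to the following sketch; flattening both results into a single vector is an assumption made for illustration.

```python
import torch

def cascade_mix(p_seg, p_det):
    """Concatenate the segmentation result and the detection result into one
    mixed perception feature vector, as in formula (5) (illustrative)."""
    return torch.cat([p_seg.flatten().float(), p_det.flatten().float()], dim=0)
```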
In this embodiment, the execution subject of the method may be a computing device such as a computer or a server, which is not limited in this embodiment.
In this embodiment, the execution body of the method may be an intelligent device such as a smart phone or a tablet computer, which is not limited in this embodiment.
Therefore, by implementing the vehicle environment sensing method described in this embodiment, the lightweight Ghost convolution module can be introduced, which greatly reduces the parameter count of the backbone network while improving the utilization efficiency of computing resources, thereby improving the model's computing-resource efficiency on the unmanned automobile. The method can also introduce the feature pyramid's full use of deep and shallow feature information and the convolution attention mechanism's capture of more local information, so that semantic information and positioning information are enhanced and the detection precision of road information targets for the unmanned automobile is ultimately improved. In addition, the bottleneck layer added on top of the cross-stage deep residual convolution backbone network realizes weight sharing and provides fused features for downstream tasks, so that unmanned-driving perception can execute multiple tasks under the same model architecture.
Example 3
Referring to fig. 7, fig. 7 is a schematic structural diagram of a vehicle environment sensing device according to the present embodiment. As shown in fig. 7, the vehicle environment sensing device includes:
An acquiring unit 310, configured to acquire vehicle environment data and a pre-constructed vehicle-mounted multitasking environment perception model; the vehicle-mounted multitasking environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model;
The first processing unit 320 is configured to process the vehicle environment data through a backbone network architecture to obtain a focusing feature map carrying the fusion feature;
A second processing unit 330, configured to process the focusing feature map through a bottleneck layer to obtain a fusion feature vector map and an enhancement feature vector map;
The third processing unit 340 is configured to process the fused feature vector diagram and the enhanced feature vector diagram through a target detection model, so as to obtain a road information target detection result;
a fourth processing unit 350, configured to process the fused feature vector diagram and the enhanced feature vector diagram through an image segmentation model, so as to obtain an image segmentation result;
The information integrating unit 360 is configured to perform information integration processing on the road information target detection result and the image segmentation result, so as to obtain hybrid perception information.
In this embodiment, the explanation of the vehicle environment sensing device may refer to the description in embodiment 1 or embodiment 2, and the description is not repeated in this embodiment.
Therefore, the vehicle environment sensing device described in the embodiment can adopt the camera carrying the corresponding algorithm to acquire the important road information, so that the acquisition cost of the important road information is reduced; meanwhile, the problems that lane line segmentation is difficult and multi-task learning cannot be completed in the environment perception of the unmanned automobile can be solved; finally, the network model structure required in the unmanned automobile environment perception can be simplified, so that the corresponding vehicle-mounted computing resource cost is reduced.
Example 4
Referring to fig. 8, fig. 8 is a schematic structural diagram of a vehicle environment sensing device according to the present embodiment. As shown in fig. 8, the vehicle environment sensing device includes:
An acquiring unit 310, configured to acquire vehicle environment data and a pre-constructed vehicle-mounted multitasking environment perception model; the vehicle-mounted multitasking environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model;
The first processing unit 320 is configured to process the vehicle environment data through a backbone network architecture to obtain a focusing feature map carrying the fusion feature;
A second processing unit 330, configured to process the focusing feature map through a bottleneck layer to obtain a fusion feature vector map and an enhancement feature vector map;
The third processing unit 340 is configured to process the fused feature vector diagram and the enhanced feature vector diagram through a target detection model, so as to obtain a road information target detection result;
a fourth processing unit 350, configured to process the fused feature vector diagram and the enhanced feature vector diagram through an image segmentation model, so as to obtain an image segmentation result;
The information integrating unit 360 is configured to perform information integration processing on the road information target detection result and the image segmentation result, so as to obtain hybrid perception information.
As an alternative embodiment, the vehicle environment sensing device further includes:
an acquisition unit 370 for acquiring raw video data for training a model; wherein the raw video data includes driving scene data under different geographic, environmental and weather conditions;
The labeling unit 380 is configured to perform labeling processing on the original video data to obtain a training data set;
a construction unit 390 for pre-constructing an original environment awareness model;
The training unit 400 is configured to train the original environment perception model through the training data set, and obtain a trained vehicle-mounted multi-task environment perception model.
In this embodiment, the backbone network architecture includes a lightweight convolution module and a bidirectional pyramid structure;
the bottleneck layer comprises a convolution attention mechanism module; the convolution attention mechanism module comprises a channel attention module and a space attention module;
The target detection model comprises a target detection deep learning model based on the fast R-CNN and a target detection decoder;
the image segmentation model includes an SCNN-based image segmentation deep learning model and an image segmentation decoder.
As an alternative embodiment, the first processing unit 320 includes:
The first extraction subunit 321 is configured to perform image feature extraction on the vehicle environment data through the lightweight convolution module, so as to obtain feature map data;
the classifying and positioning subunit 322 is configured to perform classifying and positioning processing on the feature map data through the bidirectional pyramid structure, so as to obtain a focused feature map carrying the fusion feature.
As an alternative embodiment, the second processing unit 330 includes:
the first processing subunit 331 is configured to process the focusing feature map through the channel attention module to obtain a channel focusing feature map;
The first processing subunit 331 is further configured to process the focusing feature map through a spatial attention module to obtain a spatial focusing feature map;
the fusion subunit 332 is configured to perform fusion processing on the channel focusing feature map and the spatial focusing feature map to obtain a fusion feature vector map;
the first processing subunit 331 is further configured to sequentially input the focusing feature map into the channel attention module and the spatial attention module that are sequentially arranged for processing, so as to obtain an enhanced feature vector map.
As an alternative embodiment, the third processing unit 340 includes:
An obtaining subunit 341, configured to obtain a backbone network weight of the backbone network architecture;
The second extraction subunit 342 is configured to perform secondary target detection feature extraction on the fused feature vector diagram based on the backbone network weight and the target detection deep learning model based on Faster R-CNN to obtain a secondary target detection feature diagram;
A first splicing subunit 343, configured to perform channel direction splicing processing on the secondary target detection feature map and the enhanced feature vector map, so as to obtain a coded target detection feature map;
the second processing subunit 344 is configured to process the target detection feature map through the target detection decoder to obtain a road information target detection result.
As an alternative embodiment, the fourth processing unit 350 includes:
A third extraction subunit 351, configured to perform secondary image segmentation feature extraction on the fused feature vector diagram based on the backbone network weight and the SCNN-based image segmentation deep learning model, to obtain a secondary image segmentation feature diagram;
the second stitching subunit 352 is configured to perform channel direction stitching on both the secondary image segmentation feature map and the enhancement feature vector map, so as to obtain an encoded image segmentation feature map;
the third processing subunit 353 is configured to process the image segmentation feature map by using an image segmentation decoder, so as to obtain an image segmentation result.
In this embodiment, the explanation of the vehicle environment sensing device may refer to the description in embodiment 1 or embodiment 2, and the description is not repeated in this embodiment.
Therefore, the vehicle environment sensing device described in the embodiment can adopt the camera carrying the corresponding algorithm to acquire the important road information, so that the acquisition cost of the important road information is reduced; meanwhile, the problems that lane line segmentation is difficult and multi-task learning cannot be completed in the environment perception of the unmanned automobile can be solved; finally, the network model structure required in the unmanned automobile environment perception can be simplified, so that the corresponding vehicle-mounted computing resource cost is reduced.
An embodiment of the present application provides an electronic device, including a memory and a processor, where the memory is configured to store a computer program, and the processor is configured to execute the computer program to cause the electronic device to execute a vehicle environment sensing method in embodiment 1 or embodiment 2 of the present application.
Embodiments of the present application provide a computer readable storage medium storing computer program instructions that, when read and executed by a processor, perform the vehicle environment awareness method of embodiment 1 or embodiment 2 of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It should also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
If the functions are implemented in the form of software functional modules and sold or used as a stand-alone product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, or the part thereof that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the protection scope of the present application. It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
The foregoing is merely illustrative of the present application and does not limit it; variations or substitutions that would readily occur to any person skilled in the art shall fall within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.

Claims (10)

1. A vehicle environment sensing method, comprising:
Acquiring vehicle environment data and a pre-constructed vehicle-mounted multi-task environment perception model; wherein the vehicle-mounted multi-task environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model;
Processing the vehicle environment data through the backbone network architecture to obtain a focused feature map carrying fused features;
Processing the focused feature map through the bottleneck layer to obtain a fused feature vector map and an enhanced feature vector map;
Processing the fused feature vector map and the enhanced feature vector map through the target detection model to obtain a road information target detection result;
Processing the fused feature vector map and the enhanced feature vector map through the image segmentation model to obtain an image segmentation result;
And performing information integration processing on the road information target detection result and the image segmentation result to obtain mixed perception information.
2. The vehicle environment sensing method of claim 1, wherein before the acquiring of the vehicle environment data and the pre-constructed vehicle-mounted multi-task environment perception model, the method further comprises:
Collecting original video data for training the model; wherein the original video data includes driving scene data under different geographic, environmental and weather conditions;
Labeling the original video data to obtain a training data set;
Pre-constructing an original environment perception model;
And training the original environment perception model through the training data set to obtain a trained vehicle-mounted multi-task environment perception model.
3. The vehicle environment sensing method of claim 1, wherein the backbone network architecture comprises a lightweight convolution module and a bi-directional pyramid structure;
The bottleneck layer comprises a convolutional attention mechanism module; the convolutional attention mechanism module comprises a channel attention module and a spatial attention module;
The target detection model comprises a Faster R-CNN-based target detection deep learning model and a target detection decoder;
The image segmentation model comprises an SCNN-based image segmentation deep learning model and an image segmentation decoder.
4. The vehicle environment sensing method according to claim 3, wherein the processing of the vehicle environment data through the backbone network architecture to obtain a focused feature map carrying fused features comprises:
Extracting image features from the vehicle environment data through the lightweight convolution module to obtain feature map data;
And classifying and positioning the feature map data through the bi-directional pyramid structure to obtain a focused feature map carrying fused features.
5. The vehicle environment sensing method according to claim 3, wherein the processing of the focused feature map through the bottleneck layer to obtain a fused feature vector map and an enhanced feature vector map comprises:
Processing the focused feature map through the channel attention module to obtain a channel-focused feature map;
Processing the focused feature map through the spatial attention module to obtain a spatially-focused feature map;
Performing fusion processing on the channel-focused feature map and the spatially-focused feature map to obtain a fused feature vector map;
And feeding the focused feature map sequentially through the channel attention module and the spatial attention module, arranged in series, to obtain an enhanced feature vector map.
6. The vehicle environment sensing method according to claim 3, wherein the processing of the fused feature vector map and the enhanced feature vector map through the target detection model to obtain a road information target detection result comprises:
Acquiring the backbone network weights of the backbone network architecture;
Performing secondary target detection feature extraction on the fused feature vector map, based on the backbone network weights and the Faster R-CNN-based target detection deep learning model, to obtain a secondary target detection feature map;
Performing channel-direction splicing on the secondary target detection feature map and the enhanced feature vector map to obtain an encoded target detection feature map;
And processing the encoded target detection feature map through the target detection decoder to obtain a road information target detection result.
7. The vehicle environment sensing method according to claim 6, wherein the processing of the fused feature vector map and the enhanced feature vector map through the image segmentation model to obtain an image segmentation result comprises:
Performing secondary image segmentation feature extraction on the fused feature vector map, based on the backbone network weights and the SCNN-based image segmentation deep learning model, to obtain a secondary image segmentation feature map;
Performing channel-direction splicing on the secondary image segmentation feature map and the enhanced feature vector map to obtain an encoded image segmentation feature map;
And processing the encoded image segmentation feature map through the image segmentation decoder to obtain an image segmentation result.
8. A vehicle environment sensing device, characterized by comprising:
An acquisition unit, configured to acquire vehicle environment data and a pre-constructed vehicle-mounted multi-task environment perception model; wherein the vehicle-mounted multi-task environment perception model comprises a backbone network architecture, a bottleneck layer, a target detection model and an image segmentation model;
A first processing unit, configured to process the vehicle environment data through the backbone network architecture to obtain a focused feature map carrying fused features;
A second processing unit, configured to process the focused feature map through the bottleneck layer to obtain a fused feature vector map and an enhanced feature vector map;
A third processing unit, configured to process the fused feature vector map and the enhanced feature vector map through the target detection model to obtain a road information target detection result;
A fourth processing unit, configured to process the fused feature vector map and the enhanced feature vector map through the image segmentation model to obtain an image segmentation result;
And an information integration unit, configured to perform information integration processing on the road information target detection result and the image segmentation result to obtain mixed perception information.
9. An electronic device, comprising a memory for storing a computer program and a processor that runs the computer program to cause the electronic device to perform the vehicle environment sensing method of any one of claims 1 to 7.
10. A readable storage medium having stored therein computer program instructions which, when read and executed by a processor, perform the vehicle environment sensing method of any one of claims 1 to 7.
CN202410488944.1A 2024-04-23 2024-04-23 Vehicle environment sensing method and device Active CN118097624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410488944.1A CN118097624B (en) 2024-04-23 2024-04-23 Vehicle environment sensing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410488944.1A CN118097624B (en) 2024-04-23 2024-04-23 Vehicle environment sensing method and device

Publications (2)

Publication Number Publication Date
CN118097624A true CN118097624A (en) 2024-05-28
CN118097624B CN118097624B (en) 2024-07-19

Family

ID=91163272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410488944.1A Active CN118097624B (en) 2024-04-23 2024-04-23 Vehicle environment sensing method and device

Country Status (1)

Country Link
CN (1) CN118097624B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178253A (en) * 2019-12-27 2020-05-19 深圳佑驾创新科技有限公司 Visual perception method and device for automatic driving, computer equipment and storage medium
CN113887588A (en) * 2021-09-17 2022-01-04 北京科技大学 Vehicle detection method and device based on attention mechanism and feature weighting fusion
CN114821357A (en) * 2022-04-24 2022-07-29 中国人民解放军空军工程大学 Optical remote sensing target detection method based on transformer
CN114998255A (en) * 2022-05-31 2022-09-02 南京工业大学 Lightweight deployment method based on aeroengine hole detection crack detection
CN115546769A (en) * 2022-12-02 2022-12-30 广汽埃安新能源汽车股份有限公司 Road image recognition method, device, equipment and computer readable medium
CN115984815A (en) * 2022-12-29 2023-04-18 上海涵润汽车电子有限公司 Multitask perception method and device based on neural network
US20230406366A1 (en) * 2022-06-15 2023-12-21 Beihang University Active perception system for double-axle steering cab-less mining vehicle

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178253A (en) * 2019-12-27 2020-05-19 深圳佑驾创新科技有限公司 Visual perception method and device for automatic driving, computer equipment and storage medium
CN113887588A (en) * 2021-09-17 2022-01-04 北京科技大学 Vehicle detection method and device based on attention mechanism and feature weighting fusion
CN114821357A (en) * 2022-04-24 2022-07-29 中国人民解放军空军工程大学 Optical remote sensing target detection method based on transformer
CN114998255A (en) * 2022-05-31 2022-09-02 南京工业大学 Lightweight deployment method based on aeroengine hole detection crack detection
US20230406366A1 (en) * 2022-06-15 2023-12-21 Beihang University Active perception system for double-axle steering cab-less mining vehicle
CN115546769A (en) * 2022-12-02 2022-12-30 广汽埃安新能源汽车股份有限公司 Road image recognition method, device, equipment and computer readable medium
CN115984815A (en) * 2022-12-29 2023-04-18 上海涵润汽车电子有限公司 Multitask perception method and device based on neural network

Also Published As

Publication number Publication date
CN118097624B (en) 2024-07-19

Similar Documents

Publication Publication Date Title
AU2019101142A4 (en) A pedestrian detection method with lightweight backbone based on yolov3 network
CN111814621B (en) Attention mechanism-based multi-scale vehicle pedestrian detection method and device
Devi et al. A comprehensive survey on autonomous driving cars: A perspective view
Mahaur et al. Road object detection: a comparative study of deep learning-based algorithms
Han et al. A comprehensive review for typical applications based upon unmanned aerial vehicle platform
CN113420607A (en) Multi-scale target detection and identification method for unmanned aerial vehicle
EP3690744B1 (en) Method for integrating driving images acquired from vehicles performing cooperative driving and driving image integrating device using same
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
Maity et al. Last decade in vehicle detection and classification: a comprehensive survey
CN112613434A (en) Road target detection method, device and storage medium
CN114581710A (en) Image recognition method, device, equipment, readable storage medium and program product
CN117710764A (en) Training method, device and medium for multi-task perception network
CN118155183A (en) Unstructured scene automatic driving network architecture method for deep multi-mode perception
CN114037976A (en) Road traffic sign identification method and device
Bhattacharyya et al. JUVDsi v1: developing and benchmarking a new still image database in Indian scenario for automatic vehicle detection
Hasan Yusuf et al. Real-time car parking detection with deep learning in different lighting scenarios
CN118097624B (en) Vehicle environment sensing method and device
Meletis Towards holistic scene understanding: Semantic segmentation and beyond
CN118097604B (en) Intelligent vehicle environment sensing method and device
Hari Krishna et al. Advancements in Traffic Sign Detection and Recognition for Adverse Image and Motion Artifacts in Transportation Systems
Sheng et al. Advancements in Lane Marking Detection: An Extensive Evaluation of Current Methods and Future Research Direction
Piralkar A Deep Learning Framework to Traffic Sign Recognition in All Weather Conditions
Yalla et al. CHASE Algorithm:" Ease of Driving" Classification
Jaju Multiclass Classification of Road Traffic Signs Using Machine Learning Algorithms
Ciuntu Real-Time Traffic Sign Detection and Classification Based on a Video Feed

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant