CN115861969A - Intelligent driving vision system and method based on artificial intelligence - Google Patents
- Publication number
- CN115861969A (application CN202211626461.0A)
- Authority
- CN
- China
- Prior art keywords
- feature map
- radar
- classification
- neural network
- value
- Legal status: Withdrawn
Landscapes
- Image Analysis (AREA)
- Radar Systems Or Details Thereof (AREA)
Abstract
The application discloses an intelligent driving vision system based on artificial intelligence and a method thereof. Radar data collected by a millimeter wave radar is encoded by a first convolutional neural network serving as a feature extractor to obtain a radar depth feature map, and image data collected by a vehicle-mounted camera is encoded by a second convolutional neural network using spatial attention to obtain a spatial enhancement feature map. The radar depth feature map and the spatial enhancement feature map are then fused to obtain a classification feature map, which is passed through a classifier to obtain a classification loss function value. The first convolutional neural network and the second convolutional neural network are trained with the weighted sum of the label value scattering response loss function value of the spatial enhancement feature map and the classification loss function value as the loss function value, so that the accuracy of vehicle target detection is improved to support intelligent driving.
Description
Technical Field
The present application relates to the field of intelligent driving technology, and more particularly, to an intelligent driving vision system and method based on artificial intelligence.
Background
In recent years, with the rapid development of computer vision and intelligent driving technologies, a series of perception algorithms based on convolutional neural networks have been proposed, greatly improving the perception performance of intelligent driving vehicles. However, because an optical camera is susceptible to weather, illumination and similar conditions, a perception system that relies only on images produces a large number of missed detections and false detections in severe weather such as rain and fog, under low illumination, and in scenes with small targets at long range.
Accordingly, an optimized smart driving vision system for a vehicle that can more accurately recognize and classify an object detected by the vehicle to support smart driving is desired.
Disclosure of Invention
The present application is proposed to solve the above-mentioned technical problems. The embodiment of the application provides an intelligent driving vision system, method and electronic equipment based on artificial intelligence, wherein radar data collected by a millimeter wave radar is coded through a first convolution neural network serving as a feature extractor to obtain a radar depth feature map, image data collected by a vehicle-mounted camera is coded through a second convolution neural network using spatial attention to obtain a spatial enhancement feature map, then the radar depth feature map and the spatial enhancement feature map are fused to obtain a classification feature map, the classification feature map is passed through a classifier to obtain a classification loss function value, and then the first convolution neural network and the second convolution neural network are trained through a label value scattering response loss function value of the spatial enhancement feature map and a weighted sum of the classification loss function values as a loss function value, so that the accuracy of vehicle target detection is improved to support intelligent driving.
According to one aspect of the present application, there is provided an artificial intelligence based intelligent driving vision system, comprising: a training module comprising: a training data acquisition unit for acquiring radar data acquired by a millimeter wave radar disposed in a vehicle and image data acquired by a vehicle-mounted camera disposed in the vehicle; the coordinate conversion unit is used for converting the radar data from a radar coordinate system to a pixel coordinate system of the image data to obtain a radar point cloud projection map based on the conversion relation between a radar plane and a camera plane; the explicit spatial coding unit is used for enabling the radar point cloud projection image to pass through a first convolution neural network serving as a feature extractor to obtain a radar depth feature image; a spatial attention coding unit, which is used for passing the image data through a second convolutional neural network using spatial attention to obtain a spatial enhancement feature map; the feature map fusion unit is used for fusing the radar depth feature map and the space enhancement feature map to obtain a classification feature map; the classification loss unit is used for enabling the classification characteristic graph to pass through a classifier to obtain a classification loss function value; a label scattering response loss unit, configured to calculate a label value scattering response loss function value of the spatial enhancement feature map, where the label value scattering response loss function value of the spatial enhancement feature map is related to a probability value obtained by the spatial enhancement feature map through the classifier; a training unit for training the first convolutional neural network and the second convolutional neural network with a weighted sum of the tag value scattering response loss function value and the classification loss function value as a loss function value; and an inference module comprising: the vehicle monitoring data acquisition unit is used for acquiring radar data acquired by a millimeter wave radar deployed on a vehicle and image data acquired by a vehicle-mounted camera deployed on the vehicle; the space conversion unit is used for converting the radar data from a radar coordinate system to a pixel coordinate system of the image data to obtain a radar point cloud projection map based on the conversion relation between a radar plane and a camera plane; the radar data coding unit is used for enabling the radar point cloud projection image to pass through the first convolution neural network which is trained by the training module and used as the feature extractor so as to obtain a radar depth feature image; the image data coding unit is used for enabling the image data to pass through a second convolutional neural network which is trained by the training module and uses the space attention so as to obtain a space strengthening feature map; the multi-sensor data feature fusion unit is used for fusing the radar depth feature map and the space enhancement feature map to obtain a classification feature map; and the sensing unit is used for enabling the classification characteristic graph to pass through a classifier to obtain a classification result, and the classification result is a class label of an object to be identified in the image data.
In another aspect, the present application provides an intelligent driving vision method based on artificial intelligence, comprising: a training phase comprising: acquiring radar data acquired by a millimeter wave radar deployed in a vehicle and image data acquired by a vehicle-mounted camera deployed in the vehicle; converting the radar data from a radar coordinate system to a pixel coordinate system of the image data to obtain a radar point cloud projection map based on a conversion relation between a radar plane and a camera plane; passing the radar point cloud projection map through a first convolution neural network serving as a feature extractor to obtain a radar depth feature map; passing the image data through a second convolutional neural network using spatial attention to obtain a spatially enhanced feature map; fusing the radar depth feature map and the spatial enhancement feature map to obtain a classification feature map; passing the classification feature map through a classifier to obtain a classification loss function value; calculating a label value scattering response loss function value of the spatial enhancement feature map, wherein the label value scattering response loss function value of the spatial enhancement feature map is related to a probability value obtained by the spatial enhancement feature map through the classifier; training the first convolutional neural network and the second convolutional neural network with a weighted sum of the tag value scatter response loss function value and the classification loss function value as a loss function value; and an inference phase comprising: acquiring radar data acquired by a millimeter wave radar deployed in a vehicle and image data acquired by a vehicle-mounted camera deployed in the vehicle; converting the radar data from a radar coordinate system to a pixel coordinate system of the image data to obtain a radar point cloud projection map based on a conversion relation between a radar plane and a camera plane; passing the radar point cloud projection map through the first convolution neural network which is trained by the training module and is used as a feature extractor to obtain a radar depth feature map; passing the image data through a second convolutional neural network which is trained by the training module and uses spatial attention to obtain a spatial enhanced feature map; fusing the radar depth feature map and the spatial enhancement feature map to obtain a classification feature map; and enabling the classification characteristic graph to pass through a classifier to obtain a classification result, wherein the classification result is a class label of an object to be recognized in the image data.
According to still another aspect of the present application, there is provided an electronic apparatus including: a processor; and a memory having stored therein computer program instructions that, when executed by the processor, cause the processor to perform the artificial intelligence based intelligent driving vision method as described above.
According to yet another aspect of the present application, there is provided a computer readable medium having stored thereon computer program instructions which, when executed by a processor, cause the processor to perform an artificial intelligence based intelligent driving vision method as described above.
Compared with the prior art, the intelligent driving vision system based on artificial intelligence and the method thereof have the advantages that the radar data collected by the millimeter wave radar are coded through the first convolution neural network serving as the feature extractor to obtain the radar depth feature map, the image data collected by the vehicle-mounted camera is coded through the second convolution neural network with spatial attention to obtain the spatial enhanced feature map, then the radar depth feature map and the spatial enhanced feature map are fused to obtain the classification feature map, the classification feature map is subjected to the classifier to obtain the classification loss function value, and then the first convolution neural network and the second convolution neural network are trained through the label value scattering response loss function value of the spatial enhanced feature map and the weighted sum of the classification loss function values as the loss function value, so that the accuracy of vehicle target detection is improved to support intelligent driving.
Drawings
The above and other objects, features and advantages of the present application will become more apparent by describing in more detail embodiments of the present application with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings, like reference numbers generally represent like parts or steps.
Fig. 1 illustrates an application scenario of an intelligent driving vision system based on artificial intelligence according to an embodiment of the present application.
FIG. 2 illustrates a block diagram of an artificial intelligence based intelligent driving vision system, according to an embodiment of the present application;
FIG. 3A illustrates an architectural diagram of a training phase of an artificial intelligence based intelligent driving vision system, according to an embodiment of the present application;
FIG. 3B illustrates an architectural diagram of an inference phase of an artificial intelligence based intelligent driving vision system, in accordance with an embodiment of the present application;
FIG. 4A illustrates a flow chart of a training phase of an artificial intelligence based intelligent driving vision method in accordance with an embodiment of the present application;
FIG. 4B illustrates a flow chart of an inference phase of an artificial intelligence based intelligent driving vision method in accordance with an embodiment of the present application;
FIG. 5 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be understood that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and that the present application is not limited by the example embodiments described herein.
Application overview:
Accordingly, in the technical solution of the present application, the inventor of the present application considered that, compared with a conventional camera, a millimeter wave radar can still maintain stable ranging and speed-measurement performance under severe weather and illumination conditions. Therefore, if the information detected by the millimeter wave radar and the information detected by the vehicle-mounted camera can be encoded and fused in an appropriate manner, the accuracy of vehicle target detection can be improved to support intelligent driving.
Specifically, in the technical scheme of the application, firstly, radar data of a measured object is collected through a millimeter wave radar deployed on a vehicle, and image data of the measured object is collected through a vehicle-mounted camera deployed on the vehicle.
Considering that the radar data and the image data are not in the same coordinate system, the radar data is first spatially mapped to convert it into the pixel coordinate system of the image data, thereby obtaining a radar point cloud projection map. Specifically, the vehicle-mounted camera is first calibrated and jointly calibrated with the millimeter wave radar to obtain sensor parameters; the radar data is then spatially mapped based on these sensor parameters to obtain the radar point cloud projection map.
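As an illustration only, the spatial mapping can be sketched as follows, assuming a pinhole camera model with an intrinsic matrix K from camera calibration and a radar-to-camera rotation R and translation t from the joint calibration; the function names, array shapes and the simple nearest-pixel rasterization are assumptions of this sketch and are not prescribed by the application.

```python
import numpy as np

def project_radar_to_pixels(radar_points_xyz, K, R, t):
    """Map radar detections from the radar coordinate system into the pixel
    coordinate system of the camera image (illustrative sketch).

    radar_points_xyz: (N, 3) points in the radar coordinate system.
    K: (3, 3) camera intrinsic matrix from camera calibration.
    R, t: radar-to-camera rotation (3, 3) and translation (3,) from joint calibration.
    Returns (N, 2) pixel coordinates (u, v).
    """
    points_cam = radar_points_xyz @ R.T + t   # radar frame -> camera frame
    uvw = points_cam @ K.T                    # perspective projection
    return uvw[:, :2] / uvw[:, 2:3]

def rasterize_projection(uv, values, image_hw):
    """Scatter per-point radar values (e.g. range, velocity or RCS) onto an
    image-sized grid to form the radar point cloud projection map."""
    h, w = image_hw
    proj = np.zeros((h, w), dtype=np.float32)
    cols = np.clip(uv[:, 0].astype(int), 0, w - 1)
    rows = np.clip(uv[:, 1].astype(int), 0, h - 1)
    proj[rows, cols] = values
    return proj
```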
Then, the radar point cloud projection map is passed through a first convolutional neural network serving as a feature extractor to obtain a radar depth feature map. That is, a convolutional neural network, which performs excellently in the field of image feature extraction, is used as a feature extractor to extract the high-dimensional implicit associations between pixels in the radar point cloud projection map, thereby obtaining the radar depth feature map.
Likewise, the image data is encoded using a convolutional neural network as a feature extractor to obtain a feature map. In particular, in the technical solution of the present application, considering that the image data contains interference information other than the measured object, such as street background and environmental interference, and in order to make the measured object in the image more identifiable, the image data is encoded with explicit spatial enhancement using a convolutional neural network with spatial attention to obtain a spatial enhancement feature map.
Considering that both the image data and the radar point cloud projection map contain the measured object, the spatial enhancement feature map and the radar depth feature map have a certain consistency and correlation in the high-dimensional feature space. Therefore, the radar depth feature map and the spatial enhancement feature map are fused to obtain a classification feature map containing both image information and millimeter wave information. The classification feature map is then passed through a classifier to obtain a classification result representing the class label of the measured object.
However, for the spatial enhancement feature map obtained using the spatial attention mechanism, the spatial attention mechanism imposes a spatial enhancement constraint on the set of feature values, which shifts the spatial positions of the feature distribution. During classification, the feature distribution is therefore position-sensitive with respect to the label value, so a label value scattering response loss function is calculated for the spatial enhancement feature map F, where j is the label value, f is a feature value of the spatial enhancement feature map F, and p is the probability value of the feature f under the label j.
Based on the scattering response of the feature value positions relative to the label probability, this loss function stacks the feature distribution into a depth structure within the solution space of the classification problem defined by the feature and label values. When the second convolutional neural network using the spatial attention mechanism is trained with this loss function, the interpretability of the classification solution with respect to the model's feature extraction is improved from a response-like perspective, which in turn improves the iterative optimization of the second convolutional neural network's feature extraction toward the classification solution.
After training is completed, during inference, the radar data acquired by the millimeter wave radar deployed in the vehicle and the image data acquired by the vehicle-mounted camera deployed in the vehicle can be directly input into the trained convolutional neural networks for feature extraction to obtain a classification feature map, which is then classified by the classifier to obtain a classification result representing the class label of the object to be recognized in the image data.
Based on this, the application provides an intelligent driving vision system based on artificial intelligence and a method thereof, which comprises a training phase and an inference phase. In the training stage, radar data collected by a millimeter wave radar deployed in a vehicle is converted from a radar coordinate system to a pixel coordinate system of image data collected by a vehicle-mounted camera of the vehicle to obtain a radar point cloud projection map based on a conversion relation between a radar plane and a camera plane, the radar point cloud projection map is encoded through a first convolution neural network serving as a feature extractor to obtain a radar depth feature map, the image data is encoded through a second convolution neural network with spatial attention to obtain a spatial enhancement feature map, the radar depth feature map and the spatial enhancement feature map are fused to obtain a classification feature map, the classification feature map is passed through a classifier to obtain a classification loss function value, and the first convolution neural network and the second convolution neural network are trained through a weighted sum of a label value scattering response loss function value and the classification loss function value of the spatial enhancement feature map as a loss function value. In the inference stage, radar data acquired by a millimeter wave radar deployed in a vehicle passes through a first convolutional neural network trained in a training stage to obtain a radar depth feature map, image data acquired by a vehicle-mounted camera of the vehicle passes through a second convolutional neural network trained in the training stage to obtain a space enhanced feature map, then the radar depth feature map and the space enhanced feature map are fused to obtain a classification feature map, the classification feature map passes through a classifier to obtain a classification result, and the classification result is a class label of an object to be recognized in the image data, so that the accuracy of vehicle target detection is improved to support intelligent driving.
FIG. 1 illustrates a scene schematic diagram of an artificial intelligence based intelligent driving vision system according to an embodiment of the application. As shown in fig. 1, in an application scenario of the present application, first, radar data of a measured object is collected by a millimeter wave radar deployed in a vehicle and image data of the measured object is collected by a vehicle-mounted camera deployed in the vehicle, and then, the collected radar data and the image data are input into a server (S in fig. 1) deployed with an artificial intelligence based intelligent driving vision algorithm, wherein the server can train a first convolutional neural network and a second convolutional neural network on the collected radar data and the image data based on the artificial intelligence based intelligent driving vision algorithm.
After training is completed, in an inference stage, firstly, radar data of a measured object is collected through a millimeter wave radar deployed in a vehicle and image data of the measured object is collected through a vehicle-mounted camera deployed in the vehicle, and then, the collected radar data and the image data are input into a server (such as S in fig. 1) deployed with an artificial intelligence-based intelligent driving vision algorithm, wherein the server can process the collected radar data and the image data based on the artificial intelligence-based intelligent driving vision algorithm to output a classification result of a class label representing an object to be recognized in the image data.
Having described the general principles of the present application, various non-limiting embodiments of the present application will now be described with reference to the accompanying drawings.
Exemplary System
FIG. 2 illustrates a block diagram of an artificial intelligence based intelligent driving vision system according to an embodiment of the application.
As shown in fig. 2, the intelligent driving vision system 100 based on artificial intelligence according to the embodiment of the present application includes: a training module 110 and an inference module 120.
As shown in fig. 2, the training module 110 includes: a training data acquisition unit 111 configured to acquire radar data acquired by a millimeter wave radar disposed in a vehicle and image data acquired by a vehicle-mounted camera disposed in the vehicle; a coordinate conversion unit 112, configured to convert the radar data from a radar coordinate system to a pixel coordinate system of the image data based on a conversion relationship between a radar plane and a camera plane to obtain a radar point cloud projection map; an explicit spatial coding unit 113, configured to pass the radar point cloud projection map through a first convolutional neural network as a feature extractor to obtain a radar depth feature map; a spatial attention coding unit 114 for passing the image data through a second convolutional neural network using spatial attention to obtain a spatial enhanced feature map; a feature map fusion unit 115, configured to fuse the radar depth feature map and the spatial enhancement feature map to obtain a classification feature map; a classification loss unit 116, configured to pass the classification feature map through a classifier to obtain a classification loss function value; a label scattering response loss unit 117, configured to calculate a label value scattering response loss function value of the spatial enhancement feature map, where the label value scattering response loss function value of the spatial enhancement feature map is related to a probability value obtained by the spatial enhancement feature map through the classifier; a training unit 118 for training the first convolutional neural network and the second convolutional neural network with a weighted sum of the tag value scatter response loss function value and the classification loss function value as a loss function value.
As shown in fig. 2, the inference module 120 includes: a vehicle monitoring data acquisition unit 121 configured to acquire radar data acquired by a millimeter wave radar disposed in a vehicle and image data acquired by a vehicle-mounted camera disposed in the vehicle; a space conversion unit 122, configured to convert the radar data from a radar coordinate system to a pixel coordinate system of the image data to obtain a radar point cloud projection map based on a conversion relationship between a radar plane and a camera plane; a radar data encoding unit 123, configured to pass the radar point cloud projection map through the first convolutional neural network as a feature extractor trained by the training module to obtain a radar depth feature map; an image data encoding unit 124, configured to pass the image data through a second convolutional neural network that is trained by the training module and uses spatial attention to obtain a spatially enhanced feature map; a multi-sensor data feature fusion unit 125, configured to fuse the radar depth feature map and the spatial enhanced feature map to obtain a classification feature map; and the sensing unit 126 is configured to pass the classification feature map through a classifier to obtain a classification result, where the classification result is a class label of an object to be identified in the image data.
FIG. 3A illustrates an architectural diagram of the training phase in the artificial intelligence based intelligent driving vision system according to an embodiment of the application. As shown in FIG. 3A, in the training phase of the network architecture, radar data collected by the millimeter wave radar of the vehicle is first converted from the radar coordinate system to the pixel coordinate system of the image data collected by the vehicle-mounted camera of the vehicle, based on the conversion relationship between the radar plane and the camera plane, to obtain a radar point cloud projection map. The radar point cloud projection map is passed through the first convolutional neural network serving as a feature extractor to obtain a radar depth feature map, and the image data is passed through the second convolutional neural network using spatial attention to obtain a spatial enhancement feature map. The radar depth feature map and the spatial enhancement feature map are then fused to obtain a classification feature map, which is passed through the classifier to obtain a classification loss function value. The label value scattering response loss function value of the spatial enhancement feature map is then calculated, and the first convolutional neural network and the second convolutional neural network are trained with the weighted sum of the label value scattering response loss function value and the classification loss function value as the loss function value.
FIG. 3B illustrates an architectural diagram of the inference stage in the artificial intelligence based intelligent driving vision system according to an embodiment of the application. As shown in FIG. 3B, in the inference stage of the network architecture, radar data collected by the millimeter wave radar of the vehicle is first converted from the radar coordinate system to the pixel coordinate system of the image data collected by the vehicle-mounted camera of the vehicle, based on the conversion relationship between the radar plane and the camera plane, to obtain a radar point cloud projection map. The radar point cloud projection map is passed through the trained first convolutional neural network to obtain a radar depth feature map, and the image data is passed through the trained second convolutional neural network to obtain a spatial enhancement feature map. The radar depth feature map and the spatial enhancement feature map are then fused to obtain a classification feature map, which is passed through the classifier to obtain a classification result, the classification result being the class label of the object to be identified in the image data.
Specifically, in the training module 110, the training data acquisition unit 111 is configured to acquire radar data acquired by a millimeter wave radar disposed in a vehicle and image data acquired by an on-vehicle camera disposed in the vehicle. Accordingly, in the technical solution of the present application, the inventor of the present application considers that, compared to a conventional camera, a millimeter wave radar can still maintain stable distance and speed measurement performance under severe weather and lighting conditions, and therefore, if information (i.e., radar data) detected by the millimeter wave radar and information (i.e., image data) detected by a vehicle-mounted camera can be encoded and fused in a proper manner, it is beneficial to improve the accuracy of vehicle target detection to support intelligent driving.
The coordinate conversion unit 112 is configured to convert the radar data from the radar coordinate system to the pixel coordinate system of the image data based on the conversion relationship between the radar plane and the camera plane to obtain a radar point cloud projection map. Considering that the radar data and the image data are not in the same coordinate system, the radar data is first spatially mapped to convert it into the pixel coordinate system of the image data, thereby obtaining the radar point cloud projection map.
In some embodiments of the present application, the coordinate conversion unit 112 includes:
a calibration unit for calibrating the vehicle-mounted camera and jointly calibrating the vehicle-mounted camera and the millimeter wave radar to obtain sensor parameters; and
a conversion unit for spatially mapping the radar data based on the sensor parameters to obtain the radar point cloud projection map.
Specifically, the vehicle-mounted camera is first calibrated and jointly calibrated with the millimeter wave radar to obtain sensor parameters; the radar data is then spatially mapped based on these sensor parameters to obtain the radar point cloud projection map. That is, the radar data is converted into the same coordinate system as the image data based on the conversion relationship between the radar plane and the camera plane to obtain the radar point cloud projection map.
The explicit spatial coding unit 113 is configured to pass the radar point cloud projection map through the first convolutional neural network serving as a feature extractor to obtain the radar depth feature map. That is, a convolutional neural network, which performs excellently in the field of image feature extraction, is used as a feature extractor to extract the high-dimensional implicit associations between pixels in the radar point cloud projection map, thereby obtaining the radar depth feature map.
In some embodiments of the present application, the explicit spatial coding unit 113 is further configured to use the layers of the first convolutional neural network to perform the following operations on the input data in forward passes of the layers: performing convolution processing on the input data to obtain a convolution feature map; performing global pooling based on a feature matrix on the convolution feature map to obtain a pooled feature vector; and performing nonlinear activation on the feature value at each position in the pooled feature vector to obtain an activation feature vector; wherein the output of the last layer of the first convolutional neural network is the radar depth feature map, and the input of the first layer of the first convolutional neural network is the radar point cloud projection map.
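As an illustrative sketch only, a feature extractor of this kind could be written as follows; the layer count, channel widths and the use of max pooling with ReLU activation are assumptions of the sketch, not values fixed by the application.

```python
import torch
import torch.nn as nn

class RadarFeatureExtractor(nn.Module):
    """Illustrative first convolutional neural network: encodes the radar
    point cloud projection map into a radar depth feature map."""

    def __init__(self, in_channels: int = 1, channels=(32, 64, 128)):
        super().__init__()
        layers, c_in = [], in_channels
        for c_out in channels:
            layers += [
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),  # convolution processing
                nn.MaxPool2d(kernel_size=2),                       # pooling
                nn.ReLU(inplace=True),                             # nonlinear activation
            ]
            c_in = c_out
        self.body = nn.Sequential(*layers)

    def forward(self, radar_projection: torch.Tensor) -> torch.Tensor:
        # Input: radar point cloud projection map; output: radar depth feature map.
        return self.body(radar_projection)
```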
The spatial attention coding unit 114 is configured to pass the image data through the second convolutional neural network using spatial attention to obtain the spatial enhancement feature map. In particular, in the technical solution of the present application, considering that the image data contains interference information other than the measured object, such as street background and environmental interference, and in order to make the measured object in the image more identifiable, the image data is encoded with explicit spatial enhancement using the convolutional neural network with spatial attention to obtain the spatial enhancement feature map.
Specifically, the spatial attention coding unit 114 is further configured to use the layers of the second convolutional neural network to perform the following operations on the input data in forward passes of the layers: performing convolution processing on the input data to obtain a convolution feature map; performing global pooling based on a feature matrix on the convolution feature map to obtain a pooled feature vector; and performing nonlinear activation on the feature value at each position in the pooled feature vector to obtain an activation feature vector; wherein the output of the last layer of the second convolutional neural network is the spatial enhancement feature map, and the input of the first layer of the second convolutional neural network is the image data.
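The exact form of the spatial attention is not fixed by the application; the sketch below assumes a common formulation in which channel-pooled statistics are convolved and passed through a sigmoid to produce a per-position attention map that reweights the backbone features.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Assumed spatial-attention block: pool along channels, convolve, apply a
    sigmoid, and reweight the feature map position by position."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg_map = x.mean(dim=1, keepdim=True)   # channel-wise average
        max_map = x.amax(dim=1, keepdim=True)   # channel-wise maximum
        attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
        return x * attn                          # spatial enhancement feature map

class ImageEncoderWithSpatialAttention(nn.Module):
    """Illustrative second convolutional neural network: encodes the camera
    image into a spatial enhancement feature map."""

    def __init__(self, in_channels: int = 3, channels=(32, 64, 128)):
        super().__init__()
        layers, c_in = [], in_channels
        for c_out in channels:
            layers += [nn.Conv2d(c_in, c_out, 3, padding=1), nn.MaxPool2d(2), nn.ReLU(inplace=True)]
            c_in = c_out
        self.body = nn.Sequential(*layers)
        self.attention = SpatialAttention()

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.attention(self.body(image))
```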
The feature map fusion unit 115 and the classification loss unit 116 are configured to fuse the radar depth feature map and the spatial enhancement feature map to obtain a classification feature map, and to pass the classification feature map through a classifier to obtain a classification loss function value. Considering that both the image data and the radar point cloud projection map contain the measured object, the spatial enhancement feature map and the radar depth feature map have a certain consistency and correlation in the high-dimensional feature space. Therefore, the radar depth feature map and the spatial enhancement feature map are fused to obtain a classification feature map containing both image information and millimeter wave information. The classification feature map is then passed through the classifier to obtain a classification result representing the class label of the measured object.
Specifically, the feature map fusion unit 115 is further configured to: fusing the radar depth feature map and the spatial enhancement feature map to obtain a classification feature map according to the following formula:
F = λF1 + F2
wherein F is the classification feature map, F1 is the radar depth feature map, F2 is the spatial enhancement feature map, "+" denotes element-wise addition at the corresponding positions of the radar depth feature map and the spatial enhancement feature map, and λ is a weighting parameter for controlling the balance between the radar depth feature map and the spatial enhancement feature map in the classification feature map.
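A minimal sketch of this fusion, assuming the two feature maps have the same shape (which requires the two networks to be sized accordingly) and using λ = 0.5 purely as an illustrative default:

```python
import torch

def fuse_feature_maps(radar_depth_map: torch.Tensor,
                      spatial_enhancement_map: torch.Tensor,
                      lam: float = 0.5) -> torch.Tensor:
    """F = lam * F1 + F2: element-wise weighted addition of the radar depth
    feature map (F1) and the spatial enhancement feature map (F2)."""
    return lam * radar_depth_map + spatial_enhancement_map
```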
Specifically, the classification loss unit 116 is further configured to pass the classification feature map through the classifier to obtain the classification loss function value by: performing fully connected encoding on the classification feature map using a plurality of fully connected layers of the classifier to obtain a classification feature vector; inputting the classification feature vector into a Softmax classification function to obtain the probability values of the classification feature vector belonging to each class label of the object to be recognized in the image data; determining the class label corresponding to the maximum probability value as the classification result; and calculating the cross-entropy value between the classification result and the ground-truth label as the classification loss function value.
That is, in the training phase, the classification feature map is used as the input of the classifier, and the outputs are the classification result and the classification loss function value. The classification result may be used to represent the class label of the object to be identified in the image data, and the classification loss function value, combined with the label value scattering response loss function value of the spatial enhancement feature map, may be used as the loss function value to train the first convolutional neural network and the second convolutional neural network.
Specifically, the classification feature map is processed by the classifier according to the following formula to generate the classification result: softmax{(Wn, Bn) : … : (W1, B1) | Project(F)}, where Project(F) denotes the projection of the classification feature map into a vector, W1 to Wn are the weight matrices of the fully connected layers, and B1 to Bn are the bias matrices of the fully connected layers.
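As an illustration only, a classifier of this form can be sketched as follows; the hidden width, the number of fully connected layers and the flattened feature dimension are assumptions of the sketch, and the cross-entropy criterion corresponds to the classification loss function value described above.

```python
import torch
import torch.nn as nn

class Classifier(nn.Module):
    """Illustrative classifier: Project(F) flattens the classification feature
    map into a vector, fully connected layers (W_i, B_i) encode it, and Softmax
    (applied inside the loss or at inference) yields per-class probabilities."""

    def __init__(self, feature_dim: int, num_classes: int, hidden: int = 256):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(feature_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, classification_feature_map: torch.Tensor) -> torch.Tensor:
        v = torch.flatten(classification_feature_map, start_dim=1)  # Project(F)
        return self.fc(v)                                           # class logits

# Classification loss function value: cross entropy between the predicted
# probabilities and the ground-truth class labels.
criterion = nn.CrossEntropyLoss()
```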
However, for the spatial enhancement feature map obtained using the spatial attention mechanism, the spatial attention mechanism imposes a spatial enhancement constraint on the set of feature values, which shifts the spatial positions of the feature distribution. During classification, the feature distribution is therefore position-sensitive with respect to the label value, so a label value scattering response loss function is calculated for the spatial enhancement feature map F. When the second convolutional neural network using the spatial attention mechanism is then trained with this loss function, the interpretability of the classification solution with respect to the model's feature extraction is improved from a response-like perspective, which in turn improves the iterative optimization of the second convolutional neural network's feature extraction toward the classification solution.
Specifically, the label scattering response loss unit 117 and the training unit 118 are configured to calculate a label value scattering response loss function value of the spatial enhanced feature map, and train the first convolutional neural network and the second convolutional neural network with a weighted sum of the label value scattering response loss function value and the classification loss function value as a loss function value, where the label value scattering response loss function value of the spatial enhanced feature map is related to a probability value obtained by the spatial enhanced feature map through the classifier.
The label scattering response loss unit 117 is further configured to calculate the label value scattering response loss function value of the spatial enhancement feature map from the label value j, the feature values f of the spatial enhancement feature map F, and the probability value p of each feature f under the label, as defined above.
Based on the scattering response of the feature value positions relative to the label probability, this loss function stacks the feature distribution into a depth structure within the solution space of the classification problem defined by the feature and label values. When the second convolutional neural network using the spatial attention mechanism is trained with this loss function, the interpretability of the classification solution with respect to the model's feature extraction is improved from a response-like perspective, which in turn improves the iterative optimization of the second convolutional neural network's feature extraction toward the classification solution.
In some optional embodiments of the present application, the training unit 118 is further configured to update the parameters of the first and second convolutional neural networks simultaneously in each training iteration, by back-propagating the gradient of the loss function value with gradient descent. That is, during training, the parameters of the first convolutional neural network and the second convolutional neural network can be updated simultaneously to realize synchronous training, which improves the cooperation between the two networks. Of course, in some optional embodiments, the first convolutional neural network and the second convolutional neural network may also be trained separately, for example by first updating the parameters of the first convolutional neural network and then those of the second convolutional neural network, or vice versa, so that the training process can be more focused, which is beneficial to improving the performance of the system.
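For illustration, one joint training step could look like the sketch below. The helper `label_scattering_response_loss` is a placeholder for the label value scattering response loss of the application, whose exact formula is given there as an image and is not reproduced here; the weights alpha and beta of the weighted sum, and the fusion weight λ = 0.5, are likewise assumptions of the sketch.

```python
import torch

def train_step(radar_net, image_net, classifier, optimizer, criterion,
               label_scattering_response_loss, radar_projection, image, labels,
               alpha: float = 1.0, beta: float = 1.0):
    """One joint update of the first and second convolutional neural networks
    (and the classifier) using the weighted sum of the two loss terms."""
    radar_feat = radar_net(radar_projection)   # radar depth feature map
    image_feat = image_net(image)              # spatial enhancement feature map
    fused = 0.5 * radar_feat + image_feat      # classification feature map (lambda = 0.5 assumed)
    logits = classifier(fused)

    classification_loss = criterion(logits, labels)
    scattering_loss = label_scattering_response_loss(image_feat, logits, labels)
    loss = alpha * scattering_loss + beta * classification_loss  # weighted sum

    optimizer.zero_grad()
    loss.backward()    # back-propagation of the gradient
    optimizer.step()   # simultaneous parameter update for both networks
    return loss.item()
```

The optimizer is assumed to hold the parameters of both networks and the classifier, so a single `step()` updates them simultaneously, matching the synchronous training described above.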
After training is completed, the inference phase is entered. That is, during inference, the radar data acquired by the millimeter wave radar deployed in the vehicle and the image data acquired by the vehicle-mounted camera deployed in the vehicle may be directly input into the trained first and second convolutional neural networks for feature extraction to obtain a classification feature map, which is then classified by the classifier to obtain a classification result representing the class label of the object to be recognized in the image data.
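A corresponding inference sketch, assuming the same fusion weight used during training and a single-sample batch; the function and variable names are illustrative:

```python
import torch

@torch.no_grad()
def infer(radar_net, image_net, classifier, radar_projection, image, class_names):
    """Encode both modalities with the trained networks, fuse, classify, and
    return the class label of the object to be recognized."""
    radar_net.eval(); image_net.eval(); classifier.eval()
    radar_feat = radar_net(radar_projection)
    image_feat = image_net(image)
    fused = 0.5 * radar_feat + image_feat            # classification feature map
    probs = torch.softmax(classifier(fused), dim=1)  # per-class probabilities
    return class_names[int(probs.argmax(dim=1)[0])]  # class label of the object
```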
In summary, the intelligent driving vision system based on artificial intelligence according to the embodiment of the present application is illustrated, which encodes radar data collected by a millimeter wave radar through a first convolutional neural network serving as a feature extractor to obtain a radar depth feature map, encodes image data collected by a vehicle-mounted camera through a second convolutional neural network using spatial attention to obtain a spatial enhanced feature map, then fuses the radar depth feature map and the spatial enhanced feature map to obtain a classification feature map, passes the classification feature map through a classifier to obtain a classification loss function value, and trains the first convolutional neural network and the second convolutional neural network through a label value scattering response loss function value of the spatial enhanced feature map and a weighted sum of the classification loss function values as a loss function value, thereby facilitating improvement of vehicle target detection accuracy to support intelligent driving.
It is to be understood that some or all of the steps or operations in the above-described embodiments are merely examples, and that other operations or variations of the various operations may be performed in embodiments of the present application. Further, the various steps may be performed in a different order than presented in the above-described embodiments, and it is possible that not all of the operations in the above-described embodiments are performed.
Exemplary method
FIG. 4A illustrates a flow chart of the training phase in the artificial intelligence based intelligent driving vision method according to an embodiment of the present application. As shown in FIG. 4A, the artificial intelligence based intelligent driving vision method according to the embodiment of the application includes a training phase comprising: S101, acquiring radar data collected by a millimeter wave radar deployed in a vehicle and image data collected by a vehicle-mounted camera deployed in the vehicle; S102, converting the radar data from a radar coordinate system to a pixel coordinate system of the image data based on a conversion relationship between a radar plane and a camera plane to obtain a radar point cloud projection map; S103, passing the radar point cloud projection map through a first convolutional neural network serving as a feature extractor to obtain a radar depth feature map; S104, passing the image data through a second convolutional neural network using spatial attention to obtain a spatial enhancement feature map; S105, fusing the radar depth feature map and the spatial enhancement feature map to obtain a classification feature map; S106, passing the classification feature map through a classifier to obtain a classification loss function value; S107, calculating a label value scattering response loss function value of the spatial enhancement feature map, wherein the label value scattering response loss function value of the spatial enhancement feature map is related to the probability value obtained by the spatial enhancement feature map through the classifier; and S108, training the first convolutional neural network and the second convolutional neural network with the weighted sum of the label value scattering response loss function value and the classification loss function value as the loss function value.
FIG. 4B illustrates a flow chart of the inference phase in the artificial intelligence based intelligent driving vision method according to an embodiment of the application. As shown in FIG. 4B, the artificial intelligence based intelligent driving vision method according to the embodiment of the application includes an inference phase comprising: S201, acquiring radar data collected by a millimeter wave radar deployed in a vehicle and image data collected by a vehicle-mounted camera deployed in the vehicle; S202, converting the radar data from a radar coordinate system to a pixel coordinate system of the image data based on a conversion relationship between a radar plane and a camera plane to obtain a radar point cloud projection map; S203, passing the radar point cloud projection map through the first convolutional neural network, trained in the training phase and serving as a feature extractor, to obtain a radar depth feature map; S204, passing the image data through the second convolutional neural network, trained in the training phase and using spatial attention, to obtain a spatial enhancement feature map; S205, fusing the radar depth feature map and the spatial enhancement feature map to obtain a classification feature map; and S206, passing the classification feature map through a classifier to obtain a classification result, the classification result being the class label of the object to be identified in the image data.
In one example, in the above-mentioned artificial intelligence based intelligent driving vision method, converting the radar data from the radar coordinate system to the pixel coordinate system of the image data based on the conversion relationship between the radar plane and the camera plane to obtain the radar point cloud projection map includes: calibrating the vehicle-mounted camera and jointly calibrating the vehicle-mounted camera and the millimeter wave radar to obtain sensor parameters; and spatially mapping the radar data based on the sensor parameters to obtain the radar point cloud projection map.
In one example, in the above artificial intelligence based intelligent driving vision method, passing the radar point cloud projection map through a first convolutional neural network as a feature extractor to obtain a radar depth feature map includes: using the layers of the first convolutional neural network to perform the following operations on the input data in forward passes of the layers: performing convolution processing on the input data to obtain a convolution feature map; performing global pooling based on a feature matrix on the convolution feature map to obtain a pooled feature vector; and performing nonlinear activation on the feature value at each position in the pooled feature vector to obtain an activation feature vector; wherein the output of the last layer of the first convolutional neural network is the radar depth feature map, and the input of the first layer of the first convolutional neural network is the radar point cloud projection map.
In one example, in the above intelligent driving vision method based on artificial intelligence, passing the image data through a second convolutional neural network using spatial attention to obtain a spatial enhancement feature map includes: using the layers of the second convolutional neural network to perform the following operations on the input data in forward passes of the layers: performing convolution processing on the input data to obtain a convolution feature map; performing global pooling based on a feature matrix on the convolution feature map to obtain a pooled feature vector; and performing nonlinear activation on the feature value at each position in the pooled feature vector to obtain an activation feature vector; wherein the output of the last layer of the second convolutional neural network is the spatial enhancement feature map, and the input of the first layer of the second convolutional neural network is the image data.
In one example, in the above-mentioned artificial intelligence-based intelligent driving vision method, the radar depth feature map and the spatial enhancement feature map are fused to obtain a classification feature map according to the following formula:
F = λF1 + F2
wherein F is the classification feature map, F1 is the radar depth feature map, F2 is the spatial enhancement feature map, "+" denotes element-wise addition at the corresponding positions of the radar depth feature map and the spatial enhancement feature map, and λ is a weighting parameter for controlling the balance between the radar depth feature map and the spatial enhancement feature map in the classification feature map.
In one example, in the above intelligent driving vision method based on artificial intelligence, passing the classification feature map through a classifier to obtain a classification loss function value includes: performing fully connected encoding on the classification feature map using a plurality of fully connected layers of the classifier to obtain a classification feature vector; inputting the classification feature vector into a Softmax classification function to obtain the probability values of the classification feature vector belonging to each class label of the object to be recognized in the image data; determining the class label corresponding to the maximum probability value as the classification result; and calculating the cross-entropy value between the classification result and the ground-truth label as the classification loss function value.
In one example, in the above artificial intelligence based intelligent driving vision method, the label value scattering response loss function value of the spatial enhancement feature map is calculated from the label value j, the feature values f of the spatial enhancement feature map F, and the probability value p of each feature f under the label, as described for the training module above.
Here, it will be understood by those skilled in the art that the specific functions and steps of the above-described artificial intelligence based intelligent driving vision method have been described in detail in the above description of the artificial intelligence based intelligent driving vision system with reference to FIGS. 2, 3A and 3B, and thus a repetitive description thereof is omitted.
As described above, the artificial intelligence based intelligent driving vision system 100 according to the embodiment of the present application may be implemented in various terminal devices, such as a server for artificial intelligence based intelligent driving vision. In one example, the artificial intelligence based intelligent driving vision system 100 according to embodiments of the present application may be integrated into a terminal device as a software module and/or a hardware module. For example, the artificial intelligence based intelligent driving vision system 100 may be a software module in the operating system of the terminal device, or may be an application developed for the terminal device; of course, the artificial intelligence based intelligent driving vision system 100 may also be one of many hardware modules of the terminal device.
Alternatively, in another example, the artificial intelligence based intelligent driving vision system 100 and the terminal device may also be separate devices, and the artificial intelligence based intelligent driving vision system 100 may be connected to the terminal device through a wired and/or wireless network and transmit the interactive information in an agreed data format.
Exemplary electronic device
Next, an electronic apparatus according to an embodiment of the present application is described with reference to fig. 5.
FIG. 5 illustrates a block diagram of an electronic device in accordance with an embodiment of the present application.
As shown in fig. 5, the electronic device 10 includes one or more processors 11 and memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other form of connection mechanism (not shown).
The input device 13 may include, for example, a keyboard, a mouse, and the like.
The output device 14 can output various information including classification results or warning prompts to the outside. The output devices 14 may include, for example, a display, speakers, a printer, and a communication network and its connected remote output devices, among others.
Of course, for simplicity, only some of the components of the electronic device 10 relevant to the present application are shown in fig. 5, and components such as buses, input/output interfaces, and the like are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage medium
In addition to the above-described methods and apparatus, embodiments of the present application may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps of the artificial intelligence-based intelligent driving vision method according to various embodiments of the present application described in the "exemplary methods" section of this specification above.
The computer program product may be written with program code for performing the operations of embodiments of the present application in any combination of one or more programming languages, including an object-oriented programming language such as Java or C++ and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps of the artificial intelligence-based intelligent driving vision method according to various embodiments of the present application described in the "exemplary methods" section above of this specification.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The foregoing describes the general principles of the present application in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present application are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present application. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the foregoing disclosure is not intended to be exhaustive or to limit the disclosure to the precise details disclosed.
The block diagrams of devices, apparatuses, and systems referred to in this application are given only as illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. These devices, apparatuses, and systems may be connected, arranged, and configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably herein. The word "or" as used herein means, and is used interchangeably with, the word "and/or," unless the context clearly dictates otherwise. The phrase "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".
It should also be noted that in the devices, apparatuses, and methods of the present application, the components or steps may be decomposed and/or recombined. These decompositions and/or recombinations are to be considered as equivalents of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, the description is not intended to limit embodiments of the application to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.
Claims (10)
1. An intelligent driving vision system based on artificial intelligence, comprising: a training module comprising: a training data acquisition unit for acquiring radar data acquired by a millimeter wave radar deployed on a vehicle and image data acquired by a vehicle-mounted camera deployed on the vehicle; a coordinate conversion unit for converting the radar data from a radar coordinate system to a pixel coordinate system of the image data based on a conversion relation between a radar plane and a camera plane to obtain a radar point cloud projection map; an explicit spatial coding unit for passing the radar point cloud projection map through a first convolutional neural network serving as a feature extractor to obtain a radar depth feature map; a spatial attention coding unit for passing the image data through a second convolutional neural network using spatial attention to obtain a spatial enhancement feature map; a feature map fusion unit for fusing the radar depth feature map and the spatial enhancement feature map to obtain a classification feature map; a classification loss unit for passing the classification feature map through a classifier to obtain a classification loss function value; a label scattering response loss unit for calculating a label value scattering response loss function value of the spatial enhancement feature map, wherein the label value scattering response loss function value of the spatial enhancement feature map is related to a probability value obtained by the spatial enhancement feature map through the classifier; and a training unit for training the first convolutional neural network and the second convolutional neural network with a weighted sum of the label value scattering response loss function value and the classification loss function value as a loss function value; and an inference module comprising: a vehicle monitoring data acquisition unit for acquiring radar data acquired by the millimeter wave radar deployed on the vehicle and image data acquired by the vehicle-mounted camera deployed on the vehicle; a space conversion unit for converting the radar data from the radar coordinate system to the pixel coordinate system of the image data based on the conversion relation between the radar plane and the camera plane to obtain a radar point cloud projection map; a radar data coding unit for passing the radar point cloud projection map through the first convolutional neural network trained by the training module and serving as the feature extractor to obtain a radar depth feature map; an image data coding unit for passing the image data through the second convolutional neural network trained by the training module and using spatial attention to obtain a spatial enhancement feature map; a multi-sensor data feature fusion unit for fusing the radar depth feature map and the spatial enhancement feature map to obtain a classification feature map; and a sensing unit for passing the classification feature map through the classifier to obtain a classification result, wherein the classification result is a class label of an object to be identified in the image data.
2. The intelligent driving vision system based on artificial intelligence of claim 1, wherein the coordinate conversion unit comprises: a calibration unit for calibrating the vehicle-mounted camera and performing joint calibration of the vehicle-mounted camera and the millimeter wave radar to obtain sensor parameters; and a conversion unit for performing spatial mapping on the radar data based on the sensor parameters to obtain the radar point cloud projection map.
3. The intelligent driving vision system based on artificial intelligence of claim 2, wherein the explicit spatial coding unit is further configured to: use the layers of the first convolutional neural network to perform, in forward passes of the layers, on the input data: convolution processing on the input data to obtain a convolution feature map; global pooling based on a feature matrix on the convolution feature map to obtain a pooled feature vector; and nonlinear activation of the feature value of each position in the pooled feature vector to obtain an activated feature vector; wherein the output of the last layer of the first convolutional neural network is the radar depth feature map, and the input of the first layer of the first convolutional neural network is the radar point cloud projection map.
4. The artificial intelligence-based intelligent driving vision system of claim 3, wherein the spatial attention coding unit is further configured to: use the layers of the second convolutional neural network to perform, in forward passes of the layers, on the input data: convolution processing on the input data to obtain a convolution feature map; global pooling based on a feature matrix on the convolution feature map to obtain a pooled feature vector; and nonlinear activation of the feature value of each position in the pooled feature vector to obtain an activated feature vector; wherein the output of the last layer of the second convolutional neural network is the spatial enhancement feature map, and the input of the first layer of the second convolutional neural network is the image data.
5. The artificial intelligence-based intelligent driving vision system of claim 4, wherein the feature map fusion unit is further configured to fuse the radar depth feature map and the spatial enhancement feature map to obtain the classification feature map according to the following formula:
F = λF₁ + F₂
where F is the classification feature map, F₁ is the radar depth feature map, F₂ is the spatial enhancement feature map, "+" indicates element-wise addition at corresponding positions of the radar depth feature map and the spatial enhancement feature map, and λ is a weighting parameter for controlling the balance between the radar depth feature map and the spatial enhancement feature map in the classification feature map.
6. The artificial intelligence-based intelligent driving vision system of claim 5, wherein the classification loss unit is further configured to: perform full-connection encoding on the classification feature map using a plurality of fully connected layers of the classifier to obtain a classification feature vector; input the classification feature vector into a Softmax classification function to obtain the probability values of the classification feature vector belonging to the respective class labels of the object to be recognized in the image data; determine the class label with the maximum probability value as a classification result; and calculate a cross-entropy value between the classification result and the true value as the classification loss function value.
7. The intelligent driving vision system based on artificial intelligence of claim 6, wherein the label scattering response loss unit is further configured to calculate the label value scattering response loss function value of the spatial enhancement feature map with a formula that appears in the original filing only as an image and is not reproduced here.
8. An artificial intelligence-based intelligent driving vision method, comprising: a training phase comprising: acquiring radar data acquired by a millimeter wave radar deployed on a vehicle and image data acquired by a vehicle-mounted camera deployed on the vehicle; converting the radar data from a radar coordinate system to a pixel coordinate system of the image data based on a conversion relation between a radar plane and a camera plane to obtain a radar point cloud projection map; passing the radar point cloud projection map through a first convolutional neural network serving as a feature extractor to obtain a radar depth feature map; passing the image data through a second convolutional neural network using spatial attention to obtain a spatial enhancement feature map; fusing the radar depth feature map and the spatial enhancement feature map to obtain a classification feature map; passing the classification feature map through a classifier to obtain a classification loss function value; calculating a label value scattering response loss function value of the spatial enhancement feature map, wherein the label value scattering response loss function value of the spatial enhancement feature map is related to a probability value obtained by the spatial enhancement feature map through the classifier; and training the first convolutional neural network and the second convolutional neural network with a weighted sum of the label value scattering response loss function value and the classification loss function value as a loss function value; and an inference phase comprising: acquiring radar data acquired by the millimeter wave radar deployed on the vehicle and image data acquired by the vehicle-mounted camera deployed on the vehicle; converting the radar data from the radar coordinate system to the pixel coordinate system of the image data based on the conversion relation between the radar plane and the camera plane to obtain a radar point cloud projection map; passing the radar point cloud projection map through the first convolutional neural network trained in the training phase and serving as the feature extractor to obtain a radar depth feature map; passing the image data through the second convolutional neural network trained in the training phase and using spatial attention to obtain a spatial enhancement feature map; fusing the radar depth feature map and the spatial enhancement feature map to obtain a classification feature map; and passing the classification feature map through the classifier to obtain a classification result, wherein the classification result is a class label of an object to be identified in the image data.
9. The artificial intelligence-based intelligent driving vision method of claim 8, wherein passing the classification feature map through a classifier to obtain a classification loss function value comprises: performing full-connection encoding on the classification feature map using a plurality of fully connected layers of the classifier to obtain a classification feature vector; inputting the classification feature vector into a Softmax classification function to obtain the probability values of the classification feature vector belonging to the respective class labels of the object to be recognized in the image data; determining the class label with the maximum probability value as a classification result; and calculating a cross-entropy value between the classification result and the true value as the classification loss function value.
10. The artificial intelligence-based intelligent driving vision method of claim 9, wherein the label value scattering response loss function value of the spatial enhancement feature map is calculated with a formula that appears in the original filing only as an image and is not reproduced here.
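Purely as an illustration of the training unit of claim 1 (and the training phase of claim 8), the following Python sketch combines the classification loss and the label value scattering response loss into a weighted sum; the function names, the placeholder scatter_loss_fn, and the weights alpha and beta are assumptions, not part of the claimed system.

```python
import torch
import torch.nn.functional as F

def training_step(first_cnn, second_cnn, classifier, scatter_loss_fn, optimizer,
                  radar_projection, image_data, target, alpha=1.0, beta=0.1):
    """One optimization step over the weighted sum of the two loss values.
    first_cnn, second_cnn, and classifier are torch.nn.Module instances defined
    elsewhere; scatter_loss_fn stands in for the label value scattering response
    loss, whose exact formula is not reproduced in the text; alpha, beta, and the
    fusion weight 0.5 are assumed values."""
    radar_depth = first_cnn(radar_projection)      # radar depth feature map
    spatial_enhanced = second_cnn(image_data)      # spatial enhancement feature map
    fused = 0.5 * radar_depth + spatial_enhanced   # classification feature map (lambda = 0.5)
    logits = classifier(fused)
    cls_loss = F.cross_entropy(logits, target)     # classification loss function value
    scatter_loss = scatter_loss_fn(spatial_enhanced, classifier)
    loss = alpha * cls_loss + beta * scatter_loss  # weighted sum used for training
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```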
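As a hedged illustration of the coordinate conversion unit of claim 2, the sketch below projects radar points into the pixel coordinate system under a pinhole camera model, assuming a 3×3 intrinsic matrix and a 4×4 radar-to-camera extrinsic transform obtained from the joint calibration; the matrices shown are placeholders, not calibration results from the filing.

```python
import numpy as np

def radar_to_pixel(points_radar: np.ndarray, extrinsic: np.ndarray,
                   intrinsic: np.ndarray) -> np.ndarray:
    """Project radar points of shape (N, 3) into the pixel coordinate system.
    extrinsic is the 4x4 radar-to-camera transform and intrinsic the 3x3 camera
    matrix from joint calibration; every numerical value below is a placeholder."""
    homogeneous = np.hstack([points_radar, np.ones((points_radar.shape[0], 1))])  # (N, 4)
    cam = (extrinsic @ homogeneous.T)[:3]   # points in camera coordinates, shape (3, N)
    pix = intrinsic @ cam                   # perspective projection onto the image plane
    pix = pix[:2] / pix[2]                  # divide by depth to get pixel (u, v)
    return pix.T                            # (N, 2) pixel positions for the projection map

K = np.array([[800.0, 0.0, 640.0], [0.0, 800.0, 360.0], [0.0, 0.0, 1.0]])  # placeholder intrinsics
T = np.eye(4)                                                              # placeholder extrinsics
uv = radar_to_pixel(np.array([[1.0, 0.2, 10.0]]), T, K)
```

In practice the resulting (u, v) positions would typically be rasterized, together with per-point radar attributes, to form the radar point cloud projection map fed to the first convolutional neural network.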
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211626461.0A CN115861969A (en) | 2022-12-16 | 2022-12-16 | Intelligent driving vision system and method based on artificial intelligence |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211626461.0A CN115861969A (en) | 2022-12-16 | 2022-12-16 | Intelligent driving vision system and method based on artificial intelligence |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115861969A true CN115861969A (en) | 2023-03-28 |
Family
ID=85673841
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211626461.0A Withdrawn CN115861969A (en) | 2022-12-16 | 2022-12-16 | Intelligent driving vision system and method based on artificial intelligence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115861969A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116168348A (en) * | 2023-04-21 | 2023-05-26 | 成都睿瞳科技有限责任公司 | Security monitoring method, system and storage medium based on image processing |
CN116168348B (en) * | 2023-04-21 | 2024-01-30 | 成都睿瞳科技有限责任公司 | Security monitoring method, system and storage medium based on image processing |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109508580B (en) | Traffic signal lamp identification method and device | |
CN112233097B (en) | Road scene other vehicle detection system and method based on space-time domain multi-dimensional fusion | |
CN113362329B (en) | Method for training focus detection model and method for recognizing focus in image | |
CN113673425B (en) | Multi-view target detection method and system based on Transformer | |
CN113468978B (en) | Fine granularity car body color classification method, device and equipment based on deep learning | |
CN115171317A (en) | Internet of things smart home method and system and electronic equipment | |
CN111476099A (en) | Target detection method, target detection device and terminal equipment | |
CN112444822A (en) | Generation of synthetic lidar signals | |
CN114926693A (en) | SAR image small sample identification method and device based on weighted distance | |
CN115393680A (en) | 3D target detection method and system for multi-mode information space-time fusion in foggy day scene | |
CN116612103B (en) | Intelligent detection method and system for building structure cracks based on machine vision | |
CN115861969A (en) | Intelligent driving vision system and method based on artificial intelligence | |
CN117233746A (en) | Target identification system and method based on active and passive hybrid imaging | |
CN115146676A (en) | Circuit fault detection method and system | |
CN114005110A (en) | 3D detection model training method and device, and 3D detection method and device | |
CN115937792B (en) | Intelligent community operation management system based on block chain | |
CN115984646A (en) | Distributed target detection method and device for remote sensing cross-satellite observation and satellite | |
Andika et al. | Improved feature extraction network in lightweight YOLOv7 model for real-time vehicle detection on low-cost hardware | |
CN114972530A (en) | Method, device and equipment for calibrating camera based on virtual scene and storage medium | |
CN112287995B (en) | Low-resolution image recognition method based on multi-layer coupling mapping | |
CN112733786A (en) | Detection method for pin connection stability of communication transformer | |
CN112115928A (en) | Training method and detection method of neural network based on illegal parking vehicle labels | |
CN110942179A (en) | Automatic driving route planning method and device and vehicle | |
CN112766365A (en) | Training method of neural network for intelligent shadow bending detection | |
Békési | Benchmarking Generations of You Only Look Once Architectures for Detection of Defective and Normal Long Rod Insulators |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | ||
WW01 | Invention patent application withdrawn after publication |
Application publication date: 20230328 |