CN112016532A - Vehicle detection method and device

Vehicle detection method and device

Info

Publication number
CN112016532A
Authority
CN
China
Prior art keywords
vehicle
line
feature
feature map
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011136716.6A
Other languages
Chinese (zh)
Other versions
CN112016532B (English)
Inventor
刘畅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202011136716.6A priority Critical patent/CN112016532B/en
Publication of CN112016532A publication Critical patent/CN112016532A/en
Application granted granted Critical
Publication of CN112016532B publication Critical patent/CN112016532B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/584 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of vehicle lights or traffic lights
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/08 Detecting or categorising vehicles

Abstract

The application provides a vehicle detection method and apparatus, an electronic device, and a computer-readable storage medium, relating to artificial-intelligence-based automatic driving technology. The method includes: acquiring an image of a vehicle in a road environment; performing feature extraction processing on the image to obtain feature maps of the image at multiple scales; performing feature fusion processing on the multi-scale feature maps to obtain a head feature map containing multiple features; mapping the head feature map to obtain the position of a bounding box of the vehicle and the positions of key points; and determining attitude information of the vehicle according to the position of the bounding box and the positions of the key points. With the method and apparatus, the attitude information of the vehicle can be detected efficiently and accurately.

Description

Vehicle detection method and device
Technical Field
The present application relates to an automatic driving technology in the field of artificial intelligence, and in particular, to a vehicle detection method, apparatus, electronic device, and computer-readable storage medium.
Background
As an important application of artificial intelligence, automatic driving technology has developed greatly in recent years. In automatic driving, the perception system, as the "eyes and ears" of an autonomous vehicle, is an essential component. Safe driving of an autonomous vehicle depends on the accuracy, real-time performance, and robustness of the perception system. The performance of the perception system in turn depends on the sensor deployment scheme on the vehicle; common vehicle-side sensing devices include cameras, lidar, millimeter-wave radar, and the like. On the basis of correctly deployed sensors, an efficient perception algorithm provides a reliable guarantee for the automatic driving system.
Taking the scenario in which the sensors of an autonomous vehicle perceive other vehicles as an example, in the related art a camera sensor can only detect a 2D rectangular frame wrapping a vehicle, and a radar sensor can only detect radar points indicating an obstacle. Fusing these two detection results introduces errors, the detection results of the different sensor modalities on the autonomous vehicle cannot be well matched and associated, and it is difficult to derive comprehensive information about the vehicle, such as attitude information, from the detection results of the different modalities. The related art therefore lacks a detection algorithm that can efficiently and accurately detect the attitude information of a vehicle.
Disclosure of Invention
The embodiment of the application provides a vehicle detection method, a vehicle detection device, electronic equipment and a computer readable storage medium, which can efficiently and accurately detect attitude information of a vehicle.
The technical scheme of the embodiment of the application is realized as follows:
the embodiment of the application provides a vehicle detection method, which comprises the following steps:
acquiring an image of a vehicle in a road environment;
carrying out feature extraction processing on the image to obtain feature maps of the image in multiple scales;
performing feature fusion processing on the feature maps of the multiple scales to obtain a head feature map containing multiple features;
mapping the head feature map to obtain the position of a bounding box of the vehicle and the positions of key points;
and determining attitude information of the vehicle according to the position of the bounding box and the positions of the key points.
An embodiment of the present application provides a vehicle detection apparatus, including:
the acquisition module is used for acquiring images of vehicles in a road environment;
the extraction module is used for carrying out feature extraction processing on the image to obtain feature maps of the image in multiple scales;
the fusion module is used for carrying out feature fusion processing on the feature maps of the multiple scales to obtain a head feature map containing multiple features;
the mapping module is used for mapping the head feature map to obtain the position of a bounding box of the vehicle and the positions of key points;
and the first determining module is used for determining the attitude information of the vehicle according to the position of the bounding box and the positions of the key points.
In the foregoing solution, the extracting module is further configured to iterate k to perform the following processing:
carrying out downsampling processing on the image to obtain a 1st intermediate feature map;
carrying out downsampling processing on the k-th intermediate feature map to obtain a (k+1)-th intermediate feature map, until the K-th intermediate feature map is obtained;
wherein k is an integer whose value increases from 1 to a maximum of K, and K is an integer greater than or equal to 2.
In the foregoing solution, the fusion module is further configured to iterate i to perform the following processing:
carrying out up-sampling processing on the K-i intermediate feature map to obtain an intermediate feature map with the same scale as the K-i-1 intermediate feature map;
adding the intermediate feature map obtained by the up-sampling processing and the K-i-1 th intermediate feature map pixel by pixel to obtain the head feature map containing a plurality of features;
wherein, i is an integer with the value increasing from 0 in turn, and the maximum value is K-1.
In the foregoing solution, before performing the mapping process on the head feature map, a vehicle detection apparatus provided in an embodiment of the present application further includes:
the weighting module is used for encoding the head feature map so as to compress it into a one-dimensional matrix characterizing the weight parameter of each feature;
decoding the one-dimensional matrix to obtain a weight parameter of each feature;
and weighting the weight parameters into each feature of the head feature map to obtain a weighted head feature map.
In the above aspect, the plurality of features includes: the feature of the central point position, the offset feature of the central point position, the width and height feature of the bounding box and the feature of the key point position; the mapping module is further configured to perform convolution processing on the head feature map to obtain a feature representing the center point position of the vehicle, an offset feature of the center point position, a width and height feature of the bounding box, and a feature of the key point position;
predicting coordinates of the central point position of the enclosure frame through a first full-connection layer based on the characteristics of the central point position and the offset characteristics of the central point position;
predicting the width and the height of the surrounding frame through a second full-connection layer based on the width and the height characteristics of the surrounding frame;
determining the position of the surrounding frame of the vehicle according to the coordinates of the central point position of the surrounding frame and the width and the height of the surrounding frame;
and predicting the coordinates of the positions of the key points of the vehicle through a third full-connected layer based on the characteristics of the positions of the key points.
In the above solution, the key points include: a left tail lamp point and a right tail lamp point; the first determining module is further configured to determine, in a plane coordinate system containing the bounding box and the key points, a line segment meeting at least one of the following conditions as the boundary line between the vehicle body and the vehicle tail: a line segment that passes through the left tail lamp point and is parallel to the bounding-box edge passing through the right tail lamp point; a line segment that passes through the right tail lamp point and is parallel to the bounding-box edge passing through the left tail lamp point;
when the boundary line passes through the left tail lamp point, taking the region of the bounding box on the left side of the boundary line as the vehicle body region and the region on the right side of the boundary line as the vehicle tail region;
when the boundary line passes through the right tail lamp point, taking the region of the bounding box on the right side of the boundary line as the vehicle body region and the region on the left side of the boundary line as the vehicle tail region.
In the above solution, the key points include: a left front tire grounding point, a right front tire grounding point, a left rear tire grounding point, and a right rear tire grounding point; the vehicle detection apparatus provided by the embodiment of the present application further includes:
a second determining module, configured to take the line connecting the left front tire grounding point and the left rear tire grounding point as a left-side wheel grounding line;
take the line connecting the right front tire grounding point and the right rear tire grounding point as a right-side wheel grounding line;
and determine the positional relationship between the host vehicle and the vehicle according to the relative relationship between the left-side wheel grounding line, the right-side wheel grounding line, and the lane line of the host vehicle.
In the foregoing aspect, the second determination module is configured to determine that the vehicle is in a left lane of the host vehicle when the right wheel ground line is on the left side of a lane line of the host vehicle and the right wheel ground line is parallel to the lane line of the host vehicle;
determining that the vehicle is in a right lane of the host vehicle when the left wheel ground line is on the right side of the lane line of the host vehicle and the left wheel ground line is parallel to the lane line of the host vehicle;
determining that the vehicle is in the same lane of the host vehicle when the left wheel ground line is on the right side of the lane line of the host vehicle and the right wheel ground line is on the left side of the lane line of the host vehicle;
determining that the vehicle changes lanes from a right lane of the host vehicle to a same lane of the host vehicle when the left wheel ground line is to the left of the lane line of the host vehicle and the left wheel ground line is not parallel to the lane line of the host vehicle;
when the right wheel grounding line is on the right side of the lane line of the vehicle and the right wheel grounding line is not parallel to the lane line of the vehicle, determining that the vehicle changes the lane from the left lane of the vehicle to the same lane of the vehicle.
In the above scheme, the feature extraction processing and the feature fusion processing are implemented by a convolutional neural network; before feature extraction is performed on the image, a vehicle detection apparatus provided in an embodiment of the present application further includes:
a training module to perform the following training operations on the convolutional neural network:
initializing parameters of the convolutional neural network;
determining a prediction loss function of the convolutional neural network according to the difference between the predicted value of the pixel point position of the image sample and the true value of the pixel point position of the image sample;
determining an offset loss function of the convolutional neural network according to a predicted offset value of a pixel point position of the image sample;
updating parameters of the convolutional neural network by minimizing the sum of the prediction loss function and the offset loss function.
An embodiment of the present application provides a vehicle detection apparatus, including:
a memory for storing executable instructions;
and the processor is used for realizing the vehicle detection method provided by the embodiment of the application when executing the executable instructions stored in the memory.
An embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the vehicle detection method provided by the embodiments of the present application.
The embodiment of the application has the following beneficial effects:
the method has the advantages that the positions of the surrounding frame and the key points of the vehicle are detected simultaneously by extracting, fusing and mapping the features of the image of the vehicle in the road environment, so that the attitude information of the vehicle can be determined, namely the attitude information of the vehicle can be obtained only by image sensing data, and the perception performance of an image sensor is improved; the feature fusion processing can effectively fuse the low-level features and the high-level features, thereby enhancing the feature expression capability of the image and improving the accuracy of determining the positions of the bounding box and the key point.
Drawings
FIG. 1 is a schematic diagram of a vehicle frame inspection in the related art;
FIG. 2 is a schematic diagram of radar sensor detection in the related art;
fig. 3 is a schematic diagram of projection of an image detection result and a radar detection result onto the same image in the related art;
FIG. 4 is a schematic diagram of an architecture of a vehicle detection system 100 provided by an embodiment of the present application;
fig. 5 is a schematic structural diagram of a terminal 400 for vehicle detection provided in an embodiment of the present application;
FIG. 6A is a schematic flow chart of a vehicle detection method according to an embodiment of the present disclosure;
FIG. 6B is a schematic flow chart of a vehicle detection method according to an embodiment of the present disclosure;
FIG. 6C is a schematic flow chart of a vehicle detection method according to an embodiment of the present disclosure;
FIG. 7 is a schematic flow chart diagram of a vehicle detection method provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of an enclosure and 6 key points of a vehicle provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of a vehicle key point detection network algorithm provided by an embodiment of the present application;
FIG. 10 is a schematic process flow diagram of feature extraction and feature fusion provided in the embodiments of the present application;
fig. 11 is a schematic diagram of an actual drive test applied to an autonomous vehicle according to an embodiment of the present application.
Detailed Description
In order to make the objectives, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the attached drawings, the described embodiments should not be considered as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
In the following description, the terms "first \ second \ third" are only used to distinguish similar objects and do not denote a particular order; it should be understood that "first \ second \ third" may be interchanged in a specific order or sequence where permitted, so that the embodiments of the application described herein can be implemented in an order other than that shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
1) Automatic driving: a function that guides and makes decisions for the vehicle driving task without the driver performing physical driving operations, replacing the driver's manipulation and control behaviors so that the vehicle completes driving safely.
2) Obstacle: generally refers to an object that the vehicle encounters while driving and that could collide with the vehicle.
Sensors in an autonomous vehicle are responsible for sensing static elements (such as lane lines, lane markings, posts, and signs) and dynamic elements (such as other vehicles, pedestrians, and motorcycles) in an autonomous driving scenario. Among the dynamic elements, other vehicles have the greatest influence on the host vehicle. Taking the scenario in which the sensors of an autonomous vehicle perceive other vehicles as an example, referring to fig. 1, which is a schematic diagram of vehicle frame detection in the related art, deep-learning-based vehicle frame detection on camera images yields the image-modality result, i.e., a 2D rectangular frame wrapping each vehicle. Referring to fig. 2, which is a schematic diagram of radar sensor detection in the related art, the radar-modality result can be obtained, i.e., radar points of obstacles on the image plane. However, referring to fig. 3, which projects the image detection result and the radar detection result onto the same image, the radar points and 2D rectangular frames obtained by fusing the two results are misaligned. In the related art, a fusion algorithm cannot provide comprehensive information of a vehicle, such as attitude information, from the detection results of the two modalities, and the 2D rectangular frame on the image plane does not reveal which radar points inside the frame belong to the vehicle side and which belong to the vehicle tail; that is, it is difficult to match and associate the detection results of different sensor modalities on an autonomous vehicle. In other words, the conventional vehicle detection method can only detect a rectangular frame or radar points for other vehicles, the fusion of the two detection results contains errors, and it is difficult to provide comprehensive information of the vehicle, such as attitude information and the positional relationship with the host vehicle, based on the detection results of different sensor modalities.
In view of the foregoing problems, embodiments of the present application provide a vehicle detection method, an apparatus, an electronic device, and a computer-readable storage medium, which can efficiently and accurately detect vehicle attitude information, and an exemplary application of the vehicle detection device provided in embodiments of the present application is described below. Next, an exemplary application when the vehicle detection apparatus is implemented as a terminal will be explained.
Referring to fig. 4, fig. 4 is a schematic structural diagram of the vehicle detection system 100 according to the embodiment of the present application, in order to support a vehicle detection application, the terminal 400 is connected to the server 200 through the network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
The terminal 400 is configured to collect images of vehicles in a road environment, and send the images to the server 200, so as to obtain posture information returned by the server 200, provide data support for decision making of automatic driving, and provide a reliable driving environment for a user. The server 200 is configured to receive the image sent by the terminal 400, perform a series of processes including feature extraction, fusion, and mapping on the image, obtain the position of the bounding box of the vehicle and the position of the key point, and further determine the posture information of the vehicle.
In some embodiments, taking the obstacle as an example of a vehicle of another vehicle, the camera sensor in the terminal 400 sends the acquired image of the vehicle in the road environment to the server 200, and the server 200 receives the image sent by the terminal 400, performs a series of processes of feature extraction, fusion and mapping on the image, obtains the position of the bounding box of the vehicle and the position of the key point, further determines the attitude information of the vehicle, provides data support for decision of automatic driving, and provides a reliable automatic driving environment for the user.
In some embodiments, the server 200 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a network service, cloud communication, a middleware service, a domain name service, a security service, a CDN, and a big data and artificial intelligence platform. The terminal 400 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, and the like. The terminal and the server may be directly or indirectly connected through wired or wireless communication, and the embodiment of the present application is not limited.
The following describes in detail a hardware structure of an electronic device of the vehicle detection method provided in the embodiment of the present application, where the electronic device includes, but is not limited to, a server or a terminal. Taking an electronic device as the terminal 400 shown in fig. 4 as an example, referring to fig. 5, fig. 5 is a schematic structural diagram of the terminal 400 for vehicle detection provided in the embodiment of the present application, and the terminal 400 shown in fig. 5 includes: at least one processor 410, memory 450, at least one network interface 420, and a user interface 430. The various components in the terminal 400 are coupled together by a bus system 440. It is understood that the bus system 440 is used to enable communications among the components. The bus system 440 includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity of illustration, however, the various buses are labeled as bus system 440 in fig. 5.
The processor 410 may be an integrated circuit chip having signal processing capabilities, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 430 includes one or more output devices 431, including one or more speakers and/or one or more visual displays, that enable the presentation of media content. The user interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard disk drives, optical disk drives, and the like. Memory 450 optionally includes one or more storage devices physically located remote from processor 410.
The memory 450 includes either volatile memory or nonvolatile memory, and may include both volatile and nonvolatile memory. The nonvolatile Memory may be a Read Only Memory (ROM), and the volatile Memory may be a Random Access Memory (RAM). The memory 450 described in embodiments herein is intended to comprise any suitable type of memory.
In some embodiments, memory 450 is capable of storing data, examples of which include programs, modules, and data structures, or a subset or superset thereof, to support various operations, as exemplified below.
An operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
a network communication module 452 for communicating to other computing devices via one or more (wired or wireless) network interfaces 420, exemplary network interfaces 420 including: bluetooth, wireless compatibility authentication (WiFi), and Universal Serial Bus (USB), etc.;
in some embodiments, the vehicle detection apparatus provided in the embodiments of the present application may be implemented in software, and fig. 5 shows a vehicle detection apparatus 453 stored in the memory 450, which may be software in the form of programs, plug-ins, and the like, and includes the following software modules: an acquisition module 4531, an extraction module 4532, a fusion module 4533, a mapping module 4534, a first determination module 4535, a weighting module 4536, a second determination module 4537, and a training module 4538, which are logical and thus can be arbitrarily combined or further split depending on the functions implemented. The functions of the respective modules will be explained below.
In other embodiments, the vehicle detection Device provided in the embodiments of the present Application may be implemented in hardware, and for example, the vehicle detection Device provided in the embodiments of the present Application may be a processor in the form of a hardware decoding processor, which is programmed to execute the vehicle detection method provided in the embodiments of the present Application, for example, the processor in the form of the hardware decoding processor may be one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), or other electronic components.
Based on the foregoing description of the implementation scenario and the electronic device of the vehicle detection method according to the embodiment of the present application, the vehicle detection method according to the embodiment of the present application is described below. In some embodiments, the vehicle detection method may be implemented by a server or a terminal alone, or implemented by a server and a terminal in cooperation, and the vehicle detection method provided by the embodiments of the present application will be described below in conjunction with exemplary applications and implementations of the terminal provided by the embodiments of the present application.
Referring to fig. 6A, fig. 6A is a schematic flowchart of a vehicle detection method provided in an embodiment of the present application, and will be described with reference to the steps shown in fig. 6A.
In step 101, an image of a vehicle in a road environment is acquired.
In some embodiments, capturing images of vehicles in a road environment may be accomplished by an image sensor on the host vehicle. For example, the image sensor may be an in-vehicle camera.
In the embodiment of the present application, information on the attitude dimension of a vehicle can be detected from the image data collected by a camera sensor alone, without other types of sensing data such as radar. Compared with related-art approaches that additionally require radar data captured at the same moment as the image data, the data acquisition cost is greatly reduced.
In step 102, feature extraction processing is performed on the image to obtain feature maps of the image at a plurality of scales.
In some embodiments, performing feature extraction processing on the image to obtain feature maps of the image at multiple scales may be implemented as follows: iterating over k to perform the following processing: carrying out downsampling processing on the image to obtain a 1st intermediate feature map; carrying out downsampling processing on the k-th intermediate feature map to obtain a (k+1)-th intermediate feature map, until the K-th intermediate feature map is obtained; wherein k is an integer whose value increases from 1 to a maximum of K, and K is an integer greater than or equal to 2.
For example, assuming that the image scale is 160 × 480, 4 times of continuous downsampling processing is performed to obtain feature maps of multiple scales: carrying out downsampling processing on the image to obtain a 1 st intermediate feature map, wherein the scale of the intermediate feature map is 80 x 240; performing downsampling processing on the 1 st intermediate feature map to obtain a 2 nd intermediate feature map with the scale of 40 × 120; performing downsampling processing on the 2 nd intermediate feature map to obtain a 3 rd intermediate feature map with the dimension of 20 × 60; the 3 rd intermediate feature map is down-sampled to obtain a 4 th intermediate feature map having a scale of 10 × 30.
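As an illustration of this multi-scale extraction, the following sketch (a hypothetical PyTorch backbone, not the patent's exact network; the module names, channel counts, and layer choices are assumptions) produces K = 4 intermediate feature maps by repeated 2x downsampling:

```python
import torch
import torch.nn as nn

class DownsampleBlock(nn.Module):
    """One 2x downsampling stage: strided conv + BatchNorm + ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class Backbone(nn.Module):
    """Produces K intermediate feature maps at 1/2, 1/4, 1/8 and 1/16 resolution."""
    def __init__(self, channels=(3, 64, 128, 256, 512)):
        super().__init__()
        self.stages = nn.ModuleList(
            DownsampleBlock(channels[k], channels[k + 1])
            for k in range(len(channels) - 1)
        )

    def forward(self, image):
        feature_maps = []
        x = image
        for stage in self.stages:
            x = stage(x)              # k-th downsampling
            feature_maps.append(x)    # k-th intermediate feature map
        return feature_maps

# e.g. a 160 x 480 image yields maps of 80x240, 40x120, 20x60 and 10x30
maps = Backbone()(torch.randn(1, 3, 160, 480))
```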
In step 103, feature fusion processing is performed on the feature maps of the plurality of scales to obtain a head feature map including a plurality of features.
In some embodiments, performing feature fusion processing on the feature maps of multiple scales to obtain a head feature map containing multiple features may be implemented as follows: iterating over i to perform the following processing: carrying out upsampling processing on the (K-i)-th intermediate feature map to obtain an intermediate feature map with the same scale as the (K-i-1)-th intermediate feature map; adding the intermediate feature map obtained by upsampling and the (K-i-1)-th intermediate feature map pixel by pixel to obtain a head feature map containing multiple features; wherein i is an integer whose value increases from 0 to a maximum of K-1.
For example, the feature fusion processing is performed on the 1 st intermediate feature map with a pixel of 80 × 240, the 2 nd intermediate feature map with a pixel of 40 × 120, the 3 rd intermediate feature map with a pixel of 20 × 60, and the 4 th intermediate feature map with a pixel of 10 × 30 to obtain a head feature map including a plurality of features, which can be implemented as follows: carrying out up-sampling processing on the 4 th intermediate feature map to obtain an intermediate feature map with the same scale as the 3 rd intermediate feature map; carrying out pixel-by-pixel addition on the intermediate feature map obtained by the up-sampling treatment and the 3 rd intermediate feature map and carrying out up-sampling treatment to obtain an intermediate feature map with the same scale as the 2 nd intermediate feature map; adding the obtained intermediate feature map and the 2 nd intermediate feature map pixel by pixel and performing up-sampling treatment to obtain an intermediate feature map with the same scale as the 1 st intermediate feature map; and adding the obtained intermediate feature map and the 1 st intermediate feature map pixel by pixel to obtain a head feature map comprising a plurality of features.
It should be noted that the feature fusion processing here is implemented by element-wise summation (Element-wise sum), which adds corresponding pixels of the feature maps without changing the number of channels. The upsampling layer is implemented by a deconvolution module (Deconv Block), which consists, in order, of a deconvolution layer (Deconv), a normalization layer (BN, Batch Norm), and a rectified linear unit layer (ReLU). The deconvolution layer is equivalent to 2x upsampling and is followed by the normalization layer; a residual connection is made between the input feature and the output feature of the deconvolution layer, and the rectified linear unit layer follows the residual connection. The deconvolution layer enlarges the scale of the fused feature map, the normalization layer normalizes the fused feature data, and the rectified linear unit adds a nonlinear factor. When the intermediate feature map obtained by upsampling is added pixel by pixel to the (K-i-1)-th intermediate feature map, channel alignment is required, i.e., an alignment module (Lat Block) is attached after the (K-i-1)-th intermediate feature map to convert its number of channels to match that of the upsampled intermediate feature map; the alignment module consists, in order, of a convolution layer (1×1 Conv), a normalization layer, and a rectified linear unit layer.
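A minimal sketch of this fusion path is given below, assuming PyTorch; the Deconv Block and Lat Block here are illustrative stand-ins for the modules named above, and the way the residual branch is resized (nearest-neighbour interpolation) is an assumption, since the text does not specify it:

```python
import torch.nn as nn
import torch.nn.functional as F

class DeconvBlock(nn.Module):
    """2x upsampling: deconv + BN, residual connection to the block input, then ReLU."""
    def __init__(self, channels):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(channels, channels, 4, stride=2, padding=1)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        up = self.bn(self.deconv(x))
        res = F.interpolate(x, scale_factor=2, mode="nearest")  # match spatial size
        return F.relu(up + res)

class LatBlock(nn.Module):
    """Channel alignment of a lateral map: 1x1 conv + BN + ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1),
                                   nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True))

    def forward(self, x):
        return self.block(x)

class FusionNeck(nn.Module):
    """Top-down fusion of the K intermediate maps by element-wise summation."""
    def __init__(self, channels=(64, 128, 256, 512), out_ch=64):
        super().__init__()
        self.lats = nn.ModuleList(LatBlock(c, out_ch) for c in channels)
        self.ups = nn.ModuleList(DeconvBlock(out_ch) for _ in channels[:-1])

    def forward(self, feature_maps):
        x = self.lats[-1](feature_maps[-1])                    # start from the K-th map
        for i, up in enumerate(self.ups):
            lateral = self.lats[-2 - i](feature_maps[-2 - i])  # (K-i-1)-th map
            x = up(x) + lateral                                # pixel-wise addition
        return x                                               # head feature map
```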
In the embodiment of the present application, this design alleviates, on the one hand, the vanishing-gradient problem caused by deepening the convolutional neural network; on the other hand, for the feature map information lost to convolution operations, the residual connection fuses not only the multi-scale features but also the original input information of the layer, compensating for the lost information to a certain extent and further strengthening the feature fusion.
In step 104, the head feature map is subjected to mapping processing to obtain the position of the bounding box of the vehicle and the position of the key point.
In some embodiments, the plurality of features includes: a center point position feature, a center point offset feature, a bounding-box width-and-height feature, and a key point position feature. Mapping the head feature map to obtain the position of the bounding box of the vehicle and the positions of the key points may be implemented as follows: performing convolution processing on the head feature map to obtain the feature representing the center point position of the vehicle, the offset feature of the center point position, the width-and-height feature of the bounding box, and the feature of the key point positions; predicting the coordinates of the center point of the bounding box through a first fully connected layer based on the center point position feature and the center point offset feature; predicting the width and height of the bounding box through a second fully connected layer based on the width-and-height feature of the bounding box; determining the position of the bounding box of the vehicle according to the coordinates of its center point and its width and height; and predicting the coordinates of the key point positions of the vehicle through a third fully connected layer based on the key point position feature.
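The following sketch illustrates one possible prediction head over the head feature map, assuming PyTorch; it uses small convolutional branches in place of the fully connected prediction layers described above, and the channel counts and branch names are assumptions rather than the patent's exact design:

```python
import torch
import torch.nn as nn

NUM_KEYPOINTS = 6  # left/right tail light and the four tire grounding points

class DetectionHead(nn.Module):
    """Maps the head feature map to center heatmap, center offset, box size and key points."""
    def __init__(self, in_ch=64):
        super().__init__()
        def branch(out_ch):
            return nn.Sequential(nn.Conv2d(in_ch, in_ch, 3, padding=1),
                                 nn.ReLU(inplace=True),
                                 nn.Conv2d(in_ch, out_ch, 1))
        self.center = branch(1)                     # center-point heatmap
        self.offset = branch(2)                     # sub-pixel offset of the center
        self.size = branch(2)                       # bounding-box width and height
        self.keypoints = branch(2 * NUM_KEYPOINTS)  # (x, y) for each key point

    def forward(self, head_feature_map):
        return {
            "center": torch.sigmoid(self.center(head_feature_map)),
            "offset": self.offset(head_feature_map),
            "size": self.size(head_feature_map),
            "keypoints": self.keypoints(head_feature_map),
        }
```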
In the embodiment of the present application, two results, namely the 6 key points of the vehicle and the bounding box of the vehicle, can be detected simultaneously in a single forward pass.
In some embodiments, based on fig. 6A, referring to fig. 6B, fig. 6B is a schematic flowchart of a vehicle detection method provided in an embodiment of the present application, and before step 104, steps 106 to 108 may also be performed. The description will be made in conjunction with the respective steps.
In step 106, the header feature map is subjected to an encoding process to compress the header feature map into a one-dimensional matrix of weight parameters characterizing each feature.
In some examples, the encoding processing is implemented by a global average pooling layer, which fuses information from all positions in the head feature map and compresses its global spatial information into a one-dimensional descriptor; subsequent network layers can obtain global receptive-field information from this descriptor, avoiding the inaccurate weight evaluation that would otherwise occur when, limited by the convolution kernel size, a local receptive field extracts too small a range of feature information and provides insufficient reference information.
In step 107, the one-dimensional matrix is decoded to obtain a weight parameter of each feature.
In some examples, the decoding processing is implemented by convolution layers followed by an activation function; based on the dependency relationships among the features, each feature is passed through a screening mechanism that rewards or penalizes its weight, and the one-dimensional matrix is evaluated by this screening mechanism to obtain the weight of each piece of feature information.
In step 108, weighting parameters into each feature of the head feature map to obtain a weighted head feature map.
In some examples, weighting the weight parameters into each feature of the head feature map is achieved by element-wise multiplication (Element-wise mul).
In other embodiments, the encoding processing and the decoding processing can be stacked multiple times and inserted into any network layer of the model as needed, adaptively rewarding or penalizing the weights of the feature information and enhancing the expression capability of the feature information at that position; through multiple rounds of encoding and decoding, the re-weighted feature information is continuously accumulated throughout the network model.
In the embodiment of the present application, features with a global receptive field are obtained from the compressed descriptor, and the weights of the features are rewarded or penalized based on this global receptive field, which improves the accuracy of the weight evaluation, captures spatial correlations better, and improves the feature extraction capability.
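The encode-decode weighting described in steps 106 to 108 can be sketched as follows, under the assumption that it behaves like a squeeze-and-excitation style block (global average pooling as the encoder, 1x1 convolutions with a sigmoid as the decoder, element-wise multiplication as the weighting); the reduction ratio and layer choices are assumptions:

```python
import torch.nn as nn

class FeatureWeighting(nn.Module):
    """Encode (global average pooling) -> decode (1x1 convs + sigmoid) -> re-weight."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.encode = nn.AdaptiveAvgPool2d(1)        # compress to a 1x1 descriptor per channel
        self.decode = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                            # weight parameter for each feature
        )

    def forward(self, head_feature_map):
        weights = self.decode(self.encode(head_feature_map))
        return head_feature_map * weights            # element-wise multiplication
```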
In step 105, the attitude information of the vehicle is determined based on the position of the bounding box and the position of the keypoint.
In some embodiments, the key points include a left tail lamp point and a right tail lamp point. Obtaining the attitude information of the vehicle according to the position of the bounding box and the positions of the key points may be implemented as follows: in a plane coordinate system containing the bounding box and the key points, a line segment meeting at least one of the following conditions is determined as the boundary line between the vehicle body and the vehicle tail: a line segment that passes through the left tail lamp point and is parallel to the bounding-box edge passing through the right tail lamp point; a line segment that passes through the right tail lamp point and is parallel to the bounding-box edge passing through the left tail lamp point. When the boundary line passes through the left tail lamp point, the region of the bounding box on the left side of the boundary line is taken as the vehicle body region and the region on the right side as the vehicle tail region; when the boundary line passes through the right tail lamp point, the region on the right side of the boundary line is taken as the vehicle body region and the region on the left side as the vehicle tail region.
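A minimal sketch of this post-processing follows, assuming the bounding box and tail lamp points are given in image coordinates and that the tail boundary is a vertical line through the chosen tail lamp point; the function and argument names are illustrative:

```python
def split_body_and_tail(box, left_light, right_light, boundary="left"):
    """box = (x_min, y_min, x_max, y_max); tail lamp points are (x, y) image coordinates.

    boundary="left":  the boundary is the vertical line through the left tail lamp point;
                      the region left of it is the vehicle body, right of it the tail.
    boundary="right": the symmetric case through the right tail lamp point.
    """
    x_min, y_min, x_max, y_max = box
    if boundary == "left":
        bx = left_light[0]
        body = (x_min, y_min, bx, y_max)
        tail = (bx, y_min, x_max, y_max)
    else:
        bx = right_light[0]
        body = (bx, y_min, x_max, y_max)
        tail = (x_min, y_min, bx, y_max)
    return body, tail
```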
In the embodiment of the present application, the 6 key points of the vehicle and the bounding box of the vehicle can be detected simultaneously in a single forward pass, and the orientation information of the vehicle can be further determined from the left and right tail lamp key points.
In some embodiments, referring to fig. 6C, fig. 6C is a schematic flowchart of a vehicle detection method provided in an embodiment of the present application, and based on fig. 6A, in some embodiments, the key points include: a left front tire grounding point, a right front tire grounding point, a left rear tire grounding point, and a right rear tire grounding point; after step 105, steps 109 to 111 may also be performed. The description will be made in conjunction with the respective steps.
In step 109, the line connecting the left front tire grounding point and the left rear tire grounding point is taken as the left-side wheel grounding line.
In step 110, the line connecting the right front tire grounding point and the right rear tire grounding point is taken as the right-side wheel grounding line.
In step 111, the positional relationship between the host vehicle and the vehicle is determined based on the relative relationship between the left-side wheel ground line and the right-side wheel ground line and the lane line of the host vehicle.
In some examples, determining the positional relationship between the host vehicle and the other vehicle according to the relative relationship between the left-side wheel grounding line, the right-side wheel grounding line, and the lane line of the host vehicle may be implemented as follows: when the right-side wheel grounding line is on the left side of the host vehicle's lane line and is parallel to it, the other vehicle is determined to be in the left lane of the host vehicle; when the left-side wheel grounding line is on the right side of the host vehicle's lane line and is parallel to it, the other vehicle is determined to be in the right lane of the host vehicle; when the left-side wheel grounding line is on the right side of the host vehicle's lane line and the right-side wheel grounding line is on the left side of the host vehicle's lane line, the other vehicle is determined to be in the same lane as the host vehicle; when the left-side wheel grounding line is on the left side of the host vehicle's lane line and is not parallel to it, the other vehicle is determined to be changing from the right lane of the host vehicle into the host vehicle's lane; and when the right-side wheel grounding line is on the right side of the host vehicle's lane line and is not parallel to it, the other vehicle is determined to be changing from the left lane of the host vehicle into the host vehicle's lane.
It should be noted that the lane line of the host vehicle includes a left lane line and a right lane line. For example, when the right-side wheel grounding line of the other vehicle is on the left side of the host vehicle's left lane line and is parallel to it, the other vehicle is determined to be in the left lane of the host vehicle; when the left-side wheel grounding line is on the right side of the host vehicle's right lane line and is parallel to it, the other vehicle is determined to be in the right lane of the host vehicle; when the left-side wheel grounding line is on the right side of the host vehicle's left lane line and the right-side wheel grounding line is on the left side of the host vehicle's right lane line, the other vehicle is determined to be in the same lane as the host vehicle; when the left-side wheel grounding line of the other vehicle is on the left side of the host vehicle's right lane line and is not parallel to it, the other vehicle is determined to be changing from the right lane of the host vehicle into the host vehicle's lane; and when the right-side wheel grounding line of the other vehicle is on the right side of the host vehicle's left lane line and is not parallel to it, the other vehicle is determined to be changing from the left lane of the host vehicle into the host vehicle's lane.
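A sketch of these decision rules is shown below; the `side_of_line` and `is_parallel` helpers, the angular tolerance, and the assumption of a y-up ground-plane coordinate system are all illustrative choices, not part of the patent:

```python
import math

def side_of_line(point, line):
    """> 0 if the point is left of the directed line (p1 -> p2) in a y-up frame, < 0 if right."""
    (x1, y1), (x2, y2) = line
    px, py = point
    return (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)

def is_parallel(seg_a, seg_b, tol_deg=5.0):
    ang = lambda s: math.atan2(s[1][1] - s[0][1], s[1][0] - s[0][0])
    diff = abs(ang(seg_a) - ang(seg_b)) % math.pi
    return min(diff, math.pi - diff) < math.radians(tol_deg)

def lane_relation(left_ground_line, right_ground_line, left_lane, right_lane):
    """Returns the other vehicle's position relative to the host vehicle's lane."""
    left_of = lambda seg, lane: all(side_of_line(p, lane) > 0 for p in seg)
    right_of = lambda seg, lane: all(side_of_line(p, lane) < 0 for p in seg)

    if left_of(right_ground_line, left_lane) and is_parallel(right_ground_line, left_lane):
        return "left lane of the host vehicle"
    if right_of(left_ground_line, right_lane) and is_parallel(left_ground_line, right_lane):
        return "right lane of the host vehicle"
    if right_of(left_ground_line, left_lane) and left_of(right_ground_line, right_lane):
        return "same lane as the host vehicle"
    if left_of(left_ground_line, right_lane) and not is_parallel(left_ground_line, right_lane):
        return "changing from the right lane into the host lane"
    if right_of(right_ground_line, left_lane) and not is_parallel(right_ground_line, left_lane):
        return "changing from the left lane into the host lane"
    return "undetermined"
```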
In the embodiment of the present application, the 6 key points of the vehicle and the bounding box of the vehicle can be detected simultaneously in a single forward pass; the orientation information of the vehicle can be further determined from the left and right tail lamp key points; and the positional relationship between the other vehicle and the host vehicle can be judged from the tire grounding points of the vehicle in combination with the lane lines.
In some embodiments, the feature extraction process and the feature fusion process are implemented by a convolutional neural network; before feature extraction is performed on the image, the following steps can also be performed: performing the following training operations on the convolutional neural network: initializing parameters of a convolutional neural network; determining a prediction loss function of the convolutional neural network according to the difference between the predicted value of the pixel point position of the image sample and the true value of the pixel point position of the image sample; determining an offset loss function of the convolutional neural network according to the predicted offset value of the pixel point position of the image sample; the parameters of the convolutional neural network are updated by minimizing the sum of the predicted loss function and the offset loss function.
For example, the input image is $I \in \mathbb{R}^{W \times H \times 3}$, and the loss function is $L = L_k + L_{off}$, where $H$ and $W$ are respectively the height and width of the input image, $R$ is the output down-sampling factor, $L_k$ is the prediction loss function of the convolutional neural network, and $L_{off}$ is the offset loss function of the convolutional neural network.
The predicted value at each pixel position of the output feature map is
$\hat{Y} \in [0, 1]^{\frac{W}{R} \times \frac{H}{R} \times J}$
where $H$ and $W$ are respectively the height and width of the input image, $R$ is the output down-sampling factor, and $J$ is the number of key points. The convolutional neural network is trained iteratively so that the loss function $L$ is minimized, where the prediction loss function of the convolutional neural network is
$L_k = -\frac{1}{N} \sum_{x,y,j} \begin{cases} \left(1 - \hat{Y}_{xyj}\right)^{\alpha} \log\left(\hat{Y}_{xyj}\right), & Y_{xyj} = 1 \\ \left(1 - Y_{xyj}\right)^{\beta} \left(\hat{Y}_{xyj}\right)^{\alpha} \log\left(1 - \hat{Y}_{xyj}\right), & \text{otherwise} \end{cases}$
where $\alpha$ and $\beta$ are hyperparameters of the prediction loss function, $N$ is the number of samples, $x$ and $y$ are the horizontal and vertical coordinates of a pixel, $Y_{xyj}$ is the ground-truth value at pixel position $(x, y)$ for key point $j$, and $\hat{Y}_{xyj}$ is the predicted value at pixel position $(x, y)$ for key point $j$.
The offset loss function of the convolutional neural network compensates for the offset of pixel positions caused by down-sampling:
$L_{off} = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left( \frac{p}{R} - \tilde{p} \right) \right|$
where $N$ is the number of samples, $R$ is the output down-sampling factor, $p$ is the ground-truth key point position, $\tilde{p} = \left\lfloor \frac{p}{R} \right\rfloor$ is the corresponding down-sampled key point position, and $\hat{O}_{\tilde{p}}$ is the predicted offset of the key point position.
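A sketch of the two loss terms follows, assuming the focal-style prediction loss and L1 offset loss implied by the definitions above; the function names, clamping constants, and the one-key-point-per-sample batch layout are assumptions:

```python
import torch

def prediction_loss(pred_heatmap, gt_heatmap, alpha=2.0, beta=4.0):
    """Penalty-reduced focal loss over the per-pixel key point heatmaps."""
    pos = gt_heatmap.eq(1).float()
    neg = 1.0 - pos
    pred = pred_heatmap.clamp(1e-6, 1 - 1e-6)
    pos_loss = ((1 - pred) ** alpha) * torch.log(pred) * pos
    neg_loss = ((1 - gt_heatmap) ** beta) * (pred ** alpha) * torch.log(1 - pred) * neg
    num_pos = pos.sum().clamp(min=1)
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos

def offset_loss(pred_offset, gt_points, downsample=4):
    """L1 loss between predicted offsets and the sub-pixel offsets lost to down-sampling.

    pred_offset: (N, 2, H/R, W/R); gt_points: (N, 2) ground-truth key point pixel coordinates.
    """
    low_res = (gt_points / downsample).floor()
    target = gt_points / downsample - low_res
    idx = low_res.long()
    pred = pred_offset[torch.arange(len(gt_points)), :, idx[:, 1], idx[:, 0]]
    return torch.abs(pred - target).sum() / max(len(gt_points), 1)

def total_loss(pred_heatmap, gt_heatmap, pred_offset, gt_points):
    return prediction_loss(pred_heatmap, gt_heatmap) + offset_loss(pred_offset, gt_points)
```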
In some embodiments, the head feature map is subjected to convolution processing, and the confidence of the prediction of the key point position can be predicted through the fourth full-connection layer; and stopping the training operation of the convolutional neural network when the confidence of the key point position prediction exceeds a confidence threshold value.
In some embodiments, the head feature map is subjected to convolution processing, and the visibility degree of the positions of the key points can be predicted through a fifth full-connection layer; the visibility degree can be combined with radar fusion information to further improve the accuracy of the key point position prediction.
Next, an exemplary application of the embodiment of the present application in a practical application scenario will be described. Taking the determination of the attitude information of other vehicles as an example: image data of vehicles in the road environment is collected by an image sensor, and a series of feature extraction, fusion, and mapping operations is performed on the image data to simultaneously detect the position of the bounding box of the other vehicle and the positions of its 6 key points (left tail lamp point, right tail lamp point, left front tire grounding point, right front tire grounding point, left rear tire grounding point, and right rear tire grounding point). The attitude information of the vehicle is determined from the left and right tail lamp points, and the positional relationship between the other vehicle and the host vehicle can be judged from the four tire grounding points in combination with the lane lines. This improves the perception performance of the image sensor and the accuracy of determining the positions of the bounding box and the key points, provides data support for automatic driving decisions, and provides the user with a reliable automatic driving environment.
Referring to fig. 7, fig. 7 is a schematic flow chart of a vehicle detection method provided in the embodiment of the present application, and an implementation scheme of the embodiment of the present application is specifically as follows:
step 201: image data is acquired. For example, the image data may be acquired by an onboard camera.
Step 202: and (5) carrying out image normalization preprocessing.
In some embodiments, normalization may be achieved in two ways: one is that after the image data is de-averaged, the data in each dimension is divided by the standard deviation of the data in that dimension; another way is to divide by the maximum absolute value of the data to ensure that all data are normalized to a value between-1 and 1.
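Both normalization options can be sketched as follows, assuming the image is a floating-point NumPy array with channels last:

```python
import numpy as np

def normalize_zscore(image):
    """Subtract the per-channel mean, then divide by the per-channel standard deviation."""
    mean = image.mean(axis=(0, 1), keepdims=True)
    std = image.std(axis=(0, 1), keepdims=True) + 1e-8
    return (image - mean) / std

def normalize_max_abs(image):
    """Divide by the maximum absolute value so all values fall in [-1, 1]."""
    return image / (np.abs(image).max() + 1e-8)
```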
In the embodiment of the application, the influence of noise and the like in the image is removed through preprocessing, so that the precision and the accuracy of subsequent processing are ensured; the subsequent calculation amount is reduced, and the convergence is accelerated.
Step 203: and detecting key points.
For other vehicles in an autonomous driving scene, from the perspective of the perception system, the ideal vehicle detection result in an image is that 1) the vehicle body and the vehicle tail can be separated; and 2) the key point locations are available. Separating the vehicle body and tail provides effective radar fusion information for the downstream fusion detection algorithm, and the key point positions can further be used to estimate the distance of other vehicles, the positional relationship between other vehicles and the host vehicle, and so on.
Referring to fig. 8, fig. 8 is a schematic diagram of a bounding box of a vehicle and 6 key points provided in the embodiment of the present application, where for the 6 key points of other vehicles in an autonomous driving scenario, that is: a left tail light point, a right tail light point, a left front tire grounding point, a right front tire grounding point, a left rear tire grounding point, and a right rear tire grounding point. The left tail lamp point and the right tail lamp point can give out vehicle body tail boundary information, and the 4 tire grounding points can give out front end input required by other vehicle distance estimation and position estimation.
Step 204: the key point post-processing algorithm is applied.
The keypoint detection problem is abstracted as a mathematical problem: the input image is $I \in \mathbb{R}^{W \times H \times 3}$, and the loss function is $L_{det} = L_{k} + L_{off}$, wherein H and W are respectively the length and width of the input image, R is the output down-sampling multiple, $L_{k}$ is the prediction loss function of the convolutional neural network, and $L_{off}$ is the offset loss function of the convolutional neural network.
The predicted value at each pixel point position of the output feature map is

$$\hat{Y} \in [0, 1]^{\frac{W}{R} \times \frac{H}{R} \times J},$$

wherein H and W are respectively the length and width of the input image, R is the output down-sampling multiple, and J is the number of key points. The convolutional neural network is iterated so that the loss function $L_{det}$ is minimized, where the prediction loss function of the convolutional neural network is

$$L_{k} = \frac{-1}{N} \sum_{xyj} \begin{cases} \left(1 - \hat{Y}_{xyj}\right)^{\alpha} \log\left(\hat{Y}_{xyj}\right), & \text{if } Y_{xyj} = 1 \\ \left(1 - Y_{xyj}\right)^{\beta} \left(\hat{Y}_{xyj}\right)^{\alpha} \log\left(1 - \hat{Y}_{xyj}\right), & \text{otherwise} \end{cases}$$

wherein $\alpha$ and $\beta$ are hyperparameters of the prediction loss function, N is the number of samples, x and y are the horizontal and vertical coordinates of the pixel points, $Y_{xyj}$ is the true value of the coordinate position of the pixel point of key point j, and $\hat{Y}_{xyj}$ is the predicted value of the coordinate position of the pixel point of key point j.
The offset loss function of the convolutional neural network is used for compensating the offset of the pixel point positions caused by the down-sampling, and is

$$L_{off} = \frac{1}{N} \sum_{p} \left| \hat{O}_{\tilde{p}} - \left(\frac{p}{R} - \tilde{p}\right) \right|,$$

wherein N is the number of samples, R is the output down-sampling multiple, $p$ is the true value of the key point position, $\tilde{p}$ is the predicted value of the key point position (on the down-sampled feature map), and $\hat{O}_{\tilde{p}}$ is the prediction of the deviation of the key point position.
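As a hedged illustration of these two loss terms, the following PyTorch sketch assumes the prediction loss is a penalty-reduced focal loss over per-key-point heatmaps and the offset loss is an L1 term evaluated only at key-point pixels; the tensor layouts, the Gaussian-rendered ground-truth heatmaps and the function names are assumptions rather than the embodiment's exact implementation.

```python
import torch

def prediction_loss(pred_heatmap, gt_heatmap, alpha=2.0, beta=4.0):
    """Penalty-reduced focal loss over key point heatmaps (the L_k term)."""
    pos = gt_heatmap.eq(1).float()                      # pixels that hold a true key point
    neg = 1.0 - pos
    pred = pred_heatmap.clamp(1e-6, 1 - 1e-6)
    pos_term = pos * (1 - pred).pow(alpha) * pred.log()
    neg_term = neg * (1 - gt_heatmap).pow(beta) * pred.pow(alpha) * (1 - pred).log()
    num_pos = pos.sum().clamp(min=1.0)                  # N: number of key point samples
    return -(pos_term + neg_term).sum() / num_pos

def offset_loss(pred_offset, gt_offset, mask):
    """L1 offset loss (the L_off term) compensating the quantization error of down-sampling by R."""
    # pred_offset, gt_offset: (B, 2, H/R, W/R); mask marks the pixels holding a key point.
    num = mask.sum().clamp(min=1.0)
    return (torch.abs(pred_offset - gt_offset) * mask).sum() / num

# Total loss minimized during training: L_det = L_k + L_off
```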
In some examples, referring to fig. 9, fig. 9 is a schematic diagram of a vehicle key point detection network algorithm provided by an embodiment of the present application.
The backbone network feature extraction is used for performing feature extraction processing on the image to obtain feature maps of multiple scales of the image, and can be realized in the following way: iterating k to perform the following: carrying out downsampling processing on the image to obtain the 1st intermediate feature map; performing downsampling processing on the k-th intermediate feature map to obtain the (k+1)-th intermediate feature map, until the K-th intermediate feature map is obtained; wherein k is an integer with the value increasing from 1 and a maximum value of K, and K is an integer greater than or equal to 2. For example, the backbone network feature extraction can be implemented by a residual network ResNet.
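A minimal sketch of this iterative down-sampling, in which plain strided convolutions stand in for the ResNet backbone; the channel widths and K = 4 are assumptions chosen to match the worked example later in this section.

```python
import torch
from torch import nn

class DownsamplingBackbone(nn.Module):
    """Produces K intermediate feature maps, each at half the previous resolution."""
    def __init__(self, in_channels=3, widths=(32, 64, 128, 256)):
        super().__init__()
        stages, prev = [], in_channels
        for width in widths:                      # K = len(widths) downsampling stages
            stages.append(nn.Sequential(
                nn.Conv2d(prev, width, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(width),
                nn.ReLU(inplace=True)))
            prev = width
        self.stages = nn.ModuleList(stages)

    def forward(self, image):
        feats, x = [], image
        for stage in self.stages:                 # the k-th stage yields the k-th intermediate map
            x = stage(x)
            feats.append(x)
        return feats                              # [1st, 2nd, ..., K-th] intermediate feature maps

# Usage: a 160 x 480 input yields maps of 80 x 240, 40 x 120, 20 x 60 and 10 x 30.
maps = DownsamplingBackbone()(torch.randn(1, 3, 160, 480))
```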
The Feature Fusion characterization sub-network is used for performing Feature Fusion processing on Feature maps of multiple scales to obtain a head Feature map containing multiple features, and can be realized through a Feature Fusion Block, where the Feature Fusion Block performs processing including: iterating i to perform the following: carrying out up-sampling processing on the K-i intermediate feature map to obtain an intermediate feature map with the same scale as the K-i-1 intermediate feature map; adding the intermediate feature map obtained by the up-sampling process and the K-i-1 intermediate feature map pixel by pixel to obtain a head feature map containing a plurality of features; wherein, i is an integer with the value increasing from 0 in turn, and the maximum value is K-1.
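The Feature Fusion Block can be sketched as follows, assuming 1×1 convolutions as the alignment modules and bilinear up-sampling; the channel counts follow the backbone sketch above and are assumptions.

```python
import torch
from torch import nn
import torch.nn.functional as F

class FeatureFusionBlock(nn.Module):
    """Fuses the K intermediate feature maps, coarse to fine, into a single head feature map."""
    def __init__(self, widths=(32, 64, 128, 256), out_channels=64):
        super().__init__()
        # Alignment modules: 1x1 convolutions bringing each map to a common channel count.
        self.align = nn.ModuleList(nn.Conv2d(w, out_channels, kernel_size=1) for w in widths)

    def forward(self, feats):
        # feats: [1st, ..., K-th] intermediate feature maps, finest first.
        x = self.align[-1](feats[-1])                       # start from the K-th (coarsest) map
        for k in range(len(feats) - 2, -1, -1):             # corresponds to i = 0 .. K-2 above
            finer = self.align[k](feats[k])
            x = F.interpolate(x, size=finer.shape[-2:], mode="bilinear", align_corners=False)
            x = x + finer                                   # pixel-by-pixel addition
        return x                                            # head feature map containing the fused features

# Usage with the backbone sketched earlier (illustrative only):
# head = FeatureFusionBlock()(DownsamplingBackbone()(torch.randn(1, 3, 160, 480)))
```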
The head feature map is subjected to convolution processing, and 6 features can be obtained at the same time: the feature of the central point position, the offset feature of the central point position, the width and height feature of the bounding box, the feature of the 6 key point positions, the confidence feature of the key point position prediction and the feature of the key point visibility degree. And connecting a full connection layer behind each feature to obtain the coordinate of the central point position, the offset value of the central point position, the width and the height of the bounding box, the coordinates of 6 key point positions, the confidence coefficient of key point position prediction and the visibility degree of the key points. And calculating to obtain the position of the bounding box according to the coordinates of the central point position, the deviation value of the central point position and the width and the height of the bounding box.
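The six prediction branches and the bounding-box decoding can be sketched as below. Small convolutional heads are used here in place of the fully connected layers named in the text, which is an assumption for illustration; `decode_box` shows how the box position follows from the center coordinates, the offset value and the width and height.

```python
import torch
from torch import nn

class DetectionHead(nn.Module):
    """From the head feature map, predicts: center heatmap, center offset, box width/height,
    6 key point positions, key point confidence and key point visibility (sketched as conv heads)."""
    def __init__(self, channels=64, num_keypoints=6):
        super().__init__()
        def branch(out):
            return nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                                 nn.Conv2d(channels, out, 1))
        self.center = branch(1)                     # feature of the center point position
        self.center_offset = branch(2)              # offset feature of the center point position
        self.box_wh = branch(2)                     # width and height feature of the bounding box
        self.keypoints = branch(2 * num_keypoints)  # coordinates of the 6 key point positions
        self.kp_confidence = branch(num_keypoints)  # confidence of the key point position prediction
        self.kp_visibility = branch(num_keypoints)  # visibility degree of the key points

    def forward(self, head_feature_map):
        return {name: module(head_feature_map) for name, module in self.named_children()}

def decode_box(center_xy, offset_xy, wh, R=2):
    """Bounding-box position from center coordinates, offset value and width/height.
    center_xy: peak location on the head feature map; R: down-sampling multiple (2 in the
    160 x 480 example below); wh is assumed to be given at input-image scale."""
    cx, cy = (center_xy[0] + offset_xy[0]) * R, (center_xy[1] + offset_xy[1]) * R
    w, h = wh
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```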
For example, referring to fig. 10, fig. 10 is a schematic processing flow diagram of feature extraction and feature fusion provided in the embodiment of the present application. The input image has a scale of 160 × 480 and is subjected to 4 times of continuous downsampling processing to obtain feature maps of a plurality of scales: carrying out downsampling processing on the input image to obtain a 1st intermediate feature map with the scale of 80 × 240; performing downsampling processing on the 1st intermediate feature map to obtain a 2nd intermediate feature map with the scale of 40 × 120; performing downsampling processing on the 2nd intermediate feature map to obtain a 3rd intermediate feature map with the scale of 20 × 60; performing downsampling processing on the 3rd intermediate feature map to obtain a 4th intermediate feature map with the scale of 10 × 30; carrying out up-sampling processing on the 4th intermediate feature map to obtain an intermediate feature map with the same scale as the 3rd intermediate feature map; aligning the 3rd intermediate feature map by an alignment module (changing the number of channels of the intermediate feature map), adding the intermediate feature map obtained by the up-sampling processing and the aligned 3rd intermediate feature map pixel by pixel, and performing up-sampling processing to obtain an intermediate feature map with the same scale as the 2nd intermediate feature map; adding the obtained intermediate feature map and the aligned 2nd intermediate feature map pixel by pixel, and performing up-sampling processing to obtain an intermediate feature map with the same scale as the 1st intermediate feature map; adding the obtained intermediate feature map and the aligned 1st intermediate feature map pixel by pixel to obtain a head feature map comprising a plurality of features; and weighting the features of the head feature map: encoding the head feature map to compress it into a one-dimensional matrix representing the weight parameter of each feature, decoding the one-dimensional matrix to obtain the weight parameter of each feature, and weighting the weight parameters into each feature of the head feature map to obtain the weighted head feature map.
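The weighting step described above (encode the head feature map into a one-dimensional matrix of weight parameters, decode it, and weight each feature) resembles a squeeze-and-excitation layer; the sketch below assumes that reading, with "feature" interpreted as a channel of the head feature map, and the reduction ratio is an assumption.

```python
import torch
from torch import nn

class FeatureWeighting(nn.Module):
    """Encodes the head feature map into a 1-D vector of per-feature weight parameters,
    decodes it, and re-weights each feature of the head feature map with it."""
    def __init__(self, channels=64, reduction=4):
        super().__init__()
        self.encode = nn.Sequential(                      # compress into a one-dimensional matrix
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True))
        self.decode = nn.Sequential(                      # recover one weight parameter per feature
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid())

    def forward(self, head_feature_map):
        weights = self.decode(self.encode(head_feature_map))     # (B, C)
        weights = weights.unsqueeze(-1).unsqueeze(-1)             # (B, C, 1, 1)
        return head_feature_map * weights                         # weighted head feature map
```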
Obtaining the attitude information of the vehicle according to the position of the surrounding frame and the positions of the key points (the left tail lamp point and the right tail lamp point); and obtaining the position relation between the other vehicles and the vehicle according to the positions of key points (a left front tire grounding point, a right front tire grounding point, a left rear tire grounding point and a right rear tire grounding point) and the combination of lane lines. According to the embodiment of the application, the sensing performance of the image sensor is improved, the accuracy of determining the positions of the surrounding frame and the key point is improved, data support is provided for decision of automatic driving, and a reliable automatic driving environment is provided for a user. Referring to fig. 11, fig. 11 is a schematic diagram of an application in an actual drive test of an autonomous vehicle according to an embodiment of the present disclosure. Wherein, P0 is left tail lamp point, P1 is right tail lamp point, P2 is left front tire grounding point, P3 is right front tire grounding point, P4 is left rear tire grounding point, P5 is right rear tire grounding point, 501 is vehicle tail area, 502 is vehicle body area.
Continuing with the exemplary structure in which the vehicle detection device 453 provided by the embodiments of the present application is implemented as software modules, in some embodiments, as shown in fig. 5, the software modules stored in the vehicle detection device 453 of the memory 440 may include:
a collecting module 4531 for collecting an image of a vehicle in a road environment; an extraction module 4532, configured to perform feature extraction processing on the image to obtain feature maps of multiple scales of the image; a fusion module 4533, configured to perform feature fusion processing on the feature maps of multiple scales to obtain a head feature map including multiple features; the mapping module 4534 is configured to perform mapping processing on the head feature map to obtain a position of an enclosure frame of the vehicle and a position of a key point; a first determining module 4535, configured to determine the attitude information of the vehicle according to the position of the bounding box and the position of the key point.
In some embodiments, the extraction module 4532 is further configured to iterate k to perform the following: carrying out downsampling processing on the image to obtain the 1st intermediate feature map; performing downsampling processing on the k-th intermediate feature map to obtain the (k+1)-th intermediate feature map, until the K-th intermediate feature map is obtained; wherein k is an integer with the value increasing from 1 and a maximum value of K, and K is an integer greater than or equal to 2.
In some embodiments, the fusion module 4533 is further configured to iterate i to perform the following: carrying out up-sampling processing on the K-i intermediate feature map to obtain an intermediate feature map with the same scale as the K-i-1 intermediate feature map; adding the intermediate feature map obtained by the up-sampling processing and the K-i-1 th intermediate feature map pixel by pixel to obtain the head feature map containing a plurality of features; wherein, i is an integer with the value increasing from 0 in turn, and the maximum value is K-1.
In some embodiments, before the mapping process is performed on the head feature map, the vehicle detection apparatus provided in the embodiment of the present application further includes: a weighting module 4536, configured to perform encoding processing on the head feature map so as to compress it into a one-dimensional matrix of weight parameters characterizing each feature; decode the one-dimensional matrix to obtain the weight parameter of each feature; and weight the weight parameters into each feature of the head feature map to obtain a weighted head feature map.
In some embodiments, the plurality of features includes: the feature of the central point position, the offset feature of the central point position, the width and height feature of the bounding box and the feature of the key point position; the mapping module 4534 is further configured to perform convolution processing on the head feature map to obtain a feature representing the center point position of the vehicle, an offset feature of the center point position, a width and height feature of the bounding box, and a feature of the key point position; predicting coordinates of the central point position of the enclosure frame through a first full-connection layer based on the characteristics of the central point position and the offset characteristics of the central point position; predicting the width and the height of the surrounding frame through a second full-connection layer based on the width and the height characteristics of the surrounding frame; determining the position of the surrounding frame of the vehicle according to the coordinates of the central point position of the surrounding frame and the width and the height of the surrounding frame; and predicting the coordinates of the positions of the key points of the vehicle through a third full-connected layer based on the characteristics of the positions of the key points.
In some embodiments, the keypoints comprise: a left tail lamp point and a right tail lamp point; the first determining module 4535 is further configured to determine, in a plane coordinate system including the bounding box and the keypoint, a line segment that meets at least one of the following conditions as a boundary line of a vehicle body tail: a line segment parallel to the line segment of the enclosure frame passing through the right tail lamp point and passing through the left tail lamp point; a line segment parallel to the line segment of the bounding box passing through the left tail light point and passing through the right tail light point; when the boundary of the vehicle body tail is a line segment which is parallel to the line segment of the enclosure frame passing through the right tail lamp point and passes through the left tail lamp point, taking the area on the left side of the boundary in the enclosure frame as a vehicle body area, and taking the area on the right side of the boundary in the enclosure frame as a vehicle tail area; when the boundary line of the vehicle body tail is a line segment which is parallel to the line segment of the enclosure frame passing through the left tail lamp point and passes through the right tail lamp point, the area on the right side of the boundary line in the enclosure frame is used as a vehicle body area, and the area on the left side of the boundary line in the enclosure frame is used as a vehicle tail area.
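A sketch of the body/tail division just described; the rule used here for choosing which tail lamp point defines the boundary (the one lying strictly inside the box) is a heuristic assumption, not taken from the embodiment.

```python
from dataclasses import dataclass

@dataclass
class Box:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

def split_body_and_tail(box: Box, left_lamp_x: float, right_lamp_x: float):
    """Split the bounding box into body and tail areas along a vertical boundary
    through one of the tail lamp points (heuristically, the one inside the box)."""
    if box.x_min < left_lamp_x < box.x_max and not (box.x_min < right_lamp_x < box.x_max):
        boundary_x = left_lamp_x
        body = Box(box.x_min, box.y_min, boundary_x, box.y_max)   # left of the boundary: body area
        tail = Box(boundary_x, box.y_min, box.x_max, box.y_max)   # right of the boundary: tail area
    else:
        boundary_x = right_lamp_x
        tail = Box(box.x_min, box.y_min, boundary_x, box.y_max)   # left of the boundary: tail area
        body = Box(boundary_x, box.y_min, box.x_max, box.y_max)   # right of the boundary: body area
    return body, tail
```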
In some embodiments, the keypoints comprise: a left front tire grounding point, a right front tire grounding point, a left rear tire grounding point, and a right rear tire grounding point; the vehicle detection device provided in the embodiment of the application further includes: a second determination module 4537, configured to take the line connecting the left front tire grounding point and the left rear tire grounding point as the left-side wheel grounding line; take the line connecting the right front tire grounding point and the right rear tire grounding point as the right-side wheel grounding line; and determine the position relationship between the host vehicle and the vehicle according to the relative relationship between the left-side wheel grounding line, the right-side wheel grounding line and the lane line of the host vehicle.
In some embodiments, the second determining module 4537 is configured to determine that the vehicle is in a left lane of the host vehicle when the right wheel ground line is to the left of the lane line of the host vehicle and the right wheel ground line is parallel to the lane line of the host vehicle; determining that the vehicle is in a right lane of the host vehicle when the left wheel ground line is on the right side of the lane line of the host vehicle and the left wheel ground line is parallel to the lane line of the host vehicle; determining that the vehicle is in the same lane of the host vehicle when the left wheel ground line is on the right side of the lane line of the host vehicle and the right wheel ground line is on the left side of the lane line of the host vehicle; determining that the vehicle changes lanes from a right lane of the host vehicle to a same lane of the host vehicle when the left wheel ground line is to the left of the lane line of the host vehicle and the left wheel ground line is not parallel to the lane line of the host vehicle; when the right wheel grounding line is on the right side of the lane line of the vehicle and the right wheel grounding line is not parallel to the lane line of the vehicle, determining that the vehicle changes the lane from the left lane of the vehicle to the same lane of the vehicle.
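The lane-position rules above can be sketched as follows. Representing each grounding line and lane line by its bottom-row intersection and slope, and pairing the host vehicle's left and right lane boundaries with the individual rules, are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class GroundLine:
    x_bottom: float   # horizontal position where the line meets the bottom image row
    slope: float      # dx/dy in image coordinates

def parallel(a: GroundLine, b: GroundLine, tol: float = 0.05) -> bool:
    return abs(a.slope - b.slope) < tol

def relation_to_host(left_wheel: GroundLine, right_wheel: GroundLine,
                     host_left_lane: GroundLine, host_right_lane: GroundLine) -> str:
    """Position relation between the detected vehicle and the host vehicle."""
    if right_wheel.x_bottom < host_left_lane.x_bottom and parallel(right_wheel, host_left_lane):
        return "in the left lane"
    if left_wheel.x_bottom > host_right_lane.x_bottom and parallel(left_wheel, host_right_lane):
        return "in the right lane"
    if left_wheel.x_bottom > host_left_lane.x_bottom and right_wheel.x_bottom < host_right_lane.x_bottom:
        return "in the same lane"
    if left_wheel.x_bottom < host_right_lane.x_bottom and not parallel(left_wheel, host_right_lane):
        return "changing from the right lane into the same lane"
    if right_wheel.x_bottom > host_left_lane.x_bottom and not parallel(right_wheel, host_left_lane):
        return "changing from the left lane into the same lane"
    return "undetermined"
```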
In some embodiments, the feature extraction process and the feature fusion process are implemented by a convolutional neural network; before feature extraction is performed on the image, a vehicle detection apparatus provided in an embodiment of the present application further includes: a training module 4538, configured to perform the following training operations on the convolutional neural network: initializing parameters of the convolutional neural network; determining a prediction loss function of the convolutional neural network according to the difference between the predicted value of the pixel point position of the image sample and the true value of the pixel point position of the image sample; determining an offset loss function of the convolutional neural network according to a predicted offset value of a pixel point position of the image sample; updating parameters of the convolutional neural network by minimizing an added value of the predictive loss function and the offset loss function.
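A minimal sketch of this training operation, reusing the `prediction_loss` and `offset_loss` functions sketched earlier and the dictionary-style outputs of the `DetectionHead` sketch; for brevity only the center heatmap and its offset are supervised here, and the data-loader format, optimizer and hyperparameters are assumptions.

```python
import torch

def train(network, data_loader, epochs=10, lr=1e-3):
    """Minimize the sum of the prediction loss and the offset loss over image samples.
    `data_loader` is assumed to yield (image, gt_heatmap, gt_offset, offset_mask) batches."""
    optimizer = torch.optim.Adam(network.parameters(), lr=lr)   # module parameters are already initialized
    for _ in range(epochs):
        for image, gt_heatmap, gt_offset, mask in data_loader:
            outputs = network(image)
            loss = (prediction_loss(outputs["center"].sigmoid(), gt_heatmap)
                    + offset_loss(outputs["center_offset"], gt_offset, mask))
            optimizer.zero_grad()
            loss.backward()                                     # update by minimizing the summed loss
            optimizer.step()
    return network
```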
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer readable storage medium, and the processor executes the computer instructions, so that the computer device executes the vehicle detection method described above in the embodiments of the present application.
Embodiments of the present application provide a computer-readable storage medium having stored therein executable instructions that, when executed by a processor, cause the processor to perform a vehicle detection method provided by embodiments of the present application, for example, as shown in fig. 6A, 6B, and 6C.
In some embodiments, the computer-readable storage medium may be memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash, magnetic surface memory, optical disk, or CD-ROM; or may be various devices including one or any combination of the above memories.
In some embodiments, executable instructions may be written in any form of programming language (including compiled or interpreted languages), in the form of programs, software modules, scripts or code, and may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
By way of example, executable instructions may correspond, but do not necessarily have to correspond, to files in a file system, and may be stored in a portion of a file that holds other programs or data, such as in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
By way of example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices at one site or distributed across multiple sites and interconnected by a communication network.
In summary, by the embodiment of the application, the positions of the surrounding frame and the key point of the vehicle are detected simultaneously by performing feature extraction, fusion and mapping on the image of the vehicle in the road environment, so that the attitude information of the vehicle can be determined; the attitude information of the vehicle can be obtained only by image sensing data, so that the sensing performance of the image sensor is improved, and the acquisition cost of the data is greatly reduced without other types of sensing data; the feature fusion processing can effectively fuse the low-level features and the high-level features, thereby enhancing the feature expression capability of the image and improving the accuracy of determining the positions of the bounding box and the key point; two results, namely 6 key points of the vehicle and a surrounding frame of the vehicle, can be detected simultaneously through one-time forward operation; the orientation information of the vehicle can be further determined through key points of the left tail lamp and the right tail lamp of the vehicle; the position relationship between the other vehicle and the vehicle can be judged by combining the lane lines through the grounding points of the left and right tires of the vehicle.
The above description is only an example of the present application, and is not intended to limit the scope of the present application. Any modification, equivalent replacement, and improvement made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (10)

1. A vehicle detection method, characterized by comprising:
acquiring an image of a vehicle in a road environment;
carrying out feature extraction processing on the image to obtain feature maps of the image in multiple scales;
performing feature fusion processing on the feature maps of the multiple scales to obtain a head feature map containing multiple features;
mapping the head characteristic diagram to obtain the position of an enclosing frame of the vehicle and the position of a key point;
and determining the attitude information of the vehicle according to the position of the surrounding frame and the position of the key point.
2. The method according to claim 1, wherein the performing the feature extraction processing on the image to obtain a feature map of the image at multiple scales comprises:
iterating k to perform the following:
carrying out downsampling processing on the image to obtain a kth intermediate feature map;
performing downsampling processing on the k-th intermediate feature map to obtain a (k+1)-th intermediate feature map, until the K-th intermediate feature map is obtained;
wherein k is an integer with the value increasing from 1 and a maximum value of K, and K is an integer greater than or equal to 2.
3. The method according to claim 2, wherein the performing feature fusion processing on the feature maps of the plurality of scales to obtain a head feature map including a plurality of features comprises:
iterating i to perform the following:
carrying out up-sampling processing on the K-i intermediate feature map to obtain an intermediate feature map with the same scale as the K-i-1 intermediate feature map;
adding the intermediate feature map obtained by the up-sampling processing and the K-i-1 th intermediate feature map pixel by pixel to obtain the head feature map containing a plurality of features;
wherein, i is an integer with the value increasing from 0 in turn, and the maximum value is K-1.
4. The method of claim 3, wherein before the mapping the head feature map, further comprising:
performing encoding processing on the head feature map so as to compress the feature map into a one-dimensional matrix of weight parameters for characterizing each feature;
decoding the one-dimensional matrix to obtain a weight parameter of each feature;
and weighting the weight parameters into each feature of the head feature map to obtain a weighted head feature map.
5. The method of claim 1,
the plurality of features includes: the feature of the central point position, the offset feature of the central point position, the width and height feature of the bounding box and the feature of the key point position;
the mapping processing of the head feature map to obtain the position of the bounding box in the image and the position of the key point comprises the following steps:
performing convolution processing on the head feature map to obtain a feature representing the central point position of the vehicle, an offset feature of the central point position, a width and height feature of the surrounding frame and a feature representing the key point position;
predicting coordinates of the central point position of the enclosure frame through a first full-connection layer based on the characteristics of the central point position and the offset characteristics of the central point position;
predicting the width and the height of the surrounding frame through a second full-connection layer based on the width and the height characteristics of the surrounding frame;
determining the position of the surrounding frame of the vehicle according to the coordinates of the central point position of the surrounding frame and the width and the height of the surrounding frame;
and predicting the coordinates of the positions of the key points of the vehicle through a third full-connected layer based on the characteristics of the positions of the key points.
6. The method of claim 5, wherein the keypoints comprise: a left tail lamp point and a right tail lamp point;
the obtaining of the attitude information of the vehicle according to the position of the enclosure frame and the position of the key point includes:
in a plane coordinate system comprising the bounding box and the key points, determining a line segment meeting at least one of the following conditions as a boundary line of the tail of the vehicle body: a line segment parallel to the line segment of the enclosure frame passing through the right tail lamp point and passing through the left tail lamp point; a line segment parallel to the line segment of the bounding box passing through the left tail light point and passing through the right tail light point;
when the boundary of the vehicle body tail is a line segment which is parallel to the line segment of the enclosure frame passing through the right tail lamp point and passes through the left tail lamp point, taking the area on the left side of the boundary in the enclosure frame as a vehicle body area, and taking the area on the right side of the boundary in the enclosure frame as a vehicle tail area;
when the boundary line of the vehicle body tail is a line segment which is parallel to the line segment of the enclosure frame passing through the left tail lamp point and passes through the right tail lamp point, the area on the right side of the boundary line in the enclosure frame is used as a vehicle body area, and the area on the left side of the boundary line in the enclosure frame is used as a vehicle tail area.
7. The method of claim 5, wherein the keypoints comprise: a left front tire grounding point, a right front tire grounding point, a left rear tire grounding point, and a right rear tire grounding point;
the method further comprises the following steps:
taking the line connecting the left front tire grounding point and the left rear tire grounding point as a left-side wheel grounding line;
taking the line connecting the right front tire grounding point and the right rear tire grounding point as a right-side wheel grounding line;
and determining the position relationship between the vehicle and the vehicle according to the relative relationship between the left-side wheel grounding line, the right-side wheel grounding line and the lane line of the vehicle.
8. The method of claim 7, wherein determining the positional relationship of the host vehicle and the vehicle according to the relative relationship of the left-side wheel ground line, the right-side wheel ground line and the lane line of the host vehicle comprises:
determining that the vehicle is in a left lane of the host vehicle when the right wheel ground line is on the left side of the lane line of the host vehicle and the right wheel ground line is parallel to the lane line of the host vehicle;
determining that the vehicle is in a right lane of the host vehicle when the left wheel ground line is on the right side of the lane line of the host vehicle and the left wheel ground line is parallel to the lane line of the host vehicle;
determining that the vehicle is in the same lane of the host vehicle when the left wheel ground line is on the right side of the lane line of the host vehicle and the right wheel ground line is on the left side of the lane line of the host vehicle;
determining that the vehicle changes lanes from a right lane of the host vehicle to a same lane of the host vehicle when the left wheel ground line is to the left of the lane line of the host vehicle and the left wheel ground line is not parallel to the lane line of the host vehicle;
when the right wheel grounding line is on the right side of the lane line of the vehicle and the right wheel grounding line is not parallel to the lane line of the vehicle, determining that the vehicle changes the lane from the left lane of the vehicle to the same lane of the vehicle.
9. The method of claim 1,
the feature extraction processing and the feature fusion processing are realized by a convolutional neural network;
before feature extraction is carried out on the image, the method further comprises the following steps:
performing the following training operations on the convolutional neural network:
initializing parameters of the convolutional neural network;
determining a prediction loss function of the convolutional neural network according to the difference between the predicted value of the pixel point position of the image sample and the true value of the pixel point position of the image sample;
determining an offset loss function of the convolutional neural network according to a predicted offset value of a pixel point position of the image sample;
updating parameters of the convolutional neural network by minimizing an added value of the predictive loss function and the offset loss function.
10. A vehicle detection device, characterized by comprising:
the acquisition module is used for acquiring images of vehicles in a road environment;
the extraction module is used for carrying out feature extraction processing on the image to obtain feature maps of the image in multiple scales;
the fusion module is used for carrying out feature fusion processing on the feature maps of the multiple scales to obtain a head feature map containing multiple features;
the mapping module is used for mapping the head characteristic diagram to obtain the position of an enclosing frame of the vehicle and the position of a key point;
and the determining module is used for determining the attitude information of the vehicle according to the position of the surrounding frame and the position of the key point.