CN113673444B - Intersection multi-view target detection method and system based on angular point pooling - Google Patents

Intersection multi-view target detection method and system based on angular point pooling

Info

Publication number
CN113673444B
CN113673444B
Authority
CN
China
Prior art keywords
pooling
view
corner
feature
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110971811.6A
Other languages
Chinese (zh)
Other versions
CN113673444A (en)
Inventor
张新钰
李骏
李志伟
高鑫
魏宏杨
王力
熊一瑾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN202110971811.6A
Publication of CN113673444A
Application granted
Publication of CN113673444B
Active legal status
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention discloses a method and a system for intersection multi-view target detection based on corner pooling. The method comprises: preprocessing images collected in real time by the multi-view cameras at an intersection; and inputting the preprocessed multi-view camera images into a pre-established and trained intersection multi-view target detection model, which outputs a target prediction result. The multi-view target detection model extracts features from the preprocessed multi-view camera images; it performs feature projection, feature fusion and corner pooling on the extracted features and predicts the target position from the corner-pooled ground-plane rectangular feature map; in parallel, it performs single-view detection and result projection on the extracted features, corrects the target position using the single-view target position map, and outputs the target prediction result.

Description

Intersection multi-view target detection method and system based on angular point pooling
Technical Field
The invention belongs to the field of target detection, and particularly relates to a corner pooling-based intersection multi-view target detection method and system.
Background
With the rapid development of unmanned driving and smart cities, vehicle detection with a single sensor has become relatively mature. However, at intersections with complex traffic conditions, factors such as the difficulty of dense detection caused by vehicle congestion, occlusion caused by bulky vehicles, and the uncertainty of a single sensor severely limit the accuracy of vehicle detection, and complex intersections also pose potential safety hazards. The introduction of multi-view detection methods has remarkably improved vehicle detection at intersections in crowded or occluded scenes and strongly promotes the safety of unmanned driving. However, multi-view vehicle detection is usually accompanied by the fusion of multi-sensor data. Multi-view data can be integrated for vehicle detection through either result-level fusion or feature-level fusion, but each has the following problems:
1. Multi-view result-level fusion: the data of each view requires a separate computing unit, which inevitably incurs a large computational overhead. When the detection results of all views are projected together, perspective-transformation errors and edge distortion during image stitching often make the results for targets in the overlapping region inconsistent across views, which causes a "ghost" phenomenon in the vehicle detection results and brings great uncertainty to unmanned-driving decisions.
2. Multi-view feature-level fusion: after the features of the multi-view data are extracted, all subsequent computation is completed on a single computing unit to reduce computational redundancy. However, feature fusion only reduces the amount of computation and does not substantially resolve the "ghost" phenomenon; instead, two ghosts may be detected as one larger target, which also interferes with the final decision.
Disclosure of Invention
The invention aims to overcome the above technical defects and provides an intersection multi-view target detection method based on corner pooling. In addition, by enhancing the corner information of the vehicle features, the corner-pooling-based intersection multi-view vehicle detection effectively improves the detection accuracy and the robustness of the model.
In order to achieve the above object, the present invention provides a method for detecting intersection multi-view objects based on angular point pooling, which comprises:
preprocessing images of the intersection multi-view cameras collected in real time;
inputting the preprocessed multi-view camera images into a pre-established and trained intersection multi-view target detection model, and outputting a target prediction result; the multi-view target detection model extracts features from the preprocessed multi-view camera images, performs feature projection, feature fusion and corner pooling on the extracted features, predicts the target position from the corner-pooled ground-plane rectangular feature map, performs single-view detection and result projection on the extracted features in parallel, corrects the target position using the single-view target position map, and outputs the target prediction result.
Further, the intersection multi-view target detection model comprises: a feature extraction module, a multi-view feature projection module, a feature fusion module, a feature map corner pooling module, a single-view detection module and a prediction module;
the feature extraction module is used for extracting features from the images of the multi-view cameras to obtain feature maps of the multiple views;
the multi-view feature projection module is used for projecting the feature maps of the multiple views onto a bird's-eye-view plane based on perspective transformation, using the calibration file of each camera, to obtain a concatenated projection feature map of the multiple cameras;
the feature fusion module is used for fusing the concatenated projection feature maps of the cameras with a 2-channel camera-coordinate feature map and outputting a ground-plane rectangular feature map of (N×C+2) channels, where N is the number of cameras and C is the number of feature channels extracted from each camera image;
the feature map corner pooling module is used for performing corner pooling on the ground-plane rectangular feature map and outputting the corner-pooled ground-plane rectangular feature map;
the single-view detection module is used for performing corner pooling on the feature map of each view to obtain multiple single-view target detection results, projecting them onto the bird's-eye-view plane, and outputting a single-view target position map;
and the prediction module is used for predicting the target position using the corner-pooled ground-plane rectangular feature map, correcting the target position using the single-view detection results in the single-view target position map, and outputting the target prediction result.
Further, the feature extraction module uses a ResNet50 network comprising: one 1x1 convolutional layer for dimensionality reduction, one 3x3 convolutional layer, and one 1x1 convolutional layer for restoring the dimensions.
Further, the specific implementation process of the multi-view feature projection module is as follows:
projecting the feature map of each view onto a bird's eye view plane:
$$s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = A\,[R \mid t]\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$
where s is a real scale factor; u and v are the coordinates before projection, and x, y and z are the coordinates after projection; A is the 3×3 camera intrinsic parameter matrix; [R|t] is the 3×4 joint rotation-translation matrix, where R denotes rotation and t denotes translation; for each camera calibration file, the ground-plane position is quantized into a grid of size H×W, where H and W are the length and width of the finally generated bird's-eye view; the image is projected by perspective transformation onto the ground plane z = 0, and ground-plane positions outside the field of view are filled with zeros.
Further, the specific implementation process of the feature map corner pooling module is as follows:
making 3 copies of the fused ground-plane rectangular feature map, and max-pooling all feature vectors of the 4 identical ground-plane rectangular feature maps toward the left, the right, upward and downward, respectively;
in pooling along a given direction, first taking the first feature value at the edge of each feature vector as the maximum; if a subsequent feature value is smaller than the maximum, replacing it with the maximum; if a larger feature value is encountered, taking it as the new maximum and continuing to pool with the new maximum until the feature vector is fully pooled in that direction;
adding the max-pooling results of the left pooling and the upward pooling, the sum being the top-left corner pooling;
adding the max-pooling results of the right pooling and the downward pooling, the sum being the bottom-right corner pooling;
and concatenating the top-left and bottom-right corner pooling results to obtain the corner-pooled ground-plane rectangular feature map.
Further, the single-view detection module comprises: a single-view feature map corner pooling unit and a single-view detection unit;
the single-view feature map corner pooling unit is used for performing top-left corner pooling and bottom-right corner pooling on the feature map of each view, respectively, and outputting the results to the single-view detection unit;
for each pooled vector, the corner pooling is maximum pooling along a given direction; adaptive attenuation optimization is applied to the maximum pooling, with the attenuation formula:
$$w = w_0 \cdot e^{-\lambda \cdot step}$$
where w is the adaptively attenuated feature value after corner pooling, λ is the attenuation coefficient, step is the distance from the position of the current maximum feature value, and w_0 is the current maximum feature value;
the single-view detection unit is used for performing single-view target detection on the outputs of the single-view feature map corner pooling unit, and projecting the multiple single-view target detection results onto the bird's-eye view according to the projection transformation formula to form the single-view target position map.
Further, the method further comprises: the step of training the intersection multi-view target detection model specifically comprises the following steps:
establishing a data set for training the model; the data set comprises: a label file set, an image data set and a calibration file set, where the label file set comprises a number of json files, the image data set comprises a number of preprocessed RGB images in one-to-one correspondence with the json files, and the calibration file set comprises, for each intersection camera, an intrinsic parameter file, an extrinsic parameter file and a calibration file relative to the ground plane;
in the intersection multi-view target detection model, each corner-pooling feature layer contains the corner-pooling results of multiple targets; to associate the corner-pooling results of the same target across different corner-pooling feature layers, the pooling results are grouped using a Pull loss function, with the top-left corner and the bottom-right corner of each target forming one group; owing to the independence of the feature layers, the corners of different targets are separated using a Push loss function;
the Pull loss function is as follows:

$$L_{pull} = \frac{1}{N}\sum_{k=1}^{N}\left[(e_{t_k} - e_k)^2 + (e_{b_k} - e_k)^2\right]$$

the Push loss function is as follows:

$$L_{push} = \frac{1}{N(N-1)}\sum_{k=1}^{N}\sum_{\substack{j=1 \\ j \neq k}}^{N}\max\left(0,\ \Delta - \left|e_k - e_j\right|\right)$$

where $e_{t_k}$ and $e_{b_k}$ are the embedding vectors of the top-left corner and the bottom-right corner of the k-th target, respectively, $e_k$ is the average of $e_{t_k}$ and $e_{b_k}$, N is the number of targets, and Δ is set to 1;
in the corner pooling of the single-view feature maps, the top-left and bottom-right corner pooling results are taken as one group of corners, which are added to the network training as one-dimensional embedding vectors;
setting the encoder and decoder sizes, the batch size, the number of training epochs and the learning rate of each epoch for model training, inputting the data set into the intersection multi-view target detection model, and training the model to obtain the trained intersection multi-view target detection model.
The invention also provides an intersection multi-view target detection system based on angular point pooling, which comprises: an intersection multi-view target detection model, a data preprocessing module and a target detection module,
the data preprocessing module is used for preprocessing the images of the intersection multi-view cameras collected in real time;
the target detection module is used for inputting the preprocessed multi-view camera data into the intersection multi-view target detection model and outputting a target prediction result; the multi-view target detection model extracts features from the preprocessed multi-view camera images, feeds the extracted features into one path for feature projection, feature fusion and corner pooling to predict the target position from the corner-pooled ground-plane rectangular feature map, feeds the extracted features into the other path for single-view detection and result projection, corrects the target position using the single-view target position map, and outputs the target prediction result.
Compared with the prior art, the invention has the advantages that:
1. the method requires no additional post-processing and accurately completes multi-view target detection while guaranteeing timeliness;
2. the corner-pooling-based detection method greatly improves the accuracy of intersection multi-view target detection and resolves the "ghost" phenomenon at the algorithm level;
3. aiming at the congestion or multitude of target vehicles that may occur at intersections, the invention improves corner pooling: the activation-value-attenuating corner pooling improves detection accuracy when the intersection is congested or contains too many target vehicles.
Drawings
In order to illustrate the invention more clearly, the drawings that are needed for the invention will be briefly described below, it being apparent that the drawings in the following description are some embodiments of the invention, for which other drawings may be derived by those skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of corner pooling according to the present invention, showing top-left corner pooling;
FIG. 2 is a flow chart of the intersection multi-view target detection method based on corner pooling of the present invention;
FIG. 3 is a simulation diagram of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It is to be understood that the described embodiments are only a few embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Before describing the embodiments of the present invention, the related terms related to the embodiments of the present invention are first explained as follows:
Multi-view cameras: a plurality of monocular cameras placed at the intersection and distributed along the roadside; the combined field of view of the multi-view cameras covers the whole intersection.
Multi-view images: the color images acquired by the multi-view cameras; each color image is a three-channel image.
Labels: the labels used for supervised training of the target detection neural network, annotating the category and position of each target in the multi-view images.
The embodiment 1 of the invention provides a corner pooling-based intersection multi-view target detection method, wherein a target is a vehicle, and the method comprises the following specific implementation steps:
step 1) establishing and training a multi-view target detection model of the intersection;
step 101) establishing a multi-view target detection model of the intersection;
The intersection multi-view target detection model comprises: a feature extraction module, a multi-view feature projection module, a feature fusion module, a feature map corner pooling module, a single-view detection module and a prediction module;
the feature extraction module is used for extracting the features of the multi-view images;
In use, considering the lightweight and real-time requirements of intersection multi-view detection, a "bottleneck" design is adopted for ResNet50: two 3x3 convolutional layers are replaced by a 1x1+3x3+1x1 stack, in which the input of the middle 3x3 convolutional layer is first reduced in dimension by one 1x1 convolutional layer and then restored by another 1x1 convolutional layer, maintaining accuracy while reducing computation. The first 1x1 convolution reduces the 256-dimensional channels to 64 dimensions, and the final 1x1 convolution restores them. This reduces the number of parameters and yields a lighter intersection multi-view target detection model.
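By way of illustration, a minimal PyTorch sketch of such a bottleneck block is given below. The 256 → 64 → 256 channel numbers follow the text; the batch-normalization layers, ReLU activations and residual connection are standard ResNet50 details assumed here rather than spelled out by the patent.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Sketch of the 1x1 -> 3x3 -> 1x1 bottleneck described above."""

    def __init__(self, channels: int = 256, reduced: int = 64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, reduced, kernel_size=1, bias=False),  # reduce dimension
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, reduced, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(reduced),
            nn.ReLU(inplace=True),
            nn.Conv2d(reduced, channels, kernel_size=1, bias=False),  # restore dimension
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.block(x) + x)  # residual connection
```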
Multi-view feature projection module:
projecting the feature map of each view onto the bird's-eye-view plane using the calibration files of the multiple cameras and the principle of perspective transformation, the transformation being:
$$s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = P_\theta \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = A\,[R \mid t]\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$
where s is a real scale factor; u and v are the coordinates before projection, and x, y and z are the coordinates after projection; P_θ is the 3×4 perspective transformation matrix; A is the 3×3 intrinsic parameter matrix; [R|t] is the 3×4 joint rotation-translation matrix, i.e., the extrinsic parameter matrix in the extrinsic parameter file, where R denotes rotation and t denotes translation. For each camera n ∈ {1, …, N} and its calibration file, the image is projected onto the ground plane z = 0 by perspective transformation through a custom sampling grid of shape [H, W]. Ground-plane positions outside the field of view are filled with zeros. The feature maps of the N cameras are projected in sequence according to the perspective transformation formula.
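As an illustration of this projection step, the following NumPy sketch collapses the 3×4 matrix A[R|t] to a 3×3 ground-plane homography using z = 0, maps each bird's-eye grid cell into the image, and leaves out-of-view cells at zero. The function names, the nearest-neighbour sampling and the cell_size parameter are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def ground_plane_homography(A, R, t):
    # With z = 0, s*[u, v, 1]^T = A [R | t] [x, y, 0, 1]^T = A [r1 r2 t] [x, y, 1]^T,
    # so the 3x4 projection collapses to a 3x3 ground-plane-to-image homography.
    return A @ np.column_stack([R[:, 0], R[:, 1], t])

def project_feature_map(feat, H_ground2img, grid_hw, cell_size=1.0):
    # Resample a (C, h, w) camera feature map onto a (C, H, W) bird's-eye grid;
    # grid cells outside the camera's field of view are left filled with zeros.
    C, h, w = feat.shape
    Hg, Wg = grid_hw
    bev = np.zeros((C, Hg, Wg), dtype=feat.dtype)
    ys, xs = np.mgrid[0:Hg, 0:Wg]
    ground = np.stack([xs.ravel() * cell_size,   # ground-plane x
                       ys.ravel() * cell_size,   # ground-plane y
                       np.ones(Hg * Wg)])        # homogeneous coordinate
    uvw = H_ground2img @ ground
    u, v = uvw[0] / uvw[2], uvw[1] / uvw[2]
    valid = (uvw[2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    bev[:, ys.ravel()[valid], xs.ravel()[valid]] = \
        feat[:, v[valid].astype(int), u[valid].astype(int)]
    return bev
```

One homography per camera is applied in turn, matching the sequential projection of the N feature maps described above.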
A feature fusion module:
The ground-plane position is quantized into a grid of size H×W, where H and W specify the length and width of the finally generated bird's-eye view. In addition, a 2-channel map specifies the X-Y coordinates of each ground-plane location. The projection feature maps of the N cameras output by the multi-view feature projection module are concatenated and combined with the 2-channel coordinate feature map to obtain an (N×C+2)-channel ground-plane rectangular feature map of shape [H, W], which is also the bird's-eye-view feature map of the intersection, where C is the number of feature channels extracted from each camera image.
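A minimal sketch of this fusion step follows, assuming the 2-channel coordinate map holds X-Y grid coordinates normalized to [-1, 1] (the patent does not specify the normalization):

```python
import torch

def fuse_projected_features(bev_feats, grid_hw):
    # bev_feats: list of N tensors, each (C, H, W), already projected onto the
    # bird's-eye plane. Returns an (N*C + 2, H, W) ground-plane feature map.
    H, W = grid_hw
    ys = torch.linspace(-1.0, 1.0, H).view(H, 1).expand(H, W)  # Y coordinate channel
    xs = torch.linspace(-1.0, 1.0, W).view(1, W).expand(H, W)  # X coordinate channel
    coord = torch.stack([xs, ys])       # the 2-channel coordinate feature map
    return torch.cat(list(bev_feats) + [coord], dim=0)
```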
A feature corner pooling module:
In object detection, the corners of a bounding box are usually located outside the object, in which case the corners cannot be located from local or edge features of the object. From the viewpoint of visual observation, to determine whether the top-left corner of a target detection box exists at a certain pixel position, one must look horizontally to the right along the topmost boundary of the target and vertically downward along the leftmost boundary. This observation pattern is applied to the fused feature map after multi-view projection.
Three copies of the fused feature map are made, and all feature vectors are max-pooled toward the left, the right, upward and downward, respectively. Specifically, in pooling along a given direction, the first feature value at the edge of each feature vector is taken as the maximum; if a subsequent feature value is smaller than the maximum, it is replaced by the maximum; if a larger feature value is encountered, it becomes the new maximum, and pooling continues with the new maximum until the feature vector is fully pooled in that direction.
For the corner-pooling results, the left-pooling and upward-pooling results are added to give the top-left corner pooling, and the right-pooling and downward-pooling results are added to give the bottom-right corner pooling. FIG. 1 shows a schematic diagram of top-left corner pooling. The top-left and bottom-right corner pooling results are concatenated as the output of the fused feature map after multi-view projection.
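The directional pooling just described is a running-maximum scan, sketched below with torch.cummax. The scan orientations (right-to-left and bottom-to-top for the top-left map, the opposite for the bottom-right map) follow the CornerNet convention; the patent text does not pin the sweep directions down explicitly, so this is an assumption.

```python
import torch

def scan_max(x, dim, reverse):
    # Running-maximum scan along one dimension: each position takes the maximum
    # of itself and every value already seen in the scan direction (the
    # "replace the maximum when a larger value is met" procedure above).
    if reverse:
        x = x.flip(dim)
    out = torch.cummax(x, dim=dim).values
    return out.flip(dim) if reverse else out

def corner_pool(fmap):
    # fmap: (C, H, W). Left + up scans give the top-left corner map, right +
    # down scans give the bottom-right corner map; the two are concatenated
    # along the channel dimension as described above.
    top_left = scan_max(fmap, dim=2, reverse=True) + scan_max(fmap, dim=1, reverse=True)
    bottom_right = scan_max(fmap, dim=2, reverse=False) + scan_max(fmap, dim=1, reverse=False)
    return torch.cat([top_left, bottom_right], dim=0)
```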
The single-view detection module comprises: a single-view feature map corner pooling unit and a single-view detection unit.
The single-view feature map corner pooling unit:
The feature maps used for single-view detection come from the shared features extracted from the multi-view data by the feature extraction module. To extract vehicle features at the intersection more accurately, corner pooling is also applied to the single-view feature maps:
three copies of the feature map of each single view are made, and all feature vectors are max-pooled toward the left, the right, upward and downward, respectively; the left-pooling and upward-pooling results are added to obtain the top-left corner pooling result, and the right-pooling and downward-pooling results are added to obtain the bottom-right corner pooling result.
Furthermore, adaptive attenuation optimization is applied to the corner pooling. In corner pooling, for each pooled vector, the pooling is maximum pooling along a given direction. This improves the detection of target corners but can cause feature confusion among multiple targets, so the maximum pooling is optimized with adaptive attenuation to prevent the maximum pooling value of the current target from interfering with the gap between two targets. The attenuation formula is:
$$w = w_0 \cdot e^{-\lambda \cdot step}$$
where w is the adaptively attenuated feature value after corner pooling, λ is the attenuation coefficient, step is the distance from the position of the current maximum feature value, and w_0 is the current maximum feature value. Studies show that adaptively attenuated corner pooling not only maintains multi-view vehicle detection performance at the intersection but also effectively reduces false detections and detection errors in the gaps between vehicles.
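Since the decay form of the attenuation formula is reconstructed above as an assumption (w = w0·e^(-λ·step)), the sketch below implements it by multiplying the carried maximum by e^(-λ) at every step of the scan, so a strong activation decays with distance instead of flooding the gap between two vehicles; the λ default is illustrative.

```python
import math
import torch

def attenuated_scan_max(x, dim, reverse, lam=0.1):
    # Running-maximum scan with adaptive attenuation: the propagated maximum
    # w0 decays as w = w0 * exp(-lam * step), where step is the distance from
    # the position where w0 was found (decay form assumed, see text).
    if reverse:
        x = x.flip(dim)
    x = x.transpose(dim, -1)
    out = x.clone()
    decay = math.exp(-lam)
    for i in range(1, out.shape[-1]):
        out[..., i] = torch.maximum(x[..., i], out[..., i - 1] * decay)
    out = out.transpose(dim, -1)
    return out.flip(dim) if reverse else out
```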
The single-view detection unit performs single-view detection on the output of the single-view feature map corner pooling unit and projects the single-view detection results onto the bird's-eye view according to the projection transformation formula, serving as part of the supervision information for intersection multi-view target detection; this helps obtain the position distribution of vehicles at the intersection and further improves detection performance.
The prediction module predicts the vehicle position information at the intersection from the above output, corrects the position information with the detection results of the single-view detection unit, and finally outputs accurate positions of the vehicles in the multi-view images projected onto the bird's-eye view of the current intersection.
Step 102) training a multi-view vehicle detection model at the intersection;
Establishing a data set for training the model; the data set comprises: a label file set, an image data set and a calibration file set, where the label file set comprises a number of json files, the image data set comprises a number of RGB images in one-to-one correspondence with the json files, and the calibration file set comprises the intrinsic and extrinsic parameter files of each data acquisition camera and the calibration files relative to the ground plane.
The three-channel RGB images are preprocessed and used as the input of the neural network model.
In the training process of the intersection multi-view target detection model, each corner-pooling feature layer contains the corner-pooling results of multiple targets; to associate the corner-pooling results of the same target across different corner-pooling feature layers, the pooling results are grouped using a Pull loss function, with the top-left corner and the bottom-right corner of each target forming one group; owing to the independence of the feature layers, the corners of different targets are separated using a Push loss function;
the Pull loss function is as follows:

$$L_{pull} = \frac{1}{N}\sum_{k=1}^{N}\left[(e_{t_k} - e_k)^2 + (e_{b_k} - e_k)^2\right]$$

the Push loss function is as follows:

$$L_{push} = \frac{1}{N(N-1)}\sum_{k=1}^{N}\sum_{\substack{j=1 \\ j \neq k}}^{N}\max\left(0,\ \Delta - \left|e_k - e_j\right|\right)$$

where $e_{t_k}$ and $e_{b_k}$ are the embedding vectors of the top-left corner and the bottom-right corner of the k-th target, respectively, $e_k$ is the average of $e_{t_k}$ and $e_{b_k}$, N is the number of targets, and Δ is set to 1.
Unlike the corner pooling of the multi-view fused feature map, in the corner pooling of the single-view feature maps the top-left and bottom-right corner pooling results are taken as one group of corners, which are added to the network training as one-dimensional embedding vectors.
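By way of example, the pull and push losses above might be sketched in PyTorch as follows, with e_tl and e_br holding the one-dimensional corner embeddings of N targets (the tensor names are illustrative):

```python
import torch

def pull_push_losses(e_tl, e_br, delta=1.0):
    # e_tl, e_br: (N,) embedding values of the top-left and bottom-right
    # corners of N targets; e_k is their average, as defined above.
    N = e_tl.shape[0]
    e_k = (e_tl + e_br) / 2
    # Pull: draw the two corners of each target toward their mean embedding.
    pull = ((e_tl - e_k) ** 2 + (e_br - e_k) ** 2).sum() / N
    # Push: keep mean embeddings of different targets at least delta apart.
    gap = delta - (e_k.unsqueeze(0) - e_k.unsqueeze(1)).abs()
    off_diag = ~torch.eye(N, dtype=torch.bool)
    push = torch.clamp(gap[off_diag], min=0).sum() / (N * (N - 1))  # requires N > 1
    return pull, push
```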
The encoder and decoder sizes, the batch size, the number of training epochs and the learning rate of each epoch are set for training the intersection multi-view target detection model, and the model is trained to obtain the trained intersection multi-view target detection model.
Step 2) preprocessing the multi-view camera raw data collected in real time, including whitening, denoising and other operations;
And 3) inputting the preprocessed multi-view camera data into the trained intersection multi-view target detection model: features are first extracted; feature projection, feature fusion and corner pooling are performed on the extracted features to output vehicle position information; single-view detection and result projection are performed at the same time to output correction information; the vehicle position information is corrected with it; and an accurate vehicle position prediction result is output, as shown in FIG. 2.
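Putting the pieces together, the inference flow of step 3) might look like the following sketch. All model.* methods are hypothetical stand-ins for the patent's modules; fuse_projected_features and corner_pool are the illustrative functions defined earlier.

```python
def detect_vehicles(images, calibs, model, grid_hw):
    # images: list of N preprocessed camera tensors; calibs: per-camera calibration.
    feats = [model.extract_features(img) for img in images]      # shared backbone features
    bev_feats = [model.project_to_ground(f, c, grid_hw)          # perspective projection
                 for f, c in zip(feats, calibs)]
    fused = fuse_projected_features(bev_feats, grid_hw)          # (N*C + 2, H, W)
    pooled = corner_pool(fused)                                  # corner-pooled BEV map
    positions = model.predict_positions(pooled)                  # vehicle position heatmap
    # Single-view path: per-view corner pooling + detection, projected to the BEV.
    position_map = model.project_results(
        [model.single_view_detect(f) for f in feats])
    return model.correct(positions, position_map)                # corrected prediction
```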
Vehicle position prediction of multi-view camera data is performed using the method of the present invention, as shown in fig. 3.
Example 2
Embodiment 2 of the present invention provides an intersection multi-view target detection system based on angular point pooling, including: a trained intersection multi-view target detection model, a data preprocessing module and a target detection module,
The data preprocessing module is used for preprocessing the images of the intersection multi-view cameras collected in real time;
the target detection module is used for inputting the preprocessed multi-view camera data into the intersection multi-view target detection model and outputting a target prediction result; the multi-view target detection model extracts features from the preprocessed multi-view camera images, feeds the extracted features into one path for feature projection, feature fusion and corner pooling to predict the target position from the corner-pooled ground-plane rectangular feature map, feeds the extracted features into the other path for single-view detection and result projection, corrects the target position using the single-view target position map, and outputs the target prediction result.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (6)

1. An intersection multi-view target detection method based on corner pooling, the method comprising:
preprocessing images of the intersection multi-view cameras collected in real time;
inputting the preprocessed multi-view camera images into a pre-established and trained intersection multi-view target detection model, and outputting a target prediction result; the multi-view target detection model extracts features from the preprocessed multi-view camera images, performs feature projection, feature fusion and corner pooling on the extracted features, predicts the target position from the corner-pooled ground-plane rectangular feature map, performs single-view detection and result projection on the extracted features in parallel, corrects the target position using the single-view target position map, and outputs the target prediction result;
the intersection multi-view target detection model comprises: a feature extraction module, a multi-view feature projection module, a feature fusion module, a feature map corner pooling module, a single-view detection module and a prediction module;
the feature extraction module is used for extracting features from the images of the multi-view cameras to obtain feature maps of the multiple views;
the multi-view feature projection module is used for projecting the feature maps of the multiple views onto a bird's-eye-view plane based on perspective transformation, using the calibration file of each camera, to obtain a concatenated projection feature map of the multiple cameras;
the feature fusion module is used for fusing the concatenated projection feature maps of the cameras with a 2-channel camera-coordinate feature map and outputting a ground-plane rectangular feature map of (N×C+2) channels, where N is the number of cameras and C is the number of feature channels extracted from each camera image;
the feature map corner pooling module is used for performing corner pooling on the ground-plane rectangular feature map and outputting the corner-pooled ground-plane rectangular feature map;
the single-view detection module is used for performing corner pooling on the feature map of each view to obtain multiple single-view target detection results, projecting them onto the bird's-eye-view plane, and outputting a single-view target position map;
the prediction module is used for predicting the target position using the corner-pooled ground-plane rectangular feature map, correcting the target position using the single-view detection results in the single-view target position map, and outputting the target prediction result;
the specific implementation process of the feature map corner pooling module is as follows:
making 3 copies of the fused ground-plane rectangular feature map, and max-pooling all feature vectors of the 4 identical ground-plane rectangular feature maps toward the left, the right, upward and downward, respectively;
in pooling along a given direction, first taking the first feature value at the edge of each feature vector as the maximum; if a subsequent feature value is smaller than the maximum, replacing it with the maximum; if a larger feature value is encountered, taking it as the new maximum and continuing to pool with the new maximum until the feature vector is fully pooled in that direction;
adding the max-pooling results of the left pooling and the upward pooling, the sum being the top-left corner pooling;
adding the max-pooling results of the right pooling and the downward pooling, the sum being the bottom-right corner pooling;
and concatenating the top-left and bottom-right corner pooling results to obtain the corner-pooled ground-plane rectangular feature map.
2. The intersection multi-view target detection method based on corner pooling of claim 1, wherein the feature extraction module uses a ResNet50 network comprising: one 1x1 convolutional layer for dimensionality reduction, one 3x3 convolutional layer, and one 1x1 convolutional layer for restoring the dimensions.
3. The intersection multi-view target detection method based on corner pooling of claim 1, wherein the specific implementation process of the multi-view feature projection module is as follows:
projecting the feature map of each view onto a bird's eye view plane:
$$s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = A\,[R \mid t]\begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix}$$
where s is a real scale factor; u and v are the coordinates before projection, and x, y and z are the coordinates after projection; A is the 3×3 camera intrinsic parameter matrix; [R|t] is the 3×4 joint rotation-translation matrix, where R denotes rotation and t denotes translation; for each camera calibration file, the ground-plane position is quantized into a grid of size H×W, where H and W are the length and width of the finally generated bird's-eye view; the image is projected by perspective transformation onto the ground plane z = 0, and ground-plane positions outside the field of view are filled with zeros.
4. The intersection multi-view target detection method based on corner pooling of claim 1, wherein the single-view detection module comprises: a single-view feature map corner pooling unit and a single-view detection unit;
the single-view feature map corner pooling unit is used for performing top-left corner pooling and bottom-right corner pooling on the feature map of each view, respectively, and outputting the results to the single-view detection unit;
for each pooled vector, the corner pooling is maximum pooling along a given direction; adaptive attenuation optimization is applied to the maximum pooling, with the attenuation formula:
$$w = w_0 \cdot e^{-\lambda \cdot step}$$
where w is the adaptively attenuated feature value after corner pooling, λ is the attenuation coefficient, step is the distance from the position of the current maximum feature value, and w_0 is the current maximum feature value;
the single-view detection unit is used for performing single-view target detection on the outputs of the single-view feature map corner pooling unit, and projecting the multiple single-view target detection results onto the bird's-eye view according to the projection transformation formula to form the single-view target position map.
5. The intersection multi-view target detection method based on corner pooling of claim 1, further comprising: the step of training the intersection multi-view target detection model specifically comprises the following steps:
establishing a data set for training the model; the data set comprises: a label file set, an image data set and a calibration file set, where the label file set comprises a number of json files, the image data set comprises a number of preprocessed RGB images in one-to-one correspondence with the json files, and the calibration file set comprises, for each intersection camera, an intrinsic parameter file, an extrinsic parameter file and a calibration file relative to the ground plane;
in the intersection multi-view target detection model, each corner-pooling feature layer contains the corner-pooling results of multiple targets; to associate the corner-pooling results of the same target across different corner-pooling feature layers, the pooling results are grouped using a Pull loss function, with the top-left corner and the bottom-right corner of each target forming one group; owing to the independence of the feature layers, the corners of different targets are separated using a Push loss function;
the Pull loss function $L_{pull}$ is as follows:

$$L_{pull} = \frac{1}{N}\sum_{k=1}^{N}\left[(e_{t_k} - e_k)^2 + (e_{b_k} - e_k)^2\right]$$

the Push loss function $L_{push}$ is as follows:

$$L_{push} = \frac{1}{N(N-1)}\sum_{k=1}^{N}\sum_{\substack{j=1 \\ j \neq k}}^{N}\max\left(0,\ \Delta - \left|e_k - e_j\right|\right)$$

where $e_{t_k}$ and $e_{b_k}$ are the embedding vectors of the top-left corner and the bottom-right corner of the k-th target, respectively, $e_k$ is the average of $e_{t_k}$ and $e_{b_k}$, N is the number of targets, and Δ is set to 1;
in the corner pooling of the single-view feature maps, the top-left and bottom-right corner pooling results are taken as one group of corners, which are added to the network training as one-dimensional embedding vectors;
setting the encoder and decoder sizes, the batch size, the number of training epochs and the learning rate of each epoch for model training, inputting the data set into the intersection multi-view target detection model, and training the model to obtain the trained intersection multi-view target detection model.
6. An intersection multi-view target detection system based on corner pooling, the system comprising: an intersection multi-view target detection model, a data preprocessing module and a target detection module,
the data preprocessing module is used for preprocessing the images of the intersection multi-view cameras collected in real time;
the target detection module is used for inputting the preprocessed multi-view camera data into the intersection multi-view target detection model and outputting a target prediction result; the multi-view target detection model extracts features from the preprocessed multi-view camera images, feeds the extracted features into one path for feature projection, feature fusion and corner pooling to predict the target position from the corner-pooled ground-plane rectangular feature map, feeds the extracted features into the other path for single-view detection and result projection, corrects the target position using the single-view target position map, and outputs the target prediction result;
the intersection multi-view target detection model comprises: a feature extraction module, a multi-view feature projection module, a feature fusion module, a feature map corner pooling module, a single-view detection module and a prediction module;
the feature extraction module is used for extracting features from the images of the multi-view cameras to obtain feature maps of the multiple views;
the multi-view feature projection module is used for projecting the feature maps of the multiple views onto a bird's-eye-view plane based on perspective transformation, using the calibration file of each camera, to obtain a concatenated projection feature map of the multiple cameras;
the feature fusion module is used for fusing the concatenated projection feature maps of the cameras with a 2-channel camera-coordinate feature map and outputting a ground-plane rectangular feature map of (N×C+2) channels, where N is the number of cameras and C is the number of feature channels extracted from each camera image;
the feature map corner pooling module is used for performing corner pooling on the ground-plane rectangular feature map and outputting the corner-pooled ground-plane rectangular feature map;
the single-view detection module is used for performing corner pooling on the feature map of each view to obtain multiple single-view target detection results, projecting them onto the bird's-eye-view plane, and outputting a single-view target position map;
the prediction module is used for predicting the target position using the corner-pooled ground-plane rectangular feature map, correcting the target position using the single-view detection results in the single-view target position map, and outputting the target prediction result;
the specific implementation process of the feature map corner pooling module is as follows:
making 3 copies of the fused ground-plane rectangular feature map, and max-pooling all feature vectors of the 4 identical ground-plane rectangular feature maps toward the left, the right, upward and downward, respectively;
in pooling along a given direction, first taking the first feature value at the edge of each feature vector as the maximum; if a subsequent feature value is smaller than the maximum, replacing it with the maximum; if a larger feature value is encountered, taking it as the new maximum and continuing to pool with the new maximum until the feature vector is fully pooled in that direction;
adding the max-pooling results of the left pooling and the upward pooling, the sum being the top-left corner pooling;
adding the max-pooling results of the right pooling and the downward pooling, the sum being the bottom-right corner pooling;
and concatenating the top-left and bottom-right corner pooling results to obtain the corner-pooled ground-plane rectangular feature map.
CN202110971811.6A 2021-08-19 2021-08-19 Intersection multi-view target detection method and system based on angular point pooling Active CN113673444B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110971811.6A CN113673444B (en) 2021-08-19 2021-08-19 Intersection multi-view target detection method and system based on angular point pooling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110971811.6A CN113673444B (en) 2021-08-19 2021-08-19 Intersection multi-view target detection method and system based on angular point pooling

Publications (2)

Publication Number Publication Date
CN113673444A CN113673444A (en) 2021-11-19
CN113673444B (en) 2022-03-11

Family

ID=78545259

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110971811.6A Active CN113673444B (en) 2021-08-19 2021-08-19 Intersection multi-view target detection method and system based on angular point pooling

Country Status (1)

Country Link
CN (1) CN113673444B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114898585B (en) * 2022-04-20 2023-04-14 清华大学 Intersection multi-view-angle-based vehicle track prediction planning method and system
CN115578702B (en) * 2022-09-26 2023-12-05 北京百度网讯科技有限公司 Road element extraction method and device, electronic equipment, storage medium and vehicle

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177247A (en) * 2013-04-09 2013-06-26 天津大学 Target detection method fused with multi-angle information
CN111429514A (en) * 2020-03-11 2020-07-17 浙江大学 Laser radar 3D real-time target detection method fusing multi-frame time sequence point clouds

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729620B (en) * 2013-12-12 2017-11-03 北京大学 A kind of multi-view pedestrian detection method based on multi-view Bayesian network
US10540545B2 (en) * 2017-11-22 2020-01-21 Intel Corporation Age classification of humans based on image depth and human pose
US10452959B1 (en) * 2018-07-20 2019-10-22 Synapse Tehnology Corporation Multi-perspective detection of objects
CN111222387B (en) * 2018-11-27 2023-03-03 北京嘀嘀无限科技发展有限公司 System and method for object detection
CN110363815A (en) * 2019-05-05 2019-10-22 东南大学 The robot that Case-based Reasoning is divided under a kind of haplopia angle point cloud grabs detection method
CN110084222B (en) * 2019-05-08 2022-10-21 大连海事大学 Vehicle detection method based on multi-target angular point pooling neural network
CN110246141B (en) * 2019-06-13 2022-10-21 大连海事大学 Vehicle image segmentation method based on joint corner pooling under complex traffic scene
CN111523553B (en) * 2020-04-03 2023-04-18 中国计量大学 Central point network multi-target detection method based on similarity matrix
CN112329662A (en) * 2020-11-10 2021-02-05 西北工业大学 Multi-view saliency estimation method based on unsupervised learning
CN112365581B (en) * 2020-11-17 2024-04-09 北京工业大学 Single-view and multi-view three-dimensional reconstruction method and device based on RGB data
CN112488066A (en) * 2020-12-18 2021-03-12 航天时代飞鸿技术有限公司 Real-time target detection method under unmanned aerial vehicle multi-machine cooperative reconnaissance
CN112581503B (en) * 2020-12-25 2022-11-11 清华大学 Multi-target detection and tracking method under multiple visual angles
CN112966736B (en) * 2021-03-03 2022-11-11 北京航空航天大学 Vehicle re-identification method based on multi-view matching and local feature fusion
CN113096058B (en) * 2021-04-23 2022-04-12 哈尔滨工业大学 Spatial target multi-source data parametric simulation and MixCenterNet fusion detection method
CN113673425B (en) * 2021-08-19 2022-03-15 清华大学 Multi-view target detection method and system based on Transformer

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103177247A (en) * 2013-04-09 2013-06-26 天津大学 Target detection method fused with multi-angle information
CN111429514A (en) * 2020-03-11 2020-07-17 浙江大学 Laser radar 3D real-time target detection method fusing multi-frame time sequence point clouds

Also Published As

Publication number Publication date
CN113673444A (en) 2021-11-19

Similar Documents

Publication Publication Date Title
CN111583337B (en) Omnibearing obstacle detection method based on multi-sensor fusion
CN109740465B (en) Lane line detection algorithm based on example segmentation neural network framework
CN109948661B (en) 3D vehicle detection method based on multi-sensor fusion
CN109034018B (en) Low-altitude small unmanned aerial vehicle obstacle sensing method based on binocular vision
CN115439424B (en) Intelligent detection method for aerial video images of unmanned aerial vehicle
CN108694386B (en) Lane line detection method based on parallel convolution neural network
CN104848851B (en) Intelligent Mobile Robot and its method based on Fusion composition
WO2022141910A1 (en) Vehicle-road laser radar point cloud dynamic segmentation and fusion method based on driving safety risk field
CN111079556A (en) Multi-temporal unmanned aerial vehicle video image change area detection and classification method
CN113673444B (en) Intersection multi-view target detection method and system based on angular point pooling
CN108648194B (en) Three-dimensional target identification segmentation and pose measurement method and device based on CAD model
CN113158768B (en) Intelligent vehicle lane line detection method based on ResNeSt and self-attention distillation
CN111914795A (en) Method for detecting rotating target in aerial image
CN111401150A (en) Multi-lane line detection method based on example segmentation and adaptive transformation algorithm
CN113129449B (en) Vehicle pavement feature recognition and three-dimensional reconstruction method based on binocular vision
CN110009675A (en) Generate method, apparatus, medium and the equipment of disparity map
CN114140672A (en) Target detection network system and method applied to multi-sensor data fusion in rainy and snowy weather scene
CN115019043A (en) Image point cloud fusion three-dimensional target detection method based on cross attention mechanism
CN114898353B (en) License plate recognition method based on video sequence image characteristics and information
CN116434088A (en) Lane line detection and lane auxiliary keeping method based on unmanned aerial vehicle aerial image
CN114372919B (en) Method and system for splicing panoramic all-around images of double-trailer train
CN114445442A (en) Multispectral image semantic segmentation method based on asymmetric cross fusion
CN110415299B (en) Vehicle position estimation method based on set guideboard under motion constraint
CN108460348A (en) Road target detection method based on threedimensional model
CN117115690A (en) Unmanned aerial vehicle traffic target detection method and system based on deep learning and shallow feature enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant