CN114332740B - Video-based intersection deadlock event detection method and device


Info

Publication number
CN114332740B
Authority
CN
China
Prior art keywords
feature
feature map
intersection
pooling
vehicle
Prior art date
Legal status
Active
Application number
CN202210217837.6A
Other languages
Chinese (zh)
Other versions
CN114332740A
Inventor
陈维强
王雯雯
陈晓明
臧海洋
Current Assignee
Hisense TransTech Co Ltd
Original Assignee
Hisense TransTech Co Ltd
Priority date
Filing date
Publication date
Application filed by Hisense TransTech Co Ltd filed Critical Hisense TransTech Co Ltd
Priority to CN202210217837.6A
Publication of CN114332740A
Application granted
Publication of CN114332740B
Status: Active

Landscapes

  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of intelligent transportation and provides a video-based intersection deadlock event detection method and device. The method adopts a vehicle detection model based on partial-channel random grouping convolution to improve the detection accuracy of target vehicles in an intersection. The model also adopts multi-layer feature enhanced fusion, which reduces its sensitivity to vehicle pose and further improves detection accuracy. A traffic parameter set for the intersection is then obtained by tracking the target vehicles detected by the model, and whether an intersection deadlock event has occurred is determined from that parameter set. By detecting target vehicles accurately, the detection accuracy of intersection deadlock events is improved, so that an intersection where a deadlock event occurs can be handled in time, vehicles are prevented from continuing to accumulate at the intersection, and the efficiency of vehicle passage through the intersection is improved.

Description

Video-based intersection deadlock event detection method and device
Technical Field
The application relates to the technical field of intelligent transportation, and in particular to a video-based intersection deadlock event detection method and device.
Background
With economic development, the number of motor vehicles in cities has increased markedly, and saturated traffic has become common. Congestion is especially severe during peak hours and is a problem faced by every city.
At an intersection, a backlog of vehicles caused by congestion in one direction eventually affects traffic in the other directions of the intersection; the blockage spreads until intersection traffic enters a deadlock state and road traffic is paralyzed.
To avoid such congestion, intersection deadlock events need to be detected so that moving vehicles can be controlled appropriately. At present, intersection deadlock detection typically determines the congestion state of the intersection, and hence whether a deadlock event has occurred, from the detection and tracking results of target vehicles in the intersection.
However, in real scenes, when an intersection enters a deadlock state the blocked vehicles are numerous and densely distributed, and they occlude one another severely; at the same time, differences in driving direction and in turning direction at the intersection produce large variations in vehicle pose and visible extent. These problems lower the detection accuracy of target vehicles in the intersection, which in turn degrades the detection of deadlock events.
Disclosure of Invention
The embodiments of the application provide a video-based intersection deadlock event detection method and device, which are used to improve the detection accuracy of intersection deadlock events.
In a first aspect, an embodiment of the present application provides a method for detecting an intersection deadlock event based on a video, including:
acquiring a traffic video stream of a target intersection;
for each video frame, acquiring, by using a trained vehicle detection model, a feature map set corresponding to a preset detection area of the video frame, wherein different feature maps in the feature map set have different scales;
performing partial-channel random grouping convolution on at least one feature map in the feature map set to obtain a feature vector corresponding to each such feature map;
performing at least one multi-scale enhanced pooling operation on the feature map with the minimum scale in the feature map set to obtain a fused feature vector;
obtaining at least one target vehicle contained in the video frame according to the feature vector corresponding to each feature map in the feature map set;
and determining a traffic parameter set of the target intersection according to the detected target vehicles, and determining whether an intersection deadlock event occurs according to the traffic parameter set.
Optionally, the performing partial-channel random grouping convolution on at least one feature map in the feature map set to obtain the corresponding feature vectors includes:
for at least one feature map in the feature map set, respectively performing the following operation:
performing grouping convolution on the channels of the feature map, and recombining the last two thirds of the convolved channels to obtain the feature vector corresponding to the feature map.
Optionally, the performing at least one multi-scale enhanced pooling operation on the feature map with the smallest scale in the feature map set to obtain a fused feature vector includes:
copying the smallest-scale feature map into N copies, wherein N is an integer greater than 1;
pooling N-1 of the copies to obtain N-1 first feature vectors, wherein each of the N-1 copies uses a different pooling kernel;
pooling the remaining copy by using pooling kernels of different scales, merging the pooled results into a new feature map, taking the new feature map as the input of the next enhanced pooling operation, and performing multi-scale enhanced pooling again to obtain a second feature vector;
and fusing the N-1 first feature vectors with the second feature vector to obtain the fused feature vector.
Optionally, the determining a traffic parameter set of the target intersection according to the detected target vehicles includes:
determining the driving direction of each target vehicle according to the detected driving data of each target vehicle at different moments;
calculating the ratio of the total area of the detected target vehicles to the area of the preset detection area;
and acquiring a traffic parameter set of the target intersection according to the driving direction and the area ratio.
Optionally, the determining whether an intersection deadlock event occurs according to the traffic parameter set includes:
and when the area ratio is larger than a preset area threshold value and the number of the target vehicles with mutually perpendicular driving directions is larger than a preset vehicle threshold value, determining that an intersection deadlock event occurs.
Optionally, after determining that the intersection deadlock event occurs, the method further includes:
and sending prompt information to the downstream vehicle associated with the target intersection so that the downstream vehicle replans the route or adjusts the running speed according to the prompt information.
In a second aspect, an embodiment of the present application provides a video-based intersection deadlock event detection device, including a processor, a memory, and a communication interface, where the communication interface, the memory, and the processor are connected by a bus:
the memory stores a computer program, and the processor performs the following operations according to the computer program:
acquiring traffic video stream of a target intersection through the communication interface;
for each video frame, acquiring, by using a trained vehicle detection model, a feature map set corresponding to a preset detection area of the video frame, wherein different feature maps in the feature map set have different scales;
performing partial-channel random grouping convolution on at least one feature map in the feature map set to obtain a feature vector corresponding to each such feature map;
performing at least one multi-scale enhanced pooling operation on the feature map with the minimum scale in the feature map set to obtain a fused feature vector;
obtaining at least one target vehicle contained in the video frame according to the feature vector corresponding to each feature map in the feature map set;
and determining a traffic parameter set of the target intersection according to the detected target vehicles, and determining whether an intersection deadlock event occurs according to the traffic parameter set.
Optionally, the processor performs partial-channel random grouping convolution on at least one feature map in the feature map set to obtain the corresponding feature vectors, specifically:
for at least one feature map in the feature map set, respectively performing the following operation:
performing grouping convolution on the channels of the feature map, and recombining the last two thirds of the convolved channels to obtain the feature vector corresponding to the feature map.
Optionally, the processor performs at least one multi-scale enhanced pooling operation on the feature map with the smallest scale in the feature map set to obtain a fused feature vector, where the specific operation is:
copying the smallest-scale feature map into N copies, wherein N is an integer greater than 1;
pooling N-1 of the copies to obtain N-1 first feature vectors, wherein each of the N-1 copies uses a different pooling kernel;
pooling the remaining copy by using pooling kernels of different scales, merging the pooled results into a new feature map, taking the new feature map as the input of the next enhanced pooling operation, and performing multi-scale enhanced pooling again to obtain a second feature vector;
and fusing the N-1 first feature vectors with the second feature vector to obtain the fused feature vector.
Optionally, the processor determines a traffic parameter set of the target intersection according to the detected target vehicles, and the specific operation is as follows:
determining the driving direction of each target vehicle according to the detected driving data of each target vehicle at different moments;
calculating the ratio of the total area of the detected target vehicles to the area of the preset detection area;
and acquiring a traffic parameter set of the target intersection according to the driving direction and the area ratio.
Optionally, the processor determines whether an intersection deadlock event occurs according to the traffic parameter set, and specifically performs the following operations:
and when the area ratio is larger than a preset area threshold value and the number of the target vehicles with mutually perpendicular driving directions is larger than a preset vehicle threshold value, determining that an intersection deadlock event occurs.
Optionally, after determining that an intersection deadlock event occurs, the processor further performs the following operations:
and sending prompt information to the downstream vehicle associated with the target intersection so that the downstream vehicle replans the route or adjusts the running speed according to the prompt information.
In a third aspect, the present application provides a computer-readable storage medium, where computer-executable instructions are stored, and the computer-executable instructions are configured to cause a computer to execute the method for detecting a video-based intersection deadlock event provided in the embodiment of the present application.
The beneficial effect of this application is as follows:
in the embodiment of the application, a trained vehicle detection model is adopted for each video frame in a traffic video stream of a target intersection to obtain a feature map set corresponding to a preset detection area of the video frame, and due to the fact that scales of different feature maps in the feature map set are different, multi-scale information and semantic information of a target vehicle in the video frame are fully reserved, and the detection accuracy of the target vehicle is improved; by respectively carrying out random grouping convolution on partial channels on at least one feature map in the feature map set, obtaining feature vectors corresponding to corresponding feature maps, wherein the feature vectors retain original information of partial channels and recombined information of partial channels, the richness of the feature vectors is improved, and the feature vectors are beneficial to target vehicle detection; by carrying out at least one multi-scale enhanced pooling operation on the feature map with the minimum scale in the feature map set, the attitude information of the vehicle in the video frame can be reserved by the fused feature vector, and the accuracy of target vehicle detection is further improved; and then, according to the detected target vehicles, determining a traffic parameter set of the target intersection, and according to the traffic parameter set, determining whether an intersection deadlock event occurs.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present application; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart illustrating a method for training a vehicle detection model provided by an embodiment of the present application;
FIG. 2 is a schematic diagram illustrating a principle of random grouping and convolution of channels according to an embodiment of the present application;
FIG. 3 is a schematic diagram illustrating an enhanced pooling scheme provided by an embodiment of the present application;
FIG. 4 is a block diagram illustrating an overall view of a vehicle detection model provided by an embodiment of the present application;
FIG. 5 is a diagram illustrating a feature vector extraction structure of a vehicle detection model provided in an embodiment of the present application;
FIG. 6 is a flowchart illustrating a video-based intersection deadlock event detection method according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating partial channel random packet convolution provided by an embodiment of the present application;
FIG. 8 is a flow chart illustrating a method for enhancing pooling provided by an embodiment of the present application;
fig. 9 is a flowchart illustrating a traffic parameter set obtaining method provided by an embodiment of the present application;
FIG. 10 is a diagram illustrating an effect of detecting an intersection deadlock event according to an embodiment of the present application;
fig. 11 is a block diagram illustrating an example of a video-based intersection deadlock event detection apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making any creative effort belong to the protection scope of the present application.
Road traffic affects the daily travel of urban residents, and intersections are critical traffic nodes; if an intersection enters a deadlock state, the resulting congestion reduces vehicle passage efficiency and disrupts the order of social operation. It is therefore particularly important to monitor the congestion state of intersections in real time so that a deadlocked intersection can be cleared promptly.
At present, influenced by factors such as the number, density, occlusion, and driving directions of vehicles in an intersection, the prior art cannot accurately detect target vehicles and therefore cannot accurately determine the congestion state of the intersection. This lowers the detection accuracy of intersection deadlock events, so traffic cannot be diverted in time and passage efficiency is low.
In view of this, the embodiments of the present application provide a video-based intersection deadlock event detection method and device. The method adopts a vehicle detection model based on partial-channel random grouping convolution to improve the detection accuracy of target vehicles in an intersection. The model also adopts multi-layer feature enhanced fusion, which reduces its sensitivity to vehicle pose and further improves detection accuracy. A traffic parameter set for the intersection is then obtained by tracking the target vehicles detected by the model, and whether an intersection deadlock event has occurred is determined from that parameter set. By detecting target vehicles accurately, the detection accuracy of intersection deadlock events is improved, so that a deadlocked intersection can be handled in time, vehicles are prevented from continuing to accumulate at the intersection, the efficiency of vehicle passage through the intersection is improved, urban traffic congestion is effectively relieved, and the rate of traffic accidents is reduced.
In the embodiment of the present application, in order to obtain the vehicle detection model, a set of training samples for training the vehicle detection model needs to be collected in advance.
At present, acquisition devices are commonly installed at intersections of various forms (such as crossroads, T-junctions, and five-way intersections), including but not limited to intersection cameras, electronic police cameras, and checkpoint (bayonet) cameras; these devices can capture various types of vehicles, including but not limited to cars, trucks, buses, and mixer trucks. A training sample set can therefore be obtained from the acquisition devices in an intersection. Because the installation positions and angles of the acquisition devices differ, the traffic pictures they capture may differ, so the devices need to be configured in advance to capture pictures that meet the detection requirements.
Taking an electronic police camera as an example of the acquisition device: when traffic managers install it, the mounting height and shooting angle of the camera are usually adjusted so that it captures the traffic picture of the intersection. Therefore, after the detection device of this embodiment connects to an electronic police camera, it acquires the intersection video in real time; when the acquired video meets the preset vehicle detection requirement (for example, the intersection lies in the central area of the video frame), no further adjustment is needed, and when it does not, the camera is adjusted again until the requirement is met. Then, after intersection video meeting the preset vehicle detection requirement is obtained, one reference frame is randomly selected from the video, the detection area for intersection deadlock events is manually labeled on the selected reference image, and the position information of the labeled detection region frame is recorded.
Similarly, after the detection device connects to an intersection camera or a checkpoint camera, the detection area is labeled in the same way for the acquired intersection video.
In the embodiments of the application, because parameters such as the installation position, angle, and resolution of a given acquisition device do not change, only one reference image needs to be selected and labeled per device, which improves labeling efficiency; moreover, the labeled detection area excludes the interference of complex backgrounds and improves target detection accuracy.
It should be noted that, because acquisition devices differ in resolution, angle, and so on, the position and size of the labeled detection area may differ between the intersection videos captured by different devices. The detection areas corresponding to the respective acquisition devices are shown in Table 1.
TABLE 1 detection regions corresponding to respective acquisition devices
(Table 1 is reproduced as an image in the original publication; each row lists an acquisition device together with its labeled detection region frame (xi, yi, wi, hi).)
where (xi, yi) denotes the origin of the detection region frame, wi denotes its width, and hi denotes its height.
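As an illustration, cropping a frame to a labeled detection region frame amounts to simple array slicing. The following is a minimal sketch, assuming OpenCV-style NumPy frames in (height, width, channel) layout; the region values are invented examples, not entries from Table 1.
    import numpy as np

    def crop_detection_region(frame: np.ndarray, x: int, y: int,
                              w: int, h: int) -> np.ndarray:
        """Crop a video frame to the labeled detection region frame (x, y, w, h)."""
        return frame[y:y + h, x:x + w]

    # Hypothetical 1920x1080 frame and an example detection region.
    frame = np.zeros((1080, 1920, 3), dtype=np.uint8)
    print(crop_detection_region(frame, 400, 200, 900, 600).shape)  # (600, 900, 3)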
After the detection areas are marked for the intersection videos collected by the collecting devices in advance, the intersection videos collected by the collecting devices are used as a training sample set to train a vehicle detection model. The specific training process is shown in fig. 1:
s101: a set of training samples is obtained.
In an embodiment of the application, the training sample set is from intersection videos collected by each collection device in an intersection. Because the detection areas of the intersection videos acquired by the acquisition devices are labeled in advance, the intersection videos acquired by the acquisition devices are cut based on the detection area corresponding to each acquisition device, and a training sample set is obtained. The training sample set comprises a positive training sample and a negative training sample, the positive training sample comprises at least one vehicle, the type and the driving direction of each vehicle can be different, and the negative training sample does not comprise any vehicle.
After the training sample set is obtained, a sample label is marked on each training sample. For example, positive training samples are labeled 1 and negative training samples are labeled 0.
S102: inputting a plurality of training samples in the training sample set and a pre-labeled real label corresponding to each training sample into an initial vehicle detection model, and obtaining the vehicle detection model with the detection loss value within a preset range through multi-round iterative training.
In S102, for each round of training, the following operations are performed:
s1021: and performing convolution operation on each training sample through a plurality of residual error units in the initial vehicle detection model to obtain a shallow layer feature vector and a deep layer feature vector.
In step S1021, after inputting a plurality of training samples in the training sample set to the initial vehicle detection model, a feature vector of each training sample is extracted by a plurality of residual units (Resblock _ body) in the initial vehicle detection model. In specific implementation, considering that input training samples have the characteristic of multi-scale targets, a deep layer feature vector and a shallow layer feature vector of each training sample are respectively extracted through a plurality of residual error units, wherein the deep layer feature vector contains semantic information of a detection object in the training sample under a complex intersection traffic scene, and the shallow layer feature vector contains position information and multi-scale information of the detection object in the training sample under the complex intersection traffic scene.
Generally, intersection traffic scenes are complex, and the resolution ratios of the acquisition devices are different, so that detection objects in training samples are often influenced by environmental factors, and the problems of large light change, inconsistent shielding, large scale change and the like exist. Although the shallow feature vector extracted by the initial vehicle detection model includes the position information and the multi-scale information of the detection object, the receptive field for extracting the feature vector is also reduced. The reduction in the receptive field is more pronounced when the resolution of the training samples is greater (e.g., 11920 x 1080 pixels resolution).
In view of the above situation, in the embodiment of the application, when a vehicle detection model is trained, in at least one residual error unit for extracting a shallow feature vector and a deep feature vector, a random grouping convolution of partial channels is adopted, so that, compared with a standard convolution, the feature vector extracted by at least one residual error unit can not only retain original information of a partial channel of a detection object, but also obtain reorganization information of the partial channel of the detection object, wherein, under the condition that the detection object is distributed densely or is shielded, the semantic features of the detection object can be enhanced by the reorganization information obtained after mixing partial channels, so that under a complex traffic scene of intersection deadlock, rich features of multiple angles, multiple scales, spatial positions and the like of a vehicle can be extracted, and further the detection accuracy of the intersection deadlock event is improved.
Optionally, in the last two residual error units of the vehicle detection model, partial channel random grouping convolution is adopted in the embodiment of the present application.
The principle of the random grouping convolution of the channels is as follows: the feature map after the group convolution is "reorganized". For example, assuming that the input is divided into g groups and the total number of channels is g × n, the channel dimension is first split into (g, n) two dimensions, then the two dimensions are transposed into (n, g), and finally recombined into g × n one dimension. Therefore, the obtained feature vectors can be ensured to be from different groups, and the feature information of the detection object can be circulated among different groups when the feature vectors are used as the input of other groups.
Referring to fig. 2, a schematic diagram of random grouping and convolution of channels is shown, where a feature map of a training sample has 9 channels, which are divided into 3 groups, and the channels are shuffled uniformly and then recombined, where each group includes features of other groups.
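The split-transpose-flatten reorganization described above can be sketched in a few lines. The following PyTorch snippet is an illustrative assumption (the patent names no framework), reproducing the Fig. 2 case of 9 channels in 3 groups:
    import torch

    def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
        """Reorganize channels after a group convolution: split the channel
        dimension (g*n) into (g, n), transpose to (n, g), and flatten back."""
        b, c, h, w = x.shape
        n = c // groups
        x = x.view(b, groups, n, h, w)      # (b, g, n, h, w)
        x = x.transpose(1, 2).contiguous()  # (b, n, g, h, w)
        return x.view(b, c, h, w)           # back to (b, g*n, h, w)

    # Fig. 2 example: 9 channels split into 3 groups.
    feat = torch.arange(9).view(1, 9, 1, 1)
    print(channel_shuffle(feat, 3).flatten().tolist())
    # [0, 3, 6, 1, 4, 7, 2, 5, 8]: each group now mixes channels from all groups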
S1022: and performing multiple times of strengthening pooling on the deep feature vector of each training sample, wherein a plurality of pooling kernels with different scales are used for strengthening pooling each time.
In the embodiment of the application, after the deep feature vectors are extracted by the vehicle detection model, in order to increase the receptive field and reduce the calculated amount, the deep feature vectors extracted from each training sample are subjected to pooling operation. The pooling is to average the features extracted by convolution each time on the basis of extracting the features by convolution, and reduce the influence of hidden nodes on feature dimensions, thereby reducing the design burden of the classifier. Meanwhile, the dimensionality of the pooling is different, and the semantic information represented by the final output result also changes.
As shown in fig. 3, which is a schematic diagram of a primary enhanced pooling operation, pooling operations are performed using pooling kernels (e.g., 1 × 1, 2 × 2, 3 × 3, 5 × 5, etc.) with different scales for each convolved result during pooling, and the pooled result is upsampled to make input and output sizes consistent before and after pooling, so as to perform feature fusion.
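A minimal sketch of one such enhanced pooling pass follows; average pooling matches the description above, while nearest-neighbor upsampling and fusion by channel concatenation are assumptions the patent does not fix:
    import torch
    import torch.nn.functional as F

    def enhanced_pool(x: torch.Tensor, kernels=(1, 2, 3, 5)) -> torch.Tensor:
        """Pool x at several kernel scales, upsample each result back to the
        input size, and fuse everything along the channel axis."""
        h, w = x.shape[-2:]
        branches = []
        for k in kernels:
            y = F.avg_pool2d(x, kernel_size=k, stride=k)       # multi-scale pooling
            y = F.interpolate(y, size=(h, w), mode="nearest")  # restore input size
            branches.append(y)
        return torch.cat(branches + [x], dim=1)                # feature fusion

    x = torch.randn(1, 512, 8, 8)
    print(enhanced_pool(x).shape)  # torch.Size([1, 2560, 8, 8])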
By performing consecutive, repeated enhanced pooling on the deep feature vectors of each training sample, the vehicle detection model provided in the embodiments of the application further retains information that is important, before pooling, for whether a detection object can be detected, such as multi-pose information, multi-scale information, and spatial position information. This reduces the difficulty of detecting multi-pose vehicles caused by differing vehicle appearances, driving directions, and intersection turning directions when an intersection is deadlocked, reduces the difficulty of detecting multi-scale vehicles caused by the differing visible ranges of the acquisition devices installed at each intersection, and improves vehicle detection accuracy.
S1023: and obtaining the prediction label of the corresponding training sample according to the shallow layer feature vector and the deep layer feature vector of each training sample.
In S1023, for each training sample, whether the deep layer feature vector and the shallow layer feature vector of the training sample contain a detection object is predicted, and a prediction label of the training sample is determined according to the prediction result.
S1024: and determining a detection loss value according to the real label and the predicted label corresponding to each training sample.
For example, for a positive training sample, the true label is 1; if the predicted label is 0.89, the loss value is 0.11, which is small. For a negative training sample, the true label is 0; if the predicted label is 0.78, the loss value is 0.78.
S1025: and adjusting the parameters of the initial vehicle detection model according to the detection loss value.
In S1025, parameters of the initial vehicle detection model are adjusted by using the detection loss value obtained in each training round until the detection loss value is within a preset range, thereby obtaining a trained vehicle detection model.
On the one hand, in the vehicle detection model of the embodiments of the application, partial-channel random grouping convolution is adopted in at least one residual unit while extracting the shallow and deep feature vectors of the training samples. The extracted feature vectors therefore retain the original information of some channels of the detection object while also obtaining reorganized information from the remaining channels; this reorganized information strengthens the semantic features of detection objects that are densely distributed or occluded, so rich multi-angle, multi-scale, spatial-position, and other features of vehicles can be extracted in the complex traffic scene of an intersection deadlock, further improving the detection accuracy of deadlock events. On the other hand, consecutive, repeated enhanced pooling of the deep feature vector of each training sample further retains information that is important, before pooling, for whether a detection object can be detected, such as multi-pose, multi-scale, and spatial position information. This reduces the difficulty of detecting multi-pose vehicles caused by differing vehicle appearances, driving directions, and turning directions when an intersection is deadlocked, reduces the difficulty of detecting multi-scale vehicles caused by the differing visible ranges of the acquisition devices at each intersection, and improves vehicle detection accuracy.
The network structure of the vehicle detection model in the embodiments of the application is based on the CSPNet structure. CSPNet splits the feature map into two parts: one part undergoes the convolution operation, and the other part is feature-fused with the result of that convolution. This strengthens the learning ability of the convolutional neural network (CNN) while keeping the network lightweight and preserving target detection accuracy.
The complete structure of the vehicle detection model is shown in fig. 4; it mainly comprises a feature extraction module, a pooling module, and a prediction module. As shown in fig. 5, the feature extraction module contains 5 residual units, with different numbers of residual blocks in different units, and a Cross Stage Partial structure is added to each residual unit to reduce the parameters of the vehicle detection model and make it easier to train.
Each residual unit outputs a feature map of a different scale. Taking a training sample of 256 × 256 pixels as input, for example, the convolution operations of the residual units yield five feature maps of sizes 128 × 128, 64 × 64, 32 × 32, 16 × 16, and 8 × 8. In the embodiments of the application, the vehicle detection model uses the feature maps output by the last three residual units (i.e., the 32 × 32, 16 × 16, and 8 × 8 feature maps) for target detection.
As shown in fig. 5, the vehicle detection model adopts partial-channel random grouping convolution in the last two residual units. Compared with standard convolution, a residual unit with partial-channel random grouping convolution extracts stronger semantic information when vehicles are densely distributed or occluded, which enriches the multi-angle, multi-scale, spatial-position, and other features of vehicles in the complex traffic scene of an intersection deadlock and improves the detection accuracy of deadlock events.
The original CSPNet structure uses max pooling in its pooling module, which reduces spatial resolution and thus degrades the detection of multi-angle, multi-pose vehicles. In the pooling module of the vehicle detection model provided in the embodiments of the application, as shown in fig. 4, the feature map output by the last residual unit is pooled in the ordinary way with 3 × 3, 5 × 5, and 8 × 8 kernels, and the same feature map is also enhanced-pooled twice in succession, using a plurality of kernels of different scales each time; finally, the deep feature vectors obtained from the ordinary pooling and the enhanced pooling are fused. Introducing enhanced pooling further retains information that is important, before pooling, for whether the detection object can be detected, such as multi-pose, multi-scale, and spatial position information, which improves the accuracy of the vehicle detection model.
It should be noted that the pooling kernel selection in fig. 5 is only an example; other pooling kernels may be chosen according to the size of the feature map.
In the prediction module of the vehicle detection model, as shown in fig. 4, the pooled deep feature vector is dimension-reduced, and a first probability that the detection object exists in the training sample is predicted from the dimension-reduced deep feature vector; the pooled deep feature vector is upsampled, and a second probability that the detection object exists in the training sample is predicted by combining it with the shallow feature vector corresponding to the feature map output by the fourth residual unit; a third probability that the detection object exists in the training sample is predicted by combining the two shallow feature vectors corresponding to the feature maps output by the third and fourth residual units. The first, second, and third probabilities are then weighted to obtain a weighted probability, from which it is predicted whether the training sample contains the detection object.
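The final weighting step can be illustrated as below; the patent only states that the three probabilities are weighted, so the weights and the 0.5 decision threshold here are assumptions:
    def fuse_predictions(p1: float, p2: float, p3: float,
                         weights=(0.5, 0.3, 0.2)) -> float:
        """Weighted fusion of the three prediction-head probabilities."""
        w1, w2, w3 = weights
        return w1 * p1 + w2 * p2 + w3 * p3

    # A detection object is predicted present when the weighted probability
    # exceeds the (assumed) 0.5 threshold.
    print(fuse_predictions(0.9, 0.8, 0.6) > 0.5)  # True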
Based on the trained vehicle detection model, the video-based intersection deadlock event detection flow shown in fig. 6 is executed by the detection device and mainly includes the following steps:
S601: Acquire the traffic video stream of the target intersection.
In S601, the detection device connects to the acquisition device of the target intersection and acquires the traffic video stream from the connected acquisition device.
S602: and aiming at each video frame, acquiring a feature map set corresponding to a preset detection area of the video frame by adopting a trained vehicle detection model.
In the embodiment of the application, the detection area corresponding to each acquisition device is predetermined, and when S602 is executed, the detection device determines, for each acquired video frame, the preset detection area of the video frame by determining the acquisition device to which the video frame belongs, and acquires a feature map set corresponding to the preset detection area of the video frame by using a trained vehicle detection model. Wherein the scales of different feature maps in the feature map set are different.
For example, the feature map set includes three feature maps, the first feature map having a scale of 32 × 32 pixels, the second feature map having a scale of 16 × 16 pixels, and the third feature map having a scale of 8 × 8 pixels.
S603: and respectively carrying out random grouping convolution on partial channels on at least one characteristic diagram in the characteristic diagram set to obtain a characteristic vector corresponding to the corresponding characteristic diagram.
In specific implementation, as shown in fig. 7, for at least one feature map in the feature map set, the following steps are respectively performed: and performing grouping convolution on the parts of the feature map, and recombining the last two thirds parts of the feature map channel after convolution to obtain a feature vector corresponding to the feature map.
When at least one feature map is a part of the feature map set, the remaining feature maps may be subjected to standard convolution.
For example, still taking the example that the feature map includes three feature maps, the first feature map is subjected to standard convolution, and the second and third feature maps are respectively subjected to partial channel random grouping convolution, so as to obtain a feature vector of each feature map.
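As a sketch of this step, the module below applies a group convolution and then reorganizes only the last two thirds of the channels, keeping the first third as original information. Grouping into 3 groups and a 3 × 3 kernel are assumptions consistent with Fig. 2, and the channel counts must divide evenly for the reshapes:
    import torch
    import torch.nn as nn

    class PartialChannelShuffleConv(nn.Module):
        def __init__(self, channels: int, groups: int = 3):
            super().__init__()
            assert channels % groups == 0
            assert (channels - channels // 3) % groups == 0
            self.groups = groups
            self.conv = nn.Conv2d(channels, channels, kernel_size=3,
                                  padding=1, groups=groups)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.conv(x)                         # grouping convolution
            split = x.shape[1] // 3
            head, tail = x[:, :split], x[:, split:]  # first third kept as-is
            b, c, h, w = tail.shape
            tail = (tail.view(b, self.groups, c // self.groups, h, w)
                        .transpose(1, 2).contiguous().view(b, c, h, w))
            return torch.cat([head, tail], dim=1)    # original + reorganized channels

    m = PartialChannelShuffleConv(channels=288, groups=3)
    print(m(torch.randn(1, 288, 16, 16)).shape)  # torch.Size([1, 288, 16, 16])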
S604: and performing multi-scale enhanced pooling operation at least once on the feature map with the minimum scale in the feature map set to obtain a fused feature vector.
Since the different feature maps in the feature map set have different scales, the smaller the scale is, the deeper the number of convolution layers where the feature map is located is, the richer the semantic information contained is, therefore, in S604, the feature map with the smallest scale contains the richest deep semantic information, and the feature map with the smallest scale is subjected to at least one multi-scale enhanced pooling operation to obtain the fused feature vector. With particular reference to fig. 8:
s6041: and copying the feature map with the minimum scale into N pieces, wherein N is an integer larger than 1.
For example, taking the case that the feature map set includes three feature maps, 8 × 8 feature maps are copied into 4.
S6042: pooling the copied N-1 feature maps to obtain N-1 first feature vectors, wherein the pooling kernels used by the N-1 feature maps are different.
In specific implementation, performing spatial pyramid pooling on the copied 1 st feature map by using a pooling core 1; performing spatial pyramid pooling on the copied 2 nd feature map by using a pooling kernel 2; and in the same way, performing spatial pyramid pooling on the copied N-1 characteristic diagram by using the pooling core N-1.
S6043: and pooling the rest 1 copied feature map by using pooling kernels with different scales, merging the pooled feature maps into a new feature map, taking the new feature map as the input of the next enhanced pooling operation, and performing multi-scale enhanced pooling again to obtain a second feature vector.
Taking an example of performing two times of enhanced pooling on the copied remaining 1 feature graph, as shown in fig. 3, pooling the feature graph with pooling kernels of 1 × 1, 2 × 2, 3 × 3, and 5 × 5, respectively, and performing upsampling after pooling to obtain four feature sub-graphs having a size consistent with that of the feature graph, then fusing the four feature sub-graphs with the feature graph to obtain a new feature graph, pooling the new feature graph with the pooling kernels of 1 × 1, 2 × 2, 3 × 3, and 5 × 5, respectively, and performing upsampling after pooling to obtain four feature sub-graphs having a size consistent with that of the new feature graph, and then fusing the four feature sub-graphs with the new feature graph to obtain a final feature vector.
S6044: and fusing the N-1 first feature vectors and the second feature vectors to obtain fused feature vectors.
The N-1 first feature vectors are obtained in a spatial pyramid Pooling mode, the spatial pyramid Pooling mode adopts maximum Pooling (Max Pooling), information irrelevant to the vehicle in the first feature vectors is eliminated, the second feature vectors are obtained by at least one time of enhanced Pooling, information such as the posture, the scale information and the spatial position of the vehicle is reserved, and after the information is fused, feature description of the vehicle is enhanced, so that the accuracy of vehicle detection is improved.
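Putting S6041 to S6044 together, a hedged sketch follows, reusing the enhanced_pool function from the training section above. N = 4 copies, stride-1 max pooling for the spatial pyramid branches, and odd kernel sizes (so that "same" padding preserves the map size) are all assumptions; a real implementation would typically also add 1 × 1 convolutions to keep the channel count in check.
    import torch
    import torch.nn.functional as F

    def spp_branch(x: torch.Tensor, k: int) -> torch.Tensor:
        """Stride-1 max pooling with 'same' padding (k must be odd)."""
        return F.max_pool2d(x, kernel_size=k, stride=1, padding=k // 2)

    def fused_pooling(x: torch.Tensor) -> torch.Tensor:
        first = [spp_branch(x, k) for k in (3, 5, 9)]  # N-1 = 3 first feature vectors
        second = enhanced_pool(enhanced_pool(x))       # two enhanced pooling passes
        return torch.cat(first + [second], dim=1)      # fused feature vector

    x = torch.randn(1, 512, 8, 8)
    print(fused_pooling(x).shape)  # torch.Size([1, 14336, 8, 8])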
S605: and obtaining at least one target vehicle contained in the video frame according to the feature vector corresponding to each feature map in the feature map set.
In S605, the shallow feature vector and the deep feature vector corresponding to the feature maps of different scales include information such as multi-pose, multi-scale, and spatial position of the vehicle, so that at least one target vehicle included in the video frame can be accurately detected based on the feature vector corresponding to each feature map.
S606: and determining a traffic parameter set of the target intersection according to the detected target vehicles.
The specific process of determining the traffic parameter set is shown in fig. 9:
s6061: and determining the running direction of each target vehicle according to the detected running data of each target vehicle at different time.
Taking a target vehicle as an example, the target vehicle is tracked by using radar, direction information 1 of the target vehicle at the time t1 and direction information 2 of the target vehicle at the time t2 are obtained, and the running direction of the target vehicle is determined according to the direction information 1 and the direction information 2. When the number of vehicles in the driving directions perpendicular to each other is large, an intersection deadlock event may occur.
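A minimal sketch of this direction estimate follows, assuming tracked (x, y) positions in image coordinates and an invented 15-degree tolerance for "perpendicular"; the patent does not specify how the two direction readings are combined:
    import math

    def heading(p1, p2) -> float:
        """Heading angle in degrees from the position at t1 to the position at t2."""
        return math.degrees(math.atan2(p2[1] - p1[1], p2[0] - p1[0]))

    def perpendicular(a: float, b: float, tol: float = 15.0) -> bool:
        """True when two headings are within tol degrees of perpendicular."""
        d = abs(a - b) % 180.0
        return abs(d - 90.0) <= tol

    print(heading((0, 0), (10, 0)))  # 0.0 (moving along the x axis)
    print(perpendicular(0.0, 92.0))  # True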
S6062: and calculating the area ratio of the total area of the detected target vehicles to the preset detection area of the video frame.
Considering that the traffic scene of the intersection is complex, a plurality of target vehicles may exist in one intersection, the total area size of each target vehicle can be obtained through the size of the detection frame of each target vehicle, and the size of the preset detection area of the video frame is known, so that the area ratio of the total area to the preset detection area of each target vehicle is calculated. The area ratio reflects the space occupancy of each target vehicle in the video frame, and the larger the space occupancy is, the larger the number of vehicles in the target intersection is, the higher the probability of intersection deadlock events is.
S6063: and obtaining a traffic parameter set of the target intersection according to the determined driving direction and the area ratio.
In S6063, the driving direction and the area ratio of each target vehicle in the traffic parameter set fully describe the characteristics of the intersection deadlock event, and can be used for determining the intersection deadlock event.
S607: and determining whether an intersection deadlock event occurs according to the traffic parameter set.
Specifically, when the area ratio is greater than a preset area threshold value and the number of target vehicles in the mutually perpendicular driving directions is greater than a preset vehicle threshold value, it is determined that an intersection deadlock event occurs.
For example, when the area ratio of each target vehicle in the traffic parameter set is greater than 70%, and the number of target vehicles in two driving directions perpendicular to each other is greater than 30%, it is determined that an intersection deadlock event occurs at the target intersection.
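Combining the area ratio of S6062 with the perpendicular-direction count gives the decision rule sketched below. It reuses the perpendicular function from the previous snippet; the thresholds follow the 70%/30% example above, and the (x, y, w, h) box format is an assumption:
    def deadlock_detected(boxes, headings, region_area: float,
                          area_thresh: float = 0.70,
                          perp_frac: float = 0.30) -> bool:
        """Flag an intersection deadlock from detection boxes and headings."""
        total_area = sum(w * h for (_, _, w, h) in boxes)
        if total_area / region_area <= area_thresh:
            return False  # space occupancy too low for a deadlock
        # Count vehicles whose heading is perpendicular to some other vehicle's.
        perp = sum(
            1 for i, a in enumerate(headings)
            if any(perpendicular(a, b) for j, b in enumerate(headings) if j != i)
        )
        return perp / max(len(headings), 1) > perp_frac

    boxes = [(100, 80, 60, 30), (300, 200, 55, 28), (500, 90, 58, 32)]
    headings = [0.0, 88.0, 2.0]
    print(deadlock_detected(boxes, headings, region_area=6000.0))  # True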
The embodiments of the application provide a video-based intersection deadlock event detection method that automatically detects intersection deadlock events from the traffic video stream of an intersection acquired in real time: each target vehicle at the target intersection is detected by the trained vehicle detection model, the traffic parameter set of the target intersection is obtained from the detected driving direction of each target vehicle and the area ratio, and the obtained traffic parameter set is then used for the logical judgment of an intersection deadlock event. Because at least one residual unit in the vehicle detection model adopts partial-channel random grouping convolution, the feature extraction retains the original information of some channels of the vehicle feature map while also obtaining the reorganized information of the remaining channels, strengthening the semantic information of vehicles in the extracted feature vectors; this improves vehicle detection accuracy in intersections with densely distributed or occluded vehicles, and thus the detection accuracy of intersection deadlock events. In addition, the vehicle detection model adopts multi-layer feature enhanced fusion, which reduces its sensitivity to vehicle pose and further improves the detection accuracy of target vehicles in the intersection and of intersection deadlock events. A deadlocked intersection can therefore be handled in time, vehicles are prevented from continuing to accumulate at the intersection, the efficiency of vehicle passage through the intersection is improved, urban traffic congestion is effectively relieved, and the rate of traffic accidents is reduced.
Referring to fig. 10, which shows the effect of detecting an intersection deadlock event according to an embodiment of the present application: within the preset detection area marked by the thick solid line, the vehicle detection model accurately detects multiple target vehicles, and from the detected driving direction of each target vehicle and the area ratio within the preset detection area, it can be accurately determined that an intersection deadlock event has occurred at this intersection and that traffic needs to be diverted in time.
In some embodiments, after it is determined that an intersection deadlock event has occurred, the detection device further sends prompt information to the downstream vehicles associated with the target intersection. After receiving the prompt information, a downstream vehicle can replan its route or adjust its driving speed, which relieves the congestion at the intersection, restores the normal operation of the target intersection, and helps avoid traffic accidents.
The video-based intersection deadlock event detection method of the embodiments of the application can be applied in an intelligent traffic system to achieve automatic detection and real-time early warning of intersection deadlock events. When a multi-directional intersection enters a deadlock state, the method can detect the deadlock event quickly and accurately within 30 s, improves the handling efficiency of intersection deadlock events by 20%, and effectively maintains the normal operation of traffic order.
Based on the same technical concept, an embodiment of the present application provides a detection device that can execute the video-based intersection deadlock event detection method of the foregoing embodiments and achieve the same technical effects, which are not repeated here.
Referring to fig. 11, the detection apparatus includes a processor 1101, a memory 1102, and a communication interface 1103, where the communication interface 1103 and the memory 1102 are connected to the processor 1101 through a bus 1104, the memory 1102 stores computer program instructions, and the processor 1101 performs the following operations according to the computer program instructions stored in the memory 1102:
acquiring a traffic video stream of a target intersection through the communication interface 1103;
aiming at each video frame, acquiring a feature map set corresponding to a preset detection area of the video frame by adopting a trained vehicle detection model, wherein the scales of different feature maps in the feature map set are different;
respectively performing partial channel random grouping convolution on at least one feature map in the feature map set to obtain a feature vector corresponding to the corresponding feature map;
performing at least one multi-scale enhanced pooling operation on the feature map with the minimum scale in the feature map set to obtain a fused feature vector;
obtaining at least one target vehicle contained in the video frame according to the feature vector corresponding to each feature map in the feature map set;
and determining a traffic parameter set of the target intersection according to the detected target vehicles, and determining whether an intersection deadlock event occurs according to the traffic parameter set.
Optionally, the processor 1101 performs partial-channel random grouping convolution on at least one feature map in the feature map set to obtain the corresponding feature vectors, specifically:
for at least one feature map in the feature map set, performing grouping convolution on the channels of the feature map, and recombining the last two thirds of the convolved channels to obtain the feature vector corresponding to the feature map.
Optionally, the processor 1101 performs at least one multi-scale enhanced pooling operation on the feature map with the smallest scale in the feature map set to obtain a fused feature vector, where the specific operation is:
copying the smallest-scale feature map into N copies, wherein N is an integer greater than 1;
pooling N-1 of the copies to obtain N-1 first feature vectors, wherein each of the N-1 copies uses a different pooling kernel;
pooling the remaining copy by using pooling kernels of different scales, merging the pooled results into a new feature map, taking the new feature map as the input of the next enhanced pooling operation, and performing multi-scale enhanced pooling again to obtain a second feature vector;
and fusing the N-1 first feature vectors with the second feature vector to obtain the fused feature vector.
Optionally, the processor 1101 determines a traffic parameter set of the target intersection according to each detected target vehicle, and specifically performs the following operations:
determining the driving direction of each target vehicle according to the detected driving data of each target vehicle at different moments;
calculating the ratio of the total area of the detected target vehicles to the area of the preset detection area;
and acquiring a traffic parameter set of the target intersection according to the driving direction and the area ratio.
Optionally, the processor 1101 determines whether an intersection deadlock event occurs according to the traffic parameter set, and the specific operation is:
and when the area ratio is larger than a preset area threshold value and the number of the target vehicles with mutually perpendicular driving directions is larger than a preset vehicle threshold value, determining that an intersection deadlock event occurs.
Optionally, after determining that the intersection deadlock event occurs, the processor 1101 further performs the following operations:
sending prompt information to the downstream vehicle associated with the target intersection through the communication interface 1103, so that the downstream vehicle replans a route or adjusts the running speed according to the prompt information.
It should be noted that fig. 11 shows only the hardware necessary for implementing the video-based intersection deadlock event detection method provided by the embodiments of the present application; optionally, the detection device also includes conventional hardware such as a display and a video processor.
The processor referred to in the embodiments of the present application may be a Central Processing Unit (CPU), a general-purpose processor, a Digital Signal Processor (DSP), an application-specific integrated circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. A processor may also be a combination of computing functions, e.g., comprising one or more microprocessors, a DSP and a microprocessor, or the like. Wherein the memory may be integrated in the processor or may be provided separately from the processor.
Embodiments of the present application also provide a computer-readable storage medium for storing instructions that, when executed, may implement the methods of the foregoing embodiments.
The embodiments of the present application also provide a computer program product for storing a computer program, where the computer program is used to execute the method of the foregoing embodiments.
The foregoing description has, for purposes of explanation, been given with reference to specific embodiments. However, the illustrative discussions above are not intended to be exhaustive or to limit the embodiments to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the underlying principles and their practical application, thereby enabling others skilled in the art to best utilize the various embodiments, with such modifications as are suited to the particular use contemplated.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A video-based intersection deadlock event detection method, characterized by comprising the following steps:
acquiring a traffic video stream of a target intersection;
for each video frame, obtaining, by using a trained vehicle detection model, a feature map set corresponding to a preset detection area of the video frame, wherein different feature maps in the feature map set have different scales;
performing partial channel random grouping convolution on at least one feature map in the feature map set, respectively, to obtain a feature vector corresponding to the corresponding feature map;
performing at least one multi-scale enhanced pooling operation on the feature map with the minimum scale in the feature map set to obtain a fused feature vector, wherein the multi-scale enhanced pooling operation is as follows: performing a single pooling operation on each of the N-1 copied minimum-scale feature maps, using a different pooling kernel for each, and performing multiple pooling operations with pooling kernels of different scales on the remaining 1 copied minimum-scale feature map to realize pooling enhancement, wherein N is an integer greater than 1;
obtaining at least one target vehicle contained in the video frame according to the feature vector corresponding to each feature map in the feature map set;
and determining a traffic parameter set of the target intersection according to the detected target vehicles, and determining whether an intersection deadlock event occurs according to the traffic parameter set.
2. The method according to claim 1, wherein the performing partial channel random grouping convolution on at least one feature map in the feature map set respectively to obtain a feature vector corresponding to the corresponding feature map comprises:
for at least one feature map in the feature map set, respectively performing the following operations:
performing grouping convolution on the channels of the feature map, and recombining the last two thirds of the channels of the convolved feature map to obtain the feature vector corresponding to the feature map.
3. The method according to claim 1, wherein the performing at least one multi-scale enhanced pooling operation on the feature map with the smallest scale in the feature map set to obtain a fused feature vector comprises:
copying the feature map with the minimum scale into N copies;
pooling N-1 of the copied feature maps to obtain N-1 first feature vectors, wherein the pooling kernels used for the N-1 feature maps differ from one another;
pooling the remaining 1 copied feature map with pooling kernels of different scales, merging the pooled results into a new feature map, taking the new feature map as the input of the next enhanced pooling operation, and performing multi-scale enhanced pooling again to obtain a second feature vector;
and fusing the N-1 first feature vectors with the second feature vector to obtain the fused feature vector.
4. The method of claim 1, wherein said determining a set of traffic parameters for the target intersection based on each detected target vehicle comprises:
determining the driving direction of each target vehicle according to the detected driving data of each target vehicle at different moments;
calculating the area ratio of the total area of all detected target vehicles to the preset detection area;
and acquiring a traffic parameter set of the target intersection according to the driving direction and the area ratio.
5. The method of claim 4, wherein said determining whether an intersection deadlock event occurs based on said set of traffic parameters comprises:
and when the area ratio is larger than a preset area threshold value and the number of the target vehicles with mutually perpendicular driving directions is larger than a preset vehicle threshold value, determining that an intersection deadlock event occurs.
6. The method of any of claims 1-5, wherein upon determining that an intersection deadlock event occurs, the method further comprises:
and sending prompt information to the downstream vehicle associated with the target intersection so that the downstream vehicle replans the route or adjusts the running speed according to the prompt information.
7. A video-based intersection deadlock event detection device, characterized by comprising a processor, a memory, and a communication interface, wherein the communication interface, the memory, and the processor are connected through a bus:
the memory stores a computer program, and the processor performs the following operations according to the computer program:
acquiring a traffic video stream of a target intersection through the communication interface;
for each video frame, obtaining, by using a trained vehicle detection model, a feature map set corresponding to a preset detection area of the video frame, wherein different feature maps in the feature map set have different scales;
performing partial channel random grouping convolution on at least one feature map in the feature map set, respectively, to obtain a feature vector corresponding to the corresponding feature map;
performing at least one multi-scale enhanced pooling operation on the feature map with the minimum scale in the feature map set to obtain a fused feature vector, wherein the multi-scale enhanced pooling operation is as follows: performing a single pooling operation on each of the N-1 copied minimum-scale feature maps, using a different pooling kernel for each, and performing multiple pooling operations with pooling kernels of different scales on the remaining 1 copied minimum-scale feature map to realize pooling enhancement, wherein N is an integer greater than 1;
obtaining at least one target vehicle contained in the video frame according to the feature vector corresponding to each feature map in the feature map set;
and determining a traffic parameter set of the target intersection according to the detected target vehicles, and determining whether an intersection deadlock event occurs according to the traffic parameter set.
8. The detection device according to claim 7, wherein the processor performs partial channel random grouping convolution on at least one feature map in the feature map set, respectively, to obtain the feature vector corresponding to the corresponding feature map, and the specific operation is:
for at least one feature map in the feature map set, performing grouping convolution on the channels of the feature map, and recombining the last two thirds of the channels of the convolved feature map to obtain the feature vector corresponding to the feature map.
9. The detection device according to claim 7, wherein the processor performs at least one multi-scale enhanced pooling operation on the feature map with the smallest scale in the feature map set to obtain a fused feature vector, and the specific operations are:
copying the feature map with the minimum scale into N copies;
pooling N-1 of the copied feature maps to obtain N-1 first feature vectors, wherein the pooling kernels used for the N-1 feature maps differ from one another;
pooling the remaining 1 copied feature map with pooling kernels of different scales, merging the pooled results into a new feature map, taking the new feature map as the input of the next enhanced pooling operation, and performing multi-scale enhanced pooling again to obtain a second feature vector;
and fusing the N-1 first feature vectors with the second feature vector to obtain the fused feature vector.
10. The detection device according to claim 7, wherein the processor determines the traffic parameter set of the target intersection according to each detected target vehicle by:
determining the driving direction of each target vehicle according to the detected driving data of each target vehicle at different moments;
calculating the area ratio of the total area of all detected target vehicles to the preset detection area;
and acquiring a traffic parameter set of the target intersection according to the driving direction and the area ratio.
CN202210217837.6A 2022-03-08 2022-03-08 Video-based intersection deadlock event detection method and device Active CN114332740B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210217837.6A CN114332740B (en) 2022-03-08 2022-03-08 Video-based intersection deadlock event detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210217837.6A CN114332740B (en) 2022-03-08 2022-03-08 Video-based intersection deadlock event detection method and device

Publications (2)

Publication Number Publication Date
CN114332740A CN114332740A (en) 2022-04-12
CN114332740B true CN114332740B (en) 2022-06-03

Family

ID=81029808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210217837.6A Active CN114332740B (en) 2022-03-08 2022-03-08 Video-based intersection deadlock event detection method and device

Country Status (1)

Country Link
CN (1) CN114332740B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11302035B2 * 2019-09-06 2022-04-12 Intel Corporation Processing images using hybrid infinite impulse response (IIR) and finite impulse response (FIR) convolution block
CN111008562B (en) * 2019-10-31 2023-04-18 北京城建设计发展集团股份有限公司 Human-vehicle target detection method with feature map depth fusion
CN112990325B (en) * 2021-03-24 2022-09-06 南通大学 Light network construction method for embedded real-time visual target detection

Also Published As

Publication number Publication date
CN114332740A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN108986465B (en) Method, system and terminal equipment for detecting traffic flow
CN109829400B (en) Rapid vehicle detection method
CN109948616B (en) Image detection method and device, electronic equipment and computer readable storage medium
CN112507844B (en) Traffic jam detection method based on video analysis
WO2013186662A1 (en) Multi-cue object detection and analysis
Ippalapally et al. Object detection using thermal imaging
CN111507278B (en) Method and device for detecting roadblock and computer equipment
Xiang et al. Lightweight fully convolutional network for license plate detection
Odeh Management of an intelligent traffic light system by using genetic algorithm
CN112990065A (en) Optimized YOLOv5 model-based vehicle classification detection method
CN110866926A (en) Infrared remote sensing image rapid and fine sea-land segmentation method
CN113160272B (en) Target tracking method and device, electronic equipment and storage medium
CN113011338A (en) Lane line detection method and system
CN111914726B (en) Pedestrian detection method based on multichannel self-adaptive attention mechanism
Liu et al. Multi-lane detection by combining line anchor and feature shift for urban traffic management
Saravanarajan et al. Improving semantic segmentation under hazy weather for autonomous vehicles using explainable artificial intelligence and adaptive dehazing approach
CN114332740B (en) Video-based intersection deadlock event detection method and device
CN115661683A (en) Vehicle identification statistical method based on multi-attention machine system network
Faisal et al. Automated traffic detection system based on image processing
CN113887602A (en) Object detection and classification method and computer-readable storage medium
Delavarian et al. Multi‐camera multiple vehicle tracking in urban intersections based on multilayer graphs
CN110428443A (en) A kind of intelligence community Vehicle tracing method
CN116798015A (en) Traffic information extraction method, device, terminal equipment and storage medium
Surya TraCount: A deep convolutional neural network for highly overlapping vehicle counting
CN116363865A (en) Traffic jam assessment method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant