CN113469026B - Intersection retention event detection method and system based on machine learning - Google Patents

Intersection retention event detection method and system based on machine learning

Info

Publication number
CN113469026B
Authority
CN
China
Prior art keywords
image
frame
ith
retention
intersection
Prior art date
Legal status
Active
Application number
CN202110735123.XA
Other languages
Chinese (zh)
Other versions
CN113469026A (en)
Inventor
汪志涛
胡健萌
许乐
倪红波
李汪
唐崇伟
Current Assignee
Shanghai Intelligent Transportation Co ltd
Original Assignee
Shanghai Intelligent Transportation Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Intelligent Transportation Co ltd
Priority to CN202110735123.XA
Publication of CN113469026A
Application granted
Publication of CN113469026B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, based on the proximity to a decision surface, e.g. support vector machines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a method and a system for detecting intersection retention events based on machine learning, belonging to the field of computer vision. The method comprises the following steps: collecting continuous multi-frame images of an intersection to be detected within a set time period; obtaining the position information and the speed information of each vehicle target frame in a real coordinate system from each frame image; for the ith frame image (i > 3), obtaining a multi-dimensional feature vector of the ith frame image according to the position information and the speed information of each vehicle target frame in the ith frame image and the three preceding frame images; based on a retention detection model, determining the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the multi-dimensional feature vector of the ith frame image; and determining the retention events of the intersection to be detected according to the retention states of the intersection at each time point. Because the determined multi-dimensional feature vector preserves continuity between consecutive frames and is robust, misjudgment of the prediction result is reduced and the accuracy of intersection retention event detection is improved.

Description

Intersection retention event detection method and system based on machine learning
Technical Field
The invention relates to the field of computer vision, in particular to a method and a system for detecting intersection retention events based on machine learning.
Background
For traffic congestion, scholars at home and abroad have given different definitions from different angles. Generally speaking, traffic congestion is a traffic condition in which traffic demand exceeds traffic supply. An intersection is defined as congested when vehicles are blocked on an approach lane outside a signal-controlled intersection and the queue length exceeds 250 m, or when vehicles fail to pass through a signal-controlled intersection within three green phases; a congested road section is defined as one where vehicles are blocked on the roadway and the queue length exceeds 1 km.
Existing traffic state (including traffic congestion) detection algorithms can be roughly divided into indirect and direct approaches. The indirect approach judges the traffic state from parameters such as the traffic flow, occupancy and delay time of upstream and downstream road sections. The direct approach detects the traffic state through a direct traffic state detection algorithm or manual monitoring, typically a video-image-based traffic state detection method.
However, the traffic retention state in an actual scene is a dynamically changing process. Most existing retention detection solutions, whether direct or indirect, only classify and judge the state at a single time node, without considering the correlation of information between consecutive frames; the resulting algorithms are therefore not robust, and false alarms or missed alarms easily occur in some special scenes.
Disclosure of Invention
The invention aims to provide a method and a system for detecting intersection retention events based on machine learning, which improve detection accuracy and reduce retention false alarms in actual scenes.
In order to achieve the purpose, the invention provides the following scheme:
a machine learning-based intersection retention event detection method, comprising:
collecting continuous multi-frame images of a road junction to be detected in a set time period;
obtaining the position information and the speed information of each vehicle target frame in a real coordinate system according to each frame of image; the vehicle target frame is a vehicle image frame which is calibrated in each frame image in advance aiming at each vehicle;
for the ith frame image (i > 3), obtaining a multi-dimensional feature vector of the ith frame image according to the position information and the speed information of each vehicle target frame in the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image;
based on a retention detection model, determining the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the multi-dimensional feature vector of the ith frame image; the retention state is dredging or retention;
determining a retention event of the intersection to be detected in the time period according to the retention state of the intersection to be detected at each time point;
the retention event spans from the beginning of a retention event to the end of a retention event; wherein, when the retention state of the intersection to be detected changes from dredging to retention, a retention event begins; and when the retention state of the intersection to be detected changes from retention to dredging, the retention event ends.
Optionally, the obtaining, according to each frame image, position information and speed information of each vehicle target frame in the real coordinate system specifically includes:
determining a spatial projection relation corresponding to an image coordinate system and a real coordinate system according to any frame of image;
for each frame of image, determining the position information of the image coordinate of the corresponding vehicle target frame in a real coordinate system according to the space projection relation and the image coordinate of each vehicle target frame in the image;
and obtaining the speed information of the vehicle target frame in the real coordinate system between the time points corresponding to two adjacent frame images, according to the moving distance, in the real coordinate system, of the image coordinates of the same vehicle target frame in the two frames and the time difference between them.
Optionally, the determining, according to any frame of image, a spatial projection relationship corresponding to an image coordinate system and a real coordinate system specifically includes:
any four image coordinates which are not on a straight line in the plane of the road surface are marked in any image;
acquiring real coordinates of the intersection to be detected corresponding to the coordinates of the four images;
and determining the spatial projection relation according to the four pairs of corresponding image coordinates and real coordinates.
Optionally, the obtaining, for the ith frame image (i > 3), a multi-dimensional feature vector of the ith frame image according to the position information and the speed information of each vehicle target frame in the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image specifically includes:
dividing the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image into four regions in equal proportion;
for each region, obtaining eight feature vectors of the region according to the position information and the speed information of each vehicle target frame of the region; the eight feature vectors comprise the number of vehicles, the transverse average speed of the vehicles, the longitudinal average speed of the vehicles, the area of the lane region, the average width ratio of the vehicle detection frame to the lane, the average height ratio of the vehicle detection frame to the lane, the change rate of the traffic flow and the stability of the picture;
and integrating the feature vectors of each region of the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image to obtain the multi-dimensional feature vector.
Optionally, the determining, based on the retention detection model, the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the multidimensional feature vector of the ith frame image specifically includes:
acquiring continuous multi-frame historical images of a road junction to be detected;
obtaining the position information and the speed information of each vehicle target frame in a real coordinate system according to each frame of historical image;
for the ith frame of historical image (i > 3), obtaining a multi-dimensional feature vector of the ith frame of historical image according to the position information and the speed information of each vehicle target frame in the ith, (i-1)th, (i-2)th and (i-3)th frames of historical images;
determining a corresponding retention state prediction confidence coefficient according to the multi-dimensional feature vector of the ith frame of historical image;
training a random forest classifier according to the multi-dimensional feature vectors of the historical images of the frames and the corresponding retention state prediction confidence coefficient to obtain a retention detection model;
based on the retention detection model, determining a retention state prediction confidence corresponding to the ith frame of image according to the multi-dimensional feature vector of the ith frame of image;
and determining the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the retention state prediction confidence corresponding to the ith frame image.
Optionally, the determining, based on the retention detection model, the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the multi-dimensional feature vector of the ith frame image further includes:
after determining a retention state prediction confidence corresponding to the ith frame of image based on the retention detection model, performing smoothing processing on the retention state prediction confidence corresponding to the ith frame of image to obtain a processed retention state prediction confidence; and the processed retention state prediction confidence coefficient is used for determining the retention state of the intersection to be detected at the time point corresponding to the ith frame of image.
Optionally, the processed retention state prediction confidence is obtained according to the following formula:
Score'_i = r × Score_i + (1 - r) × Score'_(i-1), i > 4;

Score'_4 = Score_4;

wherein Score'_i is the processed retention state prediction confidence of the ith frame image, Score_i is the retention state prediction confidence of the ith frame image, Score'_(i-1) is the processed retention state prediction confidence of the (i-1)th frame image, and r is a damping coefficient.
In order to achieve the above purpose, the invention also provides the following scheme:
a machine learning based intersection retention event detection system, the machine learning based intersection retention event detection system comprising:
the acquisition unit is used for acquiring continuous multi-frame images of the intersection to be detected within a set time period;
the position and speed determining unit is connected with the acquisition unit and is used for obtaining the position information and the speed information of each vehicle target frame in a real coordinate system according to each frame of image; the vehicle target frame is a vehicle image frame which is calibrated in each frame image in advance aiming at each vehicle;
a multi-dimensional feature vector determining unit, connected with the position and speed determining unit and used for obtaining, for the ith frame image (i > 3), a multi-dimensional feature vector of the ith frame image according to the position information and the speed information of each vehicle target frame in the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image;
the retention state determining unit is connected with the multi-dimensional characteristic vector determining unit and used for determining the retention state of the intersection to be detected at the time point corresponding to the ith frame of image according to the multi-dimensional characteristic vector of the ith frame of image based on a retention detection model; the retention state is dredging or retention;
the retention event determining unit is connected with the retention state determining unit and used for determining the retention events of the intersection to be detected in the time period according to the retention state of the intersection to be detected at each time point; a retention event spans from its beginning to its end; when the retention state of the intersection to be detected changes from dredging to retention, a retention event begins; and when the retention state of the intersection to be detected changes from retention to dredging, the retention event ends.
Optionally, the position and velocity determination unit comprises:
the spatial projection relation determining module is connected with the acquisition unit and used for determining a spatial projection relation corresponding to an image coordinate system and a real coordinate system according to any frame of image;
the position determining module is respectively connected with the acquisition unit and the spatial projection relation determining module and is used for determining the position information of the image coordinate of the corresponding vehicle target frame in a real coordinate system according to the spatial projection relation and the image coordinate of each vehicle target frame in each image;
and the speed determining module is connected with the position determining module and used for obtaining the speed information of the vehicle target frame in the real coordinate system between the time points corresponding to two adjacent frame images, according to the moving distance, in the real coordinate system, of the image coordinates of the same vehicle target frame in the two frames and the time difference between them.
Optionally, the spatial projection relationship determining module includes:
the image coordinate extraction submodule is connected with the acquisition unit and is used for calibrating any four image coordinates which are not on a straight line in a road surface plane in any image;
the real coordinate acquisition submodule is used for acquiring real coordinates of the intersection to be detected, which correspond to the four image coordinates;
and the projection relation determining submodule is respectively connected with the image coordinate extracting submodule and the real coordinate obtaining submodule and is used for determining a space projection relation according to four pairs of corresponding image coordinates and real coordinates.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects: continuous multi-frame images of the intersection to be detected are collected within a set time period; the position information and the speed information of each vehicle target frame in the real coordinate system are obtained from each frame image; for the ith frame image (i > 3), a multi-dimensional feature vector of the ith frame image is obtained according to the position information and the speed information of the vehicle target frames in the ith frame image and the three preceding frame images, and because this multi-dimensional feature vector preserves continuity between consecutive frames and is robust, misjudgment of the prediction result is reduced; based on a retention detection model, the retention state of the intersection to be detected at the time point corresponding to the ith frame image is determined according to the multi-dimensional feature vector of the ith frame image; finally, the retention events of the intersection to be detected are determined according to the retention states at each time point, thereby improving the accuracy of intersection retention event detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a flow chart of a machine learning based intersection retention event detection method of the present invention;
fig. 2 is a block diagram of the intersection retention event detection system based on machine learning according to the present invention.
Description of the symbols:
the device comprises an acquisition unit-1, a position and velocity determination unit-2, a multi-dimensional feature vector determination unit-3, a retention state determination unit-4 and a retention event determination unit-5.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
The invention aims to provide a method and a system for detecting intersection retention events based on machine learning. Continuous multi-frame images of the intersection to be detected are collected within a set time period; the position information and the speed information of each vehicle target frame in the real coordinate system are obtained from each frame image; for the ith frame image (i > 3), a multi-dimensional feature vector is obtained according to the position information and the speed information of the vehicle target frames in the ith frame image and the three preceding frame images, and because this multi-dimensional feature vector preserves continuity between consecutive frames and is robust, misjudgment of the prediction result is reduced. Based on a retention detection model, the retention state of the intersection to be detected at the time point corresponding to the ith frame image is determined according to the multi-dimensional feature vector of the ith frame image, and finally the retention events of the intersection to be detected are determined according to the retention states at each time point, thereby improving the accuracy of intersection retention event detection.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in fig. 1, the intersection retention event detection method based on machine learning of the present invention includes:
S1: collecting continuous multi-frame images of the intersection to be detected in a set time period.
S2: obtaining the position information and the speed information of each vehicle target frame in a real coordinate system according to each frame of image; the vehicle target frame is a vehicle image frame which is calibrated in each frame image in advance aiming at each vehicle.
S3: for the ith frame image (i > 3), obtaining a multi-dimensional feature vector of the ith frame image according to the position information and the speed information of each vehicle target frame in the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image.
S4: based on a retention detection model, determining the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the multi-dimensional feature vector of the ith frame image; the retention state is dredging or retention.
S5: determining the retention events of the intersection to be detected in the time period according to the retention states of the intersection to be detected at all time points.
In this embodiment, a retention event spans from the beginning of the retention event to its end: when the retention state of the intersection to be detected changes from dredging to retention, a retention event begins; when the retention state changes from retention to dredging, the retention event ends.
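This transition logic maps directly onto the per-frame state sequence. Below is a minimal illustrative sketch in Python (not from the patent; the function name and the 0/1 encoding of retention/dredging are assumptions):

```python
# Sketch: derive retention events from per-frame retention states,
# where 1 means "retention" and 0 means "dredging" (free flow).
def extract_events(states, timestamps):
    """Return (start_time, end_time) pairs, one per retention event."""
    events, start = [], None
    for state, t in zip(states, timestamps):
        if state == 1 and start is None:
            start = t                      # dredging -> retention: event begins
        elif state == 0 and start is not None:
            events.append((start, t))      # retention -> dredging: event ends
            start = None
    if start is not None:                  # still retained at the period's end
        events.append((start, timestamps[-1]))
    return events
```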
In order to improve the degree of distinction of retention events, the intersection retention event detection method based on machine learning further comprises the following steps:
and acquiring the time interval of two adjacent retention events, and merging the two retention events if the time interval of the two adjacent retention events is smaller than an interval threshold.
And acquiring the time length of the retention event for each retention event, and deleting the retention event if the time length of the retention event is smaller than a time threshold.
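Both post-processing rules amount to a single pass over the event list. A hedged sketch, with `gap_thresh` and `min_duration` standing in for the interval threshold and time threshold, whose values the patent does not specify:

```python
# Sketch: merge retention events separated by less than gap_thresh,
# then drop events shorter than min_duration (threshold names are assumed).
def postprocess_events(events, gap_thresh, min_duration):
    merged = []
    for start, end in events:              # events assumed sorted by start time
        if merged and start - merged[-1][1] < gap_thresh:
            merged[-1] = (merged[-1][0], end)   # merge with the previous event
        else:
            merged.append((start, end))
    return [(s, e) for s, e in merged if e - s >= min_duration]
```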
Further, S2: obtaining the position information and the speed information of each vehicle target frame under the real coordinate system according to each frame of image, which specifically comprises the following steps:
S21: determining a spatial projection relation corresponding to the image coordinate system and the real coordinate system according to any frame of image.
S22: determining the position information of the image coordinates of the corresponding vehicle target frame in the real coordinate system according to the spatial projection relation and the image coordinates of each vehicle target frame in each frame of image.
S23: obtaining the speed information of the vehicle target frame in the real coordinate system between the time points corresponding to two adjacent frame images, according to the moving distance, in the real coordinate system, of the image coordinates of the same vehicle target frame in the two frames and the time difference between them.
As another embodiment, when determining the speed information, the position information of the center point of the vehicle target frame in the real coordinate system is acquired; the speed information of the center point in the real coordinate system between the time points corresponding to two adjacent frame images is then obtained from the moving distance, in the real coordinate system, of the image coordinates of that center point in the two frames and the time difference between them.
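In code, S23 reduces to projecting the target-frame center into real coordinates in two adjacent frames and dividing the displacement by the time difference. An illustrative sketch, assuming `H` is the 3 × 3 perspective matrix solved for in the calibration step described below:

```python
import numpy as np

def project(H, pt):
    """Map an image point (u, v) to real-world coordinates via the 3x3 matrix H."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

def center_speed(H, center_prev, center_curr, dt):
    """Lateral/longitudinal speed of a target-frame center between two frames."""
    return (project(H, center_curr) - project(H, center_prev)) / dt
```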
Further, S21: according to any frame of image, determining a spatial projection relation corresponding to an image coordinate system and a real coordinate system, specifically comprising:
any four image coordinates not on a straight line in the road surface plane are specified in any one image.
And acquiring real coordinates of the intersection to be detected corresponding to the coordinates of the four images. In this embodiment, by obtaining an aerial photography top view of the intersection to be measured, the real coordinates corresponding to the four image coordinates are determined according to the aerial photography top view.
And determining the space projection relation according to the four pairs of corresponding image coordinates and real coordinates.
In this embodiment, the transformation formula of the spatial projection relationship is:

(x', y', w')^T = A · (u, v, w)^T

wherein u, v, w are the image coordinates, x', y', w' are the real coordinates, and A is the perspective transformation matrix composed of the 9 unknown parameters a_11 to a_33.
In addition, a general perspective transformation matrix has 9 unknown parameters, and four point pairs are normally required to solve for it. However, in the application scenario of the present invention, the purpose of calibration is to obtain the speed of the vehicle target, so only the relative position of the target frame in the real coordinate system is needed rather than its true position; in principle, three point pairs would therefore suffice.
Substituting the four pairs of corresponding image coordinates and real coordinates into the transformation formula gives four sets of equations; solving these equations yields the values of the 9 unknown parameters a_11 to a_33 of the perspective transformation matrix.
The real positions and the real speeds of all vehicles can be obtained through the space projection relation.
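For reference, the four-point calibration can be carried out with OpenCV's getPerspectiveTransform; the coordinates below are placeholders, not values from the patent:

```python
import cv2
import numpy as np

# Four non-collinear points on the road plane, in image pixels ...
img_pts = np.float32([[120, 400], [520, 410], [90, 700], [560, 690]])
# ... and their real-world counterparts (e.g. metres, from an aerial top view).
real_pts = np.float32([[0, 0], [12, 0], [0, 25], [12, 25]])

H = cv2.getPerspectiveTransform(img_pts, real_pts)  # 3x3 perspective matrix
```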
Preferably, S3: for the ith frame image (i > 3), obtaining a multi-dimensional feature vector of the ith frame image according to the position information and the speed information of each vehicle target frame in the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image, specifically includes:
S31: dividing the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image into four regions in equal proportion.
S32: for each region, obtaining the eight feature vectors of the region according to the position information and the speed information of each vehicle target frame in the region. In the present embodiment, the eight feature vectors are the number of vehicles, the vehicle lateral average speed, the vehicle longitudinal average speed, the lane area, the average width ratio of the vehicle detection frame to the lane, the average height ratio of the vehicle detection frame to the lane, the rate of change of the traffic flow, and the picture stability.
S33: integrating the feature vectors of each region of the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image to obtain the multi-dimensional feature vector. In this embodiment, the multi-dimensional feature vector is a 128-dimensional vector: 4 regions × 4 frames × 8 features.
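A sketch of the assembly in S31 to S33; `region_features` is a hypothetical helper returning the eight per-region features listed above:

```python
import numpy as np

def build_feature_vector(frames, region_features):
    """frames: data for the ith, (i-1)th, (i-2)th and (i-3)th frame images."""
    feats = [region_features(frame, region)     # eight features per region
             for frame in frames                # 4 frames
             for region in range(4)]            # 4 regions each
    vec = np.concatenate(feats)
    assert vec.shape == (128,)                  # 4 regions x 4 frames x 8 features
    return vec
```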
Aiming at the characteristics of road conditions when a traffic intersection is detained, the feature vector acquired by the invention mainly comprises static features and dynamic features. The static characteristics refer to the vehicle and lane characteristics of the current frame and are used for describing a lane detention static state under the current frame; the dynamic feature refers to vehicle and lane features of three frames extracted every 10s in the past, and is used for describing dynamic changes of a lane staying state in the past time period.
Specifically, the number of vehicles is the number of vehicles in a lane area of the intersection to be detected, and the characteristic is positively correlated with the retention state of the intersection to be detected.
The vehicle transverse average speed is the average speed of the vehicle in the horizontal direction in the lane area of the intersection to be detected, and the characteristic is negatively correlated with the retention state of the intersection to be detected.
The longitudinal average speed of the vehicle is the average speed of the vehicle in the vertical direction in the lane area of the intersection to be detected, and the characteristic is negatively correlated with the retention state of the intersection to be detected.
The lane area is the area that the lane region occupies in the image picture. Because cameras shoot from different distances and angles, lanes of the same physical size occupy different areas in different camera pictures, which introduces camera-dependent errors. Introducing this feature alleviates the problem.
The average width ratio of the vehicle detection frame to the lane is the average value of the ratio of each vehicle target frame to the lane width, and the characteristic represents the bearing capacity of the lateral direction of the lane area to the retention jam.
The average height ratio of the vehicle detection frame to the lane is the average value of the height ratio of each vehicle target frame to the lane, and the characteristic represents the bearing capacity of the longitudinal direction of the lane area to the retention jam.
The traffic flow change rate is obtained from the traffic flow of the lane region in each frame image: the traffic flow of the current frame minus that of the previous frame is the traffic flow change rate. This feature is negatively correlated with the retention state of the intersection to be detected.
In the embodiment, a dome (PTZ) camera is used to collect the multi-frame images of the intersection to be detected. Since a dome camera can rotate and the scene is affected by day-night changes, lighting conditions and weather, its picture is not stable. Therefore, for each camera that collects images of the intersection to be detected, the invention extracts a reference picture in advance and selects several sub-regions from the background part of the picture. The similarity of the two images is obtained by template matching the sub-regions of the reference picture against the current frame picture; the picture stability is this similarity.
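One possible realization of this feature uses OpenCV template matching; the sub-region choice and the normalized-correlation score are assumptions of this sketch:

```python
import cv2
import numpy as np

def picture_stability(reference, current, subregions):
    """Mean template-matching score of background patches of the reference
    frame against the current frame; subregions are (x, y, w, h) boxes."""
    scores = []
    for x, y, w, h in subregions:
        patch = reference[y:y + h, x:x + w]
        res = cv2.matchTemplate(current, patch, cv2.TM_CCOEFF_NORMED)
        scores.append(float(res.max()))   # best match of this patch anywhere
    return float(np.mean(scores))
```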
Further, S4: based on a retention detection model, determining the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the multi-dimensional feature vector of the ith frame image, specifically comprising:
S41: acquiring continuous multi-frame historical images of the intersection to be detected.
S42: obtaining the position information and the speed information of each vehicle target frame in the real coordinate system according to each frame of historical image.
S43: for the ith frame of historical image (i > 3), obtaining the multi-dimensional feature vector of the ith frame of historical image according to the position information and the speed information of each vehicle target frame in the ith, (i-1)th, (i-2)th and (i-3)th frames of historical images.
S44: determining the corresponding retention state prediction confidence according to the multi-dimensional feature vector of the ith frame of historical image.
S45: training a random forest classifier according to the multi-dimensional feature vector of each frame of historical image and the corresponding retention state prediction confidence to obtain the retention detection model.
S46: based on the retention detection model, determining a retention state prediction confidence corresponding to the ith frame of image according to the multi-dimensional feature vector of the ith frame of image;
S47: determining the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the retention state prediction confidence corresponding to the ith frame image.
Further, S4: based on the retention detection model, determining the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the multi-dimensional feature vector of the ith frame image further comprises:
after determining a retention state prediction confidence degree corresponding to the ith frame of image based on the retention detection model, performing smoothing processing on the retention state prediction confidence degree corresponding to the ith frame of image to obtain a processed retention state prediction confidence degree; and the processed retention state prediction confidence coefficient is used for determining the retention state of the intersection to be detected at the time point corresponding to the ith frame of image.
In this embodiment, the processed retention state prediction confidence is obtained according to the following formula:
Score'_i = r × Score_i + (1 - r) × Score'_(i-1), i > 4;

Score'_4 = Score_4;

wherein Score'_i is the processed retention state prediction confidence of the ith frame image, Score_i is the retention state prediction confidence of the ith frame image, Score'_(i-1) is the processed retention state prediction confidence of the (i-1)th frame image, and r is a damping coefficient.
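The recursion transcribes directly into code; the value of the damping coefficient r is not specified in the patent:

```python
def smooth_scores(scores, r):
    """scores[k] is Score_i for frame i = k + 4; returns the Score'_i series."""
    smoothed = [scores[0]]                         # Score'_4 = Score_4
    for s in scores[1:]:
        smoothed.append(r * s + (1 - r) * smoothed[-1])
    return smoothed
```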
The prediction result of the random forest classifier is the voting result of the decision tree, and the principle of training by the random forest classifier adopted by the invention is as follows:
dividing a sample data set into a training set and a test set;
each tree in the forest has the same probability distribution, and the classification error of the classifier depends on the classification capability of each decision tree and the relevance between the trees. Although the classification performance of a decision tree is relatively weak, through the integration of a large number of randomly generated decision trees, a test sample passes through each decision tree and is counted to determine the final classification result.
m new sample subsets are obtained by repeated random sampling with replacement from the entire training set (n samples). A decision tree is constructed for each sample subset, and the samples not drawn in each round form the corresponding m out-of-bag datasets. Each sample has 128 feature variables; for each decision node of each decision tree, y variables are randomly drawn, and the single variable with the strongest classification ability, together with a threshold, is selected. In the process of generating the decision trees, each tree grows as deep as possible and no pruning is performed. The generated decision trees are combined into a random forest, and the prediction result of the random forest classifier is determined comprehensively from the voting results of all the trees.
And after the training of the random forest classifier is finished, the feature vectors extracted from the test set are sent to the classifier, and the retention state prediction confidence of the current frame can be obtained.
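A hedged sketch of this train-and-score loop with scikit-learn's RandomForestClassifier; the hyperparameters and the synthetic stand-in data are illustrative only:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.random((1000, 128))      # stand-in for the 128-dim feature vectors
y = rng.integers(0, 2, 1000)     # stand-in retention labels (1 = retention)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
clf = RandomForestClassifier(n_estimators=200, max_features="sqrt")
clf.fit(X_train, y_train)

# Retention prediction confidence = fraction of trees voting "retention".
conf = clf.predict_proba(X_test)[:, 1]
```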
After smoothing, if the retention state prediction confidence is higher than a threshold value, the current frame image is judged to be in the retention state.
Because the retention state prediction confidence jumps discontinuously between adjacent frames, it needs to be smoothed; in this embodiment a window-averaging method is used, yielding a robust prediction result. With window smoothing, the confidence of each frame is updated in proportion to the damping coefficient r. When retention occurs at some frame of the video, the model's prediction score reaches the threshold through a gradual smoothing process, so false detections lasting only a few frames are avoided, and the prediction result of the method has continuity and robustness.
Because the retention state of the intersection to be detected is a dynamically changing process, detecting it from a single frame image alone has limitations. The method first calibrates the collected multi-frame images and obtains the real positions and real speeds of all vehicles by projection transformation. Second, a multi-dimensional retention feature vector is obtained from the vehicle-condition data in the multi-frame images within a time period and combined with a random forest classifier, so that the feature vector used for retention classification has continuity and robustness between consecutive frames, reducing misjudgment of the prediction result. Finally, window smoothing with the window-averaging algorithm is applied to the retention state prediction confidence to obtain a more robust prediction result.
As shown in fig. 2, the intersection retention event detection system based on machine learning of the present invention includes: an acquisition unit 1, a position and velocity determination unit 2, a multi-dimensional feature vector determination unit 3, a retention state determination unit 4, and a retention event determination unit 5.
The acquisition unit 1 is used for acquiring continuous multi-frame images of the intersection to be detected in a set time period.
The position and speed determining unit 2 is connected with the collecting unit 1, and the position and speed determining unit 2 is used for obtaining position information and speed information of each vehicle target frame in a real coordinate system according to each frame image; the vehicle target frame is a vehicle image frame which is calibrated in each frame image in advance aiming at each vehicle.
The multi-dimensional feature vector determination unit 3 is connected to the position and speed determination unit 2, and is configured to, for the ith frame image (i > 3), obtain a multi-dimensional feature vector of the ith frame image according to the position information and the speed information of each vehicle target frame in the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image.
The retention state determining unit 4 is connected to the multi-dimensional feature vector determining unit 3, and the retention state determining unit 4 is configured to determine, based on a retention detection model, a retention state of the intersection to be detected at a time point corresponding to the ith frame image according to the multi-dimensional feature vector of the ith frame image; the retention state is dredging or retention.
The retention event determining unit 5 is connected with the retention state determining unit 4, and is configured to determine the retention events of the intersection to be detected in the time period according to the retention state of the intersection to be detected at each time point. A retention event spans from its beginning to its end: when the retention state of the intersection to be detected changes from dredging to retention, a retention event begins; when the retention state changes from retention to dredging, the retention event ends.
Further, the position and velocity determination unit 2 includes: the device comprises a spatial projection relation determining module, a position determining module and a speed determining module.
The spatial projection relation determining module is connected with the acquisition unit 1, and is used for determining a spatial projection relation corresponding to an image coordinate system and a real coordinate system according to any frame of image.
The position determining module is respectively connected with the acquisition unit 1 and the spatial projection relation determining module, and is used for determining the position information of the image coordinates of the corresponding vehicle target frame in a real coordinate system according to the spatial projection relation and the image coordinates of each vehicle target frame in each frame of image.
The speed determining module is connected with the position determining module and is used for obtaining the speed information of the vehicle target frame in the real coordinate system between the time points corresponding to two adjacent frame images, according to the moving distance, in the real coordinate system, of the image coordinates of the same vehicle target frame in the two frames and the time difference between them.
Still further, the spatial projection relationship determination module includes: the system comprises an image coordinate extraction submodule, a real coordinate acquisition submodule and a projection relation determination submodule.
The image coordinate extraction submodule is connected with the acquisition unit 1 and is used for calibrating any four image coordinates which are not on a straight line in a road surface plane in any frame of image.
And the real coordinate acquisition submodule is used for acquiring real coordinates of the intersection to be detected, which correspond to the four image coordinates.
The projection relation determining submodule is respectively connected with the image coordinate extracting submodule and the real coordinate obtaining submodule and is used for determining a space projection relation according to four pairs of corresponding image coordinates and real coordinates.
Compared with the prior art, the intersection retention event detection system based on machine learning has the same beneficial effects as the intersection retention event detection method based on machine learning, and is not repeated herein.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (9)

1. A machine learning-based intersection retention event detection method is characterized by comprising the following steps:
collecting continuous multi-frame images of a road junction to be detected in a set time period;
obtaining the position information and the speed information of each vehicle target frame in a real coordinate system according to each frame of image; the vehicle target frame is a vehicle image frame which is calibrated in each frame image in advance aiming at each vehicle;
for the ith frame image (i > 3), obtaining a multi-dimensional feature vector of the ith frame image according to the position information and the speed information of each vehicle target frame in the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image;
wherein the obtaining, for the ith frame image (i > 3), a multi-dimensional feature vector of the ith frame image according to the position information and the speed information of each vehicle target frame in the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image specifically comprises:
dividing the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image into four regions in equal proportion;
for each region, obtaining eight feature vectors of the region according to the position information and the speed information of each vehicle target frame of the region, the eight feature vectors comprising the number of vehicles, the transverse average speed of the vehicles, the longitudinal average speed of the vehicles, the area of the lane region, the average width ratio of the vehicle detection frame to the lane, the average height ratio of the vehicle detection frame to the lane, the change rate of the traffic flow and the stability of the picture; wherein, for each camera collecting images of the intersection to be detected, a reference picture is extracted in advance and a plurality of sub-regions are selected from the background part of the picture; the similarity of the two images is obtained by template matching the sub-regions of the reference picture against the current frame picture; and the picture stability is the similarity of the two images;
integrating the feature vectors of each region of the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image to obtain the multi-dimensional feature vector;
based on a retention detection model, determining the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the multi-dimensional feature vector of the ith frame image; the retention state is dredging or retention;
determining a retention event of the intersection to be detected in the time period according to the retention state of the intersection to be detected at each time point;
the retention event spans from the beginning of a retention event to the end of a retention event; wherein, when the retention state of the intersection to be detected changes from dredging to retention, a retention event begins; and when the retention state of the intersection to be detected changes from retention to dredging, the retention event ends.
2. The intersection retention event detection method based on machine learning of claim 1, wherein the obtaining of the position information and the speed information of each vehicle target frame in the real coordinate system according to each frame image specifically comprises:
determining a spatial projection relation corresponding to an image coordinate system and a real coordinate system according to any frame of image;
for each frame of image, determining the position information of the image coordinate of the corresponding vehicle target frame in a real coordinate system according to the space projection relation and the image coordinate of each vehicle target frame in the image;
and obtaining the speed information of the vehicle target frame in the real coordinate system between the time points corresponding to two adjacent frame images, according to the moving distance, in the real coordinate system, of the image coordinates of the same vehicle target frame in the two frames and the time difference between them.
3. The machine learning-based intersection retention event detection method according to claim 2, wherein the determining a spatial projection relationship corresponding to an image coordinate system and a real coordinate system according to any frame of image specifically comprises:
any four image coordinates which are not on a straight line in the plane of the road surface are marked in any image;
acquiring real coordinates of the intersection to be detected corresponding to the coordinates of the four images;
and determining the spatial projection relation according to the four pairs of corresponding image coordinates and real coordinates.
4. The machine learning-based intersection retention event detection method according to claim 1, wherein the determining, based on the retention detection model, the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the multidimensional feature vector of the ith frame image specifically includes:
acquiring continuous multi-frame historical images of a road junction to be detected;
obtaining the position information and the speed information of each vehicle target frame in a real coordinate system according to each frame of historical image;
for the ith frame of historical image (i > 3), obtaining a multi-dimensional feature vector of the ith frame of historical image according to the position information and the speed information of each vehicle target frame in the ith, (i-1)th, (i-2)th and (i-3)th frames of historical images;
determining a corresponding retention state prediction confidence coefficient according to the multi-dimensional feature vector of the ith frame of historical image;
training a random forest classifier according to the multi-dimensional feature vector of each frame of historical image and the corresponding retention state prediction confidence coefficient to obtain a retention detection model;
based on the retention detection model, determining a retention state prediction confidence corresponding to the ith frame of image according to the multi-dimensional feature vector of the ith frame of image;
and determining the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the retention state prediction confidence corresponding to the ith frame image.
5. The intersection retention event detection method based on machine learning according to claim 4, wherein the retention detection model is used to determine the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the multidimensional feature vector of the ith frame image, and further includes:
after determining a retention state prediction confidence corresponding to the ith frame of image based on the retention detection model, performing smoothing processing on the retention state prediction confidence corresponding to the ith frame of image to obtain a processed retention state prediction confidence; and the processed retention state prediction confidence coefficient is used for determining the retention state of the intersection to be detected at the time point corresponding to the ith frame of image.
6. The machine learning-based intersection retention event detection method according to claim 5, wherein the processed retention state prediction confidence is obtained according to the following formula:
Score'_i = r × Score_i + (1 - r) × Score'_(i-1), i > 4;

Score'_4 = Score_4;

wherein Score'_i is the processed retention state prediction confidence of the ith frame image, Score_i is the retention state prediction confidence of the ith frame image, Score'_(i-1) is the processed retention state prediction confidence of the (i-1)th frame image, and r is a damping coefficient.
7. A machine learning based intersection retention event detection system, comprising:
the acquisition unit is used for acquiring continuous multi-frame images of the intersection to be detected within a set time period;
the position and speed determining unit is connected with the acquisition unit and is used for obtaining the position information and the speed information of each vehicle target frame in a real coordinate system according to each frame of image; the vehicle target frame is a vehicle image frame which is calibrated in each frame image in advance aiming at each vehicle;
a multi-dimensional feature vector determination unit, connected to the position and speed determination unit and configured to, for the ith frame image (i > 3), obtain a multi-dimensional feature vector of the ith frame image according to the position information and the speed information of each vehicle target frame in the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image, wherein:
dividing the ith frame image, the (i-1)th frame image, the (i-2)th frame image and the (i-3)th frame image into four regions in equal proportion;
for each region, obtaining eight feature vectors of the region according to the position information and the speed information of each vehicle target frame of the region, the eight feature vectors comprising the number of vehicles, the transverse average speed of the vehicles, the longitudinal average speed of the vehicles, the area of the lane region, the average width ratio of the vehicle detection frame to the lane, the average height ratio of the vehicle detection frame to the lane, the change rate of the traffic flow and the stability of the picture; wherein, for each camera collecting images of the intersection to be detected, a reference picture is extracted in advance and a plurality of sub-regions are selected from the background part of the picture; the similarity of the two images is obtained by template matching the sub-regions of the reference picture against the current frame picture; and the picture stability is the similarity of the two images;
integrating the feature vectors of each region of the ith frame image, the ith-1 frame image, the ith-2 frame image and the ith-3 frame image to obtain a multi-dimensional feature vector;
a retention state determining unit, connected with the multi-dimensional feature vector determining unit and configured to determine, based on a retention detection model, the retention state of the intersection to be detected at the time point corresponding to the ith frame image according to the multi-dimensional feature vector of the ith frame image, the retention state being either dredging or retention;
a retention event determining unit, connected with the retention state determining unit and configured to determine the retention events of the intersection to be detected within the time period according to the retention state of the intersection to be detected at each time point, a retention event spanning from its beginning to its end, wherein a change of the retention state of the intersection to be detected from dredging to retention marks the beginning of a retention event, and a change from retention to dredging marks its end.
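A minimal sketch of the per-region feature extraction of claim 7 follows, with the picture-stability feature computed by OpenCV template matching; the function names, the (x, y, w, h) box convention, and the per-camera choice of sub-regions are assumptions of the example:

```python
import cv2
import numpy as np

def picture_stability(reference, frame, subregions):
    """Mean template-matching similarity over background sub-regions.

    subregions: (x, y, w, h) boxes selected once per camera from the
    background of the reference picture (an assumed convention).
    """
    scores = []
    for x, y, w, h in subregions:
        template = reference[y:y + h, x:x + w]
        patch = frame[y:y + h, x:x + w]
        # Same-size patch and template yield a single correlation value.
        scores.append(float(cv2.matchTemplate(
            patch, template, cv2.TM_CCOEFF_NORMED).max()))
    return float(np.mean(scores))

def region_features(boxes, speeds, lane_area, lane_w, lane_h,
                    flow_change_rate, stability):
    """The eight features of one region: vehicle count, lateral and
    longitudinal average speeds, lane-region area, average width and
    height ratios of detection frames to the lane, traffic-flow change
    rate, and picture stability."""
    n = len(boxes)                        # boxes: one (x, y, w, h) per vehicle
    vx = float(np.mean([v[0] for v in speeds])) if n else 0.0
    vy = float(np.mean([v[1] for v in speeds])) if n else 0.0
    w_ratio = float(np.mean([b[2] / lane_w for b in boxes])) if n else 0.0
    h_ratio = float(np.mean([b[3] / lane_h for b in boxes])) if n else 0.0
    return [n, vx, vy, lane_area, w_ratio, h_ratio,
            flow_change_rate, stability]
```

Concatenating the eight features of all four regions over the ith through (i−3)th frames would give a 4 × 4 × 8 = 128-dimensional vector, though the claim itself only states that the per-region vectors are integrated.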
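And a sketch of the retention event determining unit's scan over per-frame states; the 0.5 cut-off that turns a confidence into a state is an assumption, since the claim does not fix one:

```python
def detect_events(smoothed_scores, threshold=0.5, start_frame=4):
    """Map per-frame confidences to dredging/retention states and
    return (start, end) frame indices of retention events: a change
    from dredging to retention begins an event, the reverse ends it."""
    events, start, state = [], None, "dredging"
    for i, score in enumerate(smoothed_scores, start=start_frame):
        new_state = "retention" if score >= threshold else "dredging"
        if state == "dredging" and new_state == "retention":
            start = i                     # a retention event begins
        elif state == "retention" and new_state == "dredging":
            events.append((start, i))     # the event ends
            start = None
        state = new_state
    return events                         # an event still open at the end is dropped
```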
8. The machine learning-based intersection retention event detection system of claim 7, wherein the position and speed determining unit comprises:
a spatial projection relation determining module, connected with the acquisition unit and configured to determine the spatial projection relation between the image coordinate system and the real coordinate system according to any one frame image;
a position determining module, connected with the acquisition unit and the spatial projection relation determining module respectively, and configured to determine, according to the spatial projection relation and the image coordinates of each vehicle target frame in each image, the position information of the corresponding vehicle target frame in the real coordinate system;
a speed determining module, connected with the position determining module and configured to obtain the speed information of a vehicle target frame in the real coordinate system at the time points corresponding to two adjacent frame images, according to the distance moved in the real coordinate system by the image coordinates of the same vehicle target frame between the two adjacent frame images and the time difference between the two frame images.
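A sketch of the position and speed modules of claim 8, under two assumptions made for the example: the spatial projection relation is realized as a 3 × 3 homography H (determined as in claim 9), and each vehicle target frame is located by a single image point such as its bottom-center:

```python
import numpy as np

def to_real(H, pt):
    """Project an image coordinate into the real (road-plane)
    coordinate system through the homography H."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])

def target_speed(H, pt_prev, pt_curr, dt):
    """Speed of one vehicle target frame between two adjacent frames:
    the distance moved in the real coordinate system divided by the
    time difference dt between the two frame images."""
    return np.linalg.norm(to_real(H, pt_curr) - to_real(H, pt_prev)) / dt
```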
9. The machine learning-based intersection retention event detection system of claim 8, wherein the spatial projection relation determining module comprises:
an image coordinate extraction sub-module, connected with the acquisition unit and configured to calibrate, in any one image, any four image coordinates on the road surface plane that are not all on one straight line;
a real coordinate acquisition sub-module, configured to acquire the real coordinates of the intersection to be detected corresponding to the four image coordinates;
a projection relation determining sub-module, connected with the image coordinate extraction sub-module and the real coordinate acquisition sub-module respectively, and configured to determine the spatial projection relation according to the four pairs of corresponding image and real coordinates.
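Four point pairs with no three image points collinear determine a plane-to-plane homography exactly, so one way to realize the projection relation determining sub-module is OpenCV's four-point perspective transform; the coordinate values below are purely illustrative:

```python
import cv2
import numpy as np

# Four calibrated image coordinates on the road surface, not all on
# one straight line, and their measured real-world counterparts
# (illustrative values, in pixels and metres respectively).
img_pts  = np.float32([[412, 710], [1530, 698], [980, 385], [240, 420]])
real_pts = np.float32([[0.0, 0.0], [18.5, 0.0], [12.0, 32.0], [-3.0, 28.0]])

# The 3x3 homography mapping image coordinates to road-plane coordinates.
H = cv2.getPerspectiveTransform(img_pts, real_pts)
```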
CN202110735123.XA 2021-06-30 2021-06-30 Intersection retention event detection method and system based on machine learning Active CN113469026B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110735123.XA CN113469026B (en) 2021-06-30 2021-06-30 Intersection retention event detection method and system based on machine learning

Publications (2)

Publication Number Publication Date
CN113469026A CN113469026A (en) 2021-10-01
CN113469026B (en) 2023-03-24

Family

ID=77874368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110735123.XA Active CN113469026B (en) 2021-06-30 2021-06-30 Intersection retention event detection method and system based on machine learning

Country Status (1)

Country Link
CN (1) CN113469026B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083179B (en) * 2022-08-23 2022-12-20 江苏鼎集智能科技股份有限公司 Intelligent traffic application service control system based on Internet of things

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106355903A (en) * 2016-09-13 2017-01-25 枣庄学院 Method for detecting vehicle flow of multiple lanes on basis of video analysis
CN110598511A (en) * 2018-06-13 2019-12-20 杭州海康威视数字技术股份有限公司 Method, device, electronic equipment and system for detecting green light running event

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104464307B * 2014-12-17 2016-08-10 合肥革绿信息科技有限公司 A video-based automatic detection method for tunnel traffic congestion events
CN105788272B * 2016-05-16 2018-03-16 杭州智诚惠通科技有限公司 A method and system for traffic flow congestion alarming
CN111275960A (en) * 2018-12-05 2020-06-12 杭州海康威视系统技术有限公司 Traffic road condition analysis method, system and camera
CN113841188B (en) * 2019-05-13 2024-02-20 日本电信电话株式会社 Traffic flow estimating device, traffic flow estimating method, and storage medium
CN111950329B (en) * 2019-05-16 2024-06-18 长沙智能驾驶研究院有限公司 Target detection and model training method, device, computer equipment and storage medium
CN110688922A (en) * 2019-09-18 2020-01-14 苏州奥易克斯汽车电子有限公司 Deep learning-based traffic jam detection system and detection method
CN111583668B (en) * 2020-05-27 2021-12-28 阿波罗智联(北京)科技有限公司 Traffic jam detection method and device, electronic equipment and storage medium
CN111899514A * 2020-08-19 2020-11-06 陇东学院 A congestion detection system based on artificial intelligence

Similar Documents

Publication Publication Date Title
US8582816B2 (en) Method and apparatus for video analytics based object counting
CN104303193B (en) Target classification based on cluster
Bibi et al. Automatic parking space detection system
US9373043B2 (en) Method and apparatus for detecting road partition
US12002225B2 (en) System and method for transforming video data into directional object count
CN103049787B (en) A kind of demographic method based on head shoulder feature and system
CN104239867B (en) License plate locating method and system
CN102819764B (en) Method for counting pedestrian flow from multiple views under complex scene of traffic junction
CN110992693B (en) Deep learning-based traffic congestion degree multi-dimensional analysis method
CN109241938B (en) Road congestion detection method and terminal
US20170032676A1 (en) System for detecting pedestrians by fusing color and depth information
CN110379168B (en) Traffic vehicle information acquisition method based on Mask R-CNN
US10438072B2 (en) Video data background tracking and subtraction with multiple layers of stationary foreground and background regions
CN110874592A (en) Forest fire smoke image detection method based on total bounded variation
CN110598511A (en) Method, device, electronic equipment and system for detecting green light running event
CN102915433A (en) Character combination-based license plate positioning and identifying method
CN101936730A (en) Vehicle queue length detection method and device
CN106372619B A robust vehicle detection and lane-by-lane arrival cumulative curve estimation method
CN102768802B (en) Method for judging road vehicle jam based on finite-state machine (FSM)
CN110889328A (en) Method, device, electronic equipment and storage medium for detecting road traffic condition
KR101026778B1 (en) Vehicle image detection apparatus
CN113469026B (en) Intersection retention event detection method and system based on machine learning
CN111127520A (en) Vehicle tracking method and system based on video analysis
Chen et al. Traffic congestion classification for nighttime surveillance videos
Shafie et al. Smart video surveillance system for vehicle detection and traffic flow control

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A Machine Learning Based Detection Method and System for Intersection Detention Events

Granted publication date: 20230324

Pledgee: China Construction Bank Corporation Shanghai Second Branch

Pledgor: SHANGHAI INTELLIGENT TRANSPORTATION Co.,Ltd.

Registration number: Y2024980017834