CN110379168B - Traffic vehicle information acquisition method based on Mask R-CNN - Google Patents

Traffic vehicle information acquisition method based on Mask R-CNN Download PDF

Info

Publication number
CN110379168B
CN110379168B (application CN201910550286.3A)
Authority
CN
China
Prior art keywords
vehicle
mask
lane
detection area
traffic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910550286.3A
Other languages
Chinese (zh)
Other versions
CN110379168A (en)
Inventor
张建
张博
许肇峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Jiaoke Testing Co ltd
Southeast University
Original Assignee
Guangdong Jiaoke Testing Co ltd
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Jiaoke Testing Co ltd, Southeast University filed Critical Guangdong Jiaoke Testing Co ltd
Priority to CN201910550286.3A priority Critical patent/CN110379168B/en
Publication of CN110379168A publication Critical patent/CN110379168A/en
Application granted granted Critical
Publication of CN110379168B publication Critical patent/CN110379168B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/015 Detecting movement of traffic to be counted or controlled with provision for distinguishing between two or more types of vehicles, e.g. between motor-cars and cycles
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/04 Detecting movement of traffic to be counted or controlled using optical or ultrasonic detectors
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/052 Detecting movement of traffic to be counted or controlled with provision for determining speed or overspeed
    • G PHYSICS
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/065 Traffic control systems for road vehicles by counting the vehicles in a section of the road or in a parking area, i.e. comparing incoming count with outgoing count

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a traffic vehicle information acquisition method based on Mask R-CNN that simultaneously acquires, for each vehicle in a traffic scene, its type, axle count, length, speed, and driving lane, together with per-lane vehicle-count statistics. The method first establishes a virtual vehicle detection area within the field of view of a traffic monitoring camera and then detects the video frame by frame with a Mask R-CNN network. Vehicle targets entering the detection area are tracked with the SORT target tracking method. After a vehicle leaves the detection area, the most frequent value in the multi-frame sequences of type, axle-count, and lane identifications collected during tracking is taken as the final vehicle parameter, the mean of the per-frame lengths is taken as the vehicle length, the vehicle speed is computed from the distance and time the vehicle spent in the detection area, and the vehicle count of the corresponding lane is incremented. The method is highly automated and can serve as an important component of intelligent transportation.

Description

Traffic vehicle information acquisition method based on Mask R-CNN
Technical Field
The invention relates to the field of computer vision technology and intelligent traffic, in particular to a traffic vehicle information acquisition method based on Mask R-CNN.
Background
Traffic vehicle information provides important support for traffic planning, city management, automated driving, and infrastructure maintenance. Current acquisition methods fall into two classes: embedded sensing, built around sensors buried in the roadway, and non-contact sensing based on technologies such as radar, infrared, and video. Embedded sensing offers high measurement precision and stability and resists external interference, but the equipment is expensive and hard to replace, and it cannot capture information such as the vehicle type. Non-contact sensing, particularly video-based methods, has been studied extensively in recent years because it can yield rich vehicle information. However, existing video-based traffic vehicle identification is usually limited to single tasks such as recognizing vehicle type, speed, or traffic flow; the resulting vehicle information lacks robustness, the advantages of video are not fully exploited, and the requirements of advanced intelligent transportation are not met.
Disclosure of Invention
Addressing the shortcomings of existing methods, the invention provides a traffic vehicle information acquisition method based on Mask R-CNN that uses traffic monitoring cameras installed beside the road to simultaneously acquire statistics on the type, axle count, length, speed, driving lane, and count of passing vehicles.
In order to achieve the above purpose, the invention provides the following technical scheme:
a traffic vehicle information acquisition method based on Mask R-CNN utilizes traffic monitoring cameras arranged beside roads to combine with a Mask R-CNN network to simultaneously acquire statistical information of types, axles, lengths, speeds, lanes where vehicles run and the number of vehicles passing through.
The method specifically comprises the following steps:
1. Build a traffic-scene image database containing vehicles, annotate the vehicles in the database into several types with an image segmentation and labeling tool, label the wheels as a separate class, and then train a Mask R-CNN network so that it can recognize vehicles and wheels in a traffic environment.
2. Determine the first, second, and third orthogonal vanishing points of the scene from the lane lines, vehicle textures, and street-lamp positions in the traffic scene respectively; construct a three-dimensional vehicle bounding box by drawing tangent lines to the vehicle mask generated by Mask R-CNN toward the three orthogonal vanishing points; and take the lane containing the midpoint of the bottom face of the three-dimensional bounding box as the lane the vehicle is currently driving in.
3. Determine road-surface calibration reference points from the second vanishing point and the dashed lane lines, and compute the homography matrix between road-plane world coordinates and image-plane pixel coordinates from the pixel and world coordinates of those reference points, thereby providing the basis for calculating vehicle length and speed.
4. Establish a virtual vehicle detection area within the field of view of the traffic monitoring camera, ignore vehicle targets outside it, and then detect the video frame by frame with the Mask R-CNN network to obtain the type, axle count, length, and lane of each vehicle driving through the detection area in every frame. The pixel with the smallest vertical coordinate in each recognized wheel mask is taken as that wheel's apex, and the number of wheel apexes contained within a vehicle mask is counted as the vehicle's axle count.
5. To collect per-frame vehicle information while a vehicle crosses the virtual detection area, track each vehicle target entering it by combining the two-dimensional bounding boxes generated by Mask R-CNN with the SORT target tracking method, until the vehicle leaves the detection area.
6. After the vehicle leaves the virtual detection area, analyze the per-frame type, axle-count, and lane identifications collected during tracking: take the most frequent value across all frames as the final vehicle parameter, take the mean of the per-frame lengths as the final vehicle length, compute the vehicle speed from the distance and time the vehicle spent in the detection area, and increment the vehicle count of the corresponding lane.
Compared with the prior art, the invention has the beneficial effects that:
(1) The proposed method simultaneously obtains statistics on the type, axle count, length, speed, driving lane, and count of vehicles, with a high degree of automation.
(2) By using target tracking to statistically analyze vehicle parameters recognized over many frames inside the virtual detection area, the method is less sensitive to occasional missed detections and brief occlusions than conventional single-frame recognition, and yields more robust vehicle information.
(3) The method requires only a single monocular traffic monitoring camera, keeping equipment cost low.
Drawings
FIG. 1 is a general framework diagram of the method of the present invention;
FIG. 2 is a schematic diagram of a three-dimensional vehicle bounding box generation;
FIG. 3 is a schematic view of a road surface calibration;
FIG. 4 is a two-dimensional intersection of vehicle bounding boxes;
FIG. 5 is a schematic view of a vehicle tracking process.
Detailed Description
The present invention will be further illustrated with reference to the accompanying drawings and specific embodiments, which are to be understood as merely illustrative of the invention and not as limiting the scope of the invention.
Example 1
As shown in FIGS. 1 to 5, this embodiment of the Mask R-CNN-based traffic vehicle information acquisition method takes the traffic scene on a bridge deck as an example and acquires information on passing vehicles through a traffic monitoring camera installed beside the road. The overall framework, shown in FIG. 1, comprises the following steps:
1. Build a traffic image database containing vehicles, select a backbone network for extracting image features, annotate the vehicles in the database into several types with an image segmentation and labeling tool, and label the wheels as a separate class. Then train the Mask R-CNN network for 30,000 iterations, with the learning rate set to 2×10⁻³ for the first 10,000 iterations and 2×10⁻⁴ from 10,000 to 30,000 iterations. After training, the network can recognize vehicles and wheels in a traffic environment.
2. Determine the first, second, and third orthogonal vanishing points of the scene from the lane lines, vehicle textures, and street-lamp positions in the traffic scene respectively; construct a three-dimensional vehicle bounding box by drawing tangent lines to the vehicle mask generated by Mask R-CNN toward the three orthogonal vanishing points; and, as shown in FIG. 2, take the lane containing the bottom-face midpoint 5 of the three-dimensional bounding box as the lane the vehicle is currently driving in.
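The lane assignment at the end of step 2 reduces to a point-in-polygon test: the bottom-face midpoint of the 3-D bounding box is checked against each lane region in the image. The sketch below uses a standard ray-casting test; the lane polygons and midpoint coordinates are illustrative values, not taken from the patent's scene.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: does (x, y) lie inside the polygon (list of (px, py))?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Count crossings of a horizontal ray extending rightward from (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def lane_of(midpoint, lane_polygons):
    """Return the index of the lane polygon containing the bottom-face midpoint."""
    for idx, poly in enumerate(lane_polygons):
        if point_in_polygon(midpoint[0], midpoint[1], poly):
            return idx
    return None  # midpoint lies outside every marked lane

lanes = [
    [(0, 0), (100, 0), (100, 400), (0, 400)],      # lane 0 (illustrative)
    [(100, 0), (200, 0), (200, 400), (100, 400)],  # lane 1 (illustrative)
]
print(lane_of((150, 200), lanes))  # bottom-face midpoint falls in lane 1
```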
3. Twelve reference points (A, B, C, D, 1, 2, 3, 4, a, b, c, d) are established from the scene's second vanishing point and the end points of the dashed lane-line segments, as shown in FIG. 3, and the pixel coordinates of each reference point are read off. The world coordinates of all reference points then follow from the known actual length of the dashed lane segments and the actual lane width. Combining the pixel and world coordinates of the reference points yields the homography matrix between road-plane world coordinates and image-plane pixel coordinates, providing the basis for calculating vehicle length and speed.
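A minimal sketch of the calibration in step 3, assuming the point correspondences have already been collected: with at least four point pairs in general position, the homography can be estimated by the direct linear transform (a library routine such as OpenCV's findHomography would serve equally well). The pixel coordinates below are hypothetical, and the world coordinates assume a 3.75 m lane width and a 6 m dash length purely for illustration.

```python
import numpy as np

def homography_dlt(pixel_pts, world_pts):
    """Estimate H mapping pixel -> road-plane world coordinates from >= 4
    point correspondences via the direct linear transform (DLT)."""
    A = []
    for (u, v), (x, y) in zip(pixel_pts, world_pts):
        A.append([-u, -v, -1, 0, 0, 0, x * u, x * v, x])
        A.append([0, 0, 0, -u, -v, -1, y * u, y * v, y])
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)          # null-space vector of A
    return H / H[2, 2]

def to_world(H, u, v):
    """Project a pixel (u, v) onto the road plane via the homography."""
    x, y, w = H @ np.array([u, v, 1.0])
    return x / w, y / w

# Hypothetical reference points: the four corners of one dashed lane segment.
pixel = [(100, 600), (300, 600), (140, 400), (280, 400)]
world = [(0.0, 0.0), (3.75, 0.0), (0.0, 6.0), (3.75, 6.0)]
H = homography_dlt(pixel, world)
print(to_world(H, 300, 600))
```

Once H is known, the distance a vehicle travels across the detection area can be measured in road-plane metres, which is what the length and speed computations rely on.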
4. Establish a virtual vehicle detection area within the field of view of the traffic monitoring camera, and ignore vehicle targets outside it during detection. Then detect the video frame by frame with the Mask R-CNN network to obtain the type, axle count, length, and lane of each vehicle driving through the detection area in every frame. The pixel with the smallest vertical coordinate in each recognized wheel mask is taken as that wheel's apex, and the number of wheel apexes contained within a vehicle mask is counted as the vehicle's axle count.
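The axle-counting rule of step 4 can be sketched directly on boolean masks: find each wheel mask's apex (its smallest row coordinate) and count how many apexes fall inside a given vehicle mask. The toy masks used to exercise it are illustrative.

```python
import numpy as np

def count_axles(vehicle_mask, wheel_masks):
    """Count axles as the number of wheel-mask apex points lying inside the
    vehicle mask. Masks are boolean HxW arrays; the apex is the wheel pixel
    with the smallest vertical (row) coordinate, as in step 4."""
    axles = 0
    for wheel in wheel_masks:
        rows, cols = np.nonzero(wheel)
        if rows.size == 0:
            continue                      # empty mask: nothing to count
        top = np.argmin(rows)             # apex: smallest row coordinate
        r, c = rows[top], cols[top]
        if vehicle_mask[r, c]:            # apex lies inside this vehicle's mask
            axles += 1
    return axles
```

In practice the wheel masks would come from the same Mask R-CNN pass that produced the vehicle mask; occluded far-side wheels are naturally excluded because only visible wheel masks are generated.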
5. Track each vehicle target entering the virtual detection area using the two-dimensional bounding boxes generated by Mask R-CNN together with the SORT target tracking method. Tracking assumes a constant-velocity motion model, with the state vector of a two-dimensional bounding box expressed as
x = [u, v, s, r, u̇, v̇, ṡ]ᵀ
where u and v are the horizontal and vertical pixel coordinates of the center of the two-dimensional bounding box, s and r are its area and aspect ratio, and the measurement vector is [u, v, s, r]ᵀ. In the current frame, if the two-dimensional bounding box of a vehicle detected in the detection area matches a bounding box predicted by Kalman filtering from the previous frame, the predicted box is updated through the Kalman filter with the box detected in the current frame, and the vehicle's length, axle count, type, and lane are computed from the current-frame detection. If a vehicle detected in the current frame matches no target predicted from the previous frame, it is considered to have just entered the detection area and is registered as a new vehicle target. An existing vehicle target is considered to have left the detection area if no detected vehicle matches it for more than 5 frames.
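The constant-velocity Kalman filter underlying step 5 can be sketched with the SORT-style state layout: velocities for u, v, and s, with the aspect ratio r held constant. The noise covariances and the demo trajectory below are illustrative assumptions, not values from the patent.

```python
import numpy as np

# State x = [u, v, s, r, du, dv, ds]^T; constant-velocity transitions for
# u, v, s, while the aspect ratio r has no velocity term.
F = np.eye(7)
F[0, 4] = F[1, 5] = F[2, 6] = 1.0
H_m = np.zeros((4, 7))                 # measurement picks out [u, v, s, r]
H_m[0, 0] = H_m[1, 1] = H_m[2, 2] = H_m[3, 3] = 1.0

def kf_predict(x, P, Q):
    """Propagate the state and covariance one frame ahead."""
    return F @ x, F @ P @ F.T + Q

def kf_update(x, P, z, R):
    """Correct the prediction with a detected box z = [u, v, s, r]."""
    y = z - H_m @ x                           # innovation
    S = H_m @ P @ H_m.T + R
    K = P @ H_m.T @ np.linalg.inv(S)          # Kalman gain
    return x + K @ y, (np.eye(7) - K @ H_m) @ P

# Demo: a box drifting 5 px/frame to the right; the filter recovers du near 5.
x = np.zeros(7); x[:4] = [100, 50, 400, 1.0]
P, Q, R = np.eye(7) * 10.0, np.eye(7) * 0.01, np.eye(4)
for t in range(1, 31):
    x, P = kf_predict(x, P, Q)
    x, P = kf_update(x, P, np.array([100 + 5.0 * t, 50, 400, 1.0]), R)
print(round(x[4], 2))
```

A production tracker would wrap each vehicle in its own filter instance and tune Q and R to the camera's frame rate and detection noise.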
6. Matching between the two-dimensional bounding boxes detected in the current frame and those predicted from the previous frame is based on the Hungarian algorithm, which determines the optimal assignment over the intersection-over-union (IoU) matrix of the boxes. The intersection of two boxes is shown in FIG. 4, where the overlap of detection box 6 and prediction box 7 is the intersection region 8. A minimum IoU threshold of 0.3 is set; box pairs whose IoU falls below this threshold are defined as unmatched.
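The matching in step 6 can be sketched as Hungarian assignment over an IoU matrix, here via SciPy's linear_sum_assignment. The 0.3 rejection threshold follows the text; the box coordinates in any usage are illustrative.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def match(detections, predictions, iou_min=0.3):
    """Hungarian assignment maximising total IoU; pairs below iou_min
    are rejected as unmatched, per step 6."""
    if not detections or not predictions:
        return []
    cost = np.array([[-iou(d, p) for p in predictions] for d in detections])
    rows, cols = linear_sum_assignment(cost)   # minimises cost = maximises IoU
    return [(int(r), int(c)) for r, c in zip(rows, cols)
            if -cost[r, c] >= iou_min]
```

Unmatched detections spawn new tracks and unmatched predictions age toward deletion, mirroring the entry and exit rules of step 5.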
7. After the vehicle leaves the virtual detection area 9, the per-frame type, axle-count, and lane identifications collected while tracking it through the detection area are analyzed: the most frequent value across all frames is taken as the final vehicle parameter, the mean of the per-frame vehicle lengths is taken as the final length, the vehicle speed is computed from the distance and time the vehicle spent in the detection area, and the vehicle count of the corresponding lane is incremented. The tracking process is illustrated in FIG. 5, which shows the tracking trajectory 10 of a vehicle.
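The final aggregation of step 7 amounts to a majority vote over the categorical identifications plus averaging for the length and a distance-over-time speed. The track record below is an illustrative stand-in for the per-frame identifications, not data from the patent.

```python
from collections import Counter

def finalize_vehicle(track):
    """Fuse the per-frame identifications collected while the vehicle crossed
    the detection area: majority vote for type/axles/lane, mean for length,
    and speed = distance travelled in the area / elapsed time."""
    vehicle_type = Counter(track["types"]).most_common(1)[0][0]
    axles = Counter(track["axles"]).most_common(1)[0][0]
    lane = Counter(track["lanes"]).most_common(1)[0][0]
    length = sum(track["lengths"]) / len(track["lengths"])
    speed = track["distance_m"] / track["elapsed_s"]     # m/s across the area
    return {"type": vehicle_type, "axles": axles, "lane": lane,
            "length_m": round(length, 2), "speed_mps": round(speed, 2)}

track = {  # illustrative multi-frame record for one tracked vehicle
    "types": ["truck", "truck", "car", "truck"],
    "axles": [3, 3, 2, 3],
    "lanes": [1, 1, 1, 1],
    "lengths": [8.1, 7.9, 8.0, 8.0],
    "distance_m": 30.0, "elapsed_s": 1.5,
}
print(finalize_vehicle(track))
```

The single mislabelled frame ("car", 2 axles) is outvoted, which is exactly the robustness argument made for multi-frame fusion over single-frame recognition.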
In conclusion, the Mask R-CNN-based traffic vehicle information acquisition method successfully obtains statistics on the type, axle count, length, speed, driving lane, and count of passing vehicles.
The technical means disclosed in the present invention are not limited to those disclosed in the above embodiments, and also cover technical solutions formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and such improvements and modifications are also considered to fall within the scope of the present invention.

Claims (1)

1. A traffic vehicle information acquisition method based on Mask R-CNN, characterized in that a traffic monitoring camera installed beside the road is combined with a Mask R-CNN network to simultaneously obtain statistics on the type, axle count, length, speed, driving lane, and count of passing vehicles;
the method specifically comprises:
determining the first, second, and third orthogonal vanishing points of the scene from the lane lines, vehicle textures, and street-lamp positions in the traffic scene respectively, constructing a three-dimensional vehicle bounding box by drawing tangent lines to the vehicle mask generated by Mask R-CNN toward the three orthogonal vanishing points, and taking the lane containing the midpoint of the bottom face of the three-dimensional bounding box as the lane the vehicle is currently driving in;
determining road-surface calibration reference points from the second vanishing point and the dashed lane lines in the traffic scene, and computing the homography matrix between road-plane world coordinates and image-plane pixel coordinates from the pixel and world coordinates of the reference points, thereby providing the basis for calculating vehicle length and speed;
establishing a virtual vehicle detection area within the field of view of the traffic monitoring camera, ignoring vehicle targets outside it, and then detecting the video frame by frame with the Mask R-CNN network to obtain the type, axle count, length, and lane of each vehicle in every frame; taking the pixel with the smallest vertical coordinate in each recognized wheel mask as that wheel's apex, and counting the number of wheel apexes contained within a vehicle mask as the vehicle's axle count;
tracking each vehicle target entering the virtual detection area by combining the two-dimensional bounding boxes generated by Mask R-CNN with the SORT target tracking method, until the vehicle leaves the detection area;
and, after the vehicle leaves the virtual detection area, taking the most frequent value in the multi-frame sequences of type, axle-count, and lane identifications collected during tracking as the final vehicle parameter, taking the mean of the per-frame vehicle lengths as the final vehicle length, computing the vehicle speed from the distance and time the vehicle spent in the detection area, and incrementing the vehicle count of the corresponding lane.
CN201910550286.3A 2019-06-24 2019-06-24 Traffic vehicle information acquisition method based on Mask R-CNN Active CN110379168B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910550286.3A CN110379168B (en) 2019-06-24 2019-06-24 Traffic vehicle information acquisition method based on Mask R-CNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910550286.3A CN110379168B (en) 2019-06-24 2019-06-24 Traffic vehicle information acquisition method based on Mask R-CNN

Publications (2)

Publication Number Publication Date
CN110379168A CN110379168A (en) 2019-10-25
CN110379168B (en) 2021-09-24

Family

ID=68249277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910550286.3A Active CN110379168B (en) 2019-06-24 2019-06-24 Traffic vehicle information acquisition method based on Mask R-CNN

Country Status (1)

Country Link
CN (1) CN110379168B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110516524A (en) * 2019-06-26 2019-11-29 东南大学 Vehicle number of axle recognition methods based on Mask R-CNN in a kind of traffic scene
CN111508239B (en) * 2020-04-16 2022-03-01 成都旸谷信息技术有限公司 Intelligent vehicle flow identification method and system based on mask matrix
CN111540217B (en) * 2020-04-16 2022-03-01 成都旸谷信息技术有限公司 Mask matrix-based intelligent average vehicle speed monitoring method and system
CN111540201B (en) * 2020-04-23 2021-03-30 山东大学 Vehicle queuing length real-time estimation method and system based on roadside laser radar
CN111709332B (en) 2020-06-04 2022-04-26 浙江大学 Dense convolutional neural network-based bridge vehicle load space-time distribution identification method
CN112053572A (en) * 2020-09-07 2020-12-08 重庆同枥信息技术有限公司 Vehicle speed measuring method, device and system based on video and distance grid calibration
CN112907978A (en) * 2021-03-02 2021-06-04 江苏集萃深度感知技术研究所有限公司 Traffic flow monitoring method based on monitoring video
CN116071707B (en) * 2023-02-27 2023-11-28 南京航空航天大学 Airport special vehicle identification method and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778713A (en) * 2015-04-27 2015-07-15 清华大学深圳研究生院 Image processing method
CN107122792A (en) * 2017-03-15 2017-09-01 山东大学 Indoor arrangement method of estimation and system based on study prediction
CN109064495A (en) * 2018-09-19 2018-12-21 东南大学 A kind of bridge floor vehicle space time information acquisition methods based on Faster R-CNN and video technique
CN109472793A (en) * 2018-10-15 2019-03-15 中山大学 The real-time road surface dividing method of 4K high-definition image based on FPGA


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Vehicle recognition and detection based on improved Mask R-CNN; 白宝林; China Master's Theses Full-text Database, Information Science and Technology; 2018-08-15 (No. 08); pp. I138-514 *
Research on vehicle re-identification in surveillance video; 张超; China Master's Theses Full-text Database, Information Science and Technology; 2019-05-15 (No. 05); pp. I138-1588 *
Research and implementation of key algorithms for moving-target tracking, detection, and recognition; 徐文韬; China Master's Theses Full-text Database, Information Science and Technology; 2018-04-15 (No. 04); pp. I138-2634 *

Also Published As

Publication number Publication date
CN110379168A (en) 2019-10-25

Similar Documents

Publication Publication Date Title
CN110379168B (en) Traffic vehicle information acquisition method based on Mask R-CNN
US11854272B2 (en) Hazard detection from a camera in a scene with moving shadows
CN108320510B (en) Traffic information statistical method and system based on aerial video shot by unmanned aerial vehicle
Chen et al. Next generation map making: Geo-referenced ground-level LIDAR point clouds for automatic retro-reflective road feature extraction
CN110175576A (en) A kind of driving vehicle visible detection method of combination laser point cloud data
CN111753797B (en) Vehicle speed measuring method based on video analysis
CN103903019A (en) Automatic generating method for multi-lane vehicle track space-time diagram
CN110197173B (en) Road edge detection method based on binocular vision
CN106682586A (en) Method for real-time lane line detection based on vision under complex lighting conditions
Nguyen et al. Compensating background for noise due to camera vibration in uncalibrated-camera-based vehicle speed measurement system
CN104246821A (en) Device for detecting three-dimensional object and method for detecting three-dimensional object
CN108364466A (en) A kind of statistical method of traffic flow based on unmanned plane traffic video
Fernández et al. Road curb and lanes detection for autonomous driving on urban scenarios
EP2813973B1 (en) Method and system for processing video image
CN103324913A (en) Pedestrian event detection method based on shape features and trajectory analysis
CN115113206B (en) Pedestrian and obstacle detection method for assisting driving of underground rail car
CN106446785A (en) Passable road detection method based on binocular vision
CN103794050A (en) Real-time transport vehicle detecting and tracking method
CN107808524A (en) A kind of intersection vehicle checking method based on unmanned plane
Kanhere et al. Vehicle segmentation and tracking in the presence of occlusions
Qing et al. A novel particle filter implementation for a multiple-vehicle detection and tracking system using tail light segmentation
Ren et al. Lane detection in video-based intelligent transportation monitoring via fast extracting and clustering of vehicle motion trajectories
Xuan et al. Robust lane-mark extraction for autonomous driving under complex real conditions
CN113516853A (en) Multi-lane traffic flow detection method for complex monitoring scene
Kanhere et al. Real-time detection and tracking of vehicle base fronts for measuring traffic counts and speeds on highways

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant