CN110633678A - Rapid and efficient traffic flow calculation method based on video images - Google Patents

Rapid and efficient traffic flow calculation method based on video images

Info

Publication number
CN110633678A
Authority
CN
China
Prior art keywords
data
image
time
road surface
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910883883.8A
Other languages
Chinese (zh)
Other versions
CN110633678B (en)
Inventor
王亚涛
江龙
赵英
魏世安
邓佳
邓家勇
郑全新
张磊
孟祥松
高志成
黄志举
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tongfang Software Co Ltd
Tongfang Co Ltd
Original Assignee
Beijing Tongfang Software Co Ltd
Tongfang Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tongfang Software Co Ltd, Tongfang Co Ltd filed Critical Beijing Tongfang Software Co Ltd
Priority to CN201910883883.8A
Publication of CN110633678A
Application granted
Publication of CN110633678B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F 18/214: Pattern recognition; analysing; design or setup of recognition systems; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06N 3/08: Computing arrangements based on biological models; neural networks; learning methods
    • G06V 20/46: Image or video recognition or understanding; scene-specific elements in video content; extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G08G 1/0125: Traffic control systems for road vehicles; measuring and analysing of parameters relative to traffic conditions; traffic data processing
    • G08G 1/0137: Traffic control systems for road vehicles; measuring and analysing of parameters relative to traffic conditions for specific applications
    • G06V 2201/08: Indexing scheme relating to image or video recognition or understanding; detecting or categorising vehicles
    • Y02T 10/40: Climate change mitigation technologies related to transportation; internal combustion engine [ICE] based vehicles; engine management systems


Abstract

A fast and efficient traffic flow calculation method based on video images relates to target detection based on video images, and to intelligent event analysis and traffic parameter calculation systems applied to video surveillance data in traffic scenes. The invention collects vehicle video images through a camera or image capture card installed on the road and processes and analyses them. The method comprises the following steps: first, generating the road surface area and the camera position; second, generating an original sampling image; third, perspective transformation; fourth, training a lightweight sparse convolutional neural network algorithm model; fifth, uplink and downlink statistics; and sixth, model prediction. Compared with the prior art, the method achieves fast traffic flow calculation in multiple scenes, under different weather conditions and different road surface states, and has high detection accuracy, good real-time performance and high efficiency.

Description

Rapid and efficient traffic flow calculation method based on video images
Technical Field
The invention relates to target detection based on video images, and to intelligent event analysis and traffic parameter calculation systems applied to video surveillance data in traffic scenes.
Background
At present, the main schemes for calculating traffic flow are ground induction coil detectors, microwave detectors and intelligent video detection methods. Ground induction coil detection is a passive contact detection technology; it is highly accurate for traffic flow calculation and traffic occupancy and is little affected by weather conditions. However, its construction is complex: the coil must be buried under the road during installation, so the road has to be excavated and traffic interrupted, normal traffic is affected during construction, the road surface is damaged, and equipment maintenance costs are high. Microwave detectors use special equipment such as infrared, ultrasonic or microwave devices to detect vehicles by transmitting electromagnetic waves and receiving the returned information. This scheme is insensitive to changes in climate conditions and the equipment is relatively simple to install; however, its sensitivity is not high enough and a certain false detection rate exists.
The intelligent video image detection method is a non-contact detection technology: vehicle video images are acquired through a camera or image capture card installed on the road. When a vehicle passes through the monitored scene, the vehicle target is detected and tracked, and the traffic count is incremented when the target crosses a given position. Compared with the other schemes, the video image detection method has the following advantages:
1. the hardware is simple to install and maintain, and the normal traffic of the road surface is not influenced;
2. the traffic condition can be monitored in real time through the video device, and the traffic condition can be intuitively mastered in real time;
3. the collected vehicle information is rich, and the management of traffic managers is facilitated;
4. signals from adjacent monitoring points do not interfere with each other;
5. the monitoring range can be adjusted and expanded.
The traffic flow refers to the number of vehicles passing a given point of a road within a given time, whereas the number of vehicles refers to the count in a single static picture; since the same object may persist for some time in a continuous video, the traffic flow cannot be obtained by simply accumulating single-frame detection results. In current image-processing-based traffic flow detection schemes, the main pipeline is vehicle detection followed by target tracking. The main vehicle detection methods include the background difference method, the inter-frame difference method and the ViBe algorithm; the main tracking algorithms include TLD tracking, particle filtering and KCF tracking.
The target detection and tracking methods based on video images mainly include the following:
1. The background difference method is a general approach to motion segmentation in a static scene: the currently acquired frame is differenced against a background image to obtain a grey-level image of the moving regions, which is then thresholded to extract the motion area; to avoid the influence of environmental illumination changes, the background image is updated from the currently acquired frame. 2. The inter-frame difference method subtracts the pixel values of two adjacent frames (or two frames several frames apart) in the video stream and thresholds the difference image to extract the motion regions. (A minimal sketch of both differencing approaches is given after this list of methods.)
3. The ViBe algorithm stores a sample set for every pixel; the samples in the set are past values of that pixel and of its neighbouring pixels. Each new pixel value in subsequent frames is compared with the historical samples in the set to decide whether it belongs to the background.
4. TLD tracking works as follows: a detection module and a tracking module run in parallel and complement each other. The tracking module estimates the motion of the object under the assumptions that the motion between adjacent video frames is limited and that the tracked object is visible; if the target leaves the camera's field of view, tracking fails. The detection module assumes that the video frames are independent of each other and performs a full-image search on each frame to locate regions where the object may appear, based on the previously detected and learned object model. As with other target detection methods, the detection module in TLD can make errors, which fall into two cases: false negative samples and false positive samples. The learning module evaluates these two kinds of errors using the result of the tracking module, generates training samples from the evaluation, updates the target model of the detection module and updates the key feature points of the tracking module, so that similar errors are avoided in the future.
5. Particle filtering is a nonlinear filtering method based on Monte Carlo simulation; its core idea is to represent the probability density distribution by randomly sampled particles. The three main steps are: 1) particle sampling, drawing a set of particles from the proposal distribution; 2) particle weighting, computing the weight of each particle from the observation probability distribution, the importance distribution and Bayes' formula; 3) output estimation, outputting the mean, covariance and other statistics of the system state. In addition, strategies such as resampling are used to cope with particle degeneracy.
6. KCF is a discriminative tracking method: a target detector is trained during tracking, the detector is used to check whether the predicted position in the next frame contains the target, and the new detection result is then used to update the training set and hence the detector. When training the detector, the target region is generally taken as a positive sample and the regions around the target as negative samples, with regions closer to the target more likely to be treated as positive samples.
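By way of illustration only, the following minimal Python/OpenCV sketch shows the two differencing approaches described in items 1 and 2 above; the video file name, the thresholds and the background update rate are assumptions for the sketch, not values taken from the invention:

```python
import cv2

cap = cv2.VideoCapture("traffic.mp4")                 # assumed input video
ok, background = cap.read()                           # use the first frame as the initial background
bg_gray = cv2.cvtColor(background, cv2.COLOR_BGR2GRAY).astype("float32")
prev_gray = bg_gray.copy()

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype("float32")

    # Background difference: |current - background|, thresholded to a binary motion mask.
    bg_diff = cv2.absdiff(gray, bg_gray)
    _, bg_mask = cv2.threshold(bg_diff, 30, 255, cv2.THRESH_BINARY)

    # Inter-frame difference: |current - previous frame|, thresholded the same way.
    fr_diff = cv2.absdiff(gray, prev_gray)
    _, fr_mask = cv2.threshold(fr_diff, 30, 255, cv2.THRESH_BINARY)

    # Slowly update the background to absorb illumination changes (assumed rate 0.05).
    cv2.accumulateWeighted(gray, bg_gray, 0.05)
    prev_gray = gray
```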
The prior art has the following defects:
1. Traditional target detection methods are particularly sensitive to image quality, lighting and camera shake; they also cannot separate single targets when vehicle targets overlap, and their detection performance in large scenes is poor, with many false and missed detections.
2. For tracking, although many multi-target tracking methods exist, crossing and occlusion remain difficult problems; in particular, for many vehicle targets in a large scene with heavy occlusion, the tracking effect is not ideal.
3. Target detection methods based on deep learning achieve good detection results, but detection in CPU mode is time-consuming and cannot run in real time.
4. Most current traffic flow calculation schemes adopt this detection-plus-tracking pipeline, so the detection and tracking performance directly determines the accuracy of the traffic statistics, and combining a detection module with a tracking module has inherent problems: because detection and tracking are complementary, a long detection interval may miss some targets, while a short interval defeats the purpose of tracking and lowers the algorithm's efficiency.
Disclosure of Invention
In order to overcome the defects of the prior art, the object of the invention is to provide a fast and efficient traffic flow calculation method based on video images. The method achieves fast traffic flow calculation in multiple scenes, under different weather conditions and different road surface states, and has high detection accuracy, good real-time performance and high efficiency.
In order to achieve the above object, the technical solution of the present invention is implemented as follows:
a fast and efficient traffic flow calculation method based on video images collects vehicle video images through a camera or image capture card installed on a road and processes and analyses them; the method comprises the following steps:
firstly, generating a road surface area and a camera position:
A video image is input, and the road surface area information and the camera position information are trained as two tasks of a single network. Image data of different road surfaces are collected, and 5 boundary key points of the road surface area are selected in the image data for annotation; the position of the camera relative to the road surface is also annotated.
For this single network task, the loss terms of the road surface area information and the camera position are weighted when designing the loss function, so that the data dimensionalities of the two outputs are balanced, e.g.
Loss = λ_area · Loss_area + λ_pos · Loss_pos.
A virtual trip line is then calculated from the generated road surface area.
Secondly, generating an original sampling image:
Each piece of trip-line data is arranged in sequence along the inclination direction of the virtual trip line to form the original sampling image:
2.1 Virtual trip line point-set data:
The generated virtual trip line is represented by two endpoints and its width is denoted SW. The data of all points covered by the trip line are calculated and recorded as [(x1, y1), (x2, y2), …, (xn, yn)].
2.2 Creating the sampling image:
According to the width of the virtual trip line, an original sampling image is created and denoted SrcSample; an image far larger than the actual number of frames is created.
2.3 Filling the sampling image:
for the 1 st frame data, putting the corresponding point set data on the frame data into [ (M + x1, y1), (M + x2, y2) ] of the SrcSample graph according to the position information; two endpoints (X _ S1, Y _ S1), (X _ S2, Y _ S2) recording the piece of data; for the 2 nd frame data, putting the frame data into the corresponding position of the 2 nd line of the Sample graph according to the position information of the camera, if the camera is on the right side, putting the frame data according to [ (M + x1-SW (time-1), y1 + SW (time-1)), (M + x2-SW (time-1), y2 + SW (time-1)).. the. (M + xn-SW (time-1), yn + SW (time-1)) ]; if the camera is on the left, it is placed as [ (M + x1 + SW (time-1), y1 + SW (time-1)), (M + x2 + SW (time-1), y2 + SW (time-1)).. the. (M + xn + SW (time-1), yn + SW (time-1)) ]; ... Two end points (X _ E1, Y _ E1), (X _ E2, Y _ E2) of the last piece of data are recorded, wherein SW is the width 1 of the sampling line and time is the nth piece of image data.
Thirdly, perspective transformation:
The original sampling image is transformed by a perspective transformation, whose general formula is
[x', y', w'] = [u, v, 1] · A,  A = [[a11, a12, a13], [a21, a22, a23], [a31, a32, a33]],
where the sub-matrix [[a11, a12], [a21, a22]] realises the linear transformation, [a31, a32] realises the translation, [a13, a23]^T realises the perspective distortion, and a33 realises the overall scaling.
The mathematical expression of the perspective transformation is
x = x'/w' = (a11·u + a21·v + a31) / (a13·u + a23·v + a33),
y = y'/w' = (a12·u + a22·v + a32) / (a13·u + a23·v + a33).
The obtained original sampling image records 4 pairs of coordinate points, namely the original coordinates (X_S1, Y_S1), (X_S2, Y_S2) and (X_E1, Y_E1), (X_E2, Y_E2); setting the width and height after transformation to W and H, the 4 transformed coordinate points are (0, 0), (0, H) and (W, H), (W, 0). A perspective transformation matrix is established according to the above formula, completing the transformation from the original sampling image to the bird's-eye view.
Fourthly, training a lightweight sparse convolutional neural network algorithm model:
data is collected from a fixed location on the raw image where the up and down pixels change to within k pixels when no vehicle passes. If this value is exceeded, it is noted that a transition between two rows has occurred.
The lightweight sparse convolutional neural network algorithm proceeds as follows:
4.1 The number of vehicles in the flow time-series images is marked; assuming there are T labelled sample images in total, normally T ≥ M.
4.2 According to the characteristic pixel jumps of the flow time-series image, and combining the idea of a convolutional neural network, the algorithm formula is
N = Σ_i P_i · W_i
where P_i denotes the proportion of pixel jumps in the i-th column, W_i is the feature weight of that column (the feature vector to be trained) and N denotes the number of marked vehicles.
P_i = (1/H) · Σ_{j=1}^{H-1} 1(|V_j - V_{j+1}| > k)
where V_j is the pixel value of the j-th row, k is the jump threshold and H is the height of the image.
4.3 model training:
4.3.1 Select W images from the T labelled images, initially images 1 to W; the initial minimum error value ErrorMin is set to a large number.
4.3.2 Test the remaining T - W images: compute the predicted result from the corresponding P_i and W_i, and record the difference from the actual labelled data as Error. If Error is smaller than the minimum error value ErrorMin, update ErrorMin to Error and record the corresponding W_i values.
4.3.3 Select the 2nd group, images 2 to W + 1, solve W_i from these W images, and then perform step 4.3.2.
4.3.4 Loop over 4.3.2 and 4.3.3 until ErrorMin no longer changes.
4.3.5 Calculate the proportion of the absolute error, which is used as a tuning parameter in later model prediction:
Ratio = ErrorMin / N_i
where ErrorMin is the minimum error value obtained during training and N_i is the labelled count of the corresponding image.
Fifthly, uplink and downlink statistics:
5.1 Count the jump proportion P_i of each column.
5.2 Loop over a fixed number of statistics rounds and accumulate the per-round results into A_i, i.e. A_i = Σ_t P_i(t).
5.3 From the left lane to the right lane, the columns whose A_i is smaller than a certain threshold are recorded in an array J_i: if A_i of a column is smaller than the threshold, J_i is 0; otherwise it is 1.
5.4 From the array J_i, using the prior knowledge that the region lies within 0.3 to 0.8 of the image width, select the longest segment of consecutive zeros as the central green belt or guardrail region, and obtain the central uplink/downlink boundary from the starting position of this segment.
Sixthly, model prediction:
Step one only needs to be completed once; the operation of step two is performed on every subsequent frame; when the statistics period is reached, the operation of step three is performed; the traffic flow statistics are then completed from the flow time-series image generated in step three using the model parameters trained above, and the uplink and downlink flow statistics are completed by combining the uplink/downlink boundary information.
In the traffic flow calculation method, the data labels are annotated in counter-clockwise order, the data of the two upper points are standardized, the height position of the two upper points is set dynamically according to the image size, the height being 16% of the image height, and the intersection points of this line with the road surface edges are used as the two upper boundary key points.
In the traffic flow calculation method, in the data labeling process, data normalization is performed by using relative coordinate position information, namely, the ratio of the coordinate position to the height and width of the image.
In the above-described traffic flow calculation method, the generated virtual trip line is substantially perpendicular to the road direction, has the same width as the road surface area, and is located at 1/2 of the road surface height.
Compared with the prior art, the method has the following advantages:
1. the calculation is fast, and the average time from input to result output of a single picture is 0.005 s;
2. the method is efficient, calculation is not needed in the generation process of the sampling graph, and the efficiency is higher.
3. The reliability is high and stable; the sampling image can be collected and predicted normally whether vehicles are present or not.
4. The whole process is simple, and target detection and tracking are not required;
5. the result is comprehensive and high in precision, and the sampling graph can completely reflect the actual running condition of the traffic flow in a period of time.
6. The adaptability is wide, and the method is suitable for scenes in the day and at night; the method is suitable for high-speed and urban traffic and tunnel scenes; can be suitable for different weather conditions.
The invention is further described with reference to the following figures and detailed description.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
Referring to fig. 1, the fast and efficient traffic flow calculation method based on video images of the present invention collects video images of vehicles through a camera or an image capture card device installed on a road and processes and analyzes the video images, and the method comprises the following steps:
firstly, generating a road surface area and a camera position:
A video image is input, and the road surface area information and the camera position information are trained as two tasks of a single network. Image data of different road surfaces are collected, and 5 boundary key points of the road surface area are selected in each image and annotated in counter-clockwise order. In practice, the three points at the lower part of the image are easy to place on the road surface boundary, whereas the two points at the upper part are inconsistent because the far end of the road differs between scenes and the boundary features of the upper road surface area are not obvious. Therefore, at annotation time the data of the two upper points are standardized: the height position of the two upper points is set dynamically according to the image size, at 16% of the image height, and the intersection points of this horizontal line with the road surface edges are used as the two upper boundary key points. This labelling scheme standardizes the two points while ignoring the influence of the road surface beyond this line, and in practical tests it gives higher accuracy than labelling without the height line.
In the labelling process, relative coordinate positions are used for normalization, i.e. the ratio of the coordinate position to the image height and width. This makes the network structure easier to design and helps improve model accuracy.
The position of the camera relative to the road surface is also annotated: the value is 0.9999 if the camera is on the left side of the road, 0.5 if the camera is in the middle of the road, and 0.0001 if the camera is on the right side.
For this single network task, both the road surface area and the camera position need to be computed; the road surface area has 5 points, i.e. 10-dimensional information, while the camera position has only 1 dimension. To solve this data-dimension imbalance, the two loss terms are weighted when designing the loss function, so that the road surface area information and the camera position are balanced, e.g.
Loss = λ_area · Loss_area + λ_pos · Loss_pos.
The algorithm can quickly and accurately generate the road surface area, from which a virtual trip line is then calculated; the virtual trip line is essentially perpendicular to the road direction, has the same width as the road surface area, and is located at roughly 1/2 of the road surface height. The height, width and direction of the virtual trip line ensure that the sampling image accurately reflects the passing of vehicles.
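By way of illustration, a minimal Python sketch of deriving such a trip line from the five labelled key points follows; for simplicity it places a horizontal line at half the road height and intersects it with the road polygon, whereas the trip line of the invention generally follows the road's perpendicular direction and is therefore inclined (the function name, the point ordering and the horizontal-line simplification are assumptions):

```python
import numpy as np

def virtual_trip_line(road_pts, img_h, img_w):
    """road_pts: five (x, y) road-surface key points in relative coordinates [0, 1],
    assumed to be in counter-clockwise order."""
    pts = np.array(road_pts, dtype=np.float32) * [img_w, img_h]   # back to pixel coordinates
    top, bottom = pts[:, 1].min(), pts[:, 1].max()
    y_line = top + 0.5 * (bottom - top)            # place the line at 1/2 of the road height

    # Intersect the horizontal line y = y_line with the polygon edges to get its endpoints.
    xs = []
    for (x1, y1), (x2, y2) in zip(pts, np.roll(pts, -1, axis=0)):
        if (y1 - y_line) * (y2 - y_line) < 0:      # this edge crosses the line
            t = (y_line - y1) / (y2 - y1)
            xs.append(x1 + t * (x2 - x1))
    p_left, p_right = (min(xs), y_line), (max(xs), y_line)
    return p_left, p_right                         # the two points representing the trip line
```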
Secondly, generating an original sampling image:
In practical applications the virtual trip line is rarely horizontal, so if each piece of trip-line data were placed directly and horizontally on the sampling image, the targets in the sampling image would be strongly deformed and stretched, which is unfavourable for the subsequent target counting. Therefore, each piece of data is arranged in sequence along the inclination direction of the virtual trip line to form the original sampling image.
2.1 Virtual trip line point-set data:
A virtual trip line is generated from the road surface area; it is represented by two endpoints, and its width is denoted SW (1 in the actual system). The data of all points covered by the virtual trip line are calculated and recorded as [(x1, y1), (x2, y2), …, (xn, yn)]; this point set represents the data to be acquired from each frame of video data.
2.2 Creating the sampling image:
According to the width of the virtual trip line, an original sampling image is created and denoted SrcSample. In practical applications, the traffic flow is counted over 2-minute periods, which in the system corresponds to about 1200 frames; the original sampling image must accommodate this, and since the virtual trip line has a certain inclination angle, an image far larger than the actual number of frames is created; in the actual system a 5000 × 5000 original image is created.
2.3 Filling the sampling image:
For the 1st frame, the corresponding point-set data of the frame are placed at [(M + x1, y1), (M + x2, y2), …] of the SrcSample image according to the position information; M is taken as 2500 in the actual system. The two endpoints (X_S1, Y_S1), (X_S2, Y_S2) of this row are recorded in preparation for the subsequent perspective transformation. For the 2nd frame, the data are placed at the corresponding position of the 2nd row of the SrcSample image according to the camera position information: if the camera is on the right side, the slope of the virtual trip line is negative, so the data are placed at [(M + x1 - SW·(time-1), y1 + SW·(time-1)), (M + x2 - SW·(time-1), y2 + SW·(time-1)), …, (M + xn - SW·(time-1), yn + SW·(time-1))]; if the camera is on the left side, the slope is positive, and the data are placed at [(M + x1 + SW·(time-1), y1 + SW·(time-1)), (M + x2 + SW·(time-1), y2 + SW·(time-1)), …, (M + xn + SW·(time-1), yn + SW·(time-1))]. This is executed in sequence until the 1200th frame, completing the generation of the original sampling image, and the two endpoints (X_E1, Y_E1), (X_E2, Y_E2) of the last row are recorded. Here SW is the width of the sampling line (1) and time is the index of the current image frame.
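A minimal Python sketch of this filling step is shown below; the SrcSample size, M = 2500 and SW = 1 follow the embodiment, while the helper name and the integer rounding are assumptions:

```python
import numpy as np

SW, M = 1, 2500                                            # sampling-line width and offset (embodiment values)
src_sample = np.zeros((5000, 5000, 3), dtype=np.uint8)     # original sampling image SrcSample

def fill_sampling_row(frame, trip_pts, time, camera_on_right):
    """frame: current video frame; trip_pts: (x, y) pixels covered by the trip line;
    time: 1-based frame index within the statistics period."""
    shift = SW * (time - 1)
    sign = -1 if camera_on_right else 1                    # the trip-line slope sign depends on the camera side
    for (x, y) in trip_pts:
        dst_x = int(round(M + x + sign * shift))
        dst_y = int(round(y + shift))
        src_sample[dst_y, dst_x] = frame[int(y), int(x)]   # copy the road pixel into SrcSample
```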
Thirdly, perspective transformation:
In the generated original sampling image, the targets are tilted because of the camera angle, and the original sampling image is stretched and deformed, so it is necessary to convert it into an overhead bird's-eye view. The general formula for the perspective transformation is
[x', y', w'] = [u, v, 1] · A,  A = [[a11, a12, a13], [a21, a22, a23], [a31, a32, a33]],
where the sub-matrix [[a11, a12], [a21, a22]] realises the linear transformation, [a31, a32] realises the translation, [a13, a23]^T realises the perspective distortion, and a33 realises the overall scaling.
The mathematical expression of the perspective transformation is
x = x'/w' = (a11·u + a21·v + a31) / (a13·u + a23·v + a33),
y = y'/w' = (a12·u + a22·v + a32) / (a13·u + a23·v + a33).
A perspective transformation is determined by four pairs of vertices: given the coordinates of four pixel points in the original image and their transformed coordinates, the perspective transformation matrix can be obtained, and with this matrix the perspective transformation of the image can be completed.
The obtained original sampling image records 4 pairs of coordinate points, namely the original coordinates (X_S1, Y_S1), (X_S2, Y_S2) and (X_E1, Y_E1), (X_E2, Y_E2); the width and height after transformation are set to W and H, so the 4 transformed coordinate points are (0, 0), (0, H) and (W, H), (W, 0). In practice W = 500 and H = 1200. A perspective transformation matrix is established according to the above formula, completing the transformation from the original sampling image to the bird's-eye view. The transformed image removes the influence of the camera angle, height and position on the acquired targets and provides effective image data for the subsequent target counting; it is recorded as the flow time-series image.
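A minimal OpenCV sketch of this rectification step, using the four recorded endpoints and the embodiment's W = 500, H = 1200 (the function and variable names are illustrative):

```python
import cv2
import numpy as np

def rectify_sampling_image(src_sample, s1, s2, e1, e2, W=500, H=1200):
    """s1, s2: endpoints (X_S1, Y_S1), (X_S2, Y_S2) of the first row;
    e1, e2: endpoints (X_E1, Y_E1), (X_E2, Y_E2) of the last row."""
    src_pts = np.float32([s1, s2, e1, e2])
    dst_pts = np.float32([[0, 0], [0, H], [W, H], [W, 0]])   # same pairing as in the text above
    matrix = cv2.getPerspectiveTransform(src_pts, dst_pts)
    # dsize is (width, height); the result is the flow time-series image (bird's-eye view).
    return cv2.warpPerspective(src_sample, matrix, (W, H))
```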
Fourthly, training a lightweight sparse convolutional neural network algorithm model:
Analysis of the flow time-series image shows that it differs in certain ways from an RGB image acquired directly by the camera and has some unique characteristics. Because the data are collected from a fixed position on the original image, after the perspective transformation the pixels in the same column of the flow time-series image correspond to the same spatial position but to different times. If no vehicle passes that position, the pixel values of adjacent rows should be essentially identical; only when a vehicle passes do the pixel values of adjacent rows change. Considering lighting changes, camera shake and other factors in the actual operating environment, when no vehicle passes, the change between adjacent rows stays within k grey levels; in practice k = 10. If this value is exceeded, a jump between the two rows is recorded.
Based on these characteristics of the flow time-series image, a lightweight sparse convolutional neural network algorithm is designed. The algorithm is proposed specifically for the flow time-series image in order to count the vehicle flow quickly and efficiently: inspired by the idea of convolutional neural networks and combined with sparse coding, the target count is expressed as the accumulated sum of products of different features and different weights. The algorithm is similar in form to sparse coding, but the variables have different meanings; at the same time, the effective image features are selected manually and only the feature weights are trained, which also distinguishes it from an ordinary convolutional neural network. In the actual system an algorithm model is formed by training, and during application the number of targets is predicted directly by the model. The algorithm requires few training samples, is simple to train, and is fast and efficient in prediction.
The lightweight sparse convolutional neural network algorithm proceeds as follows:
4.1 The number of vehicles in the flow time-series images is marked; assuming there are T labelled sample images in total, normally T ≥ M.
4.2 According to the characteristic pixel jumps of the flow time-series image, and combining the idea of a convolutional neural network, the algorithm formula is
N = Σ_i P_i · W_i
where P_i denotes the proportion of pixel jumps in the i-th column, W_i is the feature weight of that column (the feature vector to be trained) and N denotes the number of marked vehicles.
P_i = (1/H) · Σ_{j=1}^{H-1} 1(|V_j - V_{j+1}| > k)
where V_j is the pixel value of the j-th row, k is the jump threshold (10 in practice) and H is the height of the image.
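A minimal sketch of computing this per-column jump proportion on the flow time-series image (k = 10 as in the embodiment; the function name is illustrative):

```python
import numpy as np

def column_jump_proportion(flow_img, k=10):
    """flow_img: H x W greyscale flow time-series image. Returns one jump proportion per column."""
    img = flow_img.astype(np.int32)
    jumps = np.abs(np.diff(img, axis=0)) > k               # True where adjacent rows differ by more than k
    return jumps.sum(axis=0) / float(img.shape[0])         # P_i = (1/H) * number of jumps in column i
```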
4.3 model training:
4.3.1 Select W images from the T labelled images, initially images 1 to W; the initial minimum error value ErrorMin is a large number, taken as ErrorMin = 10000 in practice; solve the weights W_i from these W images.
4.3.2 Test the remaining T - W images: compute the predicted result from the corresponding P_i and W_i, and record the difference from the actual labelled data as Error. If Error is smaller than the minimum error value ErrorMin, update ErrorMin to Error and record the corresponding W_i values.
4.3.3 Select the 2nd group, images 2 to W + 1, solve W_i from these W images, and then perform step 4.3.2.
4.3.4 Loop over 4.3.2 and 4.3.3 until ErrorMin no longer changes.
4.3.5 Calculate the proportion of the absolute error, which is used as a tuning parameter in later model prediction:
Ratio = ErrorMin / N_i
where ErrorMin is the minimum error value obtained during training and N_i is the labelled count of the corresponding image.
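A minimal sketch of the sliding-window training of step 4.3 is given below; the least-squares solver for the weights and the normalisation of the final ratio by the total labelled count are assumptions, since the text only requires that the weights be solved from each group of W images and that ErrorMin and the error proportion be tracked:

```python
import numpy as np

def train_light_sparse_model(P, N, W):
    """P: T x C matrix of per-column jump proportions (one row per labelled image);
    N: length-T array of labelled vehicle counts; W: window size (W <= T)."""
    T = len(N)
    best_w, error_min = None, 10000.0                      # ErrorMin initialised to a large number (10000)
    for start in range(T - W + 1):                         # groups 1..W, 2..W+1, ...
        idx = slice(start, start + W)
        w, *_ = np.linalg.lstsq(P[idx], N[idx], rcond=None)  # solve weights from the W samples (assumed solver)
        mask = np.ones(T, dtype=bool)
        mask[idx] = False                                  # test on the remaining T - W samples
        error = float(np.abs(P[mask] @ w - N[mask]).sum())
        if error < error_min:
            error_min, best_w = error, w
    ratio = error_min / float(np.sum(N))                   # tuning parameter (assumed normalisation)
    return best_w, error_min, ratio
```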
Fifthly, uplink and downlink statistics:
Based on the above characteristics of the flow time-series image, the boundary between the uplink and downlink directions (the two directions of travel) is located from the jump statistics. In the columns on the left side vehicles pass and cause jumps, and likewise in the columns on the right side, whereas the central green belt or guardrail region is never crossed by vehicles, so its jump count remains 0. The uplink/downlink boundary is therefore located by a statistical method; considering that a lane may carry no vehicle for a short time, the jump statistics are accumulated over a period of time (10 rounds of data) before the boundary is finally determined.
5.1 Count the jump proportion P_i of each column.
5.2 Loop over 10 rounds of statistics and accumulate the 10 results into A_i, i.e. A_i = Σ_t P_i(t), t = 1, …, 10.
5.3 From the left lane to the right lane, the columns whose A_i is smaller than a certain threshold are recorded in an array J_i: if A_i of a column is smaller than the threshold, J_i is 0; otherwise it is 1.
5.4 From the array J_i, using the prior knowledge that the region lies within 0.3 to 0.8 of the image width, select the longest segment of consecutive zeros as the central green belt or guardrail region, and obtain the central uplink/downlink boundary from the starting position of this segment.
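A minimal sketch of steps 5.1 to 5.4, locating the central divider as the longest run of zero-jump columns within 0.3 to 0.8 of the image width (the jump threshold value here is illustrative):

```python
import numpy as np

def find_updown_boundary(A, img_w, thresh=0.5):
    """A: per-column jump statistics accumulated over 10 rounds (the A_i of step 5.2)."""
    J = (np.asarray(A) >= thresh).astype(int)      # step 5.3: J_i = 0 for near-zero-jump columns, else 1
    lo, hi = int(0.3 * img_w), int(0.8 * img_w)    # step 5.4: prior knowledge of the divider's range

    best_start, best_len, run_start = None, 0, None
    for i in range(lo, hi):
        if J[i] == 0:
            if run_start is None:
                run_start = i
            if i - run_start + 1 > best_len:
                best_start, best_len = run_start, i - run_start + 1
        else:
            run_start = None
    return best_start                              # start of the longest zero run = uplink/downlink boundary
```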
Sixthly, model prediction:
Step one only needs to be completed once; the operation of step two is performed on every subsequent frame; when the statistics period is reached, the operation of step three is performed; the traffic flow statistics are then completed from the flow time-series image generated in step three using the model parameters trained above, and the uplink and downlink flow statistics are completed by combining the uplink/downlink boundary information.
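Tying the illustrative sketches above together, a per-period driver could look as follows; this is purely a sketch, all helper names come from the earlier illustrations (not from the patent text), and trip_pts, s1, s2, e1, e2, A and best_w are assumed to be produced by the earlier steps:

```python
# Illustrative driver for one 2-minute statistics period (about 1200 frames).
for time, frame in enumerate(frames, start=1):                  # frames: iterable of video frames
    fill_sampling_row(frame, trip_pts, time, camera_on_right)   # step two: fill SrcSample

flow_img = rectify_sampling_image(src_sample, s1, s2, e1, e2)   # step three: bird's-eye flow image
gray = flow_img.mean(axis=2)                                    # single-channel view for jump statistics
P = column_jump_proportion(gray)                                # per-column jump proportions
boundary = find_updown_boundary(A, gray.shape[1])               # uplink/downlink split (A from 10 rounds)
count = float(P @ best_w)                                       # model prediction: N = sum_i P_i * W_i
```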
In summary, the innovation points of the present invention different from the prior art are:
1. the idea of converting the original sampling image into a bird's-eye view through perspective transformation;
2. the algorithm for locating the uplink and downlink boundary by statistics;
3. the algorithmic idea of the lightweight sparse convolutional neural network algorithm;
4. the training and optimization process of the lightweight sparse convolutional neural network algorithm.

Claims (4)

1. A fast and efficient traffic flow calculation method based on video images, which collects vehicle video images through a camera or image capture card installed on a road and processes and analyses them, the method comprising the following steps:
firstly, generating a road surface area and a camera position:
inputting a video image, and training the road surface area information and the camera position information as two tasks of a single network; collecting image data of different road surfaces, and selecting 5 boundary key points of the road surface area in the image data for annotation; annotating the position of the camera relative to the road surface;
for this single network task, weighting the loss terms of the road surface area information and the camera position when designing the loss function, so that the data dimensionalities of the two are balanced, e.g.
Loss = λ_area · Loss_area + λ_pos · Loss_pos;
calculating a virtual trip line from the generated road surface area;
secondly, generating an original sampling image:
arranging each piece of trip-line data in sequence along the inclination direction of the virtual trip line to form the original sampling image:
2.1 virtual trip line point-set data:
the generated virtual trip line is represented by two endpoints and its width is denoted SW; the data of all points covered by the trip line are calculated and recorded as [(x1, y1), (x2, y2), …, (xn, yn)], representing the data to be acquired from each frame of video data;
2.2 creating the sampling image:
according to the width of the virtual trip line, creating an original sampling image, denoted SrcSample; creating an image far larger than the actual number of frames;
2.3 filling the sampling image:
for the 1st frame, placing the corresponding point-set data of the frame at [(M + x1, y1), (M + x2, y2), …] of the SrcSample image according to the position information, and recording the two endpoints (X_S1, Y_S1), (X_S2, Y_S2) of this row; for the 2nd frame, placing the data at the corresponding position of the 2nd row of the SrcSample image according to the camera position information: if the camera is on the right side, placing them at [(M + x1 - SW·(time-1), y1 + SW·(time-1)), (M + x2 - SW·(time-1), y2 + SW·(time-1)), …, (M + xn - SW·(time-1), yn + SW·(time-1))]; if the camera is on the left side, placing them at [(M + x1 + SW·(time-1), y1 + SW·(time-1)), (M + x2 + SW·(time-1), y2 + SW·(time-1)), …, (M + xn + SW·(time-1), yn + SW·(time-1))]; executing this in sequence until the 1200th frame, completing the generation of the original sampling image, and recording the two endpoints (X_E1, Y_E1), (X_E2, Y_E2) of the last row, wherein SW is the width of the sampling line (taken as 1) and time is the index of the current image frame;
thirdly, perspective transformation:
carrying out perspective transformation on the original sampling image, the general formula of the perspective transformation being
[x', y', w'] = [u, v, 1] · A,  A = [[a11, a12, a13], [a21, a22, a23], [a31, a32, a33]],
wherein the sub-matrix [[a11, a12], [a21, a22]] realises the linear transformation, [a31, a32] realises the translation, [a13, a23]^T realises the perspective distortion, and a33 realises the overall scaling;
the mathematical expression of the perspective transformation being
x = x'/w' = (a11·u + a21·v + a31) / (a13·u + a23·v + a33),
y = y'/w' = (a12·u + a22·v + a32) / (a13·u + a23·v + a33);
the obtained original sampling image records 4 pairs of coordinate points, namely the original coordinates (X_S1, Y_S1), (X_S2, Y_S2) and (X_E1, Y_E1), (X_E2, Y_E2); setting the width and height after transformation to W and H, the 4 transformed coordinate points are (0, 0), (0, H) and (W, H), (W, 0); a perspective transformation matrix is established according to the above formula, completing the transformation from the original sampling image to the bird's-eye view;
fourthly, training a lightweight sparse convolutional neural network algorithm model:
collecting data from a fixed position on the original image, wherein when no vehicle passes the position, the pixel change between adjacent rows stays within k pixels; if this value is exceeded, recording that a jump has occurred between the two rows;
the lightweight sparse convolutional neural network algorithm comprising the following specific steps:
4.1 marking the number of vehicles in the flow time-series images; assuming the labelled sample data total T images, normally T ≥ M;
4.2 according to the characteristic pixel jumps of the flow time-series image, and combining the idea of a convolutional neural network, the algorithm formula being
N = Σ_i P_i · W_i
wherein P_i denotes the proportion of pixel jumps in the i-th column, W_i is the feature weight of that column (the feature vector to be trained) and N denotes the number of marked vehicles;
P_i = (1/H) · Σ_{j=1}^{H-1} 1(|V_j - V_{j+1}| > k)
wherein V_j denotes the pixel value of the j-th row, k is the threshold and H is the height of the image;
4.3 model training:
4.3.1 selecting W images from the T labelled images, initially images 1 to W, the initial minimum error value ErrorMin being a large number; solving the weights W_i from these W images;
4.3.2 testing the remaining T - W images: computing the predicted result from the corresponding P_i and W_i, and recording the difference from the actual labelled data as Error; if Error is smaller than the minimum error value ErrorMin, updating ErrorMin to Error and recording the corresponding W_i values;
4.3.3 selecting the 2nd group, images 2 to W + 1, solving W_i from these W images, and then performing step 4.3.2;
4.3.4 looping over 4.3.2 and 4.3.3 until ErrorMin no longer changes;
4.3.5 calculating the proportion of the absolute error, Ratio = ErrorMin / N_i, which is used as a tuning parameter in later model prediction, wherein ErrorMin is the minimum error value during training and N_i is the labelled data of the corresponding image;
fifthly, uplink and downlink statistics:
5.1 counting the jump proportion P_i of each column;
5.2 looping over a certain number of statistics rounds and accumulating the results of each round into A_i, i.e. A_i = Σ_t P_i(t);
5.3 counting, from the left lane to the right lane, the columns whose A_i is smaller than a certain threshold and storing the result in an array J_i: if A_i of a column is smaller than the threshold, J_i is 0; otherwise it is 1;
5.4 counting the array J_i and, using the prior knowledge that the region lies within 0.3 to 0.8 of the image width, selecting the longest segment of consecutive zeros as the central green belt or guardrail region, and obtaining the central uplink/downlink boundary from the starting position of this segment;
sixthly, model prediction:
wherein step one only needs to be completed once, the operation of step two is performed on every subsequent frame, the operation of step three is performed when the statistics period is reached, the traffic flow statistics are completed from the flow time-series image generated in step three using the model parameters trained above, and the uplink and downlink flow statistics are completed by combining the uplink/downlink boundary information.
2. The fast and efficient traffic flow calculation method based on video images according to claim 1, wherein the data labels are annotated in counter-clockwise order, the data of the two upper points are standardized, the height position of the two upper points is set dynamically according to the image size, the height being 16% of the image height, and the intersection points of this line with the road surface edges are used as the two upper boundary key points.
3. The fast and efficient traffic flow calculation method based on video images according to claim 1 or 2, wherein in the data annotation process, the data are normalized using relative coordinate positions, namely the ratio of the coordinate position to the height and the width of the image.
4. The fast and efficient traffic flow calculation method according to claim 3, wherein the generated virtual trip line is substantially perpendicular to the road direction, has the same width as the road surface area, and is located at 1/2 of the road surface height.
CN201910883883.8A 2019-09-19 2019-09-19 Quick and efficient vehicle flow calculation method based on video image Active CN110633678B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910883883.8A CN110633678B (en) 2019-09-19 2019-09-19 Quick and efficient vehicle flow calculation method based on video image


Publications (2)

Publication Number Publication Date
CN110633678A true CN110633678A (en) 2019-12-31
CN110633678B CN110633678B (en) 2023-12-22

Family

ID=68971520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910883883.8A Active CN110633678B (en) 2019-09-19 2019-09-19 Quick and efficient vehicle flow calculation method based on video image

Country Status (1)

Country Link
CN (1) CN110633678B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000222669A (en) * 1999-01-28 2000-08-11 Mitsubishi Electric Corp Traffic flow estimating device and traffic flow estimating method
US20130088600A1 (en) * 2011-10-05 2013-04-11 Xerox Corporation Multi-resolution video analysis and key feature preserving video reduction strategy for (real-time) vehicle tracking and speed enforcement systems
CN102768804A (en) * 2012-07-30 2012-11-07 江苏物联网研究发展中心 Video-based traffic information acquisition method
CN103366581A (en) * 2013-06-28 2013-10-23 南京云创存储科技有限公司 Traffic flow counting device and counting method
CN104658272A (en) * 2015-03-18 2015-05-27 哈尔滨工程大学 Street traffic volume statistics and sped measurement method based on binocular stereo vision
CN105261034A (en) * 2015-09-15 2016-01-20 杭州中威电子股份有限公司 Method and device for calculating traffic flow on highway
CN106127137A (en) * 2016-06-21 2016-11-16 长安大学 A kind of target detection recognizer based on 3D trajectory analysis
CN106878674A (en) * 2017-01-10 2017-06-20 哈尔滨工业大学深圳研究生院 A kind of parking detection method and device based on monitor video
CN109584558A (en) * 2018-12-17 2019-04-05 长安大学 A kind of traffic flow statistics method towards Optimization Control for Urban Traffic Signals
CN110021174A (en) * 2019-04-02 2019-07-16 北京同方软件有限公司 A kind of vehicle flowrate calculation method for being applicable in more scenes based on video image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
程瑶; 卜方玲; 严宏海: "Real-time traffic flow estimation method based on network surveillance video streams" (基于网络监控视频流的车流量实时估计方法), Science Technology and Engineering (科学技术与工程), no. 28 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113034915A (en) * 2021-03-29 2021-06-25 北京卓视智通科技有限责任公司 Double-spectrum traffic incident detection method and device
CN113034916A (en) * 2021-03-31 2021-06-25 北京同方软件有限公司 Multitask traffic event and traffic parameter calculation method
CN113034916B (en) * 2021-03-31 2022-07-01 北京同方软件有限公司 Multitask traffic event and traffic parameter calculation method
CN114120650A (en) * 2021-12-15 2022-03-01 阿波罗智联(北京)科技有限公司 Method and device for generating test result
CN114120650B (en) * 2021-12-15 2023-08-08 阿波罗智联(北京)科技有限公司 Method and device for generating test results
CN115240429A (en) * 2022-08-11 2022-10-25 深圳市城市交通规划设计研究中心股份有限公司 Pedestrian and vehicle flow statistical method, electronic equipment and storage medium
CN115240429B (en) * 2022-08-11 2023-02-14 深圳市城市交通规划设计研究中心股份有限公司 Pedestrian and vehicle flow statistical method, electronic equipment and storage medium
CN115497303A (en) * 2022-08-19 2022-12-20 招商新智科技有限公司 Expressway vehicle speed detection method and system under complex detection condition

Also Published As

Publication number Publication date
CN110633678B (en) 2023-12-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant