CN111783675A - Intelligent city video self-adaptive HDR control method based on vehicle semantic perception - Google Patents


Info

Publication number
CN111783675A
CN111783675A
Authority
CN
China
Prior art keywords
vehicle
image
images
road
frames
Prior art date
Legal status
Withdrawn
Application number
CN202010629204.7A
Other languages
Chinese (zh)
Inventor
邢李蓉 (Xing Lirong)
Current Assignee
Zhengzhou Maitou Information Technology Co., Ltd.
Original Assignee
Zhengzhou Maitou Information Technology Co., Ltd.
Priority date: 2020-07-03
Filing date: 2020-07-03
Publication date: 2020-10-16
Application filed by Zhengzhou Maitou Information Technology Co., Ltd.
Priority to CN202010629204.7A
Publication of CN111783675A
Current legal status: Withdrawn

Classifications

    • G06V 20/10 - Scenes; scene-specific elements: terrestrial scenes
    • G06F 18/213 - Pattern recognition; analysing: feature extraction, e.g. by transforming the feature space; summarisation; mappings, e.g. subspace methods
    • G06T 3/4038 - Geometric image transformation: scaling the whole image or part thereof for image mosaicing, i.e. plane images composed of plane sub-images
    • G06T 5/50 - Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06T 7/33 - Image analysis: determination of transform parameters for the alignment of images (image registration) using feature-based methods
    • G06T 7/90 - Image analysis: determination of colour characteristics
    • G06V 10/25 - Image preprocessing: determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 2201/08 - Indexing scheme relating to image or video recognition or understanding: detecting or categorising vehicles

Abstract

The invention discloses a smart city video self-adaptive HDR control method based on vehicle semantic perception. The method comprises the following steps: constructing a city information model; detecting vehicles on the road and analyzing their trajectories with a neural network; calculating the real-time vehicle speed from the driving trajectories and the city information model; deciding whether to switch high-dynamic-range imaging by comparing the vehicle speed with a threshold, thereby realizing self-adaptive HDR control; and, in parallel, projecting and stitching the road sub-region images, integrating them into the city information model, and visualizing the result with WebGIS technology. With the method, the camera HDR mode can be adjusted according to real-time road traffic, high-quality images of real-time vehicle traffic conditions are obtained, smart-city information is well integrated, and panoramic road imaging can be displayed in real time.

Description

Intelligent city video self-adaptive HDR control method based on vehicle semantic perception
Technical Field
The invention relates to the technical field of artificial intelligence, smart cities, CIM and computer vision, in particular to a smart city video self-adaptive HDR control method based on vehicle semantic perception.
Background
Smart cities have been explored for a decade and have taken initial shape, but the benefits are not yet well reflected in routine urban work. Many tasks still rely mainly on human labor, and the practice of acting only after subjective human judgment imposes a heavy workload and long processing times on staff.
In smart urban road traffic, some scenarios require synthesizing high-dynamic-range (HDR) images to provide more image detail. Cameras used for road information acquisition, constrained by their working environment and long duty cycles, are generally not high-refresh-rate models, and lowering the frame rate for long periods to meet the needs of HDR imaging would impair road-traffic monitoring. Some studies decide whether to enable an HDR mode by analyzing image brightness alone; without taking the actual traffic on the road into account, they cannot obtain high-quality images of passing vehicles. Moreover, each existing acquisition camera covers only its own monitored area; because of repeated acquisition and similar overlaps, the collected images are hard to integrate, and manual monitoring usually works from a single viewpoint, lacking real-time imaging from an overall perspective.
Therefore, the prior art cannot obtain high-quality images of passing vehicles, leaves information scattered, and lacks real-time imaging from an overall viewpoint.
Disclosure of Invention
To address these defects in the prior art, the invention aims to provide a smart city video self-adaptive HDR control method based on vehicle semantic perception that adjusts the camera HDR mode according to real-time road traffic, obtains high-quality images of real-time vehicle traffic conditions, integrates smart-city information well, and displays panoramic road imaging in real time.
A smart city video self-adaptive HDR control method based on vehicle semantic perception comprises the following steps:
(1) building a three-dimensional city space model from city building information and road information to construct the city information model;
(2) feeding the road sub-region images acquired in real time by the camera into a vehicle detection encoder and a vehicle detection decoder to obtain a vehicle center-point heatmap and the width and height of each vehicle bounding box;
(3) calculating the IoU between the vehicle bounding boxes of the current frame's road sub-region image and those of the previous frame's, setting matching weights according to the result, and obtaining the optimal matching between the current-frame and previous-frame bounding-box sets with the KM algorithm;
(4) assigning the center point of each current-frame vehicle bounding box to the center-point set of the corresponding vehicle according to the optimal matching, to obtain an updated center-point set for that vehicle;
(5) obtaining each vehicle's driving trajectory by curve fitting over its center-point set, and sending the trajectory to the city information model;
(6) calculating the real-time vehicle speed from the driving trajectory in combination with the city information model, judging whether any vehicle is speeding, and switching the camera HDR mode according to the judgment; the speed calculation comprises: taking any two frames from the time-ordered road sub-region images, the time interval between them being known from the camera refresh rate; projecting the vehicle trajectory points in the two frames onto the ground plane of the city information model to obtain their coordinates in the model; computing the vehicle's positional offset between the two frames from those coordinates; and dividing the offset by the time interval to obtain the real-time vehicle speed;
(7) acquiring the road sub-region images from each camera; obtaining, from corner-point correspondences between the camera image plane and the two-dimensional ground plane of the city information model, the homography matrix of the projective transformation between the two planes with an SVD algorithm, and projecting the image-plane points through the homography onto a composite panoramic plane parallel to the city-information-model ground; and stitching the projectively transformed images, integrating them into the city information model, and visualizing the model with WebGIS technology.
The vehicle detection encoder extracts features from the road sub-region image and outputs a feature map;
the vehicle detection decoder convolutionally decodes the feature map and outputs the vehicle center-point heatmap and the bounding-box width and height, the hot spots in the heatmap representing the confidence of the vehicle center-point positions.
The method also comprises training the vehicle detection encoder and decoder as follows: selecting images of road vehicle conditions collected under various working conditions as the training set; annotating the vehicle center points in the images, generating hot spots at the center points by Gaussian blurring, and annotating the vehicle bounding-box sizes; and training with the designed loss function.
The designed loss function $L_{det}$ is:

$$L_{det} = L_k + \omega_{size} L_{size}$$

$$L_k = \frac{-1}{N} \sum_{xyc} \begin{cases} (1 - Y'_{xyc})^{\alpha} \log(Y'_{xyc}), & Y_{xyc} = 1 \\ (1 - Y_{xyc})^{\beta} (Y'_{xyc})^{\alpha} \log(1 - Y'_{xyc}), & \text{otherwise} \end{cases}$$

$$L_{size} = \frac{1}{N} \sum_{k=1}^{N} \left| S'_{pk} - s_k \right|$$

where $\omega_{size}$ is a weighting coefficient, $N$ is the number of key points in the image, $(x, y)$ are pixel coordinates, $c$ is the category, $Y$ is the ground-truth heatmap, $Y'$ is the predicted heatmap, $\alpha$ and $\beta$ are hyperparameters, $k$ denotes the $k$-th vehicle target, $S'_{pk}$ is the predicted vehicle bounding-box size, and $s_k$ is its ground-truth size.
(3) The optimal matching between the current-frame and previous-frame vehicle bounding-box sets is obtained with the KM algorithm as follows:
if s bounding boxes remain after processing the previous frame image and s remain after processing the current frame image, the KM algorithm directly yields an optimal solution matching each current-frame box to one previous-frame box; if the previous frame yields s boxes and the current frame yields s - a or s + a boxes, then, for whichever of the two frames has more boxes, the distance from each box to the image edge is calculated, the a boxes nearest the edge are obtained with a top-k method and removed, and KM matching is then performed to obtain the optimal matching solution.
(6) Calculating the real-time vehicle speed from the driving trajectory in combination with the city information model specifically comprises:
setting two reference lines perpendicular to the lane direction in the city-information-model ground coordinate system; recording the road sub-region images in which the vehicle crosses the two reference lines; counting the frames between the two images and, with the real-time camera refresh rate, obtaining their time interval; and dividing the distance between the reference lines by the time interval to obtain the real-time vehicle speed.
(6) Judging whether a vehicle is speeding and switching the camera HDR mode according to the judgment is specifically:
setting a speed threshold; when the vehicle speed is greater than the threshold, turning the camera HDR mode off; otherwise, enabling the camera HDR mode.
The HDR mode is implemented as follows: converting the road sub-region RGB image acquired in real time to the Lab color space and obtaining luminance information from the L channel; judging from the luminance whether HDR synthesis is needed; if so, capturing two further frames with adjusted exposure compensation; and finally synthesizing the three frames into the HDR image.
Performing the image stitching operation on the projectively transformed images and integrating them into the city information model comprises:
performing image correction and noise suppression on the projectively transformed road sub-region images to be stitched; extracting feature points from the images to be stitched and matching them; estimating a homography from the matched point pairs with the RANSAC method and transforming the images to be stitched into the same coordinate system; warping all input images onto the composite panoramic plane, calculating the coordinate range of the warped images to obtain the output image size, calculating the offset between each source-image origin and the output-panorama origin, and mapping every pixel of each input source image onto the output plane; and performing image fusion to obtain the stitching result, which is integrated into the city information model.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention analyzes vehicle traffic from the road-area images and switches the HDR mode according to real-time road traffic, which reduces camera power consumption while obtaining high-quality images of real-time vehicle traffic.
2. The vehicle detection neural network is designed with deep-learning techniques and trained on a large number of samples; vehicle detection is accurate and efficient, with strong robustness and generalization.
3. Before optimally matching the bounding-box sets, the method compares the box counts of consecutive frames and removes surplus boxes accordingly, which improves the efficiency of the optimal matching and yields the matching solution quickly.
4. The invention builds a city information model with CIM technology, integrating many kinds of city information and improving the information-integration capability of the smart-city model.
5. Combined with the city information model, the real-time vehicle speed is computed from the coordinates of trajectory points in the model and the inter-frame time difference; no extra hardware is required, cost is low, feedback is real-time, and computation is efficient and accurate.
6. Combined with the city information model, the road sub-region images are transformed into the model's ground coordinate system and stitched into panoramic real-time road imaging, which is convenient for supervisory staff.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a smart city video self-adaptive HDR control method based on vehicle semantic perception. FIG. 1 is a flow chart of the method of the present invention. The following description will be made by way of specific examples.
Example 1:
First, a city information model (CIM) and its information exchange module are built: an organic complex of a three-dimensional city space model and city information is constructed on the basis of city information data. The city information model comprises BIM information of city buildings, city road information, and the other information required for three-dimensional city space modeling.
Through the information exchange module, the city information model can acquire camera perception information, the corresponding geographical location, the current environment, and other information. The information exchange module is the access module of the CIM database and can take many forms, for example RESTful interfaces or message queues (MQ); the implementer can choose the concrete form according to the deployment scenario.
Computer-vision detection has notable advantages such as non-contact operation, efficiency, and economy, and has broad application prospects in detection and management tasks. Combining CIM with computer vision therefore effectively improves monitoring efficiency. The results are visualized with WebGIS: the network outputs are uploaded to the WebGIS as information, and supervisors can search, query, and analyze on the Web, conveniently keeping track of real-time vehicle-detection imaging in the target area and taking corresponding measures.
The main aim of the invention is camera self-adaptive HDR vehicle-detection real-time imaging. The outputs are the adaptive adjustment instruction and the road-vehicle driving trajectories.
Taking a large road area as the total region, the image acquired by each sub-region's camera is analyzed, the HDR mode is adaptively controlled, and imaging is performed in real time.
Adaptive HDR control is implemented on the basis of vehicle detection, which is performed by a vehicle detection neural network. The vehicle detection encoder extracts features from the road sub-region image and outputs a feature map; the vehicle detection decoder convolutionally decodes the feature map and outputs the vehicle center-point heatmap and the bounding-box width and height, the hot spots in the heatmap representing the confidence of the vehicle center-point positions. The analysis proceeds as follows: the images collected in each sub-region are fed into the vehicle detection encoder and decoder to obtain the vehicle center heatmap and the width W and height H of the bounding box around each vehicle center. Coordinate regression on the heatmap gives the peak points, and the peak coordinates combined with W and H give the bounding boxes. The key points of the current frame are matched to the existing tracks through the IoU between current-frame and previous-frame bounding boxes, yielding updated driving trajectories. Whether a trajectory indicates speeding is judged with the CIM information, and the camera HDR mode is switched accordingly. Together, the cameras of the sub-regions cover the whole target area.
Let $(x_1^{(k)}, y_1^{(k)}, x_2^{(k)}, y_2^{(k)})$ be the ground-truth bounding box (BBox) of vehicle target $k$. Its center point is

$$p_k = \left( \frac{x_1^{(k)} + x_2^{(k)}}{2},\; \frac{y_1^{(k)} + y_2^{(k)}}{2} \right)$$

and the vehicle target size is $s_k = \left( x_2^{(k)} - x_1^{(k)},\; y_2^{(k)} - y_1^{(k)} \right) = (W, H)$. The center-point coordinates are obtained by key-point prediction, and a target size is then regressed for each target $k$. The bounding box is recovered from the predicted values as follows: the peak points of each category's heatmap are extracted and represented as coordinates $(x_i, y_i)$. The bounding box is obtained by the formula

$$\left( x'_i - \frac{W'_i}{2},\; y'_i - \frac{H'_i}{2},\; x'_i + \frac{W'_i}{2},\; y'_i + \frac{H'_i}{2} \right)$$

where $(x'_i, y'_i)$ is the predicted vehicle center point and $(W'_i, H'_i)$ is the predicted vehicle size.
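By way of illustration, the following is a minimal sketch of this decoding step, assuming PyTorch tensors; the tensor layout, the top-k count and the confidence threshold are illustrative assumptions, not part of the disclosure:

```python
import torch
import torch.nn.functional as F

def decode_boxes(heat: torch.Tensor, wh: torch.Tensor, k: int = 100, thresh: float = 0.3):
    """Extract heatmap peaks and recover (x1, y1, x2, y2, score) boxes.

    heat: (C, H, W) center-point heatmap; wh: (2, H, W) regressed width/height maps.
    """
    # 3x3 max-pooling keeps only local maxima (a cheap peak-wise NMS).
    peaks = F.max_pool2d(heat.unsqueeze(0), 3, stride=1, padding=1).squeeze(0)
    heat = heat * (heat == peaks).float()

    c, h, w = heat.shape
    k = min(k, heat.numel())
    scores, idx = heat.view(-1).topk(k)           # top-k peaks over all classes
    pos = idx % (h * w)                           # position within one class map
    ys, xs = pos // w, pos % w

    boxes = []
    for score, x, y in zip(scores, xs, ys):
        if score < thresh:
            continue
        bw, bh = wh[0, y, x], wh[1, y, x]         # regressed size at the peak
        boxes.append((float(x - bw / 2), float(y - bh / 2),
                      float(x + bw / 2), float(y + bh / 2), float(score)))
    return boxes
```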
The vehicle detection neural network must be trained before use. Images of road vehicle conditions collected under various working conditions (different ambient brightness) are selected as the training set. The vehicle center points are annotated, hot spots are generated at the center points by Gaussian blurring, and the bounding-box size W, H of each vehicle is annotated.
The vehicle center-point prediction uses a pixel-level logistic-regression (focal) loss $L_k$:

$$L_k = \frac{-1}{N} \sum_{xyc} \begin{cases} (1 - Y'_{xyc})^{\alpha} \log(Y'_{xyc}), & Y_{xyc} = 1 \\ (1 - Y_{xyc})^{\beta} (Y'_{xyc})^{\alpha} \log(1 - Y'_{xyc}), & \text{otherwise} \end{cases}$$

where $N$ is the number of key points in the image, $(x, y)$ are pixel coordinates, $c$ is the category, $Y_{xyc}$ is the ground-truth heatmap value, and $Y'_{xyc}$ is the predicted value. Each heatmap value, in $[0, 1]$, represents the probability that the point is a key point. $\alpha$ and $\beta$ are the focal-loss hyperparameters, set to 2 and 4 in this embodiment.
The bounding-box size prediction uses an L1 loss:

$$L_{size} = \frac{1}{N} \sum_{k=1}^{N} \left| S'_{pk} - s_k \right|$$
where $N$ is the number of vehicles in the image, $k$ denotes the $k$-th vehicle target, $S'_{pk}$ is the predicted vehicle bounding-box size, and $s_k$ is its ground-truth size.
The target loss function for training the whole network is:

$$L_{det} = L_k + \omega_{size} L_{size}$$

To balance the influence of the size loss, it is multiplied by the coefficient $\omega_{size}$, set to 0.1 in this embodiment.
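As an illustration, a minimal sketch of this combined loss follows, assuming CenterNet-style targets; the tensor shapes, the epsilon guard, and the argument names are illustrative assumptions:

```python
import torch

def detection_loss(pred_heat, gt_heat, pred_wh, gt_wh,
                   alpha=2.0, beta=4.0, w_size=0.1, eps=1e-6):
    """pred_heat/gt_heat: (C, H, W) heatmaps in (0, 1); pred_wh/gt_wh: (N, 2) sizes."""
    pos = (gt_heat == 1.0).float()               # ground-truth key points
    neg = 1.0 - pos
    n = pos.sum().clamp(min=1.0)                 # number of key points / vehicles

    # Pixel-wise focal loss over the heatmap (the L_k term above).
    pos_loss = pos * (1 - pred_heat).pow(alpha) * (pred_heat + eps).log()
    neg_loss = neg * (1 - gt_heat).pow(beta) * pred_heat.pow(alpha) * (1 - pred_heat + eps).log()
    l_k = -(pos_loss + neg_loss).sum() / n

    # L1 loss on the regressed box sizes (the L_size term above).
    l_size = (pred_wh - gt_wh).abs().sum() / n

    return l_k + w_size * l_size                 # omega_size = 0.1 per the text
```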
After vehicle detection, the driving trajectories are obtained. The intersection-over-union IoU between the vehicle bounding boxes of the current frame's road sub-region image and those of the previous frame's is calculated, matching weights are set from the result, and the optimal matching between the current-frame and previous-frame bounding-box sets is obtained with the KM algorithm.
The KM algorithm is applied with the IoU between bounding boxes as the matching weight. If s bounding boxes remain after processing the previous frame image and s remain after processing the current frame image, the KM algorithm directly yields an optimal solution matching each current-frame box to one previous-frame box. If the previous frame yields s boxes while the current frame yields s - a (the number of vehicle targets decreased by a) or s + a (increased by a) boxes, then, for whichever of the two frames has more boxes, the distance from each box to the image edge is calculated, the a boxes nearest the edge are obtained with a top-k method (here top-a) and removed, and KM matching is then performed.
According to the optimal matching, the center point of each current-frame vehicle bounding box is assigned to the center-point set of the corresponding vehicle, yielding an updated center-point set; the driving trajectory of the vehicle is then obtained from this set by curve fitting.
That is, after KM matching, each current-frame bounding box has a matched previous-frame box, and the driving trajectory is formed by curve fitting from the predicted center point of the current-frame box together with the track containing the corresponding previous-frame box, or with that box's center point.
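A minimal sketch of the curve-fitting step, assuming a polynomial fit parameterized by frame index; the degree and the parameterization are illustrative choices, as the disclosure does not fix the fitting method:

```python
import numpy as np

def fit_trajectory(centers, degree=2):
    """centers: (x, y) bounding-box center points of one vehicle, in frame order."""
    pts = np.asarray(centers, dtype=float)
    t = np.arange(len(pts))                      # frame index as the curve parameter
    fx = np.poly1d(np.polyfit(t, pts[:, 0], degree))
    fy = np.poly1d(np.polyfit(t, pts[:, 1], degree))
    return fx, fy                                # trajectory as x(t), y(t)
```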
The real-time vehicle speed is calculated from the driving trajectory in combination with the city information model, whether any vehicle is speeding is judged, and the camera HDR mode switch is adjusted according to the judgment.
The matched driving trajectory is combined with the geographic position information of the CIM to compute the driving speed. Two methods follow.
Method one: set two reference lines perpendicular to the lane direction in the city-information-model ground coordinate system; record the road sub-region images in which the vehicle crosses each reference line; count the frames between the two images and, with the real-time camera refresh rate, obtain their time interval; the real-time vehicle speed is the distance between the reference lines divided by that interval. The distance between the reference lines can be set by the implementer according to actual conditions.
Method two: take any two frames from the time-ordered road sub-region images; since the camera refresh rate is known, so is the time interval between them. With the known homography from the image plane to the CIM, project the vehicle trajectory points in the two frames onto the ground plane of the city information model to obtain their coordinates there; compute the vehicle's positional offset between the two frames from those coordinates; dividing the offset by the time interval gives the moving speed of the trajectory point, i.e., the real-time vehicle speed.
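A minimal sketch of method two, assuming a known 3x3 homography H from image pixels to CIM ground coordinates expressed in meters, and the camera refresh rate in frames per second; all names are illustrative:

```python
import numpy as np

def to_ground(H, pt):
    """Project an image point (u, v) to CIM ground-plane coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return np.array([x / w, y / w])              # dehomogenize

def real_time_speed(H, pt_a, pt_b, frame_a, frame_b, fps):
    dt = abs(frame_b - frame_a) / fps            # time interval between the frames
    offset = np.linalg.norm(to_ground(H, pt_b) - to_ground(H, pt_a))
    return offset / dt                           # meters per second
```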
A speed threshold is set: when the vehicle speed exceeds the threshold, the camera HDR mode is turned off; when it is below the threshold, the HDR mode is enabled. The threshold should be determined by the number of frames the camera's HDR mode consumes, and the implementer can set it according to the actual camera.
One conventional implementation of the HDR mode is: capture the RGB image of the current frame, convert it to the Lab color space, and obtain luminance from the L channel; judge from the luminance whether HDR synthesis is needed; if so, capture two further frames with adjusted exposure compensation and synthesize the three frames into the HDR result. Enabling HDR lowers the camera frame rate; to avoid the camera failing to capture complete information in HDR mode when vehicles move too fast, the invention decides from the detected vehicle speed whether to enable the camera's HDR mode.
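A minimal sketch of this branch, assuming OpenCV; the `camera` object and its `capture(exposure_compensation=...)` method are hypothetical placeholders, the thresholds are illustrative, and Mertens exposure fusion stands in for the unspecified three-frame synthesis step:

```python
import cv2
import numpy as np

SPEED_THRESHOLD = 16.7   # m/s (~60 km/h); an illustrative value
DARK, BRIGHT = 60.0, 200.0   # illustrative L-channel bounds for "normal" brightness

def hdr_step(camera, frame_bgr, max_speed):
    if max_speed > SPEED_THRESHOLD:
        return frame_bgr                              # HDR off: keep full frame rate
    lab = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2Lab)
    l_mean = float(lab[:, :, 0].mean())               # luminance from the L channel
    if DARK <= l_mean <= BRIGHT:
        return frame_bgr                              # brightness normal, no synthesis
    under = camera.capture(exposure_compensation=-2)  # hypothetical camera API
    over = camera.capture(exposure_compensation=+2)   # hypothetical camera API
    fusion = cv2.createMergeMertens()                 # exposure fusion of 3 frames
    merged = fusion.process([under, frame_bgr, over]) # float32 result in [0, 1]
    return np.clip(merged * 255, 0, 255).astype(np.uint8)
```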
Real-time imaging proceeds as follows: each camera acquires its road sub-region images; the images are projectively transformed onto a common plane; the transformed images are stitched and projected into the CIM, realizing real-time imaging.
The projective transformation projects a picture onto a new view plane. Corner detection is performed first; many corner detectors exist, for example the Harris, SIFT, SUSAN, and Kitchen-Rosenfeld corner detection algorithms, and the implementer can choose one according to the required corner characteristics; they are not detailed here. From the corresponding pairs of four corner points between the camera image plane and the CIM ground two-dimensional plane, the homography matrix H of the projective transformation between the two planes is obtained with the SVD (singular value decomposition) algorithm, and the image-plane points are projected through H onto a composite panoramic plane parallel to the CIM ground.
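A minimal sketch of estimating this homography from four corner-point correspondences via the direct linear transform and SVD, matching the SVD step above; the point arrays are illustrative:

```python
import numpy as np

def homography_svd(src, dst):
    """src, dst: (4, 2) arrays of corresponding points between the two planes."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Two DLT equations per correspondence, from u,v = projective mapping of x,y.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)                     # null-space vector of the system
    return H / H[2, 2]                           # normalize the overall scale
```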
Image stitching proceeds as follows. First the images are preprocessed, including image correction and noise suppression; preprocessing methods are well known and not detailed here. The images are then registered: feature points are extracted and matched, and the images are transformed into the same coordinate system; the feature-point correspondence is modeled by a homography, which can be estimated with the RANSAC method. Finally, all input images are warped onto the composite panoramic plane; the coordinate range of the warped images gives the output image size; the offset between each source-image origin and the output-panorama origin is computed, and every source pixel is mapped onto the output plane. The images are finally fused, e.g. by feathering, pyramid, or gradient methods; the implementer can choose the feature-matching and fusion methods according to actual conditions.
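A minimal sketch of registering and warping one projected sub-region image onto the panorama with ORB features and a RANSAC homography; the feature counts are illustrative, and a naive overwrite stands in for the feathering/pyramid/gradient fusion named above:

```python
import cv2
import numpy as np

def stitch_pair(pano, img):
    orb = cv2.ORB_create(2000)
    k1, d1 = orb.detectAndCompute(pano, None)
    k2, d2 = orb.detectAndCompute(img, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d2, d1), key=lambda m: m.distance)[:200]

    src = np.float32([k2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)   # robust estimate

    h, w = pano.shape[:2]
    warped = cv2.warpPerspective(img, H, (w, h))           # map img onto pano plane
    mask = warped.sum(axis=2) > 0                          # pixels covered by img
    out = pano.copy()
    out[mask] = warped[mask]                               # naive fusion (overwrite)
    return out
```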
To present the system's outputs intuitively, the invention performs visualization through WebGIS in combination with the CIM. The stitched image-plane images are projected into the CIM's three-dimensional building model, giving continuous video coverage of the whole road and achieving real-time imaging. Combined with the vehicle-detection information and the on/off state of each camera's HDR mode, the real-time road conditions of every area can be known. Through the WebGIS visualization, supervisors can search, query, and analyze on the Web, conveniently keeping track of real-time road conditions across the supervised area and taking corresponding measures.
The above embodiments are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within its protection scope.

Claims (9)

1. A smart city video self-adaptive HDR control method based on vehicle semantic perception is characterized by comprising the following steps:
(1) building a three-dimensional city space model from city building information and road information to construct the city information model;
(2) feeding the road sub-region images acquired in real time by the camera into a vehicle detection encoder and a vehicle detection decoder to obtain a vehicle center-point heatmap and the width and height of each vehicle bounding box;
(3) calculating the IoU between the vehicle bounding boxes of the current frame's road sub-region image and those of the previous frame's, setting matching weights according to the result, and obtaining the optimal matching between the current-frame and previous-frame bounding-box sets with the KM algorithm;
(4) assigning the center point of each current-frame vehicle bounding box to the center-point set of the corresponding vehicle according to the optimal matching, to obtain an updated center-point set for that vehicle;
(5) obtaining each vehicle's driving trajectory by curve fitting over its center-point set, and sending the trajectory to the city information model;
(6) calculating the real-time vehicle speed from the driving trajectory in combination with the city information model, judging whether any vehicle is speeding, and switching the camera HDR mode according to the judgment; the speed calculation comprises: taking any two frames from the time-ordered road sub-region images, the time interval between them being known from the camera refresh rate; projecting the vehicle trajectory points in the two frames onto the ground plane of the city information model to obtain their coordinates in the model; computing the vehicle's positional offset between the two frames from those coordinates; and dividing the offset by the time interval to obtain the real-time vehicle speed;
(7) acquiring the road sub-region images from each camera; obtaining, from corner-point correspondences between the camera image plane and the two-dimensional ground plane of the city information model, the homography matrix of the projective transformation between the two planes with an SVD algorithm, and projecting the image-plane points through the homography onto a composite panoramic plane parallel to the city-information-model ground; and stitching the projectively transformed images, integrating them into the city information model, and visualizing the model with WebGIS technology.
2. The method of claim 1, wherein the vehicle detection encoder extracts features from the road sub-region image and outputs a feature map;
and the vehicle detection decoder convolutionally decodes the feature map and outputs the vehicle center-point heatmap and the bounding-box width and height, the hot spots in the heatmap representing the confidence of the vehicle center-point positions.
3. The method of claim 1 or 2, further comprising training the vehicle detection encoder and the vehicle detection decoder by: selecting images of road vehicle conditions collected under various working conditions as the training set; annotating the vehicle center points in the images, generating hot spots at the center points by Gaussian blurring, and annotating the vehicle bounding-box sizes; and training with the designed loss function.
4. The method of claim 3, wherein the designed loss function $L_{det}$ is:

$$L_{det} = L_k + \omega_{size} L_{size}$$

$$L_k = \frac{-1}{N} \sum_{xyc} \begin{cases} (1 - Y'_{xyc})^{\alpha} \log(Y'_{xyc}), & Y_{xyc} = 1 \\ (1 - Y_{xyc})^{\beta} (Y'_{xyc})^{\alpha} \log(1 - Y'_{xyc}), & \text{otherwise} \end{cases}$$

$$L_{size} = \frac{1}{N} \sum_{k=1}^{N} \left| S'_{pk} - s_k \right|$$

wherein $\omega_{size}$ is a weighting coefficient, $N$ is the number of key points in the image, $(x, y)$ are pixel coordinates, $c$ is the category, $Y$ is the ground-truth heatmap, $Y'$ is the predicted heatmap, $\alpha$ and $\beta$ are hyperparameters, $k$ denotes the $k$-th vehicle target, $S'_{pk}$ is the predicted vehicle bounding-box size, and $s_k$ is its ground-truth size.
5. The method of claim 1, wherein obtaining the optimal matching between the current-frame and previous-frame vehicle bounding-box sets with the KM algorithm in (3) comprises:
if s bounding boxes remain after processing the previous frame image and s remain after processing the current frame image, the KM algorithm directly yields an optimal solution matching each current-frame box to one previous-frame box; if the previous frame yields s boxes and the current frame yields s - a or s + a boxes, then, for whichever of the two frames has more boxes, the distance from each box to the image edge is calculated, the a boxes nearest the edge are obtained with a top-k method and removed, and KM matching is then performed to obtain the optimal matching solution.
6. The method of claim 1, wherein calculating the real-time vehicle speed from the driving trajectory in combination with the city information model in (6) specifically comprises:
setting two reference lines perpendicular to the lane direction in the city-information-model ground coordinate system; recording the road sub-region images in which the vehicle crosses the two reference lines; counting the frames between the two images and, with the real-time camera refresh rate, obtaining their time interval; and dividing the distance between the reference lines by the time interval to obtain the real-time vehicle speed.
7. The method of claim 1, wherein, in (6), judging whether a vehicle is speeding and adjusting the camera HDR mode switch according to the judgment is specifically:
setting a speed threshold; when the vehicle speed is greater than the threshold, turning the camera HDR mode off; otherwise, enabling the camera HDR mode.
8. The method of claim 1, wherein the HDR mode is implemented by: converting the road sub-region RGB image acquired in real time to the Lab color space and obtaining luminance information from the L channel; judging from the luminance whether HDR synthesis is needed; if so, capturing two further frames with adjusted exposure compensation; and finally synthesizing the three frames into the HDR image.
9. The method of claim 1, wherein performing the image stitching operation on the projectively transformed images and integrating them into the city information model comprises:
performing image correction and noise suppression on the projectively transformed road sub-region images to be stitched; extracting feature points from the images to be stitched and matching them; estimating a homography from the matched point pairs with the RANSAC method and transforming the images to be stitched into the same coordinate system; warping all input images onto the composite panoramic plane, calculating the coordinate range of the warped images to obtain the output image size, calculating the offset between each source-image origin and the output-panorama origin, and mapping every pixel of each input source image onto the output plane; and performing image fusion to obtain the stitching result, which is integrated into the city information model.
CN202010629204.7A (priority date: 2020-07-03; filing date: 2020-07-03) - Intelligent city video self-adaptive HDR control method based on vehicle semantic perception - CN111783675A (en) - Withdrawn

Priority Applications (1)

Application Number: CN202010629204.7A - Priority Date: 2020-07-03 - Filing Date: 2020-07-03 - Title: Intelligent city video self-adaptive HDR control method based on vehicle semantic perception

Applications Claiming Priority (1)

Application Number: CN202010629204.7A - Priority Date: 2020-07-03 - Filing Date: 2020-07-03 - Title: Intelligent city video self-adaptive HDR control method based on vehicle semantic perception

Publications (1)

Publication Number: CN111783675A - Publication Date: 2020-10-16

Family

ID=72758505

Family Applications (1)

Application Number: CN202010629204.7A - Title: Intelligent city video self-adaptive HDR control method based on vehicle semantic perception (en)

Country Status (1)

CN - CN111783675A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112464894A (en) * 2020-12-14 2021-03-09 深圳市优必选科技股份有限公司 Interaction method and device and computer equipment
CN112464894B (en) * 2020-12-14 2023-09-01 深圳市优必选科技股份有限公司 Interaction method and device and computer equipment
CN112905729A (en) * 2021-03-05 2021-06-04 亿海蓝(北京)数据技术股份公司 Thermodynamic diagram generation method and device for track data, electronic equipment and storage medium
CN112905729B (en) * 2021-03-05 2024-01-30 亿海蓝(北京)数据技术股份公司 Thermodynamic diagram generation method and device for track data, electronic equipment and storage medium
CN113395495A (en) * 2021-08-17 2021-09-14 新石器慧通(北京)科技有限公司 Video picture processing method and device based on unmanned vehicle remote driving
CN113395495B (en) * 2021-08-17 2021-11-30 新石器慧通(北京)科技有限公司 Video picture processing method and device based on unmanned vehicle remote driving
CN114820931A (en) * 2022-04-24 2022-07-29 江苏鼎集智能科技股份有限公司 Virtual reality-based CIM (common information model) visual real-time imaging method for smart city


Legal Events

PB01 - Publication
SE01 - Entry into force of request for substantive examination
WW01 - Invention patent application withdrawn after publication (application publication date: 2020-10-16)