
Panoramic safety monitoring method for parking lot

Info

Publication number
CN114639171A
Authority
CN
China
Prior art keywords
information
panoramic
data set
parking lot
motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210535421.9A
Other languages
Chinese (zh)
Other versions
CN114639171B (en)
Inventor
刘寒松
王永
王国强
刘瑞
翟贵乾
李贤超
焦安健
谭连胜
董玉超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sonli Holdings Group Co Ltd
Original Assignee
Sonli Holdings Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sonli Holdings Group Co Ltd filed Critical Sonli Holdings Group Co Ltd
Priority to CN202210535421.9A priority Critical patent/CN114639171B/en
Publication of CN114639171A publication Critical patent/CN114639171A/en
Application granted granted Critical
Publication of CN114639171B publication Critical patent/CN114639171B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/269 Analysis of motion using gradient-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention belongs to the technical field of parking lot safety monitoring, and relates to a parking lot panoramic safety monitoring method.

Description

Panoramic safety monitoring method for parking lot
Technical Field
The invention belongs to the technical field of parking lot safety monitoring, and relates to a parking lot panoramic safety monitoring method based on multi-modal salient region recommendation.
Background
With the development of society and the progress of the economy, family cars have gradually become indispensable, and some families even own several cars. Parking has thus gradually become a problem that urban management must consider; if it cannot be fundamentally solved, it will bring great inconvenience to urban management, so the construction and safety of parking lots are receiving more and more attention.
A traditional parking-lot surveillance camera can only shoot from a fixed viewing angle. If a safety accident occurs at a position the camera cannot cover, or illegal activities are carried out in a monitoring dead corner, real-time monitoring and real-time safety alarming cannot be achieved; because traditional monitoring only sees events within a fixed range, it is very unfavorable to comprehensive monitoring of the parking lot. Even with a monitoring probe that can rotate through 360 degrees, the content of interest can only be seen by moving the probe continuously, a large amount of content that may need attention is easily missed during the movement, and at any given moment the camera still has a single viewing angle, so comprehensive monitoring cannot be achieved. Meanwhile, parking-lot monitoring is limited by the memory of the computer hardware: to save cost the recordings cannot be stored permanently and must be cleared at intervals, and some parking lots can only keep the most recent 4 days of monitoring content, so a large amount of important information is lost and the corresponding recordings cannot be retrieved when needed later, defeating the purpose of monitoring.
In summary, existing monitoring methods can only monitor a fixed range, cannot automatically find a suitable position according to the content of the parking lot, and cannot perform full-range real-time monitoring; traditional monitoring is also time-consuming, labor-intensive and expensive. With the rapid development of deep learning, deep-learning-based methods have advanced quickly in many industries, and deep-learning-based recommendation algorithms have been continuously proposed, achieving great performance improvements.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a parking lot panoramic safety monitoring method based on multi-modal salient region recommendation.
In order to achieve this purpose, the specific process for realizing panoramic safety monitoring of a parking lot comprises the following steps:
(1) collecting panoramic camera video segments, and constructing a parking lot panoramic safety monitoring video dataset from the collected segments;
(2) performing saliency detection on the parking lot panoramic surveillance video dataset based on the motion information, the contrast information and the sound information respectively;
(3) finding the corresponding important objects from the saliency detection results of the motion, contrast and sound information, and ranking the importance of the found objects;
(4) performing inter-frame consistency smoothing of the panoramic surveillance video on the ranked objects, connecting the same objects across frames to form a smooth window motion trajectory;
(5) converting points into planes according to the motion trajectories of the different objects formed in step (4), and providing the panoramic safety content of a single window (single object) or multiple windows (multiple objects) on a screen;
(6) dynamically adjusting and manually intervening in the monitored content for the window content output in step (5): if the window content is not the content to be attended to, the user only needs to move the mouse to the object needing attention and click it; a projection window is then formed with the mouse mark point as the projection center, and subsequent monitoring is provided based on that window.
Specifically, the parking lot panoramic safety monitoring video dataset in step (1) adopts saliency detection datasets as the training set; in constructing the training set, the whole dataset is trained on first, and a vehicle-related dataset is then selected as fine-tuning data, so that the model focuses on salient regions such as vehicles, pedestrians and obstacles. The saliency detection datasets comprise DAVSOD, the largest video saliency detection dataset that best conforms to human visual characteristics, MSRA10K, the largest image-contrast saliency detection dataset that best conforms to human visual characteristics, and AVEDataset, the largest audio saliency detection dataset that best conforms to human visual characteristics.
Specifically, the motion-information saliency detection (VSOD) in step (2) judges the state of an object from the displacement relation of the object between adjacent frames of the surveillance video, i.e. the speed of the object relative to the background region, and the motion information between frames is obtained by computing the optical flow between two adjacent frames (I1, I2) with FlowNet2:

(u, v) = FlowNet2(I1, I2), u = Δx/Δt, v = Δy/Δt,

where Δx and Δy are the distances the pixel moves along the x-axis and the y-axis respectively, Δt is the time the pixel takes to move from I1 to I2, and x, y are the coordinates of the initial pixel point. After the optical flow is obtained, an optical-flow-aware saliency detection network MotionNet is designed, DAVSOD is adopted as the motion-information-aware training set, and the optical flow features are converted into motion saliency features:

S_motion = MotionNet(Flow2Color(u, v)),

where Flow2Color denotes converting the optical flow map into an RGB map, in which different colors represent different motion directions and shade represents the speed of motion; MotionNet takes this RGB map as input and S_motion is the output motion saliency detection result.
Specifically, the contrast-information saliency detection in step (2) highlights the object, based on characteristics such as its position, color and texture, through contrast with the background information and with other objects; MSRA10K is adopted as the contrast-information-aware training set, and the image-based saliency (ISOD) detection algorithm ContrastNet outputs the scale-contrast information of the perceived object through side outputs at different network layers:

S_contrast = ContrastNet(I_RGB),

where the side outputs f_color, f_texture, f_edge represent the color, texture and edge information respectively, f_ms represents the multi-scale information, I_RGB (an RGB image of MSRA10K) is taken as input, and S_contrast is the output contrast saliency detection result map.
Specifically, the sound-information saliency detection in step (2) uses sound as an aid to the visual information: AVEDataset is adopted as the sound-information-aware training set, the audio information is converted into a spectrogram by the Fourier transform, the spectrogram is input into the network, and whether the current input contains sound information is judged so as to determine the importance of the current frame, specifically:

S_sound = SoundNet(FFT(a)),

where FFT denotes the Fourier transform, the Fourier spectrogram of the audio a is taken as input, and S_sound is the output sound saliency detection result map.
Specifically, the object importance ranking in step (3) is realized by the following formulas:

S = Norm(S_motion + S_contrast + 2·S_sound),
Rank_k = (conf_k / N) · Σ_{i∈object k} S_i,

where Rank_k represents the rank of the k-th object, N is the number of pixels, Norm is the max-min normalization function of the matrix, conf_k is the object confidence, and the object confidence conf_k and the object index k are obtained by the object detector YOLOv5.
Specifically, the inter-frame consistency information in step (4) is that the window coordinates change within 5 pixels, coordinate changes beyond this range being regarded as content outside the motion range, and the formed motion trajectory is:

T_i = Smooth(T')_i = (1 / |Ω_c(p_i)|) · Σ_{q∈Ω_c(p_i)} T'_q,

where T' represents the initial trajectory of the points, T represents the smoothed point trajectory, Smooth represents mean smoothing of the trajectory, Ω_c(p_i) is the c = 5 neighborhood around point p_i, and q ranges over the pixel points in that neighborhood.
Specifically, in step (5) the points are converted into a plane by the following formula:

windows = Proj(T),

where windows represents the output window and Proj represents the perspective projection based on the point trajectory T.
Compared with the prior art, the invention has the following beneficial effects: a multi-modal salient-region recommendation algorithm is adopted, the attended region is enhanced while the resolution of non-attended regions is reduced, which accelerates transmission, saves storage space and reduces cost; the video saliency adopted formulates rules mainly from human attention behavior, and a processing priority ranking of the monitored content is formulated in combination with event importance, achieving real-time panoramic safety monitoring.
Drawings
Fig. 1 is a schematic diagram of the structural framework of the parking lot panoramic safety monitoring method based on multi-modal salient region recommendation provided by the invention.
Detailed Description
The invention will now be further described by way of examples in connection with the accompanying drawings without in any way limiting the scope of the invention.
Example:
In this embodiment, using the collected video data, a multi-modal salient-region recommendation algorithm based on motion information, contrast information and sound information detects the regions a human eye would attend to, following the human visual attention mechanism; a processing priority ranking of the monitored content is formulated by combining event-importance weights, and resource usage is reduced by lowering the resolution of non-attended regions, thereby achieving real-time panoramic safety monitoring. Panoramic parking-lot safety monitoring is formulated based on the safety-detection weights and the inter-frame continuity of the monitoring, and specifically comprises the following steps:
(1) construction of a parking lot panoramic surveillance video dataset
First, panoramic camera video segments are collected to construct a parking lot panoramic safety monitoring video dataset for multi-modal salient-region recommendation. Since such equipment has not yet been deployed at scale in most parking lots, and little attention has been paid to multi-modal parking-lot information, no dedicated dataset currently exists. This embodiment therefore adopts saliency detection datasets as the training set (DAVSOD, the largest video saliency detection dataset that best conforms to human visual characteristics; MSRA10K, the largest image-contrast saliency detection dataset that best conforms to human visual characteristics; and AVEDataset, the largest audio saliency detection dataset that best conforms to human visual characteristics). When constructing the training set, the model is first pre-trained on the complete datasets, and vehicle-related subsets are then selected as fine-tuning data, so that the model focuses on salient regions such as vehicles, pedestrians and obstacles;
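By way of illustration only, the pre-train-then-fine-tune schedule of step (1) can be sketched as follows; the patent does not specify a network or hyper-parameters, so a tiny convolutional model and random tensors stand in for the real saliency network and the DAVSOD/MSRA10K/AVEDataset loaders.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in saliency network; the patent's actual model is not reproduced here.
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.Conv2d(8, 1, 3, padding=1))

def make_loader(n):  # dummy (frame, mask) pairs standing in for a real dataset
    x = torch.rand(n, 3, 64, 64)
    y = (torch.rand(n, 1, 64, 64) > 0.5).float()
    return DataLoader(TensorDataset(x, y), batch_size=8, shuffle=True)

def train(loader, epochs, lr):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(epochs):
        for frames, masks in loader:
            opt.zero_grad()
            loss_fn(model(frames), masks).backward()
            opt.step()

train(make_loader(256), epochs=2, lr=1e-4)  # pre-train on the complete dataset
train(make_loader(64),  epochs=1, lr=1e-5)  # fine-tune on the vehicle-related subset
```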
(2) motion information based saliency detection
Human eyes are highly drawn to moving objects, because motion is often accompanied by the occurrence of events. By the same mechanism, moving objects in a parking lot are very important, such as a travelling vehicle, the driver driving it, or an animal running around; if a camera can capture them in advance, they can be dealt with promptly (clearing obstacles, driving animals away, and so on), greatly reducing the risk of traffic accidents. The motion-information saliency detection (VSOD) judges the state of an object from the displacement relation of the object between adjacent frames of the surveillance video, i.e. the speed of the object relative to the background region. To obtain the motion information between frames, this embodiment computes the optical flow between two adjacent frames (I1, I2) with FlowNet2:

(u, v) = FlowNet2(I1, I2), u = Δx/Δt, v = Δy/Δt,

where Δx and Δy are the distances the pixel moves along the x-axis and the y-axis respectively, Δt is the time the pixel takes to move from I1 to I2, and x, y are the coordinates of the initial pixel point. After the optical flow is obtained, the optical flow features are converted into motion saliency features: an optical-flow-aware saliency detection network MotionNet is designed, DAVSOD is adopted as the motion-information-aware training set, the optical flow map is converted into an RGB map as the network input, and the output saliency detection picture is:

S_motion = MotionNet(Flow2Color(u, v)),

where Flow2Color denotes converting the optical flow map into an RGB map, in which different colors represent different motion directions and shade represents the speed of motion, and MotionNet takes this RGB map as input;
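As a hedged illustration of the optical-flow front end of step (2): FlowNet2 and the saliency network are not reproduced here, so the sketch below uses OpenCV's Farneback optical flow as a stand-in flow estimator and the standard HSV flow visualization as Flow2Color (hue encodes direction, brightness encodes speed), producing the RGB map that would be fed to the saliency network.

```python
import cv2
import numpy as np

def flow2color(flow):
    """Standard HSV flow visualization: hue encodes direction, value encodes speed."""
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    hsv = np.zeros((*flow.shape[:2], 3), dtype=np.uint8)
    hsv[..., 0] = ang * 90 / np.pi                                   # direction -> hue
    hsv[..., 1] = 255
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # speed -> shade
    return cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

# Two adjacent grayscale frames I1, I2; random data stands in for real video.
i1 = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
i2 = np.random.randint(0, 255, (480, 640), dtype=np.uint8)
flow = cv2.calcOpticalFlowFarneback(i1, i2, None, 0.5, 3, 15, 3, 5, 1.2, 0)
rgb = flow2color(flow)  # this RGB map would be the saliency network's input
```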
(3) saliency detection based on contrast information
Besides moving objects, the camera is normally in a still shooting state, yet many situations still require attention, for example a marker appearing in the middle of a road, or a vehicle object going missing; in these situations the object is often stationary, so moving-object detection cannot obtain a good result. Therefore, to obtain the contrast information of the video, objects are highlighted through their contrast with the background information and with other objects: an image-RGB-based saliency (ISOD) detection algorithm, ContrastNet, is used, with MSRA10K as the contrast-information-aware training data, and it outputs the scale-contrast information of the perceived object through side outputs at different network layers, the shallow side outputs attending more to color, texture and edge contrast and the deep side features attending to high-level semantic contrast. The output contrast saliency detection result map is:

S_contrast = ContrastNet(I_RGB),

where the side outputs f_color, f_texture, f_edge denote the color, texture and edge information respectively, f_ms denotes the multi-scale information, and I_RGB (an RGB image) is taken as input;
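ContrastNet itself is not given in the patent, so as a minimal classical stand-in for contrast-based saliency the sketch below scores each pixel by the distance of its smoothed color from the mean image color (in the spirit of frequency-tuned saliency); a learned network would replace this in practice, and the image path is a placeholder.

```python
import cv2
import numpy as np

def color_contrast_saliency(bgr):
    """Frequency-tuned style contrast saliency: each pixel is scored by how far
    its smoothed Lab color lies from the mean image color (a background proxy)."""
    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)
    mean = lab.reshape(-1, 3).mean(axis=0)       # mean image color
    blur = cv2.GaussianBlur(lab, (5, 5), 0)      # suppress high-frequency noise
    sal = np.linalg.norm(blur - mean, axis=2)    # contrast with the global mean
    return cv2.normalize(sal, None, 0, 1, cv2.NORM_MINMAX)

img = cv2.imread("frame.jpg")                    # hypothetical surveillance frame
if img is not None:
    s_contrast = color_contrast_saliency(img)
```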
(4) saliency detection based on sound information
The occurrence of sound is often accompanied by the occurrence of events, and humans take a strong interest in sounding objects; in most cases traffic accidents are accompanied by sound, such as a car crash, a vehicle alarm or a vehicle horn, so sound has strong research value for detection in parking lots. Thanks to the binaural effect, humans can locate a source by hearing, mainly because the ear can judge position from the distance between the sounding object and the ear. This embodiment therefore uses sound as an aid to the visual information: AVEDataset is adopted as the sound-information-aware training set, the audio information is converted into a spectrogram by the Fourier transform, the spectrogram is input into the network, and whether the current input contains sound information is judged so as to determine the importance of the current frame. The output sound saliency detection result map is:

S_sound = SoundNet(FFT(a)),

where FFT denotes the Fourier transform and the Fourier spectrogram of the audio a is taken as input;
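A minimal sketch of the audio front end of step (4): the audio buffer is converted into a log-magnitude spectrogram with a short-time Fourier transform. The network that consumes the spectrogram is not reproduced, and the synthetic buffer stands in for the surveillance microphone feed.

```python
import numpy as np
from scipy.signal import stft

# Synthetic 1-second audio buffer standing in for the surveillance microphone.
fs = 16000
audio = np.random.randn(fs).astype(np.float32)

# Short-time Fourier transform -> log-magnitude spectrogram, the network input
# described in step (4).
f, t, z = stft(audio, fs=fs, nperseg=512, noverlap=256)
spectrogram = np.log1p(np.abs(z))  # shape: (freq_bins, time_frames)
```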
(5) content importance ranking based on the panoramic surveillance video
The corresponding important objects can be found through the motion, contrast and sound information, but in some cases several important objects may exist at once, and the importance of these objects then needs to be ranked, because the important objects determined differ depending on which information dominates. The object confidence conf_k and the object index k are obtained by object detection (YOLOv5); the saliency value of the pixels is taken as the degree of likelihood that the current object is salient, i.e. the importance of the current object; and since sound essentially accompanies the occurrence of an event, events determined by sound are given twice the weight. The obtained object ranking is:

S = Norm(S_motion + S_contrast + 2·S_sound),
Rank_k = (conf_k / N) · Σ_{i∈object k} S_i,

where Rank_k represents the rank of the k-th object, N is the number of pixels and Norm is the max-min normalization function of the matrix;
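One plausible reading of the ranking formulas in step (5), sketched below: the three saliency maps are fused with the sound map double-weighted, max-min normalized, and each detected object is scored by its mean saliency times its detector confidence. The masks and confidences stand in for YOLOv5 outputs; the per-object mean is an assumption about how N is counted.

```python
import numpy as np

def norm(x):  # max-min normalization of a matrix, "Norm" in the text
    return (x - x.min()) / (x.max() - x.min() + 1e-8)

def rank_objects(s_motion, s_contrast, s_sound, detections):
    """detections: list of (mask, confidence) pairs, as a YOLOv5-style detector
    might provide (boxes rasterized to boolean masks)."""
    s = norm(s_motion + s_contrast + 2.0 * s_sound)  # sound double-weighted
    scores = {}
    for k, (mask, conf) in enumerate(detections):
        n = mask.sum()
        scores[k] = conf * s[mask].sum() / max(n, 1)  # mean saliency x confidence
    return sorted(scores, key=scores.get, reverse=True)

# Toy inputs standing in for real saliency maps and detections.
h, w = 64, 64
maps = [np.random.rand(h, w) for _ in range(3)]
m1 = np.zeros((h, w), bool); m1[10:30, 10:30] = True
m2 = np.zeros((h, w), bool); m2[40:60, 40:60] = True
order = rank_objects(*maps, [(m1, 0.9), (m2, 0.7)])  # object indices, best first
```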
(6) smoothing based on panoramic surveillance video inter-frame consistency
After the objects are ranked, the same objects in different frames need to be connected to form a smooth window motion trajectory. To prevent the window position from jumping severely as the object moves, the continuous sliding between windows is modeled as inter-frame consistency: the window coordinates may change within a certain range (5 pixels), and coordinate changes beyond this range are regarded as content outside the motion range. The obtained point trajectory is:

T_i = Smooth(T')_i = (1 / |Ω_c(p_i)|) · Σ_{q∈Ω_c(p_i)} T'_q,

where T' represents the initial trajectory of the points, T represents the smoothed point trajectory, Smooth represents mean smoothing of the trajectory, Ω_c(p_i) is the c = 5 neighborhood around point p_i, and q ranges over the pixel points in that neighborhood;
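A minimal sketch of the mean smoothing of step (6), assuming Smooth averages each trajectory point over a c = 5 neighborhood of surrounding samples; the jittery window centers below are synthetic.

```python
import numpy as np

def smooth_trajectory(points, c=5):
    """Mean-smooth a window-center trajectory over a c-sample neighborhood,
    one plausible reading of the Smooth operator in step (6)."""
    pts = np.asarray(points, dtype=np.float32)  # shape (frames, 2): x, y
    out = np.empty_like(pts)
    half = c // 2
    for i in range(len(pts)):
        lo, hi = max(0, i - half), min(len(pts), i + half + 1)
        out[i] = pts[lo:hi].mean(axis=0)        # average over the neighborhood
    return out

raw = [(100, 80), (103, 82), (120, 90), (104, 83), (106, 85)]  # jittery centers
smoothed = smooth_trajectory(raw)
```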
(7) panoramic monitoring content output based on salient region recommendation
After the motion trajectories of the different objects have been formed, the panoramic security content of a single window (single object) or multiple windows (multiple objects) needs to be provided on a screen, so the points must be converted into a plane for display, specifically:

windows = Proj(T),

where windows represents the output window and Proj represents the perspective projection based on the point trajectory T;
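The exact point-to-plane projection is not specified in the patent; the sketch below renders a perspective window from an equirectangular panorama centered on a given view direction, which is one standard way to realize the perspective projection Proj of step (7). The panorama is random data and the view angles are arbitrary.

```python
import numpy as np

def perspective_window(pano, yaw, pitch, fov=np.pi / 3, size=256):
    """Render a perspective window from an equirectangular panorama, centered on
    the view direction (yaw, pitch) in radians; a sketch of the point-to-plane
    projection, not the patent's exact formula."""
    H, W = pano.shape[:2]
    f = (size / 2) / np.tan(fov / 2)                   # pinhole focal length
    u, v = np.meshgrid(np.arange(size) - size / 2, np.arange(size) - size / 2)
    d = np.stack([u, v, np.full_like(u, f, dtype=float)], axis=-1)
    d /= np.linalg.norm(d, axis=-1, keepdims=True)     # unit ray directions
    # Rotate rays by pitch (about x) then yaw (about y).
    cp, sp, cy, sy = np.cos(pitch), np.sin(pitch), np.cos(yaw), np.sin(yaw)
    rx = np.array([[1, 0, 0], [0, cp, -sp], [0, sp, cp]])
    ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    d = d @ (ry @ rx).T
    lon = np.arctan2(d[..., 0], d[..., 2])             # longitude of each ray
    lat = np.arcsin(np.clip(d[..., 1], -1, 1))         # latitude of each ray
    px = ((lon / (2 * np.pi) + 0.5) * (W - 1)).astype(int)
    py = ((lat / np.pi + 0.5) * (H - 1)).astype(int)
    return pano[py, px]                                # sampled planar window

pano = np.random.randint(0, 255, (512, 1024, 3), dtype=np.uint8)  # dummy panorama
win = perspective_window(pano, yaw=0.5, pitch=0.1)
```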
(8) dynamic adjustment and manual intervention in the monitored content
The window obtained in step (7) is the window recommended by the program, and in most cases directly playing it matches what the human eye would attend to. However, there remain cases in which it may not be the content the security personnel really care about, for example when they want to patrol the garage once to check whether it is safe, even though nothing is happening at that moment. If the security personnel need to take over the right to determine the window viewing angle, they only need to move the mouse to the object of interest and click it; a projection window is then formed with the mouse mark point as the projection center, and subsequent monitoring is provided based on that window.
Algorithms and image-processing procedures not described in detail herein are common knowledge or prior art in the field.
It is noted that the present embodiment is intended to aid in further understanding of the present invention, but those skilled in the art will understand that: various substitutions and modifications are possible without departing from the spirit and scope of the invention and appended claims. Therefore, the invention should not be limited to the embodiments disclosed, but the scope of the invention is defined by the appended claims.

Claims (8)

1. A parking lot panoramic safety monitoring method, characterized by comprising the following steps:
(1) collecting panoramic camera video segments, and constructing a parking lot panoramic safety monitoring video dataset from the collected segments;
(2) performing saliency detection on the parking lot panoramic surveillance video dataset based on the motion information, the contrast information and the sound information respectively;
(3) finding the corresponding important objects from the saliency detection results of the motion, contrast and sound information, and ranking the importance of the found objects;
(4) performing inter-frame consistency smoothing of the panoramic surveillance video on the ranked objects, connecting the same objects across frames to form a smooth window motion trajectory;
(5) converting points into planes according to the motion trajectories of the different objects formed in step (4), and providing single-window (single-object) or multi-window (multi-object) panoramic safety content on a screen;
(6) dynamically adjusting and manually intervening in the monitored content for the window content output in step (5): if the window content is not the content to be attended to, the user only needs to move the mouse to the object needing attention and click it; a projection window is then formed with the mouse mark point as the center, and subsequent monitoring is provided based on that window.
2. The parking lot panoramic safety monitoring method according to claim 1, wherein the parking lot panoramic safety monitoring video dataset in step (1) adopts saliency detection datasets as the training set; in constructing the training set, the whole dataset is trained on first, and a vehicle-related dataset is then selected as fine-tuning data, so that the model focuses on the salient regions of vehicles, pedestrians and obstacles; the saliency detection datasets comprise DAVSOD, the largest video saliency detection dataset that best conforms to human visual characteristics, MSRA10K, the largest image-contrast saliency detection dataset that best conforms to human visual characteristics, and AVEDataset, the largest audio saliency detection dataset that best conforms to human visual characteristics.
3. The parking lot panoramic safety monitoring method according to claim 2, wherein the motion-information saliency detection in step (2) judges the state of an object from the displacement relation of the object between adjacent frames of the surveillance video, i.e. the speed of the object relative to the background region, and the motion information between frames is obtained by computing the optical flow between two adjacent frames (I1, I2) with FlowNet2:

(u, v) = FlowNet2(I1, I2), u = Δx/Δt, v = Δy/Δt,

where Δx and Δy are the distances the pixel moves along the x-axis and the y-axis respectively, Δt is the time the pixel takes to move from I1 to I2, and x, y are the coordinates of the initial pixel point; after the optical flow is obtained, an optical-flow-aware saliency detection network MotionNet is designed, DAVSOD is adopted as the motion-information-aware training set, and the optical flow features are converted into motion saliency features:

S_motion = MotionNet(Flow2Color(u, v)),

where Flow2Color denotes converting the optical flow map into an RGB map, in which different colors represent different motion directions and shade represents the speed of motion; MotionNet takes the RGB map as input and S_motion is the output motion saliency detection result.
4. The parking lot panoramic safety monitoring method according to claim 3, wherein the contrast-information saliency detection in step (2) highlights the object, based on its position, color and texture characteristics, through contrast with the background information and with other objects; MSRA10K is used as the contrast-information-aware training set, and the image-based saliency detection algorithm ContrastNet outputs the scale-contrast information of the perceived object through side outputs at different network layers:

S_contrast = ContrastNet(I_RGB),

where the side outputs f_color, f_texture, f_edge represent the color, texture and edge information respectively, f_ms represents the multi-scale information, I_RGB (an RGB image of MSRA10K) is taken as input, and S_contrast is the output contrast saliency detection result map.
5. The parking lot panoramic safety monitoring method according to claim 4, wherein the sound-information saliency detection in step (2) uses sound as an aid to the visual information: AVEDataset is adopted as the sound-information-aware training set, the audio information is converted into a spectrogram by the Fourier transform, the spectrogram is input into the network, and whether the current input contains sound information is judged so as to determine the importance of the current frame, specifically:

S_sound = SoundNet(FFT(a)),

where FFT denotes the Fourier transform, the Fourier spectrogram of the audio a is taken as input, and S_sound is the output sound saliency detection result map.
6. The parking lot panoramic safety monitoring method according to claim 5, wherein the object importance ranking in step (3) is realized by the following formulas:

S = Norm(S_motion + S_contrast + 2·S_sound),
Rank_k = (conf_k / N) · Σ_{i∈object k} S_i,

where Rank_k represents the rank of the k-th object, N is the number of pixels, Norm is the max-min normalization function of the matrix, conf_k is the object confidence, and the object confidence conf_k and the object index k are obtained by the object detector YOLOv5.
7. The parking lot panoramic safety monitoring method according to claim 6, wherein the inter-frame consistency information in step (4) is that the window coordinates change within 5 pixels, coordinate changes beyond this range being regarded as content outside the motion range, and the formed motion trajectory is:

T_i = Smooth(T')_i = (1 / |Ω_c(p_i)|) · Σ_{q∈Ω_c(p_i)} T'_q,

where T' represents the initial trajectory of the points, T represents the smoothed point trajectory, Smooth represents mean smoothing of the trajectory, Ω_c(p_i) is the c = 5 neighborhood around point p_i, and q ranges over the pixel points in that neighborhood.
8. The parking lot panoramic safety monitoring method according to claim 7, wherein in step (5) the points are converted into a plane by the following formula:

windows = Proj(T),

where windows represents the output window and Proj represents the perspective projection based on the point trajectory T.
CN202210535421.9A 2022-05-18 2022-05-18 Panoramic safety monitoring method for parking lot Active CN114639171B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210535421.9A CN114639171B (en) 2022-05-18 2022-05-18 Panoramic safety monitoring method for parking lot

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210535421.9A CN114639171B (en) 2022-05-18 2022-05-18 Panoramic safety monitoring method for parking lot

Publications (2)

Publication Number Publication Date
CN114639171A true CN114639171A (en) 2022-06-17
CN114639171B (en) 2022-07-29

Family

ID=81952844

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210535421.9A Active CN114639171B (en) 2022-05-18 2022-05-18 Panoramic safety monitoring method for parking lot

Country Status (1)

Country Link
CN (1) CN114639171B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115410189A (en) * 2022-10-31 2022-11-29 松立控股集团股份有限公司 Complex scene license plate detection method
CN116228834A (en) * 2022-12-20 2023-06-06 阿波罗智联(北京)科技有限公司 Image depth acquisition method and device, electronic equipment and storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105303571A (en) * 2015-10-23 2016-02-03 苏州大学 Time-space saliency detection method for video processing
US20160086052A1 (en) * 2014-09-19 2016-03-24 Brain Corporation Apparatus and methods for saliency detection based on color occurrence analysis
CN106878674A (en) * 2017-01-10 2017-06-20 哈尔滨工业大学深圳研究生院 A kind of parking detection method and device based on monitor video
CN106951870A (en) * 2017-02-15 2017-07-14 重庆警察学院 The notable event intelligent detecting prewarning method of monitor video that active vision notes
CN110827193A (en) * 2019-10-21 2020-02-21 国家广播电视总局广播电视规划院 Panoramic video saliency detection method based on multi-channel features
CN113128344A (en) * 2021-03-19 2021-07-16 杭州电子科技大学 Multi-information fusion stereo video saliency detection method
US20210398294A1 (en) * 2019-05-27 2021-12-23 Tencent Technology (Shenzhen) Company Limited Video target tracking method and apparatus, computer device, and storage medium
CN113920468A (en) * 2021-12-13 2022-01-11 松立控股集团股份有限公司 Multi-branch pedestrian detection method based on cross-scale feature enhancement
CN114299744A (en) * 2021-12-22 2022-04-08 广东艾科智泊科技股份有限公司 Panoramic parking lot system and management scheme

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160086052A1 (en) * 2014-09-19 2016-03-24 Brain Corporation Apparatus and methods for saliency detection based on color occurrence analysis
CN105303571A (en) * 2015-10-23 2016-02-03 苏州大学 Time-space saliency detection method for video processing
CN106878674A (en) * 2017-01-10 2017-06-20 哈尔滨工业大学深圳研究生院 A kind of parking detection method and device based on monitor video
CN106951870A (en) * 2017-02-15 2017-07-14 重庆警察学院 The notable event intelligent detecting prewarning method of monitor video that active vision notes
US20210398294A1 (en) * 2019-05-27 2021-12-23 Tencent Technology (Shenzhen) Company Limited Video target tracking method and apparatus, computer device, and storage medium
CN110827193A (en) * 2019-10-21 2020-02-21 国家广播电视总局广播电视规划院 Panoramic video saliency detection method based on multi-channel features
CN113128344A (en) * 2021-03-19 2021-07-16 杭州电子科技大学 Multi-information fusion stereo video saliency detection method
CN113920468A (en) * 2021-12-13 2022-01-11 松立控股集团股份有限公司 Multi-branch pedestrian detection method based on cross-scale feature enhancement
CN114299744A (en) * 2021-12-22 2022-04-08 广东艾科智泊科技股份有限公司 Panoramic parking lot system and management scheme

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
XIAOQI ZHAO ET AL.: "Multi-Source Fusion and Automatic Predictor Selection for Zero-Shot Video Object Segmentation", Proceedings of the 29th ACM International Conference on Multimedia *
LU JING ET AL.: "Research on a 3D visual saliency algorithm fusing motion information", Computer Engineering *
YANG TAOTAO: "Design of a video stabilization algorithm based on panoramic stitching", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115410189A (en) * 2022-10-31 2022-11-29 松立控股集团股份有限公司 Complex scene license plate detection method
CN116228834A (en) * 2022-12-20 2023-06-06 阿波罗智联(北京)科技有限公司 Image depth acquisition method and device, electronic equipment and storage medium
CN116228834B (en) * 2022-12-20 2023-11-03 阿波罗智联(北京)科技有限公司 Image depth acquisition method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114639171B (en) 2022-07-29

Similar Documents

Publication Publication Date Title
CN114639171B (en) Panoramic safety monitoring method for parking lot
KR102129893B1 (en) Ship tracking method and system based on deep learning network and average movement
US8582816B2 (en) Method and apparatus for video analytics based object counting
CN101916383B (en) Vehicle detecting, tracking and identifying system based on multi-camera
CN101950426B (en) Vehicle relay tracking method in multi-camera scene
JP4919036B2 (en) Moving object recognition device
CN107133559B (en) Mobile object detection method based on 360 degree of panoramas
EP3970060A1 (en) Driver attention detection using heat maps
CN111860274B (en) Traffic police command gesture recognition method based on head orientation and upper half skeleton characteristics
KR102253989B1 (en) object tracking method for CCTV video by use of Deep Learning object detector
KR20080085837A Object density estimation in video
JP2009271758A (en) Image-recognizing device
JP5106302B2 (en) Moving object tracking device
US11161456B1 (en) Using the image from a rear view camera in a three-camera electronic mirror system to provide early detection of on-coming cyclists in a bike lane
JP2005243019A (en) Method for detecting traffic event in compressed video
Zhang et al. Surveillance video anomaly detection via non-local U-Net frame prediction
JP2013008070A (en) Sign recognition device and sign recognition method
CN112084928A (en) Road traffic accident detection method based on visual attention mechanism and ConvLSTM network
CN112380905A (en) Abnormal behavior detection method based on histogram and entropy of surveillance video
JP2013093865A (en) Vehicle periphery monitoring device
Hara et al. Predicting appearance of vehicles from blind spots based on pedestrian behaviors at crossroads
Miller et al. Intelligent Sensor Information System For Public Transport–To Safely Go…
WO2021181861A1 (en) Map data generation device
CN117998039A (en) Video data processing method, device, equipment and storage medium
Hara et al. Predicting vehicles appearing from blind spots based on pedestrian behaviors

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant