CN111881853A - Method and device for identifying abnormal behaviors in oversized bridge and tunnel - Google Patents

Method and device for identifying abnormal behaviors in oversized bridge and tunnel

Info

Publication number
CN111881853A
CN111881853A (application CN202010755106.8A)
Authority
CN
China
Prior art keywords
image
video stream
tunnel
bridge
visible light
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010755106.8A
Other languages
Chinese (zh)
Other versions
CN111881853B (en
Inventor
陈平
王彦林
刘宾
闫禹
胡敏涛
赵鑫
苏新彦
聂鹏飞
王鉴
刘嘉宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North University of China
Original Assignee
North University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North University of China filed Critical North University of China
Priority to CN202010755106.8A priority Critical patent/CN111881853B/en
Publication of CN111881853A publication Critical patent/CN111881853A/en
Application granted granted Critical
Publication of CN111881853B publication Critical patent/CN111881853B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/12Details of acquisition arrangements; Constructional details thereof
    • G06V10/14Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/143Sensing or illuminating at different wavelengths

Abstract

The application provides a method for identifying abnormal behavior in an oversized bridge and tunnel, comprising the following steps: acquiring video stream images collected by a camera device, wherein the video stream images comprise an infrared image and a visible light image captured simultaneously by the camera; fusing the simultaneously captured infrared and visible light images in the video stream; and determining, by an inter-frame background difference method, whether a moving target exists in the fused video stream images. If so, it is preliminarily determined that abnormal behavior exists in the oversized bridge and tunnel; otherwise, it is determined that no abnormal behavior exists. The method can improve the efficiency and accuracy of identifying abnormal behaviors in the oversized bridge and tunnel.

Description

Method and device for identifying abnormal behaviors in oversized bridge and tunnel
Technical Field
The invention relates to the technical field of image processing, in particular to a method and a device for identifying abnormal behaviors in an oversized bridge and tunnel.
Background
A bridge tunnel, especially an oversized bridge tunnel, is a special road-section structure characterized by accidents that occur easily, cause great harm, and are difficult to handle. Identifying abnormal behavior inside the tunnel is therefore particularly important, bearing directly on safe and efficient operation once the bridge and tunnel are opened.
Oversized bridges and tunnels and their sensitive, fragile areas mostly lie in mountain, river, or ocean environments; their structures are mostly long and narrow; and typical weather brings insufficient illumination from heavy rain, fog, overcast skies, and the like. In actual detection, changing camera angles, shadows, occlusion, and motion segmentation against complex backgrounds often cause human motion detection to fail, so anomalies cannot be accurately identified.
Disclosure of Invention
In view of this, the application provides a method and a device for identifying abnormal behaviors in a super-large bridge and tunnel, which can improve the efficiency and accuracy of identifying abnormal behaviors in the super-large bridge and tunnel.
In order to solve the technical problem, the technical scheme of the application is realized as follows:
in one embodiment, a method for identifying abnormal behavior in an oversized bridge and tunnel is provided, wherein a camera device is deployed in the oversized bridge and tunnel and can capture images of the positions to be monitored; the method comprises the following steps:
acquiring a video stream image acquired by the camera device; wherein the video stream images comprise an infrared image and a visible light image captured simultaneously by the camera;
fusing the infrared image and the visible light image which are shot simultaneously in the video stream;
determining whether a moving target exists in the fused video stream image by an inter-frame background difference method; if so, preliminarily determining that abnormal behavior exists in the oversized bridge and tunnel; otherwise, determining that no abnormal behavior exists in the oversized bridge and tunnel.
In another embodiment, a device for identifying abnormal behavior in an oversized bridge and tunnel is provided, wherein a camera device is deployed in the oversized bridge and tunnel and can capture images of the positions to be monitored; the device comprises: an acquisition unit, a fusion unit and a determination unit;
the acquisition unit is used for acquiring a video stream image collected by the camera device; wherein the video stream images comprise an infrared image and a visible light image captured simultaneously by the camera;
the fusion unit is used for fusing the infrared image and the visible light image which are shot simultaneously in the video stream acquired by the acquisition unit;
the determining unit is used for determining, by an inter-frame background difference method, whether a moving target exists in the video stream image fused by the fusion unit; if so, preliminarily determining that abnormal behavior exists in the oversized bridge and tunnel; otherwise, determining that no abnormal behavior exists in the oversized bridge and tunnel.
In another embodiment, an electronic device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor executes the program, it implements the steps of the method for identifying abnormal behavior in the oversized bridge and tunnel.
In another embodiment, a computer-readable storage medium is provided, on which a computer program is stored; when executed by a processor, the program carries out the steps of the method for identifying abnormal behavior in the oversized bridge and tunnel.
According to the technical scheme, video stream images collected by the camera device are acquired, the visible light image and the infrared image in the video stream are fused, and the inter-frame background difference between fused consecutive frames is calculated to determine whether a moving target exists in the video stream, and thus whether abnormal behavior exists in the oversized bridge and tunnel. The scheme can improve the efficiency and accuracy of identifying abnormal behaviors in the oversized bridge and tunnel.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
FIG. 1 is a schematic diagram of an abnormal behavior recognition system in a grand bridge tunnel according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a deployed image capture device in an embodiment of the present application;
fig. 3 is a schematic view illustrating a process of identifying abnormal behavior in a tunnel with a super-large bridge according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a process for fusing an infrared image and a visible light image in an embodiment of the present application;
fig. 5 is a schematic view illustrating a process of identifying abnormal behavior in a tunnel with a super-large bridge in the second embodiment of the present application;
FIG. 6 is a schematic diagram of an object marked by a preset object detection model in the embodiment of the present application;
FIG. 7 is a schematic diagram of a human skeleton for marking a moving object in an embodiment of the present application;
FIG. 8 is a schematic diagram of an apparatus for implementing the above technique in an embodiment of the present application;
fig. 9 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are, for example, capable of operation in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
The technical solution of the present invention will be described in detail with specific examples. Several of the following embodiments may be combined with each other and some details of the same or similar concepts or processes may not be repeated in some embodiments.
The embodiment of the application provides an abnormal behavior identification method in a super large bridge and tunnel, which is applied to an abnormal behavior identification system in the super large bridge and tunnel. Referring to fig. 1, fig. 1 is a schematic diagram of an abnormal behavior recognition system in a grand bridge tunnel according to an embodiment of the present application. The system comprises: an imaging device and a recognition device.
The camera device is arranged at the top of the oversized bridge and tunnel, and the camera device can shoot images of positions needing to be monitored of the oversized bridge and tunnel.
During specific implementation, the camera device is composed of a plurality of cameras, and the number of the cameras is determined by a deployment mode and the length of the oversized bridge tunnel.
In order to shoot video stream images of all positions needing to be shot, the cameras can be diagonally deployed in pairs according to the shooting range of the cameras to prolong the monitoring range in specific implementation.
In a specific implementation, the shooting ranges of two diagonally deployed cameras may be the same or different. The diagonal here refers to the two opposite corners of a rectangle: the cameras are deployed at the rectangle's diagonal vertices.
The embodiment of the application takes cameras with the same shooting range as an example: the length of the oversized bridge tunnel is 3L, and the total shooting range of two diagonally deployed cameras projects onto the ground with length L, so a camera device consisting of 6 cameras captures the video stream images.
Referring to fig. 2, fig. 2 is a schematic diagram of an image capturing apparatus disposed in an embodiment of the present application. The 6 deployed cameras are cameras A, B, C, D, E and F. Cameras A and B, cameras C and D, and cameras E and F each cover a ground length of L. As fig. 2 shows, diagonal deployment greatly extends the shooting range while reducing the number of cameras needed.
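The deployment arithmetic above can be sketched as a small calculation; the function name and the ceiling-based rounding for non-integer multiples are illustrative assumptions, not part of the patent.

```python
import math

def cameras_needed(tunnel_length: float, pair_span: float) -> int:
    """Cameras required when diagonally deployed pairs each cover a
    ground-projected length of `pair_span` along the tunnel."""
    pairs = math.ceil(tunnel_length / pair_span)  # one diagonal pair per span
    return 2 * pairs                              # two cameras per pair

# A tunnel of length 3L with a pair span of L needs 3 pairs, i.e. 6 cameras (A..F).
```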
That is to say, the camera device realizes the imaging detection of a large visual field range area through the combination of a plurality of small visual field sub-aperture imaging devices. The influence of interferents in an imaging scene is removed through a ray screening and refocusing reconstruction method, so that the reliable detection of the target is realized, the probability of the target identification in the region is increased, and the coverage range and the coverage distance of the detection region are ensured.
In consideration of the fact that the ultra-large bridge and tunnel are mostly located in environments such as mountains, rivers and oceans, insufficient light irradiation in rain, fog and cloudy days and the like are typical weather conditions, in order to guarantee the effectiveness of the recognition system under the above severe conditions, the camera device in the embodiment of the application can shoot the infrared image and the visible light image simultaneously.
In specific implementation, one camera can shoot an infrared image and a visible light image simultaneously, and two cameras can be arranged at the same position, wherein one camera is used for shooting the visible light image, and the other camera is used for shooting the infrared image.
Under normal conditions, no moving objects such as vehicles or personnel enter the monitored area in the embodiment of the application.
The identification device may be a PC or a server, such as a GPU computing server; it acquires the video stream images captured by the camera device, processes them, and identifies whether abnormal behavior exists in the monitored area.
In the embodiment of the application, the video stream image shot by the camera device can be directly transmitted by the camera device or forwarded by the network video service device;
the network video service device may be one or more PCs or servers; upon acquiring the video stream images transmitted by the camera device, it displays them (i.e., performs on-site monitoring) and forwards them to the identification device.
The following describes in detail a process of implementing abnormal behavior recognition in a super-large bridge tunnel in the embodiment of the present application with reference to the accompanying drawings.
Example one
Identifying abnormal behavior in an unmanned area of the oversized bridge and tunnel.
Referring to fig. 3, fig. 3 is a schematic view illustrating a process of identifying abnormal behavior in a tunnel with a super-large bridge according to an embodiment of the present application. The method comprises the following specific steps:
step 301, acquiring a video stream image acquired by the camera device; wherein the video stream images include an infrared image and a visible light image captured simultaneously by the camera.
The acquired video stream image collected by the camera device can be directly transmitted by the camera device or transmitted by a network video service device.
In the embodiment of the application, during specific implementation, the image enhancement can be performed on the collected video stream image, and then the image fusion is performed.
Step 302, performing fusion processing on the infrared image and the visible light image which are shot simultaneously in the video stream.
The infrared image can be a 14-bit raw infrared image collected by a camera with an infrared focal plane array (IRFPA) detector; the visible light image can be a raw visible light image in YUV format collected by a camera with a visible light CCD sensor.
In the embodiment of the present application, a specific implementation manner of fusing an infrared image and a visible light image is not limited, and a specific fusion manner is given below:
referring to fig. 4, fig. 4 is a schematic view of a fusion process of an infrared image and a visible light image in the embodiment of the present application. The method comprises the following specific steps:
step 401, performing two-point correction, blind pixel compensation and median filtering on the infrared image.
Step 402, carrying out image processing on the median-filtered infrared image by a histogram equalization method.
Histogram equalization improves the contrast of the infrared image.
And step 403, registering the infrared image and the visible light image processed by the histogram equalization method.
Step 404, decomposing the infrared image into a high-level image, a middle-level image and a bottom-level image by a Laplacian pyramid decomposition method.
Step 405, extracting image details of the visible light image.
In specific implementation, details of visible light can be directly extracted, or a visible light image can be decomposed into a base layer and a detail layer through a low-pass filter, and the detail layer is used as image details of the visible light.
And 406, fusing the extracted image details of the visible light image with the bottom layer image of the infrared image.
The visible-light/infrared image fusion algorithm of the embodiment of the application combines multi-scale decomposition with salient-region extraction: image detail extraction obtains the salient details of the visible light image, multi-scale decomposition is applied to the infrared image, and the bottom-layer image and the visible-light detail image are fused by weighted reconstruction, finally achieving effective saliency detection and multi-scale visible/infrared image fusion.
And step 407, performing image reconstruction by using the fused image and the middle-layer image and the high-layer image of the infrared image.
By exploiting the complementary advantages of the sensors, the method combines the scene's texture detail features from the visible light image with the target features of the infrared image (bottom-layer image); the reconstructed image inherits both, so color and detail information in the scene is preserved while heat-source objects such as pedestrians remain visible.
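Steps 404 to 407 can be sketched in a few lines of numpy. This is a minimal illustration only: the 3x3 box blur standing in for a Gaussian low-pass, the three-level decomposition, and the fusion weight `w` are all assumptions, not the patent's exact algorithm.

```python
import numpy as np

def box_blur(img: np.ndarray) -> np.ndarray:
    # 3x3 box blur with edge padding; stands in for a Gaussian low-pass filter.
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

def pyramid(img: np.ndarray, levels: int = 3):
    """Band-pass detail layers (high, middle) plus a low-frequency base layer."""
    layers, cur = [], img.astype(float)
    for _ in range(levels - 1):
        low = box_blur(cur)
        layers.append(cur - low)  # detail layer at this scale
        cur = low
    layers.append(cur)            # base ("bottom") layer
    return layers

def fuse_ir_visible(ir: np.ndarray, vis: np.ndarray, w: float = 0.5) -> np.ndarray:
    """Fuse visible-light detail into the IR base layer, then reconstruct."""
    layers = pyramid(ir)
    vis_detail = vis.astype(float) - box_blur(vis.astype(float))  # detail via low-pass split
    layers[-1] = layers[-1] + w * vis_detail                      # weighted fusion at base level
    return sum(layers)                                            # step 407: reconstruction
```

A convenient sanity check of the sketch: summing an unmodified pyramid reproduces the input image exactly, since the detail layers telescope back onto the base layer.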
Step 303, determining whether a moving target exists in the fused video stream image by an inter-frame background difference method; if so, go to step 304; otherwise, step 305 is performed.
Inter-frame background differences are computed sequentially for consecutive pairs of frames in the fused video stream. Taking 5 consecutive frames as an example, differences are computed between frames 1 and 2, frames 2 and 3, frames 3 and 4, and frames 4 and 5.
The specific process of implementing the difference for any two continuous frames is as follows:
extracting a motion area of a target from two continuous frames of images in time, or updating a background;
performing difference calculation on the two frames of images after the background is updated, namely performing difference on pixel points corresponding to different frames;
determining an absolute value of a gray difference, and if the difference value is greater than a preset threshold value, determining that a moving target exists in the video stream corresponding to the two frames of images; otherwise, determining that no moving object exists in the video streams corresponding to the two frames of images.
The processing is carried out on each two continuous frames, and when the gray difference value between all the two continuous frames is not greater than a preset threshold value, the fact that no moving target exists in the whole video stream is determined; otherwise, the moving object is determined to be present in the entire video stream.
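The per-pair decision described above can be sketched as follows; the threshold value and the use of the per-pixel maximum as the trigger are illustrative assumptions (the patent only specifies comparing absolute grey differences against a preset threshold).

```python
import numpy as np

def motion_present(frames, threshold: float = 25.0) -> bool:
    """Inter-frame difference over consecutive fused frames: a moving target
    is reported when any per-pixel absolute grey-level difference between a
    pair of consecutive frames exceeds `threshold`."""
    for prev, curr in zip(frames, frames[1:]):
        diff = np.abs(curr.astype(float) - prev.astype(float))
        if diff.max() > threshold:
            return True          # moving target in this frame pair
    return False                 # no pair exceeded the preset threshold
```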
And step 304, preliminarily determining that abnormal behaviors exist in the super bridge tunnel.
The method here monitors an unmanned area where normally no pedestrians are present in the oversized bridge and tunnel; as soon as a moving object such as a pedestrian appears, abnormal behavior is preliminarily determined to exist, and an alarm may be output.
And 305, determining that abnormal behaviors do not exist in the oversized bridge tunnel.
In the embodiment, the video stream image acquired by the camera device is acquired, the visible light image and the infrared image in the video stream image are fused, and the difference between two fused continuous frames is calculated by using the interframe background difference to determine whether a moving target exists in the video stream image so as to determine whether an abnormal behavior exists in the ultralarge bridge tunnel. The scheme can improve the efficiency and accuracy of identifying abnormal behaviors in the ultra-large bridge and tunnel.
Example two
Referring to fig. 5, fig. 5 is a schematic view of a process of identifying abnormal behavior in a tunnel with a super-large bridge in the second embodiment of the present application. The method comprises the following specific steps:
step 501, acquiring a video stream image acquired by the camera device; wherein the video stream images include an infrared image and a visible light image captured simultaneously by the camera.
The acquired video stream image collected by the camera device can be directly transmitted by the camera device or transmitted by a network video service device.
In the embodiment of the application, during specific implementation, the image enhancement can be performed on the collected video stream image, and then the image fusion is performed.
Step 502, fusing the infrared image and the visible light image which are shot simultaneously in the video stream.
The infrared image can be a 14-bit raw infrared image collected by a camera with an infrared focal plane array (IRFPA) detector; the visible light image can be a raw visible light image in YUV format collected by a camera with a visible light CCD sensor.
In the embodiment of the present application, a specific implementation manner of fusing an infrared image and a visible light image is not limited, and a specific fusion manner is given below:
firstly, two-point correction, blind pixel compensation and median filtering processing are carried out on the infrared image.
And secondly, processing the median-filtered infrared image by a histogram equalization method.
Histogram equalization improves the contrast of the infrared image.
And thirdly, registering the infrared image and the visible light image processed by the histogram equalization method.
And fourthly, decomposing the infrared image into a high-level image, a middle-level image and a bottom-level image in a Laplacian pyramid decomposition mode.
And fifthly, extracting image details of the visible light image.
In specific implementation, details of visible light can be directly extracted, or a visible light image can be decomposed into a base layer and a detail layer through a low-pass filter, and the detail layer is used as image details of the visible light.
And sixthly, fusing the extracted image details of the visible light image with the bottom layer image of the infrared image.
The fusion algorithm used in fusing the image details of the visible light image and the underlying image of the infrared image is not limited in the embodiment of the present application.
And seventhly, carrying out image reconstruction by using the fused image and the middle-layer image and the high-layer image of the infrared image.
By exploiting the complementary advantages of the sensors, the method combines the scene's texture detail features from the visible light image with the target features of the infrared image (bottom-layer image); the reconstructed image inherits both, so color and detail information in the scene is preserved while heat-source objects such as pedestrians remain visible.
Step 503, determining whether a moving target exists in the fused video stream image by an inter-frame background difference method; if yes, go to step 504; otherwise, step 510 is performed.
Inter-frame background differences are computed sequentially for consecutive pairs of frames in the fused video stream. Taking 5 consecutive frames as an example, differences are computed between frames 1 and 2, frames 2 and 3, frames 3 and 4, and frames 4 and 5.
The specific process of implementing the difference for any two continuous frames is as follows:
extracting a motion area of a target from two continuous frames of images in time, or updating a background;
performing difference calculation on the two frames of images after the background is updated, namely performing difference on pixel points corresponding to different frames;
determining an absolute value of a gray difference, and if the difference value is greater than a preset threshold value, determining that a moving target exists in the video stream corresponding to the two frames of images; otherwise, determining that no moving object exists in the video streams corresponding to the two frames of images.
The processing is carried out on each two continuous frames, and when the gray difference value between all the two continuous frames is not greater than a preset threshold value, the fact that no moving target exists in the whole video stream is determined; otherwise, the moving object is determined to be present in the entire video stream.
And step 504, preliminarily determining that abnormal behaviors exist in the super bridge tunnel.
And 505, detecting and marking the target in the video stream image with the moving target through a preset target detection model.
Presetting a target detection model as a pre-trained model; the model is able to detect and mark objects; a target detection model training process:
firstly, acquiring a video stream image shot in a super bridge tunnel.
And secondly, selecting a video stream image containing pedestrian behaviors, and manually marking the video stream image through labelme to be used as a training sample.
When manual labeling is performed, lines are used to label people in the video stream images.
And thirdly, training an initial Mask RCNN model by using the training sample based on a small batch gradient descent algorithm with the aim of minimizing loss to obtain a target detection model.
In the embodiment of the application, the initial Mask RCNN model is trained with only one classification label: person. That is, the model is trained to mark people only and to ignore other objects, which improves marking efficiency.
Referring to fig. 6, fig. 6 is a schematic diagram of a target marked by a preset target detection model in the embodiment of the present application. The boxes and the labeling of the human body contours are given in fig. 6.
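The patent marks targets with a trained Mask RCNN; as a minimal library-free stand-in for the marking step, the sketch below draws an axis-aligned box around a detection mask's foreground pixels. The function name is ours, and the box is a simplification of the box-plus-contour marking shown in fig. 6.

```python
import numpy as np

def mark_target(mask: np.ndarray):
    """Axis-aligned bounding box (top, left, bottom, right) around the
    nonzero pixels of a detection mask; None when nothing is detected."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())
```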
Step 506, for the moving target marked by the target detection model, describing its posture in human-skeleton form, acquiring the human skeleton of the moving target, and tracking that skeleton in the video stream images.
In the embodiment of the application, the description of the human body posture adopts a human body skeleton mode, and the human body is regarded as a rigid body set connected by joint nodes.
In a specific implementation, human behavior is simplified into a rigid-body model with 17 pairs of human skeleton coordinates, and the skeleton structure is obtained by connecting the joint nodes with straight-line segments. Specific implementations are not limited to this one.
When the tracking is specifically realized, searching the content range of the subsequent image according to the human skeleton characteristics detected by the current frame, and realizing the human skeleton tracking by comparing and matching the similarity scores of the key points in the adjacent frames;
if a plurality of people exist in one frame of image, the same person uses the same color to mark skeleton information and different people use different color to mark skeleton information, so as to complete the on-line multi-person posture tracking in the video stream.
Tracking the moving target through its human skeleton captures the moving human body quickly and simply, depicts complex human motion more accurately, reduces the network computing burden, and improves the pose estimation accuracy and the abnormal behavior recognition effect for targets in multiple scenes.
In the embodiment of the application, an optical flow method is introduced when the human skeleton of the moving target is tracked in the video stream image.
When objects and scenes in three-dimensional space move, their projections onto the two-dimensional image plane move as well; this motion, expressed as a flowing brightness pattern on the image plane, is called optical flow. The optical flow method is an important method for analyzing motion sequence images: optical flow contains not only the motion information of objects in the image but also rich information about the three-dimensional physical structure, so it can be used to determine the motion of objects and to reflect other information in the image.
The specific process of tracking by combining the introduced optical flow method with the human skeleton is as follows:
Human body key points are detected in the Nth frame of the input video. Taking each detected key point as a center, a 16 x 16 neighborhood is sampled as a feature region; the gradient magnitude and direction of each pixel in the region are calculated, and the direction with the largest count is taken as the main direction of the key point. Each feature region is then divided into 4 x 4 sub-regions, and an 8-direction gradient orientation histogram is computed for each sub-region. The 16 gradient histograms are concatenated into a 128-dimensional feature vector, which after normalization yields the vector set of key points. K-nearest-neighbor matching is then performed against the target key points of the (N+1)th frame, and candidates are screened under a structural-consistency constraint to obtain the initially matched target key points.
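The descriptor construction described above (16 x 16 neighborhood, 4 x 4 sub-regions, 8-direction histograms, 128-dimensional normalized vector) can be sketched in NumPy as follows; this simplified version omits the dominant-orientation alignment and the Gaussian weighting used in full SIFT-style descriptors:

```python
import numpy as np

def keypoint_descriptor(img: np.ndarray, cx: int, cy: int) -> np.ndarray:
    """128-dim gradient-orientation descriptor from a 16x16 patch at (cx, cy).

    Simplified sketch of the step described in the text (no dominant-orientation
    rotation, no Gaussian weighting); img is a 2-D grayscale array.
    """
    patch = img[cy - 8:cy + 8, cx - 8:cx + 8].astype(float)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)                              # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)         # orientation in [0, 2*pi)
    bins = np.minimum((ang / (2 * np.pi) * 8).astype(int), 7)  # 8 direction bins

    desc = np.zeros(128)
    for sy in range(4):                                 # 4x4 sub-regions...
        for sx in range(4):
            sub = slice(sy * 4, sy * 4 + 4), slice(sx * 4, sx * 4 + 4)
            hist = np.bincount(bins[sub].ravel(), weights=mag[sub].ravel(),
                               minlength=8)             # ...8-bin histogram each
            desc[(sy * 4 + sx) * 8:(sy * 4 + sx) * 8 + 8] = hist
    n = np.linalg.norm(desc)
    return desc / n if n > 0 else desc                  # normalized 128-dim vector

rng = np.random.default_rng(1)
img = rng.random((64, 64))
d = keypoint_descriptor(img, 32, 32)
print(d.shape)  # (128,)
```

Descriptors built this way for frame N can then be matched to frame N+1 candidates by K-nearest-neighbor search, as described above.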
Between the Nth and (N+1)th frames, the optical flow of each key point in the target's key-point set is computed with the Lucas-Kanade optical flow algorithm. The flow from the target's initial position to its current position is matched through a mapping function based on bilinear interpolation, and the target position in the next frame is updated by combining the coordinate changes of the target key points in the initial and previous frames, completing the tracking of the human skeleton pose in the video stream.
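A minimal single-level Lucas-Kanade solver illustrates the optical-flow step; the implementation described above (with bilinear-interpolation mapping) is more elaborate, so this is only a sketch of the core least-squares computation:

```python
import numpy as np

def lk_flow(frame0, frame1, x, y, win=7):
    """Estimate optical flow (u, v) at point (x, y) by Lucas-Kanade.

    Solves the least-squares system Ix*u + Iy*v = -It over a
    (2*win+1)^2 window; a minimal single-level sketch, not the
    pyramidal version used in practice.
    """
    Iy, Ix = np.gradient(frame0.astype(float))
    It = frame1.astype(float) - frame0.astype(float)
    s = (slice(y - win, y + win + 1), slice(x - win, x + win + 1))
    A = np.stack([Ix[s].ravel(), Iy[s].ravel()], axis=1)  # spatial gradients
    b = -It[s].ravel()                                    # temporal gradient
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# Synthetic check: a smooth Gaussian bump translated 1 px to the right.
yy, xx = np.mgrid[0:64, 0:64].astype(float)
frame0 = np.exp(-((xx - 32) ** 2 + (yy - 32) ** 2) / 200.0)
frame1 = np.exp(-((xx - 33) ** 2 + (yy - 32) ** 2) / 200.0)
u, v = lk_flow(frame0, frame1, 32, 32)
print(round(u, 2), round(v, 2))  # u near 1.0, v near 0.0
```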
The embodiment of the application thereby determines the position of the same moving target in the series of video image sequences acquired by multiple monitoring sensors, combines depth tracking with occlusion handling, and extends the tracked scene area. In skeleton tracking, an optical flow method is introduced: optical-flow mapping prevents loss of the skeleton and prevents irregular changes in the tracked limb motion of a pedestrian.
Referring to fig. 7, fig. 7 is a schematic diagram of a human skeleton for marking a moving object in the embodiment of the present application. As shown in fig. 7, the human skeleton is marked by connecting key points by line segments.
Step 507: using a preset prediction model, obtain the (N+1)th frame of the video stream corresponding to the N consecutive frames of video stream images in which the moving target exists.
Under distributed combined-camera monitoring, i.e., under the camera device of the embodiment of the application, pedestrians appear at different positions in the video and their individual sizes vary greatly: because the distance between a pedestrian and a monitoring camera varies, the size of the individual differs within the image, and distant individuals are easily overlooked because their key points become too dense. Human motion is therefore divided into a global rigid motion component of the body and a local non-rigid motion component of the skeletal key points. The global component is defined as the absolute position of the body center within the original image frame, including information about the shape, size, and rigid motion of the body's bounding box. The local component is defined as the relative position of the skeleton joints with respect to the bounding box, describing the internal deformation of the skeleton while ignoring its absolute position relative to the environment. The global and local components are processed simultaneously as two concurrent sub-processes; in video, the two processes can even evolve independently, yet in any given context ordinary human activity involves a strong correlation between them. That is, the global component is the global rigid motion component of the body, and the local component is the local non-rigid motion component of the skeleton key points.
The preset prediction model learns the normal-behavior pattern in the training video sequence using an automatic encoder-decoder combined with a dual recurrent neural network, trains and predicts the dynamic coordinate information of the independent global and local components through a cross-branch information transfer mechanism, and, through recurrent coupling, lets the state under one component serve as an input state of the other.
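The cross-branch information transfer mechanism can be sketched structurally as two plain recurrent branches that each consume the other branch's hidden state; the dimensions, update rule, and random inputs below are illustrative assumptions, not the actual network of the patent:

```python
import numpy as np

rng = np.random.default_rng(2)

def step(h, x, W, U, b):
    """One plain recurrent update: h' = tanh(W h + U x + b)."""
    return np.tanh(W @ h + U @ x + b)

H, DG, DL = 8, 2, 34   # hidden size; global (box center) and local (17 x 2) dims
Wg, Ug, bg = rng.normal(0, 0.1, (H, 2 * H)), rng.normal(0, 0.1, (H, DG)), np.zeros(H)
Wl, Ul, bl = rng.normal(0, 0.1, (H, 2 * H)), rng.normal(0, 0.1, (H, DL)), np.zeros(H)

hg = hl = np.zeros(H)
for t in range(10):                       # one pass over a 10-frame sequence
    g_t = rng.normal(size=DG)             # global component: body-center motion
    l_t = rng.normal(size=DL)             # local component: joint offsets in the box
    # Cross-branch transfer: each branch's update reads the *other* branch's state.
    hg_new = step(np.concatenate([hg, hl]), g_t, Wg, Ug, bg)
    hl_new = step(np.concatenate([hl, hg]), l_t, Wl, Ul, bl)
    hg, hl = hg_new, hl_new

print(hg.shape, hl.shape)  # (8,) (8,)
```

In the real model these branches sit inside an encoder-decoder and are trained to reconstruct and predict normal skeleton trajectories; the sketch only shows how the global and local states feed each other.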
Step 508: determine whether the difference between the human skeleton key points in the (N+1)th frame of the video stream image obtained by the preset prediction model and those in the actual (N+1)th frame in which the moving target exists is greater than a preset threshold; if so, execute step 509; otherwise, execute step 510.
That is, the difference of the human skeleton key points is computed between the (N+1)th frame image predicted by the preset prediction model and the actual (N+1)th frame image.
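The threshold comparison of step 508 can be sketched as follows; using the mean keypoint distance as the difference measure and a threshold of 8 pixels are assumptions, since the patent leaves the preset threshold and the exact difference metric open:

```python
import numpy as np

def is_abnormal(pred_kpts, obs_kpts, threshold=8.0):
    """Compare the predicted and observed frame-(N+1) skeletons.

    pred_kpts, obs_kpts: (17, 2) keypoint arrays; threshold (in pixels)
    is an assumed value -- the patent leaves the preset threshold open.
    """
    diff = np.linalg.norm(pred_kpts - obs_kpts, axis=1).mean()
    return bool(diff > threshold)

normal = np.zeros((17, 2))
drifted = normal + np.array([30.0, 0.0])   # skeleton far from the prediction
print(is_abnormal(normal, normal + 1.0), is_abnormal(normal, drifted))
# False True
```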
Step 509: determine that abnormal behavior exists in the oversized bridge and tunnel, and output an alarm.
If a moving person exists in the oversized bridge and tunnel and the behavior of that person is abnormal, it is determined that abnormal behavior exists in the oversized bridge and tunnel.
Step 510: determine that no abnormal behavior exists in the oversized bridge and tunnel.
If no person exists in the oversized bridge and tunnel, or a person exists but behaves normally, it is determined that no abnormal behavior exists in the oversized bridge and tunnel.
In the embodiment, the video stream image acquired by the camera device is acquired, the visible light image and the infrared image in the video stream image are fused, and the difference between two fused continuous frames is calculated by using the interframe background difference to determine whether a moving target exists in the video stream image so as to determine whether an abnormal behavior exists in the ultralarge bridge tunnel. The scheme can improve the efficiency and accuracy of identifying abnormal behaviors in the ultra-large bridge and tunnel.
In addition, the embodiment of the application adopts a top-down method in deep learning: first the target object is detected; then single key-point recognition is performed on the features of the detected target person; next, the target is tracked using skeleton detection; finally, anomaly detection is performed by predicting the pedestrian skeleton trajectory with a multi-cascade deep-learning network, according to the characteristic rules of pedestrian skeleton motion learned from the monitoring video. When a pattern or a behavior prone to frequent occurrence appears, an early warning is issued to prompt that relevant measures be taken.
Based on the same inventive concept, the embodiment of the application also provides a device for identifying abnormal behaviors in the ultra-large bridge tunnel. Deploying a camera device in the oversized bridge and tunnel, wherein the camera device can shoot images of positions needing to be monitored of the oversized bridge and tunnel. Referring to fig. 8, fig. 8 is a schematic structural diagram of an apparatus applied to the above technology in the embodiment of the present application. The device comprises: an acquisition unit 801, a fusion unit 802, and a determination unit 803;
an obtaining unit 801, configured to obtain a video stream image acquired by the image capturing apparatus; wherein the video stream images comprise an infrared image and a visible light image captured simultaneously by the camera;
a fusion unit 802, configured to perform fusion processing on the infrared image and the visible light image that are simultaneously captured in the video stream acquired by the acquisition unit 801;
a determining unit 803, configured to determine whether a moving object exists in the video stream image fused by the fusing unit 802 by using an inter-frame background subtraction method; if so, preliminarily determine that abnormal behavior exists in the oversized bridge and tunnel; otherwise, determine that no abnormal behavior exists in the oversized bridge and tunnel.
Preferably, the apparatus further comprises: a processing unit 804;
the processing unit 804 is configured to, after the determining unit 803 preliminarily determines that the abnormal behavior exists in the ultra-large bridge tunnel, detect and mark a target in a video stream image in which a moving target exists through a preset target detection model; describing the posture of the moving target by using a human body skeleton mode for the moving target marked by the target detection model, acquiring the human body skeleton of the moving target, and tracking the human body skeleton of the moving target in a video stream image; acquiring an N +1 frame video stream image corresponding to the N frames of video stream images continuously existing in the moving target by using a preset prediction model;
the determining unit 803 is further configured to determine whether a difference between a key point of a human skeleton in the N +1 th frame of video stream image obtained by the processing unit 804 using a preset prediction model and the N +1 th frame of image where the moving target exists is greater than a preset threshold, and if so, determine that an abnormal behavior exists in the ultra-large bridge tunnel, and output an alarm; otherwise, determining that abnormal behaviors do not exist in the oversized bridge tunnel.
Preferably, the target detection model training process includes:
acquiring a video stream image shot in an oversized bridge tunnel;
selecting a video stream image containing pedestrian behaviors, and manually marking the video stream image through labelme to be used as a training sample;
and training the initial Mask RCNN model by using the training sample based on a small batch gradient descent algorithm with the aim of minimizing loss to obtain a target detection model.
Preferably,
the processing unit 804 is further configured to introduce an optical flow method when the human skeleton of the moving target is tracked in the video stream image.
Preferably,
the preset prediction model learns the normal-behavior pattern in the training video sequence using an automatic encoder-decoder combined with a dual recurrent neural network, trains and predicts the dynamic coordinate information of the independent global and local components through a cross-branch information transfer mechanism, and, through recurrent coupling, lets the state under one component serve as an input state of the other; wherein the global component is the global rigid motion component of the body, and the local component is the local non-rigid motion component of the skeleton key points.
Preferably, the camera device consists of a plurality of cameras arranged at the top of the monitoring range; every two cameras are deployed diagonally according to their shooting ranges to extend the monitoring range.
Preferably,
a fusion unit 802, specifically configured to perform two-point correction, blind pixel compensation, and median filtering on the infrared image; processing the infrared image subjected to median filtering by a histogram equalization method; registering the infrared image and the visible light image processed by the histogram equalization method; decomposing the infrared image into a high-level image, a middle-level image and a bottom-level image in a Laplacian pyramid decomposition mode; extracting image details of the visible light image; fusing the extracted image details of the visible light image with the bottom layer image of the infrared image; and carrying out image reconstruction by using the fused image and the middle-layer image and the high-layer image of the infrared image.
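The three-level Laplacian-pyramid fusion performed by the fusion unit can be sketched as follows; the 2 x 2-mean downsampling and the averaging rule for fusing the visible-light detail into the bottom layer are simplifying assumptions (the patent does not specify the pyramid filter or the fusion rule):

```python
import numpy as np

def down(img):
    # 2x downsample with a 2x2 mean (stand-in for Gaussian blur + decimate).
    return img.reshape(img.shape[0] // 2, 2, img.shape[1] // 2, 2).mean(axis=(1, 3))

def up(img):
    # 2x nearest-neighbour upsample.
    return img.repeat(2, axis=0).repeat(2, axis=1)

def laplacian_pyramid(img, levels=3):
    """Bottom / middle / top layers, as in the three-level decomposition."""
    pyr, cur = [], img.astype(float)
    for _ in range(levels - 1):
        small = down(cur)
        pyr.append(cur - up(small))    # detail (Laplacian) layer
        cur = small
    pyr.append(cur)                    # coarsest (high-level) layer
    return pyr

def reconstruct(pyr):
    cur = pyr[-1]
    for lap in reversed(pyr[:-1]):
        cur = up(cur) + lap
    return cur

rng = np.random.default_rng(3)
ir = rng.random((16, 16))                 # registered infrared frame (toy data)
vis_detail = rng.random((16, 16)) * 0.1   # extracted visible-light detail layer

pyr = laplacian_pyramid(ir)
pyr[0] = 0.5 * (pyr[0] + vis_detail)      # fuse visible detail into the bottom layer
fused = reconstruct(pyr)                  # rebuild with the middle and high layers
print(fused.shape)  # (16, 16)
```

Without the fusion step, `reconstruct(laplacian_pyramid(ir))` returns the original image exactly, which is the property that lets the middle and high infrared layers be reused unchanged during reconstruction.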
The units of the above embodiments may be integrated into one body, or may be separately deployed; may be combined into one unit or further divided into a plurality of sub-units.
In another embodiment, an electronic device is further provided, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor; the processor implements the steps of the method for identifying abnormal behaviors in the oversized bridge and tunnel when executing the program.
In another embodiment, a computer-readable storage medium is further provided, on which computer instructions are stored; the instructions, when executed by a processor, implement the steps of the method for identifying abnormal behaviors in the oversized bridge and tunnel.
Fig. 9 is a schematic physical structure diagram of an electronic device according to an embodiment of the present invention. As shown in fig. 9, the electronic device may include: a Processor (Processor)910, a communication Interface (Communications Interface)920, a Memory (Memory)930, and a communication bus 940, wherein the Processor 910, the communication Interface 920, and the Memory 930 communicate with each other via the communication bus 940. Processor 910 may invoke logic instructions in memory 930 to perform the following method:
acquiring a video stream image acquired by the camera device; wherein the video stream images comprise an infrared image and a visible light image captured simultaneously by the camera;
fusing the infrared image and the visible light image which are shot simultaneously in the video stream;
determining whether a moving target exists in the fused video stream image by an inter-frame background difference method; if so, preliminarily determining that abnormal behavior exists in the oversized bridge and tunnel; otherwise, determining that no abnormal behavior exists in the oversized bridge and tunnel.
Furthermore, the logic instructions in the memory 930 may be implemented in software functional units and stored in a computer readable storage medium when the logic instructions are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for identifying abnormal behaviors in an oversized bridge and tunnel is characterized in that a camera device is deployed in the oversized bridge and tunnel, and the camera device can shoot an image of a position to be monitored of the oversized bridge and tunnel; the method comprises the following steps:
acquiring a video stream image acquired by the camera device; wherein the video stream images comprise an infrared image and a visible light image captured simultaneously by the camera;
fusing the infrared image and the visible light image which are shot simultaneously in the video stream;
determining whether a moving target exists in the fused video stream image by an inter-frame background difference method; if so, preliminarily determining that abnormal behavior exists in the oversized bridge and tunnel; otherwise, determining that no abnormal behavior exists in the oversized bridge and tunnel.
2. The method of claim 1, wherein after initially determining that there is abnormal behavior in the ultralarge bridge tunnel, the method further comprises:
detecting and marking a target in a video stream image with a moving target through a preset target detection model;
describing the posture of the moving target by using a human body skeleton mode for the moving target marked by the target detection model, acquiring the human body skeleton of the moving target, and tracking the human body skeleton of the moving target in a video stream image;
acquiring an N +1 frame video stream image corresponding to the N frames of video stream images continuously existing in the moving target by using a preset prediction model;
determining whether a difference value of key points of a human skeleton in the (N + 1) th frame of video stream image obtained by using a preset prediction model and the (N + 1) th frame of image with the moving target is greater than a preset threshold value, if so, determining that abnormal behaviors exist in the super-large bridge tunnel, and outputting an alarm; otherwise, determining that abnormal behaviors do not exist in the oversized bridge tunnel.
3. The method of claim 2, wherein the target detection model training process comprises:
acquiring a video stream image shot in an oversized bridge tunnel;
selecting a video stream image containing pedestrian behaviors, and manually marking the video stream image through labelme to be used as a training sample;
and training the initial Mask RCNN model by using the training sample based on a small batch gradient descent algorithm with the aim of minimizing loss to obtain a target detection model.
4. The method of claim 2, wherein an optical flow method is introduced when tracking the human skeleton of the moving object in the video stream image.
5. The method of claim 2,
the preset prediction model learns the normal-behavior pattern in the training video sequence using an automatic encoder-decoder combined with a dual recurrent neural network, trains and predicts the dynamic coordinate information of the independent global and local components through a cross-branch information transfer mechanism, and, through recurrent coupling, lets the state under one component serve as an input state of the other; wherein the global component is the global rigid motion component of the body, and the local component is the local non-rigid motion component of the skeleton key points.
6. The method according to claim 1, wherein the camera device consists of a plurality of cameras arranged at the top of the monitoring range; every two cameras are deployed diagonally according to their shooting ranges to extend the monitoring range.
7. The method according to claim 1, wherein the fusing the infrared image and the visible light image captured simultaneously in the video stream comprises:
carrying out two-point correction, blind pixel compensation and median filtering processing on the infrared image;
processing the infrared image subjected to median filtering by a histogram equalization method;
registering the infrared image and the visible light image processed by the histogram equalization method;
decomposing the infrared image into a high-level image, a middle-level image and a bottom-level image in a Laplacian pyramid decomposition mode;
extracting image details of the visible light image;
fusing the extracted image details of the visible light image with the bottom layer image of the infrared image;
and carrying out image reconstruction by using the fused image and the middle-layer image and the high-layer image of the infrared image.
8. The abnormal behavior recognition device in the oversized bridge and tunnel is characterized in that a camera device is deployed in the oversized bridge and tunnel, and the camera device can shoot an image of a position to be monitored of the oversized bridge and tunnel; the device comprises: the device comprises an acquisition unit, a fusion unit and a determination unit;
the acquisition unit is used for acquiring a video stream image acquired by the camera device; wherein the video stream image comprises an infrared image and a visible light image captured simultaneously by the camera device;
the fusion unit is used for fusing the infrared image and the visible light image which are shot simultaneously in the video stream acquired by the acquisition unit;
the determining unit is used for determining whether a moving target exists in the video stream image fused by the fusing unit through an inter-frame background difference method; if so, preliminarily determining that abnormal behavior exists in the oversized bridge and tunnel; otherwise, determining that no abnormal behavior exists in the oversized bridge and tunnel.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-7 when executing the program.
10. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 1 to 7.
CN202010755106.8A 2020-07-31 2020-07-31 Method and device for identifying abnormal behaviors in oversized bridge and tunnel Active CN111881853B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010755106.8A CN111881853B (en) 2020-07-31 2020-07-31 Method and device for identifying abnormal behaviors in oversized bridge and tunnel

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010755106.8A CN111881853B (en) 2020-07-31 2020-07-31 Method and device for identifying abnormal behaviors in oversized bridge and tunnel

Publications (2)

Publication Number Publication Date
CN111881853A true CN111881853A (en) 2020-11-03
CN111881853B CN111881853B (en) 2022-09-16

Family

ID=73205825

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010755106.8A Active CN111881853B (en) 2020-07-31 2020-07-31 Method and device for identifying abnormal behaviors in oversized bridge and tunnel

Country Status (1)

Country Link
CN (1) CN111881853B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113159229A (en) * 2021-05-19 2021-07-23 深圳大学 Image fusion method, electronic equipment and related product
CN113822250A (en) * 2021-11-23 2021-12-21 中船(浙江)海洋科技有限公司 Ship driving abnormal behavior detection method
CN113901931A (en) * 2021-10-13 2022-01-07 山东大学 Knowledge distillation model-based behavior recognition method for infrared and visible light videos
CN114565882A (en) * 2022-04-29 2022-05-31 深圳航天信息有限公司 Abnormal behavior analysis method and device based on intelligent linkage of multiple video cameras
WO2022252642A1 (en) * 2021-06-01 2022-12-08 平安科技(深圳)有限公司 Behavior posture detection method and apparatus based on video image, and device and medium
CN116091959A (en) * 2022-11-21 2023-05-09 武汉坤达安信息安全技术有限公司 Double-light linkage identification method and device based on all-weather smoke and fire
CN117474983A (en) * 2023-12-27 2024-01-30 广东力创信息技术有限公司 Early warning method based on light-vision linkage and related device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001174242A (en) * 1999-12-22 2001-06-29 Toyo Commun Equip Co Ltd In-tunnel monitoring system
CN101902557A (en) * 2009-05-26 2010-12-01 南京敏思科技有限公司 Reconstruction method and device of video image background
CN107705322A (en) * 2017-09-27 2018-02-16 中北大学 Motion estimate tracking and system
CN109255774A (en) * 2018-09-28 2019-01-22 中国科学院长春光学精密机械与物理研究所 A kind of image interfusion method, device and its equipment
CN109376641A (en) * 2018-10-16 2019-02-22 长安大学 A kind of moving vehicle detection method based on unmanned plane video
CN109543513A (en) * 2018-10-11 2019-03-29 平安科技(深圳)有限公司 Method, apparatus, equipment and the storage medium that intelligent monitoring is handled in real time

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
XINNAN FAN等: "A Thermal Infrared and Visible Images Fusion Based Approach for Multitarget Detection under Complex Environment", 《MATHEMATICAL PROBLEMS IN ENGINEERING》 *
李伟帅等: "基于多传感器的地面移动目标识别技术研究", 《光电技术应用》 *
李豪杰等: "基于视频的人体运动捕捉综述", 《计算机辅助设计与图形学学报》 *
王志等: "基于深度学习的复杂背景下目标检测", 《重庆理工大学学报(自然科学)》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113159229A (en) * 2021-05-19 2021-07-23 深圳大学 Image fusion method, electronic equipment and related product
CN113159229B (en) * 2021-05-19 2023-11-07 深圳大学 Image fusion method, electronic equipment and related products
WO2022252642A1 (en) * 2021-06-01 2022-12-08 平安科技(深圳)有限公司 Behavior posture detection method and apparatus based on video image, and device and medium
CN113901931A (en) * 2021-10-13 2022-01-07 山东大学 Knowledge distillation model-based behavior recognition method for infrared and visible light videos
CN113822250A (en) * 2021-11-23 2021-12-21 中船(浙江)海洋科技有限公司 Ship driving abnormal behavior detection method
CN114565882A (en) * 2022-04-29 2022-05-31 深圳航天信息有限公司 Abnormal behavior analysis method and device based on intelligent linkage of multiple video cameras
CN114565882B (en) * 2022-04-29 2022-07-19 深圳航天信息有限公司 Abnormal behavior analysis method and device based on intelligent linkage of multiple video cameras
CN116091959A (en) * 2022-11-21 2023-05-09 武汉坤达安信息安全技术有限公司 Double-light linkage identification method and device based on all-weather smoke and fire
CN116091959B (en) * 2022-11-21 2024-03-22 武汉坤达安信息安全技术有限公司 Double-light linkage identification method and device based on all-weather smoke and fire
CN117474983A (en) * 2023-12-27 2024-01-30 广东力创信息技术有限公司 Early warning method based on light-vision linkage and related device
CN117474983B (en) * 2023-12-27 2024-03-12 广东力创信息技术有限公司 Early warning method based on light-vision linkage and related device

Also Published As

Publication number Publication date
CN111881853B (en) 2022-09-16

Similar Documents

Publication Publication Date Title
CN111881853B (en) Method and device for identifying abnormal behaviors in oversized bridge and tunnel
Rakibe et al. Background subtraction algorithm based human motion detection
JP4216668B2 (en) Face detection / tracking system and method for detecting and tracking multiple faces in real time by combining video visual information
EP2713308B1 (en) Method and system for using fingerprints to track moving objects in video
US20060067562A1 (en) Detection of moving objects in a video
Rout A survey on object detection and tracking algorithms
CN103093198B (en) A kind of crowd density monitoring method and device
JP2008192131A (en) System and method for performing feature level segmentation
Sengar et al. Motion detection using block based bi-directional optical flow method
KR101681104B1 (en) A multiple object tracking method with partial occlusion handling using salient feature points
Sharma Human detection and tracking using background subtraction in visual surveillance
US20220366570A1 (en) Object tracking device and object tracking method
CN108875500B (en) Pedestrian re-identification method, device and system and storage medium
Ali et al. Vehicle detection and tracking in UAV imagery via YOLOv3 and Kalman filter
KR101243294B1 (en) Method and apparatus for extracting and tracking moving objects
Mishra et al. Recent trends in pedestrian detection for robotic vision using deep learning techniques
Conaire et al. Multispectral object segmentation and retrieval in surveillance video
KR101690050B1 (en) Intelligent video security system
Verma et al. Analysis of moving object detection and tracking in video surveillance system
Venu Object Detection in Motion Estimation and Tracking analysis for IoT devices
Yasir et al. Review on real time background extraction: models, applications, environments, challenges and evaluation approaches
Qureshi et al. Highway traffic surveillance over UAV dataset via blob detection and histogram of gradient
Demars et al. Multispectral detection and tracking of multiple moving targets in cluttered urban environments
Dave et al. Statistical survey on object detection and tracking methodologies
Savakis et al. Semantic background estimation in video sequences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant