CN113052139A - Deep learning double-flow network-based climbing behavior detection method and system - Google Patents


Publication number
CN113052139A (application CN202110448771.7A)
Authority
CN
China
Prior art keywords
detection
pedestrian
network
frame
video
Prior art date
Legal status
Pending
Application number
CN202110448771.7A
Other languages
Chinese (zh)
Inventor
张泉
赵曼
刘海峰
任广鑫
张明
季坤
吴迪
甄超
王坤
王刘芳
郑浩
Current Assignee
Hefei Zhongke Leinao Intelligent Technology Co ltd
State Grid Anhui Electric Power Co Ltd
Original Assignee
Hefei Zhongke Leinao Intelligent Technology Co ltd
State Grid Anhui Electric Power Co Ltd
Priority date
Filing date
Publication date
Application filed by Hefei Zhongke Leinao Intelligent Technology Co ltd, State Grid Anhui Electric Power Co Ltd filed Critical Hefei Zhongke Leinao Intelligent Technology Co ltd
Priority: CN202110448771.7A
Publication: CN113052139A
Legal status: Pending

Classifications

    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06F18/24 Classification techniques
    • G06N3/08 Neural networks; learning methods
    • G06T7/269 Analysis of motion using gradient-based methods
    • G06T2207/10016 Video; image sequence
    • G06T2207/10024 Color image
    • G06T2207/20081 Training; learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30196 Human being; person
    • G06V2201/07 Target detection


Abstract

The invention discloses a climbing behavior detection method and system based on a deep learning double-flow network, belonging to the technical field of behavior recognition by machine vision and comprising the following steps: S1: target detection, tracking and numbering; S2: cropping target video segments; S3: random sampling; S4: action classification. The classification network obtained through learning has good robustness and classifies accurately under different illumination and weather conditions, so that multi-person behavior detection under complex conditions is realized; cropping the video removes redundant background information and greatly improves algorithm execution efficiency, while the pedestrian-tracking random sampling method effectively improves detection efficiency, making the method well worth popularizing.

Description

Deep learning double-flow network-based climbing behavior detection method and system
Technical Field
The invention relates to the technical field of behavior recognition by machine vision, in particular to a climbing behavior detection method and system based on a deep learning double-flow network.
Background
Climbing behavior detection is an important module in the field of intelligent video surveillance and is widely applied in video monitoring systems for public places. It discovers in time that a person is climbing a fence or enclosing wall, automatically issues a corresponding warning or notification, and reduces the investment of security manpower. Climbing behavior recognition mainly solves two problems: first, the detection problem, i.e. using a detector to determine whether a person is present in the image; second, the recognition problem, i.e. extracting the person's motion features and recognizing the behavior through a classifier.
An existing method for detecting personnel climbing behavior uses behavior recognition: it computes star-skeleton features of the human body from the human silhouette, classifies the skeleton features into 4 states (walking, climbing up, crossing over and climbing down), and judges that a person has crossed the enclosure when the 3 states climbing up, crossing over and climbing down appear in succession. This method is idealized: it works only in an ideal environment containing a single person and performs very poorly in real application environments. Some traditional vision methods extract human behavior features and model and classify them with an HMM or a Bayesian network, but they face the problems that target occlusion is severe and hand-designed behavior features are hard to extract. Others compute optical flow for a moving target, model the flow with an HMM or a Bayesian network, analyze it with a classifier, and thereby detect certain abnormal behaviors. A climbing behavior detection method based on a deep learning double-flow network is therefore proposed.
Disclosure of Invention
The technical problem to be solved by the invention is how to overcome the poor practical performance and high computational complexity of existing behavior recognition methods for detecting personnel climbing behavior; to this end, a climbing behavior detection method based on a deep learning double-flow network is provided.
The invention solves the above technical problem through the following technical scheme, comprising the following steps:
s1: target detection, tracking and numbering
Carrying out pedestrian detection on the original video by using a target detection network to obtain a pedestrian detection frame; tracking by using the detection frame and the time sequence information of the video to obtain a target number;
s2: cropping a target video segment
Cropping the original video according to the detection frames and target numbers, removing the area outside each detection frame, and saving each numbered pedestrian into a separate video segment;
s3: random sampling
For each video segment obtained in step S2, randomly sampling a set number of frames out of every set number of frames, and calculating the optical flow information of each pixel of the sampled frames using a dense optical flow method;
s4: action classification
Sending the color image and optical flow information of each sampled frame into a climbing binary-classification double-flow network, classifying the set of frames, and determining whether climbing behavior exists.
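The four steps S1 to S4 can be sketched as a single processing loop. The sketch below is a minimal, hypothetical outline in Python: the detector/tracker, cropping routine, optical flow routine and classifier are passed in as callables, and every function name here is illustrative rather than taken from the patent.

```python
import random

def detect_climbing(frames, detect_track, crop_resize, optical_flow, classify,
                    pool_size=30, n_samples=3):
    """Sketch of S1-S4: detect and track pedestrians, collect cropped frames
    per pedestrian ID, randomly sample frames from each full pool, compute
    optical flow, and classify the group as climbing / non-climbing."""
    pools = {}    # pedestrian id -> list of cropped frames (the S2 buffer pools)
    results = {}  # pedestrian id -> latest classification result
    for frame in frames:
        for pid, box in detect_track(frame):                 # S1: boxes + stable IDs
            pools.setdefault(pid, []).append(crop_resize(frame, box))  # S2: crop
            if len(pools[pid]) == pool_size:
                sampled = random.sample(pools[pid], n_samples)          # S3: sample
                flows = [optical_flow(f) for f in sampled]              # S3: flow
                results[pid] = classify(sampled, flows)                 # S4: classify
                pools[pid] = []                 # empty the pool, keep the ID
    return results
```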
Further, in step S1, the target detection network adopted is a YOLO network; multiple pedestrian targets in the video are processed simultaneously by the YOLO network, and the contour region of each pedestrian target is framed with a rectangular bounding box to serve as a candidate region for subsequent processing.
Further, the specific process of step S2 is as follows:
s21: cutting the candidate area obtained in the step S1, cutting a square area by taking the maximum value of the length or width of the rectangular surrounding frame of each area as the side length and the central point of the rectangular frame as the cutting center, and then adjusting the size of the image to a set size;
s22: an image buffer pool of 30 is created for each person based on the pedestrian number obtained in step S1, the resized image is placed in the buffer pool, and when the image buffer reaches 30 sheets, step S3 is performed.
Furthermore, the buffer pool is a temporary video store created for each pedestrian target, one per pedestrian number. When its contents reach 30 images, the pool is emptied and storage of new images restarts, with the corresponding pedestrian number unchanged. When the pool's contents are not updated for a long time, the pedestrian with the corresponding number has left the video monitoring range, and the pool is destroyed once a preset update timeout is exceeded.
Further, in step S3, the dense optical flow calculation uses the Farneback algorithm: take the images of two adjacent frames, regard each image as a two-dimensional signal (a function of pixel position), set a neighborhood around each pixel (generally a (2n+1) × (2n+1) square), and fit a functional relation between gray value and position by least squares; the image's original Cartesian two-dimensional signal space is thereby transformed into another vector space, and the pixel displacement difference between the two frames yields the optical flow.
Further, in the step S4, the training process of the dual-stream network is as follows:
s41: making a binary data set for training, extracting each frame of the acquired video segment, performing pedestrian detection, pedestrian tracking and cutting according to the previous method, calculating the optical flow of each frame, and storing the cut original image and the optical flow image of the corresponding position according to the manually marked type, wherein the image is climbed as a positive sample and the image is not climbed as a negative sample;
s42: and randomly selecting 3 cut original frames and optical flows of corresponding areas from a positive sample library as positive samples, selecting negative samples by using the same method, sending the negative samples into a double-flow network for classification training, and obtaining and storing the double-flow network after training.
Further, in the step S4, the structure of the dual-stream network is as follows:
the invention also provides a climbing behavior detection system based on the deep learning double-flow network, which adopts the detection method to detect the climbing behavior and comprises the following steps:
the target detection module is used for detecting pedestrians in the original video by using a target detection network to obtain a pedestrian detection frame; tracking by using the detection frame and the time sequence information of the video to obtain a target number;
the segment cutting module is used for cutting the original video according to the detection frame and the target number, removing the area outside the detection frame and storing the pedestrian with each number into a video segment again;
the random sampling module is used for randomly sampling and setting the number of sampling frames for each set frame number aiming at each video clip, and calculating by using a dense optical flow method to obtain the optical flow information of each pixel of the sampling frames;
the motion classification module is used for sending the color image and the optical flow information of each sampling frame into a climbing two-classification double-flow network, classifying the set frame number and determining whether climbing behaviors exist or not;
the central processing module is used for sending instructions to other modules to complete related actions;
the target detection module, the fragment cutting module, the random sampling module and the action classification module are all electrically connected with the central processing module.
Compared with the prior art, the invention has the following advantages: in the climbing behavior detection method based on a deep learning double-flow network, the classification network obtained through learning has good robustness and classifies accurately under different illumination and weather conditions, so that multi-person behavior detection under complex conditions is realized; cropping the video removes redundant background information and greatly improves algorithm execution efficiency, while the pedestrian-tracking random sampling method effectively improves detection efficiency, making the method well worth popularizing.
Drawings
FIG. 1 is a schematic overall flow chart of a second embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating classification of an image buffer pool by the climbing binary-classification double-flow network in the second embodiment of the present invention;
fig. 3 is a structure diagram of a TSN dual-flow network in the second embodiment of the present invention;
FIG. 4a is a structural diagram of a spatial convolution network according to a second embodiment of the present invention;
fig. 4b is a structural diagram of a time convolution network in the second embodiment of the present invention.
Detailed Description
The following examples are given for the detailed implementation and specific operation of the present invention, but the scope of the present invention is not limited to the following examples.
Example one
This embodiment provides a technical scheme: a climbing behavior detection method based on a deep learning double-flow network, comprising the following steps:
s1: target detection, tracking and numbering
Carrying out pedestrian detection on the original video by using a target detection network to obtain a pedestrian detection frame; tracking by using the detection frame and the time sequence information of the video to obtain a target number;
s2: cropping a target video segment
Cropping the original video according to the detection frames and target numbers, removing the area outside each detection frame, and saving each numbered pedestrian into a separate video segment;
s3: random sampling
For each video segment obtained in step S2, randomly sampling a set number of frames out of every set number of frames, and calculating the optical flow information of each pixel of the sampled frames using a dense optical flow method;
s4: action classification
Sending the color image and optical flow information of each sampled frame into a climbing binary-classification double-flow network, classifying the set of frames, and determining whether climbing behavior exists.
In this embodiment, in step S1, the target detection network adopted is a YOLO network; multiple pedestrian targets in the video are processed simultaneously by the YOLO network, and the contour region of each pedestrian target is framed with a rectangular bounding box to serve as a candidate region for subsequent processing.
In this embodiment, the specific process of step S2 is as follows:
s21: cutting the candidate area obtained in the step S1, cutting a square area by taking the maximum value of the length or width of the rectangular surrounding frame of each area as the side length and the central point of the rectangular frame as the cutting center, and then adjusting the size of the image to a set size;
s22: an image buffer pool of 30 is created for each person based on the pedestrian number obtained in step S1, the resized image is placed in the buffer pool, and when the image buffer reaches 30 sheets, step S3 is performed.
In this embodiment, the buffer pool is a temporary video store created for each pedestrian target, one per pedestrian number; when its contents reach 30 images, the pool is emptied and storage of new images restarts, with the corresponding pedestrian number unchanged. When the pool's contents are not updated for a long time, the pedestrian with the corresponding number has left the video monitoring range, and the pool is destroyed once a preset update timeout is exceeded.
In this embodiment, in step S3, the dense optical flow calculation uses the Farneback algorithm: take the images of two adjacent frames, regard each image as a two-dimensional signal (a function of pixel position), set a neighborhood around each pixel (generally a (2n+1) × (2n+1) square), and fit a functional relation between gray value and position by least squares; the image's original Cartesian two-dimensional signal space is thereby transformed into another vector space, and the pixel displacement difference between the two frames yields the optical flow.
In this embodiment, in step S4, the training process of the dual-stream network is as follows:
s41: making a binary data set for training, extracting each frame of the acquired video segment, performing pedestrian detection, pedestrian tracking and cutting according to the previous method, calculating the optical flow of each frame, and storing the cut original image and the optical flow image of the corresponding position according to the manually marked type, wherein the image is climbed as a positive sample and the image is not climbed as a negative sample;
s42: and randomly selecting 3 cut original frames and optical flows of corresponding areas from a positive sample library as positive samples, selecting negative samples by using the same method, sending the negative samples into a double-flow network for classification training, and obtaining and storing the double-flow network after training.
This embodiment also provides a climbing behavior detection system based on the deep learning double-flow network, which performs climbing behavior detection using the above method and comprises:
the target detection module is used for detecting pedestrians in the original video by using a target detection network to obtain a pedestrian detection frame; tracking by using the detection frame and the time sequence information of the video to obtain a target number;
the segment cutting module is used for cutting the original video according to the detection frame and the target number, removing the area outside the detection frame and storing the pedestrian with each number into a video segment again;
the random sampling module is used for randomly sampling and setting the number of sampling frames for each set frame number aiming at each video clip, and calculating by using a dense optical flow method to obtain the optical flow information of each pixel of the sampling frames;
the motion classification module is used for sending the color image and the optical flow information of each sampling frame into a climbing two-classification double-flow network, classifying the set frame number and determining whether climbing behaviors exist or not;
the central processing module is used for sending instructions to other modules to complete related actions;
the target detection module, the fragment cutting module, the random sampling module and the action classification module are all electrically connected with the central processing module.
Example two
As shown in fig. 1, this embodiment provides a climbing behavior detection method based on a deep learning double-flow network; the specific process is as follows:
step 1: firstly, using a target detection network to detect pedestrians in an original video to obtain a detection frame of the pedestrians; tracking by using the detection frame and the time sequence information of the video to obtain a target number;
the target detection network adopted in the embodiment is a YOLO network, the YOLO network can simultaneously detect a plurality of pedestrian targets by using a YOLO target detection algorithm, and the outline area of each pedestrian target is selected by using a rectangular bounding box to serve as a subsequent processing candidate area.
Step 2: crop the original video according to the detection frames and target numbers, remove the area outside each detection frame, and save each numbered pedestrian into a separate video segment;
the method comprises the following specific steps:
cutting the candidate regions obtained in the step 1, cutting a square region by taking the maximum value of the length or width of a rectangular frame of each region as the side length and the central point of the rectangular frame as a cutting center, and then adjusting the size of the image to 224 × 224;
and (3) establishing an image buffer pool for each person according to the pedestrian number obtained in the step (1) and the sampling frequency set during the double-current network training, and placing the image with the adjusted size into the buffer pool. And when the image buffer reaches the set number, performing the step 3. The size of the buffer pool is set according to the data sampling frequency during the training of the TSN dual-flow network, and if the TSN is trained and every 30 frames are sliced, the size of the buffer pool is also 30 during inference.
The buffer pool is a temporary video store created for each pedestrian target, one per pedestrian number; when its contents reach 30 images, the pool is emptied and storage of new images restarts, with the corresponding pedestrian number unchanged. When the pool's contents are not updated for a long time, the pedestrian with the corresponding number has left the video monitoring range, and the pool is destroyed once a preset update timeout is exceeded.
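The per-pedestrian buffer pool with a capacity of 30 and a destroy-on-timeout rule can be sketched as a small class; the class and method names and the concrete timeout mechanism below are illustrative, not prescribed by the patent.

```python
import time

class FramePool:
    """Per-pedestrian image buffer pool: fills up to `capacity` (30 in this
    embodiment), hands back the full batch and empties itself, and reports
    itself stale when not updated within `timeout` seconds so the caller
    can destroy it (the pedestrian has left the monitored area)."""
    def __init__(self, pedestrian_id, capacity=30, timeout=5.0):
        self.pedestrian_id = pedestrian_id
        self.capacity = capacity
        self.timeout = timeout
        self.frames = []
        self.last_update = time.monotonic()

    def add(self, frame):
        """Store a cropped frame; return the full batch once the pool fills."""
        self.frames.append(frame)
        self.last_update = time.monotonic()
        if len(self.frames) == self.capacity:
            batch, self.frames = self.frames, []   # empty the pool, keep the ID
            return batch
        return None

    def stale(self):
        """True once the pedestrian has not been seen for `timeout` seconds."""
        return time.monotonic() - self.last_update > self.timeout
```

A supervising loop would call `add` for each new cropped frame, classify any returned batch, and periodically destroy pools for which `stale()` is true.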
Existing algorithms can generally be applied only to an ideal environment with a single person in the video and perform poorly in real environments. The present method can detect climbing behavior when multiple people are present in the original video and gives better results in practical applications. As shown in fig. 2, n pedestrians exist in the original video; they are detected by the target detection network, their detection frames are marked, tracking is performed using the detection frames and the video's temporal information, and each pedestrian is numbered. The video is then re-edited according to the detection frames and numbers, and each numbered pedestrian is saved into its own video segment, each segment keeping only the area inside the detection frame of the corresponding number: video 1 is the region inside person 1's detection frame cut out from the original video, and video n is the region inside person n's detection frame.
Step 3: randomly sample 3 frames out of every 30 frames of each video segment, and compute the optical flow information of each pixel of the 3 frames using a dense optical flow method.
The dense optical flow calculation adopts the Farneback algorithm: take the images of two adjacent frames, regard each image as a two-dimensional signal (a function of pixel position), set a neighborhood around each pixel (generally a (2n+1) × (2n+1) square; with n = 2 this is a 5 × 5 square), and fit a functional relation between gray value and position by least squares. The image's original Cartesian two-dimensional signal space is thereby transformed into another vector space, and the pixel displacement difference between the two frames yields the optical flow.
Compared with the sparse optical flow method, the dense optical flow method has a better registration effect and can compare pedestrians' action changes in the image more accurately, providing more precise temporal information for the subsequent climbing recognition. Accurate temporal information reduces the number of samples required, so the random sampling method effectively improves detection efficiency.
Step 4: send the color images and optical flow information of the 3 frames into the climbing binary-classification double-flow network, classify the 30-frame group, and determine whether climbing behavior exists.
Climbing recognition is a binary classification task: each image buffer pool is classified as either climbing or non-climbing.
The double-flow network training process first requires making a binary classification dataset: extract every frame of the collected video segments, perform pedestrian detection, tracking and cropping as described above, compute each frame's optical flow, and store the cropped original images and the corresponding optical flow images according to the manually annotated class, with climbing as positive samples and non-climbing as negative samples. During training, randomly select 3 cropped original frames and the optical flow of the corresponding regions from the positive sample library as a positive sample, select negative samples in the same way, and send them into the double-flow network for classification training.
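Drawing one training example (3 cropped frames plus the matching optical flow crops) from a sample library can be sketched as below; the list-of-(rgb, flow)-pairs layout of the library is an assumption for illustration.

```python
import random

def sample_training_example(library, n_frames=3, rng=random):
    """Draw n_frames (rgb_crop, flow_crop) pairs without replacement from one
    sample library (positive = climbing, negative = non-climbing); the same
    routine serves both classes, mirroring the random selection above."""
    pairs = rng.sample(library, n_frames)
    rgb = [p[0] for p in pairs]
    flow = [p[1] for p in pairs]
    return rgb, flow
```

Calling it once on the positive library and once on the negative library yields one positive and one negative example per training step.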
As shown in fig. 3, the double-flow network adopted in this embodiment is a TSN (Temporal Segment Network). The TSN is a variant of the Siamese (twin) network, divided into a temporal convolutional network and a spatial convolutional network whose parameters are not shared. The three color images each pass through the spatial convolutional network, and the three outputs are fused by a segmental consensus function into the spatial segment consensus; the optical flow information of the three images passes through the temporal convolutional network, and the three outputs are fused by the segmental consensus function into the temporal segment consensus. Fusing the predictions of all modalities then yields the final prediction result.
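The segmental consensus and two-stream fusion can be sketched with numpy, taking averaging as the consensus function and an equal-weight sum as the stream fusion; both are common choices for TSN but are assumptions here, since the text does not fix them.

```python
import numpy as np

def tsn_predict(spatial_scores, temporal_scores, flow_weight=1.0):
    """spatial_scores / temporal_scores: (n_segments, n_classes) per-segment
    class scores from the spatial and temporal streams (here 3 x 2: three
    sampled frames, climbing vs non-climbing). Average over segments to get
    each stream's segment consensus, fuse the two streams, and return the
    predicted class index."""
    spatial_consensus = np.mean(spatial_scores, axis=0)    # spatial segment consensus
    temporal_consensus = np.mean(temporal_scores, axis=0)  # temporal segment consensus
    fused = spatial_consensus + flow_weight * temporal_consensus
    return int(np.argmax(fused))
```

With class index 0 as climbing and 1 as non-climbing (an arbitrary labeling), the returned index classifies the whole buffer pool.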
As shown in fig. 4a and 4b, which show the structures of the spatial convolutional network and the temporal convolutional network in the TSN double-flow network, the two networks differ only slightly at the input layer: the input to the spatial convolutional network is 224 × 224 × 3, i.e. the three RGB channels, and the input to the temporal convolutional network is 224 × 224 × 2, i.e. the two optical flow channels, vertical and horizontal.
After training, the double-flow network can perform binary classification on the 3 frames of 224 × 224 RGB images and optical flow images, thereby classifying the buffer pool.
The classification network obtained through learning has good robustness, and can be accurately classified under different illumination and different weather conditions.
In summary, in the climbing behavior detection method based on a deep learning double-flow network of this embodiment, the classification network obtained through learning has good robustness and classifies accurately under different illumination and weather conditions, so that multi-person behavior detection under complex conditions is realized; cropping the video removes redundant background information and greatly improves algorithm execution efficiency, while the pedestrian-tracking random sampling method effectively improves detection efficiency, making the method well worth popularizing.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (7)

1. A climbing behavior detection method based on a deep learning double-flow network is characterized by comprising the following steps:
s1: target detection, tracking and numbering
Carrying out pedestrian detection on the original video by using a target detection network to obtain a pedestrian detection frame; tracking by using the detection frame and the time sequence information of the video to obtain a target number;
s2: cropping a target video segment
Cutting the original video according to the detection frame and the target number, removing the area outside the detection frame, and storing the pedestrian with each number into a video segment again;
s3: random sampling
For each video segment obtained in step S2, randomly sampling a set number of frames out of every set number of frames, and calculating the optical flow information of each pixel of the sampled frames using a dense optical flow method;
s4: action classification
Sending the color image and optical flow information of each sampled frame into a climbing binary-classification double-flow network, classifying the set of frames, and determining whether climbing behavior exists.
2. The climbing behavior detection method based on a deep learning dual-stream network according to claim 1, characterized in that: in step S1, the target detection network is a YOLO network; multiple pedestrian targets in the video are processed simultaneously through the YOLO network, and the contour region of each pedestrian target is framed by a rectangular bounding box that serves as the candidate region for subsequent processing.
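A hedged sketch of the pedestrian-filtering stage implied by claim 2 — keeping only confident "person" detections from a YOLO-style detector's raw output — could look like the following. The detection tuple format, threshold, and class id are assumptions (class 0 is "person" in the common COCO convention), and no actual YOLO inference is shown:

```python
def person_boxes(detections, conf_thresh=0.5, person_class=0):
    """Filter raw detector output down to pedestrian bounding boxes.

    `detections` is assumed to be a list of (class_id, confidence,
    (x1, y1, x2, y2)) tuples, as typical YOLO post-processing would
    produce; only confident 'person' detections are kept.
    """
    return [box for cls, conf, box in detections
            if cls == person_class and conf >= conf_thresh]

dets = [(0, 0.92, (10, 20, 60, 180)),    # pedestrian, confident
        (2, 0.88, (100, 40, 300, 160)),  # car: ignored
        (0, 0.30, (5, 5, 15, 40))]       # pedestrian, too uncertain
boxes = person_boxes(dets)
```

Each retained box then becomes a candidate region for the cropping step of claim 3.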
3. The climbing behavior detection method based on a deep learning dual-stream network according to claim 2, characterized in that step S2 comprises:
S21: cropping each candidate region obtained in step S1, taking the larger of the length and width of the region's rectangular bounding box as the side length and the center point of the rectangle as the crop center to cut out a square region, and then resizing the image to a set size;
S22: creating an image buffer pool with a capacity of 30 for each person according to the pedestrian numbers obtained in step S1, placing the resized images into the pool, and executing step S3 once the buffer holds 30 images.
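The square-crop geometry of step S21 is simple enough to state exactly. A minimal sketch (function name assumed; real code would also clamp the square to the image borders, which this omits):

```python
def square_crop_box(x1, y1, x2, y2):
    """Expand a rectangular bounding box to a square, per step S21:
    side = max(width, height), centered on the rectangle's center.
    Note: the result may extend past the image edge; clamping or
    padding is left out of this sketch."""
    w, h = x2 - x1, y2 - y1
    side = max(w, h)
    cx, cy = x1 + w / 2, y1 + h / 2
    return (cx - side / 2, cy - side / 2, cx + side / 2, cy + side / 2)

# a tall pedestrian box (50 wide, 160 high) becomes a 160x160 square
sx1, sy1, sx2, sy2 = square_crop_box(10, 20, 60, 180)
```

The crop is then resized to the classifier's fixed input size, so elongated pedestrian boxes are not distorted by anisotropic scaling.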
4. The climbing behavior detection method based on a deep learning dual-stream network according to claim 3, characterized in that: the buffer pool is a temporary video store created for each pedestrian target and corresponds to that pedestrian's number; when its contents reach 30 images it is emptied and resumes storing new images without changing the associated pedestrian number; if the pool's contents are not updated for a long time, the pedestrian with the corresponding number is deemed to have left the video surveillance range, and the pool is destroyed once the preset update timeout is exceeded.
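The buffer-pool lifecycle of claim 4 (fill to 30, flush, stay bound to the same pedestrian id, destroy when stale) can be sketched as a small class. This is an illustration only; the class name and the timeout value are assumptions, and the 30-image capacity comes from the claim:

```python
import time

class PedestrianBuffer:
    """Per-pedestrian image buffer per claim 4: returns its contents
    every `capacity` images, and is considered stale (eligible for
    destruction) after `timeout` seconds without an update."""

    def __init__(self, pedestrian_id, capacity=30, timeout=5.0):
        self.pedestrian_id = pedestrian_id
        self.capacity = capacity
        self.timeout = timeout
        self.images = []
        self.last_update = time.monotonic()

    def add(self, image):
        """Store one cropped frame; return the full batch when the
        buffer fills, emptying it but keeping the same pedestrian id."""
        self.images.append(image)
        self.last_update = time.monotonic()
        if len(self.images) >= self.capacity:
            batch, self.images = self.images, []
            return batch
        return None

    def is_stale(self, now=None):
        """True once no frame has arrived within the timeout window."""
        now = time.monotonic() if now is None else now
        return now - self.last_update > self.timeout
```

A supervising loop would call `add` on each tracked detection and periodically destroy any buffer whose `is_stale` check passes.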
5. The climbing behavior detection method based on a deep learning dual-stream network according to claim 1, characterized in that: in step S3, the dense optical flow is computed with the Farneback algorithm: two adjacent frames are taken and each image is treated as a two-dimensional signal function; a neighborhood is set around every pixel, and a least-squares fit is used to build a functional relation between gray value and position, mapping the two-dimensional signal space of the image in the original Cartesian coordinate system into another vector space; the pixel displacement difference between the two frames is then obtained, yielding the optical flow.
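The least-squares gray-value/position relation in claim 5 is the core of gradient-based flow estimation. A pure-NumPy sketch of that principle — a single global least-squares solve on the brightness-constancy equation, far simpler than Farneback's per-neighborhood polynomial expansion — recovers a known translation between two synthetic frames:

```python
import numpy as np

def lstsq_flow(prev, curr):
    """Estimate one global (u, v) displacement by least squares on
    Ix*u + Iy*v = -It. Dense methods such as Farneback solve a
    related fit per pixel neighborhood; this collapses it to a single
    solve purely for illustration."""
    Iy, Ix = np.gradient(prev)    # spatial gradients (axis0 = y)
    It = curr - prev              # temporal gradient
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    b = -It.ravel()
    (u, v), *_ = np.linalg.lstsq(A, b, rcond=None)
    return u, v

# synthetic pair: second frame is the first shifted by 1 pixel in x
y, x = np.mgrid[0:64, 0:64].astype(float)
prev = np.sin(0.2 * x) + np.cos(0.15 * y)
curr = np.sin(0.2 * (x - 1)) + np.cos(0.15 * y)
u, v = lstsq_flow(prev, curr)
```

In practice one would call a library implementation of Farneback's method on consecutive grayscale frames rather than this global fit; the sketch only shows why a least-squares relation between intensity and position yields a displacement.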
6. The climbing behavior detection method based on a deep learning dual-stream network according to claim 1, characterized in that: in step S4, the dual-stream network is trained as follows:
S41: preparing a binary-classification training set: extracting every frame of the collected video segments; performing pedestrian detection, pedestrian tracking, and cropping as described above; computing the optical flow of each frame; and storing the cropped original images and the optical flow images of the corresponding positions according to the manually annotated class, with climbing images as positive samples and non-climbing images as negative samples;
S42: randomly selecting 3 cropped original frames, together with the optical flows of the corresponding regions, from the positive sample library as positive samples; selecting negative samples in the same way; feeding both into the dual-stream network for classification training; and saving the trained dual-stream network.
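The sample selection in step S42 hinges on keeping each cropped frame paired with the optical flow of its own region. A hedged sketch (all file names and library contents here are hypothetical placeholders, not the patent's data):

```python
import random

def pick_training_samples(frame_flow_pairs, k=3, rng=None):
    """Draw k (cropped frame, corresponding-region flow) pairs at
    random from one sample library, keeping every frame aligned with
    its own flow image, as in step S42."""
    rng = rng or random.Random()
    return rng.sample(frame_flow_pairs, k)

# hypothetical libraries: parallel (frame, flow) file pairs
positive_library = [(f"pos_frame_{i}.jpg", f"pos_flow_{i}.npy") for i in range(50)]
negative_library = [(f"neg_frame_{i}.jpg", f"neg_flow_{i}.npy") for i in range(50)]
pos_batch = pick_training_samples(positive_library, 3, random.Random(0))
neg_batch = pick_training_samples(negative_library, 3, random.Random(1))
```

Sampling pairs (rather than frames and flows independently) is what guarantees the spatial stream and temporal stream of the network see the same region of the same moment during training.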
7. A climbing behavior detection system based on a deep learning dual-stream network, characterized in that it detects climbing behavior by the method of any one of claims 1 to 6 and comprises:
a target detection module, configured to perform pedestrian detection on the original video with a target detection network to obtain pedestrian detection boxes, and to track with the detection boxes and the temporal information of the video to assign each target a number;
a segment cropping module, configured to crop the original video according to the detection boxes and target numbers, remove the regions outside the detection boxes, and re-save each numbered pedestrian as a separate video segment;
a random sampling module, configured to randomly sample a set number of frames out of every set frame count of each video segment and compute the optical flow information of every pixel of the sampled frames with a dense optical flow method;
an action classification module, configured to feed the color image and the optical flow information of each sampled frame into a binary-classification climbing dual-stream network, classify the set of frames, and determine whether climbing behavior is present;
and a central processing module, configured to send instructions to the other modules to complete the related actions;
wherein the target detection module, the segment cropping module, the random sampling module, and the action classification module are all electrically connected to the central processing module.
CN202110448771.7A 2021-04-25 2021-04-25 Deep learning double-flow network-based climbing behavior detection method and system Pending CN113052139A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110448771.7A CN113052139A (en) 2021-04-25 2021-04-25 Deep learning double-flow network-based climbing behavior detection method and system

Publications (1)

Publication Number Publication Date
CN113052139A true CN113052139A (en) 2021-06-29

Family

ID=76520585

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110448771.7A Pending CN113052139A (en) 2021-04-25 2021-04-25 Deep learning double-flow network-based climbing behavior detection method and system

Country Status (1)

Country Link
CN (1) CN113052139A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509859A (en) * 2018-03-09 2018-09-07 南京邮电大学 A kind of non-overlapping region pedestrian tracting method based on deep neural network
CN110378259A (en) * 2019-07-05 2019-10-25 桂林电子科技大学 A kind of multiple target Activity recognition method and system towards monitor video
US20200218901A1 (en) * 2019-01-03 2020-07-09 James Harvey ELDER System and method for automated video processing of an input video signal using tracking of a single moveable bilaterally-targeted game-object
CN111611912A (en) * 2020-05-19 2020-09-01 北京交通大学 Method for detecting pedestrian head lowering abnormal behavior based on human body joint points
US10814815B1 (en) * 2019-06-11 2020-10-27 Tangerine Innovation Holding Inc. System for determining occurrence of an automobile accident and characterizing the accident
CN111985402A (en) * 2020-08-20 2020-11-24 广东电网有限责任公司电力科学研究院 Substation security fence crossing behavior identification method, system and equipment
CN112183240A (en) * 2020-09-11 2021-01-05 山东大学 Double-current convolution behavior identification method based on 3D time stream and parallel space stream
CN112270310A (en) * 2020-11-24 2021-01-26 上海工程技术大学 Cross-camera pedestrian multi-target tracking method and device based on deep learning

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023000253A1 (en) * 2021-07-22 2023-01-26 京东方科技集团股份有限公司 Climbing behavior early-warning method and apparatus, electronic device, and storage medium
US11990010B2 (en) 2021-07-22 2024-05-21 Boe Technology Group Co., Ltd. Methods and apparatuses for early warning of climbing behaviors, electronic devices and storage media
CN113537165A (en) * 2021-09-15 2021-10-22 湖南信达通信息技术有限公司 Detection method and system for pedestrian alarm
CN113537165B (en) * 2021-09-15 2021-12-07 湖南信达通信息技术有限公司 Detection method and system for pedestrian alarm

Similar Documents

Publication Publication Date Title
CN111767882B (en) Multi-mode pedestrian detection method based on improved YOLO model
CN103824070B (en) A kind of rapid pedestrian detection method based on computer vision
WO2021098261A1 (en) Target detection method and apparatus
CN103839065B (en) Extraction method for dynamic crowd gathering characteristics
CN108062525B (en) Deep learning hand detection method based on hand region prediction
CN112001339A (en) Pedestrian social distance real-time monitoring method based on YOLO v4
CN110929593B (en) Real-time significance pedestrian detection method based on detail discrimination
CN113936256A (en) Image target detection method, device, equipment and storage medium
CN111814638B (en) Security scene flame detection method based on deep learning
CN109859246B (en) Low-altitude slow unmanned aerial vehicle tracking method combining correlation filtering and visual saliency
CN110852222A (en) Campus corridor scene intelligent monitoring method based on target detection
CN104392461B (en) A kind of video tracing method based on textural characteristics
CN107103299B (en) People counting method in monitoring video
CN113052139A (en) Deep learning double-flow network-based climbing behavior detection method and system
CN114399734A (en) Forest fire early warning method based on visual information
CN115841649A (en) Multi-scale people counting method for urban complex scene
Zhu et al. Towards automatic wild animal detection in low quality camera-trap images using two-channeled perceiving residual pyramid networks
CN112560619A (en) Multi-focus image fusion-based multi-distance bird accurate identification method
CN111008994A (en) Moving target real-time detection and tracking system and method based on MPSoC
CN111582074A (en) Monitoring video leaf occlusion detection method based on scene depth information perception
CN111723773A (en) Remnant detection method, device, electronic equipment and readable storage medium
CN108898098A (en) Early stage video smoke detection method based on monitor supervision platform
CN110414430B (en) Pedestrian re-identification method and device based on multi-proportion fusion
CN110348329B (en) Pedestrian detection method based on video sequence interframe information
CN111461076A (en) Smoke detection method and smoke detection system combining frame difference method and neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination