CN114581847A - Method and device for detecting abnormal behaviors of pedestrians in community based on GAM tracker

Publication number: CN114581847A (published 2022-06-03); granted as CN114581847B (2024-04-19)
Application number: CN202210208816.8A (filed 2022-03-04)
Authority: CN (China); legal status: Active (granted)
Applicant/Assignee: Shandong University of Science and Technology
Inventors: 王智慧, 李名帅, 刘辰旭, 孙瑞雪, 葛铭昌, 崔宾阁, 于建志, 包永堂

Classifications

    • G06F18/241 (Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches)
    • G06N3/045 (Computing arrangements based on biological models; neural networks; combinations of networks)
    • G06N3/08 (Computing arrangements based on biological models; neural networks; learning methods)

Abstract

The invention discloses a method and a device for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker. Through analysis and processing of the video images to be processed, targets are detected with a YOLOX network using an anchor-free method; the graph attention models of the selected candidates are fused and merged and the result is optimized with the GAM tracker, and key-point coordinates are determined from the matching scores of the candidate attention maps. Whether a target is running is judged with a center-of-gravity offset method; whether a target has fallen is judged by comparing the width-to-height ratio of the detection frame and its rate of change; and whether a target is loitering is judged by comparing the travelled distance with the number of reciprocating motions. When a target runs, falls, or loiters, its speed, detection frame, and moving distance change markedly and its number of reciprocating motions exceeds that of normal pedestrian motion, so the abnormal-behavior judgment is stable. The method improves detection of abnormal behaviors of pedestrians in a community and raises the efficiency of handling emergencies.

Description

Method and device for detecting abnormal behaviors of pedestrians in community based on GAM tracker
Technical Field
The invention relates to the technical field of deep learning and computer vision, in particular to a method and a device for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker.
Background
In today's society, incidents such as theft and falls of the elderly or children occur frequently. If no one witnesses such an incident when it happens, investigating the theft or resolving the problems caused by the fall in time can be difficult, which increases the working difficulty of security personnel.
The multi-pedestrian tracking method with a GAM (graph attention model) is built on a detection-based multi-target tracking framework and has demonstrated its capability on many datasets, maintaining a good tracking effect even in dense crowds. Compared with other existing tracking frameworks, the multi-pedestrian tracking method with the GAM tracker achieves better tracking accuracy and inference speed, and the number of switches of tracked target objects is significantly reduced.
In order to detect emergencies involving pedestrians in a community and give timely alarm feedback that improves the efficiency of security personnel, a new method for detecting abnormal behaviors of community pedestrians, built on the existing GAM, is urgently needed.
Disclosure of Invention
The invention provides a method and a device for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker. On the basis of the prior-art multi-pedestrian tracking method with a GAM tracker, abnormality judgment logic that analyzes pedestrian trajectories is added, and real-time monitoring of abnormal running, falling, and loitering of pedestrians is realized by virtue of the tracker's excellent tracking effect.
The specific technical scheme provided by the invention is as follows:
in one aspect, the invention provides a method for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker, which comprises the following steps:
inputting a video image data frame to be processed into a pre-trained YOLOX network, acquiring a high-resolution characteristic diagram, and detecting pedestrians by adopting a target detector;
based on the parallel output ends added to the YOLOX network for extracting target differentiation features and the anchor-free method thereof, fusion analysis is carried out in the GAM tracker with the added output ends as inputs to estimate a matching-degree heat map between the result to be detected and the tracked target, and correlation matching between the tracked target and the detection result is then carried out;
judging whether the target to be detected runs or not by adopting a normal distribution and threshold value method according to the target coordinate change speed detected in a preset time period based on the target feature extraction result; and/or,
calculating the ratio of the width to the height of the pedestrian according to the detection result of the target to be detected based on the target feature extraction result, and judging whether the target to be detected falls down or not according to the calculated width-to-height ratio; and/or,
judging whether the target to be detected is abnormally loitering or not by adopting a vector method and a threshold value method according to the relation between the moving distance and the reciprocating times of the target to be detected in the video area and a preset loitering distance threshold value on the basis of the target feature extraction result;
if at least one of running, falling or abnormal loitering of the target to be detected exists in the video image data frames to be processed, marking the video image data frames to be processed as data frames with community pedestrian abnormal behaviors.
Optionally, the inputting a video image data frame to be processed into a pre-trained YOLOX network to obtain a high-resolution feature map, and performing pedestrian detection by using a target detector specifically includes:
inputting the differentiated features output from the YOLOX network into the GAM tracker, expanding the image acquisition area of a target detected by the GAM tracker to at least twice the width and height of a candidate of the attention map, and positioning according to the center position of the image acquisition area of the detected target;
adjusting the image in the image acquisition area to a standard size [H_s, W_s] and then carrying out normalized extraction;
determining candidates based on a sparse selection policy, and extracting feature parameters corresponding to the image in the image acquisition region and the candidate directions, wherein the candidate width and height are each 1/2 of the standard size;
inputting at least one of the candidates to a graph attention model, estimating a correlation between the target and the candidate;
and fusing and combining the attention models of all the candidates, and outputting a target detection result through the GAM tracker.
Optionally, based on parallel output ends added to the YOLOX network for extracting the target differentiation features and an anchor-free method thereof, fusion analysis is performed in the GAM tracker with the added output ends as inputs to estimate a matching degree heatmap between the to-be-detected result and the tracked target, so as to perform correlation matching between the tracked target and the detected result, specifically including:
estimating a graph attention model after carrying out 3 × 3 convolution operation on the characteristics of the YOLOX network, and generating a characteristic graph through a 1 × 1 convolution layer;
performing 3 × 3 convolution operation on the feature map, estimating the size of a target frame, and generating a target type result through a 1 × 1 convolution layer;
performing 3 × 3 convolution operation on the characteristic diagram to obtain target center offset, and generating a frame size and target center offset through different 1 × 1 convolution layers;
and performing 3 × 3 convolution operation on the feature map, estimating the size of a target frame, and generating individual differentiation features through a 1 × 1 convolution layer.
Optionally, the estimating a graph attention model after performing a convolution operation on the features of the YOLOX network by 3 × 3, and generating a feature graph by a 1 × 1 convolution layer specifically includes:
performing non-maximum suppression (NMS) according to the graph attention score to extract peak keypoints, preserving coordinates of keypoints for which the graph attention score is greater than a threshold;
and generating a bounding box and individual differential characteristics according to the target classification, the center offset and the target frame size, and extracting identity embedding in the estimated target center.
Optionally, based on parallel output ends added to the YOLOX network for extracting the target differentiation features and an anchor-free method thereof, fusion analysis is performed in the GAM tracker with the added output ends as inputs to estimate a matching degree heatmap between the to-be-detected result and the tracked target, so as to perform association matching between the tracked target and the detected result, and before determining whether the to-be-detected target runs, the method further includes:
the position coordinate based on the target is [ h ]t,wt]Determining the coordinate of the corresponding body gravity center in the frame as [ hd,wd]And determining the coordinate of the corresponding body barycenter in the candidate as [ hc,wc]Wherein a positioning distance between the target and the frame to the body center of gravity is always larger than a minimum value of the positioning distances between the target and the candidate to the body center of gravity.
Optionally, the target-based body barycentric coordinate is [ h ]t,wt]Determining the coordinate of the center of gravity of the corresponding body in the pedestrian frame as [ hd,wd]And determining the sitting position of the corresponding body barycenter in the candidateMarked by [ h ]c,wc]The method specifically comprises the following steps:
initializing a plurality of sub-tracks according to the estimation of the frame in the first frame of the video image data frame to be processed;
connecting the detected frame with the existing tracking segment in the frame after the first frame in the video image data frame to be processed;
predicting the position of the target in the current frame by adopting a Kalman filtering function; and if the distance between the coordinate of the target and the predicted position is detected to be larger than a preset value, setting the corresponding cost to be infinite.
Optionally, the determining, based on the target feature extraction result, whether the target to be detected runs or not by using a normal distribution and a threshold method according to the target coordinate change speed detected within a preset time period specifically includes:
minimizing the image capture area based on a position of the video image data frame prediction target;
establishing a coordinate system for the video image, and recording and storing the identifications and coordinates of all currently detected targets;
collecting coordinate information and speeds of a preset number aiming at the same target, establishing Gaussian distribution model operation on the speed distribution, and updating the Gaussian model in real time based on an operation result;
performing image analysis on the image in the target frame to determine the target gravity center;
and when the target gravity center deviation amount is larger than a preset threshold and keeps a first preset duration, determining that the target runs abnormally, and marking the target frame.
Optionally, calculating the ratio of the width to the height according to the detection result of the target to be detected based on the target feature extraction result, and judging whether the target to be detected falls down according to the calculated width-to-height ratio, specifically includes:
detecting width and height information of frame regression size of the target at intervals of a first fixed frame number, and calculating a width and height ratio of the target;
when the width-height ratio continuously detected within a second preset time length is larger than a preset width-height threshold value, judging that the target falls down abnormally, and marking the target frame.
Optionally, the determining, based on the target feature extraction result, whether the target to be detected has abnormal loitering by using a vector method and a threshold method in combination according to a relationship between a distance and a reciprocating frequency of the target to be detected moving in the video region and a preset loitering distance threshold, specifically includes:
recording the reciprocating motion times of the pedestrians in the video image, recording the central position of the target as a target motion track at intervals of a second fixed frame number, and establishing a motion direction vector of the target;
if the included angle of the motion direction vectors at the t-th moment and the t-1 th moment is larger than a preset value, determining that the motion direction vector is a reciprocating motion;
when the reciprocating times reach a preset threshold value, judging that the target is abnormally loitering, and marking the target frame.
In one aspect, the invention further provides a GAM tracker-based community pedestrian abnormal behavior detection apparatus comprising computer readable instructions which, when run on a computer, cause the computer to perform any of the methods described above.
The invention has the following beneficial effects:
the invention provides a method and a device for detecting abnormal behaviors of pedestrians in a community of a GAM tracker, which are used for detecting targets by adopting a YOLOX method and an anchor-free method through analysis and processing of video images to be processed, fusing and merging later selected attention models, optimizing by using the GAM tracker, determining coordinates of key points according to matching scores of candidate attention maps, and judging whether the targets run quickly by using a gravity offset judgment method; judging whether the target falls down by using a method for comparing the length-width ratio and the change rate of the frame; comparing the travel distance to the number of reciprocations is used to determine whether the target wanders. When the detection target runs, falls and wanders, the speed, the detection frame and the moving distance of the target change obviously when the reciprocating times are larger than the normal movement times of pedestrians, the abnormal behavior judgment result is stable, the method for detecting the abnormal behaviors of the pedestrians in the community is improved, and the working efficiency of emergencies is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of inputting a frame of video image data to be processed into a pre-trained YOLOX network to obtain an output result according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the working principle of the GAM tracker employed in an embodiment of the present invention;
fig. 4 is a schematic flowchart illustrating an exemplary process of determining whether a target to be detected runs according to an embodiment of the present invention;
fig. 5 is a schematic flowchart of an exemplary process for determining whether a target to be detected falls according to an embodiment of the present invention;
fig. 6 is a schematic flowchart illustrating an exemplary process of determining whether an abnormal loitering exists in a target to be detected according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "comprises," "comprising," and "having," and any variations thereof, in the description and claims of this invention, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
A method and an apparatus for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker according to an embodiment of the present invention will be described in detail with reference to fig. 1 to 6.
Referring to fig. 1, a method for detecting abnormal behaviors of a community pedestrian based on a GAM tracker according to an embodiment of the present invention includes:
s1: and inputting a video image data frame to be processed into a pre-trained YOLOX network, acquiring a high-resolution characteristic diagram, and detecting pedestrians by adopting a target detector.
The working principle diagram of the GAM tracker used in the embodiment of the present application is shown in fig. 3.
In the process of acquiring the high-resolution feature map, ResNet-34 (a residual neural network) is adopted as the backbone network so as to strike a good balance between accuracy and speed. Pedestrian detection is performed by the YOLOX detector; to improve the efficiency of subsequent computation, only pedestrian detection is retained, and the prediction parameters of the other categories are pruned.
Specifically, fig. 2 shows a schematic flow chart of inputting a video image data frame to be processed into a previously trained YOLOX network to obtain an output result. The method specifically comprises the following steps:
s11: inputting the differentiated features output from the YOLOX network to a GAM tracker, expanding an image-capturing region of a target detected by the GAM tracker to at least twice the width-height of a candidate of the map, and performing positioning according to the center position of the image-capturing region of the detected target.
S12: adjusting an image in an image acquisition area to a standard size [ H ]s,Ws]And then carrying out normalized extraction.
S13: determining candidates based on a sparsity selection policy, and extracting feature parameters corresponding to an image in the image-acquisition region and the candidate directions, wherein the candidate widths and heights are 1/2 of the standard size, respectively.
In the calculation process, based on the sparse selection strategy, a plurality of candidates of size [H_c, W_c] = [H_s/2, W_s/2] are selected, and the corresponding features s_c are extracted from the features uniformly extracted over the search area.
S14: at least one of the candidates is input into the graph attention model, and the correlation between the target and the candidate is estimated.
For each valid i-th tracked target, the feature template combines the features S_i = {s_{i,1}, s_{i,2}, ..., s_{i,n}} extracted from the rectangles detected over the most recent n frames. To improve computational efficiency, n is set to 10. To keep feature sizes consistent, each detected rectangle is also resized to [H_t, W_t] = [H_c, W_c] before feature extraction.
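As a minimal sketch of how such a rolling feature template could be maintained (the callable `extract_features`, the channel layout, and the candidate size values are illustrative assumptions; n = 10 and the resizing to [H_c, W_c] follow the description above):

```python
from collections import deque

import cv2
import numpy as np

N_TEMPLATE = 10          # n: number of recent frames kept per target
HC, WC = 64, 32          # [H_c, W_c]: candidate size (illustrative values)

class FeatureTemplate:
    """Rolling template S_i = {s_{i,1}, ..., s_{i,n}} of one tracked target."""

    def __init__(self):
        self.features = deque(maxlen=N_TEMPLATE)   # oldest entry drops past n

    def update(self, frame, box, extract_features):
        """Crop the detected rectangle, resize to [H_c, W_c], extract features."""
        x, y, w, h = box
        crop = frame[y:y + h, x:x + w]
        crop = cv2.resize(crop, (WC, HC))          # keep feature sizes consistent
        self.features.append(extract_features(crop))

    def as_array(self):
        return np.stack(self.features)             # (n, ...) stacked features
```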
The graph attention model of the GAM tracker is used to estimate the correlation between the i-th tracked target and each candidate c_j; every feature channel has the same size [H_c, W_c]. The graph attention model is estimated and normalized from the correlation s(h, w, n_c) between the i-th target and each candidate c_j at the given position (formula (1); the original formula image is not reproduced).
and S15, fusing and merging the graph attention models of all the candidates, and outputting a target detection result through the GAM tracker.
Fused ofGraph attention model Mi,jBased on the graphical attention model of all these candidates, the calculation method is as follows:
Figure BDA0003532241210000082
and
Figure BDA0003532241210000083
wherein c isj∈CjAnd CjIs the candidate set for the jth detected pedestrian, nc∈{1,...,NCIs the channel index of the merged attention map, NcThe number of channels is the same as the number of extracted characteristic channels.
Attention model M of merged graphi,jAs an output map of the GAM tracker.
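The fusion can be pictured with a small numpy sketch. Since formulas (1)-(3) above are not reproduced, the softmax normalization over spatial positions and the element-wise maximum over candidates below are assumptions standing in for the patent's exact definitions:

```python
import numpy as np

def attention_map(target_feat, cand_feat):
    """Correlation s(h, w, n_c) between a pooled target template vector and
    one candidate, normalized over spatial positions (assumed softmax)."""
    # target_feat: (C,), cand_feat: (C, H, W) -> per-channel correlation
    s = target_feat[:, None, None] * cand_feat          # (N_c, H_c, W_c)
    flat = s.reshape(s.shape[0], -1)
    m = np.exp(flat - flat.max(axis=1, keepdims=True))
    m /= m.sum(axis=1, keepdims=True)                   # normalize per channel
    return m.reshape(cand_feat.shape)

def fuse_attention_maps(maps):
    """Merge the maps of all candidates C_j into M_ij (assumed: element-wise
    maximum over candidates, one merged map per channel n_c)."""
    return np.max(np.stack(maps), axis=0)               # (N_c, H_c, W_c)

# usage: one map per candidate c_j of the j-th detected pedestrian
# fused = fuse_attention_maps([attention_map(t, c) for c in candidates])
```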
S2: based on parallel output ends added to the YOLOX network and used for extracting target differentiation features and an anchor-free method thereof, fusion analysis is carried out in a GAM tracker by taking the added output ends as input ends, the fusion analysis is used for estimating a matching degree heat map between a result to be detected and a tracking target, and then correlation matching between the tracking target and the detection result is carried out.
In this process, the difference between the predicted result and the actual ground truth is measured with a preset loss function. Specifically:
a graph attention model is estimated after a 3 × 3 convolution operation on the features of the YOLOX network, and a feature map is generated through a 1 × 1 convolution layer;
(I) a 3 × 3 convolution operation is performed on the feature map, the size of the target frame is estimated, and a target type result is generated through a 1 × 1 convolution layer;
performing non-maximum suppression (NMS) according to the graph attention score to extract peak keypoints, preserving coordinates of keypoints with graph attention scores greater than a threshold, then generating bounding boxes according to the target classification, center offset and the target bounding box size, and individual differentiation features, and extracting identity embedding in the estimated target center.
(II) performing a 3 × 3 convolution operation on the feature map to obtain the target center offset, and generating the target frame size and the target center offset through different 1 × 1 convolution layers;
and (III) performing 3 × 3 convolution operation on the feature map, estimating the size of a target frame, and generating individual difference features through a 1 × 1 convolution layer.
For the frame-size and center-offset outputs above, the L1 loss function is used, and the offset loss and the size loss are computed together.
The goal of the target-center-offset output is to locate objects more accurately. Since the output map has a stride of 4, downsampling introduces a non-negligible quantization error; this branch estimates the continuous offset of each pixel relative to the object center to mitigate the effect of downsampling. The bounding-box-size output is responsible for estimating the height and width of the object's bounding box at each coordinate location.
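A rough PyTorch sketch of these parallel output ends follows. The channel widths, the identity-embedding dimension, and the single pedestrian class are illustrative assumptions; each head is simply the 3 × 3 convolution followed by a 1 × 1 convolution described above:

```python
import torch
import torch.nn as nn

def head(in_ch, mid_ch, out_ch):
    """One parallel output end: 3x3 conv for estimation, 1x1 conv for output."""
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, kernel_size=1),
    )

class ParallelHeads(nn.Module):
    """Parallel heads on top of the YOLOX feature map (stride-4 output map)."""

    def __init__(self, in_ch=256, id_dim=128, num_classes=1):
        super().__init__()
        self.heatmap = head(in_ch, 256, num_classes)  # attention / class heatmap
        self.box_size = head(in_ch, 256, 2)           # frame height and width
        self.offset = head(in_ch, 256, 2)             # sub-pixel center offset
        self.identity = head(in_ch, 256, id_dim)      # individual differentiation features

    def forward(self, feat):
        return {
            "heatmap": torch.sigmoid(self.heatmap(feat)),
            "size": self.box_size(feat),
            "offset": self.offset(feat),
            "id": self.identity(feat),
        }
```

Peak key points would then be extracted from the sigmoid heatmap by non-maximum suppression, keeping coordinates whose attention score exceeds the threshold, as described above.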
After that, based on the part coordinate [h_t, w_t] on the target, the coordinate of the corresponding body center of gravity in the frame is determined as [h_d, w_d], and the coordinate of the corresponding body center of gravity in the candidate as [h_c, w_c], where the positioning distance from the target to the body center of gravity in the frame is always larger than the minimum of the positioning distances from the target to the body center of gravity over the candidates. In particular,
assume the position of a specific body center of gravity on the target is [h_t, w_t], and the same body center of gravity on the detected rectangle is [h_d, w_d]. When candidates are selected, the position of the body center of gravity on each candidate is [h_c, w_c], where c_j ∈ C_j. The positioning distance D_{i,j} of the body center of gravity between the target and the detected rectangle always satisfies:
D_{i,j} ≥ min_{c_j ∈ C_j} D_{i,c_j} #(4)
in particular, this step may be performed according to the following sub-steps:
initializing a plurality of sub-tracks according to the estimation of the frame in the first frame of the video image data frame to be processed.
Connecting the detected frame with the existing tracking segment in the frame after the first frame in the video image data frame to be processed;
predicting the position of the target in the current frame by adopting a Kalman filtering function; and if the distance between the coordinate of the target and the predicted position is detected to be larger than a preset value, setting the corresponding cost to be infinite.
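The infinite-cost gating just described can be sketched as follows. The Euclidean-distance cost and the gating radius `max_dist` are assumptions; the text only specifies that pairs farther apart than a preset value receive infinite cost:

```python
import numpy as np

INF = float("inf")

def gated_cost_matrix(predicted, detected, max_dist=50.0):
    """Cost of matching each Kalman-predicted track position to each detection.
    Pairs farther apart than max_dist get infinite cost, as described above."""
    cost = np.zeros((len(predicted), len(detected)))
    for i, p in enumerate(predicted):
        for j, d in enumerate(detected):
            dist = np.hypot(p[0] - d[0], p[1] - d[1])
            cost[i, j] = dist if dist <= max_dist else INF
    return cost
```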
S3: and judging whether the target to be detected runs or not by adopting normal distribution and a threshold value method according to the target coordinate change speed detected in a preset time period based on the target characteristic extraction result.
As shown in fig. 4, a schematic flowchart of the process of determining whether a running occurs to a target to be detected is exemplarily shown, specifically:
s31: minimizing the image capture area based on a location of a predicted target of the frame of video image data.
For each frame, predicting the location of the target before target association can minimize the search area and improve the accuracy of the association. The method utilizes a Kalman filtering method to predict the target position. The current position and velocity of the target is first predicted using a kalman filter, and the parameters of the kalman filter are updated with previously collected information.
The prediction step is performed by the following formula:
x̂_t = F_t x̂_{t-1} + u_t,   P_t = F_t P_{t-1} F_tᵀ + Q_t #(5)
wherein x̂_t contains the predicted position and velocity of the i-th target on the t-th frame, x̂_{t-1} contains the predicted position and velocity of the i-th target in the (t-1)-th frame, F_t is the state transition matrix, u_t is the control vector, P_t is the covariance matrix of the prediction result x̂_t, and Q_t is the noise covariance matrix of the prediction process.
After the tracking process of the current frame is finished, the parameters of the Kalman filter are updated to adapt to the actual situation. The update step is performed by the following formula:
K = P_t Hᵀ (H P_t Hᵀ + R_t)⁻¹,   x̂′_t = x̂_t + K (z_t - H x̂_t),   P′_t = (I - K H) P_t #(6)
wherein K is the Kalman gain, z_t is the observed position of each target, R_t is the covariance of the uncertainty of the observation z_t (i.e. the covariance of the sensor noise), H is the observation matrix, x̂′_t is the updated x̂_t, and P′_t is the updated P_t; these are fed back into the next round of prediction and update.
The target position predicted by Kalman filtering is compared with the actually detected target position; when the distance between the two is greater than a set threshold K_kal, the target is considered to have run abnormally.
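A self-contained sketch of this predict-and-update cycle, assuming a constant-velocity state [x, y, vx, vy], a zero control vector u_t, and illustrative noise magnitudes (formulas (5) and (6)):

```python
import numpy as np

class KalmanTracker:
    """Constant-velocity Kalman filter for one target: state [x, y, vx, vy]."""

    def __init__(self, x, y, q=1e-2, r=1.0):
        self.x = np.array([x, y, 0.0, 0.0])                      # state estimate
        self.P = np.eye(4)                                       # covariance P_t
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0    # transition F_t
        self.H = np.eye(2, 4)                                    # observe position only
        self.Q = q * np.eye(4)                                   # process noise Q_t
        self.R = r * np.eye(2)                                   # observation noise R_t

    def predict(self):
        self.x = self.F @ self.x                  # formula (5), u_t assumed zero
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                         # predicted position

    def update(self, z):
        z = np.asarray(z, dtype=float)            # z_t: detected position
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain, formula (6)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

When the Euclidean distance between `predict()` and the associated detection exceeds K_kal, the target is flagged as running abnormally, as described above.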
S32: and establishing a coordinate system for the video image, and recording and storing the identifications and coordinates of all the currently detected targets.
And establishing a rectangular coordinate system by taking the upper left corner of the image as a coordinate origin, the direction of the width as an X coordinate and the direction of the height as a Y coordinate. Continuously recording all currently detected targets and corresponding center position coordinates by a fixed interval frame number i and storing:
Posx[id][t]=x#(7)
Posy[id][t]=y#(8)
wherein the subscript t is automatically increased by 1 after each storage, and t in the above two formulas always remains synchronized. The velocity of the target is then calculated according to the following formula:
V_now[id][t] = sqrt((Posx[id][t] - Posx[id][t-1])² + (Posy[id][t] - Posy[id][t-1])²) / i #(9)
wherein i is the number of frames between acquisitions of the coordinate position; the velocity V of the target is calculated from the previously stored position information, and the subscript t stored in V_now[id][t] is automatically increased by 1 after each storage. At the same time, according to the formula:
K[id][t] = V_now[id][t] / V_now[id][t-1] #(10)
the ratio K of the target's current speed to its last detected speed is calculated and stored in K[id][t], with t increased by 1 after each storage.
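These bookkeeping formulas translate almost directly into code. A sketch assuming an interval of i = 5 frames (the dictionaries mirror Posx, Posy, V_now and K; all names are illustrative):

```python
import math
from collections import defaultdict

I_FRAMES = 5                          # interval i between stored samples

Posx = defaultdict(list)              # Posx[id][t], formula (7)
Posy = defaultdict(list)              # Posy[id][t], formula (8)
V_now = defaultdict(list)             # V_now[id][t], formula (9)
K_ratio = defaultdict(list)           # K[id][t], formula (10)

def record(target_id, x, y):
    """Store a center position; t advances by 1 with every store."""
    Posx[target_id].append(x)
    Posy[target_id].append(y)
    t = len(Posx[target_id]) - 1
    if t >= 1:
        dx = Posx[target_id][t] - Posx[target_id][t - 1]
        dy = Posy[target_id][t] - Posy[target_id][t - 1]
        V_now[target_id].append(math.hypot(dx, dy) / I_FRAMES)   # (9)
    if len(V_now[target_id]) >= 2:
        v = V_now[target_id]
        K_ratio[target_id].append(v[-1] / max(v[-2], 1e-6))      # (10)
```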
S33: and collecting coordinate information and speeds of a preset number aiming at the same target, establishing Gaussian distribution model operation on the speed distribution, and updating the Gaussian model in real time based on an operation result.
After a certain amount of coordinate information of the same target is collected and the speed is calculated, a Gaussian distribution model is built for the speed distribution of the target according to the existing speed information:
v ~ N(μ, σ²) #(11)
where μ is the mean value and σ² is the variance; the Gaussian model of the target's velocity distribution is updated in real time after each velocity calculation.
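The real-time update of formula (11) can be realized with a running mean and variance; Welford's online update below is one reasonable way to do this, since the patent does not specify the update rule:

```python
class SpeedModel:
    """Online estimate of v ~ N(mu, sigma^2) for one target, formula (11)."""

    def __init__(self):
        self.n = 0
        self.mu = 0.0
        self.m2 = 0.0          # running sum of squared deviations

    def update(self, v):
        self.n += 1
        delta = v - self.mu
        self.mu += delta / self.n                 # running mean
        self.m2 += delta * (v - self.mu)          # Welford update

    @property
    def sigma2(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def is_outlier(self, v, k=3.0):
        """Flag speeds more than k standard deviations above the mean."""
        return self.n > 1 and v > self.mu + k * self.sigma2 ** 0.5
```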
S34: and carrying out image analysis on the images in the frame of the target to determine the gravity center of the target.
The image is cropped according to the target frame to obtain an individual image of each target. A coordinate system is established with the upper-left corner of the cropped image as the origin, downward as the positive Y axis and rightward as the positive X axis; the coordinates (x, y) of each pixel are obtained, and the maxima in the X and Y directions are recorded as x_max and y_max. The picture is then processed with OpenCV to obtain a grayscale image of the target, the histogram corresponding to the grayscale image is computed, and the value range in which gray values are concentrated is selected according to the distribution of values in the histogram. This range is taken as the gray-value interval D_foreground of the target, and other values are treated as background D_background. A foreground mask identification formula is defined:
mask(x, y) = 1 if gray(x, y) ∈ D_foreground, otherwise 0 #(12)
The grayscale image is then traversed pixel by pixel according to the formulas:
x̄ = Σ mask(x, y)·x / Σ mask(x, y) #(13)
ȳ = Σ mask(x, y)·y / Σ mask(x, y) #(14)
to calculate the target center of gravity, where the (x, y) coordinates are taken in the coordinate system whose origin is the upper-left corner of the cropped image.
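A compact OpenCV realization of formulas (12)-(14) follows. Otsu thresholding is used here as an assumed stand-in for the histogram-based "concentrated gray values" selection described above:

```python
import cv2
import numpy as np

def center_of_gravity(bgr_crop):
    """Foreground mask (12) and centroid (13)-(14) of one cropped target."""
    gray = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2GRAY)
    # Otsu's threshold splits the gray-level histogram into foreground/background
    _, mask = cv2.threshold(gray, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    ys, xs = np.nonzero(mask)            # pixels with mask(x, y) == 1
    if len(xs) == 0:
        return None                      # no foreground found
    return xs.mean(), ys.mean()          # (x_bar, y_bar), origin at top-left
```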
S35: and when the target gravity center deviation amount is larger than a preset threshold and keeps a first preset duration, determining that the target runs abnormally, and marking the target frame.
The person's center of gravity, the image center, and the image width and height are calculated, and the proportion K_w by which the target center of gravity deviates from the image center is computed (formula (15); the original formula image is not reproduced). When K_w is greater than the set threshold and is maintained for a certain time t_thr, the target is considered to be running abnormally and is marked.
According to a preset speed-ratio threshold K_thr, a discrete threshold v_thr, and the Y coordinate of the target, the corresponding adaptive thresholds V_m and K_m are calculated in real time from the target's Y coordinate (formulas (16) and (17); the original formula images are not reproduced), where Posy[id][t] is the Y coordinate of the target and S is a preset fixed parameter. If V > V_m or K > K_m, the target is regarded as running abnormally and is marked for display in the video.
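Putting the running test together as a sketch: since formulas (16) and (17) are not reproduced, the linear dependence of the adaptive thresholds on the Y coordinate below is only an assumed form, motivated by the fact that targets lower in the frame are closer to the camera and appear to move faster:

```python
def is_running(v, k_ratio, y, S=1000.0, v_thr=8.0, K_thr=2.0):
    """Adaptive running test on speed v and speed ratio k_ratio.
    The linear scaling below is an ASSUMED form of formulas (16)-(17)."""
    V_m = v_thr * y / S          # assumed form of formula (16)
    K_m = K_thr * y / S          # assumed form of formula (17)
    return v > V_m or k_ratio > K_m
```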
S4: calculating the ratio of the width to the height of the pedestrian according to the detection result of the target to be detected based on the target feature extraction result, and judging whether the target to be detected falls down or not according to the calculated width-to-height ratio.
Fig. 5 exemplarily shows a schematic flow chart for determining whether the target to be detected falls, specifically:
s41: detecting width and height information of frame regression size of the target at intervals of a first fixed frame number, and calculating a width and height ratio of the target;
s42: when the width-height ratio continuously detected within a second preset time length is larger than a preset width-height threshold value, judging that the target falls down abnormally, and marking the target frame.
Illustratively, according to the detected width and height information of the target frame, every fixed number of frames i, the width-to-height ratio of the target is computed according to the formula:
ratio[id][t] = w[id][t] / h[id][t] #(18)
When the width-to-height ratio is detected to be larger than the set width-to-height threshold several times in succession and this persists for a certain time t_thr, the target is judged to have fallen abnormally, marked, and displayed in the video.
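A sketch of this fall test based on formula (18); the ratio threshold, the check interval, and the number of consecutive hits standing in for t_thr are illustrative values:

```python
from collections import defaultdict

RATIO_THR = 1.2        # width/height above this suggests a lying posture
HITS_NEEDED = 5        # consecutive over-threshold detections (covers t_thr)

_fall_hits = defaultdict(int)

def check_fall(target_id, w, h):
    """Formula (18): ratio = w / h, checked every i-th frame."""
    ratio = w / max(h, 1e-6)
    if ratio > RATIO_THR:
        _fall_hits[target_id] += 1
    else:
        _fall_hits[target_id] = 0              # streak broken
    return _fall_hits[target_id] >= HITS_NEEDED
```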
S5: and judging whether the target to be detected has abnormal loitering or not by adopting a vector method and a threshold value method according to the relation between the moving distance and the reciprocating times of the target to be detected in the video area and a preset loitering distance threshold value based on the target feature extraction result.
Fig. 6 exemplarily shows a flow chart for determining whether there is an abnormal loitering in the target to be detected, specifically:
s51: and recording the central position of the target as a target motion track at intervals of a second fixed frame number, and establishing a motion direction vector of the target.
The total target movement distance Dis_T and the target displacement Dis_P (the straight-line distance between the current target position and the initial position) are initialized to 0. The center-position coordinates of the target corresponding to each id are recorded every fixed interval of i frames, and the coordinate position information is stored based on formulas (7) and (8). The values of the movement distance Dis_T and the displacement Dis_P at time t are then updated according to the following formulas:
Dis_T[id][t] = Dis_T[id][t-1] + sqrt((Posx[id][t] - Posx[id][t-1])² + (Posy[id][t] - Posy[id][t-1])²) #(19)
Dis_P[id][t] = sqrt((Posx[id][t] - Posx[id][0])² + (Posy[id][t] - Posy[id][0])²) #(20)
A reciprocation value bf is set for the target corresponding to each id to record the number of reciprocating motions of the target in the video. The initial value of bf is set to 0, and from the coordinate information stored by formulas (7) and (8), the trajectory point at time t is recorded by the following formula:
postmp[id][t] = {posx[id][t], posy[id][t]} #(21)
The central position corresponding to each id is recorded as the target walking track every fixed interval of i frames. Then, based on the motion trail of the target, the motion direction vector is built according to the formula:
vector[id][t] = {postmp[id][t-1], postmp[id][t]} #(22)
s52: and if the included angle of the motion direction vectors at the t-th moment and the t-1 th moment is larger than a preset value, determining that the motion direction vector is a reciprocating motion.
The motion direction vector of the target is thus established, and the reciprocation value bf kept for each id records the number of reciprocating motions of the target in the video, starting from 0. If the condition for times t and t-1 is satisfied:
angle(vector[id][t-1], vector[id][t]) > angle_thr #(23)
that is, if the included angle between the two successive motion direction vectors is larger than the set threshold angle_thr, the target is considered to have turned back, and bf[id] is increased by 1.
When Dis_T[id] is larger than 1.5·Dis_P[id], or bf[id] is larger than or equal to 3, the target is judged to be loitering abnormally, marked, and displayed in the video.
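The loitering logic of formulas (19)-(23) in compact form; the angle threshold value is illustrative, while the 1.5 × displacement rule and bf ≥ 3 come from the text above:

```python
import math

ANGLE_THR = math.radians(120)    # illustrative turn-back threshold angle_thr

def included_angle(v1, v2):
    """Included angle between two motion-direction vectors, formula (23)."""
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))

def is_loitering(track):
    """track: list of (x, y) centers sampled every i frames."""
    if len(track) < 2:
        return False
    dis_t, bf = 0.0, 0
    prev_vec = None
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dis_t += math.hypot(x1 - x0, y1 - y0)            # Dis_T, formula (19)
        vec = (x1 - x0, y1 - y0)
        if prev_vec and included_angle(prev_vec, vec) > ANGLE_THR:
            bf += 1                                      # turn-back counted
        prev_vec = vec
    dis_p = math.hypot(track[-1][0] - track[0][0],
                       track[-1][1] - track[0][1])       # Dis_P, formula (20)
    return dis_t > 1.5 * dis_p or bf >= 3
```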
S53: when the reciprocating times reach a preset threshold value, judging that the target is abnormally loitering, and marking the target frame.
S6: if at least one of running, falling or abnormal loitering of the target to be detected exists in the video image data frames to be processed, marking the video image data frames to be processed as data frames with community pedestrian abnormal behaviors.
The invention provides a method and a device for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker. Through analysis and processing of the video images to be processed, targets are detected with a YOLOX network using an anchor-free method; the graph attention models of the selected candidates are fused and merged and the result is optimized with the GAM tracker, and key-point coordinates are determined from the matching scores of the candidate attention maps. Whether a target is running is judged with a center-of-gravity offset method; whether a target has fallen is judged by comparing the width-to-height ratio of the detection frame and its rate of change; and whether a target is loitering is judged by comparing the travelled distance with the number of reciprocating motions. When a target runs, falls, or loiters, its speed, detection frame, and moving distance change markedly and its number of reciprocating motions exceeds that of normal pedestrian motion, so the abnormal-behavior judgment is stable. The method improves detection of abnormal behaviors of pedestrians in a community and raises the efficiency of handling emergencies.
Based on the same inventive concept, an embodiment of the present invention further provides a community pedestrian abnormal behavior detection apparatus based on a GAM tracker, which is characterized by comprising computer readable instructions, when the computer readable instructions are executed on a computer, the computer is caused to execute any one of the above methods.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (10)

1. A community pedestrian abnormal behavior detection method based on a GAM tracker is characterized by comprising the following steps:
inputting a video image data frame to be processed into a pre-trained YOLOX network to obtain a high-resolution feature map, and adopting a target detector to detect pedestrians;
based on parallel output ends added to the YOLOX network for extracting target differentiation characteristics and an anchor-free method thereof, fusion analysis is carried out in a GAM tracker by taking the added output ends as input ends to estimate a matching degree heat map between a result to be detected and a tracking target, and then correlation matching between the tracking target and the detection result is carried out;
judging whether the target to be detected runs or not by adopting a normal distribution and threshold value method according to the target coordinate change speed detected in a preset time period based on the target feature extraction result; and/or,
calculating the ratio of the width to the height of the pedestrian according to the detection result of the target to be detected based on the target feature extraction result, and judging whether the target to be detected falls down or not according to the calculated width-to-height ratio; and/or,
judging whether the target to be detected is abnormally loitering or not by adopting a vector method and a threshold value method according to the relation between the moving distance and the reciprocating times of the target to be detected in the video area and a preset loitering distance threshold value on the basis of the target feature extraction result;
if at least one of running, falling or abnormal loitering of the target to be detected exists in the video image data frames to be processed, marking the video image data frames to be processed as data frames with community pedestrian abnormal behaviors.
2. The method according to claim 1, wherein the step of inputting the video image data frame to be processed into a previously trained YOLOX network to obtain a high-resolution feature map and performing pedestrian detection by using a target detector comprises:
inputting the differentiated features output from the YOLOX network into the GAM tracker, expanding the image acquisition area of a target detected by the GAM tracker to at least twice the width and height of a candidate of the attention map, and positioning according to the center position of the image acquisition area of the detected target;
adjusting the image in the image acquisition area to a standard size [H_s, W_s] and then carrying out normalized extraction;
determining candidates based on a sparse selection policy, and extracting feature parameters corresponding to the image in the image acquisition region and the candidate directions, wherein the candidate width and height are each 1/2 of the standard size;
inputting at least one of the candidates to a graph attention model, estimating a correlation between the target and the candidate;
and fusing and combining the attention models of all the candidates, and outputting a target detection result through the GAM tracker.
3. The method according to claim 1, wherein the parallel output ends added to the YOLOX network for extracting the differentiated features of the target and the anchor-free method thereof are used as inputs in a GAM tracker for fusion analysis to estimate a heat map of matching degree between the result to be detected and the tracked target, and further perform the correlation matching between the tracked target and the detected result, and specifically comprises:
estimating a graph attention model after carrying out 3 × 3 convolution operation on the characteristics of the YOLOX network, and generating a characteristic graph through a 1 × 1 convolution layer;
performing 3 × 3 convolution operation on the feature map, estimating the size of a target frame, and generating a target type result through a 1 × 1 convolution layer;
performing 3 × 3 convolution operation on the characteristic diagram to obtain target center offset, and generating a target frame size and target center offset through different 1 × 1 convolution layers;
and performing 3 × 3 convolution operation on the feature map, estimating the size of a target frame, and generating individual differentiation features through a 1 × 1 convolution layer.
4. The method according to claim 3, wherein the estimating a graph attention model after performing a 3 × 3 convolution operation on the features of the YOLOX network and generating a feature graph through a 1 × 1 convolution layer specifically comprises:
performing non-maximum suppression (NMS) according to the graph attention score to extract peak keypoints, preserving coordinates of keypoints for which the graph attention score is greater than a threshold;
and generating a boundary frame and individual differentiation characteristics according to the target classification, the center offset and the target frame size, and extracting identity embedding from the estimated target center.
5. The method for detecting abnormal pedestrian behaviors in community according to claim 1, wherein a parallel output end added to the YOLOX-based network for extracting differentiated features of targets and an anchor-free method thereof are used to perform fusion analysis in the GAM tracker with the added output end as input, so as to estimate a heat map of matching degree between the target to be detected and the tracked target, further perform correlation matching between the tracked target and the detection result, and before determining whether the target to be detected runs, the method further comprises:
the position coordinate based on the target is [ h ]t,wt]Determining the coordinate of the corresponding body gravity center in the frame as [ hd,wd]And determining the corresponding body barycentric coordinate in the candidate as [ hc,wc]Wherein a positioning distance between the target and the frame to the body center of gravity is always larger than a minimum value of the positioning distances between the target and the candidate to the body center of gravity.
6. The method according to claim 5, wherein the target-based body barycentric coordinate is [ h ]t,wt]Determining the coordinate of the corresponding body gravity center in the target frame as [ hd,wd]And determining the coordinate of the corresponding body barycenter in the candidate as [ hc,wc]The method specifically comprises the following steps:
initializing a plurality of sub-tracks according to the estimation of the frame in the first frame of the video image data frame to be processed;
connecting the detected frame with the existing tracking segment in the frame after the first frame in the video image data frame to be processed;
predicting the position of the target in the current frame by adopting a Kalman filtering function; and if the distance between the coordinate of the target and the predicted position is detected to be larger than a preset value, setting the corresponding cost to be infinite.
7. The method according to claim 1, wherein the step of judging whether the target to be detected runs or not by adopting normal distribution and a threshold value method according to the target coordinate change speed detected within a preset time period based on the target feature extraction result specifically comprises:
minimizing the image capture area based on a position of the video image data frame prediction target;
establishing a coordinate system for the video image, and recording and storing the identifications and coordinates of all currently detected targets;
collecting a preset amount of coordinate information and speeds aiming at the same target, establishing Gaussian distribution model operation on the speed distribution, and updating the Gaussian model in real time based on an operation result;
performing image analysis on the image in the target frame to determine the target gravity center;
and when the target gravity center deviation amount is larger than a preset threshold and keeps a first preset duration, determining that the target runs abnormally, and marking the target frame.
8. The method for detecting abnormal pedestrian behaviors in a community according to claim 1, wherein calculating the ratio of the width to the height according to the detection result of the target to be detected based on the target feature extraction result, and judging whether the target to be detected falls down according to the calculated width-to-height ratio, specifically comprises:
detecting width and height information of frame regression size of the target at intervals of a first fixed frame number, and calculating a width and height ratio of the target;
when the width-height ratio continuously detected within a second preset time length is larger than a preset width-height threshold value, judging that the target falls down abnormally, and marking the target frame.
9. The method for detecting abnormal pedestrian behaviors in a community as claimed in claim 1, wherein the method for judging whether the target to be detected wanders abnormally or not by using a vector method and a threshold method according to a relationship between a distance and a reciprocating frequency of the target to be detected moving in a video region and a preset wandering distance threshold value based on the target feature extraction result specifically comprises the following steps of:
recording the central position of the target as a target motion track at intervals of a second fixed frame number, and establishing a motion direction vector of the target;
if the included angle of the motion direction vectors at the t-th moment and the t-1 th moment is larger than a preset value, determining that the motion direction vector is a reciprocating motion;
when the reciprocating times reach a preset threshold value, judging that the target is abnormally loitering, and marking the target frame.
10. A GAM tracker based community pedestrian abnormal behavior detection apparatus comprising computer readable instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 9.