CN114581847A - Method and device for detecting abnormal behaviors of pedestrians in community based on GAM tracker

Publication number: CN114581847A (published 2022-06-03); granted as CN114581847B (2024-04-19)
Application number: CN202210208816.8A (filed 2022-03-04)
Authority: CN (China); legal status: Active (granted)
Applicant/Assignee: Shandong University of Science and Technology
Inventors: 王智慧, 李名帅, 刘辰旭, 孙瑞雪, 葛铭昌, 崔宾阁, 于建志, 包永堂

Classifications

    • G06F18/241 (Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches)
    • G06N3/045 (Computing arrangements based on biological models; neural networks; combinations of networks)
    • G06N3/08 (Computing arrangements based on biological models; neural networks; learning methods)

Abstract

The invention discloses a method and a device for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker. Through analysis and processing of the video images to be processed, targets are detected with a YOLOX network using an anchor-free method; the graph attention models of the selected candidates are fused and merged and the result is optimized with the GAM tracker, and key-point coordinates are determined from the matching scores of the candidate attention maps. Whether a target is running is judged with a center-of-gravity offset method; whether a target has fallen is judged by comparing the width-to-height ratio of the detection frame and its rate of change; and whether a target is loitering is judged by comparing the travelled distance with the number of reciprocating motions. When a target runs, falls, or loiters, its speed, detection frame, and moving distance change markedly and its number of reciprocating motions exceeds that of normal pedestrian motion, so the abnormal-behavior judgment is stable. The method improves detection of abnormal behaviors of pedestrians in a community and raises the efficiency of handling emergencies.

Description

Method and device for detecting abnormal behaviors of pedestrians in community based on GAM tracker
Technical Field
The invention relates to the technical field of deep learning and computer vision, in particular to a method and a device for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker.
Background
In today's society, incidents such as theft and falls of the elderly or children occur frequently. If no one witnesses such an incident when it happens, investigating the theft or resolving the problems caused by the fall in time can be difficult, which increases the working difficulty of security personnel.
The multi-pedestrian tracking method with a GAM (graph attention model) is built on a detection-based multi-target tracking framework and has demonstrated its capability on many datasets, maintaining a good tracking effect even in dense crowds. Compared with other existing tracking frameworks, the multi-pedestrian tracking method with the GAM tracker achieves better tracking accuracy and inference speed, and the number of switches of tracked target objects is significantly reduced.
In order to detect emergencies involving pedestrians in a community and give timely alarm feedback that improves the efficiency of security personnel, a new method for detecting abnormal behaviors of community pedestrians, built on the existing GAM, is urgently needed.
Disclosure of Invention
The invention provides a method and a device for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker. On the basis of the prior-art multi-pedestrian tracking method with a GAM tracker, abnormality judgment logic that analyzes pedestrian trajectories is added, and real-time monitoring of abnormal running, falling, and loitering of pedestrians is realized by virtue of the tracker's excellent tracking effect.
The specific technical scheme provided by the invention is as follows:
in one aspect, the invention provides a method for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker, which comprises the following steps:
inputting a video image data frame to be processed into a pre-trained YOLOX network, acquiring a high-resolution characteristic diagram, and detecting pedestrians by adopting a target detector;
based on the parallel output ends added to the YOLOX network for extracting target differentiation features and the anchor-free method thereof, fusion analysis is carried out in the GAM tracker with the added output ends as inputs to estimate a matching-degree heat map between the result to be detected and the tracked target, and correlation matching between the tracked target and the detection result is then carried out;
judging whether the target to be detected runs or not by adopting a normal distribution and threshold value method according to the target coordinate change speed detected in a preset time period based on the target feature extraction result; and/or,
calculating the ratio of the width to the height of the pedestrian according to the detection result of the target to be detected based on the target feature extraction result, and judging whether the target to be detected falls down or not according to the calculated width-to-height ratio; and/or,
judging whether the target to be detected is abnormally loitering or not by adopting a vector method and a threshold value method according to the relation between the moving distance and the reciprocating times of the target to be detected in the video area and a preset loitering distance threshold value on the basis of the target feature extraction result;
if at least one of running, falling or abnormal loitering of the target to be detected exists in the video image data frames to be processed, marking the video image data frames to be processed as data frames with community pedestrian abnormal behaviors.
Optionally, the inputting a video image data frame to be processed into a pre-trained YOLOX network to obtain a high-resolution feature map, and performing pedestrian detection by using a target detector specifically includes:
inputting the differentiated features output from the YOLOX network into the GAM tracker, expanding the image acquisition area of a target detected by the GAM tracker to at least twice the width and height of a candidate of the attention map, and positioning according to the center position of the image acquisition area of the detected target;
adjusting the image in the image acquisition area to a standard size [H_s, W_s] and then carrying out normalized extraction;
determining candidates based on a sparse selection policy, and extracting feature parameters corresponding to the image in the image acquisition region and the candidate directions, wherein the candidate width and height are each 1/2 of the standard size;
inputting at least one of the candidates to a graph attention model, estimating a correlation between the target and the candidate;
and fusing and combining the attention models of all the candidates, and outputting a target detection result through the GAM tracker.
Optionally, based on parallel output ends added to the YOLOX network for extracting the target differentiation features and an anchor-free method thereof, fusion analysis is performed in the GAM tracker with the added output ends as inputs to estimate a matching degree heatmap between the to-be-detected result and the tracked target, so as to perform correlation matching between the tracked target and the detected result, specifically including:
estimating a graph attention model after carrying out 3 × 3 convolution operation on the characteristics of the YOLOX network, and generating a characteristic graph through a 1 × 1 convolution layer;
performing 3 × 3 convolution operation on the feature map, estimating the size of a target frame, and generating a target type result through a 1 × 1 convolution layer;
performing 3 × 3 convolution operation on the characteristic diagram to obtain target center offset, and generating a frame size and target center offset through different 1 × 1 convolution layers;
and performing 3 × 3 convolution operation on the feature map, estimating the size of a target frame, and generating individual differentiation features through a 1 × 1 convolution layer.
Optionally, the estimating a graph attention model after performing a convolution operation on the features of the YOLOX network by 3 × 3, and generating a feature graph by a 1 × 1 convolution layer specifically includes:
performing non-maximum suppression (NMS) according to the graph attention score to extract peak keypoints, preserving coordinates of keypoints for which the graph attention score is greater than a threshold;
and generating a bounding box and individual differential characteristics according to the target classification, the center offset and the target frame size, and extracting identity embedding in the estimated target center.
Optionally, based on parallel output ends added to the YOLOX network for extracting the target differentiation features and an anchor-free method thereof, fusion analysis is performed in the GAM tracker with the added output ends as inputs to estimate a matching degree heatmap between the to-be-detected result and the tracked target, so as to perform association matching between the tracked target and the detected result, and before determining whether the to-be-detected target runs, the method further includes:
the position coordinate based on the target is [ h ]t,wt]Determining the coordinate of the corresponding body gravity center in the frame as [ hd,wd]And determining the coordinate of the corresponding body barycenter in the candidate as [ hc,wc]Wherein a positioning distance between the target and the frame to the body center of gravity is always larger than a minimum value of the positioning distances between the target and the candidate to the body center of gravity.
Optionally, the target-based body barycentric coordinate is [ h ]t,wt]Determining the coordinate of the center of gravity of the corresponding body in the pedestrian frame as [ hd,wd]And determining the sitting position of the corresponding body barycenter in the candidateMarked by [ h ]c,wc]The method specifically comprises the following steps:
initializing a plurality of sub-tracks according to the estimation of the frame in the first frame of the video image data frame to be processed;
connecting the detected frame with the existing tracking segment in the frame after the first frame in the video image data frame to be processed;
predicting the position of the target in the current frame by adopting a Kalman filtering function; and if the distance between the coordinate of the target and the predicted position is detected to be larger than a preset value, setting the corresponding cost to be infinite.
Optionally, the determining, based on the target feature extraction result, whether the target to be detected runs or not by using a normal distribution and a threshold method according to the target coordinate change speed detected within a preset time period specifically includes:
minimizing the image capture area based on a position of the video image data frame prediction target;
establishing a coordinate system for the video image, and recording and storing the identifications and coordinates of all currently detected targets;
collecting coordinate information and speeds of a preset number aiming at the same target, establishing Gaussian distribution model operation on the speed distribution, and updating the Gaussian model in real time based on an operation result;
performing image analysis on the image in the target frame to determine the target gravity center;
and when the target gravity center deviation amount is larger than a preset threshold and keeps a first preset duration, determining that the target runs abnormally, and marking the target frame.
Optionally, calculating the ratio of the width to the height according to the detection result of the target to be detected based on the target feature extraction result, and judging whether the target to be detected falls down according to the calculated width-to-height ratio, specifically includes:
detecting width and height information of frame regression size of the target at intervals of a first fixed frame number, and calculating a width and height ratio of the target;
when the width-height ratio continuously detected within a second preset time length is larger than a preset width-height threshold value, judging that the target falls down abnormally, and marking the target frame.
Optionally, the determining, based on the target feature extraction result, whether the target to be detected has abnormal loitering by using a vector method and a threshold method in combination according to a relationship between a distance and a reciprocating frequency of the target to be detected moving in the video region and a preset loitering distance threshold, specifically includes:
recording the reciprocating motion times of the pedestrians in the video image, recording the central position of the target as a target motion track at intervals of a second fixed frame number, and establishing a motion direction vector of the target;
if the included angle of the motion direction vectors at the t-th moment and the t-1 th moment is larger than a preset value, determining that the motion direction vector is a reciprocating motion;
when the reciprocating times reach a preset threshold value, judging that the target is abnormally loitering, and marking the target frame.
In one aspect, the invention further provides a GAM tracker-based community pedestrian abnormal behavior detection apparatus comprising computer readable instructions which, when run on a computer, cause the computer to perform any of the methods described above.
The invention has the following beneficial effects:
the invention provides a method and a device for detecting abnormal behaviors of pedestrians in a community of a GAM tracker, which are used for detecting targets by adopting a YOLOX method and an anchor-free method through analysis and processing of video images to be processed, fusing and merging later selected attention models, optimizing by using the GAM tracker, determining coordinates of key points according to matching scores of candidate attention maps, and judging whether the targets run quickly by using a gravity offset judgment method; judging whether the target falls down by using a method for comparing the length-width ratio and the change rate of the frame; comparing the travel distance to the number of reciprocations is used to determine whether the target wanders. When the detection target runs, falls and wanders, the speed, the detection frame and the moving distance of the target change obviously when the reciprocating times are larger than the normal movement times of pedestrians, the abnormal behavior judgment result is stable, the method for detecting the abnormal behaviors of the pedestrians in the community is improved, and the working efficiency of emergencies is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of inputting a frame of video image data to be processed into a pre-trained YOLOX network to obtain an output result according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the working principle of the GAM tracker employed in an embodiment of the present invention;
fig. 4 is a schematic flowchart illustrating an exemplary process of determining whether a target to be detected runs according to an embodiment of the present invention;
fig. 5 is a schematic flowchart of an exemplary process for determining whether a target to be detected falls according to an embodiment of the present invention;
fig. 6 is a schematic flowchart illustrating an exemplary process of determining whether an abnormal loitering exists in a target to be detected according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "comprises," "comprising," and "having," and any variations thereof, in the description and claims of this invention, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
A method and an apparatus for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker according to an embodiment of the present invention will be described in detail with reference to fig. 1 to 6.
Referring to fig. 1, a method for detecting abnormal behaviors of a community pedestrian based on a GAM tracker according to an embodiment of the present invention includes:
s1: and inputting a video image data frame to be processed into a pre-trained YOLOX network, acquiring a high-resolution characteristic diagram, and detecting pedestrians by adopting a target detector.
The working principle diagram of the GAM tracker used in the embodiment of the present application is shown in fig. 3.
In the process of acquiring the high-resolution feature map, ResNet-34 (a residual neural network) is adopted as the backbone network so as to strike a good balance between accuracy and speed. Pedestrian detection is performed by the YOLOX detector; to improve the efficiency of subsequent computation, only pedestrian detection is retained, and the prediction parameters of the other categories are pruned.
Specifically, fig. 2 shows a schematic flow chart of inputting a video image data frame to be processed into a previously trained YOLOX network to obtain an output result. The method specifically comprises the following steps:
s11: inputting the differentiated features output from the YOLOX network to a GAM tracker, expanding an image-capturing region of a target detected by the GAM tracker to at least twice the width-height of a candidate of the map, and performing positioning according to the center position of the image-capturing region of the detected target.
S12: adjusting an image in an image acquisition area to a standard size [ H ]s,Ws]And then carrying out normalized extraction.
S13: determining candidates based on a sparsity selection policy, and extracting feature parameters corresponding to an image in the image-acquisition region and the candidate directions, wherein the candidate widths and heights are 1/2 of the standard size, respectively.
In the calculation process, based on the sparse selection strategy, a plurality of candidates of size [H_c, W_c] = [H_s/2, W_s/2] are selected, and the corresponding features s_c are extracted from the features uniformly extracted over the search area.
S14: at least one of the candidates is input into the graph attention model, and the correlation between the target and the candidate is estimated.
For each valid i-th tracked target, the feature template combines the features S_i = {s_{i,1}, s_{i,2}, ..., s_{i,n}} extracted from the rectangles detected over the most recent n frames. To improve computational efficiency, n is set to 10. To keep feature sizes consistent, each detected rectangle is also resized to [H_t, W_t] = [H_c, W_c] before feature extraction.
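As a minimal sketch of how such a rolling feature template could be maintained (the callable `extract_features`, the channel layout, and the candidate size values are illustrative assumptions; n = 10 and the resizing to [H_c, W_c] follow the description above):

```python
from collections import deque

import cv2
import numpy as np

N_TEMPLATE = 10          # n: number of recent frames kept per target
HC, WC = 64, 32          # [H_c, W_c]: candidate size (illustrative values)

class FeatureTemplate:
    """Rolling template S_i = {s_{i,1}, ..., s_{i,n}} of one tracked target."""

    def __init__(self):
        self.features = deque(maxlen=N_TEMPLATE)   # oldest entry drops past n

    def update(self, frame, box, extract_features):
        """Crop the detected rectangle, resize to [H_c, W_c], extract features."""
        x, y, w, h = box
        crop = frame[y:y + h, x:x + w]
        crop = cv2.resize(crop, (WC, HC))          # keep feature sizes consistent
        self.features.append(extract_features(crop))

    def as_array(self):
        return np.stack(self.features)             # (n, ...) stacked features
```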
The graph attention model of the GAM tracker is used to estimate the correlation between the i-th tracked target and each candidate c_j; every feature channel has the same size [H_c, W_c]. The graph attention model is estimated and normalized from the correlation s(h, w, n_c) between the i-th target and each candidate c_j at the given position (formula (1); the original formula image is not reproduced).
and S15, fusing and merging the graph attention models of all the candidates, and outputting a target detection result through the GAM tracker.
Fused ofGraph attention model Mi,jBased on the graphical attention model of all these candidates, the calculation method is as follows:
Figure BDA0003532241210000082
and
Figure BDA0003532241210000083
wherein c isj∈CjAnd CjIs the candidate set for the jth detected pedestrian, nc∈{1,...,NCIs the channel index of the merged attention map, NcThe number of channels is the same as the number of extracted characteristic channels.
Attention model M of merged graphi,jAs an output map of the GAM tracker.
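The fusion can be pictured with a small numpy sketch. Since formulas (1)-(3) above are not reproduced, the softmax normalization over spatial positions and the element-wise maximum over candidates below are assumptions standing in for the patent's exact definitions:

```python
import numpy as np

def attention_map(target_feat, cand_feat):
    """Correlation s(h, w, n_c) between a pooled target template vector and
    one candidate, normalized over spatial positions (assumed softmax)."""
    # target_feat: (C,), cand_feat: (C, H, W) -> per-channel correlation
    s = target_feat[:, None, None] * cand_feat          # (N_c, H_c, W_c)
    flat = s.reshape(s.shape[0], -1)
    m = np.exp(flat - flat.max(axis=1, keepdims=True))
    m /= m.sum(axis=1, keepdims=True)                   # normalize per channel
    return m.reshape(cand_feat.shape)

def fuse_attention_maps(maps):
    """Merge the maps of all candidates C_j into M_ij (assumed: element-wise
    maximum over candidates, one merged map per channel n_c)."""
    return np.max(np.stack(maps), axis=0)               # (N_c, H_c, W_c)

# usage: one map per candidate c_j of the j-th detected pedestrian
# fused = fuse_attention_maps([attention_map(t, c) for c in candidates])
```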
S2: based on parallel output ends added to the YOLOX network and used for extracting target differentiation features and an anchor-free method thereof, fusion analysis is carried out in a GAM tracker by taking the added output ends as input ends, the fusion analysis is used for estimating a matching degree heat map between a result to be detected and a tracking target, and then correlation matching between the tracking target and the detection result is carried out.
In this process, the difference between the predicted result and the actual ground truth is measured with a preset loss function. Specifically:
a graph attention model is estimated after a 3 × 3 convolution operation on the features of the YOLOX network, and a feature map is generated through a 1 × 1 convolution layer;
(I) a 3 × 3 convolution operation is performed on the feature map, the size of the target frame is estimated, and a target type result is generated through a 1 × 1 convolution layer;
performing non-maximum suppression (NMS) according to the graph attention score to extract peak keypoints, preserving coordinates of keypoints with graph attention scores greater than a threshold, then generating bounding boxes according to the target classification, center offset and the target bounding box size, and individual differentiation features, and extracting identity embedding in the estimated target center.
(II) performing a 3 × 3 convolution operation on the feature map to obtain the target center offset, and generating the target frame size and the target center offset through different 1 × 1 convolution layers;
and (III) performing 3 × 3 convolution operation on the feature map, estimating the size of a target frame, and generating individual difference features through a 1 × 1 convolution layer.
For the frame-size and center-offset outputs above, the L1 loss function is used, and the offset loss and the size loss are computed together.
The goal of the target-center-offset output is to locate objects more accurately. Since the output map has a stride of 4, downsampling introduces a non-negligible quantization error; this branch estimates the continuous offset of each pixel relative to the object center to mitigate the effect of downsampling. The bounding-box-size output is responsible for estimating the height and width of the object's bounding box at each coordinate location.
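A rough PyTorch sketch of these parallel output ends follows. The channel widths, the identity-embedding dimension, and the single pedestrian class are illustrative assumptions; each head is simply the 3 × 3 convolution followed by a 1 × 1 convolution described above:

```python
import torch
import torch.nn as nn

def head(in_ch, mid_ch, out_ch):
    """One parallel output end: 3x3 conv for estimation, 1x1 conv for output."""
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1),
        nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, kernel_size=1),
    )

class ParallelHeads(nn.Module):
    """Parallel heads on top of the YOLOX feature map (stride-4 output map)."""

    def __init__(self, in_ch=256, id_dim=128, num_classes=1):
        super().__init__()
        self.heatmap = head(in_ch, 256, num_classes)  # attention / class heatmap
        self.box_size = head(in_ch, 256, 2)           # frame height and width
        self.offset = head(in_ch, 256, 2)             # sub-pixel center offset
        self.identity = head(in_ch, 256, id_dim)      # individual differentiation features

    def forward(self, feat):
        return {
            "heatmap": torch.sigmoid(self.heatmap(feat)),
            "size": self.box_size(feat),
            "offset": self.offset(feat),
            "id": self.identity(feat),
        }
```

Peak key points would then be extracted from the sigmoid heatmap by non-maximum suppression, keeping coordinates whose attention score exceeds the threshold, as described above.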
After that, based on the part coordinate [h_t, w_t] on the target, the coordinate of the corresponding body center of gravity in the frame is determined as [h_d, w_d], and the coordinate of the corresponding body center of gravity in the candidate as [h_c, w_c], where the positioning distance from the target to the body center of gravity in the frame is always larger than the minimum of the positioning distances from the target to the body center of gravity over the candidates. In particular,
assume the position of a specific body center of gravity on the target is [h_t, w_t], and the same body center of gravity on the detected rectangle is [h_d, w_d]. When candidates are selected, the position of the body center of gravity on each candidate is [h_c, w_c], where c_j ∈ C_j. The positioning distance D_{i,j} of the body center of gravity between the target and the detected rectangle always satisfies:
D_{i,j} ≥ min_{c_j ∈ C_j} D_{i,c_j} #(4)
in particular, this step may be performed according to the following sub-steps:
initializing a plurality of sub-tracks according to the estimation of the frame in the first frame of the video image data frame to be processed.
Connecting the detected frame with the existing tracking segment in the frame after the first frame in the video image data frame to be processed;
predicting the position of the target in the current frame by adopting a Kalman filtering function; and if the distance between the coordinate of the target and the predicted position is detected to be larger than a preset value, setting the corresponding cost to be infinite.
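The infinite-cost gating just described can be sketched as follows. The Euclidean-distance cost and the gating radius `max_dist` are assumptions; the text only specifies that pairs farther apart than a preset value receive infinite cost:

```python
import numpy as np

INF = float("inf")

def gated_cost_matrix(predicted, detected, max_dist=50.0):
    """Cost of matching each Kalman-predicted track position to each detection.
    Pairs farther apart than max_dist get infinite cost, as described above."""
    cost = np.zeros((len(predicted), len(detected)))
    for i, p in enumerate(predicted):
        for j, d in enumerate(detected):
            dist = np.hypot(p[0] - d[0], p[1] - d[1])
            cost[i, j] = dist if dist <= max_dist else INF
    return cost
```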
S3: and judging whether the target to be detected runs or not by adopting normal distribution and a threshold value method according to the target coordinate change speed detected in a preset time period based on the target characteristic extraction result.
As shown in fig. 4, a schematic flowchart of the process of determining whether a running occurs to a target to be detected is exemplarily shown, specifically:
s31: minimizing the image capture area based on a location of a predicted target of the frame of video image data.
For each frame, predicting the location of the target before target association can minimize the search area and improve the accuracy of the association. The method utilizes a Kalman filtering method to predict the target position. The current position and velocity of the target is first predicted using a kalman filter, and the parameters of the kalman filter are updated with previously collected information.
The prediction step is performed by the following formula:
x̂_t = F_t x̂_{t-1} + u_t,   P_t = F_t P_{t-1} F_tᵀ + Q_t #(5)
wherein x̂_t contains the predicted position and velocity of the i-th target on the t-th frame, x̂_{t-1} contains the predicted position and velocity of the i-th target in the (t-1)-th frame, F_t is the state transition matrix, u_t is the control vector, P_t is the covariance matrix of the prediction result x̂_t, and Q_t is the noise covariance matrix of the prediction process.
After the tracking process of the current frame is finished, the parameters of the Kalman filter are updated to adapt to the actual situation. The update step is performed by the following formula:
K = P_t Hᵀ (H P_t Hᵀ + R_t)⁻¹,   x̂′_t = x̂_t + K (z_t - H x̂_t),   P′_t = (I - K H) P_t #(6)
wherein K is the Kalman gain, z_t is the observed position of each target, R_t is the covariance of the uncertainty of the observation z_t (i.e. the covariance of the sensor noise), H is the observation matrix, x̂′_t is the updated x̂_t, and P′_t is the updated P_t; these are fed back into the next round of prediction and update.
The target position predicted by Kalman filtering is compared with the actually detected target position; when the distance between the two is greater than a set threshold K_kal, the target is considered to have run abnormally.
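A self-contained sketch of this predict-and-update cycle, assuming a constant-velocity state [x, y, vx, vy], a zero control vector u_t, and illustrative noise magnitudes (formulas (5) and (6)):

```python
import numpy as np

class KalmanTracker:
    """Constant-velocity Kalman filter for one target: state [x, y, vx, vy]."""

    def __init__(self, x, y, q=1e-2, r=1.0):
        self.x = np.array([x, y, 0.0, 0.0])                      # state estimate
        self.P = np.eye(4)                                       # covariance P_t
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = 1.0    # transition F_t
        self.H = np.eye(2, 4)                                    # observe position only
        self.Q = q * np.eye(4)                                   # process noise Q_t
        self.R = r * np.eye(2)                                   # observation noise R_t

    def predict(self):
        self.x = self.F @ self.x                  # formula (5), u_t assumed zero
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]                         # predicted position

    def update(self, z):
        z = np.asarray(z, dtype=float)            # z_t: detected position
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)  # Kalman gain, formula (6)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
```

When the Euclidean distance between `predict()` and the associated detection exceeds K_kal, the target is flagged as running abnormally, as described above.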
S32: and establishing a coordinate system for the video image, and recording and storing the identifications and coordinates of all the currently detected targets.
And establishing a rectangular coordinate system by taking the upper left corner of the image as a coordinate origin, the direction of the width as an X coordinate and the direction of the height as a Y coordinate. Continuously recording all currently detected targets and corresponding center position coordinates by a fixed interval frame number i and storing:
Posx[id][t]=x#(7)
Posy[id][t]=y#(8)
wherein the subscript t is automatically increased by 1 after each storage, and t in the above two formulas always remains synchronized. The velocity of the target is then calculated according to the following formula:
V_now[id][t] = sqrt((Posx[id][t] - Posx[id][t-1])² + (Posy[id][t] - Posy[id][t-1])²) / i #(9)
wherein i is the number of frames between acquisitions of the coordinate position; the velocity V of the target is calculated from the previously stored position information, and the subscript t stored in V_now[id][t] is automatically increased by 1 after each storage. At the same time, according to the formula:
K[id][t] = V_now[id][t] / V_now[id][t-1] #(10)
the ratio K of the target's current speed to its last detected speed is calculated and stored in K[id][t], with t increased by 1 after each storage.
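These bookkeeping formulas translate almost directly into code. A sketch assuming an interval of i = 5 frames (the dictionaries mirror Posx, Posy, V_now and K; all names are illustrative):

```python
import math
from collections import defaultdict

I_FRAMES = 5                          # interval i between stored samples

Posx = defaultdict(list)              # Posx[id][t], formula (7)
Posy = defaultdict(list)              # Posy[id][t], formula (8)
V_now = defaultdict(list)             # V_now[id][t], formula (9)
K_ratio = defaultdict(list)           # K[id][t], formula (10)

def record(target_id, x, y):
    """Store a center position; t advances by 1 with every store."""
    Posx[target_id].append(x)
    Posy[target_id].append(y)
    t = len(Posx[target_id]) - 1
    if t >= 1:
        dx = Posx[target_id][t] - Posx[target_id][t - 1]
        dy = Posy[target_id][t] - Posy[target_id][t - 1]
        V_now[target_id].append(math.hypot(dx, dy) / I_FRAMES)   # (9)
    if len(V_now[target_id]) >= 2:
        v = V_now[target_id]
        K_ratio[target_id].append(v[-1] / max(v[-2], 1e-6))      # (10)
```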
S33: and collecting coordinate information and speeds of a preset number aiming at the same target, establishing Gaussian distribution model operation on the speed distribution, and updating the Gaussian model in real time based on an operation result.
After a certain amount of coordinate information of the same target is collected and the speed is calculated, a Gaussian distribution model is built for the speed distribution of the target according to the existing speed information:
v ~ N(μ, σ²) #(11)
where μ is the mean value and σ² is the variance; the Gaussian model of the target's velocity distribution is updated in real time after each velocity calculation.
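The real-time update of formula (11) can be realized with a running mean and variance; Welford's online update below is one reasonable way to do this, since the patent does not specify the update rule:

```python
class SpeedModel:
    """Online estimate of v ~ N(mu, sigma^2) for one target, formula (11)."""

    def __init__(self):
        self.n = 0
        self.mu = 0.0
        self.m2 = 0.0          # running sum of squared deviations

    def update(self, v):
        self.n += 1
        delta = v - self.mu
        self.mu += delta / self.n                 # running mean
        self.m2 += delta * (v - self.mu)          # Welford update

    @property
    def sigma2(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def is_outlier(self, v, k=3.0):
        """Flag speeds more than k standard deviations above the mean."""
        return self.n > 1 and v > self.mu + k * self.sigma2 ** 0.5
```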
S34: and carrying out image analysis on the images in the frame of the target to determine the gravity center of the target.
The image is cropped according to the target frame to obtain an individual image of each target. A coordinate system is established with the upper-left corner of the cropped image as the origin, downward as the positive Y axis and rightward as the positive X axis; the coordinates (x, y) of each pixel are obtained, and the maxima in the X and Y directions are recorded as x_max and y_max. The picture is then processed with OpenCV to obtain a grayscale image of the target, the histogram corresponding to the grayscale image is computed, and the value range in which gray values are concentrated is selected according to the distribution of values in the histogram. This range is taken as the gray-value interval D_foreground of the target, and other values are treated as background D_background. A foreground mask identification formula is defined:
mask(x, y) = 1 if gray(x, y) ∈ D_foreground, otherwise 0 #(12)
The grayscale image is then traversed pixel by pixel according to the formulas:
x̄ = Σ mask(x, y)·x / Σ mask(x, y) #(13)
ȳ = Σ mask(x, y)·y / Σ mask(x, y) #(14)
to calculate the target center of gravity, where the (x, y) coordinates are taken in the coordinate system whose origin is the upper-left corner of the cropped image.
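A compact OpenCV realization of formulas (12)-(14) follows. Otsu thresholding is used here as an assumed stand-in for the histogram-based "concentrated gray values" selection described above:

```python
import cv2
import numpy as np

def center_of_gravity(bgr_crop):
    """Foreground mask (12) and centroid (13)-(14) of one cropped target."""
    gray = cv2.cvtColor(bgr_crop, cv2.COLOR_BGR2GRAY)
    # Otsu's threshold splits the gray-level histogram into foreground/background
    _, mask = cv2.threshold(gray, 0, 1, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    ys, xs = np.nonzero(mask)            # pixels with mask(x, y) == 1
    if len(xs) == 0:
        return None                      # no foreground found
    return xs.mean(), ys.mean()          # (x_bar, y_bar), origin at top-left
```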
S35: and when the target gravity center deviation amount is larger than a preset threshold and keeps a first preset duration, determining that the target runs abnormally, and marking the target frame.
The person's center of gravity, the image center, and the image width and height are calculated, and the proportion K_w by which the target center of gravity deviates from the image center is computed (formula (15); the original formula image is not reproduced). When K_w is greater than the set threshold and is maintained for a certain time t_thr, the target is considered to be running abnormally and is marked.
According to a preset speed-ratio threshold K_thr, a discrete threshold v_thr, and the Y coordinate of the target, the corresponding adaptive thresholds V_m and K_m are calculated in real time from the target's Y coordinate (formulas (16) and (17); the original formula images are not reproduced), where Posy[id][t] is the Y coordinate of the target and S is a preset fixed parameter. If V > V_m or K > K_m, the target is regarded as running abnormally and is marked for display in the video.
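Putting the running test together as a sketch: since formulas (16) and (17) are not reproduced, the linear dependence of the adaptive thresholds on the Y coordinate below is only an assumed form, motivated by the fact that targets lower in the frame are closer to the camera and appear to move faster:

```python
def is_running(v, k_ratio, y, S=1000.0, v_thr=8.0, K_thr=2.0):
    """Adaptive running test on speed v and speed ratio k_ratio.
    The linear scaling below is an ASSUMED form of formulas (16)-(17)."""
    V_m = v_thr * y / S          # assumed form of formula (16)
    K_m = K_thr * y / S          # assumed form of formula (17)
    return v > V_m or k_ratio > K_m
```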
S4: calculating the ratio of the width to the height of the pedestrian according to the detection result of the target to be detected based on the target feature extraction result, and judging whether the target to be detected falls down or not according to the calculated width-to-height ratio.
Fig. 5 exemplarily shows a schematic flow chart for determining whether the target to be detected falls, specifically:
s41: detecting width and height information of frame regression size of the target at intervals of a first fixed frame number, and calculating a width and height ratio of the target;
s42: when the width-height ratio continuously detected within a second preset time length is larger than a preset width-height threshold value, judging that the target falls down abnormally, and marking the target frame.
Illustratively, according to the detected width and height information of the target frame, every fixed number of frames i, the width-to-height ratio of the target is computed according to the formula:
ratio[id][t] = w[id][t] / h[id][t] #(18)
When the width-to-height ratio is detected to be larger than the set width-to-height threshold several times in succession and this persists for a certain time t_thr, the target is judged to have fallen abnormally, marked, and displayed in the video.
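A sketch of this fall test based on formula (18); the ratio threshold, the check interval, and the number of consecutive hits standing in for t_thr are illustrative values:

```python
from collections import defaultdict

RATIO_THR = 1.2        # width/height above this suggests a lying posture
HITS_NEEDED = 5        # consecutive over-threshold detections (covers t_thr)

_fall_hits = defaultdict(int)

def check_fall(target_id, w, h):
    """Formula (18): ratio = w / h, checked every i-th frame."""
    ratio = w / max(h, 1e-6)
    if ratio > RATIO_THR:
        _fall_hits[target_id] += 1
    else:
        _fall_hits[target_id] = 0              # streak broken
    return _fall_hits[target_id] >= HITS_NEEDED
```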
S5: and judging whether the target to be detected has abnormal loitering or not by adopting a vector method and a threshold value method according to the relation between the moving distance and the reciprocating times of the target to be detected in the video area and a preset loitering distance threshold value based on the target feature extraction result.
Fig. 6 exemplarily shows a flow chart for determining whether there is an abnormal loitering in the target to be detected, specifically:
s51: and recording the central position of the target as a target motion track at intervals of a second fixed frame number, and establishing a motion direction vector of the target.
The total target movement distance Dis_T and the target displacement Dis_P (the straight-line distance between the current target position and the initial position) are initialized to 0. The center-position coordinates of the target corresponding to each id are recorded every fixed interval of i frames, and the coordinate position information is stored based on formulas (7) and (8). The values of the movement distance Dis_T and the displacement Dis_P at time t are then updated according to the following formulas:
Dis_T[id][t] = Dis_T[id][t-1] + sqrt((Posx[id][t] - Posx[id][t-1])² + (Posy[id][t] - Posy[id][t-1])²) #(19)
Dis_P[id][t] = sqrt((Posx[id][t] - Posx[id][0])² + (Posy[id][t] - Posy[id][0])²) #(20)
A reciprocation value bf is set for the target corresponding to each id to record the number of reciprocating motions of the target in the video. The initial value of bf is set to 0, and from the coordinate information stored by formulas (7) and (8), the trajectory point at time t is recorded by the following formula:
postmp[id][t] = {posx[id][t], posy[id][t]} #(21)
The central position corresponding to each id is recorded as the target walking track every fixed interval of i frames. Then, based on the motion trail of the target, the motion direction vector is built according to the formula:
vector[id][t] = {postmp[id][t-1], postmp[id][t]} #(22)
s52: and if the included angle of the motion direction vectors at the t-th moment and the t-1 th moment is larger than a preset value, determining that the motion direction vector is a reciprocating motion.
The motion direction vector of the target is thus established, and the reciprocation value bf kept for each id records the number of reciprocating motions of the target in the video, starting from 0. If the condition for times t and t-1 is satisfied:
angle(vector[id][t-1], vector[id][t]) > angle_thr #(23)
that is, if the included angle between the two successive motion direction vectors is larger than the set threshold angle_thr, the target is considered to have turned back, and bf[id] is increased by 1.
When Dis_T[id] is larger than 1.5·Dis_P[id], or bf[id] is larger than or equal to 3, the target is judged to be loitering abnormally, marked, and displayed in the video.
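The loitering logic of formulas (19)-(23) in compact form; the angle threshold value is illustrative, while the 1.5 × displacement rule and bf ≥ 3 come from the text above:

```python
import math

ANGLE_THR = math.radians(120)    # illustrative turn-back threshold angle_thr

def included_angle(v1, v2):
    """Included angle between two motion-direction vectors, formula (23)."""
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1, n2 = math.hypot(*v1), math.hypot(*v2)
    if n1 == 0 or n2 == 0:
        return 0.0
    return math.acos(max(-1.0, min(1.0, dot / (n1 * n2))))

def is_loitering(track):
    """track: list of (x, y) centers sampled every i frames."""
    if len(track) < 2:
        return False
    dis_t, bf = 0.0, 0
    prev_vec = None
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dis_t += math.hypot(x1 - x0, y1 - y0)            # Dis_T, formula (19)
        vec = (x1 - x0, y1 - y0)
        if prev_vec and included_angle(prev_vec, vec) > ANGLE_THR:
            bf += 1                                      # turn-back counted
        prev_vec = vec
    dis_p = math.hypot(track[-1][0] - track[0][0],
                       track[-1][1] - track[0][1])       # Dis_P, formula (20)
    return dis_t > 1.5 * dis_p or bf >= 3
```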
S53: when the reciprocating times reach a preset threshold value, judging that the target is abnormally loitering, and marking the target frame.
S6: if at least one of running, falling or abnormal loitering of the target to be detected exists in the video image data frames to be processed, marking the video image data frames to be processed as data frames with community pedestrian abnormal behaviors.
The invention provides a method and a device for detecting abnormal behaviors of pedestrians in a community based on a GAM tracker. Through analysis and processing of the video images to be processed, targets are detected with a YOLOX network using an anchor-free method; the graph attention models of the selected candidates are fused and merged and the result is optimized with the GAM tracker, and key-point coordinates are determined from the matching scores of the candidate attention maps. Whether a target is running is judged with a center-of-gravity offset method; whether a target has fallen is judged by comparing the width-to-height ratio of the detection frame and its rate of change; and whether a target is loitering is judged by comparing the travelled distance with the number of reciprocating motions. When a target runs, falls, or loiters, its speed, detection frame, and moving distance change markedly and its number of reciprocating motions exceeds that of normal pedestrian motion, so the abnormal-behavior judgment is stable. The method improves detection of abnormal behaviors of pedestrians in a community and raises the efficiency of handling emergencies.
Based on the same inventive concept, an embodiment of the present invention further provides a community pedestrian abnormal behavior detection apparatus based on a GAM tracker, which is characterized by comprising computer readable instructions, when the computer readable instructions are executed on a computer, the computer is caused to execute any one of the above methods.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (10)

1. A community pedestrian abnormal behavior detection method based on a GAM tracker is characterized by comprising the following steps:
inputting a video image data frame to be processed into a pre-trained YOLOX network to obtain a high-resolution feature map, and adopting a target detector to detect pedestrians;
based on parallel output ends added to the YOLOX network for extracting target differentiation characteristics and an anchor-free method thereof, fusion analysis is carried out in a GAM tracker by taking the added output ends as input ends to estimate a matching degree heat map between a result to be detected and a tracking target, and then correlation matching between the tracking target and the detection result is carried out;
judging whether the target to be detected runs or not by adopting a normal distribution and threshold value method according to the target coordinate change speed detected in a preset time period based on the target feature extraction result; and/or,
calculating the ratio of the width to the height of the pedestrian according to the detection result of the target to be detected based on the target feature extraction result, and judging whether the target to be detected falls down or not according to the calculated width-to-height ratio; and/or,
judging whether the target to be detected is abnormally loitering or not by adopting a vector method and a threshold value method according to the relation between the moving distance and the reciprocating times of the target to be detected in the video area and a preset loitering distance threshold value on the basis of the target feature extraction result;
if at least one of running, falling or abnormal loitering of the target to be detected exists in the video image data frames to be processed, marking the video image data frames to be processed as data frames with community pedestrian abnormal behaviors.
2. The method according to claim 1, wherein the step of inputting the video image data frame to be processed into a previously trained YOLOX network to obtain a high-resolution feature map and performing pedestrian detection by using a target detector comprises:
inputting the differentiated features output from the YOLOX network into the GAM tracker, expanding the image acquisition area of a target detected by the GAM tracker to at least twice the width and height of a candidate of the attention map, and positioning according to the center position of the image acquisition area of the detected target;
adjusting the image in the image acquisition area to a standard size [H_s, W_s] and then carrying out normalized extraction;
determining candidates based on a sparse selection policy, and extracting feature parameters corresponding to the image in the image acquisition region and the candidate directions, wherein the candidate width and height are each 1/2 of the standard size;
inputting at least one of the candidates to a graph attention model, estimating a correlation between the target and the candidate;
and fusing and combining the attention models of all the candidates, and outputting a target detection result through the GAM tracker.
3. The method according to claim 1, wherein the parallel output ends added to the YOLOX network for extracting the differentiated features of the target and the anchor-free method thereof are used as inputs in a GAM tracker for fusion analysis to estimate a heat map of matching degree between the result to be detected and the tracked target, and further perform the correlation matching between the tracked target and the detected result, and specifically comprises:
estimating a graph attention model after carrying out 3 × 3 convolution operation on the characteristics of the YOLOX network, and generating a characteristic graph through a 1 × 1 convolution layer;
performing 3 × 3 convolution operation on the feature map, estimating the size of a target frame, and generating a target type result through a 1 × 1 convolution layer;
performing 3 × 3 convolution operation on the characteristic diagram to obtain target center offset, and generating a target frame size and target center offset through different 1 × 1 convolution layers;
and performing 3 × 3 convolution operation on the feature map, estimating the size of a target frame, and generating individual differentiation features through a 1 × 1 convolution layer.
4. The method according to claim 3, wherein the estimating a graph attention model after performing a 3 × 3 convolution operation on the features of the YOLOX network and generating a feature graph through a 1 × 1 convolution layer specifically comprises:
performing non-maximum suppression (NMS) according to the graph attention score to extract peak keypoints, preserving coordinates of keypoints for which the graph attention score is greater than a threshold;
and generating a boundary frame and individual differentiation characteristics according to the target classification, the center offset and the target frame size, and extracting identity embedding from the estimated target center.
5. The method for detecting abnormal pedestrian behaviors in community according to claim 1, wherein a parallel output end added to the YOLOX-based network for extracting differentiated features of targets and an anchor-free method thereof are used to perform fusion analysis in the GAM tracker with the added output end as input, so as to estimate a heat map of matching degree between the target to be detected and the tracked target, further perform correlation matching between the tracked target and the detection result, and before determining whether the target to be detected runs, the method further comprises:
the position coordinate based on the target is [ h ]t,wt]Determining the coordinate of the corresponding body gravity center in the frame as [ hd,wd]And determining the corresponding body barycentric coordinate in the candidate as [ hc,wc]Wherein a positioning distance between the target and the frame to the body center of gravity is always larger than a minimum value of the positioning distances between the target and the candidate to the body center of gravity.
6. The method according to claim 5, wherein the target-based body barycentric coordinate is [ h ]t,wt]Determining the coordinate of the corresponding body gravity center in the target frame as [ hd,wd]And determining the coordinate of the corresponding body barycenter in the candidate as [ hc,wc]The method specifically comprises the following steps:
initializing a plurality of sub-tracks according to the estimation of the frame in the first frame of the video image data frame to be processed;
connecting the detected frame with the existing tracking segment in the frame after the first frame in the video image data frame to be processed;
predicting the position of the target in the current frame by adopting a Kalman filtering function; and if the distance between the coordinate of the target and the predicted position is detected to be larger than a preset value, setting the corresponding cost to be infinite.
7. The method according to claim 1, wherein the step of judging whether the target to be detected runs or not by adopting normal distribution and a threshold value method according to the target coordinate change speed detected within a preset time period based on the target feature extraction result specifically comprises:
minimizing the image capture area based on a position of the video image data frame prediction target;
establishing a coordinate system for the video image, and recording and storing the identifications and coordinates of all currently detected targets;
collecting a preset amount of coordinate information and speeds aiming at the same target, establishing Gaussian distribution model operation on the speed distribution, and updating the Gaussian model in real time based on an operation result;
performing image analysis on the image in the target frame to determine the target gravity center;
and when the target gravity center deviation amount is larger than a preset threshold and keeps a first preset duration, determining that the target runs abnormally, and marking the target frame.
8. The method for detecting abnormal pedestrian behaviors in a community according to claim 1, wherein calculating the ratio of the width to the height according to the detection result of the target to be detected based on the target feature extraction result, and judging whether the target to be detected falls down according to the calculated width-to-height ratio, specifically comprises:
detecting width and height information of frame regression size of the target at intervals of a first fixed frame number, and calculating a width and height ratio of the target;
when the width-height ratio continuously detected within a second preset time length is larger than a preset width-height threshold value, judging that the target falls down abnormally, and marking the target frame.
9. The method for detecting abnormal pedestrian behaviors in a community as claimed in claim 1, wherein the method for judging whether the target to be detected wanders abnormally or not by using a vector method and a threshold method according to a relationship between a distance and a reciprocating frequency of the target to be detected moving in a video region and a preset wandering distance threshold value based on the target feature extraction result specifically comprises the following steps of:
recording the central position of the target as a target motion track at intervals of a second fixed frame number, and establishing a motion direction vector of the target;
if the included angle of the motion direction vectors at the t-th moment and the t-1 th moment is larger than a preset value, determining that the motion direction vector is a reciprocating motion;
when the reciprocating times reach a preset threshold value, judging that the target is abnormally loitering, and marking the target frame.
10. A GAM tracker based community pedestrian abnormal behavior detection apparatus comprising computer readable instructions which, when run on a computer, cause the computer to perform the method of any one of claims 1 to 9.