CN113065431B - Human body violation prediction method based on hidden Markov model and recurrent neural network - Google Patents

Publication number
CN113065431B
Authority
CN
China
Legal status
Active
Application number
CN202110302219.7A
Other languages
Chinese (zh)
Other versions
CN113065431A (en)
Inventor
包梓群
张娜
邵一鸣
许铭洋
马云龙
马铉钧
包晓安
Current Assignee
Zhejiang Sci Tech University ZSTU
Original Assignee
Zhejiang Sci Tech University ZSTU

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Abstract

The invention discloses a method for predicting human violation behavior based on a hidden Markov model and a recurrent neural network, and belongs to the field of computer vision and image processing. The method comprises the following steps: 1) collecting a data set; 2) preprocessing the images; 3) performing target detection on the preprocessed images to obtain target detection frames; 4) taking the images with target detection frames as the input of the CPN, extracting the human skeleton within each target detection frame and marking the joint points, to obtain images with target detection frames and joint point marking information, and converting the images of each group into pixel matrices; 5) using an LSTM model to obtain the probability that a sample belongs to each violation behavior, forming a probability matrix; 6) correcting the probability matrix with a hidden Markov model, and taking the violation behavior corresponding to the maximum probability value in the corrected probability matrix as the final prediction result.

Description

Human body violation prediction method based on hidden Markov model and recurrent neural network
Technical Field
The invention relates to the field of computer vision image processing, in particular to a human body violation behavior prediction method based on a hidden Markov model and a recurrent neural network.
Background
Object detection and human behavior recognition are active research topics in computer vision. The goal of human behavior recognition is to automatically analyze the behavior taking place in an unknown video or image sequence. It draws on many emerging technologies and can serve as a predictor of violations: a system built on it can both suppress a violation in time and raise an alarm when one occurs. Human behavior recognition is therefore indispensable for predicting human violations. However, it remains a very challenging task because of uncontrollable factors such as varying illumination, diverse viewing angles, complex backgrounds, large intra-class variation, and crowding. Researchers have proposed various approaches to these problems. An RNN can process time series data, but it suffers from the vanishing gradient problem and therefore cannot handle long time series.
LSTM is a recurrent neural network that improves on the RNN. On top of the RNN structure it adds a memory cell and three gates that control what information is remembered over time: a forget gate, an input gate, and an output gate. At every time step the information passes through these three gates, so the cell can selectively memorize new information and control how much old information is forgotten. This compensates for the shortcomings of the RNN, and the model can be applied effectively to medium- and long-term sequence data.
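The gating just described can be sketched for a single scalar unit. This is only an illustrative sketch, not the implementation used by the invention: the function name `lstm_step`, the weight layout, and the numeric values are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic function used by the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step for a single scalar unit.

    w maps each gate name to (input weight, recurrent weight, bias).
    All names and values here are hypothetical.
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much of the candidate to write
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    g = math.tanh(pre("candidate"))   # candidate cell content
    c = f * c_prev + i * g            # updated cell state (the memory)
    h = o * math.tanh(c)              # updated hidden state (the output)
    return h, c


gate_names = ("forget", "input", "output", "candidate")
weights = {name: (0.5, 0.5, 0.0) for name in gate_names}
h, c = 0.0, 0.0
for x in (1.0, 0.5, -0.5):            # a short input sequence
    h, c = lstm_step(x, h, c, weights)
```

Because the cell state is updated additively (`f * c_prev + i * g`), information can be carried across many time steps, which is the property the text relies on for longer sequences.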
The hidden Markov model (HMM) is the dynamic Bayesian network with the simplest structure and a well-known directed graphical model. It is mainly used for modeling time series data and is widely applied in fields such as speech recognition and natural language processing. For time series data with the Markov property, using this model greatly reduces the amount of computation. The present invention applies a hidden Markov model to correct the prediction of violation behavior.
Disclosure of Invention
The invention aims to better predict human body violation behaviors, and provides a human body violation behavior prediction method based on a hidden Markov model and a recurrent neural network.
The technical scheme of the invention is as follows:
a human body violation behavior prediction method based on a hidden Markov model and a recurrent neural network comprises the following steps:
1) data acquisition: acquiring video data of different illegal behaviors, slicing the video data, converting continuous video data into continuous images, and marking the illegal behaviors of each group of continuous images;
2) preprocessing the image;
3) carrying out target detection on the preprocessed image to obtain a target detection frame;
4) taking the image with the target detection frame as the input of the CPN, extracting the human skeleton in the target detection frame, marking the joint points, and obtaining the image with the target detection frame and the joint point marking information; converting the images with the target detection frame and the joint point mark information in each group into pixel matrixes;
5) taking each group of pixel matrixes obtained in the step 4) as a sample to form a sample training set; training the LSTM model by using a sample training set to obtain the probability that the sample belongs to each violation behavior, and forming a probability matrix;
6) correcting the probability matrix by using a hidden Markov model, taking the violation behavior corresponding to the maximum probability value in the corrected probability matrix as a final prediction result, and training the hidden Markov model according to the prediction result and a real result;
7) obtaining video data to be predicted, converting the video data to be predicted into a pixel matrix to be processed through steps 1) to 4), obtaining an initial probability matrix by using a trained LSTM model, correcting the initial probability matrix by using a trained hidden Markov model, and taking violation behaviors corresponding to maximum probability values in the corrected probability matrix as final prediction results.
Further, the preprocessing in step 2) is median filtering: a square region is taken centered on a pixel of the picture, the gray values of all pixels in the region are sorted, the middle value of the sorted sequence is taken as the new gray value of the center pixel, and the image is traversed in a sliding-window manner.
Further, the target detection process in step 3) includes:
sequentially resizing the continuous images in a group and extracting features to obtain a feature map;
performing one convolution on the feature map to concentrate the feature information, and then splitting into two branches: in the first branch, persons and background are distinguished through the rpn_data layer, and candidate frames marked as persons are output; in the second branch, the offsets of the candidate frames are calculated and output;
removing candidate frames that cross the image boundary and applying non-maximum suppression (NMS) to eliminate overlapping frames; inputting the remaining candidate frames and the feature map into the ROI Pooling layer, mapping the candidate frames onto the feature map, and outputting the result after a fully connected layer.
Further, the data input into the LSTM model is a pixel matrix sequence ordered in time, $a = (a_1, a_2, a_3, a_4, \ldots, a_n)$, wherein $a_i$ represents the pixel matrix corresponding to the i-th image in the group.
Further, the hidden markov model is established by the following steps:
6.1) determining the set of hidden states as $S = \{s_1, s_2, \ldots, s_N\}$ and the set of observation states as $O = \{o_1, o_2, \ldots, o_N\}$, where N is the number of types of violation behavior;
6.2) determining the state transition probability matrix $A = [a_{ij}]_{N \times N}$, where the state at the current moment depends only on the state at the previous moment, namely:
$$a_{ij} = p(y_{t+1} = s_j \mid y_t = s_i)$$
wherein $y_t$ denotes the state at time t; $y_t = s_i$ means that the state at time t is $s_i$, and $y_{t+1} = s_j$ means that the state at time t+1 is $s_j$; $p(y_{t+1} = s_j \mid y_t = s_i)$ is the probability that the state at time t+1 is $s_j$ given that the state at time t is $s_i$; $a_{ij}$ is the element in the i-th row and j-th column of the state transition probability matrix A, i.e. the probability that the model, being in state $s_i$ at time t, transfers to state $s_j$ at time t+1;
6.3) determining the observation probability matrix $B = [b_{ij}]_{N \times N}$:
$$b_{ij} = p(x_t = o_j \mid y_t = s_i)$$
wherein $o_j$ denotes the j-th observed value and $x_t$ denotes the observed value at time t; $x_t = o_j$ means that the observed value at time t is $o_j$; $p(x_t = o_j \mid y_t = s_i)$ is the probability that the observed value at time t is $o_j$ given that the state at time t is $s_i$; $b_{ij}$ is the element in the i-th row and j-th column of the observation probability matrix B, i.e. the probability that the observed value $o_j$ occurs in state $s_i$;
6.4) taking the probability matrix output by the LSTM model as the initial state probability distribution $\Pi = (\Pi_1, \Pi_2, \ldots, \Pi_N)$:
$$\Pi_i = p(y = s_i)$$
wherein $\Pi_i$ denotes the probability of belonging to state $s_i$.
Further, a step of performing feature extraction and dimension reduction on the pixel matrix is further included between step 4) and step 5), specifically: inputting the pixel matrix obtained in the step 4) into a convolution layer for feature extraction, and then entering a pooling layer for dimensionality reduction; the reduced pixel matrix is used as the input of the LSTM model.
The invention has the beneficial effects that:
The invention provides a method for predicting human violation behavior based on a hidden Markov model and a recurrent neural network. Different violation behaviors are defined for different scenes, and a data set is collected in the specific scene. The network model is then trained to predict violation behaviors, and a hidden Markov model is combined with it to correct the errors of the network model, so that violation behaviors can be judged correctly, prevented in time, and reported with a timely warning.
Drawings
FIG. 1 is a basic flow diagram of a prediction method;
FIG. 2 is a simplified flow of CPN estimation of a human joint;
FIG. 3 is a diagram illustrating the setting and adjustment of various parameters of the LSTM model;
FIG. 4 is a flow chart of LSTM model training;
FIG. 5 is a diagram structure of a hidden Markov model;
Detailed Description
The human violation prediction method based on the hidden markov model and the recurrent neural network according to the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, a human violation prediction method based on hidden markov model and recurrent neural network of the present invention includes the following steps:
S1: Data acquisition: video data of different violation behaviors are acquired and sliced, so that continuous video data are converted into continuous images, and each group of continuous images is labeled with its violation behavior. In this embodiment, the time series data are collected from the monitoring system; during model training, part of the data comes from the INRIA XMAX multi-view video library, and data with obvious characteristics are selected according to the specific requirements.
To facilitate subsequent operations, when the acquired data are sliced, frames are captured from the video data at fixed intervals, turning the video into a sequence of continuous images.
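The slicing step can be illustrated with a small sketch that only computes which frame indices are kept and how they are grouped. The interval of 10 frames and the group size of 4 are assumptions chosen for the example; the text does not specify them, and the helper names are hypothetical.

```python
def sample_frame_indices(total_frames, interval):
    """Indices of the frames kept when one frame is taken every `interval` frames."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    return list(range(0, total_frames, interval))


def group_frames(indices, group_size):
    """Split sampled indices into consecutive groups; a trailing partial group is dropped."""
    full = len(indices) - len(indices) % group_size
    return [indices[k:k + group_size] for k in range(0, full, group_size)]


# Hypothetical numbers: a 100-frame clip, one frame kept every 10 frames.
indices = sample_frame_indices(total_frames=100, interval=10)
groups = group_frames(indices, group_size=4)
```

Each resulting group corresponds to one labeled sequence of continuous images in the sense of step S1.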
S2: in order to improve the training efficiency and the accuracy of the model after the training is completed, the intercepted image data needs to be preprocessed.
Median filtering, a non-linear method, is used. The median filter is effective at removing impulse noise because it replaces the value of a contaminated pixel with an appropriate neighboring value. First, a region centered on a given pixel is determined, here a square region; the gray values of the pixels in the region are then sorted, and the middle value is taken as the new gray value of the center pixel. The square region is moved over the image as a sliding window, and once the whole image has been traversed a cleaner picture with little loss of detail is obtained.
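A minimal pure-Python sketch of this sliding-window median filter follows. The 3 × 3 window size and the border handling (using only the neighbors that fall inside the image) are assumptions; the text only requires a square region.

```python
def median_filter(image, size=3):
    """Apply a size x size median filter to a 2-D list of gray values.

    Border pixels use only the neighbours that fall inside the image
    (an assumption; the text does not say how borders are handled).
    """
    rows, cols = len(image), len(image[0])
    half = size // 2
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            window.sort()
            # Middle value of the sorted window becomes the new gray value.
            out[r][c] = window[len(window) // 2]
    return out


# A single impulse-noise pixel (255) in an otherwise uniform patch.
noisy = [
    [10, 10, 10],
    [10, 255, 10],
    [10, 10, 10],
]
clean = median_filter(noisy)
```

In this example the isolated bright pixel is replaced by the value of its neighbors, which is the impulse-noise behavior described above.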
S3: carrying out target detection on the preprocessed image to obtain a target detection frame:
Target detection is performed with a network based on Faster R-CNN, which mainly comprises four parts. The first is the Conv layers, which extract a feature map by convolution and pooling. The second is the RPN (Region Proposal Network), which produces accurate candidate frames. The third is ROI Pooling, which extracts a feature map for each candidate frame and sends it to the fully connected layers that judge the target. The fourth is the classification layer, which uses the candidate-frame feature map to compute the class of each candidate frame and, at the same time, its accurate position.
In implementation, the continuous images in a group are sequentially resized and their features extracted to obtain a feature map;
one convolution is performed on the feature map to concentrate the feature information, and the network then splits into two branches: in the first branch, persons and background are distinguished through the rpn_data layer, and candidate frames marked as persons are output; in the second branch, the offsets of the candidate frames are calculated and output;
candidate frames that cross the image boundary are removed and non-maximum suppression (NMS) is applied to eliminate overlapping frames; the remaining candidate frames and the feature map are input into the ROI Pooling layer, the candidate frames are mapped onto the feature map, and the result is output after a fully connected layer.
This embodiment is illustrated by way of example in fig. 5:
S3.1: The pictures pass through the Conv layers for feature extraction. This part contains 13 conv layers with kernel_size 3, pad 1, and stride 1. According to the output size formula
$$\text{output size} = \frac{\text{input size} - \text{kernel\_size} + 2 \times \text{pad}}{\text{stride}} + 1$$
a convolution with these settings does not change the size of the image. Each of the 13 conv layers is followed by a relu layer, which does not change the image size either. There are 4 pooling layers with kernel_size 2 and stride 2, each of which halves the width and height of the picture. After the picture has passed through this sequence of convolution, activation, and pooling layers, a feature map of size (M/16) × (N/16) × 512 is obtained for an M × N input.
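The size arithmetic of S3.1 can be checked with a short sketch. The function names and the 800 × 608 input are assumptions for illustration; the kernel, pad, and stride values are the ones given in the text.

```python
def layer_output_size(size, kernel_size, pad, stride):
    """Spatial output size of a convolution or pooling layer (integer division)."""
    return (size - kernel_size + 2 * pad) // stride + 1


def backbone_output_size(size, num_pools=4):
    """Size after the Conv layers: 3x3/pad 1/stride 1 convs keep the size,
    each 2x2/stride 2 pooling halves it."""
    for _ in range(num_pools):
        size = layer_output_size(size, kernel_size=3, pad=1, stride=1)  # conv
        size = layer_output_size(size, kernel_size=2, pad=0, stride=2)  # pool
    return size


# Hypothetical input of M x N = 800 x 608 pixels.
feat_h = backbone_output_size(800)
feat_w = backbone_output_size(608)
```

Only one convolution per pooling stage is simulated here, since additional size-preserving convolutions would not change the result.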
S3.2: After the feature map enters the RPN, it first passes through a 3 × 3 convolution, after which the feature map size is still M × N (here M and N denote the feature map dimensions) with 512 channels; the purpose of this convolution is to further concentrate the feature information. Two 1 × 1 convolutions follow, i.e., kernel_size 1, pad 0, stride 1.
S3.3: After the convolution, the network splits into two branches. The upper branch first passes through the rpn_data layer, where the 9 anchor boxes at each feature-map position are classified into two classes, person or background.
An anchor is a box of a given scale and aspect ratio placed at each position of the convolved feature map. To generate the anchors, a base box of size 16 × 16 is defined; 16 × 16 is chosen because one point on the feature map corresponds to a 16 × 16 region of the original image. Starting from this base box, three squares with side lengths 8, 16, and 32 are derived, and from each square three anchors with aspect ratios 0.5, 1, and 2 are produced, so that M × N × 9 anchor boxes are generated in total.
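The anchor layout can be sketched as follows. Several details are assumptions not fixed by the text: the side lengths 8, 16, and 32 are taken literally, each anchor keeps the area of its square, the aspect ratio is interpreted as height / width, and each anchor is centered on the 16 × 16 image region of its feature-map position.

```python
import math


def anchors_at(cx, cy, sides=(8, 16, 32), ratios=(0.5, 1.0, 2.0)):
    """Return 9 anchors (x1, y1, x2, y2) centred on (cx, cy)."""
    boxes = []
    for side in sides:
        area = side * side
        for ratio in ratios:               # ratio = height / width (assumed)
            w = math.sqrt(area / ratio)
            h = w * ratio                  # keeps w * h equal to the square's area
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


def all_anchors(feat_h, feat_w, stride=16):
    """Anchors for every feature-map position, in original-image coordinates."""
    result = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = col * stride + stride / 2  # centre of the 16 x 16 image region
            cy = row * stride + stride / 2
            result.extend(anchors_at(cx, cy))
    return result


# A tiny 2 x 3 feature map gives 2 * 3 * 9 anchors.
anchors = all_anchors(feat_h=2, feat_w=3)
```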
Further, the anchor boxes are filtered and labeled in this layer, as follows:
① anchor boxes that extend beyond the original image are filtered out;
② the anchor box with the largest IoU with the ground truth is marked as a positive sample, label = 1;
③ if the IoU of an anchor box with the ground truth is greater than 0.7, it is marked as a positive sample, label = 1;
④ if the IoU of an anchor box with the ground truth is less than 0.3, it is marked as a negative sample, label = 0.
The formula for calculating IoU is as follows:
IoU=(A∩B)/(A∪B)
wherein A and B are the two boxes being compared. A positive sample indicates that a target is present and a negative sample that no target is present; the remaining anchor boxes are neither positive nor negative, do not participate in training, and are given label = -1.
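The IoU formula and the four labeling rules can be sketched together. The corner format (x1, y1, x2, y2), the single ground-truth box, and the choice to give filtered-out anchors the label -1 are assumptions made for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt, width, height, hi=0.7, lo=0.3):
    """Label each anchor 1 (positive), 0 (negative) or -1 (not used in training)."""
    labels = [-1] * len(anchors)
    # Rule 1: only anchors fully inside the image are considered.
    overlaps = {
        k: iou(box, gt)
        for k, box in enumerate(anchors)
        if box[0] >= 0 and box[1] >= 0 and box[2] <= width and box[3] <= height
    }
    for k, value in overlaps.items():
        if value > hi:        # rule 3
            labels[k] = 1
        elif value < lo:      # rule 4
            labels[k] = 0
    if overlaps:              # rule 2: the best-matching anchor is always positive
        labels[max(overlaps, key=overlaps.get)] = 1
    return labels


gt_box = (10, 10, 50, 50)
candidates = [
    (10, 10, 50, 50),   # identical to the ground truth
    (30, 30, 70, 70),   # small overlap
    (10, 10, 50, 40),   # IoU 0.75
    (10, 10, 50, 30),   # IoU 0.5
    (-5, 10, 35, 50),   # crosses the image boundary
]
labels = label_anchors(candidates, gt_box, width=100, height=100)
```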
In addition to labeling the anchor boxes, the offset between each anchor box and the ground truth is calculated. Let the ground truth box (the calibrated, real box) have center coordinates x*, y* and width and height w*, h*, and let the anchor box have center coordinates x_a, y_a and width and height w_a, h_a.
Then:
Δx=(x*-x_a)/w_a
Δy=(y*-y_a)/h_a
Δw=log(w*/w_a)
Δh=log(h*/h_a)
Learning is driven by the difference between the ground truth box and the predicted anchor box, so that the weights of the RPN learn to predict boxes.
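The four offsets above, and their inverse used when a predicted offset is applied to an anchor, can be sketched as follows. Boxes are given as (center x, center y, width, height); the function names and the example numbers are assumptions.

```python
import math


def encode(gt, anchor):
    """Offsets (dx, dy, dw, dh) of a ground truth box relative to an anchor."""
    x, y, w, h = gt
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(deltas, anchor):
    """Apply offsets to an anchor to recover a box (cx, cy, w, h)."""
    dx, dy, dw, dh = deltas
    xa, ya, wa, ha = anchor
    return (xa + dx * wa, ya + dy * ha, wa * math.exp(dw), ha * math.exp(dh))


anchor = (50.0, 50.0, 20.0, 40.0)   # hypothetical anchor
gt = (55.0, 40.0, 40.0, 40.0)       # hypothetical ground truth box
deltas = encode(gt, anchor)
restored = decode(deltas, anchor)
```

Dividing by the anchor size and taking logarithms of the size ratios makes the regression targets independent of the absolute scale of the box.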
S3.4: In the following layers, rpn_loss_cls calculates the classification loss using a binary cross-entropy function; rpn_loss_bbox calculates the regression loss using the Smooth L1 loss function; and rpn_cls_prob calculates the probability values using the softmax function. The anchor boxes labeled in rpn_data serve as the input to the upper branch of the RPN, and the offsets between the anchor boxes and the gt_boxes serve as the input to the lower branch. The RPN is then trained. The two rpn_cls_score_restore layers are needed because softmax expects a fixed input and output shape: the data are first changed to that shape, and the classified output is then changed back to the desired shape. Finally, the boxes classified as targets are put into the set of candidate boxes (proposals).
S3.5: rpn_bbox_pred records the four trained regression offsets Δx, Δy, Δw, Δh, which are then used to correct the position of each anchor and obtain a more accurate prediction box. Prediction boxes that cross the image boundary are removed, and non-maximum suppression (NMS) is applied to eliminate overlapping boxes. The IoU threshold for NMS is set to 0.7, i.e., a box is retained only if its overlap with every higher-scoring retained box does not exceed 0.7. This leaves about M anchors, of which the first N are taken in descending order of rpn_cls_prob, so that only about N region proposals enter the ROI Pooling layer.
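A minimal sketch of greedy non-maximum suppression with the 0.7 threshold, followed by keeping the top-scoring boxes, is given below. The greedy formulation, the function names, and the example boxes and scores are assumptions; the text only states the threshold and the ordering by rpn_cls_prob.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.7, top_n=None):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    for k in order:
        # Keep a box only if it does not overlap a better kept box too much.
        if all(iou(boxes[k], boxes[j]) <= iou_threshold for j in keep):
            keep.append(k)
    return keep if top_n is None else keep[:top_n]


boxes = [(0, 0, 10, 10), (0, 0, 10, 9), (20, 20, 30, 30), (0, 0, 10, 5)]
scores = [0.9, 0.8, 0.7, 0.6]
kept = nms(boxes, scores)
top_two = nms(boxes, scores, top_n=2)
```

In the example, the second box overlaps the first with IoU 0.9 and is suppressed, while the fourth box (IoU 0.5) survives the 0.7 threshold.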
S3.6: The region proposals generated by the RPN and the feature map generated earlier are input into the ROI Pooling layer. Each region proposal is traversed and its coordinates are divided by 16, so that the region proposals, which were generated in original-image coordinates, are mapped onto the M × N feature map. Each one determines a region on the feature map, namely the feature map corresponding to that region proposal, which is used as the input of the following fully connected layer.
S3.7: The specific class of each region proposal is calculated through the fully connected layer and softmax, and the cls_prob probability vector is output; at the same time, bounding box regression is applied again to obtain the position offset bbox_pred of each region proposal, which is used to regress a more accurate target detection frame.
And after finishing the target detection, taking the picture with the target detection frame as the input of the CPN network to carry out joint point estimation.
S4: taking the image with the target detection frame as the input of the CPN, extracting the human skeleton in the target detection frame, marking the joint points, and obtaining the image with the target detection frame and the joint point marking information; and converting the images with the target detection frame and the joint point mark information in each group into a pixel matrix.
As shown in fig. 2, the GlobalNet stage is responsible for detecting all joint points in the image, and it predicts well the joint points that are easier to detect, such as the eyes and arms. It also provides sufficient contextual information, which is important for inferring occluded and invisible joint points. As the figure shows, joint points such as the ears, left elbow, and right elbow can be located and predicted directly, and this stage mainly detects such easily observed joint points. Joint points that are difficult to observe are predicted by processing the pixel information around them.
The RefineNet stage is responsible for correcting the results predicted by GlobalNet. GlobalNet makes large errors for joint points whose body part is occluded, invisible, or set against a complex background, and RefineNet is dedicated to correcting these points.
The results of the two stages are combined and placed on the original image to obtain a picture with the estimated joint points.
S5: taking each group of pixel matrixes obtained in the step S4 as a sample to form a sample training set; training the LSTM model by using a sample training set to obtain the probability that the sample belongs to each violation behavior, and forming a probability matrix;
To better avoid the vanishing gradient problem, LSTM is selected as the network model. The LSTM network model is trained with the back propagation algorithm to learn the relationship between human body states and time in the training set, yielding the weights and biases of the LSTM network.
With reference to fig. 3 and 4, the specific process of LSTM model training is as follows:
First, the functions and parameters of the LSTM model are determined. The activation functions are the sigmoid and tanh functions. To prevent overfitting, a dropout value is set, here 0.2, and the loss function is the mean squared deviation of the predicted values from the true values. The data set contains 1000 samples, the batch size is set to 200, and the epoch is 5.
During training, the data are first input into the convolutional layer for feature extraction and then into the pooling layer for dimensionality reduction. Depending on the application, there may be several convolutional and pooling layers, and the size of the sliding window may vary. After a dropout, the data enter the LSTM network. After being processed by a number of neurons, the data pass through dropout again and then a regression layer, and the result is a probability matrix giving the probability that each behavior occurs. As long as the number of iterations has not been reached, the loss function is evaluated and the weights and biases are updated. This process is repeated until the number of iterations is reached.
S6: The probability matrix is corrected with the hidden Markov model, the violation behavior corresponding to the maximum probability value in the corrected probability matrix is taken as the final prediction result, and the hidden Markov model is trained according to the prediction result and the real result.
The method specifically comprises the following steps: determining the set of hidden states as $S = \{s_1, s_2, \ldots, s_N\}$ and the set of observation states as $O = \{o_1, o_2, \ldots, o_N\}$, where N is the number of types of violation behavior;
determining the state transition probability matrix $A = [a_{ij}]_{N \times N}$, where the state at the current moment depends only on the state at the previous moment, namely:
$$a_{ij} = p(y_{t+1} = s_j \mid y_t = s_i)$$
wherein $y_t$ denotes the state at time t; $y_t = s_i$ means that the state at time t is $s_i$, and $y_{t+1} = s_j$ means that the state at time t+1 is $s_j$; $p(y_{t+1} = s_j \mid y_t = s_i)$ is the probability that the state at time t+1 is $s_j$ given that the state at time t is $s_i$; $a_{ij}$ is the element in the i-th row and j-th column of the state transition probability matrix A, i.e. the probability that the model, being in state $s_i$ at time t, transfers to state $s_j$ at time t+1;
determining the observation probability matrix $B = [b_{ij}]_{N \times N}$:
$$b_{ij} = p(x_t = o_j \mid y_t = s_i)$$
wherein $o_j$ denotes the j-th observed value and $x_t$ denotes the observed value at time t; $x_t = o_j$ means that the observed value at time t is $o_j$; $p(x_t = o_j \mid y_t = s_i)$ is the probability that the observed value at time t is $o_j$ given that the state at time t is $s_i$; $b_{ij}$ is the element in the i-th row and j-th column of the observation probability matrix B, i.e. the probability that the observed value $o_j$ occurs in state $s_i$;
taking the probability matrix output by the LSTM model as the initial state probability distribution $\Pi = (\Pi_1, \Pi_2, \ldots, \Pi_N)$:
$$\Pi_i = p(y = s_i)$$
wherein $\Pi_i$ denotes the probability of belonging to state $s_i$.
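The text does not spell out the exact correction formula, so the following is only one plausible reading, offered as a sketch under stated assumptions: the LSTM output is used as the initial distribution Π, and a normalized forward-algorithm update with A and B is applied over a short sequence of observed classes. The function names, the two-class matrices, and the use of observed class indices as observations are all hypothetical.

```python
def normalize(values):
    """Scale non-negative values so that they sum to 1."""
    total = sum(values)
    return [v / total for v in values]


def hmm_correct(pi, A, B, observations):
    """Forward-style correction of an initial distribution (assumed reading).

    pi: initial state distribution (LSTM output), length N
    A:  A[i][j] = p(y_{t+1} = s_j | y_t = s_i)
    B:  B[i][j] = p(x_t = o_j | y_t = s_i)
    observations: observed class indices, one per time step
    """
    n = len(pi)
    alpha = normalize([pi[i] * B[i][observations[0]] for i in range(n)])
    for obs in observations[1:]:
        alpha = normalize([
            sum(alpha[i] * A[i][j] for i in range(n)) * B[j][obs]
            for j in range(n)
        ])
    return alpha


# Hypothetical two-class example.
pi = [0.45, 0.55]                      # LSTM slightly prefers class 1
A = [[0.9, 0.1], [0.2, 0.8]]           # transition probabilities
B = [[0.8, 0.2], [0.3, 0.7]]           # observation probabilities
corrected = hmm_correct(pi, A, B, observations=[0, 0])
predicted = max(range(len(corrected)), key=corrected.__getitem__)
```

In this example the uncorrected distribution would select class 1, while the corrected distribution selects class 0, illustrating how the temporal model can overturn a marginal network prediction. In practice A and B would be estimated from labeled data, as step S6 indicates.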
In practical application, video data to be predicted is obtained, the video data to be predicted is converted into a pixel matrix to be processed through S1-S4, a trained LSTM model is used for obtaining an initial probability matrix, the initial probability matrix is corrected through the trained hidden Markov model, and violation behaviors corresponding to the maximum probability values in the corrected probability matrix are taken as final prediction results.

Claims (4)

1. A human violation behavior prediction method based on a hidden Markov model and a recurrent neural network is characterized by comprising the following steps:
1) data acquisition: acquiring video data of different illegal behaviors, slicing the video data, converting continuous video data into continuous images, and marking the illegal behaviors of each group of continuous images;
2) preprocessing the image;
3) carrying out target detection on the preprocessed image to obtain a target detection frame;
4) taking the image with the target detection frame as the input of the CPN, extracting the human skeleton in the target detection frame, marking the joint points, and obtaining the image with the target detection frame and the joint point marking information; converting the images with the target detection frame and the joint point mark information in each group into pixel matrixes;
5) taking each group of pixel matrixes obtained in the step 4) as a sample to form a sample training set; training the LSTM model by using the sample training set, wherein the data input into the LSTM model is a pixel matrix sequence ordered in time, $a = (a_1, a_2, a_3, a_4, \ldots, a_n)$, wherein $a_i$ represents the pixel matrix corresponding to the i-th image in the group; obtaining the probability that the sample belongs to each violation behavior, and forming a probability matrix;
6) correcting the probability matrix by using a hidden Markov model, taking the violation behavior corresponding to the maximum probability value in the corrected probability matrix as a final prediction result, and training the hidden Markov model according to the prediction result and a real result;
the hidden Markov model establishing process comprises the following steps:
6.1) determining the set of hidden states as $S = \{s_1, s_2, \ldots, s_N\}$ and the set of observation states as $O = \{o_1, o_2, \ldots, o_N\}$, where N is the number of types of violation behavior;
6.2) determining the state transition probability matrix $A = [a_{ij}]_{N \times N}$, where the state at the current moment depends only on the state at the previous moment, namely:
$$a_{ij} = p(y_{t+1} = s_j \mid y_t = s_i)$$
wherein $y_t$ denotes the state at time t; $y_t = s_i$ means that the state at time t is $s_i$, and $y_{t+1} = s_j$ means that the state at time t+1 is $s_j$; $p(y_{t+1} = s_j \mid y_t = s_i)$ is the probability that the state at time t+1 is $s_j$ given that the state at time t is $s_i$; $a_{ij}$ is the element in the i-th row and j-th column of the state transition probability matrix A, i.e. the probability that the model, being in state $s_i$ at time t, transfers to state $s_j$ at time t+1;
6.3) determining the observation probability matrix $B = [b_{ij}]_{N \times N}$:
$$b_{ij} = p(x_t = o_j \mid y_t = s_i)$$
wherein $o_j$ denotes the j-th observed value and $x_t$ denotes the observed value at time t; $x_t = o_j$ means that the observed value at time t is $o_j$; $p(x_t = o_j \mid y_t = s_i)$ is the probability that the observed value at time t is $o_j$ given that the state at time t is $s_i$; $b_{ij}$ is the element in the i-th row and j-th column of the observation probability matrix B, i.e. the probability that the observed value $o_j$ occurs in state $s_i$;
6.4) taking the probability matrix output by the LSTM model as the initial state probability distribution $\Pi = (\Pi_1, \Pi_2, \ldots, \Pi_N)$:
$$\Pi_i = p(y = s_i)$$
wherein $\Pi_i$ denotes the probability of belonging to state $s_i$;
7) obtaining video data to be predicted, converting the video data to be predicted into a pixel matrix to be processed through steps 1) to 4), obtaining an initial probability matrix by using a trained LSTM model, correcting the initial probability matrix by using a trained hidden Markov model, and taking violation behaviors corresponding to maximum probability values in the corrected probability matrix as final prediction results.
2. The human body violation prediction method based on hidden Markov model and recurrent neural network as claimed in claim 1, wherein the preprocessing method in step 2) is a filtering method in which a square region centered on a pixel of the picture is taken, the gray values of all pixels in the region are sorted, the middle value of the sorted sequence is taken as the new gray value of the center pixel, and the image is traversed in a sliding-window manner.
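The filtering described in this claim corresponds to what is commonly called a median filter. A minimal sketch in plain Python follows, assuming a grayscale image stored as a list of rows and a 3x3 window; the border handling (using only the neighbors that lie inside the image) is an assumption, since the claim does not address image borders.

```python
def median_filter(image, size=3):
    """Replace each pixel by the median gray value of its size x size window.

    Border pixels use only the neighbors inside the image (an assumption).
    For windows with an even number of pixels the upper middle value is used.
    """
    height, width = len(image), len(image[0])
    radius = size // 2
    result = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the gray values of the square region around (y, x).
            window = [image[yy][xx]
                      for yy in range(max(0, y - radius), min(height, y + radius + 1))
                      for xx in range(max(0, x - radius), min(width, x + radius + 1))]
            window.sort()
            # The middle value of the sorted sequence becomes the new gray value.
            result[y][x] = window[len(window) // 2]
    return result


# A flat image with a single noisy pixel in the center.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
filtered = median_filter(noisy)
print(filtered)
```

The isolated bright pixel is replaced by the value of its neighbors, which is the typical effect of this kind of filter on impulse noise.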
3. The human body violation prediction method based on hidden Markov model and recurrent neural network as claimed in claim 1, wherein the target detection process in step 3) comprises:
sequentially resizing the continuous images in a group and extracting features to obtain a feature map;
performing one convolution on the feature map to concentrate the feature information, and then dividing the processing into two branches: in the first branch, persons are distinguished from the background through the rpn_data layer, and candidate frames marked as persons are output; in the second branch, the offsets of the candidate frames are calculated and output;
removing candidate frames that cross the image border and applying non-maximum suppression (NMS) to eliminate overlapping frames; and inputting the remaining candidate frames and the feature map into an RoI pooling layer, mapping the candidate frames onto the feature map, and outputting the candidate frames after passing through a fully connected layer.
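The candidate-frame filtering in this claim can be sketched as follows. This is an illustrative implementation in plain Python, assuming boxes in `(x1, y1, x2, y2)` format, an IoU threshold of 0.5, and hypothetical scores; none of these values are stated in the claim.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(boxes, scores, width, height, iou_threshold=0.5):
    """Drop boxes that cross the image border, then apply greedy NMS.

    Returns the indices of the kept boxes, highest score first.
    """
    # Keep only boxes that lie completely inside the image.
    inside = [i for i, (x1, y1, x2, y2) in enumerate(boxes)
              if x1 >= 0 and y1 >= 0 and x2 <= width and y2 <= height]
    # Visit the remaining boxes in order of decreasing score.
    order = sorted(inside, key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical candidate frames on a 100 x 100 image.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (60, 60, 90, 90), (-5, 20, 30, 60)]
scores = [0.9, 0.8, 0.7, 0.95]
kept = filter_and_nms(boxes, scores, width=100, height=100)
print(kept)
```

Here the last box is discarded because it crosses the left border, and the second box is suppressed because it overlaps heavily with the higher-scoring first box.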
4. The human body violation prediction method based on hidden Markov model and recurrent neural network as claimed in claim 1, further comprising, between step 4) and step 5), a step of feature extraction and dimension reduction of the pixel matrix, specifically: inputting the pixel matrix obtained in step 4) into a convolution layer for feature extraction, and then into a pooling layer for dimension reduction; the dimension-reduced pixel matrix is used as the input of the LSTM model.
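The convolution and pooling step can be illustrated with a small sketch in plain Python. The 2x2 kernel, the use of max pooling, and the 5x5 input are assumptions chosen for illustration; the claim does not specify the kernel size, the number of filters, or the pooling type.

```python
def conv2d(image, kernel):
    """Valid 2D cross-correlation, as computed by a convolution layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def max_pool(feature_map, size=2):
    """Non-overlapping max pooling that reduces each dimension by `size`."""
    return [[max(feature_map[y + i][x + j]
                 for i in range(size) for j in range(size))
             for x in range(0, len(feature_map[0]) - size + 1, size)]
            for y in range(0, len(feature_map) - size + 1, size)]


# Hypothetical 5 x 5 pixel matrix and a 2 x 2 diagonal-difference kernel.
image = [[1, 2, 3, 4, 5],
         [6, 7, 8, 9, 10],
         [11, 12, 13, 14, 15],
         [16, 17, 18, 19, 20],
         [21, 22, 23, 24, 25]]
kernel = [[1, 0],
          [0, -1]]

features = conv2d(image, kernel)  # 4 x 4 feature map
pooled = max_pool(features)       # 2 x 2 matrix after dimension reduction
print(pooled)
```

The pooled matrix has a quarter of the entries of the feature map, which shows how the pooling layer reduces the dimension of the input passed on to the LSTM model.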
CN202110302219.7A 2021-03-22 2021-03-22 Human body violation prediction method based on hidden Markov model and recurrent neural network Active CN113065431B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110302219.7A CN113065431B (en) 2021-03-22 2021-03-22 Human body violation prediction method based on hidden Markov model and recurrent neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110302219.7A CN113065431B (en) 2021-03-22 2021-03-22 Human body violation prediction method based on hidden Markov model and recurrent neural network

Publications (2)

Publication Number Publication Date
CN113065431A (en) 2021-07-02
CN113065431B (en) 2022-06-17

Family

ID=76563393

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110302219.7A Active CN113065431B (en) 2021-03-22 2021-03-22 Human body violation prediction method based on hidden Markov model and recurrent neural network

Country Status (1)

Country Link
CN (1) CN113065431B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537122A (en) * 2021-07-28 2021-10-22 浙江大华技术股份有限公司 Motion recognition method and device, storage medium and electronic equipment
CN113950113B (en) * 2021-10-08 2022-10-25 东北大学 Internet of vehicles switching decision method based on hidden Markov
CN114565087B (en) * 2022-04-28 2022-07-22 苏州浪潮智能科技有限公司 Method, device and equipment for reasoning intention of people and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615983A (en) * 2015-01-28 2015-05-13 中国科学院自动化研究所 Behavior identification method based on recurrent neural network and human skeleton movement sequences
CN108549876A (en) * 2018-04-20 2018-09-18 重庆邮电大学 The sitting posture detecting method estimated based on target detection and human body attitude
CN108681923A (en) * 2018-05-16 2018-10-19 浙江大学城市学院 A kind of consumer spending behavior prediction method based on modified hidden Markov model
WO2019043406A1 (en) * 2017-08-31 2019-03-07 Calipsa Limited Anomaly detection from video data from surveillance cameras
CN109902744A (en) * 2019-02-28 2019-06-18 成都新希望金融信息有限公司 A method of it is modified based on exceptional value of the Markov transition matrix to matrix
CN112329974A (en) * 2020-09-03 2021-02-05 中国人民公安大学 LSTM-RNN-based civil aviation security event behavior subject identification and prediction method and system
CN112347991A (en) * 2020-11-30 2021-02-09 北京理工大学 Method for analyzing skiing motion sequence based on hidden Markov

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845499A (en) * 2017-01-19 2017-06-13 清华大学 A kind of image object detection method semantic based on natural language
CN107563801A (en) * 2017-08-23 2018-01-09 浙江大学城市学院 Consumer behavior Forecasting Methodology under a kind of consumer's line based on hidden Markov model

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615983A (en) * 2015-01-28 2015-05-13 中国科学院自动化研究所 Behavior identification method based on recurrent neural network and human skeleton movement sequences
WO2019043406A1 (en) * 2017-08-31 2019-03-07 Calipsa Limited Anomaly detection from video data from surveillance cameras
CN108549876A (en) * 2018-04-20 2018-09-18 重庆邮电大学 The sitting posture detecting method estimated based on target detection and human body attitude
CN108681923A (en) * 2018-05-16 2018-10-19 浙江大学城市学院 A kind of consumer spending behavior prediction method based on modified hidden Markov model
CN109902744A (en) * 2019-02-28 2019-06-18 成都新希望金融信息有限公司 A method of it is modified based on exceptional value of the Markov transition matrix to matrix
CN112329974A (en) * 2020-09-03 2021-02-05 中国人民公安大学 LSTM-RNN-based civil aviation security event behavior subject identification and prediction method and system
CN112347991A (en) * 2020-11-30 2021-02-09 北京理工大学 Method for analyzing skiing motion sequence based on hidden Markov

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
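Deep convolutional framework for abnormal behavior detection in a smart surveillance system; Kwang-Eun Ko et al.; Engineering Applications of Artificial Intelligence; 2018-01-31; Vol. 67; pp. 226-234 *
Spatio-temporal representation based on three-dimensional skeleton and human behavior recognition; Ding Wenwen (丁文文); China Doctoral Dissertations Full-text Database, Information Science and Technology; 2019-01-15; pp. I138-160 *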

Also Published As

Publication number Publication date
CN113065431A (en) 2021-07-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant