CN113743306A - Method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate - Google Patents
Method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate
- Publication number
- CN113743306A (application number CN202111037913.7A)
- Authority
- CN
- China
- Prior art keywords
- branch
- slow
- frame rate
- output
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Abstract
The invention discloses a method for analyzing abnormal behaviors in real-time intelligent video monitoring based on a slowfast double-frame rate network, and belongs to the field of video image recognition. To enable the slowfast network model to capture more spatial semantic information in the slow branch, the invention builds a multi-feature-fusion slowfast double-frame rate network model that performs top-down feature fusion on the feature layers of the slow branch, thereby improving the slow branch's ability to extract category-level spatial semantic information. To make the built slowfast network model converge better and faster, the method optimizes the design of its loss function, adopting a soft-label-based loss function to improve the classification ability of the network.
Description
Technical Field
The invention relates to the field of video image recognition, in particular to a method for analyzing abnormal behaviors of real-time intelligent video monitoring based on a slowfast double-frame rate.
Background
Digital industrial innovation integrating a new generation of internet, internet-of-things and AI technologies is becoming a new engine for public safety, covering monitoring, processing, analysis and information output, driving the rapid development of intelligent monitoring and the upgrade of traditional Chinese security to digital security. Relying on the rapid development of information technology and increasingly complete intelligent information facilities, intelligent equipment is becoming a reliable support for the new generation of public safety products. In recent years, video surveillance has begun to play a major role in a variety of scenes, effectively improving the efficiency of public safety management. However, as monitoring cameras spread to every corner of a city, video inspection relying on human observation can no longer meet the requirements of current social development.
"Eyes and brain" intelligent monitoring for public safety can identify and interpret abnormal states within the monitored range, issue corresponding early warnings according to regulations, and promptly remind supervisors to take measures. However, abnormal behavior analysis in intelligent video monitoring systems faces dimensional disparities in the target, range and accuracy of analysis and detection, and uncertainty in quality, effect and behavior. Moreover, as an extension of perception-type analysis and behavior judgment, such analysis itself suffers from single-node analysis failure, dynamic change of analysis constraints, unequal information in space and uncertainty in the time dimension. Conventional video surveillance does not adequately account for these problems of disparity and uncertainty.
Disclosure of Invention
The invention aims to analyze abnormal behaviors in intelligent video monitoring in real time using a slowfast double-frame rate model, to enable the network model to capture more spatial semantic information in the slow branch, to improve the slow branch's ability to extract category-level spatial semantics, and to raise the classification precision of the trained network. The invention provides a method for analyzing abnormal behaviors in slowfast double-frame rate real-time intelligent video monitoring based on multi-feature fusion and a soft-label cross-entropy loss function.
The technical scheme adopted by the invention is as follows:
a method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate comprises the following steps:
A. acquiring a character video clip with a specific behavior as a sample data set in an application scene, labeling a pedestrian category label, and preprocessing the sample data set;
B. building a multi-feature-fused slowfast double-frame rate network model, wherein the model comprises a slow branch and a fast branch, the slow branch operates at a low frame rate, and the fast branch operates at a high frame rate;
the slow branch comprises three first convolution blocks which are sequentially connected, the input of the 1 st convolution block is a video frame image obtained by low frame rate sampling, the output of the 1 st convolution block is simultaneously used as the input of the 2 nd and 3 rd convolution blocks, the output of the 2 nd convolution block is also used as the input of the 3 rd convolution block, and multi-feature fusion is realized in the 3 rd convolution block;
the fast branch comprises three second convolution blocks which are connected in sequence, the input of the 1st convolution block is a video frame image obtained by high frame rate sampling, and the output of the previous convolution block is used as the input of the next convolution block; and the outputs of the second convolution blocks in the fast branch are laterally connected with the outputs of the first convolution blocks of the slow branch;
after the output results of the last convolution block of the slow branch and the last convolution block of the fast branch are connected, the behavior category is predicted through a softmax function;
C. b, training the multi-feature fusion slowfast double-frame rate network model established in the step B by using a loss function based on a soft label and the sample data set in the step A;
D. and acquiring a monitoring video in real time, and detecting abnormal behaviors by using a trained slowfast double-frame rate network model with multi-feature fusion.
Preferably, the category labels include fighting, climbing and falling.
Preferably, the category label adopts one-hot coding, the position of the category is 1, and the rest is 0.
Preferably, the lateral connection is specifically: the output of the 1st convolution block of the fast branch is fused with the output of the 1st convolution block of the slow branch as the input of the 2nd convolution block of the slow branch, and the output of the 2nd convolution block of the fast branch is fused with the output of the 2nd convolution block of the slow branch as the input of the 3rd convolution block of the slow branch.
Preferably, when the lateral connection is performed, the output of the fast branch is sampled every α frames, converted into the same number of video frames as in the slow branch, and then connected in the channel direction.
Preferably, the first convolution block and the second convolution block adopt a network structure with multi-layer feature-fusion output and are composed of k+1 convolutional layers; the output of the ith convolutional layer is spliced with the input of the ith convolutional layer and then used as the input of the (i+1)th convolutional layer, and the input of the 1st convolutional layer is spliced with the output of the (k+1)th convolutional layer and then used as the final output of the convolution block.
Preferably, the soft tag-based loss function is:
wherein L_CE is the loss value, N is the batch size, m is the number of classes, p_ji(k) is the probability that the jth sample is predicted as class i at the kth network iteration, and p_ji(k-1) is the probability that the jth sample was predicted as class i at the previous network iteration; y_ji(k) denotes the soft-label vector, of length m; k denotes the number of network iterations, and N_epoch denotes the preset maximum number of network iterations.
Preferably, the training of step C is performed according to a cross-validation method on the video data set in step a.
Preferably, step D specifically comprises: acquiring a monitoring video in real time; in the trained multi-feature-fused slowfast double-frame rate network model, the fast branch extracts 15 high-frame-rate-sampled video frame images per second and the slow branch extracts 2 low-frame-rate-sampled video frame images per second; after the output results of the two branches are connected, the behavior type is predicted through the softmax function, and a warning is issued when abnormal behavior is detected.
Compared with the prior art, the invention has the beneficial effects that:
the method is used for analyzing the abnormal behaviors of the monitored video based on a slow double-frame rate network model, wherein the slow double-frame rate network model comprises a slow branch and a fast branch, the slow branch runs at a low frame rate, a large time sequence span (namely the number of frames skipped per second) is used, for example, 2 frames are extracted within 1 second, and the purpose is to capture semantic information provided by images or a plurality of sparse frames; the fast branch operates at a high frame rate, has a high temporal resolution, and takes 15 frames using a very small time span, e.g., 1 second, with the goal of capturing rapidly changing motion. In addition, the two branches adopt a structure of multi-layer feature fusion output, and feature fusion from top to bottom is carried out on the feature layers by utilizing the characteristic that features of different layers have different semantics, so that the capability of the Slow branch for extracting category space semantics and the capability of the fast branch for extracting time semantic information to weaken the space semantic information are improved.
In the slowfast network model, to train the classification model better, the cross-entropy loss function is improved: the one-hot class encoding y_ji (a vector of 0s and 1s) is converted into soft-label form, and the label is updated with the probability predictions of each training round.
The present invention is a technical advance over methods based on analyzing image slices, which lack the time dimension, or on video data that is not differentiated in the time dimension.
Drawings
FIG. 1 is a diagram of the abnormal behavior analysis steps of the present invention;
FIG. 2 is a video sequence sample data for three types of behaviors shown in an embodiment of the present invention;
fig. 3 is a schematic diagram of a slowfast network structure with multi-feature fusion proposed by the present invention;
fig. 4 is a schematic diagram of the structure of each volume block in the slowfast network of fig. 3.
Detailed Description
The invention is described in detail below with reference to the drawings and specific embodiments, but the invention is not limited thereto.
The abnormal behavior analysis steps of the invention are shown in fig. 1.
More specifically, the implementation steps of the invention are as follows:
A. In the practical application scenario, a pedestrian video sample data set is collected and labeled (e.g., fighting, climbing, falling, etc.), and the sample data set is preprocessed.
In this embodiment, in an actual application scenario, a monitoring camera is used to capture video samples of 30 people, 500 video segments of about 10 seconds are obtained, the 500 videos are divided into 10 types of pedestrian behaviors (e.g., fighting, climbing, falling, etc.), each type of behavior includes 50 video segments, a data sample set of a part of the video segments is shown in fig. 2, where case 1 is a fighting video sequence, case 2 is a climbing video sequence, and case 3 is a falling video sequence.
B. Building a multi-feature fused slowfast double-frame rate network model
As shown in fig. 3, the slowfast two-frame rate network model with multi-feature fusion includes two branches: a slow branch and a fast branch, wherein the slow branch operates at a low frame rate, and extracts 2 frames in 1 second using a large time span (i.e., the number of frames skipped per second), aiming at capturing semantic information provided by an image or a few sparse frames; the fast branch operates at a high frame rate, has high temporal resolution, and takes 15 frames in 1 second using a very small time span, with the aim of capturing rapidly changing motion.
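The dual-rate sampling of the two branches can be sketched as follows; this is an illustrative sketch only (the function name and the uniform `linspace` index selection are assumptions, not the patent's exact sampling procedure):

```python
import torch

def sample_clips(video, fps, slow_rate=2, fast_rate=15):
    """video: tensor [T, C, H, W] of frames decoded at `fps` frames per second.
    Returns the sparse slow clip and the dense fast clip."""
    t = video.shape[0]
    n_slow = max(1, round(t / fps * slow_rate))   # e.g. 2 frames per second
    n_fast = max(1, round(t / fps * fast_rate))   # e.g. 15 frames per second
    slow_idx = torch.linspace(0, t - 1, steps=n_slow).long()
    fast_idx = torch.linspace(0, t - 1, steps=n_fast).long()
    return video[slow_idx], video[fast_idx]
```

For a 1-second clip at 30 fps this yields 2 slow frames and 15 fast frames, matching the frame counts given in the embodiment.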
The slow branch comprises three first convolution blocks connected in sequence; the input of the 1st convolution block is a video frame image obtained by low frame rate sampling, the output of the 1st convolution block is used as the input of both the 2nd and 3rd convolution blocks, the output of the 2nd convolution block is also used as the input of the 3rd convolution block, and multi-feature fusion is realized in the 3rd convolution block. The label C in the slow branch indicates the number of channels and T the number of sampled frames.
The fast branch comprises three second convolution blocks connected in sequence; the input of its 1st convolution block is a video frame image obtained by high frame rate sampling, and the output of the previous convolution block is used as the input of the next. In addition, the feature information extracted by the fast branch is added to the trunk of the slow branch through lateral connections: the output of the 1st convolution block of the fast branch is fused with the output of the 1st convolution block of the slow branch as the input of the 2nd convolution block of the slow branch, and the output of the 2nd convolution block of the fast branch is fused with the output of the 2nd convolution block of the slow branch as the input of the 3rd convolution block of the slow branch. This enables the slow branch to extract spatial semantic information while also obtaining the temporal semantic information of the fast branch. The label βC in the fast branch indicates the number of channels and αT the number of sampled frames. Because the fast branch focuses on the temporal sequence and weakens spatial semantics, its channel count is 1/8 of the slow branch's, making the whole network light and efficient enough for real-time monitoring.
In the lateral connection, the fast-branch output of shape {αT, S², βC} must be deformed to match the temporal structure of the slow branch; that is, every α frames need to be compressed into one frame. In this embodiment, time sampling is adopted: simply sampling every α frames transforms {αT, S², βC} into {T, S², βC}. The transformed {T, S², βC} is then concatenated along the channel dimension with the output of the slow branch.
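A minimal sketch of this time-sampled lateral connection (the tensor layout [N, C, T, H, W] and the simple stride-α slice are assumptions):

```python
import torch

def lateral_connect(slow_feat, fast_feat, alpha):
    """slow_feat: [N, C, T, H, W]; fast_feat: [N, beta_C, alpha*T, H, W].
    Sample the fast feature every alpha frames so its temporal length matches
    the slow branch, then concatenate along the channel dimension."""
    sampled = fast_feat[:, :, ::alpha]            # {alpha*T, S^2, beta*C} -> {T, S^2, beta*C}
    return torch.cat([slow_feat, sampled], dim=1)
```

For example, with α = 8 a fast feature of 16 frames is reduced to the slow branch's 2 frames before the channel-wise concatenation.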
The convolution blocks in the slow branch and the fast branch both adopt the multi-layer feature-fusion output network structure shown in fig. 4 and are composed of k+1 convolutional layers; the output of the ith convolutional layer is spliced with the input of the ith convolutional layer and then used as the input of the (i+1)th convolutional layer, and the input of the 1st convolutional layer is spliced with the output of the (k+1)th convolutional layer and then used as the final output of the convolution block. In fig. 4, the input of the 1st convolutional layer is denoted X_0, the output of the 1st convolutional layer X_1, the input of the kth convolutional layer X_{k-1}, the output of the kth convolutional layer X_k, and the final output X_U.
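The splicing rule above can be sketched as a small module; channel widths, the 3x3x3 kernels, and the class name are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FusionConvBlock(nn.Module):
    """Sketch of the multi-layer feature-fusion convolution block: each
    convolutional layer's output is concatenated with its input to feed the
    next layer, and the block input X_0 is concatenated with the last layer's
    output to form the final output X_U."""
    def __init__(self, in_ch, growth, k):
        super().__init__()
        self.convs = nn.ModuleList()
        ch = in_ch
        for _ in range(k + 1):                      # k+1 convolutional layers
            self.convs.append(nn.Conv3d(ch, growth, kernel_size=3, padding=1))
            ch += growth                            # next input = concat(input, output)

    def forward(self, x0):
        x = x0
        for conv in self.convs:
            out = conv(x)
            x = torch.cat([x, out], dim=1)          # splice layer input with its output
        return torch.cat([x0, out], dim=1)          # block input + last layer output
```

The growing channel count mirrors the figure: each layer sees every earlier feature map, which is what gives the block its top-down fusion behavior.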
The network structure shown in fig. 3 thus includes multi-layer feature outputs; exploiting the fact that features at different layers carry different semantics, top-down feature fusion improves the slow branch's ability to extract category-level spatial semantics and the fast branch's ability to extract temporal semantics.
C. Loss function for designing slowfast network model
The slowfast network model is ultimately used to classify human behaviors in video: the last feature layer of the network outputs class probabilities through softmax. In the training stage, the network model is optimized with a cross-entropy loss function so that softmax assigns a higher probability to the correct class. The cross-entropy loss function is:

L_CE = -(1/N) · Σ_j Σ_i y_ji · log(p_ji)

where N is the batch size, m is the number of classes, p_ji is the probability that the network predicts sample j as class i, and y_ji is the label: if the true class of sample j is i, the position of i in the one-hot vector is 1, otherwise 0.
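As a sanity check, this batch-averaged hard-label cross entropy can be computed directly from softmax probabilities (a sketch; the `clamp_min` guard against log(0) is my addition):

```python
import torch

def one_hot_cross_entropy(probs, targets):
    """Batch-averaged cross entropy over softmax probabilities: with one-hot
    labels this reduces to -log p of the true class for each sample."""
    n = probs.shape[0]
    true_class_p = probs[torch.arange(n), targets]
    return -torch.log(true_class_p.clamp_min(1e-8)).mean()
```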
For better training of the classification model, this embodiment improves the cross-entropy loss function above. y_ji is the one-hot class encoding (a vector of 0s and 1s): the position of the true class is 1 and all other positions are 0, i.e., a hard label. The improved cross-entropy loss changes the real label from a hard label to a soft label, expressed as follows:
where N is the batch size, m is the number of classes, p_ji(k) is the probability predicted for the class at the kth network iteration, and the soft label y_ji(k) takes the value p_ji(k-1) at the position of the true class and 0 elsewhere, where p_ji(k-1) is the probability predicted for that class by the network at the previous iteration, expressed as follows:

where k denotes the number of network iterations and N_epoch denotes the preset maximum number of network iterations.
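A hedged sketch of this soft-label cross entropy follows. It places the previous iteration's true-class probability at the true-class position of the label and 0 elsewhere; the exact blending in the patent's elided formula, which involves k and N_epoch, may differ from this simplification:

```python
import torch

def soft_label_ce(probs_k, probs_prev, targets):
    """probs_k: current-iteration softmax outputs [N, m];
    probs_prev: previous-iteration outputs [N, m]; targets: true class indices."""
    n = probs_k.shape[0]
    idx = torch.arange(n)
    soft = torch.zeros_like(probs_k)
    soft[idx, targets] = probs_prev[idx, targets]   # y_ji(k) = p_ji(k-1) at true class
    return -(soft * torch.log(probs_k.clamp_min(1e-8))).sum(dim=1).mean()
```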
D. Training the slowfast network model by using the sample data set constructed in the step A
Dividing the video segment data set in the step A into 10 mutually exclusive subsets with equal size according to a cross validation method, using a union set of 9 subsets as a training set each time, and using the rest subset as a test set, thus obtaining 10 groups of training/test sets;
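The 10-fold split described above can be sketched as follows (the seeded shuffle before partitioning is an assumption):

```python
import random

def ten_fold_splits(clips, k=10, seed=0):
    """Partition the video clips into k equal, mutually exclusive subsets;
    each fold uses one subset as the test set and the union of the other
    k-1 subsets as the training set."""
    clips = list(clips)
    random.Random(seed).shuffle(clips)
    folds = [clips[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        yield train, test
```

With the embodiment's 500 clips, each fold yields a 450-clip training set and a 50-clip test set.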
The slowfast network model is trained jointly on the divided video data sets, which are expanded and augmented by whole-video flipping and random erasing. For the network structure shown in fig. 3, the network is initialized with weights pre-trained on the ImageNet dataset so that it converges faster. During slowfast network training, the initial learning rate is set to 0.01 and decays exponentially with the number of training iterations, the batch size is set to 8, training stops after 400 epochs, and finally the trained model is saved as a .pt file.
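The stated schedule (initial learning rate 0.01, exponential decay, batch size 8, 400 epochs, save to .pt) can be sketched as below; the optimizer choice (SGD with momentum) and the decay factor `gamma` are assumptions, since the text does not name them:

```python
import torch

def make_training_setup(model, lr=0.01, gamma=0.99):
    """Optimizer plus an exponentially decaying learning-rate schedule."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=gamma)
    return optimizer, scheduler

# After the final epoch, the trained weights would be saved as a .pt file, e.g.:
# torch.save(model.state_dict(), "slowfast_fusion.pt")
```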
E. Testing the Slowfast network model
The slowfast network model is loaded, the trained parameter file (the .pt file), including the weight values of all network layers, is read and imported into the slowfast network, and the effect of the model is tested with the test set divided in step D.
In practical application, video monitoring images are collected as the input of the trained slowfast network model, so abnormal behaviors can be monitored in real time; when an abnormal behavior occurs, its type is output and an alarm is raised.
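The real-time monitoring loop can be sketched as follows. This is a hedged illustration: the buffer size, sampling counts (2 slow / 15 fast per second, per the embodiment), and the assumption that the model takes the two clips and returns class logits are mine:

```python
import torch

def monitor_stream(frames, model, abnormal_classes, fps=30):
    """frames: iterable of [C, H, W] tensors from the camera stream.
    Buffers one second of frames, samples the slow and fast clips,
    classifies, and yields an alert for abnormal classes."""
    buf = []
    for frame in frames:
        buf.append(frame)
        if len(buf) == fps:
            clip = torch.stack(buf)                           # [T, C, H, W]
            slow = clip[torch.linspace(0, fps - 1, 2).long()]
            fast = clip[torch.linspace(0, fps - 1, 15).long()]
            with torch.no_grad():
                probs = torch.softmax(model(slow, fast), dim=-1)
            cls = int(probs.argmax())
            if cls in abnormal_classes:
                yield ("alert", cls)
            buf.clear()
```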
Claims (9)
1. A method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate, characterized by comprising the following steps:
A. acquiring a character video clip with a specific behavior as a sample data set in an application scene, labeling a pedestrian category label, and preprocessing the sample data set;
B. building a multi-feature-fused slowfast double-frame rate network model, wherein the model comprises a slow branch and a fast branch, the slow branch operates at a low frame rate, and the fast branch operates at a high frame rate;
the slow branch comprises three first convolution blocks which are sequentially connected, the input of the 1 st convolution block is a video frame image obtained by low frame rate sampling, the output of the 1 st convolution block is simultaneously used as the input of the 2 nd and 3 rd convolution blocks, the output of the 2 nd convolution block is also used as the input of the 3 rd convolution block, and multi-feature fusion is realized in the 3 rd convolution block;
the fast branch comprises three second convolution blocks which are connected in sequence, the input of the 1st convolution block is a video frame image obtained by high frame rate sampling, and the output of the previous convolution block is used as the input of the next convolution block; and the outputs of the second convolution blocks in the fast branch are laterally connected with the outputs of the first convolution blocks of the slow branch;
after the output results of the last convolution block of the slow branch and the last convolution block of the fast branch are connected, the behavior category is predicted through a softmax function;
C. b, training the multi-feature fusion slowfast double-frame rate network model established in the step B by using a loss function based on a soft label and the sample data set in the step A;
D. and acquiring a monitoring video in real time, and detecting abnormal behaviors by using a trained slowfast double-frame rate network model with multi-feature fusion.
2. The method according to claim 1, wherein the category labels include fighting, climbing, and falling over.
3. The method according to claim 1, wherein the class label adopts one-hot encoding, the position of the true class being 1 and the rest 0.
4. The method for analyzing the abnormal behavior of the real-time intelligent video monitoring based on the slowfast double-frame rate as claimed in claim 1, wherein the lateral connection is specifically: the output of the 1st convolution block of the fast branch is fused with the output of the 1st convolution block of the slow branch as the input of the 2nd convolution block of the slow branch, and the output of the 2nd convolution block of the fast branch is fused with the output of the 2nd convolution block of the slow branch as the input of the 3rd convolution block of the slow branch.
5. The method according to claim 4, wherein during the lateral connection, the result output in the fast branch is sampled every α frames, converted into the same number of video frames as in the slow branch, and then connected in the channel direction.
6. The method according to claim 1, wherein the first convolution block and the second convolution block adopt a network structure with multi-layer feature-fusion output and are composed of k+1 convolutional layers; the output of the ith convolutional layer is spliced with the input of the ith convolutional layer and then used as the input of the (i+1)th convolutional layer, and the input of the 1st convolutional layer is spliced with the output of the (k+1)th convolutional layer and then used as the final output of the convolution block.
7. The method according to claim 1, wherein the soft-tag-based loss function is:
wherein L_CE is the loss value, N is the batch size, m is the number of classes, p_ji(k) is the probability that the jth sample is predicted as class i at the kth network iteration, and p_ji(k-1) is the probability that the jth sample was predicted as class i at the previous network iteration; y_ji(k) denotes the soft-label vector, of length m; k denotes the number of network iterations, and N_epoch denotes the preset maximum number of network iterations.
8. The method for analyzing the abnormal behavior of the real-time intelligent video monitoring based on the slowfast double-frame rate as claimed in claim 1, wherein the training of the step C is performed on the video data set in the step A according to a cross-validation method.
9. The method for analyzing the abnormal behavior of the real-time intelligent video monitoring based on the slowfast double-frame rate according to claim 1, wherein step D specifically comprises: acquiring a monitoring video in real time; in the trained multi-feature-fused slowfast double-frame rate network model, the fast branch extracts 15 high-frame-rate-sampled video frame images per second and the slow branch extracts 2 low-frame-rate-sampled video frame images per second; after the output results of the two branches are connected, the behavior type is predicted through the softmax function, and a warning is issued when abnormal behavior is detected.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111037913.7A CN113743306A (en) | 2021-09-06 | 2021-09-06 | Method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113743306A true CN113743306A (en) | 2021-12-03 |
Family
ID=78735889
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111037913.7A Pending CN113743306A (en) | 2021-09-06 | 2021-09-06 | Method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113743306A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114550047A (en) * | 2022-02-22 | 2022-05-27 | 西安交通大学 | Behavior rate guided video behavior identification method |
CN114550047B (en) * | 2022-02-22 | 2024-04-05 | 西安交通大学 | Behavior rate guided video behavior recognition method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111178432A (en) * | 2019-12-30 | 2020-05-19 | 武汉科技大学 | Weak supervision fine-grained image classification method of multi-branch neural network model |
CN112183313A (en) * | 2020-09-27 | 2021-01-05 | 武汉大学 | SlowFast-based power operation field action identification method |
CN112232355A (en) * | 2020-12-11 | 2021-01-15 | 腾讯科技(深圳)有限公司 | Image segmentation network processing method, image segmentation device and computer equipment |
CN112597824A (en) * | 2020-12-07 | 2021-04-02 | 深延科技(北京)有限公司 | Behavior recognition method and device, electronic equipment and storage medium |
Non-Patent Citations (2)
Title |
---|
冯超峰: "基于SlowFast与时域分割策略的行为识别研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
曹占涛等: "基于修正标签分布的乳腺超声图像分类", 《电子科技大学学报》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113936339B (en) | Fighting identification method and device based on double-channel cross attention mechanism | |
CN110175580B (en) | Video behavior identification method based on time sequence causal convolutional network | |
Zhao et al. | Spatio-temporal autoencoder for video anomaly detection | |
WO2020173226A1 (en) | Spatial-temporal behavior detection method | |
CN111259786B (en) | Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video | |
CN109711463B (en) | Attention-based important object detection method | |
CN108537119B (en) | Small sample video identification method | |
CN112580523A (en) | Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium | |
CN112016500A (en) | Group abnormal behavior identification method and system based on multi-scale time information fusion | |
CN112468888B (en) | Video abstract generation method and system based on GRU network | |
CN111738218B (en) | Human body abnormal behavior recognition system and method | |
Wang et al. | Spatial–temporal pooling for action recognition in videos | |
CN111914731B (en) | Multi-mode LSTM video motion prediction method based on self-attention mechanism | |
CN112200096B (en) | Method, device and storage medium for realizing real-time abnormal behavior identification based on compressed video | |
CN109446897B (en) | Scene recognition method and device based on image context information | |
CN109583334B (en) | Action recognition method and system based on space-time correlation neural network | |
CN111046728A (en) | Straw combustion detection method based on characteristic pyramid network | |
CN109614896A (en) | A method of the video content semantic understanding based on recursive convolution neural network | |
CN116721458A (en) | Cross-modal time sequence contrast learning-based self-supervision action recognition method | |
CN116580453A (en) | Human body behavior recognition method based on space and time sequence double-channel fusion model | |
CN113743306A (en) | Method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate | |
Li et al. | Fire flame image detection based on transfer learning | |
CN115690658B (en) | Priori knowledge-fused semi-supervised video abnormal behavior detection method | |
CN110738129B (en) | End-to-end video time sequence behavior detection method based on R-C3D network | |
CN114120076B (en) | Cross-view video gait recognition method based on gait motion estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |

Application publication date: 20211203 |