CN113743306A - Method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate - Google Patents

Method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate

Info

Publication number
CN113743306A
Authority
CN
China
Prior art keywords
branch
slow
frame rate
output
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111037913.7A
Other languages
Chinese (zh)
Inventor
涂小妹
包晓安
吴彪
张娜
金瑜婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Guangxia Construction Vocational and Technical University
Original Assignee
Zhejiang Guangxia Construction Vocational and Technical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Guangxia Construction Vocational and Technical University
Priority to CN202111037913.7A
Publication of CN113743306A
Legal status: Pending (Current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Closed-Circuit Television Systems (AREA)

Abstract

The invention discloses a method for analyzing abnormal behaviors in real-time intelligent video monitoring based on a slowfast double-frame-rate network, and belongs to the field of video image recognition. To enable the slowfast network model to capture more spatial semantic information in the slow branch, the invention builds a multi-feature-fusion slowfast double-frame-rate network model that performs top-down feature fusion on the feature layers of the slow branch, thereby improving the ability of the slow branch to extract category spatial semantic information. To make the built slowfast network model converge better and faster, the method optimizes the design of its loss function, improving the classification capability of the network by adopting a loss function based on soft labels.

Description

Method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate
Technical Field
The invention relates to the field of video image recognition, and in particular to a method for analyzing abnormal behaviors in real-time intelligent video monitoring based on a slowfast double frame rate.
Background
The digital industrial innovation integrating a new generation of internet technology, internet-of-things technology and AI technology is becoming a new engine for public-safety monitoring, processing, analysis and information output, driving the rapid development of intelligent monitoring and the upgrade of traditional Chinese security into digital security. Relying on the rapid development of information technology and increasingly complete intelligent information facilities, intelligent equipment is becoming a reliable support for a new generation of public safety products in China. In recent years, video surveillance has come to play a major role in a variety of scenes, effectively improving public safety management efficiency. However, as video monitoring keeps expanding and monitoring cameras spread to every corner of a city, video inspection by human-eye observation can no longer meet the requirements of current social development.
Intelligent monitoring, acting as the "eyes and brain" of public safety, can identify and interpret abnormal states within the monitored range, issue corresponding early warnings according to regulations, and promptly remind supervisors to take measures. However, abnormal behavior analysis in an intelligent video monitoring system faces dimensional disparities in the target, range and accuracy of analysis and detection, and uncertainty in quality, effect and behavior. Moreover, as an extension of perception-type analysis and behavior judgment, the analysis itself suffers from single-node analysis failure, dynamic change of analysis and constraints, unequal information in space and uncertainty in the time dimension. Conventional video surveillance does not adequately account for these problems of disparity and uncertainty.
Disclosure of Invention
The invention aims to analyze abnormal behaviors in intelligent video monitoring in real time with a slowfast double-frame-rate model: to enable the slowfast network model to capture more spatial semantic information in the slow branch, to improve the ability of the slow branch to extract category spatial semantics, and to achieve higher classification precision for the trained network model. To this end, the invention provides a method for analyzing abnormal behaviors in real-time intelligent video monitoring with a slowfast double frame rate, based on multi-feature fusion and a soft-label cross entropy loss function.
The technical scheme adopted by the invention is as follows:
a method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate comprises the following steps:
A. acquiring person video clips with specific behaviors in an application scene as a sample data set, labeling pedestrian category labels, and preprocessing the sample data set;
B. building a multi-feature-fusion slowfast double-frame-rate network model, wherein the model comprises a slow branch and a fast branch, the slow branch operates at a low frame rate, and the fast branch operates at a high frame rate;
the slow branch comprises three first convolution blocks which are sequentially connected; the input of the 1st convolution block is a video frame image obtained by low-frame-rate sampling, the output of the 1st convolution block simultaneously serves as the input of the 2nd and 3rd convolution blocks, the output of the 2nd convolution block also serves as an input of the 3rd convolution block, and multi-feature fusion is realized in the 3rd convolution block;
the fast branch comprises three second convolution blocks which are connected in sequence; the input of the 1st second convolution block is a video frame image obtained by high-frame-rate sampling, and the output of each previous convolution block is used as the input of the next convolution block; the outputs of the second convolution blocks in the fast branch are laterally connected with the outputs of the first convolution blocks of the slow branch;
after the output results of the last convolution block of the slow branch and the last convolution block of the fast branch are connected, the behavior category is predicted through a softmax function;
C. training the multi-feature-fusion slowfast double-frame-rate network model built in step B by using a loss function based on soft labels together with the sample data set of step A;
D. acquiring a monitoring video in real time, and detecting abnormal behaviors with the trained multi-feature-fusion slowfast double-frame-rate network model.
Preferably, the category labels include fighting, climbing and falling.
Preferably, the category label adopts one-hot coding: the position of the true category is 1 and the rest are 0.
Preferably, the lateral connection is specifically: the output of the 1st convolution block of the fast branch is fused with the output of the 1st convolution block of the slow branch and used as the input of the 2nd convolution block of the slow branch, and the output of the 2nd convolution block of the fast branch is fused with the output of the 2nd convolution block of the slow branch and used as the input of the 3rd convolution block of the slow branch.
Preferably, when the lateral connection is performed, the output of the fast branch is sampled every α frames, converted into the same number of video frames as in the slow branch, and then connected along the channel direction.
Preferably, the first convolution block and the second convolution block adopt a network structure with multi-layer feature-fusion output and are composed of k+1 convolution layers; the output of the i-th convolution layer is spliced with the input of the i-th convolution layer and then used as the input of the (i+1)-th convolution layer, and the input of the 1st convolution layer is spliced with the output of the (k+1)-th convolution layer and then used as the final output of the convolution block.
Preferably, the soft-label-based loss function is:

L_{CE} = -\frac{1}{N}\sum_{j=1}^{N}\sum_{i=1}^{m} y_{ji}^{(k)} \log\bigl(p_{ji}^{(k)}\bigr)

y_{ji}^{(k)} = \begin{cases} \left(1-\frac{k}{N_{epoch}}\right)+\frac{k}{N_{epoch}}\,p_{ji}^{(k-1)}, & i \text{ is the true class of sample } j \\ 0, & \text{otherwise} \end{cases}

wherein L_{CE} is the loss value, N is the batch-size sample number, m is the class number, p_{ji}^{(k)} is the probability that the j-th sample is predicted as class i at the k-th iteration of the network, and p_{ji}^{(k-1)} is the probability that the j-th sample was predicted as class i in the previous iteration of the network; y_{ji}^{(k)} represents the soft label vector and has length m; k denotes the number of iterations of the network, and N_{epoch} represents the preset maximum number of network iterations.
Preferably, the training of step C is performed on the video data set of step A according to a cross-validation method.
Preferably, step D is specifically: acquiring a monitoring video in real time; in the trained multi-feature-fusion slowfast double-frame-rate network model, the fast branch extracts 15 high-frame-rate sampled video frame images per second and the slow branch extracts 2 low-frame-rate sampled video frame images per second; after the output results of the two branches are connected, the behavior type is predicted through the softmax function, and a warning is issued when abnormal behavior is detected.
Compared with the prior art, the invention has the beneficial effects that:
the method is used for analyzing the abnormal behaviors of the monitored video based on a slow double-frame rate network model, wherein the slow double-frame rate network model comprises a slow branch and a fast branch, the slow branch runs at a low frame rate, a large time sequence span (namely the number of frames skipped per second) is used, for example, 2 frames are extracted within 1 second, and the purpose is to capture semantic information provided by images or a plurality of sparse frames; the fast branch operates at a high frame rate, has a high temporal resolution, and takes 15 frames using a very small time span, e.g., 1 second, with the goal of capturing rapidly changing motion. In addition, the two branches adopt a structure of multi-layer feature fusion output, and feature fusion from top to bottom is carried out on the feature layers by utilizing the characteristic that features of different layers have different semantics, so that the capability of the Slow branch for extracting category space semantics and the capability of the fast branch for extracting time semantic information to weaken the space semantic information are improved.
In the slowfast network model, to train the classification model better, the cross entropy loss function is improved: the one-hot class encoding y_ji (a vector consisting of 0s and 1s) is converted into a soft-label form, and the label is updated with the probability prediction results of each round.
The present invention is a technical breakthrough relative to methods based on analysis of image slices, which lack analysis in the time dimension, or on video data that is not differentiated in the time dimension.
Drawings
FIG. 1 is a diagram of the abnormal behavior analysis steps of the present invention;
FIG. 2 shows sample video sequence data for three types of behaviors in an embodiment of the present invention;
fig. 3 is a schematic diagram of a slowfast network structure with multi-feature fusion proposed by the present invention;
fig. 4 is a schematic diagram of the structure of each convolution block in the slowfast network of fig. 3.
Detailed Description
The invention is described in detail below with reference to the drawings and specific embodiments, but the invention is not limited thereto.
The abnormal behavior analysis steps of the invention are shown in fig. 1.
More specifically, the implementation steps of the invention are as follows:
A. In a practical application scenario, a pedestrian video sample data set is collected and labeled (with behaviors such as fighting, climbing and falling), and the sample data set is preprocessed.
In this embodiment, a monitoring camera is used in an actual application scenario to capture video samples of 30 people, yielding 500 video segments of about 10 seconds each. The 500 videos are divided into 10 types of pedestrian behaviors (e.g., fighting, climbing, falling), with each type containing 50 video segments. Part of the data sample set is shown in fig. 2, where case 1 is a fighting video sequence, case 2 is a climbing video sequence, and case 3 is a falling video sequence.
B. Building a multi-feature fused slowfast double-frame rate network model
As shown in fig. 3, the multi-feature-fusion slowfast double-frame-rate network model includes two branches: a slow branch and a fast branch. The slow branch operates at a low frame rate and extracts 2 frames per second using a large temporal stride (i.e., the number of frames skipped per second), aiming to capture semantic information provided by images or a few sparse frames; the fast branch operates at a high frame rate with high temporal resolution and takes 15 frames per second using a very small temporal stride, aiming to capture rapidly changing motion.
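As a concrete illustration, the two sampling rates can be sketched in PyTorch as follows. This is a minimal sketch, assuming a 30 fps source clip stored as a [T, C, H, W] tensor; neither the source frame rate nor the tensor layout is specified in the patent.

```python
import torch

def sample_dual_rate(clip: torch.Tensor, slow_frames: int = 2,
                     fast_frames: int = 15):
    """Split one second of video into the two pathways: 2 evenly spaced
    frames for the slow branch and 15 for the fast branch."""
    total = clip.shape[0]
    slow_idx = torch.linspace(0, total - 1, slow_frames).long()
    fast_idx = torch.linspace(0, total - 1, fast_frames).long()
    return clip[slow_idx], clip[fast_idx]

# Usage: a hypothetical 1-second RGB clip at 30 fps.
clip = torch.randn(30, 3, 224, 224)
slow_in, fast_in = sample_dual_rate(clip)
print(slow_in.shape, fast_in.shape)  # [2, 3, 224, 224], [15, 3, 224, 224]
```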
The slow branch comprises three first convolution blocks connected in sequence: the input of the 1st convolution block is the video frame images obtained by low-frame-rate sampling, the output of the 1st convolution block simultaneously serves as input to the 2nd and 3rd convolution blocks, the output of the 2nd convolution block also serves as input to the 3rd convolution block, and multi-feature fusion is realized in the 3rd convolution block. The label C in the slow branch denotes the number of channels and T the number of sampled frames.
The fast branch comprises three second convolution blocks connected in sequence; the input of its 1st convolution block is the video frame images obtained by high-frame-rate sampling, and the output of each convolution block serves as the input of the next. In addition, the feature information extracted by the fast branch is added to the trunk of the slow branch through lateral connections: the output of the 1st convolution block of the fast branch is fused with the output of the 1st convolution block of the slow branch and used as the input of the 2nd convolution block of the slow branch, and the output of the 2nd convolution block of the fast branch is fused with the output of the 2nd convolution block of the slow branch and used as the input of the 3rd convolution block of the slow branch. This enables the slow branch to extract spatial semantic information while also obtaining the temporal semantic information of the fast branch. The label βC in the fast branch denotes the number of channels and αT the number of sampled frames. Because the fast branch focuses on temporal sequence and de-emphasizes spatial semantic information, its channel count is 1/8 that of the slow branch, which makes the whole network light and efficient enough for real-time monitoring.
In the lateral connection, the fast-branch output of shape {αT, S², βC} needs to be transformed into the slow-branch structure {T, S², αβC}, that is, every α frames must be packed into one frame. In this embodiment, time sampling may be adopted: by simply sampling one frame out of every α frames, {αT, S², βC} is transformed into {T, S², βC}. The transformed {T, S², βC} and the slow-branch output {T, S², αβC} are then connected along the channel dimension.
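A minimal sketch of this time-strided lateral connection, assuming the common [N, C, T, H, W] layout for 3D-CNN feature maps (the layout itself is not stated in the patent):

```python
import torch

def lateral_connect(fast_feat: torch.Tensor, slow_feat: torch.Tensor,
                    alpha: int) -> torch.Tensor:
    """Sample one of every alpha frames from the fast-branch features,
    turning {alpha*T, S^2, beta*C} into {T, S^2, beta*C}, then concatenate
    with the slow-branch features along the channel dimension."""
    fast_sampled = fast_feat[:, :, ::alpha]
    return torch.cat([slow_feat, fast_sampled], dim=1)

# Illustrative sizes: alpha=8, beta*C=8 fast channels, C=64 slow channels.
fast = torch.randn(1, 8, 16, 28, 28)   # {alpha*T = 16, beta*C = 8}
slow = torch.randn(1, 64, 2, 28, 28)   # {T = 2, C = 64}
print(lateral_connect(fast, slow, alpha=8).shape)  # [1, 72, 2, 28, 28]
```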
The convolution blocks in the slow branch and in the fast branch both adopt the multi-layer feature-fusion output network structure shown in fig. 4 and are composed of k+1 convolution layers: the output of the i-th convolution layer is spliced with the input of the i-th convolution layer and then used as the input of the (i+1)-th convolution layer, and the input of the 1st convolution layer is spliced with the output of the (k+1)-th convolution layer and then used as the final output of the convolution block. In fig. 4, the input of the 1st convolution layer is denoted X_0, the output of the 1st convolution layer is denoted X_1, the input of the k-th convolution layer is denoted X_{k-1}, the output of the k-th convolution layer is denoted X_k, and the final output is denoted X_U.
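The connectivity of this block can be expressed compactly; the following is a sketch under assumed 3D kernel sizes and a hidden width mid_ch, neither of which the patent specifies.

```python
import torch
import torch.nn as nn

class FusionConvBlock(nn.Module):
    """k+1 convolution layers where layer i+1 receives the concatenation
    of layer i's input and output, and the block output X_U concatenates
    the block input X_0 with the last layer's output."""

    def __init__(self, in_ch: int, mid_ch: int, k: int = 2):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(k + 1):
            self.layers.append(nn.Sequential(
                nn.Conv3d(ch, mid_ch, kernel_size=3, padding=1),
                nn.BatchNorm3d(mid_ch),
                nn.ReLU(inplace=True),
            ))
            ch += mid_ch  # the next layer sees [input_i, output_i]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        inp, out = x, None
        for layer in self.layers:
            out = layer(inp)                    # X_i
            inp = torch.cat([inp, out], dim=1)  # input of layer i+1
        return torch.cat([x, out], dim=1)       # X_U = [X_0, X_{k+1}]

# Usage: a block with k+1 = 3 conv layers on a [N, C, T, H, W] input.
block = FusionConvBlock(in_ch=3, mid_ch=8, k=2)
print(block(torch.randn(1, 3, 4, 16, 16)).shape)  # [1, 11, 4, 16, 16]
```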
The network structure shown in fig. 3 outputs features from multiple layers; exploiting the fact that features at different layers carry different semantics, top-down feature fusion is performed, which improves the ability of the slow branch to extract category spatial semantics and the ability of the fast branch to extract temporal semantics.
C. Designing the loss function of the slowfast network model
The slowfast network model is ultimately used to classify human behaviors in the video: the last feature layer of the network outputs class probabilities through softmax, and in the training stage the network model is optimized with a cross entropy loss function so that softmax assigns a higher probability to the correct class. The cross entropy loss function is:
L_{CE} = -\frac{1}{N}\sum_{j=1}^{N}\sum_{i=1}^{m} y_{ji} \log(p_{ji})

where N is the number of batch-size samples, m is the number of classes, p_{ji} is the probability that the network predicts sample j as class i, and y_{ji} is the one-hot label: if the true category of sample j is i, the position of i in the one-hot coding vector is 1, otherwise it is 0.
To train the classification model better, this embodiment improves the above cross entropy loss function. There, y_{ji} is the one-hot encoding of the class (a vector consisting of 0s and 1s): the position of the true class is 1 and the others are 0, i.e., a hard label. The improved cross entropy loss function changes the real label from a hard label to a soft label, expressed as follows:
L_{CE} = -\frac{1}{N}\sum_{j=1}^{N}\sum_{i=1}^{m} y_{ji}^{(k)} \log\bigl(p_{ji}^{(k)}\bigr)

where N is the number of batch-size samples, m is the number of classes, and p_{ji}^{(k)} is the probability predicted for the class at the k-th iteration of the network; the soft label y_{ji}^{(k)} takes, at the position of the true class, a value derived from p_{ji}^{(k-1)} (the probability predicted by the network in the previous iteration) and is 0 at the other positions:

y_{ji}^{(k)} = \begin{cases} \left(1-\frac{k}{N_{epoch}}\right)+\frac{k}{N_{epoch}}\,p_{ji}^{(k-1)}, & i \text{ is the true class of sample } j \\ 0, & \text{otherwise} \end{cases}

where k denotes the number of iterations of the network and N_{epoch} represents the preset maximum number of iterations of the network.
D. Training the slowfast network model by using the sample data set constructed in the step A
Dividing the video segment data set in the step A into 10 mutually exclusive subsets with equal size according to a cross validation method, using a union set of 9 subsets as a training set each time, and using the rest subset as a test set, thus obtaining 10 groups of training/test sets;
Each group of the divided video data sets is used to train the slowfast network model, and the data sets are expanded and enhanced by whole-video flipping and random erasing. For the network structure shown in fig. 4, the network is initialized with weights pre-trained on the ImageNet dataset so that it converges faster. In slowfast network training, the initial learning rate is set to 0.01 and its value decays exponentially with the number of training epochs, the batch size is set to 8, training stops after 400 epochs, and the trained model is finally stored as a .pt file.
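The stated schedule could look as follows in PyTorch. This is a sketch only: model and train_loader are placeholders for the fused network and one of the ten training splits, and the SGD momentum and exponential decay factor are assumptions.

```python
import torch
import torch.nn.functional as F

def train(model, train_loader, device="cuda"):
    """Initial learning rate 0.01 decayed exponentially per epoch,
    batch size 8 (set on the DataLoader), 400 epochs, weights saved
    to a .pt file."""
    model.to(device)
    opt = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.99)
    for epoch in range(400):
        for slow_in, fast_in, labels in train_loader:
            opt.zero_grad()
            logits = model(slow_in.to(device), fast_in.to(device))
            # The soft-label loss sketched above would replace this call,
            # using per-sample probabilities cached from the previous epoch.
            loss = F.cross_entropy(logits, labels.to(device))
            loss.backward()
            opt.step()
        sched.step()  # exponential decay of the learning rate
    torch.save(model.state_dict(), "slowfast_fusion.pt")
```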
E. Testing the Slowfast network model
The slowfast network model is loaded and the trained parameter file (the .pt file containing the weight values of all network layers) is read and imported into the slowfast network; the effect of the model is then tested with the test set divided in step D.
In practical application, collected video monitoring images serve as the input of the trained slowfast network model, so abnormal behaviors can be monitored in real time; when an abnormal behavior exists, its type is output and an alarm is raised.
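A minimal real-time loop matching this description might look as below; the capture source, the 30 fps rate, the 224x224 preprocessing, and the model's two-input convention are all assumptions.

```python
import cv2
import torch
import torch.nn.functional as F

def monitor(model, class_names, source=0, fps=30,
            abnormal=("fighting", "climbing", "falling")):
    """Buffer one second of frames, feed 2 of them to the slow branch and
    15 to the fast branch, and raise a warning on an abnormal prediction."""
    cap = cv2.VideoCapture(source)
    buf = []
    model.eval()
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, (224, 224))
        buf.append(torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0)
        if len(buf) < fps:
            continue
        clip, buf = torch.stack(buf), []        # [T, C, H, W], one second
        slow_idx = torch.linspace(0, fps - 1, 2).long()
        fast_idx = torch.linspace(0, fps - 1, 15).long()
        with torch.no_grad():
            logits = model(clip[slow_idx].unsqueeze(0),
                           clip[fast_idx].unsqueeze(0))
            pred = class_names[F.softmax(logits, dim=1).argmax(1).item()]
        if pred in abnormal:
            print(f"WARNING: abnormal behavior detected: {pred}")
    cap.release()
```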

Claims (9)

1. A method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate, characterized by comprising the following steps:
A. acquiring person video clips with specific behaviors in an application scene as a sample data set, labeling pedestrian category labels, and preprocessing the sample data set;
B. building a multi-feature-fusion slowfast double-frame-rate network model, wherein the model comprises a slow branch and a fast branch, the slow branch operates at a low frame rate, and the fast branch operates at a high frame rate;
the slow branch comprises three first convolution blocks which are sequentially connected; the input of the 1st convolution block is a video frame image obtained by low-frame-rate sampling, the output of the 1st convolution block simultaneously serves as the input of the 2nd and 3rd convolution blocks, the output of the 2nd convolution block also serves as an input of the 3rd convolution block, and multi-feature fusion is realized in the 3rd convolution block;
the fast branch comprises three second convolution blocks which are connected in sequence; the input of the 1st second convolution block is a video frame image obtained by high-frame-rate sampling, and the output of each previous convolution block is used as the input of the next convolution block; the outputs of the second convolution blocks in the fast branch are laterally connected with the outputs of the first convolution blocks of the slow branch;
after the output results of the last convolution block of the slow branch and the last convolution block of the fast branch are connected, the behavior category is predicted through a softmax function;
C. training the multi-feature-fusion slowfast double-frame-rate network model built in step B by using a loss function based on soft labels together with the sample data set of step A;
D. acquiring a monitoring video in real time, and detecting abnormal behaviors with the trained multi-feature-fusion slowfast double-frame-rate network model.
2. The method according to claim 1, wherein the category labels include fighting, climbing, and falling over.
3. The method according to claim 1, wherein the class label adopts one-hot coding: the position of the true class is 1 and the rest are 0.
4. The method for analyzing the abnormal behavior of the real-time intelligent video monitoring based on the slowfast double-frame rate as claimed in claim 1, wherein the lateral connection is specifically: the output of the 1st convolution block of the fast branch is fused with the output of the 1st convolution block of the slow branch and used as the input of the 2nd convolution block of the slow branch, and the output of the 2nd convolution block of the fast branch is fused with the output of the 2nd convolution block of the slow branch and used as the input of the 3rd convolution block of the slow branch.
5. The method according to claim 4, wherein during the lateral connection, the result output in the fast branch is sampled every α frames, converted into the same number of video frames as in the slow branch, and then connected in the channel direction.
6. The method for analyzing the abnormal behavior of the real-time intelligent video monitoring based on the slowfast double-frame rate according to claim 1, wherein the first convolution block and the second convolution block adopt a network structure with multi-layer feature-fusion output and are composed of k+1 convolution layers; the output of the i-th convolution layer is spliced with the input of the i-th convolution layer and then used as the input of the (i+1)-th convolution layer, and the input of the 1st convolution layer is spliced with the output of the (k+1)-th convolution layer and then used as the final output of the convolution block.
7. The method for analyzing the abnormal behavior of the real-time intelligent video monitoring based on the slowfast double-frame rate according to claim 1, wherein the soft-label-based loss function is:

L_{CE} = -\frac{1}{N}\sum_{j=1}^{N}\sum_{i=1}^{m} y_{ji}^{(k)} \log\bigl(p_{ji}^{(k)}\bigr)

y_{ji}^{(k)} = \begin{cases} \left(1-\frac{k}{N_{epoch}}\right)+\frac{k}{N_{epoch}}\,p_{ji}^{(k-1)}, & i \text{ is the true class of sample } j \\ 0, & \text{otherwise} \end{cases}

wherein L_{CE} is the loss value, N is the batch-size sample number, m is the class number, p_{ji}^{(k)} is the probability that the j-th sample is predicted as class i at the k-th iteration of the network, and p_{ji}^{(k-1)} is the probability that the j-th sample was predicted as class i in the previous iteration of the network; y_{ji}^{(k)} represents the soft label vector and has length m; k denotes the number of iterations of the network, and N_{epoch} represents the preset maximum number of iterations of the network.
8. The method for analyzing the abnormal behavior of the real-time intelligent video monitoring based on the slowfast double-frame rate as claimed in claim 1, wherein the training of step C is performed on the video data set of step A according to a cross-validation method.
9. The method for analyzing the abnormal behavior of the real-time intelligent video monitoring based on the slowfast double-frame rate according to claim 1, wherein step D is specifically: acquiring a monitoring video in real time; in the trained multi-feature-fusion slowfast double-frame-rate network model, the fast branch extracts 15 high-frame-rate sampled video frame images per second and the slow branch extracts 2 low-frame-rate sampled video frame images per second; after the output results of the two branches are connected, the behavior type is predicted through the softmax function, and a warning is issued when abnormal behavior is detected.
CN202111037913.7A 2021-09-06 2021-09-06 Method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate Pending CN113743306A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111037913.7A CN113743306A (en) 2021-09-06 2021-09-06 Method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111037913.7A CN113743306A (en) 2021-09-06 2021-09-06 Method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate

Publications (1)

Publication Number Publication Date
CN113743306A (en) 2021-12-03

Family

ID=78735889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111037913.7A Pending CN113743306A (en) 2021-09-06 2021-09-06 Method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate

Country Status (1)

Country Link
CN (1) CN113743306A (en)



Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111178432A (en) * 2019-12-30 2020-05-19 武汉科技大学 Weak supervision fine-grained image classification method of multi-branch neural network model
CN112183313A (en) * 2020-09-27 2021-01-05 武汉大学 SlowFast-based power operation field action identification method
CN112597824A (en) * 2020-12-07 2021-04-02 深延科技(北京)有限公司 Behavior recognition method and device, electronic equipment and storage medium
CN112232355A (en) * 2020-12-11 2021-01-15 腾讯科技(深圳)有限公司 Image segmentation network processing method, image segmentation device and computer equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
冯超峰: "Research on Behavior Recognition Based on SlowFast and a Temporal Segmentation Strategy", China Masters' Theses Full-text Database, Information Science and Technology *
曹占涛 et al.: "Breast Ultrasound Image Classification Based on Modified Label Distribution", Journal of University of Electronic Science and Technology of China *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114550047A (en) * 2022-02-22 2022-05-27 西安交通大学 Behavior rate guided video behavior identification method
CN114550047B (en) * 2022-02-22 2024-04-05 西安交通大学 Behavior rate guided video behavior recognition method

Similar Documents

Publication Publication Date Title
CN113936339B (en) Fighting identification method and device based on double-channel cross attention mechanism
CN110175580B (en) Video behavior identification method based on time sequence causal convolutional network
Zhao et al. Spatio-temporal autoencoder for video anomaly detection
WO2020173226A1 (en) Spatial-temporal behavior detection method
CN111259786B (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN109711463B (en) Attention-based important object detection method
CN108537119B (en) Small sample video identification method
CN112580523A (en) Behavior recognition method, behavior recognition device, behavior recognition equipment and storage medium
CN112016500A (en) Group abnormal behavior identification method and system based on multi-scale time information fusion
CN112468888B (en) Video abstract generation method and system based on GRU network
CN111738218B (en) Human body abnormal behavior recognition system and method
Wang et al. Spatial–temporal pooling for action recognition in videos
CN111914731B (en) Multi-mode LSTM video motion prediction method based on self-attention mechanism
CN112200096B (en) Method, device and storage medium for realizing real-time abnormal behavior identification based on compressed video
CN109446897B (en) Scene recognition method and device based on image context information
CN109583334B (en) Action recognition method and system based on space-time correlation neural network
CN111046728A (en) Straw combustion detection method based on characteristic pyramid network
CN109614896A (en) A method of the video content semantic understanding based on recursive convolution neural network
CN116721458A (en) Cross-modal time sequence contrast learning-based self-supervision action recognition method
CN116580453A (en) Human body behavior recognition method based on space and time sequence double-channel fusion model
CN113743306A (en) Method for analyzing abnormal behaviors of real-time intelligent video monitoring based on slowfast double-frame rate
Li et al. Fire flame image detection based on transfer learning
CN115690658B (en) Priori knowledge-fused semi-supervised video abnormal behavior detection method
CN110738129B (en) End-to-end video time sequence behavior detection method based on R-C3D network
CN114120076B (en) Cross-view video gait recognition method based on gait motion estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20211203)