CN110414367B - Time sequence behavior detection method based on GAN and SSN - Google Patents

Time sequence behavior detection method based on GAN and SSN

Info

Publication number
CN110414367B
Authority
CN
China
Prior art keywords
network
behavior
proposal
sub
time sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201910599488.7A
Other languages
Chinese (zh)
Other versions
CN110414367A (en)
Inventor
李致远
桑农
张士伟
高常鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology filed Critical Huazhong University of Science and Technology
Priority to CN201910599488.7A
Publication of CN110414367A
Application granted
Publication of CN110414367B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 - Arrangements for image or video recognition or understanding
    • G06V 10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 - Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 - Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/757 - Matching configurations of points or features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 - Scenes; Scene-specific elements
    • G06V 20/40 - Scenes; Scene-specific elements in video content
    • G06V 20/46 - Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/20 - Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a time sequence behavior detection method based on a GAN and an SSN, belonging to the technical field of computer vision. The method comprises the following steps: performing frame extraction and optical-flow computation on the video data, and normalizing and augmenting each frame image or optical-flow image; selecting continuous temporal regions containing action segments in the video data as proposals, and using the frame images corresponding to the selected proposals as a training set and a test set; constructing a time sequence behavior detection model comprising a structured segment network and a generative adversarial network; inputting the training set and the test set into the model for training to obtain a trained time sequence behavior detection model; and inputting the video to be recognized into the trained model to obtain the behavior categories present in the video and the start and end positions corresponding to each behavior. The invention improves the network's ability to distinguish background from behaviors and achieves higher accuracy in detecting time sequence behaviors in video.

Description

Time sequence behavior detection method based on GAN and SSN
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a time sequence behavior detection method based on GAN and SSN.
Background
With the rapid spread of the Internet, a huge amount of video data is generated. As one of the largest information carriers in modern society, video is growing rapidly, and making full use of this massive data is an urgent problem. Accordingly, the demand for analyzing, classifying, and recognizing video data is also growing sharply, and time sequence behavior detection has attracted increasing attention from the research community owing to its many potential applications in surveillance, video analysis, and other fields. Time sequence behavior detection is a subtask of behavior detection that detects human action instances in untrimmed and potentially very long videos; compared with behavior recognition, its predictions output not only the action category but also precise start and end time points, making it the more challenging task.
In real-world applications, video data are typically arbitrarily long in time and large in space, containing many action instances along with much irrelevant background. Two mainstream approaches to action detection have been proposed: hand-crafted features and deep features. Before CNN-based algorithms became widespread in behavior recognition, hand-crafted features achieved the best performance in the THUMOS 2014 and 2015 challenges; commonly used features include Improved Dense Trajectories (iDT) and Fisher Vectors (FV). Hand-crafted features can also be combined with deep learning to achieve highly accurate results. More recently, some studies have performed automatic feature extraction with single-frame deep neural networks, relying on 2D convolutional neural networks (CNNs) without considering motion information. However, motion information is important for modeling actions and determining temporal boundaries. To model the temporal evolution of actions, many methods generate candidate temporal segments by sliding windows or binary classification and then classify and recognize them. A disadvantage of these mainstream sliding-window frameworks is the large amount of redundant detections, which not only reduces detection accuracy but also hinders practical application.
Meanwhile, many behavior detection methods for different scenes have been proposed and have achieved high detection performance. However, most of them assume that the video is well cropped, with the action of interest lasting almost the whole duration, so they need not consider localizing action instances. Moreover, because the network itself cannot handle hard samples well during training, its ability to distinguish behaviors from background is poor.
Generally speaking, existing time sequence behavior detection methods cannot capture the subtle differences between behaviors and background, and therefore cannot effectively distinguish the two.
Disclosure of Invention
Aiming at the above defects of the prior art, the invention provides a time sequence behavior detection method based on a GAN and an SSN, so as to solve the problem that existing time sequence behavior detection distinguishes behaviors from background poorly.
In order to achieve the above object, the present invention provides a time sequence behavior detection method based on GAN and SSN, including:
(1) dividing video data into a training set and a test set, and performing frame extraction and optical flow calculation on the training set and the test set;
(2) selecting some region segments as proposals for each video, and carrying out normalization and data enhancement processing on frame images or optical flow images contained in the proposals;
(3) constructing a time sequence behavior detection model;
the time sequence behavior detection model comprises a structured segment network and a generative adversarial network;
the structured segment network is used for extracting features from the images contained in a proposal, dividing the extracted features into start-stage, behavior-stage, and end-stage features according to a set proportion, and performing classification, boundary regression, and integrity scoring based on these stage features;
the generative adversarial network is used for generating hard-example features that have the same dimension and size as the features extracted by the structured segment network and follow the same distribution as the training set, and for judging whether features are real or fake based on the generated hard-example features and the features extracted by the structured segment network;
(4) inputting the training set and the test set into the time sequence behavior detection model for training and testing to obtain a finally trained time sequence behavior detection model;
(5) and inputting the video to be recognized into the trained time sequence behavior detection model to obtain the behavior categories existing in the video and the starting position and the ending position corresponding to various behaviors.
Further, the selection of region segments as proposals for each video in step (2) specifically includes:
(2.1) randomly generating a series of proposals for each video;
(2.2) scoring the randomly generated proposal using a BNinception-based binary network;
(2.3) generating the proposal needed by the time-sequence behavior detection according to the proposal score by adopting a TAG algorithm.
Further, the step (2.3) specifically includes:
(2.3.1) inverting the proposal scores along the horizontal axis and regarding proposals whose scores fall below a set threshold as proposal basins;
(2.3.2) starting from the current proposal basin, merging subsequent basins until the proportion of the basin duration to the total duration drops to a set threshold, the total duration running from the start of the first merged basin to the end of the last merged basin;
(2.3.3) merging the basins and the separating regions between them into a single proposal;
(2.3.4) performing steps (2.3.2)-(2.3.3) for each proposal, resulting in a plurality of proposals;
(2.3.5) applying non-maximum suppression with an overlap threshold of 0.95 to obtain the proposals required for time sequence behavior detection.
Further, the structured segment network comprises a proposal segmentation sub-network, a feature extraction sub-network, a boundary regression sub-network, a classification sub-network, and an integrity judgment sub-network;
the proposal segmentation sub-network is used for expanding each selected proposal, dividing it into a plurality of sections, and randomly extracting one frame image or optical-flow image from each section; the feature extraction sub-network is used for extracting features from the extracted frame images or optical-flow images and dividing the extracted features into start-stage, behavior-stage, and end-stage features according to a set proportion; the boundary regression sub-network is used for regressing the behavior boundary locations from the start-stage, behavior-stage, and end-stage features; the classification sub-network is used for judging the behavior class from the behavior-stage features; and the integrity judgment sub-network is used for scoring behavior integrity from the start-stage, behavior-stage, and end-stage features.
Further, the feature extraction sub-network divides the extracted features into start-stage, behavior-stage, and end-stage features in a ratio of 2:5:2.
Further, the loss function shared by the classification sub-network and the integrity judgment sub-network is:

L_cls(c_i, b_i; p_i) = -log P(c_i | p_i) - 1(c_i ≥ 1) · log P(b_i | c_i, p_i)

where p_i is a proposal, c_i is its class label, and b_i indicates whether p_i is complete; the integrity term P(b_i | c_i, p_i) is used only when p_i is not considered part of the background.

The loss function of the boundary regression sub-network is:

L_reg(μ_i, φ_i; p_i) = 1(c_i ≥ 1 ∧ b_i = 1) · [SmoothL1(μ_i) + SmoothL1(φ_i)]

which is computed if and only if c_i ≥ 1 and b_i = 1, where μ_i is the relative change between the center of proposal p_i and the center of the nearest real behavior instance, and φ_i is the logarithmic-scale span of p_i relative to the nearest real behavior instance.
Further, the generative adversarial network includes a generator and a discriminator;
the generator is used for generating hard-example features that have the same dimension and size as the output of the feature extraction sub-network in the structured segment network and follow the same distribution as the training set; the discriminator is used for judging whether a feature is real or fake based on the hard-example features produced by the generator and the features extracted by the feature extraction sub-network of the structured segment network, and meanwhile judging the behavior class of real features.
Further, the generator comprises two fully-connected layers connected in sequence; the input of the generator is a randomly drawn normally distributed vector.
Furthermore, the number of neurons in each of the two fully-connected layers is 4096, and the length of the input vector is 100.
Further, the feature matching loss of the generator is:

L_G = || E_{(x_s, y) ~ P_action}[ψ(φ(x_s))] - E_{z ~ noise}[ψ(G(z))] ||²

where φ(·) denotes the feature extraction sub-network, ψ(·) denotes the classification sub-network, G(·) denotes the generator, P_action = {(x_s, y)} is the training set of behavior windows, x_s is a behavior window, and y is its ground-truth label;

the loss function of the discriminator is:

L_D = L_real + L_fake

where L_real is the classification loss on real samples and L_fake is the loss on generated fake samples:

L_real = E_{(x_s, y) ~ P_action}[-log P_D(y | x_s)] + E_{x_ns}[-log P_D(K+1 | x_ns)]

L_fake = E_{z ~ noise}[-log P_D(K+2 | G(z))]

the first term of L_real being the expectation of discriminating a behavior window as its behavior class and the second term the expectation of discriminating a background window as background; {o_1, ..., o_{K+2}} is the prediction vector, x_ns is a background window, E_{z ~ noise}[·] denotes the expectation over the noise distribution, and class K+2 represents the generated hard-example features.
Through the above technical scheme, compared with the prior art, the invention achieves the following beneficial effects:
(1) the GAN generates hard-example features that have the same dimension and size as the features extracted by the structured segment network and follow the same distribution as the training set; this improves the model's ability to recognize hard samples, enables it to capture the subtle differences between behaviors and background, strengthens its ability to distinguish the two, and thereby improves the localization accuracy of time sequence behaviors;
(2) the structured segment network processes each proposal in stages, giving the model contextual awareness of the behavior actions in the video and ensuring its ability to recognize them.
Drawings
FIG. 1 is a flow chart of the GAN- and SSN-based time sequence behavior detection method of the present invention;
FIG. 2 is a diagram of the time sequence behavior detection model.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, an embodiment of the present invention provides a time sequence behavior detection method based on GAN and SSN, including:
(1) dividing video data into a training set and a test set, and performing frame extraction and optical flow calculation on the training set and the test set;
(2) selecting some region segments as proposals for each video, and carrying out normalization and data enhancement processing on frame images or optical flow images contained in the proposals;
Specifically, selecting region segments as proposals for each video includes:
(2.1) randomly generating a series of proposals for the video data;
(2.2) scoring the randomly generated proposal using a BNinception-based binary network;
Specifically, 12 proposals are taken for each video, with a foreground-to-background ratio of fg:bg = 3:9 (a proposal with overlap degree > 0.7 is regarded as foreground; one with overlap degree < 0.7 as background), and the network parameters are set as: batch size = 3, learning rate = 0.0001. The basic idea is to find contiguous temporal regions containing the most highly active segments as proposals and then use the TAG algorithm to generate the proposals needed for time sequence behavior detection; a combined sketch of the grouping and sampling steps follows step (2.3.5) below.
(2.3) generating the proposal needed by the time-sequence behavior detection according to the proposal score by adopting a TAG algorithm.
Specifically, step (2.3) includes:
(2.3.1) inverting the proposal scores along the horizontal axis and regarding proposals whose scores fall below a set threshold as proposal basins;
(2.3.2) starting from the current proposal basin, merging subsequent basins until the proportion of the basin duration to the total duration drops to a set threshold, the total duration running from the start of the first merged basin to the end of the last merged basin;
(2.3.3) merging the basins and the separating regions between them into a single proposal;
(2.3.4) performing steps (2.3.2)-(2.3.3) for each proposal, resulting in a plurality of proposals;
(2.3.5) applying non-maximum suppression with an overlap threshold of 0.95 to obtain the proposals required for time sequence behavior detection.
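As a minimal Python sketch of steps (2.3.1)-(2.3.5) together with the 3:9 foreground/background sampling described above (the function names, the per-snippet actionness input, and the tau/gamma parameter names are illustrative assumptions, not the patent's implementation):

import random

def temporal_iou(a, b):
    # Temporal intersection-over-union between two (start, end) intervals.
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def tag_group(actionness, tau=0.5, gamma=0.9):
    # Steps (2.3.1)-(2.3.4): after the score curve is inverted, high-actionness
    # runs become basins, so a basin here is a maximal run of snippets whose
    # actionness is at least tau.
    basins, start = [], None
    for t, s in enumerate(actionness):
        if s >= tau and start is None:
            start = t
        elif s < tau and start is not None:
            basins.append((start, t))
            start = None
    if start is not None:
        basins.append((start, len(actionness)))

    proposals = set()
    for i in range(len(basins)):
        covered = 0
        for j in range(i, len(basins)):
            covered += basins[j][1] - basins[j][0]
            total = basins[j][1] - basins[i][0]  # first basin start to last basin end
            if covered / total < gamma:
                break
            # basins i..j plus the gaps between them merge into one proposal
            proposals.add((basins[i][0], basins[j][1]))
    return sorted(proposals)

def nms(proposals, scores, thr=0.95):
    # Step (2.3.5): drop any proposal overlapping a higher-scoring one above thr.
    order = sorted(range(len(proposals)), key=lambda k: -scores[k])
    keep = []
    for k in order:
        if all(temporal_iou(proposals[k], proposals[j]) <= thr for j in keep):
            keep.append(k)
    return [proposals[k] for k in keep]

def sample_fg_bg(proposals, gt, n_fg=3, n_bg=9, iou_thr=0.7):
    # The 12-proposal, fg:bg = 3:9 sampling described above (IoU 0.7 split).
    fg = [p for p in proposals if any(temporal_iou(p, g) > iou_thr for g in gt)]
    bg = [p for p in proposals if all(temporal_iou(p, g) < iou_thr for g in gt)]
    return random.sample(fg, min(n_fg, len(fg))) + random.sample(bg, min(n_bg, len(bg)))

Note that each basin on its own always qualifies as a proposal (its basin fraction is 1); longer candidates are produced by absorbing later basins for as long as the basin fraction stays at or above gamma, matching steps (2.3.2)-(2.3.3).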
Each frame image or optical-flow image contained in a chosen proposal is normalized to 224 × 224 pixels and randomly horizontally flipped with a probability of 0.5.
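A minimal preprocessing sketch consistent with this step; using torchvision and the ImageNet normalization statistics is an assumption, since the patent fixes only the 224 × 224 size and the 0.5 flip probability:

from torchvision import transforms

# Normalize each sampled frame (or optical-flow image) to 224 x 224 and apply
# a random horizontal flip with probability 0.5, as described above.  The
# mean/std values are the common ImageNet statistics, assumed here.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])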
(3) Constructing a time sequence behavior detection model;
Specifically, the time sequence behavior detection model of the present invention includes a structured segment network, SSN (Structured Segment Network), and a generative adversarial network, GAN (Generative Adversarial Network);
Specifically, as shown in fig. 2, the structured segment network includes a proposal segmentation sub-network, a feature extraction sub-network, a boundary regression sub-network, a classification sub-network, and an integrity judgment sub-network;
the proposal segmentation sub-network is used for expanding each selected proposal, dividing it into a plurality of sections, and randomly extracting one frame image from each section; the feature extraction sub-network is used for extracting features from each extracted frame image and dividing the extracted features into start-stage, behavior-stage, and end-stage features in a ratio of 2:5:2, as sketched below; the boundary regression sub-network is used for regressing the behavior boundary locations from the start-stage, behavior-stage, and end-stage features; the classification sub-network is used for judging the behavior class from the behavior-stage features; the integrity judgment sub-network is used for scoring behavior integrity from the start-stage, behavior-stage, and end-stage features.
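The 2:5:2 staged division might be sketched as follows; average-pooling each stage into a single vector is a simplifying assumption standing in for the structured pooling used by the SSN:

import torch

def split_stages(snippet_feats: torch.Tensor):
    # snippet_feats: (T, D) features for the T snippets covering the expanded
    # proposal (assumes T >= 3).  The 2:5:2 ratio assigns the first 2/9 of the
    # span to the start stage, the middle 5/9 to the behavior (course) stage,
    # and the last 2/9 to the end stage.
    T = snippet_feats.size(0)
    s = max(1, round(T * 2 / 9))  # snippets in the start stage
    e = max(1, round(T * 2 / 9))  # snippets in the end stage
    start = snippet_feats[:s].mean(dim=0)
    course = snippet_feats[s:T - e].mean(dim=0)
    end = snippet_feats[T - e:].mean(dim=0)
    return start, course, end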
The generative adversarial network includes a generator and a discriminator; the generator is used for generating hard-example features that have the same dimension and size as the output of the feature extraction sub-network in the structured segment network and follow the same distribution as the training set; the discriminator is used for judging whether a feature is real or fake based on the hard-example features produced by the generator and the features extracted by the feature extraction sub-network, and meanwhile judging the behavior class of real features;
As shown in fig. 2, the generator of the present invention includes two fully-connected layers FC1 and FC2 connected in sequence; the number of neurons in each fully-connected layer is 4096, and a randomly drawn normally distributed vector of length 100 is used as the generator input, from which the hard-example features are output.
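A PyTorch sketch of this generator; the ReLU between FC1 and FC2 and the 4096-dimensional output feature are assumptions consistent with, but not fixed by, the description above:

import torch
import torch.nn as nn

class HardExampleGenerator(nn.Module):
    # Two fully-connected layers mapping a length-100 noise vector to a feature
    # with the same dimension and size as the feature extraction sub-network's
    # output, so the discriminator sees real and generated features of one shape.
    def __init__(self, noise_dim: int = 100, hidden: int = 4096, feat_dim: int = 4096):
        super().__init__()
        self.fc1 = nn.Linear(noise_dim, hidden)
        self.fc2 = nn.Linear(hidden, feat_dim)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.fc2(torch.relu(self.fc1(z)))

# usage: draw a batch of random normally distributed vectors as the input
z = torch.randn(8, 100)
fake_features = HardExampleGenerator()(z)  # (8, 4096) hard-example features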
(4) Inputting the training set and the test set into the time sequence behavior detection model for training to obtain a trained time sequence behavior detection model;
Specifically, in the structured segment network part, the loss function is divided into a classification loss, a behavior integrity loss, and a boundary regression loss; the behavior classification sub-network and the integrity judgment sub-network jointly define a unified classification loss:

L_cls(c_i, b_i; p_i) = -log P(c_i | p_i) - 1(c_i ≥ 1) · log P(b_i | c_i, p_i)

where p_i is a proposal, c_i is its class label, and b_i indicates whether p_i is complete; the integrity term P(b_i | c_i, p_i) is used only when p_i is not considered part of the background.

The loss function of the boundary regression sub-network is:

L_reg(μ_i, φ_i; p_i) = 1(c_i ≥ 1 ∧ b_i = 1) · [SmoothL1(μ_i) + SmoothL1(φ_i)]

which is computed if and only if c_i ≥ 1 and b_i = 1, i.e., the proposal belongs to a behavior class and is complete, where μ_i is the relative change between the center of proposal p_i and the center of the nearest real behavior instance, and φ_i is the logarithmic-scale span of p_i relative to the nearest real behavior instance.
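A hedged PyTorch sketch of these losses; the tensor shapes and the per-class completeness head are assumptions modeled on the SSN formulation, not the patent's code:

import torch
import torch.nn.functional as F

def ssn_losses(cls_logits, comp_logits, reg_pred, labels, complete, reg_targets):
    # cls_logits: (N, K+1) class scores with index 0 = background;
    # comp_logits: (N, K) per-class completeness scores; reg_pred/reg_targets:
    # (N, 2) predicted and target (mu, phi); labels: (N,) class labels;
    # complete: (N,) 0/1 completeness flags.
    # classification term: -log P(c_i | p_i), over every proposal
    l_cls = F.cross_entropy(cls_logits, labels)

    # integrity term, used only when the proposal is not background (c_i >= 1)
    fg = labels >= 1
    l_comp = torch.tensor(0.0)
    if fg.any():
        comp_score = comp_logits[fg, labels[fg] - 1]  # score of the labeled class
        l_comp = F.binary_cross_entropy_with_logits(comp_score, complete[fg].float())

    # boundary regression, computed only when c_i >= 1 and b_i = 1
    reg_mask = fg & (complete == 1)
    l_reg = torch.tensor(0.0)
    if reg_mask.any():
        l_reg = F.smooth_l1_loss(reg_pred[reg_mask], reg_targets[reg_mask])
    return l_cls, l_comp, l_reg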
In the generative adversarial network part, the loss function is divided into a feature similarity loss and a classification loss. The feature matching loss of the generator is defined as:

L_G = || E_{(x_s, y) ~ P_action}[ψ(φ(x_s))] - E_{z ~ noise}[ψ(G(z))] ||²

where φ(·) denotes the feature extraction sub-network, ψ(·) denotes the classification sub-network, G(·) denotes the generator, P_action = {(x_s, y)} is the training set of behavior windows, x_s is a behavior window, and y is its ground-truth label.

The loss with which the discriminator judges whether a feature was produced by the generator is defined as:

L_D = L_real + L_fake

where L_real is the classification loss on real samples and L_fake is the loss on generated fake samples:

L_real = E_{(x_s, y) ~ P_action}[-log P_D(y | x_s)] + E_{x_ns}[-log P_D(K+1 | x_ns)]

L_fake = E_{z ~ noise}[-log P_D(K+2 | G(z))]

The first term of L_real is the expectation of discriminating a behavior window as its behavior class, and the second term is the expectation of discriminating a background window as background; {o_1, ..., o_{K+2}} is the prediction vector, x_ns is a background window, E_{z ~ noise}[·] denotes the expectation over the noise distribution, and class K+2 represents the generated hard-example features.
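A PyTorch sketch of this adversarial objective as reconstructed above; treating the (K+2)-way classifier logits as the matched responses in the feature-matching term is an assumption consistent with the defined symbols:

import torch
import torch.nn.functional as F

def discriminator_loss(disc, real_action_feats, action_labels, bg_feats, fake_feats, K):
    # disc maps a feature to K+2 logits: behavior classes at indices 0..K-1
    # (action_labels are 0-indexed), background at index K, and the generated
    # hard-example class at index K+1.
    l_real = (
        F.cross_entropy(disc(real_action_feats), action_labels)  # behavior windows
        + F.cross_entropy(disc(bg_feats),
                          torch.full((bg_feats.size(0),), K, dtype=torch.long))
    )
    l_fake = F.cross_entropy(disc(fake_feats.detach()),
                             torch.full((fake_feats.size(0),), K + 1, dtype=torch.long))
    return l_real + l_fake

def generator_loss(disc, real_action_feats, fake_feats):
    # Feature matching: align the mean classifier response on real behavior
    # features with the mean response on generated features (squared L2 norm).
    real_resp = disc(real_action_feats).mean(dim=0)
    fake_resp = disc(fake_feats).mean(dim=0)
    return (real_resp - fake_resp).pow(2).sum()

In training, discriminator and generator updates would alternate, with only the real behavior-class features contributing to the classification branch of the structured segment network.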
(5) Inputting the video to be recognized into the trained time sequence behavior detection model to obtain the behavior categories present in the video and the start and end positions corresponding to each behavior.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A time sequence behavior detection method based on GAN and SSN is characterized by comprising the following steps:
(1) dividing video data into a training set and a test set, and performing frame extraction and optical flow calculation on the training set and the test set;
(2) selecting some region segments as proposals for each video, and carrying out normalization and data enhancement processing on frame images or optical flow images contained in the proposals;
(3) constructing a time sequence behavior detection model;
the time sequence behavior detection model comprises a structured segment network and a generative adversarial network;
the structured segment network is used for extracting features from the images contained in a proposal, dividing the extracted features into start-stage, behavior-stage, and end-stage features according to a set proportion, and performing classification, boundary regression, and integrity scoring based on these stage features; the structured segment network comprises a proposal segmentation sub-network, a feature extraction sub-network, a boundary regression sub-network, a classification sub-network, and an integrity judgment sub-network;
the proposal segmentation sub-network is used for expanding each selected proposal, dividing it into a plurality of sections, and randomly extracting one frame image or optical-flow image from each section; the feature extraction sub-network is used for extracting features from the extracted frame images or optical-flow images and dividing the extracted features into start-stage, behavior-stage, and end-stage features according to a set proportion; the boundary regression sub-network is used for regressing the behavior boundary locations from the start-stage, behavior-stage, and end-stage features; the classification sub-network is used for judging the video behavior class from the behavior-stage features; the integrity judgment sub-network is used for scoring behavior integrity from the start-stage, behavior-stage, and end-stage features;
the generative adversarial network is used for generating hard-example features that have the same dimension and size as the features extracted by the structured segment network and follow the same distribution as the training set, and for judging whether features are real or fake based on the generated hard-example features and the features extracted by the structured segment network;
(4) inputting the training set and the test set into the time sequence behavior detection model for training and testing to obtain a finally trained time sequence behavior detection model;
(5) and inputting the video to be recognized into the trained time sequence behavior detection model to obtain the behavior categories existing in the video and the starting position and the ending position corresponding to various behaviors.
2. The method according to claim 1, wherein the step (2) of selecting some region segments for each video as proposals specifically comprises:
(2.1) randomly generating a series of proposals for each video;
(2.2) scoring the randomly generated proposal using a BNinception-based binary network;
(2.3) generating the proposal needed by the time-sequence behavior detection according to the proposal score by adopting a TAG algorithm.
3. The method according to claim 2, wherein the step (2.3) specifically comprises:
(2.3.1) inverting the proposal scores along the horizontal axis and regarding proposals whose scores fall below a set threshold as proposal basins;
(2.3.2) starting from the current proposal basin, merging subsequent basins until the proportion of the basin duration to the total duration drops to a set threshold, the total duration running from the start of the first merged basin to the end of the last merged basin;
(2.3.3) merging the basins and the separating regions between them into a single proposal;
(2.3.4) performing steps (2.3.2)-(2.3.3) for each proposal, resulting in a plurality of proposals;
(2.3.5) applying non-maximum suppression with an overlap threshold of 0.95 to obtain the proposals required for time sequence behavior detection.
4. The method of claim 1, wherein the feature extraction sub-network divides the extracted features into start-stage, behavior-stage, and end-stage features in a ratio of 2:5:2.
5. The method of claim 1, wherein the loss function shared by the classification sub-network and the integrity judgment sub-network is:

L_cls(c_i, b_i; p_i) = -log P(c_i | p_i) - 1(c_i ≥ 1) · log P(b_i | c_i, p_i)

where p_i is a proposal, c_i is its class label, and b_i indicates whether p_i is complete; the integrity term P(b_i | c_i, p_i) is used only when p_i is not considered part of the background;

and wherein the loss function of the boundary regression sub-network is:

L_reg(μ_i, φ_i; p_i) = 1(c_i ≥ 1 ∧ b_i = 1) · [SmoothL1(μ_i) + SmoothL1(φ_i)]

which is computed if and only if c_i ≥ 1 and b_i = 1, where μ_i is the relative change between the center of proposal p_i and the center of the nearest real behavior instance, and φ_i is the logarithmic-scale span of p_i relative to the nearest real behavior instance.
6. The GAN and SSN-based time-series behavior detection method of any one of claims 1-5, wherein the generative adversarial network comprises a generator and a discriminator;
the generator is used for generating hard-example features that have the same dimension and size as the output of the feature extraction sub-network in the structured segment network and follow the same distribution as the training set; the discriminator is used for judging whether a feature is real or fake based on the hard-example features produced by the generator and the features extracted by the feature extraction sub-network, and meanwhile judging the behavior class of real features.
7. The GAN and SSN-based time-series behavior detection method of claim 6, wherein the generator comprises two fully-connected layers connected in sequence, and the input of the generator is a randomly drawn normally distributed vector.
8. The method according to claim 7, wherein the number of neurons in each of the two fully-connected layers is 4096, and the length of the vector is 100.
9. The method of claim 7, wherein the feature matching loss of the generator is:

L_G = || E_{(x_s, y) ~ P_action}[ψ(φ(x_s))] - E_{z ~ noise}[ψ(G(z))] ||²

where φ(·) denotes the feature extraction sub-network, ψ(·) denotes the classification sub-network, G(·) denotes the generator, P_action = {(x_s, y)} is the training set of behavior windows, x_s is a behavior window, and y is its ground-truth label;

and wherein the loss function of the discriminator is:

L_D = L_real + L_fake

where L_real is the classification loss on real samples and L_fake is the loss on generated fake samples:

L_real = E_{(x_s, y) ~ P_action}[-log P_D(y | x_s)] + E_{x_ns}[-log P_D(K+1 | x_ns)]

L_fake = E_{z ~ noise}[-log P_D(K+2 | G(z))]

the first term of L_real being the expectation of discriminating a behavior window as its behavior class and the second term the expectation of discriminating a background window as background; {o_1, ..., o_{K+2}} is the prediction vector, x_ns is a background window, E_{z ~ noise}[·] denotes the expectation over the noise distribution, and class K+2 represents the generated hard-example features.
CN201910599488.7A 2019-07-04 2019-07-04 Time sequence behavior detection method based on GAN and SSN Expired - Fee Related CN110414367B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910599488.7A CN110414367B (en) 2019-07-04 2019-07-04 Time sequence behavior detection method based on GAN and SSN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910599488.7A CN110414367B (en) 2019-07-04 2019-07-04 Time sequence behavior detection method based on GAN and SSN

Publications (2)

Publication Number Publication Date
CN110414367A CN110414367A (en) 2019-11-05
CN110414367B true CN110414367B (en) 2022-03-29

Family

ID=68360334

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910599488.7A Expired - Fee Related CN110414367B (en) 2019-07-04 2019-07-04 Time sequence behavior detection method based on GAN and SSN

Country Status (1)

Country Link
CN (1) CN110414367B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325097B (en) * 2020-01-22 2023-04-07 陕西师范大学 Enhanced single-stage decoupled time sequence action positioning method
CN111368786A (en) * 2020-03-16 2020-07-03 平安科技(深圳)有限公司 Action region extraction method, device, equipment and computer readable storage medium
CN111832516B (en) * 2020-07-22 2023-08-18 西安电子科技大学 Video behavior recognition method based on unsupervised video representation learning
CN111931713B (en) * 2020-09-21 2021-01-29 成都睿沿科技有限公司 Abnormal behavior detection method and device, electronic equipment and storage medium
CN112749625B (en) * 2020-12-10 2023-12-15 深圳市优必选科技股份有限公司 Time sequence behavior detection method, time sequence behavior detection device and terminal equipment
CN113420598B (en) * 2021-05-25 2024-05-14 江苏大学 Time sequence action detection method based on decoupling of context information and proposal classification
CN114064471A (en) * 2021-11-11 2022-02-18 中国民用航空总局第二研究所 Ethernet/IP protocol fuzzy test method based on generation of countermeasure network

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292249A * 2017-06-08 2017-10-24 深圳市唯特视科技有限公司 Temporal action detection method based on a structured segment network
CN107862331A * 2017-10-31 2018-03-30 华中科技大学 Unsafe behavior recognition method and system based on time series and CNN
CN108830252A * 2018-06-26 2018-11-16 哈尔滨工业大学 Convolutional neural network human action recognition method fusing global spatio-temporal features
CN109190524A * 2018-08-17 2019-01-11 南通大学 Human action recognition method based on a generative adversarial network
EP3499429A1 (en) * 2017-12-12 2019-06-19 Institute for Imformation Industry Behavior inference model building apparatus and method

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107292249A * 2017-06-08 2017-10-24 深圳市唯特视科技有限公司 Temporal action detection method based on a structured segment network
CN107862331A * 2017-10-31 2018-03-30 华中科技大学 Unsafe behavior recognition method and system based on time series and CNN
EP3499429A1 * 2017-12-12 2019-06-19 Institute for Imformation Industry Behavior inference model building apparatus and method
CN108830252A * 2018-06-26 2018-11-16 哈尔滨工业大学 Convolutional neural network human action recognition method fusing global spatio-temporal features
CN109190524A * 2018-08-17 2019-01-11 南通大学 Human action recognition method based on a generative adversarial network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Jia-Xing Zhong et al., "Step-by-step Erasion, One-by-one Collection: A Weakly Supervised Temporal Action Detector," MM '18: Proceedings of the 26th ACM International Conference on Multimedia, October 2018, pp. 35-44. *
Yue Zhao et al., "Temporal Action Detection with Structured Segment Networks," Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2914-2923. *
Yang Hongyu et al., "Network abnormal behavior detection model based on adversarially learned inference," Journal of Computer Applications, vol. 39, no. 7, March 2019, pp. 1967-1972. *

Also Published As

Publication number Publication date
CN110414367A (en) 2019-11-05

Similar Documents

Publication Publication Date Title
CN110414367B (en) Time sequence behavior detection method based on GAN and SSN
CN109146921B (en) Pedestrian target tracking method based on deep learning
Gupta et al. Sequential modeling of deep features for breast cancer histopathological image classification
CN108875624B (en) Face detection method based on multi-scale cascade dense connection neural network
CN108764085B (en) Crowd counting method based on generation of confrontation network
US11640714B2 (en) Video panoptic segmentation
CN109410184B (en) Live broadcast pornographic image detection method based on dense confrontation network semi-supervised learning
KR102132407B1 (en) Method and apparatus for estimating human emotion based on adaptive image recognition using incremental deep learning
CN113011357A (en) Depth fake face video positioning method based on space-time fusion
CN111027377B (en) Double-flow neural network time sequence action positioning method
EP3596655B1 (en) Method and apparatus for analysing an image
CN111325237B (en) Image recognition method based on attention interaction mechanism
CN116030396B (en) Accurate segmentation method for video structured extraction
CN113591674B (en) Edge environment behavior recognition system for real-time video stream
Roy et al. Foreground segmentation using adaptive 3 phase background model
Tamou et al. Transfer learning with deep convolutional neural network for underwater live fish recognition
Hirzi et al. Literature study of face recognition using the viola-jones algorithm
Fernandez Garcia et al. AcousticIA, a deep neural network for multi-species fish detection using multiple models of acoustic cameras
CN108154172A (en) Image-recognizing method based on three decisions
CN117671597B (en) Method for constructing mouse detection model and mouse detection method and device
Krithika et al. MAFONN-EP: A minimal angular feature oriented neural network based emotion prediction system in image processing
CN109002808B (en) Human behavior recognition method and system
EP3627391A1 (en) Deep neural net for localising objects in images, methods for preparing such a neural net and for localising objects in images, corresponding computer program product, and corresponding computer-readable medium
Teršek et al. Re-evaluation of the CNN-based state-of-the-art crowd-counting methods with enhancements
Nishath et al. An Adaptive Classifier Based Approach for Crowd Anomaly Detection.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20220329