CN113516028A - Human body abnormal behavior identification method and system based on mixed attention mechanism - Google Patents

Human body abnormal behavior identification method and system based on mixed attention mechanism

Info

Publication number
CN113516028A
Authority
CN
China
Prior art keywords
features
feature
characteristic
low level
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110468555.9A
Other languages
Chinese (zh)
Other versions
CN113516028B (en)
Inventor
李洪均
孙晓虎
余阿祥
申栩林
陈金怡
陈俊杰
谢正光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nantong University
Original Assignee
Nantong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nantong University filed Critical Nantong University
Priority to CN202110468555.9A priority Critical patent/CN113516028B/en
Publication of CN113516028A publication Critical patent/CN113516028A/en
Application granted granted Critical
Publication of CN113516028B publication Critical patent/CN113516028B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a human body abnormal behavior identification method and system based on a mixed attention mechanism, wherein the identification method comprises the following steps: extracting the features of the original image to obtain low-level detail features F; screening the low-level detail features F to obtain main significant features F'; inputting the main significant features F' into a convolution feature extraction module to obtain high-level semantic features; fusing the high-level semantic features and the low-level detail features to obtain fused features; calculating the loss between the predicted value and the actual value of the training sample to obtain a loss value; optimizing a training parameter based on the loss value; training the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model; and identifying the abnormal behaviors of the human body based on the trained abnormal behavior identification model. The method can improve the identification precision of the abnormal behaviors of the human body.

Description

Human body abnormal behavior identification method and system based on mixed attention mechanism
Technical Field
The invention relates to the field of human body abnormal behavior recognition, in particular to a human body abnormal behavior recognition method and system based on a mixed attention mechanism.
Background
Human body abnormal behavior detection, as one of the research hotspots in the field of human behavior recognition, has recently drawn wide attention in both academia and industry. With the rapid development of the social economy, whether explosion-proof measures at important places such as gas stations are in place directly affects the safety of surrounding people and buildings. According to incomplete statistics, the proportion of smokers in China is as high as 26.92%, and the rate of explosion accidents caused by smoking is as high as 12.2%. It is well known that flammable and explosive oil vapor lingers in the air of a gas station, so smoking at or near a gas station greatly increases the possibility of an explosion accident; in addition, illegal behaviors such as smoking or making phone calls while driving also pose great potential safety hazards. Therefore, it is desirable to analyze human behaviors, strengthen prevention in a targeted and focused manner, and issue an early warning before a potential safety hazard turns into an accident.
At present, classification and recognition methods for human abnormal behaviors fall into two categories according to how features are extracted: traditional methods that rely on hand-crafted features, and deep learning-based methods. Hand-crafted feature methods mainly judge whether an abnormal behavior occurs through a series of means such as target detection and feature extraction designed around the characteristics of specific abnormal behaviors. Traditional abnormal behavior recognition algorithms have both advantages and disadvantages. On the one hand, they require neither heavy computation nor strong hardware support, so for sample data with a small computational load they are advantageous for detecting abnormal behaviors. On the other hand, manually extracted features are designed only for specific scenes, which makes them limited and narrow, with poor generalization ability. Unlike traditional methods, deep learning-based methods need no manual feature extraction; on the basis of human behavior recognition and classification, they train and learn a model either by artificially defining some abnormal behaviors according to the special requirements of a scene or directly from data, and the extracted deep features can effectively express human behaviors and enhance the adaptability of the model to the input data.
With the development of deep learning, attention mechanisms have gradually been widely applied in fields such as computer vision. Jaderberg et al. argue that direct pooling is too crude, because directly merging information makes key information unrecognizable, and therefore propose a spatial transformer module that applies a corresponding spatial transformation to the spatial-domain information in the picture so that the key information can be extracted; Hu et al. consider that the contribution weight of the feature map differs for each channel, and therefore propose a squeeze-and-excitation network that adaptively recalibrates channel-wise feature responses by explicitly modeling the interdependencies between channels; although the channel attention mechanism shows great potential for improving the performance of deep convolutional neural networks, existing methods inevitably increase model complexity while obtaining better performance, so to overcome this contradiction between performance and complexity Wang et al. [7] propose an efficient channel attention module that maintains performance while significantly reducing model complexity; Fu et al. propose a dual attention mechanism which, unlike previous multi-scale feature fusion, extracts salient features with relevance from the spatial dimension and the channel dimension and adaptively integrates local features with their global dependencies.
Inspired by the attention mechanism, a mixed attention mechanism-based abnormal behavior recognition method is proposed, which uses the ability of the convolution block attention module to effectively extract spatial and channel information to highlight the salient features of the recognized object; meanwhile, an improved convolution feature extraction module mines hidden high-level semantic information layer by layer and combines it with the low-level information, further improving the classification performance of the network.
Disclosure of Invention
The invention aims to provide a human body abnormal behavior identification method and system based on a mixed attention mechanism, which can realize accurate identification of human body abnormal behaviors.
In order to achieve the purpose, the invention provides the following scheme:
a human body abnormal behavior identification method based on a mixed attention mechanism comprises the following steps:
extracting the features of the original image to obtain low-level detail features F;
screening the low-level detail features F to obtain main significant features F';
inputting the main significant features F' into a convolution feature extraction module to obtain high-level semantic features;
fusing the high-level semantic features and the low-level detail features to obtain fused features;
calculating the loss between the predicted value and the actual value of the training sample to obtain a loss value;
optimizing a training parameter based on the loss value;
training the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model;
and identifying the abnormal behaviors of the human body based on the trained abnormal behavior identification model.
Optionally, screening the low-level detail feature F to obtain a main significant feature F ″ specifically includes:
inputting the low-level detail features F into a global average pooling layer and a maximum pooling layer of a space dimension, and sending the low-level detail features F into a shared network MLP to obtain first average pooling features and first maximum pooling features;
splicing the first average pooling characteristic and the first maximum pooling characteristic, and obtaining a weight coefficient Mc through a Sigmoid activation function;
multiplying the weight coefficient Mc with the low-level detail feature F to obtain a new feature F' after zooming;
inputting the new feature F' into an average pooling layer and a maximum pooling layer of the channel dimension to obtain a second average pooling feature and a second maximum pooling feature;
splicing the second average pooling characteristic and the second maximum pooling characteristic, and obtaining a weight coefficient Ms through a Sigmoid activation function;
and multiplying the weight coefficient Ms with the scaled new feature F' to obtain the main significant feature F″.
Optionally, inputting the main significant feature F ″ to a convolution feature extraction module to obtain a high-level semantic feature and fusing the high-level semantic feature and the low-level detail feature, where the obtaining of the fused feature specifically includes:
carrying out point-by-point convolution and depth separable convolution operations on the main significant features F' to obtain features G;
compressing the feature G by adopting global average pooling to obtain a compressed vector L;
carrying out excitation operation on the compressed vector L to obtain an output S;
weighting the output S to a characteristic G to obtain a re-calibrated characteristic I;
performing maximum pooling operation and average pooling operation on the re-calibrated characteristic I to obtain a maximum pooling characteristic and an average pooling characteristic;
splicing the maximum pooling feature and the average pooling feature, and generating a feature map Qs by convolution;
weighting the feature map Qs onto the feature I to complete feature recalibration, ending with a 1 × 1 point-by-point convolution to restore the original channel dimension, performing connection inactivation and an input skip connection, and fusing the high-level semantic features extracted by the convolution feature extraction module with the low-level detail features in a multi-level manner to obtain fused features.
Optionally, the loss between the predicted value and the actual value of the training sample is calculated, and the obtained loss value specifically adopts the following formula:
and y′ = (1-ε) × y + ε/k, wherein k represents the number of classes in the specific task, y represents the k-dimensional matrix composed of the k classes, ε represents the smoothing factor, and y′ represents the k-dimensional matrix composed of the k classes after label smoothing.
The invention further provides a human body abnormal behavior recognition system based on a mixed attention mechanism, which comprises:
the low-level detail feature extraction module is used for extracting features of the original image to obtain low-level detail features F;
the main significant feature screening module is used for screening the low-level detail features F to obtain main significant features F';
the high-level semantic feature extraction module is used for inputting the main significant features F' into the convolution feature extraction module to obtain high-level semantic features;
the feature fusion module is used for fusing the high-level semantic features and the low-level detail features to obtain fused features;
the loss value calculation module is used for calculating the loss between the predicted value and the actual value of the training sample to obtain a loss value;
the optimization module is used for optimizing the training parameters based on the loss values;
the training module is used for training the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model;
and the abnormal behavior recognition module is used for recognizing the abnormal behavior of the human body based on the trained abnormal behavior recognition model.
Optionally, the module for screening for main significant features specifically includes:
the first average pooling feature and first maximum pooling feature extracting unit is used for inputting the low-level detail features F into a global average pooling layer and a maximum pooling layer of a space dimension and sending the low-level detail features F into a shared network MLP to obtain first average pooling features and first maximum pooling features;
the weight coefficient Mc calculating unit is used for splicing the first average pooling characteristic and the first maximum pooling characteristic and obtaining a weight coefficient Mc through a Sigmoid activation function;
the characteristic F 'determining unit is used for multiplying the weight coefficient Mc and the low-level detail characteristic F to obtain a new scaled characteristic F';
a second average pooling feature and second maximum pooling feature extracting unit, configured to input the new feature F' to an average pooling layer and a maximum pooling layer of a channel dimension to obtain a second average pooling feature and a second maximum pooling feature;
the weight coefficient Ms calculation unit is used for splicing the second average pooling characteristic and the second maximum pooling characteristic and obtaining a weight coefficient Ms through a Sigmoid activation function;
and the main significant feature F″ determination unit is used for multiplying the weight coefficient Ms with the scaled new feature F' to obtain the main significant feature F″.
Optionally, the high-level semantic feature extraction module and the feature fusion module specifically include:
a point-by-point convolution and depth separable convolution operation unit for performing point-by-point convolution and depth separable convolution operations on the main significant feature F' to obtain a feature G;
the compression unit is used for performing compression operation on the characteristic G by adopting global average pooling to obtain a compressed vector L;
the excitation operation unit is used for carrying out excitation operation on the compressed vector L to obtain an output S;
the recalibration unit is used for weighting the output S to the characteristic G to obtain a recalibrated characteristic I;
a maximum pooling operation and average pooling operation unit for performing maximum pooling operation and average pooling operation on the re-calibrated feature I to obtain a maximum pooling feature and an average pooling feature;
a splicing unit for splicing the maximum pooling feature and the average pooling feature and generating a feature map Qs by convolution;
a feature fusion unit for weighting the feature map Qs onto the feature I to complete feature recalibration, ending with a 1 × 1 point-by-point convolution to restore the original channel dimension, performing connection inactivation and an input skip connection, and fusing the high-level semantic features extracted by the convolution feature extraction module with the low-level detail features in a multi-level manner to obtain fused features.
Optionally, the loss value calculation module specifically adopts the following formula:
and y′ = (1-ε) × y + ε/k, wherein k represents the number of classes in the specific task, y represents the k-dimensional matrix composed of the k classes, ε represents the smoothing factor, and y′ represents the k-dimensional matrix composed of the k classes after label smoothing.
According to the specific embodiment provided by the invention, the invention discloses the following technical effects:
the method comprises the steps of extracting features through an original image to obtain low-level detail features F; screening the low-level detail features F to obtain main significant features F'; inputting the main significant features F' into a convolution feature extraction module to obtain high-level semantic features; fusing the high-level semantic features and the low-level detail features to obtain fused features; calculating the loss between the predicted value and the actual value of the training sample to obtain a loss value; optimizing a training parameter based on the loss value; training the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model; the abnormal behaviors of the human body are identified based on the trained abnormal behavior identification model, so that the identification precision and effect of the abnormal behaviors of the human body are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
FIG. 1 is a schematic diagram of an abnormal behavior recognition framework of a hybrid attentive mechanism according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for recognizing abnormal human behavior based on a hybrid attention mechanism according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a convolution block attention module according to an embodiment of the present invention;
FIG. 4 is a diagram of a convolution feature extraction module according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a human body abnormal behavior recognition system based on a hybrid attention mechanism according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention aims to provide a human body abnormal behavior identification method and system based on a mixed attention mechanism, which can realize accurate identification of human body abnormal behaviors.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Fig. 1 is a schematic diagram of an abnormal behavior recognition framework of a hybrid attention mechanism according to an embodiment of the present invention, and fig. 2 is a flowchart of a human body abnormal behavior recognition method based on the hybrid attention mechanism according to an embodiment of the present invention, as shown in fig. 1 and fig. 2, the method includes:
step 101: and (4) performing feature extraction on the original image to obtain low-level detail features F.
Specifically, the original picture is processed by a Stem module to obtain a feature F.
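The patent does not detail the internal layout of the Stem module; the following is a minimal PyTorch sketch of one plausible stem, where the class name, channel counts, and layer choices are illustrative assumptions only:

```python
import torch
import torch.nn as nn

class Stem(nn.Module):
    """Hypothetical stem block: maps the raw image to the low-level detail feature F."""
    def __init__(self, in_channels: int = 3, out_channels: int = 32):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)  # low-level detail feature F
```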
Step 102: and screening the low-level detail feature F to obtain a main significant feature F'.
In order to enhance the significant features and reduce the attention paid to other information, a convolution block attention module is introduced to scale the low-level detail features F extracted in step 101 into new main significant features F″. The structure of the convolution block attention module is shown in FIG. 3, and the specific processing flow is as follows:
to effectively focus on meaningful channel features, channel attention is calculated. Firstly, respectively carrying out global average pooling and maximum pooling on the features F through a space dimension, then respectively sending the features F into a shared network MLP, splicing the two obtained features, and then obtaining a weight coefficient M through a Sigmoid activation functioncAnd finally the weighting factor McMultiplying the original input feature F to obtain a new feature F' after scaling, which is defined as:
Figure BDA0003044413220000071
Figure BDA0003044413220000072
wherein the content of the first and second substances,
Figure BDA0003044413220000073
and
Figure BDA0003044413220000074
mean pooling characteristic and maximum pooling characteristic are represented, respectively, and σ represents Sigmoid activation function.
To effectively focus on meaningful spatial features, spatial attention is computed. First, average pooling and maximum pooling are applied to the feature F' along the channel dimension, and the two resulting descriptors are spliced together; a convolution layer with a Sigmoid activation function then yields the weight coefficient Ms. Finally, the weight coefficient Ms is multiplied with the input feature F' to obtain the scaled new feature F″, which is defined as:
Ms(F') = σ(f([F_avg^s; F_max^s]))    (3)
F″ = Ms(F') ⊗ F'    (4)
where σ represents the Sigmoid activation function, f represents the convolution operation, F_avg^s and F_max^s represent the average-pooling feature and the maximum-pooling feature, respectively, ⊗ denotes the point-by-point multiplication of matrices, and F″ is the final output.
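The following PyTorch sketch illustrates the channel-then-spatial attention computation of equations (1)-(4); the reduction ratio, kernel size, and module name are illustrative assumptions rather than values taken from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvBlockAttention(nn.Module):
    """Sketch of the convolution block attention: channel weight Mc, then spatial weight Ms."""
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        # Shared MLP applied to both pooled channel descriptors (Eq. 1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        # Convolution over the concatenated spatial descriptors (Eq. 3)
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=spatial_kernel,
                                      padding=spatial_kernel // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: Mc = sigmoid(MLP(avg-pooled F) + MLP(max-pooled F))
        mc = torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(x, 1)) +
                           self.mlp(F.adaptive_max_pool2d(x, 1)))
        f_prime = mc * x                                   # Eq. (2): F' = Mc(F) ⊗ F
        # Spatial attention: pool along the channel dimension, concatenate, convolve
        avg_s = f_prime.mean(dim=1, keepdim=True)
        max_s = f_prime.amax(dim=1, keepdim=True)
        ms = torch.sigmoid(self.spatial_conv(torch.cat([avg_s, max_s], dim=1)))
        return ms * f_prime                                # Eq. (4): F'' = Ms(F') ⊗ F'
```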
Step 103: and inputting the main significant features F' into a convolution feature extraction module to obtain high-level semantic features.
In order to further mine high-level semantic information and improve the feature extraction capability of the network model, a convolution feature extraction module is provided. The main significant feature F″ obtained in step 102 is input to obtain high-level semantic features, which are fused with the low-level detail features to enhance the interactivity of the network model. The structure of the convolution feature extraction module is shown in FIG. 4, and the specific processing flow is as follows:
First, a 1 × 1 point-by-point convolution is applied to the input F″ to change the output channel dimensionality according to an expansion ratio, and a depthwise separable convolution is then adopted, which effectively reduces the number of parameters while keeping the channels independent; this is defined as:
G=f2(f1(F″)) (5)
where F″ represents the input, G denotes the output feature map, f1(·) represents the point-by-point convolution, and f2(·) represents the depthwise separable convolution.
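A minimal sketch of the expansion step in equation (5), assuming an inverted-residual style layout in PyTorch; the expansion ratio, kernel size, and activation are illustrative:

```python
import torch.nn as nn

def expand_block(in_channels: int, expand_ratio: int = 4) -> nn.Sequential:
    """Eq. (5): pointwise 1x1 convolution f1 followed by a depthwise convolution f2."""
    mid_channels = in_channels * expand_ratio
    return nn.Sequential(
        # f1: 1x1 pointwise convolution expands the channel dimension by the expansion ratio
        nn.Conv2d(in_channels, mid_channels, kernel_size=1, bias=False),
        nn.BatchNorm2d(mid_channels),
        nn.ReLU6(inplace=True),
        # f2: depthwise convolution, one filter per channel, keeps channels independent
        nn.Conv2d(mid_channels, mid_channels, kernel_size=3, padding=1,
                  groups=mid_channels, bias=False),
        nn.BatchNorm2d(mid_channels),
        nn.ReLU6(inplace=True),
    )
```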
Then, in order to obtain the global distribution of responses over the feature channels, global average pooling is used as the compression operation, turning the feature G into a 1 × 1 × C descriptor whose entries have, to some extent, a global receptive field. The formula is as follows:
L = Usq(G) = (1/(H × W)) Σ(i=1..H) Σ(j=1..W) G(i, j)    (6)
where Usq denotes the compression operation, L denotes the compressed vector, and H × W denotes the spatial size of the feature G.
Then, fully connected layers are adopted to form a Bottleneck structure that learns the correlation among channels, and a learnable parameter W is introduced to generate a weight for each feature channel; W is implemented with 1 × 1 convolutions that first reduce the global feature dimension by an activation ratio and then restore it. The formula is as follows:
S=Uex(L,W) (7)
where Uex represents the excitation operation, S is the output of the operation and can characterize the importance of the different features, and W adjusts the excitation operation through the scaling parameter.
The output weights of the excitation operation are regarded as the importance of each feature channel and are applied to the previous feature channel by channel through multiplication, completing the recalibration of the original feature in the channel dimension. The formula is as follows:
I=Uscale(G,S)=G·S (8)
where · denotes the multiplication operation and Uscale denotes the weight-assignment operation.
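A minimal PyTorch sketch of the squeeze (Eq. 6), excitation (Eq. 7), and channel recalibration (Eq. 8) steps; the reduction ratio of the bottleneck is an assumption:

```python
import torch
import torch.nn as nn

class ChannelRecalibration(nn.Module):
    """Squeeze-and-excitation style recalibration over the feature G (Eqs. 6-8)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.excite = nn.Sequential(                       # Bottleneck parameterized by W (Eq. 7)
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, g: torch.Tensor) -> torch.Tensor:
        l = g.mean(dim=(2, 3), keepdim=True)   # Eq. (6): global average pooling over H x W
        s = self.excite(l)                     # Eq. (7): per-channel importance weights S
        return g * s                           # Eq. (8): I = Uscale(G, S), channel-wise reweighting
```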
In order to obtain deeper high-level feature information, a maximum pooling operation and an average pooling operation are respectively performed on the recalibrated feature I to effectively extract the distinctive features of the object; the pooled features are spliced, and a new feature map Qs is generated by convolution, defined as follows:
Qs(I) = σ(h([Iavg; Imax]))    (9)
where Iavg and Imax represent the average-pooling feature and the maximum-pooling feature, respectively, σ represents the Sigmoid activation function, and h(·) represents the convolution operation.
Finally, the feature map Qs is weighted onto the previous feature I; after this recalibration, the module ends with a 1 × 1 point-by-point convolution that restores the original channel dimension, followed by connection inactivation and an input skip connection, fusing the high-level semantic features extracted by the convolution feature extraction module with the low-level detail features in a multi-level manner and enhancing the interactivity. This is defined as:
Z = D(G'(U'scale(I·Qs)))    (10)
where D(·) denotes the skip connection, · denotes the multiplication operation, G'(·) denotes the convolution operation, U'scale denotes the weight assignment, and Z denotes the output feature.
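The following sketch covers equations (9)-(10): the spatial map Qs, the closing 1 × 1 pointwise projection back to the input channel width, connection inactivation, and the input skip connection. Treating the connection inactivation as spatial dropout is an assumption, as are the kernel size and dropout rate:

```python
import torch
import torch.nn as nn

class SpatialRecalibrateAndFuse(nn.Module):
    """Sketch of Eqs. (9)-(10): Qs weighting, 1x1 projection, dropout, and skip connection."""
    def __init__(self, mid_channels: int, out_channels: int, drop_p: float = 0.1):
        super().__init__()
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)        # h(.) in Eq. (9)
        self.project = nn.Conv2d(mid_channels, out_channels, kernel_size=1, bias=False)  # restore channel dim
        self.bn = nn.BatchNorm2d(out_channels)
        self.drop = nn.Dropout2d(drop_p)       # assumed form of the "connection inactivation"

    def forward(self, i: torch.Tensor, module_input: torch.Tensor) -> torch.Tensor:
        avg = i.mean(dim=1, keepdim=True)
        mx = i.amax(dim=1, keepdim=True)
        qs = torch.sigmoid(self.spatial_conv(torch.cat([avg, mx], dim=1)))   # Eq. (9)
        z = self.bn(self.project(i * qs))      # recalibrate, then 1x1 point-by-point convolution
        return self.drop(z) + module_input     # Eq. (10): skip connection fuses the module input
```

Here module_input is the feature F″ fed into the convolution feature extraction module, so out_channels must match its channel count for the residual addition.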
Step 104: and fusing the high-level semantic features and the low-level detail features to obtain fused features.
Step 105: and calculating the loss between the predicted value and the actual value of the training sample to obtain a loss value.
The cross-entropy loss function is corrected by label smoothing: the loss curve becomes smooth and easy to differentiate, the gradient is stable, and the network generalizes better, ultimately producing more accurate predictions on unseen data and improving the accuracy of image classification.
y′=(1-ε)×y+ε/k (11)
where k represents the number of classes in the specific task, y represents the k-dimensional matrix composed of the k classes, ε represents the smoothing factor, and y′ represents the k-dimensional matrix composed of the k classes after label smoothing.
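A minimal sketch of the label-smoothing correction of equation (11) combined with the cross-entropy loss; the smoothing factor value is illustrative (newer PyTorch versions also expose this directly via the label_smoothing argument of nn.CrossEntropyLoss):

```python
import torch
import torch.nn.functional as F

def smoothed_cross_entropy(logits: torch.Tensor, target: torch.Tensor,
                           epsilon: float = 0.1) -> torch.Tensor:
    """Cross entropy against labels smoothed as y' = (1 - eps) * y + eps / k (Eq. 11)."""
    k = logits.size(-1)
    one_hot = F.one_hot(target, num_classes=k).float()
    y_smooth = (1.0 - epsilon) * one_hot + epsilon / k      # Eq. (11)
    log_probs = F.log_softmax(logits, dim=-1)
    return -(y_smooth * log_probs).sum(dim=-1).mean()
```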
Step 106: and optimizing the training parameters based on the loss values.
Step 107: and training the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model.
Step 108: and identifying the abnormal behaviors of the human body based on the trained abnormal behavior identification model.
And training the model based on the characteristic information, the model parameters and all the training samples to obtain a trained abnormal behavior recognition model, and recognizing and classifying the abnormal behaviors in all the test samples based on the obtained abnormal behavior model.
Each sample is passed through the softmax function to obtain the corresponding class probabilities, which sum to 1; the abnormal behavior category with the maximum probability is taken as the prediction.
Pi = exp(xi) / Σj exp(xj)    (12)
where Pi represents the probability that the predicted object belongs to the i-th class of behavior, xi is the raw network output for class i, exp(·) maps the raw output into (0, +∞), and the summation in the denominator runs over all classes.
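A short sketch of the final classification step using equation (12):

```python
import torch

def predict_behavior(logits: torch.Tensor) -> torch.Tensor:
    """Softmax over the class logits (Eq. 12); the most probable class is the predicted behavior."""
    probs = torch.softmax(logits, dim=-1)   # probabilities sum to 1 per sample
    return probs.argmax(dim=-1)             # index of the abnormal behavior category with maximum probability
```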
Fig. 5 is a schematic structural diagram of a human body abnormal behavior recognition system based on a hybrid attention mechanism according to an embodiment of the present invention, and as shown in fig. 5, the system includes:
the low-level detail feature extraction module 201 is configured to perform feature extraction on an original image to obtain a low-level detail feature F;
a main significant feature screening module 202, configured to screen the low-level detail feature F to obtain a main significant feature F ″;
the high-level semantic feature extraction module 203 is used for inputting the main significant features F' into the convolution feature extraction module to obtain high-level semantic features;
a feature fusion module 204, configured to fuse the high-level semantic features and the low-level detail features to obtain fused features;
a loss value calculation module 205, configured to calculate a loss between a predicted value and an actual value of the training sample to obtain a loss value;
an optimization module 206, configured to optimize a training parameter based on the loss value;
the training module 207 is configured to train the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model;
and the abnormal behavior recognition module 208 is configured to recognize the abnormal behavior of the human body based on the trained abnormal behavior recognition model.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. For the system disclosed by the embodiment, the description is relatively simple because the system corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
The principles and embodiments of the present invention have been described herein using specific examples, which are provided only to help understand the method and the core concept of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.

Claims (8)

1. A human body abnormal behavior identification method based on a mixed attention mechanism is characterized by comprising the following steps:
extracting the features of the original image to obtain low-level detail features F;
screening the low-level detail features F to obtain main significant features F';
inputting the main significant features F' into a convolution feature extraction module to obtain high-level semantic features;
fusing the high-level semantic features and the low-level detail features to obtain fused features;
calculating the loss between the predicted value and the actual value of the training sample to obtain a loss value;
optimizing a training parameter based on the loss value;
training the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model;
and identifying the abnormal behaviors of the human body based on the trained abnormal behavior identification model.
2. The method for recognizing the abnormal human behavior based on the mixed attention mechanism according to claim 1, wherein the step of screening the low-level detail features F to obtain the main significant features F ″ specifically comprises the steps of:
inputting the low-level detail features F into a global average pooling layer and a maximum pooling layer of a space dimension, and sending the low-level detail features F into a shared network MLP to obtain first average pooling features and first maximum pooling features;
splicing the first average pooling characteristic and the first maximum pooling characteristic, and obtaining a weight coefficient Mc through a Sigmoid activation function;
multiplying the weight coefficient Mc with the low-level detail feature F to obtain a new feature F' after zooming;
inputting the new feature F' into an average pooling layer and a maximum pooling layer of the channel dimension to obtain a second average pooling feature and a second maximum pooling feature;
splicing the second average pooling characteristic and the second maximum pooling characteristic, and obtaining a weight coefficient Ms through a Sigmoid activation function;
and multiplying the weight coefficient Ms with the scaled new feature F' to obtain the main significant feature F″.
3. The method for recognizing the abnormal human behavior based on the mixed attention mechanism as claimed in claim 1, wherein the step of inputting the main significant features F "to a convolution feature extraction module to obtain high-level semantic features and the step of fusing the high-level semantic features and the low-level detail features to obtain fused features specifically comprises the steps of:
carrying out point-by-point convolution and depth separable convolution operations on the main significant features F' to obtain features G;
compressing the feature G by adopting global average pooling to obtain a compressed vector L;
carrying out excitation operation on the compressed vector L to obtain an output S;
weighting the output S to a characteristic G to obtain a re-calibrated characteristic I;
performing maximum pooling operation and average pooling operation on the re-calibrated characteristic I to obtain a maximum pooling characteristic and an average pooling characteristic;
splicing the maximum pooling feature and the average pooling feature, and generating a feature map Qs by convolution;
weighting the feature map Qs onto the feature I to complete feature recalibration, ending with a 1 × 1 point-by-point convolution to restore the original channel dimension, performing connection inactivation and an input skip connection, and fusing the high-level semantic features extracted by the convolution feature extraction module with the low-level detail features in a multi-level manner to obtain fused features.
4. The method for recognizing the abnormal human behavior based on the mixed attention mechanism as claimed in claim 1, wherein the loss between the predicted value and the actual value of the training sample is calculated, and the following formula is specifically adopted to obtain the loss value:
and y′ = (1-ε) × y + ε/k, wherein k represents the number of classes in the specific task, y represents the k-dimensional matrix composed of the k classes, ε represents the smoothing factor, and y′ represents the k-dimensional matrix composed of the k classes after label smoothing.
5. A human body abnormal behavior recognition system based on a mixed attention mechanism is characterized in that the recognition system comprises:
the low-level detail feature extraction module is used for extracting features of the original image to obtain low-level detail features F;
the main significant feature screening module is used for screening the low-level detail features F to obtain main significant features F';
the high-level semantic feature extraction module is used for inputting the main significant features F' into the convolution feature extraction module to obtain high-level semantic features;
the feature fusion module is used for fusing the high-level semantic features and the low-level detail features to obtain fused features;
the loss value calculation module is used for calculating the loss between the predicted value and the actual value of the training sample to obtain a loss value;
the optimization module is used for optimizing the training parameters based on the loss values;
the training module is used for training the neural network model based on the optimized training parameters and the fused features to obtain a trained abnormal behavior recognition model;
and the abnormal behavior recognition module is used for recognizing the abnormal behavior of the human body based on the trained abnormal behavior recognition model.
6. The system for recognizing the abnormal human behavior based on the mixed attention mechanism as claimed in claim 5, wherein the main significant feature screening module specifically comprises:
the first average pooling feature and first maximum pooling feature extracting unit is used for inputting the low-level detail features F into a global average pooling layer and a maximum pooling layer of a space dimension and sending the low-level detail features F into a shared network MLP to obtain first average pooling features and first maximum pooling features;
the weight coefficient Mc calculating unit is used for splicing the first average pooling characteristic and the first maximum pooling characteristic and obtaining a weight coefficient Mc through a Sigmoid activation function;
the characteristic F 'determining unit is used for multiplying the weight coefficient Mc and the low-level detail characteristic F to obtain a new scaled characteristic F';
a second average pooling feature and second maximum pooling feature extracting unit, configured to input the new feature F' to an average pooling layer and a maximum pooling layer of a channel dimension to obtain a second average pooling feature and a second maximum pooling feature;
the weight coefficient Ms calculation unit is used for splicing the second average pooling characteristic and the second maximum pooling characteristic and obtaining a weight coefficient Ms through a Sigmoid activation function;
and the main significant feature F″ determination unit is used for multiplying the weight coefficient Ms with the scaled new feature F' to obtain the main significant feature F″.
7. The system for recognizing the abnormal human behavior based on the mixed attention mechanism as claimed in claim 5, wherein the high-level semantic feature extraction module and the feature fusion module specifically comprise:
a point-by-point convolution and depth separable convolution operation unit for performing point-by-point convolution and depth separable convolution operations on the main significant feature F' to obtain a feature G;
the compression unit is used for performing compression operation on the characteristic G by adopting global average pooling to obtain a compressed vector L;
the excitation operation unit is used for carrying out excitation operation on the compressed vector L to obtain an output S;
the recalibration unit is used for weighting the output S to the characteristic G to obtain a recalibrated characteristic I;
a maximum pooling operation and average pooling operation unit for performing maximum pooling operation and average pooling operation on the re-calibrated feature I to obtain a maximum pooling feature and an average pooling feature;
a splicing unit for splicing the maximum pooling feature and the average pooling feature and generating a feature map Qs by convolution;
a feature fusion unit for weighting the feature map Qs onto the feature I to complete feature recalibration, ending with a 1 × 1 point-by-point convolution to restore the original channel dimension, performing connection inactivation and an input skip connection, and fusing the high-level semantic features extracted by the convolution feature extraction module with the low-level detail features in a multi-level manner to obtain fused features.
8. The system for recognizing abnormal human behavior based on the mixed attention mechanism as claimed in claim 5, wherein the loss value calculating module specifically adopts the following formula:
and y′ = (1-ε) × y + ε/k, wherein k represents the number of classes in the specific task, y represents the k-dimensional matrix composed of the k classes, ε represents the smoothing factor, and y′ represents the k-dimensional matrix composed of the k classes after label smoothing.
CN202110468555.9A 2021-04-28 2021-04-28 Human body abnormal behavior identification method and system based on mixed attention mechanism Active CN113516028B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110468555.9A CN113516028B (en) 2021-04-28 2021-04-28 Human body abnormal behavior identification method and system based on mixed attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110468555.9A CN113516028B (en) 2021-04-28 2021-04-28 Human body abnormal behavior identification method and system based on mixed attention mechanism

Publications (2)

Publication Number Publication Date
CN113516028A true CN113516028A (en) 2021-10-19
CN113516028B CN113516028B (en) 2024-01-19

Family

ID=78063994

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110468555.9A Active CN113516028B (en) 2021-04-28 2021-04-28 Human body abnormal behavior identification method and system based on mixed attention mechanism

Country Status (1)

Country Link
CN (1) CN113516028B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114745175A (en) * 2022-04-11 2022-07-12 中国科学院信息工程研究所 Attention mechanism-based network malicious traffic identification method and system

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089556B1 (en) * 2017-06-12 2018-10-02 Konica Minolta Laboratory U.S.A., Inc. Self-attention deep neural network for action recognition in surveillance videos
CN108830157A (en) * 2018-05-15 2018-11-16 华北电力大学(保定) Human bodys' response method based on attention mechanism and 3D convolutional neural networks
CN109389055A (en) * 2018-09-21 2019-02-26 西安电子科技大学 Video classification methods based on mixing convolution sum attention mechanism
CN110059582A (en) * 2019-03-28 2019-07-26 东南大学 Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks
CN110222653A (en) * 2019-06-11 2019-09-10 中国矿业大学(北京) A kind of skeleton data Activity recognition method based on figure convolutional neural networks
CN110852273A (en) * 2019-11-12 2020-02-28 重庆大学 Behavior identification method based on reinforcement learning attention mechanism
WO2020113886A1 (en) * 2018-12-07 2020-06-11 中国科学院自动化研究所 Behavior feature extraction method, system and apparatus based on time-space/frequency domain hybrid learning
CN111626171A (en) * 2020-05-21 2020-09-04 青岛科技大学 Group behavior identification method based on video segment attention mechanism and interactive relation activity diagram modeling
CN112307958A (en) * 2020-10-30 2021-02-02 河北工业大学 Micro-expression identification method based on spatiotemporal appearance movement attention network
CN112307982A (en) * 2020-11-02 2021-02-02 西安电子科技大学 Human behavior recognition method based on staggered attention-enhancing network

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10089556B1 (en) * 2017-06-12 2018-10-02 Konica Minolta Laboratory U.S.A., Inc. Self-attention deep neural network for action recognition in surveillance videos
CN108830157A (en) * 2018-05-15 2018-11-16 华北电力大学(保定) Human bodys' response method based on attention mechanism and 3D convolutional neural networks
CN109389055A (en) * 2018-09-21 2019-02-26 西安电子科技大学 Video classification methods based on mixing convolution sum attention mechanism
WO2020113886A1 (en) * 2018-12-07 2020-06-11 中国科学院自动化研究所 Behavior feature extraction method, system and apparatus based on time-space/frequency domain hybrid learning
CN110059582A (en) * 2019-03-28 2019-07-26 东南大学 Driving behavior recognition methods based on multiple dimensioned attention convolutional neural networks
CN110222653A (en) * 2019-06-11 2019-09-10 中国矿业大学(北京) A kind of skeleton data Activity recognition method based on figure convolutional neural networks
CN110852273A (en) * 2019-11-12 2020-02-28 重庆大学 Behavior identification method based on reinforcement learning attention mechanism
CN111626171A (en) * 2020-05-21 2020-09-04 青岛科技大学 Group behavior identification method based on video segment attention mechanism and interactive relation activity diagram modeling
CN112307958A (en) * 2020-10-30 2021-02-02 河北工业大学 Micro-expression identification method based on spatiotemporal appearance movement attention network
CN112307982A (en) * 2020-11-02 2021-02-02 西安电子科技大学 Human behavior recognition method based on staggered attention-enhancing network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHAI, B: "Research on Detection Method of Abnormal Behavior of People in Video Surveillance", 2018 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL & ELECTRONICS ENGINEERING AND COMPUTER SCIENCE (ICEEECS 2018), pages 289 - 293 *
余阿祥, 李承润: "Mask detection network with multiple attention mechanisms", Journal of Nanjing Normal University (Engineering and Technology Edition), pages 23 - 29 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114745175A (en) * 2022-04-11 2022-07-12 中国科学院信息工程研究所 Attention mechanism-based network malicious traffic identification method and system
CN114745175B (en) * 2022-04-11 2022-12-23 中国科学院信息工程研究所 Network malicious traffic identification method and system based on attention mechanism

Also Published As

Publication number Publication date
CN113516028B (en) 2024-01-19

Similar Documents

Publication Publication Date Title
CN111626350B (en) Target detection model training method, target detection method and device
CN109597997B (en) Comment entity and aspect-level emotion classification method and device and model training thereof
CN111126258B (en) Image recognition method and related device
CN111061843A (en) Knowledge graph guided false news detection method
CN112016500A (en) Group abnormal behavior identification method and system based on multi-scale time information fusion
CN109190472B (en) Pedestrian attribute identification method based on image and attribute combined guidance
CN111626116A (en) Video semantic analysis method based on fusion of multi-attention mechanism and Graph
CN116994069B (en) Image analysis method and system based on multi-mode information
CN115564766B (en) Preparation method and system of water turbine volute seat ring
CN115131638B (en) Training method, device, medium and equipment for visual text pre-training model
CN116699297B (en) Charging pile detection system and method thereof
CN115761900B (en) Internet of things cloud platform for practical training base management
CN114978613B (en) Network intrusion detection method based on data enhancement and self-supervision feature enhancement
CN113628059A (en) Associated user identification method and device based on multilayer graph attention network
CN111008570B (en) Video understanding method based on compression-excitation pseudo-three-dimensional network
CN113516028A (en) Human body abnormal behavior identification method and system based on mixed attention mechanism
CN110458215A (en) Pedestrian's attribute recognition approach based on multi-time Scales attention model
CN115909374A (en) Information identification method, device, equipment, storage medium and program product
CN117475236A (en) Data processing system and method for mineral resource exploration
CN112163494A (en) Video false face detection method and electronic device
CN115393927A (en) Multi-modal emotion emergency decision system based on multi-stage long and short term memory network
CN114550297A (en) Pedestrian intention analysis method and system
CN114241253A (en) Model training method, system, server and storage medium for illegal content identification
CN116030507A (en) Electronic equipment and method for identifying whether face in image wears mask
CN113205044A (en) Deep counterfeit video detection method based on characterization contrast prediction learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant