CN114764902A - Behavior recognition method and device and storage medium

Behavior recognition method and device and storage medium

Info

Publication number
CN114764902A
Authority
CN
China
Prior art keywords
sample
time sequence
neural network
dimensional
dimensional space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110002186.4A
Other languages
Chinese (zh)
Inventor
王晴
王凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Communications Ltd Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Communications Ltd Research Institute
Priority to CN202110002186.4A
Publication of CN114764902A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a behavior recognition method, a behavior recognition device and a storage medium, wherein the method comprises the following steps: acquiring data to be detected; the data to be detected is three-dimensional point cloud data; recognizing the data to be detected by using a preset behavior recognition model to obtain a behavior result; the behavior recognition model is obtained based on neural network training; the neural network employs blocks based on spatial attention layers and channel attention layers, and rectangular pooling layers for capturing different directional contexts.

Description

Behavior recognition method and device and storage medium
Technical Field
The present invention relates to the field of networks, and in particular, to a behavior recognition method, apparatus, and storage medium.
Background
Currently, behavior recognition in a home environment still relies mainly on video surveillance. A video surveillance scheme must apply a behavior recognition algorithm, and such an algorithm must analyze a time sequence; the computing power required for a time sequence is higher than that for a single time stamp. Moreover, to optimize a model, the usual approach is to increase the depth, width, and cardinality of the network, all of which consume more resources. An overly large model is difficult to run on the edge side and can only rely on cloud analysis, but cloud analysis is unfavorable for privacy protection: if data leakage occurs, the user's identity information is completely exposed. How to keep the model lightweight while optimizing its prediction performance is one of the existing problems.
Disclosure of Invention
In view of the above, the main object of the present invention is to provide a behavior recognition method, apparatus and storage medium.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
the embodiment of the invention provides a behavior identification method, which comprises the following steps:
acquiring data to be detected; the data to be detected is three-dimensional point cloud data;
recognizing the data to be detected by using a preset behavior recognition model to obtain a behavior result;
the behavior recognition model is obtained based on neural network training; the neural network employs blocks based on spatial attention layers and channel attention layers, and rectangular pooling layers for capturing different directional contexts.
In the above scheme, the identifying the data to be detected by using a preset behavior identification model to obtain a behavior result includes:
acquiring a first three-dimensional space-time sequence and a second three-dimensional space-time sequence according to the data to be detected; the first three-dimensional space-time sequence comprises a horizontal thermodynamic diagram and a time dimension corresponding to the horizontal thermodynamic diagram; the second three-dimensional space-time sequence comprises a vertical thermodynamic diagram and a time dimension corresponding to the vertical thermodynamic diagram;
extracting a first direction characteristic according to the first three-dimensional space-time sequence, and extracting a second direction characteristic according to the second three-dimensional space-time sequence;
fusing the first direction feature and the second direction feature to obtain a target feature;
decoding and predicting the target feature to obtain three-dimensional positions of N skeleton nodes; N is greater than or equal to 1;
and obtaining the behavior result based on the three-dimensional positions of the N skeleton nodes.
In the above scheme, the method further comprises: generating a preset behavior recognition model; the generating of the preset behavior recognition model includes:
acquiring a training sample set; the training sample set comprises: at least one training sample and a behavior label corresponding to each training sample; the training sample is three-dimensional point cloud sample data;
and training the neural network according to the at least one training sample and the behavior label corresponding to each training sample to obtain the behavior recognition model.
In the above scheme, the training the neural network according to the at least one training sample and the behavior label corresponding to each training sample includes:
acquiring three-dimensional point cloud sample data; the three-dimensional point cloud sample data comprises: a first sample three-dimensional space-time sequence and a second sample three-dimensional space-time sequence; the first sample three-dimensional space-time sequence comprises a sample horizontal direction thermodynamic diagram and a time dimension corresponding to the sample horizontal direction thermodynamic diagram; the second sample three-dimensional space-time sequence comprises a sample vertical direction thermodynamic diagram and a time dimension corresponding to the sample vertical direction thermodynamic diagram;
and training the neural network based on the first sample three-dimensional space-time sequence, the second sample three-dimensional space-time sequence and the behavior label corresponding to the three-dimensional point cloud sample data.
In the foregoing solution, the neural network includes: a first neural network portion, a second neural network portion, and a fully-connected layer;
the training of the neural network based on the first sample three-dimensional space-time sequence, the second sample three-dimensional space-time sequence and the behavior label corresponding to the three-dimensional point cloud sample data comprises:
the first neural network part extracts a first sample direction characteristic according to the first sample three-dimensional space-time sequence;
the second neural network part extracts a second sample direction characteristic according to the second sample three-dimensional space-time sequence;
fusing the first sample direction feature and the second sample direction feature to obtain a target sample feature;
decoding and predicting the target sample characteristics by using the full-connection layer to obtain sample three-dimensional positions of N skeleton nodes; obtaining a sample prediction result based on the sample three-dimensional positions of the N skeleton nodes;
and comparing the sample prediction result with the behavior label corresponding to the three-dimensional point cloud sample data, and optimizing the neural network based on the comparison result.
The embodiment of the invention provides a behavior recognition device, which comprises:
the acquisition module is used for acquiring data to be detected; the data to be detected is three-dimensional point cloud data;
the identification module is used for identifying the data to be detected by using a preset behavior identification model to obtain a behavior result;
the behavior recognition model is obtained based on neural network training; the neural network employs blocks based on spatial attention layers and channel attention layers, and rectangular pooling layers for capturing different directional contexts.
In the above scheme, the identification module is configured to obtain a first three-dimensional space-time sequence and a second three-dimensional space-time sequence according to the data to be detected; the first three-dimensional space-time sequence comprises a horizontal thermodynamic diagram and a time dimension corresponding to the horizontal thermodynamic diagram; the second three-dimensional space-time sequence comprises a vertical thermodynamic diagram and a time dimension corresponding to the vertical thermodynamic diagram;
extracting a first direction characteristic according to the first three-dimensional space-time sequence, and extracting a second direction characteristic according to the second three-dimensional space-time sequence;
fusing the first direction feature and the second direction feature to obtain a target feature;
decoding and predicting the target feature to obtain three-dimensional positions of N skeleton nodes; N is greater than or equal to 1;
and obtaining the behavior result based on the three-dimensional positions of the N skeleton nodes.
In the above scheme, the apparatus further comprises: the preprocessing module is used for acquiring a training sample set; the training sample set comprises: at least one training sample and a behavior label corresponding to each training sample; the training sample is three-dimensional point cloud sample data;
and training the neural network according to the at least one training sample and the behavior label corresponding to each training sample to obtain the behavior recognition model.
In the above scheme, the preprocessing module is configured to obtain three-dimensional point cloud sample data; the three-dimensional point cloud sample data comprises: a first sample three-dimensional space-time sequence and a second sample three-dimensional space-time sequence; the first sample three-dimensional space-time sequence comprises a sample horizontal direction thermodynamic diagram and a time dimension corresponding to the sample horizontal direction thermodynamic diagram; the second sample three-dimensional space-time sequence comprises a sample vertical direction thermodynamic diagram and a time dimension corresponding to the sample vertical direction thermodynamic diagram;
and training the neural network based on the first sample three-dimensional space-time sequence, the second sample three-dimensional space-time sequence and the behavior label corresponding to the three-dimensional point cloud sample data.
In the foregoing solution, the neural network includes: a first neural network portion, a second neural network portion, and a fully connected layer;
the preprocessing module is used for extracting a first sample direction characteristic according to the first sample three-dimensional time-space sequence through a first neural network part;
extracting a second sample direction characteristic according to the second sample three-dimensional space-time sequence through a second neural network part;
fusing the first sample direction feature and the second sample direction feature to obtain a target sample feature;
decoding and predicting the target sample characteristics by using the full-connection layer to obtain sample three-dimensional positions of N skeleton nodes; obtaining a sample prediction result based on the sample three-dimensional positions of the N skeleton nodes;
and comparing the sample prediction result with the behavior label corresponding to the three-dimensional point cloud sample data, and optimizing the neural network based on the comparison result.
The embodiment of the invention provides a behavior recognition device, which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein the processor executes the program to realize the steps of the behavior recognition method.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the behavior recognition method described in any one of the above.
The embodiment of the invention provides a behavior recognition method, apparatus and storage medium, wherein the method comprises the following steps: acquiring data to be detected, the data to be detected being three-dimensional point cloud data; and recognizing the data to be detected by using a preset behavior recognition model to obtain a behavior result; the behavior recognition model is obtained based on neural network training, and the neural network adopts blocks based on a spatial attention layer and a channel attention layer, and a rectangular pooling layer for capturing contexts in different directions. In this way, behavior recognition is carried out based on three-dimensional point cloud data, reducing recognition cost and power consumption; because the three-dimensional point cloud data has low resolution, personal information cannot be seen directly from it, and with the static object elimination function privacy is protected, making the method particularly suitable for scenes with strong privacy requirements such as homes.
Drawings
Fig. 1 is a schematic flowchart of a behavior recognition method according to an embodiment of the present invention;
fig. 2 is a schematic flowchart of another behavior recognition method according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a network structure of a Resnet block according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a spatial attention layer provided in an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a channel attention layer provided in an embodiment of the present invention;
FIG. 6 is a schematic diagram of rectangular pooling provided by an embodiment of the present invention;
fig. 7 is a schematic diagram of a behavior recognition result according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of a behavior recognition apparatus according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of another behavior recognition device according to an embodiment of the present invention.
Detailed Description
According to the method provided by the embodiment of the invention, data to be detected, which is three-dimensional point cloud data, is acquired, and the data to be detected is recognized by using a preset behavior recognition model to obtain a behavior result; the behavior recognition model is obtained based on neural network training, and the neural network employs blocks based on spatial attention layers and channel attention layers, together with rectangular pooling layers for capturing contexts in different directions.
The present invention will be described in further detail with reference to examples.
Fig. 1 is a schematic flowchart of a behavior recognition method according to an embodiment of the present invention; as shown in fig. 1, the method is applied to a smart device; the smart device may be a server, a computer, or the like, and the method comprises the following steps:
step 101, acquiring data to be detected; the data to be detected is three-dimensional (3D) point cloud data;
step 102, recognizing the data to be detected by using a preset behavior recognition model to obtain a behavior result;
the behavior recognition model is obtained based on neural network training; the neural network employs blocks based on spatial attention layers and channel attention layers, and rectangular pooling layers for capturing different directional contexts.
Here, the three-dimensional point cloud data may be collected by any commercial millimeter wave radar. The millimeter wave radar is required to have at least three transmitting antennas and four receiving antennas; the static object elimination algorithm provided with the millimeter wave radar can be used so that only the information of moving human bodies is retained.
The behavior result may include: behaviors and the probability corresponding to each behavior; for example, the behavior is squatting with a probability of 90%. The probability characterizes the confidence level of the corresponding behavior.
Here, the neural network may adopt a Resnet network, and accordingly, the blocks based on the spatial attention layer and the channel attention layer are residual block (Resnet block) based on the spatial attention layer and the channel attention layer;
and the originally adopted maximum pooling layer and average pooling layer are replaced with rectangular pooling layers for capturing the contexts in different directions.
Other neural networks may also be adopted, as long as the spatial attention layer and the channel attention layer are applied to the neural network and the rectangular pooling layer is adopted in place of the other pooling layers.
In an embodiment, the identifying the data to be detected by using a preset behavior identification model to obtain a behavior result includes:
acquiring a first three-dimensional space-time sequence and a second three-dimensional space-time sequence according to the data to be detected; the first three-dimensional space-time sequence comprises a horizontal thermodynamic diagram and a time dimension corresponding to the horizontal thermodynamic diagram; the second three-dimensional space-time sequence comprises a vertical direction thermodynamic diagram and a time dimension corresponding to the vertical direction thermodynamic diagram;
extracting a first direction characteristic according to the first three-dimensional space-time sequence, and extracting a second direction characteristic according to the second three-dimensional space-time sequence;
fusing the first direction characteristic and the second direction characteristic to obtain a target characteristic;
decoding and predicting the target characteristic to obtain three-dimensional positions of N skeleton nodes; N is greater than or equal to 1;
and obtaining the behavior result based on the three-dimensional positions of the N skeleton nodes.
The following describes obtaining the first three-dimensional space-time sequence and the second three-dimensional space-time sequence. The three-dimensional point cloud data presents energy values corresponding to three-dimensional coordinates (x, y, z); the energy values of the three-dimensional point cloud data may be mapped by interpolation into the horizontal (x, y) plane and the vertical (x, z) plane, respectively. Performing the above processing on the three-dimensional point cloud data along the time axis yields two sets of data each with dimensionality 3, namely a horizontal direction thermodynamic diagram with a time dimension (namely the first three-dimensional space-time sequence) and a vertical direction thermodynamic diagram with a time dimension (namely the second three-dimensional space-time sequence).
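As an illustration of this mapping, the following is a minimal sketch in Python of binning one radar frame of (x, y, z, energy) points into a horizontal and a vertical heatmap; the grid size, the assumed sensing extent, and the nearest-cell binning used in place of interpolation are illustrative assumptions, not parameters specified by this embodiment:

```python
import numpy as np

def point_cloud_to_heatmaps(points, grid=64, extent=5.0):
    """Map one radar frame of (x, y, z, energy) points onto two planes.

    Returns a horizontal (x, y) heatmap and a vertical (x, z) heatmap,
    each of shape (grid, grid). Nearest-cell binning stands in for the
    interpolation described above; extent is an assumed sensing range
    in meters (x and y in [-extent, extent], z in [0, extent]).
    """
    heat_xy = np.zeros((grid, grid), dtype=np.float32)
    heat_xz = np.zeros((grid, grid), dtype=np.float32)
    for x, y, z, e in points:
        i = int(np.clip((x + extent) / (2 * extent) * (grid - 1), 0, grid - 1))
        j = int(np.clip((y + extent) / (2 * extent) * (grid - 1), 0, grid - 1))
        k = int(np.clip(z / extent * (grid - 1), 0, grid - 1))
        heat_xy[i, j] += e   # energy accumulated in the horizontal plane
        heat_xz[i, k] += e   # energy accumulated in the vertical plane
    return heat_xy, heat_xz

# Stacking per-frame heatmaps along the time axis yields the two
# three-dimensional space-time sequences of shape (T, grid, grid).
```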
In an embodiment, the method further comprises: generating a preset behavior recognition model.
Specifically, the generating a preset behavior recognition model includes:
acquiring a training sample set; the training sample set comprises: at least one training sample and a behavior label corresponding to each training sample; the training sample is three-dimensional point cloud sample data;
and training the neural network according to the at least one training sample and the behavior label corresponding to each training sample to obtain the trained neural network, which serves as the behavior recognition model.
Wherein the training the neural network according to the at least one training sample and the behavior label corresponding to each training sample comprises:
acquiring three-dimensional point cloud sample data; the three-dimensional point cloud sample data comprises: a first sample three-dimensional space-time sequence and a second sample three-dimensional space-time sequence; the first sample three-dimensional space-time sequence comprises a sample horizontal direction thermodynamic diagram and a time dimension corresponding to the sample horizontal direction thermodynamic diagram; the second sample three-dimensional space-time sequence comprises a sample vertical direction thermodynamic diagram and a time dimension corresponding to the sample vertical direction thermodynamic diagram;
and training the neural network based on the first sample three-dimensional space-time sequence, the second sample three-dimensional space-time sequence and the behavior label corresponding to the three-dimensional point cloud sample data.
The behavior tag may include at least one of: standing up, sitting down, falling down, coughing, back pain, chest pain, and the like. Correspondingly, three-dimensional point cloud sample data corresponding to the different behaviors may be collected; during collection, the length of each action segment may be 5 seconds. The behaviors included are not limited to those listed above.
Specifically, with respect to obtaining three-dimensional sample point cloud data, the millimeter wave radar may be placed on a horizontal desktop with a height of about 1 meter, and the subject may make a corresponding action at a distance of 2-5 meters from the radar.
Specifically, the neural network comprises: a first neural network portion, a second neural network portion, and a fully connected layer;
the training of the neural network based on the first sample three-dimensional space-time sequence, the second sample three-dimensional space-time sequence and the behavior label corresponding to the three-dimensional point cloud sample data comprises:
the first neural network part extracts a first sample direction characteristic according to the first sample three-dimensional space-time sequence;
the second neural network part extracts a second sample direction characteristic according to the second sample three-dimensional space-time sequence;
fusing the first sample direction feature and the second sample direction feature to obtain a target sample feature;
decoding and predicting the target sample characteristics by using the full-connection layer to obtain sample three-dimensional positions of N skeleton nodes; obtaining a sample prediction result based on the sample three-dimensional positions of the N skeleton nodes;
and comparing the sample prediction result with the behavior label corresponding to the three-dimensional point cloud sample data, and optimizing the neural network based on the comparison result.
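A minimal PyTorch sketch of this training step is given below; it assumes a model with the two-branch structure described here (for instance the illustrative network sketched after Table 1 below) and a data loader yielding the two sample space-time sequences with their behavior labels. The cross-entropy loss and Adam optimizer are illustrative choices rather than settings fixed by this embodiment:

```python
import torch
import torch.nn as nn

# model: the two-branch network (assumed defined, e.g. the DualBranchBehaviorNet
# sketch below); loader: iterable of (seq_h, seq_v, label) batches (assumed defined)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for seq_h, seq_v, label in loader:
    logits = model(seq_h, seq_v)     # extract, fuse, decode and predict
    loss = criterion(logits, label)  # compare the prediction with the behavior label
    optimizer.zero_grad()
    loss.backward()                  # optimize the network by back-propagation
    optimizer.step()
```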
The feature fusion described above may be a simple concatenation; for example, the first direction feature is a vector with dimensions of 1 × n, the second direction feature is another vector with dimensions of 1 × n, and fusion yields a vector with dimensions of 1 × 2n, which serves as the target feature;
the merging of the sample direction features is the same, and is not described in detail here.
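For instance, a minimal sketch of this concatenation in PyTorch (the feature dimension n = 128 is arbitrary):

```python
import torch

f1 = torch.randn(1, 128)             # first direction feature, 1 x n
f2 = torch.randn(1, 128)             # second direction feature, 1 x n
target = torch.cat([f1, f2], dim=1)  # simple concatenation -> 1 x 2n target feature
assert target.shape == (1, 256)
```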
The method provided by the embodiment of the invention can be applied to any suitable behavior recognition scenario; for example, for the identification of falling behavior in a home environment, whether a user has fallen can be identified and an alarm given if so, guaranteeing the safety of users such as the elderly and children.
FIG. 2 is a schematic diagram of a neural network according to an embodiment of the present invention; as shown in fig. 2, the neural network includes: a first neural network portion, a second neural network portion, a fully connected layer, a softmax layer;
the first neural network portion may employ a Resnet portion and the second neural network portion may employ a Resnet portion.
The Resnet part uses the structure shown in Table 1 of the original publication (the table is provided there as an image and is not reproduced here).
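Since Table 1 is not reproduced here, the following PyTorch sketch shows only the overall two-branch shape: one ResNet-style feature extractor per space-time sequence, concatenation, and a fully-connected head. The use of resnet18 with the time dimension mapped to input channels, the 512-dimensional branch features, and the direct classification head (which collapses the skeleton-node decoding step for brevity) are assumptions for illustration, not the configuration of Table 1:

```python
import torch
import torch.nn as nn
import torchvision

class DualBranchBehaviorNet(nn.Module):
    def __init__(self, num_classes: int, in_frames: int = 16):
        super().__init__()

        def branch() -> nn.Module:
            m = torchvision.models.resnet18(weights=None)
            # Feed the T frames of one space-time sequence as input channels.
            m.conv1 = nn.Conv2d(in_frames, 64, kernel_size=7, stride=2,
                                padding=3, bias=False)
            m.fc = nn.Identity()  # expose the 512-d branch feature
            return m

        self.horizontal = branch()   # first neural network portion
        self.vertical = branch()     # second neural network portion
        self.fc = nn.Linear(512 * 2, num_classes)  # fully-connected layer

    def forward(self, seq_h: torch.Tensor, seq_v: torch.Tensor) -> torch.Tensor:
        f1 = self.horizontal(seq_h)          # first direction feature
        f2 = self.vertical(seq_v)            # second direction feature
        fused = torch.cat([f1, f2], dim=1)   # target feature by concatenation
        return self.fc(fused)                # logits for the softmax layer
```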
Here, the neural network uses the Resnet structure, in which the block based on the spatial attention layer and the channel attention layer is each residual block (Resnet block) in the Resnet structure.
That is, an attention layer is added to each block in the Resnet structure; the attention layer is used for learning attention weights and performing a weighting operation on the weights and the input to obtain the attention features. The attention layer is divided into two parts: the input passes first through the spatial attention layer and then through the channel attention layer.
The attention layer comprises: a spatial attention layer and a channel attention layer. Specifically, the network structure diagram of each Resnet block is shown in fig. 3, where a spatial attention layer and a channel attention layer are added to each original Resnet block, which passes through the spatial attention layer first and then through the channel attention layer. In the figure, ⊗ represents element-wise multiplication.
FIG. 4 is a schematic structural diagram of a spatial attention layer provided in an embodiment of the present invention; the following is described with respect to the spatial attention layer in conjunction with FIG. 4;
The spatial attention layer compresses the channel dimension: a maximum pooling operation (Max pool) and an average pooling operation (Average pool) are respectively carried out along the channel dimension, each producing a feature map of size height × width. The Max pool and Average pool results are then concatenated, reduced to a single channel through a convolution operation, and finally a spatial attention weight is generated through a sigmoid function. The weight is multiplied with the input to obtain the final feature.
The algorithm of the spatial attention layer can be expressed as: M_s(F) = δ(f([AvgPool(F); MaxPool(F)]))
where δ represents the sigmoid function, f(·) represents a convolution operation, AvgPool(F) represents the average pooling operation, and MaxPool(F) represents the maximum pooling operation.
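A minimal PyTorch sketch of such a spatial attention layer might look as follows; the 7 × 7 convolution kernel size is an assumed choice, not fixed by this embodiment:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        # Reduce the 2-channel [avg; max] map to one spatial weight map.
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)    # average pooling over channels, H x W
        mx, _ = x.max(dim=1, keepdim=True)   # max pooling over channels, H x W
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # M_s(F)
        return x * w                         # weight the input with the attention map
```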
FIG. 5 is a schematic structural diagram of a channel attention layer provided in an embodiment of the present invention; the following is described for the channel attention layer in conjunction with FIG. 5;
The channel attention layer compresses the feature map in the spatial dimension to obtain a one-dimensional vector. During spatial compression, the max pool and the average pool are used respectively to aggregate the spatial information. The results of the max pool and the average pool are then fed into a shared fully connected network, the spatial dimension is compressed, and the two outputs are summed element by element to generate a channel attention map.
For a single feature map, channel attention focuses on what is important in that map. Average pooling provides feedback for every pixel on the feature map, whereas during gradient back-propagation maximum pooling provides gradient feedback only at the location of the maximum response in the feature map.
The algorithm of the channel attention layer can be expressed as: M_c(F) = δ(MLP(AvgPool(F)) + MLP(MaxPool(F)))
where δ represents the sigmoid function, MLP represents a multilayer perceptron, AvgPool(F) represents the average pooling operation, and MaxPool(F) represents the maximum pooling operation.
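A corresponding PyTorch sketch of the channel attention layer; the reduction ratio of 16 in the shared MLP is an assumed hyperparameter:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared fully-connected network applied to both pooled vectors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))            # average pool over space -> 1-d vector
        mx = self.mlp(x.amax(dim=(2, 3)))             # max pool over space -> 1-d vector
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)  # element-wise sum, then M_c(F)
        return x * w
```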
The rectangular pooling layer is described below.
In the Resnet network architecture, rectangular pooling is used instead of global pooling to capture long-range dependencies more efficiently. Rectangular pooling has two advantages:
first, it deploys an elongated pooling kernel along one spatial dimension, thus capturing long-distance relationships between isolated regions. Second, it keeps a narrow kernel shape in the other spatial dimension, which makes it easy to capture local context and prevents irrelevant areas from interfering with label prediction. Integrating such long, narrow pooling kernels enables a semantic segmentation network to aggregate global and local contexts simultaneously. This is fundamentally different from traditional pooling, which collects context from a fixed square region.
FIG. 6 is a schematic diagram of rectangular pooling provided by an embodiment of the present invention. Rectangular pooling consists of two paths that capture remote context in the horizontal and vertical directions respectively. As shown in fig. 6, the input is a feature map of size C × H × W (C denotes the number of channels, H the height, and W the width); for convenience of representation only one channel is drawn in the figure, each channel is processed in the same way, and the following description also takes one channel as an example. After horizontal rectangular pooling and vertical rectangular pooling, the input feature map becomes H × 1 and 1 × W respectively; the pooling method averages the elements within the pooling kernel and takes that average as the pooling output value. The two pooled outputs are then processed by one-dimensional convolution and expanded along the horizontal and vertical directions respectively, so that the two expanded feature maps have the same size, and the expanded feature maps are fused. Finally, after convolution and sigmoid processing, the result is multiplied with the original input map to obtain the final output.
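A minimal PyTorch sketch of this two-path rectangular pooling follows; the kernel sizes of the one-dimensional convolutions (3 here) and the ReLU before fusion are assumptions for illustration, as the text above does not fix them:

```python
import torch
import torch.nn as nn

class RectangularPooling(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # horizontal path: H x 1
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # vertical path: 1 x W
        self.conv_h = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.conv_w = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        # Average within each elongated pooling kernel, apply a 1-D convolution,
        # then expand each result back to H x W.
        xh = self.conv_h(self.pool_h(x)).expand(-1, -1, -1, w)
        xw = self.conv_w(self.pool_w(x)).expand(-1, -1, h, -1)
        y = torch.sigmoid(self.fuse(torch.relu(xh + xw)))  # fuse, conv, sigmoid
        return x * y   # multiply with the original input map
```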
In the rectangular pooling process, each location of the output is associated with a corresponding location in the input. Thus, through multiple aggregation processes, long-range dependencies can be built across the whole scene. In addition, the final dot product operation can also be regarded as a kind of attention learning.
Rectangular pooling considers a long but narrow range rather than the whole feature map as global pooling does, which avoids establishing unnecessary links between locations that are far apart. Compared with a non-local structure, which also focuses on global context but must compute the relationship between every pair of positions, rectangular pooling is computationally light; it greatly improves the algorithm's ability to capture long-range spatial dependencies without consuming a large amount of computing power or time.
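Putting the pieces together, a Resnet block using the two attention layers in the order described above (spatial first, then channel) might be sketched as follows; it reuses the SpatialAttention and ChannelAttention sketches above, and the plain two-convolution residual body with an identity shortcut is an assumption for illustration rather than the exact block of Table 1:

```python
import torch
import torch.nn as nn

class AttentionResBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.spatial = SpatialAttention()          # sketched above
        self.channel = ChannelAttention(channels)  # sketched above

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = self.spatial(out)      # through the spatial attention layer first
        out = self.channel(out)      # then through the channel attention layer
        return torch.relu(out + x)   # residual connection of the Resnet block
```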
The embodiment of the invention also provides a method for training the neural network to obtain a behavior recognition model. The specific method is as follows: while the behaviors are performed, 3D point cloud data output by a common commercial millimeter wave radar is collected as sample 3D point cloud data, and a horizontal direction thermodynamic diagram and a vertical direction thermodynamic diagram (each with a corresponding time dimension) are generated from the sample 3D point cloud data and processed as the input values of the neural network.
Common commercial millimeter wave radar manufacturers include TI, NXP, and the like; the transmitting and receiving antennas are required to be at least 3 transmitting and 4 receiving, and the static object elimination algorithm provided with the millimeter wave radar may be used so that only the information of moving human bodies is retained. The neural network is then trained and optimized according to the sample 3D point cloud data and the behavior identification information corresponding to the sample 3D point cloud data, giving an optimal behavior recognition model for predicting behaviors.
With respect to acquiring 3D point cloud data, the millimeter wave radar may be placed on a horizontal desktop at a height of about 1 meter, and the subject makes corresponding actions at a distance of 2-5 meters from the radar, including: standing up, sitting down, falling down, coughing, back pain, chest pain, and the like; each action segment is 5 seconds in length. The behaviors included are not limited to those listed above.
Regarding data processing, the millimeter wave point cloud data may be mapped by interpolation into three-dimensional space and arranged along the time axis, giving an input with a dimensionality of 4.
Regarding the output of the model, the behavior recognition model calculates, at the Softmax layer, the probability that the current behavior belongs to each behavior class, and outputs the behavior with the maximum probability.
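For example, a sketch of this output step; the label set and logits are illustrative stand-ins, not values produced by the described model:

```python
import torch

behaviors = ["stand up", "sit down", "fall down", "cough"]  # illustrative label set
logits = torch.tensor([[0.2, 0.1, 2.5, 0.3]])               # stand-in for model output
probs = torch.softmax(logits, dim=1)                        # Softmax layer
conf, idx = probs.max(dim=1)                                # behavior with max probability
print(behaviors[idx.item()], round(conf.item(), 2))         # e.g. fall down 0.77
```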
Based on different trained actions, the method provided by the embodiment of the invention can be used for identifying dangerous behaviors such as falling, fighting and the like.
Correspondingly, the behavior recognition method may further comprise: performing a dangerous behavior alarm based on the behavior recognition result.
Specifically, when dangerous behaviors are detected, warning information is sent to relevant organizations (such as a community service center, a dispatch department, and the like), and the relevant organizations take corresponding actions; for example, personnel may be dispatched to the door to help solitary elderly people, or public security inspections may be carried out in the relevant areas to prevent fighting, thereby improving the level of social governance.
Fig. 7 is a schematic diagram of a behavior recognition result according to an embodiment of the present invention; as shown in fig. 7, the behavior recognition result is standing up.
Considering that low-resolution millimeter wave radar data are sparse, learning fine human posture features is difficult, and the existing model needs to be optimized to achieve a better effect. The main means of network optimization at present is to deepen and widen the network, which leads to high prediction time cost and high demand on computing power. The timeliness required of fall prediction and the portability of edge-side operation require that the optimization of the algorithm be as lightweight and efficient as possible. Based on this, the method provided by the embodiment of the present invention adopts a spatial attention-channel attention module for optimization. The spatial attention-channel attention module is a lightweight general-purpose module, so it can be seamlessly integrated into any Convolutional Neural Network (CNN) architecture with negligible overhead and can be trained end-to-end together with the base CNN. The whole optimization process is light and efficient and does not require excessive extra calculation cost.
In addition, the neural network of the invention adopts a rectangular pooling layer, improving the model's ability to capture long-distance dependencies. As the mapping of the human body structure, the point clouds are mostly elongated, so a rectangular pooling kernel adapts to the current task better than a square pooling kernel and reduces pollution from irrelevant areas. Meanwhile, rectangular pooling can capture long-distance dependency relationships while consuming less memory and computing power than existing methods, which is an obvious advantage.
Fig. 8 is a schematic structural diagram of a behavior recognition apparatus according to an embodiment of the present invention; as shown in fig. 8, the apparatus includes:
the acquisition module is used for acquiring data to be detected; the data to be detected is three-dimensional point cloud data;
the identification module is used for identifying the data to be detected by using a preset behavior identification model to obtain a behavior result;
the behavior recognition model is obtained based on neural network training; the neural network employs blocks based on spatial attention layers and channel attention layers, and rectangular pooling layers for capturing different directional contexts.
In an embodiment, the identification module is configured to obtain a first three-dimensional space-time sequence and a second three-dimensional space-time sequence according to the data to be detected; the first three-dimensional space-time sequence comprises a horizontal thermodynamic diagram and a time dimension corresponding to the horizontal thermodynamic diagram; the second three-dimensional space-time sequence comprises a vertical thermodynamic diagram and a time dimension corresponding to the vertical thermodynamic diagram;
extracting a first direction characteristic according to the first three-dimensional space-time sequence, and extracting a second direction characteristic according to the second three-dimensional space-time sequence;
fusing the first direction characteristic and the second direction characteristic to obtain a target characteristic;
decoding and predicting the target characteristic to obtain three-dimensional positions of N skeleton nodes; N is greater than or equal to 1;
and obtaining the behavior result based on the three-dimensional positions of the N skeleton nodes.
In one embodiment, the apparatus further comprises: the preprocessing module is used for acquiring a training sample set; the training sample set comprises: at least one training sample and a behavior label corresponding to each training sample; the training sample is three-dimensional point cloud sample data;
and training the neural network according to the at least one training sample and the behavior label corresponding to each training sample to obtain the behavior recognition model.
In an embodiment, the preprocessing module is configured to obtain three-dimensional point cloud sample data; the three-dimensional point cloud sample data comprises: a first sample three-dimensional space-time sequence and a second sample three-dimensional space-time sequence; the first sample three-dimensional space-time sequence comprises a sample horizontal direction thermodynamic diagram and a time dimension corresponding to the sample horizontal direction thermodynamic diagram; the second sample three-dimensional space-time sequence comprises a sample vertical direction thermodynamic diagram and a time dimension corresponding to the sample vertical direction thermodynamic diagram;
and training the neural network based on the first sample three-dimensional space-time sequence, the second sample three-dimensional space-time sequence and the behavior label corresponding to the three-dimensional point cloud sample data.
In one embodiment, the neural network comprises: a first neural network portion, a second neural network portion, and a fully connected layer;
the preprocessing module is used for extracting a first sample direction characteristic according to the first sample three-dimensional time-space sequence through a first neural network part;
extracting a second sample direction characteristic according to the second sample three-dimensional space-time sequence through a second neural network part;
fusing the first sample direction feature and the second sample direction feature to obtain a target sample feature;
decoding and predicting the target sample characteristics by using the full-connection layer to obtain sample three-dimensional positions of N skeleton nodes; obtaining a sample prediction result based on the sample three-dimensional positions of the N skeleton nodes;
and comparing the sample prediction result with the behavior label corresponding to the three-dimensional point cloud sample data, and optimizing the neural network based on the comparison result.
It should be noted that: in the embodiment, when the behavior recognition apparatus implements the corresponding behavior recognition method, the division of each program module is merely used as an example, and in practical applications, the processing distribution may be completed by different program modules as needed, that is, the internal structure of the server is divided into different program modules to complete all or part of the processing described above. In addition, the apparatus provided by the above embodiment and the embodiment of the corresponding method belong to the same concept, and the specific implementation process thereof is described in the method embodiment, which is not described herein again.
Fig. 9 is a schematic structural diagram of another behavior recognition apparatus according to an embodiment of the present invention, and as shown in fig. 9, the apparatus 90 includes: a processor 901 and a memory 902 for storing a computer program operable on the processor; the processor 901 is configured to, when running the computer program, execute: acquiring data to be detected; the data to be detected is three-dimensional point cloud data; recognizing the data to be detected by using a preset behavior recognition model to obtain a behavior result; the behavior recognition model is obtained based on neural network training; the neural network employs blocks based on spatial attention layers and channel attention layers, and rectangular pooling layers for capturing different directional contexts.
The behavior recognition device may execute the method shown in fig. 1, which belongs to the same concept as the method embodiment shown in fig. 1, and the specific implementation process thereof is described in detail in the method embodiment and is not described herein again.
In practical applications, the apparatus 90 may further include: at least one network interface 903. The various components of the device 90 are coupled together by a bus system 904. It is understood that the bus system 904 is used to enable connected communication between these components. The bus system 904 includes a power bus, a control bus, and a status signal bus in addition to a data bus. But for the sake of clarity the various busses are labeled in figure 9 as the bus system 904. The number of the processors 901 may be at least one. The network interface 903 is used for communication between the apparatus 90 and other devices in a wired or wireless manner.
Memory 902 in embodiments of the present invention is used to store various types of data to support the operation of device 90.
The method disclosed in the above embodiments of the present invention may be applied to the processor 901, or implemented by the processor 901. Processor 901 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be implemented by integrated logic circuits of hardware in the processor 901 or by instructions in the form of software. The processor 901 may be a general purpose processor, a Digital Signal Processor (DSP), or other programmable logic device, discrete gate or transistor logic device, discrete hardware component, etc. Processor 901 may implement or perform the methods, steps, and logic blocks disclosed in embodiments of the present invention. The general purpose processor may be a microprocessor or any conventional processor or the like. The steps of the method disclosed by the embodiment of the invention can be directly implemented by a hardware decoding processor, or can be implemented by combining hardware and software modules in the decoding processor. The software modules may be located in a storage medium located in the memory 902, and the processor 901 reads the information in the memory 902 and performs the steps of the aforementioned methods in combination with its hardware.
In an exemplary embodiment, the apparatus 90 may be implemented by one or more Application Specific Integrated Circuits (ASICs), DSPs, Programmable Logic Devices (PLDs), Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, microcontrollers (MCUs), microprocessors, or other electronic components for performing the foregoing methods.
An embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored; the computer program, when executed by a processor, performs: acquiring data to be detected; the data to be detected is three-dimensional point cloud data; recognizing the data to be detected by using a preset behavior recognition model to obtain a behavior result; the behavior recognition model is obtained based on neural network training; the neural network employs blocks based on spatial attention layers and channel attention layers, and rectangular pooling layers for capturing different directional contexts.
The computer program may execute the method shown in fig. 1 when being executed by a processor, and belongs to the same concept as the method embodiment shown in fig. 1, and the specific implementation process thereof is detailed in the method embodiment and is not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention or portions thereof contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.
It should be noted that: "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The technical means described in the embodiments of the present application may be arbitrarily combined without conflict.
The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any change or substitution easily conceivable by a person skilled in the art within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (12)

1. A method of behavior recognition, the method comprising:
acquiring data to be detected; the data to be detected is three-dimensional point cloud data;
recognizing the data to be detected by using a preset behavior recognition model to obtain a behavior result;
the behavior recognition model is obtained based on neural network training; the neural network employs blocks based on spatial attention layers and channel attention layers, and rectangular pooling layers for capturing different directional contexts.
2. The method according to claim 1, wherein the identifying the data to be detected by using a preset behavior identification model to obtain a behavior result comprises:
acquiring a first three-dimensional space-time sequence and a second three-dimensional space-time sequence according to the data to be detected; the first three-dimensional space-time sequence comprises a horizontal thermodynamic diagram and a time dimension corresponding to the horizontal thermodynamic diagram; the second three-dimensional space-time sequence comprises a vertical thermodynamic diagram and a time dimension corresponding to the vertical thermodynamic diagram;
extracting a first direction characteristic according to the first three-dimensional space-time sequence, and extracting a second direction characteristic according to the second three-dimensional space-time sequence;
fusing the first direction characteristic and the second direction characteristic to obtain a target characteristic;
decoding and predicting the target characteristic to obtain three-dimensional positions of N skeleton nodes; N is greater than or equal to 1;
and obtaining the behavior result based on the three-dimensional positions of the N skeleton nodes.
3. The method according to claim 1 or 2, characterized in that the method further comprises: generating a preset behavior recognition model; the generating of the preset behavior recognition model includes:
acquiring a training sample set; the training sample set comprises: at least one training sample and a behavior label corresponding to each training sample; the training sample is three-dimensional point cloud sample data;
and training the neural network according to the at least one training sample and the behavior label corresponding to each training sample to obtain the behavior recognition model.
4. The method of claim 3, wherein the training the neural network according to the at least one training sample and the behavior label corresponding to each training sample comprises:
acquiring three-dimensional point cloud sample data; the three-dimensional point cloud sample data comprises: a first sample three-dimensional space-time sequence and a second sample three-dimensional space-time sequence; the first sample three-dimensional space-time sequence comprises a sample horizontal direction thermodynamic diagram and a time dimension corresponding to the sample horizontal direction thermodynamic diagram; the second sample three-dimensional space-time sequence comprises a sample vertical direction thermodynamic diagram and a time dimension corresponding to the sample vertical direction thermodynamic diagram;
and training the neural network based on the first sample three-dimensional space-time sequence, the second sample three-dimensional space-time sequence and the behavior label corresponding to the three-dimensional point cloud sample data.
5. The method of claim 4, wherein the neural network comprises: a first neural network portion, a second neural network portion, and a fully connected layer;
the training of the neural network based on the first sample three-dimensional space-time sequence, the second sample three-dimensional space-time sequence and the behavior label corresponding to the three-dimensional point cloud sample data comprises:
the first neural network part extracts a first sample directional feature according to the first sample three-dimensional space-time sequence;
the second neural network part extracts a second sample direction characteristic according to the second sample three-dimensional space-time sequence;
fusing the first sample direction feature and the second sample direction feature to obtain a target sample feature;
decoding and predicting the target sample characteristics by using the full-connection layer to obtain sample three-dimensional positions of N skeleton nodes; obtaining a sample prediction result based on the sample three-dimensional positions of the N skeleton nodes;
and comparing the sample prediction result with the behavior label corresponding to the three-dimensional point cloud sample data, and optimizing the neural network based on the comparison result.
6. An apparatus for behavior recognition, the apparatus comprising:
the acquisition module is used for acquiring data to be detected; the data to be detected is three-dimensional point cloud data;
the identification module is used for identifying the data to be detected by using a preset behavior identification model to obtain a behavior result;
the behavior recognition model is obtained based on neural network training; the neural network employs blocks based on spatial attention layers and channel attention layers, and rectangular pooling layers for capturing different directional contexts.
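A sketch of what a block combining a channel attention layer, a spatial attention layer, and rectangular (strip) pooling layers might look like: pooling over H x 1 and 1 x W strips captures the two directional contexts the claim mentions. Kernel sizes, the reduction ratio, and the residual wiring are illustrative assumptions, not taken from the patent.

import torch
import torch.nn as nn

class AttentionStripBlock(nn.Module):
    # Sketch: channel attention, spatial attention, and rectangular
    # (strip) pooling over rows and columns for directional context.
    def __init__(self, channels=16, reduction=4):
        super().__init__()
        # Channel attention layer: squeeze spatial dims, re-weight channels.
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        # Spatial attention layer: a single-channel gate over H x W.
        self.spatial_att = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())
        # Rectangular pooling layers: H x 1 and 1 x W strips.
        self.pool_rows = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1)
        self.pool_cols = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W)
        self.mix = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        x = x * self.channel_att(x)   # channel re-weighting
        x = x * self.spatial_att(x)   # spatial gating
        # Broadcast-add the two directional contexts over the feature map.
        strips = self.pool_rows(x) + self.pool_cols(x)
        return self.mix(torch.relu(strips)) + x

out = AttentionStripBlock()(torch.randn(2, 16, 64, 64))  # -> (2, 16, 64, 64)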
7. The apparatus according to claim 6, wherein the identification module is configured to obtain a first three-dimensional space-time sequence and a second three-dimensional space-time sequence according to the data to be detected; the first three-dimensional space-time sequence comprises a horizontal-direction heat map and a time dimension corresponding to the horizontal-direction heat map; the second three-dimensional space-time sequence comprises a vertical-direction heat map and a time dimension corresponding to the vertical-direction heat map;
extracting a first direction feature according to the first three-dimensional space-time sequence, and extracting a second direction feature according to the second three-dimensional space-time sequence;
fusing the first direction feature and the second direction feature to obtain a target feature;
decoding and predicting the target feature to obtain three-dimensional positions of N skeleton nodes; N is greater than or equal to 1;
and obtaining the behavior result based on the three-dimensional positions of the N skeleton nodes.
8. The apparatus of claim 6 or 7, further comprising: a preprocessing module used for acquiring a training sample set; the training sample set comprises: at least one training sample and a behavior label corresponding to each training sample; each training sample is three-dimensional point cloud sample data;
and for training the neural network according to the at least one training sample and the behavior label corresponding to each training sample to obtain the behavior recognition model.
9. The apparatus of claim 8, wherein the preprocessing module is configured to obtain three-dimensional point cloud sample data; the three-dimensional point cloud sample data comprises: a first sample three-dimensional space-time sequence and a second sample three-dimensional space-time sequence; the first sample three-dimensional space-time sequence comprises a sample horizontal-direction heat map and a time dimension corresponding to the sample horizontal-direction heat map; the second sample three-dimensional space-time sequence comprises a sample vertical-direction heat map and a time dimension corresponding to the sample vertical-direction heat map;
and training the neural network based on the first sample three-dimensional space-time sequence, the second sample three-dimensional space-time sequence and the behavior label corresponding to the three-dimensional point cloud sample data.
10. The apparatus of claim 9, wherein the neural network comprises: a first neural network part, a second neural network part, and a fully connected layer;
the preprocessing module is used for extracting a first sample direction feature according to the first sample three-dimensional space-time sequence through the first neural network part;
extracting a second sample direction feature according to the second sample three-dimensional space-time sequence through the second neural network part;
fusing the first sample direction feature and the second sample direction feature to obtain a target sample feature;
decoding and predicting the target sample feature by using the fully connected layer to obtain sample three-dimensional positions of N skeleton nodes; obtaining a sample prediction result based on the sample three-dimensional positions of the N skeleton nodes;
and comparing the sample prediction result with the behavior label corresponding to the three-dimensional point cloud sample data, and optimizing the neural network based on the comparison result.
11. A behavior recognition apparatus comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 5.
12. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
CN202110002186.4A 2021-01-04 2021-01-04 Behavior recognition method and device and storage medium Pending CN114764902A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110002186.4A CN114764902A (en) 2021-01-04 2021-01-04 Behavior recognition method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110002186.4A CN114764902A (en) 2021-01-04 2021-01-04 Behavior recognition method and device and storage medium

Publications (1)

Publication Number Publication Date
CN114764902A true CN114764902A (en) 2022-07-19

Family

ID=82364190

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110002186.4A Pending CN114764902A (en) 2021-01-04 2021-01-04 Behavior recognition method and device and storage medium

Country Status (1)

Country Link
CN (1) CN114764902A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035606A (en) * 2022-08-11 2022-09-09 天津大学 Bone action recognition method based on segment-driven contrast learning
CN115035606B (en) * 2022-08-11 2022-10-21 天津大学 Bone action recognition method based on segment-driven contrast learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination